Automated Cleaning of Occurrence Records from Biological Collections

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the locations of biodiversity institutions (museums, zoos, botanical gardens, universities). It also identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates, and implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets.


Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Note: the documentation of CoordinateCleaner has moved to https://ropensci.github.io/CoordinateCleaner/.

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology. It includes tests for:

  • General coordinate validity
  • Country and province centroids
  • Capital coordinates
  • Coordinates of biodiversity institutions
  • Spatial outliers
  • Temporal outliers
  • Coordinate-country discordance
  • Duplicated coordinates per species
  • Assignment to the location of the GBIF headquarters
  • Urban areas
  • Seas
  • Plain zeros
  • Equal longitude and latitude
  • Rounded coordinates
  • DDMM to DD.DD coordinate conversion errors
  • Large temporal uncertainty (fossils)
  • Equal minimum and maximum ages (fossils)
  • Spatio-temporal outliers (fossils)
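
To make the simplest record-level checks in the list above concrete, here is a minimal base-R sketch of three of them: general coordinate validity, plain zeros, and equal longitude and latitude. It is an illustration only, not the package's implementation. The helper `flag_simple()` and its output column names are hypothetical, and the package's own functions may apply different rules and tolerances.

```r
# Hypothetical helper (not part of CoordinateCleaner): returns one logical
# column per check, where TRUE means the record passed that check.
flag_simple <- function(x, lon = "decimallongitude", lat = "decimallatitude") {
  lo <- x[[lon]]
  la <- x[[lat]]

  # Validity: numeric, non-missing, and inside the possible lon/lat range
  valid <- is.numeric(lo) & is.numeric(la) &
    !is.na(lo) & !is.na(la) &
    abs(lo) <= 180 & abs(la) <= 90

  # Plain zeros: both coordinates exactly 0 (assumed rule for this sketch)
  not_zero <- !(valid & lo == 0 & la == 0)

  # Equal coordinates: longitude identical to latitude
  not_equal <- !(valid & lo == la)

  data.frame(valid = valid, not_zero = not_zero, not_equal = not_equal,
             pass_all = valid & not_zero & not_equal)
}

# Small made-up data set with one clean and four problematic records
recs <- data.frame(
  species = c("a", "a", "b", "b", "c"),
  decimallongitude = c(45.2, 0, 12.5, 200, NA),
  decimallatitude  = c(-18.9, 0, 12.5, 10, 5)
)

flags <- flag_simple(recs)
clean <- recs[flags$pass_all, ]
```

Keeping one logical column per check, rather than dropping rows immediately, makes it possible to inspect why a record was flagged before deciding whether to exclude it.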

CoordinateCleaner can be particularly useful to improve data quality when using data from GBIF (e.g. obtained with rgbif) or the Paleobiology database (e.g. obtained with paleobioDB) for historical biogeography (e.g. with BioGeoBEARS or phytools), automated conservation assessment (e.g. with speciesgeocodeR or conR) or species distribution modelling (e.g. with dismo or sdm). See scrubr and taxize for complementary taxonomic cleaning or biogeo for correcting spatial coordinate errors. You can find a detailed comparison of the functionality of CoordinateCleaner, scrubr, and biogeo here.

See News for update information.

Installation

Stable from CRAN

install.packages("CoordinateCleaner")
library(CoordinateCleaner)

Development version using devtools

devtools::install_github("ropensci/CoordinateCleaner")
library(CoordinateCleaner)

Usage

A simple example:

# Simulate example data
minages <- runif(250, 0, 65)
exmpl <- data.frame(species = sample(letters, size = 250, replace = TRUE),
                    decimallongitude = runif(250, min = 42, max = 51),
                    decimallatitude = runif(250, min = -26, max = -11),
                    min_ma = minages,
                    max_ma = minages + runif(250, 0.1, 65),
                    dataset = "clean")
 
# Run record-level tests
rl <- clean_coordinates(x = exmpl)
summary(rl)
plot(rl)
 
# Dataset level 
dsl <- clean_dataset(exmpl)
 
# For fossils
fl <- clean_fossils(x = exmpl,
                    taxon = "species",
                    lon = "decimallongitude",
                    lat = "decimallatitude")
summary(fl)
 
# Alternative example using the pipe
library(tidyverse)
 
cl <- exmpl %>%
  cc_val() %>%
  cc_cap() %>%
  cd_ddmm() %>%
  cf_range(lon = "decimallongitude",
           lat = "decimallatitude",
           taxon = "species")
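
The fossil-specific tests listed earlier include equal minimum and maximum ages and large temporal uncertainty. The following base-R sketch shows roughly what such checks look at, using the same `min_ma` and `max_ma` column names as the simulated data above. The helper `flag_ages()` and the 35-million-year cutoff are assumptions made for illustration; `clean_fossils()` may define and threshold these tests differently.

```r
# Hypothetical helper (not part of CoordinateCleaner); TRUE = record passes.
flag_ages <- function(x, min_age = "min_ma", max_age = "max_ma",
                      max_range = 35) {  # assumed cutoff, in million years
  lo <- x[[min_age]]
  hi <- x[[max_age]]
  age_range <- hi - lo

  data.frame(
    # Minimum and maximum age should not be identical
    ages_differ = !is.na(age_range) & age_range != 0,
    # The dating interval should not be wider than the assumed cutoff
    range_ok = !is.na(age_range) & age_range <= max_range
  )
}

# Small made-up fossil data set: one well-dated record, one with identical
# minimum and maximum ages, and one with a very wide dating interval
fos <- data.frame(
  species = c("a", "b", "c"),
  min_ma = c(10, 20, 5),
  max_ma = c(12, 20, 60)
)

age_flags <- flag_ages(fos)
fos_clean <- fos[age_flags$ages_differ & age_flags$range_ok, ]
```

Records with missing ages fail both checks in this sketch; whether that is the right choice depends on the downstream analysis.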

Documentation

Pipelines for cleaning data from the Global Biodiversity Information Facility (GBIF) and the Paleobiology Database (PaleobioDB) are available here.

Contributing

See the CONTRIBUTING document.

Citation

Zizka A, Silvestro D, Andermann T, Azevedo J, Duarte Ritter C, Edler D, Farooq H, Herdean A, Ariza M, Scharn R, Svanteson S, Wengstrom N, Zizka V & Antonelli A (2018) CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases. https://github.com/ropensci/CoordinateCleaner


News

CoordinateCleaner 2.0-3 (2018-10-22)

MINOR IMPROVEMENTS

  • Removed convenience functionality to only download data from rnaturalearth at first use, to comply with CRAN guidelines

CoordinateCleaner 2.0-2 (2018-10-12)

MAJOR IMPROVEMENTS

  • tutorial on outlier detection on the bookdown documentation
  • tutorial on using custom gazetteers
  • rasterisation heuristic in cc_outl
  • added sampling correction to cc_outl
  • added verify option to cc_inst
  • transfer to rOpenSci

MINOR IMPROVEMENTS

  • reduced package size by switching to data download from rnaturalearth for urban areas and landmass
  • fixed issue with names of plot.spatialvalid
  • grouped functions on documentation webpage
  • fixed broken links in the help pages
  • improved documentation structure

CoordinateCleaner 2.0-1 (2018-06-08)

MAJOR IMPROVEMENTS

  • changed to a more consistent naming scheme for the functions

MINOR IMPROVEMENTS

  • fixed typos in Readme
  • set a download from naturalearth as default for cc_urb
  • reduced vignette memory use and size
  • enabled sf format for custom references
  • added speedup option for cc_sea
  • added webpage (https://azizka.github.io/CoordinateCleaner/)

CoordinateCleaner 1.2-1 (2018-06-08)

MAJOR IMPROVEMENTS

  • Adapted function and argument names consistently to underscore_case
  • Simplified internal code structure of wrapper functions

MINOR IMPROVEMENTS

  • adapted package in response to rOpenSci reviews

DEPRECATED AND DEFUNCT

  • CleanCoordinates deprecated, replaced by clean_coordinates
  • CleanCoordinatesDS deprecated, replaced by clean_dataset
  • CleanCoordinatesFOS deprecated, replaced by clean_fossils
  • WritePyrate deprecated, replaced by write_pyrate

CoordinateCleaner 1.1-1 (2018-05-15)

MINOR IMPROVEMENTS

  • Switched documentation and NAMESPACE generation to roxygen2
  • Switched from sapply to vapply
  • Improved code readability

CoordinateCleaner 1.1-0 (2018-04-08)

NEW FEATURES

MINOR IMPROVEMENTS

  • Adaptation of code to rOpenSci guidelines

BUG FIXES

DEPRECATED AND DEFUNCT


2.0-3 by Alexander Zizka, 3 months ago


https://ropensci.github.io/CoordinateCleaner/


Report a bug at https://github.com/ropensci/CoordinateCleaner/issues


Browse source code at https://github.com/cran/CoordinateCleaner


Authors: Alexander Zizka [aut, cre], Daniele Silvestro [ctb], Tobias Andermann [ctb], Josue Azevedo [ctb], Camila Duarte Ritter [ctb], Daniel Edler [ctb], Harith Farooq [ctb], Andrei Herdean [ctb], Maria Ariza [ctb], Ruud Scharn [ctb], Sten Svanteson [ctb], Niklas Wengstrom [ctb], Vera Zizka [ctb], Alexandre Antonelli [ctb], Irene Steves [rev] (Irene reviewed the package for rOpenSci, see <https://github.com/ropensci/onboarding/issues/210>), Francisco Rodriguez-Sanchez [rev] (Francisco reviewed the package for rOpenSci, see <https://github.com/ropensci/onboarding/issues/210>)


Documentation:   PDF Manual  


GPL-3 license


Imports dplyr, geosphere, ggplot2, graphics, methods, raster, rgeos, rgdal, rnaturalearth, stats, sp, tidyselect, utils

Suggests countrycode, covr, knitr, maps, paleobioDB, rgbif, rmarkdown, testthat

