Automated Cleaning of Occurrence Records from Biological Collections

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the locations of biodiversity institutions (museums, zoos, botanical gardens, universities). It also identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates, and implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) doi:10.1111/2041-210X.13152.


Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Previous versions contained a bug in the cc_outl function, so for outlier testing make sure to use version 2.0-9 or higher.
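A minimal sketch of how the installed version could be checked before running outlier tests. It uses only base R (`package_version()`, `utils::packageVersion()`, `requireNamespace()`); the variable name `min_version` is illustrative.

```r
# Minimum version that contains the cc_outl fix, as stated above.
min_version <- package_version("2.0-9")

# Only query the version if the package is installed;
# packageVersion() raises an error for packages that are missing.
if (requireNamespace("CoordinateCleaner", quietly = TRUE)) {
  installed <- utils::packageVersion("CoordinateCleaner")
  if (installed < min_version) {
    message("Installed version ", as.character(installed),
            " predates the cc_outl fix; please update.")
  }
}
```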

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology. Specifically, it includes tests for:

  • General coordinate validity
  • Country and province centroids
  • Capital coordinates
  • Coordinates of biodiversity institutions
  • Spatial outliers
  • Temporal outliers
  • Coordinate-country discordance
  • Duplicated coordinates per species
  • Assignment to the location of the GBIF headquarters
  • Urban areas
  • Seas
  • Plain zeros
  • Equal longitude and latitude
  • Rounded coordinates
  • DDMM to DD.DD coordinate conversion errors
  • Large temporal uncertainty (fossils)
  • Equal minimum and maximum ages (fossils)
  • Spatio-temporal outliers (fossils)
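To make the simplest of these tests concrete, here is a base R sketch of three of them: general coordinate validity, plain zeros, and equal longitude and latitude. It is a simplified illustration and not the package's implementation; the helper `flag_simple()` and the `toy` data are hypothetical. As in CoordinateCleaner, TRUE is taken to mean a record passed a test and FALSE that it was flagged.

```r
# Hypothetical helper, not part of CoordinateCleaner.
# Returns one logical column per test (TRUE = passed, FALSE = flagged).
flag_simple <- function(x, lon = "decimallongitude", lat = "decimallatitude") {
  lo <- suppressWarnings(as.numeric(x[[lon]]))
  la <- suppressWarnings(as.numeric(x[[lat]]))

  # Validity: both values present, numeric and within lat/lon bounds
  valid <- !is.na(lo) & !is.na(la) & abs(lo) <= 180 & abs(la) <= 90
  # Plain zeros: longitude and latitude both exactly 0
  not_zero <- valid & !(lo == 0 & la == 0)
  # Equal longitude and latitude
  not_equal <- valid & lo != la

  data.frame(valid = valid, not_zero = not_zero, not_equal = not_equal,
             summary = valid & not_zero & not_equal)
}

toy <- data.frame(
  species = c("a", "b", "c", "d", "e"),
  decimallongitude = c(45.2, 0, 12.5, 200, NA),
  decimallatitude = c(-18.7, 0, 12.5, 10, 5)
)

flags <- flag_simple(toy)
toy[flags$summary, ]  # keeps only the first record
```

The package functions go further than this sketch; for example, they compare records against reference gazetteers, which the sketch does not attempt.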

CoordinateCleaner can be particularly useful to improve data quality when using data from GBIF (e.g. obtained with rgbif) or the Paleobiology Database (e.g. obtained with paleobioDB) for historical biogeography (e.g. with BioGeoBEARS or phytools), automated conservation assessment (e.g. with speciesgeocodeR or conR) or species distribution modelling (e.g. with dismo or sdm). See scrubr and taxize for complementary taxonomic cleaning or biogeo for correcting spatial coordinate errors. A detailed comparison of the functionality of CoordinateCleaner, scrubr, and biogeo is available in the online documentation.

See News for update information.

Installation

Stable from CRAN

install.packages("CoordinateCleaner")
library(CoordinateCleaner)

Development version using devtools

devtools::install_github("ropensci/CoordinateCleaner")
library(CoordinateCleaner)

Usage

A simple example:

# Simulate example data
minages <- runif(250, 0, 65)
exmpl <- data.frame(species = sample(letters, size = 250, replace = TRUE),
                    decimallongitude = runif(250, min = 42, max = 51),
                    decimallatitude = runif(250, min = -26, max = -11),
                    min_ma = minages,
                    max_ma = minages + runif(250, 0.1, 65),
                    dataset = "clean")
 
# Run record-level tests
rl <- clean_coordinates(x = exmpl)
summary(rl)
plot(rl)
 
# Run dataset-level tests
dsl <- clean_dataset(exmpl)

# Run fossil-specific tests
fl <- clean_fossils(x = exmpl,
                    taxon = "species",
                    lon = "decimallongitude",
                    lat = "decimallatitude")
summary(fl)

# Alternative example using the pipe
library(dplyr)  # provides %>%; library(tidyverse) works as well

cl <- exmpl %>%
  cc_val() %>%
  cc_cap() %>%
  cd_ddmm() %>%
  cf_range(lon = "decimallongitude",
           lat = "decimallatitude",
           taxon = "species")
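To drop flagged records after the record-level tests, one option is to subset on the summary flag. The sketch below assumes, following the package documentation, that clean_coordinates() returns the input with one logical column per test plus a `.summary` column, where TRUE marks records that passed every test. The small `rl_demo` data frame, with its `.val` and `.zer` columns, is a hand-made stand-in for that output so the snippet runs without the package; with real output, `rl` would take its place.

```r
# Hand-made stand-in for the output of clean_coordinates()
# (assumed layout: one logical column per test, TRUE = passed).
rl_demo <- data.frame(
  species = c("a", "b", "c"),
  decimallongitude = c(45.1, 0, 47.3),
  decimallatitude = c(-20.4, 0, -15.8),
  .val = c(TRUE, TRUE, TRUE),
  .zer = c(TRUE, FALSE, TRUE),
  check.names = FALSE
)
# A record passes overall only if it passed every individual test
rl_demo$.summary <- rl_demo$.val & rl_demo$.zer

cleaned <- rl_demo[rl_demo$.summary, ]   # records that passed all tests
flagged <- rl_demo[!rl_demo$.summary, ]  # records to inspect or exclude
```

Keeping the flagged records in a separate object, rather than discarding them right away, makes it easier to review what was removed and why.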

Documentation

Pipelines for cleaning data from the Global Biodiversity Information Facility (GBIF) and the Paleobiology Database (PaleobioDB) are available in the online documentation.

Contributing

See the CONTRIBUTING document.

Citation

Zizka A, Silvestro D, Andermann T, Azevedo J, Duarte Ritter C, Edler D, Farooq H, Herdean A, Ariza M, Scharn R, Svanteson S, Wengstrom N, Zizka V & Antonelli A (2019) CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution, doi:10.1111/2041-210X.13152, https://github.com/ropensci/CoordinateCleaner

News

CoordinateCleaner 2.0-11 (2019-04-24)

MINOR IMPROVEMENTS

  • changes to the description file

CoordinateCleaner 2.0-10 (2019-04-23)

MINOR IMPROVEMENTS

  • improved error handling in cc_sea and cc_urb when the default reference cannot be obtained from the web
  • added a reference for the methodology to the description file

CoordinateCleaner 2.0-9 (2019-04-02)

MINOR IMPROVEMENTS

  • recoded cc_outl, and added a thinning argument to account for sampling bias
  • fixed a bug in the cc_outl test that produced erroneous flags under some settings of mltpl
  • extended the example dataset for the coordinate-level test suite to be more realistic

CoordinateCleaner 2.0-8 (2019-03-21)

MINOR IMPROVEMENTS

  • moved vignettes to online documentation
  • added an area column to the countryref dataset
  • fixed some minor spelling issues in the documentation

CoordinateCleaner 2.0-7 (2019-01-22)

MINOR IMPROVEMENTS

  • added citation
  • reduced testing time on CRAN
  • improved documentation of the cc_outl function

CoordinateCleaner 2.0-6 (2019-01-16)

MINOR IMPROVEMENTS

  • further url fixes

CoordinateCleaner 2.0-5 (2019-01-15)

MINOR IMPROVEMENTS

  • fixed broken url to the CIA factbook

CoordinateCleaner 2.0-4 (2019-01-14)

MINOR IMPROVEMENTS

  • minor bugfix with cc_cap
  • corrected duplicated vignette index entries
  • updated maintainer email

CoordinateCleaner 2.0-3 (2018-10-22)

MINOR IMPROVEMENTS

  • removed convenience functionality to only download data from rnaturalearth at first use, to comply with CRAN guidelines

CoordinateCleaner 2.0-2 (2018-10-12)

MAJOR IMPROVEMENTS

  • tutorial on outlier detection on the bookdown documentation
  • tutorial on using custom gazetteers
  • rasterisation heuristic in cc_outl
  • added sampling correction to cc_outl
  • added verify option to cc_inst
  • transfer to rOpenSci

MINOR IMPROVEMENTS

  • reduced package size by switching to data download from rnaturalearth for urbanareas and landmass
  • fixed issue with names of plot.spatialvalid
  • grouped functions on documentation webpage
  • fixed broken links in the help pages
  • improved documentation structure

CoordinateCleaner 2.0-1 (2018-06-08)

MAJOR IMPROVEMENTS

  • changed to a more consistent naming scheme for the functions

MINOR IMPROVEMENTS

  • fixed typos in Readme
  • set a download from naturalearth as default for cc_urb
  • reduced vignette memory use and size
  • enabled sf format for custom references
  • added speedup option for cc_sea
  • added webpage (https://azizka.github.io/CoordinateCleaner/)

CoordinateCleaner 1.2-1 (2018-06-08)

MAJOR IMPROVEMENTS

  • Adapted function and argument names consistently to underscore_case
  • Simplified internal code structure of wrapper functions

MINOR IMPROVEMENTS

  • adapted package to rOpenSci reviews

DEPRECATED AND DEFUNCT

  • CleanCoordinates deprecated, replaced by clean_coordinates
  • CleanCoordinatesDS deprecated, replaced by clean_dataset
  • CleanCoordinatesFOS deprecated, replaced by clean_fossils
  • WritePyrate deprecated, replaced by write_pyrate

CoordinateCleaner 1.1-1 (2018-05-15)

MINOR IMPROVEMENTS

  • Switched documentation and NAMESPACE generation to roxygen2
  • Switched from sapply to vapply
  • Improved code readability

CoordinateCleaner 1.1-0 (2018-04-08)

MINOR IMPROVEMENTS

  • Adaption of code to rOpenSci guidelines

Version 2.0-11 by Alexander Zizka


https://ropensci.github.io/CoordinateCleaner/


Report a bug at https://github.com/ropensci/CoordinateCleaner/issues


Browse source code at https://github.com/cran/CoordinateCleaner


Authors: Alexander Zizka [aut, cre], Daniele Silvestro [ctb], Tobias Andermann [ctb], Josue Azevedo [ctb], Camila Duarte Ritter [ctb], Daniel Edler [ctb], Harith Farooq [ctb], Andrei Herdean [ctb], Maria Ariza [ctb], Ruud Scharn [ctb], Sten Svanteson [ctb], Niklas Wengstrom [ctb], Vera Zizka [ctb], Alexandre Antonelli [ctb], Irene Steves [rev] (Irene reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>), Francisco Rodriguez-Sanchez [rev] (Francisco reviewed the package for ropensci, see <https://github.com/ropensci/onboarding/issues/210>)



GPL-3 license


Imports dplyr, geosphere, ggplot2, graphics, methods, raster, rgbif, rgeos, rgdal, rnaturalearth, stats, sp, tidyselect, utils

Suggests covr, knitr, maps, rmarkdown, testthat

System requirements: GDAL (>= 2.0.1)

