Global Surface Summary of the Day ('GSOD') Weather Data Client

Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day ('GSOD') weather data from the USA National Centers for Environmental Information ('NCEI') for use in R. Units are converted from United States Customary System ('USCS') units to International System of Units ('SI'). Stations may be individually checked against a user-defined maximum number of missing days; stations with too many missing observations are omitted. Only stations with valid reported latitude and longitude values are permitted in the final data. Additional useful elements, saturation vapour pressure ('es'), actual vapour pressure ('ea') and relative humidity, are calculated from the original data and included in the final data set. The resulting data include station identification information, state, country, latitude, longitude, elevation, weather observations and associated flags. Additional data are included with this R package: a list of elevation values for stations between -60 and 60 degrees latitude derived from the Shuttle Radar Topography Mission ('SRTM'). For information on the 'GSOD' data from 'NCEI', please see the 'GSOD' 'readme.txt' file available from <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.
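The missing-days check described above can be pictured with a small base R sketch. This is not the package's internal code: the helper name omit_sparse_stations(), the toy data and the fixed 365-day year are all assumptions made for illustration.

```r
# Hypothetical daily records: station "A" reports every day of 2010,
# station "B" reports only the first 300 days.
dates <- seq(as.Date("2010-01-01"), by = "day", length.out = 365)
obs <- data.frame(
  STNID = c(rep("A", 365), rep("B", 300)),
  YEARMODA = c(dates, dates[1:300]),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of GSODR): drop every station that is
# missing more than `max_missing` days in a year of `days_in_year` days.
omit_sparse_stations <- function(obs, max_missing, days_in_year = 365) {
  n_obs <- lengths(split(obs$YEARMODA, obs$STNID))  # days reported per station
  missing_days <- days_in_year - n_obs
  keep <- names(missing_days)[missing_days <= max_missing]
  obs[obs$STNID %in% keep, , drop = FALSE]
}

filtered <- omit_sparse_stations(obs, max_missing = 5)
```

With a threshold of 5 days only station "A" survives; raising the threshold to 65 keeps both stations.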



The Global Surface Summary of the Day (GSOD) data provided by the US National Centers for Environmental Information (NCEI) are a valuable source of weather data with global coverage. However, the data files are cumbersome and difficult to work with. GSODR aims to make it easy to find, transfer and format the data you need for use in analysis, and provides five main functions to facilitate this:

  • get_GSOD() - this function queries and transfers files from the NCEI's FTP server, reformats them and returns a tidy data frame in R. NOTE: The file exporting capabilities available in versions prior to 1.2.0 have been removed in the latest version. This means fewer dependencies when installing. Examples of how to export the data are found in the GSODR vignette.

  • reformat_GSOD() - this function takes individual station files from the local disk and re-formats them returning a tidy data frame in R

  • nearest_stations() - this function returns a vector of station IDs that fall within the given radius (kilometres) of a point given as latitude and longitude

  • update_station_list() - this function downloads the latest station list from the NCEI's FTP server and updates the package's internal database of stations and their metadata.

  • get_inventory() - this function downloads the latest station inventory information from the NCEI's FTP server and returns the header information about the latest version as a message in the console and a tidy data frame of the stations' inventory for each month that data are reported.
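To make the idea behind a radius search such as nearest_stations() concrete, here is a minimal base R sketch using the haversine formula. It is only an illustration under stated assumptions: the station table, its IDs and coordinates are made up, the Earth is treated as a sphere of radius 6371 km, and the helper names are hypothetical rather than anything GSODR exports.

```r
# Great-circle distance in kilometres between one point and one or more
# other points, using the haversine formula on a spherical Earth.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical station table; IDs and coordinates are invented.
stations <- data.frame(
  STNID = c("000001-99999", "000002-99999", "000003-99999"),
  LAT = c(-27.55, -27.40, -33.95),
  LON = c(151.95, 153.12, 151.18),
  stringsAsFactors = FALSE
)

# Illustrative helper: IDs of stations within `distance` km of a point.
stations_within <- function(stations, lat, lon, distance) {
  d <- haversine_km(lat, lon, stations$LAT, stations$LON)
  stations$STNID[d <= distance]
}

nearby <- stations_within(stations, lat = -27.56, lon = 151.95, distance = 150)
```

The result is a character vector of station IDs, which is the same shape of output the function list above describes for nearest_stations().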

When reformatting data with either get_GSOD() or reformat_GSOD(), all units are converted to International System of Units (SI), e.g., inches to millimetres and Fahrenheit to Celsius. Output is returned as a tibble(), summarising each year by station. It also includes elements that this R package calculates from the original GSOD data: actual and saturation vapour pressure (ea and es) and relative humidity.
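The conversions and derived elements can be sketched in a few lines of base R. The function names are hypothetical, and the vapour pressure formula is the widely used Tetens approximation (in kPa); the exact equation and coefficients that GSODR applies are not stated here, so treat that choice as an assumption.

```r
# Unit conversions named in the text above.
f_to_c <- function(f) (f - 32) * 5 / 9      # Fahrenheit to Celsius
in_to_mm <- function(inches) inches * 25.4  # inches to millimetres

# Saturation vapour pressure (kPa) at a temperature in degrees Celsius,
# using the Tetens approximation (an assumption, see the lead-in).
sat_vp_kpa <- function(temp_c) {
  0.61078 * exp(17.27 * temp_c / (temp_c + 237.3))
}

# Relative humidity (%) from air temperature and dew point (both in
# Celsius): actual vapour pressure (ea) is the saturation vapour
# pressure evaluated at the dew point.
rel_humidity <- function(temp_c, dewp_c) {
  es <- sat_vp_kpa(temp_c)
  ea <- sat_vp_kpa(dewp_c)
  100 * ea / es
}

# Example with made-up readings of 77 F air temperature and 59 F dew point.
temp_c <- f_to_c(77)
dewp_c <- f_to_c(59)
rh <- rel_humidity(temp_c, dewp_c)
```

When the dew point equals the air temperature, ea equals es and the relative humidity is 100%, which is a quick sanity check on any implementation.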

For more information see the description of the data provided by NCEI, http://www7.ncdc.noaa.gov/CDO/GSOD_DESC.txt.

Quick Start Install

Stable Version

A stable version of GSODR is available from CRAN.

install.packages("GSODR")

Development Version

A development version is available from GitHub. If you wish to install the development version, which may have new features or bug fixes before the CRAN version does (but also may not work properly), please install the remotes package, available from CRAN. We strive to keep the master branch on GitHub functional and working properly.

if (!require("remotes")) {
  install.packages("remotes", repos = "http://cran.rstudio.com/")
  library("remotes")
}
 
install_github("ropensci/GSODR")

Other Sources of Weather Data in R

There are several other sources of weather data and ways of retrieving them through R. Several are also rOpenSci projects.

GSODTools by Florian Detsch is an R package that offers functionality similar to GSODR, but can also graph the data and work with the data for time series analysis.

The gsod package from DataBrew aims to streamline the way that researchers and data scientists interact with and utilize weather data. It relies on GSODR, but provides data in the package rather than downloading them, so it is faster (though the available data may be out of date).

rnoaa, from rOpenSci, offers tools for interacting with and downloading weather data from the United States National Oceanic and Atmospheric Administration, but lacks support for GSOD data.

bomrang, from rOpenSci, provides functions to interface with Australian Government Bureau of Meteorology (BoM) data, fetching current and historical data including précis and marine forecasts, current weather data from stations, agriculture bulletin data, BoM 0900 or 1500 weather bulletins and satellite and radar imagery.

riem, from rOpenSci, retrieves weather data from Automated Surface Observing System (ASOS) stations (airports) around the world via the Iowa Environmental Mesonet website.

weathercan, from rOpenSci, makes it easier to search for and download multiple months or years of historical weather data from the Environment and Climate Change Canada (ECCC) website.

clifro, from rOpenSci, provides access to CliFlo, a web portal to the New Zealand National Climate Database that provides public access (via subscription) to around 6,500 various climate stations (see https://cliflo.niwa.co.nz/ for more information). Collating and manipulating data from CliFlo (hence clifro) and importing into R for further analysis, exploration and visualisation is now straightforward and coherent. The user is required to have an Internet connection, and a current CliFlo subscription (free) if data from stations other than the public Reefton electronic weather station are sought.

weatherData provides a selection of functions to fetch weather data from Weather Underground and return it as a clean data frame.

Notes

Other Data Sources

Elevation Values

90 m hole-filled SRTM digital elevation data (Jarvis et al. 2008) were used to identify and correct or remove elevation errors in data for station locations between -60° and 60° latitude. This also applies to cases where elevation was missing in the reported values. Where a station reports an elevation and the DEM has no value, the station-reported value is used. For stations beyond -60° and 60° latitude, the values are station-reported values in every instance. See https://github.com/ropensci/GSODR/blob/master/data-raw/fetch_isd-history.md for more detail on the correction methods.
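The fallback rule in this paragraph can be sketched as follows. This is a deliberate simplification, not the correction procedure documented at the link above: the column names, the toy values and the inclusive latitude bounds are assumptions, and the real method involves more than a simple choice between two values.

```r
# Hypothetical stations with a station-reported elevation and a
# DEM-derived elevation (NA where either source has no value).
stations <- data.frame(
  STNID = c("A", "B", "C", "D"),
  LAT = c(10, 45, 75, 20),
  ELEV_REPORTED = c(120, NA, 30, 50),
  ELEV_DEM = c(118, 510, NA, NA),
  stringsAsFactors = FALSE
)

# Between -60 and 60 degrees latitude, prefer the DEM value when one
# exists; otherwise, and everywhere beyond that band, keep the
# station-reported value.
in_dem_range <- stations$LAT >= -60 & stations$LAT <= 60
use_dem <- in_dem_range & !is.na(stations$ELEV_DEM)
stations$ELEV_FINAL <- ifelse(use_dem, stations$ELEV_DEM, stations$ELEV_REPORTED)
```

Station "B" shows the case of a missing reported elevation filled from the DEM, while "C" and "D" fall back to the reported value.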

WMO Resolution 40. NOAA Policy

Users of these data should take into account the following (from the NCEI website):

"The following data and products may have conditions placed on their international commercial use. They can be used within the U.S. or for non-commercial international activities without restriction. The non-U.S. data cannot be redistributed for commercial purposes. Re-distribution of these data by others must provide this same notification." WMO Resolution 40. NOAA Policy

Meta

  • Please report any issues or bugs.

  • License: MIT

  • To cite GSODR, please use: Adam H Sparks, Tomislav Hengl and Andrew Nelson (2017). GSODR: Global Summary Daily Weather Data in R. The Journal of Open Source Software, 2(10). DOI: 10.21105/joss.00177.

  • Please note that the GSODR project is released with a Contributor Code of Conduct. By participating in the GSODR project you agree to abide by its terms.

References

Jarvis, A., Reuter, H. I., Nelson, A., Guevara, E. (2008) Hole-filled SRTM for the globe Version 4, available from the CGIAR-CSI SRTM 90m Database (http://srtm.csi.cgiar.org)


News

GSODR 1.3.2

Bug fixes

  • Fixes a bug where extra data could be appended to dataframe. See https://github.com/ropensci/GSODR/issues/49. This also means that when you are retrieving large amounts of data, e.g. global data for 20+ years, you won't fill up your hard disk space due to the raw data before processing.

Minor changes

  • Update internal database of station locations

GSODR 1.3.1

Bug fixes

  • Fix examples that did not run properly

Minor changes

  • Update internal database of station locations

GSODR 1.3.0

New Functionality

  • Use future_apply in processing files after downloading. This allows for end users to use a parallel process of their choice.

GSODR 1.2.3

Bug fixes

  • Refactor internal functionality to be more clear and efficient in execution

    • country-list is not loaded unless user has specified a country in get_GSOD()

    • An instance where the FIPS code was determined twice was removed

  • Replace \dontrun{} with \donttest{} in documentation examples

  • Ensure that DESCRIPTION file follows CRAN guidelines

Minor changes

  • Format help files, fixing errors and formatting for attractiveness

  • Update internal database of station locations

  • Store internal database of station locations fields BEGIN and END as integer, not double

  • Clarify code of conduct statement in README that it only applies to this, GSODR, project

  • Prompt user for input with warning about reproducibility if using the update_station_list() function

  • Adds metadata header to the tibble returned by get_inventory()

  • Remove startup message to conform with rOpenSci guidelines

  • Remove extra code, clean up code-chunks and use hrbrthemes::theme_ipsum() for data-raw/fetch_isd-history.md


GSODR 1.2.2

Bug fixes

  • Fix bug in creating isd-history.rda file where duplicate stations existed in the file distributed with GSODR but with different corrected elevation values

  • Repatch bug reported and fixed previously in version 1.2.0 where Windows users could not successfully download files. This somehow snuck back in.

Minor changes

  • Refactor vignettes for clarity

GSODR 1.2.1

Bug fixes

  • Introduce a message if a station ID is requested but files are not found on the server. This is in response to an inquiry from John Paul Bigouette where a station is reported as having data in the inventory but the files do not exist on the server.

  • Fix bug that removed a few hundred stations from the internal GSODR database of stations in the data-raw files.

Minor changes

  • Clean documentation, shortening long lines, fixing formatting, incomplete sentences and broken links

  • Clarify the reasons for errors that a user may encounter

  • Update internal databases of station metadata

  • Clean up this file

GSODR 1.2.0

Major changes

  • Remove ability to export files from get_GSOD() to slim down the package dependencies and this function's parameters. Examples of how to convert to a spatial object (both sp and sf are shown) and export ESRI Shapefiles and GeoPackage files are now included in the vignette.

  • As a result of the previous point, the sp and rgdal packages are no longer Imports but are now in Suggests along with sf for examples in the GSOD vignette.

Bug fixes

  • Fix a nasty bug where GSOD files downloaded using Windows would not untar properly. This caused the get_GSOD() function to fail. Thanks to Ross Darnell, CSIRO, for reporting this.

  • Correct options in "GSODR use case: Specified years/stations vignette" on line 201 where file was incorrectly used in place of path. Thanks to Ross Darnell, CSIRO, for reporting this.

  • Correct documentation for reformat_GSOD()

Minor changes

  • Update internal databases of station metadata

  • Vignettes contain pre-built figures for faster package installation when building vignettes


GSODR 1.1.2

Bug fixes

  • Fix startup message formatting

  • Correct ORCID comment in author field of DESCRIPTION

  • Update internal databases for country list and isd_history

Minor changes

  • Add X-schema tags to DESCRIPTION

GSODR 1.1.1

Bug fixes

  • MAX_FLAG and MIN_FLAG columns now report NA when there is no flag

Minor changes

  • Comment for Bob and Hugh in DESCRIPTION now only ORCID url

  • dplyr version set to >= 0.7.0 not 0.7 as before

  • Start-up message statement is more clear in relation to WMO resolution 40, that GSODR does not redistribute any weather data itself

  • Remove unnecessary function, .onLoad(), from zzz.R

  • Function titles in documentation now in title case

  • Correct grammar in documentation


GSODR 1.1.0

Bug fixes

Major changes

  • The data.table and fields packages are no longer imported. All internal functions now use dplyr or base R functionality, reducing the dependencies of GSODR

  • Any data frames returned by GSODR functions are returned as a tibble() object

  • The YEARMODA column is now returned as Date without time, rather than Character

  • Add new function, get_inventory(), which downloads the NCEI's station inventory document and returns a tibble() object of the data

  • Use larger images and provide a table of contents in vignettes

  • Updated and enhanced introductory vignette

  • Update internal stations list


GSODR 1.0.7

Bug fixes

  • Fix documentation in vignette where first example would not run due to changes in package data formats

  • Fix bug in GSODR vignette where examples would not run due to libraries not being loaded

  • Fix bug where prior server queries would be pre/appended to subsequent queries

  • Fix bug where invalid stations would return an empty data frame. The function now stops and returns a message about checking the station value supplied to get_GSOD() and whether data are available for the years requested

Minor changes

  • Update Appendix 2 of GSODR vignette, map of station locations, to be more clear and follow same format as that of bomrang package

  • Update example output in GSODR vignette where applicable

Major changes

  • Update internal stations list

GSODR 1.0.6

Bug fixes

  • Fix bug where WSPD (mean windspeed) conversion was miscalculated

GSODR 1.0.5

Major changes

  • Add welcome message on startup regarding data use and sharing

  • Update internal stations list

Minor changes

  • Tidy up informative messages that the package returns while running

Bug fixes

  • Fix bug where "Error in read_connection_(con):" occurs when writing to CSV

  • Fix typo in line 160 of get_GSOD() where "Rda" should be "rda" to properly load internal package files

GSODR 1.0.4

Major changes

  • Data distributed with GSODR are now internal to the package and not externally exposed to the user

  • Vignettes have been updated and improved with an improved order of information presented and some have been combined for easier use

Minor changes

  • Clean code using linting

GSODR 1.0.3

Major changes

  • Data for station locations and unique identifiers is now provided with the package on installation. Previously this was fetched each time from the ftp server.

  • The station metadata can now be updated if necessary by using update_station_list(). This change overwrites the internal data that were originally distributed with the package. This operation will fetch the latest list of stations and corresponding information from the NCEI ftp server. Any changes will be overwritten when the R package is updated. However, the package update should have the same or newer data included, so this should not be an issue.

  • Replace plyr functions with purrr, plyr is no longer actively developed

  • plyr is no longer an import

Minor changes

  • Fix bugs in the vignettes related to formatting and spelling

Deprecated and defunct

  • get_station_list() is no longer supported. Instead use the new update_station_list() to update the package's internal station database.


GSODR 1.0.2.1

Minor changes

  • Correct references to GSODRdata package where incorrectly referred to as GSODdata

GSODR 1.0.2

Minor changes

  • Improved documentation (i.e., spelling corrections and more descriptive)

  • More descriptive vignette for "GSODR use case: Specified years/stations vignette"

  • Round MAX/MIN temp to one decimal place, not two

  • Update SRTM elevation data

  • Update country list data

  • Fix missing images in README.html on CRAN


GSODR 1.0.1

Minor changes

  • Update documentation for get_GSOD() when using station parameter

  • Edit paper.md for submission to JOSS

  • Remove extra packages listed as dependencies that are no longer necessary

  • Correct Working_with_spatial_and_climate_data.Rmd where it was missing the first portion of documentation and thus examples did not work


GSODR 1.0.0

Major changes

  • The get_GSOD() function returns a data.frame object in the current R session with the option to save data to local disk

  • Multiple stations can be specified for download rather than just downloading a single station or all stations

  • A new function, nearest_stations() is now included to find stations within a user specified radius (in kilometres) of a point given as latitude and longitude in decimal degrees

  • A general use vignette is now included

  • New vignette with a detailed use-case

  • Output files now include fields for State (US only) and Call (International Civil Aviation Organization (ICAO) Airport Code)

  • Use FIPS codes in place of ISO3c for file name and in output files because some stations do not have an ISO country code

  • Spatial file output is now in GeoPackage format (GPKG). This results in a single file output unlike shapefile and allows for long field names

  • Users can specify file name of output

  • R >= 3.2.0 now required

  • Field names in output files use "_" in place of "."

  • Long field names now used in file outputs

  • Country is specified using FIPS codes in file name and output file contents due to stations occurring in some locales that lack ISO 3166 3 letter country codes

  • The get_GSOD() function will retrieve the latest station data from NCDC and automatically merge it with the CGIAR-CSI SRTM elevation values provided by this package. Previously, the package provided its own list of station information, which was difficult to keep up-to-date

  • A new reformat_GSOD() function reformats station files in "WMO-WBAN-YYYY.op.gz" format that have been downloaded from the United States National Climatic Data Center's (NCDC) FTP server.

  • A new function, get_station_list() allows for fetching latest station list from the FTP server and querying by the user for a specified station or location.

  • New data layers are provided through a separate package, GSODRdata, which provide climate data formatted for use with GSODR.

  • Improved file handling for individual station downloads

  • Missing values are handled as NA not -9999

  • Change from GPL >= 3 to MIT licence to bring into line with ropensci packages

  • Now included in ropensci, ropensci/GSODR

Minor changes

  • get_GSOD() function optimised for speed as best possible after FTPing files from NCDC server

  • All files are downloaded from server and then locally processed, previously these were sequentially downloaded by year and then processed

  • A progress bar is now shown when processing files locally after downloading

  • Reduced package dependencies

  • The get_GSOD() function now checks stations to see if the years being queried are provided and returns a message alerting user if the station and years requested are not available

  • When stations are specified for retrieval using the station = "" parameter, the get_GSOD() function now checks to see if the file exists on the server, if it does not, a message is returned and all other stations that have files are processed and returned in output

  • Documentation has been improved throughout package

  • Better testing of internal functions

Bug Fixes

  • Fixed: Remove redundant code in get_GSOD() function

  • Fixed: The stations data frame distributed with the package now includes stations that are located above 60 latitude and below -60 latitude

Deprecated and defunct

  • Missing values are reported as NA for use in R, not -9999 as previously

  • The path parameter is now instead called dsn to be more in line with other tools like readOGR() and writeOGR()

  • Shapefile file out is no longer supported. Use GeoPackage (GPKG) instead

  • Removing stations with too many missing days is now optional. The default is to include all stations; the user must specify how many missing days to check for and exclude

  • The max_missing parameter is now user set, defaults to no check, return all stations regardless of missing days


GSODR 0.1.9

Bug Fixes

  • Fix bug in precipitation calculation. Documentation states that PRCP is in mm to hundredths. Issues with conversion and missing values meant that this was not the case. Thanks to Gwenael Giboire for reporting and help with fixing this

Minor changes

  • Users can now select to merge output for station queries across multiple years. Previously one year = one file per station. Now, when the user sets the merge_station_years = TRUE parameter, only one output file is generated

  • Country list is now included in the package to reduce run time necessary when querying for a specific country. However, this means any time that the country-list.txt file is updated, this package needs to be updated as well

  • Updated stations list with latest version from NCDC published 12-07-2016


  • Country level, agroclimatology and global data query conversions and calculations are processed in parallel now to reduce runtime

  • Improved documentation with spelling fixes, clarification and updates

  • Enable ByteCompile option upon installation for small increase in speed

  • Use write.csv.raw from iotools (https://cran.r-project.org/web/packages/iotools/index.html) to greatly improve runtime by decreasing time used to write CSV files to disk

  • Use writeOGR() from rgdal, in place of raster's shapefile to improve runtime by decreasing time used to write shapefiles to disk




GSODR 0.1.8

Bug Fixes

  • Fix bug with connection timing out for single station queries commit: a126641e00dc7acc21844ff0436e5702f8b6e04a

  • Somehow the previously working function that checked country names broke with the toupper() function. A new function from juba fixes this issue and users can now select country again

  • User entered values for a single station are now checked against actual station values for validity

  • stations.rda is compressed

  • stations.rda now includes a field for "corrected" elevation using hole-filled SRTM data from Jarvis et al. 2008, see https://github.com/ropensci/GSODR/blob/master/data-raw/fetch_isd-history.md for a description

  • Set NA or missing values in CSV or shapefile to -9999 from -9999.99 to align with other data sources such as Worldclim

Minor changes

  • Documentation is more complete and easier to use

GSODR 0.1.7

Bug Fixes

  • Fix issues with MIN/MAX where MIN referred to MAX (Issue 5)

  • Fix bug where the tf item was incorrectly set as tf <- "~/tmp/GSOD-2010.tar", not tf <- tempfile, in get_GSOD() (Issue 6)

  • CITATION file is updated and corrected

Minor changes

  • User now has the ability to generate a shapefile as well as CSV file output (Issue 3)

  • Documentation is more complete and easier to use


GSODR 0.1.6

Bug Fixes

  • Fix issue when reading .op files into R where temperature was incorrectly read causing negative values where T >= 100F, this issue caused RH values of 100% and incorrect TEMP values (Issue 1)

  • Spelling corrections

Major changes

  • Include MIN/MAX flag column

  • Station data are now included in the package rather than downloaded from NCDC every time get_GSOD() is run. These data have some corrections: stations with missing LAT/LON values or elevation are omitted, so this is not the original complete station list provided by NCDC


GSODR 0.1.5

Bug Fixes

  • Fixed bug where YDAY not correctly calculated and reported in CSV file

  • CSV files for station-only queries are now named with the Station Identifier. Previously they were named the same as global data

  • Likewise, CSV files for agroclimatology are now named with the Station Identifier. Previously they were named the same as global data

Minor Changes

  • Set values where MIN > MAX to NA

  • Set more MIN/MAX/DEWP values to NA. The GSOD README states that 999 indicates missing values in these columns, but this does not appear to always be true. There are instances where 99 is the value recorded for missing data. While 99F is possible, the vast majority of these recorded values are missing data, thus the function now converts them to NA


GSODR 0.1.4

Bug Fixes

  • Fixed bug related to MIN/MAX columns when agroclimatology or all stations are selected where flags were not removed properly from numeric values.

Minor Changes


GSODR 0.1.3

Bug fixes

  • Bug fix in MIN/MAX with flags. Some columns have differing widths, which caused a flag to be left attached to some values

  • Correct URL in README.md for CRAN to point to CRAN not GitHub

Minor Changes

  • Set NA to -9999.99

GSODR 0.1.2

Bug Fixes

  • Bug fix in importing isd-history.csv file. Previous issues caused all lat/lon/elev values to be >0.

  • Bug fix where WDSP was mistyped as WDPS causing the creation of a new column, rather than the conversion of the existing

  • Bug fix if Agroclimatology selected. Previously this resulted in no records.

  • Set the default encoding to UTF8.

  • Bug fix for country selection. Some countries did not return proper ISO code.


  • Use write.csv, not readr::write_csv due to issue converting double to string: https://github.com/hadley/readr/issues/387


GSODR 0.1.1

Major changes

  • Now available on CRAN

  • Add single quotes around possibly misspelled words and spell out comma-separated values and geographic information system rather than just using "CSV" or "GIS" in DESCRIPTION.

  • Add full name of GSOD (Global Surface Summary of the Day) and URL for GSOD, https://data.noaa.gov/dataset/dataset/global-surface-summary-of-the-day-gsod/ to DESCRIPTION as requested by CRAN.

  • Require user to specify directory for resulting .csv file output so that any files written to disk are interactive and with user's permission


GSODR 0.1

  • Initial submission to CRAN


1.3.2 by Adam Sparks, 6 months ago


https://github.com/ropensci/GSODR, https://ropensci.github.io/GSODR/


Report a bug at https://github.com/ropensci/GSODR/issues


Browse source code at https://github.com/cran/GSODR


Authors: Adam Sparks [aut, cre] , Tomislav Hengl [aut] , Andrew Nelson [aut] , Hugh Parsonage [cph, ctb] , Bob Rudis [cph, ctb] , Gwenael Giboire [ctb] (Several bug reports in early versions and testing feedback) , Łukasz Pawlik [ctb] (Reported bug in windspeed conversion calculation) , Ross Darnell [ctb] (Reported bug in 'Windows OS' versions causing 'GSOD' data untarring to fail)




Task views: Hydrological Data and Modeling


MIT + file LICENSE license


Imports curl, dplyr, future.apply, magrittr, purrr, R.utils, readr, rlang, stats, tibble, utils

Suggests covr, future, ggplot2, ggthemes, gridExtra, knitr, lubridate, mapproj, maps, plotKML, raster, reshape2, rgdal, rgeos, rmarkdown, roxygen2, sf, sp, spacetime, testthat, tidyr

