Global Summary Daily Weather Data in R

Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (GSOD) weather data from the USA National Centers for Environmental Information (NCEI) for use in R. Units are converted from United States Customary System (USCS) units to International System of Units (SI). Stations may be individually checked against a user-defined maximum number of missing days; stations with too many missing observations are omitted. Only stations with valid reported latitude and longitude values are retained in the final data. Additional useful elements, saturation vapour pressure (es), actual vapour pressure (ea) and relative humidity (RH), are calculated from the original data and included in the final data set. The resulting data include station identification information, state, country, latitude, longitude, elevation, weather observations and associated flags. Data may be automatically saved to disk as a comma-separated values (CSV) or GeoPackage (GPKG) file. Additional data are included with this R package: a list of elevation values for stations between -60 and 60 degrees latitude derived from the Shuttle Radar Topography Mission (SRTM). For information on the GSOD data from NCEI, please see the GSOD readme.txt file available from <http://www1.ncdc.noaa.gov/pub/data/gsod/readme.txt>.
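The per-station missing-day check can be illustrated with a short base R sketch. The helper name filter_missing_days() and its max_missing argument are hypothetical and not part of the GSODR API; the sketch assumes one row per station per observed day, with STNID and YEARMODA columns as listed under Output, and is not the package's own implementation.

```r
# Hypothetical helper, not part of GSODR: drop stations with more than
# `max_missing` days absent from a single year of daily observations.
filter_missing_days <- function(obs, year, max_missing = 5) {
  # Number of days in the calendar year (366 in leap years)
  first_day <- as.Date(sprintf("%d-01-01", year))
  last_day  <- as.Date(sprintf("%d-12-31", year))
  n_days <- as.integer(last_day - first_day) + 1L

  # Count distinct observation dates for each station
  days_per_stn <- tapply(obs$YEARMODA, obs$STNID,
                         function(d) length(unique(d)))

  # Keep stations whose missing-day count is within the allowed maximum
  keep <- names(days_per_stn)[n_days - days_per_stn <= max_missing]
  obs[obs$STNID %in% keep, , drop = FALSE]
}

# Example: station "A" reports every day of 2010, station "B" misses 7 days
dates <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
obs <- data.frame(STNID = c(rep("A", 365), rep("B", 358)),
                  YEARMODA = c(dates, dates[1:358]),
                  stringsAsFactors = FALSE)
kept <- filter_missing_days(obs, year = 2010, max_missing = 5)
unique(kept$STNID)
```

Counting distinct dates rather than rows keeps the check robust if a station happens to have duplicated records for a day.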


An R package that provides a function that automates downloading and cleaning data from the "Global Surface Summary of the Day (GSOD)" data set provided by the US National Climatic Data Center (NCDC). Stations are individually checked for the number of missing days to assure data quality; stations with too many missing observations are omitted. All units are converted to metric, e.g., inches to millimetres and Fahrenheit to Celsius. Output is saved as a comma-separated values (CSV) file or ESRI format shapefile summarizing each year by station, and includes vapour pressure and relative humidity variables calculated from existing data in GSOD.
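The unit conversions can be sketched as simple vectorised R functions. This assumes the raw units given in the GSOD readme (temperatures in degrees Fahrenheit, precipitation and snow depth in inches, visibility in statute miles, wind speeds in knots); the function names are hypothetical and are not GSODR functions.

```r
# Hypothetical helpers (not GSODR functions) for the USCS-to-SI conversions
# applied to GSOD fields; results are rounded as described under Output.
f_to_c      <- function(x) (x - 32) * 5 / 9    # degrees Fahrenheit -> Celsius
in_to_mm    <- function(x) x * 25.4            # inches -> millimetres
knots_to_ms <- function(x) x * 1852 / 3600     # knots -> metres per second
miles_to_km <- function(x) x * 1.609344        # statute miles -> kilometres

# Example: 100.4 F mean temperature, 0.25 in precipitation, 10 knot wind
round(f_to_c(100.4), 1)      # TEMP to tenths
round(in_to_mm(0.25), 2)     # PRCP to hundredths
round(knots_to_ms(10), 1)    # WDSP to tenths
```

Using the exact definitions (1 knot = 1852 m per hour, 1 mile = 1.609344 km) avoids accumulating rounding error before the final rounding step.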

This package was largely based on Tomislav Hengl's work in "A Practical Guide to Geostatistical Mapping", with updates for speed, cross-platform functionality, and more options for data retrieval and error correction.

Be sure to have sufficient free disk space and allow adequate time for this to run; it is a time-, RAM- and disk-space-intensive process. For any query of GSOD data other than a single station, the cleaning and reformatting steps run in parallel.

For more information see the description of the data provided by NCDC, http://www7.ncdc.noaa.gov/CDO/GSOD_DESC.txt.

A stable release of GSODR is available on CRAN.

install.packages("GSODR", dependencies = TRUE)

If you wish to install the development version, which may have new features (but may also not work properly), install the devtools package from CRAN and use it to install from GitHub:

install.packages("devtools", dependencies = TRUE)
devtools::install_github("adamhsparks/GSODR")

Output

This package consists of a single function, get_GSOD(), which generates a .csv file or ESRI format shapefile in the respective year directory containing the following variables:
STNID - Station number (WMO/DATSAV3 number) for the location;
WBAN - Historical "Weather Bureau Air Force Navy" (WBAN) number, where applicable;
STN.NAME - Unique text identifier;
CTRY - Country;
LAT - Latitude. Station dropped in cases where values are <-90 or >90 degrees or Lat = 0 and Lon = 0;
LON - Longitude. Station dropped in cases where values are <-180 or >180 degrees or Lat = 0 and Lon = 0;
ELEV.M - Elevation converted to metres;
ELEV.M.SRTM.90m - Elevation in metres corrected for possible errors, see Notes for more;
YEARMODA - Date in YYYY-MM-DD format;
YEAR - The year;
MONTH - The month;
DAY - The day;
YDAY - Sequential day of year (not in original GSOD);
TEMP - Mean daily temperature converted to degrees C to tenths. Missing = -9999;
TEMP.CNT - Number of observations used in calculating mean daily temperature;
DEWP - Mean daily dewpoint converted to degrees C to tenths. Missing = -9999;
DEWP.CNT - Number of observations used in calculating mean daily dew point;
SLP - Mean sea level pressure in millibars to tenths. Missing = -9999;
SLP.CNT - Number of observations used in calculating mean sea level pressure;
STP - Mean station pressure for the day in millibars to tenths. Missing = -9999;
STP.CNT - Number of observations used in calculating mean station pressure;
VISIB - Mean visibility for the day converted to kilometres to tenths. Missing = -9999;
VISIB.CNT - Number of observations used in calculating mean daily visibility;
WDSP - Mean daily wind speed value converted to metres/second to tenths. Missing = -9999;
WDSP.CNT - Number of observations used in calculating mean daily windspeed;
MXSPD - Maximum sustained wind speed reported for the day converted to metres/second to tenths. Missing = -9999;
GUST - Maximum wind gust reported for the day converted to metres/second to tenths. Missing = -9999;
MAX - Maximum temperature reported during the day converted to Celsius to tenths--time of max temp report varies by country and region, so this will sometimes not be the max for the calendar day. Missing = -9999;
MAX.FLAG - Blank indicates max temp was taken from the explicit max temp report and not from the 'hourly' data. * indicates max temp was derived from the hourly data (i.e., highest hourly or synoptic-reported temperature);
MIN - Minimum temperature reported during the day converted to Celsius to tenths; time of min temp report varies by country and region, so this will sometimes not be the min for the calendar day. Missing = -9999;
MIN.FLAG - Blank indicates min temp was taken from the explicit min temp report and not from the 'hourly' data. * indicates min temp was derived from the hourly data (i.e., lowest hourly or synoptic-reported temperature);
PRCP - Total precipitation (rain and/or melted snow) reported during the day converted to millimetres to hundredths; will usually not end with the midnight observation, i.e., may include latter part of previous day. .00 indicates no measurable precipitation (includes a trace). Missing = -9999. Note: Many stations do not report '0' on days with no precipitation; therefore, '-9999' will often appear on these days. For example, a station may only report a 6-hour amount for the period during which rain fell. See PRCP.FLAG column for source of data;
PRCP.FLAG -
A = 1 report of 6-hour precipitation amount;
B = Summation of 2 reports of 6-hour precipitation amount;
C = Summation of 3 reports of 6-hour precipitation amount;
D = Summation of 4 reports of 6-hour precipitation amount;
E = 1 report of 12-hour precipitation amount;
F = Summation of 2 reports of 12-hour precipitation amount;
G = 1 report of 24-hour precipitation amount;
H = Station reported '0' as the amount for the day (e.g., from 6-hour reports), but also reported at least one occurrence of precipitation in hourly observations; this could indicate a trace occurred, but should be considered as incomplete data for the day;
I = Station did not report any precip data for the day and did not report any occurrences of precipitation in its hourly observations--it's still possible that precip occurred but was not reported;
SNDP - Snow depth in millimetres to tenths. Missing = -9999;
I.FOG - Indicator for fog, (1 = yes, 0 = no/not reported) for the occurrence during the day;
I.RAIN_DZL - Indicator for rain or drizzle, (1 = yes, 0 = no/not reported) for the occurrence during the day;
I.SNW_ICE - Indicator for snow or ice pellets, (1 = yes, 0 = no/not reported) for the occurrence during the day;
I.HAIL - Indicator for hail, (1 = yes, 0 = no/not reported) for the occurrence during the day;
I.THUNDER - Indicator for thunder, (1 = yes, 0 = no/not reported) for the occurrence during the day;
I.TDO_FNL - Indicator for tornado or funnel cloud, (1 = yes, 0 = no/not reported) for the occurrence during the day;
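Several of the rules in the list above (the -9999 missing-value code, the LAT/LON drop conditions and the YDAY field) can be illustrated with a base R sketch. The helper names sentinel_to_na(), valid_coords() and yday() are hypothetical, not GSODR functions; the sample stations are made up for illustration, and the sketch assumes a data frame with the column names listed above, for example a CSV output file read back into R.

```r
# Hypothetical post-processing helpers (not part of GSODR) illustrating
# the rules listed above for missing values, coordinates and day of year.

# Replace the -9999 missing-value code with NA in every numeric column
sentinel_to_na <- function(df, sentinel = -9999) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(col) {
    col[!is.na(col) & col == sentinel] <- NA
    col
  })
  df
}

# Keep rows with plausible coordinates: LAT in [-90, 90], LON in [-180, 180],
# and not the Lat = 0 and Lon = 0 placeholder location
valid_coords <- function(df) {
  ok <- !is.na(df$LAT) & !is.na(df$LON) &
    df$LAT >= -90 & df$LAT <= 90 &
    df$LON >= -180 & df$LON <= 180 &
    !(df$LAT == 0 & df$LON == 0)
  df[ok, , drop = FALSE]
}

# Sequential day of year (YDAY), where 1 = 1 January
yday <- function(date) as.POSIXlt(as.Date(date))$yday + 1L

stations <- data.frame(STNID = c("a", "b", "c", "d"),
                       LAT = c(10, 95, 0, -27.5),
                       LON = c(20, 10, 0, 151.9),
                       TEMP = c(-9999, 25.1, 3, 20.4),
                       YEARMODA = as.Date(c("2010-01-01", "2010-03-01",
                                            "2010-07-04", "2010-12-31")),
                       stringsAsFactors = FALSE)
clean <- valid_coords(sentinel_to_na(stations))
clean$YDAY <- yday(clean$YEARMODA)
clean
```

Station "b" is dropped for a latitude above 90 degrees and station "c" for the 0, 0 placeholder; station "a" is kept with its TEMP set to NA.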

ea - Mean daily actual vapour pressure;
es - Mean daily saturation vapour pressure;
RH - Mean daily relative humidity;
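A common way to derive these three variables is to evaluate a saturation vapour pressure curve at the mean temperature (es) and at the dew point (ea), then take their ratio as RH in percent. The sketch below assumes a Tetens-type (Magnus) formula with kPa output; the constants and the function names sat_vp() and humidity() are assumptions of this example, not confirmed as GSODR's exact implementation.

```r
# Sketch only: Tetens-type (Magnus) approximation for vapour pressure in kPa,
# with air temperature in degrees C. The constants are an assumption of this
# example, not confirmed as those used by GSODR.
sat_vp <- function(t_c) 0.61078 * exp((17.2694 * t_c) / (t_c + 237.3))

humidity <- function(temp_c, dewp_c) {
  es <- sat_vp(temp_c)   # saturation vapour pressure at mean air temperature
  ea <- sat_vp(dewp_c)   # actual vapour pressure, i.e. es at the dew point
  data.frame(es = es, ea = ea, RH = 100 * ea / es)  # RH in percent
}

# Example: mean temperature 25.0 C with dew point 15.0 C
humidity(temp_c = 25.0, dewp_c = 15.0)
```

Because ea is the same curve evaluated at the dew point, RH reaches 100% exactly when DEWP equals TEMP, and NA inputs propagate to NA outputs.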

90 m hole-filled SRTM digital elevation data (Jarvis et al. 2008) were used to identify and correct/remove elevation errors for station locations between -60° and 60° latitude. This also applies to cases where elevation was missing in the reported values. Where the station reported an elevation and the DEM does not provide one, the station-reported value is used. For stations beyond -60° and 60° latitude, the values are station-reported values in every instance. See https://github.com/adamhsparks/GSODR/blob/devel/data-raw/fetch_isd-history.md for more detail on the correction methods.
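The selection rule for ELEV.M.SRTM.90m can be sketched in R as below. This is a simplified, hypothetical illustration (choose_elevation() is not a GSODR function): it assumes a DEM-derived value has already been extracted for each station and only shows which value is kept; the actual correction procedure in the linked document may compare the reported and DEM values in more detail.

```r
# Simplified, hypothetical version of the elevation-selection rule above
# (not GSODR code): inside -60..60 degrees latitude prefer the DEM-derived
# value when it exists, otherwise fall back to the station-reported value.
choose_elevation <- function(lat, elev_station, elev_dem) {
  in_srtm <- !is.na(lat) & lat >= -60 & lat <= 60
  ifelse(in_srtm & !is.na(elev_dem), elev_dem, elev_station)
}

choose_elevation(lat          = c(-27.5, 70, 10, 15),
                 elev_station = c(690,  50, NA, 100),
                 elev_dem     = c(691,  55, 120, NA))
```

The second station lies outside SRTM coverage and keeps its reported value; the third had no reported elevation and is filled from the DEM; the fourth has no DEM value and keeps the station value.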

Users of these data should take into account the following (from the NCDC website):

"The following data and products may have conditions placed on their international commercial use. They can be used within the U.S. or for non-commercial international activities without restriction. The non-U.S. data cannot be redistributed for commercial purposes. Re-distribution of these data by others must provide this same notification." WMO Resolution 40. NOAA Policy

library(GSODR)

# Download weather station data for Toowoomba, Queensland for 2010 and save
# the resulting file, GSOD-955510-99999-2010.csv, in the user's home directory.
get_GSOD(years = 2010, station = "955510-99999", path = "~/")

# Download global GSOD data for agroclimatology work for years 2010 and 2011
# and generate yearly summary files, GSOD-agroclimatology-2010.csv and
# GSOD-agroclimatology-2011.csv, in the user's home directory with a maximum
# of five missing days per weather station allowed.
get_GSOD(years = 2010:2011, path = "~/", agroclimatology = TRUE)

# Download data for the Philippines for 2010 and generate a yearly
# summary file, GSOD-PHL-2010.csv, in the user's home directory with a
# maximum of five missing days per station allowed.
get_GSOD(years = 2010, country = "Philippines", path = "~/")

References

Jarvis, A., H.I. Reuter, A. Nelson and E. Guevara. 2008. Hole-filled SRTM for the globe Version 4. Available from the CGIAR-CSI SRTM 90m Database (http://srtm.csi.cgiar.org).

News

GSODR v0.1.9

  • Fix bug in precipitation calculation. Documentation states that PRCP is in mm to hundredths. Issues with conversion and missing values meant that this was not the case. Thanks to Gwenael Giboire for reporting and help with fixing this
  • Users can now choose to merge output for station queries across multiple years. Previously, one year = one file per station. Now, when the user sets merge_station_years = TRUE, only one output file is generated
  • Country list is now included in the package to reduce run time necessary when querying for a specific country. However, this means any time that the country-list.txt file is updated, this package needs to be updated as well
  • Updated stations list with latest version from NCDC published 12-07-2016
  • Country level, agroclimatology and global data query conversions and calculations are processed in parallel now to reduce runtime
  • Improved documentation with spelling fixes, clarification and updates
  • Enable ByteCompile option upon installation for small increase in speed
  • Use write.csv.raw from iotools (https://cran.r-project.org/web/packages/iotools/index.html) to greatly improve runtime by decreasing time used to write CSV files to disk
  • Use writeOGR from rgdal, in place of raster's shapefile to improve runtime by decreasing time used to write shapefiles to disk

GSODR v0.1.8.1 (Release Date: 2016-07-07)

  • Fix bug where no station is specified, function fails to run

GSODR v0.1.8 (Release Date: 2016-07-04)

  • Fix bug with connection timing out for single station queries commit: a126641e00dc7acc21844ff0436e5702f8b6e04a
  • Somehow the previously working function that checked country names broke with the toupper() function. A new function from juba fixes this issue and users can now select country again
  • User entered values for a single station are now checked against actual station values for validity
  • stations.rda is compressed
  • stations.rda now includes a field for "corrected" elevation using hole-filled SRTM data from Jarvis et al. 2008; see https://github.com/adamhsparks/GSODR/blob/devel/data-raw/fetch_isd-history.md for a description
  • Set NA or missing values in CSV or shapefile to -9999 from -9999.99 to align with other data sources such as Worldclim
  • Documentation is more complete and easier to use

GSODR v0.1.7 (Release Date: 2016-06-02)

  • Fix issues with MIN/MAX where MIN referred to MAX (Issue 5)
  • Fix bug where the tf item was incorrectly set as tf <- "~/tmp/GSOD-2010.tar", not tf <- tempfile(), in get_GSOD (Issue 6)
  • CITATION file is updated and corrected
  • User now has the ability to generate a shapefile as well as CSV file output (Issue 3)
  • Documentation is more complete and easier to use

GSODR v0.1.6 (Release date: 2016-05-26)

  • Fix issue when reading .op files into R where temperature was incorrectly read, causing negative values where T >= 100F; this issue caused RH values of >100% and incorrect TEMP values (Issue 1)
  • Spelling corrections
  • Include MIN/MAX flag column
  • Station data are now included in the package rather than downloaded from NCDC every time get_GSOD() is run. These data include some corrections: stations with missing LAT/LON values or elevation are omitted, so this is not the original complete station list provided by NCDC

GSODR v0.1.5 (Release date: 2016-05-16)

  • Fixed bug where YDAY was not correctly calculated and reported in CSV file
  • CSV files for station-only queries are now named with the Station Identifier. Previously named the same as global data
  • Likewise, CSV files for agroclimatology are now named with the Station Identifier. Previously named the same as global data
  • Set values where MIN > MAX to NA
  • Set more MIN/MAX/DEWP values to NA. The GSOD README indicates that 999 marks missing values in these columns, but this does not always appear to be true; there are instances where 99 is the value recorded for missing data. While 99F is possible, the vast majority of these recorded values are missing data, so the function now converts them to NA

GSODR v0.1.4 (Release date: 2016-05-09)

  • Fixed bug related to MIN/MAX columns when agroclimatology or all stations are selected where flags were not removed properly from numeric values.
  • Add more detail to DESCRIPTION regarding flags found in original GSOD data.

GSODR v0.1.3 (Release date: 2016-05-06)

  • Bug fix in MIN/MAX with flags. Some columns have differing widths, which caused a flag to be left attached to some values
  • Correct URL in README.md for CRAN to point to CRAN not GitHub
  • Set NA to -9999.99

GSODR v0.1.2 (Release date: 2016-05-05)

  • Bug fix in importing isd-history.csv file. Previous issues caused all lat/lon/elev values to be >0.
  • Bug fix where WDSP was mistyped as WDPS, causing the creation of a new column rather than the conversion of the existing one
  • Bug fix if Agroclimatology selected. Previously this resulted in no records.
  • Set the default encoding to UTF8.
  • Bug fix for country selection. Some countries did not return proper ISO code.
  • Use write.csv, not readr::write_csv due to issue converting double to string: https://github.com/hadley/readr/issues/387

GSODR v0.1.1 (Release date: 2016-04-21)

  • Now available on CRAN
  • Add single quotes around possibly misspelled words and spell out comma-separated values and geographic information system rather than just using "CSV" or "GIS" in DESCRIPTION.
  • Add full name of GSOD (Global Surface Summary of the Day) and URL for GSOD, https://data.noaa.gov/dataset/global-surface-summary-of-the-day-gsod to DESCRIPTION as requested by CRAN.
  • Require user to specify directory for resulting .csv file output so that any files are written to disk interactively and with the user's permission

GSODR v0.1 (Release date: 2016-04-18)

  • Initial submission to CRAN


1.0.5 by Adam Sparks


https://github.com/ropensci/GSODR


Report a bug at https://github.com/ropensci/GSODR/issues


Browse source code at https://github.com/cran/GSODR


Authors: Adam Sparks [aut, cre] (http://orcid.org/0000-0002-0061-8359), Tomislav Hengl [aut] (http://orcid.org/0000-0002-9921-5129), Andrew Nelson [aut] (http://orcid.org/0000-0002-7249-3778)




MIT + file LICENSE license


Imports curl, data.table, dplyr, fields, magrittr, purrr, R.utils, readr, rgdal, sp, stats, utils

Suggests ggplot2, knitr, lubridate, plotKML, raster, reshape2, rgeos, rmarkdown, roxygen2, spacetime, testthat, tibble, tidyr, covr

