Extract and Tidy Canadian 'Hydrometric' Data

Provides functions to access historical and real-time national 'hydrometric' data from Water Survey of Canada data sources (<http://dd.weather.gc.ca/hydrometric/csv/> and <http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.


Project Status

This package is maintained by the Knowledge Management Branch of the British Columbia Ministry of Environment and Climate Change Strategy.

What does tidyhydat do?

  • Provides functions (hy_*) that access hydrometric data from the HYDAT database, a national archive of Canadian hydrometric data, and return tidy data.
  • Provides functions (realtime_*) that access Environment and Climate Change Canada’s real-time hydrometric data source.
  • Provides functions (search_*) that search through the approximately 7000 stations in the database and help generate station vectors.
  • Keeps functions as simple as possible. For example, hy_daily_flows() queries the database, tidies the data and returns a tibble of daily flows.

Installation

You can install tidyhydat from CRAN:

install.packages("tidyhydat")

To install the development version of the tidyhydat package, you need to install the remotes package and then the tidyhydat package:

if(!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("ropensci/tidyhydat")

Usage

A more thorough vignette can be found on the tidyhydat CRAN page.

When you install tidyhydat, several other packages will be installed as well. One of those packages, dplyr, is useful for data manipulation and is used regularly here. Installing tidyhydat does not attach dplyr, so it must be loaded separately. A helpful dplyr tutorial can be found here.

library(tidyhydat)
library(dplyr)

HYDAT download

To use many of the functions in the tidyhydat package you will need to download a version of the HYDAT database, Environment and Climate Change Canada’s database of historical hydrometric data, and then tell R where to find it. Conveniently, tidyhydat does all this for you via:

download_hydat()

This downloads (with your permission) the most recent version of HYDAT and saves it in a location on your computer where tidyhydat’s functions will look for it. Do be patient though, as this takes a long time! To see where HYDAT was saved you can run hy_dir(). Now that you have HYDAT downloaded and ready to go, you are all set to begin looking at Canadian hydrometric data.

Most functions in tidyhydat follow a common argument structure. We will use the hy_daily_flows() function for the following examples though the same approach applies to most functions in the package (See help(package = "tidyhydat") for a list of exported objects). Much of the functionality of tidyhydat originates with the choice of hydrometric stations that you are interested in. A user will often find themselves creating vectors of station numbers. There are several ways to do this.

The simplest case is when you would like to extract only one station. You can supply its number directly to the station_number argument:

hy_daily_flows(station_number = "08LA001")
#> No start and end dates specified. All dates available will be returned.
#> All station successfully retrieved
#> # A tibble: 29,159 x 5
#>    STATION_NUMBER Date       Parameter Value Symbol
#>    <chr>          <date>     <chr>     <dbl> <chr> 
#>  1 08LA001        1914-01-01 Flow        144 <NA>  
#>  2 08LA001        1914-01-02 Flow        144 <NA>  
#>  3 08LA001        1914-01-03 Flow        144 <NA>  
#>  4 08LA001        1914-01-04 Flow        140 <NA>  
#>  5 08LA001        1914-01-05 Flow        140 <NA>  
#>  6 08LA001        1914-01-06 Flow        136 <NA>  
#>  7 08LA001        1914-01-07 Flow        136 <NA>  
#>  8 08LA001        1914-01-08 Flow        140 <NA>  
#>  9 08LA001        1914-01-09 Flow        140 <NA>  
#> 10 08LA001        1914-01-10 Flow        140 <NA>  
#> # ... with 29,149 more rows
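
A full station record can span many decades. The changelog further down mentions start_date and end_date arguments that narrow a query; as a rough, package-independent sketch of the same idea, the snippet below trims a small table to a date window using base R only. The table reuses the ten rows printed above, while the object names (flows, window_start, window_end, flows_window) and the chosen window are assumptions made for illustration, not part of tidyhydat.

```r
# Toy table with the same columns as the hy_daily_flows() output above.
# Values are the ten rows printed above; nothing is read from HYDAT here.
flows <- data.frame(
  STATION_NUMBER = "08LA001",
  Date = as.Date("1914-01-01") + 0:9,
  Parameter = "Flow",
  Value = c(144, 144, 144, 140, 140, 136, 136, 140, 140, 140),
  Symbol = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical window of interest (inclusive on both ends)
window_start <- as.Date("1914-01-04")
window_end   <- as.Date("1914-01-07")

# Keep only the rows whose Date falls inside the window
in_window <- flows$Date >= window_start & flows$Date <= window_end
flows_window <- flows[in_window, ]

print(flows_window)
```

Because the Date column is a true date type rather than text, ordinary comparison operators are enough to select a period.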

Another method is to use hy_stations() to generate your vector, which is then passed to the station_number argument. For example, we could take a subset of only the active stations within Prince Edward Island (province code: PE) and then create a vector which is passed to the multi-parameter function hy_daily(). This function queries the flow, level, sediment load and suspended sediment concentration tables and combines them (if present) into one dataframe:

PEI_stns <- hy_stations() %>%
  filter(HYD_STATUS == "ACTIVE") %>%
  filter(PROV_TERR_STATE_LOC == "PE") %>%
  pull_station_number()
#> All station successfully retrieved
 
PEI_stns
#> [1] "01CA003" "01CB002" "01CB004" "01CC002" "01CC005" "01CC010" "01CD005"
 
hy_daily(station_number = PEI_stns)
#> # A tibble: 123,225 x 5
#>    STATION_NUMBER Date       Parameter Value Symbol
#>    <chr>          <date>     <chr>     <dbl> <chr> 
#>  1 01CA003        1961-08-01 Flow         NA <NA>  
#>  2 01CA003        1961-08-02 Flow         NA <NA>  
#>  3 01CA003        1961-08-03 Flow         NA <NA>  
#>  4 01CA003        1961-08-04 Flow         NA <NA>  
#>  5 01CA003        1961-08-05 Flow         NA <NA>  
#>  6 01CA003        1961-08-06 Flow         NA <NA>  
#>  7 01CA003        1961-08-07 Flow         NA <NA>  
#>  8 01CA003        1961-08-08 Flow         NA <NA>  
#>  9 01CA003        1961-08-09 Flow         NA <NA>  
#> 10 01CA003        1961-08-10 Flow         NA <NA>  
#> # ... with 123,215 more rows
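
Since a combined table can hold several parameters for several stations, and the rows shown above are all NA, a quick count of rows and missing values per station and parameter is often a useful first check. The following is a minimal base R sketch of that check on a hand-made table. The values in it are invented for illustration (only the column layout and the two station numbers come from the output above), and the names daily, key and summary_tbl are hypothetical.

```r
# Hand-made table in the same long layout as the hy_daily() output above.
# The values are invented for illustration; they are not HYDAT data.
daily <- data.frame(
  STATION_NUMBER = c("01CA003", "01CA003", "01CA003",
                     "01CB002", "01CB002", "01CB002"),
  Date = as.Date(c("1961-08-01", "1961-08-02", "1961-08-03",
                   "1961-08-01", "1961-08-02", "1961-08-03")),
  Parameter = c("Flow", "Flow", "Level", "Flow", "Level", "Level"),
  Value = c(NA, 1.2, 0.5, 3.4, NA, 0.9),
  stringsAsFactors = FALSE
)

# One grouping key per station/parameter combination
key <- paste(daily$STATION_NUMBER, daily$Parameter, sep = " / ")

# Count rows and missing values within each group
n_rows    <- tapply(daily$Value, key, length)
n_missing <- tapply(is.na(daily$Value), key, sum)

summary_tbl <- data.frame(
  group = names(n_rows),
  n_rows = as.vector(n_rows),
  n_missing = as.vector(n_missing),
  stringsAsFactors = FALSE
)

print(summary_tbl)
```

The same summary could be written with dplyr's group_by() and summarise(); base R is used here only so the sketch runs without any extra packages.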

We can also combine station selection and data extraction in a single pipe. For example, if for some reason we wanted daily flows for every station with “Canada” in its name, we could write:

search_stn_name("canada") %>%
  pull_station_number() %>%
  hy_daily_flows()
#> No start and end dates specified. All dates available will be returned.
#> The following station(s) were not retrieved: 07DB006
#> Check station number typos or if it is a valid station in the network
#> # A tibble: 77,044 x 5
#>    STATION_NUMBER Date       Parameter Value Symbol
#>    <chr>          <date>     <chr>     <dbl> <chr> 
#>  1 01AK001        1918-08-01 Flow      NA    <NA>  
#>  2 01AK001        1918-08-02 Flow      NA    <NA>  
#>  3 01AK001        1918-08-03 Flow      NA    <NA>  
#>  4 01AK001        1918-08-04 Flow      NA    <NA>  
#>  5 01AK001        1918-08-05 Flow      NA    <NA>  
#>  6 01AK001        1918-08-06 Flow      NA    <NA>  
#>  7 01AK001        1918-08-07 Flow       1.78 <NA>  
#>  8 01AK001        1918-08-08 Flow       1.78 <NA>  
#>  9 01AK001        1918-08-09 Flow       1.5  <NA>  
#> 10 01AK001        1918-08-10 Flow       1.78 <NA>  
#> # ... with 77,034 more rows

These examples illustrate a few ways that a vector of station numbers can be generated and supplied to functions within tidyhydat.
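
To make the idea of a station vector concrete, here is a small base R sketch of the underlying pattern: match station names case-insensitively, then keep each station number once. It runs on a made-up station table and is not how search_stn_name() or pull_station_number() are implemented. The station numbers, the names, the STATION_NAME column and the "DISCONTINUED" status value are all assumptions for illustration.

```r
# Made-up station table; numbers, names and statuses are hypothetical.
stations <- data.frame(
  STATION_NUMBER = c("99XX001", "99XX001", "99XX002", "99XX003"),
  STATION_NAME = c("EXAMPLE RIVER NEAR CANADA CREEK",
                   "EXAMPLE RIVER NEAR CANADA CREEK",
                   "SAMPLE BROOK AT BRIDGE",
                   "CANADA LAKE OUTLET"),
  HYD_STATUS = c("ACTIVE", "ACTIVE", "ACTIVE", "DISCONTINUED"),
  stringsAsFactors = FALSE
)

# Case-insensitive match on the station name
is_match <- grepl("canada", stations$STATION_NAME, ignore.case = TRUE)

# Keep each matching station number once, in order of first appearance
stn_vec <- unique(stations$STATION_NUMBER[is_match])

print(stn_vec)
```

A character vector like stn_vec has the shape that the station_number argument expects in the examples above.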

Real-time

To download real-time data using the datamart we can use approximately the same conventions discussed above. Using realtime_dd() we can easily select specific stations by supplying a station of interest:

realtime_dd(station_number = "08LG006")

Another option is to simply provide the province as an argument and download all stations from that province:

realtime_dd(prov_terr_state_loc = "PE")

A simple plotting tool is also provided to quickly visualize realtime data:

realtime_plot("08LG006")
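
Real-time records are reported at a finer resolution than the daily historical data, and the changelog below mentions a realtime_daily_mean function for collapsing them to daily means. The sketch below shows that kind of aggregation in base R on a hand-made table; it is not the package's implementation. The column layout, the timestamps and the values are assumptions for illustration, with times taken to be in UTC as the changelog states.

```r
# Hand-made sub-daily records; timestamps and values are invented.
realtime <- data.frame(
  STATION_NUMBER = "08LG006",
  Date = as.POSIXct(c("2018-01-01 00:00:00", "2018-01-01 12:00:00",
                      "2018-01-02 00:00:00", "2018-01-02 12:00:00"),
                    tz = "UTC"),
  Parameter = "Flow",
  Value = c(10, 12, 14, 18),
  stringsAsFactors = FALSE
)

# Calendar day of each record, evaluated in UTC
realtime$Day <- as.Date(realtime$Date, tz = "UTC")

# Mean value per station, parameter and day
daily_mean <- aggregate(Value ~ STATION_NUMBER + Parameter + Day,
                        data = realtime, FUN = mean)

print(daily_mean)
```

Computing the day in UTC keeps the grouping independent of the time zone of the machine running the code.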

Getting Help or Reporting an Issue

To report bugs/issues/feature requests, please file an issue.

These are very welcome!

How to Contribute

If you would like to contribute to the package, please see our CONTRIBUTING guidelines.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Citation

Get citation information for tidyhydat in R by running:

citation("tidyhydat")

License

Copyright 2017 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

News

tidyhydat 0.4.0

IMPROVEMENTS

  • All functions now return either "hy" or "realtime" class with associated print and plot methods (#119)
  • prov_terr_state_loc now accepts a "CA" value to specify only stations located in Canada (#112)
  • functions that access internet resources now fail with an informative error message (#116)
  • tests that require internet resources are skipped when internet is down
  • Add small join example to calculate runoff to introduction vignette (#120)

BUG FIXES

  • pull_station_number now only returns unique values (#109)
  • Adding an offset column that reflects OlsonNames() and is thus DST independent (#110)
  • Caught all R_CHECK_LENGTH_1_CONDITION instances

tidyhydat 0.3.5

IMPROVEMENTS

  • New function: realtime_add_local_datetime() adds a local datetime column to realtime_dd() tibble (#64)
  • New function: pull_station_number() wraps pull(STATION_NUMBER) for convenience

MINOR BREAKING CHANGES

  • In effort to standardize, the case of column names for some rarely used function outputs were changed to reflect more commonly used function outputs. This may impact some workflows where columns are referenced by names (#99).

BUG FIXES

  • Functions that have a start_date and end_date actually work with said argument (#98)
  • hy_annual_instant_peaks() now parses the date correctly into UTC and includes a datetime and time zone column. (#64)
  • hy_stn_data_range() now returns actual NA's rather than string NA's (#97)

MINOR IMPROVEMENT

  • download_hydat() now returns an informative error if the download fails due to proxy-related connection issues (@rywhale, #101).

tidyhydat 0.3.4

IMPROVEMENT

  • Added rlang as a dependency and applied the tidyeval idiom to more safely control variable environments
  • 15% speed improvement in realtime_dd by eliminating loop (#91)
  • 40% speed improvement when querying full provinces (#89)
  • reorganized file naming so that helper functions are placed in utils-* files

BUG FIXES

  • Fixed hy_monthly_flows and hy_monthly_levels date issue (#24)

MINOR IMPROVEMENT

  • realtime tidying now not duplicated and is handled by a function
  • simplified tidyhydat:::station_choice and added more unit testing
  • no longer outputting a message when station_number = "ALL".
  • Exporting pipe (%>%)

tidyhydat 0.3.3

NEW FEATURES

  • Open a connection to the HYDAT database directly using hy_src() for advanced functionality (PR#77).
  • New vignette outlining hy_src() (PR#77)
  • Add some tools to improve the usability of the test database (PR#77).
  • download_hydat() now uses httr::GET()

MINOR IMPROVEMENTS

  • Better downloading messages

BUG FIXES

  • Fixed package startup message so it can be suppressed. (#79)
  • Fixed bug where the download_hydat() choice wasn't respected.
  • onAttach() now checks 115 days after the last HYDAT release to prevent slow package load times if more than 3 months pass between HYDAT releases.
  • Fixed margin error in hy_plot()
  • Fixed a bug in realtime_plot() that prevented a lake level station from being called
  • Fixed a bug in hy_daily() that threw an error when only a level station was called
  • Added new tests for hy_daily() and realtime_plot()
  • Added HYD_STATUS and REAL_TIME columns to allstations.

tidyhydat 0.3.2

NEW FEATURES

  • New hy_daily() function which combines all daily data into one dataframe.
  • Add a quick base R plotting feature for quick visualization of realtime and historical data.
  • Add realtime_daily_mean function that quickly converts higher resolution data into daily means.
  • New vignette outlining some example usage.

BUG FIXES

  • Fixed bug in download_hydat() that created a path that wasn't OS-independent.
  • Fixed a bug in download_hydat() whereby R sometimes had trouble overwriting an existing version of the database. Now the old database is simply deleted before the new one is downloaded.
  • hy_annual_instant_peaks() now returns a date object with HOUR, MINUTE and TIME_ZONE returned as separate columns. (#10)
  • All variable values of LEVEL and FLOW have been changed to Level and Flow to match the output of hy_data_types. (#60)
  • Tidier and coloured error messages throughout.
  • Review field incorrectly specified the rOpenSci review page. Removed the link from the DESCRIPTION.

tidyhydat 0.3.1

NEW FEATURES

  • When package is loaded, tidyhydat checks to see if HYDAT is even present
  • When package is loaded, it now tests to see if there is a new version of HYDAT if the current date is more than 3 months after the last release date of HYDAT.
  • Prep for CRAN release
  • Starting to use raw SQL for table queries
  • Removing 2nd vignette from build. Still available on github

tidyhydat 0.3.0

NEW FEATURES

  • New NEWS template!
  • Moved station_number to first argument to facilitate piped analysis (#54)
  • search_stn_name and search_stn_number now query both realtime and historical data sources and have tests for a more complete list (#56)
  • With credential stored in .Renviron file, ws_token can successfully be called by ws_token().
  • .onAttach() checks if HYDAT is downloaded on package load.

MINOR IMPROVEMENTS

  • Significant function and argument name changes (see below)
  • Adding rappdirs to imports and using to generate download path for download_hydat() (#44)
  • Adding rappdirs so that all the hy_* functions access hydat from rappdirs::user_data_dir() via hy_dir() (#44)
  • Revised and cleaned up documentation including two vignettes (#48)
  • FULL MONTH now evaluates to a logical (#51)
  • All download tests are skipped on cran (#53)
  • Removed time limit for download_realtime_ws() with some documentation on actual limits. (3234c22)

BUG FIXES

  • Add informative error message for a single missing station input (#38)
  • No longer trying to build .Rd file for .onload (#47)
  • Fixed SED_MONTHLY_LOADS (#51)

FUNCTION NAME CHANGES (#45)

  • hy_agency_list <- AGENCY_LIST
  • hy_annual_instant_peaks <- ANNUAL_INSTANT_PEAKS
  • hy_annual_stats <- ANNUAL_STATISTICS
  • hy_daily_flows <- DLY_FLOWS
  • hy_daily_levels <- DLY_LEVELS
  • hy_monthly_flows <- MONTHLY_FLOWS
  • hy_monthly_levels <- MONTHLY_LEVELS
  • hy_sed_daily_loads <- SED_DLY_LOADS
  • hy_sed_daily_suscon <- SED_DLY_SUSCON
  • hy_sed_monthly_loads <- SED_MONTHLY_LOADS
  • hy_sed_monthly_suscon <- SED_MONTHLY_SUSCON
  • hy_sed_samples <- SED_SAMPLES
  • hy_sed_samples_psd <- SED_SAMPLES_PSD
  • hy_stations <- STATIONS
  • hy_stn_remarks <- STN_REMARKS
  • hy_stn_datum_conv <- STN_DATUM_CONVERSION
  • hy_stn_datum_unrelated <- STN_DATUM_UNRELATED
  • hy_stn_data_range <- STN_DATA_RANGE
  • hy_stn_data_coll <- STN_DATA_COLLECTION
  • hy_stn_op_schedule <- STN_OPERATION_SCHEDULE
  • hy_stn_regulation <- STN_REGULATION
  • hy_reg_office_list <- REGIONAL_OFFICE_LIST
  • hy_datum_list <- DATUM_LIST
  • hy_version <- VERSION
  • realtime_dd <- download_realtime_dd
  • realtime_stations <- realtime_network_meta
  • search_stn_name <- search_name
  • search_stn_number <- search_number

ARGUMENT NAME CHANGES (#45)

  • station_number <- STATION_NUMBER
  • prov_terr_state_loc <- PROV_TERR_STATE_LOC

tidyhydat 0.2.9

  • Explicitly state in docs that time is in UTC (#32)
  • Added test for realtime_network_meta and moved to httr to download.
  • download functions all use httr now
  • removed need for almost all @import statement by referencing them all directly (#34)
  • Fixed error message when directly calling some tidyhydat function using :: (#31)
  • To reduce overhead, output_symbol has been added as an argument so code can be produced if desired (#33)

tidyhydat 0.2.8

  • Added examples to every function
  • Completed test suite including download_realtime_ws (#27)
  • Fixed bugs in several STN_* functions
  • Added STN_DATUM_RELATED
  • Updated documentation

tidyhydat 0.2.7

  • Updated documentation
  • Updated README
  • Created a small database so that unit testing occurs remotely (#1)
  • Fixed STN_DATA_RANGE bug (#26)

tidyhydat 0.2.6

  • using styler package to format code to tidyverse style guide
  • added PROV_TERR_STATE_LOC to allstations
  • added search_number function
  • added MONTHLY functions
  • created function families
  • added on.exit() to internal code; a better way to disconnect
  • Updated documentation

tidyhydat 0.2.5

  • fixed minor bug in download_realtime_ws so that better error message is outputted when no data is returned

tidyhydat 0.2.4

  • download_realtime_dd can now accept stations from multiple provinces or simply select multiple provinces
  • better error messages for get_ws_token and download_realtime_ws
  • All functions that previously accepted STATION_NUMBER == "ALL" now throw an error.
  • Added function to download hydat

tidyhydat 0.2.3

  • Remove significant redundancy in station selecting mechanism
  • Added package startup message when HYDAT is out of date
  • Add internal allstations data
  • Added all the tables as functions or data from HYDAT
  • Made missing station output truncated at 10 missing stations

tidyhydat 0.2.2

  • Adding several new tables
  • removed need for both prov and stn args
  • reduced some repetition in code

tidyhydat 0.2.1

  • added STN_REGULATION
  • tidied ANNUAL_STATISTICS
  • added a series of lookup tables (DATUM_LIST, AGENCY_LIST, REGIONAL_OFFICE_LIST)
  • cleared up output of STATIONS

tidyhydat 0.2.0

  • standardize hydat outputs to consistent tibble structure
  • Adding search_name function
  • final names for download functions
  • functions output an information message about stations retrieved

tidyhydat 0.1.1

  • Renamed real-time function as download_realtime and download_realtime2
  • Added more unit tests
  • Wrote vignette for package utilization
  • Brought all data closer to a "tidy" state

tidyhydat 0.1.0

  • Added ability for STATIONS to retrieve ALL stations in the HYDAT database
  • Standardize documentation; remove hydat_path default
  • Better error handling for download_realtime
  • Update documentation
  • Adding param_id data, data-raw and documentation
  • Dates filter to ANNUAL_STATISTICS and DLY_FLOWS; func and docs
  • DLY_LEVELS function and docs
  • download_ws and get_ws_token function and docs
  • Update README

tidyhydat 0.0.4

  • Added ability for STATIONS to retrieve ALL stations in the HYDAT database
  • Standardize documentation; remove hydat_path default
  • Better error handling for download_realtime
  • Update documentation
  • Adding param_id data, data-raw and documentation
  • Dates filter to ANNUAL_STATISTICS and DLY_FLOWS; func and docs
  • DLY_LEVELS function and docs
  • download_ws and get_ws_token function and docs
  • Update README

tidyhydat 0.0.3

  • fixed db connection problem; more clear documentation
  • better error handling; more complete realtime documentation
  • harmonized README with standardized arguments

tidyhydat 0.0.2

  • Added example analysis to README
  • Added devex badge; license to all header; import whole readr package
  • Able to take other provinces than BC now
  • Update documentation; README

tidyhydat 0.0.1

  • Initial package commit
  • Add license and include bcgov files in RBuildIgnore
  • Two base working functions; package level R file and associated documentation
  • Only importing functions used in the function
  • Update README with example
  • Added download_ functions
  • Added ANNUAL_STATISTICS query/table and docs
  • Updated docs and made DLY_FLOWS more rigorous

0.4.0 by Sam Albers, 7 months ago


https://github.com/ropensci/tidyhydat


Report a bug at https://github.com/ropensci/tidyhydat/issues


Browse source code at https://github.com/cran/tidyhydat


Authors: Sam Albers [aut, cre], David Hutchinson [ctb], Dewey Dunnington [ctb], Ryan Whaley [ctb], Province of British Columbia [cph], Luke Winslow [rev] (Reviewed for rOpenSci), Laura DeCicco [rev] (Reviewed for rOpenSci)



Task views: Hydrological Data and Modeling


License: Apache License (== 2.0) | file LICENSE


Imports cli, crayon, DBI, dbplyr, dplyr, httr, lubridate, rappdirs, readr, rlang, RSQLite, tidyr

Suggests ggplot2, knitr, rmarkdown, testthat, covr

