Utility functions to retrieve data from the UK National River Flow Archive (<http://nrfa.ceh.ac.uk/>). The package contains R wrappers for the UK NRFA temporary data API. There are functions to retrieve stations falling within a bounding box, to generate a map, and to extract time series and general information.
The UK National River Flow Archive serves daily streamflow data, spatial rainfall averages and information regarding elevation, geology, land cover and FEH related catchment descriptors.
An API is currently under development that should in future provide access to the following services: (1) a metadata catalogue; (2) catalogue filters based on a geographical bounding box; (3) catalogue filters based on metadata entries; and (4) gauged daily data for about 400 stations in WaterML2 format, the OGC standard used to describe hydrological time series.
The first three services return information in JSON format, while the last returns WaterML2, which is an XML variant.
The rnrfa package aims to provide simpler and more efficient access to these data through wrapper functions that send HTTP requests and interpret the XML/JSON responses.
The rnrfa package depends on the GDAL library. Make sure it is installed on your system before attempting to install this package.
R package dependencies can be installed by running the following code:
# "parallel" is also required, but it ships with base R and is not installed from CRAN
install.packages(c("cowplot", "plyr", "httr", "xml2", "stringr", "xts", "rjson", "ggmap", "ggplot2", "sp", "rgdal"))
This demo also uses external libraries. To install and load them, run the following commands:
packs <- c("devtools", "DT", "leaflet")
install.packages(packs)
lapply(packs, require, character.only = TRUE)
The stable version of the rnrfa package is available from CRAN:
Or you can install the development version from Github with devtools:
Now, load the rnrfa package:
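The installation and loading steps above can be sketched as follows. The CRAN package name `rnrfa` is taken from the text. The GitHub repository path is not given in this document, so it is left as a commented placeholder rather than guessed.

```r
# Install the stable release from CRAN only if it is not already present
if (!requireNamespace("rnrfa", quietly = TRUE)) {
  install.packages("rnrfa")
}

# Development version (placeholder: the repository owner is not stated here)
# devtools::install_github("<owner>/rnrfa")

# Attach the package so that catalogue(), gdf(), cmr(), etc. are available
library(rnrfa)
```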
The catalogue() function retrieves the full list of monitoring stations from the NRFA catalogue. Called with no inputs, it requests the full list of gauging stations with their associated metadata. The output is a data frame containing one record for each station and one column for each available metadata entry.
# Retrieve information for all the stations in the catalogue:
allStations <- catalogue()
Those entries are briefly described as follows:
id= Station identification number
name= Name of the station
location= Area in which the station is located
river= River catchment
stationDescription= General station description, containing information on weirs, ratings, etc.
catchmentDescription= Information on topography, geology, land cover, etc.
hydrometricArea= UK hydrometric area identification number
operator= UK measuring authorities
haName= Hydrometric Area name
gridReference= OS Grid Reference number
stationType= Type of station (e.g. flume, weir, etc.)
catchmentArea= Catchment area (km^2)
gdfStart= Year in which recordings started
gdfEnd= Year in which recordings ended
farText= Information on the regime (e.g. natural, regulated, etc.)
categories= Various tags (e.g. FEH_POOLING, FEH_QMED, HIFLOWS_INCLUDED)
altitude= Altitude measured in metres above Ordnance Datum or, in Northern Ireland, Malin Head
sensitivity= Sensitivity index calculated as the percentage change in flow associated with a 10 mm increase in stage at the Q95 flow
lat= Numeric vector of latitude coordinates
lon= Numeric vector of longitude coordinates
The same function catalogue() can be used to filter stations based on a bounding box or any of the metadata entries.
# Define a bounding box:
bbox <- list(lonMin=-3.82, lonMax=-3.63, latMin=52.43, latMax=52.52)
# Filter stations based on bounding box
someStations <- catalogue(bbox)
# Filter stations belonging to a certain hydrometric area
someStations <- catalogue(columnName="haName", columnValue="Wye (Hereford)")
# Filter based on bounding box & metadata strings
someStations <- catalogue(bbox, columnName="haName", columnValue="Wye (Hereford)")
# Filter stations based on threshold
someStations <- catalogue(bbox, columnName="catchmentArea", columnValue=">1")
# Filter based on minimum recording years
someStations <- catalogue(bbox, columnName="catchmentArea", columnValue=">1", minRec=30)
# Filter stations based on identification number
someStations <- catalogue(columnName="id", columnValue=c(3001,3002,3003))
# Other combined filtering
someStations <- catalogue(bbox, columnName="id", columnValue=c(54022,54090,54091,54092,54097), minRec=35)
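To illustrate what a bounding-box filter combined with a threshold does, here is a minimal base-R sketch that runs without network access. It is not the way catalogue() is implemented. The helper `in_bbox()` and the toy data frame are hypothetical, and the station values are made up for illustration only.

```r
# Toy catalogue with made-up stations (illustrative values, not NRFA data)
toy <- data.frame(
  id = c(1, 2, 3, 4),
  lat = c(52.45, 52.60, 52.50, 52.44),
  lon = c(-3.70, -3.70, -3.90, -3.65),
  catchmentArea = c(10.4, 3.1, 27.2, 0.6)
)

# Same bounding box as in the example above
bbox <- list(lonMin = -3.82, lonMax = -3.63, latMin = 52.43, latMax = 52.52)

# Hypothetical helper: keep the rows whose coordinates fall inside the box
in_bbox <- function(df, bbox) {
  keep <- df$lon >= bbox$lonMin & df$lon <= bbox$lonMax &
    df$lat >= bbox$latMin & df$lat <= bbox$latMax
  df[keep, ]
}

# Stations inside the box (station 2 is too far north, station 3 too far west)
inside <- in_bbox(toy, bbox)

# Add a threshold on a metadata column, analogous to columnValue = ">1"
large <- inside[inside$catchmentArea > 1, ]
```

Filtering first on the box and then on the metadata column mirrors the combined calls shown above, where both conditions must hold for a station to be kept.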
The only geospatial information contained in the list of stations in the catalogue is the OS grid reference (column "gridReference"). The rnrfa package allows convenient conversion to more standard coordinate systems. The function osg_parse(), for example, converts the string to easting and northing in the BNG coordinate system (EPSG code: 27700), as in the example below:
# Where is the first catchment located?
someStations$gridReference
# Convert OS Grid reference to BNG
osg_parse("SN853872")
The same function can also convert the grid reference to latitude and longitude in the WGS84 coordinate system (EPSG code: 4326), as in the example below.
# Convert OS Grid reference to WGS84
osg_parse("SN853872", CoordSystem = "WGS84")
osg_parse() also works with multiple references:
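For instance, a character vector such as someStations$gridReference could presumably be passed in a single call. To make the underlying conversion concrete, the block below is a minimal base-R sketch of how grid references map to BNG eastings and northings. The helper `parse_grid_ref()` is hypothetical and is not how rnrfa implements osg_parse(). It only handles two-letter references with an even number of digits, and it returns the south-west corner of each grid square.

```r
# Hypothetical helper (not part of rnrfa): parse OS grid references by hand
parse_grid_ref <- function(ref) {
  ref <- toupper(gsub("\\s+", "", ref))
  # Grid letters run from A to Z without I, arranged on 5 x 5 blocks
  grid_letters <- LETTERS[LETTERS != "I"]
  l1 <- match(substr(ref, 1, 1), grid_letters) - 1
  l2 <- match(substr(ref, 2, 2), grid_letters) - 1
  # Column and row of the 100 km square, counted from the false origin
  e100 <- ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100 <- (19 - (l1 %/% 5) * 5) - (l2 %/% 5)
  # The remaining digits split evenly into easting and northing parts
  digits <- substring(ref, 3)
  half <- nchar(digits) / 2
  e_part <- as.numeric(substr(digits, 1, half))
  n_part <- as.numeric(substr(digits, half + 1, nchar(digits)))
  # Scale the digits to metres (3 digits per axis means 100 m resolution)
  scale <- 10^(5 - half)
  data.frame(easting = e100 * 1e5 + e_part * scale,
             northing = n100 * 1e5 + n_part * scale)
}

# Works on several references of different lengths at once
# SN853872 -> easting 285300, northing 287200
parse_grid_ref(c("SN853872", "TQ3080"))
```

Because every step is vectorised, references of different precision can be mixed in one call, which is the same convenience the text describes for osg_parse().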
The first column of the table "someStations" contains the station id number. This can be used to retrieve time series data and to convert WaterML2 files to time series objects (of class zoo).
The National River Flow Archive serves two types of time series data: gauged daily flow and catchment mean rainfall.
These time series can be obtained using the functions gdf() and cmr(), respectively. Both functions accept three inputs:
id, the station identification numbers (single string or character vector).
metadata, a logical variable (FALSE by default). If metadata is TRUE, the result for a single station is a list with two elements: data (the time series) and meta (metadata).
cl, a cluster object created by the parallel package. It is NULL by default, in which case calls are sent to the server sequentially.
Here is how to retrieve mean monthly rainfall data for the Shin at Lairg catchment (id = 3001).
# Fetch only time series data from the waterml2 service
info <- cmr(id = "3001")
plot(info)
# Fetch time series data and metadata from the waterml2 service
info <- cmr(id = "3001", metadata = TRUE)
plot(info$data, main=paste("Monthly rainfall data for the", info$meta$stationName, "catchment"), xlab="", ylab=info$meta$units)
Here is how to retrieve daily flow data for the Shin at Lairg catchment (id = 3001).
# Fetch only time series data from the waterml2 service
info <- gdf(id = "3001")
plot(info)
# Fetch time series data and metadata from the waterml2 service
info <- gdf(id = "3001", metadata = TRUE)
plot(info$data, main=paste("Daily flow data for the", info$meta$stationName, "catchment"), xlab="", ylab=info$meta$units)
By default, the functions gdf() and cmr() fetch time series data from multiple sites in sequential mode (using one core):
# Search data/metadata in the waterml2 service
s <- cmr(c(3002,3003), metadata = TRUE)
# s is a list of 2 objects (one object for each site)
plot(s[[1]]$data, main = paste(s[[1]]$meta$stationName, "and", s[[2]]$meta$stationName))
lines(s[[2]]$data, col="green")
Upgrade your data.frame to a data.table:
Create interactive maps using leaflet:
library(leaflet)
leaflet(data = someStations) %>%
  addTiles() %>%
  addMarkers(~lon, ~lat, popup = ~as.character(paste(id, name)))
Interactive plots using dygraphs:
library(dygraphs)
dygraph(info$data) %>% dyRangeSelector()
Sequential vs Concurrent requests: a simple benchmark test
library(parallel)
# Use detectCores() to find out how many cores are available on your machine
cl <- makeCluster(getOption("cl.cores", detectCores()))
# Filter all the stations within the above bounding box
someStations <- catalogue(bbox)
# Get flow data with a sequential approach
system.time( s1 <- gdf(someStations$id, cl = NULL) )
# Get flow data with a concurrent approach (using `parLapply()`)
system.time( s2 <- gdf(id = someStations$id, cl = cl) )
# Shut down the worker processes once the requests have completed
stopCluster(cl)
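The same comparison can be tried without contacting the NRFA server. The sketch below replaces the network request with a hypothetical function, `fake_fetch()`, that simply waits before returning a value. The function name, the waiting time and the use of two workers are assumptions made for illustration, and the timings will vary between machines.

```r
library(parallel)

# Hypothetical stand-in for a network request: wait, then return a value
fake_fetch <- function(id) {
  Sys.sleep(0.2)
  id * 2
}

ids <- 1:4

# Start two worker processes (assumed to be available on this machine)
cl <- makeCluster(2)

# Sequential: one request after the other
t_seq <- system.time(r_seq <- lapply(ids, fake_fetch))["elapsed"]

# Concurrent: requests are split across the workers with parLapply()
t_par <- system.time(r_par <- parLapply(cl, ids, fake_fetch))["elapsed"]

# Always release the workers when finished
stopCluster(cl)

# Both approaches should return the same values, only the timing differs
identical(r_seq, r_par)
c(sequential = unname(t_seq), concurrent = unname(t_par))
```

The gain from concurrency depends on how much of the elapsed time is spent waiting for responses, so with a small number of fast requests the cost of starting the cluster can outweigh the benefit.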
The measured flows are expected to increase with the catchment area. Let's show this simple regression on a plot:
# Calculate the mean flow for each catchment (ignoring missing values)
someStations$meangdf <- unlist( lapply(s2, mean, na.rm = TRUE) )
# Linear model
library(ggplot2)
ggplot(someStations, aes(x = as.numeric(catchmentArea), y = meangdf)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  xlab(expression(paste("Catchment area [Km^2]", sep=""))) +
  ylab(expression(paste("Mean flow [m^3/s]", sep="")))
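The regression drawn by stat_smooth(method = "lm") can also be fitted directly with lm() to inspect the slope and goodness of fit. The sketch below uses synthetic values, not NRFA data, purely to show the calls; the column names follow those used above.

```r
# Synthetic values for illustration only (not NRFA data)
toy <- data.frame(
  catchmentArea = c(10, 50, 120, 300, 650),  # km^2
  meangdf = c(0.4, 1.9, 4.1, 10.5, 21.8)     # m^3/s
)

# Fit mean flow as a linear function of catchment area
fit <- lm(meangdf ~ catchmentArea, data = toy)

# Intercept and slope (mean flow gained per additional km^2)
coef(fit)

# Proportion of variance explained by the linear model
summary(fit)$r.squared
```

A positive slope is what the text anticipates, since measured flows are expected to increase with catchment area.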
Please refer to the following Terms and Conditions for use of NRFA Data and disclaimer: http://nrfa.ceh.ac.uk/costs-terms-and-conditions
This package uses a non-public API which is likely to change. Package and functions herein are provided as is, without any guarantee.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Michael Spencer (contributor) updated the function OSGparse to work with grid references of different lengths.
Added testthat framework for unit tests