The Resource Description Framework, or 'RDF', is a widely used data representation model that forms the cornerstone of the Semantic Web. 'RDF' represents data as a graph rather than the familiar data table or rectangle of relational databases. The 'rdflib' package provides a friendly and concise user interface for performing common tasks on 'RDF' data, such as reading, writing, and converting between the various serializations of 'RDF' data, including 'rdfxml', 'turtle', 'nquads', 'ntriples', and 'json-ld'; creating new 'RDF' graphs; and performing graph queries using 'SPARQL'. This package wraps the low-level 'redland' R package, which provides direct bindings to the 'redland' C library. Additionally, the package supports the newer and more developer-friendly 'JSON-LD' format through the 'jsonld' package. The package interface takes inspiration from the Python 'rdflib' library.
A friendly and concise user interface for performing common tasks on RDF data, such as parsing and converting between formats including rdfxml, turtle, nquads, ntriples, and trig; creating RDF graphs; and performing SPARQL queries. This package wraps the redland R package, which provides direct bindings to the redland C library. Additionally, the package supports parsing and serialization of RDF into JSON-LD through the jsonld package, which binds the official JSON-LD JavaScript API. The package interface takes inspiration from the Python rdflib library.
You can install rdflib from GitHub with:
# install.packages("devtools")
devtools::install_github("ropensci/rdflib")
While not required, rdflib is designed to play nicely with %>% pipes, so we will load the magrittr package as well:
library(magrittr)
library(rdflib)
Parse a file and serialize into a different format:
system.file("extdata/dc.rdf", package="redland") %>%
  rdf_parse() %>%
  rdf_serialize("test.nquads", "nquads")
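As a rough check that a format conversion preserves the data, one can serialize a graph to another format, parse it back, and compare triple counts. This is a sketch that assumes the rdf_parse(), rdf_serialize() and rdf_query() calls behave as in the examples in this README; the count_triples() helper is hypothetical and simply counts the rows matched by a catch-all SPARQL pattern.

```r
library(rdflib)

# Hypothetical helper: count triples by matching every ?s ?p ?o pattern
count_triples <- function(graph) {
  nrow(rdf_query(graph, "SELECT ?s ?p ?o WHERE { ?s ?p ?o . }"))
}

# Parse the RDF/XML example shipped with the redland package
src <- system.file("extdata/dc.rdf", package = "redland")
g <- rdf_parse(src, format = "rdfxml")

# Write the graph out as Turtle, then read the Turtle file back in
ttl <- tempfile(fileext = ".ttl")
rdf_serialize(g, ttl, format = "turtle")
g2 <- rdf_parse(ttl, format = "turtle")

# Both graphs should hold the same number of statements
c(original = count_triples(g), round_trip = count_triples(g2))
```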
Perform SPARQL queries:
sparql <-
'PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?a ?c
WHERE { ?a dc:creator ?c . }'

system.file("extdata/dc.rdf", package="redland") %>%
  rdf_parse() %>%
  rdf_query(sparql)
#> # A tibble: 1 x 2
#>   a                      c
#>   <chr>                  <chr>
#> 1 http://www.dajobe.org/ Dave Beckett
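A query does not have to name a predicate up front. A pattern with a fixed subject and variables for the predicate and object lists everything recorded about one resource; here the subject is the http://www.dajobe.org/ URI from the output above. The sketch assumes, as in that output, that rdf_query() returns a data frame with one column per selected variable.

```r
library(rdflib)

doc <- rdf_parse(system.file("extdata/dc.rdf", package = "redland"))

# Fixed subject, variable predicate and object: list all properties of one resource
props <- rdf_query(doc, '
  SELECT ?p ?o
  WHERE { <http://www.dajobe.org/> ?p ?o . }')
props
```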
Initialize a new graph object or add triple statements to an existing graph:
x <- rdf()
x <- rdf_add(x,
    subject="http://www.dajobe.org/",
    predicate="http://purl.org/dc/elements/1.1/language",
    object="en")
x
#> Total of 1 triples, stored in hashes
#> -------------------------------
#> <http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/language> "en" .
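Because rdf_add() returns the updated graph, statements can be added in a loop and the result queried straight away. The people, their example.org URIs, and the use of the FOAF name property below are illustrative assumptions, not data shipped with the package.

```r
library(rdflib)

# Hypothetical example data: two people identified by example.org URIs
people <- data.frame(
  id   = c("http://example.org/alice", "http://example.org/bob"),
  name = c("Alice", "Bob"),
  stringsAsFactors = FALSE
)

g <- rdf()
for (i in seq_len(nrow(people))) {
  # Each call adds one subject-predicate-object statement and returns the graph
  g <- rdf_add(g,
               subject   = people$id[i],
               predicate = "http://xmlns.com/foaf/0.1/name",
               object    = people$name[i])
}

# Read the names back out with SPARQL
res <- rdf_query(g, '
  SELECT ?person ?name
  WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name . }')
res
```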
Change the default display format (nquads) for graph objects:
options(rdf_print_format = "jsonld")
x
#> Total of 1 triples, stored in hashes
#> -------------------------------
#> {
#>   "@id": "http://www.dajobe.org/",
#>   "http://purl.org/dc/elements/1.1/language": "en"
#> }
We can also work with the JSON-LD format through additional functions provided in the R package jsonld.
out <- tempfile()
rdf_serialize(x, out, "jsonld")
rdf_parse(out, format = "jsonld")
#> Total of 1 triples, stored in hashes
#> -------------------------------
#> {
#>   "@id": "http://www.dajobe.org/",
#>   "http://purl.org/dc/elements/1.1/language": "en"
#> }
For more information on the JSON-LD RDF API, see https://json-ld.org/spec/latest/json-ld-rdf/.
See the articles in the package documentation for advanced use, including applications to large triplestores, example SPARQL queries, and information about additional database backends.
Please also cite the underlying redland library when citing rdflib:
Carl Boettiger. (2018). rdflib: A high level wrapper around the redland package for common rdf applications (Version 0.1.0). Zenodo. https://doi.org/10.5281/zenodo.1098478
Jones M, Slaughter P, Ooms J, Boettiger C, Chamberlain S (2018). redland: RDF Library Bindings in R. doi: 10.5063/F1VM496B (URL: http://doi.org/10.5063/F1VM496B), R package version 1.0.17-10, <URL: https://github.com/ropensci/redland-bindings/tree/master/R/redland>.
rdf() supports all major storage backends: Virtuoso, SQLite, Postgres, MySQL, in addition to existing support for BDB and memory-based storage.
length() method added to report the length of the triplestore.
print() method gains rdf_max_print() option and does not print huge triplestores.
print() method summarizes the total number of triples and the backend.
rdf() supports BDB backend for disk-based storage for large triplestores #6
rdf_parse() gains an argument rdf to append triples to an existing graph.
c() method to concatenate rdf objects.
rdf_query now bypasses the very slow iteration over getNextResult and instead uses an internal redland function call to access all results at once in csv format.
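The two ways of merging graphs mentioned above, passing an existing graph through the rdf argument of rdf_parse() and concatenating graphs with c(), can be compared on the redland example file. This is a sketch assuming both interfaces behave as these notes describe; the example.org subject and the count_triples() and make_seed() helpers are hypothetical.

```r
library(rdflib)

# Hypothetical helper: count triples with a catch-all SPARQL pattern
count_triples <- function(graph) {
  nrow(rdf_query(graph, "SELECT ?s ?p ?o WHERE { ?s ?p ?o . }"))
}

# Hypothetical seed graph holding a single hand-made statement
make_seed <- function() {
  rdf_add(rdf(),
          subject   = "http://example.org/note",
          predicate = "http://purl.org/dc/elements/1.1/description",
          object    = "a hand-made triple")
}

src <- system.file("extdata/dc.rdf", package = "redland")

# Option 1: parse directly into an existing graph via the rdf argument
appended <- rdf_parse(src, format = "rdfxml", rdf = make_seed())

# Option 2: parse into a fresh graph, then concatenate with c();
# triples from the later graphs are appended to the first one
parsed <- rdf_parse(src, format = "rdfxml")
combined <- c(make_seed(), parsed)

c(parsed   = count_triples(parsed),
  appended = count_triples(appended),
  combined = count_triples(combined))
```

Both options should end up with the parsed triples plus the single seed statement.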
experimental as_rdf method now uses a poor-man's nquad serializer to rapidly generate rdf (instead of slowly iterating over add_rdf).
rdf_add argument for object can now take all atomic types (numeric, integer, string, Date, POSIX, logical) and will automatically declare the appropriate datatype_uri if the user has not manually specified this.
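A short sketch of the automatic datatypes: objects of different R classes are added without a datatype_uri, and the graph is written as N-Quads so the declared types become visible. The example.org subject and predicate URIs are hypothetical, and the exact XML Schema type picked for each R class is left to rdf_add() rather than assumed here.

```r
library(rdflib)

subj <- "http://example.org/dataset1"  # hypothetical subject URI
g <- rdf()
# No datatype_uri is supplied; rdf_add() infers one from the R class of object
g <- rdf_add(g, subj, "http://example.org/count", 42L)
g <- rdf_add(g, subj, "http://example.org/issued", as.Date("2018-01-15"))
g <- rdf_add(g, subj, "http://example.org/public", TRUE)

# N-Quads writes each typed literal as "value"^^<datatype URI>
out <- tempfile(fileext = ".nq")
rdf_serialize(g, out, format = "nquads")
nq <- readLines(out)
nq
```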
Numerous improvements to documentation from rOpenSci onboarding feedback, see #9 and #10
both functions and unit tests are broken out into separate files in their respective directories.
additional example RDF data added in extdata
rdf_serialize passes ... arguments to serializeToFile (e.g. to set a baseUri).
rdf_free() will also remove the object from the parent frame, reducing the potential for crashing R by referring to a freed pointer.
rdf_query() now coerces data into the appropriate type if it recognizes the data URI and can match that to an R type (a few XMLSchema types are recognized; otherwise it still defaults to character string).
turtle parser/serializer fixed.
trig support removed (not working in redland without optional libraries and an alternative compile configuration).
NEWS.md file added to track changes to the package.