Tidy RSS for R

To help you include data from RSS feeds in your analysis, 'tidyRSS' parses RSS, Atom XML, JSON and geoRSS feeds and returns a tidy data frame.



tidyRSS is a package for extracting data from RSS feeds, including Atom feeds, JSON feeds and geoRSS feeds.

It is easy to use because it has only one function, tidyfeed(), which takes two arguments: the URL of the feed, and a logical flag that controls whether a geoRSS feed is returned as a simple features data frame. Running this function returns a tidy data frame of the information contained in the feed. If the URL is not an RSS or Atom feed, it returns an error message.
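As a minimal sketch of calling the function defensively, the wrapper below catches the error raised for an unparseable URL and returns NULL instead. The name safe_tidyfeed and its as_sf and parser arguments are hypothetical and not part of tidyRSS. The sketch assumes the two-argument interface described above (URL first, logical flag second) and that a failed parse is signalled as an ordinary R error.

```r
# Hypothetical helper, not part of tidyRSS: wrap a feed parser so that a bad
# URL yields NULL (plus a message) instead of stopping the script.
safe_tidyfeed <- function(url, as_sf = FALSE, parser = NULL) {
  if (is.null(parser)) {
    # Default to the package function; assumes tidyRSS is installed.
    parser <- tidyRSS::tidyfeed
  }
  tryCatch(
    # Arguments are passed by position: feed URL, then the logical geoRSS flag.
    parser(url, as_sf),
    error = function(e) {
      message("Could not parse '", url, "': ", conditionMessage(e))
      NULL
    }
  )
}

# Example call (placeholder URL; requires tidyRSS and a network connection):
# items <- safe_tidyfeed("https://example.com/feed.xml")
# if (!is.null(items)) head(items)
```

Passing the parser as an argument is only a convenience for trying the wrapper without a network connection; in normal use the default is enough.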

Included in the package is a simple dataset, a list of feed URLs, that you can experiment with. You can access this with data("feeds").

It can be installed directly from CRAN with:

 
install.packages("tidyRSS")

The development version can be installed from GitHub with the devtools package.

 
devtools::install_github("robertmyles/tidyrss")

Usage

RSS feeds can be parsed with tidyfeed(), and some examples are included in the “feeds” dataset. Here is an example of using the package:

library(tidyRSS)
 
data("feeds")
 
# select a feed:
rss <- sample(feeds$feeds, 1)
 
tidyfeed(rss)
#> # A tibble: 50 x 5
#>    feed_title  feed_link   item_title    item_date_published item_link    
#>    <chr>       <chr>       <chr>         <dttm>              <chr>        
#>  1 Instructab… http://www… All  You Nee… 2018-07-04 10:03:03 http://www.i
#>  2 Instructab… http://www… Automatic Ra… 2018-07-04 09:35:14 http://www.i
#>  3 Instructab… http://www… DIY survival… 2018-07-04 09:27:17 http://www.i
#>  4 Instructab… http://www… Recycled and… 2018-07-04 08:29:32 http://www.i
#>  5 Instructab… http://www… ESP8266 Temp… 2018-07-04 08:23:55 http://www.i
#>  6 Instructab… http://www… How to Cool … 2018-07-04 06:58:42 http://www.i
#>  7 Instructab… http://www… DIY Laundry … 2018-07-04 06:38:56 http://www.i
#>  8 Instructab… http://www… DIY Li-ion C… 2018-07-04 05:54:44 http://www.i
#>  9 Instructab… http://www… Aluminum Cas… 2018-07-04 04:55:02 http://www.i
#> 10 Instructab… http://www… Receipt hold… 2018-07-04 04:44:50 http://www.i
#> # ... with 40 more rows
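
Because the result is an ordinary data frame, the usual tools apply to it. The base R sketch below filters items by publication date and counts items per day. The data frame here is a stand-in with the same column names as the printed output above; its values are made up for illustration and do not come from a real feed.

```r
# Stand-in for the result of tidyfeed(): same column names as the printed
# output above, but the values are made up for illustration.
items <- data.frame(
  feed_title = "Example feed",
  item_title = c("Item A", "Item B", "Item C"),
  item_date_published = as.POSIXct(
    c("2018-07-04 10:03:03", "2018-07-04 09:35:14", "2018-07-03 22:10:00"),
    tz = "UTC"
  ),
  item_link = c(
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c"
  ),
  stringsAsFactors = FALSE
)

# Keep only items published on or after a cutoff, newest first.
cutoff <- as.POSIXct("2018-07-04 00:00:00", tz = "UTC")
recent <- items[items$item_date_published >= cutoff, ]
recent <- recent[order(recent$item_date_published, decreasing = TRUE), ]

# Count items per publication day.
per_day <- table(as.Date(items$item_date_published, tz = "UTC"))
```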

More information is contained in the vignette: vignette("tidyrss", package = "tidyRSS").

Issues

RSS and Atom XML feeds can be finicky things. If you find one that doesn’t work with tidyfeed(), let me know, and please include the URL of the feed that you are trying. Pull requests and general feedback are welcome.

Many feeds are malformed. For a well-formed feed, you’ll get back a tidy data frame with information on the feed and the individual items (blog posts, for example), including content. For a malformed feed, you’ll get back less than this, because tidyfeed() deletes NA columns, that is, columns whose information wasn’t in the feed in the first place.
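
To make the effect on malformed feeds concrete, the base R sketch below drops columns that contain only NA from a small made-up data frame. It illustrates the behaviour described above and is not the package's own code; drop_na_columns is a hypothetical name, and the column names and values are invented for illustration. Since the set of columns can differ from feed to feed, checking that a column exists before using it is a reasonable precaution.

```r
# Hypothetical helper: keep only columns with at least one non-NA value.
drop_na_columns <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Made-up parse result in which the feed supplied no descriptions.
parsed <- data.frame(
  item_title = c("Item A", "Item B"),
  item_link = c("http://example.com/a", "http://example.com/b"),
  item_description = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

cleaned <- drop_na_columns(parsed)
names(cleaned)

# Guard against a column that may be absent for a given feed.
if ("item_description" %in% names(cleaned)) {
  print(cleaned$item_description)
} else {
  message("This feed did not provide item descriptions.")
}
```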

Related

The package is a ‘tidy’ version of two other related packages, rss and feedeR, both of which return lists. In comparison to feedeR, tidyRSS returns more information from the RSS feed (if it exists), and development on rss seems to have stopped some time ago.

Other

For an example Shiny app that uses geoRSS, see here.

News

tidyRSS v1.2.9 (Release date: 08/05/2019)

Changes: Added functionality to process dc:date tags in v1 RSS feeds and better handling of item category columns, see https://github.com/luke-a/tidyRSS/commit/c677022996fa971b49ef1a858ae21ca720b56c8e .

tidyRSS v1.2.8 (Release date: 05/03/2019)

Changes: Fix to add proper href links in Atom feeds.

tidyRSS v1.2.7 (Release date: 03/11/2018)

Changes: Small fix to add item descriptions for RSS feeds.

tidyRSS v1.2.6 (Release date: 29/08/2018)

Changes: Fixed an error with feeds parsing as geoRSS when they didn't have the necessary lat/lon columns.

tidyRSS v1.2.5 (Release date: 05/08/2018)

Changes: Removed tests. The tests were based on checking a bunch of RSS feed URLs. Since feeds undergo maintenance, or are taken down etc., the tests were failing randomly.

tidyRSS v1.2.4 (Release date: 01/06/2018)

Changes: Added support for geo RSS feeds.

tidyRSS v1.2.3 (Release date: 28/01/2018)

Changes: Changed tests to avoid failures due to problems outside the package (faulty connections, site maintenance, feeds being taken down).

tidyRSS v1.2.2 (Release date: 7/9/2017)

Changes: Added preliminary support for jsonfeeds; minor changes to parsing other feeds; minor change to data included with the package.

tidyRSS v1.2.1 (Release date: 16/6/2017)

Changes: Minor changes to parsing Atom feeds.

tidyRSS v1.2.0 (Release date: 6/6/2017)

Changes:

  • Rewrote package: fewer dependencies, streamlined code, more robust.

tidyRSS v1.0.1 (Release date: 28/2/2017)

Changes:

  • Fixed certain feeds not parsing (Issue #1)

tidyRSS v1.0.0 (Release date: 24/2/2017)


1.2.11 by Robert Myles McDonnell, 2 months ago


https://github.com/RobertMyles/tidyrss


Report a bug at https://github.com/RobertMyles/tidyrss/issues


Browse source code at https://github.com/cran/tidyRSS


Authors: Robert Myles McDonnell


Documentation:   PDF Manual  


Task views: Web Technologies and Services


MIT + file LICENSE license


Imports xml2, httr, lubridate, magrittr, tibble, dplyr, testthat, jsonlite, purrr, sf, stringr

Suggests knitr, rmarkdown

