Interface to the Search API for 'PLoS' Journals

A programmatic interface to the 'SOLR' based search API provided by the Public Library of Science journals to search their articles. Functions are included for searching for articles, retrieving articles, making plots, doing 'faceted' searches, 'highlight' searches, and viewing results of 'highlighted' searches in a browser.

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

You can get this package from CRAN, or install it within R by running

install.packages("rplos")


Or install the development version from GitHub


What is this?

rplos is a package for accessing full text articles from the Public Library of Science journals using their API.


rplos used to require an API key; as of 2015-01-13 (v0.4.5.999) it no longer does.

rplos tutorial:

PLOS API documentation:

The PLOS Solr schema is published online, but the published copy is about 1.5 years old and may not be up to date.

Crossref API documentation here, and here. Note that we are working on a new package rcrossref (on CRAN) with a much fuller implementation of R functions for all Crossref endpoints.


Beware: PLOS has started throttling requests. If you make too many requests in a given time period, the API returns errors such as "(503) Service Unavailable - The server cannot process the request due to a high load".
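Since a throttled request surfaces as an error, one way to cope is to retry after a growing pause. The following is a minimal sketch, assuming the throttled call signals an ordinary R error; with_backoff() is a hypothetical helper (not part of rplos), and the wait times are illustrative rather than PLOS's actual limits.

```r
# Hypothetical helper (not part of rplos): call fun() and, if it signals an
# error, retry with exponentially growing pauses between attempts.
with_backoff <- function(fun, max_tries = 4, base_wait = 1, sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == max_tries) {
      stop("giving up after ", max_tries, " tries: ", conditionMessage(result))
    }
    # wait base_wait, 2 * base_wait, 4 * base_wait, ... seconds
    sleep(base_wait * 2^(attempt - 1))
  }
}

# Usage (needs network access and the rplos package):
# res <- with_backoff(function() searchplos("ecology", "id", limit = 5))
```

The sleep argument exists only so the pause can be swapped out when trying the helper without waiting.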

Quick start

library(rplos)


Search for the term ecology, and return id (DOI) and publication date, limiting to 5 items

searchplos('ecology', 'id,publication_date', limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1    47678     0
#> $data
#> # A tibble: 5 x 2
#>   id                           publication_date    
#>   <chr>                        <chr>               
#> 1 10.1371/journal.pone.0001248 2007-11-28T00:00:00Z
#> 2 10.1371/journal.pone.0059813 2013-04-24T00:00:00Z
#> 3 10.1371/journal.pone.0155019 2016-05-11T00:00:00Z
#> 4 10.1371/journal.pone.0080763 2013-12-10T00:00:00Z
#> 5 10.1371/journal.pone.0208370 2019-01-30T00:00:00Z

Get DOIs for full articles in PLOS ONE

searchplos(q="*:*", fl='id', fq=list('journal_key:PLoSONE',
   'doc_type:full'), limit=5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1   217655     0
#> $data
#> # A tibble: 5 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pone.0155491
#> 2 10.1371/journal.pone.0168631
#> 3 10.1371/journal.pone.0168627
#> 4 10.1371/journal.pone.0184491
#> 5 10.1371/journal.pone.0155489

Query for some PLOS article-level metrics; note the difference between the unsorted and sorted outputs

out <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'), fq='doc_type:full')
out_sorted <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'),
   fq='doc_type:full', sort='counter_total_all desc')
head(out$data)
#> # A tibble: 6 x 3
#>   id                           alm_twitterCount counter_total_all
#>   <chr>                                   <int>             <int>
#> 1 10.1371/journal.pone.0155491                0              2025
#> 2 10.1371/journal.pone.0168631                0               703
#> 3 10.1371/journal.pone.0168627                0              2392
#> 4 10.1371/journal.pone.0184491               10               745
#> 5 10.1371/journal.pone.0155489                0              3085
#> 6 10.1371/journal.pone.0127059                1              1449
head(out_sorted$data)
#> # A tibble: 6 x 3
#>   id                                      alm_twitterCount counter_total_a…
#>   <chr>                                              <int>            <int>
#> 1 10.1371/journal.pmed.0020124                        3472          2728832
#> 2 10.1371/journal.pcbi.1003149                         200          1322780
#> 3 10.1371/annotation/80bd7285-9d2d-403a-…                0          1235195
#> 4 10.1371/journal.pone.0141854                        3438           887162
#> 5 10.1371/journal.pcbi.0030102                          65           872604
#> 6 10.1371/journal.pone.0088278                         975           699336

A list of articles about social networks that are popular on a social network

searchplos(q="*:*", fl=c('id','alm_twitterCount'),
   fq=list('doc_type:full','subject:"Social networks"','alm_twitterCount:[100 TO 10000]'),
   sort='counter_total_month desc')
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1       60     0
#> $data
#> # A tibble: 10 x 2
#>    id                           alm_twitterCount
#>    <chr>                                   <int>
#>  1 10.1371/journal.pone.0150989              241
#>  2 10.1371/journal.pone.0069841              895
#>  3 10.1371/journal.pmed.1000316             1055
#>  4 10.1371/journal.pone.0183551              405
#>  5 10.1371/journal.pone.0073791             1883
#>  6 10.1371/journal.pone.0175368             1114
#>  7 10.1371/journal.pone.0149777              217
#>  8 10.1371/journal.pone.0138717              180
#>  9 10.1371/journal.pbio.1001535             2142
#> 10 10.1371/journal.pone.0143611              107

Show all articles that have these two words fewer than about 15 words apart

searchplos(q='everything:"sports alcohol"~15', fl='title', fq='doc_type:full', limit=3)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1      137     0
#> $data
#> # A tibble: 3 x 1
#>   title                                                                    
#>   <chr>                                                                    
#> 1 Alcohol Advertising in Sport and Non-Sport TV in Australia, during Child…
#> 2 Alcohol intoxication at Swedish football matches: A study using biologic…
#> 3 Symptoms of Insomnia and Sleep Duration and Their Association with Incid…

Narrow results to 7 words apart, changing the ~15 to ~7

searchplos(q='everything:"sports alcohol"~7', fl='title', fq='doc_type:full', limit=3)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1       79     0
#> $data
#> # A tibble: 3 x 1
#>   title                                                                    
#>   <chr>                                                                    
#> 1 Alcohol Advertising in Sport and Non-Sport TV in Australia, during Child…
#> 2 Alcohol intoxication at Swedish football matches: A study using biologic…
#> 3 Symptoms of Insomnia and Sleep Duration and Their Association with Incid…
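The two proximity searches above differ only in the number after the tilde, so the query string can be assembled programmatically. This is a small sketch; proximity_query() is a hypothetical helper, not an rplos function, and it only builds the string that would be passed as q.

```r
# Hypothetical helper: build a proximity query such as
# everything:"sports alcohol"~15
proximity_query <- function(terms, distance, field = "everything") {
  stopifnot(is.character(terms), length(terms) >= 2,
            is.numeric(distance), length(distance) == 1, distance >= 0)
  sprintf('%s:"%s"~%d', field, paste(terms, collapse = " "), as.integer(distance))
}

proximity_query(c("sports", "alcohol"), 15)
#> [1] "everything:\"sports alcohol\"~15"

# Usage (needs network access and the rplos package):
# searchplos(q = proximity_query(c("sports", "alcohol"), 7),
#            fl = 'title', fq = 'doc_type:full', limit = 3)
```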

Remove DOIs for annotations (i.e., corrections) and Viewpoints articles

searchplos(q='*:*', fl=c('id','article_type'),
   fq=list('-article_type:correction','-article_type:viewpoints'), limit=5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1  2131077     0
#> $data
#> # A tibble: 5 x 2
#>   id                                                  article_type    
#>   <chr>                                               <chr>           
#> 1 10.1371/journal.pone.0058099/materials_and_methods  Research Article
#> 2 10.1371/journal.pone.0030394/introduction           Research Article
#> 3 10.1371/journal.pone.0030394/results_and_discussion Research Article
#> 4 10.1371/journal.pone.0002157/materials_and_methods  Research Article
#> 5 10.1371/journal.pone.0030394/supporting_information Research Article
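The ids in this result carry suffixes such as /introduction, meaning section-level records were returned (no doc_type:full filter was applied). If you want to filter after the fact, a sketch like the following could work. It assumes article-level ids look like 10.1371/journal.<key>.<number>, as in the outputs shown in this document; is_article_doi() is a hypothetical helper, not part of rplos.

```r
# Hypothetical helper: TRUE for ids that look like whole-article PLOS DOIs,
# FALSE for section-level ids that carry an extra "/section" suffix.
# The pattern is an assumption based on the ids printed in this README.
is_article_doi <- function(id) {
  grepl("^10\\.1371/journal\\.[a-z]+\\.[0-9]+$", id)
}

ids <- c("10.1371/journal.pone.0058099/materials_and_methods",
         "10.1371/journal.pone.0030394/introduction",
         "10.1371/journal.pone.0155491")
ids[is_article_doi(ids)]
#> [1] "10.1371/journal.pone.0155491"
```

Filtering on the server with fq='doc_type:full' avoids fetching the section records in the first place, so this is mainly useful for results you already have.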

Faceted search

Facet on multiple fields

facetplos(q='alcohol', facet.field=c('journal','subject'), facet.limit=5)
#> $facet_queries
#> $facet_fields
#> $facet_fields$journal
#> # A tibble: 5 x 2
#>   term                             value
#>   <fct>                            <fct>
#> 1 plos one                         27112
#> 2 plos genetics                    609  
#> 3 plos medicine                    532  
#> 4 plos neglected tropical diseases 492  
#> 5 plos pathogens                   367  
#> $facet_fields$subject
#> # A tibble: 5 x 2
#>   term                          value
#>   <fct>                         <fct>
#> 1 biology and life sciences     28864
#> 2 medicine and health sciences  25887
#> 3 research and analysis methods 16394
#> 4 biochemistry                  13814
#> 5 physical sciences             10851
#> $facet_pivot
#> $facet_dates
#> $facet_ranges

Range faceting

facetplos(q='*:*', facet.range='counter_total_all',
 facet.range.start=5, facet.range.end=100, facet.range.gap=10)
#> $facet_queries
#> $facet_fields
#> $facet_pivot
#> $facet_dates
#> $facet_ranges
#> $facet_ranges$counter_total_all
#> # A tibble: 10 x 2
#>    term  value
#>    <fct> <fct>
#>  1 5     969  
#>  2 15    521  
#>  3 25    730  
#>  4 35    1126 
#>  5 45    1591 
#>  6 55    1860 
#>  7 65    1953 
#>  8 75    1887 
#>  9 85    1783 
#> 10 95    1620
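The ten terms in the output (5, 15, ..., 95) are the lower bounds of the range buckets. The sketch below reproduces them locally, assuming buckets begin at the start value and advance by a fixed gap while staying below the end value; the gap of 10 is inferred from the output, and range_buckets() is a hypothetical helper rather than anything rplos provides.

```r
# Hypothetical helper: lower bounds of the buckets a range facet would use,
# assuming buckets start at `start` and step by `gap` while below `end`.
range_buckets <- function(start, end, gap) {
  stopifnot(gap > 0, end > start)
  starts <- seq(from = start, to = end, by = gap)
  starts[starts < end]
}

range_buckets(5, 100, 10)
#> [1]  5 15 25 35 45 55 65 75 85 95
```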

Highlight searches

Search for and highlight the term alcohol in the abstract field only

(out <- highplos(q='alcohol', hl.fl = 'abstract', rows=3))
#> $`10.1371/journal.pone.0201042`
#> $`10.1371/journal.pone.0201042`$abstract
#> [1] "\nAcute <em>alcohol</em> administration can lead to a loss of control over drinking. Several models argue"
#> $`10.1371/journal.pone.0185457`
#> $`10.1371/journal.pone.0185457`$abstract
#> [1] "Objectives: <em>Alcohol</em>-related morbidity and mortality are significant public health issues"
#> $`10.1371/journal.pone.0071284`
#> $`10.1371/journal.pone.0071284`$abstract
#> [1] "\n<em>Alcohol</em> dependence is a heterogeneous disorder where several signalling systems play important"
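The matched term comes back wrapped in <em> tags. If you want plain text instead, a sketch along these lines can strip the markers from the nested list. strip_em() is a hypothetical helper, and the small list below only mimics the shape of the result shown above (DOI, then field, then text).

```r
# Hypothetical helper: drop the <em>...</em> markers from highlight fragments.
strip_em <- function(x) gsub("</?em>", "", x)

# A small list shaped like the highplos() result shown above.
hl <- list(
  `10.1371/journal.pone.0185457` = list(
    abstract = "Objectives: <em>Alcohol</em>-related morbidity and mortality are significant public health issues"
  )
)

# rapply() walks the nested list and rewrites every character leaf.
clean <- rapply(hl, strip_em, classes = "character", how = "replace")
clean[["10.1371/journal.pone.0185457"]]$abstract
#> [1] "Objectives: Alcohol-related morbidity and mortality are significant public health issues"
```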

And you can browse the results in your default browser

highbrow(out)



Full text urls

Simple function to get full text URLs for a DOI

full_text_urls(doi='10.1371/journal.pone.0086169')

Full text xml given a DOI

(out <- plos_fulltext(doi='10.1371/journal.pone.0086169'))
#> 1 full-text articles retrieved 
#> Min. Length: 110717 - Max. Length: 110717 
#> DOIs: 10.1371/journal.pone.0086169 ... 
#> NOTE: extract xml strings like output['<doi>']

Then parse the XML any way you like, here getting the abstract

library(XML)
xpathSApply(xmlParse(out$`10.1371/journal.pone.0086169`), "//abstract", xmlValue)
#> [1] "Mammalian females pay high energetic costs for reproduction, the greatest of which is imposed by lactation. The synthesis of milk requires, in part, the mobilization of bodily reserves to nourish developing young. Numerous hypotheses have been advanced to predict how mothers will differentially invest in sons and daughters, however few studies have addressed sex-biased milk synthesis. Here we leverage the dairy cow model to investigate such phenomena. Using 2.39 million lactation records from 1.49 million dairy cows, we demonstrate that the sex of the fetus influences the capacity of the mammary gland to synthesize milk during lactation. Cows favor daughters, producing significantly more milk for daughters than for sons across lactation. Using a sub-sample of this dataset (N = 113,750 subjects) we further demonstrate that the effects of fetal sex interact dynamically across parities, whereby the sex of the fetus being gestated can enhance or diminish the production of milk during an established lactation. Moreover the sex of the fetus gestated on the first parity has persistent consequences for milk synthesis on the subsequent parity. Specifically, gestation of a daughter on the first parity increases milk production by ∼445 kg over the first two lactations. Our results identify a dramatic and sustained programming of mammary function by offspring in utero. Nutritional and endocrine conditions in utero are known to have pronounced and long-term effects on progeny, but the ways in which the progeny has sustained physiological effects on the dam have received little attention to date."

Search within a field

There is a series of convenience functions for searching within sections of articles.

  • plosauthor()
  • plosabstract()
  • plosfigtabcaps()
  • plostitle()
  • plossubject()

For example:

plossubject(q='marine ecology',  fl = c('id','journal'), limit = 10)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1     4010     0
#> $data
#> # A tibble: 10 x 2
#>    id                                        journal 
#>    <chr>                                     <chr>   
#>  1 10.1371/journal.pone.0167252              PLOS ONE
#>  2 10.1371/journal.pone.0167252/title        PLOS ONE
#>  3 10.1371/journal.pone.0167252/abstract     PLOS ONE
#>  4 10.1371/journal.pone.0167252/references   PLOS ONE
#>  5 10.1371/journal.pone.0167252/body         PLOS ONE
#>  6 10.1371/journal.pone.0149852/title        PLOS ONE
#>  7 10.1371/journal.pone.0149852/abstract     PLOS ONE
#>  8 10.1371/journal.pone.0149852/references   PLOS ONE
#>  9 10.1371/journal.pone.0149852/body         PLOS ONE
#> 10 10.1371/journal.pone.0149852/introduction PLOS ONE

However, you can always just do this in searchplos() like searchplos(q = "subject:science"). See also the fq parameter. The above convenience functions are simply wrappers around searchplos, so take all the same parameters.
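To make the wrapper idea concrete, here is a hedged sketch of how a field-specific search function can be derived from a general one by prefixing the query with a field name. It is not the rplos source: field_search() and echo_search() are hypothetical, and echo_search() merely stands in for searchplos() so the example runs without network access.

```r
# Hypothetical sketch, not the rplos implementation: build a field-specific
# search function from a general one by prefixing the query with a field.
field_search <- function(field, search_fun) {
  function(q, ...) {
    search_fun(q = paste0(field, ":", q), ...)
  }
}

# Stand-in for searchplos(): it only echoes the arguments it would send.
echo_search <- function(q, fl = "id", limit = 10) {
  list(q = q, fl = fl, limit = limit)
}

subject_search <- field_search("subject", echo_search)
res <- subject_search("science", limit = 5)
res$q
#> [1] "subject:science"
```

Because the extra arguments pass straight through, the derived function accepts the same parameters as the general one, which mirrors the statement above that the wrappers take all the same parameters.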

Search by article views

Search with term marine ecology, by field subject, and limit to 5 results

plosviews(search='marine ecology', byfield='subject', limit=5)
#>                             id counter_total_all
#> 2 10.1371/journal.pone.0201675                 0
#> 3 10.1371/journal.pone.0201602               252
#> 1 10.1371/journal.pone.0167252              1379
#> 5 10.1371/journal.pone.0021810              3190
#> 4 10.1371/journal.pone.0149852             11918


Visualize word use across articles

plosword(list('monkey','Helianthus','sunflower','protein','whale'), vis = 'TRUE')
#> $table
#>   No_Articles       Term
#> 1       13216     monkey
#> 2         572 Helianthus
#> 3        1636  sunflower
#> 4      149565    protein
#> 5        1880      whale
#> $plot



  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for rplos in R by running citation(package = 'rplos')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

This package is part of a richer suite called fulltext, along with several other packages, that provides the ability to search for and retrieve full text of open access scholarly articles. We recommend using fulltext as the primary R interface to rplos unless your needs are limited to this single source.



rplos 0.8.6


  • use preserve_exact_body_bytes for tests for plosabstract and plosfigtabcaps to avoid non-ascii text problems on debian clang devel (#125)

rplos 0.8.4


  • update docs for searchplos() and all wrapper fxns to explain that internal pagination is used, but that users can do their own pagination if they like (#122)


  • fix to pagination in searchplos() and all wrapper fxns. large numbers were being passed as scientific notation, fixed now (#123)

rplos 0.8.2


  • Integration with vcr and webmockr packages for unit test stubbing


  • for highbrow() open pages with instead of (#117)
  • remove message "Looping - printing progress ..." from searchplos() (#120)
  • fix internal pagination for searchplos(): were accidentally dropping fq statements if more than 1, woopsy (#121)

rplos 0.8.0


  • Now using solrium for under the hood Solr interaction instead of solr package (#106)
  • Along with above change, the following: facetplos, searchplos, and highplos lose parameter verbose, and gain parameters error and proxy for changing how verbose error reporting is, and for setting proxy details, respectively.
  • Now using crul instead of httr for HTTP requests (#110)


  • Fix to placement of images for README requested by CRAN (#114)
  • Replaced XML with xml2 (#112)
  • citations function for PLOS rich citations is defunct as the service is gone (#113)
  • package tm dropped from Enhances (#111)
  • added code of conduct, issue and pull request templates

rplos 0.6.4


  • URLs to full text XML have been changed - old URLs were working but were going through 2 302 redirects to get there. Updated URLs. (#107)


  • Fixed content-type check for plos_fulltext() function. XML can be either application/xml or text/xml (#108)

rplos 0.6.0


  • Added notes to documentation for relevant functions on how to do phrase searching. (#96) (#97) thanks @poldham
  • Removed the random parameter from the citations() function as it's no longer available in the API (#103)
  • Swapped out all uses of dplyr::rbind_all() for dplyr::bind_rows() (#105)
  • full_text_urls() now gives back NA when DOIs for annotations are given, which can be easily removed.


  • Fixed full_text_urls() function to create full text URLs for PLOS Clinical Trials correctly (#104)

rplos 0.5.6


  • move ggplot2 from Depends to Imports, and using @importFrom for ggplot2 functions, now all imports are using @importFrom (#99)
  • Fixes for httr::content() to parse manually, and use explicit encoding of UTF-8 (#102)

rplos 0.5.4


  • Change solr dependency to require version v0.1.6 or less (#94)

rplos 0.5.2


  • More tests added (#94)


  • Fix encoding in parsing of XML data in plos_fulltext() to avoid unicode problems (#93)

rplos 0.5.0


  • Now importing non-Base R functions from utils, stats, and methods packages (#90)


  • Fixes for httr v1 that broke rplos when length 0 list passed to query parameter (#89)

rplos 0.4.7


  • New function citations() for querying the PLOS Rich Citations API (#88)


  • Added vignettes/figure to .Rbuildignore as requested by CRAN admin (#87)

rplos 0.4.6


  • API key no longer required (#86)


  • searchplos() now returns a list of length two, meta and data, and meta is a data.frame of metadata for the search.
  • Switched from CC0 to MIT license.
  • No longer importing libraries RCurl, data.table, googleVis, assertthat, RJSONIO, and stringr (#79) (#82) (#84)
  • Now importing dplyr.
  • Moved jsonlite from Suggests to Imports. Replaces use of RJSONIO. (#80)
  • crossref() now defunct. See package rcrossref (#83)
  • highplos() now uses solr::solr_highlight() to do highlight searches.
  • searchplos(), plosabstract(), and other functions that wrap searchplos() now use ... to pass in curl options to httr::GET(). You'll now get an error on using callopts parameter.
  • Added manual file entry for the dataset isocodes.
  • Reworked both plosword() and plot_throughtime() to have far less code, uses httr now instead of RCurl, but to the user, everything should be the same.
  • Made documentation more clear on discrepancy between PLOS website behavior and rplos behavior, and how to make them match, or match more closely (#76)
  • Added package level man file to allow ?rplos to go to help page.


  • Removed some examples from searchplos() that are now not working for some unknown reason. (#81)
  • Previously, when a user set limit=0, we still gave back data. This is fixed: the meta slot is now returned and the data slot is NA (#85)

rplos 0.4.1


  • Fixed some broken tests.

rplos 0.4.0


  • Errors from the data provider are reported now. At least we attempt to do so when they are given, for example if you specify asc or desc incorrectly with the sort parameter. See the check_response() function for examples.
  • New functions facetplos() and highplos() using the solr R wrapper to the Solr indexing engine. The PLOS API just exposes the Solr endpoints, so we can use the general Solr wrapper package solr to allow more flexible Solr searching.
  • New function highbrow() to visualize highlighting results in a browser.
  • New function plos_fulltext() to get full text xml of PLOS articles. Helper function full_text_urls() constructs the URL's for full text xml.


  • Fixed bug in tests where we forgot to give a key. No key is required per se, but PLOS encourages it so we prevent a call from happening without at least a dummy key.
  • Added function check_response() to check responses from the PLOS API, deals with capturing server error messages, and checking for correct content type, etc.


  • Removed function crossref_r() as we are working on a package for the CrossRef API.
  • Parameter arguments in searchplos(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle() were changed to match closer the Solr parameter names. terms to q. fields to fl. toquery to fq.
  • Multiple values passed to fields
  • returndf parameter is gone from searchplos(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle(). You can easily get raw JSON, etc. data using the solr package.
  • Now using httr instead of RCurl in plosviews() function.

rplos 0.3.6


  • All search functions (searchplos(), plosabstract(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle()) gain highlighting argument, setting to TRUE (default=FALSE) returns matching sentence fragments that were matched. NOTE that if highlighting=TRUE the output can be a list of data.frame's if returndf=TRUE, with two named elements 'data' and 'highlighting', or a list of lists if returndf=FALSE.
  • All search functions (searchplos(), plosabstract(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle()) gain sort argument. You can pass a field to sort by even if you don't return that field in the output, e.g., sort='counter_total_month desc'.
  • A tiny function parsehighlight() added to parse out html code from highlighting output.


  • Some examples in docs didn't work - fixed them.
  • Fixed bug in searchplos() that was causing elements of a return field to cause failure because they were longer than 1 (e.g., authors). Now concatenating elements of length > 1.
  • Fixed bug in searchplos() that was causing elements of length 0 to cause failure. Now removing elements of length 0.
  • Fixed parsehighlight function to return NA if highlighting return of length 0.
  • Fixed broken test for plosauthor(), plosabstract(), and plot_throughtime().

rplos 0.3.0


  • Added httr::stop_for_status() calls to a few functions to give informative http status errors when they happen


  • Fixed bug in plot_throughtime() that was throwing errors and preventing fxn from working, thanks to Ben Bolker for the fix.
  • Simplified code in many functions to have cleaner and simpler code.
  • ... parameter in many functions changed to callopts=list(), which passes in curl options to a call to either RCurl::getForm() or httr::GET()
  • Fixed bug in function plosviews() that caused errors in some calls. Now forces full document searches, so that you get views data back for full papers only, not sections of papers. See package alm for more in-depth PLOS article-level metrics.

rplos 0.2.0


  • All functions for interacting with the PLOS ALM (altmetrics) API have been removed, and are now in a separate package called alm.
  • Convenience functions plosabstract, plosauthor, plosfigtabcaps, plossubject, and plostitle, that search specifically within those sections of papers now wrap searchplos, so they should behave the same way.
  • ldfast() fxn added as an attempt to do ldply faster
  • performance improvements in searchplos


  • Dependency on assertthat removed since it's not on CRAN.
  • Fixed namespace conflicts by importing only functions needed from some packages.
  • searchplos() now removes leading, trailing, and internal whitespace from character strings

rplos 0.1.1

  • remove alm*() functions so that this package now only wraps the PLoS search API.

rplos 0.1.0

  • The almdateupdated function has been deprecated - use almupdated instead.

  • The articlelength function has been deprecated - didn't see the usefulness any longer.

  • In general simplified and prettified code.

  • Changed from using RCurl to httr in many functions, but not all.

  • Added more examples for many functions.

  • Added three internal functions: concat_todf, addmissing, and getkey.

  • Added Karthik Ram as a package author.


  • All url arguments in functions put inside functions as they are not likely to change that often.

  • Fixed crossref function, and added more examples.


  • The alm function (previously almplosallviews) gains many new features: now allows up to 50 DOIs per call; you can specify the source you want to get alm data from as an argument; you can specify the year you want to get alm data from as an argument.

  • Added the plosfields data file to get all the possible fields to use in function calls.


  • almplosallviews changed to alm.

  • almplotallviews changed to almplot.

  • almevents added to specifically search and get detailed events data for a specific source or N sources.

  • crossref_r gets 20 random DOIs from

  • Added package startup message.

  • journalnamekey function to get the short name keys for each PLoS Journal.

rplos 0.0-7


  • ALM functions (any functions starting with alm) received updated arguments/parameters according to the ALM API version 3.0 changes.

  • Bug fixes in general across library.

  • Added tests.

  • almplosallviews now outputs different output - two data.frames, one total metrics (summed across time), and history (for metrics for each time period specified in the search)

  • crossref function returns R's native bibtype format. See examples in crossref function documentation

rplos 0.0-5


  • almpub changed to almdatepub

  • changed help file rplos to help - use help('rplos') in R

  • changed URL from to

  • added sleep argument to plosallviews function to allow pauses between API calls when running plosallviews in a loop - this is an attempt to limit hitting the PLoS API too hard

  • various other fixes to functions

  • more examples added to some functions


  • added function journalnamekey to get short keys for journals to use in searching for specific journals

rplos 0.0-1


  • released to CRAN



1.0.0 by Scott Chamberlain, 9 months ago


Authors: Scott Chamberlain [aut, cre], Carl Boettiger [aut], Karthik Ram [aut], rOpenSci [fnd]


Task views: Web Technologies and Services

MIT + file LICENSE license

Imports ggplot2, crul, jsonlite, dplyr, plyr, lubridate, reshape2, whisker, solrium

Suggests xml2, knitr, rmarkdown, testthat, webmockr, vcr
