Analyse Citation Data from Google Scholar

Provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values.

The scholar R package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values.

Development of the scholar package has resumed and a new maintainer should be confirmed shortly. Please continue to file issues and make pull requests against the project repository going forward.

Basic features

Individual scholars are referenced by a unique character string, which can be found by searching for an author and inspecting the URL of the resulting scholar homepage. For example, the unique id in the profile URL of physicist Richard Feynman is B7vSqZsAAAAJ.
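If you have a profile URL to hand, the id can also be pulled out programmatically. The following is a minimal base-R sketch, assuming the id appears as the value of a `user` query parameter in the URL; the helper name `extract_scholar_id` and the example URL are illustrative and are not part of the scholar package.

```r
# Hypothetical helper (not part of the scholar package): extract the value of
# the `user` query parameter from a Google Scholar profile URL.
extract_scholar_id <- function(url) {
  # regexec() returns the full match followed by the captured group
  m <- regmatches(url, regexec("[?&]user=([^&#]+)", url))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Assumed URL shape, used here only for illustration
url <- "https://scholar.google.com/citations?user=B7vSqZsAAAAJ&hl=en"
id <- extract_scholar_id(url)
print(id)
```

The helper returns NA when no `user` parameter is present, so callers can check the result before passing it on to the package functions.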

Basic information on a scholar can be retrieved as follows:

# Load the package
library(scholar)

# Define the id for Richard Feynman
id <- 'B7vSqZsAAAAJ'

# Get his profile and print his name
l <- get_profile(id)
l$name

# Get his citation history, i.e. citations to his work in a given year
get_citation_history(id)

# Get his publications (a large data frame)
get_publications(id)
Additional functions allow the user to query the publications list, e.g. get_num_articles, get_num_distinct_journals, get_oldest_article, get_num_top_journals. Note that Google doesn't explicitly categorize publications as journal articles, book chapters, etc., and so "journal" or "article" in these function names is just a generic term for a publication.
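To give a sense of what these helpers summarise, here is a minimal base-R sketch on a small made-up data frame. The column names (title, journal, cites, year) are assumed to resemble the output of get_publications, and the rows are invented purely for illustration. The package functions themselves take a scholar id and query Google Scholar, so this is not how they are implemented.

```r
# Invented example data; the columns are assumed to mirror a publications table
pubs <- data.frame(
  title   = c("Paper A", "Paper B", "Paper C", "Paper D"),
  journal = c("Journal X", "Journal X", "Journal Y", "Journal X"),
  cites   = c(120, 45, 300, 8),
  year    = c(1949, 1951, 1948, 1965),
  stringsAsFactors = FALSE
)

# Number of publications (every row counts, whatever its type)
num_articles <- nrow(pubs)

# Number of distinct venues listed in the journal column
num_distinct_journals <- length(unique(pubs$journal))

# Year of the oldest publication
oldest_article <- min(pubs$year, na.rm = TRUE)

print(c(num_articles, num_distinct_journals, oldest_article))
```

Because Google Scholar does not label publication types, a count like `num_articles` here includes books and chapters as well as journal papers.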

Comparing scholars

You can also compare multiple scholars, as shown below. Note that these two particular scholars are rather prolific, so these queries will take a very long time to run.

# Compare Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'qj74uXkAAAAJ')

# Get a data frame comparing the number of citations to their work in
# a given year
compare_scholars(ids)

# Compare their career trajectories, based on year of first citation
compare_scholar_careers(ids)
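Comparing careers "based on year of first citation" amounts to re-indexing each scholar's citation history so that year zero is the first year in which they were cited. A minimal base-R sketch of that alignment follows, using invented data. The column names (id, year, cites, career_year) are assumptions for illustration, and this is not necessarily how the package computes its result.

```r
# Invented citation histories for two hypothetical scholars, "A" and "B"
hist <- data.frame(
  id    = c("A", "A", "A", "B", "B"),
  year  = c(2001, 2002, 2003, 2010, 2011),
  cites = c(3, 10, 25, 1, 7),
  stringsAsFactors = FALSE
)

# For each scholar, subtract the first year with citations so that
# career_year starts at 0 regardless of the calendar year
hist$career_year <- hist$year - ave(hist$year, hist$id, FUN = min)

print(hist)
```

Plotting cites against career_year rather than year puts scholars from different eras on a common axis.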

Predicting future h-index values

Finally, users can predict the future h-index of a scholar, based on the method of Acuna et al. Since the method was originally calibrated on data from neuroscientists, results for scholars from other disciplines should be taken with a large pinch of salt. A more general critique of the original paper is also available. Still, it's a bit of fun.

## Predict h-index of original method author, Daniel Acuna
id <- 'GAi23ssAAAAJ'
predict_h_index(id)
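For context, the quantity being predicted is the h-index: the largest h such that the scholar has h publications with at least h citations each. The sketch below computes the current h-index from a vector of citation counts. The function name `h_index` and the input numbers are illustrative assumptions, and the code does not reproduce the Acuna et al. regression used for prediction.

```r
# Hypothetical helper: compute the h-index from per-publication citation counts
h_index <- function(cites) {
  # Drop missing values and rank publications from most to least cited
  cites <- sort(cites[!is.na(cites)], decreasing = TRUE)
  # Count the ranks at which the publication has at least `rank` citations
  sum(cites >= seq_along(cites))
}

# Invented citation counts for five publications
example_cites <- c(10, 8, 5, 4, 3)
print(h_index(example_cites))
```

With the invented counts above, four publications have at least four citations each, but there are not five with at least five, so the result is 4.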


scholar 0.1.7 (03 Jul 2018)

  • update impact factor data to 2017 (released on 2018-06-26)

  • plot_coauthors function to plot co-author network (thanks @cimentadaj)

  • get_impactfactor function to query journal's impact factor (thanks @DominiqueMakowski)

  • getCompleteAuthors to get the complete list of authors for a publication (thanks @abfleishman)

scholar 0.1.6 (23 May 2018)

  • add vignette

scholar 0.1.5 (28 September 2017)

  • update get_publications and get_article_cite_history to follow changes made to Google Scholar (thanks @guangchuangyu)

scholar 0.1.4 (21 November 2015)

  • Fixed bug with missing cookies that was preventing data from being downloaded.

  • Converted code from XML to rvest/dplyr for legibility

  • For clarity, get_publications now uses cid as the name of the column used to link to a publication's full citation history. This avoids any confusion when you add the scholar's id, which is id elsewhere in the package.

scholar 0.1.3 (28 September 2015)

  • Added a get_article_cite_history function to get the citation history of a single article (#6, thanks @mkiang)

  • Added a pagesize argument to get_publications. By default 100 publications will be fetched.

  • Added an option to flush cache in get_publications

  • Improved performance when fetching large numbers of publications (#15, thanks @jefferis)

  • Improved documentation

scholar 0.1.2 (22 August 2014)

  • Updated functions to work with new Google Scholar layout

  • Added a CITATION file

  • Added the publication id to get_publications (thanks @dfalster)

scholar 0.1.1

  • Fixed bug with incorrect parsing of profile summary table (#2)

  • predict_h_index now predicts a scholar's h-index for every year in the next ten years, not just at 1, 5, and 10 year intervals. Thanks to Daniel Acuna for providing the necessary regression coefficients.

scholar 0.1.0

  • initial release

  • get profile and publications data for researchers on Google Scholar

  • compare multiple scholars and predict future h-index values


0.2.2 by Guangchuang Yu, 5 months ago

Authors: Guangchuang Yu [aut, cre], James Keirstead [aut], Gregory Jefferis [ctb], Gordon Getzinger [ctb], Jorge Cimentada [ctb], Max Czapanskiy [ctb], Dominique Makowski [ctb]

Task views: Web Technologies and Services

MIT + file LICENSE license

Imports R.cache, dplyr, httr, rvest, stringr, xml2, tidygraph, ggraph, ggplot2

Suggests knitr, rmarkdown, prettydoc, roxygen2, testthat

Imported by CoTiMA.
