Reading Bibliometric Data from Lattes Platform

A simple API for downloading and reading XML data directly from Lattes <http://lattes.cnpq.br/>.


ATTENTION: The code is currently NOT working due to the return of the captcha page.

Lattes is a unique platform, and the largest one, for academic curricula. There you can find information about the academic work of all Brazilian scholars, including the institution where the PhD was obtained, current employer, field of work, metadata for all publications and more. It is a unique and reliable source of information for bibliometric studies.

I've been working with Lattes data for some time. Here I present a short list of papers that have used this data.

  • Is predatory publishing a real threat? Evidence from a large database study. Scientometrics

  • The Brazilian scientific output published in journals: A study based on a large CV database. Journal of Informetrics

  • The researchers, the publications and the journals of Finance in Brazil: An analysis based on resumes from the Lattes platform. Brazilian Review of Finance

  • Análise do Perfil dos Acadêmicos e de suas Publicações Científicas em Administração (in Portuguese). RAC

The GetLattesData package wraps up the functions I've been using to access the dataset. Its main innovation is the possibility of downloading data directly from Lattes, without any manual work or captcha solving.

Installation

The package is available on CRAN:

install.packages('GetLattesData')

You can also install the development version from GitHub:

#install.packages('devtools')
devtools::install_github('msperlin/GetLattesData')

Example of usage

See vignette for more examples.

library(GetLattesData)

# ids from EA-UFRGS
my.ids <- c('K4713546D3', 'K4440252H7', 
            'K4783858A0', 'K4723925J2')

# qualis for the field of management
field.qualis <- 'ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'

l.out <- gld_get_lattes_data(id.vec = my.ids, field.qualis = field.qualis)

tpublic <- l.out$tpublic.published
dplyr::glimpse(tpublic)

News

Version 1.2 (2018-10-11)

  • The changes to the Lattes website seem to be permanent and stable. The main change is that, in order to download the XML zip files, one must manually solve a captcha. Because of this, the package no longer downloads the files itself but reads them from a local folder (yes, you must download all files manually). This update fixes GitHub issue 10 and GitHub issue 09.
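
Since the zip files now have to be saved by hand, a first step is usually to collect them from a local folder and check which curricula are still missing. The sketch below uses only base R and the tools package. The folder, the placeholder files it creates and the `<id>.zip` naming scheme are assumptions made for illustration, not something the package requires.

```r
# Stand-in for the folder where you saved the zip files by hand.
# (Hypothetical: empty placeholder files are created in a temporary
# directory only so that the listing below has something to find.)
zip.dir <- file.path(tempdir(), 'lattes-zips')
dir.create(zip.dir, showWarnings = FALSE)
invisible(file.create(file.path(zip.dir, c('K4713546D3.zip',
                                           'K4440252H7.zip',
                                           'notes.txt'))))

# collect every zip file in the folder, with full paths
zip.files <- list.files(zip.dir, pattern = '\\.zip$', full.names = TRUE)

# recover the ids from the file names (assumes files are named <id>.zip)
found.ids <- tools::file_path_sans_ext(basename(zip.files))

# compare against the ids you expected to have
my.ids <- c('K4713546D3', 'K4440252H7', 'K4783858A0')
missing.ids <- setdiff(my.ids, found.ids)
print(missing.ids)
```

The resulting `zip.files` vector is the kind of input you would hand to the package's reading function; check the reference manual for its exact name and arguments.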

Version 1.1 (2018-08-19)

Sadly, the Lattes website is once again using captcha. This version removes the code needed to build the package, so that it can be hosted on CRAN.

Once again, I'll keep checking Lattes over time and see whether any solution comes to life.

Version 1.0 (2018-08-13)

Lattes website is back without captcha! The new version of GetLattesData is a bit slower than the old one, but it works.

Minor changes:

Version 0.9 (2017-11-27)

The Lattes website is offline. Online downloading of XML files is no longer possible.

  • Added comments to the main function (which now returns an empty data frame) and to the vignettes.

Version 0.8 (2017-11-12)

  • Now with support for conferences and accepted articles
  • Added a function for finding information about accepted and published papers

Version 0.7 (2017-10-31)

  • Added support for books and book chapters

Version 0.6 (2017-10-14)

  • Forced UTF-8 encoding in all output
  • Users can now download information about academic supervisions (MSc, PhD, IC)

Version 0.5 (2017-09-04)

First commit

1.3 by Marcelo Perlin, 11 days ago


https://github.com/msperlin/GetLattesData/


Report a bug at https://github.com/msperlin/GetLattesData/issues


Browse source code at https://github.com/cran/GetLattesData


Authors: Marcelo Perlin [aut, cre]


Documentation:   PDF Manual  


GPL-2 license


Imports stringr, XML, dplyr, readr, stringdist, curl, tools

Suggests knitr, rmarkdown, testthat, ggplot2

