Manage Google Spreadsheets from R

Interact with Google Sheets from R.



Access and manage Google spreadsheets from R with googlesheets.

Features:

  • Access a spreadsheet by its title, key or URL.
  • Extract data or edit data.
  • Create | delete | rename | copy | upload | download spreadsheets and worksheets.
  • Upload local Excel workbook into a Google Sheet and vice versa.

googlesheets is inspired by gspread, a Google Spreadsheets Python API

The exuberant prose in this README is inspired by Tabletop.js: If you've ever wanted to get data in or out of a Google Spreadsheet from R without jumping through a thousand hoops, welcome home!

The released version is available on CRAN

install.packages("googlesheets")

Or you can get the development version from GitHub:

devtools::install_github("jennybc/googlesheets")

GitHub versions:

googlesheets is designed for use with the %>% pipe operator and, to a lesser extent, the data-wrangling mentality of dplyr. This README uses both, but the examples in the help files emphasize usage with plain vanilla R, if that's how you roll. googlesheets uses dplyr internally but does not require the user to do so. You can make the %>% pipe operator available in your own work by loading dplyr or magrittr.

library("googlesheets")
suppressPackageStartupMessages(library("dplyr"))

To play nicely with tab completion, we use consistent prefixes:

  • gs_ = all functions in the package.
  • gs_ws_ = all functions that operate on worksheets or tabs within a spreadsheet.
  • gd_ = something to do with Google Drive, usually has a gs_ synonym, might one day migrate to a Drive client.

Here's how to get a copy of a Gapminder-based Sheet we publish for practicing and follow along. You'll be sent to the browser to authenticate yourself with Google at this point.

gs_gap() %>% 
  gs_copy(to = "Gapminder")
## or, if you don't use pipes
gs_copy(gs_gap(), to = "Gapminder")

Register a Sheet (in this case, by title):

gap <- gs_title("Gapminder")
#> Sheet successfully identified: "Gapminder"

Here's a registered googlesheet object:

gap
#>                   Spreadsheet title: Gapminder
#>                  Spreadsheet author: gspreadr
#>   Date of googlesheets registration: 2016-06-25 02:05:56 GMT
#>     Date of last spreadsheet update: 2015-03-23 20:34:08 GMT
#>                          visibility: private
#>                         permissions: rw
#>                             version: new
#> 
#> Contains 5 worksheets:
#> (Title): (Nominal worksheet extent as rows x columns)
#> Africa: 625 x 6
#> Americas: 301 x 6
#> Asia: 397 x 6
#> Europe: 361 x 6
#> Oceania: 25 x 6
#> 
#> Key: 1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA
#> Browser URL: https://docs.google.com/spreadsheets/d/1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA/

Visit a registered googlesheet in the browser:

gap %>% gs_browse()
gap %>% gs_browse(ws = "Europe")

Read all the data in a worksheet:

africa <- gs_read(gap)
#> Accessing worksheet titled 'Africa'.
#> No encoding supplied: defaulting to UTF-8.
str(africa)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    624 obs. of  6 variables:
#>  $ country  : chr  "Algeria" "Algeria" "Algeria" "Algeria" ...
#>  $ continent: chr  "Africa" "Africa" "Africa" "Africa" ...
#>  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#>  $ lifeExp  : num  43.1 45.7 48.3 51.4 54.5 ...
#>  $ pop      : int  9279525 10270856 11000948 12760499 14760787 17152804 20033753 23254956 26298373 29072015 ...
#>  $ gdpPercap: num  2449 3014 2551 3247 4183 ...
head(africa)
#> <tibble [6 x 6]>
#>   country continent  year lifeExp      pop gdpPercap
#>     <chr>     <chr> <int>   <dbl>    <int>     <dbl>
#> 1 Algeria    Africa  1952  43.077  9279525  2449.008
#> 2 Algeria    Africa  1957  45.685 10270856  3013.976
#> 3 Algeria    Africa  1962  48.303 11000948  2550.817
#> 4 Algeria    Africa  1967  51.407 12760499  3246.992
#> 5 Algeria    Africa  1972  54.518 14760787  4182.664
#> 6 Algeria    Africa  1977  58.014 17152804  4910.417

Some of the many ways to target specific cells:

gap %>% gs_read(ws = 2, range = "A1:D8")
gap %>% gs_read(ws = "Europe", range = cell_rows(1:4))
gap %>% gs_read(ws = "Africa", range = cell_cols(1:4))

Full readr-style control of data ingest -- highly artificial example!

gap %>%
  gs_read(ws = "Oceania", col_names = paste0("Z", 1:6),
          na = c("1962", "1977"), col_types = "cccccc", skip = 1, n_max = 7)
#> Accessing worksheet titled 'Oceania'.
#> No encoding supplied: defaulting to UTF-8.
#> <tibble [7 x 6]>
#>          Z1      Z2    Z3    Z4       Z5       Z6
#>       <chr>   <chr> <chr> <chr>    <chr>    <chr>
#> 1 Australia Oceania  1952 69.12  8691212  10039.6
#> 2 Australia Oceania  1957 70.33  9712569 10949.65
#> 3 Australia Oceania  <NA> 70.93 10794968 12217.23
#> 4 Australia Oceania  1967  71.1 11872264 14526.12
#> 5 Australia Oceania  1972 71.93 13177000 16788.63
#> 6 Australia Oceania  <NA> 73.49 14074100  18334.2
#> 7 Australia Oceania  1982 74.74 15184200 19477.01

Create a new Sheet from an R object:

iris_ss <- gs_new("iris", input = head(iris, 3), trim = TRUE)
#> Warning: At least one sheet matching "iris" already exists, so you may
#> need to identify by key, not title, in future.
#> Sheet "iris" created in Google Drive.
#> Range affected by the update: "A1:E4"
#> Worksheet "Sheet1" successfully updated with 20 new value(s).
#> Accessing worksheet titled 'Sheet1'.
#> Sheet successfully identified: "iris"
#> Accessing worksheet titled 'Sheet1'.
#> Worksheet "Sheet1" dimensions changed to 4 x 5.
#> Worksheet dimensions: 4 x 5.

Edit some arbitrary cells and append a row:

iris_ss <- iris_ss %>% 
  gs_edit_cells(input = c("what", "is", "a", "sepal", "anyway?"),
                anchor = "A2", byrow = TRUE)
#> Range affected by the update: "A2:E2"
#> Worksheet "Sheet1" successfully updated with 5 new value(s).
iris_ss <- iris_ss %>% 
  gs_add_row(input = c("sepals", "support", "the", "petals", "!!"))
#> Row successfully appended.

Look at what we have wrought:

iris_ss %>% 
  gs_read()
#> Accessing worksheet titled 'Sheet1'.
#> No encoding supplied: defaulting to UTF-8.
#> <tibble [4 x 5]>
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <chr>       <chr>        <chr>       <chr>   <chr>
#> 1         what          is            a       sepal anyway?
#> 2          4.9           3          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4       sepals     support          the      petals      !!

Download this precious thing as csv (other formats are possible):

iris_ss %>% 
  gs_download(to = "iris-ish-stuff.csv", overwrite = TRUE)
#> Sheet successfully downloaded:
#> /Users/jenny/rrr/googlesheets/iris-ish-stuff.csv

Download this precious thing as an Excel workbook (other formats are possible):

iris_ss %>% 
  gs_download(to = "iris-ish-stuff.xlsx", overwrite = TRUE)
#> Sheet successfully downloaded:
#> /Users/jenny/rrr/googlesheets/iris-ish-stuff.xlsx

Upload a Excel workbook into a new Sheet:

gap_xlsx <- gs_upload(system.file("mini-gap", "mini-gap.xlsx",
                                  package = "googlesheets"))
#> File uploaded to Google Drive:
#> /Users/jenny/resources/R/library/googlesheets/mini-gap/mini-gap.xlsx
#> As the Google Sheet named:
#> mini-gap

Clean up our mess locally and on Google Drive:

gs_vecdel(c("iris", "Gapminder"))
file.remove(c("iris-ish-stuff.csv", "iris-ish-stuff.xlsx"))

Remember, the vignette shows a lot more usage.

fxndescription
gs_ls()List Sheets
gs_title()Register a Sheet by title
gs_key()Register a Sheet by key
gs_url()Register a Sheet by URL
gs_gs()Re-register a googlesheet
gs_browse()Visit a registered googlesheet in the browser
gs_read()Read data and let googlesheets figure out how
gs_read_csv()Read explicitly via the fast exportcsv link
gs_read_listfeed()Read explicitly via the list feed
gs_read_cellfeed()Read explicitly via the cell feed
gs_reshape_cellfeed()Reshape cell feed data into a 2D thing
gs_simplify_cellfeed()Simplify cell feed data into a 1D thing
gs_edit_cells()Edit specific cells
gs_add_row()Append a row to pre-existing data table
gs_new()Create a new Sheet and optionally populate
gs_copy()Copy a Sheet into a new Sheet
gs_rename()Rename an existing Sheet
gs_ws_ls()List the worksheets in a Sheet
gs_ws_new()Create a new worksheet and optionally populate
gs_ws_rename()Rename a worksheet
gs_ws_delete()Delete a worksheet
gs_delete()Delete a Sheet
gs_grepdel()Delete Sheets with matching titles
gs_vecdel()Delete the named Sheets
gs_upload()Upload local file into a new Sheet
gs_download()Download a Sheet into a local file
gs_auth()Authorize the package
gs_deauth()De-authorize the package
gs_user()Get info about current user and auth status
gs_webapp_auth_url()Facilitates auth by user of a Shiny app
gs_webapp_get_token()Facilitates auth by user of a Shiny app
gs_gap()Registers a public Gapminder-based Sheet (for practicing)
gs_gap_key()Key of the Gapminder practice Sheet
gs_gap_url()Browser URL for the Gapminder practice Sheet

Think of googlesheets as a read/write CMS that you (or your less R-obsessed friends) can edit through Google Docs, as well via R. It's like Christmas up in here.

Use a Google Form to conduct a survey, which populates a Google Sheet.

  • The googleformr package provides an R API for Google Forms, allowing useRs to POST data securely to Google Forms without authentication. On CRAN and GitHub (README has lots of info and links to blog posts).

Gather data while you're in the field in a Google Sheet, maybe with an iPhone or an Android device. Take advantage of data validation to limit the crazy on the way in. You do not have to be online to edit a Google Sheet! Work offline via the Chrome browser, the Sheets app for Android, or the Sheets app for iOS.

There are various ways to harvest web data directly into a Google Sheet. For example:

  • IFTTT, which stands for "if this, then that", makes it easy to create recipes in which changes in one web service, such as Gmail or Instagram, trigger another action, such as writing to a Google Sheet.
  • IMPORTXML(), IMPORTHTML(), IMPORTFEED(): Google Sheets offer functions to populate Sheets based on web data.
    • Aylien.com blog post on using =IMPORTXML() to populate a Google Sheet with restaurant reviews and ratings from TripAdvisor.
    • Martin Hawksey blog post, Feeding Google Spreadsheets, shows how to scrape web data into a Google Sheet with no programming.
  • Martin Hawksey offers TAGS, a free Google Sheet template to setup and run automated collection of search results from Twitter.

Use googlesheets to get all that data into R.

Use it in a Shiny app! Several example apps come with the package.

What other ideas do you have?

News

googlesheets 0.2.1

  • XML is no longer a dependency.
  • Automatic retries for Internal Server Error (HTTP 500). On or around 2016-03-11, there was a huge increase in the frequency of this error on Google Drive API calls.
    • Remedy: all HTTP GET calls in the package are automatically retried up to 5 times, with exponential backoff, for statuses 500 and higher.
  • Functions prefixed with gd_ refer to Google Drive and might eventually migrate into a separate Google Drive package. Generally there is a synonym with the gs_ prefix.
  • gd_token() is a new function to expose information about the current Google token. Some of this was migrated out of gd_user() and into gd_token(). New information includes scopes and cache path.
  • gd_user() now returns an S3 object of class drive_user, but it's really just a list with a nice print method. It exposes information about the current Google user. New information includes user's Drive permissionId and rootFolderId.

googlesheets 0.2.0

  • Added a NEWS.md file to track changes to the package.
  • httr v1.1.0: to become compatible with this version, we now require it.
  • Dependency changes:
    • plyr is no longer required (#204)
    • purrr is a new dependency
  • gs_browse() is a new function to visit a Google Sheet in the browser.
  • gs_rename() is a new function to rename an existing Sheet (#145).
  • gs_add_row() now works for two-dimensional input, by calling itself once per row of input (#188, @jimhester).
  • Updated the scope for the Drive API. It is possible that new/updated Drive functions will require a token obtained with the new scope. This could mean that tokens stored and loaded from file in a non-interactive environment will need to be remade.
  • gs_read_listfeed() now supports parameters to manipulate data in the API call itself: reverse inverts row order, orderby selects a column to sort on, sq accepts a structured query to filter rows. (#17)
  • gs_read_listfeed() doesn't return API-mangled column names anymore. They should now be the same as those from the other read functions and what you see in the browser.
  • readr-style data ingest: We explicitly try to match the interface of readr::read_csv(). The read functions gs_read(), gs_read_csv(), and gs_read_listfeed() and the reshaper gs_reshape_cellfeed() should all return the same data frame when operating on the same worksheet. And this should match what readr::read_csv() would return on a .csv file exported from that worksheet. The type conversion arguments for gs_simplify_cellfeed() have also changed accordingly.
    • The header argument is no longer accepted. Use col_names.
    • If you're not happy with the defaults, take control via the ... arguments of gs_read* or reshape/simplify functions. Specify column_types, col_names, locale, na, trim_ws, etc. here.
    • See the sections "Controlling data ingest, theory and practice" in the the Basic Usage vignette for details and examples.
    • readr exception #1: variables that consist entirely of missing values will be NA of the logical type, not NA_character_.
    • readr exception #2: googlesheets will never return a data frame with NA as a variable name. Instead, it will create a dummy variable name, like X5.
    • readr exception #3: All read/reshape functions accept check.names, in the spirit of utils::read.table(), which defaults to FALSE. If TRUE, variable names will be run through make.names(..., unique = TRUE). (#208)
  • gs_read_cellfeed() now returns all possible definitions of cell contents:
    • value: The variable previously known as cell_text. What you see in the browser and what Sheets API returns by default.
    • input_value: What you would have typed into the cell. Will give unevaluated formulas. (#18, #19, #152)
    • numeric_value: An actual number, if such exists, unmangled by rounding or other numeric formatting. (#152, #178)
  • New argument literal = FALSE available in reading/reshaping functions that call the cell feed. Tries to be clever about using different definitions of cell contents.
  • gs_deauth() is a newly exported function that allows you to suspend the current token and, optionally, disable the .httr-oauth token cache file by renaming it to .httr-oauth-SUSPENDED.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("googlesheets")

0.2.2 by Jennifer Bryan, 3 months ago


https://github.com/jennybc/googlesheets


Report a bug at https://github.com/jennybc/googlesheets/issues


Browse source code at https://github.com/cran/googlesheets


Authors: Jennifer Bryan [aut, cre], Joanna Zhao [aut]


Documentation:   PDF Manual  


Task views: Web Technologies and Services


MIT + file LICENSE license


Imports cellranger, dplyr, httr, jsonlite, purrr, readr, stats, stringr, tidyr, utils, xml2

Suggests covr, ggplot2, knitr, testthat, rmarkdown, rprojroot


Imported by gfer.


See at CRAN