Google 'BigQuery' Support for 'sparklyr'

A 'sparklyr' extension package providing an integration with Google 'BigQuery'. It supports direct import/export where records are directly streamed from/to 'BigQuery'. In addition, data may be imported/exported via intermediate data extracts on Google 'Cloud Storage'.


sparkbq is a sparklyr extension package providing an integration with Google BigQuery. It builds on top of spark-bigquery, which provides a Google BigQuery data source to Apache Spark.

Version Information

You can install the released version of sparkbq from CRAN via

install.packages("sparkbq")

or the latest development version through

devtools::install_github("miraisolutions/sparkbq", ref = "develop")

The following table provides an overview of the supported versions of Apache Spark, Scala, and Google Dataproc:

sparkbq | spark-bigquery | Apache Spark    | Scala | Google Dataproc
0.1.x   | 0.1.0          | 2.2.x and 2.3.x | 2.11  | 1.2.x and 1.3.x
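As a rough illustration of the table above, the following sketch checks whether a Spark version string falls into the 2.2.x or 2.3.x range listed for sparkbq 0.1.x. The helper name is_supported_spark is hypothetical and not part of sparkbq. With a live connection, the version could be obtained from sparklyr's spark_version(sc).

```r
# Hypothetical helper (not part of sparkbq): checks a Spark version against
# the 2.2.x / 2.3.x range listed for sparkbq 0.1.x in the table above.
is_supported_spark <- function(version) {
  # Split e.g. "2.3.1" into its integer components c(2, 3, 1)
  parts <- as.integer(strsplit(as.character(version), ".", fixed = TRUE)[[1]])
  # Supported: major version 2 with minor version 2 or 3
  isTRUE(parts[1] == 2L && parts[2] %in% c(2L, 3L))
}

is_supported_spark("2.3.1")
is_supported_spark("2.4.0")
```

The first call falls inside the supported range and the second does not, so a check like this could be run before connecting to fail early on an unsupported cluster.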

sparkbq is based on the Spark package spark-bigquery which is available in a separate GitHub repository.

Example Usage

library(sparklyr)
library(sparkbq)
library(dplyr)

config <- spark_config()
sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"
)

# Reading the public shakespeare data table
hamlet <-
  spark_read_bigquery(
    sc,
    name = "hamlet",
    projectId = "bigquery-public-data",
    datasetId = "samples",
    tableId = "shakespeare") %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!

# Retrieve results into a local tibble
hamlet %>% collect()

# Write result into "mysamples" dataset in our BigQuery (billing) project
spark_write_bigquery(
  hamlet,
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite")


Authentication

When running outside of Google Cloud, it is necessary to specify a service account JSON key file. Information on how to generate service account credentials can be found in the Google Cloud documentation. The service account key file can either be passed as the parameter serviceAccountKeyFile to bigquery_defaults or directly to spark_read_bigquery and spark_write_bigquery.

Alternatively, the environment variable GOOGLE_APPLICATION_CREDENTIALS can be set, e.g. export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json (see the Google Cloud documentation for more information). When running on Google Cloud, e.g. Google Cloud Dataproc, application default credentials (ADC) may be used, in which case it is not necessary to specify a service account key file.
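The credential options described above can be sketched in base R as follows. The helper resolve_key_file is hypothetical and not part of sparkbq, and the precedence order it uses (explicit path, then environment variable, then ADC) is an assumption made for illustration, not documented package behavior. The path is a placeholder.

```r
# Hypothetical helper (not part of sparkbq): decide which service account
# key file, if any, should be used. The precedence order is an assumption.
resolve_key_file <- function(key_file = NULL) {
  # 1. An explicitly supplied path takes precedence.
  if (!is.null(key_file)) {
    return(key_file)
  }
  # 2. Otherwise fall back to GOOGLE_APPLICATION_CREDENTIALS, if set.
  env_path <- Sys.getenv("GOOGLE_APPLICATION_CREDENTIALS", unset = "")
  if (nzchar(env_path)) {
    return(env_path)
  }
  # 3. Nothing configured: return NULL and rely on application default
  #    credentials (e.g. when running on Google Cloud Dataproc).
  NULL
}

# The environment variable can also be set from within R (placeholder path).
Sys.setenv(GOOGLE_APPLICATION_CREDENTIALS = "/path/to/your/service_account_keyfile.json")
key_file <- resolve_key_file()
```

The resulting value could then be supplied as serviceAccountKeyFile. If the variable is set from within R, it presumably needs to happen before spark_connect() is called so that the Spark process inherits it.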


0.1.0 2018-07-31

  • Initial Release


0.1.1 by Martin Studer, 2 years ago

Authors: Mirai Solutions GmbH [aut] , Martin Studer [cre] , Nicola Lambiase [ctb] , Omer Demirel [ctb]

License: GPL-3 | file LICENSE

Imports sparklyr

Suggests dplyr

System requirements: Spark (>= 2.2.x)
