Functions for querying the 'Google Analytics' core reporting, real-time, multi-channel funnel and management APIs, as well as the 'Google Tag Manager' (GTM) API. Write methods are also provided for the management and GTM APIs so that you can, for example, change tag, property or view settings. Define reporting queries using natural R expressions rather than worrying about API technical intricacies such as query syntax, character code escaping, and API limitations.
Johann de Boer 2018-06-07
Classes and methods for interactive use of the Google Analytics core reporting, real-time reporting, multi-channel funnel reporting, metadata, configuration management and Google Tag Manager APIs.
The aim of this package is to support R users in defining reporting queries using natural R expressions instead of being concerned about API technical intricacies like query syntax, character code escaping and API limitations.
This package provides functions for querying the Google Analytics core reporting, real-time reporting, multi-channel funnel reporting and management APIs, as well as the Google Tag Manager API. Write methods are also provided for the Google Analytics Management and Google Tag Manager APIs so that you can, for example, change tag, property or view settings.
Support for googleAnalyticsR integration is now available for segments and table filter objects. You can supply these objects to the google_analytics function in googleAnalyticsR by coercing them to the appropriate googleAnalyticsR classes, which are "segment_ga4" for segments and ".filter_clauses_ga4" for table filters. Soon googleAnalyticsR will implicitly coerce ganalytics segments and table filters so that you do not need to explicitly coerce them.
Many new functions have been provided for writing segmentation expressions:
Segments(...) - define a list of segments dynamically based on one or more expressions and/or a selection of built-in and/or custom segments by their IDs.
Include(...) - expressions (conditions or sequences) defining users or sessions to include in the segment.
Exclude(...) - expressions (conditions or sequences) defining users or sessions to exclude from the segment.
PerUser(...) - set the scope of one or more segment conditions or sequences to user-level, or set the scope of a metric condition to user-level.
PerSession(...) - set the scope of one or more segment conditions or sequences to session-level, or set the scope of a metric condition to session-level.
PerHit(...) - specify that a set of logically combined conditions must all be met for a single hit, or set the scope of a metric condition to hit-level.
Sequence(...) - define a sequence of one or more conditions to use in a dynamic segment definition.
Then(condition) - used within a Sequence() to specify that this condition must immediately follow the preceding condition, as opposed to the default of loosely following at some point later.
Later(condition) - similar to Then() but means that a condition can happen at any point after the preceding condition. This is how conditions are treated by default in a sequence if not explicitly set.
First(condition) - similar to Then() but means that a condition must be the first interaction (hit) by the user within the specified date range. Using First() is optional: if First() is not used at the start of a sequence, the first condition does not need to match the first interaction by the user. It does not make sense to use First() anywhere other than at the start of a sequence.
Multi-channel funnel (MCF) and real-time (RT) queries can now be constructed, but work is still needed to process the response from these queries - stay tuned for updates on this.
Instead of using And, Or and Not, it is now possible to use familiar R language Boolean operators, & (And), | (Or) and ! (Not) instead (thanks to @hadley for suggestion #2). It is important to keep in mind however that Google Analytics requires Or to have precedence over And, which is the opposite to the natural precedence given by R when using the | and & operators. Therefore, remember to use parentheses ( ) to enforce the correct order of operation in your Boolean expressions. For example, my_filter <- !bounced & (completed_goal | transacted) is a valid structure for a Google Analytics reporting API filter expression.
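To see why the parentheses matter, here is a minimal base R sketch. It uses plain logical values as stand-ins for ganalytics expressions (the variable names are placeholders, not real query objects), and it only illustrates how R's parser groups & and |.

```r
# Placeholder logical values standing in for three conditions (illustrative only).
bounced <- TRUE
completed_goal <- FALSE
transacted <- TRUE

# R gives & higher precedence than |, so without parentheses this is read as
# (!bounced & completed_goal) | transacted
without_parens <- !bounced & completed_goal | transacted

# Parentheses force the OR to be grouped first, which is the structure
# described above for Google Analytics filter expressions.
with_parens <- !bounced & (completed_goal | transacted)

print(c(without_parens = without_parens, with_parens = with_parens))
```

With these values the two forms disagree, which shows that leaving out the parentheses changes the meaning of the expression rather than just its appearance.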
You can now query the Google Analytics Management API to obtain details in R about the configuration of your accounts, properties and views, such as goals you have defined. There are write methods available too, but these have not been fully tested, so use them with extreme care. If you wish to use these functions, it is recommended that you test them using a test login; otherwise avoid using the “INSERT”, “UPDATE” and “DELETE” methods.
There is also some basic support for the Google Tag Manager API, but again, this is a work in progress so take care with the write methods above.
You can install the released version of ganalytics from CRAN with:
install.packages("ganalytics")
Alternatively, you can execute the following statements in R to install the current stable development version of ganalytics from GitHub:
# Install the latest version of remotes via CRAN
install.packages("remotes")
# Install ganalytics via the GitHub repository.
remotes::install_github("jdeboer/ganalytics")
# End
Note: For further information about Google APIs, please refer to the References section at the end of this document.
Add the following two user variables:
|Variable name|Variable value|
|GOOGLE_APIS_CONSUMER_ID|<Your client ID>|
|GOOGLE_APIS_CONSUMER_SECRET|<Your client secret>|
Or create a .Renviron file within your active R working directory that is structured like this:
GOOGLE_APIS_CONSUMER_ID = <Your client ID>
GOOGLE_APIS_CONSUMER_SECRET = <Your client secret>
Alternatively you can temporarily set your environment variables straight from R using this command:
Sys.setenv(
  GOOGLE_APIS_CONSUMER_ID = "<Your client ID>",
  GOOGLE_APIS_CONSUMER_SECRET = "<Your client secret>"
)
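Whichever method you use, it can help to confirm from R that both variables are visible to the current session. The following is a small base R sketch and not part of ganalytics; it relies only on Sys.getenv() returning an empty string for variables that are not set.

```r
# Names of the two environment variables described above.
required_vars <- c("GOOGLE_APIS_CONSUMER_ID", "GOOGLE_APIS_CONSUMER_SECRET")

# Sys.getenv() returns "" for any variable that is not set.
values <- Sys.getenv(required_vars)
missing_vars <- required_vars[!nzchar(values)]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
  message("Both credential variables are set.")
}
```

If a variable is reported as not set after editing a .Renviron file, restarting R is usually needed before the file is read again.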
Note: For other operating systems please refer to the Reference section at the end of this document.
ganalytics needs to know the ID of the Google Analytics view that you wish to query. You can obtain this in a number of ways:
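In the URL of a Google Analytics report, the digits following the letter p are the view ID. For example, .../a11111111w22222222p33333333/ shows a view ID of 33333333.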
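If you have such a URL as a string, the view ID can be pulled out with a regular expression. This is a base R sketch that assumes the URL contains the a, w, p pattern shown above and that the digits after p are the view ID; report_url is a placeholder value.

```r
# Placeholder URL fragment in the a<digits>w<digits>p<digits> form.
report_url <- ".../a11111111w22222222p33333333/"

# Capture the run of digits that follows "p"; anything after it must start
# with a non-digit, so the whole number is captured.
pattern <- "^.*a[0-9]+w[0-9]+p([0-9]+)([^0-9].*)?$"
view_id <- sub(pattern, "\\1", report_url)

view_id
```

Note that sub() returns its input unchanged when the pattern does not match, so it is worth checking with grepl(pattern, report_url) before trusting the result.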
Alternatively, ganalytics can look up the view ID for you:
Return to R and execute the following to load the ganalytics package:
library(ganalytics)
If you have successfully set your system environment variables in step 3 above, then you can execute the following, optionally providing the email address you use to sign in to Google Analytics:
my_creds <- GoogleApiCreds("<Your email address>")
Otherwise do one of the following:
If you downloaded the JSON file containing your Google API app credentials, then provide the file path:
my_creds <- GoogleApiCreds("<Your email address>", "client_secret.json")
Or, instead of a file you can supply the client ID and client secret directly:
my_creds <- GoogleApiCreds(list(client_id = "<client id>", client_secret = "<client secret>"))
Now formulate and run your Google Analytics query, remembering to replace view_id with the view ID you wish to query:
myQuery <- GaQuery( view_id, creds = my_creds ) # view_id is optional
GetGaData(myQuery)
You should then be directed to accounts.google.com within your default web browser, asking you to sign in to your Google account if you have not already done so. Once signed in, you will be asked to grant read-only access to your Google Analytics account for the Google API project you created in step 1.
Make sure you are signed into the Google account you wish to use, then grant access by selecting “Allow access”. You can then close the page and return to R.
If you have successfully executed all of the above R commands you should see the output of the default ganalytics query: sessions by day for the past 7 days. For example:
        date sessions
1 2015-03-27     2988
2 2015-03-28     1594
3 2015-03-29     1912
4 2015-03-30     3061
5 2015-03-31     2609
6 2015-04-01     2762
7 2015-04-02     2179
8 2015-04-03     1552
Note: A small file will be saved to your home directory (‘My Documents’ in Windows) to cache your new reusable authentication token.
As demonstrated in the installation steps above, before executing any of the following examples, load the ganalytics package and create a gaQuery object using the GaQuery() function, assigning the object to a variable name such as myQuery.
The following examples assume you have successfully completed the above steps and have named your Google Analytics query object: myQuery.
# Set the date range from 1 January 2013 to 31 May 2013: (Dates are specified in the format "YYYY-MM-DD".)
DateRange(myQuery) <- c("2013-01-01", "2013-05-31")
myData <- GetGaData(myQuery)
summary(myData)
# Adjust the start date to 1 March 2013:
StartDate(myQuery) <- "2013-03-01"
# Adjust the end date to 31 March 2013:
EndDate(myQuery) <- "2013-03-31"
myData <- GetGaData(myQuery)
summary(myData)
# End
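Date strings in the "YYYY-MM-DD" format do not have to be typed by hand. As a base R sketch, a rolling range can be computed from today's date; the choice of a 30-day window ending yesterday is only an illustration, and the final assignment is left as a comment because it needs an existing myQuery object.

```r
# Build a 30-day range ending yesterday (an illustrative choice).
end_date <- Sys.Date() - 1
start_date <- end_date - 29

# format() with "%Y-%m-%d" yields strings in the YYYY-MM-DD form used above.
date_range <- format(c(start_date, end_date), "%Y-%m-%d")
date_range

# These strings could then be assigned in the same way as the example above:
# DateRange(myQuery) <- date_range
```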
# Report number of page views instead
Metrics(myQuery) <- "pageviews"
myData <- GetGaData(myQuery)
summary(myData)
# Report both pageviews and sessions
Metrics(myQuery) <- c("pageviews", "sessions")
# These variations are also acceptable
Metrics(myQuery) <- c("ga:pageviews", "ga.sessions")
myData <- GetGaData(myQuery)
summary(myData)
# End
# Similar to metrics, but for dimensions
Dimensions(myQuery) <- c("year", "week", "dayOfWeekName", "hour")
# Let's set a wider date range
DateRange(myQuery) <- c("2012-10-01", "2013-03-31")
myData <- GetGaData(myQuery)
head(myData)
tail(myData)
# End
# Sort by descending number of pageviews
SortBy(myQuery) <- "-pageviews"
myData <- GetGaData(myQuery)
head(myData)
tail(myData)
# End
# Filter for Sunday sessions only
sundayExpr <- Expr(~dayOfWeekName == "Sunday")
TableFilter(myQuery) <- sundayExpr
myData <- GetGaData(myQuery)
head(myData)
# Remove the filter
TableFilter(myQuery) <- NULL
myData <- GetGaData(myQuery)
head(myData)
# End
# Expression to define Sunday sessions
sundayExpr <- Expr(~dayOfWeekName == "Sunday")
# Expression to define organic search sessions
organicExpr <- Expr(~medium == "organic")
# Expression to define organic search sessions made on a Sunday
sundayOrganic <- sundayExpr & organicExpr
TableFilter(myQuery) <- sundayOrganic
myData <- GetGaData(myQuery)
head(myData)
# Let's concatenate medium to the dimensions for our query
Dimensions(myQuery) <- c(Dimensions(myQuery), "medium")
myData <- GetGaData(myQuery)
head(myData)
# End
# In a similar way to AND
loyalExpr <- !Expr(~sessionCount %matches% "^[0-3]$") # Made more than 3 sessions
recentExpr <- Expr(~daysSinceLastSession %matches% "^[0-6]$") # Visited sometime within the past 7 days.
loyalOrRecent <- loyalExpr | recentExpr
TableFilter(myQuery) <- loyalOrRecent
myData <- GetGaData(myQuery)
summary(myData)
# End
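The regular expressions in this example can be checked locally before they are sent in a query. The sketch below uses base R's grepl() on made-up values; it runs in R's own regex engine rather than Google Analytics', so it is an approximation, although a simple anchored character class such as this one should behave the same way in both.

```r
# Made-up session counts, held as text for the purpose of pattern matching.
session_counts <- c("1", "3", "4", "10", "30")

# "^[0-3]$" matches a single digit between 0 and 3 and nothing else.
few_sessions <- grepl("^[0-3]$", session_counts)

# Negating the match keeps the values above 3, including multi-digit ones.
result <- data.frame(
  sessionCount = session_counts,
  matches = few_sessions,
  loyal = !few_sessions
)
print(result)
```

The anchors matter here: without ^ and $, values such as "10" and "30" would also match because they contain a digit between 0 and 3.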
loyalExpr <- !Expr(~sessionCount %matches% "^[0-3]$") # Made more than 3 sessions
recentExpr <- Expr(~daysSinceLastSession %matches% "^[0-6]$") # Visited sometime within the past 7 days.
loyalOrRecent <- loyalExpr | recentExpr
sundayExpr <- Expr(~dayOfWeekName == "Sunday")
loyalOrRecent_Sunday <- loyalOrRecent & sundayExpr
TableFilter(myQuery) <- loyalOrRecent_Sunday
myData <- GetGaData(myQuery)
summary(myData)
# Perform the same query but change which dimensions to view
Dimensions(myQuery) <- c("sessionCount", "daysSinceLastSession", "dayOfWeek")
myData <- GetGaData(myQuery)
summary(myData)
# End
# Continuing from example 8...
# Change filter to loyal session AND recent sessions AND visited on Sunday
loyalAndRecent_Sunday <- loyalExpr & recentExpr & sundayExpr
TableFilter(myQuery) <- loyalAndRecent_Sunday
# Sort by descending visit count and ascending days since last visit.
SortBy(myQuery) <- c("-sessionCount", "+daysSinceLastSession")
myData <- GetGaData(myQuery)
head(myData)
# Notice that the Google Analytics Core Reporting API doesn't recognise 'numerical' dimensions as
# ordered factors when sorting. We can use R to sort instead, such as using dplyr.
library(dplyr)
myData <- myData %>% arrange(desc(sessionCount), daysSinceLastSession)
head(myData)
tail(myData)
# End
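The sorting caveat in this example can be illustrated without calling the API. Assuming the issue is that numeric-looking dimension values are compared as text, the base R sketch below shows how text order and numeric order differ for the same made-up values.

```r
# Made-up counts held as text (illustrative values only).
session_count_text <- c("2", "10", "1", "9")

# Sorting as text compares character by character, so "10" sorts before "2".
# method = "radix" gives a locale-independent ordering.
text_order <- sort(session_count_text, method = "radix")

# Converting to integers first gives the numeric order.
numeric_order <- sort(as.integer(session_count_text))

print(text_order)
print(numeric_order)
```

Converting such columns to a numeric type before sorting in R avoids the text ordering.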
# Visit segmentation is expressed similarly to row filters and supports AND and OR combinations.
# Define a segment for sessions where a "thank-you", "thankyou" or "success" page was viewed.
thankyouExpr <- Expr(~pagePath %matches% "thank\\-?you|success")
Segments(myQuery) <- thankyouExpr
# Reset the filter
TableFilter(myQuery) <- NULL
# Split by traffic source and medium
Dimensions(myQuery) <- c("source", "medium")
# Sort by descending number of sessions
SortBy(myQuery) <- "-sessions"
myData <- GetGaData(myQuery)
head(myData)
# End
# Sessions by date and hour for the years 2016 and 2017:
# First let's clear any filters or segments defined previously
TableFilter(myQuery) <- NULL
Segments(myQuery) <- NULL
# Define our date range
DateRange(myQuery) <- c("2016-01-01", "2017-12-31")
# Define our metrics and dimensions
Metrics(myQuery) <- "sessions"
Dimensions(myQuery) <- c("date", "dayOfWeekName", "hour")
# Let's allow a maximum of 20000 rows (default is 10000)
MaxResults(myQuery) <- 20000
myData <- GetGaData(myQuery)
nrow(myData)
## Let's use dplyr to analyse the data
library(dplyr)
# Sessions by day of week
sessions_by_dayOfWeek <- myData %>%
  count(dayOfWeekName, wt = sessions) %>%
  mutate(dayOfWeekName = factor(
    dayOfWeekName,
    levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
    labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
    ordered = TRUE
  )) %>%
  arrange(dayOfWeekName)
with(sessions_by_dayOfWeek, barplot(n, names.arg = dayOfWeekName, xlab = "day of week", ylab = "sessions"))
# Sessions by hour of day
sessions_by_hour <- myData %>%
  count(hour, wt = sessions)
with(sessions_by_hour, barplot(n, names.arg = hour, xlab = "hour", ylab = "sessions"))
# End
To run this example first install ggplot2 if you haven’t already.
install.packages("ggplot2")
Once installed, then run the following example.
library(ggplot2)
library(dplyr)
# Sessions by date and hour for the years 2016 and 2017:
# First let's clear any filters or segments defined previously
TableFilter(myQuery) <- NULL
Segments(myQuery) <- NULL
# Define our date range
DateRange(myQuery) <- c("2016-01-01", "2017-12-31")
# Define our metrics and dimensions
Metrics(myQuery) <- "sessions"
Dimensions(myQuery) <- c("date", "dayOfWeek", "hour", "deviceCategory")
# Let's allow a maximum of 40000 rows (default is 10000)
MaxResults(myQuery) <- 40000
myData <- GetGaData(myQuery)
# Sessions by hour of day and day of week
avg_sessions_by_hour_wday_device <- myData %>%
  group_by(hour, dayOfWeek, deviceCategory) %>%
  summarise(sessions = mean(sessions)) %>%
  ungroup()
# Relabel the days of week
levels(avg_sessions_by_hour_wday_device$dayOfWeek) <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
# Plot the summary data
qplot(
  x = hour,
  y = sessions,
  data = avg_sessions_by_hour_wday_device,
  facets = ~dayOfWeek,
  fill = deviceCategory,
  geom = "col"
)
# End
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Google Analytics and Google Tag Manager are trademarks of Google.
2018-06-23 Version type dimensions, e.g. ga:appVersion, are now coerced to numeric_version class, so that version numbers (e.g. ‘2.4.7’, ‘2.5.13’, ‘2.32.1’, etc.) can be correctly sorted and compared as if they were numeric values. Updated gademo.R. The dateRange class now inherits 'lubridate' interval as its superclass. It is now possible to query more than 10 metrics with just one query.
2018-05-30 Additional methods to coerce ganalytics segment classes and table filters for use with the 'googleAnalyticsR' package. Dynamic Segments objects now have a name property. Updated googleAnalyticsR-demo and examples in the readme file.
2018-02-18 Added methods for coercing a range of ganalytics classes into 'googleAnalyticsR' classes, so that ganalytics segments, filters and expressions can be used by the google_analytics function of the 'googleAnalyticsR' package.
2018-02-12 Scope and negation of segment conditions can now be defined at the filter level. Fixed bug where backslashes were being escaped incorrectly in expression operands. The methods of the Segment generic function are now split into two generic functions, Segment and Segments. Segments is used to set or get a named list segments, whereas Segment is for defining a single segment to be added to a Segments list.
2017-12-27 Support for %starts_with% and %ends_with% operators in addition to %contains% which was incorrectly described as %starts_with% previously. Added PerProduct method for setting metric scope for dynamic segmentation. ganalytics now searches the working directory for a json file containing the API client ID and secret supplied by Google Cloud Console credentials manager.
2015-12-19 Username (email address) for Google API authentication can now be set via a system environment variable with a key named [APPNAME]_USER, where [APPNAME] is 'GOOGLE_APIS' by default, e.g. GOOGLE_APIS_USER = <Your email address>. Updates to support recent API additions, including Management API view bot filtering flag.
2015-12-12 Ability to query more than 10 metrics by automatically joining API results by their dimensions. Added methods for NOTing one-of-in-list '' dimension and within-range '<>' metric expressions. Default date range has been pushed back 1 day to ensure complete data is returned by MCF queries.
2015-11-27 Support multiple segments being applied to a query.
2015-11-25 profileId argument for Query generators is deprecated.
2015-10-06  and [] operators now supported by Management API collection objects, so get a single entity resource from collection such as Accounts, Properties, Views, Goals, etc.
2015-09-25 gaGoal objects now include goal configuration details.
2015-09-20 Comparators are now generic functions with methods for supplying a .var (LHS) and .operand (RHS)
2015-09-18 PerUser and PerSession can now be used instead of SegmentFilters to create a scoped segment filter list. Also, Include and Exclude have been added to add include and exclude (i.e negate) filters to a segment definition, rather than needing to use the negate argument of the Sequence and SegmentConditionFilter functions.
Added function for generating a segment definition from a list where ... can be used to mean 'followed-by / Later' prior to the next step in the sequence. Note this function uses non-standard evaluation.
PerHit can be used to transform a condition filter into a sequence of length one, which offers a powerful form of segmentation where all conditions must be met for a single hit rather than scoped across sessions or users.
Expr can now be used with a formula denoted by the prefix ~. This uses non-standard evaluation so that variable names and condition operators do not need to be surrounded by quotation marks.
Added IsNegated generic function and method for testing whether a segment filter's negated slot is set to TRUE.
2015-09-16 Renamed segmentation functions: GaSequenceCondition -> Sequence; GaNonSequenceCondition -> SegmentConditionFilter; GaSegmentCondition -> SegmentFilters
2015-09-15 Renamed GaSequence to Sequence. Added PerUser, PerSession and PerHit generic functions for setting the scope of segment filters, and metrics conditions used within segments. Also, renamed GaScopeLevel and GaScopeLevel<- functions to ScopeLevel and ScopeLevel<- respectively. Sampling warnings are now more informative by notifying you of the total sample size and space with a sampling rate percentage too. Authentication credentials are remembered between commands without the need for the user to store them in a local variable.
2015-09-01 Added Domain Specific Language (DSL) functions utilising Non-standard Evaluation (NSE) for defining conditions and sequences.
2015-08-20 Renamed segmentation functions: GaStartsWith -> First, GaPreceeds -> Later, GaImmediatelyPreceeds -> Then. Renamed operator to comparator. Added functions to set scope of segment filters and segment metric expressions.
2015-08-17 Update to latest dimension and metrics metadata and added support for custom dimensions and device category as view filter fields. Changed default metric for real-time queries to rt:pageviews.
2015-08-14 Added demos. Added support for new alphanumeric segment IDs. Foundations to support multiple segments within a single query.
2015-06-05 Support the use of a 'lubridate' interval object as a dateRange object for GA Reporting API queries.
2015-05-04 Added support for real-time and multi-channel-funnels reporting APIs - both formulating queries and processing the query responses.
2015-05-02 Renaming of many functions by removing the Ga prefix in the name, with backwards compatibility for the old function names via aliases.
2015-04-26 Support for base R logical expression operators for defining GA query expressions. Added validity check for dimension and metric names of MCF and RT expressions.
2015-04-06 Added function for setting or getting the scope level of segments and expressions.
2015-04-05 Added support for dateOfSession dimension when used for segmentation. Suggest valid dimension and metric names to the user if a partial match is found. Automatic handling of date formatting for API requests and responses.
2015-04-04 Added support for list and range comparator operators when used for segment expressions.
2015-03-22 Ability to update and delete existing resource entities such as user links.
2015-03-21 Query view filter definitions. Ability to insert new resource entities where supported by the Management API, e.g. adding new user links. Also, ability to query definitions of custom dimensions and metrics via the management API.
2015-02-12 Ability to query user permissions for accounts, properties and views via the Management API.
2015-02-09 Added Google Tag Manager classes and methods.
2015-01-29 Extend GaSegment methods to accept a gaUserSegment class object in addition to already accepted expressions and segment IDs.
2015-01-26 Automatically select view from a given gaProperty or gaAccount class object. Extend GaQuery methods to accept a gaView class object in addition to already accepted view IDs. Ability to request user defined segments via the Management API. If no OAuth app creds provided, then use JSON file in current directory called ".app_oauth_creds.json" if exists.
2015-01-25 Query the Google Tag Manager API.
2015-01-12 Values for various class properties defined as factors with appropriate levels.
2015-01-11 Ability to automatically select the default view of a given property.
2014-12-27 Functions to retrieve details about available Google Analytics accounts, properties and views that can be queried.
2014-12-20 Implemented exponential back off algorithm to improve reliability of fetching reporting API data in case of intermittent network outages.
2014-11-22 Optionally supply a username to use for the OAuth2.0 user authentication dance with Google.
2014-11-21 Optionally supply a JSON file from the Google APIs console that contains the client ID and secret to use for OAuth2.0 authentication.
2014-09-30 Ability to negate a segment expression using R's NOT (!) operator.
2014-09-16 Include sample size and sample space as attributes in the returned dataframe from a reporting API request.
2014-08-09 Support for defining unified segment expressions. Added optional argument to set the sampling level of a query.
2014-06-21 Query from multiple views with the responses bound together.
2014-06-05 Upgrade to using Meta Data API for updating available dimensions and metrics.
2014-06-04 Warning given for queries resulting in a sampled report being returned.
2014-05-23 Upgrade to OAuth2.0 functionality built into httr.
2013-09-16 Implemented function to split a date range into N or daily increments.
2013-06-10 Automate update of available dimensions and metrics. Abstraction of Google APIs request as a generalised function.
2013-05-31 Implemented OAuth2.0 reference classes.
2013-05-25 Initial version released via GitHub