An implementation of calls designed to collect and organize Twitter data via Twitter's REST and stream Application Program Interfaces (API), which can be found at the following URL: < https://developer.twitter.com/en/docs>. This package has been peer-reviewed by rOpenSci (v. 0.6.9).
R client for accessing Twitter’s REST and stream APIs. Check out the rtweet package documentation website.
{rtweet}} should be used in strict accordance with Twitter’s developer terms.
To get the current released version from CRAN:
## install rtweet from CRANinstall.packages("rtweet")## load rtweet packagelibrary(rtweet)
To get the current development version from Github:
## install devtools package if it's not alreadyif (!requireNamespace("devtools", quietly = TRUE)) {install.packages("devtools")}## install dev version of rtweet from githubdevtools::install_github("mkearney/rtweet")## load rtweet packagelibrary(rtweet)
All you need is a Twitter account and you can be up in running in minutes!
auth
vignette (or the API authorization section below) for
instructions on obtaining access to Twitter’s APIs:
https://rtweet.info/articles/auth.html.All users must be authorized to interact with Twitter’s APIs. To become authorized, follow the instructions below to (1) make a Twitter app and (2) create and save your access token (using one of the two authorization methods described below).
Callback URL
exactly as it appears below):
Name
: Name of Twitter app e.g., my_twitter_research_app
Description
: Describe use case e.g., for researching trends and behaviors on twitter
Website
: Valid website e.g.,
https://twitter.com/kearneymw
***Callback URL***
: http://127.0.0.1:1410
Go to your app’s page at
apps.twitter.com and click the tab
labeled Keys and Access Tokens
Copy the Consumer Key
and Consumer Secret
values and
pass them, along with the name of your app, to the create_token()
function
## web browser method: create token and save it as an environment variablecreate_token(app = "my_twitter_research_app",consumer_key = "XYznzPFOFZR2a39FwWKN1Jp41",consumer_secret = "CtkGEWmSevZqJuKl6HHrBxbCybxI1xGLqrD5ynPd9jG0SoHZbD")
Go to your app’s page at
apps.twitter.com and click the tab
labeled Keys and Access Tokens
Scroll down to Token Actions
and click Create my access token
Copy the Consumer Key
, Consumer Secret
, Access Token
, and Access Token Secret
values and pass them, along
with the name of your app, to the create_token()
function
## access token method: create token and save it as an environment variablecreate_token(app = "my_twitter_research_app",consumer_key = "XYznzPFOFZR2a39FwWKN1Jp41",consumer_secret = "CtkGEWmSevZqJuKl6HHrBxbCybxI1xGLqrD5ynPd9jG0SoHZbD",access_token = "9551451262-wK2EmA942kxZYIwa5LMKZoQA4Xc2uyIiEwu2YXL",access_secret = "9vpiSGKg1fIPQtxc5d5ESiFlZQpfbknEN1f1m2xe5byw7")
And that’s it! You’re ready to start collecting and analyzing Twitter
data! And because create_token()
automatically saves your token as an
environment variable, you’ll be set for future sessions as well!
Search for up to 18,000 (non-retweeted) tweets containing the rstats hashtag.
## search for 18000 tweets using the rstats hashtagrt <- search_tweets("#rstats", n = 18000, include_rts = FALSE)
Quickly visualize frequency of tweets over time using ts_plot()
.
## plot time series of tweetsts_plot(rt, "3 hours") +ggplot2::theme_minimal() +ggplot2::theme(plot.title = ggplot2::element_text(face = "bold")) +ggplot2::labs(x = NULL, y = NULL,title = "Frequency of #rstats Twitter statuses from past 9 days",subtitle = "Twitter status (tweet) counts aggregated using three-hour intervals",caption = "\nSource: Data collected from Twitter's REST API via rtweet")
Twitter rate limits cap the number of search results returned to 18,000
every 15 minutes. To request more than that, simply set
retryonratelimit = TRUE
and rtweet will wait for rate limit resets for
you.
## search for 250,000 tweets containing the word datart <- search_tweets("data", n = 250000, retryonratelimit = TRUE)
Search by geo-location—for example, find 10,000 tweets in the English
language sent from the United States. Note: lookup_coords()
requires
users have a Google API
key
## search for 10,000 tweets sent from the USrt <- search_tweets("lang:en", geocode = lookup_coords("usa"), n = 10000)## create lat/lng variables using all available tweet and profile geo-location datart <- lat_lng(rt)## plot state boundariespar(mar = c(0, 0, 0, 0))maps::map("state", lwd = .25)## plot lat and lng points onto state mapwith(rt, points(lng, lat, pch = 20, cex = .75, col = rgb(0, .3, .7, .75)))
Randomly sample (approximately 1%) from the live stream of all tweets.
## random sample for 30 seconds (default)rt <- stream_tweets("")
Stream all geo enabled tweets from London for 60 seconds.
## stream tweets from london for 60 secondsrt <- stream_tweets(lookup_coords("london, uk"), timeout = 60)
Stream all tweets mentioning realDonaldTrump or Trump for a week.
## stream london tweets for a week (60 secs x 60 mins * 24 hours * 7 days)stream_tweets("realdonaldtrump,trump",timeout = 60 * 60 * 24 * 7,file_name = "tweetsabouttrump.json",parse = FALSE)## read in the data as a tidy tbl data framedjt <- parse_stream("tweetsabouttrump.json")
Retrieve a list of all the accounts a user follows.
## get user IDs of accounts followed by CNNcnn_fds <- get_friends("cnn")## lookup data on those accountscnn_fds_data <- lookup_users(cnn_fds$user_id)
Retrieve a list of the accounts following a user.
## get user IDs of accounts following CNNcnn_flw <- get_followers("cnn", n = 75000)## lookup data on those accountscnn_flw_data <- lookup_users(cnn_flw$user_id)
Or if you really want ALL of their followers:
## how many total follows does cnn have?cnn <- lookup_users("cnn")## get them all (this would take a little over 5 days)cnn_flw <- get_followers("cnn", n = cnn$followers_count, retryonratelimit = TRUE)
Get the most recent 3,200 tweets from cnn, BBCWorld, and foxnews.
## get user IDs of accounts followed by CNNtmls <- get_timelines(c("cnn", "BBCWorld", "foxnews"), n = 3200)## plot the frequency of tweets for each user over timetmls %>%dplyr::filter(created_at > "2017-10-29") %>%dplyr::group_by(screen_name) %>%ts_plot("days", trim = 1L) +ggplot2::geom_point() +ggplot2::theme_minimal() +ggplot2::theme(legend.title = ggplot2::element_blank(),legend.position = "bottom",plot.title = ggplot2::element_text(face = "bold")) +ggplot2::labs(x = NULL, y = NULL,title = "Frequency of Twitter statuses posted by news organization",subtitle = "Twitter status (tweet) counts aggregated by day from October/November 2017",caption = "\nSource: Data collected from Twitter's REST API via rtweet")
Get the 3,000 most recently favorited statuses by JK Rowling.
jkr <- get_favorites("jk_rowling", n = 3000)
Search for 1,000 users with the rstats hashtag in their profile bios.
## search for users with #rstats in their profilesusrs <- search_users("#rstats", n = 1000)
Discover what’s currently trending in San Francisco.
sf <- get_trends("san francisco")
Obtaining and using Twitter API tokens
## quick overview of rtweet functionsvignette("auth", package = "rtweet")
Quick overview of rtweet package
## quick overview of rtweet functionsvignette("intro", package = "rtweet")
## working with the streamvignette("stream", package = "rtweet")
Troubleshooting common rtweet problems
## working with the streamvignette("FAQ", package = "rtweet")
Communicating with Twitter’s APIs relies on an internet connection, which can sometimes be inconsistent. With that said, if you encounter an obvious bug for which there is not already an active issue, please create a new issue with all code used (preferably a reproducible example) on Github.
lookup_coords()
now requires a Google Maps API key. It will be stored for
easy future use once supplied.bearer_token()
option for access to more generous rate limits.create_token()
when using browse-based authentication
method.post_list()
, which now allows users
to create and populate lists as well as delete lists on behalf of one's own
Twitter account.lists_memberships()
and now scrolls through multiple pages of results to
automate collection of larger numbers of lists.create_token()
which allows for creation of token
non-interactive sessions via accepting inputs for consumer key, consumer
secret (always required), oauth key, and oauth secret (optional, if supplied
then non-browser sign method is used).ts_*()
functions now offer a tz
(timezone) argument, allowing users to
more easily print and plot in non-UTC time.destroy_id
argument in post_tweet()
join_rtweet()
, which omitted users who didn't have
available tweets.all_suggested_users()
, which automates the collection of Twitter's
suggested users data.save_as_csv()
, including addition of new
prep_as_csv()
as convience function for flattening Twitter data frames.save_as_csv()
with improved flattening and ID-preserving
saving methods. THe function now saves a single [joined] data set as
well.get_favorites()
and in several lists_*()
functions.stream_tweets()
stream_tweets2()
function for more robust streaming
method. Streams JSON files to directory and reconnects following
premature disruptions.get_timeline()
, get_favorites()
, get_friends()
, and
get_followers()
now accept vectors of length > 1.users_data()
stream_data()
, designed to parse files that cannot
wholely fit into memory. stream_data()
can now work in parallel as well.post_status()
function has been fixed and can now be used to
upload media.mentions_screen_name
may consist of
4 screen names).flatten()
function.
Exporting functions such as save_as_csv
will apply flatten by default.coords_coords
,
geo_coords
, and bbox_coords
bounding box. The first two come in
pairs of coords (a list column) and bbox_coords comes with 8
values (longX4 latX4). This should allow users to maximize returns
on
geo-location data.plain_tweets()
added for textual analysis.ts_plot()
with improved time-aggregating method. Now a
wrapper around ts_data()
, deprecating ts_filter
.mutate_coords()
and retryonratelimit
.ts_plot()
, ts_filter()
and more!post_
. This was done to clearly distinguish write functions from
retrieval functions.ts_plot()
function is now more robust with more adaptive
characteristics for variations in the number of filters, the method
of distiguishing lines, the position of the legend, and the
aesthetics of the themes.ts_filter()
function which allows users to convert Twitter
data into a time series-like data frame. Users may also provide
filtering rules with which ts_filter()
will subset the data as it
converts it to multiple time series, which it then outputs as a
long-form (tidy) data frame.search_tweets
now includes retryonratelimit
argument to
allow for searches requesting more than 18,000 tweets. This
automates what was previously possible through use of max_id
.stream_tweets
.parse.piper
functionts_plot
to enable different filtered time series and
an aesthetic overhaul of the plot function as well.as_double
argument to provide flexibility in handling
id variables (as_double provides performance boost but can create
problems when printing and saving, depending on format). By default
functions will return IDs as character vectors.clean_tweets
argument provided to allow user more control over
encoding and handling of non-ascii characters.search_users
and implemented several
improvements to stream_tweets
and plot_ts
.twitter_tokens
, twitter_token
, tokens
, or token
, rtweet
will find it.search_tweets
and stream_tweeets
include_retweets
arg added to search_tweets()
function.user_id
class changed to double when parsed. double is significantly
faster and consumes less space. it's also capable of handling the length of
id scalars, so the only downside is truncated printing.stream_tweets()
stream_tweets()
function. By default,
the streaming query argument, q
, is now set to an empty string,
q = ""
, which returns a random sample of all Tweets
(pretty cool, right?).post_tweet()
function. Users can now post tweets from their R console.get_favorites()
functionAdded lookup_statuses()
function, which is the counterpart to
lookup_users()
. Supply a vector of status IDs and return tweet data
for each status. lookup_statuses()
is particularly powerful when
combined with other methods designed to collect older Tweets. Early
experiments with doing this all through R have turned out surprisingly
well, but packaging it in a way that makes it easy to do on other
machines is unlikely to happen in the short term.
Removed dplyr dependencies. Everyone should install and use dplyr
,
but for sake of parsimony, it's been removed from rtweet.
Continued development of S4 classes and methods. Given removal of dplyr dependencies, I've started to integrate print/show methods that will limit the number of rows (and width of columns) when printed. Given the amount of data returned in a relatively short period of time, printing entire data frames quickly becomes headache-inducing.
Added new trends functions. Find what trending locations are
available with trends_available()
and/or search for trends
worldwide or by geogaphical location using get_trends()
.
Stability improvements including integration with Travis CI and code analysis via codecov. Token encryption method also means API testing conducted on multiple machines and systems.
search_users()
function! Search for users by keyword,
name, or interest and return data on the first 1000 hits.Output for search_tweets()
, stream_tweets()
, and
get_timeline()
now consists of tweets data and contains users data
attribute.
Output for lookup_users()
now consists of users data and contains
tweets data attribute.
To access users data from a tweets object or vice-versa, use
users_data()
and tweets_data()
functions on objects outputed
by major rtweet retrieval functions.
Updated testthat tests
Output for get_friends()
and get_followers()
is now a tibble
of "ids". To retrieve next cursor value, use new next_cursor()
function.
Major stability improvements via testthat tests for every major function.
Since previous CRAN release, numerous new features and improvements to functions returning tweets, user data, and ids.
Search function now optimized to return more tweets per search.
Numerous improvements to stability, error checks, and namespace management.
Improvements to get_friends
and get_followers
. Returns list
with value (next_cursor
) used for next page of results. When
this value is 0, all results have been returned.
Functions get_friends
and get_followers
now return the list
of user ids as a tibble data table, which makes the print out much
cleaner.
Improved scrolling methods such that search_tweets
and
get_timeline
should return a lot more now
Added parser
function to return status (tweets) AND user (users)
data frames when available. As a result, the parsed output for some
functions now comes as a list containing two data frames.
Added get_timeline
function that returns tweets from selected user
Added vignettes covering tokens and search tweets
Fixed issue with count
argument in search and user functions
Fixed parsing issue for return objects with omitted variables
Added clean_tweets
convenience function for text analysis
More examples included in documentation.
Added recode_error
argument to get_friends
function. This is
especially useful for tracking networks over time.
Further integrated ROAuth
methods/objects to increase
compatibility with twitteR
authorization procedures.
Improved token checking procedures.
Added NEWS.md
file
Added key features
and more descriptions to README.md
.
There are now two stable parse (convert json obj to data frame)
types. For user objects (e.g., output of lookup_users
), there
is parse_user
. For tweet objects (e.g., output of search_tweets
or stream_tweets
), there is parse_tweets
.
New parse functions are now exported, so they should available for use with compatible Twitter packages or user-defined API request operations.
More parsing improvements
Added format_date
function
Various stability improvements