An implementation of calls designed to extract and organize Twitter data via Twitter's REST and stream APIs. Functions formulate and send API requests, convert response objects to more user-friendly data structures (e.g., data frames), and provide some aesthetically pleasing visualizations for exploring the data.
R client for collecting data via Twitter's REST and stream APIs.
NEW (dev version on GitHub): Out-of-the-box functionality! Start using rtweet the moment you install the package. Limited authorization access is provided for users looking to test-drive the package before obtaining and using access tokens.
Tweet from your R console using the
Stream a random sample of tweets using stream_tweets(). The function default, q = "", now streams a random sample of all tweets.
Save as CSV: If you'd like to open Twitter data in Excel or SPSS, use the
Gather tweet data by searching past tweets with search_tweets(), streaming live tweets with stream_tweets(), collecting tweets from a user's timeline with get_timeline(), or gathering all the tweets favorited by a user.
Gather user data by looking up Twitter users with lookup_users(). Easily return data on thousands of users.
Gather followers and friends data by collecting the IDs of accounts following a user with get_followers() or the IDs of accounts followed by a user.
Organized and easily translatable data formats. Functions return tidy data frames ready for data analysis.
To get the current released version from CRAN:
To get the current development version from GitHub:
Quick authorization method: To make your life easier, follow the recommended steps in obtaining and using access tokens. However, for a quick start (note: much slower in the long term), you can also follow the instructions below.
First, you'll need to create a Twitter app. For the callback field, make sure to enter:
Once you've created an app, record your consumer (API) key and consumer secret.
Generate a token by using the create_token() function:
twitter_token <- create_token(
  app = "rtweet_tokens", # whatever you named app
  consumer_key = "XZgqotgOZNKlLFJqFbd8NjUtL",
  consumer_secret = "1rDnU3H3nrxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
)
Pass twitter_token every time you use a data retrieval function, like the example below:
tw <- search_tweets("r", n = 1200, token = twitter_token, lang = "en")
More technical description: An implementation of calls designed to extract and organize Twitter data via Twitter's REST and stream APIs. Functions formulate GET and POST requests and convert response objects to more user-friendly structures, e.g., data frames or lists. Specific consideration is given to functions designed to return tweets, friends, and followers.
Email me at firstname.lastname@example.org
Data Analysis Helpers
- Network analysis matrices and edge list data structures
- Text cleaner/utility functions
- Database management (SQL) integration for big data
get_retweeters(): Retrieve users retweeting a status (in progress)
get_list(): Retrieve users in list
ts_plot to enable different filtered time series and an aesthetic overhaul of the plot function as well.
as_double argument to provide flexibility in handling id variables (as_double provides performance boost but can create problems when printing and saving, depending on format). By default functions will return IDs as character vectors.
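To illustrate the trade-off behind as_double, here is a minimal base R sketch. It makes no rtweet calls, and the ID value is made up for illustration. It shows how an ID stored as a double prints in scientific notation under R's default options, while the character form is kept verbatim.

```r
# A made-up 16-digit ID, used purely for illustration.
id_chr <- "1234567890123456"
id_dbl <- as.numeric(id_chr)

# Character IDs print and save exactly as received.
print(id_chr)

# Doubles print in scientific notation under R's default options,
# which makes them awkward to read or to write out as text.
print(id_dbl)

# Requesting fixed notation recovers the full ID for display or export.
id_fixed <- format(id_dbl, scientific = FALSE)
print(id_fixed)
```

If IDs are kept as doubles, formatting them explicitly like this before printing or saving is one way to avoid the shortened display.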
clean_tweets argument provided to allow user more control over encoding and handling of non-ASCII characters.
search_users and implemented several improvements to
token, rtweet will find it.
include_retweets arg added to
user_id class changed to double when parsed. Double is significantly faster and consumes less space. It's also capable of handling the length of id scalars, so the only downside is truncated printing.
stream_tweets() function. By default, the streaming query argument, q, is now set to an empty string, q = "", which returns a random sample of all Tweets (pretty cool, right?).
post_tweet() function. Users can now post tweets from their R console.
lookup_statuses() function, which is the counterpart to lookup_users(). Supply a vector of status IDs and return tweet data for each status. lookup_statuses() is particularly powerful when combined with other methods designed to collect older Tweets. Early experiments with doing this all through R have turned out surprisingly well, but packaging it in a way that makes it easy to do on other machines is unlikely to happen in the short term.
Removed dplyr dependencies. Everyone should install and use dplyr, but for the sake of parsimony, it's been removed from rtweet.
Continued development of S4 classes and methods. Given removal of dplyr dependencies, I've started to integrate print/show methods that will limit the number of rows (and width of columns) when printed. Given the amount of data returned in a relatively short period of time, printing entire data frames quickly becomes headache-inducing.
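As a rough illustration of the idea of a print method that limits rows and column width, here is a minimal base R sketch. It uses an S3 method rather than S4 for brevity, and the class name tweets_preview, the default limits, and the demo data are all hypothetical; it is not the package's actual implementation.

```r
# Hypothetical class name and limits, chosen only for this sketch.
print.tweets_preview <- function(x, n = 5, width = 20, ...) {
  # Keep only the first n rows, as a plain data frame.
  shown <- head(as.data.frame(x), n)
  # Truncate long character columns so each row fits on screen.
  for (col in names(shown)) {
    if (is.character(shown[[col]])) {
      shown[[col]] <- substr(shown[[col]], 1, width)
    }
  }
  print(shown)
  hidden <- nrow(x) - nrow(shown)
  if (hidden > 0) cat("... with", hidden, "more rows\n")
  invisible(x)
}

# Made-up data standing in for a large tweets data frame.
demo <- data.frame(
  status_id = as.character(1:8),
  text = rep("a fairly long tweet text that would otherwise run wide", 8),
  stringsAsFactors = FALSE
)
class(demo) <- c("tweets_preview", class(demo))

printed <- capture.output(print(demo))
cat(printed, sep = "\n")
```

Only the first five rows are shown, with the text column cut to 20 characters and a final line reporting how many rows were left out.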
Added new trends functions. Find what trending locations are
trends_available() and/or search for trends
worldwide or by geographical location using
Stability improvements including integration with Travis CI and code analysis via codecov. Token encryption method also means API testing conducted on multiple machines and systems.
search_users() function! Search for users by keyword, name, or interest and return data on the first 1000 hits.
get_timeline() now consists of tweets data and contains a users data attribute.
lookup_users() now consists of users data and contains a tweets data attribute.
To access users data from a tweets object or vice versa, use tweets_data() functions on objects output by major rtweet retrieval functions.
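The pattern of one data frame carrying the other as an attribute can be sketched in base R. This is a minimal sketch under stated assumptions: the data is made up, the attribute name "users" is assumed, and get_users_attr() is a hypothetical helper, not an rtweet function.

```r
# Made-up stand-ins for parsed API output (not real Twitter data).
tweets <- data.frame(
  status_id = c("1", "2"),
  text = c("first tweet", "second tweet"),
  stringsAsFactors = FALSE
)
users <- data.frame(
  user_id = "42",
  screen_name = "example_user",
  stringsAsFactors = FALSE
)

# Attach the secondary data frame as an attribute of the primary one.
# The attribute name "users" is an assumption for this sketch.
attr(tweets, "users") <- users

# Hypothetical accessor that reads the attribute back out.
get_users_attr <- function(x) attr(x, "users")

users_from_tweets <- get_users_attr(tweets)
print(users_from_tweets)
```

The tweets object still behaves like an ordinary data frame, and the users data travels with it until it is asked for.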
Updated testthat tests
get_followers() is now a tibble
of "ids". To retrieve next cursor value, use new
Major stability improvements via testthat tests for every major function.
Since previous CRAN release, numerous new features and improvements to functions returning tweets, user data, and ids.
Search function now optimized to return more tweets per search.
Numerous improvements to stability, error checks, and namespace management.
get_followers. Returns list with value (next_cursor) used for next page of results. When this value is 0, all results have been returned.
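The cursor logic described above can be sketched as a loop that keeps requesting pages until next_cursor is 0. In this minimal base R sketch, fetch_followers_page() is a hypothetical stand-in that returns canned pages instead of calling the API, and the starting cursor of -1 is an assumption.

```r
# Hypothetical stand-in for a followers request: returns one page of ids
# plus the cursor for the next page (0 when nothing is left).
fetch_followers_page <- function(cursor) {
  pages <- list(
    "-1" = list(ids = c("101", "102"), next_cursor = 7),
    "7"  = list(ids = c("103", "104"), next_cursor = 9),
    "9"  = list(ids = "105", next_cursor = 0)
  )
  pages[[as.character(cursor)]]
}

collect_all_followers <- function(start_cursor = -1) {
  ids <- character(0)
  cursor <- start_cursor
  repeat {
    page <- fetch_followers_page(cursor)
    ids <- c(ids, page$ids)
    cursor <- page$next_cursor
    # A cursor of 0 signals that all results have been returned.
    if (cursor == 0) break
  }
  ids
}

all_ids <- collect_all_followers()
print(all_ids)
```

With a real request function in place of the stand-in, the same loop would also need to respect rate limits between pages.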
get_followers now return the list
of user ids as a tibble data table, which makes the print out much
Improved scrolling methods such that
get_timeline should return a lot more now
parser function to return status (tweets) AND user (users) data frames when available. As a result, the parsed output for some functions now comes as a list containing two data frames.
get_timeline function that returns tweets from selected user
Added vignettes covering tokens and search tweets
Fixed issue with count argument in search and user functions
Fixed parsing issue for return objects with omitted variables
clean_tweets convenience function for text analysis
More examples included in documentation.
recode_error argument to
get_friends function. This is
especially useful for tracking networks over time.
ROAuth methods/objects to increase
Improved token checking procedures.
key features and more descriptions to
There are now two stable parse (convert JSON object to data frame)
types. For user objects (e.g., output of
parse_user. For tweet objects (e.g., output of
stream_tweets), there is
New parse functions are now exported, so they should be available for use with compatible Twitter packages or user-defined API request operations.
More parsing improvements
Various stability improvements