Jane Austen's Complete Novels

Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".


An R Package for Jane Austen's Complete Novels

“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”

(from Mr. Tilney in Northanger Abbey)

This package provides access to the full texts of Jane Austen's 6 completed, published novels. The UTF-8 plain text for each novel was sourced from Project Gutenberg, lightly processed, and is ready for text analysis. Each text is stored as a character vector whose elements are lines of about 70 characters. The package contains:

  • sensesensibility: Sense and Sensibility, published in 1811
  • prideprejudice: Pride and Prejudice, published in 1813
  • mansfieldpark: Mansfield Park, published in 1814
  • emma: Emma, published in 1815
  • northangerabbey: Northanger Abbey, published posthumously in 1818
  • persuasion: Persuasion, also published posthumously in 1818
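As a rough sketch of what working with these objects looks like, the snippet below inspects two of the vectors listed above. It assumes the package is installed and uses only base R functions; no output is shown because exact line counts depend on the package version.

```r
library(janeaustenr)

# Each novel is a plain character vector, one element per line of text
class(emma)

# Number of lines in one novel, and a peek at its opening lines
length(prideprejudice)
head(prideprejudice, 10)

# Distribution of line widths, to check the "about 70 characters" description
summary(nchar(prideprejudice))
```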

There is also a function austen_books() that returns a tidy data frame of all 6 novels.
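A minimal sketch of calling austen_books() and looking at the result with base R. The column names used here (text and book) are what version 0.1.5 returns as far as I know, so treat them as an assumption and check names(books) in your own session.

```r
library(janeaustenr)

# One row per line of text, tagged with the novel it came from
books <- austen_books()

names(books)          # column names of the data frame
head(books$text, 5)   # first few lines of the combined text

# The book column is a factor; its levels follow publication order
levels(books$book)

# Number of lines contributed by each novel
table(books$book)
```

Because the novels are stacked into one data frame, grouping by the book column is usually the first step in any per-novel comparison.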

Users should be aware that there are some differences in usage between the novels as made available by Project Gutenberg. For example, "anything" vs. "any thing", "Mr" vs. "Mr.", and using underscores vs. all caps to indicate italics/emphasis.
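One way to smooth over some of these differences before counting words is a small cleanup function. The helper below, normalize_austen(), is hypothetical and not part of the package; the sample lines are made up for illustration rather than quoted from the novels. It works line by line, so emphasis that spans a line break is left alone, and all-caps emphasis is not touched because it cannot be told apart reliably from other capitalised text.

```r
# Hypothetical helper (not provided by janeaustenr): harmonises a few of the
# usage differences between the Project Gutenberg texts.
normalize_austen <- function(x) {
  x <- gsub("\\bany thing\\b", "anything", x, perl = TRUE)  # "any thing" -> "anything"
  x <- gsub("\\bMr\\.", "Mr", x, perl = TRUE)               # "Mr." -> "Mr"
  x <- gsub("_([^_]+)_", "\\1", x, perl = TRUE)             # _emphasis_ -> emphasis
  x
}

# Made-up sample lines, not quotations from the novels
sample_lines <- c(
  "I never heard any thing so abominable.",
  "Mr. Darcy is _not_ amused.",
  "Mr Knightley said nothing."
)

cleaned <- normalize_austen(sample_lines)
cleaned
```

Which spellings to standardise on is a judgment call; the point is only to apply the same choice to all six novels before comparing them.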


To install the package, type the following:
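Assuming the package is still published on CRAN as janeaustenr, the standard installer call should be enough:

```r
# Install the released version from CRAN, then attach it
install.packages("janeaustenr")
library(janeaustenr)
```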


Or you can install the development version from GitHub:
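A sketch of one common way to do this, assuming the repository lives at juliasilge/janeaustenr (the path used by the issue tracker link on this page) and that the remotes package is used as the installer helper; devtools::install_github() takes the same argument.

```r
# Assumption: remotes is the installer helper; devtools works the same way
install.packages("remotes")
remotes::install_github("juliasilge/janeaustenr")
library(janeaustenr)
```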


How to Use This Package

For some ideas on getting started with analyzing these texts, see my blog post on sentiment analysis of Austen's novels. For help within R, try ?persuasion or similar for getting started with the data sets.
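For finding your way around from the R console, something like the following should work, assuming the package is installed. It uses only base R help utilities.

```r
library(janeaustenr)

# Open the help page for one of the data sets (equivalent to ?persuasion)
help("persuasion", package = "janeaustenr")

# List every data set the package ships, with its title
datasets <- data(package = "janeaustenr")$results[, c("Item", "Title")]
datasets

# Package-level index of documented objects, including austen_books()
help(package = "janeaustenr")
```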

This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


janeaustenr 0.1.5

  • Fixed encoding for Mansfield Park
  • Added package to calls to data objects, since they are lazy-loaded and not in the namespace

janeaustenr 0.1.4

  • Actually fixed factor order in austen_books function to align with publication order

janeaustenr 0.1.3

  • Attempted to fix factor order in austen_books function to align with publication order (made an error in this)
  • Added unit test to check output of austen_books

janeaustenr 0.1.2

  • Moved dplyr to Suggests; change implementation of austen_books to use base functions thanks to Jeroen Ooms

janeaustenr 0.1.1

  • Added a NEWS.md file to track changes to the package.
  • Added some details on usage differences between novels to README
  • Replaced all data files with new versions to solve problem of formatting change at 10000 lines

janeaustenr 0.1.0

  • Initial release of full texts of Jane Austen's 6 completed, published novels



0.1.5 by Julia Silge, 2 years ago


Report a bug at https://github.com/juliasilge/janeaustenr/issues

Browse source code at https://github.com/cran/janeaustenr

Authors: Julia Silge [aut, cre]


License: MIT + file LICENSE

Suggests dplyr, testthat

Imported by tidytext.

Suggested by hunspell, sparklyr, widyr.
