Students learning both econometrics and R may find the introduction to both challenging. The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have been compressed to a fraction of their original size. Documentation files contain page numbers, the original source, time of publication, and notes from the author suggesting avenues for further analysis and research. If one needs an introduction to linear model syntax, a vignette contains R solutions to examples from each chapter of the text. Data sets are from the 7th edition (Wooldridge 2020, ISBN-13: 978-1-337-55886-0), and are backwards compatible with all versions of the text.
Economics students new to both econometrics and R may find the introduction to both challenging. However, if their text is "Introductory Econometrics: A Modern Approach, 6e" by Jeffrey M. Wooldridge, they are in luck!
The wooldridge data package aims to lighten the task by easily loading any data set from the text. The package contains full documentation for every data set and all data have been compressed to a fraction of their original size. Just install the package, load it, and call the data you wish to work with.
But wait...there's more! A vignette, Introductory Econometrics Examples, illustrates solutions to examples from each chapter of the text, offering a relevant introduction to econometric modelling with R. The vignette also includes an Appendix of helpful resources, such as Using R for Introductory Econometrics by Florian Heiss.
While the original course companion site provides publicly available data sets for the commercial packages EViews, Excel, and Stata, this package is the official open source option for R. Using R while building a foundation in econometric modelling not only saves learners a few units of currency, but also introduces them to software capable of scaling with the demands of modern statistical computing.
Note: All data sets are from the 6th edition (Wooldridge 2016, ISBN-13: 978-1-305-27010-7), which is compatible with all other editions.
Install wooldridge v1.3 directly from The Comprehensive R Archive Network (CRAN). The package contains all data sets from the 6th edition and depends on R >= 3.0.0.
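A minimal sketch of the installation step, assuming a working internet connection. The `requireNamespace()` guard and the explicit `repos` mirror are additions for illustration, so that re-running the snippet does not reinstall the package.

```r
# Install from CRAN only if the package is not already available.
# The mirror URL is an assumption; any CRAN mirror works.
if (!requireNamespace("wooldridge", quietly = TRUE)) {
  install.packages("wooldridge", repos = "https://cloud.r-project.org")
}
```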
Load the wooldridge package and use the data() function to bring the desired data set into the working environment. Data set names match those in the text. Once present in the working environment, modelling data is quick and easy, leaving learners with more time to focus on interpretation.
library(wooldridge)
data("wage1")
lm(lwage ~ educ + exper + tenure, data = wage1)
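To spend that extra time on interpretation, one option is to store the fitted model and inspect it. The sketch below assumes the package is installed; the object name `fit` is chosen here for illustration only.

```r
library(wooldridge)
data("wage1")

# Store the fitted model so it can be inspected afterwards.
fit <- lm(lwage ~ educ + exper + tenure, data = wage1)

# Coefficient table, standard errors, and R-squared.
summary(fit)

# Because the dependent variable is log(wage), 100 * coefficient
# approximates the percent change in wage for a one-unit change in a regressor.
round(100 * coef(fit)[-1], 1)
```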
It's always recommended that one read supporting documentation for data sets of interest. This becomes trivially easy with the
Documentation includes variable column names, original source of data, and page number(s) where data appear in the text.
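One way to reach that documentation, assuming the package is attached, is R's standard help system. The calls below are base R and are shown as a sketch rather than as the package author's prescribed workflow.

```r
library(wooldridge)

# Open the help page for a single data set.
?wage1

# Equivalent call that names the package explicitly.
help("wage1", package = "wooldridge")

# List every data set the package provides, with titles.
data(package = "wooldridge")
```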
Updated vignette to reflect breaking changes to the prais package API, in which the function prais.winsten became prais_winsten. Minor content changes in the DESCRIPTION file. Also, bumped the R dependency up to R (>= 3.2.0) to bring this package in line with the highest R requirement among the packages it depends on (prais).
Updated vignette to add new graphs through chapter 7. Also, removed tidy = TRUE markdown chunk parameters, as breaking changes appear to have been made which now require the installation of a separate package.
Removed chapter 17 examples for now, as glm(formula, family = poisson, data=crime1) is outputting an error which requires deeper exploration.
Update bibliography pages with current packages.
Added six additional data sets and documentation for the most recent edition, "Introductory Econometrics: A Modern Approach, 6th edition" (Wooldridge 2016, ISBN-13: 978-1-305-27010-7).
Updated tests accordingly.
Further compression of data sets by writing a function to delete unnecessary attributes attached to each data.frame. Excess attributes were assigned to each data.frame during the import from Stata .dta files. In addition, the row.names attribute was saved as a character. I converted the row.names of all data sets to integer, which reduced the size of the package as a whole.
Created a GitHub pages site for the package, and eliminated the .pdf vignette.
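The clean-up described above can be sketched in base R as follows. This is not the package's actual function: the helper name `strip_extra_attributes`, the toy data, and the attribute names `datalabel` and `formats` are hypothetical and serve only to illustrate the idea.

```r
# Hypothetical helper (not the package's actual function): drop every
# data.frame-level attribute except the three a plain data.frame needs.
strip_extra_attributes <- function(df) {
  keep <- c("names", "row.names", "class")
  for (a in setdiff(names(attributes(df)), keep)) {
    attr(df, a) <- NULL
  }
  # Assigning NULL resets row names to R's compact integer form.
  rownames(df) <- NULL
  df
}

# Toy data frame mimicking a Stata import; attribute names are illustrative.
raw <- data.frame(wage = c(3.10, 3.24, 3.00), educ = c(11L, 12L, 11L))
rownames(raw) <- c("1", "2", "3")            # stored as character strings
attr(raw, "datalabel") <- "toy example"
attr(raw, "formats") <- c("%9.0g", "%8.0g")

clean <- strip_extra_attributes(raw)

# Compare what each object carries and how much memory it uses.
names(attributes(raw))
names(attributes(clean))
c(raw = object.size(raw), clean = object.size(clean))
```

The column values are untouched; only metadata is removed, so any model fitted on `clean` gives the same results as one fitted on `raw`.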
Relaxed dependency from R (>= 3.4.0) to R (>= 3.0.0), as we no longer need to be concerned about build failures when knitting the .pdf vignette. Minor content edits to the Description section.
Added downloads by month badge. Updates reflecting changes above.
Name change to "Introductory Econometrics Examples".
In addition, I updated stargazer output to type = "html".
Finally, added forecast content to the chapter 18 example.
Added two additional tests on build. One test checks whether 105 data sets are present. The other tests whether each one loads correctly and is of class
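A rough sketch of what such checks could look like in base R, assuming the package is installed. The class tested here, data.frame, is an assumption, and the count is printed rather than asserted because it varies by package version; the package's real tests may differ.

```r
# Names of every data set shipped with the installed package.
items <- data(package = "wooldridge")$results[, "Item"]

# Load them all into a scratch environment so the workspace stays clean.
scratch <- new.env()
data(list = items, package = "wooldridge", envir = scratch)

# TRUE for each data set that loaded as a data.frame.
is_df <- vapply(items, function(nm) is.data.frame(scratch[[nm]]), logical(1))

length(items)   # number of data sets found
all(is_df)      # whether every one is a data.frame
```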
In previous versions, the documentation merely contained column variable names and dimensions of each data set. This version has been updated to include a plethora of additional information for 101 data sets. Updates include descriptions for each column variable, the original sources Wooldridge used to acquire each data set, detailed notes describing suggested analysis approaches, and page numbers for each data set located in the text. The source of this information comes from a .pdf file titled the "DATA SET HANDBOOK" by Jeffrey M. Wooldridge. I wrote a script which iteratively extracted its contents and inserted them into roxygen2 style .R files for each data set.
Fixed an error in chapter 6, misspelling the name of the data set called within the data function. The model example still worked due to lazy loading of data.
While I have an old affinity for the Farnsworth Econometrics document, it contains some outdated information. I removed it as it might do new learners more harm than good. In its place I added a citation for the book, "Applied Econometrics with R". While I previously cited the AER package, the book should go in the Appendix as a resource for those considering expanding their knowledge beyond this introductory text.
The newly acquired original descriptions of the data sets have also been added for every example, with the syntax for calling documentation on each one.
Added instructions to access the dev branch, which I pushed to GitHub.
I removed written descriptions of the example problems, putting more emphasis on viewing clean code. Short notes point to what function arguments are used or to various R packages. For descriptions of the examples and the rationale behind each problem, students should refer to their textbook, as the content offered by Wooldridge is quite clear.
A bibliography section has been added, including package citations and their authors.
An Appendix has been added, pointing readers to a few excellent sources for computing econometric models with R. These are "Econometrics in R" by Grant Farnsworth and "Using R for Introductory Econometrics" by Florian Heiss.
First version of wooldridge data package containing 105 data sets, documentation, and a vignette.