Last updated on 2020-12-20 by Achim Zeileis

Base R ships with a lot of functionality useful for computational econometrics, in particular in the stats package. This functionality is complemented by many packages on CRAN; a brief overview is given below. There is also considerable overlap between the tools for econometrics in this view and those in the task views on Finance, SocialSciences, and TimeSeries.

The packages in this view can be roughly structured into the following topics. If you think that some package is missing from the list, please contact the maintainer.

**Basic linear regression**

- *Estimation and standard inference*: Ordinary least squares (OLS) estimation for linear models is provided by `lm()` (from stats) and standard tests for model comparisons are available in various methods such as `summary()` and `anova()`.
- *Further inference and nested model comparisons*: Functions analogous to the basic `summary()` and `anova()` methods that also support asymptotic tests (*z* instead of *t* tests, and Chi-squared instead of *F* tests) and plug-in of other covariance matrices are `coeftest()` and `waldtest()` in lmtest. Tests of more general linear hypotheses are implemented in `linearHypothesis()` and for nonlinear hypotheses in `deltaMethod()` in car.
- *Robust standard errors*: HC, HAC, clustered, and bootstrap covariance matrices are available in sandwich and can be plugged into the inference functions mentioned above.
- *Nonnested model comparisons*: Various tests for comparing non-nested linear models are available in lmtest (encompassing test, J test, Cox test). The Vuong test for comparing other non-nested models is provided by nonnest2 (and specifically for count data regression in pscl).
- *Diagnostic checking*: The packages car and lmtest provide a large collection of regression diagnostics and diagnostic tests. In addition to these two packages, skedastic contains further diagnostics specifically for detecting heteroscedasticity.
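The workflow described in this section can be sketched as follows. This is a minimal illustration on simulated data, not a recommended analysis: the variable names (`x1`, `x2`, `fit_small`, `fit_full`) and the data-generating process are assumptions made up for the example. The robust-inference part only runs if lmtest and sandwich are installed.

```r
# Simulated data with heteroscedastic errors (purely illustrative).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                                  # irrelevant by construction
y  <- 1 + 2 * x1 + rnorm(n, sd = exp(0.5 * x1)) # error variance depends on x1
d  <- data.frame(y, x1, x2)

# OLS estimation with lm() and the standard summary()/anova() methods.
fit_small <- lm(y ~ x1, data = d)
fit_full  <- lm(y ~ x1 + x2, data = d)
print(summary(fit_full))
print(anova(fit_small, fit_full))               # classical F test for nested models

# Same inference with a heteroscedasticity-consistent covariance matrix
# plugged in (requires the contributed packages lmtest and sandwich).
if (requireNamespace("lmtest", quietly = TRUE) &&
    requireNamespace("sandwich", quietly = TRUE)) {
  print(lmtest::coeftest(fit_full, vcov. = sandwich::vcovHC))
  print(lmtest::waldtest(fit_small, fit_full,
                         vcov = sandwich::vcovHC, test = "Chisq"))
}
```

Passing the covariance estimator as a function (rather than a precomputed matrix) lets `coeftest()` and `waldtest()` evaluate it on the fitted model themselves.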

**Microeconometrics**

- *Generalized linear models (GLMs)*: Many standard microeconometric models belong to the family of generalized linear models and can be fitted by `glm()` from package stats. This includes in particular logit and probit models for modeling choice data and Poisson models for count data. Effects for typical values of regressors in these models can be obtained and visualized using effects. Marginal effects tables for certain GLMs can be obtained using the margins and mfx packages. Interactive visualizations of both effects and marginal effects are possible in LinRegInteractive.
- *Binary responses*: The standard logit and probit models (among many others) for binary responses are GLMs that can be estimated by `glm()` with `family = binomial`. Bias-reduced GLMs that are robust to complete and quasi-complete separation are provided by brglm. Discrete choice models estimated by simulated maximum likelihood are implemented in Rchoice. bife provides binary choice models with fixed effects. Heteroscedastic probit models (and other heteroscedastic GLMs) are implemented in glmx along with parametric link functions and goodness-of-link tests for GLMs.
- *Count responses*: The basic Poisson regression is a GLM that can be estimated by `glm()` with `family = poisson` as explained above. Negative binomial GLMs are available via `glm.nb()` in package MASS. Another implementation of negative binomial models is provided by aod, which also contains other models for overdispersed data. Zero-inflated and hurdle count models are provided in package pscl. A reimplementation by the same authors is currently under development in countreg on R-Forge, which also encompasses separate functions for zero-truncated regression, finite mixture models, etc.
- *Multinomial responses*: Multinomial models with individual-specific covariates only are available in `multinom()` from package nnet. Implementations with both individual- and choice-specific variables are in mlogit and mnlogit. Generalized multinomial logit models (e.g., with random effects) are in gmnl. A flexible framework of various customizable choice models (including multinomial logit and nested logit among many others) is implemented in the apollo package. Generalized additive models (GAMs) for multinomial responses can be fitted with the VGAM package. A Bayesian approach to multinomial probit models is provided by MNP. Various Bayesian multinomial models (including logit and probit) are available in bayesm. Furthermore, the package RSGHB fits various hierarchical Bayesian specifications based on direct specification of the likelihood function.
- *Ordered responses*: Proportional-odds regression for ordered responses is implemented in `polr()` from package MASS. The package ordinal provides cumulative link models for ordered data, which encompass proportional odds models but also include more general specifications. Bayesian ordered probit models are provided by bayesm.
- *Censored responses*: Basic censored regression models (e.g., tobit models) can be fitted by `survreg()` in survival; a convenience interface `tobit()` is in package AER. Further censored regression models, including models for panel data, are provided in censReg. Censored regression models with conditional heteroscedasticity are in crch. Furthermore, hurdle models for left-censored data at zero can be estimated with mhurdle. Models for sample selection are available in sampleSelection. Package matchingMarkets corrects for selection bias when the sample is the result of a stable matching process (e.g., a group formation or college admissions problem).
- *Truncated responses*: crch fits models for truncated (and potentially heteroscedastic) Gaussian, logistic, and t responses. Homoscedastic Gaussian responses are also available in truncreg.
- *Fraction and proportion responses*: Fractional response models are in frm. Beta regression for responses in (0, 1) is in betareg and gamlss.
- *Duration responses*: Many classical duration models can be fitted with survival, e.g., Cox proportional hazard models with `coxph()` or Weibull models with `survreg()`. Many more refined models can be found in the Survival task view. The Heckman and Singer mixed proportional hazard competing risk model is available in durmod.
- *High-dimensional fixed effects*: Linear models with potentially high-dimensional fixed effects, also for multiple groups, can be fitted by sgaure/lfe. The corresponding GLMs are covered in alpaca. Another implementation, based on C++ code and covering both OLS and GLMs, is in fixest.
- *Miscellaneous*: Further, more refined tools for microeconometrics are provided in the micEcon family of packages: Analysis with Cobb-Douglas, translog, and quadratic functions is in micEcon; the constant elasticity of substitution (CES) function is in micEconCES; the symmetric normalized quadratic profit (SNQP) function is in micEconSNQP. The almost ideal demand system (AIDS) is in micEconAids. Stochastic frontier analysis (SFA) is in frontier and certain special cases also in sfa. Semiparametric SFA is available in semsfa and spatial SFA in spfrontier and ssfa. The package bayesm implements a Bayesian approach to microeconometrics and marketing. Inference for relative distributions is contained in package reldist.
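As a rough illustration of the GLM-type and censored models listed above, the sketch below fits logit, probit, Poisson, negative binomial, and tobit-type models to simulated data. It only uses stats plus the recommended packages MASS and survival; all object names and true parameter values are assumptions invented for the example.

```r
library(MASS)      # recommended package; provides glm.nb()
library(survival)  # recommended package; provides survreg() and Surv()

set.seed(42)
n <- 500
x <- rnorm(n)

# Binary response: logit and probit via glm() with family = binomial.
y_bin      <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))
fit_logit  <- glm(y_bin ~ x, family = binomial)
fit_probit <- glm(y_bin ~ x, family = binomial(link = "probit"))

# Count response: Poisson regression via glm() with family = poisson.
y_cnt    <- rpois(n, lambda = exp(0.5 + 0.3 * x))
fit_pois <- glm(y_cnt ~ x, family = poisson)

# Overdispersed counts: negative binomial regression via MASS::glm.nb().
y_nb   <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1.5)
fit_nb <- glm.nb(y_nb ~ x)

# Left-censored response at zero (tobit-type model) via survreg().
y_star    <- 0.5 + 1 * x + rnorm(n)   # latent outcome
y_cens    <- pmax(y_star, 0)          # observed outcome, censored at 0
fit_tobit <- survreg(Surv(y_cens, y_cens > 0, type = "left") ~ x,
                     dist = "gaussian")

# Collect the slope estimates for a quick comparison.
print(c(logit  = coef(fit_logit)[["x"]],
        probit = coef(fit_probit)[["x"]],
        pois   = coef(fit_pois)[["x"]],
        negbin = coef(fit_nb)[["x"]],
        tobit  = coef(fit_tobit)[["x"]]))
```

The logit and probit slopes differ in scale because the two links imply different error distributions, so they should not be compared directly.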

**Instrumental variables**

- *Basic instrumental variables (IV) regression*: Two-stage least squares (2SLS) is provided by `ivreg()` in AER. Other implementations are in `tsls()` in package sem, in ivpack, and in sgaure/lfe (with particular focus on multiple group fixed effects).
- *Binary responses*: An IV probit model via GLS estimation is available in ivprobit. The LARF package estimates local average response functions for binary treatments and binary instruments.
- *Panel data*: Certain basic IV models for panel data can also be estimated with standard 2SLS functions (see above). Dedicated IV panel data models are provided by ivfixed (fixed effects) and ivpanel (between and random effects).
- *Miscellaneous*: REndo fits linear models with endogenous regressors using various latent instrumental variable approaches.
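To make the idea behind 2SLS concrete, here is a hand-rolled sketch in base R on simulated data. It is not how `ivreg()` is implemented; it only reproduces the point estimates. The standard errors reported by the second-stage `lm()` are not valid for inference, which is one reason to use a dedicated function in practice. All names and parameter values are assumptions for the example.

```r
set.seed(123)
n <- 1000
z <- rnorm(n)                  # instrument: affects x, unrelated to the error
u <- rnorm(n)                  # unobserved confounder
x <- 0.8 * z + u + rnorm(n)    # endogenous regressor
y <- 1 + 2 * x + 2 * u + rnorm(n)

# Naive OLS is biased upwards because x is correlated with u.
ols <- lm(y ~ x)

# Stage 1: project the endogenous regressor on the instrument.
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress the outcome on the fitted values.
stage2 <- lm(y ~ x_hat)

# Equivalent closed form in the just-identified case: (Z'X)^{-1} Z'y.
Z <- cbind(1, z)
X <- cbind(1, x)
beta_iv <- solve(crossprod(Z, X), crossprod(Z, y))

print(c(ols = coef(ols)[["x"]],
        tsls = coef(stage2)[["x_hat"]],
        iv_matrix = beta_iv[2, 1]))
```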

**Panel data models**

- *Panel standard errors*: A simple approach for panel data is to fit the pooling (or independence) model (e.g., via `lm()` or `glm()`) and only correct the standard errors. Different types of clustered, panel, and panel-corrected standard errors are available in sandwich (incorporating prior work from multiwayvcov), clusterSEs, pcse, clubSandwich, plm, and geepack. The latter two require estimation of the pooling/independence models via `plm()` and `geeglm()` from the respective packages (which also provide other types of models, see below).
- *Linear panel models*: plm provides a wide range of within, between, and random-effect methods (among others) along with corrected standard errors, tests, etc. Another implementation of several of these models is in Paneldata. Various dynamic panel models are available in plm and dynamic panel models with fixed effects in OrthoPanels. feisr provides fixed effects individual slope (FEIS) models. Panel vector autoregressions are implemented in panelvar.
- *Generalized estimating equations and GLMs*: GEE models for panel data (or longitudinal data in statistical jargon) are in geepack. The pglm package provides estimation of GLM-like models for panel data.
- *Mixed effects models*: Linear and nonlinear models for panel data (and more general multi-level data) are available in lme4 and nlme.
- *Instrumental variables*: ivfixed and ivpanel, see also above.
- *Miscellaneous*: Autocorrelation and heteroscedasticity correction are available in wahc and panelAR. Threshold regression and unit root tests are in pdR. The panel data approach method for program evaluation is available in pampe.
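The difference between the pooling model and a within (fixed effects) estimator can be sketched in base R as follows. This is only meant to show the mechanics on simulated data; the data-generating process and all names are assumptions, and the standard errors from the demeaned regression use the wrong degrees of freedom, which dedicated packages such as plm take care of.

```r
set.seed(7)
n_id <- 50                            # number of individuals
n_t  <- 6                             # time periods per individual
id    <- rep(seq_len(n_id), each = n_t)
alpha <- rnorm(n_id)[id]              # unobserved individual effect
x     <- 0.5 * alpha + rnorm(n_id * n_t)   # regressor correlated with the effect
y     <- 2 * alpha + 1.5 * x + rnorm(n_id * n_t)
d     <- data.frame(id = factor(id), x = x, y = y)

# Pooling model: ignores the individual effects, so the slope is biased here.
pooled <- lm(y ~ x, data = d)

# Least squares dummy variables (LSDV): one intercept per individual.
lsdv <- lm(y ~ x + id, data = d)

# Within transformation: subtract individual means, then run OLS.
d$x_w <- d$x - ave(d$x, d$id)
d$y_w <- d$y - ave(d$y, d$id)
within <- lm(y_w ~ x_w - 1, data = d)

# LSDV and the within estimator give the same slope.
print(c(pooled = coef(pooled)[["x"]],
        lsdv   = coef(lsdv)[["x"]],
        within = coef(within)[["x_w"]]))
```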

**Further regression models**

- *Nonlinear least squares modeling*: `nls()` in package stats.
- *Quantile regression*: quantreg (including linear, nonlinear, censored, locally polynomial and additive quantile regressions).
- *Generalized method of moments (GMM) and generalized empirical likelihood (GEL)*: gmm.
- *Spatial econometric models*: The Spatial view gives details about handling spatial data, along with information about (regression) modeling. In particular, spatial regression models can be fitted using spatialreg and sphet (the latter using a GMM approach). splm is a package for spatial panel models. Spatial probit models are available in spatialprobit.
- *Bayesian model averaging (BMA)*: A comprehensive toolbox for BMA is provided by BMS including flexible prior selection, sampling, etc. A different implementation is in BMA for linear models, generalized linear models and survival models (Cox regression).
- *Linear structural equation models*: lavaan and sem. See also the Psychometrics task view for more details.
- *Simultaneous equation estimation*: systemfit.
- *Nonparametric methods*: np using kernel smoothing and NNS using partial moments.
- *Linear and nonlinear mixed-effect models*: nlme and lme4.
- *Generalized additive models (GAMs)*: mgcv, gam, gamlss and VGAM.
- *Design-based inference*: estimatr contains fast procedures for several design-appropriate estimators with robust standard errors and confidence intervals, including linear regression, instrumental variables regression, and difference-in-means, among others.
- *Extreme bounds analysis*: ExtremeBounds.
- *Miscellaneous*: The packages VGAM, rms and Hmisc provide several tools for extended handling of (generalized) linear regression models.
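Of the models listed here, nonlinear least squares is available without any contributed package. A minimal sketch with `nls()` on simulated data might look as follows; the exponential-decay mean function, the true parameter values, and the starting values are all assumptions chosen for illustration.

```r
set.seed(11)
x <- seq(0, 10, length.out = 100)
y <- 5 * exp(-0.4 * x) + rnorm(100, sd = 0.1)   # true a = 5, b = 0.4

# nls() needs the mean function as a formula and starting values
# for every parameter appearing in it.
fit_nls <- nls(y ~ a * exp(-b * x), start = list(a = 3, b = 0.2))

print(coef(fit_nls))
print(summary(fit_nls))
```

Convergence of `nls()` depends on the starting values, so for real data it is worth trying several.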

**Time series data and models**

- The TimeSeries task view provides much more detailed information about both basic time series infrastructure and time series models. Here, only the most important aspects relating to econometrics are briefly mentioned. Time series models for financial econometrics (e.g., GARCH, stochastic volatility models, or stochastic differential equations, etc.) are described in the Finance task view.
- *Infrastructure for regularly spaced time series*: The class `"ts"` in package stats is R's standard class for regularly spaced time series (especially annual, quarterly, and monthly data). It can be coerced back and forth without loss of information to `"zooreg"` from package zoo.
- *Infrastructure for irregularly spaced time series*: zoo provides infrastructure for both regularly and irregularly spaced time series (the latter via the class `"zoo"`) where the time information can be of arbitrary class. This includes daily series (typically with `"Date"` time index) or intra-day series (e.g., with `"POSIXct"` time index). An extension based on zoo geared towards time series with different kinds of time index is xts. Further packages aimed particularly at finance applications are discussed in the Finance task view.
- *Classical time series models*: Simple autoregressive models can be estimated with `ar()` and ARIMA modeling and Box-Jenkins-type analysis can be carried out with `arima()` (both in the stats package). An enhanced version of `arima()` is in forecast.
- *Linear regression models*: A convenience interface to `lm()` for estimating OLS and 2SLS models based on time series data is dynlm. Linear regression models with AR error terms can be estimated via GLS using `gls()` from nlme.
- *Structural time series models*: Standard models can be fitted with `StructTS()` in stats. Further packages are discussed in the TimeSeries task view.
- *Filtering and decomposition*: `decompose()` and `HoltWinters()` in stats. The basic function for computing filters (both rolling and autoregressive) is `filter()` in stats. Many extensions to these methods, in particular for forecasting and model selection, are provided in the forecast package.
- *Vector autoregression*: Simple models can be fitted by `ar()` in stats; more elaborate models are provided in package vars along with suitable diagnostics, visualizations, etc. Panel vector autoregressions are available in panelvar.
- *Unit root and cointegration tests*: urca, tseries, CADFtest. See also pco for panel cointegration tests.
- *Miscellaneous*:
  - tsDyn - Threshold and smooth transition models.
  - midasr - MIDAS regression and other econometric methods for mixed frequency time series data analysis.
  - gets - GEneral-To-Specific (GETS) model selection for either ARX models with log-ARCH-X errors, or a log-ARCH-X model of the log variance.
  - tsfa - Time series factor analysis.
  - bimets - Econometric modeling of time series data using flexible specifications of simultaneous equation models.
  - dlsem - Distributed-lag linear structural equation models.
  - lpirfs - Local projections impulse response functions.
  - apt - Asymmetric price transmission models.
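The basic `"ts"` infrastructure and the classical models from stats mentioned above can be tried out with a short simulation. The sketch below assumes a monthly AR(1) series starting in January 2000 with autoregressive coefficient 0.6; these choices, like the object names, are made up for the example.

```r
set.seed(2020)

# Simulate 20 years of monthly data from an AR(1) process.
sim <- arima.sim(model = list(ar = 0.6), n = 240)
y_m <- ts(as.numeric(sim), start = c(2000, 1), frequency = 12)

# Simple autoregression via ar() (Yule-Walker by default), order fixed at 1.
fit_ar <- ar(y_m, order.max = 1, aic = FALSE)

# The same model in the ARIMA framework via arima().
fit_arima <- arima(y_m, order = c(1, 0, 0))

print(c(ar = fit_ar$ar, arima = coef(fit_arima)[["ar1"]]))

# Forecast the next 12 months from the ARIMA fit.
fc <- predict(fit_arima, n.ahead = 12)
print(fc$pred)

# Classical filtering: a centered 12-month moving average via filter().
ma12 <- stats::filter(y_m, rep(1 / 12, 12), sides = 2)
```

Calling `stats::filter()` with the namespace prefix avoids clashes with functions of the same name in other attached packages.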

**Data sets**

- *Textbooks and journals*: Packages AER, Ecdat, and wooldridge contain comprehensive collections of data sets from various standard econometric textbooks (including Greene, Stock & Watson, Wooldridge, Baltagi, among others) as well as several data sets from the Journal of Applied Econometrics and the Journal of Business & Economic Statistics data archives. AER and wooldridge additionally provide extensive sets of examples reproducing analyses from the textbooks/papers, illustrating various econometric methods. In pder a wide collection of data sets for "Panel Data Econometrics with R" (Croissant & Millo 2018) is available. The ccolonescu/PoEdata package on GitHub provides the data sets from "Principles of Econometrics" (4th ed, by Hill, Griffiths, and Lim 2011).
- *Canadian monetary aggregates*: CDNmoney.
- *Penn World Table*: pwt provides versions 5.6, 6.x, 7.x. Version 8.x and 9.x data are available in pwt8 and pwt9, respectively.
- *Time series and forecasting data*: The packages expsmooth, fma, and Mcomp are data packages with time series data from the books 'Forecasting with Exponential Smoothing: The State Space Approach' (Hyndman, Koehler, Ord, Snyder, 2008, Springer) and 'Forecasting: Methods and Applications' (Makridakis, Wheelwright, Hyndman, 3rd ed., 1998, Wiley) and the M-competitions, respectively.
- *Empirical Research in Economics*: Package erer contains functions and datasets for the book 'Empirical Research in Economics: Growing up with R' (Sun, forthcoming).
- *Panel Study of Income Dynamics (PSID)*: psidR can build panel data sets from the Panel Study of Income Dynamics (PSID).
- US state- and county-level panel data: rUnemploymentData.
- World Bank data and statistics: The wbstats package provides programmatic access to the World Bank API.

**Miscellaneous**

- *Matrix manipulations*: As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages Matrix and SparseM.
- *Optimization and mathematical programming*: R and many of its contributed packages provide many specialized functions for solving particular optimization problems, e.g., in regression as discussed above. Further functionality for solving more general optimization problems, e.g., likelihood maximization, is discussed in the Optimization task view.
- *Bootstrap*: In addition to the recommended boot package, there are some other general bootstrapping techniques available in bootstrap or simpleboot as well as some bootstrap techniques designed for time-series data, such as the maximum entropy bootstrap in meboot or `tsbootstrap()` from tseries.
- *Inequality*: For measuring inequality, concentration and poverty the package ineq provides some basic tools such as Lorenz curves, Pen's parade, the Gini coefficient and many more.
- *Structural change*: R is particularly strong when dealing with structural changes and changepoints in parametric models, see strucchange and segmented.
- *Exchange rate regimes*: Methods for inference about exchange rate regimes, in particular in a structural change setting, are provided by fxregime.
- *Global value chains*: Tools and decompositions for global value chains are in gvc and decompr.
- *Regression discontinuity design*: A variety of methods are provided in the rdd, rdrobust, and rdlocrand packages. The rdpower package offers power calculations for regression discontinuity designs, and rdmulti implements analysis with multiple cutoffs or scores.
- *Gravity models*: Estimation of log-log and multiplicative gravity models is available in gravity.
- *z-Tree*: zTree can import data from the z-Tree software for developing and carrying out economic experiments.
- *Numerical standard errors*: nse implements various numerical standard errors for time series data, especially in simulation experiments with correlated outcome sequences.
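As an illustration of the bootstrap item above, the following sketch uses the recommended boot package to bootstrap the slope of a simple regression by resampling observations (a pairs bootstrap). The simulated data, the helper `slope_stat()`, and the choice of 500 replications are assumptions made for the example.

```r
library(boot)  # recommended package shipped with R

set.seed(99)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rt(n, df = 5)   # heavy-tailed errors
d <- data.frame(x = x, y = y)

# Statistic for boot(): refit the regression on the resampled rows
# and return the slope. `idx` holds the resampled row indices.
slope_stat <- function(data, idx) {
  coef(lm(y ~ x, data = data[idx, ]))[["x"]]
}

b <- boot(d, statistic = slope_stat, R = 500)

# Compare the bootstrap standard error with the classical OLS one.
boot_se <- sd(b$t[, 1])
ols_se  <- summary(lm(y ~ x, data = d))$coefficients["x", "Std. Error"]
print(c(bootstrap = boot_se, ols = ols_se))

# Percentile confidence interval for the slope.
print(boot.ci(b, type = "perc"))
```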

- Task view: Finance
- Task view: Optimization
- Task view: Psychometrics
- Task view: SocialSciences
- Task view: Spatial
- Task view: Survival
- Task view: TimeSeries
- sgaure/lfe
- ccolonescu/PoEdata
- Journal of Statistical Software: Special Volume on 'Econometrics in R' (2008)
- Book: Applied Econometrics with R (Kleiber and Zeileis)
- Book: Using R for Introductory Econometrics (Heiss)
- Book: Introduction to Econometrics with R (Hanck, Arnold, Gerber, Schmelzer)
- Book: Hands-On Intermediate Econometrics Using R (Vinod)
- Book: Panel Data Econometrics with R (Croissant & Millo)
- Book: Spatial Econometrics (Kelejian and Piras)
- Manual: Principles of Econometrics with R (Colonescu)
- Manual: Introduction to Econometrics with R (Oswald, Robin, Viers)
- Manual: Econometrics In-Class Labs (Ransom)
- Manual: Data Science for Economists (McDermott)
- A Brief Guide to R for Beginners in Econometrics
- R for Economists
