Estimate Regression Models with Instrumental Variables

A consistent API for estimating linear and generalized linear models with instrumental variables.

Instruments makes it easy to fit instrumental variable regression models. It has two core functions:

1. `iv.lm` for estimating linear regression models with instrumental variables;
2. `iv.glm` for estimating generalized linear models with instrumental variables.

Each function takes arguments just like R's `lm` and `glm` functions, but allows you to express one variable as instrumented by one or more instrumental variables. Say you have an independent variable `outcome`, an endogenous dependent variable `endo`, and an instrument `violin`. You can express a linear instrumental variable regression naturally:

If `outcome` is binary, you may want to use a probit model.

Linear models

First we'll create some fake data to demonstrate how instrumental variable regressions work and when they might be appropriate. Suppose you want to estimate the impact of a subscription program on customers' spending, but you know that customers who tend to spend more also tend to sign up for subscriptions. How can you disentangle those effects?

Suppose that you also offer incentives (free samples, buy-one-get-one-free, etc) to encourage customers to enroll in subscriptions, and that these incentives are offered at random. You may be able to use incentives as an instrumental variable to isolate the impact of subscriptions on spending.

The true impact of a subscription on total spending is 2. Since the unobserved variable baseline speding propensity also impacts whether someone gets a subscription, we cannot identify the impact of a subscription on spending with an ordinary linear regression. The ordinary linear regression will tend to overestimate the impact of a subscription on spending.

Instead we can use an instrumental variable regression, with `incentive` as the instrument.

Generalized Linear Models

Suppose instead that we want to estimate the impact of subscriptions on customer churn. Continuing our fake data example from above, we define churn as a function of both subscription and baseline spending.

Since churn is a binary variable (0, 1), a linear regression is not appropriate. Instead we will estimate this model with a logistic regression using the `glm` function in base R and `iv.glm` function in instruments. As before, the simple `glm` function will overestimate the impact of subscription on churn due to the correlation between baseline spending rates and having a subscription.

Diagnostics

You've probably noticed that `iv.lm` and `iv.glm` print the correlation between the independent variable and the instrument and the correlation betweent the instrument and the regression error term. These two metrics help understand whether the model is valid.

Instrumental variable regression models rely on two assumptions:

1. Exclusion restriction: the instrument is not correlated with the error term;
2. Validity: the instrument is correlated with the endogenous explanatory variable (subscription, in our example).

We can also use the `diagnose` funcion to automatically evaluate a fit `iv` model. If it returns nothing, that means that your instrumental variable model seems valid.

Reference manual

install.packages("instruments")

0.1.0 by Alex Pavlakis, 7 months ago

Browse source code at https://github.com/cran/instruments

Authors: Alex Pavlakis

Documentation:   PDF Manual