Piecewise Structural Equation Modeling

Implements piecewise structural equation models.


Implementation of piecewise structural equation modeling (SEM) in R, including estimation of path coefficients and goodness-of-fit statistics.

A formal description of this package can be found at:

Lefcheck, Jonathan S. (2015) piecewiseSEM: Piecewise structural equation modeling in R for ecology, evolution, and systematics. Methods in Ecology and Evolution. 7(5): 573-579. DOI: 10.1111/2041-210X.12512

Version: 1.2.1 (2016-12-06)

Author: Jon Lefcheck jslefche@vims.edu

Supported model classes include:

lm, glm, rq, glm.nb, gls, pgls, merMod, merModLmerTest, lme, glmmPQL, glmmadmb, and glmmTMB.

###WARNING

Some tests of directed separation are non-symmetrical -- the partial slope of a ~ b is not the same as b ~ a -- when the variables are non-linear (i.e., are transformed via a link function when fit to a non-normal distribution). We are currently investigating the phenomenon, but in the interim, the latest version of the package returns the lowest P-value. This the more conservative route. Stay tuned for more updates...

This is only a problem if you are fitting generalized linear models!!

##Examples

###Load package

# Install latest version from CRAN
install.packages("piecewiseSEM")
 
# Install dev version from GitHub
# library(devtools)
# install_github("jslefche/piecewiseSEM")
 
library(piecewiseSEM)

###Load data from Shipley 2009

data(shipley2009)

The data is alternately hosted in Ecological Archives E090-028-S1 (DOI: 10.1890/08-1034.1).

###Create model set

The model corresponds to the following hypothesis (Fig. 2, Shipley 2009);

Models are constructed using a mix of the nlme and lmerTest packages, as in the supplements of Shipley 2009.

# Load required libraries for linear mixed effects models
library(lme4)
library(nlme)
 
# Load example data
data(shipley2009)
 
# Create list of models corresponding to SEM
shipley2009.modlist = list(
 
  lme(DD~lat, random = ~1|site/tree, na.action = na.omit, 
  data = shipley2009),
  
  lme(Date~DD, random = ~1|site/tree, na.action = na.omit, 
  data = shipley2009),
  
  lme(Growth~Date, random = ~1|site/tree, na.action = na.omit, 
  data = shipley2009),
  
  glmer(Live~Growth+(1|site)+(1|tree), 
  family=binomial(link = "logit"), data = shipley2009) 
  
  )

###Run Shipley tests

sem.fit returns a list of the following: (1) the missing paths (omitting conditional variables), the estimate, standard error, degrees of freedom, and associated p-values; (2) the Fisher's C statistic, degrees of freedom, and p-value for the model (derived from a Chi-squared distribution); (3) the AIC, AICc (corrected for small sample size), the likelihood degrees of freedom, and the model degrees of freedom.

The argument add.vars allows you to specify a vector of additional variables whose causal independence you also wish to test. This is useful if you are comparing nested models. Default is NULL.

The argument adjust.p allows you to adjust the p-values returned by the function based on the the total degrees of freedom for the model (see supplementary material, Shipley 2013). Default is FALSE (uses the d.f. reported in the summary table).

(See "p-values and all that" for a discussion of p-values from mixed models using the lmer package.)

sem.fit(shipley2009.modlist, shipley2009)
 
# Conditional variables have been omitted from output table for clarity (or use argument conditional = T)
# $missing.paths
#         missing.path estimate std.error   df crit.value p.value 
# 1   Date ~ lat + ...  -0.0091    0.1135   18    -0.0798  0.9373 
# 2 Growth ~ lat + ...  -0.0989    0.1107   18    -0.8929  0.3837 
# 3   Live ~ lat + ...   0.0305    0.0297   NA     1.0279  0.3040 
# 4  Growth ~ DD + ...  -0.0106    0.0358 1329    -0.2967  0.7667 
# 5    Live ~ DD + ...   0.0272    0.0271   NA     1.0046  0.3151 
# 6  Live ~ Date + ...  -0.0466    0.0298   NA    -1.5622  0.1182 
# $Fisher.C
#   fisher.c df p.value
# 1    11.54 12   0.483
# $AIC
#     AIC   AICc  K    n
# 1 49.54 50.079 19 1431

The missing paths output differs from Table 2 in Shipley 2009. However, running each d-sep model by hand yields the same answers as this function, leading me to believe that updates to the lme4 and nlme packages are the cause of the discrepancy. Qualitatively, the interpretations are the same.

###Extract path coefficients

Path coefficients can be either unstandardized or standardized (centered and scaled in units of standard deviation of the mean, or scaled by the range the data). Default is none. The function returns a data.frame sorted by increasing significance.

sem.coefs(shipley2009.modlist, shipley2009)
 
#   response predictor   estimate   std.error p.value    
# 1       DD       lat -0.8354736 0.119422385       0 ***
# 2     Date        DD -0.4976475 0.004933274       0 ***
# 3   Growth      Date  0.3007147 0.026631405       0 ***
# 4     Live    Growth  0.3478536 0.058415948       0 ***
 
sem.coefs(shipley2009.modlist, shipley2009, standardize = "scale")
 
#   response predictor   estimate   std.error p.value    
# 1       DD       lat -0.7014051 0.100258794       0 ***
# 2     Date        DD -0.6281367 0.006226838       0 ***
# 3   Growth      Date  0.3824224 0.033867469       0 ***
# 4     Live    Growth  0.3478536 0.058415948       0 ***
# Warning message:
# In get.scaled.data(modelList, data, standardize) :
#   One or more responses not modeled to a normal distribution: keeping response(s) on original scale!

Note the error indicating that one of the responses (Live) cannot be scaled because it would violate the distributional assumptions, so only the predictors have been scaled.

We can plot a rudimentary path diagram of the SEM using sem.plot which reports the coefficients, above:

sem.plot(shipley2009.modlist, shipley2009)

###Generate variance-covariance SEM using lavaan

Generate variance-covariance based SEM from the list of linear mixed models. The resulting object can be treated like any other model object constructed using the package lavaan.

(lavaan.model = sem.lavaan(shipley2009.modlist, shipley2009))
 
# lavaan (0.5-18) converged normally after  27 iterations
#                                                   Used       Total
#   Number of observations                          1431        1900
#   Estimator                                         ML
#   Minimum Function Test Statistic               38.433
#   Degrees of freedom                                 6
#   P-value (Chi-square)                           0.000
 

The output shows that the variance-covariance SEM is a worse fit, indicating that a hierarchical piecewise approach is justified given the hierarchical structure of the data.

###Plot partial effect between two variables

One might be interested in the partial effects of one variable on another given covariates in the SEM. The function partial.resid returns a data.frame of the partial residuals of y ~ x and plots the partial effect (if plotit = T).

# Load model package
library(nlme)
 
# Load data from Shipley (2013)
data(shipley2013) 
 
# Create list of structured equations
shipley2013.modlist = list(
 
  lme(x2~x1, random = ~x1 | species, data = shipley2013),
  
  lme(x3~x2, random = ~x2 | species, data = shipley2013),
  
  lme(x4~x2, random = ~x2 | species, data = shipley2013),
  
  lme(x5~x3+x4, random = ~x3+x4 | species, data = shipley2013)
  
  )
 
# Get partial residuals of x3 on x5 conditional on x4
resids.df = partial.resid(x5 ~ x3, shipley2013.modlist, list(lmeControl(opt = "optim")))

###Get R2 for individual models

Return R2 and AIC values for component models in the SEM.

sem.model.fits(shipley2009.modlist)
 
#      Class   Family     Link    N  Marginal Conditional
# 1      lme gaussian identity 1431 0.4766448   0.6932571
# 2      lme gaussian identity 1431 0.4083328   0.9838487
# 3      lme gaussian identity 1431 0.1070265   0.8364736
# 4 glmerMod binomial    logit 1431 0.5589205   0.6291980

###Return model predictions

Generate model predictions from new data.

# Create new data for predictions
shipley2009.new = data.frame(
  
  lat = rnorm(length(shipley2009$lat), mean(shipley2009$lat, na.rm = T), 
    sd(shipley2009$lat, na.rm = T)),
  
  DD = rnorm(length(shipley2009$DD), mean(shipley2009$DD, na.rm = T), 
    sd(shipley2009$DD, na.rm = T)),
    
  Date = rnorm(length(shipley2009$Date), mean(shipley2009$Date, na.rm = T), 
    sd(shipley2009$Date, na.rm = T)),
    
  Growth = rnorm(length(shipley2009$Growth), mean(shipley2009$Growth, na.rm = T), 
    sd(shipley2009$Growth, na.rm = T))
  
)
 
# Generate predictions
head(sem.predict(shipley2009.modlist, shipley2009.new))

News

piecewiseSEM Change Log

  • Bug fix: use of standardize = "scale" in sem.coefs with cbind in glmer
  • Warning: use of poly polynomials in model formula
  • Feature addition: support for models of class rq
  • Bug fix: error in scaling of transformed variables in get.scaled.data
  • Bug fix: sem.basis.set failed to remove variables regressed against their interaction
  • Bug fix: multiple levels of grouping.vars in sem.fit
  • Feature addition: new function acyclic to test for acyclic DAGs
  • Feature addition: new function rsquared for easier calling of sem.model.fits
  • Bug fix: order of adjacency matrix in get.sort.dag
  • Feature addition: model class glmmTMB
  • Bug fix: inclusion of transformed variables as additional boxes in sem.plot
  • Bug fix: specification of invalid models in endogenous.reverse
  • Feature addition: implemented two-column binding as response in GLM(M)s
  • Bug fix: memory allocation error in sem.model.fits
  • Bug fix: inclusion of entries in the basis set without corresponding models in the model list in endogenous.reverse
  • Bug fix: crossed random effects in get.random.formula not returning correct random slopes
  • Bug fix: gls models in sem.missing.paths could not update basis model
  • Feature addition: report of intercepts in sem.coefs
  • Bug fix: warning with random slopes not present as fixed effects in sem.model.fits
  • Feature addition: new function endogenous.reverse to assist with independence claims among intermediate endogenous variables not fit to a normal distribution
  • Bug fix: negative binomial models fail to return R2s in sem.model.fits
  • Bug fix: Failed to index correct P-value in sem.missing.paths when categorical variables are present
  • Bug fix: intermediate endogenous variables not properly removed from the basis set in sem.missing.paths
  • Feature addition: significance indicators for P-value outputs
  • Bug fix: remove random effects from get.scaled.data
  • Feature addition: Use KRmodcomp from pbkrtest instead of get_LB_dff to calculate P-values for lmer models
  • Bug fix: transformations for scaled coefficients no longer fails in sem.coefs
  • Bug fix: all interactions now show up in sem.coefs when scaled
  • Bug fix: random slopes as fixed effects not registering properly in sem.model.fits
  • Bug fix: added \dontrun{} to all help files
  • Reduced time it takes examples to run
  • Stop: if duplicate responses are detected in the model list
  • Temporary bug fix: returns lowest P-value for when d-sep tests include non-linear intermediate endogenous variables, only if family is not Gaussian
  • Bug fix: reversed order of conditioning variables so they come first
  • Bug fix: removed entries from the basis set that attempt to predict an interaction that does not appear in the model list
  • Feature addition: rudimentary plotting using sem.plot
  • Feature addition: support for glmmadmb models
  • Feature addition: pbkrtest to reliably extract p-values from lme4 models
  • Major Bug Fix: incorrect basis set specified by ggm::topSort; introduced sort.dag function, removed dependency on ggm
  • Major Bug Fix: sem.basis.set now treats transformed variables as untransformed, resolving duplicate vars in the basis set
  • Bug fix: resolved switched interactions (e.g., x1:x2 vs x2:x1) leading to duplicate vars in the basis set using get.dag
  • Bug fix: fixed long standing bug with lmer models not returning p-values
  • Bug fix: correlated errors among exogenous variables in sem.coefs
  • Bug fix: correctly scale data from pgls models using get.scaled.data
  • Bug fix: corrected basis set for models created with gls
  • Feature addition: helper function get.scaled.model to get model from scaled coefficients
  • Feature addition: (partial) correlations in sem.coefs
  • Bug fix: passing of correlated errors to lavaan
  • Bug fix: standardized coefficients for variables transformed in model formula
  • Bug fix: get.model.control updated for latest versions of lme4 and nlme
  • Bug fix: hand compute interactions in sem.lavaan using argument compute.int = TRUE
  • Feature addition: function get.scaled.data to handle transformed variables in sem.coefs when standardize = "scale"
  • Feature addition: added additional plotting arguments for partial.residuals
  • Feature addition: AICc and delta AIC to sem.model.fits
  • Bug fix: Offsets in model formula treated as predictors in d-sep tests
  • Feature addition: AICc and delta AIC to sem.model.fits
  • Bug fix: issue with interactions in basis set and d-sep tests in sem.missing.paths
  • Bug fix: issue with fixed intercept models and calculating the basis set
  • Bug fix: issue with offset() variables in sem.missing.paths
  • Bug fix: transformed responses for partial residuals
  • Bug fix: duplicated values in basis set (function DAG in ggm package)
  • First release to CRAN
  • Added vignette
  • Fixed issue with design matrix including omitted observations and dropped levels in sem.model.fits
  • Incorporated interactions into partial.resid
  • Fixed bug with lme residuals in partial.resid and predict.sem
  • Modified sem.coefs to return NA for standardized interactions
  • Added functions get.dag and get.basis.set
  • Fixed issue with interactions in sem.missing.paths
  • Fixed issue in sem.missing.paths where only 1 missing path returned NA instead of p-value
  • Removed argument filter.exog = TRUE
  • Fixed error for add.vars in get.dag
  • Fixed issue with transformed variables and sem.coefs
  • Fixed issue with lme4 models and sem.model.fits
  • Fixed issue with incorrect independence claims in get.basis.set
  • Fixed (nlme) merging error in partial.resid
  • Fixed issue with glmmPQL and get.random.structure
  • Added new citation (arXiv)
  • Implemented workaround in sem.basis.set where output changed based on order of variables in formula with interactions
  • Saturated models now return AIC (and AICc) values (with warning)
  • Added standard errors on predictions for mixed models in predict.sem()
  • Fixed typo in output for sem.fisher.c with incorrect df
  • Fixed error with lme4 and gls models in partial.resid()
  • Fixed error with lmerTest returning "merMod" objects
  • New function predict.sem returns model predictions
  • Fixed "unsupported model class" error for merModTest
  • Major revisions and annotations to all functions to improve transparency and efficiency
  • Included new helper functions get.random.formula() and get.model.control()
  • Now reports df for all model types except glmer
  • Included support for "gls" and phylogenetically independent contrast ("pgls") models
  • Fixed deparse error in get.random.struture()
  • Improved handling of gls() models
  • Fixed issue with transformed corr.errors in get.basis.set
  • get.sem.coefs now does range standardization
  • get.sem.coefs now returns single table with significance values
  • Improved handling of uneven observations in get.partial.resid
  • get.partial.resid now accepts a single model in addition to a list of models
  • get.lavaan.sem now supports additional arguments from lavaan
  • Added new function to return R^2 and AIC values for component models in SEM
  • get.partial.resid now returns residuals plot with fitted line
  • get.missing.paths now returns more information for d-sep tests
  • get.basis.set drops independence claims in the basis set where interactions predict their corresponding main effects
  • get.fisher.c now reports k df and get.aic now reports K and n df
  • Fixed rounding error that misreported Fisher's C
  • Allow user-specified rounding using sig= argument in get.sem.fit, get.fisher.c, and get.aic
  • Fixed minor bug in get.partial.resid where random effects threw an error for lme models
  • Fixed interaction bug in filter.exogenous which was returning incompatible models from the basis set
  • Fixed bug in get.sem.coefs where missing values returned NA for corr.errors
  • Fixed issue with transformed/untransformed variables predicting one another in the basis set
  • Fixed interaction bug in get.missing.paths, filter.exogenous, get.partial.resiod
  • Now can supply single model control or list
  • Fixed bug for transformed variables in get.partial.resid
  • Can now specify an equation as input into get.partial.resid
  • Fixed bug for transformed interaction terms in structured equations for get.basis.set
  • Fixed bug for transformed variables when also specifying raw variables in corr.error
  • Added model control arguments to get.sem.fit and get.partial.resid
  • Added optional switch to display conditional variables in get.missing.paths
  • Removed logLik df from get.aic output
  • Incorporated multilevel data into get.sem.fit
  • Fixed error when length(basis.set) = 1 and .progressBar = T
  • Expanded interaction term in get.basis.set to include 20 letters instead of 10, and run them backwards
  • Renamed variable names in filter.exogenous to better reflect what they are
  • Incorporated transformed variables and multiple interactions into get.sem.coefs
  • Fixed but with add.vars; many functions now require the user to supply a data.frame
  • Standardized argument order across functions
  • Improved handling of correlated errors in get.sem.coefs
  • Incorporated correlated errors into get.sem.lavaan
  • Fixed return of standardized coefficients in get.sem.coefs
  • Added argument to define and quantify correlated errors
  • Added new function get.partial.resid
  • Added new function get.basis.set (replaced dag.updated)
  • Improved handling of interactions in the basis set
  • Improved handling of p-value rownames in get.missing.paths
  • get.aic now returns df
  • Added new functions get.aic, get.fisher.c, get.sem.coefs, get.lavaan.sem
  • Added README.md
  • Completed all existing .Rd files
  • Updated .progressBar as to not conflict with existing function progressBar
  • Altered handling of model formulae to collapse to a single character string
  • Initial build of package
  • Added CHANGELOG

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("piecewiseSEM")

1.2.1 by Jon Lefcheck, 3 months ago


Browse source code at https://github.com/cran/piecewiseSEM


Authors: Jon Lefcheck


Documentation:   PDF Manual  


Task views: Psychometric Models and Methods


GPL-3 license


Imports lavaan, lme4, methods, nlme, pbkrtest

Suggests knitr, rmarkdown


See at CRAN