Simple Conjoint Tidying, Analysis, and Visualization

Simple tidying, analysis, and visualization of conjoint (factorial) experiments, including estimation and visualization of average marginal component effects ('AMCEs') and marginal means ('MMs') for weighted and un-weighted survey data, along with useful reference category diagnostics and statistical tests. Estimation of 'AMCEs' is based upon methods described by Hainmueller, Hopkins, and Yamamoto (2014).




cregg is a package for analyzing and visualizing the results of conjoint ("cj") factorial experiments using methods described by Hainmueller, Hopkins, and Yamamoto (2014). It provides functionality that is useful for analyzing and otherwise examining conjoint experimental data through a main function, cj(), that wraps around a number of analytic tools:

  • Estimation of average marginal component effects (AMCEs) for fully randomized conjoint designs (as well as designs involving an unlimited number of two-way constraints between features) and munging of AMCE estimates into tidy data frames, via amce()
  • Calculation of marginal means (MMs) for conjoint designs and munging them into tidy data frames via mm()
  • Tabulation of display frequencies of feature levels via cj_table() and cj_freqs() and cross-tabulation of feature restrictions using cj_props()
  • Diagnostics to assess preference heterogeneity, including an omnibus statistical test (cj_anova()) and tidying of differences in MMs (mm_diffs()) and AMCEs (amce_diffs()) across subgroups
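The display-frequency diagnostics listed above can be illustrated without the package. The following base-R sketch simulates profiles from a hypothetical two-feature design in which one combination is disallowed (the feature names, levels, and restriction are invented for illustration) and then tabulates marginal display counts and cross-tabulated display proportions, the kind of summary that cj_freqs() and cj_props() produce. It is not cregg's implementation.

```r
set.seed(42)
n <- 3000

# All combinations of two hypothetical features
design <- expand.grid(
  Education = c("none", "high school", "college"),
  Job = c("farmer", "doctor"),
  stringsAsFactors = TRUE
)

# Hypothetical restriction: profiles with no education cannot be doctors
allowed <- design[!(design$Education == "none" & design$Job == "doctor"), ]

# Draw each displayed profile uniformly from the allowed combinations
profiles <- allowed[sample(nrow(allowed), n, replace = TRUE), ]

# Marginal display frequencies of each level of one feature
table(profiles$Education)

# Cross-tabulated display proportions; the restricted cell is zero
props <- prop.table(table(profiles$Education, profiles$Job))
round(props, 3)
```

Note that the restriction also makes the marginal frequencies unequal: "none" belongs to only one allowed combination, so it is displayed about half as often as the other education levels.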

In addition, the package provides a number of tools that are likely useful to conjoint analysts:

  • ggplot2-based visualizations of AMCEs and MMs, via plot() methods for all of the above
  • Tidying of raw "wide"-format conjoint survey datasets into "long" or "tidy" datasets using cj_tidy()
  • Diagnostics to choose feature reference categories, via amce_by_reference()
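The wide-to-long reshaping that cj_tidy() performs can be sketched with base R's stats::reshape() on a tiny, hypothetical dataset. The column naming scheme (Gender_1, Gender_2, choice) is an assumption for illustration and is not the input format cj_tidy() expects. Each respondent row becomes one row per profile, and a binary outcome records whether that profile was chosen.

```r
# Hypothetical "wide" data: one row per respondent, one column per
# feature-by-profile combination, plus the number of the chosen profile
wide <- data.frame(
  id = 1:3,
  Gender_1 = c("Female", "Male", "Male"),
  Gender_2 = c("Male", "Female", "Male"),
  choice = c(1, 2, 1)
)

# Stack the profile-specific columns into one "Gender" column,
# with a "profile" column recording which profile each row describes
long <- reshape(
  wide,
  direction = "long",
  varying = c("Gender_1", "Gender_2"),
  sep = "_",
  idvar = "id",
  timevar = "profile"
)

# Binary outcome: was this profile the one the respondent chose?
long$chosen <- as.integer(long$profile == long$choice)
long <- long[order(long$id, long$profile), ]
long
```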

To demonstrate package functionality, the package includes three example datasets:

  • taxes, a fully randomized choice task conjoint experiment conducted by Ballard-Rosa et al. (2016)
  • immigration, a partial factorial conjoint experiment with several restrictions between features conducted by Hainmueller, Hopkins, and Yamamoto (2014)
  • conjoint_wide, a simulated "wide"-format conjoint dataset that is used to demonstrate functionality of cj_tidy()

The design of cregg follows a few key principles:

  • Following tidy data principles throughout, so that all of the main functions produce consistently structured, metadata-rich data frames. The return value of any function is therefore a tidy data frame that can easily be stacked with others (e.g., for computing AMCEs for subsets of respondents) and then used to produce ggplot2 visualizations.
  • A formula-based interface that meshes well with the underlying survey-based effect estimation API.
  • A consistent interface for both unconstrained and two-way constrained designs that relies only on formula notation without any package-specific "design" specification. Conjoint designs involving two-way constraints between features are supported using simple formula notation: Y ~ A + B + C implies an unconstrained design, while Y ~ A * B + C implies a constraint between levels of features A and B. cregg infers the constraints from the formula, so they do not need to be specified separately.
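In R's formula language, A * B is shorthand for A + B + A:B, so the constrained specification differs from the unconstrained one only by the A:B interaction term. Base R's terms() makes this visible (A, B, and C are placeholder names, and no data are needed):

```r
f_unconstrained <- Y ~ A + B + C
f_constrained   <- Y ~ A * B + C

# Term labels after R expands each formula
labels_u <- attr(terms(f_unconstrained), "term.labels")
labels_c <- attr(terms(f_constrained), "term.labels")
labels_u
labels_c

# The only extra term is the A:B interaction that signals the two-way constraint
setdiff(labels_c, labels_u)
```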

cregg also provides some sugar:

  • Using "label" attributes on variables to provide pretty printing, with options to relabel features or plots on the fly. The cj_df() function (and data frame class "cj_df") is designed to preserve these attributes during subsetting.
  • Using each factor's base (first) level as the reference category, rather than setting baseline levels through a separate argument
  • A convenient API (via the cj(..., by = ~ group) idiom) for repeated, subgroup operations without the need for lapply() or for loops
  • All functions have arguments in data-formula order, making it simple to pipe into them via the magrittr pipe (%>%).
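The reason a dedicated "cj_df" class is useful can be seen in plain base R: row-subsetting an ordinary data frame subsets each factor column via `[.factor`, which keeps the levels and class but drops other attributes such as "label". The variable and label below are hypothetical, and the sketch does not show how cj_df() itself preserves the attribute.

```r
x <- data.frame(
  Gender = factor(c("Female", "Male", "Female", "Male"))
)

# Attach a pretty-printing label to the column
attr(x$Gender, "label") <- "Applicant gender"
attr(x$Gender, "label")

# Subsetting a plain data.frame drops the "label" attribute
sub <- x[x$Gender == "Female", , drop = FALSE]
attr(sub$Gender, "label")
```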

A detailed website showcasing package functionality is available at: https://thomasleeper.com/cregg/. Contributions and feedback are welcome on GitHub.

The package, whose primary point of contact is cj(), takes its name from the surname of a famous White House Press Secretary.

The package includes several example conjoint datasets, which are used here and in examples:

library("cregg")
data("immigration")
data("taxes")

The package provides straightforward calculation and visualization of descriptive marginal means (MMs). These represent the mean outcome across all appearances of a particular conjoint feature level, averaging across all other features. In forced choice conjoint designs, MMs by definition average 0.5, with values above 0.5 indicating features that increase profile favorability and values below 0.5 indicating features that decrease profile favorability. For continuous outcomes, MMs can take any value in the full range of the outcome. Calculating MMs entails no modelling assumptions; they are simply descriptive quantities of interest:

# descriptive plotting
f1 <- ChosenImmigrant ~ Gender + Education + LanguageSkills + CountryOfOrigin + Job + JobExperience + JobPlans + ReasonForApplication + 
    PriorEntry
plot(mm(immigration, f1, id = ~CaseID), vline = 0.5)

[Plot: marginal means for each feature level of the immigration conjoint, with a reference line at 0.5]
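The definition of an MM can be made concrete with a minimal base-R sketch on simulated forced-choice data. The feature, its levels, and the utility model are hypothetical, and the sketch omits the (respondent-clustered) standard errors that mm() also reports. It computes each level's MM as the mean outcome over all of that level's appearances and checks that, because exactly one of two profiles is chosen in every task, the frequency-weighted MMs average 0.5.

```r
set.seed(1)
n_tasks <- 2000

# Each task displays two profiles with a randomly assigned education level
tasks <- data.frame(
  task = rep(seq_len(n_tasks), each = 2),
  Education = factor(
    sample(c("low", "mid", "high"), 2 * n_tasks, replace = TRUE),
    levels = c("low", "mid", "high")
  )
)

# Hypothetical latent utility: higher education is preferred, plus noise
utility <- as.integer(tasks$Education) + rnorm(nrow(tasks))

# Forced choice: within each task, the higher-utility profile is chosen
tasks$chosen <- as.integer(ave(utility, tasks$task, FUN = function(u) u == max(u)))

# Marginal mean for each level: mean outcome over all its appearances
mms <- tapply(tasks$chosen, tasks$Education, mean)
round(mms, 3)

# The overall mean of a forced-choice outcome is exactly 0.5 ...
mean(tasks$chosen)
# ... so the MMs, weighted by how often each level appears, average 0.5
weighted.mean(mms, table(tasks$Education))
```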

cregg functions use attr(data$feature, "label") to provide pretty printing of feature labels, so that variable names can be arbitrary. These labels can be overridden with the feature_labels argument. Feature levels are always deduced from the levels() of right-hand-side variables in the model specification, so all variables should be factors with levels in the desired display order. Similarly, the plotted order of features is given by the order of terms in the RHS formula unless overridden by the order of variable names given in feature_order.
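Because display order and baseline both come from levels(), it helps to set them explicitly. A short base-R illustration with a hypothetical variable: factor() sorts levels alphabetically unless told otherwise, the first level is the one model.matrix() drops as the baseline under treatment coding, and relevel() moves a chosen level to the front.

```r
edu <- factor(c("college", "high school", "none", "college"))
levels(edu)  # factor() sorts levels alphabetically by default

# Reorder levels into the desired display order; the first level is the
# one that model.matrix() treats as the baseline (it gets no dummy column)
edu <- factor(edu, levels = c("none", "high school", "college"))
levels(edu)
colnames(model.matrix(~ edu))

# relevel() moves one level to the front and keeps the rest in order
levels(relevel(edu, ref = "college"))
```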

A more common analytic approach for conjoints is to estimate average marginal component effects (AMCEs) using some form of regression analysis. cregg uses glm() and svyglm() to estimate linear probability models from which AMCEs are derived. Designs can be specified with any interactions between conjoint features, but only AMCEs are returned. (No functionality is currently provided for explicit estimation of feature interaction effects.) Just like for mm(), the output of cj() (or its alias, amce()) is a tidy data frame:

# estimation
amces <- cj(taxes, chose_plan ~ taxrate1 + taxrate2 + taxrate3 + taxrate4 + taxrate5 + taxrate6 + taxrev, id = ~ID)
head(amces[c("feature", "level", "estimate", "std.error")], 20L)
                          feature         level      estimate   std.error
1           Tax rate for <$10,000      <10k: 0%  0.0000000000          NA
2           Tax rate for <$10,000      <10k: 5% -0.0139987267 0.008367718
3           Tax rate for <$10,000     <10k: 15% -0.0897702241 0.009883554
4           Tax rate for <$10,000     <10k: 25% -0.2215066470 0.012497932
5    Tax rate for $10,000-$35,000    10-35k: 5%  0.0000000000          NA
6    Tax rate for $10,000-$35,000   10-35k: 15% -0.0161677383 0.010015769
7    Tax rate for $10,000-$35,000   10-35k: 25% -0.0849079259 0.015824370
8    Tax rate for $10,000-$35,000   10-35k: 35% -0.1868125806 0.021074682
9    Tax rate for $25,000-$85,000    35-85k: 5%  0.0000000000          NA
10   Tax rate for $25,000-$85,000   35-85k: 15%  0.0005356495 0.008242105
11   Tax rate for $25,000-$85,000   35-85k: 25% -0.0533364485 0.009713809
12   Tax rate for $25,000-$85,000   35-85k: 35% -0.1083416179 0.011917151
13  Tax rate for $85,000-$175,000   85-175k: 5%  0.0000000000          NA
14  Tax rate for $85,000-$175,000  85-175k: 15%  0.0194226595 0.007719126
15  Tax rate for $85,000-$175,000  85-175k: 25%  0.0108897506 0.008078966
16  Tax rate for $85,000-$175,000  85-175k: 35% -0.0015463277 0.008431674
17 Tax rate for $175,000-$375,000  175-375k: 5%  0.0000000000          NA
18 Tax rate for $175,000-$375,000 175-375k: 15%  0.0384042184 0.008581007
19 Tax rate for $175,000-$375,000 175-375k: 25%  0.0504838117 0.008867028
20 Tax rate for $175,000-$375,000 175-375k: 35%  0.0716090284 0.009162901
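To see why regression coefficients can be read as AMCEs in a fully randomized design, the following sketch simulates a hypothetical design with independently randomized features and a known linear probability model, then fits it with lm(). Because the features are randomized independently, each dummy coefficient estimates the effect of that level relative to the feature's first (reference) level, averaged over the other features. This is only an illustration: it ignores the respondent-level clustering that cregg handles through the id argument, and it is not the package's estimation code.

```r
set.seed(2014)
n <- 20000

# Two independently randomized hypothetical features
dat <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Job = factor(sample(c("janitor", "teacher", "doctor"), n, replace = TRUE),
               levels = c("janitor", "teacher", "doctor"))
)

# Hypothetical true choice probabilities (a linear probability model)
p <- 0.4 + 0.05 * (dat$Gender == "Male") +
  0.10 * (dat$Job == "teacher") + 0.20 * (dat$Job == "doctor")
dat$chosen <- rbinom(n, 1, p)

# Coefficients estimate AMCEs relative to "Female" and "janitor"
fit <- lm(chosen ~ Gender + Job, data = dat)
round(coef(fit), 3)
```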

This makes it very easy to modify, combine, print, etc. the resulting output. It also makes it easy to visualize using ggplot2. A convenience visualization function is provided:

# plotting of AMCEs
plot(amces)

[Plot: AMCE estimates for the tax plan features]
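Because the result is an ordinary data frame, base R operations apply to it directly. As an illustration, the sketch below re-enters by hand the first four rows of the AMCE output shown earlier (only the four displayed columns; the full cj() output contains more) and adds normal-approximation 95% confidence limits and a flag for intervals that exclude zero. The added column names are hypothetical.

```r
# First four rows of the AMCE output above, re-entered by hand
amce_rows <- data.frame(
  feature   = "Tax rate for <$10,000",
  level     = c("<10k: 0%", "<10k: 5%", "<10k: 15%", "<10k: 25%"),
  estimate  = c(0, -0.0139987267, -0.0897702241, -0.2215066470),
  std.error = c(NA, 0.008367718, 0.009883554, 0.012497932)
)

# Normal-approximation 95% confidence limits
z <- qnorm(0.975)
amce_rows$lower <- amce_rows$estimate - z * amce_rows$std.error
amce_rows$upper <- amce_rows$estimate + z * amce_rows$std.error

# Flag intervals that exclude zero (NA for the reference level)
amce_rows$excludes_zero <- amce_rows$upper < 0 | amce_rows$lower > 0
amce_rows
```

Here the 5% rate's interval straddles zero, while the 15% and 25% rates are clearly negative relative to the 0% baseline.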

To provide simple subgroup analyses, the cj() function provides a by argument to iterate over subsets of data and calculate AMCEs or MMs on each subgroup. For example, we may want to ensure that there are no substantial variations in preferences within respondents across multiple conjoint decision tasks:

mm_by <- cj(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills, id = ~CaseID, estimate = "mm", by = ~contest_no)
plot(mm_by, group = "contest_no", vline = 0.5)

[Plot: marginal means by contest_no, with a reference line at 0.5]
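The by argument spares the analyst from writing a loop. For comparison, the following base-R sketch shows the split-apply-combine pattern it replaces, on hypothetical simulated data: split by the grouping variable, compute MMs within each subset, and stack the per-group results into one long data frame keyed by the group. It is not cregg's internal code.

```r
set.seed(3)
n <- 4000

# Hypothetical data: two contests, one feature, binary outcome
d <- data.frame(
  contest_no = rep(1:2, each = n / 2),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  chosen = rbinom(n, 1, 0.5)
)

# Compute MMs within each contest and stack them with a group column
by_group <- lapply(split(d, d$contest_no), function(sub) {
  m <- tapply(sub$chosen, sub$Gender, mean)
  data.frame(
    contest_no = sub$contest_no[1],
    level = names(m),
    estimate = as.vector(m)
  )
})
mm_stacked <- do.call(rbind, by_group)
mm_stacked
```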

A more formal test of these differences is provided by a nested model comparison test:

cj_anova(immigration, ChosenImmigrant ~ Gender + Education + LanguageSkills, by = ~contest_no)
Analysis of Deviance Table

Model 1: ChosenImmigrant ~ Gender + Education + LanguageSkills
Model 2: ChosenImmigrant ~ Gender + Education + LanguageSkills + contest_no + 
    Gender:contest_no + Education:contest_no + LanguageSkills:contest_no
  Resid. Df Resid. Dev Df Deviance      F Pr(>F)
1     13949     3353.0                          
2     13938     3349.6 11   3.3873 1.2814 0.2279

which provides a test of whether any of the interactions between the by variable and feature levels differ from zero.
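The logic of this nested comparison can be sketched in base R as a linear probability model analogue using lm() and anova(); cregg's own implementation may differ, for example in how it handles survey weights and clustering. The restricted model contains only the features, while the unrestricted model adds the grouping variable and its interaction with every feature. The F-test asks whether those extra terms jointly differ from zero. All data below are hypothetical.

```r
set.seed(5)
n <- 3000

# Hypothetical features, grouping variable, and binary outcome
d <- data.frame(
  A = factor(sample(c("a1", "a2", "a3"), n, replace = TRUE)),
  B = factor(sample(c("b1", "b2"), n, replace = TRUE)),
  group = factor(sample(c("g1", "g2"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, 0.4 + 0.1 * (d$A == "a3"))

# Restricted model: features only
m1 <- lm(y ~ A + B, data = d)

# Unrestricted model: (A + B) * group expands to
# A + B + group + A:group + B:group
m2 <- lm(y ~ (A + B) * group, data = d)

# F-test of the 1 + 2 + 1 = 4 additional parameters
cmp <- anova(m1, m2)
cmp
```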

Again, a detailed website showcasing package functionality is available at: https://thomasleeper.com/cregg/ and the content thereof is installed as a vignette. The package documentation provides further examples.

Installation


This package can be installed directly from CRAN. To install the latest development version you can pull from GitHub:

if (!require("remotes")) {
    install.packages("remotes")
}
remotes::install_github("leeper/cregg")

News

cregg 0.2.4

  • cj() now imposes class "cj_df" on data to preserve attributes during subsetting.

cregg 0.2.3

  • Added function cj_table(), which can be useful in communicating the set of features and levels used in the design as a data frame (e.g., using knitr::kable(cj_table(data, ~ feat1 + feat2))).
  • Renamed functions props() to cj_props() and freqs() to cj_freqs() for API consistency.

cregg 0.2.2

  • Added function cj_df(), which provides a modified data frame class ("cj_df") that preserves variable "label" attributes when subsetting.
  • Built-in datasets immigration and taxes gain a "cj_df" class.
  • cj_tidy() now returns objects of class c("cj_df", "data.frame").

cregg 0.2.1

  • Added function cj_tidy() to tidy a "wide" respondent-length conjoint dataset into a "long" respondent-task-alternative-length dataset. An example dataset, wide_conjoint, is provided for examples and testing.

cregg 0.2.0

  • First stable release.
  • Completed functionality of amce_diffs(), limiting it to work with unconstrained designs. (#6)
  • Added tests for accuracy of AMCEs in two-way constrained and fully unconstrained designs.

cregg 0.1.14

  • Added support for constrained designs (when two-way constraints are present). (#6)
  • Removed margins dependency, leaving only linear probability model support.

cregg 0.1.13

  • Added another example dataset, taxes, from Ballard-Rosa et al. (2016).
  • Renamed hainmueller dataset to immigration.

cregg 0.1.12

  • Expanded test suite to cover survey-weighted data. Note: cj_anova() currently does not work with weighted data due to a bug in survey::anova.svyglm().
  • Cleaned up internal code for consistency.
  • Added 'statistic' column to function outputs.

cregg 0.1.11

  • Variances returned by amce_diffs() now respect clustering. (#9)

cregg 0.1.10

  • Fixed a factor ordering issue in mm_diffs().
  • Added tests of numeric accuracy of estimates for all main functions.

cregg 0.1.9

  • Added new function mm_diffs() for calculating differences in marginal means.

cregg 0.1.8

  • mm() gains an h0 argument to specify a null hypothesis value so that z statistics and p-values are meaningful.
  • Cleaned up documentation and expanded 'Introduction' vignette, moving README content to there.
  • Require survey version 0.33 (for family argument).
  • Added a basic test suite. (#4)

cregg 0.1.7

  • Added amce_diffs() and amce_anova() functions to assess differences in AMCEs by a grouping variable.
  • Removed some grouping warnings from plot() methods. (#8)

cregg 0.1.6

  • Fixed a bug in the creation of svydesign() objects that was generating incorrect variance estimates.
  • Fully imported ggplot2 and ggstance.

cregg 0.1.5

  • Added props() function to calculate display proportions for features or combinations of features (e.g., for examining constrained designs). Updated documentation accordingly. (#2)
  • Expanded Introduction vignette with examples of a number of diagnostics. (#2)

cregg 0.1.4

  • Added amce_by_reference() function to examine sensitivity of results to choice of reference category. (#2)

cregg 0.1.3

  • Changed the level argument to alpha to avoid ambiguity with "levels" in the "feature level" sense used in the package (as opposed to the intended alpha or significance level).
  • Added a level_order argument to freqs(), mm(), and amce() that specifies whether feature levels are ordered ascending in the output or descending. This is mostly only useful for plotting to specify whether the levels within each feature should be ordered with lower factor levels at the top ("ascending") or at the bottom ("descending") of the plot. (#1)
  • cj() gains a by argument, which enables subgroup analyses, for example to investigate profile spillover effects or analyses by subsets of respondents. (#3)
  • Added vignettes: "Introduction" and "Reproducing Hainmueller et al. (2014)". The latter is a work in progress. (#7)

cregg 0.1.2

  • Changed name of freq() to freqs() and prefixed class names of return values from all functions with cj_*.
  • Added feature_order argument to all functions to regulate display order.
  • Fixed a bug in the handling of header_fmt in plot().
  • Updated README with example of freqs().

cregg 0.1.1

  • Initial release.


install.packages("cregg")

0.3.0 by Thomas J. Leeper, 9 months ago


https://github.com/leeper/cregg


Report a bug at https://github.com/leeper/cregg/issues


Browse source code at https://github.com/cran/cregg


Authors: Thomas J. Leeper [aut, cre]




MIT + file LICENSE license


Imports stats, sandwich, survey, lmtest, ggplot2, ggstance, scales

Suggests testthat, knitr, rmarkdown

