Plackett-Luce Models for Rankings

Functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) and Luce (1959, ISBN:0486441369). The standard Plackett-Luce model is generalized to accommodate ties of any order in the ranking. Partial rankings, in which only a subset of items is ranked in each ranking, are also accommodated in the implementation. Disconnected or weakly connected networks implied by the rankings may be handled by adding pseudo-rankings with a hypothetical item. Optionally, a multivariate normal prior may be set on the log-worth parameters, and ranker reliabilities may be incorporated as proposed by Raman and Joachims (2014). Maximum a posteriori estimation is used when priors are set. Methods are provided to estimate standard errors or quasi-standard errors for inference, as well as to fit Plackett-Luce trees. See the package website or vignette for further details.



Package website: https://hturner.github.io/PlackettLuce/.

Overview

The PlackettLuce package implements a generalization of the model jointly attributed to Plackett (1975) and Luce (1959) for modelling rankings data. Examples of rankings data might be the finishing order of competitors in a race, or the preference of consumers over a set of competing products.

The output of the model is an estimated worth for each item that appears in the rankings. The parameters are generally presented on the log scale for inference.
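As a rough illustration of how worths translate into ranking probabilities, the sketch below evaluates the standard Plackett-Luce probability of a complete ordering without ties in base R. The helper `pl_prob` and the worth values are hypothetical and are not part of the package; the generalization to ties used by PlackettLuce is not covered here.

```r
# Probability of a complete ordering under the standard Plackett-Luce model.
# `worth`: positive worth for each item; `ordering`: item indices, best first.
# (pl_prob is a hypothetical helper, not a PlackettLuce function.)
pl_prob <- function(worth, ordering) {
  w <- worth[ordering]
  # At each stage the chosen item competes against all items not yet placed,
  # so the denominator is the sum of the worths from that position onwards.
  remaining <- rev(cumsum(rev(w)))
  prod(w / remaining)
}

worth <- c(apples = 0.5, bananas = 0.3, pears = 0.2)  # hypothetical worths
pl_prob(worth, c(1, 2, 3))  # apples, then bananas, then pears: 0.3
```

Here the first choice has probability 0.5 / 1, the second 0.3 / 0.5 and the last 0.2 / 0.2, giving 0.3 overall.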

The implementation of the Plackett-Luce model in PlackettLuce:

  • Accommodates ties (of any order) in the rankings, e.g. bananas ≻ {apples, oranges} ≻ pears.
  • Accommodates sub-rankings, e.g. pears ≻ apples, when the full set of items is {apples, bananas, oranges, pears}.
  • Handles disconnected or weakly connected networks implied by the rankings, e.g. where one item always loses as in figure below. This is achieved by adding pseudo-rankings with a hypothetical or ghost item.
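The fruit examples in the list above might be encoded as follows. This is a minimal sketch that assumes a matrix with one column per item, ranks as values, equal ranks for tied items and 0 for items that are not ranked; the item order and the matrix `M` are made up for illustration.

```r
library(PlackettLuce)

# One column per item; values are ranks, 0 marks an item that is not ranked.
items <- c("apples", "bananas", "oranges", "pears")
M <- matrix(c(2, 1, 2, 3,   # bananas first, apples and oranges tied, pears last
              2, 0, 0, 1),  # sub-ranking: pears preferred to apples
            nrow = 2, byrow = TRUE, dimnames = list(NULL, items))

R <- as.rankings(M)
print(R)

# Inspect the network of wins and losses implied by the rankings.
# Bananas never loses here, so the network is not strongly connected.
A <- adjacency(R)
conn <- connectivity(A)
```

With rankings like these, where one item never loses, the pseudo-rankings described above are what keep the worth estimates finite.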


In addition the package provides methods for

  • Obtaining quasi-standard errors, which do not depend on the constraints applied to the worth parameters for identifiability.
  • Fitting Plackett-Luce trees, i.e. a tree that partitions the rankings by covariate values, such as consumer attributes or racing conditions, identifying subgroups with different sets of worth parameters for the items.

Installation

The package may be installed from CRAN via

install.packages("PlackettLuce")

The development version can be installed via

# install.packages("devtools")
devtools::install_github("hturner/PlackettLuce")

Usage

The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To facilitate this, Netflix released movie ratings from users of the system; these ratings have since been transformed to preference data and are available from PrefLib. Each data set comprises rankings of a set of 3 or 4 movies selected at random. Here we consider rankings for just one set of movies to illustrate the functionality of PlackettLuce.

The data can be read in using the read.soc function in PlackettLuce

library(PlackettLuce)
preflib <- "http://www.preflib.org/data/election/"
netflix <- read.soc(file.path(preflib, "netflix/ED-00004-00000138.soc"))
head(netflix, 2)
##    n Rank 1 Rank 2 Rank 3 Rank 4
## 1 68      2      1      4      3
## 2 53      1      2      4      3

Each row corresponds to a unique ordering of the four movies in this data set. The number of Netflix users that assigned that ordering is given in the first column, followed by the four movies in preference order. So for example, 68 users ranked movie 2 first, followed by movie 1, then movie 4 and finally movie 3.

PlackettLuce, the model-fitting function in PlackettLuce, requires that the data are provided in the form of rankings rather than orderings, i.e. the rankings are expressed by giving the rank for each item, rather than ordering the items. We can create a "rankings" object from a set of orderings as follows

R <- as.rankings(netflix[,-1], input = "ordering")
colnames(R) <- attr(netflix, "item")
R[1:3, as.rankings = FALSE]
##   Mean Girls Beverly Hills Cop The Mummy Returns Mission: Impossible II
## 1          2                 1                 4                      3
## 2          1                 2                 4                      3
## 3          2                 1                 3                      4

Note that read.soc saved the names of the movies in the "item" attribute of netflix, so we have used these to label the items. Subsetting the rankings object R with as.rankings = FALSE returns the underlying matrix of rankings corresponding to the subset. So for example, in the first ranking the second movie (Beverly Hills Cop) is ranked number 1, followed by the first movie (Mean Girls) with rank 2, followed by the fourth movie (Mission: Impossible II) and finally the third movie (The Mummy Returns), giving the same ordering as in the original data.
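The relationship between an ordering and a ranking can be sketched in base R. For a complete ordering without ties, the rank of each item is the position at which it appears, i.e. the inverse permutation. The ordering below is hypothetical and is not taken from the Netflix data.

```r
# An ordering lists item indices from most to least preferred.
ordering <- c(3, 1, 4, 2)  # hypothetical: item 3 first, then items 1, 4 and 2

# The ranking gives, for each item in turn, the rank it received.
# For a complete ordering without ties this is the inverse permutation.
ranking <- order(ordering)
ranking  # 2 4 1 3: item 1 is ranked 2nd, item 2 is ranked 4th, and so on
```

This shortcut only holds for complete orderings without ties; as.rankings is the appropriate tool for the general case.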

Various methods are provided for "rankings" objects. In particular, if we subset the rankings without as.rankings = FALSE, the result is again a "rankings" object and the corresponding print method is used:

R[1:3]
##                                          1 
## "Beverly Hills Cop > Mean Girls > Mis ..." 
##                                          2 
## "Mean Girls > Beverly Hills Cop > Mis ..." 
##                                          3 
## "Beverly Hills Cop > Mean Girls > The ..."
print(R[1:3], width = 60)
##                                                              1 
## "Beverly Hills Cop > Mean Girls > Mission: Impossible II  ..." 
##                                                              2 
## "Mean Girls > Beverly Hills Cop > Mission: Impossible II  ..." 
##                                                              3 
## "Beverly Hills Cop > Mean Girls > The Mummy Returns > Mis ..."

The rankings can now be passed to PlackettLuce to fit the Plackett-Luce model. The counts of each ranking provided in the downloaded data are used as weights when fitting the model.

mod <- PlackettLuce(R, weights = netflix$n)
coef(mod, log = FALSE)
##             Mean Girls      Beverly Hills Cop      The Mummy Returns 
##              0.2306285              0.4510655              0.1684719 
## Mission: Impossible II 
##              0.1498342

Calling coef with log = FALSE gives the worth parameters, constrained to sum to one. These parameters represent the probability that each movie is ranked first.

For inference these parameters are converted to the log scale, by default setting the first parameter to zero so that the standard errors are estimable:

summary(mod)
## Call: PlackettLuce(rankings = R, weights = netflix$n)
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## Mean Girls              0.00000         NA      NA       NA    
## Beverly Hills Cop       0.67080    0.06099  10.999  < 2e-16 ***
## The Mummy Returns      -0.31404    0.06465  -4.857 1.19e-06 ***
## Mission: Impossible II -0.43128    0.06508  -6.627 3.42e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  3493.5  on  3525  degrees of freedom
## AIC:  3499.5 
## Number of iterations: 5

In this way, Mean Girls is treated as the reference movie. The positive parameter for Beverly Hills Cop shows that it was more popular among the users, while the negative parameters for the other two movies show that they were less popular.
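The link between the two parameterizations can be checked by hand. The base R sketch below takes the worths printed by coef(mod, log = FALSE) above, typed in as literals, and re-expresses them as log-worths relative to Mean Girls; it does not call the package.

```r
# Worths as printed by coef(mod, log = FALSE) above, entered by hand.
worth <- c("Mean Girls" = 0.2306285,
           "Beverly Hills Cop" = 0.4510655,
           "The Mummy Returns" = 0.1684719,
           "Mission: Impossible II" = 0.1498342)

# Log-worths with the first item as reference (its value is fixed at zero).
log_worth <- log(worth) - log(worth[["Mean Girls"]])
round(log_worth, 3)

# Back-transform: exponentiate and rescale so the worths sum to one.
recovered <- exp(log_worth) / sum(exp(log_worth))
```

Up to rounding, the resulting values agree with the Estimate column of the summary output.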

Comparisons between different pairs of movies can be made visually by plotting the log-worth parameters with comparison intervals based on quasi standard errors.

qv <- qvcalc(mod)
plot(qv, ylab = "Worth (log)", main = NULL)

If the intervals overlap, there is no significant difference. So we can see that Beverly Hills Cop is significantly more popular than the other three movies, and Mean Girls is significantly more popular than The Mummy Returns or Mission: Impossible II, but there was no significant difference in users’ preference for these last two movies.

Going Further

The full functionality of PlackettLuce is illustrated in the package vignette, along with details of the model used in the package and a comparison to other packages. The vignette can be found on the package website or from within R once the package has been installed, e.g. via

vignette("Overview", package = "PlackettLuce")

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

References

Luce, R. Duncan. 1959. Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.

Plackett, Robert L. 1975. “The Analysis of Permutations.” Appl. Statist. 24 (2): 193–202. https://doi.org/10.2307/2346567.

News

PlackettLuce 0.2-3

Improvements

  • Print methods for "PlackettLuce" and "summary.PlackettLuce" objects now respect options("width").

Changes in Behaviour

  • fitted now always returns n, which is now the weighted count of rankings (previously the unweighted count was returned, and only with argument aggregate = TRUE).

Bug fixes

  • Correct vcov for weighted rankings of more than two items.
  • Enable AIC.pltree to work on "pltree" object with one node.

PlackettLuce 0.2-2

New features

  • Add AIC.pltree to enable computation of AIC on new observations (e.g. data held out in cross-validation).
  • Add fitted.pltree to return combined fitted probabilities for each choice within each ranking, for each node in a Plackett-Luce tree.

Bug fixes

  • vcov.PlackettLuce now works for models with non-integer weights (fixes #25).
  • plot.pltree now works for worth = TRUE with psychotree version 0.15-2 (currently pre-release on https://r-forge.r-project.org/R/?group_id=330)
  • PlackettLuce and plfit now work when start argument is set.
  • itempar.PlackettLuce now works with alias = FALSE

PlackettLuce 0.2-1

New features

  • Add pkgdown site.
  • Add content to README (fixes #5).
  • Add plot.PlackettLuce method so that plotting works for a saved "PlackettLuce" object

Improvements

  • Improved vignette, particularly example based on beans data (which has been updated).
  • Improved help files particularly ?PlackettLuce and new package?PlackettLuce. (Fixes #14 and #21).

Changes in behaviour

  • maxit defaults to 500 in PlackettLuce.
  • Steffensen acceleration only applied in iterations where it will increase the log-likelihood (still only attempted once iterations have reached a solution that is "close enough" as specified by steffensen argument).

Bug fixes

  • coef.pltree() now respects log = TRUE argument (fixes #19).
  • Fix bug causing lack of convergence with iterative scaling plus pseudo-rankings.
  • [.grouped_rankings now works for replicated indices.

PlackettLuce 0.2-0

New Features

  • Add vignette.
  • Add data sets pudding, nascar and beans.
  • Add pltree() function for use with partykit::mob(). Requires new objects of type "grouped_rankings" that add a grouping index to a "rankings" object and store other derived objects used by PlackettLuce. Methods to print, plot and predict from Plackett-Luce tree are provided.
  • Add connectivity() function to check connectivity of a network given adjacency matrix. New adjacency() function computes adjacency matrix without creating edgelist, so remove as.edgelist generic and method for "PlackettLuce" objects.
  • Add as.data.frame methods so that rankings and grouped rankings can be added to model frames.
  • Add format methods for rankings and grouped_rankings, for pretty printing.
  • Add [ methods for rankings and grouped_rankings, to create valid rankings from selected rankings and/or items.
  • Add method argument to offer choices of iterative scaling (default), or direct maximisation of the likelihood via BFGS or L-BFGS.
  • Add itempar method for "PlackettLuce" objects to obtain different parameterizations of the worth parameters.
  • Add read.soc function to read Strict Orders - Complete List (.soc) files from http://www.preflib.org.

Changes in behaviour

Old behaviour should be reproducible with arguments

npseudo = 0, steffensen = 0, start = c(rep(1/N, N), rep(0.1, D))

where N is number of items and D is maximum order of ties.

  • Implement pseudo-data approach - now used by default.
  • Improve starting values for ability parameters
  • Add Steffensen acceleration to iterative scaling algorithm
  • Dropped ref argument from PlackettLuce; should be specified instead when calling coef, summary, vcov or itempar.
  • qvcalc generic now imported from qvcalc

Improvements

  • Refactor code to speed up model fitting and computation of fitted values and vcov.
  • Implement ranking weights and starting values in PlackettLuce.
  • Add package tests
  • Add log argument to coef so that worth parameters (probability of coming first in strict ranking of all items) can be obtained easily.

PlackettLuce 0.1-0

  • GitHub-only release of prototype package.


0.2-9 by Heather Turner, a month ago


https://hturner.github.io/PlackettLuce/


Report a bug at https://github.com/hturner/PlackettLuce/issues


Browse source code at https://github.com/cran/PlackettLuce


Authors: Heather Turner [aut, cre] , Ioannis Kosmidis [aut] , David Firth [aut] , Jacob van Etten [ctb]




GPL-3 license


Imports Matrix, igraph, methods, partykit, psychotools, psychotree, RSpectra, qvcalc, sandwich, stats

Suggests BiocStyle, BayesMallows, BradleyTerry2, BradleyTerryScalable, Matrix.utils, PLMIX, ROlogit, StatRank, covr, hyper2, kableExtra, knitr, lbfgs, gnm, pmr, rmarkdown, testthat


Imported by ClimMobTools, PLMIX.

