Pseudo-Ranks

Efficient calculation of pseudo-ranks and (pseudo)-rank based test statistics. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. For details, see Brunner, E., Bathke A. C. and Konietschke, F: Rank- and Pseudo-Rank Procedures in Factorial Designs - Using R and SAS, Springer Verlag, to appear.


CRANstatus Travis-CI Build Status Build status codecov Codacy Badge

This R package provides a function written in C++ to calculate pseudo-ranks in R and some rank statistics which can opionally use pseudo-ranks instead of ranks. For a definition and discussion of pseudo-ranks, see for example

Brunner, E., Bathke A. C. and Konietschke, F: Rank- and Pseudo-Rank Procedures in Factorial Designs - Using R and SAS, Springer Verlag, to appear.

To install the current development version from github:

## install devtools package
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# install package
devtools::install_github("happma/pseudorank")
library(pseudorank)

Calculating Pseudo-Ranks

The function 'pseudorank' can either be used with data.frames or with vectors. Please note that when using a data.frame only one grouping factor can be used.

# some example data
df <- data.frame(data =rnorm(100),group = c(rep(1,40),rep(2,40),rep(3,20)))
df$group <- as.factor(df$group)
 
# two ways to calculate pseudo-ranks
# Variant 1: use a vector for the data and a group vector
pseudorank(df$data, df$group)
 
# Variant 2: use a formual object, Note that only one group factor can be used
# that is, in data~group*group2 only 'group' will be used
pseudorank(data ~ group, df)

Similarly to the 'rank' function from base R, there are several different methods which can be used to handle ties in the data. Most nonparametric tests rely on so-called mid-(pseudo)-ranks, these are just the average of minimum and maximum (pseudo)-ranks.

# some example data
df <- data.frame(data =rnorm(100),group = c(rep(1,40),rep(2,40),rep(3,20)))
df$group <- as.factor(df$group)
 
# min and max pseudo-ranks
m <- pseudorank(df$data ,df$group, ties.method = "min")
M <- pseudorank(df$data ,df$group, ties.method = "max")
 
mid <- pseudorank(df$data ,df$group, ties.method = "average") # identical to (m+M)/2
 

In case of missing values there are three options to choose from. These are the same as for the function 'rank' or 'sort' from base R. It is recommended to use 'na.last = NA' to remove the NAs. If the NAs are kept, they can either be put at the beginning or the end of your data, then the pseudo-ranks from those NAs depend on the order they appear in the data. The order does not matter only if the groups containing missing values have the same sample size. See the following R Code for an example of this problem where observation 1 and 4 are interchanged. Here, the pseudo-ranks for those two observations are different, all other pseudo-ranks remain unchanged.

data = c(NA,7,1,NA,3,3,5.5,6,NA, 3, 1)
group =  as.factor(c(1,1,1,2,2,2,3,3,3,1,3))
 
# Variant 1: keep NAs and put them last
pseudorank(data, group, na.last = TRUE)
 
# we change the order of observation 1 and 4 (both NAs)
group =  as.factor(c(2,1,1,1,2,2,3,3,3,1,3))
pseudorank(data, group, na.last = TRUE)
 
# Variant 2: keep NAs and put them first
pseudorank(data, group, na.last = FALSE)
 
# Variant 3: remove Nas (recommended)
pseudorank(data, group, na.last = NA)
 

Hettmansperger-Norton Test for Patterned Alternatives in k-Sample Problems

The test implemented in this package uses pseudo-ranks instead of ranks. This is mainly due to paradoxical results caused by ranks. See

Brunner, E., Konietschke, F., Bathke, A. C., & Pauly, M. (2018). Ranks and Pseudo-Ranks-Paradoxical Results of Rank Tests. arXiv preprint arXiv:1802.05650.

for a discussion of this problem.

# create some data, please note that the group factor needs to be ordered
df <- data.frame(data = c(rnorm(40, 3, 1), rnorm(40, 2, 1), rnorm(20, 1, 1)),
  group = c(rep(1,40),rep(2,40),rep(3,20)))
df$group <- factor(df$group, ordered = TRUE)
 
# you can either test for a decreasing, increasing or custom trend
hettmansperger_norton_test(df$data, df$group, alternative="decreasing")
hettmansperger_norton_test(df$data, df$group, alternative="increasing")
hettmansperger_norton_test(df$data, df$group, alternative="custom", trend = c(1, 3, 2))
 

Kruskal-Wallis Test with Pseudo-Ranks

The Kruskal-Wallis test implemented in this package can use pseudo-ranks, if the argument 'pseudoranks = TRUE' is used.

# create some artificial data
x = c(1, 1, 1, 1, 2, 3, 4, 5, 6)
grp = as.factor(c('A','A','B','B','B','D','D','D','D'))
df = data.frame(x = x, grp = grp)
 
kruskal_wallis_test(x~grp, data=df, pseudoranks=TRUE)

News

pseudorank 0.3.7

  • Changed function name ''psrank'' to ''pseudorank'' and set ''psrank'' as deprecated.
  • Added a data set which can be used to show paradoxical results from rank tests.

pseudorank 0.3.0

  • Added the Kruskal-Wallis test with an option to use either ranks or pseudo-ranks.
  • Added an option to use the Hettmansperger-Norton test with ranks.
  • Fixed a bug in the computation of maximum pseudo-ranks.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("pseudorank")

0.3.8 by Martin Happ, a year ago


http://github.com/happma/pseudorank


Report a bug at http://github.com/happma/pseudorank/issues


Browse source code at https://github.com/cran/pseudorank


Authors: Martin Happ [aut, cre] , Georg Zimmermann [aut] , Arne C. Bathke [aut] , Edgar Brunner [aut]


Documentation:   PDF Manual  


GPL-3 license


Imports Rcpp, doBy

Suggests testthat

Linking to Rcpp

System requirements: C++11


Imported by HRM.


See at CRAN