Efficient calculation of pseudo-ranks and (pseudo)-rank based test statistics. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. For details, see Brunner, E., Bathke A. C. and Konietschke, F: Rank- and Pseudo-Rank Procedures in Factorial Designs - Using R and SAS, Springer Verlag, to appear.
This R package provides a function written in C++ to calculate pseudo-ranks in R and some rank statistics which can opionally use pseudo-ranks instead of ranks. For a definition and discussion of pseudo-ranks, see for example
Brunner, E., Bathke A. C. and Konietschke, F: Rank- and Pseudo-Rank Procedures in Factorial Designs - Using R and SAS, Springer Verlag, to appear.
To install the current development version from github:
## install devtools packageif (!requireNamespace("devtools", quietly = TRUE)) {install.packages("devtools")}# install packagedevtools::install_github("happma/pseudorank")library(pseudorank)
The function 'pseudorank' can either be used with data.frames or with vectors. Please note that when using a data.frame only one grouping factor can be used.
# some example datadf <- data.frame(data =rnorm(100),group = c(rep(1,40),rep(2,40),rep(3,20)))df$group <- as.factor(df$group)# two ways to calculate pseudo-ranks# Variant 1: use a vector for the data and a group vectorpseudorank(df$data, df$group)# Variant 2: use a formual object, Note that only one group factor can be used# that is, in data~group*group2 only 'group' will be usedpseudorank(data ~ group, df)
Similarly to the 'rank' function from base R, there are several different methods which can be used to handle ties in the data. Most nonparametric tests rely on so-called mid-(pseudo)-ranks, these are just the average of minimum and maximum (pseudo)-ranks.
# some example datadf <- data.frame(data =rnorm(100),group = c(rep(1,40),rep(2,40),rep(3,20)))df$group <- as.factor(df$group)# min and max pseudo-ranksm <- pseudorank(df$data ,df$group, ties.method = "min")M <- pseudorank(df$data ,df$group, ties.method = "max")mid <- pseudorank(df$data ,df$group, ties.method = "average") # identical to (m+M)/2
In case of missing values there are three options to choose from. These are the same as for the function 'rank' or 'sort' from base R. It is recommended to use 'na.last = NA' to remove the NAs. If the NAs are kept, they can either be put at the beginning or the end of your data, then the pseudo-ranks from those NAs depend on the order they appear in the data. The order does not matter only if the groups containing missing values have the same sample size. See the following R Code for an example of this problem where observation 1 and 4 are interchanged. Here, the pseudo-ranks for those two observations are different, all other pseudo-ranks remain unchanged.
data = c(NA,7,1,NA,3,3,5.5,6,NA, 3, 1)group = as.factor(c(1,1,1,2,2,2,3,3,3,1,3))# Variant 1: keep NAs and put them lastpseudorank(data, group, na.last = TRUE)# we change the order of observation 1 and 4 (both NAs)group = as.factor(c(2,1,1,1,2,2,3,3,3,1,3))pseudorank(data, group, na.last = TRUE)# Variant 2: keep NAs and put them firstpseudorank(data, group, na.last = FALSE)# Variant 3: remove Nas (recommended)pseudorank(data, group, na.last = NA)
The test implemented in this package uses pseudo-ranks instead of ranks. This is mainly due to paradoxical results caused by ranks. See
Brunner, E., Konietschke, F., Bathke, A. C., & Pauly, M. (2018). Ranks and Pseudo-Ranks-Paradoxical Results of Rank Tests. arXiv preprint arXiv:1802.05650.
for a discussion of this problem.
# create some data, please note that the group factor needs to be ordereddf <- data.frame(data = c(rnorm(40, 3, 1), rnorm(40, 2, 1), rnorm(20, 1, 1)),group = c(rep(1,40),rep(2,40),rep(3,20)))df$group <- factor(df$group, ordered = TRUE)# you can either test for a decreasing, increasing or custom trendhettmansperger_norton_test(df$data, df$group, alternative="decreasing")hettmansperger_norton_test(df$data, df$group, alternative="increasing")hettmansperger_norton_test(df$data, df$group, alternative="custom", trend = c(1, 3, 2))
The Kruskal-Wallis test implemented in this package can use pseudo-ranks, if the argument 'pseudoranks = TRUE' is used.
# create some artificial datax = c(1, 1, 1, 1, 2, 3, 4, 5, 6)grp = as.factor(c('A','A','B','B','B','D','D','D','D'))df = data.frame(x = x, grp = grp)kruskal_wallis_test(x~grp, data=df, pseudoranks=TRUE)