Statistical Methods for Analyzing Clustered Matched Pair Data
Tests, utilities, and case studies for analyzing significance in
clustered binary matched-pair data. The central function clust.bin.pair uses
one of several tests to calculate a Chi-square statistic. Implemented are the
tests Eliasziw (1991) <10.1002>, Obuchowski (1998),
Durkalski (2003) <10.1002>, and Yang (2010)
<10.1002> with McNemar (1947) <10.1007>
included for comparison. The utility functions nested.to.contingency and
paired.to.contingency convert data between various useful formats. Thyroids
and psychiatry are the canonical datasets from Obuchowski and Petryshen (1989), respectively.
Statistical tools for analyzing clustered binary matched-pair data in R.
Clustered Binary Matched-Pair
The tests and tools in this package work primarily on clustered binary matched-pair data. To be a good fit for these tools, data should have the following three properties:
- Clustered (aka correlated, non-independent): Multiple observations drawn from the same unit, so that observations within a cluster tend to be more alike than observations from different clusters.
  - e.g. Measurements of multiple teeth from each of several dental patients. The teeth of one patient are more likely to be similar than the teeth of different patients.
- Binary (aka dichotomous): Results that can take only two discrete values.
  - e.g. Values like true/false, yes/no, success/failure, missing/present, etc.
- Matched-pair: Data points that come in pairs, often from successive trials in a repeated-measures experiment or from measuring two different but related sources.
  - e.g. Eyes measured before and after surgery, or the opinions of a doctor and her patient on the patient's progress.
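As a rough illustration of these three properties, the sketch below builds a small made-up data set in base R and tabulates the four possible pair outcomes within each cluster. The column names (`patient`, `before`, `after`) and all values are hypothetical, and the tabulation uses only base R rather than the package's own conversion utilities.

```r
# Hypothetical data: one row per matched pair (e.g. one tooth),
# several pairs per cluster (e.g. one patient).
teeth <- data.frame(
  patient = c(1, 1, 1, 2, 2, 3, 3, 3, 3),  # cluster identifier
  before  = c(1, 1, 0, 0, 1, 1, 0, 0, 1),  # binary response, first member of the pair
  after   = c(1, 0, 0, 1, 1, 1, 0, 1, 0)   # binary response, second member of the pair
)

# Label each pair with its outcome combination (before, after).
outcome <- factor(
  paste0(teeth$before, teeth$after),
  levels = c("11", "10", "01", "00")
)

# One row per cluster, one column per outcome combination.
per_cluster <- table(patient = teeth$patient, outcome = outcome)
print(per_cluster)
```

The "10" and "01" columns hold the discordant pairs, which are the counts that matched-pair tests are driven by.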
This package contains five statistical tests for analyzing clustered binary matched-pair data in various contexts. Four of the tests are designed specifically for this type of data. The fifth, McNemar's test, is the conceptual predecessor of the other four and is included for comparison only, since it assumes independent pairs and is specifically noted to be unsuitable for clustered data. The tests are listed below, along with the articles that introduced them:
McNemar, Quinn. 1947. "Note on the sampling error of the difference between correlated proportions or percentages." Psychometrika.
Eliasziw, Michael, and Allan Donner. 1991. "Application of the McNemar test to non‐independent matched pair data." Statistics in Medicine.
Obuchowski, Nancy A. 1998. "On the comparison of correlated proportions for clustered data." Statistics in Medicine.
Durkalski, Valerie L., Yuko Y. Palesch, Stuart R. Lipsitz, and Philip F. Rust. 2003. "Analysis of clustered matched‐pair data." Statistics in Medicine.
Yang, Zhao, Xuezheng Sun, and James W. Hardin. 2010. "A note on the tests for clustered matched‐pair binary data." Biometrical Journal.
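For comparison, McNemar's test is also available in base R as `stats::mcnemar.test`. The sketch below applies it to a made-up pooled 2x2 table (the counts are hypothetical) and checks the result against the textbook statistic (b - c)^2 / (b + c), where b and c are the two discordant counts. Pooling pairs across clusters like this ignores the clustering, which is the limitation the other four tests are meant to address.

```r
# Hypothetical counts pooled over all clusters.
pooled <- matrix(
  c(30, 12,
     5, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(first = c("yes", "no"), second = c("yes", "no"))
)

# McNemar's test without continuity correction (base R, stats package).
res <- mcnemar.test(pooled, correct = FALSE)

# The same statistic computed by hand from the discordant cells.
b_count <- pooled["yes", "no"]
c_count <- pooled["no", "yes"]
manual  <- (b_count - c_count)^2 / (b_count + c_count)

print(res$statistic)
print(manual)
```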
Sample data from real-world experiments that can benefit from these tests are included:
- Obfuscation: Programmers were asked to hand-evaluate pairs of obfuscated and deobfuscated snippets of C source code. The data is tested to see whether or not programmers trace deobfuscated code any differently than obfuscated code.
- Psychiatry: Psychiatrists and their patients were asked to evaluate the applicability of various concerns and treatments to the patient. The data is tested to see how well patient and doctor perceptions align.
- Thyroids: Hyperparathyroidism patients were scanned using both PET and SPECT tests. The data is tested to evaluate the sensitivity and specificity of the two tomography tests.
Descriptions of the functions, along with usage examples, are available in the reference manual.
Installation and Use
You can install the latest release from CRAN:
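Assuming the package is published on CRAN under the same name as its central function, `clust.bin.pair`, the command would be:

```r
# Package name assumed to match the central function named above.
install.packages("clust.bin.pair")
```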
To use, load as follows:
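Assuming the package name matches its central function, `clust.bin.pair`, loading would look like this:

```r
# Attach the package (name assumed to match the central function).
library(clust.bin.pair)
```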