Risk-related information (like the prevalence of conditions and the sensitivity and specificity of diagnostic tests or treatment decisions) can be expressed in terms of probabilities or frequencies. By providing a toolbox of methods and metrics, 'riskyr' computes, translates, and visualizes risk-related information in a variety of ways. Offering multiple complementary perspectives on the interplay between key parameters renders teaching and training of risk literacy more transparent.
The goals of riskyr are less of a computational and more of a representational nature: We express risk-related information in multiple formats, facilitate the translation between them, and provide a variety of attractive visualizations that emphasize different aspects of risk-related scenarios. Whereas people find it difficult to understand and compute information expressed in terms of probabilities, the same information is easier to understand and compute when expressed in terms of frequencies (e.g., Gigerenzer, 2002, 2014; Gigerenzer & Hoffrage, 1995). But rather than just expressing probabilities in terms of frequencies, riskyr allows translating between formats and illustrates the relationships between different representations in a variety of ways. Switching between and interacting with different representations fosters transparency and boosts human understanding of risk-related information.[2]
Basic assumptions and goals driving the current development of riskyr include:
- Effective training in risk literacy requires transparent representations, smart strategies, and simple tools.
- We aim to provide a set of (computational and representational) tools that facilitate various calculations, translations between formats, and a range of alternative views on the interplay between probabilities and frequencies.
- Just as no single tool fits all tasks, no single graph illustrates all aspects of a problem. A variety of visualizations that illustrate the interplay of parameters and metrics can facilitate active and explorative learning. It is particularly helpful to view relationships from alternative perspectives and to observe the change of one parameter as a function of others.
Based on these assumptions and goals, we provide a range of computational and representational tools. Importantly, the objects and functions in the riskyr toolbox are not isolated, but complement, explain, and support each other. All functions and visualizations can also be used separately and explored interactively, providing immediate feedback on the effect of changes in parameter values. A variety of customization options lets users explore and design representations of risk-related information that suit their personal needs and goals.
riskyr is available from CRAN at https://CRAN.R-project.org/package=riskyr:
install.packages("riskyr") # install riskyr from CRAN client
library("riskyr") # load to use the package
The current development version can be installed from its GitHub repository at https://github.com/hneth/riskyr/:
# install.packages("devtools")
devtools::install_github("hneth/riskyr")
An interactive online version is available at http://riskyr.org.
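The package documentation is available online:
- Release version: https://hneth.github.io/riskyr
- Development version: https://hneth.github.io/riskyr/dev
riskyr is designed to address problems like the following:[3]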
Screening for hustosis
A screening device for detecting the clinical condition of hustosis is developed. The current device is very good, but not perfect. We have the following information:
- About 4% of the people of the general population suffer from hustosis.
- If someone suffers from hustosis, there is a chance of 80% that he or she will test positively for the condition.
- If someone is free from hustosis, there is a chance of 5% that he or she will still test positively for the condition.
Mr. and Ms. Smith have both been screened with the device:
- Mr. Smith tested positively (i.e., received a diagnosis of hustosis).
- Ms. Smith tested negatively (i.e., was judged to be free of hustosis).
Please answer the following questions:
- What is the probability that Mr. Smith actually suffers from hustosis?
- What is the probability that Ms. Smith is actually free of hustosis?
The first challenge in solving such problems is in understanding the information that is being provided. The problem description provides three essential probabilities:
- The condition's prevalence (in the general population) is 4%: prev = .04.
- The test's sensitivity is 80%: sens = .80.
- The test's false alarm rate is 5%: fart = .05, implying a specificity of (100% − 5%) = 95%: spec = .95.
The second challenge here lies in understanding the questions that are being asked, and in realizing that their answers are not simply the decision's sensitivity or specificity values. Instead, we are asked to provide two conditional probabilities:
- The conditional probability of suffering from the condition given a positive test result, i.e., the positive predictive value (PPV).
- The conditional probability of being free of the condition given a negative test result, i.e., the negative predictive value (NPV).
One of the best tricks in risk literacy education is to translate probabilistic information into frequencies.[4] To do this, we imagine a representative sample of N = 1000 individuals. Rather than asking about the probabilities for Mr. and Ms. Smith, we could re-frame the questions as:
Assuming a representative sample of 1000 individuals:
- What proportion of individuals with a positive test result actually suffer from hustosis?
- What proportion of individuals with a negative test result are actually free of hustosis?
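The frequency translation can be sketched in a few lines of base R. This is only an illustration of the arithmetic and uses no riskyr functions; the variable names (n_true, n_hi, and so on) are chosen here for readability and are not part of the package:

```r
# Problem parameters (from the hustosis description):
N    <- 1000   # size of the imagined representative sample
prev <- .04    # prevalence
sens <- .80    # sensitivity
spec <- .95    # specificity (1 - false alarm rate of .05)

# Split the sample by condition, then by test result:
n_true  <- N * prev        # people with hustosis
n_false <- N - n_true      # people without hustosis
n_hi <- n_true * sens      # hits (true positives)
n_mi <- n_true - n_hi      # misses (false negatives)
n_cr <- n_false * spec     # correct rejections (true negatives)
n_fa <- n_false - n_cr     # false alarms (false positives)

# Proportions asked for in the two re-framed questions:
ppv <- n_hi / (n_hi + n_fa)  # share of positive tests that are true cases
npv <- n_cr / (n_cr + n_mi)  # share of negative tests that are truly free

print(round(c(hi = n_hi, mi = n_mi, fa = n_fa, cr = n_cr)))
print(round(c(PPV = ppv, NPV = npv), 3))
```

Counting people in the four subgroups first, and forming ratios only at the end, is what makes the frequency format easier to follow than applying Bayes' theorem to the probabilities directly.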
Here is how riskyr allows you to view and solve such problems:
library(riskyr) # loads the package
We define a new riskyr scenario (called hustosis) with the information provided by our problem:
hustosis <- riskyr(scen_lbl = "Example",
                   cond_lbl = "Hustosis",
                   dec_lbl = "Screening test",
                   popu_lbl = "Sample",
                   N = 1000,  # population size
                   prev = .04, sens = .80, spec = (1 - .05)  # 3 probabilities
                   )
By providing the argument N = 1000 we define the scenario for a target population of 1000 people. If we leave this parameter unspecified (or NA), riskyr will automatically pick a suitable value of N.
To obtain a quick overview of key parameter values, we ask for the summary of hustosis:
summary(hustosis) # summarizes key parameter values:
The summary distinguishes between probabilities, frequencies, and accuracy information. In the Probabilities section we find the answers to both of our questions, taking into account all the information provided above:
- The conditional probability that Mr. Smith actually suffers from hustosis given his positive test result is 40% (as PPV = 0.400).
- The conditional probability that Ms. Smith is actually free of hustosis given her negative test result is 99.1% (as NPV = 0.991).
If you find these answers surprising, you are an ideal candidate for additional insights into the realm of risk literacy. A key feature of riskyr is the ability to analyze and view a scenario from a variety of different perspectives. To get you started immediately, we only illustrate some introductory commands here and focus on different types of visualizations. (Call riskyr.guide() for various vignettes that provide more detailed information.)
Rather than defining our hustosis scenario by providing 3 essential probabilities (prev, sens, and spec), we could define the same scenario by providing 4 essential frequencies (hi, mi, fa, and cr) as follows:
hustosis_2 <- riskyr(scen_lbl = "Example",
                     cond_lbl = "Hustosis",
                     dec_lbl = "Screening test",
                     popu_lbl = "Sample",
                     hi = 32, mi = 8, fa = 48, cr = 912  # 4 key frequencies
                     )
As we took the values of these frequencies from the summary of hustosis, the hustosis_2 scenario should contain exactly the same information as hustosis:
all.equal(hustosis, hustosis_2) # do both contain the same information?
#> [1] TRUE
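Why the two definitions agree can be checked by hand. The following base R sketch recovers the 3 essential probabilities from the 4 essential frequencies; it only illustrates the arithmetic and makes no claim about how riskyr performs this conversion internally:

```r
# The 4 essential frequencies used to define hustosis_2:
hi <- 32; mi <- 8; fa <- 48; cr <- 912

N    <- hi + mi + fa + cr   # population size
prev <- (hi + mi) / N       # prevalence: share of people with the condition
sens <- hi / (hi + mi)      # sensitivity: hits among condition-true cases
spec <- cr / (cr + fa)      # specificity: correct rejections among condition-false cases

print(c(N = N, prev = prev, sens = sens, spec = spec))
```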
Various visualizations of riskyr scenarios can be created by a range of plotting functions.
The default type of plot used in riskyr is a prism plot (or network diagram) that shows key frequencies of a scenario as nodes and key probabilities as edges linking the nodes:
plot(hustosis) # default plot
# => internally calls plot_prism(...) with many additional arguments:
# plot(hustosis, type = "prism", by = "cddc", area = "no", f_lbl = "num", p_lbl = "mix")
A tree diagram is the upper half of a prism plot, which can be obtained by plotting a scenario with 1 of 3 perspectives:
- by condition (by = "cd"), to split the population into TRUE vs. FALSE (cond_true vs. cond_false) cases;
- by decision (by = "dc"), to split the population into negative vs. positive (dec_neg vs. dec_pos) decisions;
- by accuracy (by = "ac"), to split the population into correct vs. incorrect (dec_cor vs. dec_err) decisions.
For instance, the following command plots a frequency tree by decisions:
plot(hustosis, by = "dc") # plot a tree diagram (by decision)
This particular tree splits the population of N = 1000 individuals into two subgroups by decision (by = "dc") and contains the answer to the second (frequency) version of our questions:
- The proportion of individuals with a positive test result who actually suffer from hustosis is 32/80 = .400 (corresponding to our value of PPV above).
- The proportion of individuals with a negative test result who are actually free of hustosis is 912/920 = .991 (corresponding to our value of NPV above, except for minimal differences due to rounding).
Of course, the frequencies of these ratios were already contained in the hustosis summary above. But the representation in the form of a tree diagram makes it easier to understand the decomposition of the population into subgroups and to see which frequencies are required to answer a particular question.
An icon array shows the classification result for each of N = 1000 individuals in our population:
plot(hustosis, type = "icons") # plot an icon array
While this particular icon array is highly regular (as both the icons and classification types are ordered), riskyr provides many different versions of this type of graph. This allows viewing the probability of diagnostic outcomes as either frequency, area, or density (see ?plot_icons for details and examples).
An area plot (or mosaic plot) offers a way of expressing classification results as the relationship between areas. Here, the entire population is represented as a square and the probability of its subgroups as the size of rectangles (see ?plot_area for details and examples):
plot(hustosis, type = "area") # plot an area/mosaic plot (by = "cddc")
When not scaling the size of rectangles by their relative frequencies or probabilities, we can plot basic scenario information as a 2-by-2 confusion (or contingency) table (see ?plot_tab for details and examples):
plot(hustosis, type = "tab") # plot 2x2 confusion table (by = "cddc")
A bar plot allows comparing relative frequencies as the heights of bars (see ?plot_bar for details and examples):
plot(hustosis, type = "bar", f_lbl = "abb") # plot bar chart (by "all" perspectives):
By adopting a functional perspective, we can ask how the values of some probabilities (e.g., the predictive values PPV and NPV) change as a function of another (e.g., the condition's prevalence prev, see ?plot_curve for details and examples):
plot(hustosis, type = "curve", uc = .05) # plot probability curves (by prevalence):
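The dependence of the predictive values on prevalence can also be tabulated numerically. The sketch below applies Bayes' theorem in base R for the hustosis test; the helper names ppv_of and npv_of and the grid of prevalence values are illustrative assumptions, not riskyr functions:

```r
sens <- .80   # sensitivity of the hustosis test
spec <- .95   # specificity of the hustosis test

# Predictive values as functions of prevalence (Bayes' theorem):
ppv_of <- function(prev) sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv_of <- function(prev) spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Evaluate both on a small, arbitrary grid of prevalence values:
prev_grid <- c(.01, .04, .10, .25, .50)
print(data.frame(prev = prev_grid,
                 PPV  = round(ppv_of(prev_grid), 3),
                 NPV  = round(npv_of(prev_grid), 3)))
```

With sens and spec held fixed, PPV rises and NPV falls as the condition becomes more common.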
When parameter values systematically depend on two other parameters, we can plot this as a plane in a 3D cube. The following graph plots the PPV as a function of the sensitivity (sens) and specificity (spec) of our test for a given prevalence (prev, see ?plot_plane for details and examples):
plot(hustosis, type = "plane") # plot probability plane (by sens x spec):
The L-shape of this plane reveals a real problem with our current test: Given a prevalence of 4% for hustosis in our target population, the PPV remains very low for the majority of the possible range of sensitivity and specificity values. To achieve a high PPV, the key requirement for our test is an extremely high specificity. Although our current specificity value of 95% (spec = .95) may sound pretty good, it is still not high enough to yield a PPV beyond 40%.
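How high the specificity would have to be can be estimated by solving Bayes' theorem for spec. The following base R sketch holds prev and sens at the hustosis values; the helper names ppv_of_spec and spec_needed and the chosen target values are assumptions made for illustration only:

```r
prev <- .04   # prevalence of hustosis
sens <- .80   # sensitivity of the test

# PPV for a given specificity (Bayes' theorem, prev and sens held fixed):
ppv_of_spec <- function(spec) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}

# Specificity at which a target PPV is reached exactly
# (the formula above, solved for spec):
spec_needed <- function(target) {
  1 - sens * prev * (1 - target) / (target * (1 - prev))
}

print(round(ppv_of_spec(c(.90, .95, .99, .999)), 3))  # PPV at some specificities
print(round(spec_needed(c(.50, .80, .90)), 4))        # specificity for target PPVs
```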
As defining your own scenarios can be cumbersome and the literature is full of risk-related problems (often referred to as "Bayesian reasoning"), riskyr provides a set of (currently 24) pre-defined scenarios (stored in a list scenarios). Here, we provide an example that shows how you can select and explore them.
Let us assume you want to learn more about the controversy surrounding screening procedures for prostate cancer (known as PSA screening). Scenario 10 in our collection of scenarios is from an article on this topic (Arkes & Gaissmaier, 2012). To select a particular scenario, simply assign it to an R object. For instance, we can assign Scenario 10 to s10:
s10 <- scenarios$n10 # assign pre-defined Scenario 10 to s10
Our selected scenario object s10 is a list with 30 elements, which describe it in both text and numeric variables. The following commands provide an overview of s10 in text form:
s10$scen_lbl # a descriptive label
#> [1] "PSA test (patients)"
s10$cond_lbl # the current condition
#> [1] "Prostate cancer"
s10$dec_lbl # the current decision
#> [1] "PSA-Test"
s10$popu_lbl # the current population
#> [1] "Male patients with symptoms"
s10$scen_apa # scenario source (APA)
#> [1] "Arkes, H. R., & Gaissmaier, W. (2012). Psychological research and the prostate-cancer screening controversy. Psychological Science, 23(6), 547--553."
# summary(s10) # summarises a scenario
Generating some riskyr plots allows a quick visual exploration of the scenario. We only illustrate some selected plots and options here, and trust that you will play with and explore the rest for yourself.
A tree diagram is a prism plot that views the population from only one perspective, but provides a quick overview. In the following plot, the boxes are depicted as squares with area sizes that are scaled by relative frequencies (using the area = "sq" argument):
plot(s10, type = "tree", by = "cd", area = "sq", # tree/prism plot with scaled squares
     f_lbl = "def", f_lbl_sep = ":\n") # custom frequency labels
The prism plot (or network diagram) combines 2 tree diagrams to simultaneously provide two perspectives on a population (see Wassner et al., 2004). riskyr provides several variants of prism plots. To avoid redundancy with the previous tree diagram, the following version splits the population by accuracy and by decision (see the by = "acdc" argument). In addition, the frequencies are represented as horizontal rectangles (area = "hr") so that their relative widths reflect the number of people in the corresponding subgroup:
plot(s10, type = "prism", by = "acdc", area = "hr", # prism plot with horizontal rectangles
     p_lbl = "num") # numeric probability labels
plot(s10, type = "icons", arr_type = "shuffled") # plot a shuffled icon array
plot(s10, type = "area", p_split = "v", p_lbl = "def") # plot an area/mosaic plot (with probabilities)
plot(s10, type = "tab", p_split = "h", p_lbl = "def") # plot a 2x2 table (with probabilities)
The following curves show the values of several conditional probabilities as a function of prevalence:
plot(s10, type = "curve", what = "all", uc = .05) # plot all curves (by prev):
Adding the argument what = "all" also shows the proportion of positive decisions (ppod) and the decision's overall accuracy (accu) as a function of the prevalence (prev). Would you have predicted their shape without seeing this graph?
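For readers who want to check their prediction numerically, the following base R sketch computes both quantities over a range of prevalence values. As the sens and spec values of Scenario 10 are not shown here, it uses the values of the hustosis example as a stand-in (an assumption); the helper names ppod_of and accu_of are hypothetical and not riskyr functions:

```r
# Assumed values (taken from the hustosis example, NOT from s10):
sens <- .80
spec <- .95

# Proportion of positive decisions: hits plus false alarms
ppod_of <- function(prev) prev * sens + (1 - prev) * (1 - spec)
# Overall accuracy: hits plus correct rejections
accu_of <- function(prev) prev * sens + (1 - prev) * spec

prev_grid <- seq(0, 1, by = .25)
print(data.frame(prev = prev_grid,
                 ppod = ppod_of(prev_grid),
                 accu = accu_of(prev_grid)))
```

Both expressions are weighted averages with weights prev and 1 - prev, so each changes by a constant amount per step in prevalence.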
The following surface shows the negative predictive value (NPV) as a function of sensitivity and specificity (for a given prevalence):
plot(s10, type = "plane", what = "NPV") # plot plane (as a function of sens x spec):
Hopefully, this brief overview managed to whet your appetite for visual exploration. If so, call riskyr.guide() for viewing the package vignettes and obtaining additional information.
riskyr originated out of a series of lectures and workshops on risk literacy.
The current version (0.2.0, as of Dec. 20, 2018) is still under development. Its primary designers are Hansjörg Neth, Felix Gaisbauer, Nico Gradwohl, and Wolfgang Gaissmaier, who are researchers at the department of Social Psychology and Decision Sciences at the University of Konstanz, Germany.
The riskyr package is open source software written in R and released under the GPL 2 | GPL 3 licenses.
The following resources and versions are currently available:
Type: | Version: | URL: |
---|---|---|
A. riskyr (R package): | Release version | https://CRAN.R-project.org/package=riskyr |
 | Development version | https://github.com/hneth/riskyr |
B. riskyrApp (R Shiny code): | Online version | http://riskyr.org |
 | Development version | https://github.com/hneth/riskyrApp |
C. Online documentation: | Release version | https://hneth.github.io/riskyr |
 | Development version | https://hneth.github.io/riskyr/dev |
We appreciate your feedback, comments, or questions.
Please report any riskyr-related issues at https://github.com/hneth/riskyr/issues.
Email us at [email protected] if you want to modify or share this software.
To cite riskyr in derivations and publications please use:
Neth, H., Gaisbauer, F., Gradwohl, N., & Gaissmaier, W. (2018). riskyr: A toolbox for rendering risk literacy more transparent. Social Psychology and Decision Sciences, University of Konstanz, Germany. Computer software (R package version 0.2.0, Dec. 20, 2018). Retrieved from https://CRAN.R-project.org/package=riskyr.
A BibTeX entry for LaTeX users is:
@Manual{riskyr,
title = {riskyr: A toolbox for rendering risk literacy more transparent},
author = {Hansjörg Neth and Felix Gaisbauer and Nico Gradwohl and Wolfgang Gaissmaier},
year = {2018},
organization = {Social Psychology and Decision Sciences, University of Konstanz},
address = {Konstanz, Germany},
note = {R package (version 0.2.0, Dec. 20, 2018)},
url = {https://CRAN.R-project.org/package=riskyr},
}
Calling citation("riskyr") in the package also displays this information.
Arkes, H. R., & Gaissmaier, W. (2012). Psychological research and the prostate-cancer screening controversy. Psychological Science, 23, 547–553.
Garcia-Retamero, R., & Cokely, E. T. (2017). Designing visual aids that promote risk literacy: A systematic review of health research and evidence-based design heuristics. Human Factors, 59, 582–627.
Gigerenzer, G. (2002). Reckoning with risk: Learning to live with uncertainty. London, UK: Penguin.
Gigerenzer, G. (2014). Risk savvy: How to make good decisions. New York, NY: Penguin.
Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62, 451–482. [Available online]
Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L., & Woloshin, S. (2007). Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8, 53–96. [Available online]
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684–704.
Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002). Representation facilitates reasoning: What natural frequencies are and what they are not. Cognition, 84, 343–352.
Hoffrage, U., Krauss, S., Martignon, L., & Gigerenzer, G. (2015). Natural frequencies improve Bayesian reasoning in simple and complex inference tasks. Frontiers in Psychology, 6, 1473.
Hoffrage, U., Lindsey, S., Hertwig, R., & Gigerenzer, G. (2000). Communicating statistical information. Science, 290, 2261–2262.
Khan, A., Breslav, S., Glueck, M., & Hornbæk, K. (2015). Benefits of visualization in the mammography problem. International Journal of Human-Computer Studies, 83, 94–113.
Kurzenhäuser, S., & Hoffrage, U. (2002). Teaching Bayesian reasoning: An evaluation of a classroom tutorial for medical students. Medical Teacher, 24, 516–521.
Kurz-Milcke, E., Gigerenzer, G., & Martignon, L. (2008). Transparency in risk communication. Annals of the New York Academy of Sciences, 1128, 18–28.
Micallef, L., Dragicevic, P., & Fekete, J.-D. (2012). Assessing the effect of visualizations on Bayesian reasoning through crowd-sourcing. IEEE Transactions on Visualization and Computer Graphics, 18, 2536–2545.
Neth, H., & Gigerenzer, G. (2015). Heuristics: Tools for an uncertain world. In R. Scott & S. Kosslyn (Eds.), Emerging trends in the social and behavioral sciences. New York, NY: Wiley Online Library. [Available online]
Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology: General, 130, 380–400.
Wassner, C., Martignon, L., & Biehler, R. (2004). Bayesianisches Denken in der Schule. Unterrichtswissenschaft, 32, 58–96.
[1] Simon, H.A. (1996). The Sciences of the Artificial (3rd ed.). The MIT Press, Cambridge, MA. (p. 132).
[2] To clarify our notion of "risk" in this context, we need to distinguish it from its everyday usage as anything implying a chance of danger or harm. In basic research on judgment and decision making and the more applied fields of risk perception and risk communication, the term risk typically refers to decisions or events for which the options and their consequences are known and probabilities for all possible outcomes can be provided. For our present purposes, the notion of risk-related information refers to any scenario in which some events of interest are determined by probabilities. While it is important that quantitative (estimates of) probabilities are provided, their origin, reliability and validity is not questioned here. Thus, the probabilities provided can be based on clinical intuition, on recordings of extensive experience, or on statistical simulation models (e.g., repeatedly casting dice and counting the frequencies of outcomes). This notion of risk is typically contrasted with the much wider notion of uncertainty in which options or probabilities are unknown or cannot be quantified. (See Gigerenzer and Gaissmaier, 2011, or Neth and Gigerenzer, 2015, on this conceptual distinction and corresponding decision strategies.)
[3] See Gigerenzer (2002, 2014), Gigerenzer and Hoffrage (1995), Gigerenzer et al. (2007), and Hoffrage et al. (2015) for scientific background information and similar problems. See Sedlmeier and Gigerenzer (2001) and Kurzenhäuser and Hoffrage (2002) for related training programs (with remarkable results), and Micallef et al. (2012) and Khan et al. (2015) for (rather sceptical and somewhat sobering) studies on the potential benefits of static representations for solving Bayesian problems.
[4] See Gigerenzer and Hoffrage (1995) and Hoffrage et al. (2000, 2002) on the concept of natural frequencies.
The current development version is available at https://github.com/hneth/riskyr/.
riskyr 0.2.0 was ready to be released on December 20, 2018, and submitted to CRAN on January 02, 2019.
Log of changes since last release:
- New riskyrApp version [2018-12]: To use selected riskyr functions without the need for coding, an updated version of riskyrApp is available at https://github.com/hneth/riskyrApp (R Shiny code) and at http://riskyr.org (interactive online version).
- Using pkgdown [2018-12]: Provide package documentation online at https://hneth.github.io/riskyr (latest release version) and https://hneth.github.io/riskyr/dev/ (current development version).
- Retiring obsolete functions [2018-12]: The functions plot_fnet and plot_tree are replaced by plot_prism, and plot_mosaic is replaced by plot_area. This improves functionality (e.g., by providing more consistent options across different plotting functions) and removes dependencies on external packages.
- New plot_prism function [2018-11]: Show a scenario as a double frequency tree (by 3 x 2 perspectives) or a frequency tree (in 3 perspectives) with many additional options; replaces the older plot_fnet and plot_tree functions (and removes dependency on the diagram package).
- New plot_area function [2018-10]: Show a scenario as a mosaic plot of relative proportions (in 3 x 2 x 2 possible versions, with many additional options); replaces the older plot_mosaic function (and removes dependencies on the grid and vcd packages).
- New plot_tab function [2018-10]: Show a scenario as a contingency table of frequencies (with row and column sums, and options for showing probabilities); a variant of plot_area that does not scale area sizes.
- New plot_bar function [2018-08]: Show scenario frequencies as vertical bars (in various configurations).
- Create plot_util.R collection of graphical utility functions [2018-08]: Define a new box object type and various functions for plotting, labeling, and linking them in graphs (to remove dependencies on and limitations imposed by other packages).
- Updated riskyr function [2018-03]: As an alternative to providing 3 essential probabilities, it is now possible to define a scenario from 4 essential frequencies (and check for consistency with given probabilities).
- Improved plot_icons function [2018-12]: Show icons separated into 2 subsets by 3 perspectives (condition, decision, accuracy), using the same by argument as the other plotting functions.
- plot_curve and plot_plane functions [2018-11]: Update variable names (to snake_case) and add arguments (e.g., col_pal, lbl_txt, mar_notes, etc.) for consistency with newer plotting functions.
- scale argument [2018-10]: The new plotting functions feature a scale argument that allows scaling the size or areas of boxes either by (exact) probability or by (possibly rounded) frequency. When using scale = "f", the probabilities shown are also re-computed from (possibly rounded) frequencies.
- plot_fnet [2018-02]: Change argument box.cex to cex.lbl to ensure consistency with plot_curves and plot_plane (and use it to scale arrow labels accordingly). Added warning when using deprecated argument.
- plot_mosaic [2018-02]: Change Boolean vsplit argument to by = "cd" vs. by = "dc" to ensure consistency with plot_fnet and plot_tree. Added warning when using deprecated argument.
- mar_notes and plot_mar [2018-09]: Use consistent plot margins and options for showing margin notes for all plots.
- read_popu [2018-11]: Read a data frame popu and interpret it as a riskyr scenario, which allows creating scenarios from raw data.
- comp_accu.R [2018-08]: Compute exact accuracy values (not approximations, when using comp_accu_freq on rounded freq values) by using the new function comp_accu_prob to compute the list accu from probabilities. Signal rounding when showing accuracy based on rounded frequencies in plots (when show.accu == TRUE and round == TRUE).
- pal and freq [2018-12]: Use more consistent color and frequency names (e.g., cond_true, dec_pos, and dec_cor are now names of frequencies and the colors corresponding to these frequencies).
- freq [2018-07]: Add a 3rd perspective (by accuracy or by correspondence of decision to condition) and corresponding frequency pair of dec.cor and dec.err (i.e., hi + cr vs. mi + fa as the diagonal of 4 SDT cases). This increases the number of frequencies in freq from 9 to 11. Also added corresponding labels in init_txt.R and colors in init_pal.R.
- prob [2018-09]: Include accuracy metrics in probabilities (in prob and summary functions).
- pal and txt [2018-10]: Add multiple color palettes and text labeling schemes (see ?pal and ?txt for details).
- More consistent argument and variable names (using snake_case).
- Many additions and corrections in documentation, examples, and vignettes.
- plot_icons [2018-03]: Bug fix to also swap symbols in legend when the symbol order is changed manually.
- txt_def [2018-02]: Simplify some default text labels (e.g., for current population, condition, and decision).
- .onAttach [2018-02]: Cast dice to display probabilistic (i.e., risk-related) start-up messages.