Alluvial diagrams use x-splines, sometimes augmented with stacked
histograms, to visualize multi-dimensional or repeated-measures data with
categorical or ordinal variables. They can be viewed as simplified and
standardized Sankey diagrams; see Riehmann, Hanfler, and Froehlich (2005).
This is a ggplot2 extension for alluvial diagrams.
The alluvial plots implemented here can be used to visualize frequency distributions over time or frequency tables involving several categorical variables. The design is derived mostly from the alluvial package, but the ggplot2 framework induced several conspicuous differences:
The latest stable release can be installed from CRAN:
install.packages("ggalluvial")
Development versions can be installed from GitHub (the second command installs the "optimization" ref):
devtools::install_github("corybrunson/ggalluvial", build_vignettes = TRUE)
devtools::install_github("corybrunson/ggalluvial", ref = "optimization")
Here is how to generate an alluvial diagram representation of the multi-dimensional categorical dataset of passengers on the Titanic:
library(ggalluvial)  # also attaches ggplot2

titanic_wide <- data.frame(Titanic)
head(titanic_wide)
#>   Class    Sex   Age Survived Freq
#> 1   1st   Male Child       No    0
#> 2   2nd   Male Child       No    0
#> 3   3rd   Male Child       No   35
#> 4  Crew   Male Child       No    0
#> 5   1st Female Child       No    0
#> 6   2nd Female Child       No    0
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
The data is in "wide" format, but ggalluvial also recognizes data in "long" format and can convert between the two:
titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic",
                              axes = 1:3)
head(titanic_long)
#>   Survived Freq alluvium Demographic stratum
#> 1       No    0        1       Class     1st
#> 2       No    0        2       Class     2nd
#> 3       No   35        3       Class     3rd
#> 4       No    0        4       Class    Crew
#> 5       No    0        5       Class     1st
#> 6       No    0        6       Class     2nd
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = stratum, alluvium = alluvium,
           y = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum") +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
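The conversion also runs in the other direction. The sketch below checks both formats and spreads the lodes back into one column per axis. It assumes a ggalluvial version in which the format functions carry the *_form names (is_alluvia_form(), is_lodes_form(), to_alluvia_form()); titanic_back is just an illustrative name.

```r
library(ggalluvial)

# Wide ("alluvia") format: one row per combination of Class, Sex, Age, Survived
titanic_wide <- data.frame(Titanic)

# Long ("lodes") format: one row per alluvium per axis
titanic_long <- to_lodes_form(titanic_wide, key = "Demographic", axes = 1:3)

# Format checks; silent = TRUE suppresses any diagnostic messages
is_alluvia_form(titanic_wide, axes = 1:3, silent = TRUE)
is_lodes_form(titanic_long, key = "Demographic", value = "stratum",
              id = "alluvium", silent = TRUE)

# Spread the lodes back out to one column per axis (Class, Sex, Age);
# Survived and Freq are constant within each alluvium and are kept
titanic_back <- to_alluvia_form(titanic_long, key = "Demographic",
                                value = "stratum", id = "alluvium")
head(titanic_back)
```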
For detailed discussion of the data formats recognized by ggalluvial and several examples that illustrate its flexibility and limitations, read the vignette:
vignette(topic = "ggalluvial", package = "ggalluvial")
The documentation contains several examples; use help() to call forth examples of any layer.
If you use ggalluvial-generated figures in publication, i'd be grateful to hear about it! You can also cite the package according to
Issues and pull requests are more than welcome! Pretty much every fix and feature of this package derives from a problem or question posed by someone with datasets or design goals i hadn't anticipated.
Because the only functional occurrence of devtools (i.e. outside README.md) is to call session_info() at the ends of the vignettes, this suggested dependency and its usage are switched to sessioninfo.
Documentation is slightly reformatted due to switching roxygen syntax to markdown.
The internal z-ordering function z_order_aes failed to recognize contiguous segments of alluvia, thereby assigning later segments missing values of 'group' and preventing them from being rendered. This has been corrected.
An occurrence of geom_alluvium() was not updated for v0.8.0 and caused geom_alluvium() to throw an error in some cases. This has been corrected.
An earlier solution to the z-ordering problem sufficed for matched layers (*_flow()) but failed for the combination of geom_flow(). This has been corrected in the code for GeomFlow$draw_panel(), though a more elegant and general solution is preferred.
The deprecated parameters axis_width (all geom layers) and geom_flow()) are removed, and an explanatory note is added to the layers' documentation.
A vignette illustrating two methods for labeling small strata, using other ggplot2 extensions, is included.
The internal function self_adjoin(), invoked by geom_flow(), is revised, exported, documented, and exemplified.
The weight aesthetic for the three stat_*() functions is replaced by the y aesthetic, so that scale_y_continuous() will correctly transform the vertical scales of the layers. An example is provided in the documentation. The y aesthetic must be present in order for scales to be correctly transformed. The weight parameter is still available but deprecated.
stat_alluvium() is replaced with
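Because frequencies now enter through the y aesthetic rather than weight, a transformed y scale should apply to the alluvial layers. Here is a minimal sketch using ggplot2's scale_y_sqrt() (a square-root-transformed continuous scale) on the Titanic data; the choice of transformation is only for illustration.

```r
library(ggalluvial)  # also attaches ggplot2

titanic_wide <- data.frame(Titanic)

# Freq is mapped to y (not weight), so the sqrt transformation of the y scale
# is applied to the frequencies before the alluvium and stratum stats stack them
p <- ggplot(titanic_wide,
            aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  scale_y_sqrt() +
  ggtitle("Titanic passengers on a square-root vertical scale")
print(p)
```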
These changes make the functions that test for and convert between alluvial formats behave more like popular functions in the tidyverse. Some of the changes introduce backward incompatibilities, but most result in deprecation warnings.
The functions to_*() are renamed to to_*_form() for consistency. Their old names are deprecated.
is_alluvial() is deprecated and will be removed in a future version.
The logical parameter is deprecated. In a future version, the functions is_*_form() will only return logical values.
silent = TRUE now silences all messages.
FALSE if any weights are negative, with a message to this effect.
diffuse parameters, using up-to-date rlang and tidyselect functionality.
dplyr::vars() objects, as in dplyr::select_at(). Alternatively, variables can be fed to these functions as in dplyr::select(), to be collected by rlang::quos(...) and used as axis variables. If NULL, then such additional arguments are ignored.
to_*_form() now merge their internal reshaped data frames with the distilled or diffused variables in a consistent order, placing the distilled or diffused variables to the left.
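To illustrate the tidyverse-style interfaces described above, the sketch below names the same three axes by position, through a dplyr::vars() object passed to axes, and as bare variables passed through the dots, then runs a silent format check. It assumes a ggalluvial version with the *_form names and this tidyselect support; the object names are illustrative only.

```r
library(ggalluvial)

titanic_wide <- data.frame(Titanic)

# Three ways to specify the same axis variables (Class, Sex, Age)
lodes_pos  <- to_lodes_form(titanic_wide, axes = 1:3, key = "Demographic")
lodes_vars <- to_lodes_form(titanic_wide,
                            axes = dplyr::vars(Class, Sex, Age),
                            key = "Demographic")
lodes_dots <- to_lodes_form(titanic_wide, Class, Sex, Age,
                            key = "Demographic")

# With silent = TRUE, the check returns a logical without printing messages
is_alluvia_form(titanic_wide, Class, Sex, Age, silent = TRUE)
```

All three calls should yield one row per alluvium per axis, i.e. 32 × 3 rows for these data.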
v3.3.0 (patch number zero) instead of v3.3.1. I've been unable to install this version locally, so there is a slight chance of incompatibility that i'll be watchful for going forward.
to_*() functions are combined; see
is_alluvial_alluvia now prints a message rather than a warning when some combinations of strata are not linked by any alluvia.
to_lodes() now has a diffuse parameter to join any original variables to the reformatted data by the id variable (alluvium). This makes it possible to assign original variables to aesthetics after reformatting, as illustrated in a new example.
to_alluvia() now has a distill parameter to control the inclusion of any original variables that vary within values of id into the reformatted data, based on a distilling function that returns a single value from a vector.
to_lodes() now has a logical discern parameter that uses make.unique() to make stratum values that appear at different axes distinct. The stat_*() functions can pass the same parameter internally and print a warning if the data is already in lodes form.
GeomFlow$draw_panel() now begins by restricting to complete.cases(), corresponding to flows with both starting and terminating axes. (This is not done in StatFlow$compute_panel(), which would have the effect of excluding missing aesthetic values from legends.)
GeomAlluvium$setup_data() now throws a warning if some color or differentiation aesthetics vary within alluvia.
StatAlluvium$compute_panel() has been fixed.
The ggalluvial() shortcut function, which included a formula interface and was deprecated in version 0.4.0, is removed.
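The diffuse, distill, and discern parameters described above were introduced under the old names to_lodes() and to_alluvia(). The sketch below exercises them through the renamed to_lodes_form() and to_alluvia_form(), assuming a version that supports all three. The survey and lodes data frames are hypothetical toy data, and passing the base function max as the distilling function is just one choice.

```r
library(ggalluvial)

# diffuse: carry the axis variable Class along every lode of its alluvium,
# so it can be mapped to an aesthetic (e.g. fill) after reshaping
titanic_wide <- data.frame(Titanic)
lodes_diffused <- to_lodes_form(titanic_wide, axes = 1:3,
                                key = "Demographic", diffuse = "Class")

# discern: hypothetical survey whose two axes share the labels "yes" and "no";
# discern = TRUE uses make.unique() so each axis gets its own stratum values
survey <- data.frame(
  before = c("yes", "no", "yes", "no"),
  after  = c("yes", "yes", "no", "no"),
  n      = c(10, 5, 3, 7)
)
lodes_plain     <- to_lodes_form(survey, axes = 1:2)
lodes_discerned <- to_lodes_form(survey, axes = 1:2, discern = TRUE)

# distill: hypothetical lodes-form data in which 'score' varies within each id;
# distill = max keeps a single (maximum) score per alluvium
lodes <- data.frame(
  id    = rep(1:3, each = 2),
  time  = rep(c("t1", "t2"), times = 3),
  state = c("a", "b", "a", "a", "b", "b"),
  score = c(1, 2, 3, 4, 5, 6)
)
alluvia_distilled <- to_alluvia_form(lodes, key = "time", value = "state",
                                     id = "id", distill = max)
```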
I only started maintaining NEWS.md with version 0.5.0.