Alluvial plots are similar to Sankey diagrams and visualise categorical data
over multiple dimensions as flows (Rosvall M, Bergstrom CT (2010) Mapping Change in
Large Networks. PLoS ONE 5(1): e8694).
Their graphical grammar, however, is a bit more complex than that of a regular x/y plot. The ggalluvial package does a great job of translating that grammar into ggplot2 syntax and gives you many options to tweak the appearance of an alluvial plot. However, there still remains a multi-layered complexity that makes it difficult to use 'ggalluvial' for explorative data analysis. 'easyalluvial' provides a simple interface to this package that allows you to produce a decent alluvial plot from any dataframe in either long or wide format from a single line of code while also handling continuous data. It is meant to allow a quick visualisation of entire dataframes with a focus on different colouring options that can make alluvial plots a great tool for data exploration.
# Install the released version from CRAN
install.packages('easyalluvial')
# Or install the development version from GitHub
devtools::install_github("erblast/easyalluvial")
To learn about all the features and how they can be useful, check out the package tutorials.
suppressPackageStartupMessages( require(tidyverse) )
suppressPackageStartupMessages( require(easyalluvial) )

data = as_tibble(mtcars)
categoricals = c('cyl', 'vs', 'am', 'gear', 'carb')
numericals = c('mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec')

data = data %>%
  mutate_at( vars(categoricals), as.factor )
Continuous variables will be binned automatically.
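To give a feel for what binning a continuous variable means, here is a minimal base R sketch. It is not the package's own implementation: it simply cuts a numeric vector into five equal-width bins, and the helper name `bin_numeric` and the labels are assumptions chosen for illustration. The package's own binning may differ in how the breaks are chosen.

```r
# Illustrative only: equal-width binning with base R, not easyalluvial internals.
# The labels below are placeholders chosen for this sketch.
bin_labels <- c("LL", "ML", "M", "MH", "HH")

# Hypothetical helper: cut a numeric vector into one bin per label.
bin_numeric <- function(x, labels = bin_labels) {
  # cut() with a single number for `breaks` creates that many equal-width bins.
  cut(x, breaks = length(labels), labels = labels)
}

mpg_binned <- bin_numeric(mtcars$mpg)

# Count how many cars fall into each bin.
print(table(mpg_binned))
```

Once a continuous column has been turned into a factor like this, it can be drawn as a stratum in an alluvial plot in the same way as any categorical variable.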
alluvial_wide( data = data, max_variables = 5, fill_by = 'first_variable' )
knitr::kable( head(quarterly_flights) )
| tailnum | carrier | origin | dest | qu | mean_arr_delay |
|---|---|---|---|---|---|
| N0EGMQ LGA BNA MQ | MQ | LGA | BNA | Q1 | on_time |
| N0EGMQ LGA BNA MQ | MQ | LGA | BNA | Q2 | on_time |
| N0EGMQ LGA BNA MQ | MQ | LGA | BNA | Q3 | on_time |
| N0EGMQ LGA BNA MQ | MQ | LGA | BNA | Q4 | on_time |
| N11150 EWR MCI EV | EV | EWR | MCI | Q1 | late |
| N11150 EWR MCI EV | EV | EWR | MCI | Q2 | late |
alluvial_long( quarterly_flights, key = qu, value = mean_arr_delay, id = tailnum, fill = carrier )
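The long-format call expects one row per combination of `id` and `key`, as in the table above. The sketch below builds a small data frame of that shape by hand and checks the assumption before plotting. The data frame `toy_flights` and its values are made up for illustration. The plotting step runs only if easyalluvial is installed and mirrors the call shown above.

```r
# Hypothetical toy data in long format: one row per aircraft (id) and quarter (key).
toy_flights <- data.frame(
  tailnum = rep(c("A1", "B2", "C3"), each = 4),
  carrier = rep(c("MQ", "EV", "MQ"), each = 4),
  qu = rep(c("Q1", "Q2", "Q3", "Q4"), times = 3),
  mean_arr_delay = c("on_time", "on_time", "late", "on_time",
                     "late", "late", "late", "on_time",
                     "on_time", "late", "on_time", "on_time"),
  stringsAsFactors = TRUE
)

# This sketch assumes every id appears exactly once per key.
rows_per_pair <- table(toy_flights$tailnum, toy_flights$qu)
stopifnot(all(rows_per_pair == 1))

# Build the plot only when easyalluvial is available.
if (requireNamespace("easyalluvial", quietly = TRUE)) {
  p <- easyalluvial::alluvial_long(toy_flights, key = qu, value = mean_arr_delay,
                                   id = tailnum, fill = carrier)
  # Call print(p) to draw the plot.
}
```

Here `qu` gives the position on the x axis, `mean_arr_delay` the stratum at that position, `tailnum` ties the rows of one flow together, and `carrier` is the extra variable passed as `fill`.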
- dplyr 0.8.0 compatibility
- vdiffr is now used to test plots and added as a suggested dependency
- manip_bin_numerics() accepts c('median', 'mean', 'cuts', 'min_max') as bin_labels argument, which will be converted to bin labels
- alluvial_wide() and alluvial_long() do not crash anymore when dataframes are grouped
- CRAN released
- CRAN submission