Visualizations of High-Dimensional Data

Gives access to data visualization methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data, published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE. According to that publication, the MD-plot outperforms the box-and-whisker diagram (box plot), the violin plot, the bean plot and the geom_violin plot of ggplot2. Furthermore, a collection of various visualization methods for univariate data is provided. For exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods. One of these methods is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, the package provides visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables, the Shepard density plot and the Bland-Altman plot. For classified high-dimensional data, a number of visualizations are provided, for example the heat map and the silhouette plot. A political map of the world or of Germany can be visualized with additional information defined by a classification of countries or regions. Extending the political map further, a simple function for a choropleth map is available, which is useful for measurements across a geographic area. For categorical features, pie charts, slope charts and fan plots, improved by the ABC analysis, are available. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018).
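As a rough illustration of the MD-plot described above, the following sketch builds a small synthetic matrix and passes it to MDplot(). The data and column names are invented for this example. It assumes that the package and the suggested packages it needs for statistical testing are installed, and the exact return value of MDplot() may differ between package versions.

```r
# Minimal sketch: an MD-plot for three synthetic features.
# The data below is made up for illustration only.
set.seed(42)
Data <- cbind(
  Gaussian = rnorm(500),
  Bimodal  = c(rnorm(250, mean = -2), rnorm(250, mean = 2)),
  Skewed   = rexp(500)
)

# Only call the package if it is actually installed.
if (requireNamespace("DataVisualizations", quietly = TRUE)) {
  # MDplot() expects a numeric matrix: cases in rows, features in columns.
  # Depending on the version, the plot is drawn directly or returned as a
  # ggplot2 object that still has to be printed.
  result <- DataVisualizations::MDplot(Data)
}
```

Each column is drawn as one mirrored density, so the bimodal and the skewed feature should be distinguishable at a glance.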


Version 1.1.6 (CRAN, 2018-03-08)
o Major Improvement: The MD-plot now automatically samples data above a threshold in order to be usable for Big Data.
o Bugfixes: In the MD-plot, testing against the uniform distribution now also works for a single feature. A crash is fixed that occurred when a column had not even one finite value.
o Bugfix: If colnames of Data are given in the MD-plot but are not unique, they are renamed to unique colnames.
o Bugfix: Plot3D now checks that the length of Cls is equal to the number of rows of the data.
o Bugfix: ClassPDEplot did not plot all pdfs for SameKernelsAndRadius=0.
o Improvement: ClassMDplot now jitters points instead of visualizing the pdf if a class does not contain enough data.
o Bugfix: In Heatmap the x labels are now plotted correctly.
o Bugfix: In ClassMDplot class names now work correctly.
o Classplot and DualClassplot: These functions plot one or two time series with a classification as scatter plots or as line and scatter plots. Useful to see whether a temporal clustering has time-dependent variations, and for Hidden Markov Models (see Mthrun/RHmm on GitHub).
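The automatic sampling mentioned for version 1.1.6 can be pictured with a short base R sketch. This is not the package's own implementation. The function name subsample_rows and the default threshold are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: keep all rows below the threshold, otherwise draw
# a uniform random sample of `threshold` rows without replacement.
subsample_rows <- function(Data, threshold = 10000L) {
  Data <- as.matrix(Data)
  n <- nrow(Data)
  if (n <= threshold) {
    return(Data)
  }
  idx <- sort(sample.int(n, threshold))  # sorted to preserve row order
  Data[idx, , drop = FALSE]
}

set.seed(1)
big   <- matrix(rnorm(50000 * 2), ncol = 2)
small <- subsample_rows(big, threshold = 10000L)
dim(small)
```

Density estimates computed on such a sample approximate those of the full data while keeping plotting time bounded.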

Version 1.1.5 (CRAN, 2018-02-02)
o Bugfix: The statistical testing in the MD-plot is now omitted if some features do not have enough unique values. In such a case, only a scatter plot is plotted.
o Naming Convention: Standardized plotting functions by removing "plot" from names, except in cases where it is conventional in other sources, e.g. MAplot. Older function names will be removed in the next CRAN version.
o plotWorldmap renamed to Worldmap
o plotChoroplethMap renamed to Choroplethmap
o ClassBoxPlot renamed to ClassBoxplot
o fanPlot renamed to Fanplot
o pieChart renamed to Piechart
o CrossTablePlot renamed to Crosstable
o slopeChart renamed to Slopechart
o plot3D renamed to Plot3D
o InspectScatterOfData renamed to InspectScatterplots
o PixelMatrixPlot renamed to Pixelmatrix
o ShepardScatterPlot renamed to Sheparddiagram
o ShepardDensityPlot renamed to ShepardPDEscatter
o DualAxisLineChart renamed to DualaxisLinechart
o SilhouettePlot renamed to Silhouetteplot
o BoxplotData renamed to InspectBoxplots
o QQplotWithFit renamed to QQplot
o nanPlot is deprecated. The new function is PlotMissingvalues.
o ClassViolinPlot is deprecated. The new function is ClassMDplot.

Version 1.1.4 (GitHub, 2018-11-17)
o SignedLog: Allows transforming negative values with a logarithm.
o Improvement: The MD-plot allows for Scaling and Ordering of data, and plots scattered points if a column does not have enough values to perform density estimation.
o Bugfix: stat_pde_density now works for only one value if the other values are NaN.
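The idea behind a signed logarithm can be sketched in base R. The formula below, sign(x) * log(|x| + 1), is a common definition and is used here as an assumption. The function name signed_log is hypothetical, and the package's SignedLog may differ in details such as the supported bases.

```r
# Hypothetical sketch of a signed logarithm: the sign of each value is
# kept, and the logarithm is applied to the magnitude shifted by one,
# so that 0 maps to 0 and negative inputs stay defined.
signed_log <- function(x, base = 10) {
  sign(x) * log(abs(x) + 1, base = base)
}

x <- c(-999, -9, 0, 9, 999)
signed_log(x)
```

The transformation is odd-symmetric and monotone, so the ordering of the values is preserved while large magnitudes on both sides are compressed.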

Version 1.1.4 (CRAN, 2018-10-21)
o DualAxisLineChart: Visualizes two lines in one plot by overlaying them using plotly (e.g. two time series with two ranges of values).
o Bugfix: PDEscatter now removes non-numeric values with na.rm=TRUE before xlim and ylim are defined.
o ProductRatioPlot: The plot is useful when there are many instances of very small values but only a small number of very large ones.
o CrossTablePlot: Presents a heatmap with values and a cross table of a given Data matrix of two features, using either a bin width or percentage values.
o Update: plot3D function and documentation improved.

Version 1.1.3 (GitHub, 2018-07-07)
o Improvement: MD-plot layout changed and plotting parameter added.
o InspectCorrelation now visualizes the density and calculates the Spearman correlation coefficient as a shortcut to PDEscatter.
o Minor bugfix: The MD-plot now uses ggExtra::rotateTextX() for better alignment of the x-axis text.
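The Spearman correlation that InspectCorrelation reports can be computed with base R's cor(). The sketch below uses synthetic data invented for this example and does not call the package itself.

```r
# Spearman's rank correlation measures monotone (not only linear)
# association, so a strictly increasing nonlinear relation yields 1.
rho_exact <- cor(1:10, (1:10)^3, method = "spearman")

# Noisy, nonlinear synthetic relation between two variables.
set.seed(7)
x <- runif(200)
y <- exp(3 * x) + rnorm(200, sd = 0.5)
rho_noisy <- cor(x, y, method = "spearman")

c(exact = rho_exact, noisy = rho_noisy)
```

Because the coefficient only depends on ranks, it is a reasonable companion to a density-based scatter plot when the relation between two features is not linear.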

Version 1.1.2 (GitHub, 2018-07-02)
o Bugfix: InspectDistances methods argument now passed on to parallelDist::parDist.

Version 1.1.1 (CRAN, 2018-06-30)
o Improved visualization of MD-plot.

Version 1.1.0 (GitHub)
o Vignette generated.
o Bugfix: authors stated in Description regarding functions which were used in other dbt packages instead of this package.

Version 1.0.9 (GitHub)
o ClassViolinPlot built on top of stat_pde_density for Data with Clustering.
o Bugfix in stat_pde_density for special case of one value.
o parallelDist integrated for faster distance computations.

Version 1.0.8 (GitHub)
o Bugfix in stat_pde_density for special case of one value.
o Minor bugfix in PDEscatter.

Version 1.0.7 (GitHub)
o MD-plot: stat_pde_density added in order to integrate the concept with ggplot2.

Version 1.0.1-1.0.6 (GitHub)
o ClassBoxPlot function added for Data with Clustering.
o ClassPDEplot function added for Data with Clustering.
o ClassPDEplotMaxLikeli function added for Data with Clustering.
o Minor bugfix in InspectVariable function.
o Bugfix in internpiechart.

Version 1.0.0 (CRAN, 2018-05-06)
o Complete package generated.



1.2.2 by Michael Thrun, a year ago


Authors: Michael Thrun [aut, cre, cph], Felix Pape [aut, rev], Onno Hansen-Goos [ctr, ctb], Hamza Tayyab [ctr, ctb], Dirk Eddelbuettel [ctr], Craig Varrichio [ctr], Alfred Ultsch [dtc, ctb, ctr]


GPL-3 license

Imports Rcpp, ggplot2, sp, pracma, reshape2

Suggests plyr, MBA, ggmap, plotrix, rworldmap, rgl, ABCanalysis, choroplethr, dplyr, R6, parallelDist, knitr, rmarkdown, vioplot, ggExtra, plotly, htmlwidgets, diptest, moments, signal, DatabionicSwarm, ggrepel

Linking to Rcpp, RcppArmadillo

System requirements: C++11

Imported by AdaptGauss, FCPS, opGMMassessment, pguIMP.

Suggested by DatabionicSwarm, GeneralizedUmatrix, ProjectionBasedClustering.
