Computes 46 optimized distance and similarity measures for comparing probability functions. These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.

Data collection and data comparison are the foundations of scientific research. *Mathematics* provides the abstract framework to describe patterns we observe in nature and *Statistics* provides the framework to quantify the uncertainty of these patterns. In statistics, natural patterns are described in the form of probability distributions which either follow a fixed pattern (parametric distributions) or more dynamic patterns (non-parametric distributions).

The `philentropy` package implements fundamental distance and similarity measures to quantify distances between probability density functions as well as traditional information theory measures. In this regard, it aims to provide a framework for comparing natural patterns in a statistical notation.

This project is born out of my passion for statistics and I hope that it will be useful to the people who share it with me.

```r
# install philentropy version 0.1.0 from CRAN
install.packages("philentropy")
```

- Introduction to the philentropy package
- Distance and Similarity Measures implemented in philentropy
- Information Theory Metrics implemented in philentropy

```r
library(philentropy)
# retrieve available distance metrics
getDistMethods()
```

```
[1] "euclidean" "manhattan" "minkowski"
[4] "chebyshev" "sorensen" "gower"
[7] "soergel" "kulczynski_d" "canberra"
[10] "lorentzian" "intersection" "non-intersection"
[13] "wavehedges" "czekanowski" "motyka"
[16] "kulczynski_s" "tanimoto" "ruzicka"
[19] "inner_product" "harmonic_mean" "cosine"
[22] "hassebrook" "jaccard" "dice"
[25] "fidelity" "bhattacharyya" "hellinger"
[28] "matusita" "squared_chord" "squared_euclidean"
[31] "pearson" "neyman" "squared_chi"
[34] "prob_symm" "divergence" "clark"
[37] "additive_symm" "kullback-leibler" "jeffreys"
[40] "k_divergence" "topsoe" "jensen-shannon"
[43] "jensen_difference" "taneja" "kumar-johnson"
[46] "avg"
```

```r
# define a probability density function P
P <- 1:10/sum(1:10)
# define a probability density function Q
Q <- 20:29/sum(20:29)

# combine P and Q as matrix object
x <- rbind(P,Q)

# compute the jensen-shannon distance between
# probability density functions P and Q
distance(x, method = "jensen-shannon")
```

```
jensen-shannon using unit 'log'.
jensen-shannon
0.02628933
```
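To make the quantity above more concrete, here is a minimal base-R sketch of the Jensen-Shannon divergence written out by hand. It assumes the textbook definition with the natural logarithm (matching the `unit 'log'` message); `jsd_sketch` and `kl_term` are hypothetical helper names and not part of `philentropy`, and the package's internal implementation may differ in detail.

```r
# Hypothetical helper (not part of philentropy): Jensen-Shannon divergence
# between two probability vectors, using the natural logarithm.
jsd_sketch <- function(p, q) {
  # mixture distribution M = (P + Q) / 2
  m <- (p + q) / 2
  # Kullback-Leibler style sum, treating 0 * log(0 / m) as 0
  kl_term <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  0.5 * kl_term(p, m) + 0.5 * kl_term(q, m)
}

# same example vectors as above
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

jsd_value <- jsd_sketch(P, Q)
print(jsd_value)
```

If the package follows the same definition, the printed value should be close to the `distance()` result shown above. Under this definition the divergence is 0 for identical vectors and at most `log(2)` for vectors with disjoint support.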

```r
# install.packages("devtools")
# install the current version of philentropy on your system
library(devtools)
install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)
```

The current status of the package as well as a detailed history of the functionality of each version of `philentropy` can be found in the NEWS section.

- `distance()`: Implements 46 fundamental probability distance (or similarity) measures
- `getDistMethods()`: Get available method names for 'distance'
- `dist.diversity()`: Distance Diversity between Probability Density Functions
- `estimate.probability()`: Estimate Probability Vectors From Count Vectors
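As a rough illustration of what estimating a probability vector from a count vector means, the base-R sketch below uses plain relative frequencies. This is an assumption about the simplest possible estimator, not a description of how `estimate.probability()` is implemented; the counts are made-up example data.

```r
# Made-up count vector, e.g. nucleotide counts in a sequence
counts <- c(A = 12, C = 30, G = 18, T = 40)

# Relative frequencies: each count divided by the total number of observations
probs <- counts / sum(counts)

print(probs)
```

A vector produced this way is non-negative and sums to 1, which is the kind of input the distance measures listed above expect.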

- `H()`: Shannon's Entropy H(X)
- `JE()`: Joint-Entropy H(X,Y)
- `CE()`: Conditional-Entropy H(X | Y)
- `MI()`: Shannon's Mutual Information I(X,Y)
- `KL()`: Kullback–Leibler Divergence
- `JSD()`: Jensen-Shannon Divergence
- `gJSD()`: Generalized Jensen-Shannon Divergence
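The entropy-based quantities listed here are tied together by standard identities, which the following base-R sketch works through on a small joint distribution. It does not call the package functions; `entropy_bits` is a hypothetical helper, the joint table holds illustrative numbers only, and the choice of bits (log base 2) as the unit is an assumption.

```r
# Hypothetical helper: Shannon entropy in bits, skipping zero-probability cells
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Joint distribution of X (rows) and Y (columns); illustrative numbers only
joint <- matrix(c(0.25, 0.25,
                  0.00, 0.50), nrow = 2, byrow = TRUE)

# Marginal distributions of X and Y
p_x <- rowSums(joint)
p_y <- colSums(joint)

h_x  <- entropy_bits(p_x)    # H(X)
h_y  <- entropy_bits(p_y)    # H(Y)
h_xy <- entropy_bits(joint)  # joint entropy H(X,Y)

# Chain rule: H(X | Y) = H(X,Y) - H(Y)
h_x_given_y <- h_xy - h_y

# Mutual information: I(X,Y) = H(X) + H(Y) - H(X,Y)
mi <- h_x + h_y - h_xy

print(c(H_X = h_x, H_Y = h_y, H_XY = h_xy, H_X_given_Y = h_x_given_y, MI = mi))
```

Mutual information is never negative, and it equals 0 exactly when the joint table is the product of its marginals.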

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:

https://github.com/HajkD/philentropy/issues

or find me on twitter: HajkDrost

- Fixing bug that caused `jensen-shannon` computations to compute wrong values when `0 values` were present in the input vectors (see issue #4; many thanks to @wkc1986)
- Fixing bug that caused `jensen-difference` computations to compute wrong values when `0 values` were present in the input vectors
- Fixing bugs in all distance metrics when handling 0/0, 0/x or x/0 cases

- new message system
- extending documentation

- Fixing bug that caused `JSD()` to return NaN when any probability is 0; see https://github.com/HajkD/philentropy/issues/1 (thanks to William Kurtis Chang)

- Fixing C++ memory leaks in `dist.diversity()` and `distance()` when the check for `colSums(x) > 1.001` was performed (leak was found with `rhub::check_with_valgrind()`)

Initial submission version.