News
NEWS
Version 4.2.2
Changes
20180921
 Modified the package to use Rcpp interface instead of the
oldstyle C interface.
Version 4.2.1
Changes
20170610
 Added a new vignette "Tutorial: Linear weight scaling in cluster analysis".
 Reorganized manuals and updated documentation.
Version 4.2.0
Changes
20170529
 Increased log likelihood calculation to long double
precision in C++ function weighted_select_levels.cpp.
 Replaced std::accumulate() function by forloop addition in
C++ function weighted_select_levels.cpp. This resolved a
numerical overflow issue when the weight values are large.
 Now R function plotBIC() automatically adjusts the "k*="
text position, so that the text label is placed entirely
within the BIC curve area and will not extend into the
figure margin.
 The cluster size has been changed from integer to double to
accomodate weighted cluster size in both R and C++ code.
 Force any weight vector to be equal weight in new R
function Ckmedian.1d.dp.
 Introduced S3 methods print and plot for Ckmedian.1d.dp
and Cksegs.1d.dp objects.
20170318

Introduced Cksegs.1d.dp() function for ksegments clustering
of y with or without x. Only method="quadratic" guarantees
optimality.

Expanded kmedian clustering to work with all possible
methods. Only unweighted solution guarantees optimality.
Version 4.1.0
Changes
20170302
 Introduced function Ckmedian.1d.dp() for kmedian unweighted
clustering.
Version 4.0.2
Changes
20170216
 Fixed symbol encoding used in NEWS.
 Updated documentation.
Version 4.0.1
Changes
20170216
 Fixed a warning message in the use of 'R_registerRoutines' and
'R_useDynamicSymbols'.
 Fixed a memory leak issue: invalid read of size 8.
Version 4.0.0
Changes
20170211
 Removed some examples for future use.
Version 3.4.15
 Minor changes.
Version 3.4.14
Changes
20170103
 Minor changes in documentation files.
20170102
 Changed package title to "Optimal and Fast Univariate Clustering".
20161227
 When the input vector x is empty, function Ckmeans.1d.dp now generates a
warning message instead of stops on error. Ckmeans.1d.dp.R is modified.
 Print out appropriate warning messages when input x does not have an
appropriate type. ahist.R is modified.
Version 3.4.13
Changes
20161219
 Revised the comparison function in sorting so that the code can be
compiled by C++98, as requested by a user.
20161206
 Expanded the ahist() function to support weighted adaptive histogram
20161204
 Expanded the vignette of adaptive histograms to a tutorial.
 Expanded the vignette of optimal univariate kmeans clustering to a tutorial.
 Update the time course example in Ckmeans.1d.dp function
20161022
 Added a vignette to visualize examples of adaptive histograms.
 Added a vignette to visualize examples of optimal univariate kmeans
clustering.
20161021
 Added an equal bin width histogram example to contrast with the adaptive
histogram.
20161016
 Moved ahist() function from visualize.R to a new R file ahist.R.
20161015
 Added a breaks argument to ahist() so as use default graphics::hist() but
with the capacity to add sticks to the histogram generated.
 Added a skip.empty.bin.color argument to ahist() to gain more control over
colors of the histogram bars.
20161012
 Added a data argument to ahist() to provide raw data for visualization.
 Allow x to ahist() to be an object of the class "Ckmeans.1d.dp" to avoid
recomputing the clustering if it has already been done. This requires the
data for clustering to be provided via the data argument.
20161011
 Added an argument add.sticks=TRUE to ahist() to turn on or off the sticks
just above the horizontal axis.
 Added an argument style to ahist() for different styles of adaptive
histogram.
20161001
 Added a new function plot() to visualize the clusters.
 Added a new function plotBIC() to show the Bayesian information criterion
as a function of number of clusters.
20160930
 Updated examples of ahist().
 Added sticks to ahist() to show the original input data.
20160927
 Fixed ahist() when there is only a single bin detected.
 Made ahist() run much faster than the previous version.
 Updated previous examples and added more examples to illustrate
the use of ahist() better.
20160925
 Introduced a new function ahist() to generate adaptive histograms
corresponding to the optimal univariate kmeans clustering.
20160924
 Known issue: loglinear option may generate optimal clustering different
from linear and quadratic.
 The default k estimation method is updated. Updated number of cluster k
estimation. The main difference is when there are duplicates in the data.
Otherwise, the estimated k would be the same with previous versions. Added
an argument estimate.k in fiction Ckmeans.1d.dp() to use the BIC method in
version 3.4.12 or earlier to estimated k for compatibility.
Version 3.4.12
Changes
20160820
 The weighted univariate kmeans now runs in O(kn), down from O(kn^2).
This is a result of integrating weighted and unweighted kmeans
clustering into a unified dynamic programming function without sacrificing
performance. This also fixed a bug in the previous loglineartime weighted
kmeans implementation.
Version 3.4.9
Changes
20160719
 If the input array is already sorted, sorting is not performed again.
 Added an option method to select either the linear or loglinear algorithm.
20160716
 Implemented linear recursive algorithm based on the method described in
(Aggarwal et al., 1987)
Version 3.4.8
Changes
20160601

Implemented an iterative O(nlgn+kn) algorithm. This version completely
eliminates the involved divideandconquer strategy reported in the
literature and further reduced the overhead.
This implementation was later determined to be incorrect.
Version 3.4.7
Changes
20160530

Implemented an O(nlgn+kn) algorithm combining divideandconquer and
dynamic programming. The space is still O(kn). The runtime is now practical
for very large sample sizes for any number of clusters.
This implementation was later determined to be incorrect.
Version 3.4.6
Changes
20160525
 Implemented an O(kn lg n) algorithm, speeding up the program greatly.
Version 3.4.5
Changes
20160522
 s[j,i] is now computed in constant time based on precomputed
sums of input x and its squares from 0 to i.
Version 3.4.4
Changes
20160522
 Incorporated a numerically stable method for computing sample variance when
selecting the number of clusters.
 Improved documentation.
 Removed a typo in describing time complexity.
20160518
 Now Ckmeans.1d.dp() function returns "totss", "tot.withinss", and
"betweenss" statistics to summarize the optimal clustering obtained.
 print.Ckmeans.1d.dp() print out the above statistics.
Version 3.4.3
Changes
20160515
 Upgraded to support c++11
 Introduced optimal kmeans clustering for weighted data
Version 3.4.2
Changes
20160514
 Implemented backward filling of the dynamic programming matrix to
utilize lower bounds for the optimal cluster boundary. This step
substantially reduced the runtime by half (two or more times faster).
20160507
 Implemented mathematically proven tighter ranges when searching for
cluster boundaries. The runtime of the function is greatly reduced.
Most notably, the runtime is roughly constant when number of clusters
increases after k=2.
 Integrated all test cases into one single file.
Version 3.4.0
Changes
20160507
 Substantial runtime reduction. Added code to check for an upper bound
for the sum of within cluster square distances. This reduced the runtime
by half when clustering 100000 points (from standard normal distribution)
into 10 clusters.
 Eliminated the unnecessary calculation of (n1) elements in the dynamic
programming matrix that are not needed for the final result. This
resulted in enormous reduction in run time when the number of cluster
is 2: assigning one million points into two clusters took half a
a second on iMac with 2.93 GHz Intel Core i7 processor.
 Included a reference to the first description of the dynamic programming
solution by Richard Bellman (1973).
Version 3.3.3
Changes
20160503
 Fixed a bug on cluster assignment when there is only one cluster. This
was a bug introduced in version 3.3.2.
Version 3.3.2
Changes
20160503
 Added automatic test cases.
 Removed an incorrect warning message when the number of clusters is equal
to the number of unique elements in the input vector.
 Changed from 1based to 0based C implementation.
 Optimized the code by reducing overhead. See 22% reduction in runtime to
repeatedly cluster seven points into two clusters one million times.
Version 3.3.1 20150210
Changes
 Fixed a problem that prevented Windows compilation (now forced the size_t
type to unsigned long in max() function.
Version 3.3.0 20150209
Changes
 Added automated test cases into the package.
 Changed the code to not issue a warning message when the number of clusters
is estimated to be 1.
 When lower bound of the number of clusters is greater than the unique
number of elements in the input vector, both the min and max numbers of
clusters are set to the number of unique number of input values.
 When the upper bound of the number of clusters is greater than the unique
number of elements in the input vector, the max number of clusters is set
to the number of unique elements in the input vector.
 Use warning() instead of cat() to display warning messages.
 Incorporate changes suggested by a user to speed up the code.
 Revised the examples and documentation to improve usability of the package
in general.
 Started the NEWS file.
Version 3.02 20140324 and earlier
Changes
 The program now automatically determines the number of clusters from a
given range.
 The code is optimized for further speedup.