tidyclust 0.3.0

Deprecation

finalize_model_tidyclust() and finalize_workflow_tidyclust() are deprecated. Use tune::finalize_model() and tune::finalize_workflow() instead, which now support cluster_spec objects natively. (#223)

New Models and Engines

New db_clust() clustering specification for fitting DBSCAN models, with engines "dbscan" and "hdbscan". (#209, #238)
New gm_clust() clustering specification for fitting Gaussian mixture models, with engine "mclust". (#209)
New mean_shift() clustering specification for fitting mean shift models, which iteratively shift observations toward regions of high density and determine the number of clusters automatically. Engines "LPCM" and "meanShiftR" are supported. (#240, #244)

Improvements

Added dials parameter constructors radius(), min_points(), circular(), zero_covariance(), shared_orientation(), shared_shape(), and shared_size() so that tuning parameters for db_clust() and gm_clust() resolve to real parameter objects rather than erroring on unexported dials:: names.
Added a “Getting started with tidyclust” vignette (vignette("tidyclust")). (#232)
Added butcher support for cluster_fit objects. axe_data() removes the training data stored in the fit, and axe_env() clears the environment reference from the preprocessing terms. (#126)
contr_one_hot() is now exported, fixing the indicators = "one_hot" code path in .convert_form_to_x_fit() and .convert_form_to_x_new(). (#218)
extract_cluster_assignment(), extract_centroids(), and predict() gain a labels argument, a character vector of cluster labels that overrides the auto-generated prefix-based labels. (#148)
hier_clust() gains a dist_fun argument for specifying a custom distance function. (#70)
hier_clust() documentation now clarifies that predict() may not match extract_cluster_assignment() on training data: predict() uses a distance-based heuristic while extract_cluster_assignment() uses cutree() based on the dendrogram structure. (#208)
The dist_fun argument accepted by cluster metrics is now documented, including how to use {philentropy} to supply custom distance methods. See vignette("tuning_and_metrics", package = "tidyclust") for examples. (#185)
tune_cluster() now supports parallel processing via the mirai package in addition to future. (#220)
tune_cluster() now warns when passed an apparent() resample. Metrics from apparent resamples are excluded by collect_metrics(summarize = TRUE) (the default) since tune 1.2.0, which caused unexpected NA values. Use collect_metrics(summarize = FALSE) to see per-resample metrics. (#193)
The .notes column returned by tune_cluster() now includes a trace column containing backtraces for errors and warnings, making it easier to debug failures. (#220)

Bug Fixes

Fixed bug when trying to tune the linkage_method argument. (#206, @lgaborini)
silhouette_avg() now has direction = "maximize" instead of direction = "zero", so that show_best() and select_best() correctly return models with the highest silhouette values. (#212, @dnldelarosa)
sse_within_total() now correctly applies a custom dist_fun when new_data is NULL by using training data stored in the model. (#184)

Breaking Changes

The foreach package is no longer supported for parallel processing in tune_cluster(). Use the future or mirai packages instead. See ?tune::parallelism for details. (#220)
The .config column produced by tune_cluster() has changed from the Preprocessor{num}_Model{num} pattern to pre{num}_mod{num}_post{num} to align with updates in the tune package. (#220)

tidyclust 0.2.4

The philentropy package is now used to calculate distances rather than Rfast. (#199)

tidyclust 0.2.3

Update to fix revdep issue for clustMixType. (#190)

tidyclust 0.2.2

Update to fix revdep issue for ClusterR. (#186)

tidyclust 0.2.1

Small change to ease the CRAN release of the tune package. (#178)

tidyclust 0.2.0

New Engines

The clustMixType engine has been added to k_means(). This engine allows fitting of k-prototype models. (#63)
The klaR engine has been added to k_means(). This engine allows fitting of k-modes models. (#63)

Improvements

Engine-specific documentation has been added for all models and engines. (#159)

Bug Fixes

Fixed bug where engine-specific arguments were passed along for k_means() when the engine was ClusterR. (#142)
Fixed bug where the prefix argument wasn’t correctly passed through extract_cluster_assignment(), extract_centroids(), and predict(). (#145)
Metric functions now error informatively if used with unfit cluster specifications. (#146)
Fixed bug affecting cluster ordering in extract_fit_summary(). (#136)
Using extract_cluster_assignment(), extract_centroids(), and predict() on a fitted hier_clust() model without specifying num_clust or cut_height now gives a more informative error message. (#147)
k_means() now errors informatively if fit() is called without num_clust specified. (#134)
Fixed bug where levels didn’t match the number of clusters when predicting on a smaller number of observations. (#158)
Fixed bug where tune_cluster() would error if used with a recipe that contained non-predictor variables such as ID variables. (#124)

Breaking Changes

Exported internal functions ClusterR_kmeans_fit(), stats_kmeans_fit(), and hclust_fit() have been renamed to .k_means_fit_ClusterR(), .k_means_fit_stats(), and .hier_clust_fit_stats() to reduce their visibility to users.
Cluster reordering is now done at fit time rather than at extraction and prediction time. (#154)

tidyclust 0.1.2

The cluster specification methods for generics::tune_args() and generics::tunable() are now registered unconditionally (#115).

tidyclust 0.1.1

Fixed bug where extract_cluster_assignment() and predict() sometimes disagreed on cluster assignments. (#94)
silhouette() and silhouette_avg() now return NAs instead of erroring when applied to a clustering object with 1 cluster. (#104)
Fixed bug where extract_cluster_assignment() didn’t work for hier_clust() models in workflows where num_clusters was specified in extract_cluster_assignment().

tidyclust 0.1.0

Added a NEWS.md file to track changes to the package.