% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/optics.R, R/predict.R
\name{optics}
\alias{optics}
\alias{OPTICS}
\alias{extractDBSCAN}
\alias{extractXi}
\alias{predict.optics}
\title{Ordering Points to Identify the Clustering Structure (OPTICS)}
\usage{
optics(x, eps = NULL, minPts = 5, ...)

extractDBSCAN(object, eps_cl)

extractXi(object, xi, minimum = FALSE, correctPredecessors = TRUE)

\method{predict}{optics}(object, newdata, data, ...)
}
\arguments{
\item{x}{a data matrix or a \link{dist} object.}

\item{eps}{upper limit of the size of the epsilon neighborhood. Limiting the
neighborhood size improves performance and has no or very little impact on
the ordering as long as it is not set too low. If not specified, the largest
minPts-distance in the data set is used which gives the same result as
infinity.}

\item{minPts}{the number of points used to identify dense neighborhoods. The
reachability distance is calculated from the distance to the
\code{minPts}-th nearest neighbor, so this parameter also controls the
smoothness of the reachability distribution. Default is 5 points.}

\item{...}{additional arguments are passed on to the fixed-radius nearest
neighbor search algorithm. See \code{\link[=frNN]{frNN()}} for details on how to
control the search strategy.}

\item{object}{clustering object.}

\item{eps_cl}{Threshold to identify clusters (\code{eps_cl <= eps}).}

\item{xi}{Steepness threshold to identify clusters hierarchically using the
Xi method.}

\item{minimum}{logical, representing whether or not to extract the minimal
(non-overlapping) clusters in the Xi clustering algorithm.}

\item{correctPredecessors}{logical, correct a common artifact by pruning
the steep up area for points that have predecessors not in the
cluster. This artifact was found by the ELKI framework, see details below.}

\item{newdata}{new data points for which the cluster membership should be
predicted.}

\item{data}{the data set used to create the clustering object.}
}
\value{
An object of class \code{optics} with components:
\item{eps }{ value of \code{eps} parameter. }
\item{minPts }{ value of \code{minPts} parameter. }
\item{order }{ optics order for the data points in \code{x}. }
\item{reachdist }{ \link{reachability} distance for each data point in \code{x}. }
\item{coredist }{ core distance for each data point in \code{x}. }

For \code{extractDBSCAN()}, in addition the following
components are available:
\item{eps_cl }{ the value of the \code{eps_cl} parameter. }
\item{cluster }{ assigned cluster labels in the order of the data points in \code{x}. }

For \code{extractXi()}, in addition the following components
are available:
\item{xi}{ the value of the steepness threshold \code{xi}. }
\item{cluster }{ assigned cluster labels in the order of the data points in \code{x}.}
\item{clusters_xi }{ data.frame containing the start and end of each cluster
found in the OPTICS ordering. }
}
\description{
Implementation of the OPTICS (Ordering points to identify the clustering
structure) point ordering algorithm using a kd-tree.
}
\details{
\strong{The algorithm}

This implementation of OPTICS follows the original
algorithm as described by Ankerst et al (1999). OPTICS is an ordering
algorithm with methods to extract a clustering from the ordering.
While OPTICS uses concepts similar to DBSCAN, \code{eps}
is only an upper limit for the neighborhood size used to reduce
computational complexity. Note that \code{minPts} in OPTICS has a different
effect than in DBSCAN. It is used to define dense neighborhoods, but since
\code{eps} is typically set rather high, this does not affect the ordering
much. However, it is also used to calculate the reachability distance and
larger values will make the reachability distance plot smoother.
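
As a sketch of why this happens, following the definitions in Ankerst et al
(1999): let \eqn{d(o, p)} be the distance between points \eqn{o} and \eqn{p},
and let the core distance of \eqn{o} be its distance to its
\code{minPts}-th nearest neighbor. The reachability distance of \eqn{p} with
respect to a core point \eqn{o} is then

\deqn{reachdist(p, o) = \max(coredist(o), d(o, p))}{reachdist(p, o) = max(coredist(o), d(o, p))}

It is undefined if \eqn{o} has fewer than \code{minPts} points within
\code{eps}. The core distance acts as a lower bound, and it grows with
\code{minPts}, which is what smooths the plot. The names \eqn{reachdist} and
\eqn{coredist} are shorthand chosen here to match the returned components;
they are not the notation of the original paper.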

OPTICS linearly orders the data points such that points which are spatially
closest become neighbors in the ordering. The closest analog to this
ordering is the dendrogram in single-link hierarchical clustering. The algorithm
also calculates the reachability distance for each point.
\code{plot()} (see \link{reachability_plot})
produces a reachability plot which shows each point's reachability distance,
with the points sorted in the OPTICS order. Valleys represent clusters (the
deeper the valley, the denser the cluster) and high points indicate
points between clusters.

\strong{Specifying the data}

If \code{x} is specified as a data matrix, then Euclidean distances and fast
nearest neighbor lookup using a kd-tree are used. See \code{\link[=kNN]{kNN()}} for
details on the parameters for the kd-tree.

\strong{Extracting a clustering}

Several methods to extract a clustering from the order returned by OPTICS are
implemented:
\itemize{
\item \code{extractDBSCAN()} extracts a clustering from an OPTICS ordering that is
similar to what DBSCAN would produce with an eps set to \code{eps_cl} (see
Ankerst et al, 1999). The only difference from a DBSCAN clustering is that
OPTICS is not able to assign some border points and reports them instead as
noise.
\item \code{extractXi()} extracts clusters hierarchically, as specified in Ankerst et al
(1999), based on the steepness of the reachability plot. One interpretation
of the \code{xi} parameter is that it classifies clusters by change in
relative cluster density. The algorithm used was originally contributed by
the ELKI framework and is explained in Schubert et al (2018), but contains a
set of fixes.
}

\strong{Predict cluster memberships}

\code{predict()} requires a DBSCAN clustering extracted with \code{extractDBSCAN()} and then
uses the predict method for \code{dbscan()}.
}
\examples{
set.seed(2)
n <- 400

x <- cbind(
  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),
  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)
  )

plot(x, col = rep(1:4, times = 100))

### run OPTICS (Note: we use the default eps calculation)
res <- optics(x, minPts = 10)
res
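
### additional arguments are passed on to the nearest neighbor search;
### a sketch assuming the search argument described in ?frNN
### (linear search instead of the default kd-tree)
res_linear <- optics(x, minPts = 10, search = "linear")
res_linear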

### get order
res$order

### plot produces a reachability plot
plot(res)
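
### the plot shows the reachability distances sorted in OPTICS order;
### the first point in the ordering has no predecessor, so expect Inf there
reach_ordered <- res$reachdist[res$order]
head(reach_ordered)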

### plot the order of points in the reachability plot
plot(x, col = "grey")
polygon(x[res$order, ])

### extract a DBSCAN clustering by cutting the reachability plot at eps_cl
res <- extractDBSCAN(res, eps_cl = .065)
res

plot(res)  ## black is noise
hullplot(x, res)

### re-cut at a higher eps threshold
res <- extractDBSCAN(res, eps_cl = .07)
res
plot(res)
hullplot(x, res)
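
### predict cluster membership for new points; this needs the DBSCAN
### extraction above and the original data
### (newdata is made up here for illustration only)
newdata <- x[1:5, ] + rnorm(10, sd = 0.01)
predict(res, newdata, data = x)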

### extract hierarchical clustering of varying density using the Xi method
res <- extractXi(res, xi = 0.01)
res

plot(res)
hullplot(x, res)

# Xi cluster structure
res$clusters_xi

### use OPTICS on a precomputed distance matrix
d <- dist(x)
res <- optics(d, minPts = 10)
plot(res)
}
\references{
Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg
Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure.
\emph{ACM SIGMOD international conference on Management of data.} ACM Press. pp.
\doi{10.1145/304181.304187}

Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based
Clustering with R.  \emph{Journal of Statistical Software}, 91(1), 1-30.
\doi{10.18637/jss.v091.i01}

Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure
Extracted from OPTICS Plots. In \emph{Lernen, Wissen, Daten, Analysen (LWDA 2018),}
pp. 318-329.
}
\seealso{
Density \link{reachability}.

Other clustering functions: 
\code{\link{dbscan}()},
\code{\link{extractFOSC}()},
\code{\link{hdbscan}()},
\code{\link{jpclust}()},
\code{\link{sNNclust}()}
}
\author{
Michael Hahsler and Matthew Piekenbrock
}
\concept{clustering functions}
\keyword{clustering}
\keyword{model}
