% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/row_filtering.R
\name{remove_rare_categorical}
\alias{remove_rare_categorical}
\title{Filter rare categories}
\usage{
remove_rare_categorical(
  data_set,
  cols = "auto",
  threshold = 0.01,
  verbose = TRUE
)
}
\arguments{
\item{data_set}{Matrix, data.frame or data.table}

\item{cols}{Name(s) of the column(s) of data_set to filter on. To use all
columns, set it to "auto".  (character, default to "auto")}

\item{threshold}{Share of occurrences below which a category is considered rare and its rows are removed (numeric, default to 0.01)}

\item{verbose}{Should the algorithm talk? (logical, default to TRUE)}
}
\value{
Same dataset with fewer rows, edited by \strong{reference}. \cr
If you don't want to edit by reference, please pass a copy: \code{data_set = copy(data_set)}.
}
\description{
Filter out rows that contain rare categorical values
}
\details{
Filtering is performed column by column, meaning that rows with rare values in the first element
of \code{cols} are removed, then rows with rare values in the second element of \code{cols} are removed,
and so on. \cr
So if filtering is performed on too many columns, there is a high risk that a lot of rows will be dropped.
}
\examples{
# Given a set with rare "C"
library(data.table)
data_set <- data.table(cat_col = c(sample(c("A", "B"), 1000, replace=TRUE), "C"))

# When calling function
data_set <- remove_rare_categorical(data_set, cols = "cat_col",
                                   threshold = 0.01, verbose = TRUE)

# Then there are no "C"
unique(data_set[["cat_col"]])
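
# The two sketches below are illustrative additions, not part of the original
# examples; object and column names (original_set, col_1, col_2) are made up.

# Sketch 1: the input is edited by reference, so pass a copy
# (data.table::copy) when the original must stay untouched.
original_set <- data.table(cat_col = c(sample(c("A", "B"), 1000, replace = TRUE), "C"))
filtered_set <- remove_rare_categorical(copy(original_set), cols = "cat_col",
                                        threshold = 0.01, verbose = FALSE)
# Compare row counts of the untouched original and the filtered copy
nrow(original_set)
nrow(filtered_set)

# Sketch 2: filtering on several columns is applied one column after another,
# so row losses can add up. Here "C" is rare in col_1 and "Z" is rare in col_2,
# and they sit on different rows.
two_cols <- data.table(
  col_1 = c(rep("A", 500), rep("B", 499), "C"),
  col_2 = c("Z", rep("X", 500), rep("Y", 499))
)
two_cols <- remove_rare_categorical(two_cols, cols = c("col_1", "col_2"),
                                    threshold = 0.01, verbose = FALSE)
# Inspect how many rows remain after both columns were filtered
nrow(two_cols)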
}
