% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MiscFuns.R
\name{bin}
\alias{bin}
\title{Bins the values of a variable (typically a factor)}
\usage{
bin(x, bin)
}
\arguments{
\item{x}{A vector whose values have to be grouped. Can be of any type but must be atomic.}

\item{bin}{A list of values to be grouped, a vector, a formula, or the special values \code{"bin::digit"} or \code{"cut::values"}. To create a new value from old values, use \code{bin = list("new_value"=old_values)} with \code{old_values} a vector of existing values. You can use \code{.()} for \code{list()}.
It accepts regular expressions, but they must start with an \code{"@"}, like in \code{bin="@Aug|Dec"}. It accepts one-sided formulas which must contain the variable \code{x}, e.g. \code{bin=list("<2" = ~x < 2)}.
The names of the list are the new names. If the new name is missing, the first value matched becomes the new name. In the name, adding \code{"@d"}, with \code{d} a digit, will relocate the value in position \code{d}: useful to change the position of factors. Use \code{"@"} as first item to make subsequent items be located first in the factor.
Feeding in a vector is like using a list without name and only a single element. If the vector is numeric, you can use the special value \code{"bin::digit"} to group every \code{digit} element.
For example if \code{x} represents years, using \code{bin="bin::2"} creates bins of two years.
With any data, using \code{"!bin::digit"} groups every digit consecutive values starting from the first value.
Using \code{"!!bin::digit"} is the same but starting from the last value.
With numeric vectors you can: a) use \code{"cut::n"} to cut the vector into \code{n} equal parts, b) use \code{"cut::a]b["} to create the following bins: \code{[min, a]}, \code{]a, b[}, \code{[b, max]}.
The latter syntax is a sequence of number/quartile (q0 to q4)/percentile (p0 to p100) followed by an open or closed square bracket. You can add custom bin names by adding them in the character vector after \code{'cut::values'}. See details and examples. Dot square bracket expansion (see \code{\link[fixest]{dsb}}) is enabled.}
}
\value{
It returns a vector of the same length as \code{x}
}
\description{
Tool to easily group the values of a given variable.
}
\section{"Cutting" a numeric vector}{


Numeric vectors can be cut easily into: a) equal parts, b) user-specified bins.

Use \code{"cut::n"} to cut the vector into \code{n} (roughly) equal parts. Percentiles are used to partition the data, hence some data distributions can lead to create less than \code{n} parts (for example if P0 is the same as P50).

The user can specify custom bins with the following syntax: \code{"cut::a]b]c]"etc}. Here the numbers \code{a}, \code{b}, \code{c}, etc, are a sequence of increasing numbers, each followed by an open or closed square bracket. The numbers can be specified as either plain numbers (e.g. \code{"cut::5]12[32["}), quartiles (e.g. \code{"cut::q1]q3["}), or percentiles (e.g. \code{"cut::p10]p15]p90]"}). Values of different types can be mixed: \code{"cut::5]q2[p80["} is valid provided the median (\code{q2}) is indeed greater than \code{5}, otherwise an error is thrown.

The square bracket right of each number tells whether the numbers should be included or excluded from the current bin. For example, say \code{x} ranges from 0 to 100, then \code{"cut::5]"} will create two  bins: one from 0 to 5 and a second from 6 to 100. With \code{"cut::5["} the bins would have been 0-4 and 5-100.

A factor is returned. The labels report the min and max values in each bin.

To have user-specified bin labels, just add them in the character vector following \code{'cut::values'}. You don't need to provide all of them, and \code{NA} values fall back to the default label. For example, \code{bin = c("cut::4", "Q1", NA, "Q3")} will modify only the first and third label that will be displayed as \code{"Q1"} and \code{"Q3"}.
}

\examples{

data(airquality)
month_num = airquality$Month
table(month_num)

# Grouping the first two values
table(bin(month_num, 5:6))

# ... plus changing the name to '10'
table(bin(month_num, list("10" = 5:6)))

# ... and grouping 7 to 9
table(bin(month_num, list("g1" = 5:6, "g2" = 7:9)))

# Grouping every two months
table(bin(month_num, "bin::2"))

# ... every 2 consecutive elements
table(bin(month_num, "!bin::2"))

# ... idem starting from the last one
table(bin(month_num, "!!bin::2"))

# Using .() for list():
table(bin(month_num, .("g1" = 5:6)))


#
# with non numeric data
#

month_lab = c("may", "june", "july", "august", "september")
month_fact = factor(month_num, labels = month_lab)

# Grouping the first two elements
table(bin(month_fact, c("may", "jun")))

# ... using regex
table(bin(month_fact, "@may|jun"))

# ...changing the name
table(bin(month_fact, list("spring" = "@may|jun")))

# Grouping every 2 consecutive months
table(bin(month_fact, "!bin::2"))

# ...idem but starting from the last
table(bin(month_fact, "!!bin::2"))

# Relocating the months using "@d" in the name
table(bin(month_fact, .("@5" = "may", "@1 summer" = "@aug|jul")))

# Putting "@" as first item means subsequent items will be placed first
table(bin(month_fact, .("@", "aug", "july")))

#
# "Cutting" numeric data
#

data(iris)
plen = iris$Petal.Length

# 3 parts of (roughly) equal size
table(bin(plen, "cut::3"))

# Three custom bins
table(bin(plen, "cut::2]5]"))

# .. same, excluding 5 in the 2nd bin
table(bin(plen, "cut::2]5["))

# Using quartiles
table(bin(plen, "cut::q1]q2]q3]"))

# Using percentiles
table(bin(plen, "cut::p20]p50]p70]p90]"))

# Mixing all
table(bin(plen, "cut::2[q2]p90]"))

# NOTA:
# -> the labels always contain the min/max values in each bin

# Custom labels can be provided, just give them in the char. vector
# NA values lead to the default label
table(bin(plen, c("cut::2[q2]p90]", "<2", "]2; Q2]", NA, ">90\%")))



#
# With a formula
#

data(iris)
plen = iris$Petal.Length

# We need to use "x"
table(bin(plen, list("< 2" = ~x < 2, ">= 2" = ~x >= 2)))


}
