---
title: "Cleansing the dataset"
author: "Choonghyun Ryu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cleansing the dataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r environment, echo = FALSE, message = FALSE, warning=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "")
```

## Preface
If you have created a dataset to build a classification model, you must cleanse the data before modeling. After you create the dataset, work through the following steps:

* **Cleanse the dataset**
    + **Optionally remove variables that contain missing values**
    + **Remove variables that have only one unique value**
    + **Remove categorical variables with a large number of levels**
    + **Convert character variables to categorical variables**
* Split the data into a train set and a test set
* Model, evaluate, and predict

The alookr package makes these steps fast and easy.

## How to cleanse the dataset
For information on how to cleanse the dataset, refer to the following website.

- [`Cleansing the dataset`](https://choonghyunryu.github.io/alookr_vignette/cleansing.html)
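
As a rough illustration of the four cleansing steps listed in the Preface, the sketch below applies them with base R only. It is not alookr's implementation: the helper `cleanse_sketch()`, its arguments `max_levels` and `drop_missing`, and the `toy` data frame are all hypothetical names introduced here for illustration. See the linked website for the package's own functions.

```{r cleansing-sketch}
# Hypothetical helper (not part of alookr): a base R sketch of the
# four cleansing steps described in the Preface.
cleanse_sketch <- function(df, max_levels = 10, drop_missing = FALSE) {
  # Step 1 (optional): drop variables that contain missing values
  if (drop_missing) {
    has_na <- vapply(df, anyNA, logical(1))
    df <- df[, !has_na, drop = FALSE]
  }

  # Step 2: drop variables with only one unique value
  # (NA counts as a value here, an assumption of this sketch)
  n_unique <- vapply(df, function(x) length(unique(x)), integer(1))
  df <- df[, n_unique > 1, drop = FALSE]

  # Step 3: drop character/factor variables with more than max_levels levels
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  n_levels <- vapply(df, function(x) length(unique(x)), integer(1))
  df <- df[, !(is_cat & n_levels > max_levels), drop = FALSE]

  # Step 4: convert the remaining character variables to factors
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Toy data, made up for this example
toy <- data.frame(
  id       = as.character(1:20),       # identifier-like, many levels
  constant = rep(1, 20),               # only one unique value
  group    = rep(c("a", "b"), 10),     # character, few levels
  score    = c(NA, seq_len(19)),       # contains a missing value
  stringsAsFactors = FALSE
)

cleaned <- cleanse_sketch(toy, max_levels = 10, drop_missing = TRUE)
str(cleaned)
```

The threshold in `max_levels` is a judgment call: identifier-like columns with nearly as many levels as rows rarely help a classifier, but the right cutoff depends on the data.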