---
title: "Splitting the dataset"
author: "Choonghyun Ryu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Splitting the dataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r environment, echo = FALSE, message = FALSE, warning=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "")
```

## Preface
To develop a classification model, the original data must be divided into a train set and a test set. The overall workflow is as follows:

* Cleanse the dataset
* **Split the data into a train set and a test set**
    + **Split the data.frame or tbl_df into a train set and a test set**
    + **Compare the datasets**
        + **Comparison of categorical variables**
        + **Comparison of numeric variables**
        + **Diagnosis of train set and test set**
    + **Extract the train/test dataset**
        + **Extract train set or test set**
        + **Extract the data to fit the model**
* Model, evaluate, and predict

The alookr package makes these steps fast and easy.
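
As a rough illustration of what the split-and-compare steps involve, the following sketch uses only base R and the built-in `iris` data. It is not alookr's implementation and does not use alookr's functions; the 70/30 ratio, the seed, and the variable names are assumptions chosen for the example.

```{r split-sketch, eval = FALSE}
# Illustrative base R sketch only; not alookr's implementation.
set.seed(123)                     # arbitrary seed, for reproducibility

data  <- iris                     # built-in example data
ratio <- 0.7                      # assumed share of rows for the train set

# Sample row indices for the train set; the remaining rows form the test set.
n_train   <- round(ratio * nrow(data))
train_idx <- sample(seq_len(nrow(data)), size = n_train)

train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Compare a categorical variable across the two sets (relative frequencies).
prop.table(table(train$Species))
prop.table(table(test$Species))

# Compare a numeric variable across the two sets.
summary(train$Sepal.Length)
summary(test$Sepal.Length)
```

If the relative frequencies or summaries differ sharply between the two sets, the split may not be representative, which is the kind of problem the comparison and diagnosis steps are meant to catch.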

## How to split the data

For details on how to split the data into a train set and a test set, refer to the following website.

- [`Splitting the dataset`](https://choonghyunryu.github.io/alookr_vignette/split.html)