This function tunes and trains one or more classification models. The current version supports the Random Forest and XGBoost algorithms and allows parallel computation. The final model, feature importance, predictions on the test dataset, and confusion matrix are written to an RData file.

TrainModel(
  TrainSet,
  TestSet,
  alg = c("all", "RF", "XGB"),
  class_col = class_col,
  seed = 40,
  name_0 = name_0,
  name_1 = name_1,
  label = label,
  allowParallel = TRUE,
  free_cores = 4
)
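
A minimal sketch of a call, assuming a toy data set with a class column coded 0/1. The column name "Class", the class names "control" and "case", and the prefix "toy_run" are hypothetical values chosen for illustration. The call itself runs only if a TrainModel() function is available in the session.

```r
# Toy data: two numeric features and a binary class column coded 0/1
# (the 0/1 coding and the column name "Class" are assumptions).
set.seed(40)
n <- 100
toy <- data.frame(
  feat_a = rnorm(n),
  feat_b = rnorm(n),
  Class  = rep(c(0, 1), each = n / 2)
)

# Balanced training split: 40 rows per class; the remaining rows form the test set.
train_idx <- c(sample(which(toy$Class == 0), 40),
               sample(which(toy$Class == 1), 40))
TrainSet <- toy[train_idx, ]
TestSet  <- toy[-train_idx, ]

# Check that the training classes are balanced.
table(TrainSet$Class)

# Runs only when the package that provides TrainModel() is loaded.
if (exists("TrainModel", mode = "function")) {
  TrainModel(
    TrainSet, TestSet,
    alg = "RF",
    class_col = "Class",
    seed = 40,
    name_0 = "control",   # hypothetical name for class == 0
    name_1 = "case",      # hypothetical name for class == 1
    label = "toy_run",    # hypothetical output prefix
    allowParallel = FALSE
  )
}
```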

Arguments

TrainSet

A data.frame with balanced classes.

TestSet

A data.frame used to test the final model.

alg

A character vector with the name of the classification algorithm to be used. Options are 'RF' (Random Forest), 'XGB' (XGBoost) or 'all' (both).
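
A default written as a vector of choices is commonly resolved in R with match.arg(), which picks the first element when the argument is not supplied. Whether TrainModel() does this internally is an assumption; the helper pick_alg() below is hypothetical and only illustrates the idiom.

```r
# Hypothetical helper, not part of the documented function.
pick_alg <- function(alg = c("all", "RF", "XGB")) {
  alg <- match.arg(alg)                   # defaults to "all"; errors on unknown values
  switch(alg, all = c("RF", "XGB"), alg)  # expand "all" into both algorithms
}

pick_alg()        # "RF" "XGB"
pick_alg("XGB")   # "XGB"
```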

class_col

A character vector with the name of the column that identifies the classes.

seed

A numeric vector to set a seed for reproducible results.

name_0

A character vector with the name for class 0 (class == 0).

name_1

A character vector with the name for class 1 (class == 1).

label

A character vector with the prefix of the RData file to be outputted.
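
The exact name of the output file is not documented here beyond its prefix, so one hedged way to find and read it is to search for RData files that start with the label. In the sketch below, the prefix "toy_run", the output directory, and the placeholder file are all assumptions made so that the example runs on its own; a real file would be written by TrainModel().

```r
label   <- "toy_run"   # hypothetical prefix passed as `label`
out_dir <- tempdir()   # assumed output directory

# Stand-in file so the sketch is runnable; it does not mimic the real contents.
demo_obj <- list(note = "placeholder")
save(demo_obj, file = file.path(out_dir, paste0(label, "_demo.RData")))

# Find RData files whose names start with the prefix.
rdata_files <- list.files(
  out_dir,
  pattern = paste0("^", label, ".*\\.RData$"),
  full.names = TRUE
)

# Load into a separate environment to avoid overwriting workspace objects.
res <- new.env()
loaded <- load(rdata_files[1], envir = res)
loaded   # names of the objects stored in the file
```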

allowParallel

Logical. If TRUE, allows parallel computation.

free_cores

A numeric vector with the number of cores to be left free if allowParallel = TRUE.
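
As a rough illustration of what leaving cores free means, the number of workers can be derived from the detected core count minus free_cores. The helper workers_for() is hypothetical, and how TrainModel() actually computes its worker count (including the guard that keeps at least one worker) is an assumption.

```r
library(parallel)

# Hypothetical helper, not part of the documented function.
workers_for <- function(free_cores = 4, total = parallel::detectCores()) {
  if (is.na(total)) total <- 1L   # detectCores() can return NA on some platforms
  max(1L, as.integer(total) - as.integer(free_cores))
}

workers_for(4, total = 8)   # 4 workers on an 8-core machine
workers_for(4, total = 2)   # falls back to 1 worker
```

On machines with four or fewer cores, the default free_cores = 4 would leave no cores for workers under this scheme, so a smaller value may be appropriate there.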