Now that we have seen a number of classification and regression methods, and introduced cross-validation, we see the general outline of a predictive analysis:

- Test-train split the available data.
- Decide on a set of candidate models (specify possible tuning parameters for method).
- Use resampling to find the “best model” by choosing the values of the tuning parameters.
- Use the chosen model to make predictions.
- Calculate relevant metrics on the test data.

At face value it would seem easy to repeat this process for a number of different methods. However, we have run into a number of difficulties attempting to do so with R:

- The predict() function seems to have a different behavior for each new method we see.
- Many methods have different cross-validation functions, or worse yet, no built-in process for cross-validation.
- Not all methods expect the same data format.
- Different methods handle categorical predictors differently. Some methods cannot handle factor variables.

Thankfully, the R community has essentially provided a silver bullet for these issues, the caret package. Returning to the above list, we will see that a number of these tasks are directly addressed in the caret package.

- Test-train split the available data.
  - createDataPartition() will take the place of our manual data splitting. It will also do some extra work to ensure that the train and test samples are somewhat similar.
- Specify possible tuning parameters for method.
  - expand.grid() is not a function in caret, but we will get in the habit of using it to specify a grid of tuning parameters.
- Use resampling to find the “best model” by choosing the values of the tuning parameters.
  - trainControl() will specify the resampling scheme.
  - train() takes the following information then trains (tunes) the requested model:
    - form, a formula that specifies the response and which predictors (or transformations of them) should be used.
    - data, the data used for training.
    - trControl, which specifies the resampling scheme, that is, how cross-validation should be performed to find the best values of the tuning parameters.
    - preProcess, which allows for specification of data pre-processing such as centering and scaling.
    - method, a statistical learning method from a long list of available models.
    - tuneGrid, which specifies the tuning parameters to train over.
- Use the chosen model to make predictions.
  - predict() used on objects of type train will be truly magical!

    default_glm_mod = train(
      form = default ~ .,
      data = default_trn,
      trControl = trainControl(method = "cv", number = 5),
      method = "glm",
      family = "binomial"
    )

Here, we have supplied four arguments to the train() function from the caret package. (The additional family = "binomial" is not used by train() itself; it is passed along to the fitting function.)

- form = default ~ . specifies the default variable as the response. It also indicates that all available predictors should be used.
- data = default_trn specifies that training will be done with the default_trn data.
- trControl = trainControl(method = "cv", number = 5) specifies that we will be using 5-fold cross-validation.
- method = "glm" specifies that we will fit a generalized linear model.

The method specifies both the model (and more specifically the function used to fit said model in R) and the package that will be used. The train() function is essentially a wrapper around whatever method we choose. In this case, the function is the base R function glm(), so no additional package is required. When a method requires a function from a certain package, that package will need to be installed.
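The workflow described in this section can be sketched end to end as follows. This is a minimal illustration, not the chapter's own analysis: the data are simulated, the predictor names `balance` and `income` and the object names `sim_data`, `default_tst`, and `default_knn_mod` are hypothetical, and the k-nearest neighbors fit is included only to show `tuneGrid` and `preProcess` in use, since `glm` has no tuning parameters.

```r
library(caret)

set.seed(42)

# Simulated stand-in data (hypothetical): a two-level factor response
# named default and two numeric predictors.
n = 500
sim_data = data.frame(
  balance = rnorm(n, mean = 800, sd = 400),
  income  = rnorm(n, mean = 30000, sd = 10000)
)
prob = plogis(-4 + 0.005 * sim_data$balance)
sim_data$default = factor(ifelse(runif(n) < prob, "Yes", "No"))

# Test-train split. createDataPartition() samples within the levels of the
# response, so both sets have similar class proportions.
trn_idx = createDataPartition(sim_data$default, p = 0.75, list = FALSE)
default_trn = sim_data[trn_idx, ]
default_tst = sim_data[-trn_idx, ]

# Logistic regression evaluated with 5-fold cross-validation.
default_glm_mod = train(
  form = default ~ .,
  data = default_trn,
  trControl = trainControl(method = "cv", number = 5),
  method = "glm",
  family = "binomial"
)

# A method with a tuning parameter: k-nearest neighbors, tuned over a grid
# of k values, with predictors centered and scaled first.
default_knn_mod = train(
  form = default ~ .,
  data = default_trn,
  trControl = trainControl(method = "cv", number = 5),
  preProcess = c("center", "scale"),
  method = "knn",
  tuneGrid = expand.grid(k = c(5, 15, 25))
)

# Cross-validated results and the chosen value of k.
default_glm_mod$results
default_knn_mod$bestTune

# predict() on a train object returns class labels by default.
tst_pred = predict(default_glm_mod, newdata = default_tst)
tst_acc = mean(tst_pred == default_tst$default)
tst_acc
```

The same `predict()` call works unchanged on `default_knn_mod`, which is the consistency that makes it convenient to swap one method for another.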