library(tidyverse)
library(tidymodels)Unit 4 (Classification): Fit RDA Models
Introduction
This file serves as the answer key for the RDA portion of the Unit_04 homework. Unit 4 Exploratory Introduction to Classification Models on the course website contains all materials required for this assignment.
In this assignment, we demonstrate how to fit multiple configurations of KNN and RDA classification models using data from the Titanic data set. By evaluating these various configurations in our validation set, we select the top performing model in our validation set out of all KNN and RDA models we fit. We use this best model to generate predictions for our held out test set, which we only use ONCE for evaluation of our final best model.
Set up
Load tidy packages
Handle conflicts
options(conflicts.policy = "depends.ok")Load additional packages
library(discrim, exclude = "smoothness")Source function scripts (John’s or your own)
source("https://github.com/jjcurtin/lab_support/blob/main/fun_plots.R?raw=true")
source("https://github.com/jjcurtin/lab_support/blob/main/fun_eda.R?raw=true")Specify other global settings
theme_set(theme_classic())
options(tibble.width = Inf, dplyr.print_max=Inf)Paths
path_data <- "application_assignments/unit_04"Load data
Use
here::here()and relative path for your data.
data_trn <- read_csv(here::here(path_data, "titanic_train_cln.csv"),
col_types = cols()) |>
glimpse()Rows: 736
Columns: 12
$ passenger_id <dbl> 3, 4, 11, 16, 17, 26, 39, 40, 41, 46, 53, 54, 59, 61, 6…
$ survived <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ pclass <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <chr> "Allison, Miss. Helen Loraine", "Allison, Mr. Hudson Jo…
$ sex <chr> "female", "male", "male", "male", "male", "male", "male…
$ age <dbl> 2, 30, 47, NA, 24, 25, 41, 48, NA, 45, 28, 17, 49, 36, …
$ n_sib_spouse <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0…
$ n_parent_child <dbl> 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ ticket <chr> "113781", "113781", "PC 17757", "PC 17318", "PC 17558",…
$ fare <dbl> 151.5500, 151.5500, 227.5250, 25.9250, 247.5208, 26.000…
$ cabin <chr> "C22 C26", "C22 C26", "C62 C64", NA, "B58 B60", NA, "A2…
$ embarked <chr> "s", "s", "c", "s", "c", "c", "s", "c", "c", "s", "s", …
data_val <- read_csv(here::here(path_data, "titanic_val_cln.csv"),
col_types = cols()) |>
glimpse()Rows: 246
Columns: 12
$ passenger_id <dbl> 2, 12, 18, 20, 21, 31, 42, 51, 52, 84, 85, 97, 101, 103…
$ survived <chr> "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "…
$ pclass <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <chr> "Allison, Master. Hudson Trevor", "Astor, Mrs. John Jac…
$ sex <chr> "male", "female", "female", "male", "male", "male", "fe…
$ age <dbl> 0.9167, 18.0000, 50.0000, 36.0000, 37.0000, 45.0000, 44…
$ n_sib_spouse <dbl> 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 3, 0…
$ n_parent_child <dbl> 2, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 2, 0…
$ ticket <chr> "113781", "PC 17757", "PC 17558", "13050", "11751", "11…
$ fare <dbl> 151.5500, 227.5250, 247.5208, 75.2417, 52.5542, 35.5000…
$ cabin <chr> "C22 C26", "C62 C64", "B58 B60", "C6", "D35", "T", "B4"…
$ embarked <chr> "s", "c", "c", "c", "s", "s", "c", "c", "s", "s", "c", …
data_test <- read_csv(here::here(path_data, "titanic_test_cln.csv"),
col_types = cols()) |>
glimpse()Rows: 327
Columns: 12
$ passenger_id <dbl> 1, 5, 8, 9, 10, 15, 24, 25, 29, 33, 35, 45, 47, 49, 50,…
$ survived <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pclass <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <chr> "Allen, Miss. Elisabeth Walton", "Allison, Mrs. Hudson …
$ sex <chr> "female", "female", "male", "female", "male", "male", "…
$ age <dbl> 29, 25, 39, 53, 71, 80, 42, 29, 35, 30, 42, 41, NA, 53,…
$ n_sib_spouse <dbl> 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0…
$ n_parent_child <dbl> 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0…
$ ticket <chr> "24160", "113781", "112050", "11769", "PC 17609", "2704…
$ fare <dbl> 211.3375, 151.5500, 0.0000, 51.4792, 49.5042, 30.0000, …
$ cabin <chr> "B5", "C22 C26", "A36", "C101", NA, "A23", NA, "C97", "…
$ embarked <chr> "s", "s", "s", "s", "c", "s", "c", "s", "s", "s", "s", …
Set appropriate variable classes (remember character classes should be factors).
Remember: - Nominal and ordinal variables are set to factors - Interval and ratio variables are set to numeric
Remember ordinal variables like pclass should have the factor levels explicitly stated to retain the order.
It also helps to class survived as a factor because otherwise it will be read as logical in the test set where the scores are all NA (remember – it is correct that these are NA, the TAs will keep the labeled test set to themselves for competition-scoring purposes).
We will handle this by writing a simple function. This is probably not necessary since classing is so quick, but it’s good practice! We will handle order for the factors we care about explicitly, and then fix the rest using across().
class_titantic <- function(d){
d |>
mutate(survived = factor(survived, levels = c("no", "yes")),
pclass = factor(pclass, levels = c("1", "2", "3")),
across(where(is.character), factor))
}Now, let’s apply it to all three sets.
data_trn <- data_trn |>
class_titantic() |>
glimpse()Rows: 736
Columns: 12
$ passenger_id <dbl> 3, 4, 11, 16, 17, 26, 39, 40, 41, 46, 53, 54, 59, 61, 6…
$ survived <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no,…
$ pclass <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <fct> "Allison, Miss. Helen Loraine", "Allison, Mr. Hudson Jo…
$ sex <fct> female, male, male, male, male, male, male, male, male,…
$ age <dbl> 2, 30, 47, NA, 24, 25, 41, 48, NA, 45, 28, 17, 49, 36, …
$ n_sib_spouse <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0…
$ n_parent_child <dbl> 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ ticket <fct> 113781, 113781, PC 17757, PC 17318, PC 17558, 13905, 11…
$ fare <dbl> 151.5500, 151.5500, 227.5250, 25.9250, 247.5208, 26.000…
$ cabin <fct> C22 C26, C22 C26, C62 C64, NA, B58 B60, NA, A21, B10, N…
$ embarked <fct> s, s, c, s, c, c, s, c, c, s, s, s, s, s, s, c, s, c, s…
data_val <- data_val |>
class_titantic() |>
glimpse()Rows: 246
Columns: 12
$ passenger_id <dbl> 2, 12, 18, 20, 21, 31, 42, 51, 52, 84, 85, 97, 101, 103…
$ survived <fct> yes, yes, yes, no, yes, no, yes, yes, no, yes, no, no, …
$ pclass <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <fct> "Allison, Master. Hudson Trevor", "Astor, Mrs. John Jac…
$ sex <fct> male, female, female, male, male, male, female, female,…
$ age <dbl> 0.9167, 18.0000, 50.0000, 36.0000, 37.0000, 45.0000, 44…
$ n_sib_spouse <dbl> 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 3, 0…
$ n_parent_child <dbl> 2, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 2, 0…
$ ticket <fct> 113781, PC 17757, PC 17558, 13050, 11751, 113784, PC 17…
$ fare <dbl> 151.5500, 227.5250, 247.5208, 75.2417, 52.5542, 35.5000…
$ cabin <fct> C22 C26, C62 C64, B58 B60, C6, D35, T, B4, B51 B53 B55,…
$ embarked <fct> s, c, c, c, s, s, c, c, s, s, c, c, c, c, c, c, c, s, s…
data_test <- data_test |>
class_titantic() |>
glimpse()Rows: 327
Columns: 12
$ passenger_id <dbl> 1, 5, 8, 9, 10, 15, 24, 25, 29, 33, 35, 45, 47, 49, 50,…
$ survived <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pclass <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <fct> "Allen, Miss. Elisabeth Walton", "Allison, Mrs. Hudson …
$ sex <fct> female, female, male, female, male, male, female, femal…
$ age <dbl> 29, 25, 39, 53, 71, 80, 42, 29, 35, 30, 42, 41, NA, 53,…
$ n_sib_spouse <dbl> 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0…
$ n_parent_child <dbl> 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0…
$ ticket <fct> 24160, 113781, 112050, 11769, PC 17609, 27042, PC 17757…
$ fare <dbl> 211.3375, 151.5500, 0.0000, 51.4792, 49.5042, 30.0000, …
$ cabin <fct> B5, C22 C26, A36, C101, NA, A23, NA, C97, C99, C7, D22,…
$ embarked <fct> s, s, s, s, c, s, c, s, s, s, s, c, s, c, c, s, s, c, s…
Here, we get creative and make some additional predictors (title and group_size) that we hypothesize might help in predicting survived.
We could have also simply done this in our recipe. Perhaps most importantly, we decided to do this using modeling EDA ONLY BASED ON TRAINING DATA.
add_predictors <- function(d){
d |>
mutate(group_size = n_parent_child + n_sib_spouse) |>
mutate(title = str_to_lower(str_extract(name,
"(?<=[:punct:][:space:])[:alpha:]{2,}(?=[:punct:])"))) |>
mutate(sparse_title = case_when(
title == "master" ~ "boy",
title %in% c("miss", "ms") ~ "young_woman",
title %in% c("dona", "lady", "mrs") ~ "adult_woman",
title %in% c("mr", "sir", "don", "jonkheer") ~ "adult_man",
title %in% c("col", "dr", "major", "rev") ~ "professional",
TRUE ~ NA_character_)) |>
mutate(sparse_title = case_when(
is.na(sparse_title) & age < 12 & sex == "male" ~ "boy",
is.na(sparse_title) & age < 18 & sex == "female" ~ "young_woman",
is.na(sparse_title) & sex == "male" ~ "adult_man",
is.na(sparse_title) & sex == "female" ~ "adult_woman",
!is.na(sparse_title) ~ as.character(sparse_title),
TRUE ~ NA_character_)) |>
mutate(n_cabins = str_count(cabin, "[:alpha:]"))
}Now, we’ll apply this function to all of our sets.
data_trn <- data_trn |>
add_predictors() |>
glimpse()Rows: 736
Columns: 16
$ passenger_id <dbl> 3, 4, 11, 16, 17, 26, 39, 40, 41, 46, 53, 54, 59, 61, 6…
$ survived <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no,…
$ pclass <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <fct> "Allison, Miss. Helen Loraine", "Allison, Mr. Hudson Jo…
$ sex <fct> female, male, male, male, male, male, male, male, male,…
$ age <dbl> 2, 30, 47, NA, 24, 25, 41, 48, NA, 45, 28, 17, 49, 36, …
$ n_sib_spouse <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0…
$ n_parent_child <dbl> 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ ticket <fct> 113781, 113781, PC 17757, PC 17318, PC 17558, 13905, 11…
$ fare <dbl> 151.5500, 151.5500, 227.5250, 25.9250, 247.5208, 26.000…
$ cabin <fct> C22 C26, C22 C26, C62 C64, NA, B58 B60, NA, A21, B10, N…
$ embarked <fct> s, s, c, s, c, c, s, c, c, s, s, s, s, s, s, c, s, c, s…
$ group_size <dbl> 3, 3, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 2, 0…
$ title <chr> "miss", "mr", "col", "mr", "mr", "mr", "mr", "mr", "dr"…
$ sparse_title <chr> "young_woman", "adult_man", "professional", "adult_man"…
$ n_cabins <int> 2, 2, 2, NA, 2, NA, 1, 1, NA, 1, NA, NA, NA, 1, 1, 1, 1…
data_val <- data_val |>
add_predictors() |>
glimpse()Rows: 246
Columns: 16
$ passenger_id <dbl> 2, 12, 18, 20, 21, 31, 42, 51, 52, 84, 85, 97, 101, 103…
$ survived <fct> yes, yes, yes, no, yes, no, yes, yes, no, yes, no, no, …
$ pclass <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <fct> "Allison, Master. Hudson Trevor", "Astor, Mrs. John Jac…
$ sex <fct> male, female, female, male, male, male, female, female,…
$ age <dbl> 0.9167, 18.0000, 50.0000, 36.0000, 37.0000, 45.0000, 44…
$ n_sib_spouse <dbl> 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 3, 0…
$ n_parent_child <dbl> 2, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 2, 0…
$ ticket <fct> 113781, PC 17757, PC 17558, 13050, 11751, 113784, PC 17…
$ fare <dbl> 151.5500, 227.5250, 247.5208, 75.2417, 52.5542, 35.5000…
$ cabin <fct> C22 C26, C62 C64, B58 B60, C6, D35, T, B4, B51 B53 B55,…
$ embarked <fct> s, c, c, c, s, s, c, c, s, s, c, c, c, c, c, c, c, s, s…
$ group_size <dbl> 3, 1, 1, 0, 2, 0, 0, 1, 0, 2, 1, 1, 1, 1, 1, 0, 0, 5, 0…
$ title <chr> "master", "mrs", "mrs", "mr", "mr", "mr", "mrs", "mrs",…
$ sparse_title <chr> "boy", "adult_woman", "adult_woman", "adult_man", "adul…
$ n_cabins <int> 2, 2, 2, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1…
data_test <- data_test |>
add_predictors() |>
glimpse()Rows: 327
Columns: 16
$ passenger_id <dbl> 1, 5, 8, 9, 10, 15, 24, 25, 29, 33, 35, 45, 47, 49, 50,…
$ survived <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pclass <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ name <fct> "Allen, Miss. Elisabeth Walton", "Allison, Mrs. Hudson …
$ sex <fct> female, female, male, female, male, male, female, femal…
$ age <dbl> 29, 25, 39, 53, 71, 80, 42, 29, 35, 30, 42, 41, NA, 53,…
$ n_sib_spouse <dbl> 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0…
$ n_parent_child <dbl> 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0…
$ ticket <fct> 24160, 113781, 112050, 11769, PC 17609, 27042, PC 17757…
$ fare <dbl> 211.3375, 151.5500, 0.0000, 51.4792, 49.5042, 30.0000, …
$ cabin <fct> B5, C22 C26, A36, C101, NA, A23, NA, C97, C99, C7, D22,…
$ embarked <fct> s, s, s, s, c, s, c, s, s, s, s, c, s, c, c, s, s, c, s…
$ group_size <dbl> 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 3, 0, 0…
$ title <chr> "miss", "mrs", "mr", "mrs", "mr", "mr", "miss", "miss",…
$ sparse_title <chr> "young_woman", "adult_woman", "adult_man", "adult_woman…
$ n_cabins <int> 1, 2, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 3, 2, 2, …
Create tracking tibble
Create an empty tracking tibble to track the validation error across the various model configurations you will fit.
error_val <- tibble(model = character(),
acc_val = numeric()) |>
glimpse()Rows: 0
Columns: 2
$ model <chr>
$ acc_val <dbl>
NOTE: See the key for fitting KNN models to see a demonstration of accuracy & what it means in this particular dataset.
RDA model configuration 1
We start by feature engineering a variable title (see above and the key for modeling EDA for more information). We extract the passengers’ titles from their names (e.g., “Mr” or “Miss”). This variable represents a combination of age and sex because titles differ by both. That also means I can eventually use age and sex variables to help fill in values when title is missing. Because there are many unique titles, we collapse it down into a sparse version (sparse_title). This isn’t required for for the homework but we show it to you here for learning purposes.
This meets the following homework requirements: Additional feature engineering technique of my choice!
Set up recipe
Informed by EDA from your modeling EDA script, create a recipe for your first model.
# only using sparse_title in the model
rec_1 <-
recipe(survived ~ sparse_title, data = data_trn) |>
step_dummy(sparse_title) Training feature matrix
Use the
prep()function to prep your recipe.
rec_prep <- rec_1 |>
prep(training = data_trn)Use the
bake()function to generate the feature matrix of the training data.
feat_trn <- rec_prep |>
bake(new_data = NULL)Fit Model
Fit a
rdamodel predictingsurvivedfrom your training feature matrix.
fit_rda_1 <- discrim_regularized(frac_common_cov = 0,
frac_identity = 0) |>
set_engine("klaR") |>
fit(survived ~ ., data = feat_trn)Validation feature matrix
Use
bake()to generate the feature matrix of the validation data that we will use to assess your model.
feat_val <- rec_prep |>
bake(new_data = data_val)Assess your model
Use
accuracy_vec()to calculate the validation error (accuracy) of your model. Add this value to your validation error tracking tibble.
accuracy_vec(feat_val$survived, predict(fit_rda_1, feat_val, type = "class")$.pred_class)[1] 0.7886179
error_val <- add_row(error_val,
model = "rda_1",
acc_val = accuracy_vec(feat_val$survived,
predict(fit_rda_1,
feat_val)$.pred_class))
error_val# A tibble: 1 × 2
model acc_val
<chr> <dbl>
1 rda_1 0.789
RDA model configuration 2
We’re now going to use sparse_title again but for imputation. age has a lot of missing values, but we know that age differs by sparse_title (see modeling EDA). We’ll use sparse_title as part of a knn imputation for age. Then, maybe age will be a better predictor that it would be with lots of missing data.
This meets the following homework requirements: Variable with missingness
Set up recipe
Informed by EDA from your modeling EDA, create a recipe for your second model.
rec_2 <-
recipe(survived ~ sparse_title + age + sex, data = data_trn) |>
step_dummy(sex, sparse_title) |>
step_impute_knn(age, impute_with = imp_vars(starts_with("sparse_title"))) |>
step_rm(starts_with("sparse_title"))Training feature matrix
Use the
prep()function to prep your recipe.
rec_prep_2 <- rec_2 |>
prep(training = data_trn)Use the
bake()function to generate the feature matrix of the training data.
feat_trn_2 <- rec_prep_2 |>
bake(new_data = NULL)Fit your model
Fit a
rdamodel predictingsurvivedfrom your training feature matrix.
fit_rda_2 <- discrim_regularized(frac_common_cov = 0,
frac_identity = 0) |>
set_engine("klaR") |>
fit(survived ~ ., data = feat_trn_2)Validation feature matrix
Use
bake()to generate the feature matrix of the validation data that we will use to assess your model.
feat_val_2 <- rec_prep_2 |>
bake(new_data = data_val)Assess your model
Use
accuracy_vec()to calculate the validation error (accuracy) of your model. Add this value to your validation error tracking tibble.
accuracy_vec(feat_val_2$survived, predict(fit_rda_2, feat_val_2)$.pred_class)[1] 0.7723577
error_val <- add_row(error_val,
model = "rda_2",
acc_val = accuracy_vec(feat_val_2$survived,
predict(fit_rda_2,
feat_val_2)$.pred_class))
error_val# A tibble: 2 × 2
model acc_val
<chr> <dbl>
1 rda_1 0.789
2 rda_2 0.772
RDA Configuration 3
Let’s try using step_pca() to make an “expense” variable. This will contain information from fare (YJ-transformed), pclass (kept as numeric),cabins (see modeling EDA key for more information on extracting the number of cabins from the cabin variable; created then YJ-transformed), and group_size (created then YJ-transformed).
This meets the following homework requirements: Transformed numeric
Set up recipe
Informed by EDA from your modeling EDA, create a recipe for your third model.
rec_3 <- recipe(survived ~ sparse_title + age + sex + group_size + n_cabins
+ fare, data = data_trn) |>
step_dummy(sex, sparse_title) |>
step_impute_knn(age, impute_with = imp_vars(starts_with("sparse_title"))) |>
step_YeoJohnson(fare, group_size, n_cabins) |>
step_impute_median(fare, n_cabins) |>
step_pca(fare, group_size, n_cabins, group_size, prefix = "expense_") |>
step_rm(starts_with("sparse_title"))Training feature matrix
Use the
prep()function to prep your recipe.
rec_prep_3 <- rec_3 |>
prep(training = data_trn)Use the
bake()function to generate the feature matrix of the training data.
feat_trn_3 <- rec_prep_3 |>
bake(new_data = data_trn)Fit your model
Fit a
rdamodel predictingsurvivedfrom your training feature matrix.
# use these hyperparamater values:
# discrim_regularized(frac_common_cov = 0, frac_identity = 0)
fit_rda_3 <- discrim_regularized(frac_common_cov = 0,
frac_identity = 0) |>
set_engine("klaR") |>
fit(survived ~ ., data = feat_trn_3)Validation feature matrix
Use
bake()to generate the feature matrix of the validation data that we will use to assess your model.
feat_val_3 <- rec_prep_3 |>
bake(new_data = data_val)Assess your model
Use accuracy_vec() to calculate the validation error (accuracy) of your model. Add this value to your validation error tracking tibble.
accuracy_vec(feat_val_3$survived, predict(fit_rda_3, feat_val_3)$.pred_class)[1] 0.7804878
error_val <- add_row(error_val,
model = "rda_3",
acc_val = accuracy_vec(feat_val_3$survived,
predict(fit_rda_3,
feat_val_3)$.pred_class))
error_val# A tibble: 3 × 2
model acc_val
<chr> <dbl>
1 rda_1 0.789
2 rda_2 0.772
3 rda_3 0.780
Generate predictions
My best model is a KNN, so this script is done! See key_unit_04_fit_knn.html for solution code for generating predictions.