# Necessary Setup
library(tidyverse)
library(tidymodels)
options(conflicts.policy = "depends.ok")
# John's Functions/Scripts
source("https://github.com/jjcurtin/lab_support/blob/main/fun_ml.R?raw=true")
source("https://github.com/jjcurtin/lab_support/blob/main/fun_plots.R?raw=true")
source("https://github.com/jjcurtin/lab_support/blob/main/fun_eda.R?raw=true")
# Additional Packages
conflictRules("Matrix", mask.ok = c("expand", "pack", "unpack"))
library(future)
library(xfun, include.only = "cache_rds")
# Additional Configurations
theme_set(theme_bw())
options(tibble.width = Inf, dplyr.print_max=Inf)
# Paths
path_data <- "application_assignments/unit_08"
# Parallel Processing
plan(multisession, workers = (parallel::detectCores(logical = FALSE) - 1))
Homework Unit 8: Advanced Performance Metrics
Introduction
To begin, download the following from the course web book (Unit 8):
hw_unit_8_performance.qmd (notebook for this assignment)
breast_cancer.csv (data for this assignment)
The data for this week’s assignment has information about breast cancer diagnoses. It contains characteristics of different breast cancer tumors and classifies each tumor as benign or malignant. Your goal is to choose between two candidate statistical algorithms (a standard GLM vs a tuned KNN model) to identify and evaluate the best performing model for diagnosis.
You can imagine that the consequences of missing cancerous tumors are not equal to the consequences of misdiagnosing benign tumors. In this assignment, we will explore how the performance metric and the balance of diagnoses affect our evaluation of the best performing model in these data.
NOTE: Fitting models in this assignment will generate some warnings having to do with glm.fit. This is to be expected, and we are going to review these warnings and some related issues in our next lab.
Let’s get started!
Setup
Set up your notebook in this section. You will want to set your conflicts policy, paths, and initiate parallel processing here!
Read in your data
Read in the breast_cancer.csv data file and save as an object called data_all, perform any checks needed on the data (i.e., light cleaning EDA) and set the outcome diagnosis as a factor with malignant as the first level.
data_all <- read_csv(here::here(path_data, "breast_cancer.csv"), col_types = cols()) |>
glimpse()
Rows: 423
Columns: 31
$ diagnosis <chr> "malignant", "benign", "benign", "malignant", …
$ perimeter_se <dbl> 1.9740, 1.6670, 1.4890, 2.9890, 2.6840, 1.4450…
$ fractal_dimension_mean <dbl> 0.05986, 0.06320, 0.05828, 0.06768, 0.05934, 0…
$ concave_points_worst <dbl> 0.12520, 0.11050, 0.03002, 0.20270, 0.06544, 0…
$ symmetry_mean <dbl> 0.1594, 0.1886, 0.1845, 0.2157, 0.1834, 0.1514…
$ texture_se <dbl> 0.3621, 0.7339, 1.6470, 0.9489, 0.8429, 1.0660…
$ concave_points_se <dbl> 0.008260, 0.013040, 0.003419, 0.012710, 0.0091…
$ concavity_mean <dbl> 0.075500, 0.070970, 0.004967, 0.169000, 0.0263…
$ fractal_dimension_se <dbl> 0.002881, 0.001982, 0.002534, 0.003884, 0.0014…
$ radius_worst <dbl> 17.770, 12.640, 12.360, 18.810, 12.970, 14.200…
$ concave_points_mean <dbl> 0.040790, 0.044970, 0.006434, 0.089230, 0.0206…
$ radius_mean <dbl> 15.120, 11.610, 11.220, 14.870, 11.500, 12.620…
$ smoothness_se <dbl> 0.005472, 0.005884, 0.004359, 0.006985, 0.0063…
$ smoothness_worst <dbl> 0.14910, 0.14150, 0.09994, 0.18780, 0.11830, 0…
$ symmetry_se <dbl> 0.01523, 0.01848, 0.01916, 0.01602, 0.02292, 0…
$ radius_se <dbl> 0.2711, 0.2456, 0.2239, 0.4266, 0.3927, 0.2449…
$ concavity_worst <dbl> 0.33270, 0.23020, 0.02318, 0.47040, 0.08105, 0…
$ concavity_se <dbl> 0.020390, 0.026310, 0.003223, 0.030110, 0.0124…
$ compactness_se <dbl> 0.019190, 0.020050, 0.006813, 0.025630, 0.0106…
$ smoothness_mean <dbl> 0.08876, 0.10880, 0.07780, 0.11620, 0.09345, 0…
$ area_se <dbl> 26.440, 15.890, 15.460, 41.180, 26.990, 18.510…
$ area_worst <dbl> 989.5, 475.7, 470.9, 1095.0, 508.9, 624.0, 544…
$ perimeter_mean <dbl> 98.78, 75.46, 70.79, 98.64, 73.28, 81.35, 79.0…
$ compactness_mean <dbl> 0.09588, 0.11680, 0.03574, 0.16490, 0.05991, 0…
$ area_mean <dbl> 716.6, 408.2, 386.8, 682.5, 407.4, 496.4, 466.…
$ fractal_dimension_worst <dbl> 0.09740, 0.07427, 0.07307, 0.10650, 0.06487, 0…
$ texture_mean <dbl> 16.68, 16.02, 33.81, 16.67, 18.45, 23.97, 18.5…
$ perimeter_worst <dbl> 117.70, 81.93, 78.44, 127.10, 83.12, 90.67, 85…
$ symmetry_worst <dbl> 0.3415, 0.2787, 0.2911, 0.3585, 0.2740, 0.2826…
$ texture_worst <dbl> 20.24, 19.67, 41.78, 27.37, 22.46, 31.31, 27.4…
$ compactness_worst <dbl> 0.33310, 0.21700, 0.06885, 0.44800, 0.10490, 0…
data_all |>
skim_some()
| Name | data_all |
| Number of rows | 423 |
| Number of columns | 31 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 30 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| diagnosis | 0 | 1 | 6 | 9 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | p0 | p100 |
|---|---|---|---|---|
| perimeter_se | 0 | 1 | 0.76 | 21.98 |
| fractal_dimension_mean | 0 | 1 | 0.05 | 0.10 |
| concave_points_worst | 0 | 1 | 0.00 | 0.29 |
| symmetry_mean | 0 | 1 | 0.11 | 0.29 |
| texture_se | 0 | 1 | 0.36 | 4.88 |
| concave_points_se | 0 | 1 | 0.00 | 0.05 |
| concavity_mean | 0 | 1 | 0.00 | 0.43 |
| fractal_dimension_se | 0 | 1 | 0.00 | 0.03 |
| radius_worst | 0 | 1 | 7.93 | 33.13 |
| concave_points_mean | 0 | 1 | 0.00 | 0.19 |
| radius_mean | 0 | 1 | 6.98 | 28.11 |
| smoothness_se | 0 | 1 | 0.00 | 0.02 |
| smoothness_worst | 0 | 1 | 0.07 | 0.21 |
| symmetry_se | 0 | 1 | 0.01 | 0.08 |
| radius_se | 0 | 1 | 0.11 | 2.87 |
| concavity_worst | 0 | 1 | 0.00 | 1.25 |
| concavity_se | 0 | 1 | 0.00 | 0.40 |
| compactness_se | 0 | 1 | 0.00 | 0.11 |
| smoothness_mean | 0 | 1 | 0.05 | 0.16 |
| area_se | 0 | 1 | 6.80 | 525.60 |
| area_worst | 0 | 1 | 185.20 | 3432.00 |
| perimeter_mean | 0 | 1 | 43.79 | 188.50 |
| compactness_mean | 0 | 1 | 0.02 | 0.35 |
| area_mean | 0 | 1 | 143.50 | 2499.00 |
| fractal_dimension_worst | 0 | 1 | 0.06 | 0.17 |
| texture_mean | 0 | 1 | 9.71 | 39.28 |
| perimeter_worst | 0 | 1 | 50.41 | 229.30 |
| symmetry_worst | 0 | 1 | 0.16 | 0.66 |
| texture_worst | 0 | 1 | 12.02 | 49.54 |
| compactness_worst | 0 | 1 | 0.03 | 0.87 |
data_all <- data_all |>
mutate(diagnosis = factor(diagnosis, levels = c("malignant", "benign"))) |>
glimpse()
Rows: 423
Columns: 31
$ diagnosis <fct> malignant, benign, benign, malignant, benign, …
$ perimeter_se <dbl> 1.9740, 1.6670, 1.4890, 2.9890, 2.6840, 1.4450…
$ fractal_dimension_mean <dbl> 0.05986, 0.06320, 0.05828, 0.06768, 0.05934, 0…
$ concave_points_worst <dbl> 0.12520, 0.11050, 0.03002, 0.20270, 0.06544, 0…
$ symmetry_mean <dbl> 0.1594, 0.1886, 0.1845, 0.2157, 0.1834, 0.1514…
$ texture_se <dbl> 0.3621, 0.7339, 1.6470, 0.9489, 0.8429, 1.0660…
$ concave_points_se <dbl> 0.008260, 0.013040, 0.003419, 0.012710, 0.0091…
$ concavity_mean <dbl> 0.075500, 0.070970, 0.004967, 0.169000, 0.0263…
$ fractal_dimension_se <dbl> 0.002881, 0.001982, 0.002534, 0.003884, 0.0014…
$ radius_worst <dbl> 17.770, 12.640, 12.360, 18.810, 12.970, 14.200…
$ concave_points_mean <dbl> 0.040790, 0.044970, 0.006434, 0.089230, 0.0206…
$ radius_mean <dbl> 15.120, 11.610, 11.220, 14.870, 11.500, 12.620…
$ smoothness_se <dbl> 0.005472, 0.005884, 0.004359, 0.006985, 0.0063…
$ smoothness_worst <dbl> 0.14910, 0.14150, 0.09994, 0.18780, 0.11830, 0…
$ symmetry_se <dbl> 0.01523, 0.01848, 0.01916, 0.01602, 0.02292, 0…
$ radius_se <dbl> 0.2711, 0.2456, 0.2239, 0.4266, 0.3927, 0.2449…
$ concavity_worst <dbl> 0.33270, 0.23020, 0.02318, 0.47040, 0.08105, 0…
$ concavity_se <dbl> 0.020390, 0.026310, 0.003223, 0.030110, 0.0124…
$ compactness_se <dbl> 0.019190, 0.020050, 0.006813, 0.025630, 0.0106…
$ smoothness_mean <dbl> 0.08876, 0.10880, 0.07780, 0.11620, 0.09345, 0…
$ area_se <dbl> 26.440, 15.890, 15.460, 41.180, 26.990, 18.510…
$ area_worst <dbl> 989.5, 475.7, 470.9, 1095.0, 508.9, 624.0, 544…
$ perimeter_mean <dbl> 98.78, 75.46, 70.79, 98.64, 73.28, 81.35, 79.0…
$ compactness_mean <dbl> 0.09588, 0.11680, 0.03574, 0.16490, 0.05991, 0…
$ area_mean <dbl> 716.6, 408.2, 386.8, 682.5, 407.4, 496.4, 466.…
$ fractal_dimension_worst <dbl> 0.09740, 0.07427, 0.07307, 0.10650, 0.06487, 0…
$ texture_mean <dbl> 16.68, 16.02, 33.81, 16.67, 18.45, 23.97, 18.5…
$ perimeter_worst <dbl> 117.70, 81.93, 78.44, 127.10, 83.12, 90.67, 85…
$ symmetry_worst <dbl> 0.3415, 0.2787, 0.2911, 0.3585, 0.2740, 0.2826…
$ texture_worst <dbl> 20.24, 19.67, 41.78, 27.37, 22.46, 31.31, 27.4…
$ compactness_worst <dbl> 0.33310, 0.21700, 0.06885, 0.44800, 0.10490, 0…
data_all |>
skim_some() |>
skimr::yank("factor")
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| diagnosis | 0 | 1 | FALSE | 2 | ben: 357, mal: 66 |
Print a table to see the balance of positive and negative diagnosis cases.
data_all |>
janitor::tabyl(diagnosis)
 diagnosis   n   percent
malignant 66 0.1560284
benign 357 0.8439716
What do you notice about the distribution of your outcome variable? Do you have any concerns? Write at least 2 sentences.
Our outcome variable is imbalanced (357 benign vs. 66 malignant, roughly 16% positive cases). With such imbalance, overall accuracy can be misleading, and models trained on these data may favor the majority class, yielding lower sensitivity (fewer true positives) for the malignant class; the low base rate also constrains the Positive Predictive Value of our model.
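To see why this imbalance matters, here is a quick base-R illustration (using the counts from the tabyl output above) of a trivial classifier that labels every tumor benign: it achieves high accuracy while catching no malignancies.

```r
# A majority-class ("always benign") classifier on these data:
# high accuracy, zero sensitivity for the malignant class.
n_malignant <- 66
n_benign <- 357
null_accuracy <- n_benign / (n_benign + n_malignant)
null_sensitivity <- 0 / n_malignant  # it never predicts malignant
round(null_accuracy, 3)  # 0.844
```

This is why we select and evaluate models with metrics beyond accuracy in this assignment.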
Split data into train and test
Hold out 1/3 of the data as a test set for evaluation using the initial_split() function. Use the provided seed. Stratify on diagnosis.
set.seed(02182002)
splits_test <- data_all |>
initial_split(prop = 2/3, strata = "diagnosis")
data_train <- splits_test |>
analysis()
data_test <- splits_test |>
assessment()
Light Modeling EDA
Look at correlations between predictors in data_train.
data_train |>
select(where(is.numeric)) |>
cor() |>
corrplot::corrplot(type = "upper")
Visualize the variance of your predictors in data_train.
data_train |>
select(where(is.numeric)) |>
names() |>
map(\(name) plot_box_violin(df = data_train, x = name)) |>
cowplot::plot_grid(plotlist = _, ncol = 5)
Answer the following questions with at least 2 sentences!
Why should you be looking at variance of your predictors?
Looking at the variability of our predictors allows us to examine each variable's distribution and flag potential outlier values. Examining the distributions is especially important here because near-zero-variance predictors carry little predictive information, and heavily skewed predictors can distort distance-based algorithms such as KNN. Outlier values can be problematic for all types of models if they have high influence and change the apparent shape of the relationship between our predictors and the outcome.
If you had concerns about the variance of your predictors, what would you do?
Transformations (e.g., log, Yeo-Johnson) can be applied to reduce skew and normalize the distributional shape of predictors. Options for handling outliers include excluding outlier values that are clearly errors or invalid, retaining outliers that we have no evidence are incorrect, or bringing outlier values in to the fence.
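As a concrete illustration of the last option, here is a minimal base-R sketch (with a made-up vector and the standard 1.5 × IQR Tukey fences) of bringing values in to the fence:

```r
# Cap values at the Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR) so that
# extreme-but-valid cases stay in the data with reduced leverage.
bring_to_fence <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  pmin(pmax(x, q[1] - 1.5 * iqr), q[2] + 1.5 * iqr)
}

bring_to_fence(c(1, 2, 3, 4, 100))  # 1 2 3 4 7
```

In a recipe-based workflow this kind of capping would be applied inside the resampling loop so that the fences are learned from the training data only.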
Do you have concerns about the variance of your predictors in these data?
Almost all variables are strongly positively skewed, and a handful (e.g., concave_points_se, perimeter_se, area_se; essentially all of the _se variables) have a few unusually high values that sit apart from the rest of the distribution and may be outliers.
GLM vs KNN
In this part of this assignment, you will compare the performance of a standard GLM model vs a KNN model (tuned for neighbors) for predicting breast cancer diagnoses from all available predictors. You will choose between these models using bootstrapped resampling, and evaluate the final performance of your model in the held out test set created earlier in this script. You will now select and evaluate models using ROC AUC instead of accuracy.
Bootstrap splits
Split data_train into 100 bootstrap samples stratified on diagnosis. Use the provided seed.
set.seed(02182002)
splits_boot <- data_train |>
bootstraps(times = 100, strata = "diagnosis")
splits_boot |> print(n = 10)
# Bootstrap sampling using stratification
# A tibble: 100 × 2
splits id
<list> <chr>
1 <split [282/101]> Bootstrap001
2 <split [282/105]> Bootstrap002
3 <split [282/94]> Bootstrap003
4 <split [282/104]> Bootstrap004
5 <split [282/112]> Bootstrap005
6 <split [282/105]> Bootstrap006
7 <split [282/94]> Bootstrap007
8 <split [282/105]> Bootstrap008
9 <split [282/95]> Bootstrap009
10 <split [282/107]> Bootstrap010
# ℹ 90 more rows
Build recipes
Write 2 recipes (one for GLM, one for KNN) to predict breast cancer diagnosis from all predictors in data_train. Include the minimal necessary steps for each algorithm, including what you learned from your light EDA above.
rec_glm <- recipe(diagnosis ~ ., data = data_train)
rec_knn <- recipe(diagnosis ~ ., data = data_train) |>
step_range(all_predictors())
Fit GLM
Fit a logistic regression classifier using the recipe you created and your bootstrap splits. Use ROC AUC (roc_auc) as your performance metric.
fit_lr_boot <-
logistic_reg() |>
set_engine("glm") |>
fit_resamples(preprocessor = rec_glm,
resamples = splits_boot,
metrics = metric_set(roc_auc))
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Fitting our bootstrap splits in this way appears to work just fine. However, on closer inspection we see that there were some warnings. First let's look for the warnings by calling the fit object directly:
fit_lr_boot
# Resampling results
# Bootstrap sampling using stratification
# A tibble: 100 × 4
splits id .metrics .notes
<list> <chr> <list> <list>
1 <split [282/101]> Bootstrap001 <tibble [1 × 4]> <tibble [2 × 4]>
2 <split [282/105]> Bootstrap002 <tibble [1 × 4]> <tibble [2 × 4]>
3 <split [282/94]> Bootstrap003 <tibble [1 × 4]> <tibble [2 × 4]>
4 <split [282/104]> Bootstrap004 <tibble [1 × 4]> <tibble [2 × 4]>
5 <split [282/112]> Bootstrap005 <tibble [1 × 4]> <tibble [2 × 4]>
6 <split [282/105]> Bootstrap006 <tibble [1 × 4]> <tibble [2 × 4]>
7 <split [282/94]> Bootstrap007 <tibble [1 × 4]> <tibble [2 × 4]>
8 <split [282/105]> Bootstrap008 <tibble [1 × 4]> <tibble [2 × 4]>
9 <split [282/95]> Bootstrap009 <tibble [1 × 4]> <tibble [2 × 4]>
10 <split [282/107]> Bootstrap010 <tibble [1 × 4]> <tibble [2 × 4]>
11 <split [282/99]> Bootstrap011 <tibble [1 × 4]> <tibble [2 × 4]>
12 <split [282/93]> Bootstrap012 <tibble [1 × 4]> <tibble [2 × 4]>
13 <split [282/98]> Bootstrap013 <tibble [1 × 4]> <tibble [2 × 4]>
14 <split [282/94]> Bootstrap014 <tibble [1 × 4]> <tibble [2 × 4]>
15 <split [282/97]> Bootstrap015 <tibble [1 × 4]> <tibble [2 × 4]>
16 <split [282/104]> Bootstrap016 <tibble [1 × 4]> <tibble [2 × 4]>
17 <split [282/101]> Bootstrap017 <tibble [1 × 4]> <tibble [2 × 4]>
18 <split [282/109]> Bootstrap018 <tibble [1 × 4]> <tibble [2 × 4]>
19 <split [282/100]> Bootstrap019 <tibble [1 × 4]> <tibble [2 × 4]>
20 <split [282/103]> Bootstrap020 <tibble [1 × 4]> <tibble [2 × 4]>
21 <split [282/95]> Bootstrap021 <tibble [1 × 4]> <tibble [2 × 4]>
22 <split [282/105]> Bootstrap022 <tibble [1 × 4]> <tibble [2 × 4]>
23 <split [282/108]> Bootstrap023 <tibble [1 × 4]> <tibble [2 × 4]>
24 <split [282/119]> Bootstrap024 <tibble [1 × 4]> <tibble [2 × 4]>
25 <split [282/110]> Bootstrap025 <tibble [1 × 4]> <tibble [2 × 4]>
26 <split [282/118]> Bootstrap026 <tibble [1 × 4]> <tibble [2 × 4]>
27 <split [282/104]> Bootstrap027 <tibble [1 × 4]> <tibble [2 × 4]>
28 <split [282/106]> Bootstrap028 <tibble [1 × 4]> <tibble [2 × 4]>
29 <split [282/106]> Bootstrap029 <tibble [1 × 4]> <tibble [2 × 4]>
30 <split [282/95]> Bootstrap030 <tibble [1 × 4]> <tibble [2 × 4]>
31 <split [282/105]> Bootstrap031 <tibble [1 × 4]> <tibble [2 × 4]>
32 <split [282/100]> Bootstrap032 <tibble [1 × 4]> <tibble [2 × 4]>
33 <split [282/99]> Bootstrap033 <tibble [1 × 4]> <tibble [2 × 4]>
34 <split [282/105]> Bootstrap034 <tibble [1 × 4]> <tibble [2 × 4]>
35 <split [282/102]> Bootstrap035 <tibble [1 × 4]> <tibble [2 × 4]>
36 <split [282/111]> Bootstrap036 <tibble [1 × 4]> <tibble [2 × 4]>
37 <split [282/96]> Bootstrap037 <tibble [1 × 4]> <tibble [2 × 4]>
38 <split [282/102]> Bootstrap038 <tibble [1 × 4]> <tibble [2 × 4]>
39 <split [282/111]> Bootstrap039 <tibble [1 × 4]> <tibble [2 × 4]>
40 <split [282/112]> Bootstrap040 <tibble [1 × 4]> <tibble [2 × 4]>
41 <split [282/110]> Bootstrap041 <tibble [1 × 4]> <tibble [2 × 4]>
42 <split [282/101]> Bootstrap042 <tibble [1 × 4]> <tibble [2 × 4]>
43 <split [282/101]> Bootstrap043 <tibble [1 × 4]> <tibble [2 × 4]>
44 <split [282/102]> Bootstrap044 <tibble [1 × 4]> <tibble [2 × 4]>
45 <split [282/111]> Bootstrap045 <tibble [1 × 4]> <tibble [2 × 4]>
46 <split [282/111]> Bootstrap046 <tibble [1 × 4]> <tibble [2 × 4]>
47 <split [282/105]> Bootstrap047 <tibble [1 × 4]> <tibble [2 × 4]>
48 <split [282/97]> Bootstrap048 <tibble [1 × 4]> <tibble [2 × 4]>
49 <split [282/103]> Bootstrap049 <tibble [1 × 4]> <tibble [2 × 4]>
50 <split [282/100]> Bootstrap050 <tibble [1 × 4]> <tibble [2 × 4]>
51 <split [282/103]> Bootstrap051 <tibble [1 × 4]> <tibble [2 × 4]>
52 <split [282/103]> Bootstrap052 <tibble [1 × 4]> <tibble [2 × 4]>
53 <split [282/109]> Bootstrap053 <tibble [1 × 4]> <tibble [2 × 4]>
54 <split [282/106]> Bootstrap054 <tibble [1 × 4]> <tibble [2 × 4]>
55 <split [282/107]> Bootstrap055 <tibble [1 × 4]> <tibble [2 × 4]>
56 <split [282/96]> Bootstrap056 <tibble [1 × 4]> <tibble [2 × 4]>
57 <split [282/105]> Bootstrap057 <tibble [1 × 4]> <tibble [2 × 4]>
58 <split [282/105]> Bootstrap058 <tibble [1 × 4]> <tibble [2 × 4]>
59 <split [282/101]> Bootstrap059 <tibble [1 × 4]> <tibble [2 × 4]>
60 <split [282/100]> Bootstrap060 <tibble [1 × 4]> <tibble [2 × 4]>
61 <split [282/96]> Bootstrap061 <tibble [1 × 4]> <tibble [2 × 4]>
62 <split [282/102]> Bootstrap062 <tibble [1 × 4]> <tibble [2 × 4]>
63 <split [282/112]> Bootstrap063 <tibble [1 × 4]> <tibble [2 × 4]>
64 <split [282/115]> Bootstrap064 <tibble [1 × 4]> <tibble [2 × 4]>
65 <split [282/105]> Bootstrap065 <tibble [1 × 4]> <tibble [2 × 4]>
66 <split [282/113]> Bootstrap066 <tibble [1 × 4]> <tibble [2 × 4]>
67 <split [282/107]> Bootstrap067 <tibble [1 × 4]> <tibble [2 × 4]>
68 <split [282/105]> Bootstrap068 <tibble [1 × 4]> <tibble [2 × 4]>
69 <split [282/99]> Bootstrap069 <tibble [1 × 4]> <tibble [2 × 4]>
70 <split [282/102]> Bootstrap070 <tibble [1 × 4]> <tibble [2 × 4]>
71 <split [282/95]> Bootstrap071 <tibble [1 × 4]> <tibble [2 × 4]>
72 <split [282/97]> Bootstrap072 <tibble [1 × 4]> <tibble [2 × 4]>
73 <split [282/108]> Bootstrap073 <tibble [1 × 4]> <tibble [2 × 4]>
74 <split [282/115]> Bootstrap074 <tibble [1 × 4]> <tibble [2 × 4]>
75 <split [282/105]> Bootstrap075 <tibble [1 × 4]> <tibble [2 × 4]>
76 <split [282/99]> Bootstrap076 <tibble [1 × 4]> <tibble [2 × 4]>
77 <split [282/106]> Bootstrap077 <tibble [1 × 4]> <tibble [2 × 4]>
78 <split [282/109]> Bootstrap078 <tibble [1 × 4]> <tibble [2 × 4]>
79 <split [282/109]> Bootstrap079 <tibble [1 × 4]> <tibble [2 × 4]>
80 <split [282/103]> Bootstrap080 <tibble [1 × 4]> <tibble [2 × 4]>
81 <split [282/101]> Bootstrap081 <tibble [1 × 4]> <tibble [2 × 4]>
82 <split [282/105]> Bootstrap082 <tibble [1 × 4]> <tibble [2 × 4]>
83 <split [282/100]> Bootstrap083 <tibble [1 × 4]> <tibble [2 × 4]>
84 <split [282/98]> Bootstrap084 <tibble [1 × 4]> <tibble [2 × 4]>
85 <split [282/100]> Bootstrap085 <tibble [1 × 4]> <tibble [2 × 4]>
86 <split [282/102]> Bootstrap086 <tibble [1 × 4]> <tibble [2 × 4]>
87 <split [282/97]> Bootstrap087 <tibble [1 × 4]> <tibble [2 × 4]>
88 <split [282/110]> Bootstrap088 <tibble [1 × 4]> <tibble [2 × 4]>
89 <split [282/106]> Bootstrap089 <tibble [1 × 4]> <tibble [2 × 4]>
90 <split [282/96]> Bootstrap090 <tibble [1 × 4]> <tibble [2 × 4]>
91 <split [282/111]> Bootstrap091 <tibble [1 × 4]> <tibble [2 × 4]>
92 <split [282/104]> Bootstrap092 <tibble [1 × 4]> <tibble [2 × 4]>
93 <split [282/102]> Bootstrap093 <tibble [1 × 4]> <tibble [2 × 4]>
94 <split [282/97]> Bootstrap094 <tibble [1 × 4]> <tibble [2 × 4]>
95 <split [282/97]> Bootstrap095 <tibble [1 × 4]> <tibble [2 × 4]>
96 <split [282/96]> Bootstrap096 <tibble [1 × 4]> <tibble [2 × 4]>
97 <split [282/101]> Bootstrap097 <tibble [1 × 4]> <tibble [2 × 4]>
98 <split [282/108]> Bootstrap098 <tibble [1 × 4]> <tibble [2 × 4]>
99 <split [282/102]> Bootstrap099 <tibble [1 × 4]> <tibble [2 × 4]>
100 <split [282/111]> Bootstrap100 <tibble [1 × 4]> <tibble [2 × 4]>
There were issues with some computations:
- Warning(s) x100: glm.fit: algorithm did not converge
- Warning(s) x100: glm.fit: fitted probabilities numerically 0 or 1 occurred
Run `show_notes(.Last.tune.result)` for more information.
# This tells us that we should use: show_notes(.Last.tune.result). So let’s run that:
show_notes(.Last.tune.result)
unique notes:
───────────────────────────────────
glm.fit: algorithm did not converge
───────────────────────────────────
glm.fit: fitted probabilities numerically 0 or 1 occurred
In this case, it just recapitulates the same messages presented when printing the object. The bigger point here is: always inspect error and warning messages.
We are getting a warning message about our fitted probabilities. This occurs because the classes are completely separable in the training data (recall this issue from our earlier discussions of logistic regression). The model can still be used for prediction, but we can no longer trust the statistical tests of the parameter estimates. Those tests are not our focus here.
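Here is a toy base-R example (made-up data) of complete separation producing the same warning: the predictor splits the classes perfectly, so the maximum-likelihood slope diverges and the fitted probabilities are pushed to numerically 0 or 1.

```r
# Complete separation: all y = 0 fall below x = 3.5 and all y = 1 above it.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)

# glm() warns that fitted probabilities numerically 0 or 1 occurred;
# maxit is raised (as in the fit below) so the deviance bottoms out.
fit <- suppressWarnings(
  glm(y ~ x, family = binomial, control = list(maxit = 100))
)
round(fitted(fit), 3)  # essentially 0 0 0 1 1 1
```

The coefficient estimates and their standard errors are enormous and meaningless here, just as in the summary() output below, but the predicted classes are still usable.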
We are going to fit this model on the whole sample so we can show you why we get this warning message. When we aren’t bootstrapping, the warning is provided when we fit.
# make feature matrix
rec_prep <- rec_glm |>
prep(data_train)
feat_train_glm <- rec_prep |>
bake(data_train)
# fit on full training sample
fit_best_glm <- logistic_reg() |>
set_engine("glm", control = list(maxit = 100)) |>
fit(diagnosis ~ ., data = feat_train_glm)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
fit_best_glm$fit |> summary()
Call:
stats::glm(formula = diagnosis ~ ., family = stats::binomial,
data = data, control = ~list(maxit = 100))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.094e+02 9.229e+06 0 1
perimeter_se -2.293e+01 7.566e+05 0 1
fractal_dimension_mean 1.167e+03 1.042e+08 0 1
concave_points_worst -4.310e+02 1.090e+07 0 1
symmetry_mean 8.222e+01 1.742e+07 0 1
texture_se -7.060e-01 9.537e+05 0 1
concave_points_se 1.336e+03 1.111e+08 0 1
concavity_mean -5.896e+02 2.130e+07 0 1
fractal_dimension_se 7.799e+03 4.973e+08 0 1
radius_worst 3.012e+01 1.755e+06 0 1
concave_points_mean -4.427e+02 3.349e+07 0 1
radius_mean 1.293e+01 5.222e+06 0 1
smoothness_se 5.648e+03 2.155e+08 0 1
smoothness_worst -6.220e+02 1.873e+07 0 1
symmetry_se 3.178e+02 7.981e+07 0 1
radius_se -2.286e+02 1.283e+07 0 1
concavity_worst -6.620e+01 4.531e+06 0 1
concavity_se 3.150e+02 4.663e+07 0 1
compactness_se -1.106e+03 5.112e+07 0 1
smoothness_mean -2.835e+02 3.690e+07 0 1
area_se 2.950e+00 1.117e+05 0 1
area_worst -5.191e-01 1.212e+04 0 1
perimeter_mean -6.444e+00 7.119e+05 0 1
compactness_mean 7.326e+02 2.785e+07 0 1
area_mean 2.901e-01 1.353e+04 0 1
fractal_dimension_worst -1.641e+03 6.941e+07 0 1
texture_mean -3.705e+00 1.069e+05 0 1
perimeter_worst 2.604e+00 1.195e+05 0 1
symmetry_worst -8.095e+01 1.207e+07 0 1
texture_worst 6.892e-01 1.142e+05 0 1
compactness_worst 2.856e+02 7.002e+06 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2.4423e+02 on 281 degrees of freedom
Residual deviance: 5.8048e-10 on 251 degrees of freedom
AIC: 62
Number of Fisher Scoring iterations: 28
Let’s take a closer look at the predictions our model is making.
# probabilities in train
predict(fit_best_glm, feat_train_glm, type = "prob") |>
print(n = 50)
# A tibble: 282 × 2
.pred_malignant .pred_benign
<dbl> <dbl>
1 2.22e-16 1
2 9.23e-12 1.000
3 2.22e-16 1
4 2.22e-16 1
5 2.22e-16 1
6 2.22e-16 1
7 4.03e-12 1.000
8 2.22e-16 1
9 2.22e-16 1
10 2.22e-16 1
11 2.22e-16 1
12 2.22e-16 1
13 2.74e-11 1.000
14 2.22e-16 1
15 2.22e-16 1
16 2.22e-16 1
17 2.22e-16 1
18 2.22e-16 1
19 2.22e-16 1
20 2.22e-16 1
21 2.22e-16 1
22 2.22e-16 1
23 2.22e-16 1
24 2.22e-16 1
25 2.22e-16 1
26 2.22e-16 1
27 2.22e-16 1
28 2.22e-16 1
29 2.22e-16 1
30 2.22e-16 1
31 2.22e-16 1
32 2.22e-16 1
33 2.22e-16 1
34 2.22e-16 1
35 2.22e-16 1
36 2.22e-16 1
37 2.22e-16 1
38 2.22e-16 1
39 2.22e-16 1
40 2.22e-16 1
41 1.16e-11 1.000
42 2.22e-16 1
43 1.07e-11 1.000
44 2.22e-16 1
45 2.22e-16 1
46 2.22e-16 1
47 2.22e-16 1
48 2.22e-16 1
49 1.16e-11 1.000
50 2.22e-16 1
# ℹ 232 more rows
# confusion matrix for train
cm_trn <- tibble(truth = feat_train_glm$diagnosis,
estimate = predict(fit_best_glm, feat_train_glm)$.pred_class) |>
conf_mat(truth, estimate)
cm_trn
           Truth
Prediction malignant benign
malignant 44 0
benign 0 238
cm_trn |>
summary()
# A tibble: 13 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 1
2 kap binary 1
3 sens binary 1
4 spec binary 1
5 ppv binary 1
6 npv binary 1
7 mcc binary 1
8 j_index binary 1
9 bal_accuracy binary 1
10 detection_prevalence binary 0.156
11 precision binary 1
12 recall binary 1
13 f_meas binary 1
Here we can see the cause of our warning message: the features perfectly predict the classes of the outcome in train (“perfect separation”), as seen in the confusion matrix above. This likely occurs because these features predict the outcome very well even in held-out data (see below), and some overfitting in training nudges very good prediction into perfect prediction. Since we have confirmed that the warning message makes sense here, it is OK to proceed.
Print the average ROC AUC of your logistic regression model.
collect_metrics(fit_lr_boot, summarize = TRUE)
# A tibble: 1 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 roc_auc binary 0.959 100 0.00302 pre0_mod0_post0
You might be wondering why our performance is not perfect across bootstrapped samples (we got the warning for each of the 100 bootstraps above, which means our model always predicted perfectly in its bootstrap training set). For demonstration purposes, we will assess this model in test here as well.
# test feature matrix
feat_test_glm <- rec_prep |>
bake(data_test)
# confusion matrix for test
cm_test <- tibble(truth = feat_test_glm$diagnosis,
estimate = predict(fit_best_glm, feat_test_glm)$.pred_class) |>
conf_mat(truth, estimate)
cm_test
           Truth
Prediction malignant benign
malignant 20 1
benign 2 118
Performance is still quite good here, but given the large number of predictors relative to the modest number of training cases in each bootstrap, our models can fit their bootstrap training sets perfectly every time without matching that performance in the corresponding held-out assessment sets.
You should always check for these sorts of warnings by printing the model fit object.
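The bootstrap geometry also explains why each resample even has an assessment set: sampling n cases with replacement leaves roughly (1 - 1/n)^n ≈ 36.8% of observations out-of-bag, which matches the assessment-set sizes (about 94-119 of 282) printed above. A quick base-R check (arbitrary seed, standalone simulation):

```r
# Average fraction of cases left out-of-bag ("assessment" cases) when
# drawing a bootstrap sample of size n with replacement.
set.seed(123)  # arbitrary seed for reproducibility
n <- 282      # size of the training sample in this assignment
oob_frac <- replicate(
  1000,
  1 - length(unique(sample.int(n, n, replace = TRUE))) / n
)
mean(oob_frac)  # close to (1 - 1/n)^n, about 0.368
```

Those out-of-bag cases are what the resampled ROC AUC is computed on, so the metric reflects held-out rather than training performance.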
Fit KNN
Set up a hyperparameter grid to consider a range of values for neighbors in your KNN models. NOTE: be sure that you are not considering values higher than the N of your training sample! In this dataset the training N = 282.
hyper_grid <- expand.grid(neighbors = seq(1, 280, by = 10))
Fit a KNN model using the recipe you created and your bootstrap splits. Use ROC AUC (roc_auc) as your performance metric.
fit_knn_boot <-
nearest_neighbor(neighbors = tune()) |>
set_engine("kknn") |>
set_mode("classification") |>
tune_grid(preprocessor = rec_knn,
resamples = splits_boot,
grid = hyper_grid,
metrics = metric_set(roc_auc))
Generate a plot to help you determine if you considered a wide enough range of values for neighbors.
plot_hyperparameters(fit_knn_boot, hp1 = "neighbors", metric = "roc_auc")
Print the best value for the neighbors hyperparameter across resamples based on model ROC AUC.
select_best(fit_knn_boot, metric = "roc_auc")
# A tibble: 1 × 2
neighbors .config
<dbl> <chr>
1 161 pre0_mod17_post0
Print the average ROC AUC of your best KNN model.
collect_metrics(fit_knn_boot, summarize = TRUE) |>
filter(neighbors == select_best(fit_knn_boot, metric = "roc_auc")$neighbors)
# A tibble: 1 × 7
neighbors .metric .estimator mean n std_err .config
<dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 161 roc_auc binary 0.995 100 0.000492 pre0_mod17_post0
Select and fit best model
Now you will select your best model configuration among the various KNN and GLM models based on overall ROC AUC and train it on your full training sample.
Create training (feat_train) and test (feat_test) feature matrices using your best recipe (GLM or KNN)
rec_prep <- rec_knn |>
prep(data_train)
feat_train <- rec_prep |>
bake(data_train)
feat_test <- rec_prep |>
bake(data_test)
Fit your best performing model on the full training sample (feat_train).
fit_best_knn <- nearest_neighbor(neighbors = select_best(fit_knn_boot, metric = "roc_auc")$neighbors) |>
set_engine("kknn") |>
set_mode("classification") |>
fit(diagnosis ~ ., data = feat_train)
Evaluate the best model
Make a figure to plot the ROC of your best model in the test set.
roc_plot <-
tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn, feat_test, type = "prob")$.pred_malignant) |>
roc_curve(prob, truth = truth)
roc_plot |>
ggplot(aes(x = 1 - specificity, y = sensitivity, color = .threshold)) +
geom_path() +
geom_abline(lty = 3) +
coord_equal() +
labs(x = "1 - Specificity (FPR)",
y = "Sensitivity (TPR)")
Generate a confusion matrix depicting your model’s performance in test.
cm <- tibble(truth = feat_test$diagnosis,
estimate = predict(fit_best_knn, feat_test)$.pred_class) |>
conf_mat(truth, estimate)
cm
           Truth
Prediction malignant benign
malignant 9 0
benign 13 119
cm |>
summary()
# A tibble: 13 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.908
2 kap binary 0.539
3 sens binary 0.409
4 spec binary 1
5 ppv binary 1
6 npv binary 0.902
7 mcc binary 0.607
8 j_index binary 0.409
9 bal_accuracy binary 0.705
10 detection_prevalence binary 0.0638
11 precision binary 1
12 recall binary 0.409
13 f_meas binary 0.581
Make a plot of your confusion matrix.
autoplot(cm)
Report the ROC AUC, accuracy, sensitivity, specificity, PPV, and NPV of your best model in the held out test set.
# ROC AUC
auc <- tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn, feat_test, type = "prob")$.pred_malignant) |>
roc_auc(prob, truth = truth) |>
pull(.estimate)
# Accuracy, sensitivity, specificity, PPV, NPV
# Using pivot_wider() to make a tracking tibble
results <- cm |>
summary() |>
filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |>
select(-.estimator) |>
pivot_wider(names_from = .metric, values_from = .estimate) |>
mutate(model = str_c("knn_", fit_best_knn$fit$best.parameters$k),
roc_auc = auc) |>
select(model, roc_auc, everything())
results
# A tibble: 1 × 7
model roc_auc accuracy sens spec ppv npv
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161 0.983 0.908 0.409 1 1 0.902
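A useful way to interpret the ROC AUC reported above: it is the probability that a randomly chosen malignant case receives a higher predicted probability than a randomly chosen benign case. Here is a minimal base-R sketch of that concordance definition, using made-up scores (not the model output above):

```r
# AUC as concordance: the fraction of (positive, negative) pairs where
# the positive case gets the higher score (ties count as 1/2).
auc_concordance <- function(score, label) {  # label: 1 = positive class
  pos <- score[label == 1]
  neg <- score[label == 0]
  mean(outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n)))
}

score <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)
label <- c(1,   1,   1,   0,   0,   0)
auc_concordance(score, label)  # 8 of 9 pairs concordant: 0.889
```

This is also why AUC can be high (0.983 above) while sensitivity at the default 0.5 threshold is poor: the ranking of cases is good, but the threshold is not where we need it.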
Part 2: Addressing class imbalance
Since only about 15% of our cases are malignant, let’s see if we can achieve higher sensitivity by up-sampling our data with SMOTE. We will again select between a standard GLM and a tuned KNN using bootstrap resampling and evaluate our best model in test.
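For intuition, SMOTE does not simply duplicate minority cases: each synthetic case is placed a random fraction of the way between a minority case and one of its nearest minority-class neighbors. Here is a hand-rolled sketch of that single interpolation step (made-up feature values; a simplification, not the themis::step_smote() implementation):

```r
# One SMOTE interpolation: synthetic = case + u * (neighbor - case), u ~ U(0, 1),
# so the synthetic point always lies on the segment between the two cases.
set.seed(456)  # arbitrary seed
case     <- c(radius_mean = 15.1, texture_mean = 16.7)  # a minority-class case
neighbor <- c(radius_mean = 14.9, texture_mean = 18.5)  # a minority neighbor
u <- runif(1)
synthetic <- case + u * (neighbor - case)
synthetic
```

Because the step runs inside the recipe, the up-sampling happens within each bootstrap training set only; assessment sets keep their original class balance.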
Build recipes
Update your previous recipes to up-sample the minority class (malignant) in diagnosis using step_smote(). Remember to make 2 recipes (one for GLM, one for KNN).
rec_glm_up <- recipe(diagnosis ~ ., data = data_train) |>
themis::step_smote(diagnosis, over_ratio = 1, neighbors = 5)
rec_knn_up <- recipe(diagnosis ~ ., data = data_train) |>
themis::step_smote(diagnosis, over_ratio = 1, neighbors = 5) |>
step_range(all_predictors())

Fit GLM
Fit an up-sampled logistic regression classifier using the new GLM recipe you created and your bootstrap splits. Use ROC AUC as your performance metric.
fit_lr_boot_up <-
logistic_reg() |>
set_engine("glm", control = list(maxit = 100)) |>
fit_resamples(preprocessor = rec_glm_up,
resamples = splits_boot,
metrics = metric_set(roc_auc))

Print the average ROC AUC of your logistic regression model.
collect_metrics(fit_lr_boot_up, summarize = TRUE)
# A tibble: 1 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 roc_auc binary 0.942 100 0.00351 pre0_mod0_post0
Fit KNN
Set up a hyperparameter grid to consider a range of values for neighbors in your KNN models.
hyper_grid <- expand.grid(neighbors = seq(1, 280, by = 10))

Fit an up-sampled KNN using the new KNN recipe you created and your bootstrap splits. Use ROC AUC as your performance metric.
fit_knn_boot_up <-
nearest_neighbor(neighbors = tune()) |>
set_engine("kknn") |>
set_mode("classification") |>
tune_grid(preprocessor = rec_knn_up,
resamples = splits_boot,
grid = hyper_grid,
metrics = metric_set(roc_auc))Generate a plot to help you determine if you considered a wide enough range of values for neighbors.
plot_hyperparameters(fit_knn_boot_up, hp1 = "neighbors", metric = "roc_auc")

Print the best value for the neighbors hyperparameter across resamples based on model ROC AUC.
select_best(fit_knn_boot_up, metric = "roc_auc")
# A tibble: 1 × 2
neighbors .config
<dbl> <chr>
1 151 pre0_mod16_post0
Print the average ROC AUC of your best KNN model.
collect_metrics(fit_knn_boot_up, summarize = TRUE) |>
filter(neighbors == select_best(fit_knn_boot_up, metric = "roc_auc")$neighbors)
# A tibble: 1 × 7
neighbors .metric .estimator mean n std_err .config
<dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 151 roc_auc binary 0.994 100 0.000599 pre0_mod16_post0
Select and fit the best model
Create the up-sampled training feature matrix using your best recipe (GLM or KNN). Remember, do not up-sample your test data!
rec_prep <- rec_knn_up |>
prep(data_train)
feat_train_up <- rec_prep |>
bake(new_data = NULL)

Fit your best performing up-sampled model on the full training sample.
fit_best_knn_up <- nearest_neighbor(neighbors = select_best(fit_knn_boot_up, metric = "roc_auc")$neighbors) |>
set_engine("kknn") |>
set_mode("classification") |>
fit(diagnosis ~ ., data = feat_train_up)

Evaluate the best model
Make a figure to plot the ROC of your best up-sampled model in the test set.
roc_plot_up <-
tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |>
roc_curve(prob, truth = truth)
roc_plot_up |>
ggplot(aes(x = 1 - specificity, y = sensitivity, color = .threshold)) +
geom_path() +
geom_abline(lty = 3) +
coord_equal() +
labs(x = "1 - Specificity (FPR)",
y = "Sensitivity (TPR)")

Generate a confusion matrix depicting your up-sampled model’s performance in test.
cm_up <- tibble(truth = feat_test$diagnosis,
estimate = predict(fit_best_knn_up, feat_test)$.pred_class) |>
conf_mat(truth, estimate)
cm_up
          Truth
Prediction malignant benign
malignant 19 5
benign 3 114
Make a plot of your confusion matrix.
autoplot(cm_up)

Report the ROC AUC, accuracy, sensitivity, specificity, PPV, and NPV of your best up-sampled model in the held-out test set.
# ROC AUC
auc <- tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |>
roc_auc(prob, truth = truth) |>
pull(.estimate)
# Accuracy, sensitivity, specificity, PPV, NPV
results_up <- cm_up |>
summary() |>
filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |>
select(-.estimator) |>
pivot_wider(names_from = .metric, values_from = .estimate) |>
mutate(model = str_c("knn_smote_", fit_best_knn_up$fit$best.parameters$k),
roc_auc = auc) |>
select(model, roc_auc, everything())
results <- results |>
bind_rows(results_up)
results
# A tibble: 2 × 7
model roc_auc accuracy sens spec ppv npv
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161 0.983 0.908 0.409 1 1 0.902
2 knn_smote_151 0.987 0.943 0.864 0.958 0.792 0.974
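One way to summarize the sensitivity/specificity trade-off in a single number is Youden’s J (sensitivity + specificity − 1), which the conf_mat summary reports as j_index. Computing it from the sensitivity and specificity values in the table above:

```r
# Youden's J = sensitivity + specificity - 1
j_knn   <- 0.409 + 1.000 - 1   # original KNN
j_smote <- 0.864 + 0.958 - 1   # SMOTE up-sampled KNN
round(c(knn = j_knn, knn_smote = j_smote), 3)  # 0.409 vs 0.822: SMOTE wins
```

By this metric, the small loss in specificity is far outweighed by the large gain in sensitivity.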
Part 3: New Classification Threshold
Now you want to check whether there may be an additional benefit to your model’s performance if you adjust the classification threshold from its default of 50% to 40%.
1) Adjust classification threshold to 40%
Make a tibble containing the following variables:
- truth: The true values of diagnosis in your test set
- prob: The predicted probabilities made by your best up-sampled model above in the test set
- estimate_40: Binary predictions of diagnosis (benign vs malignant) created by applying a threshold of 40% to your best model’s predicted probabilities
preds <- tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |>
mutate(estimate_40 = if_else(prob > .40, "malignant", "benign"),
estimate_40 = factor(estimate_40, levels = c("malignant", "benign")))
preds |> print(n = 10)
# A tibble: 141 × 3
truth prob estimate_40
<fct> <dbl> <fct>
1 malignant 0.447 malignant
2 benign 0.0215 benign
3 benign 0.00473 benign
4 benign 0 benign
5 malignant 0.998 malignant
6 benign 0.0252 benign
7 benign 0.149 benign
8 benign 0.00497 benign
9 benign 0.224 benign
10 benign 0.00995 benign
# ℹ 131 more rows
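If you expect to try several thresholds, this if_else() pattern can be wrapped in a small helper function (a hypothetical convenience for illustration, not part of the assignment or any package):

```r
library(tidyverse)

# Hypothetical helper: convert event probabilities to class labels at a
# chosen threshold, with the event ("malignant") as the first factor level
class_at_threshold <- function(prob, threshold,
                               levels = c("malignant", "benign")) {
  factor(if_else(prob > threshold, levels[1], levels[2]), levels = levels)
}

# Example with made-up probabilities
probs <- c(.95, .45, .35, .05)
class_at_threshold(probs, .40)  # malignant malignant benign benign
class_at_threshold(probs, .30)  # malignant malignant malignant benign
```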
2) Evaluate model at new threshold
Generate a confusion matrix depicting your up-sampled model’s performance in test at your new threshold.
cm_40 <- preds |>
conf_mat(truth = truth, estimate = estimate_40)
cm_40
          Truth
Prediction malignant benign
malignant 21 7
benign 1 112
Make a plot of your confusion matrix.
autoplot(cm_40)

Report the ROC AUC, accuracy, sensitivity, specificity, PPV, and NPV of your best up-sampled model in the held-out test set at the new threshold.
# ROC AUC
auc <- tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |>
roc_auc(prob, truth = truth) |>
pull(.estimate)
# Accuracy, sensitivity, specificity, PPV, NPV
results_40 <- cm_40 |>
summary() |>
filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |>
select(-.estimator) |>
pivot_wider(names_from = .metric, values_from = .estimate) |>
mutate(model = str_c("knn_smote_", fit_best_knn_up$fit$best.parameters$k, "_thresh_40"),
roc_auc = auc) |>
select(model, roc_auc, everything())
results <- results |>
bind_rows(results_40)
results
# A tibble: 3 × 7
model roc_auc accuracy sens spec ppv npv
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161 0.983 0.908 0.409 1 1 0.902
2 knn_smote_151 0.987 0.943 0.864 0.958 0.792 0.974
3 knn_smote_151_thresh_40 0.987 0.943 0.955 0.941 0.75 0.991
Adjusting the threshold to 40% increased sensitivity from .864 to .955, capturing two additional malignant cases, at the cost of a small drop in specificity and PPV (two additional benign cases labeled as malignant). If we look at the histograms below we can see why. The black line marks the default threshold of .5, which already separates the classes fairly well; the red line at .4 sweeps in two more true positives along with two more false positives.
ggplot(data = preds, aes(x = prob)) +
geom_histogram(bins = 15) +
facet_wrap(vars(truth), nrow = 2) +
xlab("Pr(Disease)") +
geom_vline(xintercept = .5) +
geom_vline(xintercept = .4, color = "red")

Let’s see what would happen if we changed the threshold to .3.
ggplot(data = preds, aes(x = prob)) +
geom_histogram(bins = 15) +
facet_wrap(vars(truth), nrow = 2) +
xlab("Pr(Disease)") +
geom_vline(xintercept = .3, color = "red")

preds <- tibble(truth = feat_test$diagnosis,
prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |>
mutate(estimate_30 = if_else(prob > .30, "malignant", "benign"),
estimate_30 = factor(estimate_30, levels = c("malignant", "benign")))
cm_30 <- preds |>
conf_mat(truth = truth, estimate = estimate_30)
results_30 <- cm_30 |>
summary() |>
filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |>
select(-.estimator) |>
pivot_wider(names_from = .metric, values_from = .estimate) |>
mutate(model = str_c("knn_smote_", fit_best_knn_up$fit$best.parameters$k, "_thresh_30"),
roc_auc = auc) |>
select(model, roc_auc, everything())
results <- results |>
bind_rows(results_30)
results
# A tibble: 4 × 7
model roc_auc accuracy sens spec ppv npv
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161 0.983 0.908 0.409 1 1 0.902
2 knn_smote_151 0.987 0.943 0.864 0.958 0.792 0.974
3 knn_smote_151_thresh_40 0.987 0.943 0.955 0.941 0.75 0.991
4 knn_smote_151_thresh_30 0.987 0.922 0.955 0.916 0.677 0.991
Lowering the threshold to .30 did not raise sensitivity any further (it stays at .955; the one remaining malignant case has a predicted probability below .30), but it did lower specificity and PPV. Even so, favoring sensitivity may be appropriate for this problem, since falsely labeling benign cases as malignant is far less costly than missing malignant cases!
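Rather than trying thresholds one at a time, note that roc_curve() already tabulates sensitivity and specificity at every threshold, so a target threshold can be read off directly. A self-contained toy sketch (made-up probabilities, not the assignment data) finds the highest threshold that classifies every malignant case correctly:

```r
library(tidyverse)
library(yardstick)

# Toy data: 2 malignant and 3 benign cases with made-up probabilities
toy <- tibble(
  truth = factor(c("malignant", "malignant", "benign", "benign", "benign"),
                 levels = c("malignant", "benign")),
  prob  = c(.8, .3, .6, .2, .1))

# Highest threshold with sensitivity = 1, and the specificity it costs
roc_curve(toy, truth = truth, prob) |>
  filter(sensitivity == 1) |>
  slice_max(.threshold, n = 1)
```

The same filter applied to roc_plot_up (our model's ROC curve tibble above) would identify the threshold needed to catch the one malignant case our model still misses, and the specificity we would pay for it.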
✭✭✭ You are a machine learning superstar ✭✭✭