Homework Unit 8: Advanced Performance Metrics

Author

TA

Published

March 5, 2026

Introduction

To begin, download the following from the course web book (Unit 8):

  • hw_unit_8_performance.qmd (notebook for this assignment)

  • breast_cancer.csv (data for this assignment)

The data for this week’s assignment contain information about breast cancer diagnoses: characteristics of different breast cancer tumors, with each tumor classified as benign or malignant. Your goal is to choose between two candidate statistical algorithms (a standard GLM vs. a tuned KNN model) to identify and evaluate the best-performing model for diagnosis.

You can imagine that the consequences of missing a cancerous tumor are not the same as the consequences of misdiagnosing a benign one. In this assignment, we will explore how the choice of performance metric and the balance of diagnoses affect our evaluation of the best-performing model in these data.

NOTE: Fitting models in this assignment will generate some warnings having to do with glm.fit. This is to be expected, and we are going to review these warnings and some related issues in our next lab.

Let’s get started!


Setup

Set up your notebook in this section. You will want to set your conflicts policy, paths, and initiate parallel processing here!

# Necessary Setup
library(tidyverse) 
library(tidymodels)

options(conflicts.policy = "depends.ok")

# John's Functions/Scripts
source("https://github.com/jjcurtin/lab_support/blob/main/fun_ml.R?raw=true")
source("https://github.com/jjcurtin/lab_support/blob/main/fun_plots.R?raw=true")
source("https://github.com/jjcurtin/lab_support/blob/main/fun_eda.R?raw=true")

# Additional Packages
conflictRules("Matrix", mask.ok = c("expand", "pack", "unpack"))
library(future)
library(xfun, include.only = "cache_rds")

# Additional Configurations
theme_set(theme_bw())
options(tibble.width = Inf, dplyr.print_max=Inf)

# Paths
path_data <- "application_assignments/unit_08"

# Parallel Processing
plan(multisession, workers = (parallel::detectCores(logical = FALSE) -1))

Read in your data

Read in the breast_cancer.csv data file and save as an object called data_all, perform any checks needed on the data (i.e., light cleaning EDA) and set the outcome diagnosis as a factor with malignant as the first level.

data_all <- read_csv(here::here(path_data, "breast_cancer.csv"), col_types = cols()) |> 
  glimpse()
Rows: 423
Columns: 31
$ diagnosis               <chr> "malignant", "benign", "benign", "malignant", …
$ perimeter_se            <dbl> 1.9740, 1.6670, 1.4890, 2.9890, 2.6840, 1.4450…
$ fractal_dimension_mean  <dbl> 0.05986, 0.06320, 0.05828, 0.06768, 0.05934, 0…
$ concave_points_worst    <dbl> 0.12520, 0.11050, 0.03002, 0.20270, 0.06544, 0…
$ symmetry_mean           <dbl> 0.1594, 0.1886, 0.1845, 0.2157, 0.1834, 0.1514…
$ texture_se              <dbl> 0.3621, 0.7339, 1.6470, 0.9489, 0.8429, 1.0660…
$ concave_points_se       <dbl> 0.008260, 0.013040, 0.003419, 0.012710, 0.0091…
$ concavity_mean          <dbl> 0.075500, 0.070970, 0.004967, 0.169000, 0.0263…
$ fractal_dimension_se    <dbl> 0.002881, 0.001982, 0.002534, 0.003884, 0.0014…
$ radius_worst            <dbl> 17.770, 12.640, 12.360, 18.810, 12.970, 14.200…
$ concave_points_mean     <dbl> 0.040790, 0.044970, 0.006434, 0.089230, 0.0206…
$ radius_mean             <dbl> 15.120, 11.610, 11.220, 14.870, 11.500, 12.620…
$ smoothness_se           <dbl> 0.005472, 0.005884, 0.004359, 0.006985, 0.0063…
$ smoothness_worst        <dbl> 0.14910, 0.14150, 0.09994, 0.18780, 0.11830, 0…
$ symmetry_se             <dbl> 0.01523, 0.01848, 0.01916, 0.01602, 0.02292, 0…
$ radius_se               <dbl> 0.2711, 0.2456, 0.2239, 0.4266, 0.3927, 0.2449…
$ concavity_worst         <dbl> 0.33270, 0.23020, 0.02318, 0.47040, 0.08105, 0…
$ concavity_se            <dbl> 0.020390, 0.026310, 0.003223, 0.030110, 0.0124…
$ compactness_se          <dbl> 0.019190, 0.020050, 0.006813, 0.025630, 0.0106…
$ smoothness_mean         <dbl> 0.08876, 0.10880, 0.07780, 0.11620, 0.09345, 0…
$ area_se                 <dbl> 26.440, 15.890, 15.460, 41.180, 26.990, 18.510…
$ area_worst              <dbl> 989.5, 475.7, 470.9, 1095.0, 508.9, 624.0, 544…
$ perimeter_mean          <dbl> 98.78, 75.46, 70.79, 98.64, 73.28, 81.35, 79.0…
$ compactness_mean        <dbl> 0.09588, 0.11680, 0.03574, 0.16490, 0.05991, 0…
$ area_mean               <dbl> 716.6, 408.2, 386.8, 682.5, 407.4, 496.4, 466.…
$ fractal_dimension_worst <dbl> 0.09740, 0.07427, 0.07307, 0.10650, 0.06487, 0…
$ texture_mean            <dbl> 16.68, 16.02, 33.81, 16.67, 18.45, 23.97, 18.5…
$ perimeter_worst         <dbl> 117.70, 81.93, 78.44, 127.10, 83.12, 90.67, 85…
$ symmetry_worst          <dbl> 0.3415, 0.2787, 0.2911, 0.3585, 0.2740, 0.2826…
$ texture_worst           <dbl> 20.24, 19.67, 41.78, 27.37, 22.46, 31.31, 27.4…
$ compactness_worst       <dbl> 0.33310, 0.21700, 0.06885, 0.44800, 0.10490, 0…
data_all |> 
  skim_some()
Data summary
Name data_all
Number of rows 423
Number of columns 31
_______________________
Column type frequency:
character 1
numeric 30
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
diagnosis 0 1 6 9 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate p0 p100
perimeter_se 0 1 0.76 21.98
fractal_dimension_mean 0 1 0.05 0.10
concave_points_worst 0 1 0.00 0.29
symmetry_mean 0 1 0.11 0.29
texture_se 0 1 0.36 4.88
concave_points_se 0 1 0.00 0.05
concavity_mean 0 1 0.00 0.43
fractal_dimension_se 0 1 0.00 0.03
radius_worst 0 1 7.93 33.13
concave_points_mean 0 1 0.00 0.19
radius_mean 0 1 6.98 28.11
smoothness_se 0 1 0.00 0.02
smoothness_worst 0 1 0.07 0.21
symmetry_se 0 1 0.01 0.08
radius_se 0 1 0.11 2.87
concavity_worst 0 1 0.00 1.25
concavity_se 0 1 0.00 0.40
compactness_se 0 1 0.00 0.11
smoothness_mean 0 1 0.05 0.16
area_se 0 1 6.80 525.60
area_worst 0 1 185.20 3432.00
perimeter_mean 0 1 43.79 188.50
compactness_mean 0 1 0.02 0.35
area_mean 0 1 143.50 2499.00
fractal_dimension_worst 0 1 0.06 0.17
texture_mean 0 1 9.71 39.28
perimeter_worst 0 1 50.41 229.30
symmetry_worst 0 1 0.16 0.66
texture_worst 0 1 12.02 49.54
compactness_worst 0 1 0.03 0.87
data_all <- data_all |> 
  mutate(diagnosis = factor(diagnosis, levels = c("malignant", "benign"))) |> 
  glimpse()
Rows: 423
Columns: 31
$ diagnosis               <fct> malignant, benign, benign, malignant, benign, …
$ perimeter_se            <dbl> 1.9740, 1.6670, 1.4890, 2.9890, 2.6840, 1.4450…
$ fractal_dimension_mean  <dbl> 0.05986, 0.06320, 0.05828, 0.06768, 0.05934, 0…
$ concave_points_worst    <dbl> 0.12520, 0.11050, 0.03002, 0.20270, 0.06544, 0…
$ symmetry_mean           <dbl> 0.1594, 0.1886, 0.1845, 0.2157, 0.1834, 0.1514…
$ texture_se              <dbl> 0.3621, 0.7339, 1.6470, 0.9489, 0.8429, 1.0660…
$ concave_points_se       <dbl> 0.008260, 0.013040, 0.003419, 0.012710, 0.0091…
$ concavity_mean          <dbl> 0.075500, 0.070970, 0.004967, 0.169000, 0.0263…
$ fractal_dimension_se    <dbl> 0.002881, 0.001982, 0.002534, 0.003884, 0.0014…
$ radius_worst            <dbl> 17.770, 12.640, 12.360, 18.810, 12.970, 14.200…
$ concave_points_mean     <dbl> 0.040790, 0.044970, 0.006434, 0.089230, 0.0206…
$ radius_mean             <dbl> 15.120, 11.610, 11.220, 14.870, 11.500, 12.620…
$ smoothness_se           <dbl> 0.005472, 0.005884, 0.004359, 0.006985, 0.0063…
$ smoothness_worst        <dbl> 0.14910, 0.14150, 0.09994, 0.18780, 0.11830, 0…
$ symmetry_se             <dbl> 0.01523, 0.01848, 0.01916, 0.01602, 0.02292, 0…
$ radius_se               <dbl> 0.2711, 0.2456, 0.2239, 0.4266, 0.3927, 0.2449…
$ concavity_worst         <dbl> 0.33270, 0.23020, 0.02318, 0.47040, 0.08105, 0…
$ concavity_se            <dbl> 0.020390, 0.026310, 0.003223, 0.030110, 0.0124…
$ compactness_se          <dbl> 0.019190, 0.020050, 0.006813, 0.025630, 0.0106…
$ smoothness_mean         <dbl> 0.08876, 0.10880, 0.07780, 0.11620, 0.09345, 0…
$ area_se                 <dbl> 26.440, 15.890, 15.460, 41.180, 26.990, 18.510…
$ area_worst              <dbl> 989.5, 475.7, 470.9, 1095.0, 508.9, 624.0, 544…
$ perimeter_mean          <dbl> 98.78, 75.46, 70.79, 98.64, 73.28, 81.35, 79.0…
$ compactness_mean        <dbl> 0.09588, 0.11680, 0.03574, 0.16490, 0.05991, 0…
$ area_mean               <dbl> 716.6, 408.2, 386.8, 682.5, 407.4, 496.4, 466.…
$ fractal_dimension_worst <dbl> 0.09740, 0.07427, 0.07307, 0.10650, 0.06487, 0…
$ texture_mean            <dbl> 16.68, 16.02, 33.81, 16.67, 18.45, 23.97, 18.5…
$ perimeter_worst         <dbl> 117.70, 81.93, 78.44, 127.10, 83.12, 90.67, 85…
$ symmetry_worst          <dbl> 0.3415, 0.2787, 0.2911, 0.3585, 0.2740, 0.2826…
$ texture_worst           <dbl> 20.24, 19.67, 41.78, 27.37, 22.46, 31.31, 27.4…
$ compactness_worst       <dbl> 0.33310, 0.21700, 0.06885, 0.44800, 0.10490, 0…
data_all |> 
  skim_some() |> 
  skimr::yank("factor")

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
diagnosis 0 1 FALSE 2 ben: 357, mal: 66

Print a table to see the balance of positive and negative diagnosis cases.

data_all |> 
  janitor::tabyl(diagnosis)
 diagnosis   n   percent
 malignant  66 0.1560284
    benign 357 0.8439716

What do you notice about the distribution of your outcome variable? Do you have any concerns? Write at least 2 sentences.

Our outcome variable is imbalanced (357 benign vs. 66 malignant). Imbalance makes accuracy misleading, because a model can look accurate simply by favoring the majority (benign) class, and it can also depress sensitivity (the rate of true positives) and destabilize the positive predictive value of our model.
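
To make this concrete, here is a small base-R sketch (using only the class counts from the table above) showing that a degenerate classifier that always predicts benign still looks quite accurate:

```r
# Class counts from the tabyl() output above
n_malignant <- 66
n_benign    <- 357

# A degenerate "always predict benign" classifier
baseline_accuracy <- n_benign / (n_malignant + n_benign)
round(baseline_accuracy, 3)  # 0.844, despite detecting zero cancers

# Its sensitivity (true positive rate for malignant) is 0
baseline_sensitivity <- 0 / n_malignant
```

This is why we will inspect metrics beyond raw accuracy (e.g., ROC AUC and the confusion matrix) later in this assignment.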

Split data into train and test

Hold out 1/3 of the data as a test set for evaluation using the initial_split() function. Use the provided seed. Stratify on diagnosis.

set.seed(02182002)

splits_test <- data_all |>  
  initial_split(prop = 2/3, strata = "diagnosis")

data_train <- splits_test |> 
  analysis()

data_test <- splits_test |> 
  assessment()

Light Modeling EDA

Look at correlations between predictors in data_train.

data_train |> 
  select(where(is.numeric)) |> 
  cor() |> 
  corrplot::corrplot(type = "upper")

Visualize the variance of your predictors in data_train.

data_train |> 
  select(where(is.numeric)) |> 
  names() |> 
  map(\(name) plot_box_violin(df = data_train, x = name)) |> 
  cowplot::plot_grid(plotlist = _, ncol = 5)

Answer the following questions with at least 2 sentences!

Why should you be looking at variance of your predictors?

Looking at the variability of our predictors allows us to examine each variable’s distribution and flag potential outlier values. Strong skew matters here because we are considering fitting a linear (GLM) model, where a few extreme predictor values can exert high leverage on the fitted coefficients. Outlier values can be problematic for all types of models if they have high influence and distort the shape of the relationship between our predictors and the outcome.

If you had concerns about the variance of your predictors, what would you do?

Transformations (e.g., log, Yeo-Johnson) can be applied to normalize the distributional shape of predictors. Options for handling outliers include excluding values that are clearly errors or invalid, retaining outliers that we have no evidence are incorrect, or bringing outlier values in to the fence (winsorizing).
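
For illustration only (we do not use it below), such a transformation can be added as a recipe step. This is a minimal sketch with a hypothetical name (`rec_demo`) and a tiny stand-in data frame, assuming the standard recipes steps:

```r
library(recipes)

# Tiny stand-in data frame (illustrative values, not the real data)
demo <- data.frame(diagnosis = factor(c("malignant", "benign")),
                   area_mean = c(716.6, 408.2))

# step_YeoJohnson() reduces skew and, unlike a plain log,
# handles zero and negative values
rec_demo <- recipe(diagnosis ~ ., data = demo) |>
  step_YeoJohnson(all_numeric_predictors())
```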

Do you have concerns about the variance of your predictors in these data?

Almost all variables are strongly positively skewed, and a handful (e.g., concave_points_se, perimeter_se, area_se; essentially all of the _se variables) have a few points with unusually high values that appear separated from the rest of the distribution and may be outliers.

GLM vs KNN

In this part of the assignment, you will compare the performance of a standard GLM vs. a KNN model (tuned on neighbors) for predicting breast cancer diagnoses from all available predictors. You will choose between these models using bootstrap resampling, and then evaluate the final performance of your selected model in the held-out test set created earlier in this script. You will now select and evaluate models using ROC AUC instead of accuracy.
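
As a reminder of what ROC AUC measures: it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties counted as one half). A small base-R sketch with made-up scores (not from our models):

```r
# Concordance formulation of AUC (toy scores for illustration)
auc_concordance <- function(scores_pos, scores_neg) {
  wins <- outer(scores_pos, scores_neg, ">")   # positive scored higher
  ties <- outer(scores_pos, scores_neg, "==")  # tied scores count 0.5
  mean(wins + 0.5 * ties)
}

auc_concordance(scores_pos = c(0.9, 0.8, 0.6),
                scores_neg = c(0.7, 0.4, 0.3))  # 8 of 9 pairs concordant: ~0.889
```

Because AUC depends only on the ranking of scores, it is threshold-free and less distorted by class imbalance than accuracy.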

Bootstrap splits

Split data_train into 100 bootstrap samples stratified on diagnosis. Use the provided seed.

set.seed(02182002)

splits_boot <- data_train |> 
  bootstraps(times = 100, strata = "diagnosis") 

splits_boot |> print(n=10)
# Bootstrap sampling using stratification 
# A tibble: 100 × 2
   splits            id          
   <list>            <chr>       
 1 <split [282/101]> Bootstrap001
 2 <split [282/105]> Bootstrap002
 3 <split [282/94]>  Bootstrap003
 4 <split [282/104]> Bootstrap004
 5 <split [282/112]> Bootstrap005
 6 <split [282/105]> Bootstrap006
 7 <split [282/94]>  Bootstrap007
 8 <split [282/105]> Bootstrap008
 9 <split [282/95]>  Bootstrap009
10 <split [282/107]> Bootstrap010
# ℹ 90 more rows
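
The assessment-set sizes printed above (roughly 94–119 of 282) are the out-of-bag cases: each bootstrap draws 282 rows with replacement, which leaves out about 1/e ≈ 36.8% of rows on average. A quick base-R simulation (seed and rep count are arbitrary choices here):

```r
# Expected out-of-bag fraction for a bootstrap of n rows: (1 - 1/n)^n -> 1/e
set.seed(123)
n <- 282
oob_counts <- replicate(1000, n - length(unique(sample.int(n, replace = TRUE))))
mean(oob_counts) / n  # close to exp(-1) ~= 0.368, i.e., ~104 rows held out
```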

Build recipes

Write 2 recipes (one for GLM, one for KNN) to predict breast cancer diagnosis from all predictors in data_train. Include the minimal necessary steps for each algorithm, including what you learned from your light EDA above.

rec_glm <- recipe(diagnosis ~ ., data = data_train)

rec_knn <- recipe(diagnosis ~ ., data = data_train) |> 
  step_range(all_predictors())
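
The `step_range()` in the KNN recipe matters because KNN is distance-based: without rescaling, Euclidean distance is dominated by large-magnitude features. A small sketch using two values on the scale of `area_worst` and `smoothness_se` (numbers illustrative, taken loosely from the glimpse above):

```r
# Two hypothetical tumors described by one large- and one tiny-scale feature
a <- c(area_worst = 989.5, smoothness_se = 0.0055)
b <- c(area_worst = 475.7, smoothness_se = 0.0059)

dist_raw <- sqrt(sum((a - b)^2))
dist_raw  # ~513.8: effectively only area_worst contributes

# After range-scaling each feature to [0, 1], both can contribute to distance
```

The GLM, by contrast, estimates a coefficient per predictor, so its fit does not require predictors on a common scale.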

Fit GLM

Fit a logistic regression classifier using the recipe you created and your bootstrap splits. Use ROC AUC (roc_auc) as your performance metric.

fit_lr_boot <-
  logistic_reg() |> 
  set_engine("glm") |> 
  fit_resamples(preprocessor = rec_glm, 
                resamples = splits_boot, 
                metrics = metric_set(roc_auc))
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
→ A | warning: glm.fit: algorithm did not converge
→ B | warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Fitting our bootstrap splits in this way appears to work just fine. However, on closer inspection we see that there were some warnings. First let's look for the warnings by printing the fit object directly:

fit_lr_boot
# Resampling results
# Bootstrap sampling using stratification 
# A tibble: 100 × 4
    splits            id           .metrics         .notes          
    <list>            <chr>        <list>           <list>          
  1 <split [282/101]> Bootstrap001 <tibble [1 × 4]> <tibble [2 × 4]>
  2 <split [282/105]> Bootstrap002 <tibble [1 × 4]> <tibble [2 × 4]>
  3 <split [282/94]>  Bootstrap003 <tibble [1 × 4]> <tibble [2 × 4]>
  4 <split [282/104]> Bootstrap004 <tibble [1 × 4]> <tibble [2 × 4]>
  5 <split [282/112]> Bootstrap005 <tibble [1 × 4]> <tibble [2 × 4]>
  6 <split [282/105]> Bootstrap006 <tibble [1 × 4]> <tibble [2 × 4]>
  7 <split [282/94]>  Bootstrap007 <tibble [1 × 4]> <tibble [2 × 4]>
  8 <split [282/105]> Bootstrap008 <tibble [1 × 4]> <tibble [2 × 4]>
  9 <split [282/95]>  Bootstrap009 <tibble [1 × 4]> <tibble [2 × 4]>
 10 <split [282/107]> Bootstrap010 <tibble [1 × 4]> <tibble [2 × 4]>
 11 <split [282/99]>  Bootstrap011 <tibble [1 × 4]> <tibble [2 × 4]>
 12 <split [282/93]>  Bootstrap012 <tibble [1 × 4]> <tibble [2 × 4]>
 13 <split [282/98]>  Bootstrap013 <tibble [1 × 4]> <tibble [2 × 4]>
 14 <split [282/94]>  Bootstrap014 <tibble [1 × 4]> <tibble [2 × 4]>
 15 <split [282/97]>  Bootstrap015 <tibble [1 × 4]> <tibble [2 × 4]>
 16 <split [282/104]> Bootstrap016 <tibble [1 × 4]> <tibble [2 × 4]>
 17 <split [282/101]> Bootstrap017 <tibble [1 × 4]> <tibble [2 × 4]>
 18 <split [282/109]> Bootstrap018 <tibble [1 × 4]> <tibble [2 × 4]>
 19 <split [282/100]> Bootstrap019 <tibble [1 × 4]> <tibble [2 × 4]>
 20 <split [282/103]> Bootstrap020 <tibble [1 × 4]> <tibble [2 × 4]>
 21 <split [282/95]>  Bootstrap021 <tibble [1 × 4]> <tibble [2 × 4]>
 22 <split [282/105]> Bootstrap022 <tibble [1 × 4]> <tibble [2 × 4]>
 23 <split [282/108]> Bootstrap023 <tibble [1 × 4]> <tibble [2 × 4]>
 24 <split [282/119]> Bootstrap024 <tibble [1 × 4]> <tibble [2 × 4]>
 25 <split [282/110]> Bootstrap025 <tibble [1 × 4]> <tibble [2 × 4]>
 26 <split [282/118]> Bootstrap026 <tibble [1 × 4]> <tibble [2 × 4]>
 27 <split [282/104]> Bootstrap027 <tibble [1 × 4]> <tibble [2 × 4]>
 28 <split [282/106]> Bootstrap028 <tibble [1 × 4]> <tibble [2 × 4]>
 29 <split [282/106]> Bootstrap029 <tibble [1 × 4]> <tibble [2 × 4]>
 30 <split [282/95]>  Bootstrap030 <tibble [1 × 4]> <tibble [2 × 4]>
 31 <split [282/105]> Bootstrap031 <tibble [1 × 4]> <tibble [2 × 4]>
 32 <split [282/100]> Bootstrap032 <tibble [1 × 4]> <tibble [2 × 4]>
 33 <split [282/99]>  Bootstrap033 <tibble [1 × 4]> <tibble [2 × 4]>
 34 <split [282/105]> Bootstrap034 <tibble [1 × 4]> <tibble [2 × 4]>
 35 <split [282/102]> Bootstrap035 <tibble [1 × 4]> <tibble [2 × 4]>
 36 <split [282/111]> Bootstrap036 <tibble [1 × 4]> <tibble [2 × 4]>
 37 <split [282/96]>  Bootstrap037 <tibble [1 × 4]> <tibble [2 × 4]>
 38 <split [282/102]> Bootstrap038 <tibble [1 × 4]> <tibble [2 × 4]>
 39 <split [282/111]> Bootstrap039 <tibble [1 × 4]> <tibble [2 × 4]>
 40 <split [282/112]> Bootstrap040 <tibble [1 × 4]> <tibble [2 × 4]>
 41 <split [282/110]> Bootstrap041 <tibble [1 × 4]> <tibble [2 × 4]>
 42 <split [282/101]> Bootstrap042 <tibble [1 × 4]> <tibble [2 × 4]>
 43 <split [282/101]> Bootstrap043 <tibble [1 × 4]> <tibble [2 × 4]>
 44 <split [282/102]> Bootstrap044 <tibble [1 × 4]> <tibble [2 × 4]>
 45 <split [282/111]> Bootstrap045 <tibble [1 × 4]> <tibble [2 × 4]>
 46 <split [282/111]> Bootstrap046 <tibble [1 × 4]> <tibble [2 × 4]>
 47 <split [282/105]> Bootstrap047 <tibble [1 × 4]> <tibble [2 × 4]>
 48 <split [282/97]>  Bootstrap048 <tibble [1 × 4]> <tibble [2 × 4]>
 49 <split [282/103]> Bootstrap049 <tibble [1 × 4]> <tibble [2 × 4]>
 50 <split [282/100]> Bootstrap050 <tibble [1 × 4]> <tibble [2 × 4]>
 51 <split [282/103]> Bootstrap051 <tibble [1 × 4]> <tibble [2 × 4]>
 52 <split [282/103]> Bootstrap052 <tibble [1 × 4]> <tibble [2 × 4]>
 53 <split [282/109]> Bootstrap053 <tibble [1 × 4]> <tibble [2 × 4]>
 54 <split [282/106]> Bootstrap054 <tibble [1 × 4]> <tibble [2 × 4]>
 55 <split [282/107]> Bootstrap055 <tibble [1 × 4]> <tibble [2 × 4]>
 56 <split [282/96]>  Bootstrap056 <tibble [1 × 4]> <tibble [2 × 4]>
 57 <split [282/105]> Bootstrap057 <tibble [1 × 4]> <tibble [2 × 4]>
 58 <split [282/105]> Bootstrap058 <tibble [1 × 4]> <tibble [2 × 4]>
 59 <split [282/101]> Bootstrap059 <tibble [1 × 4]> <tibble [2 × 4]>
 60 <split [282/100]> Bootstrap060 <tibble [1 × 4]> <tibble [2 × 4]>
 61 <split [282/96]>  Bootstrap061 <tibble [1 × 4]> <tibble [2 × 4]>
 62 <split [282/102]> Bootstrap062 <tibble [1 × 4]> <tibble [2 × 4]>
 63 <split [282/112]> Bootstrap063 <tibble [1 × 4]> <tibble [2 × 4]>
 64 <split [282/115]> Bootstrap064 <tibble [1 × 4]> <tibble [2 × 4]>
 65 <split [282/105]> Bootstrap065 <tibble [1 × 4]> <tibble [2 × 4]>
 66 <split [282/113]> Bootstrap066 <tibble [1 × 4]> <tibble [2 × 4]>
 67 <split [282/107]> Bootstrap067 <tibble [1 × 4]> <tibble [2 × 4]>
 68 <split [282/105]> Bootstrap068 <tibble [1 × 4]> <tibble [2 × 4]>
 69 <split [282/99]>  Bootstrap069 <tibble [1 × 4]> <tibble [2 × 4]>
 70 <split [282/102]> Bootstrap070 <tibble [1 × 4]> <tibble [2 × 4]>
 71 <split [282/95]>  Bootstrap071 <tibble [1 × 4]> <tibble [2 × 4]>
 72 <split [282/97]>  Bootstrap072 <tibble [1 × 4]> <tibble [2 × 4]>
 73 <split [282/108]> Bootstrap073 <tibble [1 × 4]> <tibble [2 × 4]>
 74 <split [282/115]> Bootstrap074 <tibble [1 × 4]> <tibble [2 × 4]>
 75 <split [282/105]> Bootstrap075 <tibble [1 × 4]> <tibble [2 × 4]>
 76 <split [282/99]>  Bootstrap076 <tibble [1 × 4]> <tibble [2 × 4]>
 77 <split [282/106]> Bootstrap077 <tibble [1 × 4]> <tibble [2 × 4]>
 78 <split [282/109]> Bootstrap078 <tibble [1 × 4]> <tibble [2 × 4]>
 79 <split [282/109]> Bootstrap079 <tibble [1 × 4]> <tibble [2 × 4]>
 80 <split [282/103]> Bootstrap080 <tibble [1 × 4]> <tibble [2 × 4]>
 81 <split [282/101]> Bootstrap081 <tibble [1 × 4]> <tibble [2 × 4]>
 82 <split [282/105]> Bootstrap082 <tibble [1 × 4]> <tibble [2 × 4]>
 83 <split [282/100]> Bootstrap083 <tibble [1 × 4]> <tibble [2 × 4]>
 84 <split [282/98]>  Bootstrap084 <tibble [1 × 4]> <tibble [2 × 4]>
 85 <split [282/100]> Bootstrap085 <tibble [1 × 4]> <tibble [2 × 4]>
 86 <split [282/102]> Bootstrap086 <tibble [1 × 4]> <tibble [2 × 4]>
 87 <split [282/97]>  Bootstrap087 <tibble [1 × 4]> <tibble [2 × 4]>
 88 <split [282/110]> Bootstrap088 <tibble [1 × 4]> <tibble [2 × 4]>
 89 <split [282/106]> Bootstrap089 <tibble [1 × 4]> <tibble [2 × 4]>
 90 <split [282/96]>  Bootstrap090 <tibble [1 × 4]> <tibble [2 × 4]>
 91 <split [282/111]> Bootstrap091 <tibble [1 × 4]> <tibble [2 × 4]>
 92 <split [282/104]> Bootstrap092 <tibble [1 × 4]> <tibble [2 × 4]>
 93 <split [282/102]> Bootstrap093 <tibble [1 × 4]> <tibble [2 × 4]>
 94 <split [282/97]>  Bootstrap094 <tibble [1 × 4]> <tibble [2 × 4]>
 95 <split [282/97]>  Bootstrap095 <tibble [1 × 4]> <tibble [2 × 4]>
 96 <split [282/96]>  Bootstrap096 <tibble [1 × 4]> <tibble [2 × 4]>
 97 <split [282/101]> Bootstrap097 <tibble [1 × 4]> <tibble [2 × 4]>
 98 <split [282/108]> Bootstrap098 <tibble [1 × 4]> <tibble [2 × 4]>
 99 <split [282/102]> Bootstrap099 <tibble [1 × 4]> <tibble [2 × 4]>
100 <split [282/111]> Bootstrap100 <tibble [1 × 4]> <tibble [2 × 4]>

There were issues with some computations:

  - Warning(s) x100: glm.fit: algorithm did not converge
  - Warning(s) x100: glm.fit: fitted probabilities numerically 0 or 1 occurred

Run `show_notes(.Last.tune.result)` for more information.
# This tells us that we should use: show_notes(.Last.tune.result). So let’s run that:

show_notes(.Last.tune.result)
unique notes:
───────────────────────────────────
glm.fit: algorithm did not converge
───────────────────────────────────
glm.fit: fitted probabilities numerically 0 or 1 occurred

In this case, it just recapitulates the same messages we saw when printing the object. The bigger point here is: always inspect error and warning messages.

We are getting a warning message about our fitted probabilities. This occurs because the outcome is perfectly separable (remember that issue from our earlier discussions of logistic regression). The model can still be used for prediction, but we would not be able to trust the statistical tests of the parameter estimates. Those tests are not our focus here.

We are going to fit this model on the whole training sample to show why we get this warning message. When we aren't resampling, the warning appears directly when we fit.
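
A minimal base-R demonstration of the same phenomenon with made-up data: a single predictor that perfectly separates the two classes triggers the identical warning.

```r
# Toy, perfectly separated data (hypothetical values)
x <- c(1, 2, 3, 10, 11, 12)
y <- factor(c("b", "b", "b", "m", "m", "m"))

fit_sep <- glm(y ~ x, family = binomial)
# Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

coef(summary(fit_sep))  # huge estimates and standard errors; p-values near 1
```

This mirrors the summary below: the maximum-likelihood coefficients diverge toward infinity, so the estimates and standard errors blow up even though the predicted classes are perfect.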

# make feature matrix
rec_prep <- rec_glm |> 
  prep(data_train)

feat_train_glm <- rec_prep |> 
  bake(data_train)


# fit on full training sample
fit_best_glm <- logistic_reg() |> 
  set_engine("glm", control = list(maxit = 100)) |> 
  fit(diagnosis ~ ., data = feat_train_glm)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
fit_best_glm$fit |> summary()

Call:
stats::glm(formula = diagnosis ~ ., family = stats::binomial, 
    data = data, control = ~list(maxit = 100))

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)              1.094e+02  9.229e+06       0        1
perimeter_se            -2.293e+01  7.566e+05       0        1
fractal_dimension_mean   1.167e+03  1.042e+08       0        1
concave_points_worst    -4.310e+02  1.090e+07       0        1
symmetry_mean            8.222e+01  1.742e+07       0        1
texture_se              -7.060e-01  9.537e+05       0        1
concave_points_se        1.336e+03  1.111e+08       0        1
concavity_mean          -5.896e+02  2.130e+07       0        1
fractal_dimension_se     7.799e+03  4.973e+08       0        1
radius_worst             3.012e+01  1.755e+06       0        1
concave_points_mean     -4.427e+02  3.349e+07       0        1
radius_mean              1.293e+01  5.222e+06       0        1
smoothness_se            5.648e+03  2.155e+08       0        1
smoothness_worst        -6.220e+02  1.873e+07       0        1
symmetry_se              3.178e+02  7.981e+07       0        1
radius_se               -2.286e+02  1.283e+07       0        1
concavity_worst         -6.620e+01  4.531e+06       0        1
concavity_se             3.150e+02  4.663e+07       0        1
compactness_se          -1.106e+03  5.112e+07       0        1
smoothness_mean         -2.835e+02  3.690e+07       0        1
area_se                  2.950e+00  1.117e+05       0        1
area_worst              -5.191e-01  1.212e+04       0        1
perimeter_mean          -6.444e+00  7.119e+05       0        1
compactness_mean         7.326e+02  2.785e+07       0        1
area_mean                2.901e-01  1.353e+04       0        1
fractal_dimension_worst -1.641e+03  6.941e+07       0        1
texture_mean            -3.705e+00  1.069e+05       0        1
perimeter_worst          2.604e+00  1.195e+05       0        1
symmetry_worst          -8.095e+01  1.207e+07       0        1
texture_worst            6.892e-01  1.142e+05       0        1
compactness_worst        2.856e+02  7.002e+06       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2.4423e+02  on 281  degrees of freedom
Residual deviance: 5.8048e-10  on 251  degrees of freedom
AIC: 62

Number of Fisher Scoring iterations: 28

Let’s take a closer look at the predictions our model is making.

# probabilities in train
predict(fit_best_glm, feat_train_glm, type = "prob") |> 
  print(n = 50)
# A tibble: 282 × 2
   .pred_malignant .pred_benign
             <dbl>        <dbl>
 1        2.22e-16        1    
 2        9.23e-12        1.000
 3        2.22e-16        1    
 4        2.22e-16        1    
 5        2.22e-16        1    
 6        2.22e-16        1    
 7        4.03e-12        1.000
 8        2.22e-16        1    
 9        2.22e-16        1    
10        2.22e-16        1    
11        2.22e-16        1    
12        2.22e-16        1    
13        2.74e-11        1.000
14        2.22e-16        1    
15        2.22e-16        1    
16        2.22e-16        1    
17        2.22e-16        1    
18        2.22e-16        1    
19        2.22e-16        1    
20        2.22e-16        1    
21        2.22e-16        1    
22        2.22e-16        1    
23        2.22e-16        1    
24        2.22e-16        1    
25        2.22e-16        1    
26        2.22e-16        1    
27        2.22e-16        1    
28        2.22e-16        1    
29        2.22e-16        1    
30        2.22e-16        1    
31        2.22e-16        1    
32        2.22e-16        1    
33        2.22e-16        1    
34        2.22e-16        1    
35        2.22e-16        1    
36        2.22e-16        1    
37        2.22e-16        1    
38        2.22e-16        1    
39        2.22e-16        1    
40        2.22e-16        1    
41        1.16e-11        1.000
42        2.22e-16        1    
43        1.07e-11        1.000
44        2.22e-16        1    
45        2.22e-16        1    
46        2.22e-16        1    
47        2.22e-16        1    
48        2.22e-16        1    
49        1.16e-11        1.000
50        2.22e-16        1    
# ℹ 232 more rows
# confusion matrix for train
cm_trn <- tibble(truth = feat_train_glm$diagnosis,
                 estimate = predict(fit_best_glm, feat_train_glm)$.pred_class) |> 
  conf_mat(truth, estimate)

cm_trn
           Truth
Prediction  malignant benign
  malignant        44      0
  benign            0    238
cm_trn |> 
  summary()
# A tibble: 13 × 3
   .metric              .estimator .estimate
   <chr>                <chr>          <dbl>
 1 accuracy             binary         1    
 2 kap                  binary         1    
 3 sens                 binary         1    
 4 spec                 binary         1    
 5 ppv                  binary         1    
 6 npv                  binary         1    
 7 mcc                  binary         1    
 8 j_index              binary         1    
 9 bal_accuracy         binary         1    
10 detection_prevalence binary         0.156
11 precision            binary         1    
12 recall               binary         1    
13 f_meas               binary         1    

Here we can see the cause of our warning message: the features perfectly predict the classes of the outcome in train (“perfect separation”), as seen in the confusion matrix above. This likely occurs because these features predict the outcome very well even in held-out data (see below), and some overfitting in training pushes us from very good to perfect prediction. Since we have confirmed that the warning message makes sense here, it is OK to proceed.

Print the average ROC AUC of your logistic regression model.

collect_metrics(fit_lr_boot, summarize = TRUE)
# A tibble: 1 × 6
  .metric .estimator  mean     n std_err .config        
  <chr>   <chr>      <dbl> <int>   <dbl> <chr>          
1 roc_auc binary     0.959   100 0.00302 pre0_mod0_post0

You might be wondering why our performance is not perfect across bootstrapped samples, given that we saw the warning for each of the 100 bootstraps above, which means our model always predicted perfectly within its bootstrap training data. For demonstration purposes, we will assess this model in test here as well.

# test feature matrix 
feat_test_glm <- rec_prep |> 
  bake(data_test)


# confusion matrix for test
cm_test <- tibble(truth = feat_test_glm$diagnosis,
                  estimate = predict(fit_best_glm, feat_test_glm)$.pred_class) |> 
  conf_mat(truth, estimate)

cm_test
           Truth
Prediction  malignant benign
  malignant        20      1
  benign            2    118

Performance is still pretty good here, but given the large number of predictors relative to the small number of training cases in each bootstrap, our models can fit their bootstrap training data perfectly every time without achieving the same performance in their corresponding out-of-bag assessment sets.

You should always check for these sorts of warnings by printing the model fit object.

Fit KNN

Set up a hyperparameter grid to consider a range of values for neighbors in your KNN models. NOTE: be sure that you are not considering values higher than the N of your training sample! In this dataset, the training N = 282.

hyper_grid <- expand.grid(neighbors = seq(1, 280, by = 10))
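
A quick base-R check that this grid respects the note above: the largest candidate value stays below the training N of 282.

```r
# The candidate values run 1, 11, 21, ..., 271
neighbors_grid <- seq(1, 280, by = 10)
length(neighbors_grid)  # 28 candidate values
max(neighbors_grid)     # 271, which is < 282
```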

Fit a KNN model using the recipe you created and your bootstrap splits. Use ROC AUC (roc_auc) as your performance metric.

fit_knn_boot <-
  nearest_neighbor(neighbors = tune()) |> 
  set_engine("kknn") |> 
  set_mode("classification") |>
  tune_grid(preprocessor = rec_knn, 
            resamples = splits_boot, 
            grid = hyper_grid, 
            metrics = metric_set(roc_auc))

Generate a plot to help you determine if you considered a wide enough range of values for neighbors.

plot_hyperparameters(fit_knn_boot, hp1 = "neighbors", metric = "roc_auc")

Print the best value for the neighbors hyperparameter across resamples based on model ROC AUC.

select_best(fit_knn_boot, metric = "roc_auc")
# A tibble: 1 × 2
  neighbors .config         
      <dbl> <chr>           
1       161 pre0_mod17_post0

Print the average ROC AUC of your best KNN model.

collect_metrics(fit_knn_boot, summarize = TRUE) |> 
  filter(neighbors == select_best(fit_knn_boot, metric = "roc_auc")$neighbors)
# A tibble: 1 × 7
  neighbors .metric .estimator  mean     n  std_err .config         
      <dbl> <chr>   <chr>      <dbl> <int>    <dbl> <chr>           
1       161 roc_auc binary     0.995   100 0.000492 pre0_mod17_post0

Select and fit best model

Now you will select your best model configuration among the various KNN and GLM models based on overall ROC AUC and train it on your full training sample.

Create training (feat_train) and test (feat_test) feature matrices using your best recipe (GLM or KNN)

rec_prep <- rec_knn |> 
  prep(data_train)

feat_train <- rec_prep |> 
  bake(data_train)

feat_test <- rec_prep |> 
  bake(data_test)

Fit your best performing model on the full training sample (feat_train).

fit_best_knn <- nearest_neighbor(neighbors = select_best(fit_knn_boot, metric = "roc_auc")$neighbors) |> 
  set_engine("kknn") |> 
  set_mode("classification") |> 
  fit(diagnosis ~ ., data = feat_train)

Evaluate the best model

Make a figure to plot the ROC of your best model in the test set.

roc_plot <- 
  tibble(truth = feat_test$diagnosis,
         prob = predict(fit_best_knn, feat_test, type = "prob")$.pred_malignant) |> 
  roc_curve(prob, truth = truth)

roc_plot |>
  ggplot(aes(x = 1 - specificity, y = sensitivity, color = .threshold)) +
  geom_path() +
  geom_abline(lty = 3) +
  coord_equal() +
  labs(x = "1 - Specificity (FPR)",
       y = "Sensitivity (TPR)")
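What roc_curve() is doing under the hood can be sketched by hand: every candidate threshold on the predicted probability yields one (FPR, TPR) point on the curve. A minimal sketch (in Python, using made-up probabilities rather than this assignment's predictions):

```python
# Each threshold on the predicted probability yields one point on the
# ROC curve: FPR = 1 - specificity on x, TPR = sensitivity on y.
truth = [1, 1, 1, 0, 0, 0, 0]                  # 1 = malignant (positive)
prob = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.05]    # made-up predicted probabilities

def roc_point(threshold):
    tp = sum(1 for t, p in zip(truth, prob) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(truth, prob) if t == 0 and p >= threshold)
    pos = sum(truth)
    neg = len(truth) - pos
    return fp / neg, tp / pos                  # (FPR, TPR)

for thr in [0.75, 0.5, 0.25]:
    print(thr, roc_point(thr))
```

Lowering the threshold can only move the point up (more true positives) and/or right (more false positives), which is why the curve rises from (0, 0) to (1, 1).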

Generate a confusion matrix depicting your model’s performance in test.

cm <- tibble(truth = feat_test$diagnosis,
             estimate = predict(fit_best_knn, feat_test)$.pred_class) |> 
  conf_mat(truth, estimate)

cm
           Truth
Prediction  malignant benign
  malignant         9      0
  benign           13    119
cm |> 
  summary()
# A tibble: 13 × 3
   .metric              .estimator .estimate
   <chr>                <chr>          <dbl>
 1 accuracy             binary        0.908 
 2 kap                  binary        0.539 
 3 sens                 binary        0.409 
 4 spec                 binary        1     
 5 ppv                  binary        1     
 6 npv                  binary        0.902 
 7 mcc                  binary        0.607 
 8 j_index              binary        0.409 
 9 bal_accuracy         binary        0.705 
10 detection_prevalence binary        0.0638
11 precision            binary        1     
12 recall               binary        0.409 
13 f_meas               binary        0.581 

Make a plot of your confusion matrix.

autoplot(cm)

Report the ROC AUC, accuracy, sensitivity, specificity, PPV, and NPV of your best model in the held out test set.

# ROC AUC
auc <- tibble(truth = feat_test$diagnosis,
       prob = predict(fit_best_knn, feat_test, type = "prob")$.pred_malignant) |> 
  roc_auc(prob, truth = truth) |> 
  pull(.estimate)

# Accuracy, sensitivity, specificity, PPV, NPV
# Using pivot_wider() to make a tracking tibble
results <- cm |> 
  summary() |> 
  filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |> 
  select(-.estimator) |> 
  pivot_wider(names_from = .metric, values_from = .estimate) |> 
  mutate(model = str_c("knn_", fit_best_knn$fit$best.parameters$k),
         roc_auc = auc) |> 
  select(model, roc_auc, everything())

results
# A tibble: 1 × 7
  model   roc_auc accuracy  sens  spec   ppv   npv
  <chr>     <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161   0.983    0.908 0.409     1     1 0.902
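These summary metrics all follow directly from the four cells of the confusion matrix above (TP = 9, FN = 13, FP = 0, TN = 119). As a quick arithmetic check (sketched in Python purely for illustration):

```python
# Cells of the test-set confusion matrix above
tp, fn, fp, tn = 9, 13, 0, 119

sens = tp / (tp + fn)                    # sensitivity (recall): 9/22
spec = tn / (tn + fp)                    # specificity: 119/119
ppv = tp / (tp + fp)                     # positive predictive value: 9/9
npv = tn / (tn + fn)                     # negative predictive value: 119/132
acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy: 128/141

print(round(sens, 3), spec, ppv, round(npv, 3), round(acc, 3))
```

These reproduce the 0.409 / 1 / 1 / 0.902 / 0.908 values in the results table, and make clear why accuracy looks good while sensitivity is poor: benign cases dominate the test set.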

Part 2: Addressing class imbalance

Since only 15% of our cases are malignant, let's see if we can achieve higher sensitivity by up-sampling our data with SMOTE. We will again select between a standard GLM and a tuned KNN using bootstrapped CV and evaluate our best model in test.

Build recipes

Update your previous recipes to up-sample the minority class (malignant) in diagnosis using step_smote(). Remember to make 2 recipes (one for GLM, one for KNN).

rec_glm_up <- recipe(diagnosis ~ ., data = data_train) |> 
  themis::step_smote(diagnosis, over_ratio = 1, neighbors = 5)

rec_knn_up <- recipe(diagnosis ~ ., data = data_train) |> 
  themis::step_smote(diagnosis, over_ratio = 1, neighbors = 5) |> 
  step_range(all_predictors())
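For intuition, step_smote() does not simply duplicate minority cases: each synthetic case is placed at a random point on the line segment between an existing minority case and one of its k nearest minority-class neighbors. A toy sketch of that interpolation (in Python; illustration only, not the themis implementation):

```python
import random

random.seed(8)  # reproducible illustration

# One minority-class (malignant) case and one of its k nearest
# minority-class neighbors, in a toy 2-feature space
case = [1.0, 2.0]
neighbor = [3.0, 4.0]

# SMOTE places the synthetic case at a random point on the segment
# between them: case + u * (neighbor - case), with u ~ Uniform(0, 1)
u = random.random()
synthetic = [c + u * (n - c) for c, n in zip(case, neighbor)]
print(synthetic)
```

Because synthetic cases land between real minority cases rather than on top of them, the classifier sees a denser minority region instead of exact copies, which tends to help more than plain duplication.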

Fit GLM

Fit an up-sampled logistic regression classifier using the new GLM recipe you created and your bootstrap splits. Use ROC AUC as your performance metric.

fit_lr_boot_up <-
  logistic_reg() |> 
  set_engine("glm",  control = list(maxit = 100)) |> 
  fit_resamples(preprocessor = rec_glm_up, 
                resamples = splits_boot, 
                metrics = metric_set(roc_auc))

Print the average ROC AUC of your logistic regression model

collect_metrics(fit_lr_boot_up, summarize = TRUE)
# A tibble: 1 × 6
  .metric .estimator  mean     n std_err .config        
  <chr>   <chr>      <dbl> <int>   <dbl> <chr>          
1 roc_auc binary     0.942   100 0.00351 pre0_mod0_post0

Fit KNN

Set up a hyperparameter grid to consider a range of values for neighbors in your KNN models.

hyper_grid <- expand.grid(neighbors = seq(1, 280, by = 10))

Fit an up-sampled KNN using the new KNN recipe you created and your bootstrap splits. Use ROC AUC as your performance metric.

fit_knn_boot_up <-
  nearest_neighbor(neighbors = tune()) |> 
  set_engine("kknn") |> 
  set_mode("classification") |>
  tune_grid(preprocessor = rec_knn_up, 
            resamples = splits_boot, 
            grid = hyper_grid, 
            metrics = metric_set(roc_auc))

Generate a plot to help you determine if you considered a wide enough range of values for neighbors.

plot_hyperparameters(fit_knn_boot_up, hp1 = "neighbors", metric = "roc_auc")

Print the best value for the neighbors hyperparameter across resamples based on model ROC AUC.

select_best(fit_knn_boot_up, metric = "roc_auc")
# A tibble: 1 × 2
  neighbors .config         
      <dbl> <chr>           
1       151 pre0_mod16_post0

Print the average ROC AUC of your best KNN model.

collect_metrics(fit_knn_boot_up, summarize = TRUE) |> 
  filter(neighbors == select_best(fit_knn_boot_up, metric = "roc_auc")$neighbors)
# A tibble: 1 × 7
  neighbors .metric .estimator  mean     n  std_err .config         
      <dbl> <chr>   <chr>      <dbl> <int>    <dbl> <chr>           
1       151 roc_auc binary     0.994   100 0.000599 pre0_mod16_post0

Select and fit the best model

Create the up-sampled training feature matrix using your best recipe (GLM or KNN). Remember, do not up-sample your test data!

rec_prep <- rec_knn_up |> 
  prep(data_train)

feat_train_up <- rec_prep |> 
  bake(new_data = NULL)

Fit your best performing up-sampled model on the full training sample.

fit_best_knn_up <- nearest_neighbor(neighbors = select_best(fit_knn_boot_up, metric = "roc_auc")$neighbors) |> 
  set_engine("kknn") |> 
  set_mode("classification") |> 
  fit(diagnosis ~ ., data = feat_train_up)

Evaluate the best model

Make a figure to plot the ROC of your best up-sampled model in the test set.

roc_plot_up <- 
  tibble(truth = feat_test$diagnosis,
         prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |> 
  roc_curve(prob, truth = truth)

roc_plot_up |>
  ggplot(aes(x = 1 - specificity, y = sensitivity, color = .threshold)) +
  geom_path() +
  geom_abline(lty = 3) +
  coord_equal() +
  labs(x = "1 - Specificity (FPR)",
       y = "Sensitivity (TPR)")

Generate a confusion matrix depicting your up-sampled model’s performance in test.

cm_up <- tibble(truth = feat_test$diagnosis,
             estimate = predict(fit_best_knn_up, feat_test)$.pred_class) |> 
  conf_mat(truth, estimate)

cm_up
           Truth
Prediction  malignant benign
  malignant        19      5
  benign            3    114

Make a plot of your confusion matrix.

autoplot(cm_up)

Report the ROC AUC, accuracy, sensitivity, specificity, PPV, and NPV of your best up-sampled model in the held out test set.

# ROC AUC
auc <- tibble(truth = feat_test$diagnosis,
       prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |> 
  roc_auc(prob, truth = truth) |> 
  pull(.estimate)

# Accuracy, sensitivity, specificity, PPV, NPV
results_up <- cm_up |> 
  summary() |> 
  filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |> 
  select(-.estimator) |> 
  pivot_wider(names_from = .metric, values_from = .estimate) |> 
  mutate(model = str_c("knn_smote_", fit_best_knn_up$fit$best.parameters$k),
         roc_auc = auc) |> 
  select(model, roc_auc, everything())

results <- results |> 
  bind_rows(results_up)

results
# A tibble: 2 × 7
  model         roc_auc accuracy  sens  spec   ppv   npv
  <chr>           <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161         0.983    0.908 0.409 1     1     0.902
2 knn_smote_151   0.987    0.943 0.864 0.958 0.792 0.974
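A test-set ROC AUC near 0.99 has a concrete interpretation: it is (up to ties) the probability that a randomly chosen malignant case receives a higher predicted probability than a randomly chosen benign case. A minimal sketch of that rank-based view, with made-up predictions (in Python, for illustration):

```python
# ROC AUC as a rank statistic: the fraction of (malignant, benign) pairs
# in which the malignant case gets the higher predicted probability
# (ties count as half). Probabilities below are made up.
pos_probs = [0.9, 0.8, 0.3]            # predictions for malignant cases
neg_probs = [0.7, 0.2, 0.1, 0.05]      # predictions for benign cases

pairs = [(p, n) for p in pos_probs for n in neg_probs]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0
          for p, n in pairs) / len(pairs)
print(round(auc, 3))  # 11 of 12 pairs are correctly ordered
```

This is also why ROC AUC is threshold-free: it depends only on how the two classes are ranked, not on where we later cut the probabilities.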

Part 3: New Classification Threshold

Now you want to check whether there may be an additional benefit to your model's performance if you adjust the classification threshold from its default of 50% to 40%.

1) Adjust classification threshold to 40%

Make a tibble containing the following variables:

  • truth: The true values of diagnosis in your test set
  • prob: The predicted probabilities made by your best up-sampled model above in the test set
  • estimate_40: Binary predictions of diagnosis (benign vs malignant) created by applying a threshold of 40% to your best model’s predicted probabilities.
preds <- tibble(truth = feat_test$diagnosis,
                prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |> 
  mutate(estimate_40 = if_else(prob > .40, "malignant", "benign"),
         estimate_40 = factor(estimate_40, levels = c("malignant", "benign")))

preds |> print(n=10)
# A tibble: 141 × 3
   truth        prob estimate_40
   <fct>       <dbl> <fct>      
 1 malignant 0.447   malignant  
 2 benign    0.0215  benign     
 3 benign    0.00473 benign     
 4 benign    0       benign     
 5 malignant 0.998   malignant  
 6 benign    0.0252  benign     
 7 benign    0.149   benign     
 8 benign    0.00497 benign     
 9 benign    0.224   benign     
10 benign    0.00995 benign     
# ℹ 131 more rows

2) Evaluate model at new threshold

Generate a confusion matrix depicting your up-sampled model’s performance in test at your new threshold.

cm_40 <- preds |> 
  conf_mat(truth = truth, estimate = estimate_40)

cm_40
           Truth
Prediction  malignant benign
  malignant        21      7
  benign            1    112

Make a plot of your confusion matrix.

autoplot(cm_40)

Report the ROC AUC, accuracy, sensitivity, specificity, PPV, and NPV of your best up-sampled model in the held-out test set.

# ROC AUC
auc <- tibble(truth = feat_test$diagnosis,
                prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |> 
  roc_auc(prob, truth = truth) |> 
  pull(.estimate)

# Accuracy, sensitivity, specificity, PPV, NPV
results_40 <- cm_40 |> 
  summary() |> 
  filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |> 
  select(-.estimator) |> 
  pivot_wider(names_from = .metric, values_from = .estimate) |> 
  mutate(model = str_c("knn_smote_", fit_best_knn_up$fit$best.parameters$k, "_thresh_40"),
         roc_auc = auc) |> 
  select(model, roc_auc, everything())

results <- results |> 
  bind_rows(results_40)

results
# A tibble: 3 × 7
  model                   roc_auc accuracy  sens  spec   ppv   npv
  <chr>                     <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161                   0.983    0.908 0.409 1     1     0.902
2 knn_smote_151             0.987    0.943 0.864 0.958 0.792 0.974
3 knn_smote_151_thresh_40   0.987    0.943 0.955 0.941 0.75  0.991

Adjusting the threshold to 40% improved sensitivity (0.864 to 0.955) at the cost of some specificity (0.958 to 0.941) and PPV (0.792 to 0.75): we capture two additional malignant cases but also label two more benign cases as malignant. The histograms below show why. The black line marks the default threshold of .5, which already separates the classes fairly well; the red line at .4 picks up the few malignant cases with intermediate predicted probabilities while also sweeping in some benign cases as false positives.

ggplot(data = preds, aes(x = prob)) + 
  geom_histogram(bins = 15) +
  facet_wrap(vars(truth), nrow = 2) +
  xlab("Pr(Disease)") +
  geom_vline(xintercept = .5) +
  geom_vline(xintercept = .4, color = "red")

Let's see what would happen if we lowered the threshold to .3.

ggplot(data = preds, aes(x = prob)) + 
  geom_histogram(bins = 15) +
  facet_wrap(vars(truth), nrow = 2) +
  xlab("Pr(Disease)") +
  geom_vline(xintercept = .3, color = "red")

preds <- tibble(truth = feat_test$diagnosis,
                prob = predict(fit_best_knn_up, feat_test, type = "prob")$.pred_malignant) |> 
  mutate(estimate_30 = if_else(prob > .30, "malignant", "benign"),
         estimate_30 = factor(estimate_30, levels = c("malignant", "benign")))

cm_30 <- preds |> 
  conf_mat(truth = truth, estimate = estimate_30)


results_30 <- cm_30 |> 
  summary() |> 
  filter(.metric %in% c("accuracy", "sens", "spec", "ppv", "npv")) |> 
  select(-.estimator) |> 
  pivot_wider(names_from = .metric, values_from = .estimate) |> 
  mutate(model = str_c("knn_smote_", fit_best_knn_up$fit$best.parameters$k, "_thresh_30"),
         roc_auc = auc) |> 
  select(model, roc_auc, everything())

results <- results |> 
  bind_rows(results_30)


results
# A tibble: 4 × 7
  model                   roc_auc accuracy  sens  spec   ppv   npv
  <chr>                     <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 knn_161                   0.983    0.908 0.409 1     1     0.902
2 knn_smote_151             0.987    0.943 0.864 0.958 0.792 0.974
3 knn_smote_151_thresh_40   0.987    0.943 0.955 0.941 0.75  0.991
4 knn_smote_151_thresh_30   0.987    0.922 0.955 0.916 0.677 0.991

Lowering the threshold to .30 did not improve sensitivity any further: it remains at 0.955, because the one remaining missed malignant case has a predicted probability below .30. Meanwhile we pick up additional false positives, lowering specificity and PPV. Still, favoring sensitivity may be appropriate for this problem, since falsely labeling benign cases as malignant is a less costly error than missing malignant cases!

✭✭✭ You are a machine learning superstar ✭✭✭