Homework Unit 10: Neural Networks

Author

TA Key

Published

April 4, 2024

Introduction

This file serves as the answer key for the Unit_10 homework. Unit 10 Neural Networks in the course web book contains all materials required for this assignment.

In this assignment, we demonstrate how to fit, tune, and evaluate neural network models (using the keras engine within tidymodels) to classify wine quality.


Setup

Handle conflicts

options(conflicts.policy = "depends.ok")
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_ml.R?raw=true")
tidymodels_conflictRules()

Load required packages

library(tidyverse) 
library(tidymodels)
library(xfun, include.only = "cache_rds")
library(keras, exclude = "get_weights")
library(magrittr, exclude = c("set_names", "extract"))

Source function scripts (John’s or your own)

devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_plots.R?raw=true")
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_eda.R?raw=true")

Specify other global settings

theme_set(theme_classic())
options(tibble.width = Inf, dplyr.print_max = Inf)
rerun_setting <- FALSE

Paths

path_data <- "homework/unit_10"

Set up parallel processing

Note you can type cl into your console to see how many cores your computer has.

cl <- parallel::makePSOCKcluster(parallel::detectCores(logical = FALSE))
doParallel::registerDoParallel(cl)

Setting my conda environment manually because multiple environments exist on this machine

use_condaenv("C:/Users/kpaquette2/AppData/Local/r-miniconda/envs/r-reticulate")

Read in data

Read in wine_quality_trn.csv

data_trn <- read_csv(here::here(path_data, "wine_quality_trn.csv"), col_types = cols()) 

data_trn |> 
  skim_some()
Data summary
Name data_trn
Number of rows 3674
Number of columns 12
_______________________
Column type frequency:
character 1
numeric 11
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
quality 0 1 11 12 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate p0 p100
fixed_acidity 0 1 3.90 14.20
volatile_acidity 0 1 0.08 1.10
citric_acid 0 1 0.00 1.23
residual_sugar 0 1 0.60 65.80
chlorides 0 1 0.01 0.30
free_sulfur_dioxide 0 1 2.00 289.00
total_sulfur_dioxide 0 1 9.00 440.00
density 0 1 0.99 1.04
ph 0 1 2.77 3.81
sulphates 0 1 0.23 1.08
alcohol 0 1 8.00 14.00
data_trn <- data_trn |> 
  mutate(quality = factor(quality, 
                          levels = c("low_quality", "high_quality"),
                          labels = c("low quality", "high quality")))

data_trn |> 
  janitor::tabyl(quality)
      quality    n  percent
  low quality 1230 0.334785
 high quality 2444 0.665215

Note that:

  • There are no missing values on any variable.

  • Our outcome is a variable with two levels. We loaded it in here as a factor, with positive class (“high quality”) set as the second level. These levels are not perfectly balanced, but they’re also not extremely unbalanced (33.5%/66.5% split).

  • All our predictors are numeric.

Additional Modeling EDA

Let's make a single validation split to make some modeling decisions.

split_val <- validation_split(data_trn, prop = 3/4, strata = "quality")
Warning: `validation_split()` was deprecated in rsample 1.2.0.
ℹ Please use `initial_validation_split()` instead.

Look at predictor variable distributions more closely

data_trn |> 
  select(where(is.numeric)) |> 
  names() |> 
  map(\(name) plot_box_violin(df = data_trn, x = name)) |> 
  cowplot::plot_grid(plotlist = _, ncol = 3)

All the predictors are positively skewed, with several being very positively skewed. The variables that stand out the most are residual_sugar, chlorides, free_sulfur_dioxide, and density - but we will correct for skew in all our predictors. Even the most normally distributed (alcohol) can be pushed a little bit more towards normal.

Before transforming the distribution of our predictors, we are going to look at two different methods to scale them: step_range() and step_normalize(). Remember, step_range() scales predictors to fall between 0 and 1 (unless you specify different bounds) and step_normalize() standardizes predictors so that they all have a mean of 0 and a standard deviation of 1. We will make each recipe and skim the resulting feature set below to see how our values change depending on the scaling method.

rec_range <- recipe(quality ~ ., data = data_trn) |> 
  step_range(all_predictors())

feat_range_trn <- rec_range |> 
  prep(data_trn) |> 
  bake(NULL) 

feat_range_trn |> 
  skim_all()
Data summary
Name feat_range_trn
Number of rows 3674
Number of columns 12
_______________________
Column type frequency:
factor 1
numeric 11
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate n_unique top_counts
quality 0 1 2 hig: 2444, low: 1230

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 skew kurtosis
fixed_acidity 0 1 0.29 0.08 0 0.23 0.28 0.33 1 0.67 2.40
volatile_acidity 0 1 0.19 0.10 0 0.13 0.18 0.24 1 1.66 5.71
citric_acid 0 1 0.27 0.10 0 0.22 0.26 0.32 1 1.13 3.89
residual_sugar 0 1 0.09 0.08 0 0.02 0.07 0.14 1 1.21 4.71
chlorides 0 1 0.12 0.07 0 0.08 0.11 0.13 1 4.80 35.26
free_sulfur_dioxide 0 1 0.12 0.06 0 0.07 0.11 0.15 1 1.56 14.09
total_sulfur_dioxide 0 1 0.30 0.10 0 0.23 0.29 0.37 1 0.43 0.74
density 0 1 0.13 0.06 0 0.09 0.13 0.17 1 1.20 12.72
ph 0 1 0.40 0.14 0 0.30 0.39 0.49 1 0.45 0.44
sulphates 0 1 0.31 0.13 0 0.21 0.28 0.38 1 0.96 1.42
alcohol 0 1 0.42 0.20 0 0.25 0.40 0.57 1 0.46 -0.73
rec_norm <- recipe(quality ~ ., data = data_trn) |> 
  step_normalize(all_predictors())
  
feat_norm_trn <- rec_norm |> 
  prep(data_trn) |> 
  bake(NULL) 

feat_norm_trn |> 
  skim_all()
Data summary
Name feat_norm_trn
Number of rows 3674
Number of columns 12
_______________________
Column type frequency:
factor 1
numeric 11
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate n_unique top_counts
quality 0 1 2 hig: 2444, low: 1230

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 skew kurtosis
fixed_acidity 0 1 0 1 -3.50 -0.67 -0.08 0.51 8.67 0.67 2.40
volatile_acidity 0 1 0 1 -1.94 -0.66 -0.17 0.42 8.08 1.66 5.71
citric_acid 0 1 0 1 -2.80 -0.55 -0.13 0.46 7.49 1.13 3.89
residual_sugar 0 1 0 1 -1.12 -0.91 -0.24 0.68 11.64 1.21 4.71
chlorides 0 1 0 1 -1.62 -0.46 -0.12 0.22 12.42 4.80 35.26
free_sulfur_dioxide 0 1 0 1 -1.94 -0.72 -0.08 0.62 14.77 1.56 14.09
total_sulfur_dioxide 0 1 0 1 -3.01 -0.73 -0.12 0.70 7.05 0.43 0.74
density 0 1 0 1 -2.28 -0.76 -0.10 0.66 14.87 1.20 12.72
ph 0 1 0 1 -2.78 -0.72 -0.05 0.61 4.14 0.45 0.44
sulphates 0 1 0 1 -2.27 -0.70 -0.17 0.52 5.14 0.96 1.42
alcohol 0 1 0 1 -2.06 -0.83 -0.10 0.72 2.84 0.46 -0.73

Choose scaling method

Let’s choose our scaling method by running two models using our split_val object. We can also check whether we have specified an appropriate number of epochs while examining which of these scaling methods performs better. We’ll also hold out a portion of the training data within each fit (via the validation_split argument to set_engine()) so that keras monitors validation error, and not just training error, across epochs.

Fit range-corrected model

set.seed(102030)
fit_seeds <- sample.int(10^5, size = 3)

fit_nnet_range <- cache_rds(
  expr = {
    mlp(epochs = 30) |> 
      set_mode("classification") |> 
      set_engine("keras", 
                 verbose = 1, 
                 seeds = fit_seeds, 
                 metrics = c("accuracy"),
                 validation_split = .1) |> 
      fit_resamples(preprocessor = rec_range, 
                resamples = split_val,
                metrics = metric_set(accuracy))
  }, 
  dir = "cache/",
  file = "fit_nnet_range",
  rerun = rerun_setting)

fit_nnet_range$.metrics
[[1]]
# A tibble: 1 × 4
  .metric  .estimator .estimate .config             
  <chr>    <chr>          <dbl> <chr>               
1 accuracy binary         0.666 Preprocessor1_Model1

Fit standardized model

fit_nnet_norm <- cache_rds(
  expr = {
    mlp(epochs = 30) |>  
      set_mode("classification") |> 
      set_engine("keras", 
                 verbose = 1, 
                 seeds = fit_seeds, 
                 metrics = c("accuracy"),
                 validation_split = .1) |> 
      fit_resamples(preprocessor = rec_norm, 
                resamples = split_val,
                metrics = metric_set(accuracy))
  }, 
  dir = "cache/",
  file = "fit_nnet_norm",
  rerun = rerun_setting)

fit_nnet_norm$.metrics
[[1]]
# A tibble: 1 × 4
  .metric  .estimator .estimate .config             
  <chr>    <chr>          <dbl> <chr>               
1 accuracy binary         0.768 Preprocessor1_Model1

The model using step_normalize() does better than the one with step_range(), so let’s go with that one. Now we can explore how performance changes when we also use step_YeoJohnson(). The only predictor that has non-positive values is citric_acid (it contains zeros, which would rule out step_BoxCox() for that predictor), but we will use step_YeoJohnson() on all predictors so that we are consistent across predictors.
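A quick way to confirm which predictors have non-positive values is to check each predictor's minimum (a small sketch; only citric_acid has a minimum of 0):

data_trn |> 
  summarize(across(where(is.numeric), min)) |> 
  glimpse()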

Let's make our new features and look at the updated distributions of the predictors.

rec_norm_yj <- recipe(quality ~ ., data = data_trn) |> 
  step_YeoJohnson(all_predictors()) |> 
  step_normalize(all_predictors())

feat_norm_yj_trn <- rec_norm_yj |> 
  prep(data_trn) |> 
  bake(NULL) 

feat_norm_yj_trn |> 
  select(where(is.numeric)) |> 
  names() |> 
  map(\(name) plot_box_violin(df = feat_norm_yj_trn, x = name)) |> 
  cowplot::plot_grid(plotlist = _, ncol = 3)

The additional transformation doesn’t dramatically change the shape of all of our predictors, but it does improve the shape of some. The next question is whether it enhances the performance of the model.

Let's see!

fit_nnet_norm_yj <- cache_rds(
  expr = {
    mlp(epochs = 30) |>  
      set_mode("classification") |> 
      set_engine("keras", 
                 verbose = 1, 
                 seeds = fit_seeds, 
                 metrics = c("accuracy"),
                 validation_split = .1) |> 
      fit_resamples(preprocessor = rec_norm_yj, 
                resamples = split_val,
                metrics = metric_set(accuracy))
  }, 
  dir = "cache/",
  file = "fit_nnet_norm_yj",
  rerun = rerun_setting)

fit_nnet_norm_yj$.metrics
[[1]]
# A tibble: 1 × 4
  .metric  .estimator .estimate .config             
  <chr>    <chr>          <dbl> <chr>               
1 accuracy binary         0.770 Preprocessor1_Model1

Adding step_YeoJohnson() improves performance only minimally beyond the model with just step_normalize(). Practically, they are comparable, so let’s go with the simpler model for our purposes here.
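If it helps to see the validation accuracies side by side, we can collect them into one small table (a quick sketch using the three fit objects created above):

bind_rows(
  collect_metrics(fit_nnet_range) |> mutate(recipe = "range"),
  collect_metrics(fit_nnet_norm) |> mutate(recipe = "normalize"),
  collect_metrics(fit_nnet_norm_yj) |> mutate(recipe = "yeojohnson + normalize")
) |> 
  select(recipe, .metric, mean)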

Explore decorrelation

Next we will look at correlations among predictors using our simpler feat_norm_trn matrix.

feat_norm_trn |> 
  mutate(quality = if_else(quality == "high quality", 1, 0)) |> 
  cor() |> 
  corrplot::corrplot(type = "upper")

There are some relatively strong correlations. Density is highly correlated with residual sugar, total sulfur dioxide, and alcohol. Total sulfur dioxide is also correlated with free sulfur dioxide.
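If we want the numbers behind the plot, we can pull the relevant columns out of the correlation matrix directly (a quick sketch; cors is just a temporary object):

cors <- feat_norm_trn |> 
  select(where(is.numeric)) |> 
  cor() |> 
  round(2)

cors[, c("density", "total_sulfur_dioxide")]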

Let’s try a single PCA extraction across the density, residual sugar, total and free sulfur dioxide, and alcohol variables.

rec_norm_pca <- recipe(quality ~ ., data = data_trn) |> 
  step_pca(c(density, residual_sugar, ends_with("dioxide"), alcohol), 
           prefix = "density_corr") |> 
  step_normalize(all_predictors())

feat_norm_pca_trn <- rec_norm_pca |> 
  prep(data_trn) |> 
  bake(NULL) 

feat_norm_pca_trn |> 
  skim_all()
Data summary
Name feat_norm_pca_trn
Number of rows 3674
Number of columns 12
_______________________
Column type frequency:
factor 1
numeric 11
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate n_unique top_counts
quality 0 1 2 hig: 2444, low: 1230

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 skew kurtosis
fixed_acidity 0 1 0 1 -3.50 -0.67 -0.08 0.51 8.67 0.67 2.40
volatile_acidity 0 1 0 1 -1.94 -0.66 -0.17 0.42 8.08 1.66 5.71
citric_acid 0 1 0 1 -2.80 -0.55 -0.13 0.46 7.49 1.13 3.89
chlorides 0 1 0 1 -1.62 -0.46 -0.12 0.22 12.42 4.80 35.26
ph 0 1 0 1 -2.78 -0.72 -0.05 0.61 4.14 0.45 0.44
sulphates 0 1 0 1 -2.27 -0.70 -0.17 0.52 5.14 0.96 1.42
density_corr1 0 1 0 1 -8.03 -0.69 0.12 0.71 2.99 -0.44 1.11
density_corr2 0 1 0 1 -4.30 -0.61 -0.01 0.57 12.98 0.99 9.33
density_corr3 0 1 0 1 -12.12 -0.66 0.22 0.82 3.11 -1.29 5.86
density_corr4 0 1 0 1 -4.18 -0.73 -0.04 0.68 7.59 0.30 0.86
density_corr5 0 1 0 1 -3.24 -0.68 -0.04 0.62 5.56 0.23 0.30

Five PCA components were extracted (the default for step_pca()). This is the same as the number of features that were fed in, so it doesn’t necessarily help us from a simplicity standpoint (although we weren’t worried about that given our p to n ratio). Before fitting, we can also check how much of the variance in those five predictors the components capture; then let’s see how this recipe does in model fitting.
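Here is a quick sketch of that variance check, using recipes' tidy() method on the prepped recipe (step_pca() is the first step in rec_norm_pca):

rec_norm_pca |> 
  prep(data_trn) |> 
  tidy(number = 1, type = "variance") |> 
  filter(terms == "percent variance")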

fit_nnet_norm_pca <- cache_rds(
  expr = {
    mlp(epochs = 30) |>  
      set_mode("classification") |> 
      set_engine("keras", 
                 verbose = 1, 
                 seeds = fit_seeds, 
                 metrics = c("accuracy"),
                 validation_split = .1) |> 
      fit_resamples(preprocessor = rec_norm_pca, 
                resamples = split_val,
                metrics = metric_set(accuracy))
  }, 
  dir = "cache/",
  file = "fit_nnet_norm_pca",
  rerun = rerun_setting)

fit_nnet_norm_pca$.metrics
[[1]]
# A tibble: 1 × 4
  .metric  .estimator .estimate .config             
  <chr>    <chr>          <dbl> <chr>               
1 accuracy binary         0.753 Preprocessor1_Model1

This method didn’t increase accuracy. Let’s keep the simpler recipe, especially given that we have other methods for controlling overfitting due to correlated predictors (e.g., L2 regularization).

Exploring epochs

It’s difficult to demonstrate in the rendered qmd because the interactive figures don’t print within the knitted file, but throughout feature engineering we tested different values for epochs, ranging from 20 to 50. We settled on 30; this value tended to provide asymptotic performance in both accuracy and loss. Note that this can also be achieved (more formally) with a callback, as John demonstrated in the book; a sketch of that approach follows.
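For example, a minimal sketch of the callback approach (assuming, as with validation_split and metrics above, that extra engine arguments get passed through to keras during fitting):

mlp(epochs = 50) |> 
  set_mode("classification") |> 
  set_engine("keras", 
             verbose = 1, 
             seeds = fit_seeds, 
             metrics = c("accuracy"),
             validation_split = .1,
             # stop training once validation loss has not improved for 5 epochs
             callbacks = list(keras::callback_early_stopping(
               monitor = "val_loss", 
               patience = 5, 
               restore_best_weights = TRUE))) |> 
  fit_resamples(preprocessor = rec_norm, 
                resamples = split_val,
                metrics = metric_set(accuracy))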

Fit models

Explanation of systematic approach

To compare model configurations systematically, we built a grid (config_grid, below) that contained all the relevant configuration characteristics over which we wished to tune. We then used tune_grid() in the model fitting workflow. Although this might feel like more work up front, it actually saved a lot of time: once we confirmed things were working the way we wanted, we could let a single code chunk run overnight.

Specifically, we used the following approach:

  1. Set up a grid tibble that contained all the relevant model configuration characteristics to consider (activation function, number of hidden units, overfitting control techniques & their hyperparameters). We fully crossed all these characteristics, resulting in 84 model configurations. Note that we used 0 as the penalty value when we wanted to consider dropout, and we used 0 as the dropout value when we wanted to consider L2 regularization, because only one of those values can be non-zero in any given model configuration.

  2. We confirmed that the number of epochs we selected in the earlier feature engineering exploration (30 epochs) still seemed sufficient for a simple and complex model in the configuration grid.

  3. We set up an analysis workflow with mlp(), set_mode(), set_engine(), and tune_grid(), passing in a split_val object. This allowed us to test the code with a much simpler resampling technique (single held-out set).

  4. We then created a repeated k-fold CV split object (10 folds, 3 repeats) and passed that into the analysis workflow from step 3 and let it run overnight.

This process allowed us to compare systematically across all possible crosses of model configuration characteristics so that we could feel confident that we selected the best option. Had we started by comparing models that varied on one characteristic (e.g., activation function), selected the better activation function, and then proceeded to consider only that activation function across all the hidden unit values, we might have missed the best model configuration. Just because one characteristic performs better in isolation doesn’t mean it will be part of the best overall model configuration.

Note that although the process described here (and shown in demo below) is limited to a few model characteristics, you could imagine expanding this systematic comparison to consider a wider variety of characteristics. It would require more coding (likely some custom functions wrapped around tidymodels functions), but you could have a “master” grid that defines your feature set, statistical algorithm, and even specific recipe steps (e.g., with or without PCA extraction). These values could then be set as arguments within your wrapper function and would control “if” statements inside to build the right recipe, select the right features, and follow the right statistical algorithm’s analysis workflow - and then tune on the various algorithm-specific characteristics.
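Here is a rough, hypothetical sketch of that idea (the function and its arguments are made up for illustration); each row of a master grid would supply the flags that control which recipe gets built:

build_recipe <- function(use_yj, use_pca, data) {
  rec <- recipe(quality ~ ., data = data)
  # optionally reduce skew before scaling
  if (use_yj) rec <- rec |> step_YeoJohnson(all_predictors())
  # optionally collapse the correlated density-related predictors
  if (use_pca) rec <- rec |> step_pca(c(density, residual_sugar, 
                                        ends_with("dioxide"), alcohol),
                                      prefix = "density_corr")
  rec |> step_normalize(all_predictors())
}

# e.g., build_recipe(use_yj = TRUE, use_pca = FALSE, data = data_trn)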

Build grid

Because only penalty (L2) or dropout can be used in any single model configuration, we’ll set the opposite hyperparameter equal to 0. The simplest way to get the crossing right given this constraint was to build two separate grids and then bind the rows.

config_l2 <- expand_grid(hidden_units = c(5, seq(10, 50, by = 10), 100),
                         activation = c("softmax", "relu"),
                         penalty = c(0.01, 0.001, 0.0001),
                         dropout = 0)

config_dropout <- expand_grid(hidden_units = c(5, seq(10, 50, by = 10), 100),
                              activation = c("softmax", "relu"),
                              penalty = 0,
                              dropout = c(0.1, 0.3, 0.5))

config_grid <- bind_rows(config_l2, config_dropout)
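
A quick sanity check that the crossing came out as intended (7 hidden-unit values x 2 activation functions x 3 penalty values, plus 7 x 2 x 3 dropout values, should give 84 configurations):

nrow(config_grid)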

Fit all model configurations

As noted above, 30 epochs appeared sufficient during feature engineering. To confirm this still held as the models got more complex, we checked the loss/accuracy plots for:

  • The simplest model configuration in the grid: 5 hidden units, softmax activation function, and the highest penalty (0.01)
fits_nnet_simple <-
  cache_rds(
    expr = {
      mlp(epochs = 30, 
          hidden_units = 5, 
          penalty = .01, 
          activation = "softmax") |>
        set_mode("classification") |> 
        set_engine("keras", 
                   verbose = 1, 
                   seeds = fit_seeds, 
                   metrics = c("accuracy"),
                   validation_split = .1) |> 
        fit_resamples(preprocessor = rec_norm, 
                      resamples = split_val,
                      metrics = metric_set(accuracy))
    }, 
    dir = "cache/",
    file = "fits_nnet_simple",
    rerun = rerun_setting)

Here’s an image from the viewer for this model
  • The most complex model configuration in the grid: 100 hidden units, relu activation function, and the lowest penalty (0.0001)
fits_nnet_complex <-
  cache_rds(
    expr = {
      mlp(epochs = 30, 
          hidden_units = 100, 
          penalty = .0001, 
          activation = "relu") |>
        set_mode("classification") |> 
        set_engine("keras", 
                   verbose = 1,
                   metrics = c("accuracy"),
                   validation_split = 0.1) |>
        fit_resamples(preprocessor = rec_norm, 
                      resamples = split_val,
                      metrics = metric_set(accuracy))
    },
    dir = "cache/",
    file = "fits_nnet_complex",
    rerun = rerun_setting)
! Some required packages prohibit parallel processing:  'keras'

Here’s an image from the viewer for this model. Together, these plots demonstrate that 30 epochs still produced asymptotic performance in loss and accuracy for both the simpler and the more complex model.

First we fit with the validation split to make sure everything is working. Note: you can use tictoc::tic() and tictoc::toc() around a code chunk to see how long it takes to run! We are not going to run this code in the knitted file, but it is included below in case you want to see what changes you would need to make to do this yourself.

tictoc::tic()
fits_nnet <-
  mlp(epochs = 30, 
      hidden_units = tune(), 
      penalty = tune(), 
      dropout = tune(),
      activation = tune()) |>
  set_mode("classification") |> 
  set_engine("keras") |>
  tune_grid(preprocessor = rec_norm, 
            grid = config_grid,
            resamples = split_val,
            metrics = metric_set(accuracy))
tictoc::toc()

This only took about 9 minutes to run, so it was a relatively fast baseline check to make sure the whole process would work well!

We then created repeated k-fold splits (10 folds, 3 repeats) to use in tune_grid()

splits_kfold <- data_trn |> 
  vfold_cv(v = 10, repeats = 3, strata = "quality")

and updated the model fitting code to use splits_kfold as the resamples object inside tune_grid().

tictoc::tic()
fits_nnet <-
  cache_rds(
    expr = {
      mlp(epochs = 30, 
          hidden_units = tune(), 
          penalty = tune(), 
          dropout = tune(),
          activation = tune()) |>
        set_mode("classification") |> 
        set_engine("keras") |>
        tune_grid(preprocessor = rec_norm, 
                  grid = config_grid,
                  resamples = splits_kfold,
                  metrics = metric_set(accuracy))
    },
    dir = "cache/",
    file = "fits_nnet_kfold",
    rerun = rerun_setting)
! Some required packages prohibit parallel processing:  'keras'
tictoc::toc()
5258.51 sec elapsed

Select best model

collect_metrics(fits_nnet, summarize = TRUE) |> 
  plot_hist("mean")

Mean performance across folds for each model configuration.

show_best(fits_nnet)
# A tibble: 5 × 10
  hidden_units penalty dropout activation .metric  .estimator  mean     n
         <dbl>   <dbl>   <dbl> <chr>      <chr>    <chr>      <dbl> <int>
1          100       0     0.5 relu       accuracy binary     0.789    30
2          100       0     0.3 relu       accuracy binary     0.788    30
3           50       0     0.3 relu       accuracy binary     0.788    30
4           40       0     0.3 relu       accuracy binary     0.787    30
5          100       0     0.1 relu       accuracy binary     0.786    30
  std_err .config              
    <dbl> <chr>                
1 0.00380 Preprocessor1_Model84
2 0.00353 Preprocessor1_Model83
3 0.00341 Preprocessor1_Model77
4 0.00368 Preprocessor1_Model71
5 0.00378 Preprocessor1_Model82

Generate predictions

This section generates predictions from our best model in the held-out test set. We generate predictions for only this one model out of all the configurations we considered. We can then use these predictions to generate a single estimate of model performance in new data.

Read in test data

Read in the wine_quality_test.csv file. (We are using the labeled data, but yours would not have labels!)

data_test <- read_csv(here::here(path_data, "wine_quality_test_labeled.csv"),
                      col_types = cols()) |> 
  mutate(quality = factor(quality, 
                          levels = c("low_quality", "high_quality"),
                          labels = c("low quality", "high quality")))

Make feature matrices

Make training and test feature matrices using your best recipe.

feat_trn_best <- rec_norm |> 
  prep(data_trn) |> 
  bake(NULL)

feat_test_best <- rec_norm |> 
  prep(data_trn) |> 
  bake(data_test)

Fit best model

Fit best model in feat_trn_best.

best_config <- select_best(fits_nnet)

best_model <- mlp(epochs = 30,
                  hidden_units = best_config$hidden_units,
                  penalty = best_config$penalty,
                  dropout = best_config$dropout,
                  activation = best_config$activation) |> 
  set_mode("classification") |> 
  set_engine("keras") |> 
  fit(quality ~ ., data = feat_trn_best)
[Keras training output over 30 epochs trimmed for readability; training loss decreased from about 0.61 at epoch 1 to about 0.40 by epoch 30.]

Generate test predictions

Run this code chunk to save the best model’s predictions in the held-out test set.

feat_test_best |> 
  mutate(quality = predict(best_model, feat_test_best)$.pred_class) |> 
  select(quality) |> 
  glimpse() |> 
  write_csv(here::here(path_data, "test_preds_TA.csv"))

Model Performance

Since we have access to the labeled test set, we can also see how our model did:

accuracy_vec(feat_test_best$quality, predict(best_model, feat_test_best)$.pred_class)
[1] 0.7794118
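
We could also look at the confusion matrix in the test set to see where the errors fall (a quick sketch using yardstick's conf_mat()):

feat_test_best |> 
  mutate(.pred_class = predict(best_model, feat_test_best)$.pred_class) |> 
  conf_mat(truth = quality, estimate = .pred_class)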