Homework Unit 10: Neural Networks

Introduction

This file serves as the answer key for the Unit_10 homework. Unit 10 Neural Networks in the course web book contains all materials required for this assignment.

In this assignment, we demonstrate how to fit, tune, and evaluate neural network (multilayer perceptron) models with tidymodels and keras to classify wines as high or low quality.

Setup

Handle conflicts

options(conflicts.policy = "depends.ok")
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_ml.R?raw=true")
tidymodels_conflictRules()
Load required packages
library(tidyverse)
library(tidymodels)
library(xfun, include.only = "cache_rds")
library(keras, exclude = "get_weights")
library(magrittr, exclude = c("set_names", "extract"))
Source function scripts (John’s or your own)
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_plots.R?raw=true")
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_eda.R?raw=true")
Specify other global settings
theme_set(theme_classic())
options(tibble.width = Inf, dplyr.print_max=Inf)
rerun_setting <- FALSE
Paths
path_data <- "homework/unit_10"
Set up parallel processing
Note: you can type cl into your console to see how many cores your computer has.
cl <- parallel::makePSOCKcluster(parallel::detectCores(logical = FALSE))
doParallel::registerDoParallel(cl)
Setting my conda environment manually because multiple environments exist on this machine
use_condaenv("C:/Users/kpaquette2/AppData/Local/r-miniconda/envs/r-reticulate")
Read in data
Read in wine_quality_trn.csv
data_trn <- read_csv(here::here(path_data, "wine_quality_trn.csv"), col_types = cols())

data_trn |>
  skim_some()
Name | data_trn |
Number of rows | 3674 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 11 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
quality | 0 | 1 | 11 | 12 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | p0 | p100 |
---|---|---|---|---|
fixed_acidity | 0 | 1 | 3.90 | 14.20 |
volatile_acidity | 0 | 1 | 0.08 | 1.10 |
citric_acid | 0 | 1 | 0.00 | 1.23 |
residual_sugar | 0 | 1 | 0.60 | 65.80 |
chlorides | 0 | 1 | 0.01 | 0.30 |
free_sulfur_dioxide | 0 | 1 | 2.00 | 289.00 |
total_sulfur_dioxide | 0 | 1 | 9.00 | 440.00 |
density | 0 | 1 | 0.99 | 1.04 |
ph | 0 | 1 | 2.77 | 3.81 |
sulphates | 0 | 1 | 0.23 | 1.08 |
alcohol | 0 | 1 | 8.00 | 14.00 |
data_trn <- data_trn |>
  mutate(quality = factor(quality,
                          levels = c("low_quality", "high_quality"),
                          labels = c("low quality", "high quality")))

data_trn |>
  janitor::tabyl(quality)
quality n percent
low quality 1230 0.334785
high quality 2444 0.665215
Note that:
There are no missing values on any variable.
Our outcome, quality, is a categorical variable with two levels. We loaded it in as a factor, with the positive class (“high quality”) set as the second level. These levels are not perfectly balanced, but they’re also not extremely unbalanced (33.5%/66.5% split).
All our predictors are numeric.
Additional Modeling EDA
Let’s make a single validation split to make some modeling decisions.
split_val <- validation_split(data_trn, prop = 3/4, strata = "quality")
Warning: `validation_split()` was deprecated in rsample 1.2.0.
ℹ Please use `initial_validation_split()` instead.
Look at predictor variable distributions more closely
data_trn |>
  select(where(is.numeric)) |>
  names() |>
  map(\(name) plot_box_violin(df = data_trn, x = name)) |>
  cowplot::plot_grid(plotlist = _, ncol = 3)
All the predictors are positively skewed, with several being very positively skewed. The variables that stand out the most are residual_sugar, chlorides, free_sulfur_dioxide, and density - but we will correct for skew in all our predictors. Even the most normally distributed (alcohol) can be pushed a little bit more towards normal.
Before transforming the distribution of our predictors, we are going to look at two different methods to scale them: step_range() and step_normalize(). Remember, step_range() scales predictors to fall between 0 and 1 (unless you specify other bounds) and step_normalize() standardizes predictors so that they all have a mean of 0 and a standard deviation of 1. We will make the recipes and skim the feature sets below to see how our values change depending on the scaling method.
rec_range <- recipe(quality ~ ., data = data_trn) |>
  step_range(all_predictors())

feat_range_trn <- rec_range |>
  prep(data_trn) |>
  bake(NULL)

feat_range_trn |>
  skim_all()
Name | feat_range_trn |
Number of rows | 3674 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 11 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | n_unique | top_counts |
---|---|---|---|---|
quality | 0 | 1 | 2 | hig: 2444, low: 1230 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | skew | kurtosis |
---|---|---|---|---|---|---|---|---|---|---|---|
fixed_acidity | 0 | 1 | 0.29 | 0.08 | 0 | 0.23 | 0.28 | 0.33 | 1 | 0.67 | 2.40 |
volatile_acidity | 0 | 1 | 0.19 | 0.10 | 0 | 0.13 | 0.18 | 0.24 | 1 | 1.66 | 5.71 |
citric_acid | 0 | 1 | 0.27 | 0.10 | 0 | 0.22 | 0.26 | 0.32 | 1 | 1.13 | 3.89 |
residual_sugar | 0 | 1 | 0.09 | 0.08 | 0 | 0.02 | 0.07 | 0.14 | 1 | 1.21 | 4.71 |
chlorides | 0 | 1 | 0.12 | 0.07 | 0 | 0.08 | 0.11 | 0.13 | 1 | 4.80 | 35.26 |
free_sulfur_dioxide | 0 | 1 | 0.12 | 0.06 | 0 | 0.07 | 0.11 | 0.15 | 1 | 1.56 | 14.09 |
total_sulfur_dioxide | 0 | 1 | 0.30 | 0.10 | 0 | 0.23 | 0.29 | 0.37 | 1 | 0.43 | 0.74 |
density | 0 | 1 | 0.13 | 0.06 | 0 | 0.09 | 0.13 | 0.17 | 1 | 1.20 | 12.72 |
ph | 0 | 1 | 0.40 | 0.14 | 0 | 0.30 | 0.39 | 0.49 | 1 | 0.45 | 0.44 |
sulphates | 0 | 1 | 0.31 | 0.13 | 0 | 0.21 | 0.28 | 0.38 | 1 | 0.96 | 1.42 |
alcohol | 0 | 1 | 0.42 | 0.20 | 0 | 0.25 | 0.40 | 0.57 | 1 | 0.46 | -0.73 |
rec_norm <- recipe(quality ~ ., data = data_trn) |>
  step_normalize(all_predictors())

feat_norm_trn <- rec_norm |>
  prep(data_trn) |>
  bake(NULL)

feat_norm_trn |>
  skim_all()
Name | feat_norm_trn |
Number of rows | 3674 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 11 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | n_unique | top_counts |
---|---|---|---|---|
quality | 0 | 1 | 2 | hig: 2444, low: 1230 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | skew | kurtosis |
---|---|---|---|---|---|---|---|---|---|---|---|
fixed_acidity | 0 | 1 | 0 | 1 | -3.50 | -0.67 | -0.08 | 0.51 | 8.67 | 0.67 | 2.40 |
volatile_acidity | 0 | 1 | 0 | 1 | -1.94 | -0.66 | -0.17 | 0.42 | 8.08 | 1.66 | 5.71 |
citric_acid | 0 | 1 | 0 | 1 | -2.80 | -0.55 | -0.13 | 0.46 | 7.49 | 1.13 | 3.89 |
residual_sugar | 0 | 1 | 0 | 1 | -1.12 | -0.91 | -0.24 | 0.68 | 11.64 | 1.21 | 4.71 |
chlorides | 0 | 1 | 0 | 1 | -1.62 | -0.46 | -0.12 | 0.22 | 12.42 | 4.80 | 35.26 |
free_sulfur_dioxide | 0 | 1 | 0 | 1 | -1.94 | -0.72 | -0.08 | 0.62 | 14.77 | 1.56 | 14.09 |
total_sulfur_dioxide | 0 | 1 | 0 | 1 | -3.01 | -0.73 | -0.12 | 0.70 | 7.05 | 0.43 | 0.74 |
density | 0 | 1 | 0 | 1 | -2.28 | -0.76 | -0.10 | 0.66 | 14.87 | 1.20 | 12.72 |
ph | 0 | 1 | 0 | 1 | -2.78 | -0.72 | -0.05 | 0.61 | 4.14 | 0.45 | 0.44 |
sulphates | 0 | 1 | 0 | 1 | -2.27 | -0.70 | -0.17 | 0.52 | 5.14 | 0.96 | 1.42 |
alcohol | 0 | 1 | 0 | 1 | -2.06 | -0.83 | -0.10 | 0.72 | 2.84 | 0.46 | -0.73 |
Choose scaling method
Let’s choose our scaling method by running two models using our split_val object. We can also check whether we have specified an appropriate number of epochs while examining which of these scaling methods performs better. We’ll also hold out some data as a validation set (via the validation_split engine argument) so that we monitor validation error, not just training error, across epochs.
Fit range-corrected model
set.seed(102030)
fit_seeds <- sample.int(10^5, size = 3)

fit_nnet_range <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("keras",
                 verbose = 1,
                 seeds = fit_seeds,
                 metrics = c("accuracy"),
                 validation_split = .1) |>
      fit_resamples(preprocessor = rec_range,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fit_nnet_range",
  rerun = rerun_setting)
fit_nnet_range$.metrics
[[1]]
# A tibble: 1 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 accuracy binary 0.666 Preprocessor1_Model1
Fit standardized model
fit_nnet_norm <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("keras",
                 verbose = 1,
                 seeds = fit_seeds,
                 metrics = c("accuracy"),
                 validation_split = .1) |>
      fit_resamples(preprocessor = rec_norm,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fit_nnet_norm",
  rerun = rerun_setting)
fit_nnet_norm$.metrics
[[1]]
# A tibble: 1 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 accuracy binary 0.768 Preprocessor1_Model1
The model using step_normalize() does better than the one with step_range(). Let’s go with that one. Now we can explore how performance changes when we also use step_YeoJohnson(). The only predictor that has non-positive values is citric_acid, but we will use step_YeoJohnson() on all predictors so that we are consistent across predictors.
Let’s make our new features and look at the updated distributions of the predictors.
rec_norm_yj <- recipe(quality ~ ., data = data_trn) |>
  step_YeoJohnson(all_predictors()) |>
  step_normalize(all_predictors())

feat_norm_yj_trn <- rec_norm_yj |>
  prep(data_trn) |>
  bake(NULL)

feat_norm_yj_trn |>
  select(where(is.numeric)) |>
  names() |>
  map(\(name) plot_box_violin(df = feat_norm_yj_trn, x = name)) |>
  cowplot::plot_grid(plotlist = _, ncol = 3)
The additional transformation doesn’t dramatically change the shape of all our predictors, but it does improve the shape of some. The next question is whether it enhances the performance of the model. Let’s see!
fit_nnet_norm_yj <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("keras",
                 verbose = 1,
                 seeds = fit_seeds,
                 metrics = c("accuracy"),
                 validation_split = .1) |>
      fit_resamples(preprocessor = rec_norm_yj,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fit_nnet_norm_yj",
  rerun = rerun_setting)
fit_nnet_norm_yj$.metrics
[[1]]
# A tibble: 1 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 accuracy binary 0.770 Preprocessor1_Model1
Adding step_YeoJohnson() improves performance only minimally beyond the model with just step_normalize(). Practically, they are comparable, so let’s go with the simpler model for our purposes here.
Explore decorrelation
Next we will look at correlations among predictors using our simpler feat_norm_trn matrix.
feat_norm_trn |>
  mutate(quality = if_else(quality == "high quality", 1, 0)) |>
  cor() |>
  corrplot::corrplot(type = "upper")
There are some relatively strong correlations. Density is highly correlated with residual sugar, total sulfur dioxide, and alcohol. Total sulfur dioxide is also correlated with free sulfur dioxide.
Let’s try a single PCA extraction across the density, residual sugar, total and free sulfur dioxide, and alcohol variables.
rec_norm_pca <- recipe(quality ~ ., data = data_trn) |>
  step_pca(c(density, residual_sugar, ends_with("dioxide"), alcohol),
           prefix = "density_corr") |>
  step_normalize(all_predictors())

feat_norm_pca_trn <- rec_norm_pca |>
  prep(data_trn) |>
  bake(NULL)

feat_norm_pca_trn |>
  skim_all()
Name | feat_norm_pca_trn |
Number of rows | 3674 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 11 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | n_unique | top_counts |
---|---|---|---|---|
quality | 0 | 1 | 2 | hig: 2444, low: 1230 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | skew | kurtosis |
---|---|---|---|---|---|---|---|---|---|---|---|
fixed_acidity | 0 | 1 | 0 | 1 | -3.50 | -0.67 | -0.08 | 0.51 | 8.67 | 0.67 | 2.40 |
volatile_acidity | 0 | 1 | 0 | 1 | -1.94 | -0.66 | -0.17 | 0.42 | 8.08 | 1.66 | 5.71 |
citric_acid | 0 | 1 | 0 | 1 | -2.80 | -0.55 | -0.13 | 0.46 | 7.49 | 1.13 | 3.89 |
chlorides | 0 | 1 | 0 | 1 | -1.62 | -0.46 | -0.12 | 0.22 | 12.42 | 4.80 | 35.26 |
ph | 0 | 1 | 0 | 1 | -2.78 | -0.72 | -0.05 | 0.61 | 4.14 | 0.45 | 0.44 |
sulphates | 0 | 1 | 0 | 1 | -2.27 | -0.70 | -0.17 | 0.52 | 5.14 | 0.96 | 1.42 |
density_corr1 | 0 | 1 | 0 | 1 | -8.03 | -0.69 | 0.12 | 0.71 | 2.99 | -0.44 | 1.11 |
density_corr2 | 0 | 1 | 0 | 1 | -4.30 | -0.61 | -0.01 | 0.57 | 12.98 | 0.99 | 9.33 |
density_corr3 | 0 | 1 | 0 | 1 | -12.12 | -0.66 | 0.22 | 0.82 | 3.11 | -1.29 | 5.86 |
density_corr4 | 0 | 1 | 0 | 1 | -4.18 | -0.73 | -0.04 | 0.68 | 7.59 | 0.30 | 0.86 |
density_corr5 | 0 | 1 | 0 | 1 | -3.24 | -0.68 | -0.04 | 0.62 | 5.56 | 0.23 | 0.30 |
Five PCA components were extracted (step_pca() keeps 5 components by default). This is the same as the number of features that were fed in, so it doesn’t necessarily help us from a simplicity standpoint (although we weren’t worried about that given our \(p\) to \(n\) ratio). Let’s see how this recipe does in model fitting.
fit_nnet_norm_pca <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("keras",
                 verbose = 1,
                 seeds = fit_seeds,
                 metrics = c("accuracy"),
                 validation_split = .1) |>
      fit_resamples(preprocessor = rec_norm_pca,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fit_nnet_norm_pca",
  rerun = rerun_setting)
fit_nnet_norm_pca$.metrics
[[1]]
# A tibble: 1 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 accuracy binary 0.753 Preprocessor1_Model1
This method didn’t increase accuracy. Let’s keep the simpler recipe, especially given that we have other methods for controlling overfitting due to correlated predictors (e.g., L2 regularization).
Exploring epochs
It’s difficult to demonstrate in the qmd because the figures don’t print within the knitted file, but throughout feature engineering we tested different values for epochs, ranging from 20 to 50. We settled on 30; this value tended to provide asymptotic performance in both accuracy and loss. Note that this can also be achieved (more formally) with a callback, as John demonstrated in the book.
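For reference, here is a minimal sketch of that callback approach (not run in this key). It assumes extra engine arguments such as callbacks are passed through to keras::fit(), reuses the rec_norm recipe and split_val object from above, and uses an illustrative patience value of 5 epochs.

# Sketch only: stop training once validation loss stops improving for 5
# epochs, instead of hand-picking the number of epochs. Assumes engine
# arguments (including callbacks) are passed through to keras::fit().
fit_nnet_early <- mlp(epochs = 100) |>
  set_mode("classification") |>
  set_engine("keras",
             verbose = 1,
             seeds = fit_seeds,
             metrics = c("accuracy"),
             validation_split = .1,
             callbacks = list(keras::callback_early_stopping(monitor = "val_loss",
                                                             patience = 5))) |>
  fit_resamples(preprocessor = rec_norm,
                resamples = split_val,
                metrics = metric_set(accuracy))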
Fit models
Explanation of systematic approach
To compare model configurations systematically, we built a grid (config_grid, below) that contained all the relevant configuration characteristics over which we wished to tune. We then used tune_grid() in the model fitting workflow. Although this might feel like more work/effort/overhead (especially up front), it actually saved a lot of time, because once we confirmed things were working the way we wanted, we could just let the single code chunk run overnight.
Specifically, we used the following approach:

1. Set up a grid tibble that contained all the relevant model configuration characteristics to consider (activation function, number of hidden units, overfitting control techniques & their hyperparameters). We fully crossed all these characteristics, resulting in 84 model configurations. Note that we used 0 as the penalty value when we wanted to consider dropout, and we used 0 as the dropout value when we wanted to consider L2 regularization, because only one of those values can be non-zero in any given model configuration.

2. Confirmed that the number of epochs we selected in the earlier feature engineering exploration (30 epochs) still seemed sufficient for a simple and a complex model in the configuration grid.

3. Set up an analysis workflow with mlp(), set_mode(), set_engine(), and tune_grid(), passing in a split_val object. This allowed us to test the code with a much simpler resampling technique (single held-out set).

4. Created a repeated k-fold CV split object (10 folds, 3 repeats), passed that into the analysis workflow from step 3, and let it run overnight.
This process allowed us to compare systematically across all possible crosses of model configuration characteristics so that we could feel confident that we selected the best option. Had we started by comparing models that varied on one characteristic (e.g., activation function), selected the better activation function, and then proceeded to consider only that activation function across all the hidden unit values, we might have missed the best model configuration. Just because one characteristic performs better in isolation doesn’t mean it will be part of the best overall model configuration.
Note that although the process described here (and shown in demo below) is limited to a few model characteristics, you could imagine expanding this systematic comparison to consider a wider variety of characteristics. It would require more coding (likely some custom functions wrapped around tidymodels functions), but you could have a “master” grid that defines your feature set, statistical algorithm, and even specific recipe steps (e.g., with or without PCA extraction). These values could then be set as arguments within your wrapper function and would control “if” statements inside to build the right recipe, select the right features, and follow the right statistical algorithm’s analysis workflow - and then tune on the various algorithm-specific characteristics.
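To make this concrete, here is a rough sketch of what such a master grid and wrapper might look like. The helper make_rec() and its arguments are hypothetical (not part of the original key); they simply illustrate how grid values could drive the “if” statements that build different recipes.

# Hypothetical sketch: a "master" grid whose rows select recipe variants.
# make_rec() is a made-up helper for illustration only.
make_rec <- function(use_yeojohnson, use_pca) {
  rec <- recipe(quality ~ ., data = data_trn)
  if (use_yeojohnson) {
    rec <- rec |> step_YeoJohnson(all_predictors())
  }
  if (use_pca) {
    rec <- rec |> step_pca(c(density, residual_sugar, ends_with("dioxide"), alcohol),
                           prefix = "density_corr")
  }
  rec |> step_normalize(all_predictors())
}

master_grid <- expand_grid(use_yeojohnson = c(TRUE, FALSE),
                           use_pca = c(TRUE, FALSE))

# Each row of master_grid could then be mapped over (e.g., with purrr::pmap())
# to build its recipe and run the algorithm-specific tuning workflow.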
Build grid
Because only penalty (L2) or dropout can be used in any single model configuration, we’ll set the opposite hyperparameter equal to 0. The simplest way to get the crossing right given this constraint was to build two separate grids and then bind the rows.
config_l2 <- expand_grid(hidden_units = c(5, seq(10, 50, by = 10), 100),
                         activation = c("softmax", "relu"),
                         penalty = c(0.01, 0.001, 0.0001),
                         dropout = 0)

config_dropout <- expand_grid(hidden_units = c(5, seq(10, 50, by = 10), 100),
                              activation = c("softmax", "relu"),
                              penalty = 0,
                              dropout = c(0.1, 0.3, 0.5))

config_grid <- bind_rows(config_l2, config_dropout)
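As a quick sanity check (this output is not part of the original key), each sub-grid crosses 7 hidden-unit values x 2 activation functions x 3 hyperparameter values (42 rows), so the bound grid should contain the 84 configurations noted above.

# 42 L2 configurations + 42 dropout configurations = 84 total
nrow(config_grid)  # expected: 84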
Fit all model configurations
As noted above, 30 epochs appeared sufficient during feature engineering. To confirm this held as the models got more complex, we checked the loss/accuracy plots for:
- The simplest model configuration in the grid: 5 hidden units, softmax activation function, and the highest penalty (0.01)
fits_nnet_simple <- cache_rds(
  expr = {
    mlp(epochs = 30,
        hidden_units = 5,
        penalty = .01,
        activation = "softmax") |>
      set_mode("classification") |>
      set_engine("keras",
                 verbose = 1,
                 seeds = fit_seeds,
                 metrics = c("accuracy"),
                 validation_split = .1) |>
      fit_resamples(preprocessor = rec_norm,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fits_nnet_simple",
  rerun = rerun_setting)
- The most complex model configuration in the grid: 100 hidden units, relu activation function, and the lowest penalty (0.0001)
fits_nnet_complex <- cache_rds(
  expr = {
    mlp(epochs = 30,
        hidden_units = 100,
        penalty = .0001,
        activation = "relu") |>
      set_mode("classification") |>
      set_engine("keras",
                 verbose = 1,
                 metrics = c("accuracy"),
                 validation_split = 0.1) |>
      fit_resamples(preprocessor = rec_norm,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fits_nnet_complex",
  rerun = rerun_setting)
! Some required packages prohibit parallel processing: 'keras'
These checks demonstrated that 30 epochs still produced asymptotic performance in loss and accuracy for both simpler and more complex models.
First we fit with the validation split to make sure everything was working. Note: you can use tictoc::tic() and tictoc::toc() around a code chunk to see how long it takes to run! We are not going to run this code in the knitted file, but it is shown below in case you want to see how to do this.
tictoc::tic()

fits_nnet <-
  mlp(epochs = 30,
      hidden_units = tune(),
      penalty = tune(),
      dropout = tune(),
      activation = tune()) |>
  set_mode("classification") |>
  set_engine("keras") |>
  tune_grid(preprocessor = rec_norm,
            grid = config_grid,
            resamples = split_val,
            metrics = metric_set(accuracy))

tictoc::toc()
This only took about 9 minutes to run, so it was a relatively fast baseline check to make sure the whole process would work well!
We then created repeated k-fold splits (3 x 10-fold) to use in tune_grid()
splits_kfold <- data_trn |>
  vfold_cv(v = 10, repeats = 3, strata = "quality")
and updated the model fitting code to use splits_kfold as the resamples object inside tune_grid().
tictoc::tic()

fits_nnet <- cache_rds(
  expr = {
    mlp(epochs = 30,
        hidden_units = tune(),
        penalty = tune(),
        dropout = tune(),
        activation = tune()) |>
      set_mode("classification") |>
      set_engine("keras") |>
      tune_grid(preprocessor = rec_norm,
                grid = config_grid,
                resamples = splits_kfold,
                metrics = metric_set(accuracy))
  },
  dir = "cache/",
  file = "fits_nnet",
  rerun = rerun_setting)
! Some required packages prohibit parallel processing: 'keras'
tictoc::toc()
5258.51 sec elapsed
Select best model
collect_metrics(fits_nnet, summarize = TRUE) |>
plot_hist("mean")
Mean performance across folds for each model configuration.
show_best(fits_nnet)
# A tibble: 5 × 10
hidden_units penalty dropout activation .metric .estimator mean n
<dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <int>
1 100 0 0.5 relu accuracy binary 0.789 30
2 100 0 0.3 relu accuracy binary 0.788 30
3 50 0 0.3 relu accuracy binary 0.788 30
4 40 0 0.3 relu accuracy binary 0.787 30
5 100 0 0.1 relu accuracy binary 0.786 30
std_err .config
<dbl> <chr>
1 0.00380 Preprocessor1_Model84
2 0.00353 Preprocessor1_Model83
3 0.00341 Preprocessor1_Model77
4 0.00368 Preprocessor1_Model71
5 0.00378 Preprocessor1_Model82
Generate predictions
This section generates predictions from our best model in the held-out test set. We generate predictions for only one model out of all the configurations we considered. We can then use these predictions to generate one estimate of model performance in new data.
Read in test data
Read in the wine_quality_test.csv file. (We are using the labeled data, but yours would not have labels!)
data_test <- read_csv(here::here(path_data, "wine_quality_test_labeled.csv"),
                      col_types = cols()) |>
  mutate(quality = factor(quality,
                          levels = c("low_quality", "high_quality"),
                          labels = c("low quality", "high quality")))
Make feature matrices
Make training and test feature matrices using your best recipe.
feat_trn_best <- rec_norm |>
  prep(data_trn) |>
  bake(NULL)

feat_test_best <- rec_norm |>
  prep(data_trn) |>
  bake(data_test)
Fit best model
Fit best model in feat_trn_best.
best_model <- mlp(epochs = 30,
                  hidden_units = select_best(fits_nnet)$hidden_units,
                  penalty = select_best(fits_nnet)$penalty,
                  dropout = select_best(fits_nnet)$dropout,
                  activation = select_best(fits_nnet)$activation) |>
  set_mode("classification") |>
  set_engine("keras") |>
  fit(quality ~ ., data = feat_trn_best)
Epoch 1/30
115/115 [==============================] - 0s 705us/step - loss: 0.6103
Epoch 2/30
115/115 [==============================] - 0s 682us/step - loss: 0.5071
Epoch 3/30
115/115 [==============================] - 0s 691us/step - loss: 0.4763
Epoch 4/30
115/115 [==============================] - 0s 682us/step - loss: 0.4792
Epoch 5/30
115/115 [==============================] - 0s 682us/step - loss: 0.4709
Epoch 6/30
115/115 [==============================] - 0s 682us/step - loss: 0.4764
Epoch 7/30
115/115 [==============================] - 0s 682us/step - loss: 0.4479
Epoch 8/30
115/115 [==============================] - 0s 691us/step - loss: 0.4450
Epoch 9/30
115/115 [==============================] - 0s 682us/step - loss: 0.4449
Epoch 10/30
115/115 [==============================] - 0s 682us/step - loss: 0.4618
Epoch 11/30
115/115 [==============================] - 0s 682us/step - loss: 0.4600
Epoch 12/30
115/115 [==============================] - 0s 682us/step - loss: 0.4588
Epoch 13/30
115/115 [==============================] - 0s 691us/step - loss: 0.4465
Epoch 14/30
115/115 [==============================] - 0s 682us/step - loss: 0.4358
Epoch 15/30
115/115 [==============================] - 0s 682us/step - loss: 0.4388
Epoch 16/30
115/115 [==============================] - 0s 682us/step - loss: 0.4233
Epoch 17/30
115/115 [==============================] - 0s 682us/step - loss: 0.4301
Epoch 18/30
115/115 [==============================] - 0s 678us/step - loss: 0.4377
Epoch 19/30
115/115 [==============================] - 0s 682us/step - loss: 0.4276
Epoch 20/30
115/115 [==============================] - 0s 674us/step - loss: 0.4201
Epoch 21/30
115/115 [==============================] - 0s 674us/step - loss: 0.4134
Epoch 22/30
115/115 [==============================] - 0s 682us/step - loss: 0.4103
Epoch 23/30
115/115 [==============================] - 0s 682us/step - loss: 0.4075
Epoch 24/30
115/115 [==============================] - 0s 682us/step - loss: 0.4030
Epoch 25/30
115/115 [==============================] - 0s 683us/step - loss: 0.4011
Epoch 26/30
115/115 [==============================] - 0s 690us/step - loss: 0.4046
Epoch 27/30
115/115 [==============================] - 0s 678us/step - loss: 0.3960
Epoch 28/30
115/115 [==============================] - 0s 682us/step - loss: 0.3906
Epoch 29/30
115/115 [==============================] - 0s 678us/step - loss: 0.3950
Epoch 30/30
115/115 [==============================] - 0s 682us/step - loss: 0.3980
Generate test predictions
Run this code chunk to save the best model’s predictions in the held-out test set.
feat_test_best |>
  mutate(quality = predict(best_model, feat_test_best)$.pred_class) |>
  select(quality) |>
  glimpse() |>
  write_csv(here::here(path_data, "test_preds_TA.csv"))
Model Performance
Since we have access to the labeled test set, we can also see how our model did:
accuracy_vec(feat_test_best$quality, predict(best_model, feat_test_best)$.pred_class)
[1] 0.7794118
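Beyond overall accuracy, a confusion matrix gives class-level detail. This is a minimal sketch (not part of the original key) using yardstick's conf_mat() with the same test-set predictions.

# Sketch only: class-level performance of the best model on the labeled test set
tibble(truth = feat_test_best$quality,
       estimate = predict(best_model, feat_test_best)$.pred_class) |>
  conf_mat(truth = truth, estimate = estimate)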