Homework Unit 10: Neural Networks

Author

TA Key

Published

March 20, 2026

Introduction

To begin, download the following from the course web book (Unit 10):

  • hw_unit_10_neural_nets.qmd (notebook for this assignment)

  • wine_quality_trn.csv and wine_quality_test.csv (data for this assignment)

The data for this week’s assignment include wine quality ratings for various white wines. The outcome is quality, a categorical variable (“high_quality” or “low_quality”). There are 11 numeric predictors describing attributes of the wine (e.g., acidity) that may relate to the overall quality rating.

We will be doing another competition this week. You will fit a series of neural network model configurations and select the best configuration among them. You will use the wine_quality_test.csv file only once at the end of the assignment to generate your best model’s predictions in the held-out test set (the test set will not include outcome labels). We will assess everyone’s predictions on the held-out set to determine the best-fitting model in the class. The winner again gets a free lunch from John!

Note that this week’s assignment is less structured than previous assignments, allowing you to make more independent decisions about how to approach all aspects of the modeling process. Try to use the knowledge that you have developed from the work in previous assignments to explore your data, generate features, and examine different model configurations.

For this assignment, you will fit neural networks using set_engine("brulee"). The brulee package fits neural networks via the torch package, so everything runs within R (no Python installation is required). If you haven’t already installed it, run install.packages("brulee") once before starting.
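To orient yourself before the assignment, a minimal brulee model specification looks something like the sketch below (the hyperparameter values here are illustrative placeholders, not recommendations):

```r
library(tidymodels)

# Minimal mlp() specification using the brulee engine.
# hidden_units, penalty, and epochs values are placeholders only.
nnet_spec <- mlp(hidden_units = 10, penalty = 0.001, epochs = 30) |>
  set_mode("classification") |>
  set_engine("brulee")
```

You will build on this pattern throughout the assignment by tuning these arguments rather than fixing them.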

Let’s get started!

Setup

Set up your notebook in this section. You will want to be sure to set your path_data and initiate parallel processing here! The brulee package should be installed (run install.packages("brulee") if needed) but does not need to be explicitly loaded — it is accessed through tidymodels/parsnip.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.4.1 ──
✔ broom        1.0.12     ✔ rsample      1.3.2 
✔ dials        1.4.2      ✔ tailor       0.1.0 
✔ infer        1.1.0      ✔ tune         2.0.1 
✔ modeldata    1.5.1      ✔ workflows    1.3.0 
✔ parsnip      1.4.1      ✔ workflowsets 1.1.1 
✔ recipes      1.3.1      ✔ yardstick    1.3.2 
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
options(conflicts.policy = "depends.ok")

library(xfun, include.only = "cache_rds")
library(future)

devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_plots.R?raw=true")
ℹ SHA-1 hash of file is "def6ce26ed7b2493931fde811adff9287ee8d874"
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_eda.R?raw=true")
ℹ SHA-1 hash of file is "c045eee2655a18dc85e715b78182f176327358a7"
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/fun_ml.R?raw=true")
ℹ SHA-1 hash of file is "32a0bc8ced92c79756b56ddcdc9a06e639795da6"
theme_set(theme_classic())
options(tibble.width = Inf, dplyr.print_max = Inf)
rerun_setting <- TRUE

path_data <- "application_assignments/unit_10"

plan(multisession, workers = parallel::detectCores(logical = FALSE))

Expectations for minimum model requirements

Across the model configurations you compare, you must fit models that vary with respect to:

  • Hidden layer architecture: Consider at least two different numbers of units within the hidden layer.

  • Controlling overfitting: Consider L2 regularization (using the penalty argument in mlp()) with at least two associated hyperparameter values.

  • Hidden layer activation function: Consider at least two activation functions for the hidden layer.

Some things to consider

About configurations

Of course you can’t choose between model configurations based on training set performance, so you’ll need to use some resampling technique. Different resampling techniques carry different costs and benefits (e.g., bias, variance, computing time), so you’ll need to decide which technique best fits your needs for this assignment. Specifically, you should choose among a validation split, k-fold cross-validation, or bootstrap resampling.
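For reference, each of these resampling objects can be created with rsample; a sketch (the proportions, fold counts, and bootstrap count below are illustrative, not prescriptions):

```r
library(tidymodels)
set.seed(123)

# Single validation split (cheapest, highest variance); one Monte Carlo split
splits_val   <- mc_cv(data_trn, prop = 3/4, times = 1, strata = "quality")

# 10-fold cross-validation (a common middle ground)
splits_kfold <- vfold_cv(data_trn, v = 10, strata = "quality")

# Bootstrap resampling (lowest variance, most compute-intensive)
splits_boot  <- bootstraps(data_trn, times = 100, strata = "quality")
```

Any of these objects can be passed to fit_resamples() or tune_grid() via the resamples argument.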

Resampling and tuning

Given what you’ve learned about resampling/tuning, think about how you might move systematically through model configurations rather than haphazardly changing model configuration characteristics.

Consider compute time

As you weigh computing costs, think about what that might look like in your current context. For example, imagine that each model configuration takes 2 minutes to fit. With 100 bootstraps, that means ~200 minutes (just over 3 hours) per model configuration. Comparing 8 different model configurations multiplies those 200 minutes by 8 — roughly 27 hours total, which starts to get pretty long. With 10-fold CV, each model configuration takes only ~20 minutes, so you might be able to compare more configurations. A single validation split would be cheaper still. Weigh the costs and benefits of each approach, pick one, and motivate it with commentary in your submission.
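The back-of-the-envelope arithmetic above can be sketched in R (the 2-minute fit time and 8 configurations are the same assumptions used in the text):

```r
fit_minutes <- 2     # assumed time to fit one configuration once
n_configs   <- 8     # assumed number of configurations to compare

# Total minutes for each resampling choice
boot_total  <- fit_minutes * 100 * n_configs  # 100 bootstraps
kfold_total <- fit_minutes * 10  * n_configs  # 10-fold CV
val_total   <- fit_minutes * 1   * n_configs  # single validation split

c(bootstrap = boot_total, kfold = kfold_total, validation = val_total)
#> bootstrap     kfold validation
#>      1600       160         16
```

In practice, fit time also grows with hidden_units and epochs, so these numbers are a lower bound on the real budget.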

Performance metric: accuracy

Regardless of the resampling technique you choose, please compare models using accuracy as your performance metric.

Okay, now onto some EDA and then modeling…

Feature engineering

Read in wine_quality_trn.csv

data_trn <- read_csv(here::here(path_data, "wine_quality_trn.csv"),
                     col_types = cols()) |>
  mutate(quality = factor(quality,
                          levels = c("low_quality", "high_quality"),
                          labels = c("low quality", "high quality")))

Perform some modeling EDA. How will you scale your features? Should they be range-scaled, standardized, or transformed in some other way? Decorrelated? Explain your decisions here, along with as much EDA as you find necessary.

# Quick look at the data
data_trn |> glimpse()
Rows: 3,674
Columns: 12
$ fixed_acidity        <dbl> 7.0, 6.3, 7.2, 7.2, 8.1, 7.0, 6.3, 8.1, 7.9, 6.6,…
$ volatile_acidity     <dbl> 0.27, 0.30, 0.23, 0.23, 0.28, 0.27, 0.30, 0.27, 0…
$ citric_acid          <dbl> 0.36, 0.34, 0.32, 0.32, 0.40, 0.36, 0.34, 0.41, 0…
$ residual_sugar       <dbl> 20.70, 1.60, 8.50, 8.50, 6.90, 20.70, 1.60, 1.45,…
$ chlorides            <dbl> 0.045, 0.049, 0.058, 0.058, 0.050, 0.045, 0.049, …
$ free_sulfur_dioxide  <dbl> 45, 14, 47, 47, 30, 45, 14, 11, 16, 48, 41, 28, 3…
$ total_sulfur_dioxide <dbl> 170, 132, 186, 186, 97, 170, 132, 63, 75, 143, 17…
$ density              <dbl> 1.0010, 0.9940, 0.9956, 0.9956, 0.9951, 1.0010, 0…
$ ph                   <dbl> 3.00, 3.30, 3.19, 3.19, 3.26, 3.00, 3.30, 2.99, 3…
$ sulphates            <dbl> 0.45, 0.49, 0.40, 0.40, 0.44, 0.45, 0.49, 0.56, 0…
$ alcohol              <dbl> 8.8, 9.5, 9.9, 9.9, 10.1, 8.8, 9.5, 12.0, 10.8, 1…
$ quality              <fct> high quality, high quality, high quality, high qu…
data_trn |> tab(quality)
# A tibble: 2 × 3
  quality          n  prop
  <fct>        <int> <dbl>
1 low quality   1230 0.335
2 high quality  2444 0.665
# Check for zero-variance predictors
data_trn |>
  select(-quality) |>
  summarise(across(everything(), sd)) |>
  pivot_longer(everything(), names_to = "predictor", values_to = "sd") |>
  filter(sd == 0)
# A tibble: 0 × 2
# ℹ 2 variables: predictor <chr>, sd <dbl>
# No zero-variance predictors with this dataset, so step_zv is not critical
# but we include it as good practice
# Check predictor distributions to decide on scaling approach
data_trn |>
  select(-quality) |>
  pivot_longer(everything(), names_to = "predictor", values_to = "value") |>
  ggplot(aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~predictor, scales = "free") +
  labs(title = "Distribution of predictors")

Gradient descent works better with inputs on the same scale, and L2 regularization penalizes all weights equally, so predictors should have comparable variance. We will compare a few scaling strategies: range scaling (0-1), standardization (step_normalize), and standardization with a Yeo-Johnson transformation to handle skewness. We will use a validation split to select the best feature engineering approach before full hyperparameter tuning.

# Recipe 1: Range scaling (0-1)
rec_range <- recipe(quality ~ ., data = data_trn) |>
  step_range(all_predictors())

# Recipe 2: Standardization (zero mean, unit variance) -- our primary candidate
rec_norm <- recipe(quality ~ ., data = data_trn) |>
  step_normalize(all_predictors())

# Recipe 3: Standardization + Yeo-Johnson to handle skewed predictors
rec_norm_yj <- recipe(quality ~ ., data = data_trn) |>
  step_YeoJohnson(all_predictors()) |>
  step_normalize(all_predictors())
# Validation split for screening feature engineering choices
set.seed(102030)
split_val <- validation_split(data_trn, prop = 3/4, strata = "quality")
Warning: `validation_split()` was deprecated in rsample 1.2.0.
ℹ Please use `initial_validation_split()` instead.
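To avoid the deprecated function, rsample offers initial_validation_split() plus validation_set(); a sketch of a roughly equivalent setup (note this creates a three-way split, so the proportions below, which reserve a small unused test portion, are an assumption rather than an exact match for validation_split()):

```r
library(rsample)
set.seed(102030)

# Three-way split; only the training + validation portions are used here.
# prop = c(0.75, 0.125) leaves 12.5% as an (unused) test portion.
split_3way <- initial_validation_split(data_trn, prop = c(0.75, 0.125),
                                       strata = quality)
split_val  <- validation_set(split_3way)  # rset usable with fit_resamples()
```

Either interface produces an rset object that fit_resamples() accepts.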
# Compare feature engineering approaches with a default mlp
fit_nnet_range <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("brulee") |>
      fit_resamples(preprocessor = rec_range,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/010/",
  file = "fit_nnet_range",
  rerun = rerun_setting)

fit_nnet_norm <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("brulee") |>
      fit_resamples(preprocessor = rec_norm,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/010/",
  file = "fit_nnet_norm",
  rerun = rerun_setting)

fit_nnet_norm_yj <- cache_rds(
  expr = {
    mlp(epochs = 30) |>
      set_mode("classification") |>
      set_engine("brulee") |>
      fit_resamples(preprocessor = rec_norm_yj,
                    resamples = split_val,
                    metrics = metric_set(accuracy))
  },
  dir = "cache/010/",
  file = "fit_nnet_norm_yj",
  rerun = rerun_setting)
# Compare feature engineering accuracy
collect_metrics(fit_nnet_range)
# A tibble: 1 × 6
  .metric  .estimator  mean     n std_err .config        
  <chr>    <chr>      <dbl> <int>   <dbl> <chr>          
1 accuracy binary     0.751     1      NA pre0_mod0_post0
collect_metrics(fit_nnet_norm)
# A tibble: 1 × 6
  .metric  .estimator  mean     n std_err .config        
  <chr>    <chr>      <dbl> <int>   <dbl> <chr>          
1 accuracy binary     0.779     1      NA pre0_mod0_post0
collect_metrics(fit_nnet_norm_yj)
# A tibble: 1 × 6
  .metric  .estimator  mean     n std_err .config        
  <chr>    <chr>      <dbl> <int>   <dbl> <chr>          
1 accuracy binary     0.779     1      NA pre0_mod0_post0

Standardization (rec_norm) performs best or comparably to the other approaches and is the simplest. We will proceed with rec_norm for all subsequent hyperparameter tuning.

Fit models

Depending on the resampling technique you choose and how you plan to examine various model configurations, your code set-up will look different. Create as many code chunks as you need for your approach.

Fit all models using set_engine("brulee"). The key tunable hyperparameters available through brulee are hidden_units, activation (options: "relu", "tanh", "sigmoid", "elu", "linear"), penalty (L2 regularization), and epochs. Refer to the Unit 10 lecture for example model fits.

Where needed, please include some annotation (in text outside of code chunks and/or in commented lines within code chunks) to help us review your assignment. You don’t need to tell us everything you’re doing because that should be relatively clear from the code! A good rule of thumb is to use annotation to tell us why you’re doing something (e.g., if you had several choices for how to proceed, why did you choose the one you did?) but you don’t have to describe what you’re doing (e.g., you don’t need to tell us you’re building a recipe - we will see that in your code).

We will use 10-fold cross-validation with 3 repeats (stratified on quality) for hyperparameter tuning. This provides a lower-variance performance estimate than a single validation split, at the cost of compute time. The dataset is small enough to make this tractable, so we prefer it over a single validation split (higher variance) or bootstrapping (considerably more compute-intensive for little additional benefit here).

# 10-fold CV with 3 repeats for full hyperparameter tuning
set.seed(102030)
splits_kfold <- data_trn |>
  vfold_cv(v = 10, repeats = 3, strata = "quality")

We build a grid that covers:

  • Hidden units: 5, 10, 20, 30, 40, 50, 100 (a wide range from simple to complex)

  • Activation: relu and tanh (two distinct function families to meet the assignment requirement)

  • Penalty: 0.0001 and 0.001 (larger values can cause numerical overflow with brulee)

config_grid <- expand_grid(
  hidden_units = c(5, seq(10, 50, by = 10), 100),
  activation   = c("relu", "tanh"),
  penalty      = c(0.0001, 0.001)
)

nrow(config_grid)  # number of configurations
[1] 28
fits_nnet <- cache_rds(
  expr = {
    mlp(epochs      = 30,
        hidden_units = tune(),
        penalty      = tune(),
        activation   = tune()) |>
      set_mode("classification") |>
      set_engine("brulee") |>
      tune_grid(preprocessor = rec_norm,
                grid         = config_grid,
                resamples    = splits_kfold,
                metrics      = metric_set(accuracy))
  },
  dir    = "cache/010/",
  file   = "fits_nnet",
  rerun  = rerun_setting)
→ A | warning: Early stopping occurred at epoch 1 due to numerical overflow of the loss
               function.
→ B | error:   Could not find model.0.weight in the state_dict.
→ C | warning: Early stopping occurred at epoch 2 due to numerical overflow of the loss
               function.
→ D | warning: Early stopping occurred at epoch 13 due to numerical overflow of the loss
               function.
→ A | warning: Early stopping occurred at epoch 3 due to numerical overflow of the loss
               function.
→ A | warning: Early stopping occurred at epoch 6 due to numerical overflow of the loss
               function.
→ A | warning: Early stopping occurred at epoch 5 due to numerical overflow of the loss
               function.
→ A | warning: Early stopping occurred at epoch 4 due to numerical overflow of the loss
               function.
→ B | warning: Early stopping occurred at epoch 2 due to numerical overflow of the loss
               function.
# Review top configurations
show_best(fits_nnet, n = 10)
Warning in show_best(fits_nnet, n = 10): No value of `metric` was given;
"accuracy" will be used.
# A tibble: 10 × 9
   hidden_units penalty activation .metric  .estimator  mean     n std_err
          <dbl>   <dbl> <chr>      <chr>    <chr>      <dbl> <int>   <dbl>
 1           50  0.0001 relu       accuracy binary     0.779    30 0.00421
 2          100  0.0001 relu       accuracy binary     0.777    30 0.00355
 3          100  0.001  relu       accuracy binary     0.777    30 0.00449
 4           40  0.001  relu       accuracy binary     0.775    30 0.00397
 5           30  0.001  relu       accuracy binary     0.774    30 0.00466
 6           50  0.001  relu       accuracy binary     0.774    30 0.00418
 7           40  0.0001 relu       accuracy binary     0.773    30 0.00438
 8           10  0.0001 relu       accuracy binary     0.773    30 0.00402
 9           20  0.001  relu       accuracy binary     0.772    30 0.00321
10           20  0.0001 relu       accuracy binary     0.771    30 0.00335
   .config         
   <chr>           
 1 pre0_mod21_post0
 2 pre0_mod25_post0
 3 pre0_mod27_post0
 4 pre0_mod19_post0
 5 pre0_mod15_post0
 6 pre0_mod23_post0
 7 pre0_mod17_post0
 8 pre0_mod05_post0
 9 pre0_mod11_post0
10 pre0_mod09_post0
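Beyond show_best(), it can help to visualize accuracy across the whole grid to see how performance varies with each hyperparameter; a sketch using the collected metrics:

```r
# Mean CV accuracy for each configuration, by penalty and activation
fits_nnet |>
  collect_metrics() |>
  filter(.metric == "accuracy") |>
  ggplot(aes(x = hidden_units, y = mean,
             color = factor(penalty), linetype = activation)) +
  geom_line() +
  geom_point() +
  labs(y = "Accuracy (CV mean)", color = "penalty")
```

autoplot(fits_nnet, metric = "accuracy") produces a similar default plot with less control over aesthetics.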

Generate predictions

This section is for generating predictions for your best model in the held-out test set. You should only generate predictions for one model out of all configurations you examined. We will use these predictions to generate your one estimate of model performance in the held-out data.

Add your last name between the quotation marks to be used in naming your predictions file.

last_name <- "TA"

Read in test data

Read in the wine_quality_test.csv file.

data_test <- read_csv(here::here(path_data, "wine_quality_test.csv"),
                      col_types = cols())

Make feature matrices

Make training feature matrix using your best recipe.

feat_trn_best <- rec_norm |>
  prep(data_trn) |>
  bake(new_data = NULL)

Make test feature matrix using your best recipe.

feat_test_best <- rec_norm |>
  prep(data_trn) |>
  bake(data_test)

Fit best model

Fit best configuration (selected via select_best()) in the full training set (no resampling).

best_config <- select_best(fits_nnet, metric = "accuracy")

best_model <- mlp(epochs       = 30,
                  hidden_units = best_config$hidden_units,
                  penalty      = best_config$penalty,
                  activation   = best_config$activation) |>
  set_mode("classification") |>
  set_engine("brulee") |>
  fit(quality ~ ., data = feat_trn_best)

Generate test predictions

Run this code chunk to save out your best model’s predictions in the held-out test set. Look over your predictions to confirm your model generated a valid prediction for each test observation, and make sure the file containing your predictions has the form you expect. This requires visually inspecting the output csv file after you write it; the glimpse() call helps too.

if (!is.null(best_model)) {
  feat_test_best |>
    mutate(quality = predict(best_model, feat_test_best)$.pred_class) |>
    select(quality) |>
    glimpse() |>
    write_csv(here::here(path_data, str_c("test_preds_", last_name, ".csv")))
}
Rows: 1,224
Columns: 1
$ quality <fct> high quality, low quality, high quality, low quality, high qua…

Save & render

Render this .qmd file with your last name at the end (e.g., hw_unit_10_neural_nets_name.qmd). Make sure you changed “Your name here” at the top of the file to be your own name. Render the file to html. Upload the rendered file and your saved test_preds_lastname.csv to Canvas.

We will assess everyone’s predictions on the held-out test set, and the winner gets a free lunch from John!

Way to go!!