Introduction to Applied Machine Learning
Course Syllabus
Instructor
Teaching Assistants
Class Email Listserv
Communications
Meeting Times
Course Description
Requisites
Learning Outcomes
Course Topics
Required Textbooks and Software
Grading
Projects and Quizzes
Homework
Schedule
Student Ethics
Complaints
Diversity and Inclusion
Accommodations Policy
Reading and Video Assignments
Required Textbooks
Unit 1: Overview of machine learning concepts and uses
Unit 2: Introduction to regression models
Unit 3: Introduction to classification models
Unit 4: Cross validation methods
Unit 5: Subsetting and filtering
Unit 6: Regularization and penalized models
Unit 7: Bootstrapping and permutation tests
Unit 8: Decision trees
Unit 9: Bagging and boosting
Unit 10: Neural networks
Unit 11: NLP: Text processing
Unit 12: NLP: n-grams and bag of words
Unit 13: NLP: Sentiment analysis
Unit 14: NLP: Topic modeling
Homework Assignments
Homework 1: A gentle introduction to tibbles and the Tidyverse
Homework 2: Regression models
Homework 3: Classification models Part 1
Homework 4: Classification models Part 2
Homework 5: Cross-validation
Homework 6: Subsetting and penalized models
1 Overview of Machine Learning Concepts and Uses
1.1 Course Overview
1.2 Discussion of Yarkoni 2017
1.2.1 Goal of scientific psychology is to understand human behavior
1.2.2 Association vs. prediction
1.2.3 Overfitting is a key concern with traditional one-sample statistical approaches
1.2.4 The bias-variance trade-off
1.2.5 Assessing and minimizing prediction error
1.3 Concepts and Definitions (Chapters 1 & 2 in ISL)
1.3.1 An introductory framework for machine learning
1.3.2 More details on supervised techniques
1.3.3 How do we estimate \(f\)?
1.3.4 How do we assess model performance?
1.4 An Empirical Demonstration of Overfitting and the Bias-Variance Trade-off
1.5 Programming Tips
2 Introduction to Regression Models
2.1 The Ames Housing Prices Dataset
2.2 The General Linear Model
2.2.1 Simple Regression
2.2.2 Extension to Multiple Linear Regression
2.2.3 Extension to Categorical Predictors
2.2.4 Extension to Interactive Models
2.2.5 Extension to Non-linear Models
2.3 Combining Sets of Predictors Using \(model.matrix()\)
2.3.1 Other Regression Performance Metrics
2.4 KNN Regression
2.4.1 The hyperparameter k
2.4.2 Defining “Nearest”
2.4.3 Normalization of X
2.4.4 KNN with Ames Housing Prices
2.5 Programming Tips
3 Introduction to Classification Models
3.1 Unit overview
3.2 Bayes classifier
3.3 Logistic regression
3.4 K nearest neighbors
3.5 Linear discriminant analysis
3.6 Quadratic discriminant analysis
3.7 Comparisons between these four classifiers
3.8 A quick tour of many classifiers
3.9 Classification model performance metrics
4 Cross Validation Methods
4.1 Unit Overview
4.2 The single validation (hold-out; test) set approach
4.3 Leave One Out Cross Validation
4.4 K-fold Cross Validation
4.5 Repeated K-fold Cross Validation
4.6 Bootstrapping for Cross Validation
4.7 Grouped K-fold
4.8 Using CV to Select Best Model Configurations
4.9 Caret Pre-processing within CV
4.10 Training, Validation, and Test
4.11 Nested Cross Validation
4.12 Alternative Performance Metrics in \(train()\)
4.13 Data Exploration in a Nested World…
5 Subsetting and Univariate Filters
5.1 Unit Overview
5.2 Subset Selection Methods
5.3 Best Subset Selection
5.4 Forward Stepwise Selection
5.5 Backward Stepwise Selection
5.6 Implementation in Caret
6 Regularization and Penalized Models
6.1 Cost functions
6.2 Intuitions about Penalized Cost Functions and Regularization
6.3 Ridge Regression
6.4 LASSO Regression
6.5 LASSO vs. Ridge Comparison
6.6 Elastic Net Regression
6.7 Hyper-parameter Selection
6.8 Applications in a Sample Dataset
6.9 Ridge, LASSO, and Elastic net models for other Y distributions
7 Bootstrapping Standard Errors and Confidence Intervals
8 Dimensionality Reduction
8.1 Principal Components Regression
8.2 Partial Least Squares
References
9 Natural Language Processing
9.1 Programming Tips
10 Model Comparisons
10.1 Programming Tips
11 Tree-Based Methods, Bagging, Boosting
11.1 Programming Tips
12 Neural Networks
12.1 Programming Tips
13 Support Vector Machines
13.1 Programming Tips
Questions
13.2 Homework 1
13.3 Homework 2
13.3.1 Conceptual
13.3.2 Homework expectations and structure
13.3.3 The Homework
Appendix 1: Data Exploration Techniques
13.4 Sample data set for exploration
13.5 Missing data
13.6 All univariate plots
13.7 All X vs Y plots
13.8 Outliers
13.9 Visualize correlations among variables
Appendix 3: Curated Online Resources
13.10 Plots and visual data exploration
13.11 Data wrangling/Tidyverse and advanced programming concepts
13.12 Statistical learning algorithms
13.13 Performance Metrics
13.14 Caret
13.15 R Markdown
13.16 Cheatsheets
14 Appendix 2: Common Concepts, Terms, and Abbreviations
15 Appendix 5: Exemplar Datasets
15.1 UCI Machine Learning Repository
15.2 mlbench: Machine Learning Benchmark Problems
15.3 Kaggle