14 Appendix 2: Common Concepts, Terms, and Abbreviations
- Bias (of a machine learning model)
- The error that is introduced by approximating a real-life data generating process, which may be extremely complicated, by a much simpler model.
- Bias-variance trade-off
- A property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa (see the decomposition at the end of this appendix).
- Lazy learning
- A learning method in which generalization of the training data is, in theory, delayed until the model needs to make a prediction, as opposed to eager learning, where the system tries to generalize the training data before any predictions are made.
- Overfitting problem
- A model that has small error on the training dataset to which it was fit but much larger error when applied to an independent test set is said to be overfit to the training data. This is common for very flexible models, when the number of predictors $p$ is large relative to the number of observations $n$, and when $n$ is small. We can often introduce some bias into the model in exchange for a large reduction in overfitting, yielding better performance on a test set (see the overfitting sketch at the end of this appendix).
- Variance (of a machine learning model)
- The amount by which $\hat{f}$ would change if we estimated it using a different training dataset.
- Feature
- An individual input variable used by a model to make predictions; also called a predictor, independent variable, or covariate. In a model predicting house prices, for example, square footage would be a feature.
- Predictor
- Another name for a feature: an input variable $X$ used to predict the outcome. "Predictor" is the more common term in statistics, "feature" in machine learning.
- Outcome
- The variable $Y$ that a model is trained to predict; also called the response, target, or dependent variable.
- Feature engineering
- The process of creating, transforming, encoding, or selecting features from raw data (for example, taking logarithms, creating dummy variables, or combining variables) to improve model performance.
- Reducible error
- The portion of the prediction error that arises because $\hat{f}$ is an imperfect estimate of the true function $f$. It can, in principle, be reduced by estimating $f$ more accurately (see the decomposition at the end of this appendix).
- Irreducible error
- The portion of the prediction error attributable to the noise term $\epsilon$, which cannot be predicted from the features. It places a lower bound on the test error of any model, no matter how well $f$ is estimated.
- Parametric statistical models
- Models that assume a functional form for $f$, reducing the problem of estimating $f$ to estimating a fixed, finite set of parameters. Linear regression is an example: it assumes $f$ is linear in its coefficients.
- Non-parametric statistical models
- Models that make no explicit assumption about the functional form of $f$, such as k-nearest neighbors or smoothing splines. They can fit a wider range of shapes but typically require more observations to estimate $f$ accurately.
- Training data set
- The set of observations used to fit a model; the model's parameters are chosen to minimize error on these data.
- Test data set
- A set of observations held out from training and used only once, at the end, to estimate how well the final model performs on new, unseen data.
- Validation data set
- A set of observations, separate from both the training and test sets, used during model development to compare candidate models and tune hyperparameters.
- Training (a machine learning model)
- Estimating a model's parameters from the training data; used interchangeably with fitting. "Training" is the more common term in machine learning.
- Fitting (a machine learning model)
- See training; "fitting" is the more common term in statistics.
- Selecting (a machine learning model)
- Choosing one model from a set of candidates (different algorithms, feature sets, or hyperparameter values), usually by comparing their performance on a validation set or under cross-validation.
- Evaluating (a machine learning model)
- Measuring the performance of the selected, fitted model, ideally on an independent test set, to estimate how it will perform on new data.
- Performance metric
- A quantitative measure of how well a model predicts, such as mean squared error or $R^2$ for regression, or accuracy and AUC for classification.
- (Model) Hyperparameter
- A model setting that is not estimated from the training data during fitting but must be chosen beforehand, such as $k$ in k-nearest neighbors or the penalty strength in ridge regression. Hyperparameters are usually tuned on a validation set or by cross-validation (see the tuning sketch at the end of this appendix).
- Cost function
- The function that a fitting procedure minimizes over the whole training set, typically the average (or sum) of the loss over all observations, possibly plus a regularization penalty (see the loss/cost snippet at the end of this appendix).
- Loss function
- A function that quantifies the error of a single prediction, such as the squared error $(y - \hat{y})^2$ for regression or the log loss for classification.
- Tuning metric
- The performance metric used to compare hyperparameter settings during tuning; the setting that optimizes this metric (for example, minimizes cross-validated RMSE) is selected.
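
The bias, variance, reducible-error, and irreducible-error entries above fit together in the standard decomposition of the expected test error under squared-error loss. For a test point $(x_0, y_0)$ with $y_0 = f(x_0) + \epsilon$:

$$
\mathbb{E}\!\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
= \underbrace{\mathrm{Var}\!\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\!\left(\hat{f}(x_0)\right)\right]^2}_{\text{reducible error}}
+ \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible error}}
$$

The bias-variance trade-off is the observation that making a model flexible enough to shrink the squared-bias term typically inflates the variance term, and vice versa; only the reducible portion can be driven down by better modeling.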
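
Below is a minimal sketch of the overfitting problem on a simulated one-dimensional regression task; the true function, noise level, sample sizes, and polynomial degrees are all illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(x):
    return np.sin(2 * np.pi * x)                 # the true (unknown) f

def simulate(n, noise_sd=0.3):
    x = rng.uniform(0.0, 1.0, n)
    y = truth(x) + rng.normal(0.0, noise_sd, n)  # y = f(x) + eps
    return x, y

x_train, y_train = simulate(15)                  # small n invites overfitting
x_test, y_test = simulate(1000)                  # large independent test set

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

for degree in (1, 3, 9):                         # increasing flexibility
    coefs = np.polyfit(x_train, y_train, degree)
    print(f"degree {degree}: "
          f"train MSE = {mse(y_train, np.polyval(coefs, x_train)):.3f}, "
          f"test MSE = {mse(y_test, np.polyval(coefs, x_test)):.3f}")

# Training MSE falls monotonically with degree, while test MSE is bounded
# below by Var(eps) = 0.09 and blows up once the fit chases the noise.
```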
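
The loss/cost distinction can be made concrete with a toy snippet; the function names and the choice of a ridge-style penalty are our own illustration, not any particular library's API.

```python
import numpy as np

def squared_error_loss(y, y_hat):
    """Loss: scores a single prediction."""
    return (y - y_hat) ** 2

def ridge_cost(beta, X, y, lam=0.1):
    """Cost: mean loss over the whole training set plus an L2 penalty
    (penalizing all coefficients, including the intercept, for brevity)."""
    y_hat = X @ beta
    return np.mean(squared_error_loss(y, y_hat)) + lam * np.sum(beta ** 2)

# Toy usage: a tiny design matrix with an intercept column.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
y = np.array([1.0, 2.0, 3.0])
print(ridge_cost(np.array([0.5, 1.0]), X, y))
```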
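
Finally, a minimal tuning sketch showing how the training, validation, and test sets divide the labor of fitting, selecting, and evaluating; the hyperparameter here is $k$ in k-nearest neighbors, and the split sizes and candidate values of $k$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 200)

# Fit on the training set, select on the validation set,
# evaluate once on the test set.
x_tr, y_tr = x[:100], y[:100]
x_val, y_val = x[100:150], y[100:150]
x_te, y_te = x[150:], y[150:]

def knn_predict(x_new, x_tr, y_tr, k):
    """Predict each point as the mean y of its k nearest training points."""
    return np.array([y_tr[np.argsort(np.abs(x_tr - x0))[:k]].mean()
                     for x0 in np.atleast_1d(x_new)])

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# Tuning metric: validation-set MSE, computed for each candidate k.
val_mse = {k: mse(y_val, knn_predict(x_val, x_tr, y_tr, k))
           for k in (1, 5, 15, 50)}
best_k = min(val_mse, key=val_mse.get)           # model selection
test_mse = mse(y_te, knn_predict(x_te, x_tr, y_tr, best_k))
print(f"selected k = {best_k}, test MSE = {test_mse:.3f}")
```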