14 Appendix 2: Common Concepts, Terms, and Abbreviations
- Bias (of a machine learning model)
- The error that is introduced by approximating a real-life data generating process, which may be extremely complicated, by a much simpler model.
- Bias-variance trade-off
- A property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa (see the decomposition at the end of this appendix).
- Lazy learning
- A learning method in which generalization of the training data is, in theory, delayed until the model needs to make a prediction, as opposed to eager learning, where the system tries to generalize the training data before any predictions are made.
- Overfitting problem
- A model that has small error on the training dataset to which it was fit but much larger error when applied to an independent test set is said to be overfit to the training data. This is common for very flexible models, when the number of predictors $p$ is large relative to the number of observations $n$, and when $n$ is small. We can often introduce some bias into the model in exchange for a large reduction in overfitting, yielding better performance on a test set (see the overfitting sketch at the end of this appendix).
- Variance (of a machine learning model)
- The amount by which $\hat{f}$ would change if we estimated it using a different training dataset.
- Feature
- An individual input variable used by a model to make predictions; also called a predictor, independent variable, or covariate. In a model predicting house prices, for example, square footage would be a feature.
- Predictor
- Another name for a feature: an input variable $X$ used to predict the outcome. "Predictor" is the more common term in statistics, "feature" in machine learning.
- Outcome
- The variable $Y$ that a model is trained to predict; also called the response, target, or dependent variable.
- Feature engineering
- The process of creating, transforming, encoding, or selecting features from raw data (for example, taking logarithms, creating dummy variables, or combining variables) to improve model performance.
- Reducible error
- The portion of the prediction error that arises because $\hat{f}$ is an imperfect estimate of the true function $f$. It can, in principle, be reduced by estimating $f$ more accurately (see the decomposition at the end of this appendix).
- Irreducible error
- The portion of the prediction error attributable to the noise term $\epsilon$, which cannot be predicted from the features. It places a lower bound on the test error of any model, no matter how well $f$ is estimated.
- Parametric statistical models
- Models that assume a functional form for $f$, reducing the problem of estimating $f$ to estimating a fixed, finite set of parameters. Linear regression is an example: it assumes $f$ is linear in its coefficients.
- Non-parametric statistical models
- Models that make no explicit assumption about the functional form of $f$, such as k-nearest neighbors or smoothing splines. They can fit a wider range of shapes but typically require more observations to estimate $f$ accurately.
- Training data set
- The set of observations used to fit a model; the model's parameters are chosen to minimize error on these data.
- Test data set
- A set of observations held out from training and used only once, at the end, to estimate how well the final model performs on new, unseen data.
- Validation data set
- A set of observations, separate from both the training and test sets, used during model development to compare candidate models and tune hyperparameters.
- Training (a machine learning model)
- Estimating a model's parameters from the training data; used interchangeably with fitting. "Training" is the more common term in machine learning.
- Fitting (a machine learning model)
- See training; "fitting" is the more common term in statistics.
- Selecting (a machine learning model)
- Choosing one model from a set of candidates (different algorithms, feature sets, or hyperparameter values), usually by comparing their performance on a validation set or under cross-validation.
- Evaluating (a machine learning model)
- Measuring the performance of the selected, fitted model, ideally on an independent test set, to estimate how it will perform on new data.
- Performance metric
- A quantitative measure of how well a model predicts, such as mean squared error or $R^2$ for regression, or accuracy and AUC for classification.
- (Model) Hyperparameter
- A model setting that is not estimated from the training data during fitting but must be chosen beforehand, such as $k$ in k-nearest neighbors or the penalty strength in ridge regression. Hyperparameters are usually tuned on a validation set or by cross-validation (see the tuning sketch at the end of this appendix).
- Cost function
- The function that a fitting procedure minimizes over the whole training set, typically the average (or sum) of the loss over all observations, possibly plus a regularization penalty (see the loss/cost snippet at the end of this appendix).
- Loss function
- A function that quantifies the error of a single prediction, such as the squared error $(y - \hat{y})^2$ for regression or the log loss for classification.
- Tuning metric
- The performance metric used to compare hyperparameter settings during tuning; the setting that optimizes this metric (for example, minimizes cross-validated RMSE) is selected.
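
The bias, variance, reducible-error, and irreducible-error entries above fit together in the standard decomposition of the expected test error under squared-error loss. For a test point $(x_0, y_0)$ with $y_0 = f(x_0) + \epsilon$:

$$
\mathbb{E}\!\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
= \underbrace{\mathrm{Var}\!\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\!\left(\hat{f}(x_0)\right)\right]^2}_{\text{reducible error}}
+ \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible error}}
$$

The bias-variance trade-off is the observation that making a model flexible enough to shrink the squared-bias term typically inflates the variance term, and vice versa; only the reducible portion can be driven down by better modeling.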
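
Below is a minimal sketch of the overfitting problem on a simulated one-dimensional regression task; the true function, noise level, sample sizes, and polynomial degrees are all illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(x):
    return np.sin(2 * np.pi * x)                 # the true (unknown) f

def simulate(n, noise_sd=0.3):
    x = rng.uniform(0.0, 1.0, n)
    y = truth(x) + rng.normal(0.0, noise_sd, n)  # y = f(x) + eps
    return x, y

x_train, y_train = simulate(15)                  # small n invites overfitting
x_test, y_test = simulate(1000)                  # large independent test set

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

for degree in (1, 3, 9):                         # increasing flexibility
    coefs = np.polyfit(x_train, y_train, degree)
    print(f"degree {degree}: "
          f"train MSE = {mse(y_train, np.polyval(coefs, x_train)):.3f}, "
          f"test MSE = {mse(y_test, np.polyval(coefs, x_test)):.3f}")

# Training MSE falls monotonically with degree, while test MSE is bounded
# below by Var(eps) = 0.09 and blows up once the fit chases the noise.
```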
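
The loss/cost distinction can be made concrete with a toy snippet; the function names and the choice of a ridge-style penalty are our own illustration, not any particular library's API.

```python
import numpy as np

def squared_error_loss(y, y_hat):
    """Loss: scores a single prediction."""
    return (y - y_hat) ** 2

def ridge_cost(beta, X, y, lam=0.1):
    """Cost: mean loss over the whole training set plus an L2 penalty
    (penalizing all coefficients, including the intercept, for brevity)."""
    y_hat = X @ beta
    return np.mean(squared_error_loss(y, y_hat)) + lam * np.sum(beta ** 2)

# Toy usage: a tiny design matrix with an intercept column.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
y = np.array([1.0, 2.0, 3.0])
print(ridge_cost(np.array([0.5, 1.0]), X, y))
```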
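
Finally, a minimal tuning sketch showing how the training, validation, and test sets divide the labor of fitting, selecting, and evaluating; the hyperparameter here is $k$ in k-nearest neighbors, and the split sizes and candidate values of $k$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 200)

# Fit on the training set, select on the validation set,
# evaluate once on the test set.
x_tr, y_tr = x[:100], y[:100]
x_val, y_val = x[100:150], y[100:150]
x_te, y_te = x[150:], y[150:]

def knn_predict(x_new, x_tr, y_tr, k):
    """Predict each point as the mean y of its k nearest training points."""
    return np.array([y_tr[np.argsort(np.abs(x_tr - x0))[:k]].mean()
                     for x0 in np.atleast_1d(x_new)])

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# Tuning metric: validation-set MSE, computed for each candidate k.
val_mse = {k: mse(y_val, knn_predict(x_val, x_tr, y_tr, k))
           for k in (1, 5, 15, 50)}
best_k = min(val_mse, key=val_mse.get)           # model selection
test_mse = mse(y_te, knn_predict(x_te, x_tr, y_tr, best_k))
print(f"selected k = {best_k}, test MSE = {test_mse:.3f}")
```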