Glossary
Binary (or binomial) classification: a classification task whose target can fall into one of two classes.
Classification: a type of supervised learning with a categorical target.
Features: variables (columns) in a dataset.
Observations: instances (rows) in a dataset.
Regression: a type of supervised learning with a numerical target.
Supervised learning: a task in which a model is trained on a training dataset (with known target values) to predict the target for new, unseen data.
Target: the feature we want to predict.
Training dataset: a dataset that we use to train our machine learning algorithm.
Accuracy: the ratio of the number of correct answers to the size of the test set.
Evaluation metrics: the ways to measure the quality of a machine learning model.
Overfitting: the problem when the model's evaluation metric shows good results for the training set and poor results for the test set.
Precision: an evaluation metric that shows what portion of the observations marked as "1" by the model are actually "1".
Recall: an evaluation metric that shows what portion of actual "1" observations was marked as "1" by the model.
Sanity check: the process of comparing our model with a random one to assess whether the model makes sense.
Test set: a set used to test the quality of a trained model.
Underfitting: the problem when the model's evaluation metric shows poor results for both the training set and the test set.
Hyperparameters: settings for learning algorithms that are chosen before training rather than learned from the data.
Parameters: a model's settings, obtained from the training data, that determine how the model makes predictions.
Validation set: a part of the source dataset that is set aside to check an algorithm's quality during a model's training.
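
The definitions of accuracy, precision, and recall above can be illustrated with a small worked example. This is a minimal sketch in plain Python, assuming binary labels coded as 0 and 1. The function names and the sample answers are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual 1, marked 1
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual 0, marked 1
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual 1, marked 0
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual 0, marked 0
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Correct answers divided by the size of the test set."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    """Portion of observations marked as 1 that are actually 1."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Portion of actual 1 observations that were marked as 1."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical test-set answers and model predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

print(accuracy(y_true, y_pred))   # 0.625 (5 correct answers out of 8)
print(precision(y_true, y_pred))  # 0.6   (3 of the 5 marked "1" are actual "1")
print(recall(y_true, y_pred))     # 0.75  (3 of the 4 actual "1" were marked "1")
```

The three numbers differ on the same predictions: the model finds most of the actual "1" observations (recall), but it also marks two "0" observations as "1", which lowers precision.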