Glossary

Binary (or binomial) classification: a classification task whose target can fall into one of only two classes.

Classification: a type of supervised learning with a categorical target.

Features: variables (columns) in a dataset.

Observations: instances (rows) in a dataset.

Regression: a type of supervised learning with a numerical target.

Supervised learning: a task where you need to train a model using a training dataset (with a known target value) to predict the target for unknown data.

Target: the feature we want to predict.

Training dataset: a dataset that we use to train our machine learning algorithm.

Accuracy: the ratio of the number of correct answers to the size of the test set.
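
As a minimal sketch of the accuracy definition above, the ratio can be computed by hand. The function name `accuracy` and the sample lists are illustrative assumptions, not part of any particular library.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the known answers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("the test set must not be empty")
    # Count positions where the prediction equals the known answer.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test-set answers and model predictions (made up for illustration).
test_answers = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]
print(accuracy(test_answers, predictions))  # 3 correct answers out of 5
```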

Evaluation metrics: the ways to measure the quality of a machine learning model.

Overfitting: a problem in which the model's evaluation metric shows good results for the training set but poor results for the test set.

Precision: an evaluation metric that shows the ratio of the number of observations correctly marked as "1" by the model (those whose actual answer is "1") to the total number of observations marked as "1" by the model.

Recall: an evaluation metric that shows what portion of actual "1" observations was marked as "1" by the model.
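
The difference between precision and recall is easier to see when both are computed from the same counts. The sketch below assumes binary labels encoded as 0 and 1; the function name `precision_recall` and the sample data are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    # True positives: actual "1" that the model marked as "1".
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    # False positives: actual "0" that the model marked as "1".
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # False negatives: actual "1" that the model marked as "0".
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumption: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: four actual "1" observations, three marked as "1" by the model.
actual = [1, 1, 1, 1, 0, 0]
marked = [1, 1, 0, 0, 1, 0]
precision, recall = precision_recall(actual, marked)
print(precision, recall)  # precision = 2/3, recall = 2/4
```

Here the model marks three observations as "1" and two of them are right, so precision is 2/3. It finds two of the four actual "1" observations, so recall is 2/4.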

Sanity check: the process of comparing our model with a random one to assess whether the model makes sense.
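
One way such a sanity check could look, as a sketch: score a "model" that guesses 0 or 1 at random and require the real model to beat it. The helper names, the use of accuracy as the metric, and the fixed seed are all assumptions made for this illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Share of predictions that match the known answers."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def random_baseline_accuracy(y_true, seed=0):
    """Accuracy of a 'model' that guesses 0 or 1 uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    guesses = [rng.randint(0, 1) for _ in y_true]
    return accuracy(y_true, guesses)


def passes_sanity_check(y_true, y_pred, seed=0):
    """True if the model's accuracy is strictly better than the random baseline."""
    return accuracy(y_true, y_pred) > random_baseline_accuracy(y_true, seed)


# Hypothetical test-set answers and model predictions.
test_answers = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predictions = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(passes_sanity_check(test_answers, predictions))
```

With imbalanced classes, a constant "always predict the most frequent class" baseline is often a stricter comparison than random guessing.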

Test set: a set used to test the quality of a trained model.

Underfitting: a problem in which the model's evaluation metric shows poor results for both the training set and the test set.

Hyperparameters: settings for learning algorithms that are chosen before training rather than learned from the data.

Parameters: a model's settings that are obtained from the training data and determine how the model makes predictions.
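
The distinction can be illustrated with a toy model y = w * x trained by gradient descent. In this sketch, `learning_rate` and `n_epochs` are hyperparameters set by hand before training, while `w` is the parameter learned from the data. The function name and the data are made up for illustration.

```python
def train_slope(xs, ys, learning_rate=0.01, n_epochs=100):
    """Fit y = w * x by gradient descent on mean squared error.

    learning_rate and n_epochs are hyperparameters: we choose them.
    w is a parameter: the algorithm learns it from the training data.
    """
    w = 0.0  # initial value of the parameter
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy training data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = train_slope(xs, ys)
print(round(w, 3))  # close to 2, the slope that generated the data
```

Changing the hyperparameters changes how training proceeds (for example, too large a `learning_rate` makes this loop diverge), but the value of `w` always comes from the data.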

Validation set: a part of the source dataset that is set aside and used to check the algorithm's quality during the model's training.
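
A rough sketch of how a source dataset might be divided into training, validation, and test sets. The 60/20/20 proportions, the seed, and the function name `split_dataset` are assumptions chosen for illustration, not a prescribed recipe.

```python
import random


def split_dataset(rows, valid_share=0.2, test_share=0.2, seed=12345):
    """Shuffle the rows and split them into train, validation, and test parts."""
    rows = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_share)
    n_valid = int(len(rows) * valid_share)
    test = rows[:n_test]
    valid = rows[n_test:n_test + n_valid]
    train = rows[n_test + n_valid:]
    return train, valid, test


# Ten made-up observations, identified here only by their row index.
train, valid, test = split_dataset(range(10))
print(len(train), len(valid), len(test))  # 6 2 2
```

The validation set is used while training and tuning the model; the test set is kept aside until the end so that the final quality estimate is not influenced by those choices.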