Knowledge Base

Data Collection

Glossary

Crawler/scraper: software used to extract data from websites.

Cross-validation: a method of model training and testing in which the training set is split into K equal blocks. At each of the K stages, block i is used for validation and the remaining K − 1 blocks are used for training.

Data labeling/data annotation: the process of assigning target values to observations.

Labeled data: data whose target values are known.

Target leakage: a situation in which information about the target accidentally leaks into the features.

Unlabeled data: data that lacks the target value.
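
The block splitting described under cross-validation can be sketched in plain Python. This is only an illustration of the idea, not how any particular library implements it; the function name k_fold_indices is hypothetical, and the sketch assumes the observations are addressed by position and are not shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, valid_indices) for each of the k stages."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Block sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        # Block i is the validation set; everything else is used for training.
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


for stage, (train, valid) in enumerate(k_fold_indices(6, 3)):
    print(stage, train, valid)
# 0 [2, 3, 4, 5] [0, 1]
# 1 [0, 1, 4, 5] [2, 3]
# 2 [0, 1, 2, 3] [4, 5]
```

Every observation ends up in the validation block exactly once, so each one contributes to the quality estimate without ever being scored by a model that was trained on it.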

Practice

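# Cross-validation
# model: untrained model for cross-validation;
# features, target: feature matrix and target values;
# cv: number of blocks for cross-validation (5 by default since scikit-learn 0.22; 3 in earlier versions).

from sklearn.model_selection import cross_val_score

# Returns an array with one score per block.
scores = cross_val_score(model, features, target, cv=3)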
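
The call above needs a model, features, and target to exist already. A self-contained version might look like the sketch below. The synthetic dataset from make_classification and the choice of LogisticRegression are assumptions made only so that the example runs; any scikit-learn estimator and any labeled dataset could take their place.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for real features and target (an assumption).
features, target = make_classification(n_samples=200, n_features=5, random_state=0)

# The model is passed untrained; a fresh copy is fitted on each training split.
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, features, target, cv=3)
print(scores)         # one score per block
print(scores.mean())  # average quality across the blocks
```

For a classifier, the default score is accuracy, so each value lies between 0 and 1. The mean of the per-block scores is commonly used as the overall quality estimate.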