Anomaly Detection
Glossary
Anomalies (outliers) are observations with abnormal properties, i.e. those that deviate from the normal trend. Outliers can indicate a problem in the data or that something out of the ordinary has happened.
Practice
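import matplotlib.pyplot as plt

# Obtaining the list of outliers from a column (df is an existing pandas DataFrame).
# The "fliers" are the points that the boxplot draws beyond its whiskers.
boxplot = plt.boxplot(df['column'].values)
outliers = list(boxplot["fliers"][0].get_data()[1])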
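The fliers returned by the boxplot are, with matplotlib's default `whis=1.5`, the points lying more than 1.5 interquartile ranges below the first quartile or above the third. A minimal sketch of that rule without plotting, using only the standard library; the helper name `iqr_outliers` and the sample values are made up for illustration:

```python
from statistics import quantiles


def iqr_outliers(values, whis=1.5):
    """Return the values outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    # method="inclusive" interpolates linearly between data points,
    # treating the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: most values sit near 11, two are far away.
sample = [10, 12, 11, 13, 12, 11, 95, 12, 10, -40]
outliers = iqr_outliers(sample)
```

Here `outliers` holds the two extreme values, 95 and -40; passing a larger `whis` widens the accepted band and flags fewer points.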
# Training the isolation forest for anomaly detection
from sklearn.ensemble import IsolationForest
isolation_forest = IsolationForest(n_estimators=100)
isolation_forest.fit(data)

# Obtaining the anomaly estimate.
# Values range from -0.5 to 0.5 (with the default contamination='auto').
# A lower estimate indicates a higher chance that the observation is an outlier.
anomaly_scores = isolation_forest.decision_function(data)

# Obtaining the anomaly prediction: -1 - outlier, 1 - normal observation
predictions = isolation_forest.predict(data)

# Training and obtaining the prediction in one call
predictions = isolation_forest.fit_predict(data)
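The anomaly estimate and the anomaly prediction are two views of the same result: an observation is labelled -1 exactly when its `decision_function` value is negative. A small sketch on synthetic data; the dataset, the two planted points, and the `random_state` values are assumptions made only so the example is reproducible:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: a 2-D Gaussian cluster plus two planted far-away points.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.5]])
data = np.vstack([inliers, planted])

forest = IsolationForest(n_estimators=100, random_state=0)
labels = forest.fit_predict(data)        # -1 - outlier, 1 - normal observation
scores = forest.decision_function(data)  # lower means more anomalous

# The label is the sign of the score: negative scores become -1.
labels_match_scores = np.array_equal(labels == -1, scores < 0)

# Observations ordered from most to least anomalous.
most_anomalous_first = np.argsort(scores)
```

The planted points should appear near the front of `most_anomalous_first`. If a fixed share of outliers is expected, the `contamination` parameter can be set to that share instead of the default `'auto'`, which moves the threshold and therefore the score range.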
# KNN-based anomaly detection method

from pyod.models.knn import KNN
model = KNN()
model.fit(data)

# Anomaly prediction: 1 - outlier, 0 - normal observation
predictions = model.predict(data)
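The idea behind KNN-based detection is that an observation far from its nearest neighbours is likely an outlier. The sketch below illustrates that idea with plain NumPy. It is not PyOD's implementation; the helper `knn_outlier_scores`, the choice of `k=5`, the 10% cut-off, and the synthetic data are all assumptions for illustration:

```python
import numpy as np


def knn_outlier_scores(points, k=5):
    """Score each point by the distance to its k-th nearest neighbour."""
    points = np.asarray(points, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for small datasets).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point's zero distance to itself,
    # so column k holds the distance to the k-th nearest other point.
    return np.sort(dists, axis=1)[:, k]


# Hypothetical data: a Gaussian cluster plus one planted far-away point.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])

scores = knn_outlier_scores(data, k=5)

# Flag the highest-scoring 10% as outliers: 1 - outlier, 0 - normal observation.
threshold = np.quantile(scores, 0.9)
flags = (scores > threshold).astype(int)
```

The planted point at index 100 receives the largest score. The pairwise distance matrix grows quadratically with the number of observations, so a library detector is the better choice for anything beyond small datasets.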