Takeaway Sheet: Solving Tasks Related to Machine Learning

Practice


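# visualizing a correlation matrix
import seaborn as sns

cm = df.corr()  # pairwise correlations; df is assumed to contain only numeric columns
sns.heatmap(cm, annot=True, square=True)  # annot=True prints each value in its cell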


# plotting a pair of features against each other (scatter plot)
sns.scatterplot(x=df['Feature 1'], y=df['Feature 2'])  # recent seaborn versions require x and y as keywords


# replacing categories with numeric values (label encoding)
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()  # creating an instance of the LabelEncoder class
df['column'] = encoder.fit_transform(df['column'])  # using the encoder to convert strings into integer codes
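
A minimal sketch of what label encoding produces, on a hypothetical toy table (the column name 'city' and its values are assumptions made for illustration). It shows that the integer codes follow the sorted order of the labels and that the mapping can be reversed:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; 'city' is an illustrative column name
df_le = pd.DataFrame({'city': ['Paris', 'Berlin', 'Paris', 'Rome']})

encoder = LabelEncoder()
df_le['city_code'] = encoder.fit_transform(df_le['city'])

# classes_ stores the original labels in sorted order;
# the position of a label in classes_ is its integer code
classes = list(encoder.classes_)
codes = df_le['city_code'].tolist()

# inverse_transform maps the integer codes back to the original strings
restored = list(encoder.inverse_transform(df_le['city_code']))
```

The codes imply an order (Berlin < Paris < Rome) that may not mean anything, which is one reason one-hot encoding is often preferred for linear models. The scikit-learn documentation also describes LabelEncoder as intended for target values; OrdinalEncoder is the counterpart for feature columns.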


# transforming categorical fields into sets of binary ones (one-hot encoding)
import pandas as pd

df = pd.get_dummies(df)  # numeric columns are left unchanged
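
A small sketch of one-hot encoding on a hypothetical table (the columns 'color' and 'size' are assumptions for illustration). It shows which columns pd.get_dummies creates and how drop_first removes one redundant column per category:

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numeric column
df_ohe = pd.DataFrame({'color': ['red', 'green', 'red'], 'size': [1, 2, 3]})

# Each distinct value of 'color' becomes its own binary column;
# the numeric column 'size' is passed through untouched
encoded = pd.get_dummies(df_ohe)
columns = list(encoded.columns)

# drop_first=True drops the first category of each field,
# which avoids perfectly collinear columns in linear models
encoded_reduced = pd.get_dummies(df_ohe, drop_first=True)
reduced_columns = list(encoded_reduced.columns)
```

New columns are named as the original column name, an underscore, and the category value.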


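# setting an optimization criterion for a model
# possible criterion values are listed in the scikit-learn documentation
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(criterion='absolute_error')  # named 'mae' before scikit-learn 1.0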


# obtaining the coefficients of linear regression (one weight per feature)
feature_weights = model.coef_
# obtaining the intercept (the zeroth coefficient)
weight_0 = model.intercept_
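
To see how these two attributes relate to predictions, here is a hedged sketch on synthetic data. The generating formula y = 3*x1 - 2*x2 + 5 and the variable names are assumptions chosen for illustration; the fitted weights should recover those numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

model = LinearRegression().fit(X, y)
feature_weights = model.coef_   # one weight per feature
weight_0 = model.intercept_     # the constant term

# A prediction is the weighted sum of the features plus the intercept
manual_pred = X @ feature_weights + weight_0
```

The weights are only directly comparable between features when the features are on the same scale, so standardizing the data first is a common step before interpreting them.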


# getting feature importance for decision trees, random forests, and gradient boosting
importances = model.feature_importances_
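
A short sketch of reading feature importances from a fitted random forest. The data is synthetic and only the first feature influences the target; this setup, along with the variable names, is an assumption for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only feature 0 affects the target, features 1 and 2 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = forest.feature_importances_

# Feature indices sorted from most to least important
ranking = np.argsort(importances)[::-1]
```

The importances are normalized to sum to 1, so each value can be read as a share of the total impurity reduction attributed to that feature.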