
yellowbrick - visualize sklearn's classification & regression metrics in python

Python has many libraries that let us build machine learning models with just a few lines of code. Scikit-learn in particular has earned a reputation as the go-to library for ML models among data scientists and machine learning practitioners. It provides a very easy-to-use interface for building models, and it also offers functionality for feature selection, feature extraction, dimensionality reduction, grid searching hyper-parameters, and more. However, while scikit-learn provides an extensive set of models and metrics for evaluating them, it does not provide functionality to visualize those evaluation metrics.

Yellowbrick is a Python library that provides various modules to visualize model evaluation metrics. It has modules for feature visualizations, classification metrics visualizations, regression metrics visualizations, clustering metrics visualizations, model selection visualizations, text data visualizations, and more. In this tutorial, we'll explain how to use the Yellowbrick API, concentrating primarily on visualizing classification and regression metrics.

In this section, we'll be exploring the classification metrics visualizations available with Yellowbrick. We'll be using different datasets along with different sklearn estimators, and we'll generate visualizations that explain the performance of the models on those datasets.

We have split all three datasets mentioned above into train (80%) and test (20%) sets using the train_test_split() function of sklearn. We'll be using these train/test sets for training the models and evaluating their performance.
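The data loading code doesn't appear in this excerpt, so below is a minimal sketch of how such splits could be produced, assuming the digits, wine, and breast cancer datasets bundled with scikit-learn and an 80/20 split (the variable names are our own, and the later sketches reuse them):

```python
from sklearn.datasets import load_digits, load_wine, load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the classification datasets bundled with scikit-learn.
digits = load_digits()
wine = load_wine()
cancer = load_breast_cancer()

# 80% train / 20% test split for each dataset (fixed random_state for reproducibility).
X_digits_train, X_digits_test, y_digits_train, y_digits_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)
X_wine_train, X_wine_test, y_wine_train, y_wine_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42)
X_cancer_train, X_cancer_test, y_cancer_train, y_cancer_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)
```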

The first chart that we'll introduce is the confusion matrix plot. The classifier module of Yellowbrick has a class named ConfusionMatrix which lets us create a confusion matrix chart. We first need to create an object of this class, passing it a machine learning model. We can then call the fit() and score() methods on the ConfusionMatrix object, which will train the model on the train data and evaluate it on the test data. Finally, calling the show() method on the object will render the confusion matrix for the test data. Below we have generated a confusion matrix for the digits test data using a random forest sklearn estimator.
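Here is a minimal sketch of this workflow, assuming the digits split from the earlier sketch and a random forest classifier:

```python
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import ConfusionMatrix

# Wrap the estimator with the ConfusionMatrix visualizer.
viz = ConfusionMatrix(RandomForestClassifier())

viz.fit(X_digits_train, y_digits_train)   # train the underlying model
viz.score(X_digits_test, y_digits_test)   # evaluate; the plotted matrix reflects this data
viz.show()                                # render the confusion matrix
```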

Please make a note that the show() method will always display the chart based on the dataset on which the score() method was called. Below we have called the score() method with the test dataset. If we want to generate a confusion matrix for the train data, we need to call the score() method with the train data instead.

Yellowbrick also provides another way of creating a chart using quick methods, in case we don't want to create a class object ourselves. Below we have created a confusion matrix using the confusion_matrix() method available in the classifier module.
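A sketch of the quick-method variant, reusing the digits split; the exact arguments accepted may vary slightly between Yellowbrick versions:

```python
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import confusion_matrix

# One call fits the model on the train split and scores it on the test split.
viz = confusion_matrix(
    RandomForestClassifier(),
    X_digits_train, y_digits_train,
    X_digits_test, y_digits_test,   # optional; omit to plot the train-data matrix
)
```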

Please make a note that we have passed the sklearn estimator, train data, test data, and a few other parameters to the method. Passing the test data is optional; if we don't pass it, the matrix will be created based on the train data.

The second chart type that we'll explain is the classification report. We'll follow the same process to create this visualization that we followed in the previous example. We first create an object of class ClassificationReport, passing it a sklearn estimator and a list of class names. We then fit that object with the train data and evaluate it using the test data, and finally call the show() method to generate the figure. Below we have generated a classification report for the digits test data using a sklearn decision tree estimator.
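A minimal sketch, assuming the digits split from earlier and a decision tree:

```python
from sklearn.tree import DecisionTreeClassifier
from yellowbrick.classifier import ClassificationReport

# classes only controls the labels shown on the chart.
viz = ClassificationReport(DecisionTreeClassifier(), classes=[str(i) for i in range(10)])
viz.fit(X_digits_train, y_digits_train)   # train the decision tree
viz.score(X_digits_test, y_digits_test)   # compute per-class precision/recall/f1
viz.show()
```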

The third chart type that we'll explain is the ROC AUC curve chart. We have followed the same steps as in the earlier examples: we first created an object of class ROCAUC passing it a sklearn decision tree estimator, fit the object on the train data, evaluated it on the test data, and plotted the figure for the test data by calling the show() method.
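A sketch of the same pattern for ROCAUC, again assuming the digits split and a decision tree:

```python
from sklearn.tree import DecisionTreeClassifier
from yellowbrick.classifier import ROCAUC

viz = ROCAUC(DecisionTreeClassifier())
viz.fit(X_digits_train, y_digits_train)   # train the model
viz.score(X_digits_test, y_digits_test)   # plot per-class ROC curves with AUC scores
viz.show()
```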

The fourth chart type that we'll explain is the precision-recall curve. We have followed the same process as in the previous examples to create this chart. We created an object of class PrecisionRecallCurve with a sklearn decision tree estimator, trained the object on the breast cancer train data, and evaluated it on the test data. We then called the show() method to display the chart.
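A sketch assuming the breast cancer split from earlier:

```python
from sklearn.tree import DecisionTreeClassifier
from yellowbrick.classifier import PrecisionRecallCurve

viz = PrecisionRecallCurve(DecisionTreeClassifier())
viz.fit(X_cancer_train, y_cancer_train)   # train on the breast cancer train split
viz.score(X_cancer_test, y_cancer_test)   # plot precision vs. recall on the test split
viz.show()
```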

The fifth chart type that we'll introduce is the discrimination threshold. It displays how precision, recall, f1-score, and queue rate change as we change the threshold at which we decide the class prediction. For classification problems, the output of a machine learning model is generally a probability, and we decide the class based on some threshold on that probability. Sklearn uses a default threshold of 0.5, which means that if the probability is greater than 0.5 we predict the positive class, otherwise the negative class.

Below we have created a discrimination threshold chart by creating an object of class DiscriminationThreshold, passing it a sklearn logistic regression estimator. We have then fitted the object on the breast cancer train data before generating the chart.
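A sketch assuming the breast cancer split; note that DiscriminationThreshold supports binary classifiers only and re-fits the model over several internal shuffle splits, so a single fit() call is enough to produce the curves:

```python
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import DiscriminationThreshold

# max_iter is raised only so that logistic regression converges on this data.
viz = DiscriminationThreshold(LogisticRegression(max_iter=5000))
viz.fit(X_cancer_train, y_cancer_train)   # runs repeated trials across thresholds
viz.show()
```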

The sixth and last chart type that we'll introduce for classification metrics visualizations is the class prediction error. It's a stacked bar chart showing how many samples of each class were correctly classified and how many were misclassified, along with the class into which they were misclassified.

We have created the chart exactly the same way as earlier. We created an object of class ClassPredictionError, passing it a sklearn decision tree classifier, fitted the object on the wine train data, and evaluated it on the test data. We then plotted the class prediction error chart for the test data. We can see from the chart below that a few class_1 samples are confused with class_0 (first bar), and a few class_2 and class_0 samples are confused with class_1 (second bar). The third bar tells us that samples of the other classes are, at least, not confused with class_2.
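A sketch assuming the wine split from earlier (wine.target_names provides the class_0/class_1/class_2 labels mentioned above):

```python
from sklearn.tree import DecisionTreeClassifier
from yellowbrick.classifier import ClassPredictionError

viz = ClassPredictionError(DecisionTreeClassifier(), classes=wine.target_names)
viz.fit(X_wine_train, y_wine_train)   # train on the wine train split
viz.score(X_wine_test, y_wine_test)   # one stacked bar per actual class on the test split
viz.show()
```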

The first chart type that we'll introduce for regression metrics visualizations is the residuals plot. It is a scatter plot with the predicted values on the x-axis and the residuals on the y-axis. If the points are randomly distributed around the horizontal axis, then a linear regression model is an appropriate choice for the data; otherwise, a non-linear model will be more appropriate.
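A sketch of the residuals plot with a linear regression model. The Boston dataset used in the original tutorial has been removed from recent scikit-learn releases, so the diabetes dataset stands in here:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import ResidualsPlot

# Stand-in regression data (the tutorial itself used the Boston housing dataset).
X, y = load_diabetes(return_X_y=True)
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

viz = ResidualsPlot(LinearRegression())
viz.fit(X_reg_train, y_reg_train)   # train the regressor
viz.score(X_reg_test, y_reg_test)   # plot residuals (y - y_hat) vs. predicted values
viz.show()
```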

The second chart type that we'll explain is the prediction error plot. We have created the chart by first creating an object of class PredictionError, then fitting it on the Boston train data and evaluating it on the test data. The chart also has boolean parameters named bestfit and identity which specify whether to include the best fit line and the identity line in the chart.
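A sketch reusing the regression split from the previous example:

```python
from sklearn.linear_model import LinearRegression
from yellowbrick.regressor import PredictionError

# bestfit/identity toggle the fitted line and the 45-degree reference line.
viz = PredictionError(LinearRegression(), bestfit=True, identity=True)
viz.fit(X_reg_train, y_reg_train)
viz.score(X_reg_test, y_reg_test)   # scatter of predicted vs. actual test values
viz.show()
```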

The third chart type that we'll explain for regression tasks is the alpha selection chart, which demonstrates how different values of alpha influence model selection during regularization, i.e. the impact of regularization on the model. This works with the cross-validated regularized regression models of sklearn, such as RidgeCV and LassoCV, and we can try a range of alpha values.
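A sketch using LassoCV over a hypothetical grid of alpha values:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from yellowbrick.regressor import AlphaSelection

# AlphaSelection expects a cross-validated regularized model such as LassoCV or RidgeCV.
alphas = np.logspace(-5, 1, 50)
viz = AlphaSelection(LassoCV(alphas=alphas))
viz.fit(X_reg_train, y_reg_train)   # cross-validation error is computed for each alpha
viz.show()
```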

The fourth and last chart type that we'll introduce is the Cook's distance chart, which displays the influence of individual samples on the model. Removing highly influential samples from the data might change the model coefficients and hence its performance. It's generally used to detect outliers, since even a few such samples can pull the model in a particular direction.
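A sketch reusing the stand-in regression data; unlike the previous visualizers, CooksDistance works directly on the data rather than wrapping an estimator:

```python
from yellowbrick.regressor import CooksDistance

# Tall stems mark samples with an outsized influence on the fitted model.
viz = CooksDistance()
viz.fit(X, y)   # computes Cook's distance for every sample via an internal linear fit
viz.show()
```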

About: Sunny Solanki has 8+ years of experience in the IT industry. He has worked on various projects, mostly involving Python & Java, with US and Canadian banking clients. He possesses good hands-on experience with Python and its ecosystem libraries. His main areas of interest are AI/Machine Learning, Data Visualization, Concurrent Programming and Drones. Apart from his tech life, he prefers reading autobiographies and inspirational books. He also spends much of his time taking care of his 40+ plants.

