Chapter Summary: The seaborn Library
Why Matplotlib Alone Won't Cut It
Matplotlib is a low-level data visualization library for Python and the foundation for higher-level libraries such as seaborn.
Matplotlib's default graphs are sometimes called "ugly," but there are two ways to make them easier on the eyes:
- Use built-in styles in Matplotlib.
- Import and use a library such as seaborn.
To find out which styles are available, print plt.style.available, a list of style names:
```python
import matplotlib.pyplot as plt

print(plt.style.available)  # names of all available styles
```
Sometimes only some of the graphs in a project need a particular style. In that case, pass the style's name to plt.style.context() and use it as a context manager in a with statement. The style then applies only to the plots created inside the with block:
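```python
import matplotlib.pyplot as plt

# 'seaborn-pastel' was renamed 'seaborn-v0_8-pastel' in Matplotlib 3.6
with plt.style.context('seaborn-v0_8-pastel'):
    plt.bar([10, 20, 30, 40], [3, 9, 18, 7])  # only this plot uses the style
```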
If all the graphs in the project should have the same style, call the plt.style.use() method when you start the project:
```python
plt.style.use('ggplot')  # every plot created after this call uses the ggplot style
```
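To see how the two approaches differ, read one style setting from Matplotlib's rcParams before, inside, and after a style.context() block. The sketch below uses the 'ggplot' style and the axes.facecolor setting only as an illustration. The Agg backend line is there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

plt.style.use("default")                      # start from Matplotlib's defaults
before = plt.rcParams["axes.facecolor"]

with plt.style.context("ggplot"):             # the style applies only inside this block
    inside = plt.rcParams["axes.facecolor"]
    plt.figure()
    plt.bar([10, 20, 30, 40], [3, 9, 18, 7])  # drawn with the ggplot look

after = plt.rcParams["axes.facecolor"]        # settings are restored when the block exits

print(before, inside, after)
```

By contrast, plt.style.use() changes rcParams permanently for the rest of the session.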
The jointplot() Method
The seaborn library's jointplot() function shows the joint distribution of two variables as a scatter plot, together with the distribution of each variable on its own as histograms along the edges. This plot takes a single line of code, which is one important advantage seaborn has over Matplotlib. Voilà! Your joint distribution plot is ready:
```python
import seaborn as sns
import pandas as pd

taxi = pd.read_csv('/datasets/taxi.csv')

sns.jointplot(x="rating", y="tips", data=taxi)
```
Let’s add a regression line to the scatter plot and density curves to the histograms by assigning the value "reg" (regression) to the kind argument:
```python
sns.jointplot(x="rating", y="tips", data=taxi, kind='reg')
```
This plot could be added to a report intended for your colleagues, although it won't be clear to management without some additional explanation.
Color Palettes
Online services offering specially compiled color palettes can help you avoid hard-to-read graphs.
The seaborn library also has support for color palettes. You can access them using the color_palette() function:
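```python
import seaborn as sns

current_palette = sns.color_palette("coolwarm", 20)  # 20 colors from the coolwarm palette
sns.palplot(current_palette)  # draws the colors as a row of swatches
```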
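color_palette() returns an ordinary list of colors, so you can inspect it directly. A minimal sketch:

```python
import seaborn as sns

palette = sns.color_palette("coolwarm", 20)  # 20 colors sampled from coolwarm
print(len(palette))          # 20
print(palette[0])            # each color is an (r, g, b) tuple of floats between 0 and 1
print(palette.as_hex()[:3])  # the first three colors as hex strings
```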
You can then set a standard palette for all plots using the set_palette() function:
```python
sns.set_palette('dark')
```
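Under the hood, set_palette() replaces Matplotlib's default color cycle, so it also affects plain Matplotlib plots drawn afterwards. The sketch below checks this by comparing the cycle with the palette's colors. Converting both to hex strings with matplotlib.colors.to_hex() is just one convenient way to compare them:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import seaborn as sns

sns.set_palette("dark")

# The default color cycle now holds the "dark" palette's colors
cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
expected = sns.color_palette("dark")
print([to_hex(c) for c in cycle] == list(expected.as_hex()))

# A plain Matplotlib line picks up the first palette color
fig, ax = plt.subplots()
line, = ax.plot([0, 1], [0, 1])
print(to_hex(line.get_color()) == expected.as_hex()[0])
```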
Other palette types are described in the seaborn documentation.
[Figure: seaborn's standard palettes]
To select a color for a particular plot, pass the color's name to the color argument:
```python
sns.jointplot(x="rating", y="tips", data=taxi, kind='reg', color='blue')
```
Plot Styles
Axes-level seaborn functions such as barplot() and lineplot() return a Matplotlib Axes object, so these Matplotlib methods work for seaborn plots as well:
- set_title()
- set_xlabel()
- set_ylabel()
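The following minimal sketch uses a small hypothetical DataFrame. It calls these methods on the Axes object returned by an axes-level seaborn function:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data
df = pd.DataFrame({"day": ["Mon", "Tue", "Wed"], "rides": [120, 95, 143]})

ax = sns.barplot(x="day", y="rides", data=df)  # returns a Matplotlib Axes
ax.set_title("Rides per day")
ax.set_xlabel("Day of the week")   # replaces the default label "day"
ax.set_ylabel("Number of rides")   # replaces the default label "rides"
```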
We can change a plot's size by calling plt.figure() with the figsize argument, which sets both the width and height in inches:
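```python
import matplotlib.pyplot as plt
import seaborn as sns

fmri = sns.load_dataset("fmri")  # built-in example dataset

plt.figure(figsize=(12, 3))  # Note! Write this code before you create the graph
ax = sns.lineplot(x="timepoint", y="signal", hue="event", style="event", data=fmri)
```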
You can set the graph style using set_style(). There are five preset themes you can pass as its first argument: 'darkgrid', 'whitegrid', 'dark', 'white', and 'ticks'. 'darkgrid' is the default theme. If you don’t need a grid on your graph, choose the 'dark', 'white', or 'ticks' theme. The 'whitegrid' theme is the best option for complex graphs.
```python
sns.set_style("dark")
```
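set_style() changes Matplotlib's rcParams. axes_style() lets you read the current values, or apply a style temporarily in a with block. In the sketch below, the choice of styles and of the axes.grid and axes.facecolor settings is only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
grid_whitegrid = sns.axes_style()["axes.grid"]  # grid lines are on

sns.set_style("dark")
grid_dark = sns.axes_style()["axes.grid"]       # no grid
dark_face = plt.rcParams["axes.facecolor"]

with sns.axes_style("white"):                   # temporary style for plots made in this block
    fig, ax = plt.subplots()
    white_face = plt.rcParams["axes.facecolor"]

restored_face = plt.rcParams["axes.facecolor"]  # back to the "dark" setting
print(grid_whitegrid, grid_dark, white_face != dark_face, restored_face == dark_face)
```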
By default, a plot is framed by four lines called spines. To keep only the X and Y axes (the left and bottom spines) and remove the top and right ones, use the despine() function:
```python
sns.despine()  # call after the plot is drawn
```
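despine() also accepts arguments for individual sides. Its offset argument moves the remaining spines away from the data. A minimal sketch on a plain Matplotlib plot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])

sns.despine(ax=ax)  # hides the top and right spines by default
visible = {side: ax.spines[side].get_visible() for side in ["top", "right", "left", "bottom"]}
print(visible)

# Also hide the left spine and push the remaining bottom spine 10 points outward
sns.despine(ax=ax, left=True, offset=10)
```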
Categorical Data
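The seaborn library comes with built-in example datasets for testing plots; the full list is in the seaborn documentation. The load_dataset() function gives you access to them. It downloads the datasets from seaborn's online repository, so it needs an internet connection: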
```python
import seaborn as sns

iris = sns.load_dataset("iris")
print(iris.head())
```
You can create a bar plot using the barplot() function with the following arguments:
- x: the data on the X axis
- y: the data on the Y axis
- data: the dataset for plotting
- color or palette: a single color for all bars, or a palette of colors
```python
import seaborn as sns

flights = sns.load_dataset("flights")
ax = sns.barplot(x="year", y="passengers", data=flights)
```
barplot() aggregates data on its own; by default, it calculates the mean. Change this option with the estimator argument:
```python
import seaborn as sns
from numpy import median

flights = sns.load_dataset("flights")
ax = sns.barplot(x="year", y="passengers", data=flights, estimator=median)
```
To make a box-and-whisker plot (box plot) in seaborn, use boxplot():
```python
import seaborn as sns

sport = sns.load_dataset("exercise")
ax = sns.boxplot(x="diet", y="pulse", data=sport)
```
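The box spans the first quartile (Q1) to the third quartile (Q3), with a line at the median. By default, the whiskers reach the most extreme points within 1.5 × IQR of the box, and points beyond that are drawn as outliers. The sketch below computes these numbers by hand with pandas for a hypothetical series of pulse readings. It then cross-checks them with Matplotlib's cbook.boxplot_stats() helper:

```python
import pandas as pd
from matplotlib import cbook

# Hypothetical pulse readings (beats per minute)
pulse = pd.Series([85, 85, 88, 90, 92, 93, 97, 98, 100, 140])

q1, median, q3 = pulse.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                      # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = pulse[(pulse < lower_fence) | (pulse > upper_fence)].tolist()
print(q1, median, q3, outliers)

# Matplotlib's helper computes the same box, plus where the whiskers end
stats = cbook.boxplot_stats(pulse.to_numpy(), whis=1.5)[0]
print(stats["q1"], stats["med"], stats["q3"], stats["whislo"], stats["whishi"], list(stats["fliers"]))
```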
In addition to the X and Y axes, you can add a third dimension to the plot. Pass the name of a column to the hue parameter of boxplot(), and each of that column's values gets a separate box in its own color:
```python
import seaborn as sns

sport = sns.load_dataset("exercise")
ax = sns.boxplot(x="kind", y="pulse", hue="diet", data=sport)
```
Visualizing Distributions
The seaborn library has ready-made plots for visualizing one-variable distributions and joint distributions.
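The distplot() (distribution plot) function shows a variable's distribution and its density by combining a histogram with a density curve. Note that distplot() has been deprecated since seaborn 0.11; histplot() with kde=True is its modern replacement: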
```python
import seaborn as sns

sport = sns.load_dataset("exercise")
sns.distplot(sport['pulse'])
```
As with Matplotlib's hist(), you set the number of bins using the bins argument:
```python
sns.distplot(sport['pulse'], bins=10)
```
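To plot the joint distributions of all pairs of numeric columns at once, use the pairplot() function. It builds a grid with a scatter plot for each pair of columns and each column's own distribution on the diagonal: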
```python
sns.pairplot(sport)
```
A third variable can also be displayed by coloring the points by category. Pass its column name to the hue argument:
```python
sns.pairplot(sport, hue='diet')
```
Special Plots in seaborn
violinplot()
Like boxplot(), this plot describes the shape of a distribution. Its unusual appearance comes from a density curve mirrored on both sides of the axis. The main advantage of violinplot() over boxplot() is that it shows the full shape of the distribution, so you can study it and determine its type (for example, whether it has one peak or two).
[Figure: comparison of violinplot() and boxplot()]
In seaborn, we construct this kind of plot using the violinplot() function:
```python
sns.violinplot(x="kind", y="pulse", data=sport, palette='rainbow')
```
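The sketch below draws both plots side by side for a hypothetical bimodal sample with two clusters of values. The box plot summarizes it with a single box, while the violin shows the two peaks. The argument inner="quartile" draws the quartiles inside the violin:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(7)
# Hypothetical bimodal sample: clusters around 80 and 110
values = np.concatenate([rng.normal(80, 4, 100), rng.normal(110, 4, 100)])
df = pd.DataFrame({"group": "A", "pulse": values})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
sns.boxplot(x="group", y="pulse", data=df, ax=ax_box)  # one box hides the two clusters
sns.violinplot(x="group", y="pulse", data=df, ax=ax_violin, inner="quartile")  # two bulges
ax_box.set_title("boxplot()")
ax_violin.set_title("violinplot()")
```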
stripplot()
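stripplot() is another way to display categorical data. It draws every observation as a separate dot along its category, adding a little random jitter by default so that points with equal values don't overlap.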
In seaborn, we construct this kind of plot using the stripplot() function:
```python
sns.stripplot(x="diet", y="pulse", data=sport)
```
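A common pattern is to draw stripplot() on top of a box plot, so the reader sees both the summary and the individual observations. The sketch below uses hypothetical data. The jitter argument controls how far the dots are spread sideways:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
# Hypothetical pulse readings for two diets, 15 each
df = pd.DataFrame({
    "diet": np.repeat(["low fat", "no fat"], 15),
    "pulse": np.concatenate([rng.normal(90, 5, 15), rng.normal(100, 5, 15)]),
})

ax = sns.boxplot(x="diet", y="pulse", data=df, color="lightgray")           # summary
sns.stripplot(x="diet", y="pulse", data=df, jitter=0.2, color="black", ax=ax)  # every point
```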