plotting

Utility functions for managing colors and other plotting tools. Includes all stable utility functions.

Work with color maps



plot_cmap_collections

 plot_cmap_collections (cmap_collections:str|list[str]=None)

Plot all color maps in the collections passed as cmap_collections

Parameter         Type             Default  Details
cmap_collections  str | list[str]  None     list of color map collections to display (from cmaps.keys())

The following color map collections are defined:

- Cyclic
- Diverging
- Miscellaneous
- Perceptually Uniform Sequential
- Qualitative
- Sequential
- Sequential (2)

plot_cmap_collections will plot a color bar for each color map in the selected collections:

  • A single collection
plot_cmap_collections('Cyclic')

  • Several collections
plot_cmap_collections(['Qualitative', 'Sequential'])

  • All the collections
plot_cmap_collections()
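The three call styles above (a single name, a list of names, or nothing) suggest that the argument is normalized to a list before plotting. The sketch below illustrates that idea only. The function `sketch_plot_collections` and the abbreviated `example_collections` dictionary are hypothetical stand-ins, not the library's implementation or its `cmaps` dictionary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical, abbreviated collections (assumption: the real `cmaps` dict is larger).
example_collections = {
    "Cyclic": ["twilight", "twilight_shifted", "hsv"],
    "Qualitative": ["tab10", "Paired", "Set1"],
}


def sketch_plot_collections(names=None):
    """Draw one figure per collection, with one color bar per color map."""
    if names is None:
        names = list(example_collections)  # no argument: all collections
    elif isinstance(names, str):
        names = [names]                    # single name: wrap in a list
    gradient = np.linspace(0, 1, 256).reshape(1, -1)  # 1 x 256 ramp from 0 to 1
    figures = []
    for name in names:
        cmap_names = example_collections[name]
        fig, axes = plt.subplots(
            nrows=len(cmap_names), figsize=(6, 0.4 * len(cmap_names)), squeeze=False
        )
        fig.suptitle(name)
        for ax, cmap_name in zip(axes[:, 0], cmap_names):
            ax.imshow(gradient, aspect="auto", cmap=cmap_name)  # the color bar itself
            ax.set_title(cmap_name, fontsize=8, loc="left")
            ax.set_axis_off()
        figures.append(fig)
    return figures


figs = sketch_plot_collections("Cyclic")
```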



plot_color_bar

 plot_color_bar (cmap:str, series:list[int|float]=None)

Plot a color bar with value overlay from series based on cmap

Parameter  Type               Default  Details
cmap       str                         string name of one of the cmaps
series     list[int | float]  None     series of numerical values to show for each color
plot_color_bar('tab10', range(10))

plot_color_bar('tab10', series=range(6))

plot_color_bar('tab10', series=[0, 1, 2])
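As a rough illustration of what "a color bar with value overlay" can look like, the sketch below draws one colored cell per value and writes the value on top of it. It is not the library's implementation: `sketch_color_bar` is a hypothetical name, `series` is required here, and scaling the values linearly between their minimum and maximum is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize


def sketch_color_bar(cmap, series):
    """Draw one colored cell per value in `series`, labeled with the value."""
    values = list(series)
    # Assumption: values are scaled linearly from min(values) to max(values).
    mapper = ScalarMappable(norm=Normalize(min(values), max(values)), cmap=cmap)
    fig, ax = plt.subplots(figsize=(len(values), 1))
    for position, value in enumerate(values):
        ax.bar(position, 1, width=1, color=mapper.to_rgba(value))     # colored cell
        ax.text(position, 0.5, str(value), ha="center", va="center")  # value overlay
    ax.set_xlim(-0.5, len(values) - 0.5)
    ax.set_ylim(0, 1)
    ax.set_axis_off()
    return fig, ax


fig, ax = sketch_color_bar("tab10", range(6))
```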



get_color_mapper

 get_color_mapper (series:list[int|float], cmap:str='tab10')

Return color mapper based on a color map and a series of values

Parameter  Type               Default  Details
series     list[int | float]           series of values to map to colors
cmap       str                tab10    name of the cmap to use

Usage

This function is used to ensure coherent colors for different plots.

  1. Define a color mapper based on values and cmap: clr_mapper = get_color_mapper([1, 2, 3, 4], cmap='Paired')
  2. Call the color mapper and have it return the appropriate values for any plot: clr_mapper.to_rgba(2)
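The `to_rgba` call in step 2 is the interface of matplotlib's `ScalarMappable`, so the two steps can be sketched with that class. This is a minimal sketch under assumptions, not the library's code: `sketch_color_mapper` is a hypothetical stand-in for `get_color_mapper`, and normalizing between the minimum and maximum of the series is a guess at its behavior.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize


def sketch_color_mapper(series, cmap="tab10"):
    """Hypothetical stand-in: map the range of `series` onto the color map `cmap`."""
    # Assumption: the smallest value maps to the start of the cmap, the largest to its end.
    norm = Normalize(vmin=min(series), vmax=max(series))
    return ScalarMappable(norm=norm, cmap=cmap)


# Step 1: define the mapper once from the full set of values.
mapper = sketch_color_mapper([1, 2, 3, 4], cmap="Paired")

# Step 2: ask it for a color wherever one is needed; the result is an (r, g, b, a) tuple.
rgba = mapper.to_rgba(2)
```

Because the normalization is fixed when the mapper is created, the same value returns the same color in every plot that uses the mapper.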
Example

We have a dataset with several features.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
n_feats = 6
col_list = [f"col_{i}" for i in range(n_feats)]
X, y = make_blobs(n_samples=5_000, n_features=n_feats, centers=10, shuffle=True)

X = pd.DataFrame(X, columns=col_list)
X.head(3)
col_0 col_1 col_2 col_3 col_4 col_5
0 -4.377115 -3.113744 4.417737 7.327412 7.366114 0.033885
1 -10.319655 -6.998589 1.126784 7.731522 4.524063 -1.337312
2 1.542669 6.398550 8.267037 -1.024028 -0.697208 8.599691
1. Define a color mapper based on values and cmap

We cluster the data into 10 clusters and make a scatter plot of two of the features, displaying the 10 clusters using a cmap.

To ensure that we can keep the same cluster color mapping for other plots, we use clr_mapper to predefine how colors are mapped to each cluster:

  • clr_mapper = get_color_mapper(cluster_ids, cmap=cmap).
clustering = KMeans(n_clusters=10)
clusters = clustering.fit_predict(X)
cluster_ids = np.unique(clusters)
cmap='tab10'

plt.figure(figsize=(6, 3))
plt.scatter(X.col_0, X.col_1, c=clusters, s=2, cmap=cmap)
plt.colorbar()
plt.title('two first features data points, colored by cluster value')
plt.show()

clr_mapper = get_color_mapper(cluster_ids, cmap=cmap)
2. Call the color mapper and use it in any plot

We reuse clr_mapper in another plot that shows the value of one feature for each sample, colored according to the sample's cluster.

featname = 'col_4'
plt.figure(figsize=(12, 3))
plt.plot(X[featname], c='grey', alpha=.25, lw=0.25)
plt.title(f'{featname}.')
plt.show()

plt.figure(figsize=(12, 3))
plt.plot(X[featname], c='grey', alpha=.25, lw=0.25)
for c in cluster_ids:
    mask = clusters == c  # samples assigned to cluster c by KMeans
    X[f"{featname}_{c}"] = X.loc[:, featname]
    X.loc[~mask, f"{featname}_{c}"] = np.nan
    plt.plot(X[f"{featname}_{c}"], label=str(c), c=clr_mapper.to_rgba(c), lw=0, marker='o', markersize=1)
plt.title(f'{featname}. Data points colored according to the cluster it belongs to.')
plt.legend()
plt.show()

Advanced plots



plot_feature_scatter

 plot_feature_scatter (X:numpy.ndarray, y:Optional[numpy.ndarray]=None,
                       n_plots:int=2, axes_per_row:int=3, axes_size:int=5)

Plots n_plots scatter plots of randomly selected combinations of two features out of X

Parameter     Type                  Default  Details
X             np.ndarray                     input dataset. X.shape[1] is used to set the total number of features
y             Optional[np.ndarray]  None     target dataset
n_plots       int                   2        number of feature-pair scatter plots to show
axes_per_row  int                   3        number of axes per row. The number of rows is calculated accordingly
axes_size     int                   5        size of one axes. figsize will be (ncols * axes_size, nrows * axes_size)
X.shape
(5000, 16)
n_feats = 6
col_list = [f"col_{i}" for i in range(n_feats)]
X, y = make_blobs(n_samples=5_000, n_features=n_feats, centers=10, shuffle=True)

X = pd.DataFrame(X, columns=col_list)

plot_feature_scatter(X.values, y, n_plots=6, axes_per_row=3, axes_size=5)

When no value is provided for y, it is set to 1 by default

plot_feature_scatter(X.values, n_plots=4, axes_per_row=2, axes_size=2)
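The parameter descriptions imply two small computations: choosing n_plots random pairs of feature indices, and deriving the grid shape and figsize from n_plots, axes_per_row and axes_size. The sketch below shows one way to do both with the standard library. The helper names `pick_feature_pairs` and `grid_shape`, the `seed` parameter, and the choice of `ncols = min(n_plots, axes_per_row)` are assumptions for illustration, not the library's implementation.

```python
import itertools
import math
import random


def pick_feature_pairs(n_features, n_plots, seed=None):
    """Randomly pick up to n_plots distinct pairs of feature indices."""
    all_pairs = list(itertools.combinations(range(n_features), 2))
    rng = random.Random(seed)  # `seed` is a hypothetical parameter for reproducibility
    return rng.sample(all_pairs, k=min(n_plots, len(all_pairs)))


def grid_shape(n_plots, axes_per_row=3, axes_size=5):
    """Return (nrows, ncols, figsize) for a grid holding n_plots axes."""
    ncols = min(n_plots, axes_per_row)         # assumption: no empty columns
    nrows = math.ceil(n_plots / axes_per_row)  # enough rows to fit every plot
    figsize = (ncols * axes_size, nrows * axes_size)
    return nrows, ncols, figsize


pairs = pick_feature_pairs(n_features=6, n_plots=6, seed=0)
nrows, ncols, figsize = grid_shape(n_plots=6, axes_per_row=3, axes_size=5)
```

With these assumptions, six plots at three axes per row give a 2 x 3 grid and a figsize of (15, 10).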