Create scatter subplots between two subsets of pandas columns

Question:

I have a dataframe of the form:

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1], 
                            [0.1, 0.2, 0.5, 0.6], 
                            [0.4, 0.3, 0.3, 0.6]]),  
                            columns=['a', 'b', 'X', 'Y'])

and I want to create all possible scatter plots between two subsets of columns, set1 = ['a', 'b'] and set2 = ['X', 'Y']. I’d like to place all these subplots in a "matrix", like:

size_set1 = len(set1)
size_set2 = len(set2)
df.plot(subplots=True, layout=(size_set1,size_set2), figsize=(30,30));

This is as far as I got, but this code does not produce scatter plots: it just plots each column on its own instead of plotting columns against each other.

The desired output should be (for this example) 4 scatter plots, (X,a), (X,b), (Y,a), (Y,b), arranged 2 above and 2 below.

Asked By: Qubix


Answers:

You can try to do it like this:

import itertools
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1], 
                            [0.1, 0.2, 0.5, 0.6], 
                            [0.4, 0.3, 0.3, 0.6]]),  
                            columns=['a', 'b', 'X', 'Y'])

set1 = ['a', 'b']
set2 = ['X', 'Y']

size_set1 = len(set1)
size_set2 = len(set2)

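# product() varies set2 fastest, so the pairs fill the grid row by row:
# one grid row per set1 column, one grid column per set2 column
pairs = list(itertools.product(set1, set2))

for i, (y_col, x_col) in enumerate(pairs):
    plt.subplot(size_set1, size_set2, i + 1)
    plt.scatter(df[x_col], df[y_col])  # set2 column on x, set1 column on y
    plt.xlabel(x_col)
    plt.ylabel(y_col)

plt.tight_layout()
plt.show()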

See also this question and its answers: How to plot in multiple subplots
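The grid only lines up when the layout is (len(set1), len(set2)): itertools.product varies its last argument fastest, and plt.subplot numbers its cells row by row, so pair number k belongs in row k // len(set2) and column k % len(set2). The hypothetical helper subplot_slots below (not part of any library) only computes that mapping, without plotting, so it can be checked on a 3 x 2 case where swapping the two sizes would put pairs in the wrong cells.

```python
import itertools


def subplot_slots(set1, set2):
    """Return (subplot index, row, col, x column, y column) for every pair.

    Hypothetical helper: rows follow set1 (y-axis), columns follow set2 (x-axis).
    """
    slots = []
    # product() varies set2 fastest, matching the row-major order in which
    # plt.subplot(nrows, ncols, k) numbers its cells (k starts at 1)
    for k, (y_col, x_col) in enumerate(itertools.product(set1, set2)):
        row, col = divmod(k, len(set2))
        slots.append((k + 1, row, col, x_col, y_col))
    return slots


for slot in subplot_slots(['a', 'b', 'c'], ['X', 'Y']):
    print(slot)
```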

Answered By: claraja

Maybe something like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# optional styling (seaborn's default theme); delete these two lines for plain matplotlib plots
import seaborn as sns
sns.set_theme()

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1], 
                            [0.1, 0.2, 0.5, 0.6], 
                            [0.4, 0.3, 0.3, 0.6]]),  
                            columns=['a', 'b', 'X', 'Y'])

set1 = ['a', 'b']
set2 = ['X','Y']

fig, ax = plt.subplots(nrows=len(set1), ncols=len(set2), figsize=[12,12],
                       squeeze=False)  # keep ax 2-D so ax[i,j] also works for a single row/column

xmin = 0
xmax = 0.8

ymin = 0
ymax = 0.8

for i in range(len(set1)):
    for j in range(len(set2)):
        ax[i,j].scatter(df[set2[j]], df[set1[i]])
        ax[i,j].set_xlabel(set2[j])
        ax[i,j].set_ylabel(set1[i])
        
        ax[i,j].title.set_text(f'plot ({set2[j]},{set1[i]})')
        
        ax[i,j].set_xlim(xmin, xmax)
        ax[i,j].set_ylim(ymin, ymax)

fig.suptitle("Scatter plots", fontsize=16)
plt.tight_layout()
plt.show()

Result:

[Image: the resulting 2 x 2 grid of scatter plots]
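Since the question started from df.plot, the same grid can also be drawn with pandas' own DataFrame.plot.scatter, passing each panel's Axes through its ax= argument. This is a minimal sketch with the question's data; the figure size and panel titles are arbitrary choices, and squeeze=False is only there so the axes[i, j] indexing also works when one of the sets has a single column.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3, 0.1],
                            [0.1, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'X', 'Y'])

set1 = ['a', 'b']  # y-axis columns: one grid row each
set2 = ['X', 'Y']  # x-axis columns: one grid column each

# squeeze=False keeps `axes` 2-D even when a set has only one column
fig, axes = plt.subplots(nrows=len(set1), ncols=len(set2),
                         figsize=(8, 8), squeeze=False)

for i, y_col in enumerate(set1):
    for j, x_col in enumerate(set2):
        # plot.scatter draws into the Axes passed via ax= and uses the
        # column names as axis labels
        df.plot.scatter(x=x_col, y=y_col, ax=axes[i, j],
                        title=f'({x_col}, {y_col})')

fig.tight_layout()
plt.show()
```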

To extend this with a third column c, only the DataFrame and set1 need to change:

df = pd.DataFrame(np.array([[0.2, 0.5, 0.1, 0.3, 0.1], 
                            [0.1, 0.2, 0.2, 0.5, 0.6], 
                            [0.4, 0.3, 0.3, 0.3, 0.6]]),  
                            columns=['a', 'b', 'c', 'X', 'Y'])

# and add 'c' to set1 (set2 stays ['X', 'Y']):
set1 = ['a', 'b', 'c']

Result:

[Image: the resulting 3 x 2 grid of scatter plots]
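xmin, xmax, ymin and ymax are hard-coded to 0 and 0.8, which suits this data but would clip any value above 0.8 once more data is added. The sketch below wraps the loop in a hypothetical helper, scatter_grid (not part of pandas or matplotlib), that accepts any two column lists and derives the axis limits from the data with a small relative pad (pad=0.05 is an arbitrary default). It uses the extended DataFrame with column c.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_grid(df, row_cols, col_cols, pad=0.05, panel_size=4):
    """Scatter df[x] against df[y] for every y in row_cols and x in col_cols.

    Hypothetical helper: grid rows follow row_cols (y-axis), grid columns
    follow col_cols (x-axis). Limits come from the data plus a relative pad.
    """
    def limits(cols):
        values = df[cols].to_numpy()
        lo, hi = values.min(), values.max()
        margin = (hi - lo) * pad or pad  # fall back to `pad` if all values are equal
        return lo - margin, hi + margin

    xlim, ylim = limits(col_cols), limits(row_cols)

    fig, axes = plt.subplots(
        nrows=len(row_cols), ncols=len(col_cols),
        figsize=(panel_size * len(col_cols), panel_size * len(row_cols)),
        squeeze=False,  # keep `axes` 2-D for single-row/column grids
    )
    for i, y_col in enumerate(row_cols):
        for j, x_col in enumerate(col_cols):
            ax = axes[i, j]
            ax.scatter(df[x_col], df[y_col])
            ax.set_xlabel(x_col)
            ax.set_ylabel(y_col)
            ax.set_title(f'plot ({x_col},{y_col})')
            ax.set_xlim(*xlim)
            ax.set_ylim(*ylim)

    fig.suptitle("Scatter plots", fontsize=16)
    fig.tight_layout()
    return fig, axes


df = pd.DataFrame(np.array([[0.2, 0.5, 0.1, 0.3, 0.1],
                            [0.1, 0.2, 0.2, 0.5, 0.6],
                            [0.4, 0.3, 0.3, 0.3, 0.6]]),
                  columns=['a', 'b', 'c', 'X', 'Y'])

fig, axes = scatter_grid(df, row_cols=['a', 'b', 'c'], col_cols=['X', 'Y'])
plt.show()
```

Passing other lists, e.g. scatter_grid(df, ['a'], ['X', 'Y']), gives a single-row grid; squeeze=False is what keeps the axes[i, j] indexing valid in that case.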

Answered By: ouroboros1