Plot two DataFrames column by column: first column vs. first column, second vs. second, etc.

Question:

I have two DataFrames with 3 columns and 7 rows each:

import numpy as np
import pandas as pd

nan = np.nan
a = {'col1': [20, 3, 45.6, 2, 500, 3e45, nan], 'col2': [nan, 90, 1e3, nan, 78, 6, nan], 'col3': [25, 7e56, nan, nan, 23, 0.4, 78.04]}
b = {'a': [1, 24, 30, nan, nan, 56, 2.5], 'b': [100, nan, 10, 78.09, 1e29, 0.84, nan], 'c': [nan, 4.6, nan, nan, 9e45, 0.2, nan]}
df_A = pd.DataFrame(data=a)
df_B = pd.DataFrame(data=b)

I'd like to iterate over the two DataFrames and make a scatter plot of the first column of df_A (col1) against the first column of df_B (a), then the second column of df_A against the second column of df_B, and so on. Each plot should use the column names as its axis labels.

I tried the following, but it raises an error saying that x and y aren't the same size:

for col in df_A:
    plt.scatter(df_A[col], df_B[col], label=col)

Asked By: Viky E.

||

Answers:

As already mentioned in the comments, either make sure both DataFrames have the same column names, or pair the columns by position instead:

for col in range(len(df_A.columns)):
    plt.figure()
    plt.scatter(df_A[df_A.columns[col]], df_B[df_B.columns[col]], label=df_A.columns[col])
    plt.legend()

If you still get an error because the two DataFrames have a different number of rows, either trim the longer one or pad the shorter one with dummy data.
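A minimal sketch of both options, assuming two hypothetical frames df_long and df_short of unequal length (these names and values are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical frames of unequal length, for illustration only
df_long = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df_short = pd.DataFrame({"y": [10.0, 20.0, 30.0]})

# Option 1: trim both frames to the length of the shorter one
n = min(len(df_long), len(df_short))
trimmed_long = df_long.iloc[:n]
trimmed_short = df_short.iloc[:n]

# Option 2: pad the shorter frame by reindexing it onto the longer index;
# rows that did not exist before are filled with NaN
padded_short = df_short.reindex(df_long.index)

print(len(trimmed_long), len(trimmed_short), len(padded_short))
```

Padding with NaN rather than an arbitrary number avoids plotting points that were never measured; the padded rows should simply not appear in the scatter plot.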

Answered By: Yolao_21

You are using col to index both df_A and df_B, but in your example the two DataFrames have different column names, which leads to an error.
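To make the mismatch concrete, here is a small sketch with shortened stand-in data (the values are placeholders; only the column names match the question). It checks which of df_A's names are missing from df_B and then pairs the columns by position instead:

```python
import pandas as pd

# Shortened stand-in data; only the column names mirror the question
df_A = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})
df_B = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# None of df_A's column names exist in df_B, so a lookup such as
# df_B["col1"] cannot succeed
missing = [col for col in df_A.columns if col not in df_B.columns]
print(missing)

# Pairing by position sidesteps the names entirely
pairs = list(zip(df_A.columns, df_B.columns))
print(pairs)
```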

I reproduced your example using Matplotlib subplots; this code works:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build the DataFrames
nan = np.nan
a = {'col1': [20, 3, 45.6, 2, 500, 3e45, nan], 'col2': [nan, 90, 1e3, nan, 78, 6, nan], 'col3': [25, 7e56, nan, nan, 23, 0.4, 78.04]}
b = {'a': [1, 24, 30, nan, nan, 56, 2.5], 'b': [100, nan, 10, 78.09, 1e29, 0.84, nan], 'c': [nan, 4.6, nan, nan, 9e45, 0.2, nan]}
df_A = pd.DataFrame(data=a)
df_B = pd.DataFrame(data=b)

# Plot
# The column names differ, so use enumerate to get the position i
# and select the matching column of df_B with iloc.
for i, col in enumerate(df_A):
    plt.subplot(2, 2, i + 1)
    plt.title(col)
    plt.scatter(df_A[col].values, df_B.iloc[:, i].values)

plt.show()

Answered By: Rennan Valadares

In the end, I solved it like this:

x_name = df_A.columns
y_name = df_B.columns

for i, col in enumerate(df_A.columns):
    plt.scatter(df_A.iloc[:, i], df_B.iloc[:, i])
    plt.xlabel(x_name[i])
    plt.ylabel(y_name[i])
    plt.show()

Answered By: Viky E.
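
A possible variation on the loop above, offered only as a sketch and not part of the original answer: it puts all pairs into one figure and labels each axis with the column names. The helper name scatter_by_position and the small stand-in data are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_position(df_x, df_y):
    """Scatter the i-th column of df_x against the i-th column of df_y.

    Hypothetical helper: one subplot per column pair, labeled with the
    column names. Assumes both frames have at least one column.
    """
    n = min(df_x.shape[1], df_y.shape[1])
    # squeeze=False keeps axes two-dimensional even when n == 1
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 3.5), squeeze=False)
    for ax, x_col, y_col in zip(axes[0], df_x.columns, df_y.columns):
        ax.scatter(df_x[x_col], df_y[y_col])
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
    fig.tight_layout()
    return fig, axes[0]


# Stand-in data; only the column names mirror the question
df_A = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0], "col3": [7.0, 8.0, 9.0]})
df_B = pd.DataFrame({"a": [3.0, 1.0, 2.0], "b": [6.0, 4.0, 5.0], "c": [9.0, 7.0, 8.0]})

fig, axes = scatter_by_position(df_A, df_B)
```

Calling plt.show() afterwards displays the figure; returning fig and axes leaves further styling to the caller.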