How to plot a grouped bar plot from two or more dataframes
Question:
I have multiple dataframes, and I want to plot them on the same figure as a grouped bar chart.
These are two very small dataframes that I would like to plot together in the same figure.
The dataframes are:
I want to plot a figure like this example:
I tried this, but it plots only one graph:
fig, ax = plt.subplots()
df1.plot.bar(x='Zona',y='Total_MSP')
df4.plot.bar(x='Zona',y='NumEstCasasFavelas2017',ax=ax)
plt.show()
I tried this too:
fig, ax = plt.subplots()
df1.plot.bar(x='Zona',y='Total_MSP',ax=ax)
df4.plot.bar(x='Zona',y='NumEstCasasFavelas2017',ax=ax)
plt.show()
The result shows the data from only one dataframe, not from both. The legend entries of both dataframes appear in the same figure, but the bars come from a single dataframe.
Answers:
- To create a grouped bar plot, the DataFrames must be combined with pandas.merge or pandas.DataFrame.merge.
- See pandas User Guide: Merge, join, concatenate and compare and SO: Pandas Merging 101.
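Why the attempts in the question show only one set of bars: each separate DataFrame.plot.bar call places its bars at the same integer positions (0 to n-1) with the same default width, so the second call draws directly over the first, while its legend entry is still added to the existing legend. A minimal sketch that checks this, using the column names from this answer (Zone, CasasFavelas_2017) instead of the question's Zona/NumEstCasasFavelas2017:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                    'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477]})
df2 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                    'CasasFavelas_2017': [463, 4228, 851, 1802, 2060]})

# The same pattern as the second attempt in the question: two calls on one Axes
fig, ax = plt.subplots()
df1.plot.bar(x='Zone', y='Total_MSP', ax=ax)
df2.plot.bar(x='Zone', y='CasasFavelas_2017', ax=ax)

# Left edges of the bars drawn by each call (first 5 patches, then the next 5)
first = [round(p.get_x(), 6) for p in ax.patches[:5]]
second = [round(p.get_x(), 6) for p in ax.patches[5:]]
print(first == second)  # identical positions: the second set overlays the first
```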
Data:
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477]})
df2 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
'CasasFavelas_2017': [463, 4228, 851, 1802, 2060]})
Merge the dataframes:
- Combine the DataFrames with pandas.merge.
df = pd.merge(df1, df2, on='Zone')
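pd.merge defaults to how='inner', so a zone that is present in only one DataFrame is silently dropped from the plot. A short sketch with a hypothetical df2_partial (zone 'S' removed, not data from the question) showing the difference with how='outer':

```python
import pandas as pd

df1 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                    'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477]})
# Hypothetical data: zone 'S' is missing from the second DataFrame
df2_partial = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O'],
                            'CasasFavelas_2017': [463, 4228, 851, 1802]})

inner = pd.merge(df1, df2_partial, on='Zone')               # default how='inner': 'S' is dropped
outer = pd.merge(df1, df2_partial, on='Zone', how='outer')  # keeps every zone, NaN where missing

print(len(inner), len(outer))
print(outer)
```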
Zone Total_MSP CasasFavelas_2017
0 C 464245 463
1 L 3764942 4228
2 N 1877505 851
3 O 1023160 1802
4 S 3179477 2060
Plot:
- Plot the DataFrame with pandas.DataFrame.plot.
- Use a log scale so that the much smaller CasasFavelas_2017 values are visible next to Total_MSP.
ax = df.plot(kind='bar', x='Zone', logy=True, rot=0)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
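For reference, the grouped layout can also be built by hand with plain matplotlib: each column's bars are shifted by a fixed offset around the zone position. This is a sketch of the idea, not pandas' internal implementation; the 0.8 group width is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for a headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                   'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477],
                   'CasasFavelas_2017': [463, 4228, 851, 1802, 2060]})

columns = ['Total_MSP', 'CasasFavelas_2017']
x = np.arange(len(df))          # one group per zone
width = 0.8 / len(columns)      # the bars of a group share 0.8 of each unit slot

fig, ax = plt.subplots()
for i, col in enumerate(columns):
    # shift each column so its bars sit side by side, centered on x
    offset = (i - (len(columns) - 1) / 2) * width
    ax.bar(x + offset, df[col], width, label=col)

ax.set_xticks(x)
ax.set_xticklabels(df['Zone'])
ax.set_yscale('log')            # same reason as logy=True above
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
```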
Update:
- The OP added additional data in an answer, after this answer was provided.
- Use pandas.concat to combine more than two DataFrames.
df12 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477]})
df13 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'ValorMedioDollar': [1852.27, 1291.53, 1603.44, 2095.90, 1990.10]})
df14 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'IDH2010': [0.89, 0.70, 0.79, 0.90, 0.80]})
df15 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'QtdNovasCasas': [96,1387, 561, 281, 416]})
# use concat to combine more than two DataFrames
df = pd.concat([df12.set_index('Zone'), df13.set_index('Zone'), df14.set_index('Zone'), df15.set_index('Zone')], axis=1)
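If you prefer to stay with merge, a common alternative (not the approach used above) is to fold pairwise merges over the list with functools.reduce. A sketch that checks it produces the same frame as the concat approach for this data:

```python
from functools import reduce

import pandas as pd

df12 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477]})
df13 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'ValorMedioDollar': [1852.27, 1291.53, 1603.44, 2095.90, 1990.10]})
df14 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'IDH2010': [0.89, 0.70, 0.79, 0.90, 0.80]})
df15 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'], 'QtdNovasCasas': [96, 1387, 561, 281, 416]})

frames = [df12, df13, df14, df15]

# merge(merge(merge(df12, df13), df14), df15), all on the shared 'Zone' key
merged = reduce(lambda left, right: pd.merge(left, right, on='Zone'), frames).set_index('Zone')

# the concat approach from above, written with a list comprehension
concatenated = pd.concat([f.set_index('Zone') for f in frames], axis=1)

print(merged.equals(concatenated))
```

Note that concat with axis=1 aligns on the index with an outer join by default, while merge defaults to an inner join; the results only coincide here because every DataFrame has the same five zones.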
Total_MSP ValorMedioDollar IDH2010 QtdNovasCasas
Zone
C 464245 1852.27 0.89 96
L 3764942 1291.53 0.70 1387
N 1877505 1603.44 0.79 561
O 1023160 2095.90 0.90 281
S 3179477 1990.10 0.80 416
# plot the DataFrame
ax = df.plot(kind='bar', logy=True, figsize=(8, 6), rot=0)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
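Because the four columns differ by several orders of magnitude, an alternative to the log scale (not what the OP asked for) is one subplot per column, each with its own linear y-axis. A sketch using the subplots and layout options of DataFrame.plot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for a headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477],
                   'ValorMedioDollar': [1852.27, 1291.53, 1603.44, 2095.90, 1990.10],
                   'IDH2010': [0.89, 0.70, 0.79, 0.90, 0.80],
                   'QtdNovasCasas': [96, 1387, 561, 281, 416]},
                  index=pd.Index(['C', 'L', 'N', 'O', 'S'], name='Zone'))

# One bar subplot per column in a 2x2 grid; each subplot is titled with its column name
axes = df.plot(kind='bar', subplots=True, layout=(2, 2), figsize=(10, 8),
               legend=False, rot=0)
plt.tight_layout()
```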
Adding Annotations:
- Not part of the original question.
Graph of the four merged DataFrames with custom colors, a legend, and value labels
import pandas as pd
import matplotlib.pyplot as plt
df12 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                     'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477]})
df13 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                     'ValorMedioDollar': [1852.27, 1291.53, 1603.44, 2095.90, 1990.10]})
df14 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                     'IDH2010': [0.89, 0.70, 0.79, 0.90, 0.80]})
df15 = pd.DataFrame({'Zone': ['C', 'L', 'N', 'O', 'S'],
                     'QtdNovasCasas': [96, 1387, 561, 281, 416]})
df16 = pd.merge(df12, df13, on='Zone')
df16 = pd.merge(df16, df14, on='Zone')
df16 = pd.merge(df16, df15, on='Zone')
fig, ax = plt.subplots(figsize=(50, 20))
# colors from https://xkcd.com/color/rgb/
colors2 = ['#448ee4', '#a9f971', '#ceb301', '#ffb7ce']
# The columns have very different magnitudes, so a log scale is used to make every bar visible.
df16.plot.bar(x='Zone', logy=True, color=colors2, ax=ax, width=0.5, align='center')
# legend
# https://stackoverflow.com/questions/19125722/adding-a-legend-to-pyplot-in-matplotlib-in-the-most-simple-manner-possible
ax.legend(('Total Resident Population-2017',
           'Median Value of square meter-Dollars US',
           'HDI- Human Development Index-2010',
           'Number of new housing properties-2018'), bbox_to_anchor=(0.87, 0.89), fontsize=28)
ax.set_title('Estimated Resident Population, Average value of square meter, HDI, New housing properties in São Paulo - Brazil', fontsize=40)
ax.set_xlabel('Names of the geographical subdivisions of São Paulo', fontsize=40)
ax.set_ylabel('Log Scale', fontsize=30)
# replace the zone codes on the x axis with descriptive names, without rotation
names = ['Zone: Center', 'Zone: East', 'Zone: North', 'Zone: West', 'Zone: South']
ax.set_xticklabels(names, fontsize=40, rotation=0)
# enlarge the y tick labels (changing rcParams after plotting does not affect an existing Axes)
ax.tick_params(axis='y', labelsize=30)
for spine in ax.spines.values():
    spine.set_visible(False)
# remove all tick marks, but keep the tick labels on the left and bottom
ax.tick_params(top=False, bottom=False, left=False, right=False, labelleft=True, labelbottom=True)
# label each bar with its value
for p in ax.patches:
    ax.text(p.get_x() + p.get_width() / 2, p.get_height() + 0.01, str(float(p.get_height())),
            ha='center', va='baseline', rotation=0, color='black', fontsize=25)
# save before show(): some backends close the figure when the window is closed
fig.savefig('GraficoMultiplo.jpg')
plt.show()
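With Matplotlib 3.4 or newer, the manual ax.text loop can be replaced by Axes.bar_label, which positions one label per bar of a BarContainer. A smaller sketch of that option (figure size, font size and the thousands-separator label format are arbitrary choices here, not taken from the code above):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for a headless run
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Total_MSP': [464245, 3764942, 1877505, 1023160, 3179477],
                   'ValorMedioDollar': [1852.27, 1291.53, 1603.44, 2095.90, 1990.10],
                   'IDH2010': [0.89, 0.70, 0.79, 0.90, 0.80],
                   'QtdNovasCasas': [96, 1387, 561, 281, 416]},
                  index=pd.Index(['C', 'L', 'N', 'O', 'S'], name='Zone'))

fig, ax = plt.subplots(figsize=(12, 6))
df.plot.bar(logy=True, ax=ax, rot=0)

# pandas draws one BarContainer per column, in column order
annotations = []
for container, col in zip(ax.containers, df.columns):
    labels = [f'{v:,}' for v in df[col].tolist()]  # e.g. 464245 -> '464,245'
    annotations.append(ax.bar_label(container, labels=labels, fontsize=8))
```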