How to plot medians of grouped data in pandas
Question:
Consider two histograms like the ones produced by this code:
from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400
y = np.random.randn(700)
df = DataFrame({'Letter': x, 'N': y})
df.hist('N', by='Letter')
I am trying to plot the median of each group on its histogram. I would also like to reverse the order of the plots (Group B on the left and Group A on the right).
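Answers:
One approach is to create the two subplots yourself, draw each group's histogram on the axis where you want it, and mark each median with axvline: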
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = ['A']*300 + ['B']*400
y = np.random.randn(700)
df = pd.DataFrame({'Letter': x, 'N': y})
# median of N for each group (a Series indexed by Letter)
medians = df.groupby('Letter')['N'].median()
colors = {'A': 'blue', 'B': 'green'}
# one row, two columns: B goes on the left axis, A on the right
fig, axs = plt.subplots(1, 2)
df[df['Letter'] == 'B'].hist('N', ax=axs[0], color=colors['B'], grid=False)
df[df['Letter'] == 'A'].hist('N', ax=axs[1], color=colors['A'], grid=False)
# dashed vertical line at each group's median
axs[0].axvline(medians['B'], color='r', linestyle='dashed', linewidth=1)
axs[1].axvline(medians['A'], color='r', linestyle='dashed', linewidth=1)
axs[0].set_title('Group B')
axs[1].set_title('Group A')
axs[0].set_xlabel('Value')
axs[1].set_xlabel('Value')
axs[0].set_ylabel('Frequency')
axs[1].set_ylabel('Frequency')
plt.tight_layout()
plt.show()
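The same idea can be written as a loop, so that the left-to-right order is just a list and nothing is repeated per group. This is only a sketch: the function name plot_group_histograms and its parameters are made up for illustration, it calls matplotlib's ax.hist directly instead of DataFrame.hist, and the fixed random seed is an assumption added so the result is reproducible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_group_histograms(df, order, value_col="N", group_col="Letter"):
    """Draw one histogram per group, left to right in `order`, with a median line.

    Hypothetical helper for illustration; not part of pandas or matplotlib.
    """
    # Median of value_col for each group (a Series indexed by group label)
    medians = df.groupby(group_col)[value_col].median()

    # squeeze=False always returns a 2D array of axes, even for a single group
    fig, axs = plt.subplots(1, len(order), squeeze=False)

    for ax, group in zip(axs[0], order):
        values = df.loc[df[group_col] == group, value_col]
        ax.hist(values)
        # Dashed vertical line at this group's median
        ax.axvline(medians[group], color="r", linestyle="dashed", linewidth=1)
        ax.set_title(f"Group {group}")
        ax.set_xlabel("Value")
        ax.set_ylabel("Frequency")

    fig.tight_layout()
    return fig, axs[0], medians


# Same data shape as in the question; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({"Letter": ["A"] * 300 + ["B"] * 400,
                   "N": rng.standard_normal(700)})

# Group B on the left, Group A on the right
fig, axs, medians = plot_group_histograms(df, order=["B", "A"])
```

Call plt.show() afterwards to display the figure. Changing order to ["A", "B"] restores the default layout, and the list can hold more than two groups if the data has them.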