Comparing two columns of data with the same categories in a plot?
Question:
I have a dataset where the MaterialState_manual and MaterialState_pipeline features have the same categories, as shown in the table, such as T61, 190c_250 and more. The number of precipitates in column Precipitates_manual is different from the number in column Precipitates_pipeline.
Now I want to create a histogram-style plot that shows, for every category in the MaterialState columns, Precipitates_manual vs Precipitates_pipeline side by side.
This is what I tried, but it does not show the categories:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
df['Precipitates_manual'].hist()
df['Precipitates_pipeline'].hist()
The output:
Answers:
You could reshape your dataframe by combining pandas.melt and pandas.DataFrame.groupby:
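df = pd.melt(frame = df,
             id_vars = 'MaterialState_manual',
             # melt only the two count columns, so other columns (e.g. MaterialState_pipeline) stay out of the sum
             value_vars = ['Precipitates_manual', 'Precipitates_pipeline'],
             var_name = 'Precipitates_type',
             value_name = 'Precipitates_value').groupby(by = ['MaterialState_manual', 'Precipitates_type']).sum().reset_index()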
This gives a dataframe like this one:
   MaterialState_manual      Precipitates_type  Precipitates_value
0            190c_1000h    Precipitates_manual                  54
1            190c_1000h  Precipitates_pipeline                  61
2           190c_25000h    Precipitates_manual                  90
3           190c_25000h  Precipitates_pipeline                  68
4            190c_2500h    Precipitates_manual                 111
5            190c_2500h  Precipitates_pipeline                 137
6             190c_250h    Precipitates_manual                 100
7             190c_250h  Precipitates_pipeline                  93
8            190c_5000h    Precipitates_manual                  77
9            190c_5000h  Precipitates_pipeline                  78
10                  T61    Precipitates_manual                  60
11                  T61  Precipitates_pipeline                  48
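To make the reshape easier to follow, here is a minimal sketch of the same melt and groupby steps on a four-row toy frame. The values are invented for illustration and are not taken from the question's data; the names wide, long_df and summary are placeholders.

```python
import pandas as pd

# Toy data, invented for illustration only
wide = pd.DataFrame({
    'MaterialState_manual': ['T61', 'T61', '190c_250h', '190c_250h'],
    'Precipitates_manual': [3, 5, 2, 4],
    'Precipitates_pipeline': [4, 4, 1, 6],
})

# melt stacks the two count columns into a single value column and
# records in Precipitates_type which column each value came from
long_df = pd.melt(frame=wide,
                  id_vars='MaterialState_manual',
                  value_vars=['Precipitates_manual', 'Precipitates_pipeline'],
                  var_name='Precipitates_type',
                  value_name='Precipitates_value')

# Summing per (category, type) pair leaves one row per bar to draw
summary = (long_df
           .groupby(by=['MaterialState_manual', 'Precipitates_type'])
           .sum()
           .reset_index())
print(summary)
```

The melted frame has twice as many rows as the input (one per original cell), and the summed frame has one row for each combination of category and precipitate column.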
With the data in this long format, you can simply use seaborn.barplot:
sns.barplot(data = df,
            ax = ax,
            x = 'MaterialState_manual',
            y = 'Precipitates_value',
            hue = 'Precipitates_type')
Complete Code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real dataset: 100 rows with random categories and counts
categories_list = ['T61', '190c_250h', '190c_1000h', '190c_2500h', '190c_5000h', '190c_25000h']
df_length = 100
np.random.seed(42)
df = pd.DataFrame()
df['MaterialState_manual'] = np.random.choice(a = categories_list, size = df_length, replace = True)
df['Precipitates_manual'] = np.random.randint(low = 1, high = 10, size = df_length)
df['Precipitates_pipeline'] = np.random.randint(low = 1, high = 10, size = df_length)

# Wide to long: one row per (category, precipitate column), with the counts summed
df = pd.melt(frame = df,
             id_vars = 'MaterialState_manual',
             # melt only the two count columns, so other columns (e.g. MaterialState_pipeline) stay out of the sum
             value_vars = ['Precipitates_manual', 'Precipitates_pipeline'],
             var_name = 'Precipitates_type',
             value_name = 'Precipitates_value').groupby(by = ['MaterialState_manual', 'Precipitates_type']).sum().reset_index()

# Grouped bars: one pair of bars (manual vs pipeline) per material state
fig, ax = plt.subplots()
sns.barplot(data = df,
            ax = ax,
            x = 'MaterialState_manual',
            y = 'Precipitates_value',
            hue = 'Precipitates_type')
plt.show()
Plot
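If you would rather check the numbers behind the bars, the same comparison can be sketched as a wide table with plain pandas, without melt. This is an alternative to the approach above, not part of it: the totals name and the difference column are assumptions added for illustration, and the data is the same kind of random stand-in as in the complete code.

```python
import numpy as np
import pandas as pd

# Random stand-in data, same shape as in the complete code (not real measurements)
categories_list = ['T61', '190c_250h', '190c_1000h', '190c_2500h', '190c_5000h', '190c_25000h']
df_length = 100
np.random.seed(42)
df = pd.DataFrame({
    'MaterialState_manual': np.random.choice(a=categories_list, size=df_length, replace=True),
    'Precipitates_manual': np.random.randint(low=1, high=10, size=df_length),
    'Precipitates_pipeline': np.random.randint(low=1, high=10, size=df_length),
})

# One row per category, one column per counting method
totals = (df
          .groupby('MaterialState_manual')[['Precipitates_manual', 'Precipitates_pipeline']]
          .sum()
          .reindex(categories_list))  # keep the order of categories_list instead of alphabetical order

# Hypothetical helper column: how far the pipeline count is from the manual count
totals['difference'] = totals['Precipitates_pipeline'] - totals['Precipitates_manual']
print(totals)
```

With matplotlib installed, calling totals[['Precipitates_manual', 'Precipitates_pipeline']].plot.bar() on this frame should also draw one pair of bars per category, which may be enough if seaborn is not available.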