Comparing two columns of data with the same categories in a plot?

Question:

I have a dataset where the MaterialState_manual and MaterialState_pipeline columns contain the same categories, as shown in the table, such as T61, 190c_250 and more. The number of precipitates in the Precipitates_manual column differs from the number in the Precipitates_pipeline column.

[Image: table of the dataset]

Now I want to create a histogram-style plot that shows, for every category in MaterialState, Precipitates_manual next to Precipitates_pipeline so the two can be compared.

Here is what I tried, but it does not show the categories.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas


df['Precipitates_manual'].hist()
df['Precipitates_pipeline'].hist()

The output:

[Image: output of the two hist() calls]

This is what I made in Excel, but I am still struggling to do the same in Python.
[Image: chart made in Excel]

Asked By: Zia

||

Answers:

You could reshape your dataframe from wide to long format by combining pandas.melt and pandas.DataFrame.groupby:

df = (pd.melt(frame = df,
              id_vars = 'MaterialState_manual',
              value_vars = ['Precipitates_manual', 'Precipitates_pipeline'],
              var_name = 'Precipitates_type',
              value_name = 'Precipitates_value')
        .groupby(by = ['MaterialState_manual', 'Precipitates_type'])
        .sum()
        .reset_index())

This gives a dataframe with one row per material state and precipitate type, like this one:

   MaterialState_manual      Precipitates_type  Precipitates_value
0            190c_1000h    Precipitates_manual                  54
1            190c_1000h  Precipitates_pipeline                  61
2           190c_25000h    Precipitates_manual                  90
3           190c_25000h  Precipitates_pipeline                  68
4            190c_2500h    Precipitates_manual                 111
5            190c_2500h  Precipitates_pipeline                 137
6             190c_250h    Precipitates_manual                 100
7             190c_250h  Precipitates_pipeline                  93
8            190c_5000h    Precipitates_manual                  77
9            190c_5000h  Precipitates_pipeline                  78
10                  T61    Precipitates_manual                  60
11                  T61  Precipitates_pipeline                  48
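
To make the reshaping step easier to follow, here is a minimal sketch on a tiny dataframe. The four rows and their values are made up for illustration and are not taken from the question's data; only the column names match.

```python
import pandas as pd

# Hypothetical toy data: two measurements per material state
wide_df = pd.DataFrame({
    'MaterialState_manual': ['T61', 'T61', '190c_250h', '190c_250h'],
    'Precipitates_manual': [3, 5, 2, 4],
    'Precipitates_pipeline': [4, 4, 1, 6],
})

# melt stacks the two precipitate columns into one value column,
# then groupby sums the values per state and per precipitate type
long_df = (pd.melt(frame = wide_df,
                   id_vars = 'MaterialState_manual',
                   value_vars = ['Precipitates_manual', 'Precipitates_pipeline'],
                   var_name = 'Precipitates_type',
                   value_name = 'Precipitates_value')
             .groupby(by = ['MaterialState_manual', 'Precipitates_type'])
             .sum()
             .reset_index())

print(long_df)
```

Each state ends up with two rows, one per precipitate column, which is the shape the hue argument of seaborn.barplot expects.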

With the data in this long format, you can simply use seaborn.barplot:

sns.barplot(data = df, 
            ax = ax, 
            x = 'MaterialState_manual', 
            y = 'Precipitates_value', 
            hue = 'Precipitates_type')
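
One thing to watch: groupby sorts the state labels as strings, so 190c_25000h ends up before 190c_2500h, as in the table above. If you would rather have the states in their natural order, one option is an ordered categorical. This is only a sketch: the order in state_order is an assumption copied from categories_list in the complete code, and the toy rows are hypothetical.

```python
import pandas as pd

# Assumed display order (copied from categories_list in the complete code)
state_order = ['T61', '190c_250h', '190c_1000h', '190c_2500h', '190c_5000h', '190c_25000h']

# Hypothetical toy data already in the long format
long_df = pd.DataFrame({
    'MaterialState_manual': ['190c_25000h', '190c_2500h', 'T61', '190c_250h'],
    'Precipitates_type': ['Precipitates_manual'] * 4,
    'Precipitates_value': [90, 111, 60, 100],
})

# An ordered categorical makes sorting follow state_order instead of string order
long_df['MaterialState_manual'] = pd.Categorical(long_df['MaterialState_manual'],
                                                 categories = state_order,
                                                 ordered = True)
long_df = long_df.sort_values(by = 'MaterialState_manual').reset_index(drop = True)

print(long_df['MaterialState_manual'].tolist())
```

Alternatively, seaborn.barplot accepts an order argument, so passing order = state_order to the call should give the same arrangement without touching the dataframe.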

Complete Code

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


categories_list = ['T61', '190c_250h', '190c_1000h', '190c_2500h', '190c_5000h', '190c_25000h']
df_length = 100


np.random.seed(42)
df = pd.DataFrame()
df['MaterialState_manual'] = np.random.choice(a = categories_list, size = df_length, replace = True)
df['Precipitates_manual'] = np.random.randint(low = 1, high = 10, size = df_length)
df['Precipitates_pipeline'] = np.random.randint(low = 1, high = 10, size = df_length)


df = (pd.melt(frame = df,
              id_vars = 'MaterialState_manual',
              value_vars = ['Precipitates_manual', 'Precipitates_pipeline'],
              var_name = 'Precipitates_type',
              value_name = 'Precipitates_value')
        .groupby(by = ['MaterialState_manual', 'Precipitates_type'])
        .sum()
        .reset_index())


fig, ax = plt.subplots()

sns.barplot(data = df,
            ax = ax,
            x = 'MaterialState_manual',
            y = 'Precipitates_value',
            hue = 'Precipitates_type')

plt.show()

Plot

[Image: resulting grouped bar plot]
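
If you prefer not to depend on seaborn, a similar grouped bar chart can be sketched with pandas alone. This is a different technique from the melt approach above: the data stays in wide format and pandas draws one bar per column. The synthetic data is generated the same way as in the complete code, so the values are not real measurements.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the complete code (not real measurements)
categories_list = ['T61', '190c_250h', '190c_1000h', '190c_2500h', '190c_5000h', '190c_25000h']
df_length = 100

np.random.seed(42)
df = pd.DataFrame()
df['MaterialState_manual'] = np.random.choice(a = categories_list, size = df_length, replace = True)
df['Precipitates_manual'] = np.random.randint(low = 1, high = 10, size = df_length)
df['Precipitates_pipeline'] = np.random.randint(low = 1, high = 10, size = df_length)

# Sum both precipitate columns per state; the result stays in wide format
value_columns = ['Precipitates_manual', 'Precipitates_pipeline']
totals = df.groupby(by = 'MaterialState_manual')[value_columns].sum()

# One group of bars per state, one bar per column (needs matplotlib installed)
ax = totals.plot.bar(rot = 45)
ax.set_ylabel('Precipitates_value')
```

Outside a notebook, import matplotlib.pyplot as plt and call plt.show() afterwards to display the figure.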

Answered By: Zephyr