DataFrame: get statistics computed over the rows of each value in a specific column
Question:
Hi, I have another question today.
I want to compute some statistics from my DataFrame, where each statistic is calculated over the rows that share the same ITEM value.
I asked a related question earlier; please refer to the images.
Sample:
import pandas as pd
data = {
    'ITEM': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'z', 'z', 'z', 'z'],
    'Stat.1': [37, 84, 54, 72, 49, 20, 64, 75, 90, 71, 84, 46, 78, 24, 34],
    'Stat.2': [27, 44, 43, 12, 92, 10, 24, 55, 50, 37, 44, 57, 38, 64, 17],
    'Stat.3': [17, 54, 64, 68, 39, 50, 34, 65, 40, 72, 44, 56, 48, 94, 37],
}
df = pd.DataFrame(data)
Answers:
Based on @Corralien’s comment, use groupby.transform to compute the per-group median, then join the result only on the last row of each ITEM group:
m = ~df['ITEM'].duplicated(keep='last')
out = df.join(df.groupby('ITEM').transform('median').add_prefix('Median of ')[m])
Output:
    ITEM  Stat.1  Stat.2  Stat.3  Median of Stat.1  Median of Stat.2  Median of Stat.3
 0     a      37      27      17               NaN               NaN               NaN
 1     a      84      44      54               NaN               NaN               NaN
 2     a      54      43      64               NaN               NaN               NaN
 3     a      72      12      68               NaN               NaN               NaN
 4     a      49      92      39              54.0              43.0              54.0
 5     b      20      10      50               NaN               NaN               NaN
 6     b      64      24      34               NaN               NaN               NaN
 7     b      75      55      65              64.0              24.0              50.0
 8     c      90      50      40               NaN               NaN               NaN
 9     c      71      37      72              80.5              43.5              56.0
10     d      84      44      44              84.0              44.0              44.0
11     z      46      57      56               NaN               NaN               NaN
12     z      78      38      48               NaN               NaN               NaN
13     z      24      64      94               NaN               NaN               NaN
14     z      34      17      37              40.0              47.5              52.0
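To see why the medians land only on the last row of each group, it may help to split the one-liner into its three steps. The sketch below uses a small made-up frame (not the data from the question), and the variable names `medians` and `is_last` are only illustrative:

```python
import pandas as pd

# Small illustrative frame (made-up values, not the question's data).
df = pd.DataFrame({
    'ITEM': ['a', 'a', 'b', 'b', 'b', 'c'],
    'Stat.1': [10, 20, 1, 5, 3, 7],
})

# Step 1: transform broadcasts each group's median back onto every row
# of that group, so the result has the same index as df.
medians = df.groupby('ITEM').transform('median').add_prefix('Median of ')

# Step 2: duplicated(keep='last') marks every row except the last
# occurrence of each ITEM; negating it selects only those last rows.
is_last = ~df['ITEM'].duplicated(keep='last')

# Step 3: keep the medians on the last rows only. join aligns on the
# index, so all other rows are filled with NaN.
out = df.join(medians[is_last])
print(out)
```

Because `join` aligns on the index, this relies on `df` having a unique index, which is the case for the default RangeIndex used here.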