Pandas: Flag groups and then change the data structure
Question:
Here is my raw data:
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})
I need to generate three group columns indicating whether each ID belongs to that group. The pivot function only reshapes the data, so it achieves just part of my goal. This is the output I want:
out_data = pd.DataFrame({'Year': [1991, 2000],
                         'Group a': ['Yes', 'Yes'],
                         'Group b': ['Yes', 'Yes'],
                         'Group c': ['Yes', 'No'],
                         'ID': ['A', 'B'],
                         'score': [6252, 2342]})
Answers:
This is a variant of pivot_table (df here is the question's raw_data):
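# any() is True wherever a (Year, ID, Group) combination has a non-zero score;
# combinations that never occur stay NaN and become 'No'
(raw_data
 .pivot_table(index=['Year', 'ID'], columns='Group', values='score', aggfunc=any)
 .replace({True: 'Yes'}).fillna('No')
 .add_prefix('Group_')
 .reset_index().rename_axis(columns=None)
)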
or crosstab:
(pd
 .crosstab([raw_data['Year'], raw_data['ID']], raw_data['Group'],
           values=raw_data['score'], aggfunc=any)
 .replace({True: 'Yes'}).fillna('No')
 .add_prefix('Group_')
 .reset_index().rename_axis(columns=None)
)
output:
   Year ID Group_a Group_b Group_c
0  1991  A     Yes     Yes     Yes
1  2000  B     Yes     Yes      No
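Neither variant above carries score into the result, although the desired output in the question keeps it. Here is a minimal sketch of one way to retain it, assuming score is constant within each (Year, ID) pair as in the sample data. It counts rows with pd.crosstab (no values argument, so missing combinations are 0) and turns the counts into flags. The names counts and out are illustrative only.

```python
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

# Count rows per (Year, ID, score) and Group; absent combinations count as 0.
counts = pd.crosstab([raw_data['Year'], raw_data['ID'], raw_data['score']],
                     raw_data['Group'])

out = (
    counts.gt(0)                                             # count > 0 means membership
    .apply(lambda col: col.map({True: 'Yes', False: 'No'}))  # booleans to labels
    .add_prefix('Group ')
    .reset_index()
    .rename_axis(columns=None)
)
print(out)
```

Because the flags come from row counts rather than from the score values, a score of 0 would still be reported as 'Yes'.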
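def function1(dd: pd.DataFrame):
    # Mark every row with 1, then spread the groups into columns
    return (dd.assign(col1=1)
              .pivot_table(index=['Year', 'ID', 'score'], columns='Group', values='col1')
              .add_prefix('Group '))

(raw_data.groupby(['Year', 'ID']).apply(function1)
    # groups an ID does not have are NaN once the per-ID results are combined
    .applymap(lambda x: 'Yes' if pd.notna(x) else 'No')  # renamed DataFrame.map in pandas >= 2.1
    .droplevel([0, 1]).reset_index())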
out:
Group  Year ID  score Group a Group b Group c
0      1991  A   6252     Yes     Yes     Yes
1      2000  B   2342     Yes     Yes      No
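The groupby/apply route above can probably be shortened. As a sketch only, assuming each (Year, ID) pair has a single score value as in the sample data, pd.get_dummies can create one indicator column per group, and a groupby max then collapses the rows for each ID. The names wide and out are hypothetical.

```python
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

# One 0/1 indicator column per group: 'Group a', 'Group b', 'Group c'
wide = pd.get_dummies(raw_data, columns=['Group'],
                      prefix='Group', prefix_sep=' ', dtype=int)

out = (
    wide.groupby(['Year', 'ID', 'score']).max()              # 1 if any row of the ID has the group
    .gt(0)
    .apply(lambda col: col.map({True: 'Yes', False: 'No'}))  # booleans to labels
    .reset_index()
)
print(out)
```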