How to stack a paired data structure like this?
Question:
I have a data frame that looks like this:
import pandas as pd

df_dict = {'FamID': [1, 2], 'Person_1': ['Husband', 'Granpa'], 'Person_2': ['Wife', 'Grandson'], 'Higher_income': [1, 0]}
df = pd.DataFrame(df_dict)
df = df.set_index('FamID')
It compares the income between household members: in the Higher_income column, 1 means Person_1 has the higher income and 0 means Person_2 does.
How can I stack this data frame so that the result looks like:
Answers:
You can use wide_to_long:
df = (pd.wide_to_long(df.reset_index(), ['Person'], i=['FamID'], j='Key', sep='_')
        .reset_index(level=1)
        .assign(Higher_income=lambda x: x['Higher_income'].ne(x['Key'] - 1).astype(int))
        .sort_index())
# You can append .drop(columns='Key') at the end to remove the helper column
Out[51]:
       Key  Higher_income    Person
FamID
1        1              1   Husband
1        2              0      Wife
2        1              0    Granpa
2        2              1  Grandson
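The one-liner above packs several steps together. As a rough sketch of the same idea, split into stages (the names long_df and result are illustrative and not from the answer, and the index is fully reset here so the comparison runs on a unique index):

```python
import pandas as pd

df = pd.DataFrame({
    'FamID': [1, 2],
    'Person_1': ['Husband', 'Granpa'],
    'Person_2': ['Wife', 'Grandson'],
    'Higher_income': [1, 0],
}).set_index('FamID')

# Step 1: reshape to long form; 'Key' holds the numeric suffix (1 or 2)
# of the Person_i column each row came from.
long_df = pd.wide_to_long(df.reset_index(), ['Person'],
                          i='FamID', j='Key', sep='_').reset_index()

# Step 2: Key - 1 is 0 for Person_1 and 1 for Person_2. The original flag
# is 1 when Person_1 earns more, so a row belongs to the higher earner
# exactly when the flag differs from Key - 1.
long_df['Higher_income'] = long_df['Higher_income'].ne(long_df['Key'] - 1).astype(int)

# Step 3: order rows by family and position, then drop the helper column.
result = (long_df.sort_values(['FamID', 'Key'])
                 .set_index('FamID')[['Person', 'Higher_income']])
print(result)
```

Sorting on both FamID and Key makes the row order within each family explicit instead of relying on how sort_index treats duplicate index labels.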
For your data:
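# df_tmp is assumed to be the data frame from the question, indexed by FamID
s = (df_tmp.filter(like='Person').stack()
           .reset_index(level=-1, drop=True)
           .reset_index(name='Person')
    )
# stack() interleaves the rows as Person_1, Person_2 for each family
s.loc[::2, 'Higher_income'] = df_tmp.Higher_income.values
s.loc[1::2, 'Higher_income'] = 1 - df_tmp.Higher_income.values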
Output:
   FamID    Person  Higher_income
0      1   Husband            1.0
1      1      Wife            0.0
2      2    Granpa            0.0
3      2  Grandson            1.0
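The two slice assignments yield floats because the new column starts out as NaN. If integer flags are preferred, one possible variation (a sketch that swaps in NumPy, not part of the answer above; the name flag is illustrative) is to build the interleaved column in one step:

```python
import numpy as np
import pandas as pd

# Assumed to be the data frame from the question, indexed by FamID.
df_tmp = pd.DataFrame({
    'FamID': [1, 2],
    'Person_1': ['Husband', 'Granpa'],
    'Person_2': ['Wife', 'Grandson'],
    'Higher_income': [1, 0],
}).set_index('FamID')

s = (df_tmp.filter(like='Person').stack()
           .reset_index(level=-1, drop=True)
           .reset_index(name='Person'))

flag = df_tmp['Higher_income'].to_numpy()
# Pair each family's flag for Person_1 with its complement for Person_2,
# then flatten row by row so the values line up with the stacked rows.
s['Higher_income'] = np.column_stack([flag, 1 - flag]).ravel()
print(s)
```

Like the original, this relies on exactly two Person columns per family and on stack() emitting them in column order.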
You can create dummy columns with the same suffixes to mark who has the higher income, then apply wide_to_long. This scales to many people, as long as the Higher_income label corresponds to the suffix of the Person_i column.
# Make labels match the Person_i suffixes (0 means Person_2).
df['Higher_income'] = df['Higher_income'].replace(0, 2)
# dtype=int keeps the dummies as 0/1; recent pandas versions default to booleans.
df = pd.get_dummies(df, columns=['Higher_income'], dtype=int).reset_index()
#   FamID Person_1  Person_2  Higher_income_1  Higher_income_2
#0      1  Husband      Wife                1                0
#1      2   Granpa  Grandson                0                1

(pd.wide_to_long(df, i='FamID', j='num', stubnames=['Person', 'Higher_income'], sep='_')
   .reset_index('num', drop=True))
#         Person  Higher_income
#FamID
#1       Husband              1
#2        Granpa              0
#1          Wife              0
#2      Grandson              1
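To illustrate the claim about scaling, here is a minimal sketch with hypothetical three-person households. The Person_3 column, its values, and the names wide, dummies and long_df are invented for the example. It assumes Higher_income already stores the suffix (1, 2 or 3) of the highest earner. Casting to a categorical with every suffix listed is one way to make get_dummies emit a column for each person even when a suffix never occurs in the data:

```python
import pandas as pd

# Hypothetical data: Higher_income holds the suffix of the top earner.
wide = pd.DataFrame({
    'FamID': [1, 2],
    'Person_1': ['Husband', 'Granpa'],
    'Person_2': ['Wife', 'Grandson'],
    'Person_3': ['Son', 'Granddaughter'],
    'Higher_income': [1, 3],
})

# Suffix 2 never appears as a label here; listing all categories still
# produces an all-zero Higher_income_2 column for it.
wide['Higher_income'] = pd.Categorical(wide['Higher_income'], categories=[1, 2, 3])
dummies = pd.get_dummies(wide, columns=['Higher_income'], dtype=int)

long_df = (pd.wide_to_long(dummies, stubnames=['Person', 'Higher_income'],
                           i='FamID', j='num', sep='_')
             .sort_index()                     # order by FamID, then num
             .reset_index('num', drop=True))
print(long_df)
```

Without the categorical step, a missing Higher_income_2 column would leave NaN for every Person_2 row after reshaping.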