How to group by the value of the dataframe?
Question:
I have two DataFrames with the same layout. In df1 the values are the payment amounts of each customer, and in df2 they are the customer's status for that period (the columns 1, 2, 3, 4 are periods):
df1:
customer|1|2|3|4
x |2|5|5|5
y | |5|5|5
z |5|5|5|
df2:
customer|1|2|3|4
x |N|E|E|E
y | |N|E|E
z |N|E|C|-
I want to group by the status, i.e. the values of df2, to get:
Status|1 |2 |3 |4
N     |7 |5 |  |
E     |  |10|10|10
C     |  |  |5 |
I used to get the status counts with
df2.apply(pd.value_counts).fillna(0)
but now, instead of counting the values, I want to sum the corresponding values from df1.
Answers:
As so often, this seems difficult because your DataFrames are in an awkward shape. If you first melt them, it becomes easy: just merge them, groupby your quantities of interest and sum them (and pivot again if you want to display the result in that format):
df1m = df1.melt(id_vars='customer', var_name='period', value_name='amount')
df2m = df2.melt(id_vars='customer', var_name='period', value_name='status')
dfm = df1m.merge(df2m)
res = dfm.groupby(['status', 'period'])['amount'].sum().reset_index()
res.pivot_table(index='status', columns='period', values='amount')
#period 1 2 3 4
#status
#C NaN NaN 5.0 NaN
#E NaN 10.0 10.0 10.0
#N 7.0 5.0 NaN NaN
To show what melt does: it unpivots the DataFrame, so you get one row per observation (customer, period) holding the amount or the status:
df1m
# customer period amount
#0 x 1 2.0
#1 y 1 NaN
#2 z 1 5.0
#3 x 2 5.0
#4 y 2 5.0
#5 z 2 5.0
#6 x 3 5.0
#7 y 3 5.0
#8 z 3 5.0
#9 x 4 5.0
#10 y 4 5.0
#11 z 4 NaN
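For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch. The way df1 and df2 are built is an assumption: the question only shows the printed tables, so the blank cells and the "-" are treated as missing values and the period columns are given string labels. The explicit on= in the merge and the name table are also additions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: blanks and "-" are NaN,
# and the period columns are labelled with the strings "1".."4").
df1 = pd.DataFrame({
    "customer": ["x", "y", "z"],
    "1": [2, np.nan, 5],
    "2": [5, 5, 5],
    "3": [5, 5, 5],
    "4": [5, 5, np.nan],
})
df2 = pd.DataFrame({
    "customer": ["x", "y", "z"],
    "1": ["N", np.nan, "N"],
    "2": ["E", "N", "E"],
    "3": ["E", "E", "C"],
    "4": ["E", "E", np.nan],
})

# Wide -> long: one row per (customer, period).
df1m = df1.melt(id_vars="customer", var_name="period", value_name="amount")
df2m = df2.melt(id_vars="customer", var_name="period", value_name="status")

# Pair each amount with the status of the same customer and period.
dfm = df1m.merge(df2m, on=["customer", "period"])

# Sum the amounts per (status, period); rows with a missing status are
# dropped by groupby by default.
res = dfm.groupby(["status", "period"])["amount"].sum().reset_index()

# Long -> wide again for display.
table = res.pivot_table(index="status", columns="period", values="amount")
print(table)
```

If the real data stores the "-" as a literal string rather than a missing value, it would presumably show up as a status row of its own, so it may be worth replacing it with NaN before melting.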