Transform rows categories to column while preserving rest of the data frame python

Question:

I have data frame as below

    Time    Groups  Entity  GC  Seg Category    Year    Quarter IndicatorName   Value
0   2021-06-01  KRO CO  P_GA    None    Model_Q2_2021   2021    2           yhat    568759.481223
1   2021-07-01  KRO CO  P_GA    None    Model_Q2_2021   2021    3           yhat    586003.965652
2   2021-08-01  KRO CO  P_GA    None    Model_Q2_2021   2021    3           yhat    583703.420655
3   2021-09-01  KRO CO  P_GA    None    Model_Q2_2021   2021    3           y       608601.857510
4   2021-10-01  KRO CO  P_GA    None    Model_Q2_2021   2021    4           y       628928.602344

I want to IndicatorName categories to make them columns in such a way that the corresponding value to them in addtion to the rest of the columns

I tried pivot, and melt but nothing gave me desired results.

The closed I have gone was with this

grouper = df.groupby('IndicatorName')
out = pd.concat([pd.Series(v['Value'].tolist(), name=k) for k, v in grouper], axis=1)

         y      yhat
0   8626.88 5.687595e+05
1   8215.30 5.860040e+05
2   8601.53 5.837034e+05
3   8145.16 6.086019e+05
4   9376.81 6.289286e+05
... ... ...
744 NaN 5.402358e+06
745 NaN 5.796123e+06
746 NaN 5.218829e+06
747 NaN 5.451504e+06

But I want to have all columns preserved and additional columns yhat and y

Any help/suggestion would be much appreciated.

Thanks in advance!

Asked By: user3459293

||

Answers:

add the two columns you want, and fill them with the value of df['value'] with a condition based on the value of df['IndicatorName']:

df['yhat'] = df['Value']*(df['IndicatorName'] == 'yhat')
df['y'] = df['Value']*(df['IndicatorName'] == 'y')

should get you:

    Time        (...) IndicatorName        Value      yhat         y
0   2021-06-01  (...)          yhat    568759.48 568759.48         0
1   2021-07-01  (...)          yhat    586003.96 586003.96         0
2   2021-08-01  (...)          yhat    583703.42 583703.42         0
3   2021-09-01  (...)             y    608601.85         0 608601.85
4   2021-10-01  (...)             y    628928.60         0 628928.60

Are you looking for pivot_table:

>>> (df.pivot_table(index=df.columns[:-2].tolist(), columns=['IndicatorName'], values='Value')
       .reset_index().rename_axis(columns=None))

        Time Groups Entity    GC   Seg       Category  Year  Quarter              y           yhat
0 2021-06-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        2            NaN  568759.481223
1 2021-07-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3            NaN  586003.965652
2 2021-08-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3            NaN  583703.420655
3 2021-09-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3  608601.857510            NaN
4 2021-10-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        4  628928.602344            NaN

You can change [:-2] by [:-1] if you want to keep the IndicatorName column.

Answered By: Corralien
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.