Long to wide format using a dictionary

Question:

I would like to make a long to wide transformation of my dataframe, starting from

match_id player goals home
1        John   1     home
1        Jim    3     home
...
2        John   0     away
2        Jim    2     away
...

ending up with:

match_id player_1 player_2 player_1_goals player_2_goals player_1_home player_2_home ...
1        John     Jim      1              3              home          home
2        John     Jim      0              2              away          away
...

Since I'm going to have columns with new names, I thought that maybe I should try to build a dictionary for that, where the outer key is the match id, like so:

dict = {
    1: {
        'player_1': 'John',
        'player_1_goals': 1,
        'player_1_home': 'home',
        'player_2': 'Jim',
        'player_2_goals': 3,
        'player_2_home': 'home',
    },
    2: {
        'player_1': 'John',
        'player_1_goals': 0,
        'player_1_home': 'away',
        'player_2': 'Jim',
        'player_2_goals': 2,
        'player_2_home': 'away',
    },
}

and then:

pd.DataFrame.from_dict(dict).T

In the real scenario, however, the number of players will vary, so I can't hardcode it.

Is this the best way of doing this using dictionaries? If so, how could I build this dict and populate it from my original pandas dataframe?

Asked By: 8-Bit Borges


Answers:

It looks like you want to pivot the dataframe. The problem is that there is no column in your dataframe that "enumerates" the players within each match. If you add such a column via the assign() method, then pivot() becomes easy.
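As a minimal sketch of that enumeration step on its own: the sample frame below is reconstructed from the rows shown in the question, and the column name ind is just a placeholder. groupby().cumcount() numbers the rows within each match, which is what gives pivot() something to spread into columns.

```python
import pandas as pd

# Sample data reconstructed from the question (only the rows shown there).
df = pd.DataFrame({
    'match_id': [1, 1, 2, 2],
    'player': ['John', 'Jim', 'John', 'Jim'],
    'goals': [1, 3, 0, 2],
    'home': ['home', 'home', 'away', 'away'],
})

# cumcount() numbers the rows inside each match_id group starting at 0;
# add(1) makes the numbering 1-based, astype(str) prepares it for joining into names.
numbered = df.assign(ind=df.groupby('match_id').cumcount().add(1).astype(str))
print(numbered)
```

Because the numbering restarts for every match_id, this works no matter how many players a match has.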

So far, it actually looks incredibly similar to this case here. The only difference is that you seem to need the column names formatted in a specific way, where the string "player" needs to be prepended to each column name. The set_axis() call below does that.

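res = (
    df
    .assign(
        # number the players within each match: '1', '2', ...
        ind=df.groupby('match_id').cumcount().add(1).astype(str)
    )
    # pivot() arguments are keyword-only in pandas 2.0 and later
    .pivot(index='match_id', columns='ind', values=['player', 'goals', 'home'])
    # flatten the (column, number) pairs into names like player_1 and player_1_goals
    .pipe(lambda x: x.set_axis([
        '_'.join([c, i]) if c == 'player' else '_'.join(['player', i, c])
        for (c, i) in x
    ], axis=1))
    .reset_index()
)

res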
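The question also asked how to build the dictionary itself when the number of players varies. This is not part of the answer above, just a hedged sketch of one way to do it: the helper name to_wide_dict and the sample frame are made up for illustration, and the data is reconstructed from the rows shown in the question.

```python
import pandas as pd

# Sample data reconstructed from the question (only the rows shown there).
df = pd.DataFrame({
    'match_id': [1, 1, 2, 2],
    'player': ['John', 'Jim', 'John', 'Jim'],
    'goals': [1, 3, 0, 2],
    'home': ['home', 'home', 'away', 'away'],
})

def to_wide_dict(frame):
    """Build {match_id: {new_column_name: value}} for any number of players."""
    wide = {}
    for match_id, group in frame.groupby('match_id', sort=False):
        row = {}
        # enumerate() plays the role of the hardcoded 1, 2, ... in the question
        for n, rec in enumerate(group.to_dict('records'), start=1):
            row[f'player_{n}'] = rec['player']
            row[f'player_{n}_goals'] = rec['goals']
            row[f'player_{n}_home'] = rec['home']
        wide[match_id] = row
    return wide

wide = to_wide_dict(df)

# orient='index' turns the outer keys into rows, so no transpose is needed.
res_from_dict = (
    pd.DataFrame.from_dict(wide, orient='index')
    .rename_axis('match_id')
    .reset_index()
)
print(res_from_dict)
```

Matches with fewer players than the largest one end up with NaN in the unused player_N columns. The columns come out grouped per player rather than per attribute, so reorder them if the exact layout from the question matters. On a large frame the pivot() approach is likely the faster of the two, since it avoids a Python-level loop.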
