Split Column of Semicolon-Separated Values and Duplicate Row with each Value in Pandas

Question:

I have a data frame with a row of data like this:

play_by_play = pd.DataFrame([{
    "players": "Tom Brady; Mike Evans; Tristan Wirfs; Leonard Fournette; Chris Godwin", 
               "down": 1, 
               "to_go": 10,
                "play_type": 'pass',
               "yards_gained": 8,
               "pass_attempt": 1,
               "complete_pass": 1,
               "rush_attempt": 0
              }])

I want to keep track of stats while the given players are on the field by using group by and aggregate functions. If I were looking to do this player-by-player, I would use play_by_play["players"].str.contains("Tom Brady") and aggregating the data using that filter, but I am looking to automate this. The solution I’ve landed on is to duplicate rows and have the "players" value split for each row. It would look something like this:

player down to_go play_type yards_gained pass_attempt complete_pass rush_attempt
"Tom Brady" 1 10 pass 8 1 1 0
"Mike Evans" 1 10 pass 8 1 1 0
"Tristan Wirfs" 1 10 pass 8 1 1 0
"Leonard Fournette" 1 10 pass 8 1 1 0
"Chris Godwin" 1 10 pass 8 1 1 0

How could I accomplish this? As I mentioned before, this needs to be pretty scalable, as there are thousands of rows of data. If there’s an easier way to group by based on unique values contained in a semicolon-separated column, I’m more than happy to go that route.

Asked By: jwald3

||

Answers:

You can use pandas.Series.str.split to make a list of the players then pandas.DataFrame.explode to make multiple rows :

play_by_play['players'] = play_by_play['players'].str.split(';')
play_by_play = play_by_play.explode('players').reset_index(drop=True)

# Output :

print(play_by_play)

              players  down  to_go play_type  yards_gained  pass_attempt  complete_pass  rush_attempt
0           Tom Brady     1     10      pass             8             1              1             0
1          Mike Evans     1     10      pass             8             1              1             0
2       Tristan Wirfs     1     10      pass             8             1              1             0
3   Leonard Fournette     1     10      pass             8             1              1             0
4        Chris Godwin     1     10      pass             8             1              1             0
Answered By: abokey
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.