Creating column based off values in separate row

Question:

Here is an example of my dataframe:

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

I want to create a new column "Club" which takes the string value of the cell in the "Player" and attaches it to the player below.

The tricky part is getting the right clubs assigned to the right players

This is my desired output:

df = pd.DataFrame([['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium', 'Arsenal FC'],
                   ['Jakub Kiwior', 22, 'Poland', 'Arsenal FC'],
                   ['Jorginho', 32, 'Italy', 'Arsenal FC'],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina', 'Chelsea FC'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine', 'Chelsea FC'],
                  ], columns=['Player', 'Age', 'Nat.', 'Club'])

I can’t find another question that relates to this problem. Is this possible in python?

Asked By: code_machine

||

Answers:

One option using boolean masks with mask and ffill:

# which rows are empty string on Age?
m1 = df['Age'].ne('')
# which row are not internal headers?
m2 = df['Player'].ne('Player')

out = df[m1&m2].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

             Player Age       Nat.        Club
2  Leandro Trossard  28    Belgium  Arsenal FC
3      Jakub Kiwior  22     Poland  Arsenal FC
4          Jorginho  32      Italy  Arsenal FC
7   Enzo Fernández   22  Argentina  Chelsea FC
8   Mykhaylo Mudryk  22    Ukraine  Chelsea FC

Intermediates:

             Player     m1        mask       ffill
0        Arsenal FC  False  Arsenal FC  Arsenal FC
2  Leandro Trossard   True         NaN  Arsenal FC
3      Jakub Kiwior   True         NaN  Arsenal FC
4          Jorginho   True         NaN  Arsenal FC
5        Chelsea FC  False  Chelsea FC  Chelsea FC
7   Enzo Fernández    True         NaN  Chelsea FC
8   Mykhaylo Mudryk   True         NaN  Chelsea FC

keeping the In/Age/Nat rows

# which rows are empty string on Age?
m1 = df['Age'].ne('')
# which row are not internal headers?
m2 = df['Player'].ne('In')

out = df[m1].assign(Club=df.loc[m2, 'Player'].mask(m1).ffill())

Output:

             Player  Age       Nat.        Club
1                In  Age       Nat.         NaN
2  Leandro Trossard   28    Belgium  Arsenal FC
3      Jakub Kiwior   22     Poland  Arsenal FC
4          Jorginho   32      Italy  Arsenal FC
6                In  Age       Nat.         NaN
7   Enzo Fernández    22  Argentina  Chelsea FC
8   Mykhaylo Mudryk   22    Ukraine  Chelsea FC
Answered By: mozway

Edited:

df = pd.DataFrame([['Arsenal FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Leandro Trossard', 28, 'Belgium'],
                   ['Jakub Kiwior', 22, 'Poland'],
                   ['Jorginho', 32, 'Italy'],
                   ['Chelsea FC', '', ''],
                   ['In', 'Age', 'Nat.'],
                   ['Enzo Fernández ', 22, 'Argentina'],
                   ['Mykhaylo Mudryk', 22, 'Ukraine'],
                  ], columns=['Player', 'Age', 'Nat.'])

clubs = []
current_club = None
for i, row in df.iterrows():
    if row['Player'] in ['Arsenal FC', 'Chelsea FC']:
        current_club = row['Player']
    elif row['Player'] == 'In':
        continue
    else:
        clubs.append(current_club)

df['Club'] = clubs

print(df)

Output:

           Player Age     Nat.       Club
0      Arsenal FC        Arsenal FC
1             In   Age     Nat.      NaN
2  Leandro Trossard  28   Belgium  Arsenal FC
3     Jakub Kiwior  22   Poland  Arsenal FC
4         Jorginho  32    Italy  Arsenal FC
5       Chelsea FC        Chelsea FC
6             In   Age     Nat.      NaN
7  Enzo Fernández   22 Argentina  Chelsea FC
8  Mykhaylo Mudryk  22   Ukraine  Chelsea FC

Edit 2: Multiple club names

clubs = ['Arsenal FC', 'Chelsea FC', 'Other Club 1', 'Other Club 2', ..., 'Other Club n']

def get_club(row, clubs):
    if row['Player'] in clubs:
        return row['Player']
    else:
        return ''

df['Club'] = ''
club = ''
for index, row in df.iterrows():
    if row['Player'] in clubs:
        club = row['Player']
    else:
        df.at[index, 'Club'] = club

df = df[df['Club'] != ''].reset_index(drop=True)
df['Club'] = df.apply(lambda x: get_club(x, clubs), axis=1)
Answered By: 6c00h
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.