Python: delete rows in a DataFrame by condition

Question:

I want to drop all rows in the ratings df where the team has no game, i.e. where the team does not occur in either the HomeTeam or the AwayTeam column of the fixtures df. This is what I tried:

import pandas as pd

fixtures = pd.DataFrame({'HomeTeam': ["Team1", "Team3", "Team5", "Team6"],
                         'AwayTeam': ["Team2", "Team4", "Team6", "Team8"]})

ratings = pd.DataFrame({'team': ["Team1", "Team2", "Team3", "Team4", "Team5", "Team6",
                                 "Team7", "Team8", "Team9", "Team10", "Team11", "Team12"],
                        'rating': ["1,5", "0,2", "0,5", "2", "3", "4,8",
                                   "0,9", "-0,4", "-0,6", "1,5", "0,2", "0,5"]})

ratings = ratings[(ratings.team != fixtures.HomeTeam) &
                  (ratings.team != fixtures.AwayTeam)]

but I get the error message:

ValueError: Can only compare identically-labeled Series objects

What can I do to stop the error from occurring?

Asked By: Robin Reiche


Answers:

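The error occurs because the Series you compare are not identically labeled: ratings.team has 12 rows (index 0 to 11), while fixtures.HomeTeam and fixtures.AwayTeam have only 4 (index 0 to 3), and pandas only allows != between two Series whose indexes match. Even with matching indexes, != would compare the rows position by position instead of checking whether a team appears anywhere in fixtures. Use isin() instead: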

ratings = ratings[~ratings.team.isin(fixtures.stack())]

#output
'''
    team    rating
6   Team7   0,9
8   Team9   -0,6
9   Team10  1,5
10  Team11  0,2
11  Team12  0,5

'''
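A minimal sketch that reproduces both problems with the question's data (the rating column is left out because it plays no role here). The names error_message and same_labels are only illustrative. It shows that mismatched index labels raise the ValueError, and that even with matching labels != compares row 0 with row 0, row 1 with row 1, and so on, rather than testing membership.

```python
import pandas as pd

# Same teams as in the question; the rating column is omitted for brevity
fixtures = pd.DataFrame({"HomeTeam": ["Team1", "Team3", "Team5", "Team6"],
                         "AwayTeam": ["Team2", "Team4", "Team6", "Team8"]})
ratings = pd.DataFrame({"team": [f"Team{i}" for i in range(1, 13)]})

# ratings.team is labelled 0..11, fixtures.HomeTeam is labelled 0..3.
# Comparison operators between two Series require identical labels, so this raises.
try:
    _ = ratings.team != fixtures.HomeTeam
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Can only compare identically-labeled Series objects

# With matching labels (first 4 rows only) the comparison runs, but it is positional:
# Team3 does play (as HomeTeam in row 1), yet it is compared with "Team5" and flagged True.
same_labels = ratings.team.iloc[:4] != fixtures.HomeTeam
print(same_labels.tolist())  # [False, True, True, True]
```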

Details:

print(fixtures.stack())
'''
0  HomeTeam    Team1
   AwayTeam    Team2
1  HomeTeam    Team3
   AwayTeam    Team4
2  HomeTeam    Team5
   AwayTeam    Team6
3  HomeTeam    Team6
   AwayTeam    Team8
dtype: object
'''

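As you can see, this returns every team name in fixtures, taken from both columns. isin() then marks each row of ratings.team with True if that team appears in this list, and the ~ operator inverts the mask, so only the teams that do not occur in fixtures are kept.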
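Note that the question text asks to drop the teams that have no game, whereas the posted code and the ~ mask above keep exactly those teams. A minimal sketch of both directions, reusing the question's data; the names has_game, with_game and without_game are illustrative. It also shows that isin() accepts any list-like, so flattening the two columns with to_numpy().ravel() works as an alternative to stack().

```python
import pandas as pd

fixtures = pd.DataFrame({"HomeTeam": ["Team1", "Team3", "Team5", "Team6"],
                         "AwayTeam": ["Team2", "Team4", "Team6", "Team8"]})
ratings = pd.DataFrame({"team": ["Team1", "Team2", "Team3", "Team4", "Team5", "Team6",
                                 "Team7", "Team8", "Team9", "Team10", "Team11", "Team12"],
                        "rating": ["1,5", "0,2", "0,5", "2", "3", "4,8",
                                   "0,9", "-0,4", "-0,6", "1,5", "0,2", "0,5"]})

# stack() flattens HomeTeam and AwayTeam into a single Series of team names
playing = fixtures.stack()

# True where the team appears in at least one fixture
has_game = ratings["team"].isin(playing)

# ~has_game keeps the teams WITHOUT a fixture (same as the answer's line)
without_game = ratings[~has_game]

# has_game alone keeps the teams WITH a fixture, i.e. drops teams that have no game
with_game = ratings[has_game]

# Alternative membership source: a flat NumPy array of both columns
alt = ratings["team"].isin(fixtures.to_numpy().ravel())

print(without_game["team"].tolist())  # ['Team7', 'Team9', 'Team10', 'Team11', 'Team12']
print(with_game["team"].tolist())     # ['Team1', 'Team2', 'Team3', 'Team4', 'Team5', 'Team6', 'Team8']
```

Which mask to use depends only on which set of rows you want to keep; the membership test itself is the same.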

Answered By: Clegane