Duplicating and Transforming Data in a DataFrame
Question:
I have a dataframe of football results. It is laid out in date order with a row for each game. Each row contains the name of the home team and away team in different columns along with the result.
I want to create a new dataframe that contains a series of all the games played by each team (home and away) in a column called "Team", a separate column for the Opponent and a third column for the Result.
Here’s an example of the original dataframe:
Date | Home | Away | Result |
---|---|---|---|
Sunday, 21 May 2017, 15:00 | A | B | A |
Thursday, 18 May 2017, 19:45 | C | D | D |
Wednesday, 17 May 2017, 19:45 | E | A | E |
Tuesday, 16 May 2017, 20:00 | B | C | Draw |
And this is what I want to achieve:
Date | Team | Opponent | Result |
---|---|---|---|
Sunday, 21 May 2017, 15:00 | A | B | A |
Wednesday, 17 May 2017, 19:45 | A | E | E |
Sunday, 21 May 2017, 15:00 | B | A | A |
Tuesday, 16 May 2017, 20:00 | B | C | Draw |
Tuesday, 16 May 2017, 20:00 | C | B | Draw |
Thursday, 18 May 2017, 19:45 | C | D | D |
Thursday, 18 May 2017, 19:45 | D | C | D |
Wednesday, 17 May 2017, 19:45 | E | A | E |
I am new to Pandas and don’t know where to start with this. Can anyone help?
Answers:
You can swap the Home and Away column names with rename, stack the swapped copy under the original with concat, and optionally order the rows with sort_values:
out = (
    pd.concat([df, df.rename(columns={'Home': 'Away', 'Away': 'Home'})])
    .sort_values(by=['Home', 'Away'], ignore_index=True)
)
Output:
                            Date Home Away Result
0     Sunday, 21 May 2017, 15:00    A    B      A
1  Wednesday, 17 May 2017, 19:45    A    E      E
2     Sunday, 21 May 2017, 15:00    B    A      A
3    Tuesday, 16 May 2017, 20:00    B    C   Draw
4    Tuesday, 16 May 2017, 20:00    C    B   Draw
5   Thursday, 18 May 2017, 19:45    C    D      D
6   Thursday, 18 May 2017, 19:45    D    C      D
7  Wednesday, 17 May 2017, 19:45    E    A      E
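The output above still carries the Home and Away headers, while the question asks for columns named Team and Opponent. A minimal sketch of one way to get those names, assuming the sample data from the question: rename each copy into the Team/Opponent vocabulary before concatenating. The names home_view and away_view are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": [
        "Sunday, 21 May 2017, 15:00",
        "Thursday, 18 May 2017, 19:45",
        "Wednesday, 17 May 2017, 19:45",
        "Tuesday, 16 May 2017, 20:00",
    ],
    "Home": ["A", "C", "E", "B"],
    "Away": ["B", "D", "A", "C"],
    "Result": ["A", "D", "E", "Draw"],
})

# Home side's point of view: Home becomes Team, Away becomes Opponent.
home_view = df.rename(columns={"Home": "Team", "Away": "Opponent"})
# Away side's point of view: the roles are swapped.
away_view = df.rename(columns={"Away": "Team", "Home": "Opponent"})

# concat aligns columns by name, so both views stack into the same columns.
out = (
    pd.concat([home_view, away_view])
    .sort_values(by=["Team", "Opponent"], ignore_index=True)
)

print(out)
```

Each game now appears twice, once per participating team, and the Result column is carried over unchanged. Note that the Date column is still plain text here, so sorting on it would be alphabetical rather than chronological.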