How to swap multiple values within a row based on a conditional in pandas
Question:
Description of my problem
I pulled some data from an api that is unfortunately formatted. In particular, there are four columns I need to utilize in some way to solve my issue. They are label_1
, odds_1
, label_2
, and line_2
. The odds_
columns can consist of either Over
or Under
. Ideally, I’d want one of the odds_
columns to consist of exclusively Over
while the other consists of exclusively Under
. But alas, that is not how the data was formatted, so I am tasked with doing this myself. Ideally, label_
would be all Over
while label_2
would be all under.
Here is a visual of my problem:
My data
odds_1 label_1 line_1 odds_2 label_2 line_2
-165 Under 3.5 130 Over 3.5
-137 Under 2.5 108 Over 2.5
-104 Over 10.5 -122 Under 10.5
-117 Over 26.5 -109 Under 26.5
100 Over 2.5 -125 Under 2.5
-117 Over 14.5 -109 Under 14.5
My desired output
odds_1 label_1 line_1 odds_2 label_2 line_2
130 Over 3.5 -165 Under 3.5
108 Over 2.5 -137 Under 2.5
-104 Over 10.5 -122 Under 10.5
-117 Over 26.5 -109 Under 26.5
100 Over 2.5 -125 Under 2.5
-117 Over 14.5 -109 Under 14.5
What I have tried
-
I tried using pandas.DataFrame.at, but that poses multiple problems. 1. I don’t believe there is a way to vectorize it, and 2. It doesn’t address the necessary swap of the odds_
values.
-
I have also tried numpy.where. This fixes the vectorization issue, but does not address the swap of the odds_
columns (at least the way I’m currently doing it). I haven’t discovered a way to nest np.where
to make this work, but maybe I’m missing something
df = np.where(df.label_1 == 'Under', 'Over', df.label_1)
odds_1 label_1 line_1 odds_2 label_2 line_2
-165 Over 3.5 130 Under 3.5
-137 Over 2.5 108 Under 2.5
-104 Over 10.5 -122 Under 10.5
-117 Over 26.5 -109 Under 26.5
100 Over 2.5 -125 Under 2.5
-117 Over 14.5 -109 Under 14.5
- I have also tried pandas.DataFrame.iterrows but can’t quite get that to work either. Still not sure how to address the swap of the
odds_
columns here.
for idx, row in df.iterrows():
row.label_1 = 'Over' where some conditional?
Hope I gave enough info. Any help would be greatly appreciated. Thanks!
Answers:
Try the following:
idx = df.label_1 == 'Under'
df.loc[idx, ['odds_1', 'label_1','odds_2', 'label_2']] = df.loc[idx, ['odds_2','label_2','odds_1','label_1']].values
You can swap values like a, b = b, a
using boolean mask to select right rows (the 2 first):
m = df['label_1'] == 'Under'
cols1 = ['odds_1', 'label_1', 'line_1']
cols2 = ['odds_2', 'label_2', 'line_2']
df.loc[m, cols1], df.loc[m, cols2] = df.loc[m, cols2].values, df.loc[m, cols1].values
Output:
>>> df
odds_1 label_1 line_1 odds_2 label_2 line_2
0 130 Over 3.5 -165 Under 3.5
1 108 Over 2.5 -137 Under 2.5
2 -104 Over 10.5 -122 Under 10.5
3 -117 Over 26.5 -109 Under 26.5
4 100 Over 2.5 -125 Under 2.5
5 -117 Over 14.5 -109 Under 14.5
Description of my problem
I pulled some data from an api that is unfortunately formatted. In particular, there are four columns I need to utilize in some way to solve my issue. They are label_1
, odds_1
, label_2
, and line_2
. The odds_
columns can consist of either Over
or Under
. Ideally, I’d want one of the odds_
columns to consist of exclusively Over
while the other consists of exclusively Under
. But alas, that is not how the data was formatted, so I am tasked with doing this myself. Ideally, label_
would be all Over
while label_2
would be all under.
Here is a visual of my problem:
My data
odds_1 label_1 line_1 odds_2 label_2 line_2
-165 Under 3.5 130 Over 3.5
-137 Under 2.5 108 Over 2.5
-104 Over 10.5 -122 Under 10.5
-117 Over 26.5 -109 Under 26.5
100 Over 2.5 -125 Under 2.5
-117 Over 14.5 -109 Under 14.5
My desired output
odds_1 label_1 line_1 odds_2 label_2 line_2
130 Over 3.5 -165 Under 3.5
108 Over 2.5 -137 Under 2.5
-104 Over 10.5 -122 Under 10.5
-117 Over 26.5 -109 Under 26.5
100 Over 2.5 -125 Under 2.5
-117 Over 14.5 -109 Under 14.5
What I have tried
-
I tried using pandas.DataFrame.at, but that poses multiple problems. 1. I don’t believe there is a way to vectorize it, and 2. It doesn’t address the necessary swap of the
odds_
values. -
I have also tried numpy.where. This fixes the vectorization issue, but does not address the swap of the
odds_
columns (at least the way I’m currently doing it). I haven’t discovered a way to nestnp.where
to make this work, but maybe I’m missing something
df = np.where(df.label_1 == 'Under', 'Over', df.label_1)
odds_1 label_1 line_1 odds_2 label_2 line_2
-165 Over 3.5 130 Under 3.5
-137 Over 2.5 108 Under 2.5
-104 Over 10.5 -122 Under 10.5
-117 Over 26.5 -109 Under 26.5
100 Over 2.5 -125 Under 2.5
-117 Over 14.5 -109 Under 14.5
- I have also tried pandas.DataFrame.iterrows but can’t quite get that to work either. Still not sure how to address the swap of the
odds_
columns here.
for idx, row in df.iterrows():
row.label_1 = 'Over' where some conditional?
Hope I gave enough info. Any help would be greatly appreciated. Thanks!
Try the following:
idx = df.label_1 == 'Under'
df.loc[idx, ['odds_1', 'label_1','odds_2', 'label_2']] = df.loc[idx, ['odds_2','label_2','odds_1','label_1']].values
You can swap values like a, b = b, a
using boolean mask to select right rows (the 2 first):
m = df['label_1'] == 'Under'
cols1 = ['odds_1', 'label_1', 'line_1']
cols2 = ['odds_2', 'label_2', 'line_2']
df.loc[m, cols1], df.loc[m, cols2] = df.loc[m, cols2].values, df.loc[m, cols1].values
Output:
>>> df
odds_1 label_1 line_1 odds_2 label_2 line_2
0 130 Over 3.5 -165 Under 3.5
1 108 Over 2.5 -137 Under 2.5
2 -104 Over 10.5 -122 Under 10.5
3 -117 Over 26.5 -109 Under 26.5
4 100 Over 2.5 -125 Under 2.5
5 -117 Over 14.5 -109 Under 14.5