Alternating column values

Question:

I am working on a project where my dataset looks like the one below:

Origin Destination Num_Trips
Hamburg Frankfurt 2
Hamburg Cologne 1
Cologne Hamburg 3
Frankfurt Hamburg 5

I am interested in only one direction, either "Hamburg – Frankfurt" or "Frankfurt – Hamburg", and want to add the two together as the number of trips made between these two locations. How can I do this in pandas so that only one of them remains in my dataset, with the total number of trips made between the two points in either direction?

Final Table:

Origin Destination Num_Trips
Hamburg Frankfurt 7
Hamburg Cologne 4

Thanks 🙂

Answers:

IIUC, you can try

cols = ['Origin', 'Destination']
vals = ['Hamburg', 'Frankfurt']

out = df[(df[cols] == vals).all(axis=1) | (df[cols] == vals[::-1]).all(axis=1)]
print(out)

      Origin Destination  distance
0    Hamburg   Frankfurt    393.34
3  Frankfurt     Hamburg    400.44
Answered By: Ynjxsjmh
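
The filter above only selects the rows for one city pair (its printed output shows a distance column rather than Num_Trips). A minimal sketch of how the same mask could be used to total the trips for that pair, assuming the Num_Trips data from the question; the names mask and total_trips are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

cols = ["Origin", "Destination"]
vals = ["Hamburg", "Frankfurt"]

# True for rows that match the pair in either direction
mask = (df[cols] == vals).all(axis=1) | (df[cols] == vals[::-1]).all(axis=1)

# Sum the trips over both directions of the selected pair
total_trips = int(df.loc[mask, "Num_Trips"].sum())
print(total_trips)
```

This handles one pair at a time; the groupby-based answers further down aggregate every pair in a single pass.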


import pandas as pd
dict1 = {'Origin':['Hamburg', 'Hamburg', 'Cologne', 'Frankfurt'],
        'Destination':['Frankfurt', 'Cologne', 'Hamburg', "Hamburg"],
        'distance':[393.34, 357.15, 358.24, 400.44]
       }
X = pd.DataFrame(dict1)

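## Please refer to this part
# Build an order-independent key from the sorted city pair
# (sorting the characters of the concatenated names could make different pairs collide)
X['Key'] = X[['Origin', 'Destination']].apply(lambda r: '|'.join(sorted(r)), axis=1)
X = X.drop_duplicates(subset=['Key'], keep='first').drop('Key', axis=1)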
Answered By: Arpit Tiwari

You can do the following (assuming you want to keep the first of the duplicate entries):

# Group on an order-independent key built from the sorted city pair
key = df[['Origin', 'Destination']].apply(
    lambda x: str(sorted((x['Origin'], x['Destination']))), axis=1)

out = (
    df.groupby(key)
      .agg({'Origin': 'first', 'Destination': 'first', 'Num_Trips': 'sum'})
      .reset_index(drop=True)
)

print(out)

    Origin Destination  Num_Trips
0  Hamburg     Cologne          4
1  Hamburg   Frankfurt          7
Answered By: SomeDude

Here's a simple solution to your problem:

import pandas as pd

data = {
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5]
}

df = pd.DataFrame(data)

# sorted() gives a deterministic key; the iteration order of a set is not guaranteed
df["Key"] = df[["Origin", "Destination"]].apply(lambda x: "|".join(sorted(x)), axis=1)
# Origin    Destination Num_Trips   Key
# Hamburg   Frankfurt   2           Frankfurt|Hamburg
# Hamburg   Cologne     1           Cologne|Hamburg
# Cologne   Hamburg     3           Cologne|Hamburg
# Frankfurt Hamburg     5           Frankfurt|Hamburg

df.groupby("Key").agg({"Origin": "first",
                       "Destination": "first",
                       "Num_Trips": "sum"}).reset_index(drop=True)

#   Origin  Destination Num_Trips
# 0 Hamburg Cologne     4
# 1 Hamburg Frankfurt   7
Answered By: Prashant
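
The answers above build the pair key row by row with apply. As a possible alternative, here is a rough sketch that sorts each city pair with NumPy and groups on the sorted columns directly. It is not taken from any of the answers, and it assumes it is acceptable for each pair to be reported in alphabetical order rather than in the order of its first occurrence; the name pair is illustrative:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

# Sort the two city names within each row so both directions look the same
pair = np.sort(df[["Origin", "Destination"]].to_numpy(dtype=object), axis=1)

out = (
    df.assign(Origin=pair[:, 0], Destination=pair[:, 1])
      .groupby(["Origin", "Destination"], as_index=False)["Num_Trips"]
      .sum()
)
print(out)
```

With this approach the Hamburg/Cologne pair comes out as Cologne, Hamburg with 4 trips, and the Hamburg/Frankfurt pair as Frankfurt, Hamburg with 7 trips.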