Alternating column values
Question:
I am working on a project where my dataset looks like below:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 2 |
Hamburg | Cologne | 1 |
Cologne | Hamburg | 3 |
Frankfurt | Hamburg | 5 |
I am only interested in one direction, either "Hamburg – Frankfurt" or "Frankfurt – Hamburg", and want to add the two together as the number of trips made between these two locations. How can I do this in pandas so that my dataset keeps just one of them, with the total number of trips made between the two points in either direction?
Final Table:
Origin | Destination | Num_Trips |
---|---|---|
Hamburg | Frankfurt | 7 |
Hamburg | Cologne | 4 |
Thanks 🙂
Answers:
IIUC, you can try
cols = ['Origin', 'Destination']
vals = ['Hamburg', 'Frankfurt']
out = df[(df[cols] == vals).all(axis=1) | (df[cols] == vals[::-1]).all(axis=1)]
print(out)
Origin Destination distance
0 Hamburg Frankfurt 393.34
3 Frankfurt Hamburg 400.44
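Note that the filter above only selects the rows for one pair of cities; it does not add them up. To get the total the question asks for, the `Num_Trips` values of the selected rows can be summed. A minimal sketch, assuming the `Num_Trips` column from the question rather than the `distance` column shown in the output above:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

cols = ["Origin", "Destination"]
vals = ["Hamburg", "Frankfurt"]

# Rows that match the pair in either direction
mask = (df[cols] == vals).all(axis=1) | (df[cols] == vals[::-1]).all(axis=1)

# Total trips between the two cities, regardless of direction
total = int(df.loc[mask, "Num_Trips"].sum())
print(total)  # 7
```

This handles one pair at a time, so it fits best when you only care about a single route.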
Arpit Tiwari (Data Scientist)
import pandas as pd
dict1 = {'Origin': ['Hamburg', 'Hamburg', 'Cologne', 'Frankfurt'],
         'Destination': ['Frankfurt', 'Cologne', 'Hamburg', 'Hamburg'],
         'distance': [393.34, 357.15, 358.24, 400.44]
         }
X = pd.DataFrame(dict1)
## Please refer to this part
# Build a direction-independent key by sorting the two city names in each row
X['Key'] = X[['Origin', 'Destination']].apply(lambda x: "|".join(sorted(x)), axis=1)
# Keep only the first row of each pair; the reverse-direction row is dropped, not summed
X = X.drop_duplicates(subset=['Key'], keep='first').drop('Key', axis=1)
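If `apply` becomes slow on a larger dataset, one alternative is to sort the two city names of each row with NumPy and group on the sorted pair. This is only a sketch, assuming the `Num_Trips` data from the question; it swaps `drop_duplicates` for a `groupby` sum, since the question asks for totals:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

# Sort the two names within each row so both directions look identical
pair = np.sort(df[["Origin", "Destination"]].to_numpy(dtype=object), axis=1)

out = (
    df.assign(Origin=pair[:, 0], Destination=pair[:, 1])
      .groupby(["Origin", "Destination"], as_index=False)["Num_Trips"]
      .sum()
)
print(out)
```

Each pair ends up in alphabetical order here (Cologne/Hamburg, Frankfurt/Hamburg), so the orientation can differ from the rows in the original data.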
You can do this (assuming you want to keep the first entry of each pair):
out = (
    df.groupby(
        df[['Origin', 'Destination']].apply(
            lambda x: str(sorted((x['Origin'], x['Destination']))), axis=1)
    ).agg(
        {'Origin': 'first',
         'Destination': 'first',
         'Num_Trips': 'sum'}
    ).reset_index(drop=True)
)
print(out)
Origin Destination Num_Trips
0 Hamburg Cologne 4
1 Hamburg Frankfurt 7
Here’s a simple solution to your problem:
import pandas as pd
data = {
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5]
}
df = pd.DataFrame(data)
# Sort the two names so both directions always produce the same key
# (iteration order of a set is not guaranteed, so sorted() is used instead)
df["Key"] = df[["Origin", "Destination"]].apply(lambda x: "|".join(sorted(x)), axis=1)
# Origin Destination Num_Trips Key
# Hamburg Frankfurt 2 Frankfurt|Hamburg
# Hamburg Cologne 1 Cologne|Hamburg
# Cologne Hamburg 3 Cologne|Hamburg
# Frankfurt Hamburg 5 Frankfurt|Hamburg
# Keep the first Origin/Destination of each key and sum the trips
df.groupby("Key").agg({"Origin": "first",
                       "Destination": "first",
                       "Num_Trips": "sum"}).reset_index(drop=True)
# Origin Destination Num_Trips
# 0 Hamburg Cologne 4
# 1 Hamburg Frankfurt 7
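`groupby` sorts the group keys by default, which is why the rows above come out in a different order than the Final Table in the question. A possible variation passes `sort=False` so that pairs keep their order of first appearance. `merge_directions` is a hypothetical helper name, not a pandas function, and the `|` separator assumes no city name contains that character:

```python
import pandas as pd

def merge_directions(df, a="Origin", b="Destination", value="Num_Trips"):
    """Hypothetical helper: sum `value` over both directions of each pair."""
    # Direction-independent key; assumes "|" never appears in a city name
    key = pd.Series(
        ["|".join(sorted(p)) for p in zip(df[a], df[b])],
        index=df.index,
    )
    return (
        df.groupby(key, sort=False)  # sort=False keeps first-appearance order
          .agg({a: "first", b: "first", value: "sum"})
          .reset_index(drop=True)
    )

# Sample data from the question
df = pd.DataFrame({
    "Origin": ["Hamburg", "Hamburg", "Cologne", "Frankfurt"],
    "Destination": ["Frankfurt", "Cologne", "Hamburg", "Hamburg"],
    "Num_Trips": [2, 1, 3, 5],
})

out = merge_directions(df)
print(out)
# Rows: (Hamburg, Frankfurt, 7) then (Hamburg, Cologne, 4)
```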