Fast way to iterate and apply a condition through a dataframe

Question:

I have a large dataframe such as below:

     vehicle id  delta
0            0      0
1            0     20
2            0     40
3            0    400
4            0     10
5            1      0
6            1     10
7            1    500
8            1     10
9            1     10
10           1    100
11           1     10

I want to add a new column, ‘Trip’, that starts at trip_1 for each vehicle and moves to the next trip number whenever the delta is more than 50, so the result would be as follows:

      vehicle id  delta   Trip
0            0      0     trip_1
1            0     20     trip_1
2            0     40     trip_1
3            0    400     trip_2
4            0     10     trip_2
5            1      0     trip_1
6            1     10     trip_1
7            1    500     trip_2
8            1     10     trip_2
9            1     10     trip_2
10           1    100     trip_3
11           1     10     trip_3

I’m thinking about using iterrows(), but I want to avoid it since the dataframe is huge. Any suggestions?

Asked By: Mohammad.sh


Answers:

Try something like this:

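# diff() flags a rise of more than 50 between consecutive rows of the same vehicle.
# The outer parentheses let the method chain continue onto the next line.
df['trip'] = 'Trip_' + (df.assign(tripid=(df.groupby('id')['delta'].diff() > 50).cumsum() + 1)
    .groupby('id')['tripid'].transform(lambda x: x.factorize()[0] + 1).astype(str))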

Output:

    vehicle  id  delta    trip
0         0   0      0  Trip_1
1         1   0     20  Trip_1
2         2   0     40  Trip_1
3         3   0    400  Trip_2
4         4   0     10  Trip_2
5         5   1      0  Trip_1
6         6   1     10  Trip_1
7         7   1    500  Trip_2
8         8   1     10  Trip_2
9         9   1     10  Trip_2
10       10   1    100  Trip_3
11       11   1     10  Trip_3
Answered By: Scott Boston
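
One thing to be aware of with the answer above: diff() tests the change in delta from the previous row, while the question is worded in terms of the delta value itself. Both rules give the same result on the sample data, but they can diverge. A minimal sketch, using made-up values that are not from the question (the frame `toy` and its numbers are purely illustrative):

```python
import pandas as pd

# Hypothetical values (not from the question): delta rises from 40 to 60.
toy = pd.DataFrame({"id": [0, 0, 0], "delta": [0, 40, 60]})

# Rule used in the answer above: change in delta versus the previous row
# of the same vehicle (the first row has a NaN diff, which compares False).
jump = (toy.groupby("id")["delta"].diff() > 50).astype(int)
by_diff = jump.groupby(toy["id"]).cumsum() + 1

# Rule as worded in the question: the delta value itself exceeds 50.
over = (toy["delta"] > 50).astype(int)
by_value = over.groupby(toy["id"]).cumsum() + 1

print(by_diff.tolist())   # [1, 1, 1]
print(by_value.tolist())  # [1, 1, 2]
```

Which rule is right depends on what delta means in your data, so it is worth checking a few rows by hand before running it on the full frame.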

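You can use np.select, which is much faster than looping. Note that it labels each row by the range its own delta falls in, so it does not produce a running trip count per vehicle.

Your example:

import numpy as np
delta = df["delta"]
# Conditions are checked in order; <= 50 makes sure delta == 50 is covered
condlist = [delta <= 50, (delta > 50) & (delta < 100), delta >= 100]
choicelist = ["trip_1", "trip_2", "trip_3"]
# A string default keeps the result dtype consistent for unmatched rows (e.g. NaN)
df["Trip"] = np.select(condlist, choicelist, default="")

Output

print(df)

      vehicle id  delta   Trip
0            0      0     trip_1
1            0     20     trip_1
2            0     40     trip_1
3            0    400     trip_3
4            0     10     trip_1
5            1      0     trip_1
6            1     10     trip_1
7            1    500     trip_3
8            1     10     trip_1
9            1     10     trip_1
10           1    100     trip_3
11           1     10     trip_1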
Answered By: Yefet

Try this:

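import numpy as np
# Per vehicle: flag rows with delta > 50, then take a running count starting at 1
df['Trip'] = 'trip_' + df.groupby('id')['delta'].transform(
    lambda grp: np.where(grp > 50, 1, 0).cumsum()+1).apply(str)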
print(df)

    vehicle  id  delta    Trip
0         0   0      0  trip_1
1         1   0     20  trip_1
2         2   0     40  trip_1
3         3   0    400  trip_2
4         4   0     10  trip_2
5         5   1      0  trip_1
6         6   1     10  trip_1
7         7   1    500  trip_2
8         8   1     10  trip_2
9         9   1     10  trip_2
10       10   1    100  trip_3
11       11   1     10  trip_3
Answered By: I'mahdi
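
The same per-vehicle running count can also be written without a lambda, which may help on a very large frame. This is only a sketch under assumptions: it rebuilds the sample data from the question, uses the column name `id` as in the answer above, and the helper name `new_trip` is made up for illustration.

```python
import pandas as pd

# Sample data from the question; the column name 'id' follows the answer above.
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "delta": [0, 20, 40, 400, 10, 0, 10, 500, 10, 10, 100, 10],
})

# 1 where a new trip starts (delta > 50), else 0.
new_trip = df["delta"].gt(50).astype(int)

# Running count of trip starts within each vehicle, offset so numbering begins at 1.
trip_no = new_trip.groupby(df["id"]).cumsum() + 1
df["Trip"] = "trip_" + trip_no.astype(str)

print(df)
```

Grouping the flag Series by `df["id"]` keeps everything in built-in groupby operations, so no Python function is called per group.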

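Here is one way to do it:

import numpy as np

# 1 where delta rose by more than 50 versus the previous row of the same vehicle, else 0;
# the per-vehicle running sum of those flags (+1) is the trip number
df['trip'] = df.assign(trip=
                     np.where(df.groupby(['vehicle_id'])['delta'].diff() > 50,
                              1,
                              0)).groupby(['vehicle_id'])['trip'].cumsum() + 1

df['trip'] = 'Trip_' + df['trip'].astype('str')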
df
    vehicle_id  delta   trip
0            0      0   Trip_1
1            0     20   Trip_1
2            0     40   Trip_1
3            0    400   Trip_2
4            0     10   Trip_2
5            1      0   Trip_1
6            1     10   Trip_1
7            1    500   Trip_2
8            1     10   Trip_2
9            1     10   Trip_2
10           1    100   Trip_3
11           1     10   Trip_3
Answered By: Naveed