Pandas – Fill the blank column with some value
Question:
I am looking to create a new column ‘isRunning’ in the data frame by identifying the ‘Task’ column status. Also, I have a ‘Session’ column and if a Session only has START value in the Task and no END vaule (for example, Session 3 has only START val), so ‘isRunning’ should have TRUE flag untill the end of that session.
Session
Task
isRunning
0
1
START
TRUE
1
1
TRUE
2
1
TRUE
3
1
TRUE
4
1
END
TRUE
5
1
FALSE
6
2
FALSE
7
2
START
TRUE
8
2
TRUE
9
2
END
TRUE
10
2
FALSE
11
2
FALSE
12
3
FALSE
13
3
START
TRUE
14
3
TRUE
15
3
TRUE
16
4
FALSE
17
4
START
TRUE
18
4
TRUE
19
4
TRUE
20
4
END
TRUE
Not sure how to move forward with adding an extra condition and identifying if the Task is running.
import pandas as pd
d = {'Session':[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'Task':['START', '', '', '', 'END', '', '', 'START', '', 'END', '', '', '', 'START', '', '', '', 'START', '', '', 'END']}
df = pd.DataFrame(data=d)
Answers:
Assuming one start/end per group. You can use a double mask:
m1 = df['Task'].eq('START').groupby(df['Session]').cummax()
m2 = df.loc[::-1, 'Task'].eq('END').groupby(df['Session']).cummax()
df['IsRunning'] = m1&m2
If several start/end per session are possible:
df['IsRunning'] = (df['Task'].map({'START': True, 'END': False})
.groupby(df['Session']).ffill()
.fillna(False)
) | df['Task'].eq('END')
Output:
Session Task IsRunning
0 1 START True
1 1 True
2 1 True
3 1 True
4 1 END True
5 1 False
6 2 False
7 2 START True
8 2 True
9 2 END True
10 2 False
11 2 False
12 3 False
13 3 START True
14 3 True
15 3 True
16 4 False
17 4 START True
18 4 True
19 4 True
20 4 END True
I am looking to create a new column ‘isRunning’ in the data frame by identifying the ‘Task’ column status. Also, I have a ‘Session’ column and if a Session only has START value in the Task and no END vaule (for example, Session 3 has only START val), so ‘isRunning’ should have TRUE flag untill the end of that session.
Session | Task | isRunning | |
---|---|---|---|
0 | 1 | START | TRUE |
1 | 1 | TRUE | |
2 | 1 | TRUE | |
3 | 1 | TRUE | |
4 | 1 | END | TRUE |
5 | 1 | FALSE | |
6 | 2 | FALSE | |
7 | 2 | START | TRUE |
8 | 2 | TRUE | |
9 | 2 | END | TRUE |
10 | 2 | FALSE | |
11 | 2 | FALSE | |
12 | 3 | FALSE | |
13 | 3 | START | TRUE |
14 | 3 | TRUE | |
15 | 3 | TRUE | |
16 | 4 | FALSE | |
17 | 4 | START | TRUE |
18 | 4 | TRUE | |
19 | 4 | TRUE | |
20 | 4 | END | TRUE |
Not sure how to move forward with adding an extra condition and identifying if the Task is running.
import pandas as pd
d = {'Session':[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'Task':['START', '', '', '', 'END', '', '', 'START', '', 'END', '', '', '', 'START', '', '', '', 'START', '', '', 'END']}
df = pd.DataFrame(data=d)
Assuming one start/end per group. You can use a double mask:
m1 = df['Task'].eq('START').groupby(df['Session]').cummax()
m2 = df.loc[::-1, 'Task'].eq('END').groupby(df['Session']).cummax()
df['IsRunning'] = m1&m2
If several start/end per session are possible:
df['IsRunning'] = (df['Task'].map({'START': True, 'END': False})
.groupby(df['Session']).ffill()
.fillna(False)
) | df['Task'].eq('END')
Output:
Session Task IsRunning
0 1 START True
1 1 True
2 1 True
3 1 True
4 1 END True
5 1 False
6 2 False
7 2 START True
8 2 True
9 2 END True
10 2 False
11 2 False
12 3 False
13 3 START True
14 3 True
15 3 True
16 4 False
17 4 START True
18 4 True
19 4 True
20 4 END True