Checking previous elements in a list with Python and storing a value in a new pandas column based on the previous element
Question:
list_Crashes = ['Startup', 'Crash in A', 'Shutdown', 'Crash in B', 'Crash in C', 'Startup', 'Crash in D',
'Startup', 'Crash in E', 'Crash in F', 'Crash in G', 'Shutdown', 'Crash in X', 'Crash in Y', 'Crash in Z']
I want to build a table with two columns, Crashes and State.
For each crash, the code should look back through the list for the most recent Startup or Shutdown entry.
For example, if a crash comes after a Startup, the State column for that crash is filled with Startup, as in the table below:
Crashes | State |
---|---|
Crash in A | Startup |
Crash in B | Shutdown |
Crash in C | Shutdown |
Crash in D | Startup |
Crash in E | Startup |
Crash in F | Startup |
Crash in G | Startup |
Crash in X | Shutdown |
Crash in Y | Shutdown |
Crash in Z | Shutdown |
The challenge I'm having is that the letters are random each time, so I have to match on "Crash in" in my code rather than on specific letters.
Any suggestions on how to do this?
EDIT: Real-life example (each line is an element of a list):
12:33:04.1753 | Startup Configuration dazdazdazd
12:35:15.0142 | Crash in A <546464>, thread 61
12:35:53.0396 | Crash in B <5>, 3e9fc dazdazd
12:35:54.1664 | Crash in C <70>,bfc690dasfff
12:35:55.3817 | Crash in D <80>,de5484sdazdazd
12:36:01.6642 | Crash in E <50>,bfc428fdsfsgdgsgsd
12:53:34.6462 | System Shutdown
12:53:48.1724 | Exception: Crash in Y <01>, 38310dazdazdafaga
Code used from @mozway's answer:

    def gen(lst):
        last_non_crash = ''
        for x in lst:
            if 'Crash in' in x:
                last_non_crash = x
            else:
                yield [x, last_non_crash]

    dataf = pd.DataFrame(gen(Crashtype), columns=['Crashes', 'State'])
Output :
Crashes State
0 12:53:34.6462 | [1230.490] System shutdownn 12:36:01.6642 | Exception: Crash in E<50>,...
Expected Output :
Crashes State
0 Crash in A Startup
1 Crash in B Startup
2 Crash in C Startup
3 Crash in D Startup
4 Crash in E Startup
5 Crash in Y Shutdown
Answers:
IIUC, you can use a generator:
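    import pandas as pd

    def gen(lst):
        # Remember the most recent entry that is not a crash (Startup or Shutdown).
        last_non_crash = ''
        for x in lst:
            if not x.startswith('Crash in'):
                last_non_crash = x
            else:
                # Pair each crash with the state that preceded it.
                yield [x, last_non_crash]

    pd.DataFrame(gen(list_Crashes), columns=['Crashes', 'State'])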
output:
Crashes State
0 Crash in A Startup
1 Crash in B Shutdown
2 Crash in C Shutdown
3 Crash in D Startup
4 Crash in E Startup
5 Crash in F Startup
6 Crash in G Startup
7 Crash in X Shutdown
8 Crash in Y Shutdown
9 Crash in Z Shutdown
input:
list_Crashes = ['Startup', 'Crash in A', 'Shutdown', 'Crash in B', 'Crash in C', 'Startup', 'Crash in D',
'Startup', 'Crash in E', 'Crash in F', 'Crash in G', 'Shutdown', 'Crash in X', 'Crash in Y', 'Crash in Z']
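The same "carry the last state forward" idea can also be sketched without an explicit loop. This is an alternative to the generator, not the answer's own method: it masks out the crash rows, forward-fills the remaining Startup/Shutdown markers, and then keeps only the crash rows. The names `s`, `is_crash`, `state`, and `result` are chosen here for illustration.

```python
import pandas as pd

list_Crashes = ['Startup', 'Crash in A', 'Shutdown', 'Crash in B', 'Crash in C',
                'Startup', 'Crash in D', 'Startup', 'Crash in E', 'Crash in F',
                'Crash in G', 'Shutdown', 'Crash in X', 'Crash in Y', 'Crash in Z']

s = pd.Series(list_Crashes)
is_crash = s.str.startswith('Crash in')

# Blank out the crash rows so only the state markers remain,
# then carry the latest marker forward onto the rows below it.
state = s.where(~is_crash).ffill()

# Keep only the crash rows, each paired with the state in effect at that point.
result = pd.DataFrame({'Crashes': s[is_crash], 'State': state[is_crash]})
result = result.reset_index(drop=True)
print(result)
```

If the list starts with a crash before any Startup or Shutdown, that row's State stays missing (NaN) here, whereas the generator would give it an empty string.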
Updated answer, for the real-life log lines:

    import re
    import pandas as pd

    def gen(lst):
        last_non_crash = ''
        for x in lst:
            # Reduce each raw line to its key token; lines without one become 'unknown'.
            m = re.search(r'(Crash in \w+|Shutdown|Startup)', x)
            x = m.group() if m else 'unknown'
            if 'Crash in' not in x:
                last_non_crash = x
            else:
                yield [x, last_non_crash]

    pd.DataFrame(gen(list_Crashes), columns=['Crashes', 'State'])
output:
Crashes State
0 Crash in A Startup
1 Crash in B Startup
2 Crash in C Startup
3 Crash in D Startup
4 Crash in E Startup
5 Crash in Y Shutdown
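To see what the regular expression contributes on its own, here is a small sketch that only does the extraction step. Four lines are taken from the question's log sample; the last one is a made-up line added to show the fallback. The helper name `normalize` is hypothetical.

```python
import re

log_lines = [
    '12:33:04.1753 | Startup Configuration dazdazdazd',
    '12:35:15.0142 | Crash in A <546464>, thread 61',
    '12:53:34.6462 | System Shutdown',
    '12:53:48.1724 | Exception: Crash in Y <01>, 38310dazdazdafaga',
    '12:54:00.0000 | some unrelated message',  # hypothetical line, not from the question
]

# \w+ matches the crash identifier (one or more word characters) after "Crash in ".
pattern = re.compile(r'Crash in \w+|Shutdown|Startup')

def normalize(line):
    """Return the key token found in a log line, or 'unknown' if there is none."""
    m = pattern.search(line)
    return m.group() if m else 'unknown'

labels = [normalize(line) for line in log_lines]
print(labels)
# ['Startup', 'Crash in A', 'Shutdown', 'Crash in Y', 'unknown']
```

Note that in the generator above, a line that normalizes to 'unknown' does not contain 'Crash in', so it would replace the remembered state. If the real log contains other message types, it may be safer to skip such lines instead.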
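Alternatively, with np.where and np.select on a DataFrame:

    import numpy as np

    # df1 is assumed to hold the raw log lines in a column named col2.
    def function1(x):
        # x is a list of tokens; return the three tokens starting at 'Crash' (e.g. 'Crash in A').
        return ' '.join(x[x.index('Crash'):x.index('Crash')+3]) if 'Crash' in x else ''

    col2 = df1.col2.str.split(' ')
    Startup = col2.map(lambda x: 'Startup' in x)
    Shutdown = col2.map(lambda x: 'Shutdown' in x)
    Crash = np.where(Startup | Shutdown, False, col2.map(function1))
    (df1.assign(Crash=Crash)
        .assign(State=np.select([Startup, Shutdown], ['Startup', 'Shutdown'], None))
        .ffill()
        .loc[Crash != False])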
out:
Crashes State
0 Crash in A Startup
1 Crash in B Startup
2 Crash in C Startup
3 Crash in D Startup
4 Crash in E Startup
5 Crash in Y Shutdown
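A shorter pandas-only variant of the same idea, offered as a sketch rather than as either answer's method, uses `Series.str.extract` to pull out the crash name and the state marker separately and then forward-fills the state. The log lines are the ones from the question's edit; the variable names are illustrative.

```python
import pandas as pd

log_lines = [
    '12:33:04.1753 | Startup Configuration dazdazdazd',
    '12:35:15.0142 | Crash in A <546464>, thread 61',
    '12:35:53.0396 | Crash in B <5>, 3e9fc dazdazd',
    '12:35:54.1664 | Crash in C <70>,bfc690dasfff',
    '12:35:55.3817 | Crash in D <80>,de5484sdazdazd',
    '12:36:01.6642 | Crash in E <50>,bfc428fdsfsgdgsgsd',
    '12:53:34.6462 | System Shutdown',
    '12:53:48.1724 | Exception: Crash in Y <01>, 38310dazdazdafaga',
]

s = pd.Series(log_lines)

# NaN wherever the line does not mention a crash.
crash = s.str.extract(r'(Crash in \w+)', expand=False)

# NaN wherever the line has no state marker; ffill carries the last marker down.
state = s.str.extract(r'(Startup|Shutdown)', expand=False).ffill()

result = (pd.DataFrame({'Crashes': crash, 'State': state})
            .dropna(subset=['Crashes'])
            .reset_index(drop=True))
print(result)
```

This assumes a crash line never contains the words Startup or Shutdown itself; if it can, the state pattern would need to be restricted to non-crash lines first.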