nested for loop/if statement in list comprehension
Question:
I have the following dataframe:
import pandas as pd
import numpy as np
d1 = {'atom_number': ["12,14,24", "23", "14,25,46", 20.3 , np.nan, "15,24"]}
df = pd.DataFrame(data=d1)
df
atom_number
0 12,14,24
1 23
2 14,25,46
3 20.3
4 NaN
5 15,24
I would like to split the values only when they are strings. With the following code I get an AttributeError:
df['atom_number'] = [[int(x) if type(s) == str else np.nan for x in s.split(',')] for s in df.atom_number]
df = df.dropna(subset = ["atom_number"])
AttributeError: 'float' object has no attribute 'split'
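The error comes from evaluation order. In a nested comprehension, the inner "for x in s.split(',')" clause has to evaluate s.split(',') before any x exists, so the if/else placed in front of int(x) never gets a chance to run when s is a float. A minimal pure-Python reproduction, assuming the column values as a plain list (the names values and message are only for illustration):

```python
import numpy as np

values = ["12,14,24", "23", "14,25,46", 20.3, np.nan, "15,24"]

message = None
try:
    # Same shape as the question's code: the if/else is inside the inner
    # comprehension, but s.split(',') is evaluated first, for every s.
    [[int(x) if type(s) == str else np.nan for x in s.split(',')] for s in values]
except AttributeError as err:
    # Raised on 20.3, the first non-string value.
    message = str(err)

print(message)  # 'float' object has no attribute 'split'
```

The check therefore has to wrap the whole inner comprehension rather than sit inside it.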
Desired output:
atom_number
0 [12, 14, 24]
1 [23]
2 [14, 25, 46]
3 [15, 24]
I know I can filter df down to the string values before using the list comprehension, but I would like to know how to do this inside the list comprehension itself.
Answers:
Test the type of each value s of the Series with isinstance:
df['atom_number'] = [[int(x) for x in s.split(',')]
if isinstance(s, str)
else np.nan for s in df.atom_number]
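The placement of the if is what matters. A conditional expression written before the for (X if cond else Y for s in seq) yields exactly one result per row, so the list can be assigned back to the column. A filter written after the for (X for s in seq if cond) skips rows, and pandas then rejects the shorter list. A small sketch on the question's data (the names mapped, filtered and assigned are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'atom_number': ["12,14,24", "23", "14,25,46", 20.3, np.nan, "15,24"]})

# Conditional expression before "for": one output per row, NaN for non-strings.
mapped = [[int(x) for x in s.split(',')] if isinstance(s, str) else np.nan
          for s in df.atom_number]

# Filter clause after "for": non-string rows are dropped from the list.
filtered = [[int(x) for x in s.split(',')]
            for s in df.atom_number if isinstance(s, str)]

print(len(df), len(mapped), len(filtered))  # 6 6 4

# The filtered list is shorter than the index, so the assignment fails.
try:
    df['atom_number'] = filtered
    assigned = True
except ValueError as err:
    assigned = False
    print('ValueError:', err)
```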
Your solution, with the if/else moved outside the inner comprehension:
df['atom_number'] = [[int(x) for x in s.split(',')]
if type(s) == str
else np.nan for s in df.atom_number]
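One practical difference between the two checks: type(s) == str matches only the exact str type, while isinstance also accepts subclasses. NumPy's string scalar np.str_ is such a subclass and can appear when values come from NumPy arrays. The same caveat applies to the isinstance(s, int) checks further down, because NumPy integers do not subclass int; numbers.Integral covers both. A quick check, assuming NumPy is installed:

```python
import numbers

import numpy as np

s = np.str_("12,14,24")      # numpy's string scalar subclasses Python str
print(type(s) == str)        # False: the exact type is numpy.str_
print(isinstance(s, str))    # True: isinstance accepts subclasses

n = np.int64(20)             # numpy integers are not subclasses of Python int
print(isinstance(n, int))                # False
print(isinstance(n, numbers.Integral))   # True: numpy registers its integer types
```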
Or use map to convert the split values to integers:
df['atom_number'] = [list(map(int, s.split(',')))
if type(s) == str
else np.nan for s in df.atom_number]
df = df.dropna(subset = ["atom_number"])
print (df)
atom_number
0 [12, 14, 24]
1 [23]
2 [14, 25, 46]
5 [15, 24]
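dropna keeps the original row labels, so the last row is labelled 5 here rather than 3 as in the desired output in the question. If consecutive labels are wanted, reset_index(drop=True) renumbers them. A sketch of the full pipeline with the map variant:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'atom_number': ["12,14,24", "23", "14,25,46", 20.3, np.nan, "15,24"]})

# One entry per row: a list of ints for strings, NaN for everything else.
df['atom_number'] = [list(map(int, s.split(','))) if isinstance(s, str) else np.nan
                     for s in df.atom_number]

# dropna keeps the surviving labels (0, 1, 2, 5); reset_index(drop=True)
# renumbers them 0..3 and discards the old labels.
df = df.dropna(subset=['atom_number']).reset_index(drop=True)
print(df)
```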
A more complex solution that removes float values from the comma-split strings and keeps single integers:
d1 = {'atom_number': ["12,14,24", "23", "14,25,46.5", 20.3 , np.nan, "15,24", 20]}
df = pd.DataFrame(data=d1)
df['atom_number'] = [[int(x) for x in s.split(',') if float(x).is_integer()]
if isinstance(s, str)
else [s]
if isinstance(s, int)
else np.nan for s in df.atom_number]
df = df.dropna(subset = ["atom_number"])
print (df)
atom_number
0 [12, 14, 24]
1 [23]
2 [14, 25]
5 [15, 24]
6 [20]
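One caveat with this version: float(x).is_integer() is True for a string such as "46.0", but int("46.0") then raises ValueError. If such values can occur (an assumption about the data, not something the question shows), converting through float avoids the error. parse_ints below is a hypothetical helper name:

```python
def parse_ints(s):
    """Hypothetical helper: integer-valued parts of a comma-separated string."""
    # int(float(x)) instead of int(x): "46.0" passes the is_integer() filter,
    # and int("46.0") would raise ValueError.
    return [int(float(x)) for x in s.split(',') if float(x).is_integer()]

print(parse_ints("14,25,46.5"))  # [14, 25]
print(parse_ints("14,25,46.0"))  # [14, 25, 46]

try:
    int("46.0")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '46.0'
```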
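pandas has a .apply method for modifying pandas objects such as DataFrames and Series value by value, and it can read more clearly than a list comprehension. It still calls the Python function once per value, though, so it is not inherently faster.
Also, isinstance is preferable to comparing an object's type with type(...) == ..., because isinstance also accepts subclasses.
For example: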
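import pandas as pd
import numpy as np
data = {'atom_number': ["12,14,24",
                        "23",
                        "14,25,46",
                        20.3,
                        np.nan,
                        "15,24"]}
df = pd.DataFrame(data=data)
def split_atom(x):
    # comma-separated string -> array of ints
    if isinstance(x, str):
        return np.array([int(i) for i in x.split(',')])
    # single integer -> one-element array
    elif isinstance(x, int):
        return np.array([x])
    # anything else (e.g. 20.3 or NaN) -> missing, so dropna removes it
    return np.nan
df['atom_number'] = df['atom_number'].apply(split_atom)
df = df.dropna()
print(df)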
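For comparison, pandas' .str accessor can do the splitting without an explicit type check: on an object column, .str.split returns NaN for non-string entries such as 20.3. This sketch is an alternative not taken from the answers above, and it still uses a Python-level loop for the int conversion:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'atom_number': ["12,14,24", "23", "14,25,46", 20.3, np.nan, "15,24"]})

# .str methods yield NaN for non-string values in an object column,
# so both 20.3 and NaN end up as NaN here.
parts = df['atom_number'].str.split(',')

# Drop the missing rows, then turn each list of digit strings into ints.
result = parts.dropna().apply(lambda items: [int(i) for i in items])
print(result)
```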