nested for loop/if statement in list comprehension

Question

I have the following dataframe:

import pandas as pd
import numpy as np

d1 = {'atom_number': ["12,14,24",  "23", "14,25,46", 20.3 , np.nan,  "15,24"]}
df = pd.DataFrame(data=d1)
df

    atom_number
0   12,14,24
1   23
2   14,25,46
3   20.3
4   NaN
5   15,24

I would like to split the values on commas when they are strings. Using the following code, I get an AttributeError:

df['atom_number'] = [[int(x) if type(s) == str else np.nan for x in s.split(',')] for s in df.atom_number] 
df = df.dropna(subset = ["atom_number"])

AttributeError: 'float' object has no attribute 'split'

desired output:

    atom_number
0   [12, 14, 24]
1   [23]
2   [14, 25, 46]
3   [15, 24]

I know I can filter df before using the list comprehension for string values, but I would like to know how this can be done in a list comprehension.

Asked By: Limmi


Answer 1

Test the type of each value s in the Series with isinstance:

df['atom_number'] = [[int(x) for x in s.split(',')] 
                     if isinstance(s, str) 
                     else np.nan for s in df.atom_number]

Your solution, with the type check moved outside the inner comprehension:

df['atom_number'] = [[int(x) for x in s.split(',')] 
                     if type(s) == str 
                     else np.nan for s in df.atom_number]

Or use map for integers:

df['atom_number'] = [list(map(int, s.split(',')))
                     if type(s) == str 
                     else np.nan for s in df.atom_number] 


df = df.dropna(subset = ["atom_number"])
print (df)
    atom_number
0  [12, 14, 24]
1          [23]
2  [14, 25, 46]
5      [15, 24]
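
The comprehensions above differ from the one in the question only in where the type check sits. The following is a minimal sketch of why that matters. It uses a plain Python list in place of df.atom_number and float("nan") in place of np.nan, and the helper names parse_original and parse_fixed are made up for illustration:

```python
import math

# Plain list standing in for the df.atom_number column (an assumption for this sketch).
values = ["12,14,24", "23", 20.3, float("nan"), "15,24"]


def parse_original(s):
    # Placement from the question: the condition is part of the inner
    # expression, so s.split(',') is evaluated first to build the iterable,
    # whatever the type of s is.
    return [int(x) if type(s) == str else float("nan") for x in s.split(',')]


def parse_fixed(s):
    # Placement from the answer: the condition chooses between the whole
    # inner comprehension and NaN, so split is only called on strings.
    return [int(x) for x in s.split(',')] if isinstance(s, str) else float("nan")


try:
    parse_original(20.3)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

parsed = [parse_fixed(s) for s in values]
```

In the original form the float reaches .split before the condition is ever checked, which is what raises the AttributeError.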

A more complex solution that drops non-integer values from the comma-separated strings and keeps standalone integers:

d1 = {'atom_number': ["12,14,24",  "23", "14,25,46.5", 20.3 , np.nan,  "15,24", 20]}
df = pd.DataFrame(data=d1)

df['atom_number'] = [[int(x) for x in s.split(',') if float(x).is_integer()] 
                      if isinstance(s, str) 
                      else [s] 
                      if isinstance(s, int) 
                      else np.nan for s in df.atom_number] 

df = df.dropna(subset = ["atom_number"])
print (df)
    atom_number
0  [12, 14, 24]
1          [23]
2      [14, 25]
5      [15, 24]
6          [20]
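
The chained conditional expression above can be hard to read. As a rough sketch, the same branching can be written as a helper function. The name parse_atoms is hypothetical, and float("nan") stands in for np.nan so the example needs no third-party imports:

```python
import math


def parse_atoms(s):
    """Hypothetical helper mirroring the chained conditional expression."""
    if isinstance(s, str):
        # Keep only parts whose numeric value is whole, e.g. drop "46.5".
        # As in the comprehension, a part such as "14.0" would pass the
        # filter and then make int(x) raise ValueError; that case is not handled.
        return [int(x) for x in s.split(',') if float(x).is_integer()]
    if isinstance(s, int):
        # A standalone integer becomes a one-element list.
        return [s]
    # Floats (including NaN) and anything else are marked as missing.
    return float("nan")


sample = ["12,14,24", "23", "14,25,46.5", 20.3, float("nan"), "15,24", 20]
parsed = [parse_atoms(s) for s in sample]
```

With a pandas column, such a function could be used as [parse_atoms(s) for s in df.atom_number], followed by the same dropna call.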

Answered By: jezrael

Answer 2

pandas has an .apply method, which is probably a better fit than a list comprehension for modifying pandas objects such as DataFrames or Series, and it reads more clearly.

Also, isinstance is preferable to comparing the type of an object to a type.

e.g.:

import pandas as pd
import numpy as np

data = {'atom_number': ["12,14,24",
                        "23",
                        "14,25,46",
                        20.3,
                        np.nan,
                        "15,24"]}
df = pd.DataFrame(data=data)


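def split_atom(x):
    if isinstance(x, str):
        # split on commas and convert each part to int
        return np.array([int(i) for i in x.split(',')])
    elif isinstance(x, int):
        # wrap a standalone integer in a one-element array
        return np.array([x])
    # other values (floats, NaN) return None, which dropna treats as missing


df.atom_number = df.atom_number.apply(split_atom)
df = df.dropna()  # assign the result; dropna does not modify df in place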
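
The remark about isinstance above can be illustrated with a small sketch. The AtomString class is hypothetical and exists only to show how the two checks differ for subclasses:

```python
class AtomString(str):
    """Hypothetical str subclass, used only to show the difference."""


s = AtomString("12,14,24")

# Comparing exact types fails for subclasses: type(s) is AtomString, not str.
exact_match = type(s) == str

# isinstance accepts subclasses, so a str subclass still counts as a string.
instance_match = isinstance(s, str)

# The same rule means bool values pass an isinstance(x, int) check,
# because bool is a subclass of int.
bool_is_int = isinstance(True, int)
```

For the data in the question both checks give the same result, since every string there is a plain str.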

Answered By: Paweł Pietraszko
