How to deal with errors in a list comprehension
Question:
I’m trying to work out the cleanest way to handle a list comprehension when the function it calls errors out for some reason.
Here’s an example that works:
import numpy as np
import pandas as pd

# Make up a dataframe
df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, 5897, 4856]})

# Define some function
def calc_some_stuff(input_df, city):
    # Filter on the dataframe that was passed in, not the global df
    temp_df = input_df[input_df.city == city]
    value = temp_df.population_m * temp_df.density_kms / 5
    # Take the single matching row explicitly before converting to int
    return {'city': city, 'value': int(value.iloc[0])}

# Use a list comprehension to cycle through cities calculating the random thing
cities = ['London', 'Paris', 'New York']
pd.DataFrame([calc_some_stuff(df, c) for c in cities])
There are a couple of ways this can break: NaNs or missing data.
### First type of break, replace df with this (so introduce nan)
df=pd.DataFrame({'city':['London','Paris','New York'],
'population_m':[6.7,7.2,12.1],
'density_kms':[5752,np.nan,4856]})
### Second type of break, missing data (introducing a new city without data here)
cities=['London','Paris','New York','Berlin']
I’ve tried some hacky solutions, using if 'value' in locals() else None, but that’s a big mess. I also tried catching the two types of errors with if, elif, else, but that gets really big and messy when the true function is much larger than my example one here.
The output I’m looking for (made up numbers) is:
city,value
London,4568
Paris,NA
New York,4862
Berlin,NA
Answers:
List comprehensions can be quite hard to debug, especially when they are nested. Here are a few tips I use when writing one:
- Write the logic first using for loops if it doesn’t work right away.
- Use print statements to debug the loop if you are not getting the expected output.
- Once it works and you get the expected output, convert the for loop syntax to a list comprehension.
- Extra: you can try breaking down nested list comprehensions by moving some of the logic into a function, as you do. Another option is to keep the logic explicit and stick to a for loop, so it stays readable for your future self and anyone else who might read your code.
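As a rough sketch of that workflow, here is a loop-first version followed by the equivalent comprehension. To keep it runnable without pandas, it uses a plain dict as a stand-in for the question’s dataframe; the names data, calc_value and safe_value are made up for illustration and are not from the original code.

```python
# Plain-dict stand-in for the question's dataframe (values copied from the question).
data = {
    "London": {"population_m": 6.7, "density_kms": 5752},
    "Paris": {"population_m": 7.2, "density_kms": float("nan")},
    "New York": {"population_m": 12.1, "density_kms": 4856},
}
cities = ["London", "Paris", "New York", "Berlin"]


def calc_value(city):
    """Hypothetical helper: KeyError for an unknown city, ValueError for NaN."""
    row = data[city]
    return int(row["population_m"] * row["density_kms"] / 5)


# Step 1 and 2: explicit for loop, with print statements to see what fails.
loop_results = []
for city in cities:
    try:
        value = calc_value(city)
    except (KeyError, ValueError) as exc:
        print(f"{city}: failed with {type(exc).__name__}: {exc}")
        value = None
    loop_results.append({"city": city, "value": value})


# Step 3: try/except cannot appear inside a comprehension, so the
# error handling moves into a small function and the loop collapses.
def safe_value(city):
    try:
        return calc_value(city)
    except (KeyError, ValueError):
        return None


comp_results = [{"city": c, "value": safe_value(c)} for c in cities]
```

Both lists hold the same rows; the loop version is just easier to inspect while you are still working out which inputs fail and why.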
The simplest way would be to catch exceptions in your custom function. The problem isn’t really related to the list comprehension, but to the fact that your function cannot handle undefined data.
import numpy as np
import pandas as pd

# Make up a dataframe
### First type of break: a NaN in the data
df = pd.DataFrame({'city': ['London', 'Paris', 'New York'],
                   'population_m': [6.7, 7.2, 12.1],
                   'density_kms': [5752, np.nan, 4856]})

### Second type of break: missing data (a new city without data)
cities = ['London', 'Paris', 'New York', 'Berlin']

# Define some function
def calc_some_stuff(input_df, city):
    # Filter on the dataframe that was passed in, not the global df
    temp_df = input_df[input_df.city == city]
    try:
        value = temp_df.population_m * temp_df.density_kms / 5
        # .iloc[0] raises IndexError when the city has no row;
        # int() raises ValueError when the value is NaN
        return {'city': city, 'value': int(value.iloc[0])}
    except (IndexError, ValueError, TypeError):
        return {'city': city, 'value': np.nan}

# Use a list comprehension to cycle through cities calculating the random thing
print(pd.DataFrame([calc_some_stuff(df, c) for c in cities]))
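If the real function is much larger, one possible variation is to keep the try/except out of its body and put it in a small decorator instead. This is only a sketch under assumptions: fallback_on_error is a hypothetical helper, not part of pandas or the standard library, and rows is a plain-dict stand-in for the dataframe so the example runs without pandas.

```python
import functools
import math


def fallback_on_error(exceptions, make_default):
    """Hypothetical decorator: on one of `exceptions`, return make_default(*args)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions:
                # Build the fallback from the same arguments the function received
                return make_default(*args, **kwargs)
        return wrapper
    return decorator


# Plain-dict stand-in for the dataframe (values copied from the question).
rows = {
    "London": {"population_m": 6.7, "density_kms": 5752},
    "Paris": {"population_m": 7.2, "density_kms": math.nan},
    "New York": {"population_m": 12.1, "density_kms": 4856},
}
cities = ["London", "Paris", "New York", "Berlin"]


@fallback_on_error(
    (KeyError, ValueError),
    lambda rows, city: {"city": city, "value": math.nan},
)
def calc_some_stuff(rows, city):
    row = rows[city]  # KeyError when the city is missing
    # int() raises ValueError when the product is NaN
    return {"city": city, "value": int(row["population_m"] * row["density_kms"] / 5)}


results = [calc_some_stuff(rows, c) for c in cities]
```

The function body stays free of error handling, and the list of caught exceptions lives in one place. The trade-off is that a too-broad exception tuple can hide real bugs, so it is worth listing only the errors you expect.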