Removing a nan from a list
Question:
While trying to work on a project with pandas I have run into a problem. I had a list with a nan value in it and I couldn’t remove it.
I have tried:
incoms=data['int_income'].unique().tolist()
incoms.remove('nan')
But it didn’t work:
list.remove(x): x not in list
The list incoms is as follows:
[75000.0, 50000.0, 0.0, 200000.0, 100000.0, 25000.0, nan, 10000.0, 175000.0, 150000.0, 125000.0]
Answers:
I think you need dropna to remove the NaNs:
incoms = data['int_income'].dropna().unique().tolist()
print(incoms)
[75000.0, 50000.0, 0.0, 200000.0, 100000.0, 25000.0, 10000.0, 175000.0, 150000.0, 125000.0]
And if all the remaining values are whole numbers, you can cast them to int:
incoms = data['int_income'].dropna().astype(int).unique().tolist()
print(incoms)
[75000, 50000, 0, 200000, 100000, 25000, 10000, 175000, 150000, 125000]
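As a rough, self-contained sketch of the dropna approach above: the DataFrame below is a made-up stand-in (only the column name int_income comes from the question), so the values are assumptions for illustration. It also hints at why dropna comes before astype(int): a float column that still holds NaN cannot be cast to int.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
data = pd.DataFrame({"int_income": [75000.0, 50000.0, np.nan, 0.0, 50000.0, np.nan]})

# unique() keeps NaN as one of the distinct values.
with_nan = data["int_income"].unique().tolist()

# Dropping NaN first leaves only real numbers, in order of first appearance.
without_nan = data["int_income"].dropna().unique().tolist()

# With the NaNs gone, the float column can be cast to int.
as_ints = data["int_income"].dropna().astype(int).unique().tolist()

print(with_nan)
print(without_nan)
print(as_ints)
```

Here without_nan should come out as [75000.0, 50000.0, 0.0] and as_ints as [75000, 50000, 0].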
Or remove the NaNs by selecting all non-NaN values with numpy.isnan:
import numpy as np
a = data['int_income'].unique()
# keep only the entries where isnan is False
incoms = a[~np.isnan(a)].tolist()
print(incoms)
[75000.0, 50000.0, 0.0, 200000.0, 100000.0, 25000.0, 10000.0, 175000.0, 150000.0, 125000.0]
a = data['int_income'].unique()
incoms = a[~np.isnan(a)].astype(int).tolist()
print(incoms)
[75000, 50000, 0, 200000, 100000, 25000, 10000, 175000, 150000, 125000]
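To make the boolean-mask idea a bit more concrete, here is a small sketch on a hand-written NumPy array (the values are invented for illustration and are not the asker's data). np.isnan returns one boolean per element, and ~ inverts it so that True marks the values to keep.

```python
import numpy as np

# Made-up array standing in for the result of Series.unique().
a = np.array([75000.0, np.nan, 0.0, 25000.0])

# True where the value is a real number, False where it is NaN.
mask = ~np.isnan(a)

# Boolean indexing keeps only the positions where mask is True.
incoms = a[mask].tolist()

print(mask.tolist())
print(incoms)
```

Note that np.isnan expects a numeric array; on an object-dtype array it would likely raise a TypeError, in which case pd.notnull is the safer check.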
Pure Python solution (slower for a big DataFrame):
import pandas as pd
incoms = [x for x in set(data['int_income']) if pd.notnull(x)]
print(incoms)
[0.0, 100000.0, 200000.0, 25000.0, 125000.0, 50000.0, 10000.0, 150000.0, 175000.0, 75000.0]
incoms = [int(x) for x in set(data['int_income']) if pd.notnull(x)]
print(incoms)
[0, 100000, 200000, 25000, 125000, 50000, 10000, 150000, 175000, 75000]
You can simply build a cleaned list that leaves out every value whose string representation is 'nan'.
The code would be:
incoms = [incom for incom in incoms if str(incom) != 'nan']
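A quick sketch of how that string comparison behaves, using an invented list rather than the asker's data. It works because str() of a float NaN is 'nan', but as a side effect it would also drop a literal string "nan" if one happened to be in the list.

```python
# Made-up list: two numbers, a float NaN, and the literal string "nan".
incoms = [75000.0, float("nan"), 10000.0, "nan"]

# str(float("nan")) is 'nan', so the float NaN is filtered out...
cleaned = [incom for incom in incoms if str(incom) != "nan"]

# ...and so is the string "nan", which may or may not be what you want.
print(cleaned)
```

For a list that only holds floats, as in the question, that side effect does not matter.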
In this particular case, you can also drop the NaNs earlier, so the list never contains them:
incoms=data['int_income'].dropna().unique().tolist()
If you arrived at this thread looking to remove NaNs from a plain Python list (not a pandas DataFrame), the easiest way is a list comprehension that filters out NaNs:
import math
new_list = [x for x in my_list if not (isinstance(x, float) and math.isnan(x))]
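As a small illustration of why the isinstance check is there: math.isnan raises a TypeError for values such as strings or None, so the guard lets mixed-type lists pass through safely. The helper name drop_nans and the sample list are made up for this sketch.

```python
import math


def drop_nans(values):
    """Return a new list without float NaN entries (hypothetical helper)."""
    # The isinstance guard keeps math.isnan away from non-float items.
    return [x for x in values if not (isinstance(x, float) and math.isnan(x))]


my_list = [1, "a", float("nan"), None, 2.5]
new_list = drop_nans(my_list)
print(new_list)  # [1, 'a', None, 2.5]
```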
Or filter out NaNs by using the fact that NaN is not equal to itself:
new_list = [x for x in my_list if x == x]
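A minimal sketch of what is going on, with an invented list. It shows both why the original remove('nan') call fails (the list holds a float NaN, not the string 'nan') and why the x == x filter works (NaN compares unequal to everything, including itself).

```python
nan = float("nan")
print(nan == nan)  # False: NaN is never equal to itself

my_list = [75000.0, float("nan"), 10000.0]

# The list contains a float NaN, not the string 'nan', so remove() finds no match.
try:
    my_list.remove("nan")
    remove_failed = False
except ValueError:
    remove_failed = True
print(remove_failed)  # True

# Every ordinary number equals itself; only NaN fails this test.
new_list = [x for x in my_list if x == x]
print(new_list)  # [75000.0, 10000.0]
```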