Dropping infinite values from dataframes in pandas?
Question:
How do I drop nan, inf, and -inf values from a DataFrame without resetting mode.use_inf_as_null?
Can I tell dropna to include inf in its definition of missing values so that the following works?
df.dropna(subset=["col1", "col2"], how="all")
Answers:
First replace() infs with NaN:
df.replace([np.inf, -np.inf], np.nan, inplace=True)
and then drop NaNs via dropna():
df.dropna(subset=["col1", "col2"], how="all", inplace=True)
For example:
>>> df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})
>>> df
col1 col2
0 1.0 2.0
1 inf 3.0
2 -inf NaN
>>> df.replace([np.inf, -np.inf], np.nan, inplace=True)
>>> df
col1 col2
0 1.0 2.0
1 NaN 3.0
2 NaN NaN
>>> df.dropna(subset=["col1", "col2"], how="all", inplace=True)
>>> df
col1 col2
0 1.0 2.0
1 NaN 3.0
The same method also works for Series.
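As a minimal sketch of the Series case (the example values are made up for illustration), the same replace-then-dropna chain can be applied directly to a Series:

```python
import numpy as np
import pandas as pd

# Hypothetical example Series containing both infinities and a NaN.
s = pd.Series([1.0, np.inf, -np.inf, np.nan, 2.0])

# Turn +inf/-inf into NaN, then drop every NaN in one chain.
cleaned = s.replace([np.inf, -np.inf], np.nan).dropna()
print(cleaned)
```

Only the finite entries remain, and they keep their original index labels.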
The above solution will also modify infs that are not in the target columns. To remedy that,
lst = [np.inf, -np.inf]
to_replace = {v: lst for v in ['col1', 'col2']}
df.replace(to_replace, np.nan)
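One way to check that only the target columns are touched is sketched below. It uses column-subset assignment rather than the dict form above (a swapped-in but equivalent-in-intent technique), and the column name "other" is an assumption added for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, -np.inf],
    "col2": [2.0, 3.0, np.nan],
    "other": [np.inf, 5.0, 6.0],  # hypothetical non-target column
})
targets = ["col1", "col2"]

out = df.copy()
# Replace infinities only inside the target columns.
out[targets] = out[targets].replace([np.inf, -np.inf], np.nan)
# Drop rows where every target column is missing.
out = out.dropna(subset=targets, how="all")
print(out)
```

The inf in "other" survives, while the row whose target columns are both missing is dropped.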
Here is another method using .loc to replace inf with nan on a Series:
s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan
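To make the mask in that one-liner concrete, here is a small sketch on made-up data. The mask is True exactly where a value is non-finite but not already NaN, that is, where it is +inf or -inf:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, np.nan, -np.inf])

# True only for +inf/-inf: non-finite, yet not null.
mask = (~np.isfinite(s)) & s.notnull()
s.loc[mask] = np.nan
print(s)
```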
So, in response to the original question:
df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))
for i in range(3):
    df.iat[i, i] = np.inf
df
A B C
0 inf 1.000000 1.000000
1 1.000000 inf 1.000000
2 1.000000 1.000000 inf
df.sum()
A inf
B inf
C inf
dtype: float64
df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A 2
B 2
C 2
dtype: float64
With option context, this is possible without permanently setting use_inf_as_na. For example:
with pd.option_context('mode.use_inf_as_na', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')
Of course, it can be set to treat inf as NaN permanently with
pd.set_option('use_inf_as_na', True)
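For older versions, replace use_inf_as_na with use_inf_as_null. In newer versions (pandas 2.1 and later) the option itself is deprecated, so converting inf to NaN beforehand is the safer route.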
Yet another solution would be to use the isin method. Use it to determine whether each value is infinite or missing, and then chain the all method to determine whether all the values in a row are infinite or missing.
Finally, use the negation of that result to select, via boolean indexing, the rows that don’t have all infinite or missing values.
all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
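A short worked sketch on the question's example data shows the intermediate results; the variable names `bad` and `kept` are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf, -np.inf], "col2": [2.0, 3.0, np.nan]})

# Element-wise flag: is this value infinite or missing?
bad = df.isin([np.inf, -np.inf, np.nan])
# Row-wise flag: are all values in the row infinite or missing?
all_inf_or_nan = bad.all(axis="columns")
# Keep the rows that have at least one usable value.
kept = df[~all_inf_or_nan]
print(kept)
```

This mirrors dropna(how="all"): row 1 is kept because col2 still holds a finite value, while row 2 is dropped.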
You can use pd.DataFrame.mask with np.isinf. You should first ensure that all of your dataframe's series are of type float. Then use dropna with your existing logic.
print(df)
col1 col2
0 -0.441406 inf
1 -0.321105 -inf
2 -0.412857 2.223047
3 -0.356610 2.513048
df = df.mask(np.isinf)
print(df)
col1 col2
0 -0.441406 NaN
1 -0.321105 NaN
2 -0.412857 2.223047
3 -0.356610 2.513048
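Putting the three steps together (cast to float, mask, then dropna) might look like the following sketch. The data and the choice of how="any" are assumptions for illustration; substitute whatever dropna arguments you already use:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [-0.44, -0.32, -0.41],
    "col2": [np.inf, -np.inf, 2.22],
})

# np.isinf needs numeric input, so cast explicitly first.
df = df.astype(float)
# Where np.isinf(df) is True, mask() substitutes NaN.
masked = df.mask(np.isinf)
# Then apply the usual dropna logic.
result = masked.dropna(subset=["col1", "col2"], how="any")
print(result)
```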
Use (fast and simple):
df = df[np.isfinite(df).all(1)]
This answer is based on DougR’s answer to another question.
Here is some example code:
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, 3, np.nan, 4, np.inf, 5, -np.inf, 6])
print('Input:\n', df, sep='')
df = df[np.isfinite(df).all(1)]
print('\nDropped:\n', df, sep='')
Result:
Input:
0
0 1.0000
1 2.0000
2 3.0000
3 NaN
4 4.0000
5 inf
6 5.0000
7 -inf
8 6.0000
Dropped:
0
0 1.0
1 2.0
2 3.0
4 4.0
6 5.0
8 6.0
To remove both NaN and inf using a single command, use
df = df[ np.isfinite( df ).all( axis = 1) ]
If for some reason the above doesn’t work for you, please try the following 2 steps:
df = df[ ~( df.isnull().any( axis = 1 ) ) ] #to remove nan
df = df[ ~( df.isin( [np.inf, -np.inf]).any(axis =1) )] #to remove inf
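Note that np.isfinite(df).all(axis=1) drops a row as soon as any value in it is non-finite, which is stricter than the how="all" behaviour asked for in the question. A sketch of both variants, restricted to a subset of columns (the names `strict` and `lenient` are illustrative), could be:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf, -np.inf], "col2": [2.0, 3.0, np.nan]})
cols = ["col1", "col2"]

finite = np.isfinite(df[cols])
# Keep rows where every selected value is finite (like how="any").
strict = df[finite.all(axis=1)]
# Keep rows where at least one selected value is finite (like how="all").
lenient = df[finite.any(axis=1)]
print(strict)
print(lenient)
```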
Just stumbled upon this one and found a one-liner that needs neither replace nor numpy:
df = pd.DataFrame(
[[1, np.inf],
[1, -np.inf],
[1, 2]],
columns=['a', 'b']
)
df.query("b not in [inf, -inf]")
>>> a b
2 1 2.0
For some versions of pandas, one might need to put backticks (`) around the name of the column b.
Unlike the other answers here, this one-liner worked for me.
import numpy as np
# Note: this removes only +inf from the given column; rows with -inf or NaN are kept.
df = df[df['required_column_name'] != np.inf]
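If the single column may also contain -inf or NaN, a variant based on np.isfinite handles all three cases at once. The sketch below uses made-up data to compare the two filters:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"required_column_name": [1.0, np.inf, -np.inf, np.nan, 2.0]})

# Comparison against np.inf drops only the +inf row.
only_pos_inf_removed = df[df["required_column_name"] != np.inf]
# np.isfinite is False for +inf, -inf and NaN alike.
finite_only = df[np.isfinite(df["required_column_name"])]
print(finite_only)
```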