Pandas division by zero, errors despite np.where condition
Question:
So I am using jupyter notebooks and I have a function that uses the
data['woe2'] = np.log(data['B']/data['MonthSales'])
equation. The issue I'm having is that when 'B' equals 0, Python throws a tantrum over division by 0. This happens even though I tried using np.where to make an exception. Do you guys have any ideas?
import numpy as np
import pandas as pd
data = pd.DataFrame({"A" : ["John","Deep","Julia","Kate","Sandy"],
                     "MonthSales" : [25,30,35,40,45], "B" : [10,0,0,20,40]})
data['woe2'] = np.where((data['B'] != 0),
                        np.log(data['B']/data['MonthSales']), 0)
Answers:
It isn't complaining about division BY zero. Dividing zero by a non-zero denominator gives 0, and np.log(0) then produces -inf, which is what triggers the "divide by zero encountered in log" warning.
Here is a slightly cleaner way to do it, since you can use a pandas boolean test as a mask.
data_bool = data['B'] != 0
data['woe2'] = np.log(data[data_bool]['B']/data[data_bool]['MonthSales'])
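One thing worth spelling out about the mask approach: the log is only computed for the selected rows, and pandas aligns on the index when assigning, so the rows where B is 0 come back as NaN rather than 0. A minimal sketch, assuming 0 is the value you want there (as in the np.where attempt in the question); it uses .loc instead of the chained indexing above, and the woe2_filled column name is just for illustration:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": ["John", "Deep", "Julia", "Kate", "Sandy"],
                     "MonthSales": [25, 30, 35, 40, 45],
                     "B": [10, 0, 0, 20, 40]})

# Boolean mask selecting the rows where the log is well defined.
data_bool = data["B"] != 0

# The log is computed only on the masked rows. On assignment pandas aligns
# on the index, so the excluded rows (B == 0) are left as NaN.
data["woe2"] = np.log(data.loc[data_bool, "B"] / data.loc[data_bool, "MonthSales"])

# Assumption: 0 is the desired value for the excluded rows.
data["woe2_filled"] = data["woe2"].fillna(0)

print(data)
```

Because np.log never sees a zero here, no warning is raised.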
Note that np.where is a selector, not a conditional evaluator: its arguments are evaluated in full first, and only then does it pick between them element by element.
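To see that eager evaluation in isolation, here is a small sketch that records warnings while building the np.where result. The variable names (ratio, full_log, caught) are mine, and I convert to plain arrays with to_numpy() so that only NumPy is involved:

```python
import warnings

import numpy as np
import pandas as pd

data = pd.DataFrame({"MonthSales": [25, 30, 35, 40, 45],
                     "B": [10, 0, 0, 20, 40]})

ratio = (data["B"] / data["MonthSales"]).to_numpy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # This argument is fully evaluated before np.where is called,
    # so np.log sees the zeros and warns.
    full_log = np.log(ratio)
    # np.where then only selects between the two precomputed arguments.
    result = np.where(data["B"].to_numpy() != 0, full_log, 0)

print(full_log)   # contains -inf where B == 0
print(result)     # the -inf entries are replaced by 0
print([str(w.message) for w in caught])
```

So the final result is what you wanted; the warning comes from the intermediate full_log, which np.where cannot prevent from being computed.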
The Series division:
In [72]: data['B']/data['MonthSales']
Out[72]:
0 0.400000
1 0.000000
2 0.000000
3 0.500000
4 0.888889
dtype: float64
Taking the log raises the warning. Note that it is issued from pandas.core.arraylike:
In [73]: np.log(data['B']/data['MonthSales'])
C:\Users\paul\miniconda3\lib\site-packages\pandas\core\arraylike.py:402: RuntimeWarning: divide by zero encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
Out[73]:
0 -0.916291
1 -inf
2 -inf
3 -0.693147
4 -0.117783
dtype: float64
If instead we take the log of the equivalent array, using the where/out
parameters to make it conditional, we avoid the warning:
In [74]: np.log((data['B']/data['MonthSales']).values, where=data['B']>0,
out=np.zeros(data.shape[0]))
Out[74]: array([-0.91629073, 0. , 0. , -0.69314718, -0.11778304])
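If you need this in more than one place, the where/out idea can be wrapped in a small function. This is only a sketch: safe_log_ratio and its fill parameter are hypothetical names, not part of NumPy or pandas, and I also guard the division itself in case the denominator can be 0:

```python
import numpy as np
import pandas as pd


def safe_log_ratio(num, den, fill=0.0):
    """Hypothetical helper: log(num / den) where both are > 0, else `fill`."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    valid = (num > 0) & (den > 0)
    # Divide only where valid; the other slots keep the placeholder 1.0.
    ratio = np.divide(num, den, out=np.ones_like(num), where=valid)
    # Take the log only where valid; the other slots keep `fill`.
    return np.log(ratio, out=np.full_like(num, fill), where=valid)


data = pd.DataFrame({"MonthSales": [25, 30, 35, 40, 45],
                     "B": [10, 0, 0, 20, 40]})
data["woe2"] = safe_log_ratio(data["B"], data["MonthSales"])
print(data)
```

The out array matters: with where= alone, the positions that are skipped would be left uninitialized, so the function pre-fills them with the value you want to see there.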
I think this warning is just not significant here (it is a warning, not an error, and the result is still computed). Indeed, the NumPy documentation for np.log includes an example with 0 in the argument array, and when we run that example it produces the same warning.
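If you agree the warning is harmless for your data and just want it out of the way, one option is to silence it locally with np.errstate and then replace the -inf values. A minimal sketch, assuming you want 0 in place of -inf as in the question:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"MonthSales": [25, 30, 35, 40, 45],
                     "B": [10, 0, 0, 20, 40]})

# Ignore the "divide by zero" floating-point warning only inside this block.
with np.errstate(divide="ignore"):
    logged = np.log(data["B"] / data["MonthSales"])

# log(0) is -inf; swap those entries for 0 (assumed to be the desired value).
data["woe2"] = logged.replace(-np.inf, 0)
print(data)
```

The context manager keeps the setting scoped, so genuine divide-by-zero problems elsewhere in the notebook are still reported.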