"RuntimeWarning: divide by zero encountered in log" in numpy.log even though small values were filtered out
Question:
Given samplex:
In [22]: samplex
Out[22]:
array([0. , 0.00204082, 0.00408163, 0.00612245, 0.00816327,
0.01020408, 0.0122449 , 0.01428571, 0.01632653, 0.01836735,
0.02040816, 0.02244898, 0.0244898 , 0.02653061, 0.02857143,
0.03061224, 0.03265306, 0.03469388, 0.03673469, 0.03877551,
0.04081633, 0.04285714, 0.04489796, 0.04693878, 0.04897959,
0.05102041, 0.05306122, 0.05510204, 0.05714286, 0.05918367,
0.06122449, 0.06326531, 0.06530612, 0.06734694, 0.06938776,
0.07142857, 0.07346939, 0.0755102 , 0.07755102, 0.07959184,
0.08163265, 0.08367347, 0.08571429, 0.0877551 , 0.08979592,
0.09183673, 0.09387755, 0.09591837, 0.09795918, 0.1 ])
I am using numpy.where with the condition samplex>1e-8 to protect against log(0):
import numpy as np
np.where(samplex>1e-8,np.log(samplex),0)
But that is not completely working: a warning is generated, though numpy does complete the work anyway:
<ipython-input-18-e5dde8c65402>:1: RuntimeWarning: divide by zero encountered in log
np.where(samplex>1e-8,np.log(samplex),0)
Out[18]:
array([ 0. , -6.19440539, -5.50125821, -5.0957931 , -4.80811103,
-4.58496748, -4.40264592, -4.24849524, -4.11496385, -3.99718081,
-3.8918203 , -3.79651012, -3.70949874, -3.62945603, -3.55534806,
-3.48635519, -3.42181667, -3.36119205, -3.30403363, -3.24996641,
-3.19867312, -3.14988295, -3.10336294, -3.05891118, -3.01635156,
-2.97552957, -2.93630885, -2.89856853, -2.86220088, -2.82710956,
-2.79320801, -2.76041819, -2.72866949, -2.69789783, -2.66804487,
-2.63905733, -2.61088645, -2.58348748, -2.55681923, -2.53084374,
-2.50552594, -2.48083332, -2.45673577, -2.43320528, -2.41021576,
-2.3877429 , -2.36576399, -2.34425779, -2.32320438, -2.30258509])
So what is happening here? Is there a preferred pattern to protect against divide by 0’s?
Answers:
Based on this comment from @TimRoberts:
The problem is that np.log(samplex) gets evaluated immediately, before its result gets passed to np.where. You would need to extract a subarray and pass that to np.log.
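The comment's suggestion can also be taken literally. The following is a minimal sketch, assuming samplex is the 50-point grid np.linspace(0, 0.1, 50) (a reconstruction that matches the printed values, not something stated in the question). A boolean mask selects the safe entries, and np.log is applied only to that subarray:

```python
import numpy as np

# Assumption: the question's array is a 50-point grid from 0 to 0.1.
samplex = np.linspace(0.0, 0.1, 50)

mask = samplex > 1e-8                  # True where taking the log is safe
result = np.zeros_like(samplex)        # entries that fail the test stay 0
result[mask] = np.log(samplex[mask])   # np.log never sees the zero
```

Because the zero is filtered out before np.log runs, no warning should be emitted.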
Following that idea, np.where can be applied to the input first, so that np.log only receives values of at least 1e-8:
np.log(np.where(samplex>1e-8,samplex,1e-8))
Out[26]:
array([-18.42068074, -6.19440539, -5.50125821, -5.0957931 ,
-4.80811103, -4.58496748, -4.40264592, -4.24849524,
-4.11496385, -3.99718081, -3.8918203 , -3.79651012,
-3.70949874, -3.62945603, -3.55534806, -3.48635519,
-3.42181667, -3.36119205, -3.30403363, -3.24996641,
-3.19867312, -3.14988295, -3.10336294, -3.05891118,
-3.01635156, -2.97552957, -2.93630885, -2.89856853,
-2.86220088, -2.82710956, -2.79320801, -2.76041819,
-2.72866949, -2.69789783, -2.66804487, -2.63905733,
-2.61088645, -2.58348748, -2.55681923, -2.53084374,
-2.50552594, -2.48083332, -2.45673577, -2.43320528,
-2.41021576, -2.3877429 , -2.36576399, -2.34425779,
-2.32320438, -2.30258509])
To pad with 0 instead of an obscure log(1e-8), use a list comprehension and convert the result to np.array:
>>> g = np.array([np.log(s) if abs(s) > 1e-8 else 0 for s in samplex])
>>> g
array([ 0. , -6.19440359, -5.50125886, -5.09579294, -4.80811045,
-4.58496764, -4.40264576, -4.24849554, -4.11496389, -3.99718065,
-3.89182046, -3.7965101 , -3.70949857, -3.62945612, -3.55534801,
-3.48635535, -3.42181671, -3.36119198, -3.30403374, -3.24996642,
-3.19867303, -3.14988302, -3.10336292, -3.05891108, -3.0163516 ,
-2.97552953, -2.93630894, -2.89856854, -2.86220083, -2.82710962,
-2.79320801, -2.76041813, -2.72866953, -2.69789781, -2.6680448 ,
-2.63905735, -2.61088642, -2.58348753, -2.55681924, -2.5308437 ,
-2.50552597, -2.48083332, -2.45673572, -2.4332053 , -2.41021574,
-2.38774295, -2.36576401, -2.34425776, -2.32320442, -2.30258509])
This may be slower than np.where, but it is probably more intuitive (both the code and the result).
The np.where documentation gives this "equivalent":
[xv if c else yv
for c, xv, yv in zip(condition, x, y)]
Plugging your example in (conceptually, with the scalar 0 broadcast to every element):
[xv if c else yv
for c, xv, yv in zip(samplex>1e-8, np.log(samplex), 0)]
np.log(samplex) is evaluated before it is used in the zip. It is not the equivalent of:
[np.log(x) if x>1e-8 else 0 for x in samplex]
The 2nd and 3rd arguments of where are arrays, not functions. There’s no conditional evaluation here.
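To make the order of evaluation visible, here is a small sketch that records warnings around each step separately. It assumes samplex is np.linspace(0, 0.1, 50), which matches the printed values but is not stated in the question; the counter names are only illustrative.

```python
import warnings
import numpy as np

# Assumption: the question's array is a 50-point grid from 0 to 0.1.
samplex = np.linspace(0.0, 0.1, 50)

with warnings.catch_warnings(record=True) as caught, np.errstate(divide="warn"):
    warnings.simplefilter("always")
    logs = np.log(samplex)                       # the warning is raised here
    n_after_log = len(caught)
    result = np.where(samplex > 1e-8, logs, 0)   # only selects between finished arrays
    n_after_where = len(caught)

print(n_after_log, n_after_where)
```

The warning count should not change across the np.where call: the -inf is already sitting in logs[0], and np.where merely declines to pick it.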
ufuncs like np.log take a where parameter that does a conditional evaluation, avoiding the warning. It needs to be used together with an out array; otherwise the skipped positions are left uninitialized, as if the result had been allocated with np.empty:
In [29]: res = np.log(samplex, where=samplex>1e-8, out=np.zeros_like(samplex))
In [30]: res
Out[30]:
array([ 0. , -6.19440359, -5.50125886, -5.09579294, -4.80811045,
-4.58496764, -4.40264576, -4.24849554, -4.11496389, -3.99718065,
-3.89182046, -3.7965101 , -3.70949857, -3.62945612, -3.55534801,
-3.48635535, -3.42181671, -3.36119198, -3.30403374, -3.24996642,
-3.19867303, -3.14988302, -3.10336292, -3.05891108, -3.0163516 ,
-2.97552953, -2.93630894, -2.89856854, -2.86220083, -2.82710962,
-2.79320801, -2.76041813, -2.72866953, -2.69789781, -2.6680448 ,
-2.63905735, -2.61088642, -2.58348753, -2.55681924, -2.5308437 ,
-2.50552597, -2.48083332, -2.45673572, -2.4332053 , -2.41021574,
-2.38774295, -2.36576401, -2.34425776, -2.32320442, -2.30258509])
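If this pattern is needed in several places, it could be wrapped in a small helper. The sketch below is only one possible shape: the name safe_log and its threshold and fill parameters are hypothetical, not part of NumPy.

```python
import numpy as np

def safe_log(x, threshold=1e-8, fill=0.0):
    """Hypothetical helper: log(x) where x > threshold, `fill` elsewhere."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, fill)              # pre-fill so skipped slots are defined
    np.log(x, where=x > threshold, out=out)  # np.log only runs where the mask is True
    return out

values = safe_log([0.0, 1.0, np.e])
```

Pre-filling out is the important part: with where=, positions that fail the mask keep whatever the out array already contained.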
Another approach is to suppress the warning – I won’t go into the details.
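For completeness, one way to do that is the np.errstate context manager, which silences the floating-point warning only inside its block. This is a minimal sketch, again assuming samplex is np.linspace(0, 0.1, 50):

```python
import numpy as np

# Assumption: the question's array is a 50-point grid from 0 to 0.1.
samplex = np.linspace(0.0, 0.1, 50)

# np.log still computes -inf for the zero entry; only the warning is suppressed,
# and np.where then replaces that entry with 0.
with np.errstate(divide="ignore"):
    result = np.where(samplex > 1e-8, np.log(samplex), 0)
```

Note that this hides the symptom rather than avoiding the computation, so the where=/out= form is usually the cleaner choice.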