"RuntimeWarning: divide by zero encountered in log" in numpy.log even though small values were filtered out

Question:

Given samplex:

In [22]: samplex
Out[22]:
array([0.        , 0.00204082, 0.00408163, 0.00612245, 0.00816327,
       0.01020408, 0.0122449 , 0.01428571, 0.01632653, 0.01836735,
       0.02040816, 0.02244898, 0.0244898 , 0.02653061, 0.02857143,
       0.03061224, 0.03265306, 0.03469388, 0.03673469, 0.03877551,
       0.04081633, 0.04285714, 0.04489796, 0.04693878, 0.04897959,
       0.05102041, 0.05306122, 0.05510204, 0.05714286, 0.05918367,
       0.06122449, 0.06326531, 0.06530612, 0.06734694, 0.06938776,
       0.07142857, 0.07346939, 0.0755102 , 0.07755102, 0.07959184,
       0.08163265, 0.08367347, 0.08571429, 0.0877551 , 0.08979592,
       0.09183673, 0.09387755, 0.09591837, 0.09795918, 0.1       ])

I am using numpy.where to protect against log(0), with the condition samplex>1e-8:

import numpy as np
np.where(samplex>1e-8,np.log(samplex),0)

But that is not completely working: a warning is generated, although numpy does complete the work anyway:

<ipython-input-18-e5dde8c65402>:1: RuntimeWarning: divide by zero encountered in log
  np.where(samplex>1e-8,np.log(samplex),0)
Out[18]:
array([ 0.        , -6.19440539, -5.50125821, -5.0957931 , -4.80811103,
       -4.58496748, -4.40264592, -4.24849524, -4.11496385, -3.99718081,
       -3.8918203 , -3.79651012, -3.70949874, -3.62945603, -3.55534806,
       -3.48635519, -3.42181667, -3.36119205, -3.30403363, -3.24996641,
       -3.19867312, -3.14988295, -3.10336294, -3.05891118, -3.01635156,
       -2.97552957, -2.93630885, -2.89856853, -2.86220088, -2.82710956,
       -2.79320801, -2.76041819, -2.72866949, -2.69789783, -2.66804487,
       -2.63905733, -2.61088645, -2.58348748, -2.55681923, -2.53084374,
       -2.50552594, -2.48083332, -2.45673577, -2.43320528, -2.41021576,
       -2.3877429 , -2.36576399, -2.34425779, -2.32320438, -2.30258509])

So what is happening here? Is there a preferred pattern to protect against divide by 0’s?

Asked By: WestCoastProjects

Answers:

Based on this comment from @TimRoberts

The problem is that np.log(samplex) gets evaluated immediately, before its result gets passed to np.where. You would need to extract a subarray and pass that to np.log.

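the two calls can be swapped, so that np.where clamps small values to 1e-8 before np.log is applied (those entries then come out as log(1e-8) rather than 0):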

np.log(np.where(samplex>1e-8,samplex,1e-8))
Out[26]:
array([-18.42068074,  -6.19440539,  -5.50125821,  -5.0957931 ,
        -4.80811103,  -4.58496748,  -4.40264592,  -4.24849524,
        -4.11496385,  -3.99718081,  -3.8918203 ,  -3.79651012,
        -3.70949874,  -3.62945603,  -3.55534806,  -3.48635519,
        -3.42181667,  -3.36119205,  -3.30403363,  -3.24996641,
        -3.19867312,  -3.14988295,  -3.10336294,  -3.05891118,
        -3.01635156,  -2.97552957,  -2.93630885,  -2.89856853,
        -2.86220088,  -2.82710956,  -2.79320801,  -2.76041819,
        -2.72866949,  -2.69789783,  -2.66804487,  -2.63905733,
        -2.61088645,  -2.58348748,  -2.55681923,  -2.53084374,
        -2.50552594,  -2.48083332,  -2.45673577,  -2.43320528,
        -2.41021576,  -2.3877429 ,  -2.36576399,  -2.34425779,
        -2.32320438,  -2.30258509])
Answered By: WestCoastProjects

To pad with 0 instead of an obscure log(1e-8), use a list comprehension and convert the result to an np.array:

>>> g = np.array([np.log(s) if abs(s) > 1e-8 else 0 for s in samplex])
>>> g
array([ 0.        , -6.19440359, -5.50125886, -5.09579294, -4.80811045,
       -4.58496764, -4.40264576, -4.24849554, -4.11496389, -3.99718065,
       -3.89182046, -3.7965101 , -3.70949857, -3.62945612, -3.55534801,
       -3.48635535, -3.42181671, -3.36119198, -3.30403374, -3.24996642,
       -3.19867303, -3.14988302, -3.10336292, -3.05891108, -3.0163516 ,
       -2.97552953, -2.93630894, -2.89856854, -2.86220083, -2.82710962,
       -2.79320801, -2.76041813, -2.72866953, -2.69789781, -2.6680448 ,
       -2.63905735, -2.61088642, -2.58348753, -2.55681924, -2.5308437 ,
       -2.50552597, -2.48083332, -2.45673572, -2.4332053 , -2.41021574,
       -2.38774295, -2.36576401, -2.34425776, -2.32320442, -2.30258509])

This may be slower than np.where, but it is probably more intuitive (both the code and the result).
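
If the loop turns out to be too slow, a vectorized variant of the same idea is boolean-mask indexing, which is roughly what the comment quoted in the previous answer suggests ("extract a subarray and pass that to np.log"). This is only a minimal sketch: samplex is rebuilt with np.linspace as a stand-in for the question's array, and the test is samplex > 1e-8 as in the question rather than abs(s) > 1e-8.

```python
import numpy as np

# Stand-in for the question's data: 50 evenly spaced points from 0 to 0.1.
samplex = np.linspace(0.0, 0.1, 50)

mask = samplex > 1e-8              # True where taking the log is safe
g = np.zeros_like(samplex)         # entries that fail the test stay 0
g[mask] = np.log(samplex[mask])    # np.log only ever sees the safe values
```

Because np.log is called on samplex[mask] only, the zero entry never reaches it and no warning is raised.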

Answered By: Partha D.

The np.where documentation gives this "equivalent":

[xv if c else yv
     for c, xv, yv in zip(condition, x, y)]

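Plugging your example in (with the scalar 0 written out as an array, since zip cannot iterate over a bare 0 the way np.where broadcasts it):

[xv if c else yv
     for c, xv, yv in zip(samplex>1e-8, np.log(samplex), np.zeros_like(samplex))]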

np.log(samplex) is evaluated in full before it is used in the zip. It is not the equivalent of:

[np.log(x) if x>1e-8 else 0 for x in samplex]

The 2nd and 3rd arguments of where are arrays, not functions. There’s no conditional evaluation here.
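
A small sketch that makes the evaluation order visible, assuming NumPy's default error settings. It splits the one-liner into two steps and counts the warnings recorded after each; samplex is rebuilt with np.linspace as a stand-in for the question's array, and the counter names are just for illustration.

```python
import warnings
import numpy as np

# Stand-in for the question's data: 50 evenly spaced points, the first is 0.
samplex = np.linspace(0.0, 0.1, 50)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Step 1: np.log runs over the whole array, including the 0 entry.
    logs = np.log(samplex)
    n_after_log = len(caught)
    # Step 2: np.where only picks between two arrays that already exist.
    result = np.where(samplex > 1e-8, logs, 0)
    n_after_where = len(caught)

print(n_after_log, n_after_where)
```

The warning is already recorded after step 1. np.where adds none of its own; it just discards the -inf that np.log produced for the first element.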

Ufuncs like np.log take a where parameter that does a conditional evaluation, avoiding the warning. It should be used together with an out array; otherwise the positions where the condition is False are left uninitialized, as in an np.empty array:

In [29]: res = np.log(samplex, where=samplex>1e-8, out=np.zeros_like(samplex))
In [30]: res
Out[30]:
array([ 0.        , -6.19440359, -5.50125886, -5.09579294, -4.80811045,
       -4.58496764, -4.40264576, -4.24849554, -4.11496389, -3.99718065,
       -3.89182046, -3.7965101 , -3.70949857, -3.62945612, -3.55534801,
       -3.48635535, -3.42181671, -3.36119198, -3.30403374, -3.24996642,
       -3.19867303, -3.14988302, -3.10336292, -3.05891108, -3.0163516 ,
       -2.97552953, -2.93630894, -2.89856854, -2.86220083, -2.82710962,
       -2.79320801, -2.76041813, -2.72866953, -2.69789781, -2.6680448 ,
       -2.63905735, -2.61088642, -2.58348753, -2.55681924, -2.5308437 ,
       -2.50552597, -2.48083332, -2.45673572, -2.4332053 , -2.41021574,
       -2.38774295, -2.36576401, -2.34425776, -2.32320442, -2.30258509])

Another approach is to suppress the warning – I won’t go into the details.
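
For completeness, a minimal sketch of that suppression route using the np.errstate context manager. samplex is again rebuilt with np.linspace as a stand-in for the question's array.

```python
import numpy as np

# Stand-in for the question's data.
samplex = np.linspace(0.0, 0.1, 50)

# Silence only the divide-by-zero warning, and only inside this block.
with np.errstate(divide="ignore"):
    res = np.where(samplex > 1e-8, np.log(samplex), 0)
```

np.log still computes -inf for the zero entry here; the warning is simply not emitted, and np.where then replaces that entry with 0. The previous error settings are restored when the with block exits.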

Answered By: hpaulj