Why does built-in sum behave wrongly after "from numpy import *"?

Question:

I have some code like:

import math, csv, sys, re, time, datetime, pickle, os, gzip
from numpy import *

x = [1, 2, 3, ... ]
y = sum(x)

The sum of the actual values in x is 2165496761, which is larger than the limit of a signed 32-bit integer. The reported y value is -2129470535, implying integer overflow.

Why did this happen? I thought the built-in sum was supposed to use Python’s arbitrary-size integers?


See "How to restore a builtin that I overwrote by accident?" if you’ve accidentally done something like this at the REPL (interpreter prompt).

Asked By: notilas


Answers:

Python handles large numbers with arbitrary precision:

>>> sum([721832253, 721832254, 721832254])
2165496761

Just sum them up!

To make sure you don’t use numpy.sum, try __builtins__.sum() instead.
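
A minimal sketch of reaching the built-in explicitly, assuming Python 3, where the module is named builtins (in Python 2 the equivalent module is __builtin__). The name builtin_sum is only an illustrative binding:

```python
import builtins  # Python 3 name; Python 2 calls this module __builtin__

values = [721832253, 721832254, 721832254]

# builtins.sum is always the original built-in, even if the bare name `sum`
# has been rebound in this module (for example by a star import).
builtin_sum = builtins.sum
total = builtin_sum(values)

print(total)
```

Importing the module avoids relying on __builtins__, which is a CPython implementation detail.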

Answered By: Martijn Pieters

Doing from numpy import * causes the built-in sum function to be replaced with numpy.sum:

>>> sum(xrange(10**7))
49999995000000L
>>> from numpy import sum
>>> sum(xrange(10**7)) # assuming a 32-bit platform
-2014260032

To verify that numpy.sum is in use, check the type of the result:

>>> sum([721832253, 721832254, 721832254])
-2129470535
>>> type(sum([721832253, 721832254, 721832254]))
<type 'numpy.int32'>
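
The negative values are what two's-complement wraparound predicts. A small sketch in plain Python 3 that reduces the true totals to 32 bits and reproduces both numbers shown above (wrap_int32 is a hypothetical helper written for this illustration, not a NumPy function):

```python
def wrap_int32(n):
    """Keep the low 32 bits of n and reinterpret them as a signed integer."""
    n &= 0xFFFFFFFF                       # reduce modulo 2**32
    return n - 2**32 if n >= 2**31 else n

true_total = 721832253 + 721832254 + 721832254   # exact Python int
wrapped_total = wrap_int32(true_total)

true_range_total = sum(range(10**7))             # exact Python int
wrapped_range_total = wrap_int32(true_range_total)

print(true_total, wrapped_total)
print(true_range_total, wrapped_range_total)
```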

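To avoid this problem, don’t use a star import; import the module under a name (for example, import numpy as np) or import only the names you need.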
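
A sketch of the namespaced alternative, assuming Python 3 and the conventional alias np. The int32 dtype is forced here as an assumption so that the wraparound shows up even on platforms whose default integer is 64-bit:

```python
import builtins
import numpy as np  # namespaced import: no bare names are rebound

values = [721832253, 721832254, 721832254]

# The bare name still refers to the built-in, so the result is an exact
# arbitrary-precision Python int.
still_builtin = sum is builtins.sum
builtin_total = sum(values)

# NumPy's function has to be requested explicitly. Both the data and the
# accumulator are int32 here, so the total wraps around.
numpy_total = np.sum(np.array(values, dtype=np.int32), dtype=np.int32)

print(still_builtin, builtin_total, int(numpy_total))
```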

If you must use numpy.sum and want an arbitrary-sized integer result, specify a dtype for the result like so:

>>> sum([721832253, 721832254, 721832254],dtype=object)
2165496761L

or refer to the builtin sum explicitly (possibly giving it a more convenient binding):

>>> __builtins__.sum([721832253, 721832254, 721832254])
2165496761L
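
For the dtype route, a minimal Python 3 sketch that assumes the data is already an int32 array (the array is built explicitly here for illustration) and asks numpy.sum to accumulate Python objects instead of fixed-width integers:

```python
import numpy as np

values = np.array([721832253, 721832254, 721832254], dtype=np.int32)

# With dtype=object the elements are added as Python ints, which cannot
# overflow, at the cost of slower element-by-element arithmetic.
object_total = np.sum(values, dtype=object)

print(object_total)
```
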
Answered By: DSM

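You get this invalid value because you’re using np.sum on int32 data, so the total wraps around. Nothing prevents you from using a wider dtype such as np.int64 instead of np.int32 to represent your data. You could, for example, convert the array before summing (x.view(np.int64) would reinterpret the raw bytes rather than convert the values, so use astype):

x.astype(np.int64).sum()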
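
A sketch of summing with a 64-bit accumulator, assuming x is an int32 NumPy array and numpy is imported as np. It shows two ways to widen: converting the values first with astype, or passing dtype to sum so that no intermediate copy is made:

```python
import numpy as np

x = np.array([721832253, 721832254, 721832254], dtype=np.int32)

converted_total = x.astype(np.int64).sum()  # convert values to int64, then sum
direct_total = x.sum(dtype=np.int64)        # accumulate in int64 directly

print(int(converted_total), int(direct_total))
```

A 64-bit accumulator is still fixed-width, so it only postpones overflow; for these values it is wide enough.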

On a side note, please make sure that you never use from numpy import *. It’s a terrible practice and a habit you must get rid of as soon as possible. When you use from ... import *, you may overwrite Python built-ins without noticing, which makes the resulting bugs very difficult to track down. A typical example is overwriting functions like sum or max.
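
To see which built-ins a star import would shadow in a given installation, one rough check (a sketch assuming Python 3, and assuming that numpy.__all__ lists what the star import exports) is to intersect the two sets of names:

```python
import builtins
import numpy as np

# Names that `from numpy import *` would bind in the importing module.
star_names = set(np.__all__)

# Public names provided by Python itself.
builtin_names = {name for name in dir(builtins) if not name.startswith("_")}

shadowed = sorted(star_names & builtin_names)
print(shadowed)
```

The exact list depends on the NumPy version, but sum should be among the names printed.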

Answered By: Pierre GM