Python "range" resource consumption

Question:

I wrote the following script

Basically, I’m just learning Python for Machine Learning and wanted to check how really computationally intensive tasks would perform. I observe that for 10**8 iterations, Python takes up a lot of RAM (around 3.8 GB) and also a lot of CPU time (just froze my system)

I want to know if there is any way to limit the time/memory consumption either through code or some global settings

Script –

import time

initial_start = time.clock()
for i in range(9):
    start = time.clock()
    for j in range(10**i):
        pass
    stop = time.clock()
    print 'Looping 10**', i, 'times takes', stop - start, 'seconds'
final_stop = time.clock()
print 'Overall program time is', final_stop - initial_start, 'seconds'
Asked By: Sammy25


Answers:

In Python 2, range creates a list. Use xrange instead. For a more detailed explanation see Should you always favor xrange() over range()?

Note that a no-op for loop is a very poor benchmark that tells you pretty much nothing about Python.

Also note, as per gnibbler’s comment, that Python 3’s range works like Python 2’s xrange.
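A quick way to see this in Python 3 (a sketch; the exact byte count varies by interpreter): a range object stores only its start, stop, and step, so its memory footprint is constant no matter how many values it covers.

```python
import sys

# In Python 3, range() is a lazy sequence: it never materializes
# a list, so even a range over 10**8 values is tiny.
r = range(10**8)
small = sys.getsizeof(r)      # a few dozen bytes, not gigabytes
element = r[10**7]            # indexing works without building the list
```
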

Answered By: Steven Rumbalski

Python takes up that much RAM because you’re creating a very large list of length 10**8 with the range function. That’s where iterators become useful.

Use xrange instead of range.

It works the same way as range does, but instead of creating that large list in memory, xrange just keeps an internal index (incrementing its value by 1 each iteration).
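The idea behind xrange can be sketched as a generator (`lazy_range` is a hypothetical name for illustration, not a standard function): only the current index lives in memory, and each value is produced on demand.

```python
def lazy_range(stop):
    # Yields one value at a time, keeping only the current index
    # in memory -- the same idea behind Python 2's xrange.
    i = 0
    while i < stop:
        yield i
        i += 1

for j in lazy_range(10**6):
    pass  # constant memory: no million-element list is ever built
```
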

Answered By: Rostyslav Dzinko

Look at this question: How to limit the heap size?
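On Unix-like systems, the standard-library resource module can impose such limits from code. A minimal sketch (Unix-only; the numbers are illustrative, not recommendations):

```python
import resource

# Cap this process's CPU time in seconds. When the soft limit is
# exceeded, the process receives SIGXCPU.
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
resource.setrlimit(resource.RLIMIT_CPU, (60, hard))

# RLIMIT_AS caps total address space (memory) the same way.
```
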

To address your script, the timeit module measures the time it takes to perform an action more accurately.

>>> import timeit
>>> for i in range(9):
...     print timeit.timeit(stmt='pass', number=10**i)
...
0.0
0.0
0.0
0.0
0.0
0.015625
0.0625
0.468752861023
2.98439407349

Your example spends most of its time dealing with the gigantic lists of numbers you’re putting in memory. Using xrange instead of range will help fix that issue, but you’re still using a terrible benchmark: the loop executes over and over without actually doing anything, so the CPU is busy checking the condition and entering the loop.

As you can see, creating these lists takes the majority of the time here:

>>> timeit.timeit(stmt='range(10**7)', number=1)
0.71875405311584473
>>> timeit.timeit(stmt='for i in range(10**7): pass', number=1)
1.093757152557373
Answered By: Ryan Haining

As regards the CPU, you have a for loop running for billions of iterations without any sort of sleep or pause in between, so it’s no wonder the process hogs the CPU completely (at least on a single-core computer).

Answered By: Sachin

If you’re considering Python for machine learning, take a look at numpy. Its philosophy is to implement all "inner loops" (matrix operations, linear algebra) in optimized C, and to use Python to manipulate input and output and to manage high-level algorithms – sort of like Matlab, but driven from Python. That gives you the best of both worlds: the ease and readability of Python, and the speed of C.

To get back to your question, benchmarking numpy operations will give you a more realistic assessment of Python’s performance for machine learning.
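For instance, a minimal sketch of the kind of operation worth benchmarking (assuming numpy is installed): the summation below runs in optimized C inside numpy rather than in the Python interpreter, which is why it vastly outperforms an equivalent pure-Python loop.

```python
import numpy as np

# One vectorized call replaces a million-iteration Python for-loop;
# the actual loop happens in compiled C code.
a = np.arange(10**6)
total = int(a.sum())
```
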

Answered By: user4815162342