Python: how to get the allocated memory length of a BytesIO object?

Question:

This is the code I am using to test the memory allocation:

import pycurl
import io


url = "http://www.stackoverflow.com"
buf = io.BytesIO()


print(len(buf.getvalue()))   # here I get 0 as the length


c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.CONNECTTIMEOUT, 10)
c.setopt(c.TIMEOUT, 10)
c.setopt(c.ENCODING, 'gzip')
c.setopt(c.FOLLOWLOCATION, True)
c.setopt(c.IPRESOLVE, c.IPRESOLVE_V4)
c.setopt(c.USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0')
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()
c.close()

print(len(buf.getvalue()))    # here I get the length of the downloaded file


print(buf.getvalue())
buf.close()

How can I get the length of the buffer/memory allocated by BytesIO?
What am I doing wrong here? Doesn't Python allocate a fixed buffer length?

Asked By: user4046642


Answers:

I am not sure what you mean by allocated buffer/memory length, but if you want the length of the user data stored in the BytesIO object, you can do:

>>> import io
>>> bio = io.BytesIO()
>>> bio.getbuffer().nbytes
0
>>> bio.write(b'here is some data')
17
>>> bio.getbuffer().nbytes
17

But this seems equivalent to the len(buf.getvalue()) that you are currently using.
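
To illustrate that equivalence, here is a minimal sketch (the variable names are made up for this example). Both expressions report the same number; the difference is that getbuffer() returns a memoryview over the contents instead of a bytes object. One caveat from the io documentation: while such a view exists, the BytesIO object cannot be resized or closed, so the sketch releases the view promptly with a with block.

```python
import io

bio = io.BytesIO()
bio.write(b'here is some data')

# getvalue() returns the contents as a bytes object.
size_via_getvalue = len(bio.getvalue())

# getbuffer() returns a memoryview; the with block releases it right away
# so that later writes are still allowed to resize the buffer.
with bio.getbuffer() as view:
    size_via_getbuffer = view.nbytes

bio.write(b'!')  # works because the view has already been released
print(size_via_getvalue, size_via_getbuffer, len(bio.getvalue()))
```

If the view were still alive at the time of the second write, that write would need to grow the buffer and would be expected to fail with a BufferError.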

The actual size of the BytesIO object itself, in bytes, can be found using sys.getsizeof() (the exact numbers depend on the Python version and platform):

>>> import sys
>>> bio = io.BytesIO()
>>> sys.getsizeof(bio)
104

Or you could be nasty and call __sizeof__() directly (which is like sys.getsizeof() but without garbage collector overhead applicable to the object):

>>> bio = io.BytesIO()
>>> bio.__sizeof__()
72

Memory for BytesIO is allocated as required, and some over-allocation does take place, so the reported size grows in steps rather than byte by byte:

>>> bio = io.BytesIO()
>>> for i in range(20):
...     _=bio.write(b'a')
...     print(bio.getbuffer().nbytes, sys.getsizeof(bio), bio.__sizeof__())
...
1 106 74
2 106 74
3 108 76
4 108 76
5 110 78
6 110 78
7 112 80
8 112 80
9 120 88
10 120 88
11 120 88
12 120 88
13 120 88
14 120 88
15 120 88
16 120 88
17 129 97
18 129 97
19 129 97
20 129 97
Answered By: mhawke

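io.BytesIO() returns a standard file-like object, which has a tell() method. It reports the current stream position and does not need to copy the whole buffer out to compute the total size, as len(bio.getvalue()) may (bio.getbuffer().nbytes also avoids a copy, because getbuffer() returns a view of the contents). When the position is at the end of the data, as it is after a series of writes, tell() is a very fast and simple way to get the exact size of the used memory in the buffer object.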

However, if you preset your buffer (for example with io.BytesIO(b'some data')), tell() will point at the beginning of the buffer and return 0, even though the buffer size is not zero. In this case, you can move the pointer to the end of the buffer with seek(0, 2), which returns the new position, i.e. the total buffer size, without copying the whole buffer into another chunk of memory.
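
The tell()/seek() approach described above can be sketched as a small helper. The name stream_size is hypothetical and not part of the io module; the sketch assumes a seekable stream and restores the original position so that the caller's reads and writes are not disturbed.

```python
import io
import os


def stream_size(stream):
    """Return the total number of bytes held by a seekable stream."""
    saved = stream.tell()  # remember where the caller was
    try:
        # seek() returns the new absolute position; os.SEEK_END equals 2.
        return stream.seek(0, os.SEEK_END)
    finally:
        stream.seek(saved)  # put the position back


bio = io.BytesIO(b'preset data')  # pre-filled, so the position starts at 0
print(bio.tell())        # 0
print(stream_size(bio))  # 11
print(bio.tell())        # 0, the position was restored
```

Keep in mind that this reports the number of bytes stored in the stream, not the amount of memory the interpreter has reserved for it.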

I posted, and recently updated, example code and a more detailed answer here

Answered By: rth

You can also use tracemalloc to get indirect information about the size of objects, by calling tracemalloc.get_traced_memory() before and after the memory events of interest and comparing the results.

Do note that active threads (if any) and side effects of your program will affect the output, but it may also be more representative of the real memory cost if many samples are taken, as shown below.

>>> import tracemalloc
>>> from io import BytesIO
>>> tracemalloc.start()
>>>
>>> memory_traces = []
>>>
>>> with BytesIO() as bytes_fh:
...     # returns (current memory usage, peak memory usage)
...     # ..but only since calling .start()
...     memory_traces.append(tracemalloc.get_traced_memory())
...     bytes_fh.write(b'a' * (1024**2))  # create 1MB of 'a'
...     memory_traces.append(tracemalloc.get_traced_memory())
...
1048576
>>> print("used_memory = {}b".format(memory_traces[1][0] - memory_traces[0][0]))
used_memory = 1048870b
>>> 1048870 - 1024**2  # show small overhead
294  
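
The same measurement can be packaged as a function for use outside the interactive prompt. This is only a sketch: traced_write_cost is a hypothetical helper name, and the exact overhead it reports depends on the Python version and platform. Unlike the session above, the payload is built before the first reading, and tracing is stopped afterwards if the helper was the one that started it.

```python
import tracemalloc
from io import BytesIO


def traced_write_cost(payload):
    """Return the growth in traced memory caused by writing payload to a new BytesIO."""
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    try:
        # get_traced_memory() returns (current, peak); only current is used.
        before, _peak = tracemalloc.get_traced_memory()
        bio = BytesIO()
        bio.write(payload)
        after, _peak = tracemalloc.get_traced_memory()
        bio.close()
        return after - before
    finally:
        if not already_tracing:
            tracemalloc.stop()


payload = b'a' * (1024 ** 2)  # built before measuring, so it is not counted
cost = traced_write_cost(payload)
print("payload = {} bytes, traced growth = {} bytes".format(len(payload), cost))
```

The difference between the two printed numbers is the overhead of the BytesIO object and its buffer for this one sample.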
Answered By: ti7