Why does sys.getsizeof fail on a pandas Series or DataFrame that holds a type object

Question:

Python's getsizeof fails on a Series which holds a type as data. I have a function in which I need to calculate the size of any given argument, which I do with sys.getsizeof. But this is an issue, as getsizeof fails unexpectedly for these kinds of Series. Is there a way to avoid this failure in getsizeof?

import sys
import pandas as pd

df = pd.Series(str)

sys.getsizeof(df)
TypeError: descriptor '__sizeof__' of 'str' object needs an argument
Asked By: ArunJose


Answers:

This is a Pandas bug.

Pandas makes the unusual decision of trying to compute a "deep" sizeof, including all element sizes, rather than just the memory consumption directly attributable to the Series instance itself. The __sizeof__ implementation for a Series instance eventually hits a loop that tries to call __sizeof__ on the elements:

for i in range(n):
    size += arr[i].__sizeof__()
return size

but calling __sizeof__ like this is incorrect. It should really call sys.getsizeof(arr[i]).
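A minimal standalone sketch of the corrected loop, using a plain Python list as a hypothetical stand-in for the Series' backing object array (not the actual pandas internals):

```python
import sys

# Hypothetical stand-in for the Series' backing object array:
# it mixes a type object with ordinary values.
arr = [str, "hello", 42]

size = 0
for i in range(len(arr)):
    # sys.getsizeof dispatches correctly even when arr[i] is a type
    # object, and adds GC-metadata overhead where applicable.
    size += sys.getsizeof(arr[i])
print(size)
```

With sys.getsizeof in place of the raw `__sizeof__` call, the type object `str` no longer raises a TypeError.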

Calling __sizeof__ like this is wrong for two reasons: first, as you’ve seen, it fails when an element is a type object, because str.__sizeof__ is the unbound method for computing the size of a string, not the method for computing the size of the str type object itself. Second, sys.getsizeof adds corrections for GC metadata that __sizeof__ doesn’t account for.
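Both problems can be demonstrated directly in a CPython interpreter (the exact byte counts are CPython implementation details):

```python
import sys

# Reason 1: str.__sizeof__ is an unbound method that expects a str
# instance, so calling it via the type object itself raises TypeError.
try:
    str.__sizeof__()
except TypeError as e:
    print(e)

# sys.getsizeof dispatches correctly: for the type object it ends up
# calling type.__sizeof__(str), the size of the class object itself.
print(sys.getsizeof(str))

# Reason 2 (CPython detail): for GC-tracked objects such as lists,
# sys.getsizeof reports more than __sizeof__ because it adds the
# garbage-collector header that __sizeof__ does not account for.
print([].__sizeof__(), sys.getsizeof([]))
```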

Answered By: user2357112

This was fixed in https://github.com/pandas-dev/pandas/issues/51858.
Upgrade pandas and you should be able to run the same code with no problem:

import sys
import pandas as pd

df = pd.Series(str)
sys.getsizeof(df)
Answered By: Brayan Muñoz