Why does sys.getsizeof fail on a pandas Series or DataFrame when they hold a type
Question:
Python getsizeof fails on a Series that holds a type as its data. I have a function in which I need to calculate the size of any given argument, which I do with getsizeof. But this is an issue, as getsizeof fails unexpectedly for these kinds of Series. Is there a way to avoid this failure in getsizeof?
import sys
import pandas as pd

df = pd.Series(str)
sys.getsizeof(df)
TypeError: descriptor '__sizeof__' of 'str' object needs an argument
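If upgrading pandas is not an option, one way to keep a size-measuring function from blowing up is a defensive wrapper. This is a hypothetical helper (the name safe_getsizeof and the -1 sentinel are my own, not part of any library); the Broken class below merely stands in for an affected Series whose deep __sizeof__ raises:

```python
import sys

def safe_getsizeof(obj, default=-1):
    # Hypothetical workaround: return a sentinel instead of raising
    # when an object's __sizeof__ fails (as affected Series do).
    try:
        return sys.getsizeof(obj)
    except TypeError:
        return default

class Broken:
    # Stand-in for a pandas Series whose deep __sizeof__ raises.
    def __sizeof__(self):
        raise TypeError("descriptor '__sizeof__' needs an argument")
```

With this sketch, safe_getsizeof(42) returns a normal size, while safe_getsizeof(Broken()) returns the sentinel instead of raising.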
Answers:
This is a Pandas bug.
Pandas makes the unusual decision of trying to compute a "deep" sizeof, including all element sizes, rather than just the memory consumption directly attributable to the Series instance itself. The __sizeof__ implementation for a Series eventually hits a loop that tries to call __sizeof__ on the elements:
for i in range(n):
    size += arr[i].__sizeof__()
return size
but calling __sizeof__ like this is incorrect; it should really call sys.getsizeof(arr[i]).
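A sketch of what the corrected accumulation looks like (the function name deep_sizeof is mine; the actual fix lives inside pandas' internals):

```python
import sys

def deep_sizeof(arr):
    # Accumulate element sizes via sys.getsizeof rather than calling
    # __sizeof__ directly: getsizeof copes with type objects and adds
    # the GC header that a bare __sizeof__ call omits.
    size = 0
    for item in arr:
        size += sys.getsizeof(item)
    return size
```

In particular, deep_sizeof([str]) succeeds where arr[i].__sizeof__() would raise.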
Calling __sizeof__ like this is wrong for two reasons. First, as you’ve seen, it fails when an element is a type object, because str.__sizeof__ is the unbound method for computing the size of a string instance, not a method for computing the size of the str type object itself. Second, sys.getsizeof adds corrections for GC metadata that __sizeof__ doesn’t account for.
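In CPython, both failure modes are easy to reproduce directly:

```python
import sys

# Reason 1: on the str type object, __sizeof__ is an unbound method
# that needs a string instance, so calling it bare raises TypeError.
try:
    str.__sizeof__()
    raised = False
except TypeError:
    raised = True

# sys.getsizeof handles the type object itself without trouble.
type_size = sys.getsizeof(str)

# Reason 2: for GC-tracked objects such as lists, sys.getsizeof adds
# the GC header, so it reports more than __sizeof__ alone (CPython).
gc_overhead = sys.getsizeof([]) - [].__sizeof__()
```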
I fixed this at https://github.com/pandas-dev/pandas/issues/51858. Upgrade pandas and you should be able to run the same code with no problem:
import sys
import pandas as pd

df = pd.Series(str)
sys.getsizeof(df)