Store an array, or a list, of fixed length as a class attribute using __slots__ in Python
Question:
I have an object that represents a User and a variable measured 20 times. The object will be something like this:
class User:
    user_id: str
    measures: List[float]  # this is a list (or an array) of size 20
Given that I have many users to represent, I would like to use __slots__ to store the variable (so I can save space). However, I don't know whether doing this directly will save memory, because the slot will probably only hold the pointer to the list, not the floats themselves. The following code runs, but I am not sure how it compares memory-wise to the version above:
class User:
    __slots__ = ['user_id', 'measures']  # this implementation runs, but no idea if it's using slots "properly"
    user_id: str
    measures: List[float]

    def __init__(self, user_id: str, measures: List[float]):
        #...
Or maybe the only alternative is to declare the 20 variables independently? (This is very cumbersome, but I know it will work.)
class User:
    __slots__ = ['user_id', 'm1', 'm2', ...]  # very cumbersome
    user_id: str
    m1: float
    m2: float
    ...

    def __init__(self, user_id: str, measures: List[float]):
        #...
Or maybe I should use another class that contains the measures.
Answers:
If memory is a concern, first consider keeping the floats in an array (either a NumPy array or a plain Python array.array), because every float, whether it "stands alone" or is an element of a list, is a full Python object that takes tens of bytes (24 bytes on a typical 64-bit CPython build, plus an 8-byte pointer in the list).
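To get a feel for the difference, here is a rough sketch that compares the footprint of 20 floats in a list against the same values in an array. It relies on sys.getsizeof, so the numbers are CPython-specific and vary between versions and platforms; the variable names are only illustrative.

```python
import sys
from array import array

values = [float(i) + 0.5 for i in range(20)]

# A list holds one pointer per element, and every element is a
# separate float object that has to be counted on its own.
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores the raw C doubles inline, with no per-element objects.
packed = array("d", values)
array_total = sys.getsizeof(packed)

print(f"list + float objects: {list_total} bytes")
print(f"array('d'):           {array_total} bytes")
print(f"raw payload:          {packed.itemsize * len(packed)} bytes")
```

The exact figures are less important than the ratio: the array pays a fixed header once, while the list pays a pointer plus a full object per value.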
The simplest way to do it is to create the array along with each instance of your object; then you can use plain indexing on it:
from array import array
from typing import List

class User:
    __slots__ = ['user_id', 'measures']
    user_id: str
    measures: array

    def __init__(self, user_id: str, measures: List[float]):
        #...
        # Pack the values into a compact array of C doubles (8 bytes each)
        self.measures = array("d")
        self.measures.fromlist(measures)
Python array objects have little support in typing, so the best approach is to simply forget about type-annotating the array, unless you are willing to spend several hours fighting uphill for an optional annotation until you get it passing (but not necessarily right).
But most important in this case: when you want to use or set values in your measures array, Python will take the 8 bytes in which each value is stored (you can even store 4-byte, 32-bit floating-point values by using "f" as the array type code) and "unbox" them into a Python float, ready to be used: user.measures[0] * 3
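As a small illustration of the two type codes and of the "unboxing", the sketch below builds one array of each kind and reads values back. The sample values are arbitrary; the item sizes shown are the ones you get on common platforms.

```python
from array import array

doubles = array("d", [0.1, 2.5, 3.0])  # C doubles, 8 bytes per item
singles = array("f", [0.1, 2.5, 3.0])  # C floats, 4 bytes per item

print(doubles.itemsize, singles.itemsize)

# Indexing hands back an ordinary Python float, created on the fly.
first = doubles[0]
print(type(first).__name__, first * 3)

# Single precision keeps fewer significant digits, so a value such as
# 0.1 does not survive a round trip through "f" unchanged, while a
# value like 2.5 (exactly representable in binary) does.
print(doubles[0] == 0.1)
print(singles[0] == 0.1)
print(singles[1] == 2.5)
```

Whether the halved storage of "f" is worth the lost precision depends on how precise your measures need to be.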
If you want to constrain the measures list to exactly 20 elements, you can do it with a plain Python if statement inside __init__:
def __init__(self, user_id: str, measures: List[float]):
    #...
    if len(measures) != 20:
        raise ValueError(...)
    ...
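Putting the pieces together, a complete version could look roughly like the sketch below. The N_MEASURES constant, the error message and the use of Sequence instead of List are my own choices for illustration, not something the question prescribes.

```python
from array import array
from typing import Sequence

N_MEASURES = 20  # assumed fixed length, taken from the question


class User:
    __slots__ = ("user_id", "measures")

    def __init__(self, user_id: str, measures: Sequence[float]):
        # Reject anything that is not exactly N_MEASURES long.
        if len(measures) != N_MEASURES:
            raise ValueError(
                f"expected {N_MEASURES} measures, got {len(measures)}"
            )
        self.user_id = user_id
        # Copy the values into a compact array of 8-byte doubles.
        self.measures = array("d", measures)


user = User("u1", [float(i) for i in range(20)])
print(user.measures[3] * 2)

try:
    User("u2", [1.0, 2.0])
except ValueError as exc:
    print("rejected:", exc)
```

Note that the check only guards construction: the array itself is still mutable, so code that calls user.measures.append(...) later can change its length.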
Moreover, and maybe more interesting for you: let's say you want to be able to get to each measure as a dotted attribute with a hardcoded name, and still store each measure as a 4-byte value. You can write custom code in the __getattr__ and __setattr__ methods, plus some metadata that provides the mapping. But I digress; feel free to comment if you need such a feature.
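For the curious, here is a minimal sketch of that idea. Everything in it is an assumption made for illustration: the attribute names m1 to m20, the private _data slot and the _FIELDS mapping are hypothetical, and the "f" type code is used so each value takes 4 bytes.

```python
from array import array
from typing import Sequence


class User:
    __slots__ = ("user_id", "_data")

    # Hypothetical metadata: attribute name -> index in the packed array.
    _FIELDS = {f"m{i + 1}": i for i in range(20)}

    def __init__(self, user_id: str, measures: Sequence[float]):
        if len(measures) != len(self._FIELDS):
            raise ValueError(f"expected {len(self._FIELDS)} measures")
        self.user_id = user_id             # regular slot
        self._data = array("f", measures)  # 4 bytes per value

    def __getattr__(self, name):
        # Only called when normal lookup fails, so the real slots
        # (user_id, _data) never pass through here once they are set.
        try:
            index = self._FIELDS[name]
        except KeyError:
            raise AttributeError(name) from None
        return self._data[index]

    def __setattr__(self, name, value):
        index = self._FIELDS.get(name)
        if index is None:
            # Not a mapped measure: fall back to normal slot assignment.
            super().__setattr__(name, value)
        else:
            self._data[index] = value


user = User("u1", [float(i) for i in range(20)])
print(user.m1, user.m20)
user.m3 = 2.5
print(user._data[2])
```

The trade-off is that every dotted access now goes through a Python-level method call, so this buys convenience rather than speed.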