Store an array, or a list, of fixed length as a class attribute using __slots__ in Python

Question:

I have an object that represents a User and a variable measured 20 times. The object will be something like this:

class User:
    user_id: str 
    measures: List[float] #this is a list(or an array) of size 20

Given that I have many users to represent, I would like to use __slots__ to store the variables (so I can save space). However, I don’t know whether doing this directly will save memory, because it will probably only store the pointer to the list, not the list’s floats. The following code runs, but I’m not sure how it compares memory-wise to the previous version:

from typing import List

class User:
    __slots__ = ['user_id', 'measures']  # this implementation runs, but no idea if it's using slots "properly"

    user_id: str
    measures: List[float]

    def __init__(self, user_id: str, measures: List[float]):
        #...
        self.user_id = user_id
        self.measures = measures

Or maybe the only alternative is to declare the 20 variables independently? (This is very cumbersome, but I know it will work.)

class User:
    __slots__ =['user_id', 'm1', 'm2', ...] #very cumbersome 

    user_id: str 
    m1:float 
    m2:float 
    ...

    def __init__(self, user_id:str, measures:List[float]):
         #...

Or maybe I should use another class that contains the measures.

Asked By: Pablo

||

Answers:

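If memory is a concern, you should first think of keeping the floats in an array (either a NumPy array or a plain Python array.array). __slots__ only removes the per-instance __dict__; each slot still holds just a reference to whatever object is stored in it. A float, whether it "stands alone" or is an element of a list, is a full Python object taking tens of bytes (24 bytes on a typical 64-bit CPython build, plus 8 bytes for the reference that points to it).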
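
As a rough way to see the difference, here is a minimal sketch that compares the two layouts with sys.getsizeof. The helper names list_footprint and array_footprint are made up for this illustration, and the exact byte counts depend on the CPython version and platform, so only the relative sizes are meaningful.

```python
import sys
from array import array


def list_footprint(values):
    """Bytes for the list itself plus the float object behind each element."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def array_footprint(values):
    """Bytes for an array("d") holding the same values as raw C doubles."""
    return sys.getsizeof(array("d", values))


# 20 distinct measurements, as in the question
measures = [i * 0.5 for i in range(20)]

print("list of floats:", list_footprint(measures), "bytes")
print("array('d'):    ", array_footprint(measures), "bytes")
```

The per-object overhead of each User instance is paid in both cases; the saving comes from not allocating a separate float object per measurement.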

The simplest way to do it is to create the array along with each instance of your object; you can then use plain indexing on it:

from array import array
from typing import List

class User:
    __slots__ = ['user_id', 'measures']
    user_id: str
    measures: array

    def __init__(self, user_id: str, measures: List[float]):
        #...
        # pack the values into a compact array of C doubles
        self.measures = array("d", measures)

Python array objects have limited support in typing, so the best approach is simply to annotate the attribute as a bare array, as above, and forget about annotating the element type, unless you are willing to spend several hours fighting uphill for an optional annotation until you get it passing (but not necessarily right).

But most important in this case: when you read a value from your measures array, Python takes the 8 bytes in which that value is stored (you can even store 4-byte, 32-bit floating-point values: just use "f" as the array type code) and converts ("boxes") them into a regular Python float, ready to be used: user.measures[0] * 3. Assigning a value works the other way around.
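
To make that concrete, here is a small sketch of the two type codes. The item sizes shown in the comments are what mainstream platforms use; the array module only guarantees minimum sizes, so treat them as typical rather than fixed.

```python
from array import array

doubles = array("d", [0.1, 2.5])  # C doubles, typically 8 bytes each
singles = array("f", [0.1, 2.5])  # C floats, typically 4 bytes each

print(doubles.itemsize, singles.itemsize)

# Indexing always hands back an ordinary Python float.
value = doubles[1] * 3
print(type(doubles[0]), value)

# 2.5 is exactly representable in 32 bits, 0.1 is not, so the "f"
# array gives back a slightly different number for the latter.
print(singles[1] == 2.5, singles[0] == 0.1)
```

The "f" code halves the storage, at the cost of roughly seven significant decimal digits of precision instead of about sixteen.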

If you want to constrain the measures list to exactly 20 items, you can do it with a plain Python if statement inside __init__:


    def __init__(self, user_id:str, measures:List[float]):
         #...
         if len(measures) != 20:
              raise ValueError(...)
         ...
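
Putting the pieces together, a complete version might look like the sketch below. The constant name N_MEASURES and the error message are assumptions made for this example; the rest follows the snippets above.

```python
from array import array
from typing import List

N_MEASURES = 20  # hypothetical constant; the value 20 comes from the question


class User:
    # Only these two attributes can exist; instances get no __dict__.
    __slots__ = ("user_id", "measures")

    def __init__(self, user_id: str, measures: List[float]):
        if len(measures) != N_MEASURES:
            raise ValueError(
                f"expected {N_MEASURES} measures, got {len(measures)}"
            )
        self.user_id = user_id
        # Store the values as raw C doubles instead of float objects.
        self.measures = array("d", measures)


user = User("u1", [float(i) for i in range(N_MEASURES)])
print(user.measures[3] * 3)
```

Note that the check only runs at construction time: the array itself can still be resized afterwards (for example with append), so code that relies on the length staying fixed should avoid mutating it that way.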

Moreover, and maybe more interesting for you: say you want to reach each measure as a dotted attribute with a hardcoded name, and still store each measure as a 4-byte value. You can write custom code in the __getattr__ and __setattr__ methods, plus some metadata that maps the names to positions in the array. But I digress; feel free to comment if you need such a feature.
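
For reference, one possible shape of the __getattr__ / __setattr__ idea is sketched below. The attribute names m1 to m20, the _INDEX mapping, and the _measures slot name are all choices made for this illustration, not something prescribed by the question.

```python
from array import array

N_MEASURES = 20  # hypothetical constant; the value 20 comes from the question
# Metadata for the mapping: "m1" -> 0, "m2" -> 1, ..., "m20" -> 19
_INDEX = {f"m{i}": i - 1 for i in range(1, N_MEASURES + 1)}


class User:
    __slots__ = ("user_id", "_measures")

    def __init__(self, user_id, measures):
        if len(measures) != N_MEASURES:
            raise ValueError(f"expected {N_MEASURES} measures")
        self.user_id = user_id
        self._measures = array("f", measures)  # 4 bytes per value

    def __getattr__(self, name):
        # Called only when normal lookup fails, so the real slots
        # (user_id, _measures) never reach this method once set.
        index = _INDEX.get(name)
        if index is None:
            raise AttributeError(name)
        return self._measures[index]

    def __setattr__(self, name, value):
        index = _INDEX.get(name)
        if index is None:
            # Regular slot attributes take the default path.
            object.__setattr__(self, name, value)
        else:
            self._measures[index] = value


user = User("u1", [float(i) for i in range(N_MEASURES)])
user.m1 = 2.5
print(user.m1, user.m3)
```

Each attribute access goes through a dictionary lookup and a Python-level method call, so this trades some speed for the convenient names.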

Answered By: jsbueno