Flags in Python

Question:

I’m working with a large matrix (250x250x30 = 1,875,000 cells), and I’d like a way to set an arbitrary number of flags for each cell in this matrix, in some manner that’s easy to use and reasonably space efficient.

My original plan was a 250x250x30 list array, where each element was something like: ["FLAG1","FLAG8","FLAG12"]. I then changed it to storing just integers instead: [1,8,12]. These integers are mapped internally by getter/setter functions to the original flag strings. This only uses 250mb with 8 flags per point, which is fine in terms of memory.

My question is: am I missing another obvious way to structure this sort of data?

Thanks all for your suggestions. I ended up rolling a few suggestions into one, sadly I can only pick one answer and have to live with upvoting the others:

EDIT: erm the initial code I had here (using sets as the base element of a 3d numpy array) used A LOT of memory. This new version uses around 500mb when filled with randint(0,2**1000).

import numpy

FLAG1=2**0
FLAG2=2**1
FLAG3=2**2
FLAG4=2**3

(x,y,z) = (250,250,30)

array = numpy.zeros((x,y,z), dtype=object)


def setFlag(location,flag):
    array[location] |= flag
def unsetFlag(location,flag):
    array[location] &= ~flag
Asked By: Alex J

||

Answers:

BitSet is what you want, since it allows you to store many flags at once using only an fixed size integer (Int type)

Answered By: dfa

Your solution is fine if every single cell is going to have a flag. However if you are working with a sparse dataset where only a small subsection of your cells will have flags what you really want is a dictionary. You would want to set up the dictonary so the key is a tuple for the location of the cell and the value is a list of flags like you have in your solution.

allFlags = {(1,1,1):[1,2,3], (250,250,30):[4,5,6]}

Here we have the 1,1,1 cell have the flags 1,2, and 3 and the cell 250,250,30 have the flags 4,5, and 6

edit- fixed key tuples, thanks Andre, and dictionary syntax.

Answered By: Robbie

Consider using Flyweight pattern to share cell properties:

http://en.wikipedia.org/wiki/Flyweight_pattern

Answered By: Perica Zivkovic

You can define some constants with different, power of two values as:

FLAG1 = 0x01
FLAG8 = 0x02
FLAG12 = 0x04
...

And use them with boolean logic to store the flags in only one integer, p.e.:

flags = FLAG1 | FLAG8

To check if a flag is enabled, you can use the & operator:

flag1_enabled = flags & FLAG1

If the flag is enabled, this expression will return a non-zero value, that will be evaluated as True in any boolean operation. If the flag is disabled, the expression will return 0, that is evaluated as False in boolean operations.

Answered By: Jaime Soriano

I would generally use a numpy array (presumably of short ints, 2 bytes each, since you may need more than 256 distinct values) — that would take less than 4MB for the <2 million cells.

If for some reason I couldn’t afford the numpy dependency (e.g on App Engine, which doesn’t support numpy), I’d use the standard library array module – it only supports 1-dimensional arrays, but it’s just as space-efficient as numpy for large homogeneous arrays, and the getter/setter routines you mention can perfectly well "linearize" a 3-items tuple that’s your natural index into the single integer index into the 1-D array.

In general, consider numpy (or array) any time you have large homogeneous, dense vectors or matrices of numbers — Python built-in lists are highly wasteful of space in this use case (due to their generality which you’re not using and don’t need here!-), and saving memory indirectly translates to saving time too (better caching, fewer levels of indirection, etc, etc).

Answered By: Alex Martelli

Taking Robbie’s suggestion one step further…

flags = set()
x, y, flag = 34, 201, 3
flags.add((x, y, flag)) # set flag 3 at position (34, 201)
if (3, 2, 1) in flags: # check if flag 1 is at position (3, 2)
    # do something
else:
    # do something else

You can also create a helper class.

class Flags(object):
    def __init__(self):
        self.data = set()
    def add(self, x, y, flag):
        self.data.add((x, y, flag))
    def remove(self, x, y, flag):
        self.data.remove((x, y, flag))
    def contains(self, x, y, flag):
        return (x, y, flag) in self.data

You could also implement Python’s special methods like __contains__ to make it easier to work with.

Answered By: FogleBird
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.