A data-structure for 1:1 mappings in python?

Question:

I have a problem which requires a reversable 1:1 mapping of keys to values.

That means sometimes I want to find the value given a key, but at other times I want to find the key given the value. Both keys and values are guaranteed unique.

x = D[y]
y == D.inverse[x]

The obvious solution is to simply invert the dictionary every time I want a reverse-lookup: Inverting a dictionary is very easy, there’s a recipe here but for a large dictionary it can be very slow.

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

So is there a better structure I can use?

  • My application requires that this should be very fast and use as little as possible memory.
  • The structure must be mutable, and it’s strongly desirable that mutating the object should not cause it to be slower (e.g. to force a complete re-index)
  • We can guarantee that either the key or the value (or both) will be an integer
  • It’s likely that the structure will be needed to store thousands or possibly millions of items.
  • Keys & Valus are guaranteed to be unique, i.e. len(set(x)) == len(x) for for x in [D.keys(), D.valuies()]
Asked By: Salim Fadhley

||

Answers:

The other alternative is to make a new class which unites two dictionaries, one for each > kind of lookup. That would most likely use up twice as much memory as a single dict.

Not really, since they would just be holding two references to the same data. In my mind, this is not a bad solution.

Have you considered an in-memory database lookup? I am not sure how it will compare in speed, but lookups in relational databases can be very fast.

Answered By: Shane C. Mason

Assuming that you have a key with which you look up a more complex mutable object, just make the key a property of that object. It does seem you might be better off thinking about the data model a bit.

Answered By: Pete Kirkham

The other alternative is to make a new
class which unites two dictionaries,
one for each kind of lookup. That
would most likely be fast but would
use up twice as much memory as a
single dict.

Not really. Have you measured that? Since both dictionaries would use references to the same objects as keys and values, then the memory spent would be just the dictionary structure. That’s a lot less than twice and is a fixed ammount regardless of your data size.

What I mean is that the actual data wouldn’t be copied. So you’d spend little extra memory.

Example:

a = "some really really big text spending a lot of memory"

number_to_text = {1: a}
text_to_number = {a: 1}

Only a single copy of the “really big” string exists, so you end up spending just a little more memory. That’s generally affordable.

I can’t imagine a solution where you’d have the key lookup speed when looking by value, if you don’t spend at least enough memory to store a reverse lookup hash table (which is exactly what’s being done in your “unite two dicts” solution).

Answered By: nosklo

“We can guarantee that either the key or the value (or both) will be an integer”

That’s weirdly written — “key or the value (or both)” doesn’t feel right. Either they’re all integers, or they’re not all integers.

It sounds like they’re all integers.

Or, it sounds like you’re thinking of replacing the target object with an integer value so you only have one copy referenced by an integer. This is a false economy. Just keep the target object. All Python objects are — in effect — references. Very little actual copying gets done.

Let’s pretend that you simply have two integers and can do a lookup on either one of the pair. One way to do this is to use heap queues or the bisect module to maintain ordered lists of integer key-value tuples.

See http://docs.python.org/library/heapq.html#module-heapq

See http://docs.python.org/library/bisect.html#module-bisect

You have one heapq (key,value) tuples. Or, if your underlying object is more complex, the (key,object) tuples.

You have another heapq (value,key) tuples. Or, if your underlying object is more complex, (otherkey,object) tuples.

An “insert” becomes two inserts, one to each heapq-structured list.

A key lookup is in one queue; a value lookup is in the other queue. Do the lookups using bisect(list,item).

Answered By: S.Lott

It so happens that I find myself asking this question all the time (yesterday in particular). I agree with the approach of making two dictionaries. Do some benchmarking to see how much memory it’s taking. I’ve never needed to make it mutable, but here’s how I abstract it, if it’s of any use:

class BiDict(list):
    def __init__(self,*pairs):
        super(list,self).__init__(pairs)
        self._first_access = {}
        self._second_access = {}
        for pair in pairs:
            self._first_access[pair[0]] = pair[1]
            self._second_access[pair[1]] = pair[0]
            self.append(pair)

    def _get_by_first(self,key):
        return self._first_access[key]

    def _get_by_second(self,key):
        return self._second_access[key]

    # You'll have to do some overrides to make it mutable
    # Methods such as append, __add__, __del__, __iadd__
    # to name a few will have to maintain ._*_access

class Constants(BiDict):
    # An implementation expecting an integer and a string
    get_by_name = BiDict._get_by_second
    get_by_number = BiDict._get_by_first

t = Constants(
        ( 1, 'foo'),
        ( 5, 'bar'),
        ( 8, 'baz'),
    )

>>> print t.get_by_number(5)
bar
>>> print t.get_by_name('baz')
8
>>> print t
[(1, 'foo'), (5, 'bar'), (8, 'baz')]
Answered By: David Berger

How about using sqlite? Just create a :memory: database with a two-column table. You can even add indexes, then query by either one. Wrap it in a class if it’s something you’re going to use a lot.

Answered By: ShawnMilo
class TwoWay:
    def __init__(self):
       self.d = {}
    def add(self, k, v):
       self.d[k] = v
       self.d[v] = k
    def remove(self, k):
       self.d.pop(self.d.pop(k))
    def get(self, k):
       return self.d[k]
Answered By: user17918

Here is my own solution to this problem: http://github.com/spenthil/pymathmap/blob/master/pymathmap.py

The goal is to make it as transparent to the user as possible. The only introduced significant attribute is partner.

OneToOneDict subclasses from dict – I know that isn’t generally recommended, but I think I have the common use cases covered. The backend is pretty simple, it (dict1) keeps a weakref to a ‘partner’ OneToOneDict (dict2) which is its inverse. When dict1 is modified dict2 is updated accordingly as well and vice versa.

From the docstring:

>>> dict1 = OneToOneDict()
>>> dict2 = OneToOneDict()
>>> dict1.partner = dict2
>>> assert(dict1 is dict2.partner)
>>> assert(dict2 is dict1.partner)
>>> dict1['one'] = '1'
>>> dict2['2'] = '1'
>>> dict1['one'] = 'wow'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1['one'] = '1'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1.update({'three': '3', 'four': '4'})
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict3 = OneToOneDict({'4':'four'})
>>> assert(dict3.partner is None)
>>> assert(dict3 == {'4':'four'})
>>> dict1.partner = dict3
>>> assert(dict1.partner is not dict2)
>>> assert(dict2.partner is None)
>>> assert(dict1.partner is dict3)
>>> assert(dict3.partner is dict1)
>>> dict1.setdefault('five', '5')
>>> dict1['five']
'5'
>>> dict1.setdefault('five', '0')
>>> dict1['five']
'5'

When I get some free time, I intend to make a version that doesn’t store things twice. No clue when that’ll be though 🙂

Answered By: spenthil
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.