defaultdict and tuples
Question:
I wanted to do the following:
d = defaultdict((int,float))
for z in range( lots_and_lots):
d['operation one'] += (1,5.67)
...
...
d['operation two'] += (1,4.56)
And then output the number of times each operation was called and the total of float value.
for k,v in d.items():
print k, 'Called', v[0], 'times, total =', v[1]
But I don’t know how to achieve this as not only can’t you use a tuple as a parameter to defaultdict you can’t add a tuple to a tuple and total the values in the tuple you just get extra values in your tuple. i.e:
>>> x = (1,0)
>>> x+= (2,3)
>>> x
(1, 0, 2, 3)
and not
>>> x = (1,0)
>>> x+= (2,3)
>>> x
(3,3)
How can I get what I want?
Answers:
the argument to defaultdict
must be a “callable” that returns a default value. define your default dict like so:
d = defaultdict(lambda: (0, 0.0))
The fact that int
and float
types can be called and return zero’s is a convenience, but not in any way crucial to the way defaultdict works.
getting the +=
to work is going to cause some trouble; addition across tuples is the concatantion of the tuples, so you’ll have to do it the long way:
left, right = d["key"]
d["key"] = (left + 2, right + 3)
Edit: if you just must use +=, you can do so, so long as you have a collection type that has the desired operations. fileoffset suggests using a numpy
array type, and that’s probably a nice idea, but you can get a close approximation just by subclassing tuple
and overriding the operators you need: Here’s a rough sketch of one:
class vector(tuple):
def __add__(self, other):
return type(self)(l+r for l, r in zip(self, other))
def __sub__(self, other):
return type(self)(l-r for l, r in zip(self, other))
def __radd__(self, other):
return type(self)(l+r for l, r in zip(self, other))
def __lsub__(self, other):
return type(self)(r-l for l, r in zip(self, other))
from collections import defaultdict
d = defaultdict(lambda:vector((0, 0.0)))
for k in range(5):
for j in range(5):
d[k] += (j, j+k)
print d
we don’t need (or want) to actually overload the +=
operator itself (spelled __iadd__
) because tuple
is immutable. Python will correctly replace the old value with new if you supply addition.
You could do it with collections.Counter to accumulate the results:
>>> from collections import Counter, defaultdict
>>> d = defaultdict(Counter)
>>> d['operation_one'].update(ival=1, fval=5.67)
>>> d['operation_two'].update(ival=1, fval=4.56)
Try this:
a = (1,0)
b = (2,3)
res = tuple(sum(x) for x in zip(a,b)
e.g.
d = defaultdict((int,float))
for z in range( lots_and_lots):
d['operation one'] = tuple(sum(x) for x in zip(d['operation one'], (1,5.67))
...
...
If you use numpy array’s you can get the desired output:
I assuming you have too many operations to simply store the list of values in each entry?
d = defaultdict(list)
for z in range(lots_and_lots):
d['operation one'].append(5.67)
...
...
d['operation two'].append(4.56)
for k,v in d.items():
print k, 'Called', len(v), 'times, total =', sum(v)
One thing you could do is make a custom incrementor:
class Inc(object):
def __init__(self):
self.i = 0
self.t = 0.0
def __iadd__(self, f):
self.i += 1
self.t += f
return self
and then
d = defaultdict(Inc)
for z in range(lots_and_lots):
d['operation one'] += 5.67
...
...
d['operation two'] += 4.56
for k,v in d.items():
print k, 'Called', v.i, 'times, total =', v.t
Write a class that you can pass into defaultdict
that accumulates values as you pass them in:
class Tracker(object):
def __init__(self):
self.values = None
self.count = 0
def __iadd__(self, newvalues):
self.count += 1
if self.values is None:
self.values = newvalues
else:
self.values = [(old + new) for old, new in zip(self.values, newvalues)]
return self
def __repr__(self):
return '<Tracker(%s, %d)>' % (self.values, self.count)
That’s a drop-in replacement for (int, float)
in your original post. Change your output loop to print the instance attributes like so:
for k,v in d.items():
print k, 'Called', v.count, 'times, total =', v.values
…and you’re done!
I wanted to do the following:
d = defaultdict((int,float))
for z in range( lots_and_lots):
d['operation one'] += (1,5.67)
...
...
d['operation two'] += (1,4.56)
And then output the number of times each operation was called and the total of float value.
for k,v in d.items():
print k, 'Called', v[0], 'times, total =', v[1]
But I don’t know how to achieve this as not only can’t you use a tuple as a parameter to defaultdict you can’t add a tuple to a tuple and total the values in the tuple you just get extra values in your tuple. i.e:
>>> x = (1,0)
>>> x+= (2,3)
>>> x
(1, 0, 2, 3)
and not
>>> x = (1,0)
>>> x+= (2,3)
>>> x
(3,3)
How can I get what I want?
the argument to defaultdict
must be a “callable” that returns a default value. define your default dict like so:
d = defaultdict(lambda: (0, 0.0))
The fact that int
and float
types can be called and return zero’s is a convenience, but not in any way crucial to the way defaultdict works.
getting the +=
to work is going to cause some trouble; addition across tuples is the concatantion of the tuples, so you’ll have to do it the long way:
left, right = d["key"]
d["key"] = (left + 2, right + 3)
Edit: if you just must use +=, you can do so, so long as you have a collection type that has the desired operations. fileoffset suggests using a numpy
array type, and that’s probably a nice idea, but you can get a close approximation just by subclassing tuple
and overriding the operators you need: Here’s a rough sketch of one:
class vector(tuple):
def __add__(self, other):
return type(self)(l+r for l, r in zip(self, other))
def __sub__(self, other):
return type(self)(l-r for l, r in zip(self, other))
def __radd__(self, other):
return type(self)(l+r for l, r in zip(self, other))
def __lsub__(self, other):
return type(self)(r-l for l, r in zip(self, other))
from collections import defaultdict
d = defaultdict(lambda:vector((0, 0.0)))
for k in range(5):
for j in range(5):
d[k] += (j, j+k)
print d
we don’t need (or want) to actually overload the +=
operator itself (spelled __iadd__
) because tuple
is immutable. Python will correctly replace the old value with new if you supply addition.
You could do it with collections.Counter to accumulate the results:
>>> from collections import Counter, defaultdict
>>> d = defaultdict(Counter)
>>> d['operation_one'].update(ival=1, fval=5.67)
>>> d['operation_two'].update(ival=1, fval=4.56)
Try this:
a = (1,0)
b = (2,3)
res = tuple(sum(x) for x in zip(a,b)
e.g.
d = defaultdict((int,float))
for z in range( lots_and_lots):
d['operation one'] = tuple(sum(x) for x in zip(d['operation one'], (1,5.67))
...
...
If you use numpy array’s you can get the desired output:
I assuming you have too many operations to simply store the list of values in each entry?
d = defaultdict(list)
for z in range(lots_and_lots):
d['operation one'].append(5.67)
...
...
d['operation two'].append(4.56)
for k,v in d.items():
print k, 'Called', len(v), 'times, total =', sum(v)
One thing you could do is make a custom incrementor:
class Inc(object):
def __init__(self):
self.i = 0
self.t = 0.0
def __iadd__(self, f):
self.i += 1
self.t += f
return self
and then
d = defaultdict(Inc)
for z in range(lots_and_lots):
d['operation one'] += 5.67
...
...
d['operation two'] += 4.56
for k,v in d.items():
print k, 'Called', v.i, 'times, total =', v.t
Write a class that you can pass into defaultdict
that accumulates values as you pass them in:
class Tracker(object):
def __init__(self):
self.values = None
self.count = 0
def __iadd__(self, newvalues):
self.count += 1
if self.values is None:
self.values = newvalues
else:
self.values = [(old + new) for old, new in zip(self.values, newvalues)]
return self
def __repr__(self):
return '<Tracker(%s, %d)>' % (self.values, self.count)
That’s a drop-in replacement for (int, float)
in your original post. Change your output loop to print the instance attributes like so:
for k,v in d.items():
print k, 'Called', v.count, 'times, total =', v.values
…and you’re done!