Using defaultdict to replace try and/or if statements in Python

Question:

I have recently discovered default dictionaries and started using them to replace several bulkier constructs. I have read in ‘The Zen of Python’ that one of the key principles of Python is “There should be one-- and preferably only one --obvious way to do it.”

Based on that criterion (or, more practically, on memory usage or speed), which of the following (or something totally different) would be best? My hunch is that the first is correct, but I would like other people’s opinions.

from collections import defaultdict

my_dict = defaultdict(int)
for generic in iterable:
    my_dict[generic] += 1

or:

my_dict = {}
for generic in iterable:
    if generic not in my_dict:
        my_dict[generic] = 1
    else:
        my_dict[generic] += 1

or:

my_dict = {}
for generic in iterable:
    try:
        my_dict[generic] += 1
    except KeyError:
        my_dict[generic] = 1

The same question applies to my_dict = defaultdict(list) combined with append(). Assume that the real code uses multiple for loops or other conditionals, rather than simply counting values from a single iterable, since that case would obviously have different characteristics.
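The list-valued variant mentioned above might look like this, written in the same three styles as the counting snippets. This is a minimal sketch; `pairs` and the function names are hypothetical stand-ins for the real loop.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs to be grouped by key.
pairs = [("a", 1), ("b", 2), ("a", 3)]

def group_defaultdict(pairs):
    groups = defaultdict(list)  # a missing key starts as an empty list
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def group_if(pairs):
    groups = {}
    for key, value in pairs:
        if key not in groups:   # look before you leap
            groups[key] = []
        groups[key].append(value)
    return groups

def group_try(pairs):
    groups = {}
    for key, value in pairs:
        try:
            groups[key].append(value)
        except KeyError:        # first time this key is seen
            groups[key] = [value]
    return groups

print(group_defaultdict(pairs))  # {'a': [1, 3], 'b': [2]}
```

All three functions build the same dict; they differ only in how the "key not present yet" case is handled.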

Asked By: ded


Answers:

If you insist on using a dictionary or defaultdict, the first one is the best. For counting, however, there’s a lovely class called Counter in collections:

>>> from collections import Counter
>>> c = Counter()
>>> for generic in iterable:
...     c[generic] += 1

Or even shorter:

>>> c = Counter(iterable)
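As a quick sanity check, a Counter built in one call holds the same counts as the defaultdict(int) loop from the question. The sample `iterable` below is made up for illustration.

```python
from collections import Counter, defaultdict

iterable = ["spam", "eggs", "spam", "ham", "spam"]  # hypothetical sample data

# Loop version from the question.
counts = defaultdict(int)
for generic in iterable:
    counts[generic] += 1

# One-call version.
c = Counter(iterable)

print(dict(c) == dict(counts))  # True
print(c.most_common(1))         # [('spam', 3)]
print(c["missing"])             # 0: absent keys read as 0 and are not inserted
```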
Answered By: Noctua

As Paulo Almeida commented, for the example you posted the “obvious” solution is to use a collections.Counter:

from collections import Counter
my_dict = Counter(iterable)

And that’s it.

As for the other snippets you posted, and assuming the my_dict[key] += 1 was just an example and your general question is about “how to best populate a dict”: collections.defaultdict is the right choice for homogeneous dicts (the same type of value for every key) where that type has a sensible default value (numeric zero, empty string, empty list…). The most common use case I can think of is populating a dict of lists (or sets, or other containers).
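A minimal sketch of the dict-of-containers case, here with a set as the container; the `tags` records are hypothetical. The factory (`set`) is called only when a key is missing.

```python
from collections import defaultdict

# Hypothetical (user, tag) records; a user may repeat a tag.
tags = [("alice", "python"), ("bob", "go"), ("alice", "python"), ("alice", "rust")]

tags_by_user = defaultdict(set)  # set() runs only for a key not yet present
for user, tag in tags:
    tags_by_user[user].add(tag)

print(sorted(tags_by_user["alice"]))  # ['python', 'rust']
```

One thing to keep in mind: merely reading tags_by_user[k] for an absent k also inserts an empty set, so use `k in tags_by_user` or `tags_by_user.get(k)` when you only want to test for a key.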

When neither collections.Counter nor collections.defaultdict solves your problem, you have three possible patterns:

  • look before you leap (check key in my_dict first)
  • try/except KeyError
  • dict.setdefault(key, value)
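One example of a case neither class covers: a defaultdict factory is called with no arguments, so it cannot build a default from the key or from the loop index. A minimal sketch (the task and the `words` data are hypothetical) using the "look before" and setdefault patterns:

```python
# Hypothetical task: record the index at which each word first appears.
words = ["b", "a", "b", "c", "a"]

# Look before you leap: only store the index the first time a word is seen.
first_seen = {}
for i, word in enumerate(words):
    if word not in first_seen:
        first_seen[word] = i

# setdefault does the same in one call; the default here is just an int.
first_seen_sd = {}
for i, word in enumerate(words):
    first_seen_sd.setdefault(word, i)

print(first_seen)     # {'b': 0, 'a': 1, 'c': 3}
print(first_seen_sd)  # {'b': 0, 'a': 1, 'c': 3}
```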

The try/except solution will be faster if you expect the key to already exist: a try/except block is very cheap to set up but costly when the exception is actually raised. Personally, I don’t recommend it unless you are very, very sure about what your data looks like now and what it will look like in the future.

The “look before” solution has an almost constant cost which, while not free, is still quite cheap. That’s really your safest bet.

The dict.setdefault() solution has about the same cost as the “look before” one, BUT you also pay the constant cost of instantiating a default object that will often be thrown away immediately. It was a common pattern some years ago, but since collections.defaultdict appeared it has been of rather marginal use, not to say mostly useless.
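The cost of the eagerly built default can be made visible with a factory that counts its calls. This is a minimal sketch; `new_list` and the `pairs` data are hypothetical.

```python
from collections import defaultdict

calls = 0  # how many default lists have been built

def new_list():
    """Hypothetical factory that counts its calls."""
    global calls
    calls += 1
    return []

pairs = [("a", 1), ("a", 2), ("a", 3)]

# setdefault: new_list() is evaluated as an argument on every iteration,
# even when "a" already exists and the new list is simply discarded.
d = {}
for key, value in pairs:
    d.setdefault(key, new_list()).append(value)
setdefault_calls = calls

# defaultdict: the factory runs only when the key is missing.
calls = 0
dd = defaultdict(new_list)
for key, value in pairs:
    dd[key].append(value)
defaultdict_calls = calls

print(d, setdefault_calls)          # {'a': [1, 2, 3]} 3
print(dict(dd), defaultdict_calls)  # {'a': [1, 2, 3]} 1
```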

Answered By: bruno desthuilliers

I wrote a script to compare the performance of these three approaches for a situation in which I needed to pick the fastest one. Of course, the results will vary depending on the data you work with.

from collections import defaultdict
import random
import time

# build data
data = []
for x in range(0, 1000):
    data.append(
        { "name": "performance test", "company_id": random.randint(1, 9999), "platform": { "name": "stack", "last_name": "overflow" } }
    )

def perf_defaultdict():
    t0 = time.time()
    d = defaultdict(lambda:[])
    for x in data:
        d[x['company_id']].append(x['platform'])
    t1 = time.time()
    total = t1-t0
    print('defaultdict', total)
    return d

def perf_ifelse():
    t0 = time.time()
    d = {}
    for x in data:
        if x['company_id'] in d:
            d[x['company_id']].append(x['platform'])
        else:
            d[x['company_id']] = [x['platform']]
    t1 = time.time()
    total = t1-t0
    print('if else', total)
    return d

def perf_tryexcept():
    t0 = time.time()
    d = {}
    for x in data:
        try:
            d[x['company_id']].append(x['platform'])
        except KeyError:
            d[x['company_id']] = [x['platform']]
    t1 = time.time()
    total = t1-t0
    print('try/except', total)
    return d

# run
d1 = perf_defaultdict()
d2 = perf_ifelse()
d3 = perf_tryexcept()

Output 1:

defaultdict 0.0003848075866699219
if else 0.0002579689025878906
try/except 0.0004780292510986328

Output 2:

defaultdict 0.0005650520324707031
if else 0.00036215782165527344
try/except 0.00080108642578125

Output 3:

defaultdict 0.0006802082061767578
if else 0.0004029273986816406
try/except 0.0008919239044189453

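In this case, if/else performed best. Note that with 1000 records and company_id drawn from 1 to 9999, most keys are seen only once, so the try/except version raises KeyError on most iterations, which is its worst case.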
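The single time.time() measurements above are in the sub-millisecond range, where timer resolution and noise matter. A hedged sketch of a sturdier variant uses the standard timeit module and also varies how often keys repeat. The `make_data` helper and its `max_id` parameter are hypothetical additions, and defaultdict(list) replaces the lambda factory.

```python
import random
import timeit
from collections import defaultdict

def group_defaultdict(data):
    d = defaultdict(list)
    for x in data:
        d[x["company_id"]].append(x["platform"])
    return d

def group_ifelse(data):
    d = {}
    for x in data:
        if x["company_id"] in d:
            d[x["company_id"]].append(x["platform"])
        else:
            d[x["company_id"]] = [x["platform"]]
    return d

def group_tryexcept(data):
    d = {}
    for x in data:
        try:
            d[x["company_id"]].append(x["platform"])
        except KeyError:
            d[x["company_id"]] = [x["platform"]]
    return d

def make_data(n, max_id):
    # Same record shape as above; max_id controls how often keys repeat.
    return [{"name": "performance test",
             "company_id": random.randint(1, max_id),
             "platform": {"name": "stack", "last_name": "overflow"}}
            for _ in range(n)]

if __name__ == "__main__":
    for max_id in (9999, 10):  # mostly new keys vs. mostly existing keys
        data = make_data(1000, max_id)
        for fn in (group_defaultdict, group_ifelse, group_tryexcept):
            # Best of 3 repeats of 100 runs each, reported per run.
            best = min(timeit.repeat(lambda: fn(data), number=100, repeat=3)) / 100
            print(f"max_id={max_id:<5} {fn.__name__:<18} {best:.6f} s")
```

No particular ranking is implied here; the point is that the relative timings depend on how often a key is already present, which is exactly the hit/miss trade-off described in bruno desthuilliers’ answer.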

Answered By: GIA