What is the fastest (to access) struct-like object in Python?

Question:

I’m optimizing some code whose main bottleneck is running through and accessing a very large list of struct-like objects. Currently I’m using namedtuples, for readability. But some quick benchmarking using ‘timeit’ shows that this is really the wrong way to go where performance is a factor:

Named tuple with a, b, c:

>>> timeit("z = a.c", "from __main__ import a")
0.38655471766332994

Class using __slots__, with a, b, c:

>>> timeit("z = b.c", "from __main__ import b")
0.14527461047146062

Dictionary with keys a, b, c:

>>> timeit("z = c['c']", "from __main__ import c")
0.11588272541098377

Tuple with three values, using a constant key:

>>> timeit("z = d[2]", "from __main__ import d")
0.11106188992948773

List with three values, using a constant key:

>>> timeit("z = e[2]", "from __main__ import e")
0.086038238242508669

Tuple with three values, using a local key:

>>> timeit("z = d[key]", "from __main__ import d, key")
0.11187358437882722

List with three values, using a local key:

>>> timeit("z = e[key]", "from __main__ import e, key")
0.088604143037173344

First of all, is there anything about these little timeit tests that would render them invalid? I ran each several times, to make sure no random system event had thrown them off, and the results were almost identical.

It would appear that dictionaries offer the best balance between performance and readability, with classes coming in second. This is unfortunate, since, for my purposes, I also need the object to be sequence-like; hence my choice of namedtuple.

Lists are substantially faster, but constant keys are unmaintainable; I’d have to create a bunch of index-constants, i.e. KEY_1 = 1, KEY_2 = 2, etc. which is also not ideal.

Am I stuck with these choices, or is there an alternative that I’ve missed?

Asked By: DNS

||

Answers:

I would be tempted to either (a) invent some kind of workload specific caching, and offload the storage and retrieval of my data to a memcachedb-like process, to improve scalability rather than performance alone or (b) rewrite as a C extension, with native data storage. An ordered-dictionary type perhaps.

You could start with this:
http://www.xs4all.nl/~anthon/Python/ordereddict/

Answered By: Warren P

You can make your classes sequence like by adding __iter__, and __getitem__ methods, to make them sequence like (indexable and iterable.)

Would an OrderedDict work? There are several implementations available, and it is included in the Python31 collections module.

Answered By: mikerobi

A couple points and ideas:

  1. You’re timing accessing the same index many times in a row. Your actual program probably uses random or linear access, which will have different behavior. In particular, there will be more CPU cache misses. You might get slightly different results using your actual program.

  2. OrderedDictionary is written as a wrapper around dict, ergo it will be slower than dict. That’s a non-solution.

  3. Have you tried both new-style and old-style classes? (new-style classes inherit from object; old-style classes do not)

  4. Have you tried using psyco or Unladen Swallow? (2020 Update – these two projects are dead)

  5. Does your inner loop to modify the data or just access it? It might be possible to transform the data into the most efficient possible form before entering the loop, but use the most convenient form elsewhere in the program.

Answered By: Daniel Stutzbach

One thing to bear in mind is that namedtuples are optimised for access as tuples. If you change your accessor to be a[2] instead of a.c, you’ll see similar performance to the tuples. The reason is that the name accessors are effectively translating into calls to self[idx], so pay both the indexing and the name lookup price.

If your usage pattern is such that access by name is common, but access as tuple isn’t, you could write a quick equivalent to namedtuple that does things the opposite way: defers index lookups to access by-name. However, you’ll pay the price on the index lookups then. Eg here’s a quick implementation:

def makestruct(name, fields):
    fields = fields.split()
    import textwrap
    template = textwrap.dedent("""
    class {name}(object):
        __slots__ = {fields!r}
        def __init__(self, {args}):
            {self_fields} = {args}
        def __getitem__(self, idx): 
            return getattr(self, fields[idx])
    """).format(
        name=name,
        fields=fields,
        args=','.join(fields), 
        self_fields=','.join('self.' + f for f in fields))
    d = {'fields': fields}
    exec template in d
    return d[name]

But the timings are very bad when __getitem__ must be called:

namedtuple.a  :  0.473686933517 
namedtuple[0] :  0.180409193039
struct.a      :  0.180846214294
struct[0]     :  1.32191514969

ie, the same performance as a __slots__ class for attribute access (unsurprisingly – that’s what it is), but huge penalties due to the double lookup in index-based accesses. (Noteworthy is that __slots__ doesn’t actually help much speed-wise. It saves memory, but the access time is about the same without them.)

One third option would be to duplicate the data, eg. subclass from list and store the values both in the attributes and listdata. However you don’t actually get list-equivalent performance. There’s a big speed hit just in having subclassed (bringing in checks for pure-python overloads). Thus struct[0] still takes around 0.5s (compared with 0.18 for raw list) in this case, and you do double the memory usage, so this may not be worth it.

Answered By: Brian

This question is fairly old (internet-time), so I thought I’d try duplicating your test today, both with regular CPython (2.7.6), and with pypy (2.2.1) and see how the various methods compared. (I also added in an indexed lookup for the named tuple.)

This is a bit of a micro-benchmark, so YMMV, but pypy seemed to speed up named tuple access by a factor of 30 vs CPython (whereas dictionary access was only sped up by a factor of 3).

from collections import namedtuple

STest = namedtuple("TEST", "a b c")
a = STest(a=1,b=2,c=3)

class Test(object):
    __slots__ = ["a","b","c"]

    a=1
    b=2
    c=3

b = Test()

c = {'a':1, 'b':2, 'c':3}

d = (1,2,3)
e = [1,2,3]
f = (1,2,3)
g = [1,2,3]
key = 2

if __name__ == '__main__':
    from timeit import timeit

    print("Named tuple with a, b, c:")
    print(timeit("z = a.c", "from __main__ import a"))

    print("Named tuple, using index:")
    print(timeit("z = a[2]", "from __main__ import a"))

    print("Class using __slots__, with a, b, c:")
    print(timeit("z = b.c", "from __main__ import b"))

    print("Dictionary with keys a, b, c:")
    print(timeit("z = c['c']", "from __main__ import c"))

    print("Tuple with three values, using a constant key:")    
    print(timeit("z = d[2]", "from __main__ import d"))

    print("List with three values, using a constant key:")
    print(timeit("z = e[2]", "from __main__ import e"))

    print("Tuple with three values, using a local key:")
    print(timeit("z = d[key]", "from __main__ import d, key"))

    print("List with three values, using a local key:")
    print(timeit("z = e[key]", "from __main__ import e, key"))

Python Results:

Named tuple with a, b, c:
0.124072679784
Named tuple, using index:
0.0447055962367
Class using __slots__, with a, b, c:
0.0409136944224
Dictionary with keys a, b, c:
0.0412045334915
Tuple with three values, using a constant key:
0.0449477955531
List with three values, using a constant key:
0.0331083467148
Tuple with three values, using a local key:
0.0453569025139
List with three values, using a local key:
0.033030056702

PyPy Results:

Named tuple with a, b, c:
0.00444889068604
Named tuple, using index:
0.00265598297119
Class using __slots__, with a, b, c:
0.00208616256714
Dictionary with keys a, b, c:
0.013897895813
Tuple with three values, using a constant key:
0.00275301933289
List with three values, using a constant key:
0.002760887146
Tuple with three values, using a local key:
0.002769947052
List with three values, using a local key:
0.00278806686401
Answered By: Gerrat

This problem may be obsolete soon. CPython dev has evidently made significant improvements to the performance of accessing named tuple values by attribute name. The changes are scheduled for release in Python 3.8, near the end of Oct. 2019.

See: https://bugs.python.org/issue32492 and https://github.com/python/cpython/pull/10495.

Answered By: wst

Since this is an old question and we have newer data structures such as dataclasses available now, we should revisit this slightly 🙂

Tested on an AMD 5950x

Python 3.11:

test_slots                     0.082s
test_dataclass                 0.085s
test_dataclass_slots           0.086s
test_namedtuple_index          0.143s
test_dict                      0.144s
test_namedtuple_attr           0.169s
test_namedtuple_unpack         0.314s
test_enum_attr                 0.615s
test_enum_item                 1.082s
test_enum_call                 3.018s

Python 3.10:

test_dataclass_slots           0.155s
test_slots                     0.156s
test_dataclass                 0.177s
test_namedtuple_index          0.210s
test_dict                      0.214s
test_namedtuple_attr           0.261s
test_namedtuple_unpack         0.473s
test_enum_attr                 0.989s
test_enum_item                 1.790s
test_enum_call                 4.476s

From these results I would recommend using a dataclass for named access or tuples/namedtuples for index access.

The test code can be forked here: https://gist.github.com/WoLpH/02fae0b20b914354734aaac01c06d23b

import sys
import enum
import math
import random
import timeit
import typing
import dataclasses
import collections


repeat = 5
number = 1000
N = 5000


class PointTuple(typing.NamedTuple):
    x: int
    y: int
    z: int


@dataclasses.dataclass
class PointDataclass:
    x: int
    y: int
    z: int


@dataclasses.dataclass(slots=True)
class PointDataclassSlots:
    x: int
    y: int
    z: int


class PointObject:
    __slots__ = 'x', 'y', 'z'
    x: int
    y: int
    z: int


def test_namedtuple_attr():
    point = PointTuple(1234, 5678, 9012)

    for i in range(N):
        x, y, z = point.x, point.y, point.z


def test_namedtuple_index():
    point = PointTuple(1234, 5678, 9012)

    for i in range(N):
        x, y, z = point


def test_namedtuple_unpack():
    point = PointTuple(1234, 5678, 9012)

    for i in range(N):
        x, *y = point


def test_dataclass():
    point = PointDataclass(1234, 5678, 9012)

    for i in range(N):
        x, y, z = point.x, point.y, point.z


def test_dataclass_slots():
    point = PointDataclassSlots(1234, 5678, 9012)

    for i in range(N):
        x, y, z = point.x, point.y, point.z


def test_dict():
    point = dict(x=1234, y=5678, z=9012)

    for i in range(N):
        x, y, z = point['x'], point['y'], point['z']


def test_slots():
    point = PointObject()
    point.x = 1234
    point.y = 5678
    point.z = 9012

    for i in range(N):
        x, y, z = point.x, point.y, point.z


class PointEnum(enum.Enum):
    x = 1
    y = 2
    z = 3


def test_enum_attr():
    point = PointEnum

    for i in range(N):
        x, y, z = point.x, point.y, point.z

def test_enum_call():
    point = PointEnum

    for i in range(N):
        x, y, z = point(1), point(2), point(3)

def test_enum_item():
    point = PointEnum

    for i in range(N):
        x, y, z = point['x'], point['y'], point['z']

if __name__ == '__main__':
    tests = [
        test_namedtuple_attr,
        test_namedtuple_index,
        test_namedtuple_unpack,
        test_dataclass,
        test_dataclass_slots,
        test_dict,
        test_slots,
        test_enum_attr,
        test_enum_call,
        test_enum_item,
    ]

    print(f'Running tests {repeat} times with {number} calls.')
    print(f'Using {N} iterations in the loop')

    results = collections.defaultdict(lambda: math.inf)

    for i in range(repeat):
        # Shuffling tests to prevent skewed results due to CPU boosting or
        # thermal throttling
        random.shuffle(tests)

        print(f'Run {i}:', end=' ')
        for t in tests:
            name = t.__name__

            print(name, end=', ')
            sys.stdout.flush()

            timer = timeit.Timer(f'{name}()', f'from __main__ import {name}')
            results[name] = min(results[name], timer.timeit(number))

        print()

    for name, result in sorted(results.items(), key=lambda x: x[::-1]):
        print(f'{name:30} {result:.3f}s')
Answered By: Wolph