Store the order of arguments given to dataclass initializer

Question:

Using the Python dataclass decorator generates signatures with arguments in a particular order:

from dataclasses import dataclass
from inspect import signature

@dataclass
class Person:
    age: int
    name: str = 'John'

print(signature(Person))

Gives (age: int, name: str = 'John') -> None.

Is there a way to capture the order of arguments given when Person is instantiated? That is: Person(name='Jack', age=10) -> ('name', 'age'). I’m at a loss because writing an __init__ method on Person defeats most reasons for using the dataclass decorator. I don’t want to lose the type hints you get when creating a Person, but I need to serialize the instance to JSON with keys in the order used at initialization.

Asked By: Ian

||

Answers:

I think this is a good use case for writing your own __init__, and I don’t think that defeats the point of using a dataclass at all. You still get a nice __repr__ (which str() falls back on), __eq__, and (if frozen) __hash__. You also get pattern matching for free.

A double-star kwargs parameter is a Python dictionary, and Python dictionaries remember the order in which keys are inserted, so it records the order in which the caller passed the keyword arguments. Consider something like this.

from __future__ import annotations

from dataclasses import dataclass, InitVar

@dataclass(init=False)
class Person:
    age: int
    name: str = 'John'

    # Declared as InitVar so it is not treated as a field (excluded from repr/eq).
    original_ctor_args: InitVar[list[str]]

    def __init__(self, **kwargs):
        self.age = kwargs['age']
        self.name = kwargs.get('name', 'John')

        self.original_ctor_args = list(kwargs)

print(Person(age=10, name = 'Joe').original_ctor_args) # Prints ['age', 'name']
print(Person(name = 'Alice', age=15).original_ctor_args) # Prints ['name', 'age']
print(Person(age=20).original_ctor_args) # Prints ['age']
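
A variation on the same idea, sketched here as an assumption rather than as part of the code above: instead of replacing __init__, wrap the one the dataclass generates. functools.wraps should keep the generated signature visible to inspect.signature, and the recorded order can then drive the JSON serialization the question asks about. The names remember_init_order, _init_order and to_json are hypothetical.

```python
import functools
import json
from dataclasses import asdict, dataclass, fields


def remember_init_order(cls):
    """Hypothetical class decorator: wrap the generated __init__ and record
    the order in which the caller supplied the arguments."""
    generated_init = cls.__init__
    init_field_names = [f.name for f in fields(cls) if f.init]

    @functools.wraps(generated_init)  # exposes the generated signature
    def __init__(self, *args, **kwargs):
        generated_init(self, *args, **kwargs)
        # Positional arguments follow field order; keyword arguments arrive
        # in call order because **kwargs preserves it.
        self._init_order = init_field_names[:len(args)] + list(kwargs)

    cls.__init__ = __init__
    return cls


@remember_init_order
@dataclass
class Person:
    age: int
    name: str = 'John'

    def to_json(self):
        data = asdict(self)
        # Arguments that were passed come first, in call order;
        # fields left at their defaults are appended afterwards.
        ordered = {key: data[key] for key in self._init_order}
        ordered.update(data)
        return json.dumps(ordered)


print(Person(name='Jack', age=10).to_json())
print(Person(age=20).to_json())
```

This sketch assumes a non-frozen dataclass, since it assigns _init_order with a plain attribute assignment.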
Answered By: Silvio Mayolo

Your question can be broken down into two parts: first, how to get the keyword arguments in the order the caller passes them, and second, how to modify dataclasses so that the __init__ method of the decorated class keeps track of that order.

To obtain the order of the keyword arguments specified by the caller, you can inspect the frame info of the caller like this:

import inspect

def foo(a=2, b=3):
    print(inspect.stack()[1][4][0], end='')

foo(b=4, a=3)

This would output the line containing the call to foo:

foo(b=4, a=3)

which you can then parse to get the order of the keyword arguments.
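
As a minimal sketch of that parsing step, the ast module can pull the keyword names out of a line of source. The helper name keyword_order is hypothetical, and the sketch assumes the line parses on its own once leading indentation is stripped.

```python
import ast


def keyword_order(call_source):
    """Return the keyword-argument names of the outermost call, in call order."""
    tree = ast.parse(call_source.strip())
    # ast.walk is breadth-first, so the first Call found is the outermost one.
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # keyword.arg is None for **mapping unpacking, so skip those.
            return [kw.arg for kw in node.keywords if kw.arg is not None]
    return []


print(keyword_order('foo(b=4, a=3)\n'))
```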

This method has some severe limitations, however: only the first line of the call is included, and there is no way to tell which of several calls to the same function on that line is the one currently running. So the following code:

foo(a=3, b=4), foo(b=4,       
a=3), 2 # the second call to foo spans over two lines

would output:

foo(a=3, b=4), foo(b=4,       
foo(a=3, b=4), foo(b=4, 

which is downright unusable.

Fortunately, as of Python 3.11, inspect.FrameInfo has a new attribute named positions, which contains the exact starting and ending line numbers of a call as well as the exact starting and ending column offsets, so you can now extract the exact source text of the call that is currently running:

import inspect
from itertools import islice

def foo(a=2, b=3):
    frame_info = inspect.stack()[1]
    positions = frame_info.positions
    with open(frame_info.filename) as file:
        lines = list(islice(file, positions.lineno - 1, positions.end_lineno))
        # Trim the end first so that a call on a single line is sliced correctly.
        lines[-1] = lines[-1][:positions.end_col_offset]
        lines[0] = lines[0][positions.col_offset:]
        print(''.join(lines))

1, foo(b=4,
a=3), 2

And this outputs:

foo(b=4,
a=3)

Great! Now you can use ast.parse to parse the above string and get the list of keyword arguments used to make the call.
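
The slicing logic that produces that string can be isolated into a small pure function, which makes it easy to check without inspecting a live frame. This is only a sketch: extract_span is a hypothetical name, and it assumes the offsets index characters within the line, which holds for ASCII source.

```python
def extract_span(lines, col_offset, end_col_offset):
    """Return the text from col_offset on the first line up to
    end_col_offset on the last line of `lines`."""
    lines = list(lines)  # work on a copy
    # Trim the end first: for a single-line call the first and last line are
    # the same string, and end_col_offset refers to the untrimmed line.
    lines[-1] = lines[-1][:end_col_offset]
    lines[0] = lines[0][col_offset:]
    return ''.join(lines)


single = extract_span(['1, foo(b=4, a=3), 2\n'], 3, 16)
multi = extract_span(['1, foo(b=4,\n', 'a=3), 2\n'], 3, 4)
print(single)
print(multi)
```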

Because dicts are guaranteed to preserve insertion order as of Python 3.7, you can make the __init__ method of your data class reorder the __dict__ attribute according to the keyword order above. To do that, create a dict that maps argument names to their indices in the keyword argument list:

indices = next({keyword.arg: index for index, keyword in enumerate(node.keywords)}
    for node in ast.walk(ast.parse(''.join(lines))) if isinstance(node, ast.Call))
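
To illustrate how such an index mapping reorders a dict, here is a small standalone sketch. The reorder helper and the sample data are made up for illustration. Keys missing from the mapping get -1 and therefore sort first, keeping their relative order because sorted is stable.

```python
def reorder(attrs, indices):
    """Rebuild `attrs` with keys ordered by their position in `indices`."""
    return dict(sorted(attrs.items(), key=lambda item: indices.get(item[0], -1)))


attrs = {'age': 10, 'name': 'Jack', 'city': 'Oslo'}
indices = {'name': 0, 'age': 1}  # order in which the keywords were passed
reordered = reorder(attrs, indices)
print(list(reordered))
```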

Now, reading the source code of dataclasses.py, you’ll find that it creates the __init__ method for your data class by calling the private helper _create_fn with the body of the method stored as a list of strings. So you can create a wrapper of _create_fn that adds the keyword order detection code and the post-initialization sorting code around the original body, and then patch _create_fn with this wrapper function. Because _create_fn is an internal detail, this approach depends on the Python version in use.

The resulting code looks like this:

import dataclasses
from itertools import chain

def _create_fn(name, args, body, _orig_create_fn=dataclasses._create_fn, **kwargs):
    return _orig_create_fn(
        name,
        args,
        chain(
'''
import ast
import inspect
from itertools import islice

frame_info = inspect.stack()[1]
positions = frame_info.positions
with open(frame_info.filename) as file:
    lines = list(islice(file, positions.lineno - 1, positions.end_lineno))
    lines[-1] = lines[-1][:positions.end_col_offset]
    lines[0] = lines[0][positions.col_offset:]
    indices = next({keyword.arg: index for index, keyword in enumerate(node.keywords)}
        for node in ast.walk(ast.parse(''.join(lines))) if isinstance(node, ast.Call))
'''.splitlines(),
            body,
            [
'self.__dict__ = dict(sorted(self.__dict__.items(), key=lambda t: indices.get(t[0], -1)))'
            ]
        ) if name == '__init__' else body,
        **kwargs
    )
dataclasses._create_fn = _create_fn

so that:

import inspect
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    name: str = 'John'

print(inspect.signature(Person))
print(Person(name='Jack', age=10).__dict__)

would output:

(age: int, name: str = 'John') -> None
{'name': 'Jack', 'age': 10}

The order of the keys of the __dict__ attribute, as you can see above, follows the order of the keywords in the call rather than the order of their definition in the data class.

Feel free to backport this to Python 3.10 and earlier versions using the inspect.stack()[1][4][0] method if your actual use case doesn’t involve calls spanning multiple lines or multiple calls on the same line.

Answered By: blhsing