How does one ignore extra arguments passed to a dataclass?

Question:

I’d like to create a config dataclass in order to simplify whitelisting of and access to specific environment variables (typing os.environ['VAR_NAME'] is tedious relative to config.VAR_NAME). I therefore need to ignore unused environment variables in my dataclass’s __init__ function, but I don’t know how to extract the default __init__ in order to wrap it with, e.g., a function that also includes **_ as one of the arguments.

import os
from dataclasses import dataclass

@dataclass
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

config = Config(**os.environ)

Running this gives me TypeError: __init__() got an unexpected keyword argument 'SOME_DEFAULT_ENV_VAR'.

Asked By: Californian

||

Answers:

I would just provide an explicit __init__ instead of using the autogenerated one. The body of the loop sets only recognized values, ignoring unexpected ones.

Note that this won’t complain about missing values without defaults until later, though.

import dataclasses
from dataclasses import dataclass

@dataclass(init=False)
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

    def __init__(self, **kwargs):
        # Keep only keyword arguments that match a declared field.
        names = {f.name for f in dataclasses.fields(self)}
        for k, v in kwargs.items():
            if k in names:
                setattr(self, k, v)
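
To illustrate the caveat about missing values, here is a minimal sketch that reuses the class above. The keyword OTHER is a made-up stand-in for an unrelated environment variable. Construction succeeds even though VAR_NAME_2 was never supplied, and the problem only shows up when the attribute is first read.

```python
import dataclasses
from dataclasses import dataclass

@dataclass(init=False)
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

    def __init__(self, **kwargs):
        names = {f.name for f in dataclasses.fields(self)}
        for k, v in kwargs.items():
            if k in names:
                setattr(self, k, v)

# VAR_NAME_2 is missing and OTHER (hypothetical) is ignored;
# no error is raised at construction time.
config = Config(VAR_NAME_1="a", OTHER="x")

try:
    config.VAR_NAME_2
    missing_detected = False
except AttributeError:
    # The missing value only surfaces here, on first access.
    missing_detected = True
```

If failing early matters, the filtered-dictionary approach is probably the safer choice, since the generated __init__ raises a TypeError for missing required arguments.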

Alternatively, you can pass a filtered environment to the default Config.__init__.

field_names = {f.name for f in dataclasses.fields(Config)}
c = Config(**{k: v for k, v in os.environ.items() if k in field_names})
Answered By: chepner

Cleaning the argument list before passing it to the constructor is probably the best way to go about it. I’d advise against writing your own __init__ function though, since the dataclass’s generated __init__ does a couple of other convenient things that you’ll lose by overriding it, such as applying field defaults and calling __post_init__.
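
As a rough sketch of what gets lost, compare a generated __init__ with a hand-written one. The class and field names here (Generated, HandWritten, host, port, tags, url) are invented for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Generated:
    host: str
    port: int = 8080
    tags: list = field(default_factory=list)

    def __post_init__(self):
        self.url = f"{self.host}:{self.port}"

@dataclass(init=False)
class HandWritten:
    host: str
    port: int = 8080
    tags: list = field(default_factory=list)

    def __init__(self, **kwargs):
        # Only sets recognized names; nothing else happens here.
        for k, v in kwargs.items():
            if k in self.__dataclass_fields__:
                setattr(self, k, v)

    def __post_init__(self):
        self.url = f"{self.host}:{self.port}"

g = Generated(host="localhost")
h = HandWritten(host="localhost")

# g has port, a fresh tags list, and url from __post_init__.
# h.port still resolves through the class attribute, but
# h.tags and h.url are never set, because default_factory and
# __post_init__ are handled by the generated __init__.
```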

Also, since the argument-cleaning logic is very tightly bound to the behavior of the class and returns an instance, it might make sense to put it into a classmethod:

from dataclasses import dataclass
import inspect

@dataclass
class Config:
    var_1: str
    var_2: str

    @classmethod
    def from_dict(cls, env):
        # Look up the accepted __init__ parameters once, not per key.
        params = inspect.signature(cls).parameters
        return cls(**{
            k: v for k, v in env.items()
            if k in params
        })


# usage:
params = {'var_1': 'a', 'var_2': 'b', 'var_3': 'c'}
c = Config.from_dict(params)   # works without raising a TypeError 
print(c)
# prints: Config(var_1='a', var_2='b')
Answered By: Arne

I combined both answers: filter the dictionary, then call the generated __init__ rather than looping over setattr, which can hurt performance. Naturally, if the dictionary may lack some of the dataclass’s fields, you’ll need to give those fields defaults.

from __future__ import annotations
from dataclasses import field, fields, dataclass

@dataclass()
class Record:
    name: str
    address: str
    zip: str | None = None  # won't fail if dictionary doesn't have a zip key

    @classmethod
    def create_from_dict(cls, dict_) -> Record:
        class_fields = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in dict_.items() if k in class_fields})
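
One caveat when filtering on fields(): it also lists fields declared with init=False, which the generated __init__ does not accept. The sketch below uses a hypothetical Job class to show the difference. Filtering on f.init, or using inspect.signature as in the earlier answer, avoids passing such a name by mistake.

```python
from dataclasses import dataclass, field, fields
import inspect

@dataclass
class Job:  # hypothetical example class
    name: str
    # Excluded from __init__; computed in __post_init__ instead.
    slug: str = field(init=False)

    def __post_init__(self):
        self.slug = self.name.lower().replace(" ", "-")

data = {"name": "Nightly Build", "slug": "ignored", "extra": 1}

# fields() lists every field, including those with init=False.
all_fields = {f.name for f in fields(Job)}
# Checking f.init keeps only names that __init__ accepts.
init_fields = {f.name for f in fields(Job) if f.init}
# inspect.signature reflects the generated __init__ directly.
sig_params = set(inspect.signature(Job).parameters)

# Filtering on all_fields would pass slug and raise a TypeError.
job = Job(**{k: v for k, v in data.items() if k in init_fields})
```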
Answered By: Doug

The dacite Python library populates a dataclass from a dictionary of values and, by default, ignores extra keys present in the dictionary (in addition to the other benefits the library provides).

from dataclasses import dataclass
from dacite import from_dict


@dataclass
class User:
    name: str
    age: int
    is_active: bool


data = {
    'name': 'John',
    'age': 30,
    'is_active': True,
    "extra_1": 1000,
    "extra_2": "some value"
}

user = from_dict(data_class=User, data=data)
print(user)
# prints the following: User(name='John', age=30, is_active=True)
Answered By: Iswariya Manivannan

I did this based on previous answers:

import functools
import inspect

@functools.cache
def get_dataclass_parameters(cls: type):
    return inspect.signature(cls).parameters


def instantiate_dataclass_from_dict(cls: type, dic: dict):
    parameters = get_dataclass_parameters(cls)
    dic = {k: v for k, v in dic.items() if k in parameters}
    return cls(**dic)

Since inspect.signature(cls).parameters takes much more time than the actual instantiation / initialization, I use functools.cache to cache the result for each class.
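
A small usage sketch, assuming Python 3.9+ for functools.cache and a made-up Point class. The cache statistics show that the signature is inspected only once, however many instances are created.

```python
import functools
import inspect
from dataclasses import dataclass

@functools.cache  # available from Python 3.9
def get_dataclass_parameters(cls: type):
    return inspect.signature(cls).parameters

def instantiate_dataclass_from_dict(cls: type, dic: dict):
    parameters = get_dataclass_parameters(cls)
    return cls(**{k: v for k, v in dic.items() if k in parameters})

@dataclass
class Point:  # hypothetical example class
    x: int
    y: int

# The extra key "z" is dropped on every call.
points = [
    instantiate_dataclass_from_dict(Point, {"x": i, "y": i, "z": 0})
    for i in range(3)
]

# One miss for the first lookup, then a hit for each later call.
info = get_dataclass_parameters.cache_info()
```

On older Python versions, functools.lru_cache(maxsize=None) should behave the same way.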

Answered By: Frozen Flame