Annotating type with class and instance

Question:

I’m making a semi-singleton class Foo that can have (also semi-singleton) subclasses. The constructor takes one argument, let’s call it a slug, and each (sub)class is supposed to have at most one instance for each value of slug.

Let’s say I have a subclass of Foo called Bar. Here is an example of calls:

  1. Foo("a slug") -> returns a new instance of Foo, saved with key (Foo, "a slug").
  2. Foo("some new slug") -> returns a new instance Foo, saved with key (Foo, "some new slug").
  3. Foo("a slug") -> we have the same class and slug from step 1, so this returns the same instance that was returned in step 1.
  4. Bar("a slug") -> we have the same slug as before, but a different class, so this returns a new instance of Bar, saved with key (Bar, "a slug").
  5. Bar("a slug") -> this returns the same instance of Bar that we got in step 4.

I know how to implement this: a class dictionary mapping a tuple of type and str to the instance, overriding __new__, etc. Simple stuff.
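
For reference, a minimal untyped sketch of such an implementation (my actual code may differ in details):

class Foo:
    _instances = {}  # maps (class, slug) -> the unique instance

    def __new__(cls, slug):
        key = (cls, slug)
        if key not in Foo._instances:
            Foo._instances[key] = super().__new__(cls)
        return Foo._instances[key]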

My question is how to type annotate this dictionary?

What I tried to do was something like this:

from typing import Final, Type, TypeVar

FooSubtype = TypeVar("FooSubtype", bound="Foo")

class Foo:
    _instances: Final[dict[tuple[Type[FooSubtype], str], FooSubtype]] = dict()

So, the idea is "whatever type is in the first element of the key ("assigning" it to FooSubtype type variable), the value needs to be an instance of that same type".

This fails with Type variable "FooSubtype" is unbound, and I kinda see why.

I get the same error if I split it like this:

from typing import Final, Type, TypeAlias, TypeVar

FooSubtype = TypeVar("FooSubtype", bound="Foo")
InstancesKeyType: TypeAlias = tuple[Type[FooSubtype], str]

class Foo:
    _instances: Final[dict[InstancesKeyType, FooSubtype]] = dict()

The error points to the last line in this example, meaning it’s the value type, not the key one, that is the problem.

mypy also suggests using Generic, but I don’t see how to do it in this particular example, because the value’s type should somehow relate to the key’s type, not be a separate generic type.

This works:

class Foo:
    _instances: Final[dict[tuple[Type["Foo"], str], "Foo"]] = dict()

but it allows _instances[(Bar1, "x")] to be of type Bar2 (Bar1 and Bar2 here being different subclasses of Foo). It’s not a big problem and I’m OK with leaving it like this, but I’m wondering if there is a better (stricter) approach.
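
For example, mypy accepts this mismatch between the key’s class and the value (a quick sketch):

class Bar1(Foo): pass
class Bar2(Foo): pass

# Both the key's class and the value are typed as just "Foo", so this passes:
Foo._instances[(Bar1, "x")] = Bar2("x")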

Asked By: Vedran Šego

Answers:

This is a really great question. At first I looked through it and said "no, you can’t do this at all", because you can’t express any relation between a dict key and its value. But then I realised that your suggestion is almost possible to implement.

First, let’s define a protocol that describes your desired behavior:

from typing import TypeAlias, TypeVar, Protocol

_T = TypeVar("_T", bound="Foo")
# Avoid repetition, it's just a generic alias
_KeyT: TypeAlias = tuple[type[_T], str]  

class _CacheDict(Protocol):
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...
    def __delitem__(self, __key: _KeyT['Foo']) -> None: ...
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...

How does it work? It defines an arbitrary data structure with item access such that cache_dict[(Foo1, 'foo')] resolves to type Foo1. It looks very much like a sub-part of dict (or of collections.abc.MutableMapping), but with slightly different typing. Dunder argument names are almost equivalent to positional-only arguments (with /). If you need other methods (e.g. get or pop), add them to this definition as well (you may want to use overload). You’ll almost certainly need __contains__, which should have the same signature as __delitem__ (but returning bool).
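
For instance, __contains__ and pop might be sketched like this (my additions, reusing _T and _KeyT from above; dict.pop also has a default-taking overload that is omitted here):

class _CacheDict(Protocol):
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...
    def __delitem__(self, __key: _KeyT['Foo']) -> None: ...
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...
    # Membership tests take any (type[Foo], str) key and return bool:
    def __contains__(self, __key: _KeyT['Foo']) -> bool: ...
    # No-default pop: removes and returns the instance stored for this key:
    def pop(self, __key: _KeyT[_T]) -> _T: ...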

So, now

from typing import Final, cast

class Foo:
    _instances: Final[_CacheDict] = cast(_CacheDict, dict())

class Foo1(Foo): pass
class Foo2(Foo): pass

reveal_type(Foo._instances[(Foo, 'foo')])  # N: Revealed type is "__main__.Foo"
reveal_type(Foo._instances[(Foo1, 'foo')])  # N: Revealed type is "__main__.Foo1"

Wow, we have properly inferred value types! We cast the dict to the desired type, because our typing differs from the regular dict definitions.

It still has a problem: you can do

Foo._instances[(Foo1, 'foo')] = Foo2()

because _T just resolves to Foo here. However, this problem is completely unavoidable: even if we had some infer keyword or an Infer special form to spell def __setitem__(self, __key: _KeyT[Infer[_T]], __value: _T) -> None, it wouldn’t work properly:

foo1_t: type[Foo] = Foo1  # Ok, upcasting
foo2: Foo = Foo2()  # Ok again
Foo._instances[(foo1_t, 'foo')] = foo2  # Ouch, still allowed, _T is Foo again

Note that we don’t use any casts above, so this code is type-safe, but certainly conflicts with our intent.

So we probably have to live with __setitem__’s looseness, but at least we get proper types from item access.

Finally, the class is not generic in _T, because otherwise all values would be inferred as the declared type parameter instead of _T being resolved per call (you can try using Protocol[_T] as a base class and watch what happens; it’s a good exercise for a deeper understanding of mypy’s approach to type inference).
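
To see that effect concretely, here is a sketch of the generic variant (my illustration, not part of the solution): with _T fixed when the attribute is declared, every lookup collapses to that declared parameter.

from typing import Final, Protocol, cast

class _GenericCacheDict(Protocol[_T]):
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...

# _T must be chosen once, at declaration time...
_generic: Final[_GenericCacheDict[Foo]] = cast(_GenericCacheDict[Foo], dict())
# ...so every access is inferred as Foo, never Foo1:
reveal_type(_generic[(Foo1, 'foo')])  # N: Revealed type is "__main__.Foo"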

Here’s a link to a playground with the full code.

Also, you can subclass MutableMapping[_KeyT['Foo'], 'Foo'] to get more methods instead of defining them manually. It will deal with __delitem__ and __contains__ out of the box, but __setitem__ and __getitem__ still need your own declarations.

Here’s an alternative solution with MutableMapping and get (because get was tricky and fun to implement) (playground):

from collections.abc import MutableMapping
from abc import abstractmethod
from typing import TypeAlias, TypeVar, Final, cast, overload


_T = TypeVar("_T", bound="Foo")
_Q = TypeVar("_Q")
_KeyT: TypeAlias = tuple[type[_T], str]

class _CacheDict(MutableMapping[_KeyT['Foo'], 'Foo']):
    @abstractmethod
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...
    @abstractmethod
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...
    
    @overload  # No-default version
    @abstractmethod
    def get(self, __key: _KeyT[_T]) -> _T | None: ...
    
    # Oops, a `mypy` bug: try replacing this with `__default: _T | _Q`
    # and check Foo._instances.get((Foo1, 'foo'), Foo2()).
    # The type should get broader, but instead it resolves to the more
    # specific type in the wrong way.
    @overload  # Some default
    @abstractmethod
    def get(self, __key: _KeyT[_T], __default: _Q) -> _T | _Q: ...
    
    # Need this because of https://github.com/python/mypy/issues/11488
    @abstractmethod
    def get(self, __key: _KeyT[_T], __default: object = None) -> _T | object: ...


class Foo:
    _instances: Final[_CacheDict] = cast(_CacheDict, dict())

class Foo1(Foo): pass
class Foo2(Foo): pass

reveal_type(Foo._instances)
reveal_type(Foo._instances[(Foo, 'foo')])  # N: Revealed type is "__main__.Foo"
reveal_type(Foo._instances[(Foo1, 'foo')])  # N: Revealed type is "__main__.Foo1"
reveal_type(Foo._instances.get((Foo, 'foo')))  # N: Revealed type is "Union[__main__.Foo, None]"
reveal_type(Foo._instances.get((Foo1, 'foo')))  # N: Revealed type is "Union[__main__.Foo1, None]"
reveal_type(Foo._instances.get((Foo1, 'foo'), Foo1()))  # N: Revealed type is "__main__.Foo1"
reveal_type(Foo._instances.get((Foo1, 'foo'), Foo2()))  # N: Revealed type is "Union[__main__.Foo1, __main__.Foo2]"
(Foo1, 'foo') in Foo._instances  # We get this for free

Foo._instances[(Foo1, 'foo')] = Foo1()
Foo._instances[(Foo1, 'foo')] = object()  # E: Value of type variable "_T" of "__setitem__" of "_CacheDict" cannot be "object"  [type-var]

Note that we don’t use a Protocol now (because that would require MutableMapping to be a protocol as well) and use abstract methods instead.
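
Here’s a quick illustration of what mypy says if you do try to mix the two (my sketch, reusing _KeyT from above):

from collections.abc import MutableMapping
from typing import Protocol

# A protocol can't extend a non-protocol class:
class _ProtoCacheDict(MutableMapping[_KeyT['Foo'], 'Foo'], Protocol):
    ...
# E: All bases of a protocol must be protocols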

Trick, don’t use it!

When I was writing this answer, I discovered a mypy bug that you can abuse in a very interesting way here. We started with something like this, right?

from collections.abc import MutableMapping
from abc import abstractmethod
from typing import TypeAlias, TypeVar, Final, cast


_T = TypeVar("_T", bound="Foo")
_Q = TypeVar("_Q")
_KeyT: TypeAlias = tuple[type[_T], str]

class _CacheDict(MutableMapping[_KeyT['Foo'], 'Foo']):
    @abstractmethod
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...
    @abstractmethod
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...
    
class Foo:
    _instances: Final[_CacheDict] = cast(_CacheDict, dict())

class Foo1(Foo): pass
class Foo2(Foo): pass

Foo._instances[(Foo1, 'foo')] = Foo1()
Foo._instances[(Foo1, 'foo')] = Foo2()

Now let’s change the __setitem__ signature to a very weird thing. Warning: this is a bug, don’t rely on this behavior! If we type __value as _T | _Q, we magically get "proper" typing with strict narrowing to the type from the first argument.

    @abstractmethod
    def __setitem__(self, __key: _KeyT[_T], __value: _T | _Q) -> None: ...

Now:

Foo._instances[(Foo1, 'foo')] = Foo1()  # Ok
Foo._instances[(Foo1, 'foo')] = Foo2()  # E: Incompatible types in assignment (expression has type "Foo2", target has type "Foo1")  [assignment]

It is simply wrong, because the _Q part of the union can resolve to anything and is not actually used (moreover, it shouldn’t be a type variable at all, because it appears only once in the definition).

Also, this allows another invalid assignment, where the right-hand side is not a Foo subclass at all:

Foo._instances[(Foo1, 'foo')] = object()  # passes

I’ll report this soon and link the issue to this question.

Answered By: SUTerliakov