Annotating type with class and instance

Question:

I'm making a semi-singleton class `Foo` that can have (also semi-singleton) subclasses. The constructor takes one argument, let's call it a `slug`, and each (sub)class is supposed to have at most one instance for each value of `slug`.

Let's say I have a subclass of `Foo` called `Bar`. Here is an example of calls:
1. `Foo("a slug")` -> returns a new instance of `Foo`, saved with key `(Foo, "a slug")`.
2. `Foo("some new slug")` -> returns a new instance of `Foo`, saved with key `(Foo, "some new slug")`.
3. `Foo("a slug")` -> we have the same class and `slug` as in step 1, so this returns the same instance that was returned in step 1.
4. `Bar("a slug")` -> we have the same `slug` as before, but a different class, so this returns a new instance of `Bar`, saved with key `(Bar, "a slug")`.
5. `Bar("a slug")` -> this returns the same instance of `Bar` that we got in step 4.
I know how to implement this: a class dictionary associating a tuple of `type` and `str` to an instance, an overridden `__new__`, etc. Simple stuff.
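For reference, the runtime side can be sketched like this (a simplified, unannotated version of what I mean):

```python
class Foo:
    # Maps (class, slug) pairs to the unique instance for that pair.
    _instances = {}

    def __new__(cls, slug):
        key = (cls, slug)
        if key not in Foo._instances:
            # No cached instance for this (class, slug) pair yet: create one.
            Foo._instances[key] = super().__new__(cls)
        return Foo._instances[key]

    def __init__(self, slug):
        self.slug = slug

class Bar(Foo):
    pass
```

With this, `Foo("a slug") is Foo("a slug")` holds, while `Bar("a slug")` gets its own cached instance; note that `__init__` still runs on every call, which is harmless here because it only re-assigns the same slug.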
My question is: how do I type annotate this dictionary?
What I tried to do was something like this:
```python
from typing import Final, Type, TypeVar

FooSubtype = TypeVar("FooSubtype", bound="Foo")

class Foo:
    _instances: Final[dict[tuple[Type[FooSubtype], str], FooSubtype]] = dict()
```
So, the idea is: "whatever type is in the first element of the key ('assigning' it to the `FooSubtype` type variable), the value needs to be an instance of that same type".
This fails with `Type variable "FooSubtype" is unbound`, and I kinda see why.
I get the same error if I split it like this:
```python
from typing import Final, Type, TypeAlias, TypeVar

FooSubtype = TypeVar("FooSubtype", bound="Foo")
InstancesKeyType: TypeAlias = tuple[Type[FooSubtype], str]

class Foo:
    _instances: Final[dict[InstancesKeyType, FooSubtype]] = dict()
```
The error points to the last line in this example, meaning it's the value type, not the key type, that is the problem.
`mypy` also suggests using `Generic`, but I don't see how to apply it in this particular example, because the value's type should somehow relate to the key's type, not be a separate generic type.
This works:
```python
class Foo:
    _instances: Final[dict[tuple[Type["Foo"], str], "Foo"]] = dict()
```
but it allows `_instances[(Bar1, "x")]` to be of type `Bar2` (`Bar1` and `Bar2` here being different subclasses of `Foo`). It's not a big problem and I'm OK with leaving it like this, but I'm wondering if there is a better (stricter) approach.
Answers:
This is a really great question. At first I looked through it and said "no, you can't do this at all", because you can't express any relation between a dict key and value. However, I then realised that your suggestion is almost possible to implement.
First, let's define a protocol that describes your desired behavior:
```python
from typing import Final, Protocol, TypeAlias, TypeVar, cast

_T = TypeVar("_T", bound="Foo")

# Avoid repetition, it's just a generic alias
_KeyT: TypeAlias = tuple[type[_T], str]

class _CacheDict(Protocol):
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...
    def __delitem__(self, __key: _KeyT['Foo']) -> None: ...
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...
```
How does it work? It defines an arbitrary data structure with item access, such that `cache_dict[(Foo1, 'foo')]` resolves to type `Foo1`. It looks very much like a sub-part of `dict` (or `collections.abc.MutableMapping`), but with slightly different typing. Dunder argument names are almost equivalent to positional-only arguments (with `/`). If you need other methods (e.g. `get` or `pop`), add them to this definition as well (you may want to use `overload`). You'll almost certainly need `__contains__`, which should have the same signature as `__delitem__`.
So, now:

```python
class Foo:
    _instances: Final[_CacheDict] = cast(_CacheDict, dict())

class Foo1(Foo): pass
class Foo2(Foo): pass

reveal_type(Foo._instances[(Foo, 'foo')])   # N: Revealed type is "__main__.Foo"
reveal_type(Foo._instances[(Foo1, 'foo')])  # N: Revealed type is "__main__.Foo1"
```
Wow, we have properly inferred value types! We cast `dict` to the desired type, because our typing differs from the `dict` definitions.
It still has a problem: you can do

```python
Foo._instances[(Foo1, 'foo')] = Foo2()
```
because `_T` just resolves to `Foo` here. However, this problem is completely unavoidable: even if we had some `infer` keyword or `Infer` special form to spell `def __setitem__(self, __key: _KeyT[Infer[_T]], __value: _T) -> None`, it wouldn't work properly:
```python
foo1_t: type[Foo] = Foo1  # Ok, upcasting
foo2: Foo = Foo2()        # Ok again
Foo._instances[(foo1_t, 'foo')] = foo2  # Ouch, still allowed, _T is Foo again
```
Note that we don't use any casts above, so this code is type-safe, but it certainly conflicts with our intent.
So, we probably have to live with `__setitem__`'s unstrictness, but at least we get proper types from item access.
Finally, the class is not generic in `_T`, because otherwise all values would be inferred as the declared type instead of being function-scoped (you can try using `Protocol[_T]` as a base class and watch what happens; it's pretty instructive for a deeper understanding of `mypy`'s approach to type inference).
Here's a link to the playground with the full code.
Also, you can subclass `MutableMapping[_KeyT['Foo'], 'Foo']` to get more methods instead of defining them manually. It will deal with `__delitem__` and `__contains__` out of the box, but `__setitem__` and `__getitem__` still need your implementation.
Here's an alternative solution with `MutableMapping` and `get` (because `get` was tricky and fun to implement) (playground):
```python
from abc import abstractmethod
from collections.abc import MutableMapping
from typing import Final, TypeAlias, TypeVar, cast, overload

_T = TypeVar("_T", bound="Foo")
_Q = TypeVar("_Q")

_KeyT: TypeAlias = tuple[type[_T], str]

class _CacheDict(MutableMapping[_KeyT['Foo'], 'Foo']):
    @abstractmethod
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...

    @abstractmethod
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...

    @overload  # No-default version
    @abstractmethod
    def get(self, __key: _KeyT[_T]) -> _T | None: ...

    # Oops, a `mypy` bug: try to replace with `__default: _T | _Q`
    # and check Foo._instances.get((Foo1, 'foo'), Foo2()).
    # The type gets broader, but resolves to a more specific one in a wrong way.
    @overload  # Some default
    @abstractmethod
    def get(self, __key: _KeyT[_T], __default: _Q) -> _T | _Q: ...

    # Need this because of https://github.com/python/mypy/issues/11488
    @abstractmethod
    def get(self, __key: _KeyT[_T], __default: object = None) -> _T | object: ...

class Foo:
    _instances: Final[_CacheDict] = cast(_CacheDict, dict())

class Foo1(Foo): pass
class Foo2(Foo): pass

reveal_type(Foo._instances)
reveal_type(Foo._instances[(Foo, 'foo')])   # N: Revealed type is "__main__.Foo"
reveal_type(Foo._instances[(Foo1, 'foo')])  # N: Revealed type is "__main__.Foo1"
reveal_type(Foo._instances.get((Foo, 'foo')))   # N: Revealed type is "Union[__main__.Foo, None]"
reveal_type(Foo._instances.get((Foo1, 'foo')))  # N: Revealed type is "Union[__main__.Foo1, None]"
reveal_type(Foo._instances.get((Foo1, 'foo'), Foo1()))  # N: Revealed type is "__main__.Foo1"
reveal_type(Foo._instances.get((Foo1, 'foo'), Foo2()))  # N: Revealed type is "Union[__main__.Foo1, __main__.Foo2]"

(Foo1, 'foo') in Foo._instances  # We get this for free
Foo._instances[(Foo1, 'foo')] = Foo1()
Foo._instances[(Foo1, 'foo')] = object()  # E: Value of type variable "_T" of "__setitem__" of "_CacheDict" cannot be "object"  [type-var]
```
Note that we don't use a `Protocol` now (because that would require `MutableMapping` to be a protocol as well) and use abstract methods instead.
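If you'd rather not cast a plain `dict`, the runtime side can be a concrete `MutableMapping` subclass delegating to an internal dict. A minimal untyped sketch (the name `CacheDict` is illustrative; the interesting typing lives in the abstract signatures, which add nothing at runtime):

```python
from collections.abc import Iterator, MutableMapping

class Foo:
    pass

class Foo1(Foo):
    pass

# A concrete cache backed by a plain dict. MutableMapping supplies
# get, pop, __contains__, etc. as mixin methods for free.
class CacheDict(MutableMapping):
    def __init__(self) -> None:
        self._data: dict = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value) -> None:
        self._data[key] = value

    def __delitem__(self, key) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

cache = CacheDict()
cache[(Foo1, "foo")] = Foo1()
```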
Trick, don't use it!

When I was writing this answer, I discovered a `mypy` bug that you can abuse in a very interesting way here. We started with something like this, right?
```python
from abc import abstractmethod
from collections.abc import MutableMapping
from typing import Final, TypeAlias, TypeVar, cast

_T = TypeVar("_T", bound="Foo")
_Q = TypeVar("_Q")

_KeyT: TypeAlias = tuple[type[_T], str]

class _CacheDict(MutableMapping[_KeyT['Foo'], 'Foo']):
    @abstractmethod
    def __getitem__(self, __key: _KeyT[_T]) -> _T: ...

    @abstractmethod
    def __setitem__(self, __key: _KeyT[_T], __value: _T) -> None: ...

class Foo:
    _instances: Final[_CacheDict] = cast(_CacheDict, dict())

class Foo1(Foo): pass
class Foo2(Foo): pass

Foo._instances[(Foo1, 'foo')] = Foo1()
Foo._instances[(Foo1, 'foo')] = Foo2()
```
Now let's change the `__setitem__` signature to a very weird thing. Warning: this is a bug, don't rely on this behavior! If we type `__value` as `_T | _Q`, we magically get "proper" typing with strict narrowing to the type of the first argument.

```python
    @abstractmethod
    def __setitem__(self, __key: _KeyT[_T], __value: _T | _Q) -> None: ...
```

Now:
```python
Foo._instances[(Foo1, 'foo')] = Foo1()  # Ok
Foo._instances[(Foo1, 'foo')] = Foo2()  # E: Incompatible types in assignment (expression has type "Foo2", target has type "Foo1")  [assignment]
```

This is simply wrong, because the `_Q` part of the union can resolve to anything and is in fact unused (moreover, it shouldn't be a type variable at all, because it appears only once in the definition).
Also, this allows another invalid assignment, where the right side is not a `Foo` subclass at all:

```python
Foo._instances[(Foo1, 'foo')] = object()  # passes
```

I'll report this soon and link the issue to this question.