Difference between collections.abc.Sequence and typing.Sequence

Question:

I was reading an article about the collections.abc and typing modules in the Python standard library and discovered that both provide a Sequence class with the same features.

I tried both options using the code below and got the same results:

from collections.abc import Sequence

def average(sequence: Sequence):
    return sum(sequence) / len(sequence)

print(average([1, 2, 3, 4, 5]))  # result is 3.0

from typing import Sequence

def average(sequence: Sequence):
    return sum(sequence) / len(sequence)

print(average([1, 2, 3, 4, 5])) # result is 3.0


Under what conditions is collections.abc a better option than typing? Are there benefits to using one over the other?

Asked By: Oluwasube

||

Answers:

Actually, in your code you need neither of those:

Typing with annotations, which is what you are doing with your imported Sequence class, is an optional feature meant for (1) quick documentation and (2) checking the code before it is run, using static code analysers such as Mypy.

The fact is that some IDEs use the results of static checking by default in their recommended configurations, and they can make it look like code without annotations is "faulty". It is not: this is an optional feature.

As long as the object you pass into your function supports the parts of the Sequence interface that the function actually uses, it will work (as written, it needs __len__ and __getitem__).

Just run your code without annotations and see it work:

def average(myvariable):
    return sum(myvariable) / len(myvariable)
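To illustrate the point about only needing __len__ and __getitem__, here is a minimal sketch. The Readings class is a hypothetical name made up for this example; it has no annotations and does not inherit from any Sequence class, yet the function accepts it:

```python
class Readings:
    """Hypothetical container: not a list, no Sequence base class."""

    def __init__(self, *values):
        self._values = values

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        # Without __iter__, sum() falls back to indexing from 0 upwards
        # until IndexError is raised (the tuple raises it for us here).
        return self._values[index]


def average(myvariable):
    return sum(myvariable) / len(myvariable)


result = average(Readings(1, 2, 3, 4, 5))
print(result)  # 3.0
```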

That said, here is what is happening: list is "the sequence" par excellence in Python, and implements everything a sequence needs.

typing.Sequence is an alias of collections.abc.Sequence intended for annotations: it tells static-checker tools that the data marked with it should respect the Sequence protocol, and annotating with it does nothing at run time. You can’t instantiate it. You can inherit from it (probably), but mainly to specialize other markers for typing; for anything meant to have an effect during actual program execution, use collections.abc.Sequence directly.

On the other hand, collections.abc.Sequence predates the optional typing recommendations in PEP 484. It works as a "virtual superclass" that can identify, at run time, everything that works as a sequence (through the use of isinstance) (*). It can also be used as a solid base class to implement fully functional custom Sequence classes of your own: just inherit from collections.abc.Sequence and implement functional __getitem__ and __len__ methods, as indicated in the docs here: https://docs.python.org/3/library/collections.abc.html (that is for read-only sequences; for mutable sequences, check collections.abc.MutableSequence, of course).

(*) For a custom sequence implementation that does not inherit from collections.abc.Sequence to be recognized as a Sequence proper, it has to be "registered" at run time with a call to collections.abc.Sequence.register. However, AFAIK, most tools for static type checking do not recognize this, and will report errors in their static analysis.
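A minimal sketch of both uses described above, assuming two made-up classes (Countdown and Unrelated are hypothetical names, not anything from the standard library). Inheriting gives you the mixin methods for free; registering only affects isinstance checks:

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical read-only sequence: start, start - 1, ..., 1."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        # This sketch only handles non-negative integer indexes.
        if not 0 <= index < self._start:
            raise IndexError(index)
        return self._start - index


class Unrelated:
    """Has the two methods but does not inherit from Sequence."""

    def __len__(self):
        return 0

    def __getitem__(self, index):
        raise IndexError(index)


countdown = Countdown(3)
print(list(countdown))            # [3, 2, 1]  (__iter__ is a mixin method)
print(2 in countdown)             # True       (__contains__ is a mixin method)
print(countdown.index(1))         # 2          (index is a mixin method)
print(list(reversed(countdown)))  # [1, 2, 3]  (__reversed__ is a mixin method)

before = isinstance(Unrelated(), Sequence)
Sequence.register(Unrelated)      # declare Unrelated a "virtual subclass"
after = isinstance(Unrelated(), Sequence)
print(before, after)              # False True
```

Note that registering does not add the mixin methods to Unrelated; it only changes what isinstance and issubclass report.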

Answered By: jsbueno

Good on you for using type annotations! As the documentation says, if you are on Python 3.9+, you should most likely never use typing.Sequence, because it is deprecated. Since the introduction of generic alias types in 3.9, the collections.abc classes all support subscripting and should be recognized correctly by static type checkers of all flavors.

So the benefit of using collections.abc.T over typing.T is mainly that the latter is deprecated and should not be used.

As mentioned by jsbueno in his answer, annotations will never have runtime implications either way, unless of course they are explicitly picked up by a piece of code. They are just an essential part of good coding style. But your function would still work, i.e. your script would execute without error, even if you annotated your function with something absurd like def average(sequence: 4%3): ....
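A quick way to convince yourself of this is to run the absurd annotation from the paragraph above. This sketch only relies on the standard __annotations__ attribute of functions:

```python
def average(sequence: 4 % 3):
    return sum(sequence) / len(sequence)


result = average([1, 2, 3, 4, 5])
print(result)                   # 3.0
print(average.__annotations__)  # {'sequence': 1}
```

The annotation is stored on the function object, but nothing in the call path looks at it.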


Proper annotations are still extremely valuable. Thus, I would recommend you get used to some of the best practices as soon as possible. (A more-or-less strict static type checker like mypy is very helpful for that.) For one thing, when you are using generic types like Sequence, you should always provide the appropriate type arguments. Those may be type variables, if your function is also generic or they may be concrete types, but you should always include them.

In your case, assuming you expect the contents of your sequence to be something that can be added with the same type and divided by an integer, you might want to e.g. annotate it as Sequence[float]. (In the Python type system, float is considered a supertype of int, even though there is no nominal inheritance.)
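As a sketch, assuming Python 3.9+ so that the collections.abc class can be subscripted directly (the parameter name numbers is just my choice here):

```python
from collections.abc import Sequence


def average(numbers: Sequence[float]) -> float:
    return sum(numbers) / len(numbers)


# Subscripting works at run time and produces a generic alias object.
alias = Sequence[float]
print(alias.__origin__ is Sequence, alias.__args__)

print(average([1, 2, 3]))    # 2.0  (a list of int)
print(average((0.5, 1.5)))   # 1.0  (a tuple of float)
```

Per the point about float and int above, a type checker should accept the list of int in the first call.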

Another recommendation is to try and be as broad as possible in the parameter types. (This echoes the Python paradigm of dynamic typing.) The idea is that you just specify that the object you expect must be able to "quack", but you don’t say it must be a duck.

In your example, since you are reliant on the argument being compatible with sum as well as with len, you should consider what types those functions expect. The len function is simple, since it basically just calls the __len__ method of the object you pass to it. The sum function is more nuanced, but in your case the relevant part is that it expects an iterable of elements that can be added (e.g. float).

If you take a look at the collections ABCs, you’ll notice that Sequence actually offers much more than you need, since it is a reversible collection. A Collection is the broadest built-in type that fulfills your requirements because it has __iter__ (from Iterable) and __len__ (from Sized). So you could do this instead:

from collections.abc import Collection


def average(numbers: Collection[float]) -> float:
    return sum(numbers) / len(numbers)

(By the way, the parameter name should not reflect its type.)
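To show what the broader parameter type buys you, here is a small sketch that passes a set and a dict values view, neither of which is a Sequence. The scores dict is made-up sample data:

```python
from collections.abc import Collection, Sequence


def average(numbers: Collection[float]) -> float:
    return sum(numbers) / len(numbers)


scores = {"a": 2.0, "b": 4.0}  # hypothetical sample data

print(average({1, 2, 3}))        # 2.0  (a set)
print(average(scores.values()))  # 3.0  (a dict values view)

# A set is a Collection but not a Sequence:
print(isinstance({1, 2, 3}, Collection))  # True
print(isinstance({1, 2, 3}, Sequence))    # False
```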

Lastly, if you wanted to go all out and be as broad as possible, you could define your own protocol that is even broader than Collection (by getting rid of the Container inheritance):

from collections.abc import Iterable, Sized
from typing import Protocol, TypeVar


T = TypeVar("T", covariant=True)


class SizedIterable(Sized, Iterable[T], Protocol[T]):
    ...


def average(numbers: SizedIterable[float]) -> float:
    return sum(numbers) / len(numbers)

This has the advantage of supporting very broad structural subtyping, but is most likely overkill.
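For illustration, here is a sketch of an object that fits this protocol but not Collection. The Samples class is a hypothetical example: it defines __iter__ and __len__ but no __contains__, so it is not recognized as a Collection at run time, while it should still satisfy SizedIterable[float] structurally for a static type checker:

```python
from collections.abc import Collection, Iterable, Iterator, Sized
from typing import Protocol, TypeVar

T = TypeVar("T", covariant=True)


class SizedIterable(Sized, Iterable[T], Protocol[T]):
    ...


def average(numbers: SizedIterable[float]) -> float:
    return sum(numbers) / len(numbers)


class Samples:
    """Hypothetical container with __iter__ and __len__ only."""

    def __init__(self, *values: float) -> None:
        self._values = values

    def __iter__(self) -> Iterator[float]:
        return iter(self._values)

    def __len__(self) -> int:
        return len(self._values)


samples = Samples(1.0, 2.0, 3.0)
print(average(samples))                 # 2.0
print(isinstance(samples, Collection))  # False, there is no __contains__
```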

(For the basics of Python typing, PEP 483 and PEP 484 are a must-read.)

Answered By: Daniil Fajnberg