Type-safety of the Python 3.9+ Annotated type
Python has made a lot of progress with gradual typing, a flexible type system, and type-checkers that scale well to very large codebases.
One nice feature that was introduced in Python 3.9 allows developers to
add annotations as additional metadata about a type. The syntax is
covered in the documentation of the typing
module, briefly:
name: Annotated[Type, ...]
Specifies an annotated variable name of type Type, with some additional information ... that is available to both a type-checker and to libraries at runtime.
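The runtime half of that is easy to see with typing.get_type_hints and the __metadata__ attribute. A minimal sketch, where the Label class and the Config example are purely illustrative:

from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass
class Label:
    text: str

class Config:
    retries: Annotated[int, Label("how many times to retry")]

# include_extras=True keeps the Annotated wrapper instead of stripping it.
hints = get_type_hints(Config, include_extras=True)
print(hints["retries"])               # typing.Annotated[int, Label(text='how many times to retry')]
print(hints["retries"].__metadata__)  # (Label(text='how many times to retry'),)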
There are a lot of Python frameworks that use descriptors and metaclasses to provide domain-specific languages for defining things like object-relational models and complex serialization. That is a lot of complexity, and Annotated could really help to simplify things and push adoption of Python’s type annotations.
But I don’t think we’ll see broad adoption into type-checked Python codebases until we can make it type-safe.
The problem with Annotated
One of the example use cases for Annotated
from the Python
documentation is:
Annotated[int, ValueRange(3, 10), ctype("char")]
I’ll focus on ValueRange, which in this case is added for some notional “range checking” support; that checking could happen at type-checking time, at runtime, or both. But since we’re talking about a type system, there’s a problem here.
The type of ValueRange
is dependent on the type of the annotated
variable, but we have no way to express this in Python’s type
annotation system.
It’s possible to write:
Annotated[str, ValueRange(3, 10)]
And the type-checker has no way to assert that ValueRange(3, 10) should not be applied to str. You’ll get green signals from your type-checker, and it will blow up at runtime when a str <= int or str >= int comparison is attempted.
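To make the failure mode concrete, here is a minimal sketch of what a notional runtime range checker might do; this ValueRange and its check method are illustrative, not from any real library:

from dataclasses import dataclass

@dataclass
class ValueRange:
    low: int
    high: int

    def check(self, value) -> bool:
        # Fine for ints, blows up for strings.
        return self.low <= value <= self.high

ValueRange(3, 10).check(7)        # True
ValueRange(3, 10).check("seven")  # TypeError: '<=' not supported between instances of 'int' and 'str'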
It would be much nicer if we could write something like:
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")

class ValueRange(Generic[T]):
    def __init__(self, value_range: Sequence[T]) -> None:
        self.value_range = value_range

    def is_in_range(self, value: T) -> bool:
        return value in self.value_range
Then use it like:
Annotated[int, ValueRange(range(3, 10))]
Annotated[str, ValueRange(["first", "second", "third"])]
With full type-safety, so our type-checker will complain early if we try to write:
Annotated[bool, ValueRange(["True", "False"])]
As an added bonus, our IDE will give us nice squiggly lines so we can immediately spot the problem.
Solving the problem
We could solve this problem by introducing a type-checking-only concept, like Generic:
Annotates[Q]
When used as part of a class’ bases, this would mean that:
- when the type appears as part of the tail of an Annotated[...],
- the type-checker ensures the head of the Annotated satisfies Q.
For example, an annotation declaring that a numeric (i.e. int or float) annotated value will always be positive could be defined as:
class Positive(Annotates[float]):
    pass
And used like:
Annotated[int, Positive()]
Annotated[float, Positive()]
But trying to check Annotated[str, Positive()] would fail as str is neither int nor float.
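Annotates itself would be a static-only construct, but to pin down the intended semantics, here is a rough runtime sketch of the check a type-checker would perform. The annotates attribute is a hypothetical stand-in for the Annotates[Q] parameter, and int is listed alongside float only to mirror the numeric-tower convention type-checkers already apply:

from typing import Annotated, get_args, get_origin

class Annotates:
    # Hypothetical marker: subclasses declare which head types they may annotate.
    annotates: tuple = (object,)

class Positive(Annotates):
    annotates = (int, float)

def check_annotation(annotation: object) -> bool:
    """Return True if every Annotates metadata accepts the Annotated head."""
    if get_origin(annotation) is not Annotated:
        return True
    head, *metadata = get_args(annotation)
    return all(
        issubclass(head, meta.annotates)
        for meta in metadata
        if isinstance(meta, Annotates)
    )

assert check_annotation(Annotated[int, Positive()])
assert check_annotation(Annotated[float, Positive()])
assert not check_annotation(Annotated[str, Positive()])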
Returning to our ValueRange
example, we should be able to express
generic annotations too:
class ValueRange(Generic[T], Annotates[T]):
    def __init__(self, value_range: Sequence[T]) -> None:
        self.value_range = value_range

    def is_in_range(self, value: T) -> bool:
        return value in self.value_range
In this case, the type-checker would infer the type of a ValueRange[T] from its constructor arguments, and check that inferred type against the head of the Annotated[...].
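Even before any static support exists, the runtime half of this already works. As a sketch, a hypothetical validate helper (all names below are illustrative) could walk a dataclass’s Annotated metadata and apply the generic ValueRange defined earlier (the plain Generic[T] version):

from dataclasses import dataclass
from typing import Annotated, get_type_hints

def validate(obj: object) -> None:
    # Inspect each field's Annotated metadata and apply any ValueRange found.
    hints = get_type_hints(type(obj), include_extras=True)
    for field, hint in hints.items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, ValueRange) and not meta.is_in_range(getattr(obj, field)):
                raise ValueError(f"{field}={getattr(obj, field)!r} is out of range")

@dataclass
class Order:
    quantity: Annotated[int, ValueRange(range(3, 10))]
    priority: Annotated[str, ValueRange(["first", "second", "third"])]

validate(Order(quantity=5, priority="second"))   # passes
validate(Order(quantity=99, priority="second"))  # raises ValueError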
Where next
I’m not sure! This has just been annoying me long enough that I wanted to write about it.
If nobody already has a solution, I suppose this could be a PEP, but I’m not sure where to even start with that. Any pointers would be appreciated.