[Python] is a dynamically typed programming language with good support for “interface based programming”.
“Interface based programming” is a style of programming in which arguments to functions need not have a particular type, but must match a specified interface. (Interfaces are an explicit feature of the type system in Java; they are called “signatures” in ML; and are important parts of technologies like COM [Pattison 199901] and CORBA.)
Python’s builtin functions are good examples. Consider the
function len
[van
Rossum 20011221, 2.1]. This has the following interface:
a
.a
must have a __len__
method.a.__len__
takes no arguments.a.__len__()
returns an integer.For example, we might define a class all of whose instances have length 1:
class length_one: def __len__(self): return 1
and then len(length_one())
is 1
.
This interface based approach in Python allows progammers to develop their own implementations of lists, strings, dictionaries, files and other native types, and use them wherever a native object would have been legal. See [van Rossum 20011221, 3.3].
Interface based programming extends to modules. For example, the Python Database API [DBAPI 2.0] specifies an interface to a relational database. A program that uses the interface can be ported between the database backends that implement the interface [DBAPI Modules].
The Python language itself does not know about interfaces: interfaces are specified by documenting them. However, we can consider every Python function to specify an interface: namely, the set of objects that can be passed to it.
For example, consider the function successor
:
def successor(n): return n + 1
This can be considered to offer the following interface:
n
.n
must have an __add__
method.n.__add__
takes one argument.n.__add__
accepts an integer.successor
returns whatever n.__add__
returns.This paper will formalize this notion of interfaces and set out a type system for expressing these interfaces.
The type system described in this paper is not sufficiently general to be able to give a type to all Python programs or even all Python programs that never have runtime type errors. However, it is conservative in the sense that it never gives a type to a program that can fail with a runtime type error, and when it does fail it is usually because the programmer has used a “trick” of the language.
This means that it can be a useful tool for checking that a program adheres to a type discipline. There’s more detail about this in section 4.
Given types T_{1}
, T_{2}
, etc, we can build more
types by applying the following two type constructors:
T_{1} → T_{2}
. (T_{1}, T_{2}, ...)
. { m_{1}: T_{1}, m_{2}: T_{2}, ...
}^{0}
. { m_{1}: T_{1}, m_{2}: T_{2}, ...
}^{1}
. Here m_{1}
, m_{2}
, etc are attribute or method
names, namely nonempty strings. Object types are “open” if they are
not fully known (that is, they may have more methods than the specified
ones); and “closed” if all methods are known. The distinction is not
always important and I will write object types without the superscript
to mean either open or closed.
I will write meth(T)
for the set of methods of an
object type. This means that:
meth({ m_{1}: T_{1}, ..., m_{n}: T_{n} }^{0}) ⊇ { m_{1}, ..., m_{n} }
meth({ m_{1}: T_{1}, ..., m_{n}: T_{n} }^{1}) = { m_{1}, ..., m_{n} }
In addition, there are some predefined names for types and type constructors:
None = { }^{1} Number = { __add__: Number → Number, __sub__: Number → Number, __mul__: Number → Number, ... }^{1} List(T) = { __len__: () → Number, __getitem__: Number → T, __setitem__: (Number, T) → None, ... }^{1} Dict(T,U) = { __len__: () → Number, __getitem__: T → U, __setitem__: (T, U) → None, ... }^{1}
Note that these definitions have no foundation. Types in Python are general graphs.
The subtype relation T <: U
means that any object of
type T
can be used in a place where an object of type
U
was required, or more simply, that an object of type
T
implements the interface U
.
The subtype relation satisfies the following rules (all except (Transitive) can be applied in reverse):
T <: T 
(Reflexive)  

(Transitive)  

(Antisymmetric)  

(Function)  

(Object)  

(Tuple) 
The union of two types is the set of objects of either type. The intersection of two types is the set of objects that match both types. The union and intersection operations obey these rules:

(Union)  
T <: T ∪ U 
(UnionL)  
U <: T ∪ U 
(UnionR)  

(Intersection)  
T ∩ U <: T 
(IntersectionL)  
T ∩ U <: U 
(IntersectionR) 
These rules can be derived from the rules above:
(T → U) ∪ (V → U) <: (T ∩ V) → U 
(∪Function) 
(T → U) ∩ (T → V) <: T → (U ∩ V) 
(∩Function) 
(T_{1}, T_{2}, ..., T_{n}) ∪ (U_{1}, U_{2}, ..., U_{n}) <: (T_{1} ∪ U_{1}, T_{2}
∪ U_{2}, ..., T_{n} ∪ U_{n}) 
(∪Tuple) 
(T_{1} ∩ U_{1}, T_{2} ∩ U_{2}, ..., T_{n} ∩ U_{n}) <: (T_{1}, T_{2}, ..., T_{n}) ∩ (U_{1}, U_{2}, ..., U_{n}) 
(∩Tuple) 
{ m_{i}: T_{i}  i ∈ I } ∪ { mj: Uj  j ∈ J } <: { m_{i}: T_{i} ∪ U_{i}  i ∈ I ∩ J } 
(∪Object) 
{ m_{i}: T_{i}  i ∈ I } ∩ { mj: Uj  j ∈ J } <: { m_{i}: T_{i} ∩ U_{i}  i ∈ I ∪ J } 
(∩Object) 
I suspect that there’s no type inference algorithm for this system. Type inference for types without foundations and intersection types is known to be hard.
However, that doesn’t mean we won’t be able to get useful results from a procedure that attempts to deduce a type in this system, because we may be able to prove that a program can’t have a type, which means either that the program may fail with a runtime type error, or else that the programmer has used a trick of the Python language not modelled by the type system. In either case, the result is useful.
The basic idea behind type inference is to assign a type to each expression in the program. Initially these types are placeholders. Then we apply rules for the language constructs that allow us to place constraints on the types.
These examples are informal, because I haven’t yet explained how to assign types to pieces of Python code. But they should be clear.
def successor(n): return n + 1
Let successor: A
and n: B
. Then I deduce:
A = B → C
from line 1. B = { __add__: D → E }^{0}
and
Number <: D
from the +
on line 2. E <: C
from the return
on line 2. Number → C <: D → E
by rule
(Function). { __add__: Number → C }^{0} <: B
by
rule (Object). A <: { __add__: Number → C }^{0} →
C
by rule (Function). This type expresses all the facts I noted about
successor
in section 1 so I can’t
do any better than this.
len(1)
Let len: A
and len(1): B
. I deduce:
A <: Number → B
from the function
application. A = { __len__: () → Number }^{0} →
Number
by the definition of len
. __len__ ∈ meth(Number)
by rule (Object). __len__ ∉ meth(Number)
(note that this is crucially dependent on Number
being a
closed object type; if it were open then the code would typecheck). len(a)
a + 1
Let a: A
. Then I deduce:
A <: { __len__: () → Number }
from line
1. A = { __add__: B → C }^{0}
and
Number <: B
from line 2. A <: { __len__: () → Number } ∩ { __add__: Number
→ B }
by rule (Intersection). A <: { __len__: () → Number, __add__: Number
→ B }
by rule (∩Object). a + 1
a + ""
Let a: A
. Then I deduce:
A = { __add__: B → C }^{0}
and
Number <: B
from line 1. String <: B
from line 2. Number ∪ String <: B
by rule (Union). A <: { __add__: (Number ∪ String) →
C }
by rules (Function) and (Object). The informal examples of type inference assume that each site has a single type (implements a single interface). But Python doesn’t enforce that: a programmer can use one variable with several meanings. For example, the following correct code doesn’t have a type in this system:
a = 1 a = [] len(a)
because we deduce
Number <: A
from line 1. A <: { __len__: () → Number }
from line
3. Number <: { __len__: () → Number }
by rule
(Transitive). Number
has no method
__len__
. The type system can’t deduce all the information it needs to. For example, the following correct code doesn’t have a type in this system:
a = lambda x: x b = a(1) c = a([]) len(c)
because we deduce
D <: E
and D → E <: A
from line 1. Number <: D
from line 2. E <: C
from line 3. Number <: C
by rule (Transitive). C <: { __len__: () → Number }
from line 4. Number <: { __len__: () → Number }
by rule
(Transitive). Number
has no method
__len__
. Here we didn’t deduce enough information about A
: it’s
really something like ∀T: T → T
. But type
inference is undecidable for many systems involving secondorder
types [Well 19960805] so I
didn’t want to include them in my system. And for similar reasons, this algorithm can’t deal with programs that
switch on the types of objects.
[Aycock 19991123]  “Aggressive type inference”; John Aycock; Department of Computer Science, University of Victoria, Canada; 19991123. 
[DBAPI 2.0]  “Python Database API Specification 2.0”. 
[DBAPI Modules]  “Python Database Modules”. 
[Jeske 19991121]  “Python TypeInference based LINT (pylint.py)”; David Jeske; 19991121. 
[Jeske 19991205]  “Re: Python TypeInference based LINT (pylint.py)”; David Jeske; 19991205. 
[Pattison 199901]  “Understanding Interfacebased Programming”; Ted Pattison Microsoft; 199901. 
[Python]  “The Python programming language”. 
[Well 19960805]  “Typability and type checking in the secondorder λ calculus are equivalent and undecidable”; J B Wells; Boston University; 19960805. 
[van Rossum 20011221]  “Python Reference Manual (Python 2.2)”; Guide van Rossum; 20011221. 
20020226  GDR  Created. 