8 Types II: Typing Formalism
We will continue our exploration of types. First, we will introduce typing inference rules, which give us a concise and efficient way of describing type systems and how they behave. We will use these inference rules to give type systems for the IfLang and simply-typed lambda calculus. Then, we will explore the consequences of simple types, and see how it is not possible to give a type derivation for self-application. Then, we will see how brittle non-termination is: we will see how adding a seemingly orthogonal feature to termination – mutable state and references – results in a language that has nontermination, despite the fact that we did not introduce any explicit looping or recursion construct.
8.1 Typing Rules and Derivations for IfLang
Recall our IfLang type checker:
> (define-type IfLang (numI [value : Number]) (addI [e1 : IfLang] [e2 : IfLang]) (boolI [value : Boolean]) (ifI [guard : IfLang] [thn : IfLang] [els : IfLang]))
> (define-type Value (numV [v : Number]) (boolV [v : Boolean]))
> (define-type IfType [numT] [boolT]) > (type-of : (IfLang -> IfType))
> (define (type-of e) (type-case IfLang e [(numI n) (numT)] [(boolI b) (boolT)] [(addI e1 e2) (if (and (equal? (type-of e1) (numT)) (equal? (type-of e2) (numT))) (numT) (error 'type "expected int"))] [(ifI g thn els) (if (equal? (type-of g) (boolT)) (let [(t-thn (type-of thn)) (t-els (type-of els))] (if (equal? t-thn t-els) t-thn (error 'type "expected thn and els to have same type"))) (error 'type "expected bool"))]))
Similar to the situation when we introduced inference rules and derivation trees to efficiently and concisely describe the semantics of programs, it is once again useful to introduce a system of notation for efficiently describing typing rules. Here are the rules for the type system for IfLang that captures the above typechecker:
\dfrac{~}{n : \texttt{Num}}~\text{(T-Num)} \qquad \dfrac{~}{b : \texttt{Bool}}~\text{(T-Bool)}\dfrac{\texttt{e1 : Num} \quad \texttt{e2 : Num}} {\texttt{(+ e1 e2)} : \texttt{Num}}~\text{(T-Add)}\dfrac{\texttt{g : Bool} \quad \texttt{e1} : \tau \quad \texttt{e2} : \tau} {\texttt{(if g e1 e2)} : \tau}~\text{(T-If)}
For example, the (T-Num) rule says that any syntactic number n has type Num; we read the relation e : \tau as "term e has type \tau".
Once again, to understand how these rules work, it’s helpful to give some example derivation trees:
\dfrac{\dfrac{\dfrac{~}{1 : \texttt{Num}}~\text{(T-Num)} \quad \dfrac{~}{2 : \texttt{Num}}~\text{(T-Num)} } {\texttt{(+ 1 2) : \texttt{Num}}}~\text{(T-Add)} \quad \dfrac{~}{3 : \texttt{Num}}~\text{(T-Num)} } {\texttt{(+ (+ 1 2) 3) : \texttt{Num}}} ~\text{(T-Add)}
What happens if we try to give a type to an ill-typed term? Then, we get stuck: there are no valid derivation rules to apply. For instance:
\dfrac{???} {\texttt{(+ 1 \#t)} : ~?}
There are no rules to apply here since the second argument does not have type Num, so we would conclude that this term is not well-typed according to our type system rules.
8.2 Typing Rules for STLC
Now let’s give typing rules for the simply-typed lambda calculus. Recall the implementation in figure 12. Notice that the type-of function has an extra argument, the environment env, that is necessary for typechecking lambda terms and variables. So, our typing judgment needs to also include this environment as part of its definition. We will call the typing environment \Gamma. You can think of \Gamma as the math version of a hashtable: it maps variables to their types.
The symbol \vdash is called the turnstile symbol. Traditionally, it is used in mathematics and logic to define a ternary relation: it is notation for defining a relation between premises, contexts, and conclusions. We will see, through examples, how it is used, and clarify this point a bit later.
Using \Gamma, we will define a typing judgment of the form \Gamma \vdash e : \tau, which says "in context \Gamma, the term e has type \tau".
\dfrac{~}{\Gamma \vdash n : \texttt{Num}}~\text{(T-Num)}\dfrac{ x : \tau \in \Gamma}{\Gamma \vdash x : \tau}~\text{(T-Var)}\dfrac{\Gamma, x : \tau_1 \vdash e : \tau_2} {\Gamma \vdash \lambda x : \tau_1 . ~ e : \tau_1 \rightarrow \tau_2}~\text{(T-Abs)}\dfrac{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \qquad \Gamma \vdash e_2 : \tau_1} {\Gamma \vdash e_1 ~ e_2 : \tau_2}~\text{(T-App)}
Figure 14: Typing rules for the simply-typed lambda calculus with numbers.
At first these definitions can look a bit scary, but they become comfortable with enough practice. First, let’s see how the type context \Gamma is used when drawing a typing derivation.
The (T-Num) rule holds in any context, so the following is a valid derivation:
\dfrac{~}{x : \texttt{Num} \vdash 4 : \texttt{Num}}~\text{(T-Num)}
This derivation says "in the type context where x is a variable with type Num, the term 4 has type Num".
Of course, this rule also holds in the empty type context where there are no other variables in scope, which we write in the following derivation tree:
\dfrac{~}{\vdash 4 : \texttt{Num}}~\text{(T-Num)}
Next, the T-Var rule simply looks into the environment to see which type a variable has. For instance, the following derivation tree shows that x has type Num in the context x : \texttt{Num}, y : \texttt{Num}:
\dfrac{x:\texttt{Num} \in x:\texttt{Num}, y:\texttt{Num}}{x:\texttt{Num}, y:\texttt{Num} \vdash x : \texttt{Num}}~\text{(T-Var)}
The key usage of the type context is that it lets you move variables into it during derivation. Here is what that looks like:
\dfrac{\dfrac{x : \texttt{Num} \in x : \texttt{Num}}{x : \texttt{Num} \vdash x : \texttt{Num}}~\text{(T-Var)}} {\vdash \lambda x : \texttt{Num} . x : \texttt{Num} \rightarrow \texttt{Num}}~\text{(T-Lam)}
Notice how the type of x was added into the type context in the premise. This is the key usage of the turnstile operator. Finally, we can give a more complicated derivation involving function calls:
\dfrac{ \dfrac{\dfrac{x : \texttt{Num} \in x : \texttt{Num}}{x : \texttt{Num} \vdash x : \texttt{Num}}~\text{(T-Var)}} {\vdash (\lambda x : \texttt{Num}. x) : \texttt{Num} \rightarrow \texttt{Num}}~\text{T-Lam} \qquad \dfrac{~}{\vdash 4 : \texttt{Num}}~\text{(T-Num)} } {\vdash (\lambda x : \texttt{Num}. x)~4 : \texttt{Num}}~\text{T-App}
Exercises:
Write down at least 3 observationally inequivalent programs that that are members of the the following types:
\texttt{Num} \rightarrow \texttt{Num}
\texttt{Num} \rightarrow (\texttt{Num} \rightarrow \texttt{Num})
(\texttt{Num} \rightarrow \texttt{Num}) \rightarrow (\texttt{Num} \rightarrow \texttt{Num})
Draw typing derivations for the following STLC programs, or state that no valid tree exists:
\lambda x : \texttt{Num} . (\lambda y : \texttt{Num} \rightarrow \texttt{Num} . (y ~ x))
8.3 Consequences of Simple Types
A question you might have about the type system for the simply-typed lambda calculus is: how expressive is it? Put differently: are there any syntactic lambda terms that we could run in our untyped lambda calculus that are not well-typed?
The answer is yes! Consider the untyped self-application function \omega = \lambda x . x~x. We would like to use this function in our simply-typed lambda calculus. To do this, we must annotate the argument x with some type. Which type should we choose?
Suppose we gave it type Num. Does the following term typecheck?
\lambda x : \texttt{Num} . x~x
No, it doesn’t, because we are trying to call x but it is not a function. Okay, let’s try to use a function type instead. Does this type check?
\lambda x : \texttt{Num} \rightarrow \texttt{Num} . x~x
It’s not immediately obvious, so let’s try to draw a derivation tree and see if we get stuck:
\dfrac{\dfrac{???} {x : \texttt{Num} \rightarrow \texttt{Num} \vdash x~x : ~ ?}~\text{(T-App)}} {\vdash \lambda x : \texttt{Num} \rightarrow \texttt{Num} . x~x : ~ ?}~\text{(T-Lam)}
Notice that at the ??? point in the tree, we are stuck. We know that x has type Number -> Number, but it is being called with itself as an argument, which has type Number -> Number; this is an invalid function call.
There are richer types than simple types that do permit solutions to the equation \tau \cong \tau \rightarrow \tau. It is possible to write recursive types that satisfy this. We will cover this later if we have time.
8.4 A Typing Derivation Tree Datastructure
Now for a big idea: derivation trees are just data! We can think of typing derivation trees themselves as surface syntax. Then, we can give an abstract syntax datatype for representing this surface syntax as follows:
(define-type TyTree ; contract: e = (numI n), t = numT [T-Num (e : IfLang) (t : IfType)] ; contract: e = (boolI b), t = boolT [T-Bool (e : IfLang) (t : IfType)] ; contract: ; e = (ifI g thn els) ; g has type bool ; t-g proves g has type boolT ; t-thn proves thn has type T ; t-els proves els has type T ; ty = T [T-If (e : IfLang) (t-g : TyTree) (t-thn : TyTree) (t-els : TyTree) (t : IfType)] ; contract: ; e = (addI e1 e2) ; t-e1 proves e1 has type numT ; t-e2 proves e2 has type numT ; ty = numT [T-Add (e : IfLang) (t-e1 : TyTree) (t-e2 : TyTree) (t : IfType)])
Programming languages with more advanced features than Plait have rich facilities for enforcing the kinds of contracts that TyTree places on its data. For instance, the Lean or Rocq programming languages have rich type systems that make it possible to enforce these contracts statically.
Let’s see some examples of how to manually construct typing derivation tree data structures that correspond to typing derivation trees. For example, consider the following typing derivation tree:
\dfrac{\dfrac{~}{\texttt{1 : Num}}~\text{(T-Num)} \quad \dfrac{~}{\texttt{2 : Num}}~\text{(T-Num)} }{\texttt{(+ 1 2) : Num}}~\text{(T-Add)}
The above derivation tree is surface syntax that parses to the following typing derivation tree datastructure:
(T-Add (addI (numI 1) (numI 2)) (T-Num (numI 1) (numT)) (T-Num (numI 2) (numT)) (numT))
Now, it’s natural to design a procedure for automatically building a typing derivation tree from a particular IfLang term. This looks a lot like the type-of function:
(define (get-type [t : TyTree]) (type-case TyTree t [(T-Num _ t) t] [(T-Bool _ t) t] [(T-If _ _1 _2 _3 t) t] [(T-Add _ _1 _2 t) t])) (define (build-ty-tree e) (type-case IfLang e [(numI n) (T-Num e (numT))] [(boolI b) (T-Bool e (boolT))] [(ifI g thn els) ; let ty-g be derivation tree for g ; ty-thn for thn ; ty-els for els ; check that ty-g shows g has ; type boolT ; check that ty-thn shows thn ; has type T ; check that ty-es shows els ; has type T ; build a return derivation tree ; raise a runtime error if ; any step fails (let [(ty-g (build-ty-tree g)) (ty-els (build-ty-tree els)) (ty-thn (build-ty-tree thn))] (if (equal? (get-type ty-g) (boolT)) (if (equal? (get-type ty-els) (get-type ty-thn)) (T-If e (get-type ty-els) ty-g ty-thn ty-els) (error 'type-error "arms don't match")) (error 'type-error "guard isn't bool") ))] [(addI e1 e2) (let [(ty-e1 (build-ty-tree e1)) (ty-e2 (build-ty-tree e2))] (if (and (equal? (get-type ty-e1) (numT)) (equal? (get-type ty-e2) (numT))) (T-Add e (numT) ty-e1 ty-e2) (error 'type-error "adding non-numbers")))]))
One way to understand the build-ty-tree function is that it builds a proof that a term has a particular type by applying the axioms and judgments of our type system.
8.5 Properties of Type Systems
Now we can begin to explore properties of type systems themselves. For the rest of this section we will explore the properties of the IfLang language, since its structure is simpler. Let’s start with a simple example: if an IfLang program is a well-typed Boolean value, then it must be either true or false. How can we prove this fact?
First, let’s define a new judgment that tells us whether or not a program is a value:
\dfrac{~}{\texttt{value (boolE b)}}~(\text{V-Bool}) \qquad \dfrac{~}{\texttt{value (numE b)}}~(\text{V-Num})
Now, we can formally state our first theorem of this course:
Theorem (Boolean canonical forms): If \Gamma \vdash e : \texttt{Bool} and value e, then e is boolI b.
(V-Bool). Then our theorem is true by assumption.
(V-Num). Then, by the T-Num rule, e has type Num, which contradicts our assumption that e has type Bool.
Case analysis: we are allowed to perform case-by-case reasoning based on inductive constructions like value, typing rules, or stepping rules.
Contradiction: If we can show that two assumptions are simultaneously contradictory, then we are permitted to conclude our (sub)-argument.
What is interesting is that we can convert this formal argument into a program:
(define-type ValTree ; contract: e = (boolI b) [V-Bool (e : IfLang)] ; contract : e = (numI n) [V-Num (e : IfLang)]) ; preconditions: ; - ty is a typing derivation tree for e ; - ty proves e has type Bool ; - val is a value derivation tree for e (define (canonical-form [e : IfLang] [ty : TyTree] [val : ValTree]) (type-case ValTree val [(V-Bool v) v] [(V-Num n) (error 'unreachable "unreachable path")]))
The program canonical-form looks trivial, but what is interesting about it is that it will never crash on any input that satisfies its preconditiongs. This is because the typing derivation tree ty contains in it a proof that the program is well-typed, and hence the V-Num arm of the type-case will never be reached.