Language Specification — Automath

by Alex Nelson, 16 July 2016

Overview. We will attempt to formulate the language specification for AUT-QE without type inclusion (and, by eliminating quasi-expressions, AUT-68). First we will develop the syntax in a slight variant of ABNF. Second we will formulate denotational semantics. Third we will review operational semantics for the underlying λ-typed λ-calculus, what de Groote calls λ^λ. We leave as an exercise for the reader formalizing this in Coq.

Syntax

1.1. Input Encoding. There is no specific restriction on the encoding, it may be ASCII, Unicode, or some other encoding, providing we can distinguish three character classes: (1) numeral digits, (2) punctuation, (3) “letters”. This is used loosely, as we expect, e.g., ideograms (think: Chinese, Kanji, etc.) should be considered “letters”, and exotic punctuation we might find in classical Chinese would be considered “punctuation”.

1.2. Identifiers. An Automath identifier consists of 1 or more characters which may be a “letter”, a “numeric digit”, underscore _, backspace (in C it’s \b, in ASCII and Unicode it is code point 0x08), a backquote `, or a single quote '.

<identifier character> ::= <letter>
                         | <number>
                         | <backspace>
                         | <backtick>
                         | "_"
                         | "'"

<backtick> ::= %0x60
<backspace> ::= %0x08

<identifier> ::= <identifier character>
               | <identifier> <identifier character>

1.3. Input Program (“Books”). The Automath checker expects input from flat files, and the input has historically been referred to as “books”. (Compared to, say, C—which would have called it a “program” instead of a “book”.) The analogy is made to physically tangible books, which consist of lines. (Just as a C program consists of statements.)

<book> ::= <blank>
         | <book> <line>
         | <book> ";"

1.4. Statements (“Lines”). A line is analogous to a “statement” in imperative programs. Automath lines either define a function, or assist in defining a function. In this light, we can specify the components and their function:

Automath Context: The named parameters for a function are indicated by the “Automath Context” part of the line. These are analogous to “stacks of named parameters”, and the context part is just the last identifier on the stack (sometimes referred to as the “indicator string”).
Identifier Part: The identifier of the definition (the definiendum, if you will).
Category Part: Specifies the type of the entity being defined.
Body: The body of the definition (the definiens, if you will).

We also have three types of lines:

Expansion Block: Since the named parameters are assembled as a stack, we have lines which assist in “pushing” a new named parameter. This is done in an expansion block line.
Primitive Notions: Automath gives us nothing, so we need to introduce new types and constants as “builtins” to the mathematical theory being developped. This is handled in primitive notion lines.
Function Definitions: The bulk of the work lies in defining functions, which is done in abbreviation or function definition lines. Constants are seen as functions with zero parameters.

The grammar of a line looks like the following:

; <ESTI> <expression> is the component telling us the type of the definition
<ESTI> ::= "'E'" | "_" <backspace> "E" | ";" | ":"

; <assign> <expression> tells us the body of the line
<assign> ::= ":=" | "="

<PN> ::= "PRIM" | "'prim'" | "PN" | "'pn'" | "???"
<primitive notion line> ::= <context> <identifier> <assign> <PN> <ESTI> <expression>
                          | <context> <identifier> <ESTI> <expression> <assign> <PN>

; Expansion blocks are for pushing a named parameter onto the context
<EB> ::= "EB" | "'eb'" | "---"
<EB line> ::=  <context> <identifier> <assign> <EB> <ESTI> <expression>
            | <context> <identifier> <ESTI> <expression> <assign> <EB>

<definition line> ::= <context> <identifier> <assign> <expression> <ESTI> <expression>
                    | <context> <identifier> <ESTI> <expression> <assign> <expression>

<line> ::= <primitive notion line>
         | <EB line>
         | <definition line>

1.5. Symbols and Module System (“Paragraphs”). Identifiers are used in definitions, symbols are used for references. What’s the difference? Well, Automath has a module system, and referring to an identifier defined inside a module requires specifying the module’s identifier too (in C++, we have to write std::cout to specify we want cout from std and not, say, logger).

Automath modules are called “paragraphs”, keeping up with the analogy to physical books, and are opened by new lines starting with a + symbol (and closed on a newline with --). To cite an identifier defined in a paragraph, we have to write identifier"paragraph_id".

Part of the contract with a line is the identifier being defined must be free: it is not used in the current paragraph. If it is not a free variable, then an error is thrown and the program terminates.

<symbol> ::= <identifier>
           | <identifier> DQUOTE <identifier list> DQUOTE

; Used to specify the paragraph
<identifier list> ::= <empty>
                    | <identifier>
                    | <identifier list> "." <identifier>
                    | <identifier list> "-" <identifier>

<par open> ::= "+" <identifier>

; "--" closes the current paragraph
<par close> ::= "-" <identifier>
              | "--"

1.6. Automath Context. A context is a stack of named parameters. The idea is we might want something like an “indexed family of types”. Some examples might include: “Let G be a group, and S a set. A ‘group action’ consists of a function…”, we would have G:Group, S:Set be the context in the line defining a group action. So to specify a group action, we first need to identify (a) which group we’re working with, and (b) which set we’re working with.

There’s nothing in, say, C++ we can compare this to…because it is unique to mathematics. One stab at it might be in an object oriented language, to have the constructor of a GroupAction class have a constructor of the form GroupAction(Group G, Set S).

In category theory, this is remarkably similar to the notion of a generalized element. This perspective is vindicated when examining the denotational semantics for contexts.

With the group action example, in Automath we might have something like the following sequence of lines:

* Group : Type := PN groups are primitive notions
* Set : Type := PN so are sets
* G:Group := --- Let G be a group.
G * S:Set := --- Let S be a set.
S * GroupAction:Type := PN Then a group action is a primitive notion

And we can continue specifying its properties.

In “real” type theory, we use a turnstile to separate the context from a statement, writing something like “Γ ⊢ t:T”. But in Automath, we write Γ * t:T because no one thought |- looks more like a turnstile than *. Further, Automath writes only the top element of the stack, and calls it the “Indicator String”.

<turnstile> ::= "*" | "@"

<context> ::= <empty>
            | <turnstile>
            | <symbol> <turnstile>

1.7. Expressions. Automath’s expressions boil down to a lambda-typed lambda calculus, meaning a lambda expression can produce either a term or a type. Quirks in the notation should be noted, however: lambda abstractions are denoted [x:type] body, and applications are <x>f instead of f(x) or f x.

One cautionary note: we described contexts as a stack of parameters needed to specify a type. (So to specify a group action, we first need to specify the group and the set acted upon.) Once we have identified these parameters, we get a type. This is “instantiation” in Automath, and would be identified by writing GroupAction(group, set). This is distinct from application in lambda calculus, since no lambda expressions are involved.

<expression> ::= "TYPE" | "'type'"
               | "PROP" | "'prop'"
               | <symbol>
               | <instantiation>
               | <abstraction>
               | <application>

; instantiation related grammar
<instantiation> ::= <symbol> "()"
                  | <symbol> "(" <expression list> ")"

<expression list> ::= <expression>
                    | <expression list> "," <expression>

; lambda abstrations
<abstraction> = "[" <identifier> <comma> <expression> "]" <expression>
<comma> = "," | ":"

;  application 
<application> = "<" <expression> ">" <expression>

1.8. Comments. The last syntactic component of the grammar, comments. There are one line comments and multi-line comments, which will be ignored by Automath. We are a little sloppy with the grammar here, with the understanding <multiline comment> is arbitrary text which does not contain }, and <text> is arbitrary text which does not contain a carriage return of any sort.

<comment> ::= "#" <text> <newline>
            | "{" <multiline comment> "}"

Denotational Semantics

2.0. The basic game plan is: we show Automath has a denotational semantic functor 〚...〛 whose image is a lambda-typed lambda calculus. (The next section discusses what lambda-typed lambda calculus is, specifically its operational semantics.)

We adjust Automath slightly to write out the entire context rather than just the “indicator string” (i.e., the “top” of the context “stack”).

Remark. This denotational semantic presentation works for AUT-QE-NTI (AUT-QE with no type inclusions). Type inclusion breaks the semantics in a bad and unpleasant way. It might be worth-while to consider examining “sub-kinding” instead of type inclusion, i.e., treat [x:T]type as a “subtype” of type. (End of Remark)

2.1. Contexts. We have an inductive construct for the context. The base case is 〚⊢ t:T〛: 1 →〚T〛, and the inductive rule is 〚Γ ⊢ t:T〛:〚Γ〛→〚T〛 where 〚Γ〛 expands out to the product of types in the context. So:

〚c_1:T_1 ⊢ t:T〛:〚T_1〛→〚T〛,
〚c_1:T_1, c_2:T_2 ⊢ t:T〛:〚T_1〛×〚T_2〛→〚T〛,
etc.

2.2. Lambda Calculus. A lambda expression is a morphism. We would have, for example,

〚⊢ (λx:nat,y:nat . x+y) : nat × nat -> nat〛: 1→Hom(ℕ×ℕ,ℕ)

describe addition as a function: it’s a morphism acting on 2 natural numbers, and produces a natural number. This is represented by the indicator function which picks out that particular morphism. More generally,

〚⊢ (λx:T . b) : T -> T'〛: 1→Hom(〚T〛,〚T'〛)

and with a general context we have

〚Γ ⊢ (λx:T . b) : T -> T'〛:〚Γ〛→ Hom(〚T〛,〚T'〛)

Likewise lambda-calculus “application” amounts to an evaluation map in the ambient category.

Operational Semantics

3.0. I am…going to punt most of this, because Dr. de Groote has already done a lot of hard work in his paper cited below. So showing soundness, strong normalization, etc., we will simply assume.

3.1. Definition: λ-Expressions. The set E of λ-expressions is inductively defined as follows:

The constant τ ∈ E
If x is a variable, then x ∈ E
If x is a variable and A, B ∈ E, then (λ x : A. B) ∈ E
If A, B ∈ E, then (A B) ∈ E.

Remark. The constant τ is analogous to Type in other type theories, or Barendregt’s *.

3.2. Definition: λ-Kinds. The set K of λ-kinds is defined inductively as follows:

The constant κ ∈ K
If x is a variable, A ∈ E, and B ∈ K, then (λ x : A. B) ∈ K

3.3. Typing Contexts. A typing context is a sequence of declarations x : A where x is a variable, and A∈E. Any context Γ, x : A is such that

the variable x is not declared in Γ
all the variables occuring free in the A are declared in Γ.

3.4. Typing Judgments. We define the notion of well-typed λ-expressions by providing a proof system to derive typing judgments of the shape

Γ ⊢ A : B

where Γ is a typing context, and A, B are λ-expressions. Although A and B are in the same syntactic categories, we may sometimes refer to A as a “term” and B as a “type” for convenience.

Remark. We also have a similar judgement where B is a λ-kind. Strictly speaking, the two types of judgements are different. Nevertheless, we abuse notation, and make no distinction between them.

3.5. Beta Reduction. Since this is a lambda-calculus at heart, we have beta reduction indicated by →_β. Taking its reflexive, transitive, symmetric closure, we get beta-conversion denoted =_β.

3.6. Proof System. Let Γ be a typing context, A be a λ-expression, and B be either a λ-kind or a λ-expression. A typing judgment of λ^λ is an expression of the form

Γ ⊢ A : B

derivable according to the following proof system:

	Constant
⊢ τ : κ	Constant

Γ ⊢ A : B	Variable
Γ, x:A ⊢ x:A	Variable

Γ, x:A ⊢ B : C	Abstraction
Γ ⊢ (λx:A.B) : (λx:A.C)	Abstraction

Γ ⊢ A : (λx:C.D) Γ ⊢ B:C	Application
Γ ⊢ (A B) : D[x:=B]	Application

Γ ⊢ A : B Γ ⊢ C:D	Weakening
Γ, x:C ⊢ A : B	Weakening

Γ ⊢ A : B Γ ⊢ C:D	if B =_β C; type conversion
Γ ⊢ A : C	if B =_β C; type conversion

Given some λ-expression A and typing context Γ, we say the λ-expression A is “Well-typed” according to the context Γ if and only if there exists a λ-kind or λ-expression B such that Γ ⊢ A : B.

Given some λ-expression A, we call A a “Correct λ-expression of λ^λ” if and only if A is well-typed according to the empty context.

Remark 1. Observe, in the rule of application, we necessarily have C be a λ-expression (and hence B is one, too). But D may be a λ-kind or a λ-expression. (End of Remark)

Remark 2. The abstraction rule of inference tells us that semantically parameters in the context are “equivalent” (in a precise sense) to arguments in a lambda abstraction. The difference is, to change from the x:T * t:T' := body to a lambda expression requires us to make the type a lambda-type too: * [x:T]t : [x:T]T' := body is the semantically equivalent line. (End of Remark)

References

N.G. de Bruijn, “The Mathematical Language Automath, its Usage, and Some of its Extensions”. In Symposium on Automatic Demonstration (Eds. M Laudet, D Lacombe, and M Schuetzenberger), pp 29-61; and reprinted in Selected Papers on Automath, pp 73–100.
N.G. de Bruijn, “A survey of the project Automath”. In To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (Eds. JP Selding and JR Hindley), pp 579–606. Also in Selected Papers on Automath, pp. 141–161
Diederik T. van Daalen, “The Language Theory of Automath”. PhD Thesis, pdf 1980.
Philippe de Groote, “Defining λ-Typed λ-Calculi by Axiomatizing the Typing Relation”. In: 10th Annual Symposium on Theoretical Aspects of Computer Science, STACS’93, (Eds. P. Enjalbert, A. Finkel, and K.W. Wagner), Lecture Notes in Computer Science, Vol. 665, Springer-Verlag (1993), pp. 712-723. Eprint
Benjamin C. Pierce, Types and Programming Languages. MIT Press, 2002.
I. Zandleven, “A Verifying Program for Automath”. E1 in Selected Papers on Automath.