Language Specification — Automath
by Alex Nelson, 16 July 2016
Overview. We will attempt to formulate the language specification for AUT-QE without type inclusion (and, by eliminating quasi-expressions, AUT-68). First we will develop the syntax in a slight variant of ABNF. Second we will formulate denotational semantics. Third we will review operational semantics for the underlying λ-typed λ-calculus, what de Groote calls λλ. We leave as an exercise for the reader formalizing this in Coq.
Syntax
1.1. Input Encoding. There is no specific restriction on the encoding, it may be ASCII, Unicode, or some other encoding, providing we can distinguish three character classes: (1) numeral digits, (2) punctuation, (3) “letters”. This is used loosely, as we expect, e.g., ideograms (think: Chinese, Kanji, etc.) should be considered “letters”, and exotic punctuation we might find in classical Chinese would be considered “punctuation”.
1.2. Identifiers.
An Automath identifier consists of 1 or more characters which may be a
“letter”, a “numeric digit”, underscore _
, backspace (in C it’s \b
,
in ASCII and Unicode it is code point 0x08
), a backquote
`
, or a single quote '
.
<identifier character> ::= <letter> | <number> | <backspace> | <backtick> | "_" | "'" <backtick> ::= %0x60 <backspace> ::= %0x08 <identifier> ::= <identifier character> | <identifier> <identifier character>
1.3. Input Program (“Books”). The Automath checker expects input from flat files, and the input has historically been referred to as “books”. (Compared to, say, C—which would have called it a “program” instead of a “book”.) The analogy is made to physically tangible books, which consist of lines. (Just as a C program consists of statements.)
<book> ::= <blank> | <book> <line> | <book> ";"
1.4. Statements (“Lines”). A line is analogous to a “statement” in imperative programs. Automath lines either define a function, or assist in defining a function. In this light, we can specify the components and their function:
- Automath Context: The named parameters for a function are indicated by the “Automath Context” part of the line. These are analogous to “stacks of named parameters”, and the context part is just the last identifier on the stack (sometimes referred to as the “indicator string”).
- Identifier Part: The identifier of the definition (the definiendum, if you will).
- Category Part: Specifies the type of the entity being defined.
- Body: The body of the definition (the definiens, if you will).
We also have three types of lines:
- Expansion Block: Since the named parameters are assembled as a stack, we have lines which assist in “pushing” a new named parameter. This is done in an expansion block line.
- Primitive Notions: Automath gives us nothing, so we need to introduce new types and constants as “builtins” to the mathematical theory being developped. This is handled in primitive notion lines.
- Function Definitions: The bulk of the work lies in defining functions, which is done in abbreviation or function definition lines. Constants are seen as functions with zero parameters.
The grammar of a line looks like the following:
; <ESTI> <expression> is the component telling us the type of the definition <ESTI> ::= "'E'" | "_" <backspace> "E" | ";" | ":" ; <assign> <expression> tells us the body of the line <assign> ::= ":=" | "=" <PN> ::= "PRIM" | "'prim'" | "PN" | "'pn'" | "???" <primitive notion line> ::= <context> <identifier> <assign> <PN> <ESTI> <expression> | <context> <identifier> <ESTI> <expression> <assign> <PN> ; Expansion blocks are for pushing a named parameter onto the context <EB> ::= "EB" | "'eb'" | "---" <EB line> ::= <context> <identifier> <assign> <EB> <ESTI> <expression> | <context> <identifier> <ESTI> <expression> <assign> <EB> <definition line> ::= <context> <identifier> <assign> <expression> <ESTI> <expression> | <context> <identifier> <ESTI> <expression> <assign> <expression> <line> ::= <primitive notion line> | <EB line> | <definition line>
1.5. Symbols and Module System (“Paragraphs”).
Identifiers are used in definitions, symbols are used for
references. What’s the difference? Well, Automath has a module system,
and referring to an identifier defined inside a module requires
specifying the module’s identifier too (in C++, we have to write
std::cout
to specify we want cout
from std
and not, say,
logger
).
Automath modules are called “paragraphs”, keeping up with the
analogy to physical books, and are opened by new lines starting with a
+
symbol (and closed on a newline with --
). To cite an identifier
defined in a paragraph, we have to write identifier"paragraph_id"
.
Part of the contract with a line is the identifier being defined must be free: it is not used in the current paragraph. If it is not a free variable, then an error is thrown and the program terminates.
<symbol> ::= <identifier> | <identifier> DQUOTE <identifier list> DQUOTE ; Used to specify the paragraph <identifier list> ::= <empty> | <identifier> | <identifier list> "." <identifier> | <identifier list> "-" <identifier> <par open> ::= "+" <identifier> ; "--" closes the current paragraph <par close> ::= "-" <identifier> | "--"
1.6. Automath Context.
A context is a stack of named parameters. The idea is we might want
something like an “indexed family of types”. Some examples might
include: “Let G be a group, and S a set. A ‘group action’ consists
of a function…”, we would have G:Group, S:Set
be the context in the
line defining a group action. So to specify a group action, we first
need to identify (a) which group we’re working with, and (b) which set
we’re working with.
There’s nothing in, say, C++ we can compare this to…because it is unique
to mathematics. One stab at it might be in an object oriented language,
to have the constructor of a GroupAction
class have a constructor of
the form GroupAction(Group G, Set S)
.
In category theory, this is remarkably similar to the notion of a generalized element. This perspective is vindicated when examining the denotational semantics for contexts.
With the group action example, in Automath we might have something like the following sequence of lines:
* Group : Type := PN
groups are primitive notions* Set : Type := PN
so are sets* G:Group := ---
Let G be a group.G * S:Set := ---
Let S be a set.S * GroupAction:Type := PN
Then a group action is a primitive notion
And we can continue specifying its properties.
In “real” type theory, we use a turnstile to separate the context from a
statement, writing something like “Γ ⊢ t:T”. But in Automath, we write
Γ * t:T
because no one thought |-
looks more like a turnstile than
*
. Further, Automath writes only the top element of the stack, and
calls it the “Indicator String”.
<turnstile> ::= "*" | "@" <context> ::= <empty> | <turnstile> | <symbol> <turnstile>
1.7. Expressions.
Automath’s expressions boil down to a lambda-typed lambda calculus,
meaning a lambda expression can produce either a term or a type. Quirks
in the notation should be noted, however: lambda abstractions are
denoted [x:type] body
, and applications are <x>f
instead of f(x)
or f x
.
One cautionary note: we described contexts as a stack of parameters
needed to specify a type. (So to specify a group action, we first need
to specify the group and the set acted upon.) Once we have identified
these parameters, we get a type. This is “instantiation” in Automath,
and would be identified by writing GroupAction(group, set)
. This is
distinct from application in lambda calculus, since no lambda
expressions are involved.
<expression> ::= "TYPE" | "'type'" | "PROP" | "'prop'" | <symbol> | <instantiation> | <abstraction> | <application> ; instantiation related grammar <instantiation> ::= <symbol> "()" | <symbol> "(" <expression list> ")" <expression list> ::= <expression> | <expression list> "," <expression> ; lambda abstrations <abstraction> = "[" <identifier> <comma> <expression> "]" <expression> <comma> = "," | ":" ; application <application> = "<" <expression> ">" <expression>
1.8. Comments.
The last syntactic component of the grammar, comments. There are one
line comments and multi-line comments, which will be ignored by
Automath. We are a little sloppy with the grammar here, with the
understanding <multiline comment>
is arbitrary text which does not
contain }
, and <text>
is arbitrary text which does not contain a
carriage return of any sort.
<comment> ::= "#" <text> <newline> | "{" <multiline comment> "}"
Denotational Semantics
2.0. The basic game plan is: we show Automath has a denotational
semantic functor 〚...〛
whose image is a lambda-typed lambda calculus.
(The next section discusses what lambda-typed lambda calculus is,
specifically its operational semantics.)
We adjust Automath slightly to write out the entire context rather than just the “indicator string” (i.e., the “top” of the context “stack”).
Remark.
This denotational semantic presentation works for AUT-QE-NTI (AUT-QE
with no type inclusions). Type inclusion breaks the semantics in a bad
and unpleasant way. It might be worth-while to consider examining
“sub-kinding” instead of type inclusion, i.e., treat [x:T]type
as a
“subtype” of type
.
(End of Remark)
2.1. Contexts.
We have an inductive construct for the context.
The base case is 〚⊢ t:T〛: 1 →〚T〛
, and the inductive rule is
〚Γ ⊢ t:T〛:〚Γ〛→〚T〛
where 〚Γ〛
expands out to the product of
types in the context. So:
〚c_1:T_1 ⊢ t:T〛:〚T_1〛→〚T〛
,〚c_1:T_1, c_2:T_2 ⊢ t:T〛:〚T_1〛×〚T_2〛→〚T〛
,- etc.
2.2. Lambda Calculus. A lambda expression is a morphism. We would have, for example,
〚⊢ (λx:nat,y:nat . x+y) : nat × nat -> nat〛: 1→Hom(ℕ×ℕ,ℕ)
describe addition as a function: it’s a morphism acting on 2 natural numbers, and produces a natural number. This is represented by the indicator function which picks out that particular morphism. More generally,
〚⊢ (λx:T . b) : T -> T'〛: 1→Hom(〚T〛,〚T'〛)
and with a general context we have
〚Γ ⊢ (λx:T . b) : T -> T'〛:〚Γ〛→ Hom(〚T〛,〚T'〛)
Likewise lambda-calculus “application” amounts to an evaluation map in the ambient category.
Operational Semantics
3.0. I am…going to punt most of this, because Dr. de Groote has already done a lot of hard work in his paper cited below. So showing soundness, strong normalization, etc., we will simply assume.
3.1. Definition: λ-Expressions. The set E of λ-expressions is inductively defined as follows:
- The constant τ ∈ E
- If x is a variable, then x ∈ E
- If x is a variable and A, B ∈ E, then (λ x : A. B) ∈ E
- If A, B ∈ E, then (A B) ∈ E.
Remark.
The constant τ is analogous to Type
in other type theories, or
Barendregt’s *
.
3.2. Definition: λ-Kinds. The set K of λ-kinds is defined inductively as follows:
- The constant κ ∈ K
- If x is a variable, A ∈ E, and B ∈ K, then (λ x : A. B) ∈ K
3.3. Typing Contexts. A typing context is a sequence of declarations x : A where x is a variable, and A∈E. Any context Γ, x : A is such that
- the variable x is not declared in Γ
- all the variables occuring free in the A are declared in Γ.
3.4. Typing Judgments. We define the notion of well-typed λ-expressions by providing a proof system to derive typing judgments of the shape
- Γ ⊢ A : B
where Γ is a typing context, and A, B are λ-expressions. Although A and B are in the same syntactic categories, we may sometimes refer to A as a “term” and B as a “type” for convenience.
Remark. We also have a similar judgement where B is a λ-kind. Strictly speaking, the two types of judgements are different. Nevertheless, we abuse notation, and make no distinction between them.
3.5. Beta Reduction. Since this is a lambda-calculus at heart, we have beta reduction indicated by →β. Taking its reflexive, transitive, symmetric closure, we get beta-conversion denoted =β.
3.6. Proof System. Let Γ be a typing context, A be a λ-expression, and B be either a λ-kind or a λ-expression. A typing judgment of λλ is an expression of the form
- Γ ⊢ A : B
derivable according to the following proof system:
-
Constant ⊢ τ : κ -
Γ ⊢ A : B Variable Γ, x:A ⊢ x:A -
Γ, x:A ⊢ B : C Abstraction Γ ⊢ (λx:A.B) : (λx:A.C) -
Γ ⊢ A : (λx:C.D) Γ ⊢ B:C Application Γ ⊢ (A B) : D[x:=B] -
Γ ⊢ A : B Γ ⊢ C:D Weakening Γ, x:C ⊢ A : B -
Γ ⊢ A : B Γ ⊢ C:D if B =β C; type conversion Γ ⊢ A : C
Given some λ-expression A and typing context Γ, we say the λ-expression A is “Well-typed” according to the context Γ if and only if there exists a λ-kind or λ-expression B such that Γ ⊢ A : B.
Given some λ-expression A, we call A a “Correct λ-expression of λλ” if and only if A is well-typed according to the empty context.
Remark 1. Observe, in the rule of application, we necessarily have C be a λ-expression (and hence B is one, too). But D may be a λ-kind or a λ-expression. (End of Remark)
Remark 2.
The abstraction rule of inference tells us that semantically parameters
in the context are “equivalent” (in a precise sense) to arguments in a
lambda abstraction. The difference is, to change from the x:T * t:T' := body
to a lambda expression requires us to make the type a lambda-type too:
* [x:T]t : [x:T]T' := body
is the semantically equivalent line.
(End of Remark)
References
- N.G. de Bruijn, “The Mathematical Language Automath, its Usage, and Some of its Extensions”. In Symposium on Automatic Demonstration (Eds. M Laudet, D Lacombe, and M Schuetzenberger), pp 29-61; and reprinted in Selected Papers on Automath, pp 73–100.
- N.G. de Bruijn, “A survey of the project Automath”. In To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (Eds. JP Selding and JR Hindley), pp 579–606. Also in Selected Papers on Automath, pp. 141–161
- Diederik T. van Daalen, “The Language Theory of Automath”. PhD Thesis, pdf 1980.
- Philippe de Groote, “Defining λ-Typed λ-Calculi by Axiomatizing the Typing Relation”. In: 10th Annual Symposium on Theoretical Aspects of Computer Science, STACS’93, (Eds. P. Enjalbert, A. Finkel, and K.W. Wagner), Lecture Notes in Computer Science, Vol. 665, Springer-Verlag (1993), pp. 712-723. Eprint
- Benjamin C. Pierce, Types and Programming Languages. MIT Press, 2002.
- I. Zandleven, “A Verifying Program for Automath”. E1 in Selected Papers on Automath.