Data Structures — Automath
by Alex Nelson, 2 January 2016
Lets examine the basic data structures underpinning Automath. This post could be viewed as an “implementation in pseudocode”. There will be many pictures translating code snippets to “what we want under the hood”.
Fortunately, there are really only a few aspects to the language: statements (“lines”), expressions, and lastly paragraphs. We can start simple, then incorporate more and more features.
Lines
There are three types of lines in Automath:
- Lines defining functions and constants
- Lines introducing primitive notions (“builtins” for a given mathematical theory)
- Lines expanding the context (named arguments for function declarations)
There’s a basic structure underlying all these lines, which we’ll examine first. The sole purpose for lines: defining functions (the first two cases listed), or specifying the parameters used in their definition (the last case listed).
Consider the following snippet:
* prop : TYPE := PRIM
* [a:prop] proof : TYPE := PRIM
* term : TYPE := PRIM
We need to keep track of what has already been defined, which could be done in a dictionary. The lines form a doubly linked list:
Alternatively, in a language like Go where this is seen as bad form, I suppose we could use an array instead:
The only problem with the array approach is, well, we don’t know how many lines a file has a priori. And remember, a “line” isn’t necessarily a single line of the file, we could rewrite the entire Automath program to be a single-line file.
Punchline: We need some kind of linear collection for line
s. The
exact details are language-specific (duh), but we don’t need to specify
anything so far except ordering the lines matters a great deal: the
introduction to the text must occur before the end.
Contexts
The code snippet we have considered is more complicated than it appears. Why? Because it’s really equivalent to the following:
{1} * prop : TYPE := PRIM
{2} * a : prop := ---
{3} a * proof : TYPE := PRIM
{4} * term : TYPE := PRIM
We see line 2 introduces a variable (it’s an expander block line), which creates a new stack of named parameters. For this particular snippet, we have the following data structure representation:
To see this stack structure more explicitly, consider the following snippet:
# >>> * [x:term][y:term] eq : prop := PRIM
# expands to
{1} * x : term := ---
{2} x * y : term := ---
{3} y * eq : prop := PRIM
The EB lines (lines 1 and 2) carries with them a stack of identifiers
describing the implicit arguments. When y
is placed into the context
of eq
, it borrows the y.args
field to determine the parameters
needed in defining the predicate.
The y.args
field determines the named parameters used in defining
eq : prop
. In a C/Blub language, this amounts to defining prop
eq(term x, term y)
function.
Why keep this list structure? We could simplify things if we copy
the args
into the function definition, and keep track of identifiers
and their types in some “dictionary” global variable (egads!). This has
the advantage of keeping things simple, using less memory, and less
complex. What could go wrong?
If we have a module system (which is what paragraphs are designed for!), then we have to prevent variable name collisions.
Functions and Primitive Notions
So far we have discussed contexts in depth, but we have yet to investigate functions and primitive notions. We adopt the convention that constants are just functions with zero parameters.
The body
is just an expression, which is the next topic of
discussion in the next section.
But a primitive notion doesn’t have a body: it’s primitive, after all. It’s just a “builtin function” for a given theory. Consequently, the only thing distinguishing a primitive notion from a defined function is whether the body is present or not.
In pseudocode, the metalanguage can test for such things as follows:
boolean isPrimitive(Definition d) {
return (d.body==null);
}
int arity(Definition d) {
return d.args.length;
}
boolean isConst(Definition d) {
return (arity(d)==0);
}
boolean isFunction(Definition d) {
return (!isPrimitive(d) && !isConst(d));
}
This lets us use one data structure for two different types of lines!
Lines only Define Functions, or Help us in the Process.
This is one punchline we should note. Contexts are lines with an
arglist
field, more or less. Functions are lines with a context
(specifying the named parameters for the function) and a body (telling
us what the function does). The body requires previously defined
entities, or contexts. The only reason we have contexts is to specify
the parameters for functions.
Expressions
There’s four or so expressions in Automath, and it’s far simpler than the complicated structure of lines and contexts. Why is this so? Because Automath boils down to a typed lambda calculus, so we have lambda abstractions and applications. But we also have function calls, since lines just define functions. Morally, there is no semantic difference between lambda abstractions and function definitions.
We need to introduce new types, and have “families of types”. How can we
make this more clear and rigorous? We need kinds, i.e., “types of
types”. Automath has two: PROP
for propositions, and TYPE
for…everything else. (AUT-68, under the hood, made PROP=TYPE
; but
AUT-QE fixed this so they were no longer identical.)
But for our framework, we need a base Expression
class.
class Expression {
String id;
};
Kinds
We could implement this either as a pair of global variables (ugh!) or
as a factory method producing identical objects for a given kind. In
either event, the object in question must tell us (a) it’s a kind, (b)
whether it’s TYPE
or PROP
, and (c) have some identifier.
class Kind extends Expression {
String id; // "TYPE" or "PROP"
int code; // = 0 for PROP, 1 for TYPE
};
Types are defined as primitive notions whose type
refers to the
specific Kind
in question.
Lambda Abstraction
The Automath notation for Lambda Abstraction is [x:type] body
instead
of the more common “λ x:type . body” we’d see in lectures and books.
Not only can Automath lambda expressions produce values from input, we
can go further and produce types. That is to say, we can write
something like [x:term]TYPE
to specify a function which, given some
term (say, x0
), will produce a type. Although analogous to
dependent product type,
there are subtle differences which we can only refer the interested
reader to the literature. For us, we’ll just pretend they’re “the same”.
Sweeping all these subtle points under the rug, the basic structure of a lambda abstraction would be:
class LambdaAbstraction extends Expression {
String variableId;
Expression *variableType;
Expression *body;
};
Application
Since Automath just checks expression, we don’t have to worry about
the subtle problems of simplification. For example, in Python, (lambda x: x+1) 2 => 3
.
But Automath would have us write <2>[x:nat]Successor(x)
which is
semantically Successor(2)
, and we can stop there…or go further, all
the way to the primitive notions in arithmetic since 2:=Successor(Successor(0))
.
At minimum, all we appear to need for <x>f
is the function f
and an
argument we’ll apply to it x
:
// <x>f is represented by a pair of pointers
class Application extends Expression {
Expression *fun; // f
Expression *arg; // x
};
Function Calls
We now have a list of arguments applied to a function, instead of a
single argument. I suppose we could just treat it as f(x,y)=<<y>x>f
by
Currying. There is some subtlety here, since we don’t want to call a
function with too many parameters, e.g. eq(x,z,y)
should throw an
error.
At the same time, Automath allows a “shorthand” convention where, if we
call a function with too few parameters, the resulting expression is a
function of the remaining parameters needed. More precisely, Automath
takes the peculiar convention of treating the parameters as a stack, and
applying a function to a smaller list of arguments amounts to “popping”
the stack. So if we had a function f(x,y,z)
, then we could construct a
function g(x,y)
by simply passing in z
: g(x,y) = f(z)
.
In pseudocode, we would have
Expression call(Definition fun, Expression[] args) {
// precondition: arity must not be less than number of arguments
assert(arity(fun)<args.length)
if (arity(fun) == args.length) {
return new Term(fun, args);
}
else {
Expression[arity(fun)] paddedParameters = new Expression[arity(fun)];
int numberUnfixedParameters = arity(fun)-args.length
// Copy over the first 'arity(fun)-args.length' named parameters
for(int i=0; i<numberUnfixedParameters; i++)
paddedParameters[i] = fun.arguments()[i];
// Copy over the arguments passed in
for(int i=0; i<args.length; i++)
paddedParameters[i+numberUnfixedParameters] = args[i]
return new Term(fun, paddedParameters);
}
}
Paragraphs
This is the last thing we need to discuss before calling it a day: paragraphs. They form a crude module system for Automath. We can have them nested arbitrarily deep, we can re-open them later on, and we expect it to behave like a running linear list of lines.
Lets give some examples of how this would look.
* l0
* l1
+ p # open paragraph p
* l2
* l3
- p # close paragraph p
* l4
This would amount to the following situation:
But to make things worse, we can re-open paragraphs that have been previously closed. For example:
* l0
* l1
+ p # open paragraph p
* l2
* l3
- p # close paragraph p
* l4
* l5
+* p # reopen paragraph p
* l6
* l7
- p # close paragraph p again
* l8
* l9
Or graphically, we’d have the following situation (with the paragraph linkage in red):
Paragraphs themselves form a linear list. When we have multiple paragraphs in a row, something like
+ p1
# ...
- p1
+ p2
# ...
- p2
+ p3
# ...
- p3
# etc.
We want to treat the paraph siblings p1
, p2
, etc., as all living in
the same list or array. Why would this matter?
We’d like to keep track of all the subparagraphs contained within a given paragraph. That’s the real motivation for this feature. It makes tracking down a line or function or variable definition easier (or even possible).
Concluding Remarks
We reviewed the basic statement structure in Automath, which amounted to making definitions…or introducing parameters to be used in a definition. Then we examined the basic expressions admitted, namely, function calls, lambda abstractions, and applying a lambda abstraction to an expression.
Although the statement structure is fairly straightforward (after all, there was only one “type” of statement), the expressions were far more complicated. Implementation in a language not permitting inheritance or pointer casting (e.g., Go) may prove challenging.
The only piece of machinery left to examine is how Automath checks expressions for correctness, but this is remarkably simple type checking (is the function passed the correct argument? Is the definition coherent? Etc.). This is the subject for another topic of discussion, however.