Data Structures — Automath

by Alex Nelson, 2 January 2016

Lets examine the basic data structures underpinning Automath. This post could be viewed as an “implementation in pseudocode”. There will be many pictures translating code snippets to “what we want under the hood”.

Fortunately, there are really only a few aspects to the language: statements (“lines”), expressions, and lastly paragraphs. We can start simple, then incorporate more and more features.

Lines
- Contexts
- Functions and Primitive Notions
Expressions
Paragraphs

Lines

There are three types of lines in Automath:

Lines defining functions and constants
Lines introducing primitive notions (“builtins” for a given mathematical theory)
Lines expanding the context (named arguments for function declarations)

There’s a basic structure underlying all these lines, which we’ll examine first. The sole purpose for lines: defining functions (the first two cases listed), or specifying the parameters used in their definition (the last case listed).

Consider the following snippet:

* prop : TYPE := PRIM
* [a:prop] proof : TYPE := PRIM
* term : TYPE := PRIM

We need to keep track of what has already been defined, which could be done in a dictionary. The lines form a doubly linked list:

Lines form a doubly-linked list

Alternatively, in a language like Go where this is seen as bad form, I suppose we could use an array instead:

Lines form an array

The only problem with the array approach is, well, we don’t know how many lines a file has a priori. And remember, a “line” isn’t necessarily a single line of the file, we could rewrite the entire Automath program to be a single-line file.

Punchline: We need some kind of linear collection for lines. The exact details are language-specific (duh), but we don’t need to specify anything so far except ordering the lines matters a great deal: the introduction to the text must occur before the end.

Contexts

The code snippet we have considered is more complicated than it appears. Why? Because it’s really equivalent to the following:

{1}   * prop : TYPE := PRIM
{2}   * a : prop := ---
{3} a * proof : TYPE := PRIM
{4}   * term : TYPE := PRIM

We see line 2 introduces a variable (it’s an expander block line), which creates a new stack of named parameters. For this particular snippet, we have the following data structure representation:

Context stack representation

To see this stack structure more explicitly, consider the following snippet:

# >>> * [x:term][y:term] eq : prop := PRIM
# expands to
{1}   * x : term := ---
{2} x * y : term := ---
{3} y * eq : prop := PRIM

The EB lines (lines 1 and 2) carries with them a stack of identifiers describing the implicit arguments. When y is placed into the context of eq, it borrows the y.args field to determine the parameters needed in defining the predicate.

The y.args field determines the named parameters used in defining eq : prop. In a C/Blub language, this amounts to defining prop eq(term x, term y) function.

Context stack representation

Why keep this list structure? We could simplify things if we copy the args into the function definition, and keep track of identifiers and their types in some “dictionary” global variable (egads!). This has the advantage of keeping things simple, using less memory, and less complex. What could go wrong?

If we have a module system (which is what paragraphs are designed for!), then we have to prevent variable name collisions.

Functions and Primitive Notions

So far we have discussed contexts in depth, but we have yet to investigate functions and primitive notions. We adopt the convention that constants are just functions with zero parameters.

The body is just an expression, which is the next topic of discussion in the next section.

But a primitive notion doesn’t have a body: it’s primitive, after all. It’s just a “builtin function” for a given theory. Consequently, the only thing distinguishing a primitive notion from a defined function is whether the body is present or not.

In pseudocode, the metalanguage can test for such things as follows:

boolean isPrimitive(Definition d) {
    return (d.body==null);
}
int arity(Definition d) {
    return d.args.length;
}
boolean isConst(Definition d) {
    return (arity(d)==0);
}
boolean isFunction(Definition d) {
    return (!isPrimitive(d) && !isConst(d));
}

This lets us use one data structure for two different types of lines!

Lines only Define Functions, or Help us in the Process. This is one punchline we should note. Contexts are lines with an arglist field, more or less. Functions are lines with a context (specifying the named parameters for the function) and a body (telling us what the function does). The body requires previously defined entities, or contexts. The only reason we have contexts is to specify the parameters for functions.

Expressions

There’s four or so expressions in Automath, and it’s far simpler than the complicated structure of lines and contexts. Why is this so? Because Automath boils down to a typed lambda calculus, so we have lambda abstractions and applications. But we also have function calls, since lines just define functions. Morally, there is no semantic difference between lambda abstractions and function definitions.

We need to introduce new types, and have “families of types”. How can we make this more clear and rigorous? We need kinds, i.e., “types of types”. Automath has two: PROP for propositions, and TYPE for…everything else. (AUT-68, under the hood, made PROP=TYPE; but AUT-QE fixed this so they were no longer identical.)

But for our framework, we need a base Expression class.

class Expression {
    String id;
};

Kinds

We could implement this either as a pair of global variables (ugh!) or as a factory method producing identical objects for a given kind. In either event, the object in question must tell us (a) it’s a kind, (b) whether it’s TYPE or PROP, and (c) have some identifier.

class Kind extends Expression {
    String id; // "TYPE" or "PROP"
    int code; // = 0 for PROP, 1 for TYPE
};

Types are defined as primitive notions whose type refers to the specific Kind in question.

Lambda Abstraction

The Automath notation for Lambda Abstraction is [x:type] body instead of the more common “λ x:type . body” we’d see in lectures and books.

Not only can Automath lambda expressions produce values from input, we can go further and produce types. That is to say, we can write something like [x:term]TYPE to specify a function which, given some term (say, x0), will produce a type. Although analogous to dependent product type, there are subtle differences which we can only refer the interested reader to the literature. For us, we’ll just pretend they’re “the same”.

Sweeping all these subtle points under the rug, the basic structure of a lambda abstraction would be:

class LambdaAbstraction extends Expression {
    String variableId;
    Expression *variableType;
    Expression *body;
};

Application

Since Automath just checks expression, we don’t have to worry about the subtle problems of simplification. For example, in Python, (lambda x: x+1) 2 => 3. But Automath would have us write <2>[x:nat]Successor(x) which is semantically Successor(2), and we can stop there…or go further, all the way to the primitive notions in arithmetic since 2:=Successor(Successor(0)).

At minimum, all we appear to need for <x>f is the function f and an argument we’ll apply to it x:

// <x>f is represented by a pair of pointers
class Application extends Expression {
    Expression *fun; // f
    Expression *arg; // x
};

Function Calls

We now have a list of arguments applied to a function, instead of a single argument. I suppose we could just treat it as f(x,y)=<<y>x>f by Currying. There is some subtlety here, since we don’t want to call a function with too many parameters, e.g. eq(x,z,y) should throw an error.

At the same time, Automath allows a “shorthand” convention where, if we call a function with too few parameters, the resulting expression is a function of the remaining parameters needed. More precisely, Automath takes the peculiar convention of treating the parameters as a stack, and applying a function to a smaller list of arguments amounts to “popping” the stack. So if we had a function f(x,y,z), then we could construct a function g(x,y) by simply passing in z: g(x,y) = f(z).

In pseudocode, we would have

Expression call(Definition fun, Expression[] args) {
    // precondition: arity must not be less than number of arguments
    assert(arity(fun)<args.length)

    if (arity(fun) == args.length) {
        return new Term(fun, args);
    }
    else {
        Expression[arity(fun)] paddedParameters = new Expression[arity(fun)];
        int numberUnfixedParameters = arity(fun)-args.length
        
        // Copy over the first 'arity(fun)-args.length' named parameters
        for(int i=0; i<numberUnfixedParameters; i++)
            paddedParameters[i] = fun.arguments()[i];
        
        // Copy over the arguments passed in
        for(int i=0; i<args.length; i++)
            paddedParameters[i+numberUnfixedParameters] = args[i]
        
        return new Term(fun, paddedParameters);
    }
}

Paragraphs

This is the last thing we need to discuss before calling it a day: paragraphs. They form a crude module system for Automath. We can have them nested arbitrarily deep, we can re-open them later on, and we expect it to behave like a running linear list of lines.

Lets give some examples of how this would look.

  * l0
  * l1
+ p # open paragraph p
  * l2
  * l3
- p # close paragraph p
  * l4

This would amount to the following situation:

Paragraphs as a linked list

But to make things worse, we can re-open paragraphs that have been previously closed. For example:

  * l0
  * l1
+ p # open paragraph p
  * l2
  * l3
- p # close paragraph p
  * l4
  * l5
+* p # reopen paragraph p
  * l6
  * l7
- p # close paragraph p again
  * l8
  * l9

Or graphically, we’d have the following situation (with the paragraph linkage in red):

Paragraphs as a linked list

Paragraphs themselves form a linear list. When we have multiple paragraphs in a row, something like

+ p1
# ...
- p1
+ p2
# ...
- p2
+ p3
# ...
- p3
# etc.

We want to treat the paraph siblings p1, p2, etc., as all living in the same list or array. Why would this matter?

We’d like to keep track of all the subparagraphs contained within a given paragraph. That’s the real motivation for this feature. It makes tracking down a line or function or variable definition easier (or even possible).

Concluding Remarks

We reviewed the basic statement structure in Automath, which amounted to making definitions…or introducing parameters to be used in a definition. Then we examined the basic expressions admitted, namely, function calls, lambda abstractions, and applying a lambda abstraction to an expression.

Although the statement structure is fairly straightforward (after all, there was only one “type” of statement), the expressions were far more complicated. Implementation in a language not permitting inheritance or pointer casting (e.g., Go) may prove challenging.

The only piece of machinery left to examine is how Automath checks expressions for correctness, but this is remarkably simple type checking (is the function passed the correct argument? Is the definition coherent? Etc.). This is the subject for another topic of discussion, however.