# Data Structures — Automath

by Alex Nelson, 2 January 2016

Lets examine the basic data structures underpinning Automath. This post could be viewed as an “implementation in pseudocode”. There will be many pictures translating code snippets to “what we want under the hood”.

Fortunately, there are really only a few aspects to the language: statements (“lines”), expressions, and lastly paragraphs. We can start simple, then incorporate more and more features.

# Lines

There are three types of lines in Automath:

- Lines defining functions and constants
- Lines introducing primitive notions (“builtins” for a given mathematical theory)
- Lines expanding the context (named arguments for function declarations)

There’s a basic structure underlying all these lines, which we’ll examine first. The sole purpose for lines: defining functions (the first two cases listed), or specifying the parameters used in their definition (the last case listed).

Consider the following snippet:

```
* prop : TYPE := PRIM
* [a:prop] proof : TYPE := PRIM
* term : TYPE := PRIM
```

We need to keep track of what has already been defined, which could be done in a dictionary. The lines form a doubly linked list:

Alternatively, in a language like Go where this is seen as bad form, I suppose we could use an array instead:

The only problem with the array approach is, well, we don’t know how
many lines a file has *a priori*. And remember, a “line” isn’t
necessarily a single line of the file, we could rewrite the entire
Automath program to be a single-line file.

**Punchline:** We need some kind of linear collection for `line`

s. The
exact details are language-specific (duh), but we don’t need to specify
anything so far except ordering the lines matters a great deal: the
introduction to the text must occur *before* the end.

## Contexts

The code snippet we have considered is more complicated than it appears. Why? Because it’s really equivalent to the following:

```
{1} * prop : TYPE := PRIM
{2} * a : prop := ---
{3} a * proof : TYPE := PRIM
{4} * term : TYPE := PRIM
```

We see line 2 introduces a variable (it’s an *expander block* line),
which creates a new *stack* of named parameters. For this particular
snippet, we have the following data structure representation:

To see this stack structure more explicitly, consider the following snippet:

```
# >>> * [x:term][y:term] eq : prop := PRIM
# expands to
{1} * x : term := ---
{2} x * y : term := ---
{3} y * eq : prop := PRIM
```

The EB lines (lines 1 and 2) carries with them a stack of identifiers
describing the implicit arguments. When `y`

is placed into the context
of `eq`

, it borrows the `y.args`

field to determine the parameters
needed in defining the predicate.

The `y.args`

field determines the named parameters used in defining
`eq : prop`

. In a C/Blub language, this amounts to defining ```
prop
eq(term x, term y)
```

function.

**Why keep this list structure?** We could simplify things if we copy
the `args`

into the function definition, and keep track of identifiers
and their types in some “dictionary” global variable (egads!). This has
the advantage of keeping things simple, using less memory, and less
complex. What could go wrong?

If we have a module system (which *is* what paragraphs are designed
for!), then we have to prevent variable name collisions.

## Functions and Primitive Notions

So far we have discussed contexts in depth, but we have yet to investigate functions and primitive notions. We adopt the convention that constants are just functions with zero parameters.

The `body`

is just an expression, which is the next topic of
discussion in the next section.

But a primitive notion doesn’t have a body: it’s *primitive*, after
all. It’s just a “builtin function” for a given theory. Consequently,
the only thing distinguishing a primitive notion from a defined function
is whether the body is present or not.

In pseudocode, the metalanguage can test for such things as follows:

```
boolean isPrimitive(Definition d) {
return (d.body==null);
}
int arity(Definition d) {
return d.args.length;
}
boolean isConst(Definition d) {
return (arity(d)==0);
}
boolean isFunction(Definition d) {
return (!isPrimitive(d) && !isConst(d));
}
```

This lets us use one data structure for two different types of lines!

**Lines only Define Functions, or Help us in the Process.**
This is one punchline we should note. Contexts are lines with an
`arglist`

field, more or less. Functions are lines with a context
(specifying the named parameters for the function) and a body (telling
us what the function *does*). The body requires previously defined
entities, or contexts. The only reason we have contexts *is to specify
the parameters for functions*.

# Expressions

There’s four or so expressions in Automath, and it’s far simpler than
the complicated structure of lines and contexts. Why is this so? Because
Automath boils down to a typed lambda calculus, so we have lambda
abstractions and applications. But we also have function calls, since
lines just define functions. Morally, there is no *semantic* difference
between lambda abstractions and function definitions.

We need to introduce new types, and have “families of types”. How can we
make this more clear and rigorous? We need kinds, i.e., “types of
types”. Automath has two: `PROP`

for propositions, and `TYPE`

for…everything else. (AUT-68, under the hood, made `PROP=TYPE`

; but
AUT-QE fixed this so they were no longer identical.)

But for our framework, we need a base `Expression`

class.

```
class Expression {
String id;
};
```

## Kinds

We could implement this either as a pair of global variables (ugh!) or
as a factory method producing identical objects for a given kind. In
either event, the object in question must tell us (a) it’s a kind, (b)
whether it’s `TYPE`

or `PROP`

, and (c) have some identifier.

```
class Kind extends Expression {
String id; // "TYPE" or "PROP"
int code; // = 0 for PROP, 1 for TYPE
};
```

Types are defined as primitive notions whose `type`

refers to the
specific `Kind`

in question.

## Lambda Abstraction

The Automath notation for Lambda Abstraction is `[x:type] body`

instead
of the more common “λ x:type . body” we’d see in lectures and books.

Not only can Automath lambda expressions produce *values* from input, we
can go further and produce *types*. That is to say, we can write
something like `[x:term]TYPE`

to specify a function which, given some
term (say, `x0`

), will produce a type. Although analogous to
dependent product type,
there are subtle differences which we can only refer the interested
reader to the literature. For us, we’ll just pretend they’re “the same”.

Sweeping all these subtle points under the rug, the basic structure of a lambda abstraction would be:

```
class LambdaAbstraction extends Expression {
String variableId;
Expression *variableType;
Expression *body;
};
```

## Application

Since Automath just *checks* expression, we don’t have to worry about
the subtle problems of simplification. For example, in Python, `(lambda x: x+1) 2 => 3`

.
But Automath would have us write `<2>[x:nat]Successor(x)`

which is
semantically `Successor(2)`

, and we can stop there…or go further, all
the way to the primitive notions in arithmetic since `2:=Successor(Successor(0))`

.

At minimum, all we appear to need for `<x>f`

is the function `f`

and an
argument we’ll apply to it `x`

:

```
// <x>f is represented by a pair of pointers
class Application extends Expression {
Expression *fun; // f
Expression *arg; // x
};
```

## Function Calls

We now have a list of arguments applied to a function, instead of a
single argument. I suppose we could just treat it as `f(x,y)=<<y>x>f`

by
Currying. There is some subtlety here, since we don’t want to call a
function with too many parameters, e.g. `eq(x,z,y)`

should throw an
error.

At the same time, Automath allows a “shorthand” convention where, if we
call a function with too few parameters, the resulting expression is a
function of the remaining parameters needed. More precisely, Automath
takes the peculiar convention of treating the parameters as a stack, and
applying a function to a smaller list of arguments amounts to “popping”
the stack. So if we had a function `f(x,y,z)`

, then we could construct a
function `g(x,y)`

by simply passing in `z`

: `g(x,y) = f(z)`

.

In pseudocode, we would have

```
Expression call(Definition fun, Expression[] args) {
// precondition: arity must not be less than number of arguments
assert(arity(fun)<args.length)
if (arity(fun) == args.length) {
return new Term(fun, args);
}
else {
Expression[arity(fun)] paddedParameters = new Expression[arity(fun)];
int numberUnfixedParameters = arity(fun)-args.length
// Copy over the first 'arity(fun)-args.length' named parameters
for(int i=0; i<numberUnfixedParameters; i++)
paddedParameters[i] = fun.arguments()[i];
// Copy over the arguments passed in
for(int i=0; i<args.length; i++)
paddedParameters[i+numberUnfixedParameters] = args[i]
return new Term(fun, paddedParameters);
}
}
```

# Paragraphs

This is the last thing we need to discuss before calling it a day: paragraphs. They form a crude module system for Automath. We can have them nested arbitrarily deep, we can re-open them later on, and we expect it to behave like a running linear list of lines.

Lets give some examples of how this would look.

```
* l0
* l1
+ p # open paragraph p
* l2
* l3
- p # close paragraph p
* l4
```

This would amount to the following situation:

But to make things worse, we can re-open paragraphs that have been previously closed. For example:

```
* l0
* l1
+ p # open paragraph p
* l2
* l3
- p # close paragraph p
* l4
* l5
+* p # reopen paragraph p
* l6
* l7
- p # close paragraph p again
* l8
* l9
```

Or graphically, we’d have the following situation (with the paragraph linkage in red):

**Paragraphs themselves form a linear list.**
When we have multiple paragraphs in a row, something like

```
+ p1
# ...
- p1
+ p2
# ...
- p2
+ p3
# ...
- p3
# etc.
```

We want to treat the paraph siblings `p1`

, `p2`

, etc., as all living in
the same list or array. Why would this matter?

We’d like to keep track of all the subparagraphs contained within a
given paragraph. That’s the real motivation for this feature. It makes
tracking down a line or function or variable definition *easier* (or
even *possible*).

# Concluding Remarks

We reviewed the basic statement structure in Automath, which amounted to making definitions…or introducing parameters to be used in a definition. Then we examined the basic expressions admitted, namely, function calls, lambda abstractions, and applying a lambda abstraction to an expression.

Although the statement structure is fairly straightforward (after all, there was only one “type” of statement), the expressions were far more complicated. Implementation in a language not permitting inheritance or pointer casting (e.g., Go) may prove challenging.

The only piece of machinery left to examine is how Automath checks expressions for correctness, but this is remarkably simple type checking (is the function passed the correct argument? Is the definition coherent? Etc.). This is the subject for another topic of discussion, however.