Introduction to Automated Theorem Provers

by Alex Nelson, 27 March 2020

“How do I get started with automated theorem provers?”

This question pops up every month or so, and I thought I’d write an answer in one place (because I end up repeating the answer).

The guiding philosophy is to learn the theory and implementation simultaneously.

We could think of a theorem prover as a sort of “interpreter” based on a foundation of mathematics. There are three main camps in the foundations of mathematics: first-order logic + set theory, type theory, and higher-order logic.

Terminology

Before we get started, let’s just clarify some terminology. Some random people on the internet insist “automated theorem provers” differ from “proof checkers” by having some “automation”. This vague claim lacks any basis in reality.

A “proof assistant” is a computer program the user interacts with to construct or verify a proof.

Many, but not all, proof assistants expect user input in the form of “tactics” which manipulate the proof obligations (the claims left to be proven). Such proof obligations are generically referred to as “goals”.

The “proof engine” interacts with the user via tactics, and informs the user of any remaining goals. Then, under the hood, the proof engine uses automation to construct a “proof object” which is then checked by a small “proof checker”.

(Caution: I am being a bit generous with my notion of “proof objects”. Some foundations (type theory) have a particular notion with a specific technical meaning. I’m just referring to any object in memory used during a proof checking procedure as a “proof object”.)

This is the schematic idea underlying automated theorem provers, more or less. There are variations on this theme. Sometimes there is no clear-cut distinction between the proof engine and the checker; they’re blurred into one giant monolithic program.
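To make the picture concrete, here is a minimal sketch in OCaml of what a goal and a tactic might look like. The types and names here are my own illustration, not taken from any particular prover:

```ocaml
(* a toy formula type, just enough to illustrate the idea *)
type formula =
  | Atom of string
  | Imp of formula * formula
  | And of formula * formula

(* a goal is a claim left to prove, possibly under some hypotheses *)
type goal = { hypotheses : formula list; conclusion : formula }

(* a tactic transforms one goal into the list of subgoals that remain *)
type tactic = goal -> goal list

(* example tactic: split a conjunction goal into its two conjuncts *)
let split : tactic = fun g ->
  match g.conclusion with
  | And (a, b) -> [ { g with conclusion = a }; { g with conclusion = b } ]
  | _ -> failwith "split: goal is not a conjunction"
```

A real proof engine would also record enough information along the way to hand a proof object to the checker once no goals remain.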

I drew these terminological distinctions from:

Judgments and Inference Rules

A formal system may be formulated in terms of judgments and rules of inference, and we’re trying to pretend a foundation of mathematics is a formal system, so we need some level of familiarity with “judgments” and “rules of inference”.

These concepts trace back as far as Frege’s Begriffsschrift, though they are most clearly articulated in Per Martin-Löf’s 1983 lectures:

Usually a “judgment” is an n-ary relation in the metalanguage, whose meaning is specified by an inductive definition using rules of inference. The notion of an inductive definition was first introduced in:

This is paywalled, so a free-but-terse introduction to inductive definitions may be found in:

An honorable mention is:

Harper’s focus is more on reasoning about programming languages, but he introduces the notions of “judgment” and “rule of inference” in the first few chapters of his book.

A proof assistant is then just a computer program which assists us in chaining inference rules together to form correct proofs.
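As a toy example (my own, not drawn from the references above): the judgment “n is even” is given by two rules of inference, “0 is even” and “if n is even, then n + 2 is even”. Derivations built from these rules can be represented as an OCaml datatype, and checking a derivation is just verifying the rules were chained correctly:

```ocaml
(* derivations of the judgment "n is even", one constructor per inference rule *)
type derivation =
  | EvenZero                         (* rule: 0 is even *)
  | EvenPlusTwo of int * derivation  (* rule: if n is even, then n + 2 is even *)

(* check that a derivation really establishes "n is even" *)
let rec checks (d : derivation) (n : int) : bool =
  match d with
  | EvenZero -> n = 0
  | EvenPlusTwo (m, d') -> n = m + 2 && checks d' m

(* a derivation of "4 is even": chain the rules through 0, 2, 4 *)
let four_is_even = EvenPlusTwo (2, EvenPlusTwo (0, EvenZero))
let _ = assert (checks four_is_even 4)
```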

Getting Started

The first book I recommend reading:

Buy it, read it, and work through it. Implement your own version of his code in a different statically-typed functional programming language.

(I always recommend the reader pick a language like Haskell, F#, or Standard ML and implement the same algorithms themselves. This is a good strategy for learning from the literature in general: try to implement the material from scratch in a different programming language than the book uses.)

Harrison implements quite a few algorithms in OCaml, and in chapter 6 implements an LCF-style prover for a Hilbert proof system atop first-order logic.
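The LCF idea is worth internalizing early: the kernel exports an abstract type of theorems, so the only way to build a theorem is through the kernel’s inference rules. Here is a rough toy sketch in OCaml; it is my own illustration, not Harrison’s code:

```ocaml
type formula = Atom of string | Imp of formula * formula

module Kernel : sig
  type thm                                 (* abstract: clients cannot forge theorems *)
  val axiom_k : formula -> formula -> thm  (* |- p -> (q -> p) *)
  val modus_ponens : thm -> thm -> thm     (* from |- p -> q and |- p, conclude |- q *)
  val concl : thm -> formula
end = struct
  type thm = formula
  let axiom_k p q = Imp (p, Imp (q, p))
  let modus_ponens pq p =
    match pq with
    | Imp (a, b) when a = p -> b
    | _ -> failwith "modus_ponens: antecedent does not match"
  let concl th = th
end
```

Everything else (tactics, search, automation) lives outside the kernel and can only produce theorems by calling these functions, so bugs elsewhere cannot manufacture false theorems.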

The only minor qualms I have are:

  1. Harrison is more bullish on model theory than me. But that’s like saying, “He’s a vegetarian”.
  2. There are a few times where I wish Harrison would add a couple sentences explaining where we’re heading.

As a supplement to chapter 3 of Harrison, I highly recommend:

Chang and Lee focus on the resolution algorithm and several optimizations of it. Harrison does an alright job explaining it, but I like seeing a complementary explanation alongside Harrison’s.
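The heart of the method is the resolution rule: from clauses C ∪ {p} and D ∪ {¬p}, infer the resolvent C ∪ D. Here is a small propositional sketch in OCaml; it is my own illustration, not Chang and Lee’s or Harrison’s code:

```ocaml
type literal = Pos of string | Neg of string
type clause = literal list   (* a clause is a disjunction of literals *)

let negate = function Pos p -> Neg p | Neg p -> Pos p

(* resolve clauses [c] and [d] on the literal [lit], if both contain the right pieces *)
let resolve (c : clause) (d : clause) (lit : literal) : clause option =
  if List.mem lit c && List.mem (negate lit) d then
    Some (List.filter (fun l -> l <> lit) c
          @ List.filter (fun l -> l <> negate lit) d)
  else None

(* example: resolving {p, q} with {~p, r} on p yields {q, r} *)
let _ = resolve [ Pos "p"; Pos "q" ] [ Neg "p"; Pos "r" ] (Pos "p")
```

A refutation proof then keeps resolving until the empty clause appears; the optimizations Chang and Lee discuss are largely about controlling which pairs of clauses get resolved.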

A few honorable mentions:

Gallier’s book discusses various algorithms for theorem provers working on first-order logic, using pidgin Pascal pseudocode.

Fitting’s book is a graduate-level presentation of similar material.

Further Logic Reading

If the reader is unfamiliar with mathematical logic, I can recommend a few books to brush up on it.

Note we could also view proof assistants as a sort of “digital Hilbert programme”, where the “finitary metatheory” has been replaced by a programming language.

Set Theory reading?

As far as the axioms of set theory go, they are almost all variations on ZFC. Some include classes (NBG, MK, etc.).

If we want to do category theory, then we need some way to iterate this procedure of “We have a notion of a collection Coll, we need the collection of all Coll so we form a new notion Coll+. But now we need the collection of all Coll+, so we form a new notion Coll++…” — this is precisely Tarski–Grothendieck set theory (= ZFC + axiom that for any set there exists a Grothendieck universe containing it).

Without Tarski–Grothendieck set theory, we would run into problems trying to formalize Kan extensions (and other stuff).
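For reference, a Grothendieck universe U is a transitive set closed under the basic set-forming operations, and the Tarski–Grothendieck axiom says every set lives inside one. Roughly (a sketch; the precise closure conditions vary slightly between presentations):

```latex
\begin{aligned}
& x \in U \wedge y \in x \implies y \in U && \text{(transitivity)}\\
& x, y \in U \implies \{x, y\} \in U && \text{(pairing)}\\
& x \in U \implies \mathcal{P}(x) \in U && \text{(power sets)}\\
& I \in U \wedge (\forall i \in I.\; x_i \in U) \implies \textstyle\bigcup_{i \in I} x_i \in U && \text{(unions of small families)}\\[4pt]
& \textbf{Axiom: } \forall x\, \exists U\, \bigl(x \in U \wedge U \text{ is a Grothendieck universe}\bigr)
\end{aligned}
```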

But since everything is a variation on ZFC, we could refer the curious reader to:

It might be a fun exercise to implement a proof assistant using first-order logic plus set theory which lets the user allow or deny the use of certain axioms. If a proof exists which does not use, say, the axiom of choice, then there is a way to provide it. When the user denies the axiom of choice and tries to use a theorem which requires it, an error is raised.

This could allow investigating Feferman universes versus Grothendieck universes (versus Tarski universes), predicative versus impredicative, intuitionistic versus classical, etc. etc. etc.
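A minimal sketch of the axiom-toggling idea in OCaml; everything here (the `Axioms` module, `deny`, `use`) is made up for illustration, not part of any existing prover:

```ocaml
(* a registry of axioms the user has denied for the current development *)
module Axioms = struct
  let denied : (string, unit) Hashtbl.t = Hashtbl.create 8

  let deny name = Hashtbl.replace denied name ()

  (* every proof step that appeals to an axiom must go through [use] *)
  let use name =
    if Hashtbl.mem denied name then
      failwith ("axiom '" ^ name ^ "' has been denied in this development")
end

(* usage: deny choice, then any theorem whose proof calls [Axioms.use "choice"] raises *)
let () = Axioms.deny "choice"
```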

For a review of the set theory needed for category theory, see:

Type Theory Foundations

There are other foundations of mathematics besides logic. Type theory is the other major family of foundations. Basically, type theory is lambda calculus plus some additional structure. A good introduction which takes the approach of adding structure iteratively:

Pierce’s book is written for computer scientists interested in using type theory in programming language theory. I think it beautifully introduces a good fragment of the lambda cube, which coincides with several different foundations of mathematics. But since the application to the foundations of mathematics is not made explicit, this connection may be too subtle to be appreciated.
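The “lambda calculus plus structure” slogan, in a nutshell: here is a toy OCaml rendering of the first step of the progression (my own sketch, not Pierce’s code):

```ocaml
(* untyped lambda calculus: the starting point *)
type term =
  | Var of string
  | Lam of string * term
  | App of term * term

(* the first layer of added structure: simple types *)
type ty =
  | Base of string      (* a base type, e.g. "bool" or "nat" *)
  | Arrow of ty * ty    (* function types t1 -> t2 *)
```

Each further corner of the lambda cube adds another kind of abstraction: terms depending on types, types depending on types, and types depending on terms.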

A follow-up which focuses on applying the lambda cube to the foundations of mathematics, specifically formal proofs, is:

The focus in Nederpelt and Geuvers is specifically formal proof in the sense of theorem provers.

Logical Frameworks

I would also recommend, after reading Pierce’s book, reading:

Dependent types give us sufficient power to create a “logical framework”: a sort of language we can use to describe any logical system. This uses a profound connection called the Curry-Howard isomorphism.

Loosely, the Curry-Howard isomorphism comes in several flavors. For logical frameworks, propositions are represented by types. An implication connective is represented by a dependent type (a generalization of the function type), and modus ponens is encoded by function application. Other type operators encode other logical connectives, and type theory gives us basic rules of inference for free. (It turns out logical frameworks can encode any logic we want.)
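In the simply-typed fragment the slogan is easy to see. Here is a tiny OCaml illustration of mine (not drawn from the logical-frameworks literature):

```ocaml
(* propositions are types; a proof of P is a value of type P *)

(* implication P -> Q is the function type, so modus ponens is application *)
let modus_ponens (impl : 'p -> 'q) (proof_of_p : 'p) : 'q = impl proof_of_p

(* conjunction is the pair type, disjunction is the sum type *)
type ('p, 'q) conj = 'p * 'q
type ('p, 'q) disj = Left of 'p | Right of 'q

(* e.g. a proof of (P and Q) -> P is just the first projection *)
let and_elim_left ((p, _) : ('p, 'q) conj) : 'p = p
```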

There are two major logical frameworks I think are worth looking into further: LF and Automath.

LF

Also called Edinburgh LF, this is the more recent of the two. It was introduced in the paper:

This is a bit involved, and some of it is confusing (e.g., there is a lambda abstraction for types, but it is never used). The basic idea is to provide a formal language for describing logics. Judgments are represented as types, and rules of inference as functions. A proof is then a program. (The Twelf theorem prover implements these ideas.)

A couple gentler review articles:

As I said, the original formulation of LF was rather baroque, with parts of it not even used. What’s trickier is that we need to work with “canonical forms”. There’s a version of LF where substitution keeps everything in canonical form for us, simplifying life. This was presented as “Canonical LF” in the paper:

This paper also gives an example of using Twelf to formalize the semantics of a programming language.

Automath

I would also recommend, for the intrepid reader, another book:

This is the selected papers of Nick de Bruijn, who invented the Automath proof checker back in the ’60s. It’s one of the earliest programmes on formalizing mathematics. Automath is the first logical framework invented. De Bruijn independently discovered the Curry-Howard isomorphism, and describes it quite beautifully in a number of articles contained in the book.

Freek Wiedijk has implemented Automath in portable C and has some examples of foundations formalized in Automath:

This can be related to more modern type theories, which is done in the article:

One word of caution, though: Automath is based on a lambda-typed lambda calculus, which is rather esoteric now. Most dependently-typed lambda calculi use Pi types instead. A simple lambda-typed lambda calculus is introduced in:

Calculus of Constructions

The “highest corner” of the lambda cube is called the calculus of constructions. Pierce’s TAPL is a decent introduction to it.

Instead of merely “encoding” a logic (as a logical framework does), the calculus of constructions handles a “deep embedding” of logics.

As for implementing a theorem prover based on it, a good paper:

The famous theorem prover based on the Calculus of Constructions is, of course, Coq. For its foundations and usage, see:

Homotopy Type Theory (briefly)

This is the new kid on the block, invented some time around 2006 (though its origins can be dated back to the days of yore, i.e., 1998). Initially HoTT (as Homotopy Type Theory is called) used Coq as its preferred theorem prover, then Lean took over.

The theory underpinning HoTT is rather complicated. The simplest version: when we have an equality t = u, we have a proposition. The Curry-Howard correspondence encodes this as a type. HoTT replaces this equality with a slightly weaker equivalence relation. As far as I can tell, a type is viewed as a topological space, and the weaker equivalence relation is encoded as a claim of homotopy equivalence.

I’m brushing a bit under the rug here. For example, it is propositional equality (what de Bruijn called “book equality”) which gives us a proposition t = u, as opposed to definitional (judgemental) equality, which defines the identifier t to be u. This distinction is critical in the calculus of constructions (it was first introduced in Automath by Nick de Bruijn). HoTT replaces ‘book equality’ with another notion of equivalence.
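In symbols (standard notation, a rough sketch rather than any particular book’s conventions):

```latex
\begin{aligned}
t &:\equiv u && \text{definitional (judgemental) equality: a definition, not a claim to prove}\\
p &: \mathrm{Id}_A(t, u) && \text{propositional (``book'') equality: a type whose inhabitants are proofs}
\end{aligned}
```

HoTT then reads Id_A(t, u) as the space of paths from t to u in the space A.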

There are still some difficulties with HoTT, and despite the hype I’m not certain it’s the panacea its advocates claim. (The difficulties surrounding HoTT have to do with its models, such as cubical sets, which I’m certain will be ironed out in the coming years.) I’ve yet to see any paper discuss explicitly how HoTT simplifies theorem prover implementations, or why it is a boon. There is a vast literature on HoTT, so I can’t say such a paper doesn’t exist: I just haven’t found it yet.

In fact, the most visible exploration of HoTT in a proof assistant currently seems to be the 1Lab experiment, built on Agda.

At the time I wrote this initially, the only book on the topic of HoTT was the HoTT book. There have been a couple books introduced more recently, for example:

Higher Order Logic

The third family of foundations of mathematics, higher-order logic, may be thought of in terms of type theory: a simply-typed lambda calculus with base types prop and indiv for propositions and non-logical constants. A predicate is then a term of type indiv -> prop (or, more generally, a -> prop for any type a), and a quantifier is a term of type (a -> prop) -> prop, taking a predicate and producing a proposition.
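Spelled out as a toy OCaml datatype (my own sketch of the type syntax, not any particular HOL implementation):

```ocaml
(* the simple types of higher-order logic *)
type hol_type =
  | Prop                           (* the type of propositions *)
  | Indiv                          (* the type of individuals *)
  | Arrow of hol_type * hol_type   (* function types *)

(* a predicate on individuals: indiv -> prop *)
let predicate_ty = Arrow (Indiv, Prop)

(* a quantifier over type [a]: (a -> prop) -> prop *)
let quantifier_ty a = Arrow (Arrow (a, Prop), Prop)

(* e.g. the universal quantifier over individuals *)
let forall_indiv_ty = quantifier_ty Indiv
```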

An introduction to second-order logic:

It’s rather eye-opening how mathematicians implicitly work in some fragment of second-order logic, and Shapiro provides several damning examples.

Another presentation of higher-order logic from the category theory perspective may be found in:

For a “no nonsense” presentation of higher-order logic and first-order logic in a Hilbert proof calculus, see:

HOL Light is based on a slight variant of this presentation. John Harrison has put a lot of his publications online, and a good chunk are related to his HOL Light prover.

As far as implementation, there is the classic (but out-of-print and horribly expensive) book:

Some of it has been revised and maintained for current HOL implementations as the HOL Documentation.

Metalogical frameworks

Note that HOL forms a “metalogical framework” — logical frameworks allowed us to define a logic and emulate reasoning with that logic. But how would we be able to prove things like “Logic A has property P”? A logical framework cannot do this.

This might not sound exciting, but it’s useful to be able to formally prove things about a logic, like soundness.

In fact, CakeML seems to be using this aspect of HOL to prove the correctness of its implementation, as well as various other “niceness properties” of its language and toolchain.

HOL as a logical framework

Every metalogical framework is a logical framework with some extra power.

But we can use a fragment of intuitionistic HOL to form a logical framework. This is precisely what Isabelle’s Pure is: a fragment of intuitionistic higher-order logic suitable for formalizing other logics.

When we implement logic X in Isabelle, we refer to the implementation as “Isabelle/X”.

It’s great for things like exploring the design space of proof assistants and the foundations of mathematics.

Changelog

It’s been a while since I first wrote this — I wrote it initially as a response to some questions on a forum, then some emails, and just decided to cobble them all together into a post. But times change, and parts of this post need revising as life moves along.

I guess I should probably keep a log of the amendments made.