Introduction to Automated Theorem Provers

by Alex Nelson, 27 March 2020

“How do I get started with automated theorem provers?”

This question pops up every month or so, and I thought I’d write an answer in one place (because I end up repeating the answer).

The guiding philosophy is to learn the theory and implementation simultaneously.

We could think of a theorem prover as a sort of “interpreter” based on a foundation of mathematics. There are three main camps in the foundations of mathematics: first-order logic + set theory, type theory, and higher-order logic.

Terminology

Before we get started, let’s just clarify some terminology. Some people on the internet insist “automated theorem provers” differ from “proof checkers” by having some degree of “automation”. The claim is too vague to have any real basis; in practice the distinction does not hold up.

A “proof assistant” is a computer program the user interacts with to construct or verify a proof.

Many, but not all, proof assistants expect user input in the form of “tactics” which manipulate the proof obligations (the claims left to be proven). Such proof obligations are generically referred to as “goals”.

The “proof engine” interacts with the user via tactics, and informs the user of any remaining goals. Then, under the hood, the proof engine uses automation to construct a “proof object” which is then checked by a small “proof checker”.
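
To make the division of labor concrete, here is a cartoon of that architecture in OCaml. Everything below (the type names, the intro_tac tactic, the checks function) is made up for illustration; no real prover’s API looks exactly like this, but the shape is recognizable.

```ocaml
type formula = Atom of string | Imp of formula * formula

type goal = { hypotheses : formula list; conclusion : formula }

(* The "proof object" which the small trusted checker validates. *)
type proof =
  | Assume of formula
  | Intro of formula * proof          (* discharge a hypothesis *)

(* A tactic turns one goal into subgoals, plus a recipe for gluing
   proofs of those subgoals into a proof of the original goal. *)
type tactic = goal -> goal list * (proof list -> proof)

(* Example tactic: to prove A ==> B, assume A and prove B. *)
let intro_tac : tactic = fun g ->
  match g.conclusion with
  | Imp (a, b) ->
      ([ { hypotheses = a :: g.hypotheses; conclusion = b } ],
       (function [ p ] -> Intro (a, p) | _ -> failwith "intro_tac"))
  | _ -> failwith "intro_tac: goal is not an implication"

(* The small checker: does this proof object establish the formula? *)
let rec checks (hyps : formula list) (p : proof) (f : formula) : bool =
  match p, f with
  | Assume a, _ -> List.mem a hyps && a = f
  | Intro (a, body), Imp (a', b) -> a = a' && checks (a :: hyps) body b
  | Intro _, _ -> false

(* Proving A ==> A: run the tactic, close the subgoal, check the result. *)
let () =
  let a = Atom "A" in
  let goal0 = { hypotheses = []; conclusion = Imp (a, a) } in
  let subgoals, glue = intro_tac goal0 in
  let proofs = List.map (fun g -> Assume g.conclusion) subgoals in
  assert (checks [] (glue proofs) goal0.conclusion)
```

The trusted part is just the proof datatype and its checker; tactics can be arbitrarily clever without endangering soundness.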

This is, more or less, the schematic idea underlying automated theorem provers. There are variations on this theme: sometimes there is no clear-cut distinction between the proof engine and the checker, and the two are blurred into one giant monolithic program.

I drew these terminological distinctions from:

Getting Started

The first book I recommend reading:

John Harrison, Handbook of Practical Logic and Automated Reasoning (Cambridge University Press, 2009).

Harrison implements quite a few algorithms in OCaml, and in chapter 6 builds an LCF-style prover for a Hilbert-style proof system atop first-order logic.
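
To give a flavor of what “LCF-style” means, here is a toy kernel for the implicational fragment of a Hilbert system. This is a sketch in the spirit of Harrison’s chapter 6, not his actual code: because thm is an abstract type, the only way to manufacture a theorem is through the axiom schemes and modus ponens, so any value of type thm really is a theorem.

```ocaml
module type KERNEL = sig
  type formula = Atom of string | Imp of formula * formula
  type thm                                              (* abstract! *)
  val concl : thm -> formula
  val axiom_k : formula -> formula -> thm               (* p ==> (q ==> p) *)
  val axiom_s : formula -> formula -> formula -> thm    (* the S axiom scheme *)
  val modus_ponens : thm -> thm -> thm
end

module Kernel : KERNEL = struct
  type formula = Atom of string | Imp of formula * formula
  type thm = formula
  let concl th = th
  let axiom_k p q = Imp (p, Imp (q, p))
  let axiom_s p q r =
    Imp (Imp (p, Imp (q, r)), Imp (Imp (p, q), Imp (p, r)))
  let modus_ponens pq p =
    match pq, p with
    | Imp (a, b), a' when a = a' -> b
    | _ -> failwith "modus_ponens: antecedent does not match"
end
```

Everything else (parsers, tactics, decision procedures) lives outside the kernel and can only produce theorems by calling into it.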

I always recommend picking a language like Haskell, F#, or Standard ML and then implementing the same algorithms yourself. This is a good strategy for learning from the literature in general: read the presentation, then try to implement it yourself from scratch.

As a supplement to chapter 3 of Harrison, I highly recommend:

Chin-Liang Chang and Richard Char-Tung Lee, Symbolic Logic and Mechanical Theorem Proving (Academic Press, 1973).

Chang and Lee focus on the resolution algorithm and several optimizations of it. Harrison does an alright job explaining it, but I like seeing a complementary explanation alongside Harrison’s.
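
The core inference rule is easy to state: from clauses P ∨ C1 and ¬P ∨ C2, infer C1 ∨ C2. Here is a propositional sketch of it (first-order resolution additionally needs unification); the names are mine, not Chang and Lee’s.

```ocaml
(* A clause is a list of literals, read as their disjunction. *)
type literal = Pos of string | Neg of string

let negate = function Pos p -> Neg p | Neg p -> Pos p

(* All binary resolvents of two clauses, one per complementary pair. *)
let resolvents (c1 : literal list) (c2 : literal list) : literal list list =
  List.concat_map
    (fun lit ->
      if List.mem (negate lit) c2 then
        [ List.filter (fun l -> l <> lit) c1
          @ List.filter (fun l -> l <> negate lit) c2 ]
      else [])
    c1

(* Example: resolving {P, Q} with {~P, R} yields {Q, R}. *)
let () =
  assert (resolvents [ Pos "P"; Pos "Q" ] [ Neg "P"; Pos "R" ]
          = [ [ Pos "Q"; Pos "R" ] ])
```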

A few honorable mentions:

Gallier’s Logic for Computer Science: Foundations of Automatic Theorem Proving discusses various algorithms for theorem proving in first-order logic, using pidgin Pascal pseudocode.

Fitting’s First-Order Logic and Automated Theorem Proving is a graduate-level presentation of similar material.

Further Logic Reading

If the reader is unfamiliar with mathematical logic, I can recommend a few books to brush up on it.

Judgments and Inference Rules

A formal system may be formulated in terms of judgments and rules of inference, and we’re trying to pretend a foundation of mathematics is a formal system, so we need some level of familiarity with “judgments” and “rules of inference”.

I haven’t found a good single source for these concepts, though they originate with Per Martin-Löf’s 1983 lectures:

Per Martin-Löf, “On the Meanings of the Logical Constants and the Justifications of the Logical Laws”, the Siena lectures of 1983, published in the Nordic Journal of Philosophical Logic 1 (1996).

Usually a “judgment” is an n-ary relation in the metalanguage, whose meaning is specified by an inductive definition using rules of inference. The notion of an inductive definition was first introduced in:

This is paywalled, so a free-but-terse introduction to inductive definitions may be found in:

An honorable mention is:

Robert Harper, Practical Foundations for Programming Languages (2nd ed., Cambridge University Press, 2016).

Harper’s focus is more on reasoning about programming languages, but he introduces the notions of “judgment” and “rule of inference” in the first few chapters of his book.
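
To make the idea concrete, here is a toy judgment “n even”, presented as an inductively defined relation: the derivations are just a datatype, and checking a derivation means checking that each node really is an instance of one of the rules. This is a sketch of my own in OCaml, not something taken from the books above.

```ocaml
(* Two inference rules:
     ---------- (E-Zero)          n even
       0 even                 --------------- (E-SS)
                                n + 2 even
   A derivation is a finite tree built from these rules. *)
type derivation =
  | E_zero                         (* concludes "0 even" *)
  | E_ss of int * derivation       (* from "n even", concludes "n + 2 even" *)

let conclusion = function
  | E_zero -> 0
  | E_ss (n, _) -> n + 2

(* Verify that a purported derivation really follows the rules. *)
let rec check = function
  | E_zero -> true
  | E_ss (n, d) -> check d && conclusion d = n

(* A derivation of "4 even": apply E-SS twice to E-Zero. *)
let four_even = E_ss (2, E_ss (0, E_zero))
let () = assert (check four_even && conclusion four_even = 4)
```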

Type Theory Foundations

There are other foundations of mathematics besides logic. Type theory is the other major family of foundations. Basically, type theory is lambda calculus plus some additional structure. A good introduction, which takes the approach of adding structure iteratively:

Benjamin C. Pierce, Types and Programming Languages (MIT Press, 2002).

Pierce’s book is written for computer scientists interested in using type theory in programming language theory. I think it beautifully introduces a good fragment of the lambda cube, which coincides with several different foundations of mathematics. But since the application to the foundations of mathematics is never made explicit, the connection may be too subtle to appreciate.
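
The first corner of the lambda cube is the simply typed lambda calculus, whose type checker fits in a page. Here is a minimal sketch in OCaml (this is my own code, not Pierce’s, and the surface syntax differs from his):

```ocaml
type ty = TBool | TArrow of ty * ty

type term =
  | Var of string
  | Lam of string * ty * term          (* \x:T. t *)
  | App of term * term
  | True
  | False

exception Type_error of string

(* Compute the type of a term in a typing context, or fail. *)
let rec type_of (ctx : (string * ty) list) (t : term) : ty =
  match t with
  | True | False -> TBool
  | Var x ->
      (try List.assoc x ctx
       with Not_found -> raise (Type_error ("unbound variable " ^ x)))
  | Lam (x, ty_x, body) -> TArrow (ty_x, type_of ((x, ty_x) :: ctx) body)
  | App (f, arg) ->
      (match type_of ctx f with
       | TArrow (dom, cod) when dom = type_of ctx arg -> cod
       | _ -> raise (Type_error "ill-typed application"))

(* (\x:Bool. x) true  has type Bool. *)
let () = assert (type_of [] (App (Lam ("x", TBool, Var "x"), True)) = TBool)
```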

A follow-up which focuses on applying the lambda cube to the foundations of mathematics, specifically formal proofs, is:

Rob Nederpelt and Herman Geuvers, Type Theory and Formal Proof: An Introduction (Cambridge University Press, 2014).

The focus in Nederpelt and Geuvers is specifically formal proof in the sense of theorem provers.

Logical Frameworks

I would also recommend, after reading Pierce’s book, reading:

Dependent types give us sufficient power to create a “logical framework”: a language we can use to describe any logical system. This rests on a profound connection called the Curry-Howard Isomorphism.

Loosely, the Curry-Howard isomorphism comes in several flavors. For logical frameworks, propositions are represented by types, an implication connective is represented by a dependent type (a generalization of the function type), and modus ponens is encoded by function application. Other type operators encode other logical connectives, and type theory gives us the basic rules of inference for free. (It turns out logical frameworks can encode any logic we want.)
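
The dependently typed version used by logical frameworks needs more machinery than a language like OCaml offers, but the propositional core of the correspondence is already visible in ordinary typed functional programming (the names below are mine):

```ocaml
(* Read a type as a proposition, and a program of that type as a proof.
   "A implies (B implies A)": the proof is the K combinator. *)
let k : 'a -> 'b -> 'a = fun a _ -> a

(* Modus ponens is just function application: from A -> B and A, infer B. *)
let modus_ponens (f : 'a -> 'b) (a : 'a) : 'b = f a

(* Conjunction is the pair type; these are its two elimination rules. *)
let proj1 ((a, _) : 'a * 'b) : 'a = a
let proj2 ((_, b) : 'a * 'b) : 'b = b
```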

There are two major logical frameworks I think are worth looking into further: LF and Automath.

LF

Also called Edinburgh LF, this is the more recent of the two. It was introduced in the paper:

Robert Harper, Furio Honsell, and Gordon Plotkin, “A Framework for Defining Logics”, Journal of the ACM 40 (1993).

This is a bit involved, and some of it is confusing (e.g., there is a lambda abstraction for types, but it is never used). The basic idea is to provide a formal language for describing logics. Judgments are represented as types, rules of inference as functions, and a proof is then a program. (The Twelf theorem prover implements these ideas.)
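
Here is the “judgments as types” idea in miniature, using the same toy “n even” judgment as before. I have written it in Lean syntax rather than Twelf’s, purely for illustration: the judgment becomes a type family, each rule of inference becomes a constructor, and a derivation is literally a term inhabiting the type.

```lean
-- The judgment "n is even" as a type family; Ev is a made-up name.
inductive Ev : Nat → Prop where
  | zero : Ev 0                        -- rule: 0 is even
  | ss (n : Nat) : Ev n → Ev (n + 2)   -- rule: if n is even, so is n + 2

-- A proof object: the derivation that 4 is even.
example : Ev 4 := Ev.ss 2 (Ev.ss 0 Ev.zero)
```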

A couple gentler review articles:

As I said, the original formulation of LF was rather baroque, with parts of it not even used. What’s trickier is that we need to work with “canonical forms”. There’s a version of LF where substitution automatically produces canonical forms for us, simplifying life. This was presented as “Canonical LF” in the paper:

Robert Harper and Daniel R. Licata, “Mechanizing Metatheory in a Logical Framework”, Journal of Functional Programming 17 (2007).

This paper also gives an example of using Twelf to formalize the semantics of a programming language.

Automath

I would also recommend, for the intrepid reader, another book:

R. P. Nederpelt, J. H. Geuvers, and R. C. de Vrijer (eds.), Selected Papers on Automath (North-Holland, 1994).

This collects the selected papers of Nick de Bruijn, who invented the Automath proof checker back in the ’60s. It’s one of the earliest programmes for formalizing mathematics, and Automath is the first logical framework invented. De Bruijn independently discovered the Curry-Howard isomorphism, and describes it quite beautifully in a number of articles contained in the book.

Freek Wiedijk has implemented Automath in portable C and has some examples of foundations formalized in Automath:

This can be related to more modern type theories, which is done in the article:

One word of caution, though: Automath is based on a lambda-typed lambda calculus, which is rather esoteric now; most dependently-typed lambda calculi use Pi types instead. A simple lambda-typed lambda calculus is introduced in:

Calculus of Constructions

The “highest corner” of the lambda cube is called the calculus of constructions. Pierce’s TAPL is a decent introduction to it.

Instead of merely “encoding” a logic (as a logical framework does), the calculus of constructions handles a “deep embedding” of logics.
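
One way to see the extra power: in the calculus of constructions the logical connectives need not be taken as primitive, because they can be defined by quantifying over propositions. Here is a sketch of the classic impredicative encoding of conjunction, written in Lean syntax (Lean’s Prop is impredicative, like the CoC); the names AndC, intro, left, right are mine.

```lean
-- Conjunction, *defined* rather than postulated.
def AndC (A B : Prop) : Prop := ∀ C : Prop, (A → B → C) → C

-- Introduction and the two elimination rules, all provable terms.
def AndC.intro {A B : Prop} (a : A) (b : B) : AndC A B := fun _ k => k a b
def AndC.left  {A B : Prop} (p : AndC A B) : A := p A (fun a _ => a)
def AndC.right {A B : Prop} (p : AndC A B) : B := p B (fun _ b => b)
```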

As for implementing a theorem prover based on it, a good paper:

The famous theorem prover based on the Calculus of Constructions is, of course, Coq. For its foundations and usage, see:

Homotopy Type Theory (briefly)

This is the new kid on the block, invented some time around 2006 (though its origins can be dated back to the days of yore, i.e., 1998). Initially HoTT (as Homotopy Type Theory is called) used Coq as its preferred theorem prover, then Lean took over.

The theory underpinning HoTT is rather complicated. The simplest version: when we have an equality t=u, we have a proposition. The Curry-Howard correspondence encodes this as a type. HoTT replaces this equality with a slightly weaker equivalence relation. As far as I can tell, a type is viewed as a topological space, and the weaker equivalence relation is encoded as a claim of homotopic equivalence.
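
For reference, here is the identity type of Martin-Löf type theory written in Lean syntax (Lean’s built-in Eq is defined in essentially this way; MyEq is my own name, used only to avoid clashing with it). HoTT’s move is to take the inhabitants of this type seriously, as paths in a space, rather than insisting there is at most one proof of any equality.

```lean
-- For a fixed element a, "MyEq a b" is itself a type; an inhabitant of
-- that type is a proof (in HoTT, a path) witnessing that a equals b.
inductive MyEq {α : Type} : α → α → Prop where
  | refl (a : α) : MyEq a a

example : MyEq 2 2 := MyEq.refl 2
```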

There are still some difficulties with HoTT, and despite the hype I’m not certain it’s the panacea its advocates claim. (The difficulties have to do with its models, e.g. in cubical sets, and I’m certain they will be ironed out in the coming years.) I’ve yet to see any paper discuss explicitly how HoTT simplifies theorem prover implementations, or what concrete boon it offers. There is a vast literature on HoTT, so I can’t say such a paper doesn’t exist: I just haven’t found it yet.

The best single book (and the only book) on the topic of HoTT seems to be the HoTT book. It’s best if you are familiar with the Calculus of Constructions before reading this book.

Higher Order Logic

The third family of foundations of mathematics, higher-order logic, may be thought of in terms of type theory: a typed lambda calculus with base types prop (for propositions) and indiv (for individuals, the things the non-logical constants denote). A predicate is then a term of type indiv -> prop (or, more generally, a -> prop for any type a), and a quantifier is a term of type (a -> prop) -> prop, consuming a predicate and producing a proposition.
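
That typing discipline is simple enough to spell out explicitly. Here is a sketch of the type skeleton in OCaml (the names are illustrative, not HOL Light’s actual datatypes):

```ocaml
type hol_type =
  | Bool                               (* the type of propositions *)
  | Ind                                (* "individuals" *)
  | Fun of hol_type * hol_type         (* function types a -> b *)

(* A predicate over individuals has type Ind -> Bool ... *)
let pred_ty = Fun (Ind, Bool)

(* ... and a quantifier consumes a predicate and returns a proposition,
   so "forall" over individuals has type (Ind -> Bool) -> Bool. *)
let forall_ty = Fun (Fun (Ind, Bool), Bool)
```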

An introduction to second-order logic:

Stewart Shapiro, Foundations without Foundationalism: A Case for Second-Order Logic (Oxford University Press, 1991).

It’s rather eye-opening how mathematicians implicitly work in some fragment of second-order logic, and Shapiro provides several damning examples.

Another presentation of higher-order logic from the category theory perspective may be found in:

HOL Light is based on a slight variant of this presentation. John Harrison has put a lot of his publications online, and a good chunk are related to his HOL Light prover.

As far as implementation goes, there is the classic (but out-of-print and horribly expensive) book:

M. J. C. Gordon and T. F. Melham (eds.), Introduction to HOL: A Theorem Proving Environment for Higher Order Logic (Cambridge University Press, 1993).

Some of it has been revised and maintained for current HOL implementations as the HOL Documentation.