\( \newcommand\D{\mathrm{d}} \newcommand\E{\mathrm{e}} \newcommand\I{\mathrm{i}} \newcommand\bigOh{\mathcal{O}} \newcommand{\cat}[1]{\mathbf{#1}} \newcommand\curl{\vec{\nabla}\times} \newcommand{\CC}{\mathbb{C}} \newcommand{\NN}{\mathbb{N}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\ZZ}{\mathbb{Z}} \)
UP | HOME

On Kernel Size

Table of Contents

Puzzle: How small should a proof assistant's kernel be, for it to be considered "bug free"?

Solution: Use the Negative Bernoulli distribution for \(r\) bugs, the probability of a line of code being bug free is \(p\), and the probability of a bug is \(q:=1-p\approx 20/1000\) (which is the currently accepted rate, informally used among programmers), and Chebyshev's inequality to estimate the number of bugs \(r\) within \(3\sigma\) of zero. Recall \(\mu=pr/q\) is the mean for the distribution, and \(\sigma^{2}=pr/q^{2}\) is the variance. Then Chebyshev gives us, for tolerance of \(k\sigma\):

\begin{equation} \mu \leq k\sigma\implies \frac{p}{q}r \leq \frac{k}{q}\sqrt{pr}. \end{equation}

We solve for \(r\):

\begin{equation} r \leq \frac{k^{2}}{p}. \end{equation}

The upper bound on \(r\) is then \(r_{\text{max}}\) which is defined as:

\begin{equation} r_{\text{max}} := \frac{k^{2}}{p}. \end{equation}

Then we find the expected number of lines of code corresponding to \(r_{\text{max}}\) should be

\begin{equation} \mu_{\text{max}} = \frac{pr_{\text{max}}}{q} \approx \frac{k^{2}}{q}. \end{equation}

For the figures given, we have

\begin{equation} \mu_{\text{max}}(k) \approx 50 k^{2}, \end{equation}

and for \(k=3\) sigmas we have:

\begin{equation} \mu_{\text{max}} = 441. \end{equation}

Thus the upper-bound on a program to be a candidate for being "bug free" is approximately 441 lines of code.

If we then use this estimate for the number of lines \(n\leq 50k^{2}\) for a Binomial distributed random variable (NOT a negative Binomial distributed random variable), then the probability of finding a bug within 3 sigmas of 0 bugs is greater than 50%.

Just a quick consistency check, we can use the Poisson approximation to a Binomial distribution to find a heuristic upper bound on the number of lines of code to be \(n\leq 10/p\approx500\).

1. Observations

If we use a Binomial distributed random variable \(B\sim\mathrm{Bin}(N,p_{bug})\), and look at finding the greatest number of lines of code \(N\) such that the probability of finding no more than \(3\sigma\) bugs is \(\Pr(B\leq 3\sigma)\geq 0.5\), then we find \(N\approx 533\). The R code which let me solve this by brute force:

p_bug <- 0.02
q_bug <- 1 - p_bug

sd_bug <- sqrt(p_bug*q_bug) # sigma = sd_bug*sqrt(n)

# `guess` computes the likelihood L(n) = Pr(B <= 3*sigma | n)
# of no more than `3*sigma` bugs. We are trying to find the greatest
# `n` such that `guess(n) >= 0.5`; empirically this seems to be
# 533, and fluctuates around a decreasing trend.
guess <- function(n) { pbinom(round(3*sd_bug*sqrt(n)), n, p_bug)}

guess(441)
# [1] 0.6113548
guess(533)
# [1] 0.5002271

This guess is not a monotonic function, but 533 seems to be the greatest value of n such that guess(n) >= 0.5.

If we wanted to be more certain, we could modify the code:

guess <- function(n, k=3) { pbinom(round(k*sd_bug*sqrt(n)), n, p_bug)}

expected_bugs <- function(n) { p_bug*n }

bug_sigma <- function(n, k) { k*sd_bug*sqrt(n) }

guess(1283, k=5) # expected bugs: 25.66
# [1] 0.5000607
bug_sigma(1283, 5)
# [1] 25.07329

guess(883, k=4) # expected bugs: 17.66
# [1] 0.5001064
bug_sigma(883, 4)
# [1] 16.64058

guess(533, k=3) # expected bugs: 10.66
# [1] 0.5002271
bug_sigma(533, 3)
# [1] 9.696453

## The case when bug_sigma(n, k) == expected_bugs(n, k)
guess(441, k=3) # expected bugs: 8.82
# [1] 0.6113548
bug_sigma(441, 3)
# [1] 8.82

guess(283, k=2) # expected bugs: 5.66
# [1] 0.5005881
bug_sigma(283, 2)
# [1] 4.710329

guess(133, k=1) # expected bugs: 2.66
# [1] 0.5018235
bug_sigma(133, 1)
# [1] 1.614559

The moral: the fewer lines of code, the better.

2. References

The concern for small trusted kernel is generically referred to as the "de Bruijn criterion", a term coined by Henk Berendregt in:

  • Henk Barendregt,
    "The impact of the lambda calculus in logic and computer science".
    Bulletin of Symbolic Logic 3, no. 2 (1997) pp.181–215.

Specifically, Berendregt writes:

If a complicated computer system claims that a certain mathematical statement is correct, then one may wonder whether this is indeed the case. For example, there may be software errors in the system. A satisfactory methodological answer has been given by de Bruijn. Proof-objects should be public and written in such a formalism that a reasonably simple proof-checker can verify them. One should be able to verify the program for this proof-checker ‘by hand’. We call this the de Bruijn criterion.

Last Updated 2023-08-20 Sun 09:21.