\( \newcommand\D{\mathrm{d}} \newcommand\E{\mathrm{e}} \newcommand\I{\mathrm{i}} \newcommand\bigOh{\mathcal{O}} \newcommand{\cat}[1]{\mathbf{#1}} \newcommand\curl{\vec{\nabla}\times} \newcommand{\CC}{\mathbb{C}} \newcommand{\NN}{\mathbb{N}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\ZZ}{\mathbb{Z}} \)
UP | HOME

Statically-Typed Functional Languages

Table of Contents

1. Introduction

Statically-typed functional programming languages appear to be variations on a theme with slight differences among syntax. One way to derive statically-typed functional languages is:

  1. Start with a typed lambda-calculus
  2. Add algebraic data types

This will give the "floor model" of such languages, then we can add more bells and whistles and syntactic sugar.

1.1. What are "algebraic data types"?

I have not found a rigorous definition anywhere, but a suitable implementation of algebraic data types requires the following:

  1. Iso-recursive types
  2. Variant types
  3. Product ("tuple") types

What's an "iso-recursive type"? Well, recursive types are formalized using a new binder (μ) with a type variable \(X\). This is called the Fixpoint Type Constructor.

We then have two functions

  • \(\mathrm{unfold}\colon\mu X.\tau\to\tau[(\mu X.\tau)/X]\) where \(\tau[(\mu X.\tau)/X]\) is obtained by replacing all free instances of \(X\) by \(\mu X.\tau\), and
  • \(\mathrm{fold}\colon\tau[(\mu X.\tau)/X]\to\mu X.\tau\)

Such that

  1. \(\mathrm{fold}\circ\mathrm{unfold}=\mathrm{id}_{\mu X.\tau}\) is the identity function, and
  2. \(\mathrm{unfold}\circ\mathrm{fold}=\mathrm{id}_{\tau[(\mu X.\tau)/X]}\) is the identity function.

An algebraic data type is then a variant, each branch being a product type (possibly recursive). For example, a list:

  • Type definition: \(\tau~\mathrm{list} = \mu X.(\mathbf{1} + (\tau\times X))\)
    • Technically, we have it be \(\forall\tau.\mu X.(\mathbf{1} + (\tau\times X))\)
    • Haskell swaps \(\mu\) and \(\forall\) binders, confusingly enough
  • Constructors:
    1. \(\mathrm{nil}_{\tau} = \mathrm{fold}_{\tau~\mathrm{list}}(\mathrm{injl~unit})\)
    2. \(\mathrm{cons}_{\tau} = \lambda x{:}\tau.\lambda\ell{:}\tau~\mathrm{list}.\mathrm{fold}_{\tau~\mathrm{list}}~\mathrm{injr}(x,\ell)\)
  • Pattern matching (destructuring, etc.) then uses "unfold", specifically \(\lambda\ell{:}\tau~\mathrm{list}.\mathrm{case}~(\mathrm{unfold}_{\tau~\mathrm{list}}\ell)~\mathrm{of}\dots\)

2. GHC Core

We can write a minimal functional language using System F, which is what Haskell does with its intermediate Core language:

data Expr
  = Var Var
  | Lit Literal
  | App Expr Expr
  | Lam Var Expr -- Both term and type lambda
  | Let Bind Expr
  | Case Expr Var Type [(AltCon, [Var], Expr)]
  | Type Type -- Used for type application

data Var = Id Name Type -- Term variable
  | TyVar Name Kind -- Type/kind variable

type Kind = Type

data Type = TyVarTy Var
  | LitTy TyLit
  | AppTy Type Type
  | TyConApp TyCon [Type]
  | FunTy Type Type -- Not really necy
  | ForAllTy Var Type

2.1. Prenex Polymorphism

We actually don't need "full" System F. We can restrict attention to rank-1 polymorphism, where the ∀ and big Λ binders appear in "prenex form" (i.e., outermost position). So, we can write:

data MonoType = TyVarTy Var
  | LitTy TyLit
  | AppTy MonoType MonoType
  | TyConApp TyCon [MonoType]
  | FunTy MonoType MonoType -- Not really necy
data Type = TSType MonoType
  | ForAllTy Var TypeScheme

Or something like that.

This will have decidable type checking, whereas System F does not have decidable type checking (it's known to be undecidable).

3. References

  • Simon Peyton Jones,
    "Into the Core: Squeezing Haskell into Nine Constructors".
    2016 talk and 2022 talk
    • Talks about making a minimal functional language atop System F.
  • Lecture notes on recursive types; CS Cornell Fall 2012 course 4110 PDF

Last Updated 2022-07-15 Fri 09:11.