Idioms - SML

1. Using type t in Signatures
2. Datatypes in Signatures
3. Stringify Types
- 3.1. Type Conversions
4. Comparable Types
- 4.1. Caveat: Equality Testing
5. Principle: Similar Functions should have Similar Names

These are idioms gleaned from reading some random Standard ML programs.

1. Using `type t` in Signatures

When encoding a data type as a signature, for example a signature SET, use type t to specify the data type.

Example:

signature SET =
sig
    type 'a t;
    val empty : 'a t;
    val insert : 'a -> 'a t -> 'a t;
    val null : 'a t -> bool;
    val member : 'a -> 'a t -> bool;
    val map : ('a -> 'b) -> 'a t -> 'b t;
end;

structure TreeSet : SET =
struct
    datatype 'a t = E | T of int*'a*('a t)*('a t);

    val empty = E;

    fun insert x E = (* ... *)
      | insert x (T (n, v, l, r)) = (* ... *);

    fun null E = true
      | null _ = false;

    fun member _ E = false
      | x (T (_, v, l, r)) = (* ... *);

    fun map _ E = E
      | map f (T (_, x, l, r)) = (* ... *);
end;

(* mySet : TreeSet.t *)
mySet = TreeSet.empty;

2. Datatypes in Signatures

One caveat worth discussing: we can end up with duplicate-but-distinct types if we place datatype declarations in signatures. For example, consider the signature:

signature EXP = sig
    datatype t = Var of id
               | Abs of id * t
               | App of t * t
end;

Suppose we have two different structures which implement this signature, e.g., one for a parse tree and another for an abstract syntax tree:

structure ParseTree : EXP = struct
    datatype t = Var of id
               | Abs of id * t
               | App of t * t;
end;

structure AST : EXP = struct
    datatype t = Var of id
               | Abs of id * t
               | App of t * t;
end;

The surprise: AST.t and ParseTree.t are two distinct types. This is because Standard ML has generative datatype declarations, meaning: each time a datatype declaration is encountered, a new type will be created. We could avoid this by doing something like:

struct Exp = struct
    datatype t = Var of id
               | Abs of id * t
               | App of t * t;
end;
       
signature EXP = sig
    datatype t = datatype Exp.t;
end;

This is specified by rule 18 of the 1997 definition of SML to not generate duplicate datatypes. We could also do the following:

datatype Exp = Var of id
             | Abs of id * Exp
             | App of Exp * Exp;

signature EXP = sig
    datatype t = datatype Exp;
end;

But it is unclear if we could do the following:

local
    datatype Exp = Var of id
                 | Abs of id * Exp
                 | App of Exp * Exp;
in
    signature EXP = sig
        datatype t = datatype Exp;
    end;
end;

This also has the problem/feature that the constructors Var, Abs, App are private.

(This is also probably why signatures should have just type t instead of a datatype declaration.)

See:

Generative Datatypes in MLton's wiki
Tips for writing concise SML in MLton's wiki
Appendix G §§2–3 of the Revised Definition of Standard ML.
§4.3.3 of "The History of Standard ML"
Karl Crary, Robert Harper, Perry Cheng, Leaf Petersen, Chris Stone, Transparent and opaque interpretations of datatypes (1998).

3. Stringify Types

In Haskell, we tend to have a single typeclass Show which has a "to string" method. Standard ML does not have anything similar to type classes, so we end up encoding them as signatures. Strictly speaking, we would have

signature SHOW =
sig
    type t;
    val toString : t -> string;
end;

3.1. Type Conversions

More broadly, if we have defined two types A and B, and it is "obvious" we can convert any value of type A to a value of type B, then we define a function of the form:

structure A =
struct
    type t;
    val toB : t -> B;
    (* val fromB : B -> t; if we can convert the other way around *)
    (* etc. *)
end;

Then we have A.toB foo be an "obvious" conversion.

Gansner and Reppy's Standard ML Basis Library, section 1.1.4, encourages this sort of naming convention for conversion functions.

4. Comparable Types

When defining a new type T, we would need to implement a function compare : T * T -> order. This should be placed in a signature, so it's "public facing".

We would want various nice properties of the ordering, like:

Transitivity: if compare(x,y) = compare(y,z), then we want compare(x,z) = compare(x,y) too.
Reflexivity: for any x, we want compare(x, x) = EQUAL
Antisymmetry: if not (compare(x, y) = GREATER) and not (compare(y, x) = GREATER), then we expect compare(x, y) = EQUAL.

A "nice to have" property which, I am not certain is absolutely necessary:

Completeness: for any x and y, we want compare(x,y) orelse compare(y,x) to be true

Examples:

Discussion with Weight-Balanced Trees

References:

Gansner and Reppy's Standard ML Basis Library, sections 1.1.3 and 4.1, encourages this sort of practice with linear orders.

4.1. Caveat: Equality Testing

The only caveat which springs to mind is that it may be faster to implement your own function testing for equality. Again, Haskell has Data.Eq typeclass, but Standard ML has equality type kludge.

For example, if you are implementing a proof assistant, and you have a datatype encoding logical propositions, it would be somewhat faster to implement your own eq function than to use compare (x,y) = EQUALS.

5. Principle: Similar Functions should have Similar Names

If we are building our own collection data structure (say, a set or hashmap or whatever), then we should name the functions similar to analogous functions on lists. For example,

signature SET =
sig
    type 'a t;
    val app : ('a -> unit) -> 'a t -> unit;
    val map : ('a -> 'b) -> 'a t -> 'b t;
    (* And so on... *)
end;

The intuition being that an instance of SET.t is like a list and should have analogous map and app functions (among others). In retrospect, a better design choice would have been to have a hierarchy of signatures for collections, and the structure would implement the relevant one.

Gansner and Reppy's Standard ML Basis Library, section 1.1.2, encourages this sort of practice.