Compiler and correctness

Back from the Open Source Experience 2021 meeting, I got the chance to learn more about Pharo – a pure object-oriented programming language and a powerful environment à la Smalltalk – and about Coq – a formal proof management system. Even, I had the opportunity to discuss with Yves Bertot and I had been so lucky that he explained me the demo one-to-one. To be honnest, I am still rehashing many concepts and this experience is feeding my willing to learn more.

Once at home, I want to try by myself. Ah, because I am running Guix as a computational environment manager, the story starts with a yak shaving: close this old bug#46016 about Proof General. Other said, I want to interact with Coq from my regular text editor Emacs. Therefore, let fix the annoyance with patch#51755. And, since we are at it, although it is not necessary, let also add the Coq package semantics with the patch#51896. Now we are ready, let's go!

Our aim with this post is to sketch some ideas about compilation and correctness using a very basic --not to say trivial_– toy implementation. How to test the implementation? How to know the implementation is correct? And we get it, our aim is to use Coq.

Because we want to keep everything as simple as possible, the compiler is about the simplest language ever: a calculator computing only additions but with parenthesis, e.g., (1+2)+(3+(4+0)). Again, keeping everything as simple as possible, we are using the simplest computational model: a stack machine with two instructions (PUSH and ADD). Therefore, we need:

Our didactic goal

In our very restricted toy framework, the pipeline of a computation is defined by this composition –data flowing from right to left–,

pop . emulator . compiler . push


  • push returns a tree,
  • compiler transforms the tree to a sequence of instructions,
  • emulator executes this list of instructions,
  • pop returns the last elements; we expect one value.

In this post, we are going to implement various emulator and compiler. The aim is to discuss how it is possible to ensure the correctness of the composition emulator . compiler.

Implementation using Haskell

We choose Haskell and not OCaml – only because Tuareg (Emacs OCaml mode) is not working for us as expected. Anyway, let start!

Parsing with Haskell

Using the builtin Haskell parsing utilities, the parser function reads,

parser :: String -> Maybe STree
parser input = case parse expr "" input of
		    Left _  -> Nothing
		    Right e -> Just e

where expr of type Parser STree. Well, please read all the details in the file Parser.hs. Keeping simplicity in mind, the data type for the tree is just:

data STree =
  Val Integer
  | Add STree STree

where the leaf is named Val and carries the value of an integer; the constructor Add is a node and by referring recursively to STree it defines the tree . For instance, (1 + 2) + 3 is thus represented by the tree,

    /   \
  Add    Val 3
 /   \
Val 1 Val 2

where Val are leaves and Add are nodes. So far, so good!


The emulator (virtual machine) runs two instructions (PUSH and ADD), the binary, here, is a list of instructions, and the stack is a list of integers. The Haskell code implementing these words reads,

data Instr =
  PUSH Integer
  | ADD

type Binary = [Instr]
type Stack  = [Integer]

Therefore, the emulator takes a binary, returns a stack and it executes (computes) starting with an empty stack,

emulator :: Binary -> Stack
emulator b = exec b []

For the interested reader, the details of exec is provided here.


The input of the compiler a tree STree and it produces a binary (a list of instructions), i.e., it reads,

compiler :: STree -> Binary
compiler source = comp source []

where comp is an auxiliary function dealing with each case of the tree: if it is a leaf (Val), then the instruction is to PUSH the value in the stack (bytecode), otherwise if it is a node (Add), then compile recursively the subtrees and set the instruction ADD to stack. These words are implemented in Haskell with,

comp :: STree -> Binary -> Binary
comp (Val n) bytecode   = PUSH n : bytecode
comp (Add x y) bytecode = comp x $ comp y $ ADD : bytecode

Nothing fancy!


…to complete a working example, the functions push and pop are here the identity function (do nothing) and take the first element. Well, adding some other minor code,

  • cmd for the command line interface and parsing the input string,
  • go for gluing the compilation and running parts,
  • emit for debugging purpose,

and we end with,

$ ./calc-no-coq -x '(1+2)+3'
= 6
$ ./calc-no-coq -c '(1+2)+3'
Instructions: "PUSH 1, PUSH 2, ADD, PUSH 3, ADD, ;"


Implementation using Coq

As you see just above, we named the previous calc-no-coq because the title of this post is about «compilation and correctness», thus now we are going to speak about correctness. The perfect tool is Coq the proof assistant. Before proving, let use Coq as a programming language and implement these compiler and emulator in Coq, then extract to Haskell. Other said, we reuse the same pipeline as before,

pop . emulator . compiler . push

where emulator and compiler are going to be implemented in Coq, the pop and push will be adapted accordingly, i.e.,

  • push converts from the Haskell tree datatype STree to the Coq tree datatype Tree, i.e, signature STree -> Coq.Tree,
  • pop converts from the Coq stack datatype to Haskell integer, i.e., signature Coq.Stack -> Integer

This Coq.Tree is defined by the module Lang,

Inductive Tree : Type :=
  Val (n : nat)
  | Add (x : Tree) (y : Tree).

which looks similar to definition of STree, and Coq.Stack is identically defined as above, i.e., a list of number,

Definition Stack := list nat.

So far, so good!


Similarly to the previous implementation, we straightforwardly adpat the data structure for the instructions and the binary (list of instructions),

Inductive Instr:Type := PUSH (n : nat)
		      | ADD.

Definition Binary := list Instr.

The emulator executes a bytecode (b : Binary) and returns a stack,

Fixpoint exec (b : Binary) (s : Stack) : Stack :=
  match b, s with
    PUSH n :: rest, stack => exec rest (n :: stack)
  | ADD :: rest, m :: n :: stack => exec rest (m + n :: stack)
  | [], stack => stack
  | _, stack => stack

Definition emulator (b : Binary) := exec b [].

where exec is the function really computing; this function takes two arguments: a list of instructions (Binary) and stack and then returns another stack.


The compiler requires a tree (named above Coq.Tree) and an auxiliary function,

Fixpoint comp (source : Tree) (bytecode : Binary) : Binary :=
  match source with
    Lang.Val n => PUSH n :: bytecode
  | Lang.Add x y => comp x (comp y (ADD :: bytecode))

Definition compiler (source : Tree) := comp source [].

where all is strongly similar with the previous implementation.


…to a complete working example, we just need to extract from Coq to Haskell with,

Require Export Lang Compiler Emulator.

Require Extraction.
Extraction Language Haskell.
Extraction "Generated.hs" compiler emulator.

additing some Haskell glue code, as push and pop, and command-line and parser, we get,

$ ./calc -x '(1+2)+3'
= 6
$ ./calc -c '(1+2)+3'
Instructions: "PUSH 1, PUSH 2, ADD, PUSH 3, ADD, ;"
$ ./calc -c '1+(2+3)'
Instructions: "PUSH 1, PUSH 2, PUSH 3, ADD, ADD, ;"

Note the order of the instructions depending on the parenthesis order, i.e., depending on the tree, for instance,

  (1+2)+3            1+(2+3)

     Add                Add
    /   \              /   \
  Add    Val 3       Val 1  Add
 /   \                     /   \
Val 1 Val 2              Val 2 Val 3

Neat! However, let avoid to go out of rail and thus let recap!

  • we are interested by the emulator and compiler parts,
  • for both implementations, the command-line is done by the same Haskell implementation (cmd); idem for parsing (parser).

Now, we want to have a proof that the implementations of emulator and compiler are correct. For the implementation in Haskell, we cannot do much, except run QuickCheck (large number of examples picked randomly). For the implementation in Coq, well Coq is a proof assistant after all, so could we prove something about the correctness?

Proving the emulator and compiler are correct

In place of a large test suite, we can reason about the logic of the functions emulator and compiler. As we wrote in Test = examples or theorem?, instead of ruling out many examples to convince us about the correctness, we could try to prove a theorem. Which one?

We would like to prove that compiling the code and running it using the emulator is equal to interpret it using an (abstract) evaluator. It means prove this theorem,

Theorem correctness: forall (code : Tree),
    emulator (compiler code) = eval code :: [].

where eval is an abstract interpreter implemented in Coq but never effectively doing a single real computation. Such interpreter is easily defined by the recursive function,

Fixpoint eval (source:Tree) :=
  match source with
    Val n => n
  | Add x y => (eval x) + (eval y)

A formal proof of the theorem correctness requires other theorems, lemma or corollary. Note that for Coq, theorem, lemma and corollary are the same but they help the human reader.

How do we prove this theorem correctness assisted by Coq?

Unwrap the emulator

The emulator is a layer on the top of the execution model – the function exec. Instead of the theorem correctness, we could think about the theorem,

Theorem correctness': forall (code : Tree) (stack : Stack),
    exec (compiler code) stack = eval code :: stack.

and if this theorem correstness' holds, then the proof of the theorem correctness reads,

Corollary correctness'': forall (code : Tree),
    exec (compiler code) [] = eval code :: [].
  intro code.
  rewrite correctness'.

Theorem correctness: forall (code : Tree),
    emulator (compiler code) = eval code :: [].
  intro code.
  unfold emulator.
  rewrite correctness''.

where the corollary is a straightforward application of the (admitted for now) theorem. Then, the proof of correctness just unfolds the function of interest emulator and applies (rewrite) the previous proved corollary.

How do we prove this theorem correctness'?

Unwrap the compiler

The correctness is now dependent on the theorem correctness'. Because compiler is a wrapper around the function comp, we could think about the lemma,

Lemma correctness_comp: forall (code : Tree) (bytecode : Binary) (stack : Stack),
    exec (comp code bytecode) stack = exec bytecode (eval code :: stack).

The key point of the proof is the use of the additing commutativity proof Nat.add_comm, \(\forall n, m : n + m = m + n\). Such could appear straightforward, but it is not because it depends on how the numbers are internally represented. For efficiency purpose, a computer somehow represents only a finite set of natural numbers, commonly named integer. Instead Coq is using, by default, the mathematical construction of natural numbers defining a zero, noted O and a successor, i.e., the data structure,

Inductive nat : Type :=
  | S n.

where e.g., 3 is thus S (S (S 0)). That's why, the proof applies Nat.add_comm. Aside this consideration, the proof mainly unfolds and folds the definitions; see below.

Using this correctness_comp lemma, then the corollary correctness_comp' just considers the case of an empty binary,

Corollary correctness_comp': forall (code:Tree) (stack:Stack),
    exec (comp code []) stack = eval code :: stack.


…ending the proof is just unfolding the functions of interest and applying (rewrite) the previous proved results. The complete proof of correctness is therefore (presented linearly as in math textbook),

Lemma correctness_comp: forall (code:Tree) (bytecode:Binary) (stack:Stack),
    exec (comp code bytecode) stack = exec bytecode (eval code :: stack).
  intros code.
  induction code.
  - simpl. reflexivity.
  - unfold comp. fold comp.
    intros bytecode stack.
    rewrite IHcode1.
    rewrite IHcode2.
    unfold eval. fold eval.
    unfold exec. fold exec.
    rewrite Nat.add_comm.

Corollary correctness_comp': forall (code:Tree) (stack:Stack),
    exec (comp code []) stack = eval code :: stack.
  intros code stack.
  rewrite correctness_comp.
  unfold exec.

Theorem correctness': forall (code:Tree) (stack:Stack),
    exec (compiler code) stack = eval code :: stack.
  intros code stack.
  unfold compiler.
  rewrite correctness_comp'.

Corollary correctness'': forall (code:Tree),
    exec (compiler code) [] = eval code :: [].
  intro code.
  rewrite correctness'.

Theorem correctness: forall (code:Tree),
    emulator (compiler code) = eval code :: [].
  intro code.
  unfold emulator.
  rewrite correctness''.

Done! Well, almost. We could also prove that the behaviour of the interpreter eval is correct, i.e., the properties of the addition for mathematical intergers are fully satisfied.

Opinionated afterword

A Coq's proof is somehow a compilation. Therefore, similarly as Thompson's Reflections on Trusting Trust, what is the seed we have to trust? How to inspect this seed? What is the size / complexity of this seed?

For sure, realistic compilers usable for critical embedded software using formal verification are around, as CompCert – incredible piece!

Last, whatever the trust we have in the compiler, it runs on the top of a kernel (operating system). What trust do we have in this binary kernel? How this binary kernel is it built? Maybe, all the bootstrap chain could starts from stage0 (Hex0) on some formally verified microkernel as seL4 or on a Linux kernel compiled by CompCert from GNU Hurd microkernel1.



Assuming Coq would run Hurd. ;-)

© 2014-2024 Simon Tournier <simon (at) >

(last update: 2024-06-05 Wed 09:47)