Exercises for Week 3#

Exercise overview#

In compiler implementation, an essential data structure is that of associative maps, also often referred to as environments. For example, type checking uses environments for associating variables with types, code generation uses environments for mapping variables to target registers, and optimization phases use environments extensively, e.g., for tracking uses and definitions of variables. Note that while it is usual that keys in the environments are identifiers, (that is, variable names), that is not always the case. For example, in common subexpression elimination optimization that optimizes x = a + b; y = a + b to x = a + b; y = x, the keys are expressions such as a + b.

The exercises below explore four different approaches for implementing environments. For simplicity, we consider enviromnents where keys are strings and values are integers. The examples make use of OCaml’s module system. This fulfills the secondary goal of these exercises to familiarize oneself with OCaml modules. To read up on OCaml module system, see Chapter 5 of the OCaml book.

The four approaches for implementing environments we consider are

  1. List-based environments.

  2. Functional environments.

  3. Map-based environments.

  4. Map-based environments with integer hashtable.

Relevant reading#

For the implementation of approach 4, the relevant reading is Section 5.1 of Andrew Appel’s book Modern Compiler Implementation in ML.

Environment signature#

We start off by defining the type of our keys, the NotFound exception, and a module signature that will be shared by all four implementations

type value = int
exception NotFound (* we throw this exception when the key is not found *)

module type EnvSig = sig 
  type env                  (* abstract environment type *)
  type symbol               (* abstract symbol type      *)
  val empty_env : env       (* create an empty environment *)
  val insert : env -> symbol -> value -> env  (* insert; returns updated environment *)
  val lookup : env -> symbol -> value         (* lookup; returns the value or raises NotFound *)
  val symbol : string -> symbol   (* create a symbol value from string *)
  val approach_name: string (* the name of our approach, for benchmarking later *)
end

Most of the signature is standard, except perhaps the symbol type and the symbol function. They introduce a layer of indirection. Instead of inserting a key-value pair directly into the environment, we first convert the key into a so-called symbol value (that has type symbol) which we pass over as the argument to insert. This layer of indirection is needed by the last approach, which promises a bit of extra performance. We added it to the signature in order to parametrize the benchmarking; but everywhere except the last implementation, the symbol function is implemented as identity function fun x -> x, and the symbol type is a synonym to string.

For example, if E is a module implementing our environments, we can write a function such as

let example () = 
  let open E in 
  let e0 = empty_env in 
  let x_sym = symbol "x" in
  let e1 = insert e0 x_sym 5 in 
  let result = lookup e1 x_sym in 
  Printf.printf "%d\n" result         (* prints 5 *)

List-based environments#

Our first implementation uses associative lists as environments. See the following start of an implementation. Two functions are left to you as an exercise – they are the ones raising the TODO exception.

(* replace the code that raises TODO with your implementation *)
exception TODO 

(* Approach 1 - list based environments  *)
module ListEnv : EnvSig = struct 
  let approach_name = "List-based environments"
  type env = (string * value) list 
  type symbol = string 
  let symbol = fun x -> x 
  let empty_env = []
  let insert e k v = raise TODO
  let lookup e k = raise TODO
end     

Testing your module#

When you complete the module you can start testing it. For example, you can repurpose the example question from above where you replace E with ListEnv. Do create your own examples to see that this module works as expected.

This approach to testing applies to all of the remaining implementations.

Functional environments#

The following approach uses functions as environments. It is closer in spirit to the mathematical notion of an associative map being a function from keys to values. We have now included the implementation of insert, and we ask you to only implement the lookup function here.

(* Approach 2 - functional environments *)

module FunEnv : EnvSig = struct 
  let approach_name = "Functional environments"
  type env = string -> value 
  type symbol = string 
  let symbol = fun x -> x 
  let empty_env = fun _ -> raise NotFound
  let insert e k v = fun x -> if x = k then v else e x 
  let lookup e k = raise TODO
end 

Map-based environments#

Next implementation is a thin wrapper around the OCaml’s Map standard library. This way of wrapping allows us to see how this implementation relates to the other approaches.

(* Approach 3 - Map-based environments  *)
module MapEnv : EnvSig = struct 
  let approach_name = "Map-based environments"
  module StringMap = Map.Make (String)   (* using Map functor; do read up on this !*)
  type symbol = string 
  let symbol = fun x-> x

  type env = value StringMap.t
  let empty_env = StringMap.empty
  let lookup e k = raise TODO
  let insert e k v = raise TODO
end 

Map-based environments with int hashtables#

The following implementation is based on the symbol tables as described in Modern Compiler Implementation in ML, Chapter 5.1.

(* Approach 4 - Map-based environments with int hashtable; 
   based on MCIML, Chapter 5.1 *)

module MapHashEnv : EnvSig = struct 
  let approach_name = "Map-based envs with int H/T"
  let nextsym = ref 0
  type symbol = string * int
  module H = Hashtbl
  let hashtbl: (string, int) H.t = H.create 2048 (* some init size *)
  module SymbolMap = 
      Map.Make (
          struct 
            type t = symbol 
            let compare (_,n1) (_,n2) = compare n1 n2 
          end)

  let symbol name =
    match Hashtbl.find_opt hashtbl name with
    | Some i -> (name, i)
    | None ->
      let i = !nextsym in
      nextsym := i + 1;
      H.add hashtbl name i ;
      (name, i)
          
  type env = value SymbolMap.t 
  let empty_env = SymbolMap.empty
  let insert e k v = raise TODO
  let lookup e k = raise TODO
end;;

Make sure you understand the code and the purpose of the symbol table-based indirection.

Benchmarking#

So, four ways to do the same thing!? But what is the right way? A back-of-an-envelop asymptotic analysis will tell us that both the list-based environments and functional environments are inferior to the map-based ones (can you see why?). It may also be instructive to do an empirical analysis to get a sense of how all four of the approaches relate to each other.

How should one benchmark these? Ideally, we should have a finished implementation of a compiler where the environment module can be replaced and we perform an end-to-end evaluation of how different implementations affect compilper performance. Alas, we do not have a finished (or even started) compiler. Instead, we will try to use a micro-benchmarking approach, using an OCaml library called core_bench. You can read about the idea of micro-benchmarking and the library we use in this Jane Street blog post.

(* Benchmarking using core_bench *)
(* This probably requires extra libraries to install using opam *)
let _ = Random.self_init();;

let random_string n = 
  String.init n (fun _ -> Char.chr(97 + (Random.int 26)))

let max_n = 2000 
let name_strings = List.init max_n (fun _ -> (random_string 10))
let values = List.init max_n (fun x -> x )

module type BenchSig = sig 
  val approach_name : string 
  val bench : unit -> int
end


(* Question to students: is this a good way of doing benchmarking? *)
(* Are there any issues to discuss about this? *)
module EnvBench (E:EnvSig):BenchSig = struct
  include E  
  let names = List.map symbol name_strings
  let names_and_values = List.combine names values
  let bench () = 
    let e1 = List.fold_left (fun e (k,v) -> insert e k v ) empty_env names_and_values in 
    let s = List.fold_left (fun s n -> s + lookup e1 n ) 0 names  in
    s
end 

(* main function *)
let () =
  let open Core in 
  let open Core_bench in 
  let f m = let module M = (val m : BenchSig) in             
            Bench.Test.create ~name: M.approach_name M.bench 
  in (List.map [(module EnvBench (ListEnv)    : BenchSig);
                (module EnvBench (FunEnv)     : BenchSig);
                (module EnvBench (MapEnv)     : BenchSig);
                (module EnvBench (MapHashEnv) : BenchSig); 
                ]  ~f: f) 
  |> Bench.make_command   (* we're plugging into the bench main functionality *)
  |> Command_unix.run

Configuring and running the benchmark#

We use the following dune configuration; see the list of libraries that you may need to install using opam.

(executable
   (name envbench)
   (libraries core core_bench core_unix.command_unix))

(install 
    (section bin)    
    (files (envbench.exe as envbench)))

(env
  (dev
    (flags (:standard -warn-error -A)))) 

With the above configuration, dune build will produce a binary envbench that we can call

_build/install/default/bin/envbench

You will get an output that looks like follows. We have redacted the actual numbers with asterisks – run the benchmark to see the numbers for yourself!

Estimated testing time 40s (4 benchmarks x 10s). Change using '-quota'.
┌─────────────────┬────────────┬──────────┬───────────┬───────────┬────────────┐
│ Name            │   Time/Run │  mWd/Run │  mjWd/Run │  Prom/Run │ Percentage │
├─────────────────┼────────────┼──────────┼───────────┼───────────┼────────────┤
│ List-based env  │ ********** │ ******** │ ********* │ ********* │ ********** │
│ Functional env  │ ********** │ ******** │ ********* │ ********* │ ********** │
│ Map-based env   │ ********** │ ******** │ ********* │ ********* │ ********** │
│ Map + int H/T   │ ********** │ ******** │ ********* │ ********* │ ********** │
└─────────────────┴────────────┴──────────┴───────────┴───────────┴────────────┘

The core_bench library provides a number of flags that we can study. Check them out by running

_build/install/default/bin/envbench --help