Exercises for Week 3#
Exercise overview#
In compiler implementation, an essential
data structure is that of associative maps, also
often referred to as environments. For example,
type checking uses environments
for associating variables with types, code generation
uses environments for mapping variables to target
registers, and optimization phases use environments extensively, e.g.,
for tracking uses and definitions of variables. Note that
while it is usual that
keys in the environments are identifiers,
(that is, variable names), that is not always the case.
For example, in common subexpression elimination
optimization that optimizes
x = a + b; y = a + b
to x = a + b; y = x
,
the keys are expressions such as a + b
.
The exercises below explore four different approaches for implementing environments. For simplicity, we consider enviromnents where keys are strings and values are integers. The examples make use of OCaml’s module system. This fulfills the secondary goal of these exercises to familiarize oneself with OCaml modules. To read up on OCaml module system, see Chapter 5 of the OCaml book.
The four approaches for implementing environments we consider are
List-based environments.
Functional environments.
Map-based environments.
Map-based environments with integer hashtable.
Relevant reading#
For the implementation of approach 4, the relevant reading is Section 5.1 of Andrew Appel’s book Modern Compiler Implementation in ML.
Environment signature#
We start off by defining the type of our keys, the NotFound
exception, and
a module signature that will be shared by all four
implementations
type value = int
exception NotFound (* we throw this exception when the key is not found *)
module type EnvSig = sig
type env (* abstract environment type *)
type symbol (* abstract symbol type *)
val empty_env : env (* create an empty environment *)
val insert : env -> symbol -> value -> env (* insert; returns updated environment *)
val lookup : env -> symbol -> value (* lookup; returns the value or raises NotFound *)
val symbol : string -> symbol (* create a symbol value from string *)
val approach_name: string (* the name of our approach, for benchmarking later *)
end
Most of the signature is standard, except perhaps the symbol
type and the symbol
function. They introduce a layer of indirection.
Instead of inserting a key-value pair directly
into the environment, we first convert the key into a so-called symbol value (that has type symbol
)
which we pass over as the argument to insert
. This layer of indirection is needed
by the last approach, which promises a bit of extra performance. We added it to the signature
in order to parametrize the benchmarking; but everywhere except the last implementation, the
symbol
function is implemented as identity function fun x -> x
, and the symbol
type is a synonym to string
.
For example, if E is a module implementing our environments, we can write a function such as
let example () =
let open E in
let e0 = empty_env in
let x_sym = symbol "x" in
let e1 = insert e0 x_sym 5 in
let result = lookup e1 x_sym in
Printf.printf "%d\n" result (* prints 5 *)
List-based environments#
Our first implementation uses associative lists as environments. See the following
start of an implementation. Two functions are left to you as an exercise – they are the ones
raising the TODO
exception.
(* replace the code that raises TODO with your implementation *)
exception TODO
(* Approach 1 - list based environments *)
module ListEnv : EnvSig = struct
let approach_name = "List-based environments"
type env = (string * value) list
type symbol = string
let symbol = fun x -> x
let empty_env = []
let insert e k v = raise TODO
let lookup e k = raise TODO
end
Testing your module#
When you complete the module you can start testing it. For example,
you can repurpose the example
question from above where you replace E
with ListEnv
.
Do create your own examples to see that this module works as expected.
This approach to testing applies to all of the remaining implementations.
Functional environments#
The following approach uses functions as environments. It is closer in spirit
to the mathematical notion of an associative map being a function from keys to values.
We have now included the implementation of insert
, and we ask you to only
implement the lookup
function here.
(* Approach 2 - functional environments *)
module FunEnv : EnvSig = struct
let approach_name = "Functional environments"
type env = string -> value
type symbol = string
let symbol = fun x -> x
let empty_env = fun _ -> raise NotFound
let insert e k v = fun x -> if x = k then v else e x
let lookup e k = raise TODO
end
Map-based environments#
Next implementation is a thin wrapper around the OCaml’s Map
standard library.
This way of wrapping allows us to see how this implementation relates to the other approaches.
(* Approach 3 - Map-based environments *)
module MapEnv : EnvSig = struct
let approach_name = "Map-based environments"
module StringMap = Map.Make (String) (* using Map functor; do read up on this !*)
type symbol = string
let symbol = fun x-> x
type env = value StringMap.t
let empty_env = StringMap.empty
let lookup e k = raise TODO
let insert e k v = raise TODO
end
Map-based environments with int hashtables#
The following implementation is based on the symbol tables as described in Modern Compiler Implementation in ML, Chapter 5.1.
(* Approach 4 - Map-based environments with int hashtable;
based on MCIML, Chapter 5.1 *)
module MapHashEnv : EnvSig = struct
let approach_name = "Map-based envs with int H/T"
let nextsym = ref 0
type symbol = string * int
module H = Hashtbl
let hashtbl: (string, int) H.t = H.create 2048 (* some init size *)
module SymbolMap =
Map.Make (
struct
type t = symbol
let compare (_,n1) (_,n2) = compare n1 n2
end)
let symbol name =
match Hashtbl.find_opt hashtbl name with
| Some i -> (name, i)
| None ->
let i = !nextsym in
nextsym := i + 1;
H.add hashtbl name i ;
(name, i)
type env = value SymbolMap.t
let empty_env = SymbolMap.empty
let insert e k v = raise TODO
let lookup e k = raise TODO
end;;
Make sure you understand the code and the purpose of the symbol table-based indirection.
Benchmarking#
So, four ways to do the same thing!? But what is the right way? A back-of-an-envelop asymptotic analysis will tell us that both the list-based environments and functional environments are inferior to the map-based ones (can you see why?). It may also be instructive to do an empirical analysis to get a sense of how all four of the approaches relate to each other.
How should one benchmark these? Ideally, we should have a finished implementation of a compiler
where the environment module can be replaced and we perform an end-to-end evaluation of how different implementations affect compilper performance. Alas, we do not have a finished (or even started) compiler. Instead, we will try to use a micro-benchmarking approach, using an OCaml library called core_bench
. You can read about the idea of micro-benchmarking and the library we use
in this Jane Street blog post.
(* Benchmarking using core_bench *)
(* This probably requires extra libraries to install using opam *)
let _ = Random.self_init();;
let random_string n =
String.init n (fun _ -> Char.chr(97 + (Random.int 26)))
let max_n = 2000
let name_strings = List.init max_n (fun _ -> (random_string 10))
let values = List.init max_n (fun x -> x )
module type BenchSig = sig
val approach_name : string
val bench : unit -> int
end
(* Question to students: is this a good way of doing benchmarking? *)
(* Are there any issues to discuss about this? *)
module EnvBench (E:EnvSig):BenchSig = struct
include E
let names = List.map symbol name_strings
let names_and_values = List.combine names values
let bench () =
let e1 = List.fold_left (fun e (k,v) -> insert e k v ) empty_env names_and_values in
let s = List.fold_left (fun s n -> s + lookup e1 n ) 0 names in
s
end
(* main function *)
let () =
let open Core in
let open Core_bench in
let f m = let module M = (val m : BenchSig) in
Bench.Test.create ~name: M.approach_name M.bench
in (List.map [(module EnvBench (ListEnv) : BenchSig);
(module EnvBench (FunEnv) : BenchSig);
(module EnvBench (MapEnv) : BenchSig);
(module EnvBench (MapHashEnv) : BenchSig);
] ~f: f)
|> Bench.make_command (* we're plugging into the bench main functionality *)
|> Command_unix.run
Configuring and running the benchmark#
We use the following dune configuration; see the list of libraries that you may need
to install using opam
.
(executable
(name envbench)
(libraries core core_bench core_unix.command_unix))
(install
(section bin)
(files (envbench.exe as envbench)))
(env
(dev
(flags (:standard -warn-error -A))))
With the above configuration, dune build
will produce a binary envbench
that we can call
_build/install/default/bin/envbench
You will get an output that looks like follows. We have redacted the actual numbers with asterisks – run the benchmark to see the numbers for yourself!
Estimated testing time 40s (4 benchmarks x 10s). Change using '-quota'.
┌─────────────────┬────────────┬──────────┬───────────┬───────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├─────────────────┼────────────┼──────────┼───────────┼───────────┼────────────┤
│ List-based env │ ********** │ ******** │ ********* │ ********* │ ********** │
│ Functional env │ ********** │ ******** │ ********* │ ********* │ ********** │
│ Map-based env │ ********** │ ******** │ ********* │ ********* │ ********** │
│ Map + int H/T │ ********** │ ******** │ ********* │ ********* │ ********** │
└─────────────────┴────────────┴──────────┴───────────┴───────────┴────────────┘
The core_bench library provides a number of flags that we can study. Check them out by running
_build/install/default/bin/envbench --help