Dolphin – Phase 1

4. Dolphin – Phase 1#

Attention

This is a group assignment. You should have formed groups by now. If not, please do so as soon as possible. There will an announcement on BrightSpace explaining how you should register your groups on BrightSpace.

4.1. Assignment overview#

This assignment covers translating a fragment of Dolphin language, in the AST form, into LLVM--. There are 3 tasks and 3 questions in this assignment. Questions marked for glory are optional. Do them only if you feel ambitious and have the time.

4.1.1. What you need to get started#

For LLVM--: make sure you understand everything from the lectures on LLVM--. That is, everything in the LLVM-- chapter except for aggregate and named types, GEP, and phi nodes.
For semantic analysis: this will be covered in lectures.
For programmatic LLVM-- code generation: this is partly covered in TA classes for week 5; and also will be complemented in lectures. This part, in spirit, resembles programmatic generation of x86 code we hand in assignment 2; the constant/copy propagation technique that you will need to use for conversion to LLVM is also covered in the notes on basic optimizations.
For OCaml, the assignment uses OCaml records which are covered in Section 3.4 of the OCaml book.

Important

You will need files symbol.ml, symbol.mli, cfgBuilder.ml, cfgBuilder.mli, and ll.ml as they have been released in the TA classes.

4.1.2. What you need to hand in#

Please hand in a .zip file containing the following

A brief report documenting your solution. Acceptable report formats are .pdf, .rtf, and .md. For each task and question, briefly (1 – 4 sentences) describe your implementation or answer. Write concisely.
All the source files needed to reproduce your solution. This also includes the C code provided. Please explain in your report how the solution could be reproduced, e.g., calling make (if you have made a Makefile), the command line to call clang, etc.

Note

If you are answering any of the glory questions, please submit them as separate solutions, i.e., in a separate, clearly marked subfolder in the .zip file of your submission.

Important

Make sure to understand all the code you hand in, including what is copied from here. (The code for pretty printing (typed) ASTs is an exception here; see the appendix below.)

4.2. The Abstract Syntax Tree (AST) of Dolphin (phase 1)#

In this phase, a Dolphin program is simply a sequence of statements. Intuitively, this is to be understood as the body of the main function.

The OCaml types for the AST describing programs is given below:

(* -- Use this in your solution without modifications *)
type ident = Ident of {name : string;}

type typ = | Int | Bool

type binop =
| Plus | Minus | Mul | Div | Rem | Lt | Le | Gt | Ge | Lor | Land | Eq | NEq

type unop = | Neg | Lnot

type expr =
| Integer of {int : int64}
| Boolean of {bool : bool}
| BinOp of {left : expr; op : binop; right : expr}
| UnOp of {op : unop; operand : expr}
| Lval of lval
| Assignment of {lvl : lval; rhs : expr;}
| Call of {fname : ident; args : expr list}
and lval =
| Var of ident

type statement =
| VarDeclStm of {name : ident; tp : typ option; body : expr}
| ExprStm of {expr : expr option}
| IfThenElseStm of {cond : expr; thbr : statement; elbro : statement option}
| CompoundStm of {stms : statement list}
| ReturnStm of {ret : expr}

type program = statement list

Action item

Put the AST declarations above in a module called Ast. That is, make a file called ast.ml and copy the code above into it. Do not change the code above.

The types ident (program identifiers), typ (types), binop (binary operators), and unop (unary operations) are self-explanatory.

Expressions consist of the following (in the order they appear in type expr above):

Integer literals
Booleans literals
Binary operations consisting of the a left operand, a binary operation, and a right operand
Unary operations consisting of a unary operation and an operand
L-values, i.e., values that can appear on the left-hand-side of an assignment expression (see below), in our fragment the only possible l-values are variables
Assignments consisting of an l-value, and right-hand-side expression assigned to the l-value in question
Function call consisting of a function name (an identifier), and a list of arguments. Note: even though the phase-1 fragment of Dolphin has no function definitions, programs do have access to a very limited standard library, customised for this phase.

Statements consist of the following:

Variable declarations consisting of a name (identifier), optionally a type, and a body (an initialization expression). The optionality of the type annotation for variable declarations can be seen in the difference following two declarations of variable x in var x = 10; and var x : int = 10; as written in Dolphin’s syntax.
Expression statements which allow us to use expressions as statements (this is restricted in semantic analysis). Note that the expression is optional. A None expression statement corresponds to the empty statement, i.e., ; on its own in languages like C and Java.
If statements where cond is the condition (an expression), thbr is the statement for the then branch, and elbro is the optional else branch.
Compound statements: these correspond to statement blocks. Think of code blocks in C and Java enclosed in curly braces. Note: compound statements serve two purposes. They delimit the scope of declarations, and allow us to use a statement blocks inside other statements, e.g., a statement block as the body of the then, or else branch of an if statement.
Return statement where the result of an expression is returned. (Recall that in this phase, the program being the body of the main function should return an integer.)

4.3. The Typed Abstract Syntax Tree (AST) of Dolphin (phase 1)#

The typed AST differs from the AST in that it includes all the necessary type information for (LLVM) code generation. The differences between the two notions of AST can be subtle. Please pay attention to the details.

The OCaml types for the typed AST describing programs is given below:

(* -- Use this in your solution without modifications *)
module Sym = Symbol

type ident = Ident of {sym : Sym.symbol}

type typ = Void | Int | Bool | ErrorType

type binop = Plus | Minus | Mul | Div | Rem | Lt | Le | Gt | Ge | Lor | Land | Eq | NEq

type unop = Neg | Lnot

type expr =
| Integer of {int : int64}
| Boolean of {bool : bool}
| BinOp of {left : expr; op : binop; right : expr; tp : typ}
| UnOp of {op : unop; operand : expr; tp : typ}
| Lval of lval
| Assignment of {lvl : lval; rhs : expr; tp : typ}
| Call of {fname : ident; args : expr list; tp : typ}
and lval =
| Var of {ident : ident; tp : typ}

type statement =
| VarDeclStm of {name : ident; tp : typ; body : expr}
| ExprStm of {expr : expr option}
| IfThenElseStm of {cond : expr; thbr : statement; elbro : statement option}
| CompoundStm of {stms : statement list}
| ReturnStm of {ret : expr}

type param = Param of {paramname : ident; typ : typ}

type funtype = FunTyp of {ret : typ; params : param list}

type program = statement list

Action item

Put the AST declarations above in a module called TypedAst. That is, make a file called typedAst.ml and copy the code above into it. Do not change the code above.

Below, we enumerate and explain the main differences between (untyped) ASTs we saw earlier and typed ASTs:

Types now include the Void type, and the ErrorType type. While the source (ASTs) do not have void in them, library functions, which are featured in this phase 1 Dolphin fragment, can involve the Void type. It is needed for error recovery where the compiler does not merely stop when it first encounters a type error in the program, but continues and reports all errors that it detects.
Identifiers, instead using strings for names, use symbols. Symbols are much more efficient to use than strings.
Many expressions, e.g., unary and binary operations, now also include a type. This extra type, is the type of the entire expression, that is, a binary operation for comparison, with both left and right expressions bing integers, will have its as Bool. Note also how variable declarations, which optionally included types in ASTs, always include a type in typed ASTs (it is not an option anymore). Note that assignments are also accompanied by a type. The reason for this is that an assignment, in addition to storing the right-hand-side value into the relevant memory address, also produces this right-hand-side value as a result. That is, the expression x = y = 1 + 4 is a valid expression, which should be understood in two steps, in the first step assigns 5 to y and results in 5, in the second step, the result of the first step, i.e., 5, is assigned to x. The following is also a valid expression: x = (y = 4 - 2) + 7. It assigns 2 to y and 9 to x.

Inclusion of the typing information allows one to directly determine the type of an expression with a shallow inspection of the expression when necessary, e.g., when we need to decide what LLVM type to use when generating LLVM-- code. This can be done using the following OCaml functions:

let rec type_of_expr = function
  | TAst.Integer _ -> TAst.Int
  | TAst.Boolean _ -> TAst.Bool
  | TAst.BinOp {tp; _} -> tp
  | TAst.UnOp {tp; _} -> tp
  | TAst.Lval lvl -> type_of_lval lvl
  | TAst.Assignment {tp; _} -> tp
  | TAst.Call {tp; _} -> tp
and type_of_lval = function
  | TAst.Var {tp; _} -> tp

4.4. Semantic analysis of Dolphin programs (phase 1)#

The first task asks you to implement semantic analysis. The semantic analysis must check the program semantically. That is, it must make sure that the program satisfies to the criteria for being a valid program that we can produce executable code for. At a high level view this process takes an AST (something of type Ast.program above) and produces a typed AST (something of type TypedAst.program above). This requires to determine all the necessary types to produce a typed AST, e.g., the types of all variable declaration, if omitted, must be inferred. This, however, is not the only requirement. There are a few extra requirements, some of which are specific to this phase. In particular, not all OCaml values of type TypedAst.program are semantically valid programs. The following are extra conditions, that should be guaranteed by semantic analysis, and can therefore be assumed when producing LLVM-- code.

No expression, except for a call to a function with Void return type can have type Void. This also includes the type (inferred for) declared variables. That is, a program var x = f(); where f is a function with Void return type is not semantically valid. Hence, one needs to be careful when inferring types.
Arithmetic operations Plus, Minus, Mul, Div, and Rem can only be applied to integer expressions and produce integer results.
Comparison operators Lt, Le, Gt, and Ge (in this phase) can only be applied to integer operands and result in a boolean.
Comparison operators Eq and NEq require the two operands to have the same (non-void) type and result in a boolean.
Boolean operations Lor and Land can only be applied to boolean operands and result in a boolean. These operators are short-circuiting (explained later).
Variables should be properly resolved and type checked. Note that this is a non-local requirement. In other words, semantic analysis must use environment to remember the type of all variables that are in scope.
A well-typed assignment must have a right-hand-side that has the same (non-void) type as the l-value it is being assigned to.
A valid call must correspond to a call to a known function, in this phase just library functions. The types and number of arguments must match the parameters of the function. Furthermore, the type of the overall expression must the same as the return type of the function.
The body given for variable declaration should match its type, if provided. If not provided, the type of the variable is inferred to be the type of the body expression which must not be void.
A valid expression statement can either be an assignment, or a function call. Any other expression is not considered valid as an expression statement.
The condition of an if statement must be an expression of type boolean. The then branch and the else branch (if present) must valid statements.
A compound statement is valid if all its statements are valid. Any variable declared inside a compound statement are only valid within that block (and obviously only after that declaration itself within the block).
Variables can shadow one another. That is, when used, a variable refers to its closest declaration, e.g., var x = 10; var x = 12; print_integer(x); print_integer(x); is valid and should print 12 twice, var x = 10; {var x = 12; print_integer(x);} print_integer(x); should print 12 followed by 10. However, var x = 10; {var x = 12;} print_integer(x); print_integer(x); should print 10 twice, etc.
Note that function names and local variables live in the same scope. This means that local variables can mask functions. That is, var f = 10; f(12); should always produce an error even if function f exists and has the appropriate type.
Return statements are only valid (in this phase) if the expression returned is an integer.
All programs (in this phase) must always end in a return statement (which must be an integer).

Note

In case of questions regarding ambiguity in the above semantic rules ask questions on the forum. We will try our best to answer promptly. In case necessary, e.g., if you are in doubt and there is no enough time (please do not leave assignments for the last minute!), use your best judgment (you must be able to reasonably defend it) and explain your reasoning in your report. We will do our best to be understanding.

Important

The details of type checking and type inference are discussed in class.

Task 1: Implement semantic analysis
Implement semantic analysis for phase 1 of Dolphin as described above.

The following template OCaml code is a good starting point to embark in this assignment. Here, we assume that a module Errors exists which declares an exception TypeError and a number of errors, e.g., the TypeMismatch error used below. This is explained below.

exception Unimplemented (* your code should eventually compile without this exception *)

let typecheck_typ = function
| Ast.Int -> TAst.Int
| Ast.Bool -> TAst.Bool

(* should return a pair of a typed expression and its inferred type. you can/should use typecheck_expr inside infertype_expr. *)
let rec infertype_expr env expr =
  match expr with
  | _ -> raise Unimplemented
and infertype_lval env lvl =
  match lvl with
  | _ -> raise Unimplemented
(* checks that an expression has the required type tp by inferring the type and comparing it to tp. *)
and typecheck_expr env expr tp =
  let texpr, texprtp = infertype_expr env expr in
  if texprtp <> tp then raise Unimplemented;
  texpr

(* should check the validity of a statement and produce the corresponding typed statement. Should use typecheck_expr and/or infertype_expr as necessary. *)
let rec typecheck_statement env stm =
  match stm with
  | _ -> raise Unimplemented
(* should use typecheck_statement to check the block of statements. *)
and typecheck_statement_seq env stms = raise Unimplemented

(* the initial environment should include all the library functions, no local variables, and no errors. *)
let initial_environment = raise Unimplemented

(* should check that the program (sequence of statements) ends in a return statement and make sure that all statements are valid as described in the assignment. Should use typecheck_statement_seq. *)
let typecheck_prog prg = raise Unimplemented

We recommend that you create three separate modules for this task. One module is for the semantic analysis following the example above; let us call this the Semant module. The other two should be the Errors module, and the Env module, roughly following the structures below. (These modules will be discussed further in class.)

(* Errors module *)
module Sym = Symbol
module TAst = TypedAst
module TPretty = TypedPretty

type error =
| TypeMismatch of {expected : TAst.typ; actual : TAst.typ}
(* other errors to be added as needed. *)

(* Useful for printing errors *)
let error_to_string err =
  match err with
  | TypeMismatch {expected; actual; _} -> Printf.sprintf "Type mismatch: expected %s but found %s." (TPretty.typ_to_string expected) (TPretty.typ_to_string actual)

(* Env module *)

exception Unimplemented (* your code should eventually compile without this exception *)

type environment = | (* to be defined *)

(* create an initial environment with the given functions defined *)
let make_env function_types = raise Unimplemented

(* insert a local declaration into the environment *)
let insert_local_decl env sym typ = raise Unimplemented

(* lookup variables and functions. Note: it must first look for a local variable and if not found then look for a function. *)
let lookup_var_fun env sym = raise Unimplemented

Question 1: Reflect on the usage of type inference as opposed to type checking in your implementation. Briefly describe what principle guides you in writing your code, that is when you decide to use one vs the other.
Question 2: Reflect on the error recovery strategy in your implementation. How do you implement it?

4.5. Code generation#

The compiler must generate valid LLVM code for all semantically valid programs. For this purpose we use the cfgBuilder module.

Task 2: Implement code generation
Implement code generation for phase 1 of Dolphin as described above.

There are a few key points to pay attention to when producing code for Dolphin programs:

Dolphin has a guaranteed order of evaluation: left to right. For instance, in binary operations, the left operand is executed first. Once the result of the first operand is obtained, then the program continues with executing the right operand, followed by the operation itself. Similarly, function arguments are computed from left to right before the function call.
Pay attention to the short-circuiting nature of boolean and (&&) and or (||) operations. These operations first compute the left operand. If the left operand determines the result of the operation (if it is false in case of && or true in case of ||), the right operand is not computed. This is the standard semantics for such operations. Here, the LLVM program generated must perform the analysis of branch appropriately so as to not compute the right operand, if not necessary. To maintain SSA form, we must either allocate temporary stack space for the result (as we have seen earlier), or use phi nodes (this is one of the glory points below).
As we have done before when translating programs by hand to LLVM, local variables are allocated on the stack (using LLVM--‘s alloca instruction). We need an environment to remember the assigned LLVM-- register for each variable and its LLVM-- type. This is in many ways similar to the environment we use for semantic analysis. However, we no longer need to have functions and their types in the environment. All the necessary information, i.e., the return type, and the type of arguments, is already stored in the typed AST.
Code generation for division must produce code that checks whether the divisor is zero. If so, it should print an error and exit. That is, the “division by zero” hardware exception should never occur in a Dolphin program.

Question 3 (glory **): Use phi nodes to re-implement short-circuiting behavior of boolean and and or operators.

Task 3: Testing and consolidation

In this task, we consolidate the two parts from the previous tasks, and test our project in an end-to-end fashion.

Write a top-level function compile_prog that given an AST, runs the semantic analysis on it. If there are any errors, they should be printed on standard error output, and the program exits with exit code 1. If there are no errors, it proceeds to generate the LLVM translation. The result of the translation should be output on standard output, and the program exits with exit code 0.
Write a test corpus consisting of 10 or more example programs that cover all the features of the language. It is important that the tests you write are relevant. Think about negative and positive tests. For negative tests, the critical aspect is the semantic analysis. In particular, if your semantic analysis reports a particular type of an error, your test corpus should include a program that exhibits that kind of error. For positive tests, your tests should cover all features of the language, i.e., all possible binary operations, calls to the standard library, etc.

Attention

Do not take this part of the assignment lightly. Use this task as a vehicle for discovering bugs and logical errors in your project.

4.6. Appendix#

You will need libraries printbox and printbox-text to use pretty printers below. These can be installed using opam using the following command opam install printbox printbox-text

These pretty printers produce a so-called box which is the terminology that printbox uses to refer to formatted, structured texts. A box ca be printed as follows:

PrintBox_text.output stdout (Pretty.program_to_tree prog)

This will print the AST of the program as a tree.

4.6.1. pretty printer for ASTs (module `Pretty`)#

module PBox = PrintBox
open Ast

(* producing trees for pretty printing *)
let typ_style = PBox.Style.fg_color PBox.Style.Green
let ident_style = PBox.Style.fg_color PBox.Style.Yellow
let fieldname_style = ident_style
let keyword_style = PBox.Style.fg_color PBox.Style.Blue

let info_node_style = PBox.Style.fg_color PBox.Style.Cyan

let make_typ_line name = PBox.line_with_style typ_style name
let make_fieldname_line name = PBox.line_with_style fieldname_style name
let make_ident_line name = PBox.line_with_style ident_style name
let make_keyword_line name = PBox.line_with_style keyword_style name

let make_info_node_line info = PBox.line_with_style info_node_style info

let ident_to_tree (Ident {name}) = make_ident_line name

let typ_to_tree tp =
  match tp with
  | Bool -> make_typ_line "Bool"
  | Int -> make_typ_line "Int"

let binop_to_tree op =
  match op with
  | Plus -> make_keyword_line "Plus"
  | Minus -> make_keyword_line "Minus"
  | Mul -> make_keyword_line "Mul"
  | Div -> make_keyword_line "Div"
  | Rem -> make_keyword_line "Rem"
  | Lt -> make_keyword_line "Lt"
  | Le -> make_keyword_line "Le"
  | Gt -> make_keyword_line "Gt"
  | Ge -> make_keyword_line "Ge"
  | Lor -> make_keyword_line "Lor"
  | Land -> make_keyword_line "Land"
  | Eq -> make_keyword_line "Eq"
  | NEq -> make_keyword_line "NEq"

let unop_to_tree op =
  match op with
  | Neg -> make_keyword_line "Neg"
  | Lnot -> make_keyword_line "Lnot"
  
let rec expr_to_tree e =
  match e with
  | Integer {int; _} -> PBox.hlist ~bars:false [make_info_node_line "IntLit("; PBox.line (Int64.to_string int); make_info_node_line ")"]
  | Boolean {bool; _} -> PBox.hlist ~bars:false [make_info_node_line "BooleanLit("; make_keyword_line (if bool then "true" else "false"); make_info_node_line ")"]
  | BinOp {left; op; right; _} -> PBox.tree (make_info_node_line "BinOp") [expr_to_tree left; binop_to_tree op; expr_to_tree right]
  | UnOp {op; operand; _} -> PBox.tree (make_info_node_line "UnOp") [unop_to_tree op; expr_to_tree operand]
  | Lval l -> PBox.tree (make_info_node_line "Lval") [lval_to_tree l]
  | Assignment {lvl; rhs; _} -> PBox.tree (make_info_node_line "Assignment") [lval_to_tree lvl; expr_to_tree rhs]
  | Call {fname; args; _} ->
    PBox.tree (make_info_node_line "Call")
      [PBox.hlist ~bars:false [make_info_node_line "FunName: "; ident_to_tree fname];
       PBox.tree (make_info_node_line "Args") (List.map (fun e -> expr_to_tree e) args)]
and lval_to_tree l =
  match l with
  | Var ident -> PBox.hlist ~bars:false [make_info_node_line "Var("; ident_to_tree ident; make_info_node_line ")"]

let rec statement_to_tree c =
  match c with
  | VarDeclStm {name; tp; body} -> PBox.tree (make_keyword_line "VarDeclStm") 
    [PBox.hlist ~bars:false [make_info_node_line "Ident: "; ident_to_tree name]; 
    PBox.hlist ~bars:false [make_info_node_line "Type: "; Option.fold ~none:PBox.empty ~some:typ_to_tree tp];
    PBox.hlist ~bars:false [make_info_node_line "Body: "; expr_to_tree body]]
  | ExprStm {expr; _} -> PBox.hlist ~bars:false [make_info_node_line "ExprStm: "; Option.fold ~none:PBox.empty ~some:expr_to_tree expr]
  | IfThenElseStm {cond; thbr; elbro; _} ->
    PBox.tree (make_keyword_line "IfStm")
      ([PBox.hlist ~bars:false [make_info_node_line "Cond: "; expr_to_tree cond]; PBox.hlist ~bars:false [make_info_node_line "Then-Branch: "; statement_to_tree thbr]] @
       match elbro with None -> [] | Some elbr -> [PBox.hlist ~bars:false [make_info_node_line "Else-Branch: "; statement_to_tree elbr]])
  | CompoundStm {stms; _} -> PBox.tree (make_info_node_line "CompoundStm") (statement_seq_to_forest stms)
  | ReturnStm {ret; _} -> PBox.hlist ~bars:false [make_keyword_line "ReturnValStm: "; expr_to_tree ret]
and statement_seq_to_forest stms = List.map statement_to_tree stms

let program_to_tree prog = 
  PBox.tree (make_info_node_line "Program") (statement_seq_to_forest prog)

4.6.2. pretty printer for ASTs (module `TypedPretty`)#

module Sym = Symbol
module PBox = PrintBox
open TypedAst

let typ_to_string = function
| Void -> "void"
| Int -> "int"
| Bool -> "bool"
| ErrorType -> "'type error'"

(* producing trees for pretty printing *)
let ident_to_tree (Ident {sym}) = Pretty.make_ident_line (Sym.name sym)

let typ_to_tree tp =
  match tp with
  | Void -> Pretty.make_typ_line "Void"
  | Int -> Pretty.make_typ_line "Int"
  | Bool -> Pretty.make_typ_line "Bool"
  | ErrorType -> PBox.line_with_style (PBox.Style.set_bg_color PBox.Style.Red PBox.Style.default) "ErrorType"

let binop_to_tree op =
  match op with
  | Plus -> Pretty.make_keyword_line "PLUS"
  | Minus -> Pretty.make_keyword_line "Minus"
  | Mul -> Pretty.make_keyword_line "Mul"
  | Div -> Pretty.make_keyword_line "Div"
  | Rem -> Pretty.make_keyword_line "Rem"
  | Lt -> Pretty.make_keyword_line "Lt"
  | Le -> Pretty.make_keyword_line "Le"
  | Gt -> Pretty.make_keyword_line "Gt"
  | Ge -> Pretty.make_keyword_line "Ge"
  | Lor -> Pretty.make_keyword_line "Lor"
  | Land -> Pretty.make_keyword_line "Land"
  | Eq -> Pretty.make_keyword_line "Eq"
  | NEq -> Pretty.make_keyword_line "NEq"

let unop_to_tree op =
  match op with
  | Neg -> Pretty.make_keyword_line "Neg"
  | Lnot -> Pretty.make_keyword_line "Lor"

let rec expr_to_tree e =
  match e with
  | Integer {int; _} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "IntLit("; PBox.line (Int64.to_string int); Pretty.make_info_node_line ")"]
  | Boolean {bool; _} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "BooleanLit("; Pretty.make_keyword_line (if bool then "true" else "false"); Pretty.make_info_node_line ")"]
  | BinOp {left; op; right; tp; _} -> PBox.tree (Pretty.make_info_node_line "BinOp") [typ_to_tree tp; expr_to_tree left; binop_to_tree op; expr_to_tree right]
  | UnOp {op; operand; tp; _} -> PBox.tree (Pretty.make_info_node_line "UnOp") [typ_to_tree tp; unop_to_tree op; expr_to_tree operand]
  | Lval l -> PBox.tree (Pretty.make_info_node_line "Lval") [lval_to_tree l]
  | Assignment {lvl; rhs; tp; _} -> PBox.tree (Pretty.make_info_node_line "Assignment") [typ_to_tree tp; lval_to_tree lvl; expr_to_tree rhs]
  | Call {fname; args; tp; _} ->
    PBox.tree (Pretty.make_info_node_line "Call")
      [typ_to_tree tp; 
      PBox.hlist ~bars:false [Pretty.make_info_node_line "FunName: "; ident_to_tree fname];
        PBox.tree (Pretty.make_info_node_line "Args") (List.map (fun e -> expr_to_tree e) args)]
and lval_to_tree l =
  match l with
  | Var {ident; tp} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "Var("; ident_to_tree ident; Pretty.make_info_node_line ")"; PBox.line " : "; typ_to_tree tp;]

let rec statement_to_tree c =
  match c with
  | VarDeclStm {name; tp; body; _} ->
    PBox.tree (Pretty.make_keyword_line "VarDeclStm")
      [PBox.hlist ~bars:false [Pretty.make_info_node_line "Ident: "; ident_to_tree name]; 
      PBox.hlist ~bars:false [Pretty.make_info_node_line "Type: "; typ_to_tree tp];
      PBox.hlist ~bars:false [Pretty.make_info_node_line "Body: "; expr_to_tree body]]
  | ExprStm {expr; _} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "ExprStm: "; Option.fold ~none:PBox.empty ~some:expr_to_tree expr]
  | IfThenElseStm {cond; thbr; elbro; _} ->
    PBox.tree (Pretty.make_keyword_line "IfStm")
      ([PBox.hlist ~bars:false [Pretty.make_info_node_line "Cond: "; expr_to_tree cond]; PBox.hlist ~bars:false [Pretty.make_info_node_line "Then-Branch: "; statement_to_tree thbr]] @
        match elbro with None -> [] | Some elbr -> [PBox.hlist ~bars:false [Pretty.make_info_node_line "Else-Branch: "; statement_to_tree elbr]])
  | CompoundStm {stms; _} -> PBox.tree (Pretty.make_info_node_line "CompoundStm") (statement_seq_to_forest stms)
  | ReturnStm {ret; _} -> PBox.hlist ~bars:false [Pretty.make_keyword_line "ReturnValStm: "; expr_to_tree ret]
and statement_seq_to_forest stms = List.map statement_to_tree stms

let program_to_tree prg =
  PBox.tree (Pretty.make_info_node_line "Program") (statement_seq_to_forest prg)

4.6.3. C runtime#

#include <stdio.h>         /* make sure these two includes are    */
#include <inttypes.h>      /* present in the start of your C file */

// your LLVM program must produce a function called dolphin_main of the following type.
extern int64_t dolphin_main();

int64_t read_integer () {
    int64_t value;
    printf("Please enter an integer: ");
    scanf("%" PRId64 "" , &value);
    return value;
}

void print_integer (int64_t value) {
    printf("%" PRId64 "\n" , value);
}

int main(){
    return dolphin_main();
}

The following OCaml code defines the type of the two library functions. Use this code, placed in a separate module, to bootstrap your semantic analysis and code generation.

module TAst = TypedAst
module Sym = Symbol

let make_ident name = TAst.Ident {sym = Sym.symbol name}

let library_functions =
  [
    (Symbol.symbol "read_integer", TAst.FunTyp {ret = TAst.Int; params = []});
    (Symbol.symbol "print_integer", TAst.FunTyp {ret = TAst.Void; params = [Param {paramname = make_ident "value"; typ = TAst.Int}]})
  ]

Dolphin – Phase 1

Contents

4. Dolphin – Phase 1#

4.1. Assignment overview#

4.1.1. What you need to get started#

4.1.2. What you need to hand in#

4.2. The Abstract Syntax Tree (AST) of Dolphin (phase 1)#

4.3. The Typed Abstract Syntax Tree (AST) of Dolphin (phase 1)#

4.4. Semantic analysis of Dolphin programs (phase 1)#

4.5. Code generation#

4.6. Appendix#

4.6.1. pretty printer for ASTs (module Pretty)#

4.6.2. pretty printer for ASTs (module TypedPretty)#

4.6.3. C runtime#

4.6.1. pretty printer for ASTs (module `Pretty`)#

4.6.2. pretty printer for ASTs (module `TypedPretty`)#