5. Dolphin – Phase 2

Attention

This is a group assignment. The workload is calibrated for a group of 3.

If it is unclear what you should do, ask on the forum. If you are in doubt and there is not enough time, use your best judgment and explain your reasoning in your report.

5.1. Assignment overview

This assignment builds on the previous assignment and covers translating a fragment of the Dolphin language, in AST form, into LLVM--. This phase extends the language in two ways.

  1. the language is extended with loops.

  2. the language is extended to support multiple variable declarations in one declaration statement.

There are 3 tasks and no questions in this assignment. There are no glory questions either.

5.1.1. What you need to get started

This assignment is a continuation of the previous assignment. To get started, edit the code from the previous assignment.

5.1.2. What you need to hand in

Please hand in a .zip file containing the following:

  1. A brief report documenting your solution. Acceptable report formats are .pdf, .rtf, and .md. For each task and question, briefly (1 – 4 sentences) describe your implementation or answer. Write concisely.

  2. All the source files needed to reproduce your solution. This also includes the C code provided. Please explain in your report how the solution could be reproduced, e.g., calling make (if you have made a Makefile), the command line to call clang, etc.

Important

Make sure to understand all the code you hand in, including what is copied from here. (The code for pretty printing (typed) ASTs is an exception here; see the appendix below.)

5.2. The Abstract Syntax Tree (AST) of Dolphin (phase 2)

In this phase, a Dolphin program is still simply a sequence of statements. Recall that, intuitively, this is to be understood as the body of the main function.

The OCaml types for the AST describing programs are given below:

(* -- Use this in your solution without modifications *)
type ident = Ident of {name : string;}

type typ = | Int | Bool

type binop = | Plus | Minus | Mul | Div | Rem | Lt 
  | Le | Gt | Ge | Lor | Land | Eq | NEq

type unop = | Neg | Lnot

type expr =
| Integer of {int : int64}
| Boolean of {bool : bool}
| BinOp of {left : expr; op : binop; right : expr}
| UnOp of {op : unop; operand : expr}
| Lval of lval
| Assignment of {lvl : lval; rhs : expr;}
| Call of {fname : ident; args : expr list}
and lval =
| Var of ident

type single_declaration = 
  Declaration of {name : ident; tp : typ option; body : expr}

type declaration_block = DeclBlock of single_declaration list

type for_init =
| FIExpr of expr
| FIDecl of declaration_block

type statement =
| VarDeclStm of declaration_block
| ExprStm of {expr : expr option}
| IfThenElseStm of {cond : expr; thbr : statement; elbro : statement option}
| WhileStm of {cond : expr; body : statement}
| ForStm of { init : for_init option
            ; cond : expr option
            ; update : expr option
            ; body : statement }
| BreakStm
| ContinueStm
| CompoundStm of {stms : statement list}
| ReturnStm of {ret : expr}

type program = statement list

Action item

The AST declarations above should replace the contents of the module called Ast. Do not change the code above.

Note that in the AST above the types typ, binop, unop, expr, lval, and program have not changed at all. The VarDeclStm case, however, has changed. It no longer consists of an inlined record. Instead, it consists of a declaration_block, which in turn is a list of single_declarations. This change corresponds to allowing declaration statements to declare multiple variables in one statement, e.g.,

var x : int = 2, z = 5, w = true, t = x + z;

The meaning of multiple declarations is to be understood from left to right, and variables declared earlier can be used in the initialization expressions of later variables. The declaration statement above is identical in meaning to the following sequence of declarations.

var x : int = 2;
var z = 5;
var w = true;
var t = x + z;
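
To make this equivalence concrete, the following sketch splits a declaration block into one single-variable declaration statement per variable. The types are simplified, polymorphic stand-ins for those in the Ast module, and split_block is a hypothetical helper that is not part of the provided code.

```ocaml
(* Simplified stand-ins for the Ast types; 'd plays the role of single_declaration. *)
type 'd declaration_block = DeclBlock of 'd list
type 'd statement = VarDeclStm of 'd declaration_block

(* Hypothetical helper: turn one multi-variable declaration statement into
   the equivalent sequence of single-variable declaration statements,
   preserving the left-to-right order. *)
let split_block (DeclBlock decls) =
  List.map (fun d -> VarDeclStm (DeclBlock [d])) decls
```

An empty block yields an empty sequence of statements, which matches the reading that an empty declaration block declares nothing.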

Hint

There should be no major changes necessary, in either semantic analysis or code generation, to support multiple declarations in one statement, provided that variable declarations were handled properly in the previous phase. A simple fold over the list should suffice. Also note that there is no need to assume or enforce that the list is non-empty in this phase.
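
As an illustration of the fold mentioned in the hint, here is a minimal sketch of type checking a declaration block. It assumes a deliberately simplified expression type, an association list as the environment, and an exception for error reporting. All of these are hypothetical simplifications; your own environment and Errors module will look different.

```ocaml
(* Simplified stand-ins for the real AST and environment; illustration only. *)
type typ = Int | Bool
type expr = IntLit of int64 | BoolLit of bool | Var of string

(* One declaration: name, optional type annotation, initializer. *)
type decl = { name : string; tp : typ option; body : expr }

(* Environment: association list from variable names to their types. *)
type env = (string * typ) list

(* Hypothetical error reporting; a real solution would likely collect errors. *)
exception Type_error of string

let infer (env : env) = function
  | IntLit _ -> Int
  | BoolLit _ -> Bool
  | Var x ->
    (match List.assoc_opt x env with
     | Some t -> t
     | None -> raise (Type_error ("undeclared variable " ^ x)))

(* Process the declarations left to right, threading the environment so
   that later initializers can refer to earlier variables. *)
let check_block (env : env) (decls : decl list) : env =
  List.fold_left
    (fun env { name; tp; body } ->
       let inferred = infer env body in
       (match tp with
        | Some annotated when annotated <> inferred ->
          raise (Type_error ("type mismatch for " ^ name))
        | _ -> ());
       (name, inferred) :: env)
    env decls
```

Because the fold starts from the incoming environment, an empty list of declarations simply returns that environment unchanged, so no special case is needed.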

The type statement, in the AST declaration above, has four new cases: WhileStm (while loops), ForStm (for loops), BreakStm (the break statement), and ContinueStm (the continue statement).

  • The while loop construction carries a condition expression and a body which is a statement.

  • The for loop on the other hand is more involved. It carries an optional initialization, an optional condition expression, an optional update expression, and a body statement.

    The initialization, if present, is either a declaration block or an expression. If the initialization of a for loop is a declaration block, the scope of those declarations is the entire for loop, i.e., the condition, the update, and the body of the loop. If the initialization or the update part of the for loop is absent, it should be understood as doing nothing, similarly to an empty expression statement (no expression present).

    In case of the update expression, or the initialization (when it is an expression), there is no requirement about the type. They could be expressions of any type, as long as they are well-typed. The condition, on the other hand, should be a boolean expression when present. When absent, the condition is to be understood as true. As a simple example, the for loop for(;;){} with no initialization, condition, or update present is an infinite loop that does nothing, just like while(true){}. Do note that the condition of the while loop must always be present. This is reflected in the type of the AST.

  • The break and continue statements respectively break out of, and go to the beginning of, the current loop. Here, current loop means the innermost enclosing loop (either a while or a for). Note that break and continue statements should never appear outside a loop. Semantic analysis should check this. For the code generation of the continue statement it is important to note that continue behaves differently depending on whether its innermost enclosing loop is a for or a while loop. In case of a while loop, the continue statement causes the loop to start over, beginning with the evaluation of the loop condition. In case of a for loop, however, the continue statement causes the update clause to be evaluated before the loop condition is evaluated.
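
The difference between the two kinds of loops can be summarized by where break and continue jump to. The sketch below assumes that code generation has created a condition label, an update label (used only by for loops), and an exit label for the loop. The label type and all names here are hypothetical and not taken from the cfgBuilder module.

```ocaml
(* Hypothetical label representation; the real one may differ. *)
type label = string

type loop_kind = While | For

(* Jump targets for `break` and `continue` inside a loop body. *)
type loop_targets = { break_to : label; continue_to : label }

(* cond_lbl, update_lbl and exit_lbl are assumed to be fresh labels
   created for this particular loop by the code generator. *)
let targets_of kind ~cond_lbl ~update_lbl ~exit_lbl =
  match kind with
  | While ->
    (* continue re-evaluates the condition directly *)
    { break_to = exit_lbl; continue_to = cond_lbl }
  | For ->
    (* continue runs the update clause first, which then falls
       through (or jumps) to the condition *)
    { break_to = exit_lbl; continue_to = update_lbl }
```

In both cases break leaves the loop through the exit label; only the continue target depends on the kind of loop.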

Hint

In order to handle break and continue properly one needs to extend the environments. For semantic analysis, the environment should track whether the part of the program being analyzed is inside a loop or not. For code generation, the environment should track the label to jump to in case of break and continue.
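
For the semantic-analysis half of this hint, a minimal sketch could look as follows. It uses a heavily simplified statement type and hypothetical error constructors, purely to show how an in_loop flag in the environment is set when entering a loop body and consulted at break and continue.

```ocaml
(* Simplified statement type for illustration only; this is not the Ast module. *)
type stm =
  | Break
  | Continue
  | While of stm        (* loop body only; the condition is omitted *)
  | For of stm          (* loop body only; init, cond, update omitted *)
  | Block of stm list
  | Other               (* any statement that is irrelevant here *)

(* Hypothetical error constructors. *)
type error = BreakOutsideLoop | ContinueOutsideLoop

(* The part of the environment that matters for this check. *)
type env = { in_loop : bool }

let rec check env stm : error list =
  match stm with
  | Break -> if env.in_loop then [] else [BreakOutsideLoop]
  | Continue -> if env.in_loop then [] else [ContinueOutsideLoop]
  | While body | For body -> check { in_loop = true } body
  | Block stms -> List.concat_map (check env) stms
  | Other -> []
```

Since the flag is part of the environment passed down the recursion, statements after a loop are checked with the outer environment again and are correctly treated as being outside the loop.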

5.3. The Typed Abstract Syntax Tree (AST) of Dolphin (phase 2)

Recall that the typed AST differs from the AST in that it includes all the type information necessary for (LLVM) code generation. The differences between the two notions of AST can be subtle. Please pay attention to the details.

The OCaml types for the typed AST describing programs are given below:

(* -- Use this in your solution without modifications *)
module Sym = Symbol

type ident = Ident of {sym : Sym.symbol}

type typ = | Void | Int | Bool | ErrorType

type binop = | Plus | Minus | Mul | Div | Rem | Lt 
  | Le | Gt | Ge | Lor | Land | Eq | NEq

type unop = | Neg | Lnot

type expr =
| Integer of {int : int64}
| Boolean of {bool : bool}
| BinOp of {left : expr; op : binop; right : expr; tp : typ}
| UnOp of {op : unop; operand : expr; tp : typ}
| Lval of lval
| Assignment of {lvl : lval; rhs : expr; tp : typ}
| Call of {fname : ident; args : expr list; tp : typ}
and lval =
| Var of {ident : ident; tp : typ}

type single_declaration = Declaration of {name : ident; tp : typ; body : expr}

type declaration_block = DeclBlock of single_declaration list

type for_init =
| FIDecl of declaration_block
| FIExpr of expr

type statement =
| VarDeclStm of declaration_block
| ExprStm of {expr : expr option}
| IfThenElseStm of {cond : expr; thbr : statement; elbro : statement option}
| WhileStm of {cond : expr; body : statement}
| ForStm of { init : for_init option 
            ; cond : expr option
            ; update : expr option
            ; body : statement }
| BreakStm
| ContinueStm
| CompoundStm of {stms : statement list}
| ReturnStm of {ret : expr}

type param = Param of {paramname : ident; typ : typ}

type funtype = FunTyp of {ret : typ; params : param list}

type program = statement list

Action item

The AST declarations above should replace the contents of the module called TypedAst. Do not change the code above.

For an explanation of the main differences between the (untyped) ASTs we saw earlier and typed ASTs, consult the previous assignment.

5.4. Semantic analysis of Dolphin programs (phase 2)

The first task asks you to update the semantic analysis from the previous assignment to support the new extensions.

Task 1: Implement semantic analysis
Implement semantic analysis for phase 2 of Dolphin as described above. Extend the errors in the Errors module to accommodate new ways a program can be semantically malformed.

5.5. Code generation

Recall that the compiler must generate valid LLVM code for all semantically valid programs. For this purpose we use the cfgBuilder module.

Task 2: Implement code generation
Implement code generation for phase 2 of Dolphin as described above.

Task 3: Testing and consolidation

In this task, we consolidate the two parts from the previous tasks, and test our project in an end-to-end fashion.

  1. You should already have a top-level function compile_prog. If implemented properly before, this function should not require any changes for this assignment. In other words, ensure the following behavior for compile_prog.

    • Given an AST, compile_prog runs the semantic analysis on it. If there are any errors, they should be printed on standard error output, and the program should exit with exit code 1. If there are no errors, it proceeds to generate the LLVM translation. The result of the translation should be output on standard output, and the program exits with exit code 0.

  2. Write a test corpus consisting of 10 or more example programs that cover all the new features of the language. It is important that the tests you write are relevant. Think about negative and positive tests. For negative tests, the critical aspect is the semantic analysis. In particular, if your semantic analysis reports a particular kind of error, your test corpus should include a program that exhibits that kind of error. For positive tests, your tests should cover all new features of the language, i.e., all possible cases of using (and nesting) for and while loops, as well as the break and continue statements in them, etc. Do remember to produce positive and negative tests for declaration statements declaring multiple variables.

    Attention

    Do not take this part of the assignment lightly. Use this task as a vehicle for discovering bugs and logical errors in your project.
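
The behavior required of compile_prog in item 1 of this task can be sketched as below. This is only an outline under stated assumptions: analyze and codegen are hypothetical stand-ins for your semantic analysis and code generation, errors are assumed to be already rendered as strings, and the function returns the exit code instead of exiting so that it can be tested.

```ocaml
(* Hypothetical driver. `analyze` is assumed to return the typed program
   together with a list of error messages; `codegen` is assumed to return
   the LLVM translation as a string. `out` and `err` write to standard
   output and standard error, respectively. *)
let compile_with ~analyze ~codegen ~out ~err prog =
  match analyze prog with
  | typed_prog, [] ->
    (* No errors: emit the translation and report success. *)
    out (codegen typed_prog);
    0
  | _, errors ->
    (* At least one error: report all of them and signal failure. *)
    List.iter err errors;
    1
```

A top-level compile_prog could then be obtained by supplying print_string for out and prerr_endline for err, and passing the returned code to exit.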

5.6. Appendix

Recall that you will need the libraries printbox and printbox-text to use the pretty printers below. These can be installed with opam using the following command: opam install printbox printbox-text

These pretty printers produce a so-called box, which is the term printbox uses for formatted, structured text. A box can be printed as follows:

PrintBox_text.output stdout (Pretty.program_to_tree prog)

This will print the AST of the program as a tree.

5.6.1. Pretty printer for ASTs (module Pretty)

module PBox = PrintBox
open Ast

(* producing trees for pretty printing *)
let typ_style = PBox.Style.fg_color PBox.Style.Green
let ident_style = PBox.Style.fg_color PBox.Style.Yellow
let fieldname_style = ident_style
let keyword_style = PBox.Style.fg_color PBox.Style.Blue

let info_node_style = PBox.Style.fg_color PBox.Style.Cyan

let make_typ_line name = PBox.line_with_style typ_style name
let make_fieldname_line name = PBox.line_with_style fieldname_style name
let make_ident_line name = PBox.line_with_style ident_style name
let make_keyword_line name = PBox.line_with_style keyword_style name

let make_info_node_line info = PBox.line_with_style info_node_style info

let ident_to_tree (Ident {name}) = make_ident_line name

let typ_to_tree tp =
  match tp with
  | Bool -> make_typ_line "Bool"
  | Int -> make_typ_line "Int"

let binop_to_tree op =
  match op with
  | Plus -> make_keyword_line "PLUS"
  | Minus -> make_keyword_line "Minus"
  | Mul -> make_keyword_line "Mul"
  | Div -> make_keyword_line "Div"
  | Rem -> make_keyword_line "Rem"
  | Lt -> make_keyword_line "Lt"
  | Le -> make_keyword_line "Le"
  | Gt -> make_keyword_line "Gt"
  | Ge -> make_keyword_line "Ge"
  | Lor -> make_keyword_line "Lor"
  | Land -> make_keyword_line "Land"
  | Eq -> make_keyword_line "Eq"
  | NEq -> make_keyword_line "NEq"

let unop_to_tree op =
  match op with
  | Neg -> make_keyword_line "Neg"
  | Lnot -> make_keyword_line "Lnot"
  
let rec expr_to_tree e =
  match e with
  | Integer {int; _} -> PBox.hlist ~bars:false [make_info_node_line "IntLit("; PBox.line (Int64.to_string int); make_info_node_line ")"]
  | Boolean {bool; _} -> PBox.hlist ~bars:false [make_info_node_line "BooleanLit("; make_keyword_line (if bool then "true" else "false"); make_info_node_line ")"]
  | BinOp {left; op; right; _} -> PBox.tree (make_info_node_line "BinOp") [expr_to_tree left; binop_to_tree op; expr_to_tree right]
  | UnOp {op; operand; _} -> PBox.tree (make_info_node_line "UnOp") [unop_to_tree op; expr_to_tree operand]
  | Lval l -> PBox.tree (make_info_node_line "Lval") [lval_to_tree l]
  | Assignment {lvl; rhs; _} -> PBox.tree (make_info_node_line "Assignment") [lval_to_tree lvl; expr_to_tree rhs]
  | Call {fname; args; _} ->
    PBox.tree (make_info_node_line "Call")
      [PBox.hlist ~bars:false [make_info_node_line "FunName: "; ident_to_tree fname];
       PBox.tree (make_info_node_line "Args") (List.map (fun e -> expr_to_tree e) args)]
and lval_to_tree l =
  match l with
  | Var ident -> PBox.hlist ~bars:false [make_info_node_line "Var("; ident_to_tree ident; make_info_node_line ")"]

let single_declaration_to_tree (Declaration {name; tp; body; _}) =
  PBox.tree (make_keyword_line "Declaration") 
    [PBox.hlist ~bars:false [make_info_node_line "Ident: "; ident_to_tree name]; 
    PBox.hlist ~bars:false [make_info_node_line "Type: "; Option.fold ~none:PBox.empty ~some:typ_to_tree tp];
    PBox.hlist ~bars:false [make_info_node_line "Body: "; expr_to_tree body]]

let declaration_block_to_tree (DeclBlock declarations) =
  PBox.tree (make_keyword_line "VarDecl") (List.map single_declaration_to_tree declarations)

let for_init_to_tree = function
| FIDecl db -> PBox.hlist ~bars:false [PBox.line "ForInitDecl: "; declaration_block_to_tree db]
| FIExpr e -> PBox.hlist ~bars:false [PBox.line "ForInitExpr: "; expr_to_tree e]

let rec statement_to_tree c =
  match c with
  | VarDeclStm db -> PBox.hlist ~bars:false [PBox.line "DeclStm: "; declaration_block_to_tree db]
  | ExprStm {expr} -> PBox.hlist ~bars:false [make_info_node_line "ExprStm: "; Option.fold ~none:PBox.empty ~some:expr_to_tree expr]
  | IfThenElseStm {cond; thbr; elbro} ->
    PBox.tree (make_keyword_line "IfStm")
      ([PBox.hlist ~bars:false [make_info_node_line "Cond: "; expr_to_tree cond]; PBox.hlist ~bars:false [make_info_node_line "Then-Branch: "; statement_to_tree thbr]] @
       match elbro with None -> [] | Some elbr -> [PBox.hlist ~bars:false [make_info_node_line "Else-Branch: "; statement_to_tree elbr]])
  | WhileStm {cond; body} ->
    PBox.tree (make_keyword_line "WhileStm") 
      [PBox.hlist ~bars:false [make_info_node_line "Cond: "; expr_to_tree cond];
        PBox.hlist ~bars:false [make_info_node_line "Body: "; statement_to_tree body]]
  | ForStm {init; cond; update; body} ->
    PBox.tree (make_keyword_line "ForStm") 
      [PBox.hlist ~bars:false [make_info_node_line "Init: "; Option.fold ~none:PBox.empty ~some:for_init_to_tree init];
        PBox.hlist ~bars:false [make_info_node_line "Cond: "; Option.fold ~none:PBox.empty ~some:expr_to_tree cond];
        PBox.hlist ~bars:false [make_info_node_line "Update: "; Option.fold ~none:PBox.empty ~some:expr_to_tree update];
        PBox.hlist ~bars:false [make_info_node_line "Body: "; statement_to_tree body]]
  | BreakStm -> make_keyword_line "BreakStm"
  | ContinueStm -> make_keyword_line "ContinueStm"
  | CompoundStm {stms} -> PBox.tree (make_info_node_line "CompoundStm") (statement_seq_to_forest stms)
  | ReturnStm {ret} -> PBox.hlist ~bars:false [make_keyword_line "ReturnValStm: "; expr_to_tree ret]
and statement_seq_to_forest stms = List.map statement_to_tree stms

let program_to_tree prog = 
  PBox.tree (make_info_node_line "Program") (statement_seq_to_forest prog)

5.6.2. Pretty printer for typed ASTs (module TypedPretty)

module Sym = Symbol
module PBox = PrintBox
open TypedAst

let typ_to_string = function
| Void -> "void"
| Int -> "int"
| Bool -> "bool"
| ErrorType -> "'type error'"

(* producing trees for pretty printing *)
let ident_to_tree (Ident {sym}) = Pretty.make_ident_line (Sym.name sym)

let typ_to_tree tp =
  match tp with
  | Void -> Pretty.make_typ_line "Void"
  | Int -> Pretty.make_typ_line "Int"
  | Bool -> Pretty.make_typ_line "Bool"
  | ErrorType -> PBox.line_with_style (PBox.Style.set_bg_color PBox.Style.Red PBox.Style.default) "ErrorType"

let binop_to_tree op =
  match op with
  | Plus -> Pretty.make_keyword_line "PLUS"
  | Minus -> Pretty.make_keyword_line "Minus"
  | Mul -> Pretty.make_keyword_line "Mul"
  | Div -> Pretty.make_keyword_line "Div"
  | Rem -> Pretty.make_keyword_line "Rem"
  | Lt -> Pretty.make_keyword_line "Lt"
  | Le -> Pretty.make_keyword_line "Le"
  | Gt -> Pretty.make_keyword_line "Gt"
  | Ge -> Pretty.make_keyword_line "Ge"
  | Lor -> Pretty.make_keyword_line "Lor"
  | Land -> Pretty.make_keyword_line "Land"
  | Eq -> Pretty.make_keyword_line "Eq"
  | NEq -> Pretty.make_keyword_line "NEq"

let unop_to_tree op =
  match op with
  | Neg -> Pretty.make_keyword_line "Neg"
  | Lnot -> Pretty.make_keyword_line "Lnot"

let rec expr_to_tree e =
  match e with
  | Integer {int; _} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "IntLit("; PBox.line (Int64.to_string int); Pretty.make_info_node_line ")"]
  | Boolean {bool; _} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "BooleanLit("; Pretty.make_keyword_line (if bool then "true" else "false"); Pretty.make_info_node_line ")"]
  | BinOp {left; op; right; tp; _} -> PBox.tree (Pretty.make_info_node_line "BinOp") [typ_to_tree tp; expr_to_tree left; binop_to_tree op; expr_to_tree right]
  | UnOp {op; operand; tp; _} -> PBox.tree (Pretty.make_info_node_line "UnOp") [typ_to_tree tp; unop_to_tree op; expr_to_tree operand]
  | Lval l -> PBox.tree (Pretty.make_info_node_line "Lval") [lval_to_tree l]
  | Assignment {lvl; rhs; tp; _} -> PBox.tree (Pretty.make_info_node_line "Assignment") [typ_to_tree tp; lval_to_tree lvl; expr_to_tree rhs]
  | Call {fname; args; tp; _} ->
    PBox.tree (Pretty.make_info_node_line "Call")
      [typ_to_tree tp; 
      PBox.hlist ~bars:false [Pretty.make_info_node_line "FunName: "; ident_to_tree fname];
        PBox.tree (Pretty.make_info_node_line "Args") (List.map (fun e -> expr_to_tree e) args)]
and lval_to_tree l =
  match l with
  | Var {ident; tp} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "Var("; ident_to_tree ident; Pretty.make_info_node_line ")"; PBox.line " : "; typ_to_tree tp;]

let single_declaration_to_tree (Declaration {name; tp; body; _}) =
  PBox.tree (Pretty.make_keyword_line "Declaration") 
    [PBox.hlist ~bars:false [Pretty.make_info_node_line "Ident: "; ident_to_tree name]; 
    PBox.hlist ~bars:false [Pretty.make_info_node_line "Type: "; typ_to_tree tp];
    PBox.hlist ~bars:false [Pretty.make_info_node_line "Body: "; expr_to_tree body]]

let declaration_block_to_tree (DeclBlock declarations) =
  PBox.tree (Pretty.make_keyword_line "VarDecl") (List.map single_declaration_to_tree declarations)

let for_init_to_tree = function
| FIDecl db -> PBox.hlist ~bars:false [PBox.line "ForInitDecl: "; declaration_block_to_tree db]
| FIExpr e -> PBox.hlist ~bars:false [PBox.line "ForInitExpr: "; expr_to_tree e]

let rec statement_to_tree c =
  match c with
  | VarDeclStm db -> PBox.hlist ~bars:false [PBox.line "DeclStm: "; declaration_block_to_tree db]
  | ExprStm {expr; _} -> PBox.hlist ~bars:false [Pretty.make_info_node_line "ExprStm: "; Option.fold ~none:PBox.empty ~some:expr_to_tree expr]
  | IfThenElseStm {cond; thbr; elbro; _} ->
    PBox.tree (Pretty.make_keyword_line "IfStm")
      ([PBox.hlist ~bars:false [Pretty.make_info_node_line "Cond: "; expr_to_tree cond]; PBox.hlist ~bars:false [Pretty.make_info_node_line "Then-Branch: "; statement_to_tree thbr]] @
        match elbro with None -> [] | Some elbr -> [PBox.hlist ~bars:false [Pretty.make_info_node_line "Else-Branch: "; statement_to_tree elbr]])
  | WhileStm {cond; body; _} ->
    PBox.tree (Pretty.make_keyword_line "WhileStm") 
      [PBox.hlist ~bars:false [Pretty.make_info_node_line "Cond: "; expr_to_tree cond];
        PBox.hlist ~bars:false [Pretty.make_info_node_line "Body: "; statement_to_tree body]]
  | ForStm {init; cond; update; body; _} ->
    PBox.tree (Pretty.make_keyword_line "ForStm") 
      [PBox.hlist ~bars:false [Pretty.make_info_node_line "Init: "; Option.fold ~none:PBox.empty ~some:for_init_to_tree init];
        PBox.hlist ~bars:false [Pretty.make_info_node_line "Cond: "; Option.fold ~none:PBox.empty ~some:expr_to_tree cond];
        PBox.hlist ~bars:false [Pretty.make_info_node_line "Update: "; Option.fold ~none:PBox.empty ~some:expr_to_tree update];
        PBox.hlist ~bars:false [Pretty.make_info_node_line "Body: "; statement_to_tree body]]
  | BreakStm -> Pretty.make_keyword_line "BreakStm"
  | ContinueStm -> Pretty.make_keyword_line "ContinueStm"
  | CompoundStm {stms; _} -> PBox.tree (Pretty.make_info_node_line "CompoundStm") (statement_seq_to_forest stms)
  | ReturnStm {ret; _} -> PBox.hlist ~bars:false [Pretty.make_keyword_line "ReturnValStm: "; expr_to_tree ret]
and statement_seq_to_forest stms = List.map statement_to_tree stms

let program_to_tree prg =
  PBox.tree (Pretty.make_info_node_line "Program") (statement_seq_to_forest prg)

5.6.3. C runtime

The C runtime we use has not changed compared to the previous assignment. See the corresponding description in the previous assignment.