6. Dolphin – Phase 4#
Attention
This is a group assignment. The workload is calibrated for a group of 3. Please also see the recommended workflow section in the Appendix below.
In case of questions regarding ambiguity of what you should do, ask questions on the forum. If you are in doubt and there is no enough time, use your best judgment and explain your reasoning in your report.
6.1. Assignment overview#
This assignment builds on the previous three assignments: Phase 1, Phase 2, and Phase 3.
Phase 4 of Dolphin extends the language with two new features
Top-level functions, and
Comma expressions that are used to sequentialize expressions.
There are 5 tasks and no questions in this assignment. There is one glory task. Do it only if you have the time and feel ambitious.
6.1.1. What you need to get started#
This assignment is a continuation of the previous assignment. You will need to edit the code from the previous assignment.
You will need to understand Menhir parser generator. See Menhir documentation.
Important
You will need files deserializer.ml
and deserializer.mli
for Phase 4 as they have been released in the TA classes.
6.1.2. What you need to hand in#
Please hand in a .zip
file named group<XY>.zip
(replace <XY>
by your group number) containing the following
A brief report documenting your solution. Acceptable report formats are
.pdf
,.rtf
, and.md
. For each task and question, briefly (1 – 4 sentences) describe your implementation or answer. Write concisely.All the source files needed to reproduce your solution. This also includes the
C
code provided. Please explain in your report how the solution could be reproduced, e.g., callingmake
(if you have made aMakefile
), the command line to callclang
, etc. We recommend using thedune
anddune-project
released in class.All the tests that you create (see Task 5) should be placed into a directory
assignment-07-tests
as individual.dlp
files.
Before submitting, check that your zip file is good for submission with the script presub.sh
available in the docker container.
You can use it with the following command in your terminal: ./presub.sh <phase> <path_to_zip>
.
You must replace <phase>
by the phase of the assignment (1,2,3,4 or 5), and <path_to_zip>
by the path to you zip file.
It checks that:
all required files are present and not empty
your project compiles with
dune
your compiler can compile the simplest Dolphin source code
return 0;
.
Important
Make sure to understand all the code you hand in. (The code for the deserializer is an exception here.)
6.2. Functions in Dolphin#
Phase 4 Dolphin program consists of a list of one or more function declarations. The following applies to function declarations.
The syntax for function declarations is
<return-type> <function-name> ( <arguments> ) { <function-body> }
, where<arguments>
is a list of zero or more comma-separated pairs of the form<argument-name>: <argument-type>
. Note that both the return type and the argument types must be explicitly given. The following is an example of a declaration of function namedf
with return typeint
that takes two argumentsx
andy
, both of typeint
./* valid function declaration */ int f (x: int, y : int) { return x+y; }
The information about the function name, return type, and the argument number and types, is called function signature.
Exactly one function in the program must have name
main
with the return typeint
, and no arguments.Within a program, all function names must be unique.
Within a function, all argument names must be unique. For example, the following declaration is invalid
/* invalid function declaration - duplicate argument x */ int f (x: int, x : int) { return x; }
Within a function, lexical scoping rules apply.
Arguments can be modified in the function as if they were local variables.
/* valid program */ int f (x:int) { x = x * 10; return x; } int main () { var x:int = 1; var y = f (x); return x + y ; /* returns 11 */ }
Within a program, all functions are mutually recursive. For example, in the following program, function
odd
is accessible fromeven
despite their declaration order./* even-odd example - purpose: showcases mutual recursion - this program is valid. - returns 1 */ int even (x:int) { if (x == 0) { return 0; } else { return odd (x - 1); /* odd is in scope */ } } int odd (x: int) { if (x == 0) { return 1; } else { return even (x - 1); /* even is in scope */ } } int main () { return even (5); }
If the return type of the function is not
void
, all paths in the program must return. See, for example, functioneven
above. The type of the return statements must agree with the return-type of the function. In void functions, the return statement may be omitted.Variables and function names share their namespace.
6.3. Comma expressions#
Comma expressions are a form of expressions that are separated with a comma. The semantics is
that of sequencing the evaluation of the expressions. The expression to the left of the comma
is evaluated before the expression to the right of the comma. The result of the comma expression is that of the right subexpression. In other words, the result of the left-subexpression is ignored. Comma expressions are useful for their side-effects, e.g., assignments or side-effects through function calls. A common use case for comma
expressions is in the update expression in for
-loops.
int main () {
var _o = get_stdout ();
for ( var i: int = 0, j: int = 9
; i < 10
; i = i+1, j=j-1 /* obs: comma expression here */ )
{
output_string (int_to_string (i), _o);
output_string (" ", _o);
output_string (int_to_string (j), _o);
output_string ("\n", _o);
}
return 0;
}
The following applies to comma experissions.
In particular, comma expressions are not valid expression statements. That is, as in the earlier phases, a valid expression statement can either be an assignment, or a function call. In particular, the following program is invalid.
int main () {
var x = 0;
x = 1,x = 2; /* invalid expression statement */
/* the programmer should use semi-colon instead ;) */
return x;
}
The following program, however, is valid, because the comma expression appears as the right-hand side of the assignment expression statement.
/* valid program */
int main () {
var x = 0;
x = (1,2); /* valid expression statement */
return x; /* returns 2 */
}
Note
C allows comma expression statements.
Comma expressions cannot appear as the top-level expression in the arguments to function calls.
6.4. The Abstract Syntax Tree (AST) of Dolphin (phase 4)#
The structure of the new AST should follow the idea that a program is as list of functions as described earlier in the document.
Note that return statements should also now accept the form return;
for functions with return type void; this requires changing both the parser and the ReturnStm
entry in the AST type.
The OCaml types for the AST describing programs is given below:
(* -- Use this in your solution without modifications *)
module Loc = Location
type ident = Ident of {name : string; loc : Loc.location}
type typ =
| Int of {loc : Loc.location}
| Bool of {loc : Loc.location}
type rettyp =
| Void of {loc : Loc.location}
| RetTyp of typ
type binop =
| Plus of {loc : Loc.location}
| Minus of {loc : Loc.location}
| Mul of {loc : Loc.location}
| Div of {loc : Loc.location}
| Rem of {loc : Loc.location}
| Lt of {loc : Loc.location}
| Le of {loc : Loc.location}
| Gt of {loc : Loc.location}
| Ge of {loc : Loc.location}
| Lor of {loc : Loc.location}
| Land of {loc : Loc.location}
| Eq of {loc : Loc.location}
| NEq of {loc : Loc.location}
type unop =
| Neg of {loc : Loc.location}
| Lnot of {loc : Loc.location}
type expr =
| Integer of {int : int64; loc : Loc.location}
| Boolean of {bool : bool; loc : Loc.location}
| BinOp of {left : expr; op : binop; right : expr; loc : Loc.location}
| UnOp of {op : unop; operand : expr; loc : Loc.location}
| Lval of lval
| Assignment of {lvl : lval; rhs : expr; loc : Loc.location}
| Call of {fname : ident; args : expr list; loc : Loc.location}
| CommaExpr of {left : expr; right : expr; loc : Loc.location}
and lval =
| Var of ident
type single_declaration = Declaration of {name : ident; tp : typ option; body : expr; loc : Loc.location}
type declaration_block = DeclBlock of {declarations : single_declaration list; loc : Loc.location}
type for_init =
| FIExpr of expr
| FIDecl of declaration_block
type statement =
| VarDeclStm of declaration_block
| ExprStm of {expr : expr option; loc : Loc.location}
| IfThenElseStm of {cond : expr; thbr : statement; elbro : statement option; loc : Loc.location}
| WhileStm of {cond : expr; body : statement; loc : Loc.location}
| ForStm of {init : for_init option; cond : expr option; update : expr option; body : statement; loc : Loc.location}
| BreakStm of {loc : Loc.location}
| ContinueStm of {loc : Loc.location}
| CompoundStm of {stms : statement list; loc : Loc.location}
| ReturnStm of {ret : expr option; loc : Loc.location}
type fundecl = {ret : rettyp; funname : ident; params : (ident * typ) list; body : statement list; loc : Loc.location}
type program_fragment = FunDecl of fundecl
type program = program_fragment list
Action item
The AST declarations above should replace the contents of the module called Ast
.
Do not change the code above.
6.5. Parser#
Task 1: Extend your parser to support new language features
Note that lexer does not need to change.
6.6. Semantic analysis#
In order to tackle mutual recursion, the type checking can be done with two passes over the AST.
In the first pass, check the validity of all function signatures only. If signatures of all functions are valid, gather them into a function environment (similar to one for the build-in functions) that will be used in pass two.
In the second pass, check the validity of each function body using the environment created in the previous pass.
Task 2: Extend your semantic analysis to support new language features
Extend the error module as needed.
6.7. Code generation#
For code generation, we map each Dolphin function to an LLVM function. Each generated LLVM function
has the same number of arguments as the source Dolphin function that it corresponds to. Because arguments may be used as local variables, the generated function needs to copy them from the LLVM to stack alloca
-ted memory.
For example, the source level function
int f (x:int) { x = x * 10; return x; }
when translated to LLVM may look as follows
define i64 @dolphin_fun_f (i64 %x_arg) {
; Allocate memory for copying the argument %x_arg.
%x_local_arg_copy = alloca i64
; Copy the argument
store i64 %x_arg, i64* %x_local_arg_copy
; Identifier %x_arg is not used from this point on.
; The rest of the function uses %x_local_arg_copy.
...
}
Task 3: Design and implement code generation for Phase 4
You should have the functions compile_prog_from_ast
and compile_prog_from_filename
from the
previous assignment. If implemented properly before, these functions should not require any changes for this assignment.
You should be able to execute your compiler from the CLI, with the command dolphin compile --phase 4 <path_to_dolphin_program.dlp>
.
You can still test the phases of your compiler after the parser, from a serialized AST, with the command dolphin compile --rescue --phase 4 <path_to_serialised_ast.json>
.
6.8. Testing and consolidation#
Because of the changes in the syntax the tests from the previous phases will not immediately work. To
port them, wrap them as int main() { ... }
functions. Additionally, create at least 10 new test programs
that test the functionality of the new phase.
All the tests should be included into your submission as .dlp
files.
Task 4: Testing and consolidation
Port your old tests and create new ones as specified by the description above.
Task 5 (glory): Add support for tail call optimization
Full LLVM supports tail call annotations. See tail
or musttail
annotations on call instructions
in LLVM documentation. Add support for
tail-call optimization.
6.9. Appendix#
6.9.1. C runtime#
The C runtime we use has not changed compared to the previous assignment. See the corresponding description in the previous assignment
6.9.2. Recommended workflow#
Start by designing the Typed AST declarations for both functions and comma expressions.
Tackle function and comma expressions separately.
Start with functions, and proceed with modifying and debugging each phase that needs to be changed in this assignment. a. if you decide to split work among group members, each of the parser, semantic analysis, and code generator, can be worked on independently, for as long as know how to put together the individual pieces. b. if you decide to work on the assignment in the order of the phases, work from the parser to semantic analysis to code generation.
Move onto comma expressions, and implement support for them in the parser, semantic analysis, and code generation, as in the previous step.