7. Dolphin – Phase 4#
Attention
This is a group assignment. The workload is calibrated for a group of 3. Please also see the recommended workflow section in the Appendix below.
In case of questions regarding ambiguity of what you should do, ask questions on the forum. If you are in doubt and there is no enough time, use your best judgment and explain your reasoning in your report.
7.1. Assignment overview#
This assignment builds on the previous three assignments: Phase 1, Phase 2, and Phase 3.
Phase 4 of Dolphin extends the language with two new features
Top-level functions, and
Comma expressions that are used to sequentialize expressions.
There are 5 tasks and no questions in this assignment. There is one glory task. Do it only if you have the time and feel ambitious.
7.1.1. What you need to get started#
This assignment is a continuation of the previous assignment. You will need to edit the code from the previous assignment.
You will need to understand Menhir parser generator. See Menhir documentation.
7.1.2. What you need to hand in#
Please hand in a .zip
file containing the following.
A brief report documenting your solution. Acceptable report formats are
.pdf
,.rtf
, and.md
. For each task and question, briefly (1 – 4 sentences) describe your implementation and answer. Write concisely.All the source files needed to reproduce your solution. This also includes the
C
code provided. Please explain in your report how the solution could be reproduced, e.g., callingmake
(if you have created aMakefile
), the command line to callclang
, etc.All the tests that you create (see Task 5) should be placed into a directory
assignment-07-tests
as individual.dlp
files.
Important
Make sure to understand all the code you hand in.
7.2. Functions in Dolphin#
Phase 4 Dolphin program consists of a list of one or more function declarations. The following applies to function declarations.
The syntax for function declarations is
<return-type> <function-name> ( <arguments> ) { <function-body> }
, where<arguments>
is a list of zero or more comma-separated pairs of the form<argument-name>: <argument-type>
. Note that both the return type and the argument types must be explicitly given. The following is an example of a declaration of function namedf
with return typeint
that takes two argumentsx
andy
, both of typeint
./* valid function declaration */ int f (x: int, y : int) { return x+y; }
The information about the function name, return type, and the argument number and types, is called function signature.
Exactly one function in the program must have name
main
with the return typeint
, and no arguments.Within a program, all function names must be unique.
Within a function, all argument names must be unique. For example, the following declaration is invalid
/* invalid function declaration - duplicate argument x */ int f (x: int, x : int) { return x; }
Within a function, lexical scoping rules apply.
Arguments can be modified in the function as if they were local variables.
/* valid program */ int f (x:int) { x = x * 10; return x; } int main () { var x:int = 1; var y = f (x); return x + y ; /* returns 11 */ }
Within a program, all functions are mutually recursive. For example, in the following program, function
odd
is accessible fromeven
despite their declaration order./* even-odd example - purpose: showcases mutual recursion - this program is valid. - returns 1 */ int even (x:int) { if (x == 0) { return 0; } else { return odd (x - 1); /* odd is in scope */ } } int odd (x: int) { if (x == 0) { return 1; } else { return even (x - 1); /* even is in scope */ } } int main () { return even (5); }
If the return type of the function is not
void
, all paths in the program must return. See, for example, functioneven
above. The type of the return statements must agree with the return-type of the function. In void functions, the return statement may be omitted.Variables and function names share their namespace.
7.3. Comma expressions#
Comma expressions are a form of expressions that are separated with a comma. The semantics is
that of sequencing the evaluation of the expressions. The expression to the left of the comma
is evaluated before the expression to the right of the comma. The result of the comma expression is that of the right subexpression. In other words, the result of the left-subexpression is ignored. Comma expressions are useful for their side-effects, e.g., assignments or side-effects through function calls. A common use case for comma
expressions is in the update expression in for
-loops.
int main () {
var _o = get_stdout ();
for ( var i: int = 0, j: int = 9
; i < 10
; i = i+1, j=j-1 /* obs: comma expression here */ )
{
output_string (int_to_string (i), _o);
output_string (" ", _o);
output_string (int_to_string (j), _o);
output_string ("\n", _o);
}
return 0;
}
The following applies to comma experissions.
In particular, comma expressions are not valid expression statements. That is, as in the earlier phases, a valid expression statement can either be an assignment, or a function call. In particular, the following program is invalid.
int main () {
var x = 0;
x = 1,x = 2; /* invalid expression statement */
/* the programmer should use semi-colon instead ;) */
return x;
}
The following program, however, is valid, because the comma expression appears as the right-hand side of the assignment expression statement.
/* valid program */
int main () {
var x = 0;
x = (1,2); /* valid expression statement */
return x; /* returns 2 */
}
Note
C allows comma expression statements.
Comma expressions cannot appear as the top-level expression in the arguments to function calls.
7.4. Abstract syntax trees#
Task 1: Design and implement an AST and Typed AST for Phase 4
The structure of the new AST should follow the idea that a program is as list of functions as described earlier in the document.
7.5. Parser#
Task 2: Extend your parser to support new language features
Note that lexer does not need to change.
7.6. Semantic analysis#
In order to tackle mutual recursion, the type checking can be done with two passes over the AST.
In the first pass, check the validity of all function signatures only. If signatures of all functions are valid, gather them into a function environment (similar to one for the build-in functions) that will be used in pass two.
In the second pass, check the validity of each function body using the environment created in the previous pass.
Task 3: Extend your semantic analysis to support new language features
Extend the error module as needed.
7.7. Code generation#
For code generation, we map each Dolphin function to an LLVM function. Each generated LLVM function
has the same number of arguments as the source Dolphin function that it corresponds to. Because arguments may be used as local variables, the generated function needs to copy them from the LLVM to stack alloca
-ted memory.
For example, the source level function
int f (x:int) { x = x * 10; return x; }
when translated to LLVM may look as follows
define i64 @dolphin_fun_f (i64 %x_arg) {
; Allocate memory for copying the argument %x_arg.
%x_local_arg_copy = alloca i64
; Copy the argument
store i64 %x_arg, i64* %x_local_arg_copy
; Identifier %x_arg is not used from this point on.
; The rest of the function uses %x_local_arg_copy.
...
}
Task 4: Design and implement code generation for Phase 4
7.8. Testing and consolidation#
Because of the changes in the syntax the tests from the previous phases will not immediately work. To
port them, wrap them as int main() { ... }
functions. Additionally, create at least 10 new test programs
that test the functionality of the new phase.
All the tests should be included into your submission as .dlp
files.
Task 5: Testing and consolidation
Port your old tests and create new ones as specified by the description above.
Task 6 (glory): Add support for tail call optimization
Full LLVM supports tail call annotations. See tail
or musttail
annotations on call instructions
in LLVM documentation. Add support for
tail-call optimization.
7.9. Appendix#
7.9.1. C runtime#
The C runtime we use has not changed compared to the previous assignment. See the corresponding description in the previous assignment
7.9.2. Recommended workflow#
Start by designing the AST and the Typed AST declarations for both functions and comma expressions.
Tackle function and comma expressions separately.
Start with functions, and proceed with modifying and debugging each phase that needs to be changed in this assignment. a. if you decide to split work among group members, each of the parser, semantic analysis, and code generator, can be worked on independently, for as long as you adhere to the AST interfaces, and know how to put together the individual pieces. b. if you decide to work on the assignment in the order of the phases, work from the parser to semantic analysis to code generation.
Move onto comma expressions, and implement support for them in the parser, semantic analysis, and code generation, as in the previous step.