Dolphin – Phase 4

6. Dolphin – Phase 4#

Attention

This is a group assignment. The workload is calibrated for a group of 3. Please also see the recommended workflow section in the Appendix below.

In case of questions regarding ambiguity of what you should do, ask questions on the forum. If you are in doubt and there is no enough time, use your best judgment and explain your reasoning in your report.

6.1. Assignment overview#

This assignment builds on the previous three assignments: Phase 1, Phase 2, and Phase 3.

Phase 4 of Dolphin extends the language with two new features

Top-level functions, and
Comma expressions that are used to sequentialize expressions.

There are 5 tasks and no questions in this assignment. There is one glory task. Do it only if you have the time and feel ambitious.

6.1.1. What you need to get started#

This assignment is a continuation of the previous assignment. You will need to edit the code from the previous assignment.
You will need to understand Menhir parser generator. See Menhir documentation.

6.1.2. What you need to hand in#

Please hand in a .zip file containing the following.

A brief report documenting your solution. Acceptable report formats are .pdf, .rtf, and .md. For each task and question, briefly (1 – 4 sentences) describe your implementation and answer. Write concisely.
All the source files needed to reproduce your solution. This also includes the C code provided. Please explain in your report how the solution could be reproduced, e.g., calling make (if you have created a Makefile), the command line to call clang, etc.
All the tests that you create (see Task 5) should be placed into a directory assignment-07-tests as individual .dlp files.

Important

Make sure to understand all the code you hand in.

6.2. Functions in Dolphin#

Phase 4 Dolphin program consists of a list of one or more function declarations. The following applies to function declarations.

The syntax for function declarations is <return-type> <function-name> ( <arguments> ) { <function-body> }, where <arguments> is a list of zero or more comma-separated pairs of the form <argument-name>: <argument-type>. Note that both the return type and the argument types must be explicitly given. The following is an example of a declaration of function named f with return type int that takes two arguments x and y, both of type int.
```
/* valid function declaration */
int f (x: int, y : int) {
 return x+y;
} 
```
The information about the function name, return type, and the argument number and types, is called function signature.
Exactly one function in the program must have name main with the return type int, and no arguments.
Within a program, all function names must be unique.

Within a function, all argument names must be unique. For example, the following declaration is invalid

/* invalid function declaration 
   - duplicate argument x 
*/
int f (x: int, x : int) {
 return x;
} 

Within a function, lexical scoping rules apply.

Arguments can be modified in the function as if they were local variables.

/* valid program */
int f (x:int) {
    x = x * 10; 
    return x;
}

int main () {
    var x:int = 1;
    var y = f (x);
    return x + y ; /* returns 11 */
}

Within a program, all functions are mutually recursive. For example, in the following program, function odd is accessible from even despite their declaration order.

/* even-odd example 
   - purpose: showcases mutual recursion 
   - this program is valid.
   - returns 1 
*/
int even (x:int) {
  if (x == 0) {
    return 0;
  } else {
    return odd (x - 1);   /* odd is in scope */
  }
}

int odd (x: int) {
  if (x == 0) {
    return 1;
  } else { 
    return even (x - 1);  /* even is in scope */
  }
}

int main () {
  return even (5);
}

If the return type of the function is not void, all paths in the program must return. See, for example, function even above. The type of the return statements must agree with the return-type of the function. In void functions, the return statement may be omitted.
Variables and function names share their namespace.

6.3. Comma expressions#

Comma expressions are a form of expressions that are separated with a comma. The semantics is that of sequencing the evaluation of the expressions. The expression to the left of the comma is evaluated before the expression to the right of the comma. The result of the comma expression is that of the right subexpression. In other words, the result of the left-subexpression is ignored. Comma expressions are useful for their side-effects, e.g., assignments or side-effects through function calls. A common use case for comma expressions is in the update expression in for-loops.

int main () {
    var _o = get_stdout ();
    for ( var i: int = 0, j: int = 9
         ; i < 10 
         ; i = i+1, j=j-1 /* obs: comma expression here */ )
   {
        output_string (int_to_string (i), _o);
        output_string (" ", _o);
        output_string (int_to_string (j), _o);
        output_string ("\n", _o);
   }
    return 0;
}

The following applies to comma experissions.

In particular, comma expressions are not valid expression statements. That is, as in the earlier phases, a valid expression statement can either be an assignment, or a function call. In particular, the following program is invalid.

int main () {
    var x = 0;
    x = 1,x = 2; /* invalid expression statement                      */
                 /* the programmer should use semi-colon instead ;)   */
    return x;
}

The following program, however, is valid, because the comma expression appears as the right-hand side of the assignment expression statement.

/* valid program */
int main () {
    var x = 0;
    x = (1,2); /* valid expression statement */
    return x;  /* returns 2 */
}

Note

C allows comma expression statements.

Comma expressions cannot appear as the top-level expression in the arguments to function calls.

6.4. Abstract syntax trees#

Task 1: Design and implement an AST and Typed AST for Phase 4
The structure of the new AST should follow the idea that a program is as list of functions as described earlier in the document. Note that return statements should also now accept the form return; for functions with return type void; this requires changing both the parser and the ReturnStm entry in the AST type.

6.5. Parser#

Task 2: Extend your parser to support new language features
Note that lexer does not need to change.

6.6. Semantic analysis#

In order to tackle mutual recursion, the type checking can be done with two passes over the AST.

In the first pass, check the validity of all function signatures only. If signatures of all functions are valid, gather them into a function environment (similar to one for the build-in functions) that will be used in pass two.
In the second pass, check the validity of each function body using the environment created in the previous pass.

Task 3: Extend your semantic analysis to support new language features
Extend the error module as needed.

6.7. Code generation#

For code generation, we map each Dolphin function to an LLVM function. Each generated LLVM function has the same number of arguments as the source Dolphin function that it corresponds to. Because arguments may be used as local variables, the generated function needs to copy them from the LLVM to stack alloca-ted memory. For example, the source level function int f (x:int) { x = x * 10; return x; } when translated to LLVM may look as follows

define i64 @dolphin_fun_f (i64 %x_arg) {
 ; Allocate memory for copying the argument %x_arg.
 %x_local_arg_copy = alloca i64          
 ; Copy the argument 
 store i64 %x_arg, i64* %x_local_arg_copy 
 ; Identifier %x_arg is not used from this point on.
 ; The rest of the function uses %x_local_arg_copy.
 ...
}

Task 4: Design and implement code generation for Phase 4

6.8. Testing and consolidation#

Because of the changes in the syntax the tests from the previous phases will not immediately work. To port them, wrap them as int main() { ... } functions. Additionally, create at least 10 new test programs that test the functionality of the new phase.

All the tests should be included into your submission as .dlp files.

Task 5: Testing and consolidation

Port your old tests and create new ones as specified by the description above.

Task 6 (glory): Add support for tail call optimization
Full LLVM supports tail call annotations. See tail or musttail annotations on call instructions in LLVM documentation. Add support for tail-call optimization.

6.9. Appendix#

6.9.1. C runtime#

The C runtime we use has not changed compared to the previous assignment. See the corresponding description in the previous assignment

6.9.2. Recommended workflow#

Start by designing the AST and the Typed AST declarations for both functions and comma expressions.
Tackle function and comma expressions separately.
Start with functions, and proceed with modifying and debugging each phase that needs to be changed in this assignment. a. if you decide to split work among group members, each of the parser, semantic analysis, and code generator, can be worked on independently, for as long as you adhere to the AST interfaces, and know how to put together the individual pieces. b. if you decide to work on the assignment in the order of the phases, work from the parser to semantic analysis to code generation.
Move onto comma expressions, and implement support for them in the parser, semantic analysis, and code generation, as in the previous step.