Developing new compilers using coins

Using an LL parser generator


  1. Introduction
  2. PL0' Compiler (without nested functions)
    1. Grammar of PL0'
    2. Compiler Driver
    3. Lexer/Parser
    4. Example of Compilation
    5. Visualized Compilation Process
  3. PL0' Compiler (with nested functions: case 1)
    1. Grammar of PL0'
    2. Lexer/Parser
  4. PL0' Compiler (with nested functions: case 2)
    1. Grammar of PL0'
    2. Lexer/Parser

‚ODIntroduction


To develop a new compiler for a language, it is only necessary to create a lexer/parser that translates programs written in that language to HIR, and to create a compiler driver that calls the parser.


1. PL0' Compiler


1.1 Grammar of PL0'

<program> ::= <block> '.'
<block> ::=  ( <declaration> )* <statement>
<declaration> ::= <const_decl> | <var_decl> | <func_decl>
<const_decl> ::= 'const' <const_def> ( ',' <const_def> )* ';'
<const_def> ::= ident '=' number
<var_decl> ::= 'var' ident ( ',' ident )* ';'
<func_decl> ::= 'function' ident '(' [ ident ( ',' ident )* ]  ')' <block> ';'
<statement> ::= [ ident ':=' <expression>
             | 'begin' <statement> ( ';' <statement> )* 'end'
             | 'if' <condition> 'then' <statement>
             | 'while' <condition> 'do' <statement>
             | 'return' <expression>
             | 'write' <expression>
             | 'writeln' ]
<condition> ::= 'odd' <expression>
             | <expression> ( '=' | '<>' | '<' | '<=' | '>' | '>=' ) <expression>
<expression> ::= [ ( '-' | '+' ) ] <term> ( ( '-' | '+' ) <term> )*
<term> ::= <factor> ( ( '*' | '/' ) <factor> )*
<factor> ::= ident '(' [ <expression> ( ',' <expression> )* ] ')'
             | number
             | ident
             | '(' <expression> ')'

1.2 Compiler Driver: PL0Driver.java

PL0Driver is a subclass of the Driver (standard driver of COINS), and can be invoked by the following command:

  $ java PL0Driver <options> <source-file>
An example of this command is:
  $ java PL0Driver -S -coins:trace=HIR.1 mult.p  
where
   -S
means to stop after the assembler code generation,
   -coins:trace=HIR.1
means to print trace messages of 'HIR' module whose message level is less than or equals to 1. 'trace' is a option predefined in the Driver,
    mult.p
is the source file name. '.p' is a suffix whose rule is not defined in the standard driver. To define the rule it is necessary to place the suffix rule file named 'suffixes' that contains the following line:
       p,     PL0,       PL0 source,        -,s,o
to
   ~/coins/suffixes
   
The main method in the PL0Driver is the following:
  public static void main(String[] args) { 
    new PL0Driver().go(args);  
  }
The 'go' method of the Driver calls passes defined in the Driver according to the suffix rules. These passes are preprocess, compile, assemble, and link. The PL0Driver overrides only one method of the Driver super class, makeHirFromSource. The makeLirFromHir method calls the parser of PL0.

1.3 Lexer/Parser

The parserPL0Driver was generated from pl0.jj by JavaCC. The lexer is included in the parser. JavaCC generates recursive descent parsers written in Java. JavaCC itself is written in Java.

This parser translates a source program into a HIR tree. The methods used in this parser to generates HIR tree nodes are described in coins.ir.hir.HIR0. HIR0.java is the basic interface file and the super interface of HIR.java. HIR.java is the full interface file. These interfaces are implemented in coins.ir.hir.HIR_Impl. The methods for symbols are described in coins.sym.Sym0 that is the super interface of coins.sym.Sym. These interfaces are implemented in coins.sym.Sym_Impl.

Nested function definition is not implemented in this parser.

1.4 Example of Compilation

-----source program mult.p-----

 function multiply(x, y)
     var a,b,c;
    begin
      a:=x; b:=y; c:=0;
      while b>0 do
        begin
          if odd b then c:=c+a;
          a:=2*a; b:=b/2;
        end;
      return c;
    end; 

const m=7,n=85;
var x,y;

begin
  x:=m; y:=n; 
  write x; write y; writeln; write multiply(x,y); writeln
end.

--------HIR tree and Symbol Table--------

--------object program mult.s--------

      .section      ".text"
      .align      4
multiply.2:
      save      %sp,-96,%sp
.L2:
      mov      0,%i2
.L3:
      cmp      %i1,0
      ble      .L8
      nop      
.L4:
      and      %i1,1,%i3
      cmp      %i3,1
      bne      .L6
      nop      
.L5:
      add      %i2,%i0,%i2
.L6:
      sll      %i0,1,%i0
      mov      2,%o1
      mov      %i1,%o0
      call      .div
      nop      
      mov      %o0,%i1
      ba      .L3
      nop      
.L8:
.L9:
      mov      %i2,%i0
      ret
      restore

      .align      1
string.13:
      .byte      37
      .byte      100
      .byte      0
      .align      1
string.16:
      .byte      10
      .byte      0

      .align      4
      .global      main
main:
      save      %sp,-96,%sp
.L11:
      mov      7,%i0
      mov      85,%i0
      sethi      %hi(string.13),%o0
      or      %o0,%lo(string.13),%o0
      mov      7,%o1
      call      printf
      nop      
      sethi      %hi(string.13),%o0
      or      %o0,%lo(string.13),%o0
      mov      85,%o1
      call      printf
      nop      
      sethi      %hi(string.16),%o0
      or      %o0,%lo(string.16),%o0
      call      printf
      nop      
      mov      7,%o0
      mov      85,%o1
      call      multiply.2
      nop      
      mov      %o0,%o1
      sethi      %hi(string.13),%o0
      or      %o0,%lo(string.13),%o0
      call      printf
      nop      
      sethi      %hi(string.16),%o0
      or      %o0,%lo(string.16),%o0
      call      printf
      nop      
.L12:
      ret
      restore

1.5 Visualized Compilation Process

CoVis is the visualizer of compilation processes. The following figure is a snapchot of CoVis. It shows the compilation process of the mult.p program.


2. PL0' Compiler (with nest of functions)


In case of nested functions, an inner function may access the global variables (the local variables of the outer functions). Our PL0' compiler constructs a frame of a function as a C struct, and inner functions can access outer local variables through the global display of frame pointers.

Example:

var x,y;
function abc(z)
    var a, b;
    function fgh(w)
        var u;
        begin
            u := x+a+z;
            return u+w
        end;
    begin
        a:=z+x;
        return fgh(a)
    end;
x:=abc(y)
. 
The above PL0' program is translated to the HIR which is similar to the HIR made from the following C program.
void *display_0;
void *display_1;
void *display_2;

struct main_frame{
    int x; int y;
};

struct abc_frame{
    int z; int a; int b;
};

struct fgh_frame{
    int w; int u;
};

int fgh(int w){
  struct fgh_frame frame_; 
  void *saveDisplay_;
  int ret_val_var;
  saveDisplay_ = display_2;
  display_2 = &frame_;
  frame_.w = w;
  frame_.u = (((struct main_frame *)display_0) -> x) 
            + (((struct abc_frame *)display_1) -> a)
            + (((struct abc_frame *)display_1) -> z);
  ret_val_var = frame_.u + frame_.w;
  display_2 = saveDisplay_;
  return ret_val_var;
  display_2 = saveDisplay_;
  return 0;
}

int abc(int z){
  struct abc_frame frame_; 
  void *saveDisplay_;
  int ret_val_var;
  saveDisplay_ = display_1;
  display_1 = &frame_;
  frame_.z = z;
  frame_.a = frame_.z + (((struct main_frame *)display_0) -> x);
  ret_val_var = fgh(frame_.a);
  display_1 = saveDisplay_;
  return ret_val_var;
  display_1 = saveDisplay_;
  return 0; 
}

main(){
  struct main_frame frame_;
  void *saveDisplay_;
  saveDisplay_ = display_0;
  display_0 = &frame_;
  frame_.x = abc(frame_.y);
  display_0 = saveDisplay_;
  return 0;
}
PL0' grammar is slightly changed to construct the abave structs in one-pass,

2.1 Grammar of PL0'

<block> is changed as follows:

<block> ::=  ( <decl> )* ( <func_decl> )* <statement>
<decl> ::= <const_decl> | <var_decl>

3. PL0' Compiler (with nested functions: case 2)


In case of nested functions, an inner function may access the global variables (the local variables of the outer functions). This time, our PL0' compiler implements the lambda lifting method. The outer variables accessed by inner functions are given to the inner functions as call-by-address parameters.

Example:

var x,y;
function abc(z)
    var a, b;
    function fgh(w)
        var u;
        begin
            u := x+a+z;
            return u+w
        end;
    begin
        a:=z+x;
        return fgh(a)
    end;
x:=abc(y)
. 
The above PL0' program is translated to the HIR which is similar to the HIR made from the following C program.
main(){
    int x,y;
    x = abc(y, &x);
}

int abc(int z, int *x){
    int a, b;
    a = z + *x;
    return fgh(a, x, &a, &z);
}

int fgh(int w, int *x, int *a, int *z){
    int u;
    u = *x + *a + *z;
    return u+w;
}

3.1 Grammar of PL0'

The same grammar as the one in the secton 1.1.

3.2 Lexer/Parser

Lexer/Parser is written in JavaCC. PL0Parser.java is generated by the following command:

        $ javacc pl03.jj
Added parameters by the lambda lifting method cannot be determined in one-pass. Therefore, we build the HIR tree in one-pass with dummy function definitions and dummy function calls. After calculating the added parameters, these dummy trees are replaced by the modified trees.