Developing new compilers using coins

Using an LR parser generator


  1. Introduction
  2. Example 1 (simple declarations and expressions)
    1. Grammar of the language
    2. Compiler Driver
    3. Lexer/Parser
    4. Example of Compilation
    5. Visualized Compilation Process
  3. Example 2 (control statements)
    1. Grammar of the language
    2. Compiler Driver/Lexer/Parser
  4. Example 3 (functions)
    1. Grammar of the language
    2. Compiler Driver/Lexer/Parser
  5. Example 4 (data types and arrays)
    1. Grammar of the language
    2. Compiler Driver/Lexer/Parser

‚ODIntroduction


To develop a new compiler for a language, it is only necessary to create a lexer/parser that translates programs written in that language to HIR, and to create a compiler driver that calls the parser.


1. Example 1 (simple declarations and expressions)


1.1 Grammar of the language

<program> ::= <declaration_list> <statement_list>
<declaration_list> ::= empty 
           | <declaration_list> <declaration> ';'
<declaration> ::= 'int' ident
           | <declaration> ',' ident
<statement_list> ::= <statement>
           | <statement_list> <statement>
<statement> ::= <ident> '=' <expression> ';' 
           | 'read' ident ';'
           | 'write' <expression> ';' 
<expression> ::= <expression> '+' <term>
           | <expression> '-' <term>
           | <term>
<term> ::= <term> '*' <factor>
           | <term> '/' <factor>
           | <term> '%' <factor>
           | <factor>  
<factor> ::= '-' <factor>
           | '(' <expression> ')'
           | ident
           | number

1.2 Compiler Driver: SimpleDriver.java

SimpleDriver is a subclass of the Driver (standard driver of COINS), and can be invoked by the following command:

  $ java SimpleDriver <options> <source-file>
An example of this command is:
  $ java SimpleDriver -S -coins:trace=HIR.1  expr1.c  
where
   -S
means to stop after the assembler code generation,
   -coins:trace=HIR.1
means to print trace messages of 'HIR' module whose message level is less than or equals to 1. 'trace' is a option predefined in the Driver,
    expr1.c
is the source file name. '.c' is a suffix defined in the default suffix rule file 'suffixes'.

The main method in the SimpleDriver is the following:

  public static void main(String[] args) { 
    new SimpleDriver().go(args);  
  }
The 'go' method of the Driver calls passes defined in the Driver according to the suffix rules. These passes are preprocess, compile, assemble, and link. The SimpleDriver overrides only one method of the Driver super class, makeHirFromSource. The makeLirFromHir method calls the parser of the Simple language.

1.3 Lexer/Parser

The Lexer, Scanner.java, was generated from simple1.l by JFlex. The parser was generated from simple1.y by 'jay' ( http://www.informatik.uni-osnabrueck.de/alumni/bernd/jay/) by the following command:

   $ jay < skeleton simple1.y > Parser.java
'jay' is the Java version of Berkely YACC and generates LALR parsers written in Java. This parser translates a source program into a HIR tree. For example, it translates the source program:
    int a, b; 
    read a; 
    b = a + 5; 
    write b; 
into the HIR tree that is similar to the tree generated from the C program:
   main() { 
      int a, b; 
      scanf("%d", &a); 
      b = a + 5; 
      printf(("%d€n", b); 
   }
The methods used in this parser to generates HIR tree nodes are described in coins.ir.hir.HIR0. HIR0.java is the basic interface file and the super interface of HIR.java. HIR.java is the full interface file. These interfaces are implemented in coins.ir.hir.HIR_Impl. The methods for symbols are described in coins.sym.Sym0 that is the super interface of coins.sym.Sym. These interfaces are implemented in coins.sym.Sym_Impl.

1.4 Example of Compilation

-----expr1.c-----

    int a, b;
    read a;
    b = a + 5;
    write b;
--------HIR tree and Symbol Table--------

--------expr1.s--------

	.section ".text"
	.align	1
string.6:
	.byte	37
	.byte	100
	.byte	0
	.align	1
string.8:
	.byte	37
	.byte	100
	.byte	10
	.byte	0

	.align	4
	.global	main
main:
	save	%sp,-104,%sp
	sethi	%hi(string.6),%o0
	or	%o0,%lo(string.6),%o0
	add	%fp,-4,%o1
	call	scanf
	nop
	ld	[%fp-4],%i0
	add	%i0,5,%o1
	sethi	%hi(string.8),%o0
	or	%o0,%lo(string.8),%o0
	call	printf
	nop
.L3:
	ret
	restore

1.5 Visualized Compilation Process

CoVis is the visualizer of compilation processes. The following figure is a snapchot of CoVis. It shows the compilation process of the expr1.c program.


2. Example 2 (control statements)


2.1 Grammar of the language

<program> ::= <declaration_list> <statement_list>
<declaration_list> ::= empty 
           | <declaration_list> <declaration> ';'
<declaration> ::= 'int' ident 
           | <declaration> ',' ident
<statement_list> ::= <statement>
           | <statement_list> <statement>
<assign> ::= ident '=' <expression>
<opt_assign> ::= empty
           | <assign>
<statement> ::= <assign> ';' 
           | empty ';'
           | 'read' ident ';' 
           | 'write' <expression> ';' 
           | '{' <statement_list> '}' 
           | <if_part>
           | <if_part> 'else' <statement>
           | 'while' '(' <expression> ')' <statement>
           | 'do' <statement> 'while' <expression> ';' 
           | 'for' '(' <opt_assign> ';' <test_exp> ';' <opt_assign> ')' <statement>
           | 'switch' '(' <expression> ')' '{' <case_list> <default_case> '}' 
           | 'break' ';' 
           | 'continue' ';' 
<case_list> ::= <case_item>
           | <case_list> <case_item>
<case_item> ::= 'case' number ':'
           | <case_item> <statement>
<default_case> ::= empty
           | <default_item>
<default_item> ::= 'default' ':'
           | <default_item> <statement>
<if_part> ::= 'if' '(' <expression> ')' <statement>
<test_exp> ::= empty
           | <expression>     
<expression> ::= <expression1> '||' <expression1>
           | <expression1>
<expression1> ::= <expression2> '&&' <expression2>
           | <expression2>
<expression2> ::= <expression3> <relational_op> <expression3> 
           | <expression3> 
<relational_op> ::= '<' | '<=' | '==' | '!=' | '>=' | '>'
<expression3> ::= <expression3> '+' <term>
           | <expression3> '-' <term>
           | <term> 
<term> ::= <term> '*' <factor>
           | <term> '/' <factor>
           | <term> '%' <factor>
           | <factor>  
<factor> ::= '-' <factor>
           | '!' <factor> 
           | '(' <expression> ')'
           | ident
           | number

2.2 Compiler Driver/Lexer/Parser

The SimpleDriver is the same as the previous one. The Lexer, Scanner.java, was generated from simple2.l by JFlex. The parser was generated from simple2.y by 'jay'. Although 'jay' reports one shift/reduce conflict caused by dangling-else, it is resoleved in favor of shifting.


3. Example 3 (functions)


3.1 Grammar of the language

<program> ::= <global_declarations>
<global_declarations> ::= empty
           | <global_declarations> <declaration> ';'
           | <global_declarations> <function_def> ';'
<declaration> ::= <type> ident
           | <type> <function_head>
           | <declaration> ',' ident
           | <declaration> ',' <function_head>
<type> ::= 'void' | 'int'
<param_list> ::= empty
           | <param_decl>
           | <param_list> ',' <param_decl>
<param_decl> ::= <type> ident
<function_head> ::= ident '(' <param_list> ')'
<function_def> ::= <type> <function_head> <block> 
<block> ::= '{' <declaration_list> <statement_list> '}'
<declaration_list> ::= empty 
           | <declaration_list> <declaration> ';'
<statement_list> ::= <statement>
           | <statement_list> <statement>
<assign> ::= ident '=' <expression> 
<opt_assign> ::= empty
           | <assign>
<statement> ::= <block>
           | 'return' ';'
           | 'return' <expression> ';'
           | <assign> ';' 
           | empty ';'
           | 'read' ident ';' 
           | 'write' <expression> ';' 
           | <if_part>
           | <if_part> 'else' <statement>
           | 'while' '(' <expression> ')' <statement>
           | 'do' <statement> 'while' <expression> ';' 
           | 'for' '(' <opt_assign> ';' <test_exp> ';' <opt_assign> ')' <statement>
           | 'switch' '(' <expression> ')' '{' <case_list> <default_case> '}' 
           | 'break' ';' 
           | 'continue' ';' 
<case_list> ::= <case_item>
           | <case_list> <case_item>
<case_item> ::= 'case' number ':'
           | <case_item> <statement>
<default_case> ::= empty
           | <default_item>
<default_item> ::= 'default' ':'
           | <default_item> <statement>
<if_part> ::= 'if' '(' <expression> ')' <statement>
<test_exp> ::= empty
           | <expression>     
<expression> ::= <expression1> '||' <expression1>
           | <expression1>
<expression1> ::= <expression2> '&&' <expression2>
           | <expression2>
<expression2> ::= <expression3> <relational_op> <expression3> 
           | <expression3> 
<relational_op> ::= '<' | '<=' | '==' | '!=' | '>=' | '>'
<expression3> ::= <expression3> '+' <term>
           | <expression3> '-' <term>
           | <term> 
<term> ::= <term> '*' <factor>
           | <term> '/' <factor>
           | <term> '%' <factor>
           | <factor>  
<factor> ::= '-' <factor>
           | '!' <factor> 
           | '(' <expression> ')'
           | ident '(' <argument_list> ')'
           | ident
           | number
<argument_list> ::= empty
           | <expression>
           | <argument_list> <expression>

3.2 Compiler Driver/Lexer/Parser

The SimpleDriver is the same as the previous one. The Lexer, Scanner.java, was generated from simple3.l by JFlex. The parser was generated from simple3.y by 'jay'. Although 'jay' reports one shift/reduce conflict caused by dangling-else, it is resoleved in favor of shifting.


4. Example 4 (data types and arrays)


4.1 Grammar of the language

<program> ::= <global_declarations>
<global_declarations> ::= empty
           | <global_declarations> <declaration> ';'
           | <global_declarations> <function_def> ';'
<declaration> ::= <type_spec> <declarator>
           | <type_spec> <function_head>
           | <declaration> ',' <declarator>
           | <declaration> ',' <function_head>
<type_spec> ::= <type>
           | 'static' <type>
<type> ::= 'void' | 'int' | 'double'
<declarator> ::= ident <dimension_list>
<dimension_list> ::= empty
           | '[' ']' <dimension_list>
           | '[' number ']' <dimension_list>
<param_list> ::= empty
           | <param_decl>
           | <param_list> ',' <param_decl>
<param_decl> ::= <type> <declarator>
           | <type> '&' <declarator>
<function_head> ::= ident '(' <param_list> ')'
<function_def> ::= <type_spec> <function_head> <block> 
<block> ::= '{' <declaration_list> <statement_list> '}'
<declaration_list> ::= empty 
           | <declaration_list> <declaration> ';'
<statement_list> ::= <statement>
           | <statement_list> <statement>
<assign> ::= ident '=' <expression>
           | ident <subscript_list> '=' <expression>
<opt_assign> ::= empty
           | <assign>
<statement> ::= <block>
           | 'return' ';'
           | 'return' <expression> ';'
           | <assign> ';' 
           | empty ';'
           | 'read' ident ';' 
           | 'write' <expression> ';' 
           | <if_part>
           | <if_part> 'else' <statement>
           | 'while' '(' <expression> ')' <statement>
           | 'do' <statement> 'while' <expression> ';' 
           | 'for' '(' <opt_assign> ';' <test_exp> ';' <opt_assign> ')' <statement>
           | 'switch' '(' <expression> ')' '{' <case_list> <default_case> '}' 
           | 'break' ';' 
           | 'continue' ';' 
<case_list> ::= <case_item>
           | <case_list> <case_item>
<case_item> ::= 'case' number ':'
           | <case_item> <statement>
<default_case> ::= empty
           | <default_item>
<default_item> ::= 'default' ':'
           | <default_item> <statement>
<if_part> ::= 'if' '(' <expression> ')' <statement>
<test_exp> ::= empty
           | <expression>     
<expression> ::= <expression1> '||' <expression1>
           | <expression1>
<expression1> ::= <expression2> '&&' <expression2>
           | <expression2>
<expression2> ::= <expression3> <relational_op> <expression3> 
           | <expression3> 
<relational_op> ::= '<' | '<=' | '==' | '!=' | '>=' | '>'
<expression3> ::= <expression3> '+' <term>
           | <expression3> '-' <term>
           | <term> 
<term> ::= <term> '*' <factor>
           | <term> '/' <factor>
           | <term> '%' <factor>
           | <factor>  
<factor> ::= '-' <factor>
           | '!' <factor> 
           | '(' <expression> ')'
           | ident '(' <argument_list> ')'
           | ident <subscript_list>
           | ident
           | number
           | string
           | character
           | real_number
<subscript_list> ::= '[' <expression> ']'
           | '[' <expression> ']' <subscript_list>
<argument_list> ::= empty
           | <expression>
           | <argument_list> <expression>
 

4.2 Compiler Driver/Lexer/Parser

The SimpleDriver is the same as the previous one. The Lexer, Scanner.java, was generated from simple4.l by JFlex. The parser was generated from simple4.y by 'jay'. Although 'jay' reports one shift/reduce conflict caused by dangling-else, it is resoleved in favor of shifting.