Intermediate Code Generation

In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis.

After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine. This intermediate representation should have two important properties: it should be easy to produce and it should be easy to translate into the target machine.

In Chapter 6, we consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction. Each operand can act like a register. The output of the intermediate code generator in Fig. 1.7 consists of the three-address code sequence
$$\begin{aligned} &t1 = \text{inttofloat}(60) \\ &t2 = id3 * t1 \\ &t3 = id2 + t2 \\ &id1 = t3 \end{aligned} \tag{1.3}$$
There are several points worth noting about three-address instructions. First, each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be done; the multiplication precedes the addition in the source program (1.1). Second, the compiler must generate a temporary name to hold the value computed by a three-address instruction. Third, some "three-address instructions" like the first and last in the sequence (1.3), above, have fewer than three operands.
In Chapter 6, we cover the principal intermediate representations used in compilers. Chapter 5 introduces techniques for syntax-directed translation that are applied in Chapter 6 to type checking and intermediate-code generation for typical programming language constructs such as expressions, flow-of-control constructs, and procedure calls.
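
The three points above can be illustrated with a minimal sketch that walks a syntax tree and emits three-address instructions. The nested-tuple tree encoding, the function name `gen_three_address`, and the `t1, t2, ...` naming of temporaries are assumptions made for this illustration; they are not taken from the text.

```python
def gen_three_address(tree):
    """Emit three-address instructions for an assignment tree.

    Assumed (hypothetical) tree encoding:
      ("id", n)            identifier with symbol-table index n
      ("number", v)        literal
      ("inttofloat", sub)  conversion node
      (op, left, right)    binary operator "+" or "*"
      ("=", target, value) assignment at the root
    """
    code = []
    counter = 0

    def new_temp():
        # Each computed value gets a fresh temporary name.
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def operand(node):
        # Return the name (or literal) that holds the value of `node`.
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return str(node[1])
        if kind == "inttofloat":
            arg = operand(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        # Binary operator: at most one operator per instruction.
        left = operand(node[1])
        right = operand(node[2])
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    _, target, value = tree
    target_name = operand(target)
    value_name = operand(value)
    code.append(f"{target_name} = {value_name}")
    return code


# Tree for position = initial + rate * inttofloat(60)
typed_tree = ("=", ("id", 1),
              ("+", ("id", 2),
               ("*", ("id", 3), ("inttofloat", ("number", 60)))))
instructions = gen_three_address(typed_tree)
```

Because operands are visited before their operator, the instruction for the multiplication is emitted before the one for the addition, so the instruction order itself records the evaluation order.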

Code Optimization

The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power. For example, a straightforward algorithm generates the intermediate code (1.3), using an instruction for each operator in the tree representation that comes from the semantic analyzer.

A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code. The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform (1.3) into the shorter sequence
$$\begin{aligned} &t1 = id3 * 60.0 \\ &id1 = id2 + t1 \end{aligned}$$
There is a great variation in the amount of code optimization different compilers perform. In those that do the most, the so-called “optimizing compilers,” a significant amount of time is spent on this phase. There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much. The chapters from 8 on discuss machine-independent and machine-dependent optimizations in detail.
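
The two transformations described in this section (folding the conversion of a constant at compile time, and removing a temporary that only transmits a value) can be sketched as follows. This is a toy illustration, not how a production optimizer is organized. The `(dest, op, args)` instruction tuples, the `"copy"` operator name, and the function `optimize` are assumptions for this example, and the sketch relies on each temporary `t<n>` being assigned exactly once.

```python
import re


def is_temp(name):
    # Assumption: compiler-generated temporaries are named t1, t2, ...
    return re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    """Apply constant folding of inttofloat and a simple copy elimination."""
    # Pass 1: fold inttofloat(<integer literal>) into a float literal.
    constants = {}
    folded = []
    for dest, op, args in code:
        args = tuple(constants.get(a, a) for a in args)
        if op == "inttofloat" and args[0].isdigit():
            constants[dest] = f"{int(args[0])}.0"
            continue  # the conversion instruction disappears
        folded.append((dest, op, args))

    # Count how often each name is read.
    uses = {}
    for _, _, args in folded:
        for a in args:
            uses[a] = uses.get(a, 0) + 1

    # Pass 2: "t = a op b; x = t" becomes "x = a op b" if t is read once.
    result = []
    for dest, op, args in folded:
        src = args[0]
        if (op == "copy" and result and result[-1][0] == src
                and is_temp(src) and uses[src] == 1):
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
        else:
            result.append((dest, op, args))

    # Pass 3: renumber the surviving temporaries from t1.
    renamed = {}

    def rename(name):
        if is_temp(name):
            renamed.setdefault(name, f"t{len(renamed) + 1}")
            return renamed[name]
        return name

    return [(rename(d), op, tuple(rename(a) for a in args))
            for d, op, args in result]


unoptimized = [
    ("t1", "inttofloat", ("60",)),
    ("t2", "*", ("id3", "t1")),
    ("t3", "+", ("id2", "t2")),
    ("id1", "copy", ("t3",)),
]
optimized = optimize(unoptimized)
```

For the four-instruction input, `optimized` holds two instructions: `t1 = id3 * 60.0` followed by `id1 = id2 + t1`.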

Syntax Analysis

The second phase of the compiler is syntax analysis or parsing. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation. A syntax tree for the token stream (1.2) is shown as the output of the syntactic analyzer in Fig. 1.7.
This tree shows the order in which the operations in the assignment
position = initial + rate * 60
are to be performed. The tree has an interior node labeled $*$ with $\langle$id, 3$\rangle$ as its left child and the integer 60 as its right child. The node $\langle$id, 3$\rangle$ represents the identifier rate. The node labeled $*$ makes it explicit that we must first multiply the value of rate by 60. The node labeled $+$ indicates that we must add the result of this multiplication to the value of initial. The root of the tree, labeled $=$, indicates that we must store the result of this addition into the location for the identifier position. This ordering of operations is consistent with the usual conventions of arithmetic, which tell us that multiplication has higher precedence than addition, and hence that the multiplication is to be performed before the addition.

The subsequent phases of the compiler use the grammatical structure to help analyze the source program and generate the target program. In Chapter 4 we shall use context-free grammars to specify the grammatical structure of programming languages and discuss algorithms for constructing efficient syntax analyzers automatically from certain classes of grammars. In Chapters 2 and 5 we shall see that syntax-directed definitions can help specify the translation of programming language constructs.
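
As a rough illustration of how a parser turns the token stream into the tree described above, here is a small recursive-descent sketch. The grammar it implements (an assignment whose right side uses only `+` and `*`), the tuple encoding of tokens and tree nodes, and the name `parse_assignment` are assumptions made for this example.

```python
def parse_assignment(tokens):
    """Build a syntax tree from tokens such as ("id", 1), ("=",), ("number", 60).

    Assumed grammar:
      assign -> id "=" expr
      expr   -> term ("+" term)*
      term   -> factor ("*" factor)*
      factor -> id | number
    Interior nodes are tuples (operator, left, right); leaves are the tokens.
    """
    pos = 0

    def peek():
        # Token name of the next token, or None at end of input.
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expect(name):
        if peek() != name:
            raise SyntaxError(f"expected {name!r} at token {pos}")
        return advance()

    def factor():
        if peek() in ("id", "number"):
            return advance()
        raise SyntaxError(f"expected operand at token {pos}")

    def term():
        # "*" is handled one level below "+", giving it higher precedence.
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = expect("id")
    expect("=")
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


# Tokens for position = initial + rate * 60
tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("number", 60)]
tree = parse_assignment(tokens)
```

The resulting `tree` has `=` at the root, `+` as its right child, and `*` below `+`, matching the order of operations described in the text.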

Semantic Analysis

The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.

The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number. Such a coercion appears in Fig. 1.7. Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an integer. The type checker in the semantic analyzer in Fig. 1.7 discovers that the operator $*$ is applied to a floating-point number rate and an integer 60. In this case, the integer may be converted into a floating-point number. In Fig. 1.7, notice that the output of the semantic analyzer has an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number. Type checking and semantic analysis are discussed in Chapter 6.
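
The coercion step can be sketched as a bottom-up pass over the syntax tree that computes a type for each node and wraps an integer operand in an inttofloat node when its sibling is a float. The tuple tree encoding, the `types` dictionary standing in for the symbol table, and the function name `coerce` are assumptions for this illustration only.

```python
def coerce(tree, types):
    """Return (typed_tree, type), inserting inttofloat nodes where needed.

    `types` maps a symbol-table index to "int" or "float" (a hypothetical
    stand-in for the type information stored in the symbol table).
    """
    kind = tree[0]
    if kind == "id":
        return tree, types[tree[1]]
    if kind == "number":
        return tree, "int" if isinstance(tree[1], int) else "float"
    if kind in ("+", "*"):
        left, left_type = coerce(tree[1], types)
        right, right_type = coerce(tree[2], types)
        if left_type == right_type:
            return (kind, left, right), left_type
        # Mixed int/float operands: convert the integer side.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (kind, left, right), "float"
    if kind == "=":
        target, target_type = coerce(tree[1], types)
        value, value_type = coerce(tree[2], types)
        if target_type == "float" and value_type == "int":
            value = ("inttofloat", value)
        elif target_type != value_type:
            # This sketch rejects narrowing a float into an int location.
            raise TypeError("cannot assign float to int")
        return ("=", target, value), target_type
    raise ValueError(f"unknown node kind: {kind!r}")


# position, initial, and rate are declared as floating-point numbers.
types = {1: "float", 2: "float", 3: "float"}
untyped = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
typed, result_type = coerce(untyped, types)
```

Here `typed` differs from `untyped` only in that the leaf for 60 is wrapped in an `inttofloat` node, which corresponds to the extra node in the semantic analyzer's output.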

The Structure of a Compiler

Up to this point we have treated a compiler as a single box that maps a source program into a semantically equivalent target program. If we open up this box a little, we see that there are two parts to this mapping: analysis and synthesis.
The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program. If the analysis part detects that the source program is either syntactically ill formed or semantically unsound, then it must provide informative messages, so the user can take corrective action. The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.

The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table. The analysis part is often called the front end of the compiler; the synthesis part is the back end.
If we examine the compilation process in more detail, we see that it operates as a sequence of phases, each of which transforms one representation of the source program to another. A typical decomposition of a compiler into phases is shown in Fig. 1.6. In practice, several phases may be grouped together, and the intermediate representations between the grouped phases need not be constructed explicitly. The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
Some compilers have a machine-independent optimization phase between the front end and the back end. The purpose of this optimization phase is to perform transformations on the intermediate representation, so that the back end can produce a better target program than it would have otherwise produced from an unoptimized intermediate representation. Since optimization is optional, one or the other of the two optimization phases shown in Fig. 1.6 may be missing.

Lexical Analysis

The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form
$$\langle \text{token-name}, \text{attribute-value} \rangle$$
that it passes on to the subsequent phase, syntax analysis. In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.

For example, suppose a source program contains the assignment statement
$$\text { position = initial + rate } * 60$$
The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens passed on to the syntax analyzer:

1. position is a lexeme that is mapped into the token $\langle$id, 1$\rangle$, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.

2. The assignment symbol $=$ is a lexeme that is mapped into the token $\langle=\rangle$. Since this token needs no attribute-value, we have omitted the second component. We could have used any abstract symbol such as assign for the token-name, but for notational convenience we have chosen to use the lexeme itself as the name of the abstract symbol.
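
The grouping of characters into lexemes and tokens can be sketched with a small regular-expression scanner. The token patterns, the list used as a symbol table, the function name `tokenize`, and in particular the `("number", 60)` shape chosen for the numeric literal are assumptions for this example; the text above only specifies the tokens for identifiers and for the assignment symbol.

```python
import re

# One alternative per lexeme class; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))"
)


def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement.

    Identifiers become ("id", index), where index is the 1-based position
    of the name in symbol_table. Operators become one-element tuples that
    use the lexeme itself as the token name.
    """
    symbol_table = []
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("num"):
            # Assumed token shape for integer literals.
            tokens.append(("number", int(match.group("num"))))
        else:
            tokens.append((match.group("op"),))
    return tokens, symbol_table


tokens, symbol_table = tokenize("position = initial + rate * 60")
```

With this input, `symbol_table` is `["position", "initial", "rate"]`, so position, initial, and rate receive the indices 1, 2, and 3, and the first two tokens are `("id", 1)` and `("=",)`.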


