从零写一个编译器(十三):代码生成之遍历AST

项目的完整代码在 C2j-Compiler

前言

在上一篇完成对JVM指令的生成,下面就可以真正进入代码生成部分了。通常现代编译器都是先把生成IR,再经过代码优化等等,最后才编译成目标平台代码。但是时间水平有限,我们没有IR也没有代码优化,就直接利用AST生成Java字节码

入口

进行代码生成的入口在CodeGen,和之前解释器一样:先获取main函数的头节点,从这个节点开始,先进入函数定义,再进入代码块

函数定义节点

在进入函数定义节点的时候,就要生成一个函数定义对应的Java字节码,即一个静态方法(因为我们对整个C语言文件生成为一个类,main方法为public static main,其它的则是对应的静态方法,结构体则是另外的类)

  • 对于函数定义先从节点中拿到对应的函数命和参数
  • emitArgs是用来处理参数的,根据参数生成相应的Java字节码
  • 如果这个函数是main的话已经是交由前面处理了,逻辑差不多(具体在start.Start中)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl:
root.reverseChildren();
AstNode n = root.getChildren().get(0);
String name = (String) n.getAttribute(NodeKey.TEXT);
symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);
generator.setCurrentFuncName(name);
if (name != null && !name.equals("main")) {
String declaration = name + emitArgs(symbol);
generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);
generator.setNameAndDeclaration(name, declaration);
}
copyChild(root, root.getChildren().get(0));
break;

case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl:
n = root.getChildren().get(0);
name = (String) n.getAttribute(NodeKey.TEXT);
symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);
generator.setCurrentFuncName(name);
if (name != null && !name.equals("main")) {
String declaration = name + emitArgs(symbol);
generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);
generator.setNameAndDeclaration(name, declaration);
}

Symbol args = symbol.getArgList();

if (args == null || argsList == null || argsList.isEmpty()) {
System.err.println("generate function with arg list but arg list is null");
System.exit(1);
}
break;

创建结构体和数组

数组

创建结构体和数组的节点在DefGenerate里,可以看到在这里只处理了数组和普通变量,有关结构体的处理是在对结构体第一次使用的时候。顺便提一下代码生成对于赋初值操作是没有进行处理的。

  • 如果是个数组,酒直接调用ProgramGenerator直接生成创建数组的指令
  • 如果是个普通变量,就直接找到它并且赋值为0(这里变量在队列里的位置是根据符号表来计算的,具体可以看上一篇的getLocalVariableIndex方法)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
public class DefGenerate extends BaseGenerate {
@Override
public Object generate(AstNode root) {
int production = (int) root.getAttribute(NodeKey.PRODUCTION);
ProgramGenerator generator = ProgramGenerator.getInstance();
Symbol symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);

switch (production) {
case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:
Declarator declarator = symbol.getDeclarator(Declarator.ARRAY);
if (declarator != null) {
if (symbol.getSpecifierByType(Specifier.STRUCTURE) == null) {
generator.createArray(symbol);
}
} else {
int i = generator.getLocalVariableIndex(symbol);
generator.emit(Instruction.SIPUSH, "" + 0);
generator.emit(Instruction.ISTORE, "" + i);
}

break;

default:
break;
}

return root;
}
}

结构体

处理结构体定义的代码在UnaryNodeGenerate,也就是只有在使用到结构体定义时才会进行定义

  • 先拿到当前UNARY的符号,如果instanceof ArrayValueSetter就说明是一个结构体数组,就进入getStructSymbolFromStructArray方法创建一个结构体数组,并返回当前下标的结构体对象
  • 设置当前结构体的作用域范围
  • 对结构体作为类进行定义
  • 然后对读取结构体的域
  • 其实可以忽略指针部分,因为代码生成并没有对指针进行模拟
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
case SyntaxProductionInit.Unary_StructOP_Name_TO_Unary:
child = root.getChildren().get(0);
String fieldName = (String) root.getAttribute(NodeKey.TEXT);
Object object = child.getAttribute(NodeKey.SYMBOL);
boolean isStructArray = false;

if (object instanceof ArrayValueSetter) {
symbol = getStructSymbolFromStructArray(object);
symbol.addValueSetter(object);
isStructArray = true;
} else {
symbol = (Symbol) child.getAttribute(NodeKey.SYMBOL);
}

if (isStructArray) {
ArrayValueSetter vs = (ArrayValueSetter) object;
Symbol structArray = vs.getSymbol();
structArray.addScope(ProgramGenerator.getInstance().getCurrentFuncName());
} else {
symbol.addScope(ProgramGenerator.getInstance().getCurrentFuncName());
}

ProgramGenerator.getInstance().putStructToClassDeclaration(symbol);

if (isSymbolStructPointer(symbol)) {
copyBetweenStructAndMem(symbol, false);
}

Symbol args = symbol.getArgList();
while (args != null) {
if (args.getName().equals(fieldName)) {
args.setStructParent(symbol);
break;
}

args = args.getNextSymbol();
}

if (args == null) {
System.err.println("access a filed not in struct object!");
System.exit(1);
}

if (args.getValue() != null) {
ProgramGenerator.getInstance().readValueFromStructMember(symbol, args);
}

root.setAttribute(NodeKey.SYMBOL, args);
root.setAttribute(NodeKey.VALUE, args.getValue());

if (isSymbolStructPointer(symbol)) {
checkValidPointer(symbol);
structObjSymbol = symbol;
monitorSymbol = args;

GenerateBrocasterImpl.getInstance().registerReceiverForAfterExe(this);
} else {
structObjSymbol = null;
}
break;

一元操作节点

这个节点和在解释器的有很多相同,除了有对结构体的操作,其它的也是有非常重要的作用

  • 像数字、字符串或者是变量和之前的操作都是把信息传递到父节点,交由父节点处理
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
case SyntaxProductionInit.Number_TO_Unary:
text = (String) root.getAttribute(NodeKey.TEXT);
boolean isFloat = text.indexOf('.') != -1;
if (isFloat) {
value = Float.valueOf(text);
root.setAttribute(NodeKey.VALUE, value);
} else {
value = Integer.valueOf(text);
root.setAttribute(NodeKey.VALUE, value);
}
break;

case SyntaxProductionInit.Name_TO_Unary:
symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);
if (symbol != null) {
root.setAttribute(NodeKey.VALUE, symbol.getValue());
root.setAttribute(NodeKey.TEXT, symbol.getName());
}
break;

case SyntaxProductionInit.String_TO_Unary:
text = (String) root.getAttribute(NodeKey.TEXT);
root.setAttribute(NodeKey.VALUE, text);
break;

case SyntaxProductionInit.Unary_LB_Expr_RB_TO_Unary:
child = root.getChildren().get(0);
symbol = (Symbol) child.getAttribute(NodeKey.SYMBOL);

child = root.getChildren().get(1);
int index = 0;
if (child.getAttribute(NodeKey.VALUE) != null) {
index = (Integer) child.getAttribute(NodeKey.VALUE);
}
Object idxObj = child.getAttribute(NodeKey.SYMBOL);

try {
Declarator declarator = symbol.getDeclarator(Declarator.ARRAY);
if (declarator != null) {
Object val = declarator.getElement((int) index);
root.setAttribute(NodeKey.VALUE, val);
ArrayValueSetter setter;
if (idxObj == null) {
setter = new ArrayValueSetter(symbol, index);
} else {
setter = new ArrayValueSetter(symbol, idxObj);
}

root.setAttribute(NodeKey.SYMBOL, setter);
root.setAttribute(NodeKey.TEXT, symbol.getName());

}
Declarator pointer = symbol.getDeclarator(Declarator.POINTER);
if (pointer != null) {
setPointerValue(root, symbol, index);
PointerValueSetter pv = new PointerValueSetter(symbol, index);
root.setAttribute(NodeKey.SYMBOL, pv);
root.setAttribute(NodeKey.TEXT, symbol.getName());
}
} catch (Exception e) {
e.printStackTrace();
System.exit(1);
}
break;

赋值操作

  • 如果当前是一个数组,先拿到它的符号和下标
  • 如果不是结构体数组,那么拿到下标直接用readArrayElement生成读取数组元素的指令
  • 如果是一个符号则用getLocalVariableIndex读取这个符号的值
  • 如果是一个常数,则直接生成IPUSH指令
  • 最后进行赋值操作,如果不是对结构体的域进行赋值就直接用getLocalVariableIndex拿到队列位置然后生成ISTORE
  • 如果是对结构体数组的元素的域的赋值,就调用assignValueToStructMemberFromArray生成代码,如果只是结构体就直接调用assignValueToStructMember生成代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
ProgramGenerator generator = ProgramGenerator.getInstance();

if (BaseGenerate.resultOnStack) {
this.value = obj;
BaseGenerate.resultOnStack = false;
} else if (obj instanceof ArrayValueSetter) {
ArrayValueSetter setter = (ArrayValueSetter) obj;
Symbol symbol = setter.getSymbol();
Object index = setter.getIndex();
if (symbol.getSpecifierByType(Specifier.STRUCTURE) == null) {
if (index instanceof Symbol) {
ProgramGenerator.getInstance().readArrayElement(symbol, index);
if (((Symbol) index).getValue() != null) {
int i = (int) ((Symbol) index).getValue();
try {
this.value = symbol.getDeclarator(Declarator.ARRAY).getElement(i);
} catch (Exception e) {
e.printStackTrace();
}
}
} else {
int i = (int) index;
try {
this.value = symbol.getDeclarator(Declarator.ARRAY).getElement(i);
} catch (Exception e) {
e.printStackTrace();
}

ProgramGenerator.getInstance().readArrayElement(symbol, index);
}
}
} else if (obj instanceof Symbol) {
Symbol symbol = (Symbol) obj;
this.value = symbol.value;
int i = generator.getLocalVariableIndex(symbol);
generator.emit(Instruction.ILOAD, "" + i);
} else if (obj instanceof Integer) {
Integer val = (Integer) obj;
generator.emit(Instruction.SIPUSH, "" + val);
this.value = obj;
}

if (!this.isStructMember()) {
int idx = generator.getLocalVariableIndex(this);
if (!generator.isPassingArguments()) {
generator.emit(Instruction.ISTORE, "" + idx);
}
} else {
if (this.getStructSymbol().getValueSetter() != null) {
generator.assignValueToStructMemberFromArray(this.getStructSymbol().getValueSetter(), this, this.value);
} else {
generator.assignValueToStructMember(this.getStructSymbol(), this, this.value);
}
}

最后

完成这部分后,对下面的代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
void quicksort(int A[10], int p, int r) {
int x;
int i;
i = p - 1;
int j;
int t;
int v;
v = r - 1;
if (p < r) {
x = A[r];
for (j = p; j <= v; j++) {
if (A[j] <= x) {
i++;
t = A[i];
A[i] = A[j];
A[j] = t;
}
}
v = i + 1;
t = A[v];
A[v] = A[r];
A[r] = t;
t = v - 1;
quicksort(A, p, t);
t = v + 1;
quicksort(A, t, r);
}
}

void main () {
int a[10];
int i;
int t;
printf("before quick sort:");
for(i = 0; i < 10; i++) {
t = (10 - i);
a[i] = t;
printf("value of a[%d] is %d", i, a[i]);
}
quicksort(a, 0, 9);
printf("after quick sort:");
for (i = 0; i < 10; i++) {
printf("value of a[%d] is %d", i, a[i]);
}
}

则会生成下面的Java字节码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
.class public C2Bytecode
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
sipush 10
newarray int
astore 0
sipush 0
istore 1
sipush 0
istore 2
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "before quick sort:"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "
"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
sipush 0
istore 1

loop0:
iload 1
sipush 10
if_icmpge branch0
sipush 10
iload 1
isub
istore 2
aload 0
iload 1
iload 2
iastore
aload 0
iload 1
iaload
istore 3
iload 1
istore 4
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "value of a["
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 4
invokevirtual java/io/PrintStream/print(I)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "] is "
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 3
invokevirtual java/io/PrintStream/print(I)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "
"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
iload 1
sipush 1
iadd
istore 1
goto loop0
branch0:
aload 0
sipush 0
sipush 9
invokestatic C2Bytecode/quicksort([III)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "after quick sort:"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "
"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
sipush 0
istore 1

loop2:
iload 1
sipush 10
if_icmpge branch4
aload 0
iload 1
iaload
istore 3
iload 1
istore 4
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "value of a["
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 4
invokevirtual java/io/PrintStream/print(I)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "] is "
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 3
invokevirtual java/io/PrintStream/print(I)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "
"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
iload 1
sipush 1
iadd
istore 1
goto loop2
branch4:
return
.end method
.method public static quicksort([III)V
sipush 2
newarray int
astore 6
sipush 0
istore 5
sipush 1
istore 5
aload 6
iload 5
sipush 1
iastore
aload 6
sipush 1
iaload
istore 10
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "before quick sort: "
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 10
invokevirtual java/io/PrintStream/print(I)V
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "
"
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V
sipush 0
istore 9
sipush 0
istore 3
iload 1
sipush 1
isub
istore 3
sipush 0
istore 4
sipush 0
istore 7
sipush 0
istore 8
iload 2
sipush 1
isub
istore 8
iload 1
iload 2
if_icmpge branch1

aload 0
iload 2
iaload
istore 9
iload 1
istore 4

loop1:

iload 4
iload 8
if_icmpgt ibranch1

aload 0
iload 4
iaload
iload 9
if_icmpgt ibranch2

iload 3
sipush 1
iadd
istore 3
aload 0
iload 3
iaload
istore 7
aload 0
iload 3
aload 0
iload 4
iaload
iastore
aload 0
iload 4
iload 7
iastore
ibranch2:

iload 4
sipush 1
iadd
istore 4
goto loop1

ibranch1:

iload 3
sipush 1
iadd
istore 8
aload 0
iload 8
iaload
istore 7
aload 0
iload 8
aload 0
iload 2
iaload
iastore
aload 0
iload 2
iload 7
iastore
iload 8
sipush 1
isub
istore 7
aload 0
iload 1
iload 7
invokestatic C2Bytecode/quicksort([III)V
iload 8
sipush 1
iadd
istore 7
aload 0
iload 7
iload 2
invokestatic C2Bytecode/quicksort([III)V
branch1:

return
.end method

.end class

小结

这篇的代码生成和之前解释器的思路很相似,都是根据AST和对应的产生式来执行或者生成代码。

其实主要的思路是很清晰的,只是其中有太多细节容易让人太过纠结。这个系列算作是我自己的学习笔记,到这也有十三篇了,下一篇可能写写总结就正式结束了。