Monday, January 19, 2009

Slow Motion: "2:21:45"

At the moment, it takes the stage one compiler, as built by stage zero, 2:21:45 wall-clock time to compile the 747 lines of the system file. The marshalled AST/object file has a size of 380,243 bytes.

The compiler is this slow because it uses a stack-based interpreter; direct evaluation is impossible since ML cannot handle the recursion depth the compiler needs. (This is mostly due to text processing. Concatenating large lists of chars is beyond ML.)
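To illustrate the recursion-depth problem, here is a minimal sketch in Python (not ML, and not the actual interpreter, whose code isn't shown here). It assumes a cons-style list of chars built from nested pairs; the names `append_recursive`, `append_with_stack`, `from_string` and `to_string` are made up for the example. Direct recursive concatenation uses one host stack frame per element, while the explicit-stack version keeps its pending work on the heap.

```python
def from_string(s):
    """Build a cons-style list of chars: ('a', ('b', None))."""
    lst = None
    for ch in reversed(s):
        lst = (ch, lst)
    return lst


def to_string(lst):
    """Flatten a cons-style list of chars back into a string."""
    chars = []
    while lst is not None:
        ch, lst = lst
        chars.append(ch)
    return "".join(chars)


def append_recursive(xs, ys):
    """Direct evaluation: recursion depth equals the length of xs."""
    if xs is None:
        return ys
    head, tail = xs
    return (head, append_recursive(tail, ys))


def append_with_stack(xs, ys):
    """Same result, but pending elements sit on an explicit stack."""
    pending = []
    while xs is not None:
        head, xs = xs
        pending.append(head)
    result = ys
    while pending:
        result = (pending.pop(), result)
    return result


if __name__ == "__main__":
    big = from_string("a" * 100_000)
    tail = from_string("bc")
    print(len(to_string(append_with_stack(big, tail))))
    try:
        append_recursive(big, tail)
    except RecursionError:
        print("direct recursion ran out of stack")
```

The explicit stack trades speed for depth: every step is interpreted bookkeeping rather than a native call, which is the kind of overhead the stack-based interpreter pays.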

The 2:21:45 figure includes semantic analysis and a lot of printing of intermediate results, both of which account for a substantial part of the time. Without semantic analysis, not counting identification and well-formedness checks, it takes 33:46 to compile. I think I can get that number down to about 10 minutes...


Some people don't understand bootstrapping. I guess I should explain that I have a stage zero compiler written in ML which accepts Hi and produces ML. I am now writing a bootstrap compiler in Hi which accepts Hi and produces C. So, to get the final compiler you need to a) build the ML compiler, b) build the Hi to C compiler with that compiler, and c) rebuild the Hi to C compiler with the last compiler. After that the initial ML compiler becomes irrelevant.


The good part is, of course, that the stage one compiler accepts the same source as the stage zero compiler. For comparison, the stage zero compiler compiles the stage one compiler in a few minutes.

The stage zero compiler consists of approximately 14 thousand lines of ML code. The stage one compiler consists of approximately 12 thousand lines of Hi code.

At this rate, a self-compile of the stage one compiler would take one and a half days; or three hours if I build a compiler without semantic checks. If the factor of 120 is representative, I'll end up with a self-compile of around 15 minutes. Afterthought: Good news, the C runtime is much faster on method lookup, which is a major bottleneck at the moment, so 15 minutes will be an extreme worst case.
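The extrapolation can be checked with a few lines of arithmetic. This is a rough sketch that assumes compile time scales linearly with line count, and it takes "approximately 12 thousand lines" as exactly 12,000. The helper names `to_seconds` and `extrapolate` are invented for the example, and applying the factor of 120 to the full-analysis estimate is also an assumption.

```python
SYSTEM_FILE_LINES = 747    # lines in the system file (measured case)
STAGE_ONE_LINES = 12_000   # "approximately 12 thousand lines of Hi code"


def to_seconds(clock):
    """Convert 'h:mm:ss' or 'mm:ss' into seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def extrapolate(clock, lines=SYSTEM_FILE_LINES, target=STAGE_ONE_LINES):
    """Scale a measured compile time linearly to the target line count."""
    return to_seconds(clock) * target / lines


if __name__ == "__main__":
    full = extrapolate("2:21:45")    # with semantic analysis and printing
    measured = extrapolate("33:46")  # without semantic analysis, as measured
    hoped = extrapolate("10:00")     # the hoped-for 10 minutes
    print(f"full analysis:       {full / 86400:.2f} days")
    print(f"no semantic checks:  {measured / 3600:.1f} hours")
    print(f"hoped-for 10 min:    {hoped / 3600:.1f} hours")
    print(f"full / factor 120:   {full / 120 / 60:.0f} minutes")
```

Under these assumptions the full run comes to about 1.6 days, the measured 33:46 to about 9 hours, and the hoped-for 10 minutes to about 2.7 hours, so the three-hour figure lines up with the 10-minute target rather than the current measurement. Dividing the full estimate by 120 gives roughly 19 minutes, the same ballpark as the 15 minutes quoted.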

The latter is still too much. There are no real optimizations done in the compiler, but an ML compile of stage zero takes 1 second...

Stuff I can do: (A) let the ML compiler produce code for the new C runtime (a thousand lines of code), (B) strip down the Hi compiler so that it produces code for the C runtime from lambda terms, (C) live with it since, theoretically, it doesn't matter how fast the ML-produced compiler is.
