Sunday, May 17, 2015

Term Rewriting with Shared Pointers is a Major Design Decision

A compiler is bookkeeping plus term rewriting. Old compilers were usually full of bookkeeping through the use of tables; they needed to be, since resources were scarce and performance was low. As processors became more powerful, compilers became more declarative and shifted in the direction of term rewriting.

Hi is a declarative language, so naturally the bootstrap compiler relies little on tables: it simply rewrites and massages the term, in a number of steps, until it is semantically verified and code can easily be emitted.
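To make that concrete, here is a minimal sketch in C++ (the language the rest of this post settles on) of a single rewrite pass over a toy term language; Term, num, add, and fold are hypothetical names of mine, not the bootstrap's:

    #include <memory>

    // Toy three-node expression language. 'Term', 'num', 'add', and
    // 'fold' are hypothetical names; the bootstrap's real AST is much
    // richer.
    struct Term {
        enum Kind { Num, Add } kind;
        int value;                       // used when kind == Num
        std::shared_ptr<Term> lhs, rhs;  // used when kind == Add
    };
    using TermPtr = std::shared_ptr<Term>;

    TermPtr num(int v) {
        return std::make_shared<Term>(Term{Term::Num, v, nullptr, nullptr});
    }
    TermPtr add(TermPtr l, TermPtr r) {
        return std::make_shared<Term>(
            Term{Term::Add, 0, std::move(l), std::move(r)});
    }

    // One rewrite pass: fold constant additions bottom-up.
    TermPtr fold(const TermPtr& t) {
        if (t->kind != Term::Add) return t;
        TermPtr l = fold(t->lhs), r = fold(t->rhs);
        if (l->kind == Term::Num && r->kind == Term::Num)
            return num(l->value + r->value);  // new node replaces the old
        return add(l, r);
    }

A compiler in this style is essentially a pipeline of such passes, each taking the term one step closer to something code can be emitted from.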

The bootstrap lacked performance, due to:
  1. Wasteful memory representation of the data being worked upon.
  2. A slow operational model, combinator rewriting instead of the C calling convention.
  3. Lack of imperative constructs.
  4. Lack of fast impure containers.
  5. Garbage collection.
I could have solved 1., made an effort to fix 2., added 3., ultimately added 4. (though that probably only becomes necessary when processing large, >1MLoc, files), and, in a final effort, implemented features to make 5. fast.

The path of a self-hosting declarative compiler seems to be that you keep adding imperative features until you've reinvented C. I personally loathe the Haskell IO monad and barely agree with Clean's uniqueness types. For Hi, it isn't worth it: the language is fine for small programs, making it fast and self-hosting is a 20-100 man-year effort, which I don't have and which can fail anyway, and I want to keep the language simple and declarative.

Most people agree we have hit the end of Moore's Law; the era of the free lunch is over. We got a 3500x speedup in the decades since the introduction of the first microprocessor; declarative languages cannot wait another few decades to become fast enough that people will want to use them.

So, I decided to ditch the effort and do it all in C++. Fast interpreter, slow language. If it ever attracts a user base, there are plenty of optimizations left.

But since I have a term-rewriting self-hosting compiler as a design reference, the question of how to represent terms naturally comes up. Since I rewrite a tree, or a DAG, I basically have two choices: use trees and copy, or use DAGs and support sharing with some form of garbage collection. So it's either copying or sharing. Both have corner cases where they perform badly: trees are fast, but you might spend too much time copying; with reference counting, the overhead may become large when you only ever rewrite trees, since nothing is actually shared and the counting degrades to pure overhead.
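A sketch of the two representations, with hypothetical node types; the point is only where the cost lives: deep_copy pays in allocation and copying, while every copy or drop of a shared pointer pays a reference-count update:

    #include <memory>
    #include <vector>

    // Choice 1: a tree with unique ownership. Rewriting a subterm
    // that two parents want means deep-copying it first; all the
    // cost is in allocation and copying.
    struct TreeNode {
        int tag = 0;
        std::vector<std::unique_ptr<TreeNode>> children;
    };

    std::unique_ptr<TreeNode> deep_copy(const TreeNode& n) {
        auto c = std::make_unique<TreeNode>();
        c->tag = n.tag;
        for (const auto& child : n.children)
            c->children.push_back(deep_copy(*child));
        return c;
    }

    // Choice 2: a DAG with shared nodes. The reference counts are
    // the garbage collector, and every copy or drop of a DagPtr pays
    // for a count update, even when nothing is actually shared.
    struct DagNode;
    using DagPtr = std::shared_ptr<const DagNode>;
    struct DagNode {
        int tag = 0;
        std::vector<DagPtr> children;  // subterms may be shared
    };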

I made the choice to represent the AST with C++'s shared pointers. After having looked at GCC, which internally uses a pointer-to-massive-struct copying system, I wanted to test the performance of a sharing solution against copying on a 500KLoc file, a large list of small declarations. It spends 40% of its time on reference counting.

For industrial compilers this would imply that copying beats sharing; at least, depending on the implementation of C++'s shared pointer.
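Part of that cost is an implementation detail worth spelling out: std::shared_ptr keeps its reference count thread-safe, typically with atomic operations, so every by-value copy of a pointer pays for synchronization. One standard mitigation, sketched here with hypothetical helpers (an assumption about where some of the 40% goes, not a measurement from my test), is to pass shared pointers by const reference during traversals so no count is touched:

    #include <cstdio>
    #include <memory>

    struct Node { int tag = 0; };  // hypothetical node type
    using NodePtr = std::shared_ptr<const Node>;

    // By-value parameter: the count goes up on every call and down
    // on every return.
    int tag_by_value(NodePtr n)      { return n->tag; }

    // Const-reference parameter: no count traffic at all, which adds
    // up when a traversal visits millions of nodes.
    int tag_by_ref(const NodePtr& n) { return n->tag; }

    int main() {
        NodePtr n = std::make_shared<const Node>();
        std::printf("%d %d\n", tag_by_value(n), tag_by_ref(n));
        return 0;
    }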

It does 30ms on a small file of 5k declarations, so I'll stick to this scheme of shared pointers. It's just too convenient implementation-wise: if a node gets rewritten or falls out of scope in a C++ routine, it is automagically garbage collected (see the snippet below). But boy, oh boy, it is horrific to pay in performance for design decisions this early in the implementation stage.
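Reusing the toy Term helpers from the sketch above, the convenience looks like this:

    TermPtr t = add(num(1), num(2));  // build 1 + 2
    t = fold(t);                      // rewrite: t now points at Num(3)
    // The old Add node and its two Num leaves lost their last
    // references in that assignment; shared_ptr has already deleted
    // them. When t goes out of scope, the folded Num(3) is reclaimed
    // too. No explicit free, no separate collector pass.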
