Saturday, October 23, 2010

Time-Out

Working on the Hi language goes through times where new and old ideas gobble up and I look at them from various angles where actually no code is being written. Ah well, the two, strike that, three things which need to happen, are 1. full conformance (long integers, floats and doubles, native strings, a fast type-checker), 2. a VM with native support for strings, 3. performance.

I'll need to dump the current translation scheme, which is a pity since by keeping it simple I could use the C optimizer to do a lot for free like register allocation and lots of assembly instruction optimizations.

Another of the big drawbacks of going to a VM -which should support strings- is that in the current compilation scheme an allocation is cheap. Functions are translated to combinators, combinators rewrite the heap. The assumption at the moment is that a combinator will always rewrite (to) a DAG, and that there is always enough space in the heap to allocate the intermediates. That way, a) allocation is cheap (just an increment), b) I only garbage collect in between rewrites, which is a fast scheme by itself.

If I am gonna allow string allocations in between, I will open up a severe can of worms, since the assumption that there will always be enough heap space for intermediates must be dropped, thus garbage collections must be traced, thus intermediate pointers from the 'stack' must be tracked and chased also. So, I might as well go for the whole stack of optimizations, hello typed assembly and graph-coloring algorithms.

I use SSA form in the intermediate assembly with an infinite supply of intermediate local registers which is why I also looked at LLVM since that would map, but I might be better of by changing it to stack pushing (also, since the data structure in the heap is a DAG - garbage collection could be done by stack compressing), no idea.

For the bootstrap, I chose the simplest scheme I could get away with for representing records, somewhat with an eye on making it easy for dynamic 'code' generation. It wastes memory like hell; at the moment, I am looking at 1GB compiles, whereas my gut feeling tells me that 20MB should just be enough. Okay, this compiler is pretty naive, and uses list of characters (texts) in places where integers should suffice, but still, without a better scheme for representing data I'll still expect to be in the hundreds of MB region.

I could reuse the ocaml VM, which I kind-of skimmed through a few times, or see what Guile has to give, but still, it looks like I'll need my own VM. (Hmm, CDL3 also has a simple VM, could look there too.)

No comments:

Post a Comment