Hi Language: October 2010

Thursday, October 28, 2010

Still not pleased

I am still not programming. The thing which bothers me the most is that I don't feel too much like going in the direction of a 'classical' VM -i.e. a VM with primitive types, loads, stores, a calling mechanism and a garbage collector- and generating code for that. Why? My gut feeling tells me I will generate just too much code in the conversion from combinator to the code for a classical VM, rewriting a combinator expression just entails producing code with lots of allocations, and copying of pointers/slots from the term being rewritten to the term being produced.

What I want is a translation scheme to some Graph Rewrite System, or abstract machine, with a high level instruction set which avoids the current explosion when going from combinator to instructions. I am wondering whether the current scheme is just too concrete in its naive translation. Can I get rid of explicit data nodes? Can I translate to, not a graph machine, but trivial combinators on a stack machine?

Saturday, October 23, 2010

Time-Out

Working on the Hi language goes through times where new and old ideas gobble up and I look at them from various angles where actually no code is being written. Ah well, the two, strike that, three things which need to happen, are 1. full conformance (long integers, floats and doubles, native strings, a fast type-checker), 2. a VM with native support for strings, 3. performance.

I'll need to dump the current translation scheme, which is a pity since by keeping it simple I could use the C optimizer to do a lot for free like register allocation and lots of assembly instruction optimizations.

Another of the big drawbacks of going to a VM -which should support strings- is that in the current compilation scheme an allocation is cheap. Functions are translated to combinators, combinators rewrite the heap. The assumption at the moment is that a combinator will always rewrite (to) a DAG, and that there is always enough space in the heap to allocate the intermediates. That way, a) allocation is cheap (just an increment), b) I only garbage collect in between rewrites, which is a fast scheme by itself.

If I am gonna allow string allocations in between, I will open up a severe can of worms, since the assumption that there will always be enough heap space for intermediates must be dropped, thus garbage collections must be traced, thus intermediate pointers from the 'stack' must be tracked and chased also. So, I might as well go for the whole stack of optimizations, hello typed assembly and graph-coloring algorithms.

I use SSA form in the intermediate assembly with an infinite supply of intermediate local registers which is why I also looked at LLVM since that would map, but I might be better of by changing it to stack pushing (also, since the data structure in the heap is a DAG - garbage collection could be done by stack compressing), no idea.

For the bootstrap, I chose the simplest scheme I could get away with for representing records, somewhat with an eye on making it easy for dynamic 'code' generation. It wastes memory like hell; at the moment, I am looking at 1GB compiles, whereas my gut feeling tells me that 20MB should just be enough. Okay, this compiler is pretty naive, and uses list of characters (texts) in places where integers should suffice, but still, without a better scheme for representing data I'll still expect to be in the hundreds of MB region.

I could reuse the ocaml VM, which I kind-of skimmed through a few times, or see what Guile has to give, but still, it looks like I'll need my own VM. (Hmm, CDL3 also has a simple VM, could look there too.)

Saturday, October 2, 2010

C Back-End

I looked at various manners of binding LLVM to Hi. It ain't easy. The most direct would be to just use Hi's FFI for an LLVM-Hi binding, but LLVM has a big back-end, and it is written in C++, whereas Hi binds to C. The best route to take looks like to 'double' the code of the internal assembly used in C with a library for that, and -at some point- write an LLVM back-end instead of a C back-end. It would speed up the emitting phase I hope since it wouldn't need (text emitting) functions written in Hi, which are pretty slow at the moment.