Friday, June 25, 2010

Marshalling, serialization, introspection and metaprogramming

The system file, which holds all primitives among other things, compiles again, and I am looking at how everything holds up with respect to the new runtime.

I have a compiler which mimics gcc in the sense that it compiles 'compilation units,' which are files at the moment. I marshal precompiled ASTs in between. Now, for technical reasons, the runtime doesn't play nice with marshalling: basically, it needs to insert an unknown number of nodes into the heap during an FFI call. Besides that, raw pointers aren't chased, since they are seen as series of bits. The rather clumsy way around that is just to reserve enough space and do a collect before unmarshalling a structure, and regarding pointers, well, the same trick.

(Why not do it better than that, à la a Java VM? Well, I am translating to, and interfacing easily with, C, and doing it better comes at a cost. I now only try to collect in between combinator rewrites, and that works well and has a very low overhead compared to a 'try to collect with each pointer update' approach, which is the only other way to get big structures inserted. A lot of fine-grained combinator rewrites does the same, but leaves my very simple runtime intact.)

I don't like reserving space and I don't like the unsafe pointers.

So, I figured out the raw pointer problem: I'll 'expose' raw pointers in the heap and just chase them if they point into it. It's the only sane solution.

As far as marshalling goes, as stated before, the other option is to serialize data, which means writing (un-)serialization code in the programming language itself. I like that approach; it is probably slower, but I don't need to get involved in making the runtime more complex.

Writing serialization code is boilerplate: it is a lot of uninteresting, error-prone code (the AST has a hundred-plus nodes). Since I have interfaces which mimic Haskell's type classes, I'd like to implement a 'Deriving' feature. The bad thing about that is that it means making the compiler more complex, and I'd need to implement routines for each possible derivable interface.
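To make the boilerplate concrete, here is a minimal sketch in Python rather than in the language of this post. The three node types (`Num`, `Var`, `App`) and the nested-list wire format are assumptions made up for illustration; they are not the compiler's actual AST or format.

```python
from dataclasses import dataclass

# Hypothetical three-node AST; the real one has a hundred-plus node types.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class App:
    fn: object
    arg: object

def serialize(node):
    """Hand-written serializer: one branch per node type."""
    if isinstance(node, Num):
        return ["Num", node.value]
    if isinstance(node, Var):
        return ["Var", node.name]
    if isinstance(node, App):
        return ["App", serialize(node.fn), serialize(node.arg)]
    raise TypeError(f"unknown node: {node!r}")

def unserialize(data):
    """Hand-written inverse: again one branch per node type."""
    tag = data[0]
    if tag == "Num":
        return Num(data[1])
    if tag == "Var":
        return Var(data[1])
    if tag == "App":
        return App(unserialize(data[1]), unserialize(data[2]))
    raise ValueError(f"unknown tag: {tag!r}")
```

Every new node type needs a matching branch in both functions, and nothing checks that the two stay in sync. That is the part a 'Deriving' feature would generate.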

Another option is to implement introspection code and generate serialization code from that. Bad thing is, since I don't save that much type information and serialization depends on that, I, well, can't really.

Which leaves the last option: metaprogramming. I have been thinking about implementing a preprocessor à la C. I am not really that much of a fan of it, since it opens the way to abuse, but given everything, that really is the best option: it would solve all my problems while opening up a lot of interesting features for programmers.

But, it is for later, I'll write some scrap code now.

And I figured out how to do it too, a neat trick. I'll just expose (parts of) the AST as data at compile time, and rely on the partial evaluator to optimize the result. That means I (probably) won't generate, or insert, strings in the AST, which means something somewhat less than C templates, but it'll mean I can scrap enough boilerplate code from the compiler, which means, by absurd induction, it'll probably be enough to solve six sigma 99.99..% of all cases.

It boils down to something easy: a primitive which exposes/inserts part of the AST at compile time. So a half-baked, compile-time introspection, no real meta-programming to speak of. Just enough not to write down all definitions explicitly.
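A rough sketch of the idea, again in Python: the constructor definitions are available as plain data, and one generic routine is specialised to them. The `AST_DEFS` table, the field kinds and `make_codec` are all hypothetical names for illustration, and the closure over `defs` only stands in for what a partial evaluator would do at compile time.

```python
# Hypothetical: constructor definitions as a compiler primitive might expose them.
# Each constructor maps to its field kinds: "int", "text", or "node" (a sub-tree).
AST_DEFS = {
    "Num": ["int"],
    "Var": ["text"],
    "App": ["node", "node"],
}

def make_codec(defs):
    """Build a serializer/unserializer pair from the definitions alone."""

    def write(node, out):
        tag, *fields = node
        kinds = defs[tag]
        if len(fields) != len(kinds):
            raise ValueError(f"wrong number of fields for {tag}")
        out.append(tag)
        for kind, field in zip(kinds, fields):
            if kind == "node":
                write(field, out)    # recurse into sub-trees
            else:
                out.append(field)    # leaf values are emitted as-is
        return out

    def read(tokens):
        tag = next(tokens)
        fields = []
        for kind in defs[tag]:
            fields.append(read(tokens) if kind == "node" else next(tokens))
        return (tag, *fields)

    def serialize(node):
        return write(node, [])

    def unserialize(tokens):
        return read(iter(tokens))

    return serialize, unserialize
```

Nodes are modelled here as tuples such as `("App", ("Var", "f"), ("Num", 1))`, which serialize to a flat prefix-order token list. Adding a node type means adding one line to the table, with no per-node code.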

I toyed around with some pseudo-code, imagining the resulting code. I cannot produce an explicit test? Or at least, not without generating AST code, I guess, somehow. So, reflection is just not enough... Template meta-programming, going back to the syntactic level?

Scanned the Meta-programming in Haskell paper. Except for monads (yuck), it just follows the introspection/generation pattern. Good: Stuff is typed. Bad: Not a lot of 'syntactic' sugar, and no syntax tricks on the meta-programming level. Explicitly building an AST is just, well, cumbersome. Sometimes, it's just more convenient to run syntactic tricks. See if I can do it differently, go a step back, and just work on the token level.

062610: Man, I assumed the heap garbage collector was obvious, but I guess I'd better check all invariants there too. Added the pointer chasing code, checked the invariants informally against a pseudo algorithm of classic Cheney. They seem to hold.
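For reference, a toy model of a Cheney-style copying collector that chases a pointer only when it points into the heap. This is a sketch under stated assumptions, not the runtime's actual code: the heap is a Python list, an object is a size word followed by that many fields, fields are tagged `("int", v)` or `("ptr", addr)`, pointers always point at an object's size word, and a dictionary replaces the in-place forwarding pointers of the classic algorithm.

```python
def collect(heap, roots):
    """Copy everything reachable from `roots` into a fresh to-space."""
    to_space = []
    forward = {}  # from-space address -> to-space address (simplification)

    def copy(addr):
        # Copy an object once; later visits reuse the forwarding entry.
        if addr in forward:
            return forward[addr]
        new_addr = len(to_space)
        size = heap[addr]
        to_space.extend(heap[addr:addr + 1 + size])
        forward[addr] = new_addr
        return new_addr

    def chase(word):
        kind, value = word
        # Chase a pointer only if it points into the heap; anything else
        # (say, a pointer handed over by C code) is left untouched.
        if kind == "ptr" and 0 <= value < len(heap):
            return ("ptr", copy(value))
        return word

    new_roots = [copy(r) for r in roots]
    scan = 0
    # Invariant: objects below `scan` hold only to-space or external
    # pointers; objects from `scan` upward may still hold from-space ones.
    while scan < len(to_space):
        size = to_space[scan]
        for i in range(scan + 1, scan + 1 + size):
            to_space[i] = chase(to_space[i])
        scan += 1 + size
    return to_space, new_roots
```

The loop ends when `scan` catches up with the end of to-space, at which point no from-space pointer is left. Cycles are handled by the forwarding table, and unreachable objects are simply never copied.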

My 't' is stuck in Dutch, and my 's' is stuck in English.
