Wednesday, December 30, 2009

Log 123009

Various issues: Alpha conversion now in place, trivial. Linker bugs, or rather, failures to provide good errors, when not supplying all the object files. (I use a transform to bind to definitions. If an actual is defined, it becomes a call, otherwise, it becomes a name. I.e. supply the wrong set of modules, and I generate open LC terms.)

It probably wasn't even a substitution bug, but, I feel a lot more comfortable with alpha conversion in place.

First syntax change I'll make when ready: add 'fi.' Just read up on some Ada, I am convinced.

123009: Two flaws: 1) Flattening selects/methods may have an arity bug - not sure. 2) Processing applications to thunks has a bug - confirmed. It's almost the same bug...
123009: Stuck. Did I now do the exception handling wrong... or not?
123009: It's something else...
123009: Looking at the wrong places. Select is ok, exception is ok, it's just the thunk bug - missing 'apply' remove.


All operational semantics preserving translations should be easily expressed with the 'thunkification' trick, i.e. 'shuffling' redexes which evaluate first under one strategy to a place where they will evaluate first under another strategy. Wonder why this took me more than fifteen minutes, actually...

Monday, December 28, 2009

Christmas Kōan LCi 1001

Two hands clap and there is a sound. What is the sound of one hand?
— Hakuin Ekaku

" the beginning a monk first thinks a kōan is an inert object upon which to focus attention; after a long period of consecutive repetition, one realizes that the kōan is also a dynamic activity, the very activity of seeking an answer to the kōan. The kōan is both the object being sought and the relentless seeking itself. In a kōan, the self sees the self not directly but under the guise of the kōan...When one realizes ("makes real") this identity, then two hands have become one. The practitioner becomes the kōan that he or she is trying to understand. That is the sound of one hand." — G. Victor Sogen Hori, Translating the Zen Phrase Book

Of course, you could try a naive translation of LCi to LC such that the resulting term is a list of calls made.

But under the most naive encoding, some terms might reduce under all rewrite strategies to the same term, i.e. superficially make the same calls in exactly the same order, and you still wouldn't know anything about the exact temporal behaviour. I just need observational equivalence on traces of rewrites of terms under an evaluation strategy.

It always boils down to the same observation: Value doesn't equal behaviour.

Sunday, December 27, 2009


Compiled all sources to object files, and ended up with a substitution bug while linking... In a compiler, you really end up paying for each wrong decision.

As mentioned earlier, I thought I could cheat my way around alpha-conversion for performance reasons. The result: bugs. No way around it, I'll add it. I am rereading all my functions on my intermediate LC and adding comments/replacing them with the standard definitions.

I have the tendency to use blogs as scratch pads for half thoughts. It becomes a mess, I cleaned up some posts.

Boehm-Demers-Weiser = Borg?

Strange, I thought it would be nice if I would add bindings to gc.h. But after thinking it over, what would happen: When garbage collecting the Boehm collector would travel the whole Hi heap looking for references to nodes allocated in the Boehm heap. So it might not be worth to have my own Cheney collector running in parallel with Boehm, even if faster?

Log 122709

I turned off the typechecker in order to get the C bootstrap to compile itself. The OCaml compiler compiles to a term which is evaluated by a simple small-step evaluator on a stack of closures, it can't handle deeply nested evaluations.[1] I need to know whether it's able to compile itself, otherwise I'll need to backport the C back-end to the ML compiler. Guess I'll start on unit tests again. It's actually quite hard to derive unit tests for interfaces. Going to check Haskell on that, my interfaces are essentially type classes on type constructors, so the difference shouldn't be that big.

[1] Not ideal, but OCaml runs out of stack space easily. So, I build an explicit evaluator on a stack. Which is correct, but naive, it allocates 1Gb on some of the bigger files. The rewriter is pretty naive, and uses some unknown fact that the semantics of eager LC can be expressed on just a stack of closures, whereas most publications use some three tuple of a stack and some other stuff.

Saturday, December 26, 2009

Two Thoughts

I posted a mail about an impure calculus. Now the question would be: Why would one be interested in that? The answer is rather trivial.

Even pure programming languages, like Haskell and Clean, compile to side-effecting programs, even when they carefully try to hide that. The underlying semantics of a functional program is, without being too pompous about it, therefore best expressed in an impure calculus. Ideally with a corresponding operational semantics, since that is what programs do: They have run-time behaviour.[1][2][3]

Now, the reason why am I interested in lazy calculi: I want mostly eager behaviour -I like predictability,- but at some points I want laziness for it's expressiveness. For example, one really nice thing about Haskell is that equivalence of, say, the contents of two trees may be determined by lazily unfolding them to two lists and checking those lists for equality. This is hard to program in a language with eager semantics. I was wondering if I could `cheat` my way around this problem and settle on a semantics in the middle of lazy and eager behaviour, the language seems to be eager, but, under water, when side-effect free, the language is essentially lazy. I.e., when dealing with pure functions/data, evaluation is lazy.

No idea whether the latter is achievable. Guess not, though Haskell proves otherwise. Rephrasing: Not as easily as I want it - with this semantics I'll end up with an impure Haskell variant. Worst of both worlds.

[1] Black addendum: And if I read one more paper which thinks it'll contribute anything interesting to compiler technology by chasing arrows, I'll puke, and see it as an outright failure of fundamental CS to get anything right.
[2]I am referring to compiler/code-generation techniques here, not top-level type-theoretical concerns.
[3]An ideal compiler for a pure lazy language would compile superficially effect-free programs down to lazy side-effecting terms to eager side-effecting terms to imperative code. Then again, ideals don't work, and it seems it has been investigated anyway. Though this might almost exactly be what GHC does these days.

Rephrasing Thunks

Assume a lambda calculus with side effects/impure function calls:

   M ::= c | v | \v . M | (M M) | call c

An example term: (\x. \y. call (+) x y) 2 3.

The operational semantics of a term is the series of calls made to the system during the evaluation under a certain rewrite strategy. Two terms evaluated under two rewrite strategies are considered operationally equal if they made exactly the same calls in exactly the same order.

The standard naive technique for transforming a call-by-need term to a call-by-value term is called thunkification:
  1. An abstraction \x.e translates into \x.T(e);
  2. An application (e1 e2) translates into T(e1)(\x.T(e2 )), where x is a fresh variable.
    That is, the evaluation of the argument is suspended (thunkified);
  3. A variable x translates into (x d) (where d is a dummy argument)
    Thus, x is bound to a suspension and x must be dethunkified.
I suspect that for this impure calculus, it is sufficient to thunkify lazy terms and optimize the corresponding term according to partial evaluation under a call-by-value semantics. Possibly, arguments to calls can be fully reduced by just supplying them their dummy argument, forcing evaluation. That would make strictness analysis superfluous, I wonder what I missed here?

It seems trivial, but it probably isn't. I have the feeling that there is a generalizing pattern if you would supply operational semantics preserving functions between the two/four mostly known reduction strategies, with the addendum that for impure LC, you might need to 'patch' rewriting such that all calls receive fully reduced terms to mimic behaviour of 'real' down-to-earth programming languages, or assume all calls receive constants as arguments. Of course, translating call-by-need to call-by-value is the most obviously wanted function, but it might be worth it to study and prove correct those other semantic preserving functions in order to gain some better understanding in rewrite semantics for impure LC.

  1. Paul Steckler, Mitchell Wand, Selective Thunkification, Proceedings of the 1st International Static Analysis Symposium. 1994.
  2. Torben Amtoft, Minimal Thunkification, Proceedings of the Third International Workshop on Static Analysis WSA'93, volume 724 of Lecture Notes in Computer Science. 1993.
  3. Alan Mycroft, The theory of transforming call-by-need to call-by-value. International Symposium on Programming, Paris. 1980.

Friday, December 25, 2009

Operational Semantics Preserving Translations

Assume a lambda calculus (LCi) with side effects/impure function calls:
LCi ::= c | v | \v . LCi | (LCi LCi) | call LCi*
Note that * is a sequence of terms.

An example term: (\x. \y. call (+) x y) 2 3.

A call models an impure/non-referentially transparent system call. For simplicity, we assume that if two programs are run, and they have made identical calls, that any two new identical calls will return identical results.

I.e.: if two programs made (precisely) two calls to some function "tick" resulting in values 1 and 2, then the third call to tick will return the same value.

Assume the functions EAGER and LAZY, eager and lazy operational semantics for closed terms in LCi, respectively. They both map an LCi term into a value of type (LCi := call LCi*)* LCi, the collection of made calls in temporal order followed, possibly, by the term in strong normal form - fully reduced. I'll handwave termination concerns away for the time being. EAGER and LAZY are called the operational semantics since it tracks the system calls made and returns the evaluated term, if any. Note that I assume that a call only accepts an LCi term in strong normal form (fully reduced), and returns a term in strong normal form.


EAGER((\x. \y. call (+) x y) 2 3) = (5 := call 2 3), 5.
During the evaluation of the term we made one call to +, returned 5, and the whole program reduces to 5.


EAGER( (\x . call tick) (call tick) ) = (1 := call tick), (2 := call tick), 2.
Two calls to tick returned constants 1 and 2, the whole program reduces to 2.


LAZY ( (\x. call (+) 2 3) (call tick) ) = (5 := call 2 3), 5.


EAGER ( (\x. call (+) 2 3) (call tick) ) = (1 := call tick), (5 := call 2 3), 5.

Two terms are assumed to be operationally equal if they made exactly the same systems calls in temporal order, and possibly reduce to the same term, BUT two identical terms may return a similar result under the two impure operational semantics, but are not equal since they didn't make the same series of system calls.

Needed: two functions e_to_l and l_to_e such that
EAGER(l) = LAZY(e_to_l l),
LAZY(l) = EAGER(l_to_e l), 
EAGER(l) = EAGER(l_to_e(e_to_l l)), and  
LAZY(l) = LAZY(e_to_l (l_to_e l)).
I.e., operational semantics preserving functions between eager and lazily evaluated impure LC terms where = is the alpha equivalence on series of terms.

Rationale: I want to implement some lazy features in Hi, this seems to be one good solution.


  1. Say you would like to implement a Haskell compiler but you only have a Lisp compiler. Solution: translate Haskell to a corresponding LCi term l, translate that term to l' = l_to_e l, translate l' to Lisp. Hereby solving the problem once and for all.
  2. Say you want to have eager semantics in Haskell. That could be done similarly with the e_to_l function.

I didn't think it through too much, a simple CPS transform probably doesn't do it, but may be a good starting point.

Funny thing is that this is a totally opposing view to normal interpretations of operational semantics of LC since I am only interested in observational equivalence with respect to side effects. Which makes sense for someone who builds compilers since the only interesting quality of a program is what it does to the system and the result is often uninteresting.

Log 122509

Solved all typing problems, typechecker seems to be correct, but slow. Rethinking the instance checks since the code seems to be too complex, hinting at bugs. It also generates errors on the wrong spots, which should be fixable easily. Bootstrap accepts 30% of its own source, after which it stalls for unknown reasons.

12/25/09: Its the quadratic time typechecker. I gather type constraints in a simple bottom up pass over the term. It introduces a large number of variables and equalities where for each equality a linear number of substitutions are made on the sequent. Moreover, instantiation constraints also introduce type unknowns. On large terms, the prover run by an ocaml interpreter just stalls. I discard type equalities, for now, from the antecedents which should, almost, halve the running time.

In the end, I want to decorate variables in the AST with their derived types. The running time should be fixable by the introduction of two type unknowns: those unknowns which should be remembered, and unknowns for intermediates, where the former are stored separately in the sequent. A nice to have for now.

Tuesday, December 22, 2009

On Generalizations

The bootstrap compiler has a more restrictive typechecker, basically because I implemented some cases better. Parts of the bootstrap compiler are therefore not accepted. Two problems, which basically boil down to generalization errors.

First -yeah I read the papers,- it assumed, mostly, that generalization is 'good' because of the following example:
let f = [x -> x] in (f 3, f 'a')

If we wouldn't generalize the type of f, it must be the identity on int and char, which fails to typecheck. The only problem: The example is not general, it never happens. Functions which are polymorphic are usually written as definitions, and almost never occur in function bodies.

What does happen is that we want to know in local functions what argument types to expect, such that subsequently we can optimize better, i.e.:
let g = [y -> print y] in g 3
If y has type int, then the generic function print may be optimized to print_int.

Moreover, and this seems to be a source of 'features,' it is hard to generalize in the right order. In the example below, it should be derived that z is a printable value.
f z = let x = [x -> x] in let g = [y -> print y] in g (f z)

Which fails, at the moment.

Which leaves me with two choices. After thinking it through, I derived that linear order solving handles almost all cases, so that's no problem. So, get rid of generalizations is one choice, get rid of the bug the other. Guess I'll stick with generalizations, even if problematic. (Can only generalize over local variables, pattern matching and generalization are a hard mix.)

I guess I forgot an equality somewhere, or forgot assigning some variables monomorphic...

12/23/09: A substitution on a set of monomorphic variables is the set of free-variables after substitution.

Monday, December 21, 2009

Towards a Top-Level Interpreter

I am looking at ways to build a top-level interactive interpreter. Best thing I can think of:
  1. Compile a series of modules to a shared library.
  2. Write an interpreter which compiles (untyped) expressions to thunks using libdl. A simple lambda to SKI to thunks compiler should be enough.
  3. Evaluate thunks with the runtime.
Should do it. Needed: bindings of the runtime into the Hi language.

Sunday, December 20, 2009

Log 122009

After a number of changes, some small, some large, the to ml compiled compiler accepts its own system file again. It also cleanly compiles simple programs to C (fac, list manipulations). Various thoughts:
  1. I simplified the type checker to a prover using solely antecedents and consequents. Both are stored in a list, giving quadratic behavior. It should become O(n x log(n)) here.
  2. In order to avoid quadratic, or worse, behavior I tried to avoid alpha conversion on lambda terms during substitution. There have been/are too many bugs because of that. Should solve this.
  3. I don't inline yet. I should do this at some point.
  4. The method invocations are compiled down to combinators. Case statements are a linear collection of tests, that means linear behavior on lookups. Not sure this is a problem actually.
  5. Everything not essential graph rewriting is done through FFI (libffi) where all functions are loaded from dynamic libraries (libdl). Every FFI call is evaluated each time. FFI is slow already, I could/should add memoization here.
  6. FFI calls are dynamic in nature leading to runtime failures when types are not matched. Should look into swig to automate the glue.
  7. I should allow unary arguments to FFI, now it is assumed all FFI symbols refer to functions.
  8. I wonder how combinators will work out long-time. There might be just too much copying - should think over how spurious copying can be avoided.
  9. I need serialization. This should be simple and obvious if all data is treated as trees.
  10. I started on a simple ReST parser for documentation which I intend to be part of the compiler.
  11. I should prove the type system correct?
Explanation in order why I use FFI even for simple addition: This is probably the reverse of what one would expect since tradional compilers often optimize heavily on native types. But, to me, the order of requirements was: 1) fast graph rewriting, 2) good simple interfacing with C, and 3) reasonably fast compiler. Hence, I considered thinking about integers a waste of time, or, a nice to have.

Writing a compiler is

like pushing a giant wedding cake through a keyhole in the hope you end up with a camel.

Tuesday, December 1, 2009


There's a bug in the garbage collector, somewhere I probably forget to decrease/increase a pointer. (I use a drop-in replacement for Boehm, so I do some pointer arithmetic to get to the meta-field of an allocated structure.)

It is unbelievable how much time it takes to debug one C routine of 50 lines.

[Fixed: returned wrong variable after copying.]

Tuesday, November 24, 2009

Log 112409

First compile of new runtime. Runtime now measures 1.3k fully documented lines of code, which is a thad too much for my taste. Passing around an environment pointer explicitly to make it easier to migrate to thread-safe code. Removed GC_alloc, etc, so Boehm is out at the moment. Thinking over the libffi hack. Start of unit tests on all code.

Moved to another Machine. Again...

My Macbook Air's HD crashed leaving me with not a lot except for the sources which I saved just in time. Instead of building an SSD into the Air, one of the finest laptops I ever had except for overheating, I bought a Macbook Pro Aluminium.

I now own: an Asus S5000 (battery dead), a Dell latitude D420 (battery dead), a Dell XPS (fails to load the battery), a Macbook Air (HD crash), a Macbook Pro Aluminium.

I installed Fedora Core 12, which is almost running smoothly (no key backlights, no sound) in about a day. It took some time for it to boot nicely from the Unix partition after the pre-install. I got the feeling it found the bootloader correctly, but subsequently failed to find the linux partition, not sure what was up with that. The HD uses a weird double partitioning scheme which seems to mess up the bootloader. Refit also seems to have some small bug in it, it stalls sometime on reboot. I use a Mobile Broadband dongle which only works when I first boot into MacOs, so there seems to be something fishy with NetworkManager's initialization scripts. Anyway, guess I'll just wait for that and applesmc to be updated.

The Macbook Pro is a fine machine, but somewhat too sturdy if you're used to an Air. Hope it ain't gonna develop problems again, it might be that Fedora doesn't handle Apple's hardware well...

Monday, November 23, 2009

Log 112309

Reasonably clear. Writing unit tests for what should hopefully be the last runtime...

Tuesday, November 17, 2009

Very, Very, Very Fast 2.5-Phase Compacting

I skipped from Boehm/Demers to a (fast) variant of Cheney. Cheney has one serious drawback, it is a copying collector so it wastes memory. I guess I might go for a compacting algorithm for multicore machines. Here a minor variant on compacting.

I assume all nodes only refer to older nodes. For simplicity, I assume here that records only hold references. Nodes also don't point into other nodes.

Assume each node has a meta-field in which is stored the size of the node and an empty field used during compaction.

A running example:

A: B: C: D: E:
[4,#|E,B] ..n.. [3,#|C] [4,#|D,E] ..m.. [3,#|E] ..k.. [2,#|]

For simplicity, the nodes only refer to other nodes. A is a node [4,#|E,B], the first two fields are meta-information, the latter fields the actual content. A has size 4, the second field is not used. The content holds two fields which are references to nodes E and B. ..n.. denotes a space of size n filled with nodes which will be reclaimed.

The collection algorithm:

  1. Nodes are processed from left to right, the first node is marked life. When a marked life node is processed, its children are marked life. In a life node being processed the empty space preceding it is stored, i.e. the sum of the size of all unmarked nodes. For each unmarked node, we add its size to the unused space, and advance to the next node.

    A: B: C: D: E:
    [4,0|E,B] ..n.. [3,n|C] [4,n|D,E] ..m.. [3,n+m|E] ..k.. [2,n+m+k|]

  2. Slide the nodes from left to right, for each field, look up in the referenced field how much that node will be moved and recalc the correct new pointer.

    So, halfway after sliding A and B the heap will look like:

    A: B: C: D: E:
    [4,0|E-n-m-k,B-n] [3,n|C-n] ..n.. [4,n|D,E] ..m.. [3,n+m|E] ..k.. [2,n+m+k|]

  3. Major disadvantage, it slides to the left, where we want to compact to the right. So, that could be solved with an extra phase which moves the memory n+m+k to the right.

    A: B: C: D: E:
    ..n+m+k.. [4,0|E,B+m+k] [3,n|C+m+k] [4,n|D+k,E] [3,n+m|E] [2,n+m+k|]


Room for improvement? The correct final offsets could be calculated in the second phase, leaving a simple copy in the last phase. Nodes to be reclaimed could be merged in the first phase by setting the length field of the head node to the sum of all to be reclaimed nodes following it.

Benefits, ok-ish performance expected since it only has 'two and a half' phases and reasonable cache coherence if most nodes are relatively close. It also keeps the age order correct, which should give better performance than a Cheney collector.

Additional notes:

A disadvantage seems to be that we visit all nodes in the heap during a linear scan, instead of just life nodes during the mark phase. But given modern CPUs, we can expect that to be a minor overhead since for each non-marked node the pointer is just advanced with its size, and often nodes are expected to be in the chache anyway - which gives very good locality, and nothing is written to memory. The algorithm has two linear scans, where other algorithms might use some data structure to maintain information to not visit all nodes. The second phase could be made faster by storing forwarding pointers to the next life node, or adding them to a stack - then again, not sure that is worth the effort. (A slight adaption, in the first phase, let each dead node 'eat' the next dead node by setting it's size to the sum of both nodes.)

For a functional language we expect a large number of dead objects, hence the last phase, a simple copy of a small number of objects, should be relatively light. Its a time/space trade-off, expected running time is probably slower than Cheney, since the whole heap is visited (twice), but it halves the memory space used.

Then again, its my first try at compacting. Maybe a good book?

Something itches in the back of my mind, an old compiler building joke: "Nothing beats free lists."

11/19/2009: (Checked it against 'A Compacting Garbage Collector for Unidirectional Heaps (1998)' by Kent Boortz, Dan Sahlin which I just scanned for basic complexity. It has roughly the same overall complexity, but this algorithm is way simpler, so I guess I am save.)

Thursday, November 12, 2009

Running Time

Did some on and off programming at snail speed. Back-end is starting to look pretty slick. Compiled out the method calls, compiled out the try/catch statements, compiled out the FFI, reduced almost everything to graph rewriting, generating pretty clean C from intermediate, now fixing the runtime.

(Oh yeah, now aiming multicore.)

Sunday, October 18, 2009


Since ANF and DOT are just convenience representations, I am removing them by 'fusing' the introduction and removal of DOT. I also simplified the throwing of exceptions: explicit code has its merits, since it generates readable output, but the overall aim is a minimal, but flexible, fast compiler.

Saturday, October 17, 2009

HD Failure

Duh, disk crashed. Which, given headaches and the time I have left, ain't good. Fortunately, saved my compiler to USB just on time. I lost the test scripts and all other stuff I was developing - I usually work on multiple projects. Could have been worse, though, I guess... Moved it to one of my other laptops, it compiles again.

Tuesday, June 2, 2009


ANF and DOT are convenience representations used in the compilation process of lambda terms.

ANF is a representation which explicitly linearizes local computations by binding them to let expressions. Every let expression let x = E0 in E1 is equivalent to a normal lambda expression (x . E1) E0.

DOT is a representation which also lifts subcomputations out of lambda terms. I.e., a term E0 E1, where E0 and E1 are not in normal form, is translated to (. .) E0 E1, any normal form is not lifted out of a term.

When translating lambda terms to C, the question for my runtime is where a thunk should be pushed in the GRS, or whether a term in normal form should be placed into a thunk awaiting a result.

The advantage of DOT over ANF is that it makes explicit the thunks and where results should be stored and it linearizes the term. I.e., since dots denote where arguments should be placed in order, local computations in subterms can be lifted such that we only arrive at a series of thunks which should be pushed.

The advantage of ANF over DOT it that it is easier to translate to C since the let bindings are closer to assignment to local variables.

DOT should be more suitable for me.

Monday, June 1, 2009

Clearing tha Head

The dot transform is nice representation. But, thinking it over, why would I prefer it over ANF? They serve the same purpose.

Sheesh, and I need half-a-year to figure this out... Guess the head needed some water.

Sunday, May 31, 2009

A Narrow Road

I once had a great idea to compile to Thunk form, which is explicitly different from a stack machine, but is a simple representation for a graph rewrite system. The idea was a follows, the fac function

fac = \n -> if n = 1 then 1 else n * (fac (n-1))

Is compiled to

fac = \n -> if n = 1 then [1] else
[. . .] [*] [n] [[. .] [fac] [[. . .] [-] [n] *[1]]]

Reduces like

[fac 3]
[. . .] [*] [3] [[. .] [fac] [[. . .] [-] [3] *[1]]]
[. . .] [*] [3] [[. .] [fac] [[. . 1] [-] *[3]]]
[. . .] [*] [3] [[. .] [fac] [[. 3 1] *[-]]]
[. . .] [*] [3] [[. .] [fac] [*[- 3 1]]]
[. . .] [*] [3] [[. .] [fac] *[2]]
[. . .] [*] [3] [[. 2] *[fac]]
[. . .] [*] [3] [*[fac 2]]
[. . .] [*] [3] [[. . .] [*] [2] [[. .] [fac] [[. . .] [-] [2] *[1]]]]
... [. . .] [*] [3] [[. . .] [*] [2] [[. .] [fac] *[1]]]
... [. . .] [*] [3] [[. . .] [*] [2] *[1]]
... [. . .] [*] [3] *[2]
... *[6]

An optimizing compiler shouldn't push constants, so would evaluate (generate code) like this

fac = \n -> if n = 1 then 1 else
[* n .] [[fac .] *[[- n 1]]]

I am trying to get rid of the thunk form, to simplify the internal representation, and just use a dot to show where an application series is expecting an argument. I.e. the brackets now are simple parenthesis denoting subformulas, and after 'dotting' the formula it is 'linearized', i.e. the order of applications do not matter that much any more.

Tuesday, May 12, 2009

Thursday, May 7, 2009

Small Mistakes

John McCarthy insists on having a good abstract syntax, and how right he is. It are the small mistakes in the abstract syntax which really careen the compilation process. Small work-arounds pop up to fix problems which just "shouldn't" be there and code starts to look perfidious. And it just takes ages to find why, what, when, which mistake was made.

Refactoring miasmic code.

Sunday, May 3, 2009

Reality Check

I haven't been working on the language for quite some time now. I recently started again. Puzzling thought: I now added two extra internal languages to the compiler. A minimalist untyped lambda calculus core language, and a code language, which should abstractly represent the C subset I am compiling to.

Why? These are two extra levels of indirection, which if I look hard at them, are slightly more direct (no meta-information to handle), but I think I can get away with restricting myself to the original abstract syntax, meanwhile ascribing a slightly different semantics while converting.

Friday, March 13, 2009

Log 031309

Not sure I like the way I am heading with this...

Thursday, March 12, 2009

Log 031209

Still minimizing code. Got it down to six basic instructions. I like minimalism, I think I can get it down to four...

Wednesday, March 11, 2009

Log 031109

Code generation (hi). Think I spotted the logical error.

Tuesday, March 10, 2009

Log 031009

Writing code generation code (hi). Introducing lambda_return, to make sure I got my code generation right.

Sunday, March 8, 2009

Log 030809

Left neuron reconnected to right neuron; took about a month. (Or is it one-and-half month?). Hope this won't be the cycle I'll get stuck in.

ML generation is probably a bad idea. Intermediate byte code is a good idea; reduced it to approx ten instructions. That should be enough.

Various edits (code.hi, lambda.hi, ml.hi, c.hi). Introduced lambda_expand. Re-evaluating thunk code, seems correct, but should be simplified.

What Kind of Development Model is This?

Instead of debugging run-time code, which gave me a headache. I decided to implement some features, to simplify the runtime. I might also go for some intermediate bytecode representation.

Compiling out methods sure is a good idea. Added about twenty lines to compile them away, and I think I can remove a few hundred lines this way.

I am also building a back-end to compile stage1 to ML instead of C. Just to get to the point that I can self-compile, once. Not even sure what that buys me, except for some confidence that most of the generated code doesn't contain errors.

Thursday, March 5, 2009

Log 030509

A month has passed without progress. Added a counter, looking at documentation generation (Haddock), back to debugging.

Wednesday, March 4, 2009

Coalgebraic Comparison

If you lax some restrictions on types, you can describe some nice new types like coalgebraic lazy lists. Coalgebraic since it approximates the coalgebraic carrier type, and lazy since an extra construct is introduced which acts like the 'step'-function on a coalgebraic structure.

Below, an example.

import "system.hi"

namespace colist (

    using system
    using opt

    type colist = \=> unit -> opt (a, colist a)

    def conil: colist a = [ _ -> nothing ]

    def cocons: a -> colist a -> colist a =
        [ x, xx -> [ _ -> opt.just (x, xx) ] ]

    def cocat: colist a -> colist a -> colist a =
        [ xx, yy ->
            [ u ->
                v = xx u;
                if opt.nothing_test v then yy u
                    (x, xx) = opt.value v;
                    opt.just (x, cocat xx yy) ] ]

   def cocompare: ::Ord a => colist a -> colist a -> int =
        [ xx, yy ->
            v0 = xx nop;
            v1 = yy nop;
            if opt.nothing_test v0 then
                if opt.nothing_test v1 then 0 else 1
            else if opt.nothing_test v1 then (- 1)
                (x, xx) = opt.value v0;
                (y, yy) = opt.value v1;
                if x < y then (- 1)
                else if y < x then 1
                else cocompare xx yy ]

    def coprint: colist int -> unit =
        [ xx ->
            v0 = xx nop;
            if opt.nothing_test v0 then
                print "nil"
                (x, xx) = opt.value v0;
                _ = print (x+0);
                _ = print ", ";
                _ = coprint xx; 
                nop ]


using system
using colist

type tree = \=> [ leaf | branch (tree a) a (tree a) ]

def tree_to_colist: tree a -> colist a =
    [ leaf -> conil
    | branch l v r ->
        [ u ->
            cocat (tree_to_colist l) 
               (cocons v (tree_to_colist r)) u ] ]

Sunday, February 15, 2009

Log 021509

On the rare occasion I am sharp enough to debug: strange errors in the translation of terms, unfortunately they are logical errors. Sigh.

Saturday, February 14, 2009

Travelling the C Stack

I use a Cheney style GC which is triggered, and may start a collection, each time a term is rewritten. The collector starts of with one root. This gives a minimal overhead, since in other GCs collections are tried each allocation step and this scheme does without that overhead giving an allocation routine of just a few machine instructions.

A problem with this scheme is that there may not be enough room in the heap for allocating a new term during a rewrite. Normally, less than a hundred bytes will be allocated, so keeping a spill-over region will be quite safe, except for FFI which may introduce a term of unknown size for example while reading files.

The are several ways of handling this: a) adjusting to a normal GC where an allocation may be interrupted with a collection, which also would imply that I would need to travel the C stack to collect other roots, b) explicitly introduce an allocation routine in the language which will grow the heap such that enough room is available for building large terms, c) allocate large terms in another region, d) never allocate large terms, instead pass pointers from C and work on opaque structures which must be deallocated explicitly.

Option d) it is.

Tuesday, February 10, 2009

Finalizers Considered Questionable

If you glue C to a GC-ed language, you end up with the question what to do with structures passed from C. One way of handling them would be to add finalizers to objects in the GC-ed language. I am figuring out what others did. A post from Python-Dev.

Here's my $0.02.

I agree with the sentiments that use of finalizers
should be discouraged. They are extremely helpful
in cases like tempfile.TemporaryFileWrapper, so I
think that they should be supported. I do think that
the language should not promise a high level of service.

Some observations:

- I spent a little bit of time on the ANSI
Smalltalk committee, where I naively advocated
adding finalizers to the language. I was resoundingly
told no. :)

- Most of the Python objects I deal with these days
are persistent. Their lifetimes are a lot more complicated
that most Python objects. They get created once, but they
get loaded into and out of memory many times. In fact, they
can be in memory many times simultaneously. :) A couple
of years ago I realized that it only made sense to call
__init__ when an object was first created, not when it is
subsequently (re)loaded into memory. This led to a
change in Python pickling semantics and the deprecation
of the loathsome __getinitargs__ protocol. :)

For me, a similar case can be made against use of __del__
for persistent objects. For persistent objects, a __del__
method should only be used for cleaning up the most volatile
of resources. A persistent object __del__ should not perform
any semantically meaningful operations because __del__ has
no semantic meaning.

- Zope has a few uses of __del__. These are all for
non-persistent objects. Interesting, in grepping for __del__,
I found a lot of cases where __del__ was used and then commented
out. Finalizers seem to be the sort of thing that people
want initially and then get over.

I'm inclined to essentially keep the current rules and
simply not promise that __del__ will be able to run correctly.
That is, Python should call __del__ and ignore exceptions raised
(or provide some *optional* logging or other debugging facility).
There is no reason for __del__ to fail unless it depends on
cyclicly-related objects, which should be viewed as a design

OTOH, __del__ should never fail because module globals go away.
IMO, the current circular references involving module globals are
unnecessary, but that's a different topic. ;)


I think I agree with this: ... GC is about memory management, not resource management .... The gist: since you cannot, in general, rely on finalizers being called when needed, or in the right order, better to avoid them.

Friday, February 6, 2009

Wednesday, February 4, 2009

Compiling out Select

Yet again a change scheduled to the compiler. If I introduce type specialization I can get rid of select statements and, again, simplify the runtime.

Log 020409

Fixing void arguments and results (c).

Tuesday, February 3, 2009

Log 020309

Still slow. Stack-exception (sigh) (text -> lists of text) (ml).

Thursday, January 29, 2009

Log 012909

Another slow day. Missed a rule in lambda_transform_child? (hi) Fixing test_lib.hi (hi) (bindings to OS)

Language Design

I took some care in designing the Hi language and its implementation. Recently, there was some debate on LtU whether design and implementation issues should be discussed. A problem is that there are no good designers of programming languages, even, there is no good body of knowledge to abstractly discuss what is, or would constitute, a good programming language.

So, what do we need? Some form of applied semiotics?

Why Functional Programming Matters

Short story.

When I was doing my master studies we were introduced to, I think, Miranda. One of the firsts home-work assignments was to write an interpreter which I had to deliver with a friend.

I had an Atari, with a PC emulation chip, but the emulation wasn't very good, and didn't work on all programs. Anyway, it didn't accept the environment given. So I ended up coding about a two hundred lines out of my head at home.

Next day, we had to give in the assignment. As usual, I was late, so with a floppy with source code and about a minute left, I met with my friend. He said to forget about it, drop the assignment and go to class.

I logged in on a Sun workstation, uploaded the source code, and ran it. It worked flawlessly.

That's why functional programming matters.

Wednesday, January 28, 2009


Debugging: the stage0 compiler produces tracing output, the ML runtime of stage0 produces tracing information, the stage1 compiler produces tracing output, the code produced by the stage1 compiler contains information, the C runtime produces tracing information, there is tracing information in the test suits, and I use gdb/nemiver.

And in the end, bugs are solved while I sleep.

Log 012809

Left neuron has trouble communicating with right neuron. Fixing bug in ___apply (expand and copy continuations) (hi,c). Fixing substitution bug (hi) (labels do not remain unique after a split in two phases). Getting test_hof1.hi to work.

Monday, January 26, 2009

Paying the Price

My compiler translates input to a Graph Rewriting System expressed in C. I recently made a design decision to put everything except for the GRS, which means everything except for the treatment of literals, outside of the runtime.

This means that, for example, addition is handled through FFI. Of course, you pay a substantial performance price for that: Ackerman now runs in 5.6 seconds instead of, the best I got so far, around one second.

The benefit is that extending Hi programs with calls to, simple, C libraries now means nothing more than possibly introducing labels for types, and then calling the C libraries directly.

At some point, I might change it again, but I hope that I can optimize later by folding sequential specific calls w.r.t. predefined C libraries to specialized C code.

At this point, I can only hope that most programs don't resemble Ackerman.

Sunday, January 25, 2009

How To Compile a Lambda Term1

native_int* __fac(native_int* n) {

    native_int arity = 0;
    native_int* v_0;
    native_int* r_15 = (native_int*) n[1];
    native_int* r_16 = (native_int*) n[2];
    native_int* r_17 = (native_int*) n[3];
    native_int* r_18 = (native_int*) n[4];

    native_int* r_19;
    arity = 1;
    if (n[5] >= arity) goto reduce;
    native_int* reta = (native_int*) n[3];
    native_int reti = n[4];
    reta[reti] = (native_int) n;
    r_19 = (native_int*) n[1];
    goto finally;
    reduce: n=n;
    v_0 = (native_int*) n[6];

    native_int* r_20;
    if (v_0[0] != 0) goto label1;
    if ( strncmp( (char*) v_0[1], (char*) "", 128) != 0 ) goto label1;
    if ( strncmp( (char*) v_0[2], (char*) "", 128) != 0 ) goto label1;
    if (v_0[4] != 0) goto label1;

    static native_int r_21[5] = {(native_int) 0, (native_int) "", (native_int) "", (native_int) 1, (native_int) 1};

    r_17[(native_int) r_18] = (native_int) r_21;
    r_20 = r_15;
    goto label0;
    label1: n=n;

    native_int* r_22 = (native_int*) __system_dml;
    native_int* r_23 = v_0;

    native_int* r_24 = 0;
    native_int* r_25 = GC_malloc(8 * sizeof(native_int));
    r_25[0] = (native_int) r_22;
    r_25[1] = (native_int) r_15;
    r_25[2] = (native_int) r_16;
    r_25[3] = (native_int) r_17;
    r_25[4] = (native_int) r_18;
    r_25[5] = (native_int) 2;
    r_25[6] = (native_int) r_23;
    r_25[7] = (native_int) r_24;
    if (n[5] == arity) goto dontexpand26;
    r_25 = ___expand(r_25, n, arity);
    dontexpand26: n=n;
    native_int* r_27 = r_25;
    native_int* r_28 = (native_int*) 7;

    native_int* r_29 = (native_int*) __fac;

    native_int* r_30 = 0;
    native_int* r_31 = GC_malloc(7 * sizeof(native_int));
    r_31[0] = (native_int) r_29;
    r_31[1] = (native_int) r_25;
    r_31[2] = (native_int) r_16;
    r_31[3] = (native_int) r_27;
    r_31[4] = (native_int) r_28;
    r_31[5] = (native_int) 1;
    r_31[6] = (native_int) r_30;
    native_int* r_32 = r_31;
    native_int* r_33 = (native_int*) 6;

    native_int* r_34 = (native_int*) __system_dsb;

    native_int* r_35 = v_0;

    static native_int r_36[5] = {(native_int) 0, (native_int) "", (native_int) "", (native_int) 1, (native_int) 1};
    native_int* r_37 = GC_malloc(8 * sizeof(native_int));
    r_37[0] = (native_int) r_34;
    r_37[1] = (native_int) r_31;
    r_37[2] = (native_int) r_16;
    r_37[3] = (native_int) r_32;
    r_37[4] = (native_int) r_33;
    r_37[5] = (native_int) 2;
    r_37[6] = (native_int) r_35;
    r_37[7] = (native_int) r_36;
    r_20 = r_37;
    goto label0;
    label0: n=n;
    r_19 = r_20;
    finally: n=n;
    return r_19;

1Without a ridiculous CPS transform ;oP. But with some small errors. ;o)

Log 012509

Fixing the back-end (hi). Small changes to the runtime (c). Getting fac.hi to compile (hi). Getting test_hof.hi to compile (c).

Thursday, January 22, 2009

Consecrated Source Code

Who, if we knew
What we receive, would either not accept
Life offered, or soon beg to lay it down.


The stage one compiler accepts its own source, identifies correctly, and can dump the AST to object files. This is great, since I was a bit worried it wouldn't be able to compile itself. (ML might run out of memory/stack space, you never know.)

On the top level, I can see the ML GC at work. Identify a few words, pause, identify, pause, identify, pause, ...

All is set to bootstrap, just need to fix the runtime.

Part of the stage zero compiler is a faithful, I should say: direct, translation of the operational semantics. So, I guess I can state that both operational semantics as the stage one compiler are in good shape.

Next steps: fix the linker/translation to C, as well as the bugs in the C runtime. Build a lot of small programs to test. Probably implement some small optimizations on the lambda terms. And then, ... one long, long, compile to bootstrap.

After that, conformance suits and some small fixes in the semantical analysis of stage one.

Combinatorial Explosion

Just a memory.

Have you ever been in a MDMA rush while performing the light show of a house party and being sheered at by a crowd of two thousand plus because you managed to press a series of buttons at the right moment?

Meanwhile, I was thinking about avoiding combinatorial explosion of solving nand-encoded prime factoring boolean terms.

This was a long time ago. I grew out of it, all of it.

But it was good...

Unexpected: 2+3 = ... 6!

Sloppy copy of *. FFI works, on hand crafted code.

Log 012209

Slow. Fixing dynamic loading back-end (hi). Small tests on C FFI (C). -Wall (C).

On Constants

I am weird, I read source code, just to learn. Apache, Java Swing, Clean, Haskell, gcc, ogre 3d, irrlicht, ml light, Tinyhttpd, various lisps, smv, Mozilla javascript, Helium, gtk, HOL light, ...

Invariably, with the exception of 1%, I'll find a mess. It is just a fact, people can't program, or programming is too hard.

I read the Python source to see how they handle dynamical loading of shared libraries. I wasn't sure I actually needed a cache for the handles.

Surprisingly, an array of 128 handles are allocated. I am a conservative programmer, I grow the area of handles when needed.

But thinking of it: Who's right? Python will crash when more than 128 libraries are loaded; but that will not happen very fast. My code will handle a lot of libraries; but if my code is abused, it is not thread-save, or contains a bug, it will silently keep growing and keep using memory.

There is a point in coding with hard constants.

Multitasking Heat

Multitasking: fixing small bugs in the make file for stage 1, building a new runtime, fixing the module system, fixing the back-end such that functions from C shared libraries can be called directly.

And all that while my Mac is overheating while crunching through the source of stage 1.

Wednesday, January 21, 2009


I compile the AST to an intermediate representation, untyped lambda terms. That is for convenience, but how convenient is it? As it looks now, most transformations are just as well done on the AST, with the added benefit that work is moved from the linker to the compiler; and another benefit, that functions become available for other optimizations on the AST like type directed optimizations.

On the other hand, it gives a clear split between a front and back/intermediate end.

I compile the lambda terms to text directly. That might have been a pretty bad idea.

**** ML

This now happened to me once too often:

Fatal error: exception Stack_overflow

If there is a limit on the stack, please don't write a functional language.

Log 012109

Bugs in incremental compilation (hi). Initial commit of new C runtime (C). Makefile for C runtime (C). Added provisional support for loading dynamic libraries (hi/C).

You have to love git


VERSION=`date +"%m%d%y"`

git archive --format=tar --prefix=$PREFIX/ HEAD | gzip > ../$PREFIX$VERSION.tgz

Tuesday, January 20, 2009

O(log(N)) or O(1)?

I use a binary search on types and methods to find the corresponding function. This has O(log(N)).

It is possible to compile to O(1) lookup, assuming that the source is type-checked. A select term can be written in compiled down lambda terms as ((select "a_method") a_value). The term (select "a_method") can be translated to a combinator "F_a_method". If the value holds an integer of a fixed range, the number of types, O(1) is possible.

I am sticking with the first scheme for now.

Why Generalize?

In most academic papers, it is assumed that a modern compiler will generalize the types of let bindings in definitions. The classical example is something like:

let f x = x in (f true, f 1)

If the type of f is not generalized to (forall a. a -> a) before used in the body, it will not type check.

Unfortunately, in my source code I cannot find any examples where any locally bound identifier should be generalized.

Why have all this machinery for generalization?

Actually, given more detailed type information for the arguments of f, it is substantially more trivial to generate faster code for f. So that's a good argument against generalization.

Toilet Adornments

When you read Lambda the Ultimate, you would think writing a compiler has a lot to do with science. Unfortunately, 99% of the time you're stuck with more worldly problems.

Mundane problems encountered in a day:

  • Forgetting to pass the current module bindings to all bindings during identification

  • Forgetting to add "." as a default search directory to the search path

  • Returning the current module in the list of all loaded modules

  • Not adding a good fall-back for "exhausted all cases" in the operational semantics

  • Compiling string literals in the import section down to char lists

  • Forgetting that if the result of an evaluation is an exception, the runtime should exit with another exit code


In the absence of a specification, conformance of a compiler is measured by bootstrapping it.

Not nearly good enough for what I am aiming at. So I'll need a tight language specification, and conformance testing suite.

Like every programmer, I am not really looking forward to testing.

The devil plays with my compiler and my blog: While I was writing this, I got this message from the compiler.

Fatal error: exception Runtime.RuntimeException("reduce error")

That is, a bug in the runtime of stage zero. A good reason for a conformance suit. Dealing with this tomorrow. Of to bed. (This happened while loading more than one module)

The Hi Mascot

Mascot Mas"cot, Mascotte Mas"cotte, n. [Through French fr. Pr. mascot a little sorcerer or magician, mascotto witchcraft, sorcery.]
1. A person who is supposed to bring good luck to the household to which he or she belongs. [1913 Webster]
2. Hence: Anything that brings good luck; especially, an animal kept by a group, as a sports team, to serve as a symbol and to bring luck. [1913 Webster +PJC]
-- From The Collaborative International Dictionary of English v.0.48

Hi is an abbreviation of Hieronymus, after my favourite painter, Hieronymus Bosch. I used that as a working name since Erik Meijer had published a small language named Mondrian after another Dutch painter, Piet Mondriaan.

It is a picture taken from the painting 'A Ship of Fools.' Not very likely to help with language adoption as the Python mascot from the Python language by Guido van Rossum, but I like it.

Initial Tests on the Module System

It seems to work. After the system file is processed, there is hardly any overhead for producing small programs which use the system.

So, I am now building a linker and getting ready for a full compile.

Sometimes modularity really works. I only need one run of the compiler to dump object files from the AST; after that, it's just writing and debugging the linker which produces the C code. So, the three hours to produce the object files are now mostly irrelevant.

Programming in Hi

Is a Blast! Now, only if I knew why?

... this is a space reserved for the Hi Language design philosophy... Some initial thoughts:

  • Be smart, don't be intelligent

  • If it looks wrong, it is wrong

  • What's wrong with perfection, anyway?

  • LOCs are relevant, less LOCs are more relevant

  • True, if tested

  • It there is more than one way of doing it, then all of them are wrong

Log 012009

Performance. Incremental compilation (hi). Makefiles for stages (hi). Debugging the module system (hi).

Monday, January 19, 2009

Slow Motion: "2:21:45"

At the moment, it takes the stage one compiler, as build by stage zero, 2:21:45 wall clock time to compile the 747 lines of the system file. The marshalled AST/object file has a size of 380.243 bytes.

The compiler is that slow since it uses a stack based interpreter; direct evaluation is impossible since ML cannot handle the recursion depth needed by the compiler. (This is mostly due to text processing. Concatenating large lists of chars is beyond ML.)

That's including semantical analysis, and a lot of printing of intermediate results, which both account for a substantial part of the time. Without semantical analysis, not counting identification and well-formedness checks, it takes 33:46 to compile. I think I can get that number down to about 10 minutes...

Some people don't understand bootstrapping. I guess I should explain that I have a stage zero compiler written in ML which accepts Hi and produces ML. I am now writing a bootstrap compiler in Hi which accepts Hi and produces C. So, to get the final compiler you need to a) build the ML compiler, b) build the Hi to C compiler with that compiler, and c) rebuild the Hi to C compiler with the last compiler. After that the initial ML compiler becomes irrelevant.

The good part is, of course, that the stage one compiler accepts the same source as the stage zero compiler. For comparison, the stage zero compiler compiles the stage one compiler in a few minutes.

The stage zero compiler consists of approximately 14 thousand lines of ML code. The stage one compiler consists of approximately 12 thousand lines of Hi code.

At this rate, it would take one and a half day to compile; or, three hours if I build a compiler without semantic checks. If the factor 120 is representative, I'll end up with a self compile of around 15 minutes. Afterthought: Good news, the C runtime is much faster on method lookup, which is a major bottleneck at the moment, so 15 minutes will be an extreme worst-case.

The latter is still too much. There are no real optimizations done in the compiler, but an ML compile of stage zero is 1 second...

Stuff I can do: (A) Let the ML compiler produce code for the new C runtime (a thousand lines of code), (B) strip down the Hi compiler that it produces code for the C runtime from lambda terms, (C) live with it since, theoretically, it doesn't matter how fast the ML produced compiler is.

Reasons Progress is Slow

Top ten reasons progress is slow:

1. I think too much, and don't start coding
2. I think too little, and start coding
3. I kill-off more code than I write
4. If the code is not perfectly clear, it must be wrong
5. I re-invent every step taken in compiler design for the last 50 years
6. Bugs, and unexpected behaviors (ML/Boehm)
7. No tracking
8. Too many irrelevant Lambda discussions
9. Too much irrelevant science
10. I started blogging

Must have: Specialization

A must have for the language is specialization of a value to a type. A) It is a nice to have. B) It makes handling of thrown exceptions easier (in Hi any value can be thrown).

For orthogonality reasons, it would be nice if the syntax of specialization would resemble a case distinction. Or, better, if it could be handled inside a case. It must be treated differently, because, after specialization, the value gets a new type.

This seems reasonable (A):

[ i: int -> do something with i
| c: char -> do something with c
| l: list (char, a) -> do something with l ]

So, I need to be able to compile type-cases on values...

Or, a more trivial approach (B):

[ i: int -> do something with i
| c: char -> do something with c
| l: nil -> do something with l
| l: cons -> do something with l ]

Make a case distinction not on the type label but on the constructor. This is easier than the former, but probably less nice?

So, it's (A) or (B). Pro's and cons's:

+A, nice readability.
-A, hard to implement.

+B, you make a case distinction immediately, so that might also result in short code.
-B, you need access to, or expose, the structure of the type.

Question, given specialization on types, we might as well specialize on the interface? (There's no RTTI for that...)

After some thought: (A) is lousy, in the absence of a lot of RTTI you might need to visit the whole graph to derive the correct type instance (observe list (opt int)).

There is also option (C), differentiate on the type constructor:

[ i: int -> do something with i
| c: char -> do something with c
| l: list -> do something with l ]

(C) it is. Not really nice, but the best of (A) and (B).

Log 011909

Option parsing (hi). Marshalling (ml). Modules (hi). Runtime makefile (ml). Hi favicon.

Performance Issues

This is a reference list I keep to measure performance between various implementations.

It computes (ackerman 3 8). The list is from a languages shoot-out, but run on a 500Mhz machine. Some of my measurements are on a 1Ghz laptop, the latest are on my Macbook Air, so I am of at least a factor two. All my own measurements are normalized towards a 1Ghz machine. I don't treat integers as primitives in the language so I am fine with the performance.

ocaml 0.04 664 9 log
vc++ 0.04 540 13 log
vc 0.05 492 14 log
mercury 0.06 1728 36 log
gcc 0.07 1504 14 log
ghc 0.07 1224 9 log
mingw32 0.08 596 14 log
delphi 0.10 604 15 log
modula2 0.11 672 0 log
bcc 0.12 608 14 log
fpascal 0.13 556 18 log
lcc 0.13 548 14 log
gnat 0.14 792 0 log
se 0.14 592 30 log
bigforth 0.15 924 13 log
pliant 0.16 3228 14 log
csharp 0.18 3280 15 log
modula3 0.18 932 24 log
smlnj 0.22 940 20 log
vpascal 0.28 600 18 log
java 0.53 4628 11 log
nice 0.64 4940 0 log
gforth 0.67 1504 15 log
poplisp 0.78 3272 10 log
*** c combinator evaluator 0.82 (GRAPH-FAKE/NO_DICK)
erlang 1.01 5268 10 log
ocamlb 1.09 380 9 log
oz 1.26 652 19 log
*** target
cim 1.32 2044 23 log
*** c combinator evaluator 1.48 (GRAPH/NO_DICK)
parrot 2.53 8036 35 log
pike 2.61 3620 9 log
lua5 2.71 840 11 log
lua 3.81 1000 11 log
*** c combinator evaluator 3.80 (GRAPH/BOEHM)
mawk 3.99 1912 10 log
slang 5.51 2296 15 log
*** c combinator evaluator 6.25 (DAG)
guile 8.52 2784 9 log
icon 9.37 1280 13 log
awka 10.34 17396 10 log
ici 10.78 2264 7 log
cygperl 13.85 37080 11 log
*** c ast evaluator 13.97
python 14.74 3544 12 log
perl 50.15 36100 11 log
*** ml intermediate lambda evaluator 120.29
rexx 78.82 5896 14 log
*** ml ast evaluator 457.9
gawk T T 10 log
php T T 10 log
elastic F F 17 log
jscript F F 14 log
rebol F F 12 log
ruby F F 11 log
tcl F F 15 log
vbscript F F 15 log

ML ast is an evaluator directly on the ast, ml lambda is a lambda evaluator on a compiled ast. The C ast evaluator is an evaluator on the ast, the C dag evaluator is a evaluator on a dag rewriter of inverted lambda terms, the C graph evaluator is a graph rewrite system with destructive updates - Boehm is with the Boehm-Demers-Weiser garbage collector, no-dick is with a Cheney style garbage collector, graph-fake is with some small inlining of definitions.

Note that there is a factor 120 between the ml compiled version and the C runtime.

Redesigning Redesigns

I have been designing the language and the compiler(s) for ages now, it seems. The language has seen several forms, the type system has been overhauled several times, as has the internal structure of the compiler, and I have compiled to various intermediate forms.

And all of that mixed with a form of explorative programming, well, I can only hope it ends with something useful. The good part, it keeps getting better, and I have no idea how I should have done it otherwise.

Incremental Compilation

I looked at various manners of implementing incremental compilation. Of course, one of the best manners would be to choose an object format. However, since I don't want to write all the code for writing and reading objects, I'll do it a bit differently and just marshal the AST to object files.

Benefits, ML supports marshalling, and it should be okay-ishly simple to implement a simple marshalling scheme in the C runtime. So, two strikes in one.

The Hi Language

The Hi language is glue-like language implemented by me. At the moment, it is a mess. I have worked a long time on the compiler, but it is difficult to measure progress. I started this blog so I can keep better track of design decisions.

I got a stage zero compiler implemented in ML (ocaml). It reads Hi source and generates ML. I also had an evaluator in ML, and used to compile the AST to C. Unfortunately, recent development broke the evaluator (since I was more interested in compiling), and I broke the to C translator, since the generated C had hick-ups on interpreting large pieces of code. The ML target is slow also, but has a very clean implementation, so I'll stick with that.

The stage zero compiler at the moment accepts most of the language, but a few errors remain in the type checker. This is not a big problem, as I can compile the source without type checking it.

I also wrote the major portion of the stage one compiler in Hi which compiles to another variant of C. At the moment I am looking at how to build it such that it allows for incremental compilation. This is kind of a must, since the compiler, as compiled by stage zero, is too slow to handle a lot of input.

The stage one compiler performs more checks, and handles types correctly, except for a few corner cases.

The C runtime has some small, difficult to analyse, bugs.

So, the path forward. Fix the stage one compiler such that it allows for incremental compilation. Fix the C runtime.

I am moving the source-code into a version control system to keep track of it; I used tarballs until now. I chose git, for no better reason that I trust that Linus knows what he is doing. I find it strange to add half-baked code into a repository though.