Thursday, January 29, 2009

Log 012909

Another slow day. Missed a rule in lambda_transform_child? (hi) Fixing test_lib.hi (hi) (bindings to OS)

Language Design

I took some care in designing the Hi language and its implementation. Recently, there was some debate on LtU about whether design and implementation issues should be discussed. A problem is that there are no good designers of programming languages; there isn't even a good body of knowledge with which to discuss, abstractly, what constitutes, or would constitute, a good programming language.

So, what do we need? Some form of applied semiotics?

Why Functional Programming Matters

Short story.

When I was doing my master's studies we were introduced to, I think, Miranda. One of the first homework assignments was to write an interpreter, which I had to deliver together with a friend.

I had an Atari with a PC emulation chip, but the emulation wasn't very good and didn't work on all programs. In any case, it didn't accept the environment given. So I ended up coding about two hundred lines out of my head at home.

The next day, we had to hand in the assignment. As usual, I was late, so with a floppy of source code and about a minute left, I met with my friend. He said to forget about it, drop the assignment, and go to class.

I logged in on a Sun workstation, uploaded the source code, and ran it. It worked flawlessly.

That's why functional programming matters.

Wednesday, January 28, 2009

Debugging

Debugging: the stage0 compiler produces tracing output, the ML runtime of stage0 produces tracing information, the stage1 compiler produces tracing output, the code produced by the stage1 compiler contains tracing information, the C runtime produces tracing information, there is tracing information in the test suites, and I use gdb/nemiver.

And in the end, bugs are solved while I sleep.

Log 012809

Left neuron has trouble communicating with right neuron. Fixing bug in ___apply (expand and copy continuations) (hi,c). Fixing substitution bug (hi) (labels do not remain unique after a split in two phases). Getting test_hof1.hi to work.

Monday, January 26, 2009

Paying the Price

My compiler translates input to a Graph Rewriting System (GRS) expressed in C. I recently made a design decision to move everything except the GRS itself (which means everything except the treatment of literals) out of the runtime.

This means that, for example, addition is handled through the FFI. Of course, you pay a substantial performance price for that: Ackermann now runs in 5.6 seconds instead of the best I got so far, around one second.

The benefit is that extending Hi programs with calls to simple C libraries now means nothing more than possibly introducing labels for types, and then calling the C libraries directly.

At some point, I might change it again, but I hope I can optimize later by folding sequences of calls to predefined C libraries into specialized C code.

At this point, I can only hope that most programs don't resemble Ackermann.

Sunday, January 25, 2009

How To Compile a Lambda Term¹


native_int* __fac(native_int* n) {

    native_int arity = 0;
    native_int* v_0;
    native_int* r_15 = (native_int*) n[1];
    native_int* r_16 = (native_int*) n[2];
    native_int* r_17 = (native_int*) n[3];
    native_int* r_18 = (native_int*) n[4];

    native_int* r_19;
    arity = 1;
    if (n[5] >= arity) goto reduce;
    native_int* reta = (native_int*) n[3];
    native_int reti = n[4];
    reta[reti] = (native_int) n;
    r_19 = (native_int*) n[1];
    goto finally;
    reduce: n=n;
    v_0 = (native_int*) n[6];

    native_int* r_20;
    if (v_0[0] != 0) goto label1;
    if ( strncmp( (char*) v_0[1], (char*) "system.int", 128) != 0 ) goto label1;
    if ( strncmp( (char*) v_0[2], (char*) "system.int", 128) != 0 ) goto label1;
    if (v_0[4] != 0) goto label1;

    static native_int r_21[5] = {(native_int) 0, (native_int) "system.int", (native_int) "system.int", (native_int) 1, (native_int) 1};

    r_17[(native_int) r_18] = (native_int) r_21;
    r_20 = r_15;
    goto label0;
    label1: n=n;

    native_int* r_22 = (native_int*) __system_dml;
    native_int* r_23 = v_0;

    native_int* r_24 = 0;
    native_int* r_25 = GC_malloc(8 * sizeof(native_int));
    r_25[0] = (native_int) r_22;
    r_25[1] = (native_int) r_15;
    r_25[2] = (native_int) r_16;
    r_25[3] = (native_int) r_17;
    r_25[4] = (native_int) r_18;
    r_25[5] = (native_int) 2;
    r_25[6] = (native_int) r_23;
    r_25[7] = (native_int) r_24;
    if (n[5] == arity) goto dontexpand26;
    r_25 = ___expand(r_25, n, arity);
    dontexpand26: n=n;
    native_int* r_27 = r_25;
    native_int* r_28 = (native_int*) 7;

    native_int* r_29 = (native_int*) __fac;

    native_int* r_30 = 0;
    native_int* r_31 = GC_malloc(7 * sizeof(native_int));
    r_31[0] = (native_int) r_29;
    r_31[1] = (native_int) r_25;
    r_31[2] = (native_int) r_16;
    r_31[3] = (native_int) r_27;
    r_31[4] = (native_int) r_28;
    r_31[5] = (native_int) 1;
    r_31[6] = (native_int) r_30;
    native_int* r_32 = r_31;
    native_int* r_33 = (native_int*) 6;

    native_int* r_34 = (native_int*) __system_dsb;

    native_int* r_35 = v_0;

    static native_int r_36[5] = {(native_int) 0, (native_int) "system.int", (native_int) "system.int", (native_int) 1, (native_int) 1};
    native_int* r_37 = GC_malloc(8 * sizeof(native_int));
    r_37[0] = (native_int) r_34;
    r_37[1] = (native_int) r_31;
    r_37[2] = (native_int) r_16;
    r_37[3] = (native_int) r_32;
    r_37[4] = (native_int) r_33;
    r_37[5] = (native_int) 2;
    r_37[6] = (native_int) r_35;
    r_37[7] = (native_int) r_36;
    r_20 = r_37;
    goto label0;
    label0: n=n;
    r_19 = r_20;
    finally: n=n;
    return r_19;
}

¹ Without a ridiculous CPS transform ;oP. But with some small errors. ;o)

Log 012509

Fixing the back-end (hi). Small changes to the runtime (c). Getting fac.hi to compile (hi). Getting test_hof.hi to compile (c).

Thursday, January 22, 2009

Consecrated Source Code

Who, if we knew
What we receive, would either not accept
Life offered, or soon beg to lay it down.

--Milton.


The stage one compiler accepts its own source, performs identification correctly, and can dump the AST to object files. This is great, since I was a bit worried it wouldn't be able to compile itself. (ML might run out of memory or stack space; you never know.)

At the top level, I can see the ML GC at work. Identify a few words, pause, identify, pause, identify, pause, ...

All is set to bootstrap, just need to fix the runtime.

Part of the stage zero compiler is a faithful, I should say direct, translation of the operational semantics. So, I guess I can state that both the operational semantics and the stage one compiler are in good shape.

Next steps: fix the linker/translation to C, as well as the bugs in the C runtime. Build a lot of small programs to test. Probably implement some small optimizations on the lambda terms. And then, ... one long, long, compile to bootstrap.

After that, conformance suites and some small fixes in the semantic analysis of stage one.

Combinatorial Explosion

Just a memory.

Have you ever been in an MDMA rush while performing the light show of a house party, being cheered at by a crowd of two thousand plus because you managed to press a series of buttons at the right moment?

Meanwhile, I was thinking about avoiding the combinatorial explosion of solving NAND-encoded prime-factoring boolean terms.

This was a long time ago. I grew out of it, all of it.

But it was good...

Unexpected: 2+3 = ... 6!

Sloppy copy of *. FFI works, on hand-crafted code.

Log 012209

Slow. Fixing dynamic loading back-end (hi). Small tests on C FFI (C). -Wall (C).

On Constants

I am weird, I read source code, just to learn. Apache, Java Swing, Clean, Haskell, gcc, ogre 3d, irrlicht, ml light, Tinyhttpd, various lisps, smv, Mozilla javascript, Helium, gtk, HOL light, ...

Invariably, with the exception of 1%, I'll find a mess. It is just a fact: people can't program, or programming is too hard.

I read the Python source to see how they handle dynamic loading of shared libraries. I wasn't sure I actually needed a cache for the handles.

Surprisingly, an array of 128 handles is allocated. I am a conservative programmer; I grow the array of handles when needed.

But thinking of it: who's right? Python will crash when more than 128 libraries are loaded, but that will not happen very soon. My code will handle a lot of libraries; but if my code is abused, is not thread-safe, or contains a bug, it will silently keep growing and keep using memory.

There is a point in coding with hard constants.

Multitasking Heat

Multitasking: fixing small bugs in the makefile for stage one, building a new runtime, fixing the module system, fixing the back-end such that functions from C shared libraries can be called directly.

And all that while my Mac is overheating while crunching through the source of stage 1.

Wednesday, January 21, 2009

Intermediates?

I compile the AST to an intermediate representation: untyped lambda terms. That is for convenience, but how convenient is it? As it looks now, most transformations are just as well done on the AST, with the added benefit that work is moved from the linker to the compiler; and another benefit, that functions become available for other optimizations on the AST, like type-directed optimizations.

On the other hand, it gives a clear split between a front and back/intermediate end.

I compile the lambda terms to text directly. That might have been a pretty bad idea.

**** ML

This now happened to me once too often:


Fatal error: exception Stack_overflow


If there is a limit on the stack, please don't write a functional language.

Log 012109

Bugs in incremental compilation (hi). Initial commit of new C runtime (C). Makefile for C runtime (C). Added provisional support for loading dynamic libraries (hi/C).

You have to love git


#!/bin/bash

VERSION=`date +"%m%d%y"`
PREFIX="hieronymus"

git archive --format=tar --prefix=$PREFIX/ HEAD | gzip > ../$PREFIX$VERSION.tgz

Tuesday, January 20, 2009

O(log(N)) or O(1)?

I use a binary search on types and methods to find the corresponding function. This has O(log(N)).

It is possible to compile to O(1) lookup, assuming that the source is type-checked. A select term can be written in compiled-down lambda terms as ((select "a_method") a_value). The term (select "a_method") can be translated to a combinator "F_a_method". If the value holds an integer of a fixed range, the number of types, O(1) lookup is possible.

I am sticking with the first scheme for now.

Why Generalize?

In most academic papers, it is assumed that a modern compiler will generalize the types of let bindings in definitions. The classical example is something like:


let f x = x in (f true, f 1)


If the type of f is not generalized to (forall a. a -> a) before used in the body, it will not type check.

Unfortunately, in my source code I cannot find any examples where any locally bound identifier should be generalized.

Why have all this machinery for generalization?

Actually, given more detailed type information for the arguments of f, it is substantially more trivial to generate faster code for f. So that's a good argument against generalization.

Toilet Adornments

When you read Lambda the Ultimate, you would think writing a compiler has a lot to do with science. Unfortunately, 99% of the time you're stuck with more worldly problems.

Mundane problems encountered in a day:

  • Forgetting to pass the current module bindings to all bindings during identification

  • Forgetting to add "." as a default search directory to the search path

  • Returning the current module in the list of all loaded modules

  • Not adding a good fall-back for "exhausted all cases" in the operational semantics

  • Compiling string literals in the import section down to char lists

  • Forgetting that if the result of an evaluation is an exception, the runtime should exit with another exit code

Conformance

In the absence of a specification, conformance of a compiler is measured by bootstrapping it.

Not nearly good enough for what I am aiming at. So I'll need a tight language specification, and a conformance testing suite.

Like every programmer, I am not really looking forward to testing.


The devil plays with my compiler and my blog: while I was writing this, I got this message from the compiler.

Fatal error: exception Runtime.RuntimeException("reduce error")

That is, a bug in the runtime of stage zero. A good reason for a conformance suite. Dealing with this tomorrow. Off to bed. (This happened while loading more than one module.)

The Hi Mascot


Mascot Mas"cot, Mascotte Mas"cotte, n. [Through French fr. Pr. mascot a little sorcerer or magician, mascotto witchcraft, sorcery.]
1. A person who is supposed to bring good luck to the household to which he or she belongs. [1913 Webster]
2. Hence: Anything that brings good luck; especially, an animal kept by a group, as a sports team, to serve as a symbol and to bring luck. [1913 Webster +PJC]
-- From The Collaborative International Dictionary of English v.0.48


Hi is an abbreviation of Hieronymus, after my favourite painter, Hieronymus Bosch. I used that as a working name since Erik Meijer had published a small language named Mondrian after another Dutch painter, Piet Mondriaan.



It is a picture taken from the painting 'A Ship of Fools.' Not as likely to help with language adoption as the Python mascot did for Guido van Rossum's Python, but I like it.

Initial Tests on the Module System

It seems to work. After the system file is processed, there is hardly any overhead for producing small programs which use the system.

So, I am now building a linker and getting ready for a full compile.

Sometimes modularity really works. I only need one run of the compiler to dump object files from the AST; after that, it's just writing and debugging the linker which produces the C code. So, the three hours to produce the object files are now mostly irrelevant.

Programming in Hi

Is a blast! Now, if only I knew why.


... this is a space reserved for the Hi Language design philosophy... Some initial thoughts:

  • Be smart, don't be intelligent

  • If it looks wrong, it is wrong

  • What's wrong with perfection, anyway?

  • LOCs are relevant, less LOCs are more relevant

  • True, if tested

  • If there is more than one way of doing it, then all of them are wrong


Log 012009

Performance. Incremental compilation (hi). Makefiles for stages (hi). Debugging the module system (hi).

Monday, January 19, 2009

Slow Motion: "2:21:45"

At the moment, it takes the stage one compiler, as built by stage zero, 2:21:45 wall-clock time to compile the 747 lines of the system file. The marshalled AST/object file has a size of 380,243 bytes.

The compiler is that slow because it uses a stack-based interpreter; direct evaluation is impossible since ML cannot handle the recursion depth needed by the compiler. (This is mostly due to text processing; concatenating large lists of chars is beyond ML.)

That includes semantic analysis and a lot of printing of intermediate results, which both account for a substantial part of the time. Without semantic analysis, not counting identification and well-formedness checks, it takes 33:46 to compile. I think I can get that number down to about ten minutes...


Some people don't understand bootstrapping. I guess I should explain: I have a stage zero compiler written in ML which accepts Hi and produces ML. I am now writing a bootstrap compiler in Hi which accepts Hi and produces C. So, to get the final compiler you need to (a) build the ML compiler, (b) build the Hi-to-C compiler with that compiler, and (c) rebuild the Hi-to-C compiler with the last compiler. After that, the initial ML compiler becomes irrelevant.
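As a workflow sketch, the three steps look roughly like this; all command and file names here are hypothetical, not the actual build scripts:

```shell
# (a) build the stage zero compiler (Hi -> ML), itself written in ML
ocamlopt -o hic0 stage0/*.ml

# (b) use stage zero to build the stage one compiler (Hi -> C),
#     then compile the emitted C against the runtime
./hic0 stage1/*.hi
gcc -o hic1 stage1.c runtime.c

# (c) rebuild stage one with itself; from here on, stage zero
#     and the ML toolchain are irrelevant
./hic1 stage1/*.hi
gcc -o hic stage1.c runtime.c
```

Step (c) is also the conformance check mentioned elsewhere in these posts: if the self-built compiler can rebuild itself to a fixed point, the two stages agree on the language.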


The good part is, of course, that the stage one compiler accepts the same source as the stage zero compiler. For comparison, the stage zero compiler compiles the stage one compiler in a few minutes.

The stage zero compiler consists of approximately 14 thousand lines of ML code. The stage one compiler consists of approximately 12 thousand lines of Hi code.

At this rate, it would take one and a half days to compile; or three hours if I build a compiler without semantic checks. If the factor of 120 is representative, I'll end up with a self-compile of around 15 minutes. Afterthought: good news, the C runtime is much faster on method lookup, which is a major bottleneck at the moment, so 15 minutes will be an extreme worst case.

The latter is still too much. There are no real optimizations done in the compiler, but an ML compile of stage zero takes one second...

Stuff I can do: (A) let the ML compiler produce code for the new C runtime (a thousand lines of code), (B) strip down the Hi compiler so that it produces code for the C runtime from lambda terms, or (C) live with it since, theoretically, it doesn't matter how fast the ML-produced compiler is.

Reasons Progress is Slow

Top ten reasons progress is slow:

1. I think too much, and don't start coding
2. I think too little, and start coding
3. I kill-off more code than I write
4. If the code is not perfectly clear, it must be wrong
5. I re-invent every step taken in compiler design for the last 50 years
6. Bugs, and unexpected behaviors (ML/Boehm)
7. No tracking
8. Too many irrelevant Lambda discussions
9. Too much irrelevant science
10. I started blogging

Must have: Specialization

A must-have for the language is specialization of a value to a type. (A) It is nice to have. (B) It makes handling of thrown exceptions easier (in Hi, any value can be thrown).

For orthogonality reasons, it would be nice if the syntax of specialization resembled a case distinction. Or, better, if it could be handled inside a case. It must be treated differently because, after specialization, the value gets a new type.

This seems reasonable (A):


[ i: int -> do something with i
| c: char -> do something with c
| l: list (char, a) -> do something with l ]


So, I need to be able to compile type-cases on values...

Or, a more trivial approach (B):


[ i: int -> do something with i
| c: char -> do something with c
| l: nil -> do something with l
| l: cons -> do something with l ]


Make a case distinction not on the type label but on the constructor. This is easier than the former, but probably less nice.

So, it's (A) or (B). Pros and cons:

+A, nice readability.
-A, hard to implement.

+B, you make a case distinction immediately, so that might also result in short code.
-B, you need access to, or expose, the structure of the type.

Question, given specialization on types, we might as well specialize on the interface? (There's no RTTI for that...)

After some thought: (A) is lousy; in the absence of a lot of RTTI you might need to visit the whole graph to derive the correct type instance (observe list (opt int)).

There is also option (C), differentiate on the type constructor:


[ i: int -> do something with i
| c: char -> do something with c
| l: list -> do something with l ]


(C) it is. Not really nice, but the best of (A) and (B).

Log 011909

Option parsing (hi). Marshalling (ml). Modules (hi). Runtime makefile (ml). Hi favicon.

Performance Issues

This is a reference list I keep to measure performance between various implementations.

It computes (ackermann 3 8). The list is from a language shootout, but run on a 500MHz machine. Some of my measurements are on a 1GHz laptop, and the latest are on my MacBook Air, so I am off by at least a factor of two. All my own measurements are normalized towards a 1GHz machine. I don't treat integers as primitives in the language, so I am fine with the performance.


ocaml 0.04 664 9 log
vc++ 0.04 540 13 log
vc 0.05 492 14 log
mercury 0.06 1728 36 log
gcc 0.07 1504 14 log
ghc 0.07 1224 9 log
mingw32 0.08 596 14 log
delphi 0.10 604 15 log
modula2 0.11 672 0 log
bcc 0.12 608 14 log
fpascal 0.13 556 18 log
lcc 0.13 548 14 log
gnat 0.14 792 0 log
se 0.14 592 30 log
bigforth 0.15 924 13 log
pliant 0.16 3228 14 log
csharp 0.18 3280 15 log
modula3 0.18 932 24 log
smlnj 0.22 940 20 log
vpascal 0.28 600 18 log
java 0.53 4628 11 log
nice 0.64 4940 0 log
gforth 0.67 1504 15 log
poplisp 0.78 3272 10 log
*** c combinator evaluator 0.82 (GRAPH-FAKE/NO_DICK)
erlang 1.01 5268 10 log
ocamlb 1.09 380 9 log
oz 1.26 652 19 log
*** target
cim 1.32 2044 23 log
*** c combinator evaluator 1.48 (GRAPH/NO_DICK)
parrot 2.53 8036 35 log
pike 2.61 3620 9 log
lua5 2.71 840 11 log
lua 3.81 1000 11 log
*** c combinator evaluator 3.80 (GRAPH/BOEHM)
mawk 3.99 1912 10 log
slang 5.51 2296 15 log
*** c combinator evaluator 6.25 (DAG)
guile 8.52 2784 9 log
icon 9.37 1280 13 log
awka 10.34 17396 10 log
ici 10.78 2264 7 log
cygperl 13.85 37080 11 log
*** c ast evaluator 13.97
python 14.74 3544 12 log
perl 50.15 36100 11 log
*** ml intermediate lambda evaluator 120.29
rexx 78.82 5896 14 log
*** ml ast evaluator 457.9
gawk T T 10 log
php T T 10 log
elastic F F 17 log
jscript F F 14 log
rebol F F 12 log
ruby F F 11 log
tcl F F 15 log
vbscript F F 15 log


The ML ast evaluator works directly on the AST; the ML intermediate lambda evaluator is a lambda evaluator on a compiled AST. The C ast evaluator is an evaluator on the AST, the C dag evaluator is an evaluator on a DAG rewriter of inverted lambda terms, and the C graph evaluator is a graph rewrite system with destructive updates. BOEHM is with the Boehm-Demers-Weiser garbage collector, NO_DICK is with a Cheney-style garbage collector, and GRAPH-FAKE is with some small inlining of definitions.

Note that there is a factor of 120 between the ML-compiled version and the C runtime.

Redesigning Redesigns

I have been designing the language and the compiler(s) for ages now, it seems. The language has seen several forms, the type system has been overhauled several times, as has the internal structure of the compiler, and I have compiled to various intermediate forms.

And all of that mixed with a form of explorative programming, well, I can only hope it ends with something useful. The good part, it keeps getting better, and I have no idea how I should have done it otherwise.

Incremental Compilation

I looked at various ways of implementing incremental compilation. Of course, one of the best would be to choose an object format. However, since I don't want to write all the code for writing and reading objects, I'll do it a bit differently and just marshal the AST to object files.

Benefits: ML supports marshalling, and it should be okay-ishly simple to implement a simple marshalling scheme in the C runtime. So, two birds with one stone.

The Hi Language

The Hi language is a glue-like language implemented by me. At the moment, it is a mess. I have worked a long time on the compiler, but it is difficult to measure progress. I started this blog so I can keep better track of design decisions.

I have a stage zero compiler implemented in ML (OCaml). It reads Hi source and generates ML. I also had an evaluator in ML, and used to compile the AST to C. Unfortunately, recent development broke the evaluator (since I was more interested in compiling), and I broke the to-C translator, since the generated C had hiccups on interpreting large pieces of code. The ML target is slow too, but has a very clean implementation, so I'll stick with that.

The stage zero compiler at the moment accepts most of the language, but a few errors remain in the type checker. This is not a big problem, as I can compile the source without type checking it.

I also wrote the major portion of the stage one compiler in Hi which compiles to another variant of C. At the moment I am looking at how to build it such that it allows for incremental compilation. This is kind of a must, since the compiler, as compiled by stage zero, is too slow to handle a lot of input.

The stage one compiler performs more checks, and handles types correctly, except for a few corner cases.

The C runtime has some small, difficult-to-analyse bugs.

So, the path forward. Fix the stage one compiler such that it allows for incremental compilation. Fix the C runtime.

I am moving the source code into a version control system to keep track of it; I used tarballs until now. I chose git, for no better reason than that I trust that Linus knows what he is doing. I find it strange to add half-baked code into a repository, though.