Sunday, March 28, 2010

Damned Slow

Well, the author is. Took a few days of watching DVDs and playing farmville because that logical error ain't a nice one. I figured it will disappear if I (re-)introduce the let for local computations since the fault seems to be in handling the order of, possibly local, computations. But I have grown somewhat weary of writing code, and I want, again, to have some pretty readable solution, which just didn't pop into my brain right now. Ah well, will see.

Introducing lets also might solve another problem, the fact that now, by eta-expansion and combinator lifting, too many functions are introduced.

(Argh, it might make local computations explicit, but at the expense of introducing local variables instead of function arguments.)

I tried a few forms, they all look wrong. Problem is that in the translation scheme used, a let v = e0 in e1 is inversed into a series of thunks where e1 is set up first, v refers to a position in that thunk, and e0 is reduced before e1 is reduced. The whole let thing just seems superfluous and upside-down/inside-out.

032910: Introducing lets is done, it just boils down to an old observation that the DOT representation is somewhat more convenient then ANF, but I went for ANF, since that's somewhat better readable and better known anyway.
033010: And now I am annoyed again, the lets don't play nice with the direct coding scheme I had.


Tuesday, March 9, 2010


Great, most of the stuff seems in order, I am dropping the development on the other runtime for the moment. There's a small logical bug somewhere, concatenating two strings sometimes reverses the arguments, it probably boils down to the thunk/record observation I made before.

Hard to track, easy to solve.

Wednesday, March 3, 2010


The ocaml interpreted compiler slowly prods along at an unfortunate speed. By now, I have to admit I just hate ocaml, if it can't handle recursion, they should have published that.

Gave up on this whole strategy all together. I can get to the point it gives the lambda terms and I have a runtime, so, I am gonna take a zillion steps back and write a basic lambda evaluator in glorious dependable C.

030410: Worked my way through yy and y files. That was fun and easy, it compiles and generates an AST.
030510: Hit the evaluation problem again, there are not trivial schemes which give very good performance. Opted for applicative order evaluation based on substitution - the assumption being that you end up reducing/substituting in the head mostly, which will for the better part be small terms, while the rest is fully reduced. It will break sometimes. Read a 1996 Brics paper -to no avail- for better schemes.
030710: Evaluation works, gonna patch the bindings to C in. Tomorrow kids. Day after, patch the emitter also, thought it through, I might waste too much memory on lists of lists of characters as it stands anyway.
030810: Working on two laptops now. The ML interpreted compiler takes a looong time to get to the emitting stage, at which point it will have allocated 200Mb, and, after which, it stalls while growing to 1.3+ Gb of memory. I patched the emitter to emit collections of lines directly to a file as to avoid the memory explosion, and one laptop is bravely ploughing through the source code. But, unless I get lucky and the bootstrap works in one translation, the compile times are just prohibitive to debug. So, I am writing the simple lambda interpreter at the second laptop, which is a bit of a waste too, since I am only writing it to throw it away after the bootstrap.
030910: It's annoying writing the nth lamda-evaluator, so I find myself playing famville, hoping that the ML compiler bootstraps...
030910: Guess its another milestone, the ML interpreted to C compiler emitted C. Lots of it.
030910: It's bloody unbelievable. I manage to choke gcc with a whopping 400K lines, 18M chars of input.
030910: Ooops, mistake. I was `tail'-ing the output of the Makefile, thought it stalled. Uh, some stuff works. If I strip out some comments I end up with 14M source. Not sure, the binary size -without semantic checks- is 9M on a 64 bit system. Okish I guess, half of GHC.

Gnu, Flex and Bisons. Yay, farmville!

A Fundamental Change: Self?

I was looking at some examples, and I think it just doesn't make sense anymore not to pass a self pointer in each instance declaration. Or not?

The thing is, it would be just for convenience, since one could, if one wishes, explicitly model a self pointer as the first argument to each method for each interface.


When in doubt, don't?

Tuesday, March 2, 2010

Schlemiel the Painter's Algorithm

This is a Jiddish joke by Joel Spolsky on certain poor programming practices. In the joke, Schlemiel (also rendered Shlemiel) has a job painting the dotted lines down the middle of a road. Each day, Schlemiel paints less than he painted the day before. When he is asked why, Schlemiel complains that it is because each day he gets farther away from the paint can.

Schlemiel is Dutch for retard, btw.

I run into this mostly when printing debugging output which often involves rendering a data structure to a document, beautifying that, rendering the document to text (list of chars), and subsequently concatenating the texts. If you add that I need to bootstrap with an ML interpreted compiler, you'll get the picture.

I did a little optimization on some of the compiler in order not to be bothered by this, but there is little I can do until the compiler is bootstrapped. And I don't feel like `fixing' it since at this point, minimality is preferred.

Subsequent thoughts:
  • Native strings do make sense since I interface with C a lot, it speeds up the compiler, it gives smaller binaries. However, I would like to use list of characters as the runtime default. The question is: what should a string/text literal default to? List of characters or character array - tagging it with a suffix might make sense, but also would mean a lot of explicit conversions when people start optimizing code.
    Going to look what GHC decided on this, but wondering if it is documented. [They are lists of UTF-8 chars..., that'll eat your memory... Though I essentially have the same scheme...]
    (I'm at an impasse here.
    1. Recursive text processing functions are natural and fast and lists avoid the need for copying sometimes. 2. Heap allocated string objects save space, and significantly speed up the compiler, since all internal algorithms now work on lists of chars (a 2*n slowdown given the representation of a lists of chars as 2*n objects instead of one object). Heap allocated string objects are automatically collected. But, making this the default for interfacing with C means the introduction of shared state. 3. Using just a  special type for pointers to character array means an orthogonal convention, but explicit memory management. 4. The worst convention might be: working with strings, rendering them to list of chars, and rendering these to strings again. Maybe what I have ain't that bad after all?)
  • I thought of a stratifying scheme for the runtime. I pay now for the fact that the garbage collector used was developed as a drop-in replacement for Boehm's.
  • The marshalling code is the poorest I've ever written. It should have resembled Cheney's algorithm, messed up there.
  • I might add the let into the back-end lambda representation later, I removed it for the time being. Though it's just a convenience representation it may simplify the code generation algorithm. Not sure I am fixing a poorly written function.
  • I might go to some scheme like ocaml to tag type variables. I now use that when an identifier doesn't match a declaration, it's automatically a type variable. Not sure, it gives error messages on typos like "system.nt". Then again, forgetting a quote would do the same, and these are usually easily fixed.
Skipped to another laptop while the other laptop is running....

Monday, March 1, 2010

Everything is a thunk!

And sometimes, you find out why. I remember a few reports on the implementation of GHC, it seems a good approach.

The bug broke down to a case which I can handle in few manners, mixtures of thunks and records.

I patched it for now, but the everything-is-a-thunk in the runtime starts to look attractive.

GHC version -0.3?