Sunday, May 31, 2009

A Narrow Road

I once had a great idea to compile to Thunk form, which is explicitly different from a stack machine, but is a simple representation for a graph rewrite system. The idea was a follows, the fac function


fac = \n -> if n = 1 then 1 else n * (fac (n-1))


Is compiled to

fac = \n -> if n = 1 then [1] else
[. . .] [*] [n] [[. .] [fac] [[. . .] [-] [n] *[1]]]


Reduces like

[fac 3]
[. . .] [*] [3] [[. .] [fac] [[. . .] [-] [3] *[1]]]
[. . .] [*] [3] [[. .] [fac] [[. . 1] [-] *[3]]]
[. . .] [*] [3] [[. .] [fac] [[. 3 1] *[-]]]
[. . .] [*] [3] [[. .] [fac] [*[- 3 1]]]
[. . .] [*] [3] [[. .] [fac] *[2]]
[. . .] [*] [3] [[. 2] *[fac]]
[. . .] [*] [3] [*[fac 2]]
[. . .] [*] [3] [[. . .] [*] [2] [[. .] [fac] [[. . .] [-] [2] *[1]]]]
... [. . .] [*] [3] [[. . .] [*] [2] [[. .] [fac] *[1]]]
... [. . .] [*] [3] [[. . .] [*] [2] *[1]]
... [. . .] [*] [3] *[2]
... *[6]


An optimizing compiler shouldn't push constants, so would evaluate (generate code) like this


fac = \n -> if n = 1 then 1 else
[* n .] [[fac .] *[[- n 1]]]


I am trying to get rid of the thunk form, to simplify the internal representation, and just use a dot to show where an application series is expecting an argument. I.e. the brackets now are simple parenthesis denoting subformulas, and after 'dotting' the formula it is 'linearized', i.e. the order of applications do not matter that much any more.

Tuesday, May 12, 2009

Thursday, May 7, 2009

Small Mistakes

John McCarthy insists on having a good abstract syntax, and how right he is. It are the small mistakes in the abstract syntax which really careen the compilation process. Small work-arounds pop up to fix problems which just "shouldn't" be there and code starts to look perfidious. And it just takes ages to find why, what, when, which mistake was made.

Refactoring miasmic code.

Sunday, May 3, 2009

Reality Check

I haven't been working on the language for quite some time now. I recently started again. Puzzling thought: I now added two extra internal languages to the compiler. A minimalist untyped lambda calculus core language, and a code language, which should abstractly represent the C subset I am compiling to.

Why? These are two extra levels of indirection, which if I look hard at them, are slightly more direct (no meta-information to handle), but I think I can get away with restricting myself to the original abstract syntax, meanwhile ascribing a slightly different semantics while converting.