Sunday, June 13, 2010

C-Machine

I thought of a name for the setting described in the last post, and since there is a G-machine -where G means Graph,- I thought I would name it C-machine, somewhat reminiscent of that. C, of course, a double reference to the language compiled to, and combinators, since that is what it essentially is: A Combinator Evaluation Machine in C.

Thing is, I don't really know yet where to state the formalism, as a reverse-polish stack machine close to a normal machine including K and RT pointers and an understanding of primitive values, or as just a combinatorial eager evaluation strategy on combinatorial DOT expressions. The difference between those is that in the first it is possible to express Shift/Reset directly, in the second it is not.

Goal is not to reinvent the wheel, but just to have a good understanding of what I did, and see where it fits w.r.t. other machines. (Don't assume I think I did something outstandingly smart though.) I thought about it some more, the G-machine, as an example, is certainly different, since it trivially allows shared reductions for instance. A thing the C-machine, apart for explicit lets, does not.

Guess I'll use the space below to elaborate on both.

A DOT Evaluator

Below, the gist of DOT evaluation, semi-formal. A DOT evaluator is a combinatorial evaluation strategy used as an intermediate notation to make clear how thunks can be set up and reduced.

Note that there is nothing new about this notation. It is a machine-amiable convenience notation used to show, with a few invariants, that it is possible to safely translate combinatorial expressions to C, not as a new notation for combinatorics.

A DOT expression is a syntactic notation for standard combinatorial expressions. A combinatorial DOT expression consists of constants written in lower case, dots '.', combinators in uppercase, and series of squared series out of the former three. A rewrite rule is a squared series starting with a combinator producing a combinator expression, bound variables may occur in rewrites.

Examples:
[I x] = x
[K x y] = x
[S x y z] = [x z .] [y z]
(What is the expression for lazy evaluation of S?)

Basic Rewrite Rules

A DOT evaluator rewrites a combinator with a fixed strategy. Always, the last expression in a series is evaluated. When it reduces to a constant, its value is placed in the first not-nested dot encountered reading from right-to-left. When it reduces to a squared series, that squared series replaces the original term. Evaluation halts when the last expression does not reduce.

The above rewriter gives a fixed rewrite strategy on DOT expressions. Note that the order of the series rewritten to matters.

Example, using some extra notation:
F n = if n = 1 then [1] else [* . .] [n] [F .] [- n 1]

Evaluation of F 3:
[F 3]
= (if 3 = 1 then [1] else [* . .] [3] [F .] [- 3 1])
= [* . .] [3] [F .] [- 3 1]
= [* . .] [3] [F .] [2]
= [* . .] [3] [F 2]
= [* . .] [3] (if 2 = 1 then [1] else [* . .] [2] [F .] [- 2 1])
= [* . .] [3] [* . .] [2] [F .] [- 2 1]
= [* . .] [3] [* . .] [2] [F .] [1]
= [* . .] [3] [* . .] [2] [F 1]
= [* . .] [3] [* . .] [2] (if 1 = 1 then [1] else [* . .] [1] [F .] [- 1 1])
= [* . .] [3] [* . .] [2] [1]
= [* . .] [3] [* . 1] [2]
= [* . .] [3] [* 2 1]
= [* . .] [3] [2]
= [* . 2] [3]
= [* 3 2]
= [6]
Expand and apply rules.

The above rewrite rules are not complete. Reducing a combinator also implies that all extra arguments are inherited by the first squared expression. Also, a Curried expression is treated as a constant. Dots or variables may not occur in the first position, rather, a special apply @ combinator is used where [@ v] = [v], in case the v held a squared expression. In the rules above, the apply is implicit. I.e., [I x] = [@ x].

Examples:

[K x y z] 
= [ @ x z ]
also,

[K . y z] [S x] 
= [K [S x] y z]
= [@ [S x] z]
= [S x z]
Well-formedness and complete reduction
(I need help getting this section right.)

All variables occurring on the right-hand side must be bound on the left-hand side of a rewrite. In the right-hand-side of a rewrite, a squared expression must be bound to an exclusive dot to its right, there may not be more dots than squared expressions.

Since a rewrite substitutes all variables, no variables will occur when rewriting a term which doesn't contain variables; a term without variables will therefore rewrite to another term without variables. Since the last term, during the rewriting of a dot expression, cannot contain dots, since all dots in the last term will always be filled, terms rewritten are always strongly reduced. Since all terms rewritten to always start with a combinator, albeit apply, all squared terms always start with a combinator.

The above invariants are sufficient to guarantee that each combinator can be compiled directly to a C routine, and that, by the confluency lemma, terms who have a normal form reduce to that. (I guess.) (I should show Turing equivalence here.)

Shift/Reset
The shift/reset pair saves and restores a stack position. It cannot be expressed directly, therefore some extra notation with curly braces is used. When a shift is encountered, curly braces are introduced, and evaluation continues in that expression, when a value is determined the curly braces are lost, when a reset is reduced, the whole evaluation continues with that expression.

For example (check with compiler internals):
[ Shift G ]
= { [G .] [Reset] }
= { [G [Reset]] }
= {[K . . ] [ F ] [ Reset . ] [3] }
= {[K . . ] [ F ] [ Reset 3 ] }
= [3]

Anyone talking about the eighties?

No comments:

Post a Comment