Wednesday, May 9, 2012

Turing Tarpit

I went back to my original compiler, and I was actually surprised at how good it is. At least in theory, not in practice. There are a number of simple mistakes in the source, which are mostly attributable to writer's block.

I've been looking at targeting GHC core, for no particular reason. It doesn't seem to make a lot of sense. GHC core mostly seems to be desugared Haskell source, so if you're going to target anything, then you'll end up targeting the least of all moving targets, which is plain Haskell source code.

Thing is... There is no native exception support in Haskell. So you'd need to bypass common evaluation by either wrapping everything up in a monad, or emitting code in continuation-passing style, with one continuation for exceptions and one for regular evaluation.

Maddening really, stuck in the Turing tarpit.

Tuesday, May 1, 2012

Lexing

I've been looking at recent implementations of the Kernel language, a Lisp derivative, hoping to find better scanner support. The good news: some of them solved the problem and even implement a REPL on top of the lexer. The bad news: what murky implementations. Maybe there isn't anything better, but, wow, hopeless.

What I call a reader seems to be called a port. Beyond that, it's pure unadulterated C. I thought of reading files into a wide character array for unbounded access, but that makes a REPL impossible. Maybe I'll just read stuff line by line, that could work.

Yeah, the latter should work. I'll mark the start of a lexing operation, and flush up to a certain point in case of success, or rewind otherwise.

Basically, the C code should look like:
token_t* read_token(reader_t* rdr) {
    token_t* tok;
    // try each alternative in turn; the first match wins
    if ((tok = alt_0(rdr))) return tok;
    if ((tok = alt_1(rdr))) return tok;
    return NULL;
}

I.e., lexing tries to recognize decreasingly complex alternatives written like this:
token_t* alt_n(reader_t* rdr) {
    token_t* tok;
    // start scanning
    reader_mark(rdr);
    // try an alternative
    if (reader_look(rdr) == L'c') {
        ...
        // flush all recognized input
        reader_flush(rdr);
        // return the recognized token
        return tok;
    } else if ( ... ) {
        ...
    } else {
        return NULL;
    }
}
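
To make the mark/flush/rewind idea concrete, here is a minimal sketch of what such a reader might look like, assuming it holds a single line at a time. The struct layout and the names reader_feed, reader_skip and reader_rewind are hypothetical, made up for illustration; only reader_mark, reader_look and reader_flush appear in the snippets above. The alternative alt_arrow is likewise just an example that recognizes "->" and returns a flag rather than a token.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical reader: holds one line at a time, as a REPL would. */
typedef struct {
    wchar_t buf[256]; /* current line; fixed size keeps the sketch short */
    size_t  len;      /* number of characters in buf */
    size_t  pos;      /* next character to look at */
    size_t  mark;     /* position saved by reader_mark */
} reader_t;

/* Load a fresh line, e.g. the one just typed at the prompt (assumed helper). */
void reader_feed(reader_t* r, const wchar_t* line) {
    size_t n = wcslen(line);
    if (n > 255) n = 255;               /* truncate overlong lines */
    wmemcpy(r->buf, line, n);
    r->buf[n] = L'\0';
    r->len = n;
    r->pos = 0;
    r->mark = 0;
}

/* Peek at the next character without consuming it; WEOF at end of line. */
wint_t reader_look(const reader_t* r) {
    return r->pos < r->len ? (wint_t)r->buf[r->pos] : WEOF;
}

/* Consume one character (assumed helper). */
void reader_skip(reader_t* r) {
    if (r->pos < r->len) r->pos++;
}

/* Remember where the current lexing attempt started. */
void reader_mark(reader_t* r) { r->mark = r->pos; }

/* Failure: go back to where the attempt started (assumed helper). */
void reader_rewind(reader_t* r) { r->pos = r->mark; }

/* Success: drop everything consumed so far from the buffer. */
void reader_flush(reader_t* r) {
    assert(r->pos <= r->len);
    /* move the unread tail, including the terminator, to the front */
    wmemmove(r->buf, r->buf + r->pos, r->len - r->pos + 1);
    r->len -= r->pos;
    r->pos = 0;
    r->mark = 0;
}

/* Example alternative: recognizes "->"; returns 1 on success, 0 otherwise. */
int alt_arrow(reader_t* r) {
    reader_mark(r);
    if (reader_look(r) == L'-') {
        reader_skip(r);
        if (reader_look(r) == L'>') {
            reader_skip(r);
            reader_flush(r);
            return 1;
        }
    }
    reader_rewind(r);   /* partial match: undo the consumed '-' */
    return 0;
}
```

Feeding the line "-x" and calling alt_arrow consumes the '-', fails on the 'x', and rewinds, so the next alternative sees the '-' again. A real reader would presumably fetch the next line lazily instead of being fed by hand.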

Could work. There's a bit of a problem with how exactly to handle EOL, since a REPL demands that the next line is read lazily, but that's about it.

Wow

I haven't done much in a year. Oops. Here's the source code of the old, not even wrong, compiler.