Tuesday, May 1, 2012

Lexing

I've been looking at recent implementations of the Kernel language, a Lisp derivative, hoping to find better scanner support. The good news: some of them solved the problem and even implement a REPL on top of the lexer. The bad news: the implementations are murky. Maybe there isn't anything better, but, wow, hopeless.

What I call a reader seems to be called a port there, and beyond that it's pure unadulterated C. I thought of reading whole files into a wide character array for unbounded access, but that makes a REPL impossible. Maybe I'll just read input line by line; that could work.

Yeah, the latter should work. I'll mark the start of a lexing operation, and flush up to a certain point in case of success, or rewind otherwise.
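
To make the mark/flush/rewind idea concrete, here is a minimal sketch of such a reader over a single line of input. The struct layout and the helpers reader_init, reader_skip and reader_rewind are my assumptions for illustration, not code taken from any Kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical reader over one NUL-terminated line of wide characters. */
typedef struct {
    const wchar_t *buf;  /* the current line */
    size_t mark;         /* position saved at the start of a lexing attempt */
    size_t pos;          /* next character to look at */
} reader_t;

void reader_init(reader_t *rdr, const wchar_t *line) {
    assert(rdr != NULL && line != NULL);
    rdr->buf = line;
    rdr->mark = 0;
    rdr->pos = 0;
}

/* Peek at the next character without consuming it; L'\0' at end of line. */
wchar_t reader_look(const reader_t *rdr) {
    return rdr->buf[rdr->pos];
}

/* Consume one character, unless already at the end of the line. */
void reader_skip(reader_t *rdr) {
    if (rdr->buf[rdr->pos] != L'\0') rdr->pos++;
}

/* Remember where the current lexing attempt started. */
void reader_mark(reader_t *rdr) {
    rdr->mark = rdr->pos;
}

/* Failure: go back to the marked position, as if nothing was consumed. */
void reader_rewind(reader_t *rdr) {
    rdr->pos = rdr->mark;
}

/* Success: commit everything consumed so far; a later rewind stops here. */
void reader_flush(reader_t *rdr) {
    rdr->mark = rdr->pos;
}
```

With this layout, flushing is just moving the mark forward, so a failed attempt can never undo input that an earlier successful attempt already committed.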

Basically, the C code should look like:
token_t* read_token(reader_t* rdr) {
    token_t* tok;
    // try each alternative in turn; the first match wins
    if ((tok = alt_0(rdr))) return tok;
    if ((tok = alt_1(rdr))) return tok;
    // no alternative matched
    return NULL;
}

That is, lexing tries alternatives in order of decreasing complexity, each written like this:
token_t* alt_n(reader_t* rdr) {
    token_t* tok;
    // start scanning
    reader_mark(rdr);
    // try an alternative
    if (reader_look(rdr) == L'c') {
        ...
        // flush all recognized input
        reader_flush(rdr);
        // return the recognized token
        return tok;
    } else if ( ... ) {
        ...
    } else {
        return NULL;
    }
}
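
As a worked instance of the pattern, here is a small self-contained sketch that tries the two-character operator "<=" before falling back to the plain "<". Everything in it is assumed for illustration: the tiny reader, the helper names reader_skip and reader_rewind, and the use of an enum in place of a token_t pointer to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical minimal reader and token kinds, assumed for this sketch. */
typedef struct { const wchar_t *buf; size_t mark, pos; } reader_t;
typedef enum { TOK_NONE, TOK_LE, TOK_LT } token_kind_t;

static wchar_t reader_look(const reader_t *r) { return r->buf[r->pos]; }
static void reader_skip(reader_t *r) { if (r->buf[r->pos] != L'\0') r->pos++; }
static void reader_mark(reader_t *r) { r->mark = r->pos; }
static void reader_rewind(reader_t *r) { r->pos = r->mark; }
static void reader_flush(reader_t *r) { r->mark = r->pos; }

/* The more complex alternative: "<=". */
static token_kind_t alt_le(reader_t *r) {
    reader_mark(r);
    if (reader_look(r) == L'<') {
        reader_skip(r);
        if (reader_look(r) == L'=') {
            reader_skip(r);
            reader_flush(r);   /* commit both characters */
            return TOK_LE;
        }
    }
    reader_rewind(r);          /* undo any partial match */
    return TOK_NONE;
}

/* The simpler alternative: a lone "<". */
static token_kind_t alt_lt(reader_t *r) {
    reader_mark(r);
    if (reader_look(r) == L'<') {
        reader_skip(r);
        reader_flush(r);
        return TOK_LT;
    }
    reader_rewind(r);
    return TOK_NONE;
}

/* Try the alternatives from most to least complex. */
token_kind_t read_token(reader_t *r) {
    token_kind_t tok;
    assert(r != NULL);
    if ((tok = alt_le(r)) != TOK_NONE) return tok;
    if ((tok = alt_lt(r)) != TOK_NONE) return tok;
    return TOK_NONE;
}
```

The order matters: if the lone "<" were tried first, the input "<=" would come out as "<" with a stray "=" left behind, which is why the longer alternative goes first and rewinds when it fails.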

Could work. The one open problem is how exactly to handle EOL: a REPL demands that the next line be read lazily. But that's about it.
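
One way the lazy part might go, as a rough sketch: only fetch the next line when the lexer actually looks past the end of the current one. The names line_reader_t, line_reader_look and line_reader_skip, and the fixed buffer size, are assumptions of mine; the only library call doing real work is the standard fgetws.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

#define LINE_CAPACITY 256  /* assumed limit; longer lines arrive in pieces */

/* Hypothetical line-at-a-time reader for a REPL. */
typedef struct {
    FILE *in;
    wchar_t line[LINE_CAPACITY];
    size_t pos;   /* next character within line */
    int eof;      /* set once the input is exhausted */
} line_reader_t;

void line_reader_init(line_reader_t *rdr, FILE *in) {
    assert(rdr != NULL && in != NULL);
    rdr->in = in;
    rdr->line[0] = L'\0';  /* empty buffer: the first look triggers a read */
    rdr->pos = 0;
    rdr->eof = 0;
}

/* Peek at the next character. A new line is fetched only when the current
   one has been fully consumed. Returns WEOF at the end of the input. */
wint_t line_reader_look(line_reader_t *rdr) {
    while (!rdr->eof && rdr->line[rdr->pos] == L'\0') {
        if (fgetws(rdr->line, LINE_CAPACITY, rdr->in) == NULL) {
            rdr->eof = 1;
        } else {
            rdr->pos = 0;
        }
    }
    return rdr->eof ? WEOF : (wint_t)rdr->line[rdr->pos];
}

/* Consume one character, if there is one. */
void line_reader_skip(line_reader_t *rdr) {
    if (line_reader_look(rdr) != WEOF) rdr->pos++;
}
```

Since fgetws keeps the trailing newline, a token that ends at EOL can be recognized and flushed without touching the next line; the blocking read is postponed until something asks for the character after the newline. The sketch quietly assumes that no token spans a line boundary, because a refill overwrites the buffer a mark would point into.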
