Tuesday, July 31, 2012

Elalang

I came across Elalang on LtU, which, apart from meaning "Hi language" in Greek, has a lot more in common with my language than I thought. The language is a strict, sometimes lazy, dynamic variant of ML or Haskell. It runs on .NET by compiling to bytecode for the CLI.

The funny thing is that I designed the Hi language to be executable even if you erase all types. Apart from the fact that that gave me time to experiment with the typechecker while bootstrapping the language, it also means that if you compared the informal cores of both languages, Ela Core would implement an extended form of the Hi language: basically, untyped lambda calculus evaluated strictly, with some Haskell-class-like mechanism for dynamic dispatch. Ela Core would be the more expressive of the two, since it allows dispatching on the return type of functions. A good thing for generality, but something I didn't design into Hi since I wanted a simple mechanism that dispatches on the type of values, basically following OO, in order to make it possible to bind to C++ libraries without too many problems. Hi has OO-like interfaces, not Haskell- or Ela-style type classes.
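To make that concrete, here is a minimal sketch, my own illustration and not Hi's actual runtime, of what dispatch on the type of a value looks like in C:

#include <stdio.h>

typedef enum { TAG_INT, TAG_TEXT } tag_t;

typedef struct {
    tag_t tag;
    union { int i; const char* s; } as;
} value_t;

// dispatch purely on the runtime tag of the argument
void print_value(value_t v) {
    switch (v.tag) {
    case TAG_INT:  printf("%d\n", v.as.i); break;
    case TAG_TEXT: printf("%s\n", v.as.s); break;
    }
}

int main(void) {
    value_t a = { TAG_INT,  { .i = 42 } };
    value_t b = { TAG_TEXT, { .s = "hi" } };
    print_value(a);
    print_value(b);
    return 0;
}

Return-type dispatch, Ela or Haskell style, has no counterpart in this scheme: the switch only ever inspects the argument's tag, never the type expected of the result.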

(The meaning of the above: I could simply translate Hi to Ela with some syntactic transformation on the source code; i.e., almost by copy-pasting Hi source code. The OO-like interfaces also imply I can typecheck with an adaptation of F<:. So I guess I should stop mucking about and just implement a version of that; mind you, not even that is trivial.)
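For reference, the heart of such a checker would be the standard subsumption and record width subtyping rules, as in TAPL:

\[ \frac{\Gamma \vdash t : S \qquad \Gamma \vdash S <: T}{\Gamma \vdash t : T} \quad \text{(T-Sub)} \]

\[ \{ l_i : T_i \;|\; i \in 1..n+k \} <: \{ l_i : T_i \;|\; i \in 1..n \} \quad \text{(S-RcdWidth)} \]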

I am not sure about the robustness or performance of Ela. I read it emits bytecode for a stack machine, which shouldn't be possible on a stack alone: higher-order programming implies you'd normally want some kind of heap.
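A minimal sketch of why, with made-up names: a closure that escapes the call that builds it has to keep its captured environment somewhere that outlives the stack frame.

#include <stdio.h>
#include <stdlib.h>

// a closure is code plus a heap-allocated environment
typedef struct {
    int (*code)(void* env, int arg);
    void* env;
} closure_t;

static int add_impl(void* env, int arg) {
    return *(int*)env + arg;   // read the captured variable
}

closure_t make_adder(int n) {
    int* captured = malloc(sizeof *captured);   // heap, not stack
    *captured = n;
    closure_t c = { add_impl, captured };
    return c;
}

int main(void) {
    closure_t add3 = make_adder(3);             // make_adder's frame is gone...
    printf("%d\n", add3.code(add3.env, 4));     // ...but the 3 survives: prints 7
    free(add3.env);
    return 0;
}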

(What I forgot to say: I like Ela. The fact that it is a dynamic variant of Haskell is really, really nice. You can do tons of tricks you can't do in a statically typed language, and the runtime performance penalty can't even be that big, in theory, since abstractly it's all tagged graph manipulation anyway.)

Sunday, July 29, 2012

The Spinners

I'm working my way through Types and Programming Languages.

Wednesday, May 9, 2012

Turing Tarpit

I went back to my original compiler, and I was actually surprised at how good it is. At least in theory, not in practice. There are a number of simple mistakes in the source, mostly attributable to writer's block.

I've been looking at targeting GHC Core, for no particular reason. It doesn't seem to make a lot of sense: GHC Core is mostly desugared Haskell source, so if you're going to target anything, you'll end up targeting the least of all moving targets, which is plain Haskell source code.

Thing is... there is no native exception support in pure Haskell code. So you'd need to bypass common evaluation by either wrapping everything up in a monad, or emitting continuation-passing code with separate continuations for exceptions and regular evaluation.
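A toy sketch of the second option, with all names mine and in C rather than Haskell since that's what I'm writing these days: every compiled expression receives two continuations, one for the normal result and one for exceptions.

#include <stdio.h>

// one continuation for the normal result, one for exceptions
typedef void (*cont_t)(int result);

static void div_cps(int num, int denom, cont_t ok, cont_t fail) {
    if (denom == 0)
        fail(0);          // "raise": jump to the exception continuation
    else
        ok(num / denom);  // normal return path
}

static void on_ok(int r)   { printf("result: %d\n", r); }
static void on_fail(int r) { (void)r; printf("exception raised\n"); }

int main(void) {
    div_cps(10, 2, on_ok, on_fail);   // result: 5
    div_cps(10, 0, on_ok, on_fail);   // exception raised
    return 0;
}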

Maddening really, stuck in the Turing tarpit.

Tuesday, May 1, 2012

Lexing

I've been looking at recent implementations of the Kernel language, a Lisp derivative, hoping to find better scanner support. The good news: some of them solved the problem and even implement a REPL on top of the lexer. The bad news: what murky implementations. Maybe there isn't anything better, but, wow, hopeless.

What I call a reader seems to be called a port there. Then it's pure unadulterated C. I thought of reading files into a wide-character array for unbounded access, but that makes a REPL impossible. Maybe I'll just read stuff line by line; that could work.

Yeah, the latter should work. I'll mark the start of a lexing operation, and flush up to a certain point in case of success, or rewind otherwise.

Basically, the C code should look like:
token_t* read_token(reader_t* rdr) {
    token_t* tok;
    // try each alternative in turn; the first hit wins
    if ((tok = alt_0(rdr))) return tok;
    if ((tok = alt_1(rdr))) return tok;
    return NULL;
}

I.e., lexing tries alternatives in decreasing order of complexity, each written like this:
token_t* alt_n(reader_t* rdr) {
    token_t* tok;
    // start scanning
    reader_mark(rdr);
    // try an alternative
    if (reader_look(rdr) == L'c') {
        ...
        // flush all recognized input
        reader_flush(rdr);
        // return the recognized token
        return tok;
    } else if ( ... ) {
        ...
    } else {
        // rewind to the mark when no alternative matches
        reader_rewind(rdr);
        return NULL;
    }
}

Could work. Bit of a problem in how to handle EOL exactly, since the REPL demands that the next line is read lazily, but that's about it.
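For completeness, a minimal sketch of the reader and its primitives, assuming line-by-line buffering; all names and fields are illustrative, and the lazy refill of the buffer on EOL is exactly the part omitted here:

#include <stdio.h>
#include <wchar.h>

typedef struct {
    wchar_t buffer[4096];   // the current input line
    size_t  pos;            // scanning position
    size_t  mark;           // start of the token being scanned
    FILE*   in;             // refilled line by line; lazy refill omitted
} reader_t;

void reader_mark(reader_t* rdr) {
    rdr->mark = rdr->pos;           // remember where scanning started
}

wchar_t reader_look(reader_t* rdr) {
    return rdr->buffer[rdr->pos];   // one-character lookahead
}

void reader_advance(reader_t* rdr) {
    rdr->pos++;                     // consume one character
}

void reader_flush(reader_t* rdr) {
    rdr->mark = rdr->pos;           // commit the recognized input
}

void reader_rewind(reader_t* rdr) {
    rdr->pos = rdr->mark;           // back up to the mark on failure
}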

Wow

I haven't done a lot in the past year. Oops. Here's the source code of the old (not even wrong) compiler.

Wednesday, April 25, 2012

Rubbish^3

Cubic rubbish. As stated, I've been looking at doing a reimplementation in C, as an interpreter, and for the last few months I have mostly been doing other stuff, like games, or watching movies.

Now that I am coding, I am engineering a lexer implementation; please ignore the last post since it is, well, rubbish. Since I want Unicode support, I've been reading up on C wide characters, and I assume for the moment that wide characters are Unicode characters. All nice and dandy.
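As a quick sanity check of that assumption, a throwaway snippet that reads a line of wide characters and dumps their code points. Mind that on platforms where wchar_t is 16 bits, such as Windows, the assumption already breaks for characters outside the BMP.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    setlocale(LC_ALL, "");   // needed before any wide-character I/O
    wchar_t line[256];
    if (fgetws(line, 256, stdin)) {
        for (wchar_t* p = line; *p && *p != L'\n'; p++)
            printf("U+%04X ", (unsigned)*p);   // print the code point
        printf("\n");
    }
    return 0;
}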

Now, writing a lexer for Unicode most likely means writing a hand-coded lexer, since the state space of automatically generated scanning automata usually explodes. Also nice and dandy: I've written various small toy languages in my life, and hand-coding a lexer shouldn't be that hard.

Rubbish, hand-coding a lexer is hard. In the past, I never wrote lexers for floating-point literals, and floating-point scanning breaks the usual one- or two-symbol lookahead invariant. Moreover, I've often written lexers in functional languages, usually employing my own set of parser combinators and enjoying the benefits of referential transparency, curried functions, closures, and garbage collection, which gave me arbitrary lookahead. With the prime benefit that every step of advancement was automagically undone on each failure to recognize (part of) a token. Now, I need primitives to back up to a recognized portion of the input in case some alternative fails.
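To make the lookahead problem concrete, a hypothetical fragment in the style of the mark/rewind primitives sketched in the May 1 post above (iswdigit is from <wctype.h>): after scanning "1.5", try an exponent. On "1.5e+3" the exponent is committed; on "1.5e+x" the lexer must give back "e+", which is exactly what breaks one or two symbols of lookahead.

static int scan_exponent(reader_t* rdr) {
    reader_mark(rdr);                       // position right after "1.5"
    if (reader_look(rdr) == L'e' || reader_look(rdr) == L'E') {
        reader_advance(rdr);
        if (reader_look(rdr) == L'+' || reader_look(rdr) == L'-')
            reader_advance(rdr);
        if (iswdigit(reader_look(rdr))) {
            while (iswdigit(reader_look(rdr)))
                reader_advance(rdr);
            reader_flush(rdr);              // commit the exponent
            return 1;
        }
        reader_rewind(rdr);                 // give back "e" or "e+"
    }
    return 0;
}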

So, stuff I needed to consider when writing a lexer in C: no parser combinators, no curried application, no referential transparency, no (automatic) state handling, no garbage collection, no lexer generators, and no automagic rewinding of portions of the input.

So I have been looking at other interpreters with hand-coded lexers, and what do I find in GCC, Lua, and Python? Rubbish. Okay, be nice: elaborate rubbish.

I have no clue how people managed to write programs in the past. Except for rubbish.

Monday, January 2, 2012

Large C Projects

I am rewriting the project in C. I am going a bit slowly, and am mostly making the usual decisions about where to place what abstraction. Embarrassingly, that's not an easy thing in C. The default seems to be to place all headers in an include directory, and all sources in a separate directory. Lousy; I want more and better.

A minimal include directory for external usage; directories for header files for different architectures, for utilities, and for the different stages of the compiler; a directory for testing. And a testing framework. That's what I need.
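Something like the following sketch, all names illustrative:

project/
    include/        minimal public headers, for external usage
    src/
        arch/       architecture-specific headers
        util/       utilities
        stage1/     one directory per compiler stage
        stage2/
    test/           tests, plus the testing framework itself
    Makefile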

Now where is an Internet guide to setting up C projects?