Monday, December 26, 2011

Unicode Decisions

Strings are peculiar things these days. Since I am now aiming at a high-level language with a fast interpreter, I need to decide what to do about character encodings and string representations.

The best I can think of for now is that characters should be represented internally as wide characters and strings as multibyte sequences.
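A minimal sketch of what that split could look like in C, assuming the multibyte encoding is UTF-8 and a wide character is a 32-bit code point (the post fixes neither). The names `wchar32` and `utf8_decode` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: a character is one 32-bit code point,
   a string stays a plain UTF-8 byte sequence. */
typedef uint32_t wchar32;

/* Decode the UTF-8 sequence at s (at most n bytes) into *out.
   Returns the number of bytes consumed, or 0 on malformed input. */
size_t utf8_decode(const unsigned char *s, size_t n, wchar32 *out)
{
    assert(s != NULL && out != NULL);
    if (n == 0) return 0;

    unsigned char b = s[0];
    size_t len;
    wchar32 cp;

    if (b < 0x80)                { len = 1; cp = b; }
    else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
    else return 0;               /* stray continuation or invalid lead byte */

    if (len > n) return 0;       /* truncated sequence */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    static const wchar32 min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}
```

With this shape, strings stay compact in memory and only get widened one character at a time, when something actually looks at an individual character.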

This breaks binding to C, since the language's characters are much larger than the char C expects. Unless I think of something smart I'll use this representation anyway, which also means that I'll need a handwritten lexer and parser, since generated tables explode with wide-character entries...
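To illustrate the table problem: a transition table indexed directly by code point would need 0x110000 (1,114,112) columns per state, whereas a handwritten lexer can classify characters with a few range comparisons. The sketch below assumes input that is already decoded to 32-bit code points; the function names and the identifier rule (ASCII letters, `_`, and anything from U+0080 up) are simplifications chosen for illustration, not real Unicode identifier rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative rule only: ASCII letters, '_', and every code point
   from U+0080 up to U+10FFFF may start an identifier. */
bool is_ident_start(uint32_t cp)
{
    return (cp >= 'a' && cp <= 'z') ||
           (cp >= 'A' && cp <= 'Z') ||
           cp == '_' ||
           (cp >= 0x80 && cp <= 0x10FFFF);
}

/* Identifier continuation: anything that may start one, plus ASCII digits. */
bool is_ident_part(uint32_t cp)
{
    return is_ident_start(cp) || (cp >= '0' && cp <= '9');
}

/* Return the length, in code points, of the identifier at the start
   of src, or 0 if src does not begin with an identifier. */
size_t scan_ident(const uint32_t *src, size_t n)
{
    assert(src != NULL || n == 0);
    if (n == 0 || !is_ident_start(src[0])) return 0;

    size_t i = 1;
    while (i < n && is_ident_part(src[i])) i++;
    return i;
}
```

The comparisons cost a handful of branches per character and no table memory, which is the trade a handwritten lexer makes against a generated one.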

No libffi, no lex, no yacc. No libffi also implies either handwritten glue code or no cheap way of reusing C libraries. Hmm... Whatever?
