Thursday, December 9, 2010

On 64 Bits, A Text Symbol is a Small Constant

In my mind, I've been going over how to improve the compiler performance. Now, I don't keep a symbol table at the moment, and I am thinking at supporting strings instead of character lists in the compiler.

I could, of course, also implement a symbol table just to make all strings unique and abstractly calculate with integers mapping into the symbol table. But, say, I implemented native strings, then most local variables (like 'x' or 'x#30') in the symbol table will be less than eight characters long, and most fully qualified identifiers (like 'system.print' or 'map.foldr') will be less than sixteen characters long.

That means most symbols are just one or two word sizes long!


I.e., it makes as much sense to use the native string representation of a symbol instead of an abstract integer mapping into a symbol table since the native string symbol will most likely take as much heap space as an integer.

For my compiler, due to the slowness of term rewriting, a text symbol should be seen as a constant, just like an integer, where the abstract costs of operations are O(1), just like on integers.

No comments:

Post a Comment