Sunday, January 3, 2010


A highly unsophisticated breakdown of grep/wc. Looks pretty functional to me: Lots of stuff moving around.
counting words in test_print_tuple.s
add         :  6855
and         :   372
call        :   472
cltq        :   299
cmov        :   302
cmp         :  6595
ja          :  2390
jb          :  1812
je          :  4936
jg          :   304
jle         :   305
jmp         :  1431
jne         :   718
lea         :  8010
mov         : 27955
pop         :  4181
push        :  1797
pxor        :   297
ret         :   870
salq        :   609
shr         :  1190
sub         :   729
test        :  2203
xorl        :  1943
\.string    :   574

Given everything, a factor 3-4 bigger in size than Ocaml isn't even that bad, for a first version which does no optimization and compiles to C. I don't have a stack, so every basic assignment translates to roughly two moves instead of a move and a relatively light push. I don't factor out string constants, so there are spurious calls to strneq. Each pattern match does an extra structure test on a record, for safety, at the moment.

Its hard not to optimize. Factoring out strings is easy, moving to bytecode too. But factoring out strings will make it harder to generate thunks 'on-the-fly,' which I want for a REPL, and moving to bytecode means an own optimizer, whereas I now leave that to C, which compiles to way more code than I expected. The usual route for bytecode is that performance drives implementers to optimizations such as hotspot compiles, which means dynamically ending up with the same, if not more, machine code in memory. Bytecode interpretation often remains faster than native code due to less cache misses, so its still a good option.

Still, some small changes and I might end up with 1.5x with respect to Ocaml? And 27k moves for 1kLoC source code? It might pay of to write my own optimizer on the generated intermediate assembly since the compiler knows best it is basically just shifting constants around.

I need a comparison between what assembly is, and should be, generated.

Note: Its all a first glance.

I have no humour

No comments:

Post a Comment