- Inlining of simple functions. Inlining means fewer thunks/stack frames are created on the heap, fewer calls are made, and less garbage collection is needed. I expect a better than linear speedup, since every call removed also removes its allocation and the collection work that comes with it.
- Generational garbage collection. A lot of 'calls' are done once; keeping them around for too long just hogs memory.
- Memoization of libffi calls. Currently, for each call, the dynamic library symbol is looked up and an FFI cif is created from a Hi FFI type descriptor. Both results are constant in the end, so they can be computed once and memoized.
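
To make the last point concrete, here is a minimal sketch of such a memo table in C. It is not the actual runtime code: the names `ffi_memo`, `ffi_memo_get` and `demo_resolve` are made up for illustration, and the expensive step is hidden behind a resolver callback so the sketch compiles without libffi. In a real runtime the resolver would presumably do the `dlsym()` lookup and the `ffi_prep_cif()` call and return a pointer to both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FFI_MEMO_BUCKETS 64

/* Hypothetical cache entry, one per foreign symbol name. `prepared` stands
   for whatever the runtime wants to reuse (symbol address plus cif). */
typedef struct ffi_memo_entry {
    char *name;
    void *prepared;
    struct ffi_memo_entry *next;
} ffi_memo_entry;

/* The expensive step; assumed to wrap dlsym() + ffi_prep_cif() in practice. */
typedef void *(*ffi_resolver)(const char *name);

typedef struct {
    ffi_memo_entry *buckets[FFI_MEMO_BUCKETS];
    ffi_resolver resolve;
    size_t misses; /* how often the resolver actually ran */
} ffi_memo;

/* Simple string hash (djb2 style), reduced to a bucket index. */
size_t ffi_memo_hash(const char *s) {
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % FFI_MEMO_BUCKETS;
}

void ffi_memo_init(ffi_memo *m, ffi_resolver resolve) {
    for (size_t i = 0; i < FFI_MEMO_BUCKETS; i++) m->buckets[i] = NULL;
    m->resolve = resolve;
    m->misses = 0;
}

/* Return the cached result for `name`, running the resolver only on a miss.
   Returns NULL if allocation fails. */
void *ffi_memo_get(ffi_memo *m, const char *name) {
    assert(m != NULL && name != NULL);
    size_t b = ffi_memo_hash(name);
    for (ffi_memo_entry *e = m->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e->prepared;

    ffi_memo_entry *e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (e->name == NULL) { free(e); return NULL; }
    memcpy(e->name, name, len);
    e->prepared = m->resolve(name); /* the only place the slow path runs */
    e->next = m->buckets[b];
    m->buckets[b] = e;
    m->misses++;
    return e->prepared;
}

/* Stand-in resolver for the sketch: counts calls, returns a dummy pointer. */
int demo_calls = 0;
int demo_token = 0;
void *demo_resolve(const char *name) {
    (void)name;
    demo_calls++;
    return &demo_token;
}
```

Calling `ffi_memo_get(&m, "puts")` twice runs the resolver once; the second call is just a hash and a string compare. The sketch keys on the symbol name only. If the same symbol can be called with different type descriptors, the key would need to include the descriptor (and the library) as well.
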
I think that's about it. I'll never get to GHC or OCaml speed without native ints and stack allocation, but it should perform fine.