Tuesday, March 2, 2010

Schlemiel the Painter's Algorithm

This is a Jiddish joke by Joel Spolsky on certain poor programming practices. In the joke, Schlemiel (also rendered Shlemiel) has a job painting the dotted lines down the middle of a road. Each day, Schlemiel paints less than he painted the day before. When he is asked why, Schlemiel complains that it is because each day he gets farther away from the paint can.


Schlemiel is Dutch for retard, btw.


I run into this mostly when printing debugging output which often involves rendering a data structure to a document, beautifying that, rendering the document to text (list of chars), and subsequently concatenating the texts. If you add that I need to bootstrap with an ML interpreted compiler, you'll get the picture.

I did a little optimization on some of the compiler in order not to be bothered by this, but there is little I can do until the compiler is bootstrapped. And I don't feel like `fixing' it since at this point, minimality is preferred.

Subsequent thoughts:
  • Native strings do make sense since I interface with C a lot, it speeds up the compiler, it gives smaller binaries. However, I would like to use list of characters as the runtime default. The question is: what should a string/text literal default to? List of characters or character array - tagging it with a suffix might make sense, but also would mean a lot of explicit conversions when people start optimizing code.
    Going to look what GHC decided on this, but wondering if it is documented. [They are lists of UTF-8 chars..., that'll eat your memory... Though I essentially have the same scheme...]
    (I'm at an impasse here.
    1. Recursive text processing functions are natural and fast and lists avoid the need for copying sometimes. 2. Heap allocated string objects save space, and significantly speed up the compiler, since all internal algorithms now work on lists of chars (a 2*n slowdown given the representation of a lists of chars as 2*n objects instead of one object). Heap allocated string objects are automatically collected. But, making this the default for interfacing with C means the introduction of shared state. 3. Using just a  special type for pointers to character array means an orthogonal convention, but explicit memory management. 4. The worst convention might be: working with strings, rendering them to list of chars, and rendering these to strings again. Maybe what I have ain't that bad after all?)
  • I thought of a stratifying scheme for the runtime. I pay now for the fact that the garbage collector used was developed as a drop-in replacement for Boehm's.
  • The marshalling code is the poorest I've ever written. It should have resembled Cheney's algorithm, messed up there.
  • I might add the let into the back-end lambda representation later, I removed it for the time being. Though it's just a convenience representation it may simplify the code generation algorithm. Not sure I am fixing a poorly written function.
  • I might go to some scheme like ocaml to tag type variables. I now use that when an identifier doesn't match a declaration, it's automatically a type variable. Not sure, it gives error messages on typos like "system.nt". Then again, forgetting a quote would do the same, and these are usually easily fixed.
Skipped to another laptop while the other laptop is running....

No comments:

Post a Comment