Well. I am having great fun, C++ is a bloody fine language; if it isn't a fine functional language, it is a fine language for implementing compilers. I implemented my first transform. I didn't opt for the visitor pattern, or a new solution by Stroustrup et. al. (too unstable), and I don't regret it one bit. Transforms are easily expressed, although be it a bit verbose.
The original bootstrap is implemented with one transform which both passes information around as well as rewrites; all other transforms are derived from that. Very heavy on overhead, but the bootstrap was supposed to be correct and robust, not fast. In the C++ compiler, I'll write out various manner of visiting the AST by hand.
I tested a transform, based on a template class with a shared state, which simply counts the number of nodes and, by transforming, copies the entire graph. The overhead of reference counting somehow dropped to negligible, no idea why. If I assume 1MB/s overhead for parsing, a copy of an entire graph on a 1MB source is roughly .3 seconds. More than adequate. I am laughing my head off.
The sharing solution might beat copying in the end. I now recommend it.