Tuesday, May 19, 2015

Log 051915

C++ has fine tools. With valgrind I looked at memory leaks; none so far, RAII works. With gprof I debugged a performance issue while printing ridiculously large series of declarations. If you don't stick to the C++ manner of iterating over a vector, performance degrades because somewhere internally it a) holds a list of buckets to store all elements and iterating in the C manner searches for the bucket each step, b) it insists in allocating an extra object for each lookup. I don't understand the latter but after some tinkering using C++11's manner of iterating solved the problem. Printing performance dropped from 11 to 3 seconds on a large file.

It spends 40% of the time reference counting while printing which is a good indication of the overhead while rewriting terms. If a pointer comes into scope it needs to increase the reference count, if it leaves a scope it needs to decrease it; there's nothing to be done about it, and it will be like this for all rewriting functions.

After the parser recognized a small subset of the language, definitions and expressions, I now am expanding to the full language. The AST explodes, every alternative becomes about 50 lines instead of one. It's inconvenient and a lot of work.

The real hurdle will be whether I can express transforms concisely in C++. Well, it won't be very concise, pattern matching must be replaced by tedious downcasting and picking a record apart through accessor methods. I'll probably write some macro's for that, even if I find programming with macros bad practice. But I need some generic solution.

I need a transform which supports this:

               ↓:T | ↑:T
              APPLICATION
          ↙:T / ↗:T  ↘:T \ ↖:T
       TERM0              TERM1

The arrows denote the information flow around the terms during a transform where a term may also be rewritten. By fiddling with the information flow locally in each rewrite step it's not very difficult to implement semantic analysis and by fiddling with the expressions returned it is not difficult to implement rewrite passes. It more or less generalizes attribute grammar approaches to compiler construction.

The transform should be a generic function which leaves information flow and terms intact, unless you override certain cases. In that manner, without spelling out all functionality for each leave, you end up with concise definitions of passes, albeit that the transform may not be very fast.

Well, that's it. There's progress but still two to three years of work left. I fully understand why people implement untyped scripting languages, it's just so much easier.

ADDENDUM: I am nearly finished with the parser. I severely restricted the input grammar for types; it used to have a liberal free-flow grammar where pretty absurd types would be accepted since I was rushing the compiler to a bootstrap. This time, since I know I have a compilation scheme, the type system will go first. So, no explicit forall constructs, no predicates on type synonyms, etc. Dumbing down continues.

No comments:

Post a Comment