Sunday, May 31, 2015

Log 053115

Well. The circle I am in is: Think random thoughts until I've got a reasonable grasp on all invariants and the compilation scheme I want to support, work up some appetite to translate it to code, code a bit.


It's mind-numbing. Bosch was right, I feel like the bloke above.

I am not sure yet how to support a naming scheme which interfaces with C object files correctly. I need something where, if a combinator 'foo' is declared in namespace 'baz.baf' in some file 'qux.hi', it unambiguously 'knows' the symbol name of the C function representing 'foo' such that it will link...
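
Something like the following is what I have in mind; a minimal sketch, nothing decided. The '_hi_' prefix, the length-prefixed segments, and the mangle function itself are inventions for illustration, not what the compiler does. Whether the file name ('qux.hi') should participate at all is exactly the open question; here only the namespace and the combinator name do.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical name mangling: derive a C symbol from a namespace path plus a
// combinator name. Length-prefixing each segment keeps the mapping unambiguous,
// so 'baz.baf' + 'foo' cannot collide with, say, 'baz' + 'baf.foo'.
std::string mangle(const std::vector<std::string>& space, const std::string& name) {
    std::string symbol = "_hi_";
    for (auto& segment : space) {
        symbol += std::to_string(segment.size()) + segment;
    }
    symbol += std::to_string(name.size()) + name;
    return symbol;
}

int main() {
    // 'foo' declared in namespace 'baz.baf' becomes "_hi_3baz3baf3foo"
    std::cout << mangle({"baz", "baf"}, "foo") << std::endl;
    return 0;
}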

You'd expect a compiler to be all about parsing and typing but, honestly, that's only because those are the trivial aspects academics can easily concentrate on to publish their greeklish.

The mundane world of naming stuff. Tss.

Friday, May 29, 2015

Identification Hell

Hi has a pretty modern syntax. It has namespaces, qualified names, and implicitly declared variables, doesn't force you to write, for instance, upper-case letters in pattern matches, and uses the 'dot' selector liberally.

The downside is that, preceding the rest of the semantic analysis, it employs an identification pass to disambiguate and clearly identify which (qualified) name is bound to what constant or variable. It flattens namespaces, declares unbound variables, disambiguates expressions with dots in them, etc. After the identification pass, the compiler proceeds as you would expect for a far simpler language.

It leads to a syntax I find very pretty but there are disadvantages to this approach: For the user, typos may be translated to unexpected compiler error messages. For the implementor, it's hellish to implement.

So, for the next few weeks I'll be stuck in identification hell. It's really something I would rather not do but, yeah, I don't want to give up on my pretty little language.

Sunday, May 24, 2015

Log 052415

Another simple pass implemented: all declarations now become fully qualified within their namespace. The next pass is the first real "big" work: identification, where all applied identifiers become bound to their defining occurrences.

To be safe, I did some profiling again on a big file.

-rw-rw-r--. 1 marco marco 1298048 May 24 21:19 big.hi

It takes 1.3 seconds to parse, process, and print. Still a steady 1MB/s.

Output from gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  9.88      0.08     0.08  4736519     0.00     0.00  __gnu_cxx::__atomic_add(int volatile*, int)
  4.94      0.12     0.04  5241745     0.00     0.00  __gnu_cxx::__exchange_and_add(int volatile*, int)
  3.70      0.15     0.03  9536286     0.00     0.00  icu_52::UnicodeString::isBogus() const
  3.70      0.18     0.03  3486210     0.00     0.00  TokenVector::look(unsigned int)
  3.09      0.21     0.03 18877218     0.00     0.00  icu_52::UnicodeString::length() const
  2.47      0.23     0.02  5010563     0.00     0.00  Token::Token(Token const&)
  2.47      0.25     0.02  4734982     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
  2.47      0.27     0.02  3486210     0.00     0.00  std::vector<Token, std::allocator<Token> >::operator[](unsigned long)
  2.47      0.29     0.02  2270618     0.00     0.00  std::vector<std::shared_ptr<Ast>, std::allocator<std::shared_ptr<Ast> > >::begin() const
  2.47      0.31     0.02  1483420     0.00     0.00  std::_Vector_base<std::shared_ptr<Ast>, std::allocator<std::shared_ptr<Ast> > >::~_Vector_base()
  2.47      0.33     0.02   366209     0.00     0.00  Token::Token(token_t, Position const&, icu_52::UnicodeString const&)
  2.47      0.35     0.02   267648     0.00     0.00  adjust_reserved(Token&&)
  2.47      0.37     0.02    84864     0.00     0.00  Parser::is_qualified_prefix_operator()
  1.85      0.38     0.02  8336922     0.00     0.00  Position::Position(Position const&)
  1.85      0.40     0.02  7964955     0.00     0.00  Position::~Position()
  1.85      0.41     0.02  4374018     0.00     0.00  std::shared_ptr<Ast>::shared_ptr(std::shared_ptr<Ast> const&)
  1.85      0.43     0.02  1587171     0.00     0.00  void std::_Destroy<std::shared_ptr<Ast>*>(std::shared_ptr<Ast>*, std::shared_ptr<Ast>*)
  1.23      0.44     0.01  6489602     0.00     0.00  StringCharReader::end()
  1.23      0.45     0.01  6013970     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
  1.23      0.46     0.01  5666315     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
  1.23      0.47     0.01  5421190     0.00     0.00  std::shared_ptr<Ast>::~shared_ptr()
  1.23      0.48     0.01  3486311     0.00     0.00  std::vector<Token, std::allocator<Token> >::size() const
  1.23      0.49     0.01  2441729     0.00     0.00  Token::tag() const

Looks good, around 15-20% overhead for reference counting.

Heap usage output from valgrind:


Nothing I want to change either. Looks like some spikes from lexing, parsing, and the few passes. I wonder whether libicu retains string data but, if so, there's nothing I want to do about that.

Addendum: Found the line still a bit puzzling. Did another leak check:

==23682== 
==23682== HEAP SUMMARY:
==23682==     in use at exit: 3,804 bytes in 5 blocks
==23682==   total heap usage: 9,992 allocs, 9,987 frees, 1,670,914 bytes allocated
==23682== 
==23682== LEAK SUMMARY:
==23682==    definitely lost: 0 bytes in 0 blocks
==23682==    indirectly lost: 0 bytes in 0 blocks
==23682==      possibly lost: 0 bytes in 0 blocks
==23682==    still reachable: 3,804 bytes in 5 blocks
==23682==         suppressed: 0 bytes in 0 blocks
==23682== Rerun with --leak-check=full to see details of leaked memory
==23682== 
==23682== For counts of detected and suppressed errors, rerun with: -v
==23682== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Nothing to worry about?

Doubts

I implemented another pass but now have doubts about whether I am tackling the problem right. These doubts stem from switching from a declarative description to implementing the compiler impurely in C++, and from the transform I implemented.

For the bootstrap compiler, I was forced to reason within a declarative, pure framework. Because a compiler is bookkeeping plus term rewriting, I more or less naturally decided to shift as much of the work to term rewriting as I could, given the lack of imperative manners of bookkeeping. The compiler rewrites the term and hoists a number of local definitions to the top level; the only minimal bookkeeping it does is keeping an environment table around to look up the definitions of applied identifiers, plus an impure counter to create fresh variables.

This is a fine and robust manner when working declaratively. All information is neatly represented in the tree and each pass can be visually inspected; the environment can be, and is, constructed on the fly when needed.

But I wonder whether I will end up with a rather odd beast when implemented impurely. I could also implement it more imperatively and shift work back to bookkeeping. People from the C++ world will probably find it a very odd compiler in that it doesn't do the bookkeeping directly by impurely creating tables with references to the various constructs in the tree and rewriting the tree in situ.

The second doubt is whether I got the current transform right. I really appreciate minimalism, but in a flurry of code I shifted from a push model to a pull model as the basis for rewriting the tree. I.e., in the bootstrap compiler a typical transform looks like this:

    [ decl_space po c id aa dd -> 
        (rr0, e) = transform_child transform_metintro bindings.empty e;
        (rr0, e) = add_bindings rr0 e;
            (rr,e)
    | part po c id im dd -> 
        (rr0, e) = transform_child transform_metintro bindings.empty e;
        (rr0, e) = add_bindings rr0 e;
            (rr,e)
    | decl_interface po c id aa t k -> 
        rr0 = reduce_method_constraint id t bindings.empty;
            (bindings.union rr rr0, e)
    | e -> (rr,e) ]

It handles some special cases and 'pushes' the rewriting process further by 'asking' the transform to rewrite a node and its children.

In C++ I now have rewrites like this:

class RewriteBind: public Rewrite {
public:
    RewriteBind(): _state(AstEmpty().clone()) {
    }

    // state manipulation
    void set_state(const AstPtr& a) {
        _state = a;
    }

    ...
    // rewrites
    AstPtr rewrite_decl_type(const Position& p, const AstPtr& n, const AstPtr& t) override {
        set_state(n);
        auto t0 = rewrite(t);
        clear_state();
        return AstDeclType(p, n, t0).clone();
    }

    ...
    // cuts
    AstPtr rewrite_decl_definition(const Position& p, const AstPtr& n, const AstPtr& t, const AstPtr& e) override {
        return AstDeclDefinition(p, n, t, e).clone(); // cut
    }

private:
    AstPtr _state;
};

The transform rewrites everything and is manually directed with various cut-offs where rewriting should not continue. The drawback is a lot of cuts; I somewhat prefer the original solution. I guess nobody looks at the source code of a compiler, so I'll stick to it. But doubts.

Saturday, May 23, 2015

Log 052315

Right. I implemented the second transform. It took some pondering but, as it turned out, it wasn't hard once I got the invariants, of which there were few, right. Because I dumbed down the language, the pass only needs to collect the quantified variables it encounters in local state, determine new record declarations at each record statement rewrite, and pass the newly constructed declarations to the top level.

The subsequent implementation of the pass is somewhat verbose, but somehow much simpler than in the bootstrap: just a class which implements a rewrite and has as local state a variable list and a set of declarations. I didn't need the nested context yet.

I am inclined to say that imperative wins, but I am near certain the pass in the bootstrap can be rewritten more concisely. The verbosity must have been due to trying to use the transform function I had just invented, the changes I made during development, and the more elaborate type system I was unsure I would, or could, support at the time of writing.

Legacy code.

Ah well.

The Battle: Declarative versus Imperative

While my pet project is rapidly deteriorating into a project for a good C++ programmer, which I am not, I still need to implement the next pass.

I implemented a general imperative construct which allows me to keep track of bindings. I did it in the straightforward manner in which most impure compilers keep track of an environment: a nested set of declarations. Basically, enter a scope, declare some (unique) variables, pop the scope. I decided not to worry too much about performance anymore; there are lots of cycles still to waste and memory is cheap.

It's implemented as a stack (vector) of maps from AST nodes to AST nodes. Entering a scope costs 600 bytes, something which I would have found appalling twenty years ago. Now I guess it's fine. Some testing showed that C++ easily handles scoping depths of 100,000.
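
For my own future reference, the shape of the thing, stripped down to a toy; in the compiler both K and V would be AstPtr (with the compare function on AST pointers), and all the names here are made up for the sketch, not the actual interface.

#include <cassert>
#include <map>
#include <vector>

// Sketch of the scoping construct: a stack (vector) of maps.
template <typename K, typename V>
class Scope {
public:
    void enter() { _frames.emplace_back(); }               // push a fresh scope
    void leave() { _frames.pop_back(); }                   // pop it again

    void declare(const K& k, const V& v) {                 // bind in the innermost scope
        _frames.back()[k] = v;
    }

    // look a binding up, innermost scope first
    const V* lookup(const K& k) const {
        for (auto it = _frames.rbegin(); it != _frames.rend(); ++it) {
            auto hit = it->find(k);
            if (hit != it->end()) return &hit->second;
        }
        return nullptr;
    }

private:
    std::vector<std::map<K, V>> _frames;
};

int main() {
    Scope<int, int> s;
    s.enter();
    s.declare(1, 42);
    s.enter();
    s.declare(1, 7);                   // shadows the outer binding
    assert(*s.lookup(1) == 7);
    s.leave();
    assert(*s.lookup(1) == 42);
    s.leave();
    return 0;
}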

The question becomes how to implement the pass below. It does something trivial, it lifts the constructors in a type declaration to typed declarations in the context. E.g., in a type 'type list = \a => [ nil | cons a (list a)]' the record 'cons' becomes a function 'def cons: !a => a -> list a -> list a = [ x, y -> Cons x y]'.

It's a purely syntactic transformation which, though I am not entirely content with it, precedes the rest of the semantic analysis. As you can read for yourself, it's a transform which passes a map of bindings around. Now to do it imperatively...

namespace records (

    using system

    using list

    using position

    using ast

    using transform

    using bindings

    def transform_rec_search: ast -> transform bindings =
        type_arr = [ t0, t1 -> type_arrow (pos t0) t0 t1 ];
        type_app = [ t0, t1 -> type_application (pos t0) t0 t1 ];
        search_child = 
            [ t, p, e -> 
                transform_child (transform_rec_search t) p e ];
        record_update_type =
            [ f, decl_record p c id aa i t e  -> 
                decl_record p c id aa i (f t) e
            | f, e -> e ];
        to_record_type = fix
            [ to_record_type, nil, t -> t
            | to_record_type, cons t0 tt, t -> 
                type_arr t0 (to_record_type tt t) ];
        record_abstraction =
            [ nil, e -> e
            | cons v vv, e -> 
                p = pos v;
                e = expr_match p (cons v vv) (expr_empty p) e;
                e = expr_abstraction p e;
                e ];
        [ t, rr, e -> 
        [ type_record po e0 t0 tt -> 
            ta = to_record_type tt t;
            type_to_var = [e -> fresh.fresh_expr_variable (pos e)];
            vv = list.map type_to_var tt;
            nn = list.map var_to_name vv;
            r  = record_abstraction vv (expr_record po e0 t0 nn);
            d  = decl_record po (expr_empty po) e0 nil t0 ta r;
            (bindings.insert (txt e0) d rr,e)

        | type_universal po t0 t1 -> 
            (rr0,_) = search_child t bindings.empty e;
            rr0 = map.apply 
                (record_update_type [e -> type_universal po t0 e])
                rr0;
            (bindings.union rr rr0, e)

        | type_abstraction po t0 t1 -> 
            (rr0,_) = search_child (type_app t (var_to_name t0)) 
                    bindings.empty e;
            rr0 = map.apply 
                (record_update_type [e -> type_universal po t0 e])
                rr0;
            (bindings.union rr rr0, e)
        | e -> search_child t rr e ] e ]

    def transform_recintro: transform bindings =
        bindings.to_records =
            [ rr -> list.map [(n,e) -> e] (map.to_list rr) ];
        add_bindings =
            [ rr, decl_space po c id aa dd -> 
                (bindings.empty, decl_space po c id aa 
                    (dd ++ bindings.to_records rr))
            | rr, part po c id im dd -> 
                (bindings.empty, part po c id im
                    (dd ++ bindings.to_records rr))
            | rr, e   -> (rr,e) ];
        [ rr, e -> 
        [ decl_space po c id aa dd -> 
            (rr0, e) = transform_child transform_recintro 
                bindings.empty e;
            (rr0, e) = add_bindings rr0 e;
                (rr,e)
        | part po c id im dd -> 
            (rr0, e) = transform_child transform_recintro
                bindings.empty e;
            (rr0, e) = add_bindings rr0 e;
            (rr,e)
        | decl_type po c id aa t k ->
            transform_rec_search id rr e
        | e   -> (rr,e) ] e ]

    def ast_record_intro: ast -> ast =
        transform_ast transform_recintro bindings.empty

)

Friday, May 22, 2015

Log 052215

Implemented a general compare function on the AST pointers. Did some memory profiling. Valgrind is my best friend. For future reference:

valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out.

I really have to get used to how much memory is allocated these days. Printing an AST:

-rw-rw-r--. 1 marco marco 2596096 May 22 20:26 test0.hi
mem_heap_B=279101440

Right, source code of 2,596,096 bytes becomes 279,101,440 bytes of heap usage; 2.6 seconds to print. Ah well, at least I don't expect Moore's law to end soon when it comes to memory.

Plotted (x-axis is sample number over time, the 2.6 seconds), it becomes:

Right. Two spikes I don't get. Possibly it copies the entire vector of declarations while rewriting it. Nothing I want to solve at the moment.

The steady climb is a bit worrying, but valgrind reports no memory leaks. Moreover, I am pretty sure I do everything right: it copies pointers a lot while visiting the AST, so there's a lot of refcounting but no 'real' pressure on the garbage collector. Probably because the AST remains in memory until the program exits, or because C's memory allocator 'gives up' memory only at the next garbage collection point (yes, C does garbage collection too); yeah well, RAII...

From Toy to Tool

The first transform I implemented is also the simplest. I've no idea how I did it but, after all the fooling around, I now find myself implementing a formal spec: the bootstrap compiler.

A compiler is bookkeeping plus term rewriting. Everything boils down to how you're going to treat the information flow while rewriting a tree.


              ↓:T | ↑:T
             APPLICATION
        ↙:T / ↗:T   ↘:T \ ↖:T
       TERM0             TERM1


Most, if not all, passes in the bootstrap are implemented with functional transforms that pass information around in a pure fashion using functional data structures. Functional data structures are great when working declaratively; there's no shared state, so all the invariants you want remain in place when passing them around.

Despite everything, using functional data structures often isn't as fast as proponents claim it is. It's nice if people can write Lisp compilers in a nanosecond but, usually, everything breaks down when you throw large files at a compiler written in that manner.

A scenario as I imagine the lifetime of a Lisp compiler: it starts off representing all declarations in a program with a list. At more than 5K declarations it hits the first performance problem, so that is replaced with a red-black tree. At more than 40K declarations it hits the next bottleneck, so that becomes a hash map. At more than 150K declarations, new issues, and that becomes a data structure which pushes and pops ranges of hash maps. At more than 500K declarations that breaks down too, and it ends with a fast imperative data structure where memory is allocated by hand.

Despite the fact that I fully appreciate the interpreter will probably never be used, I want fast imperative data structures. Ah, a man can dream, right?

Therefore, the next step, before writing the next passes, is thinking about how to get various imperative data structures right.

Thursday, May 21, 2015

Log 052115 - b

Mightily pleased with myself. My first transform on the AST. Lightning speed.

class RewriteBind: public Rewrite<AstPtr> {
public:
    RewriteBind(): Rewrite(AstEmpty().clone()) {
    }

    void clear_state() {
        set_state(AstEmpty().clone());
    }

    AstPtr rewrite_decl_type(const Position& p, const AstPtr& n, const AstPtr& t) {
        set_state(n);
        auto t0 = rewrite(t);
        clear_state();
        return AstDeclType(p, n, t0).clone();
    }

    AstPtr rewrite_expr_record(const Position& p, const AstPtr& f, const AstPtr& i, const AstPtrs& aa) {
        auto i0 = get_state();
        return AstExprRecord(p, f, i0, aa).clone();
    }

    AstPtr rewrite_decl_interface(const Position& p, const AstPtr& n, const AstPtr& t) {
        set_state(n);
        auto t0 = rewrite(t);
        clear_state();
        return AstDeclInterface(p, n, t0).clone();
    }

    AstPtr rewrite_decl_instance(const Position& p, const AstPtr& pr, const AstPtr& t, const AstPtr& b) {
        set_state(t);
        auto b0 = rewrite(b);
        clear_state();
        return AstDeclInstance(p, pr, t, b0).clone();
    }

    AstPtr rewrite_decl_method(const Position& p, const AstPtr& n, const AstPtr& i, const AstPtr& t, const AstPtr& e) {
        auto i0 = get_state();
        return AstDeclMethod(p, n, i0, t, e).clone();
    }

    AstPtr rewrite_decl_definition(const Position& p, const AstPtr& n, const AstPtr& t, const AstPtr& e) {
        return AstDeclDefinition(p, n, t, e).clone(); // cut
    }
};

Log 052115 - a

This is the complete definition of the first of the many small passes the bootstrap performs. It has been five years since I worked on it, which is a good thing, because I needed to forget all this to even get the appetite to start all over in that verbose, odd dialect of a language called C++. I am baffled that I managed to make it that concise, and I wouldn't know yet how to translate it into efficient, mostly imperative, C++ code. At the moment, I can only estimate what it does.

I have no idea what I was smoking in that period of my life, or how I ever got the bootstrap functioning.

namespace bind (

    using system

    using position

    using ast

    using transform

    def find_constructor_name: ast -> ast =
        [ type_universal p t0 t1    -> find_constructor_name t1
        | type_abstraction p t0 t1  -> find_constructor_name t1
        | type_constrain p t0 t1 t2 -> find_constructor_name t2
        | type_instance p t0 t1 ee  -> find_constructor_name t1
        | type_application p t0 t1  -> find_constructor_name t0
        | t -> t ]

    def reshape_bind: reshape ast =
        [ i0, type_record  p  i t _e -> 
            type_record p i (set_pos p i0) _e
        | i0, decl_method p c id aa i t _e -> 
            decl_method p c id aa (set_pos p i0) t _e
        | i0, decl_bind p c id aa i t _e -> 
            decl_bind p c id aa (set_pos p i0) t _e
        | i0, e -> reshape_child reshape_bind i0 e ]

    def reshape_bind_search: reshape unit =
        [ u, e ->
        [ decl_space p c id aa dd -> 
            reshape_child reshape_bind_search u e
        | part p c id im dd -> 
            reshape_child reshape_bind_search u e
        | decl_interface p c id aa t k -> 
            reshape_child reshape_bind id e
        | decl_type p c id aa t k -> 
            reshape_child reshape_bind id e
        | decl_instance p c aa t -> 
            reshape_child reshape_bind (find_constructor_name t) e
        | e   -> e ] e ]

    def ast_bind: ast -> ast =
        [ e -> reshape_bind_search nop e ]

)

C++ is a Bloody Fine Functional Language

Well. I am having great fun, C++ is a bloody fine language; if it isn't a fine functional language, it is a fine language for implementing compilers. I implemented my first transform. I didn't opt for the visitor pattern, or a newer solution by Stroustrup et al. (too unstable), and I don't regret it one bit. Transforms are easily expressed, albeit a bit verbosely.

The original bootstrap is implemented with one transform which both passes information around and rewrites; all other transforms are derived from that. Very heavy on overhead, but the bootstrap was supposed to be correct and robust, not fast. In the C++ compiler, I'll write out the various manners of visiting the AST by hand.

I tested a transform, based on a template class with shared state, which simply counts the number of nodes and, by transforming, copies the entire graph. The overhead of reference counting somehow dropped to negligible, no idea why. If I assume 1MB/s for parsing, a copy of an entire graph of a 1MB source takes roughly 0.3 seconds. More than adequate. I am laughing my head off.
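
Roughly what that test looked like, modulo names, on a toy AST; the real Rewrite/AstPtr interface differs, this only shows the idea of a transform whose shared state is a counter and whose default case clones every node.

#include <iostream>
#include <memory>
#include <vector>

// Toy stand-ins for the real AstPtr machinery; only here so the sketch compiles.
class Ast;
typedef std::shared_ptr<Ast> AstPtr;
typedef std::vector<AstPtr> AstPtrs;

class Ast {
public:
    Ast(const AstPtrs& children): _children(children) {}
    const AstPtrs& children() const { return _children; }
private:
    AstPtrs _children;
};

// The test described above, roughly: a transform whose shared state is a
// counter and whose default case copies every node it visits.
class CountingRewrite {
public:
    CountingRewrite(): _count(0) {}

    AstPtr rewrite(const AstPtr& a) {
        _count++;                                        // the shared state
        AstPtrs children;
        for (auto& c : a->children()) children.push_back(rewrite(c));
        return AstPtr(new Ast(children));                // rebuild, i.e. copy, the node
    }

    int count() const { return _count; }

private:
    int _count;
};

int main() {
    AstPtrs none;
    AstPtr leaf0(new Ast(none));
    AstPtr leaf1(new Ast(none));
    AstPtrs both = {leaf0, leaf1};
    AstPtr root(new Ast(both));

    CountingRewrite counter;
    AstPtr copy = counter.rewrite(root);
    std::cout << counter.count() << " nodes copied" << std::endl;   // prints "3 nodes copied"
    return 0;
}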

The sharing solution might beat copying in the end. I now recommend it.

Wednesday, May 20, 2015

Log 052015

So. The parser is finished, but I found a bug in infix parsing. I'll need to rethink it at some point; I probably need some scheme where I mimic an attribute grammar solution by passing a 'bottom' precedence operator into the infix parsing rule. Ah well. There's a lot of work; it's better if I just leave it as it is. For version 0.2?
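
For the record, that trick of passing a 'bottom' precedence into the infix rule is essentially precedence climbing: the rule only consumes operators that bind at least that tightly, and bumps the bound for the right-hand side to get left associativity. A toy sketch on strings instead of real tokens; none of these names exist in the parser.

#include <cassert>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Toy precedence table and parser; numbers and binary operators only.
static const std::map<std::string, int> precedence = {
    {"+", 1}, {"-", 1}, {"*", 2}, {"/", 2}
};

struct ToyParser {
    std::vector<std::string> tokens;
    size_t pos;

    std::string look() const { return pos < tokens.size() ? tokens[pos] : ""; }
    std::string skip() { return tokens[pos++]; }

    int parse_primary() { return std::stoi(skip()); }    // a primary is just a number here

    // parse infix expressions, consuming only operators with precedence >= min_prec
    int parse_infix(int min_prec) {
        int lhs = parse_primary();
        while (precedence.count(look()) && precedence.at(look()) >= min_prec) {
            std::string op = skip();
            // the right-hand side climbs with a higher 'bottom' for left associativity
            int rhs = parse_infix(precedence.at(op) + 1);
            if (op == "+") lhs += rhs;
            else if (op == "-") lhs -= rhs;
            else if (op == "*") lhs *= rhs;
            else lhs /= rhs;
        }
        return lhs;
    }
};

int main() {
    ToyParser p{{"1", "+", "2", "*", "3", "-", "4"}, 0};
    assert(p.parse_infix(1) == 3);                        // 1 + 2*3 - 4
    std::cout << "ok" << std::endl;
    return 0;
}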

I patched some rendering routines together to support a crude view on the AST. Nothing too elaborate, but it works for now and I expect to be debugging a lot.

It roughly parses 1MB/s.

I would really enjoy some memory usage statistics. Naive compilers usually waste memory on the internal representation and this compiler is pretty naive, but C++. Ah well, nobody is interested in declarative languages anyway, I guess.

Tuesday, May 19, 2015

Log 051915

C++ has fine tools. With valgrind I looked at memory leaks; none so far, RAII works. With gprof I debugged a performance issue while printing ridiculously large series of declarations. If you don't stick to the C++ manner of iterating over a vector, performance degrades because somewhere internally it a) holds a list of buckets to store all elements, and iterating in the C manner searches for the bucket at each step, and b) it insists on allocating an extra object for each lookup. I don't understand the latter, but after some tinkering, using C++11's manner of iterating solved the problem. Printing performance dropped from 11 to 3 seconds on a large file.
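
For what it's worth, the difference boiled down to roughly the two loops below; the per-element copy in the first one is my guess at where the extra objects came from, not a verified diagnosis, and the Token here is a toy stand-in.

#include <string>
#include <vector>

// Toy stand-in; the real Token drags along a Position and an ICU string,
// so handing out copies is not free.
struct Token {
    int tag;
    std::string text;
};

// C-style iteration, copying each element into a local on every step.
int count_tags_c_style(const std::vector<Token>& tokens, int wanted) {
    int n = 0;
    for (unsigned int i = 0; i < tokens.size(); i++) {
        Token t = tokens[i];               // a full Token copy per element
        if (t.tag == wanted) n++;
    }
    return n;
}

// C++11 range-for over const references: no copies at all.
int count_tags(const std::vector<Token>& tokens, int wanted) {
    int n = 0;
    for (const auto& t : tokens) {
        if (t.tag == wanted) n++;
    }
    return n;
}

int main() {
    std::vector<Token> ts = {{1, "def"}, {2, "foo"}, {1, "def"}};
    return count_tags(ts, 1) == count_tags_c_style(ts, 1) ? 0 : 1;
}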

It spends 40% of the time reference counting while printing, which is a good indication of the overhead while rewriting terms. If a pointer comes into scope it needs to increase the reference count; if it leaves a scope it needs to decrease it. There's nothing to be done about it, and it will be like this for all rewriting functions.

After the parser recognized a small subset of the language, definitions and expressions, I am now expanding it to the full language. The AST explodes; every alternative becomes about 50 lines instead of one. It's inconvenient and a lot of work.

The real hurdle will be whether I can express transforms concisely in C++. Well, it won't be very concise: pattern matching must be replaced by tedious downcasting and picking a record apart through accessor methods. I'll probably write some macros for that, even if I find programming with macros bad practice. But I need some generic solution.

I need a transform which supports this:

               ↓:T | ↑:T
              APPLICATION
          ↙:T / ↗:T  ↘:T \ ↖:T
       TERM0              TERM1

The arrows denote the information flow around the terms during a transform where a term may also be rewritten. By fiddling with the information flow locally in each rewrite step it's not very difficult to implement semantic analysis and by fiddling with the expressions returned it is not difficult to implement rewrite passes. It more or less generalizes attribute grammar approaches to compiler construction.

The transform should be a generic function which leaves information flow and terms intact unless you override certain cases. In that manner, without spelling out all the functionality for each leaf, you end up with concise definitions of passes, albeit the transform may not be very fast.
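
A sketch of what I mean, on a toy term language of variables and applications: information of type I flows down into every node and back up out of it, the default cases leave everything intact while recursing, and a pass only overrides what it cares about. All the names are invented for the sketch.

#include <iostream>
#include <memory>
#include <string>
#include <utility>

// Toy term language, a stand-in for the real AST: variables and applications.
struct Term;
typedef std::shared_ptr<Term> TermPtr;
struct Term {
    std::string var;               // non-empty for variables
    TermPtr left, right;           // set for applications
};
TermPtr mk_var(const std::string& v) { return TermPtr(new Term{v, nullptr, nullptr}); }
TermPtr mk_app(const TermPtr& l, const TermPtr& r) { return TermPtr(new Term{"", l, r}); }

// The generic transform: default cases pass the info through and rebuild the term.
template <typename I>
class Transform {
public:
    virtual ~Transform() {}

    virtual std::pair<I, TermPtr> transform_var(const I& info, const TermPtr& t) {
        return {info, t};                                  // default: leave intact
    }
    virtual std::pair<I, TermPtr> transform_app(const I& info, const TermPtr& t) {
        auto l = transform(info, t->left);                 // info flows down-left...
        auto r = transform(l.first, t->right);             // ...across to the right...
        return {r.first, mk_app(l.second, r.second)};      // ...and back up
    }
    std::pair<I, TermPtr> transform(const I& info, const TermPtr& t) {
        return t->var.empty() ? transform_app(info, t) : transform_var(info, t);
    }
};

// Example pass: count variable occurrences, rewriting nothing.
class CountVars : public Transform<int> {
public:
    std::pair<int, TermPtr> transform_var(const int& n, const TermPtr& t) override {
        return {n + 1, t};
    }
};

int main() {
    TermPtr t = mk_app(mk_app(mk_var("f"), mk_var("x")), mk_var("x"));
    CountVars pass;
    std::cout << pass.transform(0, t).first << std::endl;  // prints 3
    return 0;
}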

Well, that's it. There's progress but still two to three years of work left. I fully understand why people implement untyped scripting languages, it's just so much easier.

ADDENDUM: I am nearly finished with the parser. I severely restricted the input grammar for types; it used to have a liberal free-flow grammar where pretty absurd types would be accepted since I was rushing the compiler to a bootstrap. This time, since I know I have a compilation scheme, the type system will go first. So, no explicit forall constructs, no predicates on type synonyms, etc. Dumbing down continues.

Sunday, May 17, 2015

Term Rewriting with Shared Pointers is a Major Design Decision

A compiler is bookkeeping plus term rewriting. Old compilers were usually full of bookkeeping through the use of tables. They needed to be since resources were scarce and performance was low. As processors evolved and became more powerful, compilers became more declarative and shifted in the direction of term rewriting.

Hi is a declarative language, so naturally, the bootstrap relies little on tables but simply rewrites, massages, the term until it is semantically verified and code can easily be output in a number of steps.

The bootstrap lacked in performance, due to:
  1. Wasteful memory representation of the data being worked upon.
  2. A slow operational model, combinator rewriting instead of the C calling convention.
  3. Lack of imperative constructs.
  4. Lack of fast impure containers.
  5. Garbage collection.
I could have solved 1., made an effort to fix 2., added 3., and ultimately added 4. (but that probably only becomes necessary when processing large, >1MLoc, files) and, in a final effort, implemented features to do 5. fast.

The path of a self-hosting declarative compiler seems to be that you keep on adding imperative features until you've reinvented C. I personally loathe the Haskell IO monad and barely agree with Clean's uniqueness types. For Hi, it isn't worth it: the language is fine for small programs, making it fast and self-hosting is a 20-100 man-year effort, which I don't have and which can fail anyway, and I want to keep the language simple and declarative.

Most people agree we have hit the end of Moore's Law; the era of the free lunch is over. We had a 3500x speedup over the decades since the introduction of the first microprocessor; declarative languages cannot wait another few decades until they become so fast that people want to use them.

So, I decided to ditch the effort and do it all in C++. Fast interpreter, slow language. If it would ever attract a user base, there are plenty of optimizations left.

But since I have a term-rewriting self-hosting compiler as a design reference, the choice of how to represent terms naturally comes up. Since I rewrite a tree, or DAG, I basically have two choices: use trees and copy, or use DAGs and support sharing with some form of garbage collection. So, it's either copying or sharing. Both have non-performing corner cases: trees are fast, but you might spend too much time copying; with sharing, the reference-counting overhead may become large and, when you only ever rewrite trees, the reference counting degrades to pure overhead.

I made the choice to represent the AST with C++'s shared pointers. After having looked at GCC, which uses a pointer-to-massive-struct copying system internally, I wanted to test performance of a sharing solution on a 500KLoc file, a large list of small declarations, with respect to copying. It spends 40% of its time reference counting.

For industrial compilers this would imply that copying beats sharing; at least, depending on the implementation of C++'s shared pointer.

It does 30ms over a small file of 5k declarations, so I'll stick to this scheme of shared pointers. It's just too convenient implementation wise, if a node gets rewritten or falls out of scope in a C routine, it is automagically garbage collected. But boy, oh boy, it is horrific to pay performance for design decisions this early in the implementation stage.

Saturday, May 16, 2015

Log 051605

Working on the AST, parsing, and rendering code simultaneously. Verbose, but after I sank my teeth into it, not unpleasant. Dijkstra would hate the manner in which I write code. Ah well, he turned out to be somewhat of an idiot when it comes to software engineering anyway. So, whatever.

Later that day: *poof* After a flurry of activity I managed to get a rudimentary grammar recognizer ready. I immediately did some performance tests. On small files, fine; on a 500KLoc file, not so good. Somehow parsing took 11 seconds, and rendering became slower as the file grew larger, so I broke it off. I blame smart pointers for now.

Ah well. I guess that if anyone ever uses it, it'll take a long while before people start writing 500KLoc files.

Tuesday, May 12, 2015

Log 051215

A bit of reading while thinking about qualified names and namespaces. Stuff I noticed: Chicken Scheme has a problem with running out of stack space too. There is a nice language named Nim under development; glancing through it, it seems to be mostly an Algol-like derivation, but fast and bootstrapped, I like it. Haskell allocates large objects in chunks to keep the GC happy. If you compile to straightforward C it's actually quite difficult to keep the GC manager happy, due to the fact that at any moment the C routine might ask for memory, which might trigger a collection and ruin all invariants in the C routine. Reference counting in C completely avoids that problem. (Or you use bytecode, as in Clean, OCaml, and Java, and avoid the problem by keeping the complete machine abstract.)

There is a small but major design decision in the AST representation: lists or vectors? Lists allow for sharing and a declarative style; vectors might be faster, but at the expense of ugly imperative code and copying. After some dabbling I tossed a coin and decided on vectors.

I continue with my streak of shooting myself in the foot and am going to implement namespaces no matter what. Though it's academically uninteresting and pretty annoying to get right algorithmically.

I am leaning towards dropping the combinator rewriting system... (But what about OS's where the stack size is a hard limit? No MacOS?).

Eager. Pure. No GC. No combinator rewriting. HM inference. No bootstrapping. No cyclic structures. C semantics. It'll be completely academically uninteresting. But hopefully a great small tool for patching small functional programs. A front-end to LLVM.

(As a side note: Because of the end of Moore's law I expect the next thing to be vertical integration which means Hardware OS's; or since OS's are written in C/C++, language integration. All current research paywalled, unfortunately.)


Sunday, May 10, 2015

Log 051015

So you thought you could do term rewriting in C++, eh? I got the idiom for representing ASTs in C++ figured out. Of course, the idiom is well known, it's an instantiation of this, but it took me a while of struggling with the template library and some googling to get it right.

Basically, terms in the datatype derive from an abstract class. Since an abstract class cannot be instantiated, C++ more or less forces you to pass references back as the result of functions. Moreover, since the abstract class doesn't know how to copy objects from the stack to the heap, the solution is to write a copy constructor and a clone method for each variant.

This is a lot of cruft and the source code is going to explode because of it. It's a very heavy price to pay for not being willing to implement the interpreter/compiler in a functional language such as SML.

But yeah. I chose C++ because reasons.

Friday, May 8, 2015

Log 050915

Moving stuff around in the sources, trying to get an idiom right for handling shared pointers in C++, adjusting a provisional lexer such that it will scan a test language, rethinking qualified names and namespaces, thinking about how to represent the parser in C++. Demotivating tidbits of engineering.

The type checker is in the back of my mind. I am rethinking the ideas I had five years ago. I re-realize that there probably is no subtyping relation on types. Any concrete type can implement multiple interfaces at some given point, and that may be extended; once extended, all old type derivations remain unchanged. Simply put: any concrete type just gets augmented with a list of interfaces, and simple HM inferencing is enough. (Though it would be nice to put arbitrary values sharing an interface in a container.) Moreover, since the language lacks assignment, there shouldn't be a need for a monomorphism restriction a la ML. Of course, you might want to store stuff at the module level, but I would be perfectly happy labeling primitive routines for that 'unsafe'.

Since the language has exceptions, I actually think I want RAII too. Which means 'destructors' on garbage-collected nodes referring to open files, database connections, etc. Which means I am inclined to drop my copying garbage collector and simply reference count everything. Even more wasteful, but I can use C++ shared pointers for that and get thread safety for free. Moreover, it's good enough since the language guarantees that you're rewriting a DAG anyway.
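
A sketch of what that buys, with an invented FileNode: the destructor is the 'finalizer', and with reference counting it runs deterministically the moment the last reference drops, also while unwinding after an exception.

#include <cstdio>
#include <memory>

// Hypothetical runtime node owning an external resource.
class FileNode {
public:
    explicit FileNode(const char* path): _file(std::fopen(path, "r")) {}
    ~FileNode() { if (_file) std::fclose(_file); }   // the 'finalizer'
    bool ok() const { return _file != nullptr; }
private:
    std::FILE* _file;
};

typedef std::shared_ptr<FileNode> FileNodePtr;

int main() {
    {
        FileNodePtr f(new FileNode("some.hi"));      // made-up file name
        FileNodePtr alias = f;                       // refcount 2; the file stays open
    }                                                // refcount hits 0 here, fclose runs
    return 0;
}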

It's just a trivial language. The lack of use cases means I keep shifting between semantics. If I read up on a programmers' language I want to go in the direction of representing bytes, words, longs, etc.; if I read up on scripting languages my mood changes and I want to go for BigInts. Whatever, or not?

Log 050815

So no programming today. Family business, which was a good idea since I am easily mentally worn-out these days. But it remains in my mind.

So, a few random thoughts:

Trading robustness for speed. I rewrite everything to an eager manner of combinator rewriting to avoid stack overflows. I am not sure people willing to use the language will appreciate the, rather enormous, cost in running speed. If I am going to compile to LLVM, or some other JIT since LLVM seems to have performance problems, it becomes 'trivial' to encode to standard function calls. Moreover, a rather large number of optimizations would become feasible, or could easily be done by the LLVM compiler.

But you sacrifice robustness, and after a myriad of stack overflows bootstrapping in OCaml I don't want to go back.

As a side note, I also figured out how to handle closures in that scenario. The function knows when to reduce or push a closure so that's trivial. Exception handling is also enabled by default in the modern processor ABIs so I figure I don't even need to do difficult stuff for that.

Who will ever use this language? This is a really difficult question. FP enthusiasts? They already have a large number of compilers to choose from; SML or Haskell will probably gobble them all up. I thought about numeric computing for a moment, but those people favor simple algorithmic languages. I read up on Julia; those programmers think fundamentally differently from functional programmers and want optimal speed. I thought about web servers, which may be a target for some people, in which case I need to think about handling threads and shared state. Then I thought it might be nice to write simple, robust GUI applications; but who would choose my language over Python? I need a convincing use case. Some place where you need ridiculously robust software?

But I still like the language.

Compiling to object files and C/C++ integration. Given a JIT, compiling to .o files is automated for you, which is something I really want. But how the hell does the linker know about namespaces? Is there some name mangling going on I don't know about? Answer here, it seems.

Thursday, May 7, 2015

Log 050715

So combinator parsing in C++ turns out to be a bad idea: not only is it near impossible to get right, most C++ implementations need several seconds for large files whereas the 'dumb' method usually finishes in milliseconds. I needed to unlearn the functional style. Back to straightforward recursive descent with switches on the current token.
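
The 'dumb' method, for reference: a switch on the current token drives the descent. A toy sketch on raw characters rather than the real token stream; the grammar and the names are made up.

#include <cctype>
#include <iostream>
#include <stdexcept>
#include <string>

// Toy grammar: expr ::= NUMBER | '(' expr expr ')', evaluated as a sum.
struct MiniParser {
    std::string input;
    size_t pos;

    char look() {                                   // current character, skipping spaces
        while (pos < input.size() && input[pos] == ' ') pos++;
        return pos < input.size() ? input[pos] : '\0';
    }

    int parse_expr() {
        switch (look()) {                           // the switch on the current token
        case '(': {                                 // application: '(' expr expr ')'
            pos++;
            int l = parse_expr();
            int r = parse_expr();
            if (look() != ')') throw std::runtime_error("expected ')'");
            pos++;
            return l + r;                           // 'evaluate' instead of building a node
        }
        default: {                                  // number
            if (!std::isdigit(look())) throw std::runtime_error("expected a number");
            int n = 0;
            while (pos < input.size() && std::isdigit(input[pos]))
                n = n * 10 + (input[pos++] - '0');
            return n;
        }
        }
    }
};

int main() {
    MiniParser p{"( 1 ( 2 3 ) )", 0};
    std::cout << p.parse_expr() << std::endl;       // prints 6
    return 0;
}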

I think I got how to write a far more trivial type checker than what I once started on. Straightforward Hindley-Milner with a subtyping relation on types implementing interfaces. It shouldn't be hard, I am also dropping all other ideas I had. Bottom-up HM+subtyping is good enough.

I've got a start of an AST description now going. Going to reference count everything through the use of C++ smart pointers; that's a bit overkill but it beats thinking about copying.

For downcasting it's going to be tag based switches followed by a static cast.

Everything is of course far more verbose than the bootstrap which I am using as a reference implementation.

So far so good. Making progress but this is probably still the easy part.

ADDENDUM: I thought I would share the AST solution I chose for people who might have ended up here by chance.

1) Define tags for your expressions:

typedef enum {
     VAR,
     ABSTRACTION,
     APPLICATION,
} expr_t;

2) Define a base object:

#include <memory>   // for std::shared_ptr

class Expr;
typedef std::shared_ptr<Expr> ExprPtr;

class Expr {
public:
     Expr(expr_t t): _tag(t) {};
     virtual ~Expr() {}

     expr_t  tag() { return _tag; }

     virtual ExprPtr clone() = 0;
private:
     expr_t _tag;
};

3) Derive variants, for instance

class ExprApplication: public Expr {
public:
      ExprApplication(ExprPtr e0, ExprPtr e1): Expr(APPLICATION), _left(e0), _right(e1) {}

      // the copy constructor has to initialize the Expr base (it has no default constructor)
      ExprApplication(const ExprApplication& e):
          Expr(APPLICATION), _left(e._left), _right(e._right) {}

      ExprPtr clone() {
          return ExprPtr(new ExprApplication(*this));
      }

      // etc.

private:
    ExprPtr _left;
    ExprPtr _right;
};


It took a lot of googling but the idiom for using shared pointers where you mostly only reference the base class is aptly described here.

Well, that's it. Don't create cycles; as long as your term is a DAG, and most likely it will always be a tree, this scheme should work with RAII and might even allow for memory-compact solutions far better than simply copying everything.
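
And to connect it to the earlier remark on downcasting: dispatch is a switch on the tag followed by a static cast. The sketch below assumes ExprApplication grew left() and right() accessors behind the '// etc.' above; the function itself is just an example.

// Example dispatch: count application nodes with a tag switch plus a static cast.
static int count_applications(const ExprPtr& e) {
    switch (e->tag()) {
    case APPLICATION: {
        auto app = std::static_pointer_cast<ExprApplication>(e);
        return 1 + count_applications(app->left()) + count_applications(app->right());
    }
    case VAR:
    case ABSTRACTION:
    default:
        return 0;
    }
}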

Wednesday, May 6, 2015

I mean like: Wow

I've been reading the entire Internet trying to track down how people write compilers, more precisely, parsers and abstract transformers in C/C++.

One post claimed roughly the following: compiler construction is a well-defined and finished, trivial field where grammar theory is fully developed and compiler writers know very well how to implement the various aspects of compilers.

Then I read various sources, even GCC and Clang. The parsing theory from the textbooks, roughly described as "bottom-up parsing is the most general and serves all your needs", has been abandoned for hand-written PEG-style recursive descent parsers with unbounded lookahead; moreover, all the pretty LALR and other parser generation tools have been abandoned and are rotting away. Nobody knows what the best manner of representing an AST in C++ is, and people patch the weirdest solutions together to write recursive descent parsers. I haven't even started on semantic analysis.

If all problems are solved then all the code I just read doesn't show it.

Monday, May 4, 2015

Term Rewriting using the Visitor pattern ist Verboten

I've been looking into what the correct manner of downcasting is in C++. Unfortunately, downcasting is only allowed on pointer or reference types and then in a manner I don't really like yet.

Still, I need it. Rewriting a term is best done functionally for various reasons. If you cannot imagine why, and I've seen a lot of people advising against it, and you propose a visitor pattern then you're an idiot.

Not convinced you're an idiot? Even Stroustrup agrees on that.

Which is the problem with the Internet these days. It used to be that I could google or post a question to a mailing list and get an informed answer within a few hours. These days, with widespread programming education, the Internet is full of idiots and bad advice.

So I am not sure what the proper manner of downcasting terms in C++ is. Another small puzzle.

Sunday, May 3, 2015

Rewriting Terms in C++?

Given that I am now writing a compiler in C++, the question of term rewriting while following RAII comes up. A typical scenario following RAII is given below.

Say your simple term language has elements like this:

class AppTerm: public Term {
public:
     AppTerm(const Term &l, const Term &r): _left(l), _right(r) {};

     Term left() {
         return _left;
     }

     Term right() {
          return _right;
     }
private:
     Term _left;
     Term _right;
};

Then, since most compilers do a lot of term rewriting, you'll have an enormous amount of code like:

Term rewrite_app(const AppTerm &t) {
     SomeResult r0 = rewrite_term(t.left());
     SomeResult r1 = rewrite_term(t.right());

     // do something

     return AppTerm(r0.value(), r1.value());
}

If I get C++ semantics right, then r0 and r1 will go out of scope and the resulting term will receive copies of whatever was constructed in the rewrite of both constituents of an application term.

I say it again: copies. Think about that for a moment. Copies.

If every rewrite function is going to copy entire terms at each intermediate step, as I think it will when implemented naively, you're going to have one hell of a slow compiler.

Did I get C++ semantics right? The only way out is to use smart pointers it seems.

A C++ interpreter/compiler with an LLVM back-end

After an old-school bootstrap of the compiler I wrote, I decided I did it all wrong. Doing it right would imply creating an interpreter/compiler in C/C++.

The major lesson here: Despite academic enthusiasm, old-skool bootstrapping is awful and extending a bootstrap compiler is even more awful.

The major hurdle I needed to take: my language is so good that I didn't feel like doing it all in C/C++ again.

Last week, I took the plunge. Starting on an interpreter/compiler in C++ which will hopefully have an LLVM back-end.

There go another few years of my life. I feel good about it.