Friday, October 16, 2015

Goodbye Hi. Last Post.

So I gave up on implementing a typed language. The problem, in a nutshell, is that I couldn't convince myself I could solve a particular problem associated with constraint typing.

The problem: In order to prove a specific goal you need all information derivable from a term; and with juggling around only local information, I cannot convince myself I could arrive at a typing system which would prove specific goals given all information derivable in the term.

I am not sure how general this problem actually is. The closest typing system I could look at is Haskell, and there are no results on this problem. I gather they patched a system together which usually works.

The language I am now developing looks naked though. It'll become an experiment in how to write untyped algebraic specifications.

Too slow for anything useful probably but a nice experiment.

After the new language is developed enough, I'll start a new blog. Goodbye Hi.

Tuesday, September 29, 2015

Hi Relegated

I concede. The minor typing systems I could come up with didn't handle all corner cases well. Instead of thinking about it more, I decided to start on a new untyped language.

Wednesday, June 24, 2015

Algorithm W

I like the prover in my bootstrap but, yeah, there's algorithm W too. So, I decided to go for that. Gone are the bisimulation proofs of type equivalences but I simply don't know what I am dropping there and can't care to figure it out. At least with W I know it has been proven correct, terminates, and gives reasonable performance on small terms.

Which also implies that I'll need to figure out how to do constraint checking with W. But, in some sense, W already checks constraints by providing a proof, a most general unifier, for types.

So. How hard can it be? Right...

Sunday, June 21, 2015

Log 062115

Made a few small changes to the AST. Implementing a number of routines ahead of time until I figure out what to do with the type checker.

The type checker, as implemented in the bootstrap, is okay-ish. Not too shabby. The major difference between it and other functional type checker is that it records substitutions as type equivalences. I probably could speed it up already by factoring out different propositions. Moreover, I decided against using the gathered type equivalences to attribute the tree; I could do without recording substitutions.

Still, there's this thing about being able to typecheck cyclic types. Suppose, you have a type definition which unravels to the following:

             / \
                / \
                 .. ..

The type checker I now have "records" substitutions. It should therefore be able to prove type equivalences like the above by stating an equivalence, unfolding the type once, and using the postulated type. (Conceptually, the second "t" in the picture above becomes a cycle pointing to the first "t'.)

But that doesn't seem to be a problem in practice. To prove equivalences ML compilers simply unravel types a few times and use the definition to "close the cycle".

Ah well. First kindification I guess.

Thursday, June 18, 2015

A Small Derivation

I completely forgot what my prover does; I rushed it anyway. So, I worked out a rough example by hand below to get some grasp on it. In a bottom up fashion it collects a mess of constraints, say it gathered

int:*∧ (::Ord int) ⊢t0:k0->t1:k1 = t2:k2 ∧ t2:k2 = int:ka->int:kb ∧ (::Ord t1)

It then proves that (roughly) likes this

int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ int:*= t3:k3 ∧ int:* = t4:k4 ∧ 
k0 = ka =* ∧ t0: = int:*∧ k1 = kb=* ∧ t1:* = int:* 
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ int:*= t3:k3 ∧ int:* = t4:k4 
 k0 = ka =* ∧ t0:* = int:*∧ k1= kb = * ∧ t1:* = int:* 
⊢ (::Ord int)
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ int:*= t3:k3 ∧ t1:k1 = t4:k4 
∧ k0 = ka =* ∧ t0:* = int:* 
⊢ t1:k1:* = int:kb ∧ (::Ord t1)
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ t0:k0= t3:k3 ∧ t1:k1 = t4:k4 
⊢ t0:k0 = int:ka ∧ t1:k1 = int:kb ∧ (::Ord t1)
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ t0:k0= t3:k3 ∧ t1:k1 = t4:k4 
⊢ t0:k0->t1:k1 = int:ka->int:kb ∧ (::Ord t1)
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4 ∧ t0:k0= t3:k3 
⊢ t1:k1 = t4:k4 ∧ t0:k0->t4:k4 = int:ka->int:kb ∧ (::Ord t1)
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4 
⊢ t0:k0= t3:k3 ∧ t1:k1 = t4:k4 ∧ t3:k3->t4:k4 = int:ka->int:kb ∧ (::Ord t1)
int:*∧ (::Ord int) ⊢ t0:k0->t1:k1 = t2:k2 ∧ t2:k2 = int:ka->int:kb ∧ (::Ord t1)

So, okay. Guess that works. I made a reasonable estimate of how constraints should be solved, dump the mess in the prover, from the antecedent I read off how to attribute the tree with the solved constraints.

It doesn't seem to handle intermediaries well, though. Guess I need to check it with the algorithm. (Hmm. Intermediaries should remain on the left-hand side of the equation?)

Log 061815 - B

Right. I started implementing a general prover, then after the first few lines of code, decided against it. Thing is, I have a prover which works on terms like this

a0 ∧ ... ∧ anc0 ∧ ... ∧ cm

It discharges each consequent given what it knows by inspecting the antecedents. Antecedents and consequents may be anything; e.g., kind equalities, type equalities, instantiation requests, etc. While proving a goal new consequents, sometimes an antecedent, may be introduced (trivially).

Thing is, I don't trust the time complexity of the algorithm. Checking for a consequent whether an antecedent exists is linear (times the term size of the antecedent), substitution is linear (times the term size), almost every step the prover takes is two of these linear, possibly quadratic, operations. As far as I know, proving in this general manner might be cubic.

So, I am working in the unknown here. Looks to me I want to keep the number of antecedents as far down as possible to keep the number of lookups and substitutions down, might also be that it is cubic but pragmatically that doesn't matter since you only operate on small numbers of small terms, lastly, since I sometimes am doing a 'bisimulation' proof, I might actually need to do it this way.


Next Step: Kind and Type Checking

Looks like the module system is finished in that it passes a minimal unit test. I.e., given the next minimal non-sensical unit test:

import "util.hi"

namespace f.g (
    interface Some = #t where (
        def x : t-> v ->unit

    instance Some int = (
        def x : unit = nop

namespace i.j (
    def h: unit = nop
    def test: unit = h.f.g.x

namespace k (
    using f.g
    using i.j

    def test1: unit = h.x

type list = \a => [ nil | cons a (list a) ]

def ++: list a -> list a -> list a =
    [ nil, yy       -> yy
    | cons x xx, yy -> cons x (xx ++ yy) ]

That gets elaborated to

import "util.hi"
namespace f.g (
    interface f.g.SomeTC = (!vTV => #tTV where ( 
            method xC {SomeTC} : (tTV -> (vTV -> unitTC)) = (XXX)
    instance f.g.SomeTC f.g.intTC = (
            binding xC {intTC} : unitTC = nopC
    select f.g.xC {f.g.SomeTC} : (!vTV => (!tTV => (::SomeTC tTC => (tTV -> (tTV -> (vTV -> unitTC)))))) = (XXX)
namespace i.j (
    def i.j.hC: unitTC = nopC
    def i.j.testC: unitTC = (i.j.hC@f.g.xC)
namespace k (
    using f.g
    using i.j
    def k.test1C: unitTC = (i.j.hC@f.g.xC)
type listTC = (\aTV => [ nilC{listTC} | consC{listTC} aTV (listTC aTV)])
def ++C: 
    (!aTV => ((listTC aTV) -> ((listTC aTV) -> (listTC aTV)))) = 
    [ nilC, yyV -> yyV | consC{list} xV xxV, yyV -> ((consC xV) ((++C xxV) yyV))]
record nilC{listTC}: (!aTV => (listTC aTV)) = nilC{listTC}
record consC{listTC}: 
    (!aTV => (aTV -> ((listTC aTV) -> (listTC aTV)))) = 
    \v#0V, v#1V -> consC{listTC} v#0V v#1V 

So, that's good enough for now.

The next step is deciding whether I am going to do semantical analysis or compile to LLVM code such that the interpreter will run. I decided to go for semantical analysis first.

The first step in semantical analysis is kindification, checking type denotations have the correct kinds. There are some minor design decisions in whether I'll attribute type definitions and type variables with kinds, or only semantically check code.

I also need to think over the scheme of how the kind, and later type, checker will work. What I do in the bootstrap compiler is gather up constraints in a bottom up fashion and dispatch the constraints to a general prover. I like that implementation but I am somewhat concerned about performance.

It's either: a) bottom up solve all constraints in situ, or b) bottom up gather constraints and dispatch to a prover and I don't know what to choose yet.

Log 061815 - A

After I rewrote passing the environment through copying, that continued bugging me since copying is expensive. It also dawned to me where I made the mistake; I copied a local reference, modified that, and never passed it back before.

I rewrote the compiler such that the environment is passed as a pointer. It makes more sense and a lot of passes will use the environment, I didn't like the idea of spurious copying.

I also noticed again how computationally expensive the identification phase is, but I am not going to change that. It's just one pass and can hopefully easily be rewritten at a later stage.

Tuesday, June 16, 2015

Got Bugs

Well, I have been trying to understand a bug for the last few hours. Somehow, the AST doesn't get updated neither do I build a correct environment. Guess I got some of the semantics of C++ wrong[1][2], but I don't see where.

[1] Riiiight. auto& m:mm (reference) instead of auto m:mm (local value) was one bug. Now the other one.
[2] I couldn't get a local update for environments right. Probably should pass the environment as a pointer instead of a reference but I am not sure. Solved it with something which looks like a copying scheme... Might be bloody expensive. I need a good C++ programmer to work on this project too.

Log 061615

I implemented the module system. It starts of with one parsed top module top, and in one pass eagerly depth-first adds the list of included modules m0, .., mN, top through a closure construction. Semantic analysis is implemented by creating the complete context of declaration first, then doing various passes modulo that complete context. The modules keep track of their state, so at some point I could implement reusing pre-compiled modules.

Other variations could be to build up the context of declarations in left-to-right order during semantic analysis, or to use a cyclic graph representing the inclusion path and include only the local context.

It don't think the other variations are worth it. The left-to-right order is problematic with mutually recursive modules, the cyclic graph doesn't detect name clashes which could occur during linking easily. Moreover, the presented simple manner could allow for concurrent semantic analysis.

But yeah. There's a worst-case scenario that you'll do semantic analysis on a ten-line m0 file in a context of a million declarations.

Life is never easy. Guess as long it is documented, it's fine.

Monday, June 15, 2015

Breaking Up the Design

I've been thinking too functionally for a while. If I am going to have a module system where things may be pre-compiled, have some non-obvious state, and relate to each other it's simply easier to implement that module system in an OOAD manner on top of the passes which can be seen to implement simple "pure" AST transforming functions.

Implementing an object-oriented module system and breaking up the identification phase into separate phases.

Log 061515

So. The idiotic observation is that I can only compile correctly if I assume that all files are typechecked and compiled given all declarations imported in the project. That leads to a somewhat expensive design where the top module drives everything.

Somehow, this trivial observation took me half a week, and I am still not sure how to implement it right; i.e., how to translate this to performant and clear handling of data structures internally.

It simply isn't nice that a simple module needs to be compiled with a context including all the declarations[1], it breaks compositionality one assumes one has[2], in the project but there doesn't seem to be a way out. At least, not an easy way out.

And, at minimum, I want a design which can easily be extended to allow pre-compiled modules to be loaded at some point too. There's that too.

[1] Two modules may mutually recursively import each other's declaration, so there is already a need to import "forward" declarations. The top module may import declarations of modules which "shadow" imported declarations and lead to a name clash during linking.

[2] But it looks like other compilers don't have these compositionality invariants either. If I modify a file half-way through a C project that'll likely give me a core dump. And I can easily program C code such that all files will compile nicely individually but linking will fail. It's just an unsolvable problem.

Saturday, June 13, 2015

Log 061315

Suppose A0 and A1 define "". Suppose B0 imports A0 and B1 imports A1 and use "". Suppose C imports B0 and B1.

I can solve that B0 and B1 bind correctly to "" during typechecking. But then I just shifted the problem to the linking phase where it'll turn out "" gets imported from two different files.

I can also solve that by fully qualifying "" with the file in which it is defined. But then what if A0 and A1 are also named the same but live in two different paths? I would also need to fully qualify each identifier with both the filename as well as the path.

So. I am not going to solve this. Programmers will need to make sure all identifiers for a project live in different namespaces/don't overlap. Everything is solvable but this simply isn't worth it.

It's a feature, not a bug.

Friday, June 12, 2015

Log 061215

Right. Of course, I am going to implement module inclusion in the right manner but I shot myself a bit in the foot choosing namespaces as the guiding principle instead of files. Most compilers (Python, Java, Haskell) treat files as a, or the, guiding principle for grouping declarations. This greatly simplifies a compiler but I chose namespaces, decoupling the file from declarations, because I noticed I personally like it more to group functions which implement a certain functionality in a single file.

Thing is: You can't really decouple in that manner, import directives (inclusion of files) are necessary due to performance reasons and any import directive manipulates scope. So, I now have two directives manipulating the scope of declarations of a file, or module. The import directive and the using directive. I ended up doubling the work. Moreover, the scoping rules I have imply implementing a module system is now a non-trivial exercise and data-structures need to reflect the 'double nature' of scoping.

The other problem now is mutually dependent modules. If module m0 imports module m1, and m1 imports m0, I cannot have a simple identification phase since m1 needs to be identified before it can process m0 and vice-versa. This wasn't a problem in the batch compiler because that treated all included declarations as a flat list.

Going from batch compiler to an interpreter now also implies the identification phase needs to be split such that declarations can be gathered "halfway"; i.e., after syntactic transforms but before any kind of semantic analysis.

Thursday, June 11, 2015

Simple Design Decisions on Batch Mode Compilation

So. Always when I am mucking around I find I didn't think something through; i.e., bothered by underspecification. The problem with the module system is that I haven't decided on how the top-level program should work. I want to support some scheme where I can use the program as both an interpreter as well as a compiler, and that affects the internal organization.

First, the bootstrap compiler: This program behaves akin to a batch compiler as gcc. Each module is compiled individually to "object code" and all modules are "linked" together and output as a C program. (Where in my simple compiler, the object code is the serialized type-checked AST, and the linking phase is simply whole program compilation. But yeah, works.) That's a pretty straightforward manner of compilation, moreover, keeps a number of invariants trivially.

But I want an interpreter first, and a compiler second, which means I want to support invoking the program on one input file, where it includes all other files, and all files are semantically checked and translated in one run. Now you would think you could run the interpreter internally in batch mode on all imported files but, no, it turns out you can easily pollute the declaration space of the import chain of individual files since other modules will have been loaded; it probably is also too expensive to run the interpreter as a compiler in batch mode first.

The easy way out is to live a little and don't give a fuck, of course.

How does Python handle this? I need some solution. Asking the question is answering it, of course.

Wednesday, June 10, 2015

Log 061015

I am mucking around with the module system as to be able to test larger multi-file projects. It's silly, sloppily trying to implement something which took me 20 lines in the bootstrap compiler.

Tuesday, June 9, 2015

Log 060915

After the initial identification phase the next step is semantical analysis. A major part of semantical analysis is kind and type checking, which relies on unification and substitution.

Implementing substitution primitives.

Saturday, June 6, 2015

Log 060615

It took me a while but I'll be dropping the namespace local implicit constant. It solves one particular range of important use-cases well but, once introduced, people would want more.

Still, it's a very important range of use-cases: calculations modulo a local state...

I'll shift it to some later date. Like an 0.5 version.

Thursday, June 4, 2015

Classes, Monads, and Implicits

Since I am programming in C++ my transforms/rewrites are implemented as classes which often have a simple local state. I subsequently wrap these classes into 'pure' simple AST manipulating functions which construct an object of the class, applies the rewrite, and returns the rewritten AST. It works surprisingly well, even though it feels somewhat 'muddy' relying on all kinds of invariants.

In a 'pure' language like Haskell I gather you would normally do this with a monad. Create a transform which works modulo some state wrapped in the monad, and hide the monad by returning the result after the monad was ran to completion. I have never programmed a Haskell monad in my life but somehow the C++ impure solution looks better to me.

I want to do something similar like C++/Haskell in Hi and am thinking of extending the language with namespace local implicits. The idea is that it should work like below, a namespace local implicit is a constant threaded implicitly through the functions in a namespace.

namespace implicit [ 0: int ] (

    using system

    def inc: int =
        this := this + 1; this

    def map_inc: list int -> list int =
        [ nil       -> nil
        | cons x xx -> cons (inc + x) (map_inc xx) ]


The idea would be that on calling a function in the namespace implicit, it is augmented with the implicit constant. A function within the namespace can refer to that constant with the 'this' keyword, and also manipulate the value threaded with something which looks like an assignment (but in it's operational semantics simply means: thread this new value along instead of the old value, although it may be implemented as a variable); every function called from within that namespace receives the (updated) implicit; leaving the namespace stops the threading and the implicit is lost.

Given these semantics, running the above function 'map_inc' on the list '0,1' should deliver the following result '1,3':

    implicit.map_inc (cons 0 (cons 1 nil))
    (cons 1 (cons 3 nil))

It's a non-trivial extension which I am not sure will break the type system, be used, or achieves what I want to mimic in the right manner. But it looks like there is a 'pure' operational semantics for it, so it should be somewhat safe, and it somehow 'feels' right. Now what?

Log 060415

The identification phase, consisting of five or six passes, now passes minimal unit tests. So well, done with that, I guess.

Tuesday, June 2, 2015

Significant Performance Drop

I unit tested an alpha algorithm for the identification phase. It isn't the last pass I need to implement but it's the heart of identification. During the implementation I was already a bit worried about how I implemented it, so I, again, did some performance measurements.

Well. A significant performance drop. The reason? It keeps a flat map of all declarations it can deduce. Every time you enter a namespace, or use a using directive, it searches through that whole map and declares in a local dictionary all declarations which share a certain prefix.

Tested on one large file, you get a whole lot of declarations, and for every namespace and using declaration a linear search. That's a quadratic slow-down I guesstimate. For a 1MB file, 1MB/min instead of 1MB/sec (though it takes 3ms for the simple file I copied into 2000 different namespaces to get to 1MB.)

Performance dropped that significantly that it'll only comfortably compile medium sized programs now. I'll leave it as is, since it's just one pass, it's unlikely people will write large monolithic files with lots of namespaces now, and (probably) solvable by someone implementing a better search tree for namespaces at some point.

It strengthens my belief that all algorithms in a compiler should be linear.

Below, performance measurements on a 1MB sized file.

Output of gprof:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 24.62     27.19    27.19 597938347     0.00     0.00  __gnu_cxx::__atomic_add(int volatile*, int)
 16.92     45.88    18.69 600390363     0.00     0.00  __gnu_cxx::__exchange_and_add(int volatile*, int)
 12.18     59.33    13.45 207564208     0.00     0.00  Ast::tag()
  6.73     66.77     7.44 392894022     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2> const&)
  4.73     71.99     5.22 194555526     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_S_right(std::_Rb_tree_node_base*)
  2.81     75.09     3.10 599838760     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
  1.63     76.89     1.80 597760346     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
  1.58     78.63     1.74 192160000     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_clone_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > 
  1.25     80.01     1.39 600390363     0.00     0.00  __gnu_cxx::__exchange_and_add_dispatch(int*, int)
  1.23     81.37     1.36 597938347     0.00     0.00  __gnu_cxx::__atomic_add_dispatch(int*, int)
  1.21     82.71     1.34 209242230     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::operator->() const
  1.04     83.86     1.15 192000000     0.00     0.00  RewriteIdentify::ast_has_prefix(icu_52::UnicodeString const&, std::shared_ptr<Ast> const&)
  0.86     84.81     0.95 1198328710     0.00     0.00  __gthread_active_p()
  0.76     85.65     0.84 199744948     0.00     0.00  std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::_M_valptr() const
  0.74     86.47     0.82    42017     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
  0.74     87.29     0.82     6005     0.00     0.01  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_copy(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > const*,
 std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
  0.62     87.98     0.69 197855159     0.00     0.00  icu_52::UnicodeString::doCompare(int, int, icu_52::UnicodeString const&, int, int) const
  0.60     88.64     0.66 192160000     0.00     0.00  std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >* std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const,
 std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_create_nod
e<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&>(std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&)
  0.57     89.28     0.64 392894022     0.00     0.00  std::shared_ptr<Ast>::shared_ptr(std::shared_ptr<Ast> const&)
  0.57     89.91     0.63 394982420     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
  0.57     90.53     0.63 597938347     0.00     0.00  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy()
  0.57     91.16     0.63 197855159     0.00     0.00  icu_52::UnicodeString::pinIndices(int&, int&) const
  0.55     91.77     0.61 606783091     0.00     0.00  icu_52::UnicodeString::length() const
  0.55     92.38     0.61 192064000     0.00     0.00  std::_Rb_tree_iterator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::operator++()
  0.48     92.91     0.53 192318000     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_destroy_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > 
  0.45     93.41     0.50 599164355     0.00     0.00  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
  0.45     93.91     0.50 192549131     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:

Heap measurements:

Log 060215


Sunday, May 31, 2015

Log 053115

Well. The circle I am in is: Think random thoughts until I've got a reasonable grasp on all invariants and the compilation scheme I want to support, work up some appetite to translate it to code, code a bit.

It's mind-numbing. Bosch was right, I feel like the bloke above.

I am not sure yet to support a naming scheme which interfaces with C object files right. I need something where if a combinator 'foo' is declared in namespace 'baz.baf' in some file 'qux.hi' it unambiguously 'knows' the symbol name of the C function representing 'foo' such that it will link...

You'ld expect a compiler is all about parsing and typing but, honestly, that's only because these are the trivial aspects academics can concentrate on easily to publish their greeklish.

The mundane world of naming stuff. Tss.

Friday, May 29, 2015

Identification Hell

Hi has a pretty modern syntax. It has namespaces, qualified names, implicitly declared variables, doesn't force you to write -for instance- upper case letters in pattern matches, and uses the 'dot' selector liberally.

The downside is that, preceding the rest of the semantic analysis, it employs an identification pass to disambiguate and clearly identify which (qualified) name is bound to what constant or variable. It flattens namespaces, declares unbound variables, disambiguates expressions with dots in it, etc. After the identification pass, the compiler proceeds like you would expect for a far simpler language.

It leads to a syntax I find very pretty but there are disadvantages to this approach: For the user, typos may be translated to unexpected compiler error messages. For the implementor, it's hellish to implement.

So, for the next few weeks I'll be stuck in identification hell. It's really something I would rather not but, yeah, I don't want to give up on my pretty little language.

Sunday, May 24, 2015

Log 052415

Another simple pass implemented, all declarations now become fully qualified within their namespace. The next pass is the first real "big" work: identification where all applied identifiers become bound to their defining occurrences.

To be safe, I did some profiling again on a big file.

-rw-rw-r--. 1 marco marco 1298048 May 24 21:19 big.hi

It takes 1.3 seconds to parse, process, and print. Still a steady 1MB/s.

Output from gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  9.88      0.08     0.08  4736519     0.00     0.00  __gnu_cxx::__atomic_add(int volatile*, int)
  4.94      0.12     0.04  5241745     0.00     0.00  __gnu_cxx::__exchange_and_add(int volatile*, int)
  3.70      0.15     0.03  9536286     0.00     0.00  icu_52::UnicodeString::isBogus() const
  3.70      0.18     0.03  3486210     0.00     0.00  TokenVector::look(unsigned int)
  3.09      0.21     0.03 18877218     0.00     0.00  icu_52::UnicodeString::length() const
  2.47      0.23     0.02  5010563     0.00     0.00  Token::Token(Token const&)
  2.47      0.25     0.02  4734982     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
  2.47      0.27     0.02  3486210     0.00     0.00  std::vector<Token, std::allocator<Token> >::operator[](unsigned long)
  2.47      0.29     0.02  2270618     0.00     0.00  std::vector<std::shared_ptr<Ast>, std::allocator<std::shared_ptr<Ast> > >::begin() const
  2.47      0.31     0.02  1483420     0.00     0.00  std::_Vector_base<std::shared_ptr<Ast>, std::allocator<std::shared_ptr<Ast> > >::~_Vector_base()
  2.47      0.33     0.02   366209     0.00     0.00  Token::Token(token_t, Position const&, icu_52::UnicodeString const&)
  2.47      0.35     0.02   267648     0.00     0.00  adjust_reserved(Token&&)
  2.47      0.37     0.02    84864     0.00     0.00  Parser::is_qualified_prefix_operator()
  1.85      0.38     0.02  8336922     0.00     0.00  Position::Position(Position const&)
  1.85      0.40     0.02  7964955     0.00     0.00  Position::~Position()
  1.85      0.41     0.02  4374018     0.00     0.00  std::shared_ptr<Ast>::shared_ptr(std::shared_ptr<Ast> const&)
  1.85      0.43     0.02  1587171     0.00     0.00  void std::_Destroy<std::shared_ptr<Ast>*>(std::shared_ptr<Ast>*, std::shared_ptr<Ast>*)
  1.23      0.44     0.01  6489602     0.00     0.00  StringCharReader::end()
  1.23      0.45     0.01  6013970     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
  1.23      0.46     0.01  5666315     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
  1.23      0.47     0.01  5421190     0.00     0.00  std::shared_ptr<Ast>::~shared_ptr()
  1.23      0.48     0.01  3486311     0.00     0.00  std::vector<Token, std::allocator<Token> >::size() const
  1.23      0.49     0.01  2441729     0.00     0.00  Token::tag() const

Looks good, around 15-20% overhead for reference counting.

Heap usage output from valgrind:

Nothing I want to change either. Looks like some spikes from lexing, parsing, and the few passes. I wonder whether libicu retains string data but, if so, nothing I want to do about that.

Addendum: Found the line still a bit puzzling. Did another leak check:

==23682== HEAP SUMMARY:
==23682==     in use at exit: 3,804 bytes in 5 blocks
==23682==   total heap usage: 9,992 allocs, 9,987 frees, 1,670,914 bytes allocated
==23682== LEAK SUMMARY:
==23682==    definitely lost: 0 bytes in 0 blocks
==23682==    indirectly lost: 0 bytes in 0 blocks
==23682==      possibly lost: 0 bytes in 0 blocks
==23682==    still reachable: 3,804 bytes in 5 blocks
==23682==         suppressed: 0 bytes in 0 blocks
==23682== Rerun with --leak-check=full to see details of leaked memory
==23682== For counts of detected and suppressed errors, rerun with: -v
==23682== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Nothing to worry about?


I implemented another pass but now have doubts about whether I am tackling the problem right. These doubt stem from switching from a declarative description to implementing the compiler impurely in C++ and the transform I implemented.

For the bootstrap compiler, I was forced to reason within a declarative pure framework. Because a compiler is bookkeeping plus term rewriting, I more or less naturally decided to shift as much of the work to term rewriting as I could because of the lack of imperative manners of bookkeeping. The compiler rewrites the term, hoists a number of local definitions to the top level, and the only minimal bookkeeping it does is keeping an environment table around to look up the definitions of applied identifiers, it furthermore uses an impure counter to create fresh variables.

This is a fine and robust manner when working declaratively. All information is neatly represented in the tree and each pass can be visually inspected; the environment can be, and is, constructed on the fly when needed.

But I wonder whether I will end up with a rather odd beast when implemented impurely. I could also implement it more imperatively and shift work back to bookkeeping. People from the C++ world will probably find it a very odd compiler that it doesn't do the bookkeeping directly by impurely creating tables with references around to the various constructs in the tree and rewriting the tree in-situ.

The second doubt is whether I got the current transform right. I really appreciate minimalism but in a flurry of code I shifted from a push model to a pull model as basis for rewriting the tree. I.e., in the bootstrap compiler a typical transform looks like this:

    [ decl_space po c id aa dd -> 
        (rr0, e) = transform_child transform_metintro bindings.empty e;
        (rr0, e) = add_bindings rr0 e;
    | part po c id im dd -> 
        (rr0, e) = transform_child transform_metintro bindings.empty e;
        (rr0, e) = add_bindings rr0 e;
    | decl_interface po c id aa t k -> 
        rr0 = reduce_method_constraint id t bindings.empty;
            (bindings.union rr rr0, e)
    | e -> (rr,e) ]

It handles some special cases and 'pushes' the rewriting process further by 'asking' the transform to rewrite a node and its children.

In C++ I now have rewrites like this:

class RewriteBind: public Rewrite {
    RewriteBind(): _state(AstEmpty().clone()) {

    // state manipulation
    void set_state(const AstPtr& a) {
        _state = a;

    // rewrites
    AstPtr rewrite_decl_type(const Position& p, const AstPtr& n, const AstPtr& t) override {
        auto t0 = rewrite(t);
        return AstDeclType(p, n, t0).clone();

    // cuts
    AstPtr rewrite_decl_definition(const Position& p, const AstPtr& n, const AstPtr& t, const AstPtr& e) override {
        return AstDeclDefinition(p, n, t, e).clone(); // cut

    AstPtr _state;

The transform rewrites everything and it is manually directed with various cut-offs where not to continue rewriting. The drawback is a lot of cuts, I somewhere prefer the original solution. I guess nobody looks at the source code of a compiler so I'll guess I'll stick to it. But doubts.

Saturday, May 23, 2015

Log 052315

Right. I implemented the second transform. It took some pondering but, as it turned out, it wasn't hard once I got the invariants, there were little, right. Because I dumbed down the language, the pass only needs to collect the quantified variables it encounters in a local state, determine new record declarations at each record statement rewrite, and pass the new constructed declarations to the top level.

The subsequent implementation of the pass is somewhat verbose, but somewhere much simpler than in the bootstrap. Just a class which implements a rewrite and has as local state a variable list and a set of declarations. I didn't need the nested context yet.

I am inclined to say that imperative wins but I am near certain the pass in the bootstrap can be rewritten more concisely. Must have been due to trying to use the transform function I had just invented, the changes I made during the development, and the more elaborate type system I was unsure I would, or could, support at the time of writing.

Legacy code.

Ah well.

The Battle: Declarative versus Imperative

While my pet project is rapidly deteriorating to a project for a good C++ programmer, which I am not, I still need to implement the next pass.

I implemented a general imperative construct which allows me to keep track of bindings. I did it in the straightforward manner in which most impure compilers keep track of an environment: A nested set of declarations. Basically, enter a scope, declare some (unique) variables, pop the scope. I decided not to worry too much about performance issues anymore, there are lots of cycles still to waste and memory is cheap.

It's implemented as a stack (vector) of maps from AST nodes to AST nodes. Entering a scope costs 600 bytes, something which I would have found appalling twenty years ago. Now I guess it's fine. Some testing showed that C++ easily handles scoping depths of 100,000.

The question becomes how to implement the pass below. It does something trivial, it lifts the constructors in a type declaration to typed declarations in the context. E.g., in a type 'type list = \a => [ nil | cons a (list a)]' the record 'cons' becomes a function 'def cons: !a => a -> list a -> list a = [ x, y -> Cons x y]'.

It's a purely syntactic transformation which, I am not entirely content with it, precedes the rest of the semantical analysis. As you can read yourself, it's a transform which passes a map of bindings around. Now to do it imperatively...

namespace records (

    using system

    using list

    using position

    using ast

    using transform

    using bindings

    def transform_rec_search: ast -> transform bindings =
        type_arr = [ t0, t1 -> type_arrow (pos t0) t0 t1 ];
        type_app = [ t0, t1 -> type_application (pos t0) t0 t1 ];
        search_child = 
            [ t, p, e -> 
                transform_child (transform_rec_search t) p e ];
        record_update_type =
            [ f, decl_record p c id aa i t e  -> 
                decl_record p c id aa i (f t) e
            | f, e -> e ];
        to_record_type = fix
            [ to_record_type, nil, t -> t
            | to_record_type, cons t0 tt, t -> 
                type_arr t0 (to_record_type tt t) ];
        record_abstraction =
            [ nil, e -> e
            | cons v vv, e -> 
                p = pos v;
                e = expr_match p (cons v vv) (expr_empty p) e;
                e = expr_abstraction p e;
                e ];
        [ t, rr, e -> 
        [ type_record po e0 t0 tt -> 
            ta = to_record_type tt t;
            type_to_var = [e -> fresh.fresh_expr_variable (pos e)];
            vv = type_to_var tt;
            nn = var_to_name vv;
            r  = record_abstraction vv (expr_record po e0 t0 nn);
            d  = decl_record po (expr_empty po) e0 nil t0 ta r;
            (bindings.insert (txt e0) d rr,e)

        | type_universal po t0 t1 -> 
            (rr0,_) = search_child t bindings.empty e;
            rr0 = map.apply 
                (record_update_type [e -> type_universal po t0 e])
            (bindings.union rr rr0, e)

        | type_abstraction po t0 t1 -> 
            (rr0,_) = search_child (type_app t (var_to_name t0)) 
                    bindings.empty e;
            rr0 = map.apply 
                (record_update_type [e -> type_universal po t0 e])
            (bindings.union rr rr0, e)
        | e -> search_child t rr e ] e ]

    def transform_recintro: transform bindings =
        bindings.to_records =
            [ rr -> [(n,e) -> e] (map.to_list rr) ];
        add_bindings =
            [ rr, decl_space po c id aa dd -> 
                (bindings.empty, decl_space po c id aa 
                    (dd ++ bindings.to_records rr))
            | rr, part po c id im dd -> 
                (bindings.empty, part po c id im
                    (dd ++ bindings.to_records rr))
            | rr, e   -> (rr,e) ];
        [ rr, e -> 
        [ decl_space po c id aa dd -> 
            (rr0, e) = transform_child transform_recintro 
                bindings.empty e;
            (rr0, e) = add_bindings rr0 e;
        | part po c id im dd -> 
            (rr0, e) = transform_child transform_recintro
                bindings.empty e;
            (rr0, e) = add_bindings rr0 e;
        | decl_type po c id aa t k ->
            transform_rec_search id rr e
        | e   -> (rr,e) ] e ]

    def ast_record_intro: ast -> ast =
        transform_ast transform_recintro bindings.empty


Friday, May 22, 2015

Log 052215

Implemented a general compare function on the AST pointers. Did some memory profiling. Valgrind is my best friend. For future reference:

valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out.

I really have to get used to how much memory is allocated these days. Printing an AST:

-rw-rw-r--. 1 marco marco 2596096 May 22 20:26 test0.hi

Right,source code of 2,596,096 bytes becomes 279,101,440 bytes heap usage; 2.6 seconds to print. Ah well, at least I don't expect Moore's law to end soon when it comes to memory.

Plotted (x-axis is sample number over time, the 2.6 seconds), it becomes:

Right. Two spikes I don't get. Possibly it copies the entire vector of declarations while rewriting it. Nothing I want to solve at the moment.

The steady climb is a bit worrying, but valgrind reports no memory leaks. Moreover, I am pretty sure I do everything right, it copies pointers a lot while visiting the AST, a lot of refcounting but no 'real' pressure on the garbage collector. Probably due to that the AST remains in memory until the program exits, or C's memory allocator 'gives up' upto the next garbage collection point (yes, C does garbage collection too); yeah well, RAII...

From Toy to Tool

The first transform I implemented is also the simplest. I've no idea how I did it but, after all the fooling around, I now find myself implementing a formal spec, the bootstrap compiler.

A compiler is bookkeeping plus term rewriting. Everything boils down to how you're going to treat the information flow while rewriting a tree.

              ↓:T | ↑:T
        ↙:T / ↗:T   ↘:T \ ↖:T
       TERM0             TERM1

Most, if not all, passes implemented with functional transforms in the bootstrap pass information around, in a pure fashion using functional data structures. Functional data structures are great when working declaratively; there's no shared state so all invariants you want remain in place when passing them around.

Despite everything, using functional data structures often isn't as fast as proponents claim it is. It's nice if people can write Lisp compilers in a nanosecond but, usually, everything breaks down when you throw large files at a compiler written in that manner.

A scenario as I imagine the life time of a Lisp compiler: It starts of representing all declarations in a program with a list. Given more than 5K declarations it hits the first performance problem so that is replaced with a red-black tree. Given more than 40K declarations it hits the next bottleneck so that becomes a hash map. Given more than 150K declarations, new issues, and that becomes a data structure which pushes and pops ranges of hash maps. Given more than 500K declarations that breaks down, it ends with a fast imperative data structure where memory is allocated by hand.

Despite the fact that I fully appreciate the interpreter will probably never be used, I want fast imperative data structures. Ah man can dream, right?

Therefore, the next step, before writing the next passes, is thinking about how to get various imperative data structures right.

Thursday, May 21, 2015

Log 052115 - b

Mightily pleased with myself. My first transform on the AST. Lightning speed.

class RewriteBind: public Rewrite<AstPtr> {
    RewriteBind(): Rewrite(AstEmpty().clone()) {

    void clear_state() {

    AstPtr rewrite_decl_type(const Position& p, const AstPtr& n, const AstPtr& t) {
        auto t0 = rewrite(t);
        return AstDeclType(p, n, t0).clone();

    AstPtr rewrite_expr_record(const Position& p, const AstPtr& f, const AstPtr& i, const AstPtrs& aa) {
        auto i0 = get_state();
        return AstExprRecord(p, f, i0, aa).clone();

    AstPtr rewrite_decl_interface(const Position& p, const AstPtr& n, const AstPtr& t) {
        auto t0 = rewrite(t);
        return AstDeclInterface(p, n, t0).clone();

    AstPtr rewrite_decl_instance(const Position& p, const AstPtr& pr, const AstPtr& t, const AstPtr& b) {
        auto b0 = rewrite(b);
        return AstDeclInstance(p, pr, t, b0).clone();

    AstPtr rewrite_decl_method(const Position& p, const AstPtr& n, const AstPtr& i, const AstPtr& t, const AstPtr& e) {
        auto i0 = get_state();
        return AstDeclMethod(p, n, i0, t, e).clone();

    AstPtr rewrite_decl_definition(const Position& p, const AstPtr& n, const AstPtr& t, const AstPtr& e) {
        return AstDeclDefinition(p, n, t, e).clone(); // cut

Log 052115 - a

This is the complete definition of the first of the many small passes the bootstrap performs. And it has been five years since I worked on it; which is a good thing, because I needed to forget all this to even get the appetite to start all over in that verbose, odd, dialect of a language called C++. And I am baffled how I managed to make it that concise and I wouldn't know yet how to translate it into efficient, mostly imperative, C++ code. At the moment, I can only estimate what it does.

I have no idea what I was smoking in that period of my life or how I ever got the bootstrap functioning?

namespace bind (

    using system

    using position

    using ast

    using transform

    def find_constructor_name: ast -> ast =
        [ type_universal p t0 t1    -> find_constructor_name t1
        | type_abstraction p t0 t1  -> find_constructor_name t1
        | type_constrain p t0 t1 t2 -> find_constructor_name t2
        | type_instance p t0 t1 ee  -> find_constructor_name t1
        | type_application p t0 t1  -> find_constructor_name t0
        | t -> t ]

    def reshape_bind: reshape ast =
        [ i0, type_record  p  i t _e -> 
            type_record p i (set_pos p i0) _e
        | i0, decl_method p c id aa i t _e -> 
            decl_method p c id aa (set_pos p i0) t _e
        | i0, decl_bind p c id aa i t _e -> 
            decl_bind p c id aa (set_pos p i0) t _e
        | i0, e -> reshape_child reshape_bind i0 e ]

    def reshape_bind_search: reshape unit =
        [ u, e ->
        [ decl_space p c id aa dd -> 
            reshape_child reshape_bind_search u e
        | part p c id im dd -> 
            reshape_child reshape_bind_search u e
        | decl_interface p c id aa t k -> 
            reshape_child reshape_bind id e
        | decl_type p c id aa t k -> 
            reshape_child reshape_bind id e
        | decl_instance p c aa t -> 
            reshape_child reshape_bind (find_constructor_name t) e
        | e   -> e ] e ]

    def ast_bind: ast -> ast =
        [ e -> reshape_bind_search nop e ]


C++ is a Bloody Fine Functional Language

Well. I am having great fun, C++ is a bloody fine language; if it isn't a fine functional language, it is a fine language for implementing compilers. I implemented my first transform. I didn't opt for the visitor pattern, or a new solution by Stroustrup et. al. (too unstable), and I don't regret it one bit. Transforms are easily expressed, although be it a bit verbose.

The original bootstrap is implemented with one transform which both passes information around as well as rewrites; all other transforms are derived from that. Very heavy on overhead, but the bootstrap was supposed to be correct and robust, not fast. In the C++ compiler, I'll write out various manner of visiting the AST by hand.

I tested a transform, based on a template class with a shared state, which simply counts the number of nodes and, by transforming, copies the entire graph. The overhead of reference counting somehow dropped to negligible, no idea why. If I assume 1MB/s overhead for parsing, a copy of an entire graph on a 1MB source is roughly .3 seconds. More than adequate. I am laughing my head off.

The sharing solution might beat copying in the end. I now recommend it.

Wednesday, May 20, 2015

Log 052015

So. The parser is finished but I found a bug in infix parsing. I'll need to rethink it once, I probably need some scheme where I mimic an attribute grammar solution by passing a 'bottom' precedence operator in to the infix parsing rule. Ah well. There's a lot of work, it would be better if I just leave it as it is. For version 0.2?

I patched some rendering routines together to support a crude view on the AST. Nothing too elaborate, but it works for now and I expect to be debugging a lot.

It roughly parses 1MB/s.

I would really enjoy some memory usage statistics. Naive compilers usually waste memory on the internal representation and this compiler is pretty naive, but C++. Ah well, nobody is interested in declarative languages anyway, I guess.

Tuesday, May 19, 2015

Log 051915

C++ has fine tools. With valgrind I looked at memory leaks; none so far, RAII works. With gprof I debugged a performance issue while printing ridiculously large series of declarations. If you don't stick to the C++ manner of iterating over a vector, performance degrades because somewhere internally it a) holds a list of buckets to store all elements and iterating in the C manner searches for the bucket each step, b) it insists in allocating an extra object for each lookup. I don't understand the latter but after some tinkering using C++11's manner of iterating solved the problem. Printing performance dropped from 11 to 3 seconds on a large file.

It spends 40% of the time reference counting while printing which is a good indication of the overhead while rewriting terms. If a pointer comes into scope it needs to increase the reference count, if it leaves a scope it needs to decrease it; there's nothing to be done about it, and it will be like this for all rewriting functions.

After the parser recognized a small subset of the language, definitions and expressions, I now am expanding to the full language. The AST explodes, every alternative becomes about 50 lines instead of one. It's inconvenient and a lot of work.

The real hurdle will be whether I can express transforms concisely in C++. Well, it won't be very concise, pattern matching must be replaced by tedious downcasting and picking a record apart through accessor methods. I'll probably write some macro's for that, even if I find programming with macros bad practice. But I need some generic solution.

I need a transform which supports this:

               ↓:T | ↑:T
          ↙:T / ↗:T  ↘:T \ ↖:T
       TERM0              TERM1

The arrows denote the information flow around the terms during a transform where a term may also be rewritten. By fiddling with the information flow locally in each rewrite step it's not very difficult to implement semantic analysis and by fiddling with the expressions returned it is not difficult to implement rewrite passes. It more or less generalizes attribute grammar approaches to compiler construction.

The transform should be a generic function which leaves information flow and terms intact, unless you override certain cases. In that manner, without spelling out all functionality for each leave, you end up with concise definitions of passes, albeit that the transform may not be very fast.

Well, that's it. There's progress but still two to three years of work left. I fully understand why people implement untyped scripting languages, it's just so much easier.

ADDENDUM: I am nearly finished with the parser. I severely restricted the input grammar for types; it used to have a liberal free-flow grammar where pretty absurd types would be accepted since I was rushing the compiler to a bootstrap. This time, since I know I have a compilation scheme, the type system will go first. So, no explicit forall constructs, no predicates on type synonyms, etc. Dumbing down continues.

Sunday, May 17, 2015

Term Rewriting with Shared Pointers is a Major Design Decision

A compiler is bookkeeping plus term rewriting. Old compilers were usually full of bookkeeping through the use of tables. They needed to be since resources were scarce and performance was low. As processors evolved and became more powerful, compilers became more declarative and shifted in the direction of term rewriting.

Hi is a declarative language, so naturally, the bootstrap relies little on tables but simply rewrites, massages, the term until it is semantically verified and code can easily be output in a number of steps.

The bootstrap lacked in performance, due to:
  1. Wasteful memory representation of the data being worked upon.
  2. A slow operational model, combinator rewriting instead of the C calling convention.
  3. Lack of imperative constructs.
  4. Lack of fast impure containers.
  5. Garbage collection.
I could have solved 1., made an effort to fix 2., added 3., and ultimately added 4. (but that only becomes necessary, probably, when processing large, >1MLoc, files) and, in a final last effort, implement features to do 5. fast.

The path of a selfhosting declarative compiler seems to be that you keep on adding imperative features until you've reinvented C. I personally loath the Haskell IO monad and barely agree with Clean's uniqueness types. For Hi, it isn't worth it: The language is fine for small programs, making it fast and selfhosting is a 20-100 man year effort, which I don't have, can fail anyway, and I want to keep the language simple and declarative.

Most people agree we hit the end of Moore's Law, the era of the free lunch is over. We had a 3500x speedup over the last decades since the introduction of the first microprocessor, declarative languages cannot wait anymore for another few decades until they become that fast that people want to use them.

So, I decided to ditch the effort and do it all in C++. Fast interpreter, slow language. If it would ever attract a user base, there are plenty of optimizations left.

But since I have a term-rewriting self-hosting compiler as a design reference, naturally the choice for representing terms comes up. Since I rewrite a tree, or DAG, I basically have two choices: use trees and copy, or use DAGS and support sharing with some form of garbage collection. So, it's either copying or sharing. Both have nonperforming corner cases: trees are fast but you might spend too much time copying, for reference counting the overhead may become large on terms when only rewriting trees and the reference counting degrades to pure overhead.

I made the choice to represent the AST with C++'s shared pointers. After having looked at GCC, which uses a pointer-to-massive-struct copying system internally, I wanted to test performance of a sharing solution on a 500KLoc file, a large list of small declarations, with respect to copying. It spends 40% of its time reference counting.

For industrial compilers this would imply that copying beats sharing; at least, pending the implementation of C++'s shared pointer.

It does 30ms over a small file of 5k declarations, so I'll stick to this scheme of shared pointers. It's just too convenient implementation wise, if a node gets rewritten or falls out of scope in a C routine, it is automagically garbage collected. But boy, oh boy, it is horrific to pay performance for design decisions this early in the implementation stage.

Saturday, May 16, 2015

Log 051605

Working on the AST, parsing, and rendering code simultaneously. Verbose, but after I sank my teeth in it, not unpleasant. Dijkstra would hate the manner in which I write code. Ah well, he turned out to be somewhat of an idiot when it comes to software engineering anyway. So, whatever.

Later that day: *poof* After a flurry of activity I managed to get a rudimentary grammar recognizer ready. I immediately did some performance tests. On small files, fine; on a 500KLoc file, not so good. Somehow parsing took 11 seconds but rendering becomes slower as the file grows larger and I broke that off. I blame smart pointers for now.

Ah well. Guess if anyone will be using it it'll take a long while before people start writing 500KLoc files.

Tuesday, May 12, 2015

Log 051215

A bit of reading while thinking about qualified names and namespaces. Stuff I noticed. Chicken Scheme has a problem with running out of stack space too. There is a nice language named Nim under development, glancing through it, it seems to mostly be an Algol like derivation. But fast and bootstrapped, I like it. Haskell allocates large objects in chunks to keep GC happy. If you compile to straightforward C it's actually quite difficult to keep the GC manager happy due to the fact that at any moment the C routine might ask for memory, which might trigger a collection, and ruin all invariants in the C routine. Using C reference counting completely avoids that problem. (Or you use bytecode, as in Clean, OCaml and Java, and avoid the problem by keeping the complete machine abstract.)

There is a small but major design decision in the Ast representation. Lists or vectors? Lists allow for sharing and a declarative style, vectors might be faster but at the expense of ugly imperative code and copying. After some dabbling I threw up a coin and decided vectors.

I continue with my streak of shooting myself in the foot and am going to implement namespaces no matter what. Though it's academically uninteresting and pretty annoying to get right algorithmically.

I am leaning towards dropping the combinator rewriting system... (But what about OS's where the stack size is a hard limit? No MacOS?).

Eager. Pure. No GC. No combinator rewriting. HM inference. No bootstrapping. No cyclic structures. C semantics. It'll be completely academically uninteresting. But hopefully a great small tool for patching small functional programs. A front-end to LLVM.

(As a side note: Because of the end of Moore's law I expect the next thing to be vertical integration which means Hardware OS's; or since OS's are written in C/C++, language integration. All current research paywalled, unfortunately.)

Sunday, May 10, 2015

Log 051015

So you thought you could do term rewriting in C++, ey? I got the idiom for representing ASTs in C++ figured out. Of course, the idiom is well-known, it's an instantiation of this, but it took me a while of struggling with the template library and some googling to get right.

Basically, terms in the datatype derive from an abstract class. Since an abstract class cannot be instantiated, C++ more or less forces you to pass references back as the result of functions. Moreover, since the abstract class doesn't know how to copy objects from the stack to the heap, the solution is to write a copying constructor and clone method for each variant.

This is a lot of cruft, the source code is going to explode on it. It's a very heavy price to pay for not being willing to implement the interpreter/compiler in a functional language such as SML.

But yeah. I chose C++ because reasons.

Friday, May 8, 2015

Log 050915

Moving stuff around in the sources, trying to get an idiom right on how to handle shared pointers in C++, adjusting a provisional lexer such that it will scan a test language, rethinking qualified names and namespaces, thinking about how to represent the parser right in C++. Unmotivating tidbits of engineering.

Type checker in the back of my mind. I am rethinking the ideas I had five years ago. I re-realize that there probably is no subtyping relation on types. Any concrete type can implement multiple interfaces at some given point, and that may be extended. Once extended, all old type derivations remain unchanged. Simply put: any concrete type just gets augmented with a list of interfaces and simple HM inferencing is enough. (Though it would be nice to put arbitrary values sharing interfaces in a container.) Moreover, since the language lacks assignment, there shouldn't be the need for a monomorphism restriction a la ML. Of course, you might want to store stuff at the module level but I would be perfectly happy labeling primitive routines for that 'unsafe'.

Since the language has exceptions I actually think I want RAII too. Which means 'destructors' on garbage collected nodes referring open files, database connections, etc. Which means I am inclined to drop my copying garbage collector and simply reference count everything. Even more wasteful but I can use C++ shared pointers for that and get thread safety for free. Moreover, good enough since the language guarantees that you're rewriting a DAG anyway.

It's just a trivial language. The lack of use cases means I shift between the semantics. If I read on a programmer's language I want to go into the direction of representing bytes, words, longs, etc.; if I read up on scripting languages my mood changes and I want to go for BigInts. Whatever, or not?

Log 050815

So no programming today. Family business, which was a good idea since I am easily mentally worn-out these days. But it remains in my mind.

So, a few random thoughts:

Trading robustness for speed. I rewrite everything to an eager manner of rewriting combinators to avoid stack overflows. I am not sure people willing to use the language will appreciate the, rather enormous, cost in running speed. If I am going to compile to LLVM, or some other JIT since LLVM seems to have performance problems, it becomes 'trivial' to encode to standard function calls. Moreover, a rather large number of optimizations would become feasible, or could easily be done by the LLVM compiler.

But you sacrifice robustness, and after a myriad of stack overflows bootstrapping in OCaml I don't want to go back.

As a side note, I also figured out how to handle closures in that scenario. The function knows when to reduce or push a closure so that's trivial. Exception handling is also enabled by default in the modern processor ABIs so I figure I don't even need to do difficult stuff for that.

Who will ever use this language? This is really a difficult question. FP enthusiasts? They already have a large number of compilers to chose from. SML or Haskell probably will gobble all of them up. I thought numeric computing for a moment, but those favor simple algorithmic languages. I read up on Julia, these programmers think fundamentally different than functional programmers and want optimal speed. I thought about web servers, which may be a target for some people, in which case I need to think about handling threads and shared state. Then I thought that it might be nice to write simple robust GUI applications; but who would choose my language over Python? I need a convincing use case. Some place where you need ridiculously robust software?

But I still like the language.

Compiling to Object files and C/C++ integration. Given a JIT compiling to .o files is automated for you, which is something I really want. But how the hell does the linker know about namespaces? Is there some name mangling going on I don't know about? Answer here, it seems.

Thursday, May 7, 2015

Log 050715

So combinator parsing in C++ turns out to be a bad idea, not only is it near impossible to get right most C++ implementations need several seconds for large files whereas the 'dumb' method usually finishes in milliseconds, needed to unlearn the functional style. Back to straight-forward recursive descent with switches on the current token.

I think I got how to write a far more trivial type checker than what I once started on. Straightforward Hindley-Milner with a subtyping relation on types implementing interfaces. It shouldn't be hard, I am also dropping all other ideas I had. Bottom-up HM+subtyping is good enough.

I've got a start of an AST description now going. Going to reference count everything through the use of C++ smart pointers; that's a bit overkill but it beats thinking about copying.

For downcasting it's going to be tag based switches followed by a static cast.

Everything is of course far more verbose than the bootstrap which I am using as a reference implementation.

So far so good. Making progress but this is probably still the easy part.

ADDENDUM: I thought I would share the AST solution I chose for people who might have ended up here by chance.

1) Define tags for your expressions:

typedef enum {
} expr_t;

2) Define a base object:

class Expr;
typedef std::shared_ptr<Expr> ExprPtr;

class Expr {
     Expr(expr_t t): _tag(t) {};

     expr_t  tag() { return _tag; }

     virtual ExprPtr clone() = 0;
     expr_t _tag;

3) Derive variants, for instance

class ExprApplication: public Expr {
      ExprApplication(ExprPtr e0, ExprPtr e1): Expr(APPLICATION), _left(e0), _right(e1) {}
      ExprApplication(const ExprApplication& e) {
          _left = e._left;
          _right = e._right;

      ExprPtr clone() {
          return ExprPtr(new ExprApplication(*this));

      // etc.

    ExprPtr _left;
    ExprPtr _right;

It took a lot of googling but the idiom for using shared pointers where you mostly only reference the base class is aptly described here.

Well, that's it. Don't create cycles, as long as your term is a DAG, most likely it will always be a tree, this scheme should work with RAII and might even allow for memory-compact solutions far better than simply copying everything.

Wednesday, May 6, 2015

I mean like: Wow

I've been reading the entire Internet trying to track down how people write compilers, more precisely, parsers and abstract transformers in C/C++.

One post claimed the roughly the following: Compiler construction is a well-defined and finished trivial field where grammar theory is fully developed and compiler writers know very well how to implement the various aspects of compilers.

Then I read various sources, even GCC and CLang. The parsing compiler theory, roughly described as: bottom-up parsing is the most general and serves all your needs, has been abandoned for hand-written PEG recursive descent parsers with infinite lookahead; moreover, all pretty LALR and other compiler generation tools have been abandoned and rot away. Nobody knows what the best manner of representing an AST in C++ is and people patch the weirdest solutions together to write recursive descent parsers. I haven't even started on semantic analysis.

If all problems are solved then all the code I just read doesn't show it.

Monday, May 4, 2015

Term Rewriting using the Visitor pattern ist Verboten

I've been looking into what the correct manner of downcasting is in C++. Unfortunately, downcasting is only allowed on pointer or reference types and then in a manner I don't really like yet.

Still, I need it. Rewriting a term is best done functionally for various reasons. If you cannot imagine why, and I've seen a lot of people advising against it, and you propose a visitor pattern then you're an idiot.

Not convinced you're an idiot? Even Stroustrup agrees on that.

Which is the problem with the Internet these days. It used to be that I could google or post a question to a mailing list and get an informed answer within a few hours. These days, with widespread programming education, the Internet is full of idiots and bad advice.

So I am not sure what the proper manner of downcasting terms in C++ is. Another small puzzle.

Sunday, May 3, 2015

Rewriting Terms in C++?

Given that I am now writing a compiler in C++ the question of term rewriting and following RAII comes up. A typical scenario following RAII is given below.

Say your simple term language has elements like this:

class AppTerm: public Term {
     AppTerm(const Term &l, const Term &r): _left(l), _right(r) {};

     Term left() {
         return _left;

     Term right() {
          return _right;
     Term _left;
     Term _right;

Then, since most compilers do a lot of term rewriting, you'll have an enormous amount of code like:

Term rewrite_app(const AppTerm &t) {
     SomeResult r0 = rewrite_term(t.left());
     SomeResult r1 = rewrite_term(t.right());

     // do something

     return AppTerm(r0.value(), r1.value());

If I get C semantics right, then r0 and r1 will go out of scope and the resulting term will receive copies of whatever was constructed in the rewrite of both constituents of an application term.

I say it again: copies. Think about that for a moment. Copies.

If every copying rewrite function is going to copy entire terms for each intermediate step, as I think it will do when implemented naively, you're going to have one hell slow compiler.

Did I get C++ semantics right? The only way out is to use smart pointers it seems.