I like the prover in my bootstrap but, yeah, there's algorithm W too. So, I decided to go for that. Gone are the bisimulation proofs of type equivalences but I simply don't know what I am dropping there and can't care to figure it out. At least with W I know it has been proven correct, terminates, and gives reasonable performance on small terms.
Which also implies that I'll need to figure out how to do constraint checking with W. But, in some sense, W already checks constraints by providing a proof, a most general unifier, for types.
So. How hard can it be? Right...
Wednesday, June 24, 2015
Sunday, June 21, 2015
Log 062115
Made a few small changes to the AST. Implementing a number of routines ahead of time until I figure out what to do with the type checker.
The type checker, as implemented in the bootstrap, is okay-ish. Not too shabby. The major difference between it and other functional type checker is that it records substitutions as type equivalences. I probably could speed it up already by factoring out different propositions. Moreover, I decided against using the gathered type equivalences to attribute the tree; I could do without recording substitutions.
Still, there's this thing about being able to typecheck cyclic types. Suppose, you have a type definition which unravels to the following:
t
/ \
/\
t
/ \
/\
.. ..
The type checker I now have "records" substitutions. It should therefore be able to prove type equivalences like the above by stating an equivalence, unfolding the type once, and using the postulated type. (Conceptually, the second "t" in the picture above becomes a cycle pointing to the first "t'.)
But that doesn't seem to be a problem in practice. To prove equivalences ML compilers simply unravel types a few times and use the definition to "close the cycle".
Ah well. First kindification I guess.
The type checker, as implemented in the bootstrap, is okay-ish. Not too shabby. The major difference between it and other functional type checker is that it records substitutions as type equivalences. I probably could speed it up already by factoring out different propositions. Moreover, I decided against using the gathered type equivalences to attribute the tree; I could do without recording substitutions.
Still, there's this thing about being able to typecheck cyclic types. Suppose, you have a type definition which unravels to the following:
t
/ \
/\
t
/ \
/\
.. ..
The type checker I now have "records" substitutions. It should therefore be able to prove type equivalences like the above by stating an equivalence, unfolding the type once, and using the postulated type. (Conceptually, the second "t" in the picture above becomes a cycle pointing to the first "t'.)
But that doesn't seem to be a problem in practice. To prove equivalences ML compilers simply unravel types a few times and use the definition to "close the cycle".
Ah well. First kindification I guess.
Thursday, June 18, 2015
A Small Derivation
I completely forgot what my prover does; I rushed it anyway. So, I worked out a rough example by hand below to get some grasp on it. In a bottom up fashion it collects a mess of constraints, say it gathered
So, okay. Guess that works. I made a reasonable estimate of how constraints should be solved, dump the mess in the prover, from the antecedent I read off how to attribute the tree with the solved constraints.
It doesn't seem to handle intermediaries well, though. Guess I need to check it with the algorithm. (Hmm. Intermediaries should remain on the left-hand side of the equation?)
int:*∧ (::Ord int) ⊢t0:k0->t1:k1 = t2:k2 ∧ t2:k2 = int:ka->int:kb ∧ (::Ord t1)
It then proves that (roughly) likes this
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ int:*= t3:k3 ∧ int:* = t4:k4 ∧
k0 = ka =* ∧ t0: = int:*∧ k1 = kb=* ∧ t1:* = int:*
⊢
-----------------------------------------------------------
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ int:*= t3:k3 ∧ int:* = t4:k4 ∧
k0 = ka =* ∧ t0:* = int:*∧ k1= kb = * ∧ t1:* = int:*
⊢ (::Ord int)
----------------------------------------------------------
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ int:*= t3:k3 ∧ t1:k1 = t4:k4
∧ k0 = ka =* ∧ t0:* = int:*
⊢ t1:k1:* = int:kb ∧ (::Ord t1)
-----------------------------------------------------------
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ t0:k0= t3:k3 ∧ t1:k1 = t4:k4
⊢ t0:k0 = int:ka ∧ t1:k1 = int:kb ∧ (::Ord t1)
----------------------------------------------------------
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4:* ∧ t0:k0= t3:k3 ∧ t1:k1 = t4:k4
⊢ t0:k0->t1:k1 = int:ka->int:kb ∧ (::Ord t1)
---------------------------------------------------------
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4 ∧ t0:k0= t3:k3
⊢ t1:k1 = t4:k4 ∧ t0:k0->t4:k4 = int:ka->int:kb ∧ (::Ord t1)
-----------------------------------------------------
int:*∧ (::Ord int) ∧ k2 = * ∧ t2:* = t3:k3->t4:k4
⊢ t0:k0= t3:k3 ∧ t1:k1 = t4:k4 ∧ t3:k3->t4:k4 = int:ka->int:kb ∧ (::Ord t1)
----------------------------------------------------
int:*∧ (::Ord int) ⊢ t0:k0->t1:k1 = t2:k2 ∧ t2:k2 = int:ka->int:kb ∧ (::Ord t1)
It doesn't seem to handle intermediaries well, though. Guess I need to check it with the algorithm. (Hmm. Intermediaries should remain on the left-hand side of the equation?)
Log 061815 - B
Right. I started implementing a general prover, then after the first few lines of code, decided against it. Thing is, I have a prover which works on terms like this
It discharges each consequent given what it knows by inspecting the antecedents. Antecedents and consequents may be anything; e.g., kind equalities, type equalities, instantiation requests, etc. While proving a goal new consequents, sometimes an antecedent, may be introduced (trivially).
Thing is, I don't trust the time complexity of the algorithm. Checking for a consequent whether an antecedent exists is linear (times the term size of the antecedent), substitution is linear (times the term size), almost every step the prover takes is two of these linear, possibly quadratic, operations. As far as I know, proving in this general manner might be cubic.
So, I am working in the unknown here. Looks to me I want to keep the number of antecedents as far down as possible to keep the number of lookups and substitutions down, might also be that it is cubic but pragmatically that doesn't matter since you only operate on small numbers of small terms, lastly, since I sometimes am doing a 'bisimulation' proof, I might actually need to do it this way.
Stuck.
a0 ∧ ... ∧ an ⊢c0 ∧ ... ∧ cm
It discharges each consequent given what it knows by inspecting the antecedents. Antecedents and consequents may be anything; e.g., kind equalities, type equalities, instantiation requests, etc. While proving a goal new consequents, sometimes an antecedent, may be introduced (trivially).
Thing is, I don't trust the time complexity of the algorithm. Checking for a consequent whether an antecedent exists is linear (times the term size of the antecedent), substitution is linear (times the term size), almost every step the prover takes is two of these linear, possibly quadratic, operations. As far as I know, proving in this general manner might be cubic.
So, I am working in the unknown here. Looks to me I want to keep the number of antecedents as far down as possible to keep the number of lookups and substitutions down, might also be that it is cubic but pragmatically that doesn't matter since you only operate on small numbers of small terms, lastly, since I sometimes am doing a 'bisimulation' proof, I might actually need to do it this way.
Stuck.
Next Step: Kind and Type Checking
Looks like the module system is finished in that it passes a minimal unit test. I.e., given the next minimal non-sensical unit test:
That gets elaborated to
So, that's good enough for now.
The next step is deciding whether I am going to do semantical analysis or compile to LLVM code such that the interpreter will run. I decided to go for semantical analysis first.
The first step in semantical analysis is kindification, checking type denotations have the correct kinds. There are some minor design decisions in whether I'll attribute type definitions and type variables with kinds, or only semantically check code.
I also need to think over the scheme of how the kind, and later type, checker will work. What I do in the bootstrap compiler is gather up constraints in a bottom up fashion and dispatch the constraints to a general prover. I like that implementation but I am somewhat concerned about performance.
It's either: a) bottom up solve all constraints in situ, or b) bottom up gather constraints and dispatch to a prover and I don't know what to choose yet.
import "util.hi" namespace f.g ( interface Some = #t where ( def x : t-> v ->unit ) instance Some int = ( def x : unit = nop ) ) namespace i.j ( def h: unit = nop def test: unit = h.f.g.x ) namespace k ( using f.g using i.j def test1: unit = h.x ) type list = \a => [ nil | cons a (list a) ] def ++: list a -> list a -> list a = [ nil, yy -> yy | cons x xx, yy -> cons x (xx ++ yy) ]
That gets elaborated to
import "util.hi" namespace f.g ( interface f.g.SomeTC = (!vTV => #tTV where ( method xC {SomeTC} : (tTV -> (vTV -> unitTC)) = (XXX) )) instance f.g.SomeTC f.g.intTC = ( binding xC {intTC} : unitTC = nopC ) select f.g.xC {f.g.SomeTC} : (!vTV => (!tTV => (::SomeTC tTC => (tTV -> (tTV -> (vTV -> unitTC)))))) = (XXX) ) namespace i.j ( def i.j.hC: unitTC = nopC def i.j.testC: unitTC = (i.j.hC@f.g.xC) ) namespace k ( using f.g using i.j def k.test1C: unitTC = (i.j.hC@f.g.xC) ) type listTC = (\aTV => [ nilC{listTC} | consC{listTC} aTV (listTC aTV)]) def ++C: (!aTV => ((listTC aTV) -> ((listTC aTV) -> (listTC aTV)))) = [ nilC, yyV -> yyV | consC{list} xV xxV, yyV -> ((consC xV) ((++C xxV) yyV))] record nilC{listTC}: (!aTV => (listTC aTV)) = nilC{listTC} record consC{listTC}: (!aTV => (aTV -> ((listTC aTV) -> (listTC aTV)))) = \v#0V, v#1V -> consC{listTC} v#0V v#1V
So, that's good enough for now.
The next step is deciding whether I am going to do semantical analysis or compile to LLVM code such that the interpreter will run. I decided to go for semantical analysis first.
The first step in semantical analysis is kindification, checking type denotations have the correct kinds. There are some minor design decisions in whether I'll attribute type definitions and type variables with kinds, or only semantically check code.
I also need to think over the scheme of how the kind, and later type, checker will work. What I do in the bootstrap compiler is gather up constraints in a bottom up fashion and dispatch the constraints to a general prover. I like that implementation but I am somewhat concerned about performance.
It's either: a) bottom up solve all constraints in situ, or b) bottom up gather constraints and dispatch to a prover and I don't know what to choose yet.
Log 061815 - A
After I rewrote passing the environment through copying, that continued bugging me since copying is expensive. It also dawned to me where I made the mistake; I copied a local reference, modified that, and never passed it back before.
I rewrote the compiler such that the environment is passed as a pointer. It makes more sense and a lot of passes will use the environment, I didn't like the idea of spurious copying.
I also noticed again how computationally expensive the identification phase is, but I am not going to change that. It's just one pass and can hopefully easily be rewritten at a later stage.
I rewrote the compiler such that the environment is passed as a pointer. It makes more sense and a lot of passes will use the environment, I didn't like the idea of spurious copying.
I also noticed again how computationally expensive the identification phase is, but I am not going to change that. It's just one pass and can hopefully easily be rewritten at a later stage.
Tuesday, June 16, 2015
Got Bugs
Well, I have been trying to understand a bug for the last few hours. Somehow, the AST doesn't get updated neither do I build a correct environment. Guess I got some of the semantics of C++ wrong[1][2], but I don't see where.
[1] Riiiight. auto& m:mm (reference) instead of auto m:mm (local value) was one bug. Now the other one.
[2] I couldn't get a local update for environments right. Probably should pass the environment as a pointer instead of a reference but I am not sure. Solved it with something which looks like a copying scheme... Might be bloody expensive. I need a good C++ programmer to work on this project too.
[1] Riiiight. auto& m:mm (reference) instead of auto m:mm (local value) was one bug. Now the other one.
[2] I couldn't get a local update for environments right. Probably should pass the environment as a pointer instead of a reference but I am not sure. Solved it with something which looks like a copying scheme... Might be bloody expensive. I need a good C++ programmer to work on this project too.
Log 061615
I implemented the module system. It starts of with one parsed top module top, and in one pass eagerly depth-first adds the list of included modules m0, .., mN, top through a closure construction. Semantic analysis is implemented by creating the complete context of declaration first, then doing various passes modulo that complete context. The modules keep track of their state, so at some point I could implement reusing pre-compiled modules.
Other variations could be to build up the context of declarations in left-to-right order during semantic analysis, or to use a cyclic graph representing the inclusion path and include only the local context.
It don't think the other variations are worth it. The left-to-right order is problematic with mutually recursive modules, the cyclic graph doesn't detect name clashes which could occur during linking easily. Moreover, the presented simple manner could allow for concurrent semantic analysis.
But yeah. There's a worst-case scenario that you'll do semantic analysis on a ten-line m0 file in a context of a million declarations.
Life is never easy. Guess as long it is documented, it's fine.
Other variations could be to build up the context of declarations in left-to-right order during semantic analysis, or to use a cyclic graph representing the inclusion path and include only the local context.
It don't think the other variations are worth it. The left-to-right order is problematic with mutually recursive modules, the cyclic graph doesn't detect name clashes which could occur during linking easily. Moreover, the presented simple manner could allow for concurrent semantic analysis.
But yeah. There's a worst-case scenario that you'll do semantic analysis on a ten-line m0 file in a context of a million declarations.
Life is never easy. Guess as long it is documented, it's fine.
Monday, June 15, 2015
Breaking Up the Design
I've been thinking too functionally for a while. If I am going to have a module system where things may be pre-compiled, have some non-obvious state, and relate to each other it's simply easier to implement that module system in an OOAD manner on top of the passes which can be seen to implement simple "pure" AST transforming functions.
Implementing an object-oriented module system and breaking up the identification phase into separate phases.
Implementing an object-oriented module system and breaking up the identification phase into separate phases.
Log 061515
So. The idiotic observation is that I can only compile correctly if I assume that all files are typechecked and compiled given all declarations imported in the project. That leads to a somewhat expensive design where the top module drives everything.
Somehow, this trivial observation took me half a week, and I am still not sure how to implement it right; i.e., how to translate this to performant and clear handling of data structures internally.
It simply isn't nice that a simple module needs to be compiled with a context including all the declarations[1], it breaks compositionality one assumes one has[2], in the project but there doesn't seem to be a way out. At least, not an easy way out.
And, at minimum, I want a design which can easily be extended to allow pre-compiled modules to be loaded at some point too. There's that too.
[1] Two modules may mutually recursively import each other's declaration, so there is already a need to import "forward" declarations. The top module may import declarations of modules which "shadow" imported declarations and lead to a name clash during linking.
[2] But it looks like other compilers don't have these compositionality invariants either. If I modify a file half-way through a C project that'll likely give me a core dump. And I can easily program C code such that all files will compile nicely individually but linking will fail. It's just an unsolvable problem.
Somehow, this trivial observation took me half a week, and I am still not sure how to implement it right; i.e., how to translate this to performant and clear handling of data structures internally.
It simply isn't nice that a simple module needs to be compiled with a context including all the declarations[1], it breaks compositionality one assumes one has[2], in the project but there doesn't seem to be a way out. At least, not an easy way out.
And, at minimum, I want a design which can easily be extended to allow pre-compiled modules to be loaded at some point too. There's that too.
[1] Two modules may mutually recursively import each other's declaration, so there is already a need to import "forward" declarations. The top module may import declarations of modules which "shadow" imported declarations and lead to a name clash during linking.
[2] But it looks like other compilers don't have these compositionality invariants either. If I modify a file half-way through a C project that'll likely give me a core dump. And I can easily program C code such that all files will compile nicely individually but linking will fail. It's just an unsolvable problem.
Saturday, June 13, 2015
Log 061315
Suppose A0 and A1 define "a.foo". Suppose B0 imports A0 and B1 imports A1 and use "a.foo". Suppose C imports B0 and B1.
I can solve that B0 and B1 bind correctly to "a.foo" during typechecking. But then I just shifted the problem to the linking phase where it'll turn out "a.foo" gets imported from two different files.
I can also solve that by fully qualifying "a.foo" with the file in which it is defined. But then what if A0 and A1 are also named the same but live in two different paths? I would also need to fully qualify each identifier with both the filename as well as the path.
So. I am not going to solve this. Programmers will need to make sure all identifiers for a project live in different namespaces/don't overlap. Everything is solvable but this simply isn't worth it.
It's a feature, not a bug.
I can solve that B0 and B1 bind correctly to "a.foo" during typechecking. But then I just shifted the problem to the linking phase where it'll turn out "a.foo" gets imported from two different files.
I can also solve that by fully qualifying "a.foo" with the file in which it is defined. But then what if A0 and A1 are also named the same but live in two different paths? I would also need to fully qualify each identifier with both the filename as well as the path.
So. I am not going to solve this. Programmers will need to make sure all identifiers for a project live in different namespaces/don't overlap. Everything is solvable but this simply isn't worth it.
It's a feature, not a bug.
Friday, June 12, 2015
Log 061215
Right. Of course, I am going to implement module inclusion in the right manner but I shot myself a bit in the foot choosing namespaces as the guiding principle instead of files. Most compilers (Python, Java, Haskell) treat files as a, or the, guiding principle for grouping declarations. This greatly simplifies a compiler but I chose namespaces, decoupling the file from declarations, because I noticed I personally like it more to group functions which implement a certain functionality in a single file.
Thing is: You can't really decouple in that manner, import directives (inclusion of files) are necessary due to performance reasons and any import directive manipulates scope. So, I now have two directives manipulating the scope of declarations of a file, or module. The import directive and the using directive. I ended up doubling the work. Moreover, the scoping rules I have imply implementing a module system is now a non-trivial exercise and data-structures need to reflect the 'double nature' of scoping.
The other problem now is mutually dependent modules. If module m0 imports module m1, and m1 imports m0, I cannot have a simple identification phase since m1 needs to be identified before it can process m0 and vice-versa. This wasn't a problem in the batch compiler because that treated all included declarations as a flat list.
Going from batch compiler to an interpreter now also implies the identification phase needs to be split such that declarations can be gathered "halfway"; i.e., after syntactic transforms but before any kind of semantic analysis.
Thing is: You can't really decouple in that manner, import directives (inclusion of files) are necessary due to performance reasons and any import directive manipulates scope. So, I now have two directives manipulating the scope of declarations of a file, or module. The import directive and the using directive. I ended up doubling the work. Moreover, the scoping rules I have imply implementing a module system is now a non-trivial exercise and data-structures need to reflect the 'double nature' of scoping.
The other problem now is mutually dependent modules. If module m0 imports module m1, and m1 imports m0, I cannot have a simple identification phase since m1 needs to be identified before it can process m0 and vice-versa. This wasn't a problem in the batch compiler because that treated all included declarations as a flat list.
Going from batch compiler to an interpreter now also implies the identification phase needs to be split such that declarations can be gathered "halfway"; i.e., after syntactic transforms but before any kind of semantic analysis.
Thursday, June 11, 2015
Simple Design Decisions on Batch Mode Compilation
So. Always when I am mucking around I find I didn't think something through; i.e., bothered by underspecification. The problem with the module system is that I haven't decided on how the top-level program should work. I want to support some scheme where I can use the program as both an interpreter as well as a compiler, and that affects the internal organization.
First, the bootstrap compiler: This program behaves akin to a batch compiler as gcc. Each module is compiled individually to "object code" and all modules are "linked" together and output as a C program. (Where in my simple compiler, the object code is the serialized type-checked AST, and the linking phase is simply whole program compilation. But yeah, works.) That's a pretty straightforward manner of compilation, moreover, keeps a number of invariants trivially.
But I want an interpreter first, and a compiler second, which means I want to support invoking the program on one input file, where it includes all other files, and all files are semantically checked and translated in one run. Now you would think you could run the interpreter internally in batch mode on all imported files but, no, it turns out you can easily pollute the declaration space of the import chain of individual files since other modules will have been loaded; it probably is also too expensive to run the interpreter as a compiler in batch mode first.
The easy way out is to live a little and don't give a fuck, of course.
How does Python handle this? I need some solution. Asking the question is answering it, of course.
First, the bootstrap compiler: This program behaves akin to a batch compiler as gcc. Each module is compiled individually to "object code" and all modules are "linked" together and output as a C program. (Where in my simple compiler, the object code is the serialized type-checked AST, and the linking phase is simply whole program compilation. But yeah, works.) That's a pretty straightforward manner of compilation, moreover, keeps a number of invariants trivially.
But I want an interpreter first, and a compiler second, which means I want to support invoking the program on one input file, where it includes all other files, and all files are semantically checked and translated in one run. Now you would think you could run the interpreter internally in batch mode on all imported files but, no, it turns out you can easily pollute the declaration space of the import chain of individual files since other modules will have been loaded; it probably is also too expensive to run the interpreter as a compiler in batch mode first.
The easy way out is to live a little and don't give a fuck, of course.
How does Python handle this? I need some solution. Asking the question is answering it, of course.
Wednesday, June 10, 2015
Log 061015
I am mucking around with the module system as to be able to test larger multi-file projects. It's silly, sloppily trying to implement something which took me 20 lines in the bootstrap compiler.
Tuesday, June 9, 2015
Log 060915
After the initial identification phase the next step is semantical analysis. A major part of semantical analysis is kind and type checking, which relies on unification and substitution.
Implementing substitution primitives.
Implementing substitution primitives.
Saturday, June 6, 2015
Log 060615
It took me a while but I'll be dropping the namespace local implicit constant. It solves one particular range of important use-cases well but, once introduced, people would want more.
Still, it's a very important range of use-cases: calculations modulo a local state...
I'll shift it to some later date. Like an 0.5 version.
Still, it's a very important range of use-cases: calculations modulo a local state...
I'll shift it to some later date. Like an 0.5 version.
Thursday, June 4, 2015
Classes, Monads, and Implicits
Since I am programming in C++ my transforms/rewrites are implemented as classes which often have a simple local state. I subsequently wrap these classes into 'pure' simple AST manipulating functions which construct an object of the class, applies the rewrite, and returns the rewritten AST. It works surprisingly well, even though it feels somewhat 'muddy' relying on all kinds of invariants.
It's a non-trivial extension which I am not sure will break the type system, be used, or achieves what I want to mimic in the right manner. But it looks like there is a 'pure' operational semantics for it, so it should be somewhat safe, and it somehow 'feels' right. Now what?
In a 'pure' language like Haskell I gather you would normally do this with a monad. Create a transform which works modulo some state wrapped in the monad, and hide the monad by returning the result after the monad was ran to completion. I have never programmed a Haskell monad in my life but somehow the C++ impure solution looks better to me.
I want to do something similar like C++/Haskell in Hi and am thinking of extending the language with namespace local implicits. The idea is that it should work like below, a namespace local implicit is a constant threaded implicitly through the functions in a namespace.
namespace implicit [ 0: int ] (
using system
def inc: int =
this := this + 1; this
def map_inc: list int -> list int =
[ nil -> nil
| cons x xx -> cons (inc + x) (map_inc xx) ]
)
The idea would be that on calling a function in the namespace implicit, it is augmented with the implicit constant. A function within the namespace can refer to that constant with the 'this' keyword, and also manipulate the value threaded with something which looks like an assignment (but in it's operational semantics simply means: thread this new value along instead of the old value, although it may be implemented as a variable); every function called from within that namespace receives the (updated) implicit; leaving the namespace stops the threading and the implicit is lost.
Given these semantics, running the above function 'map_inc' on the list '0,1' should deliver the following result '1,3':
implicit.map_inc (cons 0 (cons 1 nil))
=
(cons 1 (cons 3 nil))
Log 060415
The identification phase, consisting of five or six passes, now passes minimal unit tests. So well, done with that, I guess.
Tuesday, June 2, 2015
Significant Performance Drop
I unit tested an alpha algorithm for the identification phase. It isn't the last pass I need to implement but it's the heart of identification. During the implementation I was already a bit worried about how I implemented it, so I, again, did some performance measurements.
Well. A significant performance drop. The reason? It keeps a flat map of all declarations it can deduce. Every time you enter a namespace, or use a using directive, it searches through that whole map and declares in a local dictionary all declarations which share a certain prefix.
Tested on one large file, you get a whole lot of declarations, and for every namespace and using declaration a linear search. That's a quadratic slow-down I guesstimate. For a 1MB file, 1MB/min instead of 1MB/sec (though it takes 3ms for the simple file I copied into 2000 different namespaces to get to 1MB.)
Performance dropped that significantly that it'll only comfortably compile medium sized programs now. I'll leave it as is, since it's just one pass, it's unlikely people will write large monolithic files with lots of namespaces now, and (probably) solvable by someone implementing a better search tree for namespaces at some point.
It strengthens my belief that all algorithms in a compiler should be linear.
Below, performance measurements on a 1MB sized file.
Output of gprof:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
24.62 27.19 27.19 597938347 0.00 0.00 __gnu_cxx::__atomic_add(int volatile*, int)
16.92 45.88 18.69 600390363 0.00 0.00 __gnu_cxx::__exchange_and_add(int volatile*, int)
12.18 59.33 13.45 207564208 0.00 0.00 Ast::tag()
6.73 66.77 7.44 392894022 0.00 0.00 std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2> const&)
4.73 71.99 5.22 194555526 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_S_right(std::_Rb_tree_node_base*)
2.81 75.09 3.10 599838760 0.00 0.00 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
1.63 76.89 1.80 597760346 0.00 0.00 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
1.58 78.63 1.74 192160000 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_clone_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >
const*)
1.25 80.01 1.39 600390363 0.00 0.00 __gnu_cxx::__exchange_and_add_dispatch(int*, int)
1.23 81.37 1.36 597938347 0.00 0.00 __gnu_cxx::__atomic_add_dispatch(int*, int)
1.21 82.71 1.34 209242230 0.00 0.00 std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::operator->() const
1.04 83.86 1.15 192000000 0.00 0.00 RewriteIdentify::ast_has_prefix(icu_52::UnicodeString const&, std::shared_ptr<Ast> const&)
0.86 84.81 0.95 1198328710 0.00 0.00 __gthread_active_p()
0.76 85.65 0.84 199744948 0.00 0.00 std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::_M_valptr() const
0.74 86.47 0.82 42017 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
0.74 87.29 0.82 6005 0.00 0.01 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_copy(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > const*,
std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
0.62 87.98 0.69 197855159 0.00 0.00 icu_52::UnicodeString::doCompare(int, int, icu_52::UnicodeString const&, int, int) const
0.60 88.64 0.66 192160000 0.00 0.00 std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >* std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const,
std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_create_nod
e<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&>(std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&)
0.57 89.28 0.64 392894022 0.00 0.00 std::shared_ptr<Ast>::shared_ptr(std::shared_ptr<Ast> const&)
0.57 89.91 0.63 394982420 0.00 0.00 std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
0.57 90.53 0.63 597938347 0.00 0.00 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy()
0.57 91.16 0.63 197855159 0.00 0.00 icu_52::UnicodeString::pinIndices(int&, int&) const
0.55 91.77 0.61 606783091 0.00 0.00 icu_52::UnicodeString::length() const
0.55 92.38 0.61 192064000 0.00 0.00 std::_Rb_tree_iterator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::operator++()
0.48 92.91 0.53 192318000 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_destroy_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >
>*)
0.45 93.41 0.50 599164355 0.00 0.00 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
0.45 93.91 0.50 192549131 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
Heap measurements:
Well. A significant performance drop. The reason? It keeps a flat map of all declarations it can deduce. Every time you enter a namespace, or use a using directive, it searches through that whole map and declares in a local dictionary all declarations which share a certain prefix.
Tested on one large file, you get a whole lot of declarations, and for every namespace and using declaration a linear search. That's a quadratic slow-down I guesstimate. For a 1MB file, 1MB/min instead of 1MB/sec (though it takes 3ms for the simple file I copied into 2000 different namespaces to get to 1MB.)
Performance dropped that significantly that it'll only comfortably compile medium sized programs now. I'll leave it as is, since it's just one pass, it's unlikely people will write large monolithic files with lots of namespaces now, and (probably) solvable by someone implementing a better search tree for namespaces at some point.
It strengthens my belief that all algorithms in a compiler should be linear.
Below, performance measurements on a 1MB sized file.
Output of gprof:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
24.62 27.19 27.19 597938347 0.00 0.00 __gnu_cxx::__atomic_add(int volatile*, int)
16.92 45.88 18.69 600390363 0.00 0.00 __gnu_cxx::__exchange_and_add(int volatile*, int)
12.18 59.33 13.45 207564208 0.00 0.00 Ast::tag()
6.73 66.77 7.44 392894022 0.00 0.00 std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2> const&)
4.73 71.99 5.22 194555526 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_S_right(std::_Rb_tree_node_base*)
2.81 75.09 3.10 599838760 0.00 0.00 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
1.63 76.89 1.80 597760346 0.00 0.00 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
1.58 78.63 1.74 192160000 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_clone_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >
const*)
1.25 80.01 1.39 600390363 0.00 0.00 __gnu_cxx::__exchange_and_add_dispatch(int*, int)
1.23 81.37 1.36 597938347 0.00 0.00 __gnu_cxx::__atomic_add_dispatch(int*, int)
1.21 82.71 1.34 209242230 0.00 0.00 std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::operator->() const
1.04 83.86 1.15 192000000 0.00 0.00 RewriteIdentify::ast_has_prefix(icu_52::UnicodeString const&, std::shared_ptr<Ast> const&)
0.86 84.81 0.95 1198328710 0.00 0.00 __gthread_active_p()
0.76 85.65 0.84 199744948 0.00 0.00 std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::_M_valptr() const
0.74 86.47 0.82 42017 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
0.74 87.29 0.82 6005 0.00 0.01 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_copy(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > const*,
std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
0.62 87.98 0.69 197855159 0.00 0.00 icu_52::UnicodeString::doCompare(int, int, icu_52::UnicodeString const&, int, int) const
0.60 88.64 0.66 192160000 0.00 0.00 std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >* std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const,
std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_create_nod
e<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&>(std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&)
0.57 89.28 0.64 392894022 0.00 0.00 std::shared_ptr<Ast>::shared_ptr(std::shared_ptr<Ast> const&)
0.57 89.91 0.63 394982420 0.00 0.00 std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
0.57 90.53 0.63 597938347 0.00 0.00 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy()
0.57 91.16 0.63 197855159 0.00 0.00 icu_52::UnicodeString::pinIndices(int&, int&) const
0.55 91.77 0.61 606783091 0.00 0.00 icu_52::UnicodeString::length() const
0.55 92.38 0.61 192064000 0.00 0.00 std::_Rb_tree_iterator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::operator++()
0.48 92.91 0.53 192318000 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_destroy_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >
>*)
0.45 93.41 0.50 599164355 0.00 0.00 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
0.45 93.91 0.50 192549131 0.00 0.00 std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
Subscribe to:
Posts (Atom)