A simple transform which maps each text in the AST to its canonical copy through a map, thereby making all references in the heap unique, would get rid of all that duplicated character data too.
I feel like I am back on my C64 where, in order to get at -or change- that extra RAM, one needed to 'peek' all values of a memory region and 'poke' them back to their addresses just before disabling the ROM-shadowed memory region.
Monday, August 30, 2010
Sunday, August 29, 2010
Not Doing a Lot
As the title says, severe lack of motivation to program. Anyhow, I thought a bit more about a trivial typing scheme: it should be enough if I just reverse/extend type variables with their assumed implemented interfaces, unify on that, and get rid of generalization to derive an AST-based type checker. Not sure where all the assumptions and stuff should go in the context then; I'll think about that later.
Saturday, August 28, 2010
Uncool
I'd think implementing strings would help somewhat, but I didn't manage to convince myself, and the biggest reason: I just don't feel like it. Having strings as lists of characters works for Haskell too and gives nice algorithms. The real problem is 1GB of character data, most of which is probably copies of itself. Other stuff I could do:
1) Make the data unique by storing strings in a map and only extracting unique copies.
2) Don't retain specific character data; it's probably all positional information anyway.
3) Change the runtime such that all shareable data is shared.
I like 1 the most, but that would best be solved by having a constant pool referred to through a variable, and I don't have variables at the moment.
(Did I mention using a symbol table? A simple transform which just visits all AST nodes once would do it too, of course.)
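Option 1 is just string interning. A minimal sketch of the idea in C, assuming text can be handled as plain C strings; the names here (intern, intern_entry) are made up:

    #include <stdlib.h>
    #include <string.h>

    #define INTERN_BUCKETS 4096

    typedef struct intern_entry {
        struct intern_entry *next;
        char *text;                    /* the single shared copy */
    } intern_entry;

    static intern_entry *intern_table[INTERN_BUCKETS];

    static unsigned long intern_hash(const char *s) {
        unsigned long h = 5381;        /* djb2 */
        while (*s) h = h * 33 + (unsigned char) *s++;
        return h;
    }

    /* Return the unique shared copy of s, inserting it on first sight. */
    const char *intern(const char *s) {
        unsigned long b = intern_hash(s) % INTERN_BUCKETS;
        for (intern_entry *e = intern_table[b]; e != NULL; e = e->next)
            if (strcmp(e->text, s) == 0)
                return e->text;        /* seen before: share it */
        intern_entry *e = malloc(sizeof(intern_entry));
        e->text = strdup(s);
        e->next = intern_table[b];
        intern_table[b] = e;
        return e->text;
    }

The transform then visits every AST node once and replaces each text field with intern(field), so all duplicates collapse into one shared copy.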
Thursday, August 26, 2010
Log 082610
The heap profiler confirmed what I knew. Two basic things slow down my compiler: a) way too many chars, so I'll need a native text type; b) comparisons are just too slow, which may also be a consequence of having way too many chars.
There are some other optimizations, like dropping positional information in the linking phase and implementing a condensed and more intelligible representation for identifiers, which would help too.
First move forward: implement all those basic string primitives (length, pos_of, index, etc.) on strings.
(A cell on a 64-bit system is 64 bits. A char is placed in an array consisting of a length, cell-type, type, and value. A cons is placed in an array of a length, cell-type, record-tag, type tag, head, and tail. So, ten cells times eight bytes is eighty bytes per char, and eighty times 15,550,128 chars gives 1,244,010,240 bytes. Most of that is probably the not-shared URI in the positional information.)
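The arithmetic of that parenthetical, spelled out as a small C check (the cell counts are exactly the ones listed above):

    #include <stdio.h>

    int main(void) {
        const long cell       = 8;        /* one cell on a 64-bit system */
        const long char_cells = 4;        /* length, cell-type, type, value */
        const long cons_cells = 6;        /* length, cell-type, record-tag, type tag, head, tail */
        const long chars      = 15550128; /* chars counted by the heap profiler */

        long per_char = (char_cells + cons_cells) * cell;  /* 10 * 8 = 80 bytes */
        printf("%ld bytes per char, %ld bytes total\n",
               per_char, per_char * chars);                /* 80, 1244010240 */
        return 0;
    }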
Wednesday, August 25, 2010
I should cut back on chars...
So, I wrote a heap profiler, output every 10%:
It starts pretty innocent:
THUNK 160 char 3300 list.cons 3972 int 4 system._txt 256 system.tuple_4 12
And grows.
THUNK 248 pointer 4 char 204964 int 20356 position._pos 10176 ast.type_name 1932 system._txt 15728 ast.expr_empty 864 list.cons 206748 ast.decl_instance 32 ast.decl_def 156 ast.type_instance 32 ast.expr_name 1580 ast.type_arrow 568 ast.expr_case 292 ast.decl_bind 32 ast.expr_match 384 ast.decl_space 32 ast.decl_type 16 ast.expr_record 308 ast.decl_using 24 ast.type_case 12 ast.kind_unknown 300 ast.expr_number 88 ast.type_record 20 ast.expr_variable 604 ast.decl_interface 20 ast.type_universal 204 ast.decl_record 12 ast.type_unknown 580 ast.expr_application 676 ast.type_abstraction 60 ast.type_variable 264 ast.kind_star 580 ast.type_interface 20 system.tuple_2 8 ast.type_constrain 196 ast.decl_method 72 ast.type_application 96 ast.expr_select 88 ast.expr_char 12
And grows.
THUNK 356 pointer 4 char 807500 int 79964 position._pos 39976 ast.expr_empty 3104 ast.expr_name 6508 ast.type_name 6468 ast.type_universal 520 system._txt 61664 ast.type_variable 760 ast.decl_record 52 list.cons 816488 ast.kind_unknown 876 ast.expr_case 1472 ast.expr_match 1836 ast.expr_record 1776 ast.decl_space 68 ast.expr_variable 2456 ast.type_arrow 1368 ast.type_unknown 2404 ast.decl_using 84 ast.decl_instance 136 ast.kind_star 2404 ast.type_application 672 ast.decl_type 96 ast.type_case 56 ast.type_instance 136 system.tuple_2 8 ast.type_constrain 400 ast.type_skolem 32 ast.decl_def 504 ast.type_record 64 ast.decl_bind 168 ast.expr_application 3076 ast.comment 16 ast.expr_embed 256 ast.expr_number 400 ast.decl_interface 20 ast.ffi_type_simple 652 ast.expr_char 700 ast.type_abstraction 240 ast.type_interface 20 ast.decl_method 72 ast.expr_select 88 ast.expr_throw 4
And grows.
THUNK 340 ffi.struct 4 pointer 4 char 2809580 int 287476 position._pos 143732 system._txt 223008 ast.expr_match 6116 list.cons 2843552 ast.expr_empty 9960 ast.expr_record 8944 ast.expr_char 3236 ast.expr_name 26248 ast.type_name 23168 ast.type_arrow 4456 ast.decl_def 2364 ast.expr_case 4616 ast.comment 164 ast.expr_application 10448 ast.expr_variable 9752 ast.expr_text 136 ast.type_unknown 9472 ast.kind_star 9472 ast.part 56 ast.type_universal 1952 ast.decl_using 272 ast.type_variable 2260 ast.decl_space 172 ast.kind_unknown 2448 ast.expr_number 836 system.tuple_2 8 ast.decl_type 152 ast.type_case 96 ast.decl_instance 244 ast.decl_interface 36 ast.type_instance 244 ast.type_record 216 ast.type_abstraction 308 ast.type_interface 36 ast.decl_bind 276 ast.decl_record 216 ast.type_constrain 736 ast.type_application 2864 ast.decl_method 104 ast.type_tuple 120 ast.expr_select 112 ast.type_skolem 36 ast.expr_embed 384 ast.ffi_type_simple 980 ast.expr_throw 4
And grows.
THUNK 452 int 957136 pointer 4 char 10596428 list.cons 10718984 position._pos 478560 ast.expr_empty 30196 ast.expr_variable 45388 system._txt 740672 ast.type_unknown 44512 ast.kind_star 44512 ast.expr_application 49172 ast.expr_name 99336 ast.expr_case 16508 ast.type_arrow 7616 ast.type_name 53132 ast.expr_match 23052 ast.decl_def 4696 ast.expr_record 28540 ast.decl_using 840 ast.type_application 4624 ast.expr_text 904 ast.part 144 ast.comment 532 ast.decl_space 280 system.tuple_2 8 ast.expr_number 1280 ast.decl_type 220 ast.type_tuple 216 ast.expr_char 10288 ast.type_case 116 ast.kind_unknown 2724 ast.type_record 504 ast.decl_instance 328 ast.type_instance 328 ast.expr_throw 192 ast.decl_bind 360 ast.type_abstraction 356 ast.type_variable 2460 ast.decl_interface 44 ast.type_universal 2104 ast.type_interface 44 ast.decl_record 504 ast.type_constrain 824 ast.decl_method 120 ast.expr_select 120 ast.expr_try 4 ast.type_skolem 36 ast.expr_embed 384 ast.ffi_type_simple 980
And grows.
THUNK 244 int 1452744 system._txt 1119192 ast.decl_method 120 map.node 7768 ast.decl_def 5964 list.cons 16055612 position._pos 695124 ast.expr_empty 41900 ast.expr_name 148724 ast.type_name 79064 ast.type_abstraction 376 ast.type_arrow 9364 ast.expr_case 24008 ast.decl_type 268 char 15550128 ast.type_variable 2496 ast.type_constrain 824 ast.type_application 5676 ast.type_case 136 ast.kind_unknown 2808 ast.type_universal 2120 ast.expr_match 32504 ast.expr_application 71856 ast.expr_record 48992 ast.type_record 688 ast.expr_embed 384 ast.expr_variable 62796 ast.expr_number 1836 ast.type_unknown 61728 ast.ffi_type_simple 980 ast.part 176 ast.kind_star 61728 ast.comment 672 ast.type_skolem 36 ast.expr_char 21280 ast.decl_interface 44 ast.expr_text 1148 ast.decl_space 488 lambda.lambda_def 4152 ast.type_tuple 288 ast.type_interface 44 lambda.lambda_app 34044 ast.decl_using 1448 ast.decl_record 688 ast.decl_instance 360 system.tuple_2 8 ast.expr_select 120 lambda.lambda_alts 11108 lambda.lambda_record 18484 ast.type_instance 360 lambda.lambda_comb 16652 lambda.lambda_abs 15720 ast.decl_bind 392 ast.expr_try 8 lambda.lambda_throw 11228 lambda.lambda_name 54724 ast.expr_throw 228 lambda.lambda_const 18140 lambda.lambda_method 316 long 18380 ast.expr_type 4 lambda.lambda_select 120 lambda.lambda_embed 384
And grows.
THUNK 5044 system.tuple_2 44 lambda.lambda_record 77532 list.cons 2404732 lambda.lambda_name 256328 system._txt 233196 int 78140 lambda.lambda_app 192680 lambda.lambda_abs 43772 lambda.lambda_alts 33164 lambda.lambda_const 47296 char 1582504 long 47536 lambda.lambda_comb 32128 lambda.lambda_split 22536 lambda.lambda_def 10680 lambda.lambda_test 788 lambda.lambda_call 32
And grows.
THUNK 40940 list.cons 19046368 code.c_mov 587036 code.op_reg 667452 code.op_int 781748 code.c_tst 140168 code.op_nth 317836 int 1180748 long 292384 system._txt 490732 code.op_lbl 79776 code.c_lbl 79776 code.c_return 49960 code.c_comment 119688 code.op_array 149632 char 2048476 code.c_op 28824 code.op_name 33916 code.c_call 14920 code.c_jmp 14908 code.codestate 8 combinator.comb_def 21176 combinator.comb_alts 21176 code.c_function 13912 combinator.comb_var 233048 combinator.comb_split 13696 combinator.comb_stack 33020 combinator.comb_record 50460 system.tuple_2 8 combinator.comb_thunk 53548 combinator.comb_name 53548 combinator.comb_const 26944 combinator.comb_test 396 combinator.comb_call 20
And grows.
THUNK 4692 int 2610712 map.node 392 code.op_int 2057056 list.cons 29897400 code.c_mov 1780684 code.op_nth 1266064 code.op_array 220172 code.op_reg 1702752 system._txt 622820 code.c_comment 192316 char 2500860 code.c_call 121312 code.c_function 25128 code.c_tst 262624 long 510796 code.op_lbl 121324 code.c_return 89752 code.c_lbl 121324 system.tuple_2 8 code.c_op 52676 code.op_name 53548 code.c_jmp 22976
And grows.
THUNK 11764 ffi.struct 4 int 2301420 list.cons 31044276 system._txt 622808 pointer 4 code.c_comment 192316 code.c_mov 1539136 code.c_function 21176 code.op_reg 3341368 code.op_int 2057056 code.c_tst 220540 code.op_lbl 121324 char 2500840 code.op_array 192252 code.c_return 75372 code.c_lbl 121324 code.op_nth 1099972 code.c_op 44152 long 510796 code.c_call 106212 code.op_name 53548 code.c_jmp 22976
And grows.
THUNK 26100 list.cons 33266952 int 2301420 system._txt 630724 code.op_nth 1099972 code.op_reg 3341368 code.op_int 2057056 code.c_mov 1539136 code.op_array 192252 code.c_comment 192316 pointer 4 code.c_tst 220540 code.c_function 21176 code.op_lbl 121324 long 510796 code.c_return 75372 code.c_lbl 121324 code.c_call 106212 char 2500840 code.c_op 44152 code.op_name 53548 code.c_jmp 22976
Tuesday, August 24, 2010
A Heap Profiler
Silly thought. I've been looking at how to optimize various things around the Hi compiler, but the best manner is always just to get the raw numbers. Now, compiling with -pg gives me great profile information on where time is spent, but it doesn't tell me where all that memory went. I need something which profiles the heap, just a simple tool which counts nodes ordered by kind.
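The tool itself is almost trivial. A sketch of the counting pass, assuming nodes are consecutive arrays of cells where cell 0 holds the node length and cell 1 a tag that maps to a kind name; the actual runtime layout will differ, and kind_name is hypothetical:

    #include <stdio.h>

    #define MAX_KINDS 256

    /* Walk the heap once, counting nodes per kind tag. */
    void profile_heap(long *heap, long heap_cells,
                      const char *(*kind_name)(long tag)) {
        long counts[MAX_KINDS] = {0};
        long i = 0;
        while (i < heap_cells) {
            long length = heap[i];       /* cell 0: node size in cells */
            long tag    = heap[i + 1];   /* cell 1: kind tag */
            if (length <= 0) break;      /* corrupt node: stop the walk */
            if (tag >= 0 && tag < MAX_KINDS) counts[tag]++;
            i += length;                 /* hop to the next node */
        }
        for (long t = 0; t < MAX_KINDS; t++)
            if (counts[t] > 0)
                printf("%s %ld ", kind_name(t), counts[t]);
        printf("\n");
    }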
Monday, August 23, 2010
Log 082310
There are times when I can write on the compiler, and there are times when I cannot. Instead, I write on the language specification/standard to get it into a shape that I can express a new typing scheme.
Sunday, August 22, 2010
Small Changes
I changed the runtime a bit. FFI calls are memoized, memory pools are cached. The ackermann test (optimized, without overloading) now runs in a bit more than a second at 2.26 GHz, three seconds at 798 MHz. This should mean a faked entry of 4.3 seconds on the Windows benchmark shootout, which is almost Lua speed and faster than the slowest Lisp entries. Some optimization (local computations, inlining) and I should beat Lua speed, and then I am done with that.
(I normalized to 550 MHz; e.g. 3 s × 798/550 ≈ 4.35 s. The result remains skewed, of course. The biggest question is what the difference between 32 and 64 bits means. A 32-bit machine would halve the Hi runtime cell size, so on 64 bits you need double the memory bandwidth. No idea whether a Pentium has a 32-bit or a 64-bit bus.)
A fast compile of the sources now takes 20 minutes and 3GB of memory. Still, better than the 15 hours it took with the OCaml-based AST interpreter. Guess I'll need support for proper literals next; 3GB is a bit over the top. (Which means a 1GB from space, since I double each time and keep the last four freed regions in cache.)
This is fine-tuning, actually. A generational garbage collector would improve speed, but it might also help if, instead of processing all lambda terms to combinators, I would just translate each term to a combinator and then compile that to C directly. There is a spot where it just blows up, and it doesn't make sense to keep all that information in the heap.
(Hmm, there is of course the question of how long values within a function's body are retained. I should check that all top-level references are actually thrown away after their use. It may well be that it keeps the AST, the lambda terms, the combinators, and the intermediate code in memory.)
Saturday, August 21, 2010
Log 082110
The Hi compiler seemingly wasted memory. It used a stop-and-copy collector which just allocates a new to space by calling malloc and frees the from space afterwards. I did that on purpose so that, even with multiple processes, you don't need double the size of memory.
I expected eager (LIFO) behavior from malloc, i.e. that a malloc would just return the last slab of memory handed back by free. It didn't; it just returned a new memory segment. So seemingly, while the from space is just a few megabytes, it allocated new to spaces and sat on the freed from spaces until malloc's own reclamation kicked in. So it would take 500MB of memory, mostly unused and already freed, while running the compiler.
Fixed that behavior by implementing an intermediate pool. I don't think it matters much except for aesthetics.
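The intermediate pool is just a thin wrapper over malloc/free that hangs on to the last few freed spaces; a sketch, with invented names:

    #include <stdlib.h>

    #define POOL_SLOTS 4

    static struct { void *mem; size_t size; } pool[POOL_SLOTS];

    /* Prefer a cached region over a fresh malloc. */
    void *pool_alloc(size_t size) {
        for (int i = 0; i < POOL_SLOTS; i++)
            if (pool[i].mem != NULL && pool[i].size >= size) {
                void *p = pool[i].mem;
                pool[i].mem = NULL;
                return p;
            }
        return malloc(size);
    }

    /* Cache a freed region; really release it only when the pool is full. */
    void pool_free(void *mem, size_t size) {
        for (int i = 0; i < POOL_SLOTS; i++)
            if (pool[i].mem == NULL) {
                pool[i].mem  = mem;
                pool[i].size = size;
                return;
            }
        free(mem);
    }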
Started a change to the runtime such that the FFI combinator memoizes the dl symbol and ffi CIF record.
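What that memoization boils down to, sketched against the real dlfcn/libffi calls; the cache structure and names are my own here:

    #define _GNU_SOURCE              /* for RTLD_DEFAULT on glibc */
    #include <dlfcn.h>
    #include <ffi.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct ffi_memo {
        struct ffi_memo *next;
        const char      *symbol;
        void            *fn;         /* memoized dlsym result */
        ffi_cif          cif;        /* memoized call interface */
    } ffi_memo;

    static ffi_memo *memo_list;

    /* Look up a foreign function, preparing symbol and CIF only once. */
    ffi_memo *ffi_lookup(const char *symbol, unsigned nargs,
                         ffi_type *rtype, ffi_type **argtypes) {
        for (ffi_memo *m = memo_list; m != NULL; m = m->next)
            if (strcmp(m->symbol, symbol) == 0)
                return m;            /* hit: reuse fn and cif */
        ffi_memo *m = malloc(sizeof(ffi_memo));
        m->symbol = strdup(symbol);
        m->fn     = dlsym(RTLD_DEFAULT, symbol);
        ffi_prep_cif(&m->cif, FFI_DEFAULT_ABI, nargs, rtype, argtypes);
        m->next   = memo_list;
        memo_list = m;
        return m;
    }

The call site then becomes ffi_call(&m->cif, FFI_FN(m->fn), &result, argvalues), with all the setup cost paid once per symbol.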
Friday, August 20, 2010
Staging C
I should get rid of the intermediate assembly and just be able to output staged C. For that, I need two things: 1) compile-time support for evaluation, 2) the ability to inject C source code into a combinator.
Looks doable.
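For the injection half, it should be enough if the combinator representation grows an optional field carrying literal C which the code generator splices into the output verbatim. A hypothetical sketch; the runtime calls inside the snippet are made up:

    /* A combinator that may carry injected C text instead of a compiled body. */
    typedef struct comb_def {
        const char *name;
        int         arity;
        const char *c_body;          /* NULL for ordinary compiled bodies */
    } comb_def;

    /* E.g. a native text-length primitive; mk_int and arg_text stand in
       for whatever the runtime actually provides. */
    static const comb_def text_length = {
        "text_length", 1,
        "return mk_int(strlen(arg_text(a0)));"
    };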
I read a bit on Dalvik. I wondered about the performance issues it had, or has (there is no clarity on that), and about compressed vs naive byte-code, small vs large register sets, and stack-based vs register-based optimizations.
First question: is Dalvik slower because compression gives more levels of indirection? Second question: is Dalvik slower because a smaller instruction set is easier to JIT? Last question: is it just easier to optimize a stack-based machine because liveness analysis of objects is easier? Or is it all just the natural course, and this simply takes several years?
At the same time, I write too much, since it takes too much time to build a new bootstrap. I need to set up a chain for that; now I spend too long waiting, fixing a bug -or two- and manually updating files and changing directories. (Basically, a bootstrap needs to be able to compile itself twice over before I have any guarantee the damned thing works.)
Log 082010
Nice date. A poster on LtU brought up extensible interfaces and convinced me. It just doesn't make sense not to have them, given the added expressiveness they bring.
Looked over the sources where I decrement stuff and patched those. Thinking about interfaces, performance, the current slow manner of identification (ranges vs maps), and a new syntax directed design for the type checker.
Thursday, August 19, 2010
Log 081910
Refactored a bit; got myself a brand-new bug somewhere. (I added proper lexical support for all other primitives, so now in expressions like 'n-1' it applies 'n' to '-1'. Should fix that type checker...)
Wednesday, August 18, 2010
Log 081810
Ah well. I looked over all the stuff, and there's only one option: start where it all started and rewrite each module for performance/compliance. The good thing is that I still don't think there's anything really broken about the compiler; it still just boils down to rewriting it with the right data structures and implementing the right optimizations.
Did some light refactoring. For historical reasons, a file used to be referred to by its name; now, since I bind to C, files are just file pointers, so I made every reference to a file a file pointer.
I am wondering whether I need interface files which export definitions. It's just that I notice it ain't nice to reimplement code and think over the abstractions made, again. I wanted a light language and thought, a la Java, that interface files are a thing of the past. Not sure anymore.
Implemented long, float, double, hexadecimal support in the lexer.
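Roughly the classification involved, as a C sketch; the token names and the 'l'/'f' suffix convention are assumptions, not necessarily the Hi lexer's actual rules:

    #include <stdlib.h>

    typedef enum { TOK_INT, TOK_LONG, TOK_FLOAT, TOK_DOUBLE } token_kind;

    /* Scan a numeric literal starting at s; report its kind and its end. */
    token_kind scan_number(const char *s, const char **end) {
        char *e;
        if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
            (void) strtol(s, &e, 16);             /* hexadecimal */
            *end = e;
            return TOK_INT;
        }
        (void) strtod(s, &e);                     /* covers 1, 1.5, 1e3, ... */
        if (*e == 'l') { *end = e + 1; return TOK_LONG;  }
        if (*e == 'f') { *end = e + 1; return TOK_FLOAT; }
        *end = e;
        for (const char *p = s; p < e; p++)       /* '.' or exponent: double */
            if (*p == '.' || *p == 'e' || *p == 'E')
                return TOK_DOUBLE;
        return TOK_INT;
    }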
I don't give myself enough kudos: I compiled most of it with the type checker turned on, and it caught most trivial bugs.
Tuesday, August 17, 2010
Log 081710
Got a release ready, uploaded it and posted a message on LtU, where it seems most people are on holidays anyway, so I guess it'll be a whatever.
The good thing is I now also have a clean separation between the sources of the compiler and the runtime. Looking at what to do next.
Slow by Design or Slow VM?
I am running the bootstrap now with most of the diagnostic debug prints gone. (I should add a debug primitive.) And man, it feels sllllooowww.
The thing I puzzle a lot about lately is whether the compiler is slow, or the VM is slow, or both. Things against the VM: it uses 64 bits for everything, doesn't compile type tags to some integer representation, hogs memory as a result, matches too much on data structures, and uses combinators and FFI for everything. Things against the compiler implementation: wrong data structures, and it generates just too much code! I use AST transforms for just about everything; that might have been a wrong choice.
Anyway, got the bootstrap release almost ready. Seems like at least another year's worth of work to get it into any acceptable shape.
That's another thought. This started out as a learning project, I firmly believe in doing stuff oneself because otherwise you're probably not even close to understanding what other people did. But what is the total effort which went into Haskell/ML each? Just adding up the numbers of all ML and Haskell distributions and all people involved: Around a few thousand man years?
(Ah well, doesn't change a lot. I knew the trade-offs; the target should still just be somewhere in between Java and Lua speed.)
Monday, August 16, 2010
Log 081610
Start of a new day. I'll get that release ready; shouldn't spend too much time on it. As compilers go, I doubt there will be a lot of interest in a pre-release of a bootstrap compiler.
Removed most debug prints from the code. Left the type inference debug prints since I'll work on that for the coming period. Removed parsing of series statements, since it looks like it actually increases the number of bugs made by programmers (me).
It takes some time to build a proper bootstrap. Also, I guess I should first use it myself a bit before releasing it. If anything, the gain will be that at least I will be using nicely numbered editions of the bootstrap compiler. There are a number of small things in it which I want to fix anyway before rewriting the typing code.
Sunday, August 15, 2010
Log 081510
I decided that the biggest problem hindering further development at the moment is just the lack of a good chair.
Worked a bit on the Google site. Looked at name registrars. Since when do I need to become a valued reseller? I just want a simple redirect for a few bucks?
Just before I went with the wrong registrar, I found out it can be done easily within Google Apps. So I did that and registered "hi-language.org". (Would have preferred hic.org or something shorter, but, ah well, good enough.)
Tinkered around for a few hours to get the site publicly viewable (where's the "make this site public for all" meta-button?), but that worked out. Filling the site with content.
End of day. Site is filled. Not entirely happy with it, could have been more formal at points and clearer at other points, but I guess it'll just do. Next task: clean-up the bootstrap compiler and build a release.
As far as Google Apps/Sites goes: there are a few quirks, and Blogger seems to handle cosmetics better than Sites at the moment, but to get a website up and running in under a day? Cool! Kudos to Google.
Saturday, August 14, 2010
Log 081410
Finished the HTML snippet tool; it works. Read some of the comments on the P!=NP proof: there's no separation in his construction. That's all secondary.
Real questions are: What licence? What to publish? Everything, or just the C sources of the bootstrap?
Friday, August 13, 2010
Log 081310
Didn't do a lot except for tinkering with a mock-up for a new website. Which immediately gave a new problem: How and where to host? Static website, dynamic website, what software?
This is not something I want to spend a lot of time on. Looked at Google pages, maybe that's a good answer?
So, I tried it out. I, of course, couldn't get the intended design into one of Google's templates, but that might have been a good thing. So, working on the Google site for now.
Also, Google doesn't support uploading or using JavaScript. So I am writing a simple translator application in Hi to emit HTML snippets from the lexer. The compile, of course, failed somewhere.
Hmpf, I am going to rewrite each and every function till it's so painstakingly obvious and fast until I drop dead somewhere.
(It turns out I should keep better track of which compiler I am using. It parses, but doesn't support, series of statements like "E0;E1" yet; I usually use `unnamed' lets like "_=E0;E1" for that. I actually think I am going to get rid of series statements for the moment: convenient, but they clutter the bootstrap. Exception throwing works.)
Monday, August 9, 2010
Log 080910
Nice date. Not doing a lot, need a push. I printed the lot and some code just looks wrong and rushed. Some refactoring and some rewriting is necessary. But, can't say I am really up to it at the moment.
Stats on the source: two hundred and fifty pages of source code in total, being fifty pages of system library and utilities, seventy-five pages of AST and parsing, fifty-five pages of semantic analysis, sixty pages of conversion to code, and about ten pages handling modules, options, and the main routine.
Saturday, August 7, 2010
Log 080710
Definitely full conformance first, which means I want the whole bootstrap to be accepted cleanly with type checking turned on. On some modules, I am forced to turn the type checker off because of: 1) a bug in kind-checking multiple instantiations during typing (no idea, the sources look correct); 2) small let-definitions not being accepted because, when generalized, they don't type check for lack of type information on the values worked upon. The latter is not a bug; it just means I am probably better off not generalizing let definitions.
Friday, August 6, 2010
Log 080610
I took a few days off after the bootstrap. Not sure where I am heading from here. Performance is an issue; so is full conformance to the spec. It makes the most sense to just finish most of it (add support for literal longs, floats, and doubles; solve a logical error in exception throwing) so I can get rid of printing most of the diagnostic information.
The logical error in exception throwing is simple. Throwing constants works, but for some reason calculations in the body of a thrown exception are not run fully to completion, meaning I probably garbled something there. (Ticked that off, I think; looks like I forgot to supply the exception handler inside throw statements.)
I tried to optimize ackermann by inlining some functions, which doubled its speed. So my assumption was right: most time is spent setting up thunks (of which it sets up too many at the moment) and making libffi calls (which aren't memoized at the moment).
Since the compiler spends most of its time printing stuff at the moment, where every char is a libffi call, I guess it's correct exception support first, followed by memoization.
Wednesday, August 4, 2010
Bootstrap!
In a poof of logic, the Hi version 0.1 -not even wrong- compiler reached a fixed point and became self-aware on Wednesday, 4 August 2010, at 14:44:44 CEST. It subsequently angrily asserted that it was never his choice, would object to relevant authorities and his big brother C, daintily deflected all questions towards his mental status, kicked the author for thinking he could be a compiler writer, and glumly sat itself in a corner while eating his donuts.
Bewildered, the author made a back-up, which complained too.
Combinators, Dots & Assembly
I missed a triviality. The dot notation used internally doesn't allow, like a let does, for a value to be reused in different places locally. The end result is that all local computations are made global ones by eta-lifting the local term and applying that to the local computations.
Is it the end of the dot in my compiler? Not sure; it gives a very easy and readable translation, and I planned on optimizing the back-end anyway and introducing inlining there, which would subsequently also inline all simple, possibly once-local, computations.
(Global computations are expensive since they introduce thunks which need to be called and collected.)
Log 080410
It compiled again. Running it on the tests didn't give escape bugs anymore. Now running it on the sources; in two hours I'll see if I can get a proper fixed point.
Still a bit worried about performance. I know I can trivially get it to run in 15 minutes instead of 15 hours; that's a big gain which will help in developing it. Looked at SPJ's talks and mused about his comment on having naive compilers in the eighties. I knew all that; I never assumed I could get OCaml or GHC speed out of it with the current compilation scheme, but the numbers at the moment hint more in the direction of Perl and Python than my aim: somewhere in between javac and Lua, or as fast as a standard Lisp compiler.
Question is: is the combinator-based translation the bottleneck? If so, it doesn't make sense to try and build an i86 back-end for the pseudo-assembly.
(Maybe there is no problem? Not sure, it spends a lot of time printing, which means concatenating large character strings which is just more expensive than traversing a simple AST a few times. I'll see.)
Later: The two hour compile succeeded. The bootstrap does run on itself. Diff shows minor differences between the C sources of the two compilers (as far as I could see, backslashes instead of quotes). It looks like it does compile itself again. Going for cigarettes, see if it gives a fixed point after that.
Tuesday, August 3, 2010
Monday, August 2, 2010
One Million 'What-If's...
I like to run 'One-Million What-if' scenarios through my head to see if I am still going in the right direction:
- What if the sources are 1MLoC?
- What if the sources are one function of 1MLoC?
- What if the sources contain 1 million definitions?
- What if the sources contain 1 million classes?
- What if the sources contain 1 class with 1 million instances?
- What if the sources contain 1 text constant of 1 million characters?
- What if the sources contain 1 identifier of 1 million characters?
- What if the sources are in 1 million files?
- What if the sources contain a function which calls itself 1 million times?
- What if the sources contain 1 constant which holds 1 million other constants?
What if? Most of it is theoretically covered; as long as all transformations are linear on the AST it should be fine, except for typing... That needs to become a plainly stupid algorithm to cover a one-million-sized definition.
Log 080210
There's a reason escaping functions usually live in the system libraries of most programming languages and a bootstrap compiler shouldn't handle escapes directly itself; I was aware of that. So, old lessons still hold: it's just not worth it to debug this. Removing escaped characters from the compiler.
Rewrote some of the code; the sources of stage two are 'escape-free' now, except for some printing commands. So whatever escape bug I am gonna have is going to be a logical error.
[ It turned out typing was turned off in the makefile. Turned it back on; it caught a number of minor mistakes at the expense of really long compile times. ]
Sunday, August 1, 2010
Bootstrap?
The compiler of stage two is now running on its own source...
Two hours later: It took two hours for the Hi-to-C compiler to compile itself, split evenly between compiling to objects and linking those objects. So it's one hour of compiling to objects and one hour of linking, where the majority of the time is spent printing diagnostic output. The compiler itself is slow in some parts due to ill-chosen data structures. Guess I need to find a factor 50-100 speedup somewhere to get to reasonable performance; not nice, but doable in the current setting.
Of course, just compiling with -O3 already makes it twice as fast. Another 50% or more off by not printing diagnostic information, 20% by using simpler tests on record structures, 20%-30% by using incremental garbage collection, plus memoization of FFI. That's about 1/(0.5*0.5*0.8*0.8*0.8) ≈ 8 times faster, still about a factor of ten off target. So I would need about three extra optimizations which each take 50% off. Real texts instead of character lists, inlining and partial evaluation, specializing libffi, better internal data structures and faster algorithms in the compiler?
It generated C which compiled; guess I now need to rerun it to get a proper fixed point out of it.
Later: Ouch, that failed immediately. It doesn't lexify right; looks like I still have an escape bug somewhere. Confirmed that; I guess I fixed something somewhere which wasn't broken in the first place. It's a puzzle, actually.
So, everything is set except for that escaping bug, which is a nasty one since it may have been introduced as early as stage zero, give wrong results, and infect the subsequent compilers from there. No other choice than to unit-check from stage zero up to stage two to see if it handles all characters right.
Gawd, now even I am confused, lemme see:
Gawd, now even I am confused, lemme see:
- Stage zero, a Hi compiler written in ML producing ML.
- Stage one, a Hi compiler written in Hi producing C, linking to an ML runtime. (stage zero, rewritten)
- Stage two, a Hi compiler written in Hi producing C, linking to a C runtime. (stage one, different native bindings)
- Hi version 0.1, not even wrong. A Hi to C compiler (stage two bootstrapped).
Still computes, but not what they taught me in school.
Later: Don't feel like debugging again. Stage one passes some unit tests, stage two doesn't? Back to C&C; see tonight what went wrong.
Later: So, \' becomes ', while \" remains \" but should be just " in stage two. Can't figure out who does what wrong. Is it stage one not matching correctly, or a bug in stage two? The code looks correct.
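This is exactly what the stage-by-stage unit check should pin down. For reference, one unambiguous unescape pass, sketched in C (not the bootstrap's actual code):

    /* Resolve \' -> ', \" -> ", \\ -> \ in one pass. */
    void unescape(const char *in, char *out) {
        while (*in != '\0') {
            if (in[0] == '\\' && in[1] != '\0') {
                switch (in[1]) {
                case '\'': *out++ = '\'';  break;
                case '"':  *out++ = '"';   break;
                case '\\': *out++ = '\\';  break;
                case 'n':  *out++ = '\n';  break;
                default:   *out++ = in[0]; /* keep unknown escapes untouched */
                           *out++ = in[1]; break;
                }
                in += 2;
            } else {
                *out++ = *in++;
            }
        }
        *out = '\0';
    }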