Wednesday, June 30, 2010

Log 063010

Writing serialization code. It ain't too nice, should look at betters ways of doing it, just using a piece of malloc'ed memory for a buffer and some serialization primitives - which I should do better.

Wrote a faculty function on longs, debugged it and it got accepted. Got too much boilerplate code now on various types in the C runtime, thinking about removing references to uint8, .., sint64, hi_char, stuff like that - though some people might just like that approach, not sure. Principally, I only need C types and the Hi internal word size.

I should serialize w.r.t. to a serializer, and make it possible to drop in different ones.

Tuesday, June 29, 2010

Foot Hurts

Shot myself in the foot again. Good news, the system file of stage2 compiles, there don't seem to be bugs. Bad news, I use integers of system width... Which means on the somewhat awkward i86_64 architecture, ints are 32 bits wide, and pointers are 64 bits wide, which means that in my stage2 compiler, at some places I can't use ints to represent bits, but need longs. Which is fine, but hell.. On the good side, it's gonna be robust, since I need to cover all corner cases. (It's just adjusting a type definition and some code.)

(Notice I did change the semantics of the language somewhat the last weeks by not coercing values anymore. I am not sure about that step, but I thought it'll probably work out better if I map types to C types directly since I want to remain very close to C.

Which I am still not that sure of:
For: Type safety. One doesn't coerce that often (unless you link to C....)
Against: Loss in speed in some cases due to explicit conversions, I work with 64 bit word sizes anyway. The need for different sizes in C is not hogging the memory and speed, both I don't care about. Difficult to optimize the conversions away since everything I do is done through FFI anyway.)

XText

I like Eclipse, I guess I might use Xtext. Looks like it has all I need. Problem is, the parser I have is not fully LL, but I think I can make it LL by factoring out some stuff. (I cheated for some instances like 'pattern = expression; expression'.)

Monday, June 28, 2010

Log 062810

I write too much bullshit. Anyway, had a funny bug, somehow, somewhere, sometime, some integer comparison gave the wrong result. For some reason, I thought the ints in the Hi language were 64 bits, but they are not - all types follow the same convention as C, so char is a system char size, int is a system in size, float is a .... (I don't want surprises there, it's supposed to be a glue language.) It's just that internally the compiler uses ints of pointer size, it needs to, for the simple memory model I have. So, changed some routines to 64 bits integer, thought "Whoa, ho there!", changed it back. Bug gone.

Which leaves getting some of the dynamic libraries I link against right, for that I'll just follow the Posix C standard.

How to Implement: Var?

I gave it some more thought. If I ever want Hi to become anything more than a marginal language, I need state... So, for later: How to implement var? And at what level...? (And what to do with multicore, I explicitly want to avoid the need for locking....

Still, monadic programming in some sense is nothing more than programming w.r.t. to an encapsulated environment. There should be a manner to let that environment be mapped to a namespace. For example, each function when invoked keeps around a (fresh?) copy of the variables within that namespace?

Namespace bound variables seem too cumbersome to me, so all variables will just have the same life-time as a heap. Instantiate a new heap, and all variables are set to their initial value.

Integer Code

Hi code is compiled to lambda expression, lambda expressions are compiled to combinators, combinators are compiled to an abstract assembly which only knows how to do one thing: manipulate, inspect and adjust, arrays of integers of pointer size width. Arrays of integers are primitives of the assembly.

In order to do end up with only integer (bit) arrays, all primitives are compiled to them, which, in all respects, works great, and is just a cool thing to do.

In the array language, it is determined which arrays are constants, and which arrays are dynamic, and in one pass the code is generated which explicitly allocates dynamic arrays. (The latter is an easy check, if an array operand is somewhere filled with a register, or is a record array holding a null pointer, it is dynamic.)

The assembly assumes an infinite amount of write-once registers, which means I don't translate to loops, as of yet, and except for dynamically allocated arrays, all registers hold constant values.

I assume C can optimize a lot out of that, but C has some problems handling the static data ('near initializer isn't a constant,' solved for now), and I am not sure C understands all invariants I am trying to communicate to it.

I need my own optimizer.

Code Sizes and Performance

I am looking at my generated code, and yes, it produces a lot. I'll stick with it, for now, but I can but help wonder about the bloat of it. Below, a gross oversimplification.

Say you have n number of constructs, which binary encoded, take N bits. Say your program consists out of m of these constructs. Say for each of these constructs you need E bits to evaluate them. Then the difference in size between interpreted and compiled code is the decision between m*N+n*E (the binary size of m constructs + the size of the interpreter) or m*N*E (the binary size of compiling each construct). This balance is skewed in the sense that N*E can be made small, so I should concentrate on that I think.

It affects speed too. Given caches and all, if the size of the interpreted code is sufficiently smaller than the size of the compiled code, the interpreted code just might run faster.

Second, a change I'll implement anyway. I generate a lot of constants, my assumption is that the C compiler is smart enough to collapse a large number of the same constant to one unique constant. I think I was wrong, the const tag in C denotes a read-only value, not a constant value, so I guess -a minor adoption,- at some point I'ld need to factor constants out myself.

Last thing is locality. If a group of functions define a module, than that group of functions should be placed close together in memory, such that the program doesn't jump 'all over the place.' I think I am good in that respect, the namespaces help there too, but I might look at select statements which are general, and used a lot.

Log 062810

Woke up and most system file unit tests pass, which leaves writing serialization code.

I found some strange bugs in the unit tests, which shouldn't have been there? Might have been patches of code against operational semantics bugs of the previous compiler.

Doing It Right

My last compile halted, with a 'This is a message from C to man: Don't place necessary code into DEBUG statements.' core dump.

Which gave me some time to think, and I finally figured out how I am going to do concurrency, or I think I might go in this direction. No locking, no nothing, just tasks. I'll need three primitives: thread f (treat this function as a thread), apply f (apply this thread-like function, send me back the result, and dispose f (you're time is done). A fourth primitive ready f, is to coordinate the results, see if the thread is done yet.

Now f can't be an ordinary function, what I want is a function which can carry its state through several computations. So, coalgebraic functions a gogo: this is approximately what the interface of the thread module will look like.

type threadable = \i \o => i -> (threadable i o, o) // Grin, this typechecks in Hi
type thread_t = \i \o => [ whatever I need here ]

def thread: threadable i o -> thread_t i o
def apply: thread_t i o -> i -> o
def ready: thread_t i o -> bool
def dispose: thread_t i o -> unit

The idea is just to inject a coalgebraic function into a heap/piece of memory, and supply it as many times an argument as you need it. If an argument is supplied, a thread runs the function to completion, returns the result, and puts itself into a new state. The ready routine can be used to check whether a thread is ready to accept new computations.

The thing is entirely impure, but who cares. Also bad is that threads, as far as I am concerned, may migrate between processors, not sure that is good or bad.

It's a pretty raw idea, but this is the direction I guess I am heading.

(I doodled a bit further with this idea, just to see what goes where, and which primitives I need.)

The thread routines allocates a thread structure outside of the heap. Both input and output must be serializable and be placeable into that structure. A new heap is created, the function f is injected into the new heap, together with main/exit/exception wrappers which also refer to the thread structure.

The apply routine places the return location, a new root in the original heap, into the thread structure. It also copies the argument into the thread structure, and starts the [main . . ] f p computation. Main copies the argument out of the thread and runs f.

When the function f finishes, the address of the result is stored, the result is copied to the thread structure and the heap is put back into the accepting state.

The return routine checks if a result is made available, if so, it copies it, wipes the thread structure, and places the thread structure in the ready state, and returns true. If no result is there, it returns false, if the thread structure is in the accepting state, it returns true.

The dispose routine stops the thread and deallocates the thread structure.

It makes the most sense to see a heap/environment as a 'thing' which just holds a function which takes input and produces output - i.e., I need to build everything such that they coalesce.

It's still too much copying, hmmm....

Sunday, June 27, 2010

Dots and Faculties

While I am grappling with my mind, and the compiler happily crunches on some Hi files, I am writing on Dot evaluation again. Decided to write a simple interpreter for it in C also, never bad to be able to fall back, and it makes sure I got all invariants right. One thing I got wrong, not in the compiler, is identity. I x = x and J x = [@ x] just are not the same functions since J cannot `handle' simple constants.

Saturday, June 26, 2010

Log 062610

The stage1 compiler compiles the system file again, together with some of the unit tests (Hello world, Fac, etc). I guess I am gonna unit test myself through the stage 2 compiler again.

Sigh, great. It accepts and generates correct code on all simple tests, but there's a recursion bug in the system file? I don't get it. What happened to induction? (It may be the FFI combinator, guess I'll check there...)

062710: There's a reordering bug, shouldn't be hard. Seems fixed, I had it before and thought an easy fix was just to push in a LIFO queue. Silliness. Dots for thunk stacks should be pushed FIFO, but in thunks -the arguments- LIFO.

Proposal for Macros

Great to be talking to myself again. A small proposal to get my metaprogramming/template thingy working.

I am not going C-style templates, neither am I going the Haskell way, rather, I guess I am going enhanced LaTeX, i.e., I am going to work on the syntactic level with a C-preprocessor like syntax.

I.e., the only thing I am adding is a #define primitive, but with a twist. A define macro defines one in a series of rewrite rules, define macros pattern match on the output of the tokenizer. Also, a #definition macro expands a defining construct in scope to its series of tokens. And I guess #if, a rule is defined, #ifndef, if no rule is defined, #else, yeah, what else, #undef, remove all rules, are also trivial extensions.

Example:

#define add_guard('[', v, '->')  '[' v ': if' v '> 10'


#define fac('1') '1'
#define fac(n)   '(' n '* (fac (' n '- 1)))' 

Not too sure, it might be too verbose in its usage, but it is simple, and Turing complete anyway.

Yeah, sure of it. Convenient well-know syntax, easy semantics, and just as type-safe as the Haskell proposal since the result is type-checked anyway. (Ok, the last is almost true ;) )

The sun shines, the sky is blue, I didn't sleep, again?

Staging

I have three compilers:

  • Stage 0, a Hi to ML compiler written in ML
  • Stage 1, a Hi to C compiler written in Hi which links to an ML runtime
  • Stage 2, a Hi to C compiler written in Hi which links to a C runtime
Stage 0 has been stable, and very slow, for ages now. I have tried to keep stage 1 and 2 as close as possible to each other (only difference is the system file) to get to a proper bootstrap. Stage 1 is even slower but getting to be stable now. I am not going to think about stage 2 for a while, and concentrate on getting a stable stage 1 compiler, and just rewrite whatever needs to be rewritten of stage 2 at a later point because of the marshalling / serialization / reflection code.

Friday, June 25, 2010

Marshalling, serialization, introspection and metaprogramming

The system file, which holds -among others- all primitives, compiles again, and I am looking at how everything w.r.t. the new runtime holds.

I have a compiler which mimicks gcc in the sense that it compiles 'compilation units,' which are files at the moment. I marshal precompiled ASTs in between. Now, for technical reasons, the runtime doesn't play nice with marshalling. Basically, since it needs to insert an unknown number of nodes into it while making an FFI call. Besides that, raw pointers aren't chased since they are seen as series of bits. The, rather clumsy, way around that is just to reserve enough space and do a collect before unmarshalling a structure, and regarding pointers, well, the same trick.

(Why not do it better than that like a-la a Java VM? Well, I am doing translation to, and easy interfacing with, C, and doing it better comes at a costs. I now only try to collect in between combinator rewrites, and that works well and has a very low overhead compared to a 'try-to-collect with each pointer update approach,' which is the only other way to get big structures inserted. A lot of fine-grained combinator rewrites does the same, but leaves my very simple runtime intact.)

I don't like reserving space and I don't like the unsafe pointers.

So, I Figured out the raw pointer problem, I'll 'expose' raw pointers in the heap and just chase them if they point into it, it's the only sane solution.

As far as marshalling goes, As stated before, the other option is to serialize data, which means writing (un-)serialization code in the programming language. I like that manner, it is probably slower, but I don't need to get involved in making the runtime more complex.

Writing serialization code is boilerplate, it is a lot uninteresting error-prone code (the AST has a hundred plus nodes). Since I have interfaces which mimic Haskell's type classes, I'ld like to implement a 'Deriving' feature. Bad thing about that is that it means making the compiler more complex, and I'ld need to implement routines for each possible deriving type of interface.

Another option is to implement introspection code and generate serialization code from that. Bad thing is, since I don't save that much type information and serialization depends on that, I, well, can't really.

Which leaves the last option: Metaprogramming. I have been thinking about implementing a preprocessor a-la C. I am not really that much of a fan of it, it opens a way to abuse, but given everything, that really is the best option since it would solve all my problems, while opening up a lot of interesting features for programmers.

But, it is for later, I'll write some scrap code now.

And I figured out how to do it too, a neat trick. I'll just expose (parts of) the AST as data compile time, and rely on the partial evaluator to optimize the result. That means I (probably) won't generate, or insert, strings in the AST, which means something somewhat less than C templates, but it'll mean I can scrap enough boilerplate code from the compiler, which means, by absurd induction, it'll probably be enough to solve six sigma 99.99..% of all cases.

It boils down to something easy -a primitive which exposes/inserts part of the AST compile time,- so a half-baked, compile-time introspection, no real meta-programming to speak of. Just enough not to write down all definitions explicitly.

I toyed around with some pseudo-code, imagining the resulting code. I cannot produce an explicit test? Or at least, not without generating AST code, I guess, somehow. So, reflection is just not enough... Template meta-programming, going back to the syntactic level?

Scanned the Meta-programming in Haskell paper. Except for monads (yuck), it just follows the introspection/generation pattern. Good: Stuff is typed. Bad: Not a lot of 'syntactic' sugar, and no syntax tricks on the meta-programming level. Explicitly building an AST is just, well, cumbersome. Sometimes, it's just more convenient to run syntactic tricks. See if I can do it differently, go a step back, and just work on the token level.

062610: Man, I assumed the heap garbage collector was obvious, but I guess I'ld better check all invariants there too. Added the pointer chasing code, checked the invariants informally against a pseudo algorithm of classic Cheney. They seem to hold.

My 't' is stuck in Dutch, and my 's' is stuck in English.

Thursday, June 24, 2010

Performance Day

The new set-up with the combinator intermediate passes the runtime checks. I decided that for the hell of it I'ld patch together some performance tests to run after the system file compiles. I want to see how it compares to the other benchmarks as posted on the Factor blog.

Wednesday, June 23, 2010

Log 062310

I decided it just doesn't make sense not to implement a combinator intermediate, so again, a zillion steps back, one step forward.

It's an annoying step. I've been going backwards and forwards between two representations, and thought it would just be the easiest to maintain some invariants in lambda expressions. In the end, it just doesn't make sense. The invariants are ok-ish (a series of applications is a thunk), but don't deal well with corner casess (one single combinator should set up a thunk too). The approach isn't faster (compiling to a simpler intermediate is better than transforming complex ones), and in the end, an intermediate for combinators is just more straightforward, better analyzable and better readable, even if it means introducing some new code.

062310: It's mainly refactoring in the end. So, explicit code came out fluently, it cleaned up.
062410: Decided again that sleep is a waste of time. Generated code seems to pass the basic runtime checks; next, see if it compiles the system library.

Monday, June 21, 2010

Log 062110

Sigh. Headaches and it just makes too much sense to implement a combinator intermediate...

Oh man, stupid.

Sunday, June 20, 2010

Log 062010

FFI works again, fac compiles again. Still weeding out small bugs, no deal breakers yet. More optimizations came up, for the future, memoization of DL separate from FFI calls, i.e. break it into two combinators and memoization of FFI.

Friday, June 18, 2010

Log 061810

Programmed a bit further, factored out the bits conversion. Code looks clean again. Looked at the CAM machine.

There's, of course, an informal proof that termination of eager reduction is good enough to prove Turing completeness of Dot. Since the SKI is expressible for normalizing terms, and normalization equals a normalizing eager algorithm which just rewrites an encoded term lazily, it is Turing complete.

QED?

Thursday, June 17, 2010

Log 061710

The first few unit tests compile and link, just a few bit representation errors in the code. Weeded out some trivial coding errors, like you shift 8 bits or you multiply by 256, but you don't multiply by 8. The compiler doesn't handle the new FFI interface yet, so I am implementing that. Bit stuck at which interface boundary I should shift from handling primitive values to series of bits representation. Before, or after, I compile down to lambda terms? Silly thing actually, doing it before means less code but difficult to debug.

Should put the conversion of primitive data to series of integers in a separate module. Make sure I implement the reverse conversions too, and I am set.

I read a bit more on the G-machine, had forgotten half of it, and was a bit doubtful I wasn't actually implementing the same machinery. I am not, it is close but different. I was right. Where the G-machine reduces term expressions of the graph representation of a combinatorial term, by traversing the graph, and storing back links, I reduce directly on a stack representation, it should be faster.

Bit thinking further on the Dot model. I actually don't use it internally, but just refer to it as a mental model, and compile straight from 'combinatorized' lambda terms to code. It's hard to maintain the right invariants that manner, should rethink, whether, at some point, it is right to just add an intermediate combinatorics layer. Unless I run into big debugging problems, I guess this is for later.

Wednesday, June 16, 2010

Silly Kotcha

I was writing some informal stuff on the Dot notation again, just to finally settle it and prove some invariant. It's pretty light-weight what I did, no real math as of yet, there doesn't seem a point in formalizing a lot of the arguments to the bone since it is trivial enough as it is.

One thing which is nice though, is how to prove Turing completeness. First, there's a somewhat trivial argument that the Dot notation and evaluation is Turing complete, since S and K are trivially expressible, and eager evaluation will, by the confluency theorem, normalize terminating computations.

But, I wonder, is that good enough? I am not sure I actually proved Turing completeness that manner since I only showed it then for terminating terms. It would be better to show that it is possible to restate S and K such, that the eager reduction of those will correspond to the lazy evaluation of the original symbols.

I.e., K can be expressed as [K x y] = [@ x] and S can be expressed as [S x y z] = [@ x z .] [@ y z]. It should be easy to restate them lazily, since it is well known how to defer evaluation in an eager language by abstracting terms and suspending evaluation until a dummy argument is supplied. So, I am looking at S' and K' which would do that, or possibly, by restating the evaluation of application to a lazy form.

Below, my funny thoughts using Blogger as a scratch pad. Don't read if you don't like hazy thoughts.

So, an initial thought, since lazy evaluation reduces the head of an expression, which should be equivalent to reducing the last term in the Dot notation, K' should become [K' x y d] = [@ x d], but what should S' be?

Tinkering further, A for lazy apply should become, [A x y d] = [@ . y d] [@ x d]? Hmm, shallow encoding doesn't work...? Tinkering along, see no reason shallowness shouldn't work, apply should do it... Maybe need a defer operator and express it in SKI directly? [D f d] = [@ f]?

Lemme see, I think I nailed it. First, I need to show that SKI can be translated to a deferred term such that when that term is applied to a dummy argument, we get the original term back. Second, I'll need to show that that term eager reduced is equivalent to lazy reducing the original term.

So, let A x y d = (let x' = (x d) in x' (y d)), making the evaluation order explicit, and let D f d = f. Then let the translation function T be defined as: T(K) = D K, T(S) = D S, and T(x y) is A (T(x)) (T(y)). Then,  T(E) d = E, since:
  • T(K) d = D K d = K
  • T(S) d = D S d = S
  • T(x y) d = A (T(x)) (T(y)) d = (let x' = (T(x) d) in x' (T(y) d)) = (T(x) d) (T(y) d), thus by induction is (x y).
Next step... This is a bit fuzzy, it would be better to write it down in normal arrow notation, but whatever. The only relevant case is the A definition. Since in the A definition the let variable, which is the translation of the function, is reduced first, and later applied to a reduced argument...? Hmm, it breaks here? Wrong A/translation? The argument should be reduced whenever it comes in head position, not earlier. Ah well, tinker away tomorrow again. Guess I need both a suspend/defer and an evaluate combinator.


Hmmm....

Log 061610

Another nice date. I changed the ocaml runtime to support 64 bit native integer, which turned out to be trivial. For some reason, 64 bit integers are not easy to handle in C, there are small errors in macros and type conversions, no idea, I am basically hacking that code until it passes the unit tests which is always a bad thing. I weeded out some bugs in the compilation of Hi to C, no major issues there at the moment, just some minor coding errors and some unfinished stuff. I minimized the runtime further, guess I got rid of some 2kLoC. The generated code for the first Hi unit test links with the new runtime, with some minor bugs.

Ah well.

Sunday, June 13, 2010

C-Machine

I thought of a name for the setting described in the last post, and since there is a G-machine -where G means Graph,- I thought I would name it C-machine, somewhat reminiscent of that. C, of course, a double reference to the language compiled to, and combinators, since that is what it essentially is: A Combinator Evaluation Machine in C.

Thing is, I don't really know yet where to state the formalism, as a reverse-polish stack machine close to a normal machine including K and RT pointers and an understanding of primitive values, or as just a combinatorial eager evaluation strategy on combinatorial DOT expressions. The difference between those is that in the first it is possible to express Shift/Reset directly, in the second it is not.

Goal is not to reinvent the wheel, but just to have a good understanding of what I did, and see where it fits w.r.t. other machines. (Don't assume I think I did something outstandingly smart though.) I thought about it some more, the G-machine, as an example, is certainly different, since it trivially allows shared reductions for instance. A thing the C-machine, apart for explicit lets, does not.

Guess I'll use the space below to elaborate on both.

A DOT Evaluator

Below, the gist of DOT evaluation, semi-formal. A DOT evaluator is a combinatorial evaluation strategy used as an intermediate notation to make clear how thunks can be set up and reduced.

Note that there is nothing new about this notation. It is a machine-amiable convenience notation used to show, with a few invariants, that it is possible to safely translate combinatorial expressions to C, not as a new notation for combinatorics.

A DOT expression is a syntactic notation for standard combinatorial expressions. A combinatorial DOT expression consists of constants written in lower case, dots '.', combinators in uppercase, and series of squared series out of the former three. A rewrite rule is a squared series starting with a combinator producing a combinator expression, bound variables may occur in rewrites.

Examples:
[I x] = x
[K x y] = x
[S x y z] = [x z .] [y z]
(What is the expression for lazy evaluation of S?)

Basic Rewrite Rules

A DOT evaluator rewrites a combinator with a fixed strategy. Always, the last expression in a series is evaluated. When it reduces to a constant, its value is placed in the first not-nested dot encountered reading from right-to-left. When it reduces to a squared series, that squared series replaces the original term. Evaluation halts when the last expression does not reduce.

The above rewriter gives a fixed rewrite strategy on DOT expressions. Note that the order of the series rewritten to matters.

Example, using some extra notation:
F n = if n = 1 then [1] else [* . .] [n] [F .] [- n 1]

Evaluation of F 3:
[F 3]
= (if 3 = 1 then [1] else [* . .] [3] [F .] [- 3 1])
= [* . .] [3] [F .] [- 3 1]
= [* . .] [3] [F .] [2]
= [* . .] [3] [F 2]
= [* . .] [3] (if 2 = 1 then [1] else [* . .] [2] [F .] [- 2 1])
= [* . .] [3] [* . .] [2] [F .] [- 2 1]
= [* . .] [3] [* . .] [2] [F .] [1]
= [* . .] [3] [* . .] [2] [F 1]
= [* . .] [3] [* . .] [2] (if 1 = 1 then [1] else [* . .] [1] [F .] [- 1 1])
= [* . .] [3] [* . .] [2] [1]
= [* . .] [3] [* . 1] [2]
= [* . .] [3] [* 2 1]
= [* . .] [3] [2]
= [* . 2] [3]
= [* 3 2]
= [6]
Expand and apply rules.

The above rewrite rules are not complete. Reducing a combinator also implies that all extra arguments are inherited by the first squared expression. Also, a Curried expression is treated as a constant. Dots or variables may not occur in the first position, rather, a special apply @ combinator is used where [@ v] = [v], in case the v held a squared expression. In the rules above, the apply is implicit. I.e., [I x] = [@ x].

Examples:

[K x y z] 
= [ @ x z ]
also,

[K . y z] [S x] 
= [K [S x] y z]
= [@ [S x] z]
= [S x z]
Well-formedness and complete reduction
(I need help getting this section right.)

All variables occurring on the right-hand side must be bound on the left-hand side of a rewrite. In the right-hand-side of a rewrite, a squared expression must be bound to an exclusive dot to its right, there may not be more dots than squared expressions.

Since a rewrite substitutes all variables, no variables will occur when rewriting a term which doesn't contain variables; a term without variables will therefore rewrite to another term without variables. Since the last term, during the rewriting of a dot expression, cannot contain dots, since all dots in the last term will always be filled, terms rewritten are always strongly reduced. Since all terms rewritten to always start with a combinator, albeit apply, all squared terms always start with a combinator.

The above invariants are sufficient to guarantee that each combinator can be compiled directly to a C routine, and that, by the confluency lemma, terms who have a normal form reduce to that. (I guess.) (I should show Turing equivalence here.)

Shift/Reset
The shift/reset pair saves and restores a stack position. It cannot be expressed directly, therefore some extra notation with curly braces is used. When a shift is encountered, curly braces are introduced, and evaluation continues in that expression, when a value is determined the curly braces are lost, when a reset is reduced, the whole evaluation continues with that expression.

For example (check with compiler internals):
[ Shift G ]
= { [G .] [Reset] }
= { [G [Reset]] }
= {[K . . ] [ F ] [ Reset . ] [3] }
= {[K . . ] [ F ] [ Reset 3 ] }
= [3]

Anyone talking about the eighties?

Friday, June 11, 2010

Evaluation (Untested)

Below, some calls and combinators which are used in the runtime. It is rewritten, untested at the moment, since I use a slightly different thunk layout than before.

The design is new, I didn't look too much at other implementations. All work is done in one heap, no stack since I didn't want to make the runtime too complex, and certainly didn't want to use the C stack since it is too easy to run out of stack space with a functional language.

The calls are just those primitives I needed in the language, everything else -including integer addition- is handled through FFI. For some applications, that will mean a big performance hit, but that's the price you pay.

The runtime has three sorts of values: Constants, size and type tagged series of bits. Records, size, type and record typed tagged series of pointers. And Thunks, the result of combinator translating lambda expressions. A thunk is translated in a somewhat novel way; I tried not to implement a CPS transform but go for a direct translation which should have enough performance.

The translation is straightforward when you think of it. Everything is compiled down to untyped lambda terms, local computations are 'letified,' but I still think of local computations as if they just push a series of thunks into the heap. A thunk just knows two things: 'Where to store the result of a calculation' which may be a thunk, it is stored into an RT pointer. And, 'What to do next' which is the continuation, it is stored into a K pointer. Since the runtime doesn't allow for pointers into arrays, the RT pointer is split into two parts: A pointer which points to the start of a record or thunk, and an index into that thunk.

An example, suppose you have a computation '\x -> f (g x) 3'. This is letified to '\x -> let a is 3 in let b is g x in f b a' which translates to a combinator S which pushes three thunks into the heap. In DOT form, 'S x = [f . .] [g x] [3]', each respective calculation is stored into each dot reading from right to left. During the translation the bookkeeping for pushing local computations and constants is resolved and instructions which store the pointers to respective thunks and records are generated. For proper constants, actually nothing is done except for placing the constant in the RT and returning K. For thunks, the K and RT fields are copied into the thunk, and new calculations are set up which will evaluate that thunk first after their own evaluation is done.

In pseudocode, a program for the above example would read something like:
[S, k, rt, args] ->
    if args is [] then rt = [S, k, rt, args]; return k
    else x = expand([f, k, rt], args, 1) ; y = [g, x, x[3], args[0] ]; x[4] = 3; return y
The first line is an obligatory arity check. If not enough arguments are presents, the thunk is treated as the result and stored for currying purposes. Otherwise, the thunk for the body of the function x is pushed which is evaluated last according to eager semantics, it also inherits any extra arguments to S; next, another thunk y is pushed, corresponding to an argument of the body thunk, which when evaluated places its results in the x thunk; finally, a constant 3 is stored into the body thunk, and the thunk y is returned as the thunk to be evaluated first.

During translation, all 'arrays' are checked whether they are constant and those are compiled to static data, anything else is allocated on the heap. A main combinator, which calls -erm- main, is wrapped around the whole program together with an exception handler.

I like it a lot more than a CPS transform, though it is similar. Another way of thinking about is it like a G-machine, except that we already reversed some pointers. (The G-machine rewrites graphs, this translation just evaluates rewrites.) There is a straightforward translation for lazy functional languages too, since lazy evaluation corresponds to another evaluation order, it means just a simple swap in pushing K pointers, i.e., the order of 'What to do next' is just different.

Since combinators just push thunks, there is no stack, and they are usually not referenced anymore after they are evaluated, it is unnecessary to treat recursion different than any other function call. And hence, there is no need for tail call optimizations.

Comparison of speed with a CPS transform and translation to native code or a G-machine is difficult. I don't have a stack, so the heap will be used a lot but the translation is so direct that I think I removed the need for what would be a lot of stack pushes in other settings. It should -in theory- be faster than a naive G-machine evaluator since a lot of what the G-machine does is already statically determined. I translate to C, trampoline thunks, use FFI for almost everything, and use 64 bits for integers and pointers. It should be sluggish, but give decent performance with a generational garbage collector.

(But, I don't care too much about performance since it is easy to call C and I guess I am gonna be on par with something like Java and faster than Python. If you need performance, or don't like wasting bits, then you should call C. For example, a StringBuffer class could be implemented through libffi.)

Enough blabbering, below some untested and unfinished evaluation routines.


//** start of file:  eval/eval.h
/**
 *  COPYRIGHT 2010, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  The runtime trampolines thunks - each thunk points to a
 *  C routine which evaluates it.
 *
 *  Layout of a constant:
 *     c[0]     size
 *     c[1]     0
 *     c[2]     type tag
 *     c[3..]   integers
 *
 *  Layout of a record:
 *     r[0]     size
 *     r[1]     1
 *     r[2]     pointer to type tag
 *     r[3]     pointer to record tag
 *     r[4..]   pointers to values
 *
 *  Layout of a thunk:
 *     t[0]     size
 *     t[1]     pointer to C evaluator
 *     t[2]     pointer to continuation thunk
 *     t[3]     pointer to result
 *     t[4]     pointer to integer (index in result)
 *     t[5..]   pointers to arguments
 */

#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>
#include <dlfcn.h>
#include "../types/types.h"

hi_int* __expand(void* v, hi_int *t0, hi_int* t1, hi_int* n);
hi_int* __alloc(void* v, hi_int* n);

hi_int* ___apply(void* v, hi_int *n);
hi_int* ___embed(void* v, hi_int *n);
hi_int* ___shift(void* v, hi_int *n);
hi_int* ___reset(void* v, hi_int *n);

hi_int* __exception(void* v, hi_int *n);
hi_int* __exit(void* v, hi_int *n);

hi_int* __system__reserve(void* v, hi_int *n);
hi_int* __enter_symbol(void* v, hi_int* s);

hi_int* __environment(void* v, hi_int* exc, hi_int* n);
hi_int* __environment_arg(void* v, int argc, char** argv);

//hi_int* __to_pointer(void* v, hi_int* exc, hi_int *n);
//hi_int* __from_pointer(void* v, hi_int* exc, hi_int *n);

hi_int* eval(void* v, hi_int* n);

int streq(char* s0, char* s1);
int strneq(char* s0, char* s1);

#define MAIN \
    int main(int argc, char** argv) { \
        __enter_symbols(env, (hi_int*) 0); \
        environment_arg(env, argc, argv); \
        hi_int* t = heap_alloc(env->heap, 5); \
        t[0] = 5; \
        t[1] = (hi_int) ___main; \
        t[2] = (hi_int) 0; \
        t[3] = (hi_int) 0; \
        t[4] = (hi_int) 0; \
        while (t) { \
            t = eval((void*) env, t); \
            t = heap_try_gc(env->heap, t); \
        } \
        return 0; \
    };
//** end of file:  eval/eval.h

//** start of file:  eval/eval.c
/** Runtime for the Hi Language (stage2).
 *
 *  COPYRIGHT 2008, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  This runtime consists of a number of parts which are all included
 *  in one file for convenience.
 */

#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>
#include <dlfcn.h>
#include "ffi.h"
#include "eval.h"
#include "../types/types.h"
#include "../heap/heap.h"
#include "../dynamic/dynamic.h"
#include "../ffi/hffi.h"

#define SIZE_CONSTANT       4
#define SIZE_RECORD         4
#define SIZE_THUNK          5

#define INDEX_SIZE          0

#define INDEX_FIELD0        4
#define INDEX_FIELD1        5
#define INDEX_FIELD2        6
#define INDEX_FIELD3        7

#define INDEX_TYPE_TAG      1
#define INDEX_RECORD_TAG    2

#define INDEX_F             1
#define INDEX_K             2
#define INDEX_RT            3
#define INDEX_RTI           4

#define INDEX_ARG0          5
#define INDEX_ARG1          6
#define INDEX_ARG2          7
#define INDEX_ARG3          8

hi_int* __expand(void* v, hi_int *t0, hi_int* t1, hi_int* n);
hi_int* __alloc(void* v, hi_int* n);

/** A combinator which takes one handle and sets up the thunk.
 *
 *  This is a combinator instead of a call since it introduces
 *  a new thunk.
 *
 *  It handles cases where the first argument of a thunk is
 *  a translated variable, therefore postprocessing is needed
 *  to inherit arguments.
 *
 *  /param v    the environment
 *  /param n    the apply thunk holding one handle
 *  /return     the thunk with the handle
 */
hi_int* ___apply(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;
    if (n == 0) return heap_string(hp, "___apply");
    if (n[INDEX_SIZE] < 1+SIZE_THUNK) {
        hi_int* k  = (hi_int*) n[INDEX_K];
        hi_int* rt = (hi_int*) n[INDEX_RT];
        hi_int* ri = (hi_int*) n[INDEX_RTI];
        rt[convert_to_int(ri)] = (hi_int) n;
        return k;
    }
    hi_int* f = (hi_int*) n[INDEX_ARG0];
    f = __expand(v, f, n, (hi_int*) 1);
    f[INDEX_K]   = n[INDEX_K];
    f[INDEX_RT]  = n[INDEX_RT];
    f[INDEX_RTI] = n[INDEX_RTI];
    return f;
}

/** A combinator which makes an FFI call.
 *
 *  This is a combinator instead of a call since it takes,
 *  and inspects an arbitrary number of the arguments.
 *
 *  /param v    the environment
 *  /param n    the ffi thunk holding an exception pointer, a dl
 *              record, an ffi record, and a number of arguments
 *  /return     the continuation after the call
 */
hi_int* ___embed(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;
    if (n == 0) return heap_string(hp, "___embed");
    if (n[INDEX_SIZE] < 3 + SIZE_THUNK) {
        hi_int* k  = (hi_int*) n[INDEX_K];
        hi_int* rt = (hi_int*) n[INDEX_RT];
        hi_int* ri = (hi_int*) n[INDEX_RTI];
        rt[convert_to_int(ri)] = (hi_int) n;
        return k;
    }

    hi_int* dl  = (hi_int*) n[INDEX_ARG1];
    hi_int* ffi = (hi_int*) n[INDEX_ARG2];
    hi_int  ffi_argc = (hi_int) ffi[INDEX_SIZE] - SIZE_RECORD;

    hi_int  argc = n[INDEX_SIZE] - SIZE_THUNK - 3;
    hi_int* argv = &(n[INDEX_ARG3]);

    hi_int* k  = (hi_int*) n[INDEX_K];
    hi_int* rt = (hi_int*) n[INDEX_RT];
    hi_int* ri = (hi_int*) n[INDEX_RTI];

    hi_int* result;
    if (argc < ffi_argc) {
        result = n;
    } else {
        result = ffi_hi_call(hp, dl, ffi, argc, argv);
    }

    rt[convert_to_int(ri)] = (hi_int) result;
    // ignore the expansion check/extra args

    return k;
}

/** A combinator 'shift'.
 *
 *  This is a combinator instead of a call since it refers the calling
 *  sequence.
 *
 *  /param v    the environment
 *  /param n    the shift thunk holding the exception handler and an
 *              exception
 *  /return     the new size of the heap
 */
hi_int* ___shift(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;
    if (n == 0) return heap_string(hp, "___shift");

    if (n[4] < 4) {
        hi_int* k  = (hi_int*) n[INDEX_K];
        hi_int* rt = (hi_int*) n[INDEX_RT];
        hi_int* ri = (hi_int*) n[INDEX_RTI];
        rt[convert_to_int(ri)] = (hi_int) n;
        return k;
    }

    hi_int* k  = (hi_int*)  n[INDEX_ARG0];
    hi_int* rt = (hi_int*)  n[INDEX_ARG1];
    hi_int* ri = (hi_int*)  n[INDEX_ARG2];
    hi_int* e  = (hi_int*)  n[INDEX_ARG3];

    rt[convert_to_int(ri)] = (hi_int) e;
    return k;
}

/** A combinator 'reset' which introduces a 'shift' combinator.
 *
 *  References to Reset/Shift may be bollocks. At the moment, I forgot
 *  why most of this works in this manner. I hacked it.
 *
 *  The point is, that reset saves the 'current contex', k and rt, and
 *  a function to be called into a combinator S, which, when used will
 *  restore/continue into that context with that function.
 *
 *  Save/restore would be better names. It looks like restore is
 *  unnecessary/could be simplified.
 *
 *  This is a combinator instead of a call since it refers the calling
 *  sequence.
 *
 *  /param v    the environment
 *  /param n    a function which is
 *  /return     the new size of the heap
 */
hi_int* ___reset(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;

    if (n == 0) return heap_string(hp, "___reset");

    hi_int* k  = (hi_int*) n[INDEX_K];
    hi_int* rt = (hi_int*) n[INDEX_RT];
    hi_int* ri = (hi_int*) n[INDEX_RTI];

    if (n[INDEX_SIZE] < SIZE_THUNK + 1) {
        rt[convert_to_int(ri)] = (hi_int) n;
        return k;
    }

    hi_int* f = (hi_int*) n[INDEX_ARG0];

    hi_int* a = (hi_int*) __alloc(v, (hi_int*) 7);
    a[INDEX_SIZE]   = 7;
    a[INDEX_F]      = (hi_int) ___apply;
    a[INDEX_K]      = (hi_int) k;
    a[INDEX_RT]     = (hi_int) rt;
    a[INDEX_RTI]    = (hi_int) ri;
    a[INDEX_ARG0]   = (hi_int) f;
    a[INDEX_ARG1]   = (hi_int) 0;

    if (((hi_int) n[4]) != ((hi_int) 1)) {
        a = __expand(v, a, n, (hi_int*) 1);
    }

    hi_int* s = (hi_int*) __alloc(v, (hi_int*) (SIZE_THUNK + 3));
    s[INDEX_SIZE]   = (hi_int) (SIZE_THUNK + 3);
    s[INDEX_F]      = (hi_int) ___shift;
    s[INDEX_K]      = (hi_int) a;
    s[INDEX_RT]     = (hi_int) a;
    s[INDEX_RTI]    = (hi_int) heap_to_int(hp, 6);
    s[INDEX_ARG0]   = (hi_int) k;
    s[INDEX_ARG1]   = (hi_int) rt;
    s[INDEX_ARG2]   = (hi_int) ri;

    return s;

}

// depreciated ;) : should be called through ffi
/** A call which reserves N free cells.
 *
 *  This is a combinator instead of a call since it calls the heap.
 *
 *  /param v    the environment
 *  /param n    the free thunk holding the exception and one integer
 *  /return     the new size of the heap
 */
hi_int* __system__reserve(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;

    hi_int  sz = convert_to_int(n);
//    n = heap_reserve(hp, sz);

    return n;
}

// depreciated ;) : symbols can be inspected at runtime too through dl
/** The enter_symbol call inserts a symbol into the symbol table.
 *
 *  This is used for serialization code, sometimes it is necessary to
 *  translate names back to pointers.
 *
 *  /param s    the symbol being entered
 *  /return     s
 */
hi_int* __enter_symbol(void* v, hi_int* s) {
    // this is a stub for the moment, it is not used atm and libdl
    // actually could solve it too.
    /*
    heap_t* hp = (heap_t*) v;
    env_t* env = (env_t*) v;
    hi_int*(*f)(void *, hi_int*) =
            (hi_int*(*)(void*, hi_int*)) s;
    char*  sym = (char*) f(0,0);
    syms_enter(env->syms, sym, s);
    */
    return s;
}

/** The exit call prints its argument and exits.
 *  the result.
 *
 *  /param v    the environment
 *  /param n    any value
 *  /return     nothing, it calls exit(0)
 */
hi_int* __exit(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;
    if ((n[0] == 1) && (strncmp((char*) n[1], "system.nop", 16) == 0)) {
        fprintf(stdout, "\n");
    } else {
        print_hi(stdout, n);
        fprintf(stdout, "\n");
    };
    exit(0);
    return 0;
}

/** The exception call exits the main program and prints
 *  the exception.
 *
 *  /param v    the environment
 *  /param n    the exception combinator
 *  /return     nothing, it calls exit(1)
 */
hi_int* __exception(void* v, hi_int *n) {
    heap_t* hp = (heap_t*) v;
    fprintf(stderr, "exception(");
    print_hi(stderr, n);
    fprintf(stderr, ")\n");
    exit(1);
    return 0;
}

/** The environment call returns a pointer to the environment.
 *
 *  This is used such that through libffi the environment can be
 *  manipulated by the application.
 *
 *  /param v    the environment
 *  /param n    the exception combinator
 *  /return     a Hi pointer to the environment.
 */
hi_int* __environment(void* v, hi_int* exc, hi_int* n) {
    heap_t* hp = (heap_t*) v;
    hi_int* result = heap_pointer(hp, (void*) v);
    return result;
}

/** Store the application arguments in the heap.
 *
 *  /param v    the environment
 *  /param argc number of arguments
 *  /param argv arguments
 *  /return     a Hi pointer to the environment.
 */
hi_int* __environment_arg(void* v, int argc, char** argv) {
    // XXX: for later
    heap_t* hp = (heap_t*) v;
    hi_int* result = heap_pointer(hp, (void*) v);
    return result;
}

/** Allocate n cells in the heap.
 *
 *  /param n    the nr of cells
 *  /return     a memory region
 */
hi_int* __alloc(void* v, hi_int* n) {
    heap_t* hp = (heap_t*) v;
    hi_int* r = heap_allocate(hp, (hi_int) n);
    r[0] = (hi_int) n;
    r[1] = 0;
    return r;
}

/** Expand the first thunk with the second thunk except for the first
 *  n arguments of the second thunk
 *
 *  /param t0   the first thunk
 *  /param t1   the second thunk
 *  /param n    the nr of arguments which should be ommited
 *  /return     the expanded thunk
 */
hi_int* __expand(void* v, hi_int *t0, hi_int* t1, hi_int* n) {
    heap_t* hp = (heap_t*) v;
    hi_int m = (hi_int) n;
    hi_int argc0 = t0[0] - SIZE_THUNK;
    hi_int argc1 = t1[0] - SIZE_THUNK;
    hi_int new_size = argc0 + argc1 - m + SIZE_THUNK;
    hi_int* new_thunk = heap_alloc(hp, new_size);
    new_thunk[INDEX_SIZE] = new_size;
    new_thunk[INDEX_F]    = t0[INDEX_F];
    new_thunk[INDEX_K]    = t0[INDEX_K];
    new_thunk[INDEX_RT]   = t0[INDEX_RT];
    new_thunk[INDEX_RTI]  = t0[INDEX_RTI];
    hi_int i;
    for (i = 0; i < argc0; i++)
        new_thunk[INDEX_ARG0+i] = t0[INDEX_ARG0+i];
    for (i = 0; i < argc1 - m; i++)
        new_thunk[INDEX_ARG0+argc0+i] = t1[INDEX_ARG0+m+i];
    return new_thunk;
}


//////////////////////////////////////////////////////////////////////

/** Evaluate one thunk.
 *
 *  /param v    an environment pointer
 *  /param n    the starting thunk
 *  /return     the result of the call
 */
inline hi_int* eval(void* v, hi_int* n) {
    heap_t* hp = (heap_t*) v;
//    fprintf(stdout, "\nevaluating : ");
//    term_pretty(stdout, n);
//    fprintf(stdout, "\n");
    if (((hi_int) n[0]) < 2) {
        fprintf(stderr, "cannot evaluate a constant : ");
        term_debug(stderr, n);
        exit(1);
    }
    hi_int*(*f)(void*, hi_int*) =
        (hi_int*(*)(void*, hi_int*)) n[0];
    return f(v, n);
}

int streq(char* s0, char* s1) {
    if (s0 == s1) return (0==0);
    return (strncmp(s0, s1, 128) == 0);
}

int strneq(char* s0, char* s1) {
    if (s0 == s1) return (0==1);
    return (strncmp(s0, s1, 128) != 0);
}

//** end of file:  eval/eval.c


Sunday, June 6, 2010

Log 060610

I finished the monadic style lambda emitter. Pretty sure it is correct now, at least in theory. A number of invariants just dropped into the basket, it looks correct. Leaves getting rid of some bugs.

I now have a small new problem, the Hi language assumes integers to be of system pointer width, which is 64 bits, and uses that to translate strings to their integer representation. The ocaml interpreter uses 31 tagged bit representation, it cannot produce the integer representation directly, so I will do some name mangling of identifiers to 3 chars to get to a bootstrap.

I wanted to serialize heaps. But there are two problems. One is that heap serializer probably just wastes too much memory, and I want to change the representation of it. Two is that, in the end, serialization of Hi structures to memory from the compiler directly will probably just give better portability, and possibly performance anyway, drops reliance on the heap serializer, and can be reused easier for parallel processing. So, I am gonna write a serialization interface, and build serialization code for the AST.

Friday, June 4, 2010

One CHunk

Below, the rewritten runtime in one chunk. The thing is pretty much untested. I had a logical error and changed the representation a bit. A primitive constant now is a series of bits with a size header, a zero constant, a type tag integer, and a value. On 64 bit intel, it means a whopping 256 bits for most primitives, but that's not too bad for something which is meant to be close to a scripting language I guess.

I post it as one chunk since that is how it is meant to be used. I.e., it should be included with the compiled program such that the C optimizer can inline as much as possible.

This chunk doesn't include combinators and primitive calls to the runtime. I am at 2-3 kLoC, expect to end at 3-4 kLoC.

I like the representation, it's good enough for me. Can't say I like the collector much. Two optimizations could be done, a nursery for newly allocated objects and a separate region for constants. The first would be good, the second I don't think is necessary if most constants are static data anyway. Good about the collector is its interface, i.e., it should be easy to drop in new collectors with similar calls.



//** start of file:  assert/assert.h
/**
 *  COPYRIGHT 2010, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  Two macros which handle debugging.
 */

#ifndef ASSERT_H
#define ASSERT_H

#include <stdlib.h>
#include <stdio.h>

#ifndef TRACE
#define TRACE \
    {fprintf(stdout, \
        "file %s: line %d: trace\n", \
            __FILE__, \
            __LINE__); \
    fflush(stdout);}
#endif

#ifndef ASSERT
#define ASSERT(b) \
    {if (!(b)) { \
        fprintf(stderr, \
            "file %s: line %d: assertion failed (%s)\n", \
                __FILE__, \
                __LINE__, \
                #b); \
        exit(1); \
        } \
    };
#endif

#ifndef DEBUG
#define DEBUG(b)    b
#endif

#endif // ASSERT_H
//** end of file:  assert/assert.h

//** start of file:  types/types.h
/**
 *  COPYRIGHT 2008, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  Internally used types and conversions from them to native C types.
 *
 *  A number of these routines are added at the moment pure for
 *  debugging purposes. It is expected the compiler compiles static
 *  values to C static values.
 *
 *  A Hi value is either a constant, a record, or a thunk. They are
 *  tagged with record and type tags.
 *
 *  A constant is a size bounded, type tagged, series of bits.
 *
 *  c[0]    size of the constant (number of hi_ints)
 *  c[1]    0
 *  c[2]    type tag
 *  c[3..]  bits which make up the constant (with unused extra bits)
 *
 *  A record is a size bounded, record and type tagged, series of
 *  pointers.
 *
 *  r[0]    size of the record (number of hi_ints)
 *  r[1]    1
 *  r[2]    record record field pointer
 *  r[3]    record type field pointer
 *  r[4..]  pointers which make up the record
 *
 *  This encoding is pretty portable but uses a lot of bits.
 *  For example, any simple integer value is encoded with 4 * 64 is
 *  256 bits on an Intel 64 bit system.
 *
 *  Bit packing, and using a seperate info record for rtti is for
 *  later.
 *
 *  Assuming that most tags are shared, some of that burden is
 *  aleviated.
 *
 *  Tested on:
 *       Intel x86-64
 */
#ifndef TYPES_H
#define TYPES_H

#include <stdlib.h>
#include <stdint.h>
#include "ffi.h"

#define HI_SIZE(t) \
    ((sizeof(t) + sizeof(hi_int) - 1)/(sizeof(hi_int)))

typedef void                hi_void;

#if SCHAR_MAX == 127
typedef unsigned char       hi_uint8;
typedef signed char         hi_sint8;
#else
 #error "int8 size not supported"
#endif

#if SHRT_MAX == 32767
typedef unsigned short      hi_uint16;
typedef signed short        hi_sint16;
#elif INT_MAX == 32767
typedef unsigned int        hi_uint16;
typedef signed int          hi_sint16;
#else
 #error "int16 size not supported"
#endif

#if INT_MAX == 2147483647
typedef unsigned int        hi_uint32;
typedef signed int          hi_sint32;
#elif SHRT_MAX == 2147483647
typedef unsigned short      hi_uint32;
typedef signed short        hi_sint32;
#elif LONG_MAX == 2147483647
typedef unsigned long       hi_uint32;
typedef signed long         hi_sint32;
#else
 #error "int32 size not supported"
#endif

#if INT_MAX == 9223372036854775807
typedef unsigned int        hi_uint64;
typedef signed int          hi_sint64;
#elif LONG_MAX == 9223372036854775807
typedef unsigned long       hi_uint64;
typedef signed long         hi_sint64;
#else
 #error "int64 size not supported"
#endif

typedef hi_uint64           hi_int;

typedef char                hi_char;
typedef float               hi_float;
typedef double              hi_double;
typedef long double         hi_ldouble;
typedef void*               hi_pointer;
typedef char*               hi_string;

static const hi_int void_tag;
static const hi_int int_tag;
static const hi_int char_tag;
static const hi_int float_tag;
static const hi_int double_tag;
static const hi_int ldouble_tag;
static const hi_int pointer_tag;
static const hi_int string_tag;
static const hi_int array_tag;

static const hi_int uint8_tag;
static const hi_int sint8_tag;
static const hi_int uint16_tag;
static const hi_int sint16_tag;
static const hi_int uint32_tag;
static const hi_int sint32_tag;
static const hi_int uint64_tag;
static const hi_int sint64_tag;

int is_int(hi_int* v);
int is_char(hi_int* v);
int is_float(hi_int* v);
int is_double(hi_int* v);
int is_ldouble(hi_int* v);
int is_pointer(hi_int* v);
int is_string(hi_int* v);
int is_array(hi_int* v);

hi_int*     convert_from_int(int v);
hi_int*     convert_from_char(hi_char v);
hi_int*     convert_from_float(hi_float v);
hi_int*     convert_from_double(hi_double v);
hi_int*     convert_from_ldouble(hi_ldouble v);
hi_int*     convert_from_pointer(hi_pointer v);
hi_int*     convert_from_string(char* v);

hi_int*     convert_from_uint8(hi_uint8 v);
hi_int*     convert_from_sint8(hi_sint8 v);
hi_int*     convert_from_uint16(hi_uint16 v);
hi_int*     convert_from_sint16(hi_sint16 v);
hi_int*     convert_from_uint32(hi_uint32 v);
hi_int*     convert_from_sint32(hi_sint32 v);
hi_int*     convert_from_uint64(hi_uint64 v);
hi_int*     convert_from_sint64(hi_sint64 v);

int*        get_int(hi_int* v);
hi_char*    get_char(hi_int* v);
hi_float*   get_float(hi_int* v);
hi_double*  get_double(hi_int* v);
hi_ldouble* get_ldouble(hi_int* v);
hi_pointer* get_pointer(hi_int* v);
char*       get_string(hi_int* v);

hi_uint8*   get_uint8(hi_int* v);
hi_sint8*   get_sint8(hi_int* v);
hi_uint16*  get_uint16(hi_int* v);
hi_sint16*  get_sint16(hi_int* v);
hi_uint32*  get_uint32(hi_int* v);
hi_sint32*  get_sint32(hi_int* v);
hi_uint64*  get_uint64(hi_int* v);
hi_sint64*  get_sint64(hi_int* v);

int         convert_to_int(hi_int* v);
hi_char     convert_to_char(hi_int* v);
hi_float    convert_to_float(hi_int* v);
hi_double   convert_to_double(hi_int* v);
hi_ldouble  convert_to_ldouble(hi_int* v);
hi_pointer  convert_to_pointer(hi_int* v);
char*       convert_to_string(hi_int* v);

hi_uint8    convert_to_uint8(hi_int* v);
hi_sint8    convert_to_sint8(hi_int* v);
hi_uint16   convert_to_uint16(hi_int* v);
hi_sint16   convert_to_sint16(hi_int* v);
hi_uint32   convert_to_uint32(hi_int* v);
hi_sint32   convert_to_sint32(hi_int* v);
hi_uint64   convert_to_uint64(hi_int* v);
hi_sint64   convert_to_sint64(hi_int* v);

static hi_int* system_unit_tag;
static hi_int* system_nop_tag;
static hi_int* system_bool_tag;
static hi_int* system_true_tag;
static hi_int* system_false_tag;

static hi_int* system_unit_nop;
static hi_int* system_bool_true;
static hi_int* system_bool_false;

hi_int  convert_to_constant_tag(char* v);
char*   convert_from_constant_tag(hi_int  v);

hi_int* convert_to_record_tag(char* v);
char*   convert_from_record_tag(hi_int* v);

hi_int* get_record_tag(hi_int* v);
hi_int* get_type_tag(hi_int* v);

int     is_nop(hi_int* v);
int     is_true(hi_int* v);
int     is_false(hi_int* v);

void    print_hi_value(FILE* f, hi_int* v);

#endif // TYPES_H
//** end of file:  types/types.h

//** start of file:  types/types.c
/**
 *  COPYRIGHT 2008, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 */
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "ffi.h"

//////////////////////////////////////////////////////////////////////

#ifndef ONE_LARGE_FILE
#include "types.h"
#include "../assert/assert.h"
#endif

//////////////////////////////////////////////////////////////////////

#define ENCODEx86_64(c0,c1,c2,c3,c4,c5,c6,c7) \
  ( (((hi_uint64) c0) << 0) | \
    (((hi_uint64) c1) << 8) | \
    (((hi_uint64) c2) << 16) | \
    (((hi_uint64) c3) << 24) | \
    (((hi_uint64) c4) << 32) | \
    (((hi_uint64) c5) << 40) | \
    (((hi_uint64) c6) << 48) | \
    (((hi_uint64) c7) << 56) )

#define ENCODE8(c0,c1,c2,c3,c4,c5,c6,c7) \
    ENCODEx86_64(c0,c1,c2,c3,c4,c5,c6,c7)
    
#define ENCODE16(c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,cA,cB,cC,cD,cE,cF) \
    ENCODE8(c0,c1,c2,c3,c4,c5,c6,c7), \
    ENCODE8(c8,c9,cA,cB,cC,cD,cE,cF)

//////////////////////////////////////////////////////////////////////

static const hi_int void_tag =
    ENCODE8('v','o','i','d',  0,  0,  0,  0);

static const hi_int int_tag =
    ENCODE8('i','n','t',  0,  0,  0,  0,  0);

static const hi_int char_tag =
    ENCODE8('c','h','a','r',  0,  0,  0,  0);

static const hi_int float_tag =
    ENCODE8('f','l','o','a','t',  0,  0,  0);

static const hi_int double_tag =
    ENCODE8('d','o','u','b','l','e',  0,  0);

static const hi_int ldouble_tag =
    ENCODE8('l','d','o','u','b','l','e',  0);

static const hi_int pointer_tag =
    ENCODE8('p','o','i','n','t','e','r',  0);

static const hi_int string_tag =
    ENCODE8('s','t','r','i','n','g',  0,  0);

static const hi_int array_tag =
    ENCODE8('a','r','r','a','y',  0,  0,  0);

static const hi_int uint8_tag =
    ENCODE8('u','i','n','t','8',  0,  0,  0);

static const hi_int sint8_tag =
    ENCODE8('s','i','n','t','8',  0,  0,  0);

static const hi_int uint16_tag =
    ENCODE8('u','i','n','t','1','6',  0,  0);

static const hi_int sint16_tag =
    ENCODE8('s','i','n','t','1','6',  0,  0);

static const hi_int uint32_tag =
    ENCODE8('u','i','n','t','3','2',  0,  0);

static const hi_int sint32_tag =
    ENCODE8('s','i','n','t','3','2',  0,  0);

static const hi_int uint64_tag =
    ENCODE8('u','i','n','t','6','4',  0,  0);

static const hi_int sint64_tag =
    ENCODE8('s','i','n','t','6','4',  0,  0);

//////////////////////////////////////////////////////////////////////

int         is_int(hi_int* v) {
    return (v[1] == 0) && (v[2] == int_tag);
}

int         is_char(hi_int* v) {
    return (v[1] == 0) && (v[2] == char_tag);
}

int         is_float(hi_int* v) {
    return (v[1] == 0) && (v[2] == float_tag);
}

int         is_double(hi_int* v) {
    return (v[1] == 0) && (v[2] == double_tag);
}

int         is_ldouble(hi_int* v) {
    return (v[1] == 0) && (v[2] == ldouble_tag);
}

int         is_pointer(hi_int* v) {
    return (v[1] == 0) && (v[2] == pointer_tag);
}

int         is_string(hi_int* v) {
    return (v[1] == 0) && (v[2] == string_tag);
}

int         is_array(hi_int* v) {
    return (v[1] == 0) && (v[2] == array_tag);
}

//////////////////////////////////////////////////////////////////////

hi_int* convert_from_int(int v) {
    int sz = 3 + HI_SIZE(int);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = int_tag;
    n[3] = 0;
    int* p = (int*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_char(hi_char v) {
    int sz = 3 + HI_SIZE(int);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = char_tag;
    n[3] = 0;
    char* p = (char*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_float(hi_float v) {
    int sz = 3 + HI_SIZE(hi_float);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = float_tag;
    hi_float* p = (hi_float*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_double(hi_double v) {
    int sz = 3 + HI_SIZE(hi_double);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = double_tag;
    hi_double* p = (hi_double*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_ldouble(hi_ldouble v) {
    int sz = 3 + HI_SIZE(hi_ldouble);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = ldouble_tag;
    hi_ldouble* p = (hi_ldouble*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_pointer(hi_pointer v) {
    int sz = 3 + HI_SIZE(hi_pointer);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = pointer_tag;
    hi_pointer* p = (hi_pointer*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_string(char* v) {
    int v_len = strnlen(v, 2048);
    //v[v_len] = (char) 0;
    int sz = 3 + (v_len + sizeof(hi_int))/sizeof(hi_int);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = string_tag;
    char* p = (char*) &(n[3]);
    strcpy(p, v);
    return n;
}

hi_int* convert_from_uint8(hi_uint8 v) {
    int sz = 3 + HI_SIZE(hi_uint8);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint8_tag;
    hi_uint8* p = (hi_uint8*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_sint8(hi_sint8 v) {
    int sz = 3 + HI_SIZE(hi_sint8);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint8_tag;
    hi_sint8* p = (hi_sint8*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_uint16(hi_uint16 v) {
    int sz = 3 + HI_SIZE(hi_uint16);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint16_tag;
    hi_uint16* p = (hi_uint16*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_sint16(hi_sint16 v) {
    int sz = 3 + HI_SIZE(hi_sint16);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint16_tag;
    hi_sint16* p = (hi_sint16*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_uint32(hi_uint32 v) {
    int sz = 3 + HI_SIZE(hi_uint32);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint32_tag;
    hi_uint32* p = (hi_uint32*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_sint32(hi_sint32 v) {
    int sz = 3 + HI_SIZE(hi_sint32);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint32_tag;
    hi_sint32* p = (hi_sint32*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_uint64(hi_uint64 v) {
    int sz = 3 + HI_SIZE(hi_uint64);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint64_tag;
    hi_uint64* p = (hi_uint64*) &(n[3]);
    *p = v;
    return n;
}

hi_int* convert_from_sint64(hi_sint64 v) {
    int sz = 3 + HI_SIZE(hi_sint64);
    hi_int* n = (hi_int*) malloc(sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint64_tag;
    hi_sint64* p = (hi_sint64*) &(n[3]);
    *p = v;
    return n;
}

//////////////////////////////////////////////////////////////////////

int*        get_int(hi_int* v) {
    return (int*) (&(v[3]));
}

hi_char*    get_char(hi_int* v) {
    return (hi_char*) (&(v[3]));
}

hi_float*   get_float(hi_int* v) {
    return (hi_float*) (&(v[3]));
}

hi_double*  get_double(hi_int* v) {
    return (hi_double*) (&(v[3]));
}

hi_ldouble*  get_ldouble(hi_int* v) {
    return (hi_ldouble*) (&(v[3]));
}

hi_pointer* get_pointer(hi_int* v) {
    return (hi_pointer*) (&(v[3]));
}

char*  get_string(hi_int* v) {
    return (char*) (&(v[3]));
}

hi_uint8*   get_uint8(hi_int* v) {
    return (hi_uint8*) (&(v[3]));
}

hi_sint8*   get_sint8(hi_int* v) {
    return (hi_sint8*) (&(v[3]));
}

hi_uint16*  get_uint16(hi_int* v) {
    return (hi_uint16*) (&(v[3]));
}

hi_sint16*  get_sint16(hi_int* v) {
    return (hi_sint16*) (&(v[3]));
}

hi_uint32*  get_uint32(hi_int* v) {
    return (hi_uint32*) (&(v[3]));
}

hi_sint32*  get_sint32(hi_int* v) {
    return (hi_sint32*) (&(v[3]));
}

hi_uint64*  get_uint64(hi_int* v) {
    return (hi_uint64*) (&(v[3]));
}

hi_sint64*  get_sint64(hi_int* v) {
    return (hi_sint64*) (&(v[3]));
}

//////////////////////////////////////////////////////////////////////

int         convert_to_int(hi_int* v) {
    return *(get_int(v));
}

hi_char     convert_to_char(hi_int* v) {
    return *(get_char(v));
}

hi_float    convert_to_float(hi_int* v) {
    return *(get_float(v));
}

hi_double   convert_to_double(hi_int* v) {
    return *(get_double(v));
}

hi_ldouble  convert_to_ldouble(hi_int* v) {
    return *(get_ldouble(v));
}

hi_pointer  convert_to_pointer(hi_int* v) {
    return *(get_pointer(v));
}

char*       convert_to_string(hi_int* v) {
    return (get_string(v));
}

hi_uint8    convert_to_uint8(hi_int* v) {
    return *(get_uint8(v));
}

hi_sint8    convert_to_sint8(hi_int* v) {
    return *(get_sint8(v));
}

hi_uint16   convert_to_uint16(hi_int* v) {
    return *(get_uint16(v));
}

hi_sint16   convert_to_sint16(hi_int* v) {
    return *(get_sint16(v));
}

hi_uint32   convert_to_uint32(hi_int* v) {
    return *(get_uint32(v));
}

hi_sint32   convert_to_sint32(hi_int* v) {
    return *(get_sint32(v));
}

hi_uint64   convert_to_uint64(hi_int* v) {
    return *(get_uint64(v));
}

hi_sint64   convert_to_sint64(hi_int* v) {
    return *(get_sint64(v));
}

//////////////////////////////////////////////////////////////////////

static hi_int system_unit[5] = {
    5,
    0,
    ENCODE8('s','t','r','i','n','g',  0,  0),
    ENCODE16('s','y','s','t','e','m','.','u','n','i','t',0,0,0,0,0)
};
static hi_int* system_unit_tag = system_unit;

static hi_int system_nop[5] = {
    5,
    0,
    ENCODE8('s','t','r','i','n','g',  0,  0),
    ENCODE16('s','y','s','t','e','m','.','n','o','p',0,0,0,0,0,0)
};
static hi_int* system_nop_tag = system_nop;

static hi_int system_bool[5] = {
    5,
    0,
    ENCODE8('s','t','r','i','n','g',  0,  0),
    ENCODE16('s','y','s','t','e','m','.','b','o','o','l',0,0,0,0,0)
};
static hi_int* system_bool_tag = system_bool;

static hi_int system_true[5] = {
    5,
    0,
    ENCODE8('s','t','r','i','n','g',  0,  0),
    ENCODE16('s','y','s','t','e','m','.','t','r','u','e',0,0,0,0,0)
};
static hi_int* system_true_tag = system_true;

static hi_int system_false[5] = {
    5,
    0,
    ENCODE8('s','t','r','i','n','g',  0,  0),
    ENCODE16('s','y','s','t','e','m','.','f','a','l','s','e',0,0,0,0)
};
static hi_int* system_false_tag = system_false;

//////////////////////////////////////////////////////////////////////

hi_int  convert_to_constant_tag(char* v) {
    return *((hi_int*) v);
}

char*   convert_from_constant_tag(hi_int* v) {
    return (char*) &(v[2]);
}

hi_int* convert_to_record_tag(char* v) {
    return convert_from_string(v);
}

char*   convert_from_record_tag(hi_int* v) {
    return convert_to_string(v);
}

//////////////////////////////////////////////////////////////////////

static hi_int system_unit_nop_record[4] = {
    4,
    1,
    (hi_int) system_unit,
    (hi_int) system_nop,
};
static hi_int* system_unit_nop = system_unit_nop_record;

static hi_int system_bool_false_record[4] = {
    4,
    1,
    (hi_int) system_bool,
    (hi_int) system_false,
};
static hi_int* system_bool_false = system_bool_false_record;


static hi_int system_bool_true_record[4] = {
    4,
    1,
    (hi_int) system_bool,
    (hi_int) system_true,
};
static hi_int* system_true_false = system_bool_true_record;

//////////////////////////////////////////////////////////////////////

hi_int* get_type_tag(hi_int* v) {
    return ((hi_int*) v[2]);
}

hi_int* get_record_tag(hi_int* v) {
    return ((hi_int*) v[3]);
}

//////////////////////////////////////////////////////////////////////

int         is_nop(hi_int* v) {
    return (v[1] == 1) &&
     (strncmp(convert_from_record_tag((hi_int*) v[3]), "system.nop", 32) == 0);
}

int         is_true(hi_int* v) {
    return (v[1] == 1) &&
     (strncmp(convert_from_record_tag((hi_int*) v[3]), "system.true", 32) == 0);
}

int         is_false(hi_int* v) {
    return (v[1] == 1) &&
     (strncmp(convert_from_record_tag((hi_int*) v[3]), "system.false", 32) == 0);
}

//////////////////////////////////////////////////////////////////////

void print_hi(FILE* f, hi_int* v) {
    if (v == 0) {
        fprintf(f, "(null)");
    } else if ( (v[0] > 2048) || (v[0] < 0) ) {
        fprintf(f, "(garbled)");
    } else if (is_nop(v)) {
        fprintf(f, "nop");
    } else if (is_true(v)) {
        fprintf(f, "true");
    } else if (is_false(v)) {
        fprintf(f, "false");
    } else if (is_char(v)) {
        fprintf(f, "'%c'", convert_to_uint8(v));
    } else if (is_int(v)) {
        fprintf(f, "%d", convert_to_uint64(v));
    } else if (is_float(v)) {
        fprintf(f, "%f", convert_to_float(v));
    } else if (is_double(v)) {
        fprintf(f, "%f", convert_to_double(v));
    } else if (is_string(v)) {
        fprintf(f, "%s", convert_to_string(v));
    } else if (is_pointer(v)) {
        fprintf(f, "%x", convert_to_pointer(v));
    } else if ( v[1] == 0 ) {
        fprintf(f, "('%s' ", &(v[2]));
        int i;
        for (i = 3; i < v[0]; i++) {
            fprintf(f, " %d", v[i]);
        }
        fprintf(f, ")");
    } else if ( v[1] == 1 ) {
        fprintf(f, "(");
        print_hi(f, (hi_int*) v[2]);
        fprintf(f, " ");
        print_hi(f, (hi_int*) v[3]);
        int i;
        for (i = 4; i < v[0]; i++) {
            fprintf(f, " ");
            print_hi(f, (hi_int*) v[i]);
        }
        fprintf(f, ")");
    } else {
        fprintf(f, "[");
        int i;
        for (i = 1; i < v[0]; i++) {
            fprintf(f, "%p ", (hi_int*) v[i]);
        }
        fprintf(f, "]");
    }
}

//////////////////////////////////////////////////////////////////////
//** end of file:  types/types.c

//** start of file:  heap/heap.h
/**
 *  COPYRIGHT 2010, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  A simple library for a Cheney style heap.
 *
 *  1. Every value is either a pointer or an integer of pointer size.
 *
 *  2. The heap holds nodes. Every node starts of with a two integer
 *     header. The first integer describes the size of the whole node,
 *     the second integer is either 0, in which case all values in
 *     that node are integers, or not, in which case all values in
 *     that node are pointers to other nodes possibly in that heap.
 *
 *     Note: the Hi language runtime elaborates on the meaning of the
 *     second integer in the following manner:
 *          0 - a constant, 1 - a record, otherwise - a thunk.
 *
 *  3. At the moment, the collector is semi-exact. Pointers which do
 *     not point into the heap are not chased. This makes it possible
 *     to use pointers to static data, and it makes it possible to
 *     use small integers as if they would be integers.
 *
 *     It implies serialization of static data is handled
 *     different, it serializes everything and can only handle non-
 *     cyclic structures which is a severe drawback.
 *
 *  4. A heap may be GCed, in which case _a whole new heap is
 *     allocated_ and the heap is copied to the new heap. This is to
 *     facilitate multi-core processing of several coarse-grained
 *     threads without the need to keep unused to spaces around for
 *     all heaps used.
 *     Ordinary 'malloc' is used to allocate new heaps.
 *
 *  5. The heap size may be grown at will for special operations.
 *     There are no facilities, as of yet, to shrink the heap size.
 *
 *  6. A heap may be serialized to a region in memory, in which case
 *     only the first root node is copied. Unserialization inserts a
 *     root node from a region in memory to a given heap.
 *
 *  A design which is neither very compact nor very fast, but minimal
 *  and robust.
 */
#ifndef HEAP_H
#define HEAP_H

#include <stdio.h>
#include <stdint.h>

#ifndef ONE_LARGE_FILE
#include "../types/types.h"
#endif

#define ROOTS   1024

typedef struct {
    hi_int      size;
    hi_int*     low;
    hi_int*     high;
    hi_int*     free;

    hi_int*     trigger_gc;
    hi_int*     trigger_grow;

    hi_int      root_free;
    hi_int*     root[ROOTS];

    hi_int*     memory;
} heap_t;

heap_t* heap_create(hi_int sz);
void*   heap_destroy(heap_t* hp);

hi_int  heap_root_count(heap_t* hp);
hi_int  heap_root_new(heap_t* hp);
void    heap_root_set(heap_t* hp, hi_int n, hi_int* r);
hi_int* heap_root_get(heap_t* hp, hi_int n);

hi_int* heap_allocate(heap_t* hp, hi_int sz);
heap_t* heap_collect(heap_t* hp);
heap_t* heap_try_collect(heap_t* hp);
heap_t* heap_reserve(heap_t* hp, hi_int sz);

hi_int  heap_serialize(heap_t* hp, void* p, hi_int sz);
hi_int* heap_unserialize(heap_t* hp, void* p, hi_int sz);

hi_int  heap_size(heap_t* hp);
hi_int  heap_used(heap_t* hp);
hi_int  heap_free(heap_t* hp);

void    heap_info(FILE* f, heap_t* hp);
void    heap_debug(FILE* f, heap_t* hp);

hi_int* heap_int(heap_t* hp, int v);
hi_int* heap_char(heap_t* hp, hi_char v);
hi_int* heap_float(heap_t* hp, hi_float v);
hi_int* heap_double(heap_t* hp, hi_double v);
hi_int* heap_ldouble(heap_t* hp, hi_ldouble v);
hi_int* heap_pointer(heap_t* hp, hi_pointer v);
hi_int* heap_string(heap_t* hp, char* v);
hi_int* heap_uint8(heap_t* hp, hi_uint8 v);
hi_int* heap_sint8(heap_t* hp, hi_sint8 v);
hi_int* heap_uint16(heap_t* hp, hi_uint16 v);
hi_int* heap_sint16(heap_t* hp, hi_sint16 v);
hi_int* heap_uint32(heap_t* hp, hi_uint32 v);
hi_int* heap_sint32(heap_t* hp, hi_sint32 v);
hi_int* heap_uint64(heap_t* hp, hi_uint64 v);
hi_int* heap_sint64(heap_t* hp, hi_sint64 v);

#endif // HEAP_H
//** end of file:  heap/heap.h

//** start of file:  heap/heap.c
/**
 *  COPYRIGHT 2008, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 */

#include <stdlib.h>
#include <string.h>
#include <dlfcn.h>

#ifndef ONE_LARGE_FILE
#include "../assert/assert.h"
#include "heap.h"
#endif

#define BITS_SET(b0,b1)     ((b0) | (b1))
#define BITS_UNSET(b0,b1)   ((b0) & (~(b1)))
#define BITS_TEST(b0,b1)    (((b0) & (b1)) != 0)
#define BITS_FLIP(b0,b1)    ((b0) (b1))

#define BIT_SET(n,b)        BITS_SET(n, 1<<b)
#define BIT_UNSET(n,b)      BITS_UNSET(n, 1<<b)
#define BIT_TEST(n,b)       BITS_TEST(n, 1<<b)
#define BIT_FLIP(n,b)       BITS_FLIP(n, 1<<b)

#define BELOW(p,q)          (((void*) p) < ((void*) q))
#define BELOWEQ(p,q)        ((void*) p <= (void*) q)
#define IN_HEAP(p, l, h) \
    ( ((void*) p >= (void*) l) && ((void*) p < (void*) h) )

#define DIFFERENCE(p,q)     ((hi_int)((void*)p - (void*) q))
#define INCREMENT(p,n)      ((hi_int*)((void*)p + (int) n))
#define DECREMENT(p,n)      ((hi_int*)((void*)p - (int) n))

#define ADVANCE(p,n)        &(((hi_int*)p)[(int)n])
#define RETREAT(p,n)        &(((hi_int*)p)[0-(int)n])

#define HEAP_ALLIGNMENT     512
#define HEAP_MINIMAL_GROW   2048
#define HEAP_MINIMAL_FREE   1024
#define HEAP_MINIMAL_SIZE   4096

#define SMALL_INTEGER       64*1024
#define IS_SMALL_INTEGER(p) BELOW(p, (SMALL_INTEGER))

/** Allocate alligned memory.
 *
 *  /param m    the size to be allocared
 *  /param n    the alignement (some power of two, preferably cache-size)
 *  /return     a chunk of memory
 */
void* malloc_alligned(size_t m, int n) {
    TRACE;
    void* p = 0;
    if (posix_memalign(&p, n, m) == 0) {
        ASSERT(!IS_SMALL_INTEGER(p));
        return p;
    } else {
        return 0;
    }
}

/** Create a heap.
 *
 *  The heap size is _not_ the number of bytes allocated, but the
 *  total number of bytes _including_ the header size.
 *
 *  Exits if a new heap cannot be allocated.
 *
 *  /param sz   the heap size
 *  /return     a heap
 */
heap_t* heap_create(hi_int sz) {
    TRACE;
    ASSERT(sz >= HEAP_MINIMAL_SIZE);

    heap_t* r = (heap_t*) malloc_alligned(sz, HEAP_ALLIGNMENT);
    if (r == 0) {
        fprintf(stderr,
            "HEAP_PANIC[0]: failed to allocate a heap of size %u\n",
            (unsigned int) sz);
        exit(1);
    }

    r->size = sz;
    r->low  = (hi_int*) &(r->memory);
    r->high = INCREMENT(r, sz);
    r->free = r->low;

    r->root_free = 0;

    DEBUG(hi_int i;)
    DEBUG(for (i = 0; i < ROOTS; i++) r->root[i] = 0;)

    r->trigger_gc   = DECREMENT(r->high, HEAP_MINIMAL_FREE);
    r->trigger_grow = DECREMENT(r->high, HEAP_MINIMAL_GROW);

    DEBUG(hi_int* p;)
    DEBUG(for (p = r->low; p < r->high; p++) *p = 0;)

    return r;
}

/** Destroy a heap.
 *
 *  /param hp   a heap
 */
void* heap_destroy(heap_t* hp) {
    TRACE;

    free(hp);
    return 0;
}

/** Give the number of roots in the heap.
 *
 *  /param hp   a heap
 *  /return     the number of roots
 */
hi_int heap_root_count(heap_t* hp) {
    TRACE;

    return hp->root_free;
}

/** Allocate a new root slot in the heap.
 *
 *  The number of roots is a fixed number, see ROOTS.
 *
 *  Exits if there are no more roots available.
 *
 *  /param hp   a heap
 *  /return     a new root slot
 */
hi_int heap_root_new(heap_t* r) {
    TRACE;

    hi_int free = r->root_free;
    r->root_free++;
    if (r->root_free > ROOTS) {
        fprintf(stderr,
            "HEAP_PANIC[1]: out of roots %d\n",
            ROOTS);
        exit(1);
    }
    return free;
}

/** Set a root node to a pointer (in the heap).
 *
 *  /param hp   a heap
 *  /param n    a root slot
 *  /param r    a pointer to a structure
 */
void heap_root_set(heap_t* hp, hi_int n, hi_int* r) {
    TRACE;
    ASSERT(n < ROOTS);

    hp->root[n] = r;
}

/** Get a root node to a pointer (in the heap).
 *
 *  /param hp   a heap
 *  /param n    a root slot
 *  /return     a pointer to a structure
 */
hi_int* heap_root_get(heap_t* hp, hi_int n) {
    TRACE;
    ASSERT(n < ROOTS);

    return hp->root[n];
}

/** Allocate a node in the heap of given size.
 *
 *  Nodes may, but should not assumed, to be blanked.
 *  Nodes may, but should not be assumed, to have an initialized
 *  header field.
 *
 *  /param hp   a heap
 *  /param sz   the size requested
 *  /return     a pointer to a new node
 */
hi_int* heap_allocate(heap_t* hp, hi_int sz) {
    TRACE;

    hi_int* p = hp->free;
    hp->free = ADVANCE(hp->free, sz);

    DEBUG(p[0] = sz;)
    DEBUG(int i;)
    DEBUG(for (i = 1; i < sz; i++) p[i] = 0;)

    return p;
}

/** Copy nodes from one heap to another heap.
 *
 *  All nodes refered to by root nodes in the from heap are
 *  copied to the to heap.
 *
 *  Unsafe, the caller should make sure the to space holds
 *  enough room.
 *
 * /param from  the source heap
 * /param to    the destination heap
 * /retrun      the destination heap
 */
heap_t* heap_copy(heap_t* from, heap_t* to) {
    TRACE;

    hi_int* copy_low;
    hi_int* copy_high;
    hi_int  i,j;
    hi_int* p0;
    hi_int* p1;

    copy_low  = to->free;
    copy_high = to->free;

    // copy the roots to the to-space
    for (i = 0; i < from->root_free; i++) {
        p0 = from->root[i];
        if (IN_HEAP(p0, from->low, from->high)) {
            p1 = (hi_int*) p0[0];
            if (IN_HEAP(p1, to->low, to->high)) {
                to->root[to->root_free + i] = p1;
            } else {
                for (j = 0; j < p0[0]; j++) {
                    copy_high[j] = p0[j];
                }
                p0[0] = (hi_int) copy_high;
                to->root[to->root_free + i] = copy_high;
                copy_high = ADVANCE(copy_high, copy_high[0]);
            }
        }
    }
    to->root_free += from->root_free;

    // copy all references to the to-space
    while (copy_low != copy_high)  {
        if (copy_low[1] != 0) {
            for (i = 2; i < copy_low[0]; i++) {
                p0 = (hi_int*) copy_low[i];
                if (IN_HEAP(p0, from->low, from->high)) {
                    p1 = (hi_int*) p0[0];
                    if (IN_HEAP(p1, to->low, to->high)) {
                        copy_low[i] = (hi_int) p1;
                    } else {
                        for (j = 0; j < p0[0]; j++) copy_high[j] = p0[j];
                        copy_low[i] = (hi_int) copy_high;
                        p0[0] = (hi_int) copy_high;
                        copy_high= ADVANCE(copy_high, copy_high[0]);
                    }
                }
            }
        }
        copy_low = ADVANCE(copy_low, copy_low[0]);
    }

    to->free = copy_low;

    return to;
}

/** Resize a given heap to a given size by a copy.
 *
 *  /param hp   a heap
 *  /param sz   the requested size
 *  /return     a copy of the heap of given size
 */
heap_t* heap_resize(heap_t* hp, hi_int sz) {
    TRACE;

    heap_t* to;
    to = heap_create(sz);
    to = heap_copy(hp, to);
    heap_destroy(hp);
    return to;
}

/** Collect (copy) a heap.
 *
 *  /param hp   a heap
 *  /return     a garbage collected copy of the heap
 */
heap_t* heap_collect(heap_t* r) {
    TRACE;

    heap_t* s;
    s = r;
    s = heap_resize(s, s->size);
    if (BELOW(s->trigger_grow, s->free)) {
        s = heap_resize(s, s->size + s->size);
    }
    return s;
}

/** Collect (copy) a heap if it runs out of space.
 *
 *  /param hp   a heap
 *  /return     a collected heap, or the original
 */
heap_t* heap_try_collect(heap_t* r) {
    TRACE;

    heap_t* s;
    s = r;
    if (BELOW(s->trigger_gc, s->free)) {
        fprintf(stdout, "HEAP!\n"); // XXX
        s = heap_collect(s);
    }
    return s;
}

/** Reserve space in a heap.
 *
 *  /param hp   a heap
 *  /param sz   the requested size
 *  /return     a heap with at least sz free bytes
 */
heap_t* heap_reserve(heap_t* r, hi_int sz) {
    TRACE;

    heap_t* s;
    s = r;
    if ((void*)heap_free(s) <= (void*)sz) {
        s = heap_collect(s);
    }
    while ((void*)heap_free(s) <= (void*)sz) {
        s = heap_resize(s, s->size + s->size);
    }

    return s;
}

/** Copy everything reachable (also outside the heap) to another
 *  heap.
 *
 *  Note: Doesn't handle cyclic structures.
 *
 *  /param from a source heap
 *  /param to   a destination heap
 *  /return     the destination heap
 */
heap_t* heap_copy_deep(heap_t* from, heap_t* to) {
    TRACE;

    hi_int* copy_low;
    hi_int* copy_high;
    hi_int  i,j;
    hi_int* p0;
    hi_int* p1;

    copy_low  = to->low;
    copy_high = to->low;

    ASSERT(from->root_free == 1);
    to->root_free = from->root_free;

    // copy the root to the to-space
    p0 = from->root[0];
    for (j = 0; j < p0[0]; j++) {
        copy_high[j] = p0[j];
    }
    to->root[0] = copy_high;
    copy_high = ADVANCE(copy_high, copy_high[0]);

    // copy all references to the to-space
    while (copy_low != copy_high)  {
        if (copy_low[1] != 0) {
            for (i = 2; i < copy_low[0]; i++) {
                p0 = (hi_int*) copy_low[i];
                if (!IS_SMALL_INTEGER(p0)) {
                    for (j = 0; j < p0[0]; j++) copy_high[j] = p0[j];
                    copy_low[i] = (hi_int) copy_high;
                    copy_high= ADVANCE(copy_high, copy_high[0]);
                }
            }
        }
        copy_low = ADVANCE(copy_low, copy_low[0]);
    }

    to->free = copy_low;

    return to;
}

/** Serialize a heap to memory.
 *
 *  /param hp   a heap
 *  /param p    a piece of memory
 *  /param sz   the size of the memory in bytes
 *  /return     the number of bytes used
 */
hi_int heap_serialize(heap_t* t, void* p, hi_int sz) {
    TRACE;

    // coerce p to a heap
    heap_t* r = (heap_t*) p;
    r->size = sz;
    r->low  = (hi_int*) &(r->memory);
    r->high = INCREMENT(r, sz);
    r->free = r->low;
    r->root_free = 0;
    r->trigger_gc   = DECREMENT(r->high, HEAP_MINIMAL_FREE);
    r->trigger_grow = DECREMENT(r->high, HEAP_MINIMAL_GROW);

    // copy the first root to p
    hi_int tmp;
    tmp = t->root_free;
    t->root_free = 1;
    heap_copy_deep(t, r);
    t->root_free = tmp;

    // return the size of the copied region
    return (hi_int) (DIFFERENCE(r->free, r));
}

/** Unserialize memory to a heap.
 *
 *  /param hp   a heap
 *  /param p    a piece of memory
 *  /param sz   the size of the memory in bytes
 *  /return     the root of the structure copied
 */
hi_int* heap_unserialize(heap_t* t, void* p, hi_int sz) {
    TRACE;

    hi_int i;
    heap_t* r = (heap_t*) p;

    // get the pointer offset...
    hi_int offset = DIFFERENCE((&(r->memory)), r->root[0]);

    // alligning code
    r->low  = INCREMENT(r->low, offset);
    r->high = INCREMENT(r->high, offset);
    r->free = INCREMENT(r->free, offset);
    
    for (i = 0; i < ROOTS; i++) {
        r->root[i] = INCREMENT(r->root[i], offset);
    }

    hi_int* low  = r->low;
    hi_int* high = r->free;
    while (low != high)  {
        if (low[1] != 0) {
            for (i = 2; i < low[0]; i++) {
                if (!IS_SMALL_INTEGER(low[i])) {
                    low[i] = (hi_int)
                        INCREMENT((hi_int*)low[i], offset);
                }
            }
        }
        low = ADVANCE(low, low[0]);
    }

    // copying code
    heap_copy(r, t);
    t->root_free--;
    return t->root[t->root_free];
}

/** The size, total number of integers/pointers, of a heap.
 *
 *  /param hp   a heap
 *  /return     the size of the heap
 */
hi_int heap_size(heap_t* hp) {
    return (hi_int)
        (((void*) hp->high - (void*) hp->low)/sizeof(hi_int));
}

/** The free space, total number of unallocated integers/pointers, of a heap.
 *
 *  /param hp   a heap
 *  /return     the size of the free space of the heap
 */
hi_int heap_free(heap_t* hp) {
    return (hi_int)
        (((void*) hp->high - (void*) hp->free)/sizeof(hi_int));
}

/** The used space, total number of allocated integers/pointers, of a heap.
 *
 *  /param hp   a heap
 *  /return     the size of the used space of the heap
 */
hi_int heap_used(heap_t* hp) {
    return (hi_int)
        (((void*) hp->free - (void*) hp->low)/sizeof(hi_int));
}

/** Print some information to a file pointer.
 *
 *  /param f    a file pointer
 *  /param hp   a heap
 */
void heap_info(FILE* f, heap_t* hp) {
    fprintf(f, "**********************************************************************\n");
    fprintf(f, "GARBAGE COLLECTION INFORMATION\n");
    fprintf(f, "0-heap size           : %ld\n",   hp->size);
    fprintf(f, "0-heap low            : %p\n",    hp->low);
    fprintf(f, "0-heap high           : %p\n",    hp->high);
    fprintf(f, "0-heap free           : %p\n",    hp->free);
    fprintf(f, "0-heap roots          : %d\n",    hp->root_free);
    fprintf(f, "0-heap gc trigger     : %p\n",    hp->trigger_gc);
    fprintf(f, "0-heap grow trigger   : %p\n\n",  hp->trigger_grow);
    fprintf(f, "1-heap size           : %ld\n",   heap_size(hp));
    fprintf(f, "1-heap used           : %ld\n",   heap_used(hp));
    fprintf(f, "1-heap free           : %ld\n\n", heap_free(hp));
    fprintf(f, "2-heap minimal free   : %d\n",    HEAP_MINIMAL_FREE);
    fprintf(f, "2-heap grow           : %d\n\n",  HEAP_MINIMAL_GROW);
    fprintf(f, "**********************************************************************\n");
}

// just for debugging purposes, loops on cyclic structures
void heap_print_node(FILE* f, hi_int* n) {
    unsigned int i;
    fprintf(f, "------------------------------------\n");
    fprintf(f, "node[%p].size : %d\n", (void*) n, (unsigned int) n[0]);
    fprintf(f, "node[%p].tag  : %d\n", (void*) n, (unsigned int) n[1]);
    if (n[1] == 0) {
        for (i = 2; i < n[0]; i++) {
            fprintf(f, "node[%p].[%d]  : %d\n", (void*) n, i, (unsigned int) n[i]);
        }
    } else {
        for (i = 2; i < n[0]; i++) {
            fprintf(f, "node[%p].[%d]  : %p\n", (void*) n, i, (void*) n[i]);
        }
        for (i = 2; i < n[0]; i++) {
            heap_print_node(f, (hi_int*) n[i]);
        }
    }
}

// just for debugging purposes
void heap_debug(FILE* f, heap_t* hp) {
    hi_int i;
    heap_info(f, hp);
    for (i = 0; i < hp->root_free; i++) {
        fprintf(f, "ROOT %d:\n", i);
        heap_print_node(f, hp->root[i]);
        fprintf(f, "**********************************************************************\n");
    }
}

//////////////////////////////////////////////////////////////////////
// wasn't sure where to put this code, doesn't really fit here too.

hi_int* heap_int(heap_t* hp, int v) {
    int sz = 3 + HI_SIZE(int);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = int_tag;
    n[3] = 0;
    int* p = (int*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_char(heap_t* hp, hi_char v) {
    int sz = 3 + HI_SIZE(int);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = char_tag;
    n[3] = 0;
    char* p = (char*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_float(heap_t* hp, hi_float v) {
    int sz = 3 + HI_SIZE(hi_float);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = float_tag;
    hi_float* p = (hi_float*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_double(heap_t* hp, hi_double v) {
    int sz = 3 + HI_SIZE(hi_double);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = double_tag;
    hi_double* p = (hi_double*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_ldouble(heap_t* hp, hi_ldouble v) {
    int sz = 3 + HI_SIZE(hi_ldouble);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = ldouble_tag;
    hi_ldouble* p = (hi_ldouble*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_pointer(heap_t* hp, hi_pointer v) {
    int sz = 3 + HI_SIZE(hi_pointer);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = pointer_tag;
    hi_pointer* p = (hi_pointer*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_string(heap_t* hp, char* v) {
    int v_len = strnlen(v, 2048);
    //v[v_len] = (char) 0;
    int sz = 3 + (v_len + sizeof(hi_int))/sizeof(hi_int);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = string_tag;
    char* p = (char*) &(n[3]);
    strcpy(p, v);
    return n;
}

hi_int* heap_uint8(heap_t* hp, hi_uint8 v) {
    int sz = 3 + HI_SIZE(hi_uint8);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint8_tag;
    hi_uint8* p = (hi_uint8*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_sint8(heap_t* hp, hi_sint8 v) {
    int sz = 3 + HI_SIZE(hi_sint8);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint8_tag;
    hi_sint8* p = (hi_sint8*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_uint16(heap_t* hp, hi_uint16 v) {
    int sz = 3 + HI_SIZE(hi_uint16);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint16_tag;
    hi_uint16* p = (hi_uint16*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_sint16(heap_t* hp, hi_sint16 v) {
    int sz = 3 + HI_SIZE(hi_sint16);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint16_tag;
    hi_sint16* p = (hi_sint16*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_uint32(heap_t* hp, hi_uint32 v) {
    int sz = 3 + HI_SIZE(hi_uint32);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint32_tag;
    hi_uint32* p = (hi_uint32*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_sint32(heap_t* hp, hi_sint32 v) {
    int sz = 3 + HI_SIZE(hi_sint32);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint32_tag;
    hi_sint32* p = (hi_sint32*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_uint64(heap_t* hp, hi_uint64 v) {
    int sz = 3 + HI_SIZE(hi_uint64);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = uint64_tag;
    hi_uint64* p = (hi_uint64*) &(n[3]);
    *p = v;
    return n;
}

hi_int* heap_sint64(heap_t* hp, hi_sint64 v) {
    int sz = 3 + HI_SIZE(hi_sint64);
    hi_int* n = (hi_int*) heap_allocate(hp, sz * sizeof(hi_int));
    n[0] = sz;
    n[1] = 0;
    n[2] = sint64_tag;
    hi_sint64* p = (hi_sint64*) &(n[3]);
    *p = v;
    return n;
}


//** end of file:  heap/heap.c

//** start of file:  fast/fast.h
/** Runtime for the Hi Language (stage2).
 *
 *  COPYRIGHT 2008, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  A fast allocator for intermediates.
 *
 *  Typical usage is coercing a part of stack allocated space to
 *  a fast region.
 */
#ifndef FAST_H
#define FAST_H

#ifndef ONE_LARGE_FILE
#include "../types/types.h"
#endif

typedef struct {
    hi_uint8*      low;
    hi_uint8*      free;
    hi_uint8*      high;
    hi_uint8*      memory;
} fast_t;

#define fast_size(f) \
    ((void*) f->high - (void*) f->low)
#define fast_used(f) \
    ((void*) f->free - (void*) f->low)
#define fast_free(f) \
    ((void*) f->high - (void*) f->free)

fast_t*     fast_create(size_t sz);
void        fast_destroy(fast_t* f);
fast_t*     fast_coerce(void* p, int sz);

void        fast_reset(fast_t* f);
hi_uint8*   fast_alloc(fast_t* f, size_t sz);
hi_uint8*   fast_alloc_alligned(fast_t* f, size_t sz);

void        fast_debug(FILE* fp, fast_t* f);

#endif // FAST_H
//** end of file:  fast/fast.h

//** start of file:  fast/fast.c
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

#ifndef ONE_LARGE_FILE
#include "../assert/assert.h"
#include "fast.h"
#endif

#define BITS_SET(b0,b1)     ((b0) | (b1))
#define BITS_UNSET(b0,b1)   ((b0) & (~(b1)))
#define BITS_TEST(b0,b1)    (((b0) & (b1)) != 0)
#define BITS_FLIP(b0,b1)    ((b0) (b1))

#define BIT_SET(n,b)        BITS_SET(n, 1<<b)
#define BIT_UNSET(n,b)      BITS_UNSET(n, 1<<b)
#define BIT_TEST(n,b)       BITS_TEST(n, 1<<b)
#define BIT_FLIP(n,b)       BITS_FLIP(n, 1<<b)

/** Create a memory region for fast allocation of a given size.
 *
 *  Note, there is more memory allocated than only needed for the
 *  fast region.
 *
 *  /param  n   the requested size
 *  /return the allocated heap, exits if unsuccessfull.
 */
fast_t* fast_create(size_t sz) {
    TRACE;
    fast_t* f = (fast_t*) malloc(sz);
    if (f == 0) {
        fprintf(stderr,
            "FAST_PANIC[0]: failed to allocate a fast region of size %u\n",
            (unsigned int) sz);
        exit(1);
    }
    f->low = (hi_uint8*) &(f->memory);
    f->free = f->low;
    f->high = ((hi_uint8*) f)+sz;

    return f;
}

/** Destroy the fast memory region.
 *
 *  /param  n   the requested size
 */
void fast_destroy(fast_t* f) {
    TRACE;
    free(f);
}

/** Coerce memory to a fast region.
 *
 *  /param  p   a pointer to memory
 *  /param  n   the size of that memory
 *  /return     a fast region
 */
fast_t* fast_coerce(void* p, int sz) {
    TRACE;
    
    fast_t* f = (fast_t*) p;

    f->low = (hi_uint8*) &(f->memory);
    f->free = f->low;
    f->high = ((hi_uint8*) f)+sz;

    return f;
}

/** Print the fast memory region.
 *
 *  /param  f   the fast region
 */
void fast_print(fast_t* f) {
    hi_uint8 *p;
    for (p = f->low; p < f->high; p++) {
        fprintf(stdout, "fast[%ld] [%p]: ", p - f->low, p);
        fprintf(stdout, "%u\n", (unsigned int) p[0]);
    }
}

/** Reset the fast memory region (put the free ptr to the low).
 *
 *  /param  f   the fast region
 */
void fast_reset(fast_t* f) {
    f->free = f->low;
}

/** Allocate memory in the fast region.
 *
 *  /param  f   the fast region
 *  /param  sz  the requested size
 *  /return the memory allocated, exits if unsuccessfull.
 */
hi_uint8* fast_alloc(fast_t* f, size_t sz) {
    TRACE;

    hi_uint8* tmp = f->free;
    f->free += sz;

    return tmp;
}

/** Allocate pointer alligned memory in the fast region.
 *
 *  /param  f    the fast region
 *  /param  sz   the requested size
 *  /return the memory allocated, exits if unsuccessfull.
 */
hi_uint8* fast_alloc_alligned(fast_t* f, size_t sz) {
    TRACE;

    const int pointer_size = sizeof(void*);

    f->free = (hi_uint8*) BITS_UNSET(
                ((unsigned long) f->free)+pointer_size-1,
                ((unsigned long) pointer_size)-1);

    return fast_alloc(f, sz);
}

/** Print debugging information about the fast region.
 *
 *  /param  fp     the file pointer printed to
 *  /param  f      the fast region
 */
void fast_debug(FILE* fp, fast_t* f) {
    fprintf(fp, "**********************************************************************\n");
    fprintf(fp, "FAST REGION INFORMATION\n");
    fprintf(fp, "0-fast low            : %p\n",    f->low);
    fprintf(fp, "0-fast free           : %p\n",    f->free);
    fprintf(fp, "0-fast high           : %p\n",    f->high);
    fprintf(fp, "0-fast size           : %ld\n",   (void*) f->high - (void*) f->low);
    fprintf(fp, "**********************************************************************\n");
}
//** end of file:  fast/fast.c

//** start of file:  dynamic/dynamic.h
#ifndef DYNAMIC_H
#define DYNAMIC_H

#ifndef ONE_LARGE_FILE
#include "../types/types.h"
#endif

// type dl.dl = [ dl.dl string string pointer ]

void* dynamic_symbol (hi_int* dl);

#endif // DYNAMIC_H
//** end of file:  dynamic/dynamic.h

//** start of file:  dynamic/dynamic.c
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>
#include <dlfcn.h>

#ifndef ONE_LARGE_FILE
#include "dynamic.h"
#include "../assert/assert.h"
#endif

#define SYMBOL_LENGTH 128
void* sym_posix (char* libname, char* symname) {
    void* handle;
    void* fptr;

    char libnameso[256] = "lib\0";
    strncat(libnameso, libname, SYMBOL_LENGTH);
    strncat(libnameso, ".so", SYMBOL_LENGTH);

    int mode = RTLD_LAZY;
    handle = dlopen (libnameso, mode);
    if (handle == NULL)
    {
        fprintf (stderr, "%s: dynamic failure: `%s'\n", libname, dlerror ());
        exit (1);
    };

    fptr = dlsym (handle, symname);

    if (fptr == NULL) {
        fprintf (stderr, "%s: dlsym: `%s'\n", symname, dlerror ());
        exit (1);
    };

    DEBUG(fprintf(stderr, "dlsym[%s, %s] = %p\n", libname, symname, fptr));

    return fptr;
}

void* dynamic_symbol(hi_int* n) {
    void* fptr;

    if (n[1] != 1) return (void*) n[1];

    DEBUG(hi_int* tt = get_type_tag(n));
    DEBUG(ASSERT( strncmp(convert_from_record_tag(tt), "dl.dl", 64) == 0 ));
    DEBUG(hi_int* tr = get_type_tag(n));
    DEBUG(ASSERT( strncmp(convert_from_record_tag(tr), "dl.dl", 64) == 0 ));

    hi_int* lib_tag = (hi_int*) n[4];
    hi_int* sym_tag = (hi_int*) n[5];

    char*   lib = convert_to_string(lib_tag);
    char*   sym = convert_to_string(sym_tag);


    fptr = sym_posix(lib, sym);
    // n[1] = (hi_int) fptr;

    return fptr;
}
//** end of file:  dynamic/dynamic.c

//** start of file:  ffi/hffi.h
/**
 *  COPYRIGHT 2010, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 *
 *  Calling dynamic C libraries by using libffi.
 *
 *  On the Hi size, a combinator is set up which gets an FFI
 *  descriptor and a number of arguments.
 *
 *  The FFI descriptor describes a symbol in a dynamic library
 *  and its C type. The symbol is resolved first.
 *
 *  The FFI part is a bit of a patch work. The Hi descriptor is
 *  translated to a CIF of libffi, which is then used as the sole
 *  reference for translating values from Hi towards C values, and the
 *  inverse.
 *
 *  The only relevant exported function is ffi_call, which takes a
 *  thunk, resolves the symbol, makes the call, and places the
 *  result in the heap.
 */
#ifndef HI_FFI_H
#define HI_FFI_H

#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>
#include "ffi.h"

#ifndef ONE_LARGE_FILE
#include "../types/types.h"
#include "../heap/heap.h"
#include "../fast/fast.h"
#endif

ffi_type*   ffi_type_hi_to_ffi(fast_t* f, hi_int *n);

void*       ffi_value_hi_to_ffi(fast_t* f, ffi_type* ft, hi_int *n);
hi_int*     ffi_value_ffi_to_hi(heap_t* hp, ffi_type* ft, void* val);

hi_int*     ffi_hi_call(heap_t* hp, hi_int* dl, hi_int* ffi, hi_int argc, hi_int* argv);

void        ffi_type_print(FILE* f, ffi_type* tm_type);
size_t      ffi_type_size(ffi_type* ft);

#endif // HI_FFI_H
//** end of file:  ffi/hffi.h

//** start of file:  ffi/hffi.c
/**
 *  COPYRIGHT 2010, M.C.A. Devillers
 *
 *  THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
 *  EXPRESSED OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
 */

#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>
#include <dlfcn.h>
#include "hffi.h"

#ifndef ONE_LARGE_FILE
#include "../assert/assert.h"
#include "../dynamic/dynamic.h"
#endif

// type ffi =
//   [ void | char | short | int | long | float | double | pointer |
//     struct1 ffi | struct2 ffi ffi | struct3 ffi ffi ffi | .. ]

/** Build an ffi_type in the fast region from a C type.
 *
 *  /param  f       the fast region
 *  /param  n       an ffi descriptor
 *  /return the ffi_type reconstructed from the value
 */
ffi_type* ffi_type_hi_to_ffi(fast_t* f, hi_int *n) {

    DEBUG(fprintf(stderr, "converting from: "));
    DEBUG(print_hi(stderr, n));
    DEBUG(fprintf(stderr, "\n"));
    DEBUG(ASSERT(n[1] == 1));
    DEBUG(hi_int* tt = get_type_tag(n));
    DEBUG(ASSERT(strncmp(convert_from_record_tag(tt), "ffi.ffi", 64) == 0));


    hi_int* rt = get_record_tag(n);
    char* s = convert_from_record_tag(rt);

    if (strncmp(s, "ffi.void", 64) == 0) {
        return &ffi_type_void;
    } else if (strncmp(s, "ffi.pointer", 64) == 0) {
        return &ffi_type_pointer;
    } else if (strncmp(s, "ffi.char", 64) == 0) {
        return &ffi_type_uchar;
    } else if (strncmp(s, "ffi.int", 64) == 0) {
        return &ffi_type_sint;
    } else if (strncmp(s, "ffi.short", 64) == 0) {
        return &ffi_type_sshort;
    } else if (strncmp(s, "ffi.long", 64) == 0) {
        return &ffi_type_slong;
    } else if (strncmp(s, "ffi.float", 64) == 0) {
        return &ffi_type_float;
    } else if (strncmp(s, "ffi.double", 64) == 0) {
        return &ffi_type_double;
    } else if (strncmp(s, "ffi.struct", 64) == 0) {
        hi_int sz = n[0];
        hi_int c  = sz - 4;

        ffi_type*  ft = (ffi_type*)
            fast_alloc_alligned(f, sizeof(ffi_type));
        ffi_type** ft_elements = (ffi_type**)
            fast_alloc_alligned(f, (sz+1) * sizeof(ffi_type*));

        ft->type = FFI_TYPE_STRUCT;
        ft->size = 0;
        ft->alignment = 0;
        ft->elements = ft_elements;

        int i;
        for (i = 0; i < c; i++) {
            hi_int* t = (hi_int*) n[4+i];
            ft_elements[i] = ffi_type_hi_to_ffi(f, t);
        }
        ft_elements[c] = NULL;

        return ft;
    }
}

/** Construct the ffi values in the fast region of a native value.
 *
 *  /param  f       the fast region
 *  /param  ft      an ffi_type description
 *  /param  n       the const/record/thunk being converted
 *  /return a pointer to the ffi value
 */
void* ffi_value_hi_to_ffi(fast_t* f, ffi_type* ft, hi_int *n) {
    if (ft->type == FFI_TYPE_VOID) {
        return (void*) 0;
    } else if (ft->type == FFI_TYPE_POINTER) {
        return (void*) get_pointer(n);
    } else if (ft->type == FFI_TYPE_UINT64) {
        return (void*) get_uint64(n);
    } else if (ft->type == FFI_TYPE_SINT64) {
        return (void*) get_sint64(n);
    } else if (ft->type == FFI_TYPE_UINT32) {
        return (void*) get_uint32(n);
    } else if (ft->type == FFI_TYPE_SINT32) {
        return (void*) get_sint32(n);
    } else if (ft->type == FFI_TYPE_UINT16) {
        return (void*) get_uint16(n);
    } else if (ft->type == FFI_TYPE_SINT16) {
        return (void*) get_sint16(n);
    } else if (ft->type == FFI_TYPE_UINT8) {
        return (void*) get_uint8(n);
    } else if (ft->type == FFI_TYPE_SINT8) {
        return (void*) get_sint8(n);
    } else if (ft->type == FFI_TYPE_FLOAT) {
        return (void*) get_float(n);
    } else if (ft->type == FFI_TYPE_DOUBLE) {
        return (void*) get_double(n);
    } else if (ft->type == FFI_TYPE_STRUCT) {
        // UNTESTED
        size_t sz = ffi_type_size(ft);
        void* p = fast_alloc_alligned(f, sz);
        char* q = (char*) p;
        int i = 0;
        for (i = 0; ft->elements[i] != 0; i++) {
            sz = ffi_type_size(ft->elements[i]);
            char* r = (char*)
                ffi_value_hi_to_ffi(f, ft->elements[i],
                                (hi_int*) n[4+i]);
            int j;
            for (j = 0; j < sz; j++) {
                q[0] = r[0];
                q++;r++;
            }
        }
        return p;
    } else {
        fprintf(stderr, "FFI_PANIC[4]: cannot create ffi value.\n");
        exit(1);
        return 0;

    }
}

/** Copy the ffi values to the heap.
 *
 *  /param  hp      the heap
 *  /param  ft      an ffi_type description
 *  /param  val     the value
 *  /return the corresponding value in the heap
 */
hi_int* ffi_value_ffi_to_hi(heap_t* hp, ffi_type* ft, void* p) {
    if (ft->type == FFI_TYPE_VOID) {
        return 0;
    } else if (ft->type == FFI_TYPE_POINTER) {
        return heap_pointer(hp, *((hi_pointer*) p));
    } else if (ft->type == FFI_TYPE_UINT64) {
        return heap_int(hp, *((hi_uint64*) p));
    } else if (ft->type == FFI_TYPE_SINT64) {
        return heap_int(hp, *((hi_sint64*) p));
    } else if (ft->type == FFI_TYPE_UINT32) {
        return heap_int(hp, *((hi_uint32*) p));
    } else if (ft->type == FFI_TYPE_SINT32) {
        return heap_int(hp, *((hi_sint32*) p));
    } else if (ft->type == FFI_TYPE_UINT16) {
        return heap_int(hp, *((hi_uint16*) p));
    } else if (ft->type == FFI_TYPE_SINT16) {
        return heap_int(hp, *((hi_sint16*) p));
    } else if (ft->type == FFI_TYPE_UINT8) {
        return heap_char(hp, *((hi_uint8*) p));
    } else if (ft->type == FFI_TYPE_SINT8) {
        return heap_char(hp, *((hi_sint8*) p));
    } else if (ft->type == FFI_TYPE_FLOAT) {
        return heap_float(hp, *((hi_float*) p));
    } else if (ft->type == FFI_TYPE_DOUBLE) {
        return heap_double(hp, *((hi_double*) p));
#ifdef FFI_TYPE_LONGDOUBLE
    } else if (ft->type == FFI_TYPE_LONGDOUBLE) {
        return heap_ldouble(hp, *((hi_ldouble*) p));
#endif
    } else {
        fprintf(stderr, "FFI_PANIC[4]: cannot create return value.\n");
        exit(1);
        return 0;

    }
}

/** Make a call through the ffi interface.
 *
 *  Note: could/should store ffi information in the embedding thunk in
 *  the future.
 *
 *  The caller is responsible for making sure that enough arguments
 *  are present.
 *
 *  /param  hp      the heap
 *  /param  dl      the dl record
 *  /param  ffi     the ffi record
 *  /param  argc    the number of arguments
 *  /param  argv    the arguments
 */
hi_int* ffi_hi_call(heap_t* hp, hi_int* dl, hi_int* ffi, hi_int argc, hi_int* argv) {
    // set up the fast scratch region
    hi_uint8 fmem[512];
    fast_t* f = fast_coerce((void*) fmem, 512);
    
    void* fn = dynamic_symbol(dl);

    print_hi(stderr, ffi);

    hi_int  ffi_record_argc   = (hi_int)  ffi[0] - 5;
    hi_int* ffi_record_argv   = (hi_int*) &(ffi[5]);
    hi_int* ffi_record_return = (hi_int*) (ffi[4]);

    fprintf(stderr, "0: ");
    print_hi(stderr, (hi_int*) ffi_record_argv[0]);
    fprintf(stderr, "1: ");
    print_hi(stderr, (hi_int*) ffi_record_argv[1]);
    fprintf(stderr, "-1: ");
    print_hi(stderr, (hi_int*) ffi_record_argv[-1]);
    fprintf(stderr, "ret: ");
    print_hi(stderr, ffi_record_return);

    if (ffi_record_argc == 0) {
        fprintf(stderr,
            "FFI_PANIC[5]: not implemented yet\n");
        exit(1);
    }

    ASSERT(argc == ffi_record_argc);

    if (argc == 0) goto ffi_constant;

    // build the cif  NOTE:one extra arg, ends with 0!
    ffi_cif cif;
    ffi_type** ffi_type_args = (ffi_type**)
        fast_alloc(f, (ffi_record_argc+1) * sizeof(ffi_type*));

    int i;
    for (i = 0; i < ffi_record_argc; i++) {
        ffi_type_args[i] = ffi_type_hi_to_ffi(f, (hi_int*) ffi_record_argv[i]);
    }
    ffi_type_args[i] = 0;

    ffi_type* ret = ffi_type_hi_to_ffi(f, ffi_record_return);

    // prepare the values
    void** arg_values = (void**)
            fast_alloc(f, (ffi_record_argc+1) * sizeof(void*));
    for (i = 0; i < ffi_record_argc; i++) {
        arg_values[i] =
            ffi_value_hi_to_ffi(f, ffi_type_args[i],
                        (hi_int*) argv[i]);
    }
    arg_values[i] = 0;

    // void hack: FFI doesnt allow void as type/arg.  XXX should be fixed
    if (ffi_type_args[0]->type == FFI_TYPE_VOID) {
        ffi_record_argc--;
        ffi_type_args[0] = 0;
        arg_values[0] = 0;
    }

    // prepare the cif
    if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI,
            ffi_record_argc, ret, ffi_type_args) != FFI_OK) {
        fprintf(stderr,
            "FFI_PANIC[6]: could not create a call interface\n");
        exit(1);
    }

    void* ret_value = fast_alloc_alligned(f, 128);
    ffi_call(&cif, fn, ret_value, arg_values);

    // return the conversion
    hi_int* hi_ret = ffi_value_ffi_to_hi(hp, ret, ret_value);
    
    return hi_ret;

ffi_constant:
    // XXX: NO IDEA IF THIS WORKS :~/
    ret = ffi_type_hi_to_ffi(f, ffi_record_return);
    hi_ret = ffi_value_ffi_to_hi(hp, ret, fn);

    return hi_ret;
}

/** Retrieve the size of an ffi_type.
 *
 *  /param  ft      an ffi_type description
 *  /return the size of a corresponding value
 */
size_t ffi_type_size(ffi_type* ft) {
    if (ft->type == FFI_TYPE_STRUCT) {
        // XXX: all fields are alligned on their sizes
        size_t size = 0;
        ffi_type** ft_elements = ft->elements;
        int i;
        while (ft_elements[i] != 0) {
            size += ffi_type_size(ft_elements[i]);
            i++;
        }
        return size;
    } else {
        return ft->size;
    }
}

typedef struct {
    int     ffi_type;
    char*   name;
} ffi_print_conversion;

/** Print an ffi_type to a file. Exit on failure.
 *
 *  /param  f        a file pointer.
 *  /param  tm_type  an ffi_type
 */
void ffi_type_print(FILE* f, ffi_type* tm_type) {
    static ffi_print_conversion ffi_print_table[16] =
    {
        {FFI_TYPE_VOID,       "void"},
        {FFI_TYPE_INT,        "int"},
        {FFI_TYPE_FLOAT,      "float"},
        {FFI_TYPE_DOUBLE,     "double"},
        {FFI_TYPE_LONGDOUBLE, "longdouble"},
        {FFI_TYPE_UINT8,      "uint8"} ,
        {FFI_TYPE_SINT8,      "sint8"},
        {FFI_TYPE_UINT16,     "uint16"},
        {FFI_TYPE_SINT16,     "sint16"},
        {FFI_TYPE_UINT32,     "uint32"},
        {FFI_TYPE_SINT32,     "sint32"},
        {FFI_TYPE_UINT64,     "uint64"},
        {FFI_TYPE_SINT64,     "sint64"},
        {FFI_TYPE_POINTER,    "pointer"},
        {FFI_TYPE_STRUCT,     "struct"},
        {-1,0}
    };

    int i = 0;
    int ct = tm_type->type;
    for (i = 0; ffi_print_table[i].ffi_type != -1; i++) {
        if (ffi_print_table[i].ffi_type == ct) {
            fprintf(f, "%s", ffi_print_table[i].name);
            if (ct == FFI_TYPE_STRUCT) {
                fprintf(f, "{");
                ffi_type** tm_type_elements = tm_type->elements;

                int j;
                for (j = 0; tm_type_elements[j] != 0; j++) {
                    ffi_type_print(f, tm_type_elements[j]);
                    fprintf(f, " ");
                }
                fprintf(f, "}");
            }
            return;
        }
    }
    fprintf(stderr, "FFI_PANIC[0]: garbled ffi type.\n");
    exit(1);
}
//** end of file:  ffi/hffi.c