Saturday, May 29, 2010

OCaml Threads on Multi-core Processors

Project : OCaml Threads on Multi-core Processors
Authors :  
  Programmers   : Adrien Jonquet (Master student - UPMC Paris 6)   
              and Mathias Bourgoin (Master sudent  - UPMC Paris 6)
  Supervisors   : Benjamin Canou (Phd student - LIP6 - UPMC Paris 6), 
                  Philippe Wang  (Phd student - LIP6 - UPMC Paris 6)
              and Emmanuel Chailloux (Professor - LIP6 - UPMC Paris 6)

Abstract : 
      OCaml currently allows concurrent programming, with a serious
      restriction : only one thread can be executed at the same time.
      We propose to provide threads for OCaml which can be run in
      parallel, then concurrent programs can take advantage of the now
      widespread multicore CPUs.  To achieve this goal, we propose to
      write a new threads API based on an alternative compatible
      runtime, including a new garbage collector which allowing
      multithreading on multicore.

Threads in OCaml :

Historically, OCaml's runtime, including the garbage collector, has
been designed for Caml Light, its predecessor.  Caml Light's runtime
was efficient (which highly participated in its success), including a
garbage collector taking advantage of the absence of threads.

OCaml's garbage collector combines various techniques described in
detail in [1]. It works on two generations, the old and the young. It
mainly uses a Stop&Copy algorithm on the young generation (minor
garbage collection) and an incremental Mark&Sweep on the old
generation (major garbage collection).

The main idea of Stop&Copy is to use a secondary memory in order to
copy and compact the memory regions to be saved.  The heap is divided
into two parts: the useful part (called from-space), and the part
being re-written (called to-space).  It depends only on the size of
the objects to be kept, but implies to double the memory used by the
program.

The idea of Mark&Sweep is to mark memory blocks actually used in the
first phase, and to delete unmarked blocks in a second �sweeping�
phase.  It does not compact the memory after freeing it and depends on
the total size of the heap (not only the object to be kept), but can
be done incrementally.  A final ''Compact'' phase can be added, moving
the non freed block at the beginning of the heap.

A young object that survives a minor garbage collection is relocated
to the old generation. The Stop&Copy uses the old generation as the
to-space.  When it is finished, the entire from-space is completely
freed.

This GC algorithm cannot be executed with other threads running in
parallel. The Stop&Copy for young generation, and the Compact phase
for old generation, become invalid if another thread is accessing data
in the heap since they can move pointers.  These phases also modify
the internal structures of the runtime (like the base heap address) so
even if other threads are suspended before a GC call, they have to be
stopped before an access to these structures.

Moreover, allocation is done by decreasing the lower bound of the heap
(simply a pointer decrement without checking if another thread is
already performing an allocation).  As well, some primitives are coded
in a very optimised way, by directly modifying internal structures.
These practices are made possible by the assumption that only one
thread is running at a time.  Also, the compiler inlines some
allocations, which means that not only the runtime but also some code
emitted by the compiler makes assumptions on the absence of
parallelism.

Proposition : 

We want to add POSIX-like threads, which can be executed in parallel,
to OCaml.  Since the current runtime has not be designed with this
model in mind, we need to provide a compatible alternative runtime, in
particular a new garbage collector and a new allocator.

The main design choices for a GC, as described in [2] for Java, are
the following :
    1. serial versus parallel, 
    2. concurrent versus stop-the-world
and 3. compacting versus non-compacting versus copying.

We plan to design a simple garbage collection with mainly a unique
serial, stop-the-world, copying algorithm adapted for POSIX
threads. The current allocation as explained in the previous section
is really efficient, and this is very important for languages as OCaml
which allocate a lot of small objects. A first step will be to use a
similar mechanism encapsulated in a critical section. However, locking
a mutex for each allocation will probably cost too much so we will
then have to find a way to lock the heap less often (by using local
small heaps for example).

Programming a garbage collector is not a really long or difficult
task, but the debugging and tuning phases are on the contrary very
long because they require to find a good test base (which we intend to
publish along with the runtime), and difficult because the algorithm
may terminate without complaining but corrupting the memory, making
bugs appear later and anywhere in the code.  We also plan to provide
an interface to ease the plugging of another alternative garbage
collector into the runtime, and to write its documentation.

Finally, we intend to test the new runtime extensively, and to provide
benchmarks of concurrent algorithm on multicore machines, as well as
on sequential ones to determine the overhead compared to the current
garbage collector.

Conclusion :

Our project will be considered successful when we have an alternative
OCaml runtime allowing shared memory parallel threads and taking good
avantage of the now widespread multicore architectures.  This means
that a good-for-parallelism program running on an architecture with 2
or more cores should return the result quicker with the new runtime
than the same program with the current runtime. This work can be
evaluated to a 6 man-months project that we can split as : 5 months
for the programmers and 1 month for the supervisors. It can be started
at the end of May to finish before mid-August. The delivery will be
under open software licence as CeCILL-B
(http://www.cecill.info/index.en.html) and will contain the sources of
a small patch for the OCaml compiler, a new threads API, a compatible
runtime and a relevant set of tests.

References : 

[1] Emmanuel Chailloux, Pascal Manoury, and Bruno Pagano.
    Developing applications with Objective Caml 
    URL = http://caml.inria.fr/pub/docs/oreilly-book/.
    french version : O'Reilly, Apr 2000.

[2] Sun Microsystems.
    Memory management In The Hotspot Virtual Machine.
    Technical White Paper, Apr 2006.

[3] Bruno Pagano, Olivier Andrieu, Benjamin Canou, Emmanuel Chailloux, 
    Jean-Louis Cola�o, Thomas Moniot, and Philippe Wang.
    Certified Development Tools Implementation in OCaml.
    In Pratical Aspects of Declarative Laguages (PADL),
    pages 2--17, LNCS, Springer, San Francisco, Jan 2008.

[4] Esterel technologies.
    MLCov, a code coverage tool for OCaml with MC/DC, 
    URL = http://www.esterel-technologies.com/technology/free-software
    Dec, 2007.

No comments:

Post a Comment