Project : OCaml Threads on Multi-core Processors Authors : Programmers : Adrien Jonquet (Master student - UPMC Paris 6) and Mathias Bourgoin (Master sudent - UPMC Paris 6) Supervisors : Benjamin Canou (Phd student - LIP6 - UPMC Paris 6), Philippe Wang (Phd student - LIP6 - UPMC Paris 6) and Emmanuel Chailloux (Professor - LIP6 - UPMC Paris 6) Abstract : OCaml currently allows concurrent programming, with a serious restriction : only one thread can be executed at the same time. We propose to provide threads for OCaml which can be run in parallel, then concurrent programs can take advantage of the now widespread multicore CPUs. To achieve this goal, we propose to write a new threads API based on an alternative compatible runtime, including a new garbage collector which allowing multithreading on multicore. Threads in OCaml : Historically, OCaml's runtime, including the garbage collector, has been designed for Caml Light, its predecessor. Caml Light's runtime was efficient (which highly participated in its success), including a garbage collector taking advantage of the absence of threads. OCaml's garbage collector combines various techniques described in detail in [1]. It works on two generations, the old and the young. It mainly uses a Stop&Copy algorithm on the young generation (minor garbage collection) and an incremental Mark&Sweep on the old generation (major garbage collection). The main idea of Stop&Copy is to use a secondary memory in order to copy and compact the memory regions to be saved. The heap is divided into two parts: the useful part (called from-space), and the part being re-written (called to-space). It depends only on the size of the objects to be kept, but implies to double the memory used by the program. The idea of Mark&Sweep is to mark memory blocks actually used in the first phase, and to delete unmarked blocks in a second �sweeping� phase. It does not compact the memory after freeing it and depends on the total size of the heap (not only the object to be kept), but can be done incrementally. A final ''Compact'' phase can be added, moving the non freed block at the beginning of the heap. A young object that survives a minor garbage collection is relocated to the old generation. The Stop&Copy uses the old generation as the to-space. When it is finished, the entire from-space is completely freed. This GC algorithm cannot be executed with other threads running in parallel. The Stop&Copy for young generation, and the Compact phase for old generation, become invalid if another thread is accessing data in the heap since they can move pointers. These phases also modify the internal structures of the runtime (like the base heap address) so even if other threads are suspended before a GC call, they have to be stopped before an access to these structures. Moreover, allocation is done by decreasing the lower bound of the heap (simply a pointer decrement without checking if another thread is already performing an allocation). As well, some primitives are coded in a very optimised way, by directly modifying internal structures. These practices are made possible by the assumption that only one thread is running at a time. Also, the compiler inlines some allocations, which means that not only the runtime but also some code emitted by the compiler makes assumptions on the absence of parallelism. Proposition : We want to add POSIX-like threads, which can be executed in parallel, to OCaml. Since the current runtime has not be designed with this model in mind, we need to provide a compatible alternative runtime, in particular a new garbage collector and a new allocator. The main design choices for a GC, as described in [2] for Java, are the following : 1. serial versus parallel, 2. concurrent versus stop-the-world and 3. compacting versus non-compacting versus copying. We plan to design a simple garbage collection with mainly a unique serial, stop-the-world, copying algorithm adapted for POSIX threads. The current allocation as explained in the previous section is really efficient, and this is very important for languages as OCaml which allocate a lot of small objects. A first step will be to use a similar mechanism encapsulated in a critical section. However, locking a mutex for each allocation will probably cost too much so we will then have to find a way to lock the heap less often (by using local small heaps for example). Programming a garbage collector is not a really long or difficult task, but the debugging and tuning phases are on the contrary very long because they require to find a good test base (which we intend to publish along with the runtime), and difficult because the algorithm may terminate without complaining but corrupting the memory, making bugs appear later and anywhere in the code. We also plan to provide an interface to ease the plugging of another alternative garbage collector into the runtime, and to write its documentation. Finally, we intend to test the new runtime extensively, and to provide benchmarks of concurrent algorithm on multicore machines, as well as on sequential ones to determine the overhead compared to the current garbage collector. Conclusion : Our project will be considered successful when we have an alternative OCaml runtime allowing shared memory parallel threads and taking good avantage of the now widespread multicore architectures. This means that a good-for-parallelism program running on an architecture with 2 or more cores should return the result quicker with the new runtime than the same program with the current runtime. This work can be evaluated to a 6 man-months project that we can split as : 5 months for the programmers and 1 month for the supervisors. It can be started at the end of May to finish before mid-August. The delivery will be under open software licence as CeCILL-B (http://www.cecill.info/index.en.html) and will contain the sources of a small patch for the OCaml compiler, a new threads API, a compatible runtime and a relevant set of tests. References : [1] Emmanuel Chailloux, Pascal Manoury, and Bruno Pagano. Developing applications with Objective Caml URL = http://caml.inria.fr/pub/docs/oreilly-book/. french version : O'Reilly, Apr 2000. [2] Sun Microsystems. Memory management In The Hotspot Virtual Machine. Technical White Paper, Apr 2006. [3] Bruno Pagano, Olivier Andrieu, Benjamin Canou, Emmanuel Chailloux, Jean-Louis Cola�o, Thomas Moniot, and Philippe Wang. Certified Development Tools Implementation in OCaml. In Pratical Aspects of Declarative Laguages (PADL), pages 2--17, LNCS, Springer, San Francisco, Jan 2008. [4] Esterel technologies. MLCov, a code coverage tool for OCaml with MC/DC, URL = http://www.esterel-technologies.com/technology/free-software Dec, 2007.
Saturday, May 29, 2010
OCaml Threads on Multi-core Processors
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment