Read this and some stuff of the emerging languages conference 2010. It seems everyone agrees that the way to handle parallelism is by viewing an algorithm as a DAG and to be able to spawn separate threads, defer evaluation in a thread while it waits for IO, and being able to join several threads together in different manners. The number of primitives are just not that big.
The above model doesn't take into account persistency of communicating processes (unless a reference to the process is part of the return result). It's still fork and join.