In order to go through this tutorial, you will need a working Haskell
environment. If you don’t already have one, follow the installation
instructions on haskell.org to install the compiler, then install
stack, a popular build tool for Haskell projects.
Once you’re up and running, you’ll want to get hold of the
distributed-process
library and a choice of network transport
backend. This guide will use the network-transport-tcp
backend, but
other backends are available on Hackage
and GitHub.
Starting a new Cloud Haskell project using stack
is as easy as running `stack new` with a project name of your choice.
This will populate a fresh project directory with
a number of files, chiefly stack.yaml
and *.cabal
metadata files
for the project. You’ll want to add distributed-process
and
network-transport-tcp
to the build-depends
stanza of the
executable section.
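After editing, the executable section might look something like the sketch below. The stanza name and file layout here come from stack's default template and may differ in your generated project; version bounds are omitted for brevity.

```
executable my-project-exe
  hs-source-dirs:      app
  main-is:             Main.hs
  build-depends:       base
                     , distributed-process
                     , network-transport-tcp
```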
Cloud Haskell’s lightweight processes reside on a “node”, which must be initialised with a network transport implementation and a remote table. The latter is required so that physically separate nodes can identify known objects in the system (such as types and functions) when receiving messages from other nodes. We will look at inter-node communication later; for now, it will suffice to pass the default remote table, which defines the built-in types that Cloud Haskell needs at a minimum in order to run.
In app/Main.hs
, we start with our imports:
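Something along these lines should suffice; these three modules cover everything used in the examples that follow:

```haskell
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.TCP (createTransport, defaultTCPParameters)
```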
Our TCP network transport backend needs an IP address and port to get started with:
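A sketch, assuming the imports mentioned above are in scope. The address and port values are purely illustrative, and note that newer releases of network-transport-tcp have changed `createTransport`'s signature slightly, so consult the documentation for the version you installed:

```haskell
main :: IO ()
main = do
  -- createTransport returns Either IOException Transport; a real program
  -- should handle failure rather than pattern-matching the Right away
  Right t <- createTransport "127.0.0.1" "10501" defaultTCPParameters
  node <- newLocalNode t initRemoteTable
  return ()
```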
And now we have a running node.
We start a new process by evaluating runProcess
, which takes a node and
a Process
action to run, because our concurrent code will run in the
Process
monad. Each process has an identifier associated with it. The process
id can be used to send messages to the running process - here we will send one
to ourselves!
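Putting the pieces together, a complete listing might look like this (the setup code is repeated so that the example stands alone):

```haskell
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.TCP (createTransport, defaultTCPParameters)

main :: IO ()
main = do
  Right t <- createTransport "127.0.0.1" "10501" defaultTCPParameters
  node <- newLocalNode t initRemoteTable
  runProcess node $ do
    -- get our own process id
    self <- getSelfPid
    -- send a message to ourselves...
    send self "hello"
    -- ...and pull it back out of our own mailbox
    hello <- expect :: Process String
    liftIO $ putStrLn hello
```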
Note that we haven’t deadlocked our own thread by sending to and receiving
from its mailbox in this fashion. Sending messages is a completely
asynchronous operation - even if the recipient doesn’t exist, no error will be
raised and evaluating send
will not block the caller, even if the caller is
sending messages to itself.
Each process also has a mailbox associated with it. Messages sent to
a process are queued in this mailbox. A process can pop a message out of its
mailbox using expect
or the receive*
family of functions. If no message of
the expected type is in the mailbox currently, the process will block until
there is. Messages in the mailbox are ordered by time of arrival.
Let’s spawn two processes on the same node and have them talk to each other:
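A sketch of a small echo server along those lines; the final delay is there only to give the logger a chance to flush its output before the program exits:

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport.TCP (createTransport, defaultTCPParameters)

replyBack :: (ProcessId, String) -> Process ()
replyBack (sender, msg) = send sender msg

logMessage :: String -> Process ()
logMessage msg = say $ "handling " ++ msg

main :: IO ()
main = do
  Right t <- createTransport "127.0.0.1" "10501" defaultTCPParameters
  node <- newLocalNode t initRemoteTable
  runProcess node $ do
    -- Spawn another worker on the local node
    echoPid <- spawnLocal $ forever $ do
      -- Test our matches in order against each message in the queue
      receiveWait [match logMessage, match replyBack]

    -- `say` logs to stderr via a process registered as "logger"
    say "send some messages!"
    send echoPid "hello"
    self <- getSelfPid
    send echoPid (self, "hello")

    -- `expectTimeout` waits for a message or gives up after 1 second
    m <- expectTimeout 1000000
    case m of
      -- Die immediately - throws a ProcessExitException with the given reason
      Nothing -> die "nothing came back!"
      Just s  -> say $ "got " ++ s ++ " back!"

    -- Give the logger a moment to flush before the program exits
    liftIO $ threadDelay 2000000
```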
Note that we’ve used receiveWait
this time around to get a message.
receiveWait
and similarly named functions can be used with the
Match
data type to provide a range of advanced message
processing capabilities. The match
primitive allows you to construct
a “potential message handler” and have it evaluated against received
(or incoming) messages. Think of a list of Match
es as the
distributed equivalent of a pattern match. As with expect
, if the
mailbox does not contain a message that can be matched, the evaluating
process will be blocked until a message arrives which can be
matched.
In the echo server above, our first match prints out whatever string it
receives. If the first message in our mailbox is not a String
, then our
second match is evaluated. Thus, given a tuple t :: (ProcessId, String)
, it
will send the String
component back to the sender’s ProcessId
. If neither
match succeeds, the echo server blocks until another message arrives and tries
again.
Processes may send any datum whose type implements the Serializable
typeclass, defined as:
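The definition, as found in distributed-process (up to minor details), is:

```haskell
class (Binary a, Typeable a) => Serializable a
instance (Binary a, Typeable a) => Serializable a
```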
That is, any type that is Binary
and Typeable
is Serializable
. This is
the case for most of Cloud Haskell’s primitive types as well as many standard
data types. For custom data types, the Typeable
instance is always
given by the compiler, and the Binary
instance can be auto-generated
too in most cases, e.g.:
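A sketch with a made-up type `T`: deriving `Generic` lets the empty `instance Binary T` declaration pick up the generic default implementations, and `Typeable` comes essentially for free on modern GHC.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric      #-}

import Data.Binary (Binary, encode, decode)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

-- A custom message type; the Binary methods are filled in
-- generically from the Generic representation.
data T = T Int Char
  deriving (Show, Eq, Generic, Typeable)

instance Binary T
```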
We saw above that the behaviour of processes is determined by an action in the
Process
monad. However, actions in the Process
monad are no more serializable
than actions in the IO
monad. If we can’t serialize actions, then how can we
spawn processes on remote nodes?
The solution is to consider only static actions and compositions thereof.
A static action is always defined using a closed expression (intuitively, an
expression that could in principle be evaluated at compile-time since it does
not depend on any runtime arguments). The type of static actions in Cloud
Haskell is Closure (Process a)
. More generally, a value of type Closure b
is a value that was constructed explicitly as the composition of symbolic
pointers and serializable values. Values of type Closure b
are serializable,
even if values of type b
might not be. For instance, while we can’t in general
send actions of type Process ()
, we can construct a value of type Closure
(Process ())
instead, containing a symbolic name for the action, and send
that instead. So long as the remote end understands the same meaning for the
symbolic name, this works just as well. A remote spawn, then, takes a static
action and sends that across the wire to the remote node.
Static actions are not easy to construct by hand, but fortunately Cloud
Haskell provides a little bit of Template Haskell to help. If f :: T1 -> T2
then
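`mkClosure` gives us, schematically:

```haskell
$(mkClosure 'f) :: T1 -> Closure T2
```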
You can turn any top-level unary function into a Closure
using mkClosure
.
For curried functions, you’ll need to uncurry them first (i.e. “tuple up” the
arguments). However, to ensure that the remote side can adequately interpret
the resulting Closure
, you’ll need to add a mapping in a so-called remote
table associating the symbolic name of a function to its value. Processes can
only be successfully spawned on remote nodes if all these remote nodes have
the same remote table as the local one.
We need to configure our remote table (see the API reference for more details), and the easiest way to do this is to let the library generate the relevant code for us. For example:
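A sketch, using a hypothetical task named `sampleTask` that sleeps for a given number of seconds and then logs a string:

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Control.Concurrent (threadDelay)
import Control.Distributed.Process
import Control.Distributed.Process.Closure

sampleTask :: (Int, String) -> Process ()
sampleTask (t, s) = liftIO (threadDelay (t * 1000000)) >> say s

remotable ['sampleTask]
```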
The last line is a top-level Template Haskell splice. At the call site for
spawn
, we can construct a Closure
corresponding to an application of
sampleTask
like so:
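Schematically, assuming a `NodeId` named `us` is in scope inside some `Process` action:

```haskell
pid <- spawn us $ $(mkClosure 'sampleTask) (1 :: Int, "using spawn")
```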
The call to remotable
implicitly generates a remote table by inserting
a top-level definition __remoteTable :: RemoteTable -> RemoteTable
in our
module for us. We compose this with other remote tables in order to come up
with a final, merged remote table for all modules in our program:
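A complete sketch, again using the hypothetical `sampleTask`; with only one module, composing with `initRemoteTable` alone is enough:

```haskell
{-# LANGUAGE TemplateHaskell #-}

module Main where

import Control.Concurrent (threadDelay)
import Control.Distributed.Process
import Control.Distributed.Process.Closure
import Control.Distributed.Process.Node
import Network.Transport.TCP (createTransport, defaultTCPParameters)

sampleTask :: (Int, String) -> Process ()
sampleTask (t, s) = liftIO (threadDelay (t * 1000000)) >> say s

remotable ['sampleTask]

-- Compose this module's generated table with the built-in defaults
myRemoteTable :: RemoteTable
myRemoteTable = Main.__remoteTable initRemoteTable

master :: Process ()
master = do
  us <- getSelfNode
  -- spawnLocal takes a Process action directly
  _ <- spawnLocal $ sampleTask (1 :: Int, "using spawnLocal")
  -- spawn takes a NodeId and a Closure
  _ <- spawn us $ $(mkClosure 'sampleTask) (1 :: Int, "using spawn")
  -- wait long enough for both tasks to log their messages
  liftIO $ threadDelay 2000000

main :: IO ()
main = do
  Right t <- createTransport "127.0.0.1" "10501" defaultTCPParameters
  node <- newLocalNode t myRemoteTable
  runProcess node master
```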
In the above example, we spawn sampleTask
on node us
in two different ways:

- spawn
, which expects some node identifier to spawn a process
on, along with a Closure
for the action of the process.
- spawnLocal
, a specialization of spawn
for the case when the
node identifier actually refers to the local node (i.e. us
). In
this special case, no serialization is necessary, so passing an
action directly rather than a Closure
works just fine.