The Dining Philosophers
Split SPLIT_ID
CS 300, Parallel and Distributed Computing
(PDC)
Due Wednesday, October 3, 2018
Implement a solution to Dining Philosophers in MPI. Experience/observe deadlock, starvation, etc.
Sync, SyncServer
Preliminary material
Deadlock, starvation, race conditions
Client/Server applications
Synchronization code
Sync.h
Laboratory exercises
Login to a cluster admin node. Create a subdirectory
~/PDC/lab3
, then change your default directory tolab3
.Starting a synchronization server. We will use MPI to program one process as a synchronization server, which can be used to insure correct interprocess communication.
Make a copy of the file
Sync.h
, which defines an enumerated type and two classes. This file implements a simple client-server mechanism using MPI.The enumerated type
SyncMessageType
defines two named integersQUIT
andPRINT
, to be used as MPI tags.The class
SyncServer
consists of just one methodrun()
that receives MPI messages, takes an action, and sends a message in reply.The class
Sync
has methods for sending and receiving to a process that is runningSyncServer
. The constructor forSync
takes one argument, the rank of a process runningSyncServer
.Note: Be sure to use the rank of the
SyncServer
process for this constructor call. The argument of theSync()
constructor will be the destination for messages, so if you fill in the rank of the process callingSync()
, that process will be sending and receiving from itself -- which risks deadlock!
Write an MPI application
dp.cpp
that testsSync.h
The head node (rank 0) should execute
SyncServer::run()
Other nodes should create an instance of
Sync
with constructor argument 0, then should carry out the following steps:Send a
PRINT
message with string"hello"
to the serverReceive an
ACK
message from the serverSend a
QUIT
message (with empty string) to serverReceive
ACK
from server.
Note: Only one of the non-head nodes will get as far as sending its
QUIT
message before the program terminates.
After your code is working correctly, create a copy named
dp1.cpp
andcommit
both copies plusSync.h
. (The copydp1.cpp
will make it easier for graders to retrieve your work at this stage, after you make further changes indp.cpp
during steps below.)cumulus$ cp dp.cpp dp1.cpp
cumulus$ git add dp.cpp dp1.cpp Sync.h cumulus$ git commit -m "lab3: First test program for Sync.h"
Make a copy
dp.h
ofSync.h
, and editdp.cpp
to includedp.h
instead ofSync.h
. Then, add an "echo" service todp.h
, in which the server sends anACK
message with the same content in response to anECHO
message. This will require adding the identifierECHO
to the enumerated type indp.h
, and adding a new case to the infinitewhile
loop inSync::run()
to handleECHO
requests.Modify
dp.cpp
to include a test of the newECHO
service, and test your program, showing enough diagnostic output to verify that your new service works.Once your code is correct, create a
commit
recording this work as follows:cumulus$ cp dp.cpp dp2.cpp cumulus$ cp dp.h dp2.h
cumulus$ git add dp.cpp dp2.cpp dp.h dp2.h cumulus$ git commit -m "lab3: Add ECHO service to dp.h"
Change
dp.cpp
so the head node runs on the highest-ranked node instead of the node with rank 0. Test, and create a commit if you want.Add a "chopstick" service to
dp.h
, which allocates an arraychop[]
ofint
having one element for each process other than the server, representing whether that chopstick is available (valueNONE
) or is allocated by a process (value is rank of that process). Note: Define astatic const int
variableNONE
with value -1, for code readability.Here are some suggestions for this step.
Add two new message types, named
GETCHOP
andPUTCHOP
, for requesting a chopstick and releasing one, resp.If there are
nprocs
processes in your job, there will benprocs
-1 "philosopher" (non-server) processes, and thereforenprocs
-1 chopsticks. So, your chopstick arraychop[]
should have sizenproc
-1int
s.You can allocate your array
chop[]
in yourmain()
program, and passchop
and its length to the server as arguments forrun()
. Or, you can allocatechop[]
as a local variable inrun()
and calculate its length usingMPI_Comm_size()
.The "philosophers" will have rank 0 to
nprocs
-2, and the server will have ranknprocs
-1.You can represent that philosopher's left chopstick by
chop[p]
and that philosopher's right chopstick bychop[p+1]
, except when p =nprocs
-2, in which case the right chopstick would bechop[0]
. Under this strategy, a formula for the right chopstick would bechop[(p+1)%(nprocs-1)]
.You can store a philosopher's rank p in an array element
chop[k]
when p possesses chopstickk
.Your server code for handling
PUTCHOP
andGETCHOP
requests can determine the requesting process's rank p as status.MPI_SOURCE. The message's content can consist of a one-character string, either"R"
for the Right chopstick (chop[p]
) or"L"
for the Left chopstick (chop[(p+1)%(nprocs-1)]
).For
GETCHOP
requests:If the requested chopstick is unassigned (value
NONE
), then assign p to that array location and send an emptyACK
message (containing zero-length string) to the process with rank p, to indicate that the request succeeded.If the requested chopstick is already assigned to a process with a different rank, send an
ACK
message containing the string"NO"
to the process with rank p.If the requested chopstick is already assigned to that same process with rank p, send an
ACK
message containing the string"ERROR"
to the process with rank p, because no process should be requesting the same chopstick twice without releasing it between requests.
For
PUTCHOP
requests:If the requested chopstick is already assigned to that same process with rank p, then assign
NONE
to that array location and send an emptyACK
message to that process with rank p, to indicate that the request succeeded.Otherwise, send an
ACK
message containing the string"ERROR"
to the process with rank p, because no process should be trying to put back a chopstick that it doesn't possess.
Create a
commit
recording this work as follows:cumulus$ cp dp.cpp dp3.cpp cumulus$ cp dp.h dp3.h
cumulus$ git add dp.cpp dp3.cpp dp.h dp3.h cumulus$ git commit -m "lab3: Add chopstick service to dp.h, GETCHOP and PUTCHOP"
-
Implement a simulation of the dining philosophers problem, with random thinking and eating time and 5 philosophers (thus 6 processes, one for the synchronization server)
The function
stosleep()
below will cause a process to pause execution for a given number of milliseconds (thousandths of a second). For example, the callstosleep(1500)
will pause for 1.5 seconds. (This uses a system callselect
; system calls are discussed in an Operating Systems course.)#include <sys/select.h> /* pause execution for msec thousandths of a second, on Linux */ void stosleep(long msec) { struct timeval tv; tv.tv_sec = msec/1000; tv.tv_usec = 1000 * msec % 1000000; select(0, NULL, NULL, NULL, &tv); }
You can use
stosleep()
to simulate your philosophers thinking and eating. For example, the callstosleep(rand()%2000)
will cause a random delay of up to 2 sec. Be sure to callsrand()
with an argument that depends on rank in order to get different random delays for different philosophers.Use a simple (if incorrect) algorithm for philosophers attempting to obtain chopsticks, such as the following.
repeat forever think pick up left chopstick (as soon as it's available) pick up right chopstick (as soon as it's available) eat return left chopstick return right chopstick
Check theACK
message for aGETCHOP
request to determine when a chopstick is available: if thatACK
message is empty ( MPI_Get_count() delivers a non-zero value with data typeMPI_BYTE
), then the chopstick was obtained; a non-emptyACK
message means the chopstick wasn't available (or an error). If the chopstick isn't available,print a message
"Process p retrying for its C chopstick"
(where C is the chopstickL
orR
being sought). (wherep
is the philosopher's rank), thenrepeatedly wait for a short time (
stosleep(100)
waits for 0.1 sec) and try again, until the chopstick becomes available, and finallyprint a message
"Process p now possesses chopstick
.C
"
Try running your simulation, with output from each philosopher process labelled according to rank indicating each step of the algorithm.
Note: This algorithm never stops, so use CTRL-C to stop the
mpirun
job after letting your program run for a short while.Create a
commit
recording this work as follows:cumulus$ cp dp.cpp dp4.cpp cumulus$ cp dp.h dp4.h
cumulus$ git add {dp,dp4}.{cpp,h} cumulus$ git commit -m "lab3: Dining philosophers simulation with a simple, vulnerable algorithm"
Note: The notation{dp,dp4}.{cpp,h}
is interpreted by the shell and expands to the four filenamesdp.cpp dp4.cpp dp.h dp4.h
. You may use the expanded version instead of the special notation if you prefer.Patterns in The Dining Philosophers .
Besides Message Passing, this lab also represents the SPMD pattern. Although literally the same program is executed by each node, the synchronization process does completely different work than the philosopher processes, so the SPMD pattern actually only applies to the philosopher processes. In this case, the set data for a philosopher process consists of the parameters that determine our philosopher simulation, which includes a node's rank and that node's particular sequence of random numbers.
The chopstick service provides an example of a new pattern at the mid level.
The array
chop[]
illustrates the Shared Array pattern, in which multiple processes or threads can use the same array data structure safely, without danger of race conditions.
In
dp.cpp
, the synchronization process that runs theSyncServer
code prevents any race conditions. No race conditions can occur becauseonly the synchronization process has direct access to the array elements, and
that synchronization process is programmed to carry out any philosopher process's request completely before proceeding to any other requests, which insures that no two requests will interfere with each other's array access through unfortunate timing.
This synchronization strategy for protecting the array
chop[]
implements a new low-level pattern.The Mutual Exclusion pattern appears when no two processes or threads can access shared resources at the same time.
Here, the array
chop[]
is the shared resource among the philosopher processes, and the synchronization process guarantees that only one of those philosopher processes can access that array at a time. Mutual Exclusion is a low-level pattern, focused much more on hardware resources than on the features of an application, but is quite unlike the other low-level patterns we have seen (Message Passing and Collective Communication).There is also a high level pattern at work in
dp.cpp
.In Task Decomposition, a programming problem is approached by dividing the work into multiple tasks that can be carried out simultaneously.
In
dp.cpp
, there are two kinds of tasks: the synchronization service; and the philosopher tasks. Only one synchronization task is used, whereas multiple philosopher tasks are needed.Our two high-level patterns, Data Decomposition and Task Decomposition, concern strategies for finding the opportunities for parallelism in a programming problem. The mid- and low-level patterns focus on implementing and executing those decompositions.
Make copies
deadlock.cpp
anddeadlock.h
of your dining philosophers simulation codedp.cpp
anddp.h
, and modify the copies in order to cause a deadlock.One way to do this is to take out the random delays for eating and thinking, and instead add delays between picking up left and right chopsticks, so that all philosophers will pick up their left chopsticks and be unable to obtain their right chopsticks.
Note: Since our philosophers don't actually block when they can't get a desired chopstick, but keep retrying (say, every 0.1 sec), then our simulated "deadlock" occurs when a set of philosophers gets in a cycle of repeatedly trying
GETCHOP
requests.Create a
commit
recording this work as follows:cumulus$ cp deadlock.cpp deadlock1.cpp cumulus$ cp deadlock.h deadlock1.h
cumulus$ git add {deadlock,deadlock1}.{cpp,h} cumulus$ git commit -m "lab3: Preliminary deadlock example"
The program
deadlock.cpp
above does not truly deadlock, because philosophers do not block when a chopstick is unavailable. Revisedeadlock.cpp
to create a true deadlock by adding a new type of requestGETCHOP_BLOCKING
to your synchronization server, with the following behavior (where the process with rank p is requesting the chopstick with index c):If the requested chopstick is unassigned (value -1), then assign p to that array location and send an empty
ACK
message (containing zero-length string) to the process with rank p, to indicate that the request succeeded.If the requested chopstick is already assigned to a process with a different rank, then assign p to an array element
waiting[c]
, and do not send anACK
message to the process with rank p at this time.If the requested chopstick is already assigned to that same process with rank p, send an
ACK
message containing the string"ERROR"
to the process with rank p, because no process should be requesting the same chopstick twice without releasing it between requests.
The array
waiting[]
should be an array of int with one element for each chopstick (same length aschop[]
). Initialize all the elements ofwaiting[]
with the constant valueNONE
= -1 to indicate that no process is waiting for any of the chopsticks at the beginning of the program. Note: Because each chopstick is shared by only two processes, the arraywaiting[]
only needs one memory location per chopstick for this problem. For a more general problem, a whole queue of waiting processes might be needed per chopstick.The
PUTCHOP
algorithm will now need to sendACK
messages to any waiting processes when a chopstick is returned. Here is the revised algorithm for aPUTCHOP
request by processq
with chopstickc
(additions in boldface):If the requested chopstick number is out of range (negative or beyond the
chop[]
array), send anACK
message containing the string"ERROR in chopstick number"
.If the requested chopstick is already assigned to that same process with rank q, then
if
waiting[c]
holds the rank of a process waiting on that chopstick, then assign that rankwaiting[c]
to the locationchop[c]
, send an emptyACK
message to that processwaiting[c]
, and assignNONE
= -1 to the array locationwaiting[c]
.Otherwise, assign -1 to
chop[c]
.
Then send an empty
ACK
message to that process with rank q, to indicate that itsPUTCHOP
request succeeded.Otherwise, send an
ACK
message containing the string"ERROR"
to the process with rank q, because no process should be trying to put back a chopstick that it doesn't possess.
Create a
commit
recording this work as follows:cumulus$ cp deadlock.cpp deadlock2.cpp cumulus$ cp deadlock.h deadlock2.h
cumulus$ git add {deadlock,deadlock2}.{cpp,h} cumulus$ git commit -m "lab3: True deadlock example"
Try to program a simulation of dining philosophers that has starvation. Name your code files
starvation.cpp
andstarvation.h
. See the second algorithm in the page on race conditions for an algorithm with starvation; usestosleep()
as needed to time your computation in order to produce a starvation situation.Make a
commit
of your code:cumulus$ git add starvation.{cpp,h} cumulus$ git commit -m "lab3: Starvation example"
Try to program a simulation of dining philosophers that has a race condition. Call your code files
race.cpp
andrace.h
. See the third algorithm in the page on race conditions for an algorithm with a race condition. You can usestosleep
to force "bad timing" and cause the race.The philosopher processes need shared access to an array
myturn[]
for this algorithm. Add this array to the synchronization server's memory, and add new message typesREAD_MYTURN
andWRITE_MYTURN
for assigning or reading from that shared array.The content of a
READ_MYTURN
message can be an index for the desired element in the shared array.The content of a
WRITE_MYTURN
message can consist of two integers, an index and a value to assign to that location of the shared array (0 for false, 1 for true).
Implement
myturn[]
to satisfy the Shared Array pattern, including Mutual Exclusion, like the arraychop[]
. The race condition in this problem should occur in the philosopher algorithms, not in the mechanism for array access.Note: In order to read integers from the content of a message, you can use C++'s
stringstream
class. Ifbuff
holds a string (such as the content of a message), then#include <sstream> ... stringstream ss(buff);
defines astringstream
objectss
which behaves like an input stream, extracting characters from the stringbuff
. For example, you can read an integer argument fromss
using the>>
operator:int val; ss >> val;
The algorithm also requires a way for processes to block themselves (
sleep()
) and other processes to unblock other sleeping processes (wakeup
). You can implement this as a service in the synchronization server as follows:Add two new message types,
SLEEP
andWAKEUP
.Add an array
sleeping[]
with one element for each philosopher process, similar tochop[]
. Initialize all the elements ofsleeping[]
to 0 (representing False) during initialization of yourSyncServer::run()
method (inrace.h
), before entering the loop for processing requests from philosophers.For a
SLEEP
request from a philosopher with rank p:Assign 1 to
sleeping[p]
. Do not send anACK
message to philosopher process p at this time! That philosopher process will block on theMPI_Recv()
call, awaiting thatACK
message, thus implementing the sleep operation.
A
WAKEUP
request should have one integer argument, the rank q of a philosopher process. ForWAKEUP
requests:Obtain the integer q from the message content using
stringstream
, as described above. This is the usual way to convert strings to integers in C++.If q is not the rank of a philosopher (i.e, is not in the range 0 to
nprocs
-2), send anACK
message to the philosopher process p that requestedWAKEUP
, with a non-empty content to indicate an error. Then,break
from theWAKEUP
code to avoid carrying out the rest of these steps.Note: That philosopher p should check the content of the
ACK
message it receives so it can detect the error it made and print an error message. Then, that philosopher p might as well callMPI_Abort()
to shut down the MPI job, so you can correct the bug. (In other circumstances, it may be possible to recover from an error, but not in this situation.)Assign the value 0 (representing False) to
sleeping[q]
.Send an
ACK
message (with empty content, indicating no error) to the philosopher process p that requestedWAKEUP
.Then, send an
ACK
message (with empty content) to the philosopher process q being woken up.
Note: The order of those last two
ACK
messages is very important for correctness: If philosopher q becomes unblocked before philosopher p does, then it's conceivable that some callwakeup(p)
will cause a lost wakeup between the those twoACK
messages that wouldn't have been lost if theACK
for q had correctly occured before theACK
for p.Make a
commit
of your code:cumulus$ git add race.{cpp,h} cumulus$ git commit -m "lab3: Race condition example"
Correctness with low-level patterns.
In general, it is more difficult to program correctly with low-level patterns than with higher-level patterns. The ordering of
ACK
messages above illustrates why this is true: TwoACK
messages need to take place at about the same time, and simply executing them in the wrong order leads to a race condition. This could also be described as an error in the Mutual Exclusion pattern: When thoseACK
messages execute in the wrong order, two different processes could have access to the shared resourcesleeping[]
at the same time, leaving open a possible race condition logical error.In contrast, the mid-level pattern Shared Array includes correct "thread-safe" concurrency, so programs using an implementation of Shared Array need not include precautions to avoid race conditions from multiple processes accessing a single array element at about the same time. (Students who have taken Hardware Design might recognize the same kind of comparison between programming a loop or an
if
construct in a high level language vs. in assembly language: a programmer must pay careful attention to instruction order, branch/jump conditions, destinations of branches, etc., at th low level of assembly programming.)Our quick implementation of
sleep()
andwakeup()
using Message Passing illustrates the power and flexibility of that pattern. But a programmer must give extra attention to program correctness with a pattern at such a low level as Message Passing.
Deliverables
All of your code for this assignment should already be
contained in commit
s. Modify the most recent
commit
message to indicate that you are submitting
the completed assignment.
cumulus$ git commit --amendAdd the following to the latest
commit
message.
lab3: complete
If your assignment is not complete, indicate your progress instead, e.g.,lab3: items 1-5 complete, item 6 partialYou can make later commits to submit updates.
Finally, pull/push
your commit
s in
the usual way.
cumulus$ git pull origin master cumulus$ git push origin master
Submit your work using git
on the cluster head
node you used. Include the cluster name in your commit message, e.g.,
mist$ cd ~/PDC mist$ git pull origin master mist$ git add -A mist$ git commit -m "Lab 3 submit (mist)" mist$ git push origin masterAlso, fill out this form to report on your work.
See lab1 Deliverables
section if you need to set up the cluster you used for
git
.
If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.
This lab is due by Wednesday, October 3, 2018.
Files: dp.cpp dp1.cpp Sync.h dp.h dp2.cpp dp2.h dp3.cpp dp3.h dp4.cpp dp4.h deadlock.cpp deadlock.h deadlock1.cpp deadlock1.h deadlock2.cpp deadlock2.h starvation.cpp starvation.h race.cpp race.h