Home
>>    




The Dining Philosophers
Split SPLIT_ID


CS 300, Parallel and Distributed Computing (PDC)
Due Wednesday, October 3, 2018

The Dining Philosophers

  • Implement a solution to Dining Philosophers in MPI. Experience/observe deadlock, starvation, etc.

  • Sync, SyncServer

Preliminary material

Laboratory exercises

  1. Login to a cluster admin node. Create a subdirectory ~/PDC/lab3 , then change your default directory to lab3.

  2. Starting a synchronization server. We will use MPI to program one process as a synchronization server, which can be used to insure correct interprocess communication.

    Make a copy of the file Sync.h, which defines an enumerated type and two classes. This file implements a simple client-server mechanism using MPI.

    • The enumerated type SyncMessageType defines two named integers QUIT and PRINT, to be used as MPI tags.

    • The class SyncServer consists of just one method run() that receives MPI messages, takes an action, and sends a message in reply.

    • The class Sync has methods for sending and receiving to a process that is running SyncServer. The constructor for Sync takes one argument, the rank of a process running SyncServer.

      Note: Be sure to use the rank of the SyncServer process for this constructor call. The argument of the Sync() constructor will be the destination for messages, so if you fill in the rank of the process calling Sync(), that process will be sending and receiving from itself -- which risks deadlock!

    Write an MPI application dp.cpp that tests Sync.h

    • The head node (rank 0) should execute SyncServer::run()

    • Other nodes should create an instance of Sync with constructor argument 0, then should carry out the following steps:

      1. Send a PRINT message with string "hello" to the server

      2. Receive an ACK message from the server

      3. Send a QUIT message (with empty string) to server

      4. Receive ACK from server.

      Note: Only one of the non-head nodes will get as far as sending its QUIT message before the program terminates.

    After your code is working correctly, create a copy named dp1.cpp and commit both copies plus Sync.h. (The copy dp1.cpp will make it easier for graders to retrieve your work at this stage, after you make further changes in dp.cpp during steps below.)

    cumulus$  cp dp.cpp dp1.cpp
    
    cumulus$  git add dp.cpp dp1.cpp Sync.h 
    cumulus$  git commit -m "lab3: First test program for Sync.h"
    


  3. Make a copy dp.h of Sync.h, and edit dp.cpp to include dp.h instead of Sync.h. Then, add an "echo" service to dp.h, in which the server sends an ACK message with the same content in response to an ECHO message. This will require adding the identifier ECHO to the enumerated type in dp.h, and adding a new case to the infinite while loop in Sync::run() to handle ECHO requests.

    Modify dp.cpp to include a test of the new ECHO service, and test your program, showing enough diagnostic output to verify that your new service works.

    Once your code is correct, create a commit recording this work as follows:

    cumulus$  cp dp.cpp dp2.cpp
    cumulus$  cp dp.h dp2.h
    
    cumulus$  git add dp.cpp dp2.cpp dp.h dp2.h 
    cumulus$  git commit -m "lab3: Add ECHO service to dp.h"
    


  4. Change dp.cpp so the head node runs on the highest-ranked node instead of the node with rank 0. Test, and create a commit if you want.

  5. Add a "chopstick" service to dp.h, which allocates an array chop[] of int having one element for each process other than the server, representing whether that chopstick is available (value NONE) or is allocated by a process (value is rank of that process). Note: Define a static const int variable NONE with value -1, for code readability.

    Here are some suggestions for this step.

    • Add two new message types, named GETCHOP and PUTCHOP, for requesting a chopstick and releasing one, resp.

    • If there are nprocs processes in your job, there will be nprocs-1 "philosopher" (non-server) processes, and therefore nprocs-1 chopsticks. So, your chopstick array chop[] should have size nproc-1 ints.

    • You can allocate your array chop[] in your main() program, and pass chop and its length to the server as arguments for run(). Or, you can allocate chop[] as a local variable in run() and calculate its length using MPI_Comm_size().

    • The "philosophers" will have rank 0 to nprocs-2, and the server will have rank nprocs-1.

    • You can represent that philosopher's left chopstick by chop[p] and that philosopher's right chopstick by chop[p+1], except when p = nprocs-2, in which case the right chopstick would be chop[0]. Under this strategy, a formula for the right chopstick would be chop[(p+1)%(nprocs-1)].

    • You can store a philosopher's rank p in an array element chop[k] when p possesses chopstick k.

    • Your server code for handling PUTCHOP and GETCHOP requests can determine the requesting process's rank p as status.MPI_SOURCE. The message's content can consist of a one-character string, either "R" for the Right chopstick (chop[p]) or "L" for the Left chopstick (chop[(p+1)%(nprocs-1)]).

    • For GETCHOP requests:

      • If the requested chopstick is unassigned (value NONE), then assign p to that array location and send an empty ACK message (containing zero-length string) to the process with rank p, to indicate that the request succeeded.

      • If the requested chopstick is already assigned to a process with a different rank, send an ACK message containing the string "NO" to the process with rank p.

      • If the requested chopstick is already assigned to that same process with rank p, send an ACK message containing the string "ERROR" to the process with rank p, because no process should be requesting the same chopstick twice without releasing it between requests.

    • For PUTCHOP requests:

      • If the requested chopstick is already assigned to that same process with rank p, then assign NONE to that array location and send an empty ACK message to that process with rank p, to indicate that the request succeeded.

      • Otherwise, send an ACK message containing the string "ERROR" to the process with rank p, because no process should be trying to put back a chopstick that it doesn't possess.

    • Create a commit recording this work as follows:

      cumulus$  cp dp.cpp dp3.cpp
      cumulus$  cp dp.h dp3.h
      
      cumulus$  git add dp.cpp dp3.cpp dp.h dp3.h 
      cumulus$  git commit -m "lab3: Add chopstick service to dp.h, GETCHOP and PUTCHOP"
      


  6. Implement a simulation of the dining philosophers problem, with random thinking and eating time and 5 philosophers (thus 6 processes, one for the synchronization server)

    • The function stosleep() below will cause a process to pause execution for a given number of milliseconds (thousandths of a second). For example, the call stosleep(1500) will pause for 1.5 seconds. (This uses a system call select; system calls are discussed in an Operating Systems course.)

      #include <sys/select.h>
      
      /* pause execution for msec thousandths of a second, on Linux */
      void stosleep(long msec) {
        struct timeval tv;
        tv.tv_sec = msec/1000;
        tv.tv_usec = 1000 * msec % 1000000;
        select(0, NULL, NULL, NULL, &tv);
      }
      

    • You can use stosleep() to simulate your philosophers thinking and eating. For example, the call

      stosleep(rand()%2000)
      
      will cause a random delay of up to 2 sec. Be sure to call srand() with an argument that depends on rank in order to get different random delays for different philosophers.

    • Use a simple (if incorrect) algorithm for philosophers attempting to obtain chopsticks, such as the following.

      repeat forever
         think
         pick up left chopstick (as soon as it's available)
         pick up right chopstick (as soon as it's available)
         eat
         return left chopstick
         return right chopstick
      
      Check the ACK message for a GETCHOP request to determine when a chopstick is available: if that ACK message is empty ( MPI_Get_count() delivers a non-zero value with data type MPI_BYTE), then the chopstick was obtained; a non-empty ACK message means the chopstick wasn't available (or an error). If the chopstick isn't available,
      • print a message "Process p retrying for its C chopstick" (where C is the chopstick L or R being sought). (where p is the philosopher's rank), then

      • repeatedly wait for a short time (stosleep(100) waits for 0.1 sec) and try again, until the chopstick becomes available, and finally

      • print a message "Process p now possesses chopstick C".

      .

    Try running your simulation, with output from each philosopher process labelled according to rank indicating each step of the algorithm.

    Note: This algorithm never stops, so use CTRL-C to stop the mpirun job after letting your program run for a short while.

    Create a commit recording this work as follows:

    cumulus$  cp dp.cpp dp4.cpp
    cumulus$  cp dp.h dp4.h
    
    cumulus$  git add {dp,dp4}.{cpp,h} 
    cumulus$  git commit -m "lab3: Dining philosophers simulation with a simple, vulnerable algorithm"
    
    Note: The notation {dp,dp4}.{cpp,h} is interpreted by the shell and expands to the four filenames dp.cpp dp4.cpp dp.h dp4.h . You may use the expanded version instead of the special notation if you prefer.

    Patterns in The Dining Philosophers .

    Besides Message Passing, this lab also represents the SPMD pattern. Although literally the same program is executed by each node, the synchronization process does completely different work than the philosopher processes, so the SPMD pattern actually only applies to the philosopher processes. In this case, the set data for a philosopher process consists of the parameters that determine our philosopher simulation, which includes a node's rank and that node's particular sequence of random numbers.

    The chopstick service provides an example of a new pattern at the mid level.

    • The array chop[] illustrates the Shared Array pattern, in which multiple processes or threads can use the same array data structure safely, without danger of race conditions.

    In dp.cpp, the synchronization process that runs the SyncServer code prevents any race conditions. No race conditions can occur because

    • only the synchronization process has direct access to the array elements, and

    • that synchronization process is programmed to carry out any philosopher process's request completely before proceeding to any other requests, which insures that no two requests will interfere with each other's array access through unfortunate timing.

    This synchronization strategy for protecting the array chop[] implements a new low-level pattern.

    • The Mutual Exclusion pattern appears when no two processes or threads can access shared resources at the same time.

    Here, the array chop[] is the shared resource among the philosopher processes, and the synchronization process guarantees that only one of those philosopher processes can access that array at a time. Mutual Exclusion is a low-level pattern, focused much more on hardware resources than on the features of an application, but is quite unlike the other low-level patterns we have seen (Message Passing and Collective Communication).

    There is also a high level pattern at work in dp.cpp.

    • In Task Decomposition, a programming problem is approached by dividing the work into multiple tasks that can be carried out simultaneously.

    In dp.cpp, there are two kinds of tasks: the synchronization service; and the philosopher tasks. Only one synchronization task is used, whereas multiple philosopher tasks are needed.

    Our two high-level patterns, Data Decomposition and Task Decomposition, concern strategies for finding the opportunities for parallelism in a programming problem. The mid- and low-level patterns focus on implementing and executing those decompositions.


  7. Make copies deadlock.cpp and deadlock.h of your dining philosophers simulation code dp.cpp and dp.h, and modify the copies in order to cause a deadlock.

    One way to do this is to take out the random delays for eating and thinking, and instead add delays between picking up left and right chopsticks, so that all philosophers will pick up their left chopsticks and be unable to obtain their right chopsticks.

    Note: Since our philosophers don't actually block when they can't get a desired chopstick, but keep retrying (say, every 0.1 sec), then our simulated "deadlock" occurs when a set of philosophers gets in a cycle of repeatedly trying GETCHOP requests.

    Create a commit recording this work as follows:

    cumulus$  cp deadlock.cpp deadlock1.cpp
    cumulus$  cp deadlock.h deadlock1.h
    
    cumulus$  git add {deadlock,deadlock1}.{cpp,h} 
    cumulus$  git commit -m "lab3: Preliminary deadlock example"
    

  8. The program deadlock.cpp above does not truly deadlock, because philosophers do not block when a chopstick is unavailable. Revise deadlock.cpp to create a true deadlock by adding a new type of request GETCHOP_BLOCKING to your synchronization server, with the following behavior (where the process with rank p is requesting the chopstick with index c):

    • If the requested chopstick is unassigned (value -1), then assign p to that array location and send an empty ACK message (containing zero-length string) to the process with rank p, to indicate that the request succeeded.

    • If the requested chopstick is already assigned to a process with a different rank, then assign p to an array element waiting[c], and do not send an ACK message to the process with rank p at this time.

    • If the requested chopstick is already assigned to that same process with rank p, send an ACK message containing the string "ERROR" to the process with rank p, because no process should be requesting the same chopstick twice without releasing it between requests.

    The array waiting[] should be an array of int with one element for each chopstick (same length as chop[]). Initialize all the elements of waiting[] with the constant value NONE = -1 to indicate that no process is waiting for any of the chopsticks at the beginning of the program. Note: Because each chopstick is shared by only two processes, the array waiting[] only needs one memory location per chopstick for this problem. For a more general problem, a whole queue of waiting processes might be needed per chopstick.

    The PUTCHOP algorithm will now need to send ACK messages to any waiting processes when a chopstick is returned. Here is the revised algorithm for a PUTCHOP request by process q with chopstick c (additions in boldface):

    • If the requested chopstick number is out of range (negative or beyond the chop[] array), send an ACK message containing the string "ERROR in chopstick number".

    • If the requested chopstick is already assigned to that same process with rank q, then

      • if waiting[c] holds the rank of a process waiting on that chopstick, then assign that rank waiting[c] to the location chop[c], send an empty ACK message to that process waiting[c], and assign NONE = -1 to the array location waiting[c].

      • Otherwise, assign -1 to chop[c].

      Then send an empty ACK message to that process with rank q, to indicate that its PUTCHOP request succeeded.

    • Otherwise, send an ACK message containing the string "ERROR" to the process with rank q, because no process should be trying to put back a chopstick that it doesn't possess.

    Create a commit recording this work as follows:

    cumulus$  cp deadlock.cpp deadlock2.cpp
    cumulus$  cp deadlock.h deadlock2.h
    
    cumulus$  git add {deadlock,deadlock2}.{cpp,h} 
    cumulus$  git commit -m "lab3: True deadlock example"
    


  9. Try to program a simulation of dining philosophers that has starvation. Name your code files starvation.cpp and starvation.h . See the second algorithm in the page on race conditions for an algorithm with starvation; use stosleep() as needed to time your computation in order to produce a starvation situation.

    Make a commit of your code:

    cumulus$  git add starvation.{cpp,h} 
    cumulus$  git commit -m "lab3: Starvation example"
    


  10. Try to program a simulation of dining philosophers that has a race condition. Call your code files race.cpp and race.h . See the third algorithm in the page on race conditions for an algorithm with a race condition. You can use stosleep to force "bad timing" and cause the race.

    1. The philosopher processes need shared access to an array myturn[] for this algorithm. Add this array to the synchronization server's memory, and add new message types READ_MYTURN and WRITE_MYTURN for assigning or reading from that shared array.

      • The content of a READ_MYTURN message can be an index for the desired element in the shared array.

      • The content of a WRITE_MYTURN message can consist of two integers, an index and a value to assign to that location of the shared array (0 for false, 1 for true).

      Implement myturn[] to satisfy the Shared Array pattern, including Mutual Exclusion, like the array chop[]. The race condition in this problem should occur in the philosopher algorithms, not in the mechanism for array access.

    2. Note: In order to read integers from the content of a message, you can use C++'s stringstream class. If buff holds a string (such as the content of a message), then

      #include <sstream>
      ...
      stringstream ss(buff);
      
      defines a stringstream object ss which behaves like an input stream, extracting characters from the string buff. For example, you can read an integer argument from ss using the >> operator:
      int val;
      ss >> val;
      

    3. The algorithm also requires a way for processes to block themselves (sleep()) and other processes to unblock other sleeping processes (wakeup). You can implement this as a service in the synchronization server as follows:

      • Add two new message types, SLEEP and WAKEUP.

      • Add an array sleeping[] with one element for each philosopher process, similar to chop[]. Initialize all the elements of sleeping[] to 0 (representing False) during initialization of your SyncServer::run() method (in race.h), before entering the loop for processing requests from philosophers.

      • For a SLEEP request from a philosopher with rank p:

        • Assign 1 to sleeping[p]. Do not send an ACK message to philosopher process p at this time! That philosopher process will block on the MPI_Recv() call, awaiting that ACK message, thus implementing the sleep operation.

      • A WAKEUP request should have one integer argument, the rank q of a philosopher process. For WAKEUP requests:

        • Obtain the integer q from the message content using stringstream, as described above. This is the usual way to convert strings to integers in C++.

        • If q is not the rank of a philosopher (i.e, is not in the range 0 to nprocs-2), send an ACK message to the philosopher process p that requested WAKEUP, with a non-empty content to indicate an error. Then, break from the WAKEUP code to avoid carrying out the rest of these steps.

          Note: That philosopher p should check the content of the ACK message it receives so it can detect the error it made and print an error message. Then, that philosopher p might as well call MPI_Abort() to shut down the MPI job, so you can correct the bug. (In other circumstances, it may be possible to recover from an error, but not in this situation.)

        • Assign the value 0 (representing False) to sleeping[q].

        • Send an ACK message (with empty content, indicating no error) to the philosopher process p that requested WAKEUP.

        • Then, send an ACK message (with empty content) to the philosopher process q being woken up.

      Note: The order of those last two ACK messages is very important for correctness: If philosopher q becomes unblocked before philosopher p does, then it's conceivable that some call wakeup(p) will cause a lost wakeup between the those two ACK messages that wouldn't have been lost if the ACK for q had correctly occured before the ACK for p.

      Make a commit of your code:

      cumulus$  git add race.{cpp,h} 
      cumulus$  git commit -m "lab3: Race condition example"
      

    Correctness with low-level patterns.

    In general, it is more difficult to program correctly with low-level patterns than with higher-level patterns. The ordering of ACK messages above illustrates why this is true: Two ACK messages need to take place at about the same time, and simply executing them in the wrong order leads to a race condition. This could also be described as an error in the Mutual Exclusion pattern: When those ACK messages execute in the wrong order, two different processes could have access to the shared resource sleeping[] at the same time, leaving open a possible race condition logical error.

    In contrast, the mid-level pattern Shared Array includes correct "thread-safe" concurrency, so programs using an implementation of Shared Array need not include precautions to avoid race conditions from multiple processes accessing a single array element at about the same time. (Students who have taken Hardware Design might recognize the same kind of comparison between programming a loop or an if construct in a high level language vs. in assembly language: a programmer must pay careful attention to instruction order, branch/jump conditions, destinations of branches, etc., at th low level of assembly programming.)

    Our quick implementation of sleep() and wakeup() using Message Passing illustrates the power and flexibility of that pattern. But a programmer must give extra attention to program correctness with a pattern at such a low level as Message Passing.


Deliverables

All of your code for this assignment should already be contained in commits. Modify the most recent commit message to indicate that you are submitting the completed assignment.

cumulus$  git commit --amend
Add the following to the latest commit message.
    lab3: complete

If your assignment is not complete, indicate your progress instead, e.g.,
    lab3: items 1-5 complete, item 6 partial
You can make later commits to submit updates.

Finally, pull/push your commits in the usual way.

cumulus$  git pull origin master
cumulus$  git push origin master

Submit your work using git on the cluster head node you used. Include the cluster name in your commit message, e.g.,

mist$  cd ~/PDC
mist$  git pull origin master
mist$  git add -A
mist$  git commit -m "Lab 3 submit (mist)"
mist$  git push origin master
Also, fill out this form to report on your work.

See lab1 Deliverables section if you need to set up the cluster you used for git.

If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.

This lab is due by Wednesday, October 3, 2018.


Files: dp.cpp dp1.cpp Sync.h dp.h dp2.cpp dp2.h dp3.cpp dp3.h dp4.cpp dp4.h deadlock.cpp deadlock.h deadlock1.cpp deadlock1.h deadlock2.cpp deadlock2.h starvation.cpp starvation.h race.cpp race.h