The Dining Philosophers

CS 300, Parallel and Distributed Computing (PDC)
Due Friday, January 9, 2015

Preliminary material

Laboratory exercises

  1. Log in to a cluster admin node. Create two subdirectories ~/PDC/lab3 and ~/PDC/lab3/job, then change your default directory to lab3.

  2. Starting a synchronization server. We will use MPI to program one process as a synchronization server, which can be used to ensure correct interprocess communication.

    Make a copy of the file Sync.h, which defines an enumerated type and two classes. This file implements a simple client-server mechanism using MPI.

    Write an MPI application dp.cpp that tests Sync.h...

  3. Add an "echo" service to Sync.h, in which the server sends an ACK message with the same content in response to an ECHO message... Test, showing enough diagnostic output to verify that your new service works.
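Since Sync.h itself is not reproduced here, the following is a hypothetical sketch of what the echo service's server-side logic might look like. The type names Msg, MsgType, and handleEcho are assumptions for illustration; in the real code the request arrives via MPI_Recv() and the reply goes out via MPI_Send().

```cpp
#include <string>

// Hypothetical message model -- the actual enumerated type and the
// send/receive plumbing live in Sync.h and use MPI calls.
enum MsgType { ACK, ECHO };

struct Msg {
    MsgType type;
    std::string content;
};

// The echo service: the server answers an ECHO request with an ACK
// message carrying the same content back to the requester.
Msg handleEcho(const Msg& request) {
    return Msg{ACK, request.content};
}
```

Diagnostic output verifying the service could simply print the ECHO content sent and the ACK content received and confirm they match.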

  4. Change dp.cpp so the head node runs on the highest-ranked node instead of the node with rank 0...

  5. Add a "chopstick" service to Sync.h, which allocates an array chop[] of int with one element for each process other than the server, representing whether that chopstick is available (value NONE) or is allocated by a process (value is rank of that process). Note: Use #define to define a preprocessor constant NONE to have the value -1, for code readability.

    Here are some suggestions for this step.
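One possible shape for the server's chopstick bookkeeping is sketched below. This is only a model of the logic: the struct name and method names are placeholders, and in the real server each call corresponds to handling a GETCHOP or PUTCHOP message and reporting the outcome in the content of an ACK.

```cpp
#include <vector>

#define NONE -1   // chopstick not held by any process

// Server-side bookkeeping for the chopstick service (a sketch).
// chop[c] == NONE means chopstick c is free; otherwise it holds the
// rank of the philosopher process that currently has it.
struct ChopstickTable {
    std::vector<int> chop;

    explicit ChopstickTable(int nchops) : chop(nchops, NONE) {}

    // GETCHOP: grant chopstick c to process p if it is free.
    // Returns true on success.
    bool getchop(int p, int c) {
        if (chop[c] != NONE) return false;  // someone else holds it
        chop[c] = p;
        return true;
    }

    // PUTCHOP: process p returns chopstick c (only if p holds it).
    void putchop(int p, int c) {
        if (chop[c] == p) chop[c] = NONE;
    }
};
```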

  6. Implement a simulation of the dining philosophers problem, with random thinking and eating times and 5 philosophers (thus 6 processes, one for the synchronization server).

    Try running your simulation, with output from each philosopher process labelled according to rank, indicating each step of the algorithm.

    Note: This algorithm never stops, so use scancel to delete the job after letting it run for a short while. You can check on progress by viewing the job's output file while the job is running.
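For reference, one cycle of a philosopher's algorithm can be sketched as below. This is a runnable single-process model only: the MPI request/ACK exchange is replaced by direct array access, the random think/eat delays are omitted, and the GETCHOP retry loop is collapsed into a single attempt.

```cpp
#include <vector>

#define NONE -1

// One cycle for philosopher p among n philosophers.  Philosopher p
// uses chopsticks p (left) and (p+1) % n (right).
void philosopherCycle(std::vector<int>& chop, int p) {
    int n = (int)chop.size();
    int left = p, right = (p + 1) % n;

    // think...  (a random delay in the real simulation)

    // pick up left, then right; in the real code each pickup is a
    // GETCHOP request, retried until the server grants the chopstick
    if (chop[left] == NONE) chop[left] = p;
    if (chop[right] == NONE) chop[right] = p;

    // eat only when both chopsticks were obtained
    if (chop[left] == p && chop[right] == p) {
        // eat...  (a random delay in the real simulation)
        chop[left] = NONE;   // PUTCHOP left
        chop[right] = NONE;  // PUTCHOP right
    }
}
```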

    Patterns in The Dining Philosophers.

    Besides Message Passing, the chopstick service provides an example of a new pattern at the mid level.

    • The array chop[] illustrates the Shared Array pattern, in which multiple processes or threads can use the same array data structure safely, without danger of race conditions.

    In dp.cpp, the synchronization process that runs the SyncServer code prevents race conditions. No race condition can occur because

    • only the synchronization process has direct access to the array elements, and

    • that synchronization process is programmed to carry out any philosopher process's request completely before proceeding to any other request, which ensures that no two requests will interfere with each other's array access through unfortunate timing.

    This synchronization strategy for protecting the array chop[] implements a new low-level pattern.

    • The Mutual Exclusion pattern appears when no two processes or threads can access shared resources at the same time.

    Here, the array chop[] is the shared resource among the philosopher processes, and the synchronization process guarantees that only one of those philosopher processes can access that array at a time. Mutual Exclusion is a low-level pattern, focused much more on hardware resources than on the features of an application, but it is quite unlike the other low-level patterns we have seen (Message Passing and Collective Communication).

    There is also a high level pattern at work in dp.cpp.

    • In Task Decomposition, a programming problem is approached by dividing the work into multiple tasks that can be carried out simultaneously.

    In dp.cpp, there are two kinds of tasks: the synchronization service and the philosopher tasks. Only one synchronization task is used, whereas multiple philosopher tasks are needed.

    Our two high-level patterns, Data Decomposition and Task Decomposition, concern strategies for finding the opportunities for parallelism in a programming problem. The mid- and low-level patterns focus on implementing and executing those decompositions.

  7. Make a copy deadlock.cpp of your dining philosophers simulation dp.cpp, and modify the copy in order to cause a deadlock.

    One way to do this is to take out the random delays for eating and thinking, and instead add delays between picking up left and right chopsticks, so that all philosophers will pick up their left chopsticks and be unable to obtain their right chopsticks.

    Note: Since our philosophers don't actually block when they can't get a desired chopstick, but keep retrying (say, every 0.1 sec), our simulated "deadlock" occurs when a set of philosophers gets stuck repeatedly retrying GETCHOP requests.
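The circular wait that this modification produces can be traced deterministically. The sketch below (a model, not the MPI code) plays out the worst-case schedule: every philosopher takes its left chopstick before anyone tries for a right one, after which no GETCHOP for a right chopstick can ever succeed.

```cpp
#include <vector>

#define NONE -1

// Deterministic trace of the step-7 scenario with n philosophers.
// Returns true if the system is stuck: everyone holds a left
// chopstick and nobody can obtain a right one.
bool leftFirstDeadlocks(int n) {
    std::vector<int> chop(n, NONE);

    // Phase 1: every philosopher p succeeds in taking chopstick p.
    for (int p = 0; p < n; ++p)
        chop[p] = p;

    // Phase 2: every request for a right chopstick (p+1) % n now
    // fails, and no one ever calls PUTCHOP -- a circular wait.
    for (int p = 0; p < n; ++p)
        if (chop[(p + 1) % n] == NONE)
            return false;  // someone could still proceed
    return true;
}
```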

  8. The program deadlock.cpp above does not truly deadlock, because philosophers do not block when a chopstick is unavailable. Revise deadlock.cpp to create a true deadlock by adding a new type of request GETCHOP_BLOCKING to your synchronization server, with the following behavior (where the process with rank p is requesting the chopstick with index c):

    The array waiting[] should be an array of int with one element for each chopstick (same length as chop[]). Initialize all the elements of waiting[] with the constant value NONE = -1 to indicate that no process is waiting for any of the chopsticks at the beginning of the program. Note: Because each chopstick is shared by only two processes, the array waiting[] only needs one memory location per chopstick for this problem. For a more general problem, a whole queue of waiting processes might be needed per chopstick.

    The PUTCHOP algorithm will now need to send ACK messages to any waiting processes when a chopstick is returned. Here is the revised algorithm for a PUTCHOP request by process q with chopstick c (additions in boldface):
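The revised algorithms themselves are not reproduced in this copy of the handout; the following is a hedged sketch of the server-side state they manipulate. The struct and method names are placeholders, and the boolean/integer return values stand in for the server's decisions about sending (or withholding) ACK messages, which in the real code are MPI_Send() calls.

```cpp
#include <vector>

#define NONE -1

// Sketch of the server state for GETCHOP_BLOCKING and the revised
// PUTCHOP (step 8).
struct BlockingChopServer {
    std::vector<int> chop;     // NONE, or rank of the holder
    std::vector<int> waiting;  // NONE, or rank of one blocked waiter

    explicit BlockingChopServer(int n)
        : chop(n, NONE), waiting(n, NONE) {}

    // GETCHOP_BLOCKING from process p for chopstick c.  Returns true
    // if the server should ACK p immediately; if the chopstick is
    // taken, p is recorded in waiting[c] and NO ACK is sent, so p
    // stays blocked in its MPI_Recv() call.
    bool getchopBlocking(int p, int c) {
        if (chop[c] == NONE) {
            chop[c] = p;
            return true;
        }
        waiting[c] = p;
        return false;
    }

    // Revised PUTCHOP from process q for chopstick c.  Returns the
    // rank of a waiting process that should now receive its deferred
    // ACK (after being handed the chopstick), or NONE if no one waits.
    int putchop(int q, int c) {
        if (chop[c] != q) return NONE;  // q does not hold c
        chop[c] = NONE;
        int w = waiting[c];
        if (w != NONE) {
            waiting[c] = NONE;
            chop[c] = w;  // hand the chopstick straight to the waiter
        }
        return w;
    }
};
```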

  9. Try to program a simulation of dining philosophers that has starvation. See the second algorithm in the page on race conditions for an algorithm with starvation; use stosleep() as needed to time your computation in order to produce a starvation situation.

  10. Try to program a simulation of dining philosophers that has a race condition. See the third algorithm in the page on race conditions for an algorithm with a race condition. You can use stosleep to force "bad timing" and cause the race.

    1. The philosopher processes need shared access to an array myturn[] for this algorithm. Add this array to the synchronization server's memory, and add new message types READ_MYTURN and WRITE_MYTURN for assigning or reading from that shared array.

      • The content of a READ_MYTURN message can be an index for the desired element in the shared array.

      • The content of a WRITE_MYTURN message can consist of two integers, an index and a value to assign to that location of the shared array (0 for false, 1 for true).

      Implement myturn[] to satisfy the Shared Array pattern, including Mutual Exclusion, like the array chop[]. The race condition in this problem should occur in the philosopher algorithms, not in the mechanism for array access.
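A minimal sketch of the server-side handlers for the new message types might look like this (the struct and method names are assumptions; message parsing and the ACK replies are omitted). Because every access passes through the single synchronization process, reads and writes are serialized, which is exactly the Shared Array pattern with Mutual Exclusion.

```cpp
#include <vector>

// Server-side sketch of the shared array myturn[] for step 10.
struct MyTurnTable {
    std::vector<int> myturn;  // 0 for false, 1 for true

    explicit MyTurnTable(int nphil) : myturn(nphil, 0) {}

    // READ_MYTURN: the message content is an index; the server
    // replies with the value at that index.
    int read(int i) { return myturn[i]; }

    // WRITE_MYTURN: the message content is an index and a value
    // to assign at that index.
    void write(int i, int v) { myturn[i] = v; }
};
```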

    2. Note: In order to read integers from the content of a message, you can use C++'s stringstream class. If buff holds a string (such as the content of a message), then

         #include <sstream>
         ...
         stringstream ss(buff);
      
      defines a stringstream object ss which behaves like an input stream, extracting characters from the string buff. For example, you can read an integer argument from ss using the >> operator:
         int val;
         ss >> val;
      

    3. The algorithm also requires a way for processes to block themselves (sleep()) and other processes to unblock other sleeping processes (wakeup). You can implement this as a service in the synchronization server as follows:

      • Add two new message types, SLEEP and WAKEUP.

      • Add an array sleeping[] with one element for each philosopher process, similar to chop[]. Initialize all the elements of sleeping[] to 0 (representing False) during initialization of your SyncServer::run() method, before entering the loop for processing requests from philosophers.

      • For a SLEEP request from a philosopher with rank p:

        • Assign 1 to sleeping[p]. Do not send an ACK message to philosopher process p at this time! That philosopher process will block on the MPI_Recv() call, awaiting that ACK message, thus implementing the sleep operation.

      • A WAKEUP request should have one integer argument, the rank q of a philosopher process. For WAKEUP requests:

        • Obtain the integer q from the message content using stringstream, as described above. This is the usual way to convert strings to integers in C++.

        • If q is not the rank of a philosopher (i.e., not in the range 0 to nprocs-2), send an ACK message to the philosopher process p that requested WAKEUP, with a non-empty content to indicate an error. Then, break from the WAKEUP code to avoid carrying out the rest of these steps.

          Note: That philosopher p should check the content of the ACK message it receives so it can detect the error it made and print an error message. Then, that philosopher p might as well call MPI_Abort() to shut down the MPI job, so you can correct the bug. (In other circumstances, it may be possible to recover from an error, but not in this situation.)

        • Assign the value 0 (representing False) to sleeping[q].

        • Send an ACK message (with empty content, indicating no error) to the philosopher process p that requested WAKEUP.

        • Then, send an ACK message (with empty content) to the philosopher process q being woken up.

      Note: The order of those last two ACK messages is very important for correctness: if philosopher q becomes unblocked before philosopher p does, it is conceivable that some call wakeup(p) will cause a lost wakeup between those two ACK messages, one that could not have been lost if the ACK for p had correctly occurred before the ACK for q.
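The SLEEP/WAKEUP bookkeeping above can be modeled as follows. This sketch (names are placeholders) records ACKs in a log instead of calling MPI_Send(), which makes the prescribed ordering visible: the requester p is ACKed first, then the sleeping process q receives the deferred ACK that unblocks it.

```cpp
#include <string>
#include <utility>
#include <vector>

// Model of the server's SLEEP/WAKEUP service.  Philosopher ranks are
// 0 .. nphil-1 in this model.
struct SleepServer {
    std::vector<int> sleeping;                      // 1 = blocked
    std::vector<std::pair<int, std::string>> acks;  // (dest rank, content)
    int nphil;

    explicit SleepServer(int n) : sleeping(n, 0), nphil(n) {}

    // SLEEP from p: mark p asleep and send NO ACK, so p stays
    // blocked in its MPI_Recv() call.
    void sleepReq(int p) { sleeping[p] = 1; }

    // WAKEUP from p, naming process q.
    void wakeupReq(int p, int q) {
        if (q < 0 || q >= nphil) {
            acks.push_back({p, "error"});  // bad rank: error ACK to p
            return;
        }
        sleeping[q] = 0;
        acks.push_back({p, ""});  // ACK the requester p first...
        acks.push_back({q, ""});  // ...then the deferred ACK waking q
    }
};
```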

    Correctness with low-level patterns.

    In general, it is more difficult to program correctly with low-level patterns than with higher-level patterns. The ordering of ACK messages above illustrates why this is true: Two ACK messages need to take place at about the same time, and simply executing them in the wrong order leads to a race condition. This could also be described as an error in the Mutual Exclusion pattern: When those ACK messages execute in the wrong order, two different processes could have access to the shared resource sleeping[] at the same time, leaving open the possibility of a race-condition error.

    In contrast, the mid-level pattern Shared Array includes correct "thread-safe" concurrency, so programs using an implementation of Shared Array need not include precautions to avoid race conditions from multiple processes accessing a single array element at about the same time. (Students who have taken Hardware Design might recognize the same kind of comparison between programming a loop or an if construct in a high-level language vs. in assembly language: a programmer must pay careful attention to instruction order, branch/jump conditions, destinations of branches, etc., at the low level of assembly programming.)

    Our quick implementation of sleep() and wakeup() using Message Passing illustrates the power and flexibility of that pattern. But a programmer must give extra attention to program correctness with a pattern at such a low level as Message Passing.

Deliverables

Submit your work using git on the cluster head node you used. Include the cluster name in your commit message, e.g.,

    mist$  cd ~/PDC
    mist$  git pull origin master
    mist$  git add -A
    mist$  git commit -m "Lab 3 submit (mist)"
    mist$  git push origin master
Also, fill out this form to report on your work.

See lab1 Deliverables section if you need to set up the cluster you used for git.

If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.

This lab is due by Friday, January 9, 2015.