IPC strategies (CS 273 (OS), Fall 2019)

IPC strategies

CS 273 (OS), Fall 2019

General features of IPC problems

Inter-process communication (IPC) concerns the interaction and correct synchronization between processes.
A critical section of a process's program's code is a section that affects resources that are shared with other processes. For example, if two processes share some memory variables, code that affects the values of those memory locations would be a critical section.
A race condition occurs when the correct action of a program depends upon timing.
- See introduction to IPC issues document for examples of race conditions.
- Race conditions are particularly difficult to debug in programs, because an error will show up only intermittently.
- Rather than debugging, the best approach is to learn to sense and avoid race conditions during program development.
Goals for correct and satisfactory IPC
- Mutual exclusion: No two processes should be in their critical sections at the same time
- Bounded wait: No process should have to wait arbitrarily long to enter its critical section
- Independence of speed: Correct action should not depend on relative speeds of the processes
- Progress: No process A outside of A's critical section should be able to block another process B from entering B's critical section
  - (Not to be confused with making progress in development of programming loops.)

Busy-wait strategies for mutual exclusion

Disable interrupts
- Cause the hardware not to process interrupts throughout each process's critical section. This prevents other processes from receiving their time slice (triggered by clock interrupt).
- Advantages: Was simple to implement in single-core computers. (Multicore and multiprocessor systems use spinlocks, which use up a core's CPU but achieves Mutual Exclusion goal.)
- Disadvantages:
  - Too much power for user processes; potential for abuse.
  - Interferes with normal operation of system while interrupts are disabled. If interrupts somehow fail to become re-enabled, the system is trashed.
  - Must avoid any calls that rely on interrupts during critical sections, e.g., I/O.
- Usable within kernel (e.g., Linux) for brief critical operations.
Lock variables
- Use a variable whose value indicates whether it is safe to enter a critical section.
```
int lock = 0;

void get_lock() {
  while (lock)
    ;
  lock = 1;
  return
}

void release_lock() {
  lock = 0;
}
```
- Advantages: Simple, easily understood idea.
- Disadvantages: Race condition (violates Independence of Speed).
- If the get/release operations are implemented using correct IPC, then locks are handy for solving other IPC problems. Linux flock system call (OS-level lock on an open file descriptor); lock files for printer jobs and VNC processes; kernel level lock variables (that may, e.g., disable interrupts at beginning of get, enable just before return.
Strict alternation
- Example: (turn is a shared variable, initially 0)
```
process A                      process B                
			                                
while (1) {		       while (1) {              
  while (turn != 0)	         while (turn != 1)      
    ;			           ;                    
  critical_region();	         critical_region();     
  turn = 1;		         turn = 0;              
  noncritical_region();	         noncritical_region();  
}			       }                        
```
- Advantages: Can be implemented safely and easily in user code.
- Disadvantages:
  - The busy loops that await particular turn values waste CPU time.
  - Violation of "progress" goal for IPC (a process executing noncritical_region may prevent the other process from getting out of its busy loop and into its critical code).
  - With many problems, strict alternation between processes is undesirable (e.g., consider a producer/consumer situation)
- Use of busy loops (called busy waiting) may be justifiable if the delays are sure not to be very long. Use of some kind of blocking system call is preferable, because a blocking call deschedules process immediately rather than wasting CPU time throughout a time slice.

Peterson's solution

A clever software solution for achieving mutual exclusion.

#define N 2  /* number of processes */

int turn;
int interested[N];  /* initialized at 0 (FALSE) */

void enter_region(int process)  /* process is 0 or 1 */
{
  int other = 1 - process;  /* other process's number, 1 or 0 */
  
  interested[process] = 1;
  turn = process;
  while (turn == process && interested[other] = 1)
    ;
}

void leave_region(int process)
{
  interested[process] = 0;
}

Advantages:
- Like locks, it's easy to understand how to use enter_region() and leave_region() once they are written.
- Satisfies all four goals for IPC.
Disadvantages: Busy waiting.

TSL instruction
- The ISA level of particular CPUs may offer a "test and set lock" (TSL) instruction.
  - Assembly code: TSL R,X where R denotes a register and X denotes a memory location.
  - Action: The contents of memory location X is copied into register R, then a known value (e.g., 1) is stored in X. This is performed as an indivisible CPU instruction, so either all of the TSL instruction is performed or none of it is.
  - Example usage (in an Intel-like assembly language):
```
enter_region:
	TSL R0,LOCK       // copy value of LOCK to R0, and store 1 in LOCK
	CMP R0,#0         // compare prior LOCK value to 0
	JNE enter_region  // if LOCK started non-zero, try again
	RETURN		  // else LOCK was 0 and is now 1, so crit code safe

leave_region:
	MOVE LOCK,#0	  // store 0 in LOCK
	RETURN
```
- Advantages:
  - Hardware guarantee that there won't be a race when checking and setting lock variable.
  - Easy to program correctly (at assembly level).
- Disadvantages: Still a busy wait.
- Safe for use at user level, unlike disabling interrupts.
- Used in Linux kernel for mutual exclusion with multiple CPUs; called spin locks.
Besides the obvious CPU inefficiency, busy waiting can lead to priority inversions in which a process with high scheduling priority busy-loops, waiting for an event that can only be caused by a low-priority process that never gets scheduled (a form of starvation).
In Pentium: BTS (Bit Test and Set), BTR (Bit Test and Reset), BTC (Bit Test and Complement)
Related: CAS (Compare and Swap), which atomically compares contents of a memory location with a given value, and if they are equal, assign another value to that memory location.
```
   CAS value, dest, newval
```
CAS is used in lock-free data structures, e.g., a queue that is thread-safe (multiple processes can access it safely without race conditions) without the use of a lock mechanism.

Note: Locks aren't scalable. If 1000 processes need to compete for a resource using a lock, then only one of those processes can execute at a time, essentially "sequentializing" a potentially parallel computation. Lock-free data structures are designed to avoid this kind of bottleneck.

`sleep()` and `wakeup()`

Idea: Provide system calls for a process to block itself or unblock another process. Rationale: Blocking would be more efficient than busy looping.
- void sleep(void) blocks the process that calls it.
- void wakeup(pid_t pid) unblocks process pid if that process blocked itself using sleep(); otherwise, wakeup() has no effect.
Unfortunately, programs that use sleep()/wakeup() are prone to race conditions (typically located between a check of the condition for sleep()ing and the call to sleep()), as illustrated in race condition examples. If a process delays for any reason at that point of the program (e.g., time slice ends), the wakeup() for that call to sleep() may be sent too early, leading to a lost wakeup().
One may attempt to fix a lost wakeup() problem by setting a bit in the kernel when wakeup()s are sent to processes that are not currently sleep()ing. But a single "wakeup bit" is not enough to handle all programming situations; in fact, no finite number of wakeup bits will handle all situations.

Semaphores

(References: IPC Primitives (pdf or ps), text example)

Semaphores can be used to satisfy all four IPC goals, and because processes block when they have to wait, busy-waits can be avoided.
Semaphores must be ordinarily be implemented in the operating system, since the OS typically controls blocking of processes (needed for the queue of blocked processes). Thus, up() and down() would be generally implemented as system calls.
Semaphores are relatively easy to implement, and they can be used to prove correctness of IPC.
However, semaphores can be tricky to program with: consider the effects of downing semaphores in the wrong order.
Semaphores can be used by multicore systems with shared memory, assuming the queues of blocked processes can be shared by all the cores. A network operating system would be necessary to implement semaphores on a distributed system, such as a cluster.

Monitors

(References: IPC Primitives (pdf or ps), text example)

Monitors also satisfy all four IPC goals, and use blocking rather than busy waiting.
Monitors are implemented as part of a programming language (although they need OS support for the blocking).
Monitors are easier to program with correctly than semaphores, since monitors are not so sensitive to the ordering of wait() operations on condition variables.
But signal() calls must be treated carefully to preserve mutual exclusion violations. Typically, this is handled by making signal() the last call before leaving a critical section.
Monitors support an object-oriented paradigm, in which one wait()s on all the condition variables one will need at the beginning of a method, and signal()s on those condition variables at the end of that method.
Java implements a limited form of monitors, using the synchronized keyword. Implicitly, every Java object has its own condition variable; entering a synchronized section of code implicitly does a wait() on that variable; and leaving a synchronized section implicitly signal()s on that condition variable.

Programming with synchronized methods implicitly calls wait() on the object's implicit condition variable before the method begins, then implicitly signal()s after the method ends, making correct IPC convenient (without a risk of "crossed signal()s).
Few other widely used languages implement monitors. (They were developed for Concurrent Euclid.)
Monitors can be implemented for multicore systems with shared memory. Monitors could be implemented on a distributed system assuming a network OS.

Message passing

(References: IPC Primitives (pdf or ps), text example)

Unlike semaphores or monitors, there are many design choices for a message-passing system: blocking vs. non-blocking primitives; optional use of mailbox mechanisms to receive messages that are separate from any one process; size of message buffers; etc.
Message passing can be used to satisfy all four IPC goals using blocking rather than busy waiting. This is often done using non-blocking send()s and blocking receive()s, which depends on sufficient buffer capacity for the application's stored send()s.
Message-passing primitives send() and receive() are implemented in an OS, but the source and destination of a message transmission can be on different machines. Thus, message passing suits distributed computing well (e.g., clusters).
send() and receive() are error prone in the same way that semaphore up() and down() operations are: it's not difficult to get requests out of order, in ways that may lead to deadlock.
Libraries such as MPI (Message Passing Interface) provide consistent application-level functions, for portability, standard usage, and avoidance of error pitfalls.
Berkeley sockets are typically system calls in UNIX/Linux, although they may be a library for other systems. A higher-level library such as MPI might be built using sockets.
Message passing can be used on multicore processors, and on distributed systems without the need for a network operating system.

Barriers

For parallel computations with any number of processes: when each process reaches the barrier, it is blocked until all processes have reached the barrier.
Primitive: barrier()
block until all processes have called barrier()
Useful for parallel computations that proceed in stages that are not otherwise forced, e.g., by data transfer messages.
Implemented for distributed systems in MPI (MPI_Barrier(comm)) and for multicore systems in OpenMP (barrier pragma).

"Equivalence" of primitives

Linux kernel implementation

See ipc/ subdirectory (sem.c, shm.c, msg.c) and related include/kernel/ files.

General features of IPC problems

Busy-wait strategies for mutual exclusion

sleep() and wakeup()

Semaphores

Monitors

Message passing

Barriers

"Equivalence" of primitives

Linux kernel implementation

`sleep()` and `wakeup()`