IPC strategies
CS 273 (OS), Fall 2019
General features of IPC problems
Inter-process communication (IPC) concerns the interaction and correct synchronization between processes.
A critical section of a process's program's code is a section that affects resources that are shared with other processes. For example, if two processes share some memory variables, code that affects the values of those memory locations would be a critical section.
A race condition occurs when the correct action of a program depends upon timing.
- See introduction to IPC issues document for examples of race conditions.
- Race conditions are particularly difficult to debug in programs, because an error will show up only intermittently.
- Rather than debugging, the best approach is to learn to sense and avoid race conditions during program development.
Goals for correct and satisfactory IPC
Mutual exclusion: No two processes should be in their critical sections at the same time
Bounded wait: No process should have to wait arbitrarily long to enter its critical section
Independence of speed: Correct action should not depend on relative speeds of the processes
Progress: No process A outside of A's critical section should be able to block another process B from entering B's critical section
- (Not to be confused with making progress in development of programming loops.)
Busy-wait strategies for mutual exclusion
Disable interrupts
- Cause the hardware not to process interrupts throughout each process's critical section. This prevents other processes from receiving their time slice (triggered by clock interrupt).
- Advantages: Was simple to implement in single-core computers. (Multicore and multiprocessor systems use spinlocks, which use up a core's CPU but achieves Mutual Exclusion goal.)
- Disadvantages:
- Too much power for user processes; potential for abuse.
- Interferes with normal operation of system while interrupts are disabled. If interrupts somehow fail to become re-enabled, the system is trashed.
- Must avoid any calls that rely on interrupts during critical sections, e.g., I/O.
- Usable within kernel (e.g., Linux) for brief critical operations.
Lock variables
- Use a variable whose value indicates whether it is safe to enter a
critical section.
int lock = 0; void get_lock() { while (lock) ; lock = 1; return } void release_lock() { lock = 0; }
- Advantages: Simple, easily understood idea.
- Disadvantages: Race condition (violates Independence of Speed).
- If the
get
/release
operations are implemented using correct IPC, then locks are handy for solving other IPC problems. Linuxflock
system call (OS-level lock on an open file descriptor); lock files for printer jobs and VNC processes; kernel level lock variables (that may, e.g., disable interrupts at beginning ofget
, enable just beforereturn
.
- Use a variable whose value indicates whether it is safe to enter a
critical section.
Strict alternation
- Example: (
turn
is a shared variable, initially 0)process A process B while (1) { while (1) { while (turn != 0) while (turn != 1) ; ; critical_region(); critical_region(); turn = 1; turn = 0; noncritical_region(); noncritical_region(); } }
- Advantages: Can be implemented safely and easily in user code.
- Disadvantages:
- The busy loops that await particular
turn
values waste CPU time. - Violation of "progress" goal
for IPC (a process executing
noncritical_region
may prevent the other process from getting out of its busy loop and into its critical code). - With many problems, strict alternation between processes is undesirable (e.g., consider a producer/consumer situation)
- The busy loops that await particular
- Use of busy loops (called busy waiting) may be justifiable if the delays are sure not to be very long. Use of some kind of blocking system call is preferable, because a blocking call deschedules process immediately rather than wasting CPU time throughout a time slice.
- Example: (
Peterson's solution
- A clever software solution for achieving mutual exclusion.
#define N 2 /* number of processes */ int turn; int interested[N]; /* initialized at 0 (FALSE) */ void enter_region(int process) /* process is 0 or 1 */ { int other = 1 - process; /* other process's number, 1 or 0 */ interested[process] = 1; turn = process; while (turn == process && interested[other] = 1) ; } void leave_region(int process) { interested[process] = 0; }
- Advantages:
- Like locks, it's easy to understand how to
use
enter_region()
andleave_region()
once they are written. - Satisfies all four goals for IPC.
- Like locks, it's easy to understand how to
use
- Disadvantages: Busy waiting.
- A clever software solution for achieving mutual exclusion.
TSL
instruction- The ISA level of particular CPUs may offer a "test and set lock"
(
TSL
) instruction.- Assembly code:
TSL R,X
whereR
denotes a register andX
denotes a memory location. - Action: The contents of memory location
X
is copied into registerR
, then a known value (e.g., 1) is stored inX
. This is performed as an indivisible CPU instruction, so either all of theTSL
instruction is performed or none of it is. - Example usage (in an Intel-like assembly language):
enter_region: TSL R0,LOCK // copy value of LOCK to R0, and store 1 in LOCK CMP R0,#0 // compare prior LOCK value to 0 JNE enter_region // if LOCK started non-zero, try again RETURN // else LOCK was 0 and is now 1, so crit code safe leave_region: MOVE LOCK,#0 // store 0 in LOCK RETURN
- Assembly code:
- Advantages:
- Hardware guarantee that there won't be a race when checking and setting lock variable.
- Easy to program correctly (at assembly level).
- Disadvantages: Still a busy wait.
- Safe for use at user level, unlike disabling interrupts.
- Used in Linux kernel for mutual exclusion with multiple CPUs; called spin locks.
- The ISA level of particular CPUs may offer a "test and set lock"
(
Besides the obvious CPU inefficiency, busy waiting can lead to priority inversions in which a process with high scheduling priority busy-loops, waiting for an event that can only be caused by a low-priority process that never gets scheduled (a form of starvation).
In Pentium:
BTS
(Bit Test and Set),BTR
(Bit Test and Reset),BTC
(Bit Test and Complement)Related:
CAS
(Compare and Swap), which atomically compares contents of a memory location with a given value, and if they are equal, assign another value to that memory location.CAS value, dest, newval
CAS
is used in lock-free data structures, e.g., a queue that is thread-safe (multiple processes can access it safely without race conditions) without the use of a lock mechanism.Note: Locks aren't scalable. If 1000 processes need to compete for a resource using a lock, then only one of those processes can execute at a time, essentially "sequentializing" a potentially parallel computation. Lock-free data structures are designed to avoid this kind of bottleneck.
sleep()
and wakeup()
Idea: Provide system calls for a process to block itself or unblock another process. Rationale: Blocking would be more efficient than busy looping.
void sleep(void)
blocks the process that calls it.void wakeup(pid_t pid)
unblocks processpid
if that process blocked itself usingsleep()
; otherwise,wakeup()
has no effect.
Unfortunately, programs that use
sleep()
/wakeup()
are prone to race conditions (typically located between a check of the condition forsleep()
ing and the call tosleep()
), as illustrated in race condition examples. If a process delays for any reason at that point of the program (e.g., time slice ends), thewakeup()
for that call tosleep()
may be sent too early, leading to a lostwakeup()
.One may attempt to fix a lost
wakeup()
problem by setting a bit in the kernel whenwakeup()
s are sent to processes that are not currentlysleep()ing
. But a single "wakeup bit" is not enough to handle all programming situations; in fact, no finite number of wakeup bits will handle all situations.
Semaphores
(References: IPC Primitives (pdf or ps), text example)
Semaphores can be used to satisfy all four IPC goals, and because processes block when they have to wait, busy-waits can be avoided.
Semaphores must be ordinarily be implemented in the operating system, since the OS typically controls blocking of processes (needed for the queue of blocked processes). Thus,
up()
anddown()
would be generally implemented as system calls.Semaphores are relatively easy to implement, and they can be used to prove correctness of IPC.
However, semaphores can be tricky to program with: consider the effects of
down
ing semaphores in the wrong order.Semaphores can be used by multicore systems with shared memory, assuming the queues of blocked processes can be shared by all the cores. A network operating system would be necessary to implement semaphores on a distributed system, such as a cluster.
Monitors
(References: IPC Primitives (pdf or ps), text example)
Monitors also satisfy all four IPC goals, and use blocking rather than busy waiting.
Monitors are implemented as part of a programming language (although they need OS support for the blocking).
Monitors are easier to program with correctly than semaphores, since monitors are not so sensitive to the ordering of
wait()
operations on condition variables.But
signal()
calls must be treated carefully to preserve mutual exclusion violations. Typically, this is handled by makingsignal()
the last call before leaving a critical section.Monitors support an object-oriented paradigm, in which one
wait()
s on all the condition variables one will need at the beginning of a method, andsignal()
s on those condition variables at the end of that method.Java implements a limited form of monitors, using the
synchronized
keyword. Implicitly, every Java object has its own condition variable; entering asynchronized
section of code implicitly does await()
on that variable; and leaving asynchronized
section implicitlysignal()
s on that condition variable.Programming with
synchronized
methods implicitly callswait()
on the object's implicit condition variable before the method begins, then implicitlysignal()
s after the method ends, making correct IPC convenient (without a risk of "crossedsignal()
s).Few other widely used languages implement monitors. (They were developed for Concurrent Euclid.)
Monitors can be implemented for multicore systems with shared memory. Monitors could be implemented on a distributed system assuming a network OS.
Message passing
(References: IPC Primitives (pdf or ps), text example)
Unlike semaphores or monitors, there are many design choices for a message-passing system: blocking vs. non-blocking primitives; optional use of mailbox mechanisms to receive messages that are separate from any one process; size of message buffers; etc.
Message passing can be used to satisfy all four IPC goals using blocking rather than busy waiting. This is often done using non-blocking
send()
s and blockingreceive()
s, which depends on sufficient buffer capacity for the application's storedsend()
s.Message-passing primitives
send()
andreceive()
are implemented in an OS, but the source and destination of a message transmission can be on different machines. Thus, message passing suits distributed computing well (e.g., clusters).send()
andreceive()
are error prone in the same way that semaphoreup()
anddown()
operations are: it's not difficult to get requests out of order, in ways that may lead to deadlock.Libraries such as MPI (Message Passing Interface) provide consistent application-level functions, for portability, standard usage, and avoidance of error pitfalls.
Berkeley sockets are typically system calls in UNIX/Linux, although they may be a library for other systems. A higher-level library such as MPI might be built using sockets.
Message passing can be used on multicore processors, and on distributed systems without the need for a network operating system.
Barriers
For parallel computations with any number of processes: when each process reaches the barrier, it is blocked until all processes have reached the barrier.
Primitive:
barrier()
block until all processes have calledbarrier()
Useful for parallel computations that proceed in stages that are not otherwise forced, e.g., by data transfer messages.
Implemented for distributed systems in MPI (
MPI_Barrier(comm)
) and for multicore systems in OpenMP (barrier
pragma).
"Equivalence" of primitives
Linux kernel implementation
See ipc/
subdirectory (sem.c
,
shm.c
, msg.c
) and related
include/kernel/ files.