Parallel and Distributed Computing
CS 256 (PDC), Fall 2018
Looking ahead
Why study parallelism?
Parallelism has been used to speed up computation since the earliest computers, but until recently, most parallel computation took place "behind the scenes": at a lower level of the system, or in specialized high-performance computing
Multicore architectures mean that all programmers will need to know about parallelism (just as everyone knows basics about client-server architecture now)
Web-scale problems increasingly require petascale (10^15) computations. Social networking examples; the rise of cloud computing.
Levels of parallelism: some general categories.
Processor-level, i.e., within a processor, among microarchitecture components
Computer-level, i.e., within a computer enclosure, among processors
Distributed, i.e., within a computer network, among computers
Many forms of parallelism
Shared Memory, e.g., multithreaded computation using multiple cores
OS or language features such as thread packages
Libraries such as OpenMP
Languages such as Go
Distributed computing - multiple networked computers
OS features such as sockets
Libraries such as MPI
Languages such as Erlang
SIMD, such as CUDA...
Frameworks, such as Hadoop...
Heterogeneous approaches (combinations of the above)
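As a minimal sketch of the shared-memory style in Go (one of the languages listed above), the following program splits the range 1..n among several goroutines that write partial sums into a shared slice; the function name and decomposition are illustrative, not from any particular library.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum sums the integers 1..n by splitting the range among
// `workers` goroutines that share the `partial` slice (shared memory).
// Each worker writes only its own slot, so no locking is needed.
func parallelSum(n, workers int) int {
	partial := make([]int, workers)
	var wg sync.WaitGroup
	chunk := n / workers
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk+1, (w+1)*chunk
			if w == workers-1 {
				hi = n // last worker picks up the remainder
			}
			for i := lo; i <= hi; i++ {
				partial[w] += i
			}
		}(w)
	}
	wg.Wait() // join: wait for all workers to finish
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	fmt.Println(parallelSum(1000, 4)) // 500500
}
```

The same pattern appears, with different syntax, in thread packages and OpenMP: fork workers over disjoint pieces of shared data, then join and combine.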
Our plan
MPI cluster computing
Shared memory approaches
Hadoop and MapReduce
Volunteer topics from class on languages, SIMD, etc., according to interest
Team projects
Some PDC terms
Some general computing terms
- process
- The execution of a program
- processing element
- A unit of circuitry for carrying out computation.
- memory
- "Local" storage. Memory hierarchy.
- interconnection network
- Connections between processing elements and memories.
- host
- A computer in a distributed system.
- remote
- Separated by a computer network, e.g., www.gmail.com is a remote host from a St. Olaf host.
- local
- Nearby; on the host you're using (when contrasted with remote).
- resource
- In general: memory/storage, processors, network. In a particular context: anything needed for a computation (e.g., a piece of information, a variable, a network connection, ...).
- protocol
- Rules for correct communication in a computation. Examples: programmer-defined; HTTP, FTP, SSH, TCP/IP, Ethernet
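A programmer-defined protocol can be as small as one request and one reply. The sketch below (in Go, using OS socket features through the standard `net` package) invents a hypothetical PING/PONG protocol between a client and server on the same host; the message format is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// pingPong runs a one-shot server and client on localhost, speaking a
// hypothetical one-message protocol: the client sends the line "PING",
// and the server answers "PONG". It returns the reply the client read.
func pingPong() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: pick any free port
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// Server side: accept one connection and follow the protocol rules.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		req, _ := bufio.NewReader(conn).ReadString('\n')
		if req == "PING\n" {
			fmt.Fprint(conn, "PONG\n")
		}
	}()

	// Client side: connect to the (local) remote host and speak the protocol.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	fmt.Fprint(conn, "PING\n")
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := pingPong()
	if err != nil {
		panic(err)
	}
	fmt.Print(reply)
}
```

Standard protocols such as HTTP follow the same idea at larger scale: agreed-upon message formats and rules for who speaks when.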
Parallelism-related terms
- degree of parallelism
- Number of processing elements used in a parallel algorithm.
- sequential computation
- A sequential computation takes place on a single processing element.
- parallel computation
- Uses multiple processing elements to carry out a task.
- data parallelism
- The same processing is applied to multiple subsets of a large data set, in parallel.
- task parallelism
- Different tasks or stages of a computation are performed in parallel.
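Task parallelism can be sketched as a pipeline in which each stage is a different task and the stages run concurrently. The Go program below (an illustrative sketch, not a standard API) connects three stages with channels: generate numbers, square them, and sum the squares.

```go
package main

import "fmt"

// pipelineSum computes 1^2 + 2^2 + ... + n^2 using task parallelism:
// each stage performs a different task, and the stages run concurrently,
// connected by channels.
func pipelineSum(n int) int {
	nums := make(chan int)
	squares := make(chan int)

	// Stage 1: produce the numbers 1..n.
	go func() {
		for i := 1; i <= n; i++ {
			nums <- i
		}
		close(nums)
	}()

	// Stage 2: square each number as it arrives.
	go func() {
		for v := range nums {
			squares <- v * v
		}
		close(squares)
	}()

	// Stage 3: sum the squares in the calling goroutine.
	total := 0
	for s := range squares {
		total += s
	}
	return total
}

func main() {
	fmt.Println(pipelineSum(5)) // 55 = 1 + 4 + 9 + 16 + 25
}
```

Contrast with data parallelism, where every worker would run the *same* stage on a different slice of the data.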
Desirable properties for a parallel computation.
- scalability
- Invariance (remaining the same) of a property of interest when the order of magnitude changes. A major concern: is the (performance, correctness, efficiency) of a (protocol, algorithm, system) scalable?
- robust
- A robust system keeps working in the face of challenges.
- fault-tolerant
- Capacity for a system or algorithm to continue working correctly, even when there are failures.
Standalone computing means computing on a single computer, as perceived by the user. In the strictest sense, no network connections would be used in standalone computing, although we will consider a computation as "standalone" if the only networking used is a connection to a single computer located elsewhere, on which a task's computation is performed.
A lot of processor-level and computer-level parallel computing takes place under the surface when one uses standalone computing. However, the user or programmer need not explicitly direct that parallelism in order to use or program the system.
Interactive, batch
Example applications
client-server applications: One host, the server (which is typically a remote computer), acts as a source for computing resource(s); other hosts obtain those resources over a network. Example resources: files via the NFS protocol (a file server); computation, such as CPET providing Scheme computation to the wiki; web pages via the HTTP protocol (web server; a browser is a web client).
parallel BLAST: The BLAST algorithm efficiently compares a given genetic sequence against a collection of known genetic sequences, identifying common subsequences and assessing how many changes would be required to transform the given sequence into each of the known sequences. Parallel BLAST performs these comparisons in parallel.
map-reduce: A three-phase framework for efficient, robust, scalable parallel computation with large data sets, adapted by Google for web computations from an earlier LISP (cf. Scheme) problem-solving strategy.
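The three phases can be sketched with the classic word-count example. The Go code below is a toy, single-process illustration of the framework's structure (map each document to (word, 1) pairs, shuffle the pairs by key, reduce each key's values to a sum); real map-reduce systems distribute these phases across many hosts.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pair is an intermediate (key, value) record emitted by the map phase.
type pair struct {
	key   string
	value int
}

// mapPhase: emit (word, 1) for every word in one document.
func mapPhase(doc string) []pair {
	var out []pair
	for _, w := range strings.Fields(doc) {
		out = append(out, pair{w, 1})
	}
	return out
}

// wordCount runs all three phases over a collection of documents.
func wordCount(docs []string) map[string]int {
	// Shuffle phase: group intermediate values by key.
	groups := make(map[string][]int)
	for _, doc := range docs {
		for _, p := range mapPhase(doc) {
			groups[p.key] = append(groups[p.key], p.value)
		}
	}
	// Reduce phase: combine each key's values into a count.
	counts := make(map[string]int)
	for k, vs := range groups {
		for _, v := range vs {
			counts[k] += v
		}
	}
	return counts
}

func main() {
	counts := wordCount([]string{"to be or not", "to be"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a deterministic order
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```

Because the map calls are independent and the reduce calls are independent per key, both phases parallelize naturally, which is the source of the framework's scalability.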
High-performance computing (HPC) focuses on delivering large amounts of computing for relatively short amounts of time, e.g., minutes or days.
High-throughput computing (HTC) also focuses on delivering large amounts of computing, but for longer amounts of time, e.g., months or years [Condor project].
Beowulf clusters
A Beowulf cluster is a networked collection of off-the-shelf computers that are used together to solve (large-scale) computing problems.
Nodes, head node, worker node.
dedicated clusters, rack-mounted clusters
St. Olaf's Beowulf clusters
Several clusters:
Helios, MistRider, Cumulus, Castaway; others under development
Head node: cumulus.cs.stolaf.edu
Head nodes are virtual machines on a 64-core computer
Some history...