Parallel and Distributed Computing
CS 256 (PDC), Fall 2018
Looking ahead
Why study parallelism?
Parallelism has been used to speed up computation since the earliest computers, but until recently, most parallel computation took place "behind the scenes": at a lower level of the system, or in specialized high-performance computing
Multicore architectures mean that all programmers will need to know about parallelism (just as everyone knows basics about client-server architecture now)
Web-scale problems increasingly require petascale (10^15) computations. Social networking examples; the rise of cloud computing.
Levels of parallelism: some general categories.
Processor-level, i.e., within a processor, among microarchitecture components
Computer-level, i.e., within a computer enclosure, among processors
Distributed, i.e., within a computer network, among computers
Many forms of parallelism
Shared Memory, e.g., multithreaded computation using multiple cores
OS or language features such as thread packages
Libraries such as OpenMP
Languages such as Go
Distributed computing - multiple networked computers
OS features such as sockets
Libraries such as MPI
Languages such as Erlang
SIMD, such as CUDA...
Frameworks, such as Hadoop...
Heterogeneous approaches (combinations of the above)
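As a minimal sketch of the shared-memory style in Go (one of the languages listed above), the following program splits the range 1..n among several goroutines that write partial sums into a shared slice; the function name and decomposition are illustrative, not from any particular library.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum sums the integers 1..n by splitting the range among
// `workers` goroutines that share the `partial` slice (shared memory).
// Each worker writes only its own slot, so no locking is needed.
func parallelSum(n, workers int) int {
	partial := make([]int, workers)
	var wg sync.WaitGroup
	chunk := n / workers
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk+1, (w+1)*chunk
			if w == workers-1 {
				hi = n // last worker picks up the remainder
			}
			for i := lo; i <= hi; i++ {
				partial[w] += i
			}
		}(w)
	}
	wg.Wait() // join: wait for all workers to finish
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	fmt.Println(parallelSum(1000, 4)) // 500500
}
```

The same pattern appears, with different syntax, in thread packages and OpenMP: fork workers over disjoint pieces of shared data, then join and combine.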
Our plan
MPI cluster computing
Shared memory approaches
Hadoop and MapReduce
Volunteer topics from class on languages, SIMD, etc., according to interest
Team projects
Some PDC terms
Some general computing terms
- process
- The execution of a program
- processing element
- A unit of circuitry for carrying out computation.
- memory
- "Local" storage. Memory hierarchy.
- interconnection network
- Connections between processing elements and memories.
- host
- A computer in a distributed system.
- remote
- Separated by a computer network, e.g., www.gmail.com is a remote host from a St. Olaf host.
- local
- Nearby; on the host you're using (when contrasted with remote).
- resource
- In general: memory/storage, processors, network. In a particular context: anything needed for a computation (e.g., a piece of information, a variable, a network connection, ...).
- protocol
- Rules for correct communication in a computation. Examples: programmer-defined; HTTP, FTP, SSH, TCP/IP, Ethernet
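A programmer-defined protocol can be as small as one request and one reply. The sketch below (in Go, using OS socket features through the standard `net` package) invents a hypothetical PING/PONG protocol between a client and server on the same host; the message format is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// pingPong runs a one-shot server and client on localhost, speaking a
// hypothetical one-message protocol: the client sends the line "PING",
// and the server answers "PONG". It returns the reply the client read.
func pingPong() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: pick any free port
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// Server side: accept one connection and follow the protocol rules.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		req, _ := bufio.NewReader(conn).ReadString('\n')
		if req == "PING\n" {
			fmt.Fprint(conn, "PONG\n")
		}
	}()

	// Client side: connect to the (local) remote host and speak the protocol.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	fmt.Fprint(conn, "PING\n")
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := pingPong()
	if err != nil {
		panic(err)
	}
	fmt.Print(reply)
}
```

Standard protocols such as HTTP follow the same idea at larger scale: agreed-upon message formats and rules for who speaks when.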
Parallelism-related terms
- degree of parallelism
- Number of processing elements used in a parallel algorithm.
- sequential computation
- A sequential computation takes place on a single processing element.
- parallel computation
- Uses multiple processing elements to carry out a task.
- data parallelism
- The same processing is applied to multiple subsets of a large data set, in parallel.
- task parallelism
- Different tasks or stages of a computation are performed in parallel.
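Task parallelism can be sketched as a pipeline in which each stage is a different task and the stages run concurrently. The Go program below (an illustrative sketch, not a standard API) connects three stages with channels: generate numbers, square them, and sum the squares.

```go
package main

import "fmt"

// pipelineSum computes 1^2 + 2^2 + ... + n^2 using task parallelism:
// each stage performs a different task, and the stages run concurrently,
// connected by channels.
func pipelineSum(n int) int {
	nums := make(chan int)
	squares := make(chan int)

	// Stage 1: produce the numbers 1..n.
	go func() {
		for i := 1; i <= n; i++ {
			nums <- i
		}
		close(nums)
	}()

	// Stage 2: square each number as it arrives.
	go func() {
		for v := range nums {
			squares <- v * v
		}
		close(squares)
	}()

	// Stage 3: sum the squares in the calling goroutine.
	total := 0
	for s := range squares {
		total += s
	}
	return total
}

func main() {
	fmt.Println(pipelineSum(5)) // 55 = 1 + 4 + 9 + 16 + 25
}
```

Contrast with data parallelism, where every worker would run the *same* stage on a different slice of the data.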
Desirable properties for a parallel computation.
- scalability
- Invariance (remaining the same) of a property of interest when the order of magnitude changes. A major concern: is the (performance, correctness, efficiency) of a (protocol, algorithm, system) scalable?
- robust
- A robust system keeps working in the face of challenges.
- fault-tolerant
- Capacity for a system or algorithm to continue working correctly, even when there are failures.
Standalone computing means computing on a single computer, as perceived by the user. In the strictest sense, no network connections would be used in standalone computing, although we will consider a computation as "standalone" if the only networking used is a connection to a single computer located elsewhere, on which a task's computation is performed.
A lot of processor-level and computer-level parallel computing takes place under the surface when one uses standalone computing. However, the user or programmer need not explicitly direct that parallelism in order to use or program the system.
Interactive, batch
Example applications
client-server applications: One host, the server (which is typically a remote computer), acts as a source for computing resource(s); other hosts obtain those resources over a network. Example resources: files via the NFS protocol (a file server); computation, such as CPET providing Scheme computation to the wiki; web pages via the HTTP protocol (web server; a browser is a web client).
parallel BLAST: The BLAST algorithm efficiently compares a given genetic sequence against a collection of known genetic sequences, identifying common subsequences and assessing how many changes would be required to transform the given sequence into each of the known sequences. Parallel BLAST performs these comparisons in parallel.
map-reduce: A three-phase framework for efficient, robust, scalable parallel computation with large data sets, adapted by Google for web computations from an earlier LISP (cf. Scheme) problem-solving strategy.
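The three phases can be sketched with the classic word-count example. The Go code below is a toy, single-process illustration of the framework's structure (map each document to (word, 1) pairs, shuffle the pairs by key, reduce each key's values to a sum); real map-reduce systems distribute these phases across many hosts.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pair is an intermediate (key, value) record emitted by the map phase.
type pair struct {
	key   string
	value int
}

// mapPhase: emit (word, 1) for every word in one document.
func mapPhase(doc string) []pair {
	var out []pair
	for _, w := range strings.Fields(doc) {
		out = append(out, pair{w, 1})
	}
	return out
}

// wordCount runs all three phases over a collection of documents.
func wordCount(docs []string) map[string]int {
	// Shuffle phase: group intermediate values by key.
	groups := make(map[string][]int)
	for _, doc := range docs {
		for _, p := range mapPhase(doc) {
			groups[p.key] = append(groups[p.key], p.value)
		}
	}
	// Reduce phase: combine each key's values into a count.
	counts := make(map[string]int)
	for k, vs := range groups {
		for _, v := range vs {
			counts[k] += v
		}
	}
	return counts
}

func main() {
	counts := wordCount([]string{"to be or not", "to be"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a deterministic order
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```

Because the map calls are independent and the reduce calls are independent per key, both phases parallelize naturally, which is the source of the framework's scalability.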
High-performance computing (HPC) focuses on delivering large amounts of computing for relatively short amounts of time, e.g., minutes or days.
High-throughput computing (HTC) also focuses on delivering large amounts of computing, but for longer amounts of time, e.g., months or years [Condor project].
Beowulf clusters
A Beowulf cluster is a networked collection of off-the-shelf computers that are used together to solve (large-scale) computing problems.
Nodes, head node, worker node.
dedicated clusters, rack-mounted clusters
St. Olaf's Beowulf clusters
Several clusters:
Helios, MistRider, Cumulus, Castaway; others under development
Head node: cumulus.cs.stolaf.edu
Head nodes are virtual machines on a 64-core computer
Some history...