Introduction to OpenMP programming
CS 300, Parallel and Distributed Computing (PDC)
Due Thursday, October 18, 2018
run on >= 3 environments
g++ -fopenmp -o trap_omp trap_omp.cpp
experiment with trap_omp
fix sections
sections with function calls
Other constructs?
Preliminary material
Shared Memory Parallel computing
OpenMP concepts
parallel for and parallel sections constructs
Potential for race conditions...
Laboratory exercises
On a link computer, create a ~/PDC/lab5 subdirectory for work on the lab, and change directory to that directory. On that link computer, copy ~rab/pdc/trap_omp.cpp to your lab5 directory, then compile and link this OpenMP version of the trapezoid computation as follows:

link% g++ -fopenmp -o trap_omp trap_omp.cpp
The -fopenmp flag requests compiling and linking support for OpenMP. Note that one could also compile and link in separate steps, in which case -fopenmp should be used in both commands. Try running the resulting program trap_omp without command-line arguments, then with a single positive command-line argument to request different thread counts. Observe how the output varies. Make a commit of this code:

link% git add trap_omp.cpp
link% git commit -m "lab5: Trapezoidal approximation with OpenMP"
The Fork-Join pattern involves (1) allocating one or more threads ("forking," as if creating a fork in the road for two separate execution paths), (2) carrying out computations in those threads in parallel, then (3) waiting for all of those threads to finish their work ("joining" the separate execution paths back into a single path) before proceeding with sequential execution.
A Thread Pool is a collection of threads that have been created and not yet destroyed that a programmer can reuse for segments of parallel computations as needed.
The time feature of the shell provides running time information for a program. Use

link% time ./trap_omp n

to see how the running time varies depending on the number of threads used. Try several powers of 2. Also try numbers that are near but not exactly powers of 2 (both above and below), and look for interesting patterns, as well as some arbitrary values that are not near powers of two. Try multiple runs with the same number of threads for a few thread counts: are the results always the same?
Record your observations and results in a file README in your lab5 directory. Create a commit containing your README results.

link% git add README
link% git commit -m "lab5: Link performance testing recorded in README"
Compare performance of trap_omp on that link computer with the 64-core computer thing2.cs.stolaf.edu, as follows. Report on your observations in README.
To accomplish this step, pull/push your work on the Link computer to your stogit repository.

link% git pull origin master
link% git push origin master
Copy your Link public SSH key to thing2.cs.stolaf.edu. Note: Using passwordless SSH is more secure than sending a password over the network. The following command adds your Link public SSH key to your account's ~/.ssh/authorized_keys file on thing2.cs.stolaf.edu.

link% ssh-copy-id username@thing2.cs.stolaf.edu

You can expect to be asked whether you trust thing2.cs.stolaf.edu, and for your CS-managed password for connecting to thing2 this first time.

To test this step: enter

link% ssh thing2.cs.stolaf.edu
You should be able to log in successfully without a password.
Log into thing2.cs.stolaf.edu (no password should be required). Prepare your thing2 account for git.

thing2$ git config --global user.name "Your Name"
thing2$ git config --global user.email username@stolaf.edu
thing2$ git config --global core.editor emacs
Also create an SSH key on thing2.

thing2$ ssh-keygen -t rsa

As before, you can use default responses for all three prompts from the ssh-keygen command.

To test this step:
Copy your new public SSH key to another CS-managed computer that you have an account on, then SSH to that computer. For example,

thing2$ ssh-copy-id username@.cs.stolaf.edu
thing2$ ssh username@.cs.stolaf.edu

Use your own username for username. The ssh-copy-id command should prompt for your CS-managed password, but the ssh command should succeed in logging you into that computer without a password.
Then, manually copy your thing2 public SSH key to stogit, by printing that public key file in your terminal window

thing2$ cat ~/.ssh/id_rsa.pub

then browsing to stogit.cs.stolaf.edu and logging in (with your CS-managed password), navigating to add an SSH key, and copy/pasting the public key file's contents, as described in Getting started with Stogit.

Now clone your stogit repository on thing2.

thing2$ cd ~
thing2$ git clone git@stogit.cs.stolaf.edu:pdc-f16/username.git PDC

This should create a new subdirectory ~/PDC that contains all of your PDC repository.
that contains all of your PDC repository.Change to your
~/PDC/lab5
subdirectory and compiletrap_omp.cpp
using the same compilation command as on the Link machine.Proceed to use
time
to test the performance of the resulting executabletrap_omp
. Record your results by adding toREADME
, and add observations inREADME
about how performance and speedup differ on the two systems.Commit your changes to
README
:thing2$ git add README thing2$ git commit -m "lab5: thing2 performance testing in README"
Create a program trap2.cpp that is the same as trap_omp.cpp, except removing the reduction clause in the OpenMP construct and adding integral as a shared variable. Try running this with various numbers of threads, including 1 (the default). How does this change the output? Can you explain this behavior? Write your observations and conclusions in README.

thing2$ git add trap2.cpp README
thing2$ git commit -m "lab5: trade reduction for shared variable"
The program sections.cpp was presented in class. Copy this program to your directory on a Link computer, compile it, and observe the behavior of the program over multiple runs.

Notes:
To avoid a merge commit, first pull/push your work on thing2.

thing2$ git pull origin master
thing2$ git push origin master
Now, log into a link computer and copy sections.cpp to your lab5 directory on that link computer. Compile using -fopenmp as you did with other OpenMP programs. Report on your runs in your README file. Can you explain what went wrong for two or more of your sample runs? Include that analysis in README.

Note: Please create only one README file, and use it to describe what you find on both computing systems, rather than making multiple READMEs.

Create a commit to record your progress.

link% cp sections.cpp sections1.cpp
link% git add sections1.cpp README
link% git commit -m "lab5: buggy runs of sections.cpp"
Since you have been committing on multiple machines, let's do a pull/push to double-check that the repository is up to date with your most recent changes on the Link machine.

link% git pull origin master
link% git push origin master

Note: If you did the steps above slightly differently than the instructions, you may find that pulling causes a merge commit, and potentially a merge conflict. If you are not yet comfortable with merge commits and merge conflicts, see this video.
Use OpenMP clauses, constructs, and/or other strategies to correct the behavior of sections.cpp.

Notes:

First, decide what correct behavior means. Does it mean that the computation produces the same answer each time? That each C++ statement occurs completely without interruption? That each section is computed once and only once? Write your definition of correct in your README file.

The sections construct supports the following OpenMP clauses (and others): private, reduction, num_threads, shared, default (either none or shared in C/C++). Other constructs may help, too.
Keep notes on your efforts to correct sections.cpp in README. When you're ready, create a commit.

link% git add sections.cpp README
link% git commit -m "lab5: Fixed sections.cpp bugs"
Patterns in trap_omp.cpp

Examining the code of trap_omp.cpp, we see Data Decomposition at work as before, this time splitting the work of adding the areas of trapezoids among multiple threads within a single process instead of among multiple processes spread out on a cluster. However, the higher-level OpenMP #pragma omp directive mostly conceals the details of Data Decomposition, since OpenMP divides the interval of trapezoids among the threads automatically with #pragma omp for.
The #pragma's reduction(+:integral) clause specifies that a reduce operation should take place to add the partial sums of trapezoids. We have known this as a Collective Communication pattern. However, unlike MPI, OpenMP does not need to use network communication for this reduce operation. Instead, OpenMP can share values among threads using memory locations in order to accomplish the reduction. All of the details are hidden in that reduction() clause, except the essentials of what reduction operation to perform and what values to reduce, namely the value of each thread's variable integral.
The trap_omp.cpp code also represents Loop Parallel, which focuses on computationally intensive loops as opportunities for parallelism. In the case of trap_omp.cpp, as with the MPI program trap.cpp, the number of iterations constitutes the main factor in computational intensity. Other loops may include more computation within each iteration of a loop.
Threads in OpenMP

Behind the scenes, OpenMP's parallel for feature divides the work of the loop among multiple threads, which carry out separate paths of execution within a process. Threads are sometimes called lightweight processes: they execute their own sequences of instructions much as if they were independent processes, but there is less overhead computation to switch between threads within a process than to switch between different processes. This is because a process's threads share computational resources such as memory with that process. Those shared resources don't need to be switched, but only thread-specific resources such as those that control the execution path (e.g., the program counter as discussed in Hardware Design). As we have seen with OpenMP clauses such as shared() and private(), a programmer can control whether certain resources are shared among threads or dedicated to individual threads when solving a parallel programming problem.
Of course, there is computational overhead for a process to create threads and to destroy them when they are no longer needed. Here are two patterns relevant to managing threads.
OpenMP uses Fork-Join and
Thread Pool implicitly in its work, so an
OpenMP programmer
never forks threads or interacts with OpenMP's thread pool directly.
The omp parallel for
directive splits up the range of
values of a loop-control variable
(Data Decomposition) of a loop, then carries out
those parcels of work using
Fork-Join. Also, OpenMP typically creates a
thread pool once at the outset of a program,
then reuses that thread pool for all of the omp parallel
regions throughout
a program's code, in order to save the computational overhead of
repeatedly creating and destroying threads.
A programmer doesn't need OpenMP to program with threads. We will see several other ways to use threads implicitly through other libraries and languages. A programmer can also create and use threads directly using various thread packages, such as C++11 threads and POSIX threads (pthreads, also available for the C language).
Deliverables
First, perform a pull/push on thing2 to double-check that your changes on thing2 were sent to your repository, and to update the working directory on thing2.
To do this, first SSH into thing2, cd to your lab5 subdirectory, and perform the following.

thing2$ git pull origin master
thing2$ git push origin master

If you encounter a merge commit or merge conflict, this video may help.

Log back into your link computer and cd to your lab5 directory for the following steps.
All of your code for this assignment should already be contained in commits. Modify the most recent commit message to indicate that you are submitting the completed assignment.

link% git commit --amend

Add the following to the latest commit message.

lab5: complete

If your assignment is not complete, indicate your progress instead, e.g.,

lab5: items 1-5 complete, item 6 partial

You can make later commits to submit updates.
Finally, pull/push your commits in the usual way.

link% git pull origin master
link% git push origin master
Use one of the git strategies in lab1 Deliverables to submit your work in this lab. For example, rename your Link lab5 directory to lab5-link, and rename your thing3 lab5 directory to lab5-thing3 on thing3. Include the cluster name in your commit message, e.g.,

thing3$ cd ~/PDC
thing3$ git pull origin master
thing3$ git add -A
thing3$ git commit -m "lab5 submit (thing3)"
thing3$ git push origin master

Also, fill out this form to report on your work.
See the lab1 Deliverables section if you need to set up the cluster you used for git.
If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.
This lab is due by Friday, October 14, 2016.
Files: trap_omp.cpp README trap2.cpp sections1.cpp sections.cpp