Performance improvement with OpenMP
CS 300, Parallel and Distributed Computing (PDC)
Due Thursday, October 25, 2018
Sequential alg
Parallelize
Opt 1: delete all evens
Opt 2: precompute, then remove barriers
Opt 3: cache
Preliminary material
OpenMP features
Given sequential algorithm for sieve computation of primes
Makefile for this lab (to be used in a subdirectory per lab segment.
Laboratory exercises
On the link computers, create a
lab6
subdirectory for work on the lab, and change directory to that directory.Create a subdirectory
omprimes1
of yourlab6
directory. Copy~cs300/lab6/Makefile
and sequential version~cs300/lab6/omprimes1.cpp
for computing primes to youromprimes1
subdirectory, then change directory to that subdirectory. Renameomprimes1.cpp
tosieve.cpp
(or create a symbolic link).Issue the
make
command to compile the given program, then test it using% bin/sieve 100 4
Verify that it correctly prints the first 20 primes, and that it correctly determines the number of primes less than 100 (you can start counting at the 21st prime by using the output provided).
Note: The first command-line argument for
sieve.cpp
is an upper bound on the primes found by this program: the largest prime it will find is 97 (maximum prime less than or equal to 100). The second argument is the number of threads used in OpenMP parallel computation. Since this is a sequential program (noparallel
constructs), the second command-line argument is actually ignored for the computation in this version of the program, althoughsieve.cpp
requires a user to provide both arguments anyway.Test the code to see how many primes can be computed, e.g., one million or 100 million primes. Record your observations in a file
lab6/README
.Record this work in a
commit
.link% git add omprimes1.cpp sieve.cpp Makefile ../README link% git commit -m "lab6: time performance of sequential version"
Omitomprimes1.cpp
if it no longer exists due to renaming.Change back to your
lab6
subdirectory, and create a new subdirectoryomprimes2
for a second version of primes computation. Copy the sequential versionomprimes1/sieve.cpp
and the Makefileomprimes1/Makefile
to your newomprimes2
subdirectory. (This can be done conveniently using the command% cp omprimes1/{sieve.cpp,Makefile} omprimes2
). Then makeomprimes2
your current directory.Modify the program
sieve.cpp
inlab6/omprimes2
to operate in parallel, usingOpenMP
. Look for any opportunities for parallelism, decide whether parallelism is wise, and double-check your work to insure that the parallelism is correct (no race conditions or other indeterminacy.Compile and test your code, first deterimining 100 primes, then testing with large numbers. Compare results with the sequential version -- do they agree?
Use
time
to compare performance foromprimes1/bin/sieve
andomprimes2/bin/sieve
for various prime ranges and thread counts. Also compare these programs' performance on the 32-core computers and potentially other systems. Record your results, observations, and conclusions inlab6/README
.Record this work in a
commit
.link% git add sieve.cpp Makefile ../README link% git commit -m "lab6: time performance of parallel version"
Further performance improvements
Create a shell script
lab6/sievetest.sh
for uniformly and conveniently running your benchmark tests for various versions of thesieve.cpp
program. Here is an example:#!/bin/bash TIME="/usr/bin/time -p" OUT=../sievetest.`date +%m%d_%H:%M:%S` for DIR in $* do cd $DIR for MAX in 1000000 100000000 do for NTHREADS in 1 4 16 32 do echo ========== $DIR $MAX $NTHREADS ========== >> $OUT for COUNT in `seq 1 3` do $TIME bin/sieve $MAX $NTHREADS >/dev/null 2>>$OUT done echo "" >> $OUT done done cd .. done
This is a file of commands for the (Bourne or
bash
) shell. You can create a filesievetest.sh
in yourlab6
subdirectory containing that code, make it executable% chmod +x sievetest.sh
then run% ./sievetest.sh omprimes1 omprimes2
as a program to run yourbin/sieve
programs in youromprimes1
andomprimes2
directories, three times at each of several choices of maximum prime and thread count. The results will be stored in a file whose name includes timestamp information. Note: Keep in mind that everyone will be doing this -- be selective about how many tests you run! Also, the tests other people will be running on the same machine may affect your results (which is a reason to make multiple runs for each choice of command-line arguments, but not to make too many replicate runs).Try the benchmarking script
sievetest.sh
with the code in youromprimes1
andomprimes2
directories on a lab machine. Examine the output filesievetest.date
freom each such run, and enter any observations inREADME
.Record this work in a
commit
.
Note: Since you are using git to submit your work for grading, include a small number ofsievetest.date
files produced bysievetest.sh
. If you have many of these output files, please select one or a few of them, perhaps deleting or renaming the rest (e.g., copying them to a subdirectorylab6/old
), so graders will know what to look at.link% git add sievetest.sh README sievetest.*_* link% git commit -m "lab6: performance testing script sievetest.sh, plus some sample runs on Link"
Note: The shell notationsievetest.*_*
will match allsievetest.date
files in yourlab6
subdirectory, since all of thosedate
s have an underscore character.-
Copy your work on this lab to
thing2
via git, and trysievetest.sh
on that machine.To accomplish this, first
pull/push
to upload your work from your Link account to your stogit repository.link% git pull origin master link% git push origin master
Then,ssh
tothing2
and perform apull
to update yourthing2
working directory.link% ssh thing2.cs.stolaf.edu thing2$ cd PDC thing2$ git pull origin master thing2$ cd lab6
Finally, make your executables onthing2
and runsievetest.sh
to generate some more test data.thing2$ cd omprimes1 ; make clean ; make thing2$ cd ../omprimes2 ; make clean ; make thing2$ cd .. thing2$ sievetest.sh omprimes1 omprimes2
Notes:Using semicolons enable us to enter multiple commands per line. Thus,
thing2$ cd omprimes1 ; make clean ; make
is equivalent tothing2$ cd omprimes1 thing2$ make clean thing2$ make
You may enter the separate commands if you prefer.The
clean
target in the providedMakefile
deletes files such assieve.o
and the executablesieve
, if those files exist. If youcommit
ted only source files, there should be none of those files; issuingmake clean
anyway insure that you are creating new executablessieve
in case you pushed something that wasn't a source file (or test output).
Now, examine your new
sievetest.date
files and enter any observations inREADME
Record your work on
thing2
in acommit
as follows.thing2$ git add sievetest.*_* README thing2$ git commit -m "lab6: sievetest.sh runs on thing2"
Note: This re-commit
s previouslycommit
tedsievetest.date
files, even though they haven't changed. This does not cause any harm, and it's a convenient way to be include the newsievetest.date
file(s).Also, copy your code directories and
sievetest.sh
to athingn
machine, and run the benchmarks there to compare performance on those computers. Here is a way to perform the copy and tests:% cd ~/lab6 % tar cf primes.tar omprimes1 omprimes2 sievetest.sh % scp primes.tar helios.public.stolaf.edu: % ssh helios helios $ scp primes.tar thingn: helios $ ssh thingn thingn $ tar xvf primes.tar thingn $ cd omprimes1 ; make clean; make thingn $ cd ../omprimes2 ; make clean; make thingn $ cd .. thingn $ sievetest.sh omprimes1 omprimes2
The
tar
program creates archives of files and directories, stored in a single file (primes.tar
in this case). Theclean
target for theMakefile
deletes any object files and executables, so these can be remade on the new machine. (The executables created on lab machines will probably work onthingn
, but this tests with code generated by the compiler on thatthingn
. The program
~cs300/lab6/omprimes3.cpp
adds OpenMP constructs to parallelize the sieve computation.Note: The OpenMP
single
construct insures that only one thread updates the value of the shared variablep
.On a Link computer, create a subdirectory
lab6/omprimes3
and copyomprimes3.cpp
and theMakefile
into this newomprimes3
directory. To make the code within theomprimes3
directory, copy (or link)omprimes3.cpp
tosieve.cpp
within that folder and entermake
.Use
./sievetest.sh omprimes3
to perform benchmarks on this version. How does it compare with the sequential version? With your parallelization inomprimes2
? Enter your observations inREADME
.Note: You may notice that your Link account's copy of
README
doesn't include changes that you made onthing2
. Don't copy thething2
changes to your LinkREADME
at this time. We will incorporate those changes later using git.Record this update of your work on your Link account in a
commit
as follows.link% cd ~/PDC/lab6
link% git add omprimes3/{omprimes3.cpp,sieve.cpp,Makefile} sievetest.*_* README link% git commit -m "lab6: omprimes3 alternate parallelization"
In order to improve performance of
omprimes3
, observe that a lot of time is spent performing initialization and sieve operations on a large array. Half of the time, multiples of a primep
will be even numbers, which we already know are non-primes (exceptp=2
).Use this observation to modify your program to store only odd numbers in the array
marked[]
.First, on a Link computer, create a subdirectory
lab6/omprimes4
, and copyomprimes3/sieve.cpp
and theMakefile
toomprimes4
.In
omprimes4/sieve.cpp
, use the memory locationmarked[i]
to represent the number2*i+1
; for example,marked[0]
represents 1,marked[1]
represents 3, etc.Start
count
at 1 instead of zero, in order to count the only even prime 2.Observe that marking every
p
elements of this compressedmarked[]
array will correctly sieve the odd multiples ofp
from the prime computation.Be sure to adjust the starting and ending indices for loops. The index for a number
N
will be(N-1)/2
; for example, the index forp*p
will be(p*p-1)/2
.Test your program manually with
maxprimes
= 100 andnumthreads
= 4 to check that your program is computing primes directly, before proceeding to benchmark tests.
Use
./sievetest.sh omprimes4
to perform benchmarks on your debugged code. How does it compare to earlier versions? Enter observations inREADME
.Record this update of your work on your Link account in a
commit
as follows.link% cd ~/PDC/lab6
link% git add omprimes4/{sieve.cpp,Makefile} sievetest.*_* README link% git commit -m "lab6: omprimes4 improved parallelization"
-
______
Patterns in Performance improvement with OpenMP .
Loop Parallel; Mutual Exclusion
______single
; Collective Communication BarrierThe OpenMP
single
construct used inomprimes3.cpp
causes a barrier: all threads must wait for that new prime valuep
to be determined before they can mark multiples of that prime in their segments of the arraymarked[]
. To eliminate this barrier, we will precompute all the primes we will need (up to the square root ofprime_max
), then let each thread proceed to move through those primes at its own rate.To implement this idea, make a copy of our
omprimes4/sieve.cpp
in a new directoryomprimes5
, plus theMakefile
, and add a sequential version of the prime sieve to compute the primes up toceil(sqrt((double)prime_max))
.Suggestion: Compute these values in another array (perhaps named
pre_marked[]
), and store them in avector
variablekset
for convenient, quick access by all threads during the parallel computation.Verify that this program is running correctly, then use
sievetest.sh
to compare to other versions. Enter observations inREADME
.Record this update of your work on your Link account in a
commit
as follows.link% cd ~/PDC/lab6
link% git add omprimes5/{sieve.cpp,Makefile} sievetest.*_* README link% git commit -m "lab6: omprimes5 precompute primes to eliminate barrier"
Deliverables
Since you worked on both your Link account and thing2
, you must merge the work on those systems in
order to update your stogit repository. Carry this out using the
following steps.
On
thing2
, push your latestcommit
(s).thing2$ git pull origin master thing2$ git push origin master
Unless you have done otherpush
operations besides those in this lab, thispull/push
should work as usual. But if you encounter a problem with thepull
operation, see below.On a link computer, attempt a
pull
operationlink% git pull origin master
Since you made different changes toREADME
on your Link account and onthing2
, this will presumably cause a merge conflict (unless you already encountered a merge conflict in a priorpull
). See the video on merge conflicts for more information.To reconcile this merge conflict:
Edit the Link copy of
README
and edit the sections delimited by<<<<<<<< ... ======== ... >>>>>>>>
to reflect your README entries in the correct order.Create a new commit containing your modified
README
.link% git add README link% git commit -m "merged entries in README from Link and thing2"
Now try
pull
ing again. The following should now succeed (unless there have been more changes to your stogit repository since your lastpull
attempt).link% git pull origin master link% git push origin master
All of your code for this assignment should already be
contained in commit
s. Modify the most recent
commit
message to indicate that you are submitting
the completed assignment.
link% git commit --amendAdd the following to the latest
commit
message.
lab6: complete
If your assignment is not complete, indicate your progress instead, e.g.,lab6: items 1-5 complete, item 6 partialYou can make later commits to submit updates.
Finally, pull/push
your commit
s in
the usual way.
link% git pull origin master link% git push origin master
Files: README omprimes1/omprimes1.cpp omprimes1/sieve.cpp omprimes1/Makefile omprimes2/sieve.cpp omprimes2/Makefile sievetest.sh sievetest.*_* omprimes3/omprimes3.cpp omprimes3/sieve.cpp omprimes3/Makefile omprimes4/sieve.cpp omprimes4/Makefile omprimes5/sieve.cpp omprimes5/Makefile
Use one of the
git
stratgies in lab1 Deliverables to submit
your work in this lab. For example, if you worked exclusively on
thing2
, include the name thing2
in your
commit message.
thing2$ cd ~/PDC thing2$ git pull origin master thing2$ git add -A thing2$ git commit -m "Lab 6 submit (thing2)" thing2$ git push origin masterLikewise, if you worked on both link machines and a
thing
, you can rename your lab6
folders and submit each, with commit messages indicating the computer
on which that commit was created.Also, fill out this form to report on your work.
See lab1 Deliverables
section if you need to set up the cluster you used for
git
.
If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.
This lab is due by Saturday, October 22, 2016.