Performance improvement with OpenMP
CS 300, Parallel and Distributed Computing (PDC)
Due Thursday, October 25, 2018
Sequential alg
Parallelize
Opt 1: delete all evens
Opt 2: precompute, then remove barriers
Opt 3: cache
Preliminary material
OpenMP features
Given sequential algorithm for sieve computation of primes
Makefile for this lab (to be used in a subdirectory per lab segment.
Laboratory exercises
On the link computers, create a
lab6subdirectory for work on the lab, and change directory to that directory.Create a subdirectory
omprimes1of yourlab6directory. Copy~cs300/lab6/Makefileand sequential version~cs300/lab6/omprimes1.cppfor computing primes to youromprimes1subdirectory, then change directory to that subdirectory. Renameomprimes1.cpptosieve.cpp(or create a symbolic link).Issue the
makecommand to compile the given program, then test it using% bin/sieve 100 4
Verify that it correctly prints the first 20 primes, and that it correctly determines the number of primes less than 100 (you can start counting at the 21st prime by using the output provided).
Note: The first command-line argument for
sieve.cppis an upper bound on the primes found by this program: the largest prime it will find is 97 (maximum prime less than or equal to 100). The second argument is the number of threads used in OpenMP parallel computation. Since this is a sequential program (noparallelconstructs), the second command-line argument is actually ignored for the computation in this version of the program, althoughsieve.cpprequires a user to provide both arguments anyway.Test the code to see how many primes can be computed, e.g., one million or 100 million primes. Record your observations in a file
lab6/README.Record this work in a
commit.link% git add omprimes1.cpp sieve.cpp Makefile ../README link% git commit -m "lab6: time performance of sequential version"
Omitomprimes1.cppif it no longer exists due to renaming.Change back to your
lab6subdirectory, and create a new subdirectoryomprimes2for a second version of primes computation. Copy the sequential versionomprimes1/sieve.cppand the Makefileomprimes1/Makefileto your newomprimes2subdirectory. (This can be done conveniently using the command% cp omprimes1/{sieve.cpp,Makefile} omprimes2). Then makeomprimes2your current directory.Modify the program
sieve.cppinlab6/omprimes2to operate in parallel, usingOpenMP. Look for any opportunities for parallelism, decide whether parallelism is wise, and double-check your work to insure that the parallelism is correct (no race conditions or other indeterminacy.Compile and test your code, first deterimining 100 primes, then testing with large numbers. Compare results with the sequential version -- do they agree?
Use
timeto compare performance foromprimes1/bin/sieveandomprimes2/bin/sievefor various prime ranges and thread counts. Also compare these programs' performance on the 32-core computers and potentially other systems. Record your results, observations, and conclusions inlab6/README.Record this work in a
commit.link% git add sieve.cpp Makefile ../README link% git commit -m "lab6: time performance of parallel version"
Further performance improvements
Create a shell script
lab6/sievetest.shfor uniformly and conveniently running your benchmark tests for various versions of thesieve.cppprogram. Here is an example:#!/bin/bash TIME="/usr/bin/time -p" OUT=../sievetest.`date +%m%d_%H:%M:%S` for DIR in $* do cd $DIR for MAX in 1000000 100000000 do for NTHREADS in 1 4 16 32 do echo ========== $DIR $MAX $NTHREADS ========== >> $OUT for COUNT in `seq 1 3` do $TIME bin/sieve $MAX $NTHREADS >/dev/null 2>>$OUT done echo "" >> $OUT done done cd .. doneThis is a file of commands for the (Bourne or
bash) shell. You can create a filesievetest.shin yourlab6subdirectory containing that code, make it executable% chmod +x sievetest.sh
then run% ./sievetest.sh omprimes1 omprimes2
as a program to run yourbin/sieveprograms in youromprimes1andomprimes2directories, three times at each of several choices of maximum prime and thread count. The results will be stored in a file whose name includes timestamp information. Note: Keep in mind that everyone will be doing this -- be selective about how many tests you run! Also, the tests other people will be running on the same machine may affect your results (which is a reason to make multiple runs for each choice of command-line arguments, but not to make too many replicate runs).Try the benchmarking script
sievetest.shwith the code in youromprimes1andomprimes2directories on a lab machine. Examine the output filesievetest.datefreom each such run, and enter any observations inREADME.Record this work in a
commit.
Note: Since you are using git to submit your work for grading, include a small number ofsievetest.datefiles produced bysievetest.sh. If you have many of these output files, please select one or a few of them, perhaps deleting or renaming the rest (e.g., copying them to a subdirectorylab6/old), so graders will know what to look at.link% git add sievetest.sh README sievetest.*_* link% git commit -m "lab6: performance testing script sievetest.sh, plus some sample runs on Link"
Note: The shell notationsievetest.*_*will match allsievetest.datefiles in yourlab6subdirectory, since all of thosedates have an underscore character.-
Copy your work on this lab to
thing2via git, and trysievetest.shon that machine.To accomplish this, first
pull/pushto upload your work from your Link account to your stogit repository.link% git pull origin master link% git push origin master
Then,sshtothing2and perform apullto update yourthing2working directory.link% ssh thing2.cs.stolaf.edu thing2$ cd PDC thing2$ git pull origin master thing2$ cd lab6
Finally, make your executables onthing2and runsievetest.shto generate some more test data.thing2$ cd omprimes1 ; make clean ; make thing2$ cd ../omprimes2 ; make clean ; make thing2$ cd .. thing2$ sievetest.sh omprimes1 omprimes2
Notes:Using semicolons enable us to enter multiple commands per line. Thus,
thing2$ cd omprimes1 ; make clean ; make
is equivalent tothing2$ cd omprimes1 thing2$ make clean thing2$ make
You may enter the separate commands if you prefer.The
cleantarget in the providedMakefiledeletes files such assieve.oand the executablesieve, if those files exist. If youcommitted only source files, there should be none of those files; issuingmake cleananyway insure that you are creating new executablessievein case you pushed something that wasn't a source file (or test output).
Now, examine your new
sievetest.datefiles and enter any observations inREADMERecord your work on
thing2in acommitas follows.thing2$ git add sievetest.*_* README thing2$ git commit -m "lab6: sievetest.sh runs on thing2"
Note: This re-commits previouslycommittedsievetest.datefiles, even though they haven't changed. This does not cause any harm, and it's a convenient way to be include the newsievetest.datefile(s).Also, copy your code directories and
sievetest.shto athingnmachine, and run the benchmarks there to compare performance on those computers. Here is a way to perform the copy and tests:% cd ~/lab6 % tar cf primes.tar omprimes1 omprimes2 sievetest.sh % scp primes.tar helios.public.stolaf.edu: % ssh helios helios $ scp primes.tar thingn: helios $ ssh thingn thingn $ tar xvf primes.tar thingn $ cd omprimes1 ; make clean; make thingn $ cd ../omprimes2 ; make clean; make thingn $ cd .. thingn $ sievetest.sh omprimes1 omprimes2
The
tarprogram creates archives of files and directories, stored in a single file (primes.tarin this case). Thecleantarget for theMakefiledeletes any object files and executables, so these can be remade on the new machine. (The executables created on lab machines will probably work onthingn, but this tests with code generated by the compiler on thatthingn. The program
~cs300/lab6/omprimes3.cppadds OpenMP constructs to parallelize the sieve computation.Note: The OpenMP
singleconstruct insures that only one thread updates the value of the shared variablep.On a Link computer, create a subdirectory
lab6/omprimes3and copyomprimes3.cppand theMakefileinto this newomprimes3directory. To make the code within theomprimes3directory, copy (or link)omprimes3.cpptosieve.cppwithin that folder and entermake.Use
./sievetest.sh omprimes3to perform benchmarks on this version. How does it compare with the sequential version? With your parallelization inomprimes2? Enter your observations inREADME.Note: You may notice that your Link account's copy of
READMEdoesn't include changes that you made onthing2. Don't copy thething2changes to your LinkREADMEat this time. We will incorporate those changes later using git.Record this update of your work on your Link account in a
commitas follows.link% cd ~/PDC/lab6
link% git add omprimes3/{omprimes3.cpp,sieve.cpp,Makefile} sievetest.*_* README link% git commit -m "lab6: omprimes3 alternate parallelization"In order to improve performance of
omprimes3, observe that a lot of time is spent performing initialization and sieve operations on a large array. Half of the time, multiples of a primepwill be even numbers, which we already know are non-primes (exceptp=2).Use this observation to modify your program to store only odd numbers in the array
marked[].First, on a Link computer, create a subdirectory
lab6/omprimes4, and copyomprimes3/sieve.cppand theMakefiletoomprimes4.In
omprimes4/sieve.cpp, use the memory locationmarked[i]to represent the number2*i+1; for example,marked[0]represents 1,marked[1]represents 3, etc.Start
countat 1 instead of zero, in order to count the only even prime 2.Observe that marking every
pelements of this compressedmarked[]array will correctly sieve the odd multiples ofpfrom the prime computation.Be sure to adjust the starting and ending indices for loops. The index for a number
Nwill be(N-1)/2; for example, the index forp*pwill be(p*p-1)/2.Test your program manually with
maxprimes= 100 andnumthreads= 4 to check that your program is computing primes directly, before proceeding to benchmark tests.
Use
./sievetest.sh omprimes4to perform benchmarks on your debugged code. How does it compare to earlier versions? Enter observations inREADME.Record this update of your work on your Link account in a
commitas follows.link% cd ~/PDC/lab6
link% git add omprimes4/{sieve.cpp,Makefile} sievetest.*_* README link% git commit -m "lab6: omprimes4 improved parallelization"-
______
Patterns in Performance improvement with OpenMP .
Loop Parallel; Mutual Exclusion
______single; Collective Communication BarrierThe OpenMP
singleconstruct used inomprimes3.cppcauses a barrier: all threads must wait for that new prime valuepto be determined before they can mark multiples of that prime in their segments of the arraymarked[]. To eliminate this barrier, we will precompute all the primes we will need (up to the square root ofprime_max), then let each thread proceed to move through those primes at its own rate.To implement this idea, make a copy of our
omprimes4/sieve.cppin a new directoryomprimes5, plus theMakefile, and add a sequential version of the prime sieve to compute the primes up toceil(sqrt((double)prime_max)).Suggestion: Compute these values in another array (perhaps named
pre_marked[]), and store them in avectorvariableksetfor convenient, quick access by all threads during the parallel computation.Verify that this program is running correctly, then use
sievetest.shto compare to other versions. Enter observations inREADME.Record this update of your work on your Link account in a
commitas follows.link% cd ~/PDC/lab6
link% git add omprimes5/{sieve.cpp,Makefile} sievetest.*_* README link% git commit -m "lab6: omprimes5 precompute primes to eliminate barrier"
Deliverables
Since you worked on both your Link account and thing2, you must merge the work on those systems in
order to update your stogit repository. Carry this out using the
following steps.
On
thing2, push your latestcommit(s).thing2$ git pull origin master thing2$ git push origin master
Unless you have done otherpushoperations besides those in this lab, thispull/pushshould work as usual. But if you encounter a problem with thepulloperation, see below.On a link computer, attempt a
pulloperationlink% git pull origin master
Since you made different changes toREADMEon your Link account and onthing2, this will presumably cause a merge conflict (unless you already encountered a merge conflict in a priorpull). See the video on merge conflicts for more information.To reconcile this merge conflict:
Edit the Link copy of
READMEand edit the sections delimited by<<<<<<<< ... ======== ... >>>>>>>>
to reflect your README entries in the correct order.Create a new commit containing your modified
README.link% git add README link% git commit -m "merged entries in README from Link and thing2"
Now try
pulling again. The following should now succeed (unless there have been more changes to your stogit repository since your lastpullattempt).link% git pull origin master link% git push origin master
All of your code for this assignment should already be
contained in commits. Modify the most recent
commit message to indicate that you are submitting
the completed assignment.
link% git commit --amendAdd the following to the latest
commit message.
lab6: complete
If your assignment is not complete, indicate your progress instead, e.g.,lab6: items 1-5 complete, item 6 partialYou can make later commits to submit updates.
Finally, pull/push your commits in
the usual way.
link% git pull origin master link% git push origin master
Files: README omprimes1/omprimes1.cpp omprimes1/sieve.cpp omprimes1/Makefile omprimes2/sieve.cpp omprimes2/Makefile sievetest.sh sievetest.*_* omprimes3/omprimes3.cpp omprimes3/sieve.cpp omprimes3/Makefile omprimes4/sieve.cpp omprimes4/Makefile omprimes5/sieve.cpp omprimes5/Makefile
Use one of the
git stratgies in lab1 Deliverables to submit
your work in this lab. For example, if you worked exclusively on
thing2, include the name thing2 in your
commit message.
thing2$ cd ~/PDC thing2$ git pull origin master thing2$ git add -A thing2$ git commit -m "Lab 6 submit (thing2)" thing2$ git push origin masterLikewise, if you worked on both link machines and a
thing, you can rename your lab6
folders and submit each, with commit messages indicating the computer
on which that commit was created.Also, fill out this form to report on your work.
See lab1 Deliverables
section if you need to set up the cluster you used for
git.
If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.
This lab is due by Saturday, October 22, 2016.