Performance improvement with OpenMP
CS 300, Parallel and Distributed Computing (PDC)
Due Wednesday, January 14, 2015
Sequential alg
Parallelize
Opt 1: delete all evens
Opt 2: precompute, then remove barriers
Opt 3: cache
OpenMP features
Given sequential algorithm for sieve computation of primes
Makefile for this lab (to be used in a subdirectory per lab segment.
______
On the link computers, create a
lab6
subdirectory for work on the lab, and change
directory to that directory.
Create a subdirectory omprimes1
of your
lab6
directory. Copy the Makefile and
sequential version omprimes1.cpp
for computing primes to
your omprimes1
subdirectory, then change directory to
that subdirectory. Rename omprimes1.cpp
to
sieve.cpp
(or create a symbolic link).
Issue the make
command to compile the given
program, then test it using
% bin/sieve 100 4
Verify that it correctly prints the first 20 primes, and that it correctly determines the number of primes less than 100 (you can start counting at the 21st prime by using the output provided).
Note: The first command-line argument for
sieve.cpp
is an upper bound on the primes found by this
program: the largest prime it will find is 97 (maximum prime less
than or equal to 100). The second argument is the number of threads
used in OpenMP parallel computation. Since this is a sequential
program (no parallel
constructs), the second command-line
argument is actually ignored for
the computation in this version of the program, although
sieve.cpp
requires a user to provide both arguments
anyway.
Test the code to see how many primes can be computed, e.g., one
million or 100 million primes. Record your observations in a file
lab6/README
.
Change back to your lab6
subdirectory, and create a new subdirectory omprimes2
for a second version of primes computation. Copy the sequential
version omprimes1/sieve.cpp
and the Makefile
omprimes1/Makefile
to your new omprimes2
subdirectory. (This can be done conveniently using the command
% cp omprimes1/{sieve.cpp,Makefile} omprimes2). Then make
omprimes2
your current directory.Modify the program sieve.cpp
in lab6/omprimes2
to operate in parallel, using
OpenMP
. Look for any opportunities for parallelism,
decide whether parallelism is wise, and double-check your work to
insure that the parallelism is correct (no race conditions or other
indeterminacy.
Compile and test your code, first deterimining 100 primes, then testing with large numbers. Compare results with the sequential version -- do they agree?
Use time
to compare performance for
omprimes1/bin/sieve
and
omprimes2/bin/sieve
for various prime ranges and thread
counts. Also compare these programs' performance on the 32-core
computers and potentially other systems. Record your results,
observations, and conclusions in lab6/README
.
Create a shell script for uniformly and conveniently running
your benchmark tests for various versions of the
sieve.cpp
program. Here is an example:
#!/bin/sh TIME="/usr/bin/time -p" OUT=../sievetest.`date +%m%d_%H:%M:%S` for DIR in $* do cd $DIR for MAX in 1000000 100000000 do for NTHREADS in 1 4 16 32 do echo ========== $DIR $MAX $NTHREADS ========== >> $OUT for COUNT in `seq 1 3` do $TIME bin/sieve $MAX $NTHREADS >/dev/null 2>>$OUT done echo "" >> $OUT done done cd .. done
This is a file of commands for the (Bourne or bash
)
shell. You can create a file sievetest.sh
in your
lab6
subdirectory containing that
code, make it executable
% chmod +x sievetest.shthen run
% ./sievetest.sh omprimes1 omprimes2as a program to run your
bin/sieve
programs in your
omprimes1
and omprimes2
directories,
three times at each of several choices of maximum prime and thread
count. The results will be stored in a file whose name includes
timestamp information. Note: Keep in mind that everyone
will be doing
this -- be selective about how many tests you run! Also, the
tests other people will be running on the same machine may affect
your results (which
is a reason to make multiple runs for each choice of
command-line arguments, but not to make too many
replicate runs).
Try the benchmarking script sievetest.sh
with the code
in your omprimes1
and omprimes2
directories on a lab machine.
Also, copy your code directories and sievetest.sh
to a
thingn
machine, and run the benchmarks there
to compare performance on those computers. Here is a way to
perform the copy and tests:
% cd ~/lab6 % tar cf primes.tar omprimes1 omprimes2 sievetest.sh % scp primes.tar helios.public.stolaf.edu: % ssh helios helios $ scp primes.tar thingn: helios $ ssh thingn thingn $ tar xvf primes.tar thingn $ cd omprimes1 ; make clean; make thingn $ cd ../omprimes2 ; make clean; make thingn $ cd .. thingn $ sievetest.sh omprimes1 omprimes2
The tar
program creates archives of files and
directories, stored in a single file (primes.tar
in
this case). The clean
target for the
Makefile
deletes any object files and executables, so
these can be remade on the new machine.
(The executables created on lab machines will probably work on
thingn
, but this tests with code generated
by the compiler on that thingn
.
The program cs300/lab6/omprimes3.cpp
adds OpenMP constructs to parallelize
the sieve computation.
Note: The OpenMP single
construct insures
that only one thread updates the value of the shared variable p
.
Create a subdirectory lab6/omprimes3
and copy omprimes3.cpp
and the
Makefile
into this new omprimes3
directory.
Use ./sievetest.sh omprimes3
to perform benchmarks on
this version. How does it compare with the sequential version? With
your parallelization in omprimes2
? Record your
observations in README
.
In order to improve performance of omprimes3
,
observe that a lot of time is spent performing initialization and
sieve operations on a large array. Half of the time, multiples of a
prime p
will be even numbers, which we already know are
non-primes (except p=2
).
Use this observation to modify your program to store only odd
numbers in the array marked[]
. Carry out this
modification in a subdirectory omprimes4
.
Use the memory location marked[i]
to represent the
number 2*i+1
; for example, marked[0]
represents 1, marked[1]
represents 3, etc.
Start count
at 1 instead of zero, in order to
count the only even prime 2.
Observe that marking every p
elements of this compressed
marked[]
array will correctly sieve the odd multiples of
p
from the prime computation.
Be sure to adjust the starting and ending indices for loops.
The index for a number N
will be (N-1)/2
;
for example, the index for p*p
will be
(p*p-1)/2
.
Test your program manually with maxprimes
= 100
and numthreads
= 4 to check that your program is
computing primes directly, before proceeding to benchmark tests.
Use ./sievetest.sh omprimes4
to perform benchmarks on
your debugged code. How does it compare to earlier versions? Record
observations in README
.
Patterns in Performance improvement with OpenMP.
Loop Parallel;
Mutual Exclusion single
;
Collective Communication Barrier
The OpenMP single
construct used in
omprimes3.cpp
causes a barrier: all threads
must wait for that new prime value p
to be determined
before they can mark multiples of that prime in their segments of the
array marked[]
. To eliminate this barrier, we will
precompute all the primes we will need (up to the square root of
prime_max
), then let each thread proceed to move through
those primes at its own rate.
To implement this idea, make a copy of our
omprimes4/sieve.cpp
in a new directory
omprimes5
, plus the Makefile
, and add a
sequential version of the prime sieve to compute the primes up to
ceil(sqrt((double)prime_max))
.
Suggestion: Compute these values in another array
(perhaps named pre_marked[]
), and store them in a
vector
variable kset
for convenient, quick
access by all threads during the parallel computation.
Verify that this program is running correctly, then compare to
other versions. Store observations in README
.
Use one of the
git
stratgies in lab1 Deliverables to submit
your work in this lab. For example, if you worked exclusively on
thing3
, include the name thing3
in your
commit message.
thing3$ cd ~/PDC thing3$ git pull origin master thing3$ git add -A thing3$ git commit -m "Lab 6 submit (thing3)" thing3$ git push origin masterLikewise, if you worked on both link machines and a
thing
, you can rename your lab6
folders and submit each, with commit messages indicating the computer
on which that commit was created.Also, fill out this form to report on your work.
See lab1 Deliverables
section if you need to set up the cluster you used for
git
.
If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.
This lab is due by Wednesday, January 14, 2015.