Performance improvement with OpenMP (CS 300, Parallel and Distributed Computing (PDC)<br/><span style='font-size: 65%'>Due Thursday, October 25, 2018</span>)

Performance improvement with OpenMP

CS 300, Parallel and Distributed Computing (PDC)
Due Thursday, October 25, 2018

Sequential alg
Parallelize
Opt 1: delete all evens
Opt 2: precompute, then remove barriers
Opt 3: cache

Preliminary material

OpenMP features
Given sequential algorithm for sieve computation of primes
Makefile for this lab (to be used in a subdirectory per lab segment.

Laboratory exercises

On the link computers, create a lab6 subdirectory for work on the lab, and change directory to that directory.
Create a subdirectory omprimes1 of your lab6 directory. Copy ~cs300/lab6/Makefile and sequential version ~cs300/lab6/omprimes1.cpp for computing primes to your omprimes1 subdirectory, then change directory to that subdirectory. Rename omprimes1.cpp to sieve.cpp (or create a symbolic link).
Issue the make command to compile the given program, then test it using
```
% bin/sieve 100 4
```
Verify that it correctly prints the first 20 primes, and that it correctly determines the number of primes less than 100 (you can start counting at the 21st prime by using the output provided).

Note: The first command-line argument for sieve.cpp is an upper bound on the primes found by this program: the largest prime it will find is 97 (maximum prime less than or equal to 100). The second argument is the number of threads used in OpenMP parallel computation. Since this is a sequential program (no parallel constructs), the second command-line argument is actually ignored for the computation in this version of the program, although sieve.cpp requires a user to provide both arguments anyway.
Test the code to see how many primes can be computed, e.g., one million or 100 million primes. Record your observations in a file lab6/README .

Record this work in a commit.

link%  git add omprimes1.cpp sieve.cpp Makefile ../README
link%  git commit -m "lab6: time performance of sequential version"

Omit omprimes1.cpp if it no longer exists due to renaming.

Change back to your lab6 subdirectory, and create a new subdirectory omprimes2 for a second version of primes computation. Copy the sequential version omprimes1/sieve.cpp and the Makefile omprimes1/Makefile to your new omprimes2 subdirectory. (This can be done conveniently using the command
```
% cp omprimes1/{sieve.cpp,Makefile} omprimes2
```
). Then make omprimes2 your current directory.
Modify the program sieve.cpp in lab6/omprimes2 to operate in parallel, using OpenMP. Look for any opportunities for parallelism, decide whether parallelism is wise, and double-check your work to insure that the parallelism is correct (no race conditions or other indeterminacy.

Compile and test your code, first deterimining 100 primes, then testing with large numbers. Compare results with the sequential version -- do they agree?
Use time to compare performance for omprimes1/bin/sieve and omprimes2/bin/sieve for various prime ranges and thread counts. Also compare these programs' performance on the 32-core computers and potentially other systems. Record your results, observations, and conclusions in lab6/README.

Record this work in a commit.

link%  git add sieve.cpp Makefile ../README
link%  git commit -m "lab6: time performance of parallel version"

Further performance improvements

Create a shell script lab6/sievetest.sh for uniformly and conveniently running your benchmark tests for various versions of the sieve.cpp program. Here is an example:
```
#!/bin/bash

TIME="/usr/bin/time -p"
OUT=../sievetest.`date +%m%d_%H:%M:%S`
for DIR in $*
do  cd $DIR
    for MAX in 1000000 100000000
    do  for NTHREADS in 1 4 16 32
	do  echo ========== $DIR $MAX $NTHREADS ========== >> $OUT
	    for COUNT in `seq 1 3`
	    do  $TIME bin/sieve $MAX $NTHREADS >/dev/null 2>>$OUT
	    done
	    echo "" >> $OUT
	done
    done
    cd ..
done
```
This is a file of commands for the (Bourne or bash) shell. You can create a file sievetest.sh in your lab6 subdirectory containing that code, make it executable
```
% chmod +x sievetest.sh
```
then run
```
% ./sievetest.sh omprimes1 omprimes2
```
as a program to run your bin/sieve programs in your omprimes1 and omprimes2 directories, three times at each of several choices of maximum prime and thread count. The results will be stored in a file whose name includes timestamp information. Note: Keep in mind that everyone will be doing this -- be selective about how many tests you run! Also, the tests other people will be running on the same machine may affect your results (which is a reason to make multiple runs for each choice of command-line arguments, but not to make too many replicate runs).

Try the benchmarking script sievetest.sh with the code in your omprimes1 and omprimes2 directories on a lab machine. Examine the output file sievetest.date freom each such run, and enter any observations in README.

Record this work in a commit.
Note: Since you are using git to submit your work for grading, include a small number of sievetest.date files produced by sievetest.sh. If you have many of these output files, please select one or a few of them, perhaps deleting or renaming the rest (e.g., copying them to a subdirectory lab6/old), so graders will know what to look at.
```
link%  git add sievetest.sh README sievetest.*_*
link%  git commit -m "lab6: performance testing script sievetest.sh, plus some sample runs on Link"
```
Note: The shell notation sievetest.*_* will match all sievetest.date files in your lab6 subdirectory, since all of those dates have an underscore character.
Copy your work on this lab to thing2 via git, and try sievetest.sh on that machine.

To accomplish this, first pull/push to upload your work from your Link account to your stogit repository.
```
link%  git pull origin master
link%  git push origin master
```
Then, ssh to thing2 and perform a pull to update your thing2 working directory.
```
link%  ssh thing2.cs.stolaf.edu
thing2$  cd PDC
thing2$  git pull origin master
thing2$  cd lab6 
```
Finally, make your executables on thing2 and run sievetest.sh to generate some more test data.
```
thing2$  cd omprimes1 ; make clean ; make
thing2$  cd ../omprimes2 ; make clean ; make
thing2$  cd ..
thing2$  sievetest.sh omprimes1 omprimes2
```
Notes:
- Using semicolons enable us to enter multiple commands per line. Thus,
```
thing2$  cd omprimes1 ; make clean ; make
```
  is equivalent to
```
thing2$  cd omprimes1 
thing2$  make clean 
thing2$  make
```
  You may enter the separate commands if you prefer.
- The clean target in the provided Makefile deletes files such as sieve.o and the executable sieve, if those files exist. If you committed only source files, there should be none of those files; issuing make clean anyway insure that you are creating new executables sieve in case you pushed something that wasn't a source file (or test output).
Now, examine your new sievetest.date files and enter any observations in README

Record your work on thing2 in a commit as follows.
```
thing2$  git add sievetest.*_* README
thing2$  git commit -m "lab6: sievetest.sh runs on thing2"
```
Note: This re-commits previously committed sievetest.date files, even though they haven't changed. This does not cause any harm, and it's a convenient way to be include the new sievetest.date file(s).
Also, copy your code directories and sievetest.sh to a thingn machine, and run the benchmarks there to compare performance on those computers. Here is a way to perform the copy and tests:
```
% cd ~/lab6 
% tar cf primes.tar omprimes1 omprimes2 sievetest.sh 
% scp primes.tar helios.public.stolaf.edu:
% ssh helios
helios $ scp primes.tar thingn:
helios $ ssh thingn
thingn $ tar xvf primes.tar
thingn $ cd omprimes1 ; make clean;  make
thingn $ cd ../omprimes2 ; make clean;  make
thingn $ cd ..
thingn $ sievetest.sh omprimes1 omprimes2
```
The tar program creates archives of files and directories, stored in a single file (primes.tar in this case). The clean target for the Makefile deletes any object files and executables, so these can be remade on the new machine. (The executables created on lab machines will probably work on thingn, but this tests with code generated by the compiler on that thingn.
The program ~cs300/lab6/omprimes3.cpp adds OpenMP constructs to parallelize the sieve computation.

Note: The OpenMP single construct insures that only one thread updates the value of the shared variable p.

On a Link computer, create a subdirectory lab6/omprimes3 and copy omprimes3.cpp and the Makefile into this new omprimes3 directory. To make the code within the omprimes3 directory, copy (or link) omprimes3.cpp to sieve.cpp within that folder and enter make.

Use ./sievetest.sh omprimes3 to perform benchmarks on this version. How does it compare with the sequential version? With your parallelization in omprimes2? Enter your observations in README.

Note: You may notice that your Link account's copy of README doesn't include changes that you made on thing2. Don't copy the thing2 changes to your Link README at this time. We will incorporate those changes later using git.

Record this update of your work on your Link account in a commit as follows.
```
link%  cd ~/PDC/lab6 
```
```
link%  git add omprimes3/{omprimes3.cpp,sieve.cpp,Makefile} sievetest.*_* README
link%  git commit -m "lab6: omprimes3 alternate parallelization"
```
In order to improve performance of omprimes3, observe that a lot of time is spent performing initialization and sieve operations on a large array. Half of the time, multiples of a prime p will be even numbers, which we already know are non-primes (except p=2).

Use this observation to modify your program to store only odd numbers in the array marked[].
- First, on a Link computer, create a subdirectory lab6/omprimes4, and copy omprimes3/sieve.cpp and the Makefile to omprimes4.
- In omprimes4/sieve.cpp, use the memory location marked[i] to represent the number 2*i+1; for example, marked[0] represents 1, marked[1] represents 3, etc.
- Start count at 1 instead of zero, in order to count the only even prime 2.
- Observe that marking every p elements of this compressed marked[] array will correctly sieve the odd multiples of p from the prime computation.
- Be sure to adjust the starting and ending indices for loops. The index for a number N will be (N-1)/2; for example, the index for p*p will be (p*p-1)/2.
- Test your program manually with maxprimes = 100 and numthreads = 4 to check that your program is computing primes directly, before proceeding to benchmark tests.
Use ./sievetest.sh omprimes4 to perform benchmarks on your debugged code. How does it compare to earlier versions? Enter observations in README.

Record this update of your work on your Link account in a commit as follows.
```
link%  cd ~/PDC/lab6 
```
```
link%  git add omprimes4/{sieve.cpp,Makefile} sievetest.*_* README
link%  git commit -m "lab6: omprimes4 improved parallelization"
```
Patterns in Performance improvement with OpenMP .

Loop Parallel; Mutual Exclusion single; Collective Communication Barrier
______
______
The OpenMP single construct used in omprimes3.cpp causes a barrier: all threads must wait for that new prime value p to be determined before they can mark multiples of that prime in their segments of the array marked[]. To eliminate this barrier, we will precompute all the primes we will need (up to the square root of prime_max), then let each thread proceed to move through those primes at its own rate.

To implement this idea, make a copy of our omprimes4/sieve.cpp in a new directory omprimes5, plus the Makefile, and add a sequential version of the prime sieve to compute the primes up to ceil(sqrt((double)prime_max)).

Suggestion: Compute these values in another array (perhaps named pre_marked[]), and store them in a vector variable kset for convenient, quick access by all threads during the parallel computation.

Verify that this program is running correctly, then use sievetest.sh to compare to other versions. Enter observations in README.

Record this update of your work on your Link account in a commit as follows.
```
link%  cd ~/PDC/lab6 
```
```
link%  git add omprimes5/{sieve.cpp,Makefile} sievetest.*_* README
link%  git commit -m "lab6: omprimes5 precompute primes to eliminate barrier"
```

Deliverables

Since you worked on both your Link account and thing2, you must merge the work on those systems in order to update your stogit repository. Carry this out using the following steps.

On thing2, push your latest commit(s).
```
thing2$  git pull origin master
thing2$  git push origin master
```
Unless you have done other push operations besides those in this lab, this pull/push should work as usual. But if you encounter a problem with the pull operation, see below.
On a link computer, attempt a pull operation
```
link%  git pull origin master
```
Since you made different changes to README on your Link account and on thing2, this will presumably cause a merge conflict (unless you already encountered a merge conflict in a prior pull). See the video on merge conflicts for more information.

To reconcile this merge conflict:
1. Edit the Link copy of README and edit the sections delimited by
```
<<<<<<<<
...
========
...
>>>>>>>>
```
  to reflect your README entries in the correct order.
2. Create a new commit containing your modified README.
```
link%  git add README
link%  git commit -m "merged entries in README from Link and thing2"
```
3. Now try pulling again. The following should now succeed (unless there have been more changes to your stogit repository since your last pull attempt).
```
link%  git pull origin master
link%  git push origin master
```

All of your code for this assignment should already be contained in commits. Modify the most recent commit message to indicate that you are submitting the completed assignment.

link%  git commit --amend

Add the following to the latest commit message.

    lab6: complete

If your assignment is not complete, indicate your progress instead, e.g.,
    lab6: items 1-5 complete, item 6 partial
You can make later commits to submit updates.

Finally, pull/push your commits in the usual way.

link%  git pull origin master
link%  git push origin master

Files: README omprimes1/omprimes1.cpp omprimes1/sieve.cpp omprimes1/Makefile omprimes2/sieve.cpp omprimes2/Makefile sievetest.sh sievetest.*_* omprimes3/omprimes3.cpp omprimes3/sieve.cpp omprimes3/Makefile omprimes4/sieve.cpp omprimes4/Makefile omprimes5/sieve.cpp omprimes5/Makefile

Use one of the git stratgies in lab1 Deliverables to submit your work in this lab. For example, if you worked exclusively on thing2, include the name thing2 in your commit message.

thing2$  cd ~/PDC
thing2$  git pull origin master
thing2$  git add -A
thing2$  git commit -m "Lab 6 submit (thing2)"
thing2$  git push origin master

Likewise, if you worked on both link machines and a thing, you can rename your lab6 folders and submit each, with commit messages indicating the computer on which that commit was created.

Also, fill out this form to report on your work.

See lab1 Deliverables section if you need to set up the cluster you used for git.

If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.

This lab is due by Saturday, October 22, 2016.