Finding large primes with MPI
CS 300, Parallel and Distributed Computing (PDC)
First submission due Saturday, September 22, 2018; complete lab due Thursday, September 27, 2018
Finding large primes with MPI
compute primes up to 10^k (k = 9?), standalone
rand(); generate 18-digit random numbers and test for primality, reporting lowest divisor if composite
with n=10 processes, find an 18-digit prime by testing random digit strings
improve efficiency?
Preliminary material
HW1, computing primes less than N
Pseudo-random number generation with rand() (and srand())
Public-key encryption (e.g., RSA), using a product of large primes.
Protocol
Laboratory exercises
Login to a cluster admin node. Create a subdirectory ~/PDC/lab2, then change your default directory to your new lab2 directory.

Added instructions for Fall 2016:
Perform the following steps in order to enable multi-node MPI jobs on cumulus.
Log into cumulus and create an SSH key (unless you have already done so).

    cumulus$ ssh-keygen -t rsa

Press the ENTER key in response to all prompts.

Add your public key to your authorized_keys file.

    cumulus$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The >> above causes the shell to append to a file, creating that file if necessary; so the command above should add your public key to the end of your authorized_keys file whether that file already exists or not.

Why we need this step: Since your home directory is shared between cumulus and all the cumulusNN cluster nodes (cumulus01, cumulus02, etc.), this will enable you to log in between the cumulus head node and the cumulusNN cluster nodes without supplying a password. mpiexec will require this passwordless-SSH capability in order to start processes on other nodes.

Check the success of this step by attempting to SSH between nodes, e.g.,

    cumulus$ ssh cumulus02

You should successfully log into cumulus02 without having to enter a password. If you are prompted for a password, check that the contents of your file ~/.ssh/id_rsa.pub were copied correctly to the end of your file ~/.ssh/authorized_keys.
Note: Only some cumulusNN nodes may be in service. For a list of working cumulusNN nodes, enter the following command on cumulus:

    cumulus$ cat /usr/mpihosts.new
Also do this:
SSH into each of the hosts in /usr/mpihosts.new once before proceeding. Answer "yes" if SSH asks if you want to continue logging into an unknown computer. This ensures that each of the cluster nodes gets added to your ~/.ssh/known_hosts file, which mpiexec will need for passwordless SSH connections.

Now try running an MPI job that works from the first lab, as follows:
    cumulus$ cp ~/PDC/lab1/trap.cpp .
    cumulus$ mpiCC -o trap trap.cpp
    cumulus$ mpiexec -np 4 -hostfile /usr/mpihosts.new --map-by node trap
The --map-by node command-line argument causes the MPI processes to run on multiple cluster nodes in a round-robin schedule: first one process per node for each node in /usr/mpihosts.new, then a second process per node (running on second cores), etc., until all processes are assigned.

Check the success of this step by looking for normal output from a run of your program (e.g., trap).

Verify that MPI is actually using multiple nodes by adding the following to your trap.cpp program (or other test program). We will print the rank and host name (computer name) from each process in the MPI computation. Note that standard output from each process is forwarded to the standard output of the mpiexec command, so this will enable us to check that multiple nodes are being used.

The system call gethostname() enables you to obtain a printable name of the computer running your program. Google "man gethostname" to find a manual page that describes gethostname(). You can add the following lines to your trap.cpp program, perhaps at the point in the code just after defining and initializing integral.

    const int MAXHOSTNAME = 25;
    char hostname[MAXHOSTNAME];
    gethostname(hostname, MAXHOSTNAME);

Here, we define and use the constant MAXHOSTNAME for readability and maintainability (it is easy to change that value in one place and have all its uses updated throughout the code). Also add this include file near the top of trap.cpp:

    #include <unistd.h>
System calls. An (operating) system call is an operation that is carried out by the operating system kernel, i.e., the code that is loaded on your computer when you boot it, then runs continually until you power your system down (or suspend it). gethostname() is a Linux system call; Windows would use a system call for the Windows kernel with a different name.

We would like to use a C++ statement such as the following to print out process information:

    cout << "Rank " << my_rank << " (" << hostname << ")" << endl;

However, the output from multiple processes might get garbled, because each operator<< is actually a function call in C++, and one process's series of << calls might not finish before another process's << calls start. (This is an example of a race condition, in which the correct behavior of a computation depends on timing. We will study race conditions later.)

To avoid overlapping, garbled output, we will pack our desired output into a string, then use cout to print that string in a single call to <<. Add this code to your program trap.cpp just after your gethostname() call:

    std::stringstream ss;
    ss << "Rank " << my_rank << " (" << hostname << ")" << endl;
    cout << ss.str();

Also add the following #include near the beginning of trap.cpp:

    #include <sstream>

See the cplusplus.com reference (used for Software Design) for specs of the standard stringstream class.

Recompile and test your program, using the mpiexec arguments above. You should see the usual trap output, plus four lines such as

    Rank 1 (cumulus04)

Note: Those lines printed by MPI processes may occur in any order, and ordinarily will change order each time you run the MPI program, because there's no enforced scheduling order among these MPI processes.
(Followup on Lab1.) The C++ library for MPI is now deprecated, meaning that it is no longer supported in recent versions of the MPI software. Even though it is still part of the version we are using, we would rather invest our efforts in technologies that are likely to remain relevant. Also, you may already have noticed that the documentation for the C MPI library is better than the documentation for the C++ MPI library. Note that the C-language bindings can be used in C++ programs.
Copy trap.cpp (or a later version of the C++ trap code) from your lab1 directory to your lab2 directory, and modify it to use the C-language MPI library (as found in trap.c) instead of the C++ MPI library. You can still use mpiCC to compile your C++ code. Test the new version using mpiexec with SLURM, and verify that it works.

Note: We will use the C-language MPI library in this and future labs.
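The conversion boils down to replacing C++-binding calls with their C-binding equivalents. The following is a minimal sketch of the C-binding pattern inside a C++ program; the trapezoid math itself is omitted, and local_sum here is only a placeholder for a process's partial result:

```cpp
// Minimal C-language MPI bindings in a C++ program, of the kind used
// when converting trap.cpp away from the deprecated C++ bindings.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                    // C binding for MPI startup

    int my_rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   // replaces MPI::COMM_WORLD.Get_rank()
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);    // replaces MPI::COMM_WORLD.Get_size()

    double local_sum = my_rank * 1.0;   // placeholder for this process's partial result
    double total = 0.0;
    // Combine partial results at rank 0 (C binding of Reduce).
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        std::cout << "total = " << total << std::endl;

    MPI_Finalize();                            // replaces MPI::Finalize()
    return 0;
}
```

Note that the C bindings return error codes and pass results through pointer arguments, rather than using objects and return values as the C++ bindings did.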
More features of SLURM batch scripts. Create a simple MPI program try.cpp that prints out a one-line message including the MPI rank. Compile your program using mpiCC, and create a script try.sh containing the following lines to run your program.

    #!/bin/sh
    #SBATCH -o try.out -e try.err
    #SBATCH -n 4
    mpiexec -x LD_LIBRARY_PATH=/usr/lib/openmpi/lib try
You can submit this script using the command
$ sbatch try.sh
This script demonstrates additional features of SLURM job scripts:
Shell-script comments that begin with #SBATCH can be used to provide command-line arguments for the SLURM job submission. In particular, the two #SBATCH lines in this script specify three command-line arguments with their values.

You can override script parameters at the sbatch command line. For example,

    $ sbatch -n 6 try.sh

causes the program to run with six processes instead of four.

On a cluster admin node (e.g., cumulus), copy your homework program primes.cpp into your subdirectory ~/PDC/lab2. For this part, have primes.cpp produce a count of primes between 2 and N=1000.

Then, compile primes.cpp and run it using mpiexec, in order to produce answers from a 1-processor run of primes.cpp for "3-digit" primes (up to 1000).

Note: This program has no MPI calls, so it can be compiled using g++. You can use mpiexec even though primes.cpp does not actually need to use MPI. We could use mpiexec with other executables to make multiple simultaneous runs of a program conveniently. However, in this case, we are only using one node instead of many.

After testing your code in this way, create a commit containing it.

    cumulus$ git add primes.cpp
    cumulus$ git commit -m "lab2: primes.cpp, copied from HW1"
How many primes can we compute this way? Try computing 4-digit primes (up to N=10000), 5-digit primes (up to N=100000), etc., to find the largest number k≤9 such that a cluster node can compute k-digit primes.

Carry out this computation in parallel using MPI, as follows.
Modify primes.cpp to call MPI_Init() at the beginning of main() and MPI_Finalize() at the end. Use 10^(r+4) for the size of the sieve array, where r is the MPI rank. You can compute the number 10^(r+4) using a loop, perhaps located in a helper function ipow(base, exp). (This will require the header for the MPI calls.) Test this version by using mpiexec with -np 1 to request one processor (note that this test also performs the check for 4-digit primes). Be sure to use mpiCC to compile your program, and use the -o and -e arguments for sbatch to capture standard output and standard error.

Now change the array size to be 10^(r+5) and use mpiexec -np 5 to request five processors, thus checking for 5-, 6-, ..., and 9-digit primes in parallel.

Observe that the output from all the processors is merged in the output file. (If you are only seeing the output for 5-digit primes, try printing less output, e.g., just a count of how many primes were found.)
Also, any error output from the processes will be merged into your error file. If any of your processes generates an error (probably for trying to allocate too much memory for the array), the entire job (all processes) will be terminated immediately. As a result, you may see correct output from some processes, error output from one process, and see nothing from any other processes.
If an error arises, reduce the number of processes and resubmit the job, until you can determine the largest value of k≤9 for which array allocation succeeds.
Optional: Our sieve array uses a whole byte to represent what is essentially one bit of information (either 0 or 1). Can you achieve a greater value of k if you use a different strategy for representing the sieve array?
Now that you have determined the largest integer k≤9 such that k-digit primes can be computed using our sieve method, create a file /home/user/PDC/lab2/prime.list consisting of the primes between 2 and 10^k, one per line.

Note: Do not commit prime.list to git -- we want source code, not large data files, in your stogit repository. If you want to prevent prime.list from appearing as an "untracked file" in git status commands, add the line lab2/prime.list to a file ~/PDC/.gitignore.

Use an ofstream object to create your file. Here's an example program that prints a string to a file:

    #include <iostream>
    #include <fstream>
    using namespace std;
    int main () {
        ofstream outfile ("/home/user/PDC/lab2/hello.out");
        outfile << "Hello, world!" << endl;
        outfile.close();
    }

Note: It's possible for a cluster node to create a file in your lab2 directory because your admin-node's home directory is shared with all the cluster nodes on that cluster (via NFS).

End of first submission stage.
Submit your work up to this step (or later) by Saturday, September 22, 2018 as follows:
Use git commit --amend to update your most recent commit message to add the string "submit lab2: first submit" to the existing message. Modify that added string if you have any clarifications about this submission (e.g., "submit lab2: items 1-6 only").

Pull/push that commit.

    cumulus$ git pull origin master
    cumulus$ git push origin master
Write a new program big-primes.cpp that randomly generates 2k-digit numbers and checks whether they are prime, using your file prime.list.

This first version of big-primes.cpp does not need MPI. (You can compile with g++.) However, we can still run the program using mpiexec in order potentially to execute multiple copies at once. Invoke mpiexec with -np 1 for one processor, for now.

Note: Our general strategy with 1-processor MPI jobs is to make sure our jobs run under MPI in the simplified situation of a single processor, before adding in multiple processors. This incremental approach simplifies debugging, and distributed programs are notoriously difficult to debug.
Your program should begin by reading the primes in prime.list and storing them in an array of long. Open that file for input as an ifstream object. You can use the operator >> to read integers from that file stream. You may hard-code the number of primes in prime.list for the array size (first column in output from the shell command $ wc prime.list).

Use the function rand() in the C++ library <cstdlib> to generate pseudo-random numbers. For example, the following short program prints five random numbers between 0 and 1000.

    #include <iostream>
    #include <cstdlib>
    using namespace std;
    static const int N = 1000;
    int main() {
        for (int i = 0; i < 5; i++)
            cout << (rand() % N) << endl;
        return 0;
    }

Note: The C++ operator % returns the remainder from integer division of its operands. For example, 103%5 returns the value 3.

Also note that the keyword static for a global variable limits the scope of that global to the current file big-primes.cpp.
The function rand() does not generate arbitrarily large pseudo-random numbers. The maximum is the value of RAND_MAX (defined in <cstdlib>), which you can determine for your cluster nodes by writing a short program to print that value. Create and run a simple MPI test program (with one processor) to determine the value of RAND_MAX, so you will get RAND_MAX's value for a cluster node instead of the cluster's admin machine.

Since the number of digits in RAND_MAX is probably less than 2k, you will have to fabricate long random numbers from multiple return values from rand(). For example, if RAND_MAX exceeds 10^6 and 18-digit random numbers are desired, you can create one using three calls to rand() as follows.

    const long long M = 1000000;  // million
    long long bigrand = (rand() % M);
    bigrand += (rand() % M) * M;
    bigrand += (rand() % M) * M * M;
    // bigrand is between 0 and M^3 - 1
Here, we use the type long long for the variable bigrand, to ensure that we have a 64-bit (8-byte) integer, in order to contain an 18-digit positive integer. (The size of a data type for a particular version of a C++ compiler can be checked using the sizeof() operator. You can check for yourself whether the type long has 64-bit integers.)

Note: We limited k to be no greater than 9 so that an integer with 2k digits can be contained in a long long variable. long long variables can hold values between -2^63 and +2^63 - 1, where 2^63 ≈ 9.2×10^18.

Note 2: The gcc/g++ compiler supports a nonstandard type identifier __int128 for 128-bit integers on computers that support them. If you find that k = 9 succeeds on the cluster you're using, you might check for __int128 support with a test program that simply prints the value of sizeof(__int128). Submit that test program using mpiexec, in order to see the results for a worker node on that cluster, not the head node.
, in order to see the results for a worker node on that cluster, not the head node.To determine whether a large number
bigrand
is a prime, we will repeatedly check known primes that are less thansqrt(bigrand)
to see if they dividebigrand
. This means computingbigrand % p
for all primesp
in your array of primes. Ifbigrand % p
returns 0 for some primep
, thenbigrand
is not a prime. Ifbigrand % p
is non-zero for all primesp
in your list (which contains all primes with up to 9 digits long), thenbigrand
must be a prime, because the square root of an 18-digit number contains 9 digits.Have your program print a message indicating the value of
bigrand
and whether it was a prime; if it was not prime, also print the prime divisor you found.Run your program to check one random number with (up to) 2k digits for primality, using one cluster node.
When you know you have working code, create a commit containing it.

    cumulus$ git add big-primes.cpp
    cumulus$ git commit -m "lab2: big-primes.cpp, randomly generate numbers and test them for primality"
Make a quick run of your current version of big-primes.cpp, except substituting bigrand = 19 (a prime) just before checking for primality, to ensure that your program correctly identifies primes. Run this on one processor.

If you need to make corrections in big-primes.cpp, fix the program, then commit with an appropriate message.
with an appropriate message.Now, adapt
big-primes.cpp
to check 2k-digit random numbers until it finds at least one prime, using 8 processes in parallel.Add calls to MPI library functions
MPI_Init()
,MPI_Comm_rank()
, andMPI_Finalize()
so each copy of thebig-primes
executable can determine its rank.In order to insure that each node checks different primes, include a call to
srand(r);
before your program makes any calls torand()
, wherer
is a process's rank. The functionsrand()
assigns a random number seed, leading to different sequences of pseudo-random numbers for different ranks.-
Now add a loop for repeatedly generating a value for bigrand, checking it for primality, printing results to standard output, then communicating the results with other processes, until some process finds a prime.

Use point-to-point communication to share with the processes whether a prime has been found. Here is a strategy for the processes to communicate results with each other and determine when to stop. Just after each process checks a value of bigrand for primality and prints that result to standard output, insert the following communication code.

The head node (rank 0) sends messages to every non-head node (rank > 0) containing one MPI_INT value, either a 0 (for false) or a 1 (for true), indicating whether the head node found a prime. (This will require a loop of MPI_Send() calls.)

Each non-head node NH should receive that MPI_INT value from the head node, and should terminate immediately if the received value is 1. Otherwise, that non-head node NH should send an MPI_INT value 0 or 1 to the head node, indicating whether NH found a prime, then NH should continue to the next iteration of its loop (find the next bigrand, test that bigrand, etc.).

If the head node did not find a prime on its own, then the head node should now receive from every non-head node (in a loop). If the head node receives a 1 from one of the non-head nodes, the head node should send a 1 to all non-head nodes, then the head node should terminate. If instead the head node receives a 0 from all of the non-head nodes, then the head node should continue with the next iteration of its loop (make another bigrand, test that bigrand, etc.).

You can debug your program by generating, say, 6-digit random numbers, then switch back to 2k-digit random numbers once you are convinced it works.
Note: One way to accomplish this is to encapsulate your code for generating a large random number bigrand in a function getBigrand() with return type long long int, and to write a second function getSmallrand() that also has return type long long int, but which returns a shorter random number (e.g., returns rand()%M). This makes it convenient to switch the initialization of bigrand between long and short random numbers.
Note: This protocol for communicating results among the processors is correct because:
Every process terminates as soon as the head node informs all the processes that a prime has been found. Otherwise, every process continues to look for new primes.
If any non-head process finds a prime, it will report it to the head node, and the head node will inform all processes.
If the head node finds a prime, then it will inform all processes.
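One communication round of the protocol described above might be sketched as follows. This is a hedged outline, not the required implementation: roundOfMessages is a hypothetical helper, and foundPrime is assumed to be 1 exactly when this process's latest bigrand tested prime.

```cpp
// Sketch of one communication round of the point-to-point protocol.
// Assumes my_rank and nprocs come from MPI_Comm_rank()/MPI_Comm_size().
#include <mpi.h>

bool roundOfMessages(int my_rank, int nprocs, int foundPrime) {
    int done = 0;                  // becomes 1 once any process reports a prime
    if (my_rank == 0) {
        done = foundPrime;
        // Head tells every worker whether the head itself found a prime.
        for (int dest = 1; dest < nprocs; dest++)
            MPI_Send(&done, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        if (!done) {
            // Head collects each worker's 0/1 result.
            for (int src = 1; src < nprocs; src++) {
                int workerFound = 0;
                MPI_Recv(&workerFound, 1, MPI_INT, src, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (workerFound) done = 1;
            }
            if (done) {
                // A worker found a prime: send 1 to every worker (each will
                // see it at the start of its next round), then terminate.
                int one = 1;
                for (int dest = 1; dest < nprocs; dest++)
                    MPI_Send(&one, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
            }
        }
    } else {
        // Worker: hear from the head; stop immediately on a 1, otherwise
        // report our own result and keep searching.
        MPI_Recv(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (!done)
            MPI_Send(&foundPrime, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    return done;   // caller loops (generate, test, roundOfMessages) while false
}
```

Note how the worker's terminate-without-sending case is matched on the head's side: the head only posts receives when its own send carried a 0.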
When you know you have working code, create a commit containing it.

    cumulus$ git add big-primes.cpp
    cumulus$ git commit -m "lab2: big-primes.cpp, find a 2k-digit prime using point-to-point messages"
Patterns in Finding large primes with MPI.
This lab exhibits some of the same patterns as we saw in the first lab.
Data Decomposition is present in the MPI version of big-primes.cpp, where generated random numbers constitute the data being split up among processors. We also see Data Decomposition when determining k, since we tested array lengths for five nodes in parallel. In that case, we used parallelism more for convenience than for speedup.

We used the Loop Parallel pattern when we used MPI to parallelize the loop in big-primes.cpp. Unlike the loop in the trapezoid lab, we are not dividing the work of computing over a known range of values. Instead, we are using MPI parallelism to check multiple random values at once, in order to shorten the search time for a random large prime.

Message Passing is present in the MPI version of big-primes.cpp, implemented in calls to MPI_Send() and MPI_Recv().
Each of these MPI labs (including the first lab) also serves as an example of the following pattern.
In the SPMD (Single Program, Multiple Data) pattern, a single program is executed in parallel on multiple data values or data sets, making the computation easy to manage. In these MPI programs, literally the same program is executed on multiple computers, and even though the head process executes a variant set of instructions in order to assemble answers, SPMD is well represented in these two labs.
In the first lab, all nodes carry out the core computation of adding areas of trapezoids.
In this lab, when determining k, all nodes attempt prime computations at various scales (numbers of digits).
In the MPI version of big-primes.cpp, all nodes check random digit strings for primality.

The program big-primes.cpp also illustrates some patterns we haven't encountered before.

In the Master-Worker pattern, one thread of computation (the master) assigns work to other threads (the workers). In big-primes, the head node represents the master, and sending a message containing 0 to a non-head node constitutes assigning work to that node (which represents a worker), namely, to try another random candidate. Often, as in this case, the master also collects results from the workers.

The Scatter-Gather pattern is comparable to Master-Worker, but specifically refers to a master sending values to workers (scatter) then receiving results from them all (gather). Scatter-Gather would describe the protocol in big-primes.cpp more specifically than Master-Worker, except that the master in the Scatter-Gather pattern ordinarily sends different values to each worker, whereas this master sends the same value to each worker.
SPMD and Master-Worker are "mid-level" patterns, along with Loop Parallel. As a variant of Master-Worker, the Scatter-Gather pattern would also be at a middle level.
Create a second version big-primes2.cpp of your program that uses collective communication to share whether a large prime has been found. Here is a protocol:

After every process checks one value, a reduce operation is performed to deliver the number of primes found to the head process. Each process transmits 1 if it found a prime and 0 otherwise.

The head uses a broadcast to inform all processes whether a prime was found, so they can know whether to try again or terminate.
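One round of that collective protocol might look like the following sketch; collectiveRound is an illustrative helper name, and foundPrime is assumed to be 1 exactly when this process's latest candidate tested prime:

```cpp
// Sketch of one round of the Reduce + Bcast protocol for big-primes2.cpp.
#include <mpi.h>

bool collectiveRound(int foundPrime) {
    int total = 0;
    // Sum everyone's 0/1 result at the head (rank 0).
    MPI_Reduce(&foundPrime, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    // The head decides whether any prime was found; the broadcast then
    // delivers that decision to every process (done is meaningful only
    // at rank 0 before the Bcast).
    int done = (total > 0) ? 1 : 0;
    MPI_Bcast(&done, 1, MPI_INT, 0, MPI_COMM_WORLD);

    return done;   // every process loops (generate, test, round) while false
}
```

Because every process calls both collectives in the same order each round, no rank-dependent send/receive bookkeeping is needed, unlike the point-to-point version.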
Note: Look up the arguments for MPI_Bcast using online documentation. You could use a Google search for "MPI_Bcast" to find documentation for that library call. You may see references to particular implementations of MPI, such as OpenMPI (which we use on our main clusters) or MPICH (used on Raspberry Pi clusters and at some other sites' main clusters). We would expect the arguments for MPI calls to be identical across standard implementations of MPI.
When you know you have working code, create a commit containing it.

    cumulus$ git add big-primes2.cpp
    cumulus$ git commit -m "lab2: big-primes2.cpp, find a 2k-digit prime with broadcast and reduce"
Some communication options.

MPI includes calls MPI_Scatter() and MPI_Gather() that support the Scatter-Gather pattern. However, MPI_Bcast() and MPI_Reduce() (both considered to be forms of Collective Communication) are a better fit for this implementation, since a single value is sent to all workers, and there is no need to distinguish between the individual responses received from those workers.

If the primes found by the workers (non-head nodes) were being sent back to the master (head node) instead of just a number 0 or 1, then MPI_Gather() would be a better fit, although we might still use MPI_Bcast() to inform workers that they should continue the search.

If in addition the head node were to distribute candidate digit strings to the workers instead of those workers generating digit strings on their own, then the computation would finally represent a complete Scatter-Gather pattern.

We count Scatter-Gather as a mid-level pattern because the two-stage implementation pattern of scattering followed by gathering is further removed from the hardware, and more closely related to a (high-level) Data Decomposition strategy for a particular application (e.g., testing multiple candidate digit sequences for primality in parallel), than low-level patterns such as Collective Communication, which more closely reflect the execution of operations in hardware (e.g., sending all nodes a message via broadcast, or combining values from each node through reduction).
Create a version big-primes3.cpp of your program that uses the collective call MPI_Gather() to communicate the actual prime(s) found, if any. Each process participating in the MPI_Gather() operation either provides the prime it found or the integer 0, indicating no prime was found.

Notes:
Look up the use of MPI_Gather() in online resources.

The first argument sendbuf in a call of MPI_Gather() is the address of a value to transmit to the head node.

A later argument recvbuf is the address of an array of locations, one per process, each capable of holding one value from a node. The MPI_Gather() operation fills that array with values for the head node only. The index of a value in that array corresponds to the rank of the process providing that value.

For our application, we want MPI_Gather() to transmit 64-bit long long values, namely, bigrand values that turn out to be prime. Investigate how to transmit 64-bit values using MPI data types.

MPI data types include MPI_INT, MPI_LONG, MPI_FLOAT, etc. Some implementations of MPI include MPI_LONG_LONG_INT. A web search can be used to find a more complete list of options.

We want to communicate an 8-byte integer. You can use the library function MPI_Type_size() to determine the number of bytes used for various MPI types. MPI_Type_size()'s second argument is the address of an integer variable (be sure to allocate it) that receives the number of bytes in its first argument. The type MPI_LONG_LONG_INT (if it's supported in the current MPI implementation) might be a good one to try first.

Create and run a simple MPI test program for this work, so you will get values on cluster nodes instead of the admin machine (such as cumulus).

If no predefined MPI data type has length 8 bytes, then you will have to transmit 8-byte large prime values using two 4-byte integers. In that case, you can use an array of two 4-byte integers for the first argument sendbuf of your MPI_Gather() call, the value 2 for the second argument sendcount, and a 4-byte MPI integer type (MPI_LONG?) for the third argument sendtype. You will have to break your 8-byte values into the two 4-byte array elements, perhaps using % and >> operators.
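A test program of the kind suggested above might look like this sketch, which checks whether MPI_LONG_LONG_INT occupies 8 bytes (assuming that type is defined by the MPI implementation in use):

```cpp
// Small MPI test program: report the byte size of MPI_LONG_LONG_INT,
// to be run on a cluster node via mpiexec rather than the admin node.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int size = 0;   // allocated integer; MPI_Type_size() fills it in
    MPI_Type_size(MPI_LONG_LONG_INT, &size);
    std::cout << "MPI_LONG_LONG_INT occupies " << size << " bytes" << std::endl;

    MPI_Finalize();
    return 0;
}
```

If this prints 8, a single MPI_LONG_LONG_INT per process suffices for the gather; otherwise, fall back to the two-4-byte-integers scheme described above.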
Use a copy of big-primes2.cpp as the starting point for big-primes3.cpp. In this version, we will retain the MPI_Bcast() strategy for the head process to send requests to the other processes, and replace MPI_Reduce() by an MPI_Gather() approach for receiving results from processes.

In the head process, you can use a loop to determine whether any non-zero values were delivered by the head process itself or other processes. (This loop, which includes the head process's own value, is a simpler approach than the protocol used before for determining whether the head or other processes had found a prime.)

In most MPI implementations, the array argument recvbuf that gathers results only needs to be allocated in the head process, which is the only process that can access gathered values in that array. Use dynamic allocation for that array, since its length depends on the number of MPI processes, known only at runtime.

If you had to use an array of two integer types for the sendbuf value, then include that factor of 2 when computing the number of bytes to allocate dynamically for recvbuf.
Unlike previous implementations of big-primes, the head process will now know the value of the first large prime(s) found. Have big-primes3.cpp print a report of all large primes found in the first MPI_Gather() call that has nonzero values in recvbuf.

As before, you can use shorter random numbers to debug your program, before applying it to 2k-digit random numbers.
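One round of the Gather + Bcast approach for big-primes3.cpp might be sketched as follows (assuming MPI_LONG_LONG_INT is available; gatherRound and myPrimeOrZero are illustrative names):

```cpp
// Sketch of one round of big-primes3.cpp: gather each process's result
// (the prime it found, or 0) at the head, then broadcast whether to stop.
#include <mpi.h>
#include <iostream>

bool gatherRound(int my_rank, int nprocs, long long myPrimeOrZero) {
    long long* results = 0;                 // gathered values, head only
    if (my_rank == 0)
        results = new long long[nprocs];    // length known only at runtime

    MPI_Gather(&myPrimeOrZero, 1, MPI_LONG_LONG_INT,
               results, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);

    int done = 0;
    if (my_rank == 0) {
        // Head scans all gathered values, its own (index 0) included.
        for (int r = 0; r < nprocs; r++)
            if (results[r] != 0) {
                std::cout << "rank " << r << " found prime "
                          << results[r] << std::endl;
                done = 1;
            }
        delete [] results;
    }
    // Head tells everyone whether any prime was found this round.
    MPI_Bcast(&done, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return done;
}
```

The index of each gathered value is the rank of the process that supplied it, which is what lets the head report which process found which prime.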
When you know you have working code, create a commit containing it.

    cumulus$ git add big-primes3.cpp
    cumulus$ git commit -m "lab2: big-primes3.cpp, with broadcast and gather"
Create a version big-primes4.cpp of your program that uses the collective call MPI_Scatter() to distribute candidate 2k-digit random numbers to test for primality.

Notes:

Look up the use of MPI_Scatter() in online resources.

For MPI_Scatter(), the first argument sendbuf is the array of values to send out; only the root process's value for this argument has an effect.

Unlike MPI_Gather(), in a call of MPI_Scatter() the argument recvbuf points to a single value, and all processes use that argument recvbuf to receive values that the root process supplied through sendbuf.
Use a copy of
big-primes3.cpp
as a starting point forbig-primes4.cpp
.Instead of each process generating its own random numbers, in
big-primes4.cpp
only the head process generates random numbers to test for primality. Have the head process generate enough new random numbers to fill the array argumentsendbuf
for its call toMPI_Scatter()
.Also, remove the random-number generation from the code area for non-head processes, and have them obtain their random numbers from the
recvbuf
argument in their calls toMPI_Scatter()
.In the head process, use the
MPI_Scatter()
call instead ofMPI_Bcast()
to initiate new rounds of primality testing.Use shorter random numbers to debug your program, then run it with 2k-digit random numbers. Note: We expect to discover different prime(s) than before, since all random numbers are generated from the random-number seed for the head process only, instead of using multiple random-number seeds for all processes.
When you know you have working code, create a commit containing it.

    cumulus$ git add big-primes4.cpp
    cumulus$ git commit -m "lab2: big-primes4.cpp, with scatter and gather"
[Optional.] Explore other variants of the big-primes code, such as:

Write a version that produces at least P big primes. You could accumulate discovered primes in the head process until P of them are collected.

Does it make a difference if more than one random number is sent to each process at a time? Observe that an array of ct random numbers could be received by each process using the recvbuf argument for MPI_Scatter(), and an array of ct results (primes or zeroes) could be sent from each process using the sendbuf argument for MPI_Gather(). In that approach, make sure your head-only arrays (sendbuf for MPI_Scatter(), and recvbuf for MPI_Gather()) are expanded by a factor of ct (i.e., can hold ct*N large integer values).

Do you have an idea of your own?
If you try an optional idea:

create a file README in your lab2 directory that lists the filenames of your optional source code, and describes what they seek to do; and

commit your README and your optional source files, using the commit messages to indicate progress towards your goals for each program.
Choosing which loops to parallelize, parallelization overhead, and cache usage.

The final version of the big-primes program applies the Loop Parallel pattern at the head node, when iterating over code that represents the Scatter-Gather pattern. We did not parallelize other loops in that code.

The loop in which the master generates multiple candidate random digit sequences for the workers is not worth parallelizing, because the overhead of sending the small task of generating one of these random digit strings to another processor (even another core on the same computer) is far greater than simply carrying out that task on the head itself.

The loop within a worker for checking whether any of that worker's candidate random digit sequences is divisible by a particular known prime does not merit parallelization, also because the parallelization overhead far exceeds the division operation that might be carried out in parallel.

The loop within a worker that iterates through all primes might conceivably benefit from parallelization. However, small primes are far more likely to divide a random large number than large primes (e.g., 2 divides half of all large random numbers, whereas the prime 99999989 divides only 1/99999989 of them). It is unclear that the savings from this parallelization, for those rare instances when small primes do not eliminate a random large number, would exceed the overhead of parallelization.

On the other hand, the computation time required to find a dozen large primes is much less than twelve times the computation required to find a single large prime, because of cache usage ______
Deliverables

All of your code for this assignment should already be contained in commits. Modify the most recent commit message to indicate that you are submitting the completed assignment.

    cumulus$ git commit --amend

Add the following to the latest commit message.

    lab2: complete

If your assignment is not complete, indicate your progress instead, e.g., "lab2: items 1-5 complete, item 6 partial". You can make later commits to submit updates.

Finally, pull/push your commits in the usual way.

    cumulus$ git pull origin master
    cumulus$ git push origin master
Submit your work using git on the cluster head node you used. Include the cluster name in your commit message, e.g.,

    mist$ cd ~/PDC
    mist$ git pull origin master
    mist$ git add -A
    mist$ git commit -m "Lab 2 submit (mist)"
    mist$ git push origin master

Also, fill out this form to report on your work.

See the lab1 Deliverables section if you need to set up the cluster you used for git. If you did your work on two different clusters, submit work from both of them using one of the strategies in lab1 Deliverables.

This lab is due by Thursday, September 27, 2018.

Files: primes.cpp big-primes.cpp big-primes2.cpp big-primes3.cpp big-primes4.cpp