Parallel Scientific Computing in C++ and MPI :
A Seamless Approach to Parallel Algorithms and their Implementation

by George Em Karniadakis (Author), Robert M. Kirby II (Author)

This book provides a seamless approach to numerical algorithms, modern programming techniques and parallel computing. These concepts and tools are usually taught serially across different courses and different textbooks, thus observing the connection between them. The necessity of integrating these subjects usually comes after such courses are concluded (e.g., during a first job or a thesis project), thus forcing the student to synthesize what is perceived to be three independent subfields into one in order to produce a solution. The book includes both basic and advanced topics and places equal emphasis on the discretization of partial differential equations and on solvers. Advanced topics include wavelets, high-order methods, non-symmetric systems and parallelization of sparse systems. A CD-ROM accompanies the text.

Purchase This Book

Sample Chapter

Errata

George Em Karniadakis is Professor of Applied Mathematics at Brown University. In addition to pioneering spectral methods for unstructured grids, microfluidic simulations, and fast methods in uncertainty modeling, he has published more than 200 papers covering topics such as numerical methods, parallel computing, and various applications in fluid mechanics. He has co-autored two popular books.

Robert M. Kirby II is Assistant Professor of Computer Science at University of Utah. He Specializes in large-scale scientific computing and visualization, with particular focus on software design, parallel computing, and direct numerical simulation of flow--structure interactions.

Sample Code

Compilation Guide to the Basic Scientific Computing Mathematics Library and to the programs associated with each chapter.

Basic Scientific Computing Mathematics Library

To help facilitate presenting a large amount of C++ code quickly within the book, we have created a SCmathlib library which contain basic functions and classes used throughout the text. Some examples of such classes are the SCVector and SCMatrix classes. To examine the functions/classes available to you, use the two links provided below.

Go to SCmathlib.h for function/class declarations
Go to SCmathlib.cpp for function/class definitions

For chapters 2-10, we present an overview of what can be found on this CD. The overview for each chapter can be found by clicking on the corresponding overview link associated with each chapter. Contained within the overviews will be a brief explanation of how software on this CD is associated with code presented in the text.

COPYRIGHT

This software is in copyright. Individuals are permitted to copy the software for their personal use and for educational and research purposes only. All other use, and in particular any commercial use, requires the permission of the author.

Library purchasers may make one copy of the CD-ROM for archive purposes only. The library purchaser may make the CD-ROM available through a secure server for single-user access only, but the CD-ROM may not otherwise be networked within the library or the library's institution without the prior permission of the publisher.

DISCLAIMER

This software is provided 'as is'. No warranty, express or implied, is given as to the software's freedom from error, merchantibility, or fitness for any particular purpose. The author and the publisher disclaim any and all liability to the extent permitted by applicable law for direct, incidental, or consequential damages resulting from the use of the software. Users are specifically advised that the software should not be used in any application that could result in injury to a person and/or in damage to or loss of property.

NOTE ON CPP FILES

Due to publisher constraints, we are not allowed to place cpp files on the website. Please obtain a copy of the book with CD to get all available code.

Chapter 2: Basic Concepts and Tools

Book Chapter Introduction

In this chapter we introduce the main themes that we will cover in this book and provide an introduction for each of them. We begin with a brief overview of C++ and define the two basic concepts of functions and classes as well as other syntactic elements of the language. We then introduce basic mathematical concepts that include elements of linear algebra, vector orthogonalization, and corresponding codes and software. Finally, we introduce parallel programming and review some generic parallel architectures as well as standard parallel algorithms for basic operations, e.g., the fan-in algorithm for recursive doubling. We also provide a brief overview of the main MPI commands.

SCchapter2 Introduction and Chapter 2 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 2, you will find that the following function/classes were discussed.

Section 2.1.1: SCVector class declaration
Section 2.2.2: float FloatMachineEps() function definition
Section 2.2.2: double DoubleMachineEps() function definition
Section 2.2.9: SCstatus GramSchmidt(SCVector * x, SCVector * q) function definition
Section 2.2.9: SCstatus GramSchmidt(SCVector * x, SCVector * q, SCMatrix &r) function definition
Section 2.2.9: SCstatus QRDecomposition(SCMatrix X, SCMatrix &Q, SCMatrix &R) function definition
Section 2.2.9: ModifiedGramSchmidt(SCVector * x, SCVector * q, SCMatrix &r) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter2.h for function/class declarations
Go to the file SCchapter2.cpp for function/class definitions

In the case that an entire program (meaning that a main() function is provided) is presented in the text, we classify this as a driver program. Unlike the functions/classes above, driver programs are complete C++ programs which can be compiled and executed. As you read through the book, you will see that driver programs are often times created by using functions/classes which are in the SCchapter files. We denote driver programs which are explicitly given in the text of the book in red. In some chapters, we present very few driver programs explicitly in the text, however we provide some example driver programs which demonstrate how to use the functions/classes with in SCchapter files. Such driver programs are denoted in black.

Section 2.1: The simplest C++ code	chapter2c0.cpp
Section 2.1.1: First non-trivial C++ program ("Hello World'')	chapter2c1.cpp
Section 2.1.1: Using functions in C++	chapter2c2.cpp
Section 2.1.2: Examination of the boolean expression `<'	chapter2c3.cpp
Section 2.1.2: Examination of the boolean expression `&&'	chapter2c4.cpp
Section 2.1.2: Algorithm Collatz-A	chapter2c5.cpp
Section 2.1.2: Algorithm Collatz-B	chapter2c6.cpp
Section 2.2.2: Program to determine machine precision	chapter2c7.cpp
Section 2.3.4: MPI - "HELLO!!'' program	chapter2c8P.cpp
Section 2.3.4: MPI - Modified ``Hello world'' type program	chapter2c9P.cpp
Section 2.3.4: Serial Summation	chapter2c10.cpp
Section 2.3.4: MPI - Parallel Summation	chapter2c11P.cpp

Chapter 3: Approximation

Book Chapter Introduction

Two of the most common tasks in scientific computing are interpolation of discrete data and approximation by known functions of the numerical solution, the source terms, and the boundary or initial conditions. Therefore, we need to perform these tasks both accurately and efficiently. The data are not always nicely distributed on a uniform lattice or grid, and thus we must learn how to manage these situations as well. We often use polynomials to represent discrete data as they are easy to "manipulate,'' i.e., differentiate and integrate. However, sines and cosines as well as special functions, called wavelets, are very effective means to perform interpolation and approximation, and they have very interesting properties.

In this section, we will study different such representations and their corresponding C++ implementations. We consider cases where the data are just sufficient to determine exactly the representation (deterministic case) as well as cases where the data are more than the information needed (overdetermined case).

Finally, we will present a more detailed discussion of MPI_Send and MPI_Recv, the two fundamental building blocks of MPI.

SCchapter3 Introduction and Chapter 3 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 3, you will find that the following function/classes were discussed.

Section 3.1.2: double NewtonDiffTable(int npts, double *xpts, double *funcvals, double * newton_coeffs) function definition
Section 3.1.2: double NewtonDiffFunction(int start_index, int ending_index, double * xpts, double * funcvals) function definition
Section 3.1.2: double NewtonInterpolant(double x, int npts, double * xpts, double * newton_coeffs) function definition
Section 3.1.3: double LagrangePoly(double x, int pt, int npts, double * xpts) function definition
Section 3.1.3: double LagrangeInterpolant(double x, int npts, double *xpts, double * funcvals) function definition
Section 3.1.5: double ChebyshevPoly(int degree, double x) function definition
Section 3.1.7: void LS_ComputeCoeffs(int npts, double *xpts, double *funcvals, int ndeg, double *alpha, double *beta, double *lcoeffs) function definition
Section 3.1.7: double LS_OrthoPoly(int j, double x, double *alpha, double *beta) function definition
Section 3.1.8: SCVector class declaration
Section 3.1.8: SCVector class definition
Section 3.1.8: LSPoly class declaration
Section 3.1.8: LSPoly class definition
Section 3.1.10: double Square_2dInterpolant(SCPoint x, int npts, double *funcvals) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter3.h for function/class declarations
Go to the file SCchapter3.cpp for function/class definitions

Section 3.1.2: Static array allocation, initialization, and printing	chapter3c0.cpp
Section 3.1.2: Dynamic array allocation, initialization, and printing	chapter3c1.cpp
Section 3.1.2: Dynamic array allocation with initialization accomplished by a library function (CreateGrid_EvenlySpaced)	chapter3c2.cpp
Section 3.1.2: Program accomplishing Newton interpolation	chapter3c3.cpp
Section 3.1.3: Program accomplishing Lagrange interpolation	chapter3c4.cpp
Section 3.1.7: Program accomplishing Least Squares approximation (using functions)	chapter3c5.cpp
Section 3.1.8: Program accomplishing Least Squares approximation (using classes)	chapter3c6.cpp
Section 3.4: MPI - Program using MPI_Send and MPI_Recv	chapter3c7P.cpp

Chapter 4: Roots and Integrals

Book Chapter Introduction

In this chapter we apply the approximation theory we presented in chapter 3 to find solutions of linear and nonlinear equations and to perform integration of general functions. Both subjects are classical, but they serve as basic tools in scientific computing operations and in solving systems of ordinary and partial differential equations. With regards to root finding, we consider both scalar as well as systems of nonlinear equations. We present different versions of the Newton-Raphson method, the steepest descent method, and the conjugate gradient method; we will revisit the latter in chapter 9. With regards to numerical integration we present some basic quadrature approaches, but we also consider advanced quadrature rules with singular integrands or in unbounded domains.

On the programming side, we first introduce the concept of passing a function to a function; in the previous chapter we were passing variables. This allows an easy implementation of recursion, which is so often encountered in scientific computing. We offer several C++ examples from root finding and numerical integration applications that make use of recursion, and show an effective use of classes and overloaded operators. We also address parallel programming with emphasis on domain decomposition, specifically the concept of reduction operations. We introduce the MPI commands MPI_Reduce and MPI_Allreduce for accomplishing reduction operations among a collection of processes.

SCchapter4 Introduction and Chapter 4 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 4, you will find that the following function/classes were discussed.

Section 4.1: double SquareRoot(double value, double guess, int interation) function definition
Section 4.1.4: double NewtonRaphson(double x0, double (*func)(double), double (*func_der)(double), int max_iter, int multiplicity) function definition
Section 4.1.4: double NewtonRaphson(double x0, double (*func)(double), double (*func_der)(double), int max_iter) function definition
Section 4.1.4: double NewtonRaphson(double x0, double (*func)(double), double(*func_der)(double),double *(func_secondder)(double), int max_iter, int multiplicity) function definition
Section 4.1.7: SCVector ConjugateGradient(SCMatrix A, SCVector b, SCVector x0) function definition
Section 4.2.1: double MidpointRule(int level, double xleft, double xright, double (*func)(double)) function definition
Section 4.2.1: double TrapezoidRule(int level, double xleft, double xright, double (*func)(double)) function definition
Section 4.2.1: double Romberg(int m, int k, double xleft, double xright, double (*func)(double)) function definition
Section 4.2.2: double JacobiPoly(int degree, double x, double alpha, double beta) function definition
Section 4.2.2: double JacobiPolyDerivative(int degree, double x, double alpha, double beta) function definition
Section 4.2.2: void JacobiZeros(int degree, double *z, double alpha, double beta) function definition
Section 4.2.2: void JacobiZW(int degree, double *z, double *w, double alpha, double beta) function definition
Section 4.2.2: double LaguarrePoly(int degree, double x, double alpha, double beta) function definition
Section 4.2.2: double LaguarrePolyDerivative(int degree, double x, double alpha, double beta) function definition
Section 4.2.2: void LaguarreZeros(int degree, double *z, double alpha, double beta) function definition
Section 4.2.2: void Laguarre ZW(int degree, double *z, double *w, double alpha, double beta) function definition
Section 4.2.2: double HermitePoly(int degree, double x, double alpha, double beta) function definition
Section 4.2.2: double HermitePolyDerivative(int degree, double x, double alpha, double beta) function definition
Section 4.2.2: void HermiteZeros(int degree, double *z, double alpha, double beta) function definition
Section 4.2.2: void HermiteZW(int degree, double *z, double *w, double alpha, double beta) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter4.h for function/class declarations
Go to the file SCchapter4.cpp for function/class definitions

Section 4.1.4: Newton-Raphson root finding with function passing	chapter4c0.cpp
Section 4.1.7: Conjugate Gradient example program	chapter4c1.cpp
Section 4.2.1: Example program using the Midpoint, Trapezoidal, and Simpson's rules	chapter4c2.cpp
Section 4.2.1: Example program which passes a function to the Trapezoidal Rule function	chapter4c3.cpp
Section 4.2.2: Example program for computing the zeros and weights for Legendre, Hermite, and Laguerre quadrature	chapter4c4.cpp
Section 4.3: MPI - Program demonstrating the use of MPI_Reduce	chapter4c5P.cpp

Chapter 5: Explicit Discretizations

Book Chapter Introduction

In this chapter we consider explicit discretizations of space- and time-derivatives. In such discretizations we can express directly a derivative at one grid point in terms of function values at adjacent grid points (spatial discretizations) or in terms of previous time levels (temporal discretizations). This, in turn, implies that there is no implicit coupling, and thus there is no matrix inversion involved but instead simple daxpy

The material in this chapter is relatively easy to program both on serial as well as on parallel computers. It is appropriate for demonstrating fundamental concepts of discretization as well as primary constructs of the C++ language and of the MPI library. Specifically, we will demonstrate the use of loops, arrays, functions and passing functions to functions. In addition to presenting MPI_Send and MPI_RecvMPI_Sendrecv and MPI_Sendrecv_replace as alternative advanced MPI function calls for parallelizing finite differences discretizations.

SCchapter5 Introduction and Chapter 5 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 5, you will find that the following function/classes were discussed.

Section 5.1.1: void SO_FirstDeriv_1D(int npts, double dx, double *u, double *u_x) function definition
Section 5.1.1: void SO_FirstDeriv_1Dper(int npts, double dx, double *u, double *u_x) function definition
Section 5.1.2: void SO_SecondDeriv_1D(int npts, double dx, double *u, double *u_xx) function definition
Section 5.1.2: void SO_SecondDeriv_1Dper(int npts, double dx, double *u, double *u_xx) function definition
Section 5.1.3: void SO_FirstDeriv_1DP(int npts, double dx, double *u, double *u_x, int mynode, int totalnodes) function definition with MPI_Send/MPI_Recv
Section 5.1.3: void SO_FirstDeriv_1DP(int npts, double dx, double *u, double *u_x, int mynode, int totalnodes) function definition with MPI_Sendrecv_replace
Section 5.1.5: void FornbergWeights(double xi, double *x, int m, int n, double ***C) function definition
Section 5.1.7: void CD_SecondDeriv(int npts, double dx, double dy, double **u, double **u_xx_yy) function definition
Section 5.1.7: void CrossDerivative(int npts, double dx, double dy, double **u, double **u_xy) function definition
Section 5.2.1: double AdamsBashforth(int order, double u_old, double dt, double *RHS) function definition
Section 5.2.4: double RungeKutta4(double uold, double time, double dt, double (*rkfunc)(double,double)) function definition
Section 5.2.4: double RungeKutta(int order, double dt, double uold, double (*rkfunc)(double)) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter5.h for function/class declarations
Go to the file SCchapter5.cpp for function/class definitions

Section 5.1.2: Program for testing the non-periodic first and second derivative approximation routines	chapter5c0.cpp
Section 5.1.2: Program for testing the periodic first and second derivative approximation routines	chapter5c1.cpp
Section 5.1.3: MPI - Program for testing the non-periodic parallel first and second derivative approximation routines	chapter5c2P.cpp
Section 5.1.3: MPI - Program for testing the periodic parallel first and second derivative approximation routines	chapter5c3P.cpp

Chapter 6: Implicit Discretizations

Book Chapter Introduction

In this chapter we consider implicit discretizations of space- and time-derivatives. Unlike the explicit discretizations presented in the previous chapter, here we express a derivative at one grid point in terms of function values as well as derivative values at adjacent grid points (spatial discretization) or in terms of previous and current time levels (temporal discretization). This, in turn, implies that there is implicit coupling, and thus matrix inversion is required to obtain the solution.

The material of this chapter serves to introduce solutions of tridiagonal systems and correspondingly parallel computing of sparse linear systems using MPI. We also introduce two new MPI functions: MPI_Barrier, used for synchronizing processes, and MPI_Wtime, used for obtaining the wall clock timing information.

SCchapter6 Introduction and Chapter 6 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 6, you will find that the following function/classes were discussed.

Section 6.1.4: void ThomasAlgorithm(int N, double *b, double *a, double *c, double *x, double *q) function definition
Section 6.1.4: void ThomasAlgorithLU(int N, double *b, double *a, double *c, double *l, double *u, double *d) function definition
Section 6.1.4: void ThomasAlgorithm_per(int N, double *b, double *a, double *c, double *x, double *q) function definition
Section 6.1.5: void ThomasAlgorithm_P(int mynode, int numnodes, int N, double *b, double *a, double *c, double *x, double *q) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter6.h for function/class declarations
Go to the file SCchapter6.cpp for function/class definitions

Section 6.1.4: Program to demonstrate the use of the Thomas Algorithm routine	chapter6c0.cpp
Section 6.1.5: MPI - Program to demonstrate the use of the parallel Thomas Algorithm routine	chapter6c1P.cpp

Chapter 7: Relaxation: Discretizations and Solvers

Book Chapter Introduction

In this chapter we present discretizations for mixed initial value/boundary value problems (IVP/BVP) as well as relaxation iterative solvers associated with such discretizations. The analogy between iterative procedures and equations of evolution, especially of parabolic type (diffusion), was realized about two centuries ago, but a rigorous connection was not established until the mid 1950s.

In the following, we first consider various mixed discretizations, and subsequently we derive some of the most popular iterative solvers. Our emphasis will be on parallel computing: A good algorithm is not simply the one that converges faster but also the one that is parallelizable. The Jacobi algorithm is such an example; forgotten for years in favor of the Gauss-Seidel algorithm, which converges twice as fast for about the same computational work, it was re-discovered in the last two decades as it is trivially parallelizable, and today is used mostly as a preconditioner for multigrid methods. The Gauss-Seidel algorithm, although faster on a serial computer, is not parallelizable unless a special multi-color algorithm is employed, as we explain in section 7.2.4. Based on these two basic algorithms, we present the multigrid method that exploits their good convergence properties but in a smart adaptive way.

On the parallel computing side, we introduce three new commands: MPI_Gather, MPI_Allgather, and MPI_Scatter. Both MPI_Gather and MPI_Allgather are used for gathering information from a collection of processes. MPI_Scatter is used to scatter data from one process to a collection of processes. In addition to providing syntax and usage information, we present the applicability of the gathering functions in the parallel implementation of the Jacobi method.

SCchapter7 Introduction and Chapter 7 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 7, you will find that the following function/classes were discussed.

Section 7.1.4: void Diffusion_EF_CentralDifference(int N, double DN, double *uold, double *unew) function definition
Section 7.1.4: void Diffusion_EB_CentralDifference(int N, double DN, double *uold, double *unew) function definition
Section 7.2.1: int Diffusion_Jacobi(int N, double dx, double dt, double **A, double **q, double abstol) function definition
Section 7.2.2: int Jacobi_P(int mynode, int numnodes, int N, double **A, double *x, double *b, double abstol) function definition
Section 7.2.3: int Diffusion_GaussSeidel(int N, double dx, double dt, double **A, double **q, double abstol) function definition
Section 7.2.3: void GaussSeidel(int N, double **A, double *x, double *b, int iterations) function definition
Section 7.2.5: void SOR(double omega, int N, double **A, double *x, double *b, double abstol) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter7.h for function/class declarations
Go to the file SCchapter7.cpp for function/class definitions

Section 7.2.1: Program to demonstrate the use of the Diffusion_Jacobi routine	chapter7c0.cpp
Section 7.2.2: MPI - Program to demonstrate the use of the parallel Diffusion_Jacobi routine	chapter7c1P.cpp
Section 7.2.2: MPI - Program to demonstrate the use of the parallel Jacobi_P routine	chapter7c2P.cpp

Chapter 8: Propagation: Numerical Diffusion and Dispersion

Book Chapter Introduction

In this chapter we continue with mixed discretizations for initial value problems (IVP) and boundary value problems (BVP), but our emphasis is on advection equations and wave propagation with and without dissipation. Specifically, we introduce two important non-dimensional numbers that define the accuracy and the stability of discretizations we present: The Courant number or CFL condition first proposed by Courant in 1928, and the (grid) Peclet number for advection-diffusion systems. We also provide C++ functions and corresponding results that illustrate the effects of numerical diffusion and numerical dispersion in pure advection and in advection-diffusion systems.

On the parallel computing side, we introduce the concept of non-blocking communications, specifically the use of MPI_Isend and MPI_Irecv. Non-blocking communications may lead to a reduction in the computational time by allowing the programmer to appropriately intertwine computation and communication.

SCchapter8 Introduction and Chapter 8 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 8, you will find that the following function/classes were discussed.

Section 8.1.3: void EF_CentralDifference(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.3: void EF_FirstOrderUpwind(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.3: void LaxFriedrichs(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.3: void LaxFriedrichsTadmor(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.3: void LaxFriedrichsTadmor(int N, double CFL, double *uold, double *unew, double alpha) function definition
Section 8.1.4: void EF_SecondOrderUpwind(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.4: void LaxWendroff(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.4: void CrankNicolson_CentralDifference(int N, double CFL, double *uold, double *unew) function definition
Section 8.1.5: void AB2_CentralDifferenceNP(int N, double CFL, double **uold, double *unew) function definition
Section 8.2.2: void CrankNicolson_CentralDifference_AdvectionDiffusion(int N, double CFL, double DN, double *uold, double *unew) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter8.h for function/class declarations
Go to the file SCchapter8.cpp for function/class definitions

Section 8.1.3: Program demonstrating the use of the Euler-Forward First-Order Upwind scheme	chapter8c0.cpp
Section 8.1.3: Program demonstrating the use of the Lax-Friedrichs scheme and Tadmor's Correction	chapter8c1.cpp
Section 8.1.: Program demonstrating the use of the Third-Order Adams-Bashforth/Central Difference scheme	chapter8c2.cpp
Section 8.2: Program demonstrating the use of the Crank-Nicolson/Central Difference for the advection-diffusion equation	chapter8c3.cpp

Chapter 9: Fast Linear Solvers

Book Chapter Introduction

We have already discussed how to solve tridiagonal linear systems of equations using direct solvers (the Thomas algorithm) in chapter 6 and some iterative solvers (Jacobi, Gauss-Seidel, SOR and multigrid) in chapter 7. We have also discussed solutions of nonlinear and linear systems and have introduced the conjugate gradient method in chapter 4. In the current chapter we revisit this subject and present general algorithms for the direct and iterative solution of large linear systems. We start with the classical Gaussian elimination (which is a fast solver!) and then proceed with more sophisticated solvers and preconditioners for symmetric and non-symmetric systems.

In parallel computing, we introduce the broadcasting command MPI_Bcast, and demonstrate its usefulness in the context of Gaussian elimination. In addition, we reiterate the use of MPI_Send,MPI_Recv, MPI_Allgather, and MPI_Allreduce through example implementations of algorithms presented in this chapter.

SCchapter9 Introduction and Chapter 9 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 9, you will find that the following function/classes were discussed.

Section 9.1.2: void GaussElimination(SCMatrix &A, SCVector &b, int pivotflag) function definition
Section 9.3.1: void Hessenberg(SCMatrix &A) function definition
Section 9.5.1: void ModifiedArnoldi(int m, const SCMatrix &A, SCMatrix &H, SCMatrix &V) function definition
Section 9.5.3: void GMRES(int m, const SCMatrix &A, const SCVector &b, SCVector &x) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter9.h for function/class declarations
Go to the file SCchapter9.cpp for function/class definitions

Section 9.1.4: MPI - Program demonstrating parallel Gaussian elimination	chapter9c0P.cpp
Section 9.1.6: Program demonstrating cyclic reduction for triadiagonal systems	chapter9c1.cpp
Section 9.1.6: MPI - Program demonstrating parallel cyclic reduction for triadiagonal systems	chapter9c2P.cpp
Section 9.4.4: MPI - Program demonstrating the parallel preconditioned conjugate gradient method (PCGM)	chapter9c3P.cpp

Chapter 10: Fast Eigensolvers

Book Chapter Introduction

In this chapter we introduce methods for solutions of the standard eigenvalue problem A x = lambda x as well as for generalized eigenproblems; A is a square n x n matrix. The main theory is based on the solvers of the previous chapter for linear systems. Unlike, however, the methods of the previous chapter where both direct and iterative approaches are effective, in eigenvalue problems only iterative solvers are efficient. We start with the simple power method and its variants, and we proceed with more sophisticated methods including a method for non-symmetric eigenproblems using the Arnoldi iteration. We classify the different eigensolvers as local or global if they are typically used to compute one or two eigenvalues or the entire spectrum, respectively.

We introduce one new MPI function, MPI_Alltoall, and demonstrate its use through some of the algorithms presented in this chapter. In addition, we reiterate the use of MPI_Allgather and MPI_Allreduce through example implementations of algorithms presented in this chapter.

SCchapter10 Introduction and Chapter 10 Driver Programs

Within the text, there are several places where the software suite is referenced. In some cases the code is explicitly placed within the text, and at other times within the text we merely alert you that the software is available on this CD. As you read through Chapter 10, you will find that the following function/classes were discussed.

Section 10.1.2: double ISPowerMethod(SCMatrix &A, SCVector &x) function definition

The declarations and definitions of these functions/classes can be found in the following files:

Go to the file SCchapter10.h for function/class declarations
Go to the file SCchapter10.cpp for function/class definitions

Section 10.1.2: Program demonstrating the use of the Inverse Power method	chapter10c0.cpp
Section 10.2: MPI - Program demonstrating the first stage in parallel Householder Deflation	chapter10c1P.cpp
Section 10.3.5: Program demonstrating the TDQREigensolver routine used in the tridiagonal QR eigensolver	chapter10c2.cpp
Section 10.3.5: MPI - Program demonstrating the implementation of a parallel tridiagonal QR eigensolver	chapter10c3P.cpp

Compilation Guide

This directory (/CODE) contains the following:

Source files (containing function/class definitions) and header files (containing function/class declarations) for SCmathlib, a scientific computing mathematics library which we have devised for use throughout the book. SCmathlib.h/SCmathlib.cpp contain general functions/classes which we may use in all the chapters of the book. SCchapter*.h/SCchapter*.cpp (where * = 2,3,...,10) are header and source files respectively which correspond to functions/classes that were introduced in the corresponding chapter.

NOTE: The reason for creating this library is to allow the programmer to use the functions/classes declared and defined therein in any of the programs written throughout the book.

A Makefile for compiling the SCmathlib library

Directories Chapter2 - Chapter10, each of which contains programs which were written in the corresponding chapter.

A Note Concerning Compilers, Compilation and Navigation

In the writing of this software, every attempt has been made to conform to the ANSI C++ standard. Software development and testing were performed on an Intel processor based PC running Redhat Linux version 8.0. Compilation was accomplished using the GNU gcc 3.2 compiler. Although some compilation tests were accomplished using native compilers under AIX, IRIX and SunOS operating systems, proper compilation and execution are not guaranteed. No compilation tests were performed under the Microsoft Windows or Macintosh operating systems.

Navigation of this CD has been tested under both the Linux and the Microsoft Windows operating systems.

A Note to Macintosh Users: Mac OS X is required for properly navigating this CD. Prior versions of the Mac operating system appear to interpret the filenames on this CD using the 8.3 character convention, and hence filenames with the first eight characters being identical are not distinguishable.

Compilation of SCmathlib for Serial Programs

(1)Enter the directory CODE. If you do a directory listing, you should see a file called Makefile.

(2)type: gmake

After hitting return, you should see something similar (depending on what system you are running on) to the following:

g++ -O3 -c SCmathlib.cpp
g++ -O3 -c SCchapter2.cpp
g++ -O3 -c SCchapter3.cpp
g++ -O3 -c SCchapter4.cpp
g++ -O3 -c SCchapter5.cpp
g++ -O3 -c SCchapter6.cpp
g++ -O3 -c SCchapter7.cpp
g++ -O3 -c SCchapter8.cpp
g++ -O3 -c SCchapter9.cpp
ar rv libSCmathlib.a SCmathlib.o SCchapter2.o SCchapter3.o
SCchapter4.o SCchapter5.o SCchapter6.o SCchapter7.o
SCchapter8.o SCchapter9.o
a - SCmathlib.o
a - SCchapter2.o
a - SCchapter3.o
a - SCchapter4.o
a - SCchapter5.o
a - SCchapter6.o
a - SCchapter7.o
a - SCchapter8.o
a - SCchapter9.o

(3) If you now do a directory listing, you should see that the file libSCmathlib.a has been created.

Compilation of SCmathlib for Parallel Programs

(1) Enter the directory CODE. If you do a directory listing, you should see a file called Makefile.

(2) type: gmake MPISRC=1

After hitting return, you should see something similar (depending on what system you are running on) to the following:

g++ -O3 -DMPISRC -c SCmathlib.cpp
g++ -O3 -DMPISRC -c SCchapter2.cpp
g++ -O3 -DMPISRC -c SCchapter3.cpp
g++ -O3 -DMPISRC -c SCchapter4.cpp
g++ -O3 -DMPISRC -c SCchapter5.cpp
g++ -O3 -DMPISRC -c SCchapter6.cpp
g++ -O3 -DMPISRC -c SCchapter7.cpp
g++ -O3 -DMPISRC -c SCchapter8.cpp
g++ -O3 -DMPISRC -c SCchapter9.cpp
ar rv libSCmathlibP.a SCmathlib.o SCchapter2.o SCchapter3.o
SCchapter4.o SCchapter5.o SCchapter6.o SCchapter7.o
SCchapter8.o SCchapter9.o
a - SCmathlib.o
a - SCchapter2.o
a - SCchapter3.o
a - SCchapter4.o
a - SCchapter5.o
a - SCchapter6.o
a - SCchapter7.o
a - SCchapter8.o
a - SCchapter9.o

(3) If you now do a directory listing, you should see that the file libSCmathlibP.a has been created.

Serial Program Compilation Guide (from Appendix A)

For the purposes of this compilation example, we will assume that we are using the GNU g++ compiler to compile a C++ program we have written contained within the file myprog.cpp. In the following examples, the argument following the '-o' flag designates the file name to be used for the output. If no '-o' option is specified, most compilers default to using the name ''a.out''. We now present several different programming scenarios:

No user-defined libraries or user-defined header files are needed, and no special system libraries (such as those associated with math.h are needed):

g++ -o myprog myprog.cpp

No user-defined libraries or user-defined header files are needed, but the special system library corresponding to math.h is needed:

g++ -o myprog myprog.cpp -lmath

User-defined libraries, user-defined header files, and the special system library corresponding to math.h are needed:

g++ -o myprog myprog.cpp -I/users/kirby/includes -L/users/kirby/libs
-lSCmathlib -lmath

The string following the `-I' flag designates the location of the user-defined header files to be included. The string following the `-L' flag designates the location of the user-defined libraries to be included. The string `-lSCmathlib' links the program with the user-defined library we created, and the string `-lmath' links the program with the system math library corresponding to math.h

Parallel Program Compilation Guide (from Appendix B)

For the purposes of this compilation example, we will assume that we are using the GNU g++ compiler to compile a C++ program we have written contained within the file myprog.cpp. We will also assume that the machine on which you are trying to compile is a parallel machine with some version of MPI installed. You will need to contact your system administrator to find out the exact version of MPI that is available and the paths on your local architecture. In the following examples, the argument following the `-o' flag designates the file name to be used for the output. If no `-o' option is specified, most compilers default to using the name ``a.out''. We now present several different programming scenarios:

No user-defined libraries or user-defined header files are needed, and no special system libraries (such as those associated with math.h are needed) other than the MPI libraries:

g++ -o myprog myprog.cpp -lmpi

No user-defined libraries or user-defined header files are needed, but the special system library corresponding to math.h is needed along with the MPI libraries:

g++ -o myprog myprog.cpp -lmath -lmpi

User-defined libraries, user-defined header files and the special system library corresponding to math.h are needed along with the MPI libraries:

g++ -o myprog myprog.cpp -I/users/kirby/includes -L/users/kirby/libs
-lSCmathlib -lmath -lmpi

You will need to contact your system administrator to verify the exact command used on your computing architecture to run an MPI program. In general, most architectures execute MPI programs in a fashion similar to the following:

mpirun -np 4 myprog

where mpirun is a special program used for starting execution on the parallel machine, `-np 4' indicates the number of processes requested, and myprog denotes the program executable compiled as discussed above.

A Note Concerning Makefiles

For those new to the Unix programming environment, addition information concerning makefiles can be found both on the web and in many good books, one of which is Managing Projects with make by Andrew Oram and Steve Talbott. Another useful source of information for gmake (GNU make, which is used above) and g++ (the GNU C++ compiler) is the GNU website, www.gnu.org.

A Note Concerning File Permissions

Depending on how your system is configured, it may be necessary to change the directory and file permissions in order to compile within the CODE directory. If you do encounter permission problems, do the following: change directory to the parent directory of the CODE directory (from which, under Unix, you can see the directory CODE by doing an 'ls'). Type the expression within quotes: "chmod -R 755 CODE". This will set the directory CODE and all files and subdirectories to be owner-RWX (read/write/execute) and group and world RX (read/execute).

Parallel Scientific Computing in C++ and MPI : A Seamless Approach to Parallel Algorithms and their Implementation

Parallel Scientific Computing in C++ and MPI : A Seamless Approach to Parallel Algorithms and their Implementation

Sample Code

Basic Scientific Computing Mathematics Library

COPYRIGHT

DISCLAIMER

NOTE ON CPP FILES

Chapter 2: Basic Concepts and Tools

Book Chapter Introduction

SCchapter2 Introduction and Chapter 2 Driver Programs

Chapter 3: Approximation

Book Chapter Introduction

SCchapter3 Introduction and Chapter 3 Driver Programs

Chapter 4: Roots and Integrals

Book Chapter Introduction

SCchapter4 Introduction and Chapter 4 Driver Programs

Chapter 5: Explicit Discretizations

Book Chapter Introduction

SCchapter5 Introduction and Chapter 5 Driver Programs

Chapter 6: Implicit Discretizations

Book Chapter Introduction

SCchapter6 Introduction and Chapter 6 Driver Programs

Chapter 7: Relaxation: Discretizations and Solvers

Book Chapter Introduction

SCchapter7 Introduction and Chapter 7 Driver Programs

Chapter 8: Propagation: Numerical Diffusion and Dispersion

Book Chapter Introduction

SCchapter8 Introduction and Chapter 8 Driver Programs

Chapter 9: Fast Linear Solvers

Book Chapter Introduction

SCchapter9 Introduction and Chapter 9 Driver Programs

Chapter 10: Fast Eigensolvers

Book Chapter Introduction

SCchapter10 Introduction and Chapter 10 Driver Programs

Compilation Guide

This directory (/CODE) contains the following:

A Note Concerning Compilers, Compilation and Navigation

Compilation of SCmathlib for Serial Programs

Compilation of SCmathlib for Parallel Programs

Serial Program Compilation Guide (from Appendix A)

Parallel Program Compilation Guide (from Appendix B)

A Note Concerning Makefiles

A Note Concerning File Permissions

Parallel Scientific Computing in C++ and MPI :
A Seamless Approach to Parallel Algorithms and their Implementation

Parallel Scientific Computing in C++ and MPI :
A Seamless Approach to Parallel Algorithms and their Implementation