Iterative Solution of Large Linear Systems
David M. Young

About this ebook

This self-contained treatment offers a systematic development of the theory of iterative methods. Its focal point is an analysis of the convergence properties of the successive overrelaxation (SOR) method, as applied to a linear system with a consistently ordered matrix. The text explores the convergence properties of the SOR method and related techniques in terms of the spectral radii of the associated matrices as well as in terms of certain matrix norms. Contents include a review of matrix theory and general properties of iterative methods; the SOR method and the stationary modified SOR method for consistently ordered matrices; nonstationary methods; generalizations of SOR theory and variants of the method; second-degree methods, alternating direction implicit methods, and a comparison of methods. 1971 edition.

Language: English
Release date: July 24, 2013
ISBN: 9780486153339

    Chapter 1 / INTRODUCTION

    In this book we study numerical methods for solving linear systems of the form

    (1)   Au = b

    where A is a given real N × N matrix and b is a given real column vector of order N. It is desired to determine the unknown column vector u. For the case N = 3, the system (1) may be written in the alternative forms

    (2)

        a1,1u1 + a1,2u2 + a1,3u3 = b1
        a2,1u1 + a2,2u2 + a2,3u3 = b2
        a3,1u1 + a3,2u2 + a3,3u3 = b3

    or

    (3)

        ( a1,1  a1,2  a1,3 ) ( u1 )   ( b1 )
        ( a2,1  a2,2  a2,3 ) ( u2 ) = ( b2 )
        ( a3,1  a3,2  a3,3 ) ( u3 )   ( b3 )

    We shall be primarily concerned with cases where N is large, say in the range 10³ to 10⁶, and where the matrix A is sparse, i.e., has only a few nonzero elements as compared to the total number of elements of A. We shall assume that the matrix has one or more of a number of properties, described in Chapter 2, such as weak diagonal dominance, irreducibility, symmetry, etc. In nearly every case the assumed properties will be sufficient to guarantee that the matrix A is nonsingular.

    We shall study various iterative methods for solving (1). Such methods appear to be ideally suited for problems involving large sparse matrices, much more so in most cases than direct methods such as the Gauss elimination method. A typical iterative method involves the selection of an initial approximation u(0) to the solution of (1) and the determination of a sequence u(1), u(2), ... according to some algorithm, which, if the method is properly chosen, will converge to the exact solution ū of (1). The use of such an algorithm has the advantage that the matrix A is not altered during the computation. Hence, though the computation may be long, the problem of the accumulation of rounding errors is less serious than for those methods, such as most direct methods, where the matrix is changed during the computation process.
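
    The general pattern just described can be sketched in a few lines of code. The following Python fragment is an added illustration (not part of the original text): it applies an arbitrary update rule u(n+1) = F(u(n)) until successive iterates agree to a prescribed tolerance, and it never modifies the matrix data used by the update rule. The update rule for a particular method (for example, the Jacobi method of Section 1.1) is supplied as the function argument.

        import numpy as np

        def iterate(update, u0, tol=1e-10, max_iter=10000):
            """Apply u(n+1) = update(u(n)) until successive iterates agree to tol."""
            u = np.asarray(u0, dtype=float)
            for n in range(1, max_iter + 1):
                u_new = update(u)
                if np.max(np.abs(u_new - u)) < tol:
                    return u_new, n          # converged after n iterations
                u = u_new
            return u, max_iter               # tolerance not reached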

    1.1. THE MODEL PROBLEM

    In order to illustrate our discussion we shall frequently consider the following model problem. Let G(x, y) and g(x, y) be continuous functions defined in R and S, respectively, where R is the interior and S is the boundary of the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. We seek a function u(x, y) continuous in R + S, which is twice continuously differentiable in R and which satisfies Poisson’s equation

    (1.1)   ∂²u/∂x² + ∂²u/∂y² = G(x, y),   (x, y) in R

    On the boundary, u(x, y) satisfies the condition

    (1.2)   u(x, y) = g(x, y),   (x, y) on S

    If G(x, y) ≡ 0, then (1.1) reduces to Laplace’s equation

    (1.3)   ∂²u/∂x² + ∂²u/∂y² = 0

    and the model problem is a special case of the Dirichlet problem.

    We shall primarily be concerned with the linear system arising from the numerical solution of the model problem using a five-point difference equation. We superimpose a mesh of horizontal and vertical lines over the region with a uniform spacing h = M−¹, for some integer M. We seek to determine approximate values of u(x, y) at the mesh points, i.e., at the intersections of these lines. For a given mesh point (x, y) we use the usual central difference quotients to approximate the partial derivatives. Thus we use the approximations

    (1.4)

        ∂²u/∂x² ≈ [u(x + h, y) − 2u(x, y) + u(x − h, y)]/h²
        ∂²u/∂y² ≈ [u(x, y + h) − 2u(x, y) + u(x, y − h)]/h²

    Replacing the partial derivatives by difference quotients in (1.1) and multiplying by −h² we obtain the difference equation

    (1.5)   4u(x, y) − u(x + h, y) − u(x − h, y) − u(x, y + h) − u(x, y − h) = −h²G(x, y)

    This equation together with the boundary condition (1.2) defines a discrete analog of the model problem. Evidently from (1.5) we obtain one linear algebraic equation for each interior mesh point, i.e., one equation for each unknown value of u.

    To be specific, if G(x, y) ≡ 0 and if h = 1/3, we have the situation shown in Figure 1.1. We seek approximate values of u1 = u(x1, y1), u2, u3, and u4. The values of u at the points labeled 5, 6, ... are determined by (1.2). Thus we have u5 = g5, u6 = g6, etc. From (1.5) we obtain

    (1.6)


    Moving the known values to the right-hand side, we have the linear system

    (1.7)

    which can be written in the form (1),

    (1.8)

    Suppose now that g(x, y) = 0 on all sides of the square except where y = 1, where g(x, y) = 4500x(1 − x). All boundary values vanish except u9 = u10 = 1000. The system (1.8) becomes

    (1.9)

        4u1 − u2 − u3 = 0
        −u1 + 4u2 − u4 = 0
        −u1 + 4u3 − u4 = 1000
        −u2 − u3 + 4u4 = 1000

    We can solve (1.9) directly, obtaining

    u1 = 125,    u2 = 125,    u3 = 375,    u4 = 375

    While the matrix of (1.9) is not particularly sparse, we note that no matter how small h is, each equation involves at most five unknowns. Thus if h = 1/20, the matrix has order 19² = 361. However, out of 361² = 130,321 elements in the matrix, fewer than 1800 are nonzero. Thus, by any reasonable definition the matrix is indeed sparse.
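
    As an added illustration (not part of the original text), the following Python sketch assembles the coefficient matrix of the difference equation (1.5) for the unit square with mesh size h = 1/M, numbering the interior mesh points row by row (an assumed ordering, not the labeling of Figure 1.1), and counts its nonzero elements. For M = 20 it reproduces the figures just quoted: order 19² = 361 and well under 1800 nonzero elements.

        import numpy as np

        def five_point_matrix(M):
            """Matrix of (1.5) on the unit square with h = 1/M; interior points
            (ih, jh), 1 <= i, j <= M-1, numbered row by row (assumed ordering)."""
            m = M - 1                          # interior points per row
            A = np.zeros((m * m, m * m))
            idx = lambda i, j: (j - 1) * m + (i - 1)
            for j in range(1, M):
                for i in range(1, M):
                    k = idx(i, j)
                    A[k, k] = 4.0
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ii, jj = i + di, j + dj
                        if 1 <= ii <= m and 1 <= jj <= m:
                            A[k, idx(ii, jj)] = -1.0   # interior neighbor
            return A

        A = five_point_matrix(20)
        print(A.shape[0], np.count_nonzero(A))   # 361 1729 (out of 130,321 elements)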

    We now apply a very simple iterative method to the system (1.9). We simply solve the first equation for u1, the second for u2, etc., obtaining

    (1.10)

        u1 = (u2 + u3)/4
        u2 = (u1 + u4)/4
        u3 = (u1 + u4 + 1000)/4
        u4 = (u2 + u3 + 1000)/4

    Starting with arbitrary initial values u1(0), u2(0), u3(0), u4(0), we determine new values by substituting the current values u1(n), u2(n), ... in the right-hand sides. Thus, we have

    (1.11)

        u1(n+1) = (u2(n) + u3(n))/4
        u2(n+1) = (u1(n) + u4(n))/4
        u3(n+1) = (u1(n) + u4(n) + 1000)/4
        u4(n+1) = (u2(n) + u3(n) + 1000)/4

    This method is known as the Jacobi method. Carrying out several iterations of (1.11), we have the values shown in the accompanying tabulation.

    While this method converges fairly rapidly in the simple case shown, it is very slow indeed if the mesh size h is small. An improvement is provided by the Gauss-Seidel method, in which each new value is used as soon as it becomes available. Thus, for instance, in the second equation of (1.11) one would use the formula

        u2(n+1) = (u1(n+1) + u4(n))/4

    Neither the Jacobi nor the Gauss-Seidel method is satisfactory even for the model problem unless h is fairly large. By a slight modification of the Gauss-Seidel method one can dramatically improve the convergence in certain cases. Still further improvements can also be obtained by other methods in certain cases. Our purpose in this book is to study a number of iterative methods from the standpoints of speed of convergence and suitability for use on high-speed computers. In order to do so we shall make extensive use of matrix theory. In Chapter 2 we shall review matrix theory and define certain matrix properties which we shall use in our studies.
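
    The following sketch, added here for illustration, carries out the Jacobi iteration (1.11) and the Gauss-Seidel variant just described on the system (1.9) in the form given above (the point labeling of Figure 1.1 is assumed), starting from zero.

        import numpy as np

        A = np.array([[ 4., -1., -1.,  0.],      # system (1.9) as written above
                      [-1.,  4.,  0., -1.],
                      [-1.,  0.,  4., -1.],
                      [ 0., -1., -1.,  4.]])
        b = np.array([0., 0., 1000., 1000.])

        def jacobi(A, b, u, n_iter):
            D = np.diag(A)
            R = A - np.diagflat(D)               # off-diagonal part of A
            for _ in range(n_iter):
                u = (b - R @ u) / D              # every component uses old values
            return u

        def gauss_seidel(A, b, u, n_iter):
            u = u.copy()
            for _ in range(n_iter):
                for i in range(len(b)):          # new values used as soon as available
                    s = A[i, :i] @ u[:i] + A[i, i+1:] @ u[i+1:]
                    u[i] = (b[i] - s) / A[i, i]
            return u

        print(jacobi(A, b, np.zeros(4), 50))        # approaches 125, 125, 375, 375
        print(gauss_seidel(A, b, np.zeros(4), 25))  # same limit, reached in fewer steps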

    SUPPLEMENTARY DISCUSSION

    The use of direct methods, even for solving very large problems, has received increasing attention recently (see, for instance, Angel, 1970). In some cases their use is quite appropriate. However, there is the danger that if one does not properly apply iterative methods in certain cases one will incorrectly conclude that they are not effective and that direct methods must be used. It is hoped that this book will provide some guidance for the proper application of iterative methods so that a decision on whether or not to use them can be made on a sound basis.

    Section 1.1. The model problem has been used frequently in the study of the behavior of iterative methods for solving large linear systems. The term model problem for the problem (1.2)−(1.3) was introduced by Varga [1962].

    EXERCISES

    Section 1.1

    1. Solve (1.10) using the Gauss-Seidel method. Use the same starting values as were used for the Jacobi method.

    2. For the Jacobi method applied to (1.9), let ε(n) denote the error of the nth iterant, defined in terms of the differences ui(n) − ūi, where the {ūi} are exact values. Show that the ratios ε(n+1)/ε(n) approach a limit. Do the same for the Gauss-Seidel method.

    3. Consider the discrete analog of the model problem for (1.1) where G(x, y) = x + y + 1 and g(x, y) = 1 + x. Solve the resulting linear system by direct elimination, by the Jacobi method, and by the Gauss-Seidel method. Use starting values of zero and iterate until the iterants do not change in the third significant figure.

    4. For the matrix A of the discrete model problem with h = 1/20, what is the ratio of nonvanishing elements of A to the total number of elements?

    Chapter 2 / MATRIX PRELIMINARIES

    In this chapter we shall present basic facts about matrix theory which we shall later use in the study of iterative methods. We assume that the reader is already familiar with the general theory of matrices as presented, for instance, in Faddeev and Faddeeva [1963] and Birkhoff and MacLane [1953]. We shall review some of this theory in Section 2.1 and shall also state, often without proof, some of the less elementary facts which will be used later. Special attention is given to Hermitian and positive definite matrices in Section 2.2. Sections 2.3 and 2.4 are devoted to vector and matrix norms and to the convergence of sequences of vectors and matrices. Sections 2.5-2.7 are devoted to matrices with certain properties. Finally, in Section 2.8 we show how one obtains one or more of these properties when one uses finite difference methods to obtain numerical solutions of certain problems involving partial differential equations.

    2.1. REVIEW OF MATRIX THEORY

    An N × M matrix A is a rectangular array of complex numbers

    (1.1)

        A = ( a1,1   a1,2   ···   a1,M )
            ( a2,1   a2,2   ···   a2,M )
            (  ···                 ··· )
            ( aN,1   aN,2   ···   aN,M )

    If N = M, then the matrix is said to be a square matrix of order N. We shall use capital letters for matrices, and, unless otherwise specified, a matrix indicated by a capital letter is assumed to be square. The elements of a matrix will be represented by lowercase letters with two subscripts.

    If N = 1, the matrix is a row matrix or row vector. Similarly, if M = 1 the matrix is a column matrix or column vector. We shall denote a column vector by a small letter and the elements of the vector will be denoted by lowercase letters with a single subscript. Thus we have

    (1.2)   v = (v1, v2, ..., vN)T

    The diagonal elements of a matrix A are the elements a1,1, a2,2, ..., aN,N. For a diagonal matrix, all off-diagonal elements vanish. The identity matrix of order N, usually denoted by I, is a diagonal matrix all of whose diagonal elements are unity. If ai,j = 0 for i < j (for i ≤ j), then A is a lower triangular (strictly lower triangular) matrix. If ai,j = 0 for i > j (for i ≥ j), then A is an upper triangular (strictly upper triangular) matrix.

    The sum of two N × M matrices A = (ai,j) and B = (bi,j) is an N × M matrix C = (ci,j) where ci,j = ai,j + bi,j. The product of an N × J matrix A and a J × M matrix B is an N × M matrix C where

    (1.3)   ci,j = ai,1b1,j + ai,2b2,j + ··· + ai,JbJ,j

    The trace of a matrix A = (ai,j) is given by

    (1.4)   trace A = a1,1 + a2,2 + ··· + aN,N

    We define the matrix obtained from A by replacing all off-diagonal elements of A by zero as "diag A." Thus we have

    (1.5)

        diag A = ( a1,1    0    ···    0   )
                 (   0   a2,2   ···    0   )
                 (  ···                ··· )
                 (   0     0    ···   aN,N )

    Given an N × M matrix A = (ai,j) we obtain the transpose of A, denoted by AT by interchanging the rows and columns of A. Thus if A is given by (1.1), then

    (1.6)   (AT)i,j = aj,i,   i = 1, 2, ..., M,   j = 1, 2, ..., N

    Given a complex number z we denote the complex conjugate of z by z* (rather than by z̄, as is more usual). The conjugate of A, denoted by A*, is obtained from A by replacing each element by its complex conjugate. Thus

    (1.7)   A* = (a*i,j)

    The conjugate transpose of A, denoted by AH, is the transpose of A* (also the conjugate of AT). Thus we have

    (1.8)   AH = (A*)T = (AT)*,   so that (AH)i,j = a*j,i

    Evidently we have

    (1.9)   (AT)T = A,   (A*)* = A,   (AH)H = A

    and for any two rectangular matrices A and B such that AB is defined we have

    (1.10)   (AB)T = BTAT,   (AB)H = BHAH

    If A = AT, then A is symmetric (usually the property of symmetry is useful only for real matrices). If A = AH, then A is Hermitian. If A is real and AAT = I, then A is orthogonal. If AAH = I, then A is unitary. If AAH = AHA, then A is normal.

    If A is square and if there exists a square matrix B such that AB = I or BA = I, then A is nonsingular and we write B = A−1. If A is nonsingular, then the inverse matrix, denoted by A−1, is unique and AA−1 = A−1A = I. Moreover, if A and B are nonsingular, then AB is nonsingular and

    (1.11)   (AB)−1 = B−1A−1

    A permutation matrix P = (pi,j) is a matrix with exactly one nonzero element, namely unity, in each row and each column. Thus, for example,

        P = ( 0  1  0  0 )
            ( 0  0  0  1 )
            ( 0  0  1  0 )
            ( 1  0  0  0 )

    is a permutation matrix. A permutation matrix corresponds to a permutation function, σ(i), defined for i = 1, 2, ..., N by

    σ(i) = j

    where pi,j = 1. In general, a permutation function σ is a single-valued function defined for i = 1, 2, ..., N and with a single-valued inverse σ−1 also defined for i = 1, 2, ..., N. Thus in the above example we have

    σ(1) = 2,   σ(2) = 4,   σ(3) = 3,   σ(4) = 1

    and

    σ−1(1) = 4,   σ−1(2) = 1,   σ−1(3) = 3,   σ−1(4) = 2

    It is easy to verify that if P corresponds to σ, then P−1 = PT corresponds to σ−1. Thus in the example we have

        PT = ( 0  0  0  1 )
             ( 1  0  0  0 )
             ( 0  0  1  0 )
             ( 0  1  0  0 )

    which corresponds to σ−1.
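
    As a brief added check (not part of the original text), one can build P from the permutation σ above and verify numerically that PPT = I, so that P−1 = PT, and that PT represents σ−1:

        import numpy as np

        sigma = {1: 2, 2: 4, 3: 3, 4: 1}            # the permutation used above

        P = np.zeros((4, 4))
        for i, j in sigma.items():
            P[i - 1, j - 1] = 1.0                   # p(i, sigma(i)) = 1

        print(np.array_equal(P @ P.T, np.eye(4)))   # True: P^T is the inverse of P
        print({j: i for i, j in sigma.items()})     # the inverse function sigma^(-1)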

    Vector Spaces

    Let EN denote the set of all N × 1 column matrices, which we shall henceforth refer to as vectors. The reader is assumed to be familiar with the notions of vector spaces and subspaces, linear independence, etc. A basis for EN is a set of N linearly independent vectors of EN. Given such a basis, say v(1), v(2), ..., v(N), any vector v ∈ EN can be expressed uniquely as a linear combination of basis vectors. Thus there exists a unique set of numbers c1, c2, ..., cN such that

    (1.12)   v = c1v(1) + c2v(2) + ··· + cNv(N)

    For any subspace W of EN, every basis for W has the same number of vectors. This number, which is independent of the choice of bases, is known as the dimension of W.

    Given any two vectors v = (v1, v2, ..., vN)T and w = (w1, w2, ..., wN)T we define the inner product of v and w by

    (1.13)   (v, w) = v1*w1 + v2*w2 + ··· + vN*wN

    Evidently we have for any v and w

    (1.14)   (v, w)* = (w, v)

    and for any matrix A

    (1.15)   (Av, w) = (v, AHw)

    We define the Euclidean length of a vector v = (v1, v2, ..., vN)T by

    (1.16)   ∥v∥ = (v, v)¹/² = (|v1|² + |v2|² + ··· + |vN|²)¹/²

    It is easy to verify the Schwarz inequality which states

    (1.17)   |(v, w)| ≤ ∥v∥ ∥w∥

    Two vectors v and w are orthogonal if (v, w) = 0. A set of vectors v(1), v(2), ..., v(p) is pairwise orthogonal if (v(i), v(j)) = 0 for i ≠ j. They are orthonormal if

    (1.18)   (v(i), v(j)) = 0 if i ≠ j   and   (v(i), v(i)) = 1

    Any set of nonvanishing pairwise orthogonal vectors is linearly independent. The columns of an orthogonal matrix form an orthonormal set of vectors as well as a basis for EN. The same is true for the columns of a unitary matrix.

    Determinants

    The determinant of an N × N matrix A, denoted by det A, is the sum of N! terms of the form

    (1.19)   s(σ) a1,σ(1) a2,σ(2) ··· aN,σ(N)

    where σ is a permutation of the integers 1, 2, ..., N. Here s(σ) = 1 or s(σ) = − 1 if the permutation σ is even or odd, respectively. A permutation σ is even if the sequence σ(1), σ(2), ..., σ(N) can be put in the form 1, 2, ..., N by an even number of transpositions, i.e., interchanges of any pair of distinct elements of the sequence. Otherwise the permutation is odd. Thus the permutation σ(1) = 2, σ(2) = 4, σ(3) = 5, σ(4) = 1, σ(5) = 3, σ(6) = 6 is odd since the sequence 2, 4, 5, 1, 3, 6 can be put in the form 1, 2, 3, 4, 5, 6 by the three transpositions: 1 ↔ 2, 2 ↔ 4, 3 ↔ 5. In practice it is usually more convenient to count the number of inversions in the array σ(1), σ(2), ..., σ(N). An inversion occurs when σ(j) < σ(i) even though i < j. Thus in the example we have the five inversions (2,1), (4,1), (4,3), (5,1), (5,3). Using either procedure we find that the sign of the term a1,2a2,4a3,5a4,1a5,3a6,6 in the expansion of the determinant of the 6 × 6 matrix A = (ai,j) is negative.

    We shall assume that the basic properties of determinants are known. We now give an alternative method for determining the sign of a term in a determinant which will be used in Chapter 10. A cycle is a permutation σ such that σ(i) = i except that for s distinct integers i1, i2, ..., is we have

    σ(i1) = i2,   σ(i2) = i3,   ...,   σ(is−1) = is,   σ(is) = i1

    We designate such a cycle by (i1, i2, ..., is). A cycle is even or odd depending on whether s is even or odd. Any permutation can be written as the product of disjoint cycles, and the permutation is even or odd depending on whether or not there are an even number of even cycles. Thus in the example given above, σ is the product of the cycles (1, 2, 4), (3, 5), (6). Since there is only one even cycle, the permutation is odd.
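
    Both sign rules are easy to mechanize. The short sketch below is an added illustration: it computes the sign of a permutation by counting inversions and, independently, from the parity of the number of even-length cycles, and both agree with the example above.

        def sign_by_inversions(perm):
            """perm[i - 1] = sigma(i); the sign is (-1)**(number of inversions)."""
            n = len(perm)
            inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[j] < perm[i])
            return -1 if inv % 2 else 1

        def sign_by_cycles(perm):
            """The sign is -1 exactly when the number of even-length cycles is odd."""
            seen, even_cycles = set(), 0
            for start in range(1, len(perm) + 1):
                if start in seen:
                    continue
                length, k = 0, start
                while k not in seen:                # trace the cycle containing start
                    seen.add(k)
                    k = perm[k - 1]
                    length += 1
                even_cycles += (length % 2 == 0)
            return -1 if even_cycles % 2 else 1

        sigma = [2, 4, 5, 1, 3, 6]                  # the example permutation above
        print(sign_by_inversions(sigma))            # -1 (five inversions)
        print(sign_by_cycles(sigma))                # -1 (one even cycle, (3, 5))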

    We now state without proof

    Theorem 1.1. If A and B are N × N matrices, then

    (1.20)   det AB = (det A)(det B)

    Systems of Linear Equations

    Of fundamental importance is the following theorem.

    Theorem 1.2. If det A ≠ 0, then for given b the linear system

    (1.21)   Au = b

    has a unique solution which is given by

    (1.22)   uk = Δk / det A,   k = 1, 2, ..., N

    where Δk = det Ak and where Ak is the matrix obtained from A by replacing the kth column of A by b.

    The above representation of the solution of a linear system is known as Cramer’s rule. While important theoretically, it is not practical to actually carry out unless the order of A is very small. If one wishes to use a direct method, the Gauss elimination method is much superior. As already stated, we shall be primarily concerned with iterative methods as applied to very large systems with sparse matrices.
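
    The comparison is easy to make concrete. The sketch below (an added illustration) implements Cramer's rule directly, which requires N + 1 determinant evaluations, and checks it against a library elimination routine on a small system; for large N the determinant-based route is hopeless, while elimination, and the iterative methods studied later, remain practical.

        import numpy as np

        def cramer(A, b):
            """Solve Au = b by Cramer's rule; practical only for very small N."""
            det_A = np.linalg.det(A)
            u = np.empty(len(b))
            for k in range(len(b)):
                A_k = A.copy()
                A_k[:, k] = b                    # replace the kth column of A by b
                u[k] = np.linalg.det(A_k) / det_A
            return u

        A = np.array([[4., -1.], [-1., 4.]])
        b = np.array([100., 200.])
        print(cramer(A, b))                      # [40. 60.]
        print(np.linalg.solve(A, b))             # the same, by Gaussian elimination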

    Theorem 1.3. A matrix A is nonsingular if and only if det A ≠ 0.

    One can in principle determine A−1 by solving the N linear systems Av(k) = e(k), k = 1, 2, ..., N, where e(k) is the vector whose ith element is 1 if i = k and 0 otherwise. The matrix whose columns are the v(k) is A−1. Once one has determined A−1 one can easily solve (1.21) for any b by multiplying both sides by A−1, obtaining

    (1.23)   u = A−1b

    Theorem 1.4. There exists a vector u ≠ 0 such that Au = 0 if and only if A is singular.

    The fact that Au = 0 has only the trivial solution u = 0 if A is nonsingular follows from Theorem 1.2. We also have

    Theorem 1.5. The linear system (1.21) has a unique solution if and only if A is nonsingular. If A is singular, then (1.21) either has no solution or else it has an infinite number.

    Proof. Suppose that A is singular and (1.21) has a solution, say ū. By Theorem 1.4 there exists a solution v ≠ 0 of Av = 0. Hence for any constant c the vector ū + cv satisfies (1.21). Thus the solution of (1.21) is not unique and, in fact, there are an infinite number of solutions. The rest of the proof follows from Theorem 1.2.

    The null space 𝒩(A) of an N × M matrix A is the set of all v ∈ EM such that Av = 0. The range ℛ(A) of A is the set of all w ∈ EN such that for some x ∈ EM we have Ax = w. Evidently, if A is an N × M matrix, then (1.21) has a solution if and only if b ∈ ℛ(A).

    The column rank of an N × M matrix A is the dimension of the subspace of EN spanned by the columns of A. Similarly, the row rank is the dimension of the subspace of EM spanned by the rows of A. The row rank and the column rank are equal, and their common value is called the rank of A. If P is a nonsingular N × N matrix, then PA has the same rank as A. For a square matrix of order N the sum of the row rank and the nullity (i.e., the dimension of the null space) is equal to the order of the matrix. The system (1.21), where A is an N × M matrix, has a solution if and only if the rank of the augmented matrix (A b) is the same as the rank of A.

    Eigenvalues and Eigenvectors

    An eigenvalue of a matrix A is a real or complex number λ such that for some v ≠ 0 we have

    (1.24)   Av = λv

    An eigenvector of A is a vector v such that v ≠ 0 and for some λ (1.24) holds. From Theorem 1.4 we have

    Theorem 1.6. The number λ is an eigenvalue of A if and only if

    (1.25)   det(A − λI) = 0

    Evidently (1.25) is a polynomial equation, referred to as the characteristic equation of A, of the form

    (1.26)   (−1)N det(A − λI) = λN − (trace A)λN−1 + ··· + (−1)N det A = 0

    Since the sum and product of the roots of (1.26) are trace A and det A, respectively, we have

    Theorem 1.7. If A is a square matrix of order N with eigenvalues λ1, λ2, ..., λN, then

    (1.27)   λ1 + λ2 + ··· + λN = trace A   and   λ1λ2 ··· λN = det A

    Theorem 1.8. If λ1, λ2, ..., λk are distinct eigenvalues of A and if v(1), v(2), ..., v(k) are associated eigenvectors, then v(1), v(2), ..., v(k) are linearly independent.

    Two matrices A and B of order N are similar if B = P−1AP for some nonsingular matrix P. Similar matrices have the same rank.

    Theorem 1.9. If A and B are similar, then they have the same eigenvalues. Moreover, λ is an eigenvalue of A of multiplicity k if and only if λ is an eigenvalue of B of multiplicity k.

    We now define the spectral radius of a matrix A as

    (1.28)   S(A) = max { |λ| : λ ∈ SA }

    where SA is the set of all eigenvalues of A (i.e., the spectrum of A).
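
    For later reference, the spectral radius of a small matrix is easily computed from its eigenvalues; the sketch below is an added illustration and uses the matrix of coefficients on the right-hand side of the Jacobi iteration (1.11) as written above (so the value obtained depends on that assumed form).

        import numpy as np

        def spectral_radius(A):
            """S(A) = max |lambda| over the eigenvalues of A."""
            return max(abs(np.linalg.eigvals(A)))

        B = np.array([[0.  , 0.25, 0.25, 0.  ],     # coefficients of (1.11)
                      [0.25, 0.  , 0.  , 0.25],
                      [0.25, 0.  , 0.  , 0.25],
                      [0.  , 0.25, 0.25, 0.  ]])
        print(spectral_radius(B))                   # approximately 0.5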

    The eigenvalues of a matrix are continuous functions of the elements of the matrix. Thus we have

    Theorem 1.10. Given a matrix A = (ai,j) and any ε > 0 there exists δ > 0 such that if A′ = (a′i,j) is any matrix with |a′i,j − ai,j| < δ for all i and j, the following holds: If λ1, λ2, ..., λN are the eigenvalues of A and λ1′, λ2′, ..., λN′ are the eigenvalues of A′, then for some permutation σ(i) of the integers 1, 2, ..., N we have

    (1.29)   |λi − λσ(i)′| < ε,   i = 1, 2, ..., N

    (A proof of this result can be found in Appendix A of Ostrowski [1960].)

    As an application of Theorem 1.10 we have

    Theorem 1.11. If A and B are square matrices, then AB and BA have the same eigenvalues. Moreover, λ is an eigenvalue of AB of multiplicity k if and only if λ is an eigenvalue of BA of multiplicity k.

    Proof. If B is nonsingular, then B−1 exists and BA is similar to B−1(BA)B = AB, and the result follows from Theorem 1.9. A similar argument holds if A is nonsingular.

    If both A and B are singular, then 0 is an eigenvalue of B. Let δ be the modulus of the nonzero eigenvalue of B of smallest modulus. If 0 < ε < δ, then B + εI is nonsingular and the eigenvalues of A(B + εI) are the same as those of (B + εI)A. As ε → 0 both sets of eigenvalues converge to the eigenvalues of AB and BA, respectively, by Theorem 1.10.

    A somewhat weaker version of Theorem 1.11 can be proved without using the continuity argument.

    Theorem 1.12. If A is an N × M matrix and if B is an M × N matrix, then λ ≠ 0 is an eigenvalue of AB if and only if λ is an eigenvalue of BA. If M = N, then the conclusion is true even for λ = 0.

    Proof. If λ ≠ 0 is an eigenvalue of AB and if v is an eigenvector, then ABv = λv ≠ 0 since λ ≠ 0 and v ≠ 0. Hence Bv ≠ 0. Therefore, BABv = λBv and λ is an eigenvalue of BA.

    If A and B are square matrices and λ = 0 is an eigenvalue of AB with eigenvector v, then, by Theorems 1.1 and 1.4 we have det AB = det BA = 0. Hence λ = 0 is an eigenvalue of BA.

    If M ≠ N, however, λ = 0 need not be an eigenvalue of both products. If, for example, A = (1  0) and B = (1  0)T, then

        AB = (1),     BA = ( 1  0 )
                           ( 0  0 )

    The eigenvalues of BA are 1 and 0, while 1 is the only eigenvalue of AB.

    We remark that the conclusion of Theorem 1.12 is weaker than that of Theorem 1.11 in the case of square matrices since we do not assert that the eigenvalues of AB, together with their multiplicities, are the same as those of BA.

    Canonical Forms

    By means of similarity transformations involving matrices with certain properties one can reduce a given matrix to one or more special forms. In some cases one can obtain a diagonal matrix by a similarity transformation involving a unitary transformation. Thus we have

    Theorem 1.13. If A is a normal matrix then there exists a unitary matrix P such that P−1AP is a diagonal matrix. If A is Hermitian, then P−1AP is real. If A is real and symmetric, then P can be taken to be an orthogonal matrix.

    We remark that in each case the columns of P form an orthogonal set of eigenvectors of A.

    In the general case one cannot always reduce A to diagonal form by means of unitary similarity transformations or even with more general transformations. However, one can always obtain an upper triangular form by unitary similarity transformations and a special type of bi-diagonal form by more general transformations. Thus we have

    Theorem 1.14. For any matrix A there exists a unitary matrix P such that P−1AP is an upper triangular matrix whose diagonal elements are the eigenvalues of A.

    This result is due to Schur [1909]. An elementary proof is given by Rheinboldt [1969] using a result of Householder [1958]. The proof of the following theorem, on the other hand, is somewhat long and complicated.

    Theorem 1.15. For any matrix A there exists a nonsingular matrix P such that P−1AP has the form

    (1.30)

        J = ( J1                  )
            (     J2              )
            (          ···        )
            (               Jr    )

    where each block Ji has the form

    (1.31)

        Ji = ( λi   1                 )
             (      λi   1            )
             (           ···   ···    )
             (                 λi   1 )
             (                      λi )

    and where the λi are the eigenvalues of A. Moreover, if J’ is any matrix of the form of J such that for some nonsingular matrix Q we have Q−1AQ = J’, then J and J’ are identical except for possible permutation of the diagonal blocks.

    The matrix J is said to be a Jordan canonical form of A. It is unique except as noted above. As an immediate corollary we note that if the eigenvalues of A are distinct, then the Jordan canonical form of A is diagonal. We also note that two similar matrices have the same Jordan canonical form, up to permutations of the Jordan blocks.

    It is easy to see that for each block Ji there is one column of P which is an eigenvector of A. The remaining columns of P associated with Ji are said to be principal vectors of A. We say that the grade of a principal vector v is the smallest integer k such that

    (1.32)   (A − λI)ᵏv = 0

    where λ is the eigenvalue associated with v. An eigenvector is thus a principal vector of grade one. For a block Ji of size q there corresponds one principal vector for each grade 1, 2, ..., q.

    Nonnegative Matrices

    If every element of A is real and nonnegative (positive), we say that A is nonnegative and A ≥ 0 (positive and A > 0). If A and B are real matrices and A − B ≥ 0, we say that A ≥ B. Similarly, if every element of a vector v is nonnegative (positive) we say that v ≥ 0 (v > 0). We let ∣A∣ denote the matrix whose elements are the moduli of the corresponding elements of A.

    The Perron-Frobenius theory of nonnegative matrices provides many theorems concerning the eigenvalues and eigenvectors of nonnegative matrices. (See, for instance, Varga [1962], Chapter 2.) We shall use the following results.

    Theorem 1.16. If A ≥ ∣B∣, then S(A) ≥ S(B).

    Theorem 1.17. If A ≥ 0, then S(A) is an eigenvalue of A, and there exists a nonnegative eigenvector of A associated with S(A).

    Proofs of Theorems 1.16 and 1.17 are given by Oldenberger [1940] and by Frobenius [1908], respectively. Many other proofs are given in the literature. In Section 2.7 we shall prove a stronger form of Theorem 1.17 for the case where A is real and symmetric.

    2.2. HERMITIAN MATRICES AND POSITIVE DEFINITE MATRICES

    We first prove the following characteristic property of Hermitian matrices.

    Theorem 2.1. A is an Hermitian matrix if and only if (v, Av) is real for all v.

    Proof. Suppose that (v, Av) is real for all v. Let A = B + iC, where B and C are real. If v is real, then

    (2.1)   (v, Av) = (v, Bv) + i(v, Cv)

    and hence

    (2.2)   (v, Cv) = 0

    for all real v. Moreover, since C is real we have

    (2.3)   (v, CTv) = (Cv, v) = (v, Cv) = 0

    by (2.2). Therefore we have

    (2.4)   (v, (C + CT)v) = 0

    for all real v. But this implies

    (2.5)   C + CT = 0

    To see this, consider for each i and j the vector v(i,j) whose elements vk(i,j) vanish unless k = i or k = j, in which case they are equal to unity. In order that (2.4) be satisfied for v(i,j), the (i, j) element of C + CT must vanish.

    Next, let v = u + iw where u and w are real. Then the imaginary part of (v, Av) is

    (2.6)   (u, Bw) − (w, Bu) = (u, (B − BT)w)

    by (2.2). We now show that

    (2.7)   B − BT = 0

    Given i and j let

    (2.8)   uk(i,j) = 1 if k = i and uk(i,j) = 0 otherwise;   wk(i,j) = 1 if k = j and wk(i,j) = 0 otherwise

    Evidently J = (u(i,j), (B − BT)w(i,j)) is the (i, j) element of B − BT. Since the imaginary part of (v, Av) vanishes for all real u and w, J must vanish; hence this element vanishes and (2.7) holds. By (2.5) and (2.7) it follows that A is Hermitian since

    AH = (B + iC)H = BT − iCT = B + iC = A.

    If A is Hermitian, then (v, Av) is real for all v since by (1.15) we have

    (v, Av)* = (Av, v) = (v, AHv) = (v, Av)

    This completes the proof of Theorem 2.1.

    From Theorem 1.13 the eigenvalues of an Hermitian matrix are real. We now give a method for determining bounds on these eigenvalues.

    Theorem 2.2. If A is an Hermitian matrix whose largest and smallest eigenvalues are λmax and λmin, respectively, then for all v ≠ 0

        (2.9)   λmin ≤ (v, Av)/(v, v) ≤ λmax

    and the upper and lower bounds are attained when v is an eigenvector of A associated with λmax and with λmin, respectively.

    Proof. Let the eigenvalues of A be λ1, λ2, ..., λN. By Theorem 1.13 there exists an orthonormal set of eigenvectors v(1), v(2), ..., v(N) corresponding to λ1, λ2, ..., λN. Since the {v(i)} form a basis of EN, for any v ∈ EN we have

    (2.10)   v = c1v(1) + c2v(2) + ··· + cNv(N)

    for suitable constants c1, c2 , ... , cN. If v ≠ 0, then

    (2.11)   (v, v) = |c1|² + |c2|² + ··· + |cN|² > 0

    and

    (2.12)   (v, Av) = λ1|c1|² + λ2|c2|² + ··· + λN|cN|²

    Evidently

    (2.13)   (v, Av)/(v, v) = w1λ1 + w2λ2 + ··· + wNλN

    where

    (2.14)   wi = |ci|² / (|c1|² + |c2|² + ··· + |cN|²)

    and

    (2.15)   wi ≥ 0,   w1 + w2 + ··· + wN = 1

    It follows that (v, Av)/(v, v) is a weighted average of the λi with non-negative weights. Thus we have for all v

    λmin ≤ (v, Av)/(v, v) ≤ λmax

    and (2.9) follows.
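
    A quick numerical spot check of (2.9), added here as an illustration (it is no substitute for the proof): for a randomly generated Hermitian matrix, the quotient (v, Av)/(v, v), computed here as vHAv/vHv, always lies between the smallest and largest eigenvalues.

        import numpy as np

        rng = np.random.default_rng(0)
        G = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
        A = G + G.conj().T                          # Hermitian by construction

        lam = np.linalg.eigvalsh(A)                 # real eigenvalues, ascending order
        for _ in range(1000):
            v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
            rq = (v.conj() @ A @ v).real / (v.conj() @ v).real
            assert lam[0] - 1e-12 <= rq <= lam[-1] + 1e-12
        print(lam[0], lam[-1])                      # the bounds in (2.9)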

    There are many definitions given for positive definite matrices. We shall use the following

    Definition 2.1. A matrix A is positive definite if A is Hermitian and

    (2.16)   (v, Av) > 0

    for all v ≠ 0. If (v, Av) ≥ 0 for all v, then A is nonnegative definite.

    Similar definitions can be given for negative definite and nonpositive definite matrices.

    From Theorem 2.2 we have

    Theorem 2.3. A matrix A is positive definite (nonnegative definite) if and only if it is Hermitian and all of its eigenvalues are positive (non-negative).

    Theorem 2.4. The real part of a positive definite (nonnegative definite) matrix is positive definite (nonnegative definite).

    Proof. Let A = B + iC where B and C are real be a positive definite matrix and let v = u + iw where u and w are real. Since B = BT the imaginary part of (v, Bv) = 0 by (2.6). Therefore

    (v, Bv) = (u, Bu) + (w, Bw)

    But (x, Bx) > 0 for all real x ≠ 0 since

    (x, Ax) = (x, (B + iC)x) = (x, Bx) + i(x, Cx)

    is real and positive. Therefore (v, Bv) is real and positive for all v ≠ 0 and, by Theorem 2.1, B is positive definite.

    Theorem 2.5. An Hermitian matrix of order N is positive definite (nonnegative definite) if and only if every subdeterminant formed by deleting any m rows and the corresponding columns is positive (non-negative), where 0 ≤ m < N. In particular, det A and all diagonal elements of A are positive (nonnegative).

    Proof. Let à be any matrix formed by deleting certain rows and the corresponding columns of A. Let v be any vector such that vi be the vector formed from v by deleting the (zero) elements corresponding to deleted rows of A. Evidently, we have

    If A , then v ≠ 0 and (v, Av, and à must be positive definite. Since all eigenvalues of à are thus positive and since the product of the eigenvalues is det Ã, we have det à > 0.

    Suppose, on the other hand, that all subdeterminants are positive. Then the coefficients of the characteristic polynomial of A have alternating signs. Thus, for example, in the case N = 4, the characteristic polynomial is

    det(A − λI) = λ⁴ − (∆2,3,4 + ∆1,3,4 + ∆1,2,4 + ∆1,2,3)λ³ + (∆1,2 + ∆1,3 + ∆1,4 + ∆2,3 + ∆2,4 + ∆3,4)λ² − (∆1 + ∆2 + ∆3 + ∆4)λ + ∆

    where ∆ = det A, ∆i is the determinant formed by deleting the ith row and column, ∆i,j is the determinant formed by deleting the ith and jth rows and columns, etc. Since all the ∆'s are positive, it follows that for λ < 0 the characteristic polynomial is positive; hence there can be no negative roots. Moreover, λ = 0 is not a root since det A = ∆ > 0. Therefore, by Theorem 2.3, A is positive definite.
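
    As an added numerical companion to Theorem 2.5 (an illustration only), one can take a small real symmetric matrix, confirm that its eigenvalues are positive, and check every principal subdeterminant obtained by deleting rows and the corresponding columns:

        import numpy as np
        from itertools import combinations

        A = np.array([[ 4., -1., -1.],
                      [-1.,  4., -1.],
                      [-1., -1.,  4.]])             # real symmetric, hence Hermitian

        print(np.all(np.linalg.eigvalsh(A) > 0))    # True: A is positive definite

        n = A.shape[0]
        for k in range(1, n + 1):
            for kept in combinations(range(n), k):  # rows (and columns) retained
                sub = A[np.ix_(kept, kept)]
                assert np.linalg.det(sub) > 0       # every principal minor is positive
        print("all principal minors positive")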

    For any matrix L we have

    (v, LLHv) = (LHv, LHv) ≥ 0

    for all v. Moreover, if (v, LLHv) = 0, and if L is nonsingular, then LHv = 0 and v = 0. Thus we have

    Theorem 2.6. For any matrix L the matrix LLH is Hermitian and nonnegative definite. If L is nonsingular, then LLH is positive definite.

    We now show that given a positive definite matrix A we can find a positive definite square root. Thus we have

    Theorem 2.7. If A is a positive definite (nonnegative definite) matrix, then there exists a unique positive definite (nonnegative definite) matrix B such that

    (2.17)   B² = A

    Moreover, if A is real, then B is real. The matrix B satisfying (2.17) is usually denoted by A¹/².

    Proof. Since A is Hermitian, by Theorem 1.13 there exists a unitary matrix P such that P−1AP = Λ, where Λ is a diagonal matrix with positive diagonal elements, say λi. If Λ¹/² is the diagonal matrix with diagonal elements (λi)¹/², then (Λ¹/²)² = Λ. If B = PΛ¹/²P−1, then (2.17) holds. If A is real, then P may be taken to be orthogonal, by Theorem 1.13, and hence B is real.

    To prove the uniqueness let us assume that the diagonal elements of Λ are arranged in ascending order. Let C be any positive definite matrix such that C² = A. There exists a unitary matrix R such that

    R−1CR = Γ

    where Γ is a diagonal matrix. Moreover,

    Γ² = R−1C²R = R−1AR

    so that Γ² is a Jordan canonical form of A. Hence, by Theorem 1.15, for some permutation matrix Q we have

    Γ² = Q−1ΛQ

    and

    Λ = QΓ²Q−1= (QΓQ−1)²

    Since QΓQ−1 is a diagonal matrix it follows that the diagonal elements of QΓQ−1 are the (positive) square roots of the corresponding elements of Λ, i.e.,

    QΓQ−1 = Λ¹/²

    Evidently we have

    Q−1ΛQ = Γ² = R−1C²R = R−1AR = R−1PΛP−1R

    or

    EΛ = ΛE

    where

    E = P−1RQ−1

    Since Λ is a diagonal matrix all elements of E vanish except for those elements in diagonal blocks associated with equal eigenvalues of A.

    Thus we have

        E = ( K1              )            Λ = ( λ1I1               )
            (     K2          )                (       λ2I2         )
            (         ···     )                (             ···    )
            (             Kp  )                (              λpIp  )

    where the Ki and the Ii are diagonal blocks of the same orders, the Ii being identity matrices, and λ1, λ2, ..., λp are the distinct eigenvalues of A. Since E commutes with Λ¹/², evidently

    C = RΓR−1 = RQ−1Λ¹/²QR−1 = PEΛ¹/²E−1P−1 = PΛ¹/²P−1 = B

    and the uniqueness follows.
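
    The existence half of the proof is constructive, and the construction B = PΛ¹/²P−1 is easy to mirror numerically. The sketch below is an added illustration for a real symmetric positive definite A (the helper name sqrt_posdef is not from the text):

        import numpy as np

        def sqrt_posdef(A):
            """Positive definite square root of a Hermitian positive definite A."""
            lam, P = np.linalg.eigh(A)              # A = P diag(lam) P^H with lam > 0
            return P @ np.diag(np.sqrt(lam)) @ P.conj().T

        A = np.array([[4., 1.],
                      [1., 3.]])
        B = sqrt_posdef(A)
        print(np.allclose(B @ B, A))                # True: B squared is A
        print(np.all(np.linalg.eigvalsh(B) > 0))    # True: B is positive definite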

    Corollary 2.8. If A is a positive definite matrix, then for any nonsingular matrix L the matrix M given by

    (2.18)   M = LALH

    is positive definite.

    Proof. Let A¹/² be the positive definite matrix whose square is A. Since M = LA¹/²(LA¹/²)H and since LA¹/² is nonsingular, it follows from Theorem 2.6 that M is positive definite. The corollary can also be proved directly by noting that (u, LALHu) = (LHu, ALHu) > 0 if u ≠ 0.

    Wachspress [1966] defined a class of matrices such that (v, Av) is real and positive, not necessarily for all v, but when v is real.

    Definition 2.2. A matrix A is positive real (nonnegative real) if (v, Av) > 0 (≥ 0), for all real v ≠ 0.

    While any positive definite matrix is, of course, positive real, the converse is not true. For example, the matrix

    (2.19)

    is positive real but not positive definite.
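
    A standard example of this kind, offered here as an added illustration (it is not necessarily the matrix displayed in (2.19)), is the real 2 × 2 matrix below: for real v, (v, Av) = v1² + v2² > 0, yet A is not Hermitian and so is not positive definite. The last line also anticipates Theorem 2.9, since A + AT turns out to be positive definite.

        import numpy as np

        A = np.array([[ 1., 1.],
                      [-1., 1.]])                   # assumed example, not from the text

        rng = np.random.default_rng(1)
        ok = True
        for _ in range(1000):
            v = rng.standard_normal(2)
            ok = ok and (v @ A @ v > 0)             # v^T A v = v1**2 + v2**2
        print(ok)                                   # True: empirically positive real

        print(np.array_equal(A, A.T))               # False: A is not Hermitian
        print(np.linalg.eigvalsh(A + A.T))          # [2. 2.]: A + A^T positive definite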

    Positive real matrices are characterized by the following theorem.

    Theorem 2.9. A matrix A is positive real (nonnegative real) if and only if A + AT is real and positive definite (nonnegative definite).

    Proof. Let A = B + iC where B and C are real. In the proof of Theorem 2.1 we showed that in order for (v, Av) to be real for all real v, we must have (2.5). Hence A + AT is real if A is positive real. If v is real, then

    (v, Av) = (v, Bv) = (Bv, v) = (v, BTv)

    so that

    (2.20)   (v, Av) = (1/2)(v, (B + BT)v)

    Suppose B + BT is not positive definite. Then since B + BT is Hermitian, by
