Data Assimilation for the Geosciences: From Theory to Application

About this ebook

Data Assimilation for the Geosciences: From Theory to Application brings together all of the mathematical, statistical, and probability background knowledge needed to formulate data assimilation systems in one place. It includes practical exercises for understanding theoretical formulation and presents some aspects of coding the theory with a toy problem.

The book also demonstrates how data assimilation systems are implemented in larger scale fluid dynamical problems related to the atmosphere and oceans, as well as the land surface and other geophysical situations. It offers a comprehensive presentation of the subject, from basic principles to advanced methods, such as Particle Filters and Markov-Chain Monte-Carlo methods. Additionally, Data Assimilation for the Geosciences: From Theory to Application covers the applications of data assimilation techniques in various disciplines of the geosciences, making the book useful to students, teachers, and research scientists.

  • Includes practical exercises, enabling readers to apply concepts in a theoretical formulation
  • Offers explanations for how to code certain parts of the theory
  • Presents a step-by-step guide on how, and why, data assimilation works and can be used
Language: English

Release date: Mar 10, 2017

ISBN: 9780128044841
Author

Steven J. Fletcher

Steven J. Fletcher is a Research Scientist III at the Cooperative Institute for Research in the Atmosphere (CIRA) at Colorado State University, where he is the lead scientist on the development of non-Gaussian based data assimilation theory for variational, PSAS, and hybrid systems. He has worked extensively with the Naval Research Laboratory in Monterey in the development of their data assimilation system, as well as working with the National Oceanic and Atmospheric Administration (NOAA)’s Environmental Modeling Center (EMC) data assimilation system. Dr. Fletcher is extensively involved with the American Geophysical Union (AGU)’s Fall Meeting planning committee, having served on the committee since 2013 as the representative of the Nonlinear Geophysics section. He has also been the lead organizer and science program committee member for the Joint Center for Satellite Data Assimilation Summer Colloquium on Satellite Data Assimilation since 2016. Dr. Fletcher is the author of Data Assimilation for the Geosciences: From Theory to Application (Elsevier, 2017). In 2017 Dr. Fletcher became a fellow of the Royal Meteorological Society.


    Data Assimilation for the Geosciences

    From Theory to Application

    First Edition

    Steven J. Fletcher

    Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, CO, USA

    Table of Contents

    Cover image

    Title page

    Copyright

    Chapter 1: Introduction

    Abstract

    Chapter 2: Overview of Linear Algebra

    Abstract

    2.1 Properties of Matrices

    2.2 Matrix and Vector Norms

    2.3 Eigenvalues and Eigenvectors

    2.4 Matrix Decompositions

    2.5 Sherman-Morrison-Woodbury Formula

    2.6 Summary

    Chapter 3: Univariate Distribution Theory

    Abstract

    3.1 Random Variables

    3.2 Discrete Probability Theory

    3.3 Continuous Probability Theory

    3.4 Discrete Distribution Theory

    3.5 Expectation and Variance of Discrete Random Variables

    3.6 Moments and Moment-Generating Functions

    3.7 Continuous Distribution Theory

    3.8 Lognormal Distribution

    3.9 Exponential Distribution

    3.10 Gamma Distribution

    3.11 Beta Distribution

    3.12 Chi-Squared (χ²) Distribution

    3.13 Rayleigh Distribution

    3.14 Weibull Distribution

    3.15 Gumbel Distribution

    3.16 Summary of the Descriptive Statistics, Moment-Generating Functions, and Moments for the Univariate Distribution

    3.17 Summary

    Chapter 4: Multivariate Distribution Theory

    Abstract

    4.1 Descriptive Statistics for Multivariate Density Functions

    4.2 Gaussian Distribution

    4.3 Lognormal Distribution

    4.4 Mixed Gaussian-Lognormal Distribution

    4.5 Multivariate Mixed Gaussian-Lognormal Distribution

    4.6 Gamma Distribution

    4.7 Summary

    Chapter 5: Introduction to Calculus of Variation

    Abstract

    5.1 Examples of Calculus of Variation Problems

    5.2 Solving Calculus of Variation Problems

    5.3 Functional With Higher-Order Derivatives

    5.4 Three-Dimensional Problems

    5.5 Functionals With Constraints

    5.6 Functional With Extremals That Are Functions of Two or More Variables

    5.7 Summary

    Chapter 6: Introduction to Control Theory

    Abstract

    6.1 The Control Problem

    6.2 The Uncontrolled Problem

    6.3 The Controlled Problem

    6.4 Observability

    6.5 Duality

    6.6 Stability

    6.7 Feedback

    6.8 Summary

    Chapter 7: Optimal Control Theory

    Abstract

    7.1 Optimizing Scalar Control Problems

    7.2 Multivariate Case

    7.3 Autonomous (Time-Invariant) Problem

    7.4 Extension to General Boundary Conditions

    7.5 Free End Time Optimal Control Problems

    7.6 Piecewise Smooth Calculus of Variation Problems

    7.7 Maximization of Constrained Control Problems

    7.8 Two Classical Optimal Control Problems

    7.9 Summary

    Chapter 8: Numerical Solutions to Initial Value Problems

    Abstract

    8.1 Local and Truncation Errors

    8.2 Linear Multistep Methods

    8.3 Stability

    8.4 Convergence

    8.5 Runge-Kutta Schemes

    8.6 Numerical Solutions to Initial Value Partial Differential Equations

    8.7 Wave Equation

    8.8 Courant Friedrichs Lewy Condition

    8.9 Summary

    Chapter 9: Numerical Solutions to Boundary Value Problems

    Abstract

    9.1 Types of Differential Equations

    9.2 Shooting Methods

    9.3 Finite Difference Methods

    9.4 Self-Adjoint Problems

    9.5 Error Analysis

    9.6 Partial Differential Equations

    9.7 Self-Adjoint Problem in Two Dimensions

    9.8 Periodic Boundary Conditions

    9.9 Summary

    Chapter 10: Introduction to Semi-Lagrangian Advection Methods

    Abstract

    10.1 History of Semi-Lagrangian Approaches

    10.2 Derivation of Semi-Lagrangian Approach

    10.3 Interpolation Polynomials

    10.4 Stability of Semi-Lagrangian Schemes

    10.5 Consistency Analysis of Semi-Lagrangian Schemes

    10.6 Semi-Lagrangian Schemes for Non-Constant Advection Velocity

    10.7 Semi-Lagrangian Scheme for Non-Zero Forcing

    10.8 Example: 2D Quasi-Geostrophic Potential Vorticity (Eady Model)

    10.9 Summary

    Chapter 11: Introduction to Finite Element Modeling

    Abstract

    11.1 Solving the Boundary Value Problem

    11.2 Weak Solutions of Differential Equation

    11.3 Accuracy of the Finite Element Approach

    11.4 Pin Tong

    11.5 Finite Element Basis Functions

    11.6 Coding Finite Element Approximations for Triangle Elements

    11.7 Isoparametric Elements

    11.8 Summary

    Chapter 12: Numerical Modeling on the Sphere

    Abstract

    12.1 Vector Operators in Spherical Coordinates

    12.2 Spherical Vector Derivative Operators

    12.3 Finite Differencing on the Sphere

    12.4 Introduction to Fourier Analysis

    12.5 Spectral Modeling

    12.6 Summary

    Chapter 13: Tangent Linear Modeling and Adjoints

    Abstract

    13.1 Additive Tangent Linear and Adjoint Modeling Theory

    13.2 Multiplicative Tangent Linear and Adjoint Modeling Theory

    13.3 Examples of Adjoint Derivations

    13.4 Perturbation Forecast Modeling

    13.5 Adjoint Sensitivities

    13.6 Singular Vectors

    13.7 Summary

    Chapter 14: Observations

    Abstract

    14.1 Conventional Observations

    14.2 Remote Sensing

    14.3 Quality Control

    14.4 Summary

    Chapter 15: Non-variational Sequential Data Assimilation Methods

    Abstract

    15.1 Direct Insertion

    15.2 Nudging

    15.3 Successive Correction

    15.4 Linear and Nonlinear Least Squares

    15.5 Regression

    15.6 Optimal (Optimum) Interpolation/Statistical Interpolation/Analysis Correction

    15.7 Summary

    Chapter 16: Variational Data Assimilation

    Abstract

    16.1 Sasaki and the Strong and Weak Constraints

    16.2 Three-Dimensional Data Assimilation

    16.3 Four-Dimensional Data Assimilation

    16.4 Incremental VAR

    16.5 Weak Constraint—Model Error 4D VAR

    16.6 Observational Errors

    16.7 4D VAR as an Optimal Control Problem

    16.8 Summary

    Chapter 17: Subcomponents of Variational Data Assimilation

    Abstract

    17.1 Balance

    17.2 Control Variable Transforms

    17.3 Background Error Covariance Modeling

    17.4 Preconditioning

    17.5 Minimization Algorithms

    17.6 Performance Metrics

    17.7 Summary

    Chapter 18: Observation Space Variational Data Assimilation Methods

    Abstract

    18.1 Derivation of Observation Space-Based 3D VAR

    18.2 4D VAR in Observation Space

    18.3 Duality of the VAR and PSAS Systems

    18.4 Summary

    Chapter 19: Kalman Filter and Smoother

    Abstract

    19.1 Derivation of the Kalman Filter

    19.2 Kalman Filter Derivation from a Statistical Approach

    19.3 Extended Kalman Filter

    19.4 Square Root Kalman Filter

    19.5 Smoother

    19.6 Properties and Equivalencies of the Kalman Filter and Smoother

    19.7 Summary

    Chapter 20: Ensemble-Based Data Assimilation

    Abstract

    20.1 Stochastic Dynamical Modeling

    20.2 Ensemble Kalman Filter

    20.3 Ensemble Square Root Filters

    20.4 Ensemble and Local Ensemble Transform Kalman Filter

    20.5 Maximum Likelihood Ensemble Filter

    20.6 Hybrid Ensemble and Variational Data Assimilation Methods

    20.7 NDEnVAR

    20.8 Ensemble Kalman Smoother

    20.9 Ensemble Sensitivity

    20.10 Summary

    Chapter 21: Non-Gaussian Variational Data Assimilation

    Abstract

    21.1 Error Definitions

    21.2 Full Field Lognormal 3D VAR

    21.3 Logarithmic Transforms

    21.4 Mixed Gaussian-Lognormal 3D VAR

    21.5 Lognormal Calculus of Variation-Based 4D VAR

    21.6 Bayesian-Based 4D VAR

    21.7 Bayesian Networks Formulation of Weak Constraint/Model Error 4D VAR

    21.8 Results of the Lorenz 1963 Model for 4D VAR

    21.9 Incremental Lognormal and Mixed 3D and 4D VAR

    21.10 Regions of Optimality for Lognormal Descriptive Statistics

    21.11 Summary

    Chapter 22: Markov Chain Monte Carlo and Particle Filter Methods

    Abstract

    22.1 Markov Chain Monte Carlo Methods

    22.2 Particle Filters

    22.3 Summary

    Chapter 23: Applications of Data Assimilation in the Geosciences

    Abstract

    23.1 Atmospheric Science

    23.2 Oceans

    23.3 Hydrological Applications

    23.4 Coupled Data Assimilation

    23.5 Reanalysis

    23.6 Ionospheric Data Assimilation

    23.7 Renewable Energy Data Application

    23.8 Oil and Natural Gas

    23.9 Biogeoscience Application of Data Assimilation

    23.10 Other Applications of Data Assimilation

    23.11 Summary

    Chapter 24: Solutions to Select Exercises

    Chapter 2

    Chapter 3

    Chapter 5

    Chapter 6

    Chapter 7

    Chapter 8

    Chapter 9

    Bibliography

    Index

    Copyright

    Elsevier

    Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    Copyright © 2017 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    ISBN: 978-0-12-804444-5

    For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Candice Janco

    Acquisition Editor: Marisa LaFleur

    Editorial Project Manager: Marisa LaFleur

    Production Project Manager: Paul Prasad Chandramohan

    Cover Designer: Mark Rogers

    Typeset by SPi Global, India

    Chapter 1

    Introduction

    Abstract

    In this chapter we provide a brief motivation as to what data assimilation is, how it has progressed over the last 60 years, the mathematical, statistical and probabilistic theory that will be covered in the book, as well as the various different forms of data assimilation.

    Keywords

    Data Assimilation; Variational; Kalman Filter; Ensemble; Adjoints; Markov chain Monte Carlo; Particle Filters

    Data assimilation plays a vital role in how forecasts are made for different geophysical disciplines. While the numerical model of the geophysical system is critical, so are the observations of the same system, be they direct or indirect. Neither the model nor the observations are perfect, and both are required to produce a better forecast than can be achieved either by using the numerical model alone, without guidance about how accurate the current state is, or by producing a persistence, or advection, forecast from the observations.

    Fig. 1.1 shows a conceptual diagram of the forecast skill of different methods without data assimilation from [1]. We see that the order in which the forecast skill drops off, at least for the atmosphere and the prediction of clouds, is quite telling. Data assimilation, when included in this type of figure, will have a higher forecast skill, and for longer, than the other approaches. Why? Because when I am asked what I do at Colorado State University, I say that I do research in data assimilation, to which the usual response is "What's that?" I reply: "It is the science of combining the numerical models and observations of a geophysical system, such that the new forecast produced by the numerical model is better than the model or the observations on their own."

    Fig. 1.1 Copy of Figure 1 from [1] of the different forecast skill lengths for persistence, advection, numerical modeling, and climatology.

    Data assimilation acts as a bridge between numerical models and observations. There are many different methods to enable the bridging, but while data assimilation was used initially in engineering, it should not be considered as just an engineering tool. Its application today in numerical weather prediction, numerical ocean prediction, land surface processes, hydrological cycle prediction, cryosphere prediction, space weather prediction, soil moisture evolution, land surface-atmosphere coupling, ocean-atmosphere coupling, carbon cycle prediction, and climate reanalysis, to name but a few areas of application, requires many different techniques and approaches to deal with the dimension of the problems, the different time scales for different geophysical processes, the nonlinearity of the observation operators, and the non-Gaussianity of the distributions of the errors associated with the different processes and observation types.

    The weather forecasts that you see on television, read on a phone, tablet or computer, or hear on the radio are generated from the output of a data assimilation system. Water resource managers rely on forecasts from a cryospheric-driven data assimilation system. Data assimilation plays an important part in renewable energy production. For example, wind farms require advance knowledge of ramp-up and ramp-down events. These are times when the wind is forecast to exceed the safe upper limit at which the turbines can operate without overloading the blade motors. There were cases in the United Kingdom where ramp-up events were not correctly forecast; the wind turbines were overloaded and, as a result, caught fire and exploded. Wind farms need to inform the electrical grids how much electricity they can provide either that day or over the next 48 hours. If the wind is forecast to exceed the safe speed, then wind farms have to know how long the turbines will be switched off for, and how much less electricity they can provide.

    Solar farms have a similar issue with clouds and snow/ice/dust. While individual clouds are difficult to predict, a general idea of percentage cloud cover is important for the amount of electricity that a solar farm can provide. Snow forecasts are important for solar farms that are in areas of plentiful sunlight but receive snow in the winter months. The forecasting of haboobs (dust storms) in desert regions is also important for advance knowledge of the possibility of being able to produce electricity from solar panels. Knowing whether panels are covered in dust, or having advance warning that they may be covered in dust and sand, is important. These forecasts are a by-product of a data assimilation system.

    Data assimilation is also known in some scientific disciplines as inverse modeling. This is where we may not be producing a forecast, but we wish to combine an a priori state with an observation that is not directly of the a priori variables, to extract an estimate of the physical state at that time and place. A very frequent use of this technique is referred to as retrieval. In some geophysical fields, and for certain applications, the retrieved product may be assimilated into the model, rather than assimilating the indirect observation itself. This practice was quite common in the early stages of satellite data assimilation, as it avoided the highly nonlinear Jacobians of what are called radiative transfer models in the minimization of the cost function. Retrievals also play a vital part in gaining information from satellite brightness temperatures and radiances.

    There are many different forms of data assimilation spanning many decades and uses, and each system has its advantages and disadvantages. The earliest forms of data assimilation were referred to as objective analysis and included empirical methods, where there is no probabilistic information determining the weights given to observations. These early data assimilation approaches were also referred to as successive correction methods, where they applied a series of corrections at different scales to filter the unresolved scales. Examples of these successive correction schemes are the Cressman and the Barnes schemes [2, 3].

    The next set of data assimilation methods after the successive correction methods were the different versions of optimum interpolation. The basis of OI, as it is more commonly known nowadays, is the minimization of a least squares problem. The first appearance of OI was in Gandin’s 1963 book (in Russian), translated into English in 1965 [4]. In his book, Gandin refers to his approach as optimum rather than optimal, as we do not know the true expressions for the error variances and covariances involved. OI was the operational numerical weather prediction centers’ data assimilation method of choice in the 1980s and early 1990s, but because OI schemes have the restriction of linearity, and are not global solvers for the analysis, they use either a volume of observations in a local area or take only a few observations that are close to each grid point.

    An alternative approach to the statistical-based OI was being developed by Yoshikazu Sasaki, where he used the idea of functionals to constrain the numerical model, given the observations. His approach would lead to the variational methods of data assimilation, specifically 1D, 2D, 3D, and 4DVAR. It was shown in [5] that the non-temporal variational methods, 1D, 2D, and 3DVAR, can also be derived from Bayes’s equation for conditional probability. In [6] it was shown that it is also possible to describe 4DVAR as a Bayesian problem.

    At the same time that Sasaki was developing his variational approach for data assimilation, Kalman was developing the Kalman Filter. This filter is based on control theory and it has since been shown that Kalman’s approach is equivalent to an observer feedback design control system. One of the differences between the Kalman filter and the variational approach is the descriptive statistic that they are both trying to find. In the variational approach we are seeking the mode of the posterior distribution—we shall explain these terms later—while the Kalman filter seeks the minimum variance state (mean), along with the covariance matrix. For Gaussian distributions the two descriptive statistics, mean and mode, are the same.

    The implementation of 4DVAR is computationally quite expensive, and as such the idea of including temporal information in the observations took some time to become a reality in the operational centers. It did so as a result of Courtier et al.’s 1994 paper [7] and their idea to incrementalize the variational approaches, which enabled 4DVAR to go operational.

    In the mid-1990s, the idea of using an ensemble to approximate the analysis and forecast error covariance matrices, as well as the update step, from the Kalman filter equations was presented by Evensen [8], and was called the Ensemble Kalman Filter (EnKF). As a result of the 1994 paper, ensemble-based data assimilation systems have become quite widespread in their usage in the prediction of different geophysical phenomena. This led to many different versions of ensemble-based approximations to the Kalman filter, referred to as the Ensemble Transform Kalman Filter (ETKF), the Local Ensemble Transform Kalman Filter (LETKF), and the Maximum Likelihood Ensemble Filter (MLEF), to name a few. The advantage of the ensemble-based methods is that they bring flow dependency into the analysis step, while the variational schemes assume a static model for the error covariance. Although 4DVAR does evolve the covariance matrix implicitly, it is still capturing the larger-scale errors, and not those that are referred to as errors of the day.

    There has been a movement to combine the ensemble methods with the variational methods to form hybrid methods. One of these approaches is the EnNDVAR (the hybrid NDVAR), where the background error covariance matrix is a weighted combination of an estimate from an ensemble of states and the static 4DVAR matrix. The other hybrid approach is the NDEnVAR, where the 4DVAR cost function is defined in terms of a four-dimensional trajectory that is applied as a linear model through the ensemble of trajectories.

    Recently, with the need to allow for more nonlinearity, and the idea that probabilistic behavior on a smaller scale is less likely to be Gaussian distributed, the need for data assimilation methods that allow for non-Gaussian errors has grown. One set of approaches has been derived for the lognormal distribution, and the mixed Gaussian-lognormal distribution in a variational framework, all the way to the incremental version [6, 9–13]. The non-Gaussian variational approach seeks the mode of the posterior distribution, given a mixed distribution for both the background and the observation error distributions.

    Another approach that has been developed to tackle the non-Gaussian aspect is the set of methods involving Markov Chain Monte Carlo (MCMC) theory, which involves using an ensemble to sample the whole posterior distribution and then, from that estimation, determining the value of the descriptive statistic required. To be able to integrate this distribution in time, we then require the particle filters, which can be seen as sequential MCMC in time. Being able to model the evolution of the posterior distribution is an important capability, but the filters, if not modified, will suffer from filter degeneracy if the number of particles is not sufficiently large. There is a great deal of research taking place to find a way around this curse of dimensionality.

    In this book, our aim is to provide the tools to understand the mathematics, statistics, and probability theory behind the different forms of data assimilation, as well as the derivations and properties of the different schemes, so that you can decide which approach to follow. In this book we shall cover linear algebra, random variables, descriptive statistics, univariate and multivariate distribution theory, calculus of variation, control and optimal control theory, finite differencing for initial and boundary value differential equations, semi-Lagrangian methods, the finite element model, Fourier analysis, spectral modeling, tangent linear modeling, adjoints, observations, successive correction, linear and nonlinear least squares, regression, optimum interpolation, analysis correction, variational data assimilation, physical space analysis system (PSAS) observation-space based variational data assimilation, ensemble data assimilation, Markov Chain Monte Carlo, particle filters, and finally applications of data assimilation in different geophysical disciplines.

    Therefore, at the end of this book you will hopefully have an unbiased opinion of which data assimilation approach you prefer. We have tried to be impartial, highlighting both the strengths and weaknesses of all of the data assimilation approaches. Ultimately, we would like you to understand that the goal of a data assimilation method is to:

    optimize the strengths of the models and observations, while simultaneously minimizing their weaknesses.

    With this in mind, we now move on to introduce the many different mathematical and statistical disciplines that create data assimilation for the geosciences.

    References

    [1] Miller S.D., Heidinger A.K., Sengupta M. Physically based satellite methods. In: Kleissl J., ed. Solar Energy Forecasting and Resource Assessment. New York, USA: Academic Press; 2013:49–79.

    [2] Cressman G.P. An operational objective analysis system. Mon. Wea. Rev. 1959;87:367–374.

    [3] Barnes S.L. A technique for maximizing details in numerical weather map analysis. J. Appl. Meteor. 1963;3:396–409.

    [4] Gandin L.S. Objective analysis of meteorological fields. Translated from Russian by the Israeli Program for Scientific Translations. 1965.

    [5] Lorenc A.C. Analysis methods for numerical weather prediction. Q. J. R. Meteor. Soc. 1986;112:1177–1194.

    [6] Fletcher S.J. Mixed lognormal-Gaussian four-dimensional data assimilation. Tellus. 2010;62A:266–287.

    [7] Courtier P., Thépaut J.-N., Hollingsworth A. A strategy for operational implementation of 4D-VAR, using an incremental approach. Q. J. R. Meteor. Soc. 1994;120:1367–1387.

    [8] Evensen G. Data assimilation with a non-linear quasi-geostrophic model using Monte-Carlo methods to forecast error statistics. J. Geophys. Res. Oceans. 1994;99(C5):10143–10162.

    [9] Fletcher S.J., Zupanski M. A data assimilation method for log-normally distributed observational errors. Q. J. R. Meteor. Soc. 2006;132:2505–2519.

    [10] Fletcher S.J., Zupanski M. A hybrid normal and lognormal distribution for data assimilation. Atmos. Sci. Lett. 2006;7:43–46.

    [11] Fletcher S.J., Zupanski M. Implications and impacts of transforming lognormal variables into normal variables in VAR. Meteorologische Zeitschrift. 2007;16:755–765.

    [12] Fletcher S.J., Jones A.S. Multiplicative and additive incremental variational data assimilation for mixed lognormal-Gaussian errors. Mon. Wea. Rev. 2014;142:2521–2544.

    [13] Song H., Edwards C.A., Moore A.M., Fiechter J. Incremental four-dimensional variational data assimilation of positive-definite oceanic variables using a logarithm transformation. Ocean Model. 2012;54:1–17.


    ⋆ "To view the full reference list for the book, click here"

    Chapter 2

    Overview of Linear Algebra

    Abstract

    In the derivation of the different forms of data assimilation methods there are many mathematical properties that are required. This chapter is designed to offer a refresher on the mathematical formulas and properties that are important for the understanding of the derivations in this book.

    Keywords

    Matrices; Vectors; Eigenvalues; Eigenvectors; Singular value decomposition; Vector calculus

    Chapter Outline

    2.1 Properties of Matrices

    2.1.1 Matrix Multiplication

    2.1.2 Transpose of a Matrix

    2.1.3 Determinants of Matrices

    2.1.4 Inversions of Matrices

    2.1.5 Rank, Linear Independence and Dependence

    2.1.6 Matrix Structures

    2.2 Matrix and Vector Norms

    2.2.1 Vector Norms

    2.2.2 Matrix Norms

    2.2.3 Conditioning of Matrices

    2.2.4 Matrix Condition Number

    2.3 Eigenvalues and Eigenvectors

    2.4 Matrix Decompositions

    2.4.1 Gaussian Elimination and the LU Decomposition

    2.4.2 Cholesky Decomposition

    2.4.3 The QR Decomposition

    2.4.4 Diagonalization

    2.4.5 Singular Value Decomposition

    2.5 Sherman-Morrison-Woodbury Formula

    2.6 Summary

    The derivation of the different forms of data assimilation that are introduced in this book requires many different mathematical properties, identities, and definitions for different differential operators and integral theorems. We start with the properties of matrices.

    2.1 Properties of Matrices

    Matrices play a vital role in most forms of data assimilation, and the derivation of the assimilation schemes requires the understanding of certain properties of matrices. A matrix in the data assimilation literature is usually denoted as a bold capital letter, A. The first matrix that we need to consider is the identity matrix, which is denoted by I. The definition of the identity matrix is the square matrix with ones on the leading diagonal and zeros everywhere else, i.e., I_{ij} = 1 for i = j and I_{ij} = 0 for i ≠ j.

    A matrix A is said to be a square matrix if its dimensions are equal, i.e., it is of dimensions N × N. The general form for an N × N matrix is given by

       A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix}.   (2.1)

    The matrix is said to be real valued if all of its entries a_{ij} are real numbers. A real N × N matrix is often written as A ∈ ℝ^{N×N}.

    2.1.1 Matrix Multiplication

    The first property of matrix-matrix and matrix-vector multiplication is that it is not a direct element-by-element multiplication, although there is an element-by-element matrix-matrix multiplication operator, which will become important when we address non-Gaussian based data assimilation methods, and we shall introduce that operator there. The rule for multiplying matrices is to multiply the rows by the columns, and the expression for the product C = AB is c_{ij} = \sum_{k} a_{ik} b_{kj}.

    For the multiplication of two 3 × 3 matrices we have

       c_{ij} = \sum_{k=1}^{3} a_{ik} b_{kj}, \qquad i, j = 1, 2, 3.   (2.2)

    The summation expression in (2.2) is extendable to any size matrix. Matrix multiplication does not just apply to square matrices. The same formula applies for rectangular matrices so long as the number of columns of the left matrix matches the number of rows in the right matrix. If we have a 3 × 4 matrix and a 4 × 3 matrix then they are compatible for multiplication. The size of matrix that arises as the product of these two matrices is 3 × 3. This is important because in data assimilation, we sometimes deal with matrices that are not square; as such, we need to know their dimensions correctly to ascertain the size of the problem we are solving. The rule for the dimension of the product is given by

    If A is an m × n matrix and B is an n × p matrix, then the product AB is an m × p matrix.
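    As a quick, hedged illustration of the dimension rule (not an example from the book), the following NumPy sketch multiplies a 3 × 4 matrix by a 4 × 3 matrix and confirms that the product is 3 × 3; the array values are arbitrary placeholders.

```python
import numpy as np

# Arbitrary 3 x 4 and 4 x 3 matrices (placeholder values only).
A = np.arange(12, dtype=float).reshape(3, 4)
B = np.arange(12, dtype=float).reshape(4, 3)

C = A @ B                 # multiply the rows of A by the columns of B
print(C.shape)            # (3, 3): a (3 x 4)(4 x 3) product is 3 x 3

# Reversing the order, (4 x 3)(3 x 4), gives a 4 x 4 matrix instead.
print((B @ A).shape)      # (4, 4)
```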

    2.1.2 Transpose of a Matrix

    The first operator acting on a matrix that we introduce is the transpose. The transpose of a matrix A is denoted as AT. The effect of the transpose operator is to interchange the rows and columns of the matrix. If we consider the general matrix in (2.1), then the transpose of that matrix is

       A^{T} = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{N1} \\ a_{12} & a_{22} & \cdots & a_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1N} & a_{2N} & \cdots & a_{NN} \end{pmatrix}.   (2.3)

    A special class of matrices that are important in data assimilation are symmetric matrices. A square matrix is said to be symmetric if a_{ij} = a_{ji} for all i and j. An important property of symmetric matrices is that A^{T} = A.
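    The following minimal NumPy sketch (an illustration assuming NumPy, with arbitrary matrices rather than ones from the text) checks the symmetry property A^T = A numerically.

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 4.0, 5.0]])      # symmetric by construction

print(np.array_equal(S.T, S))        # True: A^T = A for a symmetric matrix

R = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.array_equal(R.T, R))        # False: a general matrix is not symmetric
```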

    Exercise 2.1

    Find the transpose of the following matrices, identifying which, if any, are symmetric:

    2.1.3 Determinants of Matrices

    The determinant of a matrix A is denoted as det(A) or |A|. The determinant is only applicable to square matrices. We start by considering a general 2 × 2 matrix, whose determinant is defined by det(A) = a_{11}a_{22} − a_{12}a_{21}.

    For a general 3 × 3 matrix, the technique to derive the determinant involves the expansion as follows

       det(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.   (2.4)

    Finally we consider the general 4 × 4 case which is

       det(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{vmatrix} - a_{14}\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{vmatrix},   (2.5)

    where the next set of determinants in (2.4) and (2.5) are referred to as the minors of A and are expanded as demonstrated for the 3 × 3 case. As can be seen in (2.4) and (2.5), the signs of the factors multiplying the minors alternate. The signs for the specific elements in a 4 × 4 matrix are

       \begin{pmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{pmatrix}.   (2.6)

    It is important to note that it does not matter which row or column you expand the determinant about; you obtain the same answer. This is important for saving time when there are zeros present in any row or column of a matrix, as this removes the associated minors from having to be evaluated.
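    As a hedged numerical illustration (not the book's worked example), the sketch below expands a 3 × 3 determinant about its first row and compares the result with NumPy's built-in routine; the matrix values are arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 4.0, 1.0],
              [5.0, 2.0, 6.0]])

def det3_first_row(M):
    """Cofactor expansion of a 3 x 3 determinant about the first row."""
    minor = lambda i, j: np.delete(np.delete(M, i, axis=0), j, axis=1)
    return sum((-1) ** j * M[0, j] * np.linalg.det(minor(0, j)) for j in range(3))

print(det3_first_row(A))      # expansion about the first row
print(np.linalg.det(A))       # the same value from NumPy's LU-based routine
```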

    Example 2.2

    Find the determinants of the following three matrices:

       (2.7)

    Solution

    Taking A first we have

    For B, expanding the first row yields

    For matrix C we have

    Next expanding about the second row in the remaining minor gives

    Note: The plus and minus signs for minor reset for each subdeterminant.

    If A and B are real and square matrices, then

    det(AB) = det(A)det(B),

    det(A^{-1}) = \frac{1}{det(A)},

    det(A^{T}) = det(A),

    and

    det(cA) = c^{N} det(A) for any scalar c.

    The second property above is referring to the inverse of the matrix which is defined in the next subsection.

    2.1.4 Inversions of Matrices

    The inverting of matrices plays a very important part in many forms of data assimilation. The inversion of a matrix is not a trivial operation to perform as the dimensions of the matrix become large. As with the determinants, we start with the general 2 × 2 matrix:

       A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.   (2.8)

    The denominator of the fraction multiplying the inverse matrix in (2.8) is the determinant of A. If a matrix does have a determinant equal to zero, then it is said to be singular.

    For matrices larger than order 2, the inversion of these matrices becomes quite cumbersome. Before we consider higher dimensional matrices we first introduce the matrix of cofactors. We shall illustrate this for a 3 × 3 matrix, but the definitions expand to larger dimension matrices. The matrix of cofactors is defined as

       C = \begin{pmatrix} +C_{11} & -C_{12} & +C_{13} \\ -C_{21} & +C_{22} & -C_{23} \\ +C_{31} & -C_{32} & +C_{33} \end{pmatrix},   (2.9)

    where the Cij are the minors expanded about the location of the index ij. Note: The cofactors follow the plus and minus signs as presented in (2.6). For the 3 × 3 case the cofactors are

    Finally the definition for the inverse of a square matrix is

       A^{-1} = \frac{1}{det(A)} C^{T},   (2.10)

    For the general 3 × 3 matrix the general expression for the inverse is

       A^{-1} = \frac{1}{det(A)} \begin{pmatrix} +\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} & -\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} & +\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} \\ -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} & +\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} & -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} \\ +\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} & -\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix} & +\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \end{pmatrix},   (2.11)

    where the negative cofactors have interchanged the minus signs of the sub-matrices' determinants.

    Example 2.3

    Find the inverse of the following matrix,

       (2.12)

    The first step is to find the determinant of A. Expanding about the first row, we have

    Next we form the matrix of cofactors which can easily be shown for this example to be

    Therefore, the inverse of the matrix in (2.12) is

       (2.13)

    An important equation that links the matrix inverse to the identity matrix is

       AA^{-1} = A^{-1}A = I.   (2.14)
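    A short NumPy check of (2.14) and of the 2 × 2 closed form in (2.8); this is an illustrative sketch with an arbitrary matrix, assuming NumPy is available.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

A_inv = np.linalg.inv(A)                   # numerical inverse (LU-based)

# The defining property (2.14): A A^{-1} = A^{-1} A = I.
print(np.allclose(A @ A_inv, np.eye(2)))   # True
print(np.allclose(A_inv @ A, np.eye(2)))   # True

# For the 2 x 2 case this matches the closed form in (2.8).
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
closed_form = np.array([[ A[1, 1], -A[0, 1]],
                        [-A[1, 0],  A[0, 0]]]) / det
print(np.allclose(A_inv, closed_form))     # True
```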

    Exercise 2.4

    Verify that the matrix in (2.13) is the inverse of the matrix in (2.12).

    Exercise 2.5

    Identify which of the following matrices are singular by finding their determinants:

    Now that we have introduced the transpose and the inverse of a matrix we consider important properties of these operators on the product of matrices, which again will play an important part in the derivations of many of the equations that are involved in different aspects of data assimilation.

    We first consider the inverse and the transpose of the product of two matrices. These important properties are:

       (AB)^{-1} = B^{-1}A^{-1},   (2.15a)

       (AB)^{T} = B^{T}A^{T}.   (2.15b)

    Therefore the inverse of the product is the product of the inverses in reverse order. This is also true for the transpose. Note that for the transpose operator the matrices do not have to be square; however, for the inverse operator the matrices do have to be square. The proof for (2.15a) is quite straightforward and is given below.

    Proof

    Let C = (AB)^{-1}, so that C(AB) = I. Multiplying on the right of both sides by B^{-1} gives CA = B^{-1}. Now multiplying on the right of both sides by A^{-1} gives C = B^{-1}A^{-1}, which is (2.15a).

    The property of the inverse of the product of two matrices can be extended to the product of n matrices, i.e.,

    (A_{1}A_{2}\cdots A_{n})^{-1} = A_{n}^{-1}\cdots A_{2}^{-1}A_{1}^{-1}.

    The same is also true for the transpose operator:

    (A_{1}A_{2}\cdots A_{n})^{T} = A_{n}^{T}\cdots A_{2}^{T}A_{1}^{T}.

    An important property that links the transpose and the inverse is that the order in which you perform the operators can be interchanged, i.e.,

    (A^{T})^{-1} = (A^{-1})^{T}.

    2.1.5 Rank, Linear Independence and Dependence

    As we saw in the previous subsection, there were matrices that were not invertible and had determinants equal to zero. The reason for this is that these matrices had rows or columns that could be formed from combinations of some or all of the other rows or columns.

    A method to determine if a matrix is singular, if it is a square matrix, is referred to as row or column reduction. The row reduction technique, which is the same for column reduction, is to take the leading diagonal entry that does not have all zeros below it and see if any rows have a 1 in that column. If not, then divide that row by the diagonal entry to make the leading diagonal entry a 1. The next step is to remove the entries below the diagonal entry. This is achieved by multiplying the leading diagonal entry by the factor multiplying the entries below it, and then subtracting this scaled row from the matching row below. This process is repeated for each diagonal entry. If rows or columns of zeros occur, then the total number of non-zero rows or columns is referred to as the rank of the matrix. The remaining rows or columns are referred to as being linearly independent, while the rows or columns that are all zeros are referred to as being linearly dependent. Matrices that have a rank equal to the dimension of that matrix are said to be full-rank, while those whose rank is less than the dimension of the matrix are said to be rank-deficient.

    Example 2.6

    .

    Proof

    This then implies that there are two linearly independent rows, so the matrix has rank = 2.

    Knowing about linear independence is a critical tool in diagnosing mistakes in matrix coding and formulations. If you know that your matrix should be invertible, but it appears not to be, then either it is a coding problem where two or more rows have been repeated, which makes the matrix singular, or the formulation of the problem is what is referred to as ill-posed. We shall explore ill-posedness in Chapter 8.

    If row-reducing a rectangular matrix, then the number of linearly independent columns is the matrix's column-rank, and the number of linearly independent rows is its row-rank.
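    To make the rank discussion concrete, here is a hedged NumPy sketch (arbitrary matrix, not from the text) in which the third row is the sum of the first two, so the matrix is rank-deficient and therefore singular.

```python
import numpy as np

# The third row equals the sum of the first two rows.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

print(np.linalg.matrix_rank(A))            # 2: only two linearly independent rows
print(np.isclose(np.linalg.det(A), 0.0))   # True: rank-deficient, hence singular
```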

    2.1.6 Matrix Structures

    The first matrix structure that has explicitly been identified is symmetry. The identity matrix introduced in Section 2.1 is a diagonal matrix. Diagonal matrices have the property that their inverses are simply the diagonal matrices of the reciprocals of the diagonal entries. Diagonal matrices play important roles in data assimilation, primarily due to the ease of finding their inverses. However, if it is not possible to obtain a diagonal matrix, the next best matrix form is a banded diagonal matrix. The 4 × 4 example in (2.16) is a specific type of banded matrix referred to as a tri-diagonal matrix,

       \begin{pmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ 0 & a_{32} & a_{33} & a_{34} \\ 0 & 0 & a_{43} & a_{44} \end{pmatrix}.   (2.16)

    Matrices similar to that in (2.16) occur in finite difference approximations to certain types of differential equations, which we shall see in Chapter 8. Matrices that have non-zero entries on the main diagonal and on the first diagonal either above or below it, but not on both sides, are referred to as bidiagonal matrices.

    Another form of diagonal matrix is the block diagonal. An example of a block diagonal matrix is

       F = \begin{pmatrix} F_{1} & 0 & 0 & 0 \\ 0 & F_{2} & 0 & 0 \\ 0 & 0 & F_{3} & 0 \\ 0 & 0 & 0 & F_{4} \end{pmatrix},   (2.17)

    where F_{1}, F_{2}, F_{3}, and F_{4} are square matrices, and the 0s represent the parts of the matrix that contain only zeros. The dimension of F is the sum of the dimensions of the blocks. Note that the four matrices on the diagonal may still be full and difficult to invert. However, the inverse of F is

       F^{-1} = \begin{pmatrix} F_{1}^{-1} & 0 & 0 & 0 \\ 0 & F_{2}^{-1} & 0 & 0 \\ 0 & 0 & F_{3}^{-1} & 0 \\ 0 & 0 & 0 & F_{4}^{-1} \end{pmatrix}.   (2.18)

    Block diagonal matrices can occur in data assimilation when trying to decouple/decorrelate certain variables to make their covariance matrix less full, and hence both manageable for computational storage and easier to invert.
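    The blockwise inversion property in (2.18) can be verified with a small SciPy sketch; the blocks below are arbitrary invertible matrices chosen purely for illustration.

```python
import numpy as np
from scipy.linalg import block_diag

F1 = np.array([[2.0, 1.0],
               [1.0, 2.0]])
F2 = np.array([[3.0]])
F3 = np.array([[4.0, 0.0],
               [1.0, 5.0]])

F = block_diag(F1, F2, F3)    # assemble the block diagonal matrix

# The inverse of a block diagonal matrix is block diagonal, with each block inverted.
F_inv_blocks = block_diag(*(np.linalg.inv(B) for B in (F1, F2, F3)))
print(np.allclose(np.linalg.inv(F), F_inv_blocks))   # True
```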

    The next set of matrix structures are the triangular forms. There are two forms: the lower triangular and the upper triangular. The general forms for both are

       L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ l_{N1} & l_{N2} & \cdots & l_{NN} \end{pmatrix}, \qquad U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1N} \\ 0 & u_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & u_{N-1,N} \\ 0 & \cdots & 0 & u_{NN} \end{pmatrix}.   (2.19)

    These occur in decompositions of matrices that are often associated with solving large sets of simultaneous equations. We shall go into more detail about this specific decomposition in Section 2.4.

    A useful class of matrices are the orthogonal matrices. These have the important property that AA^{T} = A^{T}A = I. These matrices can occur when different transforms are applied to certain variables to make the problem easier to invert.

    2.2 Matrix and Vector Norms

    Norms play an important part in determining not only the performance of the data assimilation algorithm but also the error analysis, which then provides the bounds of accuracy of the performance of approximations that have been made in the assimilation schemes.

    2.2.1 Vector Norms

    The purpose of a vector norm is to provide a measure of the size, or length, of a specific vector. The definitions of the vector norms apply to vectors in both the real number and complex number spaces. The mathematical definition of a vector norm is given by the following.

    Definition 2.7

    A vector norm is a function, denoted ‖·‖, that maps a vector x in ℝ^n (or ℂ^n) to a real number and that has the following properties:

    1. ‖x‖ ≥ 0, with equality only occurring for x = 0.

    2. ‖λx‖ = |λ| ‖x‖ for any scalar λ.

    3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for any vectors x and y. This inequality of the sums is referred to as the triangle inequality.

    A class of norms are referred to as the l_p norms and are defined by ‖x‖_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.

    If we consider the case where p = 2, then this is often referred to in the data assimilation literature as the l_2, or l-2, norm, and is defined as

       ‖x‖_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2},   (2.20)

    and is the normal measure of the length of a vector.

    There are three lp norms that are commonly used. These are as follows:

    1. Euclidean Norm (l_2),

       ‖x‖_2 = \left( \sum_{i=1}^{n} x_i \bar{x}_i \right)^{1/2},   (2.21)

    where the overhead bar is the complex conjugate of the vector x.

    2. Absolute Norm (l_1),

       ‖x‖_1 = \sum_{i=1}^{n} |x_i|.   (2.22)

    3. Maximum Norm (l_∞); this norm is also referred to as the infinity norm,

       ‖x‖_∞ = \max_{1 \le i \le n} |x_i|.   (2.23)
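    The three vector norms can be computed directly with NumPy; the sketch below (an illustration with an arbitrary vector, not Example 2.8's data) evaluates the Euclidean, absolute, and maximum norms.

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])

print(np.linalg.norm(x, ord=2))        # Euclidean norm: sqrt(9 + 16 + 144) = 13.0
print(np.linalg.norm(x, ord=1))        # absolute (l1) norm: 3 + 4 + 12 = 19.0
print(np.linalg.norm(x, ord=np.inf))   # maximum (infinity) norm: 12.0
```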

    Example 2.8

    Find the Euclidean norm, the absolute norm and the infinity norm of the following two vectors:

       (2.24)

    Taking the x . For the y vector we have

    .

    Exercise 2.9

    Find the Euclidean, maximum and absolute norms for the following vectors

    and verify the triangle inequality for the Euclidean norm for x + y, x + z, and y + z.

    Another property between the norms is that of equivalence: two norms ‖·‖_a and ‖·‖_b are equivalent if there exist two positive constants c_1 and c_2 such that c_1 ‖x‖_a ≤ ‖x‖_b ≤ c_2 ‖x‖_a for all x.

    It can easily be shown that the necessary constants to show equivalence between the three l_p norms above are ‖x‖_∞ ≤ ‖x‖_2 ≤ \sqrt{n} ‖x‖_∞, ‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞, and ‖x‖_2 ≤ ‖x‖_1 ≤ \sqrt{n} ‖x‖_2.

    2.2.2 Matrix Norms

    The norm of a matrix is a function that maps a real n × n matrix, or a complex n × n matrix, to a real number and satisfies the following properties:

    1. ‖A‖ ≥ 0, with equality only occurring for A = 0.

    2. ‖λA‖ = |λ| ‖A‖,

    where λ is any scalar.

    3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ (Triangle inequality).

    4. ‖AB‖ ≤ ‖A‖ ‖B‖.

    We now introduce a definition to link the matrix norms to vector norms.

    Definition 2.10

    If a matrix norm and a vector norm satisfy

       ‖Ax‖ ≤ ‖A‖ ‖x‖,   (2.25)

    then the matrix norm is said to be consistent with the vector norm.

    It should be noted that the norm on the left-hand side of the inequality in (2.25) is a vector norm, and that the right-hand side of the inequality in (2.25) is the product of a matrix norm and a vector norm.

    Definition 2.11

    Given some vector norms, we define a matrix norm as follows:

       ‖A‖ = \max_{x \ne 0} \frac{‖Ax‖}{‖x‖},   (2.26)

    The matrix norm defined in (2.26) is called a subordinate matrix norm and is always consistent with the vector norm.

    Given Definitions 2.10 and 2.11, the subordinate lp norms for the maximum/infinity, the absolute and the l2 vector norms are

    ‖A‖_∞ = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|, which is the maximum of the row sums.

    ‖A‖_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|, which is the maximum of the column sums.

    ‖A‖_2 = \left( \max_{i} λ_i \right)^{1/2}, where λ_i is an eigenvalue of A^{T}A.

    Note: All of the matrix norms above are consistent with the corresponding lp vector norm.

    However, the l_2 matrix norm defined above appears quite different from its consistent vector norm. There is one more matrix norm that we shall introduce here; it appears to be similar in structure to the l_2 vector norm and is referred to as the Frobenius norm. The definition of the Frobenius norm is

       ‖A‖_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}.   (2.27)

    The Frobenius norm is consistent with the l_2 vector norm; however, it should be noted that the Frobenius norm is not subordinate to any vector norm.
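    A brief NumPy check of the subordinate matrix norms and the Frobenius norm described above, using an arbitrary 2 × 2 matrix as a hedged illustration.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, ord=np.inf))   # maximum row sum: |3| + |4| = 7
print(np.linalg.norm(A, ord=1))        # maximum column sum: |-2| + |4| = 6
print(np.linalg.norm(A, ord=2))        # square root of the largest eigenvalue of A^T A
print(np.linalg.norm(A, 'fro'))        # Frobenius norm: sqrt(1 + 4 + 9 + 16)
```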

    2.2.3 Conditioning of Matrices

    Conditioning of problems plays a vital part in geophysical numerical modeling. In this subsection we shall derive the condition number and explain how to interpret the number, as well as its implications for the accuracy of the numerical approximation that we are applying. If the problem we are approximating is sensitive to small computational errors, then we need a measure to quantify this sensitivity. Consider the generalized case

       F(x) = y,   (2.28)

    where we are seeking the unknown x and have y that contains data that the solution depends upon.

    The problem expressed in (2.28) is said to be well-posed, or stable depending on the literature, if the solution x depends in a continuous way on y: if we have a sequence of data that is tending towards y, then the corresponding sequence of solutions must also approach x in some way. As we are interested in sensitivity to small changes, an equivalent definition of well-posedness is to require that small changes to y produce only small changes in x. Problems that do not satisfy these loose definitions are referred to as ill-posed or unstable.

    If a problem is ill-posed, then there are serious difficulties in solving it. However, a continuous problem may be stable in the sense described above, and yet its numerical approximation could still involve difficulties when solving for a solution. The condition number is a measure that enables us to ascertain how stable a problem is. An important note here is that there are different degrees of stability in different problems.

    So why is the condition number so important? The condition number attempts to give guidance, or a measure, of the worst possible effect on the solution x that we are seeking, given small perturbations to y. To obtain the definition for the condition number, we consider small perturbations to both x and y denoted by δx and δy. This then makes x + δx the solution to the perturbed version of (2.28), which is expressed as

       F(x + δx) = y + δy.   (2.29)

    Given (2.29), the condition number, denoted κ, is defined as

       κ = \sup_{δy} \frac{‖δx‖ / ‖x‖}{‖δy‖ / ‖y‖}.   (2.30)

    If we were solving for a vector, then the measure in (2.30) would be one of the vector norms defined in (2.21)–(2.23); however, as a caveat, it should be noted that a different norm may be needed for x and y. In [14], the supremum is defined as the largest possible value for δy such that the perturbed equation (2.30) "makes sense."

    In [14], the way to interpret (2.30) is explained as a measure of the sensitivity of the solution x to small changes in the data. Also in [14] is an explanation of how to understand what the magnitude of κ implies: if κ is not large, then, given that δy is assumed to be quite small, δx must also be small; that is, small changes in the data, δy, result in small changes in x.

    Therefore, what the condition number informs us about is whether or not the continuous problem that we are going to approximate is sensitive to small changes. When numerically approximating continuous problems, it is hoped that the most accurate approximation is used, so that the errors introduced by the scheme will not result in large changes in the solution.

    However, it is not always possible to calculate the condition number of the continuous problem that you are seeking solutions to. Therefore, there is a different condition number, associated with the matrices that occur in the numerical approximation to the continuous problem, which can also give guidance about the order of accuracy to expect in the solution.

    2.2.4 Matrix Condition Number

    The matrix condition number comes about through considering the error analysis of the matrix equation

       Ax = b.   (2.31)

    The error analysis is based upon determining the sensitivity of the solution x to small perturbations. We start by considering the perturbed matrix equation

       A(x + δx) = b + r,   (2.32)

    where r is the residual and where we have assumed that (2.31) has a unique solution x of order n.

    Subtracting (2.31) from (2.32) results in

       Aδx = r, \qquad δx = A^{-1}r.   (2.33)

    Recall that the condition number for the continuous problem was defined as the ratio of the norm of the perturbation to the solution relative to the norm of the true solution, divided by the ratio of the norm of the perturbation to the data relative to the norm of the true data. For (2.31) and (2.32), the expression just mentioned is equivalent to

       \frac{‖δx‖ / ‖x‖}{‖r‖ / ‖b‖},   (2.34)

    where the expression in (2.34) is a way to examine the stability of (2.31) for all perturbations r that are small relative to b.

    To obtain the expression in (2.34) we first take the norms of the equations in (2.33), which results in

       ‖r‖ ≤ ‖A‖ ‖δx‖, \qquad ‖δx‖ ≤ ‖A^{-1}‖ ‖r‖.   (2.35)

    The next step is to divide the first inequality in (2.35) by ‖A‖ ‖x‖ and the second inequality by ‖x‖. The two divisions just mentioned enable us to find an inequality bound for the ratio of the norm of δx to the norm of x as

       \frac{‖r‖}{‖A‖ ‖x‖} ≤ \frac{‖δx‖}{‖x‖} ≤ \frac{‖A^{-1}‖ ‖r‖}{‖x‖}.   (2.36)

    To obtain the next step in the derivation we have to recall that the matrix norm is induced by the vector norm. This property enables the two following inequalities

       ‖b‖ ≤ ‖A‖ ‖x‖, \qquad ‖x‖ ≤ ‖A^{-1}‖ ‖b‖.   (2.37)

    Substituting the two inequalities in (2.37) into the inequality in (2.36) results in

       \frac{1}{‖A‖ ‖A^{-1}‖} \frac{‖r‖}{‖b‖} ≤ \frac{‖δx‖}{‖x‖} ≤ ‖A‖ ‖A^{-1}‖ \frac{‖r‖}{‖b‖}.   (2.38)

    The bounds throughout (2.38) are governed by the product of the norm of the matrix and the norm of the inverse of the matrix; it is this product that is referred to as the condition number. Therefore,

       κ(A) ≡ ‖A‖ ‖A^{-1}‖.   (2.39)

    Given the definition for the condition number, the next step is to understand how to interpret the meaning of the number and the guidance that it gives towards the accuracy of the solution to the matrix equation. A first property to notice is that the number will be dependent on the norm that is chosen. However, as shown below, the lower bound for the condition number, no matter the choice of norm, is always 1, as

       1 = ‖I‖ = ‖AA^{-1}‖ ≤ ‖A‖ ‖A^{-1}‖ = κ(A).   (2.40)

    Given the expression in (2.40), it is clear that if the condition number is close to 1, then we can see from (2.39) that relatively small perturbations in b lead to similarly small relative perturbations in x. However, the opposite is also true: if the condition number is large, then relatively small perturbations to b can lead to large changes in x.
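    As a hedged illustration of how the matrix condition number behaves, the sketch below compares a well-conditioned diagonal matrix with a nearly singular one; both matrices are arbitrary examples, not taken from the book.

```python
import numpy as np

well = np.array([[2.0, 0.0],
                 [0.0, 1.0]])
ill = np.array([[1.0, 1.0],
                [1.0, 1.0001]])   # rows are nearly linearly dependent

print(np.linalg.cond(well))       # close to 1: small changes in b give small changes in x
print(np.linalg.cond(ill))        # roughly 4e4: small perturbations to b can be
                                  # amplified into large changes in the solution x
```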

    Example 2.12

    Consider the following matrix equation, which represents a numerical approximation to advection on a periodic domain. How well conditioned is this numerical problem?

       (2.41)

    Upon looking at the matrix in (2.41), we can see that it is singular as a result of the zero determinant, so its condition number is extremely large. Due to the condition number being so large, it can be inferred that the continuous problem that this discrete approximation represents is ill-posed. There are techniques to deal with these types of problems. One is to add a small number to the diagonal entries to perturb the matrix away from singularity. Another approach is to fix a point, which is equivalent to re-writing the problem with an extra constraint and then discretizing as before for the remaining points. We shall go into more detail about the actual model that (2.41) arises from in Chapter 8.

    2.3 Eigenvalues and Eigenvectors

    As with many properties of matrices and vectors that we have introduced so far, eigenvalues and eigenvectors also play an important part in many aspects of numerical modeling, matrix decompositions, control theory, covariance modeling, and preconditioning, to name but a few areas. The eigenvalues of a matrix A are the roots of the associated characteristic polynomial, det(A − λI) = 0. The collection of the roots, the eigenvalues, is called the spectrum of A and is denoted λ(A) = {λ_1, λ_2, …, λ_N}, where the λ_i are the eigenvalues. An important property of eigenvalues is that the determinant of a square matrix is equal to the product of its eigenvalues,

    det(A) = \prod_{i=1}^{N} λ_i.

    Eigenvalues are also related to the trace of a matrix, where the trace of a matrix is the sum of its diagonal entries, i.e.,

    \mathrm{tr}(A) = \sum_{i=1}^{N} a_{ii} = \sum_{i=1}^{N} λ_i.

    If the eigenvalue λ is in the spectrum of A, then the non-zero vector x that satisfies the equation

       Ax = λx   (2.42)

    is referred to as an eigenvector. There are two types of eigenvectors; the first are the right eigenvectors, which satisfy (2.42). The second set are called the left eigenvectors and they satisfy the equation

       x^{H}A = λx^{H},   (2.43)

    where the superscript H is the conjugate transpose; if the matrix and vectors are real valued, then this is simply the transpose defined earlier. Unless stated otherwise, when we use the term eigenvector we are referring to the right eigenvectors. The term conjugate for complex numbers refers to a pair of imaginary numbers that have equal real parts, and also have equal imaginary parts in magnitude but of opposite sign, i.e., 1 − 2i is the conjugate of 1 + 2i.

    Example 2.13

    Find the eigenvalues and eigenvectors of the following 2 × 2 real matrix A given by

    The first step in finding the eigenvalues is to form the matrix A − λI, which for this example is

    Forming the determinant of the matrix above, we obtain the characteristic equation λ² − 2λ − 8 = 0,

    which implies that we have two distinct eigenvalues: λ_1 = −2 and λ_2 = 4. To find the associated eigenvectors for these eigenvalues, we have to form the matrix-vector equation Ax_i = λ_i x_i, for i = 1, 2. Rearranging this equation and factorizing the eigenvector, we now have to solve the following equation:

    Therefore, substituting the first eigenvalue into the equation above yields

    Following the same derivation above, it can easily be shown that the second eigenvector, x_2, for the matrix A can be found. An important property of the eigenvalues is that they can be used to determine if a matrix is singular. We stated at the beginning of this section that the determinant is related to the product of the eigenvalues. Therefore, if we have a zero eigenvalue, then the matrix is singular.
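    A small NumPy sketch tying the eigenvalue properties together: the matrix below is an arbitrary symmetric example chosen so that its eigenvalues happen to be −2 and 4, matching the values quoted in Example 2.13 (the book's actual matrix is not shown in this extract).

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                        # approximately [ 4., -2.] (ordering may vary)

# The determinant equals the product of the eigenvalues,
# and the trace equals their sum.
print(np.isclose(np.linalg.det(A), np.prod(eigvals)))   # True
print(np.isclose(np.trace(A), np.sum(eigvals)))         # True

# Each column of eigvecs is a right eigenvector: A x_i = lambda_i x_i.
for lam, x in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ x, lam * x))                  # True, True
```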

    Exercise 2.14

    Find the eigenvalues and eigenvectors of the following matrices and calculate their determinants

    Eigenvalues and eigenvectors play an important role in methods for transforming matrices. We consider these decompositions in the next subsection.

    2.4 Matrix Decompositions

    In this section we shall consider several different forms of decompositions of a matrix: the LU decomposition, the Cholesky decomposition, the QR decomposition, diagonalization, and the singular value decomposition. The first decomposition that we consider is the LU decomposition, which arises from what is referred to as Gaussian elimination.

    2.4.1 Gaussian Elimination and the LU Decomposition

    Gaussian elimination is a technique for solving a set of linear simultaneous equations. We have alluded to some aspects of the techniques involved with Gaussian elimination when we explained row and column reduction. If we have the matrix-vector equation Ax = b to solve, then we wish to find a way to transform the matrix A into an upper triangular matrix; the new system of equations can then be solved through the process of back substitution. As we just mentioned, we have already described this technique for the row reduction of the matrix, but now we have to apply the same factorizations and subtractions to the vector b.

    The algorithmic description for the Gaussian elimination is as follows:

    Step 1: We assume that the first diagonal entry of the starting matrix is not equal to zero. Given this assumption, we define the row multipliers by

    m_{i1} = \frac{a_{i1}}{a_{11}}, \qquad i = 2, 3, \ldots, N,

    where a_{11} is the first diagonal entry.

    Step 2: The second step is to eliminate the entries below the first leading diagonal entry in A and to apply the same subtraction to the entries in b; this is expressed mathematically as

    a_{ij}^{(2)} = a_{ij} - m_{i1}a_{1j}, \qquad b_{i}^{(2)} = b_{i} - m_{i1}b_{1}, \qquad i, j = 2, 3, \ldots, N.

    We repeat the two steps above for all N − 1 rows of the matrix A and the vector b below the first row, until what remains is an upper triangular matrix-vector equation of the form Ux = g.

    The reason for introducing Gaussian elimination is that it plays a role in the formation of the LU decomposition. We have been able to reduce the matrix A into an upper triangular matrix U such that we are solving the matrix equation Ux = g, where the vector g is the collection of all of the altered entries of b.

    We now have defined the U part of the LU decomposition, and so we move on to the L part, which comes from the row multipliers m. We have a set of row multipliers for each round of the elimination, and as such, if we store these multipliers in a lower triangular matrix, then we have

    L = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ m_{21} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ m_{N1} & m_{N2} & \cdots & 1 \end{pmatrix}.

    Now that we have both of the components of the LU decomposition, we can define the LU decomposition as follows.

    Theorem 2.15

    If we have a lower triangular matrix L and an upper triangular matrix U that have been found through the Gaussian elimination method just described, then the matrix A can be expressed as A = LU.

    The advantage of being able to write the full square matrix as the product of a lower and an upper triangular matrix is that it enables us to describe some properties of the matrix A. The first of these properties is

    det(A) = det(L)det(U) = \prod_{i=1}^{N} u_{ii},

    which is equivalent to the product of the diagonal entries of the U matrix, as the product of the diagonal entries of the L matrix is 1.

    The next property relates to the inverse of A. If we are able to perform a Gaussian elimination such that A = LU, then the inverse of A is given by A^{-1} = U^{-1}L^{-1},

    which plays an important part when trying to solve large matrix-vector equations, which occur quite frequently in numerical modeling. However, to make the numerics of the LU decomposition more stable, we may be required to perform pivoting. Pivoting is where we interchange rows or columns to have a better entry on the diagonal with which to perform Gaussian elimination. This means that we have a permutation matrix P that keeps a record of which rows and columns were interchanged, and the decomposition then becomes PA = LU,

    which then makes the inverse of A equal to A^{-1} = U^{-1}L^{-1}P.
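    The LU factorization with pivoting can be explored with SciPy; note that scipy.linalg.lu uses the convention A = P L U (the permutation applied to L U) rather than P A = L U, so the solve below permutes b with P^T. This is an illustrative sketch with an arbitrary matrix, not the book's example.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0])

P, L, U = lu(A)                      # partial pivoting: A = P L U
print(np.allclose(P @ L @ U, A))     # True

# Solve A x = b: forward substitution with L, then back substitution with U.
y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y, lower=False)
print(np.allclose(A @ x, b))         # True
```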

    2.4.2 Cholesky Decomposition

    Before we summarize this very useful decomposition, we need to introduce two important definitions. The first of these is the definition of a Hermitian matrix.

    Definition 2.16

    A matrix A is said to be a Hermitian matrix if it is equal to its own conjugate transpose. This implies that the element in the a_{ij} entry is equal to the complex conjugate of the element in the a_{ji} entry. This must be true for all i and j.

    An example of a Hermitian matrix is

    \begin{pmatrix} 2 & 1 + i \\ 1 - i & 3 \end{pmatrix}.

    An important feature to note about Hermitian matrices is that their diagonal entries have to be real numbers, as they must be their own complex conjugates. One way to interpret the Hermitian matrix is as the extension of symmetric matrices for real numbers to matrices with complex number entries, i.e., A^{H} = A.

    The second definition we require is that of definiteness of a matrix.

    Definition 2.17

    There are four different forms of definiteness for real, symmetric matrices, and by association for complex Hermitian matrices, as follows:

    1. A symmetric N × N real matrix A is said to be positive definite if the scalar that arises from z^{T}Az is positive for every non-zero real column vector z. That is to say, z^{T}Az > 0.

    2. A symmetric N × N real matrix A is said to be positive semidefinite if the scalar that arises from z^{T}Az is non-negative for every real column vector z. That is to say, z^{T}Az ≥ 0.

    3. A symmetric N × N real matrix A is said to be negative definite if the scalar that arises from z^{T}Az is negative for every non-zero real column vector z. That is to say, z^{T}Az < 0.

    4. A symmetric N × N real matrix A is said to be negative semidefinite if the scalar that arises from z^{T}Az is non-positive for every real column vector z. That is to say, z^{T}Az ≤ 0.

    The general definition for the different forms of definiteness, which includes the Hermitian matrices, is that an N × N Hermitian matrix is said to be positive definite if

       z^{H}Az > 0 \quad \text{for all non-zero } z \in ℂ^{N}.   (2.44)

    The inequality in (2.44) can easily be changed to obtain the complex matrix definition of positive semidefiniteness, negative definiteness and negative semidefiniteness.

    The reason for introducing the definitions above is that the Cholesky decomposition can only be applied to Hermitian positive semidefinite matrices.

    Definition 2.18

    For a Hermitian positive semidefinite matrix A, the Cholesky decomposition of A is defined as

       A = LL^{H},   (2.45)

    where L is a lower triangular matrix and L^{H} is the conjugate transpose of L.
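    A minimal NumPy sketch of the Cholesky decomposition (2.45) for a real symmetric positive-definite matrix; the matrix is an arbitrary example, and NumPy's routine requires strict positive definiteness rather than semidefiniteness.

```python
import numpy as np

# A real symmetric positive-definite matrix (arbitrary example).
A = np.array([[ 4.0,  2.0, -2.0],
              [ 2.0, 10.0,  2.0],
              [-2.0,  2.0,  5.0]])

L = np.linalg.cholesky(A)            # lower triangular Cholesky factor
print(np.allclose(L @ L.T, A))       # True: A = L L^H (L^T here, since A is real)

# For a matrix that is only semidefinite or is indefinite,
# np.linalg.cholesky raises numpy.linalg.LinAlgError.
```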

    Some important properties of the Cholesky decomposition are as follows:

    • Every Hermitian positive-definite matrix has a unique Cholesky decomposition.

    • If the matrix A
