Model Identification and Data Analysis

About this ebook

This book is about constructing models from experimental data. It covers a range of topics, from statistical data prediction to Kalman filtering, from black-box model identification to parameter estimation, from spectral analysis to predictive control.

Written for graduate students, this textbook offers an approach that has proven successful throughout the many years during which its author has taught these topics at his University.

The book:

  • Contains accessible methods explained step-by-step in simple terms
  • Offers an essential tool useful in a variety of fields, especially engineering, statistics, and mathematics
  • Includes an overview of random variables and stationary processes, as well as an introduction to discrete time models and matrix analysis
  • Incorporates historical commentaries to put into perspective the developments that have brought the discipline to its current state
  • Provides many examples and solved problems to complement the presentation and facilitate comprehension of the techniques presented

Language: English
Publisher: Wiley
Release date: Mar 20, 2019
ISBN: 9781119546313

    Book preview

    Model Identification and Data Analysis - Sergio Bittanti

    Introduction

    Today, a deluge of information is available in a variety of formats. Industrial plants are equipped with distributed sensors and smart metering; huge data repositories are preserved in public and private institutions; computer networks spread bits in any corner of the world at unexpected speed. No doubt, we live in the age of data.

    This new scenario in the history of humanity has made it possible to use new paradigms to deal with old problems and, at the same time, has led to challenging questions never addressed before. To reveal the information content hidden in observations, models have to be constructed and analyzed.

    The purpose of this book is to present the first principles of model construction from data in a simple form, so as to make the treatment accessible to a wide audience. As R.E. Kalman (1930–2016) used to say, Let the data speak: this is precisely our objective.

    Our path is organized as follows.

    We begin by studying signals with stationary characteristics (Chapter 1). After a brief presentation of the basic notions of random variable and random vector, we come to the definition of white noise, a peculiar process through which one can construct a fairly general family of models suitable for describing random signals. Then we move on to the realm of frequency domain by introducing a spectral characterization of data. The final goal of this chapter is to identify a wise representation of a stationary process suitable for developing prediction theory.

    In our presentation of random notions, we rely on elementary concepts: the mean, the covariance function, and the spectrum, without any assumption about the probability distribution of data. In Chapter 2, we briefly see how these features can be computed from data.

    For the simple dynamic models introduced in Chapter 1, we present the corresponding prediction theory. Given the model, this theory, explained in Chapter 3, enables one to determine the predictor with elementary computations. Since it was mainly developed by Andrey N. Kolmogorov and Norbert Wiener, we shall refer to it as the Kolmogorov–Wiener theory or simply K–W theory.

    Then, in Chapter 4, we start studying the techniques for the construction of a model from data. This transcription of long sequences of apparently confusing numbers into a concise formula that can be scribbled into our notebook is the essence and the magic of identification science.

    The methods for the parameter estimation of input–output models are the subject of Chapter 5. The features of the identified models when the number of snapshots tends to infinity are also investigated (asymptotic analysis). Next, the recursive versions of the various methods, suitable for real-time implementation, are introduced.

    In system modeling, one of the major topics that has attracted the attention of many scholars from different disciplines is the selection of the appropriate complexity. Here, the problem is that an overcomplex model, while offering better data fitting, may also fit the noise affecting measurements. So, one has to find a trade‐off between accuracy and complexity. This is discussed in Chapter 6.

    Considering that prediction theory is model‐based, our readers might conclude that the identification methods should be presented prior to the prediction methods. The reason why we have done the opposite is that the concept of prediction is very much used in identification.

    In Chapter 7, the problem of identifying a model in a state space form is dealt with. Here, the data are organized into certain arrays from the factorization of which the system matrices are eventually identified.

    The use of the identified models for control is concisely outlined in Chapter 8. Again prediction is at the core of such techniques, since their basic principle is to ensure that the prediction supplied by the model is close to the desired target. This is why these techniques are known as predictive control methods.

    Chapter 9 is devoted to Kalman theory (or simply K theory) for filtering and prediction. Here, the problem is to estimate the temporal evolution of the state of a system. In other words, instead of parameter estimation, we deal with signal estimation. A typical situation where such a problem is encountered is deep space navigation, where the position of a spacecraft has to be found in real time from available observations.

    At the end of this chapter, we compare the two prediction theories introduced in the book, namely we compare K theory with K–W theory of Chapter 3.

    We pass then to Chapter 10, where the problem of the estimation of an unknown parameter in a given model is treated.

    Identification methods have had and continue to have a huge number of applications, in engineering, physics, biology, and economics, to mention only the main disciplines. To illustrate their applicability, a couple of case studies are discussed in Chapter 11. The first deals with the analysis of the Kobe earthquake of 1995; this study involves most facets of the estimation procedure of input–output models, including parameter identification and model complexity selection. The second considers the problem of estimating the unknown frequency of a periodic signal corrupted by noise, by resorting to the input–output approach as well as to the state space approach via nonlinear Kalman techniques.

    There are, moreover, many numerical examples to accompany and complement the presentation and development of the various methods.

    In this book, we focus on the discrete time case. The basic pillars on which we rely are random notions, dynamic systems, and matrix theory.

    Random variables and stationary processes are gradually introduced in the first sections of the initial Chapter 1. As already said, our concise treatment hinges on simple notions, culminating in the concept of white process, the elementary brick for the construction of the class of models we deal with. Going through these pages, the readers will become progressively familiar with stationary processes and related ideas as tools for the description of uncertain data.

    The main concepts concerning linear discrete‐time dynamical systems are outlined in Appendix A. They range from state space to transfer functions, including their interplay via realization theory.

    In Appendix B, the readers who are not familiar with matrix analysis will find a comprehensive overview not only of eigenvalues and eigenvectors, determinant and basis, but also of the notion of rank and the basic tool for its practical determination, singular value decomposition.

    Finally, a set of problems with their solution is proposed in Appendix C.

    Most simulations presented in this volume have been performed with the aid of the MATLAB® package; see https://it.mathworks.com/help/ident/.

    The single guiding principle in writing this book has been to introduce and explain the subject to readers as clearly as possible.

    Acknowledgments

    The birth of a new book is an emotional moment, especially when it comes after years of research and teaching.

    This text is indeed the outcome of my years of lecturing on model identification and data analysis (MIDA) at the Politecnico di Milano, Italy. In its first years of existence, the course had a very limited number of students. Nowadays, there are various MIDA courses, offered to master's students of automation and control engineering, electronic engineering, bio-engineering, computer engineering, aerospace engineering, and mathematical engineering.

    In my decades of scientific activity, I have had the privilege of meeting and working with many scholars. Among them, focusing on the Italian community, are Paolo Bolzern, Claudio Bonivento, Marco Claudio Campi, Patrizio Colaneri, Antonio De Marco, Giuseppe De Nicolao, Marcello Farina, Simone Formentin, Giorgio Fronza, Simone Garatti, Roberto Guidorzi, Alberto Isidori, Antonio Lepschy, Diego Liberati, Arturo Locatelli, Marco Lovera, Claudio Maffezzoni, Gianantonio Magnani, Edoardo Mosca, Giorgio Picci, Luigi Piroddi, Maria Prandini, Fabio Previdi, Paolo Rocco, Sergio Matteo Savaresi, Riccardo Scattolini, Nicola Schiavoni, Silvia Carla Strada, Mara Tanelli, Roberto Tempo, and Antonio Vicino.

    I am greatly indebted to Silvia Maria Canevese for her generous help in the manuscript editing; thank you, Silvia. Joshua Burkholder, Luigi Folcini, Chiara Pasqualini, Grace Paulin Jeeva S, Marco Rapizza, Matteo Zovadelli, and Fausto Vezzaro also helped out with the editing in various phases of the work.

    I also express my gratitude to Guido Guardabassi for all our exchanges of ideas on this or that topic and for his encouragement to move toward the subject of data analysis in my early university days.

    Some of these persons, as well as other colleagues from around the world, are featured in the picture at the end of the book (taken at a workshop held in 2017 at Lake Como, Italy).

    A last note of thanks goes to the multitude of students I met over the years in my class. Their interest has been an irreplaceable stimulus for my never ending struggle to explain the subject as clearly and intelligibly as possible.

    Sergio Bittanti

    e‐mail: sergio.bittanti@polimi.it

    website: home.deib.polimi.it/bittanti/

    The support of the Politecnico di Milano and the National Research Council of Italy (Consiglio Nazionale delle Ricerche–CNR) is gratefully acknowledged.

    1

    Stationary Processes and Time Series

    1.1 Introduction

    Forecasting the evolution of a man-made system or a natural phenomenon is one of the most ancient problems of humankind. We develop here a prediction theory under the assumption that the variable under study can be considered a stationary process. The theory is easy to understand and simple to apply. Moreover, it lends itself to various generalizations, enabling one to deal with nonstationary signals.

    The organization is as follows. After an introduction to the prediction problem (Section 1.2), we concisely review the notions of random variable, random vector, and random (or stochastic) process in Sections 1.3–1.5, respectively. This leads to the definition of white process (Section 1.6), a key notion in the subsequent developments. The readers who are familiar with random concepts can skip Sections 1.3–1.5.

    Then we introduce the moving average (MA) process and the autoregressive (AR) process (Sections 1.7 and 1.8). By combining them, we come to the family of autoregressive and moving average (ARMA) processes (Section 1.10). This is the family of stationary processes we focus on in this volume.

    For such processes, in Chapter 3, we develop a prediction theory, thanks to which we can easily work out the optimal forecast given the model.

    In our presentation, we make use of elementary concepts of linear dynamical systems such as transfer functions, poles, and zeros; the readers who are not familiar with such topics are cordially invited to first study Appendix A.

    1.2 The Prediction Problem

    Consider a real variable y depending on discrete time t. The variable is observed over the interval [1, t]: y(1), y(2), …, y(t). The problem is to predict the value that the subsequent sample y(t + 1) will take.

    Various prediction rules may be conceived, providing a guess for y(t + 1) based on y(1), y(2), …, y(t). A generic predictor is denoted with the symbol ŷ(t + 1):

    ŷ(t + 1) = f(y(t), y(t − 1), …, y(1)).

    The question is how to choose function f.

    A possibility is to consider only a bunch of recent data, say y(t), y(t − 1), …, y(t − n + 1), and to construct the prediction as a linear combination of them with real coefficients a₁, a₂, …, aₙ:

    ŷ(t + 1) = a₁ y(t) + a₂ y(t − 1) + ⋯ + aₙ y(t − n + 1).

    The problem then becomes that of selecting the integer n and the most appropriate values for parameters a₁, a₂, …, aₙ.

    Suppose for a moment that n and a₁, a₂, …, aₙ were selected. Then the prediction rule is fully specified and it can be applied to the past time points for which data are available to evaluate the prediction error:

    ε(t) = y(t) − ŷ(t).
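
    To make the rule concrete, here is a minimal Python/NumPy sketch (the book's own simulations use MATLAB; the function name, data, and coefficients below are illustrative, not taken from the text) that applies a fixed linear predictor with given coefficients to an observed sequence and collects the resulting prediction errors.

```python
import numpy as np

def linear_prediction_errors(y, a):
    """Apply the fixed linear predictor
        y_hat(t) = a[0]*y(t-1) + a[1]*y(t-2) + ... + a[n-1]*y(t-n)
    to an observed sequence y and return the prediction errors
    eps(t) = y(t) - y_hat(t) at every time point where the rule applies."""
    y = np.asarray(y, dtype=float)
    n = len(a)
    errors = []
    for t in range(n, len(y)):
        # most recent n samples, newest first: y[t-1], y[t-2], ..., y[t-n]
        recent = y[t - 1::-1][:n]
        y_hat = np.dot(a, recent)
        errors.append(y[t] - y_hat)
    return np.array(errors)

# Illustrative use with made-up data and coefficients
y_obs = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8]
print(linear_prediction_errors(y_obs, a=[0.6, 0.3]))
```

    Inspecting these errors is exactly the diagnostic step discussed next: a good choice of n and of the coefficients should leave errors with no systematic structure.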

    Let's now consider this fundamental question: Which characteristics should the prediction error exhibit in order to conclude that we have constructed a good predictor? In principle, the best one can hope for is that the prediction error be null at any time point. However, in practice, this is Utopian. Hence, we have to investigate the properties that a non‐null should exhibit in order to conclude that the prediction is fair.

    For the sake of illustration, consider the case when has the time evolution shown in Figure 1.1a. As can be seen, the mean value of is nonzero. Correspondingly, the rule

    equation

    would be better than the original one. Indeed, with the new rule of prediction, one can get rid of the systematic error.

    Figure 1.1 Possible diagrams of the prediction error: (a) nonzero mean value; (b) zero mean value.

    As a second option, consider the case when the prediction error is given by the diagram of Figure 1.1b. Then the mean value is zero. However, the sign of changes at each instant; precisely, for even and for odd. Hence, even in such a case, a better prediction rule than the initial one can be conceived. Indeed, one can formulate the new rule:

    equation

    and

    equation

    From these simple remarks, one can conclude that the best predictor should have the following property: besides a zero mean value, the prediction error should have no regularity; rather, it should be fully unpredictable. In this way, the model captures the whole dynamics hidden in the data, no useful information remains concealed in the residual error, and no better predictor can be conceived. The intuitive concept of unpredictable signal has been formalized in the twentieth century, leading to the notion of white noise (WN) or white process, a concept we precisely introduce later in this chapter. For the moment, it is important to bear in mind the following conclusion: A prediction rule is appropriate if the corresponding prediction error is a white process.

    In this connection, we make the following interesting observation. Assume that ε(t) is indeed a white noise; then

    y(t) = a₁ y(t − 1) + a₂ y(t − 2) + ⋯ + aₙ y(t − n) + ε(t).

    Rewrite this difference equation by means of the delay operator z⁻¹, namely the operator such that

    z⁻¹ y(t) = y(t − 1).

    Then

    y(t) = a₁ z⁻¹ y(t) + a₂ z⁻² y(t) + ⋯ + aₙ z⁻ⁿ y(t) + ε(t),

    from which

    (1 − a₁ z⁻¹ − a₂ z⁻² − ⋯ − aₙ z⁻ⁿ) y(t) = ε(t),

    or

    y(t) = W(z) ε(t),

    with

    W(z) = 1 / (1 − a₁ z⁻¹ − a₂ z⁻² − ⋯ − aₙ z⁻ⁿ).

    By reinterpreting z as the complex variable, this relationship becomes the expression of a dynamical system with transfer function (from ε to y) given by W(z).

    Summing up, finding a good predictor is equivalent to determining a model supplying the given sequence of data as the output of a dynamical system fed by white noise (Figure 1.2).

    Figure 1.2 Interpreting a sequence of data as the output of a dynamic model fed by white noise.

    This is why studying dynamical systems having a white noise at the input is a main preliminary step toward the study of prediction theory.
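
    As a numerical illustration of Figure 1.2, the following sketch feeds a Gaussian white sequence into the difference equation written above and generates the corresponding output signal. The coefficients, the noise variance, and the function name are arbitrary choices for the example, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_from_white_noise(a, lam2, N):
    """Generate y(t) = a[0]*y(t-1) + ... + a[n-1]*y(t-n) + e(t),
    where e(t) is a zero-mean white sequence with variance lam2.
    This is the block scheme of Figure 1.2: a dynamical system fed by white noise."""
    n = len(a)
    e = rng.normal(0.0, np.sqrt(lam2), N)   # Gaussian white noise samples
    y = np.zeros(N)
    for t in range(N):
        past = sum(a[k] * y[t - 1 - k] for k in range(n) if t - 1 - k >= 0)
        y[t] = past + e[t]
    return y

y = simulate_from_white_noise(a=[0.5, -0.2], lam2=1.0, N=500)
print(y[:5])
```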

    The road we follow toward this objective relies first on the definition of white noise, which we pursue in four stages: random variable → random vector → stochastic process → white noise.

    1.3 Random Variable

    A random (or stochastic) variable is a real variable that depends upon the outcome of a random experiment. For example, the variable taking the value or depending on the result of the tossing of a coin is a random variable.

    The outcome of the random experiment is denoted by ; hence, a random variable is a function of : .

    For our purposes, a random variable is described by means of its mean value (or expected value) and its variance, which we will denote by and , respectively.

    The mean value is the real number around which the values taken by the variable fluctuate. Note that, given two random variables, and with mean values and , the random variable

    equation

    obtained as a linear combination of and via the real numbers and , has a mean value:

    equation

    The variance captures the intensity of fluctuations around the mean value. To be precise, it is defined as

    equation

    where denotes the mean value of . Obviously, being non‐negative, the variance is a real non‐negative number.

    Often, the variance is denoted with symbols such as or . When one deals with various random variables, the variance of the th variable may be denoted as or .

    The square root of the variance is called standard deviation, denoted by or . If the random variable has a Gaussian distribution, then the mean value and the variance define completely the probability distribution of the variable. In particular, if a random variable is Gaussian, the probability that it takes value in the interval and is about . So if is Gaussian with mean value 10 and variance 100, then, in cases, the values taken by range from to .
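
    As a quick check of the numerical example above (mean value 10 and variance 100, hence standard deviation 10), the following sketch draws Gaussian samples with those moments and measures how often they fall within two and three standard deviations of the mean. The stated fractions are standard Gaussian facts, not figures taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian random variable with mean 10 and variance 100 (standard deviation 10)
mean, var = 10.0, 100.0
x = rng.normal(mean, np.sqrt(var), 100_000)

print(x.mean(), x.var())                          # close to 10 and 100
within_2sd = np.mean(np.abs(x - mean) <= 2 * np.sqrt(var))
within_3sd = np.mean(np.abs(x - mean) <= 3 * np.sqrt(var))
print(within_2sd, within_3sd)                     # about 0.95 and 0.997 for a Gaussian
```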

    1.4 Random Vector

    A random (or stochastic) vector is a vector whose elements are random variables. We focus for simplicity on the bi‐dimensional case, namely, given two random variables and ,

    equation

    is a random vector (of dimension 2). The mean value of a random vector is defined as the vector of real numbers constituted by the mean values of the elements of the vector. Thus,

    equation

    where and are the mean values of and , respectively. The variance is a matrix given by

    equation

    where

    equation

    Here, besides variances and of the single random variables, the so‐called cross‐variance between and , , and cross‐variance between and , , appear. Obviously, , so that is a symmetric matrix.

    It is easy to verify that the variance matrix can also be written in the form

    equation

    where ′ denotes transpose.

    In general, for a vector of any dimension, the variance matrix is given by

    equation

    where is the vector whose elements are the mean values of the random variables entering .

    If is a vector with entries, is a matrix. In any case, is a symmetric matrix having the variances of the single variables composing vector along the diagonal and all cross‐variances as off‐diagonal terms.

    A remarkable feature of a variance matrix is that it is a positive semi‐definite matrix.

    Remark 1.1 (Positive semi‐definiteness)

    The notions of positive semi‐definite and positive definite matrix are explained in Appendix B. In a very concise way, given a real symmetric matrix , associate to it the scalar function defined as , where is an ‐dimensional real vector. For example, if

    equation

    we take

    equation

    Then

    equation

    Hence, is quadratic in the entries of vector . Matrix is said to be

    positive semi‐definite if ,

    positive definite if it is positive semi‐definite and only for

    We write and to denote a positive semi‐definite and a positive definite matrix, respectively.

    We can now verify that, for any random vector , is positive semi‐definite. Indeed, consider

    equation

    Then

    equation

    Here, we have used the property . Observe now that , being the product of a row vector times a column vector, is a scalar. As such, it coincides with its transpose: . Therefore,

    equation

    This is the expected value of a square, namely a non‐negative real number. Therefore, this quantity is non‐negative for any . Hence, we come to the conclusion that any variance matrix is positive semi‐definite. We simply write

    equation
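
    The following sketch illustrates this conclusion numerically: it builds the sample variance matrix of a two-dimensional random vector in the form of the average of (v − E[v])(v − E[v])′ and checks that its eigenvalues are non-negative. The data-generating mechanism is an arbitrary choice for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two correlated random variables stacked in a vector v = [v1, v2]'
v1 = rng.normal(0.0, 1.0, 50_000)
v2 = 0.7 * v1 + rng.normal(0.0, 0.5, 50_000)
V = np.vstack([v1, v2])                       # each column is one sample of v

m = V.mean(axis=1, keepdims=True)             # mean vector
Sigma = (V - m) @ (V - m).T / V.shape[1]      # sample variance matrix

print(Sigma)
print(np.linalg.eigvalsh(Sigma))              # both eigenvalues are non-negative
```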

    1.4.1 Covariance Coefficient

    Among the remarkable properties of positive semi‐definite matrices, there is the fact that their determinant is non‐negative (see Appendix B). Hence, referring to the two‐dimensional case,

    equation

    Under the assumption that and , this inequality suggests to define

    equation

    is known as covariance coefficient between random variables and . When and have zero mean value, is also known as correlation coefficient. The previous inequality on the determinant of the variance matrix can be restated as follows:

    equation

    One says that and are uncorrelated when . If instead or , one says that they have maximal correlation.

    Example 1.1 (Covariance coefficient for two random variables subject to a linear relation)

    Given a random variable , with , consider the variable

    equation

    where is a real number. To determine the covariance coefficient between and , we compute the mean value and the variance of as well as the cross‐covariance . The mean value of is

    equation

    Its variance is easily computed as follows:

    equation

    As for the cross‐variance, we have

    equation

    Therefore,

    equation

    Finally, if , then . In conclusion,

    equation

    In particular, we see that, if , the correlation is maximal in absolute value. This is expected, since, being , knowing the value taken by , one can evaluate without any error.
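
    A short numerical illustration of this example confirms the conclusion: for two variables linked by a linear relation, the covariance coefficient equals 1 in absolute value, with the sign of the coefficient. The coefficient value used below (here called alpha) and the data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = -2.5                          # any nonzero real coefficient (illustrative)
x = rng.normal(0.0, 1.0, 100_000)
y = alpha * x                         # linear relation between the two variables

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())
print(rho)                            # close to -1: maximal correlation, sign of alpha
```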

    1.5 Stationary Process

    A random or stochastic process is a sequence of random variables ordered with an index , referred to as time. We consider as a discrete index ( ). The random variable associated with time is denoted by . It is advisable to recall that a random variable is not a number, it is a real function of the outcome of a random experiment . In other words,

    equation

    Thus, a stochastic process is an infinite sequence of real variables, each of which depends upon two variables, time and outcome . Often, for simplicity in notation, the dependence upon is omitted and one simply writes to denote the process. However, one should always keep in mind that depends also upon the outcome of an underlying random experiment, .

    Once a particular outcome is fixed, the set defines a real function of time : . Such a function is named a process realization. To each outcome, a realization is associated. Hence, the set of realizations is the set of possible signals that the process can exhibit depending on the specific outcome of a random experiment. If, on the contrary, time is fixed, then one obtains , the random variable at time extracted from the process.

    Example 1.2 (Random process determined by the tossing of a coin)

    Consider the following process: Toss a coin, if the outcome is heads, then we associate to it the function , if the outcome is tails, we associate the function . The random process so defined has two sinusoidal signals as realizations, and . At a given time point , the process is a random variable , which can take two values, and .

    The simplest way to describe a stochastic process is to specify its mean function and its covariance function.

    Mean function:

    The mean function is defined as

    equation

    Operator performs the average over all possible outcomes of the underlying random experiment. Hence, we also write

    equation

    In such averaging, is a fixed parameter. Therefore, does not depend upon anymore; it depends on only. is the function of time around which the samples of the random variable fluctuate.

    Variance function:

    The variance function of the process is

    equation

    It provides the variances of the random variables at each time point.

    Covariance function:

    The covariance function captures the mutual dependence of two random variables extracted from the process at different time points, say at times and . It is defined as

    equation

    It characterizes the interdependence between the deviation of around its mean and the deviation of around its mean value . Note that, if we consider the same function with exchanged indexes, i.e. , we have

    equation

    Since

    , it follows that

    (1.1) equation

    Furthermore, by setting we obtain

    (1.2) equation

    This is the variance of random variable . Hence, when the two time indexes coincide, the covariance function supplies the process variance at the given time point.

    We are now in a position to introduce the concept of stationary process.

    Definition 1.1

    A stochastic process is said to be stationary when

    is constant,

    is constant,

    depends upon only.

    Therefore, the mean value of a stationary process is simply indicated as

    m = E[y(t)],

    and the covariance function can be denoted with the symbol γ(τ), where τ = t₁ − t₂:

    γ(τ) = E[(y(t) − m)(y(t − τ) − m)].

    Note that, for τ = 0, from this expression, we have γ(0) = E[(y(t) − m)²]. In other words, γ(0) is the variance of the process.

    Summing up, a stationary stochastic process is described by its mean value m (a real number) and its covariance function γ(τ) (a real function). The variance of the process is implicitly given by the covariance function at τ = 0.

    We now review the main properties of the covariance function of a stationary process.

    γ(0) ≥ 0.

    Indeed, γ(0) is a variance.

    γ(τ) = γ(−τ).

    This is a consequence of (1.1) (taking t₁ = t and t₂ = t − τ).

    |γ(τ)| ≤ γ(0).

    Indeed, consider any pair of random variables drawn from the process, say y(t) and y(t − τ), with different time points. The covariance coefficient between such variables is

    ρ = γ(τ) / γ(0).

    On the other hand, we know that |ρ| ≤ 1, so that |γ(τ)| cannot exceed γ(0).

    This last property suggests the definition of the normalized covariance function as

    ρ(τ) = γ(τ) / γ(0).

    Obviously, ρ(0) = 1, while |ρ(τ)| ≤ 1 for every τ. Note that, for τ ≠ 0, γ(τ) and ρ(τ) may be either positive or negative.

    Further properties of the covariance function are discussed in Section 2.2.
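
    A simple way to visualize these properties is to estimate the covariance function from a single realization. How such estimates are computed is the subject of Chapter 2, so the sketch below is only a preview, with illustrative function names, data, and parameter choices.

```python
import numpy as np

def sample_covariance(y, max_lag):
    """Estimate the covariance function gamma(tau) of a stationary process
    from a single realization y, for tau = 0, 1, ..., max_lag, together with
    the normalized covariance rho(tau) = gamma(tau) / gamma(0)."""
    y = np.asarray(y, dtype=float)
    m = y.mean()                          # estimate of the constant process mean
    N = len(y)
    gamma = np.array([np.mean((y[:N - tau] - m) * (y[tau:] - m))
                      for tau in range(max_lag + 1)])
    return gamma, gamma / gamma[0]

rng = np.random.default_rng(4)
e = rng.normal(0.0, 1.0, 2000)
y = e[1:] + 0.5 * e[:-1]                  # a simple correlated signal, for illustration
gamma, rho = sample_covariance(y, max_lag=5)
print(np.round(rho, 3))                   # rho(0) = 1 and |rho(tau)| <= 1 at every lag
```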

    1.6 White Process

    A white process is defined as the stationary stochastic process having the following covariance function:

    equation

    This means that, if we take any pair of time points and with , the deviations of and from the process mean value are uncorrelated whatever and be. Thus, the knowledge of the value of the process at time is of no use to predict the value of the process at time , . The only prediction that can be formulated is the trivial one, the mean value. This is why the white process is a way to formalize the concept of fully unpredictable signal.

    The white process is also named white noise (WN).

    We will often use the compact notation

    equation

    to mean that is a white process with

    ,

    ,

    .

    The white noise is the basic brick to construct the family of stationary stochastic processes that we work with.

    1.7 MA Process

    An MA process is a stochastic process y(t) generated as a linear combination of the current and past values of a white process e(t):

    y(t) = c₀ e(t) + c₁ e(t − 1) + ⋯ + cₙ e(t − n),

    where c₀, c₁, …, cₙ are real numbers.

    We now determine the main features of . We start with the computation of the mean value and the variance of . As for the mean,

    equation

    Since it follows that

    Passing to the variance, we have

    equation

    being white, all mean values of the cross‐products of the type with are equal to zero. Hence,

    equation

    Turn now to the covariance function . First, we consider the case when , and for simplicity, we set and . Then

    equation

    It is easy to see that the same conclusion holds true if , so that

    equation

    Analogous computations can be performed for , obtaining

    equation

    We see that and do not depend on time .

    In general, we come to the conclusion that does not depend upon and separately; it depends upon only. Precisely,

    equation

    Summing up, any MA process has

    constant mean value,

    constant variance,

    covariance function depending upon the distance between the two considered time points.

    Therefore, it is a stationary process, whatever values parameters , may take.
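
    To illustrate this stationarity numerically, the sketch below simulates an MA process with the notation used above (coefficients c and white input e) and compares the empirical covariances with the theoretical values, namely the noise variance times the sum of the products of coefficients c_j and c_{j+τ}, which vanish for lags beyond the model order. The coefficients and the noise variance are illustrative, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(5)

c = np.array([1.0, 0.5, 0.25])        # illustrative MA coefficients c0, c1, c2
lam2 = 1.0                            # variance of the white noise e(t)
N = 200_000

e = rng.normal(0.0, np.sqrt(lam2), N)
y = np.convolve(e, c)[:N]             # y(t) = c0*e(t) + c1*e(t-1) + c2*e(t-2)

# Theoretical covariance of this MA(2) process:
#   gamma(tau) = lam2 * sum_j c[j] * c[j + tau]  for |tau| <= 2, and 0 otherwise
for tau in range(4):
    theo = lam2 * np.sum(c[:len(c) - tau] * c[tau:]) if tau < len(c) else 0.0
    emp = np.mean((y[:N - tau] - y.mean()) * (y[tau:] - y.mean()))
    print(tau, round(theo, 3), round(emp, 3))
```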

    Observe that the expression of an MA process

    y(t) = c₀ e(t) + c₁ e(t − 1) + ⋯ + cₙ e(t − n)

    can be restated by means of the delay operator z⁻¹ as

    y(t) = c₀ e(t) + c₁ z⁻¹ e(t) + ⋯ + cₙ z⁻ⁿ e(t).

    Then, by introducing the operator

    C(z) = c₀ + c₁ z⁻¹ + ⋯ + cₙ z⁻ⁿ,

    one can write

    y(t) = C(z) e(t).

    From this expression, the transfer function from e to y can be worked out:

    W(z) = C(z) = (c₀ zⁿ + c₁ zⁿ⁻¹ + ⋯ + cₙ) / zⁿ.

    Note that this transfer function has n poles in the origin of the complex plane, whereas the zeros, the roots of the polynomial c₀ zⁿ + c₁ zⁿ⁻¹ + ⋯ + cₙ, may be located in various positions, depending on the values of the parameters.
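
    This remark about poles and zeros can be checked in a couple of lines: writing the transfer function in positive powers of z, the zeros are the roots of the numerator polynomial, while all the poles sit at the origin. The coefficients below are illustrative.

```python
import numpy as np

# Transfer function of the MA process in positive powers of z:
#   W(z) = (c0*z^n + c1*z^(n-1) + ... + cn) / z^n
# so the n poles are at the origin and the zeros are the roots of the numerator.
c = [1.0, -1.3, 0.4]                  # c0, c1, c2 (illustrative)
zeros = np.roots(c)                   # roots of c0*z^2 + c1*z + c2
poles = np.zeros(len(c) - 1)          # n = 2 poles, all located at the origin
print("zeros:", zeros)
print("poles:", poles)
```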

    Remark 1.2 (MA(∞) process)

    We extrapolate the above notion of MA(n) process and consider the MA(∞) case too:

    y(t) = c₀ e(t) + c₁ e(t − 1) + c₂ e(t − 2) + ⋯ .

    Of course, this definition requires some caution, as in any series of infinite terms. If the white process has zero mean value, then also has a zero mean value. The variance can be obtained by extrapolating the expression of the variance
