Introduction to Bayesian Estimation and Copula Models of Dependence
About this ebook

Presents an introduction to Bayesian statistics with an emphasis on Bayesian methods (prior and posterior distributions), Bayes estimation, prediction, MCMC, Bayesian regression, and Bayesian analysis of statistical models of dependence, with a focus on copulas for risk management.

Introduction to Bayesian Estimation and Copula Models of Dependence emphasizes the applications of Bayesian analysis to copula modeling and equips readers with the tools needed to implement the procedures of Bayesian estimation in copula models of dependence. This book is structured in two parts: the first four chapters serve as a general introduction to Bayesian statistics with a clear emphasis on parametric estimation, and the following four chapters stress statistical models of dependence with a focus on copulas.

A review of the main concepts is provided along with the basics of Bayesian statistics, including prior information and experimental data and prior and posterior distributions, with an emphasis on Bayesian parametric estimation. The basic mathematical background of both Markov chains and Monte Carlo integration and simulation is also provided. The authors discuss statistical models of dependence with a focus on copulas and present a brief survey of pre-copula dependence models. The main definitions and notations of copula models are summarized, followed by discussions of real-world cases that address particular risk management problems.

In addition, this book includes:

• Practical examples of copulas in use including within the Basel Accord II documents that regulate the world banking system as well as examples of Bayesian methods within current FDA recommendations

• Step-by-step procedures of multivariate data analysis and copula modeling, allowing readers to gain insight for their own applied research and studies

• Separate reference lists within each chapter and end-of-the-chapter exercises within Chapters 2 through 8

• A companion website containing appendices: data files and demo files in Microsoft® Office Excel®, basic code in R, and selected exercise solutions

Introduction to Bayesian Estimation and Copula Models of Dependence is a reference and resource for statisticians who need to learn formal Bayesian analysis as well as professionals within analytical and risk management departments of banks and insurance companies who are involved in quantitative analysis and forecasting. This book can also be used as a textbook for upper-undergraduate and graduate-level courses in Bayesian statistics and analysis.

ARKADY SHEMYAKIN, PhD, is Professor in the Department of Mathematics and Director of the Statistics Program at the University of St. Thomas. A member of the American Statistical Association and the International Society for Bayesian Analysis, Dr. Shemyakin's research interests include information theory, Bayesian methods of parametric estimation, and copula models in actuarial mathematics, finance, and engineering.

ALEXANDER KNIAZEV, PhD, is Associate Professor and Head of the Department of Mathematics at Astrakhan State University in Russia. Dr. Kniazev's research interests include representation theory of Lie algebras and finite groups, mathematical statistics, econometrics, and financial mathematics.

Language: English
Publisher: Wiley
Release date: March 3, 2017
ISBN: 9781118959022

    Introduction to Bayesian Estimation and Copula Models of Dependence - Arkady Shemyakin

    Acknowledgments

    We express our sincere appreciation to all of our friends and colleagues who contributed to this book in many different ways. We are grateful for many discussions we had and many comments we received from Paul Alper, Susan Callaway, Oleg Lepekhin, Alex McNeil, Olga Demidova, and many others.

    We deeply appreciate the assistance we have received with collecting supplementary materials and preparing files for the companion website from Laura Hanson, Kathryn McKee, Cheryl Heskin, Matthew Galloway, Shannon Currier, Natalie Vandeweghe, and Stephanie Fritz.

    We are grateful to our collaborators on the research projects which became the foundation for the book: Heekyung Youn, Natalia Kangina, Alicia Johnson, Vadim Gordeev, Matthew Galloway, Ellen Klingner, Nicole Lenz, Nicole Lopez, Kelsie Sargent, and Katheryn Wifvat.

    We want to thank all of those to whom the material in the book was presented over the years and who contributed with their thoughtful comments regarding both the form and the content of the book. Special thanks are due to Sarah Millholland, Valentin Artemiev, Yuri Chepasov, Anastasia Rozhkova, Ekaterina Savinkina, Alexander Zyrianov, Doug Swisher, Laura Fink, Eric Schlicht, and many other students and seminar participants.

    We would like to acknowledge the staff of our schools, Astrakhan State University and the University of St. Thomas in Minnesota, for their support of our work and making this project possible.

    We also would like to thank all the editorial staff of Wiley: Susanne Steitz-Filler, Sari Friedman, Amy Hendrickson, Divya Narayanan, and others who at various stages helped with the completion of the book.

    We are also grateful to Ekaterina Kniazeva and Alexandra Savinkina for proof-reading the text and making suggestions which helped immensely in improving the style of the book and hopefully making it readable.

Finally, our deepest gratitude goes to all our families, without whose support and companionship the project would not have been possible.

    A.S. and A.K.

    Acronyms

    Glossary

    About the Companion Website

    This book is accompanied by a companion website:

    http://www.wiley.com/go/shemyakin/bayesian_estimation

    The website includes:

    Solutions to selected exercises

    Excel dataset

    Excel simulation templates

    Appendices for Chapter 8

    Datasets and results

    Code in R

    Introduction

Why does anyone need another book on Bayesian statistics? It seems that there already exist plenty of resources for those interested in the topic. There are many excellent books covering specific aspects of Bayesian analysis or providing a wide and comprehensive background of the entire field: Berger, Bernardo and Smith, Gamerman and Freitas Lopes, Gelman et al., Robert and Casella, and many others. Most of these books, though, assume a certain mathematical and statistical background and are better suited to a reader at the graduate or advanced graduate level. Of those aimed at a less sophisticated audience, we would certainly recommend the excellent books by William Bolstad, John Kruschke, and Peter Lee. There also exist some very good books on copulas: the comprehensive coverage by Nelsen and by Joe, and also the more application-oriented Cherubini et al., Embrechts et al., and some others. However, instead of just referring to these works and returning to our extensive to-do lists, we decided to spend a considerable amount of time and effort putting together another book: the book we presently offer to the reader.

The main reason for our endeavor is that we target a very specific audience, which, we believe, is not yet sufficiently served by the Bayesian literature. We communicate with members of this audience routinely in our day-to-day work, and we could not fail to notice that simply giving them reading recommendations does not seem to satisfy their needs. Our intended audience can be loosely divided into two groups. The first includes advanced undergraduate students of Statistics, who in all likelihood have already had some exposure to the main probabilistic and statistical principles and concepts (most likely in a classical or frequentist setup), and may (as we probably all do) exhibit some traces of Bayesian philosophy as applicable to their everyday lives. But for them these two things, statistical methods on one hand and Bayesian thinking on the other, belong to very different spheres and do not easily combine in their decision-making process.

The second group consists of practitioners of statistical methods, working in their everyday lives on specific problems requiring the use of advanced quantitative analysis. They may be aware of a Bayesian alternative to classical methods and find it vaguely attractive, but are not familiar enough with formal Bayesian analysis to put it to work. These practitioners populate the analytical departments of banks, insurance companies, and other major businesses. In short, they might be involved in predictive modeling, quantitative forecasting, and statistical reporting, which often directly call for a Bayesian approach.

In recent years, we have frequently encountered representatives of both groups described above as our collaborators, be it in undergraduate research or in applied consulting projects, or both at once (such things do happen). We have discovered a frequent need to provide them with a crash course in Bayesian methods: prior and posterior, Bayes estimation, prediction, MCMC, Bayesian regression and time series, Bayesian analysis of statistical dependence. From this environment came the idea to concisely summarize the methodology we normally share with our collaborators in order to provide the framework for successful joint projects. Later on this idea transformed itself into the fantasy of writing this book and handing it to these two segments of the audience as a potentially useful resource. This intention determines the content of the book and dictates the necessity to cover specific topics in a specific order, while trying to avoid unnecessary detail. That is why we do not include a serious introduction to probability and classical statistics (we believe that our audience has at least some formal knowledge of the main principles and facts in these fields). Instead, in Chapter 1 we just offer a review of the concepts we will eventually use. If this review happens to be insufficient for some readers, it will hopefully at least inspire them to turn to the introductory books that provide more comprehensive coverage.

Chapter 2 deals with the basics of Bayesian statistics: prior information and experimental data, prior and posterior distributions, with emphasis on Bayesian parametric estimation, just barely touching on Bayesian hypothesis testing. Some time is spent addressing subjective versus objective Bayesian paradigms, along with a brief discussion of noninformative priors. We spend just enough time on conjugate priors and the analytical derivation of Bayes estimators to give an idea of the scope and limitations of the analytical approach. It seems likely that most readers in their practical applications will require the use of MCMC, the Markov chain Monte Carlo method, which is the most efficient tool in the hands of modern Bayesians. Therefore, Chapter 3 contains the basic mathematical background on both Markov chains and Monte Carlo integration and simulation. In our opinion, successful use of Markov chain Monte Carlo methods is heavily based on a good understanding of these two components. Speaking of Monte Carlo methods, the central idea of variance reduction nicely transitions us to MCMC and its diagnostics. Equally important, both Markov chains and Monte Carlo methods have numerous important applications outside of the Bayesian setting, and these applications will be discussed as examples.

Chapter 4 covers MCMC per se. It may look suspicious from a traditional point of view that we do not particularly emphasize Gibbs sampling, deciding instead to dwell on the Metropolis–Hastings algorithm in its two basic versions: independent Metropolis and random walk Metropolis–Hastings. In our opinion, this approach allows us to minimize the theoretical exposure and get to the point using simple examples. Also, in the more advanced examples at the end of the book, Gibbs sampling will rarely work without Metropolis. Another disclosure we have to make: there exists a huge library of MCMC computer programs, including those available online as freeware. All necessary references are given in the text, including OpenBUGS and several R packages. However, we also introduce some very rudimentary computer code which allows the readers to get inside the algorithms. This expresses the authors' firm belief that do-it-yourself is often the best way, if not to actually apply statistical computing, then at least to learn how to use it.

This might be a good time to explain the authors' attitude to the use of computer software while reading the book. Clearly, when working on the exercises, many readers would find it handy to use some computing tools, and many readers would like to use the software of their choice (be it SPSS, Matlab, Stata, or any other package). We try to structure our exercises and text examples in a way that makes this as easy as possible. What we offer from our perspective, in addition to this possibility, is a number of illustrations containing code and/or output in Microsoft Excel, Mathematica, and R. Our choice of Mathematica for providing graphics is simply explained by the background of the audience of the short courses where the material of the book has been tried out. We found it hard to refuse to treat ourselves and the readers to Mathematica's nice graphical tools. R may well be the software of choice for modern statisticians; therefore we find its use along with this book both handy and inevitable. We can take only limited responsibility for the readers' introduction to R, restricting ourselves to the specific uses of this language that accompany the book. There exist a number of excellent R tutorials, both in print and online, and all necessary references are provided. We can especially recommend the R book by Crawley, and the book by Robert and Casella.

The first four chapters form the first part of the book, which can be used as a general introduction to Bayesian statistics with a clear emphasis on parametric estimation. Now we need to explain what is included in the remaining chapters, and what the link is between the book's two parts. Our world is a complicated one. Due to the recent progress of communication tools and the globalization of the world economy and information space, we humans are less and less like separate universes leading our independent lives (was it ever entirely true?). Our physical lives, our economic existence, and our information fields become more and more interrelated. Many processes successfully modeled in the past by probability and statistical methods assuming independent behavior of the components become more and more intertwined. Brownian motion is still around, as is Newtonian mechanics, but they often fail to serve as good models for many complicated systems with component interactions. This explains an increased interest in modeling statistical dependence: be it dependence of physical lives in demography and biology, dependence of financial markets, or dependence between the components of a complex engineering system. Out of the many models of statistical dependence, copulas play a special role. They provide an attractive alternative to such traditional tools as correlation analysis or Cox's proportional hazards. The key factor in the popularity of copulas in applications to risk management is that they model the entire joint distribution function and are not limited to its moments. This allows for the treatment of nonlinear dependence, including joint tail dependence, going far beyond the standard analysis of correlation. The limitations of more traditional correlation-based approaches to modeling risks were felt around the world during the last financial crisis.

Chapter 5 is dedicated to a brief survey of pre-copula dependence models, providing the necessary background for Chapter 6, where the main definitions and notations of copula models are summarized. Special attention is dedicated to the somewhat controversial problem of model selection. Here, due to the wide variety of points of view expressed in the modern literature, the authors have to narrow down the survey in line with their (perhaps subjective) preferences.

The two types of copulas most popular in applications, Gaussian copulas and copulas from the Archimedean family (Clayton, Frank, Gumbel–Hougaard, and some others), are introduced and compared from a model selection standpoint in Chapter 7. The suggested principles of model selection have to be illustrated by more than just short examples. This explains the emergence of the last sections of Chapters 7 and 8, which contain cases dealing with particular risk management problems. The choice of the cases has to do with the authors' recent research and consulting experience. The purpose of these cases is to provide the readers with an opportunity to follow the procedures of multivariate data analysis and copula modeling step by step, enabling them to use these cases as either templates or insights for their own applied research studies. The authors do not take on the ambitious goal of reviewing the state of the art in Bayesian statistics or copula models of dependence. The emphasis is clearly on applications of Bayesian analysis to copula modeling, which are still remarkably rare due to the factors discussed above as well as, possibly, some other reasons unknown to the authors. The main focus of the book is on equipping the readers with tools that allow them to implement the procedures of Bayesian estimation in copula models of dependence. These procedures seem to provide a path (one of many) into the statistics of the near future. The omens are hard to miss: copulas found their way into the Basel Accord II documents regulating the world banking system, and Bayesian methods are mentioned in recent FDA recommendations.

The material of the book was presented in various combinations as the content of special topic courses at both schools where the authors teach: the University of St. Thomas in Minnesota, USA, and Astrakhan State University in Astrakhan, Russia. Additionally, parts of this material have been presented as topic modules in graduate programs at MCFAM, the Minnesota Center for Financial and Actuarial Mathematics at the University of Minnesota, and in short courses at the National Research University Higher School of Economics in Moscow, Russia, and at U.S. Bank in Minneapolis, MN.

We can recommend this book as a text for a full one-semester course for advanced undergraduates with some background in probability and statistics. Part I (Chapters 1–4) can be used separately as an introduction to Bayesian statistics, while Part II (Chapters 5–8) can be used separately as an introduction to copula modeling for students with some prior knowledge of Bayesian statistics. We can also suggest it as a companion to topics courses for students in a wide range of graduate programs. Each chapter is equipped with a separate reference list, and Chapters 2–8 with their own sets of end-of-chapter exercises. The companion website contains the Appendices: data files and demo files in Microsoft Excel, some simple code in R, and selected exercise solutions.

    Part I

    Bayesian Estimation

    1

    Random Variables and Distributions

Chapter 1 is by no means intended to replace or replicate a standard course in probability. Its purpose is to provide a reference source and to remind the readers what topics they might need to review. For a systematic review of probability and an introduction to statistics we can recommend the excellent texts by DeGroot and Schervish [4], Miller and Miller [10], and Rice [12]. In-depth coverage of probability distributions in the context of loss models is offered by Klugman et al. in [8]. If the reader is interested in a review with a comprehensive software guide, we can recommend Crawley's handbook in R [3].

Here we will introduce the main concepts and notations used throughout the book. The emphasis is on simplicity of explanation, and in order to avoid technical details we often have to sacrifice mathematical rigor and conciseness. We will also introduce a library of distributions for further illustrations. Without a detailed review of the main facts of probability theory, we do, however, need to emphasize the role played in the sequel by the concept of conditional probability, which becomes our starting point.

    1.1 Conditional Probability

Let A and B be two random events, which can be represented as two subsets of the same sample space S including all possible outcomes of a chance experiment: A ⊆ S and B ⊆ S. The conditional probability of B given A measures the chances of B happening if A is already known to occur. It can be defined for events A and B such that P(A) > 0 as

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \tag{1.1}$$

where P(A ∩ B) = P(B ∩ A) is the probability of the intersection of A and B, the event indicating that both A and B occur. This conditional probability should not be confused with the conditional probability of A given B, defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \tag{1.2}$$

which shares the same numerator with P(B | A), but has a different denominator.

The source of possible confusion is the different choice of the sample space or reference population (whatever language one prefers to use) in (1.1) and (1.2), corresponding to the denominators in these formulas. In the case of (1.1) we consider only the cases in which A occurs, so that the sample space or reference population is reduced from S to A, while in (1.2) it changes from S to B.

    To illustrate this distinction, we will use a simple example. It fits the purpose of the book, using many illustrations from the fields of insurance and risk management, to begin with an example related to insurance.

In a fictitious country of Endolacia, people drive cars and buy insurance against accidents. Accidents do happen on a regular basis, but not too often. During the last year, which provides the reference timeframe, 1000 accidents were recorded in the entire country. In all but one case the driver of the car in an accident was a human being. In one particular case it was verified to be a dog. It probably was a specially trained dog, since it was trusted with the steering wheel. The question we want to ask is: based on our data, are dogs safe drivers? The answer to this question will be central if we consider the possibility of underwriting an insurance policy for a dog-driver. Considering the events A (an accident happened) and B (the driver was a dog), we can estimate the probability of a dog being the driver in the case of an accident as

$$P(B \mid A) \approx \frac{1}{1000} = 0.001.$$

But what exactly does this probability measure? It can be used to properly measure the share of responsibility dog-drivers carry for car accidents in Endolacia, which is indeed rather small, because 1 is a relatively small fraction of 1000. However, the key question (are dogs safe drivers?) is not addressed by this calculation. We can even suggest that, based on the above information, we do not have sufficient data to address this question. What piece of data is missing?

In order to evaluate the safety of dog-drivers, we need to estimate a different conditional probability: P(A | B), which determines the probability of an accident for a dog-driver. In order to do this, we need to estimate P(B), the probability that a random car in Endolacia at a random time (in an accident or not) happens to be driven by a dog. This requires some knowledge of the size of the population of drivers in Endolacia, n(S), and the size of the population of dog-drivers, n(B), so that P(B) can be estimated as n(B)/n(S).

    Let us say that there were 1,000,000 drivers on the roads of Endolacia last year, and only one dog-driver (the one who happened to get into an accident). Then we can estimate

$$P(A \mid B) \approx \frac{n(A \cap B)}{n(B)} = \frac{1}{1} = 1,$$

    which is much higher than 1/1000 from the previous calculation and is the probability which should be used as the risk factor or the risk rating of a dog-driver. Looking at this number, we would not want to insure a dog, since it is a very risky business operation.
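As a quick numerical check, here is a minimal R sketch (not part of the book's companion code) that computes both conditional probabilities directly from the counts assumed in the example; the variable names are introduced here only for illustration.

```r
n_accidents     <- 1000     # n(A): accidents recorded last year
n_dog_accidents <- 1        # n(A and B): accidents in which the driver was a dog
n_dog_drivers   <- 1        # n(B): dog-drivers in the whole population
n_drivers       <- 1000000  # n(S): all drivers in Endolacia

p_B_given_A <- n_dog_accidents / n_accidents    # P(B | A) = 0.001
p_A_given_B <- n_dog_accidents / n_dog_drivers  # P(A | B) = 1

c(P_B_given_A = p_B_given_A, P_A_given_B = p_A_given_B)
```

The two numbers answer two different questions: the first measures the share of accidents attributable to dog-drivers, the second the accident risk faced by a dog-driver.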

    Correct understanding of conditional probability is a key to understanding in what ways Bayesian statistics is different from classical or frequentist statistics. The details follow in Chapter 2 and further chapters of Part I of the book. It also provides a key to understanding the underlying principles of construction of models of statistical dependence discussed in Chapter 5 and further chapters of Part II.

    1.2 Discrete Random Variables

A random variable is a variable that takes on a certain value which becomes known as the result of an experiment, but is not known in advance. For example, an insurance company offers 1000 contracts during a year. Insurance events will happen and claims will be filed, but the number and the amount of such claims are unknown before the end of the year. Thus, the number of claims is a random variable, and the total amount of claims is a different, though related, random variable. Another example of a random variable is the return on an investment. If we buy a share of stock and plan to sell it in a month, we can make a gain or suffer a loss. The return on such an investment transaction is not known now, though it will be known in one month. Suppose someone has started a new business. How long will it stay in the market? The lifetime of the business is unknown beforehand; it is a random variable.

A random variable is defined primarily by the set of its possible values: outcomes of a chance experiment. If this set consists of isolated points, the random variable is called discrete. For example, the number of insurance claims in a year is a discrete random variable, while the return on investment is not. We will denote all possible values of a discrete random variable X as xi, i = 1, 2, …. Throughout the text we will try to reserve capital letters to denote random variables, and lowercase letters will correspond to their specific numerical values. To define a random variable, along with the set of its values, one needs to define the probabilities of these values. The set of all possible values of a discrete random variable and their respective probabilities is called the probability distribution of a discrete random variable. The sum of the probabilities of all possible values equals one. If the number of values is finite, the distribution can be described as a finite table:

Values: x1, x2, …, xn
Probabilities: p1, p2, …, pn

    Expected value or mean of a discrete random variable X is defined as

$$E(X) = \sum_{i=1}^{n} x_i \, p_i, \qquad p_i = P(X = x_i). \tag{1.3}$$

The upper limit of this sum can be set at infinity. If the number of values is infinite and the corresponding infinite series converges, then the expected value exists. The expected value defines the average position of the values of a discrete random variable on the number line, or their central tendency.

    Variance of a discrete random variable X is defined as

$$\operatorname{Var}(X) = \sum_{i=1}^{n} \bigl(x_i - E(X)\bigr)^2 \, p_i. \tag{1.4}$$

Variance describes the spread of the values of a random variable around the average. The square root of the variance is known as the standard deviation.
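For readers who like to follow along in R, here is a minimal sketch of formulas (1.3) and (1.4) applied to an arbitrary finite distribution table; the values and probabilities below are made up purely for illustration.

```r
x <- c(0, 1, 2, 3)          # values x_i (made up for illustration)
p <- c(0.1, 0.4, 0.3, 0.2)  # probabilities p_i; they must sum to one
stopifnot(isTRUE(all.equal(sum(p), 1)))

ex  <- sum(x * p)           # expected value, formula (1.3): 1.6
vx  <- sum((x - ex)^2 * p)  # variance, formula (1.4): 0.84
sdx <- sqrt(vx)             # standard deviation

c(mean = ex, variance = vx, sd = sdx)
```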

The simplest binary random variables are known to have the Bernoulli distribution, which allows for only two possible outcomes: success with probability p and failure with probability 1 − p. The table of values for this distribution can be reduced to one formula:

$$P(X = x) = p^x (1-p)^{1-x}, \qquad x \in \{0, 1\}.$$

In this formula the probability of success p is the only parameter of the Bernoulli distribution, and it can take any value between 0 and 1. The binomial distribution describes a random variable Y, the number of successes in n independent experiments, where in each experiment only two results (success and failure) are possible. Such a distribution is described by the following law of probability distribution

$$P(Y = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n, \tag{1.5}$$

where p is the probability of success in any single experiment, and 1 − p is the probability of failure. A binomial variable can be defined as the sum of n identical Bernoulli variables associated with independent experiments. Here and further on, $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$. We will say that Y ∼ Bin(n, p) if Y is a binomial variable with parameters p ∈ [0, 1] and integer n ≥ 1.

Let us assume that a card is drawn out of a deck of 52 cards, and the suit of the card is recorded. After that the card is returned to the deck and the deck is shuffled. Suppose four cards were drawn consecutively using this procedure (drawing with replacement). Let us consider the random number X of spades among those four cards. The set of possible values of this random variable consists of the numbers 0, 1, 2, 3, 4. It is obvious that the probability of drawing a spade is the same for every draw and equals 1/4. The distribution of the random variable X is a binomial distribution with n = 4 and p = 1/4. Let us write down the distribution table for this random variable:

Values: 0, 1, 2, 3, 4
Probabilities: 81/256, 108/256, 54/256, 12/256, 1/256

It is easy to calculate that the expected value of this variable equals 1 and its variance equals 3/4. Things get more interesting and more complicated when drawings are performed without replacement, but we will not formally discuss this situation.
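A quick way to verify this example is with R's built-in binomial functions; the following is a minimal sketch, not the book's companion code.

```r
k    <- 0:4
prob <- dbinom(k, size = 4, prob = 1/4)  # distribution table of X
round(prob, 4)         # 0.3164 0.4219 0.2109 0.0469 0.0039

sum(k * prob)          # expected value: n*p = 1
sum((k - 1)^2 * prob)  # variance: n*p*(1-p) = 0.75
```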

A random flow of events can be represented as a sequence of events occurring at random moments of time. Let us assume that the probability of k events within a time interval (s, s + t) does not depend on the starting time of the interval s, but only on the length of the interval t (a stationary flow). Assume also that the number of events occurring within the interval (s, s + t) does not depend on what happened before time s or what is going to happen after time s + t (a flow with independent increments). We will also assume that two or more events cannot happen at the same time or within an infinitely small interval of each other (the condition of an ordinary flow). Let us define the counting variable Xt as the number of events of a stationary ordinary flow with independent increments occurring within a time interval of length t. The law of probability distribution of this random variable is defined for any nonnegative integer k by the formula

$$P(X_t = k) = \frac{(\lambda t)^k}{k!} \, e^{-\lambda t}, \qquad k = 0, 1, 2, \ldots \tag{1.6}$$

This distribution is called the Poisson distribution, Xt ∼ Poiss(λt), where the parameter λ, known as the intensity of the flow, is the average number of events occurring in a unit time interval (t = 1).

Let us assume that the average number of cars passing by a certain marker on the highway in one minute equals 5. If we take observations during any short time interval, then the stream of cars passing by the marker may be considered a Poisson flow. Let us calculate the probability of 8 cars passing by the marker in 2 minutes. Applying formula (1.6) with λ = 5 and t = 2, we get 0.1126.
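The same number is easy to reproduce with R's dpois; this is a minimal sketch, with the rate of 5 cars per minute and the 2-minute window taken from the example above.

```r
lambda <- 5            # cars per minute (intensity of the flow)
t      <- 2            # interval length in minutes

dpois(8, lambda * t)   # P(X_t = 8), approximately 0.1126
```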

    Let the trials with binary outcomes (success with probability p or failure with probability 1 − p) be performed until the first success (or failure). The distribution of the number of trials X needed to achieve the first success is defined by the formula

$$P(X = k) = (1-p)^{k-1} \, p, \qquad k = 1, 2, \ldots \tag{1.7}$$

This distribution is known as the geometric distribution with parameter p, X ∼ Geom(p).

A natural generalization of the geometric distribution is the negative binomial distribution, which considers the number of trials Y needed to achieve r successes. The negative binomial distribution has two parameters: Y ∼ NB(p, r) if

$$P(Y = k) = \binom{k-1}{r-1} \, p^r (1-p)^{k-r}, \qquad k = r, r+1, \ldots \tag{1.8}$$
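Both distributions are available in base R, with the caveat that dgeom and dnbinom are parameterized by the number of failures rather than the number of trials. The following sketch shows the conversion; the values p = 0.3, r = 2, and k = 5 are chosen arbitrarily for illustration.

```r
p <- 0.3  # probability of success (arbitrary illustration value)
r <- 2    # required number of successes
k <- 5    # number of trials

# Geometric, formula (1.7): P(X = k) = (1 - p)^(k - 1) * p.
# R's dgeom counts failures before the first success, hence k - 1.
(1 - p)^(k - 1) * p
dgeom(k - 1, prob = p)              # same value, about 0.072

# Negative binomial, formula (1.8):
# P(Y = k) = choose(k - 1, r - 1) * p^r * (1 - p)^(k - r).
# R's dnbinom counts failures before the r-th success, hence k - r.
choose(k - 1, r - 1) * p^r * (1 - p)^(k - r)
dnbinom(k - r, size = r, prob = p)  # same value, about 0.123
```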

    1.3 Continuous Distributions on the Real Line

In this section we shall consider random variables whose set of possible values is the entire real line (−∞, ∞). This means that such a random variable may take on both positive and negative values. Return on investment or the gain/loss in a business transaction could serve as examples.

The cumulative distribution function (commonly abbreviated as c.d.f.) of a random variable X is defined as F(x) = P(X ≤ x). It is obvious that the function F(x) is nondecreasing and takes values in the interval [0, 1]. It is also evident that P(a < X ≤ b) = F(b) − F(a).
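To make the identity P(a < X ≤ b) = F(b) − F(a) concrete, here is a small R check using the standard normal c.d.f. pnorm; the normal distribution is used here only as a familiar example, and the endpoints are arbitrary.

```r
a <- -1
b <-  1

# P(a < X <= b) from the c.d.f.: F(b) - F(a), here for a standard normal X
pnorm(b) - pnorm(a)                           # about 0.6827

# The same probability by integrating the density over (a, b]
integrate(dnorm, lower = a, upper = b)$value
```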

    If the cumulative distribution function has a derivative f(x)
