Textbook of Psychiatric Epidemiology
Ebook · 1,763 pages · 21 hours


About this ebook

The new edition of this critically praised textbook continues to provide the most comprehensive overview of the concepts, methods, and research advances in the field, particularly the application of molecular genomics and of neuroimaging. It has been revised and enhanced to capitalize on the strengths of the first and second editions while keeping it up to date with the field of psychiatric epidemiology. This comprehensive publication now includes chapters on experimental epidemiology, gene–environment interactions, the use of case registries, eating disorders, suicide, immigrant populations, and the epidemiology of a number of childhood disorders.

As in the first and second editions, the objective is to provide a comprehensive, easy-to-understand overview of research methods for the non-specialist. The book is ideal for students of psychiatric epidemiology, psychiatric residents, general psychiatrists, and other mental health professionals.

The book features a new editor, Peter Jones, from the University of Cambridge, who joins the successful US team of Ming Tsuang and Mauricio Tohen.

Language: English
Publisher: Wiley
Release date: Mar 25, 2011
ISBN: 9780470977408

    Book preview

    Textbook of Psychiatric Epidemiology - Ming T. Tsuang

    Chapter 2

    Analysis of categorical data: The odds ratio as a measure of association and beyond

    Garrett M. Fitzmaurice¹,²,³ and Caitlin Ravichandran¹,²

    ¹ Laboratory for Psychiatric Biostatistics, McLean Hospital, 115 Mill St, Belmont, MA, USA

    ² Department of Psychiatry, Boston, MA, USA

    ³ Department of Biostatistics, Boston, MA, USA

    2.1 Introduction

    In this chapter we present an overview of many of the statistical methods commonly used for the analysis of categorical ‘outcome’ data in psychiatric studies. A categorical variable is defined as one that takes on a finite number of levels or categories (e.g. ‘success’ and ‘failure’ in the case of a dichotomous or binary variable). For example, consider the data in Table 2.1 which are from a study of rates and predictors of recovery in patients with first-episode major affective disorders with psychosis [1]. In this study investigators obtained information on candidate predictors of recovery at the time of first hospitalisation (e.g. Axis I comorbidity) and then followed patients for 2 years to determine which patients experienced syndromal and functional recovery. In this simple illustration of one comparison of interest, the categorical outcome has two levels, ‘recovered’ or ‘not recovered’. Table 2.1 is commonly referred to as a 2 × 2 contingency table. Much of the statistical theory underlying the analysis of categorical data is more easily formulated for 2 × 2 contingency tables. Indeed, methods for the analysis of 2 × 2 contingency tables provide the cornerstone for many of the advanced statistical methods required for more complicated problems. These include extensions for analysing outcomes with more than two levels (e.g. ‘not recovered’, ‘partially recovered’ and ‘recovered’), which may or may not be ordered; the former are referred to as ordinal variables, the latter are referred to as nominal variables. In addition, there can be more than two levels of the experimental treatment or exposure variable (e.g. no Axis I comorbidity, one Axis I comorbidity, two or more Axis I comorbidities) and other factors or covariates (e.g. age, gender, health status before treatment) that influence the outcome variable.

    Table 2.1 Illustrative data from a study of recovery in patients with first-episode major affective disorders with psychosis


    Some of the most widely used probability distributions for categorical outcomes include the Bernoulli, binomial, hypergeometric and multinomial distributions. Throughout this chapter we assume the reader has very little prior knowledge of these probability distributions. The chapter is organised as follows. We begin with a discussion of inference for a single probability or proportion. This is followed by a description of methods for analysing 2 × 2 contingency tables; the extensions to R × C contingency tables (i.e. contingency tables with R rows and C columns) are mentioned but not discussed in great detail. We discuss measures of association for 2 × 2 tables that quantify departures from independence. In particular, we focus on the odds ratio (OR) as a measure of association. We also discuss the analysis of sets of 2 × 2 tables, and describe the Cochran–Mantel–Haenszel test. Finally, we present an overview of regression models for categorical data, focusing extensively on logistic regression models for binary outcomes. The logistic regression model is first introduced for the simple case where there is only a single predictor or covariate. This model is compared and contrasted with the classical linear regression model. Later the generalisations to more than one predictor variable are considered. A major emphasis of this chapter is placed on how logistic regression is used in practice and how the logistic regression coefficients should be interpreted. An example, based on data from the first-episode major affective disorders with psychosis study, is used to illustrate and reinforce the main concepts. Finally, at the end of the chapter, we introduce some advanced topics, including extensions of logistic regression to matched study designs; exact logistic regression, which is appropriate for small sample sizes or sparse data; multinomial regression models for nominal and ordinal outcomes; and applications of logistic regression models to so-called ‘clustered’ categorical data, when the outcomes are not independent.

    2.2 Inference for a Single Proportion

    In this section we discuss inference for a single proportion or probability. In order to motivate the methods, we consider data from the first-episode major affective disorders with psychosis study. One of the goals of this study was to estimate the probability that patients with first-episode major affective disorders with psychosis achieve functional recovery after 2 years. The outcome for each patient can be denoted

    Yi = 1 if the ith patient recovered, and Yi = 0 otherwise,

    for i = 1, …, n patients. The binary outcomes for the n patients are assumed to be independent of each other. The probability of success (e.g. ‘recovered’) is denoted by p and the probability of failure (e.g. ‘not recovered’) by 1 − p. The distribution of the number of successes among the n patients, Y = Y1 + Y2 + ⋯ + Yn, can be used to form test statistics and a confidence interval for p.

    Counts of the number of successes, Y, have a binomial probability distribution,

    Pr(Y = y) = {n!/[y!(n − y)!]} p^y (1 − p)^(n−y), for y = 0, 1, …, n,

    where the binomial coefficient, n!/[y!(n − y)!], is the number of ways y ‘successes’ can be obtained in n trials. The probability of success can be estimated using the sample proportion of successes, p̂ = Y/n. In large samples (say, n > 30, and with the expected number of successes np ≥ 5 and the expected number of failures n(1 − p) ≥ 5), p̂ has an approximate normal distribution with mean p and variance p(1 − p)/n. A 95% confidence interval for p is given by

    p̂ ± 1.96 √[p̂(1 − p̂)/n]

    The above confidence interval for p, known as a Wald confidence interval, is commonly used and easy to compute, but has been criticised for its poor performance; for example, these confidence limits cover the true value of p less than 95% of the time on average. Two alternatives with more favourable properties are the Wilson confidence interval, which is based on the score test, and the Jeffreys interval, which can be derived using Bayesian statistical theory. Both can be calculated using popular statistical software, and the Wilson interval also has the closed-form expression:

    [p̂ + z²1−α/2/(2n) ± z1−α/2 √(p̂(1 − p̂)/n + z²1−α/2/(4n²))] / (1 + z²1−α/2/n)    (2.1)

    where z1−α/2 = 1.96 for a 95% confidence interval. For more information on the performance of these and other intervals, see for example Brown, Cai and DasGupta [2]. When sample sizes are relatively small (say, n < 30), an exact confidence interval can be obtained that is based directly on the binomial distribution for Y. Finally, hypothesis tests for p equalling a specified value, say p0, can be conducted using either large sample theory for the approximate normal distribution of p̂ or via exact methods based on the binomial distribution for Y.

    2.2.1 Example

    Using data from the first-episode major affective disorders with psychosis study presented in Table 2.1 and the methods for inference for a single proportion, we can estimate the proportion of patients who achieve functional recovery 2 years after first hospitalisation. The estimated proportion is the total number of patients who recovered (Y = 68) divided by the total number of patients (n = 181), which equals 0.376. Ninety-five per cent confidence intervals for this estimate are (i) Wald: (0.305, 0.446), (ii) Wilson: (0.308, 0.448), (iii) Jeffreys: (0.308, 0.448) and (iv) exact: (0.305, 0.451). Note that there is close agreement among the four confidence intervals; this is to be expected when, as with these data, n is relatively large and both np̂ ≥ 5 and n(1 − p̂) ≥ 5.
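    To make these calculations concrete, here is a minimal Python sketch (not part of the original text; it assumes numpy and scipy are available) that reproduces the Wald, Wilson and exact intervals for the Table 2.1 data, using the Clopper–Pearson beta-quantile form of the exact interval:

```python
import numpy as np
from scipy import stats

y, n = 68, 181              # recovered / total, from Table 2.1
p_hat = y / n               # 0.376
z = stats.norm.ppf(0.975)   # 1.96 for a 95% interval

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - z * se, p_hat + z * se)

# Wilson (score) interval, Equation 2.1
centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
wilson = (centre - half, centre + half)

# Exact (Clopper-Pearson) interval via the binomial-beta relationship
exact = (stats.beta.ppf(0.025, y, n - y + 1),
         stats.beta.ppf(0.975, y + 1, n - y))

print(round(p_hat, 3))   # 0.376
print(wald)              # approx (0.305, 0.446)
print(wilson)            # approx (0.308, 0.448)
print(exact)             # approx (0.305, 0.451)
```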

    2.3 Analysis of 2 × 2 Contingency Tables

    In many settings we are interested in the effect of treatments or exposures on a binary outcome. When the treatment or exposure has only two levels the data can be summarised in a 2 × 2 contingency table. Data in the form of a 2 × 2 contingency table can arise from many different types of study designs [3]. For example, consider a clinical trial comparing the probability of remission between patients with depression assigned to a novel treatment or standard treatment. The question of scientific interest is: ‘How does treatment affect the probability of remission?’ Similarly, for the first-episode major affective disorders with psychosis study (Table 2.1), the presence of Axis I comorbidities (the exposure) was determined at baseline, and we are interested in the number of patients with and without Axis I comorbidities who recover. The question of scientific interest is: ‘How does the presence of comorbidities affect recovery?’ Both are examples of prospective study designs.

    However, data in the form of 2 × 2 tables also arise from other types of study designs. Consider, for example, the data from a retrospective case–control study of psychiatric disorders and occurrence of elderly suicides [4] presented in Table 2.2. The number of suicide cases and controls (non-cases) are fixed by design (with 85 cases and 153 controls) and the prevalence of psychiatric disorders is then ascertained on each subject in the study. In this retrospective case–control study design, the prevalence of psychiatric disorder is considered a random variable. Case–control studies are commonly used when the outcome is rare and/or when it is not ethical to randomise patients to the ‘exposure’ in a prospective study. In this particular case–control study one question of scientific interest is: ‘Does prevalence of substance use disorders vary among the cases and controls?’

    Table 2.2 Substance use disorders (SUDs) and occurrence of elderly suicides


    The third type of study design in which data in the form of a 2 × 2 table arise is the so-called double dichotomy, cross-sectional or prevalence study. In this study design a fixed number (n) of subjects are randomly selected and each subject is cross-classified on the basis of the two variables (the row and column variables) of scientific interest. Table 2.3 displays data from a prevalence study of neuropsychiatric symptoms and mild cognitive impairment (MCI) in the elderly [5], where only the total number of subjects, n = 1969, is fixed by design. Table 2.3 contains data on the presence of delusions for the 1909 subjects with neuropsychiatric data available. In this example, the question of scientific interest is: ‘Are delusions and cognitive status related?’

    Table 2.3 Data from the study of neuropsychiatric symptoms and MCI


    Finally, although not shown here, data in the form of a 2 × 2 table can arise when the total sample size, n, is not fixed in advance. For example, an exit poll conducted at an election station might set out to record the political party preferences (e.g. Democrat or Republican) and opinions about mental health parity legislation (in favour or against) of all respondents who agree to participate in the poll; here, the total number of individuals who agree to participate, n, is random.

    Suppose we let Xi denote the row variable (e.g. treatment or exposure) and Yi denote the column variable (e.g. outcome variable) for each one of these types of study designs, where both Xi and Yi are binary (taking values 0 or 1). Then, the data in a general 2 × 2 contingency table can be represented as in Table 2.4.

    Table 2.4 General representation of counts in a 2 × 2 contingency table


    In Table 2.4 njk is the count of the number of subjects with X = j and Y = k; njk is referred to as a cell count. For example, n11 is the number of subjects with X = 1 and Y = 1. Also, in Table 2.4 the marginal row counts are nj+ = nj0 + nj1 (the number of subjects with X = j), and the marginal column counts are n+k = n0k + n1k (number of subjects with Y = k).

    In each study, different marginal totals are fixed by design. As a result, the counts in the tables have different distributions. For example, for the case–control study, n+0 and n+1 are fixed by design, and the numbers of exposed subjects in each column have independent binomial distributions. However, for all of these different types of study designs, the question of scientific interest can be formulated in a similar way: ‘Are X and Y associated or are they independent?’ For ease of explanation, we focus on data arising from a cross-sectional study design. For the cross-sectional design, we can write the probabilities for the 2 × 2 table as in Table 2.5.

    Table 2.5 Probabilities in a 2 × 2 contingency table, with only n fixed


    The probabilities in Table 2.5 are pjk = Pr(X = j, Y = k), for j, k = 0, 1, and the marginal probabilities are pj+ = Pr(X = j) and p+k = Pr(Y = k). For the cross-sectional design, all of these probabilities can be estimated from the data at hand. For prospective studies with the number of exposures fixed by design, and for case–control studies, they are not all estimable. For prospective studies with the number of exposures fixed by design, only the two conditional row probabilities Pr(Y = 1|X = 0) and Pr(Y = 1|X = 1) can be estimated. Similarly, for case–control studies, only the two conditional column probabilities Pr(X = 1|Y = 0) and Pr(X = 1|Y = 1) can be estimated. For example, using data from the study of psychiatric disorders and occurrence of elderly suicides, the probability of a substance use diagnosis can be estimated for suicides and non-suicides, but the probability of suicide cannot be estimated for elderly with and without substance use diagnoses.

    2.3.1 The Odds Ratio as a Measure of Association

    To determine whether Xi and Yi are associated, it becomes necessary to formulate measures of association that quantify any departure from independence. The most commonly used measure of association is the odds ratio (OR), also known as the cross-product ratio (for reasons that will soon become apparent). The OR is a measure of association based on a comparison of ‘odds’. The odds is simply another metric for expressing risk or probability. Specifically, if p is the probability of success, then p/(1 − p) is referred to as the odds of success. For example, if the probability of success is 0.8 then the odds of success is 4 (or 0.8/0.2) to 1. That is, the probability of success is four times as large as the probability of failure. In a prospective study with the number of exposures fixed by design, the OR measures association by comparing the odds of Y in the two exposure groups defined by X. Specifically, the OR for Y associated with X is

    OR = [Pr(Y = 1|X = 1)/Pr(Y = 0|X = 1)] / [Pr(Y = 1|X = 0)/Pr(Y = 0|X = 0)]

    The null value for the OR is 1 because it corresponds to Pr(Y = 1|X = 1) = Pr(Y = 1|X = 0) and implies that Y and X are independent. Quite often, the log of the OR is used as a measure of association, since log(OR) = 0 under the assumption of no association between Y and X. When OR > 1, then Pr(Y = 1|X = 1) > Pr(Y = 1|X = 0); similarly, when OR < 1, then Pr(Y = 1|X = 1) < Pr(Y = 1|X = 0). Note that the OR expresses association in relative (or multiplicative) terms in the sense that the odds of success in one group (e.g. unexposed group) is multiplied by OR to obtain the corresponding odds in the other group (e.g. exposed group).

    An appealing property of the OR is that it is symmetric in the roles of Y and X in the sense that reversing the roles of Y and X yields the same OR. That is, the OR for Y associated with X is equal to the OR for X associated with Y,

    OR = [Pr(X = 1|Y = 1)/Pr(X = 0|Y = 1)] / [Pr(X = 1|Y = 0)/Pr(X = 0|Y = 0)]

    It is this property, unique to the OR, that accounts for its widespread use for assessing the association between exposure and disease in case–control studies. In addition, in ‘rare disease’ settings, the OR is a close approximation to another measure of association called the relative risk (RR). The RR is defined as

    RR = Pr(Y = 1|X = 1)/Pr(Y = 1|X = 0)

    and also expresses association in relative or multiplicative terms. Unlike the OR, however, the RR is not symmetric in Y and X. The relationship between the OR and the RR will be discussed in greater detail in Section 2.5.1. Finally, we note that a simple computational formula for the OR arises from the following equivalent expression,

    OR = (p11 p00)/(p10 p01)

    This expression helps to explain why the OR is sometimes referred to as the ‘cross-product ratio’.

    It is usually of interest to obtain a point estimate and confidence interval for the OR, and to test the null hypothesis that the OR equals 1. For all four designs, the OR can be estimated as

    OR̂ = (n11 n00)/(n10 n01)    (2.2)

    Because log(OR̂) is approximately normally distributed in large samples, and because exponentiating the endpoints of an interval for log(OR) will always result in non-negative limits for the OR, it is preferable to obtain a confidence interval for log(OR), and then exponentiate the endpoints. That is, a 95% confidence interval for log(OR) is given by

    log(OR̂) ± 1.96 s.e.[log(OR̂)]    (2.3)

    where s.e.[log(OR̂)] = √(1/n11 + 1/n10 + 1/n01 + 1/n00). Then, a 95% confidence interval for the OR is obtained by exponentiating the endpoints of this interval,

    (exp{log(OR̂) − 1.96 s.e.[log(OR̂)]}, exp{log(OR̂) + 1.96 s.e.[log(OR̂)]})

    Finally, suppose it is of interest to construct a test for no association (independence). There are three commonly used test statistics. The Wald test statistic for the null hypothesis, H0 : log(OR) = 0, is given by:

    Z = log(OR̂) / s.e.[log(OR̂)]    (2.4)

    which, in large samples, has an approximate standard normal distribution, denoted by N(0, 1), under the null hypothesis of no association. Alternatively, the likelihood ratio test (LRT) statistic can be used. This is simply twice the difference in the log-likelihood under the alternative (association) and null (independence) hypotheses. Remarkably, for any of the four types of study designs considered, the LRT statistic reduces to

    G² = 2 Σj Σk Ojk log(Ojk/Ejk)    (2.5)

    where Ojk = njk is the ‘observed’ count in the 2 × 2 table and

    Ejk = (nj+ n+k)/n

    is the ‘estimated expected’ count (under the assumption of independence). In addition, for any of the four study designs, the score test statistic reduces to

    χ² = Σj Σk (Ojk − Ejk)²/Ejk    (2.6)

    which is also known as the Pearson chi-square test for a 2 × 2 table. In large samples, both the likelihood ratio and the Pearson chi-square statistics have approximate chi-square distributions with 1 degree of freedom. Similarly, in large samples, the Wald test statistic has an approximate standard normal distribution or, equivalently, the squared Wald test statistic has an approximate chi-square distribution with 1 degree of freedom.

    If the sample size, n, is relatively small, these asymptotic (or very large sample) approximations cannot be relied upon. In particular, a rule-of-thumb in statistical folklore is that the asymptotic approximations cannot be relied upon if one or more (or, say, at least 25%) of the cells in the 2 × 2 table have estimated expected counts (Ejk) less than 5. When at least one Ejk is less than 5, and it is of interest to make inferences about the OR, a common technique is to fix both margins of the 2 × 2 table and use so-called ‘exact’ tests and confidence intervals. That is, for a prospective study (where the row margins are fixed), we further condition on the column margins; for a case–control study (where the column margins are fixed), we further condition on the row margins; or for a cross-sectional design (where n is fixed), we condition on both row and column margins. In all of these cases it can be shown that the counts in the resulting table with fixed margins have a non-central hypergeometric distribution. Under the null hypothesis H0 :  OR = 1, the non-central hypergeometric distribution becomes a central hypergeometric distribution, which forms the basis of Fisher's exact test of no association in a 2 × 2 contingency table (see, for example [6]). This test is appropriate in small samples; the non-central hypergeometric can also be used to obtain an estimate of the OR that has better small sample properties than the usual OR estimate given in Equation 2.2. One potential drawback with exact methods, however, is that they can be ‘conservative’ in the sense that the true significance level of an exact test is often far smaller than the nominal level (e.g. 0.05), thereby making it more difficult to reject the null hypothesis of independence.
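    Because the cell counts of Table 2.1 are not reproduced in this preview, the following Python sketch applies Equations 2.2–2.6 to a hypothetical 2 × 2 table (the counts are invented purely for illustration); Fisher's exact test is included for the small-sample case:

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows are X = 1, 0 (exposed, unexposed),
# columns are Y = 1, 0 (success, failure)
table = np.array([[20, 40],    # n11, n10
                  [48, 73]])   # n01, n00
(n11, n10), (n01, n00) = table.astype(float)

# Equation 2.2: odds ratio estimate
or_hat = (n11 * n00) / (n10 * n01)

# Equations 2.3 and 2.4: Wald confidence interval and test on the log scale
se = np.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
ci = np.exp(np.log(or_hat) + np.array([-1.96, 1.96]) * se)
z_wald = np.log(or_hat) / se

# Equations 2.5 and 2.6: likelihood ratio (G^2) and Pearson chi-square
obs = table.astype(float)
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
g2 = 2 * (obs * np.log(obs / expected)).sum()
x2 = ((obs - expected) ** 2 / expected).sum()

print(or_hat, ci, z_wald)
print(g2, stats.chi2.sf(g2, df=1))   # LRT and its p-value
print(x2, stats.chi2.sf(x2, df=1))   # Pearson test and its p-value
print(stats.fisher_exact(table))     # exact test for sparse tables
```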

    2.3.2 Examples

    Returning to the data from the first-episode major affective disorders with psychosis study in Table 2.1, the scientific interest is in the association between Axis I comorbidity and 2-year functional recovery. Using formulas from this section (Equations 2.2–2.6), the estimated OR comparing odds of recovery between patients with and without Axis I Comorbidity is 0.49, with 95% confidence interval (0.25, 0.94). We would estimate that patients with Axis I comorbidity have about one half the odds of 2-year functional recovery as patients without Axis I comorbidity. Performing a test of no association, we would obtain a Wald test statistic of Z = − 2.15 with a p-value of 0.03, a LRT statistic of G² = 4.81 with a p-value of 0.03, or a score (Pearson chi-square) test statistic of χ² = 4.70 with a p-value of 0.03. At the conventional α = 0.05 significance level, we would conclude there is a significant association between Axis I comorbidity and 2-year functional recovery among patients with first-episode major affective disorders with psychosis from any of these large-sample test statistics.

    For the data from the study of neuropsychiatric symptoms and MCI in Table 2.3, exact methods are more appropriate than large-sample methods due to small expected cell counts. For this example, the estimated OR comparing odds of MCI between elderly persons with and without delusions is 9.43, and the p-value for Fisher's exact test is <0.001. We conclude that elderly persons with delusions have increased odds of MCI.

    2.3.3 Analysis of R × C Contingency Tables

    In this section we briefly outline methods for the analysis of contingency table data when there are more than two rows and/or columns. Data often arise in the form of R × C contingency tables, where the number of rows R and/or the number of columns C is greater than 2. For example, in Table 2.2 substance use disorders could be divided into alcohol use disorder only and drug use disorders. In another example, using data from the first-episode major affective disorders with psychosis study, we may be interested in rates of 2-year functional recovery among patients with different types of onset.

    Suppose we again let Xi denote the row variable and Yi denote the column variable. The notation for an R × C table is a straightforward extension of the 2 × 2 table. In particular, let njk be the number of subjects with X = j and Y = k, with marginal totals nj+ = Σk njk and n+k = Σj njk, for j = 1, …, R and k = 1, …, C. For an R × C table, the question of scientific interest is once again: ‘Are X and Y associated or are they independent?’ To study departures from independence, it is possible to define sets of (R − 1)(C − 1) non-redundant ORs for an R × C table. For example, one such set of ORs comprises those that are relative to the last row and column of the table, that is

    ORjk = (pjk pRC)/(pjC pRk)    (2.7)

    for j = 1, …, R − 1, and k = 1, …, C − 1. The OR in Equation 2.7 is the OR conditional on rows j and R and columns k and C. If the rows and columns are independent, then all of these ORs equal 1. The estimate of this OR is

    OR̂jk = (njk nRC)/(njC nRk)

    and confidence intervals or hypothesis tests can be formed based on the large sample normality of the estimator of the log OR. Alternatively, the likelihood ratio or the Pearson chi-square tests can be used to test for independence of X and Y, and their form is identical to Equations 2.5 and 2.6, except that the sum is now over j = 1, …, R, and k = 1, …, C. If the sample size is relatively small (and at least 25% of the cells in the R × C table have estimated expected counts Ejk less than 5), then the asymptotic chi-square approximations for these test statistics cannot be relied upon. In that case Fisher's exact test can be used to perform a test for no association in an R × C table.

    Finally, if the rows and columns are ordered, and scores are assigned to the rows and columns, the score test statistic for no association is simply a function of the correlation coefficient between the scores assigned to the rows and columns. On the other hand, if only one of the variables is ordered, and scores are assigned to the ordinal variable, then the score statistic is a function of the one-way analysis of variance (ANOVA) test statistic, where the ordinal variable is treated as the outcome, and the unordered variable is a factor or group variable. Both large sample (based on an asymptotic chi-square distribution) and exact p-values (based on the central hypergeometric) can be calculated for these score statistics.
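    For R × C tables, the likelihood ratio and Pearson tests just described are available directly in scipy; a short sketch with a hypothetical 3 × 2 table (invented counts, for illustration only):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3 x 2 table: rows = three onset types, columns = recovered / not
table = np.array([[10, 18],
                  [25, 41],
                  [33, 54]])

# Pearson chi-square test of independence (Equation 2.6 summed over R x C cells)
x2, p_pearson, df, expected = chi2_contingency(table)

# Likelihood ratio (G^2) version of the same test, Equation 2.5
g2, p_lrt, _, _ = chi2_contingency(table, lambda_="log-likelihood")

print(x2, p_pearson, df)   # score (Pearson) test, df = (R - 1)(C - 1) = 2
print(g2, p_lrt)           # likelihood ratio test
```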

    2.3.4 Example

    The LRT or Pearson chi-square test can be used to test for independence between type of onset of first-episode affective disorder with psychosis and 2-year functional recovery. For the data in Table 2.6, the LRT statistic is G² = 0.29, and the Pearson chi-square statistic is χ² = 0.29, both with p-values of 0.87. Either test provides no evidence of an association between type of onset and two-year functional recovery.

    Table 2.6 Additional data from the study of recovery in patients with first-episode major affective disorders with psychosis


    2.4 Analysis of Sets of 2 × 2 Contingency Tables

    When there are three or more categorical variables, we can form a ‘multidimensional contingency table’. In this setting all of the variables could be random, or some margins of the table could be fixed by design. One particular type of multidimensional contingency table results from a set of J 2 × 2 tables. For example, consider the data introduced earlier on the 181 patients with first-episode major affective disorders with psychosis (see Table 2.1). In Table 2.7 these patients are now cross-classified by sex (W), in addition to Axis I comorbidity (X), and recovery (Y).

    Table 2.7 Illustration of two 2 × 2 contingency tables


    The 2 × 2 tables of (X, Y) at each level of W are referred to as partial or conditional tables; for example they express the relationship between comorbidity and recovery controlling for sex. The (Comorbidity, Recovery) table formed by combining (or collapsing over) the partial tables is referred to as the marginal table (Table 2.1). Similarly, the ORs in the partial tables are called partial odds ratios, and the OR in the marginal table is called the marginal odds ratio. When the ORs in the partial tables differ, there is said to be interaction between W and (X, Y). On the other hand, when the partial ORs are the same, but the common partial OR is different from the marginal OR, there is said to be confounding. Confounding occurs when two variables are associated with a third in a way that obscures their relationship. In particular, W (Sex) can potentially confound the relationship between X (Comorbidity) and Y (Recovery) when W is related to both X and Y.

    In Table 2.7, there are two 2 × 2 tables. Suppose, in general, that there are J 2 × 2 tables, with notation as in Table 2.8. Let njkℓ denote the number of subjects with (W = j, X = k, Y = ℓ). Suppose it is of interest to test for no partial association, that is H0 : no association between Y and X given W = j, j = 1, …, J.

    Table 2.8 The jth 2 × 2 contingency table, j = 1, …, J


    For each of the 2 × 2 contingency tables, Cochran [7] and Mantel and Haenszel [8] proposed a test statistic based on conditioning upon both margins. Earlier, we discussed how this is valid for a single 2 × 2 table regardless of the type of study design; this result also generalises to J 2 × 2 tables. Thus, the following Cochran–Mantel–Haenszel test is valid for any design, including both prospective and case–control studies. Conditional on both margins of the jth table, the data follow a (central) hypergeometric distribution under the null hypothesis of no association, leading to the Cochran–Mantel–Haenszel test for H0 : no association between Y and X, given W,

    Z = Σj [nj11 − E(nj11)] / √[Σj Var(nj11)]

    where E(nj11) = (nj1+ nj+1)/nj++ is the mean of the hypergeometric distribution and Var(nj11) = (nj1+ nj0+ nj+1 nj+0)/[nj++²(nj++ − 1)] is the variance. The Cochran–Mantel–Haenszel statistic has an approximate N(0, 1) distribution under the null hypothesis provided that the number of tables is large, say J > 30, and/or the sample size in each table, nj++, is large. An estimate of the adjusted OR is given by

    OR̂MH = [Σj (nj11 nj00)/nj++] / [Σj (nj10 nj01)/nj++]

    Among the available tests for interaction (homogeneity of the ORs) is the Breslow–Day test [3], which has an approximate chi-square distribution with (J − 1) degrees of freedom. Like the Cochran–Mantel–Haenszel test, this test is based on conditioning on both margins. However, unlike the Cochran–Mantel–Haenszel test, the Breslow–Day test requires that the sample size in each partial table is large even if the number of tables is large. The calculation of the Breslow–Day test statistic is more complex than other calculations presented in the chapter, but this test is readily available within most popular statistical software.
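    Before turning to the example, here is a minimal Python sketch of the Cochran–Mantel–Haenszel calculation above, applied to two hypothetical partial tables (the Table 2.7 counts are not reproduced in this preview); each partial table is laid out as [[nj11, nj10], [nj01, nj00]]:

```python
import numpy as np
from scipy.stats import norm

# Two hypothetical partial 2 x 2 tables (e.g. males and females)
tables = [np.array([[12., 25.], [30., 40.]]),
          np.array([[ 8., 22.], [18., 26.]])]

num = var = or_num = or_den = 0.0
for t in tables:
    n = t.sum()
    row1, row0 = t.sum(axis=1)        # n_j1+, n_j0+
    col1, col0 = t.sum(axis=0)        # n_j+1, n_j+0
    num += t[0, 0] - row1 * col1 / n  # n_j11 - E(n_j11)
    var += row1 * row0 * col1 * col0 / (n ** 2 * (n - 1))
    or_num += t[0, 0] * t[1, 1] / n   # Mantel-Haenszel numerator term
    or_den += t[0, 1] * t[1, 0] / n   # Mantel-Haenszel denominator term

z = num / np.sqrt(var)
print(z, 2 * norm.sf(abs(z)))  # CMH statistic and two-sided p-value
print(or_num / or_den)         # adjusted (Mantel-Haenszel) OR estimate
```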

    2.4.1 Example

    The Cochran–Mantel–Haenszel test statistic for the data in Table 2.7 for the test that the common OR equals one is Z = − 2.12, with a p-value of 0.03; the estimate of the adjusted OR is 0.49 (with 95% confidence interval: 0.26, 0.95). From the Cochran–Mantel–Haenszel test and estimate of the adjusted OR, we conclude that, adjusting for sex, patients with Axis I comorbidity have significantly lower odds of recovery than patients without Axis I comorbidity. Finally, we note that the Breslow–Day test statistic for homogeneity of the ORs has a p-value of 0.57. From the Breslow–Day test, we would conclude we have no evidence that the association between Axis I comorbidity and recovery differs between males and females (i.e. there is no interaction between comorbidity and sex).

    2.4.2 Matched Pair Study Design

    A matched pair study design is an example of a case when the number of partial tables (J) is large, and the sample size for each partial table (nj++) is small. The matched pair design has become increasingly popular in epidemiologic studies. In a matched case–control study, a case is selected, and then a control is matched to the case on factors that could be confounders of the association between the exposure and outcome variables. Then, as in the usual case–control study, investigators determine the exposure status (exposed, not exposed) of all subjects. For example, the data in Table 2.9, reported in Everitt [9], arose from a study designed to test the hypothesis that complications during pregnancy and birth, a known risk factor for the development of schizophrenia, are more prevalent in schizophrenics with a low age of onset (prior to age 16) compared to those with a later age of onset (after age 21). In this study, 36 subjects with low age of onset schizophrenia (cases) were matched one-to-one to 36 controls with later age of onset schizophrenia; the cases and controls were pair-matched on sex, race and socioeconomic status.

    Table 2.9 Complications during pregnancy and birth for 36 matched pair cases and controls


    Alternatively, in a matched prospective study, individuals are matched on exposure status. For example, individuals could be matched by sex, race and socioeconomic status, and then assigned to two different treatments and followed over time to determine whether the patients respond to the treatments. In these study designs, there are J 2 × 2 tables with one matched pair each. That is, the total sample size for each 2 × 2 table is 2. Even though each 2 × 2 table has only two subjects, assuming the number of matched pairs J is large, the Cochran–Mantel–Haenszel test can be used. In this case, the Cochran–Mantel–Haenszel test reduces to a test specific to matched pairs, McNemar's test:

    χ² = (n10 − n01)²/(n10 + n01)

    which has an approximate chi-square distribution with 1 degree of freedom for large J, where n10 is the number of matched pairs in which the case is exposed and the control is unexposed (or the exposed subject is a success and the unexposed subject is a failure) and n01 is the number of matched pairs in which the case is unexposed and the control is exposed (or the exposed subject is a failure and the unexposed subject is a success). An exact test based on the binomial distribution can be used when J (and particularly n10 + n01) is small.

    Matching one case (or exposed individual) to one control (or unexposed individual) is desirable because it maximises the power of the study for a given total sample size. However, when the number of cases is limited but a greater number of controls are available (e.g. in a rare disease setting), study designs matching one case to multiple controls are common. Because the total number of subjects nj++ can vary across partial tables, the Cochran–Mantel–Haenszel test can accommodate an arbitrary number of controls for each case. In addition, conditional logistic regression, which will be discussed in Section 2.6, also accommodates matched designs other than matched pairs and offers many of the advantages of logistic regression to matched studies.

    2.4.3 Example

    McNemar's test can be used to test for an association between birth complications and age of onset of schizophrenia using the data from Table 2.9. For this example, n10 is the number of pairs for which the case with earlier onset schizophrenia experienced birth complications but the control did not and equals 9, and n01 is the number of pairs for which the control experienced birth complications but the case with earlier onset schizophrenia did not and equals 4. If the large-sample test were applied, the test statistic would be χ² = 1.92, and the p-value would be 0.17. Because n10 + n01 is small, exact methods are appropriate in this case, and would result in a p-value of 0.27. Therefore, we would conclude there is no evidence of an association between birth complications and age of onset of schizophrenia from this study.
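    The discordant-pair counts above (n10 = 9, n01 = 4) are all McNemar's test needs, so this example can be checked directly; a short Python sketch (assuming scipy) reproduces both the large-sample and exact versions:

```python
from scipy.stats import binom, chi2

n10, n01 = 9, 4   # discordant pairs from Table 2.9

# Large-sample McNemar statistic: (n10 - n01)^2 / (n10 + n01)
x2 = (n10 - n01) ** 2 / (n10 + n01)
print(x2, chi2.sf(x2, df=1))   # 1.92, p = 0.17

# Exact version: under H0 each discordant pair falls in either cell with
# probability 1/2, so n10 ~ Binomial(n10 + n01, 0.5); two-sided p-value
p_exact = min(1.0, 2 * binom.sf(n10 - 1, n10 + n01, 0.5))
print(p_exact)                 # 0.27
```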

    2.5 Logistic Regression

    In this section we consider how the relationships in multi-way contingency tables, and more complicated designs, can be explored using regression methods known as logistic regression. Logistic regression is one of the most widely used methods for the analysis of binary data. It is used to examine and describe the relationship between a binary response variable Yi (e.g. 1 = success or 0 = failure) and one or more covariates for i = 1, …, n independent subjects. The covariates can be continuous or categorical (e.g. indicator variables). Denoting the two possible outcomes for Yi by 0 and 1, the probability distribution of the response variable is the Bernoulli distribution with probability of success pi. In common with linear regression, the primary objective of logistic regression is to model the mean of the response variable, given a set of covariates. Recall that with a binary response, the mean of Yi is simply the probability that Yi takes on the value 1, pi. However, what distinguishes logistic regression from linear regression is that the response variable is binary rather than continuous in nature. This has a number of consequences for modelling the mean of the response variable. For ease of exposition, we will first consider the simple case where there is only a single predictor variable, say xi. Generalisations to more than one predictor variable will be considered later.

    Since linear models play such an important and dominant role in applied statistics, it may at first seem natural to assume a linear model relating the mean of Yi to xi,

    pi = β0 + β1xi    (2.8)

    However, expressing pi as a linear function is problematic since it violates the restriction that probabilities must lie within the range from 0 to 1. As a result, for sufficiently large or small values of xi, the linear model given by Equation 2.8 will yield probabilities outside of the permissible range. A further difficulty with the linear model for the probabilities is that we often expect a nonlinear relationship between pi and xi. For example, a 0.2 unit increase in pi might be considered more ‘extreme’ when pi = 0.1 than when pi = 0.5. In terms of ratios, the change from pi = 0.1 to pi = 0.3 represents a threefold or 200% increase, whereas the change from pi = 0.5 to pi = 0.7 represents only a 40% increase. In a sense, the units of measurement for a probability or proportion are often not considered to be constant over the range from 0 to 1. The linear probability model given by Equation 2.8 simply does not take this into consideration when relating pi to xi.

    To circumvent these problems, a nonlinear transformation is usually applied to pi and the transformed probabilities are related linearly to xi. In particular, a transformation of pi, say g(pi), is chosen so that it maps the range of pi from (0, 1) to (−∞, ∞). Since there are many possible transformations, g(pi), that achieve this goal, this leads to an extensive choice of models that are all of the form

    g(pi) = β0 + β1xi    (2.9)

    However, the most commonly used in practice are

    1. Logit or logistic function: g(pi) = log[pi/(1 − pi)]

    2. Probit or inverse normal function: g(pi) = Φ−1(pi), where Φ is the standardised normal cumulative distribution function

    3. Complementary log–log function: g(pi) = log[−log(1 − pi)].

    We note that all of these transformations are very closely related when 0.2 < pi < 0.8, and in a sense only differ in the degree of ‘tail-stretching’ outside of this range. Indeed, for most practical purposes it is not possible to discriminate between a data analysis that is based on, for example, the logit and probit functions. To discriminate empirically between probit and logistic regression would, in general, require very large numbers of observations. However, the logit function does have a number of distinct advantages over the probit and complementary log–log functions which probably account for its more widespread use in practice. Later in this chapter we will consider some of the advantages of the logit or logistic function.

    When the logit or logistic function is adopted, the resulting model

    log[pi/(1 − pi)] = β0 + β1xi    (2.10)

    is known as the logistic regression model. Recall from Section 2.3.1 that if pi is the probability of success, then pi/(1 − pi) is the odds of success. Consequently, logistic regression assumes a linear relationship between the log odds of success and xi. Note that this simple model can be expressed equivalently in terms of pi,

    pi = exp(β0 + β1xi)/[1 + exp(β0 + β1xi)]    (2.11)

    We must emphasise that Equations 2.10 and 2.11 are completely equivalent ways of expressing the logistic regression model. Expression 2.10 describes how the log odds, log[pi/(1 − pi)], has a linear relationship with xi, while expression 2.11 describes how pi has an S-shaped relationship with increasing values of β1xi; although, in general, this relationship is approximately linear within the range 0.2 < pi < 0.8 (see Figure 2.1 for a plot of pi versus xi when β0 = 0.5 and β1 = 0.9). Observe that the expression on the right of Equation 2.11 cannot yield a value that is either negative or greater than 1. That is, the logistic transformation ensures that the predicted probabilities are restricted to the range from 0 to 1. Finally, note that

    pi/(1 − pi) = exp(β0 + β1xi)

    so that the odds, pi/(1 − pi), is simply exp(β0 + β1xi).

    Figure 2.1 Plot of the logistic response function (β0 = 0.5, β1 = 0.9).
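    The logit and its inverse are easy to compute directly; the brief Python sketch below (assuming numpy) evaluates Equation 2.11 at the Figure 2.1 parameter values and confirms that the fitted probabilities always lie in (0, 1):

```python
import numpy as np

def logit(p):
    """Log odds: maps probabilities in (0, 1) to the whole real line."""
    return np.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit (Equation 2.11): maps the real line back to (0, 1)."""
    return np.exp(eta) / (1 + np.exp(eta))

beta0, beta1 = 0.5, 0.9            # the values plotted in Figure 2.1
x = np.linspace(-6, 6, 7)
p = inv_logit(beta0 + beta1 * x)

print(np.round(p, 3))                                # S-shaped, inside (0, 1)
print(np.round(logit(p) - (beta0 + beta1 * x), 10))  # zeros: the maps invert
```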

    2.5.1 Interpretation of Logistic Regression Coefficients

    Next we consider the interpretation of the logistic regression coefficients, β0 and β1, in Equation 2.10. In simple linear regression, recall that the interpretation of the slope of the regression is in terms of changes in the mean of Yi for a single unit change in xi. Similarly, the logistic regression slope, β1, in Equation 2.10 has interpretation as the change in the log odds of success for a single unit change in xi. Equivalently, a single unit change in xi increases or decreases the odds of success multiplicatively by a factor of exp(β1). Also, recall that the intercept in simple linear regression has interpretation as the mean value of the response variable when xi is equal to 0. Similarly, the logistic regression intercept, β0, has interpretation as the log odds of success when xi = 0. Note that, for case–control studies, the intercept β0 cannot be validly estimated since it is determined by the proportions of ‘successes’ (Y = 1) and ‘failures’ (Y = 0) selected by the study design. However, in many studies, there is far less scientific interest in the intercept than in the slope.

    For the special case where xi is dichotomous, taking values of 0 and 1, the logistic regression slope, β1, has a simple and very attractive interpretation. Consider the two possible values for pi when xi = 0 and xi = 1. Let pi(xi = j) denote the probability of success when xi = j, for j = 0, 1. Then

    β1 = log{[pi(xi = 1)/(1 − pi(xi = 1))] / [pi(xi = 0)/(1 − pi(xi = 0))]}

    which is the log of the OR (or cross-product ratio) in the 2 × 2 table of the cross-classification of Yi and xi (see Table 2.10). Thus, exp(β1) has interpretation as the OR of the response for the two possible values of the covariate.

    The OR has many appealing properties that probably account for the widespread use of logistic regression in many areas of application. First, as was noted earlier, the OR does not change when rows and columns of the 2 × 2 table are interchanged. This implies that it is not necessary to distinguish which variable is the response and predictor variable in order to estimate the OR. Furthermore, as noted in the previous sections, a very appealing feature of the OR, exp(β1), is that it is equally valid regardless of whether the study design is prospective, cross-sectional or retrospective. That is, logistic regression provides an estimate of the same association between Yi and xi in all three study designs. Finally, in psychiatric studies where Yi typically denotes the presence or absence of a disease or disorder, the OR is often interpreted as an approximation to the RR of disease, pi(xi = 1)/pi(xi = 0). When the disease is rare, and pi is reasonably close to 0 in both of the risk groups (often known as the ‘rare disease’ assumption), the OR provides a close approximation to the RR. Retrospective designs are especially common in psychiatry where the possible outcomes of interest are very rare. Although the RR cannot be estimated from a retrospective study, the OR can be used to provide an approximation to the RR. Extra care is necessary when interpreting the OR as an approximation to the RR in prospective studies. In many prospective studies the binary event is relatively common (say greater than 10%) and the ‘rare disease’ assumption no longer holds; in these settings, the OR can be a very poor and unreliable approximation to the RR and should not be given such an interpretation.

    Table 2.10 Cross-classification probabilities for logistic regression of Y on x


    2.5.2 Hypothesis Testing and Confidence Intervals for Logistic Regression Parameters

    Often, we are interested in testing for an association between the predictor in our logistic regression model and the outcome, or, equivalently, testing H0 : β1 = 0. As for 2 × 2 table methods, Wald, likelihood ratio and score statistics can be used for this test. A Wald test statistic can be obtained using the result that the estimate of β1 divided by its standard error (s.e.) approximately follows a N(0, 1) distribution in large samples. An LRT statistic can be obtained by comparing the log likelihood for the full model with the predictor included to the log likelihood for a reduced model including only the intercept β0; the former is at least as large as the latter. In large samples, twice the difference between the maximised log likelihoods for the full and reduced models approximately follows a chi-square distribution with 1 degree of freedom.

    Two-sided Wald confidence limits for β1 can be obtained using the result that (β̂1 − β1)/s.e.(β̂1) follows an approximate N(0, 1) distribution; the confidence limits are given by the formula β̂1 ± 1.96 s.e.(β̂1). Just as we can exponentiate β̂1 to get an estimate of the OR comparing the odds of disease for a unit change in xi, we can exponentiate the lower and upper limits of the confidence interval for β1 to get a confidence interval for the OR. Estimates of β1 (or, alternatively, its associated OR), its standard error and the log likelihood for the model are available from the output of logistic regression routines in popular statistical software. Test statistics and p-values for tests that β1 = 0 and Wald 95% confidence intervals are often also included automatically. Likelihood ratio and score test statistics can sometimes be requested. Although Wald tests and confidence intervals are standard output from software for fitting logistic regression, we caution the reader that in certain circumstances the performance of Wald tests (and confidence intervals) can be somewhat irregular and lead to misleading conclusions. As a result, we recommend that LRTs (and confidence intervals) be used whenever possible.

    2.5.3 Example: Logistic Regression with a Single Binary Covariate

    We now return to the Table 2.1 data from the first-episode major affective disorders with psychosis study and show that we can obtain identical results using large-sample methods for 2 × 2 contingency tables (as reported in Section 2.3.2) and logistic regression. Recall that our interest is in the association between Axis I comorbidity and 2-year functional recovery in this group of patients. Using logistic regression, we fit the model:

    log[Pr(Recoveryi = 1)/Pr(Recoveryi = 0)] = β0 + β1Comorbidityi    (2.12)

    where Recoveryi is an indicator variable coded 1 if the ith subject recovered and 0 otherwise, and Comorbidityi is an indicator variable coded 1 if the ith subject had Axis I Comorbidity and 0 otherwise.

    The following are the results:

    [Model output table not reproduced in this preview]

    The estimate of the OR comparing the odds of recovery in patients with and without Axis I comorbidities is exp(−0.7185) = 0.49, and the 95% confidence interval for the OR is exp(−1.3737, −0.0632) = (0.25, 0.94). The Wald test statistic for no association (or, equivalently, for H0 : β1 = 0 or H0 : OR = 1), which appears in the table, is Z = − 2.15, with an accompanying p-value of 0.03. We can obtain the LRT statistic for no association by fitting the model with the intercept as the only covariate, which has a log-likelihood of −119.8066, and comparing it to the log likelihood from the model with both the intercept and comorbidity as covariates, −117.4037. The LRT statistic is G² = 2 × [−117.4037 − (−119.8066)] = 4.81. The associated p-value, which can be obtained using statistical software or estimated from chi-square distribution tables, is 0.03. These results and their interpretation are identical to those obtained using methods for 2 × 2 contingency tables and reported in Section 2.3.2.
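    Because the subject-level study data are not reproduced here, the sketch below simulates a stand-in dataset and fits Equation 2.12 with statsmodels (assumed available); the mechanics, not the simulated numbers, are the point:

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in for the 181-subject dataset (coefficients chosen
# arbitrarily for illustration; they are not the study's estimates)
rng = np.random.default_rng(0)
comorbidity = rng.integers(0, 2, size=181)
p_true = 1 / (1 + np.exp(-(-0.3 - 0.7 * comorbidity)))
recovery = rng.binomial(1, p_true)

# Fit Equation 2.12: log odds of recovery = beta0 + beta1 * Comorbidity
X = sm.add_constant(comorbidity.astype(float))
fit = sm.Logit(recovery, X).fit(disp=False)
print(fit.params)                 # estimates of beta0, beta1
print(np.exp(fit.params[1]))      # estimated OR for comorbidity
print(np.exp(fit.conf_int()[1]))  # 95% CI for the OR

# LRT against the intercept-only model
null = sm.Logit(recovery, np.ones((len(recovery), 1))).fit(disp=False)
print(2 * (fit.llf - null.llf))   # compare to chi-square with 1 df
```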

    2.5.4 Multiple Logistic Regression

    So far, we have only considered the simple case where there is a single covariate xi. Next, we consider the extensions of Equations 2.10 and 2.11 to the case where there are two or more covariates. Recall that, in Section 2.4, we applied methods for stratified contingency tables to the first-episode major affective disorders with psychosis study data to test that the OR comparing patients with and without comorbidities adjusted for sex equals 1. Methods for stratified contingency tables are useful when adjusting for a small number of categorical covariates. However, multiple logistic regression has important advantages over stratified contingency table methods when the number of categorical covariates is larger or when we want to adjust for quantitative covariates. For example, using the first-episode data, we may want to test that the OR adjusted for both sex and age equals 1 and to obtain an estimate of the adjusted OR without classifying age into arbitrary categories.

    When there are many covariates, the logistic regression model becomes

    log[pi/(1 − pi)] = β0 + β1xi1 + β2xi2 + ⋯ + βKxiK    (2.13)

    where xi1, xi2, …, xiK are the K covariates. The logistic regression coefficients in Equation 2.13 have the following interpretations. The logistic regression intercept, β0, now has interpretation as the log odds of success when all covariates equal 0, that is when xi1 = xi2 = ⋯ = xiK = 0. Each of the logistic regression slopes, βk (for k = 1, …, K), has interpretation as the change in the log odds of success for a single unit change in xik given that all of the other covariates remain constant.

    Note that the appealing property of logistic regression that the same OR can be estimated from either a prospective or retrospective study design readily generalises when xik is quantitative rather than dichotomous, and also when there are two or more predictor variables. Methods for hypothesis testing and constructing confidence intervals also generalise easily from the predictor in a simple logistic regression model (β1) to a predictor in a multiple logistic regression model (βk). Expressions for Wald test statistics and confidence intervals for βk can be obtained by substituting βk for β1 in the relevant portions of Section 2.5.2. LRTs of βk = 0 can be constructed by comparing the fit of the full model, with xik included, to the fit of a reduced model including all covariates except xik. Twice the difference between the maximised log likelihood for the full model and the maximised log likelihood for the reduced model still approximately follows a chi-square distribution with one degree of freedom.

    2.5.5 Example: Multiple Logistic Regression

    To obtain an estimate of the OR for comorbidity adjusted for sex and age and to test that the adjusted OR equals one, we fit the following multiple logistic regression model to the first-episode major affective disorders with psychosis data:

    log[Pr(Recoveryi = 1)/Pr(Recoveryi = 0)] = β0 + β1Comorbidityi + β2Malei + β3Agei    (2.14)

    where Malei is an indicator variable coded 1 if the ith subject is male and 0 if the ith subject is female and Agei is the age of the ith subject in decades. The following results are obtained:

    [Model output table not reproduced in this preview]

    The estimate of the OR for comorbidity adjusted for sex and age is exp(−0.4845) = 0.62, and its 95% confidence interval is exp(−1.1697, 0.2008) = (0.31, 1.22). Holding sex and age constant, we estimate that the odds of two-year functional recovery is 38% lower for patients with Axis I comorbidity when compared to patients without Axis I comorbidity. However, note from the 95% confidence interval that our data are consistent with odds of recovery up to 22% higher for patients with Axis I comorbidity. In addition, the Wald test statistic for testing that the adjusted OR equals one is Z = − 1.39 with an associated p-value of 0.17, and the LRT statistic has an associated p-value of 0.16. Using either test we conclude there is no association between Axis I comorbidity and 2-year functional recovery after adjusting for sex and age.

    We can also use the results from the multiple logistic regression to obtain estimates and test statistics for the other covariates in the model. The estimated OR comparing odds of recovery in males versus females is 1.00 (95% confidence interval: 0.53, 1.90), and we conclude from the Wald test that there is no evidence of an association between sex and recovery after adjusting for Axis I comorbidity and age (Z = 0.01, p = 0.99). On the other hand, the estimated OR comparing odds of recovery for a 10-year age increase is 1.36 (95% confidence interval: 1.10, 1.69). Adjusting for Axis I comorbidity and sex, the odds of two-year functional recovery increases with age (Z = 2.84, p = 0.004); for every decade age increase, we estimate that the odds of recovery is 36% higher.

    2.5.6 Categorical Predictors with More than Two Levels in Logistic Regression

    Section 2.3.3 presented contingency table methods that could be used to test for independence with predictors or outcomes with more than two categories. This section describes how logistic regression accommodates predictors with more than two categories, either with or without adjustment for additional covariates. (A later section describes extensions of logistic regression that accommodate outcomes with more than two categories.) For K unordered categories, a test for independence can be obtained by adding K − 1 indicator or ‘dummy’ variables as covariates in the regression, where the kth indicator variable is coded 1 for subjects in the kth category and 0 for all other subjects (so that subjects in the remaining ‘reference’ category are coded 0 for all K − 1 indicator variables). A LRT for no association can be conducted by comparing the log likelihood for the model containing the predictor to the log likelihood for the model with the K − 1 indicator variables corresponding to the predictor removed; the LRT statistic follows a chi-square distribution with K − 1 degrees of freedom. Wald and score hypothesis tests are also available. However, when a predictor has three or more categories, the Wald test of no association is sometimes not available from standard logistic regression output and must be requested.

    For ordered categories, a test for independence can be conducted by assigning scores to each level of the predictor and then using the score as a covariate in the regression model. For example, the scores 1, 2 and 3 could be assigned to the categories mild, moderate and severe. The Z statistic for the covariate then corresponds to a test for no association, and interpretation of the corresponding regression parameter is similar to the interpretation of a regression parameter for a quantitative predictor. For example, the OR for the severity predictor would compare the odds of the outcome for a one category increase in severity, either moderate versus mild or severe versus moderate. This approach is most appropriate when the association between the score and outcome is approximately linear.

    2.5.7 Example: Logistic Regression with a Three-level Predictor

    In Section 2.3.4, we performed tests for independence between type of onset of first-episode affective disorder with psychosis (categorised as chronic, subacute or acute) and 2-year functional recovery. Equivalent tests can be performed using logistic regression by fitting the model:

    log[Pr(Recoveryi = 1)/Pr(Recoveryi = 0)] = β0 + β1Subacutei + β2Acutei    (2.15)

    where Subacutei is an indicator variable coded 1 if the ith subject had subacute onset and 0 otherwise, Acutei is an indicator variable coded 1 if the ith subject had acute onset and 0 otherwise, and chronic onset is the reference category. The following results are obtained:

    [Model output table not reproduced in this preview]

    Exponentiating the parameters for subacute and acute onset provides estimates of the ORs comparing odds of recovery for subacute and acute onset respectively versus chronic onset, and the Z statistics for these two parameters are for separate tests that these ORs are equal to 1. However, our primary interest is in the overall test for independence between onset and recovery. The log likelihood for this model is −113.421, and the log likelihood for the model with an intercept only is −113.565. The resulting LRT statistic for independence (i.e. twice the difference in log likelihoods) is G² = 2 × [−113.421 − (−113.565)] = 0.29, with an associated p-value of 0.87. These results are identical to the LRT results from Section 2.3.4, and our conclusions are the same; that is, there is no association between type of onset and 2-year functional recovery. In this case, the Wald test statistic is also 0.29, with a p-value of 0.87.

    2.5.8 Interactions in Logistic Regression

    In Section 2.4, we introduced a test for interaction using methods for contingency tables. Recall that an interaction between two predictor variables is present when the OR for one predictor differs according to the value of the other predictor. For example, for the data from Table 2.7, we would state that there is an interaction between comorbidity and sex if the OR comparing odds of recovery with and without comorbidity differs between males and females. We can allow for interaction in logistic regression models by multiplying the covariates for the predictors involved in the interaction and adding them as additional covariates to the regression model. For example, to test for an interaction between comorbidity and sex, we would use the model:

    log[Pr(Recoveryi = 1)/Pr(Recoveryi = 0)] = β0 + β1Comorbidityi + β2Malei + β3(Comorbidityi × Malei)    (2.16)

    For this model, exp(β1) is the OR comparing odds of recovery in female patients with and without comorbidity, and exp(β1 + β3) is the OR comparing odds of recovery in male patients with and without comorbidity. These two ORs are equal if and only if β3 = 0; therefore, the test of H0 : β3 = 0 is a test of no interaction between comorbidity and sex. Using logistic regression, tests for interaction are also straightforward for quantitative predictors, categorical predictors with more than two levels, and interactions among more than two predictors.
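    As a sketch of how Equation 2.16 is set up in practice (again with simulated stand-in data and statsmodels assumed, since the raw study data are not shown), the interaction covariate is simply the elementwise product of the two indicators:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
comorbidity = rng.integers(0, 2, size=181)
male = rng.integers(0, 2, size=181)
eta = -0.3 - 0.5 * comorbidity + 0.0 * male - 0.4 * comorbidity * male
recovery = rng.binomial(1, 1 / (1 + np.exp(-eta)))   # invented data

# Equation 2.16: main effects plus the product (interaction) term
X = np.column_stack([np.ones(181), comorbidity, male, comorbidity * male])
fit = sm.Logit(recovery, X).fit(disp=False)
b = fit.params

print(np.exp(b[1]))          # OR for comorbidity among females
print(np.exp(b[1] + b[3]))   # OR for comorbidity among males
# The Wald test of no interaction is the z test for beta3 (H0: beta3 = 0)
print(fit.tvalues[3], fit.pvalues[3])
```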

    2.5.9 Example: Logistic Regression with Interaction

    Fitting the model from Equation 2.16 to the data from Table 2.7 results in the following output:

    [Model output table not reproduced in this preview]

    The estimated OR comparing odds of 2-year functional recovery for patients with and without Axis I comorbidity is exp(−0.4881) = 0.61 for females and exp(−0.4881 −0.3838) = 0.42 for males. Note that we can calculate a confidence interval for the OR for females but not for males from the information in the output; because the OR for males is calculated by summing two parameter estimates, the covariance between the two parameter estimates would be required in order to calculate the confidence interval. The Wald test statistic for no interaction is Z = − 0.57, with an associated p-value of 0.57. The LRT statistic (obtained by comparing the log-likelihood for this model to the log-likelihood for the model with covariates for comorbidity and sex but not their interaction) is 0.32 with an associated p-value of 0.57. We conclude that there is no interaction between Axis I comorbidity and sex; that is, the OR comparing the odds of functional recovery for patients with and without Axis I comorbidity is the same for males and females. These results agree with the Breslow–Day test results from Section 2.4.1.
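    Continuing the interaction sketch from Section 2.5.8, the missing ingredient the text mentions, the covariance between the two parameter estimates, is available from the fitted model, so a Wald confidence interval for the male OR, exp(β1 + β3), can be assembled as follows:

```python
import numpy as np

# `fit` is the interaction model fitted in the Section 2.5.8 sketch
b = fit.params
cov = fit.cov_params()     # estimated covariance matrix of the estimates

est = b[1] + b[3]          # log OR for comorbidity among males
se = np.sqrt(cov[1, 1] + cov[3, 3] + 2 * cov[1, 3])
ci = np.exp([est - 1.96 * se, est + 1.96 * se])

print(np.exp(est), ci)     # male OR and its 95% confidence interval
```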

    2.5.10 Goodness-of-fit

    When a multiple logistic regression model has been used to draw conclusions from a study, we should check the fit of the model to the study data. One way to check the fit of a model is to use statistical tests for goodness-of-fit; in the absence of significant evidence of poor fit from these test statistics, we conclude the fit of our model is adequate. The deviance (based on the likelihood ratio statistic) or the Pearson chi-square can be used as a goodness-of-fit statistic if, at each observed covariate pattern, the data can be grouped. That is, if there are ni subjects with the same covariate values (and hence the same Bernoulli distribution), they can be treated as a binomial sample and a test of goodness-of-fit can be based on the comparison of the observed and expected (or predicted) counts in each covariate pattern. Alternatively, if the covariates are quantitative rather than categorical, Hosmer and Lemeshow [10] proposed a goodness-of-fit statistic similar to the Pearson chi-square, which can be calculated after grouping individuals on the basis of having similar values of the predicted probability, p̂i. Evidence of poor fit can reflect a variety of problems with our model, such as an inappropriate choice of transformation function, failure to include important interaction terms, or inappropriate assumption of linearity for quantitative or ordered categorical covariates, and is an indication that we should revisit the assumptions made during the modelling process.
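    The Hosmer–Lemeshow statistic is simple to compute from a vector of outcomes and fitted probabilities; below is a sketch (a hypothetical helper, assuming numpy and scipy, using the conventional ten groups and groups − 2 degrees of freedom):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, groups=10):
    """Group subjects by fitted probability, then compare observed and
    expected numbers of successes across the groups."""
    order = np.argsort(p_hat)
    y = np.asarray(y, dtype=float)[order]
    p_hat = np.asarray(p_hat, dtype=float)[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(y)), groups):
        obs = y[idx].sum()       # observed successes in this group
        exp_ = p_hat[idx].sum()  # expected successes in this group
        n_g = len(idx)
        stat += (obs - exp_) ** 2 / (exp_ * (1 - exp_ / n_g))
    return stat, chi2.sf(stat, df=groups - 2)

# Usage with a fitted statsmodels Logit result `fit` and design matrix X:
#   stat, p = hosmer_lemeshow(recovery, fit.predict(X))
```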

    2.6 Advanced Topics

    In this section we briefly review a number of more advanced topics that can be considered extensions of the standard logistic regression model. Many of these methods have been somewhat slow to move into the mainstream of
