
Design and Analysis of Experiments in the Health Sciences
Ebook, 440 pages

About this ebook

An accessible and practical approach to the design and analysis of experiments in the health sciences

Design and Analysis of Experiments in the Health Sciences provides a balanced presentation of design and analysis issues relating to data in the health sciences and emphasizes new research areas, the crucial topic of clinical trials, and state-of-the-art applications.

Advancing the idea that design drives analysis and analysis reveals the design, the book clearly explains how to apply design and analysis principles in animal, human, and laboratory experiments while illustrating topics with applications and examples from randomized clinical trials and the modern topic of microarrays. The authors outline the following five types of designs that form the basis of most experimental structures:

  • Completely randomized designs
  • Randomized block designs
  • Factorial designs
  • Multilevel experiments
  • Repeated measures designs

A related website features a wealth of data sets that are used throughout the book, allowing readers to work hands-on with the material. In addition, an extensive bibliography outlines additional resources for further study of the presented topics.

Requiring only a basic background in statistics, Design and Analysis of Experiments in the Health Sciences is an excellent book for introductory courses on experimental design and analysis at the graduate level. The book also serves as a valuable resource for researchers in medicine, dentistry, nursing, epidemiology, statistical genetics, and public health.

Language: English
Publisher: Wiley
Release date: Jun 7, 2012
ISBN: 9781118279717

    Book preview

    Design and Analysis of Experiments in the Health Sciences - Gerald van Belle

    GvB: For West African Vocational Schools (WAVS)

    KK: For Alex and Eve

    Preface

    Why another book on the design and analysis of experiments? There are many design books with engineering or agricultural applications, but there are few books with a focus on the health sciences. The focus of this book is laboratory, animal, and human experiments and scientific investigations in the health sciences. More specifically, we sought to incorporate some newer research areas such as microarrays into the broad context of design. Finally, it is our opinion that clinical trials are a crucial topic to cover in a design book for health scientists. Hence this book.

    The principles of design and analysis have been enunciated for many years (Fisher, 1925, 1971). It is the application of these principles to research in the health sciences that forms the content of this book. We illustrate the principles with examples from a very diverse set of areas from within the health sciences. Most examples are studies involving humans and animals.

    There is a close linkage between design and analysis. Design drives the analysis, and analysis reveals the design. However, the tie is not one-to-one. Alternative analyses are available for a specific design and vice versa. Many books on design stress the analysis. This book attempts to balance aspects of design and analysis.

    This book presupposes an introduction to basic statistical concepts: the two laws of probability (addition and multiplication), t-tests (for both independent and paired data), simple linear regression analysis including a test for significance of the regression coefficient, hypothesis testing, and estimation. It assumes that you have seen the formula for sample size for comparing the means of two groups. Hence, you know what is meant by a Type I error, Type II error, power, and one-sided versus two-sided hypothesis.
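
    As a brief refresher on that last point, here is a minimal sketch in R of the standard sample size calculation for comparing two means; the difference, standard deviation, and error rates are hypothetical values chosen only for illustration.

        # Hypothetical planning example: detect a difference of 5 units between two
        # group means, assuming a common standard deviation of 8, with a two-sided
        # Type I error of 0.05 and power (1 - Type II error) of 0.80.
        power.t.test(delta = 5, sd = 8, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
        # The 'n' in the output is the required number of subjects per group.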

    Chapter 1 discusses basic principles that provide a coherent structure for carrying out experiments. A thorough understanding and conscientious application of these principles will pay off in terms of validity of inferences, economy of study, and generalizability. Chapters 2–6 discuss five types of designs—and simple extensions—that form the basis for most experimental structures. We chose these types of designs because they cover the majority of research designs in the health sciences. They consist of completely randomized, randomized block (including Latin squares and incomplete blocks), factorial, multilevel, and repeated measures designs.

    Each of the designs in Chapters 2–6 is discussed under the following headings:

    1. Randomization

    2. Hypotheses and sample size

    3. Estimation and analysis

    4. Example

    5. Discussion and extensions

    6. Notes

    7. Summary

    8. Problems

    Chapters 7 and 8 represent specific applications and illustrations of the above designs to randomized clinical trials and microarrays.

    You may notice that journals such as Science and Nature, which contain reports of many experiments, do not refer much to the concepts enunciated in this book. This is somewhat unfortunate, because good scientists will follow the principles presented in this book. However, the presentations in these journals are highly condensed. Only key results are presented, with several years' work often summarized by one table or one figure.

    A bit of historical context is useful in understanding the design of experiments. It is not just a collection of methods coming down from heaven like the statue of Athena. In the 1930s, the computational effort was a real stumbling block, and an important research area was finding shortcut methods of analysis. The computational burden is no longer a problem today. There is an ongoing interplay between statistical methodology, computational resources, and societal and scientific interests, each helping to propel advances in the others. For example, the increasing emphasis on clinical trials starting in the 1960s led to the development of survival analysis, database management, and appropriate computational procedures.

    The website for this book (vanbelle.org) is freely accessible and acts as a supplement. It contains most of the data sets used in the text, frequently in a format that can be easily imported into most statistical software. The publisher, John Wiley & Sons, has graciously allowed us to post Chapter 7, Randomized Clinical Trials; this chapter can be downloaded for free. The hope is, of course, that you'll be intrigued enough by that one chapter to buy the book, and find the book a useful resource.

    On vanbelle.org you can also find the web pages for the book Statistical Rules of Thumb by Gerald van Belle. Chapter 2 of that book dealing with sample size calculations can be downloaded (also with permission of our publisher).

    All data analyses today involve computers, and hence, computer packages. These packages are changed constantly (updated) and new packages are introduced. We have decided to extract the essence of the computer analysis and present this in the text. Almost all of the analyses were run in Stata® or R. The website is intended to be dynamic. For example, some of you will rerun an analysis using your preferred statistical package. If you send it to the website, it will be posted under the appropriate statistical package heading.

    We are indebted to many colleagues: Larissa Stanberry for help with graphics, LaTeX, and formatting; Corinna Mar for creative and graceful implementation of graphics; Art Peterson for helpful discussions about clinical trials; Theo Bammler, Dick Beyer, and Emily Hansen for feedback and ideas about the microarray chapter; Sandra Coke for producing some of the graphics; and the many journals and authors that allowed us to use data from their publications. Of course, we are responsible for content—especially the errors. We are also indebted to our editor, Susanne Steitz-Filler, for her patience as we extended our deadlines.

    Books generate royalties. All the royalties from this book will be distributed to charitable organizations as follows. Gerald van Belle's share is assigned to West African Vocational Schools (WAVS) (wavschools.org), which works in Guinea-Bissau, one of the poorest nations in the world. Kathleen Kerr's share is dedicated to Northwest Harvest and the Seattle Public Library Foundation.

    Gerald van Belle

    Kathleen F. Kerr

    Chapter 1

    The Basics

    In this chapter, we place the design and analysis of experiments in the health sciences in its scientific context, discuss principles, and enumerate additional considerations such as assignment of experimental conditions to experimental units and sample size considerations.

    1.1 Four Basic Questions

    In his book Science and the Modern World, Whitehead (1925) aptly described the scientific mentality as "a vehement and passionate interest in the relation of general principles to irreducible and stubborn facts." There is a constant interplay between the formulation of the general principles and the stubborn facts. The following quotation from Science, under a picture of a mouse embryo, illustrates this interplay:

    A mouse embryo at 9 days of gestation. . . . Understanding the basis for organ development can provide insights into disease and stem cell programming.

    (Science, 2008)

    The general principles in this case refer to insights into disease and stem cell programming. The stubborn facts deal with specific and measurable observations of the mouse embryo. Statistics—as a component of the sciences—can be characterized as a vehement and passionate interest in the relation of general principles of variation and causation to observed associations. This definition includes causation as a principal interest of statistics, not just variation. Particularly in experimental design and analysis, the key question of interest almost always is one of causation. In fact, the principle of randomization as introduced by R.A. Fisher in the last century is the centerpiece of the scientific enterprise of showing cause and effect in the face of substantial and irreducible variation. Statisticians are particularly good at dealing with variation: they have learned how to describe it, how to manage it, how to induce it, and, perhaps surprisingly, how to take advantage of it. This text will illustrate these points over and over again.

    In many sciences, particularly the biological sciences, four basic questions are addressed:

    1. What is the question?

    2. Is it measurable?

    3. Where will you get the data?

    4. What do you think the data are telling you?

    1. What Is the Question?

    Why is the water in the kettle boiling? One possible answer: "The flame is making the molecules of water move faster and faster so that they can break the surface tension of the water and begin to escape." Another possible answer (given perhaps by R.A. Fisher): "To make tea for a lady." The first answer deals with efficient cause, the second with final cause. Science—and statistics—deals primarily with efficient causes, not final causes.

    The context of the question is as important as the question itself. A Monty Python observation is relevant: "If you get them to ask the wrong question, you don't have to worry about the answer."

    Often the context of the question is assumed and unstated, as in the boiling water question above. A great deal of humor is based on an assumed context that is displaced by a revealed context in the punch line of a joke. This may be funny on a late-night show but can be fatal to a research question. For any scientific question, the context must be explicit. For example, in assessing mathematical skills it is necessary to specify the population to be assessed: fifth graders or community college students?

    Even more daunting than the context is the form of a question. Social scientists are very much aware of this. But the form is every bit as crucial in the laboratory sciences. The question is frequently formulated in terms of what is measurable; this may or may not address the issue at hand.

    2. Is It Measurable?

    Efficient causes have the potential of being measurable. In the example of the water boiling in the kettle, we can measure the heat supplied by the flame, the average velocity of the molecules, and, perhaps more important, the variation in the molecular velocities.

    Asking a measurable question can be very challenging for two reasons. First, the question needs to be specific enough so that measurements can be made. Second, the formulation of the question implicitly defines the research area to be considered. The question puts a fence around the mystery. It says, "the mystery is here, not there." For example, the question "Are current lead levels safe?" deals with a potentially toxic exposure. To make the question measurable requires a host of considerations such as population(s) of interest, specification of nonsafety, assessment of levels in the environment, and specification of lead level in the body. The study of this type of question is part of the field of toxicology, which may try to assess some aspects of toxicity in animals and other aspects in humans. This example also illustrates the societal importance of the question; the U.S. Environmental Protection Agency uses the scientific evidence to set environmental policy.

    An example of a nonmeasurable question—and one very pertinent to this book—is: "Is it ethical to do experiments on animals?" Most toxicologists would argue that it is. In this book, we are using data from animal experiments and therefore, implicitly, agree that it is ethical. A challenging question might be: "Is it ethical to use animal data in this book while holding that it is unethical to do animal experiments?" Once the ethical question is answered in the affirmative, many measurable aspects of animal experiments come up under the rubric of Good Laboratory Practice. This might include measuring the temperature at which animals are housed. Einstein said, "Not everything that counts is countable, and not everything that is countable counts." It could even be argued that the things that really count are not countable!

    The social sciences provide another example of issues in measurability. There has been a 100-year debate about the existence of intelligence. Common language use suggests that there is (e.g., "I thought you were more intelligent than that..."). Spearman in 1904 argued for such a (latent) trait on the basis of the structure of a correlation matrix.

    As another example, Canadian health data do not have reference to race or national origin. The primary reason is that there is no standard acceptable definition. In other words, it is considered very difficult to measure this concept. The question has been raised whether the concept of race is a biological concept or a social concept.

    3. Where Will You Get the Data?

    Getting the data involves two steps. First, selecting the objects to be measured; second, specifying the measurements that are to be made. This, inevitably, involves a tremendous reduction of the universe of discourse. With respect to both the objects selected and the measurements made, there is the dilemma of this, not that. We cannot measure everything.

    Implementing and accounting for the selection process is a precondition for valid experimental inference. For example, in ergonomic studies of proper lifting procedures, subjects must be selected and measurements made at specific times. Ideally, the subjects are representative of the working population or the population of interest. Most of the time this is not the case, with many subjects being college-age students eager to make a few extra dollars. The experiment may be carried out impeccably, but the question of generalizability to the population of interest still needs to be addressed.

    The process of the selection of experimental units is often not addressed. One reason is that control treatments are included so that the assessment of the treatment effect is comparative. The underlying assumption of this argument is that there is no interaction with biased selection of experimental units. In the above example, a proper lifting procedure may be compared with an improper one in terms of muscle fatigue or muscle strain. If college-age students are used for this experiment, then the assumption is that the comparative results apply to middle-age postal workers as well. This is an implicit assumption—usually only acknowledged in the discussion section of the paper reporting the results.

    4. What Do You Think the Data Are Telling You?

    The statistical analysis addresses the fourth question. Statistical analysis involves a further reduction of the data, usually according to some statistical model. Most of the data in this book will be modeled, or approximated, by some kind of linear model. A simple linear model consists of

    (1.1)   Y_ij = μ_i + ε_ij

    The outcome of the experiment is considered to consist of a fixed part, the population means associated with treatments indexed by the subscript i, and a random part, the residual variation indexed by the jth observation within the ith treatment. For example, in the independent sample t-test, there are two means, μ1 and μ2. Under the null hypothesis, μ1 = μ2. Often, the residual variation is assumed to be normally distributed with mean 0 and variance σ². The assumptions and assessment of their validity will be discussed in more detail below. Implicit in the outcome model in equation 1.1 is that the error is added to the population means.
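
    To make model (1.1) concrete, the short R sketch below simulates two treatment groups with hypothetical means μ1 and μ2 and normally distributed errors, and then tests μ1 = μ2 both with an independent-sample t-test and as a linear model; all of the numbers are invented for illustration.

        set.seed(1)                              # for reproducibility
        # Simulate model (1.1): Y_ij = mu_i + e_ij, with e_ij ~ N(0, sigma^2)
        mu    <- c(10, 12)      # hypothetical treatment means mu_1 and mu_2
        sigma <- 2              # hypothetical residual standard deviation
        n     <- 15             # replicates per treatment
        treatment <- factor(rep(c("A", "B"), each = n))
        y <- mu[as.integer(treatment)] + rnorm(2 * n, mean = 0, sd = sigma)

        t.test(y ~ treatment, var.equal = TRUE)  # independent-sample t-test of mu_1 = mu_2
        summary(lm(y ~ treatment))               # the same comparison as a linear model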

    1.2 Variation

    Whitehead (1925) also noted that events do not recur in exact detail. As he put it, "No two days are identical, no two winters. What has gone, has gone forever." He further noted that, in the face of variation, observation is selection. Again, because we have observed this we cannot observe that.

    In a coincidence of history, R.A. Fisher published a book in 1925 that addressed the issue of selection (Fisher, 1925). Fisher's book dealt with how to manage this variation and, in fact, how to use it to the researcher's advantage. Fisher distinguished between variation controlled by the investigator and variation induced by the investigator.

    In a second book, The Design of Experiments, which appeared in 1935, R.A. Fisher dealt more specifically with the control of variation, the key to experimental design (Fisher, 1971). This book is peppered with references to principles of experimental design. It is somewhat challenging to discover what these principles actually are. Broadly speaking, the principles deal with replication, randomization, and the control of variation. These principles will be discussed in the next section.

    The principles that Fisher enunciated are still valid today. The specific application may differ: for example, clinical trials and microarray experiments. But the principles are very much applicable.

    The principles are like the axioms in a mathematical system: a particular experiment is based on these principles. Sometimes it may be difficult to see how the experiment—both design and analysis—was based on the principles, or perhaps the judgment is that the experiment is not consistent with the principles. This requires experience and understanding of both the experiment and the subject area.

    Since Fisher developed the design of experiments in an agricultural setting, many of the terms have an agricultural flavor. Terms such as plot and block refer to a section of land with the block consisting of several plots. In this terminology, the experimental unit was usually the plot, that is, the basic unit to which treatments were applied. In laboratory settings, the experimental unit could be an animal, a human subject, a sample of blood, or an observation of a subject at a particular time.

    More complicated situations arise with multilevel experiments. An example is subjects randomly assigned to, say, three types of noise (between-subject levels) and each subject is assessed using four different types of noise protective equipment (within-subject levels). The endpoint of interest could be the ability to hear specific tones under these experimental conditions.
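
    As a rough illustration only, the R sketch below lays out the structure of such a multilevel experiment; the number of subjects and the factor labels are invented for the example.

        set.seed(2)
        n_per_noise <- 6                            # hypothetical number of subjects per noise type
        noise       <- c("low", "medium", "high")   # between-subject factor (labels invented)
        equipment   <- paste0("device", 1:4)        # within-subject factor (labels invented)

        # Each subject is randomly assigned to one noise type...
        subjects <- data.frame(
          subject = factor(1:(n_per_noise * length(noise))),
          noise   = factor(sample(rep(noise, each = n_per_noise)))
        )
        # ...and is assessed with all four types of protective equipment.
        design <- data.frame(
          subject   = rep(subjects$subject, each = length(equipment)),
          noise     = rep(subjects$noise,   each = length(equipment)),
          equipment = factor(rep(equipment, times = nrow(subjects)))
        )
        head(design, 8)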

    1.3 Principles of Design and Analysis

    In this section, we introduce five principles for the design and analysis of experiments. All experiments, especially in the health sciences dealing with living material, need to consider these principles. Sometimes they can be ignored but this will require justification.

    1: Replicate to Measure Variability That Is Not Controlled

    A replicate is a repeated observation under constant experimental conditions. Replicates could be identical plots of land, subjects under constant treatment conditions, or a blood sample split into a number of aliquots. Replication is important for two reasons. First, it allows the assessment of variability not under the control of the investigator—sometimes called error or residual variation. This, typically, is the inherent variability of the experimental situation: the residual variation in equation 1.1. Second, replication is associated with the precision of the experiment. The larger the number of replicates, the more precisely the effects of the experimental conditions can be estimated (or smaller effects can be detected). The precise relationship will be discussed in the next chapter.
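
    As a small numerical illustration of the second point, the standard error of an estimated treatment mean falls as σ/√n with the number of replicates n; the value of σ below is arbitrary.

        sigma <- 4                              # hypothetical residual standard deviation
        n     <- c(2, 5, 10, 20, 50)            # increasing numbers of replicates
        data.frame(replicates     = n,
                   standard_error = sigma / sqrt(n))   # precision of a treatment mean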

    Replication can occur at several levels. For example, in dentistry, molars within the mouth could be treated as replicates, male and female patients as replicates within sex, and dental clinics within a state as another category of replicates. This leads to statistical models called hierarchical or multilevel. The analysis of data from such designs will be discussed briefly in this book.

    We need to distinguish between replication and pseudoreplication, best illustrated by a simple contrast: weighing each of 10 rabbits once versus weighing one rabbit 10 times. The distinction is tied to the inference that is to be made. If the inference is to a population of rabbits, then weighing one rabbit 10 times clearly gives limited information. On the other hand, if the purpose is to assess the precision of a new scale, then the repeated weighings of the same rabbit may be quite informative (but why use rabbits?).
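
    The point can be seen in a small simulation, sketched below with invented variance components: ten rabbits weighed once reflect rabbit-to-rabbit variability, whereas one rabbit weighed ten times reflects only the precision of the scale.

        set.seed(3)
        between_rabbit_sd <- 300   # grams; hypothetical spread of true weights across rabbits
        scale_sd          <- 5     # grams; hypothetical measurement error of the scale

        true_weights <- rnorm(10, mean = 2500, sd = between_rabbit_sd)
        ten_rabbits  <- true_weights    + rnorm(10, sd = scale_sd)  # 10 rabbits, weighed once each
        one_rabbit   <- true_weights[1] + rnorm(10, sd = scale_sd)  # 1 rabbit, weighed 10 times

        sd(ten_rabbits)   # large: true replication captures between-rabbit variation
        sd(one_rabbit)    # small: pseudoreplication reflects only the scale's precision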

    Replication is difficult to achieve, as Whitehead already noted. In practice, we can only approximate constant experimental conditions.

    2: Randomly Assign Experimental Units to Treatments

    Randomization refers to the method of assignment of experimental units to groups such as treatment groups. It can be as simple as drawing slips of paper labeled with the experimental units out of a lunch bag and assigning the first half of the slips drawn to one experimental condition and the remaining half to another. More elaborately, it can be the assignment of subjects to treatments in a clinical trial with various constraints, such as balancing assignments within treatment centers.

    Randomization has two virtues. First, it is a probabilistic procedure and provides the basis for the validity of statistical analyses involving tests of significance. Second, under randomization, systematic effects tend to be distributed equally across treatments and become part of the within-treatment variability. For example, if the experimental units are animals, then by randomization the average weight of the animals will tend to be equal in the experimental groups. These are large-sample virtues that may not be realized in experiments with small sample sizes, and some additional steps may have to be taken. This will be discussed in Section 1.7.
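
    A minimal R sketch of simple randomization is given below; the animal weights are simulated with arbitrary values only to show the tendency of randomization to balance a baseline characteristic such as body weight across groups.

        set.seed(4)
        n_animals <- 20
        weight <- rnorm(n_animals, mean = 250, sd = 25)   # hypothetical baseline weights (grams)

        # Randomly assign half of the animals to each experimental condition
        group <- sample(rep(c("control", "treatment"), each = n_animals / 2))

        tapply(weight, group, mean)   # average weights tend to be similar in the two groups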

    3: Reduce Variability Through Design

    Randomization is associated with the estimation of the error term, or residual variability. This variability may be unacceptably large, and the question arises whether it can be reduced. The answer is yes, and there are two general ways of doing this. One is by blocking—a term from agricultural research. For example, there may be a fertility gradient in the land, with areas near a river more fertile than those farther away (think of the Nile). If plots of land are randomly assigned to different treatments, then the variability in fertility becomes part of the error term. We could instead create strips (blocks) of land parallel to the river and assign treatments randomly to plots within a strip; then, variability among these plots within a block is not due to fertility.
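
    The within-strip randomization just described might be sketched in R as follows; the numbers of strips, plots, and treatments are hypothetical.

        set.seed(5)
        blocks     <- paste0("strip", 1:4)    # strips of land parallel to the river
        treatments <- c("A", "B", "C", "D")   # one plot per treatment within each strip

        # Randomize the order of the treatments separately within each block
        layout <- do.call(rbind, lapply(blocks, function(b) {
          data.frame(block = b, plot = seq_along(treatments), treatment = sample(treatments))
        }))
        layout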

    The second approach is to measure additional variables (covariates) at the start of the study that are associated with the quantity to be measured and to adjust for them in the analysis. An example from ergonomics is that subjects vary in their tolerance to stress in using a computer keyboard. If it is desired to compare five different types of keyboards with respect to fatigue induction, then one possible design is to assign subjects randomly to one of the keyboards. In this case, variability among subjects becomes part of the error term. If fatigue is related to factors known before the experiment is designed, for example, size of hand and/or age, then we could measure these variables before the experiment is started and use them as covariates in the analysis.

    Some of the variability in fatigue is due to age, so age could be introduced as a covariate and the fatigue measurements compared for the same age. This is commonly done through a regression model called analysis of covariance.
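
    A minimal analysis-of-covariance sketch along these lines is shown below; the fatigue scores, keyboard effects, and age relationship are all simulated from made-up values.

        set.seed(6)
        n_per_keyboard <- 10
        keyboard <- factor(rep(paste0("K", 1:5), each = n_per_keyboard))
        age      <- runif(5 * n_per_keyboard, min = 20, max = 60)

        # Simulated fatigue score: hypothetical keyboard effects plus an age effect plus noise
        keyboard_effect <- c(0, 2, 4, 1, 3)[as.integer(keyboard)]
        fatigue <- 10 + keyboard_effect + 0.15 * age + rnorm(length(age), sd = 2)

        # Analysis of covariance: compare keyboards after adjusting for age
        anova(lm(fatigue ~ age + keyboard))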

    4: Aim for a Balanced Design

    The concept of balance is crucial to good design. For example, suppose that the toxicity of four industrial solvents at three concentrations is investigated, producing 12 experimental conditions. If 60 animals are to be used, then the requirement of balance would stipulate that 5 (randomly selected) animals be assigned to each of the 12 groups. This experiment is an example of a factorial experiment, where the two factors are solvents and concentrations. Balance becomes particularly important when the design is complicated. For simple designs such as those discussed in the next chapter, balance is less of an issue.
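
    The balanced allocation described above might be laid out in R as follows; the solvent and concentration labels are invented for the sketch.

        set.seed(7)
        cells <- expand.grid(solvent       = paste0("S", 1:4),
                             concentration = c("low", "medium", "high"))   # 12 conditions

        # Balance: 5 of the 60 animals are assigned, at random, to each of the 12 cells
        design        <- cells[rep(seq_len(nrow(cells)), each = 5), ]
        design$animal <- sample(1:60)                  # random permutation of animal IDs
        table(design$solvent, design$concentration)    # 5 animals in every cell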

    A balanced experiment has two advantages when it comes to the analysis. First, each experimental combination is estimated with the same precision. Second, in a balanced factorial experiment, the effect of each factor can be assessed independently of the other factors, and
