
Statistics and Probability in Forensic Anthropology

About this ebook

Statistics and Probability in Forensic Anthropology provides a practical guide for forensic scientists, primarily anthropologists and pathologists, on how to design studies, how to choose and apply statistical approaches, and how to interpret statistical outcomes in forensic practice. As in other forensic, medical, and biological disciplines, statistics has become increasingly important in forensic anthropology and legal medicine, yet no single book specifically addresses the needs of forensic anthropologists in relation to the research undertaken in the field and the interpretation of research outcomes and case findings within the setting of legal proceedings.

The book covers the application of both frequentist and Bayesian statistics to topics relevant to research and the interpretation of findings in forensic anthropology, along with general chapters on study design and on statistical approaches to measurement error and reliability. Scientific terminology understandable to students and advanced practitioners of forensic anthropology, pathology, and related disciplines is used throughout. Additionally, Statistics and Probability in Forensic Anthropology builds sufficient understanding of statistical procedures and of data interpretation based on statistical outcomes and models to help readers confidently present their work in the forensic context, whether as case reports for legal purposes or as research publications for the scientific community.

  • Contains the application of both frequentist and Bayesian statistics in relation to topics relevant for forensic anthropology research and the interpretation of findings
  • Provides examples of study designs and their statistical solutions, partly following the layout of scientific manuscripts on common topics in the field
  • Includes scientific terminology understandable to students and advanced practitioners of forensic anthropology, legal medicine and related disciplines
Language: English
Release date: July 28, 2020
ISBN: 9780128157657

    Book preview

    Statistics and Probability in Forensic Anthropology - Zuzana Obertová


    Introduction

    The world of statistics is enormous, and one can easily get lost in the numerous definitions, equations, and approaches. There are many books on biomedical statistics/biostatistics, some specifically on statistics for the forensic sciences (mostly covering topics not related to forensic anthropology), and a handful on statistics for anthropology (mostly related to research in cultural or physical anthropology). Most of these books, however, bring statistical theory and definitions to the forefront, often showing complex calculations and presenting only a few examples. While understanding the mathematics behind statistical analyses is no doubt beneficial, probably only a few students and practitioners have the time (and willingness) to immerse themselves in the mathematical formulae. With this book, we intend to provide a practical guide for forensic scientists, mainly anthropologists and pathologists, on how to apply, interpret, and present statistical analyses in scientific publications and in forensic practice. The statistical concepts are mostly presented in the context of particular research questions in forensic anthropology. The complexity of the statistical approaches presented ranges from basic descriptive statistics to advanced Bayesian frameworks, so it is our hope that each reader can learn something new.

    The book is divided into seven chapters, each with one or more contributions. Chapter 1 includes four contributions, starting off with an overview of the statistical questions forensic anthropologists may face in their cases. The other three contributions discuss the fundamental aspects of research underlying the forensic examination and evaluation of anthropological information, including study design and sampling; the importance of reference data, such as identified skeletal collections or virtual imaging collections; and the assessment of sources of error. Chapter 2 includes three contributions on method selection, including advanced techniques such as data mining and decision trees, and on the types of data, probability distributions, and statistical modeling forensic anthropologists encounter in their work. The two contributions of Chapter 3 reflect on the frequentist and the Bayesian approach to anthropological data analysis and interpretation, respectively. Chapter 4 is the longest, with six contributions concerning the four components of the biological profile: sex, age, ancestry, and stature. The contributions discuss how to obtain and interpret population data to estimate the variables of the biological profile in forensic cases and how to present the (statistical) outcomes of anthropological analyses. In Chapter 5, advanced methodological and software solutions for variables of the biological profile are introduced in three contributions. Chapter 6 consists of two contributions on how to combine, evaluate, and report anthropological evidence in cases of personal identification by using logic and Bayesian inference, and one contribution on the visual identification of persons in images, illustrating the use of verbal scales in a field where a statistical and probabilistic framework is still in development. Chapter 7 consists of four short contributions describing common statistical software: SPSS, STATA, SAS, and R. Most of us cannot imagine applying statistics without one of these packages. The contributions discuss the possibilities (and the limitations) of each program, including short step-by-step guidelines on how to deal with common statistical issues in forensic anthropology.

    The contributors to this book are mostly statisticians and forensic anthropologists/forensic pathologists working together, but some are 2-in-1s: forensic anthropologists/pathologists with an exceptional understanding of statistics. If you are reading this book, you are likely involved in forensic anthropology or a related forensic discipline. What we hope is that after reading this book, you will also become more involved or interested in statistics. Some of you may even be encouraged to become 2-in-1s. In any case, if you have any comments, questions, or suggestions regarding the contents of this book, we would like to hear from you at FAstatsbook@gmail.com.

    Section 1

    Study design, data collection and initial assessment of data

    Chapter 1.1: What statistical questions can we expect from judges? An introductory note from a European adversarial system

    Cristina Cattaneo    Professor, LABANOF, Sezione di Medicina Legale, Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Milano, Italy

    Abstract

    The use of statistics in forensic science is far from perfect. Judges in European adversarial systems, however, are more and more likely to barricade themselves, in the written reasoning of their judgments, behind the security of a statistical operation. This is quite understandable, and it has forced forensic scientists into the obligation of trying to stick numbers onto all answers, or at least many feel this pressure.

    Keywords

    Identification; Judiciary; DNA; Odontology; Forensic anthropology

    The application of statistics in forensic science is still far from perfect. Judges in European adversarial systems, however, are more and more likely to barricade themselves, in the written reasoning of their judgments, behind the security of a statistical operation. This is quite understandable, and it has forced forensic scientists into the obligation of trying to stick numbers onto all answers, or at least many feel this pressure.

    For anthropologists and forensic pathologists, the question of statistical support usually comes up with issues related to positive identification or to the completion of the biological profile with information such as sex, age, and ancestry; hence, the questions posed at a hearing, for example, can be distorted and complex.

    This brief note has the mere purpose of sharing with the reader, as an introductory message, the legal situations and the questions from judges and prosecutors, be they insightful or not, that commonly arise and for which a statistical response is requested, even though such a response sometimes does not exist (or at least does not exist yet).

    Every forensic scientist is well aware of the debate, among identification experts for example, between those who want to quantify every type of response regardless of the method and those who do not. Many odontologists declare that morphological uniqueness is intuitive and evident and cannot, or should not, be quantified; this may seem tautological and self-referential, but the attitude has been accepted for several decades. In fact, one wonders about the three traditional primary identifiers of Interpol and the DVI system: fingerprinting identifies through numbers of minutiae, and genetics has as its intrinsic characteristic the expression of identity through statistically significant allele frequencies, yet odontology at the moment has no way of expressing its results with reliable algorithms that quantify the possibility of error. In many ways, these issues remain unsolved.

    The current emergency related to the identification of dead migrants across the world, for example, is forcing us to face this issue, because of the need to identify people who are very difficult to identify with the so-called primary identifiers. The need to identify people from poor countries who die undocumented, having left home years ago, makes it almost impossible to obtain clinical and dental records and often even antemortem fingerprints. Frequently, the relatives who come forward to provide biological material for a genetic comparison are brothers or half-brothers at best. In addition, the absence of allele frequencies for these populations often makes it difficult to perform proper statistics. So we find ourselves in a situation where it is not possible to identify with genetics, and it may be necessary to do so with morphological or anthropological methods (for instance, through personal descriptors or the shape of the face) that may not provide a known error rate.

    The same issues arise when we need to identify faces from video surveillance systems. In these cases, it is necessary to compare a face visible in a video or a photo with that of a suspect. There are many ways of proceeding, through mere comparison or through superimposition, but even in these cases, judges more and more frequently ask whether we can quantify the probability that the face belongs to another person and not to the suspect. And even here, statistics that do not yet exist seem crucial.

    How much is enough to identify? And how can one translate the risk of making a mistake into a statistical expression?

    The more there is at stake in a forensic scenario, the more pressing these questions become.

    In 2014 a media trial took place in Milan concerning the disappearance of the girlfriend of a member of the mafia. After 3 years of hearings and depositions of witnesses reporting that she had been killed and dissolved in acid, it was discovered through the testimony of a "pentito" (a so-called repentant who collaborates with justice) that she had been strangled and her body burnt, chopped, and thrown into a manhole. At the opening of the manhole, the forensic anthropologist and archaeologist did in fact find 1500 g of bone fragments with maximum dimensions of 2 cm. Several laboratories, law enforcement, and university experts attempted identification through DNA, but given the severe state of calcination, it was not possible to extract a reliable genetic profile. Since we had found residues and fragments of dental implants among the burnt remains of the cranium, we asked the investigators whether any antemortem dental data were available; fortunately, the woman had been treated a few years earlier. We compared the antemortem and postmortem data and arrived at an identification from that comparison. A year later, we were called to the hearing for cross-examination. We explained all that had been done for the recovery, the documentation of the cremation, and finally the identification. The judge dwelt on the identification issue, stressing that given the importance of the case, it was crucial for her to know the uniqueness and the frequency within the population of those dental elements. In short, she was asking us exactly what chance there was of another person sharing the same dental setup. But there was no "quantifiable" answer. The case ended in a conviction, and the jury was convinced that the woman had been identified. However, the scientific problem remains.

    Many other examples exist, such as when we need to provide statistics concerning the probability that a juvenile has reached adult age and is therefore imputable (criminally liable): how do we translate dental and skeletal growth into a satisfactory statistical answer?

    These are only some examples of the kind of expectations judges or prosecutors may have with respect to anthropological and medicolegal cases, and they show how statistics seems inevitably to be becoming more and more fundamental, or at least an important issue.

    This is why now is a crucial time for taking stock of where we stand concerning the application of statistics and probability to anthropological issues, and of when and how it is necessary. And even if not all questions will be solved by the illustrious scientists of the following chapters, the method and the type of logic we need to deal with nowadays, in science and in court, will have been made evident.

    Chapter 1.2: Study design and sampling

    Zuzana Obertováa,b; Alistair Stewartc    a Forensic Anthropologist, Visual Identification of Persons, Zürich Forensic Science Institute, Zürich, Switzerland

    b Centre for Forensic Anthropology, School of Social Sciences, The University of Western Australia, Australia

    c Retired, School of Population Health, The University of Auckland, Auckland, New Zealand

    Abstract

    Giving detailed thought to study design is essential for valid and reliable studies. Validity and reliability of studies are particularly important for forensic sciences. In this chapter, types of study designs, different sampling strategies, the role of sample size, type I and type II error, power, and bias are defined and discussed.

    Keywords

    Random sample; Sample size; Bias; Observational study; Validity; Reliability

    Introduction

    In textbooks on medical statistics and epidemiology, chapters on study design and sampling usually occupy a prime position. This is natural: as a patient, for example, you would wish that the study that tested a certain type of medication for its curative effect and the probability of side effects had been designed and performed in a scientifically sound manner. As forensic anthropologists and pathologists, we may wish the same for our work, along with others, such as the families who lost a loved one, lawyers, and judges. When testing a drug or a forensic hypothesis, a proper study design is the first and, one might say, the most important step in the search for answers.

    Forensic anthropologists usually design studies that are meant to provide population data on some characteristic or to clarify some aspect of a forensic case, for example, regarding trauma or pathology. The study design can include an experiment (e.g., to clarify what happens to a body part subjected to burning under certain conditions) or an observational study based on a sample assembled through data collection (e.g., measuring the femur length of 300 individuals from an identified skeletal collection to estimate stature) or through information extracted from existing sources (e.g., searching a database to establish the frequency of an impacted lower left canine in Greek females).

    The first step in any study should be to pose the research question or hypothesis we would like to answer or test with the study. The research hypothesis, also called the alternative hypothesis, usually includes some kind of comparison and therefore states that there is a difference in the feature/measure in question (e.g., between populations). In contrast, the null hypothesis states that there is no difference.

    After posing the research and the null hypothesis, a sample or samples need to be identified, which will form the basis of hypothesis testing and should reflect the target population (the population of which the sample is representative). The population parameters will then be estimated based on the sample characteristics. A sample can consist of different types of data, which in forensic anthropology are mostly either qualitative, including categorical (or nominal) data (e.g., sex and ancestry) and ordinal data (e.g., the size of the processus mastoideus, not in terms of metrics but ordered from smallest to largest), or quantitative, including discrete (e.g., number of fractures) or continuous variables (e.g., stature or age).
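
    These data types map onto the basic structures of R (one of the statistical packages covered in Chapter 7). A minimal sketch, with all variable names and values invented purely for illustration:

        # Illustrative declarations of the four data types (hypothetical values)
        sex       <- factor(c("male", "female", "female"))     # qualitative, nominal
        mastoid   <- factor(c("small", "large", "medium"),
                            levels = c("small", "medium", "large"),
                            ordered = TRUE)                    # qualitative, ordinal
        fractures <- c(0L, 2L, 1L)                             # quantitative, discrete (counts)
        stature   <- c(171.4, 158.9, 163.2)                    # quantitative, continuous (cm)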

    Quantitative variables are commonly categorized (e.g., continuous age categorized into age groups). Notably, categorizing results in the loss of information on data variation, as values in the same category are treated the same. Categorization should therefore be performed only after careful thought about the research question; about anthropologically, medically, or demographically relevant cutoffs; and about the data distribution, such as the presence of a multimodal distribution (a probability distribution with more than one peak), for example, in age-related growth curves.
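
    As a sketch of such categorization in R, with invented ages and cutoffs chosen purely for illustration:

        # cut() turns a continuous variable into categories; right = FALSE makes
        # the intervals closed on the left, e.g., [20, 30). Note that the
        # within-group variation in age is lost after this step.
        age <- c(17, 24, 36, 52, 61, 78)
        age_group <- cut(age, breaks = c(15, 20, 30, 40, 50, 60, 70, Inf),
                         right = FALSE)
        table(age_group)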

    As a consequence of the Daubert ruling, studies on the reliability and repeatability of findings (including testing intra- and interexaminer differences for a specific method) and studies on external validity (the extent to which the results [or methodology] can be generalized to other events/samples) or internal validity (the degree of confidence that the causal relationship studied is plausible and not influenced by other factors) have gained in importance in forensic sciences. Notably a reliable method is not always valid: the method can be reproduced well, but it does not mean that it measures what it should. A valid measurement is usually also reliable: if a method produces accurate results, it should also be reproducible. The validity of a study is largely dependent on the appropriateness of its design, including collection of sample data in an unbiased manner, and accurate and objective observations/measurements.

    Study design

    Study designs can generally be classified as observational or experimental. In observational studies researchers observe events/features as they occur (or are listed in a database), classifying their levels for an outcome of interest and one or more predictors. In experimental studies the researcher sets one or more predictors to a specific level and observes how the outcome of interest changes with a change in a given factor. Randomization of the groups is key for the experimental design. Therefore experimental studies allow stronger causal inference than observational studies, particularly since in the latter the researcher may not detect variables that are essential for data interpretation. Since each study design is associated with a given form of sampling, the study design also determines the type of analysis and the outcomes. The types of observational studies are summarized in Table 1.

    Table 1 Types of observational studies

    To better evaluate the outcome, researchers often feel that they need to employ a complex study design to cover all the bases. However, complex study designs may be expensive, and difficult to translate into reality (e.g., difficulties in acquiring a sufficient sample size or specimens with specific characteristics). So, it is advisable to use the simplest (but not simpler than necessary) study design available. Researchers should define the study objectives, outcome(s), and predictors early, to avoid adding questions as the study progresses.

    Regardless of the complexity of the study, it is essential to keep detailed notes of how the researcher proceeded. In addition, if a form for data collection is designed (which is mostly the case), it needs to be clarified early who will fill in the form, and, if possible, the form should be pretested with suitable persons (persons representative of those who will collect data in the study or who will fill in a questionnaire) to avoid incorrect entries due to misunderstandings. The forms should therefore be self-explanatory, indicating details such as the required degree of accuracy and the units of the entries. A more extensive form of pretest is a pilot study, which includes all the steps of the actual study with a small sample size. Notably, a pilot study is not a substitute for a full study. The results of a pilot study should not be used for conclusive hypothesis testing or interpretation. The role of the pilot study is to help assess whether the selected sampling method actually results in a sample representative of the population of interest and whether the selected study design is appropriate for the full study. If possible, the results of the pilot study should be compared with the results of similar (published) studies on the topic to identify potential problems with the study design. However, differences from other studies may arise because the small sample sizes of pilot studies often make it difficult to detect a difference when one actually exists (i.e., a lack of statistical power resulting in type II error).

    Type II error means failing to reject the null hypothesis when it is false, that is, saying that there is no effect/difference when actually there is one. The power of a study (calculated as 1 − the probability of type II error) reflects the chance of rejecting the null hypothesis when it is false, that is, saying there is an effect when there is one (e.g., 90% power means there is a 90% chance of saying there is an effect if the effect actually exists). Type I error means rejecting the null hypothesis when it is true, or saying that there is an effect/difference when actually there is none. Notably, for a fixed sample size, as the probability of type I error decreases, the probability of type II error increases, and vice versa. Increasing the sample size may balance the errors, but if this is not possible, one may need to decide which error is less important.

    Sample size and power

    Before conducting a study, the sample size needs to be determined so that we can detect meaningful effects without wasting resources. For studies with categorical outcomes, we need to specify the level of significance (the probability of making a type I error, e.g., 0.05) and the power (e.g., 90%, in which case the probability of making a type II error is 10%); estimate (possibly from previous publications or from the researcher's experience) the proportion of Group 1 having the outcome (in %) and the proportion of Group 2 having the outcome (in %); and also specify the Group 1/Group 2 sample size ratio (especially when it is other than 1:1). Alternatively, for studies with continuous outcomes, the mean and standard deviation of the outcome in Group 1 and Group 2 need to be entered.
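
    In R, the base function power.prop.test() performs this calculation for two proportions; note that it assumes a 1:1 group ratio (unequal ratios require other tools), and the input values below are invented for illustration:

        # Per-group sample size needed to detect 60% vs 40% having the outcome
        power.prop.test(p1 = 0.60, p2 = 0.40,
                        sig.level = 0.05,  # probability of type I error
                        power = 0.90)      # 1 - probability of type II error
        # For continuous outcomes, power.t.test() takes the difference in group
        # means (delta) and the standard deviation (sd) instead.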

    In forensic anthropology, we often have a predefined sample size (e.g., there are only a certain number of male and female skeletons in an identified collection), so instead of calculating the sample size, we may want to know the power of our study given that sample size. In that case, we will again need to specify the level of significance (e.g., 0.05), estimate the proportion of Group 1 having the outcome (in %) and the proportion of Group 2 having the outcome (in %), and give the number of individuals/specimens in Group 1 and Group 2.
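
    The same R function solves for power when the sample size is predefined; again, the proportions are illustrative, and equal group sizes are assumed:

        # Achievable power with, say, 85 individuals per group;
        # leaving power unspecified makes the function solve for it
        power.prop.test(n = 85, p1 = 0.60, p2 = 0.40, sig.level = 0.05)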

    When the type I error is fixed by the researcher, the type II error (or the power of the study) is not fixed; it depends on the sample size (the larger the sample, the greater the power), on the magnitude of the effect we would like to detect (a small effect is more difficult to detect, so a large sample size may be needed to detect a small but meaningful effect), and on the sample variance (when there is large variance, a larger sample size is needed). Other aspects, such as the statistical test used, are also important. Often some kind of trade-off is required to balance power, effect size (which should be meaningful within the research question), and an achievable sample size.

    Although some authors state specific minimum sample sizes for a given statistical analysis, a fixed number is usually misleading. For example, according to Long (1997), the recommended minimum sample size for a logistic regression is 100 (with at least 10 observations per predictor). However, a larger sample size may be needed with a skewed outcome variable (e.g., with few 1's and many 0's), with categorical predictors (to avoid computational problems caused by empty cells), or when multicollinearity is present. So it is already clear that there are many exceptions to the minimum number of 100, and the actual sample size needed for conclusive results depends on the particular study.
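
    A small simulation can make the skewed-outcome problem concrete; this sketch is purely illustrative:

        # A rare outcome (roughly 5 events in 100 observations) yields unstable
        # logistic-regression estimates even though n = 100 meets the nominal minimum
        set.seed(42)
        x <- rnorm(100)
        y <- rbinom(100, size = 1, prob = 0.05)
        fit <- glm(y ~ x, family = binomial)
        summary(fit)$coefficients  # standard errors are much larger than they
                                   # would be with a balanced outcome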

    Sampling

    Frequentist analysis is based on the notion that population characteristics are unknown, but we can gather some information about the population from a sample of data. We assume that this sample is a random selection from the population of interest (otherwise, we would not be able to make the inference from sample to population). Random sampling can be assumed (the emphasis here is on assumed, since this is an approximation of random sampling), for example, if we select all individuals who consecutively attended the emergency department of a hospital with a rib fracture within the past 2 years; alternatively, there are study designs, such as sample surveys, where random sampling is explicitly performed by the researcher by drawing a sample from known population lists (e.g., school enrolment lists or lists of patients from general practitioner practices). Usually, we also assume that each individual in the population has an equal chance of being selected into the sample, that is, we have a simple random sample. However, we may also choose random sampling with unequal chances of selection if we would like one group to be overrepresented in our sample. Sample surveys can be designed with such nonsimple random sampling, for instance, cluster or stratified sampling.
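
    A minimal sketch of drawing a simple random sample in R, from a hypothetical population list of record IDs:

        # Each of 2000 hypothetical records has an equal chance of selection;
        # sample() draws without replacement by default
        population_ids <- sprintf("REC%04d", 1:2000)
        set.seed(1)
        srs <- sample(population_ids, size = 50)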

    Sampling is important since it forms the basis for the type of analysis that can be done and for the conclusions that can be drawn from the data. Sampling regimes can be divided into unconditional and conditional. Unconditional sampling for discrete data, for example, means that the sample is selected at random from the population and distributed to groups (e.g., males and females with and without shovel-shaped incisors). Row and column totals represent the (marginal) distribution of the two variables, and the proportions of sex and of the dental nonmetric trait can be estimated. In conditional sampling, the selection is guided by a particular feature, so the researcher chooses, for example, 100 males and 100 females with or without shovel-shaped incisors. In this case, the population proportion (or prevalence) of males and females cannot be estimated (since it has been fixed by the researcher). No statistical software can recognize from the numbers in a contingency table how the sampling was done, so it cannot choose the correct analysis by itself; the researcher is the one to decide. Cohort studies are often based on unconditional sampling, while case–control studies are sampled conditionally on cases.
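
    A sketch of such a 2 x 2 table in R, with invented counts: the software runs the same commands under either sampling regime, and it is the researcher who must know which quantities are estimable.

        # Sex by presence of shovel-shaped incisors (hypothetical counts)
        tab <- matrix(c(34, 66,    # males:   present / absent
                        41, 59),   # females: present / absent
                      nrow = 2, byrow = TRUE,
                      dimnames = list(sex = c("male", "female"),
                                      trait = c("present", "absent")))
        prop.table(tab, margin = 1)  # trait proportion within each sex
        chisq.test(tab)              # association test; marginal prevalences are
                                     # meaningful only under unconditional sampling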

    Stratified sampling can help simplify the study design since it controls for certain predictors that may affect the outcome; for example, males and females or age groups can be sampled as separate strata. The advantages of stratified sampling are that it leads to better precision of estimates, is relatively simple to perform, and can control for confounding. The disadvantage is that it cannot account for many confounders simultaneously, because this would normally result in small numbers in each stratum. For example, if we stratify by sex (males/females) and age (15–19, 20–29, 30–39, 40–49, 50–59, 60–69, and 70 + years, i.e., seven groups), we would already have 14 strata. An alternative to stratification within the study design is a certain type of statistical analysis, for example, multiple regression.
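
    A base R sketch of stratified sampling, assuming a hypothetical data frame with columns sex and age_group:

        # Draw n_per_stratum individuals from every sex x age-group stratum
        stratified_sample <- function(df, n_per_stratum) {
          strata <- split(df, list(df$sex, df$age_group), drop = TRUE)
          picked <- lapply(strata, function(s)
            s[sample(nrow(s), min(n_per_stratum, nrow(s))), , drop = FALSE])
          do.call(rbind, picked)
        }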

    Cluster sampling is based on clusters, which are groups derived from, for example, families, schools, or hospitals. When performing statistical analysis with clusters, confidence intervals (CIs) will be wider and P-values greater compared with simple random samples, since the analysis adjusts for the effectively smaller sample size within the clusters. However, cluster sampling can be equivalent to simple random sampling if the intracluster correlation is minimal (the individuals within a cluster are as diverse as they would be in the population). In contrast, if the intracluster correlation is close to 1 (the individuals within a cluster are very similar), the estimates will have wide CIs.
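
    This widening of CIs is commonly summarized by the standard design effect, deff = 1 + (m − 1) × ICC for clusters of equal size m; a sketch with invented numbers:

        # The effective sample size shrinks by the design effect
        design_effect <- function(m, icc) 1 + (m - 1) * icc
        design_effect(m = 20, icc = 0)     # 1.0: equivalent to simple random sampling
        design_effect(m = 20, icc = 0.10)  # 2.9: n = 400 behaves like n of about 138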

    In forensic practice, it is often not possible to work with random samples, so convenience samples are used. The methodological robustness of such samples and the appropriateness of their use in forensic cases are a subject of debate. However, as Evett and Weir (1998, 45) commented, "… the scientist must also convince a court of the reasonableness of his or her inference within the circumstances as they are presented as evidence."

    Measurement error/bias

    In general, random sampling is important for reducing or eliminating bias within the study. Measurement error has a random and a systematic component. Random error cannot be attributed to a specific cause and is represented by unexplainable fluctuation in the data. Systematic error (or bias) has a direction and a magnitude and is not the consequence of chance alone. To avoid at least conscious bias (unconscious bias may remain regardless), researchers should try to free their minds from interpreting data based on their desired outcomes or in a way that fits a certain theory. However, merely describing the data will not do. (Correct) interpretation is necessary, and here the researchers' experience and knowledge play a major role, helping them differentiate between association and causation or recognize relevant patterns in the data.

    Parameter estimates based on a simple random sample are known to be unbiased. As the sample size increases, the estimates get more precise. Unbiased estimates can be achieved when there are no confounding (mixing) effects, no measurement error (misclassification) or selection bias. In some instances, statistical modeling may help deal with some of these aspects (especially confounding) to still arrive at an unbiased estimate.

    In frequentist analysis, the role of statistics in forensic anthropology (or any other discipline) is to find sample statistics that are appropriate estimators of the population parameters. However, the sample estimates usually differ from the population parameters. Therefore, using point estimates only, we cannot describe the potential errors in the estimates. If we were to repeatedly sample data from a population, the estimate of the standard error (the standard deviation, as a measure of variation within the sample, divided by the square root of the number of observations) would reflect how the estimate of the population parameter would be expected to vary purely by chance. Confidence intervals express the precision of an estimate and simultaneously assess the degree of sampling variability (or sampling error) that would be associated with an estimate if all possible samples of a given size were drawn from the population. Commonly, 95% CIs are reported, approximately equal to ± 2 standard errors of the estimate (1.96 exactly), which means that 95% of such CIs would include the population parameter and the remaining 5% would not. Notably, we assume that the widths of the observed CIs are attributable to chance only, not to systematic bias (which would basically invalidate the interpretation of the CIs).
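
    A sketch of such a 95% CI for a mean in R, using simulated and purely illustrative femur lengths:

        # estimate +/- 1.96 standard errors, where se = sd / sqrt(n)
        set.seed(7)
        femur <- rnorm(300, mean = 447, sd = 23)   # simulated lengths in mm
        se <- sd(femur) / sqrt(length(femur))      # standard error of the mean
        mean(femur) + c(-1, 1) * qnorm(0.975) * se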
