Statistics from A to Z: Confusing Concepts Clarified
Ebook, 713 pages, 6 hours


About this ebook

 Statistics is confusing, even for smart, technically competent people. And many students and professionals find that existing books and web resources don’t give them an intuitive understanding of confusing statistical concepts. That is why this book is needed. Some of the unique qualities of this book are:

Easy to Understand: Uses unique “graphics that teach” such as concept flow diagrams, compare-and-contrast tables, and even cartoons to enhance “rememberability.”

Easy to Use: Alphabetically arranged, like a mini-encyclopedia, for easy lookup on the job, while studying, or during an open-book exam.

Wider Scope: Covers Statistics I and Statistics II and Six Sigma Black Belt, adding such topics as control charts and statistical process control, process capability analysis, and design of experiments. As a result, this book will be useful for business professionals and industrial engineers in addition to students and professionals in the social and physical sciences.

In addition, each of the 60+ concepts is covered in one or more articles. The 75 articles in the book are usually 5–7 pages long, ensuring that things are presented in “bite-sized chunks.” The first page of each article typically lists five “Keys to Understanding” which tell the reader everything they need to know on one page. This book also contains an article on “Which Statistical Tool to Use to Solve Some Common Problems”, additional “Which to Use When” articles on Control Charts, Distributions, and Charts/Graphs/Plots, as well as articles explaining how different concepts work together (e.g., how Alpha, p, Critical Value, and Test Statistic interrelate).

ANDREW A. JAWLIK received his B.S. in Mathematics and his M.S. in Mathematics and Computer Science from the University of Michigan. He held jobs with IBM in marketing, sales, finance, and information technology, as well as a position as Process Executive. In these jobs, he learned how to communicate difficult technical concepts in easy-to-understand terms. He completed Lean Six Sigma Black Belt coursework at the IASSC-accredited Pyzdek Institute. In order to understand the confusing statistics involved, he wrote explanations in his own words and graphics. Using this material, he passed the certification exam with a perfect score. Those statistical explanations then became the starting point for this book.

Language: English
Publisher: Wiley
Release date: Sep 21, 2016
ISBN: 9781119272007


    Book preview


    STATISTICS FROM A TO Z

    Confusing Concepts Clarified


    ANDREW A. JAWLIK


    Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved.

    Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

    Published simultaneously in Canada.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

    Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

    For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

    Library of Congress Cataloging-in-Publication Data

    Names: Jawlik, Andrew.

    Title: Statistics from A to Z : confusing concepts clarified / Andrew Jawlik.

    Description: Hoboken, New Jersey : John Wiley & Sons, Inc., [2016].

    Identifiers: LCCN 2016017318 | ISBN 9781119272038 (pbk.) | ISBN 9781119272007 (epub)

    Subjects: LCSH: Mathematical statistics–Dictionaries. | Statistics–Dictionaries.

    Classification: LCC QA276.14 .J39 2016 | DDC 519.503–dc23

    LC record available at https://lccn.loc.gov/2016017318

    To my wonderful wife, Jane, who is a 7 Sigma*.

    CONTENTS

    Other Concepts Covered in the Articles

    Why This Book is Needed

    What Makes this Book Unique?

    How to Use This Book

    ALPHA, α

    Alpha and Beta Errors

    Alpha, p, Critical Value, and Test Statistic – How They Work Together

    Alternative Hypothesis

    Analysis of Means (ANOM)

    ANOVA – Part 1 (of 4): What it Does

    ANOVA – Part 2 (of 4): How it Does it

    ANOVA – Part 3 (of 4): 1-Way (AKA Single Factor)

    ANOVA – Part 4 (of 4): 2-Way (AKA 2-Factor)

    ANOVA vs. Regression

    Binomial Distribution

    Charts/Graphs/Plots – Which to Use When

    Chi-Square – The Test Statistic and Its Distributions

    Chi-Square Test for Goodness of Fit

    Chi-Square Test for Independence

    Chi-Square Test for the Variance

    Confidence Intervals – Part 1 (of 2): General Concepts

    Confidence Intervals – Part 2 (of 2): Some Specifics

    Control Charts – Part 1 (of 2): General Concepts and Principles

    Control Charts – Part 2 (of 2): Which to Use When

    Correlation – Part 1 (of 2)

    Correlation – Part 2 (of 2)

    Critical Value

    Degrees of Freedom

    Design of Experiments (DOE) – Part 1 (of 3)

    Design of Experiments (DOE) – Part 2 (of 3)

    Design of Experiments (DOE) – Part 3 (of 3)

    Distributions – Part 1 (of 3): What They Are

    Distributions – Part 2 (of 3): How They Are Used

    Distributions – Part 3 (of 3): Which to Use When

    Errors – Types, Uses, and Interrelationships

    Exponential Distribution

    F

    Fail to Reject the Null Hypothesis

    Hypergeometric Distribution

    Hypothesis Testing – Part 1 (of 2): Overview

    Hypothesis Testing – Part 2 (of 2): How To

    Inferential Statistics

    Margin of Error

    Nonparametric

    Normal Distribution

    Null Hypothesis

    p, p-Value

    p, t, and F: > or < ?

    Poisson Distribution

    Power

    Process Capability Analysis (PCA)

    Proportion

    r, Multiple R, r², R², R Square, R² Adjusted

    Regression – Part 1 (of 5): Sums of Squares

    Regression – Part 2 (of 5): Simple Linear

    Regression – Part 3 (of 5): Analysis Basics

    Regression – Part 4 (of 5): Multiple Linear

    Regression – Part 5 (of 5): Simple Nonlinear

    Reject the Null Hypothesis

    Residuals

    Sample, Sampling

    Sample Size – Part 1 (of 2): Proportions for Count Data

    Sample Size – Part 2 (of 2): For Measurement/Continuous Data

    Sampling Distribution

    Sigma

    Skew, Skewness

    Standard Deviation

    Standard Error

    Statistically Significant

    Sums of Squares

    t – The Test Statistic and Its Distributions

    t-Tests – Part 1 (of 2): Overview

    t-Tests – Part 2 (of 2): Calculations and Analysis

    Test Statistic

    Variables

    Variance

    Variation/Variability/Dispersion/Spread

    Which Statistical Tool to Use to Solve Some Common Problems

    Z

    How to Find Concepts in This Book

    EULA

    Other Concepts Covered in the Articles

    1-Sided or 1-Tailed: see the articles Alternative Hypothesis and Alpha, α.

    1-Way: an analysis that has one Independent (x) Variable, e.g., 1-way ANOVA.

    2-Sided or 2-Tailed: see the articles Alternative Hypothesis and Alpha, α.

    2-Way: an analysis that has two Independent (x) Variables, e.g., 2-way ANOVA.

    68-95-99.7 Rule: same as the Empirical Rule. See the article Normal Distribution.

    Acceptance Region: see the article Alpha, α.

    Adjusted R²: see the article r, Multiple R, r², R², R Square, R² Adjusted.

    aka: also known as.

    Alias: see the article Design of Experiments (DOE) – Part 2.

    Associated, Association: see the article Chi-Square Test for Independence.

    Assumptions: requirements for being able to use a particular test or analysis. For example, ANOM and ANOVA require approximately Normal data.

    Attributes data, Attributes Variable: same as Categorical or Nominal data or Variable. See the articles Variables and Chi-Square Test for Independence.

    Autocorrelation: see the article Residuals.

    Average Absolute Deviation: see the article Variance.

    Average: same as the Mean – the sum of a set of numerical values divided by the Count of values in the set.

    Bernoulli Trial: see the article Binomial Distribution.

    Beta: the probability of a Beta Error. See the article Alpha and Beta Errors.

    Beta Error: featured in the article Alpha and Beta Errors.

    Bias: see the article Sample, Sampling.

    Bin, Binning: see the articles Chi-Square Test for Goodness of Fit and Charts/Graphs/Plots – Which to Use When.

    Block, Blocking: see the article Design of Experiments (DOE) – Part 3.

    Box Plot, Box and Whiskers Plot: see the article Charts/Graphs/Plots – Which to Use When.

    Cm, Cp, Cr, or Cpk: see the article Process Capability Analysis (PCA).

    Capability, Capability Index: see the article Process Capability Analysis (PCA).

    Categorical data, Categorical Variable: same as Attribute or Nominal data/Variable. See the articles Variables and Chi-Square Test for Independence.

    CDF: see Cumulative Density Function.

    Central Limit Theorem: see the article Normal Distribution.

    Central Location: same as Central Tendency. See the article Distributions – Part 1: What They Are.

    Central Tendency: same as Central Location. See the article Distributions – Part 1: What They Are.

    Chebyshev's Theorem: see the article Standard Deviation.

    Confidence Coefficient: same as Confidence Level. See the article Alpha, α.

    Confidence Level: (aka Level of Confidence aka Confidence Coefficient) equals 1 – Alpha. See the article Alpha, α.

    Confounding: see the article Design of Experiments (DOE) – Part 3.

    Contingency Table: see the article Chi-Square Test for Independence.

    Continuous data or Variables: see the articles Variables and Distributions – Part 3: Which to Use When.

    Control, in… or out of…: see the article Control Charts – Part 1: General Concepts and Principles.

    Control Limits, Upper and Lower: see the article Control Charts – Part 1: General Concepts and Principles.

    Count data, Count Variables: aka Discrete data or Discrete Variables. See the article Variables.

    Covariance: see the article Correlation – Part 1.

    Criterion Variable: see the article Variables.

    Critical Region: same as Rejection Region. See the article Alpha, α.

    Cumulative Density Function (CDF): the formula for calculating the Cumulative Probability of a Range of values of a Continuous random Variable, for example, the Cumulative Probability that x ≤ 0.5.
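
    As a quick, hypothetical illustration (not from the book), here is a minimal Python sketch, assuming SciPy is available, that computes the Cumulative Probability in the example above for a Standard Normal random Variable:

```python
# Minimal sketch (assumes SciPy): the Cumulative Probability that x <= 0.5
# for a Standard Normal random Variable, i.e., the area under the curve
# to the left of 0.5.
from scipy.stats import norm

cumulative_prob = norm.cdf(0.5)
print(round(cumulative_prob, 4))  # prints 0.6915
```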

    Cumulative Probability: see the article Distributions – Part 2: How They Are Used.

    Curve Fitting: see the article Regression – Part 5: Simple Nonlinear.

    Dependent Variable: see the article Variables.

    Descriptive Statistics: see the article Inferential Statistics.

    Dot Plot: see the article Charts/Graphs/Plots – Which to Use When.

    Deviation: the difference between a data value and a specified value (usually the Mean). See the article Regression – Part 1: Sums of Squares. See also the article Standard Deviation.

    Discrete data or Variables: see the articles Variables and Distributions – Part 3: Which to Use When.

    Dispersion: see the article Variation/Variability/Dispersion/Spread (they all mean the same thing).

    Effect Size: see the article Power.

    Empirical Rule: same as the 68-95-99.7 Rule. See the article Normal Distribution.

    Expected Frequency: see the articles Chi-Square Test for Goodness of Fit and Chi-Square Test for Independence.

    Expected Value: see the articles Chi-Square Test for Goodness of Fit and Chi-Square Test for Independence.

    Exponential: see the article Exponential Distribution.

    Exponential Curve: see the article Regression – Part 5: Simple Nonlinear.

    Exponential Transformation: see the article Regression – Part 5: Simple Nonlinear.

    Extremes: see the article Variation/Variability/Dispersion/Spread.

    F-test: see the article F.

    Factor: see the articles ANOVA – Parts 3 and 4 and Design of Experiments (DOE) – Part 1.

    False Positive: an Alpha or Type I Error; featured in the article Alpha and Beta Errors.

    False Negative: a Beta or Type II Error; featured in the article Alpha and Beta Errors.

    Frequency: a Count-like Statistic which can be non-integer. See the articles Chi-Square Test for Goodness of Fit and Chi-Square Test for Independence.

    Friedman Test: see the article Nonparametric.

    Gaussian Distribution: same as Normal Distribution.

    Generator: see the article Design of Experiments (DOE) – Part 3.

    Goodness of Fit: see the articles Regression – Part 1: Sums of Squares and Chi-Square Test for Goodness of Fit.

    Histogram: see the article Charts/Graphs/Plots – Which to Use When.

    Independence: see the article Chi-Square Test for Independence.

    Independent Variable: see the article Variables.

    Interaction: see the articles ANOM; ANOVA – Part 4: 2-Way; Design of Experiments, Parts 1, 2, and 3; Regression – Part 4: Multiple Linear.

    Intercept: see the article Regression – Part 2: Simple Linear.

    InterQuartile Range (IQR): see the article Variation/Variability/Dispersion/Spread.

    Kruskal–Wallis Test: see the article Nonparametric.

    Kurtosis: a measure of the Shape of a Distribution. See the article Distributions – Part 1: What They Are.

    Least Squares: (same as Least Sum of Squares or Ordinary Least Sum of Squares) see the articles Regression – Part 1: Sums of Squares and Regression – Part 2: Simple Linear.

    Least Sum of Squares: same as Least Squares.

    Level of Confidence: same as Confidence Level; equal to 1 – α. See the article Alpha, α.

    Level of Significance: same as Significance Level, Alpha (α). See the articles Alpha, α and Statistically Significant.

    Line Chart: see the article Charts/Graphs/Plots – Which to Use When.

    Logarithmic Curve, Logarithmic Transformation: see the article Regression – Part 5: Simple Nonlinear.

    Main Effect: the effect of a single Factor, as opposed to an Interaction. See the articles ANOVA – Part 4: 2-Way and Design of Experiments (DOE) – Part 2.

    Mann–Whitney Test: see the article Nonparametric.

    Mean: the average. Along with Median and Mode, it is a measure of Central Tendency.

    Mean Absolute Deviation (MAD): see the article Variation/Variability/Dispersion/Spread.

    Mean Sum of Squares: see the article ANOVA – Part 2 (MSB and MSW) and the article F.

    Measurement data: same as Continuous data.

    Median: the middle of a range of values. Along with Mean and Mode, it is a measure of Central Tendency. It is used instead of the Mean in Nonparametric Analysis. See the article Nonparametric.

    Memorylessness: see the article Exponential Distribution.

    Mode: the most common value within a group (e.g., a Sample or Population, or Process). There can be more than one Mode. Along with Mean and Median, Mode is a measure of Central Tendency.
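
    To make the three measures of Central Tendency concrete, here is a minimal Python sketch using only the standard library (the data values are made up for illustration):

```python
# Minimal sketch: the three measures of Central Tendency for made-up data.
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5.0 -- the Mean (Average)
print(statistics.median(data))  # 4.0 -- the middle of the ordered values
print(statistics.mode(data))    # 3   -- the most common value
```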

    MOE: see the article Margin of Error.

    MSB and MSW: see the article ANOVA – Part 2 (MSB and MSW) and the article F.

    Multiple R: see the article r, Multiple R, r², R², R Square, R² Adjusted.

    Multiplicative Law of Probability: see the article Chi-Square Test for Independence.

    Nominal data, Nominal Variable: same as Categorical or Attributes data or Variable. See the article Variables.

    One-Sided, One-Tailed: (same as 1-sided, 1-tailed) see the articles Alternative Hypothesis and Alpha, α.

    One-Way: same as 1-Way; an analysis that has one Independent (x) Variable. For example, 1-way ANOVA.

    Outlier: see the article Variation/Variability/Dispersion/Spread.

    Parameter: a measure of a property of a Population or Process, e.g., the Mean or Standard Deviation. The counterpart for a Sample is called a Statistic. Parameters are usually denoted by characters in the Greek Alphabet, such as μ or σ.

    Parametric: see the article Nonparametric.

    Pareto Chart: see the article Charts/Graphs/Plots – Which to Use When.

    PCA: see the article Process Capability Analysis (PCA).

    PDF: see Probability Density Function.

    Pearson's Coefficient, Pearson's r: the Correlation Coefficient, r. See the article Correlation – Part 2.

    Performance Index: see the article Process Capability Analysis (PCA).

    PMF: see Probability Mass Function.

    Polynomial Curve: see the article Regression – Part 5: Simple Nonlinear.

    Population or Process: where most texts say Population, this book adds or Process. Ongoing Processes are handled the same as Populations, because new data values continue to be created. Thus, like Populations, we don't have complete data for ongoing Processes.

    Power Transformation: see the article Regression – Part 5: Simple Nonlinear.

    Probability Density Function (PDF): the formula for calculating the Probability of a single value of a Continuous random Variable, for example, the Probability that x = 5. (For Discrete random Variables, the corresponding term is Probability Mass Function, PMF.) See also Cumulative Density Function.

    Probability Distribution: see the article Distributions – Part 1: What They Are.

    Probability Mass Function (PMF): the formula for calculating the Probability of a single value of a Discrete random Variable, for example, the Probability that x = 5.
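
    As a hypothetical illustration of the PDF/PMF distinction (assuming SciPy; not from the book):

```python
# Minimal sketch (assumes SciPy): a PDF is evaluated at a point for a
# Continuous Variable; a PMF gives P(x = k) for a Discrete Variable.
from scipy.stats import norm, poisson

print(norm.pdf(0.0))         # density of the Standard Normal at x = 0 (~0.3989)
print(poisson.pmf(5, mu=4))  # Probability that a Poisson(4) Variable equals 5 (~0.1563)
```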

    Qualitative Variable/Qualitative data: same as Categorical Variable and Categorical data. See the articles Variables and Chi-Square Test for Independence.


    Random Sample: see the article Sample, Sampling.

    Random Variable: see the article Variables.

    Range: see the article Variation/Variability/Dispersion/Spread.

    Rational Subgroup: see the article Control Charts – Part 1.

    Rejection Region: same as Critical Region. See the article Alpha, α.

    Replacement, Sampling With or Without: see the article Binomial Distribution.

    Resolution: see the article Design of Experiments (DOE) – Part 3.

    Response Variable: see the articles Variables and Design of Experiments (DOE) – Part 2.

    Run Rules: see the article Control Charts – Part 1.

    Scatterplot: see the article Charts/Graphs/Plots – Which to Use When.

    Shape: see the article Distributions – Part 1: What They Are.

    Significance Level: see the article Alpha, α.

    Significant: see the article Statistically Significant.

    Slope: see the article Regression – Part 2: Simple Linear.

    Spread: see the article Variation/Variability/Dispersion/Spread.

    Standard Normal Distribution: see the articles Normal Distribution and z.

    Statistic: a measure of a property of a Sample, e.g., the Mean or Standard Deviation. The counterpart for a Population or Process is called a Parameter. Statistics are usually denoted by characters based on the Roman Alphabet, such as x̄ or s.

    Statistical Inference: same as Inferential Statistics; see the article by that name.

    Statistical Process Control: see the article Control Charts – Part 1: General Concepts and Principles.

    Student's t: see the article t – The Test Statistic and Its Distributions.

    Tail: see the articles Alpha, α and Alternative Hypothesis.

    Three Sigma Rule: same as Empirical Rule and the 68-95-99.7 Rule. See the article Normal Distribution.

    Transformation: see the article Regression – Part 5: Simple Nonlinear.

    Two-Sided, Two-Tailed: same as 2-Sided, 2-Tailed. See the articles Alpha, α and Alternative Hypothesis.

    Two-way: same as 2-Way; an analysis that has two Independent (x) Variables, e.g., 2-way ANOVA.

    Type I and Type II Errors: same as Alpha and Beta Errors, respectively. See the article by that name.

    Variables data: same as Continuous data. See the articles Variables and Distributions – Part 3: Which to Use When.

    Variability: see the article Variation/Variability/Dispersion/Spread.

    Wilcoxon Test: see the article Nonparametric.

    Why This Book is Needed

    Statistics can be confusing – even for smart people, and even for smart technical people.

    As an illustration, how quickly can we figure out whether the woman pictured above agreed to get married? (For the answer, see the article in this book, "Fail to Reject the Null Hypothesis.")

    This is understandable, not only because some of the concepts are inherently complicated and difficult to understand, but also because:

    Different terms are used to mean the same thing

    For example, the Dependent Variable, the Outcome, the Effect, the Response, and the Criterion are all the same thing. And – believe it or not – there are at least seven different names and 18 different acronyms used for just the three Statistics: Sum of Squares Between, Sum of Squares Within, and Sum of Squares Total.

    Synonyms may be wonderful for poets and fiction writers, but they confuse things unnecessarily for students and practitioners of a technical discipline.

    Conversely, a single term can have very different meanings

    For example, SST is variously used for Sum of Squares Total or Sum of Squares Treatment. (The latter is actually a component part of the former.)

    Sometimes, there is no single truth

    The acknowledged experts sometimes disagree on fundamental concepts. For example, some experts specify the use of the Alternative Hypothesis in their methods of Hypothesis Testing. Others are violently opposed to its use. Other experts recommend avoiding Hypothesis Testing completely, because of the confusing language.

    Words can have different meanings from their usage in everyday language

    The meaning of words in statistics can sometimes be very different from, or even the opposite of, the meaning of the same words in normal, everyday language.

    For example, in a Bernoulli experiment on process quality, a quality failure is called a success. Also, for Skew or Skewness, in statistics, left means right.

    A confusing array of choices

    Which Distribution do I use when? Which Test Statistic? Which test? Which Control Chart? Which type of graph?

    There are several choices for each – some of which are good in a given situation, some not.

    And the existing books don't seem to make things clear enough

    Even those with titles targeting the supposedly clueless reader do not provide sufficient explanation to clear up a lot of this confusion. Students and professionals continue to look for a book which would give them a true intuitive understanding of statistical concepts.

    Also, if you look up a concept in the index of other books, you will find something like this:

    Degrees of freedom, 60, 75, 86, 91–93, 210, 241

    So, you have to go to six different places, pick up the bits and pieces from each, and try to assemble for yourself some type of coherent concept. In this book, each concept is completely covered in one or more contiguous short articles (usually three to seven pages each). And we don't need an index, because you find the concepts alphabetically – as in a dictionary or encyclopedia.

    What Makes this Book Unique?

    It is much easier to understand than other books on the subject, because of the following:

    Alphabetically arranged, like a mini-encyclopedia, for immediate access to the specific knowledge you need at the time.

    Individual articles which completely treat one concept per article (or series of contiguous articles). No paging through the book for bits and pieces here and there.

    Almost all the articles start with a one-page summary of five or so Keys to Understanding, which gives you the whole picture on a single page. The remaining pages in the article provide a more in-depth explanation of each of the individual keys.

    Unique graphics that teach:

    Concept Flow Diagrams: visually depict how one concept leads to another and then another in the step-by-step thought process leading to understanding.

    Compare-and-Contrast Tables: for reinforcing understanding via differences, similarities, and any interrelationships between related concepts – e.g., p vs. Alpha, z vs. t, ANOVA vs. Regression, Standard Deviation vs. Standard Error.

    Cartoons to enhance rememberability.

    Highest ratio of visuals to text – plenty of pictures and diagrams and tables. This provides more concrete reinforcement of understanding than words alone.

    Visual enhancing of text to increase focus and to improve rememberability. All statistical terms are capitalized. Extensive use of short paragraphs, numbered items, bullets, bordered text boxes, arrows, underlines, and bold font.

    Repetition: An individual concept is often explained in several ways, coming at it from different aspects. If an article needs to refer to some content covered in a different article, that content is usually repeated within the first article, if it's not too lengthy.

    A Which Statistical Tool to Use article: Given a type of problem or question, which test, tool, or analysis to use. In addition, there are individual Which to Use When articles for Distributions, Control Charts, and Charts/Graphs/Plots.

    Wider Scope – Statistics I and Statistics II and Six Sigma Black Belt. Most books are focused on statistics in the social sciences, and – to a lesser extent – physical sciences or management. They don't cover statistical concepts important in process and quality improvement (Six Sigma or industrial engineering).

    Authored by a recent student, who is freshly aware of the statistical concepts that confused him – and why. (The author recently completed a course of study for professional certification as a Lean Six Sigma black belt – a process and quality improvement discipline which uses statistics extensively. He had, years earlier, earned an MS in Mathematics in a concentration which did not include much statistics content.)

    How to Use This Book

    Use this book when:

    – you're confused about a specific statistical concept or which statistical tool to use

    – you need a refresher on a statistical concept or method, just to be sure

    – you want help in making things easier to understand when communicating with others

    It can be useful:

    – while studying or while taking an open-book exam

    – on the job

    – as a reference, when developing presentations or writing e-mails

    To find a subject, you can flip through the book like an old dictionary or encyclopedia volume. If the subject you are looking for does not have an article devoted to it, there is likely a glossary description for it. And/or it may be covered in an article on another subject. In an alphabetically-organized book like this, the Contents and the Other Concepts pages make an Index unnecessary.

    See the Contents at the beginning of this book for a list of the articles covering the major concepts. Following the Contents is a section called Other Concepts Covered in the Articles. Here, you can find concepts which do not headline their own articles, for example:

    Acceptance Region: see the article Alpha, α.

    If you have a statistical problem to solve or question to answer and don't know how to go about it, see the article Which Statistical Tool to Use to Solve Some Common Problems. There are also Which to Use When articles for Distributions, Control Charts, and Charts/Graphs/Plots.

    This book is designed for use as a reference for looking up specific topics, not as a textbook to be read front-to-back. However, if you do want to use this book as a single source for learning statistics, not just a reference, you could read the following articles in the order shown:

    Inferential Statistics

    Alpha, p, Critical Value, and Test Statistic – How They Work Together

    Hypothesis Testing, Parts 1 and 2

    Confidence Intervals, Parts 1 and 2

    Distributions, Parts 1 – 3

    Which Statistical Tool to Use to Solve Some Common Problems

    Articles on individual tests and analyses, such as t-Tests, F, ANOVA, and Regression

    At the end of these and all other articles in the book is a list of Related Articles which you can read for more detail on related subjects.

    ALPHA, α

    Summary of Keys to Understanding

    In Inferential Statistics, p is the Probability of an Alpha (False Positive) Error.

    Alpha is the highest value of p that we are willing to tolerate and still say that a difference, change, or effect observed in the Sample is Statistically Significant.

    Alpha is a Cumulative Probability, represented as an area under the curve, at one or both tails of a Probability Distribution. p is also a Cumulative Probability.

    In Hypothesis Testing, if p ≤ α, Reject the Null Hypothesis. If p > α, Accept (Fail to Reject) the Null Hypothesis.

    Alpha defines the Critical Value(s) of Test Statistics, such as z, t, F, or Chi-Square. The Critical Value or Values, in turn, define the Confidence Interval.

    Explanation

    In Inferential Statistics, p is the Probability of an Alpha (False Positive) Error.

    In Inferential Statistics, we use data from a Sample to estimate a property (say, the Mean) of the Population or Process from which the Sample was taken. Being an estimate, there is a risk of error.

    One type of error is the Alpha Error (also known as Type I Error or False Positive).

    An Alpha Error is the error of seeing something which is not there, that is, concluding that there is a Statistically Significant difference, change, or effect, when in fact there is not. For example,

    Erroneously concluding that there is a difference in the Means of two Populations, when there is not, or

    Erroneously concluding that there has been a change in the Standard Deviation of a Process, when there has not, or

    Erroneously concluding that a medical treatment has an effect, when it does not.

    In Hypothesis Testing, the Null Hypothesis states that there is no difference, change, or effect. All these are examples of Rejecting the Null Hypothesis when the Null Hypothesis is true.

    p is the Probability of an Alpha Error, a False Positive.

    It is calculated as part of the Inferential Statistical analysis, for example, in a t-test or ANOVA.

    How does an Alpha Error happen? An Alpha Error occurs when data in our Sample are not representative of the overall Population or Process from which the Sample was taken.

    If the Sample Size is large enough, the great majority of Samples of that size will do a good job of representing the Population or Process. However, some won't. p tells us how probable it is that our Sample is un-representative enough to produce an Alpha Error.
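
    For example, here is a minimal Python sketch (assuming SciPy, with made-up Sample data) in which a 2-Sample t-test calculates p as part of the analysis:

```python
# Minimal sketch (assumes SciPy): a 2-Sample t-test returns p, the
# Probability of an Alpha (False Positive) Error. Data are made up.
from scipy.stats import ttest_ind

sample_a = [5.1, 4.9, 5.3, 5.0, 5.2]
sample_b = [5.6, 5.8, 5.5, 5.9, 5.7]
t_statistic, p = ttest_ind(sample_a, sample_b)
print(t_statistic, p)  # a small p means an Alpha Error is improbable here
```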

    Alpha is the highest value of p that we are willing to tolerate and still say that a difference, change, or effect observed in the Sample is Statistically Significant.

    In this article, we use Alpha both as an adjective and as a noun. This might cause some confusion, so let's explain.

    Alpha, as an adjective, describes a type of error, the Alpha Error. Alpha as a noun is something related, but different.

    First of all, what it is not: Alpha, as a noun, is not

    a Statistic or a Parameter, which describes a property (e.g., the Mean) of a Sample or Population

    a Constant, like those shown in some statistical tables.

    Second, what it is: Alpha, as a noun, is

    a value of p which defines the boundary separating the values of p we are willing to tolerate from those we are not.

    For example, if we are willing to tolerate a 5% risk of a False Positive, then we would select α = 5%. That would mean that we are willing to tolerate p ≤ 5%, but not p > 5%.

    Alpha must be selected prior to collecting the Sample data. This is to help ensure the integrity of the test or experiment. If we have a look at the data first, that might influence our selection of a value for Alpha.

    Rather than starting with Alpha, it's probably more natural to think in terms of a Level of Confidence first. Then we subtract it from 1 (100%) to get Alpha.

    If we want to be 95% sure, then we want a 95% Level of Confidence (aka Confidence Level).

    By definition, α = 100% – Confidence Level. (And, so Confidence Level = 100% – α.)

    Alpha is called the Level of Significance or the Significance Level.

    If p is calculated to be less than or equal to the Significance Level, α, then any observed difference, change, or effect calculated from our Sample data is said to be Statistically Significant.

    If p > α, then it is not Statistically Significant.
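
    The decision rule above is simple enough to state as a few lines of Python (a sketch with made-up values for p and Alpha):

```python
# Minimal sketch of the p-vs-alpha decision rule, with made-up values.
alpha = 0.05  # selected before collecting the Sample data
p = 0.03      # calculated from the Sample data, e.g., by a t-test

if p <= alpha:
    print("Statistically Significant: Reject the Null Hypothesis")
else:
    print("Not Statistically Significant: Fail to Reject the Null Hypothesis")
```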

    Popular choices for Alpha are 10% (0.1), 5% (0.05), 1% (0.01), 0.5% (0.005), and 0.1% (0.001). But why wouldn't we always select as low a level of Alpha as possible? Because the choice of Alpha is a tradeoff between Alpha (Type I) Error and Beta (Type II) Error – or, put another way – between a False Positive and a False Negative. If you reduce the chance (Probability) of one, you increase the chance of the other.

    Choosing α = 0.05 (5%) is generally accepted as a good balance for most uses. The pros and cons of various choices for Alpha (and Beta) in different situations are covered in the article, Alpha and Beta Errors.

    Alpha is a Cumulative Probability, represented by an area under the curve, at one or both tails of a Probability Distribution. p is also a Cumulative Probability.

    Below are diagrams of the Standard Normal Distribution. The Variable on its horizontal axis is the Test Statistic, z. Any point on the curve is the Probability of the value of z directly below that point.

    Probabilities of individual points are usually less useful in statistics than Probabilities of ranges of values. The latter are called Cumulative Probabilities. The Cumulative Probability of a range of values is calculated as the area under the curve above that range of values. The Cumulative Probability of all values under the curve is 100%.

    We start by selecting a value for Alpha, most commonly 5%, which tells us how big the shaded area under the curve will be. Depending on the type of problem we're trying to solve, we position the shaded area (α) under the left tail, the right tail, or both tails.

    If it's one tail only, the analysis is called 1-tailed or 1-sided (or left-tailed or right-tailed), and Alpha is entirely under one side of the curve. If it's both tails, it's called a 2-tailed or 2-sided analysis. In that case, we divide Alpha by two, and put half under each tail. For more on tails, see the article Alternative Hypothesis.
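
    As a hypothetical numerical illustration (assuming SciPy; not from the book), the boundary z-values for α = 5% under the Standard Normal Distribution can be computed with the inverse of the CDF:

```python
# Minimal sketch (assumes SciPy): boundary z-values for alpha = 5% under the
# Standard Normal Distribution. ppf is the inverse of the CDF.
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(alpha))          # left-tailed boundary,  about -1.645
print(norm.ppf(1 - alpha))      # right-tailed boundary, about  1.645
print(norm.ppf(1 - alpha / 2))  # 2-tailed: alpha split in half, about 1.960
```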

    There are two main methods in Inferential Statistics – Hypothesis Testing and Confidence Intervals. Alpha plays a key role in both. First, let's take a look
