SPSS Data Analysis for Univariate, Bivariate, and Multivariate Statistics
Ebook, 570 pages, 3 hours

About this ebook

Enables readers to start doing actual data analysis fast for a truly hands-on learning experience

This concise and very easy-to-use primer introduces readers to a host of computational tools useful for making sense out of data, whether that data come from the social, behavioral, or natural sciences. The book places great emphasis on both data analysis and drawing conclusions from empirical observations. It also provides formulas where needed in many places, while always remaining focused on concepts rather than mathematical abstraction.

SPSS Data Analysis for Univariate, Bivariate, and Multivariate Statistics offers a variety of popular statistical analyses and data management tasks using SPSS that readers can immediately apply as needed for their own research, and emphasizes many helpful computational tools used in the discovery of empirical patterns. The book begins with a review of essential statistical principles before introducing readers to SPSS. The book then goes on to offer chapters on: Exploratory Data Analysis, Basic Statistics, and Visual Displays; Data Management in SPSS; Inferential Tests on Correlations, Counts, and Means; Power Analysis and Estimating Sample Size; Analysis of Variance – Fixed and Random Effects; Repeated Measures ANOVA; Simple and Multiple Linear Regression; Logistic Regression; Multivariate Analysis of Variance (MANOVA) and Discriminant Analysis; Principal Components Analysis; Exploratory Factor Analysis; and Non-Parametric Tests. This helpful resource allows readers to:

  • Understand data analysis in practice rather than delving too deeply into abstract mathematical concepts
  • Make use of computational tools used by data analysis professionals.
  • Focus on real-world application to apply concepts from the book to actual research

Assuming only minimal, prior knowledge of statistics, SPSS Data Analysis for Univariate, Bivariate, and Multivariate Statistics is an excellent “how-to” book for undergraduate and graduate students alike. This book is also a welcome resource for researchers and professionals who require a quick, go-to source for performing essential statistical analyses and data management tasks.

Language: English
Publisher: Wiley
Release date: Jul 31, 2018
ISBN: 9781119465782

    Book preview

    SPSS Data Analysis for Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis

    Preface

    The goals of this book are to present a very concise, easy‐to‐use introductory primer of a host of computational tools useful for making sense out of data, whether that data come from the social, behavioral, or natural sciences, and to get you started doing data analysis fast. The emphasis of the book is on data analysis and drawing conclusions from empirical observations, not on theory. Formulas are given where needed in many places, but the focus of the book is on concepts rather than on mathematical abstraction. We emphasize computational tools used in the discovery of empirical patterns and feature a variety of popular statistical analyses and data management tasks that you can immediately apply as needed to your own research. The book features analyses and demonstrations using SPSS. Most of the data sets analyzed are very small and convenient, so entering them into SPSS should be easy. If desired, however, one can also download them from www.datapsyc.com. Many of the data sets were also first used in a more theoretical text written by the same author (see Denis, 2016), which should be consulted for a more in‐depth treatment of the topics presented in this book. Additional references for readings are also given throughout the book.

    Target Audience and Level

    This is a how‐to book and will be of use to undergraduate and graduate students along with researchers and professionals who require a quick go‐to source to help them perform essential statistical analyses and data management tasks. The book only assumes minimal prior knowledge of statistics, providing you with the tools you need right now to help you understand and interpret your data analyses. A prior introductory course in statistics at the undergraduate level would be helpful, but is not required for this book. Instructors may choose to use the book either as a primary text for an undergraduate or graduate course or as a supplement to a more technical text, referring to this book primarily for the how‐tos of data analysis in SPSS. The book can also be used for self‐study. It is suitable for use as a general reference in all social and natural science fields and may also be of interest to those in business who use SPSS for decision‐making. References to further reading are provided where appropriate should the reader wish to follow up on these topics or expand their knowledge base as it pertains to theory and further applications. An early chapter reviews essential statistical and research principles usually covered in an introductory statistics course, which should be sufficient for understanding the rest of the book and interpreting analyses. Brief sample write‐ups are also provided for select analyses to give the reader a starting point for writing up results for a thesis, dissertation, or publication. The book is meant to be an easy, user‐friendly introduction to a wealth of statistical methods while simultaneously demonstrating their implementation in SPSS. Please contact me at daniel.denis@umontana.edu or email@datapsyc.com with any comments or corrections.

    Glossary of Icons and Special Features

    When you see this symbol, it means a brief sample write‐up has been provided for the accompanying output. These brief write‐ups can be used as starting points to writing up your own results for your thesis/dissertation or even publication.

    When you see this symbol, it means a special note, hint, or reminder has been provided or signifies extra insight into something not thoroughly discussed in the text.

    When you see this symbol, it means a special WARNING has been issued that if not followed may result in a serious error.

    Acknowledgments

    Thanks go out to Wiley for publishing this book, especially to Jon Gurstelle for presenting the idea to Wiley and securing the contract for the book and to Mindy Okura‐Marszycki for taking over the project after Jon left. Thank you, Kathleen Pagliaro, for keeping in touch about this project and the former book. Thanks go out to everyone (far too many to mention) who has influenced me in one way or another in my views and philosophy about statistics and science, including undergraduate and graduate students whom I have had the pleasure of teaching (and learning from) in my courses taught at the University of Montana.

    This book is dedicated to all military veterans of the United States of America, past, present, and future, who teach us that all problems are relative.

    1

    Review of Essential Statistical Principles: Big Picture on Statistical Modeling and Inference

    The purpose of statistical modeling is both to describe sample data and to make inferences from that sample data to the population from which the data were drawn. We compute statistics on samples (e.g. sample mean) and use such statistics as estimators of population parameters (e.g. population mean). When we use the sample statistic to estimate a parameter in the population, we are engaged in the process of inference, which is why such statistics are referred to as inferential statistics, as opposed to descriptive statistics where we are typically simply describing something about a sample or population. All of this usually occurs in an experimental design (e.g. where we have a control vs. treatment group) or nonexperimental design (where we exercise little or no control over variables).

    As an example of an experimental design, suppose you wanted to learn whether a pill was effective in reducing symptoms from a headache. You could sample 100 individuals with headaches, give them a pill, and compare their reduction in symptoms to 100 people suffering from a headache but not receiving the pill. If the group receiving the pill showed a decrease in symptomology compared with the nontreated group, it may indicate that your pill is effective. However, to estimate whether the effect observed in the sample data is generalizable and inferable to the population from which the data were drawn, a statistical test could be performed to indicate whether it is plausible that such a difference between groups could have occurred simply by chance. If it were found that the difference was unlikely to be due to chance, then we may indeed conclude a difference in the population from which the data were drawn. The probability of observing data at least this extreme under some assumption of (typically) equality is the infamous p‐value, which is compared against a significance level usually set at 0.05. If the probability of such data is relatively low (e.g. less than 0.05) under the null hypothesis of no difference, we reject the null and infer the statistical alternative hypothesis of a difference in population means.

    Much of statistical modeling follows a similar logic to that featured above – sample some data, apply a model to the data, and then estimate how well the model fits and whether there is inferential evidence to suggest an effect in the population from which the data were drawn. The actual model you will fit to your data usually depends on the type of data you are working with. For instance, if you have collected sample means and wish to test differences between means, then t‐test and ANOVA techniques are appropriate. On the other hand, if you have collected data in which you would like to see if there is a linear relationship between continuous variables, then correlation and regression are usually appropriate. If you have collected data on numerous dependent variables and believe these variables, taken together as a set, represent some kind of composite variable, and wish to determine mean differences on this composite dependent variable, then a multivariate analysis of variance (MANOVA) technique may be useful. If you wish to predict group membership into two or more categories based on a set of predictors, then discriminant analysis or logistic regression would be an option. If you wish to take many variables and reduce them down to fewer dimensions, then principal components analysis or factor analysis may be your technique of choice. Finally, if you are interested in hypothesizing networks of variables and their interrelationships, then path analysis and structural equation modeling may be your model of choice (not covered in this book). There are numerous other possibilities as well, but overall, you should heed the following principle in guiding your choice of statistical analysis:

    The type of statistical model or method you select often depends on the types of data you have and your purpose for wanting to build a model. There usually is not one and only one method that is possible for a given set of data. The method of choice will be dictated often by the rationale of your research. You must know your variables very well along with the goals of your research to diligently select a statistical model.

    1.1 Variables and Types of Data

    Recall that variables are typically of two kinds – dependent or response variables and independent or predictor variables. The terms dependent and independent are most common in ANOVA‐type models, while response and predictor are more common in regression‐type models, though neither pair of terms is exclusive to a particular methodology. The classic function statement Y = f(X) tells the story – input a value for X (independent variable), and observe the effect on Y (dependent variable). In an independent‐samples t‐test, for instance, X is a variable with two levels, while the dependent variable is a continuous variable. In a classic one‐way ANOVA, X has multiple levels. In a simple linear regression, X is usually a continuous variable, and we use the variable to make predictions of another continuous variable Y. Most of statistical modeling is simply observing an outcome based on something you are inputting into an equation estimated from the sample data.
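
    As a rough illustration of Y = f(X) with an estimated equation, the sketch below fits a least‐squares line to a handful of made‐up values and then inputs a new X to obtain a predicted Y. The data, the variable names, and the helper fit_line are all hypothetical and are not taken from the book, which carries out its analyses in SPSS rather than Python.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: X is a continuous predictor, Y a continuous response.
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x_values, y_values)

# Input a new value of X into the estimated equation and observe predicted Y.
predicted = intercept + slope * 6
print(f"Y-hat = {intercept:.2f} + {slope:.2f} * X")
print(f"Predicted Y at X = 6: {predicted:.2f}")
```

    The coefficients are estimates based on this particular sample; a different sample from the same population would give a somewhat different equation.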

    Data come in many different forms. Though there are rather precise theoretical distinctions between different forms of data, for applied purposes, we can summarize the discussion into the following types for now: (i) continuous and (ii) discrete. Variables measured on a continuous scale can, in theory, achieve any numerical value on the given scale. For instance, length is typically considered to be a continuous variable, since we can measure length to any specified numerical degree. That is, the distance between 5 and 10 in. on a scale contains an infinite number of measurement possibilities (e.g. 6.1852, 8.341364, etc.). The scale is continuous because it assumes an infinite number of possibilities between any two points on the scale and has no breaks in that continuum. On the other hand, if a scale is discrete, it means that between any two values on the scale, only a select number of possibilities can exist. As an example, the number of coins in my pocket is a discrete variable, since I cannot have 1.5 coins. I can have 1 coin, 2 coins, 3 coins, etc., but an infinite number of possibilities does not exist between those values. Sometimes data are also categorical, which means values of the variable are mutually exclusive categories, such as A or B or C or boy or girl. Other times, data come in the form of counts, where instead of measuring something like IQ, we are only counting the number of occurrences of some behavior (e.g. number of times I blink in a minute). Depending on the type of data you have, different statistical methods will apply. As we survey what SPSS has to offer, we identify variables as continuous, discrete, or categorical as we discuss the given method. However, do not get too caught up with definitions here; there is always a bit of fuzziness in learning about the nature of the variables you have. For example, if we count the number of raindrops in a rainstorm, we would be hard pressed to call this count data. We would instead just accept it as continuous data and treat it as such. Many times you have to compromise a bit between data types to best answer a research question. Surely, the average number of people per household does not make sense, yet census reports often give us such figures on count data. Always remember, however, that the software does not recognize the nature of your variables or how they are measured. You have to be certain of this information going in; know your variables very well, so that you can be sure SPSS is treating them as you had planned.
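
    The point that software does not recognize the nature of your variables can be seen in a small sketch. It uses Python rather than SPSS, and the coding scheme and values are invented for illustration: the program computes a mean for any column of numbers, whether or not that mean is interpretable.

```python
from statistics import mean, mode

# Hypothetical coding scheme for a categorical variable:
# 1 = group A, 2 = group B, 3 = group C.
group_codes = [1, 1, 2, 3, 3, 3]

# Hypothetical counts of people per household (a discrete variable).
household_sizes = [1, 2, 2, 3, 4, 4]

# The software will average either column without complaint.
code_mean = mean(group_codes)      # arithmetic on labels: not interpretable
size_mean = mean(household_sizes)  # fractional people, yet a useful summary

# For a categorical variable, a frequency-based summary is the sensible one.
most_common_code = mode(group_codes)

print(f"Mean of group codes: {code_mean:.2f} (meaningless)")
print(f"Mean household size: {size_mean:.2f}")
print(f"Most frequent group code: {most_common_code}")
```

    Nothing in the output flags the first mean as nonsensical; that judgment has to come from the analyst.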

    Scales of measurement are also classified as nominal, ordinal, interval, or ratio. A nominal scale is not really measurement in the first place, since it is simply assigning labels to objects we are studying. The classic example is that of numbers on football jerseys. That one player has the number 10 and another the number 15 does not mean anything other than labels to distinguish between two players. If the numbers do represent magnitudes, but the differences between those magnitudes are unknown or imprecise, then we have measurement at the ordinal level. For example, that a runner finished first and another second constitutes measurement at the ordinal level. Nothing is said of the time difference between the first and second runner, only that there is a ranking of the runners. If differences between numbers on a scale represent equal lengths, but an absolute zero point still cannot be defined, then we have measurement at the interval level. A classic example of this is temperature in degrees Fahrenheit – the difference between 10 and 20° represents the same amount of temperature distance as that between 20 and 30°; however, zero on the scale does not represent an absence of temperature. When we can ascribe an absolute zero point in addition to inferring the properties of the interval scale, then we have measurement at the ratio level. The number of coins in my pocket is an example of ratio measurement, since zero on the scale represents a complete absence of coins. The number of car accidents in a year is another variable measurable on a ratio scale, since it is possible, however unlikely, that there were no accidents in a given year.
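
    The Fahrenheit example can be checked numerically. The sketch below is an illustration in Python, not something taken from the book. It converts the same temperatures to Celsius and Kelvin using the standard conversion formulas and shows that equal intervals stay equal under a change of units, while ratios only become meaningful on a scale with a true zero, such as Kelvin.

```python
def fahrenheit_to_celsius(f):
    """Standard Fahrenheit-to-Celsius conversion."""
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    """Kelvin has an absolute zero, so it behaves as a ratio scale."""
    return fahrenheit_to_celsius(f) + 273.15


low_f, mid_f, high_f = 10.0, 20.0, 30.0

# Interval property: the two 10-degree Fahrenheit gaps remain equal in Celsius.
gap_1 = fahrenheit_to_celsius(mid_f) - fahrenheit_to_celsius(low_f)
gap_2 = fahrenheit_to_celsius(high_f) - fahrenheit_to_celsius(mid_f)

# Ratios are not preserved: "20 is twice 10" holds only for the Fahrenheit labels.
ratio_f = mid_f / low_f
ratio_c = fahrenheit_to_celsius(mid_f) / fahrenheit_to_celsius(low_f)
ratio_k = fahrenheit_to_kelvin(mid_f) / fahrenheit_to_kelvin(low_f)

print(f"Gaps in Celsius: {gap_1:.3f} and {gap_2:.3f}")
print(f"Ratio in Fahrenheit: {ratio_f:.3f}")
print(f"Ratio in Celsius:    {ratio_c:.3f}")
print(f"Ratio in Kelvin:     {ratio_k:.3f}")
```

    The ratio changes with the unit chosen, which is why statements such as "twice as hot" are not justified on an interval scale.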

    The first step in choosing a statistical model is knowing what kind of data you have, whether they are continuous, discrete, or categorical and with some attention also devoted to whether the data are nominal, ordinal, interval, or ratio. Making these decisions can be a lot trickier than it sounds, and you may need to consult with someone for advice on this before selecting a model. Other times, it is very easy to determine what kind of data you have. But if you are not sure, check with a statistical consultant to help confirm the nature of your variables, because making an error at this initial stage of analysis can have serious consequences and jeopardize your data analyses entirely.

    1.2 Significance Tests and Hypothesis Testing

    In classical statistics, a hypothesis test is about the value of a parameter we wish to estimate with our sample data. Consider our previous example of the two‐group problem, in which we tried to establish whether taking a pill is effective in reducing headache symptoms. If there were no difference between the group receiving the treatment and the group not receiving the treatment, then we would expect the parameter difference to equal 0. We state this as our null hypothesis:

    Null hypothesis: The mean difference in the population is equal to 0.

    The alternative hypothesis is that the mean difference is not equal to 0. Now, if our sample means come out to be 50.0 for the control group and 50.0 for the treated group, then it is obvious that we do not have evidence to reject the null, since the difference of 50.0 – 50.0 = 0 aligns directly with expectation under the null. On the other hand, if the means were 48.0 vs. 52.0, could we reject the null? Yes, there is definitely a sample difference between groups, but do we have evidence for a population difference? It is difficult to say without asking the following question:

    What is the probability of observing a difference such as 48.0 vs. 52.0 under the null hypothesis of no difference?

    When we evaluate a null hypothesis, it is the parameter we are interested in, not the sample statistic. The fact that we observed a difference of 4 (i.e. 52.0 – 48.0) in our sample does not by itself indicate that in the population, the parameter is unequal to 0. To be able to reject the null hypothesis, we need to conduct a significance test on the mean difference of 48.0 vs. 52.0, which involves computing (in this particular case) what is known as a standard error of the difference in means to estimate how likely such differences are to occur in theoretical repeated sampling. When we do this, we are comparing an observed difference to a difference we would expect simply due to random variation. Virtually all test statistics follow the same logic. That is, we compare what we have observed in our sample(s) to variation we would expect under a null hypothesis or, crudely, what we would expect simply by chance. Virtually all test statistics have the following form:

    Test statistic = observed/expected

    If the observed difference is large relative to the expected difference, then we garner evidence that such a difference is not simply due to chance and may represent an actual difference in the population from which the data were drawn.
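
    To make the observed/expected form concrete, here is a minimal sketch in Python of an independent‐samples t statistic. The two small data sets are invented so that their means match the 48.0 vs. 52.0 example; they are not data from the book, and the pooled‐variance formula assumes equal population variances in the two groups.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores chosen so the group means are 48.0 and 52.0.
control = [44, 46, 48, 50, 52]
treated = [48, 50, 52, 54, 56]

n1, n2 = len(control), len(treated)

# Observed: the difference between the two sample means.
observed = mean(treated) - mean(control)

# Expected: the standard error of the difference in means, i.e. how much
# such differences vary from sample to sample when only chance is at work.
pooled_variance = (
    (n1 - 1) * variance(control) + (n2 - 1) * variance(treated)
) / (n1 + n2 - 2)
standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Test statistic = observed / expected.
t_statistic = observed / standard_error

print(f"Observed difference: {observed:.1f}")
print(f"Standard error:      {standard_error:.1f}")
print(f"t statistic:         {t_statistic:.1f}")
```

    With these made‐up values, the observed difference of 4 is twice its standard error. For 8 degrees of freedom the two‐tailed critical value at the 0.05 level is about 2.306, so a sample difference of 4 would not, in this small sample, be enough to reject the null.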

    Significance tests are not only performed on mean differences, however. Whenever we wish to estimate a parameter, whatever the kind, we can perform a significance test on it. Hence, when we perform t‐tests, ANOVAs, regressions, etc., we are continually computing sample statistics and conducting tests of significance about parameters of interest. Whenever you see output labeled "Sig." in SPSS with a probability value underneath it, it means a significance test has been performed on that statistic, and the value reported is the p‐value. When we make a decision about the null at, say, the 0.05 level, however, we do so with a risk of error: a type I error if we reject the null, or a type II error if we fail to reject it. We review these next, along with significance levels.

    1.3 Significance Levels and Type I and Type II Errors

    Whenever we conduct a significance test on a parameter and decide to reject the null hypothesis, we do not know for certain that the null is false. We are rather hedging our bet that it is false. For instance, even if the mean difference in the sample is large, though it probably means there is a difference in the corresponding population parameters, we cannot be certain of this and thus risk falsely rejecting the null hypothesis. How much risk are we willing to tolerate for a given significance test? Historically, a probability level of 0.05 has been used in most settings, though the setting of this level should depend individually on the given research context. The infamous "p < 0.05" means that the probability of observing data at least as extreme as ours under the null hypothesis is less than 5%. If such data are so unlikely under the null, then perhaps the null hypothesis is actually false, and the data are more probable under a competing hypothesis, such as the statistical alternative hypothesis. The point to make here is that whenever we reject a null and conclude something about the population parameters, we could be making a false rejection of the null hypothesis. Rejecting a null hypothesis when in fact the null is not false is known as a type I error, and we usually try to limit the probability of making a type I error to 5% or less in most research contexts. On the other hand, we risk another type of error, known as a type II error. A type II error occurs when we fail to reject a null hypothesis that in actuality is false. More practically, this means that there may actually be a difference or effect in the population but that we failed to detect it. In this book, by default, we usually set the significance level at 0.05 for most tests. If the p‐value for a given significance test dips below 0.05, then we will typically call the result statistically significant. It needs to be emphasized, however, that a statistically significant result does not necessarily imply a strong practical effect in the population.
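
    A small simulation can make the type I error rate tangible. The sketch below, written in Python as an illustration only, repeatedly draws two samples from the same population, so the null hypothesis is true by construction, and counts how often a test at the 0.05 level rejects anyway. For simplicity it uses a z‐test with a known standard deviation instead of the t‐test; the population values, sample sizes, seed, and function names are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b, sigma):
    """p-value of a z-test on a mean difference, assuming a known common sigma."""
    standard_error = sigma * sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_one_error_rate(n_experiments=2000, n_per_group=30, alpha=0.05, seed=1):
    """Proportion of experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    false_rejections = 0
    for _ in range(n_experiments):
        # Both groups come from the same population: mean 50, sd 10.
        group_a = [rng.gauss(50, 10) for _ in range(n_per_group)]
        group_b = [rng.gauss(50, 10) for _ in range(n_per_group)]
        if two_sided_p(group_a, group_b, sigma=10) < alpha:
            false_rejections += 1
    return false_rejections / n_experiments


rate = type_one_error_rate()
print(f"Proportion of false rejections: {rate:.3f}")
```

    The proportion should land near 0.05, which is what setting the significance level at 0.05 means in practice: about 1 in 20 tests of a true null will be rejected in the long run.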

    For reasons discussed elsewhere (see Denis (2016) Chapter 3 for a thorough discussion), one can potentially obtain a statistically significant finding (i.e. p < 0.05) even if, to use our example about the headache treatment, the difference in means is rather small. Hence, throughout the book, when we note that a statistically significant finding has occurred, we often couple this with a measure of effect size, which is an indicator of just how much mean difference (or other effect) is actually present. The exact measure of effect size is different depending on the statistical method, so we explain how to interpret the given effect size in each setting as we come across it.
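
    The distinction between statistical significance and effect size can be sketched numerically. The example below is a Python illustration with invented summary values, not an analysis from the book. It holds a small mean difference fixed and changes only the sample size, using a z‐test with a known standard deviation for simplicity and a standardized mean difference (Cohen's d, assumed here as the effect size measure) to express how large the difference is.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_summary(mean_diff, sd, n_per_group):
    """Return (p_value, cohens_d) for two equal-sized groups with known sd."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = mean_diff / sd  # effect size does not depend on sample size
    return p_value, cohens_d


# Hypothetical: a mean difference of 0.5 points on a scale with sd = 10.
p_small_n, d_small_n = z_test_from_summary(0.5, 10, n_per_group=20)
p_large_n, d_large_n = z_test_from_summary(0.5, 10, n_per_group=100_000)

print(f"n = 20 per group:      p = {p_small_n:.3f}, d = {d_small_n:.2f}")
print(f"n = 100000 per group:  p = {p_large_n:.3g}, d = {d_large_n:.2f}")
```

    The effect size is identical in both cases, yet only the very large sample yields p < 0.05. This is why a significant result is best reported together with a measure of how large the effect actually is.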

    1.4 Sample Size and Power

    Power is reviewed in Chapter 6, but an introductory note about it and how it relates to sample size is in order. Crudely, statistical power of a test is the probability of detecting
