Complex Surveys: A Guide to Analysis Using R

About this ebook

A complete guide to carrying out complex survey analysis using R

As survey analysis continues to serve as a core component of sociological research, researchers are increasingly relying upon data gathered from complex surveys to carry out traditional analyses. Complex Surveys is a practical guide to the analysis of this kind of data using R, the freely available and downloadable statistical programming language. As the creator of R's survey package, the author provides an authoritative presentation of how to use the software to analyze data from complex surveys, drawing on current data from health and social science studies to demonstrate the application of survey research methods in these fields.

The book begins with coverage of basic tools and topics within survey analysis such as simple and stratified sampling, cluster sampling, linear regression, and categorical data regression. Subsequent chapters delve into more technical aspects of complex survey analysis, including post-stratification, two-phase sampling, missing data, and causal inference. Throughout the book, an emphasis is placed on graphics, regression modeling, and two-phase designs. In addition, the author supplies a unique discussion of epidemiological two-phase designs as well as probability-weighting for causal inference. All of the book's examples and figures are generated using R, and a related Web site provides the R code that allows readers to reproduce the presented content. Each chapter concludes with exercises that vary in level of complexity, and detailed appendices outline additional mathematical and computational descriptions to assist readers with comparing results from various software systems.

Complex Surveys is an excellent book for courses on sampling and complex surveys at the upper-undergraduate and graduate levels. It is also a practical reference guide for applied statisticians and practitioners in the social and health sciences who use statistics in their everyday work.

Language: English
Publisher: Wiley
Release date: Sep 20, 2011
ISBN: 9781118210932


    Complex Surveys - Thomas Lumley

    CONTENTS

    ACKNOWLEDGMENTS

    PREFACE

    ACRONYMS

    CHAPTER 1: BASIC TOOLS

    1.1 GOALS OF INFERENCE

    1.2 AN INTRODUCTION TO THE DATA

    1.3 OBTAINING THE SOFTWARE

    1.4 USING R

    EXERCISES

    CHAPTER 2: SIMPLE AND STRATIFIED SAMPLING

    2.1 ANALYZING SIMPLE RANDOM SAMPLES

    2.2 STRATIFIED SAMPLING

    2.3 REPLICATE WEIGHTS

    2.4 OTHER POPULATION SUMMARIES

    2.5 ESTIMATES IN SUBPOPULATIONS

    2.6 DESIGN OF STRATIFIED SAMPLES

    EXERCISES

    CHAPTER 3: CLUSTER SAMPLING

    3.1 INTRODUCTION

    3.2 DESCRIBING MULTISTAGE DESIGNS TO R

    3.3 SAMPLING BY SIZE

    3.4 REPEATED MEASUREMENTS

    EXERCISES

    CHAPTER 4: GRAPHICS

    4.1 WHY IS SURVEY DATA DIFFERENT?

    4.2 PLOTTING A TABLE

    4.3 ONE CONTINUOUS VARIABLE

    4.4 TWO CONTINUOUS VARIABLES

    4.5 CONDITIONING PLOTS

    4.6 MAPS

    EXERCISES

    CHAPTER 5: RATIOS AND LINEAR REGRESSION

    5.1 RATIO ESTIMATION

    5.2 LINEAR REGRESSION

    5.3 IS WEIGHTING NEEDED IN REGRESSION MODELS?

    EXERCISES

    CHAPTER 6: CATEGORICAL DATA REGRESSION

    6.1 LOGISTIC REGRESSION

    6.2 ORDINAL REGRESSION

    6.3 LOGLINEAR MODELS

    EXERCISES

    CHAPTER 7: POST-STRATIFICATION, RAKING AND CALIBRATION

    7.1 INTRODUCTION

    7.2 POST-STRATIFICATION

    7.3 RAKING

    7.4 GENERALIZED RAKING, GREG ESTIMATION, AND CALIBRATION

    7.5 BASU’S ELEPHANTS

    7.6 SELECTING AUXILIARY VARIABLES FOR NON-RESPONSE

    EXERCISES

    CHAPTER 8: TWO-PHASE SAMPLING

    8.1 MULTISTAGE AND MULTIPHASE SAMPLING

    8.2 SAMPLING FOR STRATIFICATION

    8.3 THE CASE–CONTROL DESIGN

    8.4 SAMPLING FROM EXISTING COHORTS

    8.5 USING AUXILIARY INFORMATION FROM PHASE ONE

    EXERCISES

    CHAPTER 9: MISSING DATA

    9.1 ITEM NON-RESPONSE

    9.2 TWO-PHASE ESTIMATION FOR MISSING DATA

    9.3 IMPUTATION OF MISSING DATA

    EXERCISES

    CHAPTER 10: * CAUSAL INFERENCE

    10.1 IPTW ESTIMATORS

    10.2 MARGINAL STRUCTURAL MODELS

    APPENDIX A: ANALYTIC DETAILS

    A.1 ASYMPTOTICS

    A.2 VARIANCES BY LINEARIZATION

    A.3 TESTS IN CONTINGENCY TABLES

    A.4 MULTIPLE IMPUTATION

    A.5 CALIBRATION AND INFLUENCE FUNCTIONS

    A.6 CALIBRATION IN RANDOMIZED TRIALS AND ANCOVA

    APPENDIX B: BASIC R

    B.1 READING DATA

    B.2 DATA MANIPULATION

    B.3 RANDOMNESS

    B.4 METHODS AND OBJECTS

    B.5 WRITING FUNCTIONS

    APPENDIX C: COMPUTATIONAL DETAILS

    C.1 LINEARIZATION

    C.2 REPLICATE WEIGHTS

    C.3 SCATTERPLOT SMOOTHERS

    C.4 QUANTILES

    C.5 BUG REPORTS AND FEATURE REQUESTS

    APPENDIX D: DATABASE-BACKED DESIGN OBJECTS

    D.1 LARGE DATA

    D.2 SETTING UP DATABASE INTERFACES

    APPENDIX E: EXTENDING THE PACKAGE

    E.1 A CASE STUDY: NEGATIVE BINOMIAL REGRESSION

    E.2 USING A POISSON MODEL

    E.3 REPLICATE WEIGHTS

    E.4 LINEARIZATION

    REFERENCES

    AUTHOR INDEX

    TOPIC INDEX

    WILEY SERIES IN SURVEY METHODOLOGY

       Established in Part by WALTER A. SHEWHART AND SAMUEL S. WILKS

    Editors: Mick P. Couper, Graham Kalton, J. N. K. Rao, Norbert Schwarz, Christopher Skinner

    Editor Emeritus: Robert M. Groves

    A complete list of the titles in this series appears at the end of this volume.

    Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved.

    Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

    Published simultaneously in Canada.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

    Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

    For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

    Library of Congress Cataloging-in-Publication Data:

    Lumley, Thomas, 1969-

    Complex surveys : a guide to analysis using R / Thomas Lumley.

    p. cm.

    Includes bibliographical references and index.

    ISBN 978-0-470-28430-8 (pbk.)

    1. Mathematical statistics—Data processing. 2. R (Computer program language) I. Title.

    QA276.45.R3L86 2010

    515.0285-dc22

    2009033999

    Acknowledgments

    Most of this book was written while I was on sabbatical at the University of Auckland and the University of Leiden. The Statistics department in Auckland and the Department of Clinical Epidemiology at Leiden University Medical Center were very hospitable and provided many interesting and productive distractions from writing.

    I had useful discussions on a number of points with Alastair Scott and Chris Wild. Bruce Psaty, Stas Kolenikov, and Barbara McKnight gave detailed and helpful comments on a draft of the text. The interpretation of the $ operator came from Ken Rice. Hadley Wickham explained how to combine city and state data in a single map. Paul Murrell made some suggestions about types of graphics to include. The taxonomy of regression predictor variables is from Scott Emerson. I learned about some of the references on reification from Cosma Shalizi’s web page. The students and instructors in STAT/CSSS 529 (Seattle) and STATS 740 (Auckland) tried out a draft of the book and pointed out a few problems that I hope have been corrected.

    Some financial support for my visit to Auckland was provided by Alastair Scott, Chris Wild, and Alan Lee from a grant from the Marsden Fund, and my visit to Leiden was supported in part by Fondation Leducq through their funding of the LINAT collaboration. My sabbatical was also supported by the University of Washington.

    The survey package has benefited greatly from comments, questions, and bug reports from its users; an attempt at a list is in the THANKS file in the package.

    Preface

    This book presents a practical guide to analyzing complex surveys using R, with occasional digressions into related areas of statistics. Complex survey analysis differs from most of statistics philosophically and in the substantive problems it faces. In the past this led to a requirement for specialized software and the spread of specialized jargon, and survey analysis became separated from the rest of statistics in many ways. In recent years the two have begun to converge: all major statistical packages now include at least some survey analysis features, and some of the mathematical techniques of survey analysis have been incorporated into widely used statistical methods for missing data and for causal inference.

    More importantly for this book, researchers in the social sciences and health sciences are increasingly interested in using data from complex surveys to conduct the same sorts of analyses that they traditionally conduct with more straightforward data. Medical researchers are also increasingly aware of the advantages of well-designed subsamples when measuring novel, expensive variables on an existing cohort.

    This book is designed for readers who have some experience with applied statistics, especially in the social sciences or health sciences, and are interested in learning about survey analysis. As a result, we will spend more time on graphics, regression modelling, and two-phase designs than is typical for a survey analysis text. I have presented most of the material in this book in a one-quarter course for graduate students who are not specialist statisticians but have had a graduate-level introductory course in applied statistics, including linear and logistic regression. Chapters 1-6 should be of general interest to anyone wishing to analyze complex surveys. Chapters 7-10 are, on average, more technical and more specialized than the earlier material, and some of the content, particularly in Chapter 8, reflects recent research.

    The widespread availability of software for analyzing complex surveys means that it is no longer as important for most researchers to learn a list of computationally convenient special cases of formulas for means and standard errors. Formulas will be presented in the text only when I feel they are useful for understanding concepts; the appendices present some additional mathematical and computational descriptions that will help in comparing results from different software systems. An excellent reference for statisticians who want more detail is Model Assisted Survey Sampling by Särndal, Swensson, and Wretman [151]. Some of the exercises presented at the end of each chapter require more mathematical or programming background; these are indicated with a star (*). They are not necessarily more difficult than the unstarred exercises.

    This book is designed around a particular software system, the survey package for the R statistical environment, and one of its main goals is to document and explain this system. All the examples, tables, and graphs in the book are produced with R, and code and data for you to reproduce nearly all of them are available. There are three reasons for choosing to emphasize R in this way: it is open-source software, which makes it easily available; it is very widely known and used by academic statisticians, making it convenient for teaching; and, because I designed the survey package, it emphasizes the areas of design-based inference that I think are most important and most readily automated. For other software for analyzing complex surveys, see the comprehensive list maintained by Alan Zaslavsky at http://www.hcp.med.harvard.edu/statistics/survey-soft/.

    There are important statistical issues in the design and analysis of complex surveys outside design-based inference that I give little or no attention to. Small area estimation and item response theory are based on very different areas of statistics, and I think are best addressed under spatial statistics and multivariate statistics, respectively. Statistics has relatively little positive to say about non-sampling error, although I do discuss raking, calibration, and the analysis of multiply-imputed data. There are also interesting but specialized areas of complex sampling that are not covered in the book (or the software), mostly because I lack experience with their application. These include adaptive sampling techniques, and methods from ecology such as line and quadrat sampling.

    Code for reproducing the examples in this book (when not in the book itself), errata, and other information, can be found from the web site: http://faculty.washington.edu/tlumley/svybook. If you find mistakes or infelicities in the book or the package I would welcome an email: tlumley@u.washington.edu.

    Acronyms

    CHAPTER 1

    BASIC TOOLS

    In which we meet the probability sample and the R language.

    1.1 GOALS OF INFERENCE

    1.1.1 Population or process?

    The mathematical development for most of statistics is model-based, and relies on specifying a probability model for the random process that generates the data. This can be a simple parametric model, such as a Normal distribution, or a complicated model incorporating many variables and allowing for dependence between observations. To the extent that the model represents the process that generated the data, it is possible to draw conclusions that can be generalized to other situations where the same process operates. As the model can only ever be an approximation, it is important (but often difficult) to know what sort of departures from the model will invalidate the analysis.

    The analysis of complex survey samples, in contrast, is usually design-based. The researcher specifies a population, whose data values are unknown but are regarded as fixed, not random. The observed sample is random because it depends on the random selection of individuals from this fixed population. The random selection procedure of individuals (the sample design) is under the control of the researcher, so all the probabilities involved can, in principle, be known precisely. The goal of the analysis is to estimate features of the fixed population, and design-based inference does not support generalizing the findings to other populations.

    In some situations there is a clear distinction between population and process inference. The Bureau of Labor Statistics can analyze data from a sample of the US population to find out the distribution of income in men and women in the US. The use of statistical estimation here is precisely to generalize from a sample to the population from which it was taken.

    The University of Washington can analyze data on its faculty salaries to provide evidence in a court case alleging gender discrimination. As the university’s data are complete there is no uncertainty about the distribution of salaries in men and women in this population. Statistical modelling is needed to decide whether the differences in salaries can be attributed to valid causes, in particular to differences in seniority, to changes over time in state funding, and to area of study. These are questions about the process that led to the salaries being the way they are.

    In more complex analyses there can be something of a compromise between these goals of inference. A regression model fitted to blood pressure data measured on a sample from the US population will provide design-based conclusions about associations in the US population. Sometimes these design-based conclusions are exactly what is required, e.g., there is more hypertension in blacks than in whites. Often the goal is to find out why some people have high blood pressure: is the racial difference due to diet, or stress, or access to medical care, or might there be a genetic component?

    1.1.2 Probability samples

    The fundamental statistical concept in design-based inference is the probability sample or random sample. In everyday speech, taking a random sample of 1000 individuals means a sampling procedure in which any subset of 1000 people from the population is equally likely to be selected. The technical term for this is a simple random sample. The Law of Large Numbers implies that the sample of 1000 people is likely to be representative of the population, according to essentially any criteria we are interested in. If we compute the mean age, or the median income, or the proportion of registered Republican voters in the sample, the answer is likely to be close to the value for the population.

    We could also end up with a sample of 1000 individuals from the US population, for example, by taking a simple random sample of 20 people from each state. On many criteria this sample is unlikely to be representative, because people from states with low populations are more likely to be sampled. Residents of these states have a similar age distribution to the country as a whole but tend to have lower incomes and be more politically conservative. As a result the mean age of the sample will be close to the mean age for the US population, but the median income is likely to be lower, and the proportion of registered Republican voters higher than for the US population. As long as we know the population of each state, this stratified random sample is still a probability sample. Yet another approach would be to choose a simple random sample of 50 counties from the US and then sample 20 people from each county. This sample would over-represent counties with low populations, which tend to be in rural areas. Even so, if we know all the counties in the US, and if we can find the number of households in the counties we choose, this is also a probability sample.
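    As a toy illustration of these two procedures, here is a minimal R sketch that draws a simple random sample of 1000 people and a stratified sample of 20 people per state from a hypothetical population frame; the frame and its columns are invented for the example.

        set.seed(1)

        # Hypothetical population frame: one row per person, with a 'state' column
        frame <- data.frame(id    = 1:100000,
                            state = sample(state.name, 100000, replace = TRUE))

        # Simple random sample: every subset of 1000 people is equally likely
        srs <- frame[sample(nrow(frame), 1000), ]

        # Stratified sample: a simple random sample of 20 people from each state
        strat <- do.call(rbind, lapply(split(frame, frame$state),
                                       function(d) d[sample(nrow(d), 20), ]))
        head(table(strat$state))   # 20 per state, regardless of state population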

    It is important to remember that what makes a probability sample is the procedure for taking samples from a population, not just the data we happen to end up with.

    The properties we need of a sampling method for design-based inference are as follows:

    1. Every individual in the population must have a non-zero probability of ending up in the sample (written πi for individual i).

    2. The probability πi must be known for every individual who does end up in the sample.

    3. Every pair of individuals in the sample must have a non-zero probability of both ending up in the sample (written πij for the pair of individuals (i,j)).

    4. The probability πij must be known for every pair that does end up in the sample.

    The first two properties are necessary in order to get valid population estimates; the last two are necessary to work out the accuracy of the estimates. If individuals were sampled independently of each other the first two properties would guarantee the last two, since then πij = πiπj, but a design that sampled one random person from each US county would have πi > 0 for everyone in the US and πij = 0 for two people in the same county. In the survey package, as in most software for analysis of complex samples, the computer will work out πij from the design description; they do not need to be specified explicitly.
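    For example, here is a minimal sketch of how a design description is given to R, using the apistrat data set that ships with the survey package (a sample of California schools stratified by school type); the strata, weights, and finite-population correction come from the sampling procedure, and the package derives the probabilities it needs from them.

        library(survey)
        data(api)    # school-level example data supplied with the survey package

        # apistrat: stratified by school type (stype); pw holds the sampling
        # weights 1/pi_i and fpc the population size of each stratum.
        dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                            fpc = ~fpc, data = apistrat)
        summary(dstrat)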

    The world is imperfect in many ways, and the necessary properties are present only as approximations in real surveys. A list of residences for sampling will include some that are not inhabited and miss some that have been newly constructed. Some people (me, for example) do not have a landline telephone, others may not be at home or may refuse to answer some or all of the questions. We will initially ignore these problems, but aspects of them are addressed in Chapters 7 and 9.

    1.1.3 Sampling weights

    If we take a simple random sample of 3500 people from California (with total population 35 million) then any person in California has a 1/10000 chance of being sampled, so πi = 3500/35000000 = 1/10000 for every i. Each of the people we sample represents 10000 Californians. If it turns out that 400 of our sample have high blood pressure and 100 are unemployed, we would expect 400 × 10000 = 4 million people with high blood pressure and 100 × 10000 = 1 million unemployed in the whole state. If we sample 3500 people from Connecticut (population 3,500,000), all the sampling probabilities are equal to 3500/3500000 = 1/1000, so each person in the sample represents 1000 people in the population. If 400 of the sample had high blood pressure we would expect 400 × 1000 = 400000 people with high blood pressure in the state population.
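    The grossing-up arithmetic is easy to check directly; this sketch simply repeats the calculation for the hypothetical California sample in R.

        n <- 3500; N <- 35e6     # sample size and approximate population of California
        pi_i <- n / N            # sampling probability, 1/10000
        w <- 1 / pi_i            # sampling weight: each person represents 10000 people
        c(high_bp = 400 * w, unemployed = 100 * w)   # estimated population totals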

    The fundamental statistical idea behind all of design-based inference is that an individual sampled with a sampling probability of πi represents 1/πi individuals in the population. The value 1/πi is called the sampling weight.

    This weighting or grossing-up operation is easy to grasp for a simple random sample, where the probabilities are the same for everyone. It is less obvious that the same rule applies when the sampling probabilities can be different. In particular, it may not be intuitive that the sampling probabilities for individuals who were not sampled do not need to be known.

    Consider measuring income on a sample of one individual from a population of N, where πi might be different for each individual. The estimate ($\hat{T}_{\text{income}}$) of the total income of the population ($T_{\text{income}}$) would be the income for that individual multiplied by the sampling weight:

    $$\hat{T}_{\text{income}} = \frac{1}{\pi_i} \times \text{income}_i$$

    This will not be a very good estimate, since it is based on only one person, but it will be unbiased: the expected value of the estimate will equal the true population total. The expected value of the estimate is the value of the estimate when we select person i, times the probability of selecting person i, added up over all people in the population:

    $$E\left[\hat{T}_{\text{income}}\right] = \sum_{i=1}^{N} \pi_i \times \frac{1}{\pi_i}\,\text{income}_i = \sum_{i=1}^{N} \text{income}_i = T_{\text{income}}$$

    The same algebra applies with only slightly more work to samples of any size. The 1/πi sampling weights used to construct the estimate cancel out the πi probability that this particular individual is sampled. The estimator of the population total is called the Horvitz-Thompson estimator [63] after the authors who proposed the most general form and a standard error estimate for it, but the principle is much older.

    Estimates for any other population quantity are derived in various ways from estimates for a population total, so the Horvitz-Thompson estimator of the population total is the foundation for all the analyses described in the rest of the book. Because of the importance of sampling weights and the inconvenience of writing fractions it is useful to have a notation for the weighted observations. If Xi is a measurement of variable X on person i, we write

    $$\check{X}_i = \frac{1}{\pi_i} X_i$$

    Given a sample of size n, the Horvitz-Thompson estimator $\hat{T}_X$ for the population total $T_X$ of X is

    (1.1)   $$\hat{T}_X = \sum_{i=1}^{n} \frac{1}{\pi_i} X_i = \sum_{i=1}^{n} \check{X}_i$$

    The variance estimate is

    (1.2)   $$\widehat{\operatorname{var}}\left[\hat{T}_X\right] = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \check{X}_i \check{X}_j - \frac{X_i X_j}{\pi_{ij}} \right)$$

    Knowing the formula for the variance estimator is less important to the applied user, but it is useful to note two things. The first is that the formula applies to any design, however complicated, where πi and πij are known for the sampled observations. The second is that the formula depends on the pairwise sampling probabilities πij, not just on the sampling weights; this is how correlations in the sampling design enter the computations. Some other ways of writing the variance estimator are explored in the exercises at the end of this chapter.
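    As a concrete illustration, the sketch below (again using the apistrat example data from the survey package) computes the Horvitz-Thompson total of a variable by hand and compares it with the package's svytotal(), which also supplies a design-based standard error.

        library(survey)
        data(api)

        # Stratified sample of California schools; pw = 1/pi_i, fpc = stratum sizes
        dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                            fpc = ~fpc, data = apistrat)

        # Horvitz-Thompson estimate of the total Academic Performance Index, by hand
        sum(apistrat$pw * apistrat$api00)

        # The same estimate, with its design-based standard error, from the package
        svytotal(~api00, dstrat)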

    Other meanings of weights. Statisticians and statistical software use the term ‘weight’ to mean at least three different things:

    sampling weights: A sampling weight of 1000 means that the observation represents 1000 individuals in the population.

    precision weights: A precision (or inverse-variance) weight of 1000 means that the observation has 1000 times lower variance than an observation with a weight of 1.

    frequency weights: A frequency weight of 1000 means that the sample contains 1000 identical observations and space is being saved by using only one record in the data set to represent them.

    In this book, weights are always sampling weights, 1/πi. Most statistical software that is not specifically designed for survey analysis will assume that weights are precision weights or frequency weights. Giving sampling weights to software that is expecting precision weights or frequency weights will often (but not always) give correct point estimates, but will usually give seriously incorrect standard errors, confidence intervals, and p-values.
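    A quick way to see the difference is to compare a design-based estimate with what a naive weighted analysis reports. The sketch below is only an illustration using the apistrat example data again: lm() treats its weights argument as precision weights, so the point estimate of the mean agrees with the design-based one but the standard error does not.

        library(survey)
        data(api)

        dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                            fpc = ~fpc, data = apistrat)

        # Design-based mean and standard error
        svymean(~api00, dstrat)

        # Same point estimate, but the standard error is not design-based
        summary(lm(api00 ~ 1, data = apistrat, weights = pw))$coefficients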

    1.1.4 Design effects

    A complex survey will not have the same standard errors for estimates as a simple random sample of the same size, but many sample size calculations are only conveniently available for simple random samples. The design effect was defined by Kish (1965) as the ratio of the variance of an estimate in a complex sample to the variance of the same estimate in a simple random sample [75].

    If the necessary sample size for a given level of precision is known for a simple random sample, the sample size for a complex design can be obtained by multiplying by the design effect. While the design effect will not be known in advance, some useful guidance can be obtained by looking at design effects reported for other similar surveys.

    Design effects for large studies are usually greater than 1.0, implying that larger sample sizes are needed for complex designs than for a simple random sample. For example, the California Health Interview Survey reports typical design effects in the range 1.4–2.0. It may be surprising that complex designs are used if they require both larger sample sizes and special statistical methods, but as Chapter 3 discusses, the increased sample size can often still result in a lower cost.

    The other ratio of variances that is of interest is the ratio of the variance of a correct estimate to the incorrect variance that would be obtained by pretending that the data are a simple random sample. This ratio allows the results of an analysis to be (approximately) corrected if software is not available to account for the complex design. This second ratio is sometimes called the design effect and sometimes the misspecification effect.

    That is, the design effect compares the variance from correct estimates in two different designs, while the misspecification effect compares correct and incorrect analyses of the same design. Although these two ratios of variances are not the same, they are often similar for practical designs. The misspecification effect is of relatively little interest now that software for complex designs is widely available, and it will not appear further in this book.
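    In the survey package, many estimation functions will report an estimated design effect alongside the estimate. The sketch below asks svymean() for one, using the one-stage cluster sample apiclus1 that ships with the package, and then applies the sample-size adjustment described above; the target simple-random-sample size of 400 and the design effect of 1.5 are just illustrative numbers.

        library(survey)
        data(api)

        # One-stage cluster sample of school districts (dnum identifies the clusters)
        dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

        # Estimated mean with its estimated design effect
        svymean(~api00, dclus1, deff = TRUE)

        # If a simple random sample of 400 would give the precision we need:
        n_srs <- 400
        deff  <- 1.5                 # e.g., a value reported for a similar survey
        ceiling(n_srs * deff)        # required sample size under the complex design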

    1.2 AN INTRODUCTION TO THE DATA

    Most of the examples used in this book will be based either on real surveys or on simulated surveys drawn from real populations. Some of the data sets will be quite large by textbook standards, but the computer used to write this book is a laptop dating from 2006, so it seems safe to assume that most readers will have access to at least this level of computer power. Links to the source and documentation for all these data sets can be found on the web site for the book.

    Nearly all the data are available to you in electronic form to reproduce these analyses, but some effort may be required to get them. Surveys in the United States tend to provide (non-identifying, anonymized) data for download by anyone, and the datasets from these surveys used in this book are available on the book’s web site in directly usable formats. Access to survey data from Britain tends to require much filling in of forms, so the book’s web site provides instructions on where to find the data and how to convert it to usable form. These national differences partly reflect the differences in copyright policy in the two countries. In the US, the federal government places materials created at public expense in the public domain; in Britain, the copyright is retained by the government.

    You may be unfamiliar with some of the terminology in the descriptions of data sets, which will be described in subsequent chapters.

    1.2.1 Real surveys

    NHANES. The National Health and Nutrition Examination Surveys have been conducted by the US National Center for Health Statistics (NCHS) since 1970. They are designed to provide nationwide data on health and disease, and on dietary and clinical risk factors. Each four-year cycle of NHANES recruits about 28000 people in a multistage sample.
