Knowledge Discovery in the Social Sciences: A Data Mining Approach
Ebook · 429 pages


About this ebook

Knowledge Discovery in the Social Sciences helps readers find valid, meaningful, and useful information. It is written for researchers and data analysts as well as students who have no prior experience in statistics or computer science. Suitable for a variety of classes—including upper-division courses for undergraduates, introductory courses for graduate students, and courses in data management and advanced statistical methods—the book guides readers in the application of data mining techniques and illustrates the significance of newly discovered knowledge. 

Readers will learn to: 
• appreciate the role of data mining in scientific research 
• develop an understanding of fundamental concepts of data mining and knowledge discovery
• use software to carry out data mining tasks
• select and assess appropriate models to ensure findings are valid and meaningful
• develop basic skills in data preparation, data mining, model selection, and validation
• apply concepts with end-of-chapter exercises and review summaries
 
Language: English
Release date: Feb 4, 2020
ISBN: 9780520965874
Author

Prof. Xiaoling Shu

Xiaoling Shu is Professor of Sociology at the University of California, Davis. 



    KNOWLEDGE DISCOVERY IN THE SOCIAL SCIENCES


    A Data Mining Approach

    Xiaoling Shu

    UC Logo

    UNIVERSITY OF CALIFORNIA PRESS

    University of California Press

    Oakland, California

    © 2020 by Xiaoling Shu

    Library of Congress Cataloging-in-Publication Data

    Names: Shu, Xiaoling, 1968- author.

    Title: Knowledge discovery in the social sciences : a data mining approach / Xiaoling Shu.

    Description: Oakland, California : University of California Press, [2020] | Includes bibliographical references and index.

    Identifiers: LCCN 2019024334 (print) | LCCN 2019024335 (ebook) | ISBN 9780520339996 (cloth) | ISBN 9780520292307 (paperback) | ISBN 9780520965874 (ebook)

    Subjects: LCSH: Social sciences—Research—Data processing. | Data mining.

    Classification: LCC H61.3 .S49 2020 (print) | LCC H61.3 (ebook) | DDC 300.285/6312—dc23

    LC record available at https://lccn.loc.gov/2019024334

    LC ebook record available at https://lccn.loc.gov/2019024335

    Manufactured in the United States of America

    29  28  27  26  25  24  23  22  21  20

    10  9  8  7  6  5  4  3  2  1

    To Casey, Kina, and Dong with love and gratitude

    CONTENTS

    PART I. KNOWLEDGE DISCOVERY AND DATA MINING IN SOCIAL SCIENCE RESEARCH

    Chapter 1. Introduction

    Chapter 2. New Contributions and Challenges

    PART II. DATA PREPROCESSING

    Chapter 3. Data Issues

    Chapter 4. Data Visualization

    PART III. MODEL ASSESSMENT

    Chapter 5. Assessment of Models

    PART IV. DATA MINING: UNSUPERVISED LEARNING

    Chapter 6. Cluster Analysis

    Chapter 7. Associations

    PART V. DATA MINING: SUPERVISED LEARNING

    Chapter 8. Generalized Regression

    Chapter 9. Classification and Decision Trees

    Chapter 10. Artificial Neural Networks

    PART VI. DATA MINING: TEXT DATA AND NETWORK DATA

    Chapter 11. Web Mining and Text Mining

    Chapter 12. Network or Link Analysis

    Index

    PART I

    KNOWLEDGE DISCOVERY AND DATA MINING IN SOCIAL SCIENCE RESEARCH

    Chapter 1

    INTRODUCTION

ADVANCES IN TECHNOLOGY (the internet, mobile devices, computers, digital sensors, and recording equipment) have led to exponential growth in the amount and complexity of data available for analysis. It has become difficult or even impossible to capture, manage, process, and analyze these data in a reasonable amount of time. We are at the threshold of an era in which digital data play an increasingly important role in the research process. In the traditional approach, hypotheses derived from theories are the driving forces behind model building. However, with the rise of big data and the enormous wealth of information and knowledge buried in this data mine, using data mining technologies to discover interesting, meaningful, and robust patterns has become increasingly important. This alternative method of research affects all fields, including the social sciences. The availability of huge amounts of data provides unprecedented opportunities for new discoveries, as well as challenges.

Today we are confronted with a data tsunami. We are accumulating data at an unprecedented scale in many areas of industry, government, and civil society. Analysis and knowledge based on big data now drive nearly every aspect of society, including retail, financial services, insurance, wireless mobile services, business management, urban planning, science and technology, the social sciences, and the humanities. Google Books has so far digitized 4 percent of all the books ever printed in the world, and the process is ongoing. The Google Books corpus contains more than 500 billion words in English, French, Spanish, German, Chinese, Russian, and Hebrew; the English-language entries from the year 2000 alone would take a person eighty years to read continuously at a pace of 200 words per minute. This entire corpus is available for downloading (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html), and Google also hosts another site that graphs word usage over time, from 1800 to 2008 (https://books.google.com/ngrams). The Internet Archive, a digital library of internet sites and other cultural artifacts in digital form, provides free access to 279 billion web pages, 11 million books and texts, 4 million audio recordings, 3 million videos, 1 million images, and 100,000 software programs (https://archive.org/about/). Facebook generates 4 new petabytes of data and runs 600,000 queries and one million map-reduce jobs per day. Facebook's data warehouse, Hive, stored 300 petabytes of data in 800,000 tables as of 2014 (https://research.fb.com/facebook-s-top-open-data-problems/). The GDELT database monitors global cyberspace in real time, analyzing news events from portals, print media, TV broadcasts, online media, and online forums in all countries of the world and extracting key information such as the people, places, organizations, and event types involved.
The GDELT Event Database records over 300 categories of physical activities around the world, from riots and protests to peaceful appeals and diplomatic exchanges, georeferenced to the city or mountaintop across the entire world, dating back to January 1, 1979, and updated every fifteen minutes. Since February 2015, GDELT has brought together 940 million messages from global cyberspace, totaling 9.4 TB (https://www.gdeltproject.org/). A report by McKinsey (Manyika et al. 2011) estimated that corporations, institutions, and users stored more than 13 exabytes of new data, over 50,000 times the amount of data in the Library of Congress. The value of global personal location data is estimated at $700 billion, and these data can reduce costs by as much as 50 percent in product development and assembly.
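The downloadable ngram files mentioned above are plain tab-separated text, so a few lines can be parsed with nothing more than the standard library. This sketch assumes the v2 layout of ngram, year, match_count, volume_count; the counts shown are made up for illustration:

```python
import csv
import io

# Two hypothetical lines in the Google Books Ngram v2 format:
# ngram <TAB> year <TAB> match_count <TAB> volume_count
sample = "data mining\t2005\t1432\t897\ndata mining\t2006\t1671\t954\n"

# Aggregate match counts by year, as one would when charting usage over time.
counts = {}
for ngram, year, matches, volumes in csv.reader(io.StringIO(sample), delimiter="\t"):
    counts[int(year)] = counts.get(int(year), 0) + int(matches)

print(counts)  # {2005: 1432, 2006: 1671}
```

The real files are large compressed archives, but the per-line structure is the same, so streaming them line by line keeps memory use constant.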

    Both industry and academic demands for data analytical skills have soared rapidly and continue to do so. IBM projects that by 2020 the number of jobs requiring data analytical skills in the United States will increase by 15 percent, to more than 2.7 million, and job openings requiring advanced data science analytical skills will reach more than 60,000 (Miller and Hughes 2017). Global firms are focusing on data-intensive sectors such as finance, insurance, and medicine. The topic of big data has been covered in popular news media such as the Economist (2017), the New York Times (Lohr 2012), and National Public Radio (Harris 2016), and data mining has also been featured in Forbes (2015; Brown 2018), the Atlantic (Furnis 2012), and Time (Stein 2011), to name a few.

The growth of big data has also revolutionized scientific research. Computational social science has emerged as a new methodology, and it is growing in popularity as a result of dramatic increases in available data on human and organizational behaviors (Lazer et al. 2009). Astronomy has been revolutionized by the use of a huge database of space images, the Sloan Digital Sky Survey, to identify interesting objects and phenomena (https://www.sdss.org/). Bioinformatics has emerged from biological science to focus on databases of genome sequencing, allowing millions or billions of DNA strands to be sequenced rapidly in parallel.

In the field of artificial intelligence (AI), scientists have developed AlphaGo, which was first trained to model expert players from a database of 30 million moves from recorded historical games and was later trained to learn new strategies for itself (https://deepmind.com/research/alphago/). AlphaGo has defeated Go world champions many times and is regarded as the strongest Go player in the game's history. This is a major advance over older AI technology. When IBM's Deep Blue beat chess champion Garry Kasparov in the late 1990s, it used brute-force AI to search chess moves in a space that was just a small fraction of the search space for Go.

    The Google Books corpus has made it possible to expand quantitative analysis into a wider array of topics in the social sciences and the humanities (Michel et al. 2011). By analyzing this corpus, social scientists and humanists have been able to provide insights into cultural trends that include English-language lexicography, the evolution of grammar, collective memory, adoption of technology, pursuits of fame, censorship, and historical epidemiology.

    In response to this fast-growing demand, universities and colleges have developed data science or data studies majors. These fields have grown from the confluence of statistics, machine learning, AI, and computer science. They are products of a structural transformation in the nature of research in disciplines that include communication, psychology, sociology, political science, economics, business and commerce, environmental science, linguistics, and the humanities. Data mining projects not only require that users possess in-depth knowledge about data processing, database technology, and statistical and computational algorithms; they also require domain-specific knowledge (from experts such as psychologists, economists, sociologists, political scientists, and linguists) to combine with available data mining tools to discover valid and meaningful knowledge. On many university campuses, social sciences programs have joined forces to consolidate course offerings across disciplines to teach introductory, intermediate, and advanced courses on data description, visualization, mining, and modeling to students in the social sciences and humanities.

    This chapter examines the major concepts of big data, knowledge discovery in databases, data mining, and computational social science. It analyzes the characteristics of these terms, their central features, components, and research methods.

    WHAT IS BIG DATA?

The concept of big data was conceived in 2001, when the META Group analyst Doug Laney (2001) proposed the famous 3V's model to cope with the management of increasingly large amounts of data. Laney characterized such data as having large volume, growing at high velocity, and exhibiting great variety. The concept of big data became popular in 2008, when Nature featured a special issue on the utility, approaches, and challenges of big data analysis. Big data has since become a widely discussed topic in all areas of scientific research. Science featured a special forum on big data in 2011, further highlighting the enormous potential and great challenges of big data research. In the same year, McKinsey's report Big Data: The Next Frontier for Innovation, Competition, and Productivity (2011) announced that the tsunami of data would bring enormous productivity and profits, adding enthusiasm to this already exciting development. Mayer-Schönberger and Cukier (2012) focused on the dramatic impacts that big data will have on the economy, science, and society and the revolutionary changes it will bring about in society at large.

A variety of definitions of big data all agree on one central feature of the concept: data enormity and complexity. Some define big data as data that are too large for traditional database technologies to store, access, manage, and analyze (Manyika et al. 2011). Others define big data in terms of its four characteristic V's: (1) big volume, measured in terabytes or petabytes; (2) big velocity, as the data grow rapidly and continuously; (3) big variety, encompassing structured numerical data as well as unstructured data such as text, pictures, video, and sound; and (4) big value, which can be translated into enormous economic profits, academic knowledge, and policy insights. Analysis of big data uses computational algorithms, cloud storage, and AI to mine and analyze data instantaneously and continuously (Dumbill 2013).

    There are just as many scholars who think big data is a multifaceted and complex concept that cannot be viewed simply from a data or technology perspective (Mauro, Greco, and Grimaldi 2016). A word cloud analysis from the literature shows that big data can be viewed from at least four different angles. First, big data contains information. The foundation of big data is the production and utilization of information from text, online records, GPS locations, online forums, and so on. This enormous amount of information is digitized, compiled, and stored on computers (Seife 2015). Second, big data includes technology. The enormous size and complexity of the data pose difficulties for computer storage, data processing, and data mining technologies. The technology component of big data includes distributed data storage, cloud computing, data mining, and artificial intelligence. Third, big data encompasses methods. Big data requires a series of processing and analytical methods that are beyond the traditional statistical approaches, such as association, classification, cluster analysis, natural language processing, neural networks, network analysis, pattern recognition, predictive modeling, spatial analysis, statistics, supervised and unsupervised learning, and simulation (Manyika et al. 2011). And fourth, big data has impacts. Big data has affected many dimensions of our society. It has revolutionized how we conduct business, research, design, and production. It has brought and will continue to bring changes in laws, guidelines, and policies on the utility and management of personal information.

To summarize, the essence of big data is big volume, high velocity, and big variety of information. As shown in figure 1.1, big data also comprises the technology and analytical methods used to transform that information into insights of economic value, thereby having an impact on society.

    FIGURE 1.1 What Is Big Data?

WHAT IS KNOWLEDGE DISCOVERY IN DATABASES?

Knowledge discovery in databases (KDD) is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996, 84). It consists of nine steps that begin with the development and understanding of the application domain and end with actions based on the knowledge discovered, as illustrated in figure 1.2.

    FIGURE 1.2 Components of Knowledge Discovery in a Database.

KDD is a nine-step process: understanding the application domain, selecting a target data set, cleaning and preprocessing the data, reducing and projecting the data, choosing the data mining task, choosing the data mining algorithm, mining the data, interpreting the mined patterns, and consolidating the discovered knowledge. This process is not a one-way flow. Rather, at each step, researchers can backtrack to any of the previous steps and start again. For example, while considering a variety of data mining methods, researchers may go back to the literature and study existing work on the topic to decide which data mining strategy most effectively addresses the research question.
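The iterative, backtracking character of the process can be sketched schematically. The step names below paraphrase Fayyad, Piatetsky-Shapiro, and Smyth (1996); the control loop itself is only an illustration, not a real system:

```python
# Schematic of the nine-step KDD process; names paraphrase
# Fayyad, Piatetsky-Shapiro, and Smyth (1996).
KDD_STEPS = [
    "understand the application domain",
    "select a target data set",
    "clean and preprocess the data",
    "reduce and project the data",
    "choose the data mining task",
    "choose the data mining algorithm",
    "mine the data for patterns",
    "interpret the mined patterns",
    "consolidate the discovered knowledge",
]

def kdd(run_step, max_passes=3):
    """Run the steps in order, backtracking one step whenever a step fails."""
    i, passes = 0, 0
    while i < len(KDD_STEPS) and passes < max_passes * len(KDD_STEPS):
        passes += 1
        ok = run_step(KDD_STEPS[i])
        i = i + 1 if ok else max(i - 1, 0)  # backtrack on failure
    return i == len(KDD_STEPS)

print(kdd(lambda step: True))  # True: every step succeeds on the first pass
```

Passing a `run_step` callback that occasionally returns `False` shows the backtracking: the process revisits the previous step before moving forward again.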

KDD has already been applied in a variety of fields, including astronomy, investment, marketing, manufacturing, public policy, sports, and telecommunications. An example of a KDD system is the Sky Image Cataloging and Analysis Tool (SKICAT), which can automatically analyze, classify, and catalog sky objects as stars or galaxies using machine learning, machine-assisted discovery, and other AI technologies (http://www.ifa.hawaii.edu/∼rgal/science/dposs/dposs_frames_skicat.html). Another KDD application is Advanced Scout, which is used by NBA coaching staffs to discover interesting patterns in basketball game data and allows users to relate these patterns to video (https://www.nbastuffer.com/analytics101/advanced-scout/).

    WHAT IS DATA MINING?

Data mining has two definitions. The narrow definition holds that it is the step in the KDD process of applying data analysis and discovery algorithms to produce particular patterns or models from the data. As shown in figure 1.2, data mining is step 7 of the nine steps in the KDD model. The pattern space in which data mining operates is usually infinite, and data mining searches that space to find patterns.
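As a toy illustration of searching a pattern space, the sketch below enumerates every pair of attributes in a handful of hypothetical survey records and keeps the pairs that co-occur frequently (the kind of association pattern taken up in chapter 7). All attribute names and records here are invented:

```python
from itertools import combinations
from collections import Counter

# Hypothetical survey records: each is the set of attributes a respondent has.
records = [
    {"urban", "college", "votes"},
    {"urban", "college"},
    {"rural", "votes"},
    {"urban", "college", "votes"},
]

# A finite slice of the pattern space: every pair of attributes per record.
pair_counts = Counter(
    pair for r in records for pair in combinations(sorted(r), 2)
)

# Keep only pairs seen in at least half the records ("frequent" patterns).
frequent = {p for p, c in pair_counts.items() if c >= len(records) / 2}
print(frequent)
```

Real data mining algorithms differ mainly in how they prune this search so that the combinatorial explosion of candidate patterns stays tractable.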

Under this narrow definition, data mining's techniques and terminology come from three sources. Statistics is the basic source: it brings well-defined techniques for identifying systematic relationships between variables, and its computational methods include descriptive statistics, correlation, frequency tables, multivariate exploratory techniques, and advanced and generalized linear models. Data visualization, such as histograms and various plots, presents information in visual forms that provide attractive and powerful methods of data exploration. Figure 1.3 shows the three foundations of data mining under this narrow definition.

    FIGURE 1.3 A Narrow Definition of Data Mining: Three Foundations.
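The basic statistical toolkit named above can be illustrated in a few lines. The survey numbers here are invented, and the Pearson correlation is computed directly from its definition:

```python
import statistics
from collections import Counter

# Hypothetical survey data: years of education and income (in $1,000s).
education = [12, 16, 12, 18, 14, 16, 20, 12]
income = [35, 60, 40, 80, 45, 65, 95, 30]

# Descriptive statistics
mean_income = statistics.mean(income)   # 56.25
sd_income = statistics.stdev(income)

# Pearson correlation between the two variables, from its definition
me, mi = statistics.mean(education), mean_income
cov = sum((e - me) * (i - mi) for e, i in zip(education, income))
r = cov / (sum((e - me) ** 2 for e in education) ** 0.5
           * sum((i - mi) ** 2 for i in income) ** 0.5)

# Frequency table of education levels
freq = Counter(education)

print(mean_income, round(r, 3), dict(freq))
```

Even this tiny example shows the statistical foundation at work: a summary statistic, a measure of systematic relationship between two variables, and a frequency table.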

AI, another foundation of data mining techniques, contributes information processing techniques based on a heuristic model of human reasoning. Machine learning represents an important approach in data mining that trains computers to recognize patterns in data. An artificial neural network (ANN) consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs are modeled after the way human brains process information, and they learn by example, adjusting the synaptic connections that exist between the neurons.
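A minimal sketch of this learning-by-example idea is a single artificial neuron trained with the classic perceptron rule to reproduce the logical AND function. This is an illustration only, not the full networks treated in chapter 10:

```python
# A single artificial neuron learning logical AND by repeatedly
# adjusting its connection weights (the perceptron learning rule).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, bias, rate = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                        # training epochs
    for (x1, x2), target in data:
        output = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        error = target - output            # learn from each mistake
        w[0] += rate * error * x1
        w[1] += rate * error * x2
        bias += rate * error

print([1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
       for (x1, x2), _ in data])           # [0, 0, 0, 1]
```

The "synaptic" weights start at zero and settle into values that reproduce the target pattern, which is the essence of learning by example.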

The last foundation of data mining is database systems, which provide the support platform for information processing and mining. Increases in computing power and advances in computer science have made it possible to store, access, and retrieve huge amounts of data, enabling new methods for revealing hidden patterns. These advances have made the discovery of new ideas and theories possible.

The second, and broader, definition of data mining conceptualizes it in a way similar to KDD (Gorunescu 2011). According to this definition, data mining has several components: (1) use of a huge database; (2) computational techniques; (3) automatic or semiautomatic search; and (4) extraction of implicit, previously unknown, and potentially useful patterns and relationships hidden in the data. The information data scientists expect to extract is of two types: descriptive and predictive (Larose and Larose 2016). Descriptive objectives are achieved by identifying relations among variables that describe the data, yielding patterns that can be easily understood. Predictive objectives are achieved by using some of the variables to predict one or more outcome variables, making it possible to accurately estimate future outcomes based on existing data (Larose and Larose 2016). Figure 1.4 shows the broad definition of data mining.

    FIGURE 1.4 A Broad Definition of Data Mining.
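The descriptive and predictive objectives can be contrasted on the same toy data: fitting a least-squares line both describes the relation between two variables and permits prediction for unseen values. The numbers here are invented and exactly linear for clarity:

```python
# Toy data: hours of weekly news reading (x) and a civic-knowledge score (y).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]           # exactly y = 2x + 1, for clarity

# Ordinary least-squares fit, computed from the closed-form formulas.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

print(slope, intercept)        # descriptive: summarizes the relation (2.0, 1.0)
print(slope * 6 + intercept)   # predictive: estimate for unseen x = 6 -> 13.0
```

The same fitted model serves both objectives: the coefficients describe the data at hand, and applying them to a new value makes a prediction.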

    This book uses the terms knowledge discovery and data mining interchangeably, according to the broadest conceptualization of data mining. Knowledge discovery and data mining in the social sciences constitute a research process that is guided by social science theories. Social scientists with deep domain knowledge work alongside data miners to select appropriate data, process the data, and choose suitable data mining technologies to conduct visualization, analysis, and mining of data to discover valid, novel, potentially useful, and ultimately understandable patterns. These new patterns are then consolidated with existing theories to develop new knowledge. Knowledge discovery and data mining in the social sciences are also important components of computational social science.

    WHAT IS COMPUTATIONAL SOCIAL SCIENCE IN THE ERA OF BIG DATA?

Computational social science (CSS) is a new interdisciplinary area of research at the confluence of information technology, big data, social computing, and the social sciences. The concept first gained recognition in 2009, when Lazer and colleagues (2009) published Computational Social Science in the journal Science. Email, mobile devices, credit cards, online invoices, medical records, and social media have recorded an enormous amount of long-term, interactive, and large-scale data on human interactions. CSS is based on the collection and analysis of big data and the use of digital tools and methods such as social computing, social modeling, social simulation, network analysis, online experiments, and artificial intelligence to study human behaviors, collective interactions, and complex organizations (Watts 2013). Only computational social science can provide us with the unprecedented ability to analyze the breadth and depth of vast amounts of data, affording us a new approach to understanding individual behaviors, group interactions, social structures, and societal transformations.

Scholars have formulated a variety of conceptualizations of CSS. One version argues that CSS has two fundamental components: substantive and instrumental (Cioffi-Revilla 2010). The substantive, or theoretical, dimension entails complex systems and theories of computer programming. The instrumental dimension includes tools for data processing, mining, and analysis, such as automatic information retrieval, social network analysis, socio-GIS, complex modeling, and computational simulation.

    Another conceptualization postulates that CSS has four important characteristics. First, it uses data from natural samples that document actual human behaviors (unlike the more artificial data collected from experiments and surveys). Second, the data are big and complex. Third, patterns of individual behavior and social structure are extracted using complex computations based on cloud computing with big databases and data mining approaches. And fourth, scientists use theoretical ideas to guide data mining of big data (Shah et al. 2015).

Others believe that CSS should be an interdisciplinary area at the confluence of domain knowledge, data management, data analysis, and transdisciplinary collaboration and coordination among scholars with different disciplinary training (Mason, Vaughan, and Wallach 2014). Social scientists provide insights on the research background and questions and decide on data sources and methods of collection, while statisticians and computer scientists develop appropriate mathematical models and data mining methods and supply the computational knowledge and skills needed to keep a project progressing smoothly.

    Methods of computational social science consist primarily of social computing, online experiments, and computer simulations (Conte 2016). Social computing uses information processing technology and computational methods to conduct data mining and analysis on big data to reveal hidden patterns of collective and individual behaviors. Online experiments as a new research method use the internet as a laboratory to break free of the confines of conventional experimental approaches and use the online world as a natural setting for experiments that transcend time and space (Bond et al. 2012; Kramer, Guillory, and Hancock 2014). Computer simulations use mathematical modeling and simulation software to set and adjust program parameters to simulate social phenomena and detect patterns of social behaviors (Bankes 2002; Gilbert et al. 2005; Epstein 2006). Both online experiments and computer simulations emphasize theory testing and development.
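A computer simulation in this spirit can be surprisingly small. The sketch below is a toy one-dimensional variant of Schelling's segregation model, offered only as an illustration of a parameterized social simulation; the grid size, tolerance threshold, and step count are arbitrary settings:

```python
import random

random.seed(42)  # fixed seed so this toy run is reproducible

# A tiny one-dimensional Schelling-style segregation simulation: agents of
# two types move to a random empty cell when fewer than half of their
# occupied neighboring cells hold an agent of the same type.
grid = [random.choice(["A", "B", None]) for _ in range(60)]

def unhappy(i):
    agent = grid[i]
    if agent is None:
        return False
    neighbors = [grid[j] for j in (i - 1, i + 1)
                 if 0 <= j < len(grid) and grid[j]]
    return bool(neighbors) and sum(n == agent for n in neighbors) / len(neighbors) < 0.5

for _ in range(200):  # simulation steps
    movers = [i for i in range(len(grid)) if unhappy(i)]
    empties = [i for i in range(len(grid)) if grid[i] is None]
    if not movers or not empties:
        break
    i, j = random.choice(movers), random.choice(empties)
    grid[i], grid[j] = None, grid[i]  # the unhappy agent relocates

print(sum(unhappy(i) for i in range(len(grid))))  # unhappy agents remaining
```

Adjusting the tolerance threshold or grid size and rerunning is exactly the "set and adjust program parameters" workflow the text describes, applied here to detect how local preferences aggregate into global patterns.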

    As figure 1.5 shows, in CSS researchers operate under the guidance of social scientific theories, apply computational social science methodology to data (usually big data) from natural samples, detect hidden patterns to enrich social science empirical evidence, and contribute to theory discovery.

    FIGURE 1.5 Computational Social Science.

    OUTLINE OF THE BOOK

The book has six parts. Part I, comprising this chapter and chapter 2, explains the concepts and development of data mining and knowledge discovery and the role they play in social science research. Chapter 2 describes the process of scientific research as theory-driven confirmatory hypothesis testing. It also explains the impact of the new data mining and knowledge discovery approaches on this process.

    Part II deals with data preprocessing. Chapter 3 elaborates issues such as privacy, security, data collection, data cleaning, missing data, and data transformation. Chapter 4 provides information on data visualization that includes graphic summaries of single, bivariate, and complex data.

    Part III focuses on model assessment. Chapter 5 explains important methods and measures of model selection and model assessment, such as cross-validation and bootstrapping. It provides justifications as well as ways to use these methods to evaluate models. This chapter is more challenging than the previous chapters. Because the content is difficult for the average undergraduate student, I recommend that instructors selectively introduce sections of this chapter to their students. Later chapters on specific approaches also introduce some of these model assessment approaches. It may be most effective to introduce these specific methods of model assessment after students acquire knowledge of these data mining techniques.

    Part IV is devoted to the methods of unsupervised learning: clustering and association. Chapter 6 explains the different types of cluster analysis, similarity measures, hierarchical clustering, and cluster validity. Chapter 7 concentrates on the topic of associations, including association rules, the usefulness of association rules, and the application of association rules in social research.

    Part V continues with the topic of machine learning: supervised learning that includes generalized regression, classification and decision trees, and neural networks. Chapter 8 focuses on models of parameter learning that include linear regression and logistic regression. Chapter 9 covers inductive machine learning, decision trees, and types of algorithms in classification and decision trees. Chapter 10 focuses on neural networks, including the structure of neural networks, learning rules, and
