Co-Clustering: Models, Algorithms and Applications

Ebook, 279 pages, 2 hours


ISBN: 9781118649503

By Gérard Govaert and Mohamed Nadif


About this ebook

Cluster and co-cluster analyses are important tools in many scientific fields. The introduction of this book surveys the state of the art in co-clustering, covering both well-established and more recent methods. The authors mainly deal with two-mode partitioning from several perspectives, paying particular attention to the probabilistic approach.
Chapter 1 concerns clustering in general and model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss mixtures adapted to different types of data. The algorithms used are described, and their connections with several classical methods are presented and discussed. This chapter lays the groundwork for tackling co-clustering under the mixture approach. Chapter 2 is devoted to the latent block model, proposed in the context of the mixture approach. The authors discuss this model in detail and show its relevance to co-clustering. Various algorithms are presented in a general context. Chapter 3 focuses on binary and categorical data. It presents the appropriate latent block mixture models in detail. Variants of these models and algorithms are presented and illustrated with examples. Chapter 4 focuses on contingency data. Mutual information, phi-squared and model-based co-clustering are studied. Models, algorithms and connections among the different approaches are described and illustrated. Chapter 5 presents the case of continuous data, extending the approaches used in the previous chapters to this setting.
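To make the contingency-table criteria concrete, here is a minimal sketch, assuming NumPy and a toy table invented for the example. Given a row partition and a column partition, the table is aggregated into a smaller block table, and its mutual information and phi-squared measure how much of the association between rows and columns the co-clustering retains. Merging rows or columns can never increase either quantity, so a good co-clustering is one that keeps them close to the values of the original table. The function names (`block_table`, `mutual_information`, `phi_squared`) are hypothetical illustrations, not code from the book.

```python
import numpy as np

def block_table(x, z, w, g, m):
    """Aggregate an n x d contingency table into a g x m block table.

    z holds row-cluster labels in 0..g-1, w holds column-cluster labels in 0..m-1.
    """
    Z = np.eye(g)[z]  # n x g row-partition indicator matrix
    W = np.eye(m)[w]  # d x m column-partition indicator matrix
    return Z.T @ x @ W

def mutual_information(t):
    """Mutual information (natural log) of the joint distribution t / t.sum()."""
    p = t / t.sum()
    expected = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0  # cells with zero probability contribute nothing
    return float(np.sum(p[nz] * np.log(p[nz] / expected[nz])))

def phi_squared(t):
    """Phi-squared: sum of (p_kl - p_k. p_.l)^2 / (p_k. p_.l); assumes no empty margin."""
    p = t / t.sum()
    expected = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    return float(np.sum((p - expected) ** 2 / expected))

# Toy 4 x 4 table with two obvious row groups and two obvious column groups.
x = np.array([[10, 8, 0, 1],
              [9, 11, 1, 0],
              [0, 1, 12, 9],
              [1, 0, 10, 11]], dtype=float)

# A partition matching the structure, and one that cuts across it.
t_good = block_table(x, np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1]), 2, 2)  # [[38, 2], [2, 42]]
t_bad = block_table(x, np.array([0, 1, 0, 1]), np.array([0, 1, 0, 1]), 2, 2)   # [[22, 19], [21, 22]]

print(phi_squared(x), phi_squared(t_good), phi_squared(t_bad))
print(mutual_information(x), mutual_information(t_good), mutual_information(t_bad))
```

The partition that follows the block structure keeps most of the association, while the one that cuts across it yields a nearly independent block table.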

Contents

1. Cluster Analysis.
2. Model-Based Co-Clustering.
3. Co-Clustering of Binary and Categorical Data.
4. Co-Clustering of Contingency Tables.
5. Co-Clustering of Continuous Data.

About the Authors

Gérard Govaert is Professor at the University of Technology of Compiègne, France. He is also a member of the CNRS laboratory Heudiasyc (Heuristics and Diagnosis of Complex Systems). His research interests include latent structure modeling, model selection, model-based cluster analysis, block clustering and statistical pattern recognition. He is one of the authors of the MIXMOD (MIXture MODelling) software.
Mohamed Nadif is Professor at the University of Paris-Descartes, France, where he is a member of LIPADE (Paris Descartes computer science laboratory) in the Mathematics and Computer Science department. His research interests include machine learning, data mining, model-based cluster analysis, co-clustering, factorization and data analysis.

Cluster analysis is an important tool in a variety of scientific areas. Chapter 1 briefly presents the state of the art in both well-established and more recent methods, discussing hierarchical, partitioning and fuzzy approaches, among others. The authors review the difficulties these classical methods face with high dimensionality, sparsity and scalability. Chapter 2 discusses the benefits of co-clustering, presents different approaches and defines a co-cluster. The authors focus on co-clustering as the simultaneous clustering of rows and columns and discuss the cases of binary, continuous and co-occurrence data. The criteria and algorithms are described and illustrated on simulated and real data. Chapter 3 approaches co-clustering from a model-based perspective. A latent block model is defined for different kinds of data. Parameter estimation and co-clustering are tackled under two approaches: maximum likelihood and classification maximum likelihood. Hard and soft algorithms are described and applied to simulated and real data. Chapter 4 considers co-clustering as a matrix approximation problem. The tri-factorization approach is considered, and algorithms based on update rules are described. Links with numerical and probabi…
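The description does not spell out the tri-factorization update rules, so the following is only a generic sketch under stated assumptions: a nonnegative matrix X is approximated as F S Gᵀ, with F holding row-cluster memberships, G column-cluster memberships and S a small block matrix, and the squared Frobenius error is reduced by multiplicative updates in the style of Lee and Seung's NMF rules. The function name `nmtf`, the toy data and the argmax assignment step are hypothetical choices for illustration; the book's own variants (for example with orthogonality constraints) may differ.

```python
import numpy as np

def nmtf(X, k_rows, k_cols, n_iter=300, eps=1e-10, seed=0):
    """Fit X ~ F @ S @ G.T with F, S, G >= 0 using multiplicative updates
    intended not to increase ||X - F S G^T||_F^2 (a generic NMTF sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    F = rng.random((n, k_rows)) + eps       # row-cluster memberships
    S = rng.random((k_rows, k_cols)) + eps  # block (core) matrix
    G = rng.random((d, k_cols)) + eps       # column-cluster memberships
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled by the ratio of the negative and positive
        # parts of its gradient, which keeps all entries nonnegative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(float(np.linalg.norm(X - F @ S @ G.T) ** 2))
    return F, S, G, errors

# Toy data: a 20 x 16 matrix with a 2 x 2 block pattern plus uniform noise.
rng = np.random.default_rng(1)
X = np.kron(np.array([[5.0, 0.5], [0.5, 5.0]]), np.ones((10, 8))) + rng.random((20, 16))

F, S, G, errors = nmtf(X, 2, 2)
row_labels = F.argmax(axis=1)  # hard row partition from the largest membership
col_labels = G.argmax(axis=1)  # hard column partition from the largest membership
print(errors[0], errors[-1])
print(row_labels, col_labels)
```

Reading hard partitions off F and G by argmax is one simple convention; soft memberships can also be kept, which mirrors the hard versus soft distinction the description draws for the model-based algorithms.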


Programming

Language: English

Publisher: Wiley

Release date: Dec 11, 2013


Apr 25, 2022
6 min read
'The Cloud' and Other Dangerous Metaphors
The Atlantic
Article
'The Cloud' and Other Dangerous Metaphors
Jan 20, 2015
4 min read
Understanding a Material List
Woodworker's Journal
Article
Understanding a Material List
Dec 29, 2020
A material list, which can also be known as a cutting list, bill of materials or schedule of materials, is simply a listing of all the parts that will be required to construct a project. In today’s terms, it is a spreadsheet that allows a woodworker
1 min read
Facebook Neural Nets Solve Differential Equations
Popular Mechanics South Africa
Article
Facebook Neural Nets Solve Differential Equations
Feb 22, 2021
IF UNIVERSITY students could obtain a copy of Facebook’s latest neural network – a series of algorithms that resemble the human brain – they could cheat all the way through Calculus 300. At the least, they could solve the following differential equat
3 min read

Related categories

Skip carousel

Reviews for Co-Clustering

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Co-Clustering - Gérard Govaert

Introduction

Many of the data sets encountered in statistics are two-dimensional and can be represented by a rectangular numeric table, that is, an n × d data matrix x = (xij) defined on two sets I and J. Such data are sometimes referred to as two-way or two-mode data. For instance, I may be a set of individuals (observations, cases, objects or persons) and J may be a set of variables (measurements, attributes or features). The data matrix then collects the values taken by all the variables for each individual. These data may be represented either as an individuals–variables table, as in the case of continuous variables, or as a frequency table or contingency table, as in the case of categorical variables. In what follows, we examine the types of data on which co-clustering can be performed.

I.1. Types and representation of data

The type of a variable is determined by the set of possible values that the variable can take. In the following, we briefly review each type.

I.1.1. Binary data

Binary variables are widely used in statistics. Examples include presence–absence data in ecology, black and white pixels in image processing and the data obtained when recoding a table of qualitative variables. Data take the form of a sample x = (x1,..., xn) where xi is a vector (xi1,..., xid) of values xij belonging to the set {0,1}. For example, the data might correspond to a set of 10 squares of woodland in which the presence (1) or absence (0) of two types of butterflies P1 and P2 was observed. Figure I.1 illustrates three alternative ways of presenting these data.

Figure I.1. Example of binary data

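Binary data have been treated in clustering using a large number of distances, most of which are defined from the values n11, n10, n01 and n00 of the 2 × 2 table crossing the two binary vectors being compared. Here n11 is the number of variables j for which both vectors take the value 1, n10 is the number for which the first vector takes 1 and the second 0, and so on. For example, the distances between two binary vectors i and i′ derived from Jaccard's index and from the agreement coefficient can be written, respectively, as

$$d(i,i') = 1 - \frac{n_{11}}{n_{11}+n_{10}+n_{01}} \quad\text{and}\quad d(i,i') = 1 - \frac{n_{11}+n_{00}}{n_{11}+n_{10}+n_{01}+n_{00}}.$$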
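As a small illustration of these two distances, the sketch below computes n11, n10, n01 and n00 for two 0/1 vectors and evaluates both formulas. The function names, the example vectors and the convention that two all-zero vectors are at Jaccard distance 0 are our own assumptions, not taken from the text.

```python
def binary_counts(u, v):
    """Return (n11, n10, n01, n00) for two equal-length 0/1 vectors u and v."""
    pairs = list(zip(u, v))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    return n11, n10, n01, n00


def jaccard_distance(u, v):
    """1 - n11 / (n11 + n10 + n01); joint absences (n00) are ignored."""
    n11, n10, n01, _ = binary_counts(u, v)
    denom = n11 + n10 + n01
    # Assumed convention: two all-zero vectors are at distance 0.
    return 0.0 if denom == 0 else 1 - n11 / denom


def agreement_distance(u, v):
    """1 - (n11 + n00) / d; joint presences and joint absences both count as agreement."""
    n11, n10, n01, n00 = binary_counts(u, v)
    return 1 - (n11 + n00) / (n11 + n10 + n01 + n00)


# Hypothetical presence/absence profiles of two woodland squares over four species.
u = [1, 1, 0, 0]
v = [1, 0, 1, 0]
print(jaccard_distance(u, v))    # 1 - 1/3, i.e. about 0.667
print(agreement_distance(u, v))  # 1 - 2/4 = 0.5
```

The only difference between the two measures is whether joint absences (n00) count as agreement, which matters for sparse presence–absence data.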

I.1.2. Categorical data

Categorical variables, sometimes known as qualitative variables or factors, generalize binary data to situations where there are more than two possible values. Here, each variable may take an arbitrary finite set of values, usually referred to as categories, modalities or levels. Like binary data, categorical data may be represented in different ways: as a table of individuals–variables of dimension (n, d), as a frequency vector for the different possible states, as a contingency table with d dimensions linking the categories, or as a complete disjunctive table in which categories are represented by their indicators. In this last form of representation, which we will use here, the data are composed of a sample $x = (x_1, \ldots, x_n)$ where $x_i = (x_i^{jh};\ j = 1, \ldots, d;\ h = 1, \ldots, m_j)$ with

$$x_i^{jh} = \begin{cases} 1 & \text{if individual } i \text{ takes category } h \text{ of variable } j,\\ 0 & \text{otherwise,} \end{cases}$$

where $m_j$ denotes the number of modalities of the variable j. Figure I.2 shows a data matrix consisting of eight individuals described by three categorical variables A, B and C, together with its associated complete disjunctive table.

Figure I.2. Example of categorical data (left) and its associated complete disjunctive table (right)
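The construction of a complete disjunctive table can be sketched as follows: each variable j contributes m_j indicator columns, and each row has exactly one 1 per variable, so every row sums to d. The data below are hypothetical (they are not the data of Figure I.2), and ordering the categories alphabetically is an arbitrary choice.

```python
def complete_disjunctive_table(rows):
    """Return (categories, table): one 0/1 indicator column per (variable, category) pair."""
    d = len(rows[0])
    # Categories of each variable j, sorted; the ordering is an arbitrary choice.
    categories = [sorted({row[j] for row in rows}) for j in range(d)]
    table = []
    for row in rows:
        indicators = []
        for j, cats in enumerate(categories):
            # x_i^{jh} = 1 if individual i takes category h of variable j, else 0.
            indicators.extend(1 if row[j] == h else 0 for h in cats)
        table.append(indicators)
    return categories, table


# Hypothetical sample: three individuals described by three categorical variables.
data = [["a", "x", "u"], ["b", "y", "u"], ["a", "z", "v"]]
cats, table = complete_disjunctive_table(data)
print(cats)   # [['a', 'b'], ['x', 'y', 'z'], ['u', 'v']]
print(table)  # [[1, 0, 1, 0, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 0, 1]]
```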

I.1.3. Continuous data

Continuous data are undoubtedly the most common type of data and can be found in all areas. The data take the form of a relational table in which the d columns are continuous variables and $x_{ij} \in \mathbb{R}$. Values can be positive or negative, with different units and variabilities. Since the measurement unit can affect the results of data analysis methods, a normalization or transformation is often necessary. For instance, a variable can be normalized by scaling its values so that they lie within a specified range, such as [0, 1]. This can be achieved by the min-max normalization defined by

$$x'_{ij} = \frac{x_{ij} - \min_j}{\max_j - \min_j},$$

where $\min_j$ and $\max_j$ are, respectively, the lowest and the highest values taken by the variable j. The logarithmic transformation is also commonly used to preprocess data. These two transformations are frequently applied to microarray data sets in order to overcome problems of measurement inaccuracy or to provide values that are more easily interpretable. Other transformation techniques are also commonly used, for instance the z-score normalization defined by

$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},$$

where $\bar{x}_j$ and $s_j$ are, respectively, the mean and the standard deviation of the variable j. To reduce the effect of outliers, a variant of this z-score normalization replaces $s_j$ by the mean absolute deviation of j, $s'_j = \frac{1}{n}\sum_{i=1}^{n} |x_{ij} - \bar{x}_j|$. Many other normalizations exist. The user should pay special attention to this step, as it is essential for obtaining meaningful results.
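The three normalizations above can be sketched column by column with the standard library. Two choices here are assumptions: the standard deviation uses divisor n (population form) rather than n − 1, and the mean absolute deviation is taken about the mean. Constant columns are not handled.

```python
import statistics


def min_max(col):
    """(x - min_j) / (max_j - min_j); assumes the column is not constant."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def z_score(col):
    """(x - mean_j) / s_j with the population standard deviation (divisor n)."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(v - mean) / sd for v in col]


def robust_z_score(col):
    """(x - mean_j) / s'_j, where s'_j is the mean absolute deviation about the mean."""
    mean = statistics.fmean(col)
    mad = sum(abs(v - mean) for v in col) / len(col)
    return [(v - mean) / mad for v in col]


col = [1.0, 2.0, 3.0, 10.0]  # hypothetical values of one variable j with an outlier
print(min_max(col))          # [0, 1/9, 2/9, 1]
print(robust_z_score(col))   # mean 4, MAD 3 -> [-1, -2/3, -1/3, 2]
```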

Moreover, most authors distinguish two types of analysis: Tryon and Bailey [TRY 70] suggest O-analysis for the study of objects and V-analysis for the study of variables. According to them, the earliest works concern the analysis of objects, namely classification (taxonomy), while the first work on the analysis of variables, due to Pearson and Spearman, is factor analysis. In other domains, these two types of analysis are called the Q-technique and the R-technique, respectively.

In the data described so far, the two sets (individuals and variables) play strongly asymmetric roles. In some situations, however, the two sets play similar roles and can be interchanged. The contingency table, studied in the next section, is the most common example of this type of data.

I.1.4. Contingency table

There are many situations in which we wish to study the association between two categorical variables. A two-way contingency table summarizes such a pair of variables, and this definition extends easily to more than two categorical variables. In a table of this kind, the cells formed by cross-tabulating two categorical variables, I having n categories and J having d categories, contain the frequency counts of the individuals belonging to each cell. Contingency tables of this sort arise in many different applications. An important example is information retrieval and document clustering, where I may correspond to a collection of documents and J to a set of words, and each frequency is the number of occurrences of a word in a document. The definition of a contingency table can also be extended to tables in which every entry expresses a quantity of the same matter, so that all of the entries can be meaningfully summed to give the total amount of matter in the data. Examples of such data are trade tables showing the money transferred from country i to country j during a specified period. We now specify the notation that will be used to study contingency tables.

Let x = (xij, i = 1, …, n; j = 1, …, d) be a two-way contingency table associated with two categorical random variables that take values in the sets I = {1, …, n} and J = {1, …, d}. The entries xij are co-occurrences of row and column categories, each counting the number of entities that fall simultaneously into the corresponding row and column categories. The sums of the frequencies of the row and column categories, usually called marginals, are denoted by $x_{i.}$ and $x_{.j}$ and defined by $x_{i.} = \sum_{j=1}^{d} x_{ij}$ and $x_{.j} = \sum_{i=1}^{n} x_{ij}$. Here, we use the usual dot notation to express summation over the index replaced by a dot. Let $P_{IJ} = (p_{ij})$ denote the sample joint probability distribution; it is the n × d matrix defined by $p_{ij} = x_{ij}/N$, where $N = x_{..} = \sum_{i,j} x_{ij}$. The sample marginal probability distributions are defined by $p_{i.} = \sum_{j} p_{ij} = x_{i.}/N$ and $p_{.j} = \sum_{i} p_{ij} = x_{.j}/N$. The entries $p_{ij}$ of the sample joint distribution can be considered as estimates of the probabilities $\xi_{ij}$ that the two categorical random variables fall in the cell in row i and column j. Table I.1 presents the form of the contingency table and of the corresponding sample joint distribution.

Table I.1. Contingency table and sample joint distribution
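The notation above translates directly into code. This sketch computes the joint distribution p_ij = x_ij / N and the marginal distributions p_i. and p_.j from a small hypothetical contingency table; the function name is ours.

```python
def joint_and_marginals(x):
    """Return (p, p_row, p_col) for a contingency table given as a list of rows."""
    n, d = len(x), len(x[0])
    row_tot = [sum(row) for row in x]                             # x_{i.}
    col_tot = [sum(x[i][j] for i in range(n)) for j in range(d)]  # x_{.j}
    N = sum(row_tot)                                              # x_{..}
    p = [[x[i][j] / N for j in range(d)] for i in range(n)]       # p_{ij}
    p_row = [t / N for t in row_tot]                              # p_{i.}
    p_col = [t / N for t in col_tot]                              # p_{.j}
    return p, p_row, p_col


# Hypothetical 2 x 3 table of counts, N = 50.
x = [[10, 0, 5],
     [5, 20, 10]]
p, p_row, p_col = joint_and_marginals(x)
print(p_row)  # [0.3, 0.7]
print(p_col)  # [0.3, 0.4, 0.3]
```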

Sometimes, specifically in document clustering where the rows are documents and the columns are words, some transformations of the data are necessary. For instance, the co-occurrences can be replaced by tf-idf statistics [JON 72]. Several variants have been proposed and are commonly used in information retrieval and text mining.
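Since several tf-idf variants exist, the sketch below uses just one common form, assumed here for illustration: the raw count x_ij multiplied by log(n / df_j), where df_j is the number of documents containing word j. It is not necessarily the variant used later in the book.

```python
import math


def tf_idf(x):
    """Weight x_ij by log(n / df_j); one common tf-idf variant among several."""
    n, d = len(x), len(x[0])
    # df_j: number of documents (rows) in which word j occurs at least once.
    df = [sum(1 for i in range(n) if x[i][j] > 0) for j in range(d)]
    return [[x[i][j] * math.log(n / df[j]) if df[j] > 0 else 0.0
             for j in range(d)] for i in range(n)]


# Hypothetical document x word counts: word 1 occurs in every document.
x = [[2, 1, 0],
     [0, 1, 3]]
w = tf_idf(x)
print(w[0][1], w[1][1])  # 0.0 0.0: a word present in every document gets zero weight
```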

I.1.5. Data representations

Different representations can be associated with the types of data described in the previous section.

Geometrical representation: for continuous data, a classical geometrical representation consists of regarding the data as n points in d dimensions. Dually, a second and less familiar geometrical representation consists of regarding the data as d points in n dimensions. Classical methods such as principal component analysis and the k-means algorithm use such representations extensively. Correspondence analysis [BEN 73b] applies similar geometrical representations to the contingency table.

Bipartite graph: in all situations, the data matrix can be associated with a bipartite graph whose vertices are the elements of the union I ∪ J of the sets I and J. For an individuals × variables table or a contingency table, the edges of the graph are the pairs {(i, j), i ∈ I, j ∈ J}, each weighted by the corresponding entry xij of the data matrix. For binary data, the edges of the graph are the pairs (i, j) such that xij = 1 (see, for instance, Figure I.3). This representation is frequently used in the graph community, for example for Web 2.0 tagging data and social networks.

Figure I.3. Binary data and its associated bipartite graph
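The two edge rules just described can be sketched as edge lists. Tagging vertices as ("I", i) and ("J", j) is our own device to keep the two vertex sets disjoint, since row and column indices overlap as integers.

```python
def bipartite_edges_binary(x):
    """Edges (i, j) of the bipartite graph for binary data: pairs with x_ij = 1."""
    return [(("I", i), ("J", j))
            for i, row in enumerate(x) for j, v in enumerate(row) if v == 1]


def bipartite_edges_weighted(x):
    """All pairs (i, j) of I x J, each weighted by x_ij (tables of counts or measurements)."""
    return [(("I", i), ("J", j), v)
            for i, row in enumerate(x) for j, v in enumerate(row)]


# Hypothetical 2 x 3 binary matrix.
x = [[1, 0, 1],
     [0, 1, 0]]
print(bipartite_edges_binary(x))
# [(('I', 0), ('J', 0)), (('I', 0), ('J', 2)), (('I', 1), ('J', 1))]
```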

The methods we are interested in next are clustering methods and, specifically, the simultaneous clustering of I and J. To this end, we will review the motivation of simultaneous analysis and then introduce co-clustering.

I.2. Simultaneous analysis

I.2.1. Data analysis

Given a data matrix, the objective of data analysis can be viewed as the simultaneous analysis of the two sets I and J in order to identify underlying structures that may exist between them. Different approaches, such as exploratory analysis (graphical representations or numerical summaries) or dimension reduction, have been used. Principal component analysis and correspondence analysis are examples of such methods. The latter, due to Benzécri [BEN 73b], is one of the best-known methods that analyze the sets I and J simultaneously. The data table must be a contingency table or at least have similar properties. The properties of this approach, especially the transition formulas, allow results to be transferred between the sets I and J. These properties yield a set of barycentric relations, which justify a simultaneous representation of I and J and allow us to visualize at once the proximities among the elements of I, among the elements of J, and between the elements of I and J. Finally, let us mention the unfolding method [COO 50], whose objective is to represent rank preference data on a line or a plane. Each individual is represented by an ideal point such that the order among the variables, defined by the distances between the ideal point and the various variables, is as close as possible to the order given in the initial data.

Other methods work directly on the data matrix. For instance, seriation methods seek a permutation of the rows together with a permutation of the columns, leading to a reordered data matrix with a maximum density of high cell values along the diagonal and low values in the upper and lower parts. Such approaches have been used, for instance, in archaeology, phytosociology, geography and production management. Caraux [CAR 84] proposed a criterion based on an objective function with quadratic costs, and Bertin [BER 80] proposed a manual heuristic based on visual densification. Factorial methods such as correspondence analysis can also be used. Note that when correspondence analysis gives rise to a U-shaped effect (Guttman effect) on the first two axes of the factorial representation, there exists a latent order on the rows and the columns. Ordering both the rows and the columns by their projections on the first axis then reshapes the matrix into a diagonal band.
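One classical way to obtain the first correspondence analysis axis without a linear algebra library is reciprocal averaging. Each row score is set to the weighted average of the column scores, and vice versa, while the trivial constant solution is removed at every step. The sketch below uses it for seriation as described above. It is our illustration rather than a method given in this section: the iteration count and the starting scores are arbitrary assumptions, and the matrix is assumed to have no all-zero rows or columns.

```python
import math


def reciprocal_averaging_order(x, n_iter=200):
    """Return (row_order, col_order) sorted by first non-trivial CA axis scores."""
    n, d = len(x), len(x[0])
    row_tot = [sum(row) for row in x]
    col_tot = [sum(x[i][j] for i in range(n)) for j in range(d)]
    total = sum(row_tot)
    col_scores = [float(j) for j in range(d)]  # arbitrary non-constant start
    for _ in range(n_iter):
        # Row score = weighted average of column scores, and vice versa.
        row_scores = [sum(x[i][j] * col_scores[j] for j in range(d)) / row_tot[i]
                      for i in range(n)]
        col_scores = [sum(x[i][j] * row_scores[i] for i in range(n)) / col_tot[j]
                      for j in range(d)]
        # Remove the trivial constant solution (weighted centering), then rescale.
        mean = sum(w * s for w, s in zip(col_tot, col_scores)) / total
        col_scores = [s - mean for s in col_scores]
        norm = math.sqrt(sum(w * s * s for w, s in zip(col_tot, col_scores)) / total)
        col_scores = [s / norm for s in col_scores]
    row_scores = [sum(x[i][j] * col_scores[j] for j in range(d)) / row_tot[i]
                  for i in range(n)]
    row_order = sorted(range(n), key=lambda i: row_scores[i])
    col_order = sorted(range(d), key=lambda j: col_scores[j])
    return row_order, col_order


# Hypothetical band matrix, scrambled by permuting its rows and columns.
band = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
scrambled = [[band[r][c] for c in [1, 3, 0, 2]] for r in [2, 0, 3, 1]]
rows, cols = reciprocal_averaging_order(scrambled)
print([[scrambled[i][j] for j in cols] for i in rows])  # the diagonal band is recovered
```

The axis is defined only up to sign, so the recovered order may be reversed. For this particular symmetric band the reversed order gives the same matrix.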

This book is devoted to another group of methods for the simultaneous analysis of two sets, based on the notion of clustering. With a two-way or two-mode data set, clustering algorithms are often applied to just one mode of the data matrix, in either a hierarchical or a non-hierarchical way. Among the non-hierarchical methods, k-means clustering [FOR 65, MAC 67, HAR 75b] is one of the most popular. In contrast to this approach, a relatively new form of clustering analyzes the two sets simultaneously. These methods, variously called direct clustering, cross-clustering, simultaneous clustering, co-clustering, biclustering, two-way clustering, two-mode clustering or two-sided clustering, have developed considerably in recent years.
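To make the contrast with one-mode k-means concrete, the sketch below evaluates a pair of partitions: z for the rows and w for the columns. It summarizes the matrix by one mean per block (row cluster × column cluster) and measures the within-block sum of squares, a natural two-mode analogue of the k-means criterion. It is shown only for illustration. We do not claim it is the specific criterion developed later in the book, and the function names and data are hypothetical.

```python
def block_means(x, z, w, g, m):
    """Mean of each block (k, l) for row labels z in {0..g-1} and column labels w in {0..m-1}."""
    sums = [[0.0] * m for _ in range(g)]
    counts = [[0] * m for _ in range(g)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            sums[z[i]][w[j]] += v
            counts[z[i]][w[j]] += 1
    # Empty blocks get mean 0.0 by convention (assumption); they contribute nothing.
    return [[sums[k][l] / counts[k][l] if counts[k][l] else 0.0 for l in range(m)]
            for k in range(g)]


def within_block_ss(x, z, w, g, m):
    """Sum over cells of (x_ij - mean of block (z_i, w_j))^2."""
    mu = block_means(x, z, w, g, m)
    return sum((v - mu[z[i]][w[j]]) ** 2
               for i, row in enumerate(x) for j, v in enumerate(row))


x = [[5, 5, 0],
     [5, 5, 0],
     [0, 0, 3]]
print(within_block_ss(x, [0, 0, 1], [0, 0, 1], 2, 2))  # 0.0: blocks are homogeneous
print(within_block_ss(x, [0, 1, 1], [0, 0, 1], 2, 2))  # 29.5: a worse row partition
```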

I.2.2. Co-clustering

A large number of co-clustering algorithms have been proposed to date. One of the earliest and most cited biclustering formulations, known as block clustering, was proposed by Hartigan [HAR 72, HAR 75a]. He sought to organize the data table using structures that may, for example, be defined from clusterings of each of the two sets; this kind of method is sometimes known as direct clustering. Earlier works can also be cited. For instance, the problem was first described formally by Good [GOO 65], who proposed a technique for the simultaneous clustering of objects and variables. Fisher [FIS 69] posed the problem of simultaneously searching for clusterings of the row and column dimensions of a data matrix in a metric way. He defined a criterion for optimization but offered no method to solve the problem. Tryon and Bailey [TRY 70] first clustered the set of variables using the correlation matrix and then, using a distance measure across the clusters of variables, clustered the set of individuals. Dubin and Champoux [DUB 70] proposed a method that combines the variables into types and associates each individual with the types of variables, forming a classification of individuals. More often, the
