Ebook, 321 pages, 3 hours

Metaheuristics for Big Data

Name: Metaheuristics for Big Data
Authors: Clarisse Dhaenens and Laetitia Jourdan
ISBN: 9781119347606

By Clarisse Dhaenens and Laetitia Jourdan

About this ebook

Big Data is a new field whose many technological challenges must be understood before it can be used to its full potential. These challenges arise at every stage of working with Big Data, beginning with data generation and acquisition. The storage and management phase presents two critical challenges: infrastructure, for storage and transportation, and conceptual models. Finally, extracting meaning from Big Data requires complex analysis. Here the authors propose metaheuristics as a solution to these challenges: first, metaheuristics can deal with large problems; second, they are flexible and therefore easily adapted to different types of data and different contexts.

The use of metaheuristics to overcome some of these data mining challenges is introduced and justified in the first part of the book, alongside a specific protocol for the performance evaluation of algorithms. An introduction to metaheuristics follows. The second part of the book details a number of data mining tasks, including clustering, association rules, supervised classification and feature selection, before explaining how metaheuristics can be used to deal with them. This book is designed to be self-contained, so that readers can understand all of the concepts discussed within it, and to provide an overview of recent applications of metaheuristics to knowledge discovery problems in the context of Big Data.

Computers

Language: English

Publisher: Wiley

Release date: Aug 16, 2016

ISBN: 9781119347606

Is The Future sustainable?
Jun 8, 2023
8 min read
Prototype Paves Way For ‘Computer-on-a-chip’
Futurity
Article
Prototype Paves Way For ‘Computer-on-a-chip’
Feb 22, 2019
2 min read
Machine Learning in Business: Issues for Society
Rotman Management
Article
Machine Learning in Business: Issues for Society
Jan 1, 2020
11 min read

Related categories

Skip carousel

Reviews for Metaheuristics for Big Data

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Metaheuristics for Big Data - Clarisse Dhaenens

Cover

Title

Copyright

Acknowledgments

Introduction

1 Optimization and Big Data

1.1. Context of Big Data

1.2. Knowledge discovery in Big Data

1.3. Performance analysis of data mining algorithms

1.4. Conclusion

2 Metaheuristics – A Short Introduction

2.1. Introduction

2.2. Common concepts of metaheuristics

2.3. Single solution-based/local search methods acceptance approach

2.4. Population-based metaheuristics

2.5. Multi-objective metaheuristics

2.6. Conclusion

3 Metaheuristics and Parallel Optimization

3.1. Parallelism

3.2. Parallel metaheuristics

3.3. Infrastructure and technologies for parallel metaheuristics

3.4. Quality measures

3.5. Conclusion

4 Metaheuristics and Clustering

4.1. Task description

4.2. Big Data and clustering

4.3. Optimization model

4.4. Overview of methods

4.5. Validation

4.6. Conclusion

5 Metaheuristics and Association Rules

5.1. Task description and classical approaches

5.2. Optimization model

5.3. Overview of metaheuristics for the association rules mining problem

5.4. General table

5.5. Conclusion

6 Metaheuristics and (Supervised) Classification

6.1. Task description and standard approaches

6.2. Optimization model

6.3. Metaheuristics to build standard classifiers

6.4. Metaheuristics for classification rules

6.5. Conclusion

7 On the Use of Metaheuristics for Feature Selection in Classification

7.1. Task description

7.2. Optimization model

7.3. Overview of methods

7.4. Conclusion

8 Frameworks

8.1. Frameworks for designing metaheuristics

8.2. Framework for data mining

8.3. Framework for data mining with metaheuristics

8.4. Conclusion

Conclusion

Bibliography

Index

End User License Agreement

List of Tables

4 Metaheuristics and Clustering

Table 4.1. Most widely used objective functions and their category

Table 4.2. Summary table of some single-objective algorithms for hard clustering. C = centroid-based encoding, VL = variable length, FL = fixed length, B = binary encoding

Table 4.3. Summary table of the most famous multi-objective clustering methods

5 Metaheuristics and Association Rules

Table 5.1. Some quality measures for association rules discovery

Table 5.2. Summary table of some metaheuristics for association rules

6 Metaheuristics and (Supervised) Classification

Table 6.1. Confusion matrix

7 On the Use of Metaheuristics for Feature Selection in Classification

Table 7.1. Overview of fitness function for feature selection in classification

Table 7.2. Overview of evolutionary feature selection applications from [HAM 13]

8 Frameworks

Table 8.1. Some available frameworks

Table 8.2. A comparison of frameworks extracted from [PAR 12]. The average has been computed over 12 frameworks

Table 8.3. A comparison of frameworks for data mining

List of Illustrations

Introduction

Figure I.1. Main phases of a Big Data process

1 Optimization and Big Data

Figure 1.1. Evolution of Google requests for Big Data (Google source)

Figure 1.2. Overview of the KDD process

Figure 1.3. Overview of main tasks and approaches in data mining

Figure 1.4. Statistical test summary [JAC 13b]

2 Metaheuristics – A Short Introduction

Figure 2.1. Solving a problem from the class

Figure 2.2. Neighborhood operator for the TSP

Figure 2.3. Objective space and specific points of a bi-objective problem

3 Metaheuristics and Parallel Optimization

Figure 3.1. Parallel multi-start model: several single solution-based metaheuristics are launched in parallel

Figure 3.2. Move acceleration model: the solution is evaluated in parallel

Figure 3.3. Sub-linear, linear and super-linear speedup

4 Metaheuristics and Clustering

Figure 4.1. An example of dendrogram

Figure 4.2. Optimizing both objectives simultaneously [GAR 12]

Figure 4.3. Multi-objective clustering a Pareto set of solutions [GAR 12]

Figure 4.4. Binary encoding with a fixed number of clusters from [JOS 16]

Figure 4.5. Binary encoding for representative from [JOS 16]

Figure 4.6. Integer encoding: label-based representation from [JOS 16]

Figure 4.7. Integer encoding: graph-based representation from [JOS 16]

6 Metaheuristics and (Supervised) Classification

Figure 6.1. Classification task

Figure 6.2. K-nearest neighbor method

Figure 6.3. Example of a decision tree to predict the flu

Figure 6.4. A three-layer artificial neural network

Figure 6.5. Linear support vector machine

Figure 6.6. Performance evaluation methodology in supervised classification

Figure 6.7. Cross validation (example of a 10-fold)

Figure 6.8. Receiver operating characteristic (ROC) curve

Figure 6.9. Venn diagram illustrating repartition of observations [IGL 06]

7 On the Use of Metaheuristics for Feature Selection in Classification

Figure 7.1. Filter model for feature selection: learned on the training set and tested on the test dataset

Figure 7.2. Wrapper model for feature selection.

Figure 7.3. Some representations for metaheuristic in feature selection for the selection of attributes 1,3,7,9. a) binary representation; b) fixed length representation; c) variable length representation

8 Frameworks

Figure 8.1. Clustering and tree exploration with Orange

Figure 8.2. Tree exploration with Rattle GUI

Figure 8.3. Tree exploration with RapidMiner

Figure 8.4. Decision tree with WEKA

Figure 8.5. LIONoso

Metaheuristics Set

coordinated by

Nicolas Monmarché and Patrick Siarry

Volume 5

Metaheuristics for Big Data

Clarisse Dhaenens

Laetitia Jourdan

First published 2016 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd

27-37 St George’s Road

London SW19 4EU

www.iste.co.uk

John Wiley & Sons, Inc.

111 River Street

Hoboken, NJ 07030

USA

www.wiley.com

The rights of Clarisse Dhaenens and Laetitia Jourdan to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2016944993

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-84821-806-2

Acknowledgments

This book is an overview of metaheuristics for Big Data. Hence it is based on a large literature review conducted by the authors in the Laboratory CRIStAL (Research Center in Computer Science, Signal and Automatics), University of Lille and CNRS, France and in the Lille Nord Europe Research Center of INRIA (French National Institute for Computer Science and Applied Mathematics) between 2000 and the present. We are grateful to our former and current PhD students and colleagues for all the work they have done together with us that has led to this book.

We are particularly grateful to Aymeric Blot, Fanny Dufossé, Lucien Mousin and Maxence Vandromme, who read and corrected the first versions of this book. A special word of gratitude goes to Marie-Eléonore Marmion, who carefully read and commented on several chapters.

We would like to thank Nicolas Monmarché and Patrick Siarry for their proposal to write this book and for their patience! Sorry for the time we took.

Finally, we would like to thank our families for their support and love.

Clarisse DHAENENS and Laetitia JOURDAN

Introduction

Big Data: a buzzword or a real challenge?

Both answers are suitable. On the one hand, the term Big Data has not yet been well defined, although several attempts have been made to give it a definition. Indeed, the term means different things depending on who uses it. It could be seen as a buzzword: everyone talks about Big Data but no one really manipulates it.

On the other hand, the characteristics of Big Data, often reduced to the three Vs – volume, variety and velocity – introduce plenty of new technological challenges at different phases of the Big Data process. These phases are presented in a very simple way in Figure I.1.

Starting from the generation of data, its storage and management, analyses can be made to help decision-making. This process may be reiterated if additional information is required. At each phase, some important challenges arise.

Indeed, during the generation and capture of data, some challenges may be related to technological aspects that are linked to the acquisition of real-time data, for example. However, at this phase, challenges are also related to the identification of meaningful data.

The storage and management phase leads to two critical challenges: first, the infrastructures for the storage of data and its transportation; second, conceptual models to provide well-formed available data that may be used for analysis.

Figure I.1. Main phases of a Big Data process

Then, the analysis phase has its own challenges, with the manipulation of heterogeneous massive data. In particular, when considering knowledge extraction, in which unknown patterns have to be discovered, analysis may be very complex due to the nature of the data manipulated. This is at the heart of data mining. One way to address data mining problems is to model them as optimization problems. In the context of Big Data, most of these problems are large-scale ones. Hence metaheuristics seem to be good candidates to tackle them. However, as we will see in the following chapters, metaheuristics are suitable not only to address the large size of the problem, but also to deal with other aspects of Big Data, such as variety and velocity.

The aim of this book is to present how metaheuristics can provide answers to some of the challenges induced by the Big Data context and particularly within the data analytics phase.

This book is composed of three parts. The first part is an introductory part consisting of three chapters. The aim of this part is to provide the reader with elements to understand the following aspects.

Chapter 1, Optimization and Big Data, provides elements to understand the main issues raised by the Big Data context. It then describes what characterizes Big Data and focuses on the analysis phase and, more precisely, on the data mining task. This chapter indicates how data mining problems may be seen as combinatorial optimization problems and justifies the use of metaheuristics to address some of these problems. A section is also dedicated to the performance evaluation of algorithms, because a specific protocol has to be followed in data mining.

Chapter 2 presents an introduction to metaheuristics, to make this book self-contained. First, common concepts of metaheuristics are presented and then the most widely known metaheuristics are described with a distinction between single solution-based and population-based methods. A section is also dedicated to multi-objective metaheuristics, as many of them have been proposed to deal with data mining problems.

Chapter 3 provides indications on parallel optimization and the way metaheuristics may be parallelized to tackle very large problems. As will be shown, parallelization is considered not only as a way to deal with large problems, but also as a way to obtain better quality solutions.

The second part, composed of the following four chapters, is the heart of the book. Each of these chapters details a data mining task and indicates how metaheuristics can be used to deal with it.

Chapter 4 begins the second part of the book and is dedicated to clustering. This chapter first presents the clustering task, which aims to group similar objects, and some of the classical approaches to solve it. Then, the chapter provides indications on the modeling of the clustering task as an optimization problem and focuses on the quality measures that are commonly used, on the interest of a multi-objective resolution approach and on the representation of a solution in metaheuristics. An overview of multi-objective methods is then proposed. The chapter ends with a specific and difficult point in the clustering task: how to estimate the quality of a clustering solution and how to validate it.
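As an illustration of what modeling clustering as an optimization problem can look like, the following minimal sketch evaluates a candidate solution written as one cluster label per object (in the spirit of the label-based representation listed in Figure 4.6). The objective used here, the within-cluster sum of squared distances to the centroid, is one commonly used compactness measure; it is chosen for this example only and is not necessarily the measure the chapter favors. The function name and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def within_cluster_sse(points: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Sum of squared distances from each point to its cluster centroid.

    `labels[i]` is the cluster assigned to `points[i]`; lower is better.
    """
    clusters: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members)
                    for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2
                     for m in members for d in range(dim))
    return total


# Toy data (assumption): two obvious groups, around x = 0 and x = 10.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_solution = [0, 0, 1, 1]   # groups nearby points together
bad_solution = [0, 1, 0, 1]    # mixes the two groups

print(within_cluster_sse(points, good_solution))
print(within_cluster_sse(points, bad_solution))
```

A metaheuristic would search the space of label vectors for one that minimizes such an objective, possibly together with a second, conflicting objective in a multi-objective setting.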

Chapter 5 deals with association rules. It first describes the corresponding data mining task and the classical approach: the Apriori algorithm. Then, the chapter indicates how this task may be modeled as an optimization problem and focuses on the metaheuristics proposed to deal with it. It differentiates the metaheuristics according to the type of rules that are considered: categorical association rules, quantitative association rules or fuzzy association rules. A general table summarizes the most important works in the literature.
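To make the idea of rule quality concrete, the sketch below computes support and confidence, the two measures on which the Apriori approach relies. They are standard definitions and are likely among the measures of Table 5.1, although that table is not reproduced in this preview. The transaction data and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Sequence


def support(transactions: Sequence[FrozenSet[str]],
            itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: Sequence[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate of P(consequent | antecedent) for the rule A -> C."""
    antecedent = frozenset(antecedent)
    both = support(transactions, antecedent | frozenset(consequent))
    base = support(transactions, antecedent)
    return both / base if base > 0 else 0.0


# Toy market-basket data (assumption).
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

print(support(transactions, {"bread", "milk"}))       # 2 of 4 transactions
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 with bread
```

In an optimization model, measures of this kind can serve as the objective function, or as several objectives, that a metaheuristic tries to maximize over the space of candidate rules.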

Chapter 6 is dedicated to supervised classification. This data mining task is of great importance as it allows the class of a new observation to be predicted from observations whose classes are known. The chapter first gives a description of the classification task and briefly presents standard classification methods. Then, an optimization perspective of some of these standard methods is presented, as well as the use of metaheuristics to optimize some of them. The last part of the chapter is dedicated to the use of metaheuristics for the search of classification rules, viewed as a special case of association rules.

Chapter 7 deals with feature selection for classification, which aims to reduce the number of attributes and to improve classification performance. The chapter uses several notions that are presented in Chapter 6 on classification. After a presentation of generalities on feature selection, the chapter shows how it can be modeled as an optimization problem. Different representations of solutions and their associated search mechanisms are then presented. An overview of metaheuristics for feature selection is finally proposed.
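The binary representation mentioned for feature selection can be sketched as follows, using the selection of attributes 1, 3, 7 and 9 that appears in the caption of Figure 7.3. The total of 10 attributes, the function names and the one-bit-flip neighborhood are assumptions made for this illustration, not details taken from the book.

```python
from typing import Iterable, Iterator, List


def encode(selected: Iterable[int], n_attributes: int) -> List[int]:
    """Binary vector with a 1 at each selected attribute (1-indexed)."""
    chosen = set(selected)
    return [1 if i in chosen else 0 for i in range(1, n_attributes + 1)]


def decode(bits: List[int]) -> List[int]:
    """Recover the 1-indexed attribute numbers from a binary vector."""
    return [i for i, bit in enumerate(bits, start=1) if bit == 1]


def bit_flip_neighbors(bits: List[int]) -> Iterator[List[int]]:
    """All solutions that differ from `bits` by adding or removing one attribute."""
    for i in range(len(bits)):
        neighbor = list(bits)
        neighbor[i] = 1 - neighbor[i]
        yield neighbor


solution = encode([1, 3, 7, 9], n_attributes=10)
print(solution)
print(decode(solution))
print(sum(1 for _ in bit_flip_neighbors(solution)))
```

A local search would evaluate each neighbor with a fitness function, for example the performance of a classifier trained on the selected attributes, and move to a better neighbor when one exists.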

Finally, the last part is composed of a single chapter (Chapter 8) which presents frameworks dedicated to data mining and/or metaheuristics. A short comparative survey is provided for each kind of framework.

By browsing the different chapters, the reader will get an overview of the way metaheuristics have been applied so far to tackle problems that arise in the Big Data context, with a focus on the data mining part, which offers the optimization community many challenging application opportunities.

1 Optimization and Big Data

The term Big Data refers to vast amounts of information that come from different sources. Hence Big Data refers not only to this huge data volume but also to the diversity of data types, delivered at various speeds and frequencies. This chapter attempts to define Big Data, presents the main challenges induced by this context and focuses on Big Data analytics.

1.1. Context of Big Data

As depicted in Figure 1.1, the number of Google requests for the term Big Data has grown exponentially since 2011.

Figure 1.1. Evolution of Google requests for Big Data (Google source)

How can we explain the increasing interest in this subject? Some answers can be given when we know that 2.5 quintillion bytes of data are generated every day, such that 90% of the data in the world today have been created in the last two years. These data come from everywhere, depending on the industry and organization: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records and cellphone GPS signals, to name but a few [IBM 16b]. Such data are recorded, stored and analyzed.

1.1.1. Examples of situations

Big Data appears in many situations where large amounts of complex data are generated. Each situation presents its own challenges. We may cite some examples of such situations:

– Social networks: the quantity of data generated in social networks is huge. Indeed, monthly estimates indicate that 12 billion tweets are sent by about 200 million active users, 4 billion hours of video are watched on YouTube and 30 billion pieces of content are shared on Facebook [IBM 16a]. Moreover, such data come in different formats and types.

– Traffic management: in the context of the creation of smart cities, managing traffic within cities is an important issue. Such management is becoming feasible, as the widespread adoption in recent years of technologies such as smartphones, smartcards and various sensors has made it possible to collect, store and visualize information on urban activities such as people and traffic flows. However, this also represents a huge amount of collected data that needs to be managed.

– Healthcare: in 2011, the global size of data in healthcare was estimated at 150 exabytes. Such data are unique and difficult to deal with because: 1) data are in multiple places (different source systems in different formats including text as well as images); 2) data are both structured and unstructured; 3) data may be inconsistent (they may have different definitions according to the person in charge of entering them); 4) data are complex (it is difficult to identify standard processes); 5) data are subject to changing regulatory requirements [LES 16].

– Genomic studies: with the rapid progress of DNA sequencing techniques that now allow us to identify more than 1 million SNPs (genetic variations), large-scale genome-wide association studies (GWAS) have become practical. The aim is to track genetic variations that may, for example, explain genetic susceptibility for a disease. In their analysis on the
