Cognitive Big Data Intelligence with a Metaheuristic Approach
About this ebook
Cognitive Big Data Intelligence with a Metaheuristic Approach presents a precise and compact organization of content on the latest metaheuristic methodologies for challenging big data application domains and cognitive computing. The combined model of cognitive big data intelligence with metaheuristic methods can be used to analyze emerging patterns, spot business opportunities, and address critical process-centric issues in real time. Various real-time case studies and implemented works are discussed in this book for better understanding and additional clarity.
This book presents an essential platform for the use of cognitive technology in the field of Data Science. It covers metaheuristic methodologies that can be successful in a wide variety of problem settings in big data frameworks.
- Presents state-of-the-art metaheuristic approaches to developing automated and intelligent models for big data processing
- Explains feasible applications and case studies in which cognitive computing can be successfully implemented in big data analytics using metaheuristic algorithms
- Provides a snapshot of the latest advances in metaheuristic frameworks for solving optimization problems in cognitive big data applications
Cognitive Big Data Intelligence with a Metaheuristic Approach
Editor
Sushruta Mishra
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT) University, Bhubaneswar, Odisha, India
Editor
Hrudaya Kumar Tripathy
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT) University, Bhubaneswar, Odisha, India
Editor
Pradeep Kumar Mallick
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT) University, Bhubaneswar, Odisha, India
Editor
Arun Kumar Sangaiah
School of Computer Science, The University of Adelaide, Adelaide, SA, Australia
Editor
Gyoo-Soo Chae
Division of ICT, Baekseok University, Cheonan, South Korea
Table of Contents
Cover image
Title page
Copyright
Contributors
Preface
Chapter 1. A discourse on metaheuristics techniques for solving clustering and semisupervised learning models
1. Introduction
2. Overview of clustering
3. Conclusion
Chapter 2. Metaheuristics in classification, clustering, and frequent pattern mining
1. Introduction
2. Metaheuristics in classification
3. Metaheuristics in clustering
4. Metaheuristics in frequent pattern mining
5. Conclusion
Chapter 3. Impacts of metaheuristic and swarm intelligence approach in optimization
1. Introduction
2. Concepts of Metaheuristic
3. Metaheuristic techniques
4. Swarm intelligence techniques
5. Impacts of metaheuristic and swarm intelligence approach in optimization
6. Conclusion
Chapter 4. A perspective depiction of heuristics in virtual reality
1. Introduction to virtual reality
2. Heuristics in brief
3. Virtual reality–enabled case studies
4. Performance evaluation and discussion
5. Conclusion
Chapter 5. A heuristic approach of web users decision-making using deep learning models
1. Introduction
2. Analysis of user online behavior using deep learning models
3. Greedy algorithm as the heuristic
4. Background study
5. Description of the dataset
6. Implementation and discussion
7. Conclusion
Chapter 6. Inertia weight strategies for task allocation using metaheuristic algorithm
1. Introduction
2. Related work
3. Standard PSO
4. Model of task allocation in VM
5. Inertia weight strategy
6. Performance evaluation
7. Conclusion and future work
Chapter 7. Big data classification with IoT-based application for e-health care
1. Introduction
2. State of the art
3. Big data in health care
4. Classification techniques
5. IoT-based smart biomedical data acquisition and processing system
6. Multiagent system for biomedical data processing
7. Detection of cardiac abnormalities
8. Results and discussion
9. Conclusion
Chapter 8. Study of bio-inspired neural networks for the prediction of liquid flow in a process control system
1. Introduction
2. Related work
3. Experimental setup
4. Preliminary details of the algorithm
5. Proposed model
6. Results and discussion
7. Conclusions and future work
Chapter 9. Affordable energy-intensive routing using metaheuristics
1. Introduction
2. Literature survey
3. Problem description
4. Routing
5. Routing algorithms
6. Routing table
7. Metaheuristics
8. Metaheuristics for efficient routing
9. Proposed solution using metaheuristics
10. Conclusion
Chapter 10. Semantic segmentation for self-driving cars using deep learning: a survey
1. Introduction
2. Semantic segmentation for autonomous driving
3. Deep learning
4. Related work
5. Experimental results
6. Conclusion
Chapter 11. Cognitive big data analysis for E-health and telemedicine using metaheuristic algorithms
1. Introduction
2. Cognitive computing technologies for E-health care
3. Cognitive big data analytics for E-health care
4. Need for cognitive big data analytics in E-health care
5. Advantages of cognitive big data analytics in E-health care
6. Challenges of cognitive big data analytics in E-health care
7. Metaheuristic approach for optimization of cognitive big data healthcare
8. Cognitive big data analytics use cases in E-health care
9. Future of cognitive big data analytics in E-health care
10. Market analysis of cognitive big data analytics in E-health care
11. Cognitive big data players in E-health care
Chapter 12. Multicriteria recommender system using different approaches
1. Introduction
2. Related work
3. Working principle
4. Proposed approaches
5. Experimental data analysis
6. Result
7. Conclusion
Chapter 13. Optimization-based energy-efficient routing scheme for wireless body area network
1. Introduction
2. Related work
3. Case study on an energy-efficient hybrid C-means donkey-smuggler optimization-based routing technique for a wireless sensor network
4. Analysis of the previous approach
5. Conclusion
Chapter 14. Livestock health monitoring using a smart IoT-enabled neural network recognition system
1. Introduction
2. System architecture
3. Recognition of a diseased bird by the central monitoring unit using Raspberry Pi
4. Results and discussion
5. Conclusion
Chapter 15. Preserving healthcare data: from traditional encryption to cognitive deep learning perspective
1. Introduction
2. Related works
3. Encryption algorithms
4. Performance evaluation
5. Future challenges of cognitive encryption models in healthcare
6. Conclusion
Index
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-323-85117-6
For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Mara Conner
Acquisitions Editor: Sonnini R. Yura
Editorial Project Manager: Michelle Fisher
Production Project Manager: Sreejith Viswanathan
Cover Designer: Christian J. Bilbow
Typeset by TNQ Technologies
Contributors
Angelia Melani Adrian, Informatics Engineering Department, De La Salle Catholic University, Manado City, Indonesia
Abhishek Banerjee, Pailan College of Management and Technology, Pailan, Joka, Kolkata, West Bengal, India
Aradhana Behura, Department of Computer Science & Engineering, Veer Surendra Sai University of Technology, Burla, Odisha, India
Sukant Kishoro Bisoy, Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, Odisha, India
Korhan Cengiz, Department of Telecommunication, Trakya University, Edirne, Turkey
Chandramouli Das, School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Ankit Desai, Embibe, Bengaluru, Karnataka, India
Pijush Dutta, Department of Electronics and Communication Engineering, Global Institute of Management and Technology, Krishnagar, West Bengal, India
Priyom Dutta, School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Manas Ranjan Kabat, Department of Computer Science & Engineering, Veer Surendra Sai University of Technology, Burla, Odisha, India
Nishant Kashyap, C.V. Raman Global University, Bhubaneswar, Odisha, India
Asok Kumar, Dean of Student Welfare Department, Vidyasagar University, Medinipur, West Bengal, India
B.S. Mahanand, Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru, Karnataka, India
Anjana Mishra, C.V. Raman Global University, Bhubaneswar, Odisha, India
Sushruta Mishra, Kalinga Institute of Industrial Technology Deemed to be University, Bhubaneswar, Odisha, India
Ricky Mohanty, Department of Electronics & Telecommunication, Orissa Engineering College, Bhubaneswar, Odisha, India
Mihir Narayan Mohanty, ITER, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
Saumendra Kumar Mohapatra, ITER, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
Ira Nath, JIS College of Engineering, Kalyani, West Bengal, India
Subhendu Kumar Pani, Krupajal Computer Academy, Bhubaneswar, Odisha, India
Moushita Patnaik, School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Arabinda Pradhan, Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, Odisha, India
Chittaranjan Pradhan, School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Rojanlina Priyadarshini, Department of Computer Science and Information Technology, C.V. Raman Global University, Bhubaneswar, Odisha, India
Deepak Rai, Department of Computer Science and Engineering, National Institute of Technology, Patna, Bihar, India
Priyanka Ray, Kalinga Institute of Industrial Technology Deemed to be University, Bhubaneswar, Odisha, India
Vaisnav Roy, School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Sudipta Sahana, JIS College of Engineering, Kalyani, West Bengal, India
Abhaya Kumar Sahoo, School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Prasan Kumar Sahoo, Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan City, Taiwan
Qusay Sellat, Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, Odisha, India
Hrushikesh Shukla, School of Computer Science and Engineering, Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India
Dharmpal Singh, JIS College of Engineering, Kalyani, West Bengal, India
Hiren Kumar Thakkar, Department of Computer Science and Engineering, School of Engineering and Sciences, SRM University, Mangalagiri, Andhra Pradesh, India
Preface
In recent times, the information industry has experienced rapid changes in both platform scale and scope of applications. Computers, smartphones, clouds, social networks, and supercomputers demand not only high performance but also a high degree of machine intelligence. At present, big data plays a significant role in the field of information technology and, more specifically, in the data science domain. Many solutions are being offered to address big data analytics. Analysis of data by humans can be time-consuming, so sophisticated cognitive systems can be employed to crunch this enormous amount of data, and cognitive computing can reduce the shortcomings faced during big data analytics. Thus, we are entering an era of big data and cognitive computing. To meet these new computing and communication changes, we must upgrade the clouds and the computing ecosystem with new capabilities, such as machine learning, IoT sensing, data analytics, and cognitive machines mimicking human intelligence. Metaheuristic algorithms have proven effective, robust, and efficient in solving real-world optimization, clustering, forecasting, classification, and other engineering problems. Metaheuristic algorithms can manage a set of solutions, attend to multiple objectives, and optimize various values, which makes them well suited to big data analytics; they have become powerful and popular in computational intelligence and many applications. This volume projects different frameworks and applications of cognitive big data analytics using the metaheuristic approach.
This book is designed as a self-contained edition so that readers can understand different aspects of recent innovations in applying metaheuristics to knowledge discovery problems in the context of big data and cognitive computing. The book comprises 15 chapters. The application domains discussed include metaheuristics in clustering and classification, swarm intelligence, heuristics in virtual reality, deep learning, and big data with IoT and recommender systems, among others.
Chapter 1: A discourse on metaheuristics techniques for solving clustering and semisupervised learning models
Nishant Kashyap and Anjana Mishra, C.V. Raman Global University, Bhubaneswar, Odisha, India
Abstract
This study presents a succinct view of metaheuristic techniques for solving problems in data mining and deep learning, especially those related to unsupervised and semisupervised learning models. As implied by one of the four Vs of data, our learning models must cope with the proliferating volume of data available in the contemporary world. In this study, we endeavor to reduce the computational time associated with such high volumes of data, at some cost to the optimality, accuracy, and precision of our learning algorithms. We examine several prevalent metaheuristic techniques, ranging from evolutionary algorithms such as the genetic algorithm to particle swarm optimization, ant colony optimization, and ensembles of different learning models, so as to instill an understanding of the methods used and thereafter apply them to our learning algorithms.
Keywords
Combinatorial optimization; Deep learning; Evolutionary algorithms; Heuristics; K-means clustering
1. Introduction
A rapid surge of machine learning algorithms has been seen in the last decade, which has also pointed out the need for state-of-the-art optimization techniques to deal with the large amounts of data involved. Metaheuristics involving data-driven methods have shown their effectiveness through better quality of solutions and better convergence rates.
Metaheuristic methods explore the solution space to find globally optimal solutions and improve upon the general searching procedure provided by heuristic methods. They guide a basic and simple heuristic by incorporating various concepts to exploit the search space of the given problem [1]. Metaheuristic techniques are generally nondeterministic and approximate but give good-enough solutions in reasonable time. Besides, these methods are particularly well suited to solving nonconvex optimization problems, including those encountered during clustering. A wide range of metaheuristics has been developed, ranging from search-based methods such as simulated annealing and tabu search to the popular nature-inspired and evolutionary algorithms. The widely used metaheuristics are discussed briefly in our study.
Clustering and semisupervised methods are a hot topic today, finding uses in multiple important fields ranging from big data and wireless sensor networks to bioinformatics, to name a few [2]. As such, improving the performance of the existing methods has become one of the most sought-after areas of research. Most of the algorithms used for clustering and classification suffer from being trapped in local maxima or minima. For this reason, we use various nature-inspired metaheuristic techniques to find globally optimal solutions in a reasonable amount of computation time.
Before turning to the metaheuristics, we begin with an overview of the standard methods for clustering.
2. Overview of clustering
2.1. K-means clustering
Clustering using k-means amounts to assigning the data vectors Zp to k clusters so as to minimize the sum of squared errors, by repeating the following steps for some number of iterations (or until some termination condition is reached):
1. Randomly initialize the cluster centroids. Then assign each data vector Zp to the cluster whose centroid ki (i = 1, 2, 3, …, k) minimizes ||Zp − ki|| (here, ||x|| represents the Euclidean norm of x).
2. For each cluster, compute the new cluster centroid Ki as the mean of its members, Ki = (1/|ki|) Σ Zp, summed over the Zp in cluster ki (where |ki| is the number of data vectors in cluster ki).
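The two steps above can be sketched directly in NumPy; this is a minimal illustration of the standard algorithm (the function name, seeding, and termination test are ours, not the chapter's):

```python
import numpy as np

def kmeans(Z, k, iters=100, seed=0):
    """Plain k-means: alternate the two steps above until the centroids settle."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]  # random initialization
    for _ in range(iters):
        # Step 1: assign each Zp to the centroid ki minimizing ||Zp - ki||
        labels = np.argmin(np.linalg.norm(Z[:, None] - centroids[None], axis=2), axis=1)
        # Step 2: each new centroid Ki is the mean of the vectors in its cluster
        new = np.array([Z[labels == i].mean(axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):  # termination condition
            break
        centroids = new
    return labels, centroids
```

Because of the random initialization, k-means can settle in a local optimum; the metaheuristics discussed later are motivated by exactly this weakness.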
2.2. Hierarchical clustering
In hierarchical clustering, the required number of clusters is formed in a hierarchical manner. For n data points, we initially assign the points to n clusters, i.e., each point is a cluster in itself. Thereafter, we merge the two clusters with the least distance between them into a single cluster; the distance between two clusters is taken as the least distance between any member of one and any member of the other. This process continues until we get the required number of clusters (Fig. 1.1).
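The merge procedure just described (single linkage) is short enough to write out in full; a naive O(n³) sketch, with names of our own choosing:

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Agglomerative clustering: each point starts as its own cluster, and the
    two clusters whose closest members are nearest are merged repeatedly."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # cluster-to-cluster distance = least member-to-member distance
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the two closest clusters
    return clusters
```

Production implementations use a distance matrix and priority queue to avoid the repeated pairwise scans, but the logic is the same.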
2.3. Fuzzy C-means
This is used when the data points are fuzzy, i.e., they can belong to two or more clusters at the same time. Each of the n data points belongs to each cluster with some weight (Fig. 1.2).
Let Gij denote the belongingness (weight) of the ith data point to the jth cluster cj (with centroid kj). It is important to note that all the weights of a data point add up to 1, as shown in Ref. [3]. In fuzzy c-means, the squared error function that we want to minimize is Eq. (1.1)
Figure 1.1 Hierarchical clustering, closest points clustered first.
Figure 1.2 Clustering on fuzzy sets.
J = Σi Σj (Gij)^d ||Zi − kj||²  (1.1)
(summed over the n data points i and the c clusters j)
Here, d is a value greater than 1 (up to infinity) that is used to control the fuzziness of the weights.
At first all the values of the weights are initialized. Thereafter, the following steps are repeated till some termination condition:
I. Calculate the cluster centroids.
II. Update the weights.
The cluster centroids are updated as follows:
kj = Σi (Gij)^d Zi / Σi (Gij)^d  (1.2)
And the weights are updated as follows:
Gij = 1 / Σl (||Zi − kj|| / ||Zi − kl||)^(2/(d−1))  (1.3)
(summed over all c clusters l)
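One iteration of the two update steps, Eqs. (1.2) and (1.3), can be written compactly; a sketch with illustrative names, using the common choice d = 2:

```python
import numpy as np

def fcm_step(Z, G, d=2.0):
    """One fuzzy c-means iteration: centroids from the weights (Eq. 1.2),
    then new weights from the centroids (Eq. 1.3). G is n x c, rows sum to 1."""
    Gd = G ** d
    centroids = (Gd.T @ Z) / Gd.sum(axis=0)[:, None]            # Eq. (1.2)
    dist = np.linalg.norm(Z[:, None] - centroids[None], axis=2)
    dist = np.maximum(dist, 1e-12)                              # guard divide-by-zero
    inv = dist ** (-2.0 / (d - 1.0))
    G_new = inv / inv.sum(axis=1, keepdims=True)                # Eq. (1.3)
    return centroids, G_new
```

Repeating `fcm_step` drives the weights toward crisp membership when the clusters are well separated, while overlapping points retain intermediate weights.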
2.4. Model-based clustering
This form of clustering is based on probabilistic distributions. Some models are used to cluster the data, and then the relation between the data points and the model is optimized. Generally, a Gaussian distribution is used for continuous points and a Poisson distribution for discrete points. In this regard, the EM algorithm is widely used to find the mixture of Gaussians [4]. For other models, refer to Ref. [5].
2.5. Particle swarm optimization
Particle swarm optimization (PSO) is a swarm-based optimization technique that draws inspiration from the foraging behavior of flocks of birds or schools of fish. The technique is quite useful when continuous nonlinear functions are involved [6].
Let us consider a scenario that depicts the intuition behind the technique in terms of the behavior of foraging flocks. Suppose there is a flock of birds foraging for food that is limited and concentrated in a particular location. A naïve strategy is for each bird to search the entire area (search space) (Fig. 1.3); however, some birds may then never reach the location, due to the random nature of the search, or may reach it so late that the resource is already depleted. Instead, each bird considers the best position among all the birds (the position of the bird closest to the food source) and keeps a memory of its own previous best position (its closest individual position to the food source) to update its position and velocity. This connection among the birds, and the fact that the best way to search is to follow the bird closest to the source, form the basis of the technique [7].
Figure 1.3 Birds flocking together.
When dealing with a numeric optimization problem using PSO, there can initially be a lot of candidate solutions. Each solution is denoted as a particle, and each particle is associated with a position and a velocity at each update step [8]. Besides, each particle keeps track of its best position Pbest (where it was closest to the global optimum) and of the global best [9], Gbest (the position of the particle closest to the global optimum). The velocity and position equations are given as follows:
V(t+1) = W·V(t) + c1·r1·(Pbest − X(t)) + c2·r2·(Gbest − X(t))  (1.4)
X(t+1) = X(t) + V(t+1)  (1.5)
(Displacement should be velocity × time, but since the update is done over a single turn, time can be taken as 1.)
Here, V(t) is the velocity associated with the particle at time t.
X(t) is the position of the particle at time t
r1, r2 are random values ranging from 0 to 1
c1 is a constant called the cognitive (local or personal) weight
c2 is a constant called the social (or global) weight.
W is a constant factor called the inertia weight.
Let us consider a function to understand the workings better. Suppose we have to find the global minima of
Now, although it is apparent that the plot will have a minimum at (x,y) = (0,0) with a value of 0, we can nonetheless see the convergence of the PSO quite well here. Let us start at the point (x,y) = (37.5, 18.75). Here, F(x,y) will be 1832.81. Let V(t) be (−1,−2), W be 0.6, c1 and c2 be 1.4, and r1, r2 be 0.4 and 0.5, respectively. Let Pbest be (31.25, 12.5), yielding a value of 1195.31, and Gbest be (12.5, 18.75), yielding a value of 532.81 (Fig. 1.4).
Figure 1.4 Plot of the function.
Now, putting in the above values, we get V(t+1) = (−21.6, −4.7) and hence X(t+1) = (15.9, 14.05),
which is closer to the optimal point (0,0).
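The arithmetic of this worked example can be checked in a few lines, assuming the standard PSO velocity and position updates of Eqs. (1.4) and (1.5):

```python
import numpy as np

# values from the worked example
X = np.array([37.5, 18.75]); V = np.array([-1.0, -2.0])
Pbest = np.array([31.25, 12.5]); Gbest = np.array([12.5, 18.75])
W, c1, c2, r1, r2 = 0.6, 1.4, 1.4, 0.4, 0.5

V_new = W * V + c1 * r1 * (Pbest - X) + c2 * r2 * (Gbest - X)  # Eq. (1.4)
X_new = X + V_new                                              # Eq. (1.5)
print(X_new)  # approximately (15.9, 14.05), closer to the optimum than the start
```

In a full run, r1 and r2 would be redrawn at every step and the update applied to a whole swarm of particles; a single step already moves the particle markedly toward Gbest.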
2.6. Clustering using PSO
In a clustering problem, let Wp represent a data vector, k be the number of clusters, and ki be the number of particles in cluster i.
Each particle can be thought of as a set of k cluster centroids, i.e., the ith particle is represented as xi = (mi1, mi2, …, mik), where mij is the jth cluster centroid of the ith particle. Unless stated otherwise, we will be using the aforementioned symbols throughout the chapter.
The PSO equations for the ith particle can then be written as follows:
Vi(t+1) = W·Vi(t) + c1·r1·(Pbest,i − xi(t)) + c2·r2·(Gbest − xi(t))  (1.6)
xi(t+1) = xi(t) + Vi(t+1)  (1.7)
In general, the lower the intracluster distance (the distance between the data vectors and the centroid within a particular cluster), the better the results. Hence, we can define the quantization error for a particle i as the average intracluster error:
Je = (1/k) Σj [ (ΣWp∈Cij ||Wp − mij||) / |Cij| ]  (1.8)
(where Cij denotes the set of data vectors assigned to the jth cluster of particle i)
Clustering using PSO runs on the following algorithm:
❖ Initially, each and every particle is randomly initialized, i.e., all the k cluster centroids of the particle are randomly initialized and some velocity associated with them.
❖ For some number of iteration t, we do the following:
▪ For each particle i:
➢ For each data vector Wp, we first calculate ||Wp − mij|| (the Euclidean norm) to all the cluster centroids of particle i and then assign the data vector to the cluster centroid with which it has the least Euclidean norm.
➢ Then, the fitness for the particle is calculated using the fitness equation.
▪ The Gbest and Pbest are then updated accordingly, after which the cluster centroids are updated using the aforementioned Eqs. (1.6) and (1.7).
While k-means clustering suffers from the problem of being trapped in locally optimal positions (resulting in less accurate clusters) due to its initial assumptions, clustering using PSO can find the global best, resulting in more accurate clusters. However, it suffers from a very slow convergence rate toward the end. To do away with this problem, a hybrid PSO and k-means clustering algorithm can be used, wherein the performance of PSO is improved by seeding the initial particle swarm with the result of k-means. In this method, every particle is initialized randomly except one, which is initialized with the result of k-means. After that, the PSO clustering algorithm is run.
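The fitness used in this loop, the quantization error of Eq. (1.8), is straightforward to compute for one particle; a sketch with names of our own choosing:

```python
import numpy as np

def quantization_error(Z, centroids):
    """Eq. (1.8): average, over the k clusters of a particle, of the mean
    distance from each cluster's data vectors to its centroid."""
    # assign every data vector to its nearest centroid of this particle
    labels = np.argmin(np.linalg.norm(Z[:, None] - centroids[None], axis=2), axis=1)
    per_cluster = []
    for j, m in enumerate(centroids):
        members = Z[labels == j]
        if len(members):  # empty clusters contribute nothing
            per_cluster.append(np.linalg.norm(members - m, axis=1).mean())
    return float(np.mean(per_cluster))
```

Lower values mean tighter clusters, so the PSO loop treats this as a fitness to be minimized when updating Pbest and Gbest.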
2.7. Ant colony optimization
Ant colony optimization (ACO) is another swarm algorithm, inspired by the foraging behavior of ants. It is a stochastic metaheuristic technique for solving a range of nonlinear optimization problems. The main idea of the algorithm lies in its pheromone model, which depends on probabilistic sampling.
Since most species of ants are devoid of eyesight or have blurry eyesight, they scatter along random routes from their nest to the food source when they initially start, with each route having an equal probability. Each ant deposits a certain amount of pheromone on the path that it traverses. Eventually, they perceive the shortest path by following the path with the largest amount of pheromone trail left over a period of time by all the ants.
This can be better understood through the following example:
Initially, ants travel with equal probability along the paths a and b. Let us start from the situation when only two ants have set out (since both paths have equal probability): one moves along path a and the other along path b. Since path b is shorter, the ant on that path traverses from nest to food source and back quicker than the ant on the other path, all while leaving pheromones. When another ant starts, it will move along the path with more pheromone, that is, b. Likewise, over time, the probability of choosing path b increases, because the shorter traverse time leads to a higher pheromone concentration on that path (Fig. 1.5).
The amount of pheromone on a path also depends on other factors, such as the evaporation factor and other environmental factors. However, in the basic ACO model, only the evaporation factor is taken into consideration.
The ACO algorithm generally consists of two parts:
Figure 1.5 Ant movement initially and finally.
2.7.1. Constructing the ant probability solution
The ant probability solution refers to the probability, Pij, that an ant moves from node i to j where i and j are some nodes along the path when the problem is represented as a graph problem. The probability Pij is given by the following:
P_{ij} = \frac{\tau_{ij}^{\alpha}\, \eta_{ij}^{\beta}}{\sum_{k} \tau_{ik}^{\alpha}\, \eta_{ik}^{\beta}}   (1.9)
Here, τij indicates how much pheromone is deposited on the path from i to j.
α indicates the parameter that is used to control the influence of τij
ηij indicates the ant's desirability to go through path from i to j
β indicates the control parameter of ηij
2.7.2. Updating the pheromone amount
\tau_{ij} \leftarrow (1 - \rho)\, \tau_{ij} + \Delta\tau_{ij}   (1.10)
Here, ρ is the pheromone evaporation rate, and the last term, Δτij, represents the amount of pheromone deposited, which is given by 1/len for an ant that travels from i to j, where len is the length of the path from i to j. If the ant does not travel via that route, the last term is taken as 0. When applied over all the ants, this update sums the last term across the ants.
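Eqs. (1.9) and (1.10) can be illustrated with a small sketch. The parameter values below (α, β, ρ, the edge lengths) are illustrative, not prescribed by the text:

```python
def transition_probs(tau, eta, alpha=1.0, beta=2.0):
    # P_ij is proportional to tau_ij^alpha * eta_ij^beta over the candidate edges
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]

def update_pheromone(tau, rho, deposits):
    # tau_ij <- (1 - rho) * tau_ij + delta_tau_ij, where delta is 1/len
    # for each ant that used the edge and 0 otherwise
    return [(1 - rho) * t + d for t, d in zip(tau, deposits)]

tau = [1.0, 1.0, 1.0]              # equal pheromone on three candidate edges
eta = [1 / 2.0, 1 / 5.0, 1 / 1.0]  # desirability = inverse edge length
probs = transition_probs(tau, eta)
print(probs)  # the shortest edge (length 1) gets the highest probability
tau = update_pheromone(tau, rho=0.1, deposits=[0.0, 0.0, 1.0])
print(tau)
```

Because desirability is the inverse of edge length, shorter edges attract more ants, and their pheromone then reinforces that preference, mirroring the two-path example above.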
2.8. Clustering using ACO
Initially, all the data vectors are randomly projected onto a 2D grid. The algorithm is based on the picking up and dropping of data points on the grid. The ants (agents) move around the grid in a random manner, picking up and dropping data points. The pickup and drop-off probabilities of the data points, though random, are determined by neighborhood factors. The pickup probability increases when the neighboring data items have different labels, i.e., they are dissimilar [10]. On the other hand, if the neighboring data items are similar, the drop-off probability increases. The pickup probability is given by
P_{pick} = \left( \frac{I_1}{I_1 + f} \right)^{2}   (1.11)
and the drop off probability is given by
P_{drop} = \left( \frac{f}{I_2 + f} \right)^{2}   (1.12)
Here, f is the perceived fraction of similar items in the neighborhood of the agent.
I1 and I2 are threshold constants.
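The threshold functions of Eqs. (1.11) and (1.12) appear to follow the classic ant-based clustering model; the sketch below assumes that form, with illustrative values for the constants I1 and I2:

```python
def pickup_prob(f, I1=0.1):
    # Pickup probability is high when few similar neighbors exist (small f)
    return (I1 / (I1 + f)) ** 2

def drop_prob(f, I2=0.15):
    # Drop-off probability is high when many similar neighbors exist (large f)
    return (f / (I2 + f)) ** 2

print(pickup_prob(0.0), pickup_prob(0.9))  # isolated item vs. well-placed item
print(drop_prob(0.0), drop_prob(0.9))
```

An isolated item (f = 0) is always picked up and never dropped; an item surrounded by similar neighbors (f near 1) is rarely picked up and very likely to be dropped, which is what drives cluster formation on the grid.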
In the algorithm, we take the number of agents equal to the number of clusters that we want.
The algorithm consists of the following steps:
1. The following steps are run for some number of iterations, t:
❖ The k cluster centroids are initialized by the ACO. The Pij of the ants, as discussed, determines the data points that are assimilated into clusters. The pickup and drop-off probabilities are considered alongside.
❖ While ensuring that every ant visits its corresponding instances, the data points are assigned to the nearest cluster centroids and the value of the objective function is calculated for every agent.
❖ The best solution is found by grading the agents by the norm and eliminating those agents with fewer clusters than the given number, i.e., k. The pheromone values are updated according to these best solutions.
2. The best agent globally is calculated using the local bests [11], while the agents are equal in number to the global best solutions. Thereafter, the mean is used to calculate the cluster centroids after evaluating the initial and final f-measures and calculating entropy, moving a data point to a different cluster if its initial Fij is smaller than its final Fij.
3. The above step is repeated until there is no further reassignment of points.
Here, the f-measure for some data point p and cluster i is calculated as follows:
F_{ij} = \frac{2\, (m_{ij}/m_j)\, (m_{ij}/m_i)}{(m_{ij}/m_j) + (m_{ij}/m_i)}   (1.13)
Here, mij is the frequency of class i in cluster j, mj is the number of points in cluster j, and mi is the frequency of points in class i. The entropy Ej of a cluster j is defined as the probability-weighted sum over the classes fitting into that cluster
E_j = -\sum_{i} p_{ij} \log p_{ij}, \quad p_{ij} = m_{ij}/m_j   (1.14)
and the total entropy for a cluster group is
E = \sum_{j} \frac{n_j}{n} E_j   (1.15)
where n_j is the size of cluster j and n is the total number of points.
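The entropy computations of Eqs. (1.14) and (1.15) can be sketched as follows, assuming the standard per-cluster entropy with a size-weighted total; the helper names are illustrative:

```python
import math

def cluster_entropy(class_counts):
    # Entropy of one cluster from its per-class frequency counts
    total = sum(class_counts)
    ent = 0.0
    for m in class_counts:
        if m > 0:
            p = m / total
            ent -= p * math.log(p)
    return ent

def total_entropy(clusters):
    # Size-weighted sum of per-cluster entropies over the cluster group
    n = sum(sum(c) for c in clusters)
    return sum(sum(c) / n * cluster_entropy(c) for c in clusters)

pure = [[10, 0], [0, 10]]   # each cluster holds a single class
mixed = [[5, 5], [5, 5]]    # each cluster is an even mix of two classes
print(total_entropy(pure), total_entropy(mixed))
```

A perfectly pure clustering has zero entropy, while an even mix reaches the maximum (log of the number of classes), so lower total entropy indicates better clusters.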
2.9. Genetic algorithm
Genetic algorithms (GAs) are heuristic optimization techniques that take inspiration from the natural phenomenon of evolution [12]. Despite being slow in nature, these techniques have the advantage of being easy to code and represent. An example of evolution is seen in giraffes, which had a mix of long and short necks in prehistoric times. However, with the passage of time, through the process of natural selection, only those having long necks were able to survive (Fig. 1.6).
As with natural selection, the GA is based on three steps:
❖ Selection
❖ Crossover
❖ Mutation
Figure 1.6 Genetic algorithms synthesis.
2.9.1. Selection
In a population of some size, any feature of an individual is encoded by a particular chromosome, and each chromosome is made of genes (generally binary). Each chromosome in the population is associated with a certain fitness value (quantifying the optimality of the solution), which is inherent to the problem and the task that we wish to perform with our GA. The selection process determines, according to fitness values, which solutions are allowed to reproduce (crossover stage) and which are not, while maintaining a constant population size. This is intuitively valid, as we endeavor to weed out suboptimal (low fitness value) solutions from future generations (iterations) and perform crossover on solutions with high fitness values, with the hope of obtaining a better solution to replace some other suboptimal solution in the population.
There are different techniques for selection like the following:
❖ Tournament selection
❖ Roulette-wheel selection
❖ Proportionate selection
❖ Rank selection, etc.
When the fitness values differ sufficiently, we may use tournament selection; when the fitness values are close together, roulette-wheel selection is generally used.
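The two selection schemes mentioned above can be sketched as follows; the function names are illustrative:

```python
import random

def roulette_wheel(population, fitness):
    # Each individual is selected with probability proportional to its fitness
    total = sum(fitness)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitness):
        acc += fit
        if r <= acc:
            return ind
    return population[-1]

def tournament(population, fitness, size=2):
    # Pick `size` random contenders; the fittest one wins
    contenders = random.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: fitness[i])]

random.seed(1)
pop = ["a", "b", "c", "d"]
fit = [1.0, 10.0, 1.0, 1.0]
picks = [roulette_wheel(pop, fit) for _ in range(200)]
print(picks.count("b"))  # the high-fitness individual dominates the draws
```

Roulette-wheel selection preserves proportional pressure when fitness values are close, while tournament selection applies stronger pressure when one solution clearly outperforms the others.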
2.9.2. Crossover
Suppose there are two chromosomes of size n. For crossover, we select a position k such that positions [1, k] are filled by the genes of the first chromosome and positions [k+1, n] by the genes of the second, creating a new chromosome.
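Single-point crossover as described above can be sketched in a few lines; `single_point_crossover` is an illustrative name:

```python
import random

def single_point_crossover(parent1, parent2, k=None):
    # Genes before position k come from parent1, the rest from parent2
    assert len(parent1) == len(parent2)
    if k is None:
        k = random.randint(1, len(parent1) - 1)
    return parent1[:k] + parent2[k:]

child = single_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], k=2)
print(child)  # [1, 1, 0, 0]
```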
2.9.3. Mutation
Mutation is then used to create a new property out of our chromosomes, which helps to improve the rate of convergence. Mutation can be achieved simply by flipping the bit at a specific position [13]. However, mutation is a risky process: too much of it introduces excessive randomness, which may completely disrupt our results. Likewise, applying mutation to the MSB of a chromosome can bring drastic changes, which we want to avoid. Mutation applied with low probability, however, is useful.
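Low-probability bit-flip mutation can be sketched as follows; the mutation rate here is illustrative:

```python
import random

def bit_flip_mutation(chromosome, p=0.01):
    # Flip each bit independently with a small probability p
    return [1 - g if random.random() < p else g for g in chromosome]

random.seed(0)
c = [0] * 1000
mutated = bit_flip_mutation(c, p=0.01)
print(sum(mutated))  # roughly p * len(c) bits are flipped
```

Keeping p small preserves most of a good chromosome while still injecting enough variation to escape local optima.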
2.9.4. Clustering using genetic algorithm
When tackling the problem of clustering with the help of a GA, we must first understand the objective, i.e., what function we want to minimize or maximize [14]. This is crucial, as the fitness function depends on the task at hand. It is not difficult to see that what we actually seek is to minimize the sum of the Euclidean norms of the data vectors from their respective centroids, i.e., we need to minimize the function
J = \sum_{j=1}^{k} \sum_{W_p \in C_j} \left\| W_p - m_j \right\|   (1.16)
Here, mj denotes the cluster centers.
For k clusters and data vectors Wp with n features, we can represent this information using a chromosome of length n∗k. The sequence of genes in the chromosome represents the k cluster centers. For example, for a data vector with two features and two clusters with respective cluster centroids (1,2) and (7,8), the chromosome can be represented [15] as shown below (Fig. 1.7).
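This flat encoding can be sketched directly; `encode_centroids` and `decode_chromosome` are hypothetical helper names:

```python
def encode_centroids(centroids):
    # Flatten k centroids of n features each into a chromosome of length n*k
    return [x for c in centroids for x in c]

def decode_chromosome(chrom, n_features):
    # Split the flat gene list back into centroid tuples
    return [tuple(chrom[i:i + n_features]) for i in range(0, len(chrom), n_features)]

chrom = encode_centroids([(1, 2), (7, 8)])
print(chrom)                         # [1, 2, 7, 8]
print(decode_chromosome(chrom, 2))   # [(1, 2), (7, 8)]
```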
For the purpose of our algorithm, we will consider a population of size N. Accordingly, N chromosomes are created (with random cluster centroid initialization).
2.9.5. Computing the fitness function
For a chromosome, the data vectors Wp are assigned to the clusters for which the Euclidean norm is the least. This is followed by replacing the value of each cluster centroid in the chromosome with the mean of the cluster to which that centroid belongs, i.e., all the mj are replaced by Mj such that
M_j = \frac{1}{n_j} \sum_{W_p \in C_j} W_p   (1.17)
where n_j is the number of points in cluster j.
Then, we compute the error function considering the new centroids Mj. The fitness function is then defined as
f = \frac{1}{J}   (1.18)
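The error of Eq. (1.16) and its reciprocal as a fitness value can be sketched as follows; the function names are illustrative:

```python
import math

def error(data, centroids):
    # Sum of distances from each point to its nearest centroid (Eq. 1.16)
    return sum(min(math.dist(p, c) for c in centroids) for p in data)

def fitness(data, centroids):
    # Fitness as the reciprocal of the clustering error, so lower error
    # means higher fitness (assumed form of Eq. 1.18)
    return 1.0 / error(data, centroids)

data = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = [(0, 0.5), (10, 10.5)]   # centroids near the two natural groups
bad = [(5, 5), (6, 6)]          # centroids far from both groups
print(fitness(data, good), fitness(data, bad))
```

Taking the reciprocal turns the minimization of J into a maximization problem, which is the convention the selection step above assumes.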
Thereafter, we perform crossover as mentioned above. Since we are now dealing with real values for the genes, instead of bit flipping we use a different procedure for mutation [16]. Suppose the value of a gene is x; then, for some random value μ generated using a uniform distribution within the range [0,1], x is mutated (as shown in Ref. [15]) as follows
x' = \begin{cases} x \pm 2\mu x, & x \neq 0 \\ \pm 2\mu, & x = 0 \end{cases}   (1.19)
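The real-valued mutation scheme of Ref. [15] can be sketched as follows. The sign of the shift is normally chosen at random; here it is passed as a parameter so the example is deterministic, and `mutate_gene` is an illustrative name:

```python
def mutate_gene(x, mu, sign=1):
    # Shift a real-valued gene by 2*mu times its own magnitude
    # (or by 2*mu directly when the gene is zero), per the scheme of Ref. [15]
    if x != 0:
        return x + sign * 2 * mu * x
    return sign * 2 * mu

print(mutate_gene(5.0, 0.1))            # 6.0
print(mutate_gene(5.0, 0.1, sign=-1))   # 4.0
print(mutate_gene(0.0, 0.25))           # 0.5
```

Scaling the perturbation by the gene's own magnitude keeps mutations proportionate, so large centroid coordinates are not nudged by negligible absolute amounts.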
Figure 1.7 Cluster chromosome.
These steps are repeated until the termination criterion is reached. We obtain our solution by selecting the best solution (the one with the highest fitness value) from the last generation.
While implementing our GA-based clustering algorithm, we initialize the population size carefully [17]. A population size that is too large