Artificial Intelligence in Data Mining: Theories and Applications
About this ebook
- Provides coverage of the fundamentals of Artificial Intelligence as applied to data mining, including computational intelligence and unsupervised learning methods for data clustering
- Presents coverage of key topics such as heuristic methods for data clustering, deep learning methods for data classification, and neural networks
- Includes case studies and real-world applications of AI techniques in data mining, for improved outcomes in clinical diagnosis, satellite data extraction, agriculture, security and defense
Artificial Intelligence in Data Mining - D. Binu
Oman
Preface
D. Binu and B.R. Rajakumar
Artificial intelligence (AI) has reached a level of maturity at which several of its methods have proved successful. Its capability has been demonstrated in research projects ranging from decision making to the emulation of the cognitive processes of human experts. Other successful AI models have produced theories of descriptive reasoning and have used formal languages to represent discovered patterns and relations among data. Automation in society has considerably increased the capacity to produce and accumulate data from different sources, and the growing quantity of data now touches every aspect of life. The growth in stored data has created an urgent need for novel methods and automatic tools that can intelligently help transform huge volumes of data into useful information and knowledge. This has led to a promising and growing frontier in information technology called data mining. Data mining has great potential to improve business outcomes. The significance of AI in data mining is well known; it has been termed the oil of the cyber world.
The book is designed to cover the key aspects of AI in data mining. It is split into short chapters so that topics can be arranged and understood properly, and the topics within each chapter are organized in sequence to ensure a smooth flow. The book uses accessible language to explain the fundamentals of the subject, offers a logical way of explaining complex concepts, and presents stepwise techniques for the important topics. Each chapter is supported by essential illustrations, practical examples, and solved problems, and the chapters are ordered so that each topic builds on earlier material. Every care has been taken to make learners comfortable with the basic concepts. The book not only covers the complete scope of the subject but also conveys its philosophy, which makes the subject clearer and more interesting.
This book gives learners enough information to master data mining and its applications. It covers data mining, biomedical data mining, data clustering, heuristic methods for clustering data, deep learning methods, neural networks for data classification, and applications of data mining in defense and security, without compromising on detail. The aim of the book is to illustrate concepts with practical examples so that learners can grasp the content more easily. Another important feature is the explanation of data mining algorithms with examples. The book also contains several educational features, such as chapter-wise abstracts, summaries, practical examples, and relevant references, to give beginners a sound grounding. It also offers students a guide for acquiring knowledge of the technology. We hope that this book will motivate individuals of different backgrounds and experience to exchange ideas about data mining and so contribute to the further advancement and shaping of this exciting and dynamic field.
I wish to convey my heartfelt thanks to all those who helped to make this book a reality. Any suggestions for improving the book will be acknowledged and appreciated.
1
Introduction
D. Binu and B.R. Rajakumar, Resbee Info Technologies, India
Abstract
Data mining is a young domain that has emerged from the confluence of numerous disciplines working with massive databases. Its motivation is that these databases contain information of high value to their owners, but this information is concealed and remains undiscovered. The aim of data mining is to extract this valuable information from massive databases, a task closely related to exploratory data analysis. Exploring and analyzing massive data is extremely difficult and computationally expensive. Visual data mining helps in dealing with complex data by involving the user directly in the data mining process, and many data visualization methods have been designed to support the exploration of huge datasets. This chapter introduces data mining techniques and the methodologies adopted for extracting interesting information.
Keywords
Data mining; data warehouse server; regression; prediction; classification; information visualization; visual data mining; visual data exploration; knowledge discovery; artificial intelligence approach
1.1 Data mining
Data mining is a popular research domain that has attracted the interest of many industries. Because of the massive volume of data available, there is a pressing need to turn such data into useful information and knowledge. The knowledge acquired is used in applications such as production control, scientific exploration, engineering design, business management, and market analysis. Data mining can be viewed as a result of growing datasets and of the evolution of information technology. This evolutionary path can be observed in the database industry in the development of successive capabilities: data collection and database creation, data management (including storage and retrieval), and data analysis and understanding.
Since the 1960s, information technology and databases have evolved systematically from primitive file processing systems to sophisticated and powerful database systems. Research and development on database systems since the 1970s has led to relational databases, data organization methods, indexing, and data modeling tools. In addition, users gained convenient access to data through user interfaces, query languages, and query processing. Simply stated, data mining is a technique for extracting knowledge from massive datasets.
The current generation of data mining products and functions is the result of influences from many disciplines, including information retrieval, databases, machine learning, and statistics. Other areas of computer science, such as multimedia and graphics, have also had a major impact on the Knowledge Discovery in Databases (KDD) process. KDD refers to the overall process of discovering useful knowledge from data. The outcomes of the KDD process must ultimately be presented in a meaningful manner, and because many results may be generated, this is a nontrivial problem.
Visualization methods include graphical presentations and sophisticated multimedia, and data mining strategies can themselves be applied to multimedia applications. In contrast to earlier research in these separate areas, a major trend in the database community is to integrate the results from the different disciplines into a unified data or algorithmic approach. The goal is to form a big picture of these areas that allows different types of applications to be incorporated into real-world user domains.
Data mining is a multidisciplinary domain that supports knowledge workers who try to mine valuable information from huge datasets. The concept is rooted in the idea of extracting knowledge from massive data. Data mining tools help to discover pertinent information by applying several data analysis methods. Thus any method used to extract patterns from a large data source can be considered a data mining method.
1.2 Description of data mining
Data mining is considered a part of computer science and refers to the process of discovering patterns in large datasets. It draws on methods from statistics, artificial intelligence, database systems, and machine learning. Its aim is to extract essential information from a dataset and convert it into a comprehensible structure for later use. Beyond the raw analysis step, data mining also involves database and data management aspects, data preprocessing, interestingness metrics, inference considerations, computational complexity, visualization, and online updating.
Data mining plays an essential role in knowledge discovery, the process of analyzing huge datasets to acquire useful knowledge. It is applied effectively in business, medicine, insurance, weather forecasting, transportation, healthcare, and government, and these applications bring substantial benefits to the industries that use them.
1.2.1 Different databases adapted for data mining
Data mining can be carried out on the following kinds of data:
• relational databases
• advanced databases and data repositories
• transactional and spatial databases
• object-oriented and object-relational databases
• data warehouses
• diverse databases
• text databases
• multimedia databases
• text mining and web mining
1.2.2 Different steps in design process for mining data
Fig. 1–1 depicts the process of mining data.
Figure 1–1 Process of mining data.
• Understanding business
This phase establishes the goals of data mining as follows:
First, the client's objectives must be understood and the client's needs examined carefully. Next, stock is taken of the current data mining situation, considering factors such as resources, assumptions, and constraints. The data mining goals are then clearly defined from the business objectives and the analysis of the current situation. Finally, a sound data mining plan is drawn up to accomplish both the data mining goals and the business objectives.
• Understanding data
This phase checks the data to determine whether it is suitable for achieving the data mining goals.
First, data are collected from the different sources available to the business. These sources may include multiple databases, data cubes, or flat files. Issues such as schema integration and object matching can arise during data integration. The process is complicated because data from different sources are unlikely to match easily, so it is hard to establish whether a value in one source refers to the same object as a value in another. Metadata should be used to reduce errors in the data integration process. The next step is to examine the properties of the collected data; a good way to explore the data is to answer the data mining questions using queries, reporting, and visualization tools. The query results help to assess data quality. Missing values should be identified at this stage and either acquired or marked with placeholder values to be handled during data cleaning.
• Preparation of data
This phase makes the data ready for extracting knowledge. The data are prepared for production use: data from the various sources are selected, cleaned, transformed, anonymized, formatted, and constructed as required for data mining.
• Data cleaning
Data cleaning removes noisy data and fills in missing values.
For instance, if the age field in a customer profile is empty, the record is incomplete and the value must be filled in. In other cases the data may contain outliers, such as an age of 300, which cannot be correct. The data should therefore be made consistent.
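The age example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the customer records, the plausibility bounds of 0 to 120, and the choice of median imputation are all assumptions made for the example.

```python
from statistics import median

# Hypothetical customer records; None marks a missing age.
customers = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 300},   # implausible value, treated as an outlier
    {"id": 4, "age": 52},
    {"id": 5, "age": 41},
]

MIN_AGE, MAX_AGE = 0, 120  # assumed plausibility bounds, not from the text


def clean_ages(records, low=MIN_AGE, high=MAX_AGE):
    """Return copies of the records with missing or implausible ages imputed."""
    valid = [r["age"] for r in records
             if r["age"] is not None and low <= r["age"] <= high]
    fill = median(valid)  # impute with the median of the plausible ages
    cleaned = []
    for r in records:
        age = r["age"]
        if age is None or not (low <= age <= high):
            age = fill
        cleaned.append({**r, "age": age})
    return cleaned


cleaned_customers = clean_ages(customers)
```

With these hypothetical records the plausible ages are 34, 41, and 52, so both the missing age and the age of 300 are replaced by the median, 41. Other strategies, such as dropping the record or flagging it for review, may suit a real dataset better.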
• Transformation of data
Data transformation operations contribute to the success of the mining process by changing the data into a form that is useful for mining. Some of the transformation operations are listed as follows:
• Smoothing
Smoothing helps to remove noise from the data.
• Aggregation
Aggregation operations are applied to the data to produce a concise summary. For instance, daily sales can be aggregated into monthly totals.
• Generalization
In generalization, low-level data are replaced with higher-level concepts by means of concept hierarchies. For instance, a city can be replaced by its country.
• Normalization
In normalization, attribute values are scaled to fall within a specified range, for instance 0 to 1.
• Attribute construction
New attributes are constructed from the given set of attributes to assist data mining.
The transformed data can be used as the final dataset for modeling.
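The transformation operations listed above can be sketched as small Python functions. The function names, the trailing moving average used for smoothing, the age bins used for generalization, and the sample sales rows are all assumptions chosen for illustration; many other smoothing and binning schemes exist.

```python
def moving_average(values, window=3):
    """Smoothing: replace each value by the mean of a trailing window."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def aggregate_sum(rows, key, value):
    """Aggregation: total the `value` field for each distinct `key`."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def generalize_age(age):
    """Generalization: map a raw age to a higher-level concept (assumed bins)."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


def min_max_normalize(values):
    """Normalization: scale values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map them all to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily sales rows aggregated into monthly totals.
daily_sales = [
    {"month": "Jan", "amount": 120},
    {"month": "Jan", "amount": 80},
    {"month": "Feb", "amount": 150},
]
monthly_totals = aggregate_sum(daily_sales, "month", "amount")
```

For example, `min_max_normalize([10, 20, 30])` maps the smallest value to 0, the largest to 1, and the middle value to 0.5.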
• Modeling
The modeling phase uses mathematical models to determine patterns in the data.
Based on the business objectives, appropriate modeling techniques are chosen for the prepared dataset. A test scenario is created to check the quality and validity of the model. The model is then run on the prepared dataset. The results must be assessed together with stakeholders to make sure that the model satisfies the data mining objectives.
• Evaluation
In this stage, the identified patterns are evaluated against the business objectives.
The results produced by the data mining model are evaluated against the set of business objectives. Gaining business understanding is an iterative process, and new business requirements may be raised as a result of data mining. Finally, a decision is taken on whether to move the model into the deployment phase.
• Deployment
In this stage, the discoveries of data mining are put to use in everyday business operations.
The information or knowledge discovered during data mining should be presented so that nontechnical stakeholders can easily understand it. A detailed deployment plan is created for deploying, maintaining, and monitoring the data mining discoveries. A final report is written with the lessons learned, which can be used to improve the business policies of the organization.
1.3 Tools in data mining
Two data mining tools that are widely used in industry are as follows:
1. R language
R is a free tool for statistical computing and graphics. It offers a wide assortment of classical statistical tests, time-series analysis, and graphical methods. It also provides effective data handling and storage facilities.
2. Oracle Data Mining (ODM)
ODM is a component of the Oracle Advanced Analytics option of Oracle Database. It allows analysts to generate detailed insights and make more accurate predictions. It also helps to predict customer behavior, develop customer profiles, and identify cross-selling opportunities.
1.4 Data mining terminologies
A general data mining system consists of the following components:
1. Data warehouse, database, or other information repositories
This component consists of one or more databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration are carried out on the data.
2. Data warehouse server
The database or data warehouse server is responsible for fetching the relevant data in response to a data mining request.
3. Knowledge base
Domain knowledge is used to guide the search or to evaluate the interestingness of the resulting patterns. This knowledge can include concept hierarchies, which organize attribute values into different levels of abstraction. Knowledge such as user beliefs can be used to assess the interestingness of a pattern based on its unexpectedness. Other examples of domain knowledge include thresholds, interestingness constraints, and metadata.
4. Data mining engine
This component is essential to the data mining system and comprises a set of functional modules for tasks such as characterization, association analysis, classification, evolution analysis, and deviation analysis.
5. Module for pattern evaluation
This module applies interestingness measures and interacts with the data mining modules to focus the search on useful patterns. It can access interestingness thresholds stored in the knowledge base. Alternatively, pattern evaluation may be integrated with the mining module, depending on how the data mining method is implemented. For efficient data mining, it is advisable to push the evaluation of pattern interestingness as deep as possible into the mining process so that the search is confined to interesting patterns.
6. Graphical user interface (GUI)
The GUI sits between the user and the data mining system. It allows the user to interact with the system by specifying data mining queries, supplying information that helps to focus the search, and performing exploratory data mining based on intermediate results. It also lets the user browse database and data warehouse schemas, evaluate data structures and mined patterns, and visualize the patterns in various forms.
1.5 Merits of data mining
Data mining is beneficial in several areas, some of which are listed as follows:
1. Marketing or retail industries for making campaigns
Data mining helps marketing companies to build models from historical data that predict who will respond to new marketing campaigns, such as online marketing or direct mail. With these results, marketers have a suitable method for selling profitable products to targeted customers.
Data mining brings similar benefits to retail companies. With market basket analysis, a store can arrange its products so that items frequently bought together are easy for customers to find. The method also helps retail companies to offer discounts on specific products that attract many customers.
2. Finance or banking for determining fraudulent transactions
Data mining provides financial institutions with information about loans. By building a model from historical customer data, a bank can distinguish good loans from bad ones. Data mining also helps banks to detect fraudulent transactions and so protect credit card owners.
3. Manufacturing
By applying data mining, manufacturers can detect faulty equipment and determine optimal control parameters. For instance, data mining can identify the control parameters that lead to high-quality production, and manufacturers can then apply these parameters to obtain the desired quality.
4. Governments
Data mining helps government agencies to analyze records of financial transactions and build patterns that can reveal financial crimes or other criminal activities.
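The market basket analysis mentioned above for retail companies can be sketched with the two standard association measures, support and confidence. The transactions, the function names, and the minimum support threshold below are hypothetical and chosen only to illustrate the idea; production systems typically use algorithms such as Apriori rather than enumerating every pair.

```python
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def frequent_pairs(baskets, min_support):
    """All item pairs whose support reaches the assumed threshold."""
    items = sorted(set().union(*baskets))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, baskets)
        if s >= min_support:
            result[pair] = s
    return result


pairs = frequent_pairs(transactions, min_support=0.4)
butter_to_bread = confidence({"butter"}, {"bread"}, transactions)
```

In this toy data every basket that contains butter also contains bread, so the rule "butter implies bread" has a confidence of 1. A store could use such a rule to place the two products close together.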
1.6 Disadvantages of data mining
Some of the obstacles faced by data mining methods are as follows:
1. Human interaction
Because data mining problems are often not precisely stated, interfaces are needed with both domain experts and technical experts. Technical experts formulate the queries and help to interpret the results. Users are needed to identify the training data and the desired results.
2. Overfitting
When a model is built from a given database, it is desirable that the model also fits future states of the data. Overfitting occurs when the model fits the current state well but does not fit future states. It may be caused by assumptions made about the data or simply by a small training dataset, and it can arise in other situations as well, even when the data are not distorted.
3. Outliers
There are often many data entries that do not fit the derived model, and this becomes a greater issue with huge databases. If a model is built that includes these outliers, it may not perform well on data that are not outliers.
4. Massive datasets
The huge datasets associated with data mining create problems when techniques designed for small datasets are applied. Many modeling techniques described in the literature are inefficient on huge datasets. Sampling and parallelization are tools for attacking this scalability problem.
5. High dimensionality
A conventional database schema may contain many attributes. The problem is that not all of these attributes may be needed to solve a given data mining problem. The use of some attributes may even interfere with the correct completion of the data mining task, while the use of others may simply increase complexity and reduce the efficiency of the algorithm. This problem is known as the curse of dimensionality: many attributes (dimensions) are involved, and it is difficult to determine which ones should be used. One solution is to reduce the number of attributes, which is known as dimensionality reduction. However, determining which attributes are important is itself a difficult task.
6. Security issues
Security is a major issue when dealing with massive datasets. A business holds information about its customers, and protecting that information is a major challenge: attackers can gain access and steal essential customer data, resulting in large-scale data theft.
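The overfitting obstacle described in item 2 above can be illustrated with a small, self-contained sketch. It compares a model that memorizes the training data (a one-nearest-neighbour lookup) with a simple least-squares line. The synthetic data, the alternating noise pattern, and all function names are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Return the stored target of the nearest training point (1-NN)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        _, nearest_y = min(pairs, key=lambda p: abs(p[0] - x))
        return nearest_y

    return predict


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic training data: the trend y = 2x plus alternating noise of +1 / -1.
train_x = list(range(8))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]

# "Future" data drawn from the same trend without the noise.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

results = {
    "memorizer": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
}
```

The memorizer reproduces the training data exactly, so its training error is zero, yet it carries the noise over to the new points and its test error is larger than that of the line. The line has a nonzero training error but follows the underlying trend, which is the behaviour wanted for future data.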
1.7 Process of data mining
At the heart of the KDD process are the data mining techniques for extracting patterns from massive datasets. These techniques can have different goals depending on the intended outcome of the overall KDD process, and several techniques with different goals may be applied to attain the required result.
The majority of data mining goals fall under the following categories:
• Processing of data
Depending on the goals of the KDD process, the analyst may select, filter, aggregate, sample, clean, and transform the data for analysis. Automating some of the data processing tasks and integrating them seamlessly into the overall process may eliminate, or at least reduce, the need to write specialized routines for data import and export, and so improve the analyst's productivity.
• Prediction
Given a data item and a predictive model, the value of a specific attribute of the data item is predicted. For instance, a predictive model of credit card transactions can be used to predict the likelihood that a specific transaction is fraudulent. Prediction can also be used to validate a discovered hypothesis.
• Regression
Given a set of data items, regression is the analysis of the dependency of some attribute values on the values of other attributes of the same item, and the automatic production of a model that can predict these attribute values for new records.
Regression analysis can be used to model the relationship between one or more independent variables and a dependent variable. The independent variables are attributes whose values are already known, and the response (dependent) variable is the one to be predicted. Many real-world problems, however, are not this simple.
For example, sales volumes, stock prices, and product failure rates are difficult to predict because they depend on complex interactions among multiple predictor variables. More sophisticated methods, such as logistic regression, decision trees, and neural networks (NNs), may therefore be needed to forecast future values. The same model types can often be used for both regression and classification. For instance, Classification and Regression Trees (CART) is a decision tree algorithm that can build both classification trees, which classify categorical response variables, and regression trees, which forecast continuous response variables. NNs can likewise be built as classification or regression models.
Different types of regression techniques utilized for data mining are listed as follows:
• nonlinear regression
• multivariate nonlinear regression
• linear regression
• multivariate linear regression
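As a minimal sketch of the simplest entry in this list, linear regression with one independent variable can be fitted by ordinary least squares and then used to predict the response for a new record. The advertising and sales figures below are hypothetical, and the function names are chosen only for this illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical records: advertising spend (independent variable)
# and sales volume (response variable).
ad_spend = [1, 2, 3, 4, 5]
sales = [2.2, 4.1, 6.2, 7.9, 10.1]

slope, intercept = fit_simple_linear_regression(ad_spend, sales)
fit_quality = r_squared(ad_spend, sales, slope, intercept)

# Forecast the response variable for a new record with ad spend 6.
predicted_sales = intercept + slope * 6
```

For these figures the fitted slope is about 1.96 and the intercept about 0.22, so the forecast for a spend of 6 is roughly 11.98. The multivariate and nonlinear variants in the list extend the same idea to several predictors or to curved relationships.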
• Classification
Given a set of predefined categorical classes, the task is to determine to which of these classes a specific data item belongs.
Classification is a widely used data mining method that employs a set of preclassified examples to build a model that can classify records according to their class. Fraud detection and credit risk applications are particularly well suited to this type of analysis. The approach frequently employs decision tree or NN-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by the classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data. For a fraud detection application, the data would include complete records of both valid and fraudulent activities.
The classifier training algorithm uses these preclassified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier.
Different types of classification techniques:
• decision tree models
• NNs
• classification based on Bayesian rules
• classification based on associations
• support vector machines (SVM)
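The learning and classification steps described above can be sketched with a decision stump, that is, a decision tree with a single split. The transaction amounts, the 0/1 labels (1 for fraudulent), and the function names are hypothetical, and the sketch assumes the training data contain at least two distinct feature values.

```python
def train_stump(samples):
    """Learning step: choose the threshold and labeling with the fewest training errors.

    `samples` is a list of (feature_value, label) pairs with labels 0 or 1.
    """
    values = sorted({v for v, _ in samples})
    # Candidate thresholds lie midway between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        for above_label in (0, 1):
            errors = sum(
                (above_label if v > threshold else 1 - above_label) != y
                for v, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above_label)
    _, threshold, above_label = best
    return threshold, above_label


def classify(model, value):
    """Classification step: apply the learned rule to one value."""
    threshold, above_label = model
    return above_label if value > threshold else 1 - above_label


def accuracy(model, samples):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(classify(model, v) == y for v, y in samples) / len(samples)


# Hypothetical preclassified transactions: (amount, 1 if fraudulent else 0).
training = [(20, 0), (35, 0), (50, 0), (80, 0), (120, 0),
            (400, 1), (650, 1), (900, 1)]
testing = [(30, 0), (70, 0), (150, 0), (200, 1), (500, 1)]

model = train_stump(training)
test_accuracy = accuracy(model, testing)
```

Here the stump learns to label amounts above 260 as fraudulent. It classifies four of the five test transactions correctly, and if that accuracy is judged acceptable the rule can be applied to new transactions. A full decision tree repeats this kind of split recursively over several attributes.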
• Clustering
Considering a group of data items, the first step is partitioning of data into different classes like items with the same properties are grouped together. Clustering is a technique, which is utilized for determining the groups of item that are related. For instance, for specified dataset, the identification of subgroups that have the same buying behavior is a major