Artificial Intelligence in Data Mining: Theories and Applications
Ebook, 552 pages, 5 hours

About this ebook

Artificial Intelligence in Data Mining: Theories and Applications offers a comprehensive introduction to data mining theories, relevant AI techniques, and their many real-world applications. This book is written by experienced engineers for engineers, biomedical engineers, and researchers in neural networks, as well as computer scientists with an interest in the area.
  • Provides coverage of the fundamentals of Artificial Intelligence as applied to data mining, including computational intelligence and unsupervised learning methods for data clustering
  • Presents coverage of key topics such as heuristic methods for data clustering, deep learning methods for data classification, and neural networks
  • Includes case studies and real-world applications of AI techniques in data mining, for improved outcomes in clinical diagnosis, satellite data extraction, agriculture, security and defense
Language: English
Release date: Feb 17, 2021
ISBN: 9780128206164
    Book preview

    Artificial Intelligence in Data Mining - D. Binu

    Oman

    Preface

    D. Binu and B.R. Rajakumar

    Artificial intelligence (AI) has reached a level of maturity at which several of its methods have proved successful. Its capability is demonstrated in research projects ranging from decision making to emulation of the cognitive processes of human experts. Other successful AI models have led to descriptive theories of reasoning and to formal languages for representing discovered patterns and relations among data. Automation in society has considerably increased the capacity to produce and accumulate data from different sources, and the growing quantity of data has flooded every aspect of our lives. This growth in stored data has created an urgent need for novel methods and automatic tools that can intelligently help to transform huge volumes of data into useful information and knowledge. This need has led to a promising and growing frontier in information technology called data mining. Data mining has great potential to improve business outcomes. The significance of AI in data mining is well known, and it has been termed the oil of the cyber world.

    The book is designed to cover the key aspects of AI in data mining. It is divided into short chapters so that the topics can be arranged and understood properly, and the topics within each chapter are ordered to ensure a smooth flow. The book uses plain language to explain the fundamentals of the subject, and it offers a logical way of explaining complex concepts along with stepwise techniques for elaborating the important topics. Each chapter includes essential illustrations, practical examples, and solved problems. The chapters are sequenced so that each topic builds upon earlier ones. Every care has been taken to make learners comfortable with the basic concepts of the subject. The book not only covers the complete scope of the subject but also illustrates its philosophy, which makes the subject clearer and more interesting.

    This book gives learners adequate information to attain mastery of data mining and its applications. It covers data mining, biomedical data mining, data clustering, heuristic methods for clustering data, deep learning methods, neural networks for data classification, and the application of data mining in defense and security, without compromising on detail. The aim of the book is to illustrate concepts with practical examples so that learners can grasp the content more easily. Another important feature of the book is the elaboration of data mining algorithms with examples. Moreover, the book contains several educational features, such as chapter-wise abstracts, summaries, practical examples, and relevant references, to give beginners sound knowledge. It also offers students a guide for gaining knowledge of the technology. We hope that this book will motivate individuals of different backgrounds and experience to exchange their ideas on data mining so as to contribute to the further promotion and shaping of this exciting and dynamic field.

    I wish to convey my heartfelt thanks to all those who helped to make this book a reality. Any suggestions for improving the book will be acknowledged and much appreciated.

    1

    Introduction

    D. Binu and B.R. Rajakumar,    Resbee Info Technologies, India

    Abstract

    Data mining is a new domain that has arisen at the confluence of numerous disciplines concerned with massive databases. The stimulus behind data mining is that these massive databases contain information that is of high value to their owners, but this information is concealed and remains undiscovered. The aim of data mining is to extract valuable information from massive databases, a task closely related to exploratory data analysis. The exploration and analysis of massive data are extremely difficult and require considerable computational time. Visual data mining helps to deal with complex data by involving the user directly in the data mining process, and many data visualization methods have been designed to support the exploration of huge datasets. This chapter introduces data mining techniques and the methodologies adopted for extracting interesting data.

    Keywords

    Data mining; data warehouse server; regression; prediction; classification; information visualization; visual data mining; visual data exploration; knowledge discovery; artificial intelligence approach

    1.1 Data mining

    Data mining is a popular research domain that has attracted the interest of many industries. Because of the massive size of available data, there is a pressing need to turn such data into useful information and knowledge. The knowledge acquired is used in applications such as production control, science exploration, engineering design, business management, and market analysis. Data mining can be regarded as a result of growing datasets and of the evolution of information technology. This evolutionary path can be observed in the database industry in the development of successive capabilities: data collection and database creation, data management (including data storage and retrieval), and data analysis and understanding.

    Since the 1960s, information technology and databases have evolved systematically from primitive file processing models to sophisticated and powerful database systems. Research and development on database systems since the 1970s have led to relational databases, data organization methods, indexing, and data modeling tools. Moreover, users gained convenient data access through user interfaces, query languages, and query processing. Simply stated, data mining is a technique for extracting knowledge from massive datasets.

    The current evolution of data mining functions and products is the result of influences from different disciplines, such as information retrieval, databases, machine learning, and statistics. Other areas of computer science, such as multimedia and graphics, have also had a major impact on the Knowledge Discovery in Databases (KDD) process. KDD refers to the overall process of discovering useful knowledge from data. A major goal of KDD is to present the outcomes of the process in a meaningful manner; because many different results are often generated, this is a nontrivial problem.

    Visualization methods include sophisticated multimedia and graphics presentations, and data mining strategies can in turn be applied to multimedia applications. In contrast to earlier research in these separate areas, a major trend in the database community is to integrate the results from the different disciplines into a unified data or algorithmic approach. The goal of this approach is to develop a big-picture view of the area that enables different types of applications to be incorporated into real-world user domains.

    Data mining is a multidisciplinary domain that supports knowledge workers who try to extract useful information from huge datasets. The concept is rooted in the idea of extracting knowledge from massive data. Data mining tools help to discover pertinent information by applying several data analysis methods. Thus any method employed for extracting patterns from a huge data source is considered a data mining method.

    1.2 Description of data mining

    Data mining is considered a part of computer science and refers to the process of discovering patterns in huge datasets. It draws on methods from statistics, artificial intelligence, database systems, and machine learning. The aim of data mining is to extract essential information from a dataset and convert it into a comprehensible structure for later use. Beyond the raw analysis step, it also involves database management aspects, data preprocessing, interestingness metrics, inference considerations, computational complexity, visualization, and online updating.

    Data mining plays an essential role in knowledge discovery, which proceeds by analyzing huge datasets and acquiring useful knowledge from them. It is applied effectively in business, medicine, insurance, weather forecasting, transportation, healthcare, and government. These applications offer substantial benefits to the industries that use them.

    1.2.1 Different databases adapted for data mining

    Data mining can be carried out on the following kinds of data:

    • relational databases

    • advanced databases and data repositories

    • transactional and spatial databases

    • object-oriented and object-relational databases

    • data warehouses

    • diverse databases

    • text databases

    • multimedia databases

    • text mining and web mining

    1.2.2 Different steps in design process for mining data

    Fig. 1–1 depicts the process of mining data.

    Figure 1–1 Process of mining data.

    • Understanding business

    This phase establishes the goals of data mining.

    First, it is important to understand the client's objectives; what the client wants must be carefully examined. Next, take stock of the current data mining situation, considering factors such as resources, assumptions, and constraints in the assessment. The data mining goals are then clearly defined from the business objectives and the analysis of the current situation. Finally, a sound data mining plan is drawn up to accomplish both the data mining goals and the business goals.

    • Understanding data

    This phase checks the data to determine whether they are suitable for attaining the goals of data mining.

    First, the data are gathered from the different sources available to the business. These sources may include multiple databases, data cubes, or flat files. Issues such as schema integration and object matching can arise during the data integration process. Integration is complicated and tricky because data from different sources are unlikely to match easily. It is therefore difficult to establish whether given objects from two sources refer to the same entity. Here, metadata should be used to reduce errors in the data integration process. The next step is to explore the properties of the gathered data; a good way to do so is to answer the data mining questions using queries, reporting, and visualization tools. The quality of the data can be assessed from the query results. Missing data should be identified and, where possible, acquired or filled in.

    • Preparation of data

    This phase makes the data ready for knowledge extraction. The data are processed to prepare them for production use. Here, the data from the various sources are selected, cleaned, transformed, anonymized, formatted, and constructed for data mining.

    • Data cleaning

    Data cleaning removes noisy data and fills in missing values.

    For instance, if the age is not filled in for a customer profile, the record is incomplete and the value must be supplied. In some cases the data contain outliers, for example an age of 300, which cannot be valid. The data must also be consistent.
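    The age example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clean_ages`, the plausible range of 0 to 120, and the choice of the median as the fill value are all hypothetical and are not prescribed by the text.

```python
from statistics import median

def clean_ages(ages, low=0, high=120):
    """Treat implausible ages as missing, then fill every gap with the median.

    The range [low, high] and the median fill are illustrative assumptions.
    """
    # Values outside the plausible range (e.g. 300) are outliers; mark them missing.
    checked = [a if a is not None and low <= a <= high else None for a in ages]
    known = [a for a in checked if a is not None]
    fill = median(known)  # assumes at least one valid age is present
    return [fill if a is None else a for a in checked]

raw_ages = [34, None, 29, 300, 41]   # one missing value, one impossible value
print(clean_ages(raw_ages))
```

    Other fill strategies, such as the mean or a value predicted from other attributes, fit the same structure; only the computation of `fill` changes.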

    • Transformation of data

    Data transformation operations contribute to the success of the mining process. Transformation alters the data to make them useful for mining. Some of the transformations employed are listed as follows:

    • Smoothing

    Smoothing helps to remove noise from the data.

    • Aggregation

    Aggregation operations are applied to the data to produce a concise summary.

    • Generalization

    In generalization, low-level data are replaced with higher-level concepts.

    • Normalization

    In normalization, the attribute values are scaled to fall within a certain range, for instance 0 to 1.

    • Attribute design

    New attributes are constructed from the given attributes to assist data mining.

    The transformed data can be utilized as the final dataset for performing modeling.
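    Four of the transformations listed above can be sketched as small Python functions. The function names, the moving-average window, the seven-day aggregation period, and the age bands are assumptions chosen for illustration; the text does not fix any particular method for these steps.

```python
def smooth(values, window=3):
    """Smoothing: replace each point with the mean of its neighbourhood."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def aggregate(daily_values, period=7):
    """Aggregation: summarize daily figures as one total per period."""
    return [sum(daily_values[i:i + period])
            for i in range(0, len(daily_values), period)]

def generalize_age(age):
    """Generalization: replace a raw age with a higher-level concept.
    The band boundaries are hypothetical."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

def min_max(values):
    """Normalization: rescale values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(smooth([1, 2, 9, 2, 1]))
print(aggregate([1] * 14))
print(generalize_age(30))
print(min_max([10, 20, 30]))
```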

    • Modeling

    The modeling phase uses mathematical models to determine patterns in the data.

    Based on the business objectives, appropriate modeling methods are chosen for the prepared dataset. A test scenario is constructed to check the quality and validity of the model. The model is then run on the prepared dataset. The results must be evaluated with the stakeholders to make sure that the model satisfies all the data mining objectives.

    • Evaluation

    In this stage, the identified patterns are evaluated against the goals of the business.

    The results produced by the data mining model are evaluated against the set of business objectives. Gaining business understanding is an iterative process; as the results are consolidated, new business requirements may emerge because of the data mining. Finally, a decision is made on whether to move the model into the deployment phase.

    • Deployment

    In this stage, the discoveries of data mining are put to use in everyday business operations.

    The information or knowledge discovered through data mining must be presented so that nontechnical stakeholders can easily understand it. A comprehensive deployment plan is prepared for monitoring and maintaining the data mining discoveries. A final report records the lessons learned and can be used to improve the business policies of the organization.

    1.3 Tools in data mining

    Two data mining tools that are widely used in industry are listed as follows:

    1. R-language

    R is a free tool for statistical computing and graphics. It offers an assortment of classical statistical tests, graphical methods, and time-series analysis. Moreover, it provides effective data handling and storage facilities.

    2. Oracle data mining (ODM)

    ODM is a component of Oracle Advanced Analytics. This tool allows analysts to produce detailed insights and make more accurate predictions. Moreover, it helps to predict customer behavior, develop customer profiles, and identify cross-selling opportunities.

    1.4 Data mining terminologies

    A general data mining model consists of the following components:

    1. Data warehouse, database, or other information repositories

    This component consists of one or more databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration are carried out on the data.

    2. Data warehouse server

    The database or data warehouse server is responsible for fetching the relevant data in response to the user's data mining request.

    3. Knowledge base

    Domain knowledge is used to guide the search or to evaluate the interestingness of the resulting patterns. This knowledge includes concept hierarchies, which are used to organize attribute values into levels of abstraction. Knowledge such as user beliefs can be used to assess the interestingness of a pattern based on its unexpectedness. Other examples of domain knowledge include interestingness constraints, thresholds, and metadata.

    4. Data mining engine

    This engine is essential to the data mining model and comprises a set of functional modules for tasks such as characterization, association analysis, classification, evolution analysis, and deviation analysis.

    5. Module for pattern evaluation

    This module applies interestingness metrics and interacts with the data mining modules to focus the search on useful patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, pattern evaluation may be integrated with the mining module, depending on how the data mining method is implemented. For efficient data mining, it is advisable to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to interesting patterns.

    6. Graphical user interface (GUI)

    The GUI provides an interface between users and the data mining system. It allows the user to interact with the system by specifying a data mining query, to provide information that helps focus the search, and to perform exploratory data mining based on intermediate results. Moreover, the GUI allows users to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in various forms.

    1.5 Merits of data mining

    Data mining is beneficial in several areas, some of which are listed as follows:

    1. Marketing or retail industries for making campaigns

    Data mining helps marketing companies to build models from historical data that predict who will respond to new marketing campaigns, such as online marketing and direct mail. With these results, marketers have a suitable approach for selling profitable products to targeted customers.

    Data mining offers similar benefits to retail companies. With market basket analysis, a store can arrange its products so that items frequently bought together are easy for customers to find. Moreover, the method helps retail companies to offer discounts on specific products that attract many customers.
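    To make market basket analysis concrete, the following minimal sketch counts how often pairs of items occur in the same basket and reports the support and confidence of each frequent pair. The function name `pair_rules`, the toy baskets, and the 0.5 support threshold are hypothetical; a production system would more likely use a dedicated frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return (a, b, support, confidence of a -> b) for frequent item pairs."""
    n = len(baskets)
    # Count each item and each unordered pair once per basket.
    item_count = Counter(item for b in baskets for item in set(b))
    pair_count = Counter(pair for b in baskets
                         for pair in combinations(sorted(set(b)), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                  # share of baskets containing both
        if support >= min_support:
            confidence = count / item_count[a]   # share of a-baskets that also hold b
            rules.append((a, b, support, confidence))
    return rules

baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

    Pairs with high support and confidence are candidates for being shelved together or promoted jointly.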

    2. Finance or banking for determining fraudulent transactions

    Data mining provides financial institutions with information about loans. By building a model from historical customer data, a bank can distinguish good loans from bad ones. Moreover, data mining helps banks to detect fraudulent credit card transactions and so protect the card owners.

    3. Manufacturing

    By applying data mining, manufacturers can detect faulty equipment and determine the optimal control parameters. In addition, data mining is applied to determine the ranges of control parameters that lead to high production. These parameters are then used by manufacturers to produce goods of the desired quality.

    4. Governments

    Data mining helps government agencies by analyzing records of financial transactions to build patterns that can detect criminal activity or financial offenses.

    1.6 Disadvantages of data mining

    Some of the obstacles faced by data mining methods are described as follows:

    1. Human interaction

    As data mining problems are often not precisely stated, interfaces are needed with both domain experts and technical experts. The technical experts formulate the queries and help interpret the results. The domain experts are required to identify the training data and the desired results.

    2. Overfitting

    When a model is produced from a given database, it is desirable that the model also fit future states. Overfitting occurs when the model does not fit future states. This may be caused by assumptions made about the data or by a small training dataset. Overfitting can occur in other situations as well, even when the data are not distorted.
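    Overfitting can be observed by comparing accuracy on the training data with accuracy on held-out data. The sketch below is an assumption-laden illustration: the synthetic data generator, the 20% label noise, and the nearest-neighbour "memorizer" are all hypothetical choices made only to expose the gap between the two scores.

```python
import random

def make_data(n, rng, noise=0.2):
    """Synthetic data: label is 1 when x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y                 # label noise the model should not learn
        data.append((x, y))
    return data

def memorizer(train):
    """Overfitted model: copy the label of the closest training point."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def threshold_model(x):
    """Simple model that ignores the noise and uses the underlying rule."""
    return int(x > 0.5)

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(100, rng), make_data(200, rng)
overfit = memorizer(train)
print("memorizer  train/test:", accuracy(overfit, train), accuracy(overfit, test))
print("threshold  train/test:", accuracy(threshold_model, train),
      accuracy(threshold_model, test))
```

    The memorizer reproduces every training label, noise included, so its training score is perfect while its score on unseen data drops; the simpler rule scores about the same on both sets.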

    3. Outliers

    There are often many data entries that do not fit into the derived model. This becomes an issue with huge databases. If a model is designed to include these outliers, it may not perform well on data that are not outliers.

    4. Massive datasets

    The huge datasets associated with data mining create issues when techniques designed for small datasets are applied. Many modeling applications described in the literature are inefficient on huge datasets. Parallelization and sampling are tools for attacking this scalability issue.
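    As one example of the sampling idea, reservoir sampling draws a fixed-size uniform sample from a dataset that is too large to hold in memory, reading it only once. The text does not name a specific sampling method, so this choice and the function name `reservoir_sample` are assumptions for illustration.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)      # randint is inclusive on both ends
            if j < k:
                sample[j] = item       # replace an entry with probability k / (i + 1)
    return sample

# The stream is consumed one record at a time, so it never has to fit in memory.
subset = reservoir_sample(range(1_000_000), 5, random.Random(1))
print(subset)
```

    A model designed for small datasets can then be fitted to `subset` instead of the full data, at the cost of some sampling error.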

    5. High dimensionality

    A conventional database schema may consist of many different attributes. The issue here is that not all of these attributes may be needed to solve a given data mining problem. The use of some attributes may interfere with the correct completion of the data mining task, and the use of others may increase the complexity and reduce the efficiency of the algorithm. This issue is known as the curse of dimensionality: many attributes are involved, and it is difficult to determine which ones to use. One solution is to reduce the number of attributes, which is termed dimensionality reduction. However, determining which attributes are important is itself a complex task.
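    One simple way to reduce the number of attributes is to rank them by the strength of their linear relationship with the target and keep only the top few. This is a sketch of just one heuristic among many; the function names, the toy columns, and the use of Pearson correlation are assumptions, and correlation ranking can miss attributes that matter only in combination.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def top_attributes(columns, target, k):
    """Keep the k attributes most strongly correlated with the target."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:k]

# Hypothetical customer table: only "income" tracks the target closely.
columns = {
    "income": [20, 35, 50, 65, 80],
    "shoe_size": [42, 38, 44, 39, 41],
    "constant": [1, 1, 1, 1, 1],
}
target = [1, 2, 3, 4, 5]
print(top_attributes(columns, target, 2))
```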

    6. Security issues

    Security is a major issue when dealing with massive datasets. A business holds information about its customers, but protecting that information is a major challenge: hackers can gain access and steal essential customer data, which can lead to serious data theft.

    1.7 Process of data mining

    The heart of the KDD process is the set of data mining techniques for extracting patterns from massive datasets. These techniques can have different goals depending on the intended outcome of the overall KDD process, and several techniques with different goals may be applied to attain the required result.

    Most data mining goals fall into the following categories:

    • Processing of data

    Depending on the goals of the KDD process, analysts may select, filter, aggregate, sample, clean, and transform data for analysis. Automating many of these data processing tasks and integrating them seamlessly into the overall process may eliminate or reduce the need to program specialized routines for data import and export, thereby improving analyst productivity.

    • Prediction

    Given a data item and a predictive model, one can predict the value of a particular attribute of that item. For instance, a predictive model of credit card transactions can be used to predict the likelihood that a given transaction is fraudulent. Prediction can also be used to validate a discovered hypothesis.

    • Regression

    Given a set of data items, regression is the analysis of the dependency of some attribute values upon the values of other attributes in the same item, and the automatic production of a model that can predict these attribute values for new records.

    Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. The independent variables are attributes whose values are already known, and the response variables are the values to be predicted. Many real-world problems, however, are not simple prediction tasks.

    For example, sales volumes, stock prices, and product failure rates are difficult to forecast because they depend on complex interactions among multiple predictor variables. More complex methods, such as logistic regression, decision trees, and neural networks (NNs), may therefore be needed to forecast future values. The same model types can often be used for both regression and classification. For instance, Classification and Regression Trees (CART) is a decision tree algorithm that builds classification trees to classify categorical response variables and regression trees to forecast continuous response variables. NNs can likewise be constructed as regression or classification models.

    Different types of regression techniques utilized for data mining are listed as follows:

    • nonlinear regression

    • multivariate nonlinear regression

    • linear regression

    • multivariate linear regression
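    The simplest technique in the list above, linear regression with one independent variable, can be sketched with the ordinary least-squares formulas. The function names and the toy data are assumptions for illustration; the multivariate and nonlinear variants need a numerical library or an iterative solver and are not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)                  # spread of the predictor
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                                     # assumes xs are not all equal
    intercept = my - slope * mx
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Hypothetical records: x is the known attribute, y the response to be predicted.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)
print(model)               # fitted slope and intercept
print(predict(model, 5))   # forecast for a new record
```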

    • Classification

    Given a set of predefined categorical classes, the task is to determine the class to which a specific data item belongs.

    Classification is a widely used data mining method that employs a set of preclassified examples to develop a model that can classify data by class. Fraud detection and credit risk applications are particularly well suited to this type of analysis. The method frequently employs decision tree or NN-based classification algorithms. The data classification process consists of learning and classification. In learning, the training data are analyzed by the classification algorithm. In classification, the test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data. For fraud detection, the training data would include complete records of both valid and fraudulent activities.

    The classifier training algorithm uses these preclassified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier.

    Different types of classification techniques:

    • decision tree models

    • NNs

    • classification based on Bayesian rules

    • classification based on associations

    • support vector machines (SVM)
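    The two phases described above, learning on training data and then estimating accuracy on test data, can be sketched with a one-level decision tree (a decision stump). The single `amount` attribute, the toy transactions, and the function names are hypothetical; real fraud detection models use many attributes and far more data.

```python
def train_stump(train):
    """Learning phase: pick the amount threshold with the fewest training errors."""
    best_threshold, best_errors = None, None
    for t in sorted({amount for amount, _ in train}):
        # Rule under test: a transaction is fraudulent when amount > t.
        errors = sum((amount > t) != is_fraud for amount, is_fraud in train)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

def classify(threshold, amount):
    """Classification phase: apply the learned rule to one record."""
    return amount > threshold

def accuracy(threshold, data):
    """Share of records whose predicted class matches the known label."""
    return sum(classify(threshold, a) == label for a, label in data) / len(data)

# Preclassified examples: (transaction amount, is_fraud)
train = [(20, False), (35, False), (50, False), (900, True), (1200, True), (60, False)]
test = [(25, False), (1000, True), (70, False), (40, False)]

threshold = train_stump(train)
print("learned threshold:", threshold)
print("estimated accuracy on test data:", accuracy(threshold, test))
```

    If the estimated accuracy were acceptable, `classify` would then be applied to new, unlabeled transactions.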

    • Clustering

    Given a group of data items, clustering partitions the data into classes so that items with similar properties are grouped together. Clustering is a technique for finding groups of related items. For instance, for a given dataset, the identification of subgroups that have the same buying behavior is a major
