Pathways to Machine Learning and Soft Computing
邁向機器學習與軟計算之路 (International English Edition)

Language: English
Publisher: EHGBooks
Release date: July 1, 2018
ISBN: 9781647848569



    Pathways to Machine Learning and Soft Computing

    Jyh-Horng Jeng, Jer-Guang Hsieh, Yih-Lon Lin, and Ying-Sheng Kuo

    About Author

    Jer-Guang Hsieh received the Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, New York, U.S.A., in 1985.  He was with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, from 1985 to 2008.  Currently, he is a Chair Professor at the Department of Electrical Engineering, I-Shou University, Kaohsiung, Taiwan.  He is also a Chair Professor at the Department of Automatic Control Engineering, Feng Chia University, Taichung, Taiwan.  Dr. Hsieh is the recipient of the 1983 Henry J. Nolte Memorial Prize of Rensselaer Polytechnic Institute.  He won the Distinguished Teaching Award in 1988 and the Best Prize in a competition of microcomputer design packages for teaching and research in 1989, both from the Ministry of Education of the Republic of China.  He won the Young Engineer Prize from the Chinese Engineers Association in 1994.  He is a member of the Phi Tau Phi Scholastic Honor Society of the Republic of China and a violinist of the Kaohsiung Chamber Orchestra.  His current research interests are in the areas of nonlinear control, machine learning and soft computing, and differential games.

    Jyh-Horng Jeng received the B.S. and M.S. degrees in mathematics from Tamkang University, Taiwan, in 1982 and 1984, respectively, and the Ph.D. degree in mathematics (Information Group) from The State University of New York at Buffalo (SUNY, Buffalo) in 1996.  He was a  Senior Research Engineer at the Chung Shan Institute of Science and Technology (CSIST), Taiwan, from 1984 to 1992.  Currently, he is a Professor at the Department of Information Engineering, I-Shou University, Taiwan.  His research interests include multimedia applications, AI, soft computing and machine learning.

    Yih-Lon Lin received the B.S. degree from the Department of Electronic Engineering, I-Shou University, Kaohsiung, Taiwan, in 1997 and the M.S. and Ph.D. degrees from the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, in 1999 and 2006, respectively.  Currently, he is an Associate Professor at the Department of Information Engineering, I-Shou University.  His research interests include neural networks, fuzzy systems, and machine learning.

    Ying-Sheng Kuo received the B.S. degree in Mechanical Engineering from Feng Chia University, Taichung, Taiwan, in 1988 and the M.S. and Ph.D. degrees in Mechanical Engineering from National Cheng Kung University, Tainan, Taiwan, in 1991 and 1995, respectively.  Currently, he is an Associate Professor at the General Education Center, Open University of Kaohsiung, Kaohsiung, Taiwan. His research interests include machine learning, soft computing, and computational fluid dynamics.

    About the Book

    This book presents frequently studied and used learning machines together with soft computing methods such as evolutionary computation.  The main machine learning topics cover Artificial Neural Networks (ANNs), Radial Basis Function Networks (RBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs).  The soft computing methods include the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).

    The contents cover the basics of machine learning, including the construction of models and the derivation of learning algorithms.  This book also provides many examples, figures, illustrations, tables, exercises, and solutions.  In addition, simulated and validated code written in R is provided so that readers can learn the programming procedure and carry it over to other programming languages.  The R code works correctly on many simulated datasets, so readers can verify their own code by comparison.  Reading this book will give readers a strong foundation in machine learning.

    One of the most important features of this book is that we provide step-by-step illustrations, referred to as pre-pseudo codes, for every algorithm.  The pre-pseudo codes arrange complicated algorithms in the form of mathematical equations that are ready for programming in any language.  This means that students and engineers can easily implement the algorithms from the pre-pseudo codes even if they do not fully understand the underlying ideas.  Conversely, implementing the pre-pseudo codes will help them understand the ideas.

    Brief Sketch of the Contents

    The book starts with an introduction to machine learning.  More emphasis is put on supervised learning, including classification learning (or pattern recognition) and function learning (or regression estimation).  The bias-variance dilemma, which arises in every machine learning problem, is illustrated through a numerical example.
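
    To convey the flavor of this dilemma, here is a minimal R sketch of our own (the toy target, noise level, and polynomial degrees are our choices, not the book's numerical example): fitting polynomials of degree 1 and degree 9 to repeated noisy samples and comparing squared bias and variance of the predictions at a test point.

    # Bias-variance sketch: low-degree fits are biased but stable,
    # high-degree fits are nearly unbiased but highly variable.
    set.seed(3)
    true_f <- function(x) sin(2 * pi * x)
    x0 <- 0.25                                 # test point
    preds <- function(deg, reps = 200) {
      sapply(1:reps, function(r) {
        x <- runif(20)
        y <- true_f(x) + rnorm(20, sd = 0.3)   # fresh noisy sample each time
        fit <- lm(y ~ poly(x, deg, raw = TRUE))
        predict(fit, newdata = data.frame(x = x0))
      })
    }
    for (deg in c(1, 9)) {
      p <- preds(deg)
      cat(sprintf("degree %d: bias^2 = %.4f, variance = %.4f\n",
                  deg, (mean(p) - true_f(x0))^2, var(p)))
    }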

    Since machine learning problems usually involve finite-dimensional optimization problems, a solid background in optimization theory is crucial for a sound understanding of machine learning processes.  We will briefly review some fundamental concepts and important results of finite-dimensional optimization theory in Chapter 2.

    The mathematical analysis of learning processes truly began with Rosenblatt's algorithms for perceptrons, followed by the Widrow-Hoff algorithms for Adalines (adaptive linear neurons).  Remarkably, these algorithms already provide hints of kernel-based learning machines for classification and regression if we consider their dual forms.  The concept of the kernel is the basis of support vector machines.

    Linear classification problems are studied in Chapter 3.  A linear classifier can be represented as a single-layer neural network with a hard-limiting output activation function.  Rosenblatt's perceptron algorithms for linearly separable training data sets are introduced.  A large margin in linear classification provides maximum robustness against perturbations.  This motivates the introduction of maximal margin classifiers.  To allow some misclassifications for linearly inseparable data, we introduce slack variables for classification problems.  Based on this, soft margin classifiers (or linear support vector classifiers) are studied.
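
    For concreteness, here is a minimal R sketch of the perceptron update rule on a separable toy set (our illustration, not the book's validated code; the learning rate and data are arbitrary):

    # Rosenblatt's perceptron: cycle through the data and nudge the
    # weight vector whenever a training pair is misclassified.
    set.seed(1)
    n <- 40
    X <- cbind(runif(n, -1, 1), runif(n, -1, 1), 1)  # third column = bias input
    y <- ifelse(X[, 1] + X[, 2] > 0, 1, -1)          # separable by x1 + x2 = 0
    w <- c(0, 0, 0)
    eta <- 0.1
    repeat {
      mistakes <- 0
      for (i in 1:n) {
        if (y[i] * sum(w * X[i, ]) <= 0) {           # wrong side (or on boundary)
          w <- w + eta * y[i] * X[i, ]               # perceptron update
          mistakes <- mistakes + 1
        }
      }
      if (mistakes == 0) break                       # guaranteed for separable data
    }
    print(w)                                         # learned weights and bias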

    Linear regression problems are studied in Chapter 4.  A linear regressor can be represented as a single-layer neural network with a linear output activation function.  The Widrow-Hoff algorithms, also called the delta learning rules, are derived for finding the least squares solutions.  To smooth the predictive functions and to tolerate the errors in corrupted data, we will consider the ridge regression and linear support vector regression.
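
    In the same spirit, here is a minimal R sketch of the delta rule on a toy line-fitting problem, with a ridge solution in closed form for comparison (the data and constants are our choices):

    # Widrow-Hoff (delta / LMS) rule: stochastic gradient descent on squared error.
    set.seed(2)
    n <- 100
    x <- runif(n)
    y <- 2 * x + 1 + rnorm(n, sd = 0.1)   # true line: y = 2x + 1
    X <- cbind(x, 1)                      # second column = bias input
    w <- c(0, 0)
    eta <- 0.05
    for (epoch in 1:200) {
      for (i in 1:n) {
        e <- y[i] - sum(w * X[i, ])       # prediction error (the "delta")
        w <- w + eta * e * X[i, ]         # LMS update
      }
    }
    # Ridge regression in closed form: (X'X + lambda I)^{-1} X'y
    # (for simplicity the bias is penalized here too).
    lambda <- 0.1
    w_ridge <- solve(t(X) %*% X + lambda * diag(2), t(X) %*% y)
    print(w)                              # both estimates are close to (2, 1)
    print(w_ridge)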

    Three popular and powerful learning machines, namely artificial neural networks, generalized radial basis function networks, and fuzzy neural networks, are introduced in Chapter 5.  All three can be represented as multi-layer neural networks with one hidden layer, where the activation functions of the hidden nodes are nonlinear and continuously differentiable.  The simple back propagation algorithm, a direct generalization of the delta learning rule used in the Widrow-Hoff algorithm for Adalines, is introduced.  The invention of the back propagation learning rules was a major breakthrough in machine learning theory.
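
    To show the mechanics, here is a single back propagation step for a network with one sigmoid hidden layer and a linear output, in R (our toy dimensions and notation, not the book's pre-pseudo codes):

    # One gradient step on half the squared error for a single training pair.
    sigmoid <- function(z) 1 / (1 + exp(-z))
    set.seed(4)
    x <- c(0.5, -0.2); target <- 0.7             # one input-output pair
    W1 <- matrix(rnorm(6, sd = 0.5), 3, 2)       # 3 hidden units, 2 inputs
    b1 <- rnorm(3); w2 <- rnorm(3); b2 <- rnorm(1)
    eta <- 0.1
    h <- sigmoid(W1 %*% x + b1)                  # forward pass: hidden activations
    out <- sum(w2 * h) + b2                      # forward pass: linear output
    d2 <- out - target                           # output delta
    d1 <- d2 * w2 * h * (1 - h)                  # deltas propagated backward
    w2 <- w2 - eta * d2 * h                      # gradient descent updates
    b2 <- b2 - eta * d2
    W1 <- W1 - eta * d1 %*% t(x)
    b1 <- b1 - eta * d1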

    At first glance, it may seem strange that we spend so much effort on linear classification and linear regression problems, since our world is truly nonlinear in whatever sense.  It will be seen that the commonly used learning machines, including those introduced in Chapter 5, nonlinearly transform the input vectors to a feature space and perform generalized linear regression in that feature space to produce the output vectors.  Amazingly, it is rather trivial to go from linear classification and regression to kernel-based nonlinear classification and regression by applying the so-called kernel trick: one simply replaces the inner products by kernels.  This kernel-based approach results in the support vector machines.  The idea of a kernel generalizes the standard inner product in finite-dimensional Euclidean space.  Kernels are studied in Chapter 6.
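
    For example (a small R sketch; the kernel names and parameter values are our choices):

    # Kernels generalize the inner product <x, z>; the kernel trick replaces
    # every inner product in a linear algorithm by one of these functions.
    linear_kernel <- function(x, z) sum(x * z)
    poly_kernel   <- function(x, z, d = 2, c = 1) (sum(x * z) + c)^d
    rbf_kernel    <- function(x, z, sigma = 1) exp(-sum((x - z)^2) / (2 * sigma^2))
    # Gram (kernel) matrix of a small sample: symmetric positive semidefinite.
    set.seed(6)
    X <- matrix(rnorm(10), nrow = 5)             # 5 points in R^2
    K <- outer(1:5, 1:5,
               Vectorize(function(i, j) rbf_kernel(X[i, ], X[j, ])))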

    To numerically solve the kernel-based classification and regression problems, we introduce an elegant and powerful sequential minimal optimization technique in Chapter 7. 
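
    To give the flavor of the technique (a sketch of Platt's analytic two-variable step in our notation; consult Chapter 7 for the book's own development): at each iteration SMO holds all but two multipliers fixed and solves for the chosen pair in closed form.  Writing $E_1, E_2$ for the prediction errors at the two points, the unclipped update of the second multiplier is

    \[
    \eta = K(x_1, x_1) + K(x_2, x_2) - 2K(x_1, x_2), \qquad
    \alpha_2^{\mathrm{new}} = \alpha_2 + \frac{y_2 (E_1 - E_2)}{\eta},
    \]

    after which $\alpha_2^{\mathrm{new}}$ is clipped to the box $[L, H]$ implied by the constraints, and $\alpha_1$ is adjusted so that $\sum_i \alpha_i y_i$ stays constant.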

    Every learning problem has some (machine) parameters to be specified in advance.  This is the problem of model selection, which is studied in Chapter 8.  Two powerful evolutionary computation techniques, i.e., genetic algorithm and particle swarm optimization, are applied for tuning the parameters of support vector machines. 
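
    As an illustration of the second technique, here is a minimal R sketch of PSO minimizing a one-dimensional objective, which would stand in for, e.g., a cross-validation error as a function of one machine parameter (all constants are our choices):

    # Particle swarm optimization: each particle is attracted to its own best
    # position (pbest) and to the swarm's best position (gbest).
    pso_min <- function(f, lower, upper, n_particles = 20, iters = 100,
                        w = 0.7, c1 = 1.5, c2 = 1.5) {
      x <- runif(n_particles, lower, upper)      # positions
      v <- rep(0, n_particles)                   # velocities
      pbest <- x
      pbest_f <- sapply(x, f)
      gbest <- pbest[which.min(pbest_f)]
      for (t in 1:iters) {
        v <- w * v + c1 * runif(n_particles) * (pbest - x) +
                     c2 * runif(n_particles) * (gbest - x)
        x <- pmin(pmax(x + v, lower), upper)     # clamp to the search interval
        fx <- sapply(x, f)
        better <- fx < pbest_f
        pbest[better] <- x[better]
        pbest_f[better] <- fx[better]
        gbest <- pbest[which.min(pbest_f)]
      }
      gbest
    }
    pso_min(function(s) (s - 2)^2 + 1, 0, 5)     # returns a value near 2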

    In a broad range of practical applications, data collected inevitably contain one or more atypical observations called outliers; that is, observations that are well separated from the majority or bulk of the data, or in some fashion deviate from the general pattern of the data.  As is well known in linear regression theory, classical least squares fit of a regression model can be very adversely influenced by outliers, even by a single one, and often fails to provide a good fit to the bulk of the data.  Robust regression that is resistant to the adverse effects of outlying response values offers a half-way house between including outliers and omitting them entirely.  Rather than omitting outliers, it dampens their influence on the fitted regression model by down-weighting them.  It is desirable that the robust estimates provide a good fit for the majority of the data when the data contain outliers, as well as when the data are free of them.  A learning machine is said to be robust if it is not sensitive to outliers in the data.

    The newly developed Wilcoxon learning machines are studied in Chapter 8.  They were developed by extending the R-estimators frequently used in the robust regression paradigm to nonparametric learning machines for nonlinear learning problems.  These machines are based on minimizing the rank-based Wilcoxon norm of the total residuals and are quite robust against (or insensitive to) outliers.  It is our firm belief that the Wilcoxon approach will provide a promising methodology for many machine learning problems.
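
    Concretely, with the standard Wilcoxon score function a(i) = sqrt(12) * (i/(n+1) - 1/2) from the rank-regression literature, the Wilcoxon norm of a residual vector can be computed as follows (our sketch; Chapter 8 gives the precise definitions):

    # Wilcoxon pseudo-norm: residuals are weighted by scores of their ranks,
    # so a gross outlier contributes linearly rather than quadratically.
    wilcoxon_norm <- function(e) {
      n <- length(e)
      a <- sqrt(12) * (rank(e) / (n + 1) - 0.5)  # score of each residual's rank
      sum(a * e)
    }
    set.seed(7)
    e <- rnorm(50)
    e_out <- c(e, 100)                           # add one gross outlier
    c(sum(e^2), sum(e_out^2))                    # squared norm explodes
    c(wilcoxon_norm(e), wilcoxon_norm(e_out))    # Wilcoxon norm grows mildly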

    Chapter 1  Introduction

    1.1    What is Machine Learning?

    An important task in almost all science and engineering is fitting models to data.  The first step in the mathematical modeling of a system under consideration is to use first principles, e.g., Newton’s laws in mechanics, Kirchhoff’s laws in lumped electric circuits, or various laws in thermodynamics.  As the system becomes increasingly complex, it becomes more and more unlikely that a precise quantitative description of the system can be obtained.  What we desire in practice is a reasonable yet tractable model.  It may also happen that there is no analytic model for the system under consideration; this is particularly true in social science problems.  However, in many real situations we do have experimental data (or observational data), obtained from measurement or data collection by some means.  This raises the need for a theory of learning from examples, i.e., of obtaining a good mathematical model from experimental data.  This is what machine learning is all about.

    Machine learning can be embedded in the broader context of knowledge discovery in databases (KDD), which originated in computer science.  See Hand, Mannila, and Smyth (2001) and Kantardzic (2003).  The entire KDD process is interactive, as shown in Figure 1.1.1.  Machine learning belongs to the fourth component of KDD.  The application of machine learning methods to large databases is called data mining.

    [Figure 1.1.1: Process of knowledge discovery in databases.  Stages: problem statement → selection of target data → data preprocessing → extraction of relationships or patterns → interpretation and assessment of discovered structures.]

    Our view of machine learning and soft computing is shown in Figure 1.1.2.  The items inside the circle represent some commonly used learning machines, those outside the circle represent various tools necessary for solving machine learning problems, and those inside the rectangle denote some possible applications of machine learning and soft computing. 

    [Figure 1.1.2: Brief sketch of machine learning and soft computing.  Learning machines and soft computing methods (inside the circle): ANN, FNN, CNN, GRBFN, SVM, WLM, GA, PSO.  Supporting tools (outside the circle): numerical optimization, approximation theory, statistical learning, linear algebra, probability, chaos.  Applications (inside the rectangle): intelligent control, regression, classification, management, bioinformatics, time series analysis, secure communication, diagnostics, filter design, data compression.]

    The learning machines addressed in this book include Artificial Neural Networks (ANNs), Generalized Radial Basis Function Networks (GRBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs).  More emphasis is put on SVMs and WLMs.  In statistical terms, the aforementioned learning machines are nonparametric in the sense that they make no assumptions about the functional form, e.g., linearity, of the discriminant or predictive functions.  This provides a great deal of flexibility in designing an appropriate learning machine for the problem at hand.  In our view, SVM theory cleverly combines convex optimization from nonlinear optimization theory, kernel representations from functional analysis, and distribution-free generalization error bounds from statistical learning theory.  The WLMs were recently developed by extending the R-estimators frequently used in the robust regression paradigm to nonparametric learning machines for nonlinear learning problems.  We firmly believe that WLMs will provide promising alternatives for many machine learning problems.  The powerful Evolutionary Computation (EC) techniques addressed in this book include the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).

    Our basic assumption in machine learning is that there is a process that explains the data we observe.  Though we do not know the details of the process underlying the generation of the data, we know that it is not completely random.  See Alpaydin (2010).

    What is a machine learning problem?  The goal of machine learning is to find a general rule that explains experimental data given only a sample of limited size.  There are three major categories of machine learning, namely supervised learning, unsupervised learning, and reinforcement learning, as shown in Figure 1.1.3.  See Herbrich (2002) and Alpaydin (2010).

    [Figure 1.1.3: Main categories of machine learning.  Machine learning divides into supervised learning, unsupervised learning, and reinforcement learning; supervised learning further divides into classification learning (pattern recognition), function learning (regression estimation), and preference learning.]

    In the supervised learning problem, we are given a sample of input-output pairs, called a training sample.  The task is to find a deterministic function that maps any input to an output such that disagreement with future input-output observations is minimized.

    There are three major types of supervised learning.  The first type is classification learning, also called pattern recognition.  The outputs of a classification problem are categorical variables, also called class labels.  Usually there is no ordering between the classes.  Credit scoring of loan applicants in a bank, classification of handwritten letters and digits, optical character recognition, face recognition, speech recognition, and classification of news in a news agency all belong to classification problems.

    The second type of supervised learning is function learning, also called regression estimation.  The outputs of a regression problem are continuous variables.  Prediction of stock market share values, weather forecasting, and navigation of an autonomous car belong to regression problems.

    The third type of supervised learning is preference learning.  The outputs of a preference learning problem are ranks in an order space.  One may compare whether two elements are equal or, if not, which one has the higher rank of preference.  Arranging Web pages so that the most relevant pages are ranked highest belongs to preference learning problems.

    In unsupervised learning, we are given a sample of objects without corresponding target values.  The goal is to extract some structure or regularity from the experimental data.  A concise description of the data could be a set of clusters (cluster analysis), where the objects in each cluster share some common regularity, or a probability density (density estimation) showing the probability of observing an event in the future.  Image and text segmentation, novelty detection in process control, grouping of customers in a company, and alignment in molecular biology belong to unsupervised learning problems.
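
    As a tiny illustration of cluster analysis (our toy example using R's built-in kmeans, not the book's code):

    # Two Gaussian blobs; k-means recovers the two groups without any labels.
    set.seed(5)
    X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
               matrix(rnorm(40, mean = 3), ncol = 2))
    km <- kmeans(X, centers = 2)
    table(km$cluster)                            # sizes of the discovered clusters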

    In some applications, the output of the system is a sequence of actions.  A single action is not important in itself; what matters is the strategy or policy, that is, the sequence of correct actions to reach the goal.  In reinforcement learning, we are given a sample of state-action-reward triples.  The goal is to find a concise description of the data in the form of a strategy or policy (what to do) that maximizes the expected reward over time.  Usually no optimal action exists in a given intermediate state; an action is good if it is part of a good policy.  In such a case, the learning algorithm should be able to assess the goodness of policies and must identify a sequence of actions, learned from the past, so as to maximize the expected reward over time.  Playing chess and robot navigation in search of a goal location belong to reinforcement learning problems.  See
