Pathways to Machine Learning and Soft Computing (International English Edition)
By Jyh-Horng Jeng (鄭志宏)
About this ebook
This book introduces frequently studied and widely used learning machines together with soft computing methods such as evolutionary computation. The main topics in machine learning cover Artificial Neural Networks (ANNs), Radial Basis Function Networks (RBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs).
Book preview
Pathways to Machine Learning and Soft Computing
Jyh-Horng Jeng, Jer-Guang Hsieh, Yih-Lon Lin, and Ying-Sheng Kuo
About the Authors
Jer-Guang Hsieh received the Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, New York, U.S.A., in 1985. He was with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, from 1985 to 2008. Currently, he is a Chair Professor at the Department of Electrical Engineering, I-Shou University, Kaohsiung, Taiwan. He is also a Chair Professor at the Department of Automatic Control Engineering, Feng Chia University, Taichung, Taiwan. Dr. Hsieh is the recipient of the 1983 Henry J. Nolte Memorial Prize of Rensselaer Polytechnic Institute. He won the Distinguished Teaching Award in 1988 and the Best Prize in the 1989 competition for microcomputer design packages for teaching and research, both from the Ministry of Education of the Republic of China. He won the Young Engineer Prize from the Chinese Engineers Association in 1994. He is a member of the Phi Tau Phi Scholastic Honor Society of the Republic of China and a violinist of the Kaohsiung Chamber Orchestra. His current research interests are in the areas of nonlinear control, machine learning and soft computing, and differential games.
Jyh-Horng Jeng received the B.S. and M.S. degrees in mathematics from Tamkang University, Taiwan, in 1982 and 1984, respectively, and the Ph.D. degree in mathematics (Information Group) from The State University of New York at Buffalo (SUNY, Buffalo) in 1996. He was a Senior Research Engineer at the Chung Shan Institute of Science and Technology (CSIST), Taiwan, from 1984 to 1992. Currently, he is a Professor at the Department of Information Engineering, I-Shou University, Taiwan. His research interests include multimedia applications, AI, soft computing and machine learning.
Yih-Lon Lin received the B.S. degree from the Department of Electronic Engineering, I-Shou University, Kaohsiung, Taiwan, in 1997 and the M.S. and Ph.D. degrees from the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, in 1999 and 2006, respectively. Currently, he is an Associate Professor at the Department of Information Engineering, I-Shou University. His research interests include neural networks, fuzzy systems, and machine learning.
Ying-Sheng Kuo received the B.S. degree in Mechanical Engineering from Feng Chia University, Taichung, Taiwan, in 1988 and the M.S. and Ph.D. degrees in Mechanical Engineering from National Cheng Kung University, Tainan, Taiwan, in 1991 and 1995, respectively. Currently, he is an Associate Professor at the General Education Center, Open University of Kaohsiung, Kaohsiung, Taiwan. His research interests include machine learning, soft computing, and computational fluid dynamics.
About the Book
This book introduces frequently studied and widely used learning machines together with soft computing methods such as evolutionary computation. The main topics in machine learning cover Artificial Neural Networks (ANNs), Radial Basis Function Networks (RBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs). The soft computing methods include the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).
The contents cover the basics of machine learning, including the construction of models and the derivation of learning algorithms. The book also provides many examples, figures, illustrations, tables, and exercises, together with solutions. In addition, simulated and validated code written in R is provided so that readers can learn the programming procedure and adapt it when working in other programming languages. The R code runs correctly on many simulated datasets, so readers can verify their own code by comparison. Working through this book will give readers a strong foundation in machine learning.
One of the most important features of this book is that we provide step-by-step illustrations for every algorithm, referred to as pre-pseudo codes. The pre-pseudo codes arrange complicated algorithms as sequences of mathematical equations that are ready for programming in any language. This means that students and engineers can easily implement the algorithms from the pre-pseudo codes even if they do not fully understand the underlying ideas. Conversely, implementing the pre-pseudo codes will help them understand those ideas.
Brief Sketch of the Contents
The book starts with an introduction to machine learning. More emphasis is put on supervised learning, including classification learning (or pattern recognition) and function learning (or regression estimation). The bias-variance dilemma, which arises in every machine learning problem, is illustrated through a numerical example.
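As a concrete (if hypothetical) illustration of this dilemma, the short Python sketch below (the book's own examples are in R) fits polynomials of low, intermediate, and high degree to noisy samples of a sine curve and compares their errors on fresh, noise-free test points; the degrees and noise level are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_fit_mse(degree, n_train=20, n_test=200, noise=0.3):
    """Fit a polynomial of the given degree to noisy sine samples and
    return its mean squared error on noise-free test points."""
    x_tr = rng.uniform(0.0, 1.0, n_train)
    y_tr = np.sin(2 * np.pi * x_tr) + noise * rng.normal(size=n_train)
    coef = np.polyfit(x_tr, y_tr, degree)      # least-squares polynomial fit
    x_te = np.linspace(0.0, 1.0, n_test)
    y_te = np.sin(2 * np.pi * x_te)            # the true underlying function
    return np.mean((np.polyval(coef, x_te) - y_te) ** 2)

# Degree 1 underfits (high bias), degree 9 chases the noise
# (high variance); an intermediate degree balances the two.
errs = {d: poly_fit_mse(d) for d in (1, 3, 9)}
```

A straight line cannot represent a full sine period no matter how much data is supplied (bias), while a high-degree polynomial tracks the particular noise realization (variance).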
Since machine learning problems usually involve finite-dimensional optimization problems, a solid background in optimization theory is crucial for a sound understanding of machine learning processes. We briefly review some fundamental concepts and important results of finite-dimensional optimization theory in Chapter 2.
The mathematical analysis of learning processes truly began with the proposal of Rosenblatt's algorithms for perceptrons, followed by the Widrow-Hoff algorithms for Adalines (adaptive linear neurons). Remarkably, the dual forms of these algorithms already provide hints of kernel-based learning machines for classification and regression. The concept of the kernel is the basis of support vector machines.
Linear classification problems are studied in Chapter 3. A linear classifier can be represented as a single-layer neural network with a hard-limiting output activation function. Rosenblatt's perceptron algorithms for linearly separable training data sets are introduced. A large margin in linear classification provides maximum robustness against perturbations, which motivates the introduction of maximal margin classifiers. To allow some misclassifications for linearly inseparable data, we introduce slack variables into classification problems. Based on this, soft margin classifiers (or linear support vector classifiers) are studied.
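A minimal sketch of Rosenblatt's perceptron rule for labels in {-1, +1} might look as follows (Python rather than the book's R; the toy data set and learning rate are illustrative assumptions, not the book's pre-pseudo code):

```python
import numpy as np

def perceptron(X, y, lr=1.0, max_epochs=100):
    """Rosenblatt's rule: on each misclassified pattern, nudge the
    separating hyperplane toward that pattern."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:      # misclassified (or on boundary)
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:                   # every pattern classified correctly
            break
    return w, b

# Toy linearly separable set: class +1 lies above the line x1 + x2 = 1.
X = np.array([[0.0, 0.0], [0.3, 0.2], [1.0, 1.0], [0.8, 0.9]])
y = np.array([-1, -1, 1, 1])
w, b = perceptron(X, y)
```

For linearly separable data, the perceptron convergence theorem guarantees that the loop terminates with all patterns correctly classified.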
Linear regression problems are studied in Chapter 4. A linear regressor can be represented as a single-layer neural network with a linear output activation function. The Widrow-Hoff algorithms, also called the delta learning rules, are derived for finding least squares solutions. To smooth the predictive functions and to tolerate errors in corrupted data, we consider ridge regression and linear support vector regression.
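The delta learning rule can be sketched as below (a hedged Python illustration, not the book's R code; the target function and learning rate are our own choices): each pattern's error, target minus prediction, drives a small gradient step on the weights.

```python
import numpy as np

def lms(X, y, lr=0.1, epochs=200):
    """Widrow-Hoff (LMS / delta) rule: stochastic gradient descent on
    the squared error of a linear unit."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - (w @ xi + b)   # delta = target - prediction
            w += lr * err * xi
            b += lr * err
    return w, b

# Recover the noiseless linear target y = 2*x1 - x2 + 0.5 from samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (50, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.5
w, b = lms(X, y)
```

On noiseless data the iterates converge to the exact least squares solution; with noise they converge to a neighborhood of it whose size depends on the learning rate.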
Three popular and powerful learning machines, namely artificial neural networks, generalized radial basis function networks, and fuzzy neural networks, are introduced in Chapter 5. All three can be represented as multi-layer neural networks with a hidden layer whose activation functions are nonlinear and continuously differentiable. The simple back-propagation algorithm, a direct generalization of the delta learning rule used in the Widrow-Hoff algorithm for Adalines, is introduced. The invention of the back-propagation learning rules was a major breakthrough in machine learning theory.
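The chain-rule bookkeeping behind back propagation can be sketched for a single hidden layer as follows (a hypothetical Python illustration with a sigmoid hidden layer and a linear output node; the notation is ours, not the book's pre-pseudo code). A finite-difference check is a standard way to confirm the analytic gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    h = sigmoid(W1 @ x)        # hidden-layer activations
    return h, W2 @ h           # linear output node

def gradients(x, t, W1, W2):
    """Back propagation for the squared error E = 0.5 * ||y - t||^2."""
    h, y = forward(x, W1, W2)
    delta_out = y - t                              # dE/dy
    gW2 = np.outer(delta_out, h)                   # dE/dW2
    delta_hid = (W2.T @ delta_out) * h * (1 - h)   # error propagated back
    gW1 = np.outer(delta_hid, x)                   # dE/dW1
    return gW1, gW2

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([0.7])
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
gW1, gW2 = gradients(x, t, W1, W2)
```

The hidden-layer delta is the output delta propagated backward through the weights and scaled by the derivative h(1 - h) of the sigmoid, exactly the generalization of the delta rule described above.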
At first glance, it might seem strange that we spend so much effort on linear classification and linear regression problems, since our world is truly nonlinear in every sense. It will be seen that the commonly used learning machines, including those introduced in Chapter 5, nonlinearly transform the input vectors to a feature space in a peculiar way and then perform generalized linear regression in that feature space to produce the output vectors. Amazingly, going from linear classification and regression to kernel-based nonlinear classification and regression is rather trivial: the so-called kernel trick simply replaces inner products by kernels. This kernel-based approach led to the invention of support vector machines. The idea of a kernel generalizes the standard inner product in finite-dimensional Euclidean space. Kernels are studied in Chapter 6.
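The following sketch (illustrative Python, assuming the Gaussian/RBF kernel as one standard choice) shows such a kernel in action: the kernel matrix it produces is symmetric and positive semi-definite, as Mercer's condition requires, so it can stand in wherever an inner product appears.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: equals <phi(x), phi(z)> in an implicit
    feature space, so x @ z can be replaced by k(x, z)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Kernel (Gram) matrix for a small one-dimensional data set.
X = np.array([[0.0], [1.0], [2.0]])
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])

# Mercer's condition: K must be symmetric positive semi-definite.
eigvals = np.linalg.eigvalsh(K)
```

Any learning algorithm expressed purely in terms of inner products between training points, such as the dual perceptron, is kernelized by this one substitution.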
To numerically solve the kernel-based classification and regression problems, we introduce an elegant and powerful sequential minimal optimization technique in Chapter 7.
Every learning problem has some (machine) parameters to be specified in advance. This is the problem of model selection, which is studied in Chapter 8. Two powerful evolutionary computation techniques, namely the genetic algorithm and particle swarm optimization, are applied to tune the parameters of support vector machines.
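A bare-bones particle swarm optimizer might be sketched as follows (hypothetical Python with conventional but arbitrary parameter values; the book tunes SVM parameters, whereas here we simply minimize the sphere function for illustration):

```python
import numpy as np

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO: each particle is pulled toward its personal best
    and the swarm's global best position."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # positions
    v = np.zeros_like(x)                             # velocities
    pbest = x.copy()                                 # personal bests
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()       # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, f(gbest)

best, best_val = pso(lambda p: np.sum(p ** 2))
```

For model selection, f would instead evaluate a cross-validation error as a function of the machine parameters, with each particle encoding one candidate parameter setting.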
In a broad range of practical applications, the collected data inevitably contain atypical observations called outliers: observations that are well separated from the bulk of the data or that deviate in some fashion from its general pattern. As is well known in linear regression theory, a classical least squares fit of a regression model can be adversely influenced by outliers, even by a single one, and often fails to provide a good fit to the bulk of the data. Robust regression, which resists the adverse effects of outlying response values, offers a half-way house between including outliers and omitting them entirely: rather than omitting outliers, it dampens their influence on the fitted regression model by down-weighting them. Robust estimates should provide a good fit for the majority of the data both when the data contain outliers and when they are free of them. A learning machine is said to be robust if it is not sensitive to outliers in the data.
The newly developed Wilcoxon learning machines are studied in Chapter 8. They were developed by extending the R-estimators frequently used in the robust regression paradigm to nonparametric learning machines for nonlinear learning problems. These machines minimize the rank-based Wilcoxon norm of the total residuals and are quite robust against (i.e., insensitive to) outliers. It is our firm belief that the Wilcoxon approach will provide a promising methodology for many machine learning problems.
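The rank-based weighting can be sketched as follows (a hedged Python illustration of one common form of the Wilcoxon norm, using the scores a(i) = sqrt(12)(i/(n+1) - 1/2); the book's exact definition may differ). A gross outlier inflates the sum of squares far more than it inflates the Wilcoxon norm, because each residual enters linearly with a bounded rank score.

```python
import numpy as np

def wilcoxon_norm(r):
    """Rank-based Wilcoxon (pseudo-)norm: each residual is weighted by
    a bounded score derived from its rank, not by its own magnitude."""
    n = len(r)
    ranks = np.argsort(np.argsort(r)) + 1                 # ranks 1..n
    scores = np.sqrt(12.0) * (ranks / (n + 1.0) - 0.5)    # bounded scores
    return np.sum(scores * r)

r_clean = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
r_outlier = np.array([-1.0, -0.5, 0.0, 0.5, 100.0])  # one gross outlier

# The squared norm grows quadratically in the outlier, the Wilcoxon
# norm only linearly with a bounded coefficient.
growth_sq = np.sum(r_outlier ** 2) / np.sum(r_clean ** 2)
growth_w = wilcoxon_norm(r_outlier) / wilcoxon_norm(r_clean)
```

This bounded-score behavior is what makes minimizing the Wilcoxon norm of residuals resistant to outlying response values.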
Chapter 1 Introduction
1.1 What is Machine Learning?
An important task in almost all of science and engineering is fitting models to data. The first step in the mathematical modeling of a system under consideration is to use first principles, e.g., Newton's laws in mechanics, Kirchhoff's laws in lumped electric circuits, or the various laws of thermodynamics. As the system becomes increasingly complex, it becomes less and less likely that a precise quantitative description can be obtained. What we desire in practice is a reasonable yet tractable model. It may also happen that no analytic model exists for the system under consideration; this is particularly true in social science problems. However, in many real situations we do have experimental (or observational) data, obtained by measurement or by some other means of data collection. This raises the need for a theory of learning from examples, i.e., of obtaining a good mathematical model from experimental data. This is what machine learning is all about.
Machine learning can be embedded in the broader context of knowledge discovery in databases (KDD), which originated in computer science. See Hand, Mannila, and Smyth (2001) and Kantardzic (2003). The entire KDD process is interactive, as shown in Figure 1.1.1. Machine learning constitutes the fourth component of KDD. The application of machine learning methods to large databases is called data mining.
[Figure 1.1.1 components: Problem Statement; Selection of Target Data; Data Preprocessing; Extraction of Relationships or Patterns; Interpretation and Assessment of Discovered Structures.]
Figure 1.1.1: Process of knowledge discovery in databases.
Our view of machine learning and soft computing is shown in Figure 1.1.2. The items inside the circle represent some commonly used learning machines, those outside the circle represent various tools necessary for solving machine learning problems, and those inside the rectangle denote some possible applications of machine learning and soft computing.
[Figure 1.1.2 components: inside the circle, the learning machines and soft computing methods ANN, FNN, CNN, GRBFNN, SVM, WLM, GA, and PSO; outside the circle, the supporting tools Numerical Optimization, Approximation Theory, Statistical Learning, Linear Algebra, Probability, and Chaos; inside the rectangle, the applications Intelligent Control, Regression, Classification, Management, Bioinformatics, Time Series Analysis, Secure Communication, Diagnostics, Filter Design, and Data Compression; at the center, the label Machine Learning & Soft Computing.]
Figure 1.1.2: Brief sketch of machine learning and soft computing.
The learning machines addressed in this book include Artificial Neural Networks (ANNs), Generalized Radial Basis Function Networks (GRBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs). More emphasis is put on SVMs and WLMs. In statistical terms, the aforementioned learning machines are nonparametric in the sense that they make no assumptions about the functional form, e.g., linearity, of the discriminant or predictive functions. This provides a great deal of flexibility in designing an appropriate learning machine for the problem at hand. In our view, SVM theory cleverly combines convex optimization from nonlinear optimization theory, kernel representations from functional analysis, and distribution-free generalization error bounds from statistical learning theory. The WLMs were recently developed by extending the R-estimators frequently used in the robust regression paradigm to nonparametric learning machines for nonlinear learning problems. We firmly believe that WLMs will provide promising alternatives for many machine learning problems. The powerful Evolutionary Computation (EC) techniques addressed in this book include the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).
Our basic belief in machine learning is that there is a process that explains the data we observe. Though we do not know the details of the process underlying the generation of the data, we know that it is not completely random. See Alpaydin (2010).
What is a machine learning problem? The goal of machine learning is to find a general rule that explains experimental data given only a sample of limited size. There are three major categories of machine learning, namely supervised learning, unsupervised learning, and reinforcement learning, as shown in Figure 1.1.3. See Herbrich (2002) and Alpaydin (2010).
[Figure 1.1.3 components: Machine Learning branches into Supervised Learning, Unsupervised Learning, and Reinforcement Learning; Supervised Learning further branches into Classification Learning (Pattern Recognition), Function Learning (Regression Estimation), and Preference Learning.]
Figure 1.1.3: Main categories of machine learning.
In a supervised learning problem, we are given a sample of input-output pairs, called a training sample. The task is to find a deterministic function that maps any input to an output such that the disagreement with future input-output observations is minimized.
There are three major types of supervised learning. The first is classification learning, also called pattern recognition. The outputs of a classification problem are categorical variables, also called class labels; usually there is no ordering between the classes. Credit scoring of loan applicants in a bank, classification of handwritten letters and digits, optical character recognition, face recognition, speech recognition, and classification of news items in a news agency are all classification problems.
The second type of supervised learning is function learning, also called regression estimation. The outputs of a regression problem are continuous variables. Prediction of stock market share values, weather forecasting, and navigation of an autonomous car are regression problems.
The third type of supervised learning is preference learning. The outputs of a preference learning problem are ranks in an order space: one may ask whether two elements are equal or, if not, which has the higher preference rank. Arranging Web pages so that the most relevant pages are ranked highest is a preference learning problem.
In unsupervised learning, we are given a sample of objects without corresponding target values. The goal is to extract some structure or regularity from the experimental data. A concise description of the data could be a set of clusters (cluster analysis), each sharing some common regularity, or a probability density (density estimation) giving the probability of observing an event in the future. Image and text segmentation, novelty detection in process control, grouping of a company's customers, and alignment in molecular biology are unsupervised learning problems.
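As one concrete example of cluster analysis, a minimal k-means sketch is given below (illustrative Python; k-means is a standard clustering algorithm, but it is our choice of example rather than one prescribed by the book):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)              # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs with no target values attached.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (30, 2)), rng.normal(5.0, 0.2, (30, 2))])
labels, centers = kmeans(X, 2)
```

No labels are supplied; the algorithm recovers the two-group structure purely from the regularity in the data, which is the defining feature of unsupervised learning.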
In some applications, the output of the system is a sequence of actions. A single action is not important in itself; what matters is the strategy, or policy, i.e., the sequence of correct actions needed to reach the goal. In reinforcement learning, we are given a sample of state-action-reward triples. The goal is to find a concise description of the data in the form of a strategy or policy (what to do) that maximizes the expected reward over time. Usually no optimal action exists in a given intermediate state; an action is good if it is part of a good policy. The learning algorithm should therefore be able to assess the goodness of policies and identify, from past experience, a sequence of actions that maximizes the expected reward over time. Playing chess and robot navigation in search of a goal location are reinforcement learning problems. See