Pathways to Machine Learning and Soft Computing: 邁向機器學習與軟計算之路 (International English Edition)
By Jyh-Horng Jeng (鄭志宏)
Pathways to Machine Learning and Soft Computing
Jyh-Horng Jeng, Jer-Guang Hsieh, Yih-Lon Lin, and Ying-Sheng Kuo
About Author
Jer-Guang Hsieh received the Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, New York, U.S.A., in 1985. He was with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, from 1985 to 2008. Currently, he is a Chair Professor at the Department of Electrical Engineering, I-Shou University, Kaohsiung, Taiwan, and a Chair Professor at the Department of Automatic Control Engineering, Feng Chia University, Taichung, Taiwan. Dr. Hsieh is the recipient of the 1983 Henry J. Nolte Memorial Prize of Rensselaer Polytechnic Institute. He won the Distinguished Teaching Award in 1988 and the Best Prize in the competition of microcomputer design packages for teaching and research in 1989, both from the Ministry of Education of the Republic of China, and the Young Engineer Prize from the Chinese Engineers Association in 1994. He is a member of the Phi Tau Phi Scholastic Honor Society of the Republic of China and a violinist in the Kaohsiung Chamber Orchestra. His current research interests are in the areas of nonlinear control, machine learning and soft computing, and differential games.
Jyh-Horng Jeng received the B.S. and M.S. degrees in mathematics from Tamkang University, Taiwan, in 1982 and 1984, respectively, and the Ph.D. degree in mathematics (Information Group) from The State University of New York at Buffalo (SUNY, Buffalo) in 1996. He was a Senior Research Engineer at the Chung Shan Institute of Science and Technology (CSIST), Taiwan, from 1984 to 1992. Currently, he is a Professor at the Department of Information Engineering, I-Shou University, Taiwan. His research interests include multimedia applications, AI, soft computing and machine learning.
Yih-Lon Lin received the B.S. degree from the Department of Electronic Engineering, I-Shou University, Kaohsiung, Taiwan, in 1997 and the M.S. and Ph.D. degrees from the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, in 1999 and 2006, respectively. Currently, he is an Associate Professor at the Department of Information Engineering, I-Shou University. His research interests include neural networks, fuzzy systems, and machine learning.
Ying-Sheng Kuo received the B.S. degree in Mechanical Engineering from Feng Chia University, Taichung, Taiwan, in 1988 and the M.S. and Ph.D. degrees in Mechanical Engineering from National Cheng Kung University, Tainan, Taiwan, in 1991 and 1995, respectively. Currently, he is an Associate Professor at the General Education Center, Open University of Kaohsiung, Kaohsiung, Taiwan. His research interests include machine learning, soft computing, and computational fluid dynamics.
About the Book
This book provides frequently studied and used machines together with soft computing methods such as evolutionary computation. The main topics of the machine learning cover Artificial Neural Networks (ANNs), Radial Basis Function Networks (RBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs). The soft computing methods include Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).
The contents cover the basics of machine learning, including the construction of models and the derivation of learning algorithms. The book also provides many examples, figures, illustrations, tables, and exercises, together with solutions. In addition, simulated and validated code written in R is provided so that readers can learn the programming procedure and carry it over to other programming languages. The R code works correctly on many simulated datasets, so readers can verify their own code by comparison and build a strong foundation as they work through the book.
One important feature of this book is that we provide step-by-step illustrations for every algorithm, referred to as pre-pseudo codes. The pre-pseudo codes arrange complicated algorithms as sequences of mathematical equations that are ready for programming in any language. Students and engineers can thus implement the algorithms directly from the pre-pseudo codes even if they do not yet fully understand the underlying ideas; conversely, implementing the pre-pseudo codes helps them understand those ideas.
Brief Sketch of the Contents
The book starts with an introduction to machine learning. More emphasis is put on supervised learning, including classification learning (or pattern recognition) and function learning (or regression estimation). The bias-variance dilemma, which arises in every machine learning problem, is illustrated through a numerical example.
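For a concrete feel of the dilemma, the following minimal R sketch (not the book's own numerical example; the target function, noise level, and polynomial degrees are illustrative assumptions) fits polynomials of increasing degree to noisy samples of a known function. Low degrees underfit (high bias) while high degrees chase the noise (high variance), so the test error is smallest at an intermediate degree.

```r
## Bias-variance illustration (illustrative assumptions throughout):
## fit polynomials of increasing degree to noisy samples of sin(2*pi*x).
set.seed(1)
f <- function(x) sin(2 * pi * x)            # true underlying function
n <- 25
x <- runif(n)
y <- f(x) + rnorm(n, sd = 0.2)              # noisy training sample

x_test <- seq(0, 1, length.out = 200)
for (deg in c(1, 3, 12)) {
  fit  <- lm(y ~ poly(x, deg))              # polynomial regression
  pred <- predict(fit, newdata = data.frame(x = x_test))
  cat(sprintf("degree %2d: test MSE = %.4f\n",
              deg, mean((pred - f(x_test))^2)))
}
## degree 1 underfits (high bias); degree 12 overfits (high variance).
```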
Since machine learning problems usually involve finite-dimensional optimization problems, a solid background in optimization theory is crucial for a sound understanding of machine learning processes. We briefly review some fundamental concepts and important results of finite-dimensional optimization theory in Chapter 2.
The mathematical analysis of learning processes truly began with Rosenblatt's algorithms for perceptrons, followed by the Widrow-Hoff algorithms for Adalines (adaptive linear neurons). Remarkably, the dual forms of these algorithms already provide hints of kernel-based learning machines for classification and regression. The concept of the kernel is the basis of support vector machines.
Linear classification problems are studied in Chapter 3. A linear classifier can be represented as a single-layer neural network with a hard-limiting output activation function. Rosenblatt's perceptron algorithms for linearly separable training data sets are introduced. A large margin gives a linear classifier maximum robustness against perturbations, which motivates the introduction of maximal margin classifiers. To allow some misclassifications for linearly inseparable data, we introduce slack variables; based on these, soft margin classifiers (or linear support vector classifiers) are studied.
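The flavor of Rosenblatt's update rule can be conveyed in a few lines of R. This is a minimal sketch, not the book's pre-pseudo code; labels in {-1, +1}, a bias absorbed by augmenting the inputs with a constant 1, and a unit learning rate are illustrative assumptions.

```r
## Rosenblatt perceptron for linearly separable data (illustrative sketch).
perceptron <- function(X, y, eta = 1, max_epochs = 100) {
  X <- cbind(1, X)                         # augment inputs with bias term
  w <- rep(0, ncol(X))
  for (epoch in 1:max_epochs) {
    errors <- 0
    for (i in 1:nrow(X)) {
      if (y[i] * sum(w * X[i, ]) <= 0) {   # misclassified (or on boundary)
        w <- w + eta * y[i] * X[i, ]       # perceptron update rule
        errors <- errors + 1
      }
    }
    if (errors == 0) break                 # converged: all points correct
  }
  w
}

## Toy linearly separable data with labels in {-1, +1}
set.seed(2)
X <- rbind(matrix(rnorm(40, mean =  2), ncol = 2),
           matrix(rnorm(40, mean = -2), ncol = 2))
y <- rep(c(1, -1), each = 20)
w <- perceptron(X, y)
mean(sign(cbind(1, X) %*% w) == y)         # training accuracy (1 when separable)
```

For linearly separable data the loop is guaranteed to terminate, which is the content of the classical perceptron convergence theorem.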
Linear regression problems are studied in Chapter 4. A linear regressor can be represented as a single-layer neural network with a linear output activation function. The Widrow-Hoff algorithms, also called the delta learning rules, are derived for finding the least squares solutions. To smooth the predictive functions and to tolerate the errors in corrupted data, we will consider the ridge regression and linear support vector regression.
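A minimal R sketch of both ideas follows: the stochastic delta rule w <- w + eta * (y_i - w'x_i) * x_i, and the ridge regression closed form w = (X'X + lambda*I)^(-1) X'y. The data, learning rate, and regularization constant are illustrative assumptions, not the book's example.

```r
## Widrow-Hoff (LMS / delta rule) and ridge regression, illustrative sketch.
set.seed(3)
n <- 100
X <- cbind(1, rnorm(n))                 # design matrix with a bias column
w_true <- c(2, -1)
y <- X %*% w_true + rnorm(n, sd = 0.1)

## Stochastic delta rule: w <- w + eta * (y_i - w'x_i) * x_i
w <- c(0, 0)
eta <- 0.01
for (pass in 1:50)
  for (i in sample(n))
    w <- w + eta * drop(y[i] - sum(w * X[i, ])) * X[i, ]
print(w)                                # close to w_true

## Ridge regression closed form: w = (X'X + lambda*I)^{-1} X'y
lambda <- 0.1
w_ridge <- solve(t(X) %*% X + lambda * diag(2), t(X) %*% y)
print(drop(w_ridge))                    # shrunk slightly toward zero
```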
Three popular and powerful learning machines, namely artificial neural networks, generalized radial basis function networks, and fuzzy neural networks, are introduced in Chapter 5. All three can be represented as multi-layer neural networks with one hidden layer whose activation functions are nonlinear and continuously differentiable. The simple back propagation algorithm, a direct generalization of the delta learning rule used in the Widrow-Hoff algorithm for Adalines, is introduced. The invention of the back propagation learning rules was a major breakthrough in machine learning theory.
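To make the chain-rule mechanics concrete, here is a minimal R sketch of back propagation for a one-hidden-layer network with sigmoid hidden nodes and a linear output node, trained by stochastic gradient descent on a toy regression task. The network size, learning rate, and data are illustrative assumptions, not the book's code.

```r
## Back propagation for a one-hidden-layer network (illustrative sketch).
set.seed(4)
sigmoid <- function(z) 1 / (1 + exp(-z))
n <- 200
x <- matrix(runif(n, -3, 3))              # toy 1-D regression task
y <- sin(x)

h <- 10                                   # number of hidden nodes
W1 <- matrix(rnorm(h), h, 1); b1 <- rnorm(h)
W2 <- matrix(rnorm(h), 1, h); b2 <- rnorm(1)
eta <- 0.05

for (epoch in 1:500) {
  for (i in sample(n)) {
    a1  <- sigmoid(W1 %*% x[i, ] + b1)    # forward pass: hidden layer
    out <- drop(W2 %*% a1 + b2)           # forward pass: linear output
    err <- out - y[i]                     # output error
    ## backward pass: chain rule through output and hidden layers
    dW2 <- err * t(a1);        db2 <- err
    d1  <- drop(t(W2)) * err * a1 * (1 - a1)
    dW1 <- d1 %*% t(x[i, ]);   db1 <- d1
    W2 <- W2 - eta * dW2; b2 <- b2 - eta * db2
    W1 <- W1 - eta * dW1; b1 <- b1 - eta * db1
  }
}
pred <- apply(x, 1, function(xi) drop(W2 %*% sigmoid(W1 %*% xi + b1) + b2))
mean((pred - y)^2)                        # training MSE after learning
```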
At first glance, it may seem strange that we spend so much effort on linear classification and linear regression problems, since our world is truly nonlinear in every sense. It will be seen that the commonly used learning machines, including those introduced in Chapter 5, nonlinearly transform the input vectors into a feature space and perform generalized linear regression in that feature space to produce the output vectors. Amazingly, moving from linear classification and regression to kernel-based nonlinear classification and regression is rather trivial via the so-called kernel trick: one simply replaces inner products by kernels. This kernel-based approach led to the invention of support vector machines. The idea of a kernel generalizes the standard inner product on finite-dimensional Euclidean space. Kernels are studied in Chapter 6.
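The trick can be made concrete in a few lines of R: a Gaussian (RBF) kernel stands in for the inner product without the feature map ever being formed, and the resulting Gram matrix is symmetric positive semidefinite, as a valid kernel requires. The kernel choice and bandwidth here are illustrative assumptions.

```r
## The kernel trick, illustrated: a Gaussian (RBF) kernel replaces <x, z>.
rbf_kernel <- function(x, z, sigma = 1)
  exp(-sum((x - z)^2) / (2 * sigma^2))

set.seed(5)
X <- matrix(rnorm(10), ncol = 2)             # 5 points in R^2
K <- outer(1:5, 1:5,                         # Gram matrix K[i, j] = k(x_i, x_j)
           Vectorize(function(i, j) rbf_kernel(X[i, ], X[j, ])))
round(K, 3)
eigen(K, symmetric = TRUE)$values            # all >= 0: K is positive semidefinite
```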
To numerically solve the kernel-based classification and regression problems, we introduce an elegant and powerful sequential minimal optimization technique in Chapter 7.
Every learning problem has some (machine) parameters to be specified in advance. This is the problem of model selection, which is studied in Chapter 8. Two powerful evolutionary computation techniques, i.e., genetic algorithm and particle swarm optimization, are applied for tuning the parameters of support vector machines.
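For a flavor of how such tuning works, the following is a bare-bones particle swarm optimization sketch in R. The swarm size, inertia weight, and acceleration constants are conventional illustrative choices; in model selection, the objective `obj` would be a cross-validation error as a function of the machine parameters (e.g., the SVM cost and kernel width), but a simple test function stands in here so the sketch is self-contained.

```r
## Minimal particle swarm optimization (illustrative sketch).
pso <- function(obj, lower, upper, n_particles = 20, iters = 100,
                w = 0.7, c1 = 1.5, c2 = 1.5) {
  d <- length(lower)
  lo <- matrix(lower, n_particles, d, byrow = TRUE)
  hi <- matrix(upper, n_particles, d, byrow = TRUE)
  X <- lo + (hi - lo) * matrix(runif(n_particles * d), n_particles, d)
  V <- matrix(0, n_particles, d)
  P <- X                                    # personal best positions
  p_val <- apply(X, 1, obj)
  g <- P[which.min(p_val), ]                # global best position
  g_val <- min(p_val)
  for (t in 1:iters) {
    G <- matrix(g, n_particles, d, byrow = TRUE)
    V <- w * V + c1 * runif(n_particles * d) * (P - X) +
                 c2 * runif(n_particles * d) * (G - X)
    X <- pmin(pmax(X + V, lo), hi)          # keep particles inside the box
    val <- apply(X, 1, obj)
    better <- val < p_val                   # update personal bests
    P[better, ] <- X[better, ]
    p_val[better] <- val[better]
    if (min(p_val) < g_val) {               # update global best
      g_val <- min(p_val)
      g <- P[which.min(p_val), ]
    }
  }
  list(par = g, value = g_val)
}

## Stand-in objective with minimum at (1, 2)
pso(function(x) (x[1] - 1)^2 + (x[2] - 2)^2, lower = c(-5, -5), upper = c(5, 5))
```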
In a broad range of practical applications, data collected inevitably contain one or more atypical observations called outliers; that is, observations that are well separated from the majority or bulk of the data, or in some fashion deviate from the general pattern of the data. As is well known in linear regression theory, classical least squares fit of a regression model can be very adversely influenced by outliers, even by a single one, and often fails to provide a good fit to the bulk of the data. Robust regression that is resistant to the adverse effects of outlying response values offers a half-way house between including outliers and omitting them entirely. Rather than omitting outliers, it dampens their influence on the fitted regression model by down-weighting them. It is desirable that the robust estimates provide a good fit for the majority of the data when the data contain outliers, as well as when the data are free of them. A learning machine is said to be robust if it is not sensitive to outliers in the data.
The newly developed Wilcoxon learning machines will be studied in Chapter 9. They were developed by extending the R-estimators frequently used in the robust regression paradigm to nonparametric learning machines for nonlinear learning problems. These machines minimize the rank-based Wilcoxon norm of the total residuals and are quite robust against (i.e., insensitive to) outliers. It is our firm belief that the Wilcoxon approach will provide a promising methodology for many machine learning problems.
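The core idea can be sketched in R: estimate the slope by minimizing the Wilcoxon norm of the residuals (the intercept drops out of the rank-based norm, so it is recovered afterwards as the median of the residuals, as is standard for R-estimators), then compare with least squares on data containing gross outliers. The data-generating model and outlier scheme are illustrative assumptions, not the book's implementation.

```r
## Wilcoxon-norm (rank-based) line fit vs. least squares (illustrative sketch).
## Wilcoxon norm of residuals e: sum(a(R(e_i)) * e_i) with scores
## a(i) = sqrt(12) * (i/(n+1) - 1/2).
wilcoxon_norm <- function(e) {
  n <- length(e)
  a <- sqrt(12) * (rank(e) / (n + 1) - 0.5)
  sum(a * e)
}

set.seed(6)
n <- 50
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)        # true intercept 1, slope 2
y[1:5] <- y[1:5] + 30                      # five gross outliers in the response

## Slope by minimizing the Wilcoxon norm; intercept as median residual
b1_w <- optimize(function(b) wilcoxon_norm(y - b * x), c(-10, 10))$minimum
b0_w <- median(y - b1_w * x)

rbind(wilcoxon = c(b0_w, b1_w), least_squares = coef(lm(y ~ x)))
## The Wilcoxon fit stays near (1, 2); least squares is pulled by the outliers.
```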
Chapter 1 Introduction
1.1 What is Machine Learning?
An important task in almost all of science and engineering is fitting models to data. The first step in the mathematical modeling of a system under consideration is to use first principles, e.g., Newton's laws in mechanics, Kirchhoff's laws in lumped electric circuits, or the various laws of thermodynamics. As the system becomes increasingly complex, it becomes less and less likely that a precise quantitative description of the system can be obtained. What we desire in practice is a reasonable yet tractable model. It may also happen that there is no analytic model for the system under consideration; this is particularly true in social science problems. However, in many real situations, we do have some experimental data (or observational data), obtained by measurement or by data collection of some kind. This raises the need for a theory of learning from examples, i.e., of obtaining a good mathematical model from experimental data. This is what machine learning is all about.
Machine learning can be embedded in the broader context of knowledge discovery in databases (KDD), which originated in computer science. See Hand, Mannila, and Smyth (2001) and Kantardzic (2003). The entire KDD process, shown in Figure 1.1.1, is interactive. Machine learning constitutes the fourth component of KDD. The application of machine learning methods to large databases is called data mining.
[Figure 1.1.1: Process of knowledge discovery in databases. Stages: Problem Statement → Selection of Target Data → Data Preprocessing → Extraction of Relationships or Patterns → Interpretation and Assessment of Discovered Structures.]
Our view of machine learning and soft computing is shown in Figure 1.1.2. The items inside the circle represent some commonly used learning machines, those outside the circle represent various tools necessary for solving machine learning problems, and those inside the rectangle denote some possible applications of machine learning and soft computing.
[Figure 1.1.2: Brief sketch of machine learning and soft computing. Inside the circle (learning machines and soft computing methods): ANN, FNN, CNN, GRBFN, SVM, WLM, GA, PSO. Outside the circle (supporting tools): numerical optimization, approximation theory, statistical learning, linear algebra, probability, chaos. Inside the rectangle (applications): intelligent control, regression, classification, management, bioinformatics, time series analysis, secure communication, diagnostics, filter design, data compression.]
The learning machines addressed in this book include Artificial Neural Networks (ANNs), Generalized Radial Basis Function Networks (GRBFNs), Fuzzy Neural Networks (FNNs), Support Vector Machines (SVMs), and Wilcoxon Learning Machines (WLMs), with more emphasis on SVMs and WLMs. In statistical terms, these learning machines are nonparametric in the sense that they make no assumptions about the functional form, e.g., linearity, of the discriminant or predictive functions. This provides a great deal of flexibility in designing an appropriate learning machine for the problem at hand. In our view, SVM theory cleverly combines convex optimization from nonlinear optimization theory, kernel representations from functional analysis, and distribution-free generalization error bounds from statistical learning theory. The WLMs were recently developed by extending the R-estimators frequently used in the robust regression paradigm to nonparametric learning machines for nonlinear learning problems. We firmly believe that WLMs will provide promising alternatives for many machine learning problems. The powerful Evolutionary Computation (EC) techniques addressed in this book are the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).
Our basic premise in machine learning is that there is a process that explains the data we observe. Though we do not know the details of the process underlying the generation of the data, we know that it is not completely random. See Alpaydin (2010).
What is a machine learning problem? The goal of machine learning is to find a general rule that explains experimental data given only a sample of limited size. There are three major categories of machine learning, namely supervised learning, unsupervised learning, and reinforcement learning, as shown in Figure 1.1.3. See Herbrich (2002) and Alpaydin (2010).
[Figure 1.1.3: Main categories of machine learning. Supervised learning: classification learning (pattern recognition), function learning (regression estimation), preference learning. Unsupervised learning. Reinforcement learning.]
In the supervised learning problem, we are given a sample of input-output pairs, called the training sample. The task is to find a deterministic function that maps any input to an output such that the disagreement with future input-output observations is minimized.
There are three major types of supervised learning. The first type is classification learning, also called pattern recognition. The outputs of a classification problem are categorical variables, also called class labels; usually, there is no ordering between the classes. Credit scoring of loan applicants in a bank, classification of handwritten letters and digits, optical character recognition, face recognition, speech recognition, and classification of news items in a news agency are all classification problems.
The second type of supervised learning is function learning, also called regression estimation. The outputs of a regression problem are continuous variables. Prediction of stock market share values, weather forecasting, and navigation of an autonomous car are regression problems.
The third type of supervised learning is preference learning. The outputs of a preference learning problem are ranks in an order space: one may ask whether two elements are equivalent or, if not, which one is preferred. Ranking web pages so that the most relevant pages appear first is a preference learning problem.
In unsupervised learning, we are given a sample of objects without corresponding target values. The goal is to extract some structure or regularity from the experimental data. A concise description of the data could be a set of clusters (cluster analysis), where the objects in each cluster share some common regularity, or a probability density (density estimation) giving the probability of observing an event in the future. Image and text segmentation, novelty detection in process control, grouping of customers in a company, and alignment in molecular biology are unsupervised learning problems.
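As a concrete instance of cluster analysis, the following minimal R sketch runs k-means on a toy two-group sample; the group means and the choice of k = 2 are illustrative assumptions.

```r
## Cluster analysis as unsupervised learning: k-means on a toy sample.
set.seed(7)
X <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),   # group around (0, 0)
           matrix(rnorm(60, mean = 4), ncol = 2))   # group around (4, 4)
km <- kmeans(X, centers = 2)    # no target labels are used
table(km$cluster)               # sizes of the discovered clusters
km$centers                      # estimated cluster centers
```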
In some applications, the output of the system is a sequence of actions. A single action is not important in itself; what matters is the strategy, or policy, i.e., the sequence of correct actions that reaches the goal. In reinforcement learning, we are given a sample of state-action-reward triples. The goal is to find a concise description of the data in the form of a strategy or policy (what to do) that maximizes the expected reward over time. Usually no single optimal action exists in a given intermediate state; an action is good if it is part of a good policy. The learning algorithm should therefore be able to assess the goodness of policies and identify, from past experience, a sequence of actions that maximizes the expected reward over time. Playing chess and robot navigation in search of a goal location are reinforcement learning problems. See