
Digital Chemical Engineering 11 (2024) 100163

Contents lists available at ScienceDirect

Digital Chemical Engineering


journal homepage: www.elsevier.com/locate/dche

Original Article

Teaching classical machine learning as a graduate-level course in chemical engineering: An algorithmic approach

Karl Ezra Pilario
Process Systems Engineering Laboratory, Department of Chemical Engineering, University of the Philippines, Diliman, Quezon City, 1101, Philippines

A R T I C L E  I N F O

Keywords:
Chemical engineering education
Machine learning
Data science
Artificial intelligence
Bridging topics
Roadmap

A B S T R A C T

The demand for engineering graduates with technical skills in data science, machine learning (ML), and artificial intelligence (AI) is now growing. Chemical engineering (ChemE) departments around the world are currently addressing this skills gap by instituting AI or ML elective courses in their programs. However, designing such a course is difficult since the issue of which ML models to teach and the depth of theory to be discussed remains unclear. In this paper, we present a graduate-level ML course particularly designed such that students will be able to apply ML for research in ChemE. To achieve this, the course covers a wide selection of ML models with emphasis on their motivations, derivations, and training algorithms, followed by their applications to ChemE-related data sets. We argue that this algorithmic approach to teaching ML can help broaden the capabilities of students: they can judge for themselves which tool to use when, even for problems outside the process industries, or they can modify the methods to test novel ideas. We found that students remain engaged in the mathematical details as long as every topic is properly motivated and the gaps in the required statistical and computer science concepts are filled. Hence, this paper also presents a roadmap of ML topics, their motivations, and bridging topics that can be followed by instructors. Lastly, we report anonymized student feedback on this course, which is being offered at the Department of Chemical Engineering, University of the Philippines, Diliman.

1. Introduction

The pace of research in machine learning (ML) and artificial intelligence (AI) applications in chemical engineering is accelerating (Beck et al., 2016). A quick search in Scopus reveals that documents with the words "chemical engineering" AND ("machine learning" or "artificial intelligence" or "data") averaged only 50 documents per year from 1970 to 2000, but spiked to an average of 200+ documents per year after the year 2000. The true count is even higher, since not all chemical engineering ML papers mention these keywords per se. Many researchers refer to today's age as the Fourth Industrial Revolution, or Industry 4.0, where data-driven or hybrid solutions show more promise in solving industrial problems than purely physics-driven or knowledge-driven ones (Reis & Gins, 2017; Qin & Chiang, 2019).

In the area of predictive maintenance, fault detection, and fault diagnosis, ML and deep learning tools are now used to address industrial faults in a more timely and accurate manner (Zhang et al., 2019; Jang et al., 2024). In the area of process control, process modeling via recurrent neural nets (RNNs) is now common for tracking desired trajectories more efficiently through model predictive control (MPC) (Wu et al., 2019). RNNs are typically used for system identification, but other ML methods such as kernel machines are also being developed (Pilario et al., 2021). Aside from MPC, reinforcement learning also has potential for the control and optimization of highly complex processes (Pan et al., 2021). In engineering design, the search for optimal designs of processes, products, catalysts, adsorbents, etc. is now being made more efficient thanks to data-driven optimization, which uses surrogate models built from ML regression models to greatly reduce the effort in sampling the objective function (van de Berg et al., 2022). Related to this is the area of soft sensors, where ML models are trained to estimate hard-to-measure process variables such as chemical compositions (Jiang et al., 2021) or flow regimes (Roxas et al., 2022; Khan et al., 2024). Recent surveys have covered many more applications of ML and AI in process systems engineering than can be mentioned here (Ge et al., 2017; Lee et al., 2018; Mowbray et al., 2022; Daoutidis et al., 2024). In essence, advances in AI and ML research have now made their way into the chemical engineering domain, including industries such as energy, environment, bioprocess, and pharmaceutical.

Due to these advancements, ChemE programs around the world are now designing AI and ML courses for their students. These courses are

E-mail address: [email protected].

https://doi.org/10.1016/j.dche.2024.100163
Received 18 March 2024; Received in revised form 29 May 2024; Accepted 30 May 2024
Available online 31 May 2024
2772-5081/© 2024 The Author(s). Published by Elsevier Ltd on behalf of Institution of Chemical Engineers (IChemE). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

important either for the student's personal career growth in the industries or for their research.

In doing research, several benefits can be realized. Students can use AI and ML skills to test novel ways to hybridize data-driven models with chemical engineering principles, leading to significant discoveries and better predictive models. In addition, democratizing AI and ML skills among as many researchers as possible can help address the growing irreproducibility of scientific papers on AI and ML applications in ChemE (Marcato et al., 2023). For instance, many publications do not include the full code or the full set of hyper-parameters in the architecture of the deep learning model that was used. Students with knowledge of ML hyper-parameters can verify these results for themselves by doing their own hyper-parameter search. Also, we often find publications that report state-of-the-art results using the latest AI models (convolutional neural nets, stacked autoencoders, long short-term memory networks) but do not compare them comprehensively to simpler models (support vector machines, shallow neural nets, random forests). It is important to teach students to always find a good match between the model complexity and the problem complexity: if a data set can be modeled more accurately using simpler models, then by Occam's razor, these models could be more useful in production than deep learning. Only students who have a broad knowledge of both classical ML and deep learning techniques can make comprehensive comparisons like these. If the model and problem complexities are not matched, then either underfitting (model complexity lower than the problem complexity) or overfitting (model complexity higher than the problem complexity) can occur.

We then ask: how should AI and ML courses be designed for ChemE students? This question is difficult to answer for three reasons.

First, AI and ML is a rapidly evolving field. Even if an elective course can be designed at the moment, there is no guarantee that the content will still be relevant after a few years. Newer AI models with better architectures and intuitions are produced every few months. A good example of this is the Transformer architecture. Before Transformers, convolutional neural nets (CNNs) gained considerable success in computer vision tasks while recurrent neural nets (RNNs) gained success in time series analytics. Time series data appear a lot in ChemE applications, in the form of tabular data sets, text data sets, and video data sets. Both textual and image data are important for automatically populating ontologies that help organize a large body of knowledge in a single representation, such as for pharmaceutical knowledge (Remolona et al., 2016). Applying video analytics to the feed from a camera pointed at a plant flare for improving flare management was also reported in practice by Patwardhan et al. (2019). But then, Vaswani et al. (2017) proposed the attention mechanism and showed that it can dispense with both convolution and recurrence in a single Transformer architecture, achieving similar performance at a fraction of the computational cost for training. Indeed, this Transformer architecture now powers the familiar Generative Pre-trained Transformer (GPT) network in ChatGPT and other large language models (LLMs). A recent work by Vogel et al. (2023) uses a generative transformer for the automatic completion of process flowsheets during plant design. How often should ML courses in ChemE adjust for newer architectures like these?

Second, it is difficult to specify exactly what learning outcomes the ML course should achieve. If the learning outcome is for students to simply be adept in applying available Python libraries or any ML software on any data set, without regard for the mathematics and algorithmic details behind the ML models, then students are taught to be tool-users. There is nothing wrong with this desired outcome, since this skill set is already very useful in industry. Graduates can already realize considerable career growth by knowing how to use built-in functions from the Scikit-learn, Pandas, PyTorch, and Tensorflow libraries, for instance. However, if it is desired for graduate students to be able to conduct research into new ML modifications that can help improve the state-of-the-art in solving industry problems, then the learning outcomes of the ML course should include the ability to follow the theoretical details of ML models.

One important ML modification would be physics-informed ML, which is now a growing research area that aims to improve the extrapolation ability and explainability of data-driven models for predicting physical phenomena (Bikmukhametov & Jäschke, 2020; Rai & Sahu, 2020). Indeed, knowledge of ML theory is required to effectively incorporate physics into an ML model and vice versa. One example is to modify the cost function of a neural net to include the outputs from a physics-based model, such as in the work of G. Wu et al. (2023) for the model predictive control of a batch crystallizer. The work has shown that a new physics-informed RNN can be trained with less data and can extrapolate better than a purely data-driven RNN. In addition, Alhajeri et al. (2023) assessed the generalization error of physics-informed RNNs towards better predictive control of nonlinear processes.

Another advantage of knowing ML theory is the ability to further improve a model when it performs poorly. For instance, it is now easy to apply different ML models to any problem thanks to the Scikit-learn library. But if the hyper-parameters are only set to default values, the ML models are not trained to their full potential and will perform poorly on test data. Theoretical knowledge of the ML hyper-parameters and how to tune them properly using cross-validation is required to improve predictions significantly (Sun & Braatz, 2021). Despite the clear importance of teaching ML theory, it is still unclear how deeply the theory should be taught in an ML course for ChemE.

Third, most ChemE undergraduate programs lack sufficient statistics courses that are prerequisites for understanding ML theory. For instance, the idea behind Gaussian process regression, a well-known ML model for predicting point values with uncertainty, requires understanding Bayesian statistics, which is not typically taught in ChemE programs. Also, the concept behind the cross-entropy loss, the loss function in neural net classifiers and logistic regression models, comes from the idea of entropy in statistics. Although ChemE students are familiar with entropy from thermodynamics as a measure of disorder, the entropy used in machine learning has a different formula, −∑ p log p, which is rooted in measuring the expected "surprise" in drawing a red ball from a bag of blue and red balls, a statistical concept (Hastie et al., 2008). In a way, both meanings of entropy relate to "disorder", but the statistical concept helps make sense of why minimizing its value leads to well-trained classifiers. Students may be confused between the two meanings. ML courses would have to include extra time to discuss some topics from statistics (e.g. statistical distributions, Bayes' theorem, Kullback-Leibler divergence, entropy, etc.) to appreciate the theory behind certain learning algorithms. In addition, gaps in optimization theory, linear algebra, and graph data structures would have to be filled in when deriving other ML models as well. Are these topics worth adding to an ML course for ChemE students?

Currently, ML is taught in ChemE programs either as topics within a ChemE subject or as a separate elective course entirely. In the course called Analysis of Chemical Process Industry offered at the University of Sao Paulo, Brazil (Lavor et al., 2024), machine learning topics such as introduction to AI, clustering, classification, regression, neural nets, and deep learning are incorporated in the last few weeks of the course. The course focuses on hands-on activities where students use ML in Python to analyze data sets available from Kaggle and the UCI (University of California, Irvine) repository. Students were also tasked to train neural nets for a real mining processing plant to predict a key process variable. A selection of ML algorithms was taught down to their theoretical details. This course met with a positive response from students. At the Massachusetts Institute of Technology (MIT), a Process Data Analytics course is offered for ChemE students as well as mechanical engineering and engineering management students (Hong, 2022). The course intends to focus more on the practical skills of applying ML in ChemE, rather than discussing the mathematics behind the ML methods. As such, students were exposed to more real-world data sets rather than synthetic ones. MATLAB was chosen as the


programming language due to its compatibility with engineering applications, and also since students had already gained experience with it in earlier courses at MIT. Meanwhile, the Machine Learning in Chemical Engineering course taught at Imperial College London (Sanchez Medina et al., 2023) organizes the ML topics according to their applications in ChemE, such as materials informatics, process modeling, PID (proportional-integral-derivative) tuning, process control, process monitoring, and energy systems modeling. Python was also the language of choice, and the codes were even made available to the public via Github pages. These are valuable resources for any ChemE student to learn ML. ML was also taught to ChemE students in the course by Venkatasubramanian (2022) at Columbia University, together with symbolic AI topics such as knowledge-based expert systems, ontologies, and rule-based systems. The course has been running since 1986 and has since evolved as necessary due to advances in AI and ML.

Many other attempts at designing an ML course for chemical engineers can be found in other universities around the world, as noted through conversations of the author with other researchers during conferences. These courses are still evolving, and they all answer the three above-mentioned issues differently. All of them are important advances in chemical engineering education.

This paper presents an attempt to design a graduate-level ML course at the Department of Chemical Engineering at the University of the Philippines, Diliman (UPD), called ChE 197/297: Introduction to AI/ML for Chemical Engineers. Different from the course designs above, a wide coverage of ML models is taught together with their motivations, derivations, and algorithmic details. The primary goal of the course is to train graduate ChemE students not only on how to apply ML to various areas in ChemE but also on how to use it in research. Specifically, the course answers the above-mentioned issues in the following way:

• What topics should be included in an ML course? How often should the course content adjust to advances in AI and ML research? We believe that by designing a course around classical ML rather than more advanced ML, it will remain relevant for years to come. If students can understand what the basic learning problems are (supervised, unsupervised, reinforcement, etc.) and the concepts important to learning itself (loss functions, generalization, underfitting and overfitting, etc.), then it is hoped that they can follow more complex concepts in deep learning and other newer architectures more easily. Most new architectures nowadays come from deep learning research. These models still have the same underlying mechanisms that are rooted in classical ML, such as neural nets, backpropagation, and gradient descent. By focusing on these ML concepts, it is hoped that the course remains relevant over time. Teaching classical ML is also easier since access to high-performance computing is not required. Students can already apply ML using their laptops or the desktop computers in the university.
• How deep should the theory be taught in an ML course? The level of ML theory taught in the ChE 197/297 course is based on the requirements to conduct a certain level of research. ChE 297 is a course for Master's-by-research and PhD students, and the corresponding ChE 197 is an undergraduate course for research thesis students. For these students, the theoretical content should be: (1) enough for them to perform hyper-parameter tuning and model comparisons properly; (2) enough so that students can describe ML algorithms in detail when writing their research; and (3) enough to allow students to suggest novel modifications or find new ChemE applications. However, these are mere guiding principles since they are not explicitly assessed as student outcomes. There is still room for variation in the depth of discussion despite having these course goals in place.
• Are bridging topics worth adding to an ML course for ChemE students? Yes, as long as time permits. A list of important bridging topics will be presented later for each module of the course, mostly coming from statistics. These topics have proven to be manageable to discuss as preliminaries for each module. Besides, statistical concepts such as design of experiments, response surface methodology, correlation coefficients, and linear regression are already being taught to research students at UPD. In the new ML course, these topics are simply expanded to the level that can motivate the mathematics behind ML models. We believe that these concepts will eventually become mainstream in future ChemE programs, i.e., the way we teach artificial neural networks will eventually become as standard as the way we teach linear regression.

This paper is organized as follows. Section 2 presents the course design of ChE 197/297; Section 3 discusses the roadmap of motivations used to tie all the ML algorithms into one narrative, including a list of suggested bridging topics; Section 4 discusses examples of case studies used in class; Section 5 reports some student feedback on the course; Section 6 provides a link to the actual course materials; and lastly, Section 7 concludes the paper.

2. Course Design

2.1. Learning outcomes

The ChE 197/297 course is designed so that at the end of the course, students should be able to:

• Understand the motivations and mathematical details behind popular classical ML methods for supervised and unsupervised learning.
• Identify problem types in ChemE that can potentially be solved using AI and ML techniques.
• Solve predictive problems in ChemE using supervised ML.
• Analyze ChemE data sets in both unsupervised and supervised learning settings.
• Critique the effectiveness of current research work that applied ML to ChemE problems.

For graduate-level ChemE students (ChE 297), all outcomes are expected, but for undergraduate thesis ChemE students, only the first four are expected. The last student outcome is measured through a research paper critique that is required only for graduate-level ChemE students.

2.2. Required software

Students are required to use Python code in the course, especially in the form of Jupyter Notebooks. Jupyter Notebooks have proven to be effective platforms for learning ChemE modules since they are interactive and engaging (Bascuñana et al., 2023). We adopt this same platform for teaching ML in ChE 197/297. Student exercises and exam submissions are also required to be written as Jupyter Notebooks.

ChemE students at UPD are taught MATLAB in their previous subjects rather than Python. Even graduate students of ChemE are not expected to know how to code in Python before enrolling in the course. Hence, ChE 197/297 is designed so students can learn Python along the way. All the standard codes for each ML topic are given to the students at the start. During lab sessions, the Python codes are explained to the students line by line, while noting differences between Python and MATLAB syntax. Outside of class, students can run the codes ahead of time for better understanding.

2.3. Course content

At the beginning of the course, students are presented with the organization of the course content (see Fig. 1). The content is divided into supervised and unsupervised learning, which are the two main categories of classical ML (Murphy, 2012). Within each category, the main areas of ML tasks are discussed and differentiated from each other: regression, classification, dimensionality reduction, clustering, density


Fig. 1. Overview of course content in ChE 197/297.

estimation, and anomaly detection. A range of ML techniques is then presented under each task, all of which are discussed throughout the course (see Table 1). Aside from the course content, it is also helpful to discuss important questions on the first day: What is ML? What were the breakthroughs in AI in the past? Why did AI/ML flourish only now? As engineers, why would we use ML in the first place? Does it give some added value? How can we apply AI and ML in our domain?

The topics listed in Table 1 were carefully ordered to ensure that student learning is incremental and that the current topic can be understood by building upon previous topics. Students are first taught exploratory data analysis so they can learn to summarize and describe the statistics of ChemE data sets using data visualization techniques in Python. This module also serves as an introduction to the Python programming language. After this, the course is split into topics under supervised learning, and then unsupervised learning. In Section 3 of this paper, we further elaborate on the reasons for ordering the modules in this way. Finally, explainability methods and AutoML are the last topics of the course since they are more integrative. Explainability methods aim to shed light on the black-box nature of ML predictions, e.g. by reporting feature importances. Meanwhile, AutoML (automated ML) is any procedure for automatically selecting the best ML model with tuned hyper-parameters for classification or regression. Both of these topics require knowledge of all previous ML topics to be appreciated.

Table 1
List of topics covered in the ChE 197/297 course at UPD.

Module  Topic
1       Introduction to Machine Learning
2       Exploratory Data Analysis in Python
        Supervised Learning
3       Linear Regression and Logistic Regression
4       Support Vector Machines and Kernel Methods
5       Cross-validation and Hyper-parameter Tuning
6       Gaussian Processes and Bayesian Optimization
7       Neural Networks for Regression, Classification, and Time Series
8       Trees, Weak Learners, and Ensemble Learning
        Unsupervised Learning
9       Linear Dimensionality Reduction and Feature Engineering
10      Nonlinear Dimensionality Reduction and Manifold Learning
11      Clustering, Density Estimation, and Anomaly Detection
12      AutoML and Explainability

2.4. Excluded classical ML topics in the course

Based on Table 1, it is clear that some ML topics are not emphasized or are excluded in the course. We present the reasons for doing so in this subsection.

One topic that is less emphasized is Reinforcement Learning (RL). A proper treatment of RL, such as that found in the book by Sutton & Barto (2012), requires starting with Markov Decision Processes, policy and value iteration, temporal difference learning, Q-learning, and then actor-critic networks. This level of discussion is at par with that of the other ML topics in the ChE 197/297 course. RL can potentially be applied not only in process control, but also in scheduling, real-time optimization, or any search problem posed in a complex design space. However, due to time constraints, these topics are better taught as a separate course altogether to cover the vast literature on RL. Nonetheless, the present ML course includes demos of RL applied to process control in MATLAB Simulink, just to show that RL is also an ML category that has found applications in ChemE.

Evolutionary optimization is another ML-related topic not found in Table 1. The category of evolutionary optimizers, including the genetic algorithm (GA) and particle swarm optimization (PSO), is listed as one of the key "tribes" in machine learning according to Domingos (2015). Evolutionary optimization is ubiquitous in ChemE. In one particular research area, namely the modeling of dye removal processes in wastewater, a recent survey found that most researchers use GA and PSO either to find the best neural net architectures or to optimize the output of surrogate models for dye adsorption (Bhagat et al., 2023). A commonly used AutoML package called TPOT (Tree-based Pipeline Optimization Tool) is also based on genetic programming to find optimal multi-stage ML pipelines for any task (Le et al., 2020). As an example, Huntington et al. (2023) used TPOT for building surrogate models for lignocellulosic ethanol production. SINDy, which stands for Sparse Identification of Nonlinear DYnamics, is also a Python-based data-driven system identification platform that uses genetic programming-based symbolic regression (Brunton et al., 2016), and was recently used for fault prognosis (Bhadriraju et al., 2021) and noisy multi-scale dynamic modeling (Abdullah & Christofides, 2023). Ultimately, evolutionary algorithms are excluded from the ChE 197/297 ML course because they are already covered in a separate graduate course on optimization in ChemE.

ML topics related to learning from sequential data such as state-space


models, hidden Markov models, and particle filtering are also commonly topic before knowing the whats and the hows. The ML topics in the ChE
used in ChemE. Bishop (2006) dedicated a chapter for them in his ma­ 197/297 course need to be motivated in the right way to maximize their
chine learning textbook. However, because they relate more to process appreciation. Figs. 2–5 show a roadmap of such motivations for the
dynamics and control, these are discussed in a graduate-level process supervised learning and unsupervised learning topics. We discuss the
control course instead. contents of the roadmap as follows.
Probabilistic graphical models, causal graphs, and graph neural nets
are also not found in the current set of ML topics. Similar to RL, these
3.1. Modules on supervised learning
topics require a strong grasp of pre-requisite topics to be understood,
namely graph theory. It is better to dedicate a separate course for them
to be fully discussed and appreciated.

Finally, there are topics related to ML that are not exactly considered ML algorithms. Two prominent examples are fuzzy logic and wavelet theory. Fuzzy systems and fuzzy logic are tools from soft computing that have found applications in building expert systems, which is a branch of AI. Fuzzy logic is an approach to mathematically abstract uncertain and approximate variables from the real world to a computer. As such, fuzzification is not an ML algorithm per se (it is not an approach to learning from data), but it can be used to enhance ML models. One notable example is the use of neuro-fuzzy networks in system identification and fuzzy logic control (Babuška & Verbruggen, 2003). The same can be said of wavelets. Wavelets are wave-like oscillations that are used as building blocks for multi-resolution feature extraction, primarily in the signal processing community (Guo et al., 2022). Wavelet neural networks have been popular since the 1990s due to their ability to combine the flexibility of a neural net with the time-frequency feature extraction capabilities of wavelet analysis (Alexandridis & Zapranis, 2013). Other combinations of wavelets with ML models (Liu et al., 2013; Li et al., 2020), as well as applications of wavelet transforms (Roxas et al., 2022), have also been reported in the ChemE research literature. These topics, fuzzy logic and wavelets among others, are only mentioned in passing in the ChE 197/297 course; no separate lecture is dedicated to them.

3. A Roadmap of ML Topics

For learning to be effective, students need to know the whys of a

Linear regression is a good starting point since ChemE students are already familiar with finding best-fit lines in a data set. In this lecture, linear regression is presented from the ML perspective: the slope and y-intercept are now called weights, the least-squares objective is now called a loss function, and the model is expanded to include linear basis functions, ridge regularization, and locally weighted linear regression (LWR). Meanwhile, the starting point for classification is logistic regression. Logistic regression is taught as the analog of linear regression in which the linear model output is fed to the logistic (or sigmoid) function to turn it into a probability score for binary classification.

Performance metrics for regression and classification are also introduced in this lecture since they are used throughout the rest of the course. The typical metrics for regression include the mean squared error (MSE), the root-mean-squared error (RMSE), the normalized MSE (NMSE), and the R² metric. For classification, we find it appropriate to introduce the confusion matrix first; from this result, other metrics can be derived, such as precision, recall, F1-score, and accuracy. Although students may be overwhelmed by meeting many performance metrics at once, we remind them that different tasks in ChemE have different preferences on which few metrics to use. For instance, researchers in fault detection prefer to report false alarm rates, missed detection rates, and detection accuracies, while a general classification task, such as identifying flow regimes in pipes, favors confusion matrices and F1-scores, especially when the number of samples is imbalanced among the classes.

The next module discusses kernel methods. The idea is to use the kernel trick on linear dot products within linear models so that they become nonlinear covariances (Pilario et al., 2019).
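As a small illustration of how the classification metrics above follow from the confusion matrix, the sketch below uses a synthetic Scikit-learn data set and a hypothetical train/test split (neither is one of the course case studies):

```python
# Illustration only: binary classification metrics derived from the
# confusion matrix. The synthetic data set and split are assumptions,
# not one of the course case studies.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Confusion matrix first, as in the lecture; the other metrics follow from it.
tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

The same hand computation extends to the fault-detection metrics mentioned above, e.g. false alarm rate = fp / (fp + tn) and missed detection rate = fn / (fn + tp).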

Fig. 2. Roadmap of the ChE 197/297 course for Modules 2-7. Legend: Blue box = regression, green box = classification, gray box = more general ML topics (For
interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).

K.E. Pilario Digital Chemical Engineering 11 (2024) 100163

Fig. 3. Roadmap of the ChE 197/297 course for Modules 8-9. Legend: Blue box = regression, green box = classification, purple box = dimensionality reduction, gray
box = general ML topics (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).

Fig. 4. Roadmap of the ChE 197/297 course for Modules 10-11. Legend: Purple box = dimensionality reduction, orange box = clustering, gray box = general ML
topics (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).

Kernel methods such as kernel ridge regression (KRR), support vector classification (SVC), and support vector regression (SVR) are discussed. KRR is motivated by applying the kernel trick on the linear basis function models described in the previous lecture. SVMs are motivated by the large-margin classifier intuition. To expand binary SVM, multi-class classification strategies are also discussed: one-vs-one, one-vs-rest, and error-correcting output codes.

After this, a separate module is dedicated to cross-validation methods and basic hyper-parameter tuning in Python.
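The two ideas above can be combined in a short sketch: an RBF-kernel SVM whose hyper-parameters are tuned by K-fold grid search. The synthetic "make_moons" data and the small search grid are illustrative assumptions, not course materials:

```python
# Sketch: an RBF-kernel SVM tuned by K-fold grid search on the training
# set only. The 'make_moons' data and the grid are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Hyper-parameters are tuned on the training folds only; the held-out
# test set is touched once, at the end.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": [0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_tr, y_tr)

print("best params:", grid.best_params_)
print("test accuracy:", grid.score(X_te, y_te))
```

The RBF kernel lets the large-margin classifier separate data that are not linearly separable, which is the point of swapping dot products for kernels.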


Fig. 5. Roadmap for the ChE 197/297 course for Modules 11-12. Legend: Pink box = anomaly detection, red box = AutoML and explainable ML, gray box = general
ML topics (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article).

By this point, students are curious about how to tune kernel parameters and regularization parameters, among others. We emphasize that training data should be independent of the validation data and the testing data. Tuning the hyper-parameters on validation data is important to combat overfitting on the training set. For hyper-parameter tuning, Bayesian optimization is noted as the main algorithm that is preferable in most cases, but in the present lecture, the class is simply taught how to use Optuna (Akiba et al., 2019), a Python package that implements a variant of Bayesian optimization for ML tuning. A more in-depth lecture on Bayesian optimization is yet to come in Module 6. Aside from Bayesian optimization, grid search and random search are also taught in this lecture. Different cross-validation techniques are also discussed, such as holdout, K-fold, stratified K-fold, leave-one-out, and others that are available in Scikit-learn. It is hoped that students will be able to use them off the shelf for the rest of the course.

In the next module, the bigger story around KRR is revealed by a change of perspective. Least-squares is taught to be only an instance of maximum likelihood estimation, which in turn is an instance of maximum a posteriori estimation, which in turn is an instance of a full Bayesian approach. To understand the full Bayesian approach, students are introduced to Bayes' theorem, which then motivates Gaussian process regression (GPR). A key discussion in class is the difference between the frequentist and Bayesian interpretations of probability. Up to this point, the treatment of the course follows closely the first few chapters of the textbook by Bishop (2006), one of the main course references. After this, Bayesian optimization is taught to the class, with GPR utilized as a surrogate model. Acquisition functions such as the upper confidence bound, probability of improvement, and expected improvement are discussed and compared on specific toy optimization exercises in 1-D and 2-D. Students are shown animations of how Bayesian optimization discovers the ground-truth function step by step to improve understanding.

The next module is motivated by noting the scalability issue of kernel methods, which is shared by non-parametric models such as KRR, SVR, SVC, and GPR. We return to parametric models, the most successful of which is the artificial neural network (ANN). The basics of ANN, such as the architecture, the loss functions, activation functions, variants of gradient descent, the forward pass, and backpropagation, are discussed. RNNs and their variants are also introduced for time series analysis, along with other available deep learning architectures: CNNs, GANs, Transformers, etc. In this specific module, around 70% of the time is dedicated to the basic multi-layer perceptron (MLP) architecture and algorithm details, and only 30% is dedicated to deep learning.

Fig. 3 shows the next modules after the one on neural networks. The last topic under supervised learning is ensemble learning. The idea is to build a strong learner from a committee of weak learners. Hence, weak learners are first discussed, including Naïve Bayes, k-nearest neighbors, and decision trees. By this time, students are already familiar with Bayes' theorem and cross-entropy, which are needed to motivate Naïve Bayes and splitting in decision trees, respectively. Discriminant analysis is also a weak learner that could have been included in this module, but we decided to include it in the dimensionality reduction module instead, so that Linear Discriminant Analysis can be taught with the idea of data projections. The module on ensemble learning ends with the well-known boosting, bagging, and stacking procedures, which lead to models such as gradient boosting, extreme gradient boosting (XGBoost), Random Forest, and stacking ensembles.

3.2. Modules on unsupervised learning

The course now transitions to unsupervised learning by motivating its methods as tools to enhance pattern extraction before training any supervised ML model.

In Fig. 3, dimensionality reduction is first introduced to address issues such as multi-collinearity and redundancy in the features of the data. Students are taught that features are dimensions in which each data point lies. If the same information in the data can be expressed in fewer dimensions, learning can become more efficient. The main algorithm discussed in this module is Principal Components Analysis (PCA). Most online courses discuss the PCA algorithm directly via the eigenvalue decomposition of the sample covariance matrix. However, in ChE 197/297, we also discuss the derivation of PCA starting with variance maximization of the scores under the constraint that the projection matrix is orthonormal.
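The two routes to PCA mentioned above, variance maximization and the eigenvalue decomposition of the sample covariance matrix, can be checked against each other numerically. The random correlated data below are an assumption for illustration only:

```python
# Sketch: principal directions are eigenvectors of the sample covariance
# matrix, which Scikit-learn's PCA reproduces. Data are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
Xc = X - X.mean(axis=0)                                  # center the data

S = np.cov(Xc, rowvar=False)           # sample covariance matrix (ddof=1)
eigval, eigvec = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]       # sort descending by variance
eigval, eigvec = eigval[order], eigvec[:, order]

pca = PCA().fit(X)
# Explained variances equal the covariance eigenvalues up to numerical
# error; eigenvectors match the loadings up to an arbitrary sign.
assert np.allclose(eigval, pca.explained_variance_)
assert np.allclose(np.abs(eigvec), np.abs(pca.components_.T))
print("eigendecomposition matches Scikit-learn PCA")
```

This is the same result that the Lagrange-multiplier derivation arrives at analytically.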


The optimization problem is posed and solved via Lagrange multipliers, leading to the reason why PCA becomes an eigenvalue problem in the end. Aside from PCA, feature selection techniques are also discussed, as well as other techniques such as LDA and PLS with their kernelized versions. Variants of PCA typically used in process data analytics, such as dynamic PCA and multi-way PCA, are no longer discussed but are left for the students to explore if their research goes in this direction.

The next module discusses nonlinear dimensionality reduction methods, with an emphasis on manifold learning. The module starts with Kernel PCA (KPCA), which is motivated by applying the kernel trick in PCA. Even though PCA can be generalized into KPCA, it still cannot learn low-dimensional manifolds. This leads to the discussion of manifold learning algorithms (see Fig. 4), starting from multi-dimensional scaling and isomap embedding, up to t-SNE (t-distributed stochastic neighborhood embedding) and UMAP (uniform manifold approximation and projection). The order in which the algorithms are discussed is the same order in which they were developed historically. Applications of these methods to process data analytics, chemometrics, and soft sensing (Khan et al., 2024; Pilario et al., 2019, 2022b) are discussed for better appreciation. The module also makes side notes on the various distance metrics used in manifold learning, as well as the connection between PCA, LDA, and Laplacian Eigenmaps, namely through the Rayleigh quotient.

The last module under unsupervised learning discusses clustering, density estimation, and anomaly detection. The main clustering methods are K-means, hierarchical clustering, spectral clustering, Gaussian mixture models, and DBSCAN (density-based spatial clustering of applications with noise). To tie these methods into a single narrative, they are motivated based on the issues they can address that K-means clustering cannot. Cluster validity indices are also presented, notably the silhouette score, to measure clustering performance. Fig. 5 then presents the methods discussed under density estimation and anomaly detection. Only kernel density estimation is presented in the former, whereas the latter includes four methods: elliptical envelope, local outlier factor (LOF), isolation forest, and one-class SVMs. This topic is important for performing process monitoring in large chemical plants. Hence, the module also discusses the basic fault detection framework: feature extraction, building statistical indices (T² and Q statistics), and threshold setting.

3.3. AutoML and Explainable ML

The final module of the course is dedicated to integrative topics, namely AutoML and Explainable AI.

The goal of AutoML is to automate the process of combined algorithm selection and hyper-parameter optimization (Kotthoff et al., 2017). Students are made aware of existing AutoML packages such as LazyPredict, PyCaret, Auto-sklearn, Auto-WEKA, TPOT, and H2O AutoML. Optuna is revisited as a tool where users can do AutoML with more freedom to define the search space. Optuna uses a variant of Bayesian optimization, which was already taught in a previous lecture.

Meanwhile, for Explainable AI, Shapley additive explanations (SHAP) is the main tool discussed. Other methods for feature importance are also mentioned, such as permutation feature importance, drop-column feature importance, and mean-decrease-in-impurity feature importance for tree-based models. To aid in explainability, various examples of physics-informed ML in the literature are also briefly mentioned. In the ChemE domain, more and more research on explainable AI is being conducted. For instance, the use of SHAP for fault diagnosis was tested for adversarial autoencoders by Jang et al. (2024) and stacked autoencoders by Choi & Lee (2022). SHAP was also used to identify the key features from the pressure signals measured on a pipe to classify flow regimes (Khan et al., 2023).

AI and ML explainability, transparency, interpretability, and fairness are important to discuss in class as part of AI ethics and safety. Policies and regulations on the development and use of AI are now being drafted around the world, which include these notions for compliance. Hence, we end the course with a class discussion about these issues in the final lecture.

Overall, the roadmap presented in Figs. 2–5 can be one prescription for the design of an ML course. If some ML topics are dropped, or new ones added, the roadmap makes it easier to see how this will affect the surrounding topics while keeping the narrative flowing for students to follow.

3.4. Bridging Topics

Table 2 lists the bridging topics that are important to cover before deriving some of the ML algorithms. These topics are not covered in the undergraduate ChemE program at UPD. Hence, they must be discussed as preliminaries in the modules.

Table 2. List of bridging topics for each ML topic in ChE 197/297.

ML Topic (Module No.) | Pre-requisite | Bridging Topic
Exploratory data analysis (2), Hyper-parameter tuning (5), Gaussian process regression (6), t-SNE (10) | Statistics | Statistical distributions
Non-negative Matrix Factorization (9), t-SNE (10) | Statistics | Kullback-Leibler divergence
Logistic Regression (3), Neural Networks (7), t-SNE (10) | Statistics | Entropy in statistics
Gaussian process regression (6), Bayesian optimization (6), Naïve Bayes (8) | Statistics | Bayesian statistics, conditional and joint probabilities
Recurrent Neural Networks (7), Exploratory Data Analysis (2) | Statistics | Autocorrelation function
Independent Components Analysis (7) | Statistics | Mutual information
K-means clustering (11), Gaussian mixture models (11) | Statistics | Expectation maximization
Linear and Logistic Regression (3), SVM (4), Neural Networks (7), PCA (9) | Calculus | Matrix calculus
Kernel Ridge Regression (4) | Linear algebra | Woodbury matrix identity
Laplacian Eigenmaps (10), Spectral Clustering (11) | Graph theory | Graph adjacency matrix and graph Laplacian matrix
Decision trees (8), Random Forest (8), XGBoost (8) | Graph theory | Graph traversal
SVM (4), PCA (9) | Optimization theory | Lagrange multipliers

Classical ML algorithms are rooted in statistical learning theory (Hastie et al., 2008). However, the ChemE program at UPD has no required major course in statistics at either the undergraduate or graduate level, and only the course on research methods discusses statistical methods and their applications. This is the reason why most of the bridging topics were found to come from statistics.

Aside from these, the derivations of multiple linear regression, logistic regression, SVMs, neural nets, and PCA all require knowledge of differentiating matrix functions. Partial differentiation is taught in the ChemE program at UPD, but when the function involves matrices, the partial derivative is not as straightforward. For instance, the model and cost function in linear regression are:

y = Xw + ε    (1)

C(w) = (y − Xw)^T (y − Xw)    (2)

where y ∈ R^n are the outputs, X ∈ R^(n×(m+1)) are the inputs, w ∈ R^(m+1) are the weights, and ε is the noise. If one wishes to minimize Eq. (2) with respect to the weights w, then the partial derivative of C(w) equated to zero becomes:

∂C/∂w = −2X^T (y − Xw) = 0    (3)
theory
around the world, which include these notions for compliance. Hence,


This result may not be obvious to students who know partial differentiation but have only ever applied it without matrices. This treatment of linear regression is the one used in Bishop (2006).

Finally, additional topics from linear algebra, graph theory, and optimization theory, as listed in Table 2, are also not taught in the undergraduate ChemE program at UPD but are important for understanding and deriving other ML models from different modules. Although Lagrange multipliers are taught in a graduate-level optimization course, we still added them as a bridging topic in the ML course in case a graduate student has not yet taken the optimization course.

4. Case Studies Used in Class

ML algorithms are difficult to appreciate without examples. The ChE 197/297 course consists of 2 meetings per week at 1.5 h per meeting. The first meeting consists of lectures and class discussions, while the second meeting is dedicated to case studies with live coding sessions in Python. Shown in Table 3 are some of the case studies used in class for every module.

The case studies in Table 3 highlight the fact that chemical engineers can use ML algorithms on a wide coverage of applications, from process data analytics to chemometrics, materials informatics, energy systems, environmental systems, bioprocesses, and business analytics. More can be added to the list if one thinks about the role of chemical engineers in addressing the United Nations Sustainable Development Goals (SDGs); data sets can come from addressing problems in water and sanitation, climate change, renewable energy, pollution, food, and so on.

Although case studies are used for class exercises, smaller examples are also used during lectures. For instance, different ML regression models are tested to fit data points from a sine wave, exponential functions, or polynomial functions. ML classification models are also tested on synthetic data sets from Python's Scikit-learn library, such as "make_classification", "make_moons", and "make_blobs". These are helpful illustrations for students to understand the effects of various model hyper-parameters on model performance.

Table 3. Sample case studies for each module in the course (source of data in parentheses).

Module 2
• Cranfield Multiphase Flow Facility (Ruiz-Cárcel et al., 2015; Stief et al., 2019)
• Taylor Swift Spotify Data Set (https://www.kaggle.com/datasets/jarredpriester/taylor-swift-spotify-dataset)
• Titanic Data Set (https://www.kaggle.com/c/titanic/data)
• Fisher Iris Data Set (https://archive.ics.uci.edu/dataset/53/iris)

Module 3
• Hypothetical Metal-Organic Framework (hMOF) CO2 Isotherm Data (https://mof.tech.northwestern.edu/mofs/15338)
• Flow Regime Classification using Gas-Liquid Velocities (Khan et al., 2024)

Module 4
• Fault Classification in an Evaporator System (Pilario et al., 2021)
• Predicting Energy Efficiency in Buildings (https://archive.ics.uci.edu/dataset/242/energy+efficiency)

Module 5
• Wine Quality Data Set (https://archive.ics.uci.edu/ml/datasets/wine+quality)
• Air Quality Data Set (https://www.kaggle.com/datasets/fedesoriano/air-quality-data-set)

Module 6
• Airline Passengers Data Set (https://www.kaggle.com/datasets/rakannimer/air-passengers)
• Battery Degradation Data Set (dos Reis et al., 2021)
• Microalgae Moisture Content Data during Drying (Pilario et al., 2022b)
• Atmospheric CO2 Data at Mauna Loa (https://gml.noaa.gov/ccgg/trends/)

Module 7
• Chlorophyll-a in Global Lakes Data Set (Naderian et al., 2024)
• Identification of an Evaporator System (Pilario et al., 2021)
• Global Solar Irradiance and Wind Speed Forecasting (https://power.larc.nasa.gov/)

Module 8
• Diamonds Data Set (https://www.kaggle.com/datasets/shivam2503/diamonds)
• Gas Turbine CO and NOx Emissions Data (https://www.kaggle.com/datasets/muniryadi/gasturbine-co-and-nox-emission-data)
• QSAR-based Pesticide Aquatic Toxicity Data (Yu & Zeng, 2022)

Module 9
• Hypothetical MOF Database (hMOF) (https://mof.tech.northwestern.edu/mofs/15338)
• Chemometric Data on Bee Substances (Pilario et al., 2022a)

Module 10
• Hypothetical MOF Database (hMOF) (https://mof.tech.northwestern.edu/mofs/15338)
• Flow Regime Mapping using Pressure Signal Features (Khan et al., 2024)
• 8 × 8 Handwritten Digits Recognition (https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits)

Module 11
• Anomaly Detection in a Wastewater Treatment Plant (https://archive.ics.uci.edu/ml/datasets/water+treatment+plant)
• Tennessee Eastman Plant (Ricker, 1996)
• Cranfield Multiphase Flow Facility (Ruiz-Cárcel et al., 2015; Stief et al., 2019)

5. Student Feedback

The ChE 197/297 course gained some positive feedback from the official student evaluations, as follows:

• "[The] course masterfully balances theoretical and practical aspects of Machine Learning, providing clear explanations of complex concepts."
• "[The instructor] talks about his experience in using some of the algorithms which makes the learning more reliable as it is based on first-hand experience."
• "Very engaging lectures. Very helpful resources and sample codes."
• Translated from Filipino to English: "It's a good thing that the instructor has many examples."
• "[The] slides do a nice job balancing being technical enough to be valuable, and leveled enough to be understandable, in a sense that it is not just always filled with jargon, but also coupled with useful explanations."

The course also received points for improvement from the student evaluations:

• "The course should have additional topics for data cleaning, pre-processing, as well as data ethics. Data pre-processing is one of the most crucial and frequently overlooked topics in ML."
• "[Students should be allowed] to pick only a subset of the multiple machine learning exercises, especially the ones that interest them the most…"
• "I wish there was a dedicated lab for this subject to facilitate us better [in] applying theory to practical work."
• "More source code samples."
• "The course per se is a really technical course so any difficulties faced in learning can really be supplemented with more practice in coding. Although if possible, maybe some extra machine exercises (could be optional) could help in reinforcing the different machine learning algorithms."

It seems that students think the course strikes a good balance between theory and practice. We find that practical examples and illustrations are needed for students to appreciate the ML theory. The roadmap of ML topics presented in Section 3 is also believed to have led to the feedback that the course "provided clear explanations." This highlights the importance of tying all topics into one narrative, as well as adding bridging topics, for better understanding.

Due to the breadth of applications presented as case studies in Section 4, students also noted that this aspect of the course made it more practical. Many students realized that ML has many more applications in ChemE than the ones presented in class. Students also received above-average marks for the exercises and assessments in class. Hence, it can be said that the course goal of training graduate ChemE students to apply ML to various areas in ChemE has been achieved.

According to the students who took the course, one way to improve the course is to add topics related to data pre-processing, which could mean techniques for data engineering, data cleaning, handling missing data, handling heterogeneous data types, categorical data, and so on.
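One way such pre-processing topics could be demonstrated in an expanded module is with a small Scikit-learn pipeline. The toy data frame below is hypothetical, not a course data set:

```python
# Hypothetical toy example (not from the course): impute a missing numeric
# value, scale the numeric feature, and one-hot encode a categorical column.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "temperature": [350.0, np.nan, 342.5, 360.2],          # numeric, with a gap
    "flow_regime": ["slug", "annular", "slug", "bubbly"],  # categorical
})

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), ["temperature"]),
    ("cat", OneHotEncoder(), ["flow_regime"]),
])

Z = preprocess.fit_transform(df)
print(Z.shape)  # 4 rows; 1 scaled numeric column + 3 one-hot columns
```

Wrapping these steps in a Pipeline keeps the imputation and scaling statistics fitted on training data only, which ties back to the course's emphasis on keeping training, validation, and testing data independent.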
data, handling heterogeneous data types, categorical data, and so on. In


the future, the module on Exploratory Data Analysis can be expanded to utility of ML and AI in our domain.
include these topics. The other point of improvement is to add more
practice sessions for students. We think that this issue can be addressed CRediT authorship contribution statement
by adding a Python programming subject as a pre-requisite to the ML
course so that during the ML course, the time can be spent on more ML Karl Ezra Pilario: Writing – review & editing, Writing – original
implementations rather than learning Python on the go. In our experi­ draft, Visualization, Validation, Software, Resources, Project adminis­
ence, going through Python codes line by line is already time- tration, Methodology, Investigation, Formal analysis, Data curation,
consuming. This leaves less time for testing different ML algorithms on Conceptualization.
more examples. The other solution is to replace Python with MATLAB
for the ML course, similar to the Process Data Analytics course at MIT
Declaration of competing interest
(Hong, 2022). MATLAB is continuously being updated with more ML
capabilities in its toolbox. However, the accessibility of Python makes it
The authors declare that they have no known competing financial
more attractive to students, especially for graduates who plan to use ML
interests or personal relationships that could have appeared to influence
in industries where MATLAB is not available.
the work reported in this paper.
Lastly, some students also noted that the course has too many exer­
cises and that they should be allowed to pick only a subset of the exer­
Acknowledgment
cises to do. We perceive this as helpful feedback that means that the
student workload is heavy in the course. Hence, in the future, exercises
K.E. Pilario is grateful for the support given by the UP KEM Global –
may be condensed to one session per two modules instead of one session
Dr. Luz Salonga Professorial Chair at the College of Engineering, Uni­
per module, to lessen the workload.
versity of the Philippines, Diliman. K.E.P. is also thankful to the
Department of Chemical Engineering for allowing him to offer the ML
6. Availability of course materials
elective course to research students.
The course materials provided to the students include the lecture
slides, Jupyter Notebooks from each module, case studies, data sets as References
CSV (comma-separated values) files, and MATLAB codes used to create
Abdullah, F., Christofides, P.D., 2023. Data-based modeling and control of nonlinear
illustrative animations of ML concepts. Readers can follow this link to process systems using sparse identification: an overview of recent results. Comput.
access these materials: https://github.com/kspilario/MLxChE. Chem. Eng. 174 (February), 108247 https://doi.org/10.1016/j.
The above repository is continuously updated by the author. Some compchemeng.2023.108247.
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M., 2019. Optuna: a next-generation
new materials may have been added beyond those mentioned in this hyperparameter optimization framework. In: Proceedings of the ACM SIGKDD
paper. International Conference on Knowledge Discovery and Data Mining, pp. 2623–2631.
In addition, the same course design was also recently expanded into https://doi.org/10.1145/3292500.3330701.
Alexandridis, A.K., Zapranis, A.D., 2013. Wavelet neural networks: a practical guide.
two more courses at UPD, namely AI 221 (Classical Machine Learning) Neural Netw. 42, 1–27. https://doi.org/10.1016/j.neunet.2013.01.008.
and DS 397 (Advanced Computational Methods in Data Science). AI 221 Alhajeri, M.S., Alnajdi, A., Abdullah, F., Christofides, P.D., 2023. On generalization error
is offered to graduate students of the AI program at UPD. AI 221 has of neural network models and its application to predictive control of nonlinear
processes. Chem. Eng. Res. Des. 189, 664–679. https://doi.org/10.1016/j.
more breadth in terms of applications, such as in the sciences, other cherd.2022.12.001.
engineering fields, and business. Meanwhile, DS 397 is offered to Ph.D. Babuška, R., Verbruggen, H., 2003. Neuro-fuzzy methods for nonlinear system
students in Data Science at UPD. Compared to ChE 197/297, the DS 397 identification. Annu Rev. Control 27 I, 73–85. https://doi.org/10.1016/S1367-5788
(03)00009-9.
course goes deeper into the computational details and issues of ML al­ Bascuñana, J., León, S., González-Miquel, M., González, E.J., Ramírez, J., 2023. Impact
gorithms so that students learn how to code popular ML models from of Jupyter notebook as a tool to enhance the learning process in chemical
scratch, how to assess time and memory complexities, and how to solve engineering modules. Educ. Chem. Eng. 44, 155–163. https://doi.org/10.1016/j.
ece.2023.06.001.
numerical issues, such as vanishing/exploding gradients during
Beck, D.A.C., Carothers, J.M., Subramanian, V.R., Pfaendtner, J., 2016. Data science:
training. The course materials for AI 221 and DS 397 are also available in accelerating innovation and discovery in chemical engineering. AIChE J. 62 (5),
the author’s repository. 1402–1416. https://doi.org/10.1002/aic.15192.
Bhadriraju, B., Kwon, J.S.Il, Khan, F, 2021. OASIS-P: operable adaptive sparse
identification of systems for fault prognosis of chemical processes. J. Process. Control
7. Conclusion 107, 114–126. https://doi.org/10.1016/j.jprocont.2021.10.006.
Bhagat, S.K., Pilario, K.E., Babalola, O.E., Tiyasha, T., Yaqub, M., Onu, C.E., Pyrgaki, K.,
This paper presents some ideas on how to design a machine learning Falah, M.W., Jawad, A.H., Yaseen, D.A., Barka, N., Yaseen, Z.M., 2023.
Comprehensive review on machine learning methodologies for modeling dye
(ML) course for ChemE graduate-level and research students. These ideas were implemented in the ML course offered at the Department of Chemical Engineering, University of the Philippines, Diliman (UPD), named ChE 197/297: Introduction to AI/ML for Chemical Engineers. The goal of the course is to train students to apply ML in various areas of ChemE. Hence, the ML course is designed to emphasize the mathematical details, derivations, and motivations of selected ML algorithms—an algorithmic approach as opposed to a purely statistical approach.

Our main results include the following: reasons for the inclusion and exclusion of ML topics, a road map of ML topics that can be followed to motivate the lessons to students, a list of bridging topics needed to understand how ML models were derived, and a list of sample case studies of ML applications in ChemE that were used in class.

Based on student evaluations of the course, we found that students now realize that ML can be applied to many areas in ChemE, even beyond the examples presented in class. Class materials are available to the readers through the link provided in Section 6. We hope that this ML course design can inspire changes in ChemE education to accommodate the increasing demand for ML skills.
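To make the "algorithmic approach" concrete, a minimal sketch of the kind of derivation-first exercise such a course might assign is shown below. It is illustrative only and not taken from the ChE 197/297 materials: ridge regression weights are computed directly from the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy, which students would first derive by setting the gradient of the regularized least-squares objective to zero. The function name and the synthetic data are this sketch's own assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-3):
    """Solve w = (X^T X + lam*I)^{-1} X^T y, the closed form obtained by
    setting the gradient of ||Xw - y||^2 + lam*||w||^2 to zero."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)  # solve, rather than invert, for stability

# Synthetic data: y = 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)

w = ridge_fit(X, y)
print(w)  # recovers weights close to [2, -3]
```

Working through the derivation before calling a library implementation is the pedagogical point: the code is a direct transcription of the mathematics, so students can check each term of the formula against each line of the program.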

K.E. Pilario Digital Chemical Engineering 11 (2024) 100163

Karl Ezra S. Pilario received the B.Sc. (summa cum laude) and M.Sc. degrees in chemical engineering from the University of the Philippines, Diliman, Quezon City, Philippines, in 2012 and 2015, respectively, and the Ph.D. degree in energy and power from Cranfield University, Cranfield, U.K., in 2020. He also participated in the Oxford Machine Learning Summer School in 2022. He is currently an Associate Professor at the Department of Chemical Engineering, University of the Philippines, Diliman, Philippines, and also with the Artificial Intelligence Program at the same institution. His current research interests include industrial process data analytics and machine learning applications in energy, water, and environmental process systems.
