

Proceedings of the ASME 2007 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference
IDETC/CIE 2007
September 4-7, 2007, Las Vegas, Nevada, USA

DETC2007-34718

A LEARNING ALGORITHM FOR OPTIMAL INTERNAL COMBUSTION ENGINE CALIBRATION IN REAL TIME

Andreas A. Malikopoulos*, Panos Y. Papalambros, Dennis N. Assanis
Department of Mechanical Engineering
University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

ABSTRACT
Advanced internal combustion engine technologies have increased the number of accessible variables of an engine and our ability to control them. The optimal values of these variables are designated during engine calibration by means of a static correlation between the controllable variables and the corresponding steady-state engine operating points. While the engine is running, these correlations are interpolated to provide values of the controllable variables for each operating point. These values are controlled by the electronic control unit to achieve desirable engine performance, for example in fuel economy, pollutant emissions, and engine acceleration. State-of-the-art engine calibration cannot guarantee continuously optimal engine operation for the entire operating domain, especially in transient cases encountered in the driving styles of different drivers. This paper presents the theoretical basis and algorithmic implementation for allowing the engine to learn the optimal set values of accessible variables in real time while running a vehicle. Through this new approach, the engine progressively perceives the driver's driving style and eventually learns to operate in a manner that optimizes specified performance indices. The effectiveness of the approach is demonstrated through simulation of a spark ignition engine, which learns to optimize fuel economy with respect to spark ignition timing while it is running a vehicle.

Keywords: Markov Decision Process (MDP), learning algorithms, sequential decision-making under uncertainty, simulation-based optimization, reinforcement learning, internal combustion engine calibration, fuel economy

1. INTRODUCTION
The growing demands for better performance, fuel economy, and reduced emissions have motivated continued research in advanced internal combustion engine technologies. These technologies, such as fuel injection systems, variable geometry turbocharging, variable valve actuation, and exhaust gas recirculation, have increased the number of accessible engine controllable variables and our ability to optimize engine operation. In particular, the determination of the optimal values of these variables, referred to as engine calibration, has been shown to be especially critical for achieving high engine performance and fuel economy while meeting emission standards. Consequently, engine calibration is defined as a procedure that optimizes one or more engine performance indices, e.g., fuel economy, emissions, or engine performance, with respect to the engine controllable variables. Engine calibration generates a static correlation between the optimal values of the controllable variables and the corresponding steady-state engine operating points to coordinate optimal performance of the specified indices. This correlation is incorporated into the electronic control unit (ECU) of the engine to control engine operation, so that optimal values of the specified indices are maintained.

Despite the advanced engine technologies, however, continuously optimal engine operation has not yet been possible. State-of-the-art engine calibration methods rely on dynamometer static correlations for steady-state operating points accompanied by transient vehicle testing. However, the calibration process, its duration, and its cost grow exponentially with the number of controllable variables, and optimal calibration for the entire feasible engine operating domain cannot be guaranteed.

* Author of correspondence, Phone: (734) 647-1409, Fax: (734) 764-4256, Email: [email protected]



Even for engines with simple technologies, achievement of the optimal calibrations may become impractical [1]. In addition, current calibration methods cannot guarantee optimal engine operation in transient cases encountered in the driving styles of different drivers [2].

To pre-specify the huge number of transient operations is impractical and, thus, calibration cannot generate optimal static correlations for all transient cases a priori. Transient operation constitutes the largest segment of engine operation over a driving cycle compared to the steady-state one [3, 4]. Emissions during transient operation are extremely complicated [4], vary significantly with each particular driving cycle [5, 6], and are highly dependent upon the calibration [6, 7]. Engine operating points, during the transient period before their steady-state value is reached, are associated with different Brake-Specific Fuel Consumption (BSFC) values, depending on the directions from which they have been arrived at, as illustrated qualitatively in Figures 1 and 2. Pollutant emissions such as NOx and particulate matter demonstrate the same qualitative behavior, as shown by Hagena et al. [8].

Figure 1. Two trajectories, A and B, of engine operating points ending at the same operating point.

Figure 2. BSFC value of the terminal engine operating point reached from trajectories A and B.

The main objective of calibration methods is to expedite dynamometer tests significantly using a smaller subset of tests. This subset is utilized either in implementing engine calibration experimentally or in developing mathematical models for evaluating engine output. Using these models, optimization methods can determine the engine calibration static correlations between steady-state operating points and the controllable engine variables [9]. Design of Experiments (DoE) [10-12] has been widely used as the baseline method. Major applications include catalyst system optimization [13], optimization of variable valve trains for performance and emissions [14-17], implementation of dynamic model-based engine calibrations [18, 19], and optimization of fuel consumption in a spark ignition engine with dual-continuously controlled camshaft phasing with respect to valve timing [20].

DoE-based calibration systems are typically used to reduce the scope of the experiments required to derive the optimal engine calibration correlation under steady-state operating conditions. Dynamic model-based calibration, however, utilizes high-fidelity dynamic or transient engine modeling. The data required to develop the engine model are obtained by operating the engine through a set of transient dynamometer tests while the engine calibration is perturbed in real time by a reconfigurable rapid prototyping control system. The predictive engine model produced in this fashion utilizes a combination of equation-based and neural network methods. DoE-experimental calibration is well suited only for steady-state engine operation over some driving cycle. In contrast, dynamic modeling produces a transient or dynamic engine model capable of predicting the engine operating cycle. The steady-state optimal engine calibration can be produced from the transient engine model as a sub-set of the transient engine operation. Rask et al. [1] developed a simulation-based calibration method to rapidly generate optimized correlations for a V6 engine equipped with two-step variable valve actuation and intake cam phasing. Guerrier et al. [18] employed DoE and advanced statistical modeling to develop empirical models to advance the powertrain control module calibration tables. Stuhler et al. [19] implemented a standardized and automated calibration environment, supporting the complexity of gasoline direct injection engines, for an efficient calibration process using an online DoE to decrease the calibration cost. These engine models can predict engine output over transient operation. However, not all the correlations of optimal values of the controllable engine variables associated with the transient operating points can be quantified explicitly; to pre-specify the entire transient engine operation is impractical, and thus, engine calibration correlations cannot be optimized for these cases a priori.

Various approaches have been proposed for using artificial neural networks (ANN) to promote modeling and calibration of engines [21-26].


However, ANNs are application-specific and exhibit unpredictable behavior when previously unfamiliar data are presented to them. These difficulties increase if a nonlinear dynamic representation of a system is to be realized, because of the increasing number of possibilities related to the dynamics and the interactions between the input signals. ANNs are suited for formulating objective functions, evaluating the specified engine performance indices with respect to the controllable engine variables and, thus, deriving the engine calibration correlations. They are computationally efficient for optimization requiring hundreds of function evaluations. However, optimal engine calibration for the entire engine operating domain is seldom guaranteed even for steady-state operating points. Moreover, the correlations between optimal values of the controllable engine variables and the transient operating points, overall, cannot be quantified explicitly, prohibiting a priori optimal engine calibration.

This paper introduces a method to make the engine an autonomous system that can learn its optimal calibration for the entire engine operating domain in real time while running a vehicle. Section 2 builds the general theoretical framework of considering the engine operation as a stochastic process, and introduces the predictive optimal stochastic learning algorithm. This algorithm predicts the optimal correlation between the engine controllable variables and the operating points (both steady-state and transient) based on observations of the engine outputs. The effectiveness of the approach is demonstrated in Section 3 through simulation of a Spark Ignition (SI) engine model; while the SI model is running, it progressively perceives the driver's conventional driving style and learns the optimal spark ignition values, as illustrated in Section 4. Finally, conclusions are presented in Section 5.

2. PROPOSED METHOD
Engine operation is described in terms of engine operating points, and the evaluation of engine performance indices is a function of various controllable variables. In our approach, the engine performance indices are treated as random functions. Consequently, the engine is treated as a controlled stochastic system, and engine operation is treated as a stochastic process. The problem of engine calibration is thus reformulated as a sequential decision-making problem under uncertainty. The main objective towards the solution of this problem is to select, in real time, the optimum values of the controllable variables for each engine operating point that optimize the random functions (engine performance indices). The Markov Decision Process (MDP), extensively covered by Puterman [27], provides the mathematical framework for modeling sequential decision-making problems under uncertainty [28]; it is comprised of (a) a decision maker (engine), (b) states (engine operating points), (c) actions (controllable variables), (d) transition probability matrices (driver), (e) transition reward matrices (engine performance indices), and (f) optimization criteria (e.g., maximizing fuel economy, minimizing pollutant emissions, maximizing engine performance).

2.1 Markov Decision Process (MDP)
Formally, an MDP is a discrete-time stochastic control process defined as the tuple

X_n = {S, A, P(⋅,⋅), R(⋅,⋅)},   (1)

where S = {s_i | i = 1, 2, ..., N}, N ∈ ℕ, denotes a finite state space, A = ∪_{s_i ∈ S} A(s_i) stands for a finite action space, P(⋅,⋅) is the transition probability matrix, and R(⋅,⋅) is the transition reward matrix. The decision-making process occurs at each of a sequence of stages κ = 0, 1, 2, ..., M, M ∈ ℕ. At each stage, the decision-maker observes a system's state s_i ∈ S and executes an action α ∈ A(s_i) from the feasible set of actions A(s_i) ⊆ A at this state. At the next stage, the system transits to the state s_j ∈ S imposed by the conditional probabilities p(s_j | s_i, α), designated by the transition probability matrix P(⋅,⋅). These conditional probabilities of P(⋅,⋅), p : S × A → [0,1], satisfy the following constraint:

∑_{j=1}^{N} p(s_j | s_i, α) = 1.   (2)

Following this state transition, the decision-maker receives a reward associated with the action α, R(s_j | s_i, α), R : S × A → ℝ, as imposed by the transition reward matrix R(⋅,⋅). A two-state MDP problem with the conditional probabilities and rewards corresponding to all possible transitions is illustrated in Figure 3. The states of an MDP possess the Markov property, stating that the conditional probability distribution of future states of the process, given the present state and all past states, depends only upon the current state and not on any past states, i.e., it is conditionally independent of the past states (the path of the process) given the present state. Mathematically, the Markov property requires that

p(X_{n+1} = s_j | X_n = s_i, X_{n−1} = s_{i−1}, ..., X_0 = s_0) = p(X_{n+1} = s_j | X_n = s_i).   (3)

Figure 3. Probability distribution and rewards for all possible transitions between the states s_i and s_j at stage κ when action α is taken.



The solution to an MDP can be expressed as a policy π = {µ_0, µ_1, ..., µ_M}, which provides the action to be executed for a given state, regardless of prior history; µ_κ is a function mapping states s_i into actions α = µ_κ(s_i), such that µ_κ(s_i) ∈ A(s_i). Such policies are addressed as admissible. Consequently, for any initial state s_i^0 at stage κ = 0, and for any finite sequence of stages κ = 0, 1, 2, ..., M, M ∈ ℕ, the expected accumulated value of the rewards of the decision maker is

V_π(s_l^M) = E{ R(s_j^0 | s_i^0, µ_0(s_i^0)) + ∑_{κ=1}^{M} V_π(s_j^κ) },  ∀ s_i, s_j, s_l ∈ S.   (4)

In the finite-horizon context the decision maker should maximize the accumulated value for the next M stages; more precisely, an optimal policy π* is one that maximizes the overall expected accumulated value of the rewards,

V_{π*}(s_l^M) = max_{π ∈ A} V_π(s_l^M).   (5)

Consequently, the optimal policy π* = {µ_0*, µ_1*, ..., µ_M*} for the M-stage sequence is

π* = arg max_{π ∈ A} V_{π*}(s_l^M).   (6)

The finite-horizon model is appropriate when the decision-maker's "lifetime," namely, the terminal stage of the decision-making sequence, is known. However, in most real-life problems this is not the case; these problems are modeled in the infinite-horizon context, where the overall expected reward is the limit of the corresponding M-stage overall reward as M → ∞:

V_{π*}(s_l^M) = lim_{M→∞} max_{π ∈ A} V_π(s_l^M).   (7)

This relation is extremely valuable for various MDP problems where the terminal stage is unknown; Eq. (7) holds under certain conditions [29].

A large class of sequential decision-making problems under uncertainty is solved by using classical dynamic programming algorithms, originally proposed by Bellman [30]. Algorithms such as value iteration, policy iteration, and linear programming are employed to find optimal solutions of MDPs. However, the computational complexity of these algorithms may in some occasions be prohibitive and can grow intractably with the size of the problem and its related data. In addition, dynamic programming algorithms require the realization of the transition probability matrix, P(⋅,⋅), and the transition reward matrix, R(⋅,⋅). For complex systems with a large state space, the transition probability and reward matrices can be either impractical or impossible to compute.
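For contrast with the learning approach developed below, the following sketch applies one of the classical dynamic programming algorithms named above, a finite-horizon backward recursion in the spirit of Eqs. (4)-(6), to arrays P and R such as those in the previous sketch. It is workable only when the transition probability and reward matrices are fully known in advance, which is precisely the limitation noted here; the function and variable names are assumptions made for illustration.

```python
import numpy as np

def finite_horizon_policy(P, R, M):
    """Backward recursion for a fully known MDP (a sketch of classical DP).

    P[i, a, j] and R[i, a, j] must be given in full; returns mu, where
    mu[k][i] is the action chosen at stage k in state s_i, i.e. the policy
    pi* = {mu_0, ..., mu_M} of Eq. (6).
    """
    N, K, _ = P.shape
    V = np.zeros(N)                       # value-to-go beyond the terminal stage
    mu = []
    for _ in range(M + 1):                # stages M, M-1, ..., 0
        # Q[i, a] = expected immediate reward + expected value-to-go
        Q = np.einsum('iaj,iaj->ia', P, R) + P @ V
        mu.append(Q.argmax(axis=1))       # greedy maximization, Eqs. (5)-(6)
        V = Q.max(axis=1)
    mu.reverse()                          # mu[0] corresponds to stage kappa = 0
    return mu

# Usage with the illustrative arrays from the previous sketch:
# policy = finite_horizon_policy(P, R, M=10); print(policy[0])
```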



Viable alternatives for approaching these problems through a simulation-based stochastic framework have been primarily developed in the field of Reinforcement Learning (RL) [31, 32]. RL methods aim to provide effective near-optimal solutions to complex problems of planning and sequential decision-making under uncertainty. Wheeler et al. [33] proposed a learning method in sequential stochastic games under certain properties; Sutton [34] introduced a class of incremental learning procedures specialized for prediction, using past experience with an incompletely known system to predict its future behavior; Watkins [35] developed an algorithm for systems to learn how to act optimally in controlled Markovian domains, amounting to an incremental method for dynamic programming imposing limited computational demands. These classic RL methods and algorithms have been utilized successfully in various applications, e.g., robotics, control, operations research, games, human-computer interaction, economics/finance, and marketing.

The rigorous mathematical assumptions required by the majority of existing RL algorithms to converge to optimal solutions impose limitations in efficiently employing these algorithms to solve engineering problems. For the engine calibration problem built upon the MDP theoretical framework described above, a learning process as employed in RL [32] and a new predictive optimal stochastic control algorithm (POSCA) are developed. The learning process is applied to the engine to progressively perceive the conventional driving style of a driver, which designates the transition probability matrix, P(⋅,⋅). In addition, during this process the desired engine performance indices, e.g., fuel economy, pollutant emissions, engine performance, are represented by the elements of the transition reward matrix, R(⋅,⋅). The intention of the algorithm is to predict the optimal policy (values of the engine controllable variables) π* ∈ A, thus optimizing the expected accumulated value of the rewards in the infinite-horizon context.

2.2 The Learning Process of the Engine
The learning process transpires while the engine is running the vehicle and interacting with the driver. Taken in conjunction with assigning values of the controllable variables from the feasible action space, A, this interaction portrays the progressive enhancement of the engine's "knowledge" of the driver's driving style with respect to the controllable variables. More precisely, at each of a sequence of stages κ = 0, 1, 2, ..., M, as M → ∞, the driver introduces a state s_i^κ ∈ S to the engine, and on that basis the engine selects an action, α^κ ∈ A(s_i). This state arises as a result of the driver's driving style corresponding to particular engine operating points. One stage later, as a consequence of its action, the engine receives a numerical reward, R^{κ+1} ∈ ℝ, and transits to a new state s_j^{κ+1} ∈ S, as illustrated in Figure 4.

Figure 4. The learning process during the interaction between the engine and the driver.

At each stage, the engine implements a mapping from the Cartesian product of the state space and action space to the set of real numbers, S × A → ℝ, by means of the rewards that it receives. Similarly, another mapping from the Cartesian product of the state space and action space to the closed set [0,1] is executed, S × A → [0,1], satisfying Eq. (2). The latter essentially perceives the conventional driving style as expressed by the incidence in which particular states, or particular sequences of states, arise. The implementation of these two mappings aims to disclose the optimal policy π* (optimal set values of the controllable variables) of the engine designated by the predictive optimal control algorithm. This policy is expressed by means of a mapping from states to probabilities of selecting the actions, resulting in the highest expected accumulated value of the rewards.

A challenge in the learning process is the trade-off between exploration and exploitation of the action space. Specifically, the engine has to exploit what is already known regarding the correlation involving the driving style and the values of the controllable variables that maximize the rewards, and also to explore those actions that have not yet been tried for this driving style, to assess whether they may result in higher rewards. This exploration-exploitation dilemma has been extensively reported in the literature; Iwata et al. [36] proposed a model-based learning method extending Q-learning and introducing two separate functions based on statistics and on information by applying exploration and exploitation strategies; Ishii et al. [37] developed a model-based reinforcement learning method utilizing a balance parameter, which is controlled based on variation of action rewards and perception of environmental change; Chan-Geon et al. [38] proposed an exploration-exploitation policy in Q-learning consisting of an auxiliary Markov process and the original Markov process; Miyazaki et al. [39] developed a unified learning system realizing the tradeoff between exploration and exploitation.

The exploration-exploitation dilemma is closely related to the type of problem along with the decision-maker's "lifetime" problem [40]: the longer the lifetime, the worse the consequences of prematurely converging on a sub-optimal solution. This could result from not fully exploring the entire feasible action space for each state. In our case, the objective is to make the engine learn its optimal calibration for the driver's driving style in the infinite-horizon context. Consequently, the engine has to explore the entire action space for any state being visited by the particular driving style. In particular, it is assumed that for any state s_i ∈ S corresponding to the driving style, all actions of the feasible action set α ∈ A(s_i) are selected at least once. This may result in sacrificing the engine performance indices in the short run; however, the ultimate goal for the engine is to progressively perceive the driving style and learn the optimal policy in the long run.

2.3 The Predictive Optimal Stochastic Control Algorithm (POSCA)
The learning process of the engine transpires at each stage κ in conjunction with actions α ∈ A(s_i) taken for each state s_i ∈ S. At the early stages, and until full exploration of the action set A(s_i), ∀ s_i ∈ S, occurs, the mapping from the states to probabilities of selecting the actions is constant; namely, the actions for each state are selected randomly with the same probability,

p(α | s_i) = 1 / |A(s_i)|,  ∀ α ∈ A(s_i), ∀ s_i ∈ S.   (8)

Exploration of the entire feasible action set is important to evade sub-optimal solutions once the exploration phase is done. The predictive optimal stochastic control algorithm (POSCA) is thus used after the exploration phase. The main objective of POSCA is to realize the action at each stage which is optimal not only for the current state, but also for the subsequent states over the following stages. The subsequent states are predicted by POSCA by means of the conditional probabilities p(s_j | s_i, α), p : S × A → [0,1], of the transition probability matrix P(⋅,⋅). The expected accumulated value of the rewards for the subsequent states is perceived in terms of the magnitude T̃(s_j), defined as the maximum average future reward. Suppose that the current state is s_i and the following state, given an action α ∈ A(s_i), is s_j with probability p(s_j | s_i, α). The average future overall reward will be

T̃(s_j) = max_{α ∈ A} [ (1/N) ∑_{l=1}^{N} p(s_l | s_j, α) ⋅ R(s_l | s_j, α) ],  ∀ s_j ∈ S.   (9)

The immediate expected reward by transiting from state s_i to state s_j, given an action α ∈ A(s_i), is

t̃(s_j | s_i, α) = p(s_j | s_i, α) ⋅ R(s_j | s_i, α).   (10)

For the problem of optimal control of uncertain systems, which is treated in a stochastic framework, all uncertain quantities are described by probability distributions and the expected value of the overall reward is maximized. In this context, the optimal policy π* realized by POSCA is based on the max-min control approach, whereby the worst possible values of the uncertain quantities within the given set are assumed to occur. This is a pessimistic point of view which essentially assures that the optimal policy will result in at least a minimum overall reward value. Consequently, being at state s_i, POSCA predicts the optimal policy π* in terms of the values of the controllable variables as

π*(s_i) = arg max_{α ∈ A} { min_{s_j ∈ S} [ t̃(s_j | s_i, α) + T̃(s_j) ] },  ∀ s_i, s_j ∈ S.   (11)
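A minimal sketch of the decision rule in Eqs. (8)-(11) follows: actions are drawn uniformly until every action has been tried in a state (Eq. (8)), after which the action maximizing the worst-case sum of the immediate expected reward t̃ of Eq. (10) and the predicted future reward T̃ of Eq. (9) is returned (Eq. (11)). The visit-count bookkeeping and the variable names are assumptions made for illustration.

```python
import numpy as np

def posca_action(i, P, R, tried, rng):
    """Select the action for state s_i (a sketch of Section 2.3).

    P, R  : current estimates of the transition probability / reward matrices
    tried : tried[i, a] is True once action alpha_a has been executed in s_i
    """
    N, K, _ = P.shape
    if not tried[i].all():
        # Exploration phase, Eq. (8): uniform choice over the feasible actions.
        return int(rng.integers(K))

    # T~(s_j): maximum average future reward, Eq. (9); shape (N,).
    T = (np.einsum('jak,jak->ja', P, R) / N).max(axis=1)

    # t~(s_j | s_i, a): immediate expected reward, Eq. (10); shape (K, N).
    t = P[i] * R[i]

    # Max-min rule, Eq. (11): worst case over successor states, best case over actions.
    return int(np.argmax((t + T).min(axis=1)))
```

As written in Eq. (11), the worst case is taken over all states s_j ∈ S; restricting the minimization to successors reachable under the current estimate of P(⋅,⋅) would be a natural refinement, but it is not stated in the paper.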



3. EXAMPLE
An example of real-time, self-learning optimization of the calibration with respect to spark ignition timing in a spark ignition engine is presented. In spark ignition engines the fuel and air mixture is prepared in advance, before it is ignited by the spark discharge. The major objectives for the spark ignition are to initiate a stable combustion and to ignite the air-fuel mixture at the crank angle resulting in maximum efficiency, while fulfilling emissions standards and preventing the engine from knocking. Simultaneous achievement of the aforementioned objectives is sometimes inconsistent; for instance, at high engine loads the ignition timing for maximum efficiency has to be abandoned in favor of prevention of engine destruction by way of engine knock. Two essential parameters are controlled with the spark ignition: ignition energy and ignition timing. Control of ignition energy is important for assuring combustion initiation, but the focus here is on the spark timing that maximizes engine efficiency. Ignition timing influences nearly all engine outputs and is essential for efficiency, drivability, and emissions. The optimum spark ignition timing generating the maximum engine brake torque is defined as Maximum Brake Torque (MBT) timing [41]. Any ignition timing that deviates from MBT lowers the engine's output torque, as illustrated in Figure 5. A useful parameter for evaluating the fuel consumption of an engine is the Brake-Specific Fuel Consumption (BSFC), defined as the fuel flow rate per unit power output. This parameter evaluates how efficiently an engine is utilizing the fuel supplied to produce work:

bsfc (g/kW⋅h) = ṁ_f (g/h) / P (kW),   (12)

where ṁ_f is the fuel mass flow rate per unit time and P is the engine's power output. Continuous engine operation at MBT ensures optimum fuel economy with respect to the spark ignition timing.

Figure 5. Effect of spark ignition timing on the engine brake torque at constant engine speed.

For a successful real-time, self-learning optimization of engine calibration in terms of spark ignition timing, the engine should realize the MBT timing for each engine operating point (steady-state and transient) dictated by the driving style of a driver. Consequently, by achieving MBT timing for all steady-state and transient operating points, an overall improvement of the BSFC is expected. Aspects of preventing knocking are not considered in this example; however, they can be easily incorporated by defining the spark ignition space to include the maximum allowable values.

The software package enDYNA by TESIS [42], suitable for real-time simulation of internal combustion engines, is employed. The software utilizes thermodynamic models of the gas path and is well suited for testing and development of electronic engine controllers. In the example, a four-cylinder gasoline engine is used from the enDYNA model database. The software's static correlation involving spark ignition timing and engine operating points is bypassed to incorporate the POSCA algorithm. This correlation is designated by the baseline calibration that accompanies the enDYNA model and is included in the engine's ECU. In the context of the MDP problem, the states represent the pair of gas-pedal position and engine speed, and the actions denote the spark ignition timing; the rewards that the decision-maker (engine) receives correspond to the engine brake torque.

The engine model is run repeatedly over the same driving style represented by the pedal position. Every run over this driving style constitutes one complete simulation. To evaluate the efficiency of our approach in both steady-state and transient engine operation, the pedal-position rate is chosen to represent an aggressive acceleration, as illustrated in Figure 6.
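To make the state and action definitions of this example concrete, the sketch below encodes a (gas-pedal position, engine speed) pair as a discrete state index, a spark-advance value as an action index, and evaluates BSFC from Eq. (12). The grid ranges and resolutions are assumed for illustration only; the paper does not report the discretization actually used.

```python
import numpy as np

# Illustrative discretization (assumed, not taken from the paper).
pedal_bins = np.linspace(0.0, 90.0, 10)      # gas-pedal position [deg]
speed_bins = np.linspace(800.0, 6000.0, 14)  # engine speed [rpm]
spark_grid = np.arange(0.0, 35.0, 2.5)       # spark advance before TDC [deg]

def state_index(pedal_deg, speed_rpm):
    """Encode the (gas-pedal position, engine speed) pair as one MDP state index."""
    ip = np.digitize(pedal_deg, pedal_bins) - 1
    iw = np.digitize(speed_rpm, speed_bins) - 1
    return ip * len(speed_bins) + iw

def action_index(spark_deg):
    """Encode a spark ignition timing value as an MDP action index."""
    return int(np.abs(spark_grid - spark_deg).argmin())

def bsfc(fuel_flow_g_per_h, power_kw):
    """Brake-specific fuel consumption, Eq. (12): bsfc = m_dot_f / P [g/kWh]."""
    return fuel_flow_g_per_h / power_kw

print(state_index(35.0, 2500.0), action_index(20.0), bsfc(3200.0, 5.0))
```

The reward recorded for each observed transition would be the measured engine brake torque, as stated above for the MDP formulation of the example.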



Figure 6. Gas-pedal position rate representing a driver's driving style.

Before initiating the first simulation of the engine model, the elements of the transition probability and reward matrices are assigned to be zero. That is, the engine at the beginning has no knowledge regarding the particular driving style and the values of the rewards associated with the controllable variables.

4. RESULTS
After completing the fourth simulation, POSCA specified the optimal policy in terms of the spark ignition timing, as shown in Figure 7, where it is compared with the spark ignition timing designated by the baseline calibration of the enDYNA model. The optimal policy resulted in higher engine brake torque compared to the baseline calibration, as shown in Figures 8 and 9. This improvement indicates that the engine with self-learning calibration was able to operate closer to MBT timing. Having the engine operate at MBT timing resulted in an overall minimization of the BSFC, illustrated in Figure 10. Figure 11 compares the velocity of the two vehicles, one carrying the engine with the baseline calibration and the other with the self-calibrated one.

Figure 7. Spark ignition timing over the driving style.

Figure 8. Engine brake torque.

Figure 9. Engine brake torque (zoom-in).

The two vehicles were simulated for the same driving style, namely, the same pedal-position rate. The vehicle carrying the engine with the self-learning calibration demonstrated higher velocity, since the engine produced higher brake torque for the same gas-pedal position rate. Consequently, if the driver wishes to follow a specific vehicle speed profile, this can now be achieved by stepping on the gas-pedal more lightly than required in the engine with the baseline calibration and, therefore, directly enabling additional benefits in fuel economy.
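Starting the transition probability and reward matrices from zero and filling them in as the engine observes the driver over repeated simulations suggests a simple count-based estimator. The sketch below is one assumed implementation (the paper does not specify the exact update), in which every observed transition refines the empirical p(s_j | s_i, α) and the average observed reward, here the engine brake torque.

```python
import numpy as np

class EmpiricalModel:
    """Count-based estimates of P(.,.) and R(.,.), starting from all zeros."""

    def __init__(self, n_states, n_actions):
        self.counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sum = np.zeros((n_states, n_actions, n_states))

    def observe(self, i, a, j, reward):
        """Record one observed stage: s_i --alpha_a--> s_j with the measured
        reward (the engine brake torque in the example of Section 3)."""
        self.counts[i, a, j] += 1.0
        self.reward_sum[i, a, j] += reward

    @property
    def P(self):
        """Empirical transition probabilities; unvisited (state, action) pairs stay zero."""
        totals = self.counts.sum(axis=2, keepdims=True)
        return self.counts / np.maximum(totals, 1.0)

    @property
    def R(self):
        """Empirical mean reward per observed transition; unobserved entries stay zero."""
        return self.reward_sum / np.maximum(self.counts, 1.0)

# After each simulated run over the driving style, the updated model.P and model.R
# would be handed to the POSCA rule sketched in Section 2.3.
```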



Figure 10. BSFC comparison between the baseline and self-learning calibration.

Figure 11. Velocity of the two vehicles carrying the engine with baseline and self-learning calibration.

5. CONCLUDING REMARKS
The POSCA algorithm allows an internal combustion engine to act as an autonomous system that can learn its optimal calibration for both steady-state and transient operating points in real time while running a vehicle. The engine progressively perceives the driver's driving style and, eventually, learns to coordinate optimal performance of several specified indices, e.g., fuel economy, pollutant emissions, engine performance, for this particular driving style. The longer the engine runs with a particular driving style, the better the specified indices will be. The engine's ability to learn its optimum calibration is not limited, however, to a particular driving style. The engine can learn to operate optimally for different drivers by assigning transition probability, P(⋅,⋅), and reward, R(⋅,⋅), matrices for each driver. The drivers would indicate their identities before starting the vehicle to denote the pair of these matrices that the engine should employ. The engine can then adjust its operation to be optimal for a particular driver based on what it has learned in the past regarding his/her driving style.

The example presented the real-time, self-learning calibration of a spark ignition engine with respect to spark ignition timing. The engine was able to realize the MBT timing for each engine operating point (steady-state and transient) designated by a driving style representing an aggressive acceleration and, thus, to minimize the BSFC. Aspects of preventing knocking were not considered in this example; however, a potential extension is possible, such as defining the spark ignition space to include the maximum allowable values ensuring engine operation without knocking. POSCA efficiently predicted the optimal control policy (spark ignition timing) for each state (engine operating point). It is left for future research to explore the impact of traffic patterns and terrain on the general applicability of having the engine learn its optimal calibration for an individual driving style. Future research should also investigate the potential of advancing POSCA to predict the optimal policy of a number of controllable variables associated with different states and, thus, avoid the growth of the problem's dimensionality. Increased dimensionality is a major challenge for learning algorithms.

ACKNOWLEDGMENTS
This research was partially supported by the Automotive Research Center (ARC), a U.S. Army Center of Excellence in Modeling and Simulation of Ground Vehicles at the University of Michigan. The engine simulation package enDYNA was provided by TESIS DYNAware GmbH. This support is gratefully acknowledged.

REFERENCES
[1] Rask, E. and Sellnau, M., "Simulation-Based Engine Calibration: Tools, Techniques, and Applications," SAE Transactions-Journal of Engines, v. 113, 2004, SAE 2004-01-1264.
[2] Atkinson, C. and Mott, G., "Dynamic Model-Based Calibration Optimization: An Introduction and Application to Diesel Engines," SAE World Congress, Detroit, Michigan, April 11-14, 2005, SAE 2005-01-0026.
[3] Rakopoulos, C. D., Giakoumis, E. G., Hountalas, D. T., and Rakopoulos, D. C., "The Effect of Various Dynamic, Thermodynamic and Design Parameters on the Performance of a Turbocharged Diesel Engine Operating under Transient Load Conditions," SAE 2004 World Congress and Exhibition, Detroit, Michigan, April 8-11, 2004, SAE 2004-01-0926.
[4] Wijetunge, R. S., Brace, C. J., Hawley, J. G., Vaughan, N. D., Horrocks, R. W., and Bird, G. L., "Dynamic Behavior of a High-Speed, Direct-Injection Diesel Engine," SAE Transactions-Journal of Engines, v. 108, 1999, SAE 1999-01-0829.
[5] Clark, N. N., Gautam, M., Rapp, B. L., Lyons, D. W., Graboski, M. S., McCormick, R. L., Alleman, T. L., and National, P. N., "Diesel and CNG Transit Bus Emissions Characterization by Two Chassis Dynamometer Laboratories: Results and Issues," SAE Transactions-Journal of Fuels and Lubricants, v. 108, 1999, SAE 1999-01-1469.



[6] Samulski, M. J. and Jackson, C. C., "Effects of Steady-State and Transient Operation on Exhaust Emissions from Nonroad and Highway Diesel Engines," SAE Transactions-Journal of Engines, v. 107, 1998, SAE 982044.
[7] Green, R. M., "Measuring the Cylinder-to-Cylinder EGR Distribution in the Intake of a Diesel Engine During Transient Operation," SAE Transactions-Journal of Engines, v. 109, 2000, SAE 2000-01-2866.
[8] Hagena, J. R., Filipi, Z. S., and Assanis, D. N., "Transient Diesel Emissions: Analysis of Engine Operation During a Tip-In," SAE 2006 World Congress, Detroit, Michigan, April 3-6, 2006, SAE 2006-01-1151.
[9] Papalambros, P. Y. and Wilde, D. J., Principles of Optimal Design: Modeling and Computation, 2nd edition, Cambridge University Press, July 2000.
[10] Clarke, G. M. and Kempson, R. E., Introduction to the Design and Analysis of Experiments, Hodder Arnold, November 1996.
[11] Hicks, C. R. and Turner, K. V., Fundamental Concepts in the Design of Experiments, 5th edition, Oxford University Press, USA, March 1999.
[12] Diamond, W. J., Practical Experiment Designs: for Engineers and Scientists, 3rd edition, John Wiley & Sons, February 2001.
[13] Edwards, S. P., Grove, D. M., and Wynn, H. P., Statistics for Engine Optimization, 1st edition, John Wiley & Sons Canada, December 2000.
[14] Regner, G., Teng, H., Wieren, P. V., Park, J. I., Park, S. Y., and Yeom, D. J., "Performance Analysis and Valve Event Optimization for SI Engines Using Fractal Combustion Model," Powertrain and Fluid Systems Conference and Exhibition, Toronto, Ontario, Canada, September 16-19, 2006, SAE 2006-01-3238.
[15] Ghauri, A., Richardson, S. H., and Nightingale, C. J. E., "Variation of Both Symmetric and Asymmetric Valve Events on a 4-Valve SI Engine and the Effects on Emissions and Fuel Economy," SAE 2000 World Congress, Detroit, Michigan, March 6-9, 2000, SAE 2000-01-1222.
[16] Amann, M., Buckingham, J., and Kono, N., "Evaluation of HCCI Engine Potentials in Comparison to Advanced Gasoline and Diesel Engines," Powertrain and Fluid Systems Conference and Exhibition, Toronto, Ontario, Canada, September 16-19, 2006, SAE 2006-01-3249.
[17] Jankovic, M. and Magner, S., "Fuel Economy Optimization in Automotive Engines," Proceedings of the 2006 American Control Conference, Minneapolis, MN, USA, 2006.
[18] Guerrier, M. and Cawsey, P., "The Development of Model-Based Methodologies for Gasoline IC Engine Calibration," SAE Transactions-Journal of Engines, v. 113, 2004, SAE 2004-01-1466.
[19] Stuhler, H., Kruse, T., Stuber, A., Gschweitl, K., Piock, W., Pfluegl, H., and Lick, P., "Automated Model-Based GDI Engine Calibration Adaptive Online DoE Approach," SAE 2002 World Congress, Detroit, Michigan, March 3-7, 2002, SAE 2002-01-0708.
[20] Roepke, K. and Fischer, M., "Efficient Layout and Calibration of Variable Valve Trains," SAE Transactions-Journal of Engines, v. 110, 2001, SAE 2001-01-0668.
[21] Ayeb, M., Theuerkauf, H. J., and Winsel, T., "SI Engine Emissions Model Based on Dynamic Neural Networks and D-Optimality," SAE World Congress, Detroit, Michigan, April 11-14, 2005, SAE 2005-01-0019.
[22] Brahma, I., He, Y., and Rutland, C. J., "Improvement of Neural Network Accuracy for Engine Simulations," Powertrain and Fluid Systems Conference and Exhibition, Pittsburgh, Pennsylvania, September 27-30, 2003, SAE 2003-01-3227.
[23] Lowe, D. and Zapart, K., "Validation of Neural Networks in Automotive Engine Calibration," Proceedings of the Fifth International Conference on Artificial Neural Networks, Cambridge, UK, 1997.
[24] Meyer, S. and Greff, A., "New Calibration Methods and Control Systems with Artificial Neural Networks," SAE 2002 World Congress, Detroit, Michigan, March 4-7, 2002, SAE 2002-01-1147.
[25] Wu, B., Prucka, R. G., Filipi, Z. S., Kramer, D. M., and Ohl, G. L., "Cam-Phasing Optimization Using Artificial Neural Networks as Surrogate Models - Fuel Consumption and NOx Emissions," SAE 2006 World Congress, Detroit, Michigan, April 3-6, 2006, SAE 2006-01-1512.
[26] Wu, B., Prucka, R. G., Filipi, Z. S., Kramer, D. M., and Ohl, G. L., "Cam-Phasing Optimization Using Artificial Neural Networks as Surrogate Models - Maximizing Torque Output," SAE Transactions-Journal of Engines, v. 114, 2005, SAE 2005-01-3757.
[27] Puterman, M. L., Markov Decision Processes: Discrete Stochastic Dynamic Programming, 2nd Rev. edition, Wiley-Interscience, 2005.
[28] Sennott, L. I., Stochastic Dynamic Programming and the Control of Queueing Systems, 1st edition, Wiley-Interscience, September 1998.
[29] Bertsekas, D. P., Dynamic Programming and Optimal Control (Volumes 1 and 2), Athena Scientific, September 2001.
[30] Bellman, R., Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
[31] Bertsekas, D. P. and Tsitsiklis, J. N., Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), 1st edition, Athena Scientific, May 1996.
[32] Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), The MIT Press, March 1998.



[33] Wheeler, R. and Narendra, K., "Decentralized Learning in Finite Markov Chains," IEEE Transactions on Automatic Control, vol. 31(6), pp. 373-376, 1986.
[34] Sutton, R. S., "Learning to Predict by the Methods of Temporal Differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[35] Watkins, C. J., Learning from Delayed Rewards, PhD Thesis, King's College, Cambridge, England, May 1989.
[36] Iwata, K., Ito, N., Yamauchi, K., and Ishii, N., "Combining Exploitation-Based and Exploration-Based Approach in Reinforcement Learning," Proceedings of Intelligent Data Engineering and Automated Learning - IDEAL 2000, Hong Kong, China, 2000.
[37] Ishii, S., Yoshida, W., and Yoshimoto, J., "Control of Exploitation-Exploration Meta-Parameter in Reinforcement Learning," Neural Networks, vol. 15, pp. 665-687, 2002.
[38] Chan-Geon, P. and Sung-Bong, Y., "Implementation of the Agent Using Universal On-Line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning," Journal of KISS: Software and Applications, vol. 30, pp. 672-680, 2003.
[39] Miyazaki, K. and Yamamura, M., "Marco Polo: A Reinforcement Learning System Considering Tradeoff Exploitation and Exploration under Markovian Environments," Journal of Japanese Society for Artificial Intelligence, vol. 12, pp. 78-89, 1997.
[40] Kaelbling, L. P., Littman, M. L., and Moore, A. W., "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[41] Heywood, J., Internal Combustion Engine Fundamentals, 1st edition, McGraw-Hill, April 1988.
[42] TESIS, <http://www.tesis.de/en/>.

