Case-Based Reinforcement Learning For Dynamic Inventory Control in A Multi-Agent Supply-Chain System
C. Jiang, Z. Sheng
Expert Systems with Applications 36 (2009) 6520–6526
Keywords: Inventory control; Reinforcement learning; Supply-chain management; Multi-agent simulation

Abstract

Reinforcement learning (RL) has appealed to many researchers in recent years because of its generality. It is an approach to machine intelligence that learns to achieve a given goal through trial-and-error interaction with its environment. This paper proposes a case-based reinforcement learning algorithm (CRL) for dynamic inventory control in a multi-agent supply-chain system. Traditional time-triggered and event-triggered ordering policies remain popular because they are easy to implement, but in a dynamic environment their results may become inaccurate, causing excess inventory (cost) or shortages. Under nonstationary customer demand, the S value of the (T, S) and (Q, S) inventory review methods is learned with the proposed algorithm so as to satisfy a target service level. Multi-agent simulations of a simplified two-echelon supply chain, in which the proposed algorithm is implemented, are run repeatedly. The results show the effectiveness of CRL for both review methods. We also consider a framework for a general learning method based on the proposed one, which may be helpful in all aspects of supply-chain management (SCM). Hence, it is suggested that well-designed "connections" should be built between CRL, multi-agent systems (MAS) and SCM.
© 2008 Elsevier Ltd. All rights reserved.
(i) Each retailer has a fixed group of customers whose demand is nonstationary.
(ii) Each customer is free to choose one retailer in each period, i.e., in a competitive market.

Fig. 2. (T, S) inventory replenishment mechanism.
... values of the reorder point (S) are learned based on past experience. The resulting service levels of previously suggested S values are treated as rewards for the actions taken before. The (State, Action, Reward) records provide a reference when a similar state is met again.

Fig. 3. (Q, S) inventory replenishment mechanism.

The details of the elements are as follows:

Case[CaseSize] = {Case[0], Case[1], ..., Case[CaseSize - 1]},

where Case[i] = {inventory level (int), Dmean (int)}.

Case[CaseSize] (an array of vectors) records the different states met. Each state includes two elements: the current S value and the estimated current mean of customer demand.
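To make the data organization concrete, the following is a minimal Java sketch of the case base implied by the description above and by the overview in Fig. 4. The class and field names (CaseRecord, SlbaEntry, ActionOutcome, etc.) and the container types are our assumptions for illustration, not the authors' implementation.

    import java.util.ArrayList;
    import java.util.List;

    // One recorded action and the service level observed after applying it (SLaa).
    class ActionOutcome {
        int deltaS;        // amount by which the S value was increased or decreased
        double slAfter;    // resulting service level
    }

    // One "service level before action" entry (SLba) with the actions tried from it.
    class SlbaEntry {
        double slBefore;
        List<ActionOutcome> outcomes = new ArrayList<>();
    }

    // One case: the state met (current S value and estimated demand mean) plus its history.
    class CaseRecord {
        int inventoryLevel;   // current S value
        int dMean;            // estimated current mean of customer demand
        List<SlbaEntry> history = new ArrayList<>();
    }

    // The case base: Case[0], Case[1], ..., Case[CaseSize - 1].
    class CaseBase {
        final List<CaseRecord> cases = new ArrayList<>();
    }

Each state thus carries its own history of "service level before action / action / service level after action" records, which is what the three-level searching procedure below traverses.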
1. Customer Class: ID (int), customer identity, from 0 to Ncustomer - 1. Demand (int), nonstationary. k (int), price sensitivity parameter. L (int), quality sensitivity parameter. Index[] (array of double), rank of motivation values. Rank[] (array of all retailer objects), rank of retailers. Influence_out[] (array of double), influence out to friend agent(s) with respect to retailers. Influence_in[] (array of double), received influence from friend agent(s). Follow (double), follower tendency. FriendID[] (array of int), IDs of friend agent(s). a, b (static double), motivation function parameters. Mean (static int), CV (static double), Deviation (static int), T (static int), extent (static double), parameters of nonstationary demand. Range (static int), number of friend agent(s). SuperID (Vector), records the ID of the seller each time. BuyHistory (Vector), records IDs of retailers in each defined period.

Fig. 4. Overview of data structure.
2. Retailer Class: ID (int), retailer identity. price (double), price of products provided. quality (double), quality of products provided. cost (double), profit (double), bank (double), reserved. Stock_in (Vector), Stock_out (Vector), stock flow records. Money_in (Vector), Money_out (Vector), money flow records, ...

Fig. 8. (a) Initial S > D, (b) Initial S < D, (c) Target service level = 95%.
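The two agent classes listed above can be rendered roughly as the following Java skeletons. Field names and types follow the listing; all method bodies, and the part of the Retailer class that is cut off above, are omitted, so this is only an illustrative sketch rather than the authors' code.

    import java.util.Vector;

    // Customer agent: fields as listed above; behaviour omitted.
    class Customer {
        int id;                        // customer identity, 0 .. Ncustomer - 1
        int demand;                    // nonstationary demand
        int k;                         // price sensitivity parameter
        int L;                         // quality sensitivity parameter
        double[] index;                // rank of motivation values
        Retailer[] rank;               // rank of retailers
        double[] influenceOut;         // influence sent to friend agent(s), per retailer
        double[] influenceIn;          // influence received from friend agent(s)
        double follow;                 // follower tendency
        int[] friendId;                // IDs of friend agent(s)
        static double a, b;            // motivation function parameters
        static int mean, deviation, T; // parameters of nonstationary demand
        static double cv, extent;      // coefficient of variation and demand-change extent
        static int range;              // number of friend agent(s)
        Vector<Integer> superId = new Vector<>();     // ID of the seller in each period
        Vector<Integer> buyHistory = new Vector<>();  // IDs of retailers in each defined period
    }

    // Retailer agent: only the fields listed above; the remainder is cut off in the text.
    class Retailer {
        int id;                        // retailer identity
        double price, quality;         // price and quality of products provided
        double cost, profit, bank;     // reserved
        Vector<Integer> stockIn = new Vector<>(), stockOut = new Vector<>();  // stock flow records
        Vector<Double> moneyIn = new Vector<>(), moneyOut = new Vector<>();   // money flow records
    }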
... state. Memorize the location in the case records using CaseMarker.

Step 1.2: Update CaseMarker.

Step 2: Level 2 searching. Check Case[CaseMarker]. If it is a new case, add the current service level to SLba_n[] and update ActionMarker and UpdateMarker to indicate that a new record has been added; go to step 2.1. Else, search SLba_n[] for a similar one with the criteria in step 1, setting the grid to 0.2%. If no similar service level is found, go to step 2.1. Otherwise, update ActionMarker and go to step 3.

Step 2.1: Add the current service level to the end of the SLba record. Compare the current service level to the target service level. If it is within the range [TargetSL - 0.2%, TargetSL + 0.2%], no change is made and 0 is added to the action records. Else, calculate InvenIncrease or InvenDecrease based on the estimated mean of customer demand and the difference between the current service level and the target service level. Add this amount to the action records and update the S value.

Step 3: Level 3 searching. Search SLaa_n[ActionMarker] for the service level closest to the target service level. Denote this record as SLC. If SLC is within the range [TargetSL - 0.2%, TargetSL + 0.2%], go to step 3.1. Else, go to step 3.2.

Step 3.1: Get the corresponding action in Action_n[ActionMarker] and update the S value and UpdateMarker. In step 1 of the next period, replace SLC with the resulting service level. This guarantees that if the action results in a service level out of the target range, it is no longer a qualified action.

Step 3.2: Calculate InvenIncrease or InvenDecrease based on the estimated mean of demand and the difference between SLC and the target service level. Add this amount to the action records and update UpdateMarker to indicate that a new SLaa_nn[i] will be added. Thus, the old SLC will not be replaced.

Step 4: Repeat the previous steps when the inventory replenishment condition is met again.
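As an illustration of how steps 2 and 3 could look in code, here is a minimal, self-contained Java sketch. The parallel-list layout mirrors Fig. 4, the method and variable names are ours, and the adjust() rule is only a guessed stand-in for the InvenIncrease/InvenDecrease computation, whose exact formula is not given in the text.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of levels 2 and 3 for one matched case (level 1, matching the state itself,
    // is assumed to have been done already). The parallel lists mirror Fig. 4:
    //   slBefore.get(i)        ~ SLba_n[i]
    //   actions.get(i).get(j)  ~ the j-th action tried from that service level
    //   slAfter.get(i).get(j)  ~ the service level observed after that action (SLaa)
    class CrlSearch {
        static final double GRID = 0.002;  // 0.2% band used in steps 2 and 3

        static int suggestDeltaS(List<Double> slBefore,
                                 List<List<Integer>> actions,
                                 List<List<Double>> slAfter,
                                 double currentSL, double targetSL, int demandMean) {
            // Level 2: find a previously recorded, similar "service level before action".
            int marker = -1;
            for (int i = 0; i < slBefore.size() && marker < 0; i++) {
                if (Math.abs(slBefore.get(i) - currentSL) <= GRID) marker = i;
            }
            if (marker < 0) {                          // step 2.1: nothing similar recorded yet
                slBefore.add(currentSL);
                actions.add(new ArrayList<>());
                slAfter.add(new ArrayList<>());
                if (Math.abs(currentSL - targetSL) <= GRID) return 0;  // already on target
                return adjust(currentSL, targetSL, demandMean);
            }
            // Level 3: among recorded outcomes, find the one closest to the target (SLC).
            List<Integer> acts = actions.get(marker);
            List<Double> outs = slAfter.get(marker);
            int best = -1;
            for (int j = 0; j < outs.size(); j++) {
                if (best < 0 || Math.abs(outs.get(j) - targetSL)
                              < Math.abs(outs.get(best) - targetSL)) best = j;
            }
            if (best >= 0 && Math.abs(outs.get(best) - targetSL) <= GRID) {
                return acts.get(best);                 // step 3.1: reuse the qualified action
            }
            double base = (best >= 0) ? outs.get(best) : currentSL;
            return adjust(base, targetSL, demandMean); // step 3.2: compute a new adjustment
        }

        // The paper derives InvenIncrease/InvenDecrease from the demand mean and the
        // service-level gap without giving the formula; this proportional rule is a guess.
        static int adjust(double sl, double targetSL, int demandMean) {
            return (int) Math.round((targetSL - sl) * demandMean);
        }
    }

In use, the returned value would be added to the current S value at each replenishment point; the level-1 matching of states (by inventory level and demand mean, with grid1 and grid2) and the bookkeeping of CaseMarker and UpdateMarker are left out for brevity.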
Table 2
Parameters added for condition 2

Parameter         Scale        Distribution
L                 0-100        Random distribution
K                 100-0        Random distribution
Follow_i          0-100        Normal distribution, μ = 50 and σ = 15
Influence_out_i   -30 to 30    {-30, -30 + g, -30 + 2g, ..., 30}, where g = 60/Nretailer
Range             2            Constant
P_i               80-90        Random distribution
Q_i               80-90        Random distribution
a                 a > 1        Depends on testing
b                 0 < b < 1    Depends on testing

Fig. 9. (a) M1, (b) M2, (c) M3, (d) M4 under condition 1 for (T, S).

4. Simulation results and analysis

The supply-chain model considered in the simulation consists of 10 retailers and 80 customers. Under condition one in Section 2,
each retailer has a fixed group of eight customers. Their demand follows a normal distribution N(μ, σ²), but its mean is changed by two parameters, the interval T and the extent, which follow uniform distributions. That is, at every interval T, μ = μ + extent. Two types of demand are considered, defined as:

TE 1: T = Uniform(50, 80), extent = Uniform(-1, 1).
TE 2: T = Uniform(15, 30), extent = Uniform(-2, 2).

The standard deviation of customer demand (σ) is computed by multiplying μ by the coefficient of variation (CV), and the initial μ is set to 20. The target service level of every retailer is set to 90%. The lead time is set to 1 day for (T, S) and 4 days for (Q, S). Each simulation is run for 1000 review periods. For the same experimental condition, 20 simulations are repeated with different seeds, and their average service level is taken as the actual service level. The modes used in the simulation under condition 1 are shown in Table 1.
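For completeness, the demand process just described can be sketched as a small Java generator. The constructor arguments, the example CV value in the comment, and the resetting of the change interval after every jump are illustrative assumptions, not a specification from the paper.

    import java.util.Random;

    // Nonstationary customer demand: normally distributed with mean mu and standard
    // deviation mu * CV, where mu jumps by a random "extent" every T periods and the
    // interval T is redrawn from a uniform range (TE 1 or TE 2 in the text).
    class NonstationaryDemand {
        final Random rng = new Random();
        double mu = 20.0;                 // initial mean of demand
        final double cv;                  // coefficient of variation
        final int tLow, tHigh;            // range of the change interval T
        final double extLow, extHigh;     // range of the change amount "extent"
        int periodsUntilChange;

        NonstationaryDemand(double cv, int tLow, int tHigh, double extLow, double extHigh) {
            this.cv = cv; this.tLow = tLow; this.tHigh = tHigh;
            this.extLow = extLow; this.extHigh = extHigh;
            this.periodsUntilChange = uniformInt(tLow, tHigh);
        }

        int nextDemand() {
            if (--periodsUntilChange <= 0) {          // every interval T: mu = mu + extent
                mu += extLow + rng.nextDouble() * (extHigh - extLow);
                periodsUntilChange = uniformInt(tLow, tHigh);
            }
            double sigma = mu * cv;                   // standard deviation = mu * CV
            return (int) Math.max(0, Math.round(mu + sigma * rng.nextGaussian()));
        }

        int uniformInt(int low, int high) { return low + rng.nextInt(high - low + 1); }
    }

    // Example (assumed CV = 0.2), TE 1: T ~ Uniform(50, 80), extent ~ Uniform(-1, 1):
    // NonstationaryDemand d = new NonstationaryDemand(0.2, 50, 80, -1, 1);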
The values of grid1 and grid2 in Fig. 6 are initially set to 20 and
4 and are then changed as the mean value and standard deviation
change.
To show independence from the initial values, the first 300 review periods are shown for three settings: initial supply > demand, initial supply < demand, and a target service level temporarily set to 95% (see Fig. 8).
Fig. 9 shows the simulation results over time for the four modes of the (T, S) system under condition 1. It can be seen that as the nonstationarity of customer demand becomes more severe, the deviation of the average service level increases. However, the average service level stays very close to the target service level.
The simulation parameters in Table 2 are added for the simulation under condition 2. Fig. 10 shows the results under condition 2 for (T, S). The customer demand in this situation is much more nonstationary, because one retailer may lose most of its customers after increasing ...

Fig. 10. Results under condition 2 for (T, S).

Fig. 11. (a) M1, (b) M2, (c) M3, (d) M4 under condition 1 for (Q, S).
Acknowledgements