
Expert Systems with Applications 36 (2009) 6520–6526


Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system

Chengzhi Jiang *, Zhaohan Sheng
Department of Management and Engineering, Nanjing University, 22 Hankou Road, Nanjing 210093, PR China
* Corresponding author. Tel.: +86 13851833708.

Keywords: Inventory control; Reinforcement learning; Supply-chain management; Multi-agent simulation

Abstract: Reinforcement learning (RL) has appealed to many researchers in recent years because of its generality. It is an approach to machine intelligence that learns to achieve a given goal by trial-and-error iterations with its environment. This paper proposes a case-based reinforcement learning algorithm (CRL) for dynamic inventory control in a multi-agent supply-chain system. Traditional time-triggered and event-triggered ordering policies remain popular because they are easy to implement, but in a dynamic environment their results may become inaccurate, causing excessive inventory (cost) or shortages. Under nonstationary customer demand, the S value of the (T, S) and (Q, S) inventory review methods is learned with the proposed algorithm so as to satisfy a target service level. Multi-agent simulations of a simplified two-echelon supply chain, in which the proposed algorithm is implemented, are run repeatedly. The results show the effectiveness of CRL for both review methods. We also consider a framework for a general learning method based on the proposed one, which may be helpful in all aspects of supply-chain management (SCM). Hence, it is suggested that well-designed "connections" need to be built between CRL, multi-agent systems (MAS) and SCM.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Supply-chain management (SCM) has been providing competitive advantages for enterprises in the market. Within SCM, inventory control plays an important role and has been attracting attention from many researchers in recent years. Several well-known inventory control policies have been studied and improved in various respects, such as reduced cost and greater flexibility. Chen, Li, Marc Kilgour, and Hipel (2006) introduce a case-based multi-criteria ABC analysis that improves on the traditional approach by accounting for additional criteria, such as lead time and criticality of SKUs; the procedure provides more flexibility to account for more factors when classifying SKUs. Lee and Wu (2006) propose a statistical process control (SPC) based replenishment method, in which inventory rules and demand rules determine the order replenishment quantity so as to solve the order batching problem; the control system performs well at reducing backorders and the bullwhip effect. Yazgı Tütüncü, Aköz, Apaydın, and Petrovic (2007) present new models for continuous review inventory control in the presence of uncertainty, in which the optimal order quantity and the optimal reorder point are found by minimizing a fuzzy cost.

On the other hand, different inventory management systems can be designed for a specific industry or environment. Aronis, Magou, Dekker, and Tagaras (2004) apply a Bayesian approach to forecasting the demand for spare parts of electronic equipment, providing a more accurate determination of the stock level needed to satisfy a negotiated customer service level. Ashayeri, Heuts, Lansdaal, and Strijbosch (2006) develop cyclic production-inventory optimization models for the process manufacturing industry. ElHafsi (2007) shows that the optimal inventory allocation policy in an assemble-to-order system is a multi-level state-dependent rationing policy. Díez, Erik Ydstie, Fjeld, and Lie (2008) design model-based controllers based on discretized population balance (PB) models for particulate processes, which are encountered in almost any branch of the process industries. Kopach, Balcıoğlu, and Carter (2008) revisit a queuing model and determine an optimal inventory control policy using level crossing techniques in the blood industry.

Meanwhile, identifying the factors that affect inventory management performance, such as cost and demand, also assists in designing the controllers. Andersson and Marklund (2000) introduce a modified cost structure at the warehouse so that the multi-level inventory control problem can be decomposed into single-level problems; applying a simple coordination procedure to these sub-problems yields a near-optimal solution. Zhang (2007) studies an inventory control problem under temporal demand heteroscedasticity, which is found to have a significant influence on a firm's inventory costs. Chiang (2007) uses dynamic programming to determine the optimal control policy for a standing order system. Yazgı Tütüncü et al. (2007) make use of fuzzy set concepts to treat imprecision regarding costs, and probability theory to treat customer demand uncertainty.
Additionally, Maity and Maiti (2007) devise optimal production and advertising policies for an inventory control system considering inflation and discounting in a fuzzy environment.

It is observed that in the recent research mentioned above, mathematical or analytical models are preferred, such as the Bayesian approach (Aronis et al., 2004), the utility function method (Maity & Maiti, 2007), fuzzy set concepts (Yazgı Tütüncü et al., 2007) and Autoregressive Integrated Moving Average together with Generalized Autoregressive Conditional Heteroscedasticity models (Zhang, 2007). These methods provide rigorous derivations, which usually involve complicated notation and equations under restrictive assumptions. However, on the one hand, the problem may be time-varying in a dynamic environment, especially in an evolving system such as a supply chain, where a solution obtained at one time may not be suitable at another. On the other hand, such models are too difficult for managers to implement in real enterprises because of the complicated calculations involved. What is required is the ability to learn, enriching one's experience continuously in order to make reasonable decisions. Reinforcement learning (RL) is an approach to machine intelligence that combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems (Harmon & Harmon, 1996). Chi, Ersoy, Moskowitz, and Ward (2007) demonstrate and validate the applicability and desirability of using machine learning techniques to model, understand, and optimize complex supply chains. To make the best use of learning methods, intelligent entities are the necessary carriers. Multi-agent systems (MAS) seem to be a good choice, since their agents are characterized by intelligence, autonomy, interactivity and reactivity. Liang and Huang (2006) develop a multi-agent system to simulate a supply chain, where agents are coordinated to control inventory and minimize the total cost of the supply chain. Govindu and Chinnam (2007) propose a generic process-centered methodological framework for the analysis and design of multi-agent supply chain systems.

Therefore, this paper proposes a reinforcement learning algorithm combined with case-based reasoning (CRL) in a multi-agent supply-chain system. Similar research is carried out by Kwon, Kim, Jun, and Lee (2007), who suggest a case-based myopic reinforcement learning algorithm for satisfying a target service level under a vendor-managed inventory model. In this paper, we try to provide a simpler learning method with similar or better performance, one that can be applied more widely and is easier for managers to implement. Furthermore, we strongly recommend that well-designed "connections" be built between CRL, MAS and SCM, and a generic reinforcement learning method is therefore also suggested.

The remainder of this paper is organized as follows. Section 2 explains the multi-agent supply-chain model, including the inventory control problem. Section 3 presents the CRL algorithm in more detail. The simulation environment for measuring the performance of CRL is explained and the results are presented in Section 4. Section 5 considers a generic RL method based on the proposed one. Finally, the conclusion and future research are provided in Section 6.

2. Multi-agent supply-chain model

A simplified two-echelon supply chain consisting of multiple customers and retailers is designed for demonstration (see Fig. 1). It is assumed that each retailer receives all ordered stock from its suppliers after a fixed lead time, regardless of the amount ordered, and that it faces nonstationary customer demand under two conditions:

(i) Each retailer has a fixed group of customers whose demand is nonstationary.
(ii) Each customer is free to choose one retailer in each period, i.e., a competitive market.

Fig. 1. The supply-chain model under consideration (multiple supplier agents, multiple retailer agents, multiple customer agents).

The second condition requires additional criteria by which customers select retailers, and strategies by which retailers try to attract customers or maximize their profits. For the former, a simplified motivation function proposed by Zhang and Zhang (2007) is adopted:

Mi = PSi · Pi + QSi · Qi + fti · ini,    (1)

where Mi is a customer agent's motivation toward retailer i (i = 0 to Nretailer − 1), PSi is the customer agent's price sensitivity parameter with respect to retailer i, QSi is the quality sensitivity parameter, and fti is the follower tendency. Pi and Qi are the price and quality of retailer i, respectively, and ini is the influence received from other customer agents (friends) with respect to retailer i. Eq. (1) is further specified as

Mi = (a^(Pi − Pave) + k) · Pi + (b^|Qi − Qave| + L) · Qi + ft · ini,    (2)

where a > 1, 0 < b < 1, and Pave and Qave are the average price and quality provided by the retailers. k is a constant representing the price sensitivity, which varies with the social status of the customer, and L is the corresponding quality sensitivity constant. ini is calculated as follows: each customer holds a list of outgoing influence values, ordered from positive to negative according to its own ranking of the retailers, and ini is the sum of the outgoing influence values for retailer i received from its friend agents.

A simple price-adjustment strategy is used for retailer i:

if its market share is below average, then Pi = Pi − p;
else if its market share is above average, then Pi = Pi + p;
else no change is made.

Under both conditions, demand that is not met immediately is treated as lost sales. However, under condition (ii), since each customer ranks the retailers by their motivation function values, it may select the next retailer in its ranking when a higher-ranked retailer has insufficient inventory.
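To make the reconstructed motivation function (Eq. (2)) and the price-adjustment rule concrete, a minimal Java sketch follows (the paper's model is implemented in Java under JDK 1.5). The class name, the numerical parameter values and the price step PRICE_STEP are illustrative assumptions, not taken from the paper.

// Minimal sketch of the customer motivation function (as reconstructed in
// Eq. (2)) and of the retailers' price-adjustment rule. The class name, the
// parameter values and PRICE_STEP are illustrative assumptions.
public class MotivationSketch {

    static final double A = 1.05;          // a > 1
    static final double B = 0.95;          // 0 < b < 1
    static final double PRICE_STEP = 1.0;  // hypothetical adjustment step p

    // Eq. (2): Mi = (a^(Pi - Pave) + k)·Pi + (b^|Qi - Qave| + L)·Qi + ft·ini
    static double motivation(double pi, double qi, double pAve, double qAve,
                             double k, double L, double ft, double inI) {
        double priceTerm   = (Math.pow(A, pi - pAve) + k) * pi;
        double qualityTerm = (Math.pow(B, Math.abs(qi - qAve)) + L) * qi;
        return priceTerm + qualityTerm + ft * inI;
    }

    // Price-adjustment strategy: a retailer below the average market share cuts
    // its price by p, one above the average raises it by p, otherwise no change.
    static double adjustPrice(double price, double marketShare, double averageShare) {
        if (marketShare < averageShare) return price - PRICE_STEP;
        if (marketShare > averageShare) return price + PRICE_STEP;
        return price;
    }
}

Under these assumptions, each customer would evaluate motivation(...) for every retailer, rank the retailers by the resulting values, and buy from the highest-ranked retailer that still has stock.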
All retailers first use a periodic-review order-up-to (T, S) system and then an order-quantity reorder-point (Q, S) system for inventory management. The order-up-to level and the reorder point in the two systems are learned using CRL, respectively, so as to satisfy the target service level (see Figs. 2 and 3). The goal of each retailer is to satisfy its predefined target service level while trying to cut excessive inventory. In this paper, the fill-rate type of service level is adopted.

Fig. 2. (T, S) inventory replenishment mechanism (CRL at each review epoch T; ordered stock arrives after the lead time LT).

Fig. 3. (Q, S) inventory replenishment mechanism (an order of size Q is placed when the inventory falls below S; CRL when the ordered stock arrives after the lead time LT).

Note that in the (T, S) system the CRL step takes place before the lead time: the rewards are learned at review time and the suggested order-up-to level is implemented immediately. In the (Q, S) system, by contrast, the CRL step is carried out after the lead time, because the suggested reorder point is only used the next time the replenishment condition is met.

Finally, all agents and their associated attributes are introduced and explained below. The model is programmed under JDK (Java 2 Development Kit) 1.5.

1. Customer Class: ID (int), customer identity, from 0 to Ncustomer − 1. Demand (int), nonstationary. k (int), price sensitivity parameter. L (int), quality sensitivity parameter. Index[] (array of double), rank of motivation values. Rank[] (array of all retailer objects), rank of retailers. Influence_out[] (array of double), influence sent out to friend agent(s) with respect to the retailers. Influence_in[] (array of double), influence received from friend agent(s). Follow (double), follower tendency. FriendID[] (array of int), IDs of friend agent(s). a, b (static double), motivation function parameters. Mean (static int), CV (static double), Deviation (static int), T (static int), extent (static double), parameters of the nonstationary demand. Range (static int), number of friend agent(s). SuperID (Vector), records the ID of the seller each time. BuyHistory (Vector), records the IDs of retailers in each defined period.

2. Retailer Class: ID (int), retailer identity. price (double), price of the products provided. quality (double), quality of the products provided. cost (double), profit (double), bank (double), reserved. Stock_in (Vector), Stock_out (Vector), stock flow records. Money_in (Vector), Money_out (Vector), money flow records, reserved. LostSales (Vector), records of lost sales. Order (int), inventory replenishment quantity each time. TimeNow (int), running time marker. T (static int), review period of the (T, S) system. Q (static int), order quantity of the (Q, S) system. Max_Stock (int), order-up-to level for the (T, S) system or reorder point for the (Q, S) system. TargetSL (static double), predefined target service level. SLave (static double), average service level. Current_Stock (int), current inventory level. CurrentSL (double), current service level achieved. Case[] (array of Vectors), CaseSize (int), Counter[] (array of int), CaseMarker (int), ActionMarker (int), UpdateMarker (int), InvenIncrease (int), InvenDecrease (int), SLba[] (array of double), SLaa[] (array of Vectors), Actions[] (array of Vectors), elements of CRL. SuperID (Vector), SuperHistory (Vector), records of suppliers, reserved. NextID (Vector), NextHistory (Vector), records of customers.

3. Transfer Class: in charge of transferring money and stock between upstream and downstream agents, and of calculating the average price, quality, service level and so on.

4. Business Class: provides the market environment for the supply chain; the work flow of the other agents is controlled by it.

3. Case-based reinforcement learning

In this section, the elements of case-based reinforcement learning mentioned in Section 2 are explained in more detail and the whole procedure is presented. The order-up-to level (S) and the reorder point (S) are learned based on past experience. The resulting service levels of previously suggested S values are treated as rewards for the actions taken earlier, and the (State, Action, Reward) records provide a reference when a similar state is met again.

The details of the elements are as follows:

Case[CaseSize] = {Case[0], Case[1], ..., Case[CaseSize − 1]}, where Case[i] = {inventory level (int), Dmean (int)}.

Case[CaseSize] (array of vectors) is used to record the different states met. Each state includes two elements: the current S value and the estimated current mean of customer demand.

Counter[CaseSize] = {Counter[0], Counter[1], ..., Counter[CaseSize − 1]}, where Counter[i] = {counter (int)}.

Counter[CaseSize] (array of int) is an array of counters which memorize how often the corresponding state has been met.

SLba[CaseSize] = {SLba[0], SLba[1], ..., SLba[CaseSize − 1]}, where SLba[i] = {SLba_n[0], SLba_n[1], ..., SLba_n[i], ...} and SLba_n[i] = {current service level before action (double)}.

SLba[CaseSize] (array of double) records the current service levels before an action is taken under the corresponding case.

Action[CaseSize] = {Action[0], Action[1], ..., Action[CaseSize − 1]}, where Action[i] = {Action_n[0], Action_n[1], ..., Action_n[i], ...}, Action_n[i] = {Action_nn[0], Action_nn[1], ..., Action_nn[i], ...} and Action_nn[i] = {change amount (int)}.

Because multiple choices can be made under a single state, a multi-hierarchy structure is adopted for the records of the actions taken. The action taken is interpreted as adding the "change amount" (positive or negative) to the current S value of (T, S) or (Q, S). The exact amount is given by InvenIncrease (int) or InvenDecrease (int).

SLaa[CaseSize] = {SLaa[0], SLaa[1], ..., SLaa[CaseSize − 1]}, where SLaa[i] = {SLaa_n[0], SLaa_n[1], ..., SLaa_n[i], ...}, SLaa_n[i] = {SLaa_nn[0], SLaa_nn[1], ..., SLaa_nn[i], ...} and SLaa_nn[i] = {resulting service level after the action, i.e. the reward (double)}.

Following the action structure, the records of resulting service levels use the same configuration. The overall data structure of the elements is displayed in Fig. 4 and their corresponding relationships are summarized in Fig. 5. All initial values of the data structures above are set to empty.

Fig. 4. Overview of the data structure (for each case: SLba[i], Action[i] and SLaa[i], with nested SLba_n, Action_n/Action_nn and SLaa_n/SLaa_nn records).

Fig. 5. Corresponding relationships of the elements (Case[i], with SLba_n[i] linked to Action_n[i] and SLaa_n[i]).
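The nested records above map naturally onto Java collections. The following is a minimal sketch of one retailer's case base, assuming the structure just described; the class and field names are ours, and the paper itself keeps these elements in parallel arrays of Vectors inside the Retailer class.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the case base kept by each retailer, mirroring the nested
// records above. For case i:
//   cases.get(i)     - the state (current S value, estimated mean demand)
//   counters.get(i)  - how often that state has been met
//   slBefore.get(i)  - service levels observed before an action (SLba_n)
//   actions.get(i)   - change amounts tried, grouped per sub-case (Action_n/Action_nn)
//   slAfter.get(i)   - resulting service levels, i.e. rewards (SLaa_n/SLaa_nn)
public class CaseBaseSketch {

    static class CaseState {
        int sValue;       // current order-up-to level or reorder point
        int meanDemand;   // estimated current mean of customer demand
        CaseState(int sValue, int meanDemand) {
            this.sValue = sValue;
            this.meanDemand = meanDemand;
        }
    }

    final int caseSize;
    final List<CaseState> cases = new ArrayList<CaseState>();
    final List<Integer> counters = new ArrayList<Integer>();
    final List<List<Double>> slBefore = new ArrayList<List<Double>>();
    final List<List<List<Integer>>> actions = new ArrayList<List<List<Integer>>>();
    final List<List<List<Double>>> slAfter = new ArrayList<List<List<Double>>>();

    CaseBaseSketch(int caseSize) {
        this.caseSize = caseSize;   // all structures start empty, as stated above
    }
}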
The case-based reinforcement learning algorithm is organized as a three-level search. Its steps are described as follows:

Step 0: Before searching, SLaa_n[i] is updated with the reward for the last action. The exact location of the update is recorded by three markers: CaseMarker, ActionMarker and UpdateMarker.

Step 1: Level 1 searching. Search the case records for a similar state. If the case base is empty or no similar state is found in the history, go to Step 1.1; else go to Step 1.2. The criteria for similarity are shown in Fig. 6. If the current S falls within the ranges of two recorded situations (see Fig. 7), the closer situation (closer to S[i] or S[i + 1]) is selected; if the distances are equal, one is selected at random.

Step 1.1: If the case record is not full, add the new state to the end of the record; else replace the earliest state with the lowest frequency (smallest counter) by the new state. Memorize the location in the case records using CaseMarker.

Step 1.2: Update CaseMarker.

Step 2: Level 2 searching. Check Case[CaseMarker]. If it is a new case, add the current service level to SLba_n[], update ActionMarker and UpdateMarker to indicate that a new record has been added, and go to Step 2.1. Else, search SLba_n[] for a similar service level using the criteria of Step 1, with the grid set to 0.2%. If no similar service level is found, go to Step 2.1; otherwise update ActionMarker and go to Step 3.

Step 2.1: Add the current service level to the end of the SLba record and compare it with the target service level. If it lies within the range [TargetSL − 0.2%, TargetSL + 0.2%], no change is made and 0 is added to the action records. Else, calculate InvenIncrease or InvenDecrease based on the estimated mean of customer demand and the difference between the current service level and the target service level. Add this amount to the action records and update the S value.

Step 3: Level 3 searching. Search SLaa_n[ActionMarker] for the service level closest to the target service level and denote this record by SLC. If SLC lies within the range [TargetSL − 0.2%, TargetSL + 0.2%], go to Step 3.1; else go to Step 3.2.

Step 3.1: Take the corresponding action in Action_n[ActionMarker] and update the S value and UpdateMarker. In Step 1 of the next period, SLC is replaced by the resulting service level. This guarantees that if the action results in a service level outside the target range, it is no longer a qualified action.

Step 3.2: Calculate InvenIncrease or InvenDecrease based on the estimated mean of demand and the difference between SLC and the target service level. Add this amount to the action records and update UpdateMarker to indicate that a new SLaa_nn[i] will be added; thus the old SLC will not be replaced.

Step 4: Repeat the previous steps when the inventory replenishment condition is met again.

Fig. 6. Criteria for similarity in level 1 searching: the current state is similar to Case[i] = {S[i], D[i]} if the current S lies within Grid1 of S[i] and the current demand mean D lies within Grid2 of D[i].

Fig. 7. Situation of two similar states: the current (S, D) point falls within the ranges of both Case[i] = {S[i], D[i]} and Case[i + 1] = {S[i + 1], D[i + 1]}.
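A condensed sketch of the three-level search is given below, assuming the CaseBaseSketch structure from the previous listing. The helper methods (findSimilarCase, findSimilarServiceLevel, closestResultToTarget, exploreNewAction, computeChange) and the 0.2% band constant are simplified, illustrative stand-ins for the rules of Steps 0 to 4, not the paper's code.

// Condensed outline of the three-level search (Steps 0-4), assuming the
// CaseBaseSketch structure from the previous listing. Helpers are placeholders.
public class CrlSearchSketch {

    double targetSL = 0.90;   // predefined target service level
    double band = 0.002;      // +/- 0.2% tolerance around the target
    int caseMarker, actionMarker, updateMarker;  // Step 0 bookkeeping

    int suggestS(int currentS, int meanDemand, double currentSL, CaseBaseSketch cb) {
        // Step 0 (not shown): write the reward (resulting service level) of the
        // last action into slAfter at (caseMarker, actionMarker, updateMarker).

        // Step 1: level 1 - look for a similar (S, mean demand) state.
        int i = findSimilarCase(cb, currentS, meanDemand);
        if (i < 0) {                                   // Step 1.1: new state record
            caseMarker = addOrReplaceState(cb, currentS, meanDemand);
            return exploreNewAction(cb, caseMarker, currentS, currentSL, meanDemand);
        }
        caseMarker = i;                                // Step 1.2

        // Step 2: level 2 - look for a similar "service level before action".
        int j = findSimilarServiceLevel(cb, i, currentSL);
        if (j < 0) {
            return exploreNewAction(cb, i, currentS, currentSL, meanDemand); // Step 2.1
        }
        actionMarker = j;

        // Step 3: level 3 - take the recorded action whose resulting service level
        // is closest to the target; reuse it if it lies inside the band (Step 3.1),
        // otherwise derive a new change amount from the remaining gap (Step 3.2).
        int k = closestResultToTarget(cb, i, j);
        double slc = cb.slAfter.get(i).get(j).get(k);
        int change = Math.abs(slc - targetSL) <= band
                ? cb.actions.get(i).get(j).get(k)
                : computeChange(meanDemand, targetSL - slc);
        return currentS + change;                      // Step 4: repeat at next replenishment
    }

    // --- illustrative placeholders for the sub-procedures described above ---
    int findSimilarCase(CaseBaseSketch cb, int s, int d) { return -1; }
    int addOrReplaceState(CaseBaseSketch cb, int s, int d) { return 0; }
    int findSimilarServiceLevel(CaseBaseSketch cb, int caseIndex, double sl) { return -1; }
    int closestResultToTarget(CaseBaseSketch cb, int caseIndex, int slIndex) { return 0; }
    int exploreNewAction(CaseBaseSketch cb, int caseIndex, int currentS, double sl, int d) { return currentS; }
    int computeChange(int meanDemand, double gap) { return (int) Math.round(meanDemand * gap); }
}

With the placeholder helpers the method simply proposes an exploratory value; in the described procedure they would be filled in with the grid-based similarity tests of Figs. 6 and 7 and the counter-based replacement rule of Step 1.1.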
4. Simulation results and analysis

The supply-chain model under consideration consists of 10 retailers and 80 customers. Under condition (i) of Section 2, each retailer has a fixed group of eight customers. Their demand follows a normal distribution N(μ, δ²), but its mean is changed by two parameters, an interval T and a ranged extent, both of which follow uniform distributions: at every interval T, μ = μ + extent. Two types of demand are considered, defined as:

TE 1: T = Uniform(50, 80), extent = Uniform(−1, 1).
TE 2: T = Uniform(15, 30), extent = Uniform(−2, 2).

The standard deviation of customer demand (δ) is computed by multiplying μ by the coefficient of variation (CV), and the initial μ is set to 20. The target service level of every retailer is set to 90%. The lead time is set to 1 day for (T, S) and 4 days for (Q, S). Each simulation is run for 1000 review periods. For each experimental condition, 20 simulations are repeated with different random seeds and their average service level is taken as the actual service level. The modes used in the simulation under condition 1 are shown in Table 1.

Table 1
Simulation modes under condition 1

Mode  CV   T                Extent
M1    0.2  Uniform(50, 80)  Uniform(−1, 1)
M2    0.2  Uniform(15, 30)  Uniform(−2, 2)
M3    0.4  Uniform(50, 80)  Uniform(−1, 1)
M4    0.4  Uniform(15, 30)  Uniform(−2, 2)
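For illustration, the nonstationary demand stream described above (normal demand whose mean drifts by a uniformly drawn extent at every uniformly drawn interval T) could be generated as follows; the class and its constructor are our own sketch, not code from the paper.

import java.util.Random;

// Sketch of the nonstationary demand process used in the experiments: demand is
// drawn from N(mu, (CV*mu)^2), and every T periods (T ~ Uniform(tMin, tMax)) the
// mean drifts by an extent ~ Uniform(eMin, eMax). Names are illustrative.
public class NonstationaryDemand {
    private final Random rng = new Random();
    private double mu;                // current demand mean (initially 20 in the paper)
    private final double cv;          // coefficient of variation
    private final int tMin, tMax;     // range of the drift interval T
    private final double eMin, eMax;  // range of the drift extent
    private int nextDrift;            // periods remaining until the next mean change

    public NonstationaryDemand(double mu0, double cv, int tMin, int tMax,
                               double eMin, double eMax) {
        this.mu = mu0; this.cv = cv;
        this.tMin = tMin; this.tMax = tMax;
        this.eMin = eMin; this.eMax = eMax;
        this.nextDrift = drawInterval();
    }

    /** Demand for one period; truncated to a non-negative integer. */
    public int nextDemand() {
        if (--nextDrift <= 0) {       // every interval T: mu = mu + extent
            mu += eMin + rng.nextDouble() * (eMax - eMin);
            nextDrift = drawInterval();
        }
        double d = mu + rng.nextGaussian() * (cv * mu);
        return Math.max(0, (int) Math.round(d));
    }

    private int drawInterval() {
        return tMin + rng.nextInt(tMax - tMin + 1);
    }
}

Under these assumptions, mode M2 of Table 1 would correspond to new NonstationaryDemand(20, 0.2, 15, 30, -2, 2).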
The values of grid1 and grid2 in Fig. 6 are initially set to 20 and 4, respectively, and are then adjusted as the mean value and the standard deviation change.

In order to show independence from the initial values, the first 300 review periods are shown for the cases initial supply > demand and initial supply < demand, with the target service level temporarily set to 95% (see Fig. 8).

Fig. 8. (a) Initial S > D, (b) initial S < D, (c) target service level = 95%.

Fig. 9 shows the simulation results over time for the four modes in the (T, S) system under condition 1. It can be seen that as the nonstationarity of customer demand becomes more severe, the deviation of the average service level increases. Nevertheless, the average service level stays very close to the target service level.

Fig. 9. (a) M1, (b) M2, (c) M3, (d) M4 under condition 1 for (T, S).

The simulation parameters listed in Table 2 are added for the simulation under condition 2.

Table 2
Parameters added for condition 2

Parameter        Scale        Distribution
L                0 to 100     Random distribution
k                −100 to 0    Random distribution
Followi          0 to 100     Normal distribution, μ = 50 and δ = 15
Influence_outi   −30 to 30    {30, 30 − g, 30 − 2g, ..., −30}, where g = 60/Nretailer
Range            2            Constant
Pi               80 to 90     Random distribution
Qi               80 to 90     Random distribution
a                a > 1        Depends on testing
b                0 < b < 1    Depends on testing

Fig. 10 shows the results under condition 2 for the (T, S) system. The customer demand in this situation is much more nonstationary, because one retailer may lose most of its customers after increasing its price while others may win many more customers after decreasing their prices. Therefore, the performance here is not as good as that under condition 1.

Fig. 10. Results under condition 2 for (T, S).
Fig. 11 shows the simulation results in the (Q, S) system under condition 1, and Fig. 12 shows the performance under condition 2. A trend similar to that of the (T, S) system is found. Furthermore, in all situations the case-based reinforcement learning performs better in the (Q, S) system than in the (T, S) system. This is mainly due to the procedure indicated in Figs. 2 and 3. In the (T, S) system, the suggested S value is calculated before the lead time, i.e. it is based on the duration [T1, T2], whereas the valid duration of S is actually [T1 + LT, T2 + LT]. In the (Q, S) system, the S value is calculated for the period of the lead time, which is the actual valid duration. This difference explains the better performance in the (Q, S) system.

Fig. 11. (a) M1, (b) M2, (c) M3, (d) M4 under condition 1 for (Q, S).

Fig. 12. Results under condition 2 for (Q, S).

5. Consideration for general case-based reinforcement learning in supply chain

The efficiency of CRL is demonstrated by the simulation results, but the application of this algorithm in the supply chain does not end there. Various issues in the supply chain, such as trading competition, are difficult to resolve in an evolving or dynamic environment. RL can be used in many areas because of its generality: in RL, the computer is simply given a goal and then learns how to achieve that goal by trial-and-error iteration with its environment (Harmon & Harmon, 1996).

Based on the algorithm for inventory control in this paper, a general framework is considered. The same structure can still be used, as follows:

(States, Target level achieved before action, Actions, Target level achieved after actions)

States may be made up of related parameters of the environment which describe the current situation, e.g., the average price and the average quality. The current target level may be calculated before any action is taken, e.g., the current profit or market share in a trading competition. Sometimes a manager will not know which action leads to a result closer to the target, but he can try some action based on some motivation or information and learn the delayed reward for that action; this is the so-called trial-and-error iteration. Qualified actions can then be reused directly from experience while unqualified actions are avoided, and the categorization into qualified and unqualified is dynamic, following the perceived rewards that reflect changes in the environment. With the accumulated experience, the error becomes smaller and smaller.
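As an illustration only, the generalized (State, target-before, Action, target-after) record might be kept in a small generic container such as the sketch below; the types, names and the notion of a tolerance band are our assumptions, chosen to mirror the inventory-control algorithm above.

import java.util.ArrayList;
import java.util.List;

// Illustration of the generalized CRL record suggested above: a list of
// (state, target level before action, action, target level after action)
// tuples from which qualified actions are reused and unqualified ones avoided.
// The "target level" may be a service level, profit, market share, etc.
public class GeneralCrlSketch<S, A> {

    public static final class Experience<S, A> {
        public final S state;
        public final double targetBefore;
        public final A action;
        public final double targetAfter;   // delayed reward, observed later
        public Experience(S state, double targetBefore, A action, double targetAfter) {
            this.state = state;
            this.targetBefore = targetBefore;
            this.action = action;
            this.targetAfter = targetAfter;
        }
    }

    private final List<Experience<S, A>> experiences = new ArrayList<Experience<S, A>>();

    /** Trial-and-error step: remember the delayed reward of an action just evaluated. */
    public void recordReward(Experience<S, A> e) {
        experiences.add(e);
    }

    /** Actions whose observed result reached the target are "qualified" and may be reused. */
    public List<A> qualifiedActions(double targetLevel, double tolerance) {
        List<A> result = new ArrayList<A>();
        for (Experience<S, A> e : experiences) {
            if (Math.abs(e.targetAfter - targetLevel) <= tolerance) {
                result.add(e.action);
            }
        }
        return result;
    }
}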
6. Conclusion and future research

In this paper, the problem of dynamic inventory control for satisfying a target service level in a supply chain with nonstationary customer demand is studied. Case-based reinforcement learning is applied and shown experimentally to be effective. Furthermore, a general CRL is considered for the purpose of applying CRL widely in the supply chain. The idea behind this paper is thus to link CRL to supply-chain management, where a multi-agent system (MAS) is the necessary carrier. Future research may proceed in two directions. One direction is to extend CRL to a multi-stage, multi-agent supply chain, so that the bullwhip effect can be observed and reduced. The other is to apply CRL to other issues in the supply chain, such as trading competition.

Acknowledgements

This research is supported by the National Natural Science Foundation of PRC under Grant Nos. 70731002 and 70571034, and by the Computational Experiment Center for Social Science at Nanjing University.

References

Andersson, Jonas, & Marklund, Johan (2000). Decentralized inventory control in a two-level distribution system. European Journal of Operational Research, 127, 483–506.

Aronis, Kostas-Platon, Magou, Ioulia, Dekker, Rommert, & Tagaras, George (2004). Inventory control of spare parts using a Bayesian approach: A case study. European Journal of Operational Research, 154, 730–739.

Ashayeri, J., Heuts, R. J. M., Lansdaal, H. G. L., & Strijbosch, L. W. G. (2006). Cyclic production-inventory planning and control in the pre-Deco industry: A case study. International Journal of Production Economics, 103, 715–725.

Chen, Ye, Li, Kevin W., Marc Kilgour, D., & Hipel, Keith W. (2006). A case-based distance model for multiple criteria ABC analysis. Computers & Operations Research, 35, 776–796.

Chi, Hoi-Ming, Ersoy, Okan K., Moskowitz, Herbert, & Ward, Jim (2007). Modeling and optimizing a vendor managed replenishment system using machine learning and genetic algorithms. European Journal of Operational Research, 180, 174–193.

Chiang, Chi (2007). Optimal control policy for a standing order inventory system. European Journal of Operational Research, 182, 695–703.

Díez, Marta Dueñas, Erik Ydstie, B., Fjeld, Magne, & Lie, Bernt (2008). Inventory control of particulate processes. Computers and Chemical Engineering, 32, 46–67.

ElHafsi, Mohsen (2007). Optimal integrated production and inventory control of an assemble-to-order system with multiple non-unitary demand classes. European Journal of Operational Research. doi:10.1016/j.ejor.2007.12.00.

Govindu, Ramakrishna, & Chinnam, Ratna Babu (2007). MASCF: A generic process-centered methodological framework for analysis and design of multi-agent supply chain systems. Computers & Industrial Engineering, 53, 584–609.

Harmon, Mance E., & Harmon, Stephanie S. (1996). Reinforcement learning: A tutorial. Available from: <http://www.nada.kth.se/kurser/kth/2D1432/2004/rltutorial.pdf>.

Kopach, Renata, Balcıoğlu, Barış, & Carter, Michael (2008). Tutorial on constructing a red blood cell inventory management system with two demand rates. European Journal of Operational Research, 185, 1051–1059.

Kwon, Ick-Hyun, Kim, Chang Ouk, Jun, Jin, & Lee, Jung Hoon (2007). Case-based myopic reinforcement learning for satisfying target service level in supply chain. Expert Systems with Applications. doi:10.1016/j.eswa.2007.07.00.

Lee, H. T., & Wu, J. C. (2006). A study on inventory replenishment policies in a two-echelon supply chain system. Computers & Industrial Engineering, 51, 257–263.

Liang, Wen-Yau, & Huang, Chun-Che (2006). Agent-based demand forecast in multi-echelon supply chain. Decision Support Systems, 42, 390–407.

Maity, K., & Maiti, M. (2007). A numerical approach to a multi-objective optimal inventory control problem for deteriorating multi-items under fuzzy inflation and discounting. Computers and Mathematics with Applications. doi:10.1016/j.camwa.2007.07.011.

Yazgı Tütüncü, G., Aköz, Onur, Apaydın, Ayşen, & Petrovic, Dobrila (2007). Continuous review inventory control in the presence of fuzzy costs. International Journal of Production Economics. doi:10.1016/j.ijpe.2007.10.01.

Zhang, Xiaolong (2007). Inventory control under temporal demand heteroscedasticity. European Journal of Operational Research, 182, 127–144.

Zhang, Tao, & Zhang, David (2007). Agent-based simulation of consumer purchase decision-making and the decoy effect. Journal of Business Research, 60, 912–922.
