Intelligent Optimization Algorithm for Master (2)

The document discusses intelligent algorithms for solving optimization problems, highlighting methods such as hill-climbing and genetic algorithms. It also covers reinforcement learning techniques, including Q-learning and deep Q-learning, to optimize decision-making in uncertain environments. Additionally, it suggests hybrid approaches combining heuristic methods and neural networks for improved problem-solving in specific cases like RCPSP.


Topic: Intelligent algorithms for optimization problems
• Optimization problem example
Optimization problem example b (cases from …)

Optimization problem example c

RCPSP mathematical formulation
• Objective function: min([ft[1], …, ft[N]]), given es[i], i = 1, …, N
• s.t. ft[i] = es[i] + d[i]
• Precedence: if activity i is a predecessor of j (i in pa[j]), then es[j] >= ft[i]
• Resource limit: for any day t and resource type k, the amount consumed by the active activities i satisfies
  ∑_i rs[i][k][t] ≤ rs_l[k]
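As a minimal illustration of this formulation, the sketch below checks whether a given schedule satisfies the precedence and resource constraints; the data layout (es, d, pa, req, rs_l) follows the notation above, but the concrete structures and the helper name are assumptions, not the course code:

```python
# Minimal feasibility check for an RCPSP schedule, assuming:
#   es[i]      start day of activity i
#   d[i]       duration of activity i
#   pa[j]      list of predecessors of activity j
#   req[i][k]  amount of resource k used by activity i per active day
#   rs_l[k]    daily availability of resource type k
def is_feasible(es, d, pa, req, rs_l):
    n = len(es)
    ft = [es[i] + d[i] for i in range(n)]          # ft[i] = es[i] + d[i]

    # Precedence: every predecessor must finish before its successor starts.
    for j in range(n):
        for i in pa[j]:
            if es[j] < ft[i]:
                return False

    # Resource limit: on every day t, total demand for each resource type k
    # must not exceed the availability rs_l[k].
    horizon = max(ft) if n else 0
    for t in range(horizon):
        for k in range(len(rs_l)):
            used = sum(req[i][k] for i in range(n) if es[i] <= t < ft[i])
            if used > rs_l[k]:
                return False
    return True

# Tiny example: 3 activities, one resource type with capacity 2.
print(is_feasible(es=[0, 0, 2], d=[2, 2, 1],
                  pa=[[], [], [0, 1]],
                  req=[[1], [1], [2]], rs_l=[2]))   # True
```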
It is not possible to solve the problem by manual operation. A heuristic approach is promising for this problem.
How can we use intelligent algorithms (IA) to solve the optimization problem?
• hill-climbing algorithm (competition between two individuals)

• Compare f(x0) with f(x0 ± delta) and keep the better point.
Example

• See hill_climb algorithm.py

• Find the maximum value of sin(x^2) + 2*cos(2*x)
• x is in [5, 8]
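A minimal hill-climbing sketch for this example (not the referenced hill_climb algorithm.py; the step size, iteration count and restart scheme are assumptions):

```python
import math, random

def f(x):
    # Objective from the slide: sin(x^2) + 2*cos(2*x), x in [5, 8]
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

def hill_climb(lo=5.0, hi=8.0, delta=0.01, iters=10000):
    x = random.uniform(lo, hi)                 # random starting individual
    for _ in range(iters):
        step = random.uniform(-delta, delta)   # propose a neighbour x0 +/- delta
        x_new = min(hi, max(lo, x + step))     # keep the candidate inside [lo, hi]
        if f(x_new) > f(x):                    # competition between the two individuals
            x = x_new                          # keep the better one
    return x, f(x)

# Run several independent climbs and keep the best, since a single climb
# can get stuck on a local maximum of this multimodal function.
best = max((hill_climb() for _ in range(20)), key=lambda p: p[1])
print(best)
```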
How can we use intelligent algorithms (IA) to solve the optimization problem?
• Genetic algorithm (competition within a group)

• Many individuals
• Crossover and mutation
GA process
• Step 1: Generate an individual answer (the answer should be feasible)
• Step 2: Generate a population of answers
• Step 3: Define the objective function for the problem
• Step 4: Evaluate the population using the objective function
• Step 5: Select the feasible answers according to their fitness values
• Step 6: Crossover
• Step 7: Mutation
• Step 8: Go back to Step 4 (a compact sketch of this loop follows)
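A compact sketch of the loop above, reusing the earlier sin(x^2) + 2*cos(2*x) example; the real-coded crossover and mutation operators and all parameter values are illustrative assumptions:

```python
import math, random

def objective(x):                         # Step 3: objective function
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

LO, HI, POP, GENS, MUT_RATE = 5.0, 8.0, 30, 100, 0.2

def ga():
    pop = [random.uniform(LO, HI) for _ in range(POP)]     # Steps 1-2: feasible population
    for _ in range(GENS):
        pop.sort(key=objective, reverse=True)               # Step 4: evaluate
        parents = pop[:POP // 2]                            # Step 5: select by fitness
        children = []
        while len(children) < POP - len(parents):
            a, b = random.sample(parents, 2)
            w = random.random()
            child = w * a + (1 - w) * b                     # Step 6: arithmetic crossover
            if random.random() < MUT_RATE:                  # Step 7: mutation
                child += random.gauss(0.0, 0.1)
            children.append(min(HI, max(LO, child)))        # keep the answer feasible
        pop = parents + children                            # Step 8: back to evaluation
    best = max(pop, key=objective)
    return best, objective(best)

print(ga())
```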
Variation: crossover and mutation for binary values
Variation: crossover and mutation for decimal values
Variation: mutation for decimal values
Advantages and disadvantages
• Problem-independent (no gradient or problem-specific structure required)
• No guarantee of reaching the global optimum
• Many parameters to tune
• Slow, because the operators must be applied repeatedly
Several algorithms with few parameters and a simple evolution structure
• (1+1) ES
• Only mutation
Several algorithms with few parameters and a simple evolution structure
• (μ + λ) ES: μ parents, each parent produces λ children, all are evaluated, select the best μ, repeat (a sketch follows below)
• Only mutation
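A minimal (μ + λ) ES sketch for the same 1-D example; the mutation strength sigma and the population sizes are illustrative assumptions:

```python
import math, random

def objective(x):
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

LO, HI = 5.0, 8.0

def mu_plus_lambda_es(mu=5, lam=4, sigma=0.1, gens=200):
    parents = [random.uniform(LO, HI) for _ in range(mu)]
    for _ in range(gens):
        # Each parent produces lam children by mutation only.
        children = [min(HI, max(LO, p + random.gauss(0.0, sigma)))
                    for p in parents for _ in range(lam)]
        # Parents and children compete; the best mu survive.
        parents = sorted(parents + children, key=objective, reverse=True)[:mu]
    return parents[0], objective(parents[0])

print(mu_plus_lambda_es())
```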
DE flowchart (more on mutation)
Differential evolution (a mutation/crossover sketch follows below)
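The slide itself is a flowchart; as a hedged illustration, the classic DE/rand/1/bin mutation-and-crossover step could look like the following (F, CR and the population size are illustrative assumptions, not values from the course):

```python
import math, random

def objective(x_vec):
    x = x_vec[0]
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

LO, HI, DIM, NP, F, CR, GENS = 5.0, 8.0, 1, 20, 0.5, 0.9, 200

def de():
    pop = [[random.uniform(LO, HI) for _ in range(DIM)] for _ in range(NP)]
    for _ in range(GENS):
        for i in range(NP):
            # DE/rand/1 mutation: combine three other random individuals.
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = [a[d] + F * (b[d] - c[d]) for d in range(DIM)]
            # Binomial crossover between the target vector and the mutant.
            j_rand = random.randrange(DIM)
            trial = [min(HI, max(LO, mutant[d] if (random.random() < CR or d == j_rand)
                                 else pop[i][d])) for d in range(DIM)]
            # Greedy selection: the trial replaces the target if it is better.
            if objective(trial) > objective(pop[i]):
                pop[i] = trial
    best = max(pop, key=objective)
    return best, objective(best)

print(de())
```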
PSO (competition and cooperation)
• Particle Swarm Optimization (PSO)
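A minimal PSO sketch for the same 1-D example; the inertia and acceleration coefficients below are common textbook values, not from the slides:

```python
import math, random

def objective(x):
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

LO, HI, N, GENS = 5.0, 8.0, 20, 200
W, C1, C2 = 0.7, 1.5, 1.5        # inertia, cognitive and social coefficients

def pso():
    x = [random.uniform(LO, HI) for _ in range(N)]
    v = [0.0] * N
    pbest = x[:]                                   # each particle's own best (cooperation)
    gbest = max(x, key=objective)                  # swarm-wide best (competition)
    for _ in range(GENS):
        for i in range(N):
            r1, r2 = random.random(), random.random()
            v[i] = (W * v[i]
                    + C1 * r1 * (pbest[i] - x[i])  # pull toward personal best
                    + C2 * r2 * (gbest - x[i]))    # pull toward swarm best
            x[i] = min(HI, max(LO, x[i] + v[i]))
            if objective(x[i]) > objective(pbest[i]):
                pbest[i] = x[i]
        gbest = max(pbest, key=objective)
    return gbest, objective(gbest)

print(pso())
```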
What is NN?
Surrogate optimization
• To find an approximate function for the data, traditionally using a Gaussian process with a kernel function
Neural network (surrogate optimization)
• The concept of surrogate optimization
• To find an approximate function for the data, traditionally using a Gaussian process with a kernel function
• But an NN is more powerful at fitting the data
• (An example) … NN for optimization

• Differentiable, continuous function (a surrogate sketch follows below)
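A hedged sketch of the surrogate idea: sample the expensive objective, fit a small neural network to the samples, then optimize over the cheap surrogate. scikit-learn's MLPRegressor and the grid search over the surrogate are choices made here for brevity, not the course code:

```python
import math
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_objective(x):
    # Stand-in for an expensive simulation; here the slide's example function.
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

# 1. Sample the expensive function at a few points.
x_train = np.random.uniform(5.0, 8.0, size=(40, 1))
y_train = np.array([expensive_objective(x[0]) for x in x_train])

# 2. Fit a small neural-network surrogate to the samples.
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
surrogate.fit(x_train, y_train)

# 3. Optimize over the cheap surrogate (dense grid search here for simplicity).
grid = np.linspace(5.0, 8.0, 2000).reshape(-1, 1)
pred = surrogate.predict(grid)
best_x = grid[np.argmax(pred)][0]
print(best_x, expensive_objective(best_x))
```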


RL (reinforcement learning)
• Based on dynamic programming and control theory
• Subproblems
• Each subproblem is represented by states and controlled variables
RL (reinforcement learning)
• Learning what?
• Learning a reaction strategy (policy) for an unknown environment or a given state
RL
• Learning from data

• State: (fire)
• Action: (use oil, use water)
• rw_f(state, action) = reward
• rw_f(fire, use oil) = -50
• rw_f(fire, use water) = 100
Using a Q table to store the knowledge
• Data is stored in a table with the results for each paired (state, action)

• Given the Q table, a greedy strategy selects the action for the current state (see the sketch below)
• Here, the states are discrete and independent in the fire example.

             Action a    Action b
State 1      Q(1,a)      Q(1,b)
State 2      Q(2,a)      Q(2,b)
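A minimal sketch of greedy action selection from such a Q table, using the fire example's rewards as Q values; the dictionary layout is an assumption:

```python
# Q table for the fire example: knowledge stored per (state, action) pair.
# The numeric values follow the reward slide; the dict layout is an assumption.
q_table = {
    ("fire", "use oil"): -50,
    ("fire", "use water"): 100,
}

def greedy_action(state, actions):
    # Greedy strategy: pick the action with the highest stored Q value for this state.
    return max(actions, key=lambda a: q_table[(state, a)])

print(greedy_action("fire", ["use oil", "use water"]))   # -> "use water"
```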
Using a Q table to store the knowledge
• For a sequential (consecutive) task, the states have specific requirements.
• They should satisfy the Markov property.
Consecutive task or risky environment
Explore vs. exploit in RL
• For an unknown environment, how to explore?
• Epsilon-greedy strategy (a sketch follows below)
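A minimal epsilon-greedy sketch; epsilon = 0.1 is an illustrative value, not from the slides:

```python
import random

def epsilon_greedy_action(q_table, state, actions, epsilon=0.1):
    # With probability epsilon explore a random action,
    # otherwise exploit the best known action from the Q table.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```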
RL target

• For an unknown environment, by taking many trials and errors, the agent obtains precious data:
• s0-a0-r0-s1-a1-r1-…-sn-an-rn-… (one episode)
• Sometimes the immediate reward is not clear until the final state.
• s0-a0-s1-a1-…-sn-an-…
• This is much like a multi-armed bandit (slot machine) game.
• So the target for the agent is to maximize the expected return:
• Return = r0 + dis*r1 + dis^2*r2 + … + dis^n*rn (a small computation follows below)
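The return formula above maps directly to a tiny helper; dis = 0.9 here only because the later Monte Carlo slides use 0.9:

```python
def discounted_return(rewards, dis=0.9):
    # Return = r0 + dis*r1 + dis^2*r2 + ...
    return sum((dis ** t) * r for t, r in enumerate(rewards))

print(discounted_return([10, -100]))   # 10 + 0.9*(-100) = -80, as in the Monte Carlo example
```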
Consecutive task or risky environment
• For an unknown environment, by taking many trials and errors, the agent obtains precious data:
• 2, right, 0, 3
• 2, left, 0, 3
• 2, left, 10, 1, left, -100, 0
• 2, left, 10, 1, right, -100, 0

• How can we use this experimental data to calculate the Q table?
Bellman equation

• s1-r1-s2-r2-…    v(s1) = r1 + dis*v(s2)
Monte Carlo Q table
• One episode:
• 2, left, 10, 1, left, -100, 0
• Split into (state, action, reward, next state) pieces:
• 2, left, 10, 1
• 1, left, -100, 0 (end state)
• q(1, left) = -100 + dis*0 = -100
• q(2, left) = 10 + 0.9*(-100) = -80
Update the knowledge
• q(1, left) = -100
• q(2, left) = -80
• New episode:
• 2, left, 10, 1, right, -100, 0
• 2, left, 10, 1
• 1, right, -100, 0
• q(1, right) = -100 + dis*0 = -100
• q(2, left) = 10 + 0.9*(-100) = -80
• Update the knowledge again (averaging repeated observations):
• q(1, left) = -100, q(1, right) = -100, q(2, left) = (-80 + (-80))/2 = -80
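A compact sketch of this Monte Carlo averaging: walk each episode backwards, accumulate the discounted return after every (state, action) pair, and average the returns collected so far. dis = 0.9 as in the slides; encoding each episode as (state, action, reward) triples is an assumption:

```python
from collections import defaultdict

DIS = 0.9
q_sum = defaultdict(float)     # running sum of returns per (state, action)
q_count = defaultdict(int)     # number of returns seen per (state, action)

def mc_update(episode):
    # episode = [(state, action, reward), ...] up to the terminal state.
    g = 0.0
    for state, action, reward in reversed(episode):
        g = reward + DIS * g                      # discounted return from this step on
        q_sum[(state, action)] += g
        q_count[(state, action)] += 1

def q(state, action):
    return q_sum[(state, action)] / q_count[(state, action)]

# The two episodes from the slides.
mc_update([(2, "left", 10), (1, "left", -100)])
mc_update([(2, "left", 10), (1, "right", -100)])
print(q(1, "left"), q(1, "right"), q(2, "left"))   # -100.0 -100.0 -80.0
```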
Monte Carlo Q table

• Needs a lot of experiments (exploration) to obtain a useful Q table for exploitation
• If the environment is too complicated, some states may never be visited, for example in NP-hard problems.
• Needs a complete episode before any update can be made
Bellman optimality equation: Q learning
• s1-a1-r1-s2 (section of an episode)
• Q(s1,a1) = r1 + dis * max_a Q(s2,a) … Q learning
• s1-a1-r1-s2-a2 (section of an episode)
• Q(s1,a1) = r1 + dis * Q(s2,a2) … SARSA learning (a sketch of both updates follows below)
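A minimal tabular sketch of both updates; the learning rate alpha is an addition of this sketch (the slide formulas correspond to alpha = 1):

```python
DIS = 0.9

def q_learning_update(q_table, s1, a1, r1, s2, actions, alpha=1.0):
    # Slide form (alpha = 1): Q(s1, a1) = r1 + dis * max_a Q(s2, a).
    # With alpha < 1 this becomes the usual incremental Q-learning update.
    target = r1 + DIS * max(q_table.get((s2, a), 0.0) for a in actions)
    old = q_table.get((s1, a1), 0.0)
    q_table[(s1, a1)] = old + alpha * (target - old)

def sarsa_update(q_table, s1, a1, r1, s2, a2, alpha=1.0):
    # Slide form: Q(s1, a1) = r1 + dis * Q(s2, a2).
    target = r1 + DIS * q_table.get((s2, a2), 0.0)
    old = q_table.get((s1, a1), 0.0)
    q_table[(s1, a1)] = old + alpha * (target - old)
```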
Deep Q learning
• If the state space is infinite, a Q table is unavailable.
• We use an NN to fit the data for the Q values.
Data
• s0, a0, r0, s1, a1, r1, s2, …
• (s0, a0, r0), (s1, a1, r1), … (what we used before to build the NN)
• (s0, a0, r0, s1):
  f(s0, a0) = r0 + dis * max(list(f(s1, ai))), where i ranges over all actions
• Use this estimated target to train and validate the NN (a sketch follows below)
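A hedged sketch of how such training targets could be built from (s0, a0, r0, s1) transitions; q_net here is a placeholder callable standing in for the neural network, and the action list is an assumption:

```python
DIS = 0.9
ACTIONS = ["left", "right"]          # assumed discrete actions

def build_targets(q_net, transitions):
    # transitions: list of (s0, a0, r0, s1) tuples.
    # q_net(state, action) -> estimated Q value (placeholder for the neural network).
    # Target for each pair: f(s0, a0) = r0 + dis * max over actions of f(s1, a).
    inputs, targets = [], []
    for s0, a0, r0, s1 in transitions:
        target = r0 + DIS * max(q_net(s1, a) for a in ACTIONS)
        inputs.append((s0, a0))
        targets.append(target)
    return inputs, targets

# The NN is then refit so that q_net(s0, a0) approaches the computed targets,
# and the process repeats as new transitions are collected.
```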
Policy gradient
https://towardsdatascience.com/reinforcement-learning-explained-visually-part-6-policy-gradients-step-by-step-f9f448e73754
Actor critic method
Application in RCPSP
• NN to approximate a function from a matrix that stores the results of the function, with the row and column indices as inputs
• Monte Carlo alone may not be enough to explore the whole search space
• Heuristic methods are good at searching.
• A hybrid method may be a way to solve RCPSP
See practical ga for case 1.py
isos for case 1 improved.py
puregaforcase2.py
Thanks
