A Guide To Markov Chain and Its Applications in Machine Learning
BY YUGESH VERMA
Markov chains are simple yet very useful tools for modelling time-dependent,
space-dependent stochastic processes. Many domains, such as finance (stock
price movement), sales (sales quantity information), NLP (finite-state
transducers, Hidden Markov Models for POS tagging), and weather forecasting,
use Markov chains to make predictions easily and accurately. In this article, we
will discuss Markov chains in detail and try to understand how they work, along
with their advantages and applications. The major points to be covered in this
article are listed below.
Table of Contents
1. Definitions
1.1. State-space
1.2. Trajectory
1.3. Transition probability
2. Prediction Using Markov Chain
2.1. Initial State and One-Step Prediction
2.2. Long Run Prediction
3. Advantages of Markov Chain
4. Applications of the Markov Chain
5. Final Words
1. Definitions
The Markov chain represents a class of stochastic processes in which the future does
not depend on the past; it depends only on the present. A stochastic process can be
considered a Markov chain if it has the Markov property: to predict the future of the
process, we only require information about its present state, and the process is
independent of its past.
Consider a situation where Xn is the state recorded at time stamp n. The future
state at time n+1 depends on the state at time n. Take the example of corona
cases, where the number of cases at time n is Xn and the number of cases at time
n+1 is Xn+1. Following the Markov chain definition, the number of cases
at time n+1 will depend on the number of cases at time n (Xn+1 will depend on Xn),
not on the past values {Xn−1, Xn−2, . . . , X0}.
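Formally, the Markov property described above can be written as

P(Xn+1 = s | Xn, Xn−1, . . . , X0) = P(Xn+1 = s | Xn)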
To understand the Markov chain, we first need to understand some of the terms the
concept relies on. These terms are explained below.
1.1. State-space
If the state space of the Markov chain is denoted by S, where S = {1, 2, 3, ….., n},
then the state of the process is given by the value of Xn. For example, if Xn =
8 then the state of the process is 8. Hence we can say that at any time n, the state
the process is in is given by the value of Xn.
For example, in a class of students, the students with a record of previous failures are
more likely to fail the final exam, while the students who merely had lower marks in
the previous exam have a smaller probability of failing. So in this situation, we can
say that the chances of failing the exam are higher for students with old fail records
and lower for students with low marks. In this scenario, we have two states: lower
chances and higher chances, and S = {1, 2}.
1.2. Trajectory
The trajectory of the Markov chain is the sequence of states the stochastic process
has visited from the start.
In other words, if we denote the trajectory values as s0, s1, s2, ……., sn, then the
states take the values X0 = s0, X1 = s1, ……., Xn = sn. For example, the trajectory
1, 1, 2, 2 means X0 = 1, X1 = 1, X2 = 2, X3 = 2.
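As a minimal sketch, the following Python snippet samples a short trajectory from a
hypothetical two-state chain. The transition probabilities it uses (the matrix T) are
the topic of the next subsection, and all values here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical transition matrix for a two-state chain; row i holds the
    # probabilities of moving from state i to each state (explained in 1.3).
    T = np.array([[0.70, 0.30],
                  [0.15, 0.85]])

    state = 0                 # X0 = state 1 (0-based indices internally)
    trajectory = [state]
    for _ in range(10):
        state = rng.choice(2, p=T[state])  # draw the next state from row `state` of T
        trajectory.append(state)

    print([s + 1 for s in trajectory])     # the trajectory s0, s1, ..., s10 as labels 1/2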
1.3. Transition probability
A Markov chain cannot be in different states at one particular time, but it can change
its state over time. This change of state is called a transition. In the above example,
the Markov chain can be in either the lower-chances or the higher-chances state.
The transition of states can be shown with a transition diagram. In state 1, the
chain is in the higher-chances state: the exam currently going on is at a point where
the chances of failure are higher. The probability that the next exam also ends up in
the higher-chances state is 0.7, and the probability of transitioning to the
lower-chances state is 0.3.
Suppose instead the system is in the lower-chances state and a similar transition
diagram is drawn; here the transition probabilities are 0.85 and 0.15. Using both
diagrams, we can draw the complete process.
The combined state-transition diagram represents the complete process across
State 1 and State 2. Within a single time instance, the process cannot move
backwards, but it can return to a previous state at the next time instance.
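Putting the numbers from the two diagrams together, we can write the complete
process as a transition matrix. Below is a minimal sketch in Python; it assumes the
chain stays in the lower-chances state with probability 0.85, since the text above
gives 0.85 and 0.15 without pinning down which is which.

    import numpy as np

    # Transition matrix for the two-state example: state 1 = higher chances
    # of failure, state 2 = lower chances. Row i gives the probabilities of
    # moving from state i to states 1 and 2.
    T = np.array([
        [0.70, 0.30],   # from "higher chances": stay with 0.7, move to "lower" with 0.3
        [0.15, 0.85],   # from "lower chances": assumed to stay with 0.85, move with 0.15
    ])

    # Every row of a valid transition matrix must sum to 1.
    assert np.allclose(T.sum(axis=1), 1.0)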
2. Prediction Using Markov Chain
The Markov chain is a very powerful tool for making predictions about future values.
Since it gives various useful insights, it is necessary to understand the transition
probabilities, the transition matrix, the state space, and the trajectory in order to
interpret those insights.
2.1. Initial State and One-Step Prediction
One more basic thing we need prior knowledge of is the initial state of the process.
To explain the prediction process, let us return to the student failure chances
example above. Starting from an initial state probability vector, we can predict the
probability of each state after the next exam by simply multiplying the initial state
vector by the transition matrix.
This intuition gives a general formula for predicting the first state after the initial
state. Let V0 be the initial state probability vector and T the transition matrix;
then the one-time-step prediction is
V1 = V0 . T
One notable and mathematically simple point is that the dot product of a vector with a
matrix is again a vector. So in the process of predicting one time step ahead, we
again end up with a vector, which can in turn be treated as a new initial state. Put
more formally, each predicted step in the future determines only the step that
follows it.
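As a concrete sketch, here is the one-step prediction V1 = V0 . T in Python. The
initial state vector V0 = [1, 0] (the process starts in the higher-chances state with
certainty) is an assumption made for illustration.

    import numpy as np

    T = np.array([[0.70, 0.30],    # transition matrix from the example above
                  [0.15, 0.85]])
    V0 = np.array([1.0, 0.0])      # assumed initial state: surely in state 1

    V1 = V0 @ T                    # one-step prediction: V1 = V0 . T
    print(V1)                      # [0.7 0.3] -- again a probability vector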
So if we want to predict the second step the formula for prediction will be
V2 = V1 . T
We already know the value of V1 from the one-step prediction; substituting it gives
V2 = (V0 . T) . T
V2 = V0 . T^2
V3 = V2 . T = (V0 . T^2) . T
V3 = V0 . T^3
Therefore, the prediction for the nth time step can be calculated with the following
formula:
Vn = Vn-1 . T = V0 . T^n
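The n-step formula maps directly onto a matrix power. A minimal sketch, reusing the
assumed T and V0 from above:

    import numpy as np
    from numpy.linalg import matrix_power

    T = np.array([[0.70, 0.30],
                  [0.15, 0.85]])
    V0 = np.array([1.0, 0.0])      # assumed initial state

    # Vn = V0 . T^n for a few values of n
    for n in (1, 2, 3, 10):
        print(n, V0 @ matrix_power(T, n))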
2.2. Long Run Prediction
This iterative process is how we predict the future state probabilities of
long-running processes. The long-run probability vector can be written as
V∞ = V0 . T^∞
This formula for the long-run probabilities tells us that once the chain has reached
its long-run probability vector, no amount of further multiplication by the transition
matrix changes it; in other words, V∞ . T = V∞.
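We can see this numerically: raising the assumed T to a large power gives a vector
that one more multiplication by T leaves unchanged. A minimal sketch:

    import numpy as np
    from numpy.linalg import matrix_power

    T = np.array([[0.70, 0.30],
                  [0.15, 0.85]])
    V0 = np.array([1.0, 0.0])             # assumed initial state

    V_inf = V0 @ matrix_power(T, 1000)    # approximation of the long-run vector
    print(V_inf)                          # ~[0.3333 0.6667]
    print(np.allclose(V_inf @ T, V_inf))  # True: multiplying by T changes nothing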
3. Advantages of Markov Chain
• As we have seen above, a Markov chain is very easy to derive from sequential data.
• We do not need to dive deep into the mechanism of the dynamic change.
• A Markov chain is very insightful: it can tell us where a process is lacking, and we
can then make changes to improve it.
• Its computational requirements are very low to modest, so systems of any size can
easily be calculated.
4. Applications of the Markov Chain
• Markov chains can be used for forecasting of many kinds, such as weather,
temperature, or sales forecasting.
• They can be used for predicting customer behaviour.
• Since they work well with sequential data, they can be merged into many NLP
problem solutions, such as POS tagging.
• Brand loyalty and consumer behaviour can be analyzed.
• In the gaming field, various models of games of chance can be developed.
5. Final Words
Here in this article, we have seen an explanation of the Markov chain concept. As we
have seen, it is not difficult to understand or to compute, so it scales easily to
systems of any size. There can be many more uses of the Markov chain; because of its
simplicity and accuracy, it has become a very popular subject of research. I
encourage readers to apply this concept in their real-life projects to make those
projects more easily calculable and interpretable.