
PUBLISHED ON OCTOBER 8, 2021 IN DEVELOPERS CORNER

A Guide to Markov Chain and its Applications in Machine Learning

A stochastic process can be considered a Markov chain if it satisfies the Markov property: the future state depends only on the present state, not on the past.

BY YUGESH VERMA

Markov chains are simple yet very useful tools for modelling time-dependent stochastic processes over a set of states. Many domains, like finance (stock price movement), sales (sales quantity information), NLP algorithms (finite-state transducers, the Hidden Markov Model for POS tagging), and weather forecasting, use Markov chains to make predictions easily and accurately. In this article, we will discuss Markov chains in detail and try to understand how they work, along with their advantages and applications. The major points to be covered in this article are listed below.

Table of Contents

1. Definitions
1.1. State-space
1.2. Trajectory
1.3. Transition probability
1.4. State transition matrix
2. Prediction Using Markov Chain
2.1. Initial State and One-Step Prediction
2.2. Long-run Probability
3. Advantages of Markov Chain
4. Application of the Markov Chain
5. Final Words
1. Definitions

The Markov chain represents a class of stochastic processes in which the future does not depend on the past; it depends only on the present. A stochastic process can be considered a Markov chain if it has the Markov property: to predict the future, we only require information about the present state, and the past is irrelevant given the present.

Consider a situation where we have a state Xn recorded at time stamp n. The future state at time n+1 depends only on the state at time n. Let's take the example of corona cases, where the number of cases at time n is Xn and the number of cases at time n+1 is Xn+1. Following the Markov chain definition, the number of cases at time n+1 will depend on the number of cases at time n (Xn+1 will depend on Xn), not on the past values {Xn−1, Xn−2, . . . , X0}. To understand the Markov chain, we first need to understand some of the terms the concept is built on. These terms are explained below.
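
In the notation used throughout this article, the Markov property can be written as

P(Xn+1 = s | Xn, Xn−1, . . . , X0) = P(Xn+1 = s | Xn)

That is, conditioning on the entire history gives exactly the same answer as conditioning on the present state alone.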

1.1. State-space

If the state space of the Markov chain is denoted by S, where S = {1, 2, 3, ….., n}, then the state of the process at any time is given by the value of Xn. For example, if Xn = 8, then the state of the process at time n is 8. Hence we can say that at any time n, the state the process is in is given by the value of Xn.

For example, in a class of students, the students with an old fail record are more likely to fail the final exam, while the students who merely scored low marks in the previous exam are less likely to fail. So in this situation, the chances of failing the exam are higher for the students with old fail records and lower for the students with low marks. In this scenario we have two states, lower chances and higher chances, and S = {1, 2}.

1.2. Trajectory

The trajectory of a Markov chain is the sequence of states the stochastic process has passed through since the start.

In other words, if we denote the trajectory values as s0, s1, s2, ….., sn, then the states take the values X0 = s0, X1 = s1, ….., Xn = sn.
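
As a minimal sketch of what a trajectory looks like in practice, the snippet below samples one from the two-state exam chain whose transition probabilities are introduced in the next two subsections:

import random

# Transition probabilities of the two-state exam chain from this article:
# state 1 = higher chances of failure, state 2 = lower chances.
transitions = {
    1: {1: 0.70, 2: 0.30},
    2: {1: 0.15, 2: 0.85},
}

def sample_trajectory(start, n_steps):
    """Return the sequence of states X0, X1, ..., Xn visited by the chain."""
    trajectory = [start]
    for _ in range(n_steps):
        current = transitions[trajectory[-1]]
        states = list(current)
        weights = [current[s] for s in states]
        trajectory.append(random.choices(states, weights=weights)[0])
    return trajectory

print(sample_trajectory(start=1, n_steps=10))  # e.g. [1, 1, 2, 2, 2, 2, 1, ...]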
1.3. Transition probability
A Markov chain cannot be in two different states at the same time, but it can change its state as time passes. This change of state is called a transition. In the example above, the Markov chain can be either in the lower-chances state or in the higher-chances state at any given time.

Let’s take a look at the picture below.

[Figure: transition diagram for the higher-chances state, with a 0.7 self-loop and a 0.3 arrow to the lower-chances state]

The picture represents the transition of states. In state 1 the chain is in the higher-chances state: the exam currently in progress is one where the chances of failure are higher. The probability that the next exam also lands in the higher-chances state is 0.7, and the probability of a transition to the lower-chances state is 0.3.

Now suppose the system is in the lower-chances state and a similar transition diagram is drawn. Here the transition probabilities are 0.85 for staying in the lower-chances state and 0.15 for moving back to the higher-chances state. Using both diagrams we can draw the complete process.

[Figure: combined state-transition diagram connecting State 1 and State 2]

At any single time instance the process cannot move backwards, but at the next time instance it can transition back to a previous state.

1.4. State transition matrix

The matrix of all transition probabilities is called the transition matrix, where the rows are the starting states and the columns are the ending states. For the example illustrated above in the article, the transition matrix is:

                        To higher chances   To lower chances
From higher chances            0.70               0.30
From lower chances             0.15               0.85

The transition from the lower-chances state back to the higher-chances state has a probability of 0.15, and the probability of remaining in the lower-chances state is 0.85.
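
As a minimal sketch, this transition matrix can be written down and sanity-checked in NumPy. Each row must sum to 1, because a row lists every possible way of leaving its state:

import numpy as np

# Transition matrix from the example: rows are the starting state,
# columns the ending state (state 1 = higher chances, state 2 = lower chances).
T = np.array([
    [0.70, 0.30],   # from higher chances: stay 0.70, move to lower 0.30
    [0.15, 0.85],   # from lower chances: move to higher 0.15, stay 0.85
])

# Every row of a valid transition matrix sums to 1.
assert np.allclose(T.sum(axis=1), 1.0)
print(T)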
2. Prediction Using Markov Chain

The Markov chain is a very powerful tool for predicting future values. To extract its useful insights, it is necessary to know the transition probabilities, the transition matrix, the state space, and the trajectory.

Another basic thing we need prior knowledge of is the initial state of the process. To explain the prediction process, let's revisit the student failure example above with a few changes.

2.1. Initial State and One-Step Prediction


This time it’s an engineering examination, and the observation is that students who fail the first year’s mathematics exam are three times more likely to fail their core subjects, while students who pass mathematics in the first year are four times more likely to pass their core subject exams. Reading these as 3:1 and 4:1 odds, the transition matrix for the example will be:

              To pass   To fail
From pass       0.80      0.20
From fail       0.25      0.75
If the students pass their mathematics exam, the initial state of the process will be

V0 = [1, 0]

that is, the process starts in the pass state with probability 1. From this initial state, we can predict the probability that the students will pass the core subject by simply multiplying the initial state by the transition matrix.

For the given example, the prediction for the next step will be

V1 = V0 . T = [0.80, 0.20]

From the above intuition, we can say that the prediction for the first state after the
initial state can be given by the following formula.

Initial State * Transition Matrix = Prediction
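
A minimal sketch of this one-step prediction in NumPy, using the pass/fail matrix reconstructed above (the matrix values follow from the 3:1 and 4:1 reading, so treat them as an assumption):

import numpy as np

# Assumed pass/fail transition matrix (rows and columns ordered [pass, fail]).
T = np.array([
    [0.80, 0.20],   # after a pass: pass core subjects 0.80, fail 0.20
    [0.25, 0.75],   # after a fail: pass core subjects 0.25, fail 0.75
])

V0 = np.array([1.0, 0.0])   # initial state: the student passed mathematics

V1 = V0 @ T                 # Initial State * Transition Matrix = Prediction
print(V1)                   # [0.8 0.2]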


2.2. Long-run Probability
The long-run probability is also called the steady-state probability. We can calculate it once the process has become stable, that is, once the state probability vector stops changing from one time step to the next.

Let's say that V0 is the initial state probability vector and T is the transition matrix, so the one-time-step prediction can be represented as

V1 = V0 . T

One notable and mathematically simple point here is that the dot product of a vector and a matrix is again a vector. By this intuition, each one-step prediction produces a vector that can itself be treated as a new initial state. More formally, every predicted time step is responsible only for its next step.

So if we want to predict the second step the formula for prediction will be

V2 = V1 . T

From the one-step prediction we know the value of V1. Substituting it:

V2 = (V0 . T) . T

V2 = V0 . T^2

Similarly, for the third step, the prediction will be

V3 = V2 . T = (V0 . T^2).T

V3 = V0 . T^3

Therefore, the prediction for the nth time step can be calculated by the following formula:

Vn = Vn-1 . T = V0 . T^n

This iterative process is how we predict the future state probabilities of a long-running process. The long-run probability can be written as

V∞ = V0 . T^∞
This formula says that, in the long run, no further multiplication by the transition matrix changes the probability vector; in other words, the steady-state vector satisfies V∞ . T = V∞.
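
As a minimal sketch, the steady state can be found by repeatedly applying Vn = Vn-1 . T until the vector stops changing, again using the pass/fail matrix assumed above:

import numpy as np

T = np.array([[0.80, 0.20],
              [0.25, 0.75]])
V = np.array([1.0, 0.0])    # V0: the student passed mathematics

# Iterate Vn = Vn-1 . T until the vector stops changing.
for _ in range(1000):
    V_next = V @ T
    if np.allclose(V_next, V):
        break
    V = V_next

print(V)        # steady-state vector, approximately [0.556, 0.444] here
print(V @ T)    # multiplying by T again leaves it unchanged: V∞ . T = V∞

The same result can be reached directly with np.linalg.matrix_power(T, n) for a large n, which is exactly the V0 . T^n formula above.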

3. Advantages of Markov Chain

• As we have seen above, a Markov chain is very easy to derive from sequential data.
• We don't need to dive deep into the mechanism of the dynamic change being modelled.
• A Markov chain is very insightful: it can tell us where a process is lacking, so we can make changes to improve it.
• Its computational requirements are very low or modest, so systems of almost any size can be calculated easily.

4. Application of the Markov Chain

• Markov chains can be used for forecasting of many kinds: weather, temperature, sales, etc.
• They can be used for predicting customer behaviour.
• Since they work well with sequential data, they can be merged into many NLP solutions, such as POS tagging (see the sketch after this list).
• Brand loyalty and consumer behaviour can be analyzed.
• In gaming, various models of games of chance can be developed.
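
To make the NLP point concrete, here is a minimal sketch of a bigram (first-order Markov) text generator; the toy corpus is a placeholder, and a real POS tagger would use a Hidden Markov Model rather than this direct word chain:

import random
from collections import defaultdict

# A toy corpus; a real application would train on far more text.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Record which words follow each word; repeated followers occur more often
# in the list and are therefore sampled with higher probability.
follows = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word].append(next_word)

def generate(start, length=8):
    """Walk the word chain: each next word depends only on the current word."""
    words = [start]
    while len(words) < length and follows[words[-1]]:
        words.append(random.choice(follows[words[-1]]))
    return " ".join(words)

print(generate("the"))   # e.g. "the dog sat on the mat and the"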

5. Final Words

In this article, we have seen an explanation of the Markov chain concept. As we have seen, it is not difficult to understand or calculate, so it scales easily to computations of any size. Because of its ease and accuracy, the Markov chain has many more possible uses and has become a very popular subject of research. I encourage readers to apply this concept in their real-life projects to make them more easily calculable and interpretable.
