
Hidden Markov Models with Scikit-Learn

Last Updated : 24 Jun, 2024

Hidden Markov Models (HMMs) are statistical models for systems that move between a series of unobserved (hidden) states over time, where each state produces an observable output. They are widely used in fields such as speech recognition, finance, and bioinformatics for tasks that involve sequential data.

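Here, we explore Hidden Markov Models and how to implement them in Python with hmmlearn, a library that follows the Scikit-learn API. Scikit-learn itself no longer ships an HMM implementation, so hmmlearn is the package the code below relies on.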

What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a way to infer the hidden states of a system from observable outcomes. It has the following components:

  • Hidden States: These are the actual conditions we can't see directly (like weather being sunny or rainy).
  • Observations: These are the outcomes we can see (like whether someone carries an umbrella).
  • Transitions: The chances of moving from one hidden state to another (like the chance of sunny turning to rainy).
  • Emissions: Chances of seeing an observation from a hidden state (like the chance of carrying an umbrella if it's rainy).
  • Initial State: Starting probabilities of the hidden states (like the chance of starting with sunny weather).
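
As a minimal sketch of how these components fit together, the snippet below writes the weather and umbrella example as three NumPy arrays and uses the forward algorithm to compute the probability of a short observation sequence. All probability values, and the function name sequence_likelihood, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical numbers for the weather/umbrella example (not from a real dataset).
states = ["sunny", "rainy"]             # hidden states
symbols = ["no umbrella", "umbrella"]   # observations

start_prob = np.array([0.7, 0.3])       # initial state probabilities
trans_prob = np.array([[0.8, 0.2],      # P(next state | current state)
                       [0.4, 0.6]])
emit_prob = np.array([[0.9, 0.1],       # P(observation | hidden state)
                      [0.2, 0.8]])


def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs), summed over every possible hidden path."""
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()


obs = [1, 1, 0]  # umbrella, umbrella, no umbrella
likelihood = sequence_likelihood(obs, start_prob, trans_prob, emit_prob)
print(f"P(observations) = {likelihood:.4f}")  # P(observations) = 0.0722
```

Each row of the transition and emission arrays is a probability distribution, which is why every row sums to 1.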

Step-by-Step Implementation of Hidden Markov Model using Scikit-Learn Libraries

Step 1: Import Necessary Libraries

The code begins by importing the necessary Python libraries: numpy for numerical operations, pandas for data manipulation and analysis, and hmmlearn for working with Hidden Markov Models (HMMs). The !pip install line is notebook syntax; in a terminal, run pip install hmmlearn instead.

Python
!pip install hmmlearn
import numpy as np
import pandas as pd
from hmmlearn import hmm


Step 2: Load and Prepare Data

The dataset is loaded from a CSV file named 'weatherHistory.csv' into a pandas DataFrame. Any missing values in the dataset are dropped to ensure the quality of the data. The DataFrame is then truncated to the first 1000 rows for simplicity.

Python
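data = pd.read_csv('weatherHistory.csv')
# Drop rows with missing values
data = data.dropna()
# Keep the first 1000 rows; copy so later column assignments modify an independent frame
data = data.head(1000).copy()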


Step 3: Transform Data for HMM

The precipitation type is mapped to integer codes because a discrete-emission HMM expects its observations as non-negative integer symbols. Temperature values are binned into three discrete states with pandas' cut function: cold (5 °C and below), mild (above 5 °C up to 15 °C), and warm (above 15 °C). These bins serve as the reference temperature states for the comparison in Step 6.

Python
data['Precip Type'] = data['Precip Type'].map({'rain': 1, 'snow': 2, 'none': 0})
data['Temp State'] = pd.cut(data['Temperature (C)'], bins=[-np.inf, 5, 15, np.inf], labels=[0, 1, 2]).astype(int)
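
The binning and mapping can be checked on a few values before running them on the full dataset. The sketch below uses made-up temperatures placed on and around the bin edges, and a made-up precipitation label ("hail") that is missing from the mapping, to show how the edges are treated and how unmapped categories surface as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures chosen to sit on and around the bin edges.
temps = pd.Series([-3.0, 5.0, 5.1, 15.0, 22.4])

# Same bins and labels as in the step above: (-inf, 5], (5, 15], (15, inf)
temp_state = pd.cut(temps, bins=[-np.inf, 5, 15, np.inf], labels=[0, 1, 2]).astype(int)
print(temp_state.tolist())  # [0, 0, 1, 1, 2]

# A category missing from the mapping becomes NaN, so it is worth counting them.
precip = pd.Series(["rain", "snow", "rain", "hail"])
codes = precip.map({"rain": 1, "snow": 2, "none": 0})
n_unmapped = int(codes.isna().sum())
print(n_unmapped)  # 1 ("hail" is not in the mapping)
```

Because the intervals are closed on the right, exactly 5 °C counts as cold and exactly 15 °C counts as mild.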


Step 4: Configure and Initialize the HMM

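A categorical (discrete-emission) HMM is defined with three hidden components, intended to correspond to the three temperature states (cold, mild, warm). In hmmlearn 0.3 and later this model is CategoricalHMM; MultinomialHMM now treats its input as count vectors rather than symbol indices, so it does not suit a single column of integer codes. The start, transition, and emission probabilities are set explicitly from assumptions or prior knowledge, and init_params='' keeps fit from replacing them with its own initialization before training starts.

Python
# n_features=3 covers the three precipitation codes (0, 1, 2) even if one is absent from the sample
model = hmm.CategoricalHMM(n_components=3, n_features=3, n_iter=100, init_params='')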
model.startprob_ = np.array([0.5, 0.3, 0.2])
model.transmat_ = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
model.emissionprob_ = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
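
Hand-set parameters are easy to get wrong, so it can help to validate them before assigning them to a model. The helper below is a hypothetical convenience function (it is not part of hmmlearn) that checks the shapes and confirms that every row is a probability distribution.

```python
import numpy as np


def check_hmm_params(startprob, transmat, emissionprob):
    """Raise ValueError if shapes disagree or a distribution does not sum to 1."""
    n_states = len(startprob)
    if transmat.shape != (n_states, n_states):
        raise ValueError("transmat must be n_states x n_states")
    if emissionprob.shape[0] != n_states:
        raise ValueError("emissionprob needs one row per hidden state")
    named = [("startprob", startprob), ("transmat", transmat), ("emissionprob", emissionprob)]
    for name, arr in named:
        rows = np.atleast_2d(arr)
        if np.any(rows < 0) or not np.allclose(rows.sum(axis=1), 1.0):
            raise ValueError(f"{name}: rows must be non-negative and sum to 1")
    return True


# The values used in this step
startprob = np.array([0.5, 0.3, 0.2])
transmat = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
emissionprob = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
print(check_hmm_params(startprob, transmat, emissionprob))  # True
```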


Step 5: Fit Model and Predict States

The model is trained on the observed precipitation types with the expectation-maximization (Baum-Welch) procedure, and predict then returns the most likely hidden state for each observation. Training is unsupervised: the model never sees the temperature column, so reading the hidden states as temperature states is an assumption, and nothing guarantees that state 0 ends up meaning cold.

Python
observations = data['Precip Type'].values.reshape(-1, 1)
model.fit(observations)
hidden_states = model.predict(observations)
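
By default, predict in hmmlearn decodes with the Viterbi algorithm, which finds the single most likely path of hidden states. The following is a minimal NumPy sketch of that idea, not hmmlearn's actual implementation; the function name viterbi and the short observation sequence are illustrative assumptions, and the parameters are the hand-set values from Step 4 rather than fitted ones.

```python
import numpy as np


def viterbi(obs, startprob, transmat, emissionprob):
    """Return the most likely hidden state path for a sequence of integer symbols."""
    n_states, n_steps = len(startprob), len(obs)
    with np.errstate(divide="ignore"):  # log(0) = -inf is acceptable here
        log_start = np.log(startprob)
        log_trans = np.log(transmat)
        log_emit = np.log(emissionprob)

    delta = np.empty((n_steps, n_states))             # best log-prob of a path ending in each state
    backptr = np.zeros((n_steps, n_states), dtype=int)
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        scores = delta[t - 1][:, None] + log_trans    # scores[i, j]: arrive in j from i
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]

    # Walk the back-pointers from the best final state
    path = np.empty(n_steps, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_steps - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path


startprob = np.array([0.5, 0.3, 0.2])
transmat = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
emissionprob = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
obs = np.array([0, 0, 2, 2])  # hypothetical sequence of precipitation codes
decoded = viterbi(obs, startprob, transmat, emissionprob)
print(decoded)
```

Working in log space avoids numerical underflow when the sequence is long.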


Step 6: Post-Processing and Visualization

The numerical temperature states predicted by the HMM are mapped back to their corresponding labels ('Cold', 'Mild', 'Warm') for clarity. The actual and predicted temperature states are then visualized using matplotlib to compare and contrast the performance of the HMM in predicting temperature states based on precipitation.

Python
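import matplotlib.pyplot as plt

# Labels for the three temperature bins created in Step 3
temp_state_map = {0: 'Cold', 1: 'Mild', 2: 'Warm'}
predicted_temp_states = [temp_state_map[state] for state in hidden_states]
# Actual temperature bins from the dataset, used as the reference line
hidden_states_actual = data['Temp State'].values

plt.figure(figsize=(12, 6))
plt.plot(hidden_states_actual[:200], label='Actual Temperature State', marker='o', linestyle='-')
plt.plot(hidden_states[:200], label='Predicted Temperature State', marker='x', linestyle='--')
plt.title('Comparison of Actual and Predicted Temperature States')
plt.xlabel('Time')
plt.ylabel('Temperature State')
plt.legend()
plt.show()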

Output:

The output is a plot comparing the actual temperature states with the states predicted by the Hidden Markov Model (HMM).

Output Explanation

Visualization Components

  • X-axis (Time): Represents sequential time points in the dataset. Each point corresponds to one row of the dataset, and the plot shows the first 200 rows.
  • Y-axis (Temperature State): The temperature states are categorized into three levels:
    • 0: Cold
    • 1: Mild
    • 2: Warm

Lines on the Plot

  • Blue line: Represents the actual temperature states as observed in the dataset. This line moves across the three temperature states based on the actual recorded temperatures.
  • Orange dashed line: Represents the temperature states predicted by the HMM. The predictions are based on the observed precipitation types and learned model parameters.

Analysis of Results

  • Alignment of States: At several points along the timeline, the predicted temperature states (orange dashed line) match the actual temperature states (blue line). This suggests the model has picked up some relationship between precipitation type and temperature state, although part of the agreement can be coincidental because the hidden states were learned without temperature labels and their numbering is arbitrary.
  • Misalignments: There are also noticeable sections where the predicted states do not match the actual states. These discrepancies may result from several factors, such as limitations in the model’s ability to capture more complex dependencies, insufficient or non-representative training data, or inherent randomness in the data not accounted for by the model.
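
The visual comparison can be backed by a number. Since an HMM trained without labels may number its states in any order, one hedged way to score it is to try every relabelling of the predicted states and report the best agreement. The sketch below does this on short made-up sequences; the function name best_permutation_accuracy is hypothetical, and exhaustive search is only practical for a small number of states.

```python
import itertools

import numpy as np


def best_permutation_accuracy(actual, predicted, n_states):
    """Accuracy under the state relabelling that agrees best with `actual`."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    best_acc, best_perm = -1.0, None
    for perm in itertools.permutations(range(n_states)):
        relabelled = np.array(perm)[predicted]  # perm[k] is the new label for state k
        acc = float(np.mean(relabelled == actual))
        if acc > best_acc:
            best_acc, best_perm = acc, perm
    return best_acc, best_perm


# Hypothetical sequences: the prediction mostly has labels 0 and 2 swapped.
actual = [0, 0, 1, 2, 2, 1, 0]
predicted = [2, 2, 1, 0, 0, 1, 1]

raw_acc = float(np.mean(np.array(actual) == np.array(predicted)))
acc, perm = best_permutation_accuracy(actual, predicted, n_states=3)
print(f"raw accuracy: {raw_acc:.3f}")
print(f"best relabelled accuracy: {acc:.3f} with mapping {perm}")
```

A large gap between the raw and relabelled accuracy indicates that the model found structure but numbered its states differently from the temperature bins.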

Conclusion

Hidden Markov Models (HMMs) are effective for analyzing time series data with hidden states. The hmmlearn library, which follows the Scikit-learn API, simplifies HMM implementation and training and makes it easier to look for hidden patterns in sequential data. Here we demonstrated how an HMM can be used to search for hidden structure in a weather dataset. The output plot gives a visual assessment of how well the HMM predicts temperature states from precipitation data. While the model shows some degree of predictive accuracy, evident from several points of alignment, the misalignments highlight areas where it could be improved, for example by tuning parameters, incorporating additional features, or using more advanced modeling techniques.

