Building Regression Models with Microsoft Cognitive Toolkit (CNTK)

Last Updated : 09 Sep, 2024

The Microsoft Cognitive Toolkit (CNTK), also known as the Computational Network Toolkit, is an open-source, commercial-grade toolkit for deep learning. It lets developers create models that mimic the learning processes of the human brain. Although CNTK is used mainly for deep learning, it can also be used to build models for regression analysis.

Setting Up CNTK for Regression Models

Regression models are essential in predicting continuous values based on input data. CNTK provides a robust framework for building regression models, leveraging its deep learning capabilities to handle complex datasets.

To set up CNTK for regression, first make sure a compatible Python version (such as Python 3.6.8) is installed on your machine. Then follow these steps to install and configure CNTK on your system:

Install CNTK: Install the package via pip with the command:

pip install cntk
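
Once the installation finishes, you can verify it by importing the package and printing its version (a quick sanity check; the exact version string depends on the wheel you installed):

Python
import cntk as C

# Print the installed CNTK version, e.g. 2.7
print(C.__version__)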

Let's begin by importing the required libraries, such as CNTK, NumPy, and scikit-learn, for preparing the data and creating the model. Data preparation is an important step: the model is trained on this data, so it must be cleaned and formatted correctly if the model is to predict continuous values accurately. Follow the steps below (a minimal sketch of the optional cleaning steps appears right after the list):

  • First, collect the data and identify the independent and dependent variables.
  • Handle missing values, either by removing rows or by filling NaN values with the mean, median, or mode.
  • Use one-hot encoding or label encoding to convert categorical values to numerical values.
  • Scale the features with techniques like normalization or standardization to boost model performance.
  • Remove outliers from the data if present.
  • Ensure that all features and targets are NumPy arrays so that CNTK can process them.
  • Split the data into training and test sets.
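
Not every cleaning step above applies to the California housing data used in this article (it contains no missing values or categorical columns), but for a dataset that does, a minimal sketch might look like the following. The DataFrame and column names here are hypothetical and used purely for illustration:

Python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value and a categorical column
df = pd.DataFrame({
    "area":  [1200.0, np.nan, 950.0, 1500.0],
    "city":  ["Delhi", "Mumbai", "Delhi", "Chennai"],
    "price": [300.0, 420.0, 250.0, 510.0],
})

# Handle missing values: fill NaNs in numeric columns with the column mean
df["area"] = df["area"].fillna(df["area"].mean())

# One-hot encode the categorical column
city_dummies = pd.get_dummies(df["city"], prefix="city")

# Assemble features and target as float32 NumPy arrays, as CNTK expects
features = np.hstack([df[["area"]].to_numpy(), city_dummies.to_numpy()]).astype(np.float32)
target = df[["price"]].to_numpy().astype(np.float32)

With the data prepared, load the California housing dataset, split it into training and test sets, scale the features, and define the CNTK input variables: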
Python
import cntk as C
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load the California housing dataset
housing_data = fetch_california_housing()

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(housing_data.data, housing_data.target, test_size=0.2, random_state=42)

# Normalize features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define input and output dimensions
input_dim = X_train.shape[1]  # Number of features
output_dim = 1                # Regression output is a single value

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

In this example we identify the features and the target values, split them into training and test sets, apply feature scaling, and define input variables (placeholders) so that CNTK can feed the values into its computation graph.
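
As a quick sanity check (assuming the code above has been run as-is), you can print the resulting shapes. The California housing dataset has 8 features and 20,640 samples, so an 80/20 split leaves 16,512 training rows:

Python
print(X_train.shape)  # (16512, 8)
print(input_dim)      # 8
print(X.shape)        # (8,) -- static shape of the CNTK input variable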

Building Regression Models With CNTK

We can build many regression models using CNTK. Some of them are as follows:

  • Linear Regression
  • Polynomial Regression
  • Ridge Regression
  • Lasso Regression

In this article, we will cover each of these models, their theoretical background, and their implementation using CNTK.

1. Linear Regression

Linear regression is the simplest of these models: it assumes a linear relationship between the input features and the target. We initialize a weight matrix and a bias term and use them to define the regression model as y = X·W + b.

Python
# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the linear regression model
model = C.times(X, W) + b

2. Polynomial Regression

In polynomial regression, the relationship between the features and the target is modelled as an nth-degree polynomial. To implement it, we expand the features into polynomial terms using the sklearn library and then fit the same linear model on the expanded features, which now acts as a polynomial regression model.

Python
from sklearn.preprocessing import PolynomialFeatures

# Use PolynomialFeatures to generate polynomial terms (degree 2 for simplicity)
poly = PolynomialFeatures(degree=2)
X_train = poly.fit_transform(X_train)
X_test = poly.transform(X_test)
# Normalize features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define input and output dimensions
input_dim = X_train.shape[1]  # Number of polynomial features
output_dim = 1                # Single output for regression

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the polynomial regression model: y = W * X + b
model = C.times(X, W) + b

3. Lasso Regression

Lasso regression is similar to linear regression, but it uses L1 regularization to handle multicollinearity and overfitting. L1 regularization adds a penalty proportional to the absolute values of the weights, so the loss becomes MSE + λ·Σ|wᵢ|. We keep the same linear model and simply modify the loss function by introducing this penalty.

Python
# Define input and output dimensions
input_dim = X_train.shape[1]  # Number of polynomial features
output_dim = 1                # Single output for regression

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the regression model: y = W * X + b
model = C.times(X, W) + b

# Define loss (MSE) with L1 regularization (Lasso)
loss = C.squared_error(model, y) + 0.01 * C.reduce_sum(C.abs(W))  # L1 regularization term
eval_error = C.squared_error(model, y)

# Use Adam optimizer with momentum
learning_rate = 0.01
momentum = C.momentum_schedule(0.9)
learner = C.adam(model.parameters, C.learning_rate_schedule(learning_rate, C.UnitType.minibatch), momentum)

# Create the trainer
trainer = C.Trainer(model, (loss, eval_error), [learner])

4. Ridge Regression

Ridge regression is similar to lasso regression, but it adds the squares of the weights as the penalty, i.e. L2 regularization, so the loss becomes MSE + λ·Σwᵢ². Here too we keep the linear model and only modify the loss function by introducing the L2 penalty.

Python
# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the regression model: y = W * X + b
model = C.times(X, W) + b

# Define loss (MSE) with L2 regularization (Ridge)
loss = C.squared_error(model, y) + 0.01 * C.reduce_sum(C.square(W))  # L2 regularization term
eval_error = C.squared_error(model, y)

Regression Model Evaluation in CNTK

To train the model we specify a batch size and a number of epochs. In each epoch the model iterates over the training data in mini-batches; for each mini-batch the loss and evaluation error are computed and the parameters are updated. After training, the test error, usually the mean squared error (MSE), is calculated: the smaller the MSE, the better the predictive power of the model.

Python
batch_size = 32
num_epochs = 20

# Training loop
for epoch in range(num_epochs):
    for i in range(0, len(X_train_cntk), batch_size):
        X_batch = X_train_cntk[i:i + batch_size]
        y_batch = y_train_cntk[i:i + batch_size]
        trainer.train_minibatch({X: X_batch, y: y_batch})
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {trainer.previous_minibatch_loss_average}')

# Convert test data to CNTK format
X_test_cntk = np.array(X_test, dtype=np.float32)
y_test_cntk = np.array(y_test, dtype=np.float32).reshape(-1, 1)

# Get predictions
predictions = model.eval({X: X_test_cntk})

# Calculate test error (MSE)
mse = np.mean((predictions - y_test_cntk) ** 2)
print(f'Test MSE: {mse}')

We should optimize the model so that it trains faster and produces accurate predictions. Some techniques for doing so are listed below (a small hyperparameter-tuning sketch follows the list):

  • Use hyperparameter tuning to tune parameters such as the batch size, number of epochs, and learning rate.
  • Introduce regularization penalties to reduce overfitting and multicollinearity.
  • Scale the features to improve the performance of the model.
  • Use cross-validation to evaluate performance reliably and to guide hyperparameter optimization.
  • Do not add too many layers to the model, since that can increase training time and cause problems such as the vanishing gradient problem or overfitting.
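
As a simple illustration of the first point, the training loop from the linear regression example can be wrapped in a small grid search over the learning rate and batch size. This is only a sketch: it assumes the data preparation and the X, y, X_train_cntk, y_train_cntk, X_test_cntk, and y_test_cntk variables from the examples below are already in scope, and the candidate values are arbitrary.

Python
# Hypothetical grids of candidate hyperparameters
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]

best_mse = float('inf')
best_config = None

for lr in learning_rates:
    for bs in batch_sizes:
        # Rebuild the parameters so every configuration starts from scratch
        W = C.parameter((input_dim, output_dim))
        b = C.parameter((output_dim))
        model = C.times(X, W) + b
        loss = C.squared_error(model, y)
        learner = C.adam(model.parameters,
                         C.learning_rate_schedule(lr, C.UnitType.minibatch),
                         C.momentum_schedule(0.9))
        trainer = C.Trainer(model, (loss, loss), [learner])

        # Short training run for this configuration
        for epoch in range(10):
            for i in range(0, len(X_train_cntk), bs):
                trainer.train_minibatch({X: X_train_cntk[i:i + bs],
                                         y: y_train_cntk[i:i + bs]})

        # Evaluate on the test set and keep the best configuration
        preds = model.eval({X: X_test_cntk})
        mse = np.mean((preds - y_test_cntk) ** 2)
        if mse < best_mse:
            best_mse, best_config = mse, (lr, bs)

print(f'Best (learning rate, batch size): {best_config}, test MSE: {best_mse}')

In practice, a separate validation split (or the cross-validation suggested above) should be used for model selection instead of the test set; the test set here keeps the sketch short.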

Here we have used the California Housing dataset from the sklearn library. 80% of the data is used for training and the remaining 20% for testing. Finally, we calculate the Mean Squared Error on the test dataset for each model.

Linear Regression

Python
import cntk as C
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load the California housing dataset
housing_data = fetch_california_housing()

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(housing_data.data, housing_data.target, test_size=0.2, random_state=42)

# Normalize features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define input and output dimensions
input_dim = X_train.shape[1]  # Number of features
output_dim = 1                # Regression output is a single value

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the linear regression model: y = W * X + b
model = C.times(X, W) + b

# Define loss (MSE) and evaluation functions
loss = C.squared_error(model, y)
eval_error = C.squared_error(model, y)

# Use Adam optimizer (with momentum)
learning_rate = 0.01
momentum = C.momentum_schedule(0.9)  # Define momentum
learner = C.adam(model.parameters, 
                 C.learning_rate_schedule(learning_rate, C.UnitType.minibatch), 
                 momentum)

# Create the trainer
trainer = C.Trainer(model, (loss, eval_error), [learner])

# Convert data to CNTK format
X_train_cntk = np.array(X_train, dtype=np.float32)
y_train_cntk = np.array(y_train, dtype=np.float32).reshape(-1, 1)

# Set batch size and epochs
batch_size = 32
num_epochs = 20

# Training loop
for epoch in range(num_epochs):
    for i in range(0, len(X_train_cntk), batch_size):
        X_batch = X_train_cntk[i:i + batch_size]
        y_batch = y_train_cntk[i:i + batch_size]
        trainer.train_minibatch({X: X_batch, y: y_batch})
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {trainer.previous_minibatch_loss_average}')

# Convert test data to CNTK format
X_test_cntk = np.array(X_test, dtype=np.float32)
y_test_cntk = np.array(y_test, dtype=np.float32).reshape(-1, 1)

# Get predictions
predictions = model.eval({X: X_test_cntk})

# Calculate test error (MSE)
mse = np.mean((predictions - y_test_cntk) ** 2)
print(f'Test MSE: {mse}')

Output:

Epoch 0, Loss: 5.107728481292725
Epoch 10, Loss: 1.2154287099838257
Test MSE: 0.6077708005905151
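
Once trained, the model can also score individual samples (assuming the variables from the example above are still in scope). The California housing target is the median house value expressed in units of $100,000:

Python
# Predict the first test sample and compare it with the actual value
sample = X_test_cntk[:1]
print(model.eval({X: sample}))  # predicted median house value (in $100,000s)
print(y_test_cntk[:1])          # actual value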

Polynomial Regression

Python
import cntk as C
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
import numpy as np

# Load the California housing dataset
housing_data = fetch_california_housing()

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(housing_data.data, housing_data.target, test_size=0.2, random_state=42)

# Use PolynomialFeatures to generate polynomial terms (degree 2 for simplicity)
poly = PolynomialFeatures(degree=2)
X_train = poly.fit_transform(X_train)
X_test = poly.transform(X_test)

# Normalize features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define input and output dimensions
input_dim = X_train.shape[1]  # Updated number of features after polynomial transformation
output_dim = 1                # Regression output is a single value

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the polynomial regression model: y = W * X + b
model = C.times(X, W) + b

# Define loss (MSE) and evaluation functions
loss = C.squared_error(model, y)
eval_error = C.squared_error(model, y)

# Use Adam optimizer (with momentum)
learning_rate = 0.01
momentum = C.momentum_schedule(0.9)  # Define momentum
learner = C.adam(model.parameters, 
                 C.learning_rate_schedule(learning_rate, C.UnitType.minibatch), 
                 momentum)

# Create the trainer
trainer = C.Trainer(model, (loss, eval_error), [learner])

# Convert data to CNTK format
X_train_cntk = np.array(X_train, dtype=np.float32)
y_train_cntk = np.array(y_train, dtype=np.float32).reshape(-1, 1)

# Set batch size and epochs
batch_size = 32
num_epochs = 20

# Training loop
for epoch in range(num_epochs):
    for i in range(0, len(X_train_cntk), batch_size):
        X_batch = X_train_cntk[i:i + batch_size]
        y_batch = y_train_cntk[i:i + batch_size]
        trainer.train_minibatch({X: X_batch, y: y_batch})
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {trainer.previous_minibatch_loss_average}')

# Convert test data to CNTK format
X_test_cntk = np.array(X_test, dtype=np.float32)
y_test_cntk = np.array(y_test, dtype=np.float32).reshape(-1, 1)

# Get predictions
predictions = model.eval({X: X_test_cntk})

# Calculate test error (MSE)
mse = np.mean((predictions - y_test_cntk) ** 2)
print(f'Test MSE: {mse}')

Output:

Epoch 0, Loss: 4.65647554397583
Epoch 10, Loss: 1.1391972303390503
Test MSE: 0.6355475783348083

Lasso Regression

Python
import cntk as C
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
import numpy as np

# Load the California housing dataset
housing_data = fetch_california_housing()

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(housing_data.data, housing_data.target, test_size=0.2, random_state=42)

# Use PolynomialFeatures to generate polynomial terms (degree 2 for simplicity)
poly = PolynomialFeatures(degree=2)
X_train = poly.fit_transform(X_train)
X_test = poly.transform(X_test)

# Normalize features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define input and output dimensions
input_dim = X_train.shape[1]  # Number of polynomial features
output_dim = 1                # Single output for regression

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the regression model: y = W * X + b
model = C.times(X, W) + b

# Define loss (MSE) with L1 regularization (Lasso)
loss = C.squared_error(model, y) + 0.01 * C.reduce_sum(C.abs(W))  # L1 regularization term
eval_error = C.squared_error(model, y)

# Use Adam optimizer with momentum
learning_rate = 0.01
momentum = C.momentum_schedule(0.9)
learner = C.adam(model.parameters, C.learning_rate_schedule(learning_rate, C.UnitType.minibatch), momentum)

# Create the trainer
trainer = C.Trainer(model, (loss, eval_error), [learner])

# Convert data to CNTK format
X_train_cntk = np.array(X_train, dtype=np.float32)
y_train_cntk = np.array(y_train, dtype=np.float32).reshape(-1, 1)

# Set batch size and epochs
batch_size = 32
num_epochs = 20

# Training loop
for epoch in range(num_epochs):
    for i in range(0, len(X_train_cntk), batch_size):
        X_batch = X_train_cntk[i:i + batch_size]
        y_batch = y_train_cntk[i:i + batch_size]
        trainer.train_minibatch({X: X_batch, y: y_batch})
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {trainer.previous_minibatch_loss_average}')

# Convert test data to CNTK format
X_test_cntk = np.array(X_test, dtype=np.float32)
y_test_cntk = np.array(y_test, dtype=np.float32).reshape(-1, 1)

# Get predictions
predictions = model.eval({X: X_test_cntk})

# Calculate test error (MSE)
mse = np.mean((predictions - y_test_cntk) ** 2)
print(f'Test MSE: {mse}')

Output:

Epoch 0, Loss: 4.6795973777771
Epoch 10, Loss: 1.1734471321105957
Test MSE: 0.6340514421463013

Ridge Regression

Python
import cntk as C
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
import numpy as np

# Load the California housing dataset
housing_data = fetch_california_housing()

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(housing_data.data, housing_data.target, test_size=0.2, random_state=42)

# Use PolynomialFeatures to generate polynomial terms (degree 2 for simplicity)
poly = PolynomialFeatures(degree=2)
X_train = poly.fit_transform(X_train)
X_test = poly.transform(X_test)

# Normalize features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define input and output dimensions
input_dim = X_train.shape[1]  # Number of polynomial features
output_dim = 1                # Single output for regression

# Define the input and output placeholders
X = C.input_variable(input_dim)
y = C.input_variable(output_dim)

# Initialize weights and bias
W = C.parameter((input_dim, output_dim))
b = C.parameter((output_dim))

# Define the regression model: y = W * X + b
model = C.times(X, W) + b

# Define loss (MSE) with L2 regularization (Ridge)
loss = C.squared_error(model, y) + 0.01 * C.reduce_sum(C.square(W))  # L2 regularization term
eval_error = C.squared_error(model, y)

# Use Adam optimizer with momentum
learning_rate = 0.01
momentum = C.momentum_schedule(0.9)
learner = C.adam(model.parameters, C.learning_rate_schedule(learning_rate, C.UnitType.minibatch), momentum)

# Create the trainer
trainer = C.Trainer(model, (loss, eval_error), [learner])

# Convert data to CNTK format
X_train_cntk = np.array(X_train, dtype=np.float32)
y_train_cntk = np.array(y_train, dtype=np.float32).reshape(-1, 1)

# Set batch size and epochs
batch_size = 32
num_epochs = 20

# Training loop
for epoch in range(num_epochs):
    for i in range(0, len(X_train_cntk), batch_size):
        X_batch = X_train_cntk[i:i + batch_size]
        y_batch = y_train_cntk[i:i + batch_size]
        trainer.train_minibatch({X: X_batch, y: y_batch})
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {trainer.previous_minibatch_loss_average}')

# Convert test data to CNTK format
X_test_cntk = np.array(X_test, dtype=np.float32)
y_test_cntk = np.array(y_test, dtype=np.float32).reshape(-1, 1)

# Get predictions
predictions = model.eval({X: X_test_cntk})

# Calculate test error (MSE)
mse = np.mean((predictions - y_test_cntk) ** 2)
print(f'Test MSE: {mse}')

Output:

Epoch 0, Loss: 4.6577277183532715
Epoch 10, Loss: 1.1457855701446533
Test MSE: 0.6365649700164795

Conclusion

Building regression models with the Microsoft Cognitive Toolkit (CNTK) offers a powerful approach to handling complex datasets and producing accurate predictions. The toolkit's flexibility, performance optimizations, and support for advanced techniques make it a solid choice for developing scalable regression models. By leveraging CNTK's capabilities, developers can create models that efficiently predict continuous values, driving insights and decision-making across various domains.

