How to Implement Softmax and Cross-Entropy in Python and PyTorch
Last Updated: 24 Apr, 2023
Multiclass classification is an application of deep learning/machine learning where the model is given input and renders a categorical output corresponding to one of the labels that form the output. For example, providing a set of images of animals and classifying it among cats, dogs, horses, etc.
For this purpose, where the model outputs multiple outputs for each class, a simple logistic function (or sigmoid function) cannot be used. Thus, another activation function called the Softmax function is used along with the cross-entropy loss.
Softmax Function:
The softmax formula is represented as:

\[ \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \]
where the values z_i are the elements of the input vector and can take any real value. The denominator is a normalization term that guarantees all the output values of the function sum to 1, making the output a valid probability distribution.
The softmax function and the sigmoid function are similar to each other. Softmax operates on vector values, while the sigmoid takes scalar values. Thus, the sigmoid function can be seen as a special case of the softmax function, for a classifier with only two classes. In the context of machine learning, the term "sigmoid function" most commonly refers to the logistic function (logistic sigmoid). Mathematically, it is defined by:

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
The above function is used for classification between two classes, i.e., 1 and 0. For multiclass classification, the softmax function is used: it converts the output for each class into a probability value (between 0 and 1), exponentially normalized across the classes.
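To see this concretely, here is a small sketch (the helper names sigmoid and softmax are ours) checking numerically that a two-class softmax over [z, 0] reproduces [sigmoid(z), 1 - sigmoid(z)]:
Python3
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def softmax(values):
    # Standard softmax over a vector of values
    exp_values = np.exp(values)
    return exp_values / np.sum(exp_values)

z = 1.5
# A two-class softmax over [z, 0] equals [sigmoid(z), 1 - sigmoid(z)]
print(softmax([z, 0.0]))           # [0.81757448 0.18242552]
print(sigmoid(z), 1 - sigmoid(z))  # 0.8175744... 0.1824255...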
Example:
The code below implements the softmax function using Python and NumPy.
Python3
# The below code implements the softmax function
# using Python and NumPy. It takes:
# Input: an array/list of values
# Output: an array of softmax values

# Importing the required libraries
import numpy as np

# Defining the softmax function
def softmax(values):
    # Computing element-wise exponential values
    exp_values = np.exp(values)
    # Computing the sum of these values
    exp_values_sum = np.sum(exp_values)
    # Returning the softmax output
    return exp_values / exp_values_sum

if __name__ == '__main__':
    # Input to be fed
    values = [2, 4, 5, 3]
    # Output achieved
    output = softmax(values)
    print("Softmax Output: ", output)
    print("Sum of Softmax Values: ", np.sum(output))
Output:

Softmax Output:  [0.0320586  0.23688282 0.64391426 0.08714432]
Sum of Softmax Values:  1.0
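One practical caveat the simple implementation above glosses over: np.exp overflows for large inputs, producing inf and then nan. Since softmax is unchanged when the same constant is subtracted from every input, a common remedy is to shift by the maximum value first. A minimal sketch of this variant (the name stable_softmax is ours):
Python3
import numpy as np

def stable_softmax(values):
    values = np.asarray(values, dtype=float)
    # Shifting by the max makes the largest exponent exp(0) = 1,
    # which avoids overflow without changing the result
    exp_values = np.exp(values - np.max(values))
    return exp_values / np.sum(exp_values)

print(stable_softmax([2, 4, 5, 3]))        # same values as above
print(stable_softmax([1000, 1001, 1002]))  # plain np.exp would overflow here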
Implementing Softmax using Python and PyTorch:
Below, we will see how to implement the softmax function using Python and PyTorch. For this purpose, we use the torch.nn.functional module provided by PyTorch.
- First, import the required libraries.
- Now we use the softmax function provided by torch.nn.functional. For this, we pass the input tensor to the function.
Syntax: torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)
Parameters:
- input: The tensor on which softmax is to be applied.
- dim: Integer value. The dimension along which softmax will be computed, so that every slice along dim sums to 1 (see the short sketch after this list).
- dtype (optional) – The desired data type of the returned tensor. If specified, the input is cast to dtype before the operation is performed. Default: None
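To make the role of dim concrete, the sketch below applies softmax to a small 2 x 3 tensor along each dimension; slices along the chosen dimension sum to 1:
Python3
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])

# dim=1: softmax across columns, so each row sums to 1
print(F.softmax(x, dim=1))
# dim=0: softmax across rows, so each column sums to 1
# (0.5 everywhere here, because the two rows are identical)
print(F.softmax(x, dim=0))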
Example:
The code below implements the softmax function using PyTorch.
Python3
# The below code computes softmax values
# using the softmax function provided by
# torch.nn.functional in PyTorch.
# Input: the input values (list/tensor)
# Output: tensor of computed softmax values

# Importing the required libraries
import torch
import torch.nn.functional as F

# The input tensor to be passed
input_ = torch.tensor([1, 2, 3])

# Computing the softmax values
softmax = F.softmax(input_.float(), dim=0)
print("Softmax values are: ", softmax)

# Sum of all the softmax values
print("Sum of the softmax values: ", torch.sum(softmax))
Output:

Softmax values are:  tensor([0.0900, 0.2447, 0.6652])
Sum of the softmax values:  tensor(1.)
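A closely related function worth knowing about (not required for the example above): when the softmax output will be passed through a logarithm, as in cross-entropy, torch.nn.functional.log_softmax computes log(softmax(x)) in one numerically safer step:
Python3
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0])

# Two-step version: log of the softmax output
print(torch.log(F.softmax(x, dim=0)))
# One-step version: mathematically equivalent, numerically safer
print(F.log_softmax(x, dim=0))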
Cross-Entropy Loss
Loss functions are the objective functions that machine learning models are trained to minimize. One of the most important loss functions for classification tasks is Cross-Entropy Loss, also known as logistic loss or log loss. Understanding Cross-Entropy Loss builds on the Softmax activation function.
The softmax function returns a vector of C class probabilities, where each entry denotes the probability of the corresponding class. The cross-entropy loss measures the deviation of this vector from the true probability vector.
Entropy
The entropy of a random variable X is a measure of the disorder or randomness inherent in its possible outcomes.
For a probability distribution p(x), entropy is defined as:

\[ H(X) = -\sum_{x} p(x) \log p(x) \]

The negative sign is there because p(x) <= 1 and therefore log(p(x)) <= 0; negating the sum yields a non-negative value.
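As a quick numerical illustration of this formula (the helper name entropy is ours), the snippet below compares a uniform distribution, which is maximally random, with a skewed one:
Python3
import numpy as np

def entropy(p):
    # H(X) = -sum(p(x) * log(p(x)))
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.5]))  # ~0.693, the maximum for two outcomes
print(entropy([0.9, 0.1]))  # ~0.325, less random so lower entropy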
Cross-Entropy
Mathematically, the cross-entropy between a true distribution p and a predicted distribution q is defined as:

\[ H(p, q) = -\sum_{i} p_i \log q_i \]

Here p_i is the true probability of a class, while q_i is the probability computed using the Softmax function. For example, with true distribution p = [1, 0, 0] and prediction q = [0.7, 0.2, 0.1], the loss is -log(0.7) ≈ 0.357.
Implementing Cross-Entropy Loss using Python and NumPy
Below, we implement cross-entropy loss using Python and the NumPy library.
- Import the NumPy library.
- Define the cross-entropy loss function. In defining this function:
  - We pass the true and predicted values for a data point.
  - Next, we compute the softmax of the predicted values.
  - Finally, we compute the cross-entropy loss.
Example:
The code below implements the cross-entropy loss using Python and NumPy.
Python3
# The below code implements the cross-entropy
# loss between the predicted values and the
# true values of the class labels. The function:
# Inputs: Predicted values, True values
# Output: The cross-entropy loss between them

# Importing the required library
import numpy as np

# Softmax function (as defined earlier)
def softmax(values):
    exp_values = np.exp(values)
    return exp_values / np.sum(exp_values)

# Cross-entropy function
def cross_entropy(y_pred, y_true):
    # Computing softmax values for the predicted values
    y_pred = softmax(y_pred)
    loss = 0
    # Computing the cross-entropy loss using the
    # mathematical formulation given above
    for i in range(len(y_pred)):
        loss = loss + (-1 * y_true[i] * np.log(y_pred[i]))
    return loss

# y_true: True probability distribution (one-hot)
y_true = [1, 0, 0, 0, 0]

# y_pred: Predicted values (logits) for each class
y_pred = [10, 5, 3, 1, 4]

# Calling the cross_entropy function with
# the suitable values
cross_entropy_loss = cross_entropy(y_pred, y_true)
print("Cross Entropy Loss: ", cross_entropy_loss)
Output:

Cross Entropy Loss:  0.0101998
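One detail worth noting before moving on: the NumPy implementation above takes y_true as a one-hot probability vector, whereas PyTorch's CrossEntropyLoss (used next) expects the integer index of the true class. Converting between the two is a one-liner:
Python3
import numpy as np

# One-hot vector, as used by the NumPy implementation above
y_true_one_hot = [1, 0, 0, 0, 0]

# Integer class index, as expected by PyTorch's CrossEntropyLoss
y_true_index = int(np.argmax(y_true_one_hot))
print(y_true_index)  # 0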
Implementing Cross-Entropy Loss using PyTorch:
To implement cross-entropy loss using PyTorch, we use the torch.nn module. Below are the required steps:
- Import the libraries. Here, we will use the torch.nn module provided by PyTorch.
- We use the CrossEntropyLoss() class for computing the loss, so we create an instance of CrossEntropyLoss().
- We pass the predicted values (raw logits; log-softmax is applied internally) and the true class labels to this object.
torch.nn.CrossEntropyLoss():
This class computes the cross-entropy loss between the input logits and the target values.
Syntax: torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
Parameters:
- weight (optional) – A rescaling weight given to each class. Has to be a Tensor of size C, where C is the number of classes.
- ignore_index (optional) – An integer value specifying a target value that is to be ignored, meaning it does not contribute to the gradient.
- reduce (optional, deprecated) – Boolean value. By default, the computed losses are averaged or summed over the observations in each minibatch. When set to False, a loss is returned per batch element instead. Default: True
- reduction (optional) – A string specifying the reduction applied to the output; one of 'none' | 'mean' | 'sum' (see the short sketch after this list):
1. 'none': no reduction is applied.
2. 'mean': the weighted mean of the output is taken.
3. 'sum': the output is summed.
Default: 'mean'
- label_smoothing (optional) – A float in the range [0.0, 1.0] specifying the amount of label smoothing applied when computing the loss. Default: 0.0
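To see the reduction modes side by side, the sketch below (with made-up logits and targets) computes the per-element, mean, and summed losses for a batch of two samples:
Python3
import torch
import torch.nn as nn

# A batch of two samples with three classes (made-up values)
logits = torch.tensor([[2.0, 0.5, 0.3],
                       [0.1, 1.5, 0.2]])
targets = torch.tensor([0, 1])

# 'none' keeps one loss value per sample
print(nn.CrossEntropyLoss(reduction='none')(logits, targets))
# 'mean' averages them (the default), 'sum' adds them up
print(nn.CrossEntropyLoss(reduction='mean')(logits, targets))
print(nn.CrossEntropyLoss(reduction='sum')(logits, targets))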
Below is the code:
The code below computes the cross-entropy loss between the input and the target using PyTorch.
Python3
# The below code implements the cross-entropy
# loss using the CrossEntropyLoss() class
# provided by the torch.nn module in PyTorch.

# Importing the required library
import torch
import torch.nn as nn

# Creating an instance of the CrossEntropyLoss class
loss = nn.CrossEntropyLoss()

# y_pred: Predicted values (logits)
y_pred = torch.tensor([[1.4, 0.4, 1.1, 0.1, 2.3]])

# y_true: True class label
y_true = torch.tensor([0])

# Passing these values to the loss object
cross_entropy_loss = loss(y_pred, y_true)

# Printing the value of the loss
print("Cross Entropy Loss: ", cross_entropy_loss.item())
Output:

Cross Entropy Loss:  1.5770867
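To tie this back to the softmax section: CrossEntropyLoss applies log-softmax to the logits internally and picks out the negative log-probability of the true class, so the same number can be reproduced by hand. A minimal sketch:
Python3
import torch
import torch.nn.functional as F

y_pred = torch.tensor([[1.4, 0.4, 1.1, 0.1, 2.3]])
y_true = torch.tensor([0])

# Softmax turns the logits into probabilities ...
probs = F.softmax(y_pred, dim=1)
# ... and the loss is the negative log-probability of the true class
manual_loss = -torch.log(probs[0, y_true[0]])
print(manual_loss.item())  # matches the CrossEntropyLoss value above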