Activation Functions

Introduction

When constructing Artificial Neural Network (ANN) models, one of the primary considerations is choosing activation functions for hidden and output layers that are differentiable. This is because calculating the backpropagated error signal used to determine ANN parameter updates requires the gradient of the activation function. Three of the most commonly used activation functions in ANNs are the identity function, the logistic sigmoid function, and the hyperbolic tangent function. Examples of these functions and their associated gradients (derivatives in 1D) are plotted in Figure 1.

Figure 1: Common activation functions used in artificial neural networks, along with their derivatives

In the remainder of this post, we derive the derivatives/gradients for each of these common
activation functions.

The Identity Activation Function


The simplest activation function, one that is commonly used for the output layer activation function in regression problems, is the identity/linear activation function:

g_linear(a) = a

(Figure 1, red curves). This activation function simply maps the pre-activation to itself and can output values in the range (-∞, ∞). Why would one want to use an identity activation function? After all, a multi-layered network with linear activations at each layer can be equivalently formulated as a single-layered linear network. It turns out that the identity activation function is surprisingly useful. For example, a multi-layer network that has nonlinear activation functions amongst the hidden units and an output layer that uses the identity activation function implements a powerful form of nonlinear regression. Specifically, the network can predict continuous target values using a linear combination of signals that arise from one or more layers of nonlinear transformations of the input.

The derivative of g_linear(a) with respect to a is simply 1, in the case of 1D inputs. For vector inputs of length D, the gradient is a vector of ones of length D.
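As a minimal sketch in NumPy (the function names here are ours, chosen to mirror the notation above), the identity activation and its gradient are:

```python
import numpy as np

def g_linear(a):
    # identity activation: maps the pre-activation to itself
    return a

def g_linear_grad(a):
    # derivative is 1 for scalars; for a length-D input, a vector of D ones
    return np.ones_like(a)

a = np.array([-2.0, 0.0, 3.5])
print(g_linear(a))       # [-2.   0.   3.5]
print(g_linear_grad(a))  # [1. 1. 1.]
```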

The Logistic Sigmoid Activation Function


Another function that is often used as the output activation function for binary classification problems (i.e., outputs values in the range (0, 1)) is the logistic sigmoid. The logistic sigmoid has the following form:

g_logistic(a) = 1 / (1 + e^{-a})

(Figure 1, blue curves) and outputs values that range (0, 1). The logistic sigmoid is motivated somewhat by biological neurons and can be interpreted as the probability of an artificial neuron "firing" given its inputs. (It turns out that the logistic sigmoid can also be derived as the maximum likelihood solution for logistic regression in statistics.) Calculating the derivative of the logistic sigmoid function makes use of the quotient rule and a clever trick that both adds and subtracts a one from the numerator:

g'_logistic(a)
  = d/da [ 1 / (1 + e^{-a}) ]
  = e^{-a} / (1 + e^{-a})^2
  = (1 + e^{-a} - 1) / (1 + e^{-a})^2
  = 1 / (1 + e^{-a}) - 1 / (1 + e^{-a})^2
  = g_logistic(a) (1 - g_logistic(a))

Here we see that the derivative evaluated at a is simply g_logistic(a) weighted by (1 - g_logistic(a)).


This turns out to be a convenient form for efficiently calculating gradients used in neural networks: if one keeps in memory the feed-forward activations of the logistic function for a given layer, the gradients for that layer can be evaluated using simple multiplication and subtraction rather than re-evaluating the sigmoid function, which would require extra exponentiation.
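The caching trick can be sketched as follows (a minimal NumPy illustration, with our own function names; note the gradient function takes the cached activation, not the raw pre-activation):

```python
import numpy as np

def g_logistic(a):
    # logistic sigmoid: g(a) = 1 / (1 + e^{-a})
    return 1.0 / (1.0 + np.exp(-a))

def g_logistic_grad(g):
    # gradient from the *cached* activation g = g_logistic(a):
    # g'(a) = g(a) * (1 - g(a)) -- multiply/subtract only, no exponentiation
    return g * (1.0 - g)

a = np.array([-2.0, 0.0, 2.0])
g = g_logistic(a)         # stored during the forward pass
grad = g_logistic_grad(g)  # reused cheaply during backpropagation
```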

The Hyperbolic Tangent Activation Function


Though the logistic sigmoid has a nice biological interpretation, it turns out that the logistic
sigmoid can cause a neural network to get “stuck” during training. This is due in part to the fact
that if a strongly-negative input is provided to the logistic sigmoid, it outputs values very near
zero. Since neural networks use the feed-forward activations to calculate parameter gradients
(again, see this previous post for details), this can result in model parameters that are updated
less regularly than we would like, and are thus “stuck” in their current state.

An alternative to the logistic sigmoid is the hyperbolic tangent, or tanh function (Figure 1, green curves):

g_tanh(a) = (e^a - e^{-a}) / (e^a + e^{-a})

Like the logistic sigmoid, the tanh function is also sigmoidal ("s"-shaped), but instead outputs values that range (-1, 1). Thus, strongly negative inputs to the tanh will map to negative outputs. Additionally, only zero-valued inputs are mapped to near-zero outputs. These properties make the network less likely to get "stuck" during training. Calculating the gradient for the tanh function also uses the quotient rule:

g'_tanh(a)
  = [ (e^a + e^{-a})(e^a + e^{-a}) - (e^a - e^{-a})(e^a - e^{-a}) ] / (e^a + e^{-a})^2
  = 1 - [ (e^a - e^{-a}) / (e^a + e^{-a}) ]^2
  = 1 - g_tanh(a)^2
Similar to the derivative for the logistic sigmoid, the derivative of g_tanh(a) is a function of the feed-forward activation evaluated at a, namely 1 - g_tanh(a)^2. Thus, the same caching trick can be used for layers that implement tanh activation functions.
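A matching sketch for tanh (again a minimal NumPy illustration with our own names; np.tanh computes the same quantity as the quotient above, in a numerically stable way):

```python
import numpy as np

def g_tanh(a):
    # tanh(a) = (e^a - e^{-a}) / (e^a + e^{-a})
    return np.tanh(a)

def g_tanh_grad(g):
    # gradient from the cached activation g = g_tanh(a): g'(a) = 1 - g^2
    return 1.0 - g ** 2

a = np.array([-2.0, 0.0, 2.0])
g = g_tanh(a)          # cached forward activation
grad = g_tanh_grad(g)  # evaluated with a multiply and a subtract
```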

Wrapping Up
In this post we reviewed a few commonly-used activation functions in neural network literature
and their derivative calculations. These activation functions are motivated by biology and/or
provide some handy implementation tricks like calculating derivatives using cached feed-
forward activation values. Note that there are also many other options for activation functions
not covered here: e.g. rectification, soft rectification, polynomial kernels, etc. Indeed, finding and
evaluating novel activation functions is an active subfield of machine learning research.
However, the three basic activations covered here can be used to solve a majority of the machine
learning problems one will likely face.
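As a sanity check, all three derivative identities reviewed above can be verified numerically with a central finite difference (a quick NumPy sketch; the helper name is ours):

```python
import numpy as np

def finite_diff(f, a, eps=1e-6):
    # central finite-difference approximation of f'(a)
    return (f(a + eps) - f(a - eps)) / (2.0 * eps)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
a = np.linspace(-3.0, 3.0, 50)

# identity: derivative is 1 everywhere
assert np.allclose(finite_diff(lambda x: x, a), 1.0)
# logistic sigmoid: derivative is g(a) * (1 - g(a))
assert np.allclose(finite_diff(sigmoid, a), sigmoid(a) * (1.0 - sigmoid(a)))
# tanh: derivative is 1 - tanh(a)^2
assert np.allclose(finite_diff(np.tanh, a), 1.0 - np.tanh(a) ** 2)
print("all three gradient identities check out")
```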
