Here is our new paper accepted in the Neurocomputing journal: "SITU: Stochastic input encoding and weight update thresholding for efficient memristive neural network in-situ training." Analog-to-Digital Converter (ADC) sensing and conductance updates are the most power-demanding processes in the in-situ training of memristive neural networks. In this work, we propose a new thresholded weight update method, used in conjunction with stochastic input encoding, to reduce the ADC sensing requirement and the number of weight updates. This leads to better power and area efficiency for simple neural network models developed in situ on memristive crossbars. This is interesting research work by Xuening Dong, a collaboration between the University of Toronto, James Cook University, and York University. https://lnkd.in/gY4-GgtC
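As a rough illustration only (not the paper's actual algorithm), the two ideas can be sketched in a few lines of NumPy: encode each analog input as a Bernoulli bit-stream whose mean approximates the value, and apply only the weight updates whose magnitude clears a threshold. The function names, bit-stream length, and threshold below are assumptions made for the sketch.

```python
# Minimal sketch (not the paper's implementation): stochastic input encoding
# plus a thresholded weight update, as one might prototype it in software.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_encode(x, n_bits=16):
    """Encode inputs in [0, 1] as Bernoulli bit-streams; the mean of the
    stream approximates the analog value, relaxing the ADC sensing requirement."""
    return rng.random((n_bits, x.size)) < x          # shape: (n_bits, n_inputs)

def thresholded_update(W, grad, lr=0.1, threshold=0.01):
    """Apply only the weight updates whose magnitude exceeds `threshold`,
    skipping small conductance changes (the assumed energy-saving step)."""
    delta = -lr * grad
    mask = np.abs(delta) > threshold
    W[mask] += delta[mask]
    return mask.mean()                               # fraction of cells updated

x = rng.random(8)
bits = stochastic_encode(x)
print("decoded input ≈", bits.mean(axis=0))

W = rng.standard_normal((4, 8)) * 0.1
grad = rng.standard_normal((4, 8)) * 0.05
updated_fraction = thresholded_update(W, grad)
print(f"updated {updated_fraction:.0%} of weights")
```

In a crossbar setting, the skipped updates would correspond to conductance writes that never happen, which is where the power and area savings the post describes would come from.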
More Relevant Posts
-
📃 Scientific paper: Measurement-based/Model-less Estimation of Voltage Sensitivity Coefficients by Feedforward and LSTM Neural Networks in Power Distribution Grids
Abstract: The increasing adoption of measurement units in electrical power distribution grids has enabled the deployment of data-driven and measurement-based control schemes. Such schemes rely on measurement-based estimated models, where the models are first estimated using raw measurements and then used in the control problem. This work focuses on measurement-based estimation of the voltage sensitivity coefficients, which can be used for voltage control. In the existing literature, these coefficients are estimated using regression-based methods, which do not perform well in the case of high measurement noise. This work proposes tackling this problem by using neural network (NN)-based estimation of the voltage sensitivity coefficients, which is robust against measurement noise. In particular, we propose using Feedforward and Long Short-Term Memory (LSTM) neural networks. The trained NNs take measurements of nodal voltage magnitudes and active and reactive powers and output the vector of voltage magnitude sensitivity coefficients. The performance of the proposed scheme is compared against the regression-based method for a CIGRE benchmark network.
Comment: Submitted to TPEC 2024
Continued on ES/IODE ➡️ https://etcse.fr/DM0
-------
If you find this interesting, feel free to follow, comment and share. We need your help to enhance our visibility, so that our platform continues to serve you.
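To make the setup concrete, here is an illustrative sketch (not the paper's code) of an LSTM that maps a window of nodal measurements (|V|, P, Q per node) to a flattened vector of voltage sensitivity coefficients. The network size, window length, and feature layout are assumptions.

```python
# Illustrative sketch only: an LSTM estimator for voltage sensitivity
# coefficients from measurement windows. Shapes and sizes are assumptions.
import torch
import torch.nn as nn

n_nodes, window = 10, 32
n_features = 3 * n_nodes            # |V|, P, Q for every node
n_coeffs = n_nodes * n_nodes        # dV_i/dP_j sensitivity matrix, flattened

class SensitivityLSTM(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_coeffs)

    def forward(self, x):            # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1]) # coefficients from the last time step

model = SensitivityLSTM()
measurements = torch.randn(4, window, n_features)   # stand-in for real data
coeffs = model(measurements)
print(coeffs.shape)                 # torch.Size([4, 100])
```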
-
Let's learn about Restricted Boltzmann Machines (RBMs), a fundamental model in unsupervised learning within the domain of neural networks. RBMs are generative stochastic neural networks designed to learn internal representations of data through probabilistic inference. They consist of two layers: a visible layer representing the input data and a hidden layer capturing latent features; the "restriction" is that there are no connections within a layer, only between the two layers. The primary goal of an RBM is to reconstruct the input data by adjusting its weights and biases based on observed patterns. In computational terms, RBMs operate by iteratively adjusting the weights that connect the visible and hidden layers. During training, RBMs use a technique known as Contrastive Divergence (CD) to update these parameters. CD involves two phases: the positive phase, where the model computes the activations of the hidden units given the input data, and the negative phase, where it generates reconstructed input data from the hidden unit activations. By comparing the two phases, the RBM adjusts its weights to minimize reconstruction error and maximize the likelihood of the observed data. The animation below illustrates the training process over multiple epochs, showing how the RBM's weights and biases evolve to capture underlying patterns in the input data.
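To make the two phases concrete, here is a compact CD-1 training step for a binary RBM in NumPy. The layer sizes, learning rate, and toy batch are illustrative choices, not anything from the original post.

```python
# A compact CD-1 loop for a binary RBM: positive phase, negative phase,
# then a weight update from the difference of the two associations.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.standard_normal((n_visible, n_hidden)) * 0.01
b_v = np.zeros(n_visible)            # visible biases
b_h = np.zeros(n_hidden)             # hidden biases

v0 = rng.integers(0, 2, size=(8, n_visible)).astype(float)   # toy batch

for epoch in range(100):
    # Positive phase: hidden activations given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the visibles, then re-infer the hiddens.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Contrastive Divergence update: difference of the two associations.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

print("reconstruction error:", np.mean((v0 - p_v1) ** 2))
```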
-
In my last three blog posts, I explained how to address high but incorrect confidence in neural networks by properly initializing the last softmax layer and scaling it with small numbers. This method helped achieve more uniform logits and a reasonable initial loss. I also discussed the problem of dead neurons in tanh activation functions by examining the backward pass mathematically. To tackle vanishing gradients and dead neurons, I explored both informal and standard solutions, emphasizing the importance of initializing the hidden layers properly and scaling weights with standard values based on the activation function. Lastly, I introduced Kaiming Initialization and Batch Normalization—modern techniques that simplify the process. Batch normalization, in particular, removes the need to worry about manual initialization as it normalizes layers dynamically during training. In my latest blog post, I dive deeper into visualizing the dynamics of activations, gradients, and parameter updates across a neural network. Building on the changes made in the last three blog posts, I analyzed how these adjustments impacted the network’s training dynamics. Through graphs and detailed analysis, I explored how initialization, gain, batch normalization, and learning rate adjustments interact to create stable and efficient training. This visualization journey ties everything together, highlighting the importance of understanding these dynamics to build robust neural networks. Check it out to see how all these concepts come alive through visual insights!
Visualizing the Activation, Gradients, Gradient to data ratio, and Update to Data Distributions in Neural Networks Training.
aabidkarim.hashnode.dev
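Here is a small sketch of the techniques those posts discuss: Kaiming-initialized hidden layers, a down-scaled final layer so the initial softmax is near-uniform and the initial loss is reasonable, and batch normalization to keep activations well-behaved. The layer sizes and scaling factor are illustrative assumptions, not taken from the blog.

```python
# Sketch: sensible initialization plus BatchNorm, as discussed in the posts.
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden = nn.Linear(100, 200)
nn.init.kaiming_normal_(hidden.weight, nonlinearity='relu')  # gain matched to ReLU

head = nn.Linear(200, 10)
with torch.no_grad():
    head.weight *= 0.01          # small last layer -> near-uniform softmax
    head.bias.zero_()            # -> initial loss close to ln(10)

model = nn.Sequential(hidden, nn.BatchNorm1d(200), nn.ReLU(), head)

x = torch.randn(64, 100)
logits = model(x)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (64,)))
print(f"initial loss {loss.item():.3f} vs. ln(10) = {torch.log(torch.tensor(10.0)).item():.3f}")
```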
-
AIX partners believe working actively as practitioners is the best way to keep pace with AI — from cutting-edge research to production and widespread adoption. Read the latest machine learning research from AIX Partner Christopher Manning.
"Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations" Robert Csordas, Christopher Manning et al. "The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH." Paper: https://lnkd.in/dxqrX65i #machinelearning
-
Diving into the world of neural networks! This diagram illustrates the core components and data flow within a single artificial neuron, the fundamental unit of neural networks. The neuron receives multiple input values (x1, x2, x3, ... xn). Each input is multiplied by a weight value (w1, w2, w3, ... wn). The weighted inputs are then summed together, along with a bias term (b), to calculate the weighted sum (Σ). This weighted sum passes through an activation function (σ). The activation function introduces non-linearity, which is crucial for neural networks to learn complex patterns in data. The main types of activation functions used are:
🔺 Sigmoid: Squashes values between 0 and 1; useful for outputs that need to be probabilistic.
🔺 Hyperbolic Tangent (Tanh): Similar to sigmoid but ranges from -1 to 1; can perform better than sigmoid in some cases.
🔺 Rectified Linear Unit (ReLU): Sets all negative values to 0. Simple but very effective and computationally efficient.
🔺 Leaky ReLU: A variant that avoids "dead" neurons by letting small negative values through.
🔺 Maxout: Generalizes ReLU and Leaky ReLU by computing the max of several dot products.
The activated output of the neuron then propagates to the next layer of the neural network or becomes the final output. By stacking many of these neurons together, deep learning models can represent highly complex functions. Rahul Maheshwari
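To make this concrete, here is the same single neuron in a few lines of NumPy: the weighted sum plus bias, passed through a selection of the activations above (Maxout is omitted since it needs several weight vectors). The input values, weights, and bias are arbitrary examples.

```python
# One artificial neuron: z = Σ(w_i * x_i) + b, then an activation σ(z).
import numpy as np

def sigmoid(z):     return 1.0 / (1.0 + np.exp(-z))
def tanh(z):        return np.tanh(z)
def relu(z):        return np.maximum(0.0, z)
def leaky_relu(z):  return np.where(z > 0, z, 0.01 * z)

def neuron(x, w, b, activation=sigmoid):
    z = np.dot(w, x) + b            # weighted sum of inputs plus bias
    return activation(z)            # non-linear activation

x = np.array([0.5, -1.2, 3.0])      # inputs x1..x3
w = np.array([0.8,  0.1, -0.4])     # weights w1..w3
b = 0.2                             # bias

for fn in (sigmoid, tanh, relu, leaky_relu):
    print(f"{fn.__name__:>10}: {neuron(x, w, b, fn):.4f}")
```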
-
Information Geometry of Evolution of Neural Network Parameters While Training
Abstract: Artificial neural networks (ANNs) are powerful tools capable of approximating any arbitrary mathematical function, but their interpretability remains limited, rendering them black-box models. To address this issue, numerous methods have been proposed to enhance the explainability and interpretability of ANNs. In this study, we introduce the application of an information geometric framework to investigate phase transition-like behavior during the training of ANNs and relate these transitions to overfitting in certain models. The evolution of ANNs during training is studied by looking at the probability distribution of their parameters. Information geometry, utilizing the principles of differential geometry, offers a unique perspective on probability and statistics by considering probability density functions as points on a Riemannian manifold. We create this manifold using a metric based on Fisher information to define a distance and a velocity. By parameterizing this distance and velocity with training steps, we study how the ANN evolves as training progresses. Utilizing standard datasets like MNIST, FMNIST and CIFAR-10, we observe a transition in the motion on the manifold while training the ANN, and this transition is identified with overfitting in the ANN models considered. The information geometric transitions observed are shown to be mathematically similar to phase transitions in physics. Preliminary results showing finite-size scaling behavior are also provided. This work contributes to the development of robust tools for improving the explainability and interpretability of ANNs, aiding in our understanding of the variability of the parameters these complex models exhibit during training.
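For readers unfamiliar with the framework, the standard quantities behind "a metric based on Fisher information to define a distance and a velocity" look like this; these are the textbook definitions, not necessarily the paper's exact construction.

```latex
% Fisher information metric on the manifold of distributions p(x;\theta):
g_{ij}(\theta) = \mathbb{E}_{x \sim p(x;\theta)}\!\left[
    \frac{\partial \log p(x;\theta)}{\partial \theta_i}\,
    \frac{\partial \log p(x;\theta)}{\partial \theta_j} \right]
% Line element (distance) and velocity along the training trajectory
% \theta(t), with t the training step:
ds^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j,
\qquad
v(t) = \frac{ds}{dt}
     = \sqrt{\sum_{i,j} g_{ij}(\theta)\,\frac{d\theta_i}{dt}\,\frac{d\theta_j}{dt}}
```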
-
Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm.[105] Neural networks learn to model complex relationships between inputs and outputs and to find patterns in data. In theory, a neural network can learn any function.[106] In feedforward neural networks the signal passes in only one direction.[107] Recurrent neural networks feed the output signal back into the input, which allows short-term memories of previous input events. Long short-term memory is the most successful network architecture for recurrent networks.[108] Perceptrons[109] use only a single layer of neurons; deep learning[110] uses multiple layers. Convolutional neural networks strengthen the connections between neurons that are "close" to each other; this is especially important in image processing, where a local set of neurons must identify an "edge" before the network can identify an object.
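As a minimal illustration of backpropagation as "local search" over the weights, here is a tiny feedforward network trained on XOR with the gradients written out by hand. The network size, learning rate, and step count are arbitrary choices for the sketch.

```python
# A two-layer feedforward network trained with hand-written backpropagation.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.standard_normal((2, 4)) * 0.5; b1 = np.zeros(4)
W2 = rng.standard_normal((4, 1)) * 0.5; b2 = np.zeros(1)
lr = 0.5
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(5000):
    # Forward pass: the signal flows in one direction only.
    a1 = np.tanh(X @ W1 + b1)
    y_hat = sigmoid(a1 @ W2 + b2)
    loss = np.mean((y_hat - y) ** 2)
    # Backward pass: propagate the error back through each layer.
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    dW2, db2 = a1.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * (1 - a1 ** 2)
    dW1, db1 = X.T @ d_hid, d_hid.sum(axis=0)
    # Gradient step: nudge each weight downhill on the loss surface.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final loss:", loss, "predictions:", y_hat.round(2).ravel())
```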
-
An artificial neural network (ANN) is a computational model used for tasks like prediction, classification, and decision making. It consists of artificial neurons, which are loosely modeled on the neurons of the human brain: just as biological neurons pass signals to one another to perform actions, artificial neurons are connected in a network to perform tasks. A basic neural network consists of three layers. The first is the input layer, which contains the input neurons that send information to the hidden layer. The hidden layer performs computations on the input data and passes its output to the output layer; this is where the weights, activation functions, and cost function come into play. The connection between two neurons is called a weight, a numerical value, and the weights determine the learning ability of the neural network: during learning, the weights between neurons change, starting from random initial values. The output of each neuron is shaped by an "activation function", a mathematical function applied to the neuron's weighted sum. It introduces non-linearity and squashes the output into a fixed range, typically 0 to 1 or -1 to 1. An activation function is also known as a transfer function. #snsinstitutions #snsdesignthinkers #designthinking
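The three layers described above can be written as a small NumPy forward pass: the input layer feeds the hidden layer, which feeds the output layer, with randomly initialized weights and activations that keep outputs in a fixed range. All sizes and values here are arbitrary examples.

```python
# Input layer -> hidden layer -> output layer, with random initial weights.
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(5)                       # input layer: 5 input neurons

W_hidden = rng.standard_normal((5, 3))  # weights input -> hidden (set randomly)
W_output = rng.standard_normal((3, 2))  # weights hidden -> output

sigmoid = lambda z: 1 / (1 + np.exp(-z))

hidden = np.tanh(x @ W_hidden)          # hidden outputs squashed to (-1, 1)
output = sigmoid(hidden @ W_output)     # final outputs squashed to (0, 1)

print("hidden layer:", hidden.round(3))
print("output layer:", output.round(3))
```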