Module V
Word Embeddings:
Word embeddings are vector representations of words that capture semantic meaning. Unlike
traditional methods like one-hot encoding, where each word is represented by a sparse vector,
word embeddings represent words in dense vectors of fixed size. These embeddings are
learned in such a way that words with similar meanings are closer to each other in the vector
space. Common techniques for generating word embeddings include Word2Vec, GloVe, and FastText.
These embeddings are useful for various NLP tasks such as sentiment analysis, machine
translation, and text classification.
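As a minimal sketch of the idea (the vocabulary size, embedding dimension, and word indices here are arbitrary, and the weights are randomly initialized rather than trained), a Keras Embedding layer maps integer word indices to dense vectors of a fixed size:

import tensorflow as tf

# Hypothetical vocabulary of 1,000 words, each mapped to a 16-dimensional dense vector
embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)

word_ids = tf.constant([[3, 42, 7]])  # a toy "sentence" of three word indices
vectors = embedding(word_ids)         # dense vectors, shape (1, 3, 16)
print(vectors.shape)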
Nonlinear neural networks are those where the activation function applied to neurons
introduces nonlinearity into the network. This nonlinearity allows the network to approximate
complex functions. Without nonlinearities, a neural network would simply be equivalent to a
linear transformation, no matter how many layers it has.
ReLU (Rectified Linear Unit): f(x) = max(0, x)—widely used in hidden layers
because it helps mitigate the vanishing gradient problem.
Sigmoid: Maps input to the range (0, 1), commonly used in binary classification
problems.
Tanh: Maps input to the range (-1, 1), often used in recurrent neural networks
(RNNs).
Leaky ReLU: Similar to ReLU but allows a small, nonzero gradient for negative
values, helping with the "dying ReLU" problem.
By stacking layers of nonlinear transformations, neural networks can learn highly complex
patterns in data, making them powerful tools in tasks such as image recognition, language
modeling, and game playing.
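As a small numeric sketch of the activations listed above (the sample inputs and the 0.01 Leaky ReLU slope are illustrative choices):

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu = np.maximum(0, x)                    # ReLU: max(0, x)
sigmoid = 1 / (1 + np.exp(-x))             # Sigmoid: squashes values into (0, 1)
tanh = np.tanh(x)                          # Tanh: squashes values into (-1, 1)
leaky_relu = np.where(x > 0, x, 0.01 * x)  # Leaky ReLU: small slope for negative inputs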
Here's an overview of the following topics, which are foundational for building neural network-based models:
1. Neuron - Intro:
A neuron is the fundamental building block of a neural network. It's modeled after the
biological neuron and receives inputs, processes them, and produces an output.
Mathematically, a neuron takes a weighted sum of its inputs and passes the result through an
activation function to produce an output.
The activation function introduces nonlinearity, enabling the network to model complex
patterns.
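As a minimal sketch of this computation (the weights, bias, and inputs are arbitrary, and the sigmoid is just one possible activation function):

import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, passed through a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

output = neuron(x=np.array([0.5, -1.2]), w=np.array([0.8, 0.3]), b=0.1)
print(output)  # a single activation value between 0 and 1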
2. Fitting a Line:
In machine learning, fitting a line is a simple way of understanding how a model can learn.
In linear regression, for example, the goal is to find the best-fit line that minimizes the error
between predicted and actual values. The process involves adjusting the weights
(coefficients) of the features to minimize a loss function.
For example, in a 2D space with one feature x and output y, fitting a line would involve finding the equation:
y = wx + b
where w and b are learned parameters. This is typically achieved through optimization techniques like gradient descent.
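As a minimal sketch of fitting y = wx + b with gradient descent (the synthetic data, learning rate, and number of steps are illustrative assumptions):

import tensorflow as tf

# Synthetic points that roughly follow y = 2x + 1
x = tf.constant([0.0, 1.0, 2.0, 3.0, 4.0])
y = tf.constant([1.1, 2.9, 5.2, 6.8, 9.1])

w = tf.Variable(0.0)
b = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.02)

for step in range(1000):
    with tf.GradientTape() as tape:
        y_pred = w * x + b                        # predicted line
        loss = tf.reduce_mean((y_pred - y) ** 2)  # mean squared error
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))

print(w.numpy(), b.numpy())  # should end up close to 2 and 1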
For classification tasks, we aim to predict discrete labels (e.g., spam or not spam). The process generally involves defining a model, compiling it with a suitable loss function, and training it on labeled examples, starting from imports such as:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
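Continuing from these imports, a minimal sketch of such a classifier (the layer sizes, the 10-feature input shape, and the x_train/y_train arrays are hypothetical):

# A small feedforward binary classifier for inputs with 10 features
model = Sequential([
    Dense(16, activation='relu', input_shape=(10,)),  # hidden layer
    Dense(1, activation='sigmoid')                    # output: probability of the positive class
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=10, batch_size=32)  # x_train/y_train: hypothetical labeled data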
Text classification involves categorizing text into predefined categories (e.g., sentiment
analysis or spam detection). For this task, you would typically use preprocessing techniques
like tokenization, padding, and embedding to convert text into numerical representations.
In TensorFlow, you can use the Keras API to build a text classification model. Here's an
example of how you might prepare and train such a model:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Define the model (vocabulary size, embedding dimension, and LSTM units are illustrative)
model = Sequential([
    Embedding(input_dim=10000, output_dim=64),  # map word indices to dense vectors
    LSTM(64),                                   # process the sequence of embeddings
    Dense(1, activation='sigmoid')              # binary output (e.g., positive/negative)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
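A hedged sketch of preparing the text and training this model (the toy texts, labels, maximum sequence length, and epoch count are illustrative assumptions):

texts = ['I love machine learning', 'Deep learning is great', 'Text classification is fun']
labels = [1, 1, 0]  # 1: positive, 0: negative

tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)   # words -> integer indices
padded = pad_sequences(sequences, maxlen=20)      # pad to a fixed length
model.fit(padded, tf.constant(labels), epochs=5)  # train the compiled model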
The neuron operates as a basic computational unit that processes inputs and produces an
output. The output is influenced by the weights and bias, and the activation function controls
how the neuron responds to different inputs.
Neurons are organized into layers (input, hidden, and output layers). The entire neural
network learns by adjusting the weights of these neurons based on the data it processes and
the optimization process (e.g., gradient descent).
A model learns through training, where the weights of the neurons are adjusted to minimize
a loss function. The general steps in model learning are:
1. Forward Pass: The input data is passed through the network, layer by layer, to make
predictions.
2. Loss Calculation: The predicted output is compared with the true labels to compute
the loss (error).
3. Backpropagation: The loss is propagated backward through the network, adjusting
the weights using optimization techniques (like gradient descent).
4. Parameter Update: The weights are updated to reduce the loss, and the process
repeats for multiple epochs until convergence.
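The four steps above can be made explicit with TensorFlow's GradientTape; in this minimal sketch the tiny model, synthetic batch, and learning rate are all illustrative assumptions:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

x_batch = tf.random.normal((8, 3))  # synthetic inputs: 8 samples, 3 features
y_batch = tf.random.normal((8, 1))  # synthetic targets

for epoch in range(5):
    with tf.GradientTape() as tape:
        predictions = model(x_batch)          # 1. forward pass
        loss = loss_fn(y_batch, predictions)  # 2. loss calculation
    grads = tape.gradient(loss, model.trainable_variables)            # 3. backpropagation
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # 4. parameter update
    print(f"epoch {epoch}: loss = {loss.numpy():.4f}")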
Let’s explore Feedforward Neural Networks (FNNs), activation functions, and text
classification using TensorFlow in more detail.
3. Activation Functions:
Activation functions introduce nonlinearity into the network, which allows the network to
learn complex patterns. Without them, the network would essentially be a linear model,
regardless of the number of layers.
ReLU (Rectified Linear Unit): f(x) = max(0, x). It’s the most widely used because it reduces the likelihood of the vanishing gradient problem.
Sigmoid: f(x) = 1 / (1 + e^(-x)). It outputs values between 0 and 1, making it suitable for binary classification.
Tanh: f(x) = tanh(x), outputs values between -1 and 1.
Softmax: Often used in the output layer for multi-class classification, it normalizes
the output into probabilities (values between 0 and 1 that sum to 1).
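As a brief sketch of how these are specified in practice (the layer sizes are arbitrary), Keras layers accept an activation by name:

import tensorflow as tf

hidden = tf.keras.layers.Dense(64, activation='relu')        # ReLU in a hidden layer
tanh_layer = tf.keras.layers.Dense(32, activation='tanh')    # tanh-activated layer
binary_out = tf.keras.layers.Dense(1, activation='sigmoid')  # sigmoid for binary output
multi_out = tf.keras.layers.Dense(10, activation='softmax')  # softmax over 10 classes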
4. Multiclass Classification:
In multiclass classification, the goal is to classify inputs into more than two categories. For
example, in digit recognition (MNIST dataset), the model needs to classify a digit as one of
the 10 possible digits (0–9).
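As a minimal sketch (the hidden-layer size is an illustrative choice), a multiclass classifier for 28x28 MNIST images ends in a softmax over the 10 digit classes:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # flatten each image into a vector
    tf.keras.layers.Dense(128, activation='relu'),   # hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # one probability per digit class
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer labels 0-9
              metrics=['accuracy'])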
Text classification is the task of assigning a label to a given text. For example, classifying
emails as spam or not spam.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Example data
texts = ['I love machine learning', 'Deep learning is great', 'Text classification is fun']
labels = [1, 1, 0] # 1: positive, 0: negative (binary classification)
Text preprocessing involves converting raw text into a format that can be fed into the neural
network. Common preprocessing steps include:
# Initialize Tokenizer
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
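Continuing this preprocessing (the maximum sequence length of 10 is an illustrative assumption), the fitted tokenizer converts each text to integer indices, which are then padded to a uniform length:

sequences = tokenizer.texts_to_sequences(texts)  # each text becomes a list of word indices
padded = pad_sequences(sequences, maxlen=10)     # pad/truncate so all sequences share one length
print(padded.shape)                              # (3, 10)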
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization
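These imports point to an alternative to the Tokenizer workflow above: the TextVectorization layer handles tokenization and padding in one step. A minimal sketch (the vocabulary size and output sequence length are assumptions):

vectorizer = TextVectorization(max_tokens=10000, output_sequence_length=10)
vectorizer.adapt(tf.constant(texts))         # learn the vocabulary from the example texts
vectorized = vectorizer(tf.constant(texts))  # integer sequences, shape (3, 10)
print(vectorized.numpy())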
Summary:
In short, text classification models combine preprocessing (tokenization and padding) with an embedding layer and dense or recurrent layers, trained end to end by minimizing a loss through backpropagation.
Let’s break down embeddings, the Continuous Bag of Words (CBOW) model, and how to
implement CBOW in TensorFlow.
1. Embeddings:
In natural language processing (NLP), embeddings are dense vector representations of words
or tokens, where semantically similar words have similar vector representations. Embeddings
reduce the high-dimensional space of words into lower dimensions while preserving the
semantic relationships between words.
Word Embeddings can be learned using algorithms like Word2Vec, GloVe, or
FastText.
Word2Vec creates embeddings by training a neural network to predict words in a
given context (using CBOW or Skip-Gram).
In Word2Vec, the embeddings for words are vectors that capture semantic similarities based
on their usage in contexts (e.g., "king" and "queen" will have similar vector representations
due to their semantic relationship).
2. Continuous Bag of Words (CBOW):
The CBOW model is one of the two primary architectures of Word2Vec (the other being
Skip-Gram). CBOW predicts a target word (center word) from its context (surrounding
words). It is called "bag of words" because the model considers the context as a set (ignoring
word order).
CBOW Process:
1. Input: A window of context words around a target word.
2. Prediction: The model tries to predict the target word given the surrounding context
words.
For example, in the sentence “The quick brown fox jumps over the lazy dog,” if we choose
the context window size to be 2, and "fox" is the target word, the context words would be
"quick", "brown", "jumps", and "over". The CBOW model would predict "fox" using these
context words.
CBOW Architecture:
Input Layer: The context words are one-hot encoded or converted into embeddings.
Hidden Layer: A shared weight matrix is used to map the context words into a
lower-dimensional vector.
Output Layer: A softmax activation is applied to predict the probability distribution
of the target word.
3. CBOW in TensorFlow:
Let’s implement a simple CBOW model in TensorFlow. We’ll use the following steps:
1. Preprocessing the text: Tokenize the text and prepare context-target pairs.
2. Model Architecture: Build the CBOW model with embedding layers and a softmax
output layer.
Here’s a basic implementation of a CBOW model in TensorFlow:
Preprocessing:
Tokenize the text into words and create context-target pairs for training.
import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import skipgrams
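Continuing from these imports, here is a hedged sketch of the remaining steps (the toy corpus, window size, embedding dimension, and training settings are assumptions; the context-target pairs are built manually rather than with skipgrams, which generates Skip-Gram pairs):

# Toy corpus and tokenization
corpus = ['the quick brown fox jumps over the lazy dog']
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
word_ids = tokenizer.texts_to_sequences(corpus)[0]
vocab_size = len(tokenizer.word_index) + 1
window = 2

# Build (context, target) pairs: the 2 words on each side predict the center word
contexts, targets = [], []
for i in range(window, len(word_ids) - window):
    contexts.append(word_ids[i - window:i] + word_ids[i + 1:i + window + 1])
    targets.append(word_ids[i])
contexts = np.array(contexts)
targets = np.array(targets)

# CBOW model: embed the context words, average them, and predict the target with softmax
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 8),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(contexts, targets, epochs=50, verbose=0)

# The learned word embeddings are the weights of the Embedding layer
embeddings = model.layers[0].get_weights()[0]  # shape: (vocab_size, 8)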