Unit - 1 Leftover Topic Notes

The document discusses key concepts in statistical learning and machine learning, focusing on estimators, bias, variance, and the bias-variance tradeoff, which are essential for model performance and generalization. It also covers deep feedforward networks, detailing their structure, training processes, activation functions, and regularization techniques to prevent overfitting. These concepts are crucial for building effective machine learning models that can accurately predict outcomes from complex data.

Estimators -- Bias and variance

Estimators, bias, and variance are fundamental concepts in statistical learning and machine
learning. They help in understanding the performance of models and are crucial for
developing accurate and generalizable predictive models. Here’s a detailed look at each:
1. Estimators
In statistical learning, an estimator is a rule or method for estimating an unknown parameter
from observed data.
a. Definition
 Estimator: A function or algorithm used to estimate a parameter of a statistical model
based on data.
b. Types of Estimators
 Point Estimator: Provides a single value estimate of a parameter. For example, the
sample mean is a point estimator of the population mean.
 Interval Estimator: Provides a range within which the parameter is expected to lie
with a certain probability. For example, confidence intervals.
c. Example
 Sample Mean as an Estimator:
python
import numpy as np

data = [2, 3, 5, 7, 11]
sample_mean = np.mean(data)
print("Sample Mean:", sample_mean)
2. Bias
Bias refers to the error introduced by approximating a real-world problem, which may be
complex, by a simpler model. It is the difference between the expected (average) prediction
of the model and the true value of the parameter being estimated.
a. Definition
 Bias: The difference between the expected value of the estimator and the true value of
the parameter.
b. Mathematically
 Bias Formula: Bias = E[θ̂] − θ
o θ̂: the estimator
o θ: the true parameter value
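As a concrete illustration of this formula, the sketch below (simulated data assumed for illustration, not part of the original notes) estimates the bias of the "divide by n" variance estimator, which is known to underestimate the true variance, by averaging it over many simulated samples.
python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0  # variance of N(0, 2^2)

# Average the biased (ddof=0) variance estimator over many samples of size 10
estimates = [np.var(rng.normal(0, 2, size=10), ddof=0) for _ in range(10_000)]
bias = np.mean(estimates) - true_var  # Bias = E[theta_hat] - theta
print("Estimated bias of the ddof=0 variance estimator:", bias)  # roughly -0.4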
c. High Bias
 Implication: Indicates that the model is too simple and may underfit the data. For
example, a linear model trying to fit non-linear data.
d. Example
 High Bias Example: a straight-line model fit to clearly non-linear data underfits it.
python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate non-linear (quadratic) data
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# A plain linear model is too simple for this curve, so it underfits (high bias)
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
error = mean_squared_error(y, predictions)
print("Mean Squared Error with Linear Model (High Bias):", error)
3. Variance
Variance measures the variability of the estimator's predictions when different training
datasets are used. It reflects how sensitive the estimator is to the specific data it was trained
on.
a. Definition
 Variance: The expected squared deviation of the estimator from its expected value.
b. Mathematically
 Variance Formula: Variance = E[(θ̂ − E[θ̂])²]
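To make this concrete, the following sketch (assumed simulated data, not from the notes) approximates the variance of the sample mean by recomputing it over many independent training samples; for sample size n it comes out close to σ²/n.
python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 25

# Recompute the sample mean on many independent datasets and measure its spread
means = np.array([np.mean(rng.normal(0, sigma, size=n)) for _ in range(10_000)])
variance = np.mean((means - means.mean()) ** 2)  # E[(theta_hat - E[theta_hat])^2]
print("Variance of the sample mean:", variance)  # close to sigma^2 / n = 0.16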
c. High Variance
 Implication: Indicates that the model is too complex and may overfit the data. For
example, a high-degree polynomial model on a small dataset.
d. Example
 High Variance Example: a degree-15 polynomial fit to a small noisy dataset memorizes the training points but generalizes poorly.
python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate a small, noisy non-linear dataset
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=30)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# High-degree polynomial regression: far too flexible for 30 points, so it overfits
polynomial_features = PolynomialFeatures(degree=15)
model = make_pipeline(polynomial_features, LinearRegression())
model.fit(X_train, y_train)
train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))
print("Train MSE with High-Degree Polynomial (High Variance):", train_error)
print("Test MSE with High-Degree Polynomial (High Variance):", test_error)
4. Bias-Variance Tradeoff
The bias-variance tradeoff is a fundamental concept in machine learning that describes the
tradeoff between model complexity and its ability to generalize.
a. Tradeoff Explanation
 High Bias: Model is too simple, leading to underfitting.
 High Variance: Model is too complex, leading to overfitting.
 Goal: Find a balance where both bias and variance are minimized to achieve the best
generalization performance.
b. Visualization
 A typical bias-variance tradeoff curve plots error against model complexity: bias falls as the model becomes more complex, variance rises, and the total test error traces a U-shape whose minimum marks the best-generalizing complexity. A small sketch of this sweep follows.
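A minimal sketch of that sweep, using an assumed noisy sine dataset and polynomial models of increasing degree (neither appears in the original notes): training error keeps falling as the degree grows, while test error first falls and then rises again.
python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep model complexity and watch train vs. test error
for degree in [1, 3, 5, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")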
c. Example of Balancing
python
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate some data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Train a model with balanced complexity and evaluate on held-out data
model = RandomForestClassifier(n_estimators=50)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Model Accuracy with Balanced Complexity:", accuracy)
Summary
 Estimators: Rules or algorithms for estimating unknown parameters from data.
Examples include the sample mean.
 Bias: Error due to approximating a complex problem with a simpler model. High bias
indicates underfitting.
 Variance: Error due to sensitivity to the training data. High variance indicates
overfitting.
 Bias-Variance Tradeoff: Balancing model complexity to minimize both bias and
variance, aiming for good generalization.
Understanding and managing bias and variance is crucial for building effective machine
learning models that generalize well to unseen data.
----------------------------------------------------------------------------------------------------------
Deep Feedforward Networks
Deep Feedforward Networks, also known as Deep Neural Networks (DNNs), are a type of
artificial neural network where multiple layers of neurons are arranged sequentially. They are
designed to model complex patterns and learn hierarchical representations from data. Here’s
an overview of deep feedforward networks:
1. Structure of Deep Feedforward Networks
a. Input Layer
 Purpose: Receives the input features and passes them to the subsequent layers.
 Example: For image classification, the input layer would take pixel values of images.
b. Hidden Layers
 Purpose: Extract features and learn representations. Each hidden layer consists of
multiple neurons (units) with non-linear activation functions.
 Types of Layers:
o Fully Connected (Dense) Layers: Each neuron is connected to every neuron
in the previous layer.
o Activation Functions: Introduce non-linearity (e.g., ReLU, Sigmoid, Tanh).
c. Output Layer
 Purpose: Produces the final prediction. The type of activation function used depends
on the task:
o Classification: Softmax or sigmoid activation functions.
o Regression: Linear activation function.
d. Example Architecture
Here's an example of a simple deep feedforward network with 3 hidden layers:
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# input_dim and output_dim are placeholders for the number of input features and output classes
model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),  # First hidden layer
    Dense(64, activation='relu'),                            # Second hidden layer
    Dense(64, activation='relu'),                            # Third hidden layer
    Dense(output_dim, activation='softmax')                  # Output layer (for classification)
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
2. Training Deep Feedforward Networks
a. Forward Pass
 Purpose: Compute the output of the network by propagating input data through each
layer.
 Process: Input data is multiplied by weights, biases are added, and activation
functions are applied.
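As a minimal sketch of that computation (the layer sizes and random weights below are illustrative assumptions), a forward pass through one hidden layer and an output layer looks like this in NumPy:
python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                      # one input example with 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # hidden layer weights and biases
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)    # output layer weights and biases

# Forward pass: multiply by weights, add biases, apply the activation
h = relu(x @ W1 + b1)        # hidden representation
logits = h @ W2 + b2         # raw output scores
print("Hidden shape:", h.shape, "Output shape:", logits.shape)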
b. Loss Function
 Purpose: Measure the difference between the predicted output and the actual target.
 Types:
o For Classification: Cross-entropy loss.
o For Regression: Mean squared error or mean absolute error.
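For illustration (the predictions and targets below are made up), both losses can be computed directly with NumPy:
python
import numpy as np

# Regression: mean squared error between predictions and targets
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.1])
mse = np.mean((y_true - y_pred) ** 2)

# Classification: cross-entropy between one-hot targets and predicted probabilities
t = np.array([[1, 0, 0], [0, 1, 0]])              # one-hot labels
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # predicted class probabilities
cross_entropy = -np.mean(np.sum(t * np.log(p), axis=1))

print("MSE:", mse, "Cross-entropy:", cross_entropy)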
c. Backward Pass (Backpropagation)
 Purpose: Compute gradients of the loss function with respect to each weight by
propagating errors backward through the network.
 Process: Update weights using gradient descent or its variants (e.g., Adam).
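A minimal sketch of one weight-update loop, using a single linear neuron and a squared-error loss (an illustrative assumption; Keras performs these steps internally):
python
import numpy as np

# One linear neuron: prediction = x . w + b, loss = 0.5 * squared error
x = np.array([[1.0, 2.0], [2.0, 1.0]])
y = np.array([3.0, 4.0])
w, b = np.zeros(2), 0.0
lr = 0.1

for _ in range(100):
    pred = x @ w + b
    grad = pred - y                  # dLoss/dpred for 0.5 * (pred - y)^2
    grad_w = x.T @ grad / len(y)     # gradient propagated back to the weights
    grad_b = grad.mean()             # gradient propagated back to the bias
    w -= lr * grad_w                 # gradient descent update
    b -= lr * grad_b

print("Learned weights:", w, "bias:", b)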
d. Example Training Code
python
# Example training data
X_train = ...  # Input features
y_train = ...  # Target labels

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
3. Activation Functions
Activation functions introduce non-linearity into the network, enabling it to learn complex
patterns.
a. ReLU (Rectified Linear Unit)
 Formula: ReLU(x) = max(0, x)
 Benefits: Reduces vanishing gradient problems, commonly used in hidden layers.
b. Sigmoid
 Formula: σ(x) = 1 / (1 + e^(−x))
 Use Case: Often used in binary classification tasks.
c. Tanh (Hyperbolic Tangent)
 Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
 Use Case: Provides outputs in the range (-1, 1), often used in hidden layers.
d. Softmax
 Formula: Softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
 Use Case: Produces a probability distribution for classification tasks with multiple
classes.
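For reference, all four activation functions are simple enough to write directly in NumPy (a sketch, not the TensorFlow implementations):
python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print("ReLU:", relu(z))
print("Sigmoid:", sigmoid(z))
print("Tanh:", tanh(z))
print("Softmax:", softmax(z), "(sums to", softmax(z).sum(), ")")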
4. Regularization Techniques
To prevent overfitting and improve generalization, various regularization techniques can be
applied:
a. Dropout
 Description: Randomly sets a fraction of the input units to zero during training.
 Example Code:
python
from tensorflow.keras.layers import Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dropout(0.5),  # randomly zeroes 50% of the units during training
    Dense(output_dim, activation='softmax')
])
b. L1/L2 Regularization
 Description: Adds a penalty proportional to the size of weights (L1) or their squares
(L2).
 Example Code:
python
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,), kernel_regularizer=l2(0.01)),
    Dense(output_dim, activation='softmax')
])
5. Use Cases
a. Image Classification
 Description: Classifying images into different categories (e.g., cats vs. dogs).
b. Speech Recognition
 Description: Converting spoken language into text.
c. Text Classification
 Description: Categorizing text into predefined classes (e.g., spam vs. non-spam
emails).
Summary
 Deep Feedforward Networks: Composed of input, hidden, and output layers where
each layer is fully connected to the next. They are used for a wide range of tasks,
including classification and regression.
 Training: Involves forward pass, loss computation, backpropagation, and weight
updates.
 Activation Functions: Introduce non-linearity and are crucial for learning complex
patterns.
 Regularization: Techniques like dropout and L1/L2 regularization help in preventing
overfitting and improving generalization.
Deep feedforward networks are fundamental to many deep learning applications and provide
a robust framework for modeling complex relationships in data.
