Predictive Analytics Answer Key
UNIT I
Question 1:
Supervised learning is a type of machine learning where the model is trained on labeled
data. Example: Predicting house prices using historical data.
Question 2:
Linear regression predicts a continuous outcome based on input features. The least squares
method minimizes the sum of squared errors.
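A minimal sketch of the least squares fit using NumPy (the toy data here is an illustrative choice, not from the course):

```python
import numpy as np

# Fit y = b0 + b1*x by minimizing the sum of squared errors
# (ordinary least squares).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # exactly y = 2x

X = np.column_stack([np.ones_like(x), x])    # prepend an intercept column
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # solves min ||Xb - y||^2
print(beta)  # intercept ~0.0, slope ~2.0
```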
Question 3:
Ridge regression adds an L2 penalty (the sum of squared coefficients) to reduce overfitting by shrinking coefficients toward zero, while Lasso regression adds an L1 penalty (the sum of absolute coefficients), which can drive some coefficients exactly to zero and therefore performs feature selection.
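The difference between the two penalties shows up directly in the fitted coefficients. A sketch with scikit-learn, where the synthetic data and the alpha values are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(ridge.coef_)  # all coefficients shrunk, but none exactly zero
print(lasso.coef_)  # irrelevant coefficients driven exactly to zero
```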
Question 4:
Logistic regression predicts binary outcomes (0 or 1) using a logistic function: P(Y=1) = 1 /
(1 + e^(-z)), where z is a linear combination of inputs.
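The logistic function from the answer can be written out directly in NumPy; the weights below are arbitrary values chosen so that z = 0:

```python
import numpy as np

def predict_proba(x, w, b):
    """P(Y=1 | x) = 1 / (1 + e^(-z)), with z = w . x + b."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

# When z = 0 the model is maximally uncertain: P(Y=1) = 0.5.
p = predict_proba(np.array([1.0, -1.0]), np.array([2.0, 2.0]), 0.0)
print(p)  # 0.5
```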
Question 5:
The Perceptron Learning Algorithm is an iterative process for binary classification that
updates weights to find a linear decision boundary.
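The update rule can be sketched in a few lines: on each misclassified point (x, y) with y in {-1, +1}, set w <- w + y*x and b <- b + y. The four points below are an illustrative linearly separable example:

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])  # linearly separable toy data

w = np.zeros(2)
b = 0.0
for _ in range(20):  # a few passes suffice here
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on boundary)
            w += yi * xi                   # perceptron update
            b += yi

preds = np.sign(X @ w + b)
print(preds)  # matches y once a separating boundary is found
```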
UNIT II
Question 1:
The bias-variance tradeoff is the balance between error from overly simple assumptions (bias) and error from sensitivity to the particular training sample (variance) as model complexity changes. High bias leads to underfitting, and high variance leads to overfitting.
Question 2:
Cross-validation splits the dataset into multiple subsets to evaluate model performance and
prevent overfitting.
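A minimal sketch of k-fold cross-validation with scikit-learn (the Iris dataset and logistic regression are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# 5-fold CV: train on 4/5 of the data, score on the held-out 1/5, rotate.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy over the five held-out folds
```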
Question 3:
The Bayesian approach incorporates prior information to assess models and calculates
posterior probabilities for model selection.
Question 4:
The effective number of parameters measures a model's effective complexity (its degrees of freedom); under regularization it can be smaller than the raw parameter count, because shrunken parameters contribute less to the fit.
Question 5:
Conditional test error is the error for a fixed training set, while expected test error averages
errors over all possible training datasets.
UNIT III
Question 1:
Generalized Additive Models (GAMs) extend linear models to include non-linear
relationships using smooth functions like splines.
Question 2:
Regression trees predict continuous outcomes by splitting data into subsets, whereas
classification trees predict categorical outcomes.
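The contrast can be shown on a tiny 1-D dataset where a depth-1 tree learns a single split threshold; the data below is an illustrative choice:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1.0], [2.0], [10.0], [11.0]]  # one feature; an obvious split near x=6

reg = DecisionTreeRegressor(max_depth=1).fit(X, [1.0, 1.0, 5.0, 5.0])
clf = DecisionTreeClassifier(max_depth=1).fit(X, ["low", "low", "high", "high"])

print(reg.predict([[1.5]]))   # ~1.0: the mean of the left leaf
print(clf.predict([[10.5]]))  # 'high': the majority class of the right leaf
```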
Question 3:
AdaBoost combines multiple weak learners (e.g., decision trees) to form a strong classifier
by focusing on misclassified samples.
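A short sketch with scikit-learn's AdaBoostClassifier (the synthetic dataset and estimator count are illustrative choices; the default weak learner is a depth-1 decision stump):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Each round fits a weak learner, then re-weights the training samples so
# that misclassified points count more in the next round.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(ada.score(X, y))  # training accuracy of the combined strong classifier
```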
Question 4:
Gradient Boosting minimizes loss iteratively using numerical optimization techniques to
improve predictions.
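For squared loss, the negative gradient is just the residual, so the iterative idea can be sketched by repeatedly fitting a small tree to the current residuals (the data, depth, learning rate, and round count below are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0])

pred = np.zeros_like(y)          # start from the zero model
lr = 0.3                         # learning rate (shrinkage)
for _ in range(50):
    residual = y - pred          # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)

print(np.mean((y - pred) ** 2))  # training MSE shrinks round by round
```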
Question 5:
Decision trees split data based on feature values to form a tree structure for classifying or
predicting outcomes.
UNIT IV
Question 1:
Neural networks consist of input, hidden, and output layers of interconnected units; loosely inspired by biological neurons, they learn from data by adjusting connection weights.
Question 2:
Backpropagation is an algorithm that trains neural networks by adjusting weights to
minimize the error between predicted and actual outputs.
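The core of backpropagation is the chain rule. A minimal sketch for a single sigmoid neuron with squared loss (the inputs, target, and learning rate are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = np.array([1.0, 2.0]), 1.0
w, b, lr = np.array([0.1, -0.1]), 0.0, 0.5

for _ in range(200):
    out = sigmoid(np.dot(w, x) + b)               # forward pass
    # Backward pass: dL/dw = (out - target) * out * (1 - out) * x
    delta = (out - target) * out * (1.0 - out)
    w -= lr * delta * x                            # gradient descent step
    b -= lr * delta

print(sigmoid(np.dot(w, x) + b))  # output moves close to the target of 1.0
```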
Question 3:
Support Vector Machines (SVM) find the optimal hyperplane that separates classes with
maximum margin.
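A sketch with scikit-learn's SVC on separable toy data (the points are an illustrative choice): the maximum-margin hyperplane is determined entirely by the support vectors on the margin boundary.

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [3, 3], [4, 4]]
y = [0, 0, 1, 1]

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.support_vectors_)  # the boundary points [1, 1] and [3, 3]
print(svm.predict([[0.5, 0.5], [3.5, 3.5]]))
```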
Question 4:
The k-Nearest Neighbor (k-NN) method classifies data points based on the majority class of
their closest neighbors.
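A minimal sketch with scikit-learn (the 1-D data and k=3 are illustrative choices); k-NN simply stores the training data and takes the majority label among the k closest points:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]
y = ["a", "a", "a", "b", "b", "b"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.5], [10.5]]))  # majority vote of the 3 nearest points
```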
Question 5:
Kernels in SVMs transform data into higher-dimensional spaces, allowing non-linear
decision boundaries to be identified.
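The classic illustration is XOR, which no linear boundary can separate; with an RBF kernel the points become separable in the implicit higher-dimensional space (the gamma value below is an illustrative choice):

```python
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR labels: not linearly separable

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print(linear.score(X, y))  # < 1.0: no linear boundary gets all four right
print(rbf.score(X, y))     # 1.0: the kernel makes XOR separable
```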
UNIT V
Question 1:
Clustering groups similar data points together. Types include K-Means clustering (partition-based) and hierarchical clustering.
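A short sketch of K-Means with scikit-learn on two obvious groups of points (the data and seed are illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [8.0, 8.5]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # first three points share one label, last three the other
print(km.cluster_centers_)  # one center per recovered group
```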
Question 2:
Principal Component Analysis (PCA) reduces dimensionality by transforming features into a
smaller set of uncorrelated components.
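A minimal sketch with scikit-learn: 2-D points lying almost on a line, so a single principal component captures nearly all of the variance (the synthetic data is an illustrative choice):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=100)])

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)      # 100 x 1: projection onto one component
print(pca.explained_variance_ratio_)  # close to 1.0
```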
Question 3:
Association rules identify patterns and relationships in data. Example: 'Customers who buy
bread also buy milk.'
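The support and confidence of that example rule can be computed directly from a toy transaction list (the transactions are an illustrative choice):

```python
# Rule: "bread -> milk" over four example shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
]

both = sum(1 for t in transactions if {"bread", "milk"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / len(transactions)  # fraction of baskets with both: 2/4 = 0.5
confidence = both / bread           # P(milk | bread): 2/3
print(support, confidence)
```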
Question 4:
Random forests combine multiple decision trees to improve accuracy and reduce overfitting
using bagging techniques.
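A short sketch with scikit-learn (the Iris dataset and tree count are illustrative choices): each tree is trained on a bootstrap sample of the rows with a random subset of features considered at each split, and predictions are combined across trees.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.score(X, y))  # accuracy of the aggregated ensemble
```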
Question 5:
Unsupervised learning identifies patterns in unlabeled data. Examples include clustering
and dimensionality reduction.