ML Fundamentals
1. Basic Concepts:
Definition of Machine Learning: Explain what machine learning is and how it differs from
traditional programming.
ANSWER:
-----------------------------------
a. Programming Paradigm:
Traditional Programming: Explicit instructions, rules, and logic are hard-coded by the developer.
Machine Learning: Algorithms learn patterns and rules from data rather than being explicitly programmed.
c. Feedback Loop:
Traditional Programming: Feedback on the performance of a traditional program is typically
based on how well it adheres to predefined requirements and specifications.
Machine Learning: Feedback in machine learning is based on the accuracy of predictions or
decisions made by the model on new data. This feedback loop is crucial for improving the
model's performance through iterative training and adjustment of parameters.
Example Question: "Can you explain the difference between supervised and unsupervised
learning? Provide examples of each."
Bias-Variance Tradeoff: Explain the concept of bias and variance in machine learning
models and how they affect model performance.
Example Question: "What is the bias-variance tradeoff? How do you balance bias and
variance in machine learning models?"
ANSWER:
---------------------
Bias and variance are two different sources of error.
Bias refers to the error introduced by approximating a real-world problem with a simplified
model. A high bias means the model is too simplistic and fails to capture the underlying
patterns in the data.
Variance refers to the sensitivity of a model to the specific noise or fluctuations in the
training data. A high variance means the model is highly sensitive to noise and changes in
the training data.
Bias-Variance Tradeoff Curve: Plotting error against model complexity helps evaluate different
models and understand their bias-variance tradeoff. Typically, as model complexity increases
(e.g., adding more features or increasing model capacity), bias decreases but variance increases.
HOW TO MEASURE:
A. Bias refers to the error introduced by the simplifying assumptions made by a model. It is
usually assessed by comparing the average prediction of the model to the true values in the
training set.
i. Compute the bias as the average difference between predicted and actual values:
import numpy as np  # needed for np.mean
bias = np.mean(predictions_train - y_train)  # predictions_train: model predictions on the training set
print(f"Bias: {bias}")
ii. For a BINARY TARGET: if the model has high precision but low recall, it may
be biased towards predicting emails as non-spam (false negatives are higher).
B. Calculate Variance: Compute the variance as the average variance of predictions across
different folds or subsets of the training data.
i. For a BINARY TARGET, use variance of Predicted Probabilities: For each
instance in the validation set, the classifier outputs a probability score indicating the
likelihood of belonging to the positive class.
C. High Bias: If the bias is significantly non-zero, it indicates that the model is underfitting
the training data and is not capturing the underlying patterns well enough.
D. High Variance: If the variance is high, it suggests that the model is overly sensitive to the
noise in the training data and may be overfitting.
Balancing bias and variance involves finding the optimal level of model complexity that
minimizes both sources of error.
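To make the measurement steps above concrete, here is a minimal sketch (assuming scikit-learn and numpy; the synthetic data and the depth-3 tree are illustrative, not from these notes) that estimates bias from training residuals and variance from the spread of predictions across cross-validation folds:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for X (features) and y (target)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=500)

# Bias proxy: average training residual (as in the notes above); the mean
# squared training error is often a more informative underfitting signal
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
bias = np.mean(model.predict(X) - y)
train_mse = np.mean((model.predict(X) - y) ** 2)

# Variance proxy: how much predictions for the same points change when
# the model is refit on different training folds
fold_preds = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X[train_idx], y[train_idx])
    fold_preds.append(m.predict(X))
variance = np.mean(np.var(np.vstack(fold_preds), axis=0))

print(f"Bias: {bias:.4f}  Train MSE: {train_mse:.4f}  Variance: {variance:.4f}")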
Example Question: "What metrics would you use to evaluate a binary classification model?
Explain why you would choose those metrics."
Cross-validation: it maximizes the use of available data. Instead of splitting the data into a
single train-test set, cross-validation allows each data point to be used for both training and
validation across different folds.
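A minimal sketch of that idea, assuming scikit-learn (the logistic regression model and the synthetic imbalanced dataset are illustrative placeholders), scoring a binary classifier with several metrics under 5-fold cross-validation so that every point is used for both training and validation:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Imbalanced binary dataset (90% negative / 10% positive)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000)

# 5-fold CV: each observation appears in exactly one validation fold
scores = cross_validate(clf, X, y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1", "roc_auc"])
for m in ["accuracy", "precision", "recall", "f1", "roc_auc"]:
    print(m, round(scores[f"test_{m}"].mean(), 3))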
3. Feature Engineering:
Definition and Importance: Explain what feature engineering is and why it is crucial in
machine learning.
Example Question: "How would you handle missing data in a dataset before applying
machine learning algorithms?"
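The notes don't spell out an answer here; a minimal sketch of one common approach (imputation with scikit-learn; the DataFrame and column names are made up for illustration):

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "income": [52000, np.nan, 61000, 48000],
    "age":    [34, 29, np.nan, 51],
    "region": ["west", None, "east", "east"],
})

# Numeric columns: impute with the median (robust to outliers)
df[["income", "age"]] = SimpleImputer(strategy="median").fit_transform(df[["income", "age"]])

# Categorical columns: impute with the most frequent value
df[["region"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["region"]])

print(df)

Other options include dropping rows or columns with many missing values, adding "was missing" indicator features, or model-based imputation; the right choice depends on how much data is missing and why.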
Example Question: "When would you choose a decision tree model over a support vector
machine (SVM)?"
ANSWER
-----------------
Choosing between a decision tree model and a support vector machine (SVM) depends on
several factors related to the dataset, problem requirements, and computational
considerations. Here are some scenarios where you might prefer a decision tree over an
SVM:
Interpretability:
Requirement: If interpretability of the model is crucial, decision trees are often preferred.
Decision trees provide clear, interpretable rules that can be easily understood and
visualized. Each node in the tree represents a decision point based on a feature, making it
straightforward to explain how predictions are made.
Feature Importance:
Need to Rank Features: Decision trees inherently rank features by their importance in the
classification or regression task. This can provide insights into which features are most
relevant for making predictions.
Handling Non-linear Relationships:
Non-linear Data: Decision trees can model complex non-linear relationships in the data
without requiring explicit transformation of features. They can capture interactions between
variables effectively.
Handling Missing Values:
Robustness to Missing Data: Decision trees can handle missing values in the dataset by
making decisions based on available information at each node. This reduces the need for
imputation techniques.
Scalability:
Scalability to Large Datasets: Decision trees can handle large datasets efficiently, especially
with modern algorithms like random forests and gradient boosting machines, which
aggregate multiple decision trees for improved performance.
Conversely, here are some scenarios where you might prefer an SVM over a decision tree:
High-Dimensional Data:
Complex Feature Space: SVMs perform well in high-dimensional spaces, where the number
of dimensions (features) is large compared to the number of samples. They are effective in
scenarios such as text classification or image recognition where feature spaces can be
complex.
Linear Separability:
Linearly Separable Data: SVMs work best when the data is linearly separable or can be
transformed into a linearly separable space using kernel methods (e.g., polynomial kernel,
radial basis function kernel).
Generalization Performance:
Optimal Margin: SVMs aim to find the hyperplane that maximizes the margin between
classes, leading to good generalization performance and robustness against overfitting,
especially in scenarios with limited training data.
Regularization:
Control over Overfitting: SVMs offer a regularization parameter (C parameter in SVM with
linear kernel) that allows controlling the trade-off between achieving a low training error and
maximizing the margin.
Small to Medium-Sized Datasets:
Efficient Training: SVMs can efficiently handle small to medium-sized datasets with
moderate computational resources, especially when using efficient implementations such as
SVM with linear kernel.
Considerations for Choosing:
Data Complexity: Assess the complexity of your dataset, including the number of features,
presence of non-linear relationships, and data size.
Computational Resources: Evaluate the computational cost and scalability of each model,
especially with respect to large datasets or real-time prediction requirements.
In summary, choose a decision tree model when interpretability, feature importance, and
handling of non-linear relationships are priorities. Opt for an SVM when dealing with
high-dimensional spaces, linearly separable data, and the need for optimal margin and
regularization. Understanding these differences helps in selecting the appropriate model
based on the specific characteristics and requirements of your machine learning problem.
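As a small illustration of the interpretability contrast above, here is a sketch (assuming scikit-learn; the dataset and hyperparameters are arbitrary) that fits both model families and reads feature importances off the tree, which the SVM does not expose directly:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X_train, y_train)

print("Tree accuracy:", round(tree.score(X_test, y_test), 3))
print("SVM accuracy: ", round(svm.score(X_test, y_test), 3))

# The tree ranks features by importance; top 5 shown
top = sorted(zip(X.columns, tree.feature_importances_), key=lambda t: -t[1])[:5]
print("Top tree features:", top)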
Hyperparameter Tuning: Explain the concept of hyperparameters and techniques like grid
search and random search for tuning them.
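A minimal sketch of both tuning strategies, assuming scikit-learn (the random forest, the grid, and the sampled distributions are illustrative choices):

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(random_state=0)

# Grid search: exhaustively evaluates every combination in the grid
grid = GridSearchCV(rf, {"n_estimators": [100, 300], "max_depth": [3, 6, None]},
                    cv=5, scoring="roc_auc").fit(X, y)
print("Grid best:", grid.best_params_, round(grid.best_score_, 3))

# Random search: samples a fixed budget of combinations from distributions
rand = RandomizedSearchCV(rf, {"n_estimators": randint(100, 500),
                               "max_depth": randint(2, 12)},
                          n_iter=10, cv=5, scoring="roc_auc",
                          random_state=0).fit(X, y)
print("Random best:", rand.best_params_, round(rand.best_score_, 3))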
Example Question: "What are some challenges you might face when deploying a machine
learning model into production?"
Example Question: "How would you build a machine learning model to predict credit risk for
leasing transactions? What data would you use, and how would you evaluate the model's
performance?"
ANSWER:
---------------------
Building a machine learning model to predict credit risk for leasing transactions involves
several key steps, from data preparation to model evaluation. Here’s a structured approach
to achieve this:
Data Cleaning: Preprocess the data to handle missing values, outliers, and ensure
consistency. Perform feature engineering to create new features if necessary (e.g.,
debt-to-income ratio, loan-to-value ratio).
Feature Selection: Select features that are most predictive of credit risk based on domain
knowledge and exploratory data analysis.
Train-Validation Split: Split the dataset into training and validation sets (e.g., 70% training,
30% validation) to train the model and evaluate its performance.
Model Training: Train the chosen model on the training dataset using appropriate
techniques such as cross-validation for hyperparameter tuning to optimize model
performance.
3. Model Evaluation:
Performance Metrics: Evaluate the model’s performance using appropriate metrics for
binary classification problems, such as precision, recall, F1-score, and ROC-AUC.
Considerations:
Imbalanced Data: Address class imbalance if present by using techniques such as
oversampling minority class, undersampling majority class, or using class-weighted models.
Monitoring: Continuously monitor the model’s performance over time to ensure it remains
accurate and reliable. This may involve periodic retraining with new data or model updates.
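To tie the steps above together, here is a minimal end-to-end sketch; the file name, feature names, 70/30 split, and gradient-boosting model are illustrative assumptions rather than a prescribed design:

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical leasing dataset with a binary default label
df = pd.read_csv("leasing_transactions.csv")  # placeholder path
df["debt_to_income"] = df["monthly_debt"] / df["monthly_income"]  # engineered feature

features = ["credit_score", "debt_to_income", "lease_amount", "tenure_months"]
X, y = df[features], df["defaulted"]

# 70/30 train-validation split, stratified to preserve class imbalance
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

proba = model.predict_proba(X_val)[:, 1]
print("ROC-AUC:", round(roc_auc_score(y_val, proba), 3))
print(classification_report(y_val, model.predict(X_val)))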
When to choose Random Forest:
2. Feature Importance:
3. Robustness to Outliers:
4. Handling Missing Data:
a. Requirement: Your dataset has missing values that need to be handled effectively.
b. Reason: Some Random Forest implementations can handle missing data without requiring
explicit imputation, for example by using surrogate splits or proximity-based estimates to fill in
missing values during prediction.
5. Predictive Power:
When to choose Linear Regression:
1. Linear Relationship:
a. Requirement: The relationship between your features and the target variable is linear or
can be reasonably approximated by a linear model.
b. Reason: Linear regression provides a clear interpretation of coefficients, indicating the
direction and magnitude of the effect of each feature on the target variable.
2. Interpretability:
a. Requirement: You need a model that is easy to interpret and explain to stakeholders.
b. Reason: Linear regression provides explicit formulas to explain predictions, making it
straightforward to understand how changes in input variables affect the output.
3. Computational Efficiency:
4. Assumptions of Linearity:
Requirement: Your data satisfies the assumptions of linear regression, such as normality of
residuals and independence of errors.
Reason: Linear regression performs well under these assumptions and can provide reliable
estimates of coefficients and statistical significance.
5. Baseline Model:
a. Requirement: You need a simple baseline model for comparison or as a starting point in
model development.
b. Reason: Linear regression is often used as a baseline model due to its simplicity and
ease of implementation.
Considerations:
Model Evaluation: Always evaluate both models (and potentially others) using appropriate
metrics and validation techniques to determine which performs best for your specific dataset
and objectives.
Hybrid Approaches: In some cases, a hybrid approach where predictions from both models
are combined (ensemble methods) might provide better performance than either model
alone.
a. Speed: XGBoost tends to be faster than Random Forests during inference for several
reasons:
iii. Both Random Forests and XGBoost can make predictions efficiently, but
XGBoost tends to have a slight edge due to its optimized tree structures and the ability to
perform predictions in parallel when using multi-core processors.
c. Use Case Considerations: While XGBoost is generally faster, Random Forests might still
be preferable in scenarios where interpretability of individual trees or robustness to outliers
is critical.
b. SVM is SLOWER to train than Random Forest and XGBoost and requires a lot of RAM, so
it may not work for bigger datasets.
8. Interpreting Logistic Regression Coefficients
Interpreting coefficients of logistic regression for categorical and boolean variables involves
understanding how these variables are encoded in the model and how their coefficients
relate to the log-odds of the target variable being in a particular category or having a
particular value.
Categorical Variables:
For categorical variables with k levels (or categories), logistic regression typically uses k−1
dummy variables to represent them. Denote a categorical variable X with k categories by
dummy variables X_1, X_2, ..., X_{k-1}, where X_i indicates whether the observation belongs to
category i (compared to a reference category, often the first one).
• In logistic regression, when you have categorical variables with more than two
levels (also known as multinomial logistic regression), the coefficients (or parameters)
associated with each category represent the change in log-odds of the target variable being
in that category compared to a reference category, while holding all other variables constant
• Example: If X_2 has a coefficient β_2 = 0.5, then exp(0.5) ≈ 1.65. This means that the
odds of the target variable being in category 2 are about 1.65 times the odds of it
being in the reference category.
In short, a coefficient of 0.5 for a binary predictor variable in logistic regression indicates that
setting the binary variable X to 1 (versus 0) increases the log-odds of the target variable being
in category 1 (success) by 0.5 units, holding all other variables constant; equivalently, it
multiplies the odds by exp(0.5) ≈ 1.65, an increase of roughly 65%.
When interpreting logistic regression coefficients, the exponentiated coefficient, e^β, gives the
odds ratio associated with the predictor variable.
Numeric variables:
Specifically, the coefficient β_1 of a numeric predictor x indicates the expected change in the
log-odds of the target variable for a one-unit increase in x.
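A minimal sketch of these interpretations, assuming statsmodels and a simulated dataset (column names and coefficients are made up), where exponentiating the fitted coefficients yields odds ratios:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "is_returning": rng.integers(0, 2, n),   # binary predictor
    "income_k": rng.normal(60, 15, n),       # numeric predictor (in $1000s)
})
log_odds = -3 + 0.5 * df["is_returning"] + 0.04 * df["income_k"]
df["approved"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

X = sm.add_constant(df[["is_returning", "income_k"]])
fit = sm.Logit(df["approved"], X).fit(disp=0)

# exp(beta): multiplicative change in the odds per unit increase in the predictor
print(np.exp(fit.params))
# e.g. exp(0.5) ~ 1.65: flipping is_returning from 0 to 1 multiplies the odds by ~1.65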
Logistic regression does not make many of the key assumptions of linear regression and
general linear models that are based on ordinary least squares algorithms – particularly
regarding linearity, normality, homoscedasticity, and measurement level.
First, logistic regression does not require a linear relationship between the dependent and
independent variables. Second, the error terms (residuals) do not need to be normally
distributed. Third, homoscedasticity is not required. Finally, the dependent variable in
logistic regression is not measured on an interval or ratio scale.
First, binary logistic regression requires the dependent variable to be binary, and ordinal
logistic regression requires the dependent variable to be ordinal.
Fourth, logistic regression assumes linearity of independent variables and the log odds of the
dependent variable.
Although this analysis does not require the dependent and independent variables to be
related linearly, it requires that the independent variables are linearly related to the log odds
of the dependent variable.
Finally, logistic regression typically requires a large sample size. A general guideline is that
you need a minimum of 10 cases of the least frequent outcome for each independent
variable in your model. For example, if you have 5 independent variables and the expected
probability of your least frequent outcome is .10, then you would need a minimum sample
size of 500 (10*5 / .10).
One very useful notion of the likelihood of an event is the odds. The odds of an event
are the ratio of the probability of the event occurring to the probability of the event not
occurring. For example, if an event has an 80% probability of occurrence, the odds
are 80:20, or 4:1.
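A few lines making the probability/odds/log-odds conversions explicit (purely illustrative):

import numpy as np

p = 0.80
odds = p / (1 - p)          # 4.0, i.e. 4:1
log_odds = np.log(odds)     # ~1.386, the scale logistic regression works on
p_back = odds / (1 + odds)  # 0.80 again
print(odds, log_odds, p_back)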
10. Progressive Leasing, being a company involved in lease-to-own financing primarily for
consumers with low credit scores, would likely employ and build machine learning models
tailored to various aspects of their business operations. Here are some types of machine
learning models Progressive Leasing might utilize:
Time Series Forecasting: Predicting demand for lease products based on historical sales
data, seasonality, and external factors like economic indicators.
Optimization Models: Balancing inventory levels and lease approvals to minimize stockouts
and maximize lease acceptance rates.
Customer Lifetime Value Prediction:
Regression Models: Estimating the expected revenue from a customer over their entire
leasing period based on historical customer behavior and demographics.
Survival Analysis: Predicting the likelihood of customers staying with Progressive Leasing
over time, considering lease renewal patterns and customer churn.
Natural Language Processing (NLP) for Customer Interactions:
11. Here are some likely interview questions related to recommender systems:
Basic Concepts:
What is a recommender system? Can you explain the main types of recommender
systems?
ANSWER:
------------------------------
a. Collaborative Filtering: Collaborative filtering methods recommend items based on the
preferences of other users. They do not require item or user attributes but instead rely on
historical user-item interactions.
b. Content-Based Filtering: It creates user profiles based on item attributes and recommends
items that match the user's profile.
Data typically used:
a. Historical Data: User interactions with items (e.g., ratings,
purchases, views).
b. Explicit Feedback: User-provided ratings or explicit preferences for
certain attributes (e.g., genres they like).
c. Implicit Feedback: Indicators of preference inferred from user
behavior (e.g., time spent on an item page, clicks).
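As a toy illustration of collaborative filtering (numpy only; the rating matrix is made up), similar users' interactions drive the score for an item the target user has not seen:

import numpy as np

# Rows = users, columns = items; 0 means "not yet rated"
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

target = 0  # recommend for user 0
sims = np.array([cosine(R[target], R[u]) for u in range(len(R))])
sims[target] = 0.0  # exclude the user themselves

# Predicted scores: similarity-weighted average of the other users' ratings
pred = sims @ R / (sims.sum() + 1e-9)
unseen = np.where(R[target] == 0)[0]
print("Recommend item index:", unseen[np.argmax(pred[unseen])])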
How would you evaluate the performance of a recommender system? What metrics would
you use?
ANSWER:
-------------------
Offline Metrics:
1. Precision
2. Recall
3. F1-score
4. NDCG: measures the ranking quality by considering the position of relevant items in the
ranked list.
Business Metrics:
1. Conversion rate
2. Revenue
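Since NDCG is the least self-explanatory of these, here is a short sketch of NDCG@k for one ranked list with binary relevance labels (illustrative only):

import numpy as np

def ndcg_at_k(relevance, k):
    # relevance: 0/1 labels of the recommended items, in ranked order
    rel = np.asarray(relevance, dtype=float)[:k]
    dcg = np.sum(rel / np.log2(np.arange(2, rel.size + 2)))
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = np.sum(ideal / np.log2(np.arange(2, ideal.size + 2)))
    return dcg / idcg if idcg > 0 else 0.0

# Relevant items were shown at ranks 1 and 4 of the top 5
print(round(ndcg_at_k([1, 0, 0, 1, 0], k=5), 3))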
Collaborative Filtering:
Describe how content-based filtering works. What are its advantages and limitations?
How would you handle the cold start problem in content-based filtering?
Hybrid Recommender Systems:
What are hybrid recommender systems? Provide examples of how you would combine
different approaches (e.g., collaborative filtering and content-based filtering).
Matrix Factorization:
Explain matrix factorization techniques such as Singular Value Decomposition (SVD) and
Alternating Least Squares (ALS). How are they used in recommender systems?
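The notes don't include an answer for this one; as a minimal illustration, a truncated SVD of a small rating matrix (numpy only; the matrix and rank k are arbitrary) reconstructs scores from latent user and item factors. In practice, ALS-style factorization treats missing entries as unobserved rather than as zeros, which this sketch glosses over:

import numpy as np

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Keep the top-k latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Predicted score for user 0 on item 2, which they have not rated
print(round(R_hat[0, 2], 2))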
Evaluation and Metrics:
How would you measure the accuracy and effectiveness of a recommender system?
Discuss the trade-offs between accuracy and diversity in recommender systems.
Challenges and Practical Considerations:
What are some common challenges you might encounter when deploying a recommender
system in a real-world setting?
How would you handle bias and fairness issues in recommender systems, especially in the
context of financial services like lease-to-own?
Case Studies and Practical Applications:
Can you describe a project or case study where you implemented or improved a
recommender system? What were the key challenges and outcomes?
How would you tailor a recommender system for Progressive Leasing's customer base and
product offerings?