CampusX DSMP 2.0 Syllabus
Program
Week 5: Numpy
1. Session 13: Numpy Fundamentals
● Numpy Theory
● Numpy array
● Matrix in numpy
● Numpy array attributes
● Array operations
● Scalar and Vector operations
● Numpy array functions
i. Dot product
ii. Log, exp, mean, median, std, prod, min, max, trigonometric functions, variance, ceil, floor, slicing, iteration
iii. Reshaping
iv. Stacking and splitting
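A minimal sketch of the operations listed in this session, on toy arrays (the data is illustrative, not course material):
```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])    # 2x3 matrix
b = np.array([[1, 0], [0, 1], [1, 1]])  # 3x2 matrix

print(a.shape, a.ndim, a.dtype)          # array attributes
print(a.dot(b))                          # dot product -> 2x2 matrix
print(np.exp(a), np.log(a))              # element-wise exp / log
print(a.mean(), a.std(), a.max(axis=0))  # aggregate functions
print(np.ceil(a / 2), np.floor(a / 2))   # rounding
print(a[0, 1:])                          # slicing
print(a.reshape(3, 2))                   # reshaping
print(np.vstack([a, a]))                 # stacking
print(np.hsplit(a, 3))                   # splitting into column blocks
```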
2. Session 14: Advanced Numpy
● Numpy array vs Python List
● Advanced, Fancy and Boolean Indexing
● Broadcasting
● Mathematical operations in numpy
● Sigmoid in numpy
● Mean Squared Error in numpy
● Working with missing values
● Plotting graphs
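A minimal sketch of the sigmoid, mean squared error, broadcasting, and missing-value topics from this session; the arrays are toy examples:
```python
import numpy as np

# Broadcasting: a (3,1) column and a (1,4) row combine into a (3,4) grid
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
grid = col + row

def sigmoid(z):
    """Element-wise logistic function."""
    return 1 / (1 + np.exp(-z))

def mse(y_true, y_pred):
    """Mean squared error with plain numpy."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(sigmoid(np.array([-1.0, 0.0, 1.0])))
print(mse(y_true, y_pred))               # 0.375

# Missing values: np.nan propagates, so use the nan-aware aggregations
x = np.array([1.0, np.nan, 3.0])
print(np.isnan(x), np.nanmean(x))
```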
3. Session 15: Numpy Tricks
● Various numpy functions like sort, append, concatenate, percentile, flip, set functions, etc.
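A quick sketch of these helper functions on toy arrays:
```python
import numpy as np

x = np.array([3, 1, 2, 2, 5])
y = np.array([2, 5, 7])

print(np.sort(x))                  # sorting
print(np.append(x, [9, 10]))       # appending
print(np.concatenate([x, y]))      # concatenation
print(np.percentile(x, 50))        # median via percentile
print(np.flip(x))                  # reversing
print(np.unique(x))                # set-style helpers
print(np.intersect1d(x, y), np.union1d(x, y), np.setdiff1d(x, y))
```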
4. Session on Web Development using Flask
● What is the Flask library?
● Why use Flask?
● Building a login system and named entity recognition with an API
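A minimal Flask sketch in the spirit of this session: a login route plus a placeholder NER endpoint. The credential store, route names, and payload shape are assumptions, not the course's actual project code.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical credential store for the demo login system
USERS = {"campusx": "dsmp2.0"}

@app.route("/login", methods=["POST"])
def login():
    data = request.get_json()
    if USERS.get(data.get("username")) == data.get("password"):
        return jsonify({"status": "logged in"})
    return jsonify({"status": "invalid credentials"}), 401

@app.route("/ner", methods=["POST"])
def ner():
    # Placeholder: the session plugs an actual NER model/API in here
    text = request.get_json().get("text", "")
    return jsonify({"text": text, "entities": []})

if __name__ == "__main__":
    app.run(debug=True)
```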
Week 6: Pandas
1. Session 16: Pandas Series
● What is Pandas?
● Introduction to Pandas Series
● Series Methods
● Series Math Methods
● Series with Python functionalities
● Boolean Indexing on Series
● Plotting graphs on series
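A minimal Series sketch covering the methods above (toy marks data, plotting needs matplotlib):
```python
import pandas as pd

marks = pd.Series([67, 82, 45, 91], index=["maths", "stats", "python", "sql"])

print(marks.mean(), marks.max())                    # math methods
print(marks.describe())
print(marks.sort_values(ascending=False).head(2))   # series methods
print("python" in marks, len(marks), list(marks))   # Python functionality
print(marks[marks > 60])                            # boolean indexing
marks.plot(kind="bar")                              # quick plot
```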
2. Session 17: Pandas DataFrame
● Introduction to Pandas DataFrame
● Creating DataFrame and read_csv()
● DataFrame attributes and methods
● DataFrame Math Methods
● Selecting columns and rows from a DataFrame
● Filtering a DataFrame
● Adding new columns
● DataFrame function – astype()
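A minimal DataFrame sketch for these topics; the frame is constructed inline (in the session it would come from read_csv), and the column names are illustrative:
```python
import pandas as pd

# In the session this comes from read_csv, e.g. df = pd.read_csv("movies.csv")
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "rating": [8.3, 6.9, 7.8],
    "year": [2019.0, 2021.0, 2020.0],
})

print(df.shape, df.columns.tolist(), df.dtypes)   # attributes
print(df.describe())                              # math methods
print(df["title"], df[["title", "rating"]])       # selecting columns
print(df.iloc[0], df.loc[1:2])                    # selecting rows
print(df[df["rating"] > 7.5])                     # filtering
df["is_hit"] = df["rating"] > 7.5                 # adding a new column
df["year"] = df["year"].astype("int32")           # astype()
```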
3. Session 18: Important DataFrame Methods
● Various DataFrame Methods
● Sort, index, reset_index, isnull, dropna, fillna, drop_duplicates, value_counts, apply, etc.
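A quick sketch of these methods on a tiny made-up frame:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", None],
    "price": [100, 250, 100, np.nan],
})

print(df.sort_values("price"))
print(df.set_index("city").reset_index())
print(df.isnull().sum())
print(df.dropna())
print(df.fillna({"price": df["price"].mean()}))
print(df.drop_duplicates())
print(df["city"].value_counts())
print(df["price"].apply(lambda p: p * 1.18))   # e.g. add 18% tax
```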
4. Session on API Development using Flask
● What is an API?
● Building API using Flask
● Hands-on project
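A minimal JSON API sketch with Flask; the /predict endpoint and its payload are assumptions standing in for the hands-on project's real model:
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Stand-in for a real model call: double the input number
    payload = request.get_json()
    value = float(payload.get("value", 0))
    return jsonify({"prediction": value * 2})

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```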
5. Session on Numpy Interview Questions
Capstone Project:
1. Session 1 on Capstone Project | Data Gathering
a. Project overview in detail
b. Gather data for the project
c. Details of the data
2. Session 2 on Capstone Project | Data Cleaning
a. Merging House and Flats Data
b. Basic Level Data Cleaning
3. Session 3 on Capstone Project | Feature Engineering
a. Feature Engineering on Columns:
i. additionalRoom
ii. areaWithType
iii. agePossession
iv. furnishDetails
v. features: luxury score
4. Session 4 on Capstone Project | EDA
a. Univariate Analysis
b. PandasProfiling
c. Multivariate Analysis
5. Session 5 on Capstone Project | Outlier Detection and Removal
a. Outlier Detection And Removal
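A common IQR-based approach to this step, sketched on a hypothetical price column; the capstone may add domain-specific rules on top:
```python
import pandas as pd

def remove_outliers_iqr(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where `col` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[(df[col] >= lower) & (df[col] <= upper)]

# e.g. properties = remove_outliers_iqr(properties, "price")
```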
6. Session 6 on Capstone Project | Missing Value Imputation
a. Outlier Detection and Removal on area and bedroom
b. Missing Value Imputation
7. Session 7 on Capstone Project | Feature Selection
a. Feature Selection
i. Correlation Technique
ii. Random Forest Feature Importance
iii. Gradient Boosting Feature Importance
iv. Permutation Importance
v. LASSO
vi. Recursive Feature Elimination
vii. Linear Regression with Weights
viii. SHAP (Explainable AI)
b. Linear Regression - Base Model
i. One-Hot Encoding
ii. Transformation
iii. Pipeline for Linear Regression
c. SVR
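A hedged sketch of a few of the listed feature-selection techniques (tree importance, permutation importance, LASSO, RFE) on synthetic placeholder data rather than the capstone dataset:
```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression, Lasso

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# Random forest feature importance
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)
print(rf.feature_importances_)

# Permutation importance
perm = permutation_importance(rf, X, y, n_repeats=5, random_state=42)
print(perm.importances_mean)

# LASSO coefficients (features with zero weight drop out)
print(Lasso(alpha=0.1).fit(X, y).coef_)

# Recursive feature elimination with a linear base model
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5).fit(X, y)
print(rfe.support_)
```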
8. Session 8 on Capstone Project | Model Selection & Productionalization
a. Price Prediction Pipeline
i. Encoding Selection
1. Ordinal Encoding
2. OHE
3. OHE with PCA
4. Target Encoding
ii. Model Selection
b. Price Prediction Web Interface (Streamlit)
9. Session 9 on Capstone Project | Building the Analytics Module
a. Geo map
b. Word cloud of amenities
c. Scatter plot of area vs price
d. Pie chart of BHK, filtered by sector
e. Side-by-side box plots of price by bedroom count
f. Distribution plot of price for flats and houses
10. Session 10 on Capstone Project | Building the Recommender System
a. Recommender System using TopFacilities
b. Recommender System using Price Details
c. Recommender System using LocationAdvantages
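Content-based recommenders like these typically rank items by cosine similarity over feature vectors; a hedged sketch on a toy facilities matrix (the capstone's actual features and weighting differ):
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = properties, columns = binary "top facility" indicators (toy data)
facilities = np.array([
    [1, 0, 1, 1],   # property 0
    [1, 1, 1, 0],   # property 1
    [0, 0, 1, 1],   # property 2
])

sim = cosine_similarity(facilities)

def recommend(idx: int, top_n: int = 2):
    """Return indices of the properties most similar to `idx`."""
    scores = sim[idx].copy()
    scores[idx] = -1                      # exclude the property itself
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))
```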
11. Session 11 on Capstone Project | Building the Recommender System Part 2
a. Evaluating Recommendation Results
b. Web Interface for Recommendation (Streamlit)
12. Session 12 on Capstone Project | Building the Insights Module
13. Session 13 on Capstone Project | Deploying the application on AWS
------------------------------------------------------------------------
MLOps Curriculum
4. Session 3: Reproducibility
a. Story
b. Industry Tools
c. Cookiecutter
i. Step 1: Install the Cookiecutter Library and start a project
ii. Step 2: Explore the Template Structure
iii. Step 3: Customize the Cookiecutter Variables
iv. Step 4: Benefits of Using Cookiecutter Templates in Data Science
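Cookiecutter is normally driven from the command line (cookiecutter <template>), but it also exposes a small Python API. A hedged sketch, assuming the widely used data-science template; the template URL and context values are illustrative, not the session's exact choices:
```python
from cookiecutter.main import cookiecutter

# Generate a project from a template without interactive prompts.
cookiecutter(
    "https://github.com/drivendata/cookiecutter-data-science",
    no_input=True,
    extra_context={"project_name": "dsmp_capstone"},  # hypothetical values
)
```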
5. Session 4: Data Versioning Control
a. Introduction
b. Prerequisites
c. Setup
i. Step 1: Initialize a Git repository
ii. Step 2: Set up DVC in your project
iii. Step 3: Add a dataset to your project
iv. Step 4: Commit changes to Git
v. Step 5: Create and version your machine learning pipeline
vi. Step 6: Track changes and reproduce experiments
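The steps above are ordinarily run as shell commands; a hedged Python wrapper around the standard git/dvc CLI calls is sketched below (the dataset path is hypothetical, and git/dvc must be installed):
```python
import subprocess

def run(*cmd):
    """Run a shell command and fail loudly if it errors."""
    subprocess.run(cmd, check=True)

run("git", "init")                                 # Step 1: Git repository
run("dvc", "init")                                 # Step 2: DVC setup
run("dvc", "add", "data/raw/properties.csv")       # Step 3: DVC tracks the data
run("git", "add", "data/raw/properties.csv.dvc",   # Step 4: Git tracks the pointer
    "data/raw/.gitignore")
run("git", "commit", "-m", "Track raw dataset with DVC")
```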
6. Doubt Clearance Session 2
a. Assignment Solution on DVC: 10:19
b. Doubt Clearance
i. DVC with Google Drive: 42:50
ii. DVC Setup Error: 48:45
iii. Containerization with Virtual Environment: 49:40
iv. Create Version and ML Pipeline: 56:50
v. DVC Checkout: 57:50
vi. How to decide which commit ID to go to (via commit messages): 1:00:00
vii. What is Kubernetes?
viii. Not able to understand by reading the documentation: 1:04:30
ix. Getting the number of commits (11k+): 1:09:40
Feature Engineering
3. Feature Scaling
a. Importance in Modeling: Effects on algorithms; Influence on performance
b. Common Methods: Min-Max; Standardization; Robust Scaler
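A minimal comparison of the three common scalers on a toy column with one outlier:
```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])   # one feature with an outlier

print(MinMaxScaler().fit_transform(X).ravel())    # squashes into [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # zero mean, unit variance
print(RobustScaler().fit_transform(X).ravel())    # median/IQR based, outlier-resistant
```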
4. Feature Encoding
a. Categorical Encoding: One-hot; Label; Binary; Target
b. Handling Ordinal Variables: Mapping; Custom encoding strategies
5. Sklearn Transformers
a. Utilization in Pipelines: Building preprocessing pipelines; Combining steps
b. Custom Transformers: Creating and implementing
c. Commonly Used Transformers: Overview and application
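A hedged sketch combining a custom transformer, common transformers, and a preprocessing pipeline; the toy frame, column names, and the Clipper class are assumptions for illustration:
```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

class Clipper(BaseEstimator, TransformerMixin):
    """Custom transformer: clip numeric values to an upper bound."""
    def __init__(self, upper=1000.0):
        self.upper = upper
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X.clip(upper=self.upper)

df = pd.DataFrame({"area": [500, 1200, 5000], "city": ["Delhi", "Mumbai", "Delhi"]})

preprocess = ColumnTransformer([
    ("num", Pipeline([("clip", Clipper(upper=2000)),
                      ("scale", StandardScaler())]), ["area"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

print(preprocess.fit_transform(df))
```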
6. Session 2 on Discretization
a. Types of Discretization
i. Uniform Binning
ii. Quantile Binning
iii. K-Means Binning
iv. Decision Tree Based Binning
v. Custom Binning
vi. Threshold Binning (Binarization)
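A short sketch of a few of these strategies via scikit-learn (uniform, quantile, and k-means binning, plus binarization) on toy data:
```python
import numpy as np
from sklearn.preprocessing import Binarizer, KBinsDiscretizer

X = np.array([[1.0], [4.0], [9.0], [16.0], [25.0], [36.0]])

for strategy in ("uniform", "quantile", "kmeans"):
    kb = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy=strategy)
    print(strategy, kb.fit_transform(X).ravel())

# Threshold binning (binarization)
print(Binarizer(threshold=10.0).fit_transform(X).ravel())
```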
7. Session 1 on Handling Missing Data
a. Missing Values
b. The missingno library
c. Why do missing values occur?
d. Types of missing values
e. How do missing values impact ML?
f. How to handle missing values?
i. Removing
ii. Imputing
g. Removing Missing Data
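A minimal sketch of visualising, removing, and imputing missing values; the toy frame and chosen strategies (median, mode) are illustrative:
```python
import numpy as np
import pandas as pd
import missingno as msno
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "city": ["Delhi", "Pune", None, "Pune"]})

msno.matrix(df)                       # visualise where the gaps are
print(df.isnull().sum())              # count missing values per column

# Removing
print(df.dropna())

# Imputing
df["age"] = SimpleImputer(strategy="median").fit_transform(df[["age"]]).ravel()
df["city"] = df["city"].fillna(df["city"].mode()[0])
print(df)
```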
Unsupervised Learning
1. KMeans Clustering
a. Session 1 on KMeans Clustering
i. Plan of Attack (Getting Started with Clustering)
ii. Types of ML Learning
iii. Applications of Clustering
iv. Geometric Intuition of K-Means
v. Elbow Method for Deciding Number of Clusters
1. Code Example
2. Limitation of Elbow Method
vi. Assumptions of KMeans
vii. Limitations of K Means
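The elbow-method code example referenced above usually looks like this, on synthetic blobs rather than the course notebook's data:
```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)      # within-cluster sum of squares

plt.plot(ks, inertias, marker="o")
plt.xlabel("k")
plt.ylabel("inertia (WCSS)")
plt.title("Elbow method")
plt.show()
```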
b. Session 2 on KMeans Clustering
i. Recap of Last class
ii. Assignment Solution
iii. Silhouette Score
iv. KMeans Hyperparameters
1. Number of Clusters (k)
2. Initialization Method (K Means++)
3. Number of Initialization Runs (n_init)
4. Maximum Number of Iterations (max_iter)
5. Tolerance (tol)
6. Algorithm (auto, full, ..)
7. Random State
v. K Means ++
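A short sketch tying the silhouette score to the hyperparameters listed above (synthetic data; hyperparameter values are illustrative defaults):
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

for k in range(2, 7):
    km = KMeans(
        n_clusters=k,          # number of clusters
        init="k-means++",      # initialization method
        n_init=10,             # number of initialization runs
        max_iter=300,          # maximum iterations per run
        tol=1e-4,              # convergence tolerance
        random_state=42,
    ).fit(X)
    print(k, silhouette_score(X, km.labels_))
```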
c. Session 3 on KMeans Clustering
i. K-Means Mathematical Formulation (Lloyd’s Algorithm)
ii. K-Means Time and Space Complexity
iii. Mini Batch K Means
iv. Types of Clustering
1. Partitional Clustering
2. Hierarchical Clustering
3. Density Based Clustering
4. Distribution/Model-based Clustering
d. K-Means Clustering Algorithm from Scratch in Python
i. Algorithm implementation from scratch in Python
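A compact from-scratch version of Lloyd's algorithm in the spirit of this session; the course implementation may differ in details such as the stopping rule and empty-cluster handling:
```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign points, move centroids, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
print(centroids)
```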
1. Adaboost
a. Introduction: Overview and intuition of the algorithm
b. Components: Weak Learners, Weights, Final Model
c. Hyperparameters: Learning Rate, Number of Estimators
d. Applications: Use Cases in Classification and Regression
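A minimal AdaBoost sketch showing the components and hyperparameters named above, on synthetic data; the `estimator` argument name assumes a recent scikit-learn:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Weak learners = depth-1 trees (stumps); key hyperparameters shown explicitly
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))
```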
2. Stacking
a. Introduction: Concept of model ensembling
b. Steps: Base Models, Meta-Model, Final Prediction
c. Variations: Different approaches and modifications
d. Best Practices: Tips for effective stacking
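A minimal stacking sketch with two base models and a meta-model, on synthetic data; the choice of models is illustrative:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=42)

stack = StackingClassifier(
    estimators=[                       # base models
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),   # meta-model
    cv=5,             # out-of-fold predictions feed the meta-model
)
stack.fit(X, y)
print(stack.predict(X[:5]))            # final prediction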
3. LightGBM
a. Introduction: Gradient boosting framework
b. Key Features: Handling large datasets, Categorical feature support
c. Parameters: Core hyperparameters and their tuning
d. Applications: Common use cases and performance considerations
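A minimal LightGBM sketch via its scikit-learn interface, on synthetic data; hyperparameter values are illustrative, not tuned:
```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = lgb.LGBMClassifier(
    n_estimators=300,
    learning_rate=0.05,
    num_leaves=31,        # main capacity control in LightGBM
    random_state=42,
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```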
4. CatBoost
a. Introduction: Categorical boosting algorithm
b. Handling Categorical Features: Algorithmic approach
c. Key Parameters: Learning Rate, Depth, Iterations
d. Practical Usage: Tips and common practices
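A minimal CatBoost sketch showing native categorical handling and the key parameters; the toy frame and values are assumptions:
```python
import pandas as pd
from catboost import CatBoostRegressor

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
    "area": [500, 900, 700, 650, 1100, 800],
    "price": [50, 150, 90, 70, 180, 100],
})

model = CatBoostRegressor(
    iterations=200,
    learning_rate=0.1,
    depth=4,
    cat_features=["city"],   # CatBoost encodes categoricals internally
    verbose=0,
)
model.fit(df[["city", "area"]], df["price"])
print(model.predict(df[["city", "area"]][:2]))
```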
Miscellaneous Topics
1. NoSQL
a. Introduction: Overview of NoSQL databases
b. Types: Document, Key-Value, Column-Family, Graph
c. Use Cases: When to use NoSQL over SQL databases
d. Popular Databases: MongoDB, Cassandra, Redis, Neo4j
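A minimal document-store sketch with MongoDB via pymongo; it assumes a local MongoDB server on the default port, and the database/collection names are illustrative:
```python
from pymongo import MongoClient

# Assumes a MongoDB server running locally on the default port
client = MongoClient("mongodb://localhost:27017")
db = client["campusx"]

# Documents are schemaless dictionaries
db.students.insert_one({"name": "Aman", "courses": ["DSMP", "MLOps"], "score": 88})

print(db.students.find_one({"name": "Aman"}))
print(db.students.count_documents({"score": {"$gt": 80}}))
```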
2. Model Explainability
a. Introduction: Importance of interpretable models
b. Techniques: LIME, SHAP, Feature Importance
c. Application: Applying techniques to various models
d. Best Practices: Ensuring reliable and accurate explanations
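A minimal SHAP sketch for a tree model on synthetic data; the model and data are placeholders:
```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# TreeExplainer works for tree ensembles; one SHAP value per feature per row
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```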
3. AutoML
a. Introduction: Automated machine learning overview
b. Platforms: Google AutoML, H2O.ai, DataRobot
c. Capabilities: Data preprocessing, Model selection, Hyperparameter tuning
d. Evaluation: Assessing AutoML performance and reliability
4. FastAPI
a. Introduction: Modern, fast web framework for building APIs
b. Features: Type checking, Automatic validation, Documentation
c. Building APIs: Steps and best practices
d. Deployment: Hosting and scaling FastAPI applications
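A minimal FastAPI sketch with type-checked input via pydantic and automatic validation/docs; the /predict endpoint and the pricing formula are made up for illustration:
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class House(BaseModel):
    area: float
    bedrooms: int

@app.post("/predict")
def predict(house: House):
    # Stand-in for a real model: a made-up linear formula
    price = 50 + 0.1 * house.area + 20 * house.bedrooms
    return {"price": price}

# Run with: uvicorn main:app --reload  (assuming this file is main.py)
```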
5. AWS Sagemaker
a. Introduction: Fully managed service for machine learning
b. Features: Model building, Training, Deployment
c. Usage: Workflow from data preprocessing to model deployment
d. Best Practices: Optimizing costs and performance