CampusX DSMP Syllabus
Program
Week 5: NumPy
1. Session 13: NumPy Fundamentals
● NumPy Theory
● NumPy array
● Matrix in NumPy
● NumPy array attributes
● Array operations
● Scalar and Vector operations
● NumPy array functions
i. Dot product
ii. Log, exp, mean, median, std, prod, min, max, trigonometric functions,
variance, ceil, floor, slicing, iteration
iii. Reshaping
iv. Stacking and splitting
2. Session 14: Advanced NumPy
● NumPy array vs Python list
● Advanced, Fancy and Boolean Indexing
● Broadcasting
● Mathematical operations in NumPy
● Sigmoid in NumPy
● Mean Squared Error in NumPy
● Working with missing values
● Plotting graphs
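As an illustration of two Session 14 topics, the sketch below implements sigmoid and mean squared error with NumPy. The function names `sigmoid` and `mean_squared_error` and the sample values are illustrative assumptions, not taken from the course material.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function 1 / (1 + e^(-x))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical sample inputs; sigmoid(0) is exactly 0.5.
scores = np.array([-2.0, 0.0, 2.0])
probs = sigmoid(scores)

# Squared errors are 0, 0 and 4, so the mean is 4 / 3.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
mse = mean_squared_error(y_true, y_pred)
```

Both functions operate on whole arrays at once, so no explicit Python loop is needed.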
3. Session 15: NumPy Tricks
● Various NumPy functions such as sort, append, concatenate, percentile, flip,
set functions, etc.
4. Session on Web Development using Flask
● What is the Flask library?
● Why use Flask?
● Building a login system and named entity recognition with an API
Week 6: Pandas
1. Session 16: Pandas Series
● What is Pandas?
● Introduction to Pandas Series
● Series Methods
● Series Math Methods
● Series with Python functionalities
● Boolean Indexing on Series
● Plotting graphs on series
2. Session 17: Pandas DataFrame
● Introduction to Pandas DataFrame
● Creating DataFrame and read_csv()
● DataFrame attributes and methods
● DataFrame Math Methods
● Selecting columns and rows from a DataFrame
● Filtering a DataFrame
● Adding new columns
● DataFrame function: astype()
3. Session 18: Important DataFrame Methods
● Various DataFrame Methods
● Sort, index, reset_index, isnull, dropna, fillna, drop_duplicates,
value_counts, apply, etc.
4. Session on API Development using Flask
● What is an API?
● Building API using Flask
● Hands-on project
5. Session on NumPy Interview Questions
Capstone Project:
1. Session 1 on Capstone Project | Data Gathering
a. Project overview in detail
b. Gather data for the project
c. Details of the data
2. Session 2 on Capstone Project | Data Cleaning
a. Merging House and Flats Data
b. Basic Level Data Cleaning
3. Session 3 on Capstone Project | Feature Engineering
a. Feature Engineering on Columns:
i. additionalRoom
ii. areaWithType
iii. agePossession
iv. furnishDetails
v. features : luxury Score
4. Session 4 on Capstone Project | EDA
a. Univariate Analysis
b. Pandas Profiling
c. Multivariate Analysis
5. Session 5 on Capstone Project | Outlier Detection and Removal
a. Outlier Detection and Removal
6. Session 6 on Capstone Project | Missing Value Imputation
a. Outlier Detection and Removal on area and bedroom
b. Missing Value Imputation
7. Session 7 on Capstone Project | Feature Selection
a. Feature Selection
i. Correlation Technique
ii. Random Forest Feature Importance
iii. Gradient Boosting Feature Importance
iv. Permutation Importance
v. LASSO
vi. Recursive Feature Elimination
vii. Linear Regression with Weights
viii. SHAP (Explainable AI)
b. Linear Regression - Base Model
i. One-Hot Encoding
ii. Transformation
iii. Pipeline for Linear Regression
c. SVR
8. Session 8 on Capstone Project | Model Selection & Productionalization
a. Price Prediction Pipeline
i. Encoding Selection
1. Ordinal Encoding
2. OHE
3. OHE with PCA
4. Target Encoding
ii. Model Selection
b. Price Prediction Web Interface (Streamlit)
9. Session 9 on Capstone Project | Building the Analytics Module
a. Geo map
b. Word cloud of amenities
c. Scatter plot: area vs price
d. Pie chart: BHK, filtered by sector
e. Side-by-side box plot: bedrooms vs price
f. Distribution plot of price for flats and houses
10. Session 10 on Capstone Project | Building the Recommender System
a. Recommender System using TopFacilities
b. Recommender System using Price Details
c. Recommender System using LocationAdvantages
11. Session 11 on Capstone Project | Building the Recommender System Part 2
a. Evaluating Recommendation Results
b. Web Interface for Recommendation (Streamlit)
12. Session 12 on Capstone Project | Building the Insights Module
13. Session 13 on Capstone Project | Deploying the application on AWS
------------------------------------------------------------------------
MLOps Curriculum
Feature Engineering
4. Feature Encoding
a. Categorical Encoding: One-hot; Label; Binary; Target
b. Handling Ordinal Variables: Mapping; Custom encoding strategies
5. Sklearn Transformers
a. Utilization in Pipelines: Building preprocessing pipelines; Combining steps
b. Custom Transformers: Creating and implementing
c. Commonly Used Transformers: Overview and application
Unsupervised Learning
1. K-Means
a. Introduction: Basic principles and algorithm intuition
b. Applications: Common use cases and examples
c. Steps: Initialization, Assignment, Update
d. Optimization: Choosing K, Convergence criteria, Variations of K-Means
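The three K-Means steps listed above (Initialization, Assignment, Update) can be sketched in NumPy as follows. This is a minimal illustration, not the course's implementation: the function name `kmeans`, its parameters, and the toy data are assumptions, and initialization here simply samples k data points at random.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Minimal K-Means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment: label each point with the index of its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence criterion: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(points, k=2)
```

On data this cleanly separated the two groups end up in different clusters; on real data the result depends on the initialization, which is one reason variations such as k-means++ exist.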
2. DBSCAN
a. Introduction: Principles of density-based clustering
b. Key Concepts: Density Reachability, Core Points, Border Points
c. Parameter Tuning: Choosing Eps and MinPts
d. Applications and Use Cases
3. t-SNE
a. Introduction: Algorithm overview and dimensionality reduction
b. Hyperparameters: Perplexity, Learning Rate
c. Applications: Visualizing high-dimensional data
d. Comparison: With other dimensionality reduction techniques
6. Apriori
a. Introduction: Principles of association rule mining
b. Key Concepts: Support, Confidence, Lift
c. Algorithm Steps: Candidate generation, Pruning
d. Applications: Market Basket Analysis, Recommender Systems
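The three Apriori key concepts named above (Support, Confidence, Lift) can be computed directly from a list of transactions. The sketch below covers only these metrics, not candidate generation or pruning; the helper names and the toy basket data are illustrative assumptions.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75
rule_lift = lift({"bread"}, {"milk"}, transactions)              # confidence / 0.75
```

A lift below 1, as in this toy example, suggests the two items appear together less often than they would if they were independent.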
1. AdaBoost
a. Introduction: Overview and intuition of the algorithm
b. Components: Weak Learners, Weights, Final Model
c. Hyperparameters: Learning Rate, Number of Estimators
d. Applications: Use Cases in Classification and Regression
2. Stacking
a. Introduction: Concept of model ensembling
b. Steps: Base Models, Meta-Model, Final Prediction
c. Variations: Different approaches and modifications
d. Best Practices: Tips for effective stacking
3. LightGBM
a. Introduction: Gradient boosting framework
b. Key Features: Handling large datasets, Categorical feature support
c. Parameters: Core hyperparameters and their tuning
d. Applications: Common use cases and performance considerations
4. CatBoost
a. Introduction: Categorical boosting algorithm
b. Handling Categorical Features: Algorithmic approach
c. Key Parameters: Learning Rate, Depth, Iterations
d. Practical Usage: Tips and common practices
Miscellaneous Topics
1. NoSQL
a. Introduction: Overview of NoSQL databases
b. Types: Document, Key-Value, Column-Family, Graph
c. Use Cases: When to use NoSQL over SQL databases
d. Popular Databases: MongoDB, Cassandra, Redis, Neo4j
2. Model Explainability
a. Introduction: Importance of interpretable models
b. Techniques: LIME, SHAP, Feature Importance
c. Application: Applying techniques to various models
d. Best Practices: Ensuring reliable and accurate explanations
3. AutoML
a. Introduction: Automated machine learning overview
b. Platforms: Google AutoML, H2O.ai, DataRobot
c. Capabilities: Data preprocessing, Model selection, Hyperparameter tuning
d. Evaluation: Assessing AutoML performance and reliability
4. FastAPI
a. Introduction: Modern, fast web framework for building APIs
b. Features: Type checking, Automatic validation, Documentation
c. Building APIs: Steps and best practices
d. Deployment: Hosting and scaling FastAPI applications
5. AWS SageMaker
a. Introduction: Fully managed service for machine learning
b. Features: Model building, Training, Deployment
c. Usage: Workflow from data preprocessing to model deployment
d. Best Practices: Optimizing costs and performance
Note: The schedule is tentative; topics may be added to or removed from it in the future.