
DATA SCIENCE: MASTER

Module 1 : Fundamentals of Programming


Chapter 2 : Python for Data Science: Introduction
2.1 Python, Anaconda and relevant packages installations
2.2 Why learn Python?
2.3 Keywords and Identifiers
2.4 Comments, Indentation, and Statements
2.5 Variables and Datatypes in Python
2.6 Standard Input and Output
2.7 Operators
2.8 Control flow: If...else
2.9 Control flow: while loop
2.10 Control flow: for loop
2.11 Control flow: break and continue
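
A minimal sketch (not part of the original syllabus) of the control-flow constructs listed in 2.8-2.11, using an arbitrary toy list of numbers:

    # Toy example: if/else, while, for, break on a small list.
    numbers = [3, 7, 2, 9, 4]

    total = 0
    i = 0
    while i < len(numbers):          # while loop (2.9)
        total += numbers[i]
        i += 1

    for n in numbers:                # for loop (2.10)
        if n % 2 == 0:               # if...else (2.8)
            print(n, "is even")
        else:
            print(n, "is odd")
        if n > 8:
            break                    # break (2.11): stop at the first value above 8

    print("sum =", total)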

Chapter 3 : Python for Data Science: Data Structures


3.1 Lists
3.2 Tuples part 1
3.3 Tuples part 2
3.4 Sets
3.5 Dictionary
3.6 Strings
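
A short illustrative sketch (sample values are made up) touching the structures covered in 3.1-3.6:

    scores = [70, 85, 85, 90]                 # list (3.1)
    point = (2.5, 4.0)                        # tuple (3.2-3.3)
    unique_scores = set(scores)               # set (3.4): drops the duplicate 85
    student = {"name": "Asha", "score": 90}   # dictionary (3.5)
    greeting = "hello data science"           # string (3.6)

    scores.append(95)
    print(unique_scores)
    print(student["name"], greeting.upper().split())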

Chapter 4 : Python for Data Science: Functions


4.1 Introduction
4.2 Types of Functions
4.3 Function Arguments
4.4 Recursive Functions
4.5 Lambda Functions
4.6 Modules
4.7 Packages
4.8 File Handling
4.9 Exception Handling
4.10 Debugging Python
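
A minimal sketch of the function topics in 4.2-4.9 (default arguments, recursion, lambdas, file and exception handling); the file name data.txt is hypothetical:

    def power(base, exponent=2):          # default argument (4.3)
        return base ** exponent

    def factorial(n):                     # recursive function (4.4)
        return 1 if n <= 1 else n * factorial(n - 1)

    square = lambda x: x * x              # lambda function (4.5)

    try:                                  # exception handling (4.9)
        with open("data.txt") as f:       # hypothetical file, for file handling (4.8)
            print(f.read())
    except FileNotFoundError:
        print("data.txt not found")

    print(power(3), factorial(5), square(4))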

Chapter 5 : Python for Data Science: NumPy


5.1 Introduction to NumPy.
5.2 Numerical operations.
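
A small NumPy sketch for 5.1-5.2 (array values are arbitrary), assuming NumPy is installed:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.arange(6).reshape(2, 3)        # 2x3 matrix

    print(a * 2, a.sum(), a.mean())       # elementwise and aggregate operations
    print(b @ a)                          # matrix-vector product
    print(b[:, 1])                        # slicing a column
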
Chapter 7 : Python for Data Science: Pandas
7.1 Getting started with pandas
7.2 Data Frame Basics
7.3 Key Operations on Data Frames.
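
A minimal pandas sketch for 7.1-7.3; the column names and values below are made up for illustration, and pandas is assumed to be installed:

    import pandas as pd

    df = pd.DataFrame({
        "city": ["Pune", "Delhi", "Pune"],
        "sales": [120, 200, 150],
    })

    print(df.head())                              # DataFrame basics (7.2)
    print(df["sales"].mean())                     # key operations (7.3)
    print(df.groupby("city")["sales"].sum())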

Chapter 8 : Computational Complexity: an Introduction


8.1 Space and Time Complexity: Find the largest number in a list
8.2 Binary search
8.3 Find elements common in two lists.
8.4 Find elements common in two lists using a Hashtable/Dict
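
A sketch of 8.2 and 8.4: binary search runs in O(log n) time on a sorted list, and a hash-table lookup finds common elements in O(n + m) instead of the O(n*m) nested-loop approach of 8.3. The inputs are toy lists:

    def binary_search(sorted_items, target):
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:                       # O(log n) time, O(1) extra space
            mid = (lo + hi) // 2
            if sorted_items[mid] == target:
                return mid
            if sorted_items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    def common_elements(xs, ys):
        seen = set(xs)                        # hash-table membership test: O(1) on average
        return [y for y in ys if y in seen]

    print(binary_search([1, 3, 5, 7, 9], 7))          # -> 3
    print(common_elements([1, 2, 3, 4], [3, 4, 5]))   # -> [3, 4]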

Module 2: SQL
9.1 Introduction to databases.
9.2 Why SQL?
9.3 Execution of an SQL statement.
9.4 IMDB Dataset
9.5 Installing MySQL
9.6 Load IMDB data.
9.7 Use, Describe, Show table.
9.8 Select.
9.9 Limit, Offset.
9.10 Order By.
9.11 Distinct.
9.12 Where, Comparison Operators, NULL.
9.13 Logic Operators.
9.14 Aggregate Functions: COUNT, MIN, MAX, AVG, SUM.
9.15 Group By.
9.16 Having.
9.17 Order of Keywords.
9.18 Join and Natural Join.
9.19 Inner, Left, Right, and Outer Joins.
9.20 Sub Queries/Nested Queries/Inner Queries.
9.21 DML: INSERT
9.22 DML: UPDATE, DELETE
9.23 DDL: CREATE TABLE
9.24 DDL: ALTER, ADD, MODIFY, DROP
9.25 DDL: DROP TABLE, TRUNCATE, DELETE
9.26 Data Control Language: GRANT, REVOKE
9.27 Learning Resources.
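
The module itself runs these statements against MySQL and the IMDB dataset; purely as a self-contained illustration of the SELECT / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT pattern (9.8-9.17), the sketch below uses Python's built-in sqlite3 with a made-up movies table:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
    cur.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [("A", 1999, 8.1), ("B", 1999, 6.5), ("C", 2005, 7.9)],
    )

    cur.execute("""
        SELECT year, COUNT(*) AS n, AVG(rating) AS avg_rating
        FROM movies
        WHERE rating > 6.0
        GROUP BY year
        HAVING COUNT(*) >= 1
        ORDER BY avg_rating DESC
        LIMIT 5
    """)
    print(cur.fetchall())
    conn.close()
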
Module 3: Data Science: Exploratory Data Analysis and Data Visualization using Python
Chapter 6 : Python for Data Science: Visualization libraries
6.1 Introduction to Matplotlib
6.2 Introduction to Seaborn

Chapter 1 : Plotting for exploratory data analysis (EDA)

10.1 Introduction to Iris dataset and 2D scatter-plot
10.2 3D Scatter-plot.
10.3 Pair plots.
10.4 Limitations of Pair plots
10.5 Histogram and introduction to PDF(Probability Density Function)
10.6 Univariate analysis using PDF.
10.7 CDF(Cumulative distribution function)
10.8 Variance, Standard Deviation
10.9 Median
10.10 Percentiles and Quantiles
10.11 IQR(InterQuartile Range), MAD(Median Absolute Deviation).
10.12 Box-plot with whiskers
10.13 Violin plots.
10.14 Summarizing plots, Univariate, Bivariate, and Multivariate analysis.
10.15 Multivariate probability density, contour plot.
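
A sketch of the univariate and bivariate plots listed above, assuming matplotlib and seaborn are installed; seaborn's load_dataset call fetches the Iris CSV over the network, so an internet connection is assumed:

    import matplotlib.pyplot as plt
    import seaborn as sns

    iris = sns.load_dataset("iris")                  # 10.1: Iris dataset

    sns.scatterplot(data=iris, x="sepal_length",     # 10.1: 2D scatter plot
                    y="sepal_width", hue="species")
    plt.show()

    sns.pairplot(iris, hue="species")                # 10.3: pair plots
    plt.show()

    sns.histplot(iris["petal_length"], kde=True)     # 10.5: histogram with a density estimate
    plt.show()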

Module 4: Data Analysis and Data Visualization using Tableau

Module 5: Data science: applied mathematics

Chapter 2 : Linear Algebra


11.1 Why learn it?
11.2 Introduction to Vectors (2-D, 3-D, n-D), row vectors and column vectors
11.3 Dot product and the angle between 2 vectors.
11.4 Projection and unit vector
11.5 Equation of a line (2-D), plane(3-D) and hyperplane (n-D)
11.6 Distance of a point from a plane/hyperplane, half-spaces.
11.7 Equation of a circle (2-D), sphere (3-D) and hypersphere (n-D)
11.8 Equation of an ellipse (2-D), ellipsoid (3-D) and hyperellipsoid (n-D)
11.9 Square, Rectangle.
11.10 Hypercube, Hypercuboid.
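
A worked NumPy sketch for 11.2-11.6 (dot product, angle between vectors, unit vector and projection, distance of a point from a hyperplane); the vectors and plane below are arbitrary examples:

    import numpy as np

    a = np.array([3.0, 4.0])
    b = np.array([1.0, 0.0])

    dot = a @ b                                      # 11.3: dot product
    angle_deg = np.degrees(np.arccos(dot / (np.linalg.norm(a) * np.linalg.norm(b))))

    unit_b = b / np.linalg.norm(b)                   # 11.4: unit vector
    projection = (a @ unit_b) * unit_b               # projection of a onto b

    # 11.6: distance of point x0 from the hyperplane w.x + b = 0 is |w.x0 + b| / ||w||
    w, bias, x0 = np.array([1.0, 2.0]), -3.0, np.array([4.0, 5.0])
    distance = abs(w @ x0 + bias) / np.linalg.norm(w)

    print(dot, angle_deg, projection, distance)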

Chapter 3 : Probability and Statistics


12.1 Introduction to Probability and Statistics.
12.2 Population & Sample.
12.3 Gaussian/Normal Distribution and its PDF(Probability Density Function).
12.4 CDF(Cumulative Distribution Function) of Gaussian/Normal Distribution
12.5 Symmetric distribution, Skewness, and Kurtosis
12.6 Standard normal variate (z) and standardization.
12.7 Kernel density estimation.
12.8 Sampling distribution & Central Limit Theorem.
12.9 Q-Q Plot: Is a given random variable Gaussian distributed?
12.10 How are distributions used?
12.11 Chebyshev’s inequality
12.12 Discrete and Continuous Uniform distributions.
12.13 How to randomly sample data points. [Uniform Distribution]
12.14 Bernoulli and Binomial distribution
12.15 Log-normal
12.16 Power law distribution
12.17 Box-Cox transform.
12.18 Applications of Non-Gaussian Distributions.
12.19 Co-variance
12.20 Pearson Correlation Coefficient
12.21 Spearman Rank Correlation Coefficient
12.22 Correlation vs Causation
12.23 How to use Correlations?
12.24 Confidence Intervals(C.I) Introduction
12.25 Computing a confidence interval given the underlying distribution
12.26 C.I for the mean of a normal random variable.
12.27 Confidence Interval using bootstrapping.
12.28 Hypothesis Testing methodology, Null-hypothesis, p-value
12.29 Hypothesis testing intuition with coin toss example
12.30 Resampling and permutation test.
12.33 Hypothesis Testing: another example.
12.34 Resampling and permutation test: another example.
12.35 How to use Hypothesis testing?
12.36 Proportional Sampling.
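
A small NumPy sketch of standardization (12.6) and a bootstrap confidence interval for the mean (12.27), run on a synthetic Gaussian sample; the sample size, seed, and parameters are illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=50, scale=10, size=200)       # synthetic Gaussian data

    z = (sample - sample.mean()) / sample.std()           # 12.6: standard normal variate

    # 12.27: 95% bootstrap confidence interval for the mean
    boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                  for _ in range(1000)]
    lower, upper = np.percentile(boot_means, [2.5, 97.5])

    print(z.mean().round(3), z.std().round(3))            # ~0 and ~1 after standardization
    print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")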

Module 6: Data science: dimensionality reduction

Chapter 5 : Dimensionality reduction and Visualization:


14.1 What is dimensionality reduction?
14.2 Row vector, and Column vector.
14.3 How to represent a dataset?
14.4 How to represent a dataset as a Matrix.
14.5 Data preprocessing: Feature Normalization
14.6 Mean of a data matrix.
14.7 Data preprocessing: Column Standardization
14.8 Co-variance of a Data Matrix.
14.9 MNIST dataset (784 dimensional)
14.10 Code to load MNIST data set.
Chapter 6 : Principal Component Analysis.
15.1 Why learn it.
15.2 Geometric intuition.
15.3 Mathematical objective function.
15.4 Alternative formulation of PCA: distance minimization
15.5 Eigenvalues and eigenvectors.
15.6 PCA for dimensionality reduction and visualization.
15.7 Visualize MNIST dataset.
15.8 Limitations of PCA
15.9 Code example.
15.10 PCA for dimensionality reduction (not-visualization)
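
A minimal PCA sketch in the spirit of 14.7-14.8 and 15.5-15.6: column-standardize a data matrix, take the covariance matrix, and project onto the top eigenvectors. The data is random for illustration; in practice scikit-learn's PCA class is the usual shortcut:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))                    # toy data matrix: 100 points x 5 features

    X_std = (X - X.mean(axis=0)) / X.std(axis=0)     # 14.7: column standardization
    cov = np.cov(X_std, rowvar=False)                # 14.8: covariance of the data matrix

    eigvals, eigvecs = np.linalg.eigh(cov)           # 15.5: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]                # sort by explained variance
    top2 = eigvecs[:, order[:2]]

    X_2d = X_std @ top2                              # 15.6: project to 2 dimensions
    print(X_2d.shape, eigvals[order][:2])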

Module 7 : Foundations of Natural Language Processing and Machine Learning

Chapter 2 : Classification and Regression Models: K-Nearest Neighbors


19.1 How does “Classification” work?
19.2 Data matrix notation.
19.3 Classification vs Regression (examples)
19.4 K-Nearest Neighbors Geometric intuition with a toy example.
19.5 Failure cases of K-NN
19.6 Distance measures: Euclidean(L2), Manhattan(L1), Minkowski, Hamming
19.8 How to measure the effectiveness of k-NN?
19.9 Test/Evaluation time and space complexity.
19.10 k-NN Limitations.
19.11 Decision surface for K-NN as K changes.
19.12 Overfitting and Underfitting.
19.13 Need for Cross validation.
19.14 K-fold cross validation.
19.15 Visualizing train, validation and test datasets
19.16 How to determine overfitting and underfitting?
19.17 Time based splitting
19.21 Binary search tree
19.22 How to build a kd-tree.
19.23 Find nearest neighbors using kd-tree
19.24 Limitations of kd-tree
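
A hedged sketch of k-NN with k-fold cross validation (19.13-19.14), using scikit-learn on its bundled Iris data; scikit-learn availability and the candidate k values are assumptions:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Choose k by 5-fold cross validation (19.14) rather than training accuracy.
    for k in (1, 3, 5, 7, 9):
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
        print(k, scores.mean().round(3))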

Chapter 4 : Classification algorithms in various situations:


21.1 Introduction
21.2 Imbalanced vs balanced dataset
21.3 Multi-class classification.
21.4 k-NN, given a distance or similarity matrix
21.5 Train and test set differences.
21.6 Impact of Outliers
21.7 Local Outlier Factor(Simple solution: mean distance to k-NN).
21.12 Impact of Scale & Column standardization.
21.14 Feature importance & Forward Feature Selection
21.15 Handling categorical and numerical features.
21.16 Handling missing values by imputation.
21.17 Curse of dimensionality.
21.18 Bias-Variance tradeoff.
21.19 Intuitive understanding of bias-variance.

Chapter 5 : Performance measurement of models:


22.1 Accuracy
22.2 Confusion matrix, TPR, FPR, FNR, TNR
22.3 Precision & recall, F1-score.
22.4 Receiver Operating Characteristic Curve (ROC) curve and AUC.
22.5 Log-loss.
22.6 R-Squared/ Coefficient of determination.
22.7 Median absolute deviation (MAD)
22.8 Distribution of errors.
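
A sketch of the classification metrics in 22.1-22.5 on hand-made labels and scores (the arrays below are arbitrary); scikit-learn is assumed to be installed:

    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 log_loss, precision_score, recall_score,
                                 roc_auc_score)

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
    y_prob = [0.2, 0.7, 0.9, 0.8, 0.4, 0.1, 0.6, 0.3]     # predicted P(y = 1)

    print(confusion_matrix(y_true, y_pred))                # 22.2
    print(accuracy_score(y_true, y_pred),                  # 22.1
          precision_score(y_true, y_pred),                 # 22.3
          recall_score(y_true, y_pred),
          f1_score(y_true, y_pred))
    print(roc_auc_score(y_true, y_prob),                   # 22.4
          log_loss(y_true, y_prob))                        # 22.5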

Chapter 8 : Logistic Regression:


25.1 Geometric intuition of logistic regression
25.2 Sigmoid function: Squashing
25.3 Mathematical formulation of objective function.
25.4 Weight Vector.
25.5 L2 Regularization: Overfitting and Underfitting.
25.6 L1 regularization and sparsity.
25.12 Collinearity of features.
25.13 Train & Run time space and time complexity.
25.15 Non-linearly separable data & feature engineering.

Chapter 9 : Linear Regression.


26.1 Geometric intuition of Linear Regression.
26.2 Mathematical formulation.

Chapter 10 : Solving optimization problems


27.1 Differentiation.
27.2 Online differentiation tools
27.3 Maxima and Minima
27.4 Vector calculus: Grad
27.5 Gradient descent: geometric intuition.
27.6 Learning rate.
27.7 Gradient descent for linear regression.
27.9 Constrained optimization & PCA
27.10 Logistic regression formulation revisited.
27.11 Why does L1 regularization create sparsity?
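
A worked sketch of 27.5-27.7: batch gradient descent minimizing mean squared error for a one-dimensional linear regression. The data is synthetic and the learning rate and iteration count are illustrative choices, not prescribed values:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 4.0 + rng.normal(scale=1.0, size=100)    # true slope 3, intercept 4

    w, b, lr = 0.0, 0.0, 0.01                              # 27.6: learning rate
    for _ in range(2000):
        y_hat = w * x + b
        grad_w = -2 * np.mean((y - y_hat) * x)             # gradient of the mean squared error
        grad_b = -2 * np.mean(y - y_hat)
        w -= lr * grad_w                                   # 27.5: step opposite the gradient
        b -= lr * grad_b

    print(round(w, 2), round(b, 2))                        # should approach ~3 and ~4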

Module 4 : Advanced Machine Learning algorithms

Chapter 7 : Naive Bayes


24.1 Conditional probability.
24.2 Independent vs Mutually exclusive events.
24.3 Bayes Theorem with examples.
24.4 Exercise problems on Bayes Theorem
24.5 Naive Bayes algorithm.
24.6 Toy example: Train and test stages.
24.7 Naive Bayes on Text data.
24.8 Laplace/Additive Smoothing.
24.9 Log-probabilities for numerical stability.
24.10 Bias and Variance tradeoff.
24.11 Feature importance and interpretability.
24.12 Imbalanced data
24.13 Outliers.
24.14 Missing values.
24.15 Handling Numerical features (Gaussian NB)
24.16 Multiclass classification.
24.17 Similarity or Distance matrix.
24.18 Large dimensionality.

Chapter 1 : Support Vector Machines (SVM)


29.1 Geometric intuition.
29.2 Mathematical derivation.
29.3 Why we take the values +1 and -1 for support vector planes
29.4 Loss function(Hinge Loss) based interpretation.
29.5 Dual form of SVM formulation.
29.6 Kernel trick.
29.7 Polynomial kernel.

Chapter 3 : Decision Trees


31.1 Geometric Intuition of decision tree: Axis parallel hyperplanes.
31.2 Sample Decision tree.
31.3 Building a decision Tree: Entropy(Intuition behind entropy)
31.4 Building a decision Tree: Information Gain
31.5 Building a decision Tree: Gini Impurity.
31.6 Building a decision Tree: Constructing a DT.
31.7 Building a decision Tree: Splitting numerical features.
31.8 Feature standardization.
31.9 Categorical features with many possible values.
31.10 Overfitting and Underfitting.
31.11 Train and Run time complexity
31.12 Regression using Decision Trees.
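
A short sketch of the split criteria in 31.3-31.5: entropy, Gini impurity, and the information gain of a candidate split. The class counts are made up for illustration:

    import math

    def entropy(counts):                   # 31.3: H = -sum(p_i * log2(p_i))
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c)

    def gini(counts):                      # 31.5: G = 1 - sum(p_i ** 2)
        total = sum(counts)
        return 1 - sum((c / total) ** 2 for c in counts)

    parent = [6, 4]                        # 6 positives, 4 negatives at a node
    left, right = [5, 1], [1, 3]           # a candidate split (6 points left, 4 right)

    info_gain = (entropy(parent)           # 31.4: information gain
                 - (6 / 10) * entropy(left)
                 - (4 / 10) * entropy(right))
    print(round(entropy(parent), 3), round(gini(parent), 3), round(info_gain, 3))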

Chapter 5 : Ensemble Models:


33.1 What are ensembles?
33.2 Bootstrapped Aggregation (Bagging) Intuition.
33.3 Random Forest and their construction.
33.4 Bias-Variance tradeoff.
33.5 Train and Run-time Complexity.
33.6 Bagging: code Sample.
33.7 Extremely randomized trees.
33.8 Random Forest: Cases.
33.9 Boosting Intuition
33.10 Residuals, Loss functions, and gradients.
33.11 Gradient Boosting
33.12 Regularization by Shrinkage.
33.13 Train and Run time complexity.
33.14 XGBoost: Boosting + Randomization
33.15 AdaBoost: geometric intuition.
33.16 Stacking models.
33.17 Cascading classifiers.

Module 8 : Feature Engineering

Chapter 1 : Featurizations and Feature engineering.


34.1 Introduction.
34.2 Moving window for Time-series data.
34.3 Fourier decomposition.
34.4 Deep learning features: LSTM
34.5 Image histogram.
34.6 Key points: SIFT.
34.7 Deep learning features: CNN
34.8 Relational data.
34.9 Graph data.
34.10 Indicator variables.
34.11 Feature binning.
34.12 Interaction variables.
34.13 Mathematical transforms.
34.14 Model specific featurizations.
34.15 Feature orthogonality.
34.16 Domain specific featurizations.
34.17 Feature slicing.

Module 9: Productionisation and deployment of ML Models


Chapter 2 : Miscellaneous Topics
35.1 Calibration of Models: Need for calibration.
35.2 Calibration Plots.
35.10 Data Science Life Cycle.
35.11 Production and Deployment of Machine Learning Models.
35.12 Live Session: Productionalization and Deployment of Machine Learning Models.

Module 10 : Data Mining(Unsupervised Learning) and Recommender Systems

Chapter 1 : Unsupervised learning/Clustering
43.1 What is Clustering?
43.2 Unsupervised learning
43.3 Applications.
43.4 Metrics for Clustering.
43.5 K-Means: Geometric intuition, Centroids.
43.6 K-Means: Mathematical formulation: Objective function
43.7 K-Means Algorithm.
43.8 How to initialize: K-Means++
43.9 Failure cases/Limitations.
43.10 K-Medoids
43.11 Determining the right K.
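
A minimal K-Means sketch for 43.5-43.8, using scikit-learn on synthetic blobs; scikit-learn availability, the number of clusters, and the random seeds are assumptions:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # synthetic clusters

    km = KMeans(n_clusters=3, init="k-means++",                    # 43.8: k-means++ initialization
                n_init=10, random_state=0)
    labels = km.fit_predict(X)

    print(km.cluster_centers_)        # 43.5: centroids
    print(km.inertia_)                # 43.6: objective (within-cluster sum of squares)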

Chapter 2 : Hierarchical clustering Technique


44.1 Agglomerative & Divisive, Dendrograms
44.2 Agglomerative Clustering.
44.3 Proximity methods: Advantages and Limitations.
44.4 Time and Space Complexity.
44.5 Limitations of Hierarchical Clustering.

Chapter 3 : DBSCAN (Density based clustering)


45.1 Density based clustering
45.2 MinPts and Eps: Density
45.3 Core, Border and Noise points.
45.4 Density edge and Density connected points.
45.5 DBSCAN Algorithm.
45.6 Hyper Parameters: MinPts and Eps.
45.7 Advantages and Limitations of DBSCAN.
45.8 Time and Space Complexity.

Chapter 4 : Recommender Systems and Matrix Factorization.


46.1 Problem formulation: Movie reviews.
46.2 Content based vs Collaborative Filtering.
46.3 Similarity based Algorithms.
46.4 Matrix Factorization: PCA, SVD.
46.5 Matrix Factorization: NMF.
46.6 Matrix Factorization for Collaborative filtering
46.7 Matrix Factorization for feature engineering.
46.8 Clustering as MF.
46.9 Hyperparameter tuning.
46.12 Word Vectors as MF.
46.13 Eigen-Faces

Module 11 : Neural Networks, Computer Vision and Deep Learning

Chapter 1 : Deep Learning: Neural Networks.


50.1 History of Neural networks and Deep Learning.
50.2 How do Biological Neurons work?
50.3 Growth of biological neural networks.
50.4 Diagrammatic representation: Logistic Regression and Perceptron
50.5 Multi-Layered Perceptron (MLP).
50.6 Notation.
50.8 Training an MLP: Chain rule
50.9 Training an MLP: Memoization
50.10 Backpropagation algorithm.
50.11 Activation functions.
50.12 Vanishing Gradient problem.
50.13 Bias-Variance tradeoff.
50.14 Decision surfaces: Playground

Chapter 2 : Deep Learning: Deep Multi-layer perceptrons


51.1 Deep Multi-layer perceptrons: 1980s to 2010s
51.2 Dropout layers & Regularization.
51.3 Rectified Linear Units (ReLU).
51.4 Weight initialization.
51.5 Batch Normalization.
51.6 Optimizers: Hill-descent analogy in 2D
51.7 Optimizers: Hill descent in 3D and contours.
51.14 Which algorithm to choose when?
51.15 Gradient Checking and Clipping.
51.16 Softmax and cross-entropy for multi-class classification.
51.17 How to train a Deep MLP?
51.18 Auto Encoders.

Module 12 : Keras library


Chapter 3 : Deep Learning: Tensorflow and Keras.
52.1 Tensorflow and Keras Overview.
52.2 GPU vs CPU for Deep Learning.
52.3 Google Colaboratory.
52.4 Install TensorFlow.
52.5 Online documentation and tutorials.
52.6 Softmax Classifier on MNIST dataset.
52.7 MLP: Initialization
52.8 Model 1: Sigmoid activation.
52.9 Model 2: ReLU activation
52.10 Model 3: Batch Normalization.
52.11 Model 4 : Dropout.
52.12 MNIST classification in Keras.
52.13 Hyperparameter tuning in Keras.
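
A hedged sketch of the MLP variants in 52.6-52.12 (softmax classifier on MNIST with ReLU, batch normalization, and dropout) written with tf.keras; TensorFlow installation is assumed, and the layer sizes, dropout rate, and epoch count are illustrative rather than the course's exact settings:

    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0      # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),      # 52.9: ReLU activation
        tf.keras.layers.BatchNormalization(),               # 52.10: batch normalization
        tf.keras.layers.Dropout(0.3),                       # 52.11: dropout
        tf.keras.layers.Dense(10, activation="softmax"),    # 52.6: softmax classifier
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))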

Module 13 : Advanced deep learning


Chapter 4 : Deep Learning: Convolutional Neural Nets.
53.1 Biological inspiration: Visual Cortex
53.2 Convolution: Edge Detection on images.
53.3 Convolution: Padding and strides
53.4 Convolution over RGB images.
53.5 Convolutional layer.
53.6 Max-pooling.
53.7 CNN Training: Optimization
53.9 ImageNet dataset
53.10 Data Augmentation.
53.11 Convolution Layers in Keras
53.14 Residual Network.
53.15 Inception Network.
53.16 What is Transfer Learning?

Chapter 5 : Deep Learning: Long Short-Term Memory (LSTMS)


54.1 Why RNNs?
54.2 Recurrent Neural Network.
54.3 Training RNNs: Backprop.
54.4 Types of RNNs.
54.5 Need for LSTM/GRU.
54.6 LSTM.
54.7 GRUs.
54.8 Deep RNN.
54.9 Bidirectional RNN
