DsNaIT v2.0

This document outlines a data science course that covers 120 hours of content over Python, R, Hadoop, Spark, Azure ML, deep learning, AI, machine learning, data analysis, data visualization with Matplotlib and Seaborn, Pandas, SQL, regular expressions, object oriented programming in Python, Dash, Plotly, feature engineering, model selection, and case studies applying these techniques. Key topics include introductory and advanced Python and R, big data technologies, the data science process, statistics, machine learning algorithms, and developing dashboards and interactive visualizations.

Uploaded by

Ganesh Peketi

Python 60 Hours

R Program 20 Hours
Hadoop 10 Hours
Spark 20 Hours
Azure ML 10 Hours
Introduction to Data Science
Deep Learning & AI
Introduction to Deep Learning & AI & Machine Learning
Deep Learning: A revolution in Artificial Intelligence
Limitations of Machine Learning
What is Deep Learning?
Need for Data Scientists
Foundation of Data Science
What is Business Intelligence
What is Data Analysis
What is Data Mining
What is Machine Learning
Analytics vs Data Science
Value Chain
Types of Analytics
Lifecycle Probability
Analytics Project Lifecycle
Advantage of Deep Learning over Machine learning
Reasons for Deep Learning
Real-Life use cases of Deep Learning
Review of Machine Learning
Data
Basis of Data Categorization
Types of Data
Data Collection Types
Forms of Data & Sources
Data Quality & Changes
Data Quality Issues
Data Quality Story
What is Data Architecture
Components of Data Architecture
OLTP vs OLAP
How is Data Stored?
Big Data
What is Big Data?
5 Vs of Big Data
Big Data Architecture
Big Data Technologies
Big Data Challenge
Big Data Requirements
Big Data Distributed Computing & Complexity
Hadoop
Map Reduce Framework
Hadoop Ecosystem
Data Science Deep Dive
What Data Science is
Why Data Scientists are in demand
What is a Data Product
The growing need for Data Science
Large Scale Analysis Cost vs Storage
Data Science Skills
Data Science Use Cases

Data Science Project Life Cycle & Stages


Data Acquisition
Where to source data
Techniques
Evaluating input data
Data formats
Data Quantity
Data Quality
Resolution Techniques
Data Transformation
File format Conversions
Anonymization
Statistics
What is Statistics
Descriptive Statistics
Central Tendency Measures
The Story of Average
Dispersion Measures
Data Distributions
Central Limit Theorem
What is Sampling
Why Sampling
Sampling Methods
Inferential Statistics
What is Hypothesis testing
Confidence Level
Degrees of freedom
What is p-value
Chi-Square test
What is ANOVA
Correlation vs Regression
Uses of Correlation & Regression
Python Getting Started with Python
Python Overview
About Interpreted Languages
Advantages/Disadvantages of Python, pydoc
Starting Python
Interpreter PATH
Using the Interpreter
Running a Python Script
Using Variables
Keywords
Built-in Functions
Strings, Different Literals
Math Operators and Expressions
Writing to the Screen
String Formatting
Command Line Parameters and Flow Control.

Sequences and File Operations


Lists
Tuples
Indexing and Slicing
Iterating through a Sequence
Functions for all Sequences
Using Enumerate()
Operators and Keywords for Sequences
The range() function (xrange() in Python 2)
List Comprehensions
Generator Expressions
Dictionaries and Sets.

NumPy & Pandas with Matplotlib & Seaborn


Learning NumPy
Plotting using Matplotlib and Seaborn
Machine Learning application

Introduction to Pandas
Creating Data Frames
Grouping, Sorting
Plotting Data
Creating Functions
Converting Different Formats
Combining Data from Various Formats
Slicing/Dicing Operations.

Deep Dive - Functions, Sorting, Errors and Exception Handling


Functions
Function Parameters
Global Variables
Variable Scope and Returning Values
Sorting
Alternate Keys
Lambda Functions
Sorting Collections of Collections
Sorting Dictionaries
Sorting Lists in Place
Errors and Exception Handling
Handling Multiple Exceptions
The Standard Exception Hierarchy
Using Modules
The Import Statement
Module Search Path
Package Installation Ways.

Regular Expressions, its Packages and Object Oriented Programming in Python


The Sys Module
Interpreter Information
STDIO
Launching External Programs
Paths, Directories and Filenames
Walking Directory Trees
Math Function
Random Numbers
Dates and Times
Zipped Archives
Introduction to Python Classes
Defining Classes
Initializers
Instance Methods
Properties
Class Methods and Data, Static Methods
Private Methods and Inheritance
Module Aliases and Regular Expressions.

Debugging, Databases and Project Skeletons


Debugging
Dealing with Errors
Creating a Database with SQLite 3
CRUD Operations
Creating a Database Object.
Plotly & Dash Getting Started with Plotly
Plotly and Dash Overview

Plotly Basics

Scatter Plots

Line Charts

Bar Charts

Bubble Plots

Box Plots

Histograms

Distplots

Heatmaps

Introduction to Dash Dash Basics - Layout

Introduction to Dash Basics

Dash Layouts

Dash Layouts - Styling

Converting Simple Plotly Plot to Dashboard with Dash

DashBoard Basics

Create a Simple Dashboard

DashBoard Components

Dash Components

HTML Components

Core Components

Markdown with Dash


Interactive Components

Single Callbacks for Interactivity

Dash Callbacks for Graphs

Multiple Inputs

Multiple Outputs

Callbacks with State

Controlling Callbacks with Dash State

Interacting with Visualizations

Hover Over Data

Click Data

Selection Data

Updating Graphs on Interactions

Updating Graphs on Interactions Part 2

Updating Graphs on Interactions - Part Three

Project Imports and Graph Setup

Input Box and Basic Callback

Reading Data with Pandas Datareader

Adding DatePickers for Choosing Dates

Adding in Dash State

Multiple Stock Option Dropdown

Live Updating

Layout Updating

Deployment
App Authorization

Deploying App to Heroku


Machine Learning
Deep Learning & AI
using Python
Project Machine learning algorithms Python

Feature Selection and Preprocessing

Which Algorithms perform best

Model selection cross validation score


Introduction
ML Fundamentals
ML Common Use Cases
Understanding Supervised and Unsupervised Learning Techniques
Linear Regression
Case study in Financial Domain
Introduction to Predictive Modeling in Financial Domain
Linear Regression Overview
Simple Linear Regression
Multiple Linear Regression

Logistic Regression
Case study in Banking Domain
Logistic Regression Overview
Data Partitioning
Univariate Analysis
Bivariate Analysis
Multicollinearity Analysis
Model Building
Model Validation
Model Performance Assessment AUC & ROC curves
Scorecard

Clustering
Case study in Ecommerce Domain
Similarity Metrics
Distance Measure Types: Euclidean, Cosine Measures
Creating predictive models
Understanding K-Means Clustering
Understanding TF-IDF, Cosine Similarity and their application to Vector Space Model
Case study

Implementing Association rule mining


Case study in Retail Domain
What are Association Rules & their use cases?
What is a Recommendation Engine & how does it work?
Recommendation Use-case
Case study
Understanding Process flow of Supervised Learning Techniques

Decision Tree Classifier


Case study in Healthcare Domain
How to build Decision trees
What is Classification and its use cases?
What is Decision Tree?
Algorithm for Decision Tree Induction
Creating a Decision Tree
Confusion Matrix
Case study

Random Forest Classifier


What is Random Forests
Features of Random Forest
Out-of-Bag Error Estimate and Variable Importance
Case study in Healthcare Domain

Naive Bayes Classifier.


Case study
Project Discussion
Problem Statement and Analysis
Various approaches to solve a Data Science Problem
Pros and Cons of different approaches and algorithms.

Support Vector Machines


Case study in Healthcare Domain
Introduction to SVMs
SVM History
Vectors Overview
Decision Surfaces
Linear SVMs
The Kernel Trick
Non-Linear SVMs
The Kernel SVM

Time Series Analysis


Case study in Stock Exchange & Power Infra Domain
Describe Time Series data
Format your Time Series data
List the different components of Time Series data
Discuss different kind of Time Series scenarios
Choose the model according to the Time series scenario
Implement the model for forecasting
Explain working and implementation of ARIMA model
Illustrate the working and implementation of different ETS models
Forecast the data using the respective model
What is Time Series data?
Time Series variables
Different components of Time Series data
Visualize the data to identify Time Series Components
Implement ARIMA model for forecasting
Exponential smoothing models
Identifying different time series scenario based on which different Exponential Smoothing model can be applied
Implement respective model for forecasting
Visualizing and formatting Time Series data
Plotting decomposed Time Series data plot
Applying ARIMA and ETS model for Time Series forecasting
Forecasting for given Time period
Case Study

Machine learning algorithms Python


Various machine learning algorithms in Python
Apply machine learning algorithms in Python

Feature Selection and Preprocessing


How to select the right data
Which are the best features to use
Additional feature selection techniques
A feature selection case study
Preprocessing
Preprocessing Scaling Techniques
How to preprocess your data
How to scale your data
Feature Scaling Final Project

Which Algorithms perform best


Highly efficient machine learning algorithms
Bagging Decision Trees
The power of ensembles
Random Forest Ensemble technique
Boosting - Adaboost
Boosting ensemble stochastic gradient boosting
A final ensemble technique

Model selection cross validation score


Introduction Model Tuning
Parameter Tuning GridSearchCV
A second method to tune your algorithm
How to automate machine learning
Which ML algo should you choose
How to compare machine learning algorithms in practice
Deep Learning & AI
using Python
Deep Learning & AI

Introduction to Artificial Neural Networks

Convolutional Neural Networks

What are RNNs - Introduction to RNNs

Restricted Boltzmann Machine (RBM) and Autoencoders


Tensorflow with Python

Building Neural Networks using Tensorflow

Deep Learning using Tensorflow

Transfer Learning using Keras and TFLearn

Text Mining & NLP & Deep NLP & Chatbot
Sentiment Analysis

Computer Vision
Case study in Stock Exchange & Image Recognition & Banking Domain
Deep Learning Overview
The Brain vs Neuron
Introduction to Deep Learning

The Detailed ANN


The Activation Functions
How do ANNs work & learn
Gradient Descent
Stochastic Gradient Descent
Backpropagation
Understand limitations of a Single Perceptron
Understand Neural Networks in Detail
Illustrate Multi-Layer Perceptron
Backpropagation – Learning Algorithm
Understand Backpropagation – Using Neural Network Example
MLP Digit-Classifier using TensorFlow
Building a multi-layered perceptron for classification
Why Deep Networks
Why Deep Networks give better accuracy?
Use-Case Implementation
Understand How Deep Network Works?
How Backpropagation Works?
Illustrate Forward pass, Backward pass
Different variants of Gradient Descent
Case study in Banking Domain

Case study in Image Recognition Domain


Convolutional Operation
ReLU Layers
What is Pooling vs Flattening
Full Connection
Softmax vs Cross Entropy
Building a real world convolutional neural network for image classification

Case study in Stock Exchange Domain


Recurrent Neural Networks (RNN)
Understanding LSTMs
Long Short-Term Memory (LSTM) neural networks in Python

Restricted Boltzmann Machine (RBM) and Autoencoders
Restricted Boltzmann Machine
Applications of RBM
Introduction to Autoencoders
Autoencoders applications
Understanding Autoencoders
Building an Autoencoder model

Introducing Tensorflow
Introducing Tensorflow
Why Tensorflow?
What is tensorflow?
Tensorflow as an Interface
Tensorflow as an environment
Tensors
Computation Graph
Installing Tensorflow
Tensorflow training
Prepare Data
Tensor types
Loss and Optimization
Running tensorflow programs

Tensors
Tensorflow data types
CPU vs GPU vs TPU
Tensorflow methods
Introduction to Neural Networks
Neural Network Architecture
Linear Regression example revisited
The Neuron
Neural Network Layers
The MNIST Dataset
Coding MNIST NN

Deepening the network


Images and Pixels
How humans recognise images
Convolutional Neural Networks
ConvNet Architecture
Overfitting and Regularization
Max Pooling and ReLU activations
Dropout
Strides and Zero Padding
Coding Deep ConvNets demo
Debugging Neural Networks
Visualising NN using Tensorflow
Tensorboard

Transfer Learning Introduction


Google Inception Model
Retraining Google Inception with our own data demo
Predicting new images
Transfer Learning Summary
Extending Tensorflow
Keras
TFLearn
Keras vs TFLearn Comparison

Case study
Case study

Case study
Computer Vision
using OpenCV
NumPy and Image Basics

Introduction to Numpy and Image Section

NumPy Arrays

What is an image?

Images and NumPy

Image Basics with OpenCV

Introduction to Images and OpenCV Basics

Opening Image files with OpenCV

Drawing on Images

Direct Drawing on Images with a mouse - Advanced

Image Processing

Color Mappings

Blending and Pasting Images

Blending and Pasting Images - Masks - Advanced

Image Thresholding

Blurring and Smoothing

Blurring and Smoothing - Advanced

Morphological Operators

Gradients

Histograms

Histogram Equalization

Image Processing Assessment

Video Basics with Python and OpenCV


Introduction to Video Basics

Connecting to Camera

Using Video Files

Drawing on Live Camera

Video Basics Assessment

Object Detection with OpenCV and Python

Introduction to Object Detection

Template Matching

Corner Detection - Harris Corner Detection

Corner Detection - Shi-Tomasi Detection

Edge Detection

Grid Detection

Contour Detection

Feature Matching

Feature Matching

Watershed Algorithm

Custom Seeds with Watershed Algorithm

Introduction to Face Detection

Face Detection with OpenCV

Detection Assessment

Object Tracking

Introduction to Object Tracking

Optical Flow

Optical Flow Coding with OpenCV


MeanShift and CamShift Tracking Theory

MeanShift and CamShift Tracking with OpenCV

Overview of various Tracking API Methods

Tracking APIs with OpenCV

Deep Learning for Computer Vision

Understanding Classification Metrics

Introduction to YOLO v3

YOLO Weights Download

YOLO v3 with Python


NLP - Chatbots
Text Mining
Natural Language Processing Basics

Introduction to Natural Language Processing

What is Natural Language Processing?

Tokenization - Part One

Stemming

Lemmatization

Stop Words

Phrase Matching and Vocabulary

Part of Speech Tagging and Named Entity Recognition

Introduction to Section on POS and NER

Part of Speech Tagging

Named Entity Recognition

Sentence Segmentation

Text Classification

Introduction to Text Classification

Classification Metrics

Confusion Matrix

Text Feature Extraction

Semantics and Sentiment Analysis

Overview of Semantics and Word Vectors

Semantics and Word Vectors with spaCy

Sentiment Analysis with NLTK

Topic Modeling
Latent Dirichlet Allocation Overview

Deep Learning for NLP

The Basic Perceptron Model

Introduction to Neural Networks

Keras Basics

Recurrent Neural Network

LSTMs, GRU, and Text Generation

Chat Bots Overview

Creating Chat Bots with Python


Intro to R Programming
Introduction to R
Business Analytics
Analytics concepts
The importance of R in analytics
R Language community and eco-system
Usage of R in industry
Installing R and other packages
Perform basic R operations using command line
Usage of the RStudio IDE and various GUIs

R Programming Concepts
The data types in R and their uses
Built-in functions in R
Subsetting methods
Summarize data using functions
Use of functions like head() and tail() for inspecting data
Use-cases for problem solving using R

Data Manipulation in R
Various phases of Data Cleaning
Functions used in Inspection
Data Cleaning Techniques
Uses of functions involved
Use-cases for Data Cleaning using R

Data Import Techniques in R


Import data from spreadsheets and text files into R
Importing data from statistical formats
Packages installation for database import
Connecting to RDBMS from R using ODBC and basic SQL queries in R
Web Scraping
Other concepts on Data Import Techniques

Exploratory Data Analysis (EDA) using R


What is EDA?
Why do we need EDA?
Goals of EDA
Types of EDA
Implementing EDA
Boxplots, cor() in R
EDA functions
Multiple packages in R for data analysis
Some fancy plots
Use-cases for EDA using R

Data Visualization in R
Story telling with Data
Principal tenets
Elements of Data Visualization
Infographics vs Data Visualization
Data Visualization & Graphical functions in R
Plotting Graphs
Customizing Graphical Parameters to improve the plots
Various GUIs
Spatial Analysis
Other Visualization concepts
Apache Spark
using Scala Module 1 Apache Spark

Module 2 Introduction to Scala

Module 3 Spark Core Architecture

Module 4 Spark Internals

Module 5 Introducing MLlib

Module 1 Apache Spark


Introduction to Apache Spark
Why Spark
Batch Vs. Real Time Big Data Analytics

Batch Analytics - Hadoop Ecosystem Overview
Real Time Analytics Options
Streaming Data - Storm
In Memory Data - Spark
What is Spark?
Spark benefits to Professionals
Limitations of MR in Hadoop
Components of Spark
Spark Execution Architecture
Benefits of Apache Spark
Hadoop vs Spark
Module 3 Spark Core Architecture
Spark & Distributed Systems
Spark for Scalable Systems
Spark Execution Context
What is RDD
RDD Deep Dive
RDD Dependencies
RDD Lineage
Spark Application In Depth
Spark Deployment
Parallelism in Spark
Caching in Spark
Module 4 Spark Internals & Spark SQL
Spark Transformations
Spark Actions
Spark Cluster
Spark SQL Introduction
Spark Data Frames
Spark SQL with CSV
Spark SQL with JSON
Spark SQL with Database
Module 1 Big Data and Hadoop Introduction
What is Big Data and Hadoop?
Challenges of Big Data
Traditional approach Vs Hadoop
Hadoop Architecture
Distributed Model
Block structure File System
Technologies supporting Big Data
Replication
Fault Tolerance
Why Hadoop?
Hadoop Eco-System
Use cases of Hadoop
Fundamental Design Principles of Hadoop
Comparison of Hadoop Vs RDBMS

Module 2 Map Reduce Concepts


What is Map Reduce?
Why Map Reduce?
Map Reduce in real world.
Map Reduce Flow
What is Mapper?
What is Reducer?
What is Shuffling?
Word Count Problem
Distributed Word Count Flow & Solution
Log Processing and Map Reduce

Module 3 HIVE


Hive Fundamentals & Architecture
Loading and Querying Data in Hive
Hands-On Exercise
Hive Architecture and Installation
Comparison with Traditional Database
HiveQL: Data Types, Operators and Functions
Hands-On Exercise
Hive Tables, Managed Tables and External Tables
Hands-On Exercise
Partitions and Buckets
Hands-On Exercise
Storage Formats, Importing Data, Altering Tables, Dropping Tables
Hands-On Exercise
Querying Data, Sorting and Aggregating, Map Reduce Scripts
Hands-On Exercise
Joins & Sub queries, Views
Hands-On Exercise
When to Use HIVE, Impala and Pig
Hands-On Exercise
Integration, Data manipulation with Hive
Hands-On Exercise
User Defined Functions
Hands-On Exercise
Appending Data into existing Hive Table
Hands-On Exercise
Static partitioning vs dynamic partitioning
Hands-On Exercise
Azure Machine Learning
Azure Machine Learning Workflow
Getting Access to Azure Machine Learning
Azure Machine Learning Studio
Creating Models using Azure ML
Getting and Exploring Data using Azure ML
Selecting the Algorithm and Training the Model
Evaluating the Trained Model
Deep into Azure Machine Learning

Data Processing
Data Input-Output - Upload Data
Data Input-Output - Convert and Unpack
Data Input-Output - Import Data
Data Transform - Add Rows/Columns, Remove Duplicates, Select Columns
Data Transform - Apply SQL Transformation, Clean Missing Data, Edit Metadata
Sample and Split Data - How to Partition or Sample, Train and Test Data

Classification
Logistic Regression - What is Logistic Regression?
Logistic Regression - Build Two-Class Loan Approval Prediction Model
Logistic Regression - Understand Parameters and Their Impact
Understanding the Confusion Matrix, AUC, Accuracy, Precision, Recall and F1 Score
Logistic Regression - Understanding the results
Logistic Regression - Model Selection and Impact Analysis
Logistic Regression - Build Multi-Class Wine Quality Prediction Model
Decision Tree - What is Decision Tree?
Decision Tree - Ensemble Learning - Bagging and Boosting

Two Class Decision Forest - Income Prediction


Decision Tree - Multi Class Decision Forest
SVM - What is Support Vector Machine?
SVM - Census Income Prediction

Hyperparameter Tuning
Tune Hyperparameter for Best Parameter Selection

Regression Analysis
What is Linear Regression?
Regression Analysis - Common Metrics
Linear Regression model using OLS
Linear Regression - R Squared
Gradient Descent
Clustering
What is Cluster Analysis?
Cluster Analysis Experiment 1
Cluster Analysis Experiment 2 - Score and Evaluate

Recommendation System
What is a Recommendation System?
Data Preparation using Recommender Split
What is Matchbox Recommender and Train Matchbox Recommender
How to Score the Matchbox Recommender?
Restaurant Recommendation
Understanding the Recommendation Results

Data to Azure Machine Learning


Getting and Exploring Data using Azure ML
Data Pre-processing using Azure ML
Incorporating Code
Selecting and Training an Algorithm
Evaluating Your Trained Model
Understanding Evaluation
Comparing Model Results
Deploy Webservice
Azure ML Webservice - Prepare the experiment for webservice
Deploy Machine Learning Model As a Web Service
Use the Web Service