Syllabus
PDS1501 Credits :4
REFERENCES:
1. Jojo Moolayil, “Smarter Decisions : The Intersection of IoT and Data Science”,
PACKT, 2016.
2. Cathy O’Neil and Rachel Schutt , “Doing Data Science”, O'Reilly, 2015.
3. David Dietrich, Barry Heller, Beibei Yang, “Data Science and Big data Analytics”,
EMC 2013
4. Raj, Pethuru, “Handbook of Research on Cloud Infrastructures for Big Data
Analytics”, IGI Global.
SEMESTER – I Hours/week: 4
PDS1502 Credits :4
REFERENCES:
1. Gupta, S.C. and Kapoor, V.K.: “Fundamentals of Mathematical Statistics”, Sultan
Chand & Sons, New Delhi, 11th Ed, 2002.
2. Hastie, Trevor, et al. “The Elements of Statistical Learning”, Springer, 2009.
3. Practical Statistics for Data Scientists, 2nd Edition, Peter Bruce, Andrew Bruce and
Peter Gedeck, May 2020
4. Statistics for Machine Learning, By Pratap Dangeti, July 2017
SEMESTER – I Hours/week: 5
PDS1503 Credits :5
PYTHON FOR DATA SCIENCE
Unit – I: Data Structures and OOP
Python Program Execution Procedure – Statements – Expressions – Flow of Controls –
Functions – Numeric Data Types – Sequences – Strings – Tuples – Lists – Dictionaries.
Series and DataFrame data structures in pandas - Creation of DataFrames – Accessing the
columns in a DataFrame - Accessing the rows in a DataFrame - pandas Index Objects -
Reindexing Series and DataFrames - Dropping entries from Series and DataFrames -
Indexing, Selection and Filtering in Series and DataFrames - Arithmetic Operations between
DataFrames and Series - Function Application and Mapping.
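The pandas topics in this unit can be tried out in a few lines. The following is a minimal sketch, assuming pandas is installed; the student names and marks are made up for illustration and are not part of the syllabus.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
marks = pd.Series([72, 85, 90], index=["asha", "ravi", "meena"])
frame = pd.DataFrame(
    {"maths": [72, 85, 90], "stats": [65, 78, 88]},
    index=["asha", "ravi", "meena"],
)

# Accessing a column (by name) and a row (by index label).
maths_column = frame["maths"]
ravi_row = frame.loc["ravi"]

# Reindexing introduces NaN for labels that were not present before.
reindexed = marks.reindex(["asha", "ravi", "meena", "john"])

# Dropping an entry returns a new object; the original is unchanged.
without_ravi = frame.drop("ravi")

# Filtering rows with a boolean condition.
high_maths = frame[frame["maths"] > 80]

# Arithmetic between a DataFrame and a Series aligns on column labels.
centered = frame - frame.mean()

# Function application per column, and element-wise mapping on a Series.
col_range = frame.apply(lambda col: col.max() - col.min())
grades = marks.map(lambda m: "A" if m >= 85 else "B")
```

Each operation returns a new object, so the original `marks` and `frame` can be inspected again after every step.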
REFERENCES:
1. Gowrishankar and Veena, “Introduction to Python Programming”, CRC Press,
2019.
2. Python Crash Course, 2nd Edition, By Eric Matthes, May 2019
3. NumPy Essentials, By Leo Chin and Tanmay Dutta, April 2016
4. Joel Grus, “Data Science from scratch”, O'Reilly, 2015.
5. Wes McKinney, “Python for Data Analysis”, O'Reilly Media, 2012.
6. Kenneth A. Lambert, (2011), “The Fundamentals of Python: First Programs”,
Cengage Learning
7. Jake Vanderplas. Python Data Science Handbook: Essential Tools for
Working with Data 1st Edition.
SEMESTER – I Hours/week: 4
PDS1504 Credits :4
LIST OF EXERCISES:
1. Editing and executing Programs involving Flow Controls.
2. Editing and executing Programs involving Functions.
3. Program in String Manipulations
4. Creating and manipulating a Tuple
5. Creating and manipulating a List
6. Creating and manipulating a Dictionary
7. Object Creation and Usage
8. Program involving Inheritance
9. Program involving Overloading
10. Reading and Writing with Text Files and Binary Files
11. Combining and Merging Data Sets
12. Program involving Regular Expressions
13. Data Aggregation and GroupWise Operations
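As a starting point for the exercises on object creation, inheritance and overloading, here is a minimal sketch. The class names `Rectangle` and `Square` are hypothetical and chosen only for illustration.

```python
class Rectangle:
    """A simple class: object creation and an ordinary method."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def __eq__(self, other):
        # Operator overloading: "==" compares the side lengths.
        if not isinstance(other, Rectangle):
            return NotImplemented
        return (self.width, self.height) == (other.width, other.height)

    def __lt__(self, other):
        # Operator overloading: "<" compares areas, so min() and sorted() work.
        return self.area() < other.area()

    def __repr__(self):
        return f"{type(self).__name__}({self.width}, {self.height})"


class Square(Rectangle):
    """Inheritance: a square reuses everything defined on Rectangle."""

    def __init__(self, side):
        super().__init__(side, side)


shapes = [Rectangle(2, 5), Square(3), Rectangle(1, 4)]
smallest = min(shapes)  # relies on the overloaded "<"
```

Because `Square` inherits `area`, `__eq__` and `__lt__`, it can be mixed freely with `Rectangle` objects in comparisons.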
SEMESTER – I Hours/week: 4
PDS1505 Credits :4
RDBMS LAB
1. Creating a database
2. Creating a table
3. Inserting records in a table
4. Altering the table structure
5. Deleting data from a table
6. Updating data in a table
7. Select command
8. Where clause
9. Aggregate functions
10. Numeric functions ( Absolute, ceiling, floor, modulo, round off, square, Square Root,
power)
11. Constraints
12. Group By, Having
13. Operators (and, or, not between, in, not in, is null, is not null, like, Order By)
14. String Functions (Lower, Upper, Replace, left-trim, right-trim, substring, Length,
rename)
15. Drop (table, database)
16. Truncate
17. Sub Queries, Alias
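Several of the lab topics above can be rehearsed without a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. This is an assumption made for illustration, since the syllabus does not name the RDBMS used in the lab. The `employee` table and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database; connecting creates it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating a table with constraints.
cur.execute(
    """
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
    """
)

# Inserting records (hypothetical data).
cur.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [
        ("Asha", "sales", 30000),
        ("Ravi", "sales", 42000),
        ("Meena", "hr", 38000),
        ("John", "hr", 25000),
        ("Kavi", "it", 50000),
    ],
)

# Updating and deleting with a WHERE clause.
cur.execute("UPDATE employee SET salary = salary + 2000 WHERE name = ?", ("John",))
cur.execute("DELETE FROM employee WHERE name = ?", ("Kavi",))

# Aggregate functions with GROUP BY, HAVING and ORDER BY.
summary = cur.execute(
    """
    SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employee
    GROUP BY dept
    HAVING AVG(salary) > 30000
    ORDER BY dept
    """
).fetchall()

# A sub query combined with table and column aliases.
top_earner = cur.execute(
    """
    SELECT e.name AS top_earner
    FROM employee AS e
    WHERE e.salary = (SELECT MAX(salary) FROM employee)
    """
).fetchall()

conn.close()
```

Function names and some syntax (for example string functions and `TRUNCATE`) differ between database systems, so the statements may need adjusting for the system used in the lab.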
SEMESTER – I Hours/week: 5
PDS1506 Credits :5
MACHINE LEARNING
Unit – I: Introduction
Machine Learning Foundations – Overview – Design of a Learning System – Types of
Machine Learning – Supervised Learning and Unsupervised Learning – Mathematical
Foundations of Machine Learning – Applications of Machine Learning.
LIST OF EXERCISES:
1. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
2. Assuming a set of documents that need to be classified, use the naïve Bayesian
classifier model to perform this task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision, and recall for your data set.
3. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions. Java/Python ML library classes can be used
for this problem.
4. Write a program to implement the Logistic Regression algorithm to classify the housing price
data set. Print both correct and wrong predictions. Java/Python ML library classes can be
used for this problem.
5. Write a program to implement and compare the SVM, KNN and Logistic Regression algorithms
to classify the iPhone purchase records data set. Print both correct and wrong predictions.
Java/Python ML library classes can be used for this problem.
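For the k-Nearest Neighbour exercise, the idea of separating correct and wrong predictions can be sketched in plain Python before moving to an ML library. The measurements below are made-up values in the style of the iris data, not the real data set, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (petal length, petal width) pairs with labels.
train = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]
test = [
    ((1.6, 0.2), "setosa"),
    ((4.6, 1.3), "versicolor"),
    ((3.0, 0.8), "versicolor"),  # deliberately ambiguous point
]

correct, wrong = [], []
for features, actual in test:
    predicted = knn_predict(train, features)
    bucket = correct if predicted == actual else wrong
    bucket.append((features, actual, predicted))

accuracy = len(correct) / len(test)
print("correct:", correct)
print("wrong:", wrong)
print("accuracy:", accuracy)
```

The same correct/wrong bookkeeping carries over unchanged when the classifier is replaced by a library implementation.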
SEMESTER – II Hours/week: 4
PDS2501 Credits :4
STATISTICAL INFERENCE
UNIT – V:
Non-parametric tests – Kolmogorov-Smirnov test, Sign test, Wald-Wolfowitz run test, run
test for randomness, median test, Wilcoxon test and Wilcoxon-Mann-Whitney U test.
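As an illustration of one of these tests, the sign test for paired samples can be computed exactly from the binomial distribution. This is a minimal sketch: the paired observations are invented, the function name `sign_test` is hypothetical, and ties are simply dropped, which is one common convention.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test for paired samples; tied pairs are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)


# Hypothetical paired measurements.
before = [12, 15, 11, 14, 13, 16, 10, 12]
after = [14, 17, 11, 15, 16, 15, 13, 14]

positives, n, p_value = sign_test(before, after)
print(positives, n, p_value)
```

Under the null hypothesis of no systematic change, positive and negative differences are equally likely, so the number of positive signs follows a Binomial(n, 1/2) distribution.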
REFERENCE BOOKS
1. Gupta, S.C. and Kapoor, V.K.: “Fundamentals of Mathematical Statistics”, Sultan
Chand & Sons, New Delhi, 11th Ed, 2002.
2. Rohatgi, V.K.: “Statistical Inference”, John Wiley and Sons, 1984.
3. Hogg, R.V., Craig, A.T. and Tannis: “Introduction to Mathematical Statistics”, Prentice
Hall, England, 1995.
4. Dudewicz, E.J. and Mishra, S.N.: “Modern Mathematical Statistics”, John Wiley and Sons,
1988.
SEMESTER – II Hours/week: 4
PDS2502 Credits :4
REFERENCES:
1. Tomasz Drabas, “Learning PySpark”, PACKT, 2017.
2. Padma Priya Chitturi, “Apache Spark for Data Science”, PACKT, 2017.
3. Holden Karau, “Learning Spark”, PACKT, 2016.
4. Sandy Ryza, “Advanced Analytics with Spark”, O’Reilly, 2016.
5. Romeo Kienzler, “Mastering Apache Spark”, PACKT, 2017.
SEMESTER – II Hours/week: 4
PDS2503 Credits :4
LIST OF EXERCISES:
1. Program involving Resilient Distributed Datasets
2. Program involving Transformations and Actions
3. Program involving Key-Value Resilient Distributed Datasets
4. Program involving Local Variables, Broadcast Variables and Accumulators
5. Program involving Filter, Join, GroupBy, Agg operations
6. Viewing and Querying Temporary Tables
7. Transferring, Summarizing and Analysing Twitter data
8. Program involving Flume, Kafka and Kinesis
9. Program involving DStreams and DStream RDDs
10. Linear Regression
11. Decision Tree Classification
12. Principal Component Analysis
13. Random Forest Classification
14. Text Pre-processing with TF-IDF
15. Naïve Bayes Classification
16. K-Means Clustering
SEMESTER – II Hours/week: 4
PDS2504 Credits :4
NOSQL DATABASES
Unit – V: Cassandra
Introduction – Features - Data types – CQLSH - Keyspaces - CRUD operations – Collections –
Counter – TTL - Alter commands - Import and Export - Querying System tables.
SEMESTER – II Hours/week: 4
PDS2505 Credits :4
Exercises on HDFS
Exercises on Apache Hive as an HDFS Data Warehouse
Exercises on HBase
Exercises on MongoDB
Exercises on Cassandra
Exercises on Neo4j
SEMESTER – II Hours/week: 4
PDS2601 Credits :3
Unit: I
Introduction: Meaning - Importance of Financial Analytics - Uses - Features - Documents used in
Financial Analytics: Balance Sheet, Income Statement, Cash Flow Statement - Elements of
Financial Health: Liquidity, Leverage, Profitability. Financial Securities: Bond and Stock
investments - Housing and Euro crisis - Securities Datasets and Visualization - Plotting
multiple series.
Unit: II
Using Excel to Summarize Data, Slicing and Dicing Financial Data with PivotTables, Excel
Charts to Summarize Marketing Data. Excel Functions to Summarize Data, Pricing
Analytics, Risk based pricing, Fraud Detection and Prediction, Recovery Management, Loss
Risk Forecasting, Risk Profiling, Portfolio Stress Testing.
Unit: III
Descriptive Analytics: Data Exploration, Dimension Reduction and Data Clustering,
Geographical Mapping, Market Basket Analysis. Predictive Analytics: Fraud Detection, Churn
Analysis, Crime Mapping. Content Analytics: Sentiment Analysis.
Unit: IV
Forecasting Analytics: Estimating Demand Curves and Optimizing Price, Price Bundling,
Non-Linear Pricing and Price Skimming, Forecasting, Simple Regression and Correlation, Multiple
Regression to forecast sales. Modelling Trend and Seasonality: Ratio to Moving Average
Method, Winters' Method.
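The ratio-to-moving-average method mentioned in this unit can be sketched in a few lines of Python. The sketch assumes an even seasonal period (quarterly data here) and a multiplicative seasonal pattern. The sales figures and the function name `seasonal_indices` are hypothetical.

```python
def seasonal_indices(series, period=4):
    """Seasonal indices by the ratio-to-moving-average method (even period)."""
    half = period // 2
    ratios = {season: [] for season in range(period)}
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # Centered moving average: the two end points get half weight.
        cma = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        ratios[t % period].append(series[t] / cma)
    raw = [sum(values) / len(values) for values in ratios.values()]
    # Normalise so that the indices average to 1 over a full cycle.
    scale = period / sum(raw)
    return [value * scale for value in raw]


# Hypothetical quarterly sales for three years.
quarterly_sales = [80, 112, 135, 84, 88, 123, 148, 92, 97, 135, 162, 101]
indices = seasonal_indices(quarterly_sales)
deseasonalised = [y / indices[t % 4] for t, y in enumerate(quarterly_sales)]
print([round(i, 3) for i in indices])
```

Dividing each observation by its seasonal index leaves the trend, which can then be forecast (for example by regression) and multiplied back by the index for the target quarter.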
UNIT - V
Analyzing financial data and implementing financial models using R. Process of data analytics
using R: obtaining publicly available data, refining such data, implementing the models and
generating typical output. Prices and individual security returns, Portfolio returns, Risks, Factor
Models.
TEXTBOOKS
REFERENCE BOOKS
Analyzing Financial Data and Implementing Financial Models Using R, Clifford S.
Ang, Springer.
Microsoft Excel 2013: Data Analysis and Business Modeling, Wayne L.
Winston, Microsoft Publishing
SEMESTER – II Hours/week: 4
PDS2602 Credits :3
UNIT I
Introduction
Introduction to Healthcare Data Analytics- Electronic Health Records– Components of EHR-
Coding Systems- Benefits of EHR- Barriers to Adopting EHR- Challenges- Phenotyping
Algorithms.
Unit II
Image Analysis
Biomedical Image Analysis- Mining of Sensor Data in Healthcare- Biomedical Signal
Analysis- Genomic Data Analysis for Personalized Medicine.
Unit III
Data Analytics
Natural Language Processing and Data Mining for Clinical Text- Mining the Biomedical
Literature- Social Media Analytics for Healthcare.
Unit IV
Advanced Data Analytics
Advanced Data Analytics for Healthcare– Review of Clinical Prediction Models- Temporal
Data Mining for Healthcare Data- Visual Analytics for Healthcare- Predictive Models for
Integrating Clinical and Genomic Data- Information Retrieval for Healthcare- Data
Publishing Methods in Healthcare.
Unit V
Applications
Applications and Practical Systems for Healthcare– Data Analytics for Pervasive Health-
Fraud Detection in Healthcare- Data Analytics for Pharmaceutical Discoveries- Clinical
Decision Support Systems- Computer-Assisted Medical Image Analysis Systems- Mobile
Imaging and Analytics for Biomedical Data.
TEXT BOOKS
Chandan K. Reddy and Charu C Aggarwal, “Healthcare data analytics”, Taylor &
Francis, 2015.
REFERENCE BOOKS
Hui Yang and Eva K. Lee, “Healthcare Analytics: From Data to Knowledge to
Healthcare Improvement”, Wiley, 2016.
SEMESTER – II Hours/week: 3
P__2901 Credits :2
TEXTBOOKS:
• Fundamentals of Data Visualization, By Claus O. Wilke, April 2019
• Visual Analytics with Tableau, By Alexander Loth, May 2019
SEMESTER – III Hours/week: 4
PDS3501 Credits :4
REFERENCES:
1. Joseph F. Hair, William C. Black et al., “Multivariate Data Analysis”, Pearson
Education, 7th edition, 2013.
2. T. W. Anderson, “An Introduction to Multivariate Statistical Analysis”, 3rd Edition,
Wiley, 2003.
3. William R. Dillon, “Multivariate Analysis: Methods and Applications”, John Wiley &
Sons, 1984.
4. Naresh K. Malhotra, Satyabhusan Dash, “Marketing Research: An Applied
Orientation”, Pearson, 2011.
SEMESTER – III Hours/week: 4
PDS3502 Credits :4
DEEP LEARNING
Unit – I: Artificial Neural Networks
The Neuron – Activation Function – Gradient Descent – Stochastic Gradient Descent – Back
Propagation – Business Problem.
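The neuron, activation function and gradient descent topics can be tied together with a single sigmoid neuron trained on the logical AND function. This is a minimal sketch in plain Python. The learning rate, epoch count and the function name `train_neuron` are illustrative assumptions, and it uses batch gradient descent (stochastic gradient descent would instead update the weights after each sample).

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=5000):
    """Train one sigmoid neuron with batch gradient descent on cross-entropy loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in samples:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            error = p - y  # gradient of the loss w.r.t. the pre-activation
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        # Step against the average gradient.
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


# Truth table of logical AND as training data.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(and_samples)
predictions = [
    round(sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias))
    for x, _ in and_samples
]
print(predictions)
```

Back propagation generalises the `error` term above: in a multi-layer network the same quantity is passed backwards through each layer using the chain rule.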
REFERENCES:
1. François Chollet, “Deep Learning with Python”, Manning, 2017.
2. Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence, By
Jon Krohn, Grant Beyleveld and Aglaé Bassens, September 2019
3. Ian Goodfellow, “Deep Learning”, MIT Press, 2017.
4. Josh Patterson, “Deep Learning: A Practitioner’s Approach”, PACKT, 2017.
5. Dipayan Dev, “Deep Learning with Hadoop”, PACKT, 2017.
6. Hugo Larochelle’s Video Lectures on Deep Learning
SEMESTER – III Hours/week: 4
PDS3503 Credits :4
DEEP LEARNING - LAB
LIST OF EXERCISES:
1. Setting up the Spyder IDE Environment and Executing a Python Program
2. Installing Keras, TensorFlow and PyTorch libraries and making use of them
3. Artificial Neural Networks
4. Convolutional Neural Networks
5. Image Transformations
6. Image Gradients and Edge Detection
7. Image Contours
8. Image Segmentation
9. Harris Corner Detection
10. Face Detection using Haar Cascades
11. Chatbot Creation
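For the exercise on image gradients and edge detection, the underlying computation can be sketched without any imaging library. The example below applies the two 3x3 Sobel kernels to a tiny made-up grayscale image stored as a list of lists. In the lab itself an image library would normally do this work, and the helper names here are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to horizontal edges


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(
        kernel[i][j] * image[row - 1 + i][col - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def gradient_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            result[r][c] = math.hypot(gx, gy)
    return result


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = gradient_magnitude(image)
print(edges[2])
```

Large magnitudes appear only where the intensity changes, which is what edge detectors threshold to mark edges.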
SEMESTER – III Hours/week: 4
PDS3504 Credits :4
CLOUD COMPUTING
Unit – I: Introduction
Evolution of Cloud Computing – Essential Characteristics of Cloud Computing – Operational
models such as private, dedicated, virtual private, community, hybrid and public cloud –
Service models such as IaaS, PaaS and SaaS – Governance and Change Management –
Business drivers, metrics and typical use cases. Example cloud vendors – Google Cloud
Platform, Amazon AWS, Microsoft Azure, Pivotal Cloud Foundry and OpenStack.
Unit – II: Infrastructure Services
Basics of Virtual Machines - Taxonomy of Virtual Machines. Virtualization Architectures.
Challenges with Dynamic Infrastructure - Principles of Infrastructure as Code -
Considerations for Infrastructure Services and Tools - Monitoring: Alerting, Metrics, and
Logging - Service Discovery - Server Provisioning via Templates - Patterns and Practices for
Continuous Deployment - Organizing Infrastructure and Testing Infrastructure - Change
Management Pipelines for Infrastructure.
Unit – III: Platform Engineering
Cloud Native Design and Microservices – Containerized - Dynamically orchestrated design –
Continuous delivery - Support for a variety of client devices – Monolithic vs Microservices
Architecture - Characteristics of microservice architecture – 12 factor application design -
Considering service granularity – Scalable Services - Sharing dependencies between
microservices - Stateless versus Stateful microservices - Service discovery – Service Registry
– Performance Considerations.
Unit – IV: Serverless Architecture and DevOps
Function as a Service (FaaS) - Backend as a Service (BaaS) - Advantages of serverless
architectures - Taking a hybrid approach to serverless architecture - Function deployment and
Function invocation. Introduction to DevOps - The Deployment Pipeline - The Overall
Architecture - Building and Testing - Deployment - Crosscutting Concerns such as
Monitoring, Scalability, Repeatability, Reliability, Recoverability, Interoperability,
Testability, and Modifiability.
Unit – V: Cloud Security
Security Considerations – STRIDE Threat Model - Cloud Security Challenges – Cloud
specific Cryptographic Techniques – CIA Triad – Security by Design – Common Security
Risks - Risk Management – Security Monitoring – Security Architecture Design – Data
Security – Application Security – Virtual Machine Security.
REFERENCES:
1. Dr. Anand Nayyar, (2019), “Handbook of Cloud Computing”, BPB
2. Mastering Azure Machine Learning, By Christoph Körner and Kaijisse
Waaijer, April 2020
3. Hands-On Machine Learning on Google Cloud Platform, By Giuseppe
Ciaburro, V Kishore Ayyadevara and Alexis Perrier, April 2018
4. Learning Path: AWS Certified Machine Learning-Specialty ML, By Noah
Gift, April 2019
5. Software Architect's Handbook, by Joseph Ingeno, Published by Packt
Publishing, 2018
6. Architecting Cloud Computing Solutions by Scott Goessling, Kevin L.
Jackson, Publisher: Packt Publishing, Release Date: May 2018
7. Microservices: Flexible Software Architecture, by Eberhard Wolff, Publisher:
Addison-Wesley Professional, Release Date: October 2016
SEMESTER – III Hours/week: 4
PDS3601 Credits :3
UNIT IV - SYNTAX
Basic Concepts of Syntax – Parsing Techniques – General Grammar rules for Indian
Languages – Context Free Grammar – Parsing with Context Free Grammars – Top Down
Parser – Earley Algorithm – Features and Unification - Lexicalised and Probabilistic Parsing.
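Top-down parsing with a context free grammar can be illustrated with a small backtracking parser. The toy English grammar and the function names below are hypothetical and chosen for brevity. The sketch also assumes the grammar has no left recursion, which a plain top-down parser cannot handle (the Earley algorithm listed above does not have this restriction).

```python
# Toy grammar: each non-terminal maps to a list of productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way symbol derives tokens[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand(symbol, production, tokens, pos)


def expand(symbol, production, tokens, pos):
    """Match the parts of one production left to right, keeping all alternatives."""
    states = [([], pos)]
    for part in production:
        states = [
            (children + [tree], nxt)
            for children, cur in states
            for tree, nxt in parse(part, tokens, cur)
        ]
    for children, end in states:
        yield (symbol, children), end


def parse_sentence(sentence):
    """Return all parse trees for S that consume the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


trees = parse_sentence("the dog sees a cat")
print(trees)
```

The parser starts from the start symbol `S` and expands productions until terminals are matched against the input, which is the defining feature of the top-down strategy.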
REFERENCES:
1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing”, Prentice
Hall, 2009.
2. Christopher D. Manning and Hinrich Schutze, “Foundations of Statistical Natural
Language Processing”, MIT Press, 1999.
3. Roland Hausser, “Foundations of Computational Linguistics”, Springer-Verlag, 1999.
4. James Allen, “Natural Language Understanding”, Benjamin/Cummings Publishing
Co. 1995.
5. Applied Natural Language Processing with Python: Implementing Machine Learning
and Deep Learning Algorithms for Natural Language Processing, By Taweh Beysolow
II, September 2018
SEMESTER – III Hours/week: 4
PDS3602 Credits :3
Here is a rough outline of topics and the number of lectures to be spent on each topic:
Textbooks: