2nd Sem, Data Science Syllabus
Semester-I
Course code: MDS41MMP501 Course name: Practical Based on Introduction to Data Science
Course category: Major Mandatory Credits: 1
Course Objectives: Understand and learn the lifecycle and phases of data science and work comfortably
with data science projects
Course Outcomes: At the end of the course, the students will be able to -
CO1: To develop fundamental knowledge of concepts underlying data science
CO2: To develop practical data analysis skills, which can be applied to practical problems
CO3: To explain how math and information sciences can contribute to building better algorithms and
software
CO4: To develop applied experience with data science software, programming, applications and processes
Contents –
Unit | Content | Teaching hours
1 | To create records in Excel using different data types. | 1
2 | To create student mark details by applying at least ten statistical functions. | 1
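As a rough illustration of Unit 2, the same ten statistics can be computed outside Excel. The sketch below uses Python's standard `statistics` module. The marks list and the `summarize` helper are hypothetical, and the Excel function names in the comments are the usual equivalents rather than a list prescribed by the syllabus.

```python
import statistics

# Hypothetical marks for illustration; not taken from the syllabus.
marks = [72, 85, 91, 64, 85, 78, 59, 88]


def summarize(values):
    """Return ten summary statistics for a list of marks."""
    return {
        "count": len(values),                     # Excel: COUNT
        "sum": sum(values),                       # Excel: SUM
        "mean": statistics.mean(values),          # Excel: AVERAGE
        "median": statistics.median(values),      # Excel: MEDIAN
        "mode": statistics.mode(values),          # Excel: MODE.SNGL
        "min": min(values),                       # Excel: MIN
        "max": max(values),                       # Excel: MAX
        "range": max(values) - min(values),       # Excel: MAX minus MIN
        "variance": statistics.variance(values),  # Excel: VAR.S (sample variance)
        "stdev": statistics.stdev(values),        # Excel: STDEV.S (sample std. dev.)
    }


summary = summarize(marks)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The sample (not population) variance and standard deviation are used here. In Excel the population versions would be VAR.P and STDEV.P.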
Text Books: Jeffrey S. Saltz, Jeffrey M. Stanton, Introduction to Data Science (Ebook), SAGE Publications.
Reference Books:
Online Resources: 1. NPTEL / SWAYAM lectures.
MGM Campus, N-6, CIDCO, Chhatrapati Sambhajinagar – 431003, Maharashtra, India. II mgmu.ac.in
Semester-II
Course code: MDS41MML504 Course name: Advances in Database Management System
Course category: Major Mandatory Credits: 3
Pre-requisites: Basic knowledge about computers including some experience using UNIX or Windows.
Understanding of creation and updating of tables.
Course Objectives: Students can design new databases and modify existing ones for new or existing
applications.
Course Outcomes: At the end of the course, the students will be able to -
CO1: To know the different issues involved in the design and implementation of a database system.
CO2: To use a data manipulation language to query, update, and manage a database.
CO3: To understand essential DBMS concepts such as database security, integrity, and concurrency.
CO4: To understand advanced application development.
Contents –
Unit | Content | Teaching hours
1 | Introduction to DBMS: Conceptual Database Design: Database Administrator (DBA), Database User, Database System, Database Architecture, DBMS Components. Data Models: Schemas and Instances, Database Languages, Database Structure, E-R Model, Strong and Weak Entity Sets. | 10
2 | SQL and Normalization: Features of SQL, Data Definition Language (DDL), Data Manipulation Language (DML), Views, Functions, Rollback, Commit and Savepoint, Indexes. Normalization: 1NF, 2NF, 3NF, 4NF, 5NF, Flowchart of Normalization, Domain Key Normal Form (DKNF). | 10
3 | Concurrency Control Techniques and Recovery: Lock-based Protocols, Deadlock Handling, Recovery System, Failure Classification, Storage Structure, Categorization of Recovery Algorithms, Log-based Recovery, Shadow Paging, Recovery with Concurrent Transactions, Checkpoints. | 10
4 | Parallel and Web Databases: Design of Parallel Databases, Parallel Query Evaluation, Advantages of Parallel Databases, Elements of Parallel Database Processing. Introduction to Web Databases, Accessing Database Through the Web, XML Database. Multimedia Databases, Mobile Databases, Digital Databases. | 10
5 | Advanced Application Development: Performance Tuning, Bottlenecks, Tuning the Database Design, Performance Simulation, Database Application Classes, Benchmark Suites, Standardization, Database Connectivity Standards, Object-Oriented Database Standards, Marketplaces, Secure Payment Systems, Legacy Systems. | 5
Text Books: 1. Rajiv Chopra, Database Management System (DBMS): A Practical Approach, S. Chand Publishing.
Reference Books: Avi Silberschatz, Henry F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill Education.
Online Resources: 1. NPTEL / SWAYAM lectures.
Semester-II
Course code: MDS41MML505 Course name: Data Mining and Visualization
Course category: Major Mandatory Credits: 3
Pre-requisites: Students should know the basic concepts of data structures and data analysis.
Course Objectives: Students will be able to actively manage and participate in data mining projects. To
develop research interest towards advances in data mining. Students will be able to understand the
visualization techniques
Course Outcomes: At the end of the course, the students will be able to -
CO1: Identify appropriate data mining algorithms to solve real world problems.
CO2: Compare and evaluate different data mining techniques like classification, prediction, clustering and
association rule mining.
CO3: Describe complex data types with respect to spatial data and data visualization.
CO4: Apply user experience towards research, innovation, and integration in the data mining area.
Contents –
Unit | Content | Teaching hours
1 | Introduction to Data Mining: Why Mine Data, Commercial Viewpoint, Scientific Viewpoint, Motivation, Definitions, Origins of Data Mining, Data Mining Tasks, Classification, Clustering, Association Rule Discovery, Sequential Pattern Discovery, Regression, Challenges of Data Mining. Data Mining Data: What is Data? Attribute Values, Measurement of Length, Types and Properties of Attributes, Discrete and Continuous Attributes, Types of data sets, Data Quality, Data Preprocessing, Aggregation, Sampling, Dimensionality Reduction, Feature subset selection, Feature creation, Discretization and Binarization, Attribute Transformation, Density. | 10
2 | Data Mining: Exploring Data, Data Exploration Techniques, Summary Statistics, Frequency and Mode, Percentiles, Measures of Location: Mean and Median, Measures of Spread: Range and Variance, Visualization, Representation, Arrangement, Selection, Visualization Techniques: Histograms, Box Plots, Scatter Plots, Contour Plots, Matrix Plots, Parallel Coordinates, Other Visualization Techniques, OLAP: OLAP Operations. | 10
3 | Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation: Classification: Definition, Classification Techniques, Tree Induction, Measures of Node Impurity, Practical Issues of Classification, ROC curve, Confidence Interval for Accuracy, Comparing Performance of Two Models, Comparing Performance of Two Algorithms. | 10
4 | Data Mining Classification: Alternative Techniques: Rule-Based Classifier, Rule Ordering Schemes, Building Classification Rules, Instance-Based Classifiers, Nearest Neighbor Classifiers, Bayes Classifier, Naive Bayes Classifier, Artificial Neural Networks (ANN), Support Vector Machines. | 10
5 | Introduction to Data Visualization: | 5
Semester-II
Course code: MDS41MML506 Course name: Machine Learning
Course category: Major Mandatory Credits: 3
Pre-requisites: Basic knowledge about Data Mining and data warehousing
Course Objectives: To introduce students to the basic concepts and techniques of Machine Learning. To
become familiar with regression methods, classification methods, clustering methods.
Course Outcomes: At the end of the course, the students will be able to -
CO1: Become familiar with dimensionality reduction techniques.
CO2: Identify machine learning techniques suitable for a given problem.
CO3: Design and implement machine learning solutions to classification, regression, and clustering
problems, and evaluate and interpret the results of the algorithms.
CO4: Understand data clustering.
Contents –
Unit | Content | Teaching hours
1 | Introduction to Machine Learning: Intelligent Machine, Machine Learning Problem, Applications, Data Representation, Domain Knowledge for Productive use of Machine Learning, Diversity of Data: Structured and Unstructured, Forms of Learning, Machine Learning and Data Mining. | 10
2 | Supervised Learning: Learning from Observations, Bias and Variance, Computational Learning Theory, Heuristic Search, Cross-Validation, Bootstrapping, Mean Square Error, Mean Absolute Error, Misclassification Error, Confusion Matrix, ROC Curves, Issues in Machine Learning. | 10
3 | Learning with SVM and NN: Learning with Support Vector Machine (SVM): Linear Discriminant Functions for Binary Classification, Perceptron Algorithm, Nonlinear Classifier, Linear Regression and Nonlinear Regression, SVM Techniques. Learning with Neural Network (NN): Neuron Model, Biological Neuron, Artificial Neuron, Network Architecture, Feedforward and Recurrent Network, Perceptron. | 10
4 | Data Clustering: Unsupervised Learning, Clustering, Data Analysis, Clustering Analysis, Data Transformation, Enhancing the Information Content of the Data, Partitional Clustering, Hierarchical Clustering, K-Means Clustering, Fuzzy K-Means Clustering, Expectation Maximization. | 10
5 | Business Intelligence and Data Mining: Basic Analytical Techniques, Data Warehousing, Intelligent Information Retrieval System: Text Retrieval, Image Retrieval, Audio Retrieval, Data Mining Applications, Data Mining Trends, Technologies for Big Data. | 5
Contents –
Unit | Content | Teaching hours
1 | Introducing Web Analytics: Defining Web Analytics, Quantitative and Qualitative Data, The Continuous Improvement Process, Measuring Outcomes, What Google Analytics Contributes, How Google Analytics Fits in the Analytics Ecosystem, Creating an Implementation Plan: Gather Business Requirements, Analyze and Document Website Architecture, Create an Account and Configure Your Profile, Configure the Tracking Code and Tag Pages, Tag Marketing Campaigns, Create Additional User Accounts and Configure Reporting Features, Perform Optional Configuration Steps. | 10
2 | How Google Analytics Works: Data Collection and Processing, Reports, About the Tracking Code, The Mobile Tracking Code, App Tracking, The (Very) Old Tracking Code: urchin.js, Understanding Pageviews. Tracking Visitor Clicks, Outbound Links, and Non-HTML Files: About the Tracking Cookies, Designing Blogs for Google Analytics. | 10
3 | Google Analytics Accounts and Profiles: Google Analytics Accounts, Creating a Google Analytics Account, Creating Additional Profiles, Access Levels, All About Profiles, Basic Profile Settings, Profile Name, Website URL, Time Zone, Default Page, Exclude URL Query Parameters, E-Commerce Settings, Tracking On-Site Search, Applying Cost Data. Filters: Filter Fields, Filter Patterns, Filter Type, Include/Exclude Filters, Search and Replace Filters, Lowercase/Uppercase Filters, Advanced Profile Filters, Predefined Filters. | 10
4 | Tracking Conversions with Goals and Funnels: Goals, Time on Site, Pages per Visit, URL Destinations, Additional Goal Settings, Tracking Defined Processes with Funnels. | 10
5 | Must-Have Profiles: Profile Roles, Raw Data Profile, Master Profile, Test Profile, Access-Based Profiles, Using Profiles to Segment Data, Exclude Internal Traffic, Include Valid Traffic, Force Request URI to Lowercase, Force Campaign Parameters to Lowercase, Keeping Track of Your Configuration Changes. | 5
Semester-II
Course code: MDS41MEL506 Course name: Neural Networks
Course category: Elective Credits: 3
Pre-requisites: Introduction to Machine Learning
Course Objectives: To study learning and modeling of the algorithms of Neural Networks.
Course Outcomes: At the end of the course, the students will be able to -
CO1: Learn about feedforward neural networks.
CO2: Learn the working of supervised learning.
CO3: Understand the modeling and applications of neural networks.
CO4: Understand the algorithms used in neural networks.
Contents –
Unit | Content | Teaching hours
1 | Introduction to Feedforward Neural Networks: Artificial Neurons, Neural Networks and Architectures: Neuron Abstraction, Neuron Signal Functions, Mathematical Preliminaries, Neural Networks Defined, Architectures: Feedforward and Feedback, Salient Properties and Application Domains of Neural Networks. | 10
2 | Geometry of Binary Threshold Neurons and Their Networks: Pattern Recognition and Data Classification, Convex Sets, Convex Hulls and Linear Separability, Space of Boolean Functions, Binary Neurons are Pattern Dichotomizers, Non-linearly Separable Problems, Capacity of a Simple Threshold Logic Neuron, Revisiting the XOR Problem, Multilayer Networks. | 10
3 | Supervised Learning: Supervised Learning I: Perceptrons and LMS: Learning and Memory, From Synapses to Behaviour: The Case of Aplysia, Learning Algorithms, Error Correction and Gradient Descent Rules, The Learning Objective for TLNs, Pattern Space and Weight Space, Perceptron Learning Algorithm, Perceptron Convergence Theorem. | 10
4 | Perceptron Learning: Perceptron Learning and Non-separable Sets, Handling Linearly Non-Separable Sets, α-Least Mean Square Learning, MSE Error Surface and its Geometry, Steepest Descent Search with Exact Gradient Information, μ-LMS: Approximate Gradient Descent, Application of LMS to Noise Cancellation. | 10
5 | Supervised Learning II: Backpropagation and Beyond: Multilayered Network Architectures, Backpropagation Learning Algorithm, Structure Growing Algorithms, Fast Relatives of Backpropagation, Universal Function Approximation and Neural Networks, Applications of Feedforward Neural Networks, Reinforcement Learning. | 5
Text Books: Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw Hill.
Reference Books: S. N. Sivanandam, S. Sumathi, S. N. Deepa, Introduction to Neural Networks Using MATLAB 6.0, Tata McGraw Hill.
Online Resources: 1. NPTEL / SWAYAM lectures.
Semester-II
Course code: MDS41MEL507 Course name: Optimization Techniques
Course category: Elective Credits: 3
Pre-requisites: Basics of statistics
Course Objectives: Introduction to optimization techniques using both linear and non-linear
programming.
Course Outcomes: At the end of the course, the students will be able to -
CO1: Learn linear algebra.
CO2: Learn linear programming.
CO3: Cast problems into optimization framework
CO4: Learn efficient computational procedures to solve optimization problems.
Contents –
Unit | Content | Teaching hours
1 | Mathematical preliminaries: Linear algebra and matrices, Vector space, Eigen analysis, Elements of probability theory, Elementary multivariable calculus. | 10
2 | Linear Programming: Introduction to linear programming model, Simplex method, Duality, Karmarkar's method. | 10
3 | Unconstrained optimization: Conjugate direction and quasi-Newton methods, Gradient-based methods, One-dimensional search methods. | 10
4 | Constrained Optimization: Lagrange theorem, FONC, SONC, and SOSC conditions, Projection methods. | 10
5 | KKT: KKT conditions, Non-linear constrained optimization models, Non-linear problems. | 5
Semester-II
Course code: MDS41MEL508 Course name: Business Intelligence
Course category: Elective Credits: 3
Pre-requisites: Introduction to Analytics techniques
Course Objectives: This course provides an introduction to the concepts of business intelligence (BI) as
components and functionality of information systems.
Course Outcomes: At the end of the course, the students will be able to -
CO1: Learn about an organization's business operations through the use of relevant data.
CO2: Explore how business problems can be solved effectively by using operational data to create data
warehouses.
CO3: Apply data mining tools and analytics to gain new insights into organizational operations.
CO4: Understand decision support systems and knowledge management systems.
Contents –
Unit | Content | Teaching hours
1 | Business intelligence: Effective and timely decisions, Data, information and knowledge, The role of mathematical models, Business intelligence architectures, Ethics and business intelligence. Decision support systems: Definition of system, Representation of the decision-making process, Evolution of information systems, Definition of decision support system, Development of a decision support system. | 10
2 | Mathematical models for decision making: Structure of mathematical models, Development of a model, Classes of models. Data mining: Definition of data mining, Representation of input data, Data mining process, Analysis methodologies. Data preparation: Data validation, Data transformation, Data reduction. | 10
3 | Classification: Classification problems, Evaluation of classification models, Bayesian methods, Logistic regression, Neural networks, Support vector machines. Clustering: Clustering methods, Partition methods, Hierarchical methods, Evaluation of clustering models. | 10
4 | Business intelligence applications: Marketing models: Relational marketing, Sales force management. Logistic and production models: Supply chain optimization, Optimization models for logistics planning, Revenue management systems. Data envelopment analysis: Efficiency measures, Efficient frontier, The CCR model, Identification of good operating practices. | 10
5 | Knowledge Management: Introduction to Knowledge Management, Organizational Learning and Transformation, Knowledge Management Activities, Approaches to Knowledge Management, Information Technology (IT) in Knowledge Management, Knowledge Management Systems Implementation, Roles of People in Knowledge Management. | 5
Text Books: Carlo Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making, Wiley, first edition, 2009.
Reference Books: Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence Systems, Pearson.
Online Resources: 1. NPTEL / SWAYAM lectures.
Semester-II
Course code: MDS41MMP504 Course name: Practical Based on ADBMS
Course category: Major Mandatory Credits: 1
Course Objectives: Students can design new databases and modify existing ones for new or existing
applications.
Course Outcomes: At the end of the course, the students will be able to -
CO1: To know the different issues involved in the design and implementation of a database system.
CO2: To use a data manipulation language to query, update, and manage a database.
CO3: To understand essential DBMS concepts such as database security, integrity, and concurrency.
CO4: To understand advanced application development.
Contents –
Unit | Content | Teaching hours
1 | Create Student Database | 1
2 | Create Student Detail and Marks Table | 1
3 | Insert Records to Student Detail and Marks Table | 1
4 | Display all records of Student Details and Marks Table at once | 1
5 | Index on table Student Details with the name and date of birth | 1
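The five exercises above can be sketched end to end with Python's built-in `sqlite3` module. This is only one possible setup: the syllabus does not name a database engine, and the table names, column names, and sample rows below are hypothetical choices made for illustration.

```python
import sqlite3

# Unit 1: an in-memory SQLite database stands in for the student database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unit 2: create the Student Detail and Marks tables (hypothetical schema).
cur.execute("""
    CREATE TABLE student_detail (
        roll_no       INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    )
""")
cur.execute("""
    CREATE TABLE marks (
        roll_no INTEGER NOT NULL REFERENCES student_detail(roll_no),
        subject TEXT NOT NULL,
        score   INTEGER NOT NULL
    )
""")

# Unit 3: insert made-up records into both tables.
cur.executemany(
    "INSERT INTO student_detail VALUES (?, ?, ?)",
    [(1, "Asha", "2002-04-15"), (2, "Ravi", "2001-11-30")],
)
cur.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [(1, "DBMS", 82), (1, "ML", 76), (2, "DBMS", 68), (2, "ML", 91)],
)

# Unit 5: index Student Detail on name and date of birth.
cur.execute(
    "CREATE INDEX idx_student_name_dob ON student_detail (name, date_of_birth)"
)
conn.commit()

# Unit 4: display the records of both tables at once with a join.
rows = cur.execute("""
    SELECT d.roll_no, d.name, d.date_of_birth, m.subject, m.score
    FROM student_detail AS d
    JOIN marks AS m ON m.roll_no = d.roll_no
    ORDER BY d.roll_no, m.subject
""").fetchall()
for row in rows:
    print(row)

# List the indexes SQLite now holds for the table (second column is the name).
index_names = [r[1] for r in cur.execute("PRAGMA index_list(student_detail)")]
print(index_names)
```

A join is used for Unit 4 on the assumption that "at once" means a single combined result; two separate SELECT statements would also satisfy the wording.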
Text Books: Rajiv Chopra, Database Management System (DBMS): A Practical Approach, S. Chand Publishing.
Reference Books: Avi Silberschatz, Henry F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill Education.
Online Resources: 1. NPTEL / SWAYAM lectures.
Semester-II
Course code: MDS41MMP505 Course name: Practical Based on Data Mining and Visualization
Course category: Major Mandatory Credits: 1
Pre-requisites: Basic concepts of data structures
Course Objectives: Students will be able to actively manage and participate in data mining projects. To
develop research interest towards advances in data mining. Students will be able to understand the
visualization techniques
Course Outcomes: At the end of the course, the students will be able to -
CO1: Identify appropriate data mining algorithms to solve real world problems.
CO2: Compare and evaluate different data mining techniques like classification, prediction, clustering and
association rule mining.
CO3: Describe complex data types with respect to spatial data and data visualization.
CO4: Apply user experience towards research, innovation, and integration in the data mining area.
Contents –
Unit | Content | Teaching hours
1 | Demonstration of preprocessing on dataset student.arff | 1
Semester-II
Course code: MDS41MMP506 Course name: Practical Based on Machine Learning
Course category: Major Mandatory Credits: 1
Course Objectives: To introduce students to the basic concepts and techniques of Machine Learning. To
become familiar with regression methods, classification methods, clustering methods.
Course Outcomes: At the end of the course, the students will be able to -
CO1: Become familiar with dimensionality reduction techniques.
CO2: Identify machine learning techniques suitable for a given problem.
CO3: Design and implement machine learning solutions to classification, regression, and clustering
problems, and evaluate and interpret the results of the algorithms.
CO4: Understand data clustering.
Contents –
Unit | Content | Teaching hours
1 | Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file. | 1
2 | For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples. | 1
3 | Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample. | 1
4 | Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets. | 1
5 | Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets. | 1
6 | Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set. | 1
7 | Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API. | 1
8 | Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program. | 1
9 | Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem. | 1
10 | Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs. | 1
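As a starting point for Unit 1, the following is a minimal Python sketch of FIND-S. It assumes the last CSV column holds the class label and that "Yes" marks a positive example. The embedded data is the familiar EnjoySport toy set, used here only as a stand-in for whatever .CSV file the lab supplies, and `find_s` is a hypothetical helper name.

```python
import csv
import io

# Stand-in for the contents of a .CSV file; in the lab this would be read
# from disk with open(path, newline="") instead of io.StringIO.
CSV_TEXT = """sky,air_temp,humidity,wind,water,forecast,enjoy_sport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
"""


def find_s(rows, positive_label="Yes"):
    """Return the most specific hypothesis consistent with the positive rows.

    Each row is a list of attribute values followed by the class label.
    "?" in the result means any value is accepted for that attribute.
    """
    hypothesis = None
    for *attributes, label in rows:
        if label != positive_label:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            # The first positive example is the most specific starting point.
            hypothesis = list(attributes)
        else:
            # Generalize every attribute that disagrees with this example.
            hypothesis = [
                h if h == a else "?" for h, a in zip(hypothesis, attributes)
            ]
    return hypothesis


reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)                  # skip the column names
rows = [row for row in reader if row]  # drop blank lines, if any
hypothesis = find_s(rows)
print(hypothesis)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

If the data contains no positive example, `find_s` returns `None`, which corresponds to the initial hypothesis that accepts nothing.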
Contents –
Unit | Content | Teaching hours
1 | Create Google Analytics Account | 1
Contents –
Unit | Content | Teaching hours
1 | To study about MATLAB. | 1
5 | How the weight and bias values affect the output of neurons. | 1
6 | How the choice of activation function affects the output of a neuron; experiment with the following functions: purelin(n), binary threshold (hardlim(n), hardlims(n)), tansig(n), logsig(n). | 1
7 | How the weight and bias values are able to represent a decision boundary in the feature space. | 1
8 | How the Perceptron Learning rule works for Linearly Separable Problem. | 1
9 | How the Perceptron Learning rule works for Non-Linearly Separable Problem. | 1
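Units 8 and 9 can be previewed with a small sketch of the perceptron learning rule. The lab itself is MATLAB-based, so the Python below is only an illustration. `hardlim` mirrors the binary threshold named in Unit 6, while `train_perceptron`, the AND and XOR sample sets, the learning rate, and the epoch limit are all assumptions made for this example.

```python
def hardlim(n):
    """Binary threshold activation: 1 when n >= 0, otherwise 0."""
    return 1 if n >= 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Train a two-input perceptron with the perceptron learning rule.

    Returns (weights, bias, converged), where converged is True if a full
    pass over the samples produced no misclassification.
    """
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = hardlim(w[0] * x1 + w[1] * x2 + b)
            error = target - output
            if error != 0:
                # Move the decision boundary towards the misclassified point.
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


# Linearly separable problem: logical AND.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
# Non-linearly separable problem: logical XOR.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_w, and_b, and_converged = train_perceptron(AND)
xor_w, xor_b, xor_converged = train_perceptron(XOR)

print("AND converged:", and_converged, "weights:", and_w, "bias:", and_b)
print("XOR converged:", xor_converged)
```

On AND the rule settles on a separating line within the epoch limit. On XOR no single line separates the classes, so every pass contains at least one error and the loop stops only because the epoch limit is reached.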
Text Books: Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw Hill.
Reference Books: S. N. Sivanandam, S. Sumathi, S. N. Deepa, Introduction to Neural Networks Using MATLAB 6.0, Tata McGraw Hill.
Online Resources: 1. NPTEL / SWAYAM lectures.
Contents –
Unit | Content | Teaching hours
1 | Installation and configuration of MATLAB | 1
2 | Introduction to MATLAB | 1
Contents –
Unit | Content | Teaching hours
1 | Import the legacy data from different sources (such as Excel, SQL Server, Oracle, etc.) and load it into the target system. (You can download a sample database such as AdventureWorks, Northwind, FoodMart, etc.) | 1
4 | Create the cube with suitable dimension and fact tables based on the ROLAP, MOLAP and HOLAP models. | 1
5 | Create the ETL map and set up the schedule for execution. | 1
6 | Execute the MDX queries to extract the data from the data warehouse. | 1
7 | Import the data warehouse data into Microsoft Excel and create the Pivot Table and Pivot Chart. | 1
8 | Import the cube into Microsoft Excel and create the Pivot Table and Pivot Chart to perform data analysis. | 1
9 | Apply what-if analysis for data visualization. Design and generate necessary reports based on the data warehouse data. | 1
Text Books: Carlo Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making, Wiley.
Reference Books: Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence Systems, Pearson.
Online Resources: 1. NPTEL / SWAYAM lectures.