
Syllabus New Data Science (See Page 32)


Rayat Shikshan Sanstha’s

Karmaveer Bhaurao Patil College, Vashi


Autonomous
Affiliated to University of Mumbai
Syllabus
Sr. No. | Heading | Particulars
1 | Title of Course | Master in Data Science
2 | Eligibility for Admission | Students with a Bachelor's degree in Mathematics / Statistics / Computer Science / Computer Application / Information Technology / Physics / B.E. in Computer Science / Information Technology from a recognized university with a minimum aggregate score of 50% or higher are eligible for this course.
3 | Passing marks | 40%
4 | Ordinances/Regulations (if any) |
5 | Duration | Course Duration of Master of Science [M.Sc] (Data Science) is 2 Years.
6 | Level | P.G.
7 | Pattern | Choice Based Credit, Grading and Semester
8 | Status | New
9 | To be implemented from Academic year | 2021-2022

AC -
Item No -

Rayat Shikshan Sanstha’s


KARMAVEER BHAURAO PATIL COLLEGE, VASHI.
NAVI MUMBAI
(AUTONOMOUS COLLEGE)
Sector-15- A, Vashi, Navi Mumbai - 400 703

Syllabus for M.Sc. in Data Science

Program: M.Sc. Data Science

Course: M.Sc. Data Science

(Choice Based Credit, Grading and Semester System
with effect from the academic year 2021-22)
Preamble

This syllabus is an honest attempt to put the following ideas, among others, into practice:
● Create a unique identity for MSc in Data Science distinct from similar degrees in other
related subjects.
● Recommend provision for specialization in Data Science.
● Specialized knowledge of the central concepts, theories, and research methods of data
science as well as applied skills.
● Specialized knowledge of computer science theories, methods, practices and strategy.
● Understanding of statistical, mathematical concepts in the context of data science.
● Understanding of various analysis tools and software used in data science.
● Awareness of rapid technological changes.
● Analytical and critical thinking skills.
● Written and oral communication skills, including presentations and report writing.

M.Sc. in Data Science is a postgraduate course with science as its major field of study. The
course runs for 2 years, divided equally into 4 semesters. In the third semester, one of the
courses is an internship. The M.Sc. Data Science syllabus is designed to cover the core
aspects of Data Science.
The syllabus proposes to have three core compulsory courses, one Skill Enhancement Course
and one Discipline Specific Elective course in semester I. Semester II also proposes three core
compulsory courses, one Skill Enhancement Course and one Discipline Specific Elective
course.
The course gives insights into the practical and theoretical aspects of data science, Big Data
Analytics, Business Analytics, Real-Time Processing, Neural Networks, Artificial Intelligence,
and Machine Learning. Its primary focus is to equip candidates with the principal concepts of
data science and their application in real-time processing.

Data science combines the knowledge of mathematics, computer science and statistics to solve
exciting data-intensive problems in industry and in many fields of science. As data is collected
and analysed in all areas of society, demand for professional data scientists is high and will grow
higher.
We thank all the industry experts, senior faculty members, colleagues from other colleges and
BOS members for their valuable comments and suggestions, which we have tried to
incorporate.

DEPARTMENT VISION

To become a Center of excellence offering quality education and innovation in Computer
Science and Information Technology.

DEPARTMENT MISSION

1. To prepare the students to excel in the field of Data Science and IT industry
2. To prepare the students to pursue higher studies and develop sustainable innovative
solutions for the society.
Choice Based Credit Semester System
Academic year 2021-2022
SEMESTER - I

CODE | COURSE TYPE | SUBJECT | TH (periods/week) | LAB (periods/week) | CIA (max marks) | SEE (max marks) | TOTAL (max marks) | NO. OF CREDITS
PGDS101 | CORE | Advanced Database Technologies | 4 | - | 40 | 60 | 100 | 4
PGDS102 | CORE | Descriptive Statistics and Probability | 4 | - | 40 | 60 | 100 | 4
PGDS103 | CORE | Applied Linear Algebra | 4 | - | 40 | 60 | 100 | 4
PGDS104 | Skill Enhancement Elective-I | Data Visualization using R | 3 | - | 40 | 60 | 100 | 3
PGDS105 | Discipline Specific Elective-I | Data Warehousing & Mining | 4 | - | 40 | 60 | 100 | 4
OR
PGDS106 | Discipline Specific Elective-I | Data Structure with Python | 4 | - | 40 | 60 | 100 | 4
PGDSP101 | Core Subject Practical | PGDS101 | - | 4 | | | 50 | 2
PGDSP102 | Core Subject Practical | PGDS102 | - | 4 | | | 50 | 2
PGDSP103 | Core Subject Practical | PGDS103 | - | 4 | | | 50 | 2
PGDSP104 | Skill Enhancement Practical | PGDS104 | - | 2 | | | 50 | 1
PGDSP105 | Discipline Specific Elective-I Practical | PGDS105 | - | 4 | | | 50 | 2
OR
PGDSP106 | Discipline Specific Elective-I Practical | PGDS106 | - | 4 | | | 50 | 2
TOTAL | | | | | | | 750 | 28
SEMESTER - II

CODE | COURSE TYPE | SUBJECT | TH (periods/week) | LAB (periods/week) | CIA (max marks) | SEE (max marks) | TOTAL (max marks) | NO. OF CREDITS
PGDS201 | CORE | Research in Computing | 4 | - | 40 | 60 | 100 | 4
PGDS202 | CORE | Analysis of Algorithm | 4 | - | 40 | 60 | 100 | 4
PGDS203 | CORE | Statistical Inference | 4 | - | 40 | 60 | 100 | 4
PGDS204 | Skill Enhancement Elective-I | Advanced Python Programming | 2 | - | 40 | 60 | 100 | 2
PGDS205 | Discipline Specific Elective-I | Big Data Analytics | 4 | - | 40 | 60 | 100 | 4
OR
PGDS206 | Discipline Specific Elective-I | Optimization Techniques | 4 | - | 40 | 60 | 100 | 4
PGDSP201 | Core Subject Practical | PGDS201 | - | 4 | | | 50 | 2
PGDSP202 | Core Subject Practical | PGDS202 | - | 4 | | | 50 | 2
PGDSP203 | Core Subject Practical | PGDS203 | - | 4 | | | 50 | 2
PGDSP204 | Skill Enhancement Practical | PGDS204 | - | 4 | | | 50 | 2
PGDSP205 | Discipline Specific Elective-I Practical | PGDS205 | - | 4 | | | 50 | 2
PGDSP206 | Discipline Specific Elective-I Practical | PGDS206 | - | 4 | | | 50 | 2
TOTAL | | | | | | | 750 | 28

Note: TH-Theory, CIA- Continuous Internal Assessment, SEE-Semester End Examination.


Semester I – Theory

Class: M.Sc | Branch: Data Science | Semester: I
Subject: Advanced Database Technologies
Period per Week (Each 60 mins): Lecture 04 | Practical 04

Evaluation System | Hours | Marks
Semester End Exam | 2 hrs. 30 min | 60
Continuous Internal Assessment | __ | 40
Semester End Practical Examination | 2 hrs. | 50
Total | __ | 150

Course: Advanced Database Technologies
PGDS101 (Credits: 4, Lectures/Week: 4)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Recall the concepts of Database Systems, Relational Databases, the Structure of
Relational Databases and Relational Algebra.
2. Describe Object Database Systems, design the E-R model and apply the
Normalization process.
3. Illustrate the NoSQL concept with a NoSQL database.
4. Explain Data Modeling with Graph (Neo4j), Key-Value Databases (Riak) and
Column-Family stores (Cassandra).
Unit I (8 Lectures)
Introduction: Purpose of Database Systems, View of Data: Data Abstraction, Instances
and Schemas, Relational Databases: Tables, DML, DDL, Data Storage and Querying:
Storage Manager, The Query Processor, Database Architecture, Speciality Databases
Introduction to Relational Model:
Structure of Relational Databases, Database Schema, Keys, Relational Algebra
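The relational algebra operators named in Unit I can be tried out without a database engine. The following is only a minimal sketch in Python: relations are modelled as lists of dictionaries, and the relation names, attributes and rows (`students`, `departments`, and so on) are made up for illustration and are not part of the syllabus.

```python
# Relations are modelled as lists of dicts; all names and rows are hypothetical.
students = [
    {"roll": 1, "name": "Asha", "dept": "DS"},
    {"roll": 2, "name": "Ravi", "dept": "IT"},
    {"roll": 3, "name": "Meera", "dept": "DS"},
]
departments = [
    {"dept": "DS", "building": "A"},
    {"dept": "IT", "building": "B"},
]


def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


# Names of students in the DS department: pi_name(sigma_dept='DS'(students))
ds_names = project(select(students, lambda row: row["dept"] == "DS"), ["name"])
# Each student's building: pi_name,building(students JOIN departments)
joined = natural_join(students, departments)
print(ds_names)
print(project(joined, ["name", "building"]))
```

Lists of dictionaries keep the sketch readable. A real RDBMS evaluates the same operators with indexes and a query optimizer rather than nested loops.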
Unit II (10 Lectures)
Database Design and E-R Model: Overview of the Design Process and Entity
Relationship, Functional Dependency, Anomalies in a Database, Normalization
process: Conversion to first normal form, Conversion to second normal form,
Conversion to third normal form, The Boyce-Codd Normal Form (BCNF), Fourth
Normal Form and Fifth Normal Form, Denormalization
Object Database Systems: Overview of Object-Oriented concepts and characteristics,
Objects, OIDs and reference types, Database design for ORDBMS, Comparing
RDBMS, OODBMS and ORDBMS
Unit III (15 Lectures)
Introduction to NoSQL (Core concepts):
Why NoSQL, Brief History of NoSQL Databases, Features of NoSQL, Types of NoSQL
Databases, CAP Theorem, Aggregate Data Models, Data Modeling Details, Distribution
Models, Consistency, Version Stamps, Map-Reduce
Implementation with NoSQL databases:
Document Databases (MongoDB)
MongoDB Features, MongoDB Example, Key Components of MongoDB
Architecture, Why Use MongoDB, Data Modelling in MongoDB, Difference between
MongoDB and RDBMS
Unit IV (15 Lectures)
Data Modeling with Graph (Neo4j):
Comparison of Relational and Graph Modeling, Property Graph Model, Graph Analytics:
Link analysis algorithm- Web as a graph, PageRank- Markov chain, PageRank
computation, Topic-specific PageRank (PageRank computation techniques: iterative
processing, random walk distribution), Querying Graphs: Introduction to Cypher, Case
study: Building a Graph Database Application- community detection.
Key-Value Databases (Riak):
From arrays to key-value databases, Essential features of key-value databases, Properties
of keys, Characteristics of values, Key-Value Database Data Modeling Terms, Key-Value
Architecture and Implementation Terms, Designing Structured Values, Limitations of
Key-Value Databases, Design Patterns for Key-Value Databases, Case Study: Key-Value
Databases for Mobile Application Configuration
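The iterative PageRank computation listed under Unit IV can be sketched as follows. This is an illustration under stated assumptions, not the procedure prescribed by the course: the damping factor of 0.85, the fixed iteration count, the handling of dangling pages and the four-page graph `web` are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank on a dict mapping each page to its outgoing links.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random-jump share that every page receives in each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

Each round is one step of the random-walk (Markov chain) view of PageRank: a surfer follows an outgoing link with probability `damping` and jumps to a random page otherwise, so the ranks always sum to 1.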
Unit V (12 Lectures)
Column-Family stores (Cassandra)
Data warehousing schemas: Comparison of columnar and row-oriented storage,
Column-store Architectures: C-Store and Vector-Wise, Column-store internals and
Inserts/updates/deletes, Indexing, Adaptive Indexing and Database Cracking
Advanced techniques: Vectorized Processing, Compression, Write penalty, Operating
Directly on Compressed Data, Late Materialization, Joins, Group-by, Aggregation and
Arithmetic Operations, Case Studies
Text book:
● NoSQL Distilled, Pramod Sadalage, Martin Fowler
● Next Generation Databases: NoSQL and Big Data by Guy Harrison
Reference
● NoSQL For Dummies (A Wiley Brand)

Links:
● https://hostingdata.co.uk/nosql-database/
● https://www.guru99.com/what-is-mongodb.html

Sr. No. Practicals of PGDSP101

1 Practical on Relational Algebra, SQL Commands, Normalization

2 How to Download & Install MongoDB on Windows

3 Hello World MongoDB: JavaScript Driver

4 ● Install Python Driver
● Install Ruby Driver

5 Install MongoDB Compass (MongoDB Management Tool)
MongoDB Configuration, Import, and Export

6 Download a zip code dataset at http://media.mongodb.org/zips.json. Use mongoimport
to import the zip code dataset into MongoDB. After importing the data, answer the
following questions by using aggregation pipelines: (1) Find all the states that have a city
called "BOSTON". (2) Find all the states and cities whose names include the string "BOST".
(3) Each city has several zip codes. Find the city in each state with the most number of zip
codes and rank those cities along with the states using the city populations. MongoDB
can query on spatial information.

7 Master Data Management using Neo4j: Manage your master data more effectively. The
world of master data is changing. Data architects and application developers are
swapping their relational databases for graph databases to store their master data. This
switch enables them to use a data store optimized to discover new insights in existing
data, provide a 360-degree view of master data and answer questions about data
relationships in real time.

8 Create a database that stores road cars. Cars have a manufacturer and a type. Each car has a
maximum performance and a maximum torque value. Do the following: Test Cassandra's
replication schema and consistency models.

9 Case Study
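As a starting point for the zip code practical above, the sketch below writes the first question as an aggregation pipeline (a list of stage documents using the `$match` and `$group` stages) and checks the same logic in plain Python, so it runs without a MongoDB server. The field names `city`, `state` and `pop` are assumed from the zip code dataset, and the sample documents, including their population figures, are made up for this example.

```python
from collections import defaultdict

# Aggregation pipeline for "states that have a city called BOSTON",
# written as the list of stage documents a MongoDB driver would accept.
boston_states_pipeline = [
    {"$match": {"city": "BOSTON"}},
    {"$group": {"_id": "$state"}},
]

# Hypothetical sample documents in the assumed shape of the dataset.
sample_zips = [
    {"_id": "02108", "city": "BOSTON", "state": "MA", "pop": 3697},
    {"_id": "02109", "city": "BOSTON", "state": "MA", "pop": 3926},
    {"_id": "14025", "city": "BOSTON", "state": "NY", "pop": 2872},
    {"_id": "01001", "city": "AGAWAM", "state": "MA", "pop": 15338},
]


def states_with_city(docs, city):
    """Pure-Python equivalent of the $match + $group stages above."""
    return sorted({doc["state"] for doc in docs if doc["city"] == city})


def zip_counts_per_city(docs):
    """Count zip codes per (state, city), like $group with {"$sum": 1}."""
    counts = defaultdict(int)
    for doc in docs:
        counts[(doc["state"], doc["city"])] += 1
    return dict(counts)


print(states_with_city(sample_zips, "BOSTON"))
print(zip_counts_per_city(sample_zips))
```

With a running server, the pipeline list would be passed to the collection's aggregate call in the chosen driver. The plain-Python functions exist only to make the intended result of each stage visible.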

Class: M.Sc-I | Branch: Data Science | Semester: I
Subject: Descriptive Statistics and Probability
Period per Week (Each 60 mins): Lecture 04 | Practical 04

Evaluation System | Hours | Marks
Semester End Exam | 2 hrs. 30 min | 60
Continuous Internal Assessment | __ | 40
Semester End Practical Examination | 2 hrs. | 50
Total | __ | 150

Course: Descriptive Statistics and Probability
PGDS102 (Credits: 4, Lectures/Week: 4)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Describe the data and its properties by use of central tendency and
variability.
2. Explain the concepts of probability and its distributions.
3. Apply sampling distributions to contribute to the process of making
rational decisions in analytical problems
4. Analyze the relationship between two quantitative variables using
Correlation and Regression
Unit I (15 L)
Descriptive Statistics and Introduction to Probability:
Measures of Central Tendency: Mean, Median, Mode
Partition Values: Quartiles, Percentiles, Box Plot
Measures of Dispersion: Variance, Standard Deviation, Coefficient of variation
Skewness: Concept of skewness, measures of skewness
Kurtosis: Concept of kurtosis, measures of kurtosis.
Probability: classical definition, probability models, axioms of probability,
probability of an event.
Concepts and definitions of conditional probability,
multiplication theorem P(A∩B) = P(A)·P(B|A), Bayes'
theorem (without proof)
Concept of posterior probability, problems on posterior probability.
Definition of sensitivity of a procedure, specificity of a procedure. Application of
Bayes' theorem to design a procedure for false positive and false negative.
Concept and definition of independence of two events.
Numerical problems related to real life situations.
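The link between sensitivity, specificity and posterior probability in Unit I can be made concrete with a short calculation. The sketch below is illustrative only: the function name `posterior_positive` and the numbers (1% prevalence, 95% sensitivity, 90% specificity) are assumptions chosen for the example, and the course itself works in R.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(condition | positive test) by Bayes' theorem.

    sensitivity = P(positive | condition)
    specificity = P(negative | no condition)
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical screening test for a condition with 1% prevalence.
p = posterior_positive(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(f"P(condition | positive) = {p:.4f}")
```

Even with a sensitive test, the posterior stays below 10% here because false positives from the large healthy group outnumber true positives.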
Unit II (15 L)
Introduction to Random Variables
Definition of discrete random variable and continuous random variable. Concept of
discrete and continuous probability distributions (p.m.f. and p.d.f.).
Distribution function, Expectation and variance, Numerical problems related to
real life situations.
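For a discrete random variable, the expectation, variance and distribution function in Unit II follow directly from the p.m.f. A minimal sketch, assuming a fair six-sided die as the example distribution and using exact fractions to avoid rounding:

```python
from fractions import Fraction

# p.m.f. of a fair six-sided die (example distribution chosen for illustration).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = sum of (x - E[X])^2 * P(X = x)."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def cdf(pmf, x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)


print(expectation(pmf), variance(pmf), cdf(pmf, 3))
```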
Unit III (15 L)
Special Distributions
Binomial Distribution, Uniform Distribution, Poisson Distribution, Negative
Binomial Distribution, Geometric Distribution, Continuous Uniform
Distribution, Exponential Distribution, Normal Distribution, Log Normal
Distribution, Gamma Distribution, Weibull Distribution, Pareto Distribution.
(For each probability distribution, its pmf/pdf, p-p plot, q-q plot, and generation of
probabilities and random samples using R software are expected.)
Unit IV (15 L)
Correlation and Regression
Bivariate data, Scatter diagram. Correlation, Positive
correlation, Negative correlation, Zero correlation, Karl
Pearson's coefficient of correlation (r), limits of r (-1 ≤ r
≤ 1), interpretation of r, Coefficient of determination (r²),
Meaning of regression, difference between correlation and
regression. Fitting of line Y = a + bX, Concept of residual
plot and mean residual sum of squares. Multiple
correlation coefficient: concept, definition, computation
and interpretation. Partial correlation coefficient:
concept, definition, computation and interpretation.
Multiple regression plane. Identification and solution to
multicollinearity. Evaluation of the model using R square
and Adjusted R square.
Introduction to logistic regression, Difference between linear and logistic
regression, Logistic equation, How to build a logistic regression model in R, Odds
ratio in logistic regression.
All topics to be covered for raw data using R software. Manual calculations are
not expected.
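The course carries out these computations in R. Purely to show what the formulas behind Karl Pearson's r and the fitted line Y = a + bX compute, here is a small Python sketch with a made-up data set (`xs`, `ys`); it is not a substitute for the R workflow the syllabus expects.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation between two samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, a = {a:.2f}, b = {b:.2f}")
```

For simple linear regression, r² equals the coefficient of determination of the fitted line, and the residuals always sum to zero.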
Text book:
● Fundamentals of Applied Statistics (3rd Edition), Gupta and Kapoor, S.Chand and Sons,
New Delhi, 1987.
● An Introductory Statistics, Kennedy and Gentle.
Reference
1. Statistical Methods, G.W. Snedecor, W.G. Cochran, John Wiley & sons, 1989.
2. Introduction to Linear Regression Analysis, Douglas C. Montgomery, Elizabeth A. Peck,
G. Geoffrey Vining, Wiley.
3. Modern Elementary Statistics, Freund J.E., Pearson Publication, 2005.
4. Probability, Statistics, Design of Experiments and Queuing theory with applications
Computer Science, Trivedi K.S., Prentice Hall of India, New Delhi,2001.
5. A First course in Probability 6th Edition, Ross, Pearson Publication, 2006.
6. Introduction to Discrete Probability and Probability Distributions, Kulkarni M.B.,
Ghatpande S.B., SIPF Academy, 2007.
7. A Beginner's Guide to R, Alain Zuur, Elena Ieno, Erik Meesters, Springer, 2009.
8. Statistics Using R, Sudha Purohit, S.D. Gore, Shailaja Deshmukh, Narosa Publishing
Company
Links:
● https://www.dcpehvpm.org/E-Content/Stat/FUNDAMENTAL%20OF%20MATHEMATICAL%20STATISTICS-S%20C%20GUPTA%20&%20V%20K%20KAPOOR.pdf
● https://www.mathsisfun.com/data/random-variables.html

Sr. No. Practicals of PGDSP102


1 Introduction to R-studio, mathematical and logical operators in R, Data types and data
structures, simple operations and programs, matrix operations

2 Data frames, string operations, factors, handling categorical data, lists and list
operations

3 Loops and conditional statements, switch and break function

4 Apply functions, Statistical problem solving in R

5 Visualizations in R – 1

6 Visualizations in R – 2

7 Spatial Data Representation and Graph Analysis.

8 Hands-on data manipulations 1: cleaning, sub-setting, sampling, data transformations and
allied data operations

9 Hands-on data manipulations 2: cleaning, sub-setting, sampling, data transformations and
allied data operations

10 Case Study

Class: M.Sc Branch: Data Science Semester: I


Subject: Applied Linear Algebra
Period per Week (Each 60 min): Lecture 04 | Practical 04

Evaluation System | Hours | Marks
Semester End Exam | 2 hrs. 30 min | 60
Continuous Internal Assessment | __ | 40
Semester End Practical Examination | 2 hrs. | 50
Total | __ | 150

Course: Applied Linear Algebra
PGDS103 (Credits: 3, Lectures/Week: 4)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Describe the concept of characteristic polynomial, eigenvalues and
eigenvectors.
2. Recognize and use equivalent forms to identify matrices and solve linear
systems of equations.
3. Explain how orthogonal projections relate to least square approximations.
4. Acquire the knowledge of various concepts in Applied Algebra.
5. Employ Python to perform various matrix and vector computations.
Unit I (15 L)
Matrices
Matrices: Introduction to Matrices, Zero and Identity Matrices, Transpose,
Addition and Matrix Multiplication, Geometric Transformation, Linear and
Orthogonal Transformations, Rank of a matrix, Normal form, Consistency, System of
Linear Equations, Eigenvalues and eigenvectors.
Unit II (15 L)
Vectors
Vector: Vector addition, Scalar-vector multiplication, unit vector, norm of a vector,
Linear Functions, Linear Combinations, Linear dependence and independence,
Basis.
Unit III (15 L)
Inner Product Space:
Inner Product Spaces, Norms and Distance: Orthogonality, Inner products,
Cauchy-Schwarz inequality, Orthogonal projections, Gram-Schmidt
orthogonalization, Matrix representation of inner product.
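Gram-Schmidt orthogonalization from Unit III can be sketched in a few lines of plain Python. The version below is an assumption-laden illustration rather than a prescribed implementation: it uses the standard dot product as the inner product, subtracts projections one at a time (the "modified" ordering), and treats any residual shorter than the tolerance `tol` as linearly dependent. The function names and input vectors are made up for the example.

```python
def dot(u, v):
    """Standard inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of the input vectors.

    Vectors that depend linearly on earlier ones leave a (near) zero
    residual and are skipped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)  # length of the projection of w onto q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


# The third vector is the sum of the first two, so only two basis vectors remain.
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]]
q = gram_schmidt(vectors)
print(len(q), "orthonormal vectors")
```

Because dependent vectors are dropped, comparing `len(q)` with the number of input vectors also gives a simple numerical test for linear independence.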
Unit IV (15 L)
Least Squares
Least Squares: Least Squares Problem, Solution, Solving Least Squares Problems,
Examples.
Least Squares Data Fitting: Least Squares Data Fitting, Validation, Feature
Engineering.
Least Squares Classification: Classification, Least Squares Classifier,
Multi-class Classifiers.
Multi-Objective Least Squares: Multi-Objective Least Squares, Control,
Estimation and Inversion, Regularised Data Fitting, Complexity.
Constrained Least Squares: Constrained Least Squares Problem, Solution, Solving
Constrained Least Squares Problems.
Constrained Least Squares Applications: Portfolio Optimization, Linear Quadratic
Control, Linear Quadratic State Estimation.
Textbooks:
1. Advanced Engineering Mathematics by Erwin Kreyszig (Wiley Eastern Ltd.)
2. Introduction to Applied Linear Algebra Vectors, Matrices and Least Squares by
Stephen Boyd (Stanford University) and Lieven Vandenberghe (University of
California, Los Angeles) Cambridge University Press.

References:
1. Least Squares Regression Analysis in Terms of Linear Algebra by Enders A. Robinson
2. Kenneth H. Rosen's Discrete Mathematics and Its Applications with Combinatorics and Graph
Theory, 7th Edition (McGraw-Hill Education)
3. Higher Engineering Mathematics by B. S. Grewal (Khanna Publication, Delhi)

Links:
● https://www.google.co.in/books/edition/Introduction_to_Applied_Linear_Algebra/IApaDwAAQBAJ?hl=en&gbpv=1&dq=Least+Squares+for+algebra&printsec=frontcover

Sr. No. Practicals of PGDSP103


1 Introduction to numpy and sympy.
2 Write a program to do the following:
1. Enter a vector u as an n-list
2. Enter another vector v as an n-list
3. Find the vector addition
4. Find the scalar vector multiplication
3 Write a program to do the following:
1. Enter a vector u as an n-list
2. Enter another vector v as an n-list
3. Test the vectors for linear independence or dependence
4 Write a program to find the inner product of two vectors.
5 Write a program on the k-means algorithm
6 Write a program to do the following:
1. Enter a vector b and find the projection of b orthogonal to a given vector u.
2. Find the projection of b orthogonal to a set of given vectors
7 Write a program to do the following:
1. Enter an r by c matrix M (r and c being positive integers)
2. Display M in matrix format
3. Display the rows and columns of the matrix M
4. Find the scalar multiplication of M for a given scalar.
5. Find the transpose of the matrix M.
8 Write a program to find the vector-matrix multiplication of an r by c matrix M with a
c-vector u
9 Write a program to enter a matrix and check if it is invertible. If the inverse exists, find the
inverse.
10 Write a program to solve a system of linear equations
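For the last two practicals above, one possible starting point is Gaussian elimination with partial pivoting, which solves Ax = b and also detects a non-invertible matrix along the way. The sketch below is written in plain Python so that each step is visible. In the practical itself numpy or sympy (introduced in practical 1) would normally do this work. The function name, the tolerance `tol` and the example systems are assumptions made for illustration.

```python
def solve_linear_system(A, b, tol=1e-12):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    Returns None when A is (numerically) singular, i.e. not invertible.
    """
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


# 2x + y = 3 and x + 3y = 5
x = solve_linear_system([[2, 1], [1, 3]], [3, 5])
# The second row is twice the first, so this matrix has no inverse.
singular = solve_linear_system([[1, 2], [2, 4]], [1, 2])
print(x, singular)
```

Solving Ax = e_i for each column e_i of the identity matrix with the same routine yields the columns of the inverse when it exists.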
Class: M.Sc Branch: Data Science Semester: I
Subject: Data Visualization using R
Period per Week (Each 60 min): Lecture 03 | Practical 01

Evaluation System | Hours | Marks
Semester End Exam | 2 hrs. 30 min | 60
Continuous Internal Assessment | __ | 40
Semester End Practical Examination | 2 hrs. | 50
Total | __ | 150

Course: Data Visualization using R
PGDS104 (Credits: 4, Lectures/Week: 3)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Explain basic programming language concepts using R
2. Differentiate between different R data structures such as: string, number, vector,
matrix, data frame, factor, date and time object
3. Collect detailed information about raw data using the R profiler
4. Visualize your data using base R graphics
Unit I (15 L)
Overview of R: History and Overview of R- Basic Features of R- Design of the R
System- Installation of R- Console and Editor Panes- Comments- Installing and
Loading R Packages- Help Files and Function Documentation- Saving Work and
Exiting R- Conventions- R for Basic Math- Arithmetic- Logarithms and
Exponentials- E-Notation- Assigning Objects- Vectors- Creating a Vector-
Sequences, Repetition, Sorting, and Lengths- Subsetting and Element Extraction-
Vector-Oriented Behaviour
Unit II (15 L)
Matrices and Arrays: Defining a Matrix- Filling Direction-
Row and Column Bindings- Matrix Dimensions- Subsetting- Row, Column, and
Diagonal Extractions- Omitting and Overwriting- Matrix Operations and Algebra-
Matrix Transpose- Identity Matrix- Matrix Addition and Subtraction- Matrix
Multiplication- Matrix Inversion- Multidimensional Arrays- Subsets, Extractions,
and Replacements
Unit III (15 L)
Non-numeric Values: Logical Values- Relational Operators- Characters- Creating
a String- Concatenation- Escape Sequences- Substrings and Matching- Factors-
Identifying Categories- Defining and Ordering Levels- Combining and Cutting
Lists and Data Frames: Lists of Objects- Component Access- Naming- Nesting- Data
Frames- Adding Data Columns and Combining Data Frames- Logical Record
Subsets- Some Special Values- Infinity- NaN- NA- NULL- Attributes- Object Class-
Is-Dot Object-Checking Functions- As-Dot Coercion Functions
Unit IV (15 L)
Basic Plotting: Using plot with Coordinate Vectors- Graphical
Parameters- Automatic Plot Types- Title and Axis Labels- Color- Line and Point
Appearances- Plotting Region Limits- Adding Points, Lines, and Text to an Existing
Plot- ggplot2 Package- Quick Plot with ggplot- Setting Appearance Constants with
Geoms
Reading and Writing Files: R-Ready Data Sets- Contributed
Data Sets- Reading in External Data Files- Writing Out Data Files and Plots- Ad
Hoc Object Read/Write Operations
TextBook:
1. https://www.cs.upc.edu/~robert/teaching/estadistica/rprogramming.pdf
2. Tilman M. Davies, "The Book of R: A First Course in Programming and Statistics", Library of
Congress Cataloging-in-Publication Data, 2016
References:
1. Wickham, H. & Grolemund, G. (2018). R for Data Science. O'Reilly: New York. Available
2. Steven Keller, "R Programming for Beginners", CreateSpace Independent Publishing Platform,
2016
3. Kun Ren, "Learning R Programming", Packt Publishing, 2016
Links:
● https://r4ds.had.co.nz/

Sr. No. Practicals of PGDSP104


1 1.Develop the R program for Basic Mathematical computation –Square, Square root,
exponential etc.
2. Create an object X that stores a value, then overwrite the object by itself divided by Y.
Print the result to the console.
3. Create and store a sequence of values from x to y that progresses in steps of 0.3
2 Create and store a three-dimensional array with six layers of a 4 X 2 matrix, filled with a
decreasing sequence of values between 4.8 and 0.1 of the appropriate length
3 Extract and store as a new object the fourth- and first-row elements, in that order, of the second
column only of all layers of the array created in practical 2.
4 1.Confirm the specific locations of elements equal to 0 in the 10 X 10 identity matrix I10
2.Store this vector of 10 values: foo <- c(7,5,6,1,2,10,8,3,8,2).Then, do the following: i. Extract
the elements greater than or equal to 5, storing the result as bar. ii. Display the vector
containing those elements from foo that remain after omitting all elements that are greater than
or equal to 5.
5 Store the string "Two 6-packs for $12.99". Then do the following:
i. Use a check for equality to confirm that the substring beginning with character 5 and ending
with character 10 is "6-pack".
ii. Make it a better deal by changing the price to $10.99
6 Create a list that contains, in this order, a sequence of 20 evenly spaced numbers between -4
and 4; a 3 X 3 matrix of the logical vector c(F,T,T,T,F,T,T,F,F) filled column-wise; a character
vector with the two strings "don" and "quixote"; and a factor vector containing the observations
c("LOW","MED","LOW","MED","MED","HIGH"). Then, Extract row elements 2 and 1 of
columns 2 and 3, in that order, of the logical matrix.
7 1. Create and store this data frame as dframe with the fields of person, sex, funny in your R
workspace. 2. Append the two new records. 3. Write a single line of code that will extract from
mydataframe just the names and ages of any records where the individual is female and has a
level of funniness equal to Med OR High
8 Create a database with the fields of weight,height and sex then create a plot of weight on the
x-axis and height on the y-axis. Use different point characters or colors to distinguish between
males and females and provide a matching legend. Label the axes and give the plot a title.
9 Create a plot using ggplot2 for the same database consisting of weight on the x-axis and height
on the y-axis. Use different point characters or colors to distinguish between males and females
and provide a matching legend. Label the axes and give the plot a title.
10 Write R code that will plot education on the x-axis and income on the y-axis, with both x- and
y-axis limits fixed to be [0;100]. Provide appropriate axis labels. For jobs with a prestige value
of less than or equal to 80, use a black * as the point character. For jobs with prestige greater
than 80, use a blue @.

Class: M.Sc Branch: Data Science Semester: I


Subject: Data Warehousing & Mining

Period per Week (Each 60 min): Lecture 04 | Practical 04

Evaluation System | Hours | Marks
Semester End Exam | 2 hrs. 30 min | 60
Continuous Internal Assessment | __ | 40
Semester End Practical Examination | 2 hrs. | 50
Total | __ | 150

Course: Data Warehousing & Mining
PGDS105 (Credits: 4, Lectures/Week: 4)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Explain the operational and decision support system.
2. Evaluate the impact of use and information using knowledge discovery in databases
and KDD process models.
3. Summarize the data mining concepts with the help of Apriori algorithm, support,
confidence and trees.
4. Construct data models and prototypes needed to gain stakeholder support to achieve
business objectives.
Unit I (08 L)
Data Warehouse Fundamentals: Introduction to Data Warehouse, OLTP
Systems, Differences between OLTP Systems and Data Warehouse,
Characteristics of Data Warehouse, Components of Data Warehouse, Advantages
and Applications of Data Warehouse, Top-Down and Bottom-Up Development
Methodology, Tools for Data Warehouse Development, Data Warehouse Types
Unit II (10 L)
Planning and Requirements: Introduction: Planning Data Warehouse and Key
Issues, Data Warehouse Project, Data Warehouse Development Life Cycle, The
Project Team, Requirements Gathering Approaches: Team Organization, Roles,
and Responsibilities, Extraction - Transformation - Loading
Unit III (15 L)
OLAP: Introduction, Characteristics, Advantages, Disadvantages; OLTP vs
OLAP, Data cubes, Data cube operations, OLAP types
Dimensional Modeling: Dimensional Modeling Basics, E-R Modeling Versus
Dimensional Modeling, Data Warehouse Schemas: Star Schema, Inside
Dimensional Table, Inside Fact Table, Fact Less Fact Table, Star Schema Keys,
Snowflake Schema, Slowly Changing Dimensions
Unit IV (15 L)
Data Mining: Introduction to Data Mining, The process of knowledge discovery
in databases, predictive and descriptive data mining techniques, supervised and
unsupervised learning techniques.
Data preprocessing: Data cleaning, Data transformation, Data reduction,
Discretization.
Unit V (12 L)
Classification: Decision trees, Bayesian classification
Clustering: Basic issues in clustering, k-means clustering, Hierarchical
clustering- Agglomerative clustering, Divisive clustering, Density-based methods-
DBSCAN
Association Rule Mining: Support, Confidence, Frequent item sets, Apriori
algorithm
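Support, confidence and the level-wise search of the Apriori algorithm can be illustrated on a handful of transactions. The following is a minimal sketch, not a production implementation: the market-basket data and the `min_support` threshold of 0.5 are made up, and candidate generation uses a simple join of frequent itemsets followed by the usual subset-based pruning.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Keep the candidates of size k that reach the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori(transactions, min_support=0.5)
rule_conf = confidence({"bread"}, {"milk"}, transactions)
print(len(frequent), "frequent itemsets; conf(bread -> milk) =", round(rule_conf, 2))
```

The prune step relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent, which is what keeps the candidate lists small.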
TextBook:
1. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. Paulraj
Ponniah
2. Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management
Systems, Han J. and Kamber M. Morgan Kaufmann Publishers, (2000).
3. Data Mining: Introductory and Advanced Topics, Dunham, Margaret H, Prentice Hall (2006)
References:
1. Luis Torgo, Data Mining with R Learning with Case Studies, Second Edition, CRC Press,
2017
2. Building the Data Warehouse, Inmon: Wiley (1993).
Links:
1) http://www.vssut.ac.in/lecture_notes/lecture1428550844.pdf
2) https://lecturenotes.in/subject/32/data-mining-and-data-warehousing-dmdw

Sr. No. Practical of PGDSP105

1. Create tables using different applications.


2. Develop an application to design a warehouse by importing various tables from external sources
3. a. Develop an application to create a fact table and measures in a cube
b. Develop an application to create dimension tables in a cube and form star schema.
4. Develop an application to create fact and dimension tables in a cube and form snowflake
schema
5. Develop an application to demonstrate operations like roll-up, drill-down, slice, and dice.
6. Develop an application to demonstrate processing and browsing data from a cube.
7. Develop an application to pre-process data imported from external sources.
8. Pre-process the given data set and hence apply hierarchical algorithms and
density based clustering techniques. Interpret the result.
9. Pre-process the given data set and hence classify the resultant data set using
tree classification techniques. Interpret the result.
10. Create association rules by considering suitable parameters.
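
As an illustration of the Support and Confidence measures used when creating association rules (practical 10), the following is a minimal sketch in plain Python. The transactions, the item names, and the helper names `support` and `confidence` are hypothetical and chosen only for this example; a full Apriori implementation would additionally generate and prune candidate item sets level by level.

```python
# Hypothetical market-basket data; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent together) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
print(f"support(bread, milk) = {s:.2f}")
print(f"confidence(bread -> milk) = {c:.2f}")
```

A rule such as bread -> milk would be kept only if both values reach the minimum support and minimum confidence chosen as the "suitable parameters" of the practical.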

Class: M.Sc Branch: Data Science Semester: I


Subject: Data Structures with Python

Period per Week (Each 60 min): Lecture 04, Practical 04
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Data Structures with Python Lectures


PGDS106 (Credits : 4 Lectures/Week: 4)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Recall the concepts of arrays, strings and algorithms for basic operations.
2. Recognize the concept of stacks, queues, linked list and algorithms for basic
operations.
3. Identify the familiarity with major algorithms and data structures
4. Analyze appropriate algorithms and data structures for various applications
5. Formulate the computational complexity of various algorithms
Unit I 15 L
Abstract Data Types: Introduction, The Date Abstract Data Type, Bags, Iterators, Application
Arrays: Array Structure, Python List, Two Dimensional Arrays, Matrix Abstract Data Type, Application
Sets and Maps: Sets-Set ADT, Selecting Data Structure, List Based Implementation, Maps-Map ADT, List Based Implementation, Multi-Dimensional Arrays-Multi-Array ADT, Implementing Multi Arrays, Application
Algorithm Analysis: Complexity Analysis-Big-O Notation, Evaluating Python Code, Evaluating Python List, Amortized Cost, Evaluating Set ADT, Application
Searching and Sorting: Searching-Linear Search, Binary Search, Sorting-Bubble, Selection and Insertion Sort, Working with Sorted Lists-Maintaining Sorted List, Merging Sorted Lists
Unit-II 15 L
Linked lists: Linear lists, Single Linked List and Chains, Representing Chains, Designing a Chain Class, Chain Manipulation Operations, The Template Class Chain, Implementing Chains with Templates, Chain Iterators, Chain Operations, Circular List, Doubly Linked Lists, Skip list, Generalized Lists, Representation of Generalized Lists, Recursive Algorithms for Lists, Reference Counts, Shared and Recursive Lists
Stacks: Stack ADT, Implementing Stacks-Using Python List, Using Linked List,
Stack Applications-Balanced Delimiters, Evaluating Postfix Expressions
Queues: Queue ADT, Implementing Queue-Using Python List, Circular Array,
Unit-III 15 L
Using List, Priority Queues- Priority Queue ADT, Bounded and unbounded
Priority Queues
Advanced Sorting: Merge Sort, Quick Sort, Radix Sort, Sorting Linked List
Recursion: Recursive Functions, Properties of Recursion, Its working, Recursive
Hash Table: Introduction, Hashing-Linear Probing, Clustering, Rehashing,
Separate Chaining, Hash Functions
Unit-IV 15 L
Binary Trees: Tree Structure, Binary Tree-Properties, Implementation and
Traversals, Expression Trees, Heaps and Heapsort, Search Trees, R-Trees & R+
Trees.
TextBook:
1. Data Structures and Algorithms Using Python, Rance D. Necaise, 2016 Wiley India Edition
2. Data Structures and Algorithms in Python, Michael T. Goodrich, Roberto Tamassia, M. H.
Goldwasser, 2016 Wiley India Edition
References:
1. Data Structure and Algorithmic Thinking with Python, Narasimha Karumanchi, 2015,
CareerMonk Publications
2. Fundamentals of Python: Data Structures, Kenneth Lambert, Delmar Cengage Learning
Links:
https://lecturenotes.in/subject/81/data-structure-using-c-ds
http://www.cs.yale.edu/homes/aspnes/classes/223/notes.pdf
https://www.smartzworld.com/notes/data-structures-pdf-notes-ds/
https://www.geeksforgeeks.org/data-structures/

Sr. No. Practicals of PGDSP106


1 Implement Linear Search to find an item in a list.
2 Implement binary search to find an item in an ordered list
3 Implement Sorting Algorithms
a. Bubble sort
b. Insertion sort
c. Quick sort
d. Merge sort
4 Implement use of Sets and various operations on Sets.
5 Implement working of Stacks. (pop method to take the last item added off the stack and a push
method to add an item to the stack)
6 Implement Program for
a. Infix to Postfix conversion
b. Postfix Evaluation
7 Implement the following
a. A queue as a list which you add and delete items from.
b. A circular queue. (The beginning items of the queue can be reused).
8 Implement Linked list and demonstrate the functionality to add and delete items in the linked
list.
9 Implement Binary Tree and its traversals.
10 Recursive implementation of
a. Factorial
b. Fibonacci
c. Tower of Hanoi
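
Practicals 5 and 6b can be sketched together as follows. This is one possible approach, not a prescribed solution: the class name `Stack`, the function name `evaluate_postfix`, and the choice of space-separated tokens with the four basic operators are assumptions made for the example.

```python
class Stack:
    """Stack backed by a Python list; the end of the list is the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Add an item to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the last item added."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


def evaluate_postfix(expression):
    """Evaluate a space-separated postfix expression with + - * / operators."""
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = Stack()
    for token in expression.split():
        if token in operations:
            right = stack.pop()  # the second operand was pushed last
            left = stack.pop()
            stack.push(operations[token](left, right))
        else:
            stack.push(float(token))
    result = stack.pop()
    if not stack.is_empty():
        raise ValueError("malformed postfix expression")
    return result


print(evaluate_postfix("3 4 + 2 *"))  # postfix form of (3 + 4) * 2
```

Using the end of the list as the top keeps both `push` and `pop` at amortized constant time, which ties in with the Amortized Cost topic of Unit I.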
Semester II – Theory

Class: M.Sc Branch: Data Science Semester: II


Subject: Research in Computing
Period per Week (Each 60 min): Lecture 04, Practical/Tutorial 04
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Research in Computing Lectures


PGDS201 (Credits : 4 Lectures/Week: 4)
Expected Course Outcomes
After successful completion of this course, students would be able to
1) Develop analytical skills by applying scientific methods.
2) Review the existing research article on Machine learning & Business
analytics
3) Survey the specific research areas in field of Computer Science
4) Test & validate the proposed methodology on research problems.
Unit I 12 L
Introduction: Role of Business Research, Information Systems and Knowledge Management, Theory Building, Organization ethics and Issues
Unit-II 12 L
Beginning Stages of Research Process: Problem definition, Qualitative research tools, Secondary data research
Unit-III 12 L
Research Methods and Data Collection: Survey research, communicating with respondents, Observation methods, Experimental research
Unit-IV 12 L
Measurement Concepts, Sampling and Field work: Levels of Scale measurement, attitude measurement, questionnaire design, sampling designs and procedures, determination of sample size
Unit-V 12 L
Data Analysis and Presentation: Editing and Coding, Basic Data Analysis, Univariate Statistical Analysis and Bivariate Statistical analysis and differences between two variables. Multivariate Statistical Analysis.
TextBook:
1. Business Research Methods, William G. Zikmund, B.J. Babin, J.C. Carr, Atanu Adhikari, M. Griffin,
8th Edition, 2016.
2. Business Analytics, Albright, Winston, 5th Edition, 2015.
3. Research Methods for Business Students, Fifth Edition, Mark Saunders, 2011.
4. Multivariate Data Analysis, Hair, Pearson, 7th Edition, 2014.
References:
Links:
● http://www.library.auckland.ac.nz/subject-guides/med/pdfs/Hindex%20and%20impact%20factors.pdf
● www.openintro.org/stat/down/OpenIntroStatFirst.pdf

Sr. No. Practical of PGDSP201

1 A. Write a program for obtaining descriptive statistics of data.
  B. Import data from different data sources (from Excel, csv, mysql, sql server, oracle
     to R/Python/Excel)
2 A. Design a survey form for a given case study, collect the primary data and analyze it
  B. Perform suitable analysis of given secondary data.
3 A. Perform testing of hypothesis using one sample t-test.
  B. Perform testing of hypothesis using two sample t-test.
  C. Perform testing of hypothesis using paired t-test.
4 A. Perform testing of hypothesis using chi-squared goodness-of-fit test.
  B. Perform testing of hypothesis using chi-squared Test of Independence
5 Perform testing of hypothesis using Z-test.
6 A. Perform testing of hypothesis using one-way ANOVA.
  B. Perform testing of hypothesis using two-way ANOVA.
  C. Perform testing of hypothesis using multivariate ANOVA (MANOVA).
7 A. Perform the Random sampling for the given data and analyse it.
  B. Perform the Stratified sampling for the given data and analyse it.
8 Compute different types of correlation.
9 A. Perform linear regression for prediction.
  B. Perform polynomial regression for prediction.
10 A. Perform multiple linear regression.
   B. Perform Logistic regression.
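
For the one sample t-test in the practical list above, a minimal sketch using only the Python standard library is shown here. The data values, the hypothesized mean, and the function name `one_sample_t` are hypothetical. The sketch computes only the test statistic t = (sample mean - mu0) / (s / sqrt(n)) and the degrees of freedom; obtaining a p-value needs a t-distribution table or a statistics package (for instance `scipy.stats.ttest_1samp`, if SciPy is available).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


scores = [52, 48, 55, 51, 49, 53, 50, 54]  # hypothetical measurements
t_stat, df = one_sample_t(scores, mu0=50)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```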

Class: M.Sc Branch: Data Science Semester: II


Subject: Analysis of Algorithms
Period per Week (Each 48 min): Lecture 04, Practical 04
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Analysis of Algorithms Lectures


PGDS202 (Credits : 4 Lectures/Week: 4)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Explain the concepts of algorithms for designing good programs
2. Implement algorithms using Python
3. Determine how to transform new problems into algorithmic problems
with efficient solutions
4. Illustrate algorithm design techniques for solving different problems
Unit I 15 L
Introduction to algorithms, Why analyze algorithms, Running time analysis, How to Compare Algorithms, Rate of Growth, Commonly Used Rates of Growth, Types of Analysis, Asymptotic Notation, Big-O Notation, Omega-Ω Notation, Theta-Θ Notation, Asymptotic Analysis, Properties of Notations, Commonly used Logarithms and Summations, Performance characteristics of algorithms, Master Theorem for Divide and Conquer, Divide and Conquer
Tree algorithms: What is a Tree? Glossary, Binary Trees, Types of Binary
Trees, Properties of Binary Trees, Binary Tree Traversals, Generic Trees
(N-ary Trees), Threaded Binary Tree Traversals, Expression Trees, Binary
Search Trees (BSTs), Balanced Binary Search Trees, AVL (Adelson-Velskii
Unit II 15 L
and Landis) Trees
Graph Algorithms: Introduction, Glossary, Applications of Graphs, Graph
Representation, Graph Traversals, Topological Sort, Shortest Path Algorithms,
Minimal Spanning Tree
Selection Algorithms: What are Selection Algorithms? Selection by Sorting,
Partition-based Selection Algorithm, Linear Selection Algorithm - Median of
Medians Algorithm, Finding the K Smallest Elements in Sorted Order
Divide and Conquer: Concept of Divide and Conquer, Binary Search
(recursive), Quick Sort, Merge sort
Greedy Method: Fractional Knapsack problem, Optimal Storage on Tapes,
Huffman codes, Concept of Minimum Cost Spanning Tree, Prim’s and
Unit III Kruskal's Algorithm 15 L
Dynamic Programming: The General Method, Principle of Optimality, Matrix
Chain Multiplication, 0/1 Knapsack Problem, Concept of Shortest Path, Single
Source shortest path, Dijkstra’s Algorithm, Bellman Ford Algorithm, Floyd-
Warshall Algorithm, Travelling Salesperson Problem
Branch & Bound: Introduction, Definitions of LCBB Search, Bounding
Function, Ranking Function, FIFO BB Search, Traveling Salesman problem
Using Variable tuple.
Decrease and Conquer: Definition of Graph Representation, BFS, DFS,
Unit IV Topological Sort/Order, Strongly Connected Components, Biconnected 15 L
Component, Articulation Point and Bridge edge
Problem Classification: Basic Concepts, Deterministic and Non-deterministic
Algorithms, Definitions of P, NP, NP-Hard, NP-Complete problems, Cook’s
Theorem (Only Statement and Significance)
TextBook:
1. Data Structure and Algorithmic Thinking with Python, Narasimha Karumanchi , CareerMonk
Publications, 2016
2. Introduction to Algorithms, Thomas H Cormen, PHI
Additional References:
1. Data Structures and Algorithms in Python, Michael T. Goodrich, Roberto Tamassia, Michael
H. Goldwasser, 2016, Wiley
2. Fundamentals of Computer Algorithms, Ellis Horowitz, Sartaj Sahni and Sanguthevar
Rajasekaran, Universities Press
Links:
1. https://www.tutorialspoint.com/data_structures_algorithms/
2. https://www.javatpoint.com/data-structure-tutorial

Sr. No. Practicals of PGDSP202


1 Write a Python program to perform matrix multiplication. Discuss the complexity of the
algorithm used.
2 Write a Python program to sort n names using Quick sort algorithm. Discuss the complexity of
the algorithm used.
3 Write a Python program to sort n numbers using Merge sort algorithm. Discuss the complexity
of algorithm used
4 Write a Python program for inserting an element into a binary tree.
5 Write a Python program for deleting an element (assuming data is given) from a binary tree.
6 Write a Python program for checking whether a given graph G has a simple path from source s
to destination d. Assume the graph G is represented using adjacency matrix
7 Write a Python program for finding the smallest and largest elements in an array A of size n
using the Selection algorithm. Discuss Time complexity
8 Write a Python program for finding the second largest element in an array A of size n using
Tournament Method. Discuss Time complexity.
9 Write a Python program for implementing Huffman Coding Algorithms. Discuss the
complexity of algorithm
10 Write a Python program for implementing Strassen's Matrix multiplication using Divide and
Conquer method. Discuss the complexity of the algorithm.
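
Practical 8 (second largest element by the Tournament Method) can be sketched as below. The function name `second_largest` and the bookkeeping of "values beaten so far" are illustrative choices, not a required design. The idea is that the second largest element must have lost directly to the champion, so only the champion's defeated opponents need to be compared at the end.

```python
def second_largest(values):
    """Return the second largest element using a knockout tournament."""
    if len(values) < 2:
        raise ValueError("need at least two elements")
    # Each player is a pair: (value, list of values it has beaten so far).
    players = [(v, []) for v in values]
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            winner, loser = (a, b) if a[0] >= b[0] else (b, a)
            winner[1].append(loser[0])
            next_round.append(winner)
        if len(players) % 2 == 1:
            next_round.append(players[-1])  # the odd player gets a bye
        players = next_round
    champion_value, beaten = players[0]
    # The runner-up is the best among those who lost directly to the champion.
    return max(beaten)


print(second_largest([7, 3, 9, 4, 8]))
```

The tournament itself uses n - 1 comparisons and the champion has beaten at most about log2(n) opponents, so the total is roughly n + log2(n) comparisons, compared with about 2n for two separate linear scans.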

Class: M.Sc Branch: Data Science Semester: II


Subject: Statistical Inference
Period per Week (Each 48 min): Lecture 04, Practical 04
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Statistical Inference Lectures


PGDS203 (Credits : 3 Lectures/Week: 3)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Recognize software tools for projects in data management.
2. Apply technical skills in statistical data analysis to transform a simple to
multiple variables.
3. Describe the statistical decision-making theory and interpretation.
4. Analyze and solve real-time problems
Unit I 15 L
Sampling & Sampling Distributions
Introduction to Sampling, Simple Random Sampling, Stratified Random Sampling, Cluster Sampling, Concept of Sampling Error, Introduction to Sampling distributions, Student’s t distribution, Chi square distribution, Snedecor’s F distribution, Interrelations among t, chi-square and F distributions, Central Limit Theorem (Various Versions) and its applications.
Unit II 15 L
Testing of hypothesis
Definitions: population, statistic, parameter, standard error of estimator. Concept of null hypothesis and alternative hypothesis, critical region, level of significance, type I and type II error, one-sided and two-sided tests, p-value. Large Sample Tests, Tests based on t, Chi-square and F-distribution.
All tests to be taught using R software. Manual calculations are not expected.
Unit III 15 L
Analysis of Variance
One Way ANOVA, Two Way ANOVA, Application of ANOVA to test the overall significance of Regression.
All topics to be covered using R software. Manual calculations are not expected.
Unit IV 15 L
Time Series
Meaning and Utility. Components of Time Series. Additive and Multiplicative models. Methods of estimating trend: moving average method, least squares method and exponential smoothing method (single, double and triple), Elimination of trend using additive and multiplicative models. Simple time series models: AR (1), AR (2). Introduction to ARIMA Modelling.
TextBook:

1. Fundamentals of Applied Statistics (3rd Edition), Gupta and Kapoor, S.Chand and Sons, New
Delhi, 1987.
2. Time Series Methods, Brockwell and Davis, Springer, 2006.
3. Time Series Analysis, 4th Edition, Box and Jenkins, Wiley, 2008.

References:

1. Modern Elementary Statistics, Freund J.E., Pearson Publication, 2005.
2. Probability, Statistics, Design of Experiments and Queuing theory with applications Computer
Science, Trivedi K.S., Prentice Hall of India, New Delhi, 2001.
3. Common Statistical Tests, Kulkarni M.B., Ghatpande S.B., Gore S.D.,
Satyajeet Prakashan, Pune, 1999.
4. Probability and Statistical Inference, 9th Edition, Robert Hogg, Elliot Tanis, Dale Zimmerman,
Pearson Education Ltd, 2015.
5. A Beginner's Guide to R, Alain Zuur, Elena Ieno, Erik Meesters, Springer, 2009.
6. Statistics Using R, Sudha Purohit, S.D. Gore, Shailaja Deshmukh, Narosa Publishing Company.
Links:
1. https://www.youtube.com/watch?v=10cuDKGytMw
2. https://www.tutorialspoint.com/time_series/time_series_moving_average.htm
3. https://otexts.com/fpp2/arima-r.html
Sr. No. Practicals of PGDSP203
1 Write a program on sampling distribution.
2 Write a program on Central Limit Theorem.
3 Write a program on normal test
4 Write a program on t test
5 Write a program on Chi-square
6 Write a program on F-distribution
7 Write a program on One Way ANOVA
8 Write a program on Two Way ANOVA
9 Write a program on AR (1), AR (2)
10 Write a program on ARIMA Modelling
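
Practical 2 (Central Limit Theorem) can be explored with a small simulation. The sketch below is written in Python with the standard library only; the same experiment can be reproduced in R. The exponential population, the sample size of 50, the 2000 repetitions, the seed, and the function name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_means(population_draw, sample_size, repetitions, seed=0):
    """Draw `repetitions` samples and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.mean(population_draw(rng) for _ in range(sample_size))
        for _ in range(repetitions)
    ]


# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, repetitions=2000)

print("mean of sample means:", round(statistics.mean(means), 3))
print("std. dev. of sample means:", round(statistics.stdev(means), 3))
print("CLT prediction sigma / sqrt(n):", round(1 / 50 ** 0.5, 3))
```

Although the population is skewed, the sample means should cluster around the population mean with a spread close to sigma / sqrt(n); plotting a histogram of `means` would show the approximately normal shape.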

Class: M.Sc Branch: Data Science Semester: II


Subject: Advanced Python Programming
Period per Week (Each 60 min): Lecture 03, Practical 02
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Advanced Python Programming Lectures


PGDS204 (Credits : 4 Lectures/Week: 2)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Explain fundamental understanding of the Python programming language.
2. Describe common Python functionality and features used for data science
3. Illustrate the Object-oriented Programming concepts in Python.
4. Visualize and describe DataFrame structures for cleaning and processing data
LIST MANIPULATION:Introduction to Python List∙ Creating
List∙ Accessing List∙ Joining List∙ Replicating List∙ List
Slicing, list comprehension
Unit I TUPLES Introduction to Tuple∙ Creating Tuples∙ Accessing
15 L
Tuples∙ Joining Tuples∙ Replicating Tuples∙ Tuple Slicing∙
DICTIONARIES Introduction to Dictionary∙ Accessing
values in dictionaries∙ Working with dictionaries∙
Properties∙
Set and Frozenset: Introduction to Set and Frozenset, Creating Set and Frozenset,
Accessing and Joining, Replicating and Slicing
Regular Expressions: Match function, Search function, Grouping, Matching at
Beginning or End, Match Objects , Flags
Object-Oriented Programming: Classes and Objects, Creating Classes in Python,
Creating Objects in Python, The Constructor Method, Classes with Multiple
Objects, Class Attributes versus Data Attributes, Encapsulation, Inheritance, The
Polymorphism.
Functional Programming: Iterators, Generators, Decorators
Unit-II 15 L
Files and Working with Text Data: Types of Files, Creating and Reading Text
Data, File Methods to Read and Write Data, Reading and Writing Binary Files, The
Pickle Module, Reading and Writing CSV Files, Python os and os.path Modules,
JSON and XML in Python, Processing HTML Files, Processing Texts in Natural
Languages
Unit-III 15 L
Working with Tabular Numeric Data (NumPy with Python): NumPy Arrays Creation Using array() Function, Array Attributes, NumPy Arrays Creation with Initial Placeholder Content, Integer Indexing, Array Indexing, Boolean Array Indexing, Slicing and Iterating in Arrays, Basic Arithmetic Operations on NumPy Arrays, Mathematical Functions in NumPy, Changing the Shape of an Array, Stacking and Splitting of Arrays, Broadcasting in Arrays.
Working with Data Series and Frames: Pandas Data Structures, Reshaping Data, Handling Missing Data, Combining Data, Ordering and Describing Data, Transforming Data, Taming Pandas File I/O
Plotting: Basic Plotting with PyPlot, Getting to Know Other Plot Types, Mastering Embellishments, Plotting with Pandas
Textbook:
● Michael Urban and Joel Murach, Python Programming, Shroff/Murach, 2016
● Mark Lutz, Programming Python, O’Reilly, 4th Edition, 2010
References:
1. Wesley J. Chun, “Core Python Programming”, Prentice Hall,2006.
2. Mark Lutz, “Learning Python”, O’Reilly, 4th Edition, 2009
Links:
https://www.w3schools.com/python
https://docs.python.org/3/tutorial/index.html
https://www.python-course.eu/advanced_topics.php
Sr. No. Practicals of PGDSP204
1 a. Program with a function that takes two lists L1 and L2 containing integer numbers as
parameters. The return value is a single list containing the pair wise sums of the numbers
in L1 and L2
b. Program to read the lists of numbers as L1, print the lists in reverse order without using
reverse function.
2 Program to find max and min of a given tuple of integers.
3 Write a program that combines lists L1 and L2 into a dictionary.
4 Program to find union, intersection, difference, symmetric difference of given two sets.
5 Write a program for searching, splitting and replacing things based on pattern matching using
regular expression.
6 Write programs to parse text files, CSV, HTML, XML and JSON documents and
extract relevant data. After retrieving data check any anomalies in the data,
missing values etc.
7 Write programs for reading and writing binary files
8 a. Program to implement the inheritance
b. Program to implement the polymorphism
9 Write programs to create numpy arrays of different shapes and from different sources, reshape
and slice arrays, add array indexes, and apply arithmetic, logic, and aggregation functions to
some or all array elements
10 Write programs to use the pandas data structures: Frames and series as storage containers and
for a variety of data-wrangling operations, such as:
● Single-level and hierarchical indexing
● Handling missing data
● Arithmetic and Boolean operations on entire columns and tables
● Database-type operations (such as merging and aggregation)
● Plotting individual columns and whole tables
● Reading data from files and writing data to files
Datasets
For this laboratory, appropriate publicly available datasets, can be studied and
used. Example:
MNIST (http://yann.lecun.com/exdb/mnist/),
UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html),
Kaggle (https://www.kaggle.com/datasets)
Twitter Data
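
Practicals 1 and 3 of PGDSP204 can be sketched as follows. The function names are hypothetical, and two points the syllabus leaves open are treated as assumptions: lists of unequal length are truncated to the shorter one, and in practical 3 the elements of L1 are taken as keys and those of L2 as values.

```python
def pairwise_sums(l1, l2):
    """Practical 1a: element-wise sums; zip stops at the shorter list."""
    return [a + b for a, b in zip(l1, l2)]


def reversed_copy(l1):
    """Practical 1b: reverse without reverse() or reversed(), using a slice."""
    return l1[::-1]


def lists_to_dict(keys, values):
    """Practical 3: pair the i-th element of keys with the i-th element of values."""
    return dict(zip(keys, values))


L1 = [1, 2, 3]
L2 = [10, 20, 30]
print(pairwise_sums(L1, L2))
print(reversed_copy(L1))
print(lists_to_dict(["a", "b", "c"], L2))
```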

Class: M.Sc Branch: Data Science Semester: II


Subject: Big Data Analytics
Period per Week (Each 60 min): Lecture 04, Practical 04
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Big Data Analytics Lectures


PGDS205 (Credits : 4 Lectures/Week: 2)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1. Describe the fundamentals of various big data analytics techniques.
2. Design efficient algorithms for mining the data from large volumes.
3. Analyze the HADOOP and Map Reduce technologies associated with big data
analytics.
4. Prepare a complete business data analytics solution
Unit I 15 L
Understanding Big Data:
What is big data, why big data, Data Storage and Analysis, Comparison with Other Systems, Relational Database Management System, Grid Computing, Volunteer Computing, unstructured data, industry examples of big data, web analytics, big data and marketing, fraud and big data, risk and big data, big data and healthcare, big data in medicine, advertising and big data, big data technologies, cloud and big data, Crowd sourcing analytics
Unit-II 15 L
Big Data MapReduce
MapReduce, Introduction to Map Reduce: The map tasks, Grouping by key, The reduce tasks, Combiners, Details of MapReduce Execution, Word Count MapReduce, Different tools on Big data Platform, Vector data (newspaper article or document search), PageRank Algorithm, Twitter Data Analytic, Social Media mining
Unit-III 15 L
Basics of Hadoop
Data format, introduction to Hadoop, Hadoop ecosystem, analyzing data with Hadoop, scaling out, Hadoop streaming, Hadoop pipes, design of Hadoop distributed file system (HDFS), HDFS concepts, Java interface, data flow, Hadoop I/O, data integrity, compression, serialization, Avro – file-based data structures
Unit-IV 15 L
A General Overview of High-Performance Architecture – HDFS – MapReduce and YARN – Map Reduce Programming Model, Hive, storage of Hive data (database) in HDFS, Query writing to achieve business tasks, Database management, Query optimization, Views and Partition
Unit-V
Apache Pig, What is PIG?, Pig Architecture, Prerequisites, How to Download and Install Pig, Example Pig Script, Data flow programming, Storing data in HDFS / Hood, MongoDB, Database creation, Query building, regular expression
TextBook:
1. Big Data, Black Book: Covers Hadoop 2, MapReduce, Hive, YARN, Pig, R and Data
Visualization, By DT Editorial Services, 2016
2. Programming Hive, by Edward Capriolo, Dean Wampler, Jason Rutherglen, 2012
3. Programming Pig, by Alan Gates, 2011
4. MongoDB: The Definitive Guide, by Kristina Chodorow, 2013
References:
1. Hadoop, The Definitive Guide, by Tom White, 2015
2. Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, 2015
Links:
http://index-of.co.uk/Big-DataTechnologies/Data%20Science%20and%20Big%20Data%20Analytics.pdf

Sr. No. Practicals of PGDSP205


1 Write a map-reduce program to count the number of occurrences of each
alphabetic character in the given dataset. The count for each letter should be
case-insensitive (i.e., include both upper-case and lower-case versions of the
letter; Ignore non-alphabetic characters).
2 Write a map-reduce program to count the number of occurrences of each word in
the given dataset. (A word is defined as any string of alphabetic characters
appearing between non-alphabetic characters; for example, nature's is two words. The
count should be case-insensitive. If a word occurs multiple times in a line, all
occurrences should be counted.)
3 Write a map-reduce program to determine the average ratings of movies. The
input consists of a series of lines, each containing a movie number, user number,
rating and a timestamp.
4 (i)Perform setting up and Installing Hadoop in its two operating modes:
a. Pseudo distributed,
b. Fully distributed.
(ii) Use web based tools to monitor your Hadoop setup
5 Implement the following file management tasks in Hadoop:
a. Adding files and directories
b. Retrieving files
c. Deleting files
6 Install and Run Hive then use Hive to create, alter, and drop databases, tables, views,
functions, and indexes
7 Install and Run Pig then write Pig Latin scripts to sort, group, join, project, and filter your
data.
8 Case Study
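
The word-count task of practical 2 can be prototyped in plain Python before it is moved to Hadoop. The sketch below only imitates the three MapReduce phases (map, grouping by key, reduce) inside a single process; it does not use any Hadoop API, and the function names and sample lines are hypothetical. On a cluster, the map and reduce functions would typically be run through Hadoop Streaming or the Java interface covered in Unit-III.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map task: emit (word, 1) for every run of alphabetic characters, lower-cased."""
    for word in re.findall(r"[A-Za-z]+", line):
        yield word.lower(), 1


def group_by_key(pairs):
    """Shuffle step: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce task: sum the counts for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in group_by_key(pairs).items())


lines = ["Nature's way is the way", "the WAY of nature"]
print(word_count(lines))
```

Because the pattern `[A-Za-z]+` splits on every non-alphabetic character, "Nature's" yields the two words "nature" and "s", as the practical requires. Practical 1 (letter counts) follows the same structure with a map task that emits one pair per alphabetic character.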
Class: M.Sc Branch: Data Science Semester: II
Subject: Optimization Techniques
Period per Week (Each 60 min): Lecture 04, Practical 04
Evaluation System                       Hours           Marks
Semester End Exam                       2 hrs. 30 min   60
Continuous Internal Assessment          __              40
Semester End Practical Examination      2 hrs.          50
Total                                   __              150

Course: Optimization Techniques Lectures


PGDS206 (Credits : 4 Lectures/Week: 2)
Expected Learning Outcomes:
After successful completion of this course, students would be able to
1) Explain the theory of optimization methods and algorithms.
2) Apply the mathematical results and numerical techniques of optimization theory to
concrete data science problems.
3) Apply basic concepts of mathematics to formulate an optimization problem.
4) Analyze and appreciate a variety of performance measures for various optimization
problems.
Unit I 15 L
Introduction to Operations Research
Introduction-Mathematical models of Operation Research-Scope and applications of Operation Research-Phases of Operation Research study-Characteristics of Operation Research-Limitations of Operation Research
Unit-II 15 L
Linear Programming
Introduction-Properties of Linear Programming-Basic assumptions-Mathematical formulation of Linear Programming-Limitations or constraints-Methods for the solution of LP Problem-Graphical analysis of LP-Graphical LP Maximization problem-Graphical LP Minimization problem
Unit-III 15 L
Dual Linear Programming
Introduction-Primal and Dual problem-Dual problem properties-Solution techniques of Dual problem-Dual Simplex method-Relations between direct and dual problem-Economic interpretation of Duality
Unit-IV 15 L
Transportation and Assignment Models
Introduction: Transportation problem-Balanced-Unbalanced-Methods of basic feasible solution-Optimal solution-MODI method. Assignment problem-Hungarian Method.
Unit-V
Network Analysis
Basic concepts-Construction of Network-Rules and precautions-CPM and PERT Networks-Obtaining of critical path. Probability and cost consideration. Advantages of Network.
TextBook:
1) Hamdy Taha, Operations Research, 10th edition, Prentice Hall India, 2019.
2) P. K. Gupta and D. S. Hira, Operations Research, S. Chand & co., 2007
References:
1) S.D. Sharma (2000), Operations Research, Nath & Co., Meerut.
2) Maurice Sasieni, Arthur Yaspan, Lawrence Friedman (2003), OR Methods and Problems, New Age
International Edition.
3) J K Sharma (2007), Operations Research Theory & Applications, 3e, Macmillan India Ltd.
4) P. Sankara Iyer (2008), Operations Research, Tata McGraw-Hill.
5) A Ravindran, Don T Philips and James J Solberg, Operations Research: Principles and Practice,
2nd edition, John Wiley and Sons, 2007
Links:
https://towardsdatascience.com/tagged/optimization-algorithms
https://www.geeksforgeeks.org/optimization-for-data-science/

Sr. No. Tutorial of PGDSP206


1 A minimum of 5 problems to be worked out by students in every tutorial class. Another 5
problems per tutorial class to be given as homework.
