syllabus sem 6

The document outlines course details for Distributed Databases and Data Science Tools Workshop, including course outcomes, content, and practical assignments. Key topics include distributed DBMS architecture, query processing, optimization techniques, and data visualization methods. Suggested readings and practical tasks are provided to enhance learning and application of the concepts.

Uploaded by

Shubham Sharma
Copyright
© All Rights Reserved

Course No.  Type  Subject                L  T  P  Credits  CA  MS  ES  CA  ES  Pre-requisites
CDCSC18     CC    Distributed Databases  3  0  2  4        15  15  40  15  15  CDCSC05

Course Outcome:

After learning the course the students should be able to:

1. Understand Distributed DBMS and its architecture
2. Apply various fragmentation techniques in organizing a distributed database
3. Understand the steps of query processing
4. Learn various query optimization algorithms
5. Understand transaction management and compare various approaches to concurrency
control and deadlock handling in distributed databases

Unit 1:
Introduction: Distributed Data Processing, Distributed Database Systems, advantages and
drawbacks of DDBSs, Distributed DBMS Architecture : Models- Autonomy, Distribution,
Heterogeneity, DDBMS Architecture – Client/Server, Peer to peer, MDBS

Unit 2:
Database Distribution
Design Alternatives – localized data, distributed data; Fragmentation – Vertical, Horizontal
(primary & derived), hybrid, general guidelines, correctness rules; Distribution transparency
– location, fragmentation, replication; Impact of distribution on user queries – No Global Data
Dictionary(GDD), GDD containing location information,
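
The fragmentation types and correctness rules listed in this unit can be illustrated with a small sketch. The EMP relation, its attribute names, and its rows are hypothetical and chosen only for illustration. The checks shown are the usual completeness, reconstruction, and disjointness rules, applied to one horizontal and one vertical split.

```python
# Hypothetical EMP relation; the attribute names and rows are illustrative only.
EMP = [
    {"eno": 1, "name": "Asha", "dept": "CS", "salary": 50},
    {"eno": 2, "name": "Ravi", "dept": "EE", "salary": 40},
    {"eno": 3, "name": "Meena", "dept": "CS", "salary": 65},
]

def horizontal_fragment(relation, predicate):
    """Primary horizontal fragmentation: select the rows satisfying a predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, attributes, key="eno"):
    """Vertical fragmentation: project onto some attributes, always keeping the key."""
    cols = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in cols} for row in relation]

def as_set(relation):
    """Order-independent view of a relation, used for comparisons."""
    return {tuple(sorted(row.items())) for row in relation}

# Horizontal fragments by department.
emp_cs = horizontal_fragment(EMP, lambda r: r["dept"] == "CS")
emp_other = horizontal_fragment(EMP, lambda r: r["dept"] != "CS")

# Completeness and reconstruction: the union of the fragments is the original relation.
h_reconstructed = emp_cs + emp_other
horizontal_ok = as_set(h_reconstructed) == as_set(EMP)
# Disjointness: no row appears in both fragments.
horizontal_disjoint = as_set(emp_cs).isdisjoint(as_set(emp_other))

# Vertical fragments share only the key, so a join on the key rebuilds each row.
emp_public = vertical_fragment(EMP, ["name", "dept"])
emp_pay = vertical_fragment(EMP, ["salary"])
pay_by_key = {row["eno"]: row for row in emp_pay}
v_reconstructed = [{**row, **pay_by_key[row["eno"]]} for row in emp_public]
vertical_ok = as_set(v_reconstructed) == as_set(EMP)

print(horizontal_ok, horizontal_disjoint, vertical_ok)
```

In a real DDBMS the fragments would live at different sites; here they are plain in-memory lists so that the rules can be checked directly.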

Unit 3:
Query Processing
Query Processing, Query Processing in Centralized Systems. Layers of Query Processing,
Distributed query processing, Query decomposition, Localization of distributed data

Unit 4:
Optimization of Distributed Queries
Query Optimization, Centralized Query Optimization, Distributed Query Optimization
Algorithms

Unit 5:
Distributed Transaction Management & Concurrency Control
Transaction concept, ACID property, Distributed Concurrency Control,
Serializability and recoverability, Distributed Serializability, Enhanced lock based
and timestamp based protocols, Multiple granularity, Multi version schemes,
Optimistic Concurrency Control techniques, Distributed deadlock & Recovery
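
One common way to approach distributed deadlock detection is to merge the local wait-for graphs of all sites into a global graph and look for a cycle. The sketch below assumes a centralized detector that receives every local graph; the function name, the site graphs, and the transaction names are hypothetical, and other schemes (hierarchical or fully distributed detection) are not shown.

```python
def find_deadlock(local_wait_for_graphs):
    """Return one cycle of transactions in the global wait-for graph, or None.

    Each local graph maps a transaction to the transactions it waits for at one site.
    """
    # Build the global graph as the union of all local edges.
    global_graph = {}
    for graph in local_wait_for_graphs:
        for txn, waits_for in graph.items():
            global_graph.setdefault(txn, set()).update(waits_for)
            for other in waits_for:
                global_graph.setdefault(other, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {txn: WHITE for txn in global_graph}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for nxt in sorted(global_graph[txn]):
            if colour[nxt] == GREY:  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[txn] = BLACK
        return None

    for txn in sorted(global_graph):
        if colour[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

# Site 1: T1 waits for T2. Site 2: T2 waits for T3, and T3 waits for T1.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T3"}, "T3": {"T1"}}

print(find_deadlock([site1]))         # no cycle at site 1 alone
print(find_deadlock([site2]))         # no cycle at site 2 alone
print(find_deadlock([site1, site2]))  # the cycle only appears in the global graph
```

The example shows why local detection is not enough: neither site sees a cycle on its own, but the union of their graphs contains one.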

Case study on Distributed databases in cloud

Suggested Readings
1. M.T. Ozsu and P. Valduriez, “Principles of Distributed Database Systems”, Pearson
Publication
2. D. Bell and J. Grimson, “Distributed Database System”, Addison-Wesley
3. Stefano Ceri and Giuseppe Pelagatti, “Distributed Databases: principles and systems”,

Course No.  Type  Subject                      L  T  P  Credits  CA  MS  ES  CA  ES  Pre-requisites
CDCSE19     ED    Data Science Tools Workshop  0  2  4  4                            Python
COURSE OUTCOMES

1. Asking the correct questions and analyzing the raw data.
2. Modeling the data using various complex and efficient algorithms.
3. Visualizing the data to get a better perspective.
4. Understanding the data to make better decisions and find the final result.

COURSE CONTENTS

UNIT-1: Data Science an Introduction: Computer Science, Data Science, and Real Science, What is Data
Science? Need for Data Science, Data Science Components, Tools for Data Science, Data Science Lifecycle,
Applications of Data Science.
UNIT-2: Python and R Programming for Data Science: Introduction to Python
Programming (Python Basics, Python Data Structures, Python Programming Fundamentals, Working
with Data in Python, Working with NumPy, Pandas, SciPy, and Matplotlib).
UNIT-3: Data Processing: Data Operations, Data cleansing, Processing CSV Data, Processing JSON
Data, Processing XLS Data, Relational databases, NoSQL Databases, Date and Time, Data
Wrangling, Data Aggregation, Reading HTML Pages, Processing Unstructured Data, Word
tokenization, Stemming and Lemmatization
UNIT-4: Statistical Data Analysis: Measuring Central Tendency, Measuring Variance, Normal
Distribution, Binomial Distribution, Poisson Distribution, Bernoulli Distribution, P-Value,
Correlation, Chi-square Test, Linear Regression
UNIT-5: Data Visualization: Chart Properties, Chart Styling, Box Plots, Heat Maps, Scatter Plots,
Bubble Charts, 3D Charts, Time Series, Geographical Data, Graph Data

Suggested Text Book(s):


• Data Science from Scratch by Joel Grus
• Data Science for Dummies by Lillian Pierson and Jake Porway
• An Introduction to Statistical Learning by Gareth James, Daniela Witten, et al.

Reference Book(s):
• An Introduction to Probability and Statistics by V.K. Rohatgi & A.K. Md. E. Saleh, Wiley,
(2008), 3rd ed.
• Introduction to Probability Theory and Statistical Inference by H.J. Larson, John Wiley
& Sons, (2005) 3rd ed.

Other useful resource(s):

• https://nptel.ac.in/courses/110/106/110106064/
• https://onlinecourses.nptel.ac.in/noc18_cs28/previ

List of Practicals

Practical 1 (Hours: 3)
• Write a Python/R program to create a vector of a specified type and length.
Create a vector of numeric, complex, logical, and character types of length 6.
• Write a Python/R program to add two vectors of integer type and length 3.
• Write a Python/R program to create a list containing a vector, a matrix, and
a list and remove the second element.
• Write a Python/R program to create a list containing a vector, a matrix, and
a list and update the last element.
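
A minimal Python sketch of these four tasks follows. It assumes plain Python lists stand in for R vectors and a list of rows stands in for a matrix; the variable names and sample values are illustrative only.

```python
# Vectors of length 6, one per type named in the task (Python lists stand in for R vectors).
numeric_vec = [0.0] * 6
complex_vec = [complex(0, 0)] * 6
logical_vec = [False] * 6
character_vec = [""] * 6

# Element-wise addition of two integer vectors of length 3.
a = [1, 2, 3]
b = [10, 20, 30]
vector_sum = [x + y for x, y in zip(a, b)]

# A list holding a vector, a matrix (list of rows), and another list.
items = [[1, 2, 3], [[1, 2], [3, 4]], ["a", "b"]]

# Remove the second element (index 1) without mutating the original.
without_second = items[:1] + items[2:]

# Update the last element, again on a copy.
updated = list(items)
updated[-1] = ["x", "y", "z"]

print(vector_sum)
print(without_second)
print(updated)
```

Working on copies keeps the original list available for both the removal and the update task.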

Practical 2 (Hours: 3)
Write Python/R programs to solve the following tasks in both languages.
• Read numbers from a file, and print them out in sorted order.
• Read a text file, and count the total number of words.
• Read a text file, and count the total number of distinct words.
• Read a file of numbers, and plot a frequency histogram of them.
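
The file-handling tasks can be sketched in Python with the standard library alone. The file names and contents below are made up for the demonstration, words are assumed to be whitespace-separated and case-insensitive, and the histogram is rendered as text rather than as a plot.

```python
import os
import tempfile
from collections import Counter

def read_numbers(path):
    """Read whitespace-separated numbers from a text file."""
    with open(path) as f:
        return [float(tok) for tok in f.read().split()]

def word_counts(path):
    """Return (total words, distinct words), splitting on whitespace and ignoring case."""
    with open(path) as f:
        words = f.read().lower().split()
    return len(words), len(set(words))

def text_histogram(numbers):
    """Render a frequency histogram as text, one row per distinct value."""
    freq = Counter(numbers)
    return [f"{value:g} | {'*' * freq[value]}" for value in sorted(freq)]

# Demonstration on throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    num_path = os.path.join(tmp, "numbers.txt")
    txt_path = os.path.join(tmp, "words.txt")
    with open(num_path, "w") as f:
        f.write("3 1 2\n3 3 1\n")
    with open(txt_path, "w") as f:
        f.write("the quick fox jumps over the lazy dog\n")

    numbers = read_numbers(num_path)
    sorted_numbers = sorted(numbers)
    total_words, distinct_words = word_counts(txt_path)
    histogram = text_histogram(numbers)

print(sorted_numbers)
print(total_words, distinct_words)
print("\n".join(histogram))
```

For a graphical histogram, the same frequency counts could be passed to a plotting library such as Matplotlib, which the course contents list under Unit 2.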

Practical 3 (Hours: 3)
Statistical Data Analysis
Write a program to solve linear regression for a given data set.
Y = aX + b
where
a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
b = (Σy − aΣx) / n
Here
Y: response variable
X: predictor variable
a, b: regression coefficients
Read data set
X    Y
-2   -1
1    1
3    2
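
The coefficient formulas of this practical can be sketched directly in Python, reading the denominator of a as nΣx² − (Σx)². The function name fit_line is an illustrative choice, not part of the syllabus.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b

# The three (X, Y) points given in the practical.
xs = [-2, 1, 3]
ys = [-1, 1, 2]
a, b = fit_line(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

For this data set the sums are n = 3, Σx = 2, Σy = 2, Σxy = 9, and Σx² = 14, which gives a = 23/38 and b = 5/19.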

Practical 4 (Hours: 3)
Statistical Data Analysis
Solve the linear regression for a given data set, and also predict sales in the year 2012.
Year   Sales
2005   12
2006   19
2007   29
2008   37
2009   45
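
A sketch of the sales prediction, using the same least-squares summation formulas for a line y = a*x + b. Measuring years as offsets from 2005 is a choice made here to keep the sums small; it is not required by the task and does not change the prediction.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b

years = [2005, 2006, 2007, 2008, 2009]
sales = [12, 19, 29, 37, 45]

# Use offsets from the first year as the predictor.
base = years[0]
xs = [year - base for year in years]
a, b = fit_line(xs, sales)

predicted_2012 = a * (2012 - base) + b
print(round(predicted_2012, 1))  # 70.4
```

With offsets 0 to 4 the fit gives a = 8.4 and b = 11.6, so the predicted sales for 2012 (offset 7) are 8.4 × 7 + 11.6 = 70.4.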


Practical 5 (Hours: 2)
Statistical Data Analysis
Compute Logistic Regression for the Organization dataset.
Response variable
Y = Compensation in rupees
Predictor variables
X1 = Experience in years
X2 = Education in years (after 10th standard)
X3 = Number of Employees Supervised
X4 = Number of Projects Handled

S No  Compensation  Experience  Education  Number supervised  Projects
1     1500          2           5          4                  10
2     1650          3           6          5                  10
3     1750          3           3          5                  12
4     1400          2           3          3                  9
5     2000          4           4          6                  15
6     2200          5           6          6                  14
7     2100          1           5          4                  12
8     2750          5           8          7                  15
9     2900          8           9          8                  25
10    1100          3           3          2                  7
11    1000          4           2          1                  5
12    1350          6           4          4                  12
13    1550          4           6          4                  11

Note: fitting the model on the raw data produces an error, because logistic regression requires
Y values between 0 and 1. Modify the Y values accordingly.
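
One possible way to modify the Y values is sketched below. It rests on an assumption that is not in the syllabus: compensation is recoded as 1 when it is above the median and 0 otherwise. The model is then fitted with plain gradient descent on standardized predictors; the learning rate and iteration count are arbitrary choices, and a library routine could be used instead.

```python
import math
from statistics import mean, median, pstdev

# Columns: compensation, experience, education, employees supervised, projects handled.
ROWS = [
    (1500, 2, 5, 4, 10), (1650, 3, 6, 5, 10), (1750, 3, 3, 5, 12),
    (1400, 2, 3, 3, 9), (2000, 4, 4, 6, 15), (2200, 5, 6, 6, 14),
    (2100, 1, 5, 4, 12), (2750, 5, 8, 7, 15), (2900, 8, 9, 8, 25),
    (1100, 3, 3, 2, 7), (1000, 4, 2, 1, 5), (1350, 6, 4, 4, 12),
    (1550, 4, 6, 4, 11),
]

# Assumption: recode Y as 1 when compensation is above the median, else 0.
cutoff = median(row[0] for row in ROWS)
y = [1 if row[0] > cutoff else 0 for row in ROWS]

# Standardize each predictor so that plain gradient descent behaves well.
columns = list(zip(*[row[1:] for row in ROWS]))
means = [mean(col) for col in columns]
stds = [pstdev(col) for col in columns]
X = [[1.0] + [(v - m) / s for v, m, s in zip(row[1:], means, stds)] for row in ROWS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights):
    """Average negative log-likelihood, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

weights = [0.0] * len(X[0])  # intercept followed by four coefficients
initial_loss = log_loss(weights)
for _ in range(500):
    grad = [0.0] * len(weights)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(w * v for w, v in zip(weights, xi))) - yi
        for j, v in enumerate(xi):
            grad[j] += err * v / len(X)
    weights = [w - 0.1 * g for w, g in zip(weights, grad)]

final_loss = log_loss(weights)
print(y)
print(final_loss < initial_loss)
```

Other recodings are equally defensible, for example a fixed rupee threshold. The coefficients apply to the standardized predictors, not to the raw columns.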

Practical 6 (Hours: 2)
Statistical Data Analysis
• In an entrance examination, there are twenty multiple-choice questions. Each
question has four options, and only one of them is correct. Find the
probability of having seven or fewer correct answers if a student
attempts to answer every question at random.
• Assume that the test scores of an entrance exam fit a normal distribution
where the mean test score is 67, and the standard deviation is 13.7. Calculate
the percentage of students scoring 80 or more in the exam.
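
Both probabilities can be computed with the Python standard library. The sketch assumes the first question follows a Binomial(20, 1/4) distribution, since each of the twenty answers is guessed independently among four options, and takes the second as the upper tail of a Normal(67, 13.7) distribution.

```python
from math import comb
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Twenty questions, four options each, so a random guess is right with probability 1/4.
p_at_most_7 = binomial_cdf(7, n=20, p=0.25)

# Scores ~ Normal(mean 67, sd 13.7); the upper tail at 80 is 1 - CDF(80).
p_80_or_more = 1 - NormalDist(mu=67, sigma=13.7).cdf(80)

print(f"P(X <= 7) = {p_at_most_7:.4f}")
print(f"P(score >= 80) = {p_80_or_more:.2%}")
```

The first probability comes out at roughly 0.90 and the second at roughly 17%. In R the corresponding calls would be pbinom and pnorm.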

Mid-Semester Lab Examination

Practical 9 (Hours: 3)
Data Visualization
Construct a revealing visualization of some aspect of your favorite data set, using:
• A well-designed table.
• A dot and/or line plot.
• A scatter plot.
• A heatmap.
• A bar plot or pie chart.
• A histogram.
• A data map.

Practical 10 (Hours: 3)
Data Visualization
Create ten different versions of line charts for a particular set of (x, y) points.
Which ones are best and which ones worst? Explain why.

Practical 11 (Hours: 3)
Data Visualization
Construct scatter plots for sets of 10, 100, 1000, and 10,000 points. Experiment
with the point size to find the most revealing value for each data set.

Practical 12 (Hours: 3)
Data Visualization
Experiment with different color scales to construct scatter plots for a particular
set of (x, y, z) points, where color is used to represent the z dimension. Which
color schemes work best? Which are the worst? Explain why.

End-Semester Lab Examination
Total Lab hours: 28

Course No.  Type  Subject                            L  T  P  Credits  CA  MS  ES  CA  ES  Pre-requisites
CDCSC20     CC    Query Processing and Optimization  3  1  0  4        25  25  50  -   -   CDCSC05
COURSE OUTCOMES

1. To develop an understanding of the fundamentals of query processing.
2. To develop an understanding of query optimization.
3. To design and implement a database for any specified domain according to well-known
design principles that balance data retrieval performance with data consistency
guarantees.
4. To formulate data retrieval queries in SQL and the abstract query languages.
5. To optimize various operations of SQL.
COURSE CONTENTS:

UNIT I
Query Processing: Introduction, Steps: Parsing and Translation, Optimization, Evaluation;
Measures of Query Cost
Relational Algebra, Operations from Set Theory, Translating SQL Queries into Relational
Algebra, Equivalence rules, Equivalence derivability and minimality, Enumeration of Equivalent
Expressions.

UNIT II
Algorithms for Selection Operations: using indices, comparisons, complex selections; Algorithms
for External Sorting, Algorithms for SELECT Operations, Aggregation Operation
Algorithms for JOIN Operations: Nested-Loop Join, Block Nested Loop Join, Indexed Nested loop
join, Merge-Join, Hash Join, Hybrid Hash Join, Complex Joins, Outer Join, Algorithms for Project
and Set Operations
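
Two of the join algorithms named in this unit can be contrasted in a short sketch. The relations dept and emp and their attributes are hypothetical. Both functions work on in-memory lists of dictionaries, so the block-level I/O behaviour that distinguishes the disk-based algorithms is not modelled.

```python
from collections import defaultdict

def nested_loop_join(r, s, r_key, s_key):
    """Compare every pair of tuples: |r| * |s| comparisons."""
    return [{**a, **b} for a in r for b in s if a[r_key] == b[s_key]]

def hash_join(r, s, r_key, s_key):
    """Build a hash table on r, then probe it once per tuple of s."""
    buckets = defaultdict(list)
    for a in r:
        buckets[a[r_key]].append(a)
    return [{**a, **b} for b in s for a in buckets.get(b[s_key], [])]

def canonical(rows):
    """Order-independent form of a result, for comparing the two algorithms."""
    return sorted(tuple(sorted(row.items())) for row in rows)

# Hypothetical relations joined on the department number.
dept = [{"dno": 1, "dname": "CS"}, {"dno": 2, "dname": "EE"}]
emp = [
    {"eno": 10, "dno": 1},
    {"eno": 11, "dno": 2},
    {"eno": 12, "dno": 1},
    {"eno": 13, "dno": 3},  # no matching department, so it drops out of the join
]

nl_result = nested_loop_join(dept, emp, "dno", "dno")
hj_result = hash_join(dept, emp, "dno", "dno")
print(len(nl_result), canonical(nl_result) == canonical(hj_result))
```

The two algorithms return the same tuples in a different order; the hash join touches each input once, which is why it is usually preferred for equi-joins on large inputs.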

UNIT III
Evaluation of Expression, Transformation of Relational Expression, Combining Operations using
Pipelining, Procedure-driven pipelining, Double pipelining join technique, Materialization,
Materialized Evaluation

UNIT IV
Query Optimization, Introduction, Query Evaluation Plan(QEP), cost based query optimization,
Estimation of QEP cost, using heuristics in query optimization, Selectivity and Cost Estimation,
Semantic Query optimization

UNIT V
Estimating Statistics of Expression Results, Cost Estimation, Statistical Information for cost
estimation: Histograms, Selection and JOIN Size Estimation, Projection and aggregation size
estimation
Choice of Evaluation Plans, Dynamic programming in Optimization, Cost of Optimization,
Structure of Query Optimizers, Materialized Views, Optimization in distributed databases.
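
Histogram-based selection size estimation from this unit can be sketched as follows. The example assumes an equi-width histogram and uniformly distributed values inside each bucket; the attribute (age), its values, and the function names are hypothetical.

```python
def build_equi_width_histogram(values, low, high, buckets):
    """Count how many values fall in each of `buckets` equal-width ranges over [low, high)."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        index = min(int((v - low) / width), buckets - 1)
        counts[index] += 1
    return counts, width

def estimate_less_than(counts, width, low, threshold):
    """Estimate the number of tuples with A < threshold, assuming uniform buckets."""
    estimate = 0.0
    for i, count in enumerate(counts):
        start = low + i * width
        end = start + width
        if threshold >= end:
            estimate += count  # bucket lies fully below the threshold
        elif threshold > start:
            estimate += count * (threshold - start) / width  # partial overlap
    return estimate

# Hypothetical attribute values for a relation of 12 tuples.
ages = [21, 22, 25, 29, 31, 34, 35, 38, 42, 47, 51, 58]
counts, width = build_equi_width_histogram(ages, low=20, high=60, buckets=4)

estimated = estimate_less_than(counts, width, low=20, threshold=35)
actual = sum(1 for a in ages if a < 35)
print(counts, estimated, actual)
```

An optimizer would keep only the bucket counts, not the raw values, and use the estimate to compare the cost of alternative evaluation plans. The estimate happens to match the true count for this data, which is not guaranteed in general.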

SUGGESTED READINGS
1. R. Ramakrishnan and J. Gehrke, “Database Management Systems”, 3rd Edition,
McGraw Hill
2. A. Silberschatz, H. F. Korth and A. Sudarshan, “Database System Concepts”, 5th ed.,
McGraw Hill, 2006.