
Data Science Syllabus

The document outlines a Data Science Tools Workshop course (CDCSE19) that includes practical applications of Python and R programming, data processing, statistical analysis, and data visualization. It details course outcomes, contents divided into five units, suggested textbooks, and practical exercises for hands-on learning. The course aims to equip students with skills to analyze and visualize data effectively using various tools and techniques.

Uploaded by

Shubham Sharma
Copyright
© All Rights Reserved

Course No. | Type | Subject                     | L | T | P | Credits | CA | MS | ES | CA | ES | Pre-requisites
CDCSE19    | ED   | Data Science Tools Workshop | 0 | 2 | 4 | 4       |    |    |    |    |    | Python

COURSE OUTCOMES

1. Asking the correct questions and analyzing the raw data.
2. Modeling the data using various complex and efficient algorithms.
3. Visualizing the data to get a better perspective.
4. Understanding the data to make better decisions and find the final result.

COURSE CONTENTS

UNIT-1: Introduction to Data Science: Computer Science, Data Science, and Real Science, What is Data
Science? Need for Data Science, Data Science Components, Tools for Data Science, Data Science Lifecycle,
Applications of Data Science.
UNIT-2: Python and R Programming for Data Science: Introduction to Python
Programming (Python Basics, Python Data Structures, Python Programming Fundamentals, Working
with Data in Python, Working with NumPy, Pandas, SciPy, and Matplotlib).
UNIT-3: Data Processing: Data Operations, Data Cleansing, Processing CSV Data, Processing JSON
Data, Processing XLS Data, Relational Databases, NoSQL Databases, Date and Time, Data
Wrangling, Data Aggregation, Reading HTML Pages, Processing Unstructured Data, Word
Tokenization, Stemming and Lemmatization.
UNIT-4: Statistical Data Analysis: Measuring Central Tendency, Measuring Variance, Normal
Distribution, Binomial Distribution, Poisson Distribution, Bernoulli Distribution, P-Value,
Correlation, Chi-square Test, Linear Regression.
UNIT-5: Data Visualization: Chart Properties, Chart Styling, Box Plots, Heat Maps, Scatter Plots,
Bubble Charts, 3D Charts, Time Series, Geographical Data, Graph Data.

Suggested Text Book(s):


• Data Science from Scratch by Joel Grus
• Data Science for Dummies by Lillian Pierson and Jake Porway
• An Introduction to Statistical Learning by Gareth James, Daniela Witten, et al.

Reference Book(s):
• An Introduction to Probability and Statistics by V.K. Rohatgi & A.K. Md. E. Saleh, Wiley (2008), 3rd ed.
• Introduction to Probability Theory and Statistical Inference by H.J. Larson, John Wiley & Sons (2005), 3rd ed.

Other useful resource(s):


• https://nptel.ac.in/courses/110/106/110106064/
• https://onlinecourses.nptel.ac.in/noc18_cs28/previ
List of Practicals

S No Description Hours
S No 1 (Hours: 3)
• Write a Python/R program to create a vector of a specified type and length. Create vectors of numeric, complex, logical, and character types, each of length 6.
• Write a Python/R program to add two vectors of integer type and length 3.
• Write a Python/R program to create a list containing a vector, a matrix, and a list, and remove the second element.
• Write a Python/R program to create a list containing a vector, a matrix, and a list, and update the last element.
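
The first practical could be approached in Python roughly as follows. This is only a sketch: it assumes plain Python lists stand in for R vectors and a nested list stands in for a matrix, and all variable names are hypothetical.

```python
# Vectors of length 6, one per type (Python lists standing in for R vectors).
numeric_vec = [0.0] * 6
complex_vec = [0j] * 6
logical_vec = [False] * 6
character_vec = [""] * 6

# Element-wise addition of two integer vectors of length 3.
a = [1, 2, 3]
b = [4, 5, 6]
vec_sum = [x + y for x, y in zip(a, b)]

# A list holding a vector, a matrix (nested list), and a list.
items = [[1, 2, 3], [[1, 2], [3, 4]], ["a", "b"]]

# Remove the second element (index 1), building a new list.
without_second = items[:1] + items[2:]

# Update the last element, again building a new list.
updated_last = items[:-1] + [["x", "y", "z"]]
```

With NumPy (covered in Unit 2), arrays with an explicit dtype would be a closer match to typed R vectors.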
S No 2 (Hours: 3)
Write programs in both Python and R to solve the following tasks.
• Read numbers from a file, and print them out in sorted order.
• Read a text file, and count the total number of words.
• Read a text file, and count the total number of distinct words.
• Read a file of numbers, and plot a frequency histogram of them.
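
A possible Python starting point for the file-handling tasks is sketched below. It assumes whitespace-separated input, treats words case-insensitively, and uses throwaway temporary files so that it runs on its own; the function and file names are hypothetical. Plotting is left out, and only the frequency counts behind a histogram are computed.

```python
import os
import tempfile
from collections import Counter


def read_sorted_numbers(path):
    """Return the whitespace-separated numbers in a file, sorted ascending."""
    with open(path) as f:
        return sorted(float(token) for token in f.read().split())


def word_counts(path):
    """Return (total words, distinct words), splitting on whitespace."""
    with open(path) as f:
        words = f.read().lower().split()
    return len(words), len(set(words))


def frequency_table(numbers):
    """Count how often each value occurs (the data behind a histogram)."""
    return Counter(numbers)


# Demo with throwaway files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    num_path = os.path.join(tmp, "numbers.txt")
    txt_path = os.path.join(tmp, "text.txt")
    with open(num_path, "w") as f:
        f.write("3 1 2 3\n")
    with open(txt_path, "w") as f:
        f.write("the cat saw the dog\n")

    numbers = read_sorted_numbers(num_path)
    total, distinct = word_counts(txt_path)
    freq = frequency_table(numbers)
```

The counts in `freq` could then be passed to a plotting library such as Matplotlib to draw the histogram.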
S No 3: Statistical Data Analysis (Hours: 3)
Write a program to solve linear regression for a given data set.
$Y = aX + b$
where
$a = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$
$b = \dfrac{\sum Y - a\sum X}{n}$
Here
Y: response variable
X: predictor variable
a, b: regression coefficients
Read data set

X     Y
-2    -1
1     1
3     2

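
The closed-form coefficients for the line Y = aX + b can be computed directly from the sums. The sketch below applies them to the three data points (-2, -1), (1, 1), (3, 2); the function name `fit_line` is hypothetical, and the denominator is taken as nΣX² minus the square of ΣX.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # a = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b = (Sy - a*Sx) / n
    b = (sum_y - a * sum_x) / n
    return a, b


a, b = fit_line([-2, 1, 3], [-1, 1, 2])
```

For this data set the sums are ΣX = 2, ΣY = 2, ΣXY = 9, and ΣX² = 14, so a = 23/38 (about 0.605) and b = 5/19 (about 0.263).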

S No 4: Statistical Data Analysis (Hours: 3)
Solve the linear regression for the given data set, and predict sales in the year 2012.

Year    Sales
2005    12
2006    19
2007    29
2008    37
2009    45

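
One way to approach the sales prediction is sketched below. It uses the mean-centered form of the least-squares slope, which is algebraically equivalent to the sum-based formula but keeps the intermediate numbers small when the predictor is a calendar year. The function name is hypothetical.

```python
years = [2005, 2006, 2007, 2008, 2009]
sales = [12, 19, 29, 37, 45]


def fit_line_centered(xs, ys):
    """Least-squares slope and intercept using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line_centered(years, sales)
predicted_2012 = slope * 2012 + intercept
```

Here the slope works out to 8.4 sales units per year, giving a predicted value of about 70.4 for 2012.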

S No 5: Statistical Data Analysis (Hours: 2)
Compute logistic regression for the Organization dataset.
Response variable
Y = Compensation in rupees
Predictor variables
X1 = Experience in years
X2 = Education in years (after 10th standard)
X3 = Number of Employees Supervised
X4 = Number of Projects Handled

S No  Compensation  Experience  Education  Number supervised  Projects
1     1500          2           5          4                  10
2     1650          3           6          5                  10
3     1750          3           3          5                  12
4     1400          2           3          3                  9
5     2000          4           4          6                  15
6     2200          5           6          6                  14
7     2100          1           5          4                  12
8     2750          5           8          7                  15
9     2900          8           9          8                  25
10    1100          3           3          2                  7
11    1000          4           2          1                  5
12    1350          6           4          4                  12
13    1550          4           6          4                  11

Here you will get an error, because the Y values must lie between 0 and 1. Modify the Y values accordingly.
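
The syllabus does not say how the Y values should be modified. As one possible illustration only, the sketch below rescales compensation into the range 0 to 1 with min-max scaling; this choice is an assumption, and thresholding Y into two classes (0 and 1) would be another plausible reading. The function name is hypothetical.

```python
# Compensation column from the Organization dataset (13 rows).
compensation = [1500, 1650, 1750, 1400, 2000, 2200, 2100,
                2750, 2900, 1100, 1000, 1350, 1550]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled_y = min_max_scale(compensation)
```

The rescaled values could then be used as the response when fitting the logistic model.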
S No 6: Statistical Data Analysis (Hours: 2)
• In an entrance examination, there are twenty multiple-choice questions. Each question has four options, and only one of them is correct. Find the probability of getting seven or fewer correct answers if a student attempts to answer every question at random.
• Assume that the test scores of an entrance exam fit a normal distribution where the mean test score is 67 and the standard deviation is 13.7. Calculate the percentage of students scoring 80 or more in the exam.
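
Both probability questions can be checked with the Python standard library. The sketch below assumes each of the 20 answers is an independent guess with success probability 1/4, and uses `statistics.NormalDist` for the normal tail; the helper name `binomial_cdf` is hypothetical.

```python
from math import comb
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


# Twenty questions, four options each, answered at random.
p_seven_or_fewer = binomial_cdf(7, 20, 0.25)

# Scores ~ Normal(mean=67, sd=13.7); upper tail at 80.
p_80_or_more = 1 - NormalDist(mu=67, sigma=13.7).cdf(80)
```

The first probability comes out at roughly 0.898, and the second at roughly 0.171, that is, about 17.1% of students.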
Mid-Semester Lab Examination
S No 9: Data Visualization (Hours: 3)
Construct a revealing visualization of some aspect of your favorite data set, using:
• A well-designed table.
• A dot and/or line plot.
• A scatter plot.
• A heatmap.
• A bar plot or pie chart.
• A histogram.
• A data map.
S No 10: Data Visualization (Hours: 3)
Create ten different versions of line charts for a particular set of (x, y) points. Which ones are best and which ones worst? Explain why.
S No 11: Data Visualization (Hours: 3)
Construct scatter plots for sets of 10, 100, 1000, and 10,000 points. Experiment with the point size to find the most revealing value for each data set.
S No 12: Data Visualization (Hours: 3)
Experiment with different color scales to construct scatter plots for a particular set of (x, y, z) points, where color is used to represent the z dimension. Which color schemes work best? Which are the worst? Explain why.
End-Semester Lab Examination
Total Lab hours 28
