Datascience
Course Code
Course Category Professional Core
Course Title Data Science for Engineers
Teaching Scheme and Credits:
                   L    T    Laboratory    Credits
Weekly load hrs    2    -    2             2+0+1=3
Pre-requisites:
● Linux Based Python Laboratory
Course Objectives:
1. Knowledge (i) To know the fundamentals of data science and apply Python concepts for data
analysis
2. Attitude (i) To identify machine learning algorithms to solve real-world problems
Course Outcomes:
After completion of the course, the students will be able to:
1. Understand the fundamentals of data science and Python concepts for data analysis
2. Apply statistical concepts to solve real life problems
3. Apply appropriate machine learning algorithms to solve real world problems
4. Apply visualization tools and techniques to find insights from real world data
Course Contents:
1. Introduction to Data Science
2. Statistics for Data Science
3. Machine Learning
4. Data Visualization
Reference Books:
1. Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravindran Kannan
2. Ward, Grinstein, and Keim, Interactive Data Visualization: Foundations, Techniques, and
Applications. Natick: A K Peters, Ltd.
3. Glenn J. Myatt, Making Sense of Data: A Practical Guide to Exploratory Data
Analysis and Data Mining, John Wiley Publishers, 2007.
Supplementary Reading:
https://swayam.gov.in/nd1_noc19_cs60/preview
Web Resources:
https://nptel.ac.in/courses/106/106/106106179/
Weblinks:
https://www.youtube.com/watch?v=MiiANxRHSv4
https://www.youtube.com/watch?v=y8Etr3Tx6yE&list=PLyqSpQzTE6M_JcleDbrVyPnE0PixKs2JE&index=5
MOOCs:
https://intellipaat.com/data-scientist-course-training/
Pedagogy:
● PowerPoint Presentation
● Flipped Classroom Activity
● Project based Learning
● Jupyter notebook for coding
Assessment Scheme:
Class Continuous Assessment: 50 Marks (50% of the Total Marks)
Assignments Mid Test MCQ/Poster Presentation (Research Statement)/Active Learning
30 Marks 20 Marks
Theory Syllabus:
Module No. | Contents | Workload in Hrs (Theory, Lab)
Module 1 (Theory: 07 Hrs)
Introduction to Data Science:
Data Science Fundamentals: Types of Data, Data Quality, Data Science Life Cycle, Applications, Types of datasets.
Python for Data Science: Pandas and NumPy, Matplotlib for data analysis.
Data Pre-processing: Missing data handling, Data scaling and normalization, Feature extraction.
Module 2 (Theory: 08 Hrs)
Statistics for Data Science:
Basic Statistics: Descriptive Statistics.
Measures of Central Tendency: Mean, Median, Mode.
Measures of Dispersion: Range, Variance, Standard Deviation.
Measures of Position: Quartiles, Percentile, Z-score.
Data transformation.
Measures of Relationship: Covariance, Correlation.
Basic Probability and Distribution, Hypothesis testing, Applying statistical concepts in Python.
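Most of the measures listed in this module are available in Python's standard `statistics` module. The following is a minimal sketch on two short made-up samples (`hours` and `scores` are hypothetical names and values). Covariance and correlation are computed by hand here so that the formulas stay visible.

```python
import statistics as st

# Made-up samples for illustration only.
hours = [2, 4, 4, 5, 7, 8]
scores = [50, 60, 62, 65, 80, 85]

# Measures of central tendency.
mean_hours = st.mean(hours)
median_hours = st.median(hours)
mode_hours = st.mode(hours)

# Measures of dispersion (population versions).
value_range = max(hours) - min(hours)
variance = st.pvariance(hours)
std_dev = st.pstdev(hours)

# Measures of position: quartile cut points and z-scores.
q1, q2, q3 = st.quantiles(hours, n=4)
z_scores = [(h - mean_hours) / std_dev for h in hours]

# Measures of relationship: population covariance and Pearson correlation.
mean_scores = st.mean(scores)
covariance = sum(
    (h - mean_hours) * (s - mean_scores) for h, s in zip(hours, scores)
) / len(hours)
correlation = covariance / (std_dev * st.pstdev(scores))

print(mean_hours, median_hours, mode_hours, value_range)
print(round(variance, 3), round(std_dev, 3), (q1, q2, q3))
print(round(covariance, 3), round(correlation, 3))
```

With real coursework data the same quantities would more likely come from NumPy or pandas, which the syllabus covers in Module 1.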
Module 3 (Theory: 08 Hrs)
Machine Learning:
Introduction to machine learning, Supervised and Unsupervised Learning, Splitting datasets: Training and Testing.
Regression: Simple Linear Regression.
Classification: Naïve Bayes classifier.
Clustering: K-means.
Evaluating model performance, Python libraries for machine learning.
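The supervised workflow named in this module (split, fit, evaluate) can be illustrated without any library. The sketch below fits simple linear regression by ordinary least squares and reports the mean squared error on a held-out test split. The data points, the 75/25 split, and the helper names `fit_line` and `mse` are illustrative assumptions, not part of the course material.

```python
# Made-up (x, y) pairs that roughly follow y = 2x + 1.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0),
        (5, 10.8), (6, 13.1), (7, 15.0), (8, 16.9)]

# Split into training and testing sets (first 75% train, rest test).
# Real data would normally be shuffled before splitting.
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]


def fit_line(points):
    """Return (slope, intercept) estimated by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(points, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


slope, intercept = fit_line(train)
test_error = mse(test, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, test MSE={test_error:.4f}")
```

Evaluating on points the model never saw during fitting is the reason for the split: a low training error alone says little about how the line generalizes.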
Module 4 (Theory: 07 Hrs)
Data Visualization:
Introduction to data visualization, challenges.
Types of Data visualization: Bar charts, Scatter plots, Histogram, Box Plots, Heatmap.
Data Visualization using Python: matplotlib, seaborn.
Data Visualization tool: Tableau.
Laboratory Assignments:
No. 1 (Lab: 02 Hrs)
Attempt any 3
i. Write a Python program to create a dictionary which contains
student’s names and marks. Iterate over the dictionary and
apply the conditions below to print their grades:
a. Marks greater than or equal to 70 – Distinction
b. Marks between 60-69 – First Class
c. Marks between 50-59 – Second Class
d. Marks between 40-49 – Pass
e. Marks less than 40 – Fail
ii. Write a Python program to create a 1D array of numbers from
0 to 9.
iii. Write a NumPy program to create an array of all the even
integers from 30 to 70.
iv. Write a NumPy program to create a 3x4 matrix filled with
values from 10 to 21.
v. Write a NumPy program to compute the sum of all elements,
sum of each column and sum of each row of a given array.
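As a starting point for assignment i, the grade bands can be expressed as a small function applied while iterating over the dictionary. This is a minimal sketch, not a model answer: the student names and marks are made-up sample data, and `grade_for` is a hypothetical helper name.

```python
# Sample data: names and marks are invented for illustration only.
marks = {"Asha": 82, "Ravi": 65, "Meera": 54, "John": 41, "Kiran": 33}


def grade_for(mark):
    """Map a mark to the grade band listed in assignment i."""
    if mark >= 70:
        return "Distinction"
    if mark >= 60:
        return "First Class"
    if mark >= 50:
        return "Second Class"
    if mark >= 40:
        return "Pass"
    return "Fail"


# Iterate over the dictionary and print each student's grade.
for name, mark in marks.items():
    print(f"{name}: {mark} -> {grade_for(mark)}")
```

Checking the bands from highest to lowest means each condition only needs a lower bound, which avoids gaps between ranges such as 69 and 70.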
No. 2 (Lab: 04 Hrs)
Attempt any 3
i. Write a Python program to output a 3-by-3 array of random
numbers following a normal distribution.
Stack these arrays vertically:
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
ii. Get the common items between two NumPy arrays:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])
iii. Create a series from a list, a NumPy array, and a dictionary.
Combine many series to make a data frame.
iv. Create a normalized form of iris's sepallength whose values
range exactly between 0 and 1, so that the minimum has value
0 and the maximum has value 1.
Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Hint: Apply the min-max scaling formula.
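Items i, ii, and iv above map onto a few NumPy calls. The sketch below uses `np.vstack` for stacking, `np.intersect1d` for common items, and the min-max formula (v - min) / (max - min) for normalization. To stay runnable offline it normalizes a short made-up array named `sample` in place of the iris `sepallength` column. The seed value and the names `x` and `y` are likewise arbitrary choices.

```python
import numpy as np

# i. 3-by-3 array drawn from the standard normal distribution.
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
normal_3x3 = rng.normal(loc=0.0, scale=1.0, size=(3, 3))

# Stack the two arrays from the assignment vertically.
a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)
stacked = np.vstack([a, b])  # shape (4, 5)

# ii. Common items between two arrays (returned sorted and unique).
x = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
y = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
common = np.intersect1d(x, y)

# iv. Min-max normalization: (v - min) / (max - min).
# `sample` is made-up data standing in for the iris sepallength column.
sample = np.array([5.1, 4.9, 6.3, 7.0, 5.8])
normalized = (sample - sample.min()) / (sample.max() - sample.min())

print(stacked)
print(common)
print(normalized)
```

The same normalization line applies unchanged to the `sepallength` array once it has been loaded with `np.genfromtxt` as given in the assignment.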
No. 3 (Lab: 04 Hrs)
Load Data and perform Data Pre-processing.
Input: df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
i. Read a CSV file to create a data frame and print the top records.
ii. Check if there are any missing values in the data.
iii. Drop null values / impute the missing values with mean /
median.
iv. Import the ‘crim’ and ‘medv’ columns of the BostonHousing
dataset as a dataframe and get the number of rows, number of columns,
datatype, and summary statistics of each column of the dataframe.
v. Which manufacturer, model and type has the highest Price?
vi. Create one-hot encodings of a categorical variable.
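The pre-processing steps above follow a common pandas pattern: inspect, count missing values, impute, then encode. The sketch below runs on a tiny made-up frame so that it works offline. The column names only loosely mimic the Cars93 data and are assumptions, not the real schema.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Cars93 data; column names are illustrative assumptions.
df = pd.DataFrame(
    {
        "Manufacturer": ["Acura", "Audi", "BMW", "Buick"],
        "Type": ["Small", "Compact", "Midsize", "Large"],
        "Price": [15.9, np.nan, 30.0, 20.8],
    }
)

# i. Print the top records.
print(df.head())

# ii. Count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# iii. Impute missing prices with the column mean (median works the same way).
df["Price"] = df["Price"].fillna(df["Price"].mean())

# v. Row with the highest price.
top_row = df.loc[df["Price"].idxmax(), ["Manufacturer", "Type", "Price"]]
print(top_row)

# vi. One-hot encode the categorical `Type` column.
type_dummies = pd.get_dummies(df["Type"], prefix="Type")
print(type_dummies)
```

Replacing the hand-built frame with the `pd.read_csv` call from the assignment input should leave the remaining steps unchanged, provided the column names are adjusted to match the real file.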
Prepared by: Prof. Shilpa Sonawani
Approved By: Dr. Vrushali Kulkarni, HoS, SCET