U23AD492 - Data Science Syllabus
1. Course Description
The course aims to provide students with a comprehensive understanding of data science, covering
key concepts, methodologies, and tools essential for data analysis, interpretation, and decision-making.
Students will learn to collect, preprocess, and analyze data from various sources using
statistical techniques and machine learning algorithms, and will gain practical experience in
applying data science methods to real-world problems. By the end of the course, students will be
equipped with the knowledge and proficiency needed to extract valuable insights from data, make
informed decisions, and contribute effectively to the rapidly evolving field of data science.
2. Course Objectives:
3. Syllabus
Data science: definition, scope, importance of data-driven decision making, interdisciplinary nature
of data science, stages of the data science life cycle; overview of data science tools and techniques,
applications of data science; Data acquisition: sources of data, data collection through APIs, web
scraping (extracting data from websites), accessing different sources of data.
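To illustrate data collection through an API, here is a minimal Python sketch that uses only the standard library. The endpoint URL, the `results` key, and the field names are hypothetical; every real API defines its own response layout. A sample payload stands in for a live response, so the example runs offline.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 10.0):
    """Download a JSON document from an HTTP API endpoint and decode it."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def to_rows(payload: dict, records_key: str, fields: list) -> list:
    """Flatten the records under `records_key` into rows with a fixed set of
    fields; a field missing from a record becomes None."""
    return [{f: rec.get(f) for f in fields} for rec in payload.get(records_key, [])]


# Hypothetical response body standing in for
# fetch_json("https://api.example.com/weather") (an assumed, not real, endpoint).
sample = json.loads(
    '{"results": [{"city": "Chennai", "temp_c": 31.5, "humidity": 70},'
    ' {"city": "Delhi", "temp_c": 24.0}]}'
)
rows = to_rows(sample, "results", ["city", "temp_c", "humidity"])
print(rows)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame` for further analysis.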
Role of Statistics in Data Science; Population vs. Sample; Descriptive vs. Inferential statistics;
Probability distributions: Poisson, Normal, Binomial, Uniform; Bayes' theorem and conditional
probability; Descriptive statistics: measures of central tendency (mean, median, mode), measures of
dispersion (variance, standard deviation); Inferential statistics: hypothesis testing (null and
alternative hypotheses, p-values), confidence intervals, ANOVA, chi-square test, t-test; Correlation
and Covariance.
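Several of the statistics above can be computed directly with Python's standard `statistics` module. The sketch below, on made-up sample values, shows the descriptive measures, sample covariance and Pearson correlation, Welch's two-sample t statistic, and Bayes' theorem for a diagnostic-test scenario whose rates are assumed for illustration. It computes only the t statistic; obtaining a p-value needs the t distribution, which is available, for example, through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics as st

# Hypothetical sample used only for illustration.
data = [12, 15, 15, 18, 20, 22, 25]

# Descriptive statistics: central tendency and dispersion.
mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)
sample_var = st.variance(data)  # divides by n - 1
sample_sd = st.stdev(data)


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = st.mean(x), st.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (st.stdev(x) * st.stdev(y))


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))
    return (st.mean(a) - st.mean(b)) / se


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence


print(mean, median, mode, sample_var, sample_sd)
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 8]))
print(welch_t([5.1, 4.9, 5.3, 5.0], [5.8, 6.1, 5.9, 6.2]))
# Assumed test: 1% prevalence, 90% sensitivity, 5% false-positive rate.
print(posterior(0.01, 0.9, 0.05))
```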
Tableau: Introduction, Overview of Tableau interface and workspace; Features and advantages,
connecting to data sources, importing data from local files and cloud storage services, creating
basic visualizations in Tableau: Bar charts, line charts, scatter plots, pie charts, histograms,
heatmaps; advanced visualization techniques in Tableau: Treemaps, bubble charts, box plots, dual-axis
charts, combination charts, adding filters and parameters, building interactive dashboards in
Tableau.
Power BI: Overview, connecting to data sources in Power BI, importing data from local files,
databases, and web sources; creating basic visualizations in Power BI: Bar charts, line charts,
scatter plots, pie charts, histograms, heatmaps; advanced visualization techniques in Power BI:
Treemaps, bubble charts, box plots, dual-axis charts, combination charts, building interactive
dashboards in Power BI.
Data analytics: descriptive analytics, diagnostic analytics, predictive analytics, prescriptive analytics;
Data pre-processing: handling missing values (imputation techniques), dealing with outliers;
Exploratory Data Analysis (EDA); Feature Engineering: One-hot encoding, label encoding, creating
new features, dimensionality reduction techniques.
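A minimal sketch of the pre-processing and encoding steps listed above, in plain Python. Median imputation and Tukey's 1.5 × IQR fences are assumed here as one common choice each, not the only options; in pandas, `DataFrame.fillna` and `pandas.get_dummies` cover the same ground. The sample values are made up.

```python
import statistics as st


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    fill = st.median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = st.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def label_encode(categories):
    """Map each distinct category to an integer code (alphabetical order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot(categories):
    """Create one 0/1 indicator column per distinct category."""
    levels = sorted(set(categories))
    return [[int(c == level) for level in levels] for c in categories], levels


ages = [23, None, 31, 27, None, 45]
print(impute_median(ages))
print(iqr_outliers([10, 11, 12, 13, 14, 15, 100]))
print(label_encode(["red", "green", "red", "blue"]))
print(one_hot(["red", "green", "blue"]))
```

Label encoding imposes an arbitrary numeric order on the categories, which is why one-hot encoding is usually preferred for nominal features fed to linear models.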
Microsoft Excel for data analysis: Introduction to Excel for basic data manipulation and analysis,
data cleaning and formatting techniques in Excel, creating charts and graphs, pivot tables and pivot
charts for summarizing and analyzing data, advanced Excel features for statistical analysis; Python
packages for data science: NumPy for statistical analysis, data manipulation with Pandas data
frames, data visualization using the Matplotlib and Seaborn libraries.
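A short sketch of how NumPy and Pandas fit together for the kind of summary an Excel pivot table produces. The sales records and column names are hypothetical and serve only to show vectorized column arithmetic, `DataFrame.pivot_table`, and a NumPy summary statistic.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, used only for illustration.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 4, 7, 3, 6],
    "price": [2.5, 10.0, 2.5, 2.5, 10.0],
})

# Vectorized column arithmetic: no explicit Python loop.
df["revenue"] = df["units"] * df["price"]

# Pivot-table summary, the pandas counterpart of an Excel pivot table.
pivot = df.pivot_table(index="region", columns="product", values="revenue",
                       aggfunc="sum", fill_value=0)

# NumPy statistics on the underlying array.
revenue = df["revenue"].to_numpy()
print(pivot)
print("mean:", np.mean(revenue), "sample std:", np.std(revenue, ddof=1))
```

With Matplotlib installed, `pivot.plot(kind="bar")` draws a grouped bar chart of the same summary; Seaborn offers higher-level statistical plots built on Matplotlib.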
Total Hours (Theory): 30 Hours
LIST OF EXPERIMENTS:
1. Web Scraping
Use Case: Perform web scraping and create a DataFrame from data collected from a suitable
resource.
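A minimal web-scraping sketch using Python's standard-library `html.parser`. The HTML snippet is a hypothetical stand-in for a downloaded page (for a live page, the text could come from `urllib.request.urlopen(url).read().decode()`), and the column names are assumptions. Libraries such as requests and Beautiful Soup are common alternatives for the same task.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # completed rows, each a list of cell strings
        self._row = None       # row currently being read, or None
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page content standing in for a downloaded HTML document.
html = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Book A</td><td>2018</td></tr>
  <tr><td>Book B</td><td>2020</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Calling `pandas.DataFrame(records)` then turns the scraped rows into a DataFrame, as the experiment requires.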
2. Exploratory Data Analysis: Perform Data Preprocessing & Data Wrangling on Netflix
International Dataset
3. Exploratory Data Analysis: Perform EDA on Netflix International Dataset.
4. Fraud Detection in Financial Transactions
Use Case: A banking institution aims to detect fraudulent transactions by analyzing historical
transaction data.
Experiment: Explore the dataset to identify patterns and anomalies indicative of fraudulent
behavior. Develop new features such as transaction frequency, transaction amount, and
geographical location. Apply anomaly detection techniques to flag suspicious transactions for
further investigation.
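One simple way to sketch this experiment, assuming a toy transaction log with hypothetical fields (account, day, amount, country): derive per-account features and flag transactions that deviate from the account's own history. The amount check uses the modified z-score of Iglewicz and Hoaglin (median and MAD, cutoff 3.5), a robust alternative to the ordinary z-score on small samples. The thresholds are assumptions, and model-based detectors such as scikit-learn's `IsolationForest` are a common next step.

```python
import statistics as st
from collections import Counter, defaultdict

# Hypothetical transactions: (account_id, day, amount, country).
transactions = [
    ("acc1", 1, 20.0, "IN"), ("acc1", 1, 25.0, "IN"), ("acc1", 2, 22.0, "IN"),
    ("acc1", 3, 30.0, "IN"), ("acc1", 4, 24.0, "IN"), ("acc1", 4, 900.0, "SG"),
    ("acc2", 1, 500.0, "IN"), ("acc2", 2, 520.0, "IN"), ("acc2", 3, 480.0, "IN"),
]


def robust_z(values):
    """Modified z-scores 0.6745 * (x - median) / MAD; all zeros if MAD is 0."""
    med = st.median(values)
    mad = st.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_suspicious(txns, z_threshold=3.5, max_per_day=3):
    """Flag transactions whose amount, location, or daily frequency is unusual
    relative to the same account's history."""
    by_account = defaultdict(list)
    for t in txns:
        by_account[t[0]].append(t)

    flagged = []
    for rows in by_account.values():
        home = Counter(r[3] for r in rows).most_common(1)[0][0]  # most frequent country
        per_day = Counter(r[1] for r in rows)                    # transaction frequency
        amounts = [r[2] for r in rows]
        for r, z in zip(rows, robust_z(amounts)):
            reasons = []
            if abs(z) > z_threshold:
                reasons.append("unusual amount")
            if r[3] != home:
                reasons.append("unusual location")
            if per_day[r[1]] > max_per_day:
                reasons.append("high daily frequency")
            if reasons:
                flagged.append((r, reasons))
    return flagged


for txn, reasons in flag_suspicious(transactions):
    print(txn, reasons)
```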
5. Predictive Maintenance for Industrial Equipment
Use Case: A manufacturing plant wants to implement predictive maintenance strategies to minimize
downtime and optimize equipment performance.
Experiment: Explore sensor data collected from industrial equipment to identify patterns associated
with equipment failures. Engineer features such as equipment usage, temperature, and vibration
levels. Train machine learning models to predict equipment failures before they occur based on
historical sensor data.
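The modelling step can be illustrated with a logistic regression trained by batch gradient descent, written in plain Python so every step is visible. The readings, the feature choice (temperature and vibration), the learning rate, and the epoch count are all assumptions for illustration; in practice a library model such as scikit-learn's `LogisticRegression` or `RandomForestClassifier`, evaluated on held-out data, would be used.

```python
import math

# Hypothetical sensor log: (temperature_C, vibration_mm_s, failed_within_24h).
readings = [
    (60, 2.0, 0), (62, 2.2, 0), (65, 2.5, 0), (63, 1.9, 0), (61, 2.1, 0),
    (80, 6.5, 1), (85, 7.0, 1), (78, 6.0, 1), (90, 7.5, 1), (82, 6.8, 1),
]


def standardize(column):
    """Return z-scored values plus the mean and (population) std used."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std for x in column], mean, std


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Scale each sensor channel, then train on the scaled features.
temps, t_mean, t_std = standardize([r[0] for r in readings])
vibs, v_mean, v_std = standardize([r[1] for r in readings])
w, b = train_logistic(list(zip(temps, vibs)), [r[2] for r in readings])


def failure_probability(temp_c, vibration):
    """Predicted probability of failure for a new reading."""
    zt = (temp_c - t_mean) / t_std
    zv = (vibration - v_mean) / v_std
    return sigmoid(w[0] * zt + w[1] * zv + b)


print(round(failure_probability(88, 7.2), 3), round(failure_probability(61, 2.0), 3))
```

Features are standardized before training so that both sensors contribute on a comparable scale, and new readings are transformed with the same mean and standard deviation.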
6. Market Segmentation Analysis - Tableau
Use Case: A beverage company is planning to launch a new health drink targeted towards health-conscious
consumers. However, they recognize that the health-conscious market is diverse, with
varying preferences and needs. To ensure the success of their product, they decide to conduct a
market segmentation analysis.
7. COVID-19 Trends - Power BI
Use Case: During the COVID-19 pandemic, public health authorities and policymakers need
accurate and timely information to respond effectively to the evolving situation. Market
segmentation analysis can be a valuable tool to understand how different population segments are
affected by the virus, which can inform targeted interventions and resource allocation.
8. Exploring COVID-19 Data Trends
Use Case: Health authorities want to visualize and analyze trends in COVID-19 cases to inform
public health policies.
Experiment: Collect COVID-19 data from reliable sources such as government health departments.
Use data visualization tools to create interactive dashboards displaying trends in case counts,
testing rates, and vaccination coverage. Analyze the data to identify hotspots and patterns over
time.
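Two common trend measures behind such dashboards are the 7-day rolling average of new cases and the week-over-week growth ratio. The sketch below computes both and flags a region as a hotspot when its weekly total grows by at least 1.5 times; the region names, counts, and threshold are hypothetical assumptions.

```python
def rolling_mean(values, window=7):
    """Trailing moving average; None until a full window is available."""
    return [
        sum(values[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


def week_over_week(values):
    """Ratio of the most recent 7 days' total to the 7 days before them."""
    last, prev = sum(values[-7:]), sum(values[-14:-7])
    return last / prev if prev else float("inf")


def hotspots(daily_by_region, threshold=1.5):
    """Regions whose weekly case total grew by at least `threshold` times."""
    return sorted(
        region for region, daily in daily_by_region.items()
        if len(daily) >= 14 and week_over_week(daily) >= threshold
    )


# Hypothetical daily new-case counts for two regions over 14 days.
daily_cases = {
    "Region A": [10] * 7 + [20] * 7,
    "Region B": [30] * 7 + [28] * 7,
}
print(rolling_mean(daily_cases["Region A"]))
print(hotspots(daily_cases))
```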
9. Visualizing Stock Market Volatility
Use Case: Financial analysts want to visualize and analyze stock market volatility to make
informed investment decisions.
Experiment: Gather historical stock market data from financial databases. Use data visualization
techniques to create candlestick charts and volatility plots showing price fluctuations and trading
volumes. Apply technical analysis indicators such as moving averages and Bollinger Bands to
identify potential trading opportunities.
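The indicators named above can be sketched in a few lines of Python. Bollinger Bands are computed here as a simple moving average plus and minus two standard deviations over the same window (the population standard deviation, a common convention), and volatility as the annualized standard deviation of daily log returns, assuming 252 trading days per year. The prices are made up; a charting library would then plot the bands alongside candlesticks.

```python
import math
import statistics as st


def bollinger_bands(closes, window=20, k=2.0):
    """Simple moving average (middle band) plus/minus k population standard
    deviations over the same window; None until a full window exists."""
    middle, upper, lower = [], [], []
    for i in range(len(closes)):
        if i < window - 1:
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = closes[i - window + 1 : i + 1]
        m, s = st.fmean(chunk), st.pstdev(chunk)
        middle.append(m)
        upper.append(m + k * s)
        lower.append(m - k * s)
    return middle, upper, lower


def annualized_volatility(closes, periods_per_year=252):
    """Sample standard deviation of daily log returns, scaled by sqrt(periods)."""
    returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return st.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical closing prices, used only for illustration.
closes = [100, 102, 101, 105, 107, 106, 110, 108, 111, 115]
mid, up, low = bollinger_bands(closes, window=5)
print(mid[-1], up[-1], low[-1])
print(annualized_volatility(closes))
```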
10. Sales Performance Analysis
Use Case: Analyze sales data to identify top-performing products and regions for strategic decision-
making.
Experiment: Analyze sales data using Microsoft Excel to uncover insights into sales performance
and trends. Utilize Excel's data manipulation, visualization, and analysis tools to examine total
sales revenue, product performance, regional sales distribution, and sales trends over time.
11. Mini-Project
Total Hours (Lab + Project): 30 + 30 = 60 Hours
Total Hours: 30 + 30 + 30 = 90 Hours
Text Books:
1. Avrim Blum, John Hopcroft, and Ravindran Kannan, “Foundations of Data Science”,
Cambridge University Press, 2020.
2. Suresh Kumar Mukhiya, Usman Ahmed, “Hands-On Exploratory Data Analysis with
Python”, Packt Publishing, 2020.
3. Cathy O’Neil, Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”,
O’Reilly, 2013.
4. Chandraish Sinha, “Tableau 10 for Beginners: Step by Step Guide to Developing
Visualizations in Tableau 10”, CreateSpace Independent Publishing Platform, 2017.
References:
Reference Books:
1. Dean J, “Big Data, Data Mining and Machine Learning”, Wiley, 2014.
2. Provost F and Fawcett T, “Data Science for Business”, O’Reilly Media Inc., 2013.
3. https://onlinecourses.nptel.ac.in/noc21_cs69/
4. https://pll.harvard.edu/course/data-science-visualization
Journals (Reference):
1. https://jds-online.org/journal/JDS
2. https://link.springer.com/journal/41060
3. https://epjdatascience.springeropen.com/
Video references:
1. https://www.youtube.com/watch?v=-ETQ97mXXF0
2. https://www.youtube.com/watch?v=dcXqhMqhZUo&t=2s
NPTEL Courses:
1. https://onlinecourses.nptel.ac.in/noc21_cs69/preview
2. https://onlinecourses.nptel.ac.in/noc22_cs32/preview