Foundations of Data Science.docx

The document outlines a course on the Foundations of Data Science, detailing objectives, units of study, practical exercises, software requirements, and course outcomes. Key topics include data analysis concepts, statistical methods, Python tools like NumPy and Pandas, data visualization techniques, and recent trends in data science applications. The course aims to equip students with essential skills for data inspection, cleansing, and interpretation using various data science tools.

Uploaded by

vjay2003
Copyright
© All Rights Reserved

FOUNDATIONS OF DATA SCIENCE    L T P C

                               3 0 2 4

COURSE OBJECTIVES:

• To understand the basic concepts of data analysis
• To acquire skills in data preparation and preprocessing
• To develop the mathematical skills used in statistics
• To learn the tools and packages in Python for data science
• To acquire knowledge of data interpretation and visualization techniques
• To gain knowledge of recent trends in data science

UNIT I INTRODUCTION 8

Applications: Search engines, Image recognition

Need for data science – benefits and uses – facets of data – data science process – setting the research
goal – retrieving data – cleansing, integrating, and transforming data – exploratory data analysis – building
the models – presenting findings and building applications

UNIT II DESCRIBING DATA 9

Applications: Speech recognition, Recommendation systems

Frequency distributions – outliers – relative frequency distributions – cumulative frequency distributions –
frequency distributions for nominal data – interpreting distributions – graphs – averages – normal
distributions – z scores – normal curve problems – finding proportions – finding scores – more about z scores –
interpretation of r² – multiple regression equations – regression toward the mean – statistical metrics with
Python.
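The z score and normal curve topics listed for Unit II could be practised with a short script like the one below. It is only a sketch using Python's standard `statistics` module; the scores are hypothetical values invented for illustration, and the choice of the population standard deviation is an assumption rather than something the syllabus specifies.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical exam scores, invented purely for illustration.
scores = [52, 61, 67, 70, 73, 75, 78, 82, 88, 94]

mu = mean(scores)
sigma = pstdev(scores)  # population standard deviation (an assumption)

# z score: distance of each score from the mean, in standard deviation units.
z_scores = [(x - mu) / sigma for x in scores]

# Finding a proportion: share of a standard normal curve lying below z = 1.0.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
proportion_below = standard_normal.cdf(1.0)

# Finding a score: the raw score at the 90th percentile of a normal
# distribution with the same mean and standard deviation as the sample.
score_at_p90 = NormalDist(mu, sigma).inv_cdf(0.90)

print(f"mean={mu}, sd={sigma:.2f}")
print(f"proportion below z=1: {proportion_below:.4f}")
print(f"score at 90th percentile: {score_at_p90:.1f}")
```

Because the z scores are computed with the population standard deviation, they average to zero and have unit variance, which is a quick sanity check when working through normal curve problems.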
UNIT III INTRODUCTION TO NUMPY 8

Applications: Machine Learning, Scientific Computing

Data types in Python – basics of NumPy arrays – computations on NumPy arrays – universal functions –
aggregations: min, max, and everything in between – computation on arrays: broadcasting – comparisons,
masks, and Boolean logic – fancy indexing – sorting values in NumPy arrays – fast sorting – sorting along
rows or columns – partial sorts – k-nearest neighbors – NumPy's structured arrays.
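Several of the Unit III topics (aggregations, broadcasting, Boolean masks, fancy indexing, sorting, and partial sorts) fit in one small example. The sketch below assumes NumPy is installed and uses a hypothetical 3×3 array invented for illustration; it is one possible walk-through, not the course's prescribed exercise.

```python
import numpy as np

# Hypothetical values, invented for illustration.
data = np.array([[3, 7, 1],
                 [9, 2, 8],
                 [4, 6, 5]])

# Aggregation along an axis: column means (axis=0 collapses the rows).
col_means = data.mean(axis=0)

# Broadcasting: the length-3 vector of means is stretched across every row.
centered = data - col_means

# Comparison and Boolean mask: select the entries greater than 5.
mask = data > 5
large_values = data[mask]

# Fancy indexing: pick rows 2 and 0, in that order.
picked_rows = data[[2, 0]]

# Sorting along rows (axis=1) returns a sorted copy.
sorted_rows = np.sort(data, axis=1)

# Partial sort: after partitioning, the two smallest values come first,
# in no guaranteed order among themselves.
two_smallest = np.partition(data.ravel(), 2)[:2]

print(large_values)
print(sorted_rows)
```

`np.partition` is the partial-sort tool here: it avoids a full sort when only the k smallest values are needed, which is the same idea behind the k-nearest-neighbors topic.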
UNIT IV DATA MANIPULATION WITH PANDAS 8

Applications: Financial Analysis, Data Visualization

Pandas objects – data indexing and selection – operating on data in Pandas – handling missing data –
hierarchical indexing – combining datasets: concat and append – combining datasets: merge and join –
aggregation and grouping – pivot tables – vectorized string operations – working with time series –
high-performance Pandas: eval() and query().
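The Unit IV topics of missing data, grouping, merging, pivot tables, and `query()` can be illustrated together. The sketch below assumes pandas and NumPy are installed; the two small tables, their column names, and the manager names are hypothetical, invented only to make the example runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q1"],
    "revenue": [120.0, 95.0, np.nan, 110.0, 80.0],
})
managers = pd.DataFrame({
    "region": ["north", "south", "east"],
    "manager": ["Asha", "Ben", "Chen"],
})

# Handling missing data: fill the missing revenue with the column mean.
filled = sales.fillna({"revenue": sales["revenue"].mean()})

# Aggregation and grouping: total revenue per region.
totals = filled.groupby("region")["revenue"].sum()

# Combining datasets: a left merge on the shared "region" column.
merged = filled.merge(managers, on="region", how="left")

# Pivot table: regions as rows, quarters as columns.
pivot = filled.pivot_table(index="region", columns="quarter",
                           values="revenue", aggfunc="sum")

# query(): filter rows with a string expression.
high = filled.query("revenue > 100")

print(totals)
print(pivot)
```

One caution when following the "concat and append" topic: `DataFrame.append` was removed in pandas 2.0, so on current versions `pd.concat` is the way to stack DataFrames.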
UNIT V PYTHON FOR DATA VISUALIZATION 7

Applications: Climate Change Analysis, Sports Data Analysis

Visualization with Matplotlib – line plots – scatter plots – visualizing errors – density and contour plots –
histograms, binnings, and density – three-dimensional plotting – geographic data – data analysis using
statsmodels and seaborn – graph plotting using Plotly – interactive data visualization using Bokeh

UNIT VI RECENT TRENDS IN DATA SCIENCE 5

Healthcare: drug development, virtual healthcare assistance – Finance: fraud detection – Marketing:
targeted advertising, customer interactions – Transportation: driverless cars, airline routing.
TOTAL: 45 PERIODS
PRACTICAL EXERCISES:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
2. Working with NumPy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
8. Importing Data from External Source Using Python
SOFTWARE REQUIREMENTS
Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, Plotly, Bokeh
TOTAL (PRACTICAL): 30 PERIODS
TOTAL (THEORY + PRACTICAL): 75 PERIODS
COURSE OUTCOMES:

At the end of this course, the students will be able to:


CO1: Apply the skills of data inspection and cleansing.
CO2: Determine the relationships and dependencies in data using statistics.
CO3: Represent useful information using mathematical skills.
CO4: Handle data using the primary Python tools for data science.
CO5: Apply the knowledge of data description and visualization using tools.
CO6: Be aware of the current scope, limitations, and societal implications of data science.

TEXT BOOKS
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning
Publications, 2016. (first two chapters for Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
(Chapters 1–7 for Unit II)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Parts of chapters 2–4 for
Units III, IV, and V)

REFERENCES
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.
2. Sanjeev J. Wagh, Manisha S. Bhende, Anuradha D. Thakare, “Fundamentals of Data
Science”, CRC Press, 2022.
3. Chirag Shah, “A Hands-On Introduction to Data Science”, Cambridge University Press.

COs & POs MAPPING

CO    PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11
CO1   2    2    1    2    2    -    -    -    1    1     1
CO2   2    1    -    1    1    -    -    -    2    1     1
CO3   2    2    1    2    2    1    1    -    1    2     1
CO4   3    2    2    1    2    -    -    -    1    1     2
CO5   2    2    1    2    2    -    -    -    1    1     1
CO6   2    2    1    2    2    -    -    -    1    1     1
