FDS Syllabus and CIS
COURSE OBJECTIVES:
To understand the data science fundamentals and process.
To learn to describe the data for the data science process.
To learn to describe the relationship between data.
To utilize the Python libraries for Data Wrangling.
To present and interpret data using visualization libraries in Python.
UNIT I INTRODUCTION 9
Data Science: Benefits and uses – Facets of data – Data Science Process: Overview – Defining
research goals – Retrieving data – Data preparation – Exploratory Data Analysis – Building the
model – Presenting findings and building applications – Data Mining – Data Warehousing – Basic
Statistical descriptions of Data
UNIT II DESCRIBING DATA 9
Types of Data – Types of Variables – Describing Data with Tables and Graphs – Describing Data
with Averages – Describing Variability – Normal Distributions and Standard (z) Scores
UNIT III DESCRIBING RELATIONSHIPS 9
Correlation – Scatter plots – Correlation coefficient for quantitative data – Computational formula
for correlation coefficient – Regression – Regression line – Least squares regression line – Standard
error of estimate – Interpretation of r² – Multiple regression equations – Regression towards the
mean
UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING 9
Basics of NumPy arrays – aggregations – computations on arrays – comparisons, masks, boolean
logic – fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and
selection – operating on data – missing data – hierarchical indexing – combining datasets –
aggregation and grouping – pivot tables
UNIT V DATA VISUALIZATION 9
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots
– Histograms – legends – colors – subplots – text and annotation – customization – three
dimensional plotting – Geographic Data with Basemap – Visualization with Seaborn.
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Define the data science process
CO2: Understand different types of data description for data science process
CO3: Gain knowledge on relationships between data
CO4: Use the Python Libraries for Data Wrangling
CO5: Apply visualization Libraries in Python to interpret and explore data
TOTAL: 45 PERIODS
TEXTBOOKS:
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning
Publications, 2016. (Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
(Units II and III)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Units IV and V)
REFERENCE:
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.
COURSE TITLE: FOUNDATIONS OF DATA SCIENCE CODE: CS3352
YEAR/SEMESTER: II / III
COURSE OBJECTIVES:
The students should be made to:
S.No Objectives
1 To understand the data science fundamentals and process.
2 To learn to describe the data for the data science process.
3 To learn to describe the relationship between data.
4 To utilize the python libraries for data wrangling.
5 To present and interpret data using visualization libraries in python.
COURSE OUTCOMES:
At the end of the course, the students should be able to:
Ease of use
Python is relatively easy to learn and much less verbose than
languages such as Java. (Code that follows the language's concise,
readable idioms is often described as Pythonic.) This simplicity
lowers the barrier to picking up Python as a new programming
language.
The best part is that this simplicity does not come at the cost of
functionality: Python remains a powerful general-purpose language.
Simply install the language, and you are ready to get started. No
complex configuration, such as setting up compilers, is required.
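As a small illustration of this conciseness, the sketch below computes the mean and the most frequent value of a short list using only the standard library. The sample values and variable names are made up for this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
scores = [72, 85, 85, 90, 64]

# Arithmetic mean of the values.
average = mean(scores)

# Most frequent value and how often it occurs.
most_common, count = Counter(scores).most_common(1)[0]

print(f"mean = {average}, most frequent = {most_common} (seen {count} times)")
```

An equivalent Java program would typically also need a class declaration, a main method, and explicit type declarations.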
Licensing structure
Python is an open-source language managed by the non-profit
Python Software Foundation. This open-source nature allows Python
to be used in any project without fearing any interference from a
third party. Contrast this to other programming languages managed
or owned by commercial organizations, where a single decision can
cripple the usage of the language.
Data analytics projects can be complex and time-consuming
endeavors. The open-source nature of Python therefore lets data
scientists and analysts rely on it with confidence, for the
foreseeable future, in any kind of commercial or hobbyist project.
Active community
Python has an active and thriving community, members of which are
routinely and actively:
Python packages
Python is backed by the Python Package Index (PyPI), the public
repository of third-party packages for Python, which are installed
with the pip tool. This repository offers packages to help users with
a wide range of tasks, from simple ones like JSON parsing to complete
data transformation, analytics, and visualization packages.
Pandas
SciPy
scikit-learn
And more
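Third-party packages such as these are typically installed from the command line with pip (for example, `pip install pandas scipy scikit-learn`). The following sketch, which uses only the standard library, checks which of them can be found in the current environment. The helper name `is_installed` is hypothetical, and note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (it is not imported)."""
    return find_spec(module_name) is not None


# Report the availability of each package listed above.
for name in ("pandas", "scipy", "sklearn"):
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```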
Python notebooks
Jupyter Notebooks have become the de facto standard for creating
notebooks. Notebooks offer users a browser-based coding
environment for creating and sharing documents that contain
everything you need in one spot:
Code
Visualizations
Equations
Text