Introduction To Data Science - UNIT-1 (Session-2) - Dr.R.Richards - 09-03-2025

The document provides an introduction to data science, covering its definition, evolution, types of data, and the importance of data wrangling. It explains the various types of data, including structured, unstructured, semi-structured, and metadata, and discusses the applications and benefits of data science in decision-making and predictive analytics. Additionally, it highlights the challenges in data processing and the significance of big data in the modern data landscape.

Uploaded by

Hassan Ahmad Abubakar


BCA 602 - Introduction to Data Science
Unit No. I

Introduction to Data Science

Centre for Distance and Online Education

Data Evolution (Data to Data Science)

Objectives
After completing Unit I, you will be able to understand the following areas:
• Data to Data Science - Understanding data: Introduction - Types of Data
• Data Evolution - Data Sources. Preparing and gathering data and knowledge
• Philosophies of data science - data all around us: the virtual wilderness
• Data wrangling: from capture to domestication
• Data science in a big data world
• Benefits and uses of data science and big data - facets of data

Data Science - Definition
Data science is the area of study that involves extracting insights from vast amounts of data using scientific methods, algorithms, and processes. It helps you discover hidden patterns in raw data.

Data science is a multidisciplinary field whose objective is to analyze data to generate knowledge that can be used for decision making. This knowledge can take the form of similar patterns, predictive planning models, forecasting models, etc.

A data science application collects data and information from multiple heterogeneous sources; cleans, integrates, processes, and analyses this data using various tools; and presents the resulting information and knowledge in various visual forms.
Advantages:

• It helps in making business decisions, such as assessing the health of companies with which an organization plans to collaborate.
• It may help in making better predictions for the future, such as drawing up a company's strategic plans based on present trends.
• It may identify similarities among various data patterns, leading to applications such as fraud detection and targeted marketing.
Types of Data in Data Science
Big data is characterized by huge volume, high velocity, and a wide variety of data.

The data falls into four types:

1. Unstructured data: Word, PDF, and text documents; images; audio; and video
2. Semi-structured data: XML data
3. Metadata: data about data
4. Structured data: relational data

Unstructured Big Data
Any data whose form or structure is unknown is classified as unstructured data. In addition to being huge in size, unstructured data poses multiple challenges when it is processed to derive value from it. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, audio, videos, etc.
Semi-Structured data
Semi-structured data can contain both forms of data. It appears structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS.
An example of semi-structured data is data represented in an XML file. Web pages written in HTML are another example of semi-structured data.
Personal data stored in an XML file:
<rec><name>Amitav</name><gender>Male</gender><age>45</age></rec>
<rec><name>Sudipta</name><gender>Male</gender><age>17</age></rec>
<rec><name>Soumya</name><gender>Male</gender><age>15</age></rec>
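As a minimal sketch of how the personal records shown in the XML example could be read programmatically, the Python snippet below parses them with the standard-library xml.etree.ElementTree module. The enclosing <people> root element and the helper name parse_people are assumptions added for illustration, since an XML document needs a single root element.

```python
import xml.etree.ElementTree as ET

# The three <rec> records from the slide, wrapped in a hypothetical
# <people> root element (an assumption: XML requires a single root).
XML_TEXT = """
<people>
  <rec><name>Amitav</name><gender>Male</gender><age>45</age></rec>
  <rec><name>Sudipta</name><gender>Male</gender><age>17</age></rec>
  <rec><name>Soumya</name><gender>Male</gender><age>15</age></rec>
</people>
"""


def parse_people(xml_text):
    """Return one dict per <rec>, converting <age> to an integer."""
    root = ET.fromstring(xml_text.strip())
    people = []
    for rec in root.findall("rec"):
        people.append({
            "name": rec.findtext("name"),
            "gender": rec.findtext("gender"),
            "age": int(rec.findtext("age")),
        })
    return people


people = parse_people(XML_TEXT)
adults = [p["name"] for p in people if p["age"] >= 18]
print(adults)
```

The tags tell the program what each value means, but nothing outside the file enforces which tags must be present. That is what makes the data semi-structured rather than structured.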

Meta Data
Metadata is data that provides information about one or more aspects of other data. It is used to summarize basic information about data, which can make tracking and working with specific data easier.
There are three main types of metadata:
• Descriptive metadata describes a resource for purposes of identification. It can include elements such as the title of a book, an abstract, and keywords.
• Structural metadata indicates how compound objects are put together, e.g. how pages are ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and how it was created, its file type and other technical information, and who can access it.
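To make the idea of administrative metadata concrete, here is a small Python sketch that reads a few such fields for a file using the standard-library os.stat call. The helper name describe_file and the temporary sample file are assumptions made only so that the example is self-contained.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Collect a few administrative metadata fields for a file."""
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "file_type": os.path.splitext(path)[1],
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello metadata")
    sample_path = handle.name

metadata = describe_file(sample_path)
print(metadata)
os.remove(sample_path)
```

None of these fields describes the content of the file; they describe the file itself, which is the sense of "data about data".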

Figure: metadata example
Structured Data
Any data that can be stored, accessed, and processed in a fixed format is termed 'structured' data. In other words, it is any data that can be stored in an SQL database in the form of tables with rows and columns.
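The following sketch illustrates structured data stored as rows and columns, using Python's built-in sqlite3 module with an in-memory database. The table name person and its column names are assumptions chosen for illustration; the three rows reuse the values from the personal-data XML example.

```python
import sqlite3

# An in-memory database; the table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, gender, age) VALUES (?, ?, ?)",
    [("Amitav", "Male", 45), ("Sudipta", "Male", 17), ("Soumya", "Male", 15)],
)

# Because every row has the same fixed fields, SQL can filter and sort directly.
minors = conn.execute(
    "SELECT name, age FROM person WHERE age < 18 ORDER BY age"
).fetchall()
print(minors)
conn.close()
```

The fixed schema is declared once, up front, and every row must conform to it. This is the main practical difference from the tag-based formats discussed under semi-structured data.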

Semi-structured Data
As the name suggests, semi-structured data has some structure in it. This structure comes from the use of tags or key/value pairs. Common forms of semi-structured data are XML, JSON objects, server logs, EDI data, etc.
<Book>
<title>Data Science and Big Data</title>
<author>R Raman</author>
<author>C V Shekhar</author>
<yearofpublication>2020</yearofpublication>
</Book>
{
  "Book": {
    "Title": "Data Science",
    "Price": 5000,
    "Year": 2020
  }
}
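A brief sketch of reading the key/value form: the snippet below loads the "Book" object with Python's standard json module. The outer braces are added here as an assumption so that the text is a complete JSON document.

```python
import json

# The "Book" object from the slide, wrapped in braces to form valid JSON.
JSON_TEXT = """
{
  "Book": {
    "Title": "Data Science",
    "Price": 5000,
    "Year": 2020
  }
}
"""

record = json.loads(JSON_TEXT)
book = record["Book"]

# Keys are discovered from the data itself rather than from a fixed schema.
fields = sorted(book.keys())
print(fields)
print(book["Title"], book["Year"])
```

Another record could carry extra keys or omit some of these without breaking the parser, which is why key/value data counts as semi-structured.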
Figure: structured and semi-structured data
Examples of Data Science

I. Examples of data science and its applications are everywhere. Data science has applications in everything from food delivery and sports to traffic and health. Data is everywhere, and so data science can be applied to everything.

II. In terms of food, Uber is investing in an expansion of its ride-sharing system focused on the delivery of food, Uber Eats. Uber Eats needs to get people their food in a timely fashion, while it is still hot and fresh. For this to happen, the company's data scientists need to use statistical modeling that takes into account aspects like distance from restaurants to ...
Philosophies of data science - data all around us: the virtual wilderness

Data science is the extraction of knowledge from data. Simple enough, but that description doesn’t distinguish data science from the many other similar terms, except perhaps to claim that data science is an umbrella term for the whole lot. On the other hand, this era of data science has a property that no previous era had, and it is, to me, a fairly compelling reason to apply a new term to the types of things that data scientists do that previous applied statisticians and data-oriented software engineers did not. This reason helps me underscore an often-overlooked but very important aspect of data science.

• Measuring things in real life
• Measuring things online
• Scripting and web scraping
• Data-collection devices: Today’s concept of the Internet of Things gets considerable media buzz, partially for its value in creating data from physical devices, some of which are capable of recording the physical world, for example cameras, thermometers, and gyroscopes.
• Log files or archives: Sometimes jargonized into digital trail or exhaust, log files are (or can be) left behind by many software applications.

Data wrangling: from capture to domestication
In data science, "data wrangling" refers to the process of taking raw, unorganized data and transforming it into a clean, structured format that is suitable for analysis. It essentially "domesticates" the data by cleaning, organizing, and structuring it so that it is usable for further operations such as modeling and visualization. This often involves capturing data from various sources, cleaning it by handling missing values, inconsistencies, and duplicates, and finally reshaping it to fit the needs of the analysis.

Key points about data wrangling:

Goals: To prepare raw data for analysis by making it consistent, accurate, and accessible.

Steps involved:

1. Data capture: Gathering data from different sources, including databases, APIs, files, etc.
2. Data cleaning: Identifying and correcting errors like missing values, outliers, duplicates, and inconsistent formatting.
3. Data transformation: Reshaping data by combining datasets, splitting variables, aggregating data, or converting data types to fit analysis needs.
4. Data enrichment: Adding relevant information to the dataset to enhance analysis.
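The four steps above can be sketched on a tiny example. This is a minimal illustration under stated assumptions, not a prescribed method: the raw CSV text, the CITY_NAMES lookup table, and the function name wrangle are all hypothetical, and mean imputation is just one possible way to handle a missing value. In practice a library such as pandas would typically be used; only the standard library is used here to keep the sketch dependency-free.

```python
import csv
import io
from statistics import mean

# Hypothetical raw export: stray whitespace, a duplicate row, a missing age.
RAW_CSV = """name,age,city_code
Amitav ,45,KOL
Sudipta,17,DEL
Sudipta,17,DEL
 Soumya,,KOL
"""

# Hypothetical lookup table used for the enrichment step.
CITY_NAMES = {"KOL": "Kolkata", "DEL": "Delhi"}


def wrangle(raw_text):
    # 1. Capture: read the raw rows.
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    # 2. Clean: trim whitespace and drop exact duplicates.
    cleaned, seen = [], set()
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(row.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)

    # 3. Transform: convert ages to integers, imputing the mean when missing.
    known_ages = [int(r["age"]) for r in cleaned if r["age"]]
    fallback_age = round(mean(known_ages))
    for row in cleaned:
        row["age"] = int(row["age"]) if row["age"] else fallback_age

    # 4. Enrich: add a readable city name from the lookup table.
    for row in cleaned:
        row["city"] = CITY_NAMES.get(row["city_code"], "unknown")
    return cleaned


tidy = wrangle(RAW_CSV)
for row in tidy:
    print(row)
```

Keeping each step as a separate, commented stage makes it easier to inspect the data between stages, which matters because choices such as how to impute a missing value affect the later analysis.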
Why is data wrangling important?

• Accuracy of analysis: Clean and properly formatted data leads to more reliable and meaningful results from analysis.
• Efficiency: By preparing data upfront, data scientists can spend more time on analysis and modeling instead of struggling with messy data.
• Model performance: Quality data is crucial for training accurate machine learning models.

Common data wrangling challenges:

• Incomplete data: Dealing with missing values and finding ways to impute them.
• Inconsistent formatting: Standardizing data formats across different sources.
• Data quality issues: Identifying and correcting errors like typos or incorrect data types.
• Data integration: Combining data from multiple sources while ...
The data wrangling process typically involves these steps:

• Discovering
• Structuring
• Cleaning
• Enriching
• Validating

Tools for data wrangling:

• Programming languages: Python (with libraries like pandas, NumPy), R
• Data analysis platforms: Tableau, Power BI
• Data wrangling tools: Alteryx, Trifacta
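As one possible illustration of the validating step listed among the wrangling steps, the sketch below checks records against two simple rules in plain Python. The rules (a non-empty name, an integer age between 0 and 120) and the function name validate are assumptions invented for this example.

```python
def validate(rows):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for index, row in enumerate(rows):
        if not row.get("name"):
            problems.append(f"row {index}: name is missing")
        age = row.get("age")
        if not isinstance(age, int) or not 0 <= age <= 120:
            problems.append(f"row {index}: age {age!r} is out of range")
    return problems


# Hypothetical records: the second one violates both rules.
records = [
    {"name": "Amitav", "age": 45},
    {"name": "", "age": 450},
]
issues = validate(records)
print(issues)
```

Returning a list of problems instead of stopping at the first failure lets all issues in a dataset be reviewed together.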

Data science in a big data world:

• Big data is a blanket term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data management techniques such as, for example, the RDBMS (relational database management system).

• The widely adopted RDBMS has long been regarded as a one-size-fits-all solution, but the demands of handling big data have shown otherwise.

• Data science involves using methods to analyze massive amounts of data and extract the knowledge it contains. You can think of the relationship between big data and data science as being like the relationship between crude oil and an oil refinery. Data science and big data evolved from statistics and traditional data management but are now considered to be distinct disciplines.
Benefits and uses of data science and big data - facets of data
Data science and big data can help businesses and organizations make better decisions, improve customer experience, and increase efficiency.
• Predictive analytics: Uses historical data and algorithms to make forecasts and predictions.
• Machine learning: Used to create predictive models and analyze data.
• Data visualization: Helps users understand complex data sets.
• Recommendation systems: Helps users find relevant information.
• Fraud detection: Helps identify fraudulent transactions.
• Sentiment analysis: Helps understand how people feel about a product or service.
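To illustrate the predictive analytics item, here is a minimal sketch that fits a straight line to historical values by ordinary least squares and extrapolates one step ahead. The monthly figures are made up for the example, and a straight-line model is an assumption chosen for simplicity; real forecasting usually needs more data and a more careful choice of model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: units sold in months 1 to 4.
months = [1, 2, 3, 4]
units = [3, 5, 7, 9]

slope, intercept = fit_line(months, units)
forecast_month_5 = slope * 5 + intercept
print(forecast_month_5)
```

The fitted slope summarizes the historical trend, and the forecast simply extends that trend to the next month.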

Big data - Uses
• Improved decision-making: Helps organizations make data-driven decisions.
• Better customer experiences: Helps personalize customer experiences.
• More efficient operations: Helps streamline operations.
• Improved risk management: Helps mitigate risk and handle setbacks.
• Increased agility and innovation: Helps organizations respond to market demands.
• Data science and big data are used in many industries, including healthcare, finance, marketing, and technology.

Facets of data
In data science and big data you’ll come across many different types of data, and each of them tends to require different tools and techniques. The main categories of data are these:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming

STRUCTURED DATA
Structured data is data that depends on a data model and resides in a fixed field within a record. As such, it’s often easy to store structured data in tables within databases or Excel files (figure 1.1). SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases. You may also come across structured data that might give you a hard time storing it in a traditional relational database. Hierarchical ...
UNSTRUCTURED DATA
Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or varying. One example of unstructured data is your regular email (figure 1.2). Although email contains structured elements such as the sender, title, and body text, it’s a challenge to find the number of people who have written an email complaint about a specific employee because so many ways exist to refer to a person, for example. The thousands of different languages and dialects out there further complicate this.
NATURAL LANGUAGE
• Natural language is a special type of unstructured data; it’s challenging to process because it requires knowledge of specific data science techniques and linguistics.

• The natural language processing community has had success in entity recognition, topic recognition, summarization, text completion, and sentiment analysis, but models trained in one domain don’t generalize well to other domains. Even state-of-the-art techniques aren’t able to decipher the meaning of every piece of text. This shouldn’t be a surprise though: humans struggle with natural language as well. It’s ambiguous by nature. The concept of meaning itself is questionable here. Have two people listen to the same conversation. Will they get the same meaning? The meaning of the same words can vary when coming from someone upset or joyous.
MACHINE-GENERATED DATA
Machine-generated data is information that’s automatically created by a computer, process, application, or other machine without human intervention. Machine-generated data is becoming a major data resource and will continue to do so. Wikibon has forecast that the market value of the industrial Internet (a term coined by Frost & Sullivan to refer to the integration of complex physical machinery with networked sensors and software) will be approximately $540 billion in 2020. IDC (International Data Corporation) has estimated there will be 26 times more connected things than people in 2020. This network is commonly referred to as the internet of things.
The analysis of machine data relies on highly scalable tools, due to its high volume and speed. Examples of machine data are web server logs, call detail records, network event logs, and telemetry.
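As a small illustration of working with machine-generated data, the sketch below parses web server log lines with a regular expression and counts HTTP status codes. It assumes the lines follow the Common Log Format; the sample lines are made up for the example, and the names LOG_PATTERN and count_statuses are hypothetical.

```python
import re
from collections import Counter

# Pattern for the Common Log Format (an assumed format for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Made-up sample lines, not taken from a real server.
SAMPLE_LOG = """\
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /missing.html HTTP/1.0" 404 209
10.0.0.7 - - [10/Oct/2000:13:56:02 -0700] "POST /login HTTP/1.0" 200 512
"""


def count_statuses(log_text):
    """Count HTTP status codes across all lines that match the pattern."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


status_counts = count_statuses(SAMPLE_LOG)
print(status_counts)
```

Lines that do not match the pattern are skipped silently here. At real log volumes the same per-line logic would usually run inside a scalable processing tool rather than a single script.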
GRAPH-BASED OR NETWORK DATA
“Graph data” can be a confusing term because any data can be shown in a graph. “Graph” in this case points to mathematical graph theory. In graph theory, a graph is a mathematical structure to model pair-wise relationships between objects. Graph or network data is, in short, data that focuses on the relationship or adjacency of objects. The graph structures use nodes, edges, and properties to represent and store graphical data. Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people.
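The two metrics mentioned for graph data, shortest path and influence, can be sketched on a toy social network. The people and friendships below are invented for illustration. Breadth-first search is used for the shortest path because every edge counts equally, and the number of direct connections (degree) stands in as a very crude influence score; both are assumptions of this sketch rather than methods named in the slides.

```python
from collections import deque

# A tiny undirected friendship graph stored as an adjacency list.
# The people and links are invented for this example.
FRIENDS = {
    "Asha": ["Bala", "Chen"],
    "Bala": ["Asha", "Dev"],
    "Chen": ["Asha", "Dev"],
    "Dev": ["Bala", "Chen", "Esi"],
    "Esi": ["Dev"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of nodes or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


path = shortest_path(FRIENDS, "Asha", "Esi")
# Degree (number of direct connections) as a very crude influence score.
most_connected = max(FRIENDS, key=lambda person: len(FRIENDS[person]))
print(path, most_connected)
```

Here the nodes are people and the edges are friendships, so both questions are answered purely from the relationships, not from any attribute of the individuals.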
AUDIO, IMAGE, AND VIDEO
Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers. MLBAM (Major League Baseball Advanced Media) announced in 2014 that they’ll increase video capture to approximately 7 TB per game for the purpose of live, in-game analytics. High-speed cameras at stadiums will capture ball and athlete movements to calculate in real time, for example, the path taken by a defender relative to two baselines.

STREAMING DATA
While streaming data can take almost any of the previous forms, it has an extra property. The data flows into the system when an event happens instead of being loaded into a data store in a batch. Although this isn’t really a different type of data, we treat it here as such because you need to adapt your process to deal with ...
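The difference between handling events as they arrive and loading a batch can be sketched with Python generators. The sensor_feed function is a stand-in for a live source and its readings are made up; the running average is just one example of a computation that can be updated per event.

```python
def running_average(events):
    """Yield the mean of all values seen so far, one result per event."""
    total = 0.0
    count = 0
    for value in events:
        total += value
        count += 1
        yield total / count


def sensor_feed():
    """Stand-in for a live source; the readings are made up."""
    for reading in [20.0, 22.0, 21.0, 25.0]:
        yield reading


# Each average is available as soon as its event arrives,
# without first loading every reading into a data store.
averages = list(running_average(sensor_feed()))
print(averages)
```

Only a running total and a count are kept in memory, so the same code would work on a feed that never ends.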
Summary
• Data to Data Science - Understanding data: Introduction - Types of Data
• Data Evolution - Data Sources. Preparing and gathering data and knowledge
• Philosophies of data science - data all around us: the virtual wilderness
• Data wrangling: from capture to domestication
• Data science in a big data world
• Benefits and uses of data science and big data - facets of data
Additional Resources
Video Links:
https://www.youtube.com/watch?v=lSwIe0TMUhc
https://www.youtube.com/watch?v=KxryzSO1Fjs

Tutorial lesson website URLs:
https://www.geeksforgeeks.org/introduction-to-data-science/
https://www.tutorialspoint.com/data_science/index.htm
https://www.tpointtech.com/data-science

Any Questions?

Thank You!
