Unit - I IDS

Uploaded by

RIYA SUDRIK

MIT School of Computing

Department of Computer Science & Engineering

Third Year Engineering

21BTCS503 - Introduction to Data Science


Class - T.Y. Core (SEM-V)

Unit - I


Unit - I: Syllabus
● Introduction to Data Science,

● Facets of Data: Structured data, Unstructured data, Natural language, Machine generated data, Graph based or network data, Audio, Image and Video, Streaming data

● The Data Science Process: setting the research goal, retrieving data,
Data preparation, Data exploration, model building, Presentation and
Automation.

Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision making.


Data Science is about data gathering, analysis and decision-making.

Data Science is about finding patterns in data through analysis and making predictions about the future.

By using Data Science, companies are able to make:

● Better decisions (should we choose A or B?)
● Predictive analysis (what will happen next?)
● Pattern discoveries (finding patterns, or hidden information, in the data)


Examples

● For route planning: to discover the best routes to ship
● To foresee delays for flights, ships, trains, etc. (through predictive analysis)
● To create promotional offers
● To find the best-suited time to deliver goods
● To forecast the next year's revenue for a company
● To analyze the health benefits of training
● To predict who will win elections


Data Science tasks


Data science encompasses a wide range of tasks and supporting skills, such as:

● Data cleaning and preparation
● Data visualization
● Data analysis
● Web scraping
● IDE
● Programming language
● Math
● Machine learning
● Deployment


Career opportunities in the data science field

● "The rise of Data Science needs will create roughly 11.5 million job openings by 2025." (US Bureau of Labor Statistics)

● "By 2026, Data Scientists and Analysts will become the number one emerging role in the world." (World Economic Forum)

● Data Science and Artificial Intelligence are among the hottest fields of the 21st century and will impact all segments of daily life by 2025, from transport and logistics to healthcare and customer service.

Video on Ameca: a humanoid robot

● https://www.youtube.com/watch?v=wGWVKkYEHBE


Structuring features according to their type


● Quantitative features possess a "numerical quantity", such as height, age, or number of births, and can be either continuous or discrete.

● Qualitative features do not have a numerical meaning, but their possible values can be divided into a fixed number of categories, such as {M, F} for gender or {blue, black, brown, green} for eye color. For this reason such features are also called categorical. They can be nominal (unordered) or ordinal (ordered).
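
The distinction between quantitative and qualitative features can be sketched in a few lines of Python. This is only an illustration: the records, the column names, and the helper `feature_kind` are invented here, and the type-based rule is a rough heuristic (a numeric code such as a postal code would still be qualitative).

```python
from collections import Counter
from statistics import mean

# Hypothetical records; column names and values are invented for illustration.
records = [
    {"height_cm": 162.5, "age": 71, "gender": "F", "eye_color": "brown"},
    {"height_cm": 175.0, "age": 68, "gender": "M", "eye_color": "blue"},
    {"height_cm": 158.0, "age": 74, "gender": "F", "eye_color": "green"},
]


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def feature_kind(values):
    """Rough heuristic: decide the feature type from the Python types of its values."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(is_number(v) for v in values):
        return "quantitative (continuous)"
    return "qualitative (categorical)"


# Turn the list of rows into one list of values per column.
columns = {name: [row[name] for row in records] for name in records[0]}
kinds = {name: feature_kind(values) for name, values in columns.items()}

# Arithmetic is meaningful for quantitative features ...
mean_age = mean(columns["age"])
# ... while counting categories is the natural summary for qualitative ones.
gender_counts = Counter(columns["gender"])

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

In practice the type of a feature is decided from its meaning, not only from how it is stored, which is why modifying the type of a column is a common first step.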


[Table: Description of data variables in the nutritional study]


An exercise: Modifying the type of data


An exercise: To know the structure of data

For this exercise, access the dataset from:

http://www.biostatisticien.eu/springeR/nutrition_elderly.xls


Facets of Data

1. Structured data: Structured data refers to data that has a predefined format and fits neatly into relational databases or tabular structures. (Quantitative Data)
E.g. Spreadsheet data, relational data, transactional data, sensor data

2. Unstructured data: Unstructured data refers to data that does not have a predefined structure and does not fit well into traditional databases. (Qualitative Data)
E.g. Textual data, multimedia data, social media data, and web data
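
As a rough illustration of the difference, the sketch below reads a small structured table and a piece of free text. The CSV content, the field names, and the review sentence are all invented for this example.

```python
import csv
import io
from collections import Counter

# Structured data: a predefined schema, so every field can be addressed by name.
# The columns "order_id" and "amount" are hypothetical.
csv_text = "order_id,amount\n1,250.0\n2,120.5\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(float(row["amount"]) for row in rows)

# Unstructured data: free text has no schema, so some structure
# (here, simple word counts) has to be imposed before analysis.
review = "Food arrived late but the food was still hot"
word_counts = Counter(review.lower().split())

print(total_amount)
print(word_counts.most_common(1))
```

The structured rows can be summed directly, while the text needs a processing step first; real text analysis would need far more than whitespace splitting.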


3. Graph based or network data: Graph-based or network-based data refers to data that
represents relationships or connections between entities. In this type of data, entities are
represented as nodes or vertices, and the relationships between them are represented as edges
or links.

E.g. Recommendation systems, Transportation systems
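
A minimal sketch of how such data might be held in code, assuming a made-up transport network: stations are nodes, direct links are edges, and a breadth-first search finds a route with the fewest links. The station names and the helper `shortest_path` are hypothetical.

```python
from collections import deque

# Hypothetical transport network: each pair is an undirected edge between two stations.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]

# Build an adjacency list: node -> list of directly connected nodes.
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


route = shortest_path(graph, "A", "E")
print(route)
```

An adjacency list is only one possible representation; an edge list or an adjacency matrix would carry the same information.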



Data Science Applications/Projects deployed

● Nissan's robotic chairs
● AI finding your dream job
● ChatGPT
● AI-powered shoes
● Ameca
● TayTweets (Why was it stopped?)
● Smart watches
● Netflix

In summary, data science is a field that involves using statistical and computational techniques to extract insights and knowledge from data.

Data science is all about playing with data.


Data Science Process


1. Business understanding
2. Data collection
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment

Link to its description:
https://www.springboard.com/blog/data-science/data-science-process/#h2
1. Business understanding:
Data scientists meet with stakeholders, subject matter experts, and others who can offer insights into the problem at hand. They may also do preliminary research to see how others have tried to solve similar problems.

2. Data collection:
● Logs from web servers
● Data gathered from social media
● Census datasets
● Data streamed from online sources using APIs
3. Data preparation:
Data cleaning: fixing incomplete or erroneous data
Data integration: unifying data from different sources
Data transformation: formatting the data
Data reduction: reducing data to its simplest form
Data discretization: reducing the number of values to make data management easier
Feature engineering: selecting and transforming variables to work better with machine learning
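
Three of the steps above (cleaning, integration, and discretization) can be sketched on a toy dataset. Everything here is assumed for illustration: the two sources, the field names, the mean-fill strategy for missing values, and the age bands.

```python
from statistics import mean

# Two hypothetical sources describing the same customers.
source_a = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]
source_b = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Mumbai"},
    {"id": 3, "city": "Pune"},
]

# Data cleaning: fill the missing age with the mean of the known ages
# (one of many possible strategies, chosen here only for simplicity).
known_ages = [row["age"] for row in source_a if row["age"] is not None]
fill_value = mean(known_ages)
cleaned = [
    {**row, "age": row["age"] if row["age"] is not None else fill_value}
    for row in source_a
]

# Data integration: unify the two sources by joining on the shared "id" key.
city_by_id = {row["id"]: row["city"] for row in source_b}
merged = [{**row, "city": city_by_id[row["id"]]} for row in cleaned]


# Data discretization: replace the many possible ages with a few bands.
def age_band(age):
    if age < 40:
        return "under 40"
    if age < 60:
        return "40-59"
    return "60+"


prepared = [{**row, "age_band": age_band(row["age"])} for row in merged]

for row in prepared:
    print(row)
```

The derived `age_band` column is also a small example of feature engineering, since it is a transformed variable that a model may handle more easily than the raw age.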

4. Modeling:
The outputs of this phase are:
● A list of parameter settings
● A description of the models
● The models themselves

5. Evaluation:
You’ll evaluate the model based on the goals of your business. Then, you’ll review
your work process and explain how your model will help the business, summarize
your findings, and make any corrections.
6. Deployment:
During the deployment phase, you’ll plan and document how you intend to deploy the model and how the results will be delivered and presented. You’ll also need to monitor the results and maintain the model during the deployment phase.


Case Study: Data Analysis for a food delivery App


Sr. no.   Attribute
1         Customer placed order time
2         Placed order with restaurant
3         Driver at restaurant
4         Delivered to customer
5         Driver ID
6         Restaurant ID
7         Consumer ID
8         Delivery Region
9         Order amount
10        Discount amount
11        Tip amount
12        Refunded amount
13        Operational time


Case Study
Insights drawn:

1. Time stamps: Customer places order → order placed with restaurant → driver arrives at restaurant → driver arrives at customer. The time durations between these steps show whether operations are running smoothly.

2. Restaurant business: Using the restaurant ID, where are customers ordering their food from? Which are the most important restaurants? Are the restaurants within one city or in different cities? How many orders do they generate? How would restaurants succeed?
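
The durations behind the time stamp insight can be computed directly from the four time attributes of an order. The sketch below assumes a single made-up order; the field names, the timestamp format, and the helper `minutes_between` are invented for illustration and are not taken from a real dataset.

```python
from datetime import datetime

# One hypothetical order; keys mirror the four time attributes of the case study.
order = {
    "customer_placed_order": "2024-01-15 18:02",
    "placed_with_restaurant": "2024-01-15 18:05",
    "driver_at_restaurant": "2024-01-15 18:20",
    "delivered_to_customer": "2024-01-15 18:41",
}

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the raw time stamps
times = {name: datetime.strptime(value, TIME_FORMAT) for name, value in order.items()}


def minutes_between(start, end):
    """Elapsed minutes between two named events of the order."""
    return (times[end] - times[start]).total_seconds() / 60


stage_minutes = {
    "order_to_restaurant": minutes_between("customer_placed_order", "placed_with_restaurant"),
    "restaurant_to_pickup": minutes_between("placed_with_restaurant", "driver_at_restaurant"),
    "pickup_to_delivery": minutes_between("driver_at_restaurant", "delivered_to_customer"),
}
total_minutes = minutes_between("customer_placed_order", "delivered_to_customer")

for stage, minutes in stage_minutes.items():
    print(f"{stage}: {minutes:.0f} min")
print(f"total: {total_minutes:.0f} min")
```

Comparing these stage durations across many orders, regions, or drivers is one way to judge whether operations run smoothly.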

3. Customers' tip value: How much are customers tipping, and which restaurants are they ordering from?
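
One way to explore the tipping question is to group orders by restaurant and compare average tips. The orders, IDs, and amounts below are fabricated purely to show the aggregation.

```python
from collections import defaultdict

# Hypothetical orders using the restaurant ID and tip amount attributes.
orders = [
    {"restaurant_id": "R1", "order_amount": 400.0, "tip_amount": 40.0},
    {"restaurant_id": "R2", "order_amount": 250.0, "tip_amount": 10.0},
    {"restaurant_id": "R1", "order_amount": 600.0, "tip_amount": 50.0},
    {"restaurant_id": "R2", "order_amount": 150.0, "tip_amount": 0.0},
]

# Accumulate the order count and total tips per restaurant.
totals = defaultdict(lambda: {"orders": 0, "tips": 0.0})
for order in orders:
    entry = totals[order["restaurant_id"]]
    entry["orders"] += 1
    entry["tips"] += order["tip_amount"]

# Average tip per order for each restaurant, then rank from highest to lowest.
average_tip = {rid: entry["tips"] / entry["orders"] for rid, entry in totals.items()}
ranked = sorted(average_tip, key=average_tip.get, reverse=True)

for rid in ranked:
    print(f"{rid}: {totals[rid]['orders']} orders, average tip {average_tip[rid]:.2f}")
```

The same grouping by restaurant ID also gives order counts, which relates to the question of which restaurants matter most.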

Outcomes:

1. Deepen partnership with high growth restaurants

2. Ongoing hiring - “Attractive tip for dashers”

3. Drive towards operational excellence

Thank you
