Introduction

Uploaded by

Aathmika Vijay

Data Science
• A recent field that combines big data and unstructured data with
statistics, analytics and business intelligence
• A new field that has emerged within data management; it provides
an understanding of the correlation between structured and
unstructured data
Definition
• Data science is the discipline of using quantitative methods of
statistics and mathematics along with technology (computers and
software) to develop algorithms designed to discover patterns,
predict outcomes, and find optimal solutions to complex problems

• Data science is also defined as the field of study that involves
extracting knowledge and insights from noisy data, and then turning
those insights into actions that our business or organization can take.
• Data science is the intersection of three different disciplines:
• Computer science
• Mathematics
• Business expertise
• Types of analytics- classified depending upon complexity and value

1. Descriptive analytics- what is happening in the business? It involves accurate
data collection to make sure that we know what is happening.
Eg: did sales go up or down?
2. Diagnostic analytics- why did something happen?
Eg: why did sales go up or down? Was there a problem?
3. Predictive analytics- what is likely to happen? It involves using
historical patterns in our data to predict outcomes in the future.
Eg: what will our sales performance be next quarter?
4. Prescriptive analytics- what do I need to do next? What is the recommended
best action for a particular outcome?
Eg: what do I need to do to improve sales by 10%?
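
The sales examples above can be sketched in a few lines of Python. This is only an illustration: the quarterly figures and the function names are hypothetical, and the straight-line trend is an assumed, very simple way of using historical patterns to predict the next quarter. Diagnostic analytics is left out because it needs extra data about possible causes.

```python
# Hypothetical quarterly sales figures (illustrative only, not from the slides).
quarterly_sales = [100.0, 110.0, 120.0, 130.0]


def descriptive_change(sales):
    """Descriptive: did sales go up (positive) or down (negative) last quarter?"""
    return sales[-1] - sales[-2]


def predict_next(sales):
    """Predictive: extend a least-squares straight line by one quarter."""
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


def target_for_growth(sales, growth=0.10):
    """Prescriptive starting point: the sales level needed to improve by `growth`."""
    return sales[-1] * (1 + growth)


print("Change last quarter:", descriptive_change(quarterly_sales))
print("Forecast next quarter:", predict_next(quarterly_sales))
print("Target for 10% growth:", target_for_growth(quarterly_sales))
```

Comparing the forecast with the target shows the gap that a recommended action would have to close.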
Data science Lifecycle
• Business understanding- business expertise and business knowledge
• Data mining- going out to the data landscape and procuring the data
that we need for analysis
• Data cleaning- preparing and cleaning the data
• Exploration- using different analytical tools that help answer some
types of questions
• Machine learning (ML)- using a massive amount of computing power and
a massive amount of high-quality data to make predictions and
prescribe actions for the future
• Visualize- presenting our insights and the outcome of our analysis
People involved
• Business analysts- involved in formulating the questions.
• They have domain expertise.
• They help with the business understanding, and they are also
involved in visualizing the insights that are useful for the business.
• Data engineers- find the data (data mining), clean the data and do
some analysis
• Data scientists- involved in exploration, ML and visualization
• There is overlap between the roles.
• It is better to have collaboration across these roles
Summary
What do data scientists do
What is generating so much data
• Data can be generated by
• Humans
• Machines
• Human-machine combination

• It can be generated anywhere information is generated and
stored, in structured or unstructured formats
How data adds value to business
Data product
Why data is important
Methodology of handling Data- 5 elements

1. Data collection (raw data)
2. Organise and classify
3. Presentation of data
4. Analyze data
5. Generalize (inference)

30-01-2024 15
Contd..
➢Proper collection of data
• Raw data- must be collected accurately and recorded.
• Faulty data and faulty collection techniques would result in wrong
conclusions.
• Secondary data
• Primary data- data on some area which has not been previously
ascertained

➢Organisation and classification of data- edit to correct any
inconsistencies, ambiguities and recording errors
• Eg: Indians living in Canada- characteristics/attributes: age group, annual
income, employment type, education etc.
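
Classification by attributes can be illustrated with a small Python sketch. The records, the age-group boundaries and the helper names below are all hypothetical; a real study would choose classes that suit its own question.

```python
from collections import Counter

# Hypothetical survey records (illustrative only, not from the slides).
records = [
    {"age": 24, "employment": "salaried"},
    {"age": 37, "employment": "self-employed"},
    {"age": 42, "employment": "salaried"},
    {"age": 68, "employment": "retired"},
]


def age_group(age):
    """Map an age to a class. The boundaries are assumed for illustration."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60 and over"


def classify(rows, key):
    """Count how many records fall into each class returned by `key`."""
    return Counter(key(row) for row in rows)


by_age = classify(records, lambda row: age_group(row["age"]))
by_employment = classify(records, lambda row: row["employment"])

print(dict(by_age))
print(dict(by_employment))
```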
Data Acquisition errors
• Errors happen while entering the data
• Eg: a 24-year-old person being entered as 42, or 42 being typed
by mistake while putting the data into Excel.
• Outliers….
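
One common way to screen for suspicious entries is the interquartile range (IQR) rule, sketched below in Python. This is an assumed technique, not one named in the slides, and the ages and the function name are hypothetical. It catches extreme values such as 240, but a plausible wrong value (42 typed for 24) may pass the check when the other ages are widely spread, so it does not replace careful data entry.

```python
import statistics

# Hypothetical ages as entered (illustrative only); 240 is a typing error.
ages = [24, 26, 25, 27, 23, 240, 28, 24]


def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print(find_outliers(ages))
```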

Contd..
➢Presentation of data- data is presented in the form of tables
• Tables help in the presentation and analysis of data (eg: in Excel)

➢Analysis of data- the purpose is to make the data useful for drawing
certain conclusions
• Eg: averages, dispersion, percentages (descriptive or inferential
analysis)
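
The three descriptive measures named above can be computed with Python's standard `statistics` module. The income figures are hypothetical, and the population standard deviation is just one assumed choice of dispersion measure.

```python
import statistics

# Hypothetical annual incomes in thousands (illustrative only).
incomes = [40, 50, 60, 70, 80]

# Average: the arithmetic mean.
mean_income = statistics.mean(incomes)

# Dispersion: population standard deviation around the mean.
spread = statistics.pstdev(incomes)

# Percentage: share of people earning more than the mean.
share_above_mean = 100 * sum(x > mean_income for x in incomes) / len(incomes)

print(mean_income, round(spread, 2), share_above_mean)
```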

➢Interpretation of data- drawing conclusions from the data, which
helps in decision making
• Eg: if demand is less, sales reduce