
Chapter One Data Science

Data science revision notes

Uploaded by

abaasmuuse

Chapter one

1. Data is the new oil. It’s valuable, but if unrefined, it cannot really be
used. Like oil, which has to be changed into gas, plastic, chemicals, etc.,
data must be refined before it has value.
2. Types of Data
 Unstructured Text Data (Web)
 Semi-structured Data (XML)
 Streaming Data (images and videos)
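
As a small illustration of the difference between unstructured and semi-structured data, the sketch below handles the same made-up product information first as free text and then as XML. The sample strings and tag names are assumptions chosen for illustration; they do not come from the notes.

```python
import xml.etree.ElementTree as ET

# Unstructured text: there are no fixed fields, so we can only search the raw string.
review_text = "Bought the blue kettle for 25 dollars, works great."
mentions_price = "dollars" in review_text

# Semi-structured XML: tags label each value, so fields can be read by name.
# The tag and attribute names below are hypothetical.
product_xml = """
<product>
  <name>Kettle</name>
  <color>blue</color>
  <price currency="USD">25</price>
</product>
"""

root = ET.fromstring(product_xml)
name = root.findtext("name")
price = float(root.findtext("price"))
currency = root.find("price").get("currency")

print(mentions_price)
print(name, price, currency)
```

The XML version can be queried field by field, while the free text needs extra processing before the price can be used as a number.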
3. What is data science?
 A field of study that combines domain expertise, programming skills, and
knowledge of mathematics and statistics to extract insights from data.
 Data scientists require a background in these fundamental,
related disciplines.

4. Career Opportunities After Completion?


 Data Architect and Administrators
 Data Engineer
 Data Analyst
 Data Scientist
 Machine Learning Engineer
 Statisticians and Mathematicians
 Business IT Analyst
 Marketing Analyst
5. Why study data science ?
 Data is one of the most important assets of every organization or
business for making informed decisions: the ‘new oil’.
 Big data: people and devices generate data at a growing and
unprecedented speed, e.g. social media.

6. Data science workflow ?


 Capture data: collect data from various sources; also known as
data acquisition, data extraction, or data ingestion.

1 | Page writer:
Abaas Muuse
 Managing and cleaning data: remove errors and duplicates; also
known as data cleansing or data preparation.
 Exploratory data analysis (EDA): look at the data to understand it,
for example with a histogram of prices; also known as exploration
analysis or descriptive analysis.
 Final analysis
 Reporting
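
The workflow steps above can be sketched in a few lines of standard-library Python. This is only a minimal illustration: the inline CSV text, the column names, and the helper functions `capture`, `clean`, and `price_histogram` are all hypothetical, and a real project would more likely use a library such as Pandas.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data standing in for a real source (file, API, database).
RAW_CSV = """item,price
apple,1.20
banana,0.50
apple,1.20
cherry,abc
melon,3.10
grape,1.80
"""

def capture(text):
    """Capture: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Cleaning: drop rows with invalid prices and exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            price = float(row["price"])
        except ValueError:
            continue  # error row, e.g. price "abc"
        key = (row["item"], price)
        if key in seen:
            continue  # duplicate row
        seen.add(key)
        cleaned.append({"item": row["item"], "price": price})
    return cleaned

def price_histogram(rows, bin_width=1.0):
    """EDA: count how many prices fall into each bin of width bin_width."""
    counts = Counter(int(r["price"] // bin_width) for r in rows)
    return {b * bin_width: counts[b] for b in sorted(counts)}

rows = clean(capture(RAW_CSV))
histogram = price_histogram(rows)

# Reporting: print a simple text histogram, one line per price bin.
for start, count in histogram.items():
    print(f"prices from {start:.2f}: {'#' * count}")
```

The duplicate apple row and the row with the non-numeric price are removed in the cleaning step, so only four rows reach the histogram.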
7. Major components of artificial intelligence (AI)
 Knowledge Representation
 Natural language processing (NLP)
 Reasoning
 Machine learning
8. What is machine learning?
 The study of computer algorithms that improve automatically through
experience.
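
To make "improve automatically through experience" concrete, the sketch below fits a single number (a slope) to a few example points by gradient descent. The data, the learning rate, and the number of passes are assumptions picked for illustration; this is one simple learning method among many, not a definition of machine learning.

```python
# Hypothetical training examples that follow y = 2x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

def mean_squared_error(w, data):
    """Average squared difference between the prediction w*x and the target y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w = 0.0               # initial guess for the slope
learning_rate = 0.01  # assumed step size
errors = [mean_squared_error(w, examples)]

for _ in range(50):  # each pass over the examples is more "experience"
    for x, y in examples:
        gradient = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= learning_rate * gradient
    errors.append(mean_squared_error(w, examples))

print(f"learned slope: {w:.3f}")
print(f"error before training: {errors[0]:.3f}, after: {errors[-1]:.6f}")
```

Nobody tells the program that the slope is 2; the error shrinks as it sees the examples again and again.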
9. Difference between AI, machine learning, and deep learning?
Artificial intelligence (AI): machines that think and act like humans.
Machine learning: enables computers to perform tasks without explicit
programming.
Deep learning: a subset of machine learning based on artificial neural
networks.
10. Big data is commonly defined in terms of volume, velocity, and variety.
These characteristics are often referred to as the “3 Vs of big data”,
first described in 2001 by Doug Laney, an analyst at META Group (later
acquired by Gartner).
 Volume: the enormous amount of data that is available for
collection and produced from a variety of sources and devices on a
continuous basis.
 Velocity: the speed at which data is generated.
 Variety: data is heterogeneous; it comes from many different sources
and can be structured, unstructured, or semi-structured.
11. Data analytics is the process of transforming raw data into
meaningful insights for better decision making, mostly using statistical
processing and machine learning.

12. What is Data Science Used For?


 Used for an array of applications, from predicting customer behavior
to optimizing business processes.
13. There are four major types of data analytics.

 Descriptive analytics: analyzes past data to understand the
current state and identify trends. For example, retail stores might
use it to analyze last quarter's sales or identify best-selling
products.
 Diagnostic analytics: explores data to understand why
certain events occurred, identifying patterns and anomalies.
 Predictive analytics: uses statistical models to forecast
future outcomes based on past data; used widely in finance,
healthcare, and marketing.
 Prescriptive analytics: recommends actions based on results from
other types of analytics to mitigate future problems or leverage
promising trends. For example, a navigation app advising a route
based on current traffic conditions.
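
The difference between these types can be seen on a toy data set. The sketch below uses made-up quarterly sales figures; the product names, the numbers, the naive trend forecast, and the stock rule are all assumptions for illustration. Diagnostic analytics is left out because it needs extra data about causes.

```python
from statistics import mean

# Hypothetical quarterly sales (units sold) per product.
sales = {
    "laptop": [120, 135, 150, 165],
    "phone": [200, 190, 185, 170],
    "tablet": [60, 65, 70, 72],
}

# Descriptive: what happened last quarter, and what sold best?
last_quarter = {product: units[-1] for product, units in sales.items()}
best_seller = max(last_quarter, key=last_quarter.get)

# Predictive: naive forecast = last value + average quarter-to-quarter change.
def forecast_next(units):
    changes = [b - a for a, b in zip(units, units[1:])]
    return units[-1] + mean(changes)

forecasts = {product: forecast_next(units) for product, units in sales.items()}

# Prescriptive: a simple assumed rule that turns forecasts into actions.
actions = {
    product: "increase stock" if forecasts[product] > last_quarter[product] else "reduce stock"
    for product in sales
}

print("best seller last quarter:", best_seller)
for product in sales:
    print(product, "forecast:", forecasts[product], "action:", actions[product])
```

Note that the phone is the best seller in the descriptive view, yet its downward trend leads the prescriptive rule to suggest reducing stock.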
14. Data science applications.
 Data exploration and visualization
 Financial analysis and fraud detection
 Healthcare Analysis
 Driverless cars (Google)
 Election polling and predictions
 Customer behavior analysis
 Question answering (Siri, IBM Watson, …)
15. Why Python ?
 Easy to learn
 Free, open-source tool
 Employment opportunities
 No. 1 language in data science and machine learning
 Rich libraries and a large community
 Many libraries designed particularly for scientific computing
16. The Anaconda distribution combines thousands of open-source
data science libraries and packages in a single framework, e.g. for:
 data analysis (e.g. Pandas)
 data visualization (e.g. Matplotlib)
 statistical analysis (e.g. statsmodels)
 machine learning (e.g. Scikit-learn)
17. It runs on all major OS platforms (Windows, macOS, and Linux)
and includes the following:
 Standard Python
 Jupyter notebook (interactive coding via a browser)
 Spyder, a code editor
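
One quick way to see which of the libraries named above are available in the current Python environment is sketched below. The helper name `check_installed` is hypothetical; the check relies on the standard `importlib.util.find_spec` function and does not import the packages themselves.

```python
import importlib.util

# Import names of the libraries mentioned above.
LIBRARIES = {
    "data analysis": "pandas",
    "data visualization": "matplotlib",
    "statistical analysis": "statsmodels",
    "machine learning": "sklearn",  # Scikit-learn is imported as "sklearn"
}

def check_installed(libraries):
    """Return {import_name: True/False} without importing the packages."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in libraries.values()
    }

status = check_installed(LIBRARIES)
for purpose, name in LIBRARIES.items():
    state = "installed" if status[name] else "missing"
    print(f"{purpose}: {name} is {state}")
```

In a full Anaconda installation all four would normally be reported as installed; in a plain Python installation some may be missing.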
End of chapter one: data science
