Chapter One Data Science
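1. Data is the new oil. It’s valuable, but if unrefined, it cannot really be
used. Oil has to be changed into gas, plastic, chemicals, and so on to
create value; in the same way, data must be processed and analyzed.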
2. Types of Data
Unstructured Text Data (Web)
Semi-structured Data (XML)
Streaming Data (images and videos)
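The difference between these data types shows up as soon as you try to read them in code. The following is a minimal sketch, assuming a made-up XML snippet and a made-up sentence (neither comes from these notes): semi-structured data carries tags that a parser can follow, while unstructured text has no fields until we impose some.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Semi-structured data: tags name the fields, so a parser can extract them.
# The XML content is a hypothetical example.
xml_text = """
<catalog>
  <book id="1"><title>Intro to Data</title><price>12.5</price></book>
  <book id="2"><title>Learning Python</title><price>20.0</price></book>
</catalog>
"""
root = ET.fromstring(xml_text)
books = [
    {
        "id": book.get("id"),
        "title": book.findtext("title"),
        "price": float(book.findtext("price")),
    }
    for book in root.findall("book")
]

# Unstructured text: there are no fields, so we create structure ourselves,
# here by splitting the text into lowercase words and counting them.
web_text = "Data is valuable. Raw data must be refined before data is useful."
words = [w.strip(".,").lower() for w in web_text.split()]
word_counts = Counter(words)

print(books)
print(word_counts.most_common(3))
```

The XML records come out as ready-made rows, while the plain text only yields something countable after we choose how to split it.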
3. What is data science?
A field of study that combines domain expertise with other disciplines
to extract insights from data.
Data scientists require a background in several fundamental
related disciplines.
1 | Page writer:
Abaas Muuse
Managing and cleaning data: remove errors and duplicates; also
known as data cleansing or data preparation
Exploratory data analysis (EDA): look at the data to understand it,
for example with a histogram of prices; also known as exploration
analysis or descriptive analysis
Final analysis
Reporting
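The cleaning and exploratory steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the records, the rule that a negative price is an error, and the bin width of 10 are all hypothetical choices, not part of these notes.

```python
from collections import Counter
from statistics import mean

BIN_WIDTH = 10  # hypothetical bin size for the price histogram

# Hypothetical raw records: (item, price). None marks a missing price.
raw = [
    ("pen", 2.0), ("pen", 2.0),       # exact duplicate
    ("bag", 25.0), ("book", None),    # missing value
    ("lamp", 18.0), ("cup", 4.5),
    ("desk", -1.0),                   # a negative price is treated as an error
]

def clean(records):
    """Drop duplicates, missing prices, and negative prices."""
    seen = set()
    cleaned = []
    for item, price in records:
        if price is None or price < 0:
            continue
        if (item, price) in seen:
            continue
        seen.add((item, price))
        cleaned.append((item, price))
    return cleaned

def histogram(prices, bin_width=BIN_WIDTH):
    """Count how many prices fall into each bin, keyed by the bin's lower edge."""
    bins = Counter(int(p // bin_width) * bin_width for p in prices)
    return dict(sorted(bins.items()))

records = clean(raw)
prices = [price for _, price in records]
hist = histogram(prices)

print("cleaned records:", records)
print("mean price:", mean(prices))
for start, count in hist.items():
    print(f"{start:>3} to {start + BIN_WIDTH:<3} {'#' * count}")
```

In practice a library such as pandas would do the same work with less code; the plain-Python version just makes each step visible.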
7. Major components of artificial intelligence (AI)
Knowledge Representation
Natural language processing (NLP)
Reasoning
Machine learning
8. What is machine learning?
The study of computer algorithms that improve automatically through
experience.
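"Improving through experience" can be made concrete with a very small learner. The sketch below assumes hypothetical data that follows y = 3x and uses gradient descent (one common technique, chosen here for illustration) to adjust a single weight. The error is recorded after each pass over the data, so the improvement is visible.

```python
# Hypothetical training data following y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

def mean_squared_error(w, examples):
    """Average squared gap between the prediction w * x and the true y."""
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

def train(examples, passes=20, learning_rate=0.01):
    """Start with w = 0 and nudge it after every example to reduce the error."""
    w = 0.0
    errors = [mean_squared_error(w, examples)]
    for _ in range(passes):
        for x, y in examples:
            gradient = 2 * (w * x - y) * x   # derivative of (w*x - y)^2 w.r.t. w
            w -= learning_rate * gradient
        errors.append(mean_squared_error(w, examples))
    return w, errors

w, errors = train(data)
print("learned weight:", round(w, 4))
print("error before training:", errors[0])
print("error after training:", errors[-1])
```

Nothing in the code states that the answer is 3; the weight moves toward it only because each example reduces the error a little.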
9. Difference between AI, machine learning, and deep learning
Artificial intelligence (AI): machines that think and act like humans.
Machine learning: enables computers to perform tasks without explicit
programming.
Deep learning: a subset of machine learning based on artificial neural
networks.
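To illustrate "without explicit programming", here is a minimal sketch of a single artificial neuron (a perceptron) that learns the logical AND function from four examples. The choice of AND, the starting weights of zero, and the number of epochs are assumptions made for this example. Deep learning networks stack many such units in layers.

```python
# Training examples for logical AND: (inputs, expected output).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs):
    """Fire (return 1) if the weighted sum of the inputs plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20):
    """Perceptron rule: shift weights toward the target whenever a guess is wrong."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias

weights, bias = train_perceptron(examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in examples]
print("weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

The rule "output 1 only when both inputs are 1" is never written in the code; the neuron arrives at it from the examples alone.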
10. Big data is commonly defined in terms of volume, velocity, and variety.
These characteristics are often referred to as the “3 Vs of big data”,
first defined by Gartner in 2001.
Volume: the enormous amount of data that is available for
collection and produced from a variety of sources and devices
on a continuous basis.
Velocity: the speed at which data is generated.
Variety: data is heterogeneous, comes from many different sources,
and can be structured, unstructured, or semi-structured.
11. Data analytics is the process of transforming raw data into
meaningful insights for better decision making, mostly using statistical
processing and machine learning.
Descriptive analytics: analyzes past data to understand the
current state and identify trends. For example, retail stores
might use it to analyze last quarter's sales or identify
best-selling products.
Diagnostic analytics: explores data to understand why
certain events occurred, identifying patterns and anomalies.
Predictive analytics: uses statistical models to forecast
future outcomes based on past data; used widely in finance,
healthcare, and marketing.
Prescriptive analytics: recommends actions based on results from
other types of analytics to mitigate future problems or leverage
promising trends. For example, a navigation app advising a route
based on current traffic conditions.
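The four types of analytics can be compared side by side on one small dataset. The sketch below is illustrative only: the quarterly sales figures, the straight-line forecast, and the restocking rule are hypothetical. The diagnostic step here only locates the largest drop; explaining why it happened would need more data than this example has.

```python
# Hypothetical quarterly unit sales for one store.
sales = {"Q1": 100, "Q2": 120, "Q3": 90, "Q4": 130}

def descriptive(sales):
    """What happened: total units sold and the best quarter."""
    return sum(sales.values()), max(sales, key=sales.get)

def diagnostic(sales):
    """Where it changed: the quarter with the largest drop from the previous one."""
    quarters = list(sales)
    changes = {q: sales[q] - sales[prev] for prev, q in zip(quarters, quarters[1:])}
    return min(changes, key=changes.get), changes

def predictive(sales):
    """What is likely next: least-squares straight-line forecast for the next quarter."""
    ys = list(sales.values())
    xs = list(range(1, len(ys) + 1))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope * (len(ys) + 1) + intercept

def prescriptive(forecast, stock_on_hand):
    """What to do: a simple hypothetical rule comparing forecast to stock."""
    return "restock" if forecast > stock_on_hand else "hold"

total, best_quarter = descriptive(sales)
worst_quarter, changes = diagnostic(sales)
forecast = predictive(sales)
action = prescriptive(forecast, stock_on_hand=110)

print("descriptive:", total, best_quarter)
print("diagnostic:", worst_quarter, changes)
print("predictive:", forecast)
print("prescriptive:", action)
```

Each function builds on the one before it: the forecast uses past data, and the recommended action uses the forecast.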
14. Data science applications.
Data exploration and visualization
Financial analysis and fraud detection
Healthcare Analysis
Driverless cars (Google)
Election polling and predictions
Customer behavior analysis
Question answering (Siri, IBM Watson, …)
15. Why Python?
Easy to learn
Free, open-source tool
Employment opportunities
Widely regarded as the No. 1 language in data science and machine learning
Rich libraries and a large community
Particularly well suited to scientific computing
16. The Anaconda distribution combines thousands of open-source
data science libraries and packages in a single framework, for example:
data analysis (e.g., pandas)
data visualization (e.g., Matplotlib)
statistical analysis (e.g., statsmodels)
machine learning (e.g., scikit-learn)
17. Anaconda runs on all major OS platforms, e.g. Windows, macOS,
and Linux, and includes the following:
Standard Python
Jupyter notebook (interactive coding via a browser)
Spyder, a code editor
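As a first taste of these tools, the short sketch below can be pasted into a Jupyter notebook cell or run in Spyder. It assumes pandas is installed (it is one of the libraries listed under item 16), and the table contents and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative.
df = pd.DataFrame({
    "product": ["pen", "pen", "bag", "bag"],
    "price": [1.0, 3.0, 20.0, 30.0],
})

# Summary statistics for a numeric column (count, mean, min, max, and so on).
summary = df["price"].describe()

# Average price per product.
mean_price = df.groupby("product")["price"].mean()

print(summary)
print(mean_price)
```

In a notebook, leaving `summary` or `mean_price` as the last line of a cell displays it as a formatted table without needing `print`.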
End of Chapter One: Data Science