Data Science

id

Uploaded by

tashatcteess
Copyright
© All Rights Reserved

Event-Driven Programming

Python for
Data Science
Prepared by:
Paulo Jay Christian P. De Guzman
Data
Statistics
Data Science
Machine Learning
Statistics
What is statistics?
The field of statistics: the practice and study of
collecting and analyzing data.
A summary statistic: a fact about or summary of some
data.
What can statistics do?
• How likely is someone to purchase a product? Are
people more likely to purchase it if they can use a
different payment system?
• How many occupants will your hotel have? How can
you optimize occupancy?
What can’t statistics do?
Why is the TV series Game of Thrones so popular?

Instead...
Are series with more violent scenes viewed by more
people?

But...
Even so, this can't tell us whether more violent scenes
cause more views.
Types of statistics
Descriptive statistics
Describe and summarize data
• 50% of them drive to work
• 25% ride the bus
• 25% bike

Inferential statistics
Use a sample of data to make inferences about a larger
population
• What percent of people drive to work?
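
A minimal sketch of the difference, assuming a hypothetical survey of 20 commuters (the data is made up). The descriptive step summarizes the sample itself; the inferential step uses the sample to estimate the share of drivers in the larger population, here with an approximate 95% confidence interval based on the normal approximation for a proportion.

```python
import math
from collections import Counter

# Hypothetical sample: how 20 surveyed people commute (made-up data).
sample = ["drive"] * 10 + ["bus"] * 5 + ["bike"] * 5

# Descriptive statistics: summarize the sample itself.
counts = Counter(sample)
n = len(sample)
shares = {mode: count / n for mode, count in counts.items()}

# Inferential statistics: estimate the share of drivers in the larger
# population, with an approximate 95% confidence interval
# (normal approximation for a proportion).
p_drive = shares["drive"]
std_error = math.sqrt(p_drive * (1 - p_drive) / n)
ci_low = p_drive - 1.96 * std_error
ci_high = p_drive + 1.96 * std_error

print(shares)
print(f"Estimated population share of drivers: {p_drive:.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With a sample this small the interval is wide, which is the point: a sample supports an estimate of the population, not an exact answer.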
Types of Data
Numeric (Quantitative)
• Continuous (Measured): airplane
speed, time spent waiting in line
• Discrete (Counted): number of
pets
Categorical (Qualitative)
• Nominal (Unordered):
male/female, married/unmarried
• Ordinal (Ordered): degree of
agree/disagree
Why does data type matter?
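One reason, sketched in Python with made-up values: the data type decides which summaries make sense. A mean is meaningful for numeric data, while categorical data is usually summarized with counts. The variable names below are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; the values are made up for illustration.
wait_minutes = [4.5, 12.0, 7.25, 3.0, 9.25]                      # numeric, continuous
marital_status = ["married", "unmarried", "married", "married"]  # categorical, nominal

# Numeric data: arithmetic summaries such as the mean are meaningful.
average_wait = mean(wait_minutes)

# Categorical data: count how often each category occurs instead.
status_counts = Counter(marital_status)

print(average_wait)
print(status_counts.most_common())
```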
Role of Statistics in Data Science
Statistics provides the theoretical foundation for
Data Science, equipping practitioners with methods
to understand, analyze, and interpret data.
DATA SCIENCE
A cross-disciplinary field combining domain expertise, statistics, and
computational tools to extract meaningful insights from data.
Data science
is a popular and lucrative career that
involves analyzing and
managing data, using machine learning
and programming skills, and
understanding business needs.
Data Science extends traditional
statistics by integrating computational
techniques and handling large, complex
datasets.
It requires a variety of skills, including
data analysis, business acumen,
communication skills, and more.
Types of Data Science
• Descriptive Analytics (Business Intelligence): Get useful data in front of the right
people in the form of dashboards, reports, and emails
• Which customers have churned?
• Which homes have sold in a given location, and do homes of a certain size sell
more quickly?
• Predictive Analytics (Machine Learning): Put data science models continuously into
production
• Which customers may churn?
• How much will a home sell for, given its location and number of rooms?
• Prescriptive Analytics (Decision Science): Use data to help a company make
decisions
• What should we do about the particular types of customers that are prone to churn?
• How should we market a home to sell quickly, given its location and number of
rooms?
The Standard Data Science Workflow
• Data Collection: Compile data from different sources
and store it for efficient access
• Exploration and Visualization: Explore and visualize
data through dashboards
• Experimentation and Prediction: The buzziest topic in
data science—machine learning!
Exploratory Data Analysis
Descriptive Statistics
Calculate metrics for measures of location, like mean and median; measures of
variation, like range and standard deviation; and other characteristics of
features. Calculate metrics like correlation to understand the relationships
between features.
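
A minimal sketch of these metrics using only Python's standard library. The two features (`rooms`, `price`) and their values are made up, and `pearson_correlation` is a hypothetical helper written here for illustration.

```python
import math
from statistics import mean, median, stdev

# Hypothetical features for five homes (made-up values).
rooms = [2, 3, 3, 4, 5]
price = [150, 200, 210, 260, 330]  # in thousands

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cross_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross_deviation / (spread_x * spread_y)

# Measures of location
print("mean:", mean(price), "median:", median(price))
# Measures of variation (stdev is the sample standard deviation)
print("range:", max(price) - min(price), "std dev:", round(stdev(price), 2))
# Relationship between two features
print("correlation:", round(pearson_correlation(rooms, price), 3))
```

In practice a library such as pandas would compute the same metrics for every column at once; the hand-written version is only meant to show what each metric measures.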

Data Visualization
Create plots like bar plots, histograms and box plots to visualize single features. Create
plots like scatter plots, line plots and heat maps to visualize relationships between
features.
Descriptive analytics is one of the most
commonly used methods in data science.
It helps you explore and understand your
data and make decisions as a result.
Descriptive statistics techniques are
among the most widely used tools in
descriptive analytics.
Data Visualization
Data visualization is one of the most
widely-used data skills—and is often called
the "gateway drug" into data science.
How to Capture a Trend
• Line chart: The most straightforward way to capture how a numeric
variable is changing over time
• Multi-line chart: Captures multiple numeric variables over time. It can
include multiple axes allowing comparison of different units and scale
ranges
• Area chart: Shows how a numeric value progresses by shading the area
between line and the x-axis
• Stacked area chart: Most commonly used variation of area charts, the best
use is to track the breakdown of a numeric value by subgroups
• Spline chart: Smoothed version of a line chart. Data points are connected
with smooth curves rather than straight lines, which can help account for
missing values
How to Visualize Relationships
• Bar chart: One of the easiest charts to read which helps in quick comparison of
categorical data. One axis contains categories and the other axis represents values.
• Column chart: Also known as a vertical bar chart, where the categories are placed on
the x-axis. These are preferred over bar charts for short labels, date ranges, or
negatives in values.
• Scatter plot: Most commonly used chart when observing the relationship between
two variables. It is especially useful for quickly surfacing potential correlations
between data points.
• Connected scatterplot: A hybrid between a scatter plot and a line plot, the scatter
dots are connected with a line
• Bubble chart: Often used to visualize data points with three dimensions, shown on
the x-axis, on the y-axis, and as the size of the bubble. It shows relations
between data points using location and size
• Word cloud chart: A convenient way to visualize the most prevalent
words that appear in a text. It can be used to show the relationship between
different words that appear together or to capture a trend in the most
prevalent words.
Part-to-whole Charts
• Pie chart: One of the most common ways to show part to whole data. It is also
commonly used with percentages
• Donut pie chart: The donut pie chart is a variant of the pie chart, the difference being
it has a hole in the center for readability
• Heat maps: Heatmaps are two-dimensional charts that use color shading to
represent data trends
• Stacked column chart: Best to compare subcategories within categorical data. Can
also be used to compare percentages
• Treemap charts: 2D rectangles whose size is proportional to the value being
measured and can be used to display hierarchically structured data
How to Visualize a Single Value
• Card: Cards are great for showing and tracking
KPIs in dashboards or presentations
• Table chart: Best to be used on small datasets, it
displays tabular data in a table
• Gauge chart: This chart is often used in executive
dashboard reports to show relevant KPIs
How to Capture Distributions
• Histograms: Shows the distribution of a variable. It converts
numerical data into bins as columns. The x-axis shows the
range, and the y-axis represents the frequency
• Box plot: Shows the distribution of a variable using 5 key
summary statistics—minimum, first quartile, median, third
quartile, and maximum
• Violin plot: A variation of the box plot. It also shows the full
distribution of the data alongside summary statistics
• Density plot: Visualizes a distribution by using smoothing to
allow smoother distributions and better capture the
distribution shape of the data
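
A small sketch of the numbers behind the first two charts, assuming made-up integer waiting times. It counts values per bin as a histogram would, then computes the five summary statistics a box plot is drawn from. The quartiles come from `statistics.quantiles`, whose default method may differ slightly from what a plotting library uses.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical waiting times in whole minutes (made-up values).
waits = [1, 2, 2, 3, 4, 4, 5, 7, 8, 12]

# Histogram: group values into fixed-width bins and count each bin.
bin_width = 5
bin_counts = Counter((value // bin_width) * bin_width for value in waits)
for start in sorted(bin_counts):
    print(f"{start:>2} to {start + bin_width - 1:>2}: {'#' * bin_counts[start]}")

# Box plot: the five summary statistics it is drawn from.
q1, q2, q3 = quantiles(waits, n=4)
five_number_summary = (min(waits), q1, q2, q3, max(waits))
print(five_number_summary)
```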
How to Visualize a Flow
• Sankey chart: Useful for representing flows in systems. This
flow can be any measurable quantity
• Chord chart: Useful for presenting weighted relationships or
flows between nodes. Especially useful for highlighting the
dominant or important flows
• Network chart: Similar to a graph, it consists of nodes and
interconnected edges. It illustrates how different items have
relationships with each other
Machine Learning
AI refers to the development of programs that behave
intelligently and mimic human intelligence through a set
of algorithms. The field focuses on three skills: learning,
reasoning, and self-correction to obtain maximum
efficiency.
Machine Learning
Machine Learning, often abbreviated as ML, is a subset
of artificial intelligence (AI) that focuses on the
development of computer algorithms that improve
automatically through experience and by the use of
data. In simpler terms, machine learning enables
computers to learn from data and make decisions or
predictions without being explicitly programmed to do
so.
Importance of Machine Learning
• Data processing. One of the primary reasons machine learning is so
important is its ability to handle and make sense of large volumes of data.
• Driving innovation. Machine learning is driving innovation and efficiency
across various sectors.
• Enabling automation. Machine learning is a key enabler of automation.
By learning from data and improving over time, machine learning
algorithms can perform previously manual tasks, freeing humans to focus
on more complex and creative tasks. This not only increases efficiency but
also opens up new possibilities for innovation.
Machine Learning Workflow
Types of Machine Learning
Supervised Learning
Supervised learning models map inputs to
outputs and apply patterns learned from
past, labeled data to unseen data.
Supervised learning models can be
either regression models, where we try
to predict a continuous variable such as
a stock price, or classification models,
where we try to predict a binary or
multi-class variable, such as whether a
customer will churn.
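
A minimal regression sketch, assuming made-up training data that maps a home's number of rooms to its sale price. The helper `fit_line` is hypothetical and fits a straight line by ordinary least squares; it stands in for whatever model a real project would use.

```python
from statistics import mean

# Hypothetical labeled training data: number of rooms -> sale price
# (in thousands). The values are made up for illustration.
train_rooms = [2, 3, 4, 5]
train_price = [150, 200, 250, 300]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Learn the input-to-output mapping from past data...
slope, intercept = fit_line(train_rooms, train_price)

# ...then apply it to an input the model has not seen.
predicted_price = slope * 6 + intercept
print(predicted_price)
```

A classification model follows the same fit-then-predict pattern, but its output is a category (churn or no churn) instead of a number.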
Unsupervised Learning
Unsupervised learning is about discovering
general patterns in data. The most popular
example is clustering or segmenting
customers and users. This type of
segmentation is generalizable and can be
applied broadly, such as to documents,
companies, and genes. Unsupervised
learning consists of clustering models that
learn how to group similar data points
together or association algorithms that
group different data points based on
predefined rules.
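
A minimal clustering sketch, assuming made-up monthly spend values with no labels. The helper `two_means` is hypothetical: a basic one-dimensional k-means loop with two clusters and a simple deterministic start, written for illustration only.

```python
from statistics import mean

# Hypothetical monthly spend per customer (made-up, unlabeled values).
spend = [12, 15, 14, 90, 95, 11, 88]

def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop.

    Assumes the values contain at least two distinct numbers.
    """
    centers = [min(values), max(values)]  # simple deterministic start
    for _ in range(iterations):
        clusters = [[], []]
        # Assignment step: each value joins its nearest center.
        for value in values:
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            clusters[nearest].append(value)
        # Update step: each center moves to the mean of its cluster.
        centers = [mean(cluster) for cluster in clusters]
    return clusters, centers

clusters, centers = two_means(spend)
print(clusters)
print(centers)
```

No segment labels were supplied; the low-spend and high-spend groups emerge from the data alone, which is what separates this from the supervised case.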
Thank
You!
