Data Sciences
data sciences -
○ interdisciplinary field that involves extracting knowledge & insights from structured & unstructured data
using various scientific methods, processes, algorithms & tools
○ concept to unify statistics, data analysis, machine learning & their related methods to understand &
analyse actual phenomena with data
○ employs techniques & theories drawn from many fields within the context of Mathematics, Statistics,
Computer Science & Information Science
ML relies on DS principles & techniques to analyse & interpret large datasets. Data scientists use ML algorithms to
uncover patterns, relationships & trends within the data. By training these algorithms on historical data, they enable
the computer to learn from examples & make predictions / decisions based on new, unseen data
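Here is a minimal sketch of "train on historical data, then predict for new, unseen data". It assumes the simplest possible model: a straight line fitted by ordinary least squares. The spend/sales numbers and the helper names `fit_line` and `predict` are hypothetical. Real projects would typically use a library such as scikit-learn & much richer models.

```python
# Minimal sketch: learn y = a*x + b from historical examples (ordinary least squares),
# then use the fitted line to predict for an input the model has not seen.
# All data here is hypothetical, chosen only for illustration.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return slope a & intercept b that minimise the squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def predict(model: tuple[float, float], x: float) -> float:
    a, b = model
    return a * x + b

# Historical examples: ad spend (x) vs. units sold (y), hypothetical values
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(history_x, history_y)
print(model)                # (2.0, 1.0)
print(predict(model, 5.0))  # 11.0 for a new, unseen input
```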
Types of Analytics
DS relies on various types of analytics to extract meaningful information from complex datasets
○ descriptive analytics: examining & summarizing historical data to understand what happened in the past
ex. a marketing team analyses customer purchase data to gain insights into customer preferences, identify
popular products & understand sales trends
○ diagnostic analytics: focuses on analysing data to understand why certain events / outcomes occurred
ex. a manufacturing company may investigate data on production line performance to identify the root
cause of quality issues / machine breakdowns, using statistical analysis & root cause analysis techniques
○ predictive analytics: utilises historical data & statistical models to make predictions about future events /
outcomes
ex. an insurance company can analyse customer data, including demographics & historical claim records,
to build a predictive model that estimates the likelihood of future claim filing
○ prescriptive analytics: provides recommendations on what actions to take to achieve desired outcomes
ex. a supply chain management company uses optimization algorithms to determine the most efficient
routes for delivering goods, considering factors like delivery time, cost & available resources
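To make the descriptive & prescriptive cases concrete, here is a minimal Python sketch on hypothetical data. The products, routes & weights are made up. The descriptive part summarises past sales. The prescriptive part is deliberately simplified to scoring a few candidate routes. Real supply chain tools use proper optimisation methods, such as linear programming or vehicle routing solvers.

```python
from collections import defaultdict

# Hypothetical historical purchase records: (product, quantity sold)
purchases = [("laptop", 2), ("mouse", 10), ("laptop", 3), ("keyboard", 4), ("mouse", 5)]

# Descriptive analytics: summarise what happened (total units sold per product)
totals = defaultdict(int)
for product, qty in purchases:
    totals[product] += qty
most_popular = max(totals, key=totals.get)

# Prescriptive analytics (greatly simplified): recommend the route with the lowest
# weighted cost of delivery time & money. Routes & weights are hypothetical.
routes = {
    "A": {"hours": 5, "cost": 100},
    "B": {"hours": 3, "cost": 150},
    "C": {"hours": 4, "cost": 110},
}

def route_score(route: dict, weight_time: int = 20, weight_cost: int = 1) -> int:
    return weight_time * route["hours"] + weight_cost * route["cost"]

best_route = min(routes, key=lambda name: route_score(routes[name]))

print(dict(totals))   # {'laptop': 5, 'mouse': 15, 'keyboard': 4}
print(most_popular)   # mouse
print(best_route)     # C (score 190 vs. 200 for A & 210 for B)
```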
Internet Search
Google, Yahoo, Bing, Ask & AOL use data science algorithms to deliver the most relevant results for a search query.
Google processes more than 20 petabytes of data every day with the help of DS
Targeted Advertising
The digital marketing spectrum relies on DS algorithms, from the display banners on various websites to the digital
billboards at airports. This is why digital ads achieve a much higher CTR (Click-Through Rate) than traditional
advertisements: they can be targeted based on a user’s past behaviour
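CTR is the share of ad impressions (times an ad is shown) that lead to a click. It is usually quoted as a percentage: CTR = clicks / impressions × 100. The sketch below uses hypothetical campaign numbers. The function name `click_through_rate` is an illustrative assumption.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR as a percentage: clicks divided by impressions, times 100."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    # Multiply first so whole-number inputs divide exactly where possible
    return 100 * clicks / impressions

# Hypothetical campaigns, each shown 1,500 times
campaign_a = click_through_rate(clicks=45, impressions=1_500)
campaign_b = click_through_rate(clicks=12, impressions=1_500)
print(campaign_a)  # 3.0  (%)
print(campaign_b)  # 0.8  (%)
```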
Website Recommendations
Suggestions of similar products on Amazon help users find relevant items among the billions of products on offer &
improve the user experience. Many companies use recommendation engines to promote their products in accordance
with each user’s interests & the relevance of the information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn & IMDB make recommendations based on users’ previous searches to improve the user experience
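One common way to build "similar products" suggestions is item-to-item collaborative filtering. Two items count as similar if largely the same users bought them, measured here with cosine similarity. This is a toy sketch on hypothetical purchase data. It does not show how Amazon, Netflix or the others actually implement their far more elaborate engines. The item names and `similar_items` are illustrative.

```python
import math

# Hypothetical purchase history: each item maps to one 0/1 flag per user
# (1 = that user bought the item). Five users in total.
item_vectors = {
    "camera":      [1, 1, 0, 1, 1],
    "tripod":      [1, 1, 0, 1, 0],
    "memory_card": [1, 0, 0, 0, 1],
    "novel":       [0, 0, 1, 0, 0],
}

def cosine(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two vectors (1 = same direction, 0 = no overlap)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_items(item: str, k: int = 2) -> list[str]:
    """Return the k items whose buyers overlap most with the buyers of `item`."""
    target = item_vectors[item]
    scores = [(other, cosine(target, vec))
              for other, vec in item_vectors.items() if other != item]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]

print(similar_items("camera"))  # ['tripod', 'memory_card']
```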
Data Collection
data collection - involves gathering relevant data from various sources to support analysis & decision-making
Data scientists must adhere to privacy regulations & ethical guidelines to protect sensitive information & ensure
responsible data handling practices
Structure of data
○ structured data:
✽ organized & well-defined
✽ stored in tables with rows & columns
✽ includes information like customer details, sales records & financial data found in databases &
spreadsheets
○ semi-structured data:
✽ doesn’t have strict structure but still has some organization
✽ may have tags or labels, like XML or JSON files
✽ ex. log files / social media feeds
○ unstructured data:
✽ lacks a predefined structure
✽ can be in various formats like text, images, audio or video
✽ includes social media posts, customer reviews, emails & multimedia content
✽ analysis requires specialized techniques like NLP or computer vision
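The three structures call for different handling even in the simplest case. The sketch below uses only Python's standard library on small hypothetical samples. CSV stands in for a database table and JSON for a semi-structured feed. The regular-expression word split is a crude stand-in for real NLP.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields (CSV as a stand-in for a table)
csv_text = "customer_id,name,total\n1,Asha,120.50\n2,Ben,75.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_spend = sum(float(r["total"]) for r in rows)  # csv yields strings, so convert

# Semi-structured: keys/tags give some organisation, but fields vary between records
json_text = '[{"user": "asha", "likes": 3}, {"user": "ben", "tags": ["sale"]}]'
posts = json.loads(json_text)
likes = [p.get("likes", 0) for p in posts]  # default for records missing the field

# Unstructured: free text with no schema; even counting words needs parsing first
review = "Great product, fast delivery. Great price!"
words = re.findall(r"[a-z]+", review.lower())

print(rows[0]["name"], total_spend)  # Asha 195.5
print(likes)                         # [3, 0]
print(words.count("great"))          # 2
```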
Types of data
In DS & ML, the type of data being used influences which algorithms & techniques are used to analyse the data &
make predictions. Different data types require different approaches to get accurate results
1. numerical data: consists of quantitative values that are expressed as numbers
○ continuous data:
✽ represents measurements that can take any value within a specific range
✽ can be represented as real numbers & often involve calculations & statistical analysis
✽ ex. temperature, height, weight & time
○ discrete data:
✽ represents values that are separate & distinct
✽ consists of whole numbers or counts, such as the number of products sold, the number of
people in a group or the number of votes received
✽ often used for counting, enumeration & categorical analysis
2. categorical data: non-numeric, qualitative or categorical variables that can take on a limited number of
distinct categories or labels
○ nominal data:
✽ represents categories that have no specific order or hierarchy
✽ ex. gender, eye colour or product categories
✽ typically used for grouping, classification & creating factors in statistical analysis
○ ordinal data:
✽ represents categories with a specific order or ranking
✽ ex. rating scales, customer satisfaction levels or high school levels
✽ retains the categorical nature but also conveys a relative order (e.g. of magnitude or preference), though
the gaps between levels are not necessarily equal
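Because ML algorithms work on numbers, categorical data is usually encoded first, and the encoding should respect the data type. Nominal values get one-hot encoding so that no false order is implied. Ordinal values get ranks that preserve their order. This is a plain-Python sketch with hypothetical values. Libraries such as scikit-learn offer equivalent encoders.

```python
# Nominal: no natural order, so one-hot encode (one 0/1 column per category)
eye_colours = ["brown", "blue", "green", "blue"]
categories = sorted(set(eye_colours))  # ['blue', 'brown', 'green']
one_hot = [[1 if colour == cat else 0 for cat in categories] for colour in eye_colours]

# Ordinal: order matters, so map each level to a rank that preserves the order.
# Note: ranks 0, 1, 2 encode order only; the "distance" between levels is not measured.
satisfaction_levels = ["low", "medium", "high"]  # hypothetical rating scale
rank = {level: i for i, level in enumerate(satisfaction_levels)}
responses = ["high", "low", "medium", "high"]
encoded = [rank[r] for r in responses]

print(one_hot)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
print(encoded)  # [2, 0, 1, 2]
```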
Data Visualization
data visualisation -
○ represents data or information in a graph, chart or other visual formats
○ communicates information clearly & efficiently to users
○ provides a way to see & understand trends, outliers & patterns in data
standard deviation: measures the spread of values around their mean; it is the square root of the variance
variance: average of the squared differences from the mean
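A worked example of both definitions on a small hypothetical dataset. The definition above, the average of squared differences, is the population variance: σ² = Σ(xᵢ − μ)² / N, and σ = √σ². The sample variance instead divides by n − 1. Python's standard `statistics` module provides `pvariance`/`pstdev` (population) and `variance`/`stdev` (sample), which are used here as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = sum(data) / len(data)                                # μ = 40 / 8 = 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # σ² = 32 / 8 = 4.0
std_dev = math.sqrt(variance)                               # σ = √4 = 2.0

# Cross-check against the standard library's population versions
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```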