
Data Sciences

The document provides an overview of Artificial Intelligence, specifically focusing on Data Sciences, which is an interdisciplinary field that extracts knowledge from structured and unstructured data using various scientific methods. It discusses the relationship between Data Science and Machine Learning, types of analytics, applications in various domains such as finance and genetics, and the importance of data collection and visualization. Additionally, it outlines the structure and types of data, as well as basic statistical concepts relevant to data analysis.

Uploaded by Nahid Noufal
Copyright © All Rights Reserved

Artificial Intelligence

Data Sciences

AI can be classified into 3 domains:


○ data sciences - working around numeric & alpha-numeric data
○ computer vision - working around image & visual data
○ natural language processing - working around textual & speech-based data

data sciences -
○ interdisciplinary field that involves extracting knowledge & insights from structured & unstructured data
using various scientific methods, processes, algorithms & tools
○ a concept that unifies statistics, data analysis, machine learning & their related methods to understand &
analyse actual phenomena with data
○ employs techniques & theories drawn from many fields within the context of Mathematics, Statistics,
Computer Science & Information Science

DS draws inspiration from several domains, including:


○ mathematics & science: DS relies on mathematical concepts &
statistical techniques to analyse & interpret data, identify patterns &
make predictions
○ computer science & programming: data scientists use programming
languages like Python / R to manipulate & process data efficiently. They
use algorithms & computational techniques to extract insights from
large datasets
○ data visualisation: involves creating graphical representations, charts,
graphs to communicate data. It presents complex & large datasets in a
visually appealing format, allowing analysts & stakeholders to gain
insights, identify patterns & make informed decisions based on
visualised data
○ domain expertise: DS applies methods to various domains like finance,
healthcare, marketing & social sciences

Machine Learning & Data Science


machine learning - branch of AI that focuses on developing algorithms & models that allow computers to learn
from data, make predictions & perform tasks without being explicitly programmed

ML relies on DS principles & techniques to analyse & interpret large datasets. Data scientists use ML algorithms to
uncover patterns, relationships & trends within the data. By training these algorithms on historical data, they enable
the computer to learn from examples & make predictions / decisions based on new, unseen data
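A minimal sketch of "train on historical data, then predict on new, unseen data", assuming the simplest possible model: a straight line fitted by ordinary least squares using only Python's standard library. The data and the names `fit_line` and `predict` are hypothetical; real projects would typically use an ML library and much richer data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical historical examples: advertising spend (x) vs. units sold (y)
history_x = [1, 2, 3, 4, 5]
history_y = [3, 5, 7, 9, 11]

model = fit_line(history_x, history_y)
print(model)               # (2.0, 1.0)
print(predict(model, 6))   # 13.0, a prediction for an input not seen in training
```

Fitting is the "learning from examples" step; `predict` then reuses what was learned without any hand-written rule for the new input.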

Types of Analytics
DS relies on various types of analytics to extract meaningful information from complex datasets
○ descriptive analytics: examining & summarizing historical data to understand what happened in the past
ex. a marketing team analyses customer purchase data to gain insights into customer preferences, identify
popular products & understand sales trends

○ diagnostic analytics: focuses on analysing data to understand why certain events / outcomes occurred
ex. a manufacturing company may investigate data on production line performance to identify the root
cause of quality issues / machine breakdowns, using statistical analysis & root cause analysis techniques

○ predictive analytics: utilises historical data & statistical models to make predictions about future events /
outcomes
ex. an insurance company can analyse customer data, including demographics & historical claim records,
to build a predictive model that estimates the likelihood of future claim filing

○ prescriptive analytics: provides recommendations on what actions to take to achieve desired outcomes
ex. a supply chain management company uses optimization algorithms to determine the most efficient
routes for delivering goods, considering factors like delivery time, cost & available resources
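To make two of these analytics types concrete, here is a small sketch on hypothetical data: a descriptive summary of past purchases (revenue per product) and a toy prescriptive step that recommends the cheapest delivery route meeting a time limit. The brute-force search is a simplified stand-in for real optimization algorithms; all names and numbers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical purchase records: (product, units, unit_price)
purchases = [
    ("laptop", 2, 800.0),
    ("mouse", 10, 20.0),
    ("laptop", 1, 800.0),
    ("keyboard", 5, 45.0),
    ("mouse", 4, 20.0),
]

# Descriptive analytics: summarise what happened (total revenue per product)
revenue = defaultdict(float)
for product, units, price in purchases:
    revenue[product] += units * price

best_seller = max(revenue, key=revenue.get)
print(dict(revenue))   # {'laptop': 2400.0, 'mouse': 280.0, 'keyboard': 225.0}
print(best_seller)     # laptop

# Prescriptive analytics: recommend an action that meets a constraint at lowest cost
# Hypothetical delivery routes: (name, hours, cost)
routes = [("A", 5, 120.0), ("B", 3, 150.0), ("C", 4, 110.0)]
max_hours = 4
feasible = [route for route in routes if route[1] <= max_hours]
recommended = min(feasible, key=lambda route: route[2])
print(recommended)     # ('C', 4, 110.0)
```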

Applications of Data Science


Data science mainly revolves around analysing data, & this analysis helps make machines intelligent enough to
perform tasks by themselves

Fraud & Risk Detection


The earliest applications of data science were in Finance. Companies, fed up with debts & losses, used the data
they collected during the initial paperwork while sanctioning loans. Banking companies learnt to divide & conquer
data via customer profiling, past expenditures & other essential variables to analyse the probabilities of risk &
default. It helped them push their banking products based on customer’s purchasing power
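As a rough illustration of profiling customers to estimate the probability of default, the sketch below groups hypothetical past loans by an assumed income band and computes the observed default rate per band. The field names, the bands and the 0.3 flagging threshold are assumptions for illustration, not a real credit-scoring method.

```python
from collections import defaultdict

# Hypothetical past loans: (income_band, defaulted)
loans = [
    ("low", True), ("low", False), ("low", True), ("low", False),
    ("medium", False), ("medium", True), ("medium", False), ("medium", False),
    ("high", False), ("high", False), ("high", False), ("high", False),
]

def default_rates(records):
    """Observed default rate per customer profile (here: income band)."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for band, defaulted in records:
        totals[band] += 1
        defaults[band] += int(defaulted)
    return {band: defaults[band] / totals[band] for band in totals}

RISK_THRESHOLD = 0.3  # assumed cut-off for flagging a profile as high risk

rates = default_rates(loans)
risky_profiles = [band for band, rate in rates.items() if rate > RISK_THRESHOLD]
print(rates)           # {'low': 0.5, 'medium': 0.25, 'high': 0.0}
print(risky_profiles)  # ['low']
```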

Genetics & Genomics


Data science enables an advanced level of treatment personalization through research in genetics & genomics to
understand the impact of DNA on our health & find individual biological connections between genetics, diseases &
drug response. DS techniques allow integration of different kinds of data with genomic data in disease research,
which provides a deeper understanding of genetic issues in reactions to particular drugs & diseases. Reliable
personal genome data helps achieve a deeper understanding of the human DNA. The advanced genetic risk
prediction will be a major step towards more individual care

Internet Search
Google, Yahoo, Bing, Ask & AOL use data science algorithms to deliver the best results for a search query.
Google processes more than 20 petabytes of data every day with the help of DS
Targeted Advertising
The digital marketing spectrum relies on DS algorithms, from the display banners on various websites to
the digital billboards at airports, which is why digital ads have been able to achieve a much higher CTR
(Click-Through Rate) than traditional advertisements. They can be targeted based on a user’s past behaviour
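For reference, CTR is usually computed as the share of ad impressions that lead to a click; the numbers in the worked example are hypothetical.

```latex
\text{CTR} = \frac{\text{number of clicks}}{\text{number of impressions}} \times 100\%
\qquad \text{e.g.} \qquad
\frac{50}{2000} \times 100\% = 2.5\%
```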

Website Recommendations
Suggestions of similar products on Amazon help users find relevant items among the billions of products available
& improve the user experience. A lot of companies have used this engine to promote their products in
accordance with the user’s interest & relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDB use recommendations made based on previous search results to improve the user experience
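One simple way such "similar products" suggestions can be produced is item-to-item similarity: two products are similar if largely the same customers buy them. The sketch below measures this with cosine similarity over hypothetical purchase baskets; it is an assumed, simplified technique, not the actual engine used by Amazon or the other companies named above.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of products bought
baskets = {
    "u1": {"phone", "case", "charger"},
    "u2": {"phone", "case"},
    "u3": {"laptop", "mouse"},
    "u4": {"phone", "charger"},
    "u5": {"laptop", "mouse", "case"},
}

def buyers_by_item(baskets):
    """Invert the data: product -> set of customers who bought it."""
    buyers = {}
    for customer, items in baskets.items():
        for item in items:
            buyers.setdefault(item, set()).add(customer)
    return buyers

def cosine(a, b):
    """Cosine similarity of two 0/1 buyer vectors, expressed with sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def similar_items(target, baskets, k=2):
    """Return up to k products most often bought by the same customers as target."""
    buyers = buyers_by_item(baskets)
    scores = [
        (other, cosine(buyers[target], buyers[other]))
        for other in buyers
        if other != target
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, ties by name
    return [item for item, score in scores[:k] if score > 0]

print(similar_items("phone", baskets))   # ['charger', 'case']
print(similar_items("laptop", baskets))  # ['mouse', 'case']
```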

Airline Route Planning


The airline industry across the world is known to bear heavy losses. Except for a few airline service providers,
companies are struggling to maintain their occupancy ratio & operating profits. With the steep rise in aviation fuel
prices & the need to offer heavy discounts to customers, the situation has worsened. Using DS, airline companies can:
○ predict flight delays
○ decide which class of airplanes to buy
○ decide whether to fly directly to the destination or make a stopover in between
○ effectively drive customer loyalty programs

Data Collection
data collection - involves gathering relevant data from various sources to support analysis & decision-making

Steps involved in data collection


1. identify data requirements:
○ clearly defining data requirements based on the objectives of the analysis or problem at hand
○ determining the types of data needed (numerical, categorical or textual data) & any specific
variables or features of interest
2. identify data sources:
○ databases: data stored in structured databases (ex. SQL databases) can be accessed using
appropriate query languages
○ Application Programming Interfaces: online platforms & services offer APIs that allow data retrieval
through specific endpoints
○ web scraping: data can be extracted from websites using web scraping techniques to scrape HTML
or get data from APIs
○ sensor data: sensors & IoT devices generate real-time data
○ surveys & questionnaires: collect specific information from individuals or groups
○ social media & online platforms: data from social media platforms, online forums or
user-generated content
○ publicly available datasets: institutions & organisations that provide publicly accessible datasets
for analysis

While accessing data from any of the data sources:


○ only public data should be used
○ personal datasets should only be used with the consent of the owner
○ private data obtained by breaching privacy shouldn’t be used
○ data should only be taken from reliable sources
○ reliable sources of data ensure the authenticity of data
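Two of the sources listed in step 2 can be sketched with Python's standard library alone: querying a structured database with SQL (an in-memory SQLite table stands in for a real database) and extracting values from HTML, as web scraping does. The table, the HTML snippet and the `PriceParser` class are hypothetical; a real scraper would first download the page and should follow the guidelines above.

```python
import sqlite3
from html.parser import HTMLParser

# Databases: query structured data with SQL (in-memory SQLite as a stand-in)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
print(rows)  # [('north', 150.0), ('south', 80.0)]

# Web scraping: pull specific values out of HTML markup
class PriceParser(HTMLParser):
    """Collect the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

page = ('<ul><li>Pen <span class="price">1.50</span></li>'
        '<li>Book <span class="price">12.00</span></li></ul>')
parser = PriceParser()
parser.feed(page)
print(parser.prices)  # ['1.50', '12.00']
```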
3. gather data: once the data sources are identified, data scientists collect the required data by extracting,
downloading or accessing it from the selected sources
4. data preprocessing
○ processing gathered data to clean, transform & format it for further analysis
○ handles missing values, removes duplicates, standardises formats & performs necessary data
transformations
5. validate data quality: data quality is assessed by checking for accuracy, completeness, consistency &
relevance, which involves examining the data for any anomalies, errors or inconsistencies & taking
appropriate measures to address them
6. store & organise data: data is stored in a structured manner, in databases or data warehouses. Proper data
organisation, including creating tables, defining relationships & indexing allows for efficient data retrieval &
analysis
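Steps 4 and 5 can be sketched on a handful of hypothetical records: `preprocess` removes duplicates, standardises text formats and fills a missing value with the median of plausible known values, and `validate` reports remaining anomalies instead of silently fixing them. The field names, the 0 to 120 age range and median imputation are assumptions chosen for illustration.

```python
from statistics import median

# Hypothetical raw records gathered from several sources
raw = [
    {"id": 1, "name": " Alice ", "age": "34", "city": "delhi"},
    {"id": 2, "name": "Bob", "age": None, "city": "Mumbai"},
    {"id": 1, "name": " Alice ", "age": "34", "city": "delhi"},   # duplicate
    {"id": 3, "name": "Chen", "age": "29", "city": "PUNE"},
    {"id": 4, "name": "Dana", "age": "-5", "city": "Delhi"},      # implausible age
]

def plausible_age(age):
    return age is not None and 0 <= age <= 120

def preprocess(records):
    """Step 4: remove duplicates, standardise formats, fill missing values."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["id"] in seen:                      # drop duplicate ids
            continue
        seen.add(rec["id"])
        cleaned.append({
            "id": rec["id"],
            "name": rec["name"].strip(),           # trim stray whitespace
            "age": int(rec["age"]) if rec["age"] is not None else None,
            "city": rec["city"].strip().title(),   # consistent capitalisation
        })
    # Impute missing ages with the median of the plausible known ages
    fill = median(r["age"] for r in cleaned if plausible_age(r["age"]))
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned

def validate(records):
    """Step 5: report completeness and consistency problems."""
    issues = []
    for r in records:
        if not r["name"]:
            issues.append((r["id"], "missing name"))
        if not plausible_age(r["age"]):
            issues.append((r["id"], "implausible age"))
    return issues

clean = preprocess(raw)
print([r["city"] for r in clean])  # ['Delhi', 'Mumbai', 'Pune', 'Delhi']
print(validate(clean))             # [(4, 'implausible age')]
```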

Data scientists must adhere to privacy regulations & ethical guidelines to protect sensitive information & ensure
responsible data handling practices

Structure of data
○ structured data:
✽ organized & well-defined
✽ stored in tables with rows & columns
✽ includes information like customer details, sales records & financial data found in databases &
spreadsheets
○ semi-structured data:
✽ doesn’t have a strict structure but still has some organization
✽ may have tags or labels, like XML or JSON files
✽ ex. log files / social media feeds
○ unstructured data:
✽ lacks a predefined structure
✽ can be in various formats like text, images, audio or video
✽ includes social media posts, customer reviews, emails & multimedia content
✽ analysis requires specialized techniques like NLP or computer vision
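The difference between the three structures shows up as soon as the data is parsed. This sketch, using only Python's standard library and hypothetical sample text, reads a CSV table with fixed columns, a JSON feed whose records do not all share the same fields, and a free-text review that needs its own clean-up before even simple word analysis.

```python
import csv
import io
import json

# Structured: every row has the same columns (CSV text standing in for a table)
table_text = "customer,amount\nasha,250\nravi,120\n"
rows = list(csv.DictReader(io.StringIO(table_text)))
print(rows[0])  # {'customer': 'asha', 'amount': '250'}

# Semi-structured: labelled keys, but fields can differ from record to record
feed_text = ('[{"user": "asha", "text": "Loved it!", "tags": ["review"]},'
             ' {"user": "ravi", "text": "Late delivery"}]')
posts = json.loads(feed_text)
tags = [post.get("tags", []) for post in posts]  # missing field -> empty list
print(tags)  # [['review'], []]

# Unstructured: free text has no schema; even counting words needs normalising
review = "Loved it! Delivery was late, but the product is great."
words = [w.strip(".,!").lower() for w in review.split()]
print(words[:3])  # ['loved', 'it', 'delivery']
```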

Types of data
In DS & ML, the type of data being used influences which algorithms & techniques would be used to analyze the
data & make predictions. Different data types require different approaches to get accurate results
1. numerical data: consists of quantitative values that are expressed as numbers
○ continuous data:
✽ represents measurements that can take any value within a specific range
✽ can be represented as real numbers & often involve calculations & statistical analysis
✽ ex. temperature, height, weight & time
○ discrete data:
✽ represents values that are separate & distinct
✽ consists of whole numbers or counts, such as the number of products sold, the number of
people in a group or the number of votes received
✽ often used for counting, enumeration & categorical analysis
2. categorical data: non-numeric, qualitative or categorical variables that can take on a limited number of
distinct categories or labels
○ nominal data:
✽ represents categories that have no specific order or hierarchy
✽ ex. gender, eye colour or product categories
✽ typically used for grouping, classification & creating factors in statistical analysis
○ ordinal data:
✽ represents categories with a specific order or ranking
✽ ex. rating scales, customer satisfaction levels or high school levels
✽ retains the categorical nature but also conveys a relative order of magnitude or preference; the
gaps between levels are not necessarily equal
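Because the data type decides how a value can be fed to an algorithm, a common approach (sketched here on hypothetical survey rows) is to keep numerical columns as numbers, map ordinal categories to ranks that preserve their order, and one-hot encode nominal categories so that no artificial order is implied. The column names and the rank mapping are assumptions.

```python
# Hypothetical survey rows mixing the data types described above
records = [
    {"height_cm": 162.5, "items_bought": 3, "eye_colour": "brown", "satisfaction": "high"},
    {"height_cm": 175.0, "items_bought": 1, "eye_colour": "blue", "satisfaction": "low"},
    {"height_cm": 168.2, "items_bought": 2, "eye_colour": "green", "satisfaction": "medium"},
]

# Ordinal: categories have an order, so map them to ranks that keep it
SATISFACTION_RANK = {"low": 0, "medium": 1, "high": 2}

# Nominal: no order, so use one 0/1 column per category (one-hot encoding)
eye_colours = sorted({r["eye_colour"] for r in records})  # ['blue', 'brown', 'green']

def encode(record):
    features = [
        record["height_cm"],                        # continuous: any value in a range
        record["items_bought"],                     # discrete: whole-number count
        SATISFACTION_RANK[record["satisfaction"]],  # ordinal rank
    ]
    features += [1 if record["eye_colour"] == c else 0 for c in eye_colours]
    return features

for r in records:
    print(encode(r))
# [162.5, 3, 2, 0, 1, 0]
# [175.0, 1, 0, 1, 0, 0]
# [168.2, 2, 1, 0, 0, 1]
```

Encoding eye colour as 0, 1, 2 instead would wrongly suggest that "green" is somehow greater than "blue", which is why nominal and ordinal data are treated differently.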

Data Visualization
data visualisation -
○ represents data or information in a graph, chart or other visual formats
○ communicates information clearly & efficiently to users
○ provides a way to see & understand trends, outliers & patterns in data

common types of data visualizations


○ charts
○ graphs
○ tables
○ maps
○ histograms
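A histogram groups values into ranges (bins) and shows how many values fall into each. The sketch below computes bin counts for hypothetical exam scores and prints a text-based histogram; in practice a plotting library such as matplotlib (`matplotlib.pyplot.hist`) would usually draw the chart. The function names and bin width are illustrative assumptions, and the printed range labels assume integer data.

```python
def histogram(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def print_histogram(counts, bin_width):
    """One text bar per bin; labels like 40-49 assume integer values."""
    for start, n in counts.items():
        print(f"{start:>3}-{start + bin_width - 1:<3} | {'#' * n}")

exam_scores = [45, 52, 58, 61, 64, 67, 69, 72, 75, 78, 81, 88, 93]
bins = histogram(exam_scores, 10)
print_histogram(bins, 10)
```

The tallest bar (60-69 here) immediately shows where most scores lie, which is the kind of pattern visualisation is meant to reveal.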

Basic statistics with Python


mean: the average value of a sequence
○ add all numbers & divide the sum by the number of values

median: 50th percentile value of a sequence


○ if the number of data points is odd, the median is the middle data point in the sorted list
○ if the number of data points is even, the median is the average of the 2 middle data points in the sorted list

mode: most frequent value of the sequence


○ count how often each number appears & the number that appears the most times is the mode

standard deviation: measures the spread of the sequence around its average value; it is the square root of the variance
variance: average of the squared differences from the mean
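Python's standard `statistics` module provides all of these measures, and each can also be computed by hand from the definitions above. Note that `pvariance` and `pstdev` divide by the number of values, matching the definition of variance given here, whereas `variance` and `stdev` compute the sample versions (dividing by n - 1). The data list is a hypothetical example.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Standard-library versions
lib_mean = statistics.mean(data)           # equals 5
lib_median = statistics.median(data)       # 4.5: average of the 2 middle values 4 and 5
lib_mode = statistics.mode(data)           # 4 appears most often (3 times)
lib_variance = statistics.pvariance(data)  # equals 4 (population variance)
lib_std_dev = statistics.pstdev(data)      # 2.0

# The same quantities computed by hand from the definitions
n = len(data)
mean_value = sum(data) / n                       # 40 / 8 = 5.0

ordered = sorted(data)                           # median needs sorted data
mid = n // 2
if n % 2:                                        # odd count: the middle value
    median_value = ordered[mid]
else:                                            # even count: average of the 2 middle values
    median_value = (ordered[mid - 1] + ordered[mid]) / 2

counts = {}
for x in data:
    counts[x] = counts.get(x, 0) + 1
mode_value = max(counts, key=counts.get)         # most frequent value

variance_value = sum((x - mean_value) ** 2 for x in data) / n  # 32 / 8 = 4.0
std_dev_value = variance_value ** 0.5                          # square root of 4.0

print(mean_value, median_value, mode_value, variance_value, std_dev_value)
# 5.0 4.5 4 4.0 2.0
```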
