
MATHEMATICS IN DATA SCIENCE
INTRODUCTION – DATA SCIENCE
• Data science is a multidisciplinary field that combines various techniques, tools, and algorithms to extract meaningful insights and knowledge from structured and unstructured data.
• It involves using statistical analysis, machine learning, data visualization, and other methods to uncover patterns, trends, and correlations that can inform decision-making and drive business outcomes.
• Data scientists employ a combination of skills from mathematics, statistics, programming, and domain knowledge to solve complex problems. They work with large datasets, often referred to as "big data," and leverage advanced computational techniques to process and analyze the information contained within the data.
• Data science has applications in various fields, such as business, healthcare, finance, marketing, social sciences, and many others. It has become increasingly important in today's data-driven world, as organizations strive to extract valuable insights from their data to gain a competitive edge and make informed decisions.
LINEAR ALGEBRA
• Linear algebra is a branch of mathematics that studies the properties of matrices and vector spaces.
• Linear algebra is the "mathematics" of data science, providing the structure and powerful theory needed to work with big data sets.
• Linear algebra is used in data science as follows:
APPLICATIONS OF MATHEMATICS IN DATA SCIENCE
LOSS FUNCTION
• A loss function is an application of the vector norm in linear algebra. The norm of a vector is simply its magnitude, and there are many types of vector norms.
• L1 Norm: Also known as the Manhattan Distance or Taxicab Norm. The L1 norm is the distance you would travel from the origin to the vector if the only permitted directions are parallel to the axes of the space.
• In this 2D space, you could reach the vector (3, 4) by traveling 3 units along the x-axis and then 4 units parallel to the y-axis, or 4 units along the y-axis first and then 3 units parallel to the x-axis. In either case, you travel a total of |3| + |4| = 7 units.
• L2 Norm: Also known as the Euclidean Distance. The L2 norm is the shortest distance of the vector from the origin.
• This distance is calculated using the Pythagorean theorem. For the vector (3, 4), it is the square root of (3^2 + 4^2), which is equal to 5.
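As a quick check of the two norms above, here is a minimal Python sketch using NumPy on the same vector (3, 4):

```python
# Verify the L1 and L2 norms of the vector (3, 4) discussed above.
import numpy as np

v = np.array([3, 4])

l1 = np.linalg.norm(v, ord=1)  # Manhattan distance: |3| + |4| = 7
l2 = np.linalg.norm(v, ord=2)  # Euclidean distance: sqrt(3^2 + 4^2) = 5

print(l1)  # 7.0
print(l2)  # 5.0
```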
REGULARIZATION
• Regularization is a very important concept in data science. It is a technique we use to prevent models from overfitting, and it is another application of the norm.
• A model is said to overfit when it fits the training data too well. Such a model does not perform well on new data because it has learned even the noise in the training data, so it cannot generalize to data it has not seen before.
• Regularization penalizes overly complex models by adding the norm of the weight vector to the cost function. Since we want to minimize the cost function, we also need to minimize this norm. This shrinks unnecessary components of the weight vector toward zero and prevents the prediction function from becoming overly complex.
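To make this concrete, here is a minimal Python sketch of a cost function with an L2 (ridge) penalty; the feature matrix X, target vector y, weight vector w, and strength lam are hypothetical placeholders, not from the slides:

```python
# A mean-squared-error cost with an L2 (ridge) penalty on the weights.
import numpy as np

def ridge_cost(w, X, y, lam=0.1):
    """MSE loss plus lam times the squared L2 norm of the weights."""
    residuals = X @ w - y            # prediction errors on the training data
    mse = np.mean(residuals ** 2)    # the ordinary (unregularized) loss
    penalty = lam * np.sum(w ** 2)   # squared L2 norm of the weight vector
    return mse + penalty
```

Using the L1 norm as the penalty instead gives lasso regularization, which drives some weights exactly to zero.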
COVARIANCE MATRIX
• We often want to study the relationship between pairs of variables. Covariance and correlation are measures used to study the relationship between two continuous variables.
• Covariance indicates the direction of the linear relationship between the variables. A positive covariance indicates that an increase or decrease in one variable is accompanied by the same in the other. A negative covariance indicates that an increase or decrease in one is accompanied by the opposite in the other.
• Correlation is the standardized value of covariance. A correlation value tells us both the strength and direction of the linear relationship and ranges from -1 to 1.
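Here is a minimal Python sketch computing both measures with NumPy; the two variables are made-up illustrative data:

```python
# Covariance and correlation of two made-up variables.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # roughly y = 2x

cov = np.cov(x, y)[0, 1]        # off-diagonal entry of the 2x2 covariance matrix
corr = np.corrcoef(x, y)[0, 1]  # standardized to the range [-1, 1]

print(cov)   # positive: x and y increase together
print(corr)  # close to 1: strong linear relationship
```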
SINGULAR VALUE DECOMPOSITION
• SVD is used in dimensionality reduction; when only the largest singular values are kept, this is known as Truncated SVD.
• We start with a large m x n numerical data matrix A, where m is the number of rows and n is the number of features.
• We decompose it into 3 matrices: A = U Σ V^T, where U is m x m, Σ is an m x n diagonal matrix of singular values, and V^T is n x n.
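A minimal Python sketch of truncated SVD, using NumPy on a random matrix as stand-in data:

```python
# Rank-k approximation of a data matrix via truncated SVD.
import numpy as np

m, n, k = 100, 20, 5      # keep only the k largest singular values
A = np.random.rand(m, n)  # stand-in for a real data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of A
print(A_k.shape)  # (100, 20): same shape as A, but only rank k
```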
NATURAL LANGUAGE PROCESSING (NLP)
WORD EMBEDDINGS
• Machine learning algorithms cannot work with raw textual data. We need to convert the text into some numerical and statistical
features to create model inputs. There are many ways for engineering features from text data, such as:
• Meta attributes of a text, like word count, special character count, etc.
• NLP attributes of text using Parts-of-Speech tags and Grammar Relations like the number of proper nouns
• Word Vector Notations or Word Embeddings
• Word embeddings are a way of representing words as low-dimensional vectors of numbers while preserving their context in the document. These representations are obtained by training neural networks on a large amount of text, called a corpus. They also help in analyzing syntactic similarity among words.
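As an illustration, here is a minimal Python sketch of training word embeddings, assuming the gensim library (not mentioned in the slides); the tiny corpus is purely illustrative:

```python
# Train small word embeddings on a toy corpus with gensim's Word2Vec.
from gensim.models import Word2Vec

corpus = [
    ["data", "science", "uses", "linear", "algebra"],
    ["word", "embeddings", "represent", "words", "as", "vectors"],
    ["vectors", "preserve", "context", "in", "the", "document"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1)

print(model.wv["vectors"].shape)                 # (50,) — one vector per word
print(model.wv.most_similar("vectors", topn=2))  # nearest words in vector space
```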
IMAGE REPRESENTATION AS TENSORS
• How do you account for the ‘vision’ in Computer Vision? Obviously, a computer does not process images as
humans do. Machine learning algorithms need numerical features to work with.
• A digital image is made up of small, indivisible units called pixels.
• A grayscale image of the digit zero, for example, might be made of 8 x 8 = 64 pixels. Each pixel has a value in the range 0 to 255: a value of 0 represents a black pixel and 255 represents a white pixel.
• Conveniently, an m x n grayscale image can be represented as a 2D matrix with m rows and n columns, with the cells containing the respective pixel values.
• But what about a colored image? A colored image is generally stored in the RGB system: each image can be thought of as three 2D matrices, one for each of the R, G, and B channels. A pixel value of 0 in the R channel represents zero intensity of red, and 255 represents full intensity of red.
• Each pixel value is then a combination of the corresponding values in the three channels.
• In reality, instead of using 3 matrices to represent an image, a tensor is used. A tensor is a generalized n-dimensional matrix. For an RGB image, a 3rd-order tensor is used: imagine it as three 2D matrices stacked one behind another.
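A minimal Python sketch of these representations, using made-up random pixel values:

```python
# Grayscale image as a 2D matrix, RGB image as a 3rd-order tensor.
import numpy as np

# An 8 x 8 grayscale image: a 2D matrix of values in [0, 255].
gray = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
print(gray.shape)  # (8, 8)

# An RGB image of the same size: three 8 x 8 channel matrices
# stacked one behind another.
rgb = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
print(rgb.shape)   # (8, 8, 3)

# Each pixel combines the corresponding values from the three channels.
print(rgb[0, 0])   # [R, G, B] values of the top-left pixel
```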
REAL-LIFE APPLICATIONS
STATISTICS
• Statistics is an inherently necessary component of data science.
• Statistics is used to predict the weather, restock retail shelves, estimate the condition of the economy, and much more.
• Data scientists use statistics to gather, review, analyze, and draw conclusions from data, as well as apply quantified mathematical models to appropriate variables.
ROBOTICS
• Traditionally, reprogramming a robot for a new function or preparing it for a new real-time, vision-oriented task was time-consuming.
• Data scientists who rely on AI and machine learning have learned to work with robots that evolve: acquiring new behavior through labeled data, learning to identify errors in existing data, and so on. As a result, the scientist's task becomes easier, and robots can evolve with little human intervention.
CHEMICAL SCIENCES AND ENGINEERING
• Chemical sciences and engineering have also used data science tools to, for example, monitor and control chemical processes, predict activity depending on chemical structures or properties, and inform business and research decisions.
• Data-driven science is an iterative process: (1) identify a database; (2) eliminate redundancies, reduce large uncertainties, and describe or annotate the data; (3) use data science methods to develop and validate a data-driven model that can examine correlations.
GENOMICS
• Genomic data science is a field of study that enables researchers to use powerful computational and statistical methods to decode the functional information hidden in DNA sequences.
• Genomic data science emerged as a field in the 1990s to bring together two laboratory activities:
• Experimentation: generating genomic information by studying the genomes of living organisms.
• Data analysis: using statistical and computational tools to analyze and visualize genomic data, which includes processing and storing data and using algorithms and software to make predictions based on available genomic data.

OTHER FIELDS
• IMAGE PROCESSING
• QUANTUM PHYSICS
• NEURAL NETWORKS
• PRINCIPAL COMPONENT ANALYSIS (PCA)
• SUPPORT VECTOR MACHINE CLASSIFICATION
THANK YOU
