The document outlines various statistical formulas and concepts related to data analytics, including combinatorics, probability calculus, and distributions such as binomial and normal distributions. It also covers distance metrics, classification techniques like decision trees and Bayesian classification, and performance metrics such as precision and recall. Key formulas are provided for each topic, serving as a reference for statistical decision-making and data analysis.

Formulas Statistics and decision making

Formulas Data Analytics


Combinatorics

Principle of inclusion-exclusion
$|A_1 \cup A_2| = |A_1| + |A_2| - |A_1 \cap A_2|$

r-permutation
$P(n, r) = \frac{n!}{(n-r)!}$

r-combination
$C(n, r) = \frac{n!}{r!\,(n-r)!}$

r-permutation with repetition
$n^r$

r-combination with repetition
$\frac{(n+r-1)!}{r!\,(n-1)!}$

Permutation with repetitive elements
$\frac{n!}{n_1!\, n_2! \cdots n_k!}$
Probability Calculus

Formula of Laplace
$P(E) = \frac{|E|}{|\Omega|}$

Probability of an event's complement
$P(\bar{E}) = 1 - P(E)$

Probability of a union of events (or)
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$

Conditional probability
$P(E|F) = \frac{P(E \cap F)}{P(F)}$

Bayes' Rule
$P(F|E) = \frac{P(E|F)\,P(F)}{P(E|F)\,P(F) + P(E|\bar{F})\,P(\bar{F})}$
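A quick numeric sketch of Bayes' rule with the law of total probability in the denominator; the prevalence, sensitivity, and false-positive rate below are hypothetical values chosen for illustration.

```python
# Bayes' rule: P(F|E) = P(E|F) P(F) / (P(E|F) P(F) + P(E|F') P(F'))
# Hypothetical test: 99% sensitivity, 5% false-positive rate, 1% prior.
p_f = 0.01              # P(F): prior probability
p_e_given_f = 0.99      # P(E|F): sensitivity
p_e_given_not_f = 0.05  # P(E|F'): false-positive rate

posterior = (p_e_given_f * p_f) / (
    p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)
)
print(round(posterior, 4))  # 0.1667: a positive result alone is far from conclusive
```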

Bayes spam filter (two words)

$R(w_1, w_2) = \frac{P(w_1)\,P(w_2)}{P(w_1)\,P(w_2) + Q(w_1)\,Q(w_2)}$

with $P(w_i)$ the estimated probability that a spam message contains word $w_i$, and $Q(w_i)$ the estimated probability that a legitimate (ham) message contains it.
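A minimal sketch of the two-word spam score; the word probabilities are made-up values for illustration.

```python
# Two-word Bayes spam score R(w1, w2).
# p1, p2: estimated P(w_i | spam); q1, q2: estimated Q(w_i | ham).
def spam_score(p1, p2, q1, q2):
    return (p1 * p2) / (p1 * p2 + q1 * q2)

# Hypothetical word statistics: both words far more common in spam.
r = spam_score(0.8, 0.6, 0.1, 0.2)  # 0.48 / (0.48 + 0.02)
assert abs(r - 0.96) < 1e-12
```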
Discrete and continuous stochastic variables

Binomial distribution B(n, p)
$P(X = k) = C(n, k)\,p^k q^{n-k}$, with $E(X) = np = \mu$ and $\sigma^2 = npq$

Poisson distribution P(λ)
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, with $E(X) = \mu = \lambda$ and $\sigma^2 = \lambda$

Normal distribution N(μ, σ)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Z-score (standardisation)
$Z = \frac{X - \mu}{\sigma}$

$P(\mu - \sigma \le X \le \mu + \sigma) = P(-1 \le Z \le 1) \approx 68.27\%$
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = P(-2 \le Z \le 2) \approx 95.45\%$
$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = P(-3 \le Z \le 3) \approx 99.73\%$
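The distribution formulas can be verified numerically with the standard library alone; the normal-curve percentages follow from the error function, since the N(0, 1) CDF is $\Phi(z) = \frac{1}{2}(1 + \mathrm{erf}(z/\sqrt{2}))$.

```python
from math import comb, exp, factorial, sqrt, erf

# Binomial B(n, p): P(X = k) = C(n, k) p^k q^(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

assert abs(binom_pmf(2, 4, 0.5) - 0.375) < 1e-12
# Mean check: E(X) = n p
assert abs(sum(k * binom_pmf(k, 4, 0.5) for k in range(5)) - 4 * 0.5) < 1e-12

# Poisson P(lambda): P(X = x) = e^(-lambda) lambda^x / x!
def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# Standard normal CDF via the error function
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# The 68-95-99.7 rule from the Z-score table
assert abs((phi(1) - phi(-1)) - 0.6827) < 1e-4
assert abs((phi(2) - phi(-2)) - 0.9545) < 1e-4
assert abs((phi(3) - phi(-3)) - 0.9973) < 1e-4
```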

Descriptive statistics

Standard deviation
$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

© Brian Baert and Dirk Vandycke


Data and distance

Euclidean distance
$d(X, Y) = \sqrt{\sum_{i}(x_i - y_i)^2}$

Minkowski distance
$d = \left(\sum_{i}|x_i - y_i|^p\right)^{1/p}$

Simple matching, Jaccard and cosine
$SMC = \frac{f_{11} + f_{00}}{f_{01} + f_{10} + f_{11} + f_{00}}$
$J = \frac{f_{11}}{f_{01} + f_{10} + f_{11}}$
$\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}$

Edit distance
EditDistance(string1, string2) = length(string1) + length(string2) − 2 · LCS

with LCS = the length of the longest common subsequence.
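The distance and similarity measures above fit in a few lines each; note that this LCS-based edit distance counts only insertions and deletions (no substitutions), so it differs from Levenshtein distance.

```python
from math import sqrt

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = lambda v: sqrt(sum(a * a for a in v))
    return dot / (norm(x) * norm(y))

def smc_jaccard(x, y):
    # x, y: binary (0/1) vectors; f_ab counts attribute pairs (a, b)
    f11 = sum(a == b == 1 for a, b in zip(x, y))
    f00 = sum(a == b == 0 for a, b in zip(x, y))
    f10 = sum(a == 1 and b == 0 for a, b in zip(x, y))
    f01 = sum(a == 0 and b == 1 for a, b in zip(x, y))
    return (f11 + f00) / (f11 + f00 + f10 + f01), f11 / (f11 + f10 + f01)

def lcs_length(s, t):
    # Classic dynamic-programming longest-common-subsequence table.
    dp = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i, a in enumerate(s, 1):
        for j, b in enumerate(t, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if a == b else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def edit_distance(s, t):
    # Insert/delete-only distance from the LCS length.
    return len(s) + len(t) - 2 * lcs_length(s, t)

assert euclidean((0, 0), (3, 4)) == 5.0
assert abs(minkowski((0, 0), (3, 4), 2) - 5.0) < 1e-12
assert abs(cosine((1, 2), (2, 4)) - 1.0) < 1e-12   # parallel vectors
smc, jac = smc_jaccard((1, 0, 0, 1, 1), (1, 0, 1, 1, 0))
assert abs(smc - 0.6) < 1e-12 and abs(jac - 0.5) < 1e-12
assert edit_distance("kitten", "sitting") == 5     # LCS "ittn" has length 4
```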

Classification – Decision Trees

Entropy, Gini and Information gain
$\mathrm{Entropy}(t) = -\sum_{i} p(i|t)\,\log_2 p(i|t)$
$\mathrm{Gini}(t) = 1 - \sum_{i} [p(i|t)]^2$
$\mathrm{Classification\ error}(t) = 1 - \max_i\,[p(i|t)]$
Information gain: $\Delta = I(\mathrm{parent}) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\,I(v_j)$

with $p(i|t)$ the fraction of records belonging to class $i$ in node $t$; $I(v_j)$ the impurity of child node $v_j$; $N$ the total number of records at the parent node; $k$ the number of attribute values; and $N(v_j)$ the number of records of child node $v_j$.
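The three impurity measures and the information-gain formula can be sketched directly; the 50/50 parent splitting into two pure children is a textbook best case, giving the maximum gain of 1 bit under entropy.

```python
from math import log2

def entropy(probs):
    # 0 * log2(0) is taken as 0 by skipping zero-probability classes.
    return -sum(p * log2(p) for p in probs if p > 0)

def gini(probs):
    return 1 - sum(p * p for p in probs)

def classification_error(probs):
    return 1 - max(probs)

def info_gain(parent_probs, children, impurity=entropy):
    # children: list of (record count, class-probability list) per child node.
    total = sum(count for count, _ in children)
    weighted = sum(count / total * impurity(probs) for count, probs in children)
    return impurity(parent_probs) - weighted

# A 50/50 parent split perfectly into two pure, equal-sized children.
gain = info_gain([0.5, 0.5], [(5, [1.0, 0.0]), (5, [0.0, 1.0])])
assert abs(gain - 1.0) < 1e-12
assert abs(gini([0.5, 0.5]) - 0.5) < 1e-12
assert abs(classification_error([0.7, 0.3]) - 0.3) < 1e-12
```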
Bayesian Classification

Bayes posterior probability
$P(C|A_1 A_2 \ldots A_n) = \frac{P(A_1 A_2 \ldots A_n|C)\,P(C)}{P(A_1 A_2 \ldots A_n)}$

Naive Bayes Classification
Assuming independence between the attributes $A_i$ given the class:
$P(A_1 A_2 \ldots A_n|C) = P(A_1|C)\,P(A_2|C) \cdots P(A_n|C)$
A new data point is classified as $C_j$ when $P(C_j)\prod_{i} P(A_i|C_j)$ is maximal.
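A minimal naive Bayes sketch over two binary attributes; the priors, likelihoods, and attribute names (`has_link`, `all_caps`) are hypothetical values for illustration, not from the sheet.

```python
# Naive Bayes: pick the class C maximizing P(C) * prod_i P(A_i | C).
priors = {"spam": 0.4, "ham": 0.6}
# P(attribute = 1 | class); hypothetical estimates.
likelihoods = {
    "spam": {"has_link": 0.7, "all_caps": 0.5},
    "ham":  {"has_link": 0.2, "all_caps": 0.1},
}

def score(cls, observed):
    p = priors[cls]
    for attr, value in observed.items():
        p_attr = likelihoods[cls][attr]
        p *= p_attr if value else (1 - p_attr)  # complement for value 0
    return p

def classify(observed):
    return max(priors, key=lambda c: score(c, observed))

assert classify({"has_link": 1, "all_caps": 1}) == "spam"
assert classify({"has_link": 0, "all_caps": 0}) == "ham"
```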
Confusion matrix

Recall or TPR
$r = TPR = \frac{TP}{TP + FN}$

FPR
$FPR = \frac{FP}{TN + FP}$

Precision
$p = \frac{TP}{TP + FP}$

F1
$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$
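The four metrics computed from raw confusion-matrix counts; the counts below are an arbitrary example, chosen so precision and recall coincide.

```python
# Confusion-matrix metrics from raw counts.
def metrics(tp, fp, fn, tn):
    recall = tp / (tp + fn)           # TPR
    fpr = fp / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return recall, fpr, precision, f1

recall, fpr, precision, f1 = metrics(tp=8, fp=2, fn=2, tn=88)
assert recall == 0.8
assert precision == 0.8
assert abs(f1 - 0.8) < 1e-12  # with p == r, F1 equals both
assert abs(fpr - 2 / 90) < 1e-12
```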

