Formulas Data Analytics
Poisson distribution
$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
$$E(X) = \mu = \lambda, \qquad \sigma^{2} = \lambda$$
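The formulas above describe the Poisson distribution. The minimal Python sketch below evaluates the probability mass function and checks numerically that the mean and variance both equal λ. The function names and the truncation point `x_max` are illustrative assumptions, not part of the original sheet.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lam^x * e^(-lam) / x! for a Poisson random variable."""
    if x < 0:
        return 0.0
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_mean_var(lam: float, x_max: int = 100) -> tuple[float, float]:
    """Approximate E(X) and Var(X) by summing the pmf over x = 0..x_max."""
    probs = [poisson_pmf(x, lam) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

lam = 3.0
print(poisson_pmf(2, lam))     # ≈ 0.224
print(poisson_mean_var(lam))   # both values ≈ 3.0
```

For small λ the tail beyond `x_max = 100` is negligible, so the sums match E(X) = σ² = λ up to floating-point rounding.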
Descriptive statistics
Standard deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
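A short Python sketch of the sample standard deviation above, with the n − 1 denominator written out explicitly and compared against the standard library's `statistics.stdev`, which uses the same denominator. The data values are arbitrary.

```python
import math
import statistics

def sample_std(xs: list[float]) -> float:
    """Sample standard deviation: sqrt(sum (x_i - mean)^2 / (n - 1))."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_std(data))         # ≈ 2.138 (sqrt(32 / 7))
print(statistics.stdev(data))   # same value
```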
Euclidean distance
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
$$d = |x_i - x_j|$$
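The distance d(X, Y) above is the Euclidean distance between two attribute vectors. A minimal sketch, assuming both points are given as equal-length sequences of numbers; `euclidean` is a hypothetical helper name, and `math.dist` (Python 3.8+) is shown only as a cross-check.

```python
import math

def euclidean(x: list[float], y: list[float]) -> float:
    """d(X, Y) = sqrt(sum_i (x_i - y_i)^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([0, 0], [3, 4]))   # 5.0
print(math.dist([0, 0], [3, 4]))   # 5.0, standard-library equivalent
```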
Edit distance
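$$\text{EditDistance}(string1, string2) = \text{length}(string1) + \text{length}(string2) - 2 \cdot \text{LCS}$$
where LCS is the length of the longest common subsequence of string1 and string2.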
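The edit distance above depends on the LCS length, which is usually computed with dynamic programming. The Python sketch below fills an (m + 1) × (n + 1) table where each cell holds the LCS length of two prefixes; the function names are illustrative.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def edit_distance(s1: str, s2: str) -> int:
    """length(s1) + length(s2) - 2 * LCS(s1, s2)."""
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)

print(lcs_length("ABCBDAB", "BDCABA"))     # 4
print(edit_distance("ABCBDAB", "BDCABA"))  # 7 + 6 - 2 * 4 = 5
```

Because this formula counts only insertions and deletions, a single substitution costs 2: `edit_distance("cat", "bat")` is 2, whereas the Levenshtein distance, which also allows substitutions, is 1.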
With $p(i|t)$ the fraction of records belonging to class $i$ in node $t$; $I(v_j)$ the impurity of a given node $v_j$; $N$ the total number of records at the parent node; $k$ the number of attribute values; and $N(v_j)$ the number of records of the child node $v_j$.
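These symbols match the usual decision-tree splitting criterion, the gain Δ = I(parent) − Σ_{j=1}^{k} N(v_j)/N · I(v_j). The impurity formulas themselves are not shown in this section, so the Python sketch below *assumes* the two common choices, Gini (1 − Σ_i p(i|t)²) and entropy (−Σ_i p(i|t) log₂ p(i|t)). All function names and the toy split are hypothetical.

```python
import math
from collections import Counter

def class_fractions(labels: list[str]) -> list[float]:
    """p(i|t): fraction of the records in node t that belong to each class i."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels: list[str]) -> float:
    """Gini impurity 1 - sum_i p(i|t)^2 (assumed impurity measure)."""
    return 1.0 - sum(p ** 2 for p in class_fractions(labels))

def entropy(labels: list[str]) -> float:
    """Entropy -sum_i p(i|t) * log2 p(i|t) (assumed impurity measure)."""
    return -sum(p * math.log2(p) for p in class_fractions(labels))

def gain(parent: list[str], children: list[list[str]], impurity=gini) -> float:
    """I(parent) - sum_j N(v_j) / N * I(v_j) over the k child nodes."""
    n = len(parent)  # N: number of records at the parent node
    # Empty children contribute nothing (N(v_j) = 0) and are skipped.
    weighted = sum(len(child) / n * impurity(child) for child in children if child)
    return impurity(parent) - weighted

parent = ["yes"] * 6 + ["no"] * 6
split = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 5]  # k = 2 attribute values
print(gain(parent, split, gini))     # ≈ 0.2222
print(gain(parent, split, entropy))  # ≈ 0.3500
```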
Bayesian Classification
Bayes posterior probability
$$P(C \mid A_1 A_2 \ldots A_n) = \frac{P(A_1 A_2 \ldots A_n \mid C)\, P(C)}{P(A_1 A_2 \ldots A_n)}$$
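A small Python sketch of Bayes' rule above. It assumes the classes are mutually exclusive and exhaustive, so the evidence P(A₁ A₂ … Aₙ) can be obtained as Σ_C P(A₁ … Aₙ | C) P(C). The class names and probabilities are hypothetical.

```python
def posteriors(likelihoods: dict[str, float], priors: dict[str, float]) -> dict[str, float]:
    """P(C | A1..An) for every class C via Bayes' rule.

    likelihoods[c] = P(A1..An | C = c), priors[c] = P(C = c).
    The evidence P(A1..An) is the sum of P(A1..An | C) * P(C) over all
    classes (assumes the classes are exhaustive and mutually exclusive).
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers for illustration only
post = posteriors({"spam": 0.08, "ham": 0.01}, {"spam": 0.3, "ham": 0.7})
print(post)  # spam ≈ 0.774, ham ≈ 0.226
```

Since the denominator is the same for every class, comparing only the numerators P(A₁ … Aₙ | C) P(C) is enough to pick the most probable class.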
Naive Bayes Classification
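Assuming the attributes $A_i$ are conditionally independent given the class $C_j$:
$$P(A_1 A_2 \ldots A_n \mid C_j) = P(A_1 \mid C_j)\, P(A_2 \mid C_j) \cdots P(A_n \mid C_j)$$
A new data point is classified as $C_j$ when $P(C_j) \prod_{i=1}^{n} P(A_i \mid C_j)$ is maximal.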
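A minimal naive Bayes sketch in Python for categorical attributes. It assumes P(C_j) and P(A_i | C_j) are estimated as plain relative frequencies from a training set, without smoothing (so an attribute value never seen with a class gives that class a score of 0). The toy weather data and all names are hypothetical.

```python
from collections import Counter

def train_naive_bayes(rows: list[tuple], labels: list[str]):
    """Estimate P(C) and counts for P(A_i = a | C) by relative frequency (no smoothing)."""
    class_counts = Counter(labels)
    # cond[(i, a, c)] = number of class-c records whose attribute i equals a
    cond = Counter()
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            cond[(i, a, c)] += 1
    n = len(labels)
    priors = {c: k / n for c, k in class_counts.items()}
    return priors, cond, class_counts

def classify(x: tuple, priors, cond, class_counts) -> str:
    """Return the class C_j that maximises P(C_j) * prod_i P(A_i | C_j)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, a in enumerate(x):
            score *= cond[(i, a, c)] / class_counts[c]  # P(A_i = a | C = c)
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: (outlook, wind) -> play
rows = [("sunny", "calm"), ("sunny", "windy"), ("rain", "windy"),
        ("rain", "calm"), ("sunny", "calm"), ("rain", "windy")]
labels = ["yes", "no", "no", "yes", "yes", "no"]
model = train_naive_bayes(rows, labels)
print(classify(("sunny", "calm"), *model))  # yes
```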
Confusion matrix
Recall or TPR
$$r = TPR = \frac{TP}{TP + FN}$$
FPR
$$FPR = \frac{FP}{TN + FP}$$
Precision
$$p = \frac{TP}{TP + FP}$$
F1
$$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$
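The confusion-matrix measures above can be computed directly from the four counts. A short Python sketch, assuming none of the denominators is zero; the function name and the counts are illustrative.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Recall/TPR, FPR, precision and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)           # r = TPR = TP / (TP + FN)
    fpr = fp / (tn + fp)              # FPR = FP / (TN + FP)
    precision = tp / (tp + fp)        # p = TP / (TP + FP)
    f1 = 2 * tp / (2 * tp + fp + fn)  # F1 = 2TP / (2TP + FP + FN)
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}

m = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
# recall = 40/45 ≈ 0.889, fpr = 10/55 ≈ 0.182, precision = 40/50 = 0.8, f1 = 80/95 ≈ 0.842
```

The count-based F1 formula is algebraically the same as the harmonic mean of precision and recall, 2pr / (p + r).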