
BUSINESS ANALYTICS
Lessons: Monday, Tuesday, Wednesday, 9:00-11:00 a.m.

NON-COMPULSORY COMPUTING LAB


Prof. Carlo Cavicchia: February 29; March 1, 7, 14, 15
FINAL TEST: 22 March, 9:00-11:00 (only if you attend all the lessons)
A lab exam describing a statistical analysis using SAS on a specific dataset.
You may add up to 2 points to the score of the final exam.
Software: SAS Studio (SAS OnDemand for Academics)

FINAL EXAM: 4 April, multiple-choice/open-answer test, written or online; time: one hour

Books
J. Neter, M. Kutner, C. Nachtsheim, W. Wasserman, 1996, Applied Linear Regression Models, Irwin
J. Lattin, J. Carroll, P. Green, 2003, Analyzing Multivariate Data, Thomson

Office: Room SB5, 3rd floor, Building B


06.72595943 [email protected]
Office hours: Monday, 2:30-3:30 p.m.

Pre-requisites for the course


Basic knowledge of descriptive statistics, elements of probability, random variables and statistical inference (see, for instance, the Statistical Background slides).
Lesson 1 19/02/24

WHY BUSINESS ANALYTICS AND NOT STATISTICS?


In the last 20 years it has become possible to deal with statistical inquiries that include more and more qualitative information, no longer just numerical data.
Today we have real-time information about almost everything. (Thanks, for example, to social media we now have information about many consumer preferences: Facebook alone produces about 500 terabytes of data per day.)

ERA OF BIG DATA: an enormous amount of data with 3 characteristics (the "3 Vs")

VOLUME
We can store tons of data; or rather, there is so much data that physical facilities are needed to store it (clouds have physical supports).

VELOCITY
E.g., years ago a census required almost a year to be processed; now processing is in real time. There has been an increase in computational capability.

VARIETY
Not only traditional data (income, age, ...) but also other formats such as images, audio and video, which require new methods of analysis: Deep Learning and Machine Learning, both based on algorithms.

How does a dependent variable Y change on the basis of the independent, explanatory variables X1, ..., Xn?

We can study two types of relationships:

- Y as influenced by the explanatory variables X (dependence)

- reciprocal relationships between the X variables themselves (interdependence)


MACHINE LEARNING MODELS

SUPERVISED
• Parametric Method = we have assumptions about the behaviour of the variable Y, e.g. Y is a random variable that behaves like a normal random variable
• Non-Parametric Approach = no assumptions on Y

Data-Driven Approach: we have such a large set of data that no assumption is needed; big data is sufficient to tell us something about the behaviour of the variable.

Lecture 2 20/02/2024

STATISTICAL MODELS
Regression

Model Construction
1. Choice of variables: which phenomenon Y we want to study, in relation to which explanatory variables X
2. Data collection: recording observations from primary or secondary sources
3. Model type selection: a model suitable to describe the X,Y relationship and fit the data collected
4. Parameter estimation: β0 and β1, to get the expected value and variance
5. Goodness of fit of the model: R2
6. Purpose of use:
• Explanatory analysis: to analyse the relationships between variables X and Y; mostly parametric
• Forecasting: to predict the behaviour of Y; non-parametric, ignores the relationships
• Simulation: describing different scenarios on the basis of different inputs X

If we assume a linear relationship between Y and a single X, we can use a scatterplot graph:
a positive slope indicates a positive relation, and a negative slope a negative one.
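As an illustration, a minimal SAS sketch (the dataset work.sample, the seed and the coefficients are hypothetical, invented only for the example):

/* Simulate a small sample where y depends linearly on x plus noise */
data work.sample;
    call streaminit(123);                       /* fix the random seed */
    do i = 1 to 100;
        x = rand("uniform") * 10;               /* explanatory variable */
        y = 2 + 0.5*x + rand("normal", 0, 1);   /* linear relation + noise */
        output;
    end;
    drop i;
run;

/* Scatterplot of y against x with the fitted regression line overlaid */
proc sgplot data=work.sample;
    scatter x=x y=y;
    reg x=x y=y / nomarkers;   /* a positive slope indicates a positive relation */
run;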

Sample Data Collection

The variables y and x can be measured as:
1. Cross-section: n different units observed at a single instant (T=1)
2. Time series: the same unit observed at T different times
3. Panel: the same n units observed at T different times


Simple Linear Regression Model

Assumptions Needed For A Parametric Model

- Shape of the function f(x), chosen on the basis of the variable type

- Nature of the disturbance εi

Random variables have different behaviours, so we first need to understand the nature of the disturbance.

E(εi | xi) = 0: the average value of εi given xi is equal to 0.

This is true when the errors take both positive and negative values that balance out:
e.g., the values -100, -20, -2, +2, +20, +100 have expected value (average) = 0.

Var(εi | xi) = σ²: the variance of epsilon is constant.

Homoscedasticity: increasing X, the variability remains the same.
We adopt this assumption in our exercises for simplicity, even if there are exceptions:
for small changes in income (x=1 → x=2), consumption variability remains unchanged;
for big changes in income (x=1 → x=200), consumption variability (the variance) changes.

εi is a normal random variable.

The εi, i=1,…,n, are uncorrelated; since they are normal random variables, uncorrelated also means independent.

Given these assumptions, we can analyse the BEHAVIOUR OF Y.

- The regression function $f(x) = \beta_0 + \beta_1 x$ describes the relationship between X and the conditional expectation of Y:
  $E(Y_i \mid X_i) = \mu_i = \beta_0 + \beta_1 x_i$, which derives from
  $E(\beta_0 + \beta_1 x_i + \varepsilon_i \mid X_i) = E(\beta_0 \mid X_i) + E(\beta_1 x_i \mid X_i) + E(\varepsilon_i \mid X_i) = \beta_0 + \beta_1 x_i + 0$
- $\beta_0 = E(Y_i \mid X_i = 0)$: depending on the inquiry, β0 can be relevant or not
  (price Y per square metre at X=0: β0 irrelevant; consumption Y at income X=0: relevant)
- $\beta_1$ is the average change in Y when increasing X by 1 unit (for linear relationships only)
- εi includes the effect of omitted variables and noise factors on the response variable Y;
  if the noise is very high, maybe we missed an important variable X in our model
RECAP REGRESSION

Equation: $E(Y_i \mid X_i) = \mu_i = \beta_0 + \beta_1 x_i$
Hypothesis: $Y_i \sim N(\mu_i, \sigma^2_\varepsilon)$

ALL THE VARIABILITY OF Y DEPENDS ON THE VARIABILITY OF THE NOISE TERM εi:

$V(Y \mid X) = V(\beta_0 + \beta_1 x_i + \varepsilon_i \mid X) = V(\beta_0 \mid X) + V(\beta_1 x_i \mid X) + V(\varepsilon_i \mid X) = 0 + 0 + \sigma^2_\varepsilon = \sigma^2_\varepsilon$

So we can rewrite our hypothesis as: $Y_i \mid X_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2_\varepsilon)$

Now that we have an idea of the type of model we are about to use, we can go on with the estimation of the parameters, which will reveal useful information for our inquiry.
We now have to find the parameters.


ESTIMATION OF REGRESSION PARAMETERS


Ordinary Least Squares Method
Estimates the regression parameters that minimize the sum-of-squares criterion, i.e. that minimize the distance between the observed data and the estimated regression line.
Here, in fact, yi and xi are observed values and not random variables.
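For reference, the standard least-squares solutions (a textbook result, stated here explicitly) are:

$\hat\beta_1 = \dfrac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$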

We use two estimators: formulas to get the value of the population parameters from a small sample.

They are both unbiased (there is no bias):

this means that they are reliable and close to the real parameter, because their average over all possible samples of the entire population equals the true parameter.
In general, by changing samples we could get different estimates of beta, but averaging over all samples overcomes this problem.
All else being equal, an unbiased estimator is preferable to a biased estimator, although in practice biased estimators are frequently used. When a biased estimator is used, bounds on the bias are calculated.

We also want the estimators to be efficient: showing the smallest possible variance across samples, indicating a small deviation between the estimated value and the "true" value.

Of course, the best estimator is both unbiased and efficient, but what if we had to choose one?
Bias-Variance Tradeoff
In general, a simpler model has higher bias and lower variance; bias goes down while variance goes up as a model becomes more complex. Which is better depends on the aim of the analysis. If interpretation is more important, simpler models serve well (we accept higher bias in exchange for lower variance), while complex, so-called black-box, models may be necessary if prediction is of main interest (we accept higher variance in exchange for lower bias).

Once we have obtained our estimates of the parameters β0 and β1, we can go on and compute the expected value μ and the variance σ², which in the case of linear regression are $\beta_0 + \beta_1 x_i$ and $\sigma^2_\varepsilon$.
For the variance we need to estimate the variability of the error ε.

Lecture 3 21/02/24
In SAS Studio

Is the regression line we drew close to or far from the sample data?

We fit the model so as to have predictions very close to the sample. In this case, the high variability of Y does not allow for good predictions.
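A minimal sketch of the corresponding step in SAS Studio, reusing the hypothetical work.sample dataset simulated earlier:

/* Fit the simple linear regression: the output shows the ANOVA table,
   R-Square, Root MSE and the parameter estimates with their t-tests */
proc reg data=work.sample;
    model y = x;
run;
quit;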

To understand if the model is reliable for making predictions, we need to perform a GOODNESS OF FIT TEST.

We first need the composition of the total variance.

Start with the variability of a single point with respect to the mean of y; then consider the deviance, which describes the variability of Y in the sample data.
R2 tells us about the usefulness of our regression analysis: how much of the variability of Y is explained by the variability of X.
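In formulas (a standard decomposition, with $\hat y_i$ the fitted values and $\bar y$ the sample mean):

$\underbrace{\sum_i (y_i - \bar y)^2}_{SST} = \underbrace{\sum_i (\hat y_i - \bar y)^2}_{SSM} + \underbrace{\sum_i (y_i - \hat y_i)^2}_{SSE}, \qquad R^2 = \dfrac{SSM}{SST} = 1 - \dfrac{SSE}{SST}$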

If R2 = 0 → SSM = 0: E(Y) does not change according to X, so Y is a constant (a flat line); no correlation, and my model is not useful.

The more explanatory variables X I use, the more R2 will increase.

Limitations of R2
First case: the relation is not linear but quadratic, yet the model is still a good fit.
Second case: R2 is low, and the linear regression model is not a good approximation of the real Y = f(x).
It is not that there is no relationship between X and Y; it is only that the relationship is not explained by a linear regression.

Third parameter: the variance of the noise term.

Why this one?

$Y = \beta_0 + \beta_1 x + \varepsilon \;\Rightarrow\; \varepsilon = Y - (\beta_0 + \beta_1 x)$

The problem is that we cannot observe epsilon; we can only estimate it through the residuals, and therefore we apply a correction, particularly important when we have a small sample.
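In formulas: we estimate each $\varepsilon_i$ by the corresponding residual, and the correction consists in dividing by n-2 (the degrees of freedom left after estimating the two parameters):

$e_i = y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i), \qquad \hat\sigma^2_\varepsilon = MSE = \dfrac{\sum_{i=1}^n e_i^2}{n-2}$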

Look at an example of the SAS output: from the MSR (mean square of the regression) and the Root MSE (root mean squared error) we have the full information about the variability of our data.

WHICH IS THE REAL FUNCTION Y = f(X)?

It is impossible to look at the whole population's values of Y and X; we only look at a sample when we estimate β0 and β1.
The problem is that samples are subsets of the population of size N, and they can differ from each other, so they lead us to different values of β0 and β1 → sample variability.

We overcome this obstacle by testing our inference about β0 and β1 using a statistical test and a confidence interval.

T-test
Null hypothesis H0: usually implies no correlation, so β1 = 0
Alternative hypothesis H1: some kind of correlation, β1 ≠ 0

TEST STATISTIC

If the standard error is very high (e.g. the estimate is +40 in one sample and -20 in another), then the estimate of β1 is not stable and we need other estimations. When the standard error is close to 0, we are making a good approximation of β1.
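The test statistic (a standard result) is:

$t = \dfrac{\hat\beta_1 - 0}{se(\hat\beta_1)}$, which under $H_0$ follows a Student's t distribution with n-2 degrees of freedom.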

If we compute t from many different random samples, then t is a random variable with a Student's t distribution. If we change the degrees of freedom, we change the shape of the distribution.
In other words, the P-value measures the plausibility of the null hypothesis (close to 0 = rejection).
Alpha is the significance level; usually we admit 95% confidence (α = 0.05).

Consider the t-test for a fixed level of α:

If the P-value is close to 0 → the null hypothesis is rejected.
If instead the P-value is at or above α (e.g. 0.05 = 5%) → we do not reject the null hypothesis.

Lecture 4 26/02/24

RESIDUAL ANALYSIS
Tools to verify that our assumptions (hypotheses) are true.
Properties of Residuals

Standardization = transform the variable so that it has mean E(X) = 0 and variance V(X) = 1 (a normal variable becomes a standard normal):

we obtain new values that are comparable with each other.
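One common version (the semistudentized residuals of Neter et al.) divides each residual by the Root MSE:

$e_i^* = \dfrac{e_i}{\sqrt{MSE}}$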

DEPARTURES FROM THE MODEL TO BE STUDIED BY RESIDUALS

1. The regression function is not linear
2. The error terms don't have constant variance
3. The error terms are not independent
4. The model fits all but one or a few outlier observations (we can check for the presence of outliers and whether the model is able to accommodate them)
5. The error terms are not normally distributed

Diagnostics for residuals based on plots:

1. Plot of residuals against the explanatory variable or against the fitted values
2. Plot of residuals against time
3. Box plot or histogram of residuals
4. Normal probability plot of residuals

The position of the residuals should be random, but in this case they are negative, then positive, then negative: we can tell this is not random behaviour, so the linear regression function is not a good model for Y = f(x).
From the residuals we can see that the true function is not linear.
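A minimal SAS sketch of these diagnostics, again on the hypothetical work.sample dataset:

/* Save residuals, fitted values and studentized residuals */
proc reg data=work.sample;
    model y = x;
    output out=work.diag r=resid p=fitted student=std_resid;
run;
quit;

/* Residuals against fitted values: they should scatter randomly around 0 */
proc sgplot data=work.diag;
    scatter x=fitted y=resid;
    refline 0 / axis=y;
run;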

Nonconstancy of error variance

We can use the plot of residuals against the fitted values.
The position of the residuals is random, but we see that as the fitted values ŷ increase, the variability of the residuals increases: homoscedasticity no longer holds.

NB: if the variance of the error term is not constant, we need to change the formulas used to obtain β0 and β1.

Normality of residuals
A normal quantile plot graphs the quantiles of a variable against the quantiles of a normal (Gaussian) distribution. The qnorm plot is sensitive to non-normality near the tails, and indeed we may see considerable deviations from normal (the diagonal line) in the tails.

The variance of the theoretical normal curve is set equal to the variance of the observed residuals:
we overlay the theoretical curve on the observed data (the histogram) and check for normality: is my empirical distribution equal to the theoretical distribution?

We can more easily use another graph → the QNORM plot.

If the empirical distribution matches the theoretical one (the red curve equals the histogram), the points lie on the line; otherwise they do not.
In this case there is a good match between theoretical and empirical, with a little mismatch in the tails, so the assumption of normality still holds.
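A minimal SAS sketch of both checks, using the residuals saved in the hypothetical work.diag dataset:

/* Histogram with overlaid normal curve, plus a normal Q-Q plot */
proc univariate data=work.diag;
    var resid;
    histogram resid / normal;                  /* empirical vs theoretical distribution */
    qqplot resid / normal(mu=est sigma=est);   /* points should lie on the diagonal line */
run;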

Presence of outliers
Three different types of outliers:
1: a value of Y very different from the whole sample, but not of X
2: values of both X and Y very different from the whole sample
3: a value of X very different from the whole sample, but not of Y

Are points 1, 2 and 3 anomalous or not?

We now apply standardization.

We then plot the standardized residuals e* and note that there is roughly a 99% probability that the values fall between -3 and +3.
So in this case either point 3 is a mistake, or my model (my linear function) is not fit for all the data.

So, the first step is to understand if we do have outliers in the dataset. To do so, we can use a BOXPLOT.
The median is a robust measure of position. In this case the distribution is quite symmetrical, as Q2 is in the middle of the box.
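A minimal SAS sketch (hypothetical dataset as before):

/* Boxplot of y: check symmetry (Q2 centred in the box) and flag
   points beyond the whiskers as candidate outliers */
proc sgplot data=work.sample;
    vbox y;
run;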

SAS
Classification variable = qualitative or discrete quantitative
Continuous variable = quantitative

Volume of the noise = variance of the residual errors.

Correlation index: -1 ≤ R ≤ +1

Squared correlation index: 0 ≤ R2 ≤ 1
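A minimal SAS sketch for computing R on the hypothetical work.sample dataset (squaring the printed correlation gives R2):

/* Pearson correlation between x and y */
proc corr data=work.sample;
    var x y;
run;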

When we use only 1 explanatory variable, we can obtain a SPURIOUS CORRELATION:
two variables Y and X have no direct causal connection, yet it may be wrongly inferred that they do, due to either coincidence or the presence of a certain third, unseen factor Z.
X increases and Y increases, but perhaps only because Z increases.

For instance, we consider a sample of students.

We observe stature and hair length, and we estimate a simple linear regression with Y = hair length and X = stature.

There is only an apparent relationship between stature and hair length, which is not even logical.
Once we include gender in our study, the relationship is more plausible (and we can ignore stature).

Binomial = dummy variable (it takes 2 values, 0 or 1, yes/no).
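A minimal SAS sketch (the dataset work.students_raw and the variable gender are hypothetical):

/* Encode gender as a dummy variable: 1 = female, 0 = male */
data work.students;
    set work.students_raw;       /* hypothetical input dataset */
    female = (gender = "F");     /* a logical comparison in SAS yields 0/1 */
run;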
