Intro To Regression
MICHAEL SINOCRUZ
Asia Pacific Energy Research Centre
What do we want to know?
Energy Demand
Power generation
How many new power plants are needed or what type of power plant will
be needed to meet the electricity demand?
Is the indigenous supply enough or how much will be imported?
Regression Analysis
Regression analysis is concerned with the study of the relationship between one
variable called the explained (or dependent) variable and one or more other
variables called explanatory (or independent) variables.
Reference: Damodar Gujarati, Essentials of Econometrics (second edition), McGraw-Hill
Regression analysis enables us to describe the straight line that best fits a series of
ordered pairs (x, y). The equation for a straight line, known as a linear equation, is
expressed as:
y = a + bx
Regression Analysis
a, the y-intercept, is the point where the line crosses the y-axis.
b is the slope of the line, the change in y for a one-unit change in x.
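As a rough illustration of how a and b can be obtained from data, the sketch below applies the standard least-squares formulas to a small made-up data set. The data values and the function name `fit_line` are assumptions for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

With real data the points will not lie exactly on a line, and a and b describe the line that minimizes the sum of squared vertical distances.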
Regression Analysis - Determination Coefficient
[Figure: observations scattered around the regression equation Y=a*X+b, marking for each point the actual value $Y_i$, the estimated value $\hat{Y}_i$, the error $e_i = Y_i - \hat{Y}_i$, and the average value $\bar{Y}$.]
Error Sum of Squares
$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Total Sum of Squares
$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
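The two sums of squares can be combined into the determination coefficient. The sketch below assumes the usual definition R² = 1 − ESS/TSS, with ESS the error sum of squares as defined on this slide; the formula itself is not shown in the slide text. The data and the function name `determination_coefficient` are hypothetical.

```python
def determination_coefficient(ys, y_hats):
    """Return (ESS, TSS, R^2) for actual values ys and estimates y_hats."""
    y_bar = sum(ys) / len(ys)
    ess = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # error sum of squares
    tss = sum((y - y_bar) ** 2 for y in ys)                      # total sum of squares
    return ess, tss, 1 - ess / tss


# Hypothetical observations; Y = 0.6*X + 2.2 is their least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hats = [0.6 * x + 2.2 for x in xs]

ess, tss, r2 = determination_coefficient(ys, y_hats)
print(f"ESS={ess:.2f} TSS={tss:.2f} R2={r2:.2f}")
```

The closer ESS is to zero relative to TSS, the closer R² is to 1.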
Regression Analysis
y = a + bx
Regression Analysis
Ŷ = a + bx
= 276.36 + 82.10 x
= 276.36 + 82.10 (8) = PhP 933.16
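The substitution above can be written as a small function. This is only a sketch: the coefficients are the ones on the slide, while the names `INTERCEPT`, `SLOPE` and `predict_repair_bill` are made up, and x is taken to be the age of the vehicle (the slide does not state the unit).

```python
INTERCEPT = 276.36  # a, from the slide
SLOPE = 82.10       # b, from the slide


def predict_repair_bill(age):
    """Estimated repair bill (PhP) for a vehicle of the given age."""
    return INTERCEPT + SLOPE * age


estimate = predict_repair_bill(8)
print(f"PhP {estimate:.2f}")  # PhP 933.16
```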
Regression Analysis
r= 0.634
Regression Analysis
NOTE: r² = (0.634)² ≈ 0.402. In other words, 40.2 percent of the variation in the
repair bill is explained by the age of the vehicle. If r² = 1, all the variation in y is
explained by the variable x. If r² = 0, none of the variation in y is explained by the
variable x.
END USE DEMAND 101 (A)
Indicators should be calculated at the most disaggregated end-use level possible in order to represent
each activity level.
[Figure: disaggregation of the industry sector down to end-use energy.
Industry sector: Manufacturing; Construction; Mining and Quarrying.
Industry sub-sectors, following the International Standard Industrial Classification (ISIC): Food and Beverage; Basic Metals (Iron and Steel); Non-metallic Minerals (Cement).
End-use energy: industrial cooking; industrial space cooling; space heating; industrial lighting.
Other labels: Fuel (Oil, Electricity); Specific energy; Industry activity.]
END USE DEMAND 101 (B)
Regression Analysis using Simple-E
Introduction to SEE – Main Menu
Free area: comments on the Code Names usually go here, for example their meanings,
units, and the sources of the data.
Enter the Code Names of the independent variables (X1, X2, …) here. Each code name
must be exactly the same as the one entered in the “data” sheet.
Put the Code Name of the dependent variable y here. The code name must be exactly
the same as the one entered in the “data” sheet.
“Option Type” includes ① the form of the relationship between Y and X1, X2, … (equal,
linear (OLS), double-log, semi-log, etc.), and ② how you want Y to change with time
(linear trend, growth trend, etc.).
Introduction to SEE – Typical Function Forms
Internal Option
I J Y Type X1 X2
Elasticity
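For the double-log function ln(Y) = a + b*ln(X), the slope coefficient ‘b’ is
equal to the average elasticity coefficient. This is because the elasticity of Y with
respect to X is
$\frac{dY/Y}{dX/X} = \frac{d\ln(Y)}{d\ln(X)} = b$,
so a 1 percent change in X is associated with a change of approximately b percent in Y.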
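A quick numerical check of this property, as a sketch on made-up data: the series below is generated from Y = 3·X^1.2, so a regression of ln(Y) on ln(X) should recover b = 1.2. The data and the helper `fit_line` are assumptions for illustration; this is not how Simple E is implemented.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = intercept + slope*xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical series with a constant elasticity of 1.2.
xs = [10.0, 20.0, 40.0, 80.0, 160.0]
ys = [3.0 * x ** 1.2 for x in xs]

# Regress ln(Y) on ln(X): ln(Y) = a + b*ln(X).
a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])

# Effect on Y (in percent) of raising X by 1 percent, from 100 to 101.
pct_change = (3.0 * 101.0 ** 1.2 / (3.0 * 100.0 ** 1.2) - 1) * 100

print(f"b = {b:.4f}, change in Y for +1% in X = {pct_change:.3f}%")
```

The percentage change is close to, but not exactly, b, because the elasticity is defined for infinitesimally small changes.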
Once you click the “All through” button in the Main Menu, and if there are no errors in
your model, the simulation results (the model outputs) are displayed in the “simulation”
sheet automatically.
<Summary> [Variables Total 5; Internal 5 (Regression 4; Definition 0; Direct 1); External 0]
1. RGDP; G%(5.42/5.16); [Growth Trend, Constant Adjusted]
2. TFED; G%(6.33/5.41); [=-10068.3+20.2642*RGDP] & [Linear Trend, Constant Adjusted]
3. TFED; G%(6.33/6.13); [=EXP(1.20766)*(RGDP)^1.20002] & [Linear Trend, Constant Adjusted]
4. TFED; G%(6.33/20.81); [=EXP(9.39292+0.000444539*RGDP)] & [Linear Trend, Constant Adjusted]
5. TFED; G%(6.33/5.27); [=-2983.3+7.2932*RGDP+0.66185*LAG1.TFED] & [Linear Trend, Constant Adjusted]

Year    1. RGDP     2. TFED     3. TFED     4. TFED     5. TFED
2023    8651.374    165244.9    177443.8    561739.4    161484.8
2024    9094.028    174214.9    188393.9    683902.2    170220
2025    9558.309    183623.2    199994      840675.5    179387.5
2026    10045.27    193491.1    212282.5    1043859     189006.5
2027    10556.03    203841.1    225300      1309932     199097.8
2028    11091.73    214696.8    239089.4    1662160     209683.8
2029    11653.62    226082.9    253696.4    2133782     220788.1
2030    12242.95    238025.2    269169      2772854     232435.6
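To see how the listed equations relate to the simulated values, the sketch below plugs the RGDP figures from the output into each TFED equation. The coefficients and data are copied from the output above; the variable names are made up. The results agree with the listed values to within the rounding of the printed coefficients.

```python
import math

# RGDP values taken from the simulation output above.
rgdp_2023 = 8651.374
rgdp_2024 = 9094.028

# Equation 2, linear: TFED = -10068.3 + 20.2642*RGDP
tfed_linear = -10068.3 + 20.2642 * rgdp_2023

# Equation 3, double-log: TFED = EXP(1.20766) * RGDP^1.20002
tfed_double_log = math.exp(1.20766) * rgdp_2023 ** 1.20002

# Equation 4, semi-log: TFED = EXP(9.39292 + 0.000444539*RGDP)
tfed_semi_log = math.exp(9.39292 + 0.000444539 * rgdp_2023)

# Equation 5, with lag: the 2024 value needs the 2023 TFED of the same model.
tfed_lag_2023 = 161484.8
tfed_lag_2024 = -2983.3 + 7.2932 * rgdp_2024 + 0.66185 * tfed_lag_2023

for label, value in [
    ("2. linear, 2023", tfed_linear),
    ("3. double-log, 2023", tfed_double_log),
    ("4. semi-log, 2023", tfed_semi_log),
    ("5. lagged, 2024", tfed_lag_2024),
]:
    print(f"{label}: {value:.1f}")
```

The four functional forms give similar values in 2023 except for the semi-log form, whose exponential dependence on RGDP produces much larger figures.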
Check by parameters: R-squared, T-value, DW-value, etc.
Others: Elasticity, Trend, etc.
Source: Sichao Kan
Introduction to SEE – Model Sheet
Check the fitness of your model on the right half of the “model” sheet.
Model equation
Parameters for testing the fitness of the model:
① R-squared ② T-value ③ Durbin-Watson value
Notice: After building the model, go to the “main menu” and click the “All through” button. The
equation of the model and the parameters for testing the fitness of the model will be displayed on
the right half of the “model” sheet.
Source: Sichao Kan
Parameters for testing the fitness of your model (estimation)
(1) R: R-Squared, 0 ≤ Explained variance / Total variance ≤ 1
(The larger the better)
(2) AR: Adjusted R-Squared, AR ≤ 1 (The larger the better)
(3) SD: $SD = \left(\sum e^2 / (n-k)\right)^{1/2}$,
e = Residual, n = Sample size, k = No. of independent variables
(4) t-value: |t| ≥ 2 : Significant
1 ≤ |t| < 2 : Admissible to use
|t| < 1 : Insignificant
(5) DW: Durbin-Watson Statistic, 1 < DW < 3
DW = 2 : No serial correlation
DW → 0 : Positive correlation
DW → 4 : Negative correlation
(6) Dh: Durbin h Statistic with lag, Dh ≤ 2
(7) Rho: Coefficient of serial correlation, Rho ≤ 1
(8) DF: Degrees of Freedom, DF > 1 (The larger the better)
(9) F: F-Statistic, F > 0 (The larger the better)
(10) RSS: Residual Sum of Squares, RSS > 0 (The smaller the better)
(11) YX: Correlation Coefficient between Y and X’s, YX ≤ 1
(12) XX: Correlation Coefficient between X’s, XX ≤ 0.95
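Several of these parameters can be computed by hand for a one-variable regression. The following is a minimal sketch on made-up data, not Simple E's own routine. One assumption to note: it takes k = 2, the number of estimated coefficients (slope plus constant), which is the common textbook convention; the list above defines k as the number of independent variables, so Simple E's figures may differ. The function name `fit_diagnostics` is hypothetical.

```python
import math


def fit_diagnostics(xs, ys):
    """R-squared, adjusted R-squared, SD, slope t-value and DW for Y = a*X + b."""
    n = len(xs)
    k = 2  # assumption: estimated coefficients (slope a and constant b)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - a * mean_x

    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    rss = sum(e * e for e in residuals)            # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in ys)       # total sum of squares

    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    sd = math.sqrt(rss / (n - k))                  # standard error of the regression
    t_slope = a / (sd / math.sqrt(sxx))            # slope divided by its standard error
    # Durbin-Watson: squared changes in successive residuals over RSS.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / rss
    return {"R2": r2, "AR": adj_r2, "SD": sd, "t": t_slope, "DW": dw}


# Hypothetical data, ordered in time so that DW is meaningful.
stats = fit_diagnostics([1, 2, 3, 4, 5], [2.0, 4.0, 5.0, 4.0, 5.0])
for name, value in stats.items():
    print(f"{name} = {value:.4f}")
```

For this toy data the slope t-value is just above 2 and DW is close to 2, so by the rules of thumb above the slope would count as significant with no sign of serial correlation.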
Introduction to SEE – How to start model building with SEE?
YY = a * XX + b
Introduction to SEE – Functions
Y Type X1
YY = XX
YY = XX
Introduction to SEE – Functions
“$DL” – Double Log: Simple E runs the regression after
transforming the variables on both sides to logarithms.
Y Type X1
YY $DL XX
Log(YY) = a * log(XX) + b
Introduction to SEE – Functions
“$CA” – Constant Adjustment: Simple E adds a constant c to the
regression equation to close the gap between the equation and the latest actual value.
[Figure: the regression line Y=a*X+b and the constant-adjusted line Y=a*X+b+c.]
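One way to read the constant adjustment is sketched below: c is chosen so that the adjusted equation passes through the latest actual observation, and the same c is then carried into the forecast. This is an interpretation of the slide, not a description of Simple E's internals; the function names and numbers are hypothetical.

```python
def constant_adjustment(a, b, x_latest, y_latest):
    """Gap c between the latest actual value and the regression estimate."""
    return y_latest - (a * x_latest + b)


def adjusted_forecast(a, b, c, x):
    """Forecast from the constant-adjusted equation Y = a*X + b + c."""
    return a * x + b + c


# Hypothetical fitted coefficients and latest observation.
a, b = 2.0, 10.0
c = constant_adjustment(a, b, x_latest=5.0, y_latest=23.0)  # estimate is 20, actual is 23
forecast = adjusted_forecast(a, b, c, 6.0)

print(f"c = {c}, forecast at X=6: {forecast}")
```

The adjustment shifts the whole line by c, so the forecast starts from the latest actual value rather than from the fitted one.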
Introduction to SEE – Functions
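[Figure: extrapolation of Y=a*X+b over periods 1 to 12 under $TL (linear trend) and $TG (growth trend); the growth-trend path is annotated with 10.0% per period.]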
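The difference between the two trend options can be sketched as follows, assuming that a linear trend adds a constant amount each period and a growth trend applies a constant percentage rate. How Simple E chooses the step or the rate is not shown in the slides; the starting value, the step and the function names are hypothetical, and the 10% rate mirrors the annotation in the figure.

```python
def linear_trend(last_value, step, periods):
    """Extend a series by adding the same amount each period."""
    return [last_value + step * t for t in range(1, periods + 1)]


def growth_trend(last_value, rate, periods):
    """Extend a series by applying the same growth rate each period."""
    return [last_value * (1 + rate) ** t for t in range(1, periods + 1)]


linear_path = linear_trend(100.0, 10.0, 3)   # 110, 120, 130
growth_path = growth_trend(100.0, 0.10, 3)   # compounding at 10% per period

print([round(v, 1) for v in linear_path])
print([round(v, 1) for v in growth_path])
```

The two paths coincide in the first period here and then diverge, because the growth trend compounds while the linear trend does not.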