What Is Econometrics? Hypotheses and Forecasting
Econometrics is the quantitative application of statistical and mathematical models using data to
develop theories or test existing hypotheses in economics, and for forecasting future trends from
historical data. It subjects real-world data to statistical trials and then compares and contrasts the
results against the theory or theories being tested. Depending on whether you are interested in
testing an existing theory or in using existing data to develop a new hypothesis from those
observations, econometrics can be subdivided into two major categories: theoretical and applied.
Those who routinely engage in this practice are commonly known as econometricians.
Econometrics is the use of statistical techniques to understand economic issues and test theories.
Without evidence, economic theories are abstract and might have no bearing on reality (even if
they are completely rigorous). Econometrics is a set of tools we can use to confront theory with
real-world data.
The goal of an applied econometric study might be to test a hypothesis – for example, to
determine how much of the ‘gender pay gap’ can be explained by differences in education and
experience. Alternatively, a study could estimate a key parameter, such as the price elasticity of
demand for oil. Or econometric techniques could be used to generate forecasts, such as those the
Bank of England uses when deciding where to set the base interest rate each month.
Fast Facts
Econometrics was pioneered by Ragnar Frisch, Simon Kuznets and Lawrence Klein. All three
won the Nobel Memorial Prize in Economic Sciences for their contributions: Frisch shared the
inaugural prize in 1969, Kuznets won in 1971, and Klein in 1980. Today, it is used regularly
among academics as well as practitioners such as Wall Street traders and analysts.
An example of the application of econometrics is to study the income effect using observable
data. An economist may hypothesize that as a person increases his income, his spending will also
increase. If the data show that such an association is present, a regression analysis can then be
conducted to understand the strength of the relationship between income and consumption and
whether or not that relationship is statistically significant - that is, whether the association is
unlikely to be due to chance alone.
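As a sketch of this kind of analysis, the snippet below fits a simple OLS regression of consumption on income; all figures are invented for illustration, not real data:

```python
# Simple OLS sketch: regressing consumption on income.
# All numbers below are hypothetical.

def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance(x, y) divided by variance(x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    return a, b

income = [30, 40, 50, 60, 70]        # e.g. thousands of dollars per year
consumption = [28, 35, 44, 50, 60]   # hypothetical spending

a, b = ols_simple(income, consumption)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
```

A positive slope here is consistent with the hypothesized income effect: each extra unit of income is associated with roughly `b` extra units of spending.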
Note that you can have several explanatory variables in your analysis, for example changes to
GDP and inflation in addition to unemployment in explaining stock market prices. When more
than one explanatory variable is used, it is referred to as multiple linear regression - a model that
is the most commonly used tool in econometrics.
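The same idea extends to several explanatory variables. The sketch below fits a multiple linear regression by solving the normal equations directly; the stock-index, GDP-growth, inflation, and unemployment figures are all invented for illustration:

```python
# Multiple linear regression sketch: explaining a stock index with
# GDP growth, inflation, and unemployment (all numbers invented).
# The coefficients solve the normal equations (X'X) beta = X'y.

def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Pivot on the largest entry for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_multiple(X, y):
    """Fit y = b0 + b1*x1 + ... by OLS; X is a list of observation rows."""
    Xa = [[1.0] + list(row) for row in X]          # add intercept column
    k = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(k)]
    return solve(XtX, Xty)

X = [[2.0, 1.5, 5.0], [2.5, 2.0, 4.8], [1.0, 2.5, 6.0],
     [3.0, 1.0, 4.5], [0.5, 3.0, 6.5], [2.2, 1.8, 5.1]]   # gdp, infl, unemp
y = [105, 112, 95, 120, 88, 107]                           # stock index

beta = ols_multiple(X, y)
print(beta)   # [intercept, b_gdp, b_inflation, b_unemployment]
```

In practice, packages such as Stata or R do this solving (and much more) for you; the point of the sketch is only that multiple regression is ordinary least squares with several regressors.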
Several different regression models exist that are optimized depending on the nature of the data
being analyzed and the type of question being asked. The most common example is the ordinary
least-squares (OLS) regression, which can be conducted on several types of cross-sectional or
time-series data. If you're interested in a binary (yes-no) outcome - for instance, how likely you
are to be fired from a job (yes, you get fired, or no, you do not) based on your productivity - you
can use a logistic regression or a probit model. Today, there are hundreds of models at an
econometrician's disposal.
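As a rough illustration of the binary-outcome case, the sketch below fits a logistic regression by gradient ascent on the log-likelihood; the productivity scores and firing outcomes are invented:

```python
import math

# Logistic regression sketch: probability of being fired as a function
# of a productivity score (data invented; fitted by gradient ascent).

def fit_logit(x, y, lr=0.1, steps=10000):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by gradient ascent
    on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p          # derivative w.r.t. intercept
            grad_b += (yi - p) * xi   # derivative w.r.t. slope
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

productivity = [1, 2, 3, 4, 5, 6, 7, 8]   # higher = more productive
fired =        [1, 1, 1, 0, 1, 0, 0, 0]   # 1 = fired, 0 = kept

a, b = fit_logit(productivity, fired)
print(f"slope = {b:.2f}")   # negative: more productive, less likely fired
```

A probit model has the same structure but replaces the logistic function with the standard normal CDF; in practice both are fitted with a library routine rather than by hand.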
Econometrics is now conducted using statistical analysis software packages designed for these
purposes, such as Stata, SPSS, or R. These software packages can also easily test for
statistical significance to provide support that the empirical results produced by these models are
not merely the result of chance. R-squared, t-tests, p-values, and null-hypothesis testing are all
methods used by econometricians to evaluate the validity of their model results.
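The significance measures mentioned above can be computed by hand for a simple regression. The sketch below, using illustrative data, derives R-squared and the t-statistic for the slope:

```python
import math

# Sketch of the significance checks above, computed by hand for a
# simple regression (illustrative data, not a real economic series).

x = [30, 40, 50, 60, 70]
y = [28, 35, 44, 50, 60]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

fitted = [a + b * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - my) ** 2 for yi in y)

r_squared = 1 - ss_res / ss_tot            # share of variance explained
se_b = math.sqrt(ss_res / (n - 2) / sxx)   # standard error of the slope
t_stat = b / se_b                          # compare to a t(n-2) critical value

print(f"R^2 = {r_squared:.3f}, t = {t_stat:.1f}")
```

A large t-statistic (far from zero relative to the t-distribution with n − 2 degrees of freedom) corresponds to a small p-value, i.e. evidence against the null hypothesis that the slope is zero.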
Limitations Of Econometrics
Econometrics is sometimes criticized for relying too heavily on the interpretation of raw data
without linking it to established economic theory or looking for causal mechanisms. It is crucial
that the findings revealed in the data can be adequately explained by a theory, even if that
means developing your own theory of the underlying processes.
Regression analysis also does not prove causation: even when two data sets show a strong
association, the relationship may be spurious. For example, drowning deaths in swimming pools
increase with GDP. Does a growing economy cause people to drown? Of course not, but perhaps
more people buy pools when the economy is booming. Econometrics is largely concerned with
correlation analysis - and remember, correlation does not equal causation.
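A quick sketch of how a spurious association can arise: two invented series that both happen to trend upward over time correlate strongly even though neither causes the other:

```python
# Spurious correlation sketch: two invented upward-trending series
# correlate strongly with no causal link between them.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

gdp       = [100, 104, 109, 113, 118, 124, 129, 135]   # trending up
drownings = [50, 53, 51, 56, 58, 57, 61, 63]           # also trending up

r = pearson(gdp, drownings)
print(f"correlation = {r:.2f}")   # high, yet neither causes the other
```

Both series are driven by the same thing - the passage of time - which is exactly the kind of shared trend that produces a misleading correlation.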
Linear Regression
Linear regression is the starting point of econometric analysis. The linear regression model has a
dependent variable that is a continuous variable, while the independent variables can take any
form (continuous, discrete, or indicator variables). A simple linear regression model has only
one independent variable, while a multiple linear regression model has two or more independent
variables. A linear regression model is typically estimated using OLS (ordinary least squares).
Examples include studying the effect of education on income, or the effect of a recession on
stock returns.
An error term is a variable in a statistical or mathematical model that is created when the
model does not fully represent the actual relationship between the independent variables and the
dependent variable.
Observational error (or measurement error) is the difference between a measured value of a
quantity and its true value. In statistics, an error is not a "mistake": variability is an inherent
part of the results of measurements and of the measurement process. Errors are generally
classified into three types: systematic errors, random errors, and blunders. Gross errors
(blunders) are caused by mistakes in using instruments or meters, in calculating measurements,
and in recording data results.
Why do we include a disturbance term in a regression model? Disturbance is another name for
prediction error. Regression models rarely predict most of the variance in the dependent variable.
Consider this example:
Prediction error = effects of predictors not included in the regression equation + random
variance (inherent unpredictability) + “free will” (if you believe in that).
If the dependent variable is a behavior measure (such as frequency of helping), another possible
“cause” is “free will” (although some scientists would say there is no such thing as free will).
Failure to cooperate and do as expected is a problem in human research and can be a problem
even in animal studies.
The error term is also known as the residual, disturbance, or remainder term and is variously
represented in models by the letters e, ε, or u. A model with an error term can be written as:

Y = αX + βρ + ε

where:

Y = dependent variable
α, β = constant parameters
X, ρ = independent variables
ε = error term
When the actual Y differs from the expected or predicted Y in the model during an empirical
test, then the error term does not equal 0, which means there are other factors that influence Y.
Points that do not fall directly on the trend line exhibit the fact that the dependent variable, in this
case, the price, is influenced by more than just the independent variable, representing the passage
of time. The error term stands for any influence being exerted on the price variable, such as
changes in market sentiment.
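The error terms described above can be made concrete: fit a trend line to a short (invented) price series and look at the gaps between the observed and fitted prices:

```python
# Sketch: error terms as the gaps between observed prices and a fitted
# trend line (prices invented for illustration).

days   = [1, 2, 3, 4, 5, 6]
prices = [10.0, 10.6, 10.9, 11.6, 11.8, 12.5]

n = len(days)
md, mp = sum(days) / n, sum(prices) / n
slope = sum((d - md) * (p - mp) for d, p in zip(days, prices)) \
        / sum((d - md) ** 2 for d in days)
intercept = mp - slope * md

# The error term for each day: observed price minus trend-line price.
errors = [p - (intercept + slope * d) for d, p in zip(days, prices)]
print([round(e, 3) for e in errors])
```

The individual errors are not zero - other factors besides time are moving the price - but by construction OLS residuals sum to zero, so positive and negative gaps balance out.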
The data points with the greatest distance from the trend line exhibit the largest error terms -
that is, the largest portion of their value left unexplained by the model.
Key Takeaways
An error term appears in a statistical model, like a regression model, to indicate the
uncertainty in the model.
The error term is a residual variable that accounts for a lack of perfect goodness of fit.
Heteroskedasticity refers to a condition in which the variance of the residual, or error
term, in a regression model varies widely across observations.
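A minimal sketch of heteroskedasticity: residuals (invented here) whose spread grows with the regressor, detected by comparing the residual variance in the two halves of the sample:

```python
# Heteroskedasticity sketch: residual spread that grows with the
# regressor (numbers invented). Compare the residual variance in the
# low-x and high-x halves of the sample.

x     = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.2, 0.15, -0.1, 0.8, -1.1, 1.3, -1.5]   # residuals from some fit

def variance(v):
    m = sum(v) / len(v)
    return sum((e - m) ** 2 for e in v) / len(v)

low, high = resid[:4], resid[4:]
print(variance(low), variance(high))   # the high-x half is far more spread out
```

This split-sample comparison is the intuition behind formal tests such as Goldfeld-Quandt; statistical packages implement these tests directly.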
A linear regression exhibits less lag than a moving average, because the line is fit to the
data points rather than derived from averages within the data. This allows the line to
change more quickly and dramatically than a line based on numerical averaging of the available
data points.
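The lag comparison can be sketched numerically: on a steadily rising (invented) price series, the regression value at the latest period sits much closer to the latest price than a simple moving average does:

```python
# Sketch of the lag comparison: on a steadily rising price series, the
# fitted regression value at the latest period tracks the latest price,
# while a simple moving average lags behind (prices invented).

prices = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
n = len(prices)

sma = sum(prices) / n                      # n-period simple moving average
t = list(range(1, n + 1))
mt, mp = sum(t) / n, sum(prices) / n
slope = sum((a - mt) * (b - mp) for a, b in zip(t, prices)) \
        / sum((a - mt) ** 2 for a in t)
reg_end = mp + slope * (n - mt)            # regression value at the last period

print(sma, reg_end, prices[-1])            # the SMA trails the latest price
```

On this perfectly linear series the regression endpoint coincides with the latest price, while the moving average sits well below it - the "delay" the paragraph above describes.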