BA unit3
Modeling Relationships and Trends in Data
• Mathematics and the descriptive properties of different functional
relationships are important in building predictive analytical models.
• Common types of mathematical functions used in predictive analytical
models include the following:
• Regression analysis is a tool for building mathematical and statistical models that
characterize relationships between a dependent variable (which must be a ratio
variable and not categorical) and one or more independent, or explanatory, variables,
all of which are numerical (but may be either ratio or categorical).
• Two broad categories of regression models are used often in business settings: (1)
regression models of cross-sectional data and (2) regression models of time-series
data, in which the independent variables are time or some function of time and the
focus is on predicting the future.
• Time-series regression is an important tool in forecasting.
Prof. S.Adinarayana, Dept of CS&SE, College of Engineering,
Andhra University
Simple linear regression
Least squares regression
• The mathematical basis for the best-fitting regression line is called least-
squares regression.
• In regression analysis, we assume that the values of the dependent
variable, Y, in the sample data, are drawn from some unknown population
for each value of the independent variable, X.
• Imagine we have a list of people’s study hours and test scores. In the
scatterplot, we can see a positive relationship exists between study time
and test scores. Statistical software can display the least squares regression
line and its equation.
In the equation of the line, y = mx + b:
• b is the y-intercept.
• m is the slope of the line.
The slope represents the mean change in the dependent variable for a
one-unit change in the independent variable.
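The least squares line described above can be sketched in a few lines of Python. This is a minimal illustration, not the output of any particular statistical package: the study-hours and test-score values are hypothetical, invented only to show the arithmetic, and the function name least_squares_line is likewise an assumption.

```python
# Hypothetical study-hours / test-score data, invented purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 70.0, 75.0]


def least_squares_line(x, y):
    """Return (m, b) for the line y = m*x + b that minimizes the
    sum of squared vertical distances to the points."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    m = s_xy / s_xx          # slope
    b = y_mean - m * x_mean  # y-intercept
    return m, b


m, b = least_squares_line(hours, scores)
print(f"score = {m:.2f} * hours + {b:.2f}")
```

For this made-up sample the slope works out to 5.8, which would be read as a mean increase of 5.8 points in test score for each additional hour of study.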
• T-tests are statistical hypothesis tests that you use to analyze one
or two sample means.
• Depending on the t-test that you use, you can compare a sample
mean to a hypothesized value, the means of two independent
samples, or the difference between paired samples.
• t-tests use t-values and t-distributions to calculate probabilities.
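As a rough sketch of how a t-value is obtained, the following computes a one-sample t statistic by hand. The sample values and the hypothesized mean of 60 are hypothetical, and the helper name one_sample_t is an assumption made for this illustration.

```python
import math
import statistics

# Hypothetical sample and hypothesized mean, invented for illustration only.
sample = [52.0, 58.0, 65.0, 70.0, 75.0]
hypothesized_mean = 60.0


def one_sample_t(values, mu0):
    """Return (t_value, degrees_of_freedom) for a one-sample t-test."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)  # estimated standard error of the mean
    return (mean - mu0) / standard_error, n - 1


t_value, dof = one_sample_t(sample, hypothesized_mean)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

Turning the t-value into a probability requires the t-distribution with the stated degrees of freedom. The Python standard library does not provide it, so in practice a statistics package would be used for that step (for example, SciPy's scipy.stats.ttest_1samp).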
The residual plot shows a fairly random pattern: the first residual is
positive, the next two are negative, the fourth is positive, and the last
residual is negative. This random pattern indicates that a linear model
provides a decent fit to the data.
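The residuals behind such a plot can be computed directly: each residual is the observed value minus the value predicted by the fitted line. The sketch below uses hypothetical data (not the data behind the plot described above) and simply lists the sign of each residual so the pattern can be inspected.

```python
# Hypothetical data, invented for illustration; not the data from the plot.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 58.0, 65.0, 70.0, 75.0]

# Fit the least squares line y = slope * x + intercept.
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean

# Residual = observed value minus predicted value.
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
signs = ["+" if r > 0 else "-" if r < 0 else "0" for r in residuals]

for xi, r, s in zip(x, residuals, signs):
    print(f"x = {xi:.0f}  residual = {r:+.2f}  sign = {s}")
```

With an intercept in the model, least squares residuals always sum to zero, so the check of interest is whether the signs are scattered without a systematic pattern (such as a long run of one sign followed by a run of the other).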
2. Calculate Regression Sums
• The "works out daily" category (exercise == 1) describes everyone who doesn't work out
2-3 times a week or once a week. It serves as the baseline category and is therefore
included in the α (intercept) term.
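The baseline idea can be sketched with indicator (dummy) variables. Only exercise == 1 (works out daily) comes from the text. The codes 2 and 3 for the other two categories, the function names, and the coefficient values used in the usage note are all assumptions made for illustration.

```python
# exercise == 1 (works out daily) is from the text; codes 2 and 3 are ASSUMED
# here to mean "2-3 times a week" and "once a week" respectively.
BASELINE = 1


def dummy_code(exercise):
    """Return (d_2to3_per_week, d_once_per_week).

    The baseline category (works out daily) gets no indicator of its own:
    it maps to (0, 0), so its effect is carried entirely by the intercept.
    """
    return (1 if exercise == 2 else 0, 1 if exercise == 3 else 0)


def predict(exercise, alpha, beta_2to3, beta_once):
    """Predicted value: alpha plus the coefficient of the active indicator."""
    d_2to3, d_once = dummy_code(exercise)
    return alpha + beta_2to3 * d_2to3 + beta_once * d_once


for level in (1, 2, 3):
    print(level, dummy_code(level))
```

With placeholder coefficients, predict(BASELINE, 10.0, 2.0, 3.0) returns just the intercept, 10.0, while the other two categories add their own coefficient to it. Each coefficient is therefore read as a difference from the "works out daily" group.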