
Stata: Basic Note

What is the difference between a histogram and a bar graph?
• Although histograms and bar charts use a column-based
display, they serve different purposes.
• A bar graph is used to compare discrete or categorical
variables in a graphical format whereas a histogram depicts the
frequency distribution of variables in a dataset.
• Histograms visualize quantitative data or numerical data,
whereas bar charts display categorical variables.
• In most instances, the numerical data in a histogram will be continuous (able to take on infinitely many possible values).
Bar charts
• A bar chart or bar graph is a type of data visualization used to compare discrete data categories or groups.
• It works best when the data are shown as separate, non-adjacent horizontal bars (a bar chart) or vertical columns (a column chart), because values displayed in separate bars are easy to compare.
• For this reason, bar charts are commonly used for nominal and categorical data, e.g. product categories, cities, months, countries, and similar discrete values.
• Bar charts usually represent categorical variables or discrete variables, or continuous variables grouped into class intervals (see the Stata sketch below).
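As a rough illustration, here is a minimal Stata sketch (assuming Stata's bundled auto dataset; the variables rep78 and the choice of counts are illustrative, not part of the note) that draws bar charts of category frequencies:

    * load Stata's bundled example dataset
    sysuse auto, clear
    * vertical bar chart: number of cars in each repair-record category
    graph bar (count), over(rep78)
    * the same counts as horizontal bars
    graph hbar (count), over(rep78)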
Histogram

• A histogram is a data visualization type designed to show the distribution of interval or continuous data. In histograms, data is shown in the form of contiguous bars, where each bar corresponds to a data range or a bin.
• You would use a histogram when you want to visualize the
frequency or count of data points within each of those data ranges
and understand how the data is distributed.
• There are two axes on a histogram.
• The horizontal axis (x-axis) shows the range of values, divided into bins. Each bar represents one bin, i.e. one range of data values.
• The vertical axis (y-axis) is the frequency or count of data points
that belong to each data range or bin on the x-axis.
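A minimal Stata sketch of a histogram, again assuming the bundled auto dataset (the variable mpg and the choice of 10 bins are illustrative assumptions):

    * load Stata's bundled example dataset
    sysuse auto, clear
    * histogram of a continuous variable, split into 10 bins,
    * with the y-axis showing frequency (counts) rather than density
    histogram mpg, bin(10) frequency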
Pie Chart
• A pie chart is a type of graph that represents data in a circular graph. The slices of the pie show the relative sizes of the data, and it is a type of pictorial representation of data.
• A pie chart requires a categorical variable and a numerical variable. Here, the term “pie” represents the whole, and the “slices” represent the parts of the whole.
• The pie chart is also known as a “circle chart”: it divides the circular statistical graphic into sectors or sections to illustrate numerical proportions. Each sector denotes a proportionate part of the whole.
• A pie chart works best when you want to show the composition of a whole. In such cases it can take the place of other graphs such as bar graphs, line plots, or histograms (see the Stata sketch below).
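A minimal Stata sketch of a pie chart, assuming the bundled auto dataset (the grouping variable foreign is an illustrative choice):

    * pie chart: share of domestic vs foreign cars
    sysuse auto, clear
    graph pie, over(foreign)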
Multiple Regression Explanation,
Assumptions, and Interpretation
• There are many types of regression models, but here we will deal with only three of them:
1. Simple regression model
2. Multiple regression model
3. Multivariate regression model
1. Simple regression model: a statistical equation that characterizes the relationship between a dependent variable and only one independent variable.
2. Multiple regression model: a mathematical model that characterizes the relationship between a dependent variable and two or more independent variables.
Cont’d…
• A multivariate regression model is an algebraic system of equations that characterizes the relationship between more than one dependent variable and one or more independent variables through a set of statistical regression models (see the Stata sketch below).
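As a hedged sketch of the three model types in Stata (the bundled auto dataset and the particular variables are assumptions made for illustration only):

    sysuse auto, clear
    * 1. simple regression: one dependent variable, one independent variable
    regress price mpg
    * 2. multiple regression: one dependent variable, several independent variables
    regress price mpg weight foreign
    * 3. multivariate regression: several dependent variables modeled jointly
    mvreg headroom trunk = weight length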
What's your approach to interpreting
regression analysis results?
• Regression analysis is a powerful tool for data analysis that
allows you to explore the relationship between a dependent
variable and one or more independent variables.
• However, interpreting the results of a regression analysis can
be challenging, especially if you are not familiar with the
assumptions, limitations, and pitfalls of the method.
• In this course, you will learn a practical approach to
interpreting regression analysis results, based on four key
steps: checking the model fit, examining the coefficients,
testing the hypotheses, and assessing the validity.
1. Check the model fit

• The first step in interpreting regression analysis results is to check how well the model fits the data.
• This means evaluating how closely the predicted values match
the observed values, and how much of the variation in the
dependent variable is explained by the independent variables.
• There are several statistics that can help you assess the model
fit, such as R-squared, adjusted R-squared, standard error, F-
test, and residuals.
• You should look for a high R-squared, a low standard error, a
significant F-test, and normally distributed residuals with no
outliers or patterns.
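A minimal Stata sketch of this step (the auto dataset and the chosen predictors are illustrative assumptions):

    sysuse auto, clear
    regress price mpg weight
    * R-squared and adjusted R-squared are stored by -regress-
    display "R-squared = " e(r2) "  adjusted R-squared = " e(r2_a)
    * save the residuals and look for non-normality, outliers, or patterns
    predict resid, residuals
    histogram resid, normal
    rvfplot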
2. Examine the coefficients

• The second step in interpreting regression analysis results is to examine the coefficients of the independent variables.
• The coefficients tell you the direction and magnitude of the
effect of each independent variable on the dependent variable,
holding all other variables constant.
• You should pay attention to the sign, size, and significance of
the coefficients, and compare them with your expectations and
prior knowledge.
• You should also look for any signs of multicollinearity, which
is a situation where two or more independent variables are
highly correlated and affect the reliability of the coefficients.
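A short Stata sketch of this step, under the same illustrative assumptions; estat vif reports variance inflation factors after regress:

    sysuse auto, clear
    * coefficients, their signs, and their significance appear in the regression table
    regress price mpg weight length
    * variance inflation factors; values far above 10 are a common
    * rule-of-thumb warning sign of multicollinearity
    estat vif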
3. Test the hypotheses
• The third step in interpreting regression analysis results is to test the
hypotheses that you have formulated before conducting the
analysis.
• The hypotheses are statements about the relationship between the
dependent variable and the independent variables, such as whether
there is a positive or negative effect, or whether there is a difference
between groups or levels.
• To test the hypotheses, you need to look at the p-values and
confidence intervals of the coefficients, and compare them with a
significance level that you have chosen.
• The p-value tells you the probability of observing a coefficient as
extreme or more extreme than the one obtained, assuming that there
is no effect. The confidence interval tells you the range of values
that contain the true coefficient with a certain level of confidence.
• You can reject the null hypothesis if the p-value is lower than the significance level (e.g. 5% or 10%), or if the confidence interval does not include zero.
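A minimal Stata sketch of this step (the dataset and the hypotheses tested are illustrative assumptions):

    sysuse auto, clear
    * the regression table reports p-values and 95% confidence intervals
    regress price mpg weight, level(95)
    * Wald test of the null hypothesis that the coefficient on mpg is zero
    test mpg = 0
    * joint test that both coefficients are zero
    test mpg weight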
4. Assess the validity
• The fourth and final step in interpreting regression analysis results
is to assess the validity of the model and the assumptions that
underlie it.
• The validity refers to how well the model represents the true
relationship between the variables, and how generalizable and
robust it is to different situations and data sets.
• To assess the validity, you need to check whether the assumptions
of the regression method are met, such as linearity, independence,
homoscedasticity, and normality. You can use various diagnostic
tests and plots to check these assumptions, and apply appropriate
transformations or corrections if they are violated.
• You should also consider any potential confounding factors,
omitted variables, or endogeneity issues that might bias the results,
and address them with suitable methods, such as adding control
variables, using instrumental variables, or applying fixed effects.
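A sketch of some common post-estimation checks in Stata, under the same illustrative assumptions:

    sysuse auto, clear
    regress price mpg weight
    * Breusch-Pagan test for heteroskedasticity
    estat hettest
    * Ramsey RESET test, often read as a check for omitted variables
    * or an incorrect functional form
    estat ovtest
    * one possible correction: heteroskedasticity-robust standard errors
    regress price mpg weight, vce(robust)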
Key Difference Between R-squared and
Adjusted R-squared for Regression Analysis
R-Squared
• R-squared measures the proportion of the variance in the
dependent variable explained by the independent variables in the
model.
• It ranges from 0 to 1, where 0 indicates that the model does not explain any of the variability and 1 indicates that it explains all of the variability.
• Higher R-squared values suggest a better fit, but it doesn’t
necessarily mean the model is a good predictor in an absolute
sense.
• R-squared is a goodness-of-fit measure that tends to reward you
for including too many independent variables in a regression
model, and it doesn’t provide any incentive to stop adding more.
Some Problems with R-squared
• Unfortunately, there are yet more problems with R-squared that
we need to address.
• Problem 1: R-squared increases (or at least never decreases) every time you add an independent variable to the model, even when the new variable is correlated with the outcome only by chance. A regression model that contains more independent variables than another can therefore look like it provides a better fit merely because it contains more variables.
• Problem 2: When a model contains an excessive number of
independent variables and polynomial terms, it becomes overly
customized to fit the peculiarities and random noise in your
sample rather than reflecting the entire population. Statisticians
call this overfitting the model, and it produces deceptively high R-
squared values and a decreased capability for precise predictions.
• Fortunately for us, adjusted R-squared and predicted R-squared
address both of these problems.
Cont’d…
Adjusted R-Squared
• Adjusted R-squared addresses a limitation of R-squared, especially in multiple regression (models with more than one independent variable).
• While R-squared tends to increase as more variables are added to the model (even if they don’t improve the model significantly), adjusted R-squared penalizes the addition of unnecessary variables.
• It considers the number of predictors in the model and adjusts R-
squared accordingly. This adjustment helps to avoid overfitting,
providing a more accurate measure of the model’s goodness of
fit.
• Use adjusted R-squared to compare the goodness-of-fit for
regression models that contain differing numbers of independent
variables.
Comparison

• R-squared will stay the same or increase when more predictors are added, even if they do not contribute meaningfully. It may therefore give a falsely optimistic view of the model.
• Adjusted R-squared is more conservative and will decrease if
additional variables do not contribute to the model’s
explanatory power.
• As a rule of thumb, a higher R-squared or adjusted R-squared is desirable, but it is crucial to consider the context of the specific analysis and the trade-off between model complexity and explanatory power.
Cont’d…
• Let’s say you are comparing a model with five independent variables to a model with one variable, and the five-variable model has a higher R-squared. Is the model with five variables actually a better model, or does it just have more variables? To determine this, just compare the adjusted R-squared values!
• The adjusted R-squared adjusts for the number of terms in the
model. Importantly, its value increases only when the new term
improves the model fit more than expected by chance alone. The
adjusted R-squared value actually decreases when the term
doesn’t improve the model fit by a sufficient amount.
• The example below shows how the adjusted R-squared
increases up to a point and then decreases. On the other hand, R-
squared blithely increases with each and every additional
independent variable.
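As a hedged sketch of such an example in Stata (the auto dataset and the order in which predictors are added are assumptions; the exact values, and the point at which adjusted R-squared stops rising, depend on the data):

    sysuse auto, clear
    * fit a sequence of nested models and print both statistics
    foreach rhs in "mpg" "mpg weight" "mpg weight length" "mpg weight length turn" {
        quietly regress price `rhs'
        display "predictors: `rhs'"
        display "  R-squared = " %6.4f e(r2) "  adjusted R-squared = " %6.4f e(r2_a)
    }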
