2022 Emba 507
Assignment No. 01
Chi-square Test: A test of significance used to determine whether the difference between the observed and
expected frequencies of certain observations is statistically significant.
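For illustration, here is a minimal Python sketch of a chi-square goodness-of-fit test using scipy; the observed and expected counts are invented, not real data.

    # Hypothetical counts only: do observed frequencies differ from expected ones?
    from scipy.stats import chisquare

    observed = [18, 22, 30, 30]
    expected = [25, 25, 25, 25]   # equal expected frequencies; totals must match

    chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")  # small p => significant difference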
a. ANOVA
There are two main types of ANOVA: (1) a "one-way" ANOVA compares levels (i.e. groups) of a single
factor based on a single continuous response variable (e.g. comparing test score by 'level of education'),
and (2) a "two-way" ANOVA compares levels of two or more factors for mean differences on a single
continuous response variable (e.g. comparing test score by both 'level of education' and 'zodiac sign'). In
practice, you will see one-way ANOVAs more often, and when the term ANOVA is used generically, it
usually refers to a one-way ANOVA. Henceforth in this section, the term ANOVA refers to the one-way
flavor.
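A quick sketch of a one-way ANOVA in Python, assuming scipy is available; the three groups of test scores below are purely hypothetical.

    # One-way ANOVA: do mean test scores differ across education levels?
    from scipy import stats

    high_school = [72, 75, 68, 70, 74]
    bachelor = [78, 82, 80, 77, 85]
    master = [88, 84, 90, 86, 89]

    # H0: all group means are equal; a small p-value suggests at least one differs
    f_stat, p_value = stats.f_oneway(high_school, bachelor, master)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")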
b. MANOVA
The obvious difference between ANOVA and a "Multivariate Analysis of Variance" (MANOVA) is the
"M", which stands for multivariate. In basic terms, a MANOVA is an ANOVA with two or more
continuous response variables. Like ANOVA, MANOVA has both a one-way flavor and a two-way
flavor; the number of factor variables involved distinguishes a one-way MANOVA from a two-way
MANOVA.
Correlation and Different Types of Regression Analysis
c. Correlation Analysis
Correlation analysis is applied in quantifying the association between two continuous variables, for
example between a dependent and an independent variable or between two independent variables.
d. Regression Analysis
Regression analysis refers to assessing the relationship between an outcome variable and one or more
predictor variables. The outcome variable is known as the dependent or response variable, and the risk
factors and confounders are known as predictors or independent variables. The dependent variable is
denoted by "y" and the independent variables by "x" in regression analysis.
The sample correlation coefficient, denoted by r, is estimated in correlation analysis. It ranges between -1
and +1 and quantifies the strength and direction of the linear association between two variables. The
correlation between two variables can be either positive, i.e. a higher level of one variable is related to a
higher level of the other, or negative, i.e. a higher level of one variable is related to a lower level of the
other.
The sign of the correlation coefficient shows the direction of the association, and the magnitude of the
coefficient shows the strength of the association.
For example, a correlation of r = 0.8 indicates a strong positive association between two variables, while
a correlation of r = -0.3 shows a weak negative association. A correlation near zero shows the absence of
a linear association between two continuous variables.
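As a small illustration (with invented values), the sample correlation coefficient r can be computed in Python as follows.

    # Sample correlation between two hypothetical continuous variables
    import numpy as np

    hours_studied = np.array([2, 4, 5, 7, 8, 10])
    exam_score = np.array([55, 60, 62, 70, 75, 82])

    r = np.corrcoef(hours_studied, exam_score)[0, 1]
    print(f"r = {r:.2f}")  # close to +1 => strong positive linear association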
i. Linear Regression
Linear regression is a type of model where the relationship between an independent variable and a
dependent variable is assumed to be linear. The estimate of variable "y" is obtained from the equation
y' − ȳ = b_yx (x − x̄) …… (1), and the estimate of variable "x" is obtained from the equation
x' − x̄ = b_xy (y − ȳ) …… (2). The graphical representations of equations (1) and (2) are known as
regression lines. These lines are obtained through the Method of Least Squares.
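A minimal sketch of how the coefficients b_yx and b_xy in equations (1) and (2) can be obtained by least squares; the data are the same invented values used above.

    # Least-squares slopes for the two regression lines of equations (1) and (2)
    import numpy as np

    x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
    y = np.array([55, 60, 62, 70, 75, 82], dtype=float)

    x_bar, y_bar = x.mean(), y.mean()
    sxy = np.sum((x - x_bar) * (y - y_bar))
    b_yx = sxy / np.sum((x - x_bar) ** 2)   # slope of the regression of y on x
    b_xy = sxy / np.sum((y - y_bar) ** 2)   # slope of the regression of x on y

    # Equation (1): y' - y_bar = b_yx (x - x_bar); predict y at a new x value
    x_new = 6.0
    y_pred = y_bar + b_yx * (x_new - x_bar)
    print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.4f}, predicted y at x=6: {y_pred:.1f}")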
RIDIT Analysis
Ridit scoring is a statistical method used to analyze ordered qualitative measurements. The
tools of ridit analysis were developed and first applied by Bross, who coined the term "ridit" by analogy
with other statistical transformations such as probit and logit. A ridit describes how the distribution of the
dependent variable in row i of a contingency table compares relative to an identified distribution (e.g., the
marginal distribution of the dependent variable).
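A small hedged sketch of how ridits can be computed in Python relative to a chosen reference distribution; the category counts below are made up for illustration.

    # Ridit of category j = proportion of the reference distribution below j
    # plus half the proportion in j; the mean ridit compares a group to the reference
    import numpy as np

    reference = np.array([40, 30, 20, 10], dtype=float)  # ordered categories, e.g. none/mild/moderate/severe
    group = np.array([10, 15, 25, 30], dtype=float)      # comparison group counts

    p = reference / reference.sum()
    ridits = np.concatenate(([0.0], np.cumsum(p)[:-1])) + 0.5 * p

    mean_ridit = np.sum(ridits * group / group.sum())
    print(ridits.round(3), round(mean_ridit, 3))  # mean ridit > 0.5 => group tends to higher categories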
Artificial Neural Networks (ANN)
An artificial neural network is an attempt to simulate the network of neurons that make up a human brain
so that the computer will be able to learn things and make decisions in a humanlike manner. ANNs are
created by programming regular computers to behave as though they are interconnected brain cells.
Artificial neural networks are created to digitally mimic the human brain. They are currently used for
complex analyses in various fields, ranging from medicine to engineering, and these networks can be used
to design the next generation of computers.
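As an illustrative sketch only, a tiny artificial neural network can be fitted in Python with scikit-learn's MLPClassifier; the XOR-style data are invented.

    # A small multi-layer perceptron "learning" a simple pattern
    from sklearn.neural_network import MLPClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    # One hidden layer of 8 interconnected units between the inputs and the output
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
    clf.fit(X, y)
    print(clf.predict([[1, 0]]))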
Conjoint Analysis
Conjoint analysis is a research technique used to quantify how people value the individual features of a
product or service. A conjoint survey question shows respondents a set of concepts, asking them to
choose or rank the most appealing ones.
Conjoint analysis is a survey-based statistical technique used in market research that helps determine how
people value different attributes (feature, function, benefits) that make up an individual product or
service.
The objective of conjoint analysis is to determine what combination of a limited number of attributes is
most influential on respondent choice or decision making. A controlled set of potential products or
services is shown to survey respondents and by analyzing how they make choices among these products,
the implicit valuation of the individual elements making up the product or service can be determined.
These implicit valuations (utilities or part-worths) can be used to create market models that estimate
market share, revenue and even the profitability of new designs.
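A hedged sketch of the core idea: part-worth utilities can be recovered from concept ratings by dummy-coded least squares. The attributes, levels and ratings below are entirely hypothetical.

    # Each row is a product concept coded against a baseline (brand_A, price_low)
    import numpy as np

    X = np.array([
        [0, 0, 0],   # brand_A, price_low
        [1, 0, 0],   # brand_B, price_low
        [0, 1, 0],   # brand_C, price_low
        [0, 0, 1],   # brand_A, price_high
        [1, 0, 1],   # brand_B, price_high
        [0, 1, 1],   # brand_C, price_high
    ], dtype=float)
    ratings = np.array([9, 7, 8, 5, 3, 4], dtype=float)

    X_design = np.column_stack([np.ones(len(X)), X])          # add an intercept
    coef, *_ = np.linalg.lstsq(X_design, ratings, rcond=None)  # part-worths
    print(dict(zip(["intercept", "brand_B", "brand_C", "price_high"], coef.round(2))))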
Canonical Correlation
Canonical correlation analysis is used to identify and measure the associations among two sets of
variables. Canonical correlation is appropriate in the same situations where multiple regression would be,
but where there are multiple intercorrelated outcome variables. Canonical correlation analysis
determines a set of canonical variates, orthogonal linear combinations of the variables within each set that
best explain the variability both within and between sets.
Canonical correlation analysis is a method for exploring the relationships between two multivariate sets
of variables (vectors), all measured on the same individual.
Consider, as an example, variables related to exercise and health. On one hand, you have variables
associated with exercise, observations such as the climbing rate on a stair stepper, how fast you can run a
certain distance, the amount of weight lifted on bench press, the number of push-ups per minute, etc. On
the other hand, you have variables that attempt to measure overall health, such as blood pressure,
cholesterol levels, glucose levels, body mass index, etc. Two types of variables are measured and the
relationships between the exercise variables and the health variables are of interest.
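For illustration, a minimal canonical correlation analysis on simulated "exercise" and "health" variables, using scikit-learn's CCA (the data are random, not real measurements).

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    n = 100
    exercise = rng.normal(size=(n, 3))   # e.g. climbing rate, run time, push-ups
    health = exercise @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))  # e.g. blood pressure, BMI

    cca = CCA(n_components=2)
    U, V = cca.fit_transform(exercise, health)
    # Correlation between the first pair of canonical variates
    print(round(float(np.corrcoef(U[:, 0], V[:, 0])[0, 1]), 3))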
Co-Integration
Co-integration tests identify scenarios where two or more non-stationary time series are integrated
together in a way that they cannot deviate from equilibrium in the long term. The tests are used to identify
the degree of sensitivity of two variables to the same average price over a specified period of time. In
other words, co-integration is a technique used to find a possible long-term correlation between time
series processes. The most popular co-integration tests include the Engle-Granger test, the Johansen test,
and the Phillips-Ouliaris test.
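A brief sketch of an Engle-Granger co-integration test in Python with statsmodels, applied to two simulated series that share a common stochastic trend.

    import numpy as np
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(1)
    trend = np.cumsum(rng.normal(size=500))   # common random walk (shared trend)
    x = trend + rng.normal(size=500)
    y = 0.8 * trend + rng.normal(size=500)

    t_stat, p_value, _ = coint(x, y)
    print(f"p = {p_value:.4f}")   # small p-value => reject "no co-integration"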
Multi-Criteria Decision Making (MCDM)
MCDM analyses the available options to determine whether each one is a favorable or unfavorable choice
for a particular application. It also compares each option, based on the selected criteria, against every
other available option in an attempt to assist the decision maker in selecting the option with minimal
compromise and maximum advantage. The criteria used in these analyses can be either qualitative or
quantitative.
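As a minimal sketch of the simplest MCDM approach, a weighted-sum model scores each option against the criteria; the scores and weights below are invented.

    import numpy as np

    # rows = options, columns = criteria (already normalised to 0-1, higher is better)
    scores = np.array([
        [0.8, 0.4, 0.9],
        [0.6, 0.9, 0.5],
        [0.7, 0.7, 0.7],
    ])
    weights = np.array([0.5, 0.3, 0.2])   # decision maker's importance weights

    overall = scores @ weights
    print("best option:", int(np.argmax(overall)), overall.round(2))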
Data Mining
Data mining is the process of analyzing a large batch of information to discern trends and patterns. Data
mining can be used by corporations for everything from learning about what customers are interested in
or want to buy to fraud detection and spam filtering.
Data mining programs break down patterns and connections in data based on what information users
request or provide. Social media companies use data mining techniques to commodify their users in order
to generate profit. This use of data mining has come under criticism lately as users are often unaware of
the data mining happening with their personal information, especially when it is used to influence
preferences.
Cluster Analysis
Cluster analysis is a multivariate data mining technique whose goal is to group objects (e.g., products,
respondents, or other entities) based on a set of user-selected characteristics or attributes. It is a basic and
important step of data mining and a common technique for statistical data analysis, and it is used in many
fields such as data compression, machine learning, pattern recognition and information retrieval.
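A minimal k-means clustering sketch with scikit-learn; the two-dimensional points are invented purely to show the grouping step.

    from sklearn.cluster import KMeans

    points = [[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]]

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    print(km.labels_)            # cluster membership of each object
    print(km.cluster_centers_)   # centre of each cluster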
Multi-Dimensional Scaling
Multidimensional scaling is a visual representation of distances or dissimilarities between sets of objects.
“Objects” can be colors, faces, map coordinates, political persuasion, or any kind of real or conceptual
stimuli (Kruskal and Wish, 1978). Objects that are more similar (or have shorter distances) are closer
together on the graph than objects that are less similar (or have longer distances). As well as interpreting
dissimilarities as distances on a graph, MDS can also serve as a dimension reduction technique for high-
dimensional data (Buja et al., 2007).
The term scaling comes from psychometrics, where abstract concepts (“objects”) are assigned numbers
according to a rule (Trochim, 2006). For example, you may want to quantify a person’s attitude to global
warming. You could assign a “1” to “doesn’t believe in global warming”, a 10 to “firmly believes in
global warming” and a scale of 2 to 9 for attitudes in between. You can also think of “scaling” as the fact
that you’re essentially scaling down the data (i.e. making it simpler by creating lower-dimensional data).
Data that is scaled down in dimension keeps similar properties. For example, two data points that are
close together in high-dimensional space will also be close together in low-dimensional space (Martinez,
2005). The “multidimensional” part is due to the fact that you aren’t limited to two dimensional graphs or
data. Three-dimensional, four-dimensional and higher plots are possible.
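As a hedged illustration, multidimensional scaling can be run in Python on a small, invented dissimilarity matrix with scikit-learn's MDS.

    import numpy as np
    from sklearn.manifold import MDS

    # Invented pairwise dissimilarities between four "objects"
    dissimilarities = np.array([
        [0.0, 1.0, 4.0, 5.0],
        [1.0, 0.0, 3.0, 4.5],
        [4.0, 3.0, 0.0, 1.5],
        [5.0, 4.5, 1.5, 0.0],
    ])

    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(dissimilarities)
    print(coords)   # similar objects end up closer together on the 2-D map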
Correspondence Analysis
Correspondence analysis reveals the relative relationships between and within two groups of variables,
based on data given in a contingency table. For brand perceptions, these two groups are brands and the
attributes that apply to these brands. For example, let’s say a company wants to learn which attributes
consumers associate with different brands of beverage products. Correspondence analysis helps measure
similarities between brands and the strength of brands in terms of their relationships with different
attributes. Understanding the relative relationships allows brand owners to pinpoint the effects of previous
actions on different brand related attributes, and decide on next steps to take.
Correspondence analysis is valuable in brand perceptions for a couple of reasons. When attempting to
look at relative relationships between brands and attributes, brand size can have a misleading effect;
correspondence analysis removes this effect. Correspondence analysis also gives an intuitive quick view
of brand attribute relationships (based on proximity and distance from origin) that isn’t provided by many
other graphs.
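For illustration only, simple correspondence analysis can be sketched as a singular value decomposition of the standardized residuals of a contingency table; the brand-by-attribute counts below are invented.

    import numpy as np

    counts = np.array([
        [30, 10, 5],    # brand A against three attributes
        [10, 25, 15],   # brand B
        [5, 15, 35],    # brand C
    ], dtype=float)

    P = counts / counts.sum()
    r = P.sum(axis=1, keepdims=True)         # row masses
    c = P.sum(axis=0, keepdims=True)         # column masses
    S = (P - r @ c) / np.sqrt(r @ c)         # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)

    row_coords = (U * sv) / np.sqrt(r)       # principal coordinates of the brands
    col_coords = (Vt.T * sv) / np.sqrt(c.T)  # principal coordinates of the attributes
    print(row_coords[:, :2].round(3))
    print(col_coords[:, :2].round(3))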
Time Series Analysis
What sets time series data apart from other data is that the analysis can show how variables change over
time. In other words, time is a crucial variable because it shows how the data adjusts over the course of
the data points as well as the final results. It provides an additional source of information and a set order
of dependencies between the data.
Time series analysis typically requires a large number of data points to ensure consistency and reliability.
An extensive data set ensures you have a representative sample size and that analysis can cut through
noisy data. It also ensures that any trends or patterns discovered are not outliers and can account for
seasonal variance. Additionally, time series data can be used for forecasting—predicting future data based
on historical data.
Econometric Analysis
Econometrics is the use of statistical methods to develop theories or test existing hypotheses in economics
or finance. Econometrics relies on techniques such as regression models and null hypothesis testing.
Econometrics can also be used to try to forecast future economic or financial trends.
As with other statistical tools, econometricians should be careful not to infer a causal relationship from
statistical correlation. Some economists have criticized the field of econometrics for prioritizing statistical
models over economic reasoning.
Data Stationarity
A common assumption in many time series techniques is that the data are stationary.
A stationary process has the property that the mean, variance and autocorrelation structure do not change
over time. Stationarity can be defined in precise mathematical terms, but for our purpose we mean a flat
looking series, without trend, constant variance over time, a constant autocorrelation structure over time
and no periodic fluctuations (seasonality).
For practical purposes, stationarity can usually be determined from a run sequence plot.
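Beyond the run sequence plot, a common formal check is the augmented Dickey-Fuller test; here is a hedged sketch with statsmodels on a simulated random walk (non-stationary by construction).

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)
    random_walk = np.cumsum(rng.normal(size=300))   # has a unit root => not stationary

    adf_stat, p_value, *_ = adfuller(random_walk)
    print(f"ADF = {adf_stat:.2f}, p = {p_value:.3f}")   # large p => cannot reject a unit root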
Granger Causality
If you’ve explored the vector autoregressive literature, it is likely that you have come across the term
Granger causality. Granger causality is an econometric test used to verify the usefulness of one variable to
forecast another.
At this point, you may be asking yourself what does it mean for a variable to be “helpful” in forecasting?
In simple terms, a variable is “helpful” for forecasting, if when added to the forecast model, it reduces the
forecasting error.
In the context of vector autoregressive models, a variable fails to Granger-cause another variable if its:
Lags are not statistically significant in the equation for the other variable.
Past values are not significant in predicting the future values of the other variable.
Vector Error Correction Model / Vector Autoregression Model
(VEC/VAR)
The vector autoregressive (VAR) model is a general framework used to describe the dynamic
interrelationship among stationary variables. So, the first step in time-series analysis should be to
determine whether the levels of the data are stationary. If not, take the first differences of the series and
try again. Usually, if the levels (or log-levels) of your time series are not stationary, the first differences
will be.
If the time series are not stationary then the VAR framework needs to be modified to allow consistent
estimation of the relationships among the series. The vector error correction (VEC) model is just a special
case of the VAR for variables that are stationary in their differences (i.e., I(1)). The VEC can also take
into account any co-integrating relationships among the variables.
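As a hedged sketch, a small VAR can be fitted with statsmodels on (already stationary) simulated data; for co-integrated I(1) series one would use the VECM class instead.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(3)
    data = pd.DataFrame(rng.normal(size=(200, 2)), columns=["dy1", "dy2"])  # stationary series

    model = VAR(data)
    results = model.fit(2)                 # VAR with two lags
    forecast = results.forecast(data.values[-results.k_ar:], steps=5)
    print(forecast)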
Delphi Technique
The Delphi method is a process used to arrive at a group opinion or decision by surveying a panel of
experts. Experts respond to several rounds of questionnaires, and the responses are aggregated and shared
with the group after each round.
The experts can adjust their answers each round, based on how they interpret the “group response”
provided to them. The ultimate result is meant to be a true consensus of what the group thinks.
Game Theory
Game theory is a theoretical framework for conceiving social situations among competing players. The
intention of game theory is to produce optimal decision-making by independent and competing actors in a
strategic setting. Using game theory, real-world scenarios for such situations as pricing competition and
product releases (and many more) can be laid out and their outcomes predicted.
Scenarios include the prisoner's dilemma and the dictator game among many others. Different types of
game theory include cooperative/non-cooperative, zero-sum/non-zero-sum, and simultaneous/sequential.
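A tiny illustration of the prisoner's dilemma: with invented payoffs, a brute-force check finds the pure-strategy Nash equilibrium (mutual defection).

    import numpy as np

    # Rows: player 1's action, columns: player 2's action (0 = cooperate, 1 = defect)
    p1 = np.array([[-1, -3],
                   [ 0, -2]])   # player 1's payoffs
    p2 = p1.T                   # symmetric game: player 2's payoffs

    actions = ("cooperate", "defect")
    for a in (0, 1):
        for b in (0, 1):
            best_a = p1[:, b].argmax() == a   # player 1 cannot gain by deviating
            best_b = p2[a, :].argmax() == b   # player 2 cannot gain by deviating
            if best_a and best_b:
                print("Nash equilibrium:", actions[a], "/", actions[b])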
Formal Logic
Formal logic is the abstract study of propositions, statements, or assertively used sentences and of
deductive arguments. The discipline abstracts from the content of these elements the structures or logical
forms that they embody. The logician customarily uses a symbolic notation to express such structures
clearly and unambiguously and to enable manipulations and tests of validity to be more easily applied.
Discrete Mathematics
Discrete mathematics is the study of mathematical structures that are countable or otherwise distinct and
separable. Examples of structures that are discrete are combinations, graphs, and logical statements.
Discrete structures can be finite or infinite. Discrete mathematics is in contrast to continuous
mathematics, which deals with structures which can range in value over the real numbers, or have some
non-separable quality.
Thematic Analysis
There are various approaches to conducting thematic analysis, but the most common form follows a six-
step process: familiarization, coding, generating themes, reviewing themes, defining and naming themes,
and writing up.
Cybernetic Modeling
Cybernetics is control theory as it is applied to complex systems. Cybernetics is associated with models
in which a monitor compares what is happening to a system at various sampling times with some standard
of what should be happening, and a controller adjusts the system’s behavior accordingly.
Simulations
A simulation is the imitation of the operation of a real-world process or system. The behavior of a system
is studied by generating an artificial history of the system through the use of random numbers. These
numbers are used in the context of a simulation model, which is the mathematical, logical and symbolic
representation of the relationships between the objects of interest of the system. After the model has been
validated, the effects of changes in the environment on the system, or the effects of changes in the system
on system performance can be predicted using the simulation model.
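A minimal Monte Carlo sketch of the idea: random numbers generate an artificial history of daily demand, from which a performance measure is estimated (all figures invented).

    import numpy as np

    rng = np.random.default_rng(0)
    n_days, stock_per_day = 10_000, 12

    demand = rng.poisson(lam=10, size=n_days)            # artificial demand history
    unmet = np.clip(demand - stock_per_day, 0, None)     # demand the daily stock cannot cover
    print("average unmet demand per day:", unmet.mean())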
Linear Programming
Linear programming (LP) or Linear Optimization may be defined as the problem of maximizing or
minimizing a linear function that is subject to linear constraints. The constraints may be equalities or
inequalities. Such optimization problems typically involve the calculation of maximum profit or minimum
cost. Linear programming problems are an important class of optimization problems that help to find the
feasible region and optimize the solution in order to obtain the highest or lowest value of the objective
function.
In other words, linear programming is considered an optimization method to maximize or minimize the
objective function of a given mathematical model subject to a set of requirements represented as linear
relationships. The main aim of the linear programming problem is to find the
optimal solution.
Linear programming is the method of considering the different inequalities relevant to a situation and
calculating the best value that can be obtained under those conditions. The usual assumptions made while
working with linear programming are proportionality, additivity, divisibility, certainty and non-negativity
of the decision variables.
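As an illustrative sketch, a tiny linear program can be solved in Python with scipy's linprog; the objective and constraint coefficients below are invented.

    from scipy.optimize import linprog

    # Maximise profit 3x + 5y; linprog minimises, so negate the objective
    c = [-3, -5]
    A_ub = [[1, 2],    # machine hours:  x + 2y <= 14
            [3, 1]]    # labour hours:  3x +  y <= 15
    b_ub = [14, 15]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # optimal (x, y) and the maximised profit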
1. MPLUS
Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data.
Mplus offers researchers a wide choice of models, estimators, and algorithms in a program that has an
easy-to-use interface and graphical displays of data and analysis results.
Mplus is a highly flexible, powerful statistical analysis software program that can fit an extensive variety
of statistical models using one of many estimators available. Perhaps its greatest strengths are in its
capabilities to model latent variables, both continuous and categorical, which underlie its flexibility.
Among the many models Mplus can fit are regression and path analysis models, exploratory and
confirmatory factor analysis, structural equation models, growth models, mixture (latent class) models,
and multilevel models.
2. EViews
EViews, or Econometric Views, is a statistical package used for time-series-oriented econometric
analysis. It is widely used by scholars, policy makers, government agencies and academics, and because
statistical analysis underpins economic decision making, it helps researchers manage, analyse and model
their data. Model simulation can be done and forecasts can be generated with EViews quickly and
effectively.
Due to its all-inclusive nature, it can be used for the following tasks:
Estimation
Forecasting
Simulation
Graphics
Statistical analysis
Data management
3. Primavera
Primavera is advanced software that is trusted by project managers and companies from different
industries globally. It provides sophisticated solutions to plan, manage and execute projects of any size
and scale. It increases project efficiency significantly by identifying bottlenecks and schedule overruns.
Oracle Primavera® is used for major projects in industries such as engineering and construction,
aerospace and defense, utilities, oil and gas, chemicals, industrial manufacturing, automotive, financial
services, communications, travel and transportation, healthcare, and government.
4. Lingo
LINGO is a comprehensive tool designed to make building and solving Linear, Nonlinear (convex &
nonconvex/Global), Quadratic, Quadratically Constrained, Second Order Cone, Semi Definite, Stochastic,
and Integer optimization models faster, easier and more efficient. LINGO provides a completely
integrated package that includes a powerful language for expressing optimization models, a full featured
environment for building and editing problems, and a set of fast built-in solvers. The recently released
LINGO 20 includes a number of significant enhancements and new features.
5. Lindo
LINDO's linear, nonlinear, integer, stochastic and global programming solvers have been used by thousands of
companies worldwide to maximize profit and minimize cost on decisions involving production planning,
transportation, finance, portfolio allocation, capital budgeting, blending, scheduling, inventory, resource
allocation and more.
6. Mendeley
Mendeley Reference Manager is a free web and desktop reference management application. It helps you
simplify your reference management workflow so you can focus on achieving your goals. With Mendeley
Reference Manager you can: Store, organize and search all your references from just one library.
7. Visio
With Visio on your PC or mobile device, you can: Organize complex ideas visually. Get started with
hundreds of templates, including flowcharts, timelines, floor plans, and more. Add and connect shapes,
text, and pictures to show relationships in your data.
8. MATLAB
MATLAB® is a programming platform designed specifically for engineers and scientists to analyze and
design systems and products that transform our world. The heart of MATLAB is the MATLAB language,
a matrix-based language allowing the most natural expression of computational mathematics.
MATLAB is a high-performance language for technical computing. It integrates computation,
visualization, and programming in an easy-to-use environment where problems and solutions are
expressed in familiar mathematical notation. Typical uses include math and computation; algorithm
development; modeling, simulation and prototyping; data analysis, exploration and visualization;
scientific and engineering graphics; and application development, including building graphical user
interfaces.
9. AMOS
AMOS is statistical software; the name stands for Analysis of Moment Structures. AMOS is an add-on
SPSS module, and it is used especially for structural equation modeling, path analysis, and confirmatory
factor analysis. It is also known as analysis-of-covariance-structures or causal modeling software.
10. SmartPLS
SmartPLS and AMOS are two different software packages in nature. If you want to develop a new theory
(exploratory research), then SmartPLS is preferred, and if you want to test an existing theory
(confirmatory research), then AMOS will be the better choice. Your research objectives will assist you in
deciding between them.
11. LISREL
LISREL is statistical software that is used for structural equation (structural regression) modeling.
Structural equation models are systems of linear equations. LISREL performs simultaneous estimation of
the structural model and the measurement model; the structural model alone assumes that all variables are
measured without error. The types of models that can be fitted with LISREL include:
measurement models,
structural equation models based on continuous or ordinal data,
multilevel models for continuous and categorical data using a number of link functions,
generalized linear models based on complex survey data.
Additional statistical analyses that can be performed include, to name a few:
exploratory factor analysis (EFA),
multivariate analysis of variance (MANOVA),
logistic and probit regression,
censored regression,
survival analysis.