Indian Institute of Technology (BHU), Varanasi: Software Metrics (CS-7003) "Empirical Investigation "
Software Metrics
(CS-7003)
“Empirical Investigation”
Choosing an investigative technique
A survey is a retrospective study of a situation that tries to
document relationships and outcomes
A survey is always done after an event has occurred
In software engineering, surveys are often conducted by using
polls to determine which option receives the most
votes
Based on the majority vote, the preferred method, technique,
tool, or relation is determined
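The poll-and-vote step can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `tally_poll`, the tie rule, and the sample responses are all assumptions made for the example.

```python
from collections import Counter

def tally_poll(votes):
    """Return (winner, share) for a list of poll responses.

    The winner is the option with the most votes and share is its
    fraction of all votes. A tie for first place gives winner None
    (an assumed rule; the slides do not say how ties are handled).
    """
    if not votes:
        raise ValueError("no votes recorded")
    ranked = Counter(votes).most_common()
    top_count = ranked[0][1]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, top_count / len(votes)
    return ranked[0][0], top_count / len(votes)

# Hypothetical responses to "which review technique worked best?"
responses = ["inspection", "walkthrough", "inspection",
             "pair review", "inspection"]
winner, share = tally_poll(responses)
print(winner, share)
```

Reporting the share alongside the winner shows whether the top option has a true majority or only the largest minority of votes.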
A survey is a retrospective study, which means we do not have
any control over the situation
Both case studies and formal experiments are usually
not retrospective: we decide in advance what we want to
investigate and plan what data to capture to support
the investigation
A case study is a research technique where you identify
key factors that may affect the outcome of an activity
and then document the activity: its inputs, constraints,
outputs, and resources.
A formal experiment is a rigorous, controlled investigation
of an activity, where key factors are identified and
manipulated to document their effects on the outcome.
Because formal experiments require a great deal of control,
they tend to be small, involving a small number of people or
events
Case studies generally look at typical projects, rather
than trying to capture information about all possible
cases.
A case study can be thought of as “research in the typical”,
whereas a survey is “research in the large”
Several guidelines help decide whether to use a survey,
a formal experiment, or a case study.
The first step is deciding whether the investigation is
retrospective or not
If the activity under investigation has already occurred,
you must perform a survey or a case study; if it has not yet
occurred, you may choose between a case study and a formal
experiment.
The central factor in that choice is the level of control
required for a formal experiment.
If you have a high level of control over the variables that
can affect the outcome, then you can consider an
experiment
If you do not have that control, then a case study is the
preferred technique
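The guidelines above amount to a small decision rule. A minimal sketch, assuming the two questions can be answered yes or no (the function name and the boolean parameters are hypothetical simplifications):

```python
def choose_technique(already_occurred, high_control):
    """Suggest investigative techniques from the two guideline questions.

    already_occurred: True if the activity under study has already happened
    high_control:     True if the variables affecting the outcome can be
                      controlled by the investigator
    """
    if already_occurred:
        # Retrospective study: nothing can be manipulated any more.
        return ["survey", "case study"]
    if high_control:
        return ["formal experiment"]
    return ["case study"]

print(choose_technique(already_occurred=False, high_control=True))
```

In practice control is a matter of degree, so the boolean `high_control` is only a rough stand-in for a judgment about how many variables can really be held fixed.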
Stating the hypothesis
The first step in an investigation is deciding what
you want to investigate
This step helps us identify which type of research
technique is most appropriate for the situation
The goal of the research can be expressed as a
hypothesis, that is, what we want to know from the
investigation
A hypothesis is a tentative theory that explains
the behavior we want to explore
For example: method A provides better results
than method B
Maintaining control over variables
Once we have an explicit hypothesis, we can figure
out which variables can affect its truth.
After identifying the variables, we must decide how
much control we have over them
A state variable is a factor that can characterize your
project and influence your evaluation results
State variables are sometimes called independent
variables, because they can be manipulated to affect
the outcome
A dependent variable is one whose value is affected by
changing one or more independent variables
Variables should be selected so that they
exhibit the characteristics of the project
• In the above example, the case study might involve
choosing a language that is used in most of
the projects, rather than trying to choose a set of
projects that covers as many languages as possible.
• Thus a state variable is used to distinguish between
Making your investigation meaningful
Confirming theories and conventional wisdom
Conventional wisdom holds that certain approaches are
the best ones in software engineering
Many national and international standards follow
this conventional wisdom
Many organizations use standard limits on
structural measures, or rules of thumb about module
size, to assure the quality of their software
For example, a limit based on McCabe's measure, or a rule
that module size must be less than 200 lines of code
Case studies and surveys can be used to check the
validity of such claims within the
organization
Formal experiments can be used to establish the
context in which certain standards, methods, and
tools are recommended for use.
Exploring relationships
Software practitioners are often interested in the
relationships among various attributes of resources and
software products
◦ How does the project team's experience with the application area
affect the quality of the resulting code?
◦ How does requirements quality affect the productivity of the
designers?
◦ How does the design structure affect the maintainability of the
code?
Evaluating the accuracy of models
Models are often used to predict the outcome of an
activity or to guide the use of methods or tools
For example, size models such as function points suggest
how large the code may be
Cost models predict how much development or
maintenance will cost
Capability maturity models guide the use of techniques
such as configuration management or the introduction
of testing tools and methods
Formal experiments can confirm or refute the accuracy
and dependability of these models
Models present a particularly difficult problem when
designing an experiment or case study, because their
predictions often affect the outcome
Predictions become goals, and the developers
strive to achieve them, intentionally or not.
This effect is common when cost and schedule models
are used and project managers turn the predictions
into targets for completion.
For this reason, experiments evaluating models can
be designed as double-blind experiments, where the
participants do not know what the prediction is until the
experiment is done
Some models, such as reliability models, do not influence the
outcome, since reliability measured as mean time to
failure cannot be evaluated until the project is
completed
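The slides do not name an accuracy measure. One widely used summary for prediction models is the mean magnitude of relative error, sketched here under that assumption; the function names and the effort figures are hypothetical.

```python
def relative_errors(predicted, actual):
    """Magnitude of relative error, |actual - predicted| / actual, per project.

    Assumes every actual value is positive.
    """
    if len(predicted) != len(actual):
        raise ValueError("need one prediction per actual value")
    return [abs(a - p) / a for p, a in zip(predicted, actual)]

def mean_relative_error(predicted, actual):
    """Average of the per-project relative errors."""
    errors = relative_errors(predicted, actual)
    return sum(errors) / len(errors)

# Hypothetical effort in person-months for four completed projects.
predicted = [100, 50, 80, 200]
actual = [120, 50, 100, 160]
print(round(mean_relative_error(predicted, actual), 3))
```

Such a score is only meaningful if the predictions were hidden from the teams, for the reason given above: a prediction that became a target tends to look accurate.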
Validating measures
In general, a study can be conducted to test whether a
given measure appropriately reflects changes in the
attribute it is supposed to capture
Validating measures is fraught with problems
Validation is often performed by correlating one measure with
another
It is very important to validate against a second measure
that is itself a direct and valid measure of the factor
it reflects
Measures must conform to human notions of the attribute
being measured, while being expressed in mathematical
notation
This notation must preserve the observed relationships,
whether the measure is subjective or objective
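Correlating a candidate measure with a trusted one can be sketched with Pearson's correlation coefficient. The slides do not say which coefficient to use, so this choice is an assumption, as are the data and the names `candidate` and `reference`.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Hypothetical values per module: the measure being validated, and a
# second measure already accepted as valid for the same attribute.
candidate = [3, 7, 4, 10, 6]
reference = [1, 4, 2, 6, 3]
r = pearson(candidate, reference)
print(round(r, 3))
```

A high correlation supports the candidate only to the extent that the reference measure is itself valid, which is the caution raised on this slide.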
Planning formal experiments
Formal experiments require a great deal of care and
planning if they are to provide meaningful, useful results
Procedure for performing experiments:
◦ Conception
◦ Design
◦ Preparation
◦ Execution
◦ Analysis
◦ Dissemination and decision making
Conception
The first step is to decide what you want to learn and
define the goal of your experiment
The conception stage includes the analysis needed
to ensure that a formal experiment is the most
appropriate research technique to use.
Next, the objective of the study must be stated clearly and
precisely
The objective may be to show that one tool or method is
superior in some way to another
Alternatively, it may be to show how differences in environmental
conditions or quality of resources affect the use or output
of a method or tool
The objective must be stated so that it can be clearly
evaluated at the end of the experiment; that is, it should be
stated as a question you want to answer
The next step is to design an experiment that will provide
the answer.
Design
Once the objective is clearly stated, you must translate
it into a formal hypothesis
There are usually two hypotheses: a) the null hypothesis and
b) the experimental (alternative) hypothesis
The null hypothesis assumes that there is no
significant difference between two treatments (methods,
tools, techniques, etc.) with respect to the dependent
variable you are measuring (such as productivity, cost,
or quality)
The alternative hypothesis, on the other hand, posits a
significant difference between the two treatments
It is always easy to tell which hypothesis is the null and
which the alternative from the statistical assumption:
the null hypothesis is assumed to be true unless the
data indicate otherwise
Testing the hypothesis means determining whether
the data are convincing enough to reject the null
hypothesis and accept the alternative as true.
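One way to decide whether the data are convincing enough is a permutation test, sketched below. The slides do not prescribe a particular test, so this technique is swapped in for illustration; the function name, the scores, and the 0.05 threshold are assumptions.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, trials=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis the treatment labels are interchangeable,
    so the labels are reshuffled many times and we count how often the
    shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

# Hypothetical productivity scores under methods A and B.
method_a = [10, 11, 12, 13, 14, 15]
method_b = [1, 2, 3, 4, 5, 6]
p = permutation_p_value(method_a, method_b)
reject_null = p < 0.05  # conventional threshold, assumed here
print(p, reject_null)
```

A small p-value means the observed difference would be rare if the null hypothesis held, which is the basis for rejecting it.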
Preparation
◦ Preparation involves readying the subjects for the
application of the treatment
◦ For example, preparation of the experiment may
involve purchasing tools, training staff, or configuring
hardware in a certain way
◦ Instructions must be written out or recorded properly
◦ To ensure that the plan is complete and the instructions
are understandable, a dry run of the experiment on a small
set of people may be useful
Execution
Following the steps laid out in the plan, and measuring the
attributes as prescribed in the plan, you apply the
treatment to the experimental subjects
So that comparison of the results is sensible, items must
be measured carefully and treatments applied consistently
Finally, the experiment is executed
Analysis
The analysis has two parts
First, you must review all the measurements taken to
make sure that they are valid and useful
You then organize the measurements into sets of data that will
be examined as part of the hypothesis-testing process
Second, you analyze the sets of data according to
statistical principles
These statistical tests tell you whether the null
hypothesis is supported or refuted by the results of the
experiment
Dissemination and decision making
At the end of the analysis phase, you can draw a conclusion
about how the different characteristics examined affected
the outcome
Document the conclusions in a way that allows others to
duplicate the experiment and confirm the conclusions in a
similar setting
All the key aspects of the research must be documented
properly: the objectives, the hypothesis, the experimental
subjects and objects, the response variables, the state
variables, the treatments, and the resulting data
You must state the conclusion clearly, making sure to
address any problems experienced during the running
of the experiment
◦ For example, staff changes or tool upgrades must be noted
The experimental results may be used in three ways:
◦ First, you may use them to support decisions about how you will
develop and maintain software in the future: what tools or methods
will be used, and in what situations
◦ Second, others may use your results to suggest changes to their
development environment
◦ Others are likely to replicate your experiment to confirm the results
on their similar projects
◦ Third, you and others may perform similar experiments with
variations in experimental subjects or state variables
◦ These new experiments will help to understand how the results
are affected by carefully controlled changes.
◦ For example, if your experiment demonstrates a positive change
in quality from using C++, others may test whether
quality can be improved still further by using C++ in concert with
a C++-related tool or in a particular application domain
Principles of experimental design
Useful results depend on careful, rigorous, and
complete experimental design
Simple designs help to make the experiment practical,
minimizing the use of time, money, personnel, and
experimental resources
Simple designs are also easy to analyze
Experimental design has two important concepts:
◦ Experimental units
◦ Experimental errors
Experimental error describes the failure of two
identically treated experimental units to yield identical
results
The error reflects a host of problems:
◦ Errors of experimentation
◦ Errors of observation
◦ Errors of measurement
◦ The variation in experimental resources
◦ The combined effect of all extraneous factors that can influence
the characteristic under study, but which have not been singled out
for attention in the investigation
The aim is to eliminate the effects of other variables, so
that only the effects of the independent
variables are reflected in the values of the dependent
variables.
Doing this would eliminate experimental error
Complete elimination is not possible in reality
Instead, experiments are designed so that the effects of
irrelevant variables are distributed across all the
experimental conditions
This problem can be addressed by
using replication, randomization, and local control
Randomization
Replication makes possible a statistical test of the significance of
the results
But it does not ensure the validity of the results
Randomization is the random assignment of subjects to
groups, or of treatments to experimental units, to support the
validity of the results
Randomization does not guarantee independence, but it allows us to
assume that the correlation in any comparison of treatments is as
small as possible
By randomly assigning treatments to experimental units, the
results can be kept from being biased by sources of variation over
which we have no control
For example, the results of an experimental trial can sometimes be
affected by the time, the place, or unknown characteristics of the
participants
These uncontrollable factors can have effects that hide or skew the
results of the controllable variables
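Random assignment of subjects to treatments can be sketched as below. The function name `randomize`, the round-robin dealing after the shuffle, and the sample names are assumptions made for illustration.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign every unit to one treatment.

    The units are shuffled and then dealt out in turn, so group sizes
    differ by at most one. Pass a seed only to make a run repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

# Hypothetical participants and tools.
programmers = ["ana", "ben", "chen", "dev", "eli", "fay"]
groups = randomize(programmers, ["tool A", "tool B"], seed=42)
print(groups)
```

Because the shuffle ignores everything about the participants, unknown characteristics such as skill or motivation have no systematic way to line up with one treatment.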
Replication
Replication is the repetition of the basic experiment
The repetition is done under identical conditions, rather
than by repeating measurements on the same
experimental unit
Replication provides an estimate of experimental error
that acts as a basis for assessing the importance of
observed differences in an independent variable, using
statistical techniques
Replication enables us to estimate the mean effect of
any experimental factor
Confounded effects must be avoided in replication
Two or more variables are confounded when it is impossible
to separate their effects in the subsequent analysis
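How replicates yield an estimate of experimental error can be sketched with the standard error of the mean. The function name and the four outcome values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def replicate_summary(results):
    """Mean outcome and standard error over replicated runs of one treatment.

    The spread among identically treated units is the estimate of
    experimental error; dividing by sqrt(n) gives the error of the mean.
    """
    if len(results) < 2:
        raise ValueError("at least two replicates are needed")
    return mean(results), stdev(results) / sqrt(len(results))

# Hypothetical defect counts from four replicates of the same treatment.
average, standard_error = replicate_summary([12, 15, 11, 14])
print(average, round(standard_error, 3))
```

A difference between two treatments is worth attention only when it is large compared with this error, which is why a single unreplicated run says little.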
Local control
One of the key factors that distinguishes a formal
experiment from a case study is the degree of control
Local control is the aspect of the experimental design that
reflects how much control you have over the placement of
subjects in experimental units and the organization of
those units
Local control is usually discussed in terms of two
characteristics of the design: blocking and balancing of
the units
Blocking means allocating experimental units to blocks or
groups so that the units within a block are relatively
homogeneous.
The blocks are designed so that the predictable variation
among units is confounded with the effects of the
blocks
The experimental design thus captures the anticipated variation in
the blocks, so that this variation does not contribute to the
experimental error
Balancing is the blocking and assigning of treatments so
that an equal number of subjects is assigned to each
treatment wherever possible
It is desirable because it simplifies the statistical
analysis
Designs can range from completely balanced to not
balanced at all
In experiments investigating only one factor, blocking and
balancing play an important role
The experimental design should include a description of the
blocks defined and the allocation of treatments to each
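Blocking and balancing together can be sketched as random assignment inside each block. The blocking variable (experience), the function name, and the subject identifiers are assumptions for the example.

```python
import random
from collections import defaultdict

def blocked_assignment(subjects, block_of, treatments, seed=None):
    """Assign treatments at random within each block.

    subjects:   iterable of subject identifiers
    block_of:   mapping from subject to its block label
    treatments: list of treatment names
    Within a block the treatments are dealt out in turn after a shuffle,
    so each block is as balanced as its size allows.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of[subject]].append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects blocked by experience level.
subjects = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
experience = {"s1": "junior", "s2": "junior", "s3": "junior", "s4": "junior",
              "s5": "senior", "s6": "senior", "s7": "senior", "s8": "senior"}
result = blocked_assignment(subjects, experience, ["A", "B"], seed=3)
print(result)
```

Each treatment receives two juniors and two seniors, so a difference between A and B cannot be explained by one group simply being more experienced.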
Types of experimental design
There are many types of experimental design likely to
be used in software engineering research
Because the type of design constrains the type of
analysis that can be performed and the type of
conclusions that can be drawn, it is useful to understand
several types of design
For example, consider the F-statistic used in an analysis of variance
The choice of calculation depends on the
experimental design, including the number of variables
and the way in which the subjects are grouped and balanced
Most designs in software engineering research are based
on two simple relations between factors:
◦ Crossing
◦ Nesting
Crossing
Expressing the design in terms of factors, called a
factorial design, tells you how many different
treatment combinations are required
Two factors A and B in a design are said to be
crossed if each level of each factor appears with each
level of the other factor; this is denoted A × B
Example of a crossed design
In the figure, ai represents the levels of factor A and
bj represents the levels of factor B
The first row shows level 1 of A
occurring with level 1 of B, with level 2 of B, and with
level 3 of B
The first column shows level 1 of B
occurring with each of the two levels of A
A crossed design with m levels of the first factor and n
levels of the second factor has mn cells, with each cell
representing a particular situation
The figure has 2 levels of A and 3 levels of B, which
results in 6 possible treatments in total
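The cells of a crossed design are the Cartesian product of the factor levels. A minimal sketch using the 2 × 3 layout described above; the level labels and the function name are illustrative.

```python
from itertools import product

def crossed_cells(levels_a, levels_b):
    """Every combination of a level of A with a level of B."""
    return list(product(levels_a, levels_b))

levels_a = ["a1", "a2"]
levels_b = ["b1", "b2", "b3"]
cells = crossed_cells(levels_a, levels_b)

# One row per level of A, one column per level of B.
for a in levels_a:
    print("  ".join(a + b for b in levels_b))
print(len(cells), "treatments")
```

With m levels of A and n levels of B the list always has m × n entries, which is the count of treatment combinations the design requires.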
Nesting
Factor B is nested within factor A if each meaningful
level of B occurs with only one level of factor A
The relationship is written B(A), where B is the nested
factor and A is the nest factor
Example of a two-factor nested design
In the previous figure we have two levels of factor A and
three levels of factor B
Here B depends on A: each level of B occurs
with only one level of A, that is, B is nested within A
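The nesting condition can be checked mechanically from the list of (A level, B level) combinations used in a design. The function name and the example pairs are hypothetical; the original figure is not reproduced here, so the pairs only mirror the stated two levels of A and three of B.

```python
from itertools import product

def is_nested(pairs):
    """True if every level of B occurs with exactly one level of A.

    pairs: iterable of (a_level, b_level) combinations in the design.
    """
    parent = {}
    for a, b in pairs:
        # Remember the first A level seen with b; any other one breaks nesting.
        if parent.setdefault(b, a) != a:
            return False
    return True

nested_design = [("a1", "b1"), ("a1", "b2"), ("a2", "b3")]
crossed_design = list(product(["a1", "a2"], ["b1", "b2", "b3"]))

print(is_nested(nested_design), is_nested(crossed_design))
```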
Advantages of expressing design in terms of factorials
Factorial designs ensure that resources are used most
efficiently
The information obtained in the experiment is complete and
reflects the various possible interactions among
variables
Factorial design involves implicit replication,
yielding the related benefit of reduced
experimental error
Selecting an experimental design
Choosing the number of factors
An experiment may involve a single variable (factor) or several
The more comparisons involved, the more complex the
experiment is to conduct
A comparison here means, for example, the effect of using
different languages or of using several different tools
One-variable experiments are simple to analyze because
the effect of the single factor is isolated from the other
variables that may affect the outcome
Example (two panels): no interaction between factors;
interaction between factors
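Whether two factors interact can be read from the cell means of a 2 × 2 design: with no interaction, the effect of one factor is the same at both levels of the other. A minimal sketch with hypothetical cell means and a hypothetical function name:

```python
def interaction_contrast(cell_means):
    """Interaction contrast for a 2 x 2 design.

    cell_means[(a, b)] is the mean outcome at level a of factor A and
    level b of factor B, with levels coded 0 and 1. The result is the
    effect of A at B=1 minus the effect of A at B=0; zero means the
    two factors do not interact.
    """
    effect_of_a_at_b0 = cell_means[(1, 0)] - cell_means[(0, 0)]
    effect_of_a_at_b1 = cell_means[(1, 1)] - cell_means[(0, 1)]
    return effect_of_a_at_b1 - effect_of_a_at_b0

# Hypothetical mean quality scores.
no_interaction = {(0, 0): 10, (1, 0): 14, (0, 1): 12, (1, 1): 16}
interaction = {(0, 0): 10, (1, 0): 14, (0, 1): 12, (1, 1): 22}

print(interaction_contrast(no_interaction), interaction_contrast(interaction))
```

On a plot of the cell means, a zero contrast corresponds to parallel lines and a non-zero contrast to lines that diverge or cross.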
Factors versus blocks
Blocking can be used to improve an experimental design
Blocking can be considered only after deciding the number of
factors appropriate to the experiment
To determine which approach is best, blocking or adding a
factor, consider the basic hypothesis
If we are interested in whether design
A is better than design B, then experience should be
treated as a blocking variable.
If we are interested in whether the results of using designs A
and B are influenced by staff experience, then experience
should be treated as a factor.
Note: if interactions are not of interest, blocking is better; if
interactions are important, then multiple factors are
needed.
Guidelines for blocking
If you are deciding between two methods or tools, then
you should identify state variables that are likely to
affect the results and sample over those variables using
blocks, to ensure an unbiased assignment of
experimental units to the alternative methods or tools
If you are deciding between methods or tools in a
variety of circumstances, then you should identify state
variables that define the different circumstances and
treat each variable as a factor
To keep it simple: use blocks to eliminate bias, and use
factors to distinguish cases or circumstances
Choosing between nested and crossed design
After selecting the appropriate number of factors for
the experiment, we need to select a
structure that supports the investigation and answers
the questions we have
Example of a crossed design for design methods and tool
usage
In the above example, 12 projects are organized and
3 projects are assigned randomly to each treatment in the
design
Nested design
A nested design is useful for investigating one factor under
two or more conditions
A crossed design is useful for looking at two factors, each
with two or more conditions
The more factors there are, the more complex the analysis
The next slide shows how to choose a design in
software engineering research
Fixed or random effects
Some factors allow us to have complete control over
their levels
For example, we may control the programming language
used to develop the system, or the processor the system
is developed on
Other factors, such as staff experience, are not easy to control
The degree of control over factor levels is an important
consideration in choosing an experimental design
A fixed-effects model has factor levels or blocks that are
controlled
A random-effects model has factor levels or blocks that
are random samples from a population of values
Matched or same subject design
Economy sometimes prevents us from using different
subjects for each type of treatment in an experimental
design
For example, there might not be enough programmers to
participate in the experiment, or funds may be insufficient
In that case we can use the same subjects for different treatments
For example, we can ask the same programmer to use tool A in
one situation and tool B in another
To design an experiment, you should decide how many
and what type of subjects you want to use
You can also use the same subjects for one factor
but different subjects for another factor, to yield a mixed
between- and within-subjects design
Repeated measurements
Repeating measurements can be helpful in validating results, by
assessing the error associated with the measurement
process
The figure below depicts the results of an experiment
involving one product and three developers
Each developer was asked to calculate the number of
function points in the product at three times during
development: specification, design, and code
In the figure, three points are marked at each of these
three estimation times
The horizontal variation indicates the variation over
time, while the vertical differences at each measurement
time indicate the variation due to differences among the
developers
Repeated measurements add value to the experiment,
but they make it more complex and require careful analysis.
The horizontal variation helps us to understand the
error about the line connecting the means at each
measurement time, and the vertical variation helps us to
identify observational errors
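The two kinds of variation can be separated numerically: the mean at each measurement time traces the change over time, and the spread at each time reflects differences among developers. The function-point counts below are invented for illustration, since the original figure's values are not available.

```python
from statistics import mean, stdev

# Hypothetical function-point counts: estimates[developer][phase]
estimates = {
    "dev1": {"specification": 100, "design": 110, "code": 125},
    "dev2": {"specification": 90, "design": 115, "code": 120},
    "dev3": {"specification": 110, "design": 105, "code": 130},
}
phases = ["specification", "design", "code"]

def summarize(estimates, phases):
    """Per phase: (mean across developers, spread across developers)."""
    summary = {}
    for phase in phases:
        values = [per_dev[phase] for per_dev in estimates.values()]
        summary[phase] = (mean(values), stdev(values))
    return summary

summary = summarize(estimates, phases)
for phase in phases:
    print(phase, summary[phase])
```

Reading the means from one phase to the next gives the horizontal trend; the standard deviation within a phase is the vertical disagreement among developers measuring the same product.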
Planning case studies
Every case study requires conception, design,
preparation, execution, analysis, dissemination, and
decision making
A case study compares one situation with another
For example, the results of one method or tool with those of another
To avoid bias and to be sure the hypothesis is tested
properly, a case study is organized in one of three
ways: sister projects, baseline, or random selection
Sister projects
Sister projects are projects that are similar in terms of application
domain, implementation language, specification
language, specification technique, and design method.
Baseline project
If there are no similar projects to serve as sister projects, then
the new project can be compared with a baseline
Here the organization collects data from various projects,
regardless of how different one project is from another
The baseline is the average situation of the projects in the
organization
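Comparing a new project with the baseline can be sketched as the distance of its value from the organizational average, in units of the baseline's spread. The measure (defect density), the figures, and the function name are hypothetical.

```python
from statistics import mean, stdev

def compare_to_baseline(baseline_values, new_value):
    """Return (baseline mean, standardized distance of new_value from it).

    The distance is (new_value - mean) / standard deviation, so it shows
    how unusual the new project is relative to normal variation.
    """
    base_mean = mean(baseline_values)
    spread = stdev(baseline_values)
    if spread == 0:
        raise ValueError("baseline projects show no variation")
    return base_mean, (new_value - base_mean) / spread

# Hypothetical defects per KLOC from five past projects, and from the
# project that used the new method.
baseline = [4.0, 6.0, 5.0, 7.0, 3.0]
base_mean, distance = compare_to_baseline(baseline, 2.5)
print(base_mean, round(distance, 2))
```

A value within the usual spread of the baseline says little about the new method, since past projects differed that much without it.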
Random selection
Sometimes it is possible to partition a single project into parts and
assign each part at random to the new technique or to the old one.
This kind of case study is similar to a formal experiment, because
you take advantage of randomization and replication
in performing the analysis.
This type of case study is useful in situations where the
method being studied can take on a variety of values