Research Methods
penned by José Nelson Amaral with significant contributions from Michael Buro,
Renee Elio, Jim Hoover, Ioanis Nikolaidis, Mohammad Salavatipour,
Lorna Stewart, and Ken Wong
Computing Science researchers use several methodologies to tackle questions within the
discipline. This discussion starts by listing several of these methodologies. The idea is not
to classify researchers or projects in each of these methodologies or to be exhaustive. Tasks
performed by a single researcher fall within different methodologies. Even the activities
required to tackle a single research question may include several of these methodologies.
1 Methodologies
The following list of methodologies is intended to organize the discussion of the approach
required by each of them.
Formal In Computing Science, formal methodologies are mostly used to prove facts about
algorithms and systems. Researchers may be interested in the formal specification of
a software component in order to allow the automatic verification of an implementation
of that component. Alternatively, researchers may be interested in the time or
space complexity of an algorithm, or in the correctness or the quality of the solutions
generated by the algorithm.
Experimental Experimental methodologies are broadly used in CS to evaluate new solutions
for problems. Experimental evaluation is often divided into two phases. In an
exploratory phase the researcher takes measurements that help identify which
questions should be asked about the system under evaluation. Then an
evaluation phase attempts to answer these questions. A well-designed experiment
will start with a list of the questions that the experiment is expected to answer.
Build A “build” research methodology consists of building an artifact — either a physical
artifact or a software system — to demonstrate that it is possible. To be considered
research, the construction of the artifact must be new or it must include new features
that have not been demonstrated before in other artifacts.
Process A process methodology is used to understand the processes used to accomplish
tasks in Computing Science. This methodology is mostly used in the areas of Software
Engineering and Man-Machine Interface, which deal with the way humans build and use
computer systems. The study of processes may also be used to understand cognition
in the field of Artificial Intelligence.
Model The model methodology is centered on defining an abstract model for a real system.
This model will be much less complex than the system that it models, and therefore
will allow the researcher to better understand the system and to use the model to
perform experiments that could not be performed in the system itself because of cost
or accessibility. The model methodology is often used in combination with the other
four methodologies. Experiments based on a model are called simulations. When a
formal description of the model is created to verify the functionality or correctness of
a system, the task is called model checking.
In the rest of this document we will attempt to provide advice about each of these
methodologies. But first, let’s consider some general advice for research. New and insightful
ideas do not just happen in an empty and idle mind. Insightful research stems from our
interaction with other people who have similar interests. Thus it is essential for all researchers
to be very involved in their communities, to attend seminars, participate in discussions, and,
most importantly, to read widely in the discipline.
It is also important to keep a record of our progress and to make notes of the ideas that
we have along the way. The brain has an uncanny knack for coming back to ideas that we have
considered in the past, and often we do not remember the issues that led us
not to pursue an idea at the time. Thus a good record-keeping system (a personal blog,
a notebook, a file in your home directory) is a very important research tool. In this log we
should make notes of the papers that we read, the discussions that we had, the ideas that
we had, and the approaches that we tried.
Once a student starts having regular meetings with a thesis supervisor, it is a good idea
to write a summary of each of these meetings. The student may send the summaries to the
supervisor or may keep them to herself. After several months, revisiting these meeting logs will
help her review her progress and reassess whether she is on track with her plans for graduation.
In research, as in other areas of human activity, a strategy to get things done is
as important as great visions and insightful ideas. Whenever working with others, it is
important to define intermediate milestones, to establish clear ways to measure progress
at such milestones and to have clear deadlines for each of them. Collaborators will be
multitasking and will dedicate time to the tasks that have a clear deadline.
hard is it to solve? Given a computational model, what are its limitations? Given a formal-
ism, what can it express?
TCS is not only concerned with what is doable today but also with what will be possible
in the future with new architectures, faster machines, and future problems. For instance,
Church and Turing gave formalisms for computation before general-purpose computers were
built.
TCS researchers work on the discovery of more efficient algorithms in many areas, including
combinatorial problems, computational geometry, cryptography, and parallel and distributed
computing. They also answer fundamental questions about computability and complexity.
They have developed a comprehensive theoretical framework to organize problems into complexity
classes, to establish lower bounds for time and space complexity for algorithms, and to
investigate the limits of computation.
The best practical advice for new researchers in the area of formal research methods is to
practice solving problems and to pay attention to detail. The general advice for researchers
in computing science (know the literature, communicate with colleagues in the area, ask
questions, think) applies to formal methods research as well. Problem solving can be risky
but also very rewarding. Even if you don’t solve your original problem, partial results can
lead to new and interesting directions.
The skills and the background knowledge that formal methods researchers find useful
include: problem-solving, mathematical proof techniques, algorithm design and analysis,
complexity theory, and computer programming.
Thus experimental computing science would greatly benefit if each experimental computing
scientist would treat her experiment with the same care that a biologist treats a slow-growing
colony of bacteria. Annotating, filing, and documenting are essential for the future relevance
of an experimental scientist’s work.
Design the software system. No matter how simple the system is, do not allow it to
evolve from small pieces without a plan. Think before you build. Most importantly,
consider a modular approach, which simplifies testing. Testing is also simplified by choosing
text-based data and communication formats. Defining small interfaces increases
flexibility and reuse potential.
Reuse components. Are some needed software components already (freely) available? If
yes, using such components can save time. When deciding which components to reuse,
consider the terms of use attached to them. The components that you reuse in the
system can have implications for the software license under which the new system can
be distributed. For instance, if a component distributed under the GNU General Public License
(GPL) is used in a software system, the entire system will generally have to be distributed
under the GPL.
Choose an adequate programming language. Often researchers want to use a programming
language that they already know to minimize the time invested in learning
a new language. However, it may pay off to learn new languages that are more adequate
for building a specific system. Important factors to consider when selecting
a programming language include: required run-time speed (compiled vs. interpreted
languages), expressiveness (imperative vs. functional vs. declarative languages), reliability
(e.g. run-time checks, garbage collection), and available libraries.
Consider testing all the time. Don’t wait to test the entire system after it is built. Test
modules first. Keep a set of input/output pairs around for testing. This way future
changes can be tested when they are introduced. Consider building an automated
testing infrastructure that compares the program’s output on a set of input data with
correct outputs and also measures run time. Set this automated testing infrastructure
to run automatically daily/weekly to notify the builders about troublesome changes
immediately.
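The testing advice above can be sketched in a few lines. This is only an illustration under stated assumptions: `run_regression`, `sort_words`, and `CASES` are hypothetical names invented here, the module under test is a stand-in, and scheduling the script to run daily or weekly is left to whatever scheduler the research group already uses.

```python
import time
from typing import Callable, Iterable


def run_regression(
    func: Callable[[str], str],
    cases: Iterable[tuple[str, str]],
) -> list[dict]:
    """Run func on each stored input, compare with the stored output, time it."""
    report = []
    for given, expected in cases:
        start = time.perf_counter()
        actual = func(given)
        elapsed = time.perf_counter() - start
        report.append(
            {"input": given, "passed": actual == expected, "seconds": elapsed}
        )
    return report


def sort_words(line: str) -> str:
    """Hypothetical module under test: sorts whitespace-separated words."""
    return " ".join(sorted(line.split()))


# Stored input/output pairs (assumed data); in practice these would live in
# text files kept next to the source code.
CASES = [
    ("b a c", "a b c"),
    ("", ""),
    ("pear apple", "apple pear"),
]

if __name__ == "__main__":
    for row in run_regression(sort_words, CASES):
        status = "ok" if row["passed"] else "FAIL"
        print(f"{status:4} {row['seconds']:.6f}s {row['input']!r}")
```

Keeping the pairs as plain text fits the earlier suggestion to prefer text-based formats: a failing case can be inspected and corrected with any editor.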
Documentation is crucial in any software system. Good computer programs must be well
documented. Supervisors, outside users, and fellow students who may extend the system in
the future need to be able to understand the code without much trouble. Even when there is
a single developer, there are advantages to using a version control system, such as Concurrent
Versions System (CVS). CVS gives the developer, and anyone who needs casual access to
the code and documentation, easy access to the set of current files from multiple locations.
Moreover, CVS allows access to previous versions in case of changes that introduce bugs.
Once the software system is functional, researchers should compare its functionality
and/or performance with that of existing systems to verify that the claim(s) that they
want to make about the system still hold. Often, the runtime/space requirements dependent
on input size are reported on commonly used test sets. Architecture-independent measures,
such as the number of nodes visited on a graph per unit of time, should be reported based on
wall-clock time or actual memory consumption to simplify comparison with other systems.
Results should be reported using statistics, such as percentiles (e.g. the quartiles: min, 25%,
median, 75%, max), that don’t depend on unjustified distribution assumptions.
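As a minimal sketch of such a report, the function below computes the min, quartiles, and max of a list of measurements with Python's standard `statistics` module. The function name `five_number_summary` and the sample run times are assumptions made for illustration. The "inclusive" quantile method is one of several common conventions, so the convention used should be stated alongside the results.

```python
import statistics


def five_number_summary(samples: list[float]) -> dict[str, float]:
    """Return min, quartiles, and max without assuming any distribution."""
    if len(samples) < 2:
        raise ValueError("need at least two measurements")
    # quantiles(..., n=4) returns the three cut points between the quarters.
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "min": min(samples),
        "25%": q1,
        "median": median,
        "75%": q3,
        "max": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, including one outlier.
    run_times = [12.1, 11.8, 12.4, 30.2, 12.0]
    for name, value in five_number_summary(run_times).items():
        print(f"{name:>6}: {value:.2f}")
```

Unlike a mean with a standard deviation, this summary makes a single slow run visible as a large maximum without distorting the other figures.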
systems — large or small, the design and evaluation of human-computer interactions, and
the understanding of cognitive processes. More recently the creation of interactive games
has been studied extensively. These activities often involve studies with human subjects.
aimed at hypothesis testing. For instance, psychologists describe “anchor effects” that may
impact how a person chooses a rating on a rating scale to make a preference judgment. The
psychometric literature should be explored before designing a study that employs surveys
or interviews. Natural sources of information range from the web to meeting with someone
from the Psychology Department for advice, especially if the data produced by the survey is
a crucial part of the research. The University of Alberta has a Population Research Lab that
offers consultation on the design of reliable surveys. It may be useful to contact someone in
that centre (there may be a fee, so discuss this with your supervisor).
Once results are collected from a properly designed survey or study, a researcher should
ensure that appropriate statistical techniques are used for the analysis of these results.
computational process typically leads to predictions that themselves are verifiable in other
experiments. In fact, a computational model that does not, as a by-product, lead
to new testable hypotheses is not very interesting or useful to the broader enterprise of
understanding a complex system.
So how does one get around writing a one-off computer program to model some particular
set of results? First, decide what kind of computational paradigm to operate in. This is
a personal choice. You can operate within a neural net paradigm, or you can operate within
a higher-level architecture paradigm. Adopt a paradigm and situate your work within the
community that uses that paradigm to increase the likelihood that the new research results
produced will actually contribute to some larger understanding about an issue.
and quantitative aspects of a system, while simulation is open-ended. Simulation often lacks
the power to make definite statements about properties of the system. For instance, the results
of simulations cannot be used to prove that a deadlock never develops in a concurrent
system.
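The deadlock example can be made concrete with a toy model. The sketch below is not a real model checker. It only illustrates exhaustive state exploration, the idea underlying explicit-state model checking, on a hypothetical model invented here: each process acquires two named locks in a fixed order and then releases both. Because every reachable state is visited, an empty result is a definite statement about this model, which a finite number of simulation runs could not provide.

```python
from collections import deque

DONE = 3  # counter value once a process has released both of its locks


def held(order, pc):
    """Locks a process holds: the first pc entries of its order, none when done."""
    return set(order[:pc]) if pc < DONE else set()


def successors(orders, state):
    """All states reachable in one step; a state is a tuple of counters."""
    result = []
    for i, pc in enumerate(state):
        if pc == DONE:
            continue
        if pc < 2:  # assumption: every process acquires exactly two locks
            wanted = orders[i][pc]
            taken = any(
                wanted in held(orders[j], state[j])
                for j in range(len(state))
                if j != i
            )
            if taken:
                continue  # process i is blocked waiting for this lock
        result.append(state[:i] + (pc + 1,) + state[i + 1:])
    return result


def find_deadlocks(orders):
    """Breadth-first search over every reachable state of the model."""
    start = tuple(0 for _ in orders)
    seen = {start}
    queue = deque([start])
    deadlocks = []
    while queue:
        state = queue.popleft()
        following = successors(orders, state)
        if not following and any(pc != DONE for pc in state):
            deadlocks.append(state)  # nobody can move, yet work remains
        for nxt in following:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return deadlocks


if __name__ == "__main__":
    print(find_deadlocks([("A", "B"), ("B", "A")]))  # opposite lock orders
    print(find_deadlocks([("A", "B"), ("A", "B")]))  # same lock order
```

The conclusion holds only for the model. Whether the model faithfully represents the real system is a separate question.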
The best approach for new researchers in areas where models are required is to study
publications that disclose the full details of the model used, or that describe models for
similar systems. Unfortunately, often the presentation of models is abbreviated in papers
due to page restrictions. Authors often assume that the reader is aware of usual modeling
assumptions in a particular domain. A careful researcher who sees no details about the model
used in a simulation-based study will keep in mind that certain properties (qualitative or
quantitative) of the study might be directly influenced by sloppy modeling. Such a researcher
will have a keen eye (developed through experience) to spot discrepancies that go hand-in-hand
with sloppy modeling. It is also a good idea to reflect on models, to spot their shortcomings, and to
report them in the literature when models are found inadequate or are incorrectly assumed to
“do their job” when they don’t. Proposing a new (better) model is also a very important
contribution to the community.