CSC8208_ Evaluating Your Project

The document outlines the importance of evaluating computer science projects, particularly in the context of security and resilience. It discusses the evaluation process, including defining baselines, data collection, analysis, and drawing conclusions, while emphasizing the need for rigorous testing to validate claims. Case studies are provided to illustrate effective evaluation methods and the significance of planning in project assessments.


CSC8208: Research Methods and Group Project in Security and Resilience


On Evaluating Computer Science Projects

Dr Carlton Shepherd

Images: DALL-E 3
Today’s Learning Outcomes

● Understand current challenges in building a security system (a secure chat system).

● Issues that relate to the planning and execution of a team-based software project, at
an advanced level.

● The methods of conducting and analysing research in Computer Science.

● The challenges involved in planning a system development in which security, resilience and safety are significant.
Agenda
● What an evaluation is, and why it is important for your project.

● The evaluation process at a high level.

● Baselines, data collection and analysis.

● Making conclusions

● Some case studies.


Why evaluating a project is important
Why evaluate a project (1)

● Definition: “the making of a judgement about the amount, number, or value of something; an assessment.” (Oxford Languages, 2024).

● Provides evidence that our claims are valid; helps us distinguish what is true from what is bogus.

● Helps us judge to what extent the project’s aims and objectives were met, and any challenges.

● Think: How would you feel if someone said their system “meets real-world requirements” without presenting any experiments?

(Source: Wikimedia Commons, 2024)


Why evaluate a project (2)

● Conducting a good evaluation is often overlooked in student projects.

● How will you assure readers, e.g. markers, that your project is valuable and fit for purpose?

● What evidence will you provide to back up any particular claims about security and resilience? And why?

● How do you alleviate any concerns that your claims are false, hyped up, or exaggerated?

(Source: Wikimedia Commons, 2024)


A toy example
Boats ‘R’ Us

● Scenario: A new start-up, Boats ‘R’ Us Ltd., has developed a “groundbreaking” new boat for carrying 200 passengers across the Atlantic Ocean at 80 km/h.

● Their approach: they evaluated the seaworthiness and performance of the vessel by testing it in a small laboratory pool with only three passengers on board at 2 km/h.

● They say that it’s “groundbreaking,” so it must be good, right?
Vevox
What is wrong with this evaluation?
What is wrong

● This evaluation fails to replicate the conditions the boat will face in the open sea.

● There are strong currents, waves, and weather conditions that may affect the vessel.

● It was tested using only three passengers, despite 200 being claimed as its intended capacity.

● The extra weight of 197 passengers could cause the boat to lose speed (invalidating the speed claim) or may cause it to sink.

● Testing a boat this way doesn’t demonstrate fitness for purpose. Similarly, evaluating a chat system under unreasonable scenarios offers little information about its intended performance.

● A bad evaluation ignores the complexities and variables of a real environment, leading to evaluations that are not only unhelpful but potentially misleading.
On the process of evaluation
General Principles

● The basis of your evaluation should be to clearly and persuasively demonstrate why your project is valuable, meets your stated claims, and is fit for purpose.

● Which would you trust: a ‘resilient’ chat system that was tested against only 3 users, or one tested against 100,000 users?

● What should you measure and why?

● It is important to plan ahead when evaluating your projects. A LOT of time can be wasted if you do not have a plan.

● You will have to implement code that measures data of interest. You will also have to run the experiments. Do not underestimate this development effort!
A Flow Chart for Doing Evaluations

1. Your Solution: your secure chat system.
2. Revisit Aim and Objectives: do they still apply?
3. Related Work: how does existing work measure success?
4. Baseline: how will I measure success?
5. Data Collection: what data do I need? Can I collect it? If not, how can I?
6. Analysis: is this data useful?
7. Conclusions: what does the data mean? Does it say enough?

Inspired by Zobel, “Writing for Computer Science,” 3rd ed. Springer, 2014.
Baselines
Baselines

● Baselines are used to identify how value can be measured in your project.

● Put another way, a baseline is the benchmark against which your project can be assessed or compared.

● Assume we have developed a new lightweight encryption algorithm. A baseline may involve investigating:
○ How do we measure ‘lightweight’ in this context?
○ What do I compare it against?
○ Methods? Experimental or theoretical/asymptotic.

● Bad baseline: Comparing the lines of code (LOC) of your new algorithm against DES.

● This is a bad benchmark. DES is outdated; we have competing lightweight ciphers. LOC differs according to context (e.g. language).

● Better baseline: Measuring the CPU execution time and RAM consumption of your algorithm against all similar proposals in the state of the art.
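The “better baseline” idea can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: `toy_candidate` and `toy_reference` are made-up placeholders (they are not real ciphers and do not come from the lecture), and the harness measures CPU time and peak Python heap use for each on the same input.

```python
import time
import tracemalloc


def toy_candidate(data: bytes, key: int = 0x5A) -> bytes:
    """Hypothetical 'new algorithm'. Single-byte XOR is NOT a real cipher."""
    return bytes(b ^ key for b in data)


def toy_reference(data: bytes, key: int = 0x5A) -> bytes:
    """Hypothetical competing proposal that makes four passes over the data."""
    out = data
    for _ in range(4):
        out = bytes(b ^ key for b in out)
    return out


def cpu_seconds(func, data: bytes, repeats: int = 5) -> float:
    """Best-of-N CPU time, to reduce the effect of run-to-run noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.process_time()
        func(data)
        best = min(best, time.process_time() - start)
    return best


def peak_bytes(func, data: bytes) -> int:
    """Peak Python heap allocation during one call. Measured separately
    from timing, because tracing allocations slows the code down."""
    tracemalloc.start()
    func(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def compare(candidates: dict, data: bytes) -> dict:
    """Measure every candidate on the same input so results are comparable."""
    return {
        name: {"cpu_s": cpu_seconds(func, data), "peak_bytes": peak_bytes(func, data)}
        for name, func in candidates.items()
    }


payload = bytes(range(256)) * 400  # 102,400 bytes of fixed test input
results = compare({"candidate": toy_candidate, "reference": toy_reference}, payload)
for name, row in results.items():
    print(f"{name}: {row['cpu_s']:.6f} s CPU, {row['peak_bytes']} peak bytes")
```

In a real evaluation the placeholder functions would be replaced by your algorithm and the related proposals you identified, all run on the same machine with the same inputs.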
Applied to your project

● A hypothetical objective: “We will build a resilient chat system that is capable of handling real-world user loads.”

● What do we mean by “real-world”?
○ Maybe we intend our messaging system to be used by a small business (~100 users).
○ Or a whole country (50+ million users).

● What is a “user load”?
○ How much time they spend on your service?
○ How many users logging in at once?
○ How many conversations at the same time?

● You should carefully think about this in your teams during your evaluations.

● Once you have defined what you want to collect, the evaluation process can become much easier.

● You will likely need to implement specific code to collect the necessary data.
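One way to pin down “real-world user load” is to write the assumptions down as data. The sketch below is illustrative only: the `Workload` class, the fraction of users online, and the message rates are assumptions invented for this example; only the user counts (~100 and 50 million) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """A hypothetical, explicit definition of one 'user load' scenario."""
    name: str
    total_users: int
    fraction_online: float               # assumed share of users active at once
    messages_per_user_per_minute: float  # assumed sending rate per active user

    @property
    def concurrent_users(self) -> float:
        return self.total_users * self.fraction_online

    @property
    def messages_per_second(self) -> float:
        return self.concurrent_users * self.messages_per_user_per_minute / 60


# Two scenarios matching the scales mentioned above; the rates are made up.
small_business = Workload("small business", 100, 0.5, 2.0)
whole_country = Workload("whole country", 50_000_000, 0.1, 2.0)

for scenario in (small_business, whole_country):
    print(f"{scenario.name}: {scenario.concurrent_users:,.0f} concurrent users, "
          f"{scenario.messages_per_second:,.1f} messages/s")
```

Writing the scenario down like this makes it obvious which numbers your experiments must reach before a “real-world” claim is justified.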
Data collection and analysis
Types of data

● You are likely to see two evaluation methods for using data: qualitative and quantitative.

● Qualitative methods: non-numerical data, e.g. concepts, opinions, experiences, audio. Example: interviews in the context of a usability study.

● Quantitative methods: numerical data.

● Your collection and analysis will depend significantly on which method you use.
Vevox
List some types of quantitative data that
you might use in your projects
Types of data

From: S. McLeod, “Qualitative vs Quantitative Research Methods and Data Analysis,” 2023. https://www.simplypsychology.org/qualitative-quantitative.html
Collecting data

● Technical security projects tend to rely on quantitative research methods. Example: how do we measure the overhead of a new operating system security mechanism?

● After reading related work, we have seen that measuring CPU consumption when our mechanism is enabled versus disabled is an appropriate baseline.

● Accurately measuring CPU consumption is surprisingly difficult.

● Some considerations:
○ What about other programs/processes?
○ Like that really annoying antivirus product that consumes 100% CPU unpredictably?
○ Multi-core systems?

● You must control for these variables as well as possible to avoid bias.
○ Very difficult in time-critical areas.

● You will need to spend some implementation effort to ensure accurate data collection.
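A minimal sketch of the enabled-versus-disabled measurement, assuming two hypothetical workloads (`workload_disabled`, `workload_enabled`) that stand in for the same task with and without the mechanism. It repeats each measurement and compares medians, which is one simple way to dampen interference from other processes; it does not remove it.

```python
import statistics
import time


def cpu_time_samples(workload, trials: int = 10) -> list:
    """Run the workload several times and record process CPU time per run."""
    samples = []
    for _ in range(trials):
        start = time.process_time()
        workload()
        samples.append(time.process_time() - start)
    return samples


def relative_overhead(disabled: list, enabled: list) -> float:
    """Overhead of 'enabled' relative to 'disabled', using medians so that a
    single noisy run (e.g. a background scan) does not dominate."""
    baseline = statistics.median(disabled)
    if baseline <= 0:
        raise ValueError("baseline too small to measure; use a longer workload")
    return (statistics.median(enabled) - baseline) / baseline


def workload_disabled():
    """Hypothetical task with the security mechanism switched off."""
    return sum(i * i for i in range(200_000))


def workload_enabled():
    """Same task plus a made-up per-item check standing in for the mechanism."""
    return sum(i * i for i in range(200_000) if i >= 0)


if __name__ == "__main__":
    off = cpu_time_samples(workload_disabled)
    on = cpu_time_samples(workload_enabled)
    print(f"median overhead: {relative_overhead(off, on):.1%}")
```

Note that `time.process_time()` counts CPU time for the whole process across all its threads, so multi-core behaviour and system-wide load still need separate thought.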
Analysing data

● Humans cannot process megabytes of raw CPU data that you might have collected.

● Data analysis is the process of inspecting, cleaning, transforming and visualising information in order to extract meaning (intelligence).

● Thankfully, we have tools to help us.
○ Python stack: Pandas, NumPy, SciPy.
○ RStudio
○ Julia

● Use standard formats for sharing raw data, e.g. CSV and XML.

● Use reproducible code for analyses, e.g. Jupyter Notebooks.
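The inspect, clean, and summarise steps can be sketched with nothing but the Python standard library (used here only to keep the example self-contained; the same steps map naturally onto Pandas). The CSV content is invented for illustration and is not real measurement data.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up raw data: one incomplete row and one obvious outlier.
RAW_CSV = """users,latency_ms
10,12.1
10,11.8
10,
100,15.2
100,14.9
100,950.0
"""


def load_clean(text: str) -> list:
    """Parse the CSV and drop rows with a missing latency value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["latency_ms"]:
            continue  # cleaning step: skip incomplete measurements
        rows.append((int(row["users"]), float(row["latency_ms"])))
    return rows


def summarise(rows: list) -> dict:
    """Group latencies by user count and compute summary statistics."""
    groups = defaultdict(list)
    for users, latency in rows:
        groups[users].append(latency)
    return {
        users: {
            "n": len(values),
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
        }
        for users, values in sorted(groups.items())
    }


summary = summarise(load_clean(RAW_CSV))
for users, stats in summary.items():
    print(users, stats)
```

Comparing the mean and median for the 100-user group shows how a single outlier can distort one summary statistic but not the other, which is worth checking before drawing conclusions.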
Vevox
We want to evaluate how a secure chat
system scales with the number of users.
What might you measure?
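One possible measurement is message latency as the number of simultaneous users grows. The sketch below simulates this with `asyncio`; the “server” is a hypothetical stand-in (a lock plus a fixed 1 ms service time, both assumptions), so the numbers only illustrate the shape of such an experiment, not the behaviour of any real chat system.

```python
import asyncio
import statistics
import time


async def fake_send(server_lock: asyncio.Lock, service_time: float) -> float:
    """Hypothetical stand-in for one user sending one message.
    Returns the latency that user observed, in seconds."""
    start = time.perf_counter()
    async with server_lock:  # the pretend server handles one message at a time
        await asyncio.sleep(service_time)
    return time.perf_counter() - start


async def run_trial(n_users: int, service_time: float = 0.001) -> dict:
    """All users send at once; record the latency each one observes."""
    lock = asyncio.Lock()
    latencies = await asyncio.gather(
        *(fake_send(lock, service_time) for _ in range(n_users))
    )
    return {
        "users": n_users,
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def scaling_experiment(user_counts: list) -> list:
    """Run one trial per user count and collect the summary rows."""
    return [asyncio.run(run_trial(n)) for n in user_counts]


if __name__ == "__main__":
    for row in scaling_experiment([1, 10, 50]):
        print(row)
```

Against a real system you would replace `fake_send` with an actual client request and repeat each trial several times.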
Making conclusions

● Conclusions are inferred from what we have analysed.

● They directly support, or indeed refute, the claims we are making, e.g. about our aims and objectives.

● No easy ‘recipe’ for success.
○ Consider the context of your data sources.
○ Check, and check again, for any potential biases in your data.
○ Confounding variables: something that you have NOT measured which affects your results.

● Be careful to avoid overselling or exaggerating what is actually shown in the data.

Bad example: “Following statistical analysis, we conclude that Nicolas Cage causes drownings (0.66 correlation).”
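The effect of a confounding variable can be shown with synthetic data. In this sketch all variables (`temperature`, `ice_cream`, `swimmers`) and their coefficients are invented: two quantities are both driven by a hidden third one, so they correlate strongly even though neither causes the other. Removing the linear effect of the hidden variable makes most of the correlation disappear.

```python
import random
import statistics


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys: list, zs: list) -> list:
    """What is left of ys after removing a least-squares linear fit on zs."""
    mz, my = statistics.fmean(zs), statistics.fmean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is reproducible
temperature = [rng.uniform(10, 35) for _ in range(500)]     # hidden confounder
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
swimmers = [3.0 * t + rng.gauss(0, 5) for t in temperature]

raw = pearson(ice_cream, swimmers)
controlled = pearson(residuals(ice_cream, temperature),
                     residuals(swimmers, temperature))
print(f"raw correlation: {raw:.2f}")
print(f"after controlling for temperature: {controlled:.2f}")
```

The catch, as the slide says, is that a confounder is something you have not measured; this trick only works once you have identified it and collected it.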
Case Studies
Case Study: SafeSlinger

● SafeSlinger (Farb et al., 2013) is a chat system that allows users to exchange cryptographic keys with nearby users for group communication.

● Execution time against number of nearby users was chosen as one of the quantitative measures for evaluating the project.

● This was used as a baseline. Two closely related works, SPATE and GAnGS, were used for comparison.

● From: M. Farb et al. "SafeSlinger: Easy-to-use and secure public-key exchange." 19th Annual International Conference on Mobile Computing and Networking (MobiCom). ACM, 2013.
Case Study 2: CONIKS

● CONIKS (Melara et al., 2015) is a cryptographic enhancement for real-world E2E encrypted chat systems.

● Latency was again chosen as a suitable quantitative measure for evaluating the project.

● Demonstrated under a large number of users, which we might expect for a commercial system (1 x 10^7 existing users).

● From: M. S. Melara et al., “CONIKS: Bringing key transparency to end users," 24th USENIX Security Symposium, 2015.
Closing Thoughts
Takeaways

● In this module, the evaluation is worth 15% of the Final Project Report.

● A good evaluation is often hard to design, execute, and present correctly. Many student projects are let down by poor evaluations.

● The basis of your evaluation should be to clearly and persuasively demonstrate why your project is valuable, satisfies your aims and objectives, and is fit for purpose.

● Think: what do you intend to measure and why?

● Consider: a chat system has many different possible evaluation methods and metrics.
○ Connection latency.
○ Server and client RAM consumption.
○ Server and client CPU consumption.
○ Usability.
○ Reliability (e.g. downtime).
○ Network bandwidth.
Takeaways (2)

● Evaluating time and memory performance is generally a good rule of thumb for any systems research (but it depends).

● It is important to plan ahead when evaluating your projects.

● You will likely have to implement code that measures data of interest.

● Analysing the results (crunching the numbers, generating tables and charts) also takes time. Make sure you leave enough time to do this!

● It can sometimes make sense to have dedicated team members for focussing on the evaluation.
Any Questions?
Extra: Presentation
Use Graphs and Images

● (Left) C. Shepherd et al., “A side-channel analysis of sensor multiplexing for covert channels and application profiling on mobile devices,” IEEE
Transactions on Dependable and Secure Computing. IEEE, 2023.
● (Right) G. Goller and G. Sigl. "Side channel attacks on smartphones and embedded devices using standard radio equipment." International
Workshop on Constructive Side-Channel Analysis and Secure Design. Springer, 2015.
A Brief Detour on Information Loss

● Try saving a JPEG image a few times.

● You will notice that image quality degrades.
○ JPEG is a lossy compression algorithm.

● We say that it undergoes information loss.

● We can measure this in several ways, e.g. edit distances between two images (as bit strings) and structural similarity index measure (SSIM).

● We can also evaluate this using human judgement, i.e. by eye.

(Image source: Wikimedia Commons, 2024)
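As a rough illustration of the edit-distance idea, the sketch below converts two byte strings to bit strings and computes the Levenshtein edit distance between them. This is one common definition of edit distance, assumed here for illustration; the function names are hypothetical, and the quadratic-time algorithm is only practical for small inputs, not full-size images.

```python
def to_bits(data: bytes) -> str:
    """Render bytes as a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete from a
                current[j - 1] + 1,                    # insert into a
                previous[j - 1] + (char_a != char_b),  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Made-up 'sent' and 'received' payloads that differ in a single bit.
sent = bytes([0b10110010, 0b00001111])
received = bytes([0b10110010, 0b00001011])
print(edit_distance(to_bits(sent), to_bits(received)))  # prints 1
```

A distance of 0 means nothing was lost; dividing by the bit-string length gives a rough loss ratio that can be compared across payload sizes.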


Example 1: A Covert Channel

● A covert channel is an unintended and unauthorised method of data communication between two entities (e.g. processes). Most covert channels come with some information loss.

● The authors develop a channel allowing Android applications to communicate information in a broadcast way, bypassing the Android permissions system.

● This figure clearly illustrates how different images undergo information loss using different covert channels.
○ Not much space taken up to convey “fitness for purpose”.

● (The authors also used quantitative data to reinforce their arguments, e.g. edit distance between sent / received images.)
Example 2: A Physical Side-channel Attack

● A physical side-channel attack uses physical phenomena to recover secret data (e.g. electromagnetic radiation emissions, or power consumption, to learn bits of a key).

● This annotated line graph shows how the authors recovered bits of a key over time from a smartphone using EM radiation.

● We see that longer distances (in time) between troughs correspond to ‘1’ values in the key. Shorter times correspond to ‘0’ values.

● The gist of a complex approach is conveyed in a single image.
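The trough-spacing rule described above can be expressed as a tiny decoder. This is a toy sketch, not the authors’ method: the timestamps and the threshold are invented for illustration, and real traces would first need filtering and trough detection.

```python
def decode_bits(trough_times: list, threshold: float) -> list:
    """Map the gap between consecutive troughs to a bit:
    a long gap (above the threshold) is 1, a short gap is 0."""
    gaps = [later - earlier
            for earlier, later in zip(trough_times, trough_times[1:])]
    return [1 if gap > threshold else 0 for gap in gaps]


# Hypothetical trough timestamps (arbitrary time units) and threshold.
troughs = [0.0, 1.0, 3.0, 4.0, 6.0, 8.0]
print(decode_bits(troughs, threshold=1.5))  # prints [0, 1, 0, 1, 1]
```

The point of the slide still stands: the annotated graph conveys this rule at a glance, without the reader needing any code.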
A more complex example

● F. Pendlebury et al., “TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time,” USENIX Security, 2019.
