SSDLC Exercise 10
Exercise 10.2: Explain with an example why resilience to cyber attacks is a very
important characteristic of system dependability.
Answer:
Resilience is the ability of a system to keep delivering its critical services during an attack, or to recover from one quickly.
Example: A bank's website should continue to work during a DDoS (Distributed Denial of
Service) attack by absorbing the load with backup servers and filtering out malicious traffic.
Customers can then still access their accounts even while the attack is in progress.
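The "security filters" mentioned above can be as simple as per-client rate limiting. Below is a minimal Python sketch (the RateLimiter class and its limits are hypothetical, not taken from any real product): it rejects any client that sends more than a fixed number of requests within a sliding time window.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Reject clients that exceed max_requests within a sliding window (seconds)."""

    def __init__(self, max_requests=100, window=1.0):
        self.max_requests = max_requests
        self.window = window
        self.history = defaultdict(deque)  # client_id -> recent request timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        # Discard timestamps that have fallen outside the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # request rate looks like a flood; reject
        q.append(now)
        return True
```

A real deployment would run such a filter at the network edge so flood traffic never reaches the application servers.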
Exercise 10.3: Explain, with an example, why dependable systems should be considered as
sociotechnical systems and not simply as technical hardware and software systems.
Answer:
Sociotechnical systems include not only machines but also people and their work
environment.
Example: In air traffic control, even if the software and radar work perfectly, a poorly
trained or fatigued controller can still cause an accident. People and processes must
therefore be part of the dependable system design.
Exercise 10.4: Give two examples of government functions that are supported by
complex sociotechnical systems and explain why, in the foreseeable future, these
functions cannot be completely automated.
Answer:
1. Judicial System – Judges make decisions based on law, ethics, and human judgment.
Software cannot weigh context, intent, and fairness the way a person can.
2. Emergency Services – Police and firefighters must make quick decisions in stressful,
unpredictable situations. These human judgments cannot yet be fully replaced by machines.
Exercise 10.5: Explain what is meant by diversity in dependable system design, with an example.
Answer:
Diversity means using different types of components or systems so that they do not share
the same faults and fail in the same way.
Example: Running one server on Windows and an equivalent server on Linux, so a single
operating-system vulnerability cannot bring both down.
Exercise 10.6: Explain why it is reasonable to assume that the use of dependable
processes will lead to the creation of dependable software.
Answer:
When developers follow a clear and tested process (like coding rules, testing, and reviews),
they are less likely to make mistakes. This helps build more reliable and secure software.
Exercise 10.7: Give two examples of diverse, redundant activities that might be
incorporated into dependable processes.
Answer:
1. Parallel Testing – Testing the software on different devices to catch device-specific issues.
2. Code Reviews and Static Analysis – Having both human reviewers and tools check the
code to find more errors.
Exercise 10.8: Give two reasons why different versions of a system based on software
diversity may fail in a similar way.
Answer:
1. Common Design Flaws – If all versions follow the same design plan, they may all have
the same problem.
2. Same Development Team – If the same people build the systems, they might make the
same mistakes in each version.
Exercise 10.9: Report on the benefits of formal methods for safety-critical train control
systems.
Answer:
Report on Using Formal Methods for Train Control Systems
Introduction:
Formal methods use mathematical techniques to specify and verify systems. For train
control, where safety is critical, they are especially valuable.
Benefits:
1. Mathematical Checking – Makes sure everything works as planned without guessing.
2. Early Error Catching – Finds problems before building the system.
3. Safety Proof – Gives solid proof that the system is safe.
4. Meets Rules – Helps meet the rules set by safety regulators.
Conclusion:
Even though formal methods cost more, they are worth it for systems like train controls
where safety is very important.
Exercise 10.10: Discuss whether regulation inhibits innovation and whether regulators
should impose development methods.
Answer:
Regulations can slow down new ideas by forcing people to use older methods. But they are
important to keep systems safe and trusted. Instead of forcing specific ways of development,
regulators should give helpful guidelines and let developers choose safe, modern methods.
Exercise 11.1
Question: Explain why it is practically impossible to validate reliability specifications
when these are expressed in terms of a very small number of failures over the total
lifetime of a system.
Answer:
When a system is expected to fail very rarely, it is impractical to run it long enough to
observe those failures. For example, if a system is required to fail at most once in 10 years
of operation, demonstrating that statistically would require test execution equivalent to
many multiples of 10 years. Simulating years of realistic use is also extremely expensive,
so such high-reliability claims cannot be confirmed by testing alone.
11.2 Question: Suggest appropriate reliability metrics for different classes of software system.
2. Word Processor
Metric: Rate of Occurrence of Failure (ROCOF)
Reason: Occasional crashes are annoying but not life-threatening.
Suggested Value: Less than 1 crash per 1,000 hours of use
Question: Describe three reliability metrics for a national telecom network operations
center.
Answer:
1. Availability: To ensure the network monitoring system is always running.
2. Mean Time to Repair (MTTR): How quickly the system can recover after a failure.
3. Rate of Occurrence of Failure (ROCOF): To monitor how often failures happen over time.
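These three metrics can be computed directly from an operations log. A small sketch (the function name and the sample figures are made up for illustration): ROCOF is the failure count divided by the observation period, MTTR is the mean repair time, and availability follows from the total downtime.

```python
def reliability_metrics(failure_times, repair_durations, total_hours):
    """Compute ROCOF, MTTR, and availability from an operations log.

    failure_times: hours at which each failure occurred
    repair_durations: hours spent repairing each failure
    total_hours: length of the observation period in hours
    """
    n = len(failure_times)
    rocof = n / total_hours                          # failures per hour of operation
    mttr = sum(repair_durations) / n                 # mean time to repair
    downtime = sum(repair_durations)
    availability = (total_hours - downtime) / total_hours
    return rocof, mttr, availability

# Hypothetical log: 2 failures in a 1000-hour period, taking 1h and 3h to repair.
rocof, mttr, avail = reliability_metrics([120.0, 700.0], [1.0, 3.0], 1000.0)
# rocof = 0.002 failures/hour, mttr = 2.0 hours, availability = 0.996
```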
Question: What is the common feature of all fault-tolerant software architecture styles?
Answer:
All fault-tolerant architectures include redundancy. This means they have extra components
or systems to take over if one part fails.
11.6 Question: Suggest an architecture style for a 24/7 communication switch that is not
safety-critical.
Answer:
A hot standby architecture is good. It has a main system and a backup system running in
parallel. If the main system fails, the backup takes over quickly. This ensures continuous
operation.
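The hot standby idea can be sketched in a few lines of Python (the class names are hypothetical; a real hot-standby pair would also mirror the primary's state continuously, which this sketch omits):

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} handled {request}"

class HotStandbySwitch:
    """Route requests to the primary; fail over to the standby on failure."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def route(self, request):
        try:
            return self.primary.handle(request)
        except RuntimeError:
            # Primary failed: promote the standby and retry the request.
            self.primary, self.standby = self.standby, self.primary
            return self.primary.handle(request)
```

Because the standby is already running, failover is just a pointer swap, which is why this style suits continuous 24/7 operation.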
11.8 Question: Why might all versions in a diverse software system fail in the same
way?
Answer:
1. Similar Specifications: All versions may follow the same incorrect requirement.
2. Shared Understanding: Developers may think in the same way and make similar mistakes.
Exercise 11.9 Question: How does exception handling in programming languages help
software reliability?
Answer:
Exception handling allows the software to deal with errors gracefully instead of crashing. It
lets the program detect problems and take safe actions, which improves reliability.
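For example, a program that reads an optional configuration file can use exception handling to survive a missing or corrupt file instead of crashing (a minimal Python sketch; the file format and default values are illustrative):

```python
import json

def load_config(path, defaults):
    """Read a JSON config file, falling back to defaults on any failure.

    Without the try/except, a missing or corrupt file would crash the
    whole program; with it, the program degrades gracefully.
    """
    try:
        with open(path) as f:
            return {**defaults, **json.load(f)}
    except FileNotFoundError:
        return dict(defaults)   # no config file yet: run with defaults
    except json.JSONDecodeError:
        return dict(defaults)   # corrupt file: ignore it and continue
```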
11.10 Question: Is it ethical to release software with known faults? Should companies
be liable?
Answer:
Releasing faulty software can be unethical, especially if it causes harm or loss. Companies
should try to fix known issues before release. If users face losses, companies should be held
responsible. There should be rules and warranties to protect users, just like with physical
products.
12.1 Identify six consumer products that are likely to be controlled by safety-critical
software systems.
1. Airbags in cars – To protect passengers during accidents.
2. Pacemakers – To regulate a patient’s heartbeats.
3. Smart insulin pumps – To deliver correct insulin doses.
4. Autopilot system in airplanes – To control flight operations safely.
5. Home security systems – To detect intrusions and alert users.
6. Microwave ovens – To stop operation if the door is open.
12.2 How would high safety standards affect the risk triangle?
The boundaries in the risk triangle would move downwards: the unacceptable region would
grow and the acceptable region would shrink.
Risks that were previously tolerable, including very low-probability ones, would have to be
reduced or eliminated, which increases the cost of risk reduction.
12.3 Three user errors in an insulin pump system and safety requirements
1. Error: User forgets to change the insulin supply
Requirement: Alert the user when insulin is low or expired.
2. Error: User sets a very high single dose
Requirement: Limit maximum dose and ask for confirmation.
3. Error: User accidentally sets the daily dose too high
Requirement: Warn and require a doctor's code for large daily dose changes.
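These requirements can be enforced in the pump's software. A minimal sketch (all limit values and the doctor's-code mechanism are hypothetical, not real clinical figures):

```python
MAX_SINGLE_DOSE = 4.0   # units; hypothetical hard safety limit
MAX_DAILY_DOSE = 25.0   # units; hypothetical daily limit

def validate_dose(single_dose, daily_total, confirmed=False, doctor_code=None):
    """Return (allowed, reason) for a requested insulin dose.

    Enforces the requirements above: a hard cap on any single dose,
    confirmation for unusually large doses, and a doctor's authorization
    when the daily total would be exceeded.
    """
    if single_dose > MAX_SINGLE_DOSE:
        return False, "single dose exceeds hard limit"
    if single_dose > 0.5 * MAX_SINGLE_DOSE and not confirmed:
        return False, "large dose requires user confirmation"
    if daily_total + single_dose > MAX_DAILY_DOSE and doctor_code is None:
        return False, "daily limit exceeded; doctor's code required"
    return True, "ok"
```

The key design choice is that the software rejects by default: every unsafe path returns a refusal with a reason, and only a fully validated request is delivered.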
12.7 Why is model checking sometimes more cost-effective than formal specification?
Model checking is automated and finds many errors early in development.
It exhaustively checks all reachable states of a system model against desired properties.
Full formal specification and proof can be time-consuming and require specialist skills.
Model checking is therefore a cost-effective choice for early-stage verification of designs.
// Door controller: unlock the door only when the system is in a safe state.
if (state == safe)
{
    Door.unlock();
    Door.locked = false;
}
else
{
    Door.lock();
    Door.locked = true;
}
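Model checking can be illustrated on the door-locking logic above: model the controller as a small transition system and exhaustively search every reachable state for a violation of the safety property "the door is never unlocked while the system is unsafe". A toy explicit-state checker in Python (the states and transitions are hypothetical simplifications, not a real controller model):

```python
from collections import deque

# States are (system, door) pairs; transitions model the controller's rule
# that the door may be unlocked only while the system is safe.
TRANSITIONS = {
    ("safe", "locked"):   [("safe", "unlocked")],
    ("safe", "unlocked"): [("unsafe", "locked"), ("safe", "unlocked")],
    ("unsafe", "locked"): [("safe", "unlocked"), ("unsafe", "locked")],
}

def check(initial, bad):
    """Breadth-first search over every reachable state; return the states
    that violate the safety property (bad states)."""
    seen, queue, violations = {initial}, deque([initial]), []
    while queue:
        state = queue.popleft()
        if bad(state):
            violations.append(state)
        for nxt in TRANSITIONS.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return violations

def unsafe_open(state):
    # Safety property: the door is never unlocked while the system is unsafe.
    return state == ("unsafe", "unlocked")
```

An empty result from `check(("safe", "locked"), unsafe_open)` means the property holds in every reachable state; a real model checker does the same search over a far larger state space.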
13.4 Two more threats and controls for the Mentcare system
Just as a bank protects money with multiple layers of defense (locked safes, security
guards, alarms, and cameras), a secure system uses layered controls: firewalls, encryption,
authentication, and monitoring. If one layer is breached, the others still stand.
13.10 Three possible attacks on the Mentcare system and testing checklist
Possible Attacks:
1. Man-in-the-Middle (MITM) Attack – Hacker intercepts patient data during transmission.
2. Unauthorized Database Access – Insider uses direct database access to view records.
3. Phishing Attack – Fake login page tricks doctors into revealing credentials.
Extended Testing Checklist:

| Security Area    | Checklist Item                                       |
|------------------|------------------------------------------------------|
| Data Encryption  | Is patient data encrypted during transmission?       |
| Access Controls  | Are user permissions correctly implemented?          |
| Login Security   | Are multi-factor authentication and CAPTCHA enabled? |
| Network Security | Is traffic monitored for unusual access patterns?    |
| Audit Logs       | Are logs maintained and regularly reviewed?          |
Chapter 14: Resilience Engineering – Exercises and Answers
14.1 Explain how the complementary strategies of resistance, recognition, recovery, and
reinstatement may be used to provide system resilience.
Answer:
1. Resistance – Preventing problems before they occur.
Example: Firewalls and antivirus software block malware before it enters the system.
2. Recognition – Detecting problems when they happen.
Example: An intrusion detection system (IDS) that monitors network activity for suspicious
behavior.
3. Recovery – Taking action to restore services quickly.
Example: Automatic system failover to a backup server when the primary server crashes.
4. Reinstatement – Restoring full system functionality.
Example: Rebuilding a compromised system using a clean backup and updating security
measures.
14.2 What are the types of threats that have to be considered in resilience planning?
Provide examples of the controls that organizations should put in place to counter those
threats.
Answer:
1. Threats to availability – e.g., a DDoS attack makes the system unusable. Controls:
redundant servers and network traffic filtering.
2. Threats to integrity – e.g., malware corrupts system data. Controls: checksums, input
validation, and regular backups.
3. Threats to confidentiality – e.g., an attacker steals confidential records. Controls:
encryption, access control, and audit logging.
14.3 Describe the ways in which human error can be viewed according to Reason
(Reason, 2000) and the strategies that can be used to increase resilience according to the
Swiss Cheese Model.
Answer:
Reason’s Model of Human Error:
Active Errors – Mistakes made directly by users (e.g., entering incorrect data).
Latent Errors – Hidden system flaws (e.g., a software bug that allows incorrect data entry).
Swiss Cheese Model:
An error causes harm only when the "holes" in every defensive layer line up, i.e., when all
layers fail at the same time.
Example:
1. Doctor prescribes wrong medication.
2. Pharmacist does not verify.
3. Nurse administers without checking.
Resilience Strategies:
Adding multiple safety layers.
Using automated alerts to detect errors.
Providing better training for staff.
14.4 A hospital proposes to introduce a policy that any member of clinical staff (doctors
or nurses) who takes or authorizes actions that lead to a patient being injured will be
subject to criminal charges. Explain why this is a bad idea, which is unlikely to improve
patient safety, and why it is likely to adversely affect the resilience of the organization.
Answer:
Negative Effects:
Staff will fear punishment, making them reluctant to report mistakes.
Hidden errors will not be corrected, leading to more future mistakes.
14.5 What is survivable systems analysis and what are the key activities in each of the
four stages involved in it?
Answer:
Survivable Systems Analysis (SSA) Stages:
1. System Understanding – Identify critical system components and functions.
2. Essential Capability Definition – Determine key services that must continue during an
attack.
3. Compromise Analysis – Identify how attackers might break the system.
4. Survivability Strategies – Develop plans for resistance, recognition, and recovery.
14.6 Explain why process inflexibility can inhibit the ability of a sociotechnical system
to resist and recover from adverse events such as cyberattacks and software failure. If
you have experience of process inflexibility, illustrate your answer with examples from
your experience.
Answer:
Why inflexibility is bad:
Prevents quick responses to unexpected cyberattacks.
Makes security updates slow due to bureaucratic approval processes.
Example:
A hospital uses outdated password policies, but updating them takes months due to slow
internal approvals, leaving the system vulnerable to cyberattacks.
14.7 Suggest how the approach to resilience engineering in Figure 14.9 could be used in
conjunction with an agile development process for the software in the system. What
problems might arise in using agile development for systems where resilience is
important?
Answer:
How to integrate resilience into Agile:
Include security testing in every Agile sprint.
Perform risk assessments before each release.
Problems with Agile in resilience-critical systems:
Frequent updates might introduce new security flaws.
Security testing takes time, which may slow Agile sprints.
Quick changes may bypass security best practices.
14.8 In Section 13.4.2, two cyberattacks were mentioned: (1) an unauthorized user
places malicious orders to move prices and (2) an intrusion corrupts the database of
transactions. For each attack, identify resistance, recognition, and recovery strategies
that might be used.
Answer:
Attack 1 (malicious orders placed to move prices):
Resistance – strong user authentication and per-account order limits.
Recognition – automated monitoring for anomalous trading patterns.
Recovery – cancel the fraudulent orders and correct the affected prices.
Attack 2 (intrusion corrupts the transaction database):
Resistance – strict access control and input validation on the database.
Recognition – integrity checks and regular audits of transaction records.
Recovery – restore the database from backups and replay the transaction log.
14.9 In Figure 14.11, a number of adverse events were listed for the Mentcare system.
Draw up a test plan for this system that sets out how you could test the ability of the
Mentcare system to recognize, resist, and recover from these events.
Answer:
For each adverse event listed in Figure 14.11, the test plan should include:
Recognition tests – inject the event (e.g., a simulated attack or failure) and check that it is
detected and reported.
Resistance tests – check that defenses such as authentication, access control, and input
validation block or contain the event.
Recovery tests – check that backups, failover, and restart procedures restore normal service
within the required time.
14.10 Question: Discuss the ethics of monitoring employees' use of an organization's
systems without their knowledge.
Answer:
Ethical Concerns:
Employees have a right to privacy.
Secret monitoring creates distrust and reduces morale.
Ethical Approach:
Inform employees before monitoring begins.
Only log work-related actions, not personal data.
Use monitoring for security improvement, not for spying.