Space System Failures

This document provides recommendations for reviewers to avoid common mistakes. It discusses two specific issues: sign errors involving orientation and phasing of components, and problems verifying last-minute configuration changes after testing. For sign errors, it recommends controlling orientation-critical components, verifying orientation end-to-end independently, carefully controlling interfaces, and ensuring anomalies can be corrected. For late changes, it recommends listing all post-test changes, having an effective configuration control plan, and establishing fool-proof change procedures. The document provides background on why these issues commonly cause failures and specific questions reviewers should ask to verify contractors are adequately addressing the risks.


AEROSPACE REPORT NO.

TOR-2007(8617)-1

Five Common Mistakes Reviewers Should Look Out For

29 June 2007

Prepared by

P. G. CHENG
Risk Assessment and Management Subdivision
Systems Engineering Division

Prepared for

SPACE AND MISSILE SYSTEMS CENTER
AIR FORCE SPACE COMMAND
483 N. Aviation Blvd.
El Segundo, CA 90245-2808

Contract No. FA8802-04-C-0001

Engineering and Technology Group

DISTRIBUTION STATEMENT: Distribution is limited to US Government agencies and their
contractors only; Administrative or Operational Use, 29 June 2007. Other requests for this
document shall be referred to SMC/AX.
DESTRUCTION NOTICE: For classified documents, follow the procedures in DOD 5220.22-M,
National Industrial Security Program Operating Manual (NISPOM), Paragraph 5, Section 7. For
unclassified, limited documents, destroy by any method that will prevent disclosure of contents
or reconstruction of the document.

EL SEGUNDO, CALIFORNIA

THE AEROSPACE CORPORATION
El Segundo, CA 90245-2691

Foreword

“It’s always the simple stuff that kills you….It’s not that they are stupid, with all the testing
systems everything looked good.”
James Cantrell, Main Engineer for the Skipper Satellite
Skipper failed because its solar panels were connected
backward (Associated Press 1996).

Failure reports routinely trace the underlying cause of mishaps to “engineering blunders” and
bemoan “inadequate reviewing.” But how can reviewers, in a few hours, find a mistake that
has eluded years of design and quality checks by the contractor and program office?

Over the years the author has published 20 volumes of Space Systems Engineering Lessons
Learned, analyzing past failures and highlighting practices required to avoid recurrences.
Another report, 100 Questions for Technical Review, suggests questions that reviewers can
use to efficiently look for errors. This report focuses on five of the most common lapses and
prescribes ways to catch them. The original “100 Questions” are shown in Appendix A; all
published lessons are available in Appendix B via hyperlinks.

Adequate reviews call for substantial supporting information, which requires reviewers to
coordinate with the contractors beforehand. Better yet, the program office and the contractors
should address each of these five areas of concern on their own.

Five Common Mistakes Reviewers Should Look Out For

1: Could the Sign Be Wrong?

2: How Will Last-Minute Configuration Changes Be Verified?

3: Can the Vehicle Survive a Computer Crash?

4: Is the Circuit Overcurrent Protection Adequate?

5: Can Pyros Cause Unexpected Damage?

Acronyms

CDR   Critical design review
EPS   Electrical power subsystem
ESD   Electrostatic discharge
FR    Final review (of an item’s as-flight configuration, encompassing unit selloff, system selloff, and various pre-flight readiness reviews)
GN&C  Guidance, navigation, and control subsystem
IRT   Independent review team, a term encompassing the mission assurance team (MAT), independent readiness review team (IRRT), and other independent review activity teams
PDR   Preliminary design review
SRR   System requirements review
TRR   Test readiness review

1. Could the Sign Be Wrong?

Most Applicable Review Occasions:

Program Milestone SRR PDR CDR TRR IRT FR


Check Recommendations 1, 2 1, 3, 4, 5 3, 4, 5 3, 4, 5 3, 5

Background

Sign errors, involving orientation and phasing (polarity) of torque coils, moving mechanical
assemblies, and many other components, are a leading cause of satellite failures (Lessons 43,
53, 60, 80, 93, and 97). A spacecraft recently crashed because the engineers did not realize
an avionics sensor could only detect deceleration from a particular direction. Catastrophic
mistakes have likewise occurred during programming, database development, manufacturing,
and even post-test integration.

Recommendations

The reviewers should verify that the contractors:


1. Prepare a plan to control everything that is sign- or orientation-critical. Make sure the list
is not limited to attitude control, and that a mechanical engineer supports this activity.
2. Identify the process, including component- and system-level test plans, to ensure that
each piece of equipment will be developed correctly. The configuration of sign-sensitive
parts and the assemblies in which they are mounted must be controlled. Board layout,
manufacturing, database, and software development must be scrutinized.
3. Prove that proper orientation has been verified end-to-end, after spacecraft integration.
Make sure the tests are made independently—the sign should be determined by a priori
analysis instead of being established during test (test is for verification only). Whenever
possible, use basic crosschecks (such as a compass). Validate that test-as-you-fly
exceptions do not mask sign errors.
4. Carefully control interfaces—including between satellite and launcher as well as between
space and ground—to prevent different organizations from confusing vehicle coordinate
systems. For example, merely specifying the deployment of a helix coil “in a clockwise
manner” is not enough because clockwise to the person standing in front of the helix’s tip
is counterclockwise to the person behind it.
5. Ensure flight anomalies caused by mistakes in GN&C sensors and actuators can be speedily
corrected with software patching. Develop contingency plans.
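
The independence called for in Recommendation 3 can be illustrated with a minimal Python sketch. It assumes a hypothetical polarity test in which each sign-critical item is exercised after integration and the sign of its response is compared with a sign fixed beforehand by analysis. The names (PolarityCase, check_polarity) and the sample values are illustrative assumptions, not part of any program's test procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolarityCase:
    """One end-to-end polarity check (all values here are hypothetical)."""
    name: str
    predicted_sign: int    # +1 or -1, fixed by analysis BEFORE the test
    measured_value: float  # response observed during the integrated test


def sign(x: float) -> int:
    """Return +1, -1, or 0 for a measured value."""
    return (x > 0) - (x < 0)


def check_polarity(cases):
    """Return the names of cases whose measured sign disagrees with analysis.

    A zero measurement counts as a failure, because no response cannot
    confirm the polarity.
    """
    return [c.name for c in cases if sign(c.measured_value) != c.predicted_sign]


cases = [
    PolarityCase("torque coil X", predicted_sign=+1, measured_value=0.8),
    PolarityCase("torque coil Y", predicted_sign=+1, measured_value=-0.7),
    PolarityCase("reaction wheel Z", predicted_sign=-1, measured_value=-1.2),
]
failures = check_polarity(cases)
print(failures)
```

The point of the design is that predicted_sign is an input to the test rather than something read off the test result, so the test can only confirm or contradict the analysis.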

For more information, call John Bohner at (310) 336-1772.

2. How Will Last-Minute Configuration Changes Be Verified?

Most Applicable Review Occasions:

Program Milestone SRR PDR CDR TRR IRT FR


Check Recommendations 1, 2, 4 1, 3, 4, 5 3, 4, 5, 6 3, 4, 5, 6

Background

Vehicle configurations often change after system testing. For example, placeholder blankets
need to be swapped out, flight connectors have to mate, and database parameters may
change. Some items, such as locking brackets to prevent flight hardware from coming loose
during ascent, can only be introduced at the launch site. Non-flight items have to be removed.

Hasty changes, especially those made in the heat of launch preparation, have caused several
failures (Lessons 3, 25, 29, 43, 61, 63, 64, 70, 79, 97, and 104), in part because late
installations and removals can be difficult to verify.

Recommendations

The reviewers should verify that the contractors:


1. List everything that will be introduced or removed after unit selloff or system testing.
2. Develop a configuration control plan, including the use of visible tags, effective logs,
unique part numbers, and other tools to ensure proper handling of late changes.
3. Establish fool-proof configuration change procedures, such as using uniquely keyed
connectors to avoid mismating, conducting tabletop rehearsal of hardware installation, as
well as performing hardware-in-the-loop emulation of new flight software or database.
Make sure an experienced operator reviews all installation procedures.
4. Create a plan to independently verify configuration changes. For example, if a
placeholder cable is used during test, how will it be verified that the flight cable will
perform satisfactorily?
5. Provide sufficient closeout photography of the “as tested” and “as flown” configurations
to aid troubleshooting.
6. Incorporate late (in particular launch site) engineering changes in the as-designed
configuration prior to consent-to-encapsulate.
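
Recommendations 1 and 4 amount to listing every difference between the tested and flown configurations and confirming that each one has an independent verification record. The Python sketch below shows one way to do that bookkeeping. The item names, part numbers, and function names are hypothetical, and a real program would draw these lists from its configuration control system.

```python
def diff_configurations(as_tested, as_flown):
    """Compare two {item: part_number} maps and list every difference."""
    removed = sorted(set(as_tested) - set(as_flown))
    added = sorted(set(as_flown) - set(as_tested))
    changed = sorted(
        item for item in set(as_tested) & set(as_flown)
        if as_tested[item] != as_flown[item]
    )
    return {"removed": removed, "added": added, "changed": changed}


def unverified_changes(diff, verified):
    """Return every touched item that has no independent verification record."""
    touched = diff["removed"] + diff["added"] + diff["changed"]
    return sorted(item for item in touched if item not in verified)


# Hypothetical example data.
as_tested = {
    "blanket A": "PLACEHOLDER-01",
    "cable J3": "TEST-CBL-7",
    "lens cover": "NONFLIGHT-2",
}
as_flown = {
    "blanket A": "FLT-BLK-01",
    "cable J3": "FLT-CBL-7",
    "locking bracket": "FLT-BRK-4",
}
diff = diff_configurations(as_tested, as_flown)
open_items = unverified_changes(diff, verified={"blanket A", "lens cover"})
print(diff)
print(open_items)
```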

For more information, call Dana Speece at (310) 336-5021 or Gary Shultz at (310) 336-2342.

3. Can the Vehicle Survive a Computer Crash?

Most Applicable Review Occasions:

Program Milestone SRR PDR CDR TRR IRT FR


Check Recommendations 1 1, 2, 3, 5 1, 2, 3, 5 2, 3, 4, 5 2, 4, 5

Background

Computer malfunctions, often caused by subtle timing or memory glitches, have thwarted
several missions (Lessons 18, 35, 36, 79, 84, 94, 112, 115, 123, and 124). Besides making
sure that the flight computers do not malfunction in the first place, reviewers should check if
the vehicle can gracefully handle a computer crash, as shown by its ability to:

• Revert to the “last known good state,” otherwise even a brief outage may cause
irreversible harm—for example, by incorrectly resetting the guidance system.
• Reboot without being stuck in an endless reset cycle.
• Switch to back-up computers after the primary side fails.
• Protect the memory, communication channels, thrusters, batteries, and thermal
subsystems to buy time for ground diagnosis and rescue. In one case, a prolonged
computer anomaly drained the batteries, but when the solar arrays were eventually
illuminated, an EPS design oversight prevented the solar arrays from charging the
battery, and the “dead bus” could not recover.
• Implement robust fault protection functions that cannot be falsely tripped, can function
after the computer locks up, and permit ground rescue. Avoid complex safemode design.

Recommendations

The reviewers should verify that the contractors:


1. Demonstrate that a computer freeze will not result in immediate vehicle failure. Consider
hot stand-by computers and uninterrupted power supplies.
2. Safeguard against having the computer stuck in endless reset cycles. Make sure faults that
trigger a freeze will not interfere with the restart, and the start sequences will not require
potentially unavailable resources. Consider, for example, allowing the watchdog function
to switch to a recovery mode after a few “try agains.”
3. Design the EPS to withstand battery undervoltage.
4. Conduct exhaustive analysis and testing of the fault protection logic to ascertain if it will
survive computer anomalies without becoming a source of single-point failure.
5. Ensure ground commandability in the event that the computer locks up. Consider
installing a backdoor receiver so the computer can be reset from ground.
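
The watchdog behavior suggested in Recommendation 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the reset count is assumed to survive reboots (for example in nonvolatile memory), the mode names NORMAL and RECOVERY are hypothetical, and the retry limit of three is an arbitrary placeholder.

```python
def next_boot_mode(consecutive_resets: int, max_retries: int = 3) -> str:
    """Choose the boot mode from the number of failed boots so far."""
    if consecutive_resets < max_retries:
        return "NORMAL"
    # After a few "try agains", stop repeating the same failing sequence.
    return "RECOVERY"


def simulate_boot(boot_succeeds, max_retries: int = 3, max_cycles: int = 10):
    """Run boot attempts until one succeeds; return the list of modes tried.

    boot_succeeds is a callable taking the mode name and returning True
    when a boot in that mode completes. max_cycles bounds the simulation.
    """
    resets = 0
    history = []
    for _ in range(max_cycles):
        mode = next_boot_mode(resets, max_retries)
        history.append(mode)
        if boot_succeeds(mode):
            return history
        resets += 1
    return history


# A fault that breaks every normal boot but not the minimal recovery mode.
history = simulate_boot(lambda mode: mode == "RECOVERY")
print(history)
```

The recovery mode only helps if its start sequence avoids the resources that the fault may have made unavailable, which is the concern the recommendation raises.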
For more information, call Pete Carian at (310) 336-8215.

4. Is the Circuit Overcurrent Protection Adequate?
Most Applicable Review Occasions:

Program Milestone SRR PDR CDR TRR IRT FR


Check Recommendations 1, 2, 3 1, 2, 3, 4 4 1, 2, 3

Background

Numerous satellites failed due to improper protection against shorting from foreign objects,
debris, unexpected contact, or plasma arcing (a high-current, low-voltage discharge in
vacuum over metal vapor). Even digital units are not immune. On-board processors
containing relays plated with pure tin have disabled several satellites after arcing triggered by
tin whiskers blew the fuses (Lessons 5, 19, 41, 47, 49, 56, 71, 75, 98, 100, and 104).

Fuses, circuit breakers, thermostats, and other devices protect upstream assets (such as the
power distribution board) from being damaged by a short, but careful design is required to
prevent them from becoming a single-point source of failure.

Recommendations

The reviewers should verify that the contractors:


1. Address all shorting and arcing risks. Encapsulate sources of plasma release, such as
high-voltage (> 15 volts) conductors, bus bars, test points, unsealed fuses, fuse wires,
resistors, and multilayer capacitors. Check that conformal coating has not left surfaces
(especially sharp corners) exposed and vulnerable to arcing. Ensure adequate clearances
between power and returns, and between positive and negative potentials.
2. Certify the absence of pure tin plating in parts, including subcontracted items (see
footnote 1).
3. Protect the primary power source either by fusing or by adopting single-fault tolerant
designs (see footnote 2). Fuses should be sized such that their derated value (per
MIL-STD 1547) will support the maximum steady-state current and withstand peak in-rush
or out-rush current. Wiring between the fuses and loads must be sized to carry the
maximum non-blow fuse current without overheating. All fused circuits should be analyzed
to ensure credible load faults will result in blowing the fuses.
4. Provide adequate vent holes in high-voltage cables or sealed modules—including test
equipment in thermal vacuum chambers—to assure complete outgassing and mitigate
corona discharge.
5. Inspect all connectors for foreign objects prior to mating.
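
The three sizing conditions in Recommendation 3 can be written as simple inequalities. The Python sketch below checks them for one circuit. All field names and numbers are hypothetical, the derating factor of 0.5 is a placeholder rather than a value taken from MIL-STD 1547, and the in-rush and out-rush withstand check is omitted because it depends on fuse data not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusedCircuit:
    """Currents in amperes; every value is supplied by the analyst."""
    fuse_rating_a: float
    derating_factor: float      # placeholder, NOT taken from the standard
    max_steady_state_a: float
    max_non_blow_a: float       # largest current the fuse carries without blowing
    wire_ampacity_a: float
    min_blow_a: float           # smallest current certain to blow the fuse
    min_fault_current_a: float  # smallest credible load-fault current


def sizing_findings(c: FusedCircuit):
    """Return a list of findings; an empty list means all checks passed."""
    findings = []
    if c.fuse_rating_a * c.derating_factor < c.max_steady_state_a:
        findings.append("derated fuse rating below maximum steady-state current")
    if c.wire_ampacity_a < c.max_non_blow_a:
        findings.append("wiring cannot carry maximum non-blow fuse current")
    if c.min_fault_current_a < c.min_blow_a:
        findings.append("credible load fault may not blow the fuse")
    return findings


circuit = FusedCircuit(
    fuse_rating_a=10.0, derating_factor=0.5, max_steady_state_a=4.0,
    max_non_blow_a=11.0, wire_ampacity_a=9.0,
    min_blow_a=20.0, min_fault_current_a=50.0,
)
findings = sizing_findings(circuit)
print(findings)
```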

For more information, call Tom Hecht at (310) 336-1505 or Dave Landis at (310) 336-1585.

1. Zinc and cadmium plating should also be avoided due to contamination hazard.
2. Examples include double insulation, serially connected components, current limiters, etc.

5. Can Pyros Cause Unexpected Damage?

Most Applicable Review Occasions:

Program Milestone SRR PDR CDR TRR IRT FR


Check Recommendations 1, 2 1, 2, 3, 4 2, 3, 4, 5 1, 2, 4, 5

Background

Pyro designers assiduously make sure their devices will fire. However, premature
pyrotechnic firings have caused not only catastrophic mission failures, but also severe personnel
losses. Excessive shocks, flying debris, voltage surges, and post-firing shorts have also
damaged critical equipment (Lessons 7, 68, 77, 89, 95, 98, 100, 109, 110, 111, and 119).
Safe design and handling procedures are thus vital.

Recommendations

The reviewers should verify that the contractors:

1. Conduct in-depth analysis of single-point failures, overcurrent, and sneak circuits on pyro
equipment. In particular, all inhibits (typically the ARM and FIRE relays) must be
independent, that is, not defeated by any single event, including an erroneous software
command (see footnote 3).
2. Follow Range safety requirements and guidelines set forth in MIL-HDBK-83578, including
the use of double-shielded circuits and safe-arm devices to prevent accidental ignition
by electrostatic discharge (ESD).
3. Implement safe testing procedures and mechanisms to retain ejected parts and to prevent
electrical disruption.
4. Verify that pyroshock does not cause damage, especially to nearby ordnance devices.
5. Adopt design and inspection procedures to guard against damaging transients and post-
firing conduction to chassis. Ensure that live tests before launch do not disable the drive
elements.
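
The independence requirement in Recommendation 1 can be screened mechanically once each inhibit's upstream dependencies are listed. The Python sketch below flags any dependency shared by two inhibits, since a single failure of that shared element could defeat both. The inhibit names and dependency labels are hypothetical, and such a screen supplements rather than replaces a sneak circuit analysis.

```python
from itertools import combinations


def shared_dependencies(inhibits):
    """Map each pair of inhibits to the dependencies they have in common.

    inhibits is a {name: set_of_dependencies} dictionary. Pairs with no
    common dependency are omitted from the result.
    """
    shared = {}
    for name_a, name_b in combinations(sorted(inhibits), 2):
        common = inhibits[name_a] & inhibits[name_b]
        if common:
            shared[(name_a, name_b)] = common
    return shared


# Hypothetical example: both relays are driven by the same software command.
inhibits = {
    "ARM relay": {"driver 1", "power bus A", "flight software command"},
    "FIRE relay": {"driver 2", "power bus B", "flight software command"},
}
common_mode = shared_dependencies(inhibits)
print(common_mode)
```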

For more information, call Selma Goldstein at (310) 336-1013 and Ron Williamson at (310)
336-2149.

3. The same concern also applies to nonexplosively actuated mechanisms such as wax heaters.

Appendix A

100 Questions for Technical Review

(The questions were originally listed in TOR-2005(8617)-4204;
this TOR refers to more lessons)

Section 1: Requirements

1-1 Are units and tolerances specified?


• Quantifying requirements reduces mistakes and surfaces manufacturing and
test issues.
• Use TBDs to highlight the need for further clarification, but clear them off
in a timely manner.
• Watch out for mistakes when two interfacing organizations use different
units (English versus metric, for example), CAD/CAM protocols, or
engineering practices.
• Lessons: 73 and 76.
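
One way to guard against the unit mix-ups mentioned under 1-1 is to carry the unit with every interface value and convert explicitly. The Python sketch below does this for impulse. The Impulse class and unit labels are illustrative assumptions; the conversion factor is the standard definition of the pound-force in newtons.

```python
from dataclasses import dataclass

# 1 lbf = 4.4482216152605 N by definition, so the same factor applies to impulse.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit label."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        """Convert to newton-seconds, rejecting any unrecognized unit."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


english = Impulse(2.0, "lbf*s")
metric = english.to_newton_seconds()
print(metric)
```

Raising an error on an unknown or missing unit is deliberate: a value with no stated unit is treated as a defect, not silently assumed to be metric.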

1-2 Is the specification’s wording unambiguous?


• Avoid incomplete lists (typically ending with “etc.”), vague words such as
“to the best possible,” passive voice, such as “the counter is set” (by
whom?), and negative statements.
• See http://www.ntsc.navy.mil/Resources/Library/Acqguide/spec.htm, a comprehensive
“Guide to Specification Writing for U.S. Government Engineers.”
• Lessons: 4 and 12.

1-3 Should any statement be split up?


• Lumped requirements are difficult to trace. Some may be overlooked.
• Lesson: 12.

1-4 How will it be demonstrated that each requirement is met?


• Each requirement should be traceable to a compliance matrix.
• If a requirement is implemented by software, it must be linked to test cases.
• Lesson: 19.

1-5 Does each requirement trace upward?


• Unnecessary or overtight requirements drive up costs.
• Rationale for each derived requirement should be documented.
• If a lower-level implementation affects a higher level, make sure other
subsystems will not be surprised.
• Lesson: 85.

1-6 How are configuration changes tracked?


• Make sure requirement or design changes are coordinated, and reincorporate
all redlinings and ad-hoc changes in the specifications.
• Lessons: 97, 70, 53, 64, 102, and 108.

Section 2: Heritage and “Qualification by Similarity”

2-1 Have all “heritage equipment” test and flight anomalies been resolved?
• The implication of each anomaly must be carefully addressed.
• Lessons: 41 and 65.

2-2 Have catastrophic failures that involved similar technologies been reviewed?
• Lessons: 87 and 107.

2-3 Did the original analyst review the model’s application?


• Reusing a model without fully understanding underlying assumptions can
be risky.
• Lesson: 99.

2-4 Do previous analyses still apply?


• Changes in configuration or flight environment may invalidate the original
analysis.
• Parameters worth checking include temperature, power, electrical and
mechanical stress, and flight duration.
• Lessons: 95, 83, and 47.

2-5 Is the heritage design well understood?


• Lesson: 50.

2-6 Should an old unit recommissioned for flight be retrofitted?


• Design upgrades made while an old unit sat on the shelf should be
considered.
• Lesson: 57.

2-7 Have replacement materials and parts been fully qualified?


• It is not sufficient for the replacements to merely meet lot acceptance
specifications.
• Make sure liens are given to parts procured prior to full qualification.
• Lessons: 108 and 14.

2-8 Should fault management circuits be redesigned?


• When a heritage unit is scaled up, key parameters such as start-up current
and rise time may change.
• Lesson: 84.

Section 3: Analysis

3-1 Have all critical analyses been placed under configuration control?
• Design changes may invalidate the original analysis.
• Lessons: 83 and 26.

3-2 Have designs been compared to similar, proven, equipment?


• Novel design approaches may entail risks.
• Make sure subcontractors concur with the way their product is used.
• Lessons: 82 and 99.

3-3 Has the analyst inspected the actual hardware?


• Sometimes the hardware is not what the analyst imagined.
• Lessons: 81 and 26.

3-4 Can the manufacturing process meet design requirements?


• Make sure the manufacturing engineer reviewed drawings early on.
• Use prototype and engineering models to discover problems early—issues
found in late tests can be very expensive.
• Lessons: 37 and 55.

3-5 Is the design tolerant of dimensional changes?


• Example: thermal mismatch and creep can cause dimension change,
interference, and shorting.
• Lessons: 52, 122, 106, and 47.

3-6 Was component qualification based on sufficient engineering data?


• That a few items worked is not sufficient—statistical data may be required
to show margin of safety.
• Instrumentation data may provide information to substantiate or disprove
the analysis, which is more essential.
• Do not accept an analysis based on unverifiable “contractor proprietary
data” or “classified information”— there are always workarounds.
• Lessons: 82 and 23.

3-7 Was the analysis complete?


• Do not throw out data points that do not fit a theory or could not be readily
understood.
• Properly account for the variances in the source data.
• Lessons: 113 and 59.

3-8 Was the space environment fully accounted for?
• Examples: damping, radiation, charging, arcing, heat dissipation, refractive
index, and microgravity.
• Ground thermal insulation blankets to prevent space charge buildup.
• Lessons: 41, 42, 10, 75, and 120.

3-9 Has the presence of residual magnetism been considered?


• Iron or nickel alloys may become magnetized upon work-hardening or
exposure to a magnetic field.
• Residual magnetism can disable solenoids.
• Lessons: 121 and 122.

3-10 Has the electrical schematic been independently checked, from end to end?
• Mistakes sometimes occur between drawings.
• Lesson: 68.

3-11 Can the harness be misconnected?


• Wiring and connectors should be designed to preclude mismating.
• Lesson: 63.

3-12 Are mechanical load margins adequate?


• The immature state of the art in analyzing vibration, separation shocks,
thruster imbalance, and dynamic loads has caused many failures.
• Lessons: 116, 11, 81, 33, 27, and 69.

3-13 Will excessive thermal or electrical loads damage hardware?


• Example: Relays can be welded shut by in-rush current and cause premature
deployment.
• Output circuits should be self-limiting for worst-case failure currents.
• Lessons: 19, 71, 99, 87, and 44.

3-14 Can unexpected time-dependent circuit behavior be accommodated?


• Start-up and turn-off transients can introduce problems such as EMI.
• Lengths of transients, such as pyro firing pulses, should be bounded.
• Lessons: 82 and 77.

3-15 Has a thorough safety analysis been conducted on each pyro event?
• Pyros impart a large and irreversible shock to the system and are involved
in many mission failures.
• Pyro design should be checked against available guidelines.
• The effect of pyro shock on adjacent structures and circuits must be
thoroughly validated.
• If explosive bolt cutters are used, all ejected debris should be contained.
• Lessons: 109, 110, 111, 119, 98, 89, 68, 77, and 7.

3-16 Are deployables readily tested both in 0 g and in 1 g?
• Designs that work in 0 g but not in 1 g are difficult to verify.
• During 1 g tests, avoid imparting forces that are unavailable in
space.
• Lessons: 116, 42, and 20.

3-17 Will a malfunctioning valve cause a failure?


• Contamination in valves has led to numerous failures.
• Make sure mistakes, such as software errors, in the valve controller will not
disable the vehicle.
• Lessons: 83, 65, 57, and 54.

3-18 Do moving units possess sufficient torque margins and clearance?


• Soft items such as cable and multi-layer insulation can move unexpectedly
in the launch or space environment and cause interference.
• Consider stiction in torque analysis.
• Avoid structures that can snag soft items, and route wires to avoid pinching
or snagging by a deployed structure.
• Lessons: 116, 107, 78, 42, 9, and 70.

3-19 Will the solar array flutter?


• Conduct modal frequency analysis to avoid excessive vibration of the solar
arrays upon entering or exiting the Earth’s shadow.
• Lesson: 13.

3-20 Has a worst-case analysis of EMI or crosstalk been conducted?


• EMI analysis should consider the possibility of multiple boxes working in
unison causing superimposition (such as in TDMA payloads).
• Lesson: 86.

3-21 Are the power distribution and grounding schemes, including over-voltage and
under-voltage limits, safe?
• All units should be protected from over- or under-voltage conditions.
• Double-check if a fuse should be installed, and carefully analyze fault
scenarios to size fuses.
• Components such as step motors and pyro circuits that experience sudden
current changes should be isolated from all other current-carrying circuits.
• Lessons: 98, 104, 114, and 117.

3-22 Are all known quirks of field programmable gate arrays (FPGAs)
accounted for?
• FPGAs have demanding electrical design rules and software interface.
• A NASA website, http://www.klabs.org/, describes common design mistakes.
• Lessons: 77 and 100.

Section 4: Failure Modes and Fault Management

4-1 Has the fault protection logic been independently verified?


• The fault management system (particularly the software) can be a source of
single-point failures.
• Example: Faulty sensor data may create a phantom problem and spoof the
fault management system into taking precipitous actions such as resets.
• Fault detection setting and responses should pass sanity checks. Endless
resets, for example, are dangerous.
• Lessons: 18, 36, and 43.

4-2 Will the satellite autonomous management system and the ground controller be
provided with correct information?
• Inaccurate situation awareness can lead to wrong disposition.
• Ensure subsystems report true status to the autonomy functions.
• Lessons: 44 and 29.

4-3 Does the fault management design consider all operational possibilities?
• Example: solar array mispointing, engine abort, or eclipse transient.
• Lessons: 36 and 38.

4-4 Is telemetry sufficient for all critical events?


• Knowledge of events such as separation can enable recovery.
• Capture indelible records of system parameters in past events with, for
example, strip chart records.
• Lessons: 67 and 36.

4-5 Are multiple safeguards available during early operation?


• Problems frequently occur during early orbit operation.
• Ground coverage must be ample.
• The satellite should autonomously operate in case ground commands do not
arrive promptly (due to erroneous position estimation, for example).
• Lessons: 101, 39, and 53.

4-6 Can a glitch trigger a crash?


• Systems should be designed to revert to “last known good state.”
• Example: A momentary wiring short in the bus may reset all relays, with
fatal consequences.
• Lesson: 91.

4-7 How will the satellite handle battery undercharging?
• The satellite should be able to automatically shed non-essential loads under
low voltage.
• Even a partially deployed solar array should provide enough current to
sustain the system.
• The power regulator should be energized from the solar array, instead of
being solely dependent on the battery for housekeeping.
• Lessons: 53, 47, 67, 30, and 101.
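
The automatic load shedding described under 4-7 can be sketched as a table of per-load voltage thresholds. In the Python illustration below, the load names, thresholds, and the loads_to_shed function are hypothetical; essential loads are never shed regardless of bus voltage.

```python
def loads_to_shed(bus_voltage_v, loads):
    """Return the names of loads to switch off at the given bus voltage.

    loads is a list of (name, essential, shed_below_v) tuples. A
    non-essential load is shed when the bus voltage falls below its own
    threshold; essential loads are always kept.
    """
    return [
        name
        for name, essential, shed_below_v in loads
        if not essential and bus_voltage_v < shed_below_v
    ]


# Hypothetical load table: (name, essential, shed threshold in volts).
loads = [
    ("payload", False, 26.0),
    ("heater B", False, 24.0),
    ("command receiver", True, 0.0),
]
print(loads_to_shed(28.0, loads))
print(loads_to_shed(25.0, loads))
print(loads_to_shed(23.0, loads))
```

Staggered thresholds shed the least important loads first, so a mild undervoltage does not strip the vehicle of everything at once.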

4-8 Can the fault management system itself survive major anomalies?
• Example: If a computer freezes, will fault correction software execute?
• Lesson: 35.

4-9 Are contingency plans for on-orbit anomalies adequate?


• Contingency recovery plans, such as to correct the spacecraft’s attitude,
should be based on realistic timeline constraints and rehearsed.
• Lesson: 60.

4-10 Can a problem in a primary unit cause the same failure in its backup?
• If the primary and redundant units share the same current feed, software, or
processor, one flaw in the primary component can cause the backup to fail
in the same way.
• Lessons: 18 and 19.

4-11 Can serial safety devices (inhibits) fail simultaneously?


• Deployment mechanisms such as squibs or wax heater actuators should
have separately driven safety devices lest one single error defeat both.
Failure analysis of safety devices is particularly tricky.
• Lessons: 77 and 100.

4-12 Can a device damage its neighbors?


• Example: EMI or shock from squibs and step motors.
• Lessons: 89 and 100.

4-13 Does the design allow in-flight upgrades?


• On-orbit reprogrammability provides flexibility.
• Lessons: 50, 33, and 23.

4-14 Can the on-board computer be safely reset?


• Executable software should be easily loadable even if the computer
locks up.
• Consider providing a backdoor receiver with default mode to overcome a
computer lockup.
• Lesson: 79.

Section 5: Embedded Software and Database

5-1 Will unexpected inputs cause the software to freeze or loop endlessly?
• Examples: skipped sensor input data, data outside the expected range, or data
that does not compute.
• Software should ignore spurious inputs through filtering or limit checking.
• Consider deliberately ignoring faults if there is no possible recovery.
• Avoid permitting software to reset in response to errors. Consider error
messages in telemetry instead.
• All “IF” branches should provide an “ELSE” for the unexpected input.
• Lesson: 18.
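
The limit checking and catch-all ELSE branch described under 5-1 can be sketched as follows. This Python example assumes a hypothetical sensor filter that falls back to the last good value and returns a status code for telemetry instead of resetting. The function name, status strings, and limits are illustrative.

```python
import math


def filter_reading(raw, last_good, lo, hi):
    """Return (value_to_use, status) for one sensor sample.

    Every branch ends in a defined outcome, including the final else,
    so an unexpected input can never leave the value undefined.
    """
    if raw is None:
        return last_good, "MISSING"           # skipped sensor input
    elif math.isnan(raw) or math.isinf(raw):
        return last_good, "NOT_A_NUMBER"      # data that does not compute
    elif lo <= raw <= hi:
        return raw, "OK"
    else:
        return last_good, "OUT_OF_RANGE"      # data outside expected range


samples = [5.0, None, float("nan"), 50.0]
results = [filter_reading(s, last_good=4.0, lo=0.0, hi=10.0) for s in samples]
print(results)
```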

5-2 What happens if the software hangs up?


• Fault management logic must provide a way out.
• Fault analysis must not assume perfect software.
• Consider independent fault protection, such as hardware watchdog timers.
• Lessons: 35, 36, and 18.

5-3 Can the computer get stuck during boot up?


• Do not let an error or malfunction prevent the computer from booting up.
• Make sure watchdog functions will not cycle between start and reset.
• Lessons: 112 and 79.

5-4 Will it be possible to remotely diagnose computer problems?


• Consider keeping debug utilities.
• Avoid using dynamic memory allocation, which may complicate
troubleshooting.
• Lesson: 94.

5-5 Is all critical software under configuration control?


• All software, not just the software that is uploaded, that affects satellite
behavior is critical and requires careful verification.
• Changes should be tracked back to requirements and specifications.
• Lesson: 73.

5-6 How are database parameters verified?


• Treat database loading as carefully as coding.
• Ensure data entry procedures are free from human errors, and conduct
independent verification of database integrity.
• Lessons: 3 and 43.

5-7 Are command scripts formally controlled?
• A bad command sequence can be fatal.
• Lessons: 29 and 104.

5-8 Will testing exercise all logic branches?


• Software should be tested over several days of equivalent mission time to
find problems such as timing errors, overrunning counters, or unintended
re-entries to “one-time” events.
• Use automated tools to verify code paths.
• All branches should be exercised and all parameters should be verified.
• Lesson: 94.
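To see why short tests can miss overrunning counters, it helps to work out when a counter rolls over. The sketch below is illustrative; the counter widths and tick rates are assumptions, not values from any particular program.

```python
def seconds_until_rollover(counter_bits, tick_hz):
    """Time for a free-running counter to wrap back to zero."""
    return 2 ** counter_bits / tick_hz


def elapsed_ticks(start, now, counter_bits):
    """Elapsed ticks, computed so the result survives one rollover."""
    return (now - start) % (2 ** counter_bits)
```

Under these assumptions, a 16-bit counter ticking at 10 Hz wraps after 6553.6 seconds (under two hours), while a 32-bit counter at 100 Hz wraps only after roughly 497 days, far longer than a typical ground test. A naive `now - start` across the wrap gives a large negative number, whereas the modular form above still gives the true elapsed count.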

5-9 How are reused or modified codes verified?


• Software changes should be controlled and retested as rigorously as
hardware modification.
• Issues understood by the original designer may be overlooked during
modification.
• Reused software should be compatible with the new application
environment.
• If a function in the reused code is not used, make sure it is completely
disabled and has no output to downstream code.
• Consider stripping off “dead” code.
• Lessons: 18, 25, 48, 79, and 124.

5-10 Has the flight software been tested with high-fidelity hardware in the loop, in
the flight configuration?
• The ground test bed should be configured the same as the flight computer.
At a minimum, the test bed should have the flight processor, flight
memories, flight software, flight cables, flight power management
equipment, and high-fidelity engineering model hardware.
• Test beds should include test points for measuring all signal and control
voltages and currents.
• Lessons: 19, 36, and 53.

5-11 Are memory and throughput margins adequate?


• Check against unexpected data rates or excitation.
• Ensure the system can handle a runaway sensor.
• Lessons: 115, 112, 35, and 94.

5-12 Have all major events been scrubbed for out-of-sequence inputs?
• A signal arriving earlier or later than expected can trigger unintended timing
conflicts.
• Missing data may leave the system in an unknown state.
• Lessons: 12, 25, and 104.
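A simple guard against out-of-sequence inputs is to accept each major event only when it is the next one expected and to log everything else. The sketch below is one way to express that; the event names and their order are hypothetical.

```python
# Hypothetical mission sequence, used only for illustration.
EXPECTED_ORDER = ["LIFTOFF", "FAIRING_JETTISON", "STAGE_SEPARATION", "ARRAY_DEPLOY"]


def process_events(events):
    """Accept events only in EXPECTED_ORDER; collect everything else.

    Returns (accepted, rejected). Early, late, and repeated events all
    end up in `rejected` instead of triggering an action.
    """
    next_index = 0
    accepted, rejected = [], []
    for event in events:
        if next_index < len(EXPECTED_ORDER) and event == EXPECTED_ORDER[next_index]:
            accepted.append(event)
            next_index += 1
        else:
            rejected.append(event)
    return accepted, rejected
```

A repeated "one-time" event, such as a second LIFTOFF, is rejected in the same way as an early signal.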

Section 6: Interfaces

6-1 Have interface authority, end-to-end responsibility, and conflict resolution
authorities been assigned?
• Examples: payload-to-bus, satellite-to-GSE, satellite-to-launcher, and
satellite-to-ground interfaces.
• Create a constructive mechanism to proactively distribute requirements,
flow down error budgets, and assign footprints or connectors.
• Lessons: 37, 81, and 105.

6-2 Have potential incompatibilities between interfaces been analyzed early on?
• Independent analysis is often needed to overcome organizational barriers.
• Examples: coupled loads, nutational instability, and EMI.
• Lessons: 2, 11, and 33.

6-3 Are handover procedures between two sources of control well defined?
• Two pieces of equipment vying for control (or each assuming the other is
doing the job) can be dangerous.
• Conduct thorough switching analysis to ensure fail-safe transfers.
• Lessons: 105 and 81.

6-4 Are there items that could resonate with one another?
• Example: Spacecraft can mechanically resonate with the launch vehicle,
causing fatigue damage.
• Lesson: 11.

6-5 Do interfacing organizations use different engineering conventions?


• Use the engineering model to verify interfaces early.
• Examples: English/metric units, positive/negative polarity grounding.
• Lessons: 73 and 93.
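One way to keep differing unit conventions from crossing an interface unnoticed is to require every value to carry its unit and to convert explicitly. The sketch below handles impulse only; the function name and the unit labels are assumptions for illustration. The conversion factor follows from the definition of the pound-force (1 lbf = 4.4482216152605 N).

```python
# Conversion factors to newton-seconds. Unit labels are illustrative.
TO_NEWTON_SECONDS = {
    "N*s": 1.0,
    "lbf*s": 4.4482216152605,
}


def to_newton_seconds(value, unit):
    """Convert an impulse to N*s, refusing any unit that is not listed."""
    if unit not in TO_NEWTON_SECONDS:
        raise ValueError(f"unknown impulse unit: {unit!r}")
    return value * TO_NEWTON_SECONDS[unit]
```

Rejecting unlabeled or unknown units outright is the design choice here: a number that silently passes through in the wrong unit is off by a factor of about 4.45.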

6-6 Are launch integration operations thoroughly planned?


• Ground support, typically involving several organizations, often causes
confusion, or even equipment damage.
• Consult the AIAA/NRO Space Launch Integration Recommended
Practices.
• Lessons: 71, 76, and 105.

Section 7: Parts, Materials, and Manufacturing Process

7-1 Are drawing tolerances compatible with manufacturing processes?


• Unnecessarily tight tolerances cause manufacturing difficulties.
• Tolerance stack-up can result in improper fits, or even failures.
• Provide sufficient stress relief points to prevent fatigue.
• Lessons: 47, 4, 8, and 52.
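Tolerance stack-up can be estimated in more than one way. The sketch below shows two common estimates, the worst-case sum and the root-sum-square (RSS); the RSS method is not mentioned in this checklist and is included only as a familiar point of comparison. The tolerance values in the usage note are hypothetical.

```python
import math


def worst_case_stack(tolerances):
    """Sum of absolute tolerances: every part at its limit at once."""
    return sum(abs(t) for t in tolerances)


def rss_stack(tolerances):
    """Root-sum-square estimate, assuming independent variations."""
    return math.sqrt(sum(t * t for t in tolerances))
```

For two parts with tolerances of 0.003 and 0.004 inch, the worst-case stack is 0.007 inch and the RSS estimate is about 0.005 inch. A fit that only works under the RSS figure deserves a closer look.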

7-2 Are honeycomb structures vented?


• Unvented honeycomb panels can entrap moisture and violently delaminate
during ascent depressurization. Several failures have occurred as a result.
• Lessons: 1 and 34.

7-3 Does any part, including those subcontracted, contain pure tin-plating or
cadmium?
• Tin whiskers can cause shorts and arcing and have disabled several
satellites.
• Cadmium, commonly used to plate airborne equipment, outgases in space.
• Audit vendor or subcontractor materials lists to ensure completeness.
• Lessons: 49 and 5.

7-4 Are there separable flared fittings (B-nuts) or check valves in fluid lines?
• B-nuts and check valves can leak.
• Lessons: 83 and 15.

7-5 Are cables, connectors, and circuit cards labeled and/or keyed to prevent
mismating?
• Mismating can cause inadvertent shorting during testing, even flight failure.
• Lesson: 63.

7-6 Can installations at launch site be readily verified?


• The pressure of prelaunch preparation often causes mistakes.
• Lessons: 61, 63, and 43.

7-7 Will rework be difficult?


• Rework is a fact of life in our business.
• Lesson: 57.

7-8 Are there procedures to prevent parts from being mixed up?
• Different parts may look alike.
• Lesson: 51.

7-9 Did a significant accident occur during manufacturing?
• Make sure the MRB thoroughly investigated the anomaly before accepting
the part as-is.
• Lesson: 6.

7-10 Have the root causes of manufacturing problems been corrected?


• The corrective actions should be back annotated on drawings or shop orders
to prevent recurrence.
• Lesson: 64.

7-11 Can each work instruction be verified?


• Verification should be done independently.
• Make sure rework instructions include verification.
• Lessons: 32, 118, and 88.

7-12 Might handling damage delicate hardware?


• Examples: composite pressure vessels, primary battery, optics, and
cryogenic equipment.
• Procedures used to handle satellites during test and integration should be
reviewed by safety personnel.
• Lessons: 22, 28, 54, 88, and 102.

7-13 Will handling or testing procedures reduce hardware life?


• Example: running high-speed tests in air can destroy lubricants.
• Lessons: 9 and 21.

7-14 Were excessive acceleration factors used to qualify design life?


• Acceleration factors larger than 5-10 should be independently ascertained.
• Different degradation mechanisms, not susceptible to thermal acceleration,
may operate in flight.
• Lesson: 95.
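For thermally accelerated life tests, the acceleration factor is often estimated with the Arrhenius model. The checklist does not name a model, so the Arrhenius form here is an assumption, as are the activation energy and temperatures in the usage note.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, test_temp_c):
    """Acceleration factor of a test at test_temp_c relative to use_temp_c.

    ea_ev is the activation energy, in electron-volts, of the assumed
    degradation mechanism.
    """
    t_use = use_temp_c + 273.15
    t_test = test_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_test))
```

With an assumed activation energy of 0.7 eV, testing at 85 °C for use at 40 °C gives a factor of roughly 26, well above the 5-10 range that calls for independent confirmation. The result is only as good as the assumed mechanism, which is the point of the second bullet above.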

7-15 Are procedures adequate to prevent non-flight items or debris from being left
inside the hardware?
• Make sure there is a special tracking system for non-flight items since loose
materials have led to numerous reworks or failures.
• Lesson: 90.

7-16 Are manufacturing facilities sufficiently clean?


• Examples of equipment requiring special care include valves, high-voltage
electronics, heat pipes, and optics.
• Watch out for contamination (especially chloride).
• Lessons: 16, 41, 45, 65, and 75.

Section 8: Testing and Evaluation

8-1 Do the tests independently confirm development results?


• If testing reuses equipment, analysis, or algorithms from design or
manufacturing, a source of single-point failure exists.
• Manual adjustment, such as shimming and alignment, should be
independently verified, too.
• Lesson: 96.

8-2 Have results been analytically established before testing?


• Tests should be used to verify analysis, not for discovery.
• All testing must be preceded by prototyping and analysis, followed by
model correlation and updating.
• Problems found during late testing can be very costly.
• Lessons: 60 and 37.

8-3 Is the polarity (phasing) of equipment (hardware coupled with software) correct?
• Phasing mistakes (particularly in the ACS subsystem) are among the most
common sources of failure.
• Lessons: 60, 80, 43, 53, and 93.
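An end-to-end polarity check can be reduced to a sign comparison: command a known direction and confirm the measured response has the same sign. The sketch below assumes the test produces one commanded and one measured value per axis; the axis names, values, and deadband are hypothetical.

```python
def polarity_ok(commanded, measured, deadband=0.0):
    """True when the measured response has the same sign as the command."""
    if abs(commanded) <= deadband or abs(measured) <= deadband:
        return False  # no usable signal: do not count it as a pass
    return (commanded > 0) == (measured > 0)


def check_axes(responses, deadband=1e-3):
    """responses maps axis name -> (commanded, measured).

    Returns the sorted list of axes whose polarity check failed.
    """
    return [
        axis
        for axis, (cmd, meas) in sorted(responses.items())
        if not polarity_ok(cmd, meas, deadband)
    ]
```

Treating a response inside the deadband as a failure is deliberate: a test that produced no measurable response has not verified anything.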

8-4 Can a simple test be used to crosscheck an elaborate test?


• Although a simple test will not provide the necessary precision, it can
prevent gross errors.
• If the equipment can pass the more elaborate test, it should pass the simpler
test easily. Failure to pass the simple test must thus be treated as a red flag.
• Lessons: 103, 96, and 80.

8-5 Has all test data been reviewed for trends, oddities, “out-of-family” values, and
other indicators of anomalies?
• Test sets should collect data and enable automatic trending.
• Excessive current draw during electrical test (suggestive of an impending
short) and high G spikes (indicating intermittent rubbing) during acoustic
testing should receive particular attention.
• Many problems occur during the first temperature cycle. Therefore, the
results after the first cycle should be scrutinized.
• Lessons: 71, 39, and 19.
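Automatic trending can start with a simple "out-of-family" screen. The sketch below flags readings that sit far from the median, scaled by the median absolute deviation (MAD). The choice of statistic and the threshold of 5 are assumptions, not values from the source.

```python
from statistics import median


def out_of_family(values, threshold=5.0):
    """Return indices of values far from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All typical values are identical: anything different stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) / mad > threshold]
```

For example, in a series of current readings near 1.0 A, a single 1.45 A sample is flagged for review even though it may still be inside the specification limit.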

8-6 Are all test anomalies fully understood?
• Many flight failures first occur during tests but are mistakenly attributed to
“random failures” or “test set malfunctions.”
• Test equipment should be sufficiently powerful to enable unambiguous
assignment of anomaly causes.
• Lessons: 106, 120, 92, 38, 46, 55, and 56.

8-7 Have the test articles been fully inspected after testing?
• It is particularly important to inspect the hardware after vibration or
acoustic tests, thermal cycling, or live pyro firing.
• Lessons: 100, 66, and 7.

8-8 Do the tests cover all operating modes?


• Conditions worth checking include eclipse transits, cold start, safe-holding,
load shedding, and recovery.
• Simulate each operational mode through several cycles.
• Lessons: 84 and 38.

8-9 Is the test equipment compatible with the test conditions?


• Example: Test equipment used inside the thermal vacuum chamber must be
space qualified to prevent damage to the hardware or test facility.
• Lessons: 114 and 49.

8-10 Could the hardware be damaged during testing?


• Test equipment should be properly maintained and calibrated.
• Trial runs using limited force, current, or temperature should be made first
and the responses characterized.
• Hardware should be protected from sudden test equipment malfunctioning.
• Vibration of large satellites should be avoided.
• Lessons: 24, 66, 74, 102, 109, 110, and 111.

8-11 Do tests accurately simulate time-dependent (especially start-up) behavior?


• Before an analog circuit stabilizes, it can behave unpredictably.
• Equipment that changes status abruptly (step motors or pyro firing circuits,
for example) often exhibits, or requires, an unexpected time profile to
overcome initial resistance or prevent premature decay.
• The test set should be able to record transients!
• Lessons: 77, 82, 115, 123, and 124.

8-12 Does the test equipment allow sneak paths?
• Sneak paths via the test set can mask hardware deficiencies (by providing
gratuitous grounding or power, for example).
• If test equipment temporarily provides certain functions, independently
verify that the hardware can operate on its own.
• Test set sneak paths can also damage hardware.
• Lessons: 58, 72, 109, 111, and 119.

8-13 Have the units demonstrated an ability to start without the need of ground
equipment (plug-out) or manual intervention?
• It is particularly important to check payload, GN&C, and C&DH processors
to prevent endless looping.
• Lessons: 84 and 79.

8-14 Does the design allow adequate inspection?


• Unambiguous inspection criteria should be developed before verification.
• Lessons: 120 and 47.

8-15 Does the system being tested represent the flight configuration?
• Insert enough test points to compensate for items that could not be
live-tested (thrusters and deployment mechanisms, for example).
• Lessons: 85, 53, and 19.

8-16 Does the test inject sufficient off-nominal conditions to ensure the equipment
is robust?
• Examples of off-nominal conditions include current spikes, sluggish
separation wire breakage, and excessive data rate.
• Lessons: 44, 56, 94, 103, and 123.

Appendix B: Space Systems Engineering Lessons 1-124

Space Systems Engineering Lessons Learned
Lesson 1
Honeycomb Structures Should be Vented to Reduce Delamination Risk

The Problem:
Several satellites have been destroyed when their honeycomb structures failed. Examples
include:
• A NASA satellite was destroyed at T+103 sec when the payload fairing reached 600°F.
During subsequent ground tests, the witness panels disintegrated (1964).
• A DOD rocket blew up shortly after launch. Later, the fairing's witness panel came apart
when tested on the ground (1966).
• Another DOD satellite was severely damaged upon launch. The fairing for the next flight
was subsequently proof tested, whereupon it also burst (1981).
• Two solar array panels on a DOD program failed during qualification (1985).
• The massive hydrogen tank on an experimental reusable launch vehicle delaminated,
eventually causing the program to be cancelled (1999).
The Cause:
Honeycomb panels for terrestrial applications are usually unvented: neither the panels nor
the cores have holes. However, unvented honeycomb structures should not be used in space
because aerodynamic heating during launch can cause temperature to rise dramatically. In an
unvented design, entrapped fluid (e.g., moisture) can expand, turning each cell into a tiny
pressure vessel that stresses the skin-to-core bonds. Debonding is apt to occur if the panel
has a weak bond due to manufacturing defects.

[Figure: honeycomb cells as fabricated, after fluid ingression, and under aerodynamic heating;
in the vented panel the fluid escapes, while the unvented cells explode. Caption: Perforating
honeycomb cells relieves pressure during ascent.]
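The "tiny pressure vessel" effect can be estimated to first order with the ideal gas law at constant volume. The sketch below is a lower bound under stated assumptions: the cell is rigid, holds only dry air sealed at sea-level pressure, and the outside pressure falls to near vacuum. The temperatures are illustrative. Trapped moisture that vaporizes would push the pressure higher.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def sealed_cell_differential_psi(seal_temp_f, hot_temp_f,
                                 seal_pressure_psi=14.7,
                                 outside_pressure_psi=0.0):
    """Pressure difference across the skin of a rigid, sealed, dry-air cell.

    Uses P2 = P1 * T2 / T1 (ideal gas, constant volume).
    """
    ratio = fahrenheit_to_kelvin(hot_temp_f) / fahrenheit_to_kelvin(seal_temp_f)
    return seal_pressure_psi * ratio - outside_pressure_psi
```

Under these assumptions, a cell sealed at 70°F and heated to 600°F in vacuum sees a differential of roughly 29 psi across the skin-to-core bond, compared with zero on the ground at ambient temperature.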
Lessons Learned:
• Honeycomb structures for space systems should be vented whenever possible. The vast
majority of spacecraft or launch vehicles use vented honeycomb structures, and these have
not failed in space.
• If an unvented design cannot be avoided (e.g., to avoid contamination), it is necessary to
adopt extensive development, verification, and quality assurance, including proof tests
under applicable temperature and vacuum conditions. Aerospace experts are available to
explain detailed design and quality-assurance requirements.

For more technical information, call S. R. Lin at (310) 336-7697.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 2
Perform Independent Mass Property, Stability Control, and Structural Load Analyses
on Spacecraft and Launch Vehicles

The Problem:
Mistakes in mass-property and control-stability analyses have caused a large
number of launch failures. Examples include:
• Inappropriate reuse of aerodynamic coefficients (1994).
• Unanticipated structural vibration mode not filtered out (1995).
• Incorrectly simulated weight (1995).
• Underprediction of the load as well as an unexpected resonance due to wind shear (1992
and 1995).
• Unexpected increase in horizontal velocity (1996).
• Unaccounted roll mode caused by air-lit solid rocket motors (1998).
Flawed analysis has also led to numerous on-orbit anomalies.
The Cause:
Launching a satellite calls for extremely complex simulation of the mass, thermo-structural,
fluid-mechanical, propulsion, and control properties (a single subsystem can easily involve
over 100,000 equations). The state of the art in this area is far from robust: subtle
assumptions, insufficiently sophisticated techniques, or human errors can all throw the
results seriously off.
Moreover, when the satellite is integrated with the launcher, each organization must generate
parochial models but each has little insight into each other's analytical process. Costly
problems can easily arise without a clear settling of responsibility, especially with today's
emphasis on proprietary data protection.

[Figure labels: SV Drawing; SV Structural Dynamic Model; Engine, Fairing, Tank, U/S, and LV
Drawings; LV Structural Dynamic Model; Coupled SV/LV Analysis; Response Recovery Equations.
Caption: Integrating space vehicle (SV) to launch vehicle (LV) involves complex modeling;
independent analysis is often necessary to overcome organizational barriers.]

Lessons Learned:
• Inaccuracies in mass property, stability control, and structural loads continue to threaten
mission performance.
• To ensure correct analysis, many programs require an independent analysis. These activities
also help validate operational procedures, support flight anomaly resolution, and
overcome the organizational issues. There have been no catastrophic failures in programs
that abide by this policy, and several failures were averted thanks to independent analysis.
For more technical information, call Ray Skrinska at (310) 336-4001.

Lesson 3
Rigorously Manage and Test Software, Including the Database

The Problem:
An expensive military satellite failed to reach the right orbit because a misplaced decimal
point in the avionics database of the upper stage caused the reaction controller to fire
excessively, depleting its fuel.
The Cause:
Multiple deficiencies in the software development, testing, and quality assurance (QA)
processes allowed a single-point failure to escape. Specifically:
• The process to create and test the constants database was poorly documented, fragmented,
and not well understood. The control dynamics engineers created a new roll rate filter
constant instead of using one that had been previously validated. This critical number was
manually entered in error, slipped through visual inspection, and was not formally checked.
• The as-flown constant was neither independently verified nor validated due to a lack of
overall software ownership. Many players were involved in the process, but none completely
understood it.

[Figure: readings before and after the wrong database was loaded. Caption: The wrongly placed
decimal point caused the middle line to become flat. This anomalous reading was flagged at
the launch site but fell through the cracks.]

As the program downsized, mission assurance functions were supposed to change from
“oversight to insight.” This transition did not successfully take place, and the problem sneaked
through all QA gates.
After the wrong constant was loaded, launch site personnel saw anomalous readings and tried
to contact the designers. However, the issue was ignored. Even on the day of launch, the
rocket showed a wrong response to the wind and to the rotation of the earth. A simple plot
could have identified the problem and averted the failure.
Lessons Learned:
• One must test actual flight hardware and software.
• The integrity of software databases is no less critical than the source codes.
• The space business is extremely complex and human error cannot be completely
eliminated. The system must be robust enough to catch the inevitable faults.
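A slipped decimal point changes a constant by a factor of ten, which an automated comparison against previously validated values can catch. The sketch below is one possible check, not the process used on any program; the constant names, the values in the usage note, and the half-decade threshold are hypothetical.

```python
import math


def decades_apart(new_value, validated_value):
    """Number of powers of ten separating two nonzero constants."""
    return abs(math.log10(abs(new_value) / abs(validated_value)))


def suspicious_constants(new_db, validated_db, max_decades=0.5):
    """Flag constants that are new, flipped in sign, or off by a large factor."""
    flagged = []
    for name in sorted(new_db):
        if name not in validated_db:
            flagged.append(name)  # no heritage value: needs manual review
            continue
        new, old = new_db[name], validated_db[name]
        if new == 0 or old == 0:
            if new != old:
                flagged.append(name)
            continue
        if (new > 0) != (old > 0) or decades_apart(new, old) > max_decades:
            flagged.append(name)
    return flagged
```

For instance, a filter constant entered as -0.15 against a validated -1.5 is flagged, while a gain retuned from 0.8 to 0.82 is not. A flag is a prompt for independent review, not proof of an error.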


Lesson 4
Document Engineering Requirements As Clearly As Possible

The Problem:
Two very expensive mishaps occurred recently, in part, due to inadequate communications
between the designers and the manufacturing operation:
• The combustion chamber of a rocket engine breached because an unclear requirement
made it possible for a weak joint to pass quality assurance, leading to the loss of a $230M
commercial satellite.
• A DOD satellite was stranded in the wrong orbit because confusing drawing instructions
led technicians to apply thermal protection tape in a way that prevented stage separation.

The Cause:
In the first incident, the seams of the engine were reinforced with many metal strips. The
design required the strips be brazed "80% per linear inch" (i.e., no big holes, see diagram),
but the drawing only specified "80%".
X-ray photos revealed that some strips were poorly brazed, but they were allowed to pass since
the requirement was interpreted as "80% coverage averaged over the entire length of the
reinforcement strip." The strips failed in flight.

[Figure: brazing voids. Design intention (80% per linear inch) means there can be no big void
anywhere. Actual requirement (80%) implies that a big hole is OK as long as there is 80%
coverage over the entire length. Caption: Deleting the "per linear inch" phrase led QA to pass
joints with low brazing coverage. In flight, the defective part caused combustion chamber to
breach.]

In the second failure, the work instruction stated that the wrapping should be applied "within
0.5 inches of the mounting bracket flange" (instead of saying, e.g., no closer than 0.5
inches). The technicians, not knowing that the parts were to unfasten, applied the tapes as
closely to the flange as possible, making separation impossible.

[Figure: as-built versus correct tape wrapping. Caption: Thermal tapes were too tightly
wrapped over the as-built connector and inhibited stage separation.]

Lesson Learned:
• Engineers must clearly articulate their intentions and determine how the requirements
should be interpreted or could be misconstrued. This is particularly true when making
seemingly minor (Category II) changes.
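The difference between the two readings of the brazing requirement is easy to show numerically. The sketch below assumes coverage has been measured for each one-inch segment of a strip; the sample data are made up for illustration.

```python
def average_coverage(per_inch):
    """Coverage averaged over the whole strip."""
    return sum(per_inch) / len(per_inch)


def passes_overall(per_inch, required=0.80):
    """The drawing as written: 80% averaged over the entire length."""
    return average_coverage(per_inch) >= required


def passes_per_linear_inch(per_inch, required=0.80):
    """The design intention: at least 80% in every inch."""
    return all(c >= required for c in per_inch)


# Hypothetical ten-inch strip with one completely unbrazed inch.
strip = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

This strip averages 90% coverage and passes the requirement as written, yet it fails the per-linear-inch intent because of the one-inch void.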


Lesson 5
Avoid Pure Tin Plating

The Problem:
Pure tin plating can grow conductive filaments (whiskers) which have caused many problems.
Examples include:
• In the late 1990s, at least four commercial satellites had problems with their spacecraft
control processors (SCP), reportedly because whiskers grew on the relays and caused the
power-supply fuses to blow. In three cases, both the primary and the redundant SCPs
failed, and the satellites were lost.
• Again in the late 1990s, three DOD programs incurred costly delays: one discovered tin
whiskers in an atomic clock, the second found tin whiskers on ground lugs, and the third
saw tin whiskers forming inside thin-film capacitors.
The Cause:
MIL-STD-1547B bars several materials from space hardware. Among these "prohibited materials,"
tin is most noteworthy. Pure tin plating is often used commercially because it forms an
excellent protective layer that accepts solder readily. Plating shops prefer pure tin over
tin-lead to avoid lead disposal costs.

[Figure: Tin whisker shorts. Image credit: NASA.]

However, pure tin is liable to spontaneously form conductive whiskers, which can provide an
unwanted conductive path and degrade hardware by causing shorts and even catastrophic
arcing. The whiskers appear unpredictably, without the need of an applied voltage or moisture
(unlike silver dendrites), even in vacuum. It is impossible to ensure hardware integrity by
inspection or by stress testing—the only way to prevent this problem is to eschew pure tin
plating, fused tin, and alloys with very high (greater than 97%) tin contents.
Lessons Learned:
• Prohibit pure tin plating in both flight hardware and ground equipment but assume tin will
be found.
• Ensure prime contractors flow down unambiguous plating requirements, and perform
appropriate receiving inspections.
• Purge prohibited materials from project stores and standard catalog items, paying
particular attention to the "commercial parts."
• Review subcontractor designs and part specifications to confirm that parts are safe.
• Apply conformal coatings on all exposed conducting surfaces wherever possible to inhibit
shorts and vacuum arcing.
For more technical information, call Katherine Westphal at (310) 336-8794 or Steve Frost at
(310) 336-7131.

Lesson 6
Following a Major Repair, Watch Out for Secondary Damage

The Problem:
In two launch failures, the Material Review Board (MRB) allowed repaired hardware to be
used without taking secondary damage into full account. The first incident led to the
destruction of three DOD satellites; the second mishap stranded another DOD satellite in a
wrong orbit.
The Cause:
In the first incident, a large cut was made on a rocket segment during repair, and the slit
was subsequently patched up. The engineers expected the cut to close by internal pressure, but
it opened instead, allowing the flame to burn through the case.
Afterwards, the manufacturer implemented several corrective actions to address the MRB repair
process. The need to repair was eliminated by process changes, and other repaired segments
were scrapped.

[Figure: SRM segment and repair side view, showing the restrictor, patch, cut, and propellant.
Caption: Patching of deep cut allowed flame to burn through the case.]

In the second case, the fabrication of the apogee kick motor (AKM) nozzle involved wrapping a
reinforcement layer over the primary structure in a bag, and heating the assembly under
hydraulic pressure to cure. The bag broke, and the part came into contact with water. The
contractor then machined off the semi-cured overwrap layer, laid up a new overwrap, and
resumed production.

[Figure: Apogee kick motor. Labels: titanium case/liner/internal insulation, propellant,
igniter, throat, overwrap, primary structure, liner, insulator, nozzle assembly.]

Unfortunately, the part was not oven-dried, so moisture was trapped in the primary structure
and diffused back to the interface when the part was cured again. Not only was the mechanical
strength lower as a result, but the interfacial adhesion between the primary structure and
the overwrap also became seriously degraded. During flight, the nozzle was unable to withstand
the motor pressure and was ejected.

Lesson Learned:
• Ad-hoc repair processes tend to be much less defined and qualified than regular
manufacturing operations. MRB reviews need to be more vigilant, and significant MRBs
should be added to the readiness review process. In particular, the possibility of secondary
damage must be taken into account.
For more technical information, call S. R. Lin at (310) 336-7697.

Lesson 7
Perform High-Fidelity System Validation Tests for Pyrotechnics

The Problem:
Explosive devices (pyros) are highly efficient, easily controlled, and can be readily stored.
However, several anomalies occurred when pyros were turned on:
1. A science mission ended during the first orbit when its infrared telescope cover was
unintentionally ejected, causing the loss of all cryogen (1999).
2. Three satellites, one for Earth observation, one for communication, and one for science,
failed due to propulsion-system ruptures induced by pyros. A propulsive valve on a fourth
similarly failed on the ground (early 1990s).
3. An interplanetary probe almost fatally failed when the firing of a pyro initiator caused a
voltage surge and induced a latch-up in the redundant memory board. The mission would
have ended if the primary memory board had been affected (1989).
The Cause:
The telescope cover was ejected because a controller chip took a few milliseconds to warm up,
during which a transient was generated. The designer did not take this known problem into
account, and the design was not reviewed. Ground test failed to catch the flaw because a lab
power supply was used, and its slower power rise time masked the transient. In flight, a relay
applied power in two milliseconds, allowing the spurious firing to occur.

[Figure: (a) Dual pyrovalves before firing (primary pyro, back-up pyro, fuel line, thruster);
(b) Primary fires, some hot gas leaks into line; (c) Back-up fires, hot gas turbulently mixes
into fuel; (d) Fuel explodes, breaching line.]

The four early-1990s incidents involved dual "pyrovalves": the fuel-feed system incorporated
two valves, the primary opening one second before the redundant. The second firing could lead
to a blow-by of hot gas, igniting the propellant and breaching the fuel line. This problem
escaped earlier tests that used an inert working fluid.
In the 1989 incident, the problem was not easy to spot, but could have been found if the
engineering model had been tested with a simulated (non-explosive) pyro.
Lessons Learned:
• Pyros by themselves are very reliable, but the adjacent systems must be designed to
withstand the mechanical or electrical shocks generated by the pyros.
• Tests should simulate flight configuration and functional performance.
• Post-test examinations of qualification or acceptance specimens should look for signs of
inferred margin or incipient failure modes.

For more technical information, call Selma Goldstein at (310) 336-1013.



Lesson 8
Solar Arrays Must Withstand Extreme Environments

The Problem:
Solar array mishaps have disabled numerous satellites. Examples include:
• Two Earth observation satellites failed due to shorts in the solar-array system, one in 1978
and another in 1993.
• In 1999, a technology demonstration spacecraft experienced excessive solar panel
degradation that ended its mission prematurely.
• In the late 1990s, two commercial satellites suffered serious power losses, reportedly in
solar storms.
The Cause:
Solar arrays contain many fragile elements, and are exposed to wide temperature fluctuations
and other space hazards. They are thus particularly vulnerable to a host of problems that the
designers must guard against. The mishaps above were caused by faulty materials and processes
and by insufficient testing.
In the case of the commercial satellites, the wiring harnesses were squeezed into tight
feed-through holes with sharp kinks and without sufficient strain-relieving loops. Temperature
cycling, coupled with the movement of the adhesive, shifted the wires by several mils relative
to the facesheets during each cycle.
With repeated heating and cooling, the insulation was abraded, as if by a saw. A short was
inevitable, and was triggered by electrostatic discharges (ESDs) during space weather storms.
The problem could easily have been averted if the harness had incorporated ample stress relief
and thicker insulation.

[Figure: as-fabricated harness (staking adhesive, insulated wire, Gr/Ep facesheet, aluminum
core, Kapton film, wire-to-facesheet contact) versus correct harness (more stress relief,
thicker insulated wire). Caption: Insufficient stress relief and insulation caused abrasion of
wiring harness.]

Lessons Learned:
• Solar arrays should be carefully designed to prevent their fragile parts from being
damaged by the hostile space environment.
• Satellites must be robustly designed to withstand the extremes of space weather as well as
other space hazards.

For more technical information, call Robert W. Francis at (310) 336-6272.



Lesson 9
Excessive Handling Can Destroy Solid Lubricant

The Problem:
Lubricants based on molybdenum disulfide (MoS2) are used in gyros, drives, gimbals, or
other moving mechanical assemblies. Several problems involving this lubricant have been
noted, including:
• A microwave imager on a weather satellite catastrophically failed.
• A degraded sun sensor on another weather satellite caused excessive oscillation.
• The high-gain antenna on an interplanetary probe could not be fully opened.
The Cause:
MoS2 has excellent properties in space, but it oxidizes in the presence of moisture. Hence,
MoS2 is degraded either by improper handling or by prolonged storage.
Unfortunately, ground tests can fail to detect degraded lubrication because materials can
behave differently on the ground than they do in space.
The imager problem occurred because manufacturing and storage exposed the labile lubricants
in the slip-ring assembly to excessive oxidation. Furthermore, the part was stored for more
than 11 years, causing more lubricant loss. The sun sensor problem was also traced to
oxidation and contamination of the slip-ring materials during storage.

[Figure: (a) High gain antenna unfurls like an umbrella. Excessive friction developed between
the pin and the socket (inset) due to loss of lubricant. (b, inverse view) The motor could not
overcome the friction and stalled, and the antenna could not open.]

The high-gain antenna problem was caused by excessive handling (including vibration test-
ing, rib pre-loading, and four cross-country trips) that dispersed the lube. Ground testing did
not catch the problem because the vacuum test was not realistic and because the titanium pins
got some lubrication (from the contaminants in the test chamber) not available in space.

Lessons Learned:
• Operation, testing, or storage of mechanisms under nonvacuum conditions must be per-
formed with caution when MoS2 dry lubricant is involved.
• Follow Aerospace's handling and storage guidelines to safeguard lubricants.

For more technical information, call Jeff Lince at (310) 336-4464.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 10
Design Satellites to Withstand Space Weather, Regardless of Solar Cycles

The Problem:
The space environment has caused hundreds of on-orbit anomalies, including:
• A military satellite lost power to its communications subsystem suddenly (1973).
• A weather satellite lost its primary instrument (1982).
• A foreign weather satellite lost attitude control (1988).
• A foreign communication satellite found its solar cells severely damaged (1991).
• A foreign commercial satellite was disabled for seven months after both reaction wheels
failed (1994).
• A foreign communication satellite lost power (1997).
• A foreign science satellite was abandoned when increased atmospheric drag overpowered
the attitude control system (2000).
The Cause:
The principal space weather hazards involve geomagnetic storms, which are stirred up when
large numbers of solar particles hit the Earth’s magnetic field. Storms can trigger an
electrostatic discharge (ESD) in the spacecraft: all failures cited above except the last one
involved ESDs.
[Figure: Sunspot number (0 to 180) plotted against year (1970 to 2000), with the solar maxima
marked and symbols for catastrophic failures due to charging and for other weather-induced
catastrophic failures. Caption: Space Weather Hazards Can Occur Outside of Solar Max.]
Space weather hazards are often thought of as mainly driven by the 11-year solar cycles. For
example, there was extensive “satellite-killer” hype in the media in 2000 because one cycle
peaked late that year. Conversely, some people associate periods of low solar activity with
minimal weather hazards.

This belief is unfounded since space weather hazards and solar activity correlate only
marginally. Geomagnetic storms can occur anytime, not just during the height of the solar
cycles. Satellites can thus fail during valleys of the solar cycle as easily as during peaks. Moreover,
all storm prediction efforts, including new spacecraft designed to monitor solar activities,
have been unsuccessful so far, and satellite operators cannot count on being forewarned of
weather threats.
Lessons Learned:
• Spacecraft must be designed to withstand worst-case space environments as a matter of
course.
• Satellites should be hardened against ESD, using well-established design guidelines on
structure, materials, shielding, cable interfaces, and circuits.
For more technical information, call Harry Koons at (310) 336-6519.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 11
Carefully Evaluate Satellite-Launcher Interface

The Problem:
An experimental spacecraft fell silent after having been successfully released from the launch
vehicle. This failure was deemed to have occurred because unexpectedly high vibration de-
veloped in the launch vehicle before it was air-dropped, imparting stress in the satellite
beyond its design limit.
The Cause:
This failure was caused primarily by a satellite-launcher interface problem:
• The booster, while being carried by the launching airplane, vibrated at 40-50 Hz. In
several previous flights, shaking went beyond the level spelled out in the Interface Control
Document (ICD). As a result, the rocket contractor reduced the airplane's speed to
minimize this problem. Still, vibration in this flight was double the specification.
• The satellite exhibited a structural resonance at 40 Hz. During factory test, this resonance
amplified an acceleration input six-fold.
• The satellite contractor conducted the vibration acceptance test at a lower level than the
ICD specification. A defect in the electronics or harness probably went undetected in the
test, but propagated under a combination of excessive in-flight vibration and resonance to
cause the failure.
• Both the launcher and the satellite prime contractors recognized the vibration issue and
proposed to conduct a coupled-loads analysis. It was not performed because the program
office, which served as the overall systems integrator, lacked funds.
[Figure: Vibrational forces, expressed as power spectral density (PSD, in g²/Hz) on a log scale
against frequency (1 to 10000 Hz): (a) imparted on the spacecraft by the carrier airplane, with
the ICD spec marked, and (b) the satellite's response to an even level of excitation. The
spacecraft resonated at the frequency where above-spec shaking took place.]
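The way an above-spec input and a structural resonance compound each other can be sketched
numerically. The example below is only an illustration, not the analysis used in the
investigation. It assumes the satellite behaves as a single-degree-of-freedom system whose
amplification factor Q equals the six-fold amplification seen in factory test, that the input
PSD is flat near 40 Hz, and that "double the specification" refers to the PSD level. The
spec PSD value is hypothetical. Miles' equation is a standard approximation for such a system.

```python
import math


def miles_grms(fn_hz: float, q: float, psd_g2_per_hz: float) -> float:
    """Approximate RMS response (g) of a single-degree-of-freedom system
    to a flat random input, using Miles' equation."""
    return math.sqrt(math.pi / 2.0 * fn_hz * q * psd_g2_per_hz)


def response_psd_at_resonance(q: float, psd_g2_per_hz: float) -> float:
    """Response PSD at the resonant frequency: input PSD scaled by Q squared."""
    return q ** 2 * psd_g2_per_hz


FN_HZ = 40.0     # structural resonance reported in the text
Q = 6.0          # assumed equal to the six-fold amplification in factory test
SPEC_PSD = 1e-3  # hypothetical ICD level in g^2/Hz, for illustration only

spec_grms = miles_grms(FN_HZ, Q, SPEC_PSD)
flight_grms = miles_grms(FN_HZ, Q, 2.0 * SPEC_PSD)  # assumed: PSD doubled in flight

print(f"Response at spec level:   {spec_grms:.2f} g rms")
print(f"Response at flight level: {flight_grms:.2f} g rms")
print(f"Response PSD at resonance (spec input): "
      f"{response_psd_at_resonance(Q, SPEC_PSD):.3f} g^2/Hz")
```

Under these assumptions, a test run below the ICD level exercises the hardware at a fraction
of the response it sees in flight, which is why margins are needed in both the input estimate
and the design.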
Lessons Learned:
• Cables and connectors must be designed to withstand vibration-induced stresses.
• Margins must be reserved both in dynamic input estimation and in design.
• The interfaces among different organizations, particularly between the spacecraft side and
the launcher side, frequently lead to problems. Independent analysis is advised to over-
come organizational barriers (see Lesson No. 2).

For more technical information, call Robert Morse at (310) 336-2364.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 12
One Requirement, One Statement

The Problem:
Contact with an interplanetary probe was lost.
The Cause:
As the lander parachuted down, it deployed three legs, each with a sensor designed to command
the engine off upon touchdown lest the lander overturn. Leg deployment shock could spoof the
sensors into thinking the probe had landed. To prevent the confusion, the systems spec
required: “The sensors shall…(commence operation shortly before touchdown). However, the use
of the sensor data shall not begin until...(after the leg deployment completes)….”
[Figure: Landing Sequence. Descent decelerated with engine; legs deployed, sensors tricked;
software read sensors, shut down engine.]

This “However...” phrase was unfortunately not picked up by the software team or by other
subsystems, and was not specifically tested at the system level. During descent, the deploy-
ment shock set off a status flag. When the touchdown sensing logic subsequently ran, it was
misled into thinking landing had already occurred. The descent engine shut itself off prematurely;
the probe crashed.
The software walkthrough and integration/test did not detect this problem (logic flow dia-
grams could have helped). What’s more, a leg-deployment test failed to detect the fault
because the sensors were improperly wired at first. A rerun of the deployment test, which
might have caught the error, was not performed after rewiring.
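The gating behavior that the "However..." clause called for can be sketched as follows. This
is a minimal illustration, not the probe's flight software: the class, its method names, and
the sampling sequence are all hypothetical. It shows the two requirements handled separately:
sensor data is ignored until leg deployment completes, and anything latched before that point
is discarded.

```python
from dataclasses import dataclass


@dataclass
class TouchdownMonitor:
    """Hypothetical touchdown logic; names and structure are illustrative only."""
    sensing_enabled: bool = False
    touchdown_latched: bool = False

    def enable_sensing(self) -> None:
        """Call once leg deployment is complete; clears any earlier indication."""
        self.touchdown_latched = False
        self.sensing_enabled = True

    def sample(self, sensor_active: bool) -> None:
        """Process one sensor reading; readings before enable_sensing() are ignored."""
        if self.sensing_enabled and sensor_active:
            self.touchdown_latched = True

    def engine_cutoff_commanded(self) -> bool:
        """Engine cutoff is commanded only on a touchdown seen after enabling."""
        return self.sensing_enabled and self.touchdown_latched


monitor = TouchdownMonitor()
monitor.sample(True)       # leg-deployment shock: a transient before sensing is enabled
monitor.enable_sensing()   # leg deployment complete
monitor.sample(False)      # still descending
premature = monitor.engine_cutoff_commanded()
monitor.sample(True)       # actual touchdown
landed = monitor.engine_cutoff_commanded()
print(premature, landed)   # prints: False True
```

A test that injects the transient before `enable_sensing()` exercises exactly the condition
that the aborted leg-deployment test would have covered.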
Lessons Learned:
• Do not lump several requirements together—write them out separately so that each can be
tracked individually. Negative statements (e.g., “Sampling shall not begin until…”) may
cause misunderstanding and should be avoided.
• Systems engineers must take ownership of requirements and partition them to the
appropriate subsystem. Whether or not a requirement is the software’s responsibility, for
example, should not be left to the discretion of the software team.
• Systems engineering must ensure thorough end-to-end failure mode testing.
• The software review process should emphasize logic flow. Tests should exercise every
requirement to see if there are conditions that could cause the software to fail.
• Test planning needs to consider transients or spurious signals.
• When important tests are aborted or are known to be flawed, they must be rerun after the
errors are fixed. Repeat the test if any software or hardware involved is changed.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 13
Flexible Solar Arrays Are Susceptible to Thermally Induced Vibrations

The Problem:
Thermally induced vibrations of spacecraft appendages have recurred numerous times. Re-
sultant problems include:
• Two science satellites stopped spinning (early 1960s).
• Two Earth observation satellites showed large disturbances about the roll and yaw axes
whenever the spacecraft entered or exited sunlight (early 1980s).
• A space observatory had to have its solar arrays replaced on-orbit because “jitters” inter-
fered with star pointing (1993).
• A scientific satellite failed due to heating and expansion of the solar panels that damaged
the structure (1997).
The Cause:
Spacecraft equipped with long appendages or solar arrays are susceptible to attitude perturba-
tion upon entering or leaving the Earth's shadow, because large temperature gradients can
develop around the boom. The sun-facing side of the boom or array can bend and create a
torque on the satellite very rapidly, causing a flutter. Satellites with a single solar array are
most susceptible.

[Figure: Panel shape before and after a thermal gradient is applied, showing the connection to
the vehicle and the resulting torque; plot of gyro rate count (-150 to 150) against time in
minutes (0 to 120) across sunset and sunrise. Long appendages can deform and cause the
spacecraft to shiver during eclipse transitions. Effective attitude control algorithms should
be developed to address this concern.]

The space observatory mentioned above, for example, employed flexible solar arrays with
telescoping booms. A thermal gradient of as much as 25-deg C developed around the boom
circumference within one minute, causing the tip of the spar to deflect by 20 cm.
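A rough feel for how a cross-section temperature difference turns into tip deflection can be
had from the textbook relation for a cantilever with a uniform thermal gradient: curvature
= alpha * dT / d, and tip deflection = curvature * L^2 / 2. The sketch below assumes that
simple model. Only the 25-deg C gradient comes from the text; the expansion coefficient, boom
diameter, and boom length are hypothetical values chosen for illustration and are not the
observatory's actual parameters.

```python
def thermal_tip_deflection(alpha_per_c: float, delta_t_c: float,
                           depth_m: float, length_m: float) -> float:
    """Tip deflection (m) of a cantilevered boom bent by a uniform
    temperature difference across its cross-section."""
    curvature = alpha_per_c * delta_t_c / depth_m  # 1/m
    return curvature * length_m ** 2 / 2.0


ALPHA = 1.2e-5   # hypothetical expansion coefficient, 1/deg C
DELTA_T = 25.0   # gradient across the boom, deg C (from the text)
DEPTH = 0.025    # hypothetical boom diameter, m
LENGTH = 5.0     # hypothetical boom length, m

tip = thermal_tip_deflection(ALPHA, DELTA_T, DEPTH, LENGTH)
print(f"Tip deflection: {tip * 100:.1f} cm")
```

Because deflection grows with the square of the length, long slender booms are
disproportionately sensitive, which is consistent with the emphasis on thermomechanical
analysis in the lessons that follow.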
Lessons Learned:
• Flexible solar arrays and supporting equipment are sensitive to thermal environment.
• Thorough thermomechanical analyses of the solar arrays, particularly on their modal fre-
quencies, should be conducted.
• Control algorithms used to mitigate the effects of solar-array excitations should be refined.

For more technical information, call John Welch at (310) 336-6556.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 14
Look Beyond Specifications in Qualifying Materials by Similarity

The Problem:
Numerous failures have occurred due to deficiencies in substitute materials that were
thought to be similar to those originally specified. Some recent examples include:
• A rocket nozzle failed during test firing because a replacement insulator delaminated.
• The propulsion valves in a rocket broke down just before launch because the oxidizer
reacted with a new cleaning solvent.
• A solar array would not open in space because radiation caused a rubber spacer to become
sticky.
The Cause:
Programs sometimes must replace materials that are no longer available. It is often thought
that if the substitute meets all the specifications, it can be accepted "by similarity." This
approach can be risky; specifications usually only call out rudimentary requirements to fa-
cilitate incoming inspection—key tests used to qualify a material may be cumbersome to
repeat, and are routinely left out of the spec as new materials lots are received.
In the first incident, a supplier problem prompted the contractor to select a replacement resin
for the nozzle skirt. This new material met the applicable specification, had been used on
other programs, and had passed an array of tests in the laboratory. However, test results of the
new material were statistically different from the original material, and test conditions were
not sufficiently flight-like: many properties were measured at room temperature, whereas the
flight temperature approached 3000-deg F. Additionally, certain critical properties were not
measured, and the vital thermal expansion test was performed at too low a heating rate.

[Figure: Nozzle skirt insulation cross-section (outer laminate, asbestos/phenolic insulation)
and plot of thermal expansion against temperature (deg F) at heating rates of 10, 1, and
0.1-deg F/sec. The replacement material outgassed and delaminated during firing. This problem
escaped qualification since slow heating rates (0.1-deg F/sec) used in the lab provided time
for the gas to escape. Faster rates would have revealed the issue.]

In a test firing, the flame burned through the new resin. At the time, two rockets having
nozzles made from the new materials were already being prepared for launch. Potential losses
of the satellites were narrowly averted.
Lesson Learned:
• Substitute materials should be tested under conditions that realistically simulate flight
conditions and give results comparable to those exhibited by the original material.

For more technical information, call Wayne Goodman at (310) 336-5356.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 15
Avoid Separable Flared Fittings

The Problem:
Tubular fittings with flared ends, commonly referred to as B-nuts and designated as AN, MS,
and MC types, are sometimes used as separable plumbing joints in rocket engines and space-
craft propulsion subsystems. These connectors are often found to leak during tests, and may
be difficult to fix. Leaky fittings have also been implicated in several in-flight malfunctions,
including the failure of a transfer vehicle.
The Cause:
Standard separable connectors are commonly used in ground systems to facilitate part
replacement. B-nuts work by converting the applied torque into a stress that physically clamps
and deforms the flared end of the fitting until it fits tightly over the threaded element.
However, just as bolts in furniture can unscrew over time, the flared end of these fittings can
undergo "stress relaxation" and become loose, resulting in a leak. Launch vibration can also
pull the nuts back and cause leaks.
How fast the seals loosen depends on the manufacturing process, storage conditions, and other
factors, but tests have shown that the applied torque can drop by one-third over a matter of
weeks. Unless retightened (which can be difficult to do because the connectors may not be
accessible), loose fittings can cause failures.
[Figure: Flared fitting cross-section. The flared-fitting seal relies on maintaining the
clamping force high enough to deform the flare into a fit on the threaded elements.]
[Figure: Torque (in/lb, 0 to 1400) against days (5 to 30) for a 3/4” MS union and B-nut,
showing a 1/3 drop in torque. The applied torque can drop substantially in a week and cause
leaks to develop.]
Lessons Learned:
• Separable fittings in fluid lines should be avoided wherever practical in favor of perma-
nent connections such as welded or brazed joints.
• Where separable connectors must be used, the fittings should have machined sleeves or
redundant sealing surfaces. All separable connectors should be readily accessible at all
stages of assembly and at the launch site to allow torque checks and repairs.
• All separable fittings should be torque-checked as close to launch as possible. If torque
checks are not possible within 10 days prior to launch, locking devices that do not cause
contamination should be used.

For more technical information, call Leon Gurevich at (310) 336-1268.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 16
Systematically Monitor and Control Contamination

The Problem:
Contamination has degraded numerous radiators, thermal coatings, solar arrays, sensors,
moving mechanical assemblies, and other components in space. Examples include:
• The sun-viewing bays of an interplanetary probe were 20-deg C hotter than anticipated.
• The radiator of a data-relay satellite became too hot.
• An instrument failed on-orbit when internal outgassing caused arcing.
• The focal plane on an early-warning satellite degraded.
• A satellite lost its orientation accuracy because three star trackers were fouled.
• The solar array output from five navigation satellites decreased more than expected.
• The wide-field planetary camera on a space telescope lost its ultraviolet capability. A
similar camera degraded during thermal vacuum test.
The Cause:
Contamination is a serious risk during all phases of a spacecraft’s life. Particulate can
accumulate during manufacturing, testing, storage, and launch. Volatile materials can be
released during vacuum tests or in space, and condense on critical surfaces. Some molecules
can react with sunlight to deposit tenacious films that darken over time.
Contamination control has historically been performed on a "best effort" basis: all "low
outgassing" materials were deemed acceptable in any application in any quantity, and
manufacturing requirements were rather arbitrary.
[Figure: Solar Absorptance (α) of Silverized Radiator Mirrors against years on orbit (0 to 8)
for satellites A (MEO), B (HEO), and C through G, with an annotation marking a doubling of α
(raising radiator temperature by up to 20°C). Contamination of radiators makes electronics run
hotter. Except for curves A and B, data was obtained from GEO satellites. Satellite C used a
special design to reduce contamination.]
Today’s new sensors, which must be kept extraordinarily clean, require a quantitative con-
tamination budget flowdown throughout the entire spacecraft lifecycle. Sophisticated moni-
tors and models should be used to verify that derived cleanliness requirements are met.
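The link between solar absorptance and radiator temperature noted in the figure can be
illustrated with a steady-state heat balance. The sketch below assumes a flat radiator facing
the Sun directly and radiating from one face, so that emitted power equals internal
dissipation plus absorbed sunlight. The absorptance, emissivity, and dissipation values are
hypothetical and are not taken from any of the satellites discussed.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)


def radiator_temperature_k(alpha: float, emissivity: float,
                           internal_w_per_m2: float,
                           solar_w_per_m2: float = 1361.0) -> float:
    """Equilibrium temperature (K) of a sunlit flat radiator, per unit area:
    emissivity * SIGMA * T^4 = internal dissipation + alpha * solar flux."""
    absorbed = internal_w_per_m2 + alpha * solar_w_per_m2
    return (absorbed / (emissivity * SIGMA)) ** 0.25


# Hypothetical values for illustration only.
clean = radiator_temperature_k(alpha=0.1, emissivity=0.8, internal_w_per_m2=300.0)
contaminated = radiator_temperature_k(alpha=0.2, emissivity=0.8, internal_w_per_m2=300.0)

print(f"Clean radiator:        {clean:.0f} K")
print(f"Contaminated radiator: {contaminated:.0f} K")
print(f"Temperature rise:      {contaminated - clean:.0f} K")
```

A contamination budget works this relation in reverse: the allowable temperature rise sets
the allowable growth in α, which in turn bounds the deposit that can be tolerated.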
Lessons Learned:
• Recognize the importance of contamination-control engineering during every phase of
development and hardware design.
• Perform contamination budget analysis, using tools derived from experimental data.
• Establish quantitative cleanliness requirements and apply cutting-edge processes to con-
trol particulate and molecular contamination.

For more technical information, call David Hall at (310) 336-5896.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 17
Watch Out for the Return of Leonid Micrometeoroid Storms

The Problem:
When the Earth crosses a comet’s orbit, tiny debris trailing the comet can trigger micromete-
oroid outbursts and damage satellites. For example:
• A scientific spacecraft suffered a hit and lost substantial telescope capability (1991).
• A communication satellite lost its Earth sensor and had to be abandoned, probably due to
a particle strike that triggered a power surge (1993).
The Cause:
Micrometeoroid showers occur several times a year, with dozens, sometimes hundreds, of
particles per hour burning up in the Earth’s atmosphere during a shower’s peak.
Showers with 1000 or more particles per hour are called storms. The Leonid storms in 1966
exhibited a peak rate approaching 100,000 per hour. Leonid particles travel at speeds of about
70 km/sec and pose a significant threat to satellites.
[Figure: Flux (particles/m²-s, log scale from 1x10^-8 to 1x10^-3) against year (1960 to 2040)
for Leonid storms, Draconid storms, and background; marked dates include 1966, 1985, 1999,
11/18/01, 11/19/02, 2018, 2031, and 2032. The next Leonid storms will occur in November 2001
and 2002. Each may have multiple bursts over approximately 16 hours. Long-term projections
remain imprecise.]
[Figure: Spacecraft body facing the particle stream head-on ("Fewer hits, more get in") and at
an oblique angle ("More hits, fewer get in"). One way to reduce storm damage involves
orienting the satellite to face the micrometeoroids at an oblique angle. Although more surface
is exposed, particles will tend to glance off instead of penetrating into the spacecraft.]
Satellite operators can mitigate risks by:
• Turning telescopes away from incoming particles, adjusting solar panels, and orienting the
satellite to minimize damage to internal hardware.
• Reviewing procedures for rebooting subsystems.
• Making sure experienced personnel are on duty during the storm.
• Turning off equipment sensitive to electrostatic discharge (ESD), and avoiding commanding
the satellite or firing thrusters during storms.

These techniques have proved successful. In the widely publicized 1998-2000 Leonid season,
only a few minor anomalies were attributed to possible meteor strikes.
Lessons Learned:
• Awareness of the space environment situation is vital.
• Advance planning in anticipation of the coming storms is essential.
For more technical information, call Dave Desrocher at (719) 638-2280. A monograph from
The Aerospace Press, Dynamics of Meteor Outbursts and Satellite Mitigation Strategies,
discusses this issue at great length.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 18
Make Sure Critical Software Performs in its Intended Environment

The Problem:
The 1996 maiden flight of a launch vehicle ended in a crash.
The Cause:
The launcher’s flight control system, which had derived considerable heritage from the previous
generation, used two identical inertial reference controllers, including a “hot” stand-by.
[Figure: Flight Software Sizes of Major Programs (0 to 100000) against year (1965 to 2005), with
points labeled Phase 1 DSP, DSP, UHF F/O, Milstar, and SBIRS-High. As software takes over many
functions that used to be controlled by hardware, code sizes increase almost exponentially.
Software reliability thus poses a growing challenge and warrants more quality assurance
efforts.]
One function inherited from the legacy software computed the platform alignment before launch.
This function was no longer needed in the new generation.
The new rocket flew a different trajectory, creating an alignment bias that was too large for
the legacy code to compute. An “operand error exception” occurred.
Such errors are common, and are typically handled by software (for example, by inserting
“likely” values). Unfortunately, although the programmers did identify the alignment bias in-
put as one of the several variables capable of causing operand errors, they chose to leave it
unprotected, probably supposing that there would be large safety margins.
More tragically, the system was designed in the belief that any fault would be due to random
hardware problems, and should be handled by an equipment swap. Thus, when the software
detected the errant and irrelevant exception, it halted the active controller and switched to the
backup. Of course, the backup immediately encountered the same error exception, and also
shut down. The launch vehicle in essence destroyed itself even though both controllers
worked perfectly.
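The failure pattern described above, in which identical software on redundant hardware fails
identically, can be sketched in a few lines. This is an illustration only and does not
reproduce the actual flight code: the function names, the numeric range limit, and the
clamping strategy are all hypothetical. Clamping stands in for the "likely value" handling
mentioned in the text.

```python
class OperandError(Exception):
    """Raised when an input exceeds what the inherited routine can handle."""


LEGACY_LIMIT = 100.0  # hypothetical range limit of the inherited routine


def legacy_alignment(bias: float) -> float:
    """Unprotected legacy computation: raises on out-of-range input."""
    if abs(bias) > LEGACY_LIMIT:
        raise OperandError(f"alignment bias {bias} out of range")
    return bias


def protected_alignment(bias: float) -> float:
    """Same computation, but the input is clamped instead of raising."""
    return legacy_alignment(max(-LEGACY_LIMIT, min(LEGACY_LIMIT, bias)))


def run_redundant(compute, bias: float, units: int = 2):
    """Run the active controller; on any exception, shut it down and swap
    to the next identical unit, as if the fault were a hardware failure."""
    for unit in range(units):
        try:
            return unit, compute(bias)
        except OperandError:
            continue  # unit halted; try the stand-by
    return None  # every controller has shut itself down


print(run_redundant(legacy_alignment, 50.0))      # in range: active unit succeeds
print(run_redundant(legacy_alignment, 250.0))     # both units fail the same way
print(run_redundant(protected_alignment, 250.0))  # clamped input keeps unit 0 running
```

The stand-by adds nothing in the second case because the fault lies in the shared code and
input, not in either piece of hardware.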
Lessons Learned:
• Hardware redundancy does not necessarily protect against software faults.
• Mission-critical software failures should be included in system reliability and fault
analysis.
• Software specifications should always include specific operational scenarios.
• Software reuse should be thoroughly analyzed to ensure suitability in a new environment,
and all associated documentation, especially assumptions, should be reexamined.
• Extensive testing should be performed at every level, from unit through system test, using
realistic operational and exception scenarios.

For more technical information, call Suellen Eslinger at (310) 336-2906.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 19
Be Sure that the Architecture Isolates Faults
The Problem:
A pair of scientific satellites was launched in late 2000, and in less than three weeks both
stopped receiving commands. Both spacecraft failed due to improperly implemented software,
compounded by a fault-intolerant power-distribution architecture.
The Cause:
The root failure cause involved overheated relays, which should receive pulsed commands
according to the system requirements. Software documents did not pick up this specification,
and a constant voltage was supplied instead.
[Figure: System Schematics (Simplified), showing the 28V S/C bus, receivers Rx A and Rx B,
Tx 1 and Tx 2 antenna switches, and the computer and drivers, with callouts 1 to 6. The bus
power fed into one command receiver (Rx A) via a fuse (1) and into another via a circuit
breaker (2). The receiver power was, via “OR” diodes (i.e., the downstream circuits can draw
current from either receiver, 3), tapped off by two transmitter (Tx)/antenna switches (4) and
two commercial-off-the-shelf status indicator relays (5). The relays were commanded by the
flight computer (6).]
A status indicator relay coil shorted under continuous heating in vacuum and caused the
circuit breaker of Receiver B to trip. Receiver A should have been isolated from this fault,
but was not because it was joined to Receiver B via an “OR” diode. It thus also suffered a
current surge and blew the fuse, preventing the ground station from controlling the satellite.
The architectural oversight escaped design review probably because the status indicator relays
were not thought to be crucial. However, because these relays drew current from both receiv-
ers, a short in either of them would cause a catastrophic failure of the system.
The continuous command fault was not detected during unit test because the test set software
correctly drove the relays with pulsed signals. System test should have caught the error be-
cause the continuously powered coils drew five extra watts, a considerable amount in a low-
power system. Unfortunately, the extra power draw was not noticed.
Lessons Learned:
• Create and use a verification matrix for all levels of test requirements.
• Inspect all test data for trends, oddities, and “out-of-family” values, even when all values
are within expectation. Evaluate all indicators for potential impacts, should trends con-
tinue. Seek to explain all instances of anomalous data.
• Incorporate flight software into test at the earliest opportunity.
• Avoid sneak failure paths by keeping circuit designs straightforward.
• Use isolation resistors or downstream fuses to prevent a grounded component from bring-
ing down the entire system.
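The advice to look for "out-of-family" values even when readings are within limits can be
sketched as a simple screening check. The example below is a minimal illustration under
stated assumptions: the function name, the three-sigma threshold, the pass/fail limits, and
the power readings are all hypothetical. Only the five-watt excess comes from the text.

```python
from statistics import mean, stdev


def screen_reading(history, reading, lower, upper, n_sigma=3.0):
    """Return (within_limits, out_of_family) for a new test reading.

    within_limits: the reading passes the pass/fail limits.
    out_of_family: the reading lies more than n_sigma sample standard
    deviations from the mean of earlier readings.
    """
    within_limits = lower <= reading <= upper
    mu, sigma = mean(history), stdev(history)
    out_of_family = abs(reading - mu) > n_sigma * sigma
    return within_limits, out_of_family


# Hypothetical bus power readings (W) from earlier tests of similar units.
previous_draw_w = [20.1, 19.8, 20.3, 20.0, 19.9]

# A unit drawing about five extra watts still passes a hypothetical 0-30 W limit.
print(screen_reading(previous_draw_w, 25.0, lower=0.0, upper=30.0))  # (True, True)
print(screen_reading(previous_draw_w, 20.2, lower=0.0, upper=30.0))  # (True, False)
```

A reading flagged this way has not failed a test, but it calls for an explanation before
the hardware moves on.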

For more technical information, call Peter Carian at (310) 336-8215.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 20
Thoroughly Analyze and Test Deployables

The Problem:
Troubles associated with deployables have affected numerous satellites. For example:
• A foreign satellite could not open its solar sail, causing attitude-control errors to build up
and the mission to fail (1982).
• A comsat was abandoned after a solar array failed to deploy (1987).
• An interplanetary probe could not unfurl its high-gain antenna (1989).
• Two solar arrays of a comsat jammed, leading to an insurance claim of over $200 million
(1998).
In addition, several potential on-orbit catastrophes have been narrowly averted. Stuck deploy-
ables have been shaken loose by space-walking astronauts or by rocket burns. In 1991, the
antenna on a comsat stuck and disabled the satellite for three months, until repeated on-orbit
maneuvers finally freed it.
The Cause:
Deployables are complex mechanical equipment customized for each mission, and thus lack the
heritage of testing and usage common to electronic devices. With deployables, robust design,
thorough testing, and careful handling are vital.
The design must provide adequate force margins, including thermal and tolerance analyses, to
overcome all resistances. The 1991 anomaly cited above was caused by interference from thermal
blankets. A thermal blanket Velcro pad likewise snagged the magnetometer boom of another
satellite in 1990.
Testing is a major part of the deployment development effort. Special tests and off-loading
fixtures (such as balloons or air bearings) are frequently required to demonstrate
deployability in a zero-gravity environment. Some deployables cannot support their own weight
on Earth, and require special testing accommodations.
[Figure: Satellite deployment scheme with 4-hinge, 3-hinge, and 2-hinge lines and articulation
fingers. Deployable design should not be so complex that it cannot be verified on the ground.
The deployment scheme in the satellite depicted above was too complex to be tested, and The
Aerospace Corporation had to run an in-depth analysis to verify it. Although the deployment
proved successful in space, the contractor learned a lesson and decided to revert to simpler
schemes in the future.]

Lessons Learned:
• Make sure the design can be effectively tested.
• Avoid unconventional designs, especially those involving complex motions.
For more technical information, call Brian Gore at (310) 336-7253.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 21
Prevent Loss of Lubricating Oil and Grease During Storage and Test

The Problem:
Many failures have been caused by mishandling of liquid lubricants (oils and greases), par-
ticularly during prelaunch storage. For example:
1. The reaction wheels on several navigation satellites malfunctioned.
2. Many instruments stopped functioning when their ball-bearing cages ran dry.
3. The focusing system in a space telescope developed high torque and had to be replaced in
space.
4. A gyroscope stopped working during testing.
5. A sensor problem affected eight satellites, and caused an on-orbit failure.
6. A gimbal drive unit developed excessive noise.
The Cause:
Liquid lubricants are susceptible to physical loss and chemical degradation. Physical loss can
occur by evaporation and migration. In the first mishap above, the satellites were stored
longer than originally anticipated, and some oil was lost. Later builds switched to a less
volatile oil, and stored the wheels separately from the satellites, with their spin axes
oriented horizontally to limit migration.
Physical loss can also involve absorption. The second mishap occurred because the hardware
surfaces are porous. Oil was absorbed into them and was no longer available for lubrication.
[Figure: Reaction wheel bearing in two storage orientations, labeled "Oil Retained" and "Oil
Drips Out". The spin axes of gyros and wheels should be oriented during storage in such a way
as to ensure oil retention.]

Oil and grease can also chemically degrade and lose their ability to lubricate. Unprotected lu-
bricants have been known to polymerize (which caused mishap No. 3), oxidize (No. 4), react
with titanium surfaces (No. 5), or dissolve plastics (No. 6).
Lessons Learned:
• Minimize oil evaporation and migration during hardware storage.
• Use enough oil to sustain storage and operation needs. If porous hardware requires
lubrication, it should be thoroughly cleaned, protected from moisture, and stored in oil.
• Test high-speed moving parts in an inert environment to prevent oxidation.
• Perform materials compatibility analysis to avert chemical reactions.
• Check NASA Mechanisms Handbook (NASA/TP-1999-206988) for guidelines on
mechanical assemblies.
For more technical information, call Steve Didziulis at (310) 336-0460.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 22
Be Aware of Challenges in Silver/Zinc Battery Manufacturing and Deployment

The Problem:
Silver-zinc batteries have supplied power to many launch vehicles and upper stages over the
years. These batteries are susceptible to a variety of problems during development and manu-
facturing. In the field, batteries have splashed operators with caustic chemicals, delayed
launches, and caused a serious malfunction in an upper stage.

The Cause:
Launch vehicles rely on primary (non-rechargeable) batteries to power avionics, pyrotechnics,
range safety, and other equipment. Silver/zinc batteries, the most common type, can be stored
“dry” for several years until activated by the addition of the electrolyte. The activated
batteries must be used within weeks or, at most, a few months.

Customized for the launch and space environment and for each particular program, batteries
are hand-built in small lots. They are sensitive to operator changes, material alteration,
contamination, and a host of factors during development.

At launch sites, mishandling of batteries can allow caustic chemicals to escape. If too much
electrolyte is added, batteries can spew or even start fires. The upper stage problem cited
above, for example, occurred because electrolyte escaped from inadequately vented cells,
causing a short to ground.

[Figure: Battery cell cross-section; labels: Terminal, Vent, Case, Ag, Zn, Electrolyte, Film
Separator (prevents shorting). Caption: Batteries consist of numerous cells, each containing
a silver electrode and a zinc electrode. One of the most common battery problems pertains to
the plastic separators that wrap around the silver electrodes. Minor changes in the
constituents of these items have led to incompatibility problems with the electrolytes,
causing excessive shrinkage or chemical reactions.]

Lessons Learned:
• Design, documentation, manufacturing, storage, and field application of batteries require
constant vigilance.
• Materials must be thoroughly screened before being incorporated in batteries.

For more technical information, call Margot Wasz at (310) 336-2141.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 23
Make Sure Requirements Are Developed Correctly

The Problem:
As a planetary probe neared its objective, a potentially crippling flaw was discovered—the
designers had neglected to take the Doppler Effect into account.
The Cause:
After a seven-year journey toward one of Saturn’s moons, the probe will enter the moon’s
atmosphere, collecting data during descent for relay to the Earth via an accompanying
orbiter.

As the probe speeds away from the orbiter, the data signal frequency will drop slightly, due
to the Doppler shift. According to the Inquiry Board Report, this unavoidable frequency drop
was overlooked from initial project requirement determination all the way through design
specification of the orbiter’s receiver. Extensive internal and external reviews failed to
discover this oversight, in part due to a proprietary issue. Later, the design flaw escaped
the system-level test because an incorrect frequency was used.

Two and a half years after launch, a check-out of the probe indicated that the signal
frequency was outside the receiver’s bandwidth. Had the problem been unveiled on the ground,
it could have been fixed with a simple software patch. Unfortunately, the software is not
accessible in flight.

To minimize the Doppler shift, the flight trajectory had to be changed, at considerable
expense in fuel, so that the orbiter will be farther away from the probe as it descends.

[Figure: Percent of life-cycle cost (0 to 100) versus program milestones (Concept, Dem/Val,
Engr & Mfring, Production & Deployment, Operations & Support), with curves "% Cost Committed"
and "% Cost Expended". Caption: Most of the project’s cost and performance are established by
front-end decisions, but mistakes made there are difficult to catch. More resources,
including the most experienced personnel, should be made available to ensure the early
decisions are made properly.]

[Sidebar: Designers should thoroughly review the history of similar projects. If the probe
designers had analyzed the requirements of other deep space projects, both the importance of
the Doppler shift and the correct way to perform end-to-end test would have become obvious.]
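
The size of the frequency drop described above can be estimated with the first-order Doppler
relation, shift = -(carrier frequency) x (separation speed) / (speed of light). The sketch
below is illustrative only: the carrier frequency, separation speed, and receiver
half-bandwidth are hypothetical values, not figures from the Inquiry Board Report, and the
function names are invented for this example.

```python
# Illustrative sketch only: all numeric inputs below are hypothetical assumptions.
SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_shift_hz(carrier_hz: float, separation_speed_m_s: float) -> float:
    """First-order (non-relativistic) Doppler shift seen by the receiver.

    A positive separation speed means transmitter and receiver are moving
    apart, which lowers the received frequency (negative shift).
    """
    return -carrier_hz * separation_speed_m_s / SPEED_OF_LIGHT_M_S


def signal_in_band(carrier_hz: float, separation_speed_m_s: float,
                   half_bandwidth_hz: float) -> bool:
    """True if the shifted carrier still falls inside the receiver bandwidth."""
    shift = doppler_shift_hz(carrier_hz, separation_speed_m_s)
    return abs(shift) <= half_bandwidth_hz


if __name__ == "__main__":
    carrier = 2.0e9   # hypothetical 2 GHz relay link
    speed = 5_000.0   # hypothetical 5 km/s separation speed
    print(f"shift = {doppler_shift_hz(carrier, speed):.0f} Hz")
    print("in band:", signal_in_band(carrier, speed, half_bandwidth_hz=10_000.0))
```

A check of this kind, run against the receiver specification during requirements
development, is one way the shift could be compared with the available bandwidth before
hardware is built.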
Lessons Learned:
• Formalize requirement development process and capture lessons.
• Provide adequate design margins and operational flexibility, such as the ability to use soft-
ware patches.
• Make sure that the hardware or software a contractor wants to reuse from another program
is indeed applicable and has a satisfactory flight history. Do not be deterred by the excuse
that details are not available because the previous program was proprietary or classified—
there are always ways to get around that hurdle.

For more technical information, call Mark Simpson at (310) 336-0159.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 24
Safeguard Hardware Against Inadvertent Overtesting

The Problem:
A satellite suffered considerable damage during vibration test because worn-out equipment
misled the test operator into applying an excessive force.

The Cause:
Prior to vibrating the spacecraft, the operators first subjected it to a low-level
calibration test to compute how much force should be applied to achieve the specified
acceleration.

Unfortunately, the shaker was over 40 years old, and its trunnion bearings had broken. The
slip plate came into contact with the shaker table, resulting in an interference that
attenuated the satellite’s motion.

Unaware of the malfunction, the test engineer thought a much larger force needed to be
applied to achieve the required acceleration. This force overcame the start-up friction, but
overshot the acceleration by tenfold, damaging the spacecraft.

[Figure: Shaker setup; labels: Shaker Body, Trunnion, Slip Plate, Shaker Base, Area of
Friction, Granite Table, Oil Film. Caption: Friction during start-up can greatly exceed that
during operation. This problem, known as stiction, frequently causes trouble. For example,
when a tape drive is adjusted, the tape may not move until enough voltage to overcome the
stiction is applied; but then the force is too large, and the tape suddenly runs wild.]
Lessons Learned:
• Make sure that test facilities are maintained and checked.
• Implement overtest protection (such as over-temperature trip circuits in thermal cham-
bers).
• Take risks of overtesting during vibration tests into account. In particular, large satellites
should typically be acoustically tested instead of vibration-tested to prevent damage.
• Step up vibration tests from one-third to one-half of the full level so that the required force
can be more accurately computed.
• Test procedures, set up, and data should be thoroughly checked to account for operator
mistakes and avoid damage.
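
The stepped approach and overtest protection listed above can be sketched as a simple
response check at each drive level. This is a minimal illustration under stated assumptions:
the function name, the measurement callback, and the 30 percent tolerance are hypothetical
and are not taken from any test standard.

```python
# Illustrative sketch only: the tolerance and the measurement callback are hypothetical.
from typing import Callable, Sequence, Tuple


def run_stepped_vibration_test(
    drive_fractions: Sequence[float],
    measure_accel_g: Callable[[float], float],
    full_level_g: float,
    tolerance: float = 0.3,
) -> Tuple[str, float]:
    """Step the drive level up, checking the measured response at each step.

    Returns ("complete", last_fraction) if every step responds within
    `tolerance` of the linear prediction. Otherwise returns ("abort", fraction)
    at the first step whose response is too high or too low.
    """
    for fraction in drive_fractions:
        expected_g = fraction * full_level_g
        measured_g = measure_accel_g(fraction)
        if abs(measured_g - expected_g) > tolerance * expected_g:
            return ("abort", fraction)
    return ("complete", drive_fractions[-1])


if __name__ == "__main__":
    healthy = lambda fraction: fraction * 10.0        # responds as predicted
    binding = lambda fraction: 0.1 * fraction * 10.0  # motion attenuated by friction
    steps = [1 / 3, 1 / 2, 1.0]
    print(run_stepped_vibration_test(steps, healthy, full_level_g=10.0))
    print(run_stepped_vibration_test(steps, binding, full_level_g=10.0))
```

The design choice worth noting is that an unexpectedly low response stops the test rather
than prompting a larger force, which is the trap described in this lesson.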

For more technical information, call Alan Peterson at (310) 336-0101.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 25
Thoroughly Verify All Software Changes

The Problem:
A launch vehicle failed because part of a command line was left out of a software change.

The Cause:
The launch vehicle had flown successfully several times. This mission, however, had to be
launched at a particular time. Accordingly, the time variable in the software was changed
from Reference Time to Fixed Time.

Multiple updates to the ground software were made, including one that controlled a valve
regulating the ground-supplied nitrogen and, indirectly, an attitude-control engine. This
valve should have been closed shortly before liftoff.

[Sidebar: The failed launch was rehearsed three times, during which the console operators
could have spotted the open valve but missed it. Graphical displays, summarized telemetry
data, and error checking should be provided to allow operators to identify and diagnose
faults.]

[Figure: Two versions of the same valve telemetry. Cluttered telemetry displays can confuse
operators:

Valve 1   on     Valve 7   off
Valve 2   off    Valve 8   on
Valve 3   off    Valve 9   on
Valve 4   on     Valve 10  off
Valve 5   off    Valve 11  off
Valve 6   on     Valve 12  on

Well-designed graphical displays allow operators to quickly identify faults: "Valves 5 and 11
should NOT both be ‘off’".]

[Sidebar: To learn more about human factor engineering, see SMC Publication HM-RB-2001-1,
“Human Computer Interface Display Conventions” on the “Documents” section of the SMC/AX Web
site (http://ax.losangeles.af.mil/chief_engineer/).]

Since the Reference Time no longer applied, an existing command, “If the state is Abort (or
the state is Nominal and Reference Time is T-105 sec), close Valve X.” should have been
updated to: “If the state is Abort (or the state is Nominal and Fixed Time is T-105 sec),
close Valve X.”

Unfortunately, the conditional statement in the parentheses was omitted, and the command
became “If the state is Abort, close valve X.” Hence, the valve stayed open, and the engine
malfunctioned.
The error went undetected because the change notice included several unrelated items, failed
to explain why the control code was changed, and did not compare the was/is algorithms. In
addition, not all logic paths, displays, and output commands were verified.
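
A was/is comparison of the kind the change notice lacked can be sketched by writing both
versions of the command as functions and exercising every logic path. The function names,
state strings, and the sample times around T-105 sec are hypothetical choices made for this
illustration; the real ground software is not described in this lesson.

```python
# Illustrative sketch only: function names and state strings are hypothetical.
from itertools import product

CLOSE_TIME_S = -105  # T-105 sec, from the command quoted above


def close_valve_intended(state: str, fixed_time_s: int) -> bool:
    """Intended logic after the change (the "is" as it should have been)."""
    return state == "Abort" or (state == "Nominal" and fixed_time_s == CLOSE_TIME_S)


def close_valve_as_changed(state: str, fixed_time_s: int) -> bool:
    """Logic as actually changed, with the parenthetical condition omitted."""
    return state == "Abort"


def differing_paths():
    """List every (state, time) input on which the two versions disagree."""
    states = ("Nominal", "Abort")
    times = (-106, -105, -104)
    return [(state, time) for state, time in product(states, times)
            if close_valve_intended(state, time) != close_valve_as_changed(state, time)]


if __name__ == "__main__":
    print(differing_paths())
```

The only disagreement is the nominal countdown at T-105 sec, which is exactly the path that
left the valve open. Abort-only testing would never have exposed it.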
Lessons Learned:
• A small software error can have catastrophic mission impacts.
• Software change processes require the same degree of rigor as the original development.
Each change and associated rationale must be individually approved.
• Retest and regression testing should be formal and thorough. All logic paths affected by
changes must be verified, and all results must be checked.
• Operational status, particularly off-nominal indicators, must be displayed effectively.
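
The error checking recommended above can be as simple as evaluating stated rules against the
telemetry and reporting only the violations. The sketch below uses the valve states and the
single rule shown in this lesson's display figure; the function name and data layout are
hypothetical.

```python
# Illustrative sketch only: the function name and data layout are hypothetical.
from typing import Dict, List


def check_valve_rules(valve_states: Dict[int, str]) -> List[str]:
    """Return a list of off-nominal findings for the operator display."""
    findings = []
    # Rule taken from the display figure: valves 5 and 11 must not both be off.
    if valve_states.get(5) == "off" and valve_states.get(11) == "off":
        findings.append("Valves 5 and 11 are both off")
    return findings


if __name__ == "__main__":
    telemetry = {1: "on", 2: "off", 3: "off", 4: "on", 5: "off", 6: "on",
                 7: "off", 8: "on", 9: "on", 10: "off", 11: "off", 12: "on"}
    for finding in check_valve_rules(telemetry):
        print("FAULT:", finding)
```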

For more technical information, call Suellen Eslinger at (310) 336-2906.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 26
Make Sure Hardware Analyzed Is Hardware Actually Built

The Problem:
A technology-demonstrator mission was terminated after only eight months because an over-
sight in thermal analysis was unrecognized by two projects.

The Cause:
The solar array was originally designed for a “faster, better, cheaper” mission.
Unfortunately, the thermal model did not account for the presence of harnesses and harness
covers which, by preventing heat from radiating away, raised the temperature in the cells
near the harness by as much as 40-deg C, causing stress in the solder joints of the cell
interconnections. The joints cracked open, and the circuits failed.

The original mission never flew. However, the panel design was carried into this program
without being revalidated, most likely because of resource constraints.

In retrospect, if a thermal analyst had actually looked at the hardware and seen the
conspicuous harnesses at panel fold locations, the problem would have been caught right
away.

[Figure: Solar Panel Configuration: Modeled versus actual. Layer labels: Cells, Solder,
Cover, Core, Facesheet, Dielectric, Circuitry. Originally Modeled: baseline temperature. As
Built, areas with extra harness cover: 20-deg C above expectation. As Built, areas with extra
harness and harness cover: 40-deg C above expectation. Caption: Cells near the harness became
hotter and degraded first.]
Lessons Learned:
• Designers should be called back to inspect the products, to see if there are major differ-
ences between analysis and implementation.
• Modeling mistakes are not easily caught. Analysis does not negate testing.
• Do not cut corners on modeling or testing.
• Programs should insist that the analysts document their methodology and assumptions,
and compare them against the actual hardware so that errors may be found.
• Do not rely on heritage designs until their flight experiences are thoroughly understood.

For more technical information, call David Gilmore at (310) 336-1897.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 27
Control Propellant Balance

The Problem:
Dynamic instability caused by fluid imbalance has afflicted several satellites during orbit
transfer maneuvers. Examples include:
• A commercial communication satellite was stranded in a low orbit, and had to expend sig-
nificant fuel in hundreds of thruster firings to reach a geosynchronous orbit.
• A foreign satellite failed to reach geostationary orbit.
• A military communication satellite wobbled unexpectedly (but was able to recover).
The Cause:
Propulsion control is a delicate task because many parameters, such as the flow rate of
propellant in space, cannot be precisely modeled or controlled. Several factors can trigger
fluid imbalance:
• Improper fuel-load procedures. (This problem caused the first incident cited above.)
• Differences in flow rates or valve responses can cause propellant to be drawn
preferentially from one tank over another. (This problem probably caused the second mishap.)
• If one tank is cooler than the other, propellant will flow into the cooler tank from the
warmer tank, causing imbalance.

[Figure: Three panels numbered 1 to 3. Caption: As satellites spin during transfer maneuvers,
mass imbalances coupled with centrifugal forces can cause tilting. Severe tilt can divert the
transfer thrust and prevent satellites from reaching their proper orbit.]

Lessons Learned:
• Make sure tank loads are balanced.
• Use a single tank, if feasible, to avoid propellant migration.
• Ensure that attitude-control algorithms and mechanisms can correct dynamic instability
caused by propellant imbalance.
• If possible, place a gas pressure regulator above the tanks, or latching isolation valves
below each tank, to control propellant flow.

[Figure: Two-tank propulsion schematic; labels: Gas, Fuel, Thruster. Caption: Feedback loops
can be designed to control gas pressure or fuel flow between the tanks to restore balance.
The latter method is more precise.]
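
The effect of a flow-control feedback loop on tank balance can be illustrated with a toy
simulation. This is a sketch under stated assumptions, not a propulsion model: the tank
masses, flow rate, draw bias, and feedback gain are hypothetical, and a simple proportional
correction stands in for whatever control law a real system would use.

```python
# Illustrative sketch only: masses, flow rate, bias, and gain are hypothetical.
from typing import Tuple


def simulate_tank_draw(mass_1_kg: float, mass_2_kg: float, total_flow_kg_s: float,
                       draw_bias: float, feedback_gain_per_kg: float,
                       duration_s: int) -> Tuple[float, float]:
    """Draw propellant from two tanks in 1-second steps.

    draw_bias is the extra fraction of flow taken from tank 1 (for example,
    from a valve mismatch). The feedback term shifts flow toward whichever
    tank currently holds more propellant.
    """
    for _ in range(duration_s):
        imbalance_kg = mass_1_kg - mass_2_kg
        fraction_from_1 = 0.5 + draw_bias + feedback_gain_per_kg * imbalance_kg
        fraction_from_1 = min(1.0, max(0.0, fraction_from_1))
        mass_1_kg -= fraction_from_1 * total_flow_kg_s
        mass_2_kg -= (1.0 - fraction_from_1) * total_flow_kg_s
    return mass_1_kg, mass_2_kg


if __name__ == "__main__":
    open_loop = simulate_tank_draw(100.0, 100.0, 0.1, 0.05, 0.0, 600)
    closed_loop = simulate_tank_draw(100.0, 100.0, 0.1, 0.05, 0.1, 600)
    print(f"imbalance without feedback: {open_loop[0] - open_loop[1]:.2f} kg")
    print(f"imbalance with feedback:    {closed_loop[0] - closed_loop[1]:.2f} kg")
```

With these assumed numbers the uncorrected imbalance grows steadily over the burn, while the
proportional loop holds it to a small steady offset.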

For more technical information, call Mark Mueller at (310) 336-5081.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 28
Graphite/Epoxy Structures Are Easily Damaged by Processing Changes and Handling
Mishaps

The Problem:
Two failures involving graphite/epoxy pressure vessels occurred recently:
• A launch vehicle crashed when one of its solid boosters ruptured.
• Two solid-rocket segments failed during hydroproof testing.
The Cause:
Graphite/epoxy composites are used for trusses, pressure vessels (such as nickel-hydrogen
batteries and motor cases), and many other applications. Composite technology is relatively
new. Minor variations in fiber, resin, and processing can dramatically affect product
performance. Quality assurance is vital, yet difficult to achieve.

Graphite/epoxy pressure vessels, especially those incorporating high-strength fibers, are
easily damaged. The launch failure was attributed to a handling mishap such as uneven lifting
or an inadvertent impact. Unfortunately, damage is not readily detected: existing
nondestructive testing procedures, based on ultrasonic scanning, are cumbersome and not
100-percent effective.

[Figure: Impact on a composite laminate; labels: Impact, Broken Fibers, Delamination.
Caption: In addition to graphite/epoxy, Kevlar/epoxy structures are also easily damaged. In
both cases, external impact usually leads to damage on the inside and can be difficult to
detect.]
The rocket segment failures took place after the contractor altered materials to meet environ-
mental regulation requirements and made several innocuous changes in the manufacturing
processes. Although limited laboratory tests were satisfactory, the fibers wrinkled during
winding, greatly reducing the composite’s burst strength.

Lessons Learned:
• Protect graphite/epoxy pressure vessels from handling damage.
• Insist on safety margins and quality inspections for composite structures.
• Perform extensive requalification and acceptance tests to guard against subtle processing
changes.
For more technical information, call S. R. Lin at (310) 336-7697.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 29
Validate Changes in Command Script Configuration

The Problem:
Contact with a deep space observatory was lost (control was regained three months later
following a dramatic rescue; see Lesson 30).

The Cause:
The spacecraft used three gyros:
• Gyro A, to control the safe mode;
• Gyro B, to detect faults; and
• Gyro C, for normal attitude control.

The flight software should turn on the normally off Gyro A when the satellite entered safe
mode. Unfortunately, the engineer making a command procedure change did not know to implement
the enable command. A loose change-control process failed to catch the error.

[Figure: The Lagrange Points; labels: Sun, L1, L2. Caption: There are five Lagrange Points
where gravitational attractions from the Sun and Earth balance each other. The loss of
control occurred at the first Lagrange Point (L1, about 1.5 million kilometers from Earth),
from which location the space observatory monitors solar activities. The L2 point, on the
night side, is suitable for infrared astronomy.]
During a routine operation, Gyro B was accidentally set incorrectly, causing a false reading.
The on-board computer detected B’s error and put the satellite in safe mode. The fault on B
was fixed, but control shifted from C to A.
Sensed rates from Gyro A (despun, reading zero) and B (active with variable readings) soon
diverged, prompting the thruster to fire to try to null the nonexistent roll error. The effort was
futile, and the satellite entered safe mode again two hours later.
The spacecraft was designed to survive in safe mode for at least 48 hours. Nonetheless, the
operators did not pause to analyze why one anomaly followed on the heels of another. Side-
stepping the required telemetry data check that would have indicated that Gyro A was in fact
off, the operators mistook Gyro B’s variable readings as a sign of a fault, and turned it off.
With no functional gyro, control was soon lost.
Lessons Learned:
• Treat command-procedure changes with the same rigor as flight-critical software. This
includes formal configuration management, peer review with knowledgeable technical
personnel, and full command verification with an up-to-date simulator.
• Ensure change implementation timelines are consistent with staff workloads.
• Display spacecraft health and safety information clearly.
• Follow validated operations procedures, including review of all pertinent data.
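
Part of the command verification called for above can be automated by checking each revised
script against the commands it is required to contain, in the required order. The sketch
below is a minimal illustration; the command names are invented for this example and do not
come from the actual spacecraft command set.

```python
# Illustrative sketch only: command names are hypothetical.
from typing import List, Sequence


def validate_script(script: Sequence[str], required_in_order: Sequence[str]) -> List[str]:
    """Report required commands that are missing or appear out of order."""
    problems = []
    latest_position = -1
    for command in required_in_order:
        if command not in script:
            problems.append(f"missing: {command}")
            continue
        position = list(script).index(command)
        if position < latest_position:
            problems.append(f"out of order: {command}")
        latest_position = max(latest_position, position)
    return problems


if __name__ == "__main__":
    required = ["ENABLE_GYRO_A", "SELECT_GYRO_A_FOR_CONTROL"]
    revised_script = ["ENTER_SAFE_MODE", "SELECT_GYRO_A_FOR_CONTROL"]
    print(validate_script(revised_script, required))
```

A check like this does not replace peer review or simulator runs, but it would flag a
dropped enable command before the script reaches the spacecraft.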
For more technical information, call Suellen Eslinger at (310) 336-2906.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 30
Maximize On-board Reprogrammability To Enable Fault Recovery

The Event:
An observatory lost in deep space (Lesson 29) was brought back to life following three
months of clever troubleshooting.
The Cause:
The salvage team faced daunting challenges. Following the loss of attitude control, the
satellite’s heaters had shut down, its batteries were drained, and its fuel had frozen.
Insufficient bus power made it impossible to sustain a downlink long enough for the ground
station to lock on, and rescuers were not even sure exactly which communication frequency
would work.

The team hit upon the idea of borrowing the world’s largest radar to transmit to the
spacecraft, and using another big dish to receive return signals. They set up a special
wideband analyzer over the Internet so that the downlink signal could be analyzed instantly.

The shot in the dark paid off: a faint heartbeat was received from the lost satellite. Only
the carrier signal came, however, because the on-board receivers could not lock onto the
uplink signal.

[Figure: Power-Efficient Thawing of the Hydrazine Tank. Plot of power (W, 0 to 800) versus
time (seconds, 0 to 25), with curves "Power from Solar Array" and "Power Used to Thaw Tanks".
Caption: The fuel tank had to be warmed up before pipes and thrusters were, lest overpressure
burst the lines. Software changes allowed the battery to discharge current like a thermistor
and turn on selective heaters whenever power became available. Because the flight computer
was off during battery charging, the software patch had to be reloaded each time. After
fine-tuning, controllers managed to thaw the tanks with 48 heaters, using a peak power of
over 500 watts!]
Ingenious commands, together with efficient power management, eventually brought the bus
voltage up to 28 V, permitting controllers to monitor spacecraft status and thaw the
propulsion system. An intricate attitude recovery maneuver was devised to allow the satellite
to reacquire the Sun, and normal operations resumed. Remarkably, despite having been
alternately exposed to extremes of -120°C and 100°C, all instruments survived!
Lessons Learned:
• Design into the satellite the flexibility to handle unforeseen emergencies, and provide
emergency reset capability for major components.
• Add emergency protection of a satellite battery system, such as low-battery-voltage cut-
out of nonessential loads.
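
A low-battery-voltage cutout of the kind recommended above is commonly built with two
thresholds, so that loads are not switched back on the moment the voltage recovers slightly.
The sketch below is a minimal illustration; the class name and the 24 V and 27 V thresholds
are hypothetical and are not taken from the observatory's design.

```python
# Illustrative sketch only: the class name and voltage thresholds are hypothetical.
class LowVoltageCutout:
    """Shed nonessential loads below cutout_v; restore them at or above restore_v."""

    def __init__(self, cutout_v: float, restore_v: float) -> None:
        if restore_v <= cutout_v:
            raise ValueError("restore_v must be higher than cutout_v")
        self.cutout_v = cutout_v
        self.restore_v = restore_v
        self.nonessential_enabled = True

    def update(self, bus_voltage_v: float) -> bool:
        """Process one voltage sample and return whether nonessential loads are on."""
        if self.nonessential_enabled and bus_voltage_v < self.cutout_v:
            self.nonessential_enabled = False
        elif not self.nonessential_enabled and bus_voltage_v >= self.restore_v:
            self.nonessential_enabled = True
        return self.nonessential_enabled


if __name__ == "__main__":
    cutout = LowVoltageCutout(cutout_v=24.0, restore_v=27.0)
    for volts in (28.0, 25.0, 23.5, 25.0, 26.9, 27.2):
        print(volts, cutout.update(volts))
```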
For more technical information, call Julie White at (310) 416-7229.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 31
Oxidation Can Cause Erratic Open Circuits In Solid State Devices

The Problem:
Several photodetector chips developed intermittent open (high resistance) circuits during
integration.

The Cause:
This anomaly baffled experts because the chips, when returned to the foundry, often passed
diagnostic tests. Also, investigators could find no mechanical defects (such as fractures)
that might account for the open circuits.

[Figure: Current versus voltage curves labeled "Good Chip", "Oxidized Chip", and
"Self-Repaired". Caption: An applied voltage can sometimes heal the chips temporarily by
pushing the oxide layer aside.]

An in-depth study revealed that the anomaly resulted from oxidation of the titanium diffusion
barrier under the gold signal line. Titanium oxide can “switch” (jumping between conducting
and insulating states), causing the circuits to open erratically.

The subtle flaw was caused by manufacturing imperfections that exposed the titanium layer to
oxidation. The defect was not caught by the chip maker because the oxidation developed very
slowly.

[Figure: Detector Geometry, top view and side view of the photodetector chip; labels:
Aperture, Pixel, Via, Gold Trace, Titanium Layer, Passivation, Gold Via. Caption: Incomplete
coverage of the gold via by the trace exposed the titanium layer to inadvertent oxidation.]

Other metals are susceptible to this problem. Oxidation of lead created excessive noise in a
lead sulfide detector. Oxidation of nickel made some devices oversensitive to applied voltage
or even shock. Nonlinear voltage-current behaviors were the cause in each case.

Lessons Learned:
• Protect sensitive metal layers from oxidation (caused by over-etching, for example) during
semiconductor fabrication.
• Use current-voltage profiles as a diagnostic tool—nonlinear high resistance usually indi-
cates oxidation.
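
One simple way to apply a current-voltage profile as a diagnostic is to compute the apparent
resistance at each point of a sweep and see how much it varies. The sketch below is
illustrative only: the function names, the sample sweeps, and the 1.2 spread threshold are
hypothetical, and a real screening limit would come from characterization data.

```python
# Illustrative sketch only: sample data and the spread threshold are hypothetical.
from typing import Sequence


def resistance_spread(voltages_v: Sequence[float], currents_a: Sequence[float]) -> float:
    """Ratio of largest to smallest apparent resistance over a positive sweep.

    An ohmic contact gives a ratio near 1. Points with zero current are skipped.
    """
    resistances = [v / i for v, i in zip(voltages_v, currents_a) if i != 0]
    return max(resistances) / min(resistances)


def looks_nonlinear(voltages_v: Sequence[float], currents_a: Sequence[float],
                    max_spread: float = 1.2) -> bool:
    """Flag a sweep whose apparent resistance varies by more than max_spread."""
    return resistance_spread(voltages_v, currents_a) > max_spread


if __name__ == "__main__":
    sweep_v = [0.1, 0.2, 0.4]
    good_chip_a = [0.001, 0.002, 0.004]   # about 100 ohms at every point
    suspect_chip_a = [1e-6, 1e-5, 4e-3]   # high resistance until the voltage rises
    print("good chip nonlinear:", looks_nonlinear(sweep_v, good_chip_a))
    print("suspect chip nonlinear:", looks_nonlinear(sweep_v, suspect_chip_a))
```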

For more technical information, call Alfred Fote at (310) 336-6926.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 32
One Operation, One Verification

The Problem:
A prototype reusable rocket crashed because a technician forgot to reconnect a helium line.
The Cause:
The goal of the project was to demonstrate rapid turnarounds between vertical takeoffs and
landings. A streamlined management approach kept paperwork to a minimum. A working vehicle
was built in 18 months; a modified version had already flown three times before the incident.

The flyer was supported with four legs that were actuated by an on-board helium supply.
During preflight preparation, each leg was deployed once so the control center could verify
its deployment monitors. The helium line was then disconnected to vent the actuator, the legs
stowed, and the helium line reconnected. Four technicians repeated this procedure on each
leg. Unfortunately, a technician forgot to reattach one helium line. The error was not
detected because there was no procedure to check the integrity of the system after
disconnection and reconnection. At landing, the leg failed to deploy, whereupon the vehicle
toppled and exploded.

[Figure: Regulator detail; labels: Stem, Spring, Plug, Set Screw. Sidebar: A Similar
Incident: Failure Caused by a Loose Screw. The precision regulator in a booster engine
control system used a stem screw to modulate gas inlet. A set screw forced a nylon plug
against the stem screw threads and prevented the stem from rotating. The regulator was
reworked to repair leakage during build. The rework instruction did not explicitly require
set screw retorquing and verification. The loose set screw caused the stem screw to unseat.
The launch failed.]
The investigators found that procedures were neither well developed nor rigorously applied.
Operators and technicians used the procedures as guidelines instead of checklists. In fact, a
failure to reconnect had happened once before. Although caught, the incident was not
documented.
Lessons Learned:
• Implement a discrete verification step for each critical task.
• Avoid multiple tasks within a procedure (see Lesson 12).
• Ensure a fail-safe process by applying software technology, self-checking indicators, or
positive feedback mechanisms to complex operations vulnerable to human errors.
• Document each near miss and correct its root cause.
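
The "one operation, one verification" rule can be enforced in software by pairing every
operation with its own check and stopping at the first check that fails. The sketch below is
a minimal illustration; the step names and the dictionary standing in for the helium line
state are hypothetical.

```python
# Illustrative sketch only: step names and the state dictionary are hypothetical.
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_procedure(steps: List[Step]) -> List[str]:
    """Run each operation, then its paired verification; stop on the first failure."""
    log = []
    for name, operate, verify in steps:
        operate()
        if not verify():
            log.append(f"FAILED verification: {name}")
            break
        log.append(f"verified: {name}")
    return log


if __name__ == "__main__":
    helium_line_connected = {"leg_1": False, "leg_2": False}

    def reconnect_leg_1() -> None:
        helium_line_connected["leg_1"] = True

    def forget_leg_2() -> None:
        pass  # the technician is interrupted and the line is never reattached

    steps = [
        ("reconnect helium line, leg 1", reconnect_leg_1,
         lambda: helium_line_connected["leg_1"]),
        ("reconnect helium line, leg 2", forget_leg_2,
         lambda: helium_line_connected["leg_2"]),
    ]
    for entry in run_procedure(steps):
        print(entry)
```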

For more technical information, call Ron Williamson at (310) 336-2149.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 33
Check Satellite-Launcher Compatibility As Early As Possible

The Problem:
A technology demonstrator satellite had to be substantially redesigned because the vehicle’s
stability during the orbit-transfer maneuver was not considered early on.
The Cause:
When a satellite spins, its components vibrate at a “nutation frequency” determined by the
moments of inertia and by the spin rate. Flexible parts, such as whip antennas and fluids,
will dissipate the rotational energy, particularly if these parts resonate near the nutation
frequency. Energy dissipation may lead to increased coning angles, even a flat spin.

[Sidebar: The first American satellite, Explorer 1, went into a flat spin because its
flexible antennas triggered nutational growth.]

Nutational growth caused several early satellites to malfunction. Although well understood in
general today, it remains a challenge whenever spinning upper stages are used. Because fuel
motion and burning complicate the analysis, the satellite should be designed with extra
margins to prevent the stack from entering a flat spin during orbit transfer.

The upper stage selected by this program spins. Unfortunately, the contractor failed to pay
attention to the issue during preliminary design, despite advice from experts. The
instability could have been mitigated by simply modifying the satellite propellant tanks.
However, because the problem was recognized late, numerous costly modifications became
necessary. The project was almost cancelled.

[Figure: Yaw rate (deg/sec, -40 to 40) versus time from ignition (sec, 0 to 120). Caption: As
shown here, solid upper stages, which this mission used, are more prone to instability. The
satellite contractor did not recognize this risk in part because the launch vehicle
contractor failed to formally communicate this requirement. The design changes kept the
instability in check during flight, and the satellite reached the correct orbit.]
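
How the nutation frequency depends on the moments of inertia and spin rate can be shown with
the textbook result for an idealized case. The sketch below assumes an axisymmetric rigid
body and small nutation angles, which is a simplification that ignores the fuel motion and
burning mentioned above; the function names and inertia values are hypothetical.

```python
# Illustrative sketch only: assumes an axisymmetric rigid body and small nutation angles.
import math


def nutation_rate_rad_s(spin_inertia: float, transverse_inertia: float,
                        spin_rate_rad_s: float) -> float:
    """Inertial nutation rate for small coning angles: (I_spin / I_transverse) * spin."""
    return (spin_inertia / transverse_inertia) * spin_rate_rad_s


def stable_with_energy_dissipation(spin_inertia: float, transverse_inertia: float) -> bool:
    """Spin is passively stable only about the axis of maximum inertia.

    With energy dissipation, a body spinning about its minimum-inertia axis
    (a long, slender stack) drifts toward a flat spin.
    """
    return spin_inertia > transverse_inertia


if __name__ == "__main__":
    spin_rate = 60.0 * 2.0 * math.pi / 60.0  # hypothetical 60 rpm, in rad/s
    # Hypothetical slender stack: spin-axis inertia well below transverse inertia.
    print(f"nutation rate: {nutation_rate_rad_s(100.0, 400.0, spin_rate):.3f} rad/s")
    print("passively stable:", stable_with_energy_dissipation(100.0, 400.0))
```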
Lesson Learned:
• Ensure interface problems between the satellite and launcher, such as dynamic instability,
are analyzed early on in the design process (see Lessons 2, 11).
For more technical information, call David Stampleman at (310) 336-2243.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 34
Safeguard Hardware Against Inadvertent Overtesting (II)

The Problem:
A satellite launch had to be postponed by several months because an antenna panel
delaminated.

The Cause:
The antenna assembly, based on a honeycomb sandwich structure, was undergoing a thermal
vacuum test. An operator set the heater voltage too high, causing the panel to be subjected
to 100-deg C instead of the planned 61-deg C.

Similar overheating problems had occurred before at this facility, and an automated
temperature limiter or alarm on the test equipment would have averted the mishap. However,
motivation to invest in facilities or training was low because the program was coming to an
end.

Overheating prompted pressure to build up within the sandwich cells. Unfortunately, four of
five venting holes in the facesheet were inadvertently blocked by conformal coating because
the operators were not provided with clear assembly instructions. The trapped pressure caused
the panel to rupture.

[Figure: Antenna Construction; labels: Antenna Element, Heater Tape, Vent Blockage, Conformal
Coating, Honeycomb Core, Aluminum Facesheet.]

[Figure: Delamination Area (1’ x 2’).]

Lessons Learned:
• Implement overtest protection (see Lesson 24).
• Correct the root cause of operational mistakes.
• Incorporate visual guides or overlays as part of process control procedures.
• Honeycomb sandwich structures for space structures should be vented. Otherwise, when
heated, trapped air and moisture can expand, creating pressure and causing delamination
(Lesson 1).
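
An automated temperature limiter of the kind that would have averted this mishap can be
sketched as a latching trip that overrides the operator's heater setting. The class name and
the 66-deg C trip point are hypothetical; only the planned 61-deg C and actual 100-deg C
figures come from this lesson.

```python
# Illustrative sketch only: the class name and trip temperature are hypothetical.
class OverTemperatureTrip:
    """Latching limiter: once the trip temperature is reached, heating stays
    disabled until an operator explicitly resets the trip."""

    def __init__(self, trip_temp_c: float) -> None:
        self.trip_temp_c = trip_temp_c
        self.tripped = False

    def heater_allowed(self, panel_temp_c: float) -> bool:
        """Process one temperature sample and return whether heating may continue."""
        if panel_temp_c >= self.trip_temp_c:
            self.tripped = True
        return not self.tripped

    def reset(self) -> None:
        """Clear the latch after the cause of the trip has been investigated."""
        self.tripped = False


if __name__ == "__main__":
    trip = OverTemperatureTrip(trip_temp_c=66.0)  # assumed margin above the planned 61-deg C
    for temp_c in (25.0, 61.0, 67.0, 60.0):
        print(temp_c, trip.heater_allowed(temp_c))
```

The latch is deliberate: a trip that clears itself as soon as the panel cools would let a
wrong heater voltage drive the panel back over the limit.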
For more technical information, call Susan Ruth at (310) 336-6765.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 35
Implement Independent Fault Protection

The Problem:
A deep-space mission ended prematurely after excessive thruster firing depleted its fuel.
The Cause:
This spacecraft was developed by a highly motivated group operating under a rigid cost cap
and tight schedule. Flying just 22 months after being funded, it successfully circled the
moon and demonstrated many technologies.

Soon afterward, however, a maneuver triggered a numeric overflow in the processor, causing it
to erroneously fire its thrusters and freeze. A “watchdog timer” algorithm should have
stopped the thrusters from continuously firing, but did not execute because the computer had
already crashed. By the time ground operators regained control, all the fuel was gone.

A hard-wired timer, which would have stopped thruster firing, was not implemented due to the
tight schedule. Time pressure also prevented the software from being fully tested, and many
changes had to be uploaded as faults were discovered.

[Figure: Spacecraft data system block diagram; labels: Command Module, Telemetry Module,
ACS/RCS Module, Memory, Data Bus, Sensor A, Sensor B, Solid State Recorder, Data Handler,
Sensor Processor (31,000 Lines), Housekeeping Processor (34,000 Lines). Caption: A Rushed
Job. Over 65,000 lines of flight code (only 20% inherited) were developed in 17 increments
within one year, leaving little time for thorough testing.]
The overflow error had occurred thousands of times (without causing malfunctions) because the
project had to settle for an inadequate but available processor. Software changes had been
written to correct the problem, but the overstretched staff could not handle operations,
anomaly analysis, and software repair at the same time, and the changes were not loaded.

Four years later, another interplanetary probe encountered a similar anomaly. Fortunately,
its engineers had learned the lessons from the previous incident; the precautions they took
allowed them to successfully complete the mission (see Lesson 36).
Lessons Learned:
• Apply independent fault protection for critical software functions.
• Implement exception handling to protect the flight processor from aborts due to data
handling errors (see Lesson 18).
• Do not cut corners in testing critical flight software.
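The exception-handling lesson above can be illustrated with a minimal Python sketch. It is not
the flight code of any mission: the 16-bit width, the gain, the names to_int16_checked and
thruster_command, and the choice of zero thrust as the safe default are all assumptions made for
illustration. The point is only that an out-of-range result is caught and reported instead of
aborting the processor.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed word size, for illustration only


class OverflowFault(Exception):
    """Raised when a value does not fit the target integer width."""


def to_int16_checked(value):
    """Convert to a 16-bit signed integer, refusing out-of-range values."""
    result = int(round(value))  # raises on NaN or infinity
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowFault(f"{value!r} does not fit in 16 bits")
    return result


def thruster_command(rate_error, gain=100.0):
    """Return (command, fault_flag); a hypothetical control-law fragment."""
    try:
        return to_int16_checked(rate_error * gain), False
    except (OverflowFault, OverflowError, ValueError):
        # Assumed safe default: command no thrust and flag the fault
        # so that the handler, not a processor abort, decides what happens.
        return 0, True
```

A handler of this kind complements, but does not replace, fault protection that is independent
of the processor, because it cannot run once the computer has crashed.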
For more technical information, call Suellen Eslinger at (310) 336-2906.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 36
Implement Independent Fault Protection (II)

The Event:
An interplanetary probe recovered from a major anomaly.

The Cause:
The spacecraft, designed to rendezvous with an asteroid, employed extensive autonomy because
ground intervention during an emergency would take too long. The designers studied the history
of an earlier project, which terminated prematurely after a data error depleted on-board fuel
(see Lesson 35).
Three years into the flight, an engine burn aborted. A missing command in the burn-abort
contingency command script prevented a graceful transition into the safe mode, and a series of
anomalies ensued. Communication was lost for 27 hours before the flight computer regained
control.
The initial script error was not caught during software tests. Hardware-in-the-loop simulation
could not test abort scenarios because the brassboards were difficult to use. Exactly how the
anomalies propagated is unclear because a bus undervoltage wiped out data from the recorder, and
the anomalous behaviors could not be reproduced on the ground.
[Figure: the processor sends a programmed output to a hardware watchdog. The watchdog resets and
continues upon receiving correct signals, and resets the CPU if no pulse arrives (indicating a
processor crash).]
Watchdog Scheme (Simplified): The processor feeds a series of programmed pulses into the
hardware timer, which will reset itself and await the next input. If the expected “heartbeat”
does not arrive, the watchdog knows that the processor has probably crashed and intervenes (such
as by initiating a fault protection routine).
During the emergency, the spacecraft fired its thrusters thousands of times. Fortunately, the
fuel loss was tolerable because the thrusters were hardwired to fire only for fractions of a
second. The mission was saved because the designers took precautions against fuel depletion
during a software crash, a lesson learned from the previous failure.
Lessons Learned:
• Create extensive, realistic nominal and anomalous operational scenarios for testing at
every level, from unit through system test.
• Implement robust simulators, including hardware-in-the-loop, for testing critical flight
software functions.
• Apply independent fault protection, such as hardware watchdogs, to mitigate risk in real-
time systems, where errors can be so deeply buried as to be practically undetectable.
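The heartbeat watchdog and the hardwired pulse limit described in this lesson can be sketched as
a small Python simulation. This is a toy model under stated assumptions, not the probe's design:
the class and function names, the three-tick timeout, and the 200 ms pulse cap are hypothetical
values chosen for illustration.

```python
class HardwareWatchdog:
    """Counts clock ticks since the last heartbeat; intervenes on timeout."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.ticks_since_pulse = 0

    def pulse(self):
        """Heartbeat from the processor: restart the countdown."""
        self.ticks_since_pulse = 0

    def tick(self):
        """Advance one tick; return True if the watchdog intervened."""
        self.ticks_since_pulse += 1
        if self.ticks_since_pulse >= self.timeout_ticks:
            self.ticks_since_pulse = 0  # watchdog resets itself after acting
            return True
        return False


def simulate(heartbeats, timeout_ticks=3):
    """heartbeats: one boolean per tick (True means a pulse arrived)."""
    watchdog = HardwareWatchdog(timeout_ticks)
    interventions = []
    for tick_number, pulse_arrived in enumerate(heartbeats):
        if pulse_arrived:
            watchdog.pulse()
        if watchdog.tick():
            interventions.append(tick_number)
    return interventions


def limit_pulse(commanded_ms, max_pulse_ms=200):
    """Model of a hardwired cap on a single thruster firing (assumed 200 ms)."""
    return max(0, min(commanded_ms, max_pulse_ms))
```

In this model a healthy processor never trips the watchdog, while a processor that stops pulsing
is caught after the timeout and again on each later timeout until pulses resume. The pulse cap
works even if no software is running, which is what makes it independent.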
For more technical information, call Richard Adams at (310) 336-2907.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 37
Aim for Realistic Schedules in Development Projects

The Problem:
A sophisticated instrument was delivered five years behind schedule.

The Cause:
Combining three previously separate sensors and aiming for greater sensitivity, this
instrument densely packed together diverse technologies. The developer contracted for
delivery in three years, even though two heritage systems each took eight years to build.
Soon after program start, the spacecraft prime contractor issued unexpectedly stringent
interface requirements. The preliminary instrument design had to be substantially altered to
meet new weight, volume, and vibration constraints.
More features (such as stiffer structures) had to be added, but design flexibility was
limited due to volume constraints. In compensation, cutting-edge electronics had to be
deployed, but the vendors could not deliver them on schedule due to manufacturing
difficulties.
The contractor adopted first-pass-success schedules: the design went into manufacturing
directly, skipping prototyping. Problems surfaced late (such as during thermal vacuum
testing), and were discovered sequentially. Despite the contractor’s heroic effort, it took
eight years before the product was delivered.
[Figure: Delivery Date Slip (Year), from 0 to 5, plotted against program milestones (Award, CDR,
Thermal Vac, Rework/Retest, Delivery) over about 10 years. Events shown: Faulty Mixers,
Oscillator Failure, RF Parts Delayed, Slip Ring Noise, Digital Engineer’s Death, Deployment
Redesign, EMI/Intermittent Problems, Scan Drive Problems, Chip Underperformance,
Pyroshock/Vibration Failure.]
Slim margins, unproven technology, tight schedules, and fixed cost conspired to incrementally
push the delivery date. Items marked with arrows each impacted the schedule by between 9 and 18
months.
Lessons Learned:
• Provide a detailed interface specification as early as possible.
• Foster a cooperative working arrangement among contractors and proactively maintain
realistic power, weight, and volume reserves.
• Create engineering models so that problems can be discovered early.
For more technical information, call Alfred Fote at (310) 336-6926.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 38
Do Not Ignore Unexplained Test Anomalies

The Problem:
A power regulator had to be pulled from a spacecraft.
The Cause:
A new block of satellites, extensively upgraded in its power system, exhibited several unusual
anomalies during system testing. Although the contractor managed to work around the anomalies,
the program office was uncomfortable with the design robustness and requested an independent
analysis.
A preliminary independent simulation did not find anything odd. The analyst continued to refine
the model without spotting a “smoking gun” that would account for the problems, and most
people became skeptical as to whether a problem in fact existed.
The analyst’s persistence paid off, however, when he found a circuit instability that induced
the glitches. Moreover, the design flaw would have caused the solar array voltage to oscillate
when the satellite exited from eclipse, overstressing power components and triggering
immediate mission loss. The contractor verified the subtle flaw and pulled the already
integrated unit from the satellite, averting a disaster.
The problem was not found earlier because the test configuration was not sufficiently
flight-like: resonance was quickly damped out by a one-ohm dummy load resistor. On orbit, the
solar array’s high impedance would have made it impossible to keep the resonance in check.
[Figure: (a) circuit including the stand-by battery; (b) Open Loop Gain of Array Voltage
Regulator, magnitude and phase versus Frequency (Hertz); (c) Solar Array Voltage versus Time in
Oscillation.]
The line filter and feed-through capacitor (a) combined to resonate at a “crossover
frequency” (b). The array would suffer sustained oscillation (c) and fail.

Lessons Learned:
• Test under all operating conditions: not only sunlight and eclipse operation, but
transitions, safe-hold mode, loadshed mode, and recovery mode.
• Strive to understand implications of test anomalies.
• Ensure perceptive instrumentation, lest test-set glitches cast doubt on results.
• Minor design changes in power supplies can result in disastrous consequences. Double-
check design changes, and perform independent analysis where practical.
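Why a one-ohm dummy load can hide a resonance may be seen from a simplified model. The sketch
below treats the line filter as a series inductor and shunt capacitor loaded by a plain
resistor, for which the quality factor is Q = R * sqrt(C / L). The inductance, capacitance, and
the 100-ohm "on-orbit" source impedance are hypothetical values, and a real solar array and
regulator are not plain resistors, so this only illustrates the trend.

```python
import math


def lc_filter_resonance(inductance_h, capacitance_f, load_ohms):
    """Return (resonant frequency in Hz, Q) of an LC filter with a resistive load.

    Assumes an ideal series inductor and shunt capacitor, with the load
    resistor in parallel with the capacitor.
    """
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))
    q = load_ohms * math.sqrt(capacitance_f / inductance_h)
    return f0, q


# Hypothetical component values, chosen only for illustration.
L_FILTER = 100e-6   # henries
C_FILTER = 10e-6    # farads

f0, q_test = lc_filter_resonance(L_FILTER, C_FILTER, load_ohms=1.0)
_, q_orbit = lc_filter_resonance(L_FILTER, C_FILTER, load_ohms=100.0)
print(f"f0 = {f0:.0f} Hz, Q with 1-ohm load = {q_test:.2f}, Q with 100 ohms = {q_orbit:.1f}")
```

With these assumed values the one-ohm load gives Q below 0.5 (no ringing at all), while the
high-impedance case gives Q above 30, so the same filter rings strongly once the load no longer
damps it.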
For more technical information, call Kasemsan Siri at (310) 336-2931.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 39
Thoroughly Review Test Data for Early Indicators of Anomalies
The Problem:
A satellite lost attitude control when defective heater circuitry caused a fuel line
rupture.
The Cause:
Shortly after launch, the propellant began to freeze. After a few days on orbit, repeated
freeze/thaw cycles fractured a line, and all propellants were lost.
A review of the test record revealed that a heater had ceased functioning on the ground, but
the defect was not noticed. Cutbacks in ground support prevented continuous satellite
monitoring during early operations. The anomalous thruster temperatures were recognized several
days too late by controllers. If the problem had been spotted earlier, the satellite could have
been saved by firing the thruster before its line froze.
[Figure: heater lead construction. Labels: Brazed Junction, Fusion Weld, Copper Foil, Nickel or
Platinum Lead, Heater Wire.]
Damage of the wiring at the heater lead (left) probably caused this failure. A more robust
configuration (right) was used in all subsequent flights.
Failure Indicator Available But Missed
Condition                        Thruster Temperature (ºF)
Normal                           100+
During First System Test         104
During Four Subsequent Tests     77-83
Although the heater failed during early ground tests, the problem was not recognized because
temperature limit checks were set to accommodate test environment changes, not to verify heater
performance. Later tests and operations used computer-controlled stepwise limit checks to
highlight anomalous behavior.
In the wake of this failure, numerous design and operation changes were implemented, and
the propulsion thermal control system on all subsequent flights performed successfully.
Lessons Learned:
• Carefully inspect all test and operational data for trends, oddities, and “out-of-family”
values, even when all values are within preset limits. Evaluate all indicators for potential
impacts, should trends continue. Seek to explain all instances of anomalous data (see
Lesson 19).
• Make sure that experienced operators closely monitor the satellite’s health during early
operations.
• Provide ground-commandable back-up heaters.
• Install heaters to fill/drain lines, and provide temperature monitors for all propellant lines
and valves.
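The first lesson above, looking for "out-of-family" values even when every reading is within
limits, can be sketched as follows. The 104 ºF first-test reading and the 77-83 ºF range come
from the table in this lesson; the individual later readings, the 40-120 ºF limits, and the
10-degree family band are hypothetical values chosen for illustration.

```python
def check_reading(value, limit_low, limit_high, baseline, family_band):
    """Classify one reading against preset limits and against its own history."""
    if not limit_low <= value <= limit_high:
        return "OUT_OF_LIMITS"
    if abs(value - baseline) > family_band:
        return "OUT_OF_FAMILY"  # within limits, but unlike earlier behavior
    return "OK"


# First system test (baseline) followed by four later tests, in deg F.
readings = [104, 77, 80, 83, 79]  # later values are assumed, within 77-83
results = [
    check_reading(r, limit_low=40, limit_high=120, baseline=readings[0], family_band=10)
    for r in readings
]
print(results)
```

Every reading passes the wide limit check, yet all four later tests are flagged because they sit
more than 20 degrees below the first test.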
For more technical information, call Lori Crosse at (310) 336-5821.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 40
Avoid Radio Frequency Interference

The Problem:
Signals from one program inadvertently interfered with another program.

The Cause:
This project, driven by a unique requirement, provides radio frequency (RF) intersatellite
links among its fleet.
The RF crosslinks were originally designed to null toward Earth to prevent appreciable
amounts of emission from reaching the ground.
Over the years, however, the original requirement was forgotten, and the next generation
of satellites no longer nulled toward Earth.
At a conference, an analyst fortuitously noticed that another program’s downlinks used
the same frequency band. A quick calculation showed that this program would suffer
interference from the crosslinks. The impact is minimal at present, but will increase as
crosslinks on the first program multiply.
The remaining satellites on the ground had to be reengineered to reduce leakage toward the
ground.
[Figure, first panel] Emission from crosslinks can reach Earth and interfere with other users.
[Figure, second panel] The emission problem can be cured by phasing the signals in the array to
place a null toward Earth.

Lessons Learned:
• Understand why requirements exist in legacy designs before discarding them.
• Coordinate spectrum planning with authorities (for example, Manager of Spectrum
Allocation at the Space Command), because not all frequency usages are public
information.

For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.
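The idea of phasing an array to place a null toward Earth can be shown with the simplest
possible case, two identical isotropic elements. This sketch is an assumption-laden
illustration, not the crosslink antenna of the program discussed: the half-wavelength spacing,
the 20-degree Earth direction, and the function names are hypothetical.

```python
import cmath
import math


def null_phase(spacing_wavelengths, null_angle_deg):
    """Phase offset (radians) on the second element that nulls the given angle."""
    kd_sin = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(null_angle_deg))
    return math.pi - kd_sin  # makes the two contributions exactly out of phase


def array_factor(spacing_wavelengths, phase_offset, angle_deg):
    """Magnitude of the two-element array factor at angle_deg from broadside."""
    psi = (2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
           + phase_offset)
    return abs(1.0 + cmath.exp(1j * psi))


SPACING = 0.5          # element spacing in wavelengths (assumed)
EARTH_ANGLE = 20.0     # direction of Earth from broadside, degrees (assumed)

beta = null_phase(SPACING, EARTH_ANGLE)
toward_earth = array_factor(SPACING, beta, EARTH_ANGLE)
away_from_earth = array_factor(SPACING, beta, -EARTH_ANGLE)
```

With the computed phase offset, the array factor toward the assumed Earth direction is
essentially zero while radiation in other directions remains available for the link.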

Lesson 41
Carefully Consider the Implication of Test Failures Beyond the Narrow Issues at Hand

The Problem:
The electric power system on a satellite suddenly failed.

The Cause:
The slip rings, wired with opposite polarities on adjacent brushes and therefore prone to
arcing, were destroyed after debris induced a short.
[Figure: slip ring assembly, panels (a) to (c). Labels: Conductive Springs, Ring Separators,
Brush, Rings, Rotor, Solar Array, Bus Power Distribution Unit.]
Slip rings connect rotating solar arrays to the bus.
[Figure: arcing sequence, panels (a) Short, (b) Bridging Arc, (c) Boiling Anode Metal, with
Ejection Force.]
Shorting of slip rings is fairly common: improperly lubricated brushes can easily abrade
conductive slivers out of the rings. The voltage gap across adjacent brushes exacerbated
shorting by triggering an arc, which wrecked every anode in its path.

Several mistakes led to the faulty design:
• Having chosen a bus with an excellent flight history, the program focused virtually
exclusively on the payload. The bus in fact had to be extensively modified; rotating arrays,
for example, were put on the aft end of the satellite for the first time, requiring new array
drive electronics. Yet, the program was too firmly set in the idea of a standard bus to grasp
the risks.
• The slip ring design provided practically no internal clearance between adjacent brushes,
making it easy for debris to cause a short. The design was accepted because another project
had flown it.
• The other project, however, had rewired the rings to keep the same polarities next to each
other after encountering a short during launch-simulating vibration tests. Notified of the
change, the first program felt that the change did not apply because its slip rings were
unpowered during launch.
• Slip ring arcing was also observed during ground test of a control moment gyro by the
same contractor working on yet another project. Unaware of this incident, the designers
did not consider shorting in the reliability analysis or in part selection. The program also
deleted thermal vacuum test of the slip rings to save money.
Lessons Learned:
• Thoroughly evaluate the heritage and applicability of using “existing” or “flight-proven”
equipment, especially if modifications have been made.
• Include shorting in analyzing potential failure modes of power systems.
• Apply manufacturing and handling practices that minimize slip ring damage.

For more technical information, call Jeff Lince at (310) 336-4464.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 42
Account for Electrostatic Interaction in Structural Analysis

The Problem:
The performance of a communication satellite significantly degraded.

The Cause:
The satellite deployed a new phased-array antenna, consisting of multiple microstrip elements
made of copper circuits over dielectrics. A large thermal blanket, used for the first time on
this type of antenna, shielded the elements from the Sun.
The sunshield was not adequately supported: too few tensioners were provided to keep the
blanket taut under Earth’s gravity (1 G). The sunshield was installed loosely, often touching
the antenna elements. Nevertheless, no attempt was made to compare antenna performance before
and after blanket installation on ground, because the cover was expected to recover from
drooping once in orbit.
Unfortunately, an electrostatic charge built up in the ungrounded dielectrics of the antenna.
The resulting electrostatic attraction overpowered the insufficiently applied tension, keeping
part of the blanket in contact with the elements. The phased-array’s gain degraded due to
dielectric coupling and shorting to the conductive layer of the sunshield.
[Figure: sunshield "As Designed" and "Actual". Labels: Sunshield, Tensioner, Antenna Structure,
Standoff, Antenna Element: Cu on dielectric.]
The sunshield curled toward the antenna due to charges that accumulated in the insulators.
Notice that electrostatic attraction can take place even though one surface (the sunshield in
this case) is grounded.

Lessons Learned:
• Be aware of the propensity of dielectrics to pick up an electrostatic charge in space.
• Thoroughly review the potential impacts of the space environment on flight hardware.
• Whenever possible, design hardware so that its operation in space (0 G) can be verified
under 1 G test conditions.
• Test the entire system in the final flight configuration.

For more technical information, call Harry Koons at (310) 336-6519.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 43
Do Not Circumvent Processes Designed to Catch Human Errors

The Problem:
A satellite was placed into a moderately degraded orbit.
The Cause:
During launch preparations, operators made final measurements of the spacecraft’s inertial
measurement unit (IMU). The readings, together with factory calibration data, were used to
control the satellite’s orientation during ascent.
Unlike all the other inputs loaded to the satellites, the IMU measurement and calibration data
could not be verified in a testbed because the readings had to be made just before launch.
Therefore, a procedure was set forth to avert mistakes: one operator was required to transcribe
the calibration numbers from the factory printout, and another would verify the entries.
An engineer supervising the keyboard operators copied the calibration data from the computer
printout onto a piece of scratch paper, leaving the original printout in his office. He gave the
scratch paper to the operators, telling them that it was suitable. The data were typed in and
verified. Unfortunately, the engineer left out a symbol, and the orbit insertion went awry!

$\bar{\dot{R}}_n$ versus $\dot{R}_n$: The First Software-Related Crash
An incorrect formula in the ground software led to the failure of Mariner I in 1962.
Ascent control required velocity smoothing, or “R dot bar n” where R stood for radius from a
tracking antenna, the dot for the first derivative (i.e., the velocity), the bar for averaging,
and n for the increment.
The bar was left out of the handwritten equations provided to the programmer, causing the
guidance computer to be coded to process raw velocity instead. Confronted by fluctuating
telemetry, the computer sent erratic correction signals, forcing a smoothly ascending booster to
veer off course.
Lessons Learned:
• Verify software databases as thoroughly as the source code (see Lesson 3).
• Verify software algorithms and databases on a simulator whenever possible.
• Double-check manually entered data against original sources.
• Automate data transfer and checking whenever possible to minimize human error.
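The last two lessons above, checking entered data against the original source and automating
that check, can be sketched in a few lines of Python. The "name = value" printout format, the
parameter names, and the numbers are all hypothetical; the sketch only shows that comparing
against the original record, rather than an intermediate hand copy, exposes a dropped sign.

```python
def verify_against_source(source_text, entered):
    """Compare entered values with those parsed from the original printout.

    Returns a list of (name, source_value, entered_value) for every mismatch;
    entered_value is None when a parameter was never entered.
    """
    source = {}
    for line in source_text.strip().splitlines():
        name, value = line.split("=")
        source[name.strip()] = float(value)
    mismatches = []
    for name, source_value in source.items():
        if name not in entered:
            mismatches.append((name, source_value, None))
        elif entered[name] != source_value:
            mismatches.append((name, source_value, entered[name]))
    return mismatches


# Hypothetical factory printout and keyed-in values (one minus sign dropped).
FACTORY_PRINTOUT = """
gyro_x_bias = -0.0123
gyro_y_bias = 0.0045
"""
keyed_in = {"gyro_x_bias": 0.0123, "gyro_y_bias": 0.0045}
problems = verify_against_source(FACTORY_PRINTOUT, keyed_in)
```

Here the check reports the first parameter, because the entered value has lost its sign
relative to the source.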

For more technical information, call Julio Rivera at (310) 336-3287.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 44
Beware of Sneak Paths Through Test Equipment

The Problem:
Two days before launch, a satellite spontaneously tried to deploy.
The Cause:
Baffled engineers found that the separation sensor unexpectedly powered up. Even then, it
should not have turned on. Unexplained internal flaws inside the unit, which had operated
nominally up to that day, threatened to scrub the mission.
Not wanting to spend millions of dollars to return the satellite to the factory, the program
sought help from an outside expert, who found:
• The functional test was unable to detect whether the power relay was open or not.
• The test set inadvertently enabled the sensor, as if the breakwire had opened.
• The sensor could turn on only if powered quickly.
• The anomaly first occurred when the bus was powered up too fast by mistake, but appeared
again after the power was properly reapplied.
[Figure: Simplified Separation Electronics Schematics. Labels: +28 V, On/Off Command, relay c,
Timed Reset, Spacecraft Separation Sensor, Simulator Turn-On Port, Launcher/Satellite
Breakwires d, relay e, Solar Array Deployment Squib Firing, S-Band Transmitter, filter f.]
A latch in the separation sensor (powered via relay c) opens after the satellite breaks away
from the launcher (d), deploying the solar array via relay e.
Failure of relay c, due to the addition of a filter f, formed a sneak path (dashed line) via the
simulator port, triggering the prelaunch anomaly. Premature separation in fact could not occur
in flight because the port is not used.

The analyst traced the anomaly to a noise filter added to the input line. The filter caused an
overcurrent, welding the relay shut and powering the sensor up. Welding in fact occurred on a
relay installed in this same spot once before, but no corrective action was taken.
Energizing the bus too fast during ground test created a current strong enough to turn on the
sensor and start the deployment sequence. After an abort, the problem recurred upon a nominal
restart because the sensor timer had not yet reset.
Once understood, the concern vanished—the relay would be closed in flight and the sneak
path would be blocked by the flight plug. The satellite flew successfully.
Lessons Learned:
• Determine and correct the root cause of all failures.
• Trace the flow of power and signals from source to load during troubleshooting.
• Provide a mechanism to independently validate the status of critical components.
• Inject unexpected conditions (such as a closed relay, current surge, and sluggish
separation wire breakage) during reliability analysis to discover lurking failure paths.

For more technical information, call Peter Carian at (310) 336-8215.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.
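Tracing power from source to load while injecting unexpected component states, as the lessons
above recommend, can be automated on a simple network model. The sketch below is a generic
illustration: the four-edge network, the node and element names, and the idea of a test-set link
in parallel with the power relay are hypothetical and are not the actual separation electronics.

```python
from itertools import product


def conducting_paths(edges, states, source, load):
    """Return every simple path from source to load through closed elements."""
    neighbors = {}
    for a, b, element in edges:
        if element is None or states[element]:  # None marks a plain wire
            neighbors.setdefault(a, []).append(b)
            neighbors.setdefault(b, []).append(a)
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == load:
            paths.append(path)
            continue
        for nxt in neighbors.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


# Hypothetical network: (node, node, switching element or None for a wire).
EDGES = [
    ("bus_28v", "sensor_in", "power_relay"),
    ("bus_28v", "sensor_in", "test_set_link"),
    ("sensor_in", "sensor_out", "sensor_enable"),
    ("sensor_out", "squib", None),
]
ELEMENTS = ["power_relay", "test_set_link", "sensor_enable"]


def unsafe_state_combinations():
    """Try every combination of element states; keep those that power the squib."""
    unsafe = []
    for values in product([False, True], repeat=len(ELEMENTS)):
        states = dict(zip(ELEMENTS, values))
        if conducting_paths(EDGES, states, "bus_28v", "squib"):
            unsafe.append(states)
    return unsafe
```

Enumerating all state combinations, including ones considered impossible such as a welded relay,
lists every condition under which the load would be powered. For larger networks an exhaustive
search grows quickly, so the combinations would need to be limited to credible single and double
faults.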

Lesson 45
Guard Against Chloride Contamination Due to Manufacturing Process Changes

The Problem:
Two heat pipes suffered significant performance degradation in system-level test.
The Cause:
Analysis of the failed units revealed particulate materials, hydrogen gas, and internal
etching. Obviously, the ammonia working fluid had reacted with the aluminum tubing, a problem
that had not occurred in recent memory.
The problem was eventually traced to a minor manufacturing procedure change. After machining,
the vendor previously wrapped the end of the tubing with aluminum foil to keep dust out. It
replaced this untidy-looking procedure with dust caps. Apparently the tubing scratched the
common polyvinyl chloride (PVC) caps, lodging some debris in the assembly.
Unfortunately, chloride in the PVC catalyzed ammonia’s decomposition. The entire batch of heat
pipes had to be removed.
The impact of this procedure change was not evident to the manufacturer. The caps were first
used on a batch of variable conductance heat pipes instead of the more common constant
conductance type, and the variable conductance mode masked the noncondensable problem. Moreover,
this batch passed vendor acceptance test because the test was made within two days of ammonia
charge, before noncondensables had a chance to build up.
[Figure: Constant Conductance Heat Pipe (Degraded) and Variable Conductance Heat Pipe. Labels:
Heat In, Heat Out, Evaporator, Condenser, Gaseous Ammonia, Thermocouple, Noncondensables.]
Noncondensables diminish heat rejection efficiencies of constant conductance heat pipes.
A Similar Incident: An engine suffered a severe leak during recent ignition testing because the
chamber was cleaned with over-the-counter detergent. Chloride in the cleaner induced stress
corrosion, cracking the tubes.
Lessons Learned:
• Heat pipes are highly sensitive to minor materials and process changes.
• Seemingly minor process alterations can have catastrophic side effects.
• Allow sufficient time before conducting tests of chemical degradation.

For more technical information, call Robert Prager at (310) 336-5582.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 46
Make Sure Test Equipment Is Sufficiently Capable

The Problem:
A power regulation unit underwent five months of acceptance tests due to an inefficient setup.

The Cause:
The unit under test, consisting of eight DC-DC power stages, exhibited major glitches during
vibration. Based on sketchy data, the manufacturer assumed a short had occurred in the output
stage, and replaced all suspected parts.
The same anomalies recurred during a second vibration test. Now the vendor believed that the
first power stage was at fault.
An independent simulation showed that neither scenario was credible, and it was recommended
that full instrumentation as well as computerized data collection be implemented. The
manufacturer did not do this.
Four more rounds of vibration failures ensued. Not only did the root cause remain elusive, the
equipment’s vibration life was almost depleted. The manufacturer proposed to replace the
entire first power stage, which would have seriously impacted the program schedule.
Exasperated, the program office made the manufacturer run one more test with full
instrumentation. Right away, an insidious short in the current sensor was found. Within a few
weeks, the repaired unit passed.
[Figure: (a) Stages 1 through 8 with a scope; (b) Stages 1 through 8 with a data log.]
Troubleshooting was hampered because the test set (a) could not monitor all channels. Also, the
reliance on oscilloscopes made data collection inefficient. Digital data collection from all
ports (b) solved the problem in a few days.
Housekeeping (as opposed to hardware-related) glitches in facility, software, equipment, or
connectors routinely account for the majority of discrepancy reports, unnecessarily impacting
program schedule.
Lesson Learned:
• Budget for high fidelity, reproducible, functional tests to facilitate troubleshooting.

For more technical information, call Dave Caldwell at (310) 336-6344.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 47
Review Hardware Reusability When Configuration Changes Affect Margins

The Problem:
A satellite failed two weeks after launch when a battery charger shorted.
The Cause:
The short took place between the grounded radiator and the electronics-mounting heatsink that
was at the solar array potential.
The radiator was isolated from the heatsink by thin adhesive and anodization layers only. A
tolerance buildup, after repeated temperature excursions, drove the mounting screws through the
anodization, causing the short. Conductive debris could also have bridged the heatsink to the
box’s walls.
[Figure: A Vulnerable Packaging Design. Labels: Relay, Adhesive, Heatsink (at Solar Array
Potential), Anodization, Radiator Plate (at Ground Potential), 1.217 Inches Available, Most
Screws Longer Than 1.217 Inches.]
An inspection of the hardware destined for the next flight revealed that many screws were too
long to fit into the space between the relay mount and the radiator plate, making a short
virtually inevitable. Moreover, the heatsink barely cleared the unit walls. Because the
heatsink was not conformally coated, debris such as a loose solder ball could also have caused
a short.
Several factors contributed to the mistake:
1. The charger had flown on many spacecraft over 20 years. Far from robust, the units were
handled meticulously in the past. The failed box, on the other hand, was treated routinely,
and not thoroughly inspected.
2. Two scientific instruments added to this mission caused the system to run 10 deg C hotter,
exacerbating the tolerance problem by, for example, flexing the box walls. Unfortunately, the
units were not requalified.
3. The survival mode software, which could have shed the load and provided time to diagnose
the problem before the spacecraft batteries were depleted, was not enabled.
Lessons Learned:
• Recognize that workmanship plays a large role in space hardware, and reliability may
be compromised when undertrained personnel assemble heritage equipment.
• Computerize manufacturability analysis, including interface tolerance buildup, dynamic
interference, and ease of inspection on all packaging designs.
• Provide automatic fault management mechanisms so that a single defect will not bring
down the entire system.
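A computerized tolerance buildup check of the kind recommended above can be very small. The
sketch below is a worst-case stack-up for one screw. The 1.217-inch available depth is read
from the figure label in this lesson; the screw length and both tolerances are hypothetical
values chosen to show how a design that looks acceptable at nominal dimensions can interfere in
the worst case.

```python
def worst_case_clearance(available_nominal, available_tol, screw_nominal, screw_tol):
    """Smallest possible gap (inches) between screw tip and the far surface.

    A negative result means the longest allowed screw can penetrate past the
    shallowest allowed depth, i.e. an interference.
    """
    min_available = available_nominal - available_tol
    max_screw = screw_nominal + screw_tol
    return min_available - max_screw


AVAILABLE_DEPTH = 1.217   # inches, from the figure label
DEPTH_TOL = 0.003         # inches, assumed
SCREW_LENGTH = 1.210      # inches, assumed
SCREW_TOL = 0.010         # inches, assumed

nominal_gap = AVAILABLE_DEPTH - SCREW_LENGTH
worst_gap = worst_case_clearance(AVAILABLE_DEPTH, DEPTH_TOL, SCREW_LENGTH, SCREW_TOL)
print(f"nominal gap {nominal_gap:+.3f} in, worst-case gap {worst_gap:+.3f} in")
```

With these assumed numbers the nominal gap is positive but the worst-case gap is negative. A
fuller analysis would add further terms, such as thermal growth and wall flexing, to the same
sum.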
For more technical information, call Robert Tsutsui at (310) 336-3273.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 48
Thoroughly Reverify Software When Requirements Change

The Problem:
The Patriot defense system failed to intercept a Scud missile.

The Cause:
As the Patriot detects a threat, its radar beam narrows for better tracking. The fire
controller extrapolates the trajectory (the position where the object should appear next) to
commence locking on the target. Trajectory calculations require knowledge of time. Time is
updated in the system clock every tenth of a second. A pair of 24-bit integers (31 x 0.1 sec,
32 x 0.1 sec, and so on) are converted to a floating point number before computation. Because
0.1 cannot be fully expressed in binary digits, it is truncated, with a loss in precision by
one part per million.
[Figure: Patriot radar modes Search, Validate, and Track, with the range gate and the Scud.]
Cumulative precision loss let the radar look in the wrong place (range gate) for the Scud.
When Patriots were brought to the Gulf, the software was modified to track faster Scuds. A
change was made to convert clock-time more accurately, but was not inserted everywhere it
was needed in the software. The elapsed time between two radar pulses, which used to be
based on two clock readings containing canceling arithmetic errors, now contained a systematic
error because the truncated time of one pulse was subtracted from a more accurate time of
another pulse.
The Patriots, designed for mobile defense, were expected to shut down for redeployment or
maintenance after no more than 14 hours. In the Gulf, they were operated continuously from
fixed positions. As the radar clock ticked, the error accumulated. The Army became aware of
the drift, modified the software, and alerted the field units to periodically reboot so the clock
could start anew.
Unfortunately, the instructions arrived the day after a Scud hit an Army barracks and killed 28
soldiers. The battery protecting the base had been in operation for over 100 consecutive hours,
during which the timing inaccuracy had grown to the point where the Patriots could no longer
lock on the Scud.
Lessons Learned:
• Reverify software performance when its intended environment changes (Lesson 18).
• Thoroughly analyze the impact of loss of precision.
• Ensure change analysis is complete and changes are comprehensively verified.
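The size of the accumulated error can be checked with exact arithmetic. The sketch below
assumes that 0.1 was stored with 23 bits after the binary point. That bit count is not stated
in this lesson and is taken as an assumption, chosen because it reproduces the "one part per
million" loss described above. The function name is hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23                      # assumed number of bits after the binary point
TICK = Fraction(1, 10)                  # one clock tick is 0.1 second
SCALE = 2 ** FRACTION_BITS

# Truncate (chop) 0.1 to the available bits, as the text describes.
stored_tick = Fraction(int(TICK * SCALE), SCALE)
error_per_tick = TICK - stored_tick     # seconds lost on every tick
relative_error = float(error_per_tick / TICK)


def clock_drift_seconds(hours_up):
    """Accumulated clock error after running continuously for hours_up hours."""
    ticks = hours_up * 3600 * 10
    return float(error_per_tick * ticks)


print(f"relative error per tick: {relative_error:.2e}")
print(f"drift after 14 hours:  {clock_drift_seconds(14):.3f} s")
print(f"drift after 100 hours: {clock_drift_seconds(100):.3f} s")
```

Under this assumption the clock loses about a third of a second over 100 hours, roughly seven
times the error at the 14-hour operating period the system was designed for.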

For more technical information, call Suellen Eslinger at (310) 336-2906.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 49
Equipment Intended for Use in Simulated Space Environments Should Be Space-Rated

The Problem:
A flight payload was damaged during thermal vacuum testing.
The Cause:

An inspection of the flight hardware showed that the test cable, wrapped in
microwave-reflective tapes, suffered corona discharging and overheated.
Flight-qualified cables have safety features, such as built-in vents in the connectors.
Unfortunately, neither the test cable nor its connectors were vacuum qualified. Corona started
in the connector, and the cable outgassed. A destructive resonance, known as multipaction
breakdown, set in, and ignited parts on the payload.
The accident was not caught during thermal vacuum testing operations because no one from the
payload supplier monitored the test and because the satellite was not fully instrumented.
[Figure: Cd vapor inside a thermal vacuum chamber.]
A Similar Incident: A test set scheduled for use in the thermal vacuum chamber contained
cadmium-plated parts. Cadmium, commonly used to plate military components, sublimes in vacuum
and is not allowed in space. If the test had gone ahead, the cadmium could have contaminated
not only the spacecraft being tested, but also the chamber and future satellites!

Lessons Learned:
• Perform formal design reviews on ground-test equipment intended for use in space-like
environments.
• Test radio frequency equipment in vacuum to 6 decibels over the expected input level (to
account for unfavorable signal return) to ensure operational safety.
• Monitor flight hardware during test lest overstressing cause damage.
• Improve interfaces between payload engineers and bus engineers, particularly during
system level tests.
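
The 6-decibel margin in the second lesson translates into a power ratio as sketched below. The function name and the 10 W example level are hypothetical and chosen only for illustration.

```python
def power_with_margin(expected_watts: float, margin_db: float = 6.0) -> float:
    """Return the test power that sits margin_db decibels above the expected input power."""
    # A decibel margin applied to power is a factor of 10 ** (dB / 10).
    return expected_watts * 10 ** (margin_db / 10)

expected_input_w = 10.0  # hypothetical expected input level
print(f"Test level: {power_with_margin(expected_input_w):.1f} W")
```

A 6 dB margin is therefore roughly a fourfold increase in power over the expected input.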

For more technical information, call Tom Darone at (703) 633-5134.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 50
Virtual Cross-strapping Extends Satellite Life

The Event:
A government satellite, almost deorbited after losing both primary and redundant gimbal con-
trol, was brought back to operational status.

The Cause:
A power supply failure in the A-side caused the payload gimbal control to be switched to
the B-side. Later, the B-side was disabled when its position sensor malfunctioned. An
initial analysis indicated an in-orbit fix was impossible.
An engineer who worked on the original gimbal development was brought in to assist safing
the satellite for de-orbit. Drawing on his experience with similar programs and on this
gimbal’s design, the engineer realized that there was a secondary command path for the
gimbal motor, which would make it possible to cross-strap the functioning components of
both sides. Calculations showed that the spacecraft’s design margins would support this fix
without significantly compromising mission status.
The new control laws were programmed into the controller processor, and the mission was
restored.

[Figure: block diagrams of gimbal control Sides A and B (position sensor, power supply,
gimbal controller processor, gimbal motor driver, gimbal motor) with CMD, TLM, and forward
control CMD links to the OBC, shown at failure and re-routed with the new database. Legend:
X = malfunctioned; ? = inoperative due to malfunction; a further marker = disconnected
(gain set to zero); OBC = On-Board Computer.]

Recovery Strategy
The gimbal controller design included a path to forward-control nonlinear motor driver
behavior. The rescue scheme fed commands, derived from sensor A data and calculated by the
processor using new control laws, into the motor controller B via this route, bypassing the
processor B.
Lessons Learned:
• On-board reprogrammability provides enormous flexibility (see Lesson 30).
• In a tight spot, seek cross-program wisdom from diverse organizations.
• Capture knowledge of heritage designs and look for novel ways to take advantage of
design features.

For more technical information, call Hiroshi Shibata at (310) 336-5036.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 51
Review Troubleshooting Process When Encountering Surprising Test Results

The Problem:
An attitude control unit exhibited unrepeatable performance degradation.

The Cause:
In the middle of the acceptance test, a pro-
duction unit failed. Engineers could not
identify the cause.
Eleven days later, the problem abruptly
vanished. An all-out effort, lasting over four
months, failed to recreate the anomaly,
driving the contractor to consider tearing the
unit apart.
It turned out that the unit, slightly modified from a product designed for another project,
looked identical to the other except for the part number on the nameplate. Both operated on
the same test set and were equipped with identical connectors.
Units for both programs, by chance having the same serial number, were stored in identical
carrying cases and stowed side by side in the same storage cabinet. Apparently, a technician
had removed the wrong unit from the cabinet to test. During the intensive troubleshooting
effort, nobody checked the label of the unit under test!

A Similar Incident
A thermal vacuum test was delayed because two rolls of Kapton tape were mixed up.
Both rolls of tape came from the same supplier and looked exactly the same. However, the
roll inadvertently used to attach insulation blankets contained an adhesive that was based
on silicone instead of on low-outgassing acrylics. The satellite had to be baked and pumped
for a long time before silicone outgassing subsided.
Lessons Learned:
• Consider using bar codes in production control.
• Incorporate design features, such as colored cables, to preclude human errors.
• Don’t overlook simple human errors when confronting unexplained problems.
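
An automated identity check of the kind the first lesson suggests could look like the sketch below. It assumes a scanner that reports both part number and serial number; the class, function, and label values are hypothetical. Because both units in this incident carried the same serial number, the comparison uses the part number as well.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnitLabel:
    part_number: str
    serial_number: str

def verify_unit_under_test(scanned: UnitLabel, expected: UnitLabel) -> None:
    """Raise ValueError unless the scanned label matches the unit named in the test procedure."""
    # Compare both fields: a serial number alone is not unique across programs.
    if scanned != expected:
        raise ValueError(f"Wrong unit on test set: expected {expected}, scanned {scanned}")

expected = UnitLabel(part_number="ACU-100-2", serial_number="0007")  # hypothetical values
scanned = UnitLabel(part_number="ACU-100-1", serial_number="0007")   # look-alike from another program
try:
    verify_unit_under_test(scanned, expected)
except ValueError as error:
    print(error)
```

Running the check at the start of every test and troubleshooting session costs seconds, compared with the months spent chasing this anomaly.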
For more technical information, call Tom Fuhrman at (310) 336-6596.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 52
Protect Cryogenic Systems Against Thermal Expansion Mismatch

The Problem:

An expensive instrument performed poorly and failed early.


The Cause:
The instrument used a dewar filled with solid nitrogen to cool the detectors. Between
filling the dewar and launch, cold helium was pumped through coils to keep the nitrogen
from thawing.
Soon after the dewar was attached to the optical system, the cameras were found to be out
of focus, albeit within the adjustable range. An investigation panel concluded that the
dewar had deformed due to thermal expansion mismatch but approved the launch.
Unfortunately, defocusing worsened on orbit, and a camera became disabled. Moreover, the
cryogen depleted rapidly, ending the mission.
Apparently, part of the camera light baffle, attached to the inner wall, expanded forward
and touched the other part of the baffle, which was attached to the outer shield. The
thermal short accelerated cryogen loss and increased deformation.

[Figure: cross-section of the dewar, forward to aft: optical port, vent line, two-part
aluminum light baffle, vacuum jacket for dewar, optical bench, photodetector, solid
nitrogen/aluminum foam at 58 K, multilayered insulator, circulator for ground-supplied cold
helium gas]

How Deformation Occurred
Because the aft part, through which super-cold helium was pumped, was colder than the
forward part, forward nitrogen could sublime and refreeze aft, eliminating ullage space.
After helium flow stopped, the tank warmed up. The large CTE differential (700 ppm/K for
solid nitrogen, 17 ppm/K for aluminum) probably forced the dewar to yield. Progressive
deformation gradually closed the gaps between the baffles.

The unanticipated impact of repeated cooling cycles was not recognized because there was no
prototype testing. During optics installation, an “alarmingly small clearance” was reported,
but neither the designers nor the first investigation team conducted an interference analysis.
Lessons Learned:
• Perform in-depth modeling and thermal cycling tests on cryogenic systems, which are
delicate equipment involving complex physics and material behavior.
• Provide adequate tolerances for thermal expansion mismatch (using flexible links, for
example).
• Be extra vigilant when stretching the state-of-the-art.
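
The size of the mismatch can be estimated from the two expansion coefficients quoted in this lesson. This is a rough sketch, not the investigation's analysis. It assumes the quoted values are linear coefficients that stay constant over the temperature change, and the 5 K warm-up is a hypothetical figure.

```python
CTE_SOLID_NITROGEN_PPM_PER_K = 700.0  # value quoted in this lesson
CTE_ALUMINUM_PPM_PER_K = 17.0         # value quoted in this lesson

def mismatch_strain(delta_t_kelvin: float) -> float:
    """Differential linear strain between solid nitrogen and aluminum for a temperature rise."""
    differential_ppm_per_k = CTE_SOLID_NITROGEN_PPM_PER_K - CTE_ALUMINUM_PPM_PER_K
    return differential_ppm_per_k * 1e-6 * delta_t_kelvin

warm_up_k = 5.0  # hypothetical temperature rise after helium flow stops
print(f"Differential strain for a {warm_up_k:.0f} K rise: {mismatch_strain(warm_up_k) * 100:.2f} %")
```

With no ullage space left to absorb the expansion, even a strain of this order has to be taken up by the tank wall.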

For more technical information, call Martin Donabedian at (310) 336-6315.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 53
Test Hardware and Software Together

The Problem:
A satellite lost power shortly after launch.

The Cause:
The satellite used magnetic torquers for attitude control, a common approach.
Installation constraints made it necessary to mount one of the torque coils with a phase
opposite to that of the other two coils. Unfortunately, this configuration was not
reflected in the software reused from another mission, resulting in a sign error.
The mistake was not caught because the software was reviewed only at a top level. Moreover,
the attitude control test to verify coil wiring was hardware-only. An end-to-end test,
which would have detected the fault, was deemed too costly.
In orbit, the phase reversal caused the solar array to be steered away from the Sun.
Limited ground station coverage made it impossible to diagnose the problem soon enough to
prevent the battery from being drained.

[Figure: the applied dipole of a torquer interacts with the Earth’s field to produce a
torque on the vehicle]
Magnetic torquers are coils wound around an iron core. Passing a current through the coils
creates a magnetic dipole which interacts with the Earth’s magnetic field and generates a
feeble torque. Reversing the current flow (phase) produces the opposite effect.
Torquer polarity mistakes occur often. The orientation of large coils is easily verified
with a magnetometer (essentially a compass). Background noise can make checking small
torquers difficult.
Lessons Learned:
• Rigorously control configuration, especially at the hardware/software interface.
• Always ascertain torquer polarity.
• Provide sufficient ground station coverage in early operation.
• Design battery protection to keep the satellite alive long enough for troubleshooting by
implementing automatic load shedding and by configuring solar panels so that even a
partially deployed array can keep the battery charged.
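
The effect of a torquer sign error follows from the relation torque = dipole × field. The sketch below is illustrative only: the field, the dipole, the axis assignment, and the sign tables are hypothetical values, with one coil mounted in reverse as in this lesson.

```python
from typing import Tuple

Vector = Tuple[float, float, float]

def cross(a: Vector, b: Vector) -> Vector:
    """Return the vector cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def produced_dipole(desired: Vector, software_signs: Vector, hardware_signs: Vector) -> Vector:
    """Dipole actually produced when software applies its sign table to coils with their true signs."""
    return (
        desired[0] * software_signs[0] * hardware_signs[0],
        desired[1] * software_signs[1] * hardware_signs[1],
        desired[2] * software_signs[2] * hardware_signs[2],
    )

FIELD_T = (3.0e-5, 0.0, 0.0)           # hypothetical local field, tesla
DESIRED_DIPOLE_AM2 = (0.0, 2.0, 0.0)   # hypothetical commanded dipole, A*m^2
HARDWARE_SIGNS = (1.0, -1.0, 1.0)      # the y coil is mounted with opposite phase
REUSED_SOFTWARE_SIGNS = (1.0, 1.0, 1.0)
CORRECTED_SOFTWARE_SIGNS = (1.0, -1.0, 1.0)

wrong = cross(produced_dipole(DESIRED_DIPOLE_AM2, REUSED_SOFTWARE_SIGNS, HARDWARE_SIGNS), FIELD_T)
right = cross(produced_dipole(DESIRED_DIPOLE_AM2, CORRECTED_SOFTWARE_SIGNS, HARDWARE_SIGNS), FIELD_T)
print("z torque with reused software:   ", wrong[2])
print("z torque with corrected software:", right[2])
```

The two torques are equal in magnitude and opposite in sign, so the controller pushes the vehicle away from the commanded attitude. A hardware-only wiring check cannot reveal this, because the error lies in the agreement between the two sign tables.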
For more technical information, call Tom Fuhrman at (310) 336-6596.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 54
Design and Handle Cryogenic Equipment with Great Care

The Problem:
A cryogenic dewar containing liquid helium exploded on the ground.

The Cause:
Liquid helium freezes air. If air leaks into helium containers and blocks vent lines,
internal pressurization can set off violent failures.
Blockage may occur when containers are brought to a lower altitude (for example, after
being carried down from a mountaintop observatory). Also, since helium boils extremely
readily, any heat ingression can cause the pressure to rise rapidly. Accidents involving
cryogenic equipment are therefore fairly common.
In this incident, a leak allowed air to freeze, plugging the vent line following a plane
trip. Subtle structural flaws caused a thermal short. The pressure rose quickly, and the
tank burst.

[Figure: dewar cross-section showing the absolute pressure relief valve, solid air
blockage, liquid N2 container, liquid helium container, standoffs, spacer, and optical
compartment]
The exact cause of this accident could not be ascertained. The leak sprang due to
contaminants accumulated in the valve, or fatigue of internal parts. The container had been
damaged before, which probably sheared off a spacer and tilted the container slightly. When
the blockage formed, internal pressure pushed the tank into contact with the outer shroud,
causing an unexpected thermal short. A small helium leak could have taken place too.
Lessons Learned:
• Review and follow operating and transportation procedures associated with cryogenic
equipment to ensure safety to personnel, flight hardware, or facilities.
• Provide a graceful failure mechanism, if possible, to prevent catastrophic failure.
• Design for containment—make sure the cryogens that unexpectedly boil off can be con-
strained within the vessel.
• Provide redundant vent paths.
• Design for convenient disassembly to aid inspection and maintenance.
• Service absolute pressure valves often—never exceed vendor specifications. Test valves
before every field operation.

For more technical information, call John Hackwell at (310) 336-6041.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 55
Do Not Dismiss Test Anomalies as Random Events—Find Out Why (I)

The Problem:
Two commercial satellites failed to deploy during the same Space Shuttle mission.
The Cause:
Both satellites suffered identical mishaps: the carbon/carbon nozzles on their kick motors
came off a few seconds into firing.
Three other nozzles failed in a similar manner during qualification tests. Unfortunately,
these failures were attributed to deficiencies in materials and workmanship. The flight
incident investigation report also blamed the two failures on undetected flaws in the
material used to fabricate the exit cones. The fundamental problem was not diagnosed.
Because the motors were slated for government applications, Congress asked for an
independent investigation. Finally, the root cause was discovered: charring of the unvented
carbon/phenolic insulator created gaseous pressure within the exit cone. Since
permeabilities inside the insulating materials are highly variable, the gas sometimes
became trapped, forcing the exit cone to buckle. The problem could have been avoided simply
by placing vent grooves in the bondlines.

[Figure: Exit Cone Collapse. Nozzle cross-section showing titanium (Ti), rubber,
carbon/phenolic (C/P) insulator, carbon/carbon (C/C) exit cone, bondline, heat flow, and
volatiles from insulator pyrolysis; the gap area is shaded. Photographs (a) and (b) show
the instrumented firing.]
The independent investigation prompted NASA to conduct its own instrumented firing, which
proved the buckling scenario. Prior to firing, the cone curled toward the left. It became
vertical (Photograph A) and started to curl toward the right (Photograph B). The cone
failed shortly afterwards.
Lessons Learned:
• Exhaustively search for the root cause of failures.
• Conduct fully instrumented tests.
• Provide sufficient thermal and structural margins to allow for material, manufacturing,
and processing fluctuations.

For more technical information, call S. R. Lin at (310) 336-7697.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 56
Do Not Dismiss Test Anomalies as Random Events—Find Out Why (II)

The Problem:
A solar array drive failed soon after deployment.

The Cause:
The problem occurred because of a seemingly minor design tweak. The addition of an
electromagnetic interference (EMI) filter allowed transient noises from the bus to
propagate into the drive electronics. A spike blew the fuse for the H-bridge that
controlled the motor.
During thermal vacuum testing in the months preceding this on-orbit failure, two other
satellites in the same block also blew their controller fuses. Unfortunately, even though
the previous block of satellites never encountered this problem, the project did not
investigate the root cause. The damaged parts were simply replaced, allowing the satellites
to be bought off.
An earnest analysis would have identified the leakage from the EMI filter, and the on-orbit
failure would have been avoided.

[Figure: solar array drive circuit showing the 28V bus, fuse (h), DC/DC converter, primary
and redundant controllers (f), H-bridge transistors A, A’, B, B’, drive motor (g), solar
array boom, Sun sensor, EMI filter (c), slip rings (d), amplifier (e), and added
resistor (i)]
Signals from the Sun Sensor passed through the EMI filter, c, the slip rings, d, and the
amplifier, e, to the controller f. The controller oriented the boom by alternating the
motor g between two states (A, A’ transistors on the H-bridge open, B, B’ closed; and
B, B’ open, A, A’ closed).
The grounded EMI filter, coupled with a circuit not designed for fast switching, allowed
transient noises from the chassis to momentarily turn all transistors on, blowing the
fuse, h.
Installation of a resistor (i) eliminates the noise problem.
Lessons Learned:
• Define and implement a verification plan.
• Perform a worst-case circuit analysis to meet defined interface requirements.
• Always ascertain the root causes of ground test anomalies (Lesson 55).
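
The failure condition in this lesson can be written as a simple state check of the kind a worst-case analysis or a test monitor might include. This is a minimal sketch: it treats a "closed" transistor as conducting, and the class and function names are hypothetical. Because the lesson does not state which transistors share a leg, only the two drive states it describes and the all-on case are classified.

```python
from typing import NamedTuple

class BridgeState(NamedTuple):
    a: bool        # True means the transistor is conducting
    a_prime: bool
    b: bool
    b_prime: bool

# The two states the controller is described as alternating between.
DRIVE_STATES = {
    BridgeState(a=True, a_prime=True, b=False, b_prime=False),
    BridgeState(a=False, a_prime=False, b=True, b_prime=True),
}

def classify(state: BridgeState) -> str:
    """Label a bridge state as 'short', 'drive', or 'other'."""
    if all(state):
        return "short"  # every transistor conducting: the bus is shorted through the bridge
    if state in DRIVE_STATES:
        return "drive"
    return "other"

print(classify(BridgeState(True, True, False, False)))
print(classify(BridgeState(True, True, True, True)))
```

A noise transient that drives the bridge into the "short" state, even momentarily, is the event that blew the fuse.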
For more technical information, call Walter Dennis at (310) 416-7207 or Steve VanWormer at
(703) 633-5213.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 57
Protect Propulsion System from Contamination

The Problem:
A launch was delayed for many months.
The Cause:
Following a guidance system malfunction, the satellite had to be removed from the launch
vehicle. Off-loading of the toxic propellant caused a problem: the legacy satellite had no
gravity drains, and the thruster valve was not robust. Neither the original valve vendor
nor the system manufacturer was still in business, nor could the build paper be located to
help find a good solution.
The decision was made to pump out most of the fuel, fix the guidance unit, and restack the
satellite. Unfortunately, before refueling could start, a valve failed. Carbon dioxide in
the air leaked in and reacted with hydrazine, forming corrosive carbazic acid and fouling
the line. The entire propulsion system had to be replaced.

Fuel System (Simplified)
[Figure: hydrazine tank with fill/drain port, pressure gauge, and thruster]
The higher location of the fill/drain port in the legacy propulsion system prevents gravity
draining, and the single seat valve is prone to leak. Dual seat valves (right), typically
used in new designs, would have prevented air ingression unless both valves leaked.

A Similar Incident
An ICBM, refurbished to launch satellites, recently suffered a performance degradation
after its turbine seal leaked, allowing ammonia in the exhaust gas to react with the
lubricant, plugging the filter and blocking lubricant circulation.
The problem, chemically similar to the thruster contamination, was addressed in the
follow-on generation of the rockets, but the original units were not retrofitted.

Lessons Learned:
• Consider retrofitting legacy hardware with proven design upgrades. Anticipate
out-of-sequence operations, such as rework, during hardware design.
• Design propulsion systems to accommodate ground handling by including features such as
low point drains to facilitate fuel removal.
• Archive manufacturing documents.

For more technical information, call Mark Mueller at (310) 336-5081.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 58
Guard Against Sneak Paths Through Ground Test Equipment

The Problem:
The primary side of an instrument failed shortly after launch.

The Cause:
The instrument had parallel redundant power pins, but the power plug on the bus had only
single pins for source and return. The flight cable had to be spliced so redundant
conductors could be crimped into the same socket. The circuit opened because of broken
solder joints at the current supply board, loose contacts, or defective crimps.
A subtle test issue hid this single point failure. The instrument needed a long time to
stabilize, and was therefore kept on during ground testing by an external power supply with
battery backup. On the test stand, the instrument operated normally, despite the faulty
cable, by drawing power from the external power supply.
The flaw would likely have been caught if the test equipment provided metering to show the
unit was unexpectedly drawing power from it.

Test Setup (Simplified)
[Figure: flight hardware (power control and distribution unit, current source, cable,
instrument) with defective crimps, soldering, or socket/pin connection in the cable path,
and an external power supply/battery backup connected to the instrument; status indicators
should be added on the external supply]

Similar Examples
A flight box was not grounded by mistake. The problem was missed because the test equipment
[...]
Lessons Learned:
• Independently confirm hardware performance for functions temporarily provided by test
equipment.
• Use a breakout box to check harness connector paths, and the directions and magnitudes of
current flows.
For more technical information, call Peter Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 59
Lesson from Challenger: Understand Your Data!

The Problem:
Vital O-ring data was ignored before the Shuttle lifted off on a freezing morning.
The Cause:
During a pre-launch telecon, 34 engineers debated for hours over whether to delay the
launch, out of the concern that cold weather might compromise the seals.
Citing O-ring anomalies at both 75 deg F and 53 deg F launches, some engineers argued
against launch. But because damage occurred both hot and cold, managers perceived no
temperature effect. The launch went forward.
The Post-Challenger Investigation Commission found that in presenting the flight history,
the engineers omitted data from 12 flights in which the O-rings remained intact, mistakenly
thinking that successful flights did not provide any evidence about risk.
If presenters had plotted data from all flights, nobody would have missed the effect of
temperature on the O-rings!

Motor     O-ring temperature (deg F)   Note
DM-4      47                           Tested on horizontal platforms in Utah
DM-2      52                           Tested on horizontal platforms in Utah
QM-3      48                           Tested on horizontal platforms in Utah
QM-4      51                           Tested on horizontal platforms in Utah
SRM-14    53                           One of the only 2 launches (of 24) shown
SRM-22    75                           One of the only 2 launches (of 24) shown
SRM-25    29, 27                       Forecasted temperature for the Challenger
A table of temperature data presented during the pre-launch telecon included irrelevant
information but only selective flight experience. The audience was misled.

O-ring Damage History
[Figure: two plots of O-ring damage index (0 to 12) against joint temperature (°F) at
launch (25 to 85), with the forecasted temperature for 01/27/86 marked. Upper panel, only
the data points presented: scattering data. Lower panel, all data: trend more obvious.]
Anomalies rarely occurred on warm days, but routinely took place during launches below
65°F.

Lessons Learned:
• Consider all relevant information.
• Develop a coherent explanation of engineering data to help the audience analyze risks.
• Display data cogently (see Visual Explanations by E. Tufte, for example).
For more technical information, call Jon Binkley at (310) 336-7787.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 60
Tests Are for Verification, Not for Discovery

The Problem:
A satellite started to tumble shortly after deployment.

The Cause:

The spacecraft used magnetic torque rods to stabilize body spins. During the Guidance and
Control (G&C) subsystem test, an analyst misinterpreted the meaning of the Earth’s magnetic
poles and set the flight software incorrectly. The error went unnoticed because the coil
test had no expected polarity values; the configuration was determined based on the
measured responses.
After separating from the launcher, the satellite began to wobble. Fortunately, the lead
G&C engineer was prepared. Having heard many horror stories about torque rod phase
mistakes, he had spent the previous day making contingency plans. Within half an hour, he
reversed the controller gain, stabilizing the satellite.

The Earth as a Magnet
[Figure: the Earth drawn as a bar magnet with its N and S poles, and a compass needle]
Opposite magnetic poles attract. The north pole of magnet needles points to the Earth’s
magnetic South Pole, also called the geomagnetic North Pole!

Lessons Learned:
• Expected test results should be established in advance of the test. Deviation from expected
results should raise a flag, and be thoroughly investigated before making any changes.
• Rigorously manage software development, especially on requirements, interfaces, and
configuration control.
• Plan for contingencies, using a top-down fault tree (ask “what happens if the satellite
fails to de-spin?” for example).
• Double-check torquer signs (Lesson 53).
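
The first lesson, fixing expected results before the test, can be expressed as a check that compares measurements against a table written in advance instead of deriving the configuration from the measurements. The rod names, expected signs, and readings below are hypothetical.

```python
from typing import Dict, List

# Expected polarity of each torque rod, derived from the design and recorded BEFORE the test.
EXPECTED_POLARITY: Dict[str, int] = {"rod_x": 1, "rod_y": 1, "rod_z": -1}

def sign(value: float) -> int:
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)

def polarity_deviations(measured: Dict[str, float]) -> List[str]:
    """List the rods whose measured field sign differs from the pre-declared expectation."""
    return [
        rod
        for rod, expected in EXPECTED_POLARITY.items()
        if sign(measured[rod]) != expected
    ]

readings = {"rod_x": 0.8, "rod_y": -0.7, "rod_z": -0.9}  # hypothetical magnetometer readings
deviations = polarity_deviations(readings)
print("Investigate before changing anything:", deviations)
```

A deviation is treated as an anomaly to investigate, not as information for setting the flight software. A zero reading also counts as a deviation here, since it confirms nothing.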

For more technical information, call Tom Fuhrman at (310) 336-6596.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 61
Do Not Assume a Situation Is Acceptable Simply Because Nothing Is Said About It in
Documents

The Problem:
A separation failure sent a launch vehicle tumbling out of control.

The Cause:
Following stage-1 separation, a small interstage ring surrounding the stage-2 nozzle also
had to be jettisoned. Equipped with three guide tracks, this ring was supposed to slide
along three foam blocks attached to the gimbaled nozzle without striking it.
One of the foam skids had to be installed just days before launch, through an access panel
with little visibility. The technician reported to an on-site engineer that the foam felt
too tight. Seeing no inspection criteria in the sparse launch-site processing instructions,
the engineer assumed the tight fit was OK. He did not realize that the installation was
off-center, nor query the designers as to possible consequences.
During ascent, the nozzle was commanded to a position that further pushed the foam against
the guide track. Staging unleashed the strain in the foam, which jammed the interstage on
the nozzle. The mission failed.
Several design changes, such as rounding the foam blocks, were later made to reduce the
friction between the foam and the track, something not previously considered.

[Figure: side view of the 1st stage, interstage ring, 2nd stage, foam skid, boot, and
vectorable nozzle; end view of the nozzle, guide track, and interstage with the foam
installed off-center; dynamic model of the interstage ring hang-up]
An on-board video camera captured the interstage hang-up, enabling the investigation team
to create a dynamic model and to replicate the problem on a mock-up.

Lessons Learned:
• Double-check designs against possible misinstallation.
• Make sure field-assembled hardware can be inspected.
For more technical information, call Andy Shearon at (310) 336-1762 or Brian Gore at (310)
336-7253.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 62
Test as You Fly

The Problem:
A battery exploded on orbit.

The Cause:
The silver-zinc (Ag-Zn) battery, powering an arcjet experiment, fires pulses at a peak
power of 30 kW and a current approaching 180 A. It takes about 24 hours to replenish the
battery between discharges.
Silver-zinc batteries, used in all launch vehicles, are typically run for just a few
minutes. Although some Ag-Zn batteries are rechargeable, they are not intended for arduous
duty cycles. In particular, prolonged use of Ag-Zn batteries in space is apt to cause
electrolytes to spill, forming a metallic zinc bridge through which a large current can
flow. This problem led to a serious malfunction in an upper stage (Lesson 22).
The designers overlooked these issues. Qualification tests did not fully simulate the
operation scenarios, and all ground firings were performed with a fresh battery at
atmospheric pressure in an upright position. In actuality, the cells are partially
discharged and laid on their sides during launch, making spills more likely.

The Hazards of Activated Ag-Zn Batteries
[Figure: a cell being filled from an electrolyte reservoir while vented to vacuum, with
bubbles forming]
Dry silver/zinc batteries are activated by adding electrolytes in a vacuum environment.
Once filled, internal reactions can lead to frothing and spattering. Launch
depressurization and continuous discharging heat up the cells, causing more spills.
Serious mishaps have occurred, even on the ground. Several years ago, a launch delay caused
a battery to exceed its wet life. Days later, it caught fire. Apparently, drops of escaped
electrolyte made their way along the power wires via capillary action, shorting a
connector.

In orbit, leakage triggered a violent short. The plastic case ignited, and the battery blew up.
“Ultimately, this anomaly occurred because of a programmatic philosophy to minimize cost,”
said the failure report. “All failure scenarios could have been ruled out if enough testing had
been done.”
Lessons Learned:
• Analyze prior incidents of equipment malfunction.
• Review all aspects of battery application—do not regard batteries as simple plug-and-play
items.

For more technical information, call Doug Chism at (310) 336-6375.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 63
Verify Field Installations of All Single-Point-Failure Items

The Problem:
A suborbital launch failed because the second stage would not start.
The Cause:
After the first-stage burn, two bolt cutters were fired, successfully jettisoning the spent
stage. However, neither the second-stage motor nor its thermal battery ignited upon
command.
The igniter and the thermal battery shared an ordnance connector which, due to range safety
rules, had to remain detached until just before launch. Adjacent to the ordnance connector
was a ground power receptacle.
Just prior to launch, the ground power umbilical was removed. Subsequently, the harness
slated for the ordnance connector was instead mated by mistake into the neighboring power
plug! Although both sides of the connection were male, their shell types and pin
configuration allowed an unintentional fit.
The error was not caught because, unlike most Air Force programs, an end-to-end test with a
load to verify circuit performance was not performed, nor was a quality assurance checklist
used.

[Figure: Pre-Launch Configuration and Launch Configuration (should-be versus actual),
showing the ground support equipment, primary and redundant ordnance battery/circuitry,
igniter, thermal battery, and two bolt cutters. √: Deployed; X: Did Not Deploy. In the
actual launch configuration the bolt cutters deployed, but the igniter and thermal battery
did not.]
Lessons Learned:
• Simplify interfaces, commands, and procedures in prelaunch operations lest the hectic
pace cause errors.
• Verify final assembly operations, particularly on single-point-failure risks. Pay particular
attention to possible connector mismating.
• Do not allow primary and redundant sides of critical circuits to join in a single-point-
failure area.
For more technical information, call Bruce Wendler at (310) 336-5475.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 64
Review Out-Of-Flow Processes to Ensure No Steps Are Bypassed

The Problem:
The temperature of an antenna dropped below expectation in certain conditions.

The Cause:
A legacy antenna had a radiator that was oversized for this mission. Thermal designers
specified that the excess area should be covered with multi-layer insulation (MLI).
A veteran engineer, conducting a walkaround prior to the system-level thermal vacuum test,
discovered that the MLI was missing. The blanket was installed.
After the test was completed, the temporary MLI was removed in preparation for installation
of the flight MLI. Unfortunately, the final integration order still neglected to include
the MLI reinstallation instruction. Meanwhile, the old hand retired. His replacement did
not spot the missing MLI, and the antenna was flown without the blanket.

[Figure: photographs of the antenna without MLI and with MLI installed]

A Similar Incident
A satellite used active louvers to control the baseplate temperature of an instrument.
The system, including the louvers, underwent thermal vacuum testing, after which the
louvers were removed. They were temporarily reinstalled, without being connected, for fit
check.
The louvers were left in place, without anyone realizing that the connector remained
unattached. Pre-shipment checks did not verify the mate status because the connector was
not accessible.
Running too hot in space, the instrument suffered significant degradation.

Lessons Learned:
• Make sure corrections in engineering drawings or work instructions are back annotated in
all applicable drawings and shop orders (including subsequent builds and units that have
been distributed).
• Conduct final walkthroughs in the presence of the most experienced personnel.
• Keep good records of all “non-flight” installations.

For more technical information, call Todd Dickey at (310) 336-5352.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 65
Perform Thorough Post-Flight Analysis

The Problem:
A launch vehicle lost control.

The Cause:
The investigation board traced the mishap to a jammed solenoid valve in the thrust vector
actuators. Apparently, microscopic metal shavings, created during the assembly and
adjustment and dispersed during ascent, jammed the spool shut for eight seconds, time
enough to ruin the mission.
In a previous launch, this valve stuck open. In another, it seized up twice, once open, once
closed. Minor anomalies occurred two other times, but all previous flights succeeded.
Since a valve that is stuck open is manageable, these earlier troubles were disregarded. But
a sticky valve can as easily fail closed as open. The blockage proved lethal.
“It is recommended that procedures for dealing with flight and ground test anomalies be
reviewed. This recommendation is necessarily the least specific of those arising from this
investigation, but may be the most significant,” concluded the board.

A Similar Incident
Misleading instructions on drawings led assemblers to wrap thermal tapes too close to a
separation connector (Lesson 4). The stage jammed (see diagram below), stranding the
satellite.
Eleven previous flights were subsequently reviewed; all showed the same hang-up. Seven, in
fact, were saved only because the floating connectors were jolted apart when they hit the
allowable stops. The mission right before the failure had the narrowest escape.
The warning signs were not pursued.

[Figure: Separation Failure, showing Stage 1 and Stage 2]
Lessons Learned:
• Track down the root causes of anomalies and consider implications beyond the narrow
issues at hand (Lesson 41).
• Unexpected hardware behavior implies a failure to understand the application. Safety can-
not be inferred just because the mission succeeded since the problem may be much more
severe next time (rephrased from Personal Observations on the Reliability of the Shuttle,
by Richard P. Feynman).

For more technical information, call Keith Coste at (310) 336-0032.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 66
Thoroughly Analyze All Environmental Load Paths and Develop a Detailed System
Dynamic Model

The Problem:
A solar array broke on orbit.

The Cause:
Four solar array paddles were attached to the spacecraft with aluminum brackets. Three
brackets were stiffened with gussets, but interference from surrounding components
prevented a gusset from being added to the fourth bracket.
During vibration testing, the flexible hinge channeled most of the force into the release
mechanism at the other end of the paddle, damaging a latching clevis.
The problem would have been recognized had the paddle been instrumented or the component
inspected after test. Unfortunately, the program did not adequately analyze dynamic loads
during environmental testing and launch.
Loads during upper-stage burn exceeded nominal, and the clevis and bracket came loose. The
paddle was left dangling by its cabling. The attitude-controlling magnetometer
malfunctioned, whereupon the satellite turned away from the Sun, draining the battery.
The satellite was rescued later (Lesson 67).

[Figure: Test Configuration (Side View). Labels: Solar Paddle, Attachment Beam, Release
Mechanism, Spacecraft, Flexible Hinge, Compliant Base, Shaker Table]
[Figure: As-Deployed. Labels: Restraint Cable, Magnetometer, Damaged Hinge, Flexible Harness]
Lessons Learned:
• Provide extra margins to accommodate excessive launch shocks that occasionally occur,
especially with new launch vehicles (Lesson 11).
• Independently review dynamic loads analysis prior to test.
• Adequately instrument the unit, subsystem, and vehicle during environment tests.
• Check all data and inspect critical parts for damage after tests.
For more technical information, call Julia White at (310) 416-7229.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 67
Provide Design Flexibility to Enable Emergency Recovery

The Event:
Despite a damaged solar array (Lesson 66), a satellite was recovered.

The Cause:
When one of the solar paddles came loose, the magnetometer attached to it was disabled.
Lacking autonomous attitude control, the satellite turned away from the Sun, and the
battery drained. Ground controllers could not contact the satellite.
Fortunately, a video from the launcher showed that the failure might be survivable.
Operators persevered.
Weeks later, a downlink arrived. As it happened, the satellite had rotated such that
Earthshine could partially replenish the battery!
All non-emergency functions were commanded off to allow the batteries to fully charge. With
its torquers manually controlled from the ground, the satellite was reoriented toward the
Sun and spun up nominally. Full operation started three months after launch.

[Figure: Launch Vehicle, Dangling Paddle, Paddle Detached]
[Figure: As-designed Control Loop vs. Implemented in Operation. Labels: Magnetometer
(X: inoperative), Horizon/Sun Sensors, Control Software, Torquer Coils, Satellite Dynamics,
Ground Station, Dynamic Model, Kalman Filter; Flight and Ground segments]

Pointing Information Recovery
Accurate attitude knowledge, especially during orbit night when most of the observations
were made, posed the next challenge: the satellite no longer rotated as a rigid body; even
the spin axis orientation was uncertain.
The program created a non-linear rigid-body model. Using Sun sensor and horizon crossing
indicator data as input, an algorithm incorporating Kalman filters calculated the satellite
attitude to 0.25° accuracy, even during most of the orbit nights when direct sensor
readings were unavailable. Most mission requirements were met.

Lesson Learned:
• Provide as much telemetry as possible on launch vehicles, especially on separation
events. Without knowing how the satellite malfunctioned, controllers would likely have
given up before the downlink was received!
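
The pointing recovery described in this lesson relied on a filter that keeps estimating when sensor readings drop out. The sketch below is not the program's non-linear model; it is a minimal scalar Kalman filter, with made-up noise values and a hypothetical `rate` term standing in for the dynamic model, that shows only the general idea: predict with the model on every step, and correct with a measurement only when one exists.

```python
def kalman_step(x, p, z, q=1e-4, r=0.25, rate=0.0, dt=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : prior estimate and its variance
    z    : sensor reading, or None when no reading is available
    q, r : process and measurement noise variances (illustrative values)
    rate : modeled change per unit time (stand-in for a dynamic model)
    """
    # Predict: propagate the estimate with the model; uncertainty grows.
    x = x + rate * dt
    p = p + q
    if z is None:
        return x, p  # no measurement (e.g. orbit night): rely on the model
    # Update: blend prediction and measurement according to their variances.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p


def run_filter(readings, x0=0.0, p0=1.0, **kw):
    """Run the filter over a sequence of readings (None marks a gap)."""
    x, p = x0, p0
    history = []
    for z in readings:
        x, p = kalman_step(x, p, z, **kw)
        history.append((x, p))
    return history
```

During a gap the variance `p` grows, which is the filter's way of saying how much the model-only estimate can be trusted; it shrinks again once readings resume.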

For more technical information, call Tom Fuhrman at (310) 336-6596.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 68
Insist On End-to-End Ownership to Verify Interfaces

The Problem:
An uncontrolled explosion during the release of a satellite damaged the Space Shuttle.

The Cause:
The separator’s harness, which plugged into four closely located detonators, was designed
incorrectly.
The “Fire 1” command should go to the port and starboard primary detonators, followed by a
“Fire 2” command to the backups a fraction of a second later. A successful firing of the
primary cord will cut off the backup signal, preventing excessive explosion.
Instead, the “Fire 1” signal was routed to the port detonators for both the primary and
backup cords. A simultaneous shock broke the containment tube, hurling debris through the
shuttle bulkhead. Fortunately, nothing critical was hit.

[Figure: Separation Mechanism (Simplified), shown before separation and after nominal
separation. Labels: 55” Separation Ring, Primary Detonator, Redundant Detonator, Detonation
Cords, Primary Cord, Redundant Cord, Rubber Holder, Stainless Steel Containment Tube, Crack
Point, Interstage, Bolted-On Doubler Joint]


The mistake was not caught despite hundreds of hours of reviews and tests because the sepa-
rate drawings were never put together into a single, end-to-end, schematic. “Even after the
occurrence of the separation system anomaly, detecting the design error through drawing
reviews was difficult,” reported the investigation panel.
Investigators also found that the documentation describing the mechanical and electrical sub-
system interfaces was inadequate. Labeling of the components was “incomplete and
confusing.” Verification tests were flawed—designed to ascertain that the separator was built
to the (flawed) design, instead of demonstrating the intended function. Discrepancies raised
during the critical design review were not properly resolved.
Lessons Learned:
• Develop end-to-end diagrams for electrical and mechanical interfaces, including software
driven interfaces.
• Clearly label each connector to avoid mismating.
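
One way to picture the end-to-end diagram recommended above is to merge the connection lists from separate drawings and trace every command to its final loads. The sketch below is only an illustration: the connector names (`J1`, `J2`), the drawing contents, and the routing of “Fire 2” are hypothetical, and only the “Fire 1” misrouting follows the description in this lesson.

```python
# Hypothetical per-drawing connection lists: each maps a node to the nodes
# it feeds. None of these names come from the actual harness drawings.
SEQUENCER_DRAWING = {"FIRE_1": ["J1"], "FIRE_2": ["J2"]}
HARNESS_DRAWING = {
    "J1": ["port_primary", "port_backup"],   # misrouting as described
    "J2": ["stbd_primary", "stbd_backup"],   # assumed for illustration
}

# Intended function, stated independently of the drawings.
INTENDED = {
    "FIRE_1": {"port_primary", "stbd_primary"},
    "FIRE_2": {"port_backup", "stbd_backup"},
}


def merge(*drawings):
    """Combine separate drawings into one end-to-end connection map."""
    merged = {}
    for drawing in drawings:
        for src, dsts in drawing.items():
            merged.setdefault(src, []).extend(dsts)
    return merged


def endpoints(net, start):
    """Follow connections from start to the nodes with no outgoing links."""
    found, stack, seen = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        nxt = net.get(node, [])
        if not nxt:
            found.add(node)
        stack.extend(nxt)
    return found


def routing_errors(net, intended):
    """Return {command: (actual, wanted)} for every mismatched command."""
    return {cmd: (endpoints(net, cmd), want)
            for cmd, want in intended.items()
            if endpoints(net, cmd) != want}
```

The check compares the merged wiring against the intended function rather than against the design drawings themselves, which is the distinction the investigators drew about the flawed verification tests.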

For more technical information, call Selma Goldstein at (310) 336-1013.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 69
Protect Solid Rocket Grain Structure from Destabilizing Gas Flow

The Problem:
A prototype solid rocket motor exploded during prequalification firing.

The Cause:
Mixing of combustion gas streams created turbulence near the interstage joints, causing the
soft propellant grains to crack. Blocked in the main bore by slumped propellant, the gas
burst the case.
The contractor did not realize that the grain deformation should be taken into account even
though a similar problem occurred in another solid motor (a lesson not shared). A subscale
flow test would have revealed the dynamic instability problem. Unfortunately, the
contractor bypassed this step.
In the wake of this failure, a sophisticated model was developed so that the impact of gas
flow on the grains could be evaluated. A redesigned motor successfully passed the
full-scale firing.

[Figure: Panels (a), (b), (c). Labels: Igniter, Case and Joint, Stress Relief Grooves,
Propellant, Motor Case, Crossflow, Crack. The original design (a) constricted flow at the
segment joint. The grain cracked (b), further raising chamber pressure. Chamfering of the
forward grain face (c) eliminated the chokepoint.]

Lessons Learned:
• Conduct adequate subscale testing.
• Study post-test and post-flight anomaly reports from similar programs.

For more technical information, call Nat Patel at (310) 336-6473.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 70
Late Modifications Require Careful Revalidation

The Problem:
A jammed tether prevented a satellite from being deployed from the Shuttle.

The Cause:
Post-flight inspection found that a bolt protruded into the path of a traveling ball nut.
The bolt was installed at the launch site, after an eleventh-hour analysis uncovered a
design error: an overlooked thermal design change took away the cold plate’s ability to
carry loads, and altered the satellite’s mass properties. By the time the problem was
found, the tether deployment mechanism had been validated, and the satellite had been
integrated.
Under severe pressure to improvise a fix, engineers overlooked the interference caused by
the bolt because assembly drawings were not current (no updates were required until after
three modifications), nor did drawings provide a direct view of the interference path.

[Figure: Tether Mechanism (Simplified). Labels: Tether, Tether Guide Pulleys, Tension
Adjustment, Linear Drive, Drive Chain, Traveling Ball, Reverser Housing, Added Bolt]
The original design engineer, thousands of miles from the Cape, could not see firsthand how
the modified hardware fit. The modification was not tested, and the change review considered
only the loads.
If the load inadequacy had been discovered sooner, it could have been corrected by simply
making the fasteners larger.
Lessons Learned:
• Perform thorough analysis and testing of late hardware changes. Pay particular attention to
system-level impacts.
• Update structural analysis following design changes to find problems earlier.
• Avoid assessing design changes from a narrow, discipline-oriented view.
For more technical information, call Brian Gore at (310) 336-7253.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 71
Make Sure Ground Support Equipment Cannot Damage Flight Hardware

The Problem:
An oxygen tank on Apollo 13 blew up.

The Cause:
A month before launch, the spaceship was stacked on top of the Saturn V booster and moved
to the pad. Countdown rehearsal began.
Operators filled two liquid oxygen tanks in the service module, then pumped in gaseous
oxygen to empty them. One tank could not drain. Apparently, a handling accident had jarred
loose an internal fill tube, preventing the gas from reaching the tank bottom and
displacing the liquid. An internal heater, designed to maintain tank pressure in flight,
was used to boil off the remaining liquid oxygen.

[Figure: Oxygen Tank (Simplified). Labels: Supply Line, Fill Tube, Thermostat Switches]
Initially required to operate from 28 bus volts, the tanks had been reengineered to accept
ground power at 65 volts. Unfortunately, the redesign overlooked two bimetallic thermostats
protecting the heater circuits, and neither qualification nor acceptance testing exercised them.
As the detanking proceeded, the temperature rose. The bimetallic switches began to open, but
the higher voltage immediately induced arcing across the contacts and welded them shut.
With no one monitoring the current, the heater ran for eight hours. The temperature reached
1000°F, severely damaging the insulation on the power wires leading to a fan motor. When
the astronauts activated the fan en route to the Moon, a short touched off the infamous explo-
sion.
Lessons Learned:
• Ensure heritage thermostats and relays properly function when the system is redesigned
for higher voltages.
• Provide ample test instrumentation to validate that all components of a system are
functioning properly, and always check for unplanned current draw (Lesson 19).
• Individual heater circuits should not draw more than two amps to prevent thermostats
from being damaged by self heating (each of the Apollo 13 switches drew six amps).
• Thoroughly test subsystems that are not exercised until they are integrated into the main
spacecraft (such as propulsion lines) during system thermal vacuum test.
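
The effect of raising the supply voltage can be estimated with Ohm's law. The sketch below assumes a purely resistive heater whose resistance does not change with voltage or temperature, which is a simplification; the 28 V and 65 V figures and the two-amp guideline come from this lesson, while the function names and any resistance value passed in are hypothetical.

```python
FLIGHT_VOLTS = 28.0     # bus voltage quoted in the lesson
GROUND_VOLTS = 65.0     # ground power voltage quoted in the lesson
MAX_HEATER_AMPS = 2.0   # per-circuit guideline from the lessons learned


def power_ratio(v_new, v_old):
    """For a fixed resistance, dissipated power scales with voltage squared."""
    return (v_new / v_old) ** 2


def heater_current(volts, ohms):
    """Current through a resistive heater (Ohm's law)."""
    return volts / ohms


def circuit_within_guideline(volts, ohms, limit=MAX_HEATER_AMPS):
    """True if the heater circuit draws no more than the guideline current."""
    return heater_current(volts, ohms) <= limit
```

Under these assumptions, `power_ratio(GROUND_VOLTS, FLIGHT_VOLTS)` is roughly 5.4, so a circuit that sits comfortably inside the guideline at the flight voltage can exceed it at the ground voltage.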
For more technical information, call Bill Fischer at (310) 336-5198.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 72
Prevent Failures in Support Equipment from Propagating into Flight Boxes

The Problem:
A transmitter was damaged during test.

The Cause:
The test set incorporated 15 separate power supplies with various voltages. To
automatically record data from each test point, a computer addressed the power supplies
via a bank of relays. The commercial test unit did not isolate each monitor point.
A relay on the 5-volt line did not disengage after being scanned, remaining tied, via the
monitor’s internal bus, to all power supplies subsequently scanned. Exposed to as high as
31 volts during the following scan, the 5-volt flight circuits were damaged.
The original safety analysis of the ground equipment did not consider the impact of a
failure on the flight box. Isolation resistors between the power supply lines and the
scanner inputs would have averted the damage.

[Figure: Test Arrangement (Simplified). Power Supplies 1 through 15 in the test set feed
5 V, 10 V, 28 V, ... 31 V flight hardware; a sequentially scanning monitor taps each line
through reed relays. Annotation: Should Add Isolation Resistors Here]

Reed Relays
Reed relays, commonly used in control circuits, consist of two overlapping iron strips
enclosed in a glass tube. The contacts are readily closed with a magnetic field applied
via the surrounding coils.
The strips should spring back to their normally open position after the field is turned
off, but residual magnetism or magnetic contaminants sometimes keep them stuck closed.
Lessons Learned:
• Buffer test point outputs so shorts in test will not damage flight hardware.
• Implement abort logic in automated test equipment to prevent damage if a failure occurs.
• Thoroughly understand the inner workings of any item that interacts with flight hardware.
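
The abort logic suggested above could take roughly the following shape. This is a toy simulation rather than real test-equipment code: the `Relay` class, its `stuck` flag, and the `read_point` callback are all hypothetical, and a real scanner would need an independent way to sense contact state.

```python
class ScanAbort(Exception):
    """Raised when the scanner must stop to protect flight hardware."""


class Relay:
    """Toy relay model; stuck=True imitates contacts that fail to open."""

    def __init__(self, name, stuck=False):
        self.name = name
        self.stuck = stuck
        self.closed = False

    def close(self):
        self.closed = True

    def open(self):
        if not self.stuck:
            self.closed = False


def _assert_all_open(relays):
    still_closed = [r.name for r in relays if r.closed]
    if still_closed:
        raise ScanAbort(f"relays still closed: {still_closed}")


def scan(relays, read_point):
    """Scan each monitor point, aborting if any relay fails to release."""
    readings = {}
    for relay in relays:
        # Never close a relay while another is still tied to the shared bus.
        _assert_all_open(relays)
        relay.close()
        readings[relay.name] = read_point(relay.name)
        relay.open()
    _assert_all_open(relays)  # covers the last relay in the sequence
    return readings
```

With a stuck relay on the first line, the scan stops before the next supply is connected to the shared bus, so the higher voltage never reaches the low-voltage circuit.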
For more technical information, call Ron Williamson at (310) 336-2149.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 73
Trace All Software Changes Back to System Requirements and Specifications—Do Not
Simply Modify the Code

The Problem:
A spacecraft broke up near Mars.

The Cause:
En route to Mars, the probe would fire its thrusters to unload the reaction wheels. Ground
controllers planned the burns with a thruster model, reused from a successful mission.

[Figure: Complex Failure Causes. Labels: Spacecraft Telemetry, Spacecraft Model, Trajectory
Estimation, Navigation Failure, Heritage Codes, New Thruster + Unit Mix-up, Wrong Model,
UV Computation SW, Wrong Input File, Wrong Check Table, Tracking Difficulty, Nav Analysis
Imprecision, Distracted, Undermanned Team]

A thruster change made it necessary to update this model, which specified thruster input in
Newton-sec. The thruster vendor—the same for both missions—used lb-force-sec. In the
original model, engineers correctly added the 4.45 conversion factor to the vendor’s equation.
Overlooking the interface specification and seeing no warning in the code comments, the
follow-on team simply made a substitution.
Labeled as non-mission critical, the ground software—without the conversion factor—was
not rigorously reviewed; the “truth” table, computed manually for acceptance testing,
contained the same mistake. Interface with the navigation function was informally tested only
to ensure that it could move across servers.
Only one, occasionally two, engineers navigated the spacecraft. Two months before orbit in-
sertion, radar returns projected a path too close to Mars. Unfortunately, as the probe neared
Mars, poor observation geometry from Earth reduced tracking precision. The flight team, con-
fident with their navigation ability, decided against raising the orbit.
Not until aerobraking, after Martian gravity had captured the probe, was it possible to calcu-
late the spacecraft’s true position. Only then did the controllers realize the probe was 100
kilometers off course!
The successful reflight listed both English and metric units on all interface control documents,
adopted a more robust navigation method, and used six full-time navigators.
Lessons Learned:
• Any software that commands a satellite is mission critical, even though it may not be
embedded in the flight vehicle.
• Validate changes in mission-critical software with more vigor than the original develop-
ment (Lessons 25, 29, 47). Rigorous formal testing is essential.
• Always specify the units in requirements and interface specifications.
• Generate expected results used in verification tests independently, in accordance with
system requirements.
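
A small illustration of the unit lesson: the sketch below attaches a unit to every impulse value and converts at the interface, so a bare number cannot slip through. The class and function names are hypothetical and do not come from the mission software; the conversion constant is the standard definition of the pound-force, which rounds to the 4.45 factor mentioned above.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force, by definition


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_TO_NEWTON
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def thruster_model_input(impulse: Impulse) -> float:
    """Hypothetical interface: the model accepts newton-seconds only."""
    return impulse.to_newton_seconds()
```

An expected-results table for acceptance testing would then be computed independently from the interface specification, not by reusing this same code.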
For more technical information, call Suellen Eslinger at (310) 336-2906.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 74
Understand Why Warning Lights Come On Before Disabling Them

The Problem:
During a satellite test, the thermal vacuum chamber suffered a pressure burst.

The Cause:
An investigation revealed that the helium refrigeration system unexpectedly shut down. The
unit had sprung a small leak during a previous test.
As the test progressed, the helium leak rate increased, causing the pressure in the turbine
wheel inlet to oscillate.
Vibration caused the alarm setting in the pressure regulator to drift down. The alarm went
off a few days into the test.
Knowing that the equipment was working well within its normal range, the testers returned
the alarm level to near the factory-set level. The engineers unfortunately did not realize
that the regulator’s emergency shutdown sensor could drift down, too.

[Figure: Turbine Pressure Trend (turbine wheel inlet pressure). Panel 1, Refrigeration Unit
Ran for a Few Days: pressure oscillates due to growing leak within the typical operating
range; the alarm level, factory set, drifts and is reset. Panel 2, After a Few More Days:
pressure fluctuates more; the shutdown level, factory set, drifts to below the alarm level;
the chamber shuts down within the acceptable operation range, without warning.]
A few more days into the test, while the turbine was still operating within its normal range,
the oversensitive emergency shutdown sensor tripped. Lacking an operator override or other
means to gracefully degrade, the turbine switched itself off. Since the alarm setting had been
adjusted up, the malfunction came without warning. The satellite could not be powered off
first, and corona discharging set in. Luckily, robust hardware design practices prevented seri-
ous damage.
Lessons Learned:
• Operate environmental tests with the same degree of care as space operation (Lesson 49).
• Develop test contingency plans and failure-mode-and-effect-analyses for ground support
equipment (for example, analyze the likelihood of contamination in case the thermal
vacuum facility loses power).
• If turning off a piece of test equipment can endanger flight hardware, such equipment
must not be allowed to shut down autonomously.
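
A contingency check along these lines might have prompted the testers to look at the shutdown setpoint before resetting the alarm. The sketch below is illustrative only: the setpoint names, the tolerance, and all pressure values used with it are hypothetical, and it assumes both thresholds sit above the normal operating range.

```python
def drifted_setpoints(current, factory, tolerance):
    """Names of setpoints that differ from factory values by more than tolerance."""
    return sorted(name for name, value in current.items()
                  if abs(value - factory[name]) > tolerance)


def safe_to_reset_alarm(current, factory, tolerance, operating_max):
    """Allow an alarm reset only if nothing else drifted and ordering holds.

    Expected ordering: operating range < alarm level < shutdown level.
    """
    others = [n for n in drifted_setpoints(current, factory, tolerance)
              if n != "alarm"]
    ordered = operating_max < factory["alarm"] < current["shutdown"]
    return not others and ordered
```

If the alarm has drifted, the same cause may have moved every other setpoint on the regulator, so the check refuses the reset until the shutdown level has been verified as well.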
For more technical information, call David Homco at (310) 336-5800.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 75
Protect High-Voltage Equipment from Contamination

The Problem:
A satellite was lost when the tether deploying it was severed by arcing.

The Cause:
Inspection of the recovered tether fragment revealed contamination, pinholes, and other
defects. Debris was also found on the deployment mechanism.
Apparently, the underlayers of tether experienced severe compression loads while wound on
the reel. The insulation layer flattened, causing debris to puncture through.

[Figure: Tether Construction. Labels: Polyester Core/Overwrap, Fluorocarbon Insulation,
Kevlar (Providing Strength), Copper Conductor]
As the deployed tether flew through the Earth’s magnetic field, a potential of several thousand
volts was generated along its conductive core. An exposed spot attracted a spark from a
nearby pulley. Because the mechanism housing was insufficiently vented (Lesson 49), the
arcing continued, burning down the tether.
To avoid fatal arcing, the program fabricated the insulator layer with great care. Unfortu-
nately, subsequent processing was performed in a regular shop, making contamination
inevitable.
“Excellent designs can be defeated through quite common cleanliness and handling viola-
tions,” concluded the investigation board.
Lessons Learned:
• Design high-voltage equipment to withstand mishandling.
• Properly vent enclosed areas to eliminate corona and arcing caused by outgassing and
pressure buildup.
• Thoroughly test the entire circuit if a high voltage is expected.
For more technical information, call Peter Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 76
Make Sure Someone Takes Responsibility for Each Interface

The Problem:
A space probe was damaged on the launch pad.

The Cause:
The probe, developed by one agency (A), required another agency (B) to provide launch-pad
cooling.
Neither agency bothered to assign interface responsibilities. The requirements were not
spelled out; the design and operational procedures were not placed under configuration
control. Communications faltered.
Agency A faxed agency B a gas-flow value, which it intended as the not-to-exceed limit.
The nominal value was buried in a thick review package.

The Importance of Stating TBDs
Agency B’s cooling plan stated that the equipment would be set to “agency A value” or
“desired flow rate.” The two partners reviewed the plan step by step, never realizing that
this number had not been agreed upon.
Stating “set to TBD ± TBD units (agency A value to be supplied)” would have raised a flag
and avoided the misunderstanding.

Seeing only the faxed number, agency B made certain it could be met by making several pro-
cedural changes, such as narrowing the cooling duct, without considering the effect of too
much air. On the pad, excessive air flow tore a hole in the probe’s insulation.
The investigation board found that in five years the two organizations missed catching the
problem 26 times. “The actions taken were logical, based on the knowledge available to the
people taking action. The incident was entirely due to inadequate or imprecise information
exchange,” said the board.
Lessons Learned:
• Check ground operation procedures and support equipment to avoid damage to flight
hardware.
• Ensure interfaces between two organizations are worked out in detail, agreed to by both
sides, and documented.
• Bound each requirement within a range.
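
The ideas of stating TBDs and bounding each requirement can be combined in a simple record. The sketch below is a hypothetical illustration: the field names, the example requirement, and its units are invented, and `None` stands for a value that is still TBD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirement:
    """An interface requirement with an explicit range; None means TBD."""
    name: str
    unit: Optional[str] = None
    minimum: Optional[float] = None
    nominal: Optional[float] = None
    maximum: Optional[float] = None

    def open_items(self):
        """Return the fields that are still TBD."""
        fields = ("unit", "minimum", "nominal", "maximum")
        return [f for f in fields if getattr(self, f) is None]

    def check(self, value: float) -> bool:
        """Check a value against the range; refuse while anything is TBD."""
        tbd = self.open_items()
        if tbd:
            raise ValueError(f"{self.name}: TBD fields {tbd}")
        if not (self.minimum <= self.nominal <= self.maximum):
            raise ValueError(f"{self.name}: inconsistent range")
        return self.minimum <= value <= self.maximum
```

A single faxed number would populate only one field here, and the remaining open items would be visible to both organizations at every review.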

For more technical information, call Susan Ruth at (310) 336-6765.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.


Lesson 77
Make Sure Sequential Safety Devices Operate Independently

The Problem:
A science mission ended during the first orbit.

The Cause:
The aperture cover’s design called for its pyro circuits to be “safed” prior to being
sequentially “armed” and “fired.”
A design feature in the controller chip invalidated all the programming circuits for a few
milliseconds upon powering up. All outputs, including “ARM” and “FIRE”, were momentarily
asserted. The cover blew open prematurely; the cryogen escaped.
The chip would manifest this start-up problem only after having been turned off for several
hours. Although power cycled many times during component testing, it was never unpowered
long enough to reveal the problem.

[Figure: Pyro electronics block diagram. Labels: Satellite 28V Power, Power-On Relay, 5V
Regulator, Power-On Reset, Clock Oscillator, FPGA (ARM and FIRE outputs), On-Board Computer,
Relay Driver, Arming Relay, Switch Driver, Power Switch, Pyro]

Timing Issue in the Safety Mechanism
After the bus power is switched to the pyro box via a relay, the controller (a field
programmable gate array, FPGA) should be safed and initialized at the direction of an
oscillator clock.
It took 30 milliseconds for the local voltage to rise and another 25 milliseconds for the
safing clock to start, but only 15 milliseconds for the transient to occur.

The use of a slow, non-flight-like, power supply during unit testing masked the spurious out-
put: during the transient period there was not enough voltage to close the arming relays. Later,
anomalies repeatedly occurred during system testing. Unfortunately, because the pyro simu-
lator was very sensitive, a load delay was fitted to the test equipment to filter out spurious
triggers, unintentionally preventing the actual start-up glitch from being recorded. The warn-
ing signs were ignored.
At launch, the chip had been powered down for weeks. Not only did it go awry but, because
power to the pyro box was applied via a fast relay, sufficient voltage had also built up to com-
plete the arming circuit. The FIRE switch, commanded by the same controller and therefore
not truly independent, set off as well, ending the mission.
This controller chip had caused troubles before, prompting NASA to issue an application
note. However, the contractor and the field engineer from the vendor did not know about it.
“[We need] an information hotline, set up on an industry-wide lessons learned web page,”
suggested the engineers later.
Lesson Learned:
• Beware that many programmable devices do not follow their truth tables at power-on;
see http://www.klabs.org/ for more information.
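
The timing figures quoted in this lesson can be laid out as a small calculation. The sketch below reads all three figures as offsets that begin when power is applied, which is one interpretation of the text and not a statement from the investigation; the function names are hypothetical.

```python
VOLTAGE_RISE_MS = 30   # time for the local voltage to rise (from the text)
CLOCK_START_MS = 25    # further time for the safing clock to start (from the text)
TRANSIENT_MS = 15      # time for the start-up transient to occur (from the text)


def safing_ready_ms(voltage_rise=VOLTAGE_RISE_MS, clock_start=CLOCK_START_MS):
    """Earliest time after power-on at which safing can take effect."""
    return voltage_rise + clock_start


def unprotected_window_ms(transient=TRANSIENT_MS,
                          voltage_rise=VOLTAGE_RISE_MS,
                          clock_start=CLOCK_START_MS):
    """How long the transient precedes safing (0 if safing comes first)."""
    return max(0, safing_ready_ms(voltage_rise, clock_start) - transient)
```

Under that reading, safing cannot take effect until 55 milliseconds after power-on, so a transient at 15 milliseconds arrives about 40 milliseconds too early for the safing logic to suppress it.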
For more technical information, call Peter Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 78
Thermal Blankets And Tie-down Cables Can Jam Mechanisms

The Problem:
An antenna reflector on a communication satellite could not deploy.

The Cause:
The reflector was tied down to the bus deck during launch with four cables. When the cables
were pyrotechnically cut, the two hinged reflector booms failed to deploy.
Later, ground testing showed that the pocket-shaped thermal blankets covering the tie-down
mechanisms expanded during the ascent, fouling the wrap cable. The spring-loaded hinges did
not have enough force to overcome this interference.
Fortunately, the satellite was designed to collect sufficient solar power even when the
arrays were stowed, making it possible to spin and nutate the spacecraft in progressively
more drastic maneuvers. Using ingenious ways to control the orientation, the operators were
able to force the hinges open without damaging the satellite. The reflector opened a month
later.

[Figure: Antenna Reflector (Simplified). Labels: Reflector; Velcro attachment, reinforced
with Kapton tape; Tie-down mechanism, with cable and internal springs (one of four);
Blanket cover; Cable cutter]

Lessons Learned:
• Anticipate the errant movement and expansion of flexible materials, such as wires and
blankets.
• Allow thermal blankets to vent whenever possible.
• Avoid protrusions or sharp edges that can snag soft items.
• Indicate the presence of soft goods on top-level assembly drawings to draw attention to
the risks of interference and obstruction problems.
For more technical information, call Robert Postma at (310) 336-7228.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.


Lesson 79
Make Sure Software and Hardware Engineers Communicate with Each Other

The Problem:
An experimental spacecraft lost its computers.

The Cause:
The satellite, hitchhiking on the qualification flight of a launch vehicle, was designed
and built in one year.
The bus software was checked out against the engineering model without incident, but was
not tested against the payloads until the spacecraft was already loaded onto the host
vehicle. It was then discovered that a payload performed very sluggishly.

A “Deadly Embrace” by the Watchdog
The computer uses an independently clocked watchdog function (Lesson 36) to enable
switching to the redundant CPU if the primary side malfunctions (for example, due to
radiation damage).
The final software mistakenly set the watchdog counter to 0.1 s, but it took the hardware
about a third of a second to boot. The CPU could not finish booting before being reset, and
was stuck in an endless loop.

Three launch-support engineers worked 14 hours a day for a week to adjust the bus memory-
management functions. They created several software patches, one of which contained a wrong
boot-up parameter. The mistake was not caught because the software developer neither consulted
the processor engineers nor verified the changes on the engineering model.
The software was loaded into the primary processor, which halted immediately. Assuming a
faulty primary memory was the cause, and again not enlisting the CPU expert’s help, the
engineers loaded the same code into the backup computer. It froze, too.
The computer could be physically reset. But by this time it would take several days to remove
other experiments to reach the frozen computer, possibly delaying the flight. The host mission
refused, and the hitchhiking project could only watch the launch, knowing its computers had
already died.
The project manager traced the failure to poor communication between the software and
hardware personnel, because the software team worked in isolation.
Lessons Learned:
• Make sure no single parameter error or single spacecraft malfunction can cause endless
cycling (for example, by enabling the watchdog function to switch to a recovery mode
after a few “try agains”).
• Double-check last-minute code changes (Lesson 43).
• Problems in embedded systems are not always due to random hardware defects. Pause and
think before inflicting the same software flaw on the redundant side (Lesson 18).
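The first lesson above can be illustrated with a minimal Python sketch. It is not the flight software: the function name, the retry budget (max_resets), and the recovery-mode timeout are assumptions introduced for illustration. Only the 0.1 s watchdog setting and the roughly one-third-second boot time come from the incident description.

```python
from dataclasses import dataclass


@dataclass
class BootOutcome:
    mode: str    # "normal", "recovery", or "halted"
    resets: int  # watchdog resets issued before the outcome was reached


def boot_with_watchdog(boot_time_s, watchdog_timeout_s,
                       max_resets=3, recovery_timeout_s=2.0):
    """Model a watchdog that refuses to cycle forever (hypothetical logic).

    max_resets and recovery_timeout_s are illustrative assumptions.
    """
    resets = 0
    while resets < max_resets:
        if boot_time_s <= watchdog_timeout_s:
            return BootOutcome("normal", resets)
        resets += 1  # the watchdog fired before the boot completed
    # Retry budget exhausted: switch to a recovery mode with a longer
    # timeout instead of resetting endlessly.
    if boot_time_s <= recovery_timeout_s:
        return BootOutcome("recovery", resets)
    return BootOutcome("halted", resets)
```

With the values from this incident, boot_with_watchdog(0.33, 0.1) exhausts its three "try agains" and then falls back to recovery mode rather than looping. Without the retry budget, the same parameter error leaves the processor resetting indefinitely.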
For more technical information, call Lan Nguyen, at (310) 336-2146.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 80
Check, Double-check, and Triple-check Torquer Phases

The Problem:
A magnetic torquer sign error was caught just one day before launch.

The Cause:
The attitude control engineer who calculated the fields induced by the applied current made
an error in an equation, which reversed the predicted torques.
The engineer left the project, and his successor, misunderstanding the vendor’s drawing
notes, installed all three coils upside down. The second error, which could have been easily
discovered with a compass, was masked by the faulty truth table.
Fortunately, the prime contractor’s president had concerns with a delay in generating solar
power (Lesson 53). As a result, the attitude control components relating to sun acquisition
were thoroughly scrutinized.
To alleviate prelaunch work load, the customer paid to bring back the original attitude control
engineer. Rechecking his own calculations, he spotted the sign error one day before launch.

Two Other Mistakes on This Mission
1. The calculated moments of inertia, which should have been referenced against the center of
gravity, were instead referenced against the origin point on the drawing. The mistake was
caught by an independent analysis (Lesson 2).
2. The star tracker misbehaved on-orbit because the vendor altered its coordinate convention
but the change notice was not heeded.

Lessons Learned:
• Don’t overlook simple tests that can discover problems early.
• Whenever possible, conduct independent analyses.
• Document attitude control coordinate frames early in development to avoid mistakes.
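
A simple polarity check of the kind the first lesson calls for can be sketched in Python. The torque on a magnetic torquer is the cross product of its dipole moment and the local magnetic field (torque = m × B), so a reversed coil reverses the torque. The function names, and the idea of comparing a commanded dipole with one measured by a compass or magnetometer, are illustrative assumptions rather than this project's actual procedure.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torquer_torque(dipole, field):
    """Torque produced by a dipole moment in a magnetic field: m x B."""
    return cross(dipole, field)


def polarity_ok(commanded_dipole, measured_dipole):
    """True if the measured dipole points the same way as the commanded one.

    A positive dot product means the coil is not installed reversed.
    """
    dot = sum(c * m for c, m in zip(commanded_dipole, measured_dipole))
    return dot > 0
```

For a dipole along +z in a field along +x, torquer_torque returns a torque along +y. Flipping the coil to -z flips the torque to -y, which is the kind of reversal a bench check with a compass would expose.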

For more technical information, call David Voelkel at (505) 846-8380 or Geoffrey Smit at
(310) 336-1602.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 81
Designate A Responsible Engineer for Complex Equipment

The Problem:
A satellite lost part of its primary structure one minute after liftoff.

The Cause:
The micrometeoroid shield that enclosed the spacecraft was peeled off by aerodynamic loads.
The 1200-pound shroud was supposed to fit tightly to the satellite body during ascent and
then extend five inches after reaching orbit. The contractor delegated the development of
this complex hardware to its structures department without putting a project engineer in
charge.
Coordination suffered. Not having been told that the shield must fit tightly during launch,
the structural and manufacturing engineers made it light but fragile. Without looking at the
actual hardware, project engineers assumed that design criteria were met and saw no
aerodynamic concerns. All dynamic tests were waived.
The investigation board blamed the failure on a lack of systems engineering leadership and
chided the engineers for “believing that a drawing is the real world.” The board concluded
that “positive steps must always be taken to assure that engineers become familiar with
actual hardware.”

[Figure: Damage Mechanism. Labels: undeployed solar array; problem area; solar array;
micrometeoroid shield; tiedowns for solar array broke apart.]
Supersonic air rammed through a supposedly sealed tunnel on the shield, generating excessive
lift that broke the shield as well as a nearby solar array.

Lessons Learned:
• Designers should inspect actual hardware (Lesson 26).
• Analysis does not obviate the need to test.
For more technical information, call Susan Ruth at (310) 336-6765.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 82
Understand Transient Behavior of Analog Circuits

The Problem:
A pyro device failed to fire on orbit.
The Cause:
The incident stumped engineers because pyro units rarely malfunction, and two certification
units fired successfully.

[Figure: Current vs. Time (Malfunctioning Unit). Current axis marked 0 amp and 1 A, with the
3.5 A specification indicated; time axis marked at 0.01 s.]

An outside expert pointed out that when current passed through the bridgewire, ohmic heating
raised its resistance. Because the firing circuit was designed as a constant voltage output,
current and power dropped off (P = V²/R) just enough to thwart ignition.
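
The effect of rising bridgewire resistance can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the voltages, currents, and resistances in the usage note are made-up round numbers, not values from the failed unit.

```python
def constant_voltage_drive(volts, ohms):
    """Return (current, power) delivered to the bridgewire by a fixed-voltage source."""
    amps = volts / ohms
    return amps, volts * amps  # P = V * I = V^2 / R


def constant_current_drive(amps, ohms):
    """Return (current, power) delivered by a source that holds the current fixed."""
    return amps, amps * amps * ohms  # P = I^2 * R
```

For an illustrative 4 V source, a bridgewire that heats from 1.0 ohm to 1.25 ohm sees its current fall from 4.0 A to 3.2 A and its power fall from 16 W to 12.8 W. A source that held 4.0 A would instead deliver more power (20 W) as the wire heats, so a current-regulated output does not lose heating power this way.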
Most pyro unit outputs are current-limited with series resistors, or energy-limited with
capacitor discharges. Few engineers realize that the bridgewire resistance can change within
the hundredths of a second it takes to heat the bridgewire enough to ignite the charges. In
fact, the initiator specification only stipulated the firing current, not how long the pulse
should hold. The designers, who did not know how pyro circuits typically work, used a
constant, low-voltage approach that turned out to be vulnerable.
A lack of fidelity in design verification hid this mistake. During simulation tests, a resistor
was used to emulate the initiator, and the current was steady because the resistance did not
change. A fast-blow fuse, which more accurately simulates the load, would have revealed the
resistance change.
The design was certified based on only two live firings, during which no current trace was
recorded. In retrospect, the successes were purely a matter of luck: there was just enough
current margin for success 60 percent of the time. If more units had been fired, or if
instrumentation had been used, the inadequacy would have been found.
Lessons Learned:
• Check time-dependent circuit behavior, and bound transients in specifications.
• Do not qualify a design solely because a unit worked. Measure circuit parameters and
verify that positive margins exist.
• Analyze instrumentation data, which can provide more engineering information such as
postfire conduction (which may drain flight battery).
• Understand how circuits are typically designed and tested before inventing novel
approaches.
• Qualify pyro devices by conducting lot acceptance testing.
• Review the Pyroinitiator User's Guide published by NASA (JSC-28596A).
For more technical information, call Ron Williamson at (310) 336-2149.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.


Lesson 83
Put Critical Analyses Under Configuration Control

The Problem:
Two upper stages failed for the same reason within 16 months.

The Cause:
Before launch, liquid helium was circulated through the cryoengines so they could start
smoothly. During the boost phase, aerodynamic turbulence shoved air into the helium feed
port. A malfunctioning check-valve allowed the air into the frigid engine, where it froze and
jammed the turbo pump.
A check-valve, instead of a more secure shutoff-valve, was used in the duct because an air
flow computation indicated that no pressure differential would exist within the line. But
subsequent design changes created a pressure gradient. Because the aerodynamic analysis was
not placed under configuration control, there was no requirement to recheck the calculations
to confirm that the check-valve would still suffice.

[Figure: Telltale Thermal Telemetry Signatures. Fuel pump housing temperature (°F, -330 to
-430, left scale) and aerodynamically generated ΔP across the check valve (psi differential,
0 to 2, right scale) versus time after launch (0 to 280 sec). Annotations: "Prechilling used
to be less efficient: even if air leaks in, it cannot freeze because the pumps were not cold
enough"; "Air freezes"; "Two failed engines: high ΔP and leaky valve let warm air in, raising
pump temperature and freezing the air"; "Two functional engines".]
The failure cause was found in out-of-family data from successful flights between the two
failures. Notice that a process change, chosen to reduce development costs, chilled the
engine so much that ingressing air could freeze.

After the first failure, engineers tore apart several pieces of hardware in stock and found
residual Scotch-Brite in numerous joints. Under considerable schedule pressure, they concluded
that the failure was caused by contamination. The second investigation team examined more
than 1200 potential causes before finding the actual cause.
Lessons Learned:
• Do not assume the first, easiest explanation is the correct one.
• Refrain from using check-valves as sole means for isolation, as they can chatter or leak
(the check-valve design and assembly process on this launcher was particularly prone to
seize in the open position). See Check-Valve Reliability in Aerospace Applications, NASA
Preferred Reliability Practice No. PD-ED-1267, for additional information.
For more technical information, call Robert Foust at (865) 932-0366.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 84
Check Start-up Circuit Behavior, Particularly at Low Temperatures

The Problem:
The primary side of an onboard computer would not turn on.

The Cause:
The computer received analog housekeeping inputs via numerous multiplexers inside the data
interface unit (DIU).
During power-up, multiplexer chips can draw twenty times more current than during steady
operation. The designers did not notice this start-up surge partly because it is only
significant at low temperatures.

[Figure: Effect of Temperature on Turn-on Loads. Load current drawn (mA) versus voltage
applied (volts), showing the load total at -5°C, the load total at +3°C, and the power supply
current limit.]
The multiplex chips draw 0.25 mA during operation, but as much as 5 mA during cold power up.
When the current draw exceeds the power source’s capability, the unit would continue trying
to reboot. The primary computer timed out; its back-up finally succeeded in booting after the
chips warmed up.

Following a safe-hold event, the onboard computer tried to reboot when it was unusually cold.
The current draw exceeded the limit set on the fault-tolerance circuit, preventing the
primary DIU, and consequently the primary computer, from starting.
The current limiter did not have large enough margins because the DIU was inherited from an
earlier design that supported fewer multiplexers. The low temperature DIU test was manually
controlled, and the engineers did not realize that the unit took longer to boot than the time
limit programmed into the computer.
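
The margin problem can be made concrete with a short Python sketch. The per-chip currents (0.25 mA in operation, up to 5 mA during cold power-up) are taken from the figure caption. The function names, the chip count, and the current limit used in the usage note are hypothetical values chosen only for illustration.

```python
def current_margin_ma(n_chips, per_chip_ma, limit_ma, other_loads_ma=0.0):
    """Margin between the current limit and the total load, in mA.

    A negative result means the limiter will trip.
    """
    return limit_ma - (n_chips * per_chip_ma + other_loads_ma)


def can_start(n_chips, per_chip_ma, limit_ma, other_loads_ma=0.0):
    """True if the load stays at or below the current limit."""
    return current_margin_ma(n_chips, per_chip_ma, limit_ma, other_loads_ma) >= 0
```

With an assumed 40 multiplexers behind a 50 mA limit, the steady-state margin is a comfortable 40 mA, but the cold power-up margin is -150 mA. Checking only the steady-state case, or checking a heritage design with fewer chips, hides the shortfall.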

Lessons Learned:
• Use fault-tolerance circuits to protect upstream assets, not load units. Better yet, use dual-
level current limiters to protect load units during ground tests. But for flight, protect only
the source circuits.
• Redesign fault-tolerance circuits when the load units have been substantially altered.
For more technical information, call Peter Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 85
Systems and Software Engineering Should Actively Coordinate
The Problem:
A satellite could not be deployed.

The Cause:
The payload separation system was designed to accommodate two satellites, but only one
satellite flew on this mission.
The mission specification had the separation commands sent to the “forward” position. An
engineer redlined the commands to “aft” to simplify wiring. Unfortunately, this change was
not incorporated in the final mission specification.

[Figure: Separation Configuration. (Top) For two payloads; (Bottom) For the failed mission.
Labels: forward payload; aft payload; mission unique; generic core; software commanded; hard
wired. Legend: BW = bridge wire; SFC = squib firing circuit; I/F = interface connection.]

Not realizing that the informal redline had fallen through the cracks, the hardware group
designed an incompatible harness. The drawings were released as a new baseline, making it
difficult to detect crucial changes. Several systems engineering departments could have
checked the compatibility of the final design to overall requirements, but none did; the key
mission specification was developed by software engineers and was not placed under systems
engineering’s jurisdiction.

The mistake was not discovered on the ground because the generic systems test activated both
positions, allowing the miswired ordnance verification unit to appear working.
Lessons Learned:
• Test the specific configuration that will be flown (Lesson 3).
• Conduct tests and reviews to validate that the requirements are met, rather than that the
drawings are correctly implemented.
• Actively involve systems engineers in software development activities, and formally
control all system (including software) interfaces.

For more technical information, call Suellen Eslinger at (310) 336-2906.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 86
Hand-Over Logic Tree Must Be Unambiguous

The Problem:
A suborbital launch was inadvertently terminated less than a minute after liftoff.

The Cause:
During the flight, control had to switch from the ground to a downrange airplane. Commands
were sent via three analog channels: A, B, and C. The ground used tones A and B, and the
airplane used tones B and C.
Tone B was the “ALIVE” signal, and a combination of tones B and C meant “ARM”. Once armed, if
the onboard receiver lost the “ALIVE” signal, it would assume that something went awry and
abort the flight. By having the airplane take over control using the ARM signal, the handover
plan put the flight in danger.

[Figure: Watch Out for Radio Interference. Pie chart of failure causes (source: AIAA
2000-3578): EMI 29%, Propulsion 23%, Human Error 18%, Controls 8%; the remaining slices
(8%, 6%, 8%) are labeled Other, Staging, and Software.]
A study of missiles converted for suborbital or space launches found that the largest cause
of failure was electromagnetic interference (EMI).

The onboard receiver detected tone C from the airplane and armed. However, it could not
immediately lock onto the airborne transmitter because plume attenuation caused the incoming
ground transmission to fluctuate. While the receiver dithered, land and airborne B tones
became momentarily out of phase. The phase-locked oscillator in the receiver lost lock,
spoofing the self-destruct mechanism into thinking it lost the “ALIVE” heartbeat. The
launcher blew itself up.
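
The handover hazard can be reproduced with a small Python model of the receiver logic described above: tones B and C together arm the receiver, and once armed, loss of tone B triggers an abort. The function name and the dropout_tolerance parameter are hypothetical. The tolerance is shown only to illustrate how sensitive the outcome is to a single glitch; it is not a recommended flight-termination design.

```python
def run_receiver(samples, dropout_tolerance=0):
    """Step the receiver through a sequence of detected-tone sets.

    samples: iterable of sets drawn from {"A", "B", "C"}.
    Returns ("abort", index) when the ALIVE tone (B) is missing for more than
    dropout_tolerance consecutive samples after arming; otherwise returns
    ("armed", None) or ("safe", None).
    """
    armed = False
    missed = 0
    for index, tones in enumerate(samples):
        if not armed:
            if "B" in tones and "C" in tones:  # B + C means ARM
                armed = True
            continue
        if "B" in tones:
            missed = 0
        else:
            missed += 1  # ALIVE heartbeat not detected in this sample
            if missed > dropout_tolerance:
                return ("abort", index)
    return ("armed" if armed else "safe", None)
```

Feeding the model the sequence {A, B}, {B, C}, {C}, {B, C} represents ground control, the airplane's ARM, one sample where the B tone is lost, and then recovery. With no tolerance, the single dropout aborts the flight at the third sample. With a tolerance of one sample, the receiver rides through the glitch.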
Lessons Learned:
• Conduct redundancy switching analysis to ensure a fail-safe transfer between multiple, or
redundant, controllers. Postulate all credible failure paths (such as part failure, start-up
transients, latch-up, overvoltage, and EMI) and determine the effect on the switching
process. Make sure glitches in one unit will not propagate across interfaces.
• Guard against radio frequency (RF) interference from multiple sources.

For more technical information, call Ron Williamson at (310) 336-2149.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 87
Avoid Repeating Other People’s Mistakes

The Problem:
A launcher’s maiden flight failed.

The Cause:
The launch vehicle, unlike most other systems, did not recycle hydraulic fluid, but drained
it at the nozzle exit plane instead.
The spent oil dripped into the exhaust plume and caught fire. Recirculated by external air
flow into the aft area, the flame damaged an uninsulated guidance cable, sending false
signals into the thrust vector controller. The vehicle veered off course.
Four years earlier, another rocket crashed because excessive engine heat destroyed guidance
cables. The investigation board concluded that jettisoned hydraulic oil could have dripped
into the exhaust and contributed to the mishap. Several programs thereafter changed designs
to keep fluid clear of the plume and to add insulation.
Unfortunately, even though the motor supplier of this failed vehicle also built the motor
that went awry four years earlier, the lesson was not heeded.

[Figure: (a) As designed, showing the actuator and guidance cable and the hydraulic fluid
vent; (b) During flight, showing air flow entrapping the flame.]

Plume Safety
As a rocket ascends, decreasing atmospheric pressure causes its flame to spread out.
The designers of this failed launcher conducted static firings, but did not run sufficient
computational fluid dynamics modeling. Thus, they did not anticipate the conflagration or the
need to protect the cable.

Lessons Learned:
• Study past failures that involved similar technologies and implement appropriate
corrective actions.
• Ensure subcontractors discuss relevant lessons with the primes.

For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 88
Verify Each Operation Step

The Problem:
A piece of flight hardware was damaged during its integration to the launch vehicle.

The Cause:
During launch vehicle erection, the Stage III, spin table, and the satellite were contained
in a canister and bolted to the Stage II. After the guidance systems were connected, a
technician had to remove the bolts before the canister could be lifted.
To indicate that he was to start unbolting, the technician put both thumbs up and shouted
“ready.” The crane operator heard “Randy,” his name, and mistakenly interpreted the gesture as
a command to hoist. The shackled stack was raised up; the spin table suffered structural
damage.
The error took place because:
1. Not realizing the lift operation could be hazardous, the foreman allowed an uncertified
technician to direct the crane. A properly trained rigger would have avoided making an
ambiguous “thumb-up” sign.
2. The operating procedure did not require anyone to verify that the bolts had indeed been
removed. The crane driver should have been taught to ask for the restraining pin, for
example, first.
3. The procedure did not specify communication protocol.

A Similar Incident
As a thunderstorm approached a launch pad, workers draped a rain shield over a satellite
being processed in the White Room.
The shield consisted of overlapping strips of waterproof cloth, secured with adhesive tapes.
The installation instructions stated, “ensure both top and bottom sides of seam are taped.”
Nonetheless, the lower side was neglected, nor was there a verification.
Rainwater poured through the building’s leaks. The weak rain shield collapsed, drenching the
satellite. Launch had to be delayed for years.

Lessons Learned:
• Implement a discrete verification step for each critical task.
• Require positive confirmation before hazardous commands can be acted upon.
• Do not deviate from written procedures.
• Handle space hardware carefully.
For more technical information, call Norman Lagerquist at (310) 336-2362.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 89
Prevent Hardware Fratricide

The Problem:
A payload fairing did not open in flight.

The Cause:
The shroud was deployed with two sets of explosive-driven springs. The primary
circumferential squib should have fired first, followed by, at 22 millisecond intervals, its
backup; the primary longitudinal ordnance; and its backup.
The circumferential cut thrust the nosecone forward and pulled the longitudinal firing plugs
apart. An unfavorable tolerance buildup, plus an unexpectedly large forward motion of the
fairing, disconnected several pins. The longitudinal split did not take place.
The fix involved adding a heritage locking mechanism to prevent the connector halves from
moving apart during firing. When the shroud starts to unlatch forward and outward, lanyards
attached to a bracket mounted above the plug pull the fastener open.

[Figure: Fairing diagram. Labels: helper spring; circumferential thruster spring;
longitudinal thruster spring; pyroplug. Caption: Circumferential cut pulled longitudinal
firing circuit apart.]

Prior Problems Missed
A review of a previous mission revealed that several non-critical pins had disengaged.
Unfortunately, these warning signs were not heeded and the connectors were not redesigned
(Lesson 65).

Similar Incident
A launcher used shaped charges to separate the stages. The initiator on one end fired first,
disabling the other end of the charge and preventing the structure underneath the damaged
initiator from tearing apart. The vehicle jackknifed.
[Figure: Initiators (1 and 2) in detonation block; shaped charge; interstage skin; uncut
area.]

Lessons Learned:
• Ensure the neighboring units survive after the primary device operates.
• Qualify ordnance devices in their operational environment.

For more technical information, call Selma Goldstein at (310) 336-1013.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 90
Account for All Loose Materials

The Problem:
A large engine partially melted during a test firing.

The Cause:
Investigators found that a large piece of sealing tape, routinely used during engine
assembly, blocked the fuel injector and caused the turbopump to overheat.
The investigation board reprimanded the manufacturer for not having a disciplined process to
handle, or account for, loose materials. The processing paperwork was not traceable, making
it difficult to know what work was done on which part.
In this case, the build log supposedly documented tape removal and independent verification.
The Investigation Board discovered, however, that tape reportedly taken out was repeatedly
found during postfire inspection or engine rebuild.

Other “Foreign Object Damage” Incidents
• Debris contamination spoiled five foreign launches between 1990 and 1999, including
several caused by rags clogging propulsion lines.
• Debris such as paper clips left in RF cavities repeatedly caused test failures on a
satellite program. The contractor finally developed an electromagnetic probe to sweep all
cavities before they were sealed.
• A jet engine contractor suffered several failures caused by bolts or tools being left
inside test units. The management subsequently required an inspector to go inside the inlet
to check for debris using a flashlight.
Right after the new procedure was implemented, the engine blew up. The flashlight was left
behind. (From “Augustine’s Laws.”)

Lessons Learned:
• Make sure loose, nonserialized materials (such as wipe cloth) used during assembly are
carefully accounted for.
• Correct the root cause of in-process anomalies (Lesson 32).
• Keep accurate records of all “nonflight” installations.
• Take photos frequently during assembly.
• Design hardware to minimize areas that cannot be easily inspected, and avoid the use of
potential contaminants whenever possible.
• Keep hardware closed when access is not needed.
• Review out-of-flow processes to ensure no steps are bypassed (Lesson 64).
For more technical information, call Dana Speece at (310) 336-5021 or Gary Shultz at (310)
336-2342.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 91
Ensure Critical Systems Are Tolerant of Transient Power Loss

The Problem:
A first-stage engine shut down soon after liftoff.

The Cause:
Immediately before the mishap, the bus current spiked twice. Evidently, a power cable had a
breach in its insulation layer, and momentarily grounded. The engine relay box lost power,
and numerous relays controlling the propulsion valves dropped out, disabling the engine.
By design, the relays lock on their own contacts during flight, which depends on a continuous
supply of electricity to retain their running configuration. If the power is lost, even for
an instant, the relays unlatch with no means to recover.
The vulnerability to a transient short had been recognized by the contractor for years.
Unfortunately, even though many design improvements had been made elsewhere, such as in the
propulsion system, little attention was given to this single failure point.

A Lesson Not Learned
After this incident, the contractor redesigned the 30-year old control electronics to provide
redundant power and guidance. A sister launch vehicle program, however, did not make a
similar change.
Years later, the second program suffered a failure. Apparently, a defective power cable
shorted intermittently, causing the guidance computer to reset and the inertial measurement
unit to lose reference.
The launcher had miles of wires; forty-four repairs had been made on this particular vehicle
alone. In retrospect, it was clearly impossible to inspect out every wiring defect, and the
decision not to provide redundant power proved costly.
[Figure: Florida Today headline, "Cabling defects led to the most costly unmanned launch
failure".]

Lessons Learned:
• Ensure the onboard computer retains “most recent state” information so that if a glitch
causes the loss of “present state” data, the vehicle can revert to a survivable configuration.
• Anticipate wiring problems, and provide redundant power sources to critical systems,
including lock-in power circuits to prevent hardware reset.
• Recognize the need to address weaknesses in nonpropulsive systems.
For more technical information, call Peter Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 92
Rigorously Determine the Root Causes of Test Failures

The Problem:
The primary laser in an instrument failed after a month in space.

The Cause:
The laser pump consisted of several diodes mounted on heatsinks, soldered together into
stacks. Apparently, the indium solder contaminated the gold bondwires, forming an insulating
layer of intermetallics. In orbit, the corroded bondwires suffered from thermomechanical
fatigue and cracked.

[Figure: Laser Array Stacks (Simplified). Labels: laser diode (mounted with tin/lead
solder); array endcap; heat sink; gold wire bonds; spacer; indium solder; indium on gold
wires (thickness exaggerated). Caption: The laser bars are connected in series through gold
plating, wirebonds, and indium solder.]
[Figure: Au/In Reaction in Terrestrial Applications (Lawrence Livermore Lab). Panels at 8
years, 14 years, and 25 years, showing the original gold wire, Au/In intermetallics, and the
remaining gold wire.]

Lasers have not flown in space often. The design of this laser was derived from a previous
program and was procured commercially. In retrospect, the vendor’s internal processes and
controls were not up to par for space applications. The new design was more vulnerable
because current density in the contaminated bondwires increased by 40 percent, intensifying
thermal loads in the wires. Several years of launch delay made the degradation worse.
During qualification, the bondwires broke several times. The vendor replaced the defective
components and asserted that the failures would not recur. A laboratory analysis, which would
have discovered the root problem, was requested but not carried out.
Lessons Learned:
• New technologies require rigorous qualification, analysis of design changes, and a
thorough understanding of failure modes.
• Audit a vendor’s manufacturing process, conduct destructive physical analysis of sample
parts, and ascertain the root causes of all anomalies.
• Review the materials and processes for each new application drawing.
• Guard against known materials incompatibilities (gold/tin intermetallics can embrittle
solder joints, for example).
For more technical information, call Renny Fields at (310) 336-6973.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 93
Always Ascertain the Direction of Current Flow

The Problem:
Contact with a satellite was lost soon after launch.

The Cause:
The satellite consisted of a domestic instrument module and a foreign service module. A
design mistake in the foreign unit caused the solar panels to be connected backwards.
The domestic instrument supplier, in charge of system integration, checked the interface
between the solar panels and battery, but only verified the magnitude of current, not its
direction. Engineers might have become confused as to how the current should flow because the
foreign unit grounded positively but the American unit grounded negatively.
Once in orbit, the battery drained, ruining the mission.
"It's always the simple stuff that kills you," lamented the lead engineer.

[Figure: Polarity Confusion. Two solar array circuits with current direction (I) marked:
positive polarity ground, often used in foreign design; negative polarity ground, commonly
used in US design.]

A Similar Incident (from “Augustine’s Laws”)
A preflight check found two hardware modules wired in the opposite polarity. Both
subcontractors reversed their cables. The launch failed.

Lessons Learned:
• Make sure that engineers understand how the system or component should function during
test.
• Thoroughly verify interfaces of subcontracted items, particularly when the suppliers use
different engineering conventions.
• Use an engineering model to verify interfaces early.
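
An interface test that checks direction as well as magnitude can be sketched in Python as follows. The function name, the sign convention (positive current means current flowing into the battery), and the 10 percent tolerance are assumptions made for illustration; they are not taken from the program's test procedure.

```python
def check_charge_current(measured_a, expected_a, tol=0.10):
    """Return (magnitude_ok, direction_ok) for a signed current reading.

    Assumed convention: positive amps means current flows into the battery.
    tol is the allowed fractional error on the magnitude.
    """
    magnitude_ok = abs(abs(measured_a) - abs(expected_a)) <= tol * abs(expected_a)
    direction_ok = measured_a * expected_a > 0  # same sign means same direction
    return magnitude_ok, direction_ok
```

A reading of -2.0 A against an expected +2.0 A passes the magnitude check but fails the direction check. That is the situation a magnitude-only test cannot distinguish from a correctly wired array.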

For more technical information, call Ron Williamson at (310) 336-2149.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 94
Provide Debug Features in Flight Software to Assist Anomaly Resolution

The Problem:
An interplanetary probe lost some scientific data due to occasional system resets.
The Cause:
Driven by demanding mission requirements, the designers used a commercial, realtime, multiple-tasking operating system.
An esoteric “priority inversion” problem took place during science operation and caused some data loss. This glitch was not caught on the ground because the Earth-pointing antenna performed better than expected, allowing more frequent downlinks than originally planned.
Fortunately, debugging tools, written during code development, were embedded in the software. With extensive support of the vendor, the project was able to reproduce the problem in the laboratory and identify the cause. A quick fix allowed the mission to successfully conclude.
[Figure: a processor with a watchdog timer and shared memory runs J HIGH (data management), J MEDIUM (bus tasks), and J LOW (other instruments, via the 1553 databus), plus a J LOW job from one instrument that is processed directly.]
The Priority Inversion Problem
Because the bus and instruments share the processor, job allocation is vital. The highest priority is given to data management, followed by bus tasks and by science activities. If data management tasks cannot complete within the watchdog’s 125 millisecond cycle, an anomaly is assumed and the computer is reset.
Data from the bus and payloads flow through a 1553 data bus, but one instrument is processed directly. That sensor shares a software function with the transaction manager, which is not a prudent design but is normally not a problem. Access to this resource was controlled with a key. If a data manager job (JHIGH) starts late in the cycle, it may find a job from this instrument (JLOW) still in process. If JHIGH also requires the shared software function, it must pause for the key.
When a communication job (JMEDIUM) initiates during the short interval, however, it preempts JLOW, preventing the key’s release. The system watchdog timer starts the next cycle, finds JHIGH unfinished, and resets the system.
Turning on “priority inheritance” options for that particular thread (giving high priority to JLOW in light of jobs blocked by it) solves this problem. This option is not normally used as default due to performance concerns.
Lessons Learned:
• Ensure that commercial software, especially the operating system, allows access to internal information and is compatible with development debug tools.
• Test for off-nominal conditions, both “better” and “worse” than expected (for example, at higher throughput rate), to see if the system misbehaves.
• Leave debug capabilities embedded in the operational system.
• Shared functions must be thoroughly tested, especially for timing.
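The priority inversion described in this lesson can be reproduced with a toy scheduler. The sketch below is a simplified illustration, not the flight software: the 125 ms watchdog cycle comes from the lesson, while the job durations, release times, and one-millisecond tick are invented assumptions chosen to make the effect visible.

```python
from dataclasses import dataclass

WATCHDOG_MS = 125  # watchdog cycle quoted in the lesson


@dataclass
class Job:
    name: str
    priority: int     # larger number = more urgent
    release_ms: int   # hypothetical release time
    work_ms: int      # hypothetical CPU time needed
    needs_key: bool   # True if the job uses the shared function
    remaining: int = 0

    def __post_init__(self):
        self.remaining = self.work_ms


def high_job_finish_time(priority_inheritance, horizon_ms=400):
    """Simulate a preemptive fixed-priority scheduler in 1 ms ticks and
    return the time at which J_HIGH completes (None if it never does)."""
    jobs = [
        Job("J_LOW", 1, 0, 10, True),
        Job("J_HIGH", 3, 5, 10, True),
        Job("J_MEDIUM", 2, 6, 200, False),
    ]
    key_holder = None
    for t in range(horizon_ms):
        ready = [j for j in jobs if j.release_ms <= t and j.remaining > 0]
        # A job is blocked if it needs the key while another job holds it.
        blocked = [j for j in ready
                   if j.needs_key and key_holder is not None and key_holder is not j]
        runnable = [j for j in ready if not any(j is b for b in blocked)]
        if not runnable:
            continue

        def effective(job):
            # With inheritance, the key holder borrows the priority of the
            # most urgent job it is blocking.
            if priority_inheritance and job is key_holder and blocked:
                return max(job.priority, max(b.priority for b in blocked))
            return job.priority

        current = max(runnable, key=effective)
        if current.needs_key:
            key_holder = current
        current.remaining -= 1
        if current.remaining == 0:
            if current is key_holder:
                key_holder = None
            if current.name == "J_HIGH":
                return t + 1
    return None


for inherit in (False, True):
    done = high_job_finish_time(inherit)
    verdict = "watchdog reset" if done is None or done > WATCHDOG_MS else "ok"
    print(f"priority inheritance={inherit}: J_HIGH done at {done} ms -> {verdict}")
```

Under these assumed timings, J_HIGH misses the watchdog deadline without inheritance because J_MEDIUM starves the key holder, and meets it comfortably once the key holder inherits J_HIGH's priority.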
For more technical information, call Suellen Eslinger at (310) 336-2906.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 95
Ensure Heritage Designs Can Operate in the New Application Environment

The Problem:
An interplanetary probe mysteriously failed.
The Cause:
The incident occurred when the vehicle, having completed a year-long flight, pressurized its propulsion system in preparation for an orbit-insertion burn. The propulsion system had been used for apogee boosting in numerous GEO satellites without incident.
Extensive testing could not reproduce the failure.
Years went by. Then, in a program review, a propulsion expert heard that a commercial restrictor contained a brazing alloy that is incompatible with oxidizer vapors. The same part had been used in the failed spacecraft; a failure mechanism finally dawned on him.
Evidently, oxidizer vapor can pass through the check-valves and cause very slow corrosion. The amount of debris is normally negligible, but on this long-duration mission so much accumulated that when the pyrovalves fired, the debris was shaken into the restrictor orifices and kept the regulators open. Helium rushed out, bursting the line.
The incompatibility was not recognized at the time because the restrictor’s materials list did not include the braze. In fact, if the expert had not made the connection, two more probes would have been launched with the same flaw.
[Figure (AIAA-2001-3630): Restrictor exposed to oxidizer vapor (a) after 30 days and (b) after one year, showing extensive corrosion.]
[Figure: Simplified Propulsion Schematic, showing the helium supply, the oxidizer and hydrazine tanks, and the oxidizer leak path. Legend: X = Service valve, ~ = Transducer, CV = Check-valve, F = Filter, PV = Normally closed pyrovalve, R = Pressure regulator.]

Lessons Learned:
• Avoid relying on short-term tests (days to months) to confirm long-term reliability.
• Audit vendor material lists to ensure completeness.
• Account for vapor diffusion in propulsion subsystem design.
For more technical information, call Mark Mueller at (310) 336-5081.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 96
Tests Must Independently Verify Development Results

The Problem:
A space telescope was out of focus.
The Cause:
The telescope’s primary mirror was polished with the aid of a “null corrector.” Light shone on a perfect mirror, when reflected through the corrector, should form straight interference patterns.
The corrector was set up with a positioning rod capped on one end. A light beam passed through a small aperture in the cap to focus on the rod’s tip, and a lens was placed at the other end of the rod. Unfortunately, a speck of antireflective coating chipped off the rod’s cap, and the focusing beam was aimed at the cap instead. The lens was misplaced; the mirror was misshapen.
Because the contractor used the corrector not only as a manufacturing tool but also as the sole referee standard, it could not detect the mistake. In fact, each of two pieces of auxiliary optics suggested gross errors. However, confident that the new-technology corrector was better, the engineers ignored the red flags.
[Figure: Mirror Manufacturing Process (Simplified). The null corrector (point light source, upper mirror, lower mirror, interferometer, lens) sits above the telescope mirror. Top view (a) and side view (b) of the rod cap show the anti-reflective coating, the chipped coating, and the light from the interferometer, as intended and actual. Missing coating (view a) near the cap aperture caused the operator to aim the light at the cap instead of at the rod (view b).]
Operators Failing to Call Attention to the Problem
The misfocusing prevented the metering rod from reaching the lens, but the technicians simply extended the rod by inserting a few washers.
“That in itself should have alerted people…because clearly there should not be a need for any unexpected washers to be added,” said the investigation board.

Lessons Learned:
• Use simple tools to crosscheck elaborate tests.
• Scrutinize test equipment, analysis, or algorithms reused from design or manufacturing for
possible single-point failure.
For more technical information, call Julie White at (310) 416-7229.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 97
Control Hardware and Software Configurations Before, During, and After Tests
The Problem:
A satellite pointed toward the Sun with the wrong axis.

The Cause:
As the satellite exited eclipse for the first time, it should have pointed a vector 35 degrees off the z-axis toward the Sun. Instead, it wobbled, while pointing the x-axis to the Sun. Fortunately, one of the solar wings was illuminated, giving the engineers time to recover.
The next day, an examination of a photo taken at the launch site revealed that two Sun sensors were mounted ninety degrees off. A software change quickly fixed the problem.
The Sun sensors were mounted on the main access panel in the intended direction during verification testing, before the panel was attached to the spacecraft. When the panel was being installed, however, the mechanical engineers found that the sensor cables were too short to mount the sensors “as hung.” Seeing no control document on the sensor configuration, they turned the sensors sideways, without informing the guidance and control (G&C) engineers of the change.
[Figure: Sun Sensor Misorientation. Photos of the satellite (a) early in integration and (b) in final configuration, with close-up views of each; notice the wiring direction.]

Lessons Learned:
• Always ascertain G&C actuator phasing (Lessons 53, 60, 80).
• Ensure domain engineers own all aspects of their subsystems.
• Conduct end-to-end testing in the flight configuration.
• Take plenty of photographs during assembly.
• Document G&C subsystem-level alignment. See Guideline GD-ED-2211 from NASA
Technical Memorandum 4322A, for example.
For more technical information, call Geoffrey Smit at (310) 336-1602.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 98
Guard Against Post-Firing Conduction of Pyro Initiators

The Problem:
The redundant memory board on a spacecraft failed.

The Cause:
During an orbit insertion maneuver, the satellite fired several explosive bolts to jettison a solid rocket.
The burning pyro propellant formed a conductive plasma, shorting to the chassis-grounded case. A voltage surge rippled through the input protection diode in the backup memory circuit, causing upsets. If the primary memory had latched, the mission could have failed.
[Figure (NASA): Post-firing Conductive Mechanism.]
[Figure: Simplified Bus Grounding Architecture, showing the power supply, arming relays, firing relay, pyro, memory circuits, chassis, and single point ground. As designed, the pins are insulated from the case and firing current returns via an isolated path. With the pyro shorted, the normal return path is bypassed; ringing and a voltage spike occur in adjacent circuits due to ground coupling (capacitive and electromagnetic).]
Other Post-Fire Conduction Conditions
Post-fire plasma shorts can drain batteries. See Journal of Spacecraft and Rockets, 36, 586-590 (1999).
Drive elements can be disabled by residual current, and should be inspected after ground live tests. In one case, an inspection found a damaged fusing resistor, which would have prevented in-flight firing.
Between 3% and 5% of firings result in conduction.
Lessons Learned:
• Protect firing circuits against sneak currents and line-to-ground shorts. Components such as step motors and pyro circuits that experience sudden current changes should be isolated from all other current-carrying circuits including electrical power, electrical control, RF transmission lines, and monitoring circuitry. For additional information, see Electromagnetic Interference Analysis of Circuit Transients, NASA Preferred Reliability Practice No. PD-AP-1308, for example.
• Check circuit designs against Electroexplosive Subsystem Safety Requirements and Test Methods for Space Systems (MIL-STD-1576), NASA Standard Initiator User's Guide (JSC-28596A), and Electrical Grounding Architecture for Unmanned Spacecraft (NASA-HDBK-4001).
For more technical information, call Ron Williamson at (310) 336-2149.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 99
Have the Model’s Originator Check the Analysis

The Problem:
A spacecraft broke up after firing its embedded solid rocket motor.
The Cause:
The contractor bought the motor off-the-shelf and learned that another company had flown a similar design. It obtained that company’s thermostructural analysis, but did not refine its own model, nor ask the original analyst for support.
The analyst had also presented the results of his analysis in a conference. A diagram published in the proceedings showed the nozzle was deeply buried inside the spacecraft (the distance from the structural base to the nozzle mouth, d(heritage mission, reported) = 6.03 inches). The engineers used this information to justify the final design, which submerged the motor deeper (d(new mission) = 4.95 inches) and did not thoroughly shield the spacecraft against plume heating.
[Figure: Satellite Diagram (Simplified), showing the satellite body, the solid rocket motor, and the distance d.]
The accident investigation board subsequently found that the spacecraft would suffer massive heating from the motor exhaust plume and disintegrate. The motor vendor estimated that heating would be almost two orders of magnitude higher than expected by the contractor.
Why was the design, qualified by similarity, so far off?
It turned out that the motor in the previous mission was actually more extended (d(heritage mission, actual) = 11.03 inches). The distance shown in the conference paper was an error!
The author knew about the mistake but unfortunately did not know the contractor relied on his publication instead of the model, which did not include this erroneous diagram.
Lessons Learned:
• Double check all analysis models, assumptions, methods, and predictions.
• Develop a rigorous process for using experience as a basis for accepting further designs
and equipment.
• Have the original analyst review final product (Lesson 26).
• Make sure key subcontractors accept how their product is being used.

For more technical information, call Dan Perez at (310) 336-2734.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 100
Make Sure Safety Mechanisms Are Truly Independent

The Problem:
A satellite suffered a near-catastrophic short.

The Cause:
Following launch, the spacecraft turned on a set of wax heaters for three minutes to activate the release actuators on the solar arrays. Later, a design error in a field-programmable gate array (FPGA) inside the power controller caused the primary heaters to be reactivated. After ten minutes, the overheating primary elements shorted to the secondary elements, and subsequently to the bus structure. The short circuits drew hundreds of watts, at a current level several times the power board’s design limit.
Fortunately, the heater traces burned open, saving the power distribution unit from permanent damage. Otherwise, the mission would have ended.
[Figure: Actuator Diagram (Simplified). A spurious signal from the power controller FPGA switches +28V to the primary heater of a wax actuator (4 sets), each containing a primary heater, a secondary heater, wax, and a plunger.]
Ensure Independent Safety Mechanisms
The ARM and FIRE relays in Diagram (a) can prematurely close on one FPGA error. Separate drivers (b) should be used.
[Figure: (a) high level commands and an enable line feed a single FPGA that drives both Arm and Fire for the pyro initiator, alongside backup firing circuits; (b) FPGA 1 drives Arm and FPGA 2 drives Fire.]
Lessons Learned:
• Ensure safing mechanisms will prevent one design error from causing a cascade of irreversible failures (Lesson 77). In this case, one error could have activated all the heaters, and the solar arrays might have been deployed prematurely.

• Check for failure mechanisms during extended operation even if that is not the intended
application. If prolonged operation leads to catastrophic failure, provide circuit interrupts,
time-out protection, or a graceful degradation mechanism (Lesson 19, 71).
• Review special design requirements for FPGAs (Lesson 77).

For more technical information, call Peter Carian at (310) 336-8215.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 101
Provide Robust Design for Safe Modes

The Problem:
A satellite ceased operation four days after launch.
The Cause:
After the spacecraft was launched into a 300-km parking orbit for initial checkout, a series of anomalies occurred. Controllers put the satellite in safe mode and, after it operated trouble-free for several hours, left for a 12-hour break.
The spacecraft relied on a single two-axis gyro for attitude control in safe mode. This gyro only sensed rotation about the major and minor axes, and could not handle torques around the intermediate axis, an inherently unstable situation.
Unfortunately, a small imbalance spun the unstabilized satellite up. The thrusters autonomously fired to arrest the disturbance, but the firings occurred so often that the watchdog timer, designed to preserve fuel, shut the thrusters down. The satellite entered a flat spin, turning its solar arrays edge-on towards the Sun, and power was lost. By the time the ground crew returned, the battery was depleted.
Working in a contractor’s branch office, the attitude-control engineers reused a design without realizing the previous mission had a more stable configuration in safe mode and without performing a peer review. A similar satellite being developed at the prime’s main campus in fact used multiple gyros in a similar safe mode, in anticipation of instability.
Controllers opted to leave the satellite unattended without realizing that even though stability in the eventual mission orbit (523 km) had been demonstrated, steadiness in the lower parking orbit, where atmospheric drag is more severe, had not been validated. Later, simulations confirmed that attitude control would be lost in a few hours.
[Figure: Satellite Orientation. In nominal safe mode the arrays receive face-on illumination, with Z the principal axis of inertia, X the intermediate axis, and nadir indicated. In the anomaly the arrays receive edge-on illumination.]
Major-Axis Rule
• Major axis spin is stable
• Minor axis spin is stable but may be destabilized by energy dissipation
• Intermediate axis spin is unstable
Lessons Learned:
• Continuously staff the ground station during spacecraft initialization.
• Analyze the effect of anomalies in all operating modes.
• Incorporate mass property and thruster imbalances in attitude stability simulation, and
avoid thruster-only control modes.
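The Major-Axis Rule cited in this lesson can be checked numerically with Euler's equations for a torque-free rigid body. The sketch below is only an illustration under stated assumptions: the principal moments of inertia (1, 2, 3), the 1 rad/s spin rate, and the 0.01 rad/s perturbation are invented values, and the model ignores energy dissipation, thrusters, and drag.

```python
import math


def euler_rates(w, inertia):
    """Torque-free Euler equations: angular acceleration in the body frame."""
    i1, i2, i3 = inertia
    w1, w2, w3 = w
    return (
        (i2 - i3) * w2 * w3 / i1,
        (i3 - i1) * w3 * w1 / i2,
        (i1 - i2) * w1 * w2 / i3,
    )


def rk4_step(w, inertia, dt):
    """Advance the body rates by one classical Runge-Kutta step."""
    def shifted(base, rate, h):
        return tuple(b + h * r for b, r in zip(base, rate))

    k1 = euler_rates(w, inertia)
    k2 = euler_rates(shifted(w, k1, dt / 2), inertia)
    k3 = euler_rates(shifted(w, k2, dt / 2), inertia)
    k4 = euler_rates(shifted(w, k3, dt), inertia)
    return tuple(
        wi + dt / 6 * (a + 2 * b + 2 * c + d)
        for wi, a, b, c, d in zip(w, k1, k2, k3, k4)
    )


def max_off_axis_rate(spin_axis, inertia=(1.0, 2.0, 3.0), steps=6000, dt=0.01):
    """Spin at 1 rad/s about one principal axis with a small perturbation on
    the other two, and return the largest off-axis rate seen (rad/s)."""
    w = [0.01, 0.01, 0.01]
    w[spin_axis] = 1.0
    w = tuple(w)
    worst = 0.0
    for _ in range(steps):
        w = rk4_step(w, inertia, dt)
        off = math.sqrt(sum(x * x for i, x in enumerate(w) if i != spin_axis))
        worst = max(worst, off)
    return worst


# Index 0 = minor axis, 1 = intermediate axis, 2 = major axis (assumed inertias).
for label, axis in (("minor", 0), ("intermediate", 1), ("major", 2)):
    print(f"{label:12s} axis spin: max off-axis rate = {max_off_axis_rate(axis):.3f} rad/s")
```

With these assumed values the perturbation stays small for spin about the major and minor axes but grows to the order of the spin rate itself about the intermediate axis, which is the axis the single two-axis gyro could not sense.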

For more technical information, call Tom Fuhrman at (310) 336-6596.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 102
Establish Configuration Control for Ground Support Equipment

The Problem:
A spacecraft sustained heavy damage in the factory.

The Cause:
The contractor had built dozens of these satellites before and was integrating two more.
Satellite 1 was mounted, via an adapter plate, on a turnover cart and tested horizontally. As the satellite was demounted, the hydraulic cart jack broke, so the cart was sent to the shop for repair.
A day later, a program in an adjacent high bay borrowed the cart (a common practice at this facility) from the shop to test a different spacecraft. The adapter plate for Satellite 1 was unbolted, but before a different adapter plate could be installed, the hydraulic jack problem was discovered. The cart, with a loose adapter plate, was returned to the shop.
The crew in the original high bay installed a new jack to begin testing Satellite 2. The technician responsible for the cart, assuming the cart was just as it was when it went to the shop, simply signed the shop orders without inspecting the plate. The engineer and the QA checked off the paperwork based on the technician’s signature, even though it was their responsibility to visually check that everything was ready. DCMA had cited the contractor for similar lapses numerous times in the past, but this time the consequence was severe: when the operators attached the satellite and tilted the cart, the adapter plate slipped off. Satellite 2 fell over onto the floor.
[Figure: Satellite 2 on the turnover cart, with an exploded view of the spacecraft separation plane, booster adapter plate, turnover cart interface plate, and cart. The 44 bolts (out of 88 holes) securing the satellite to the adapter plate were installed; the 24 bolts securing the booster adapter plate to the interface plate were missing.]

Lessons Learned:
• Maintain enough discipline to ensure space equipment is handled carefully, and avoid
allowing familiarity to breed contempt.
• Implement an unambiguous verification step for each critical task (Lesson 32).
For more technical information, call Pat Mak at (310) 336-3529 or Anthony Salvaggio at
(310) 336-3198.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 103
Ensure Adequate User Understanding of Complex Instruments

The Problem:
Results from months of electromagnetic interference (EMI) testing had to be discarded.
The Cause:
The contractor acquired a new test set, factory set to scan in 120 kilohertz (kHz) wide windows, stepping up in 40 kHz increments. Military test specifications, however, required 1 kHz windows and 0.5 kHz steps. When reprogrammed according to the specifications, the test equipment suffered a memory overflow.
The test engineer momentarily increased step size to 50 kHz to clear the overflow, assuming that the new unit would act like the old unit and subsequently reduce the step size back to ensure the required overlap. But the new, more sophisticated test set kept the 50 kHz step, skipping 49 kHz between measurements. The engineer did not consult the 700-page manual, and the computer, which allowed hundreds of measurement options, gave no warning of coverage gaps.
Calibrations relied on 1 MHz comb-like tones, all falling right on the sampled windows. Therefore nothing appeared awry.
A last-minute broadband sweep of the flight hardware, meant as a double-check, revealed several spikes in regions supposed to be quiet. It took months before these newly detected emissions could be corrected.
[Figure: four scan patterns. Factory Set for Commercial Applications: overlapping windows around 6950 to 7050 kHz; window size is A-B, step is B-B’ (or A-A”). Tailored Mil-Std-462 Method: 1 kHz window with 500 Hz overlap. As Tested: only emissions at 50 kHz intervals were tested. As Checked: calibration relied on comb-like tones at 1 MHz intervals (axis from 3 to 9 MHz), where the machine worked properly, so coverage gaps were not detected.]
Lessons Learned:
• Design machine interfaces, such as menus and user guides, to avoid confusion.
• Benchmark new instruments against heritage equipment.
• Cross-check test results with secondary, easy-to-understand standards early and often
(Lesson 96).
• Check results with randomly selected values (in this case, if various non-integer spur
intervals were used in calibration runs, the problem may have been caught).
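The coverage gap in this lesson is simple arithmetic, and a short script makes it explicit. The sketch below uses the window and step sizes quoted in the lesson; the 3 MHz to 9 MHz scan range, the alignment of the scan start with the comb tones, and the function names are assumptions made for illustration.

```python
def coverage_fraction(start_hz, stop_hz, window_hz, step_hz):
    """Fraction of [start_hz, stop_hz) that falls inside at least one window."""
    covered = 0
    edge = start_hz  # highest frequency covered so far
    for lo in range(start_hz, stop_hz, step_hz):
        hi = min(lo + window_hz, stop_hz)
        if hi > edge:
            covered += hi - max(lo, edge)
            edge = hi
    return covered / (stop_hz - start_hz)


def is_sampled(freq_hz, start_hz, window_hz, step_hz):
    """True if freq_hz lies inside one of the half-open scan windows."""
    return (freq_hz - start_hz) % step_hz < window_hz


START, STOP = 3_000_000, 9_000_000  # assumed scan range in Hz

print("factory (120 kHz / 40 kHz):", coverage_fraction(START, STOP, 120_000, 40_000))
print("spec    (1 kHz / 0.5 kHz): ", coverage_fraction(START, STOP, 1_000, 500))
print("tested  (1 kHz / 50 kHz):  ", coverage_fraction(START, STOP, 1_000, 50_000))

# Calibration tones every 1 MHz all land inside sampled windows...
comb = range(START, STOP, 1_000_000)
print("all comb tones seen:", all(is_sampled(f, START, 1_000, 50_000) for f in comb))
# ...while a hypothetical spur at a non-integer multiple is missed.
print("7.02 MHz spur seen:", is_sampled(7_020_000, START, 1_000, 50_000))
```

Under these assumptions the as-tested configuration covers only 2 percent of the band, yet every 1 MHz calibration tone is detected, which is why calibration gave no hint of the gaps.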
For more technical information, call Kanaiya Mahendra at (310) 336-1649 or Mark Simpson
at (310) 336-0159.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 104
Ensure Ground Commands Do Not Jeopardize Spacecraft

The Problem:
A telescope controller failed on orbit.
The Cause:
The ground crew routinely sent commands in two parts: “clear register” followed by “execution,” but the command checking process would not preclude “execution” in case the “clear register” command was not carried out.
A command was issued when the satellite was at a low elevation off the horizon. The “clear” instruction did not go through, but the “execution” did, causing the flight computer to freeze. Numerous stored command sequences backed up.
During subsequent contact with the ground station, the operators reset the interrupt controller. Right away, all pending commands ran. The telescope aperture door and the calibration source motor, intended for operation with a 40-second pause in between, turned at the same time. The telescope controller, not designed to accommodate the simultaneous mechanism operation, blew its fuse.
Fortunately, the aperture door automatically sprang open to allow the mission to continue, but on-board calibration was no longer possible.
Fuses
Circuit protection devices (fuses or, infrequently, circuit breakers) prevent a short from damaging upstream assets such as the power distribution board. Fuses should be used on both the primary and backup units of a redundant set, transducers operating from unregulated primary power, and all noncritical equipment. They should not be used on nonredundant, critical equipment.
Fuses should be large enough to carry worst-case operating current (a good rule of thumb is three times the peak and start-up current of downstream operation), but small enough to prevent the lowest possible shorting current from causing damage. In this case, the fuse was sized based only on the expected load and blew, even though the power supply could have handled the unexpected load with ease.
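The fuse sizing guidance in this lesson amounts to a rating window: at least three times the worst-case peak or start-up current, and below the lowest fault current that could cause damage. A minimal sketch follows; the function name and all current values are hypothetical and are not taken from the spacecraft in question.

```python
def fuse_rating_window(peak_load_amps, min_short_amps, margin=3.0):
    """Return (lower, upper) bounds for an acceptable fuse rating in amps,
    or None if no rating satisfies both constraints.

    margin=3.0 reflects the rule of thumb quoted in the lesson.
    """
    lower = margin * peak_load_amps   # must carry worst-case operation
    upper = min_short_amps            # must open before a short does damage
    if lower >= upper:
        return None
    return (lower, upper)


# Hypothetical loads: each mechanism draws 1.0 A at start-up.
single_mechanism = fuse_rating_window(1.0, 10.0)
both_mechanisms = fuse_rating_window(1.0 + 1.0, 10.0)
print(single_mechanism, both_mechanisms)
```

Sizing against the expected single-mechanism load alone would permit a rating that the worst-case simultaneous load exceeds, so the fault scenarios need to be enumerated before the window is computed.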

Lessons Learned:
• Conditional commands (execution of an instruction contingent upon another) must first
verify the completion of the preceding command.
• If multiple commands can cause a mechanical or electrical conflict, code in a prevention
block (i.e., an exclusive OR).
• Make sure flight computers are restarted in a known mode with only appropriate
commands in the queue—always clear pending commands first.
• Double-check if a fuse should be installed, and carefully analyze fault scenarios to size
fuses. For guidelines, see NASA TM-02179, Selection of Wires and Circuit Protective
Devices, and NASA-HDBK-4001, Electrical Grounding Architecture for Unmanned
Spacecraft.
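The first three lessons above can be sketched as a small command sequencer. This is an illustrative outline only: the class, method, and command names are hypothetical, and a real flight implementation would live in the flight software and command database rather than in Python.

```python
from collections import deque


class CommandSequencer:
    """Toy sequencer with prerequisite checks, a mutual-exclusion interlock,
    and a restart that clears pending commands."""

    def __init__(self, exclusive_groups):
        # Each group lists mechanisms that must never operate together.
        self.exclusive_groups = [set(g) for g in exclusive_groups]
        self.confirmed = set()   # commands verified as completed
        self.active = set()      # mechanisms currently operating
        self.pending = deque()   # stored commands awaiting execution

    def confirm(self, name):
        """Record that a command was verified as carried out."""
        self.confirmed.add(name)

    def execute(self, name, requires=None):
        """Run a command only if its prerequisite is confirmed and no
        conflicting mechanism is active. Returns True if accepted."""
        if requires is not None and requires not in self.confirmed:
            return False
        for group in self.exclusive_groups:
            if name in group and self.active & (group - {name}):
                return False
        self.active.add(name)
        return True

    def finish(self, name):
        """Mark an operating mechanism as done."""
        self.active.discard(name)
        self.confirmed.add(name)

    def restart(self):
        """Come up in a known mode: nothing pending, active, or confirmed."""
        self.pending.clear()
        self.active.clear()
        self.confirmed.clear()


seq = CommandSequencer([{"open_door", "drive_cal_motor"}])
print(seq.execute("run_sequence", requires="clear_register"))  # prerequisite missing
seq.confirm("clear_register")
print(seq.execute("run_sequence", requires="clear_register"))
print(seq.execute("open_door"), seq.execute("drive_cal_motor"))  # interlock
```

The design choice here is to reject rather than queue a conflicting command, so that a backlog released after a reset cannot drive two mechanisms at once.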
For more technical information on commanding, call Joseph Anselmi at (310) 336-7326; for
information on fuse selection, call Tom Hecht at (310) 336-1505.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 105
Establish Roles and Responsibilities Ahead of Critical Operations
The Problem:
An air launch went forward despite an abort.
The Cause:
Two agencies, a Launch Vehicle team accustomed to fast-paced air launches and a Range team used to more static pad operations, worked together for the first time. There was only a brief rehearsal.
One minute before the scheduled drop, an equipment anomaly prompted the Range Safety Officer to issue an abort. The Launch Conductor heard the call second-hand but could not tell on whose authority it was made, or why.
The Launch Vehicle team defined “ABORT” as a “HOLD,” making it possible to resume the countdown. Still believing that the launcher was set to go, the Conductor turned to a nearby safety engineer and asked if there was really an abort. Attempting to affirm a “no-go,” the engineer waved.
The Launch Conductor mistook the gesture as “no abort,” and said “go for launch.” The rocket deployed, utterly surprising the Range team who, because their regulations forbid mission recycling, had already left their stations!
The anomaly turned out to be only a telemetry glitch and the flight went successfully. But the National Transportation Safety Board issued a blistering review.
[Figure: Simplified Team Architecture, Comm Net, and Events. The control center houses the Range team (Test Director (TD), Range Safety (RS), Range Controller) and the Launch Vehicle team (Launch Conductor (LC), Ground-to-Air Coordinator (GC)), linked to the pilot through a satellite dish. A chart shows which of TD, RS, LC, GC, and the pilot were on Channels A, B, and C.]
• Range Safety called “Abort” on Channel A.
• The Range crew stood down.
• Launch Conductor heard the commotion and asked on Channel B who called what, and why.
• The Test Director heard the question on B but replied on A.
• Not hearing an answer, the Conductor sought to confirm the abort off the net. The safety engineer answered with a hand wave.
• Misinterpreting the gesture, the Conductor ordered the launch on C.
• The Range crew were not monitoring the conversation between the launch team and the pilot and were caught off-guard by the drop. Fortunately, they recovered in time.
Lessons Learned:
• Enforce mission control room discipline including network arrangements, communication
protocol, headset etiquette, hold method, rescind strategy, and decision authority.
For more technical information, call Susan Ruth at (310) 336-6765.
For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.

Lesson 106
Do Not Dismiss Test Anomalies as Random Events—Find Out Why (III)

The Problem:
A vibration isolator jammed in orbit.
The Cause:
The isolator used two dampers to stabilize an antenna mounted on the end of a 60-meter mast. The damping head moved inside a silicone-filled cylinder, dissipating vibrational energy as heat. A flight-proven design, modified to meet the mission needs, was used.
Two months before launch, a spare cartridge, which had passed acceptance test eight months earlier, seized up. An inspection revealed that its polymer internal seal had expanded against the piston rod.
Engineers thought that the interference was caused by stress relaxation, which tended to occur quickly at first and then stabilize. Since the flight units went through an extra clearance adjustment step during assembly, stress relaxation was believed to be less noticeable. The mission went forward.
The post-failure investigation revealed that, to meet the mission’s damping requirements, the vendor changed the seal material. Because the new plastic is very different from silicone, neither the vendor nor the project suspected any interaction, even after the preflight failure. Nobody realized that silicone could slowly seep into the seal, causing it to swell.
[Figure: Simplified Damper Diagram (notice the tight clearance), showing the bushing, the plastic seal, the metallic damping piston rod, and a nominal clearance of 1.4±0.2 mil.]
Lessons Learned:
• Make sure units that have very tight clearance requirements will retain long-term dimen-
sional stability.
• Carefully validate material substitution (Lesson 14).
• Test for long-term compatibility even on short-term missions—launch schedule may slip
(Lesson 92).

For more technical information, call Nat Patel at (310) 336-6473.


For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 107
Difficult Troubleshooting Calls for System-wide Rethinking

The Problem:
Jammed antennas on several satellites.

The Cause:
An antenna positioning mechanism (APM) steered the high-gain antenna reflector both after deployment and seasonally.
After a decade of flawless performance, the antenna began to jam. Mechanism specialists quickly concluded that the APM gears must have worn out. However, the specialists just could not make the APM, a simple device, fail on the ground despite years of tests and studies. Unable to prove to customers that it understood the problem, the frustrated contractor had to design the APM out.
A few years later, an antenna on a new satellite jammed during ground tests even though it did not use an APM! It was finally found that all along, the APM was pushing against a hinge that could bind in winter. When everybody rushed to analyze why the APM did not generate enough torque, nobody paused to ask, “Is the APM butting against too much load?”
[Figure: the reflector and OMNI antenna mounted on the satellite body through a hinge (aluminum housing, graphite shaft, shim) and the APM (steel bearing).]
Failure Mechanism
The hinge, attached to the APM via a four-bar linkage, could seize because at cold temperature its aluminum housing contracts much more than the graphite shaft. A seized hinge forced the APM to exert 20 times as much torque as it was designed for.
The APM, hinge, and antenna were fabricated by three different groups (Mechanisms, Structures, and Antennas). The full-up assembly was very large and never tested. Earlier vehicles only moved the antenna when the hinges were warm, but subsequent missions moved the antenna to track beacons in winter.
[Figure: The Clarke Belt. Satellites serving the Northern Hemisphere have their hinges coldest in December; satellites serving the Southern Hemisphere have their hinges coldest in June.]
Engineers did not immediately tie jamming to coldness because malfunctions sometimes occurred around June and sometimes around December. In fact, problems occurring around June were on satellites serving the Southern Hemisphere, when the hinges on these “flipped” satellites were coldest!
Lessons Learned:
• Problems that cannot be conclusively resolved may involve subtle interactions, even though the symptoms are all local.
For more technical information, call John Bohner at (310) 336-1772.
For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.

Lesson 107
Space Systems Engineering Lessons Learned
Lesson 108
Create Open Liens on Parts Procured Ahead of Qualification

The Problem:
An extremely time-critical launch was aborted.

The Cause:
The rocket was laterally supported by three retractable arms. Each arm had a crush block to absorb the shock when the arm was withdrawn.
The preliminary design called for 5.5-inch blocks, 18 of which were delivered to the launch facilities. The subcontractor attached a temporary part number to the blocks and performed qualification tests, but it was soon discovered that the retraction mechanisms did not have enough force.
More tests showed that the arms would work well enough if smaller, 4.75-inch blocks were used instead. Formal drawings and installation procedures calling out 4.75-inch blocks were released. But the 5.5-inch blocks, never qualified to begin with and therefore not superseded, were not purged.
Misled by the temporary part number, a subcontractor installed three 5.5-inch blocks for this launch. Consequently two of the three mechanisms did not pull back fully, and the rocket failed to leave the ground.

[Figure: Simplified Ground Support Equipment Design. Labels: Crush Block, Mounting Plate, ¼”]

A Similar Incident
An instrument housing was difficult to fabricate, and the Material Review Board allowed several threaded holes to be shortened, but the length of the screws was not changed.
The screws breached the housing after two days in space, whereupon vacuum arcing disabled the unit.

Lessons Learned:
• Never accept deliveries of “flight-like” hardware without creating a link to the inventory
system to control its use.
• Develop 100% effective reachback mechanisms for Failure Review Board (FRB) and
Material Review Board (MRB) actions.
For more technical information, call Gary Shultz at (310) 336-2342.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 109
Ensure Safe Handling of Dangerous Equipment (I)

The Problem:
A rocket accidentally took off, killing a range technician.

The Cause:
The rocket was mounted in a test stand when operators checked the continuity of the motor igniter with a hand-held meter.
A voltage-reduction circuit in the meter’s battery pack delivered a trickle current, enough to verify the connections but not fire the pyros. Unfortunately, the battery fit well in the meter without the holder, and there was no warning on the holder.
Someone installed a new battery before the test but inadvertently left the pack on the workbench. Plugging the full-voltage gauge into the rocket set off the solid fuel and demolished the assembly area.
When word of this accident spread, spokespersons for two other facilities stepped forward to admit that they had also inadvertently launched rockets with the same meter!

[Figure: meter circuit in three configurations. As intended: battery pack with pinch resistor, micro-ampere test current, far below 1 W in the 1 Ω bridgewire (which only fires at 1 W). As incident occurred: 1.3 V battery alone, about 1.3 ampere. A safer design: battery with redundant resistor, calibration resistor, and Test/Cal switch]

Safety in Test Set Design
A redundant current limiter and an isolation switch should have been hard-wired into the meter. Better yet, squibs should be designed to allow tests with “arming” and “firing” inhibits in the first place, via the pyro box’s test ports, instead of directly across the bridgewire.

Lessons Learned:
• Conduct safety analysis of instrumentation that interfaces with dangerous equipment.
• Handling procedures should be checked with a non-flight (dummy) unit (see Lesson 63).
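
The arithmetic behind this incident can be checked with Ohm's law. The sketch below uses the figures quoted in this lesson (a 1.3 V battery and a 1 Ω bridgewire that fires at about 1 W). The 1 MΩ limiting resistance is an assumed, illustrative value, not the actual resistance of the meter's battery pack, and the function name is hypothetical.

```python
def bridgewire_power(v_batt, r_bridge, r_limit=0.0):
    """Return (current in A, power dissipated in the bridgewire in W)."""
    current = v_batt / (r_bridge + r_limit)
    return current, current ** 2 * r_bridge


FIRE_THRESHOLD_W = 1.0    # bridgewire fires at about 1 W (from the lesson's figure)
V_BATT = 1.3              # volts (from the lesson's figure)
R_BRIDGE = 1.0            # ohms (from the lesson's figure)
R_LIMIT_ASSUMED = 1.0e6   # ohms; hypothetical current-limiting resistance

# As intended: the limiting resistance keeps the test current tiny.
i_safe, p_safe = bridgewire_power(V_BATT, R_BRIDGE, R_LIMIT_ASSUMED)
# As the incident occurred: the bare battery sits directly across the bridgewire.
i_bare, p_bare = bridgewire_power(V_BATT, R_BRIDGE)

for label, current, power in (("limited", i_safe, p_safe), ("bare", i_bare, p_bare)):
    verdict = "FIRES" if power >= FIRE_THRESHOLD_W else "safe"
    print(f"{label:8s} I = {current:.3e} A  P = {power:.3e} W  -> {verdict}")
```

With the battery alone the bridgewire carries 1.3 A and dissipates about 1.7 W, above the firing threshold. With any large series resistance the dissipation falls many orders of magnitude below it.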
For more technical information, call Peter
Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222. Information on this incident can be found on line at
http://www.hql.or.jp/gpd/eng/www/nwl/n12/design.html.

Lesson 110
Ensure Safe Handling of Dangerous Equipment (II)
The Problem:
A rocket being prepared for launch unexpectedly ignited, injuring an engineer.

The Cause:
A safety plug is inserted to cut off the external power during range preparation. The plug normally hangs from the end of a non-conductive lanyard.
Unfortunately, someone replaced the lanyard with an electrical wire, creating a sneak path. During handling, brief electrical contact to the squib was made, and a loose safe-to-ground path failed to shunt the current away from the bridgewire. The motor ignited.

[Figure: launcher hardware and circuit, callouts (a) to (d) and (n) to (p). Labels: Spring Clip (Retains Rocket), Ground Contact, Isolated Ring, Igniter, Remote Control, Withdrawal Piston, Lanyard, Fin-to-Motorcase Roll Pin, Fire, Arm, Safety Tape over Igniter Pin, Spring-loaded J-Hook]

Circuit Analysis
During processing (a) a plug disables the firing circuit. At launch (b) the plug is pulled, permitting the “fire” command to be sent to the igniter, via the J-hook.
Attaching the plug to the piston with a conductive wire (n) allowed the tube frame to become charged. During handling, the J-hook touched the tube frame (o), connecting the battery to the squib. The conductive safety tape (p) should have diverted the current to the motor case via the fin (c) but could not, because the 400 ohms resistance in the roll pin was higher than the 1.5 ohms of the bridgewire (d).

Lessons Learned:
• Make sure everything that interfaces with dangerous equipment is safe (Lesson 109).
• Avoid relying on mechanical joints or hinges as electrical conductors; use conductive fastening wire instead, or bridge the joint with copper braid bonded to both surfaces.
• Use positive means to ensure that each interlock functions properly with test points, or implement high-impedance, fault-tolerant indicator circuits.
• Check safety-critical grounding, and watch out for sneak circuits.
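
Why the safety tape could not protect the squib follows from a simple current divider. The sketch below treats the shunt path (tape, fin, roll pin) and the bridgewire as two parallel resistors, using the 400 ohm and 1.5 ohm values quoted in this lesson. The 0.01 ohm "bonded" shunt is an assumed value for a healthy ground path, and the function name is hypothetical.

```python
def bridgewire_share(r_bridge, r_shunt):
    """Fraction of the total current that flows through the bridgewire
    when a shunt path is connected in parallel with it."""
    return r_shunt / (r_bridge + r_shunt)


R_BRIDGE = 1.5             # ohms, bridgewire (from the lesson)
R_SHUNT_INCIDENT = 400.0   # ohms, loose roll pin in the shunt path (from the lesson)
R_SHUNT_BONDED = 0.01      # ohms, hypothetical well-bonded shunt

share_incident = bridgewire_share(R_BRIDGE, R_SHUNT_INCIDENT)
share_bonded = bridgewire_share(R_BRIDGE, R_SHUNT_BONDED)

print(f"loose roll pin : {share_incident:.1%} of the current reaches the bridgewire")
print(f"bonded shunt   : {share_bonded:.1%} of the current reaches the bridgewire")
```

With the high-resistance roll pin in the path, nearly all of the stray current goes through the bridgewire, so the shunt offers almost no protection.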
For more technical information, call Ron Williamson at (310) 336-2149.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 111
Ensure Safe Handling of Dangerous Equipment (III)

The Problem:
An ejected pyro part narrowly missed test engineers.

The Cause:
Several refurbished pin pullers from earlier pyroshock testing were reused to verify deployables.
A fired pin broke its retaining cap, ricocheted off the flight hardware, and gouged the wall. Despite considerable safety precautions, the projectile might have caused serious injury.
The test devices used aluminum end caps not as robust as the flight units, and the thread on one of the caps apparently deformed during the first use. The thread damage was not noticed, however, in part because there was no inspection procedure for rebuilt hardware.

[Figure: Test Setup. Labels: Test Equipment Table, Impact Shield, 30 ft, Satellite Under Test, Unactuated Pin Puller, Pin, End Cap, Initiator Ports]

[Figure: separation nut. Labels: Separation Nut, Pyro Initiator and Booster, Protection Cup, Wire Damaged After Hitting Protection Cup]

Another Containment Issue on This Mission
An ejected separation nut hit its protection cup, severely bending the cables and causing multiple shorts to the chassis. Foam padding was subsequently added to the cup.

Lessons Learned:
• Test pyros remotely: even an impact shield could not fully protect the test engineers in this incident.
• Do not use refurbished pin pullers, even in test.
• Protect critical hardware from pyrotechnic hazards.
• Design release mechanisms to retain spent pyrotechnic parts.
• Make sure released parts cannot damage adjacent hardware or interfere with spacecraft operation.
For more technical information, call Selma Goldstein at (310) 336-1013.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 112
Understand the Subtle Behavior and Limitations of Commercial Off-The-Shelf (COTS)
Software

The Problem:
Communication with a Mars rover halted.

The Cause:
The Rover collected a large amount of science data for occasional transmission to an overhead relay spacecraft. A DOS-type utility indexed these files in the random-access memory (RAM), using more memory than expected.
The amount of data acquired by the Rover burgeoned, but housekeeping telemetry did not report the memory status. Soon, there were too many files for the RAM to handle, and the computer turned itself off. Because the file system reload required as much memory as that cached before the shutdown, the computer could not reboot.
The start-shutdown cycle repeated over sixty times until the batteries ran down. The system then entered a “crippled mode” wherein the computer rebooted without loading the file indexes. With the RAM freed up, controllers cleaned up the files and restarted the computer normally.

[Figure: memory map. Labels: 32 Mbytes, 96 Mbytes, Specific Use Buffers, Unallocated Memory (Exhausted), Allocated Buffer, Data, Code]

Too Many Files Exhausting Memory
The file management utility used memory based on the number of files in each subdirectory, including both deleted files and metafiles on science data. The software engineers failed to account for this overhead and underestimated memory usage.
Although a design rule prohibited flight software from reaching into the free RAM once initialization completed, the file-management utility was allowed to violate this rule in part because the memory usage was mistakenly thought of as small.
Controllers attempted to upload a utility to remove unnecessary folders (merely deleting files would not suffice). Unfortunately, the link failed and the anomaly occurred before the resend could complete. Several other esoteric problems subsequently combined to impede autonomous recovery.

Lessons Learned:
• Provide a way out of endless reboot cycles, and avoid start sequences that require poten-
tially unavailable resources (Lesson 79).
• Track computer resource usage, just like other vehicle consumables.
• Ensure the computer can degrade gracefully, instead of freezing up catastrophically.
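
The first and third lessons can be illustrated with a boot routine that checks its memory budget before loading the file index and falls back to a degraded mode after repeated failures. This is a minimal sketch, not the rover's flight software: the function names, the 64 bytes per index entry, the reserve, and the retry limit are all assumed values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BootResult:
    mode: str            # "normal" or "degraded"
    index_loaded: bool


def index_memory_needed(n_entries, bytes_per_entry):
    """Memory the file index would occupy. n_entries should count every
    directory entry, including deleted files and metafiles."""
    return n_entries * bytes_per_entry


def boot(n_entries, free_bytes, failed_boots,
         bytes_per_entry=64, reserve_bytes=1_000_000, max_failed_boots=3):
    """Decide how to boot without committing memory that may not exist."""
    needed = index_memory_needed(n_entries, bytes_per_entry)
    too_many_retries = failed_boots >= max_failed_boots
    over_budget = needed > free_bytes - reserve_bytes
    if too_many_retries or over_budget:
        # Degraded mode: come up without the index so operators can clean up.
        return BootResult(mode="degraded", index_loaded=False)
    return BootResult(mode="normal", index_loaded=True)


print(boot(n_entries=1_000, free_bytes=32_000_000, failed_boots=0))
print(boot(n_entries=1_000_000, free_bytes=32_000_000, failed_boots=0))
print(boot(n_entries=1_000, free_bytes=32_000_000, failed_boots=3))
```

The design choice is that the degraded path needs none of the resource that caused the failure, so the reboot loop ends after a bounded number of attempts instead of when the batteries run down.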
For more technical information, call Joe Anselmi at (310) 336-7326.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 113
Analysis Must Make Room for Source Data Uncertainties

The Problem:
A new launch vehicle broke up.

The Cause:
The guidance and control design, inherited from hypersonic airplanes, performed well in the heritage vehicle. But the new vehicle was larger.
An inexperienced engineer verified the robustness of his autopilot design, using a sensitivity analysis with key parameters varied by 50% to account for aerodynamic uncertainties, a standard airplane design practice.
The rudder should be steered to keep a key parameter (change in the yaw moment due to sideslip angle, Cnβ) positive. Embedded in the Cnβ are two large uncertainty sources which combined in a way to yield an expected Cnβ of 0.03.
The engineer “proved” his design by applying a 50% perturbation (±0.015) on this small number, and did not ask someone with more launch vehicle experience to check his calculations. Unfortunately, the true Cnβ turned out to be about -0.03, or 200% off, creating a load that overwhelmed the rudder.
A small tweak in the feedback gain could have saved the mission.

(+20 ± 5) + (-19 ± 5) = +1 ± ?
When two numbers combine, their variances do not cancel out each other, therefore the uncertainty range could be much larger than the nominal result.
To estimate variance, we should:
1. Identify the distribution of the underlying data by, for example, examining their histograms or probability plots.
2. Conduct Monte Carlo simulations.
The spread in the above equation depends on, among others, whether 20±5 implies a normal dispersion (24 is possible but less likely than 21) or uniform distribution (any value between 15 and 25 is equally likely). For sure, the uncertainty could far exceed 0.25 (25%)!

Lessons Learned:
• Compute variations in derived numbers according to correct mathematical rules.
• Make sure the uncertainties in models make physical sense, especially at “tipping points.”
• Launch vehicles are more challenging than airplanes.
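
The two-step recipe in this lesson (pick a distribution, then run a Monte Carlo simulation) can be sketched for the (+20 ± 5) + (-19 ± 5) example. The interpretation of "±5" is an assumption in each case: one standard deviation for the normal case, and hard limits for the uniform case. The function names, trial count, and seed are illustrative.

```python
import random
import statistics

HALF_RANGE = 5.0  # the "±5" in the example


def normal_draw(rng, nominal):
    # Assumption: "±5" is read as one standard deviation.
    return rng.gauss(nominal, HALF_RANGE)


def uniform_draw(rng, nominal):
    # Assumption: "±5" is read as hard limits, every value equally likely.
    return rng.uniform(nominal - HALF_RANGE, nominal + HALF_RANGE)


def simulate_sum(draw, trials=100_000, seed=1):
    """Monte Carlo samples of (+20 ± 5) + (-19 ± 5)."""
    rng = random.Random(seed)
    return [draw(rng, 20.0) + draw(rng, -19.0) for _ in range(trials)]


def summarize(samples):
    """Return (mean, standard deviation, fraction of samples below zero)."""
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples)
    negative = sum(s < 0.0 for s in samples) / len(samples)
    return mean, spread, negative


normal_stats = summarize(simulate_sum(normal_draw))
uniform_stats = summarize(simulate_sum(uniform_draw))

for label, (mean, spread, negative) in (("normal", normal_stats),
                                        ("uniform", uniform_stats)):
    print(f"{label:8s} mean={mean:+.2f}  std={spread:.2f}  P(sum<0)={negative:.2f}")
```

Analytically, the standard deviation of the sum is 5√2 ≈ 7.07 under the normal reading and √(50/3) ≈ 4.08 under the uniform reading. Either way the spread is several times the nominal +1, and a substantial fraction of outcomes have the opposite sign, which is the situation the Cnβ estimate was in.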
For more technical information, call Miguel De Virgilio at (310) 336-6245.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 114
Identify All Exposed Circuits to Preclude Inadvertent Shorts

The Problem:
A fuse on a pin puller repeatedly blew during tests.

The Cause:
The program engineers asked the pin puller vendor if internal shorting was possible and were given assurance that the widely used design was robust. Suspecting intermittent shorts in the cables but unable to find a smoking gun, the engineers made several worst-case repairs. Still, the fuse continued to blow.
During subsequent troubleshooting, as the pin puller was reset, the oscilloscope registered an overcurrent, and an operator saw a flash. An inspection of the reset tool revealed seven burn marks!
It turned out that a set of interrupt switches inside the pin puller had exposed electrical contacts. Unaware of this design feature, the vendor supplied a metal reset tool, and the program wired the switches hot (28V). Each time the metal reset tool inadvertently bridged live contacts to the grounded housing, the fuse was blown.

[Figure: Paraffin Actuated Pin Puller (Simplified), with As-wired and Should Be circuits. Labels: Pin Puller, interrupt switches A and B, paraffin (c), shaft (d), reset pin (e), Interrupt Switch, 28V, relay K, transistor F, Reset Tool]

Paraffin Actuated Pin Puller (Simplified)
Application of power melts paraffin (c), whereupon the shaft (d) retracts until it trips the interrupt switches (A and B) and breaks the circuit. The switches reclose after the wax cools.
During reset, latching relay K was closed and transistor F was set ON to resoften the paraffin. The program then turned transistor F OFF before inserting the reset pin (e), making it possible for the fuse to blow because relay K still connected the switches to the high side.
Changing the material of pin e to an insulator eliminated the fuse problem.

Lessons Learned:
• Some paraffin-actuated pin pullers have exposed internal contacts that should be connected to the ground side.
• Mark exposed contacts in drawings to highlight handling hazards.
• Ensure test safety: use care when handling energized hardware.
• Continue troubleshooting until all alternative fishbones have been ruled out.
For more technical information, call Tom Hecht at (310) 336-1505.
For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.
Lesson 115
Guard Against Subtle Timing Conflicts in Fast Circuits

The Problem:
A payload test was plagued by a series of intermittent computer lock-ups at elevated temperatures.

The Cause:
After nine months of troubleshooting, the root cause was traced to a subtle chip design flaw: an internal bus was left floating instead of being driven to zero. Normally this would not be a problem, but when the computer is running hot, the address and data left on the bus can be interpreted as a write to a configuration register, causing memory access errors.
The circuit was originally developed for 13 MHz applications, and the design flaw had never caused trouble. But the new payload had a 20 MHz application that narrowed the timing window. Even then, the qualification unit did not malfunction because its circuit timing match was just right.

[Figure: Simplified Failure Scheme. Labels: Multi-chip Module, SRAM, Processor (n), controllers (o, p), high-priority task (q), Internal databus, Output port]

Simplified Failure Scheme
Processor (n) outputs data via two controllers (o, p). When a high-priority task (q) intervenes, p should drive the internal bus to zero and await o to resend. Because the chip design left the internal bus floating, charges stored on the signal and control lines had to instead bleed off through the internal resistor networks.
Higher temperature increased resistance and slowed the charge dissipation. If the decay exceeded two 50 nanosecond clock cycles, the internal register in controller o would corrupt, whereupon the computer halts.

[Figure: Timeline of charge decay, signal level versus time. Curves: forced to decay (should be); decay time at ambient and decay time at hot (as implemented). Reference level: Logic 0 Transition Level]
Lessons Learned:
• Actively drive control signals (for good measure also drive data and address lines) to zero
when de-asserting a bus operation.
• Be aware that analog effects can sometimes change the behavior of digital circuits.
• Anticipate subtle timing problems as clock speed increases.
• Develop thorough software/hardware interface specifications and sophisticated circuit
analysis tools.
• Check temperature-dependent circuit behavior.
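
The interaction between clock speed and temperature can be sketched with a first-order RC discharge, t = R·C·ln(V_start / V_threshold), compared against the two-clock-cycle window described in this lesson. Only the 13 MHz and 20 MHz clock rates and the two-cycle limit come from the lesson; the resistance, capacitance, voltages, and the 60% hot-case resistance increase are assumed values picked to illustrate the mechanism.

```python
import math


def decay_time_ns(r_ohm, c_farad, v_start, v_threshold):
    """Time for a floating node to bleed from v_start down to v_threshold
    through resistance r_ohm (first-order RC discharge)."""
    return r_ohm * c_farad * math.log(v_start / v_threshold) * 1e9


def window_ns(clock_hz, cycles=2):
    """Time available before the stale bus contents are sampled."""
    return cycles / clock_hz * 1e9


# Hypothetical circuit values.
C_BUS = 10e-12                   # farads
R_AMBIENT = 5_000.0              # ohms
R_HOT = R_AMBIENT * 1.6          # resistance assumed to rise 60% when hot
V_START, V_LOGIC0 = 3.3, 0.8     # volts

t_ambient = decay_time_ns(R_AMBIENT, C_BUS, V_START, V_LOGIC0)
t_hot = decay_time_ns(R_HOT, C_BUS, V_START, V_LOGIC0)

for clock_hz in (13e6, 20e6):
    limit = window_ns(clock_hz)
    for label, t in (("ambient", t_ambient), ("hot", t_hot)):
        status = "ok" if t < limit else "CORRUPTS"
        print(f"{clock_hz / 1e6:4.0f} MHz  {label:7s} decay={t:6.1f} ns  "
              f"window={limit:6.1f} ns  {status}")
```

Under these assumed values the hot decay fits inside the 13 MHz window but overruns the 20 MHz window, which mirrors why the heritage application never saw the flaw. Actively driving the bus to zero removes the dependence on the decay time altogether.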
For more technical information, call Lee Mendoza at (310) 336-5547.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 116
Thoroughly Analyze and Test Deployables (II)

The Problem:
A secondary payload failed several months after launch.

The Cause:
During launch the gimbaled telescope was locked to the host satellite with two caging brackets to prevent rotation, and subsequently was released by thermal actuators. After swinging out, the caging arms were supposed to signal full retraction by closing a spring-actuated switch. Unfortunately, the spring was too heavy and held up the caging arms. With each gimbal rotation, the partially retracted arms rubbed against a dangling gimbal cable. Eventually the cable wore through, shorting out most of the payload electronics.
The switch was added late in the design cycle, and the change review process overlooked the excessively heavy spring. During ground deployment, nobody noticed that the arms did not have enough torque because gravity aided the caging arms’ motion.
The cable had excessive slack and was bonded to the gimbal. During launch the cable shook loose, allowing it to rub against the caging arms.

[Figure: Simplified Payload Diagrams (1, 2) and Interference Scenario (3). Labels: Azimuth axis, Elevation axis, Gimbal Controller, Host Satellite, Caging brackets with torsional retract springs, Spring-actioned microswitch, Partially retracted caging arm]

Lessons Learned:
• Pay special attention to the operation of complex mechanisms in zero-G (Lessons 20, 42).
• Make sure errant movement or expansion of flexible materials, such as wires and thermal blankets, will not jam mechanisms (Lesson 78).
For more technical information, call Ron Williamson at (310) 336-2149.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 117
Carefully Plan for Controlled Mission Shutdown

The Problem:
A satellite lost attitude control for two days.

The Cause:
The satellite’s secondary payload suffered a massive short (Lesson 116). The short blew a power bus fuse on the secondary payload. Unfortunately, a design oversight made it impossible for ground controllers to switch off the gimbal motor, which was on another circuit. The gimbal continued to spin for a few months until halted by cold bearings.
Somehow, the dormant gimbal awakened two years later with a vengeance, accelerating and decelerating randomly, reaching spin speeds well in excess of its design limit, creating momentum that saturated the host satellite’s attitude controllers. The host vehicle lost Earth lock.
Operators were relieved when the gimbal “mercifully” stalled two days later, sparing the primary mission.

[Figure: Simplified Power Schematics, As Designed and Should Be. Labels: Payload Power Bus, Host Satellite, EL Controller, Relay Controller, Gimbal Controller, Gimbal Motor, callouts (c) to (f)]

Simplified Power Schematics
After a short (c) blew a fuse (d), operators could not turn off the gimbal drive by opening the latching relay (e) because the relay controller lost power. The relay controller should have been separately fused (f).

A Related Incident
A spacecraft lasted many years beyond its initial short mission because its battery performed surprisingly well. The satellite’s beacon, which could not be turned off, caused persistent radio-frequency interference.

Lessons Learned:
• Conduct a detailed failure mode and effect analysis for every fuse and relay.
• Make sure that a secondary payload cannot possibly endanger the primary mission. Consider a host-controlled “kill switch.”
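
A fuse-by-fuse failure mode sweep of the kind recommended here can be sketched in a few lines: blow each fuse in turn and list the loads that stay powered while the controller able to switch them off is dead. The data model and every component and fuse name below are hypothetical; the lesson only states that the gimbal motor was on a different circuit from the fuse that fed the relay controller.

```python
def stuck_loads(loads, controller_supply, blown_fuse):
    """List loads that stay powered but can no longer be commanded off.

    loads: load name -> (fuse feeding the load, controller that switches it off)
    controller_supply: controller name -> fuse feeding that controller
    """
    stuck = []
    for name, (load_fuse, controller) in loads.items():
        load_still_powered = load_fuse != blown_fuse
        controller_dead = controller_supply[controller] == blown_fuse
        if load_still_powered and controller_dead:
            stuck.append(name)
    return stuck


def fmea(loads, controller_supply, fuses):
    """Blow each fuse in turn and report the loads left running uncontrolled."""
    return {fuse: stuck_loads(loads, controller_supply, fuse) for fuse in fuses}


# Hypothetical model of the two schematics.
LOADS = {"gimbal_motor": ("motor_fuse", "relay_controller")}
AS_DESIGNED = {"relay_controller": "payload_bus_fuse"}
SHOULD_BE = {"relay_controller": "relay_controller_fuse"}

as_designed_report = fmea(LOADS, AS_DESIGNED,
                          ["payload_bus_fuse", "motor_fuse"])
should_be_report = fmea(LOADS, SHOULD_BE,
                        ["payload_bus_fuse", "motor_fuse", "relay_controller_fuse"])

print("as designed:", as_designed_report)
print("should be  :", should_be_report)
```

In the as-designed model, losing the payload bus fuse leaves the gimbal motor running with no way to stop it. Separate fusing removes that coupling, although the sweep also shows that the controller's own fuse then deserves the same scrutiny.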
For more technical information, call Ron Williamson at (310) 336-2149.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 118
Ensure Subcontracted Tasks are Properly Verified

The Problem:
A satellite launch was delayed by 18 months.

The Cause:
Numerous traveling wave tube amplifiers (TWTAs) were attached to heatpipe flanges at a subcontractor shop.
The flanges, embedded in honeycomb panels, had to be reworked. The subcontractor applied a hard foam adhesive, inadvertently making it difficult for the rivnuts to swage. When the TWTAs were firmly torqued into the rivnuts, some rivnuts came loose.
Defects in rivnut installations are difficult to see. The prime contractor required elaborate inspection of in-house rivnut installations, but did not require the subcontractor to do so, nor did it conduct receiving inspections.
When a TWTA was removed for rework later on, a loose rivnut was found. All rivnuts, not just the ones associated with the TWTAs, were called into question. Hundreds of attachment points on this satellite had to be replaced.

Rivnuts
A rivnut is an internally threaded fastener designed to swage (mushroom out), thereby securely attaching itself to the hole. A countersunk rivnut should remain flush with or below the hole when torqued.

[Figure: Simplified Problem Area Schematic, Reworked Structure versus Nominal Build. Labels: Swage, Rivnut, Raised Rivnut, Honeycomb Facesheet, TWTA Attachment Plate, Heat Pipe Flange, Thermal Interface, Rework Adhesive, Honeycomb Core]

Lessons Learned:
• Make sure the intent of critical design requirements is followed on subcontracted tasks,
especially when specialized “tribal knowledge” is involved.
• Conduct sufficient engineering reviews on repairs and reworks.
For more technical information, call Harry Yerondopoulos at (310) 336-3375.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 119
Ensure Safe Handling of Dangerous Equipment (IV)
The Problem:
A launcher caught fire on the pad, killing twenty-one people.

The Cause:
The rocket was inadvertently ignited by electrostatic discharge (ESD). The crew, accustomed to a tropical environment, was not ESD-conscious. Unable to readily procure shielded detonator cables, they used unshielded cables instead. But just prior to launch, cold and dry air was continuously pumped through the stack, which was shrouded with non-conductive plastic, making conditions ripe for ESD.
Previous versions of the rocket had safe-arm devices to ensure an accidental flash could not spread, but defective safe-arm devices cost an earlier launch.
Rather than thoroughly addressing why the flight units did not work, the program moved the safing function out of the vehicle and into a ground support equipment (GSE) relay box. Unfortunately, the GSE design was flawed, and an ESD event set the rocket off.

[Figure: Simplified Ordnance System Schematic. Labels: Relay Command Panel (FIRE, ARM, SAFE), Ordnance Panel (SAFE, ARM), 100 kΩ, Unshielded Cables, Igniter, 1Ω Bridgewire, Start Cartridge, Motor Case, Ground Support Equipment (GSE), Vehicle]

[Figure: Accidental Ignition Sequence, same schematic with callouts (n) to (s)]

Accidental Ignition Sequence
ESD introduced an arc in the pyro circuits (n). Impeded from grounding by the two 100 kΩ resistors (o) used for continuity checking, the voltage spike jumped to the grounded detonator case (p) instead, setting off the explosive coating the bridgewire (q). In the absence of a safe-arm device downstream, the accidental flash propagated (r and s).
The SAFE connector should have been grounded.

Lessons Learned:
• Heed Range guidelines on pyro handling.
• Ensure safe design in GSE.
• Follow MIL-HDBK-83578, Criteria for Explosive Systems and Devices Used on Space
Vehicles.
• Correct systemic quality problems after failures—do not just remove the symptom.

For more technical information, call Ron Williamson at (310) 336-2149.


For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.

Lesson 120
Do Not Dismiss Test Anomalies As Random Events—Find Out Why (IV)

The Problem:
A payload could not separate from the launch vehicle.

The Cause:
The laser-initiated squib circuit consisted of a test current and a separately generated FIRE current. Apparently, the FIRE controller chip broke a pin because foam protecting the circuit board from vibration had been removed “to improve producibility.”
Similar chip failures had occurred three times during tests, yet the contractor did not recognize that, without the foam, the board would excessively deflect and cause harm. Unfortunately, the pin damage was not caught before launch because the FIRE current was not checked during or after ground vibration.

[Figure: Simplified Laser Firing Unit Schematics. Labels: Laser Firing Unit, FIRE Command IN, Power Unit, 1.4 A Firing Current, 0.4 A Test Current, Combined Firing Output, Laser Diodes, Fiber Optics, Initiators, BIT IN, BIT/Matrix Decode, BIT Receiver, BIT OUT]

Simplified Laser Firing Unit Schematics
The test circuit generated a faint laser glow to verify command logic. But the main firing unit itself was not tested for fear of setting off the squibs.
Besides building in redundancy, the designers should have incorporated a switch to shunt the fire current into the “BIT check” loop, so the entire circuit could be verified.

[Figure: FIRE Controller Chip Vibration Response, G²/Hz (log) versus Hz (log). Panels: Previous Design: Board Stiffened with Foam; New Design: Foam Removed (showing a resonance)]

Lessons Learned:
• Always determine the root causes of ground test anomalies (Lessons 55, 56, and 106).
• Make sure flight circuit paths are intact after environment tests.
• Avoid single string designs on critical functions.
• Thoroughly analyze dynamic loads prior to vibration test.
• Fully instrument units under test: equipment that operates during flight should be vibrated on ground with power on.
For more technical information, call Peter Carian at (310) 336-8215.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.

Lesson 121
Prevent Nonmagnetic Materials from Becoming Magnetized

The Problem:
A satellite fired its attitude-control thrusters too often, depleting its fuel.

The Cause:
The failure was traced to the propellant tank, which developed a dipole moment that torqued the satellite to align with the Earth’s magnetic axis, overtaxing the thrusters.
The tank was made of annealed stainless steel, normally thought of as nonmagnetic. Apparently, the sheet metal became magnetized either while being worked into the hemispheric shape, or when exposed to an external magnetic field, perhaps by coming into contact with magnetized equipment.
The tank was supposed to be made of titanium, but a switch to stainless steel had to be made due to schedule deadlines. Unfortunately, the possibility of magnetization did not occur to anybody, otherwise a simple degaussing would have averted the failure.

[Figure: Daily Average Magnetic Field (nanotesla) and Daily Average Yaw Momentum (Nms) versus Days Under Monitoring, days 1 to 19]

Pinpointing the Failure Mechanism
The anomaly was at first blamed on leaks in the propulsion line. Later on, the anomalous torque was found to correlate with variations in the Earth’s magnetic field. Moreover, the roll error shot up during magnetic storms. The fuel tank’s engineering model was subsequently tested and found to have a large dipole moment.

A Similar Incident
A spacecraft developed attitude-control trouble because two thruster valves were mistakenly wired in the same polarity, causing their residual magnetic moments to add up instead of canceling.

Lessons Learned:
• Be aware that iron or nickel alloys may become magnetized when work-hardened or exposed to a magnetic field.
• Conduct magnetic testing on end items made from ferromagnetic materials.
• Minimize magnetic contamination in manufacturing facilities.
• Monitor satellite field exposure with magnetic sensors.
For more technical information, call Geoffrey Smit at (310) 336-1602.
For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.

Lesson 122
Minimize Residual Magnetism on Delicate Mechanisms

The Problem:
A shutter failed system-level test.

The Cause:
The shutter was supposed to be closed most of the time, using a solenoid to magnetically hold the parts together. In the event an anomaly were to cut the power off, a spring would pull the parts apart and keep the aperture open. Gaps among the parts had to be tiny to minimize current draw, therefore it was vital to prevent residual magnetism from upsetting the delicate force balance.
All prototypes appeared to work well. However, because the flight unit was meticulously assembled to ensure the tightest fit, it developed much more residual magnetism. But nobody knew: the test script cycled the relays before power-off, unintentionally creating a reverse current spike that degaussed the solenoid.
During final test, which did not spuriously degauss, the shutter jammed. Unfortunately, it was too late for repair, and operation suffered.

[Figure: Mechanism of Shutter Operation. Labels: Stator, Rotor, Latching Tab, Closeout, Electromagnetic Coil, Restoring Spring Force, Frictional Force FF, Magnetic Force FM, Gap (L), FM ∝ 1/L²]

A Similar Incident
A solenoid actuator failed because its friction plate, made of Precipitation Hardened (PH) stainless steel, became magnetized. PH steel is prone to magnetic hysteresis, which varied from unit to unit more than expected.

Lessons Learned:
• Conduct tolerance analysis and specify ranges on units sensitive to subtle dimension changes (caused by, for example, thermal mismatch, creep, and manufacturing variations). Avoid unforgiving designs.
• Use positive means (such as non-magnetic plating) to control gap widths in magnetic circuits.
• Make sure test programs do not mask equipment flaws.
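
The sensitivity to gap width follows from the FM ∝ 1/L² relation shown in this lesson's figure. The sketch below assumes, for illustration only, that the residual holding force obeys the same inverse-square law, and compares it with a fixed spring force. All force values are hypothetical and in arbitrary units, and the function names are not from the source.

```python
def relative_magnetic_force(gap, reference_gap):
    """Force relative to its value at the reference gap, using F_M ∝ 1/L²."""
    return (reference_gap / gap) ** 2


def shutter_opens(spring_force, residual_force_at_reference,
                  gap, reference_gap, friction_force=0.0):
    """True if the spring overcomes residual magnetism plus friction at power-off."""
    residual = residual_force_at_reference * relative_magnetic_force(gap, reference_gap)
    return spring_force > residual + friction_force


# Hypothetical numbers: the flight unit's gap is 30% tighter than the prototype's.
prototype_opens = shutter_opens(1.0, 0.5, gap=1.0, reference_gap=1.0, friction_force=0.2)
flight_opens = shutter_opens(1.0, 0.5, gap=0.7, reference_gap=1.0, friction_force=0.2)

print("force gain at 70% gap:", round(relative_magnetic_force(0.7, 1.0), 2))
print("prototype opens:", prototype_opens)
print("flight unit opens:", flight_opens)
```

Because the force grows with the inverse square of the gap, a modest tightening is enough to flip the balance against the spring, which is why the lesson calls for a tolerance range on the gap rather than simply the tightest achievable fit.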
For more technical information, call John Bohner at (310) 336-1772.
For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.

Lesson 123
Double-Check Start-Up Behavior of Digital Electronics

The Problem:
The first Shuttle launch was aborted at T-20 minutes.

The Cause:
The delay was traced to a subtle software change which created an improbable timing overflow: it would only happen about once every 100 system starts.
The Shuttle uses the most rigorous process to ensure neither hardware nor software flaws could endanger the crew. Yet ground testing could not physically verify boot-up behavior often enough (simulations usually cycle from “checkpoints” to avoid having to initialize the computer each time) and only once was the problem seen in the lab.

[Figure: Simplified C&DH Architecture. Labels: Primary SW (Computers 1-4), Backup SW (Computer 5), Astronauts, Data Bus, Sensors/Instrument, Normal; "If primary fails, astronauts command switch over"; Phase-synchronized, turned on at T-30 h; Not phase-synchronized, turned on at T-20 m]

Simplified C&DH Architecture
The primary software, hosted on four identical computers, is backed up by independently developed software in a fifth computer. The primary operating system is asynchronous (interrupt driven); the backup is synchronous (time-slotted), which is less flexible but easier to validate.
To allow the backup system to “listen in” at real-time, the primary phase-synchronizes most cyclic processes with an initialization operation (n) followed by phase polling of various processes (o).

Lessons Learned:
The initialization subroutine was
used in another, more demanding
• Perform worst-case power-on tim- First 40 ms Cycle application and changed to allow
ing analysis, paying special more time (+δ) to initiate. This
attention to delays in power con- change reduced the time available
verter and power reset circuits, Typical Duration for polling. On rare occasions,
where most timing variations Worst case polling (a stochastic process)
occur. could not complete in the first

cycle, and an extra cycle count
• Use high-fidelity test beds, and was automatically added.
add 20-30% timing margins to the The problem was not detected because everything
software specifications. remained internally consistent. But then a unit, not
designed to be phase synchronized, came on-line at T-20
• Document all hardware/software
minutes. The new unit could not reconcile with this extra
interface requirements, and perform cycle, forcing the abort.
exhaustive testing.
For more technical information, call
Richard Covington at (310) 336-3232.

For comments on the Aerospace Lessons Learned Program, including background specifics, call
Paul Cheng at (310) 336-8222.
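
The timing overflow described in this lesson can be mimicked with a small Monte Carlo sketch: a fixed initialization time, an added delay δ, and a random polling time must all fit in the first 40 ms cycle. Only the 40 ms cycle length comes from the lesson. The uniform polling distribution, the 10 ms initialization time, the 0.2 ms delay, and the function names are assumptions chosen so that the overflow rate lands near 1 in 100; they are not Shuttle data.

```python
import random

CYCLE_MS = 40.0  # length of the first cycle, per the lesson


def overflow_rate(init_ms, delta_ms, poll_min_ms, poll_max_ms, starts, seed=0):
    """Fraction of simulated starts in which polling spills past the first cycle."""
    rng = random.Random(seed)
    overflows = 0
    for _ in range(starts):
        # Polling is modeled as a stochastic duration (assumed uniform here).
        polling_ms = rng.uniform(poll_min_ms, poll_max_ms)
        if init_ms + delta_ms + polling_ms > CYCLE_MS:
            overflows += 1
    return overflows / starts


def has_margin(init_ms, delta_ms, poll_max_ms, margin=0.20):
    """True if the worst case still fits after adding a fractional timing margin."""
    worst_case_ms = init_ms + delta_ms + poll_max_ms
    return worst_case_ms * (1.0 + margin) <= CYCLE_MS


# Hypothetical timings: 10 ms initialization, polling uniformly 10-30 ms.
before = overflow_rate(10.0, 0.0, 10.0, 30.0, starts=100_000)
after = overflow_rate(10.0, 0.2, 10.0, 30.0, starts=100_000)
print(f"overflow rate before change: {before:.4f}")
print(f"overflow rate after +0.2 ms: {after:.4f}")
print("20% margin met before change:", has_margin(10.0, 0.0, 30.0))
```

In this toy model the unmodified routine never overflows, yet it already fails the 20% margin check, so a worst-case analysis would have flagged the design as fragile before the small added delay turned it into a roughly 1-in-100 start-up fault.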

Lesson 124
Double-Check Start-Up Behavior of Digital Electronics (II)

The Problem:
A payload computer occasionally locked up during ground testing.

The Cause:
Extensive troubleshooting traced the seemingly random halts to an improbable event in the computer. The flaw was not found earlier because lower-level tests did not cycle the hardware on and off frequently enough or adequately exercise the test scripts.

Early on, the computer vendor had notified the prime that the operating system’s boot sequence should be updated to avoid having a timing glitch cause parity errors in a controller’s registers. The prime, which extensively tailored the algorithms in order to integrate the computer to the payload, did not fully understand the problem and applied its own fix, which turned out to be inadequate.

[Figure: Simplified Failure Scheme. Boxes: computer boots with improper code; occasional parity error; exception not handled well; application runs with interrupt enabled; CPU status register K0 corrupted; application needs to write CPU status to bit K0; another interrupt happens to be pending; CPU illegally attempts to read from EEPROM; computer halts. Legend: infrequent, deterministic. Notice three infrequent events have to converge for the computer to halt.]

Lessons Learned:
• Develop thorough test cases to verify system performance.
• Include sufficient on-off cycling and long run times during computer testing to expose
subtle hardware/software interaction quirks.
• Heed vendor alerts.
For more technical information, call Lee Mendoza at (310) 336-5547.
For comments on the Aerospace Lessons Learned Program, including background specifics,
call Paul Cheng at (310) 336-8222.
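
To give a feel for how much on-off cycling is "sufficient", the sketch below estimates how many power cycles are needed before a rare halt is likely to appear at least once. It assumes the three infrequent events in the failure scheme are independent and that every power cycle is an independent trial. The per-cycle probabilities and function names are hypothetical; the lesson gives no rates.

```python
import math


def joint_probability(event_probs):
    """Chance that all events coincide on one power cycle, assuming independence."""
    return math.prod(event_probs)


def cycles_needed(p_fail, confidence=0.95):
    """Smallest number of power cycles giving at least `confidence` chance of
    seeing one or more failures, if each cycle fails with probability p_fail."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


def detection_probability(p_fail, cycles):
    """Chance of seeing at least one failure in `cycles` independent power cycles."""
    return 1.0 - (1.0 - p_fail) ** cycles


# Hypothetical per-cycle probabilities for the three converging events.
p_halt = joint_probability([0.2, 0.1, 0.05])
print(f"per-cycle halt probability: {p_halt:.4f}")
print("cycles for 95% detection:", cycles_needed(p_halt))
print(f"chance of detection in 20 cycles: {detection_probability(p_halt, 20):.3f}")
```

Under these assumptions a few dozen power cycles would almost never reveal the halt, which is consistent with the lesson's point that lower-level tests did not cycle the hardware often enough to expose it.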

INTERNAL DISTRIBUTION LIST
REPORT TITLE

Five Common Mistakes Reviewers Should Look Out For

REPORT NO. PUBLICATION DATE SECURITY CLASSIFICATION


TOR-2007(8617)-1 29 June 2007 Unclassified
1. FOR OFF-SITE PERSONNEL, SHOW LOCATION SYMBOL, e.g., JOHN 2. IF LIST IS ALTERED, INITIAL CHANGE(S) AND SHOW AFFILIATION
Q. PUBLIC/VAFB *FOR SECRET REPORTS, SHOW BLDG AND ROOM, NOT MAIL STATION
NAME (Include Initials) MAIL CODE* NAME (Include Initials) MAIL CODE*

Anthony S. Abbott M5/681 Jean Breedlove M5/687


Frederic J. Agardy M8/018 Steve J. Breese M8/693
Joseph A. Aguilar M4/919 John P. Brekke M1/123
William H. Ailor M4/066 Alfred O. Britting Jr M5/596
David J. Albert M6/642 Mark A. Brosmer M1/122
Sergio J. Alvarado M1/113 Francis J. Brown M1/039
Andrea L. Amram M4/954 Robert M. Broussard PETERSON
James V. Anderson M6/225 Donald M. Brueck M4/974
Nancy S. Andreas M5/641 Angelia P. Bukley ACP-760
Bruce L. Arnheim M6/203 Stephen E. Burrin M1/340
Graham S. Arnold M5/560 Heinz L. Butner M1/122
Grant C. Aufderhaar CH1/510 Asya Campbell M8/219
Russell E. Averill CH2/404 Michael L. Campbell M1/101
Wanda M. Austin † CH1/610 Diana L. Cannon M1/372
Neal K. Baker SLVR Alan L. Caraway M6/227
A D. Barnard CH1/220 I-Shih Chang M4/967
David W. Bart M4/903 Bernard W. Chau CH1/530
Lee T. Bavaro M6/214 Patrick S. Cheatham CH1/430
Michael J. Baxter M4/943 Paul G. Cheng (15) † M4/905
Stephen S. Bayliss M4/944 Kenneth R. Childers SCHRIEVER
David A. Bearden PAS Andrew B. Christensen M2/254
Steven M. Beck M2/253 David A. Christopher M4/940
Randall S. Beezley M1/023 James B. Chudoba CH2-402
Kevin D. Bell ROS Richard H. Chui M8/234
Kirstie L. Bellman M6/214 William C. Clair CH1/640
John R. Berg M5/682 Donald T. Clark M2/334
Jay M. Bernard M5/576 John E. Clark M5/689
Trudy L. Bergen A4-413 Ronald B. Cohen M5/754
Carl D. Billingsley M4/021 Rich F. Coleman CH2/309
Jonathan F. Binkley M4/949 Robert F. Colwell CH2/220
Harlan F. Bittner M5/565 Allen Compito CH2/209
J. Bernard Blake M2/259 James O. Covington A4/413
Nikolas A. Bletsos M4/975 Jeffrey E. Crawford CH1/450
Walter L. Bloss III M2/244 Charles K. Cretcher CH1/540
John S. Bohlson M8/715 Lori B. Crosse M5/658
Michael L. Bolla M5/658 Melvin M. Cutler M5/655
Vincent C. Boles CH1/510 Stephanie B. Danahy CH1/540

Theodore H. Davey M1/177 Robert L. Feeley M1/023
Gary W. Dahlen COS Joseph F. Fennell M2/259
Glenn A. Davis CH2/209 Wayne R. Fenner M6/204
Leslie G. De Long M1/016 Robert W. Fillers M5/590
Marlene M. Dennis M1/022 Gerald T. Finn CH1/620
Walter J. Dennis M6/213 Robert E. Fischer CH1-530
Manuel De Ponte M8/219 William D. Fischer M4/908
David Desrocher COS Paul D. Fleischauer M2/247
Marc J. Dinerstein COS Alan M. Foonberg M1/102
Martin V. Dixon M4/956 Robert W. Francis M4/934
Richard L. Donnelly CH2/402 Howard R. Freeman SLVR
Willard D. Downs III M1/048 Thomas A. Freitag M5/562
Linda R. Drake M1/131 Lynn M Friesen M2/269
Lyle R. Drinkgern M1/038 Robert P Frueholz M1/928
Gordon S. Dudley CH2-308 John S. Fujita M5/752
John P. Duggan CH2/406 John F. Galanti CH2/406
Christophe B. Dunbar M4/976 Gina D. Galasso M4/960
Thomas W. Duncan C/CANAVERAL Thomas E. Gallini M4/956
Karlene S. Duncan SCHRIEVER Bruce E. Gardner M4/035
Robert K. Duncan PETERSON Dorien C. Garman CH2/209
W Paul Dunn M5/553 James G. Gee M5/633
Joseph A. Dworak CH1/520 Al A. Geiger CH2/220
Margherita P. Eastman CH2/403 Isaac Ghozeil M4/904
David S. Eccles ROS Rodney C. Gibson M1/065
Robert C. Elliott CH2/405 William Giragos M1/160
Kenneth B. Elliott III COLUMBIA James B. Gin VAFB
William A. Emanuelsen M1/134 Murry I. Glick M4/984
Jeffery L. Emdee M4/969 Mark H. Goodman M1/016
Jorge L. Encalada M5/554 Wayne H. Goodman M5/559
Suellen Eslinger M1/112 David J. Gorney M5/687
David C. Evans NSSA Carl S Gran M4/967
David J. Evans M4/014 Philip B Grant M5/642
Rita A. Evans M1/162 Gary B. Green M4/944
Arthur M. Falconer M5/559 Patricia W. Green M1/199
Boyd Z. Faught CH2-405 Lawrence T. Greenberg M2/264
Philip A. Fawcett CH2/406 Darrell D. Gritz M4/906
Robert G. Feddes CH1-510 Sergio B. Guarro M4/997

Andy T. Guillen M5/584 Jack K. Holmes M1/937
Ira A. Gura M1/106 Kirk S. Horton CH2/402
Charles L. Gustafson M8/234 Mark A. Hopkins ACP-510
Stanley S. Gustafson M5/752 Ronald G. Hopkins M4/948
Gerald P. Guydan M5/019 Sharon K. Hoting M5/625
John A. Hackwell M2/254 William H. Huber M5/576
Ranwa Haddad M5/688 John P. Hurrell M2/238
Eric K. Hall II M4/921 Harold J. Huslage Jr HOUSTON
Linda F. Halle CH1/510 Walter J. Hussey ROS
Wayne P. Hallman M4/942 Warren C. Hwang M2/275
Brian T. Hamada M6/213 George I. Iwanaga M8/717
Dennis L. Hamme CH1/620 Don E. Jackson A4/413
Marvin J. Hamilton M8/220 Michael M. Jacobs M5/625
Kristina M. Harrington CH2/404 Bernardo Jaduszliwer M2/238
John M. Haas SLVR Jerry G. Jamieson CH1/520
Thurman R. Haas CH2/220 Bruce K. Janousek M2/264
William G. Hatton M5/643 Douglas H. Jensen M1/015
Gary F. Hawkins M2/250 Lubo B. Jocic M5/557
Sally A. Hayati M1/107 Diana M. Johnson M1/937
Thomas L. Hayhurst M4/041 Eric C. Johnson M2/248
Richard H. Hazen M4/179 Gail A. Johnson M1/016
Michael P. Healy NSSA Ray F. Johnson M5/585
Raymond F. Heidner III M6/210 Thomas W. Johnson CH1/450
Thomas J. Heigle CH2/307 Michael R. Jones VAFB
Laurie J. Henrikson CH1/510 Susan E. Jones CH1/410
Ronald R. Herm M5/625 John C. Jones M5/591
William H. Hiatt CH1/440 Mark A. Julian COS
David R. Hickman M4/161 Alvar M. Kabe M4/911
Robert A. Hickman M5/633 Stanley A. Kaminski M8/018
Rodney A. Hignite CH2/307 Jimmy W. Kane CH2/309
Bernardo Higuera M6/209 Frank D. Kantrowitz M4/927
Malina M. Hills M1/003 Wei H. Kao M2/242
Michael R. Hilton M4/923 Patrick L. Keithley NSSA
Nelson J. Ho M1/111 Gerald R. Keller M6/280
Albert C. Hoheb Jr M4/035 Randolph L. Kendall M1/134
Sidney Hollander M8/018 Maurice A. King Jr M8/221
Kenneth G. Holden M5/579 John A. Kinsey ARL

Chris F. Klein M4/978 John W. Martillo M1/023
Jorn Kluetmeier SUNY Dean C. Marvin M2/264
Francis L. Knight M4/922 Sumner S. Matsunaga M4/933
John T. Knudtson M2/747 Victor R. Matricardi A4/413
Rokutaro Koga M2/259 Bruce H. Mau M5/586
Kenneth W. Kowalski COS Donald C. Mayer M4/934
William C. Krenz M5/557 Michael G. McLain CH2/307
David B. Kunkee M4/927 Samuel R. McWaters ACP/535
Tung T. Lam M4/916 John S Mclaughlin M5/721
John D. Lang ROS Stuart M. Melzer CH1/520
Thomas J. Lang M4/947 Jean L. Michael CH2/403
Valerie I. Lang M5/564 Jerry D. Michaelson M1/936
Ronald J. Larry M1/036 Mark E. Miller M5/752
Ronald L. Lash M5/656 Stephen B. Miller M8/695
Charles H. Lavine M1/055 Inki A. Min M4/940
David G. Lawrie M4/041 Robert J. Minnichelli M8/232
Steven Lazar M5/685 Gregory S. Mitchell CH1/420
Allan W. Legrow ROS Andre C. Montoya CH3/111
Steven M. Leontis M5/649 Ernest M. Moore M1/159
Donald A. Lewis M5/721 Steven C. Moss M2/244
Samuel Lim M4/933 Randolph M. Moyer C/CANAVERAL
Sheng-Rong Lin M4/912 Theodore J. Muelhaupt CH1/430
Alexander C. Liang M4/945 Gary F. Mueller M5/625
Craig T. Lindsay PETERSON Thomas J. Murphy CH1/610
James T. Lloyd CH1-530 Ejike D. Ndefo M4/965
Rodney Lochmann M6/204 Nicola A. Nelson CH1/410
Rita M. Lollock M5/689 Edwin E. Neel Jr CH1/540
Terrence Lomheim M4/980 Mary L. Nichols M1/107
Ernest Long Jr M1/931 Ronald G. Nishinaga M5/500
Gordon J. Louttit M1/042 Kirk Nygren M1/129
Michael P. Lyons M2/377 Todd M. Nygren M8/222
Dan J. Mabry M2/269 Kevin O'Brien ROS
John A. Maguire ROS Michael J. O'Brine SCHRIEVER
Virendra N. Mahajan M5/661 Thomas J. Oldenburg ROS
Mark W. Maier CH1/440 Mark M. Oleksak M8/220
Patrick H. Mak M8/717 Mabel R. Oshiro M1/448
Randy Mamiaro ROS Dee W. Pack M2/266

Susan H. Painter M1/151 Donald F. Schmunk M8/234
Eric S. Parker COS Terry D. Schoessow M6/206
Richard M. Pastore ROS Nielson W. Schulenburg CH1-510
Chris T. Pate SLVR Robert V. Schwartz M5/024
Natverlal R. Patel M4/899 Richard A. Seebach M8/715
John R. Parsons M1/004 Randolph P. Sena M5/682
Judith E. Peach M1/119 Kevin M. Severin M1/023
Jay P. Penn M8/615 Walter E. Shepherd M4/927
Erwin Perl M4/907 Alan R. Shibata M5/551
Philip J. Peters M4/978 John R. Shure M5/649
Phillip E. Plemmons C/CANAVERAL Kenneth R. Sieck M5/686
Dennis A. Plunkett M5/665 Milton A. Silveira ROS
Fredric M. Pollack M1/046 Alan G. Silver SCHRIEVER
Peter L. Portanova M6/206 Bruce L. Simpson M1/119
Jan W. Prazak M5/721 John P. Skratt M8/615
Alfred T. Pritt Jr CH1/520 Ramunas J. Skrinska M4/966
Scott J. Prouty ROS Harold S. Smith M1/119
Gary P. Pulliam ROS Patrick L. Smith M4/954
Jeffery Quirk M5/664 Peter J. Soller SCHRIEVER
Rami R. Razouk M1/025 Stephen M. Sondag CH2/402
Kathleen A. Reed M5/559 Alfred N. Sorensen M1/065
Thomas B. Rehder VAFB Stephen M. Soukup M8/693
Mary A Rich M1/106 Dana J. Speece M4/987
Alexander F. Rivera M4/986 Michael R. Spence C/CANAVERAL
Steven R. Robertson M4/991 Millard H. Spiller AURORA
Jana L. Roche CH2/404 Catherine J. Steele CH2/307
Mark N. Rochlin ROS Linda T. Stephenson M1/046
George A. Rock Jr M5/721 Christine L. Stevens M5/655
Richard J. Rodriguez M1/162 Lauresa Stillwell M1/039
Ronald L. Roehrich PETERSON Stephen A. Stoops ROS
Edward K. Ruth M8/132 Joseph A. Strada CH2/401
Donald G. Sather M1/177 Norman L. Strang M4/903
Carl R. Scheerer COS Joe M. Straus M1/007
George J. Scherer M2/272 David C. Straw CH1-510
Roy D. Schermerhorn M1/013 Eric G. Stroud M6/204
Ernest R. Scheyhing M4/899 Wayne K. Stuckey M2/247
Andrew J. Schickling M5/657 Twain K. Summerset M6/225

Carl A. Sunshine M5/665 George F. Widhopf M8/587
Bill W. Sutton VAFB James M. Wilson CH2-220
David G. Sutton M2/248 Paul E. Wilson CH1/650
Stewart A. Sutton M1/003 Herbert J. Wintroub M1/928
Joseph E. Swistak CH1/510 Howard D. Wishner M5/688
Gilbert T. Takahashi M5/500 James M. Womack M4/997
Charles C. Tang M5/560 John R. Wormington M1/016
Lynette S. Tatman CH2/220 Robert P. Wright M1/003
James R. Taylor M5/588 Donald H. Yang M4/983
James T. Tengan M1/359 Allyson D. Yarbrough M4/934
Merlin E. Thimlar M8/080 Karolyn D. Young CH2/307
David M. Thomas CH2/403 Harold T. Yura M2/246
Paul R. Thompson M8/132 Sherrie L. Zacharius M5/681
Edmardo J. Tomei Jr M5/586 Albert H. Zimmerman M2/275
William F. Tosney M6/203 Keith P. Zondervan M5/649
Barbara J. Tressel M4/904 Kevin L. Zondervan ROS
Thomas K Trettin CH2/209 AOLIB (2, 5540, 150, AAB) † M1/199
Bryan K. Tsunoda M1/149 A. A. Barker † M2/377
Scott R. Turner CH1/450
Jacqueline K. Unitis CH2/406
Robert M. Unverzagt M6/209
Paul R. Vaughan M8/218
Susan M. Vogel CH1/510
John E. Wangsgard CH1/510
Robert J. Waldron Jr M1/300
Donald R. Walker ROS
Dale E. Wallis M1/064
Joseph F. Wambolt VAFB
Michael D. Weidner COS
Marsha V Weiskopf M6/210
John E. Wessel M2/253
Jon R. Westergaard CH2/307
Marilee J. Wheaton M4/929
Harry W. White Jr M1/178
Milo E. Whitson Jr M5/642
Fletcher D. Wicker Jr M1/137
James M. Williams CH1/520

† Indicates hardcopy and softcopy distribution; all others receive softcopy only.

FINAL APPROVER DRAW LINE(S) ACROSS UNFILLED SPACE AND INITIAL TO PRECLUDE ADDITIONS 6

APPROVED ________________________________________________________________________________________________ DATE ___________________

IF LIST COMPRISES TWO OR MORE SHEETS, COMPLETE ABOVE BLOCK ON LAST SHEET ONLY

SHEET 6 OF 6
AEROSPACE FORM 2394 REV 3-85
EXTERNAL DISTRIBUTION LIST

REPORT TITLE

Five Common Mistakes Reviewers Should Look Out For

REPORT NO. PUBLICATION DATE SECURITY CLASSIFICATION


TOR-2007(8617)-1 29 June 2007 Unclassified
MILITARY AND GOVERNMENT OFFICES ASSOCIATE CONTRACTORS AND OTHERS
1. SHOW FULL MAILING ADDRESS: INCLUDE ZIP CODE, MILITARY OFFICE SYMBOL, AND “ATTENTION” LINE.
2. IF LIST IS ALTERED, INITIAL CHANGE(S) AND SHOW AFFILIATION.

Space and Missile Systems Center


Air Force Space Command
483 N. Aviation Blvd.
El Segundo, CA 90245-2808
Attn:
Col. R. Reaser SMC/CZ
Col. J. Horejsi SMC/EN
Lt. Col. D. Dzaran SMC/SB
Lt. Col. R. Fortson SMC/SY
Lt. Col. A. Giczy SMC/MV
Lt. Col. R. Primbs SMC/XR
Lt. Col. J. Wilt SMC/IRRT
Mr. D. Barsotti SMC/AXE
Mr. L. Beckstead SMC/Det 12
Mr. D. Davis SMC/AXEM
Mr. G. Kraver SMC/45 SW
Mr. P. Kocincki SMC/AXC
Mr. R. Krilowicz SMC/AXD
Mr. F. Kozak SMC/30 SW
Mr. P. Rodriguez SMC/AXZ
Ms. A. Schiappi SMC/Det 11
Mr. D. Wynn SMC/LM

FINAL APPROVER DRAW LINE(S) ACROSS UNFILLED SPACE AND INITIAL TO PRECLUDE ADDITIONS

DISTRIBUTION LIMITATIONS MARKED ON THE COVER/TITLE PAGE ARE AUTHORIZED BY SIGNATURE BELOW

APPROVED ________________________________________________________________________________________________ DATE _____________________


(AEROSPACE)

APPROVED BY _____________________________________________________________________________________________ DATE _____________________


(AF OFFICE) (NOT REQUIRED FOR ATR CATEGORY)

IF LIST COMPRISES TWO OR MORE SHEETS, COMPLETE ABOVE BLOCK ON LAST SHEET ONLY

SHEET 1 OF 1
AEROSPACE FORM 2380 REV 11-85
