To Orbit and Back Again: How the Space Shuttle Flew in Space
Ebook, 893 pages, 10 hours

About this ebook

The Space Shuttle has been the dominant machine in the U.S. space program for thirty years and has generated a great deal of interest among space enthusiasts and engineers. This book enables readers to understand its technical systems in greater depth than has previously been possible.

The author describes the structures and systems of the Space Shuttle, and then follows a typical mission, explaining how the structures and systems were used in the launch, orbital operations and the return to Earth. Details of how anomalous events were dealt with on individual missions are also provided, as are the recollections of those who built and flew the Shuttle. Many photographs and technical drawings illustrate how the Space Shuttle functions, avoiding the use of complicated technical jargon.

The book is divided into two sections: Part 1 describes each subsystem in a technical style, supported by diagrams, technical drawings, and photographs to enable a better understanding of the concepts. Part 2 examines different flight phases, from liftoff to landing. Technical material has been obtained from NASA as well as from other forums and specialists.

Author Davide Sivolella is an aerospace engineer with a life-long interest in space and is ideally qualified to interpret technical manuals for a wider audience. This book provides comprehensive coverage of the topic including the evolution of given subsystems, reviewing the different configurations, and focusing on the solutions implemented.
Language: English
Publisher: Springer
Release date: Aug 27, 2013
ISBN: 9781461409830

    Book preview

    To Orbit and Back Again - Davide Sivolella

    © Springer Science+Business Media New York 2014

    Davide Sivolella, To Orbit and Back Again, Springer Praxis Books, DOI 10.1007/978-1-4614-0983-0_1

    1. A brain and mind for the Orbiter: the avionics system

    Davide Sivolella¹

    (1) Aerospace Engineer, Thomson Airways, London Luton Airport, London, United Kingdom

    SHUTTLE DATA PROCESSING SYSTEM: FAMILIARIZATION

    A Swiss-knife computer for the Shuttle

    As the human body cannot live and function without the pumping action of the heart, the data processing system (DPS) formed the active heart of the Space Shuttle, for without it the Orbiter simply could not fly. Events such as external tank separation, jet firings, main engine cutoff, communications, and miscellaneous other functions were so complex and time-critical that only by using computers were they feasible. Even manual control of ascent and re-entry would have been impracticable without computers, since the manual inputs provided by the pilots needed to be processed by the computers to produce the desired effect. At a higher level, the DPS performed tasks essential to flying the vehicle (guidance, navigation and control, or GNC), monitoring on board systems (system management, or SM), and both transmitting telemetry to Mission Control and enabling Mission Control to command on board systems. Owing to this Swiss-knife character of the Shuttle computers, they were normally called general purpose computers, or GPCs.

    For reasons that will be explained in the following paragraphs, five GPCs formed the brain of the DPS. In an era in which we are all familiar with the most common computer brands and manufacturers, the computers used on the Shuttle are hardly known to the general public. In order to lower the research and development costs of the Shuttle program, NASA wanted an off-the-shelf computer system. Since space rating a system involved stricter requirements than a military standard, starting with a military-rated computer would make the next step in certification a lot easier and cheaper. In the early 1970s, only two aircraft avionics computers then under development were potentially suitable for the new spaceship: the IBM AP-101B (a derivative of technology already in use by various military and NASA flight programs) and the Singer-Kearfott SKC-2000 (which at that time was under consideration for the B-1 bomber program). But both would clearly require extensive modification for use in space. The IBM machine was selected because of the company's success in developing the computers for the Saturn V Moon rocket and the Skylab space station, whose systems bore a slight similarity to the avionics configuration planned for the Shuttle. In modern terms, the processing power of the GPCs was ridiculously inferior to even the least powerful desktop computer that one can buy, but compared to what was available for space applications back then, they were cutting-edge technology.


    AP-101 General purpose computer schematics.

    Each computer consisted of a central processing unit (CPU), an input/output processor (IOP), one megabyte of memory and various other components housed in an electromagnetic interference (EMI) hardened case. While the CPU performed the instructions to control on board systems and manipulate data, the IOP formatted and transmitted commands to the systems, received and validated response data from the systems, and maintained the status of the interfaces between the CPU and the other computers. In other words, while the CPU was the number cruncher the IOP did all the interfacing with the rest of the computers and vehicle systems. The computers were able to perform their functions by control logic embedded in a combination of software and microprogrammed hardware.

    Within a few years of initiating the AP-101B design in January 1972, it became evident that an improved GPC would be required. Studies for upgrading the existing AP-101B started in January 1984, and they culminated in the early 1990s with the introduction into service of the AP-101S. From a configuration point of view, the big difference was that the new computers incorporated the CPU and the IOP in a single avionics box, halving the weight and size and also reducing the power requirements. From a performance point of view, this upgrade provided 2.5 times the memory capacity and up to three times the processor speed with minimal impact on flight software. While the old GPCs were capable of 400,000 operations per second, the new ones could perform up to 1,000,000 operations per second.

    The Shuttle nerves

    As in the human body, in which the brain communicates its commands and receives information by means of an extensive network of nerves, the Shuttle computers could communicate with all the on board systems and payloads via discrete signal lines and serial digital data buses. While the discrete signal lines transmitted signals indicating a binary condition such as the position of a given switch or circuit breaker, the data buses transmitted bulk information and data regarding the status of all the systems.

    The choice to use data buses was taken early in the program, in a period in which the aviation industry was already starting to implement this configuration in the latest jets. Because sensors, control effectors and associated devices would be distributed all over the Orbiter, the weight of the individual wires required to carry all the signals and commands needed for operating all of its elements would have been prohibitive. In response, the use of multiplexed digital data buses was investigated and baselined. Generally speaking, a data bus physically consists of a pair of insulated wires twisted together and then electrically shielded, and it permits data transmission from a large number of sources on a time-sharing basis to a single or perhaps multiple receivers.
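
    As a rough illustration of the time-sharing idea only (the device names, the fixed slot order and the messages below are invented, and the real buses used a GPC-driven command/response protocol rather than simple round-robin slots), the following Python sketch shows several sources taking turns on a single shared bus:

```python
# Minimal sketch of time-division multiplexing on a shared serial bus.
# Device names, slot order and message contents are invented for illustration.

from itertools import cycle

# Each source gets a recurring time slot on the single twisted-pair bus.
sources = {
    "rate_gyro": lambda t: f"rates at t={t}",
    "air_data":  lambda t: f"air data at t={t}",
    "aft_mdm":   lambda t: f"aft MDM status at t={t}",
}

def run_bus(n_slots: int) -> None:
    """Interleave messages from many sources onto one bus, one per slot."""
    slot_owner = cycle(sources)           # fixed round-robin slot assignment
    for t in range(n_slots):
        owner = next(slot_owner)
        message = sources[owner](t)       # only the slot owner transmits
        print(f"slot {t}: {owner} -> {message}")

run_bus(6)
```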


    Data processing system.

    The Orbiter’s data bus network comprised 28 data buses allocated by functional use, criticality, and traffic load, into seven different categories.

    Eight data buses belonged to the category of flight-critical (FC) buses, since they carried all the data and command traffic associated with guidance, navigation, flight control, mission sequencing, and management of critical non-avionics functions.¹ An important part of a space flight is enabling Mission Control to analyze the status of the vehicle and its payload. In the Orbiter, two pulse-code modulation master units (PCMMU) received data from the on board instrumentation and payload as well as from the five GPCs via individual instrumentation/PCMMU data buses. Once in the PCMMU, the data were formatted into the operational downlink, which was sent to one of two network signal processors (NSP). In the NSP the operational downlink was combined with the on board recorded voice for transmission to the ground by either the S-band or Ku-band communications systems. The GPCs were linked to the displays and keyboards on the flight deck by four display/keyboard (DK) data buses. Two launch data buses, used mainly for ground checkout and launch phase activities, served as an interface for data gathered from the solid rocket boosters, and once in space they provided an interface with the controller for the remote manipulator system. These buses differed from the others in that they required isolation amplifiers to accommodate the long wire runs to the launch processing system and to isolate the buses when disconnected at liftoff and at solid booster separation. Two payload (PL) data buses provided an interface for payload support operations, system management functions, payload bay door control, and communications antenna switching.

    Five intercomputer communication (ICC) data buses connected each GPC to its four counterparts to enable them to communicate among themselves. Interestingly, these buses operated in a slightly different way to all the others. In general, each GPC would request data using a specific data bus to prompt the appropriate hardware device to provide that data to the requesting GPC over the same data bus. On the ICC buses, each GPC transmitted to its counterparts without receiving a request for data. In this way, each computer was able to continually know what the others were doing, and this enabled them to remain synchronized.

    Finally, the mass memory (MM) data buses allowed each GPC to retrieve flight software from one of the two mass memory units (MMU). In the initial version, each MMU was a coaxially mounted reel-to-reel digital magnetic tape storage device for GPC software and Orbiter systems’ data and it could be written to or read from. The tape was 602 feet long, 0.5 inch wide and had nine tracks, eight of which were data tracks and the ninth was a control track. Each track was also divided into files and subfiles for finding particular locations. Later in the program, the tape recorders were replaced by modular memory units that were faster, had greater capacity, and used a solid-state mass memory and a solid-state recorder for serial recording and dumping of digital voice.
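
    The seven categories just described add up to the 28 buses quoted above; the following short Python tally, using only the counts given in the preceding paragraphs, makes the allocation explicit:

```python
# Tally of the Orbiter data bus network as described above. The counts come
# from the preceding paragraphs; the instrumentation/PCMMU figure of five is
# inferred from the "individual" bus per GPC mentioned in the text.

bus_categories = {
    "flight-critical (FC)":        8,
    "instrumentation/PCMMU":       5,   # one per GPC
    "display/keyboard (DK)":       4,
    "launch":                      2,
    "payload (PL)":                2,
    "intercomputer communication": 5,
    "mass memory (MM)":            2,
}

assert sum(bus_categories.values()) == 28   # the 28 buses quoted above

for name, count in bus_categories.items():
    print(f"{count:2d}  {name}")
```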

    Just as the synapses act as ports for exchanging information between neurons in the brain, the GPCs had dedicated connections to enable them to communicate with the digital data buses and hence with the outside world. In fact, each computer had 24 so-called BCE/MIA units which acted as input/output ports. How many ports a computer ought to have was the subject of much discussion in the early design phase. At that time the total system bus traffic density was known only to a first approximation, and the catastrophic effects on the system of exceeding the 1 Mbit/sec bus limit provided the motivation to build in a significant margin. The uncertainty in this area, and the desire for functional isolation, resulted in the greatest number that could reasonably be accommodated in the computer input/output processor: twenty-four.

    Inside each computer, data were transmitted in parallel along 18-bit buses, but on the data buses they were transferred in serial form at a 1 MHz rate, so it was necessary to have a translator between these two forms of data. The multiplexer interface adapter (MIA) converted serial data into parallel data for the CPU of the computer and vice versa. The other part of the port, the bus control element (BCE), was a microprogrammed processor which could transfer data back and forth between a GPC's memory and the MIA.
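
    The translation the MIA performed can be pictured with a toy Python sketch (the word value is arbitrary, and the real MIA also handled bus protocol, timing and error checking, none of which is modeled here):

```python
# Toy illustration of serial/parallel translation: an 18-bit parallel word is
# shifted out as a serial bit stream and reassembled at the other end.

WORD_BITS = 18

def to_serial(word: int) -> list[int]:
    """Shift an 18-bit parallel word out as a stream of bits (MSB first)."""
    return [(word >> i) & 1 for i in range(WORD_BITS - 1, -1, -1)]

def to_parallel(bits: list[int]) -> int:
    """Reassemble a serial bit stream into an 18-bit parallel word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word

sample = 0b101100111000110101          # an arbitrary 18-bit pattern
assert to_parallel(to_serial(sample)) == sample
print(f"serial stream: {to_serial(sample)}")
```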

    The GPCs sent and received commands and data to and from hardware known by the generic name of bus terminal units (BTU), of which the multiplexer/demultiplexer (MDM) was one example. Each MDM was connected to a given number of on board sensors, from which it received data to transmit to the GPCs via one of the two data buses to which it was connected. At the same time, an MDM transmitted to a specific sensor the data and commands provided by the GPCs commanding it. In other words, the GPCs were not connected to each individual sensor or system distributed across the Orbiter, as that would have been impractical in terms of wiring and data handling. That is why the serial digital data bus network was implemented. It can be visualized as a sort of tree, in which nourishment is brought from the roots to each single leaf by a network of vessels that branch ever more finely. In the same manner, the GPCs (the roots of the tree) sent commands to the MDMs via the digital data buses (the main branches), from which the information was sent to the specified sensor (a leaf). In the Orbiter, though, this was a two-way path.

    Each MDM converted and formatted serial digital GPC commands into separate parallel discrete digital and analog commands for various vehicle hardware systems. This was demultiplexing. The opposite process, multiplexing, converted and formatted the discrete digital and analog data from vehicle systems into serial digital data for transmission to the GPCs.
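
    Reduced to its bare essentials, and with invented channel names and values, the two directions of an MDM transaction might be sketched like this in Python:

```python
# Schematic sketch of the MDM's two roles, reduced to dictionaries and lists.
# Channel names and values are invented; a real MDM routed commands through
# its IOM cards and assembled sensor readings into a serial bit stream.

def demultiplex(serial_frame: dict) -> None:
    """Split one serial command frame from a GPC into per-device commands."""
    for device, command in serial_frame.items():
        print(f"  IOM card -> {device}: {command}")

def multiplex(sensor_readings: dict) -> list:
    """Assemble many discrete/analog readings into one serial reply frame."""
    return [(name, value) for name, value in sensor_readings.items()]

# One invented transaction: the GPC sends commands, the MDM replies with data.
demultiplex({"elevon_actuator": "null", "vent_door_3": "open"})
reply = multiplex({"fuel_cell_1_volts": 31.2, "cabin_pressure_psia": 14.7})
print(f"  serial reply to GPC: {reply}")
```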

    Each MDM included two redundant MIAs, which worked in the same manner as the GPC's ports. Each MIA was part of a redundant channel inside the MDM which included a sequence control unit (SCU) and an analog-to-digital (A/D) converter. The SCU split the commands provided by the GPCs and directed them to the proper input/output modules (IOM), usually referred to as cards, for forwarding to one of the subsystems to which a card was connected. The cards available in an MDM were specific to the hardware components accessed by that type of MDM. For this reason, a flight-critical MDM and a solid rocket booster MDM (for example) were not interchangeable. In fact, flight-critical MDMs could only be swapped amongst themselves. This could be done in flight, if doing so would restore access to a critically needed piece of Orbiter hardware. The SCU also assembled all the inputs from the various IOM cards into a single bit stream to be sent to the GPCs. The A/D converter changed any analog input data into digital form before it was multiplexed by the SCU. The use of MDMs made the system very flexible, in that sensor devices could be added with only minor changes to the MDMs and the flight software.

    Thirteen of the 20 MDMs on the Orbiter were incorporated into the DPS. They were connected directly to the GPCs, and were named and numbered by reference to their location in the vehicle and hardware interface. The other seven were part of the vehicle instrumentation system and sent instrumentation data to the PCMMUs to be included in the telemetry transmitted to the ground. The DPS MDMs consisted of flight-critical forward (FF) MDMs 1 through 4, flight-critical aft (FA) MDMs 1 through 4, payload (PL) MDMs 1 and 2, and GSE/LPS² launch forward (LF1), launch mid (LM1), and launch aft (LA1). One or two flex-MDMs (FMDMs) could also be connected to the PL data buses, depending on the payload needs of a particular flight. Of the seven operational instrumentation MDMs, four were located on the forward fuselage (OF1 to OF4) and three on the aft fuselage (OA1 to OA3).

    A mind for the Shuttle: the primary avionics software system

    If the five GPCs, 24 data buses and 20 MDMs represented the brain and nervous system of the Orbiter, its mind was the primary avionics software system (PASS). Simply put, PASS contained all the programming needed in order to fly the vehicle through all flight phases and to manage all vehicle and payload systems. Due to the vast number of functions that PASS was required to perform, its code was divided into two major groups: system software and application software.

    System software was analogous to an operating system running on each GPC. As such, some of the main functions that it performed were to control GPC input/output, load new memory configurations, keep track of time, assign computers the roles of commanders and listeners on specific data buses, and exercise the logic involved in transmitting commands over these buses at specific rates. The system software comprised three modules. The flight computer operating system (FCOS) controlled the processors, monitored vital system parameters, allocated computer resources, provided orderly program interruptions for higher priority activities, and updated computer memory. The system control program initialized each GPC and also arranged for multiple GPC operation during flight-critical phases. Flight crew commands or requests were processed by the user interface program.

    While the system software served as a housekeeper, the application software was the part of PASS which performed the actual duties required to fly and operate the Orbiter. At this point, it is important to note that the development of the GPCs and of PASS was pursued in parallel, and it was soon realized that the memory available in each computer would be insufficient to store all of the flight software that was being developed. This had two important consequences: firstly, the addition to the DPS of mass memory units, and secondly the need to divide PASS into a number of small software modules, some of which would be used only during specific flight phases. This division was organized on three levels.


    Orbiter flight computer software.

    On the first level, PASS was divided into three so-called major functions, defined as follows:

    1.

    Guidance, Navigation and Control (GNC): This major function had all the software necessary to perform flight-critical functions such as navigation sensor management, control of aerosurfaces for maneuvers, and trajectory calculations.

    2.

    System Management (SM): All non-avionics systems were managed via this major function, including the electrical, environmental and communication systems. It also contained payload-related software.

    3.

    Payloads (PL): Despite its name, this software did not support operations with the payload during flight. It was only used when preparing the vehicle at KSC, to load content into the MMUs. This major function was said to be unsupported, meaning that it was not normally being processed by any of the GPCs.

    On the second fragmentation level, each major function was split into a number of submodules, referred to as operational sequences (OPS), each containing the instructions to control a particular phase of the flight. In turn, on the third level of fragmentation, each operational sequence comprised a series of submodules called major modes (MM) which provided the instructions for a specific portion of a given mission phase.
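
    The three-level structure just described lends itself to a nested-dictionary sketch; the fragment below is illustrative only, and the major mode numbering shown is neither complete nor authoritative:

```python
# A nested-dictionary sketch of the three-level PASS structure described
# above: major function -> operational sequence (OPS) -> major modes (MM).
# Only a fragment is shown, and the numbering is illustrative.

pass_structure = {
    "GNC": {                       # Guidance, Navigation and Control
        "OPS 1 (ascent)":   ["MM 101", "MM 102", "MM 103", "MM 104", "MM 105", "MM 106"],
        "OPS 2 (on-orbit)": ["MM 201", "MM 202"],
        "OPS 3 (entry)":    ["MM 301", "MM 302", "MM 303", "MM 304", "MM 305"],
    },
    "SM": {                        # System Management
        "OPS 2 (on-orbit)": ["MM 201"],
    },
    "PL": {                        # Payloads (ground use only)
        "OPS 9 (preflight)": ["MM 901"],
    },
}

for major_function, sequences in pass_structure.items():
    for ops, major_modes in sequences.items():
        print(f"{major_function:3s} {ops:20s} {', '.join(major_modes)}")
```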

    Owing to the need for the flight crew to interact with the software on a daily basis for checking software configuration, monitoring on board system status, providing instructions for undertaking mission-related maneuvers, and so on, PASS offered a series of additional software blocks for each OPS to generate data and information to show on the displays on board. The highest priority blocks were linked to the major modes and generated the so-called major mode displays, or base pages, that provided information on the current portion of the mission phase. At the same time, the crew could interact with this data via keyboard entries. Sequencing from base page to base page could be initiated manually by keyboard entry from the crew or, in some cases, automatically in response to a specific event or condition detected by the software.


    OPS substructure.

    The second-highest priority software blocks were called specialist functions, and generated SPEC display pages that enabled the crew to monitor the operations of the Orbiter. As with the base pages, the SPEC displays could be altered by the crew via keyboard entry, but unlike the base pages they could be recalled only with keyboard entry. It is important to remember that the difference between base pages and SPEC pages was that the former allowed monitoring and alteration of the primary functions within an OPS, but the latter applied only to secondary or background functions.

    The lowest priority software blocks carried out so-called display functions, that is to say they generated display pages, known as DISP, on which data and information were shown but could not be modified by the crew. The DPS Dictionary, a document of several hundred pages, provided a detailed description of every possible display (base pages, SPECs and DISPs) that a crew member might need to access.

    Once again, it is worth remembering that this seemingly complicated subdivision of the primary software derived from the limited memory capacity of the processors available when the Shuttle was designed.

    When a new flight phase was initiated, the appropriate software had to be loaded into the GPCs in a process called an OPS transition, in which all the major modes of an OPS were loaded into GPC memory. Once an OPS was loaded, the crew could manually initiate the new phase of the mission. The operational sequence and major mode transitions were generally performed by the flight crew, but during ascent and the final part of re-entry all major mode transitions were carried out automatically by the software, because these phases of the flight were very critical and the workload on the crew was already intense. Irrespective of how they were made, all transitions had to be legal, meaning that several preconditions had to be satisfied. Of these, the most interesting was that the transition had to be logical: for example, there would be no need to transition the GPCs back to the terminal countdown software from any post-launch major mode, and the system would refuse to perform such a transition.
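
    The notion of a legal transition can be sketched as a lookup in a table of allowed moves; the table below is a simplified illustration and is not the actual set of flight rules:

```python
# Minimal sketch of the "legal transition" check: a table of allowed
# OPS-to-OPS moves is consulted before a requested transition is honored.
# The entries below are a simplified illustration, not the real flight rules.

LEGAL_TRANSITIONS = {
    "GNC OPS 1 (ascent)": {"GNC OPS 2 (orbit)", "GNC OPS 3 (entry)", "GNC OPS 6 (RTLS)"},
    "GNC OPS 2 (orbit)":  {"GNC OPS 3 (entry)", "GNC OPS 8 (FCS checkout)"},
    "GNC OPS 3 (entry)":  set(),   # no route back to the terminal countdown software
}

def request_transition(current_ops: str, requested_ops: str) -> bool:
    """Accept the transition only if the table says it is legal."""
    allowed = requested_ops in LEGAL_TRANSITIONS.get(current_ops, set())
    print(f"{current_ops} -> {requested_ops}: {'accepted' if allowed else 'refused'}")
    return allowed

request_transition("GNC OPS 2 (orbit)", "GNC OPS 3 (entry)")    # a logical transition
request_transition("GNC OPS 3 (entry)", "GNC OPS 1 (ascent)")   # refused as illegal
```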

    It is important to note that at any given moment, only one OPS could be present in a GPC's memory. However, this was not true for ascent, when the GPCs running PASS held both the OPS for a nominal ascent and that for the Return To Launch Site (RTLS) launch abort mode. Because of the need to keep the Orbiter stable and controllable during the transition from nominal ascent to RTLS, there would not have been enough time to load the PASS software for the abort. For this reason it was already loaded and ready to be activated immediately if the need arose.

    The memory configuration (MC) comprised the combination of system software and application software for a specific operational sequence. When new application software was to be loaded, the crew had to recall from the memory units one of the eight memory configurations available. In order to improve upon the redundancy provided by the two memory units, a copy of that portion of the software concerning re-entry (OPS 3) was also stored in the upper memory of each GPC running PASS, in the so-called G3 archive. In this way the software was always available, ready for an emergency re-entry. In addition, this archive allowed a quick loading of the re-entry software into the GPC lower memory for executing a Transoceanic Abort Landing (TAL) or Abort Once Around (AOA) during ascent without wasting time loading it from the memory units.


    Application software memory configuration.

    In an effort to continuously optimize the utilization of the scarce memory space available, and also to preserve vital data during software transitions, the application software was further split into two components named the major function base (MFB) and the OPS overlay. The MFB was the application software that was common to all major modes of a given major function. For example, each operational sequence in the GNC major function used a different scheme for calculating the Orbiter's state vector,³ but in transitioning from one OPS to the next it was essential to maintain the information about the current state vector. This data would be contained in the MFB for GNC. An MFB would also contain portions of flight software that were common to all of the OPS of a given major function. During an OPS transition, only that part of the code that was not common to the other OPS of the major function would be loaded, which is why it was called the OPS overlay.

    HAL

    In Stanley Kubrick's movie 2001: A Space Odyssey, the crew of Discovery One, on course to Jupiter, is attacked by a sentient computer named HAL that is unwilling to be deactivated.

    The Shuttle Orbiter had its own HAL, but there was no danger of the crew being overpowered. HAL was the name of the programming language used to write the application software for the Shuttle. Prior to the development of the Shuttle, flight software for space applications was written in assembly language, or something close to that level. Generally speaking, assembly language is very powerful since it allows strict control of the processor's memory and registers, making the code optimized for the specific computer on which it is to run. The downside is that development is lengthy, expensive and difficult, because the language's syntax and grammar are complex and susceptible to errors. Furthermore, software written in assembly language is specific to the machine: it is not portable. To overcome this, in the 1960s high level languages (HLL) were invented, with a syntax and grammar closer to human language than to machine code. This characteristic makes an HLL a powerful tool, because it facilitates faster development and the code is much easier to read and modify. Of course, the computer still needs to receive instructions in its own binary code, a task carried out by a translator known as a compiler. In this way, software written in an HLL is created independently of the computers on which it will run, making it portable and flexible. The downside of an HLL is that it does not permit a programmer to directly manipulate the processor's memory and registers, which slightly penalizes performance.

    Having written the flight software for Apollo in assembly language, and realizing that the flight software for the Shuttle would be considerably more elaborate, NASA opted for a HLL. However, because none of the languages available at that time were optimized for real-time computing, it was decided to develop a new one specifically for this kind of task. HAL not only supports vector arithmetic, it can also schedule tasks according to programmer-defined priority levels. Because NASA directed the development of the language from the beginning, it strongly influenced its final form and specifically the way in which it could handle real-time processing. In developing HAL, NASA adopted a syntax and grammar similar to that which programmers were already accustomed to, and provided a variety of tools that could be used for creating real-time programs.

    It seems that the name HAL had nothing to do with the heuristically programmed algorithmic computer of 2001: A Space Odyssey. Some have suggested that it is an acronym for Higher Avionics Language, but it may simply derive from an engineer named Hal who was involved in the early development.

    The development by NASA of HAL was criticized by managers in a community accustomed to assembly language systems. They felt that it would have been better to write optimized code in assembly language rather than produce less efficient software in a high level language. To settle the controversy, NASA had two teams race against each other to produce some test software, one team using assembly language and the other using HAL. The running times of the software written in HAL were only 10 to 15 per cent longer than those of its counterpart in assembly language. It was therefore decided that the system software would be written in assembly language, since it would be modified only very rarely, and the application software and the redundancy management code would be written in HAL.

    It is worth mentioning that unfortunately HAL did not fulfill NASA's initial intention of making it the primary programming language for space applications. Its only other applications were the Jupiter-bound Galileo mission and some ground systems of the Deep Space Network. It was abandoned in favor of Ada, another language optimized for real-time programming, developed in the mid-1970s by the Department of Defense for military applications and still in widespread use.

    REDUNDANCY

    When Columbia rose into the skies above Florida on 12 April 1981, it marked the first time that a new spaceship had been test flown on its maiden launch with a human crew on board. Even if, with hindsight, this seems to have been a gamble, it must be admitted that it was made possible by the incredible reliability of the avionics system.

    For the Apollo program, the issue of providing a reliable data processing system was addressed by building a special-purpose computer with an incredibly high level of expensive quality control. For the Shuttle, NASA, facing budget cuts, decided to employ off-the-shelf hardware as much as possible. To compensate for the reduced reliability that this implied, a new architecture was devised to incorporate into the hardware network the redundancy and reliability needed to enable a crew to safely fly the inaugural flight.

    The concept of redundancy was not new to NASA: the guidance system of the Saturn V used triple modular redundant (TMR) circuits, meaning that there was one computer with redundant components. This was also the philosophy implemented for the Apollo spacecraft, but the near-fatal Apollo 13 mission showed that extensive damage elsewhere in the vehicle could disable its computer. One of the many lessons learned from that mission was that by spreading redundancy among several simplex computers distributed around the spacecraft, the effects of such catastrophic failures could be minimized. For the Skylab space station, along with TMR circuits, it was decided to install two identical computers, each of which was capable of performing all the functions of the mission. Only one of these computers was active at any time; the other was switched off but available for immediate use in the event of a problem involving the first. The disadvantage of this system was that the computer which stepped in would have to find out where its partner had left off by referring to the contents of a 64-bit transfer register in the common section built with TMR circuits. This would have required some time, but that was not a problem since the Skylab computers were not responsible for navigation or high frequency flight control functions. In the event of a failure, it would have been permissible for the attitude of the vehicle to drift temporarily without causing a serious problem. For the Shuttle, things were much more complex. In this regard, it is worth examining in depth the mission and vehicle design drivers that dictated the overall system architecture.

    Mission-derived requirements

    The significant differences between the Shuttle and previous spacecraft included the requirement for much more complex and extensive on-orbit operations in support of a much wider variety of payloads, and the requirement to make precisely controlled unpowered runway landings. Along with the longstanding NASA rule that a mission must be aborted unless at least two means of safely returning to Earth were available, these requirements profoundly affected the design approach. To illustrate, previously the concept of safe return could be reduced to a relatively simple backup process, like a second set of pyrotechnics for extracting the parachutes or using more parachutes than were necessary for a nominal splashdown. So relatively simple backup systems were developed which, although less effective than the primary operational system, would nevertheless comply with the mission rule of assuring a safe re-entry. For the Shuttle this approach was inadequate. Atmospheric entry through final approach and landing imposed a performance requirement on the systems of the Orbiter as severe as any mission phase, meaning that a backup system with reduced performance was not feasible. Simply put, because the complexity of the re-entry required much more maneuvering than previous capsules, the backup systems of the Orbiter had to be capable of performing the same operations as the primary systems in order to ensure a safe return, albeit probably with reduced precision.

    Also, the economic impact of frequently aborting missions on a user-intensive program such as the Shuttle meant that aborting after a single failed system would be unacceptable. Therefore, a comprehensive fail operational/fail safe (FO/FS) philosophy was applied to all systems. For the avionics, this requirement meant that it had to remain fully capable of performing the operational mission after any single failure (fail operational) and capable of returning safely to a runway landing after any two failures (fail safe).

    Another constraint derived from experience on previous programs concerned the use of built-in test equipment (BITE) as a means of detecting component failures. This requirement was justified by the many recorded cases of BITE circuitry failures that had led to false doubts about the operability of a unit. Again, this was unacceptable for the Shuttle because of the high annual flight rate it was intended to achieve. The much preferred method of fault detection, which was the one chosen, was to compare actual operational data produced by one device or subsystem with that produced by devices or subsystems operating in parallel and performing the same function.

    Vehicle-derived requirements

    The Orbiter was an unstable airframe that could not have been flown manually even for the brief ascent/re-entry aerodynamic phases without full-time control stability augmentation. Although considered early in the program for post-entry aerodynamic flight control, cable/hydraulic boost systems were eliminated owing to their weight and mechanization difficulties, and instead an augmented fly-by-wire approach was baselined.

    Digital flight control systems were successfully used in the Apollo program and NASA was well aware of their advantages, so digital flight control was baselined for the Shuttle. However, the full-time augmentation requirement placed the digital flight control computation system in the safety-critical path, which in turn dictated a high degree of redundancy.

    The control authority necessary to achieve all the Shuttle vehicle requirements, particularly during ascent and re-entry, created a situation in which a control actuator hard-over command, issued erroneously, could cause structural failure and the loss of the vehicle if the command were allowed to remain in effect for as little as 10 to 400 milliseconds, depending on the mission phase. This situation affected the design in at least two important ways. First, it imposed a requirement for actuator hard-over prevention irrespective of the failure condition. Second, because of the reaction time required, it removed direct manual intervention from consideration in reacting to a failure, in turn requiring a fully automatic redundancy fault-down approach. The concept adopted to prevent hard-overs was to use hydraulic actuators with multiple command inputs to a secondary actuator. These secondary actuator inputs were hydraulically force-summed, and the resultant command was sent to the so-called primary or power actuator, which was nothing less than a massive steel rod connected to the aerosurface. If one of the inputs diverged from the rest, as in the event of an erroneous hard-over command, the effect of its secondary stage output would be overpowered by the other secondary stage outputs and the control effector would operate correctly. To make such a system work, multiple independently computed commands to the secondary actuator inputs had to be provided.
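
    A very crude numerical caricature of this outvoting, with invented command values and a median standing in for the hydraulic force-fight, might look as follows; the real secondary actuator behavior was of course far more involved:

```python
# Crude caricature of force-summing at the secondary actuator: each of the
# four independently computed commands has only limited authority, so a
# single hard-over input is overpowered by the three good channels.
# Using a median here is an illustrative simplification of the hydraulics.

from statistics import median

def secondary_actuator_output(commands: list[float]) -> float:
    """Resultant command after the channels fight it out (toy model)."""
    return median(commands)

commands = [2.0, 2.0, 2.0, 2.0]            # nominal command, invented units
print(f"all channels agree: {secondary_actuator_output(commands):+.1f}")

commands[2] = 30.0                          # one channel goes hard-over
print(f"one channel fails:  {secondary_actuator_output(commands):+.1f} "
      "(hard-over outvoted)")
```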

    These mission and vehicle requirements led to an avionics system that relied on coupled parallel multi-strings, tight synchronization, and redundancy management to accommodate any failure that could jeopardize a safe re-entry and to prevent divergence of the commands.


    Typical actuator scheme.

    Looking for the right network architecture

    As very often happens in engineering, the problem of finding the best architecture to satisfy the requirements of the Shuttle avionics gave rise to several different redundancy management schemes; in this case, three.

    The first scheme was to run a number of totally independent sensor, computer and actuator strings.⁴ But this approach had a fatal flaw. Consider a scheme with two independent strings: at a critical point of the mission, two equal but opposite commands could be issued. During landing, for example, one string might issue an elevon pitch-down command and the other an elevon pitch-up command, potentially resulting in loss of control of the vehicle. Another flaw in this design was that an analysis had to be made of what values were reasonable for every sensor, and of how an average should be defined. It was also hard to set a tolerance level that would reject bad data whilst not losing good data near that limit. A multi-independent-string system, furthermore, would not be very fault tolerant. If the computer commanding a given string were lost, then all the sensors and effectors connected to that string would be irretrievably lost.

    Attention switched to the so-called master/slave scheme, in which one computer would be in charge of reading all the sensors and the other computers would be in a listening mode, gathering information. The problem with this scheme was the time it would take for a backup computer to take over if the master computer failed. For some very critical flight phases, there could be as little as four-tenths of a second of reaction time from when the master failed until control of the vehicle began to be lost. A particularly critical phase was the final flare immediately prior to touchdown, when it was necessary to command an extremely rapid excursion of the elevons so as to touch down at the proper rate of descent. If the master failed just at that moment, it would have been impossible for the crew to switch to the backup computer in time. Another critical point was about 60 seconds into the ascent, when the aerosurfaces had to be moved to relieve the aerodynamic pressure on the wings. If a failure of the master occurred just at this moment, again the inability of the flight crew to switch to the backup computer sufficiently rapidly would have caused the loss of the vehicle. One way to overcome the slow reaction time of the flight crew would have been to arrange for an automatic switchover, but this raised the prospect of a faulty computer erroneously jumping into the automatic switchover code and seizing command of the vehicle. This was not considered a likely failure, but there was sufficient concern to rule out the master/slave scheme.

    The scheme that was adopted was a distributed command approach in which all of the computers processed the same information simultaneously, yet remained closely synchronized in order to permit a rapid switchover. The first question that had to be addressed in pursuing this approach was, very simply: how many computers? This had to be considered along with the level of redundancy required to satisfy the mission-driven requirement of a fail operational/fail safe system. To comply with these requirements it would be necessary to incorporate quadruple redundancy involving four computers, each with an independent string for the same information. In this way, a minimum of three strings would guarantee identification of a diverging or disabled unit based on the comparison of actual data produced by devices running in parallel carrying out the same functions, thereby satisfying the fail operational requirement. A fourth string would allow for a second failure to occur without losing control of the vehicle, thereby satisfying the fail safe requirement.

    Nevertheless, the final system had not four computers but five. In the beginning, to achieve the desired level of safety, the requirement was set at a fail operational/fail operational/fail safe approach. This would have meant incorporating five computers. However, reliability projections for fly-by-wire aircraft had shown that triple computer system failures were expected to cause the loss of an aircraft three times in a million flights, whereas quadruple computer system failures would do so only four times in one thousand million flights! Cost considerations in terms of equipment and time led NASA to lower its requirement to fail operational/fail safe, which allowed the number of computers to be reduced to four. Since five computers had already been procured and designed into the system, the fifth machine was retained, initially with the intention of loading it with the system management software. Then it was decided to add the system management functions to the primary software, releasing the fifth computer to serve as a repository for the backup flight software, which was at that time under development.

    Strangely enough, for the orbital flight tests there were six computers! As the first Shuttle flights loomed, Arnold Aldrich, in charge of the Shuttle Office at the Johnson Space Center, wrote a memo arguing for a sixth computer to be carried as a spare. He pointed out that because 90 per cent of avionics component failures were expected to be computer failures, and a minimum of three computers plus the backup had to be available for a nominal re-entry, aborts would otherwise have to take place after a single failure. By carrying a spare computer preloaded with re-entry software, the primary system could be brought back to full strength. And indeed the sixth computer, dubbed "re-entry in a suitcase", was carried on the early flights.

    Synchronization and redundancy management

    In essence, redundancy in the Orbiter's avionics relied on the fact that each computer could perform all the functions necessary for a particular mission phase. But for true redundancy it was required that each computer be able to listen to all of the other computers on all of the buses (even though each computer commanded only a few of the buses), so that they could all be aware of all the data generated in the current phase. Furthermore, all of the computers also had to be able to process data at the same time as the others. To preserve redundancy, a failed computer had to be able to drop out without causing any functional degradation. To achieve all this, it was necessary to devise a means of synchronizing all of the computers.

    In the beginning, the Shuttle’s designers thought it would be possible to run the redundant computers separately and then just compare answers periodically to make sure the data and computation matched. This turned out to be a poor solution, as even small differences in the oscillators that acted as clocks within the computers could soon cause the computers to get out of step. The first step towards a solution was the proposal to synchronize the computers at their input and output points. This concept was later expanded to include synchronization at points of process changes, when the system transitioned from one software module to another.

    In practice, all of the computers running some part of PASS were in the so-called common set (CS), all communicating with one another and sharing the basic status information that they needed to know about each other over the ICC data buses 6.25 times per second. Typical information exchanged in this CS synchronization process included input/output errors, fault messages, GPC failure statuses, keyboard entries, memory configuration tables, and system level display information. Synchronization within the CS enabled all of its computers to perform regular checks to verify that they were all correctly executing the flight software.


    GPC synchronization. (Courtesy of www.nasaspaceflight.com)

    Due to the criticality involved in flying the vehicle, the GNC major function was designed to run simultaneously in multiple GPCs, in what was called the redundant set (RS). These computers had to simultaneously execute the same part of the flight software with the same inputs. In this case the synchronization check to verify that each computer was at the same place in the flight software was carried out 400 times per second.

    It is important to understand that the difference between the common and redundant sets was the kind of information received. In a redundant set, all of the computers had to receive the same input and (owing to the tight synchronization) produce the same output. If one computer failed, the others would be able to continue processing the software without interruption or degradation in their performance. Guidance, navigation and control would not be affected at all by a computer dropping out of the redundant set.

    Synchronization of the redundant set worked like this: when the software being processed by the GPCs received an input, delivered an output, or branched to a new process, each GPC sent a 3-bit discrete signal and waited 4 milliseconds to receive similar discretes from the other computers. The discretes were coded to provide information beyond just saying "here I am". For example, 010 meant an I/O operation had been achieved without error, while 011 meant the opposite. If a computer either sent the wrong synchronization code or was late, the other computers detecting either of these conditions concluded that the computer had failed and thereafter refused to listen to it or acknowledge its presence, thereby removing it from the redundant set. Each GPC could also vote itself out of the redundant set in the event of noting an error in the input received over a bus to which it was connected. This last function was very important, since in order to maintain consistently identical inputs for the GPCs in the redundant set, input transactions involving these computers had to be protected. This meant that if two or more GPCs in the redundant set failed to receive and process the input data, they would all ignore the data.⁵ The protected transaction capability was maintained through the use of sync codes and certain I/O error processing techniques whereby if any member of the redundant set noted an I/O error, this information would be exchanged by all of the GPCs in the set. If more than one GPC detected an error, all the GPCs would stop listening to that unit or element. Special logic was incorporated to ensure that a single faulty GPC could not prevent the other computers in the set from receiving the data that they needed. If a single GPC was the only member of the set to detect the same I/O error two consecutive times, it would force itself to fail-to-sync and drop out of the redundant set, and possibly also out of the common set.
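
    A toy model of one such sync point, using the two example codes from the text and an invented majority check in place of the real fail-to-sync logic, could be sketched as follows:

```python
# Toy model of a redundant-set sync point: each GPC posts a 3-bit sync code,
# and any machine that posts a code disagreeing with the majority is voted
# out. Late arrival and the full fail-to-sync rules are not modeled.

IO_COMPLETE = 0b010     # "I/O operation achieved without error"
IO_ERROR    = 0b011     # "I/O operation had an error"

def sync_check(posted_codes: dict[str, int]) -> set[str]:
    """Return the GPCs that stay in the redundant set after one sync point."""
    values = list(posted_codes.values())
    majority = max(set(values), key=values.count)   # majority view of the code
    survivors = {gpc for gpc, code in posted_codes.items() if code == majority}
    for gpc in posted_codes.keys() - survivors:
        print(f"{gpc} posted {posted_codes[gpc]:03b}, expected {majority:03b}: voted out")
    return survivors

redundant_set = sync_check({
    "GPC1": IO_COMPLETE,
    "GPC2": IO_COMPLETE,
    "GPC3": IO_ERROR,       # disagrees with the other three
    "GPC4": IO_COMPLETE,
})
print("remaining redundant set:", sorted(redundant_set))
```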

    Computers in the common set could receive different inputs, run different software and issue different outputs, and by means of a less tight synchronization could verify that they were all in good health and able to properly process software. If a computer failed a sync point, it was dropped out of the common set. It is worth recalling that although protected transactions were used primarily in the redundant set, this philosophy also applied to transactions over the ICC buses in the common set. Again, if more than one GPC in the common set detected an I/O error on an ICC transaction, this information was shared with the other GPCs so that they could reject the data from that ICC bus. The same logic applied as in the other protected transactions: if a single GPC detected two successive I/O errors on an ICC transaction whilst the other GPCs did not, that GPC would fail-to-sync and vote itself out of the set.

    Fortunately, the software needed for redundancy management required only 5 to 6 per cent of a GPC's central processor resources. One reason why the redundancy management software could be kept so light was that NASA decided to move voting to the actuators, rather than performing it before commands were sent over the data buses. To understand how this worked, it is necessary to remember that each actuator was quadruple redundant. If a failed GPC issued an incorrect command, the commands of the good GPCs would prevail, physically voting down the erratic GPC and giving the crew time to remove it from the redundant set. The only serious risk was that three computers might fail simultaneously, negating the effect of the voting. If the crew received proper warning, they could engage the backup flight system. It is important to remember that only the crew could physically remove a GPC from the redundant set. The option of having the software undertake this task was rejected in order to ensure that an unknown error in the software could not erroneously remove a computer that was not actually faulty.

    Since only GNC could be run simultaneously on more than one computer, PASS SM and PL were run on only one computer, in a configuration called simplex. A computer running in the simplex configuration could be in the common set (running, for example, SM) or not be part of the set at all, such as when it was in a frozen state or had been loaded with the backup flight software.⁶ During ascent and re-entry, four computers formed the common and redundant sets, since they were all loaded with the PASS GNC software. In noncritical mission phases such as on-orbit, the computers were reconfigured. Two were left in the redundant set to handle guidance and navigation functions, such as maintaining the state vector. A third machine was loaded with the system management software to control life support, power, and the payload. A fourth machine would be loaded with the descent software and powered down (in the jargon, "freeze dried") both for an emergency descent and to protect against a failure of the two memory units. Finally, the fifth computer held the BFS software.


    Components of string 1.

    Management of the redundant set also involved the manner in which the GPCs communicated with the Orbiter's hardware. The digital data bus network had eight flight-critical buses. FC 1 to 4 connected the GPCs to the four flight-critical forward (FF) MDMs, the four flight-critical aft (FA) MDMs, the four integrated display processors, and the two head-up displays. FC 5 to 8 connected the GPCs to the same four FF MDMs and FA MDMs, plus the two master event controllers and the three main engine interface units. These FC buses were grouped to create four strings, each composed of one FC data bus from each of the two FC-bus groups, along with a fixed distribution of the hardware connected to them. If we consider a redundant set of four GPCs, each string was assigned to one computer, and that GPC would act as master and commander over the flight-critical hardware elements of that particular string. That GPC would also passively listen to the commands and data transmitted over the other three strings, each commanded by one of the other GPCs of the set. In other words, when a GPC sent a request for data to the hardware on its string, the other GPCs would hear this and receive the same data returned on that string, but without having any authority over that string. This transaction would occur in parallel on the other three strings, so that all the GPCs in the set would get a copy of all of the data from all four strings.

    For example, the DPS could be configured to have each GPC commanding one of the four aft right-firing reaction control system jets. Because of the redundant set synchronization, identical inputs going through identical processing had to yield identical outputs. This meant that as the four GPCs independently executed the GNC algorithms from the same set of inputs, they would issue the same control command. If they decided that a small +yaw correction was required, they would all issue the command to fire an aft right jet. Supposing that the jet with the highest priority was the R2R⁷ jet and that it was controlled by string 2, all four GPCs would issue the same command to fire the R2R jet, but the jet would respond only to the command given by GPC 2, the computer that was commanding string 2. Although at first sight this might appear to be a complicated way of organizing and managing the hardware, it gave the DPS incredible power and flexibility, not only by guaranteeing nominal mission operations in the event of one string being lost due to a GPC failure, but also by allowing a safe return to Earth if a second string were lost. To better illustrate this, consider a failure occurring during ascent or re-entry. For both phases, the redundant set was composed of four GPCs running PASS GNC, each of them controlling only one-quarter of all the flight-critical hardware. If we suppose that GPC 1 fails, then string 1 must be considered lost, and with it that portion of the hardware that it was commanding. But the three good GPCs are still commanding the rest of the flight-critical hardware, and the mission can continue smoothly and without any degradation in performance.
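
    The command/listen arrangement of the jet-firing example can be sketched in a few lines; the nominal one-GPC-per-string assignment follows the text, while the bus-level details are omitted:

```python
# Sketch of the string concept: every GPC computes the same jet-fire command,
# but only the GPC commanding the relevant string actually transmits it; the
# others listen on that string and receive the same response data.

STRING_COMMANDER = {1: "GPC1", 2: "GPC2", 3: "GPC3", 4: "GPC4"}

def issue_command(string: int, command: str, redundant_set: list[str]) -> None:
    """Show who commands and who listens when one string carries a command."""
    commander = STRING_COMMANDER[string]
    for gpc in redundant_set:
        role = "commands" if gpc == commander else "listens on"
        print(f"{gpc} {role} string {string}: {command}")

# All four GPCs computed the same +yaw correction; only GPC2 drives string 2.
issue_command(2, "fire aft right RCS jet R2R", ["GPC1", "GPC2", "GPC3", "GPC4"])
```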

    In the event of a GPC failure during ascent or re-entry, the flight rules allowed the crew to attempt a manual change of configuration, or restringing, in which one of the good GPCs was permitted to take command of the string of the failed GPC in order to restore full capability of the flight hardware. Although rehearsed during training, it is debatable whether this could have been done for real, owing to the already high workload on the crew in a very dynamic phase of the mission where their attention was devoted to verifying that the vehicle was flying the proper trajectory. However, restringing was performed without any problem on-orbit, when the redundant set was shrunk to only two GPCs. In this configuration, the two GPCs had to command two strings each.

    If a GPC failed-to-sync and dropped out of the redundant set, the remaining computers performed so-called bus masking to terminate the command/listening mode on the string commanded by the failed GPC. For example, for a nominal string assignment on ascent of GPC 1/string 1, GPC 2/string 2, etc., if GPC 3 failed-to-sync with GPCs 1, 2 and 4, then GPCs 1, 2 and 4 would mask FC buses 3 and 7 (string 3), commanded by GPC 3. Similarly, if GPC 3 was still processing software it would mask FC 1 and 5, 2 and 6, and 4 and 8 (strings 1, 2 and 4), commanded respectively by GPCs 1, 2 and 4. Bus masks of this type were reset as appropriate after actions such as an OPS transition or a string reassignment. Thus, continuing the example, if string 3 was reassigned to GPC 4, then GPCs 1, 2, and 4 would remove their bus masks on FC 3 and 7. Bus masking was also performed in nominal situations when a GPC was required neither to transmit nor to receive data on a particular data bus. For example, the GPC set running PASS GNC would mask the PL data buses on-orbit, when the PL buses were assigned to the SM GPC. In turn, the SM GPC would do the same for the FC buses.
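
    Following the string-to-bus pattern used in the example (string N corresponding to FC buses N and N+4), the masking bookkeeping might be sketched like this:

```python
# Sketch of bus masking after a fail-to-sync, following the example above:
# the surviving GPCs mask the buses of the failed GPC's string, while the
# failed GPC (if still running) masks everyone else's strings.

def fc_buses(string: int) -> set[int]:
    return {string, string + 4}            # e.g. string 3 -> FC 3 and FC 7

def apply_masks(failed_gpc: int, all_gpcs=(1, 2, 3, 4)) -> dict[int, set[int]]:
    """Return, for each GPC, the set of FC buses it masks (stops using)."""
    masks: dict[int, set[int]] = {}
    for gpc in all_gpcs:
        if gpc == failed_gpc:
            # The failed GPC masks the strings commanded by the other three.
            masks[gpc] = set().union(*(fc_buses(s) for s in all_gpcs if s != gpc))
        else:
            # The good GPCs mask only the failed GPC's string.
            masks[gpc] = fc_buses(failed_gpc)
    return masks

for gpc, buses in apply_masks(failed_gpc=3).items():
    print(f"GPC{gpc} masks FC buses {sorted(buses)}")
```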

    On 28 November 1983, Columbia lifted off launch pad 39A to begin STS-9, the first flight of the European Spacelab, a laboratory carried inside the payload bay that greatly enhanced the capability of the Orbiter to perform scientific experiments. The mission went very smoothly, and on 8 December the crew of six was about to bring home the rich crop harvested in the orbital fields. However, a surprise was in store. As Brewster H. Shaw, the pilot, recalls, "About the time that we were reconfiguring the computers, we had a couple of thruster firings... and we got the big X-pole fail on the CRT," meaning that the computer had failed. "This is the first computer failure we had on the program... So I get out the emergency procedures checklist... We started going through the steps and everything. And in just a couple of minutes we had another one fail the same way, a firing of the jets and the computer failed." Due to the severity of the situation, Mission Control decided to wave off the first de-orbit opportunity in order to try to determine what was going on with the computers. The crew eventually managed to recover one of the failed GPCs, but surprisingly, "When the nose gear slapped down, one of the GPCs that had recovered failed again," Shaw remembers. An investigation was immediately started. This found that both GPCs had failed for a trivial reason. As Shaw continues, "It turns out there were little, itty bitty slivers of solder that were loose in those two computers, and when those jets fired and the solder was floating in there, it made the solder sit down across two memory locations, changing the state of a memory location. The computer, which is always doing a self-test, sees this memory location change value and it says, 'Something's wrong. I'm outta there.' And it self-failed. And the same thing happened to two of those computers." No time was lost in putting into practice the lessons learned. Shaw continues, "So we went up to Oswego, New York, where IBM had a plant that built these computers... and watched them do particle impact noise detection tests where they put microphones on the GPC box, then put it on a shaker and listened for loose particles inside. It became a standard screening criteria after that time."

    BACKUP FLIGHT SOFTWARE

    When the design of the Shuttle began, NASA had proven its mettle by successfully landing astronauts on our nearest celestial neighbor, the Moon. It was an engineering triumph. The Apollo spacecraft had a digital computer, often called the fourth crew member, to perform all guidance and navigation tasks. For redundancy, there was an analog flight control system with both automatic and manual modes. In addition, a direct mode enabled the crew themselves to operate the maneuvering jets.

    The Shuttle was a much more complex spaceship than Apollo, and for this reason it could be controlled only by means of digital computers. There was no possibility of an analog backup. Owing to its total reliance on computers, the synchronization and redundancy management schemes were developed to provide
