Embedded and Real Time System Development: A Software Engineering Perspective
Volume 520

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]

Editors
Mohammad Ayoub Khan, Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, India
Saqib Saeed, Department of Computer Sciences, Bahria University, Islamabad, Pakistan
Ashraf Darwish, Faculty of Science, Helwan University, Cairo, Egypt
Ajith Abraham, Machine Intelligence Research Labs, Scientific Network for Innovation and Research Excellence, Auburn, USA
Software is the driving force behind today’s smart and intelligent products. Products
and services have become more instrumented and intelligent, and this trend of
interconnection between products and services is stretching software development
organizations and traditional software development approaches to their limits.
Today’s embedded and real-time systems contain complex software.
The complexity of embedded systems is increasing, and the amount and variety of
software in embedded products are growing. This poses a major challenge for the
embedded and real-time software development process. To reduce the complexity
of the development cycle, many companies and researchers have turned their
attention to optimizing the timeliness, productivity, and quality of embedded soft-
ware development and to applying software engineering principles to embedded
systems. Unfortunately, many available software development models do not take
into account the specific needs of embedded systems development. Software
engineering principles for embedded systems should address specific constraints
such as hard timing constraints, limited memory and power use, predefined
hardware platform technology, and hardware costs.
There is a need to develop separate metrics and benchmarks for embedded and
real-time systems. Thus, the development of software engineering principles for
embedded and real-time systems is presented here as an independent discipline.
The book presents practical as well as conceptual knowledge of the latest tools,
techniques, and methodologies of embedded software engineering and real-time
systems. Each chapter offers the reader an in-depth investigation of the actual or
potential role of software engineering tools in the context of embedded and
real-time systems. The book presents the state of the art and future perspectives of
embedded and real-time system technologies, with industry experts, researchers,
and academics sharing ideas and experiences on frontier technologies, break-
throughs, and innovative solutions and applications.
The book is organized into four parts comprising twelve chapters altogether. Part I,
titled “Embedded Software Development Process,” contains “A Flexible Framework
for Component-Based Application with Real-Time Requirements and its Supporting
Execution Framework” and “Automatic Development of Embedded Systems Using
Model Driven Engineering and Compile-Time Virtualisation”. Part II, titled “Design
Patterns and Development Methodology”, contains “MADES EU FP7 Project:
Model-Driven Methodology for Real Time Embedded Systems” through “A Fuzzy
Cuckoo-Search Driven Methodology for Design Space Exploration of Distributed
Multiprocessor Embedded Systems”. Part III, titled “Modeling Framework”, contains
“Model-Based Verification and Validation of Safety-Critical Embedded Real-Time
Systems: Formation and Tools” through “A Multi-Objective Framework for
Characterization of Software Specifications”. Part IV, titled “Performance Analysis,
Power Management and Deployment”, contains “An Efficient Cycle Accurate
Performance Estimation Model for Hardware Software Co-Design” through
“Software Deployment for Distributed Embedded Real-Time Systems of Automotive
Applications”. A brief description of each chapter follows:
“A Flexible Framework for Component-Based Application with Real-Time
Requirements and its Supporting Execution Framework” presents a development
process for embedded real-time systems that supports component-based develop-
ment and schedulability analysis of the resulting software. The authors build on the
Model-Driven Software Engineering paradigm and its associated technologies. The
proposed development process is accompanied by an Eclipse-based tool-chain and
a sample case study.
“Automatic Development of Embedded Systems Using Model Driven Engineering
and Compile-Time Virtualisation” discusses the application of model-driven
engineering and compile-time virtualisation, focusing on new tools for generating
the software and hardware of modern embedded systems. The presented approach
promotes rapid deployment and design space exploration, and its integrated, fully
model-driven tool flow supports existing industrial practices. It also provides for
the automatic deployment of architecture-neutral Java code over complex
embedded architectures.
“MADES EU FP7 Project: Model-Driven Methodology for Real Time Embedded
Systems” presents a complete methodology for the design of real-time embedded
systems (RTES) developed in the scope of the EU-funded FP7 MADES project.
MADES aims to develop novel model-driven techniques to improve existing
practices in the development of RTES for the avionics and surveillance embedded
systems industries. It proposes an effective subset of existing standardized UML
profiles for embedded systems modeling.
“Test-Driven Development as a Reliable Embedded Software Engineering
Practice” shows how Test-Driven Development (TDD) promotes testing software
during its development, even before the target hardware becomes available.
Principally, TDD promotes a fast feedback cycle in which a test is written before
the implementation. The authors present four different evaluation methods for
TDD and discuss a number of relevant design patterns for applying TDD in an
embedded environment.
“A Fuzzy Cuckoo-Search Driven Methodology for Design Space Exploration
of Distributed Multiprocessor Embedded Systems” discusses a methodology for
conducting Design Space Exploration (DSE) for Distributed Multi-Processor
Embedded systems (DMPE). The authors use a fuzzy rule-based requirements
elicitation framework and Cuckoo Search for the DSE of DMPE.
“Model-Based Verification and Validation of Safety-Critical Embedded Real-
Time Systems: Formation and Tools” presents a new concept of Verification,
Validation, and Testing (VV&T). The chapter covers software engineering through
system engineering, with VV&T procedures for every stage of system design, e.g.,
static testing, functional testing, unit testing, fault injection testing, consistency
techniques, software-in-the-loop (SIL) testing, evolutionary testing, hardware-
in-the-loop (HIL) testing, black-box testing, white-box testing, integration testing,
system testing, and system integration testing.
“A Multi-Objective Framework for Characterization of Software Specifications”
addresses the complexity of embedded systems, which is exploding in two
interrelated but independently growing directions: architecture complexity and
application complexity. The authors discuss a general-purpose framework that
satisfies multiple objectives of early design space exploration and propose a
multi-objective application characterization framework based on the visitor design
pattern. An MPEG-2 video decoder is used as a benchmark to show the viability of
the proposed framework.
“An Efficient Cycle Accurate Performance Estimation Model for Hardware
Software Co-Design” presents a proposal for performance estimation in which the
performance of software is measured in terms of clock cycles. For such mea-
surements, the availability of a hardware platform is critical in the early stages of
the design flow. The authors therefore propose implementing the hardware
components at a cycle-accurate level, so that the performance estimate is given by
micro-architectural simulation in numbers of cycles, and they measure overall
performance as a linear combination of the function performances on the mapped
components. The proposed approach decreases the overall simulation time while
maintaining accuracy in terms of clock cycles.
“Multicast Algorithm for 2D de Bruijn NoCs” presents the de Bruijn topology for
future generations of multiprocessing systems. The authors propose de Bruijn
graphs for Networks-on-Chip (NoCs) and a multicast routing algorithm for
two-dimensional de Bruijn NoCs. The proposed routing is compared with unicast
routing using the Xmulator simulator under various traffic patterns.
“Functional and Operational Solutions for Safety Reconfigurable Embedded
Control Systems” presents run-time automatic reconfiguration of distributed
embedded control systems following component-based approaches. The authors
propose solutions to implement the whole agent-based architecture by defining
UML meta-models for agents. Also, to guarantee safe reconfiguration of tasks at
run-time, a service and reconfiguration process for tasks is defined, and the
semaphore concept is used to ensure safe mutual exclusion.
“Low Power Techniques for Embedded FPGA Processors” presents low-power
techniques for embedded FPGA processors. The authors emphasize that clock
signals are a major source of power dissipation because of their high frequency
and load. They investigate and simulate a clock gating technique that disables the
clock signal in inactive portions of the circuit, and they present a Register-Transfer
Level model in the Verilog language.
“Software Deployment for Distributed Embedded Real-Time Systems of
Automotive Applications” presents a deployment model for automotive applica-
tions. The chapter discusses the software deployment problem tailored to the needs
of the automotive domain, focusing on two issues: the configuration of the
communication infrastructure and the handling of design constraints. It is shown
how state-of-the-art approaches have to be extended in order to tackle these issues,
and how the overall process can be performed efficiently by utilizing search
methodologies.
This book has three groups of people as its potential audience: (i) undergraduate
and postgraduate students conducting research in the areas of embedded software
engineering and real-time systems; (ii) researchers at universities and other insti-
tutions working in these fields; and (iii) practitioners in the R&D departments of
embedded systems companies. This book differs from other books in that it
contains comprehensive case studies and real data from software engineering
practice. It can be used as an advanced reference for a postgraduate-level course in
embedded software engineering and real-time systems.
Developing software for Real-Time (RT) systems is a challenging task for software
engineers. Since these systems have to interact with both the environment and human
operators, they are subject to operational deadlines. Also, it is essential that they be
so designed as to involve no risk to the operators, the environment, or the system
itself. Thus, designers have to face two different problems, namely software design
and software analysis, complicated by the fact that time plays a central role in RT
systems. There are many well-known software disciplines that provide solutions to
each of the aforementioned problems in the literature:
Software design: Software Architecture constitutes the backbone for any success-
ful software-intensive system, since it is the primary carrier of a software system’s
quality attributes [27]. Component-Based Software Development (CBSD) is a
bottom-up approach to software development, in which applications are built from
small, modular, and interchangeable units, with which the architecture of a system
can be designed and analyzed [29]. Frameworks [17] and design patterns [18] are
among the most successful approaches available nowadays for maximizing software
quality and reuse.
Software analysis: Software analysis is, perhaps, a broader area than software
design, since there are many characteristics that can be analyzed in a piece of
software, depending on the needs of each stakeholder. Thus, it is possible to use
model checking [4], validation and verification tools [5, 6], or schedulability analysis
tools [21, 28], to mention but a few.
However, it is very difficult to combine the results from both disciplines, since doing
so implies reconciling the design and analysis worlds, which are concerned with very
different application aspects and therefore use very different concepts: components
in the former and threads in the latter. To ensure that the analyzed models correspond to
the input architectural description, it is necessary to establish univocal correspon-
dences between the concepts of both domains. There are different ways of defining
such correspondences, but most of them imply constraining the implementation to
just a few alternatives, when it would be desirable to select among various alterna-
tives. Typical examples of this are component models that implement components
as processes; or those where all components are passive and invoked sequentially by
the run-time; or those that enforce a given architectural style, like pipes & filters, etc.
This chapter describes a flexible development approach for supporting a
component-based development process of real-time applications, and the schedu-
lability analysis of the resulting software. The word “flexible” in the previous sen-
tence is used to emphasize that our work does not impose a rigid implementation but
rather provides the user with some implementation options, as described in the rest of
the chapter. The approach revolves around the Model-Driven Software Development
(MDSD) paradigm [12, 26] and its associated technologies. They provide the theo-
retical and technological support for defining the most suitable abstraction levels at
which applications are designed, analyzed, deployed, etc., as well as the automatic
evolution of models through the defined abstraction levels. Thanks to model trans-
formations, models can automatically evolve from design to analysis without the
user having to perform such transformations manually.
The approach described in this chapter comprises three abstraction levels, namely:
(1) architectural software components for designing applications, (2) processes for
configuring the application deployment and concurrency, and (3) threads and syn-
chronization primitives for analyzing its schedulability. Figure 1 represents the rela-
tionships existing among these levels by using the well-known MDSD pyramids.
This process has been integrated in an Eclipse-based tool-chain, also described in
this chapter.
Fig. 1 Considered abstraction levels, organized in the well-known MDSD pyramid from two
orthogonal points of view: modeling languages and concepts
The three abstraction levels that comprise the proposed development approach are
supported, respectively, by a language for modeling component-based applications,
a component framework implemented in C++, and the Cheddar analysis tool. All
these tools are integrated and supported by a MDSD tool-chain that enables models
to smoothly evolve from components to objects and analysis models. The devel-
opment approach is based on the particular interpretation of the MDSD approach
offered by the Model-driven architecture (MDA) [23] standard. In MDA, Platform-
Independent Models are created at the level of abstraction provided by components,
and Platform-Specific Models are supported by an object oriented framework, enti-
tled FraCC (Framework for Concurrent Components) and implemented in C++,
which provides platform-specific run-time support. The evolution of the applica-
tion through the different abstraction levels is automatically performed by means of
model transformations. Obviously, this approach can be followed using other frame-
works, provided they fulfill the application requirements. In the field of RTS, there have
been very promising results with the MDA approach. Significant examples include
the Artist Design Network of Excellence in Embedded System Design [2] and the
OpenEmbeDD project [25].
Model transformations enable the automatic evolution of models into other models
or into executable code. But transformations are complex software artifacts, diffi-
cult to understand, develop and maintain. Moreover, model transformations have a
non-modular structure that prevents them from being reused (totally or partially) in
systems that may have similar requirements. The use of frameworks reduces the com-
plexity of model transformations, since the transformations only have to specialize the
framework's hot-spots rather than generate the whole application, so their maintenance
and evolution are dramatically simplified. As a side effect, MDA can help simplify the use
of frameworks by hiding the complexity of their specialization mechanisms, as stated
in [1]. In addition, the use of software frameworks for the PSM level offers additional
advantages, namely: (1) they are normally designed for fulfilling the non-functional
requirements of the application domain they target; and (2) they can facilitate final
application reconfiguration, provided that they have tools for that purpose. On the
other hand, implementing a framework is a very time-consuming task, so its
development is only advisable when it can be reused in many similar applications.
In the proposed development approach, we distinguish three roles: that of frame-
work developer, that of MDA supporter, and that of application developer. These
roles can be played by the same or different persons or teams. This chapter focuses
on the application developer role, describing the tools and artifacts he/she can use to
develop component-based applications and analyze their temporal behavior. Starting
from a set of requirements (functional and non-functional), the application devel-
oper (1) designs the specific application using an architectural component-oriented
modeling language, (2) he/she then executes a model transformation in order to
generate the application, and (3) he/she can use the configuration tools provided by
FraCC to make further modifications to the generated application, thus configuring
its deployment. In addition, models enable early validation and verification of appli-
cation properties, while other properties cannot be verified until the final implemen-
tation is obtained. Our purpose is twofold: to reduce development time and prototype
testing by using an MDA development environment, and to analyze the application
as early as possible.
3 Related Work
In this section, we will briefly review some of the most relevant works related to
the technologies used in the development approach presented in this book chapter:
component-based design and component models, analysis tools, frameworks and
pattern languages.
Among general-purpose component models, we may cite Fractal [8], the CORBA
component model [24], KobrA [3], SOFA 2.0 [9], SaveCCM [13], and ROBO-
COP [20], among others. Fractal, SOFA, SaveCCM and ROBOCOP provide dif-
ferent kinds of ADLs to describe the application architecture, and a code generation
facility that generates the application skeleton in Java or C/C++, which must be later
completed by the developer. The CORBA CCM was developed to build component-
based applications over the CORBA communication middleware. It provides an IDL
to generate the external structure of the components. KobrA is one of the most popular
proposals, in which a set of principles is defined to describe and decompose a soft-
ware system following a downstream approach based on architectural components.
But in all cases the implementation of the code and structure of the component is
still completely dependent on the developer. A complete and updated classification
of component models can be found in [14], where the authors propose a classifi-
cation framework for studying component models from three perspectives, namely
Lifecycle, Construction and Extra-Functional Properties.
Regarding analysis tools, the Spin model checker [6] is a widely used software
tool for specifying and verifying concurrent and distributed systems that uses linear
temporal logic for correctness specifications. Uppaal [5] is a toolbox for verifying
RTS, which provides modeling and query languages. Uppaal is designed to verify
systems that can be modeled as networks of timed automata [7] extended with integer
variables, structured data types, user defined functions, and channel synchronization.
Cheddar [28] is a free real-time scheduling tool, designed for checking temporal
constraints of an RTS. MAST [21] defines a model to describe the timing behavior
of RTS designed to be analyzable via schedulability analysis techniques, and a set
of tools to perform such analysis. It can also inform the user, via sensitivity analysis,
how close the system is to meeting its timing requirements.
Frameworks are one of the most reused software artifacts, and their development
has been widely studied [17]. New, more general and innovative proposals have
recently appeared in the literature, focusing on the development and use of frame-
works for software systems development in general. In [16], the authors propose a
method for specializing object-oriented frameworks by using design patterns, which
provide a design fragment for the system as a whole. A design fragment, then, is a
proven solution to the way the program should interact with the framework in order
to perform a function. A conceptual and methodological framework for the defin-
ition and use of framework-specific modeling languages is described in [1]. These
languages embed the specific features of a domain, as offered by the associated
framework, and thus make it easier for developers to use such frameworks.
As Buschmann et al. [10] state, not all domains of software are yet addressed
by patterns. However, the following domains are considered targets to be addressed
following a pattern-language based development: service-oriented architectures, dis-
tributed real-time and embedded systems, Web 2.0 applications, software architec-
ture, and mobile and pervasive systems. The research interest in the real-time system
domain is incipient and the literature is still in the form of research articles. A tax-
onomy of distributed RT and embedded system design patterns is described in [15],
allowing the reader to understand how patterns can fit together to form a complete
application.
Timed automata are the key artifacts of the language and modeling approach
described in this book chapter, since they decide which code the component executes,
and whether or not the component reacts to messages sent to it. They link structure
with code (represented by activities). Also, timed automata regions define the unit
of computational load assigned to threads in the Deployment modeling package,
since in a given region there is one and only one active state, whose code should
be executed by the component. Finally, activities represent logical units of work that
must be performed periodically or sporadically, depending on the component state.
Activities in FraCC are programmed in C++ and then linked to the state in which
they are executed. Activities only depend on the interface definitions, and therefore
can be reused in several timed automata.
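To make the role of activities concrete, the sketch below shows an activity that depends only on port interfaces and can therefore be reused by any timed automaton that links it to one of its states. FraCC's real activities are written in C++; the Java rendering and all names here (PortReader, PortWriter, ScaleActivity) are illustrative assumptions, not the framework's API.

```java
// Hypothetical sketch: an activity depends only on port interfaces, so it can be
// reused by several timed automata. (FraCC's real activities are C++; all names
// here are invented for illustration.)
interface PortReader { int read(); }
interface PortWriter { void write(int value); }

public final class ScaleActivity implements Runnable {
    private final PortReader in;
    private final PortWriter out;
    private final int gain;

    public ScaleActivity(PortReader in, PortWriter out, int gain) {
        this.in = in;
        this.out = out;
        this.gain = gain;
    }

    @Override
    public void run() {              // invoked while the owning state is active
        out.write(in.read() * gain); // only the port interfaces are touched
    }
}
```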
The described modeling language is embedded in the Eclipse tool-chain, as
described in Sect. 5, while some screenshots of its use are shown in Sect. 6, where a
cruise control system is developed by using the language and its associated tools.
AD1 Control over concurrency policy: number of processes and threads, thread
spawning (static vs. dynamic policies), scheduling policy (fixed priority sched-
ulers vs. dynamic priority scheduler), etc. Unlike most frameworks, these con-
currency issues are very important in order to be later able to perform real-time
analysis, and thus the framework should allow users to define them.
AD2 Control over the allocation of activities to threads, that is, control over the
computational load assigned to each thread, since we consider the activity associ-
ated to a state as the minimum computational unit. The framework allows allocat-
ing all the activities to a single thread, allocating every activity to its own thread,
or any combination. In any case, the framework ensures that only the activities
belonging to active states are executed.
AD3 To avoid “hidden” code, that is, code whose execution is outside the devel-
oper’s control. The code that manages the framework is treated as “normal” user
code, and therefore the developer can assign it to any thread.
AD4 Control over the communication mechanisms between components (syn-
chronous or asynchronous).
AD5 Control over component distribution in different nodes.
The design and documentation of the framework was carried out using design
patterns, which is a common practice in Software Engineering [11]. In order to
describe the framework we will use Figs. 2 and 3. Figure 2 shows the pattern sequence
that has been followed in order to meet the architectural drivers mentioned above,
while Fig. 3 shows the classes that fulfill the roles defined by the selected patterns. At
this point, it is worth highlighting that the same patterns applied in a different order
would have led to a very different design.
Among the aforementioned drivers, the main one is the ability to define any num-
ber of threads and control their computational load (architectural drivers AD1 and
AD2). This computational load is mainly determined by the activities associated
to the states of the timed automata. In order to achieve this goal, the Command
Processor architectural pattern [10] and its highly coupled Command pattern [18]
have been selected; they were the first to be applied in the framework design, as
shown in Fig. 2. The Command Processor pattern separates service requests from
their execution by defining a thread (the command processor) where the requests are
managed as independent objects (the commands). These patterns impose no con-
straints over command subscription to threads, number of commands, concurrency
scheme, etc. The roles defined by these two patterns are realized by the classes
ActivityProcessor and RegionActivity, respectively (see Fig. 3).
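A minimal sketch of the Command Processor and Command patterns as described above: commands are queued as objects and executed by a dedicated processor thread. FraCC realizes these roles in C++ with ActivityProcessor and RegionActivity; the Java code and names below are illustrative only.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of the Command Processor pattern: commands are queued as
// objects and executed by a dedicated processor thread.
public final class CommandProcessor implements Runnable {

    /** Command role: one unit of work (in FraCC, a region's activity). */
    public interface Command {
        void execute();
    }

    private final BlockingQueue<Command> queue = new LinkedBlockingQueue<>();

    /** Clients submit commands; they do not run them. */
    public void submit(Command c) {
        queue.add(c);
    }

    /** Processor role: a single thread drains the queue and executes commands. */
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take().execute();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shut down cleanly
        }
    }

    public static void main(String[] args) {
        CommandProcessor processor = new CommandProcessor();
        new Thread(processor, "command-processor").start();
        processor.submit(() -> System.out.println("command executed"));
    }
}
```

Note that the pattern itself imposes no constraints on how many commands exist or how they are assigned to processors, which is precisely the freedom AD1 and AD2 require.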
Fig. 2 Dependency relationships between the patterns considered in the framework development.
Though the patterns are numbered, the design was iterative, and most of the patterns had to be
revisited, leading to many design modifications
Another key aspect, related to AD3 and AD4, is to provide an object oriented
implementation of timed automata compatible with the selected patterns for concur-
rency control, in order to integrate it in the scheme defined by the aforementioned
Command Processor pattern. It is also an aspect that has a great impact on the
whole design, since timed automata model the behavior of the components. Timed
automata are managed following the Methods for States pattern [10], and the
instances of the classes representing it are stored in a hash table. The class Region
is an aggregate of States, and it is related to a subclass of RegionActivity,
which defines how regions are managed. FraCC provides two concrete subclasses:
FsmManager and PortManager. The former is in charge of (1) the local manage-
ment of the region states (transition evaluation, state change, etc.), and (2) invoking
the StateActivity of the region's active state, while the latter is in charge of send-
ing messages through output ports. The subclasses of RegionActivity constitute
the link between concurrency control and timed automata implementation, since they
are those that are allocated to command processors.
Conditions, transitions and events are modeled as separate classes, as shown in
Fig. 3. Condition is an abstract class used to model transitions’ conditions. It
provides an abstract method to evaluate the condition. The only concrete subclass
is StateActiveCondition, which tests whether a specific state is active, but
users can create their own subclasses to model other kinds of conditions. The class
Transition includes the source and target states, the event that triggers it, and a
set of condition vectors that must be evaluated to determine whether the transition
should be executed.
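The following sketch mirrors the classes just described (Condition, StateActiveCondition, and Transition). The signatures, the Region interface, and the reading of the condition vectors as a disjunction of conjunctions are assumptions made for illustration; FraCC's actual C++ classes may differ.

```java
import java.util.List;

// Sketch of the transition machinery described above, with assumed signatures.
// Here a transition fires on its trigger event when at least one condition vector
// evaluates to true (all conditions in that vector must hold).
interface Region {                 // assumed: a region exposes its active state
    String activeState();
}

abstract class Condition {
    abstract boolean evaluate();
}

final class StateActiveCondition extends Condition {
    private final Region region;
    private final String state;

    StateActiveCondition(Region region, String state) {
        this.region = region;
        this.state = state;
    }

    @Override
    boolean evaluate() {
        return region.activeState().equals(state);
    }
}

final class Transition {
    final String source;
    final String target;
    final String triggerEvent;
    final List<List<Condition>> conditionVectors;

    Transition(String source, String target, String triggerEvent,
               List<List<Condition>> conditionVectors) {
        this.source = source;
        this.target = target;
        this.triggerEvent = triggerEvent;
        this.conditionVectors = conditionVectors;
    }

    boolean isEnabled(String event) {
        if (!triggerEvent.equals(event)) {
            return false;
        }
        return conditionVectors.isEmpty()
            || conditionVectors.stream()
                   .anyMatch(v -> v.stream().allMatch(Condition::evaluate));
    }
}
```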
The next challenge is how to store and manage the component internal data,
including all the states and activities mentioned above, the data received or that must
be sent to other components, the transitions among states, event queues, etc. All this
data is organized following the Blackboard pattern. The idea behind the blackboard
pattern is that a collection of different threads can work cooperatively on a common
data structure. In this case, the threads are the command processors mentioned above.
Fig. 3 Simplified class diagram of the developed framework showing some of the patterns involved
in its design
The main liabilities of the Blackboard pattern (i.e. difficulties for controlling and
testing, as well as synchronization issues in concurrent threads) are mitigated by the
fact that each component has its own blackboard, which maintains a relatively small
amount of data. Besides, the data is organized in small hash tables. The roles defined
by this pattern are realized by the classes Data and V3Data.
As shown in Fig. 2, the Blackboard pattern serves as a joint point between
timed automata and the input/output messages sent by components through their
ports. Component ports and messages exchanged between them are modeled as
separate classes. The classes representing these entities are the classes V3Port and
V3Msg, shown in Fig. 3. The communication mechanism implemented by default
in FraCC is the asynchronous without reply scheme, based on the exchange of
messages following the Message pattern. In order to prevent the exchange of many
small messages, we use the Data Transfer Object pattern to encapsulate in
a single message all the state information associated with a port interface, which is later
serialized and sent through the port. Finally, since components encapsulate their
inner state, we use the Copied Value pattern to send copies of the relevant state
information in each message. All these patterns are described in [10].
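A small sketch of the message scheme described above, combining the Data Transfer Object and Copied Value patterns: all the state associated with a port interface is copied into one immutable object and sent as a single asynchronous message. The names (SpeedStateDto, OutputPort, SpeedSender) are invented for illustration and are not FraCC's V3Port/V3Msg API.

```java
import java.io.Serializable;

// Sketch of the message scheme described above: the whole state associated with
// a port interface is copied into one transfer object and sent as one message.
final class SpeedStateDto implements Serializable {   // Data Transfer Object role
    final double currentSpeed;
    final double cruisingSpeed;
    final long   timestampMillis;

    SpeedStateDto(double currentSpeed, double cruisingSpeed, long timestampMillis) {
        // Copied Value role: primitive copies only, so the sender's internal
        // state cannot be mutated through the message.
        this.currentSpeed = currentSpeed;
        this.cruisingSpeed = cruisingSpeed;
        this.timestampMillis = timestampMillis;
    }
}

interface OutputPort<T extends Serializable> {
    void send(T message);          // asynchronous, no reply (as described above)
}

final class SpeedSender {
    private final OutputPort<SpeedStateDto> port;
    private double currentSpeed;
    private double cruisingSpeed;

    SpeedSender(OutputPort<SpeedStateDto> port) {
        this.port = port;
    }

    void publish() {
        // One message carries the entire interface state.
        port.send(new SpeedStateDto(currentSpeed, cruisingSpeed,
                                    System.currentTimeMillis()));
    }
}
```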
Fig. 4 Architectural software components for application design: artifacts, models, and eclipse
tool-chain screenshots
It should be highlighted that FraCC does not give any guidance as to the num-
ber of threads that have to be created or how activities should be assigned to them,
but it provides the necessary mechanisms to enable the user to choose the appro-
priate heuristic methods, for example the ones defined in [19]. Both the number
of threads and the allocation of RegionActivitys to them can be chosen
arbitrarily, but the main objective should be to ensure application schedulability. For
instance, a heuristic we normally use in our applications is to assign RegionActivitys
that have similar periods to the same thread.
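The period-based heuristic mentioned above can be sketched as follows: RegionActivitys are grouped by period, and each group is allocated to one thread. The representation of activities, the exact-period grouping rule, and the example periods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the period-based allocation heuristic mentioned above: activities
// whose periods match are assigned to the same thread. The grouping rule and
// the example periods are assumptions for illustration.
final class PeriodGroupingAllocator {

    record Activity(String name, long periodMillis) { }

    /** Returns one activity group per distinct period; each group maps to one thread. */
    static Map<Long, List<Activity>> allocate(List<Activity> activities) {
        Map<Long, List<Activity>> threads = new TreeMap<>();
        for (Activity a : activities) {
            threads.computeIfAbsent(a.periodMillis(), p -> new ArrayList<>()).add(a);
        }
        return threads;
    }

    public static void main(String[] args) {
        var groups = allocate(List.of(
                new Activity("Brake_Sensor", 10),      // hypothetical periods
                new Activity("Velocity_Sensor", 10),
                new Activity("Cruise_Control", 50)));
        // Two threads: {Brake_Sensor, Velocity_Sensor} at 10 ms, {Cruise_Control} at 50 ms.
        groups.forEach((period, acts) -> System.out.println(period + " ms -> " + acts));
    }
}
```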
Fig. 5 Configuration of application deployment: artifacts, models, and eclipse tool-chain screen-
shots
The case study that illustrates the use of FraCC and its associated tool-chain is a
simplified version of the well-known “Cruise Controller Development” [19]. The
original case study includes calibration and monitoring functions, which are not
taken into account in the current example, since we decided to include only those
functional requirements directly related to real-time system development. Besides,
unlike the original solution, which is object-oriented, we develop a component-based
one.
The cruise control system is in charge of automatically controlling the speed of
a vehicle. The elements involved in the system are the brake and accelerator pedals
and a control lever. This lever has three switch positions: ACCEL, RESUME, and
OFF. The required cruise control functions are:
Fig. 6 Application and analysis models: artifacts, models, and eclipse tool-chain screenshots
ACCEL: with the cruise control lever held in this position, the car accelerates
without using the accelerator pedal. After the lever is released, the reached speed
is maintained (referred to as the “cruising speed”) and also “memorized”.
OFF: by moving the control lever to the OFF position, the cruise control is switched
off, independently of the driving or operating condition. The cruise control is
automatically switched off if the brake pedal is pressed.
RESUME: by switching the lever to the RESUME position, the last “memorized”
speed can be resumed. The “memorized” speed is canceled when the car engine
is turned off.
Because of their size, it is not possible to show in a single figure all the components
plus the timed automata that model their behavior. Therefore, we will first show the
complete application architecture (see Fig. 7), while the rest of the timed automata
will be progressively introduced.
As shown in Fig. 7, the cruise control is configured as a centralized appli-
cation, comprising five components. Four of them encapsulate hardware access
(Brake_Sensor, Velocity_Sensor, Control_Level, Throttle_Actuator), while the fifth
Fig. 7 Architecture of the cruise control application. Interface messages are shown as comments
one (Cruise_Control) models the whole control system and orchestrates the rest of
the components.
The Cruise_Control component periodically receives messages from the sensor
components, and, based on the data they provide and on the system state, calculates
the action command and sends it to the Throttle_Actuator component. All the mes-
sages exchanged among components are always sent through the appropriate ports,
as shown in Fig. 7. The Cruise_Control timed automata comprises three orthogo-
nal regions: Brake_Region, Control_Level_Region, and Cruise_Control_Region, as
shown in Figs. 8 and 9, respectively. This last region comprises the following states:
Initial state. When the driver turns the engine on, the region enters the initial state.
The component remains in this state as long as no attempt is made to engage
cruise control. In the initial state, unlike the Crusing_Off state, there is no previously
stored cruising speed.
Crusing_Off state. When the driver either moves the lever to the Off position
(Off_E event) or presses the brake (Brake_On_E event), the cruise control is
deactivated.
Accelerating state. When the driver holds the cruise control lever in the ACCEL
position (Accel_E event), the component enters the Accelerating state and
Fig. 8 Two of the regions that comprise the timed automata describing the behavior of the
Cruise_Control component: Brake_Region on the left (stores the state of the car brake), and Con-
trol_Level_Region on the right (stores the state of the control level)
Fig. 9 The last of the regions that comprise the timed automata describing the behavior of the
Cruise_Control component
accelerates automatically, provided that the brake is not pressed (guard Brake_Off
state).
Cruising state. When the driver releases the lever (Cruise_E event), the current
speed is saved as the cruising speed and the component enters the Cruising state,
in which the car speed is automatically maintained at the cruising speed.
Resuming state. When the driver moves the lever to the Resume position
(Resume_E event), and provided the brake is not pressed, the car automatically
accelerates or decelerates to the cruising speed. When the car reaches the desired
speed, the region enters the Cruising state (Reached_Crusing_E event).
Sensor components share the same timed automata, shown in Fig. 10 (left), though
the activity that is periodically executed in each case is different. The same applies to
the actuator component, shown in Fig. 10 (right). The activity associated to the state
in each component is in charge of reading the sensor state and then sending messages
to the Cruise_Control component.
Fig. 10 Regions for controlling the sensors (left) and actuator (right) components
All the components described above contain an additional region in their timed
automata, not shown in the figures but present in the models, in charge of port
management (as described in Sect. 4.2).
Once the deployment model has been completed, a model-to-text XTend transfor-
mation (see Fig. 6) generates an analysis file for the Cheddar analysis tool. In order
to perform the schedulability analysis, Cheddar requires the number of tasks, their
temporal characteristics (WCET and period), and the number of shared resources of
the application. The Threads of the deployment model are directly transformed into
Cheddar tasks, but shared resources must be derived from the deployment model,
given FraCC’s memory structure and the assignment of RegionActivitys
to threads made in the deployment model. Only buffers that are accessed by
RegionActivitys assigned to different threads should be protected.
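The rule for deriving Cheddar shared resources can be sketched as follows: a buffer becomes a protected shared resource only if the RegionActivitys that access it are allocated to different threads. The in-memory representation of the deployment model used below is an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the rule described above: a buffer becomes a Cheddar shared resource
// only if the activities that access it are allocated to different threads.
final class SharedResourceDerivation {

    /** bufferName -> names of the RegionActivitys that read or write it. */
    static Set<String> sharedBuffers(Map<String, List<String>> bufferAccessors,
                                     Map<String, String> activityToThread) {
        Set<String> shared = new HashSet<>();
        bufferAccessors.forEach((buffer, activities) -> {
            long distinctThreads = activities.stream()
                    .map(activityToThread::get)
                    .distinct()
                    .count();
            if (distinctThreads > 1) {        // accessed from more than one thread
                shared.add(buffer);           // -> must be protected in the Cheddar model
            }
        });
        return shared;
    }

    public static void main(String[] args) {
        var accessors = Map.of(                // hypothetical deployment data
                "speedBuffer", List.of("Velocity_Sensor", "Cruise_Control"),
                "brakeBuffer", List.of("Brake_Sensor"));
        var allocation = Map.of(
                "Velocity_Sensor", "thread-1",
                "Cruise_Control",  "thread-2",
                "Brake_Sensor",    "thread-1");
        System.out.println(sharedBuffers(accessors, allocation)); // [speedBuffer]
    }
}
```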
As mentioned in Sect. 5, one of the main distinguishing features of FraCC is
the separation between architecture and deployment, which makes it possible to
test different deployments (number of computational nodes, number of concurrent
processes and threads, as well as the computational load assigned to every thread and
their timing properties) easily without needing to modify the architecture. Figure 11
shows the schedulability analysis results, performed with Cheddar, of three different
deployments of the cruise control application.
The work described in this chapter is part of a more general approach for the
development of component-based application supported by MDA technologies. The
described MDA tool-chain hides the complexity of the development process and
automates the generation of both the final application and the analysis models. From
our experience with the use of MDA technologies, model transformations are perhaps
the most complex MDA artifacts, both in their design and maintenance. The higher
the conceptual gap between the source and target abstraction levels, the more
complex the required transformations become.
Fig. 11 Schedulability results of the Cheddar analysis tool for three different deployments of
the cruise control application, as modeled in Fig. 7. a Analysis results of a deployment with one
thread to which all RegionActivitys have been assigned. b Analysis results of a deployment
with five threads, to which RegionActivitys have been assigned according to their periods.
c Analysis results of a deployment with twelve threads, one for each RegionActivity
Acknowledgments This work has been partially supported by the Spanish CICYT Project
EXPLORE (ref. TIN2009-08572), the Séneca Project MISSION-SICUVA (ref. 15374/PI/10), and
the Spanish MEC FPU Program (grant AP2009-5083).
1 Introduction
2 Background
This section will discuss the unique challenges of embedded development and some
of the ways that they are currently addressed. Section 2.1 discusses the complex
hardware architectures found in embedded systems, Sect. 2.2 discusses the prob-
lems faced by developers of safety-critical and high-integrity systems, and Sect. 2.3
describes industrial concerns.
2.2 Criticality
can use in order to support timing analysis of the application software. The commonly
used model [6] makes the following assumptions:
• The units of computation in the system are assigned a potentially dynamic priority
level.
• At any given time the executing thread can be determined from the priorities in the
system and the states of the threads; for example, under Earliest Deadline First
scheduling the thread with the nearest deadline has the highest priority and should
be executing unless it is blocked.
• Priority inversion (deviations from the above point) in the final system can be
prevented, or predicted and bounded.
• Threads contain code with bounded execution times. This implies bounds on loop
iterations, predictable paths through functions, restrictions on expected input data,
and limitations on exotic language features like code migration, dynamic dispatch,
or reflection.
• Blocking throughout the system is bounded and deadlock-free (a classical response-time test that builds on these assumptions is sketched after this list).
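Under these assumptions, a classical worst-case response-time test for fixed-priority scheduling illustrates the kind of analysis the model enables (this recurrence is standard in the real-time literature and is not specific to this chapter):

```latex
R_i^{(0)} = C_i, \qquad
R_i^{(k+1)} = C_i + B_i + \sum_{j \in \mathrm{hp}(i)} \left\lceil \frac{R_i^{(k)}}{T_j} \right\rceil C_j
```

where C_i is the bounded worst-case execution time of thread i, B_i its bounded blocking time, and T_j the period of each higher-priority thread j in hp(i); the iteration is repeated until it reaches a fixed point R_i, and the thread meets its timing requirement if R_i does not exceed its deadline.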
Finally, once predictable hardware and software are developed it is still necessary
for the highest levels of certification (such as the avionics standard DO-178B) to
demonstrate traceability from requirements to software elements. Currently this is
not well supported by existing toolchains.
2.4 Summary
Fig. 1 Basic relations of representation and conformance in MDE (adapted from [21]): the model is a representation of the system and conforms to its metamodel
• Generating lower-level models and code from higher-level, more abstract models;
• Mapping between different models;
• Querying and extracting information from models;
• Refactoring models;
• Reverse engineering of abstract models from concrete ones.
Model transformations are computer programs, which define how one or more
input models can be transformed into one or more output models. A model transfor-
mation is usually specified as a set of relations that must hold for a transformation to
be successful. The input and output models of the transformation have to conform
to a metamodel.
A model transformation is specified at the metamodel level and establishes a
mapping between all the models, which conform to the input and output metamodels.
Model transformations in MDE follow the model transformation pattern illustrated in
Fig. 2. The execution of the rules of a transformation program results in the automatic
creation of the target model from the source model. The transformation rules, as well
as the source and target models conform to their corresponding metamodels. The
transformation rules conform to the metamodel of the transformation language (i.e.
its abstract syntax), the source model conforms to the source metamodel and the target
model conforms to the target metamodel. At the top level of this layered architecture
lies the meta-metamodel, to which all the other metamodels conform.
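As a language-neutral illustration of this pattern, the sketch below applies a set of rules, each declared against a source element type, to every conforming element of a source model in order to build a target model. In MADES the transformations are written in Epsilon languages such as ETL; the Java classes here are purely a conceptual sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Conceptual sketch of the model-transformation pattern described above: each
// rule is declared against a source element type (here, a predicate over source
// elements) and produces target elements. Model representations are assumed.
final class ModelTransformation<S, T> {

    record Rule<S, T>(Predicate<S> appliesTo, Function<S, T> create) { }

    private final List<Rule<S, T>> rules = new ArrayList<>();

    void addRule(Predicate<S> appliesTo, Function<S, T> create) {
        rules.add(new Rule<>(appliesTo, create));
    }

    /** Executes every rule on every conforming source element. */
    List<T> run(List<S> sourceModel) {
        List<T> targetModel = new ArrayList<>();
        for (S element : sourceModel) {
            for (Rule<S, T> rule : rules) {
                if (rule.appliesTo().test(element)) {
                    targetModel.add(rule.create().apply(element));
                }
            }
        }
        return targetModel;
    }
}
```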
[Figure: the Epsilon family of model management languages, including the Epsilon Transformation Language (ETL), the Epsilon Validation Language (EVL), and the Flock model migration language]
The approach proposed by this chapter is not dependent on the model manage-
ment framework. However, Epsilon was preferred because some of its unique
features simplify the implementation activities. Such features include the support
of Epsilon for interactive model transformations, the fine-grained traceability mech-
anism of EGL, as well as the framework’s focus on reusability and modularity.
Moreover, Epsilon is a mature model management framework with an active and
large community.
Given the problems highlighted in Sect. 2, it can be seen that software development for
many modern embedded systems is very challenging. Any solution to these problems
must be industrially acceptable, so from the discussions in Sects. 2.3 and 2.2 the
following requirements can be obtained:
CTV introduces a virtualisation layer over the target hardware, called the Virtual
Platform (VP). This is shown in Fig. 4. The VP is a high-level view of the underlying
hardware that presents the same programming model as the source language (in this
case Java) to simplify development. For Java, it presents a homogeneous symmet-
ric multiprocessing environment with a single monolithic shared memory, coherent
caching, and a single uniform operating system. This is equivalent to a standard
desktop computer running an operating system like Linux or Windows and is the
environment in which Java’s runtime expects to operate. Therefore, the developer
can write normal, architecture-independent Java code.
As its name implies, the VP is a compile-time-only construct; it does not exist
at run-time. This is because the VP’s virtualisation is implemented by a source-
to-source translation layer that is guided by the virtualisation mappings (that map
threads to CPUs and data to memory spaces). This can be seen in Fig. 5. The job of
the source-to-source translation is to translate the architecturally-independent input
software into architecturally-specific output code that will operate correctly on the
target hardware, according to the provided mappings.
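As an illustration of what such a translation might produce, the sketch below contrasts architecture-independent input code with a possible architecture-specific output when the virtualisation mapping places a thread on another processing node. The RuntimeSupport class and its method are hypothetical stand-ins for the generated support layer, not AnvilJ's actual API.

```java
// Architecture-independent input: the developer writes ordinary Java against the
// Virtual Platform and starts a worker thread directly.
final class DetectorInput {
    void start() {
        new Thread(new RadarFilter(), "radar-filter").start();
    }
}

// Possible architecture-specific output after source-to-source translation, when
// the virtualisation mapping places "radar-filter" on processing node 2.
// RuntimeSupport is a hypothetical stand-in for the generated support layer.
final class DetectorOutput {
    void start() {
        RuntimeSupport.startRemoteThread(/* node */ 2, RadarFilter.class, "radar-filter");
    }
}

final class RadarFilter implements Runnable {
    @Override
    public void run() { /* filter radar frames */ }
}

// Hypothetical generated support layer.
final class RuntimeSupport {
    static void startRemoteThread(int node, Class<? extends Runnable> task, String name) {
        // In a real system this would forward a start request to the runtime
        // support on the given node; here it is only a placeholder.
        System.out.println("start " + name + " (" + task.getSimpleName() + ") on node " + node);
    }
}
```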
Unlike a standard run-time virtual machine, the virtualisation mappings are
exposed to the programmer. This allows the programmer to influence the implemen-
tation of the code and achieve a better mapping onto the architecture. For example,
by placing communicating threads on CPUs that are physically close to each other,
or locating global data in appropriate memory spaces to minimise copying. Such
design space exploration can be performed very rapidly because software can be
moved throughout the target system without recoding.
For examples of how this trade-off can reduce overheads, see Sect. 4.2.4.
An additional benefit of the VP is that it abstracts hardware changes
from the software developer. The developer only has to target the VP rather than the
actual hardware and if the hardware is changed at a later date, the same software can
be retargeted without any recoding or porting effort. Similarly, because the VP is
implemented to support development in existing languages, developers do not have
to be trained to use a new language and existing legacy code can be more easily
reused. Also, because the architecture-specific output code is still valid Java, no new
compilers or tools need to be written. This is of vital importance to high-integrity
systems that require the use of trusted compilers, linkers, and other tools.
The CTV approach is different from techniques such as Ptolemy II [9], which aim
to provide new higher-level and more appropriate abstractions for programming
complex systems. CTV is instead designed to allow existing languages and legacy
code to be used to effectively target such systems through the use of very low-
overhead virtualisation. The two different approaches can actually be complementary
and used together, with CTV used as a low-overhead intermediary to bring legacy
code or legacy programming languages into an otherwise Ptolemy-defined system.
CTV is the name for the general technique. Section 4.2 will now discuss AnvilJ,
the specific implementation of CTV developed in the MADES project.
4.2 AnvilJ
AnvilJ is an implementation of CTV for the Java programming language and its
related subsets aimed at ensuring system predictability, such as the RTSJ. The AnvilJ
system model is shown in Fig. 6. Its input is a single Java application modelled as
containing two sets:
Collectively, AnvilJ Threads and Shared Instances are described using the umbrella
term AnvilJ Instances. AnvilJ Instances are static throughout the lifetime of the sys-
tem; they are created when the system starts and last until system shutdown.
An AnvilJ Instance may communicate with any other AnvilJ Instance; however,
the elements it has created may not communicate with the created elements of other
AnvilJ Instances. This restriction allows the communication topology of the system to
be determined at compile-time and the required runtime support to be reduced, as dis-
cussed later. This approach is particularly suited to embedded development because it
mirrors many of the restrictions enforced by high-integrity and certification-focussed
language subsets (such as the Ravenscar subsets of Ada [5] and Java [24] or the
MISRA-C coding guidelines [41]).
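In code, this static-default style amounts to creating the application's threads and shared objects once, at start-up, and never afterwards, much as Ravenscar-style profiles require. The sketch below is generic: in MADES the designation of AnvilJ Instances is made in the model through stereotypes rather than in the source, and all class names here are invented.

```java
// Generic illustration of the "static-default" style described above: the shared
// instance and the thread exist for the whole lifetime of the system and no
// further threads or shared objects are created after start-up. Names invented.
final class SpeedStore {                       // plays the "shared instance" role
    private volatile double latestSpeed;

    double get()         { return latestSpeed; }
    void   set(double v) { latestSpeed = v; }
}

final class CruiseApplication {
    // Created once, before the mission phase, and never replaced.
    private static final SpeedStore SPEED = new SpeedStore();
    private static final Thread CONTROLLER =
            new Thread(CruiseApplication::controlLoop, "cruise-controller");

    public static void main(String[] args) {
        CONTROLLER.start();                    // no dynamic thread creation later
    }

    private static void controlLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            double speed = SPEED.get();
            // ... compute and apply a throttle command from speed ...
        }
    }
}
```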
[Fig. 6: the AnvilJ system model, showing an input application containing AnvilJ Threads and AnvilJ Shared Instances and a target architecture of processing nodes (JVMs), endpoints, channels, and memories]
In AnvilJ, the main unit of computation in the target hardware is the processing
node. A processing node models a Real-Time Java Virtual Machine (JVM) [31]
in the final system (or a standard JVM with accordingly reduced predictability).
The Java specification does not define whether a multicore system should contain a
single JVM for the entire system [1, 19] or one per core. Therefore AnvilJ models
the JVMs, rather than the processors. The JVMs need not have similar performance
characteristics or features. As with all CTV implementations, every AnvilJ Instance
is mapped to exactly one node. AnvilJ Instances cannot migrate between processing
nodes, but (if supported by the Java implementation) other instances can.
Nodes communicate using channels, which are the communication primitives of
the target architecture. AnvilJ statically routes messages across the nodes of the sys-
tem to present the totally-connected communications assumed by Java. The designer
provides drivers for the channels of the system. Memories represent a contiguous
logical address space and endpoints connect processing nodes to other hardware
elements. Every AnvilJ Shared Instance must be mapped to either exactly one node
(on the heap of the JVM), or exactly one memory where it will be available to all
nodes connected to that memory.
This model is compile-time static—the number of AnvilJ Instances does not
change at runtime. This is consistent with the standard restrictions that are imposed
by most real-time programming models (as discussed in Sect. 2.2). For example,
Ravenscar Ada [5] forbids all dynamic task allocation, whereas AnvilJ only forbids
dynamic AnvilJ Instances. This is in contrast to systems like CORBA which adopt
a “dynamic-default” approach in which runtime behaviour is limited only by the
supported language features. Such systems support a rich runtime model but the
resulting system can be heavyweight as they are forced to support features such as
system-wide cache coherency, thread creation and migration or dynamic message
routing, even if not required by the actual application. The approach of CTV is
“static-default” in which the part of the application modelled is static. The restricted
programming model promises less, but the amount of statically-available mapping
information allows the required runtime support to be significantly reduced.
To aid the use of AnvilJ, MADES integrates it directly into the model-driven engi-
neering (MDE) flow of the project. This is not mandatory for AnvilJ, which can be
used independently. In order to integrate AnvilJ it is necessary to provide the designer
with a way of expressing a high-level view of the target hardware (in terms of the
AnvilJ system model) and a high-level view of relevant parts of the input software.
Not all the input software needs to be modelled, only the parts that are to be marked
as AnvilJ Instances (Sect. 4.2.1). Also, the allocation of AnvilJ instances from the
software model to the processing nodes of the hardware model must be provided.
This information is then translated from the designer’s model into the form which
is required by the AnvilJ tool. The translation is implemented using the Epsilon model
transformation language, which is described in detail in Sect. 3.2. In the MADES
framework, this information is provided by the designer through the use of 13 stereo-
types which are applied to classes in the system model. These MADES stereotypes
are described in Table 1. The modelling tool used in the MADES flow (Modelio
[28]) supports two additional diagram types that use the MADES stereotypes; the
detailed hardware specification and the detailed software specification. Allocations
are performed with a standard allocation diagram. Working with these additional
diagrams aids the designer because the MADES stereotypes can be automatically
applied.
For a more detailed look at how the modelling is performed to integrate AnvilJ,
Sect. 6 presents a case study that shows the development of a subcomponent of an
automotive safety system.
4.2.4 Overheads
AnvilJ’s static system model allows most of the required support to be implemented
at compile-time, resulting in a small runtime support system, especially when com-
pared with much larger (although more powerful) general-purpose frameworks. As
will be shown in this section, the main overhead in an AnvilJ system is that of the
Object Manager (OM). The OM is a microkernel which exists on every processing
node of the system and implements the AnvilJ system model. The OMs use a
message-passing communications model to implement shared memory, locks, remote
method calls etc.
The full version of the OM compiles to approximately 34 kB of class files including
debugging and error information. However it is also possible to create smaller OMs
which only support a subset of features for when the software mapped to a node
does not require them. For example, if a node contains AnvilJ Shared Instances but
no AnvilJ Threads then 5.7 kB of support for ‘Thread creation and joining’ can be
removed. If none of the shared methods of a node are called then the shared methods
subsection can be removed. The advantage of AnvilJ’s offline analysis is that this can
be done automatically each time, based on the exact input application and hardware
mappings. Table 2 shows a breakdown of some of the feature sets of the OM and
their respective code footprint.
Figure 7 compares this size to other similar systems. It should be noted that this
comparison is provided purely to contextualise the size metric and demonstrate that
AnvilJ’s size is small, relative to related embedded frameworks. The other systems
graphed, especially the CORBA ORBs, are built to support general-purpose, unseen
software and consequently are much more heavyweight.
Fig. 7 The code footprint of the AnvilJ runtime compared to systems from similar domains.
Anvil is a C-based implementation of CTV, Perc Pico [2] implements safety-critical Java on systems
without an OS, uClinux is a reduced Linux kernel for microprocessors without MMUs, and TAO [37]
and ZEN [22] are Real-Time CORBA ORBs
[Figure: hardware generation flow from the detailed hardware specification to an MHS file, the FPGA design, and the FPGA bitfile]
In addition to the small code size of the OM, its runtime memory footprint is also
modest. The full OM in a desktop Linux-based system uses approximately 648 bytes
of storage when idle, which increases as clients begin to use its features.
• Very rapid prototyping and design space exploration can be achieved using this
method due to the fact that hardware architectures can be constructed in the devel-
oper’s modelling environment rather than vendor tools.
• MDE allows a vendor-neutral way of modelling and generating architectures. The
same models could be used to target a wide range of FPGAs, ASICs, or even other
hardware description languages like SystemC, however such an approach would
not support the full flexibility of these systems.
• The same model is used as a source for both the software generation and hard-
ware generation flows. These models share a consistent meta-model and so have
related semantics. This gives confidence in the final design, because the software
generation flow is refactoring code according to the same hardware model used
by the hardware generation flow. In essence, the two flows ‘meet in the middle’
and support each other.
When creating the detailed hardware specification diagram, the hardware only
needs to be modelled at a high level of abstraction. The platform is modelled as a
class stereotyped with the stereotype «mades_architecture». Each detailed hardware
specification contains exactly one such class. Properties in the «mades_architecture»
stereotype are used to guide the software generation process by denoting the entry
point class of the input application and allocating the initial Main thread to a process-
ing node.
The details of the architecture are modelled with the MADES hardware stereo-
types. Processing nodes («mades_processingnode») are the elements of compu-
tation in the platform. Each node supports a logical JVM. They communicate
with other nodes through the use of channels. Nodes connect to channels using
the «mades_channelmedia» endpoint stereotype. Memories («mades_memory») are
data-storage elements and are connected to channels using «mades_memorymedia».
Other hardware elements («mades_ipcore») are connected to channels through the
use of the «mades_devicemedia» endpoint stereotype.
The top-level hardware stereotype «mades_hardwareobject» defines a property
called iptype. This is passed to the hardware generation transformation to specify
the type of hardware which should be instantiated. Further properties can also be
passed depending on the value of iptype. For an example of this see the case study
in Sect. 6.5.
Clock domains are modelled by classes stereotyped with the «mades_clock»
stereotype. Clock synthesis is restricted by the capabilities of the implementation
target and the IP cores used. A set of design rules is first checked using model
verification to ensure that the design can be realised (a sketch of such checks is given after the list). These are:
• The total number of clock domains is not higher than the limit for the target FPGA.
• All communications across clock boundaries use an IP core that is capable of
asynchronous signalling (such as a mailbox).
• All IP cores that require a clock are assigned one.
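Purely as an illustration of how these design-rule checks could be automated (this is not the MADES verification code; all class and method names below are invented for the sketch), a simple in-memory check might look as follows:

import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the clock-related design information.
record Clock(String name, long frequencyHz) {}
record IpCore(String name, boolean needsClock, Clock assignedClock, boolean asyncCapable) {}
record Crossing(IpCore viaCore, Clock from, Clock to) {}

final class ClockDesignRules {
    // Returns human-readable violations; an empty list means the design passes.
    static List<String> check(List<Clock> clocks, List<IpCore> cores,
                              List<Crossing> crossings, int maxClockDomains) {
        List<String> errors = new ArrayList<>();
        // Rule 1: the number of clock domains must not exceed the target FPGA's limit.
        if (clocks.size() > maxClockDomains) {
            errors.add("Too many clock domains: " + clocks.size() + " > " + maxClockDomains);
        }
        // Rule 2: every clock-domain crossing must use an IP core capable of
        // asynchronous signalling (such as a mailbox).
        for (Crossing c : crossings) {
            if (!c.from().equals(c.to()) && !c.viaCore().asyncCapable()) {
                errors.add("Crossing " + c.from().name() + " -> " + c.to().name()
                        + " via " + c.viaCore().name() + " lacks asynchronous signalling");
            }
        }
        // Rule 3: every IP core that requires a clock must have one assigned.
        for (IpCore core : cores) {
            if (core.needsClock() && core.assignedClock() == null) {
                errors.add("IP core " + core.name() + " has no clock assigned");
            }
        }
        return errors;
    }
}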
Each clock has a target frequency in the model and is implemented using the
clock manager cores of the target FPGA. As with all FPGA design, the described
constraints are necessary but not sufficient conditions. During synthesis the design
may use more clock routing resources than are available on the device, in which case
the designer will have to use a more powerful FPGA or reduce the clock complexity
of the design.
Currently, interfaces (IO with the outside world) have to be taken from the IP
library or manually defined in VHDL or Verilog. It is not the aim of this approach to
provide high-level synthesis of hardware description languages such as in Catapult-C
[27] or Spec-C [10], although such approaches can be integrated by wrapping the
generated core as an IP core for the Xilinx tools.
This section will present a case study to illustrate the benefits of the MADES Code
Generation approach and show how CTV/AnvilJ is integrated into the design flow.
This case study will detail the development of a subsection of an automotive safety
system called the Car Collision Avoidance System (CCAS). The CCAS detects obsta-
cles in front of the vehicle to which it is mounted and, if an imminent collision is
detected, applies the brakes to slow the vehicle. In this case study we focus on a small
part of the detection subsystem and show how the MADES code generation allows
architecture-independent software to be generated to process the radar images with-
out concern for the target platform. Multiple hardware architectures can be modelled
and the software automatically deployed over auto-generated hardware.
Section 6.1 gives a block-level overview of the developed component and Sect. 6.2
discusses how the initial software is developed. The generation of the software and
hardware models is covered in Sects. 6.3 and 6.4. The generation of the target hard-
ware is detailed in Sect. 6.5 and finally Sect. 6.6 discusses deploying the software to
the generated hardware.
The developed subsystem takes images from the radar (or camera) and applies JPEG-
style compression to reduce the size of the image and therefore reduce the demand
on on-chip communications. Once reduced in size, the images are passed on to other
parts of the system for feature extraction and similar algorithms. The block diagram
of the subsystem is given in Fig. 9. The main stages of the subsystem are as follows:
• Read Image: Periodically reads images from the input to the system from a radar
or camera.
• DCT: A Discrete Cosine Transformation moves the representation of the image
from the spatial domain into the frequency domain.
Fig. 9 Block diagram of the implemented subsystem. A monitoring output stage is included to
allow verification of the subsystem during system development
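The DCT and quantization implementations themselves are not listed in the chapter. Purely for illustration, and assuming a textbook formulation rather than the project’s actual open-source code, a naive 2-D DCT-II over an 8x8 block can be written as:

// Naive 2-D DCT-II of an 8x8 block, for illustration only; production code would
// typically use a factored/fast implementation from an existing library.
final class Dct8x8 {
    private static final int N = 8;

    static double[][] forward(double[][] block) {
        double[][] out = new double[N][N];
        for (int u = 0; u < N; u++) {
            for (int v = 0; v < N; v++) {
                double sum = 0.0;
                for (int x = 0; x < N; x++) {
                    for (int y = 0; y < N; y++) {
                        sum += block[x][y]
                             * Math.cos(((2 * x + 1) * u * Math.PI) / (2.0 * N))
                             * Math.cos(((2 * y + 1) * v * Math.PI) / (2.0 * N));
                    }
                }
                double cu = (u == 0) ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
                double cv = (v == 0) ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
                out[u][v] = cu * cv * sum;   // coefficient in the frequency domain
            }
        }
        return out;
    }
}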
Developing the software for this subsystem is very simple when using AnvilJ because
the developer can develop as if the code will execute on a standard desktop Java envi-
ronment. However, the developer must observe the restrictions detailed in Sect. 4.2.2.
Also it is not possible to develop the low-level drivers for the radar/camera input
through AnvilJ directly, so for the purpose of testing and initial development stub
drivers should be used that operate on the development platform. Final hardware
interfacing must be done once deployment is underway as is normal practice.
The main restriction imposed by AnvilJ is that AnvilJ Instances must be static
and only communicate through other AnvilJ Instances. This forces the developer
to consider the structure of their code carefully, as is the case with all embedded
development. The refactoring engine of AnvilJ allows the entire operation of the
subsystem to be detailed using a single Java program, even though the final hardware
platform may involve multiple heterogeneous processing elements. The code was
structured as follows:
• Each stage of the subsystem is implemented as a thread. The thread processes images
in its work queue, and passes completed images to the next thread.
• Each thread is designated as an AnvilJ Thread. This ensures that all communica-
tions in the system go through AnvilJ Instances.
• The output stage is designated an AnvilJ Shared Instance.
• Standard implementations of the DCT and Quantize stages are used from open
source, freely-available code. This is one of the great advantages of AnvilJ: legacy
code can often be integrated easily.
Having created the software, its functionality can be verified immediately simply
by executing the code in the development environment. It is not necessary to use
simulators, cross-compilation or similar. The result of the software operating on a
test image is shown in Fig. 10 and a listing of the Main class can be found in Fig. 11.
Note that the listing is standard Java code; no extra-linguistic features are required.
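Figure 11 itself is not reproduced here; the following is only an indicative sketch of how such a Main class might be structured, using the thread and shared-instance names mentioned in the text (the class names and stub bodies are assumptions):

// Indicative sketch only; the actual listing is shown in Fig. 11.
class ReadThread extends Thread {
    @Override public void run() { /* periodically read radar/camera images (stubbed) */ }
}
class DctThread extends Thread {
    @Override public void run() { /* apply the DCT to queued images (stubbed) */ }
}
class QuantizeThread extends Thread {
    @Override public void run() { /* quantize DCT coefficients (stubbed) */ }
}
class OutputStage {
    synchronized void display(Object image) { /* monitoring output (stubbed) */ }
}

public class Main {
    // AnvilJ Instances are static and communicate only through other AnvilJ Instances.
    static ReadThread readThread = new ReadThread();
    static DctThread dctThread = new DctThread();
    static QuantizeThread quantizeThread = new QuantizeThread();
    static OutputStage outputStage = new OutputStage();   // AnvilJ Shared Instance

    public static void main(String[] args) {
        // Plain Thread.start() calls; the AnvilJ refactoring engine later rewrites
        // those that refer to threads allocated to other processing nodes.
        readThread.start();
        dctThread.start();
        quantizeThread.start();
    }
}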
Fig. 11 Listing of the Main class that initialises the implemented subsystem
A binding key is assigned to each AnvilJ Element in the detailed software model. This links the instance in the detailed software specification diagram
to the source code.
Figure 12 shows the completed detailed software specification diagram. The dia-
gram is very simple as its only purpose is to add AnvilJ Elements to the software
model and link them to the source code with binding keys. Note the use of the
«mades_thread» and «mades_sharedobject» stereotypes.
Fig. 12 The detailed software specification diagram for the case study subsystem
Fig. 13 The detailed hardware specification diagram for the case study target architecture. Not
shown are properties in the classes that describe each hardware element in greater detail
Having modelled the software, this section will now describe how the target hardware
platform is modelled for AnvilJ integration. Recall that according to the AnvilJ system
model from Sect. 4.2.1, it is only necessary for the hardware model to cover a high-
level view of the capabilities of the target platform; in terms of processing nodes,
memories, channels, and application-specific IP cores.
In this case study we will describe two target platforms and show how the same
input software can be automatically deployed without recoding. The first presented
architecture is a dual-processor system with a non-uniform memory architecture,
shown in Fig. 13.
Once the detailed hardware model is complete, the hardware generation flow can
be initiated.
The designer uses the MADES model transformations of Sect. 3.2 to transform
the architecture modelled in Sect. 6.4 into an implementable hardware description.
Fig. 14 Fragment of the MHS generated by transforming the case study architecture of Fig. 13
Fig. 15 An allocation diagram that deploys software from the detailed software specification dia-
gram of Fig. 12 to the detailed hardware specification of Fig. 13
The allocation assigns the Main thread and readThread to CPU1 and all other threads to CPU2. The diagram that performs this
allocation can be seen in Fig. 15.
With the addition of the allocation diagram, the model is now complete, so it
is exported in XMI format for use in the Eclipse IDE. Once imported to Eclipse, an
Epsilon model transformation is used to create an AnvilJ architecture description.
This file is created from the hardware, software, and allocation diagrams and is the
input to the AnvilJ refactoring engine. It tells AnvilJ what the structure of the input
software will be, which elements are AnvilJ Instances, the topology of the target
platform, and how to place the AnvilJ Instances throughout the platform. Figure 16
shows the architecture description for the case study (Fig. 15).
Once an architecture description is created, the AnvilJ refactoring engine can
be invoked at any time to refactor the architecturally-neutral Java application (an
Eclipse project) into a set of architecturally-specific output programs, one for each
processing node of the target platform as described in the hardware diagram. As
the case study architecture has two processing nodes, two output projects will be
created. AnvilJ is fully-integrated into the Eclipse Development Environment. After
refactoring is complete, the output applications can be verified by executing both.
AnvilJ’s default implementation uses TCP sockets for inter-node communications,
with the intent that developers replace this with the actual communications drivers
of the target platform. However, this default allows immediate testing on standard
networks. In this case, the two output projects coordinate as expected. The node with
ReadThread reads example radar images and passes them to the other node now
running in a separate JVM on which quantizeThread and dctThread process
Fig. 16 The AnvilJ architecture description for the case study. Note the binding keys correlate with
those of the software diagram in Fig. 12
them. outputStage displays the processed images. The two output binaries can
be placed on separate networked computers with the same functional behaviour. The
single input program has been automatically converted into a networked program
according to the allocation diagram in the system model.
This code sets up and initialises the Object Manager (OM, AnvilJ’s runtime
support) for the current node. The implementation of the OM is automatically gener-
ated in the anvilj.refactored package and is unique to each processing node
of the final system. For example, the AnvilJ Routing object contains routes to the
other nodes of the system with which this OM will need to communicate. Nodes
that it does not communicate with are not detailed. If the code is updated then more
or fewer routes may be added, but the routing information will always be of minimal size. Routes are
planned offline according to the detailed hardware specification diagram.
Note that two of the calls to Thread.start() have been rewritten by the
refactoring engine to calls into the OM. This is because the threads dctThread
and quantizeThread are allocated to another processing node, so they are started
by calling into the AnvilJ runtime. The runtime sends a ‘start thread’ message to the
processing node that hosts the given thread. The call to start the thread readThread
has not been translated, however, because it is allocated to the current node. If the
allocation diagram is altered and AnvilJ is rerun, the refactored calls will change.
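As a purely schematic before/after illustration of this rewriting (the generated OM API is not named in the chapter, so ObjectManagerStub and startRemoteThread are invented for this sketch):

// Schematic illustration only; the real identifiers live in the generated
// anvilj.refactored package and will differ.
final class ObjectManagerStub {
    // Hypothetical stand-in for the generated OM entry point.
    static void startRemoteThread(String threadId) {
        System.out.println("OM: sending 'start thread' message for " + threadId);
    }
}

final class MainAfterRefactoring {
    static Thread readThread = new Thread();  // allocated to the current node

    public static void main(String[] args) {
        // Originally: dctThread.start(); quantizeThread.start(); readThread.start();
        // dctThread and quantizeThread are hosted on the other processing node, so
        // their start-up is delegated to the OM, which sends a 'start thread'
        // message to the node hosting each thread.
        ObjectManagerStub.startRemoteThread("dctThread");
        ObjectManagerStub.startRemoteThread("quantizeThread");
        // readThread is allocated to the current node, so its call is untouched.
        readThread.start();
    }
}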
Retargeting the case study for a new architecture is simply a case of preparing a
new detailed hardware specification diagram and amending the allocation diagram.
Figure 17 shows a revised target architecture. This is the same as the original case
study architecture (shown in Fig. 13), except that a third processor has been added.
The revised allocation diagram allocates the threads more evenly and can be seen in
Fig. 18.
Once the model has been updated, it is re-exported as XMI and AnvilJ re-run.
As the hardware diagram now contains three processing nodes, this produces three
output projects with the AnvilJ Instances distributed as described by the allocation
diagram. Once again, initial functional verification can be performed by executing the
three output projects and observing that the functional behaviour is again identical.
Fig. 17 Revised hardware specification diagram for the case study target architecture
7 Conclusions
This chapter has presented some of the major problems encountered when devel-
oping complex embedded systems. The hardware architectures of such systems are
characterised by the use of non-standard, application-specific features, such as mul-
are being considered within the T-CREST [42] project which aims to build a time
predictable NoC based multiprocessor architecture, with supporting compiler and
WCET analysis.
References
20. ITRS, International Technology Roadmap for Semiconductors, 2007 edn. (2007), http://www.
itrs.net/
21. F. Jouault, J. Bézivin, M. Barbero, Towards an advanced model-driven engineering toolbox.
Innov. Syst. Softw. Eng. (2009)
22. R. Klefstad, M. Deshpande, C. O’Ryan, A. Corsaro, A.S. Krishna, S. Rao, K. Raman, The
performance of ZEN: a real-time CORBA ORB using real-time Java, in Proceedings of the
Real-time and Embedded Distributed Object Computing Workshop (OMG, Sept 2002)
23. L.M. Rose, R.F. Paige, D.S. Kolovos, Extensible platform for specification of integrated languages for model
management (Epsilon) (2010), http://www.eclipse.org/gmt/epsilon
24. J. Kwon, A. Wellings, S. King, Ravenscar-Java: A high integrity profile for real-time Java. In
Joint ACM Java Grande/ISCOPE Conference (ACM Press, New York, 2002), pp. 131–140
25. P. Marwedel, Embedded System Design (Springer, New York, 2006)
26. T. Mattson, R.V. der Wijngaart, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy, J. Howard,
S. Vangal, N. Borkar, G. Ruhl, S. Dighe, The 48-core SCC processor: the programmer’s view,
in International Conference for High Performance Computing, Networking, Storage and
Analysis (SC) (2010)
27. Mentor Graphics. Catapult-C synthesis (2009), http://www.mentor.com/catapult
28. Modeliosoft. Modelio—The open source modeling environment (2012), http://www.
modeliosoft.org/
29. P. Mohagheghi, V. Dehlen, Where Is the Proof?—A review of experiences from applying MDE
in industry, in Model Driven Architecture – Foundations and Applications, vol. 5095, Lecture
Notes in Computer Science, ed. by I. Schieferdecker, A. Hartman (Springer, Berlin, 2008), pp.
432–443
30. Object Management Group. UML Profile for MARTE: Modeling and Analysis of Real-Time
Embedded Systems (2009), http://www.omgmarte.org/
31. F. Pizlo, L. Ziarek, J. Vitek, Real time Java on resource-constrained platforms with Fiji VM,
in Proceedings of JTRES ’09 (ACM, New York, 2009), pp. 110–119
32. A.L. Pope, The CORBA reference guide: understanding the Common Object Request Broker
Architecture (Addison-Wesley Longman Publishing Co., Inc., Boston, 1998)
33. J. Reineke, D. Grund, C. Berg, R. Wilhelm, Timing predictability of cache replacement policies.
Real-Time Syst. 37, 99–122 (2007). doi:10.1007/s11241-007-9032-3
34. M. Rivas, M. González Harbour, MaRTE OS: An Ada kernel for real-time embedded applica-
tions, in Reliable Software Technologies – Ada-Europe 2001, vol. 2043, ed. by D. Craeynest,
A. Strohmeier (Springer, Berlin, 2001), pp. 305–316
35. L.M. Rose, R.F. Paige, D.S. Kolovos, F.A. Polack, The Epsilon generation language, in
ECMDA-FA ’08: Proceedings of the 4th European Conference on Model Driven Architecture
(Springer, Berlin, 2008), pp. 1–16
36. J.C.H. Roth, Digital systems design using VHDL (Pws Pub. Co., Boston, 1998)
37. D.C. Schmidt, D.L. Levine, S. Mungee, The design of the TAO real-time object request broker.
Comput. Commun. 21(4), 294–324 (1998)
38. Texas Instruments Inc. OMAP5430 mobile applications platform (2011), http://focus.ti.com/
pdfs/wtbu/OMAP5_2011-7-13.pdf
39. The Eclipse Foundation. Eclipse Java development tools (2011), http://www.eclipse.org/jdt/
40. The MADES Consortium. The MADES Project (2011), http://www.mades-project.org/
41. The Motor Industry Software Reliability Association, Guidelines for the Use of the C Language
in Critical Systems (MISRA Ltd., 2004)
42. The T-CREST Consortium. The T-CREST Project (2012), http://www.3sei.com/t-crest/
43. T. Weilkiens, Systems engineering with SysML/UML: Modeling, analysis design (Morgan Kauf-
mann Publishers Inc., San Francisco, 2008)
44. D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao,
J. Brown, A. Agarwal, On-chip interconnection architecture of the tile processor. Micro, IEEE
27, 15–31 (2007)
45. D. Wiklund, D. Liu, SoCBUS: Switched Network on Chip for Hard Real Time Embedded
Systems. In IPDPS ’03, p. 78.1 (2003)
46. Xilinx Corporation. Embedded System Tools Reference Guide—EDK 11.3.1. Xilinx Application
Notes, UG111 (2009)
47. Xilinx Corporation. Platform Studio and the Embedded Development Kit (EDK) (2012), http://
www.xilinx.com/tools/platform.htm
48. Xilinx Corporation. Virtex-5 FPGA Configuration User Guide. Xilinx User Guides, UG191
(2006)
Part II
Design Patterns and Development
Methodology
MADES EU FP7 Project: Model-Driven
Methodology for Real Time Embedded Systems
Abstract The chapter presents the EU-funded FP7 MADES project, which focuses on
real-time embedded systems development. The project proposes a model-driven
methodology based on high abstraction levels to evolve current practices for real-time
embedded systems development in the avionics and surveillance industries. In MADES,
an effective SysML/MARTE language subset along with a set of new tools and tech-
nologies has been developed that supports high-level design specifications, verifi-
cation and automatic code generation, while integrating aspects such as component-
based Intellectual Property (IP) re-use. In this book chapter, we first present the
MADES methodology and related diagrams developed to fulfill our goals, followed
by a description of the underlying tool set developed in the scope of the MADES
project. Afterwards, we illustrate the MADES methodology in the context of a car
collision avoidance system case study to validate our design flow.
1 Introduction
The MADES project provides tools and technologies that support high-level SysML/MARTE system design spec-
ifications, their verification and validation (V&V), and component re-use, followed by
automatic code generation to enable execution platform implementation.
The contribution of presenting the MADES methodology, based on mixed
SysML/MARTE usage, is of particular importance. While a large number of works
deal with embedded systems specifications using only either SysML or MARTE,
we present a combined approach and illustrate the advantages of using these two
profiles. This is significant because, while both profiles provide numerous concepts
and supporting tools, they are difficult for system designers to master. For this
purpose, we present the MADES language, which focuses on an effective subset of
the SysML and MARTE profiles and proposes a specific set of unique diagrams for
expressing different aspects related to a system, such as hardware/software
specifications and their eventual mapping. In this chapter, an overview of the MADES
language and the associated diagrams is presented, which enables rapid design and
incremental composition of system specifications. The resulting models can then be
taken by the underlying MADES tool set for goals such as component re-use,
verification or automatic code generation, which are also briefly detailed in the
chapter.
Afterwards, we illustrate the various concepts present in the MADES language by
means of an effective real-life embedded systems case study: a car collision avoidance
system (CCAS) that integrates the MADES language and illustrates the different
phases of our design methodology and implementation approach. This case study
serves as a reference guide to the actual case studies provided by the MADES end
users: more specifically an onboard radar control unit provided by TXT e-solutions
and a ground based radar processing unit provided by Cassidian (an EADS company).
Hence, the results obtained from the CCAS case study are in turn integrated in the
actual MADES case studies.
In this section, we first provide an overview of the SysML and MARTE profiles
and then describe related works focusing on their usage. While a large body of
research exists that makes use of either SysML or MARTE for high-level modeling of
RTES, due to space limitations it is not possible to give an exhaustive description here,
and we only provide a brief summary of some of the works that make use of SysML-
or MARTE-based high abstraction levels and MDE for RTES design specification
and implementation.
System Modeling Language (SysML) is the first UML standard for systems engi-
neering proposed by the Object Management Group [8] that aims at describing complex
systems. SysML allows describing functional requirements in graphical or
tabular form to aid model traceability, and provides means to express the com-
position of the system by means of blocks and the related behavior by means of
UML-inspired Activities, Interactions, State Machines, etc. This profile also provides
the designer with parametric formalisms which are used to express analytical models
based on equations.
However, while SysML is used in the RTES community, it was not primarily created
for modeling embedded system designs. Non-functional properties such as timing
constraints, latency and throughput that are crucial for the design of RTES are absent
in this profile. This is not the case for the UML MARTE profile.
The MARTE profile extends the possibilities to model the features of software and
hardware parts of a real-time embedded system and their relations. It also offers
added extensions, for example to carry out performance and scheduling analysis,
while taking into consideration the platform services (such as the services offered
by an OS). The profile is structured in two directions: first, the modeling of concepts
of real-time and embedded systems and secondly, the annotation of the models for
supporting analysis of the system properties. These two major parts share common
concepts for expressing non-functional properties (NFPs), timing notions, resource
modeling (such as computing and storage resources), UML-inspired component-based
modeling (concepts such as classes, instances, ports and connectors) and allocation
concepts, among others.
Additionally, MARTE contains certain concepts present in other standards and
frameworks, which helps increase synergy between designers of different
domains. The Architecture Analysis and Design Language (AADL), which has its ori-
gins in the avionics domain, is an SAE standard for the development of real-time
embedded systems. In [9], the authors compared the relationship between AADL and
MARTE concepts. Similarly, Automotive Open System Architecture (AUTOSAR)
[10] is a standardized and open automotive software architecture framework, devel-
oped jointly by different automobile manufacturers, suppliers and tool developers.
With regard to AUTOSAR, MARTE already covers many aspects of timing, such
as specification of over-sampling and under-sampling in end-to-end timing chains
(commonly found in complex control systems). In [11], the SPIRIT consortium’s
IP-XACT UML profile has been proposed, which is a specialization of the current
MARTE profile.
[Fig. 2: Overview of the MADES design flow. Design models expressed in the MADES language (hardware-independent software, hardware architecture and hardware/software mappings) drive Compile-Time Virtualization for embedded software generation, and a hardware description generation flow producing MHS and VHDL descriptions, together with user input and simulation scripts]
Figure 2 gives an overview of the underlying MADES language present in the overall
methodology for the initial model based design specifications. The MADES language
focuses on a subset of SysML and MARTE profiles and proposes a specific set of
diagrams for specifying different aspects related to a system: such as requirements,
hardware/software concepts, etc. Along with these specific diagrams, MADES also
uses classic UML diagrams such as State and Activity diagrams to model internal
behavior of system components, along with Sequence and Interaction Overview
diagrams to model interactions and cooperation between different system elements.
Softeam’s Modelio UML Editor and MDE Workbench [31] enables full support of
MADES diagrams and associated language, as explained later on in Sect. 4.1. We
now provide a brief description of the MADES language and its related diagrams.
In the initial specification phase, a designer needs to carry out system design at
high abstraction levels. This design phase consists of the following steps:
Afterwards, the designer can move on to the hardware/software partition-
ing of the refined functional specifications. The following steps are elaborated by
means of MARTE concepts.
Related to the MARTE modeling, an allocation between functional and refined
functional level specifications is carried out using a MADES Allocation Diagram.
Afterwards, a Co-Design approach is used to model the hardware and software
aspects of the system. The modeling is combined with MARTE Non-Functional
Properties and Timed Modeling package to express aspects such as throughput, tem-
poral constraints, etc. We now describe the hardware and software modeling, which
are as follows:
• Hardware Specification: The MADES Hardware Specification Diagram in com-
bination with concepts defined in MARTE’s Generic Resource Modeling package
enables modeling of abstract hardware concepts such as computing, communica-
tion and storage resources. This phase enables a designer to describe the physical
system in a generic manner, without going into too much detail regarding the
implementation aspects. By making use of MARTE GRM concepts, a designer
can describe a system such as a car, a transport system, flight management system,
among others.
• Detailed Hardware Specification: Using the Detailed Hardware Specification Dia-
gram with MARTE’s Hardware Resource Modeling package allows extension,
refinement or enrichment of concepts modeled at the hardware specification level.
It also permits modelling of systems such as FPGA-based Systems-on-Chip (SoCs),
ASICs, etc. A one-to-one correspondence usually follows here: for example, a
computing resource typed as MARTE ComputingResource is converted into
a hardware processor, such as a PowerPC or MicroBlaze [32], effectively stereo-
typed as MARTE HwProcessor. Afterwards, an Allocation Diagram is then
utilized to map the modeled hardware concepts to detailed hardware ones.
• Software Specification: The MADES Software Specification Diagram along with
MARTE’s Generic Resource Modeling package permits modeling of software
aspects of an execution platform such as schedulers and tasks; as well as their
attributes and policies (e.g. priorities, possibility of preemption).
• Detailed Software Specification: The MADES Detailed Software Specification
Diagram and the related MARTE Software Resource Modeling package are used to express
detailed aspects of the software such as an underlying Operating System (OS),
threads, address space, etc. Once this model is completed, an Allocation Dia-
gram is used to map the modeled software concepts to detailed software ones:
for example, allocation of tasks onto OS processes and threads. This level can
express standardized or designer based RTOS APIs. Thus multi-tasking libraries
and multi-tasking framework APIs can be described here.
• Clock Specification: The MADES Clock Specification Diagram (not shown in
Fig. 2) is used to express timing and clock constraints/aspects. It can be used to
specify the physical/logical clocks present in the system and the related constraints.
This diagram makes use of MARTE’s Time Modeling concepts such as clock types
and related constraints. Here, designers model all the timing and clock constraint
aspects that could be used in all the other different phases.
Several allocations can be carried out iteratively in our design methodology: once
the initial abstract hardware/software models are completed, an initial software-to-
hardware allocation may associate schedulers and schedulable resources with the
related computing resources in the execution platform, in order to reduce the Design
Space Exploration (DSE) effort.
Subsequently this initial allocation can be concretized by further mapping of the
detailed software and hardware models (an allocation of OS to a hardware memory,
for example), to fulfill designer requirements and underlying tools analysis results.
An allocation can also specify if the execution of a software resource onto a hardware
module is carried out in a sequential or parallel manner. Interestingly, each MADES
diagram only contains commands related to that particular design phase, thus avoid-
ing ambiguity in the utilization of the various concepts present in both SysML and
MARTE, while helping designers to focus on their respective expertise. Additionally,
UML behavioral diagrams in combination with MADES concepts (such as those
related to verification) can be used for describing detailed behavior of system com-
ponents or the system itself.
Finally, the MADES language also contains additional concepts used for the
underlying model transformations for code generation and verification purposes,
which are not present in either SysML or MARTE, and are detailed in [29]. Once the
modeling aspects are completed, verification and code generation can be carried out.
These aspects are out of the scope of this chapter, and we refer the reader to [29, 33]
for complete details.
We now describe the MADES tool set that enables moving from high-level
SysML/MARTE modeling to verification, code generation and eventual implemen-
tation on execution platforms.
In the frame of the MADES project, Softeam [34] has developed a dedicated exten-
sion to its Modelio UML Editor and MDE Workbench. Modelio fully supports the
MADES methodology and underlying language while providing various additional
features such as automatic document generation and code generation for various
platforms. Modelio is highly extensible and can be used as a platform for building
new MDE features. The tool allows building UML2 profiles, combined with a rich
graphical interface for dedicated diagrams, model element property editors and
action command controls. Users have access to several extension mechanisms:
lightweight Python scripts or a Java API. Finally, Modelio is available in both open source
and commercial versions, and nearly all the MADES diagrams are present in the
open source version (all except the SysML-inspired requirements specifications),
allowing RTES modeling using the MADES methodology to be carried out.
As seen in Fig. 3, Modelio has developed unique MADES diagrams, as specified
earlier in Sect. 3.1, along with a set of unique commands for each specific diagram
and related design phase. Thus, when a designer is working on a particular phase
that suits his/her expertise, such as detailed hardware specification, he/she
will be able to create concepts such as processors, RAM/ROM memories, caches,
bridges, buses, etc. The advantage of this approach is that designers do not have to
understand the various concepts present in SysML and MARTE and do not need to
guess which UML concept (such as classes, instances, ports) is applicable to
which particular design phase, or which particular stereotype should be applied to
that concept. The commands are also given simple names so that someone
not highly familiar with the SysML/MARTE standards can guess their functionality. For
example, in the figure, the Processor command present in the Class model
section of the command palette signifies the creation of a processor class.
This command automatically creates a dually stereotyped class
with two stereotypes: the MARTE HwProcessor stereotype and the MADES stereo-
type mades_processingnode. The second stereotype is used by the underlying
model transformations for verification and code generation purposes, in a completely
transparent manner for the end user. It should be observed that these additional con-
cepts, which are present in neither SysML nor MARTE, do not need to be mastered
by a designer carrying out the model-based specifications. A designer just needs
to determine which concepts from the palette are needed for modeling the
platform, and the underlying MADES concepts are added to them automatically,
thanks to a mapping between SysML/MARTE concepts and those needed
by the model transformations. This mapping has been defined in [29] and is only
needed for the detailed software/hardware specification design phases. Finally, over-
all design time can be reduced and productivity increased thanks to the specific
diagram set and related commands developed in Modelio, available in both the open
source and commercial versions.
The Compile-Time Virtualization (AnvilJ) transformation generates a runtime layer
to implement the modeled system, and translates the user-provided software to make
use of this layer. If the hardware or allocations are changed in the model, then the
generated runtime layer is automatically reduced or expanded accordingly.
Additionally, the MADES transformations allow the generation of implementable
hardware descriptions of the target architecture from the input system modeled via
Modelio. The hardware-related model transformations generate hardware descriptions
for input to standard commercial FPGA synthesis tools, such as the Xilinx ISE and
EDK tools. Presently, the model transformations are capable of generating a
Microprocessor Hardware Specification (MHS), which can be taken by the Xilinx
tools to generate the hardware equivalent to that modeled using the MADES language.
The model transformations also enable verification of functional/non-functional
properties, as results from Zot are fed back into Modelio in order to give the user
feedback on the properties and locate errors, if any are found. The code generation
facilities present in the model transformations are used to integrate the back-end of
the verification tool, which is Zot, with the front-end, which is the set of models
expressed using the MADES language. Traceability support is also integrated into
the model transformations for tracing the results of the verification activity back to
the models, for tracing the generated code back to its source models and, finally, for
tracing requirements to model elements such as use cases or operations, as well as
to implementation files and test cases.
These model transformations thus assist with mapping the programmer’s code to
complex hardware architectures, describing these architectures for implementation
(possibly as an ASIC or on an FPGA) and verifying the correctness of the final system.
While MADES does not support automatic hardware/software partitioning of a
system, it enables designers to carry out automatic hardware/software generation from
their specified models and enables software refactoring. Detailed descriptions of
these model transformations, along with their installation and usage guidelines,
are provided in [29, 33].
The car collision avoidance system (CCAS) case study describes a system that, when
installed in a vehicle, detects and prevents collisions with incoming objects such as cars and
pedestrians. The CCAS contains two types of detection modules. The first is a
radar detection module that emits continuous waves. If a transmitted wave collides
with an incoming object, it is reflected and received by the radar itself. The radar
sends this data to an obstacle detection module (ODM), which removes the
noise from the incoming signal and performs other tasks such as a correlation algorithm.
The distance of the incoming object is then calculated and sent to the controller for
appropriate actions.
The image processing module is the second detection module installed in the
CCAS. It determines the distance of the car from an object by means of
image tracking. The camera takes pictures of incoming objects and sends the data to
the image processing module, which executes a distance algorithm. If the results of
the computation indicate that the object is closer to the car than a specified threshold,
a collision can occur. The result of this computation is then sent to the
controller. The controller, upon receiving the data, acts according to the situation
at hand. In case of an imminent collision, it can carry out emergency actions,
such as stopping the engine or applying emergency brakes; otherwise, if the collision
is not imminent, it can decrease the speed of the car and apply normal brakes.
The CCAS system development is described in detail subsequently. It should be
mentioned that various modeled components present in the case study are also stored
in the MADES CRP to serve as hardware/software product catalogues. For exam-
ple, a component showcasing a radar functionality can be re-used in another modeled
application dealing with an on board or ground based radar system. Similarly, a Dis-
crete Cosine Transformation or DCT4 algorithm inside the image tracking subsystem
can have several implementations such as 1-D or 2-D based, which can be stored in
the CRP with different version names. Depending upon end user requirements and
Quality of Service criteria (performance, power consumption etc.), a designer can
swap one implementation with the other, facilitating IP re-use.
The CCAS design specifications start with SysML based modeling, which
involves the initial design decisions such as system requirements, behavioral analy-
sis and functionality description, before moving onto MARTE based design phases
(Fig. 6).
Fig. 6 The CCAS installed on a car to avoid collisions with incoming objects
4 http://en.wikipedia.org/wiki/Discrete_cosine_transform
Once the requirement phase is partially completed, the next step is to describe the
initial behavioral specifications associated with the system. For the particular case of
CCAS, use cases are used to define the different scenarios associated with a car on
which the CCAS is installed, as shown in Fig. 8. The creation of a MADES use case
specification package guided the user by automatically creating a top level Use Case
Diagram using built-in features in Modelio. The Avoid Collisions scenario
makes use of other specified scenarios and is the one that is related to the system
requirements described earlier.
Once the requirements and use case scenarios of our system are specified, we move
on to the functional block description of the CCAS system, as described in Fig. 9.
For this, MADES Functional Block Specification or Internal Functional Block
Specification Diagram(s) are used. This conception phase enables a designer to
describe the system functionality without going into the details of how the functional-
ity is to be eventually implemented. Here the functional specification is described
using SysML block definition diagram concepts. These functional blocks rep-
resent well-encapsulated components with thin interfaces that reflect an ideal-
ized modular system architecture. The functional description can be specified by
means of UML concepts such as aggregation, inheritance, composition, etc. Equally,
hierarchical composition of functional blocks can be specified by means of internal
blocks, ports and connectors. Here we use these concepts to describe the global
composition of the CCAS. The Car block is composed of several system blocks,
such as an Ignition System, Charging System, Starting System,
Fig. 10 Mapping the Avoid Collisions use case to the Car Collision Avoidance
Module block
the first type is used [29]. For the sake of simplicity, in the chapter, a refinement
allocation (colored in orange) illustrates its related tagged values while these aspects
are omitted for the Co-Design allocation (colored in red).
Having completed the previous steps, it is now possible to complete the requirement
specifications, as described in Fig. 11. As seen here, a related use case scenario and a
functional block have been added to the figure, which helps to complete and satisfy
the functional requirements. It should be noted that as seen in the figure, the Car
Collision Avoidance Module block is utilized to satisfy the global system
requirements, so it is this module that is the focus of the subsequent design phases.
Once the initial design descriptions have been specified, it is possible to partition
and enrich the high-level functionalities. For this, MARTE concepts are used to
determine which parts of the system are implemented in software or hardware along
with their eventual allocation. Additionally, MARTE profile enables the expression
of non-functional properties (NFP) related to a system, such as throughput, worst
case execution times, etc. The subsequent MARTE based design phases are described
in the following.
We now turn towards the MARTE-based modeling of the CCAS. All necessary con-
cepts present at the Functional Level Specification Diagram correspond to an equiv-
alent (or refined) concept at the Refined Functional Level Specification Diagram.
Since we are only interested in the Car Collision Avoidance Module
at the functional level specification, an equivalent MARTE component is created.
The RH_Car Collision Avoidance Module is stereotyped as a MARTE
RtUnit, which determines the active nature of the component. Figure 12 shows the
related modeling of this concept. The RtUnit modeling element is the basic build-
ing block that permits handling concurrency in RTES applications [7]. It should be
mentioned that component structure and hierarchy should be preserved between the
functional and refined functional level specification diagrams. As no hierarchical
compositions are present at the functional level specifications for the Car Collision
Avoidance Module in this particular example, they are equally absent from the
underlying refined functional level specifications.
Using the MARTE allocation mechanism, we express that the allocation is structural (the structural aspects are
thus related from source to target) and spatial in nature.
Once the initial specification has been carried out, modeling of hardware and soft-
ware aspects of the required functionality is possible in a parallel manner. For that
purpose, we first create a “clock catalogue” (itself stored in the MADES CRP) using
MARTE time concepts (which can be used to describe different timing aspects such
as physical/logical or discrete/dense clocks etc.), as illustrated in Fig. 14, depicting
the available clock elements (such as clock types) that are to be used by the execution
platform of the CCAS. Here, an initial ideal clock type serves as the basis for the
Main and System clock types. In this case study, all the clocks types are discrete in
nature using the MARTE Time package, and their clock frequencies can be specified
using the related tagged values (not visible in the figure).
Thus, three clock types are specified: an IdealClock (with a clock frequency
of 50 MHz) that serves as base for the two other clock types, SystemClock and
HardwareClock (with respective frequencies of 100 and 150 MHz). All modeled
At the hardware specification level, the abstract hardware concepts of the execution
platform are modeled first, as shown in Fig. 15. The abstract hardware modeling con-
tains the controller for the radar module along with its local memory; the image process-
ing module and a shared memory; a system clock; and other additional hardware
resources (radar and camera modules, braking system, etc.), all of which communi-
cate via a CAN bus.
The MARTE GRM package stereotypes are applied onto the different hard-
ware resources: for example ComputingResource for the controller and the
image processing module, StorageResource for the local and shared memo-
ries, CommunicationMedia for the CAN bus, while DeviceResource is used
for the other hardware components. Here, the hardware specification also contains a
clock sysclk of the type SystemClock specified earlier in Fig. 14. Here using
the MARTE Time package, we add a clock constraint onto the clock, specifying that
this clock (and related clock type) runs at a rate 10 times faster than that of the ideal
clock (and the ideal clock type).
The hardware specification contains different hardware components, which them-
selves are either further composed of subcomponents or have internal behaviors
expressed by means of classic UML behavioral diagrams. We now describe the inter-
nal behavior of three hardware components.
In Fig. 16, we describe the internal behavior of the Radar component by means
of a state machine diagram. The RadarBehavior state machine is stereotyped as
TimedProcessing (not shown in the figure). This permits binding the processing
of this behavior to time by means of a clock. Here the Radar remains in a single
receivingData state and continues to send data to the controller at each tick of
the SystemClock, every 100 ms.
In Fig. 17, the internal behaviour of the controller is specified. The controller
has three states: noAction, warning and criticalwarning. The con-
troller initially remains in the noAction state when the distance from incoming objects
is greater than 3 m.
However, if the distance decreases to less than 3 m, the controller switches to
the warning state. If it remains in that state for 300 ms and the distance is still
less than 3 m but greater than 2 m, then a brake interrupt is carried out and the controller
sends the normal brake command to the Braking System. Similarly, if the distance
decreases to less than 2 m, the controller enters the criticalwarning
state. If it stays in that state for 300 ms and the distance is still less than 2 m, the controller
sends an emergency brake command to the Braking System.
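Purely to make the described behaviour concrete (this is illustrative Java, not MADES-generated code; the names, units and polling structure are assumptions of this sketch), the controller logic could be encoded as:

// Illustrative encoding of the controller behaviour of Fig. 17.
enum ControllerState { NO_ACTION, WARNING, CRITICAL_WARNING }

final class CollisionController {
    private ControllerState state = ControllerState.NO_ACTION;
    private long stateEntryTimeMs;

    /** Called for every distance reading (the radar sends data every 100 ms). */
    void onDistance(double distanceMetres, long nowMs) {
        if (distanceMetres > 3.0) {
            enter(ControllerState.NO_ACTION, nowMs);
        } else if (distanceMetres > 2.0) {
            if (state != ControllerState.WARNING) {
                enter(ControllerState.WARNING, nowMs);
            } else if (nowMs - stateEntryTimeMs >= 300) {
                sendBrakeCommand("normal");        // brake interrupt after 300 ms in warning
            }
        } else {
            if (state != ControllerState.CRITICAL_WARNING) {
                enter(ControllerState.CRITICAL_WARNING, nowMs);
            } else if (nowMs - stateEntryTimeMs >= 300) {
                sendBrakeCommand("emergency");     // emergency braking after 300 ms in criticalwarning
            }
        }
    }

    private void enter(ControllerState next, long nowMs) {
        state = next;
        stateEntryTimeMs = nowMs;
    }

    private void sendBrakeCommand(String kind) {
        System.out.println("Braking System <- " + kind + " brake command");
    }
}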
Figure 18 displays the behavior of the Braking System when it receives com-
mands from the controller. It normally remains in an idle state and depending
upon a particular command received, switches to either the normalBraking or
the emergencyBraking state. In a normal condition, the Braking System
applies brakes for 10 ms, while for an emergency condition, emergency brakes are
applied for 100 ms.
We now turn towards modeling of the software specification of the execution plat-
form of the CCAS, as displayed in Fig. 19. Here, schedulable tasks related to the
hardware modules are modeled along with their communications. A scheduler is
also present that manages the overall scheduling based on a fixed priority algorithm.
Once the hardware and software specifications have been carried out, we carry out
Co-Design allocations between the two using the MADES Allocation Diagram. This
is done to map the CCAS software concepts to the hardware ones. Here, in Fig. 21,
the majority of the tasks (such as the Brake Actuator Task and Air Bag Task)
are allocated to the controller by means of a temporal allocation, while the Radar
and ODM tasks are allocated to their respective hardware modules by means of spatial
allocations (these properties are not shown in the figure). Tasks related to the
image processing module, such as the Camera Task, are mapped onto it by means of
a temporal allocation. Finally, all the communications are allocated to the CAN bus.
It should be noted that, while the Allocated stereotype has been applied to the software
and hardware concepts in the same way as for the concepts illustrated in Fig. 13, it is
not displayed here for better visualization.
We now move on to the detailed hardware and software specification design phases
of the CCAS. Here, in the context of this book chapter, we only focus on a particular
aspect of the CCAS, the Obstacle Detection Module and its corresponding
task, which were initially specified in the abstract hardware/software design phases.
We first describe the related enriched detailed hardware/software specifications, and
then carry out the final mapping. In [29], another module of the CCAS, the Image
Processing Module has been depicted with related detailed hardware/software
specifications along with their allocation. Additionally, verification, code generation
and synthesis on a Virtex-5 series FPGA have also been carried out for this
module. These aspects are also included in another chapter of this book, dealing with
MADES code generation aspects.
Once the initial abstract hardware specification has been modeled, the designer
can move on to modeling of the detailed hardware specification which corresponds
more closely to the actual implementation details of the execution platform. These
detailed specifications may correspond to a simple one-to-one mapping to the abstract
hardware specifications such as a ComputingResource being mapped to a
HwProcessor for example, albeit with some additional details such as operating
frequencies of processors, memory and address sizes for hardware memories, etc.
It is also possible to enrich the detailed specifications with additional details (such
as the addition of behavior, internal structure, etc.), as illustrated in Fig. 22, which
showcases the enriched HW Obstacle Detection Module.
Here in the figure, the HW Obstacle Detection Module is itself stereo-
typed as a MARTE HwComputingResource and mades_architecture.
All the components are automatically typed with MARTE and MADES stereotypes,
thanks to the mapping between the stereotype sets in Modelio. For
example, the local bram memory is typed as a MARTE HwMemory and the corre-
sponding mades_memory, while the gps is dually stereotyped as HwDevice
and mades_ipcore. The stereotypes with the ‘mades’ prefix denote concepts
needed by the underlying model transformations, such as the iptype attribute of
the mades_processingnode stereotype, which tells the hardware generation trans-
formation which IP core (and version) to use for the modeled processor, such as
a MicroBlaze or PowerPC processor. While it would have been possible simply to add
these concepts to the MARTE stereotypes as an extension of the profile, the advantage
offered by our approach is that the MARTE profile remains intact, and any underlying
changes in the model transformations can be mapped to MARTE concepts,
transparently for the end user. For example, the MARTE concepts do not include the
possibility of defining the type of a processor, such as a softcore or hardcore processor.
These aspects can be added to the mades_processingnode stereotype and to the model
transformations accordingly, without changing the original MARTE specifications.
Hence this approach also enables portability, as designers from RTES industry and
academia who are familiar with MARTE will be able to comprehend the specifications
without needing to interpret another domain-specific language (DSL).
In parallel, a designer can model the detailed software specification, as seen in Fig. 23,
which basically corresponds to an enriched version of the ODM Task defined in
Sect. 5.9. Here, the refined SW Obstacle Detection Task contains several
threads along with their operations. The nfilter thread removes any noise from
the incoming signal, while the gps thread calculates the position and velocity of the
car containing the CCAS. The results are then sent to a corr thread that carries out
a correlation and detects whether there are any obstacles in the trajectory of the car with
respect to its relative position. This data is then sent to an output thread, which in
turn sends it to the controller of the CCAS.
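As a hypothetical illustration of this thread structure (the queue-based wiring, data types and stub bodies below are assumptions, not part of the MADES models or generated code):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the SW Obstacle Detection Task thread structure.
final class ObstacleDetectionTaskSketch {
    record Sample(double[] signal) {}
    record Position(double x, double y, double vx, double vy) {}

    static final BlockingQueue<Sample> filtered = new ArrayBlockingQueue<>(16);
    static final BlockingQueue<Position> positions = new ArrayBlockingQueue<>(16);
    static final BlockingQueue<String> detections = new ArrayBlockingQueue<>(16);

    public static void main(String[] args) {
        // nfilter: removes noise from the incoming radar signal.
        Thread nfilter = new Thread(() -> { /* read raw samples, denoise, put into 'filtered' */ });
        // gps: computes the position and velocity of the car carrying the CCAS.
        Thread gps = new Thread(() -> { /* put Position readings into 'positions' */ });
        // corr: correlates filtered samples with the car's position to detect obstacles.
        Thread corr = new Thread(() -> { /* take from 'filtered' and 'positions', put into 'detections' */ });
        // output: forwards detection results to the CCAS controller.
        Thread output = new Thread(() -> { /* take from 'detections' and send to the controller */ });

        nfilter.start(); gps.start(); corr.start(); output.start();
    }
}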
Once the detailed hardware specifications have been modeled, it is possible to carry
out a refinement allocation that links the hardware to the detailed hardware speci-
fications. In particular, it enables moving from abstract hardware specifications to
detailed ones corresponding closely to an RTL (Register Transfer Level) implemen-
tation. In the specific case of CCAS and the obstacle detection aspects, we carry out
a refinement allocation from the Obstacle Detection Module (and related
instances) to the HW Obstacle Detection Module (and its instances), as
shown in Fig. 24.
In a similar manner, the software specifications are refined and mapped onto the
detailed software specifications, as shown in Fig. 25. Here, the ODM Task is refined
to its detailed version, the SW Obstacle Detection Module via a refinement
allocation.
Finally, once all the detailed specifications related to the software and hardware
aspects of the obstacle detection subsystem have been modeled, it is possible to
carry out a final allocation from the detailed software to the detailed hardware spec-
ifications. Here, as seen in Fig. 26, the different threads are spatially allocated onto
the single Processor present inside the Hw Obstacle Detection Module.
Once this final design phase is completed, it is possible to carry out the subsequent
phases of the MADES methodology, such as code generation and implementation in
execution platforms. However, these steps are not covered in this particular
context; they are within the scope of the chapter dealing with MADES model transformations
and code generation.
The MADES methodology design phases can help and guide designers to
follow a flexible and generic work flow for their eventual case studies, and provide
semantics for the usage of the UML, SysML and MARTE standards.
In particular, the inclusion of unique MADES diagrams for each MADES design phase,
comprising either SysML or MARTE concepts (depending on the design phase)
and a unique command set, decreased the overall design time and the learning
curve, as compared to usage in expert mode. Here, expert mode refers to annotating
UML concepts (such as classes, instances, ports, etc.) with the target profile concepts,
as found in traditional modeling practices in typical open source or commer-
cial UML CASE tools [37, 38]. Using Modelio or another modeling tool in expert mode
for complex profiles like SysML and MARTE was found to be a very cumbersome
task, as the user has to first create UML concepts, such as classes, instances and
ports, and then annotate them with the profiles accordingly. Additionally, direct
utilization of the profiles involved a lot of guesswork in cases where the designer
was not familiar with the profiles, resulting in significant design errors due
to the annotation of UML modeling concepts with incompatible profile concepts.
Therefore, usage of MARTE and SysML via the MADES diagrams was found to be
much easier and more intuitive. In cases where the same MARTE concept could be
applied to different UML elements (classes, instances, connectors, ports, etc.), the
diagrams were able to guide the system designers. A concrete example is the
HwProcessor stereotype, present in the MARTE profile, which corresponds
to a processor at the detailed hardware modeling design phase. In MARTE and
normal UML CASE tools and editors, this stereotype can be correctly annotated on
different UML modeling elements, such as classes and instances, but also incorrectly
on ports, which does not make sense from a hardware designer’s point of view.
In MADES, designers are guided to avoid this mistake because the HwProcessor
stereotype is mapped only to UML classes and instances, and this command is available
in the detailed hardware specification diagram, as seen in Fig. 3, appropriately
named ‘Processor’. In this way, designers do not have to be concerned
with the MARTE or UML concepts; they can simply select the hardware concepts
available as commands in Modelio and then carry out modeling according
to their design specifications.
The MADES CRP satisfied the needs of the MADES end users and task evaluators. The evaluators were able to manually store as much information as desired for keeping track of the various developed versions of the components as IPs in the CRP. For each component, the evaluators stored its name, description, version, keywords and tags, developer name, the tool used for development, and so on.
The integration of the CRP with a modeling environment such as Modelio is also a significant contribution. The final integration between Modelio and the CRP is currently in progress; it will enable designers to develop their component-based IPs, which can in turn be stored automatically in the CRP, enabling IP reuse when designers need to create systems requiring these components.
7 Conclusions
Acknowledgments The research presented in this paper was funded by the European Community's
Seventh Framework Program (FP7/2007-2013) under grant agreement No. 248864 (MADES). The
authors would like to thank all of the MADES partners for their valuable inputs and comments.
References
1 Test-Driven Development
As embedded systems become more complex, the importance of their software component rises. Furthermore, because embedded software is deployed definitively once it is released, delivering faulty software is unaffordable. Thorough testing is essential to minimize software bugs. The design of embedded software is strongly dependent on the underlying hardware, and co-design of hardware and software is essential to a successful embedded system design. However, at design time the hardware might not always be available, so software testing is often considered to be impossible. Therefore testing is mostly postponed until after hardware development and is typically limited to debugging or ad-hoc testing. Moreover, as it is the last phase in the process, it tends to be shortened when the deadline is nearing. Integrating tests from the start of the development process is essential for meticulous testing of the code. In fact, these tests can drive the development of the software, hence Test-Driven Development.
It is crucial that embedded systems are tested very thoroughly, since the cost of repair grows exponentially once the system is taken into production, as shown in Fig. 1, which depicts Boehm's law [1]. However, the embedded system can only be tested once the development process is finished. In a waterfall-like strategy for developing embedded systems, the testing phase is generally executed manually. This ad-hoc testing is mostly heuristic and focuses on only one specific scenario. When trying to start testing embedded software as early as possible, a number of problems arise. One problem is hardware being unavailable early in the development process, and another is the difficulty of testing embedded systems automatically.
to it [3]. When the quality of code meets an acceptable level, the cycle starts over
again, as visually represented in Fig. 2.
TDD reverses the conventional consecutive order of steps, as tests should be
written before the code itself is written. Starting with a failing test gives an indication
that the scope of the test encompasses new and unimplemented behavior. Moreover,
if no production code is written without an accompanying test, one can assure that
most of the code will be covered by tests.
Also fundamental to the concept is that every step is supported by executing a
suite of automated unit tests. These tests are executed to detect regression faults
either due to adding functionality or refactoring code.
Fundamental to the concept of TDD is that refactoring and adding new behavior are strictly separated activities. When refactoring, tests should remain passing; should a failure occur, it must be solved quickly or the changes must be reverted. On the other hand, when adding new functionality, the focus should stay on the current issue, and refactorings should only be conducted when all tests are passing. Refactoring can and should be applied to the tests themselves as well. In that case the implementation stays the same, which provides assurance that the test does not change its own scope.
The properties of a good unit test can be described by the F.I.R.S.T. acronym,
which was coined by Martin [4].
1. Fast: the execution time of the tests should be limited. If it takes too long, a suite
of many tests will limit development speed.
2. Independent: the setup and result of a test should be independent of other tests.
Dependencies between tests complicate execution order and lead to failing tests
when changing or removing tests.
3. Repeatable: the execution of a test should be repeatable and deterministic. False
positives and negatives lead to wrong assumptions.
4. Self-validating: the result of a test should lead to a failing or passing assertion.
This may sound obvious, nevertheless should the test lead to something else, like
a log file, it cannot be verified automatically.
5. Timely: this refers to the TDD way of writing tests as soon as possible.
Finally, TDD is credited with putting the focus on three fundamental issues. First, focus is placed on the current issue, which ensures that a programmer can concentrate on one thing at a time. Next, TDD puts the focus on the interface and external behavior of software, rather than its implementation. By testing their own software, programmers are forced to think about how software functionality will be offered to the external world. In this respect, TDD is complementary to Design by Contract, where a software module is approached by a test case instead of formal assertions. Lastly, TDD moves the focus from debugging code to testing. When an unexpected issue arises, a programmer might revert to an old state, write a new test capturing an assumption, and see whether it holds. This is a more effective way of working than relying on a debugger.
TDD has a number of imperfections, which mainly concern the overhead introduced by testing, the thoroughness of testing, and the difficulties that arise when automating hard-to-test code.
Overhead
Writing tests covering all development code doubles the amount of code that needs to be written. Moreover, TDD is specifically effective for testing library code, i.e. code which is not directly involved with the outside world, for instance the user interface, databases, or hardware. When developing code related to the outside world, however, one has to fall back on software mocks. This introduces additional overhead, as well as assumptions about how the outside world will react. Therefore it becomes vital to do some manual tests, which verify these assumptions.
Test coverage
Unit tests will only cover as much as the programmer deemed necessary. Corner cases tend to be untested, as they mostly cover redundant paths through the code. In fact, writing tests which cover the same path with different values is prohibited by the rule that a test should fail first. This rule is stated with good reason, as redundant tests tend to lead to multiple failing tests when a regression is introduced, hence obfuscating the bug. Testing corner case values should be done separately from the activity of programming according to TDD. An extra test suite, which is not part of the development cycle, allows corner case values to be dealt with at minimal effort. Should one of these tests detect a bug, the test can easily be migrated to the TDD test suite to fix the problem and detect regression.
On the other hand, programmers become responsible for adhering strictly to the rules of TDD and implementing only the minimum of code necessary to get a passing test. Especially in conditional code, one could easily introduce extra untested cases. For instance, an if clause should only lead to an else clause if a test demands it. Furthermore, a consecutive number of conditional clauses tends to increase the number of execution paths without demanding extra test cases. Similar to the corner case values, an extra test suite can deal with this problem. However, TDD also encourages avoiding this kind of code by demanding isolation. This will typically lead to a large number of small units, for instance classes, rather than one big complicated unit.
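As a small, hedged illustration of this rule (the function name and the MAX_LEVEL constant are made up for this example): the conditional branch below only exists because a dedicated test fed an out-of-range value; without such a test, only the default path would have been written.

static const int MAX_LEVEL = 100;   // hypothetical limit, for illustration only

int clampLevel(int level) {
    if (level > MAX_LEVEL) {
        return MAX_LEVEL;   // branch introduced only after a test used an out-of-range value
    }
    return level;           // default path, covered by the first test
}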
Next, a critical remark has to be made on the effectiveness of the tests written in TDD. First, they are written by the same person who writes the code under test. This situation can lead to narrowly focused tests, which only expose problems known to the programmer. In effect, having a large suite of unit tests does not take away the need for integration and system tests. On the other hand, code coverage is not guaranteed. It is the responsibility of the programmer to diverge from the happy path and also test corner cases. Additionally, tests for TDD specifically focus on black box unit testing, because such tests tend to be less brittle than tests which also exercise the internals of a module. However, for functional code coverage, glass box tests are also necessary.
Finally, a unit test should never replicate the code that it is testing. Replication of code in a test leads to a worthless test, as bugs introduced in the actual code will be duplicated in the test code. In fact, the code complexity of tests should always be less than that of the code under test. Moreover, the tests that are written need to be maintained just like production code. Furthermore, setting up a test environment might require additional effort, especially when multiple platforms are targeted.
The evaluation of the TDD strategy has been the subject of multiple research projects. George and Williams [5] conducted research on the effects of TDD on development time and test coverage. Siniaalto [6] provides an overview of the experiments regarding TDD and productivity. Nagappan [7] describes the effects of TDD in four industrial case studies. Muller and Padberg [8] claim that the lifecycle benefit introduced by TDD outweighs its required investment. Note that research on the alleged benefits of TDD for embedded software is limited to several experience reports, such as those by Schooenderwoert [9, 10] and Greene [11].
• CppUTest [18] is one of the latest C++ testing frameworks; it also has a complementary mocking framework, called CppUMock.
• GoogleTest [19] is the most full-blown C++ unit test framework to date. It provides integration with its mocking framework, GoogleMock. However, it is not specifically targeted at embedded systems and is not easily ported.
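To give an impression of what such a test looks like, the following minimal sketch follows the usual CppUTest conventions (the group and test names are made up; consult the framework documentation for the exact macros and runner):

#include "CppUTest/TestHarness.h"
#include "CppUTest/CommandLineTestRunner.h"

TEST_GROUP(Calculator)
{
    // setup() and teardown() could be defined here to prepare each test
};

TEST(Calculator, AddsTwoNumbers)
{
    CHECK_EQUAL(5, 2 + 3);   // self-validating: the test either passes or fails
}

int main(int argc, char **argv)
{
    return CommandLineTestRunner::RunAllTests(argc, argv);   // run all tests and report
}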
In the interface based design, as shown in Fig. 4, the effective hardware and the mock are addressed through a unified abstract class, which forms the interface of the hardware driver. Calls are directed to the interface; thus both the mock and the effective hardware driver provide an implementation. The interface should encompass all methods of the hardware driver to ensure compatibility. Optionally, the mock could extend the interface for test modularity purposes. This enables customizing the mock on a test-per-test basis, reducing duplication in the test suite.
It should be noted that the interface could provide a partial implementation for the hardware independent methods. However, this would indicate that hardware dependencies are mixed with hardware independent logic. In this situation a refactoring is in order to isolate the hardware dependent code.
Inheriting from the same interface guarantees compatibility between the mock and the real hardware driver, as any inconsistency will be detected at compile time. Regarding future changes, extending the real driver should be reflected appropriately in the interface.
The main concern with this approach is the introduction of late binding, which inevitably slows down the system in production. However, it should be noted that such an indirection is acceptable in most cases.
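A minimal sketch of this structure (class and method names are hypothetical, not taken from the chapter's case study) could look as follows; the code under test depends only on the abstract interface, while the real and the mock driver each provide an implementation.

// Abstract interface of the hardware driver
class TemperatureDriver {
public:
    virtual ~TemperatureDriver() {}
    virtual int readRaw() = 0;                 // hardware-dependent operation
};

// Real driver: accesses the actual peripheral, used in the target build
class OnChipTemperatureDriver : public TemperatureDriver {
public:
    int readRaw() { return 0; /* read the sensor registers here */ }
};

// Mock driver: returns a value scripted by the test, used in the host build
class MockTemperatureDriver : public TemperatureDriver {
public:
    explicit MockTemperatureDriver(int cannedValue) : value(cannedValue) {}
    int readRaw() { return value; }
private:
    int value;
};

The virtual call through the interface is exactly the late-binding cost mentioned above.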
Fig. 4 UML class diagram of interface based mock replacement in different environments
at least given protected member access and are declared as virtual. These conditions allow overriding hardware related methods with a mock implementation on host. However, these issues can be worked around with some macro preprocessing. First, all private members can be given public access solely for testing purposes. Also, the virtual keyword can be removed in the target build; a hedged sketch of such macros is shown below.
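The following sketch illustrates how such a workaround is commonly set up (the TESTONHOST flag and the VIRTUAL macro are assumptions for illustration only; redefining the private keyword is a well-known but technically non-portable trick that should stay confined to test builds):

#ifdef TESTONHOST
#define private public      /* expose members to the test (sub)classes on host */
#define VIRTUAL virtual     /* keep late binding so mocks can override methods */
#else
#define VIRTUAL             /* no virtual dispatch overhead in the target build */
#endif

class MotorDriver {
public:
    VIRTUAL void setSpeed(int rpm) { currentRpm = rpm; }  /* overridable on host only */
private:
    int currentRpm;          /* visible to tests on host, private on target */
};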
Inheritance-based mock introduction is more about managing the testability of code than about actual testable design. That being said, all testability overhead can easily be removed from production code. However, ensuring that the macro definitions do not wreak havoc outside the file is fundamental to this approach. Nonetheless, this approach is also preferable when dealing with legacy code. Considering the amount of
First is link-time based mock replacement [22], also known as link-time polymor-
phism or link-time substitution. The idea is to provide a single interface of functions
in a header file and use the linking script or IDE to indicate which implementation
file corresponds to it, i.e. the file containing the actual implementation or a similar
file containing the mock implementation. Correspondingly the host build will refer
to the mock files and the target build to the real implementation files, as visually
represented in Fig. 6.
Practically the linker script (or IDE) will refer to three different subfolders. First
is the common folder, which contains all platform independent logic as well as
header files containing the hardware dependent function declarations. Next is the host
folder, which will include the mocks and finally the target folder with the correspond-
ing real implementations. Should a hardware implementation or mock file be miss-
ing, the linker will return an error message as a reminder. A practical example of the
Fig. 6 Host and target build refer to the mock and real implementation file respectively
link-time based configuration is not given here, considering the multitude of build systems and IDEs; a generic sketch of the file layout is shown below.
While link-time based mock replacement involves delivering the desired source code files to the linker, the macro preprocessed alternative uses preprocessor directives, which manipulate the source code itself. For instance,
#ifdef TESTONHOST
#include "mockdriver.h"
#else
#include "realdriver.h"
#endif
provides an almost identical effect to its link-time based alternative. Moreover, macro replacement allows intervening inside a source file itself. First, the function call to be mocked is replaced by a new (mock) function definition. Also, the testing framework used to implement the tests related to the code under test can be injected into the file itself.
#ifdef TESTONHOST
/* the macro renames the mock below, so that on host it is compiled
   under the name of the real, hardware-dependent function */
#define functionMock(arg1, arg2) function(arg1, arg2)
void functionMock(int arg1, int arg2) {}
#endif

#ifdef TESTONHOST
#include "unittestframework.h"
int main() {
    /* run unit tests & report */
}
#endif
Although macros are commonly regarded negatively, the macros shown in the previous two listings are generally safe and will not lead to bugs which are hard to find. However, the macro statements pollute the source code, which leads to less readable and thus less maintainable code. The main advantage of macro preprocessed mock replacement lies in dealing with legacy code. Capturing the behavior of legacy code in tests should be done with the least possible refactoring, because legacy code lacks the tests that would provide feedback on the safety of the refactoring operations. Using macros effectively allows the production code to be left unchanged while the necessary tests are set up. Conversely, when developing new applications, link-time based mock replacement is preferred, as it has no consequences for the production code.
Ideally, Test-Driven Development is used to develop code which does not have any external dependencies. This kind of code suits TDD well, as it can be developed fast, in isolation, and does not require a complicated setup. However, when dealing with embedded software, the embedded environment complicates development. Four typical constraints influence embedded software development and have their effect on TDD. To deal with these issues, four strategies have been defined [24–26], which each tackle one or more of these constraints. Each of these strategies leads to a specific setup and influences the software development process. However, none of these strategies is the ideal solution, and typically a choice needs to be made depending on the platform and the type of application.
Development speed
TDD is a fast cycle in which software is incrementally developed. This results in frequent compiling and running of tests. However, when the target for test execution is not the same as the host for developing software, a delay is introduced into development: for instance, the time to flash the embedded memory and transmit test data back to the host machine. Considering that a cycle of TDD minimally consists of two test runs, this delay becomes a bottleneck in development according to TDD. A considerable delay will result in running the test suite less frequently, which in turn results in taking larger steps in development. This will introduce more
failures, leading to more delays, which in turn will reduce the number of test runs, and so on.
Memory footprint
Executing TDD on a target platform burdens the program memory of the embedded
system. Tests and the testing framework are added to the program code residing in
target memory. This results in at least doubling the memory footprint needed.
Cross-compilation issues
In respect of the development speed and memory footprint issues, developing and
testing on a host system solves the previously described problems. However, the
target platform will differ from the host system, either in processor architecture or
build tool chain. These issues could lead to incompatibilities between the host and
target build. Comparable to other bugs, detection of incompatible software has a less
significant impact should it be detected early on. In fact, building portable software
is a merit on its own as software migration between target platforms improves code
reuse.
Hardware dependencies
External dependencies, like hardware interaction, complicate the automation of tests.
First, they need to be controlled to ensure deterministic execution of the tests. Fur-
thermore hardware might not be available during software development. Regardless
of the reason, in order to successfully program according to TDD, tests need to run
frequently. This implies that executing tests should not depend on the target plat-
form. Finally, in order to effectively use an external dependency in a test, setup and
teardown will get considerably more complicated.
In the Test on target strategy, the TDD issues raised by the target platform are not dealt with. Nevertheless, Test on target is a fundamental strategy as a means of verification. First, executing tests on target delivers feedback as part of an on-host development strategy. Moreover, during the development of system, integration, or real-time tests, the effort of mocking specific hardware aspects is too labor intensive. Finally, writing validation tests when adopting TDD in a legacy code based system provides a self-validating, unambiguous way to verify existing behavior.
3.2.1 Implementation
3.3 Process
needed to solve some of these cases, resulting in tests that only test the mock. TDD should only be applied to software that is useful to test. When external software is encountered, minimize, isolate, and consolidate its behavior.
Finally, Test on target has its merit in consolidating behavior in software systems without tests. Changing existing software without tests giving feedback on its behavior is undesirable; after all, this is the main reason to introduce TDD in the first place. However, chances are that legacy software does not have an accompanying test suite. Preceding the refactoring of legacy software with on-target tests capturing the system's fundamental behavior is essential to safely conduct the necessary changes.
In the following example, the focus is on automating tests for low level hardware-related software, according to the Test on target strategy.
TEST(RepeatButtonTest)
{
    Button *button = new Button(&IOPIN0, 7);
    button->setCurrentState(RELEASED);
    CHECK(button->getState() == RELEASED);
    button->setCurrentState(PRESSED);
    CHECK(button->getState() == PRESSED);
    button->setCurrentState(RELEASED);
    button->setCurrentState(PRESSED);
    CHECK(button->getState() == REPEAT);
    delete button;
}
This test1 creates a button object and checks whether its state logic functions correctly, namely that two consecutive high states should result in REPEAT. Now, one way to test a button is to press it repeatedly and see what happens. Yet this requires manual interaction, and it is not feasible to test the button manually every time a code change is made. However, automation of events related to hardware can be achieved with software. In this case an additional method is added to the button class, setCurrentState, which allows the button to be pressed and released in software.
Two remarks are generally put forward when adding methods for testing purposes. On the one hand, these methods will litter production code. This can easily be solved by inheriting from the original class and adding these methods in a test subclass (a sketch follows below). On the other hand, when a hardware event is mocked by some software, that software might contain bugs of its own. Furthermore, there is no guarantee that the mock software is a good representation of the hardware event it is replacing. Finally, is it the actual code under test or rather the mock code that is tested this way?
These remarks indicate that manual testing is never ruled out entirely. In the case
of automating tests and adding software for this purpose, a general rule of thumb is
to test it both manually and automatically. If both tests have identical results, the mock software can be considered as good as the original hardware behavior. The added value of the automated test will pay off when refactoring the code under test or extending its behavior.
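The test subclass mentioned above could look roughly as follows; this is a sketch under the assumptions that Button's constructor takes the port address and pin number (as in the test listing) and that it keeps its pin state in a protected member, here called pinState. All of these names are hypothetical.

// Test-only subclass: the production Button class stays free of test helpers.
class TestableButton : public Button {
public:
    TestableButton(volatile unsigned int *port, int pin) : Button(port, pin) {}
    void setCurrentState(int state) { pinState = state; }   // drive the pin state in software
};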
Ideally, program code and tests reside in memory of the programmer’s development
computer. This situation guarantees the fastest feedback cycle in addition to indepen-
dence of target availability. Furthermore, developing in isolation of target hardware
improves modularity between application code and drivers. Finally, as the host sys-
tem has virtually unlimited resources, a state of the art unit testing framework can
be used.
In the Test on host strategy, development starts with tests and program code on
the host system. However, calls to the effective hardware are irrelevant on the host
system. Hence a piece of code replaces the hardware related functions, mocking
the expected behavior. This is called a mock, i.e. a fake implementation is provided
for testing purposes. A mock represents the developer’s assumptions on hardware
behavior. Once the developed code is migrated to the effective hardware system,
these assumptions can be verified.
3.5.1 Implementation
The Test on host strategy typically consists of two build configurations, as shown in
Fig. 9. Regardless of the level of abstraction of hardware, the underlying components
can be mocked in the host build. This enables the developer to run tests on the
host system, regardless of any dependency on the target platform. However, cross-platform issues might arise, and these are impossible to detect when no reference build or deployment model is available. Ideally, building for the real target platform will identify these issues. Still, running the cross-compiler or deploying to a development board can already identify some issues before the actual target is available.
3.5.2 Process
In order to deal with the slow cycle of uploading embedded program code, executing tests on target, and reporting, Test on host is presented as the main solution. This rests on the assumption that any algorithm can be developed and tested in isolation on the host platform. Isolation from hardware-related behavior is critical, so that calls can be delegated dynamically to either the real or the mock implementation.
Considering the differences between host and target platform, verification of cor-
respondence between target and host implementation is essential. These differences
are:
• Cross-compilation issues, which occur as compilers can generate different
machine code from the same source code. Also functions called from libraries
for each platform might lead to different results, as there is no guarantee regarding
correspondence of both libraries.
• Assumptions on hardware-related behavior. Since the hardware reference is not
available on host, the mocks are representing assumptions made on hardware
behavior. This requires having an in-depth knowledge of hardware specifications.
Furthermore, as the hardware platform for embedded systems can evolve, these
specifications are not as solid or verified as is the case with a host system.
• Execution issues concerning the different platforms. These concern the difference
in data representation, i.e. word-size, overflow, memory model, speed, memory
access times, clocking differences, etc. These issues can only be uncovered when
the tests are executed on the target platform.
Test on host is the primary step in the embedded TDD cycle [23, 27–29], as shown in Fig. 8. This cycle employs the technique of "dual targeting", which is a combination of Test on host and Test on target. In effect, development in this process is an activity executed entirely according to Test on host, as a reasonable development speed can be achieved. However, in order to compensate for the intrinsic deficiencies of Test on host, Test on target techniques are applied. Specifically, time-intensive activities are executed less frequently, which allows balancing development time against verification activities. The embedded TDD cycle prescribes regularly compiling with the target compiler and subsequently solving any cross-compilation issues. Next, automated tests can be ported to the target environment and executed, and any problems that arise can be solved. Yet, as this is a time-intensive activity, it should be executed less frequently. Finally, some manual tests, which are the most labor-intensive, should only be carried out every couple of days.
/* register to mock */
unsigned int *IODIRmock;

TemperatureSensor *tempSensor =
    new TemperatureSensor(IOaddresses, pinNumber);
tempSensor->reset();
3.6.1 Implementation
in
another address space, without the manual intervention of the programmer. When
this is applied to TDD for embedded software, remoting allows for tests on host to
call the code under test, which is located on the target environment. Subsequently,
the results of the subroutine on target are returned to the Test on host for evaluation.
Regardless of the specific technology, a broker is required which will set up the necessary infrastructure to support remoting. In homogeneous systems, such as networked computing, the broker on either side is the same. However, because of the specific nature of embedded systems, a fundamental platform difference between the target and the host broker exists.
On the one hand the broker on target has a threefold function. First, it maintains
communication between host and target platform on the target side. Next, it contains
a list of available subroutines which are remotely addressable. Finally, it keeps a
list of references to memory chunks, which were remotely created or are remotely
accessible. These chunks are also called skeletons.
On the other hand, the broker on host serves a similar but slightly different function. For one thing, it maintains the communication with the target. Also, it tracks the stubs on host, which are interfaces corresponding to the skeletons in the target environment. These stubs provide an addressable interface for tests, as if the effective subroutine were available in the host system. Rather than executing the called function's implementation, a stub merely redirects the call to the target and delivers a return value as if the function had been called locally.
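As a rough structural sketch (all names, such as HostBroker and TemperatureSensorStub, are hypothetical and not tied to any particular remoting technology), a stub might look as follows; a real broker would additionally handle serialization and the communication link.

// Assumed broker interface on host (declarations only, for illustration)
struct RemoteValue { int v; int asInt() const { return v; } };
struct HostBroker {
    int createRemote(const char *className);          // ask the target to build a skeleton
    RemoteValue invoke(int id, const char *method);   // marshal a call and wait for the reply
};

// Stub on host: same interface as the driver on target, but every call
// is redirected to the broker instead of being executed locally.
class TemperatureSensorStub {
public:
    explicit TemperatureSensorStub(HostBroker &b)
        : broker(b), skeletonId(b.createRemote("TemperatureSensor")) {}
    void reset() { broker.invoke(skeletonId, "reset"); }
    int getTemperature() { return broker.invoke(skeletonId, "getTemperature").asInt(); }
private:
    HostBroker &broker;
    int skeletonId;   // reference to the skeleton, assigned by the broker on target
};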
As the testing framework exists solely in the host environment, there is practically no limitation on it. Even the programming language on host can differ completely from the target's programming language. In the spirit of CxxTest2, it should be noted that C++ as a language for devising tests might require more boilerplate code than strictly necessary. Writing tests in another language is a convenience which can be exploited with Remote testing.
Unfortunately, the use of remoting technology introduces an overhead into soft-
ware development. Setting up broker infrastructure and ensuring subroutines are
remotely accessible require a couple of additional actions. On target, a subroutine
must be extended to support a remote invocation mechanism called marshaling. This
mechanism will allow the broker to invoke the subroutine when a call from host “mar-
shals” such an action. Correspondingly on host, an interface must be written which
is effectively identical to the interface on target. Invoking the subroutine on host will
marshal the request, thus triggering the subroutine on target, barring communication
issues between host and target.
Some remoting technologies, for instance CORBA, incorporate the use of an
Interface Description Language (IDL). An IDL allows defining an interface in a lan-
guage neutral manner to bridge the gap between otherwise incompatible platforms.
On its own the IDL does not provide added value to remoting. However the spec-
ifications describing the interfaces are typically used to automatically generate the
correct serialization format. Such a format is used between brokers to manage data
and calls. As serialization issues concern the low level mechanics of remoting, an
IDL provides a high level format, which relieves some burden of the programmer.
3.6.2 Process
The Remote testing development cycle changes the conventional TDD cycle in the
first step. When creating a test, the interface of the called subroutine under test must
be remotely defined. This results in the creation of a stub on host which makes the
defined interface available on the host platform, while the corresponding skeleton
on target must also be created. Subsequent steps are straightforward, following the
traditional TDD cycle.
1. Create a test
2. Define an interface on host
(a) Call the subroutine with test values
(b) Assert the outcome
(c) Make it compile
(d) If the subroutine is newly created: add a corresponding skeleton on target
(e) Run the test, which should result in a failing test
3. Red bar
(a) Add an implementation to the target code
(b) Flash to the target
(c) Run the test, which should result in a passing test
4. Green bar: either refactor or add a new test.
Remote testing only provides a means to eliminate one code upload. In order to deal with the rather low return on investment inherent to Remote testing, an adaptation of the process is made, which results in a new process called Remote prototyping. In principle, Remote prototyping involves developing code on the host platform, while specific hardware calls are delegated to the respective code on target [33]. The remoting infrastructure takes care of addressing the hardware-related subroutines on host, delegating the calls to the target, and returning the values provided by the subroutine prototype.
Software can be developed on host, as illustrated in Fig. 11, as all hardware
functionality is provided in the form of subroutine stubs. These stubs deliver the
subroutine definition on the host system while an effective call to the subroutine stub
will delegate the call to the target implementation.
Remote prototyping is viable under the assumption that software under develop-
ment is evolving, but once the software has been thoroughly tested, a stable state is
reached. As soon as this is the case the code base can be instrumented to be remotely
addressable. Subsequently, it is programmed into the target system and thoroughly
tested again to detect cross-compilation issues. Once these issues have been solved,
the new code on target can be remotely addressed with the aim of continuing devel-
opment on the host system.
An overview of the Remote prototyping process applied to an object oriented
implementation, for instance C++, is given in Fig. 12. A fundamental difference exists between configurations in which all objects can be statically allocated and those in which dynamic creation of objects in memory is required.
In a configuration in which the target environment can be statically created, setup
of the target system can be executed at compile time. The broker system is not
involved in constructing the required objects, yet keeps a reference to the statically
created objects. Effectively the host system does not need to configure the target
system and treats it as a black box. Conversely the process of Remote prototyping with
dynamic allocation requires additional configuration. Therefore the target system is
approached as a glass box system. This incurs an additional overhead for managing
the on target components, yet allows dynamically reconfiguring the target system
without wasting a program upload cycle.
The dynamic Remote prototyping strategy starts with initializing the broker on both the target and the host side. Next, a test is executed, which initializes the environment. This involves setting up the desired initial state in the target environment, in anticipation of the calls which the software under development will make. For instance, to create an object on the target, the following steps are performed, as illustrated in Fig. 12.
1. The test will call the stub constructor, which provides the same interface as the
actual class.
2. The stub delegates the call to the broker on host.
3. The broker on host translates the constructor call in a platform independent com-
mand and transmits it to the target broker.
4. The broker on target interprets the command and calls the constructor of the
respective skeleton and in the meanwhile assigns an ID to the skeleton reference.
5. This ID is transmitted in an acknowledge message to the broker on host, which
assigns the ID to the stub object.
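Tied to the hypothetical TemperatureSensorStub sketched earlier, a test using this mechanism might look as follows (hostBroker is an assumed, already initialized broker instance); every call below travels through the brokers to the target and back.

TEST(RemoteTemperatureSensorTest)
{
    // steps 1-5 above: constructing the stub creates the matching skeleton on target
    TemperatureSensorStub sensor(hostBroker);
    sensor.reset();                              // delegated to the real driver on target
    CHECK(sensor.getTemperature() > -274);       // reply value marshaled back to the host
}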
After test setup, the test is effectively executed. Any calls to the hardware are dealt with by the stub object and delegated to the effective code on target. Likewise, any return values are delivered to the stub. Optionally, another test run can be done without rebooting the target system. A cleanup phase is in order after each test has executed; otherwise the embedded system would eventually run out of memory. Deleting objects on target is as transparent as on host, with the addition that the stub must be cleaned up as well.
Remote prototyping deals with certain constraints inherent to embedded systems.
However, some issues can be encountered when implementing and using the Remot-
ing infrastructure.
Embedded constraints
The impact, especially considering constrained memory footprint and processing
power, of the remoting infrastructure on the embedded system is minimal. Of
course it introduces some overhead to systems which do not need to incorporate the
infrastructure for application needs. On the other hand Remote prototyping enables
conducting unit tests with a real target reference. Porting a unit test framework and
running the tests in target memory as an alternative will introduce a larger over-
head than the remoting infrastructure and lead to unacceptable delays in an iterative
development process.
Next, the embedded infrastructure does not always provide all conventional com-
munication peripherals, for instance Ethernet, which could limit Remote prototyp-
ing applicability. However, if an IDL is used, the effective communication layer is
abstracted. Moreover, the minimal specifications needed to setup Remote prototyping
are limited as throughput is small and no timing constraints need to be met.
Finally, Remote prototyping requires that hardware and a minimalistic hardware
interfacing is available. This could be an issue when hardware still needs to be
developed. Furthermore hardware could be unavailable or deploying code still under
development might be potentially dangerous. Lastly, a minimalistic software inter-
face wrapping hardware interaction and implementing the remoting infrastructure is
needed to enable remote prototyping. This implies that it is impossible to develop
all firmware according to this principle.
Issues
The issues encountered when implementing and using Remote prototyping can be classified into three types. First are cross-platform issues related to the heterogeneous architecture. A second concern arises when dynamic memory allocation on the target side is considered. Thirdly, the translation of function calls into common, architecture-independent commands introduces additional issues.
Differences between host and target platform can lead to erratic behavior, such
as unexpected overflows or data misrepresentation. However, most test cases will
quickly detect any data misrepresentation issues. Likewise, over- and underflow
problems can be discovered by introducing some boundary condition tests.
Next, on-target memory management is an additional consideration which is a
side-effect of Remote prototyping. Considering the limited memory available on
target and the single instantiation of most driver components, dynamic memory
allocation is not desired in embedded software. Yet, Remote prototyping requires
dynamic memory allocation to allow flexible usage of the target system. This intro-
duces the responsibility to manage memory, namely creation, deletion and avoiding
fragmentation. By all means this only affects the development process and unit ver-
ification of the system, as in production this flexibility is no longer required.
Finally, timing information between target and host is lost because of the asynchro-
nous communication system, which can be troublesome when dealing with a real-
time application. Furthermore, to unburden the communication channel, exchanging simple data types is preferred over serializing complex data.
Tests
The purpose of Remote prototyping is to introduce a fast feedback cycle in the devel-
opment of embedded software. Introducing tests can identify execution differences
between the host and target platform. In order to do so the code under test needs to
be ported from the host system to the target system. By instrumenting code under
test, the Remote prototyping infrastructure can be reused to execute the tests on host,
while delegating the effective calls to the code on target.
3.6.4 Overview
Test on target, Test on host, Remote testing, and Remote prototyping have been defined as strategies for developing embedded software in a TDD fashion. These strategies have advantages and disadvantages when compared with one another [34]. Furthermore, because of these trade-offs, each strategy excels in a particular embedded environment. In this section a comparison is made between the strategies, and an overview is given of how development in a project can be composed of combinations of them.
The baseline of this comparison is Test on target, for the particular reason that, when the number of code uploads to target is the only consideration, Test on target is the worst strategy to choose. This can be demonstrated by considering the classical TDD cycle, as in Fig. 13.
When TDD is strictly applied in Test on target, every step will require a code upload to target. Considering the iterative nature of TDD, each step will be run through frequently. Moreover, since a target upload is a very slow act, this will grind the development cycle to a halt. Thus, reducing the number of uploads is critical in order to successfully apply TDD for embedded software.
Remote Testing
When considering the effect of Remote testing on TDD for embedded software, the
following observation can be made. At a minimum with Test on target, each test will
require two uploads, i.e. one to prove the test is effectively failing and a second one,
which contains the implementation to make the test pass. Note that this is under the
assumption that the correct code is immediately implemented and no refactorings are needed. If it takes multiple tries to find the correct implementation, or when refactoring, the number of uploads rises.
In order to decrease the number of required uploads, tests can be implemented
in the host environment, i.e. Remote testing. Effectively this reduces the number of
uploads by the number of tests per subroutine minus one. One is subtracted because
a new subroutine will require flashing the empty skeleton to the target. Therefore
the benefit of Remote testing as a way to apply TDD to embedded software is lim-
ited, as demonstrated in Table 1. The ideal case is when a test passes after the first
implementation is tried.
Consider that tests have a relatively low complexity when compared to production code. This observation implies that a test is less likely to change than the effective code under test, which indicates a reduction of the benefits of Remote testing. The probability of changing code to reach a green bar, namely during the implementation or refactoring phase, is higher than that of (re)defining the tests. Effectively this will reduce the ratio of tests versus code under test requiring an update of code on target.
When only the number of uploads is taken into account, Remote testing will never
be harmful to the development process. Yet considering the higher complexity of
production code and refactoring, which mostly involves changing code under test,
the benefit of Remote testing diminishes rapidly. When other costs are taken into
account, this strategy is suboptimal when compared to Test on host. However, as a pure testing strategy, Remote testing might have its merit, though its application in this context was not explored further.
Remote Prototyping
As the effectiveness of Remote testing is limited, an improvement to the process
is made when code is also developed on host, i.e. Remote prototyping. Remote prototyping only requires a limited number of remotely addressable subroutines to start with. Furthermore, once code under development is stable, its public subroutines can be ported and made remotely addressable in turn. This is typically when an attempt can
be made to integrate newly developed code into the target system. At that moment
these subroutines can be addressed by new code on host, which is of course developed
according to the Remote prototyping principle.
Where Remote prototyping is concerned, it is possible to imagine a situation which is in fact in complete accordance with Remote testing. Namely, when a new remote subroutine is added on target, this conforms to the idea of executing a Test
on host, while code under test resides on target. However code which is developed
on host will reduce the number of uploads, which would normally be expected in
a typical Test on target fashion. Namely each action which would otherwise have
provoked an additional upload will add to the obtained benefit of Remote prototyping.
Yet the question remains how the Remote testing part of the process relates to the added benefit of Remote prototyping. A simple model to express their relation uses the relative sizes of the related source code: on the one hand, the number of Lines Of Code (LOC) related to the remoting infrastructure plus the LOC of subroutines which were not developed according to Remote prototyping but rather with Remote testing; on the other hand, the LOC of code and tests developed on host. These assumptions ultimately lead to Table 2.
In Table 2 the following symbols are used:
T        number of tests
C        number of code uploads
R        number of new remote subroutines
CD       number of uploads related to Remote testing
CH       number of averted uploads on host
LOCD     LOC related to Remote testing (remoting infrastructure + traditionally developed code)
LOCH     LOC developed according to Remote prototyping

α = LOCD / LOCTOTAL,   β = LOCH / LOCTOTAL
Test on Host
Finally, a comparison can be made with the Test on host strategy. When only uploads to target are considered, Test on host provides the best theoretical maximum performance, as it requires only one upload to the target, i.e. the final one. Of course, this is not a realistic practice and definitely contradicts the incremental aspect of TDD. Typically, a verification upload to target is a recurrent, irregular task, executed at the discretion of the programmer. Furthermore, Test on host and the remoting strategies have another fundamental difference: while setting up remoting infrastructure is only necessary when a certain subroutine needs to be remotely addressable, Test on host requires a mock. Although there are mocking frameworks which reduce the burden of manually writing mocks, at least some manual adaptation is still required. When the effort to develop and maintain mocks is ignored, a mathematical expression similar to the previous ones can be composed, as shown in Table 3. However, it should be noted that this expression does not consider the most important metric for Test on host and is therefore less relevant.3
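As a purely illustrative numeric example of the general case in Table 3 (the figures are invented, not taken from the case studies): with T = 10 tests, C = 15 code uploads and U = 2 verification uploads, the benefit amounts to (1 − U/(T + C)) × 100 = (1 − 2/25) × 100 = 92 %; in other words, 23 of the 25 uploads that a strict Test on target approach would have required are avoided.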
In Comparison
In the previous sections, the only metric which was considered was the number of
code uploads. Although this is an important metric to determine which strategy is
more effective, there are also other metrics to consider.
Table 3 Test on host benefit when only target uploads are considered

               # Tests (T)   # Code uploads (C)   # Verification uploads (U)   Test on host (%)
               1             1                    1                            50
Min.           1             C                    U                            Min = 0.0...1
Max.           T             C                    1                            Max = 99.99...
General case   T             C                    U                            (1 − U/(T + C)) × 100

The first metric is the limited resources of the target system, namely memory footprint and processing power. On the one hand, when considering the Test on target case, tests and a testing framework will add to the required memory footprint of the program.
While on the other hand, the processing power of the target system is also limited,
so a great number of tests on target will slow down the execution of the test suite.
Another metric to consider is the hardware dependencies, namely how much effort it requires to write tests (and mocks) for hardware-related code. Finally, there is the development overhead required to enable each strategy: for Test on target this is the porting of the testing framework, Test on host requires the development and maintenance of hardware mocks, and Remote prototyping requires the remoting infrastructure.
Table 4 provides a qualitative overview of the three strategies compared to each
other when these four metrics are considered.
The overview in Table 4 does not specify the embedded system properties, as the
range of embedded systems is too extensive to include this information into a decision
matrix. For instance, applying Remote prototyping does not have any overhead at all,
when remoting infrastructure is already available. Likewise when an application is
developed on embedded Linux, one can develop the application on a PC Linux system
with only minimal mocking needed, making Test on host the ideal choice. Moreover
in this overview no consideration is given to legacy code, yet the incorporation of
legacy code will prohibit the use of the Test on host strategy.
When deciding which strategy is preferable, no definite answer can be given. In
general, Test on target is less preferred than Test on host and Remote prototyping,
while Remote prototyping is strictly better than Remote testing. Yet beyond these
statements all comparisons are case-specific. For instance when comparing Test on
host versus Remote prototyping, it is impossible to make a sound decision without
considering the embedded target system and the availability of drivers, application
software, etc.
The following sections deal with two structural patterns related to Test-Driven Development for embedded software. First is 3-tier TDD, which deals with the different levels of complexity in developing embedded software. Then follows the MCH pattern [35–37], an alternative with the same objective.
In dealing with TDD for embedded software, three levels of difficulty to develop according to TDD are distinguished. Each of these levels implies its own specific problems with each TDD4ES strategy. A general distinction is made between hardware-specific, hardware-aware, and hardware independent code. When developing these types of code, it is noted that the difficulty of applying TDD increases, as shown in Fig. 14.
Hardware independent code
Fine examples of hardware independent software are code to compose and traverse data structures, state machine logic, etc. Regardless of the level of abstraction, these parts of code can in principle be reused on any platform. Typically, hardware independent code is the easiest to develop with TDD because, barring some cross-compilation issues, it can be tested and developed on host (either with Test on host or with Remote prototyping). Typical issues that arise when porting hardware independent code to target are:
• Type-size related issues, like rounding errors or unexpected overflows. Although most of these problems are typically dealt with by redefining all types to a common size across the platforms (see the sketch after this list), it should still be noted that unit tests on host will not detect any remaining anomalies. This is a reason to also run the tests for hardware independent code on target.
• Issues associated with the execution environment. For instance, execution on target might miss deadlines or behave strangely after an incorrect context switch. The target environment is not likely to have the same operating system, provided it has an OS at all, as the host environment.
• Differences in compiler optimizations. Compilers might have a different effect on the same code, especially when optimizations are considered. Problems with the volatile keyword also fall into this category. Running the compiler on host with both low and high optimization levels might catch some additional errors.
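The sketch below illustrates the type-size remedy mentioned in the first bullet above; it is a minimal example under the assumption that both toolchains offer <cstdint> and a C++11-capable compiler, and the typedef name is made up.

#include <cstdint>

// Pin down the width instead of relying on a platform-dependent 'int',
// so host-run unit tests exercise the same value range as the target build.
typedef int32_t sensor_raw_t;

static_assert(sizeof(sensor_raw_t) == 4, "unexpected sensor_raw_t size");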
Most importantly, developing hardware independent code according to TDD requires
no additional considerations for each of the strategies. Concerning Test on target,
the remark remains that all development according to this strategy is painstakingly
slow. Furthermore hardware independent code does not impose any limitations on
either Remote prototyping or Test on host, so there should be no reason to develop
according to Test on target.
Hardware-aware code
Next is hardware-aware code, which is a high level abstraction of target-specific
hardware components. A typical example of a hardware-aware driver is a Tempera-
ture sensor module. This does not specify which kind of temperature sensor is used; it might concern either a digital or an analog sensor. Yet it would not be surprising to expect some sort of getTemperature subroutine. Hardware-aware code
will typically offer a high level interface to a hardware component, yet it only presents
an abstraction of the component itself, which allows changing the underlying imple-
mentation. Developing hardware-aware code on host will require a small investment
when compared to a traditional development method, because hardware-aware code
will typically call a low level driver, which is not available on host. However the
benefits of TDD compensate for this investment. The particular investment depends
on the strategy used.
On the one hand, when developing according to Test on host, this investment will be a mock low level driver. The complexity of this mock depends on the expected behavior of the driver. This particular approach has two distinct advantages. First, it
allows intercepting expected non-deterministic behavior of the driver, which would
otherwise complicate the test.
A second advantage is that mocks isolate hardware-aware code for testing purposes. Namely, a consequence of the three-tier architecture is that unit tests for hardware-aware code will typically test from the hardware independent tier. This has two reasons. On the one hand, a unit test typically approaches the unit under test as a black box. On the other hand, implementation details of hardware-aware and hardware-specific code are encapsulated, which means only the public interface is available for testing purposes. To deal with these unit test limitations, breaking encapsulation for testing is not considered an option, because it is not only a harmful practice but also superfluous, as mocks enable testing the
Fig. 16 Remote prototyping with a mock/stub hybrid, which can assert the call order of the software
under test
readings until a failure has been invoked. Therefore it is easier to mock the failure,
which guarantees a failure every time the test is executed.
Hardware-specific code
Finally hardware-specific code is the code which interacts with the target-specific
registers. It is low level driver code, which is dependent on the platform, namely
register size, endianness, addressing the specific ports, etc. It fetches and stores
data from the registers and delivers or receives data in a human-readable type, for
instance string or int. An example of a hardware-specific code are drivers for the
various peripherals embedded in a microcontroller.
Hardware-specific code is the most difficult to develop with TDD, as test automa-
tion of code which is convoluted with hardware is not easily done. When considering
the strategies Test on host and Remote prototyping, each of these has its specific
issues. On the one hand, Test on host relies on mocks to obtain hardware abstrac-
tion. Although it can be accomplished for hardware-specific code, as demonstrated
in listing Sect. 3.5.3, developing strictly according to this strategy can be a very
time absorbing activity. This would lead to a diminishing return of investment and
could downright turn into a loss when compared to traditional development methods.
Furthermore as hardware-specific code is the least portable, setting up tests with spe-
cial directives for either platform could be an answer. However these usually litter
the code and are only a suboptimal solution.
Optimally, the amount of hardware-specific code is reduced to a minimum and
isolated as much as possible to be called by hardware-aware code. The main idea
concerning hardware-specific code development is to develop low-level drivers with
a traditional method and test this code afterwards. For both Test on host and Remote
prototyping this results in a different development cycle.
4.2 MCH-Pattern
An alternative to 3-tier TDD is the MCH-pattern by Karlesky et al., which is shown in Fig. 19. This pattern is a translation of the MVC pattern [38] to embedded software. It consists of a Model, which represents the internal state of the hardware. Next is the Hardware, which represents the drivers. Finally, the Conductor contains the control logic, which gets or sets the state of the Model and sends commands to or receives triggers from the Hardware. As this system is decoupled, it is possible to replace each component with a mock for testing purposes.
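A minimal structural sketch of this decoupling (interface and member names are made up here, not taken from the MCH references): the Conductor depends only on abstract Model and Hardware interfaces, so each side can be replaced by a mock in a unit test.

// Abstract sides of the MCH triad
struct Model {
    virtual void setLevel(int level) = 0;   // internal state of the hardware
    virtual ~Model() {}
};
struct Hardware {
    virtual int readSensor() = 0;           // driver-level access
    virtual ~Hardware() {}
};

// Conductor: the control logic between Model and Hardware
class Conductor {
public:
    Conductor(Model &m, Hardware &h) : model(m), hw(h) {}
    void poll() { model.setLevel(hw.readSensor()); }   // either side can be a mock in tests
private:
    Model &model;
    Hardware &hw;
};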
5 Conclusion
Future directions
1. Hardware mocking
As briefly indicated in Sect. 2, mocks could be partially generated automatically by a mocking framework, which is complementary to a testing framework. No further elaboration is given on the subject, but since hardware mocks are used extensively in the Test on host strategy, part of the work could be lifted from the programmer.
2. Related development strategies
Test-Driven Development is a fundamental practice in Agile or eXtreme Program-
ming methodologies. Yet, similar practices exist based on the same principles of
early testing. For instance, Behavior-Driven Development (BDD) is an iterative
practice where customers can define features in the form of executable scenar-
ios. These scenarios are coupled to the implementation. In turn this can be exe-
cuted indicating whether the desired functionality has been implemented. BDD for
embedded has some very specific issues, since functionality or features in embed-
ded systems are mostly a combination of hardware and software.
1 Introduction
The Electronic Design Automation (EDA) industry has ushered in the era of pervasive
computing where digital devices are indispensable for executing every aspect of
modern civilization. Several EDA tools such as Vista from Mentor Graphics [1],
EDA360 from Cadence [2], Platform Architect from Synopsys [3] and Simulink from
Mathworks [4] are available for carrying out high level as well as low level system
design. The field of EDA has been richly researched and utilized extensively
by both academia and industry. Nevertheless, there are certain critical issues that
specifically relate to the design of multi-objective DMPE systems. These issues
need to be addressed more systematically and in a manner that supports active user
participation in quantifying tradeoffs between conflicting design objectives.
Multi-processor design optimization falls under the category of NP-Complete
problems. Not only does it confront a very large search space but it has also to deal
with several design objectives simultaneously. The system must be designed so as to
achieve the desired real time performance levels, ensure a high degree of availability
and accuracy at its service points and possess the ability to reconfigure in the presence
of faults. All these deliverables must be achieved in a cost effective manner. Another
level of complexity arises from the sheer diversity of implementation platforms that
are available for executing the functional tasks. They include software implemen-
tations on a range of Instruction Set Architecture (ISA) based processors, hardware
implementations on Application Specific Integrated Circuits (ASIC) and bit map
implementations on Field Programmable Gate Arrays (FPGA). These are matched
by the plethora of bus types available for building the underlying communication
fabric.
Literature is rife with examples of traditional mathematical and graph based
approaches for system optimization with focus on reliability. They include Mixed
Integer Linear Programming (MILP) [5], dynamic programming [6] and [7], integer
programming [8] and branch-and-bound [9] techniques. These approaches take a
very long run time to generate optimal architectures for even medium sized appli-
cations. Multi-objective design optimization problems have been tackled practically
by utilizing metaheuristic optimization algorithms. Meta-heuristic techniques do not
guarantee the best solution but are able to deliver a set of near optimal solutions
within a reasonable time frame. They generate an initial set of feasible solutions
and progressively move towards better solutions by modifying and updating the
existing solutions in a guided manner. Several metaheuristics such as Simulated
Annealing (SA) [10, 11], Tabu Search (TS) [12], Ant Colony Optimization (ACO)
[13, 14], Particle Swarm Optimization (PSO) [15, 16] and Genetic Algorithm (GA)
[17–19] have been used for reliable multiprocessor design optimization. However,
these approaches suffer from their own drawbacks. For example, SA incurs very long
run times under a slow, low-temperature cooling schedule. GA is an effective optimization
technique for large-sized problems and can efficiently deal with the problem of
convergence on local sub-optimal solutions, but it requires excessive parameter
tuning. In general, existing approaches face difficulty in fitting into design explo-
ration semantics because of their random and unpredictable behavior. We have used
Cuckoo Search (CS), a new optimization algorithm developed by Yang and Deb that
is inspired by the unique breeding behavior of certain cuckoo species [20], for our
multi-objective DMPE design optimization. Experiments demonstrate that it is able
to deliver a set of high quality optimal solutions within a reasonable period. Besides,
it needs minimal parameter tuning.
An important issue that needs due consideration is—how to harness the end-user’s
proactive participation in the process of system design? Reliability and Availability
2 Design Environment
(b) Bus database: The communication links of the distributed system are imple-
mented using time shared buses. Table 2 shows the technical data for each bus
type used by the DSE tool in our experiments. It includes its data transfer rate,
failure rate, repair rate and its cost.
(c) Execution Times database: The probability distributions of the execution times
for all tasks of the given application on each of the available PEs are stored in a
database of execution times. Table 3 is the database of task execution times for
the tasks of TG12 on the PE types in Table 1. Each entry in this table is a set
of Beta distribution parameters for the corresponding execution time. The lower
and upper limits of the timing distributions are shown in the table. The two shape
parameters of each of these Beta distributions are assumed to be 0.5 each. A *
denotes that it is infeasible to execute the task on the corresponding platform.
Observe that the GPP takes the maximum time to execute a given task. The SPP
takes less time, while the ASICs incur the least execution time.
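As a side note, a Beta-distributed execution time with the shape parameters above can be drawn from two Gamma variates and then scaled to the lower and upper limits of the distribution; the sketch below is only an illustration (the limits used are made up) and is not part of the DSE tool.

#include <iostream>
#include <random>

// Draw Beta(a, b) as X/(X+Y) with X~Gamma(a,1), Y~Gamma(b,1), then scale to [lower, upper].
double sampleExecutionTime(double lower, double upper,
                           double a, double b, std::mt19937& gen) {
    std::gamma_distribution<double> ga(a, 1.0), gb(b, 1.0);
    double x = ga(gen), y = gb(gen);
    double beta = x / (x + y);               // Beta(a, b) sample on [0, 1]
    return lower + beta * (upper - lower);   // scaled execution time
}

int main() {
    std::mt19937 gen(42);
    // Hypothetical limits for one task/PE pair; the real limits come from Table 3.
    for (int i = 0; i < 5; ++i)
        std::cout << sampleExecutionTime(2.0, 5.0, 0.5, 0.5, gen) << '\n';
    return 0;
}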
If (QL(pon) is Low .AND. QL(pon-1) is Medium .OR. ... QL(po1) is Very High .AND. QL(po0) is Remote)
then (IoA(pon) is High, IoA(pon-1) is Very High, ..., IoA(po0) is Remote).
The user ascribes different degrees of relative importance to the services available,
called Importance-of-Availability (IoA). It is to be noted that these IoA values change
under different situations. When the system is fully functional, the user assigns a
perceived relative importance to each one of system’s services. However, when one
or more service(s) degrade in quality, the perceptions can change dramatically. The
relative importance of the service points must then be assigned afresh.
The process of capturing the availability requirements begins when the user visu-
alizes different scenarios depicting the quality level that is available at each primary
output. The user’s availability requirements are expressed in linguistic terms by a set
of fuzzy rules. An example fuzzy rule is given in Fig. 2. It has an antecedent part and a
consequent part. Its antecedent combines the various output quality levels {QL(poi )}
using AND/OR operators. The set of output quality levels denotes a certain condition
called usage context. The consequent part of the fuzzy rule assigns IoA weights to
each of the primary outputs under the given usage context.
The user is free to input as many rules as deemed necessary to express all her
availability requirements. These rules are stored in a database. Table 4 illustrates the
set of fuzzy rules that were input for TG12 in our experiments. The topmost rule is
valid for the fully functional scenario when each service point is available with its
maximum accuracy (QL is Very High). The relative importance of the four outputs
in this condition are such that po0 and po1 are considered less important than po2 and
po3 . Similarly, rules for other usage contexts are stored in the database.
Complementary to the fuzzy framework for IoA requirements elicitation, our co-
synthesis scheme incorporates a fuzzy engine to compose the multiple design objec-
tives during the optimization process. We shall describe its working in the next
section.
Given the above inputs viz. the RT application to be realized in the form of a
CTPG, the platform library comprising the needed technological databases and the
availability requirements described by a set of fuzzy rules, the FCS-DSE system
launches a design space exploration process that evaluates candidate architectural
solutions. The process finally yields a set of cost-optimal architectures.
Table 4 Fuzzy rules for user-input contextual Importance-of-Availability requirements for TG12

Rule no | QL (AND-ed): po0 po1 po2 po3 | IoA: po0 po1 po2 po3
1       | VH  VH  VH  VH               | H   H   VH  VH
2       | H   H   L   L                | H   H   L   L
3       | R   M   H   R                | H   H   H   R
4       | M   M   M   M                | H   H   M   L
5       | L   L   L   L                | H   M   L   R
6       | VH  H   H   VH               | L   H   VH  VH
7       | H   M   L   R                | H   H   M   R
8       | L   L   L   L                | H   M   M   R
9       | M   M   M   H                | H   M   H   H
An architectural solution for the given CTPG and associated databases is encoded
as a vector of integer values that define its architectural features. Figure 3 represents
a feasible solution encoding for TG12 .
The solution encodes the following features:
1. The number of instances of each PE type: {NPE }
2. The computation task (or node) to resource mappings {NV −P }
3. The number of instances of each bus type {NB }
4. The communication task (or edge) to bus mappings {NE−B }.
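As a rough illustration (the type and field names below are ours, not the chapter's), such an encoding can be kept as four integer groups that are concatenated into one flat vector for the optimizer to manipulate:

#include <vector>

// Illustrative encoding of a candidate architecture as a flat integer vector:
// [ instances per PE type | task -> PE mapping | instances per bus type | edge -> bus mapping ]
struct SolutionEncoding {
    std::vector<int> peInstances;    // {N_PE}: one count per PE type
    std::vector<int> taskToPe;       // {N_V-P}: one PE index per computation task
    std::vector<int> busInstances;   // {N_B}: one count per bus type
    std::vector<int> edgeToBus;      // {N_E-B}: one bus index per communication edge

    std::vector<int> flatten() const {          // the vector the metaheuristic mutates
        std::vector<int> v;
        v.insert(v.end(), peInstances.begin(), peInstances.end());
        v.insert(v.end(), taskToPe.begin(), taskToPe.end());
        v.insert(v.end(), busInstances.begin(), busInstances.end());
        v.insert(v.end(), edgeToBus.begin(), edgeToBus.end());
        return v;
    }
};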
The total number of decision variables L is thus the sum of the sizes of these four groups.
The FCS-DSE system uses the Cuckoo Search (CS) algorithm to optimize the dis-
tributed multiprocessor architectures. A set of global, qualitative objectives govern
the optimization path. It embeds a fuzzy logic based fitness evaluator within CS to
assess the fitness of the solutions.
In this section, we will first elucidate the various design objectives. Next we will
elaborate upon the fuzzy model that is employed for composing the multiple design
objectives. Finally we will explain the adaptation of the Fuzzy-CS metaheuristic to
our DMPE design problem.
The overall system Qualitative Availability QAsys is given by the following equation:

QAsys = Σk IoAk × Pk    (4)
The aim of DSE is to maximize QAsys and thereby ensure that the system continues
to serve user perceived critical services even in the presence of faults.
3. Cost_Effectiveness: The cost of realization Ct of a chromosome is the cumulative
cost of each resource deployed for realizing the architecture. The cost factor
includes a feasibility constraint whereby the budget Ctmax should not be exceeded.
In addition there is a cost minimization objective. The Cost-effectiveness of a
solution is defined by the probability that the system is realizable under the
prescribed budget Ctmax . It is given by:
Cost_Effectiveness = ((Ctmax − Ct) / Ctmax) × U(Ctmax − Ct)    (5)
Rule 1:
if (Performance is Very High .OR. Cost_Effectiveness is Very High .OR. Qualitative Availability is Very High)
then (Fitness is Very High)
Rule 2:
if (Performance is High .AND. Cost_Effectiveness is Medium .AND. Qualitative Availability is Very High)
then (Fitness is Very High)
Rule 3:
if (Performance is Medium .OR. Qualitative Availability is High .AND. Cost_Effectiveness is High)
then (Fitness is High)
Fig. 4 Examples of fuzzy rules for evaluating solution fitness by blending multiple objectives
Notably, rules can be framed to depict a set of criteria based on the concept of
non-dominance as exemplified by Rule 1. The antecedent of this rule encapsulates a
condition where at least one of the objectives is Very High. In its consequent, it implies
Very High solution fitness under this condition. In contrast, Rule 2 prescribes a lin-
guistically weighted combination of various objectives. Rule 3 describes a composite
condition by using both AND and OR operators. The user can specify many such
rules in a flexible and discernible manner to guide the evaluation of solution fitness.
Table 5 shows the set of rules that were input to the DSE tool for composing the
design objectives and evaluating the fitness of solutions for TG12 .
Membership Function: In [23], Kasabov defined standard types of membership
functions such as Z function, S function, trapezoidal function, triangular function
and singleton function. These membership functions define the degree to which an
input or output variable belongs to different but overlapping fuzzy sets. We chose the
trapezoidal membership function as it is sufficient to capture the imprecise relation-
ships between various objectives and is computationally simple. Figure 5 illustrates
the trapezoidal membership functions that have been used to fuzzify all input objectives
and the solution fitness.
Fig. 5 Trapezoid membership function for the input and output variables of the fuzzy fitness
evaluator
Working of the fuzzy engines: We adopt the Mamdani model for the fuzzy engine
employed to capture the IoA requirements of the application and also for the FFE
sub-system.
1. Fuzzification: The absolute values of input variables are mapped into the fuzzy
sets. The degree of membership of each input variable in each of the fuzzy sets
is determined by consulting its membership function.
2. Rules Activation: For a given set of crisp values of design objectives, several rules
may be invoked because each crisp value maps to more than one linguistic set. The
overall strength of a rule is determined by the value of the membership functions
of all the design objectives represented in its condition. The min() function for
conjunctive AND operators and the max() function for disjunctive OR operators
between conditions are used to determine the input matching degree of a rule.
3. Inferring the consequent: The clipping method is used to infer the rule’s con-
clusion. For each component in the consequent, its membership above the rule’s
matching degree is cut off and maintained at a constant level.
4. De-fuzzification: De-fuzzification converts the fuzzy fitness values to a crisp over-
all fitness. The Centroid Of Area method is used to generate a final crisp value
for the output variables.
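These four steps can be pictured with a toy Mamdani evaluator. The sketch below is our own simplification (two hard-coded rules, trapezoids with invented break points, one-dimensional discretization for the centroid) and is not the chapter's FFE implementation:

#include <algorithm>
#include <iostream>

// Trapezoidal membership function with break points a <= b <= c <= d.
struct Trapezoid {
    double a, b, c, d;
    double membership(double x) const {
        if (x <= a || x >= d) return 0.0;
        if (x < b)  return (x - a) / (b - a);   // rising edge
        if (x <= c) return 1.0;                 // plateau
        return (d - x) / (d - c);               // falling edge
    }
};

int main() {
    // Fuzzy sets for the objectives and for the fitness output (invented break points).
    Trapezoid high{0.5, 0.7, 1.0, 1.2};
    Trapezoid medium{0.2, 0.4, 0.6, 0.8};

    double performance = 0.65, availability = 0.80;   // crisp objective values

    // 1. Fuzzification and 2. rule activation (min for AND):
    //    "if Performance is High AND Availability is High then Fitness is High"
    double ruleHigh = std::min(high.membership(performance),
                               high.membership(availability));
    //    "if Performance is Medium then Fitness is Medium"
    double ruleMed = medium.membership(performance);

    // 3. Clip each consequent at its rule strength and aggregate with max;
    // 4. defuzzify with the centroid of area over a discretized fitness axis.
    double num = 0.0, den = 0.0;
    for (double y = 0.0; y <= 1.0; y += 0.01) {
        double mu = std::max(std::min(high.membership(y), ruleHigh),
                             std::min(medium.membership(y), ruleMed));
        num += y * mu;
        den += mu;
    }
    std::cout << "crisp fitness = " << (den > 0.0 ? num / den : 0.0) << '\n';
    return 0;
}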
Cuckoo Search (CS) is a metaheuristic search technique proposed by Yang and Deb
[20]. It evolves a population of solutions through successive generations, emulating
the unique breeding pattern of certain species of the cuckoo bird. Further, it uses
a heavy-tailed Levy flight probability distribution to generate its pattern of random
walk through the search space [25]. The Levy flight is simulated with a heavy tailed
Cauchy distribution. Levy flight driven random walk is found to be more efficient
than that obtained by Uniform or Gaussian distributions. In nature, it is used to
advantage by birds and animals for foraging.
Fig. 6 Pseudocode for the fuzzy Cuckoo Search driven design space exploration FCS_DSE function
The cuckoo lays its eggs in the nest of another bird that belongs to a differ-
ent species. Over many generations of evolution, the cuckoo has acquired excellent
mimicry to lay eggs that deceptively resemble the host bird’s eggs. Closer the resem-
blance, greater are the chances of the planted eggs being hatched by the host. The
cuckoo also times the laying and planting of her eggs cleverly so that they hatch
earlier than those of the host bird. Being first to hatch, the fledgling destroys new
born chicks of the host to further enhance its survival chances. However, once in a
while the host bird discovers the cuckoo eggs and either destroys them or simply
abandons the nest to build a new one.
The pseudo-code in Fig. 6 describes our FCS_DSE optimization algorithm.
• Any feasible architectural solution is an egg. The process starts with a set of new
solutions—the cuckoo’s eggs, laid one per nest (line 2). High quality solutions with
greater fitness correspond to those cuckoo eggs that most closely resemble host
eggs and therefore have greater chances of surviving. The subsequent steps are
performed for each generation till either the best fitness stabilizes or the maximum
number of generations is reached.
144 S. Chakraverty and A. Kumar
• The solutions are sorted according to their fitness values as calculated by the FFE
engine (lines 4, 5).
• The solutions are advanced one step at a time by generating a new solution from
an existing one (lines 6, 7, 8). The choice of a solution, its targeted architectural
feature and the modification in its value are all done by local random walk whose
steps are drawn from heavy tailed (Levy flight) Cauchy distributions. Levy flight
predominantly creates new solutions in the vicinity of good ones (exploitation),
but resorts to sudden bursts of variations (exploration).
• If the new solution turns out to be superior to a randomly picked up existing solu-
tion, it replaces the old one (lines 9, 10, 11). This process is akin to a cuckoo chick
(superior solution) destroying a host chick (inferior solution). It is an exploitative
search technique which favors good solutions.
• With a probability Pa, a low quality solution with the least fitness is discarded from
the population and replaced by another one built anew (line 12). This is analogous to
the host bird discovering poorly mimicked cuckoo eggs, abandoning the nest and
building a new one from scratch. It keeps the search process from getting stuck in
local minima.
• Finally, high quality solutions are passed on to the next generation and the process
is repeated. This ensures survival of the fittest.
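In code, the loop can be summarized roughly as follows. This is a generic sketch of the steps above on a toy one-dimensional problem, with Cauchy-distributed steps standing in for the Levy flight; it is not the FCS_DSE implementation itself, although the population size and Pa mirror the settings used in the experiments of Sect. 4.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

// Toy fitness to be maximized: f(x) = -(x - 3)^2.
double fitness(double x) { return -(x - 3.0) * (x - 3.0); }

int main() {
    const int nests = 20, generations = 400;
    const double pa = 0.5;                                    // abandonment probability

    std::mt19937 gen(7);
    std::uniform_real_distribution<double> uniform(-10.0, 10.0);
    std::cauchy_distribution<double> levyStep(0.0, 0.5);      // heavy-tailed random walk
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, nests - 1);

    std::vector<double> pop(nests);
    for (double& x : pop) x = uniform(gen);                   // initial eggs, one per nest

    for (int g = 0; g < generations; ++g) {
        for (int i = 0; i < nests; ++i) {
            // New solution in the vicinity of an existing one (Levy-flight style step).
            double candidate = pop[i] + levyStep(gen);
            int j = pick(gen);                                // randomly chosen nest
            if (fitness(candidate) > fitness(pop[j]))         // cuckoo chick replaces host chick
                pop[j] = candidate;
        }
        // With probability pa, abandon the worst nest and build a new one from scratch.
        if (coin(gen) < pa) {
            auto worst = std::min_element(pop.begin(), pop.end(),
                [](double a, double b) { return fitness(a) < fitness(b); });
            *worst = uniform(gen);
        }
    }
    auto best = std::max_element(pop.begin(), pop.end(),
        [](double a, double b) { return fitness(a) < fitness(b); });
    std::cout << "best x = " << *best << ", fitness = " << fitness(*best) << '\n';
    return 0;
}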
4 Experimental Results
The Fuzzy Cuckoo Search driven Design Space Exploration (FCS-DSE) tool is
implemented using object-oriented design in C++. We conducted our experiments
on a Pentium dual core 2.59 GHz processor.
We input the synthetically generated 12-node CTPG TG12 shown in Fig. 1 to the
FCS-DSE tool. TG12 is representative of real time applications that require a high
degree of concurrency among its tasks and also impose some sequential constraints.
The Processor and Bus databases are given in Tables 1 and 2 respectively. Table 3
shows the execution time distributions of the tasks of TG12 on available processor
types, Table 4 gives the fuzzy rules for IoA requirements and Table 5 gives the fuzzy
rules for fitness evaluation. The fitness evaluation rules in Table 5 give a higher prefer-
ence to Qualitative Availability as compared to Performance and Cost_Effectiveness.
Starting with a population of 20 chromosomes and the value of Pa set to 0.5, the
design exploration was conducted through 400 generations. The task-to-processor
mapping of the best architecture that was obtained at the end of exploration is shown
in Fig. 7. The block diagram of the architecture is given in Fig. 8. The system uses
four processors to execute the computational tasks and four busses to implement the
communication tasks.
The following allocation patterns can be observed:
• Sequential tasks and tasks at different hierarchical levels share the same processor.
For example v1 precedes v5 which in turn precedes v8 . All three tasks share the
same processor SPP s0 .
• Tasks at the same hierarchical level having equal precedence order are allocated
to different processors, thus allowing their concurrent execution. For example the
terminal tasks v8 , v9 , v10 and v11 are allocated to different PEs.
• Tasks that contribute to primary outputs with high IoA values are predominantly
allocated to ASICs. The first rule in Table 4 shows that the primary outputs po2
and po3 are given more importance than others in the fully functional condition.
Moreover they lose their importance drastically when their output quality level
reduces (for example see rules 4, 5 and 8 in Table 4). Hence the system assigns the
more reliable ASICs to implement the tasks v2 , v3 , v7 , v10 and v11 that contribute
to po2 and po3 .
• Tasks that contribute to less important outputs are mapped to less reliable, cheaper
processors. For example tasks v0, v1, v4, v5, v7 and v9 contribute to outputs po0 and
po1. Table 4 shows that these outputs have lesser importance in the fully functional
state (rule 1). Even when their QL levels diminish, these outputs are still acceptable
(rules 4, 5 and 8). The DSE system aptly allocates a GPP for tasks v0 , v4 , v9 and
an SPP for tasks v1 , v5 , v7 as they are less reliable and cheaper PEs than ASICs.
• For the best architecture, there are seven local inter-task communications and seven
remote inter-task communications. The remote data transfers are implemented on
four bus instances in a manner such that sequential edges share a bus (e3,6 and
e6,10 ) while concurrent edges are assigned to different busses (e2,6 and e2,7 ).
The above observations indicate that the FCS-DSE system allocates the resources
among the tasks of an application judiciously so as to enhance concurrency and
availability in a cost-effective manner.
of exploitation when random moves are taken from the tail end of the distribution.
Significantly, these transitions from exploitative search to explorative search occur
naturally due to the characteristics of the heavy-tailed Cauchy distribution. In sharp
contrast GA requires adjustment of two parameters viz. crossover rate and mutation
rate to balance exploration and exploitation. CS thus relies on fewer tuning parame-
ters (Fig. 9).
• Best Fitness: Figure 10 gives a plot of the best fitness value achieved in each
generation for CS and GA driven optimization processes. The value of Pa for CS
was set to 0.5 and MR for the GA was set to 0.25. The best fitness is a monotonically
non-decreasing function across generations. This is due to the selection operator Elitism,
which ensures that the best solution in any generation is preserved in the next generation.
We find that CS produces better results in a shorter run time. However, CS
was not able to make any further change in the best solution during extended run
time beyond 400 generations. In contrast to this, GA improved the fitness of best
solution during the extended time period. This happened because CS looks for new
solutions mostly in the vicinity of good solutions while GA explores the search
space evenly with the help of random moves taken from a uniform distribution.
We experimented with the DSE tool to assess architectural solutions for TG12 by
inserting three different fitness evaluation functions within the optimization code: a
probabilistic, a weighted and a fuzzy logic based fitness evaluator.
Table 6 shows the quality of the best solutions obtained through the probabilistic,
weighted and fuzzy based fitness evaluation methods for TG12. The fuzzy logic based
fitness function has produced the best solution. Its system availability is 24.45 %
higher than that of the weighted fitness method and 19.24 % higher than that of the
probabilistic fitness method. Moreover its performance also surpasses that obtained by the weighted
fitness method by 14.29 % and the probabilistic fitness method by 16.76 %. It is
cheaper than the solution of the weighted fitness method by 46.67 % but costlier than
the solution of the probabilistic fitness method by 10 %.
The above results highlight the fact that when fixed weights are used for each
objective throughout the exploration path as in the case of weighted and probabilistic
methods, then the optimization algorithm rejects several combinations of objectives
that are actually acceptable to the user. Fuzzy rules evaluate the fitness values more
faithfully, applying different criteria under different states of functionality. This gives
a better chance to a wider variety of solutions to participate in the next generation of
evolution.
5 Conclusion
References
1. Mentor Graphics, Vista a complete TLM 2.0-based solution (2011), Available: http://www.
mentor.com/esl/vista/overview. Accessed 2 June 2011
2. Cadence, A cadence vision: EDA360 (2011), Available: http://www.cadence.com/eda360/
pages/default.aspx. Accessed 2 June 2011
3. Synopsys, Platform architect: SoC architecture performance analysis and optimization
(2011), Available: http://www.synopsys.com/Systems/ArchitectureDesign/pages/Platform
Architect.aspx. (Accessed 3 June 2011)
4. Simulink, mathworks. (Online)
5. R. Luus, Optimization of system reliability by a new nonlinear integer programming procedure.
IEEE Trans. Reliab. 24(1), 14–16 (1975)
6. D. Fyffe, W. Hines, N. Lee, System reliability allocation and a computational algorithm. IEEE
Trans. Reliab. 17(2), 64–69 (1968)
7. Y. Nakagawa, S. Miyazaki, Surrogate constraints algorithm for reliability optimization prob-
lems with two constraints. IEEE Trans. Reliab. 3(2), 175–180 (1981)
8. K. Misra, U. Sharma, An efficient algorithm to solve integer programming problems arising
in system-reliability design. IEEE Trans. Reliab. 40(1), 81–91 (1991)
9. C. Sung, C.Y. Kwon, Branch-and-bound redundancy optimization for a series system with
multiple-choice constraints. IEEE Trans. Reliab. 48(2), 108–117 (1999)
10. B. Suman, Simulated annealing-based multi-objective algorithm and their application for
system reliability. Eng. Optim. 35(4), 391–416 (2003)
11. V. Ravi, B. Murty, P. Reddy, Nonequilibrium simulated annealing algorithm applied to relia-
bility optimization of complex systems. IEEE Trans. Reliab. 46(2), 233–239 (1997)
12. S. Kulturel-Konak, D. Coit, A.E. Smith, Efficiently solving the redundancy allocation problem
using tabu search. IIE Trans. 35(6), 515–526 (2003)
13. Y.-C. Liang, A. Smith, An Ant System Approach to Redundancy Allocation, in Proceedings
of the 1999 Congress on Evolutionary Computation (Washington, D.C., 1999)
14. Y.-C. Liang, A. Smith, Ant colony paradigm for reliable systems design, in Computational
Intelligence in Reliability Engineering, vol. 53 (Springer, Berlin, 2007), pp. 417–423
15. G. Levitin, X. Hu, Y.-S. Dai, Particle swarm optimization in reliability engineering, in Com-
putational Intelligence in Reliability Engineering, vol. 40, ed. by G. Levitin (Springer, Berlin,
2007), pp. 83–112
150 S. Chakraverty and A. Kumar
16. P. Yin, S. Yu, W.P.P. Wang, Y.T. Wang, Task allocation for maximizing reliability of a dis-
tributed system using hybrid particle swarm optimization. J. Syst. Softw. 80(5), 724–735
(2007)
17. P. Busacca, M. Marseguerra, E. Zio, Multiobjective optimization by genetic algorithms: appli-
cation to safety systems. Reliab. Eng. Syst. Safety 72(1), 59–74 (2001)
18. A. Konak, D.W. Coit, A.E. Smith, Engineering & system safety multi-objective genetic algo-
rithms: a tutorial. Reliab. Eng. Syst. Safety, 992–1007 (2006)
19. L. Sahoo, A.K. Bhunia, P.K. Kapur, Genetic algorithm based multi-objective reliability opti-
mization in interval environment. Comput. Ind. Eng. 62(1), 152–160 (2012)
20. X. Yang, S. Deb, Engineering optimisation by cuckoo search. Int. J. Math. Model. Numer.
Optimisation 1(4), 330–343 (2010)
21. I. Olkin, L. Glesser, C. Derman, Probability Models and Applications, 2nd edn. (Prentice Hall
College Div, NY, 1994)
22. K. Anil, C. Shampa, A fuzzy based design Exploration scheme for High Availability Het-
erogeneous Multiprocessor Systems. eMinds: Int. J. Human-Computer Interact 1(4), 1–22
(2008)
23. N.K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering
(The MIT Press, Cambridge, 1996)
24. J. Anderson, A Survey of Multiobjective Optimization in Engineering Design, Technical Report
Department of Mechanical Engineering (Linköping University, Sweden, 2000)
25. A. Kumar, S. Chakraverty, Design optimization for reliable embedded system using Cuckoo
search, in IEEE International Conference on Electronics Computer Technology (ICECT),
Kanyakumari, India, (2011)
26. L.A. Zadeh, Fuzzy sets. Inf. Control 8(3), 338–353 (1965)
Part III
Modeling Framework
Model-Based Verification and Validation
of Safety-Critical Embedded Real-Time
Systems: Formation and Tools
1 Introduction
Real-time systems are one of the challenging research areas today, addressing both
software and hardware issues related to computer science and engineering design.
In a real-time system the correctness of the system performance depends not only
on the logical results of the computations, but also on the time at which the results
are produced [1]. A real-time system changes its state precisely at physical (real)
time instant, e.g., maintaining the temperature of a chemical reaction chamber is
a complex continuous time process which constantly changes its state even when
the controlling computer has stopped. Conceived for controlling real-world
phenomena, real-time systems often comprise the following three subsystems,
shown in Fig. 1.
Controlled system is the device (the plant or object) we want to control according
to the desired characteristics. It also contains actuating devices, e.g., motors, pumps,
and valves, and sensors, e.g., pressure, temperature, navigation, and position sensors.
Surrounding environmental effects (disturbances), sensor noise and actuator limits
are also considered a part of this subsystem.
Operator environment is the human operator, who commands the controlling
system to control the output of the controlled system. It also contains command
input devices, e.g., keyboards, joysticks, and brake pedals.
Controlling system is the real-time system or the controller which acquires the
information about the plant by using sensors and controls it with actuators according
to user demands, taking sensor imperfections and actuating device limitations into
consideration.
Real-time systems can be categorized based on two factors [2]. The factors out-
side the computer system classify the real-time systems as soft real-time, hard real-
time, fail-safe real-time and fail-operational real-time systems. The factors inside the
computer system classify the real-time systems as event-triggered, time-triggered,
guaranteed-timeless, best-effort, resource adequate and resource inadequate real-
time systems. Typically, in real-time systems, the nature of time is considered,
because deadlines are instants in time. Safety-critical real-time systems are mainly
concerned with the result deadlines based on the underlying physical phenomena of
the system under control.
Locke [3] describes the classification of the real-time systems according to the cost
of missing a desired deadline as shown in Fig. 2. In a soft real-time system, producing
a result after its deadline will still be valuable to some extent even if the outcome is
not as profitable as if the result had been produced within the deadline and it may
be acceptable to occasionally miss a deadline. Examples of such systems include
the flight reservation system, TV transmissions, automated teller machine (ATM),
Fig. 2 Four types of real-time systems and their effects of missing a deadline
video conferencing, games, virtual reality (VR), and web browsing, etc. In a firm
real-time system, producing a result after missing a few deadlines will neither give any profit
nor incur any cost, but missing more than a few may lead to complete or catastrophic
system failure. Such systems include cell phones, satellite-based tracking, automobile
ignition system and multimedia systems, etc. In a hard essential real-time system,
a bounded cost will be the outcome of missing a deadline e.g., lost revenues in a
billing system. Lastly, missing a deadline of a hard critical real-time system will
have dreadful consequences e.g. loss of human lives and significant financial loss
[4]. Examples of such systems include avionics weapon delivery system, rocket and
satellite control, auto-pilot in aircraft, industrial automation and process control,
medical instruments and air-bag controllers in automotives, to name a few.
Hard critical real-time or simply hard real-time systems are usually safety-critical
systems. These systems must respond in the order of milliseconds or less to avoid
catastrophe. In contrast, soft real-time systems are not as fast and their time require-
ments are not very stringent. In practice, a hard real-time system must execute a set
of parallel real-time tasks to ensure that all time-critical tasks achieve their speci-
fied deadlines. Determining the priority order of the tasks execution based on the
provided rules is called scheduling. The time scheduling problem is also concerned
with the optimal allocation of the resources to satisfy the timing constraints. Hard
real-time scheduling can be either static (pre run-time) or dynamic in nature.
In static scheduling, the scheduling of the tasks is determined in advance. A run-
time schedule is generated as soon as the execution of one task is finished by looking
in a pre-calculated task-set parameters table, e.g., maximum execution times, prece-
dence constraints, and deadlines. Clock-driven algorithms and offline scheduling
techniques are mostly used in static systems, where all properties of the tasks are
known at design time.
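A minimal sketch of the clock-driven idea, using a made-up three-slot task table and no particular RTOS API:

#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

// Pre-calculated schedule: each slot of the minor cycle runs one fixed task.
struct Slot {
    const char* name;
    void (*task)();
};

void readSensors()  { std::cout << "read sensors\n"; }
void controlLaw()   { std::cout << "control law\n"; }
void driveOutputs() { std::cout << "drive outputs\n"; }

int main() {
    using namespace std::chrono;
    const std::vector<Slot> schedule = {        // fixed at design time
        {"sensors", readSensors},
        {"control", controlLaw},
        {"outputs", driveOutputs},
    };
    const auto minorCycle = milliseconds(10);

    auto next = steady_clock::now();
    for (int frame = 0; frame < 3; ++frame) {   // a few major frames for illustration
        for (const Slot& s : schedule) {
            s.task();                            // run the task assigned to this slot
            next += minorCycle;
            std::this_thread::sleep_until(next); // wait for the next time slot
        }
    }
    return 0;
}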
In dynamic scheduling there is no pre-calculated task set; decisions are taken at run
time based on a set of rules for resolving conflicts between tasks we want to execute
at the same time. One common approach is the introduction of pre-emptive and
non-pre-emptive scheduling. Priority-driven algorithms are mostly employed in
dynamic systems, with combination of periodic and aperiodic or sporadic tasks.
In preemptive scheduling, the presently running task will be preempted upon
arrival of higher priority task. In nonpreemptive scheduling, the presently running
task will not be preempted until completion. A parallel processing system is dynamic
if the tasks can migrate between the processors and static if the tasks are bound to
a single processor [5]. In general, static systems have inferior performance compared
to dynamic systems in terms of overall job execution time. However, it is easier to
verify, validate and test a static system than a dynamic system, which may sometimes
be impossible to validate. Because of this, static scheduling is preferred for
hard real-time systems. Figure 3 shows the taxonomy of
real-time systems scheduling with implementation architecture.
Often, real-time systems are implemented with a combination of both hard real-time
tasks and soft real-time tasks. A traffic control system is a typical example: avoiding
crashes is a critical hard real-time task, whereas optimizing traffic flow is a soft
real-time task. In measurement systems, taking a timestamped measurement is a hard
real-time task and delivering the timestamps is a soft real-time task. Another example
of such an application is quality control using robotics, where removing a defective
product from the conveyor belt is a soft real-time task and stopping the belt in an
emergency is a hard real-time task.
Generally, embedded systems are used to meet the real-time (RT) system per-
formance specifications in an individual processor form or in a complete sub-
system form i.e., System-On-Chip (SOC). Dedicated embedded hardware i.e., gen-
Fig. 7 Percentage of full-time engineers working on projects in the year 2010 (extracted from [6])
i.e. quality, user satisfaction, costs according to budget and schedule are interdependent,
like the corners of the square shown in Fig. 8. These interconnected targets can
easily be achieved using model-based design and VV&T techniques, which is the
theme of this text.
One of the most demanding applications of embedded real-time systems is safety-critical
systems, where a small unseen software bug or hardware design malfunction
can cause serious damage to the environment and could result in loss of life
[8]. These expensive complex systems such as hybrid electric vehicles, autonomous
underwater vehicles (AUV), chemical plants, nuclear power plants, aircraft, satellite
launch vehicle (SLV) and medical imaging for non-invasive internal human inspec-
tions are designed in such a way to ensure system stability and integrity during all
of the system functional modes in normal scenario and some level of performance
and safe procedure in case of faults. Multiple distributed embedded real-time (RT)
computers are used in medical, aerospace, automobile and industrial applications for
fast real-time performance with controllability of the system.
Testing and qualification of a safety-critical embedded real-time system is of great
importance for an industrial product development and quality assurance system.
An embedded real-time system is a blend of advanced computer hardware and software
that performs a specific control function with stringent timing and resource constraints,
as part of a larger system that often consists of other mechanical or electrical hardware.
For example, to maintain the flow of a liquid through a pipe, we need an actuating
mechanism, e.g., a flow control valve, and a sensing device, e.g., a flow meter, in a digital
closed loop as shown in Fig. 9, where r(t), rk, y(t), yk, ek, uk, and u(t) are the reference
command, sampled reference command, process output, sampled process output,
sampled error, sampled control signal and continuous control signal respectively.
At each phase of an embedded software development life-cycle, we require a quick
verification methodology with appropriate validation test cases because of tight time
schedules during product development.
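A toy version of such a loop with the same signal names is sketched below; the plant is a made-up first-order lag and the controller a simple PI regulator, chosen only to illustrate the rk, ek, uk, yk flow rather than the actual flow-control dynamics:

#include <iostream>

int main() {
    const double Ts = 0.01;            // sample period [s]
    const double Kp = 2.0, Ki = 1.0;   // PI gains (illustrative)
    double y = 0.0;                    // process output y(t), sampled as yk
    double integral = 0.0;

    for (int k = 0; k < 500; ++k) {
        double rk = 1.0;                           // sampled reference command
        double ek = rk - y;                        // sampled error
        integral += ek * Ts;
        double uk = Kp * ek + Ki * integral;       // sampled control signal

        // Made-up first-order plant: tau * dy/dt + y = u, integrated with Euler.
        const double tau = 0.5;
        y += Ts * (uk - y) / tau;

        if (k % 100 == 0)
            std::cout << "t = " << k * Ts << "  y = " << y << '\n';
    }
    return 0;
}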
The benchmark for measuring the maturity of an organization’s software process is
known as capability maturity model (CMM). In CMM there are five levels of process
maturity based on certain key process areas (KPA) as shown in Fig. 10. There are four
essential phases of the embedded real-time software development process. The first
phase is requirements realization/generation, review, analysis, and specification.
The second phase is system design and review. The third phase includes algorithm
implementation and the final phase is extensive testing. Each of these phases has an output
which we have to validate. Interpretation of these phases may differ according to
the projects. Each particular style and framework which describes the activities at
each phase of embedded software development is called a “Software Development
Life-Cycle Model”.
3 Methods in VV&T
Here, we provide a brief overview of the model based verification, validation, and
testing procedures employed for embedded system design. Verification is the process
of assessing a system or component to determine whether the products of a given
development phase satisfy the requirements imposed at the start of that phase whereas
validation is the process of assessing a system or component during or at the end
of the development process to determine whether it satisfies the specified product
requirements [9]. Verification, validation and testing (VV&T) is the procedure used
in parallel to system development for ensuring that an embedded real-time system
meets requirements and specifications and that it fulfills its intended purpose. The
principal objective is to determine faults, failures and malfunctions in a system and
evaluation of whether the system is functional in an operational condition.
There are various embedded real-time software development life-cycle (SDLC)
models available in technical literature. Some of the well known life-cycle models
are:
1. Incremental Models
2. Iterative Models
• Spiral Model
• Evolutionary Prototype Model
3. Single-Version Models
• Big-Bang Model
• Waterfall Model without “back flow”
• Waterfall Model with “back flow”
• “V” Model
As compared to the incremental models, iterative models do not start with a com-
plete initial specification of requirements and implementation. In fact, development
begins with specifying and implementing just part of the software, which can then be
reviewed and modified completely at each phase in order to identify further require-
ments. Figure 12 shows an iterative lifecycle model, which consists of repeating the
four phases in sequence. For further details see Ref. [10].
In these single-version models, one VV&T procedure is followed without addition
or review of the product design after requirements identification at a later stage. Some of
the models categorized in this group are discussed next.
Here a software developer works in isolation for some extended time period to solve
the problem statement. The developed product is then delivered to the customer in
the hope that the client is satisfied.
Fig. 13 Waterfall SDLC models. a Waterfall without “back flow”, b Waterfall with “back flow”
This is one of the oldest and most widely used classical software development models,
initially utilized in government projects. This model emphasizes planning and
intensive documentation, which makes it possible to identify design flaws before development.
The simplest waterfall lifecycle model consists of non-overlapping phases where each
phase “pours over” into the next phase as shown in Fig. 13a.
The waterfall model starts with the requirements phase, where the function, behavior,
performance and interfaces are defined. Then, in the design phase, data structures,
software architecture, and algorithm details are decided. In the implementation phase the
source code is developed in order to further proceed towards testing and maintenance
phases. There are many variants of the simple waterfall model. One of the most important
is the one with correction functionality, where detected defects are corrected by going back
to the appropriate phase, as shown in Fig. 13b.
3.3.3 V-Model
One of the most effective and widely employed lifecycle models used for software design and
VV&T is the V-model [11], shown in Fig. 14. In the V-lifecycle model, verification and
validation activities are performed in parallel with the development process. In the initial
development phases verification and validation are static in nature, whereas in later
development phases they are dynamic [10]. The static verification is concerned with
the requirements, analysis of the static system representation and tool based software
analysis. The VV&T after availability of software is of dynamic character with test
data generation, product functional performance testing, integrated system testing
and acceptance testing.
Static analysis: It means verifying a product without executing the VV&T object.
These checks are text analysis, requirement analysis and VV&T functionality review.
This review is performed by a team of VV&T experts and product designers. This
technique is able to find missing, deficient and unwanted functionality in the source
text of the product under analysis at early development phase.
Testing: Testing is carried out to check dynamically, whether the product require-
ments are fulfilled. Both, verification and validation can be performed by testing. The
biggest advantage of testing compared to other V&V techniques is that it analyzes
the system in a realistic application environment. This allows realistic online
and real-time behavioral examination of the developed product. Here we describe
some of the common In-the-loop testing procedures in model-based safety-critical
embedded real-time product development.
Component testing: It is performed in SIL to verify the individual, elementary
software component or collection of software components in a model-based closed-
loop real-time simulation environment. Each software code is extensively verified
with respect to requirements, timing and resource constraints. In the model-based VV&T
process, automatic code generation technology is used for rapid prototyping of the
simulation model or design, fully implemented in embedded software code, which can
also act as the final embedded real-time product or be used as a simulator for further
product testing procedures, i.e., PIL, HIL and system integrated testing.
White box testing: It is called code based testing [17] which treats the software
as a box that you can see into to verify the embedded real-time software code execu-
tion. Black box testing (also called specification based testing or functional testing
[17]) treats software as a box that you cannot see into. Here we have to check the
functionality of the implemented code according to the specification without con-
sideration of how the software is implemented. It looks at what comes out of the box
when particular input data is provided.
Processor-In-the-loop (PIL) testing: It is performed after successful implemen-
tation of design software on actual processing hardware with electronic hardware I/O
interface availability. SIL simulation results are compared with the PIL simulation
results to verify the compiler and processor. In PIL, we check the shortest and longest
execution times of the implemented embedded software algorithm through evolutionary
testing, where each individual of the population represents a test value for
which the system under test is executed. For every test value, the execution time is
measured to determine the fitness level of the individual.
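A toy version of this idea is sketched below. Here the function under test and the timing measurement run on the host, whereas in PIL the execution time would be measured on the target processor; the workload, population size and mutation settings are all invented for illustration:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// Function under test: its execution time depends on the input value.
long workload(int x) {
    long acc = 0;
    for (int i = 0; i < x % 100000; ++i) acc += i;   // data-dependent loop
    return acc;
}

// Fitness of a test value = measured execution time in microseconds (to be maximized).
double measure(int x) {
    auto t0 = std::chrono::steady_clock::now();
    volatile long sink = workload(x);
    (void)sink;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

int main() {
    std::mt19937 gen(1);
    std::uniform_int_distribution<int> randInput(0, 1 << 20);
    std::normal_distribution<double> mutation(0.0, 5000.0);

    const std::size_t popSize = 16;
    std::vector<int> population(popSize);
    for (int& x : population) x = randInput(gen);

    for (int generation = 0; generation < 30; ++generation) {
        // Evaluate every individual once, then sort by cached fitness (slowest first).
        std::vector<std::pair<double, int>> scored;
        for (int x : population) scored.emplace_back(measure(x), x);
        std::sort(scored.rbegin(), scored.rend());

        // Keep the slower half and refill the rest with mutated copies of the survivors.
        for (std::size_t i = 0; i < popSize / 2; ++i) {
            population[i] = scored[i].second;
            population[i + popSize / 2] =
                std::max(0, scored[i].second + static_cast<int>(mutation(gen)));
        }
    }
    std::cout << "longest observed execution time: "
              << measure(population.front()) << " us\n";
    return 0;
}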
Extended evolutionary testing: This approach, presented in [18], allows the combination
of multiple global and local search strategies and automatic optimal distribution
of resources for the success of the strategies. In a recent study [19], an evolutionary
algorithm is used for the generation of test traces that cover most or
all of the desired behavior of a safety-critical real-time system. Evolutionary testing
is also used for verification of developers’ tests (DT). For a detailed discussion of
evolutionary algorithms see [20]. General evolutionary algorithm execution flow is
shown in Fig. 17.
Hardware-In-the-Loop testing: It is performed with realistic input data and other
actual mechanical/electrical system components to validate the embedded real-time
software. HIL testing is the standard verification and qualification procedure for
safety critical products industry. HIL simulation response is compared with the PIL
testing results and SIL testing results to ensure the correctness of the embedded
real-time algorithm. Fault injection testing is also carried out with HIL simulation
testing to check the robustness of the system to unwanted environmental conditions.
Figure 18 illustrates the HILS testing of an unmanned aerial vehicle (UAV) to validate
the flight controller.
After successful HIL testing and verification of the embedded RT flight control computer
(FCC), entire integrated system testing is performed, where the actual flight vehicle
sensors, actuators and engine are installed with the FCC and the real-time flight
ground data is logged for verification and comparison with the design requirements
and RT simulation. Integration testing checks for defects in the interfaces and interaction
between integrated subsystems. These integrated subsystems behave as elements
in an architectural design where the software and hardware work as a system [21].
Integrated and acceptance testing: Complete integrated system and formal
acceptance testing is performed in the presence of the customer to validate the system
against standards and customer requirements. After certification and customer approval,
the product goes to the production department with specification and VV&T details
according to the requirement and quality standards.
off-the-shelf (COTS) UAV system. A low cost off-the-shelf autopilot and sensor system
is used to develop this platform.
Here the plant model is a physical, functional, logical, and mathematical descrip-
tion of an aircraft and its environment which replicates the real complex system
characteristics using data fitting, system identification, physical modeling, parameter
estimation and first-principles modeling techniques. Most of the real-world systems
are highly nonlinear and their respective models can be developed by expressing them
in high-order complex mathematical terminologies to increase their accuracy. The
developed models are not 100 % accurate with respect to the true system; however,
they are quite useful for understanding and sufficient for controlling the system.
The UAV nonlinear simulation model is modified from the Aerosonde UAV Simulink
model available in AeroSim Blockset from Unmanned Dynamics [28]. The AeroSim
Blockset and Aerospace Blockset library provide almost all the aircraft model com-
ponents, environment models, and earth models for rapid development of 6-DOF
aircraft dynamic models. Figure 20 shows a basic layout of nonlinear aircraft model
subsystems from AeroSim Blockset library [28]. Customization of the aerodynamics,
propulsion, and inertia Aerosonde UAV simulink models is performed after exten-
sive experimentation on our small UAV. Generally, the equation of motion, earth
and atmosphere Simulink models are not modified because they are independent of
the aircraft system used. Brief description of adapted aircraft subsystem models is
presented in next section. First order actuator dynamics with saturation limits are
used to simulate the control surface characteristics.
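For instance, a first-order actuator with deflection limits can be stepped in time as follows; the time constant and limits here are placeholders, not the values identified for this UAV:

#include <algorithm>
#include <iostream>

// First-order actuator: tau * d(delta)/dt = command - delta, with saturation limits.
class Actuator {
public:
    Actuator(double tau, double minDeg, double maxDeg)
        : tau_(tau), minDeg_(minDeg), maxDeg_(maxDeg) {}

    double step(double commandDeg, double dt) {
        deflection_ += dt * (commandDeg - deflection_) / tau_;     // Euler integration
        deflection_ = std::clamp(deflection_, minDeg_, maxDeg_);   // saturation limits
        return deflection_;
    }

private:
    double tau_, minDeg_, maxDeg_;
    double deflection_ = 0.0;
};

int main() {
    Actuator elevator(0.05, -20.0, 20.0);   // placeholder time constant and limits [deg]
    for (int k = 0; k < 10; ++k)
        std::cout << elevator.step(30.0, 0.01) << '\n';   // command beyond the limit saturates
    return 0;
}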
where c1–c9 are the inertia coefficients which are computed using the AeroSim library.
The moments L, M, and N include all the available loads (i.e. aerodynamics, thrust,
winds) and they are given with respect to the current location of aircraft center of
gravity (CG).
The kinematic equations of the aircraft rotation motion using classical Euler angles
φ, θ, and ψ with the body angular rates p, q, and r and aerodynamics angles α, β,
and γ are given by
The propulsion system models the interaction between the electric motor and pro-
peller dynamics. The electrical characteristics of the motor are used to describe the motor
dynamics, and the rotation speed of the propeller (ωp) is used to describe the propulsion
dynamics. Our small aircraft flight dynamics are sensitive to propulsion dynamics
because of the large torque from the propulsion system as relative to its size. The
propulsion system dynamics is expressed in the following equation using conserva-
tion of angular momentum.
(Imotor + Ipropeller) dωp/dt = Tmotor − Tpropeller
where Imotor and Ipropeller are the moments of inertia of the rotating motor body and
the propeller respectively in kgm2, and Tmotor and Tpropeller are the output torque at
the motor shaft and the torque generated by the propeller respectively in Nm.
The inertia model consists of the aircraft inertia parameters, including mass,
CG position, and moment of inertia coefficients. The aircraft moment of inertia is
described by the moment of inertia matrix I as
      ⎛  Ixx  −Ixy  −Ixz ⎞
I =   ⎜ −Iyx   Iyy  −Iyz ⎟
      ⎝ −Izx  −Izy   Izz ⎠
Besides the aircraft moment of inertia matrix, the propulsion system moment of inertia
coefficients are determined using the pendulum method in a lab experimental setup.
The linearized state-space aircraft model, obtained at a trim point in the desired flight
envelope using numerical linearization, is given by:
ẋ = Ax + Bu    (12)
y = Cx + Du    (13)
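A generic way to step such a linear model in time is shown below, using a made-up two-state system in place of the aircraft's A, B, C, D matrices:

#include <iostream>

// Forward-Euler simulation of x' = Ax + Bu, y = Cx + Du.
int main() {
    const double dt = 0.01;
    // Placeholder 2-state, single-input, single-output system (not the aircraft model).
    const double A[2][2] = {{0.0, 1.0}, {-4.0, -0.8}};
    const double B[2] = {0.0, 4.0};
    const double C[2] = {1.0, 0.0};
    const double D = 0.0;

    double x[2] = {0.0, 0.0};
    for (int k = 0; k < 500; ++k) {
        double u = 1.0;                                   // step input
        double dx[2] = {0.0, 0.0};
        for (int i = 0; i < 2; ++i) {
            for (int j = 0; j < 2; ++j) dx[i] += A[i][j] * x[j];
            dx[i] += B[i] * u;
        }
        for (int i = 0; i < 2; ++i) x[i] += dt * dx[i];   // Euler update
        double y = C[0] * x[0] + C[1] * x[1] + D * u;
        if (k % 100 == 0) std::cout << "t = " << k * dt << "  y = " << y << '\n';
    }
    return 0;
}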
Fig. 22 RT and HIL simulation responses. a RT and HIL simulation response in tracking bank angle
command (φ) with states, b Comparison of RT and HIL simulation control surfaces deflections
• Rapid testing and identification of FCS issues before real flight test.
• Multiple simulation runs are possible to verify the accuracy and repeatability of
system performance.
• Provides a test bed for performance testing and comparison of different flight controllers.
A fault injection mechanism is also possible, to test the fault-tolerance capability
of the FCS in a closed-loop environment.
• Provides a close to real environment for the pilot and flight test engineers to feel
the actual flight situations.
• It can be used for each individual subsystem testing, verification and validation by
simulating the other system components in real-time environment.
• It can be used for post-flight analysis and test flight data verification.
Initially, PID controllers are implemented for each longitudinal and lateral mode
of UAV flight development and testing. Further work includes optimal controller
design, adaptive controller design and inclusion of actuators fault-tolerance using
control allocation (CA) strategies as detailed in [30, 31]. Also a fault detection and
isolation (FDI) block can be introduced in modular flight controller design to handle
actuator and sensor faults. FlightGear, a free open-source flight simulator, can also
be used to visualize the aircraft flight motions.
Another HIL simulation setup is developed with Matlab/Simulink model-based
environment using dSPACE hard real-time software development kit and dedicated
hardware. It is not as cost-effective a solution as the former one, but it has superior
real-time performance with on-line data display and precise high-speed data storage. For
hard real-time performance testing, verification, and validation dSPACE systems are
generally preferable. Here, we use Matlab 2008b with ControlDesk 3.3 for the development
of the HIL simulation platform for UAV flight controller VV&T. The UAV flight
simulation task execution time (TET) in dSPACE is quite impressive, as shown in
Fig. 24. The worst-case time required to complete a real-time task in dSPACE is approximately
1.2 ms, and on average approximately 560 µs are required to complete the different
tasks in a full 100 s controlled UAV flight simulation.
Fig. 24 Aircraft real-time simulation task execution time (TET) at each sample instant using the
DS1005PPC board
We used modular processor board DS1005 PPC for real-time execution of sim-
ulation which consists of PowerPC 750GX processor running at 933 MHz having
128 MB SDRAM, 16 MB flash memory, and two general-purpose timers. High speed
PHS bus interface is used to communicate with other modular DAQ cards. DS2201
modular I/O board is used for DAC, ADC and DIO requirements. DS4201-S high
speed serial interface board is used for RS232 and RS422 communication for sensors
and telemetry data saving from FCC. dSPACE’s TargetLink code generation technol-
ogy can generate highly readable and optimized code for resource-limited embedded
real-time systems. A graphical user interface is also developed in ControlDesk to visualize the flight data and status, as shown in Fig. 25.
Fig. 25 Real-time simulation graphical user interface in ControlDesk. a UAV real-time control effectors deflections [deg] layout in ControlDesk, b UAV real-time avionics display layout in ControlDesk
Matlab/Simulink® has become an educational and industrial standard for complete system design [32]. There are various hardware/software solutions available within Matlab/Simulink® for real-time embedded system development and its verification, validation, and testing. The following are three ways to prepare a complete embedded RT system.
Using the Matlab/Simulink® family of products: The following Matlab/Simulink family of products is available for model-based verification, validation, and testing in safety-critical product design:
1. Simulink: Simulation and model-based system design.
2. Simulink Coder: Generate C and C++ code from Simulink and Stateflow models.
3. Embedded Coder: Generate optimized embedded C and C++ code.
4. Simulink Verification and Validation: Used for verification of Simulink models and generated code.
5. Simulink Code Inspector: Source code review according to DO-178 safety standards.
6. Simulink Design Verifier: Verify designs against requirements and generate test vectors.
7. Real-Time Workshop: Generate simplified, optimized, portable, and customizable C code from Simulink model-based designs.
8. Real-Time Windows Target: Execute Simulink models in real time on a Windows-based PC.
9. xPC Target: Used for real-time rapid prototyping and HIL simulation on PC hardware.
10. xPC TargetBox: Embedded industrial PC for real-time rapid prototyping.
11. Real-Time Workshop Embedded Coder: Embedded real-time code generator for product deployment.
Using Hard Real-Time Operating Systems (RTOS): In this approach, we develop the system model and simulation in Matlab/Simulink®, generate C code, and make some modifications for porting to an RTOS. We then run the generated code on the RTOS with a customized graphical user interface (GUI) and any necessary improvements. For a detailed reference procedure, see [33]. Various free and commercial hard real-time operating systems are available for this type of embedded hardware and software VV&T. The advantages of this approach are customized multiple user interfaces, thousands of available hardware I/O interfaces, and hard real-time verification and validation utilizing maximum hardware performance.
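A rough C++ sketch of this porting pattern is given below. The model_initialize()/model_step() entry points follow the common naming convention of Simulink-generated C code, while the periodic loop stands in for a real RTOS periodic task; the names and the 100 Hz rate are assumptions for illustration, not the actual generated code.

#include <chrono>
#include <thread>

// Entry points of the generated controller code (stubs here; real
// Simulink-generated code provides model_initialize()/model_step()).
void model_initialize() { /* set initial states and I/O */ }
void model_step()       { /* one controller update at the base sample rate */ }

int main() {
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::milliseconds(10);   // assumed 100 Hz base rate

    model_initialize();
    auto next = clock::now();
    for (int k = 0; k < 100; ++k) {          // a real port runs this loop indefinitely
        model_step();                         // read sensors, compute, write actuators
        next += period;
        std::this_thread::sleep_until(next);  // a hard RTOS would use a periodic task here
    }
}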
Using Commercial-Off-The-Shelf (COTS) Solutions: Several complete rapid prototyping hardware and GUI development solutions are available for Simulink model-based system design. Some of them are as follows:
1. dSPACE Systems Inc. (ControlDesk, dSPACE simulators, dSPACE RapidPro,
and TargetLink)
2. OPAL-RT Technologies, Inc. (RT-LAB, RT simulators (eMEGAsim, eDRIVE-
sim and eFLYsim), RCP controllers)
3. Applied Dynamics International (Advantage Framework (SIL and HIL simulation environment), Beacon family (embedded code generator) and Emul8 family (rapid prototyping environment))
Fig. 26 Product development time reduction using model based design (Ref. [22])
4. Quanser Inc. (Real-time control software design, implementation and rapid pro-
totyping tools for algorithm VV&T)
5. NI LabVIEW real-time toolkits
6. Humusoft (Real-Time Toolbox and data acquisition boards for VV&T)
7. TeraSoft Inc. (HIL and RCP hardware solutions for VV&T)
8. UEI Inc. (Real-time HIL and RCP hardware solutions)
9. DSPtechnology Co. Ltd (RTSim (HIL and RCP hardware and software))
10. Pathway Technologies Inc. (RCP electronic control units for software VV&T)
11. add2 Ltd (RT HIL and RCP hardware and software for algorithm VV&T)
etc.
These model-based verification and validation solutions are cost effective, reliable, and fast as compared to other traditional solutions. In embedded real-time product design, VV&T takes 40–50 % of the project time, which can easily be reduced by utilizing model-based automatic code generation (ACG), verification, validation, and testing techniques, as shown in Fig. 26 for an example from the automobile industry. All the above ways for the development and VV&T of ERT software make Matlab/Simulink a promising candidate for increasing productivity and reducing project cost. The availability of modular "off-the-shelf" hardware platforms provides maximum flexibility for cost, selection of communication interfaces, and performance requirements.
7 Conclusion
Verification, validation, and testing technologies are important for next-generation safety-critical systems. This chapter has described ongoing work in applying and extending COTS solutions, specifically HIL simulations, to the VV&T of an embedded UAV flight control system.
References
1. J.A. Stankovic, Misconceptions about real-time computing: a serious problem for next-
generation systems. Computer 21(10), 10–19 (1988)
2. H. Kopetz, Real-Time Systems Design Principles for Distributed Embedded Applications
(Kluwer Academic Publishers, London, 1997)
3. C.D. Locke, Best-Effort Decision Making for Real-Time Scheduling. Technical Report
(CMUCS-86-134 Carnegie-Mellon University, Department of Computer Science, USA, 1986)
4. M. Grindal, B. Lindström, Challenges in testing real-time systems. Presented at the 10th International Conference on Software Testing Analysis and Review (EuroSTAR '02), Edinburgh, Scotland, 2002
5. J.W.S. Liu, Real-Time Systems (Prentice Hall, New Jersey, 2000)
6. VDC Research, Next Generation Embedded Hardware Architectures: driving Onset of Project
Delays, Costs Overruns, and Software Development Challenges. Technical report, Sept 2010
7. M. van Genuchten, Why is software late? An empirical study of reasons for delay in software
development. IEEE Trans. Softw. Eng. 17(6), 582–590 (1991)
8. El Al Flight 1862, Aircraft Accident Report 92–11. Technical report (Netherlands Aviation Safety Board, Hoofddorp, 1994)
9. IEEE Standard 610.12-1990, Standard Glossary of Software Engineering Terminology (IEEE
Service Center, NY, 1990)
10. I. Sommerville, Software Engineering, 6th edn. (Addison-Wesley Publishing Company, MA,
2001)
11. W.W. Royce, Managing the development of large software systems, in Proceedings of Western Electronic Show and Convention, pp. 1–9, 1970. Reprinted in Proceedings of the 9th International Conference on Software Engineering, pp. 328–338, 1987
12. C. Kaner, J. Falk, H. Nguyen, Testing Computer Software, 2nd edn. (Van Nostrand Reinhold,
NY, 1999)
13. R.V. Binder, Testing Object-Oriented Systems: Models, Patterns, and Tools (Addison-Wesley,
MA, 1999)
14. IEEE Standard 1028–1988, IEEE Standard for Software Reviews (IEEE Service Center, NY,
1988)
15. Simulink Verification and Validation, User's Guide, MathWorks, Inc., http://www.mathworks.com
16. Matlab and Simulink, MathWorks. http://www.mathworks.com
17. J.A. Whittaker, What is software testing? And why is it so hard? IEEE Softw. 17(1), 70–79
(2000)
18. J. Wegener, M. Grochtmann, Verifying timing constraints of real-time systems by means of
evolutionary testing. Real-Time Syst. 15(3), 275–298 (1998)
19. J. Hänsel, D. Rose, P. Herber, S. Glesner, An Evolutionary Algorithm for the Generation of
Timed Test Traces for Embedded Real-Time Systems. IEEE Fourth International Conference
on Software Testing, Verification and Validation (ICST), 2011, pp. 170–179
20. R.L. Haupt, S.E. Haupt, Practical Genetic Algorithms (Wiley, New York, 2004)
21. B. Beizer, Software Testing Techniques, 2nd edn. (VNR, New York, 1990)
22. Toyota, North America Environmental Report (Toyota Motor North America, Inc., NY, 2010).
23. S.A. Jacklin, J. Schumann, P. Gupta, K. Havelund, J. Bosworth, E. Zavala, K. Hayhurst, C.
Belcastro, C. Belcastro, Verification, Validation and Certification Challenges for Adaptive
Flight-Critical Control Systems (AIAA Guidance, Navigation and Control, Invited Session
Proposal Packet, 2004)
24. L. Pedersen, D. Kortenkamp, D. Wettergreen, I. Nourbakhsh, A survey of space robotics.
Robotics (2003)
25. N. Nguyen, S.A. Jacklin, Neural Net Adaptive Flight Control Stability, Verification and Validation Challenges, and Future Research (IJCNN Conference, Orlando, Florida, 2007)
26. J.M. Buffington, V. Crum, B. Krogh, C. Plaisted, R. Prasanth, Verification and Validation of
Intelligent and Adaptive Control Systems, in 2nd AIAA Unmanned Unlimited Systems Confer-
ence (San Diego, CA, 2003)
27. J. Schumann, W. Visser, Autonomy software: V&V challenges and characteristics, in Proceed-
ings of the 2006. IEEE Aerospace Conference, 2006
28. Unmanned Dynamics LLC. Aerosim Blockset Version 1.2 User’s Guide, 2003
29. B.L. Stevens, F.L. Lewis, Aircraft Control and Simulation (John Wiley & Sons, Inc., 1992).
ISBN 0-471-61397
30. A.H. Khan, Z. Weiguo, Z.H. Khan, S. Jingping, Evolutionary computing based modular con-
trol design for aircraft with redundant effectors. Procedia Eng. 29, 110–117 (2012). (2012
International Workshop on Information and Electronics Engineering)
31. A.H. Khan, Z. Weiguo, S. Jingping, Z.H. Khan, Optimized reconfigurable modular flight control
design using swarm intelligence. Procedia Eng. 24, 621–628 (2011). (International Conference
on Advances in Engineering 2011)
32. MathWorks User Stories. http://www.mathworks.com/company/user_stories/index.html
33. N. ur Rehman, A.H. Khan, RT-Linux Based Simulator for Hardware-in-the Loop Simulations.
International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad,
2007, pp. 78–81
A Multi-objective Framework for
Characterization of Software Specifications
1 Introduction
M. Rashid (B)
Umm Al-Qura University, Makkah, Saudi Arabia
e-mail: [email protected]
M. Rashid · B. Pottier
University of Bretagne Occidentale, CNRS, UMR 3192, Lab-STICC, Brest, France
We have divided our related work into three categories: application analysis tech-
niques, application characterization and streaming programming languages.
We have described in the introductory part of this article that spatial parallelism infor-
mation is used to describe the application in the form of parallel process networks.
This section provides a brief overview of the streaming languages used for writing
applications in the form of parallel process networks.
The idea of a language dedicated to stream processing is not new and has already been discussed in the existing literature [25]. The languages of recent interest are Cg [26], Brook [27], Caravela [28], StreamIt [11], Spidle [29], Streams-C [30], Sisal [31], Sassy [32] and DirectFlow [33]. Existing stream languages can be divided into three categories, as shown in Fig. 1.
The first type of languages are geared towards the features of specific hardware
platform such as Cg [26], Brook [27] and Caravela [28]. All of these languages
are dedicated to program Graphical Processing Units (GPUs). Cg language is a
C-like language that extends and restricts C in certain areas to support the stream
processing model. Brook abstracts the stream hardware as a co-processor to the host
system. Kernel functions in Brook are similar to Cg shaders. These two languages do not support distributed programming. Caravela applies stream computing to the distributed programming model.
The second type of languages introduce constructs for gluing components of
stream library. The examples are StreamIt [11], Spidle [29] and Streams-C [30].
StreamIt and Spidle are stream languages with similar objectives. However, the for-
mer is more general purpose while the latter is more domain specific. StreamIt is
a Java extension that provides some constructs to assemble stream components.
Spidle, on the other hand, provides high level and declarative constructs for specify-
ing streaming applications. Streams-C has a syntax similar to C and is used to define
high level control and data flow in stream programs.
The third type of languages are functional languages such as Sisal [31], Sassy [32]
and DirectFlow [33]. Sisal offers automatic exploitation of parallelism as a result
of its functional semantics. Sassy is a single-assignment variant of the C language that enables compiler optimizations for parallel execution environments, particularly targeting reconfigurable systems. The DirectFlow system is used for describing information flow in distributed streaming applications.
Before describing the proposed framework, some background knowledge is
required. Section 3 will provide essential background to understand the contents
described in this article.
This section briefly reviews the basic concepts of an object oriented language
Smalltalk. Then, the concept of visitor design pattern is illustrated. The proposed
application analysis technique is based on visitor design pattern concept.
Smalltalk [16] is uniformly object-oriented because everything that the program-
mer deals with is an object, from a number to an entire application. It differs from
most languages in that a program is not defined declaratively. Instead, a computation
is defined by a set of objects. Classes capture shared structure among objects, but
they themselves are objects, not declarations. The only way to add code to classes
is to invoke methods upon them. Smalltalk is a reflective programming language
because its classes inherently support self modifications.
Definition 7.1 Reflective Programming—the programming paradigm driven by reflection. Reflection is the process by which an application program observes and/or modifies program execution at run-time. In other words, the emphasis of reflective programming is dynamic program modification. For example, the instrumentation in dynamic analysis of application programs can be performed without re-compiling and re-linking the program.
Definition 7.2 Visitor Design Pattern—it represents an operation to be performed on
the elements of an object structure [34]. It defines a new operation without changing
the classes of the elements on which it operates. Its primary purpose is to abstract
functionality that can be applied to an aggregate hierarchy of “element objects”.
The general organization of the visitor design pattern is shown in Fig. 2. The abstract class for the object structure is represented as AbstractElement, while the abstract class for the visitor hierarchy is represented as AbstractVisitor. "Visitor1" and "Visitor2" are inherited from the abstract class. The functionality is simply extended by inheriting more visitor classes, as each visitor class represents a specific function.
[Fig. 2 General organization of the visitor design pattern: a Client uses the AbstractVisitor hierarchy (with visitElementA and visitElementB operations) and the AbstractElement hierarchy (with an acceptVisitor operation).]
Consider a compiler that parses an input program and represents the parsed pro-
gram as an Abstract Syntax Tree (AST). An AST may have different kinds of nodes
such that multiple operations can be performed on an individual node. Examples of
nodes in an AST are assignment nodes, variable reference nodes, arithmetic expres-
sion nodes and so on. Examples of operations performed on an AST are program
re-structuring, code instrumentation, computing various metrics and so on.
Operations on the AST treat each type of node differently. One way to do this is
to define each operation in the specific node class. Adding new operations requires
changes to all of the node classes. It can be confusing to have such a diverse set of
operations in each node class. Another solution is to encapsulate a desired operation in a separate object, called a visitor. The visitor object then traverses the elements of the tree. When a tree node accepts the visitor, it invokes a method on the visitor that includes the node type as an argument. The visitor then executes the operation for that node, that is, the operation that used to be in the node class.
The visitors are typically used: (a) when many distinct and unrelated operations
are performed on objects in an object structure and (b) when the classes defining
the object structure rarely change and we want to define new operations over the
structure. Visitor-based design provides modularity such that adding new operations with visitors is easy. Related behavior is not spread over the classes defining the object structure. Visitor lets the designer keep related operations together by defining them in one class. Unrelated sets of behavior are partitioned into their own visitor subclasses.
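A compact C++ rendering of this organization, using hypothetical node and visitor names for the AST example above (the chapter's own implementation is in Smalltalk), could look as follows:

#include <cstdio>
#include <memory>
#include <vector>

struct AssignmentNode; struct VariableNode;

// AbstractVisitor: one visit method per concrete element type.
struct AbstractVisitor {
    virtual ~AbstractVisitor() = default;
    virtual void visitAssignment(const AssignmentNode&) = 0;
    virtual void visitVariable(const VariableNode&) = 0;
};

// AbstractElement: every node accepts a visitor (double dispatch).
struct AbstractElement {
    virtual ~AbstractElement() = default;
    virtual void accept(AbstractVisitor& v) const = 0;
};

struct VariableNode : AbstractElement {
    const char* name;
    explicit VariableNode(const char* n) : name(n) {}
    void accept(AbstractVisitor& v) const override { v.visitVariable(*this); }
};

struct AssignmentNode : AbstractElement {
    VariableNode target;
    explicit AssignmentNode(const char* t) : target(t) {}
    void accept(AbstractVisitor& v) const override { v.visitAssignment(*this); }
};

// A new operation (here, a simple node counter) is added as a new visitor
// subclass without touching the node classes themselves.
struct MetricsVisitor : AbstractVisitor {
    int assignments = 0, variables = 0;
    void visitAssignment(const AssignmentNode&) override { ++assignments; }
    void visitVariable(const VariableNode&) override { ++variables; }
};

int main() {
    std::vector<std::unique_ptr<AbstractElement>> ast;
    ast.push_back(std::make_unique<AssignmentNode>("x"));
    ast.push_back(std::make_unique<VariableNode>("y"));

    MetricsVisitor metrics;
    for (const auto& node : ast) node->accept(metrics);
    std::printf("%d assignments, %d variables\n", metrics.assignments, metrics.variables);
}

Adding a further operation, for instance an instrumentation pass, would only require another subclass of AbstractVisitor.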
The Smalltalk environment has a visitor class called ProgramNodeEnumerator.
Definition 7.3 ProgramNodeEnumerator—an object to visit the AST produced from Smalltalk source code. The structure of the AST is determined by the source code and the syntax rules of Smalltalk; therefore, the AST is also called a Program Node Tree. Consequently, the ProgramNodeEnumerator class in the Smalltalk environment provides a framework for recursively processing a Program Node Tree [34].
[Fig. 3 The proposed analysis flow: (1) Instrumentation (parsing, generation of probing messages, replacing), (2) Execution (probe activation, recording of events), and (3) Visualization (binding, visitor-based analysis operations, analysis results).]
The inputs to the first part are executable specifications along with real input data, and the output is a trace tree representation of the input specifications. The input to the second part is the trace tree and the output is in the form of analysis results.
This section describes the first part of the proposed application analysis framework. It takes Smalltalk executable source specifications along with real input data. As the analysis is performed on executable specifications, the input data is real data and not merely synthetic vectors. For example, in the case of MPEG-2
video decoding application analysis, the input is in the form of MPEG-2 video bit-
stream. Dynamic analysis is performed to transform source specifications into a
trace tree representation. The trace tree contains information about the execution of
an application at run-time and represents implementation independent specification
characteristics.
The first part of Fig. 3 summarizes the three essential steps of dynamic analysis to transform source specifications into a trace tree representation. These sub-steps are instrumentation, execution, and visualization. The instrumentation step is responsible for instrumenting the source specifications by automatically inserting probes inside the source code. The execution step takes the instrumented specifications along with real input data and runs the instrumented specifications. Each probe activation is recorded during the execution step. The visualization step displays the results in the form of a trace tree and binds the original source specifications to the corresponding trace tree representation.
The input to the instrumentation step is the original specifications and the output is the instrumented specifications. Instrumentation is performed by automatically inserting
probes inside the source specifications. A probe is a statement added to a statement
of the original source code, with no disruption to its semantics. The probe is written
to extract the required information during execution. The base for the code modifi-
cation is the Abstract Syntax Tree (AST) produced by the Smalltalk environment.
Instrumentation step generates probing messages in four sub-steps as shown in Fig. 3.
The first sub-step is to parse the source code for AST generation. The second
sub-step is to instrument the AST. This sub-step automatically generates additional
nodes in the original AST. The output is an instrumented AST. The third sub-step is
to compile the instrumented AST. The output is the compiled source code. Finally,
the original source code is replaced with the compiled source code.
It performs trace recording through the activation of probes in the instrumented code.
The input to this step is the instrumented source code in the form of probing messages
from the instrumentation phase. In addition to the instrumented code, real input data
is provided to execute the instrumented code.
Once the instrumented code and the input data are available, the trace recording
is done through the activation of probes in the instrumented code and the recording
of events in the trace. Each probe is implemented as a message sent to a collector
along with the information from the execution context. The collector receives the
message, creates a corresponding record event and adds it to the trace tree.
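The collector side of the execution step can be pictured with the small C++ sketch below. The class and field names are hypothetical stand-ins for the Smalltalk objects described in the text; the point is only that each activated probe reports its execution context and the collector appends a corresponding event to the trace.

#include <cstdio>
#include <string>
#include <vector>

// One recorded event: which instrumented method fired and in which call context.
struct TraceEvent {
    std::string method;   // name of the instrumented method or node
    int callDepth;        // nesting level, used later to rebuild the trace tree
    long timestamp;       // logical time of the probe activation
};

// The collector that every probe reports to during execution.
class TraceCollector {
public:
    void probe(const std::string& method, int callDepth) {
        events_.push_back({method, callDepth, ++clock_});  // append the event to the trace
    }
    const std::vector<TraceEvent>& trace() const { return events_; }

private:
    std::vector<TraceEvent> events_;
    long clock_ = 0;
};

int main() {
    TraceCollector collector;
    collector.probe("idctRow:index:", 1);   // probes fired by the instrumented code
    collector.probe("idctCol:index:", 1);
    for (const auto& e : collector.trace())
        std::printf("%ld depth=%d %s\n", e.timestamp, e.callDepth, e.method.c_str());
}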
It binds each event in the trace to the original source code in the form of a trace tree, as shown in Fig. 4. The right-hand side represents the original source code, while the left-hand side represents its trace tree representation. Each entity in the source code, such as the different variables and operators on the right-hand side, is linked to the corresponding trace tree entity. This is illustrated by the arrows in Fig. 4.
It means that one can go through all the application source code, starting from
the beginning and observe the corresponding arrangement in the trace tree for each
single element. It may help in comprehension of the source code.
The output of first part is a trace tree which represents the sequence of recorded
events in a tree-like form. A typical use of the trace tree is to hierarchically show the
structure of function calls during a particular execution of the source specification.
Once source specification is transformed into a trace tree representation, we perform
operations on the trace tree for different types of analysis.
Multiple analysis operations can be performed on basic entities of the trace tree. For
example:
• Checking the value of each variable in every step of the program execution
• In the context of code rewriting, one may perform operations for type-checking,
code optimization and flow analysis
• For early design space exploration, application orientation and extraction of spatial
parallelism are the key concerns.
We keep the basic entities of the trace tree independent from the analysis operations that apply to them. Related operations that we want to perform on the trace tree entities are packaged in a separate object named a "visitor" and passed to the elements of the trace tree. There are different visitors for multiple analysis operations. We have already explained the concept of the visitor design pattern in Sect. 3.
The proposed analysis framework is generic as it is not restricted to a particular
set of analysis operations. It allows the designer to extend the framework by defining
new analysis operations to fulfill different requirements of design space exploration.
For each analysis operation, there is a corresponding visitor for the trace tree, making
a visitor hierarchy similar to the visitors on a parse tree [34]. For example, in this
article, we have defined visitors for application orientation and spatial parallelism
information. However, the framework can easily be extended by simply defining new
visitors.
environment. It includes: (1) application orientation, (2) spatial parallelism, (3) guide-
lines for mapping and (4) guidelines for performance estimation.
Trace tree representation shows the existing parallelism among the operations of the
function. It implies the possibility of mapping different operations or functions to
different processing elements of the target architecture for concurrent execution. In
other words, we can exploit the inherited spatial parallelism present in the application.
We represent the amount of average inherited spatial parallelism for every function
in the source specification by “P” such that functions with higher “P” values are
considered as appropriate to architectures with large explicit parallelisms. Functions
with lower “P” value are rather sequential and acceleration can only be obtained by
exploiting temporal parallelism.
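This excerpt does not give a closed formula for P. One common way to quantify average parallelism in such a trace tree is to divide the total number of operations (work) by the length of the longest dependency chain (critical path); the C++ sketch below illustrates that idea as an assumption about the flavour of the metric, not as the authors' definition.

#include <algorithm>
#include <cstdio>
#include <memory>
#include <vector>

// A node of the trace tree: one recorded operation plus its children.
struct TraceNode {
    std::vector<std::unique_ptr<TraceNode>> children;
};

// Total number of operations in the subtree ("work").
static long work(const TraceNode& n) {
    long w = 1;
    for (const auto& c : n.children) w += work(*c);
    return w;
}

// Length of the longest dependency chain ("span" / critical path).
static long span(const TraceNode& n) {
    long deepest = 0;
    for (const auto& c : n.children) deepest = std::max(deepest, span(*c));
    return 1 + deepest;
}

int main() {
    // A small trace tree: one root operation with three independent child operations.
    TraceNode root;
    for (int i = 0; i < 3; ++i) root.children.push_back(std::make_unique<TraceNode>());

    double p = static_cast<double>(work(root)) / span(root);  // average parallelism
    std::printf("P = %.1f (P = 1 would indicate purely sequential code)\n", p);
}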
The mapping process requires an application model in the form of different functions as well as an architecture model in the form of processing elements and interconnections to map the application behavior onto the architecture model. The obtained analysis
results identify the most complex functions in terms of computations, which may
be the best candidates for mapping to the fastest processing elements (PEs). The
designers also prefer to map the functions which communicate heavily with each
other to the same PE or to the PEs connected by dedicated busses.
6 Experimental Results
The purpose of this section is to provide analysis results for MPEG-2 video decoding
application [20]. The basic principles of MPEG-2 decoding application are first
described in Sect. 6.1. A Smalltalk implementation of MPEG-2 decoder is used to
perform experiments in Sect. 6.2.
[Figure: MPEG-2 video decoding chain. The video bit-stream passes through Huffman decoding, inverse scan, inverse quantization, the 2D inverse discrete cosine transform and motion compensation (using a frame memory) to produce the reconstructed pictures.]
A frame memory is used to store the reference frames. The reference macroblock is added to the current macroblock to recover the original picture data.
This section performs experiments with different blocks of MPEG-2 video decoder
to get analysis results in terms of application orientation and spatial parallelism.
For simplicity, we present experimental results with Two Dimensional Inverse Dis-
crete Cosine Transform (2D-IDCT) and Huffman Decoding to illustrate our analysis
approach.
2D-IDCT for 8 × 8 image blocks is transformed into a trace tree using the flow
in Fig. 3. We perform analysis operations on the trace tree representation to
get analysis results. Table 2 shows the analysis results for different functions in
2D-IDCT. The first column represents the name of method. The second, third and
fourth columns represent the percentages of computation, memory and control in
each method respectively. The last column represents the amount of inherited spatial
parallelism.
From the structural point of view, 2D-IDCT is composed of two identical and
sequential one-dimensional IDCT (1D-IDCT) sub-blocks, operating on rows and
columns. The method “idctCol:index:” and method “idctRow:index:” in Table 2
represent 1D-IDCT operations on columns and rows respectively. The corresponding
trace trees have the same orientation values for both methods as shown in Table 2.
The method “add:on:offset:stride:” in Table 2 represent 2D-IDCT.
The first observation is that the percentage of control operations is zero, since 2D-IDCT is composed of deterministic loops and does not contain any test. Secondly, the computation percentages for the 2D-IDCT functional blocks are higher, so it is computation oriented. The results also show a good percentage of memory operations.
Figure 6 shows the percentage of each type of computation in 2D-IDCT. It does not contain any floating-point operations. This implies that processors with dedicated floating-point units are not necessary and processor selection should focus on integer performance. Furthermore, 27 % of the operations are multiplications, so the selected processors may benefit from dedicated hardware multipliers.
Table 3 shows the analysis results for the Huffman decoding methods. It can be noticed that these functions have relatively high percentages of control operations, denoting heavily conditioned data-flows. The percentage of computation operations also indicates an important computation frequency. There are fewer memory operations compared to the computation and control operations. This indicates that these methods are control and computation oriented.
Figure 7 shows the orientation of Huffman decoding. There are no multiplications, so the selected processors have no need for dedicated hardware multipliers. The results show that 45 % of the computations are logical operations.
We have not shown the value of P in Table 3 because the value of P remains 1 (the value of P for sequential code) at all hierarchical levels of the trace tree. This reveals that a suitable target architecture for the Huffman decoding algorithm may be a GPP. There is no need for a DSP or for a complex data path structure, since there is no spatial parallelism at any hierarchical level.
This section has presented the analysis results for 2D-IDCT and Huffman Decod-
ing in MPEG-2 decoder in terms of application orientation and spatial parallelism.
The next section will further illustrate the significance of spatial parallelism infor-
mation by representing the application in the form of parallel process networks.
The AVEL framework specifies three kinds of processes, which are composed hierarchically. The first type is the Primitive Process, which is the leaf of a process network hierarchy and implements an atomic behavior. The second type is the Node Process, which is the composition of other processes and behaviors; it allows a hierarchical description of a process network. The third type is the Alias Process, which is declared outside the main process and is re-used by other processes. We use the Alias Process to factorize complex behaviors in the code. The graphical representation of the Primitive Process and the Node Process syntax is shown in Figs. 8 and 9 respectively, and the graphical representation of the Alias Process syntax is shown in Fig. 10. The following program illustrates these declarations:
01. StrZ{}
02. [
03. Prim1{Prim2@1}[Prim1]
04. Prim2{}[Prim2]
05. ]
06. Example{}
07. [
08. Split{StrA@1 StrB@1} [splitter]
09. StrA{StrZ}{Join@1}
10. StrB{StrC@1}
11. [
12. PrimA {PrimB@1}[prima]
13. PrimB{}[primb]
14. ]
Fig. 11 Graphical representation of the "Example" program (takes an input stream and splits it into two processes)
15. StrC{StrB}{Join@2 }
16. Join{}[joiner]
17. ]
“Example” (line 6) is an AVEL process network that takes an input stream and
splits it into two other processes “StrA” and “StrB” (line 8). “StrZ” (line 01) is
a Node Process because its behavior contains other Primitive Processes “Prim1”
and “Prim2” (line 03 and 04 respectively). “StrB” (line 10) is also a Node Process
because its behavior contains other Primitive Processes “PrimA” and “PrimB” (line
12 and 13 respectively). “StrA” (line 09) and “StrC” (line 15) are Alias Processes
because their behaviors reuse predefined processes “StrZ” and “StrB” respectively.
"Split" (line 08 in the above program) and "Join" (line 16 in the above program) are two Primitive Processes: the former is responsible for distributing the input stream between its outputs, while the latter is responsible for merging an output stream from its inputs.
We described MPEG-2 video decoding in Sect. 6.1. However, it did not explain the parsing of the incoming video bit-stream into an object stream. As its name implies, the parser reads the incoming video bit-stream to extract different syntactic structures. The process of parsing the incoming video bit-stream consists of many layers of nested control flow, which makes the parser unsuitable for streaming computations. As AVEL is intended for streaming computations, the parsing of the MPEG-2 bit-stream into an object stream is implemented in a higher-level language like Smalltalk rather than in AVEL.
The transformation of video bit-stream into an object stream ensures that all syn-
tactic structures above macroblocks have been treated. The following AVEL program
shows slice (a collection of macroblocks) processing in MPEG-2 decoder. Figure 12
shows the graphical representation of the “ProcessSlice” AVEL Program.
01. ProcessSlice {} [
02. MBAddr{MBMode@1}[mbaddr]
03. MBMode {Split@1}[mbmode]
04. Split {IntraMB@1
05. FieldMB@1
06. FrameMB@1
07. Pattern@1
08. DMV@1
09. NoMotion@1}[splitter]
10. IntraMB {join@1}
11. [
12. VLD{InverseScan@1}[vld]
13. InverseScan{InverseQuant@1}[is]
14. InverseQ{IDCT@1}[iq]
15. IDCT {}[idct]
16. ]
17. FieldMB {} {Join@2}
18. FrameMB {} {Join@3}
19. Pattern {} {Join@4}
20. DMV {} {Join@5}
21. NoMotion{} {Join@6}
22. Join {} [joiner]
23. ]
The slice processing (line 01 in the above program) starts by calculating the mac-
roblock address increment (line 02 in the above program and shown as “MBAddr” in
Fig. 12). It indicates the difference between current macroblock address and previous
macroblock address. We have implemented it as a primitive process with behavior "mbaddr". The behavior of this process contains inherent parallelism, so we could implement it as a node process which contains other primitive processes; however, we have shown it as a primitive process in Fig. 12 for simplicity. After calculating the macroblock address increment, the macroblock mode (line 03 in the above program) is calculated, which indicates the method of coding and the contents of the macroblocks according to tables defined in the MPEG-2 standard [20]. Each type of macroblock is treated differently. We have implemented it as a primitive process with behavior "mbmode". However, we could implement it as a node process containing other primitive processes.
The output of "MBMode" is given to any of the six processes (lines 04–09 in the above program). All of these processes, "IntraMB", "FieldMB", "FrameMB", "Pattern", "DMV" and "NoMotion", are node processes and consist of other primitive processes. For simplicity, we have shown only "IntraMB" as a node process (line 10 in the above program) and all of the other processes are shown as primitive processes. "IntraMB" further consists of primitive processes. These processes are "VLD", "InverseScan", "InverseQ", and "IDCT" (lines 12–15 in the above program). Again, we have implemented all of these processes as primitive processes. We could implement them as node processes containing other primitive processes, depending upon the amount of spatial parallelism obtained from the analysis framework.
8 Conclusions
References
1. S.S. Bhattacharyya et al., Overview of the MPEG reconfigurable video coding framework. J.
Signal Process. Syst. 63(2), 251–263 (2011)
2. Y.Q. Shi, H. Sun, Image and video compression for multimedia engineering: fundamentals,
algorithms, and standards, 2nd edn. (Cambridge, Massachusetts, USA, 2008)
3. G. Ascia, V. Catania, A.G. Di Nuovo, M. Palesi, D. Patti, Efficient design space exploration for
application specific systems-on-a-chip. J. Syst. Arch. 53(10), 733–750 (2007). Special Issue
on Architectures, Modeling, and Simulation for Embedded Processors
4. I. Bacivarov, W. Haid, K. Huang, L. Thiele, Methods and tools for mapping process net-
works onto multi-processor systems-on-chip, in Handbook of Signal Processing Systems, Part
4 (2010), pp. 1007–1040
An Efficient Cycle Accurate Performance Estimation Model
Muhammad Rashid
1 Introduction
M. Rashid (B)
Umm Al-Qura University, Makkah, Saudi Arabia
e-mail: [email protected]
The native SoCLib simulation modules are not sufficient to implement the proposed performance estimation methodology. Consequently, we create one new module and modify one of the native modules in the SoCLib library.
The SoCLib simulator models all architecture features and the estimation is made with the binary executable obtained after compilation. Therefore, the first two requirements of performance estimation, consideration of architectural features and compiler optimizations, are satisfied. The third requirement, consideration of data-dependent behavior, is met by simulating each function with different input data. Accordingly, we compute the Worst Case Execution Time (WCET) and the Average Case Execution Time (ACET) for individual function blocks.
To summarize, the contributions of this article are as follows:
• A complete DSE framework is proposed. It consists of five stages. The in-depth
description of the second stage, a performance estimation methodology at cycle-
accurate level with reduced simulation time, is presented.
• In order to implement the proposed cycle-accurate performance estimation methodology, the SoCLib simulation library is extended.
The rest of this article is organized as follows: Sect. 2 describes the related work
in software performance estimation. Section 3 presents a DSE framework with five
stages. Section 4 presents the performance estimation stage of the proposed frame-
work. Simulations are performed and the performance results of individual function
blocks are stored in a performance estimation database. Section 5 presents SoCLib
library of simulation models. Experimental results with H.264 video encoding appli-
cation are provided in Sect. 6. Finally, we conclude the article in Sect. 7.
In this technique, performance information is annotated into the application code and
executed at the native environment. During the native execution, the previous anno-
tated information is used to calculate the application performance. One example of
annotation-based performance estimation is [9]. The C source code of the application
is first lowered to an executable intermediate representation (IR). A set of machine
independent optimizations, such as constant propagation, constant folding and dead
code elimination are performed to remove redundant operations. The optimized IR
is then analyzed to estimate the operation cost of each basic block. These costs are
then annotated back to the IR, which in turn are natively compiled and executed
to estimate the performance of the application. The performance estimation from
a cycle accurate virtual prototype is exported to a concurrent functional simulator
in [11]. Target binaries are simulated on cycle-accurate simulators to obtain timing
information. This timing information is annotated back to the original C code at
source line level. Finally, the SystemC simulation is performed on annotated code
for fast performance estimation.
In hybrid simulation, part of the application code executes on the native host machine, whereas the rest runs on an ISS. The natively executed parts are the most frequently executed portions of the code. Since native execution is much faster than an ISS, a significant simulation speedup is achieved.
The HySim framework [8] combines native execution with an ISS, similar to [13]. However, it generates C code containing performance information from the original C source code, similar to the annotation-based techniques [11]. It analyzes the source code of the application and annotates operation costs and memory accesses. These annotations are evaluated at runtime to generate performance information in terms of processor cycles and memory latencies.
Due to system complexity, ISS acceleration techniques such as dynamically compiled simulation, statically compiled simulation or JIT-compiled simulation were not enough to achieve the desired simulation speed. Consequently, partial simulation techniques, such as sampling and tracing, were proposed. The major drawback of the sampling-based technique is that a large amount of pre-processing is needed for discovering the phases of the target application. The performance estimation methodology proposed in this article does not require identifying regions of a program that are selectively simulated in detail while fast-forwarding over other portions. Therefore, no pre-processing is required.
The problem with trace-driven simulation is that the generated traces might become excessively large. Another issue is that trace-driven simulation relies on post-processing and cannot provide performance information at runtime. Annotation-based simulation techniques [9–11] provide a simulation speedup as compared to pure ISS but still suffer from some restrictions. For example, the approach in [9] is applicable to RISC-like processors and does not support super-scalar or VLIW architectures. The technique in [11] does not fully parse the C code. Similarly, developing a binary-to-C translator requires considerable effort in [10]. The approach proposed in this article does not estimate the performance of the entire application. Instead, we use simulation to estimate the performance of function blocks before the design space exploration loop.
Although simulation speedup is achieved in hybrid simulation techniques [8, 13],
the major concern is the selection of application functions for native execution. For
example, the limitation of [13] is that a training phase is required to build a procedure
level performance model for the natively executed code. Similarly in [8], functions
for native execution must contain no target dependent optimization. Our technique
does not impose any restriction on the application code as the complete application
is executed on the ISS of the target architecture.
Section 2 provided the state of the art in reducing simulation time for fast DSE. We highlighted some major limitations of the existing approaches. The subsequent parts of this article will present a cycle-accurate performance estimation technique in a DSE framework. First, this section presents the proposed DSE framework. Then, Sect. 4 will describe the performance estimation technique.
The proposed framework is shown in Fig. 2. It contains five stages: applica-
tion specification, cycle-accurate performance estimation, computation architecture
selection, code partitioning and communication architecture selection.
Application Transformation:
The application description is given in the form of reference C code. The proposed DSE framework starts by transforming the reference sequential code into a composition of functional blocks. An application transformation methodology is presented in [16]. The application behavior may consist of hardware blocks written in RTL specifications (hardware IPs) or software blocks written as C functions. Hardware IPs are provided with their performance values. However, the performance of a software IP cannot be determined a priori. We estimate the performance of the software function blocks by performing simulations at cycle-accurate level in the second step of the proposed DSE framework, which is the main concern of this article.
Performance Estimation:
The proposed DSE framework consists of two inner design loops: a computation architecture selection loop and a communication architecture selection loop. Before entering the computation architecture selection loop, it is necessary to estimate the performance of the software function blocks.
Section 3 presented the DSE framework with five stages. This section presents the second step of the proposed DSE framework: a performance estimation methodology at the functional level of the embedded software.
Basic Principle:
The application behavior specified in function blocks is instrumented and cross-compiled for the ISS of the target processor. We assume that all architectural features of the processor are accurately modeled in the simulator. The compiled code is simulated on the simulation platform at cycle-accurate level to obtain the run-time profile of each function block, which includes the number of execution cycles and the memory access counts. From the run-time profile, we determine the representative performance values and store them in a performance estimation database. Once we have the performance estimation values for the individual blocks, the performance of the entire application is computed as a linear combination of the function block performance values.
Cyclic Dependency Problem:
The software performance estimation depends upon two things: compiler options and architecture features. Depending upon the compilation options, the performance variation can be as large as 100 %. Even if function block performances were already recorded in the performance estimation database, we have to examine which compiler options were used before using the performance values. The most important architecture feature is the memory system. If a cache system is used, the cache miss rate and the miss penalty both affect the software performance.
As a result, there is a cyclic dependency between the performance estimation and the DSE. The system architecture is determined after the DSE, but the performance estimation is required before the DSE. However, accurate performance estimation is only possible after the system architecture is determined. For example, the memory access time depends on the communication architecture and the memory system. This cyclic dependency is shown in Fig. 3.
Solution to Cyclic Dependency Problem:
This problem is solved by specifying the performance value of a software block on a processor not with a single number but with a pair: (CPU time, memory access count). The CPU time is obtained from simulation assuming that the memory access cycle is 1 (perfect memory hypothesis). We record the memory access count separately as the second element of the performance pair. Then, the block performance on a specific architecture is the sum of the CPU time and the memory access count multiplied by the memory access cycles. The memory access latency is defined by the architecture and its value can be updated after performing the fifth step, the communication architecture selection loop, of the proposed DSE framework shown in Fig. 2. Such separation of the memory access counts breaks the cyclic dependency between the performance estimation and the design space exploration.
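A minimal C++ illustration of this decoupling is given below: each database entry keeps the pair measured under the perfect-memory hypothesis, and the architecture-dependent cycle count is reassembled only once the memory access latency is known. The type and variable names are illustrative assumptions, not the chapter's implementation.

#include <cstdio>

// Performance pair stored in the estimation database for one function block,
// measured with a memory access cycle of 1 (perfect memory hypothesis).
struct BlockPerf {
    long cpuCycles;      // CPU time from cycle-accurate simulation
    long memAccesses;    // number of external memory accesses
};

// Block performance on a concrete architecture: CPU time plus the memory
// accesses weighted by the memory access latency of that architecture.
long cyclesOnArchitecture(const BlockPerf& b, long memAccessCycles) {
    return b.cpuCycles + b.memAccesses * memAccessCycles;
}

int main() {
    BlockPerf block{25, 5};   // illustrative values only
    std::printf("latency 1: %ld cycles\n", cyclesOnArchitecture(block, 1));
    std::printf("latency 4: %ld cycles\n", cyclesOnArchitecture(block, 4));
}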
Data-Dependent Performance Estimation:
In addition to compiler options and architecture features, the performance value of a software function block depends on the input data. It is observed that the worst case takes much longer than the average-case behavior. Therefore, for each function block, simulation is performed more than once with different input data. As a result, the WCET and ACET are computed for each function block.
Definition 1 Worst Case Execution Time (WCET)—the maximum execution time
of a function during multiple simulations with different input data. It should not be
confused with the conventional meanings of the WCET used in static analysis. There
is no guarantee of the worst case performance because we use the inputs that are not
exhaustive in any sense. It just says that the performance is no worse than this value
with high probability.
Definition 2 Average Case Execution Time (ACET)—the average execution time of a function block during multiple simulations with different input data. Performance results with the WCET can be too pessimistic; the ACET may reveal more realistic performance results for average-case optimization.
However, it might be a problem to use the ACET as the performance measure for real-time applications due to non-uniform execution time.
Definition 3 Non-uniform Execution Time—the phenomenon of executing different
frames in a video sequence with different execution time. The in-depth discussion
of non-uniform execution time will be presented in Sect. 6.4.
Performance Estimation with Real Test Data:
For estimating the performance of each block, a common method is to build a
test bench program where a test vector generator provides input argument values
to the function. This method has two serious drawbacks. First, it is very laborious
to build a separate test bench program and analysis environment for each function
block. Second, good test vectors are not easy to define.
The proposed approach overcomes these drawbacks by running the entire application. Since the entire application is already given at the specification stage, no additional effort to build a separate simulation environment is needed. Moreover, the test vectors supplied to the function blocks are all real, which is better than synthetic test vectors. Since we use real test vectors for a function block, the average-case performance value is meaningful when computing the average performance of the entire application by summing up the performance values of the function blocks.
In order to measure the performance, we obtain the number of processor clock cycles for a particular task execution. For more precise performance estimation, we can also measure information related to the caches associated with a processor (the number of cache misses and hits).
• “T(k, i)” be the CPU time taken by function “Fk” mapped on component “Pi”.
• “M(k)” and “I(k)” be the number of memory accesses and the number of invoca-
tions for function “Fk” respectively.
• “C(k, l)” be the communication requirement of function “Fk” to the next function
block “Fl”.
• “N(m)” be the memory access overhead of the selected candidate architecture.
• “N(c)” be the channel communication overhead of the selected architecture.
Table 2 Example of performance estimation with Eq. 1

Functions  PEs  T(k, i)  I(k)  M(k)  N(m)  C(k, l)  N(c)
F1         P1   30       5     5     5     5        2
F2         P1   35       2     3     5     0        2
F3         P2   25       5     5     4     5        3
F4         P2   40       4     10    4     0        3
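Equation 1 itself is not reproduced in this excerpt. As a hedged reconstruction from the variable definitions above and the earlier description (block performance equals CPU time plus memory accesses weighted by the memory latency, accumulated over invocations, plus a communication term), the total could be assembled roughly as in the C++ sketch below; this is an assumption about the structure, not the published formula.

#include <cstdio>
#include <vector>

// One mapped function block, using the notation of the variable list above.
struct MappedBlock {
    long T;   // T(k,i): CPU time of Fk on the selected PE (perfect memory)
    long I;   // I(k): number of invocations of Fk
    long M;   // M(k): number of memory accesses of Fk
    long Nm;  // N(m): memory access overhead of the candidate architecture
    long C;   // C(k,l): communication requirement towards the next block
    long Nc;  // N(c): channel communication overhead of the architecture
};

// Assumed structure: per-invocation cost (CPU time plus weighted memory
// accesses) times the invocation count, plus the weighted communication volume.
long estimatedCycles(const std::vector<MappedBlock>& blocks) {
    long total = 0;
    for (const auto& b : blocks)
        total += b.I * (b.T + b.M * b.Nm) + b.C * b.Nc;
    return total;
}

int main() {
    // The four rows of Table 2 (F1, F2 on P1; F3, F4 on P2).
    std::vector<MappedBlock> table2 = {
        {30, 5, 5, 5, 5, 2}, {35, 2, 3, 5, 0, 2},
        {25, 5, 5, 4, 5, 3}, {40, 4, 10, 4, 0, 3},
    };
    std::printf("estimated total: %ld cycles\n", estimatedCycles(table2));
}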
[Figure: SoCLib simulation platforms. Each platform connects a processor with instruction and data caches, an instruction RAM, a data RAM and a TTY through a VGMN interconnect; in the modified platform the instruction RAM and data RAM are replaced by modified versions.]
A mapping table allows the designer to describe the memory mapping and address decoding scheme of any shared memory architecture built with the SoCLib hardware components. All the required SoCLib components are added to the mapping table. We then create all the required components and associated signals, and connect the components and signals. Finally, we define "sc_main" in the top module and run the simulation.
The information required for the proposed performance estimation framework cannot be extracted through the tools already available in the SoCLib library. Therefore, modifications of various modules of the SoCLib platform, or the creation of new modules, are required. In order to implement the proposed performance estimation methodology, we create a new module, "VCI_Profile_Helper", and modify the existing module for the instruction and data RAM. This allows extracting the required performance information, which is not possible through the tools already available in SoCLib [21]. In an open-source framework like SoCLib, it is easy to make these changes. However, it may require some development time to modify the existing modules or create new ones.
The purpose of the new module "VCI_Profile_Helper" is to count the number of execution cycles for a function block on a processing element during simulation. The purpose of the modification in the instruction and data RAMs is to generate a memory access profile for extraction of information such as the time at which the request is made, the address of the transaction and the type of transfer. This information is stored in a simple text file during simulation. We modify the simulation platform such that the new platform instantiates the component "VCI_Profile_Helper" and the modified RAMs, as shown in Fig. 5.
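The kind of record produced by the modified RAM models can be sketched as follows. The field set (simulation time, transaction address, transfer type) follows the description in the text, while the class and file names are illustrative and not the actual SoCLib source.

#include <cstdio>

// One line of the memory access profile written during simulation.
enum class Transfer { Read, Write };

struct MemAccessRecord {
    unsigned long cycle;    // simulation time at which the request is made
    unsigned long address;  // address of the transaction
    Transfer type;          // type of transfer
};

// Append a record to the plain text profile file, one access per line.
void logAccess(std::FILE* profile, const MemAccessRecord& r) {
    std::fprintf(profile, "%lu 0x%08lx %s\n", r.cycle, r.address,
                 r.type == Transfer::Read ? "R" : "W");
}

int main() {
    std::FILE* profile = std::fopen("mem_profile.txt", "w");
    if (!profile) return 1;
    logAccess(profile, {120, 0x10000000UL, Transfer::Read});   // illustrative accesses
    logAccess(profile, {121, 0x10000004UL, Transfer::Write});
    std::fclose(profile);
}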
Figure 6 represents the simulation flow that will be used in our experiments (Sect. 6). The application is compiled with the GNU GCC tool suite. The SoCLib simulation platform models all architecture features; it is compiled with GCC and yields a binary executable named "simulation.x". The results of the cycle-accurate simulation of each function with different input data are stored in the performance estimation database. The platform simulator models all architecture features, the binary executable is obtained after compilation, and, in addition, simulation is performed with different types of data sets. Therefore, the simulation flow incorporates the three basic requirements of performance estimation.
This section described a simulation platform to implement the proposed perfor-
mance estimation technique. The next section will use this simulation platform to
perform experiments with H.264 video encoding application.
6 Experimental Results
This section presents experimental results for the performance estimation technique (Sect. 4) by using the SoCLib simulation platform (Sect. 5) with the X264 application, which is an open-source implementation of H.264.
The target simulation platform and simulation flow are shown in Figs. 5 and 6
respectively. In the simulation platform of Fig. 5, we have used cycle-accurate simula-
tion models of different processors from SoCLib library. It includes ARM, PowerPC
and MIPS with associated instruction and data cache. We encode 745 frames of
QCIF format moving picture. The frame sequence consists of one I-type frame and
the subsequent 734 P-type frames. We estimate the performance of each function
block on PowerPC405, ARM7TDMI and ARM966 processors.
The performance estimation results of X264 video encoder for PowerPC405 with
32 KB of instruction and data cache are summarized in Table 3. The first column
of Table 3 lists all function blocks in the application. For each function block, sim-
ulations are performed with different data sets to obtain WCET (Definition 1) and
ACET (Definition 2) listed in the second and the third columns respectively.
The diversity "D" is shown in the fifth column and is obtained by dividing the WCET by the ACET. The total execution time (TET) for each function block is shown in the fourth column and is obtained by multiplying the execution time (WCET or ACET, depending upon the value of "D" for the function block) by the total number of calls. We observe that the value of "D" is comparatively large for the mc_chroma, get_ref and IDCT blocks. Therefore, using the WCET for these function blocks is not an adequate measure of estimated performance for cost-sensitive designs.
If the cache miss penalty is zero, which implies perfect cache hypothesis or no
external memory access, the processor times become the performance of function
the L2 cache, if frame size is larger such that the required data cannot be obtained
from the L2 cache. Therefore, the cache miss penalty of the L1 cache highly depends
on the hit rate of L2 cache and the hit rate of the L2 cache is related to frame size.
Since all function blocks are executed in a single processor, there is no commu-
nication overhead included in this experiment. Figure 7 also shows the experimental
results with other image samples. Here, we estimate the block performance separately
for each image sample since the performance values are quite different depending
on the scene characteristics.
Section 6.2 presented the experimental results on PowerPC405. This section shows
the performance estimation of H.264 video encoder on different processors as
shown in Table 4. As candidate processing elements, we have used PowerPC405,
ARM7TDMI and ARM966 with an L1 cache only. The total number of exe-
cution cycles for one PowerPC405 processor, two PowerPC405 processors, one
ARM7TDMI processor, two ARM7TDMI processors, one ARM966 processor and
two ARM966 processors are 3.7 × 10^11, 2.9 × 10^11, 6.5 × 10^11, 4.6 × 10^11, 5.2 × 10^11 and 3.6 × 10^11 respectively.
Fig. 8 The consecutive frames, average and worst case execution cycles
The simulation time of the entire X264 video encoding application on the PowerPC405 processor is 16 h and 30 min. This simulation time is recorded for a QCIF video of 745 frames with one I-type frame and the subsequent 734 P-type frames, with simulation performed for all function blocks in the application. To reduce the simulation time, performance values of function blocks stored in the performance estimation database are reused, as shown in Table 5.
The decrease in simulation time during each iteration is shown in the third column of Table 5. A 25 % decrease is obtained by reusing the "SATD" performance values from the performance estimation database; similarly, a 30 % decrease is obtained by reusing the "get_ref" values, and so on.
Whether or not to re-estimate the performance of a function block for a new application is a trade-off. The performance of a function block may depend on the application it is used in and on the ranges of its input values. Re-estimating the function block performance with a new application gives more accurate information for the next design space exploration step. However, it incurs a time overhead for each candidate processor. If the number of candidate processors is large, this overhead may be too high to be tolerated within a tight design-time budget.
This section has presented the experimental results of the proposed performance estimation technique with the X264 application. The PowerPC405, ARM7TDMI and ARM966 models from the SoCLib library were used to perform the simulations. The experimental results included the cache miss penalty as well as the non-uniform execution time. Finally, the decrease in simulation time was illustrated in Table 5.
7 Conclusions
This article presented a DSE framework consisting of five stages, with the emphasis on software performance estimation at the cycle-accurate level. The proposed performance estimation methodology stored the performance estimation results of each function block, obtained on a simulation platform, in a performance estimation database. The database values were used for architecture component selection.
After the component selection and mapping decisions were made, the performance of the entire application was computed as a linear combination of the performance values of the individual function blocks. The proposed technique considered the effects of architecture features, compiler optimizations and the data-dependent behavior of the application. We have extended the SoCLib library to build a simulation platform for the experiments.
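In other words, the estimate is the per-call time stored in the performance estimation database multiplied by the number of calls and summed over the function blocks; the symbols below are ours, not the chapter's:

T_app ≈ Σ_b N_calls(b) · t_DB(b)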
Experimentation with the H.264 encoder has shown that the proposed performance estimation technique satisfies the requirements of accuracy and adaptability at the same time. A simple linear combination of performance numbers has given an accurate (within 1 %) performance estimate of the entire application.
1 Introduction
With recent advances in VLSI technologies, modern chips can embed a large number of processing cores on a multi-core chip. Such multi-core chips require an efficient communication architecture to provide a high-performance connection between the cores. The Network-on-Chip (NoC) has recently been proposed as a scalable communication
architecture for multi-core chips [1]. In the NoC paradigm, every core communicates with the other cores through on-chip channels and on-chip routers. The on-chip channels form a predefined structure called the topology. The NoC is a communication-centric interconnection approach that provides a scalable infrastructure to interconnect different IPs and sub-systems in a SoC [2]. The NoC can make SoCs more structured and reusable and can also improve their performance. Since the communication between the various processing cores is a deciding factor for the performance of such systems, this communication must be made faster as well as more reliable.
The network topology also has a direct impact on important NoC parameters, e.g., the network diameter, the bisection width and the routing algorithm [3]. The topology has a great impact on system performance and reliability. It generally influences the network diameter (the length of the maximum shortest path between any two nodes), the layout and the wiring [4]. These characteristics largely determine the power consumption and the average packet latency [5]. Before delving deeper into the widely used topologies, the main characteristics of a network topology, which are described in Table 1, should be understood first.
Several topologies have been proposed in the literature, such as the mesh topology [6], the hypercube topology [7], the tree topology [8] and the de Bruijn topology. Each of these topologies has its pros and cons; for example, the mesh topology is used in the fabrication of several NoCs because of its simple VLSI implementation, while other topologies are favored by NoC designers for their exclusive features. The de Bruijn topology is one of those that provide a very low diameter in comparison with the mesh topology, however it imposes a cost equal to a linear
Since the proposed network topology is based on de Bruijn graphs, this section briefly introduces the de Bruijn topology [17, 18]. An n-dimensional de Bruijn topology is a directed graph with k^n nodes. In this topology, node u = (u_n, ..., u_1) is connected to node v = (v_n, ..., v_1) if and only if u_i = v_{i+1} for 1 ≤ i ≤ n − 1. In other words, node v has a directed link to node u if and only if

u = v × k + r (mod k^n), 0 ≤ r ≤ k − 1   (1)
Fig. 1 The de Bruijn network with (a) 8 nodes and (b) 16 nodes
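As a direct illustration of Eq. (1), the following C sketch enumerates the k successors of a node; the function and variable names are ours, chosen only for this example.

#include <stdio.h>

/* Successors of node v in a de Bruijn network with n_nodes = k^n nodes.
   By Eq. (1), v has a directed link to u iff u = (v*k + r) mod k^n, 0 <= r <= k-1. */
static void debruijn_successors(unsigned v, unsigned k, unsigned n_nodes,
                                unsigned *succ /* array of k entries */)
{
    for (unsigned r = 0; r < k; ++r)
        succ[r] = (v * k + r) % n_nodes;   /* shift left by one digit, append r */
}

int main(void)
{
    unsigned succ[2];
    /* the 8-node binary de Bruijn network of Fig. 1a (k = 2, n = 3) */
    for (unsigned v = 0; v < 8; ++v) {
        debruijn_successors(v, 2, 8, succ);
        printf("node %u -> %u, %u\n", v, succ[0], succ[1]);
    }
    return 0;
}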
Using these two operations, the following routing algorithm can deliver any packet in a de Bruijn network. For the routing algorithm, Ganesan [11] splits the de Bruijn network into two trees, T1 and T2 (see Fig. 2), so that routing is performed in at most four steps: the message is first routed from T1 to T2 if necessary, then within T2, then from T2 back to T1, and finally within T1. A two-dimensional de Bruijn topology is a two-dimensional mesh topology in which the nodes of each dimension form a de Bruijn network. An 8 × 8 two-dimensional de Bruijn network is shown in Fig. 3.
The proposed routing algorithm for the two-dimensional de Bruijn network exploits the two trees T1 and T2 in each dimension of the network. Like XY routing in mesh networks, the deterministic routing first applies the routing mechanism along the rows to deliver the packet to the column in which the destination is located. Afterwards, the message is routed to the destination by applying the same routing algorithm along the columns.
Fig. 3 A two-dimensional de Bruijn with 64 nodes composed from eight 8-node de Bruijn networks
(as shown in Fig. 1a) along each dimension
Deterministic routing is based on the minimum hop count; unlike in partially adaptive and fully adaptive routing, the route is fixed, and the distance between the source and destination nodes is denoted by dS−D. In deterministic routing, if a specific node k is part of the route, the number of hops between source S and destination D equals the distance between the source node and node k plus the distance between node k and the destination node, and vice versa:

dS−D = dS−k + dk−D   (2)

If the specific node k is the next node (N), then the above condition becomes:
Condition 0:
dS−D = dS−N + dN−D   (3)
In this chapter, all nodes that are necessary for routing from source S to destination D are denoted by P(S, D). The main route is the route along which the message moves from the source node to the marked destination node.
In the proposed multicast routing (which is based on the minimum distance between the source and destination nodes), one destination is selected randomly (D0) and marked. The message is routed as unicast to deliver it to the marked destination (D0). At each hop, N (the next node of the current message) and DP (one of the destination nodes of the current message other than the marked destination) are placed in Condition 0. If Condition 0 holds for a specific destination DP, the next node of the main route belongs to P(S, DP); the message is therefore not duplicated and routing continues with a single message. Otherwise, the next node of the main route differs from the next node of P(S, DP), and the message must be duplicated to route towards DP. A necessary condition for duplicating the message is therefore the following:
Condition 1:
dS−DP ≠ dS−N + dN−DP   (4)
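Condition (4) can be checked mechanically at each hop. The C sketch below does so for the 8 × 8 network of Fig. 3; the per-dimension distances are computed here by breadth-first search over the directed links of Eq. (1), and the assumption that the tree-based routing of [11] is hop-minimal per dimension is ours, not the chapter's.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define K     2
#define NODES 8   /* 8-node binary de Bruijn network along each dimension (Fig. 1a) */

/* Shortest directed distance in the 8-node de Bruijn graph, by BFS over Eq. (1). */
static int debruijn_dist(int from, int to)
{
    int dist[NODES], queue[NODES], head = 0, tail = 0;
    memset(dist, -1, sizeof dist);
    dist[from] = 0;
    queue[tail++] = from;
    while (head < tail) {
        int v = queue[head++];
        if (v == to) return dist[v];
        for (int r = 0; r < K; ++r) {
            int u = (v * K + r) % NODES;
            if (dist[u] < 0) { dist[u] = dist[v] + 1; queue[tail++] = u; }
        }
    }
    return -1;   /* never reached: the de Bruijn digraph is strongly connected */
}

struct node { int row, col; };

/* Hop count under the dimension-order (rows first, then columns) routing:
   the sum of the per-dimension de Bruijn distances. */
static int dist2d(struct node a, struct node b)
{
    return debruijn_dist(a.col, b.col) + debruijn_dist(a.row, b.row);
}

/* Condition (4): duplicate the message for destination dp when the next node N of
   the main route is not on a minimal path from the source S to dp. */
static bool must_duplicate(struct node s, struct node next, struct node dp)
{
    return dist2d(s, dp) != dist2d(s, next) + dist2d(next, dp);
}

int main(void)
{
    /* First hop of the example of Figs. 4 and 5: S = (3,0), next node N = (3,1). */
    struct node s = {3, 0}, n = {3, 1}, d1 = {7, 5}, d2 = {1, 1};
    printf("duplicate for (7,5): %d\n", must_duplicate(s, n, d1));
    printf("duplicate for (1,1): %d\n", must_duplicate(s, n, d2));
    return 0;
}

Under these assumptions, the first-hop check yields no duplication for either remaining destination, which agrees with the outcome reported for Fig. 5.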
As an example, suppose that node (3, 0) has a message for nodes (5, 4), (7, 5) and (1, 1), as shown in Fig. 4.
For multicasting, one destination is selected randomly (D0: (5, 4)) and the message is routed as unicast, R((3, 0), (5, 4)). At each hop, the two conditions above are checked for the remaining destinations ((7, 5) and (1, 1)), and if both conditions are true the message is duplicated.
In the first step (Fig. 5), condition (1) is false for both destinations (7, 5) and (1, 1) but the second condition is true for them. Therefore, the next node (3, 1) of the first path is shared with the two other paths, and there is still only one message in the network.
S = (3, 0), C = (3, 0), N = (3, 1), D = {(7, 5), (1, 1)}
In the second step (Fig. 6), conditions (1) and (2) are true for destination (1, 1). Thus, the message is duplicated to route towards (1, 1), and in this new message the destination node (1, 1) is marked. For destination (7, 5), however, condition (1) is false, so the message is not duplicated. Now there are two messages in the network.
Fig. 5 First hop checking. (a) dSDi ≠ dSN + dNDi (condition 1), (b) dSDi = dSC + dCDi (condition 2), (c) summary table for the destinations d1 and d2
S = (3, 0), C = (3, 1), N = (3, 2), D = {(7, 5), (1, 1)}
In the third step (Fig. 7), conditions (1) and (2) are true for destination (7, 5); the message is duplicated to route towards it, and (7, 5) is marked in this new message. Condition (2) is false for destination (1, 1), however, so the message is not copied. Now there are three messages in the network.
After this step (when the number of messages equals the number of destinations), condition (2) is false for all destinations at every subsequent step and no further message is duplicated; routing then proceeds as unicast. The pseudo-code of the proposed algorithm is shown in Fig. 8.
Fig. 6 Second hop checking. (a) dSDi ≠ dSN + dNDi (condition 1), (b) dSDi = dSC + dCDi (condition 2), (c) summary table for the destinations d1 and d2
3 Simulation Results
Fig. 7 Third hop checking. (a) dSDi ≠ dSN + dNDi (condition 1), (b) dSDi = dSC + dCDi (condition 2), (c) summary table for the destinations d1 and d2
The following figures show the average message latency and the power consumption. The x-axis of these figures indicates the message generation rate, and the y-axis indicates latency or power. Figure 9 compares the average message latency for different traffic patterns with message lengths of 32 and 64 flits. As can be seen, the multicast routing has a smaller average message latency than the unicast routing algorithm over the full range of network load under the various traffic patterns (especially under uniform traffic). For the hotspot traffic load a hotspot rate of 16 % is assumed, i.e. each node sends 16 % of its messages to the hotspot node (node (7, 7)) and the remaining messages to the other nodes uniformly. As the figures show, the multicast routing copes better with the non-uniformity of the network traffic, and its performance improvement over unicast is even larger under the hotspot traffic pattern.
Figure 10 shows the power consumption of the multicast routing and the unicast routing under the various traffic patterns. The simulation results indicate that the power consumption of the multicast routing is less than the power dissipated by the unicast routing for light to medium traffic loads. The behavior differs near the heavy-traffic region, where the unicast routing saturates and cannot handle more traffic. Obviously, handling more traffic load (beyond the point at which the unicast routing saturates) requires more power from the multicast routing.
Fig. 9 Average message latency (cycles) versus message generation rate (λ) for multicast and unicast routing under uniform (U) and hotspot (hot) traffic with message lengths of 32 and 64 flits
Fig. 10 Power consumption (nJ/cycle) for multicast and unicast routing under uniform (U) and hotspot (hot) traffic with message lengths of 32 and 64 flits
Note that when the unicast routing approaches its saturation region, the multicast routing can still handle the traffic effectively, and its saturation point is higher than that of the unicast routing.
In Fig. 11a, the average message latency is plotted as a function of the message generation rate at each node for the multicast routing and the unicast routing with message lengths of 32 and 64 flits. As can be seen in Fig. 11a, the multicast routing has a smaller average message latency than the unicast routing. Figure 11b compares the total network power for the two message lengths. The results obtained with Xmulator indicate that the multicast routing saturates later and can deliver more packets; it therefore consumes more power.
Fig. 11 (a) Average message latency (cycles) and (b) total network power (nJ/cycle) versus message generation rate for multicast and unicast routing with message lengths of 32 and 64 flits
4 Conclusion
References
1 Introduction
Nowadays, the new generations of distributed embedded control systems are more
and more sophisticated since they require new forms of properties such as reconfig-
urability, reusability, agility, adaptability and fault-tolerance. The first three proper-
ties are offered by new advanced component-based technologies, whereas the last two
properties are ensured by new technical solutions such as multi-agent architectures.
New generations of component-based technologies have recently gained popularity in industrial software engineering since they make it possible to reuse already developed and deployed software components from rich libraries. A Control Component is a software unit owning data of the functional scheme of the system. This reuse reduces the time to market and helps minimize the design complexity by supporting the system's software modularity. This chapter deals with run-time
automatic reconfigurations of component-based applications by using multi-agent
solutions. An agent is assumed to be a software unit allowing the control of the system as well as of its environment before applying automatic reconfigurations. The reasons for which reconfigurations are applied are classified into two categories [33]: (1) corrective reasons: if a component is misbehaving, it is automatically substituted by a new one which is assumed to run correctly; the new component is supposed to have the same functionalities as the old one. (2) Adaptive reasons: even if the component-based application is running well, dynamic adaptations may be needed as a response to new evolutions of the environment, in order to add new functionalities or to improve some required functional properties.
Dynamic reconfigurations can cover the following issues: (1) architecture level which
means the set of components to be loaded in memory to constitute the implemented
solution of the assumed system; (2) control level which means the compositions
of components; (3) implementation level which means the behavior of compo-
nents encoded by algorithms; and (4) data level which means the global values. We
define a multi-agent architecture for reconfigurable embedded control systems in which a Reconfiguration Agent is assigned to each device of the execution environment to apply automatic reconfigurations of local components, and a Coordination Agent is used for the coordination between the distributed Reconfiguration Agents in order to allow coherent distributed reconfigurations. The Coordination Agent is based on
a coordination protocol using coordination matrices which define coherent simulta-
neous reconfigurations of distributed devices. We propose useful meta-models for
Control Components and also for intelligent agents. These meta-models are used
to implement adaptive embedded control systems. As we choose to apply dynamic
scenarios, the system should run even during automatic reconfigurations, while pre-
serving correct executions of functional tasks.
Given that Control Components are in general defined to run sequentially, this feature is inconvenient for real-time applications, which typically have to handle several inputs and outputs under tight time constraints. To meet performance and timing requirements, a real-time application must be designed for concurrency. To do so, we define at the operational level some sequential program units called real-time tasks.
Thus, we define a real-time task as a set of Control Components having real-time constraints. We characterize a task by a set of properties independent of any Real-Time Operating System (RTOS). We define service processes as software processes of tasks that provide the system's functionalities, and reconfiguration processes as tasks that apply reconfiguration scenarios at run-time. In fact, service processes are functional tasks of the components to be reconfigured by the reconfiguration processes. To guarantee a correct and safe behavior of the system, we use semaphores to ensure the synchronization between processes. We apply the classical reader-writer synchronization algorithm, such that executing a service is considered a reader and reconfiguring a component is considered a writer process. The proposed algorithm ensures that many service processes can be executed simultaneously, whereas reconfiguration processes must have exclusive access. We study in particular the scheduling of tasks by a Real-Time Operating System. We apply the priority ceiling protocol proposed by Sha et al. [49] to avoid the problem of priority inversion as well as deadlocks between the different tasks. The priority ceiling protocol assigns each semaphore a priority ceiling equal to the priority of the highest-priority task using this semaphore. A task is only allowed to enter its critical section if its assigned priority is higher than the priority ceilings of all semaphores currently locked by other tasks.
In this chapter, we continue our research by proposing an original implementation of this agent-based architecture. We assume that the agent controls the plant to keep the system running physically. The design and the implementation of such an agent under real-time constraints are the scope of this study. The main contributions of this chapter are the following: (1) a complete study of Safety Reconfigurable Embedded Control Systems from the functional level (i.e. a dynamically reconfigurable system with a multi-agent architecture) to the operational level (i.e. decomposition of the system into a set of tasks with time constraints); (2) a general definition of a real-time task with its necessary parameters, independent of any real-time operating system; (3) the scheduling of these real-time tasks, considered as periodic tasks with precedence and mutual exclusion constraints. To the best of our knowledge, no other research work deals with all of these points together.
Section 2 presents the state of the art on dynamic reconfiguration. Section 3 presents the benchmark production systems FESTO and EnAS that we follow as running examples in the chapter. Section 4 defines a multi-agent architecture and the communication protocol that ensure safety in distributed embedded control systems. Section 5 presents the real-time task model and studies the safety of its dynamic reconfiguration as well as the scheduling of the different tasks. We finally conclude the chapter in Sect. 6.
2 Dynamic Reconfiguration
The new generation of industrial control systems is addressing today new crite-
ria as flexibility and agility [43, 48]. We distinguish two reconfiguration policies:
static and dynamic policies such that static reconfigurations are applied off-line to
apply changes before any system cold start [3], whereas dynamic reconfigurations are
dynamically applied at run-time. Two cases exist in the latter policy: manual reconfigurations applied by users [47] and automatic reconfigurations applied by intelligent agents [2]. We are interested in automatic reconfigurations of an agent-based embedded control system when hardware or software faults occur at run-time. The system is implemented by different complex networks of Control Components. In the literature, there are various studies on dynamic reconfigurations applied to component-based applications; each study has its strengths and weaknesses. In [35], the authors propose to block all nodes involved in transactions (considered as sets of interactions between components) in order to realize dynamic reconfigurations; this study has influenced many later research works. Any reconfiguration should respect the consistency property, which is defined as a set of logical constraints. A major disadvantage of this approach is the necessity to stop all components involved in a transaction. In [4], the problem of dynamic reconfigurations in CORBA is treated. The authors consider that consistency is related to Remote Procedure Call integrity; to ensure this property, they propose to block the incoming links before the outgoing ones. However, the connections between components must be acyclic in order to be able to block them in the right order. A dynamic reconfiguration language based on features is proposed in [41]. The authors use the coordination language MANIFOLD, where processes are considered as black boxes with communication ports, so the communication is anonymous. The processes having access to shared data are connected in a cyclic manner and wait for a token that visits each of them in turn (as in a token ring). Despite the novelty of this solution, time is lost while waiting for the token in order to access the shared data or to reconfigure the system. Another study [46] applies dynamic updates to graphical components (for example buttons or graphical interfaces) in a .NET framework. To do so, the authors associate with each graphical component an appropriate running thread; the synchronization is ensured through reader-writer locks, and the dynamic reconfiguration is based on blocking all involved connections. Due to the rw-locks, this solution works only on local applications. In addition, a new reconfiguration algorithm, ReDAC (Reconfiguration of Distributed Applications with Cyclic dependencies), is defined in [45] to ensure dynamic reconfigurations in distributed multi-threaded systems. This algorithm is applied to capsules, which are defined as groups of running components. As a disadvantage, the proposed algorithm uses counter variables to count on-going method calls of threads, which consumes additional memory and processing time.
To the best of our knowledge, no other research work treats the problem of dynamic software reconfigurations of component-based applications with semaphores. The novelty of this chapter is the study of dynamic reconfiguration with semaphores, ensuring the following points: (1) blocking connections without blocking the involved components; (2) safety and correctness of the proposed solution; (3) independence of any specific language; (4) verification of consistency (i.e. logical constraints) delegated to the software agent; (5) suitability for large-scale applications.
We present two Benchmark Production Systems, FESTO and EnAS, available in the research laboratory at the Martin Luther University in Germany.
4 Multi-agent System
Each state machine (SMi) is a graph of states and transitions. A state machine handles the events that may occur by detecting them and responding to each one appropriately. We define a state machine as follows:
(The corresponding meta-model defines a Nested State Machine class with the attributes listSM, initialSM, inputEvent and outputEvent and the operations nextSM(), setSM(), setInputEvt(), setOutputEvt(), setInitialSM(), addSM(), removeSM(), linkSM() and unlinkSM(). It is associated with Event and Input Event classes (eventID, immediate) and with an Execution Context that records the current state, the current state machine and the list of events, offers the operations Execute(), NextState() and NextStateMachine(), and holds information about the component base and the agent description.)
In the following algorithm, the symbol Q is an event queue which holds incoming event instances, ev refers to an input event, SMi represents a state machine, and si,j a state of the state machine SMi. The internal behavior of the agent is defined as follows:
Algorithm 1: GenericBehavior
begin
  while (Q.length() > 0) do
    ev ← Q.Head()
    for each state machine SMi do
      si,j ← currentStatei
      if ev ∈ I(si,j) then
        for each state si,k ∈ next(si,j) such that si,k is related to SMi do
          if execute(si,k) then
            currentStatei ← si,k
            break
          end if
        end for
        for each state sl,k ∈ next(si,j) such that sl,k is related to SMl do
          if execute(sl,k) then
            currentStatel ← sl,k
            break
          end if
        end for
      end if
    end for
  end while
end.
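For concreteness, a stripped-down C rendering of this dispatch loop is sketched below; the data structures, the single successor loop and the execute() hook are simplifications introduced for illustration and are not part of the chapter's meta-model.

#include <stdbool.h>
#include <stddef.h>

typedef int event_t;

#define MAX_NEXT 4                     /* maximum successor states (assumption)   */

struct state {
    event_t       trigger;             /* event accepted in this state, I(s)      */
    struct state *next[MAX_NEXT];      /* candidate successor states, next(s)     */
    size_t        n_next;
    bool        (*execute)(void);      /* reconfiguration action, true on success */
};

struct state_machine { struct state *current; };

/* One pass of the generic behavior: for every pending event, try to advance each
   state machine whose current state accepts that event (cf. Algorithm 1). */
static void generic_behavior(struct state_machine sm[], size_t n_sm,
                             const event_t queue[], size_t q_len)
{
    for (size_t q = 0; q < q_len; ++q) {
        for (size_t i = 0; i < n_sm; ++i) {
            struct state *cur = sm[i].current;
            if (cur->trigger != queue[q])
                continue;
            for (size_t k = 0; k < cur->n_next; ++k) {
                if (cur->next[k]->execute()) {   /* first successful successor wins */
                    sm[i].current = cur->next[k];
                    break;
                }
            }
        }
    }
}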
First of all, the agent evaluates the pre-condition of the state si,j. If it is false, then the agent exits; otherwise the agent determines the list of Control Components concerned by this reconfiguration and applies the required reconfiguration to each of them. Finally, it evaluates the post-condition of the state si,j and generates errors whenever this post-condition is false.
(Coordination matrix and protocol: one row per Reconfiguration Agent Ag1, ..., Agn; for agent Aga the four columns ia, ja, ka, ha give the reconfiguration to be applied by Aga, and likewise ic, jc, kc, hc for Agc, so that each row describes reconfigurations to be applied simultaneously. In the accompanying interaction, the agents answer accept or refuse and, once nbResp = nbR − 1 positive answers are received, the new reconfiguration is applied by all agents *[j := 1..nbR, j <> i].)
• If CA(ξ(Sys)) receives positive answers from all agents, then it authorizes the reconfigurations in the concerned devices: for each Agb, b ∈ [1, n] with CM[b, i] ≠ 0, ∀i ∈ [1, 4], apply Reconfiguration_b(CM[b,1], CM[b,2], CM[b,3], CM[b,4]) in device_b.
• Else, if CA(ξ(Sys)) receives a negative answer from a particular agent, then:
– If the reconfiguration scenario Reconfiguration_a(ia, ja, ka, ha) allows optimizations of the whole system behavior, then CA(ξ(Sys)) refuses the request of Aga by sending the following reply: refused_reconfiguration(CA(ξ(Sys)), Aga, Reconfiguration_a(CM[a,1], CM[a,2], CM[a,3], CM[a,4])).
(Class diagram: a System class (SystemID, listAgent; addAgent(), deleteAgent(), searchAgent()) is composed of Control Agents (AgentID, nameAgent; subscribe(), unsubscribe(), communicate()) that send and receive Messages (content, performative, receiver, sender, time, with the corresponding getters and setters). The Coordination Agent (listAgent, CoordinatorID, matrices, actualMatrix; searchAgents(), setMatrices(), decideMatrix(), setMatrix(), communicate()) and the Reconfiguration Agent (actualReconfig; searchCoordinator(), searchReconfig(), decideReconfig(), setReconfig(), communicate()) specialize the Control Agent.)
If the Coordination Agent receives a refusal from one of the other Reconfiguration Agents (RAj, j ∈ [1..NbR], j <> i), it decides to cancel this reconfiguration and informs the requesting agent (i.e. RAi) of its decision. Figure 4 depicts the interaction between the Reconfiguration and Coordination Agents that ensures dynamic reconfiguration in a distributed system.
Before sending or receiving a message, a Reconfiguration Agent searches for the Coordination Agent with the method searchCoordinator(); the Coordination Agent in turn searches for the list of Reconfiguration Agents with the method searchAgents(). The method receive(), used by both the Coordination Agent and the Reconfiguration Agents, receives a message sent by another agent. Whenever receive() is called and no message is pending, the agent blocks (via block()) until a new message arrives.
Algorithm CA_Communicate()
begin
  switch (step)
    case 0:
      // Wait for a request from a Reconfiguration Agent
      reply ← receive();
      if (reply != null)
        if (reply.getPerformative() = REQUEST)
          i ← reply.getSender();
          Matrix ← decideMatrix(reply.getContent());
          step++;
      else
        block();
      break;
    case 1:
      // Send the proposition to all Reconfiguration Agents
      for j = 1 to NbR do
        if (j <> i)
          msg.addReceiver(reconfigurationAgents[j]);
          msg.setContent(Matrix[j]);
          msg.setPerformative(PROPOSE);
          msg.setTime(currentTime());
          send(msg);
      step++;
      break;
    case 2:
      // Receive the accepts/refusals from the Reconfiguration Agents
      reply ← receive();
      if (reply != null)
        if (reply.getPerformative() = ACCEPT)
          nbResp++;
          if (nbResp = NbR-1)
            step++;
        else if (reply.getPerformative() = REFUSE)
          step ← 4;
      else
        block();
      break;
    case 3:
      // Send an accept (confirm) response to all Reconfiguration Agents
      for j = 1 to NbR do
        msg.addReceiver(reconfigurationAgents[j]);
        msg.setPerformative(CONFIRM);
        msg.setTime(currentTime());
        msg.setContent(Matrix[j]);
        send(msg);
      setMatrix(Matrix);
      step ← 0;
      break;
    case 4:
      // Send a refuse (cancel) response to the Reconfiguration Agent i
      msg.addReceiver(reconfigurationAgents[i]);
      msg.setPerformative(CANCEL);
      msg.setTime(currentTime());
      send(msg);
      step ← 0;
      break;
end
Algorithm RA_Communicate()
begin
  switch (step)
    case 0:
      // Wait for a request from the Coordination Agent
      reply ← receive();
      if (reply != null)
        if (reply.getPerformative() = REQUEST)
          newReconfig ← reply.getContent();
          response.setReceiver(CoordinatorID);
          if (decideReconfig(newReconfig))
            response.setPerformative(ACCEPT);
          else
            response.setPerformative(REFUSE);
          send(response);
          step++;
      else
        block();
      break;
    case 1:
      // Wait for the response from the Coordination Agent
      reply ← receive();
      if (reply != null)
        if (reply.getPerformative() = CONFIRM)
          setReconfig(newReconfig);
        step ← 0;
      break;
end
Fig. 6 A real-time task composed of a sequence of Control Components CC1, CC2, ..., CCn
A real-time task is considered as a process (or a thread, depending on the operating system) having its own data (such as registers and a stack) and competing with other tasks for processor execution. A task is handled by a Real-Time Operating System (RTOS), i.e. a system that explicitly satisfies response-time constraints by supporting a scheduling method that guarantees response times, especially for critical tasks.
In this paragraph, we aim to present a real-time task as a general concept, independent of any real-time operating system. To be independent of any RTOS and to relate the concept to our research work, we define a task τi as a sequence of Control Components, where a Control Component is ready when its preceding Control Component completes its execution; τi,j denotes the j-th Control Component of τi (Fig. 6). Thus, our application consists of a set of periodic tasks τ = (τ1, τ2, ..., τn). Considering all tasks as periodic is not a limitation, since a non-periodic task can be handled by introducing a periodic server.
Running Example. In the FESTO Benchmark Production System, the tasks τ1 to
τ9 execute the following functions:
• (τ1 ) Feeder pushes out cylinder and moves backward/back;
• (τ2 ) Converter pneumatic sucker moves right/left;
• (τ3 ) Detection Module detects workpiece, height, color and material;
• (τ4 ) Shift out cylinder moves backward/forward;
• (τ5 ) Elevator elevating cylinder moves down/up;
• (τ6 ) Rotating disc workpiece is present in position and the rotary indexing table has finished a 90° rotation;
• (τ7 ) Driller 1 machine drills workpiece;
• (τ8 ) Driller 2 machine drills workpiece;
• (τ9 ) WarehouseCylinder removes piece from table.
In the following paragraphs, we introduce the meta-model of a task, study the dynamic reconfiguration of tasks, and then introduce the task scheduling.
In this chapter, we extend the work presented in [42] by studying both a task and a scheduler in a general real-time operating system, where each task is characterized by the following (an illustrative C structure is sketched after the list):
• a priority: each task is assigned a priority value which may be used in the scheduling;
• a deadline Di;
• an execution time Ci;
• a period Ti;
• a set of inputs Ii;
• a set of outputs Oi;
• a set of constraints ρi;
• a set of ni Control Components (ni ≥ 1) such that the task τi is constituted by CCi1, CCi2, ..., CCini.
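For illustration, this characterization can be captured in a C structure as follows; all field names and types are assumptions made for this sketch rather than the API of any particular RTOS.

#include <stddef.h>

struct control_component;             /* opaque handle to a Control Component */

enum task_state { READY, RUNNING, BLOCKED, TERMINATED };

struct rt_task {
    int                         priority;      /* used by the scheduler          */
    unsigned long               deadline;      /* Di                             */
    unsigned long               exec_time;     /* Ci                             */
    unsigned long               period;        /* Ti                             */
    const void                **inputs;        /* Ii                             */
    const void                **outputs;       /* Oi                             */
    const char                **constraints;   /* rho_i (e.g. precedence rules)  */
    struct control_component  **components;    /* CCi1 ... CCi_ni                */
    size_t                      n_components;  /* ni >= 1                        */
    enum task_state             state;
};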
One of the core components of an RTOS is the task scheduler, which determines which of the ready tasks should be executing. If there are no ready tasks at a given time, no task can be executed and the system remains idle until a task becomes ready (Fig. 8).
Running Example. In the FESTO Benchmark Production System, when the task τ1 is created, it is automatically marked as a ready task. At the instant t1, it is executed by the processor (i.e. it is in the running state). When the task τ1 needs a resource at the instant t2, it becomes blocked.
(Figures 8 and 9: the scheduler dispatching ready tasks to the CPU, and the state of the task τ1 (blocked, running, ready, terminated) over the instants t1 to t5.)
Whenever the resource becomes available at the instant t3, the task τ1 returns to the ready state. Finally, it is executed again from the instant t4 and terminates at the instant t5 (Fig. 9).
A scheduler of a real-time operating system is characterized by (Fig. 10):
• readyTask: a queue maintaining the set of tasks in the ready state;
• executingTask: a queue maintaining the set of tasks in the executing state;
• minPriority: the minimum priority that can be assigned to a task;
• maxPriority: the maximum priority that can be assigned to a task;
• timeSlice: the preemption threshold of a task (the quantum of time assigned to a task before its preemption).
Several tasks may be in the ready or blocked state. The system therefore maintains a queue of blocked tasks and another queue of ready tasks. The latter is maintained in priority order, keeping the task with the highest priority at the top of the list. When a task that has been in the ready state is allocated the processor, it makes a state transition from the ready state to the running state. This assignment of the processor is called dispatching, and it is executed by the dispatcher, which is part of the scheduler.
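A minimal C sketch of such a scheduler and of the dispatching step is given below; the fixed-size ready queue and the linear search for the highest-priority task are simplifications chosen for this example and not the chapter's design.

#include <stddef.h>

#define MAX_TASKS 32

struct task { int id; int priority; };

struct scheduler {
    struct task *ready[MAX_TASKS];   /* readyTask queue (unsorted in this sketch) */
    size_t       n_ready;
    struct task *running;            /* currently executing task                  */
    int          min_priority, max_priority;
    unsigned     time_slice;         /* quantum before preemption                 */
};

/* Dispatching: move the highest-priority ready task to the running state.  If no
   task is ready, the system stays idle and NULL is returned. */
static struct task *dispatch(struct scheduler *s)
{
    if (s->n_ready == 0)
        return NULL;
    size_t best = 0;
    for (size_t i = 1; i < s->n_ready; ++i)
        if (s->ready[i]->priority > s->ready[best]->priority)
            best = i;
    s->running = s->ready[best];
    s->ready[best] = s->ready[--s->n_ready];   /* remove from the ready queue */
    return s->running;
}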
Running Example. In the FESTO Benchmark Production System, we consider three tasks τ1, τ2 and τ3 having priorities p1, p2 and p3 such that p1 < p2 < p3.
(Fig. 10: meta-model of the Scheduler and Task classes. The Scheduler (readyTask, executingTask, runningTask, minPriority, maxPriority, timeSlice, criteria, state; operations Initialize(), verifyTemporalProp(), verifyQoSProp(), chooseTask(), Create(), Suspend(), Kill(), Activate(), preemptionLock(), preemptionUnlock()) maintains Queues (enqueue(), dequeue(), isEmpty(), Length()) of Tasks (name, id, period, deadline, WCET, predecessors, QoS, priority, state, with accessors such as getInfo(), getPriority(), setPriority(), addPred(), setPred(), getState(), setState(), setTemporal(), addComponent(), removeComponent(), setQoS()).
Fig. 11: context switches between the tasks τ1, τ2 and τ3 at the instants t1, t2 and t3.)
We suppose that the task τ1 is running when the task τ2 is created at the instant t1. As a consequence, there is a context switch: the task τ1 stays in the ready state and the task τ2 begins its execution, as it has the higher priority. At the instant t2, the task τ3, which was blocked waiting for a resource, obtains the resource. As the task τ3 has the highest priority, the task τ2 turns into the ready state and τ3 executes its routine. The task τ3 continues processing until it has completed; the scheduler then enables τ2 to become running again (Fig. 11).
Consequently, each service process related to a task executes the following instructions:

P(serv)
Nb ← Nb + 1
if (Nb = 1) then
   P(reconfig)
end if
V(serv)
< execute the service >
P(serv)
Nb ← Nb − 1
if (Nb = 0) then
   V(reconfig)
end if
V(serv)
end service
Running Example. Let us take as a running example the task Test related to the EnAS system. To test a piece before elevating it, this component launches the Test Service Process. Figure 12 displays the interaction between the objects Test Service Process, Service semaphore and Reconfiguration semaphore. The flow of events from the point of view of the Test Service Process is the following: (1) the operation P(serv) enters the critical section of the Service semaphore; (2) the number of services is incremented by one; (3) if it is the first service, the operation P(reconfig) enters the critical section of the Reconfiguration semaphore; (4) the operation V(serv) exits the critical section of the Service semaphore; (5) the Test Service Process executes the corresponding service; (6) before modifying the number of services, the operation P(serv) again enters the critical section of the Service semaphore; (7) the number of services is decremented by one; (8) if no service process is left, the operation V(reconfig) exits the critical section of the Reconfiguration semaphore; (9) the operation V(serv) releases the Service semaphore from its critical section.
With the operation P(reconfig), a reconfiguration process verifies that no other reconfiguration process and no service process is running at the same time. After that, the reconfiguration process executes the necessary steps and runs the operation V(reconfig) in order to allow the other processes to begin their execution. Each reconfiguration process specific to a task executes the following instructions:

P(reconfig)
< execute the reconfiguration >
V(reconfig)
Running Example. Let us take as an example the task Elevate related to the EnAS system. The agent needs to reconfigure this task, which launches the Elevate Reconfiguration Process. Figure 13 displays the interaction between the objects Elevate Reconfiguration Process and Reconfiguration semaphore. The flow of events from the point of view of the Elevate Reconfiguration Process is the following: (1) the operation P(reconfig) enters the critical section of the Reconfiguration semaphore; (2) the Elevate Reconfiguration Process executes the corresponding reconfiguration; (3) the operation V(reconfig) releases the Reconfiguration semaphore from its critical section.
Fig. 12 Interaction between the Test Service Process and the Service and Reconfiguration semaphores: P(serv); Nb ← Nb + 1; [Nb = 1] P(reconfig); V(serv); execute the service; P(serv); Nb ← Nb − 1; [Nb = 0] V(reconfig); V(serv)
Fig. 13 Interaction between the Elevate Reconfiguration Process and the Reconfiguration semaphore: P(reconfig); execute the reconfiguration; V(reconfig)
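The synchronization described above maps directly onto counting semaphores. The following C sketch uses POSIX semaphores; the function names and the way the service and reconfiguration bodies are passed in are assumptions made for this example.

#include <semaphore.h>

/* Service processes are readers, reconfiguration processes are writers (Sect. 5). */
static sem_t serv;       /* protects the service counter Nb                        */
static sem_t reconfig;   /* held while any service or a reconfiguration is running */
static int   nb = 0;     /* number of service processes currently running          */

static void init_sync(void)
{
    sem_init(&serv, 0, 1);
    sem_init(&reconfig, 0, 1);
}

/* Service process of a task (reader side). */
static void run_service(void (*service)(void))
{
    sem_wait(&serv);
    if (++nb == 1)                 /* the first service blocks reconfigurations    */
        sem_wait(&reconfig);
    sem_post(&serv);

    service();                     /* execute the functional service               */

    sem_wait(&serv);
    if (--nb == 0)                 /* the last service re-enables reconfigurations */
        sem_post(&reconfig);
    sem_post(&serv);
}

/* Reconfiguration process applied by the agent (writer side). */
static void run_reconfiguration(void (*reconfiguration)(void))
{
    sem_wait(&reconfig);           /* exclusive access                             */
    reconfiguration();
    sem_post(&reconfig);
}

Because the first service process takes the reconfig semaphore and the last one releases it, any number of service processes may run concurrently while a reconfiguration process waits, which is exactly what the two properties verified below require.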
To verify the safety of the synchronization, we should verify that the different constraints mentioned above are respected.
First property: whenever a reconfiguration process is running, any service process must wait until the termination of the reconfiguration. Let us suppose that a reconfiguration process is running (so the semaphore reconfig is equal to zero and the number of current services is zero). When a service related to this component is called, the number of current services is incremented (i.e. it becomes equal to 1) and the operation P(reconfig) therefore puts the process into a blocked state (as reconfig is equal to zero). When the reconfiguration process terminates the reconfiguration, the operation V(reconfig) releases the first process waiting in the semaphore queue. This property is therefore validated.
Second property: whenever a service process is running, any reconfiguration process must wait until the termination of the services. Let us suppose that a service process related to a component is running (so the number of services is greater than or equal to one, which means that the operation P(reconfig) has been executed and reconfig is equal to zero). When a reconfiguration is requested, the operation P(reconfig) puts this process into a blocked state (as reconfig is equal to zero). Whenever the number of service processes becomes equal to zero, the operation V(reconfig) releases the first reconfiguration process waiting in the semaphore queue. This property is therefore verified as well.
Scheduling periodic tasks with precedence and mutual exclusion constraints is as important as representing a task in a general real-time operating system. In our context, we choose the priority-driven preemptive scheduling used in most real-time operating systems. The semaphore solution can lead to the problem of priority inversion, in which a high-priority task is blocked by a lower-priority task. To avoid this problem, we propose to apply the priority ceiling protocol proposed by Sha et al. [49].
The priority ceiling protocol can be used to schedule a set of periodic tasks having exclusive access to common resources protected by semaphores. To do so, each semaphore is assigned a priority ceiling which is equal to the priority of the highest-priority task using this semaphore. A task τi is allowed to enter its critical section only if its assigned priority is higher than the priority ceilings of all semaphores currently locked by tasks other than τi.
Schedulability test for the priority ceiling protocol: a set of n periodic tasks using the priority ceiling protocol can be scheduled by the rate-monotonic algorithm if the following inequalities hold for all i, 1 ≤ i ≤ n:

C1/T1 + C2/T2 + · · · + Ci/Ti + Bi/Ti ≤ i (2^(1/i) − 1)

where Bi denotes the worst-case blocking time of a task τi by lower-priority tasks.
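This sufficient test can be evaluated with a few lines of C; the arrays follow the notation of the inequality, the tasks are assumed to be indexed by decreasing priority (increasing period), and the task set in main() is hypothetical.

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Rate-monotonic schedulability with blocking (priority ceiling protocol):
   for every i, C1/T1 + ... + Ci/Ti + Bi/Ti <= i * (2^(1/i) - 1). */
static bool pcp_schedulable(const double C[], const double T[],
                            const double B[], int n)
{
    double util = 0.0;
    for (int i = 0; i < n; ++i) {
        util += C[i] / T[i];
        double bound = (i + 1) * (pow(2.0, 1.0 / (i + 1)) - 1.0);
        if (util + B[i] / T[i] > bound)
            return false;
    }
    return true;
}

int main(void)
{
    /* Hypothetical task set (C, T, B in the same time unit), highest priority first. */
    double C[] = { 1.0, 2.0, 3.0 };
    double T[] = { 10.0, 20.0, 50.0 };
    double B[] = { 2.0, 1.0, 0.0 };
    printf("schedulable: %s\n", pcp_schedulable(C, T, B, 3) ? "yes" : "no");
    return 0;
}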
Fig. 14 The priority ceiling protocol applied to three tasks R1, S1 and S2
We consider three tasks R1, S1 and S2 having priorities p1, p2 and p3 such that p1 > p2 > p3. The sequence of processing steps of each task is as defined in the previous paragraph, where S (resp. R) denotes the service (resp. reconfiguration) semaphore.
Therefore, the priority ceiling of the semaphore R is equal to the priority of the task R1 (because the semaphore R is used by the tasks R1, S1 and S2 and the task R1 has the highest priority), and the priority ceiling of the semaphore S is equal to the priority of the task S1 (because the semaphore S is used by the tasks S1 and S2 and the priority of S1 is higher). We suppose that the task S2 is running when the task S1 is created at the instant t3, and that the task R1 is created at the instant t5. In Fig. 14, a line at a high level indicates that the task is executing, while a line at a low level indicates that the task is blocked or preempted by another task. Table 3 explains the example in more detail.
6 Conclusion
This chapter deals with Safety Reconfigurable Embedded Control Systems. We pro-
pose conceptual models for the whole component-based architecture. We define a
multi-agent architecture where a Reconfiguration Agent is affected to each device of
the execution environment to handle local automatic reconfigurations, and a Coor-
dination Agent is defined to guarantee safe distributed reconfigurations. To deploy
a Control Component in a Real-Time Operating System, we define the concept of
real-time task in general (especially its characteristics). The dynamic reconfiguration
of tasks is ensured through a synchronization between service and reconfiguration
processes to be applied. We propose to use the semaphore concept for this syn-
chronization such that we consider service processes as readers and reconfiguration
processes as writers. We propose to use the priority ceiling protocol as a method
to ensure the scheduling between periodic tasks with precedence and mutual exclu-
sion constraints. The main contributions presented through this work are: the study
of Safety Reconfigurable Embedded Control Systems from the functional to the
operational level and the definition of a real-time task independently from any real-
time operating system as well as the scheduling of these real-time tasks considered
as periodic tasks with precedence and mutual exclusion constraints. The chapter’s
contribution is applied to two Benchmark Production Systems FESTO and EnAS
available at Martin Luther University in Germany.
References
24. A. Gharbi, M. Khalgui, H.M. Hanisch, Functional safety of component-based embedded control
systems. 2nd IFAC Workshop on Dependable Control of Discrete Systems (2009)
25. http://www.program-Transformation.org/Tools/KoalaCompiler. Last accessed on 11 July 2010
26. IEC 1131–3, Programmable Controllers, Part 3: Programming Languages (International Elec-
trotechnical Commission, Geneva, 1992)
27. IEC 61508, Functional Safety of Electrical/Electronic Programmable Electronic Systems:
Generic Aspects. Part 1: General requirements (International Electrotechnical Commission,
Geneva, 1992)
28. IEC60880, Software for Computers in the Safety Systems of Nuclear Power Stations (Interna-
tional Electrotechnical Commission, 1987)
29. IEC61511, Functional Safety: Safety Instrumented Systems for the Process Industry Sector
(International Electrotechnical Commission, Geneva, 2003)
30. IEC61513, Nuclear Power Plants Instrumentation and Control for Systems Important to Safety
General Requirements for Systems (International Electrotechnical Commission, Geneva, 2002)
31. G. Jiroveanu, R.K. Boel, A distributed approach for fault detection and diagnosis based on
Time Petri Nets. Math. Comput. Simul., 287–313 (2006)
32. M. Kalech, M. Linder, G.A. Kaminka, Matrix-based representation for coordination fault detec-
tion: a formal approach. Comput. Vis. Image Underst.
33. A. Ketfi, N. Belkhatir, P.Y. Cunin, Automatic Adaptation of Component-based Software Issues
and Experiences (2002)
34. M. Khalgui, H.M. Hanisch, A. Gharbi, Model-checking for the functional safety of control
component-based heterogeneous embedded systems. 14th IEEE International conference on
Emerging Technology and Factory Automation (2009)
35. J. Kramer, J. Magee, The evolving Philosophers problem: dynamic change management. IEEE
Trans. Softw. Eng. 16 (1990)
36. P. Leitao, Agent-based distributed manufacturing control: A state-of-the-art survey. Eng. Appl.
Artif. Intell. (2008)
37. A.J. Massa, Embedded Software Development with eCos, 1st edn (Prentice Hall, Upper Saddle
River, NJ, USA, 2002)
38. S. Merchant, K. Dedhia, Performance Comparison of RTOS (2001)
39. C. Muench, The Windows CE Technology Tutorial: Windows Powered Solutions for the Devel-
oper (Addison Wesley, Reading, 2000)
40. S. Olsen, J. Wang, A. Ramirez-Serrano, R.W. Brennan, Contingencies-based reconfiguration of distributed factory automation. Robot. Comput. Integr. Manuf., 379–390 (2005)
41. G.A. Papadopoulos, F. Arbab, Configuration and Dynamic Reconfiguration of Components
Using the Coordination Paradigm (2000)
42. P. Pedreiras, L. Almeida, Task Management for Soft Real-Time Applications based on General
Purpose Operating, System (2007)
43. G. Pratl, D. Dietrich, G. Hancke, W. Penzhorn, A new model for autonomous, networked
control systems. IEEE Trans. Ind. Inform. 3(1) (2007)
44. QNX Neutrino, Real Time Operating System User Manual Guide (2007)
45. A. Rasche, A. Polze, ReDAC—Dynamic Reconfiguration of distributed component-based appli-
cations with cyclic dependencies (2008)
46. A. Rasche, W. Schult, Dynamic updates of graphical components in the .NET Framework, in
Proceedings of SAKS07 Workshop eds. by A. Gharbi, M. Khalgui, M.A. Khan, vol. 30 (2007)
47. M.N. Rooker, C. Sunder, T. Strasser, A. Zoitl, O. Hummer, G. Ebenhofer, Zero Downtime
Reconfiguration of Distributed Automation Systems : The eCEDAC Approach (Springer, New
York, 2007). Third International Conference on Industrial Applications of Holonic and Multi-
Agent Systems
48. G. Satheesh Kumar, T. Nagarajan, Experimental investigations on the contour generation of a
reconfigurable Stewart platform. IJIMR 1(4), 87–99 (2011)
49. L. Sha, R. Rajkumar, J.P. Lehoczky, Priority inheritance protocols: an approach to real-time
synchronization. IEEE Trans. Comput. 39(9), 1175–1185 (1990)
282 A. Gharbi et al.
50. D.D. Souza, A.C. Wills, Objects, Components and Frameworks: The Catalysis Approach
(Addison-Wesley, Reading, MA, 1998)
51. D.B. Stewart, R.A. Volpe, P.K. Khosla, Design of dynamically reconfigurable real-time soft-
ware using port-based objects. IEEE Trans. Softw. Eng. 23, 592–600 (1997)
52. C. Szyperski, D. Gruntz, S. Murer, Component Software Beyond Object- Oriented Program-
ming (The Addison-Wesley Component Software Series, 2002)
53. R. van Ommering, F. van der Linden, J. Kramer, J. Magee, The Koala Component Model for
Consumer Electronics Software (IEEE Computer, Germany, 2000), pp. 78–85
54. M. Winter, Components for Embedded Software—The PECOS Approach
55. R. Wuyts, S. Ducasse, O. Nierstrasz, A data-centric approach to composing embedded, real-
time software components. J. Syst. Softw. (74), 25–34 (2005)
Low Power Techniques for Embedded
FPGA Processors
Abstract Low-power techniques are an essential part of VLSI design due to the continuing increase in clock frequency and chip complexity. Synchronous circuits operate at the highest clock frequencies, and the clock drives a large load because it has to reach many sequential elements throughout the chip. Thus, clock signals have been a great source of power dissipation because of their high frequency and load. Since clock signals are used only for synchronization, they do not carry any information and certainly do not perform any computation. Therefore, disabling the clock signal in inactive portions of the circuit is a useful approach for reducing power dissipation. By using clock gating we can save power by eliminating unnecessary clock activity inside the gated module. In this chapter, we review some of the techniques available for clock gating. The chapter also presents a Register-Transfer Level (RTL) model in the Verilog language. Along with the RTL model, we also analyze the behavior of the clock gating technique using waveforms.
J. Kathuria (B)
HMR Institute of Technology and Management, New Delhi, India
e-mail: [email protected]
M. A. Khan
Department of Computer Science and Engineering, Sharda University, Gr. Noida, India
e-mail: [email protected]
A. Abraham
Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
e-mail: [email protected]
A. Darwish
Helwan University, Cairo, Egypt
e-mail: [email protected]
1 Introduction
Moore's law states that the density of transistors on an Integrated Circuit (IC) doubles approximately every two years. However, there are many challenges: the power density of ICs increases exponentially with every technology generation. We also know that bipolar and nMOS transistors consume energy even in a stable combinational state, whereas CMOS transistors consume less power largely because power is dissipated only when they switch state and not when the state is steady. Power consumption has therefore always been an important area of research in circuit design. There is also a paradigm shift from traditional single-core computing to the multi-core System-on-Chip (SoC). A SoC combines computational intellectual-property cores, analog components, interfaces and ICs to implement a system on a single chip, and more than a billion transistors are expected to be integrated on a single chip. Multiple cores can run multiple instructions simultaneously, increasing the overall speed of programs amenable to parallel computing. Processors were originally developed with only one core, and traditional multiprocessor techniques are no longer efficient because of issues such as congestion in supplying instructions and data to the many processors. The Tilera processors, for example, have a switch in each core to route data through an on-chip mesh network and avoid data congestion [12]. Hence, SoCs with hundreds of IP cores are becoming a reality. The growth of the number of cores per chip is shown in Fig. 1.
The fundamental idea for reducing power consumption is to disconnect the clock when the device is idle; this technique is called clock gating. In synchronous circuits the clock signal is responsible for a significant part of the power dissipation (up to 40 %) [3]. The power density has once again increased with multi-core processing and SoCs. Register Transfer Level (RTL) clock gating is the most common technique used for power optimization, but it still leaves the question of how efficiently the gating circuitry has been designed. The gated clock is a widely accepted technique for optimizing power that can be applied at the system, gate and RTL levels. Clock gating can save power by shutting off the clock to a register when there is no change in its state, while retaining the state of the register during the time the clock is shut off.
The dynamic power is directly proportional to the supply voltage and to the clock frequency.
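More precisely, a commonly used first-order relation for CMOS dynamic power (a standard textbook expression; the symbols α for switching activity, $C_L$ for switched load capacitance, $V_{dd}$ for supply voltage and $f_{clk}$ for clock frequency are introduced here for illustration) is

$$P_{dyn} \approx \alpha \, C_L \, V_{dd}^2 \, f_{clk},$$

which is why reducing unnecessary clock activity directly reduces dynamic power.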
Generally, the system clock is connected to the clock pin of every flip-flop in the design. The main sources of power consumption are therefore:
1. Power consumed in the combinational logic, which is evaluated on each clock cycle.
2. Power consumed by the flip-flops, even when their inputs are steady.
Clock gating can reduce the power consumed by both the flip-flops and the combinational network. The simplest approach to clock gating is to identify a set of flip-flops that share a common enable signal. Generally, the enable signal is ANDed with the clock to generate the gated clock, which is then fed to the clock ports of all of these flip-flops. Clock gating is a good technique for reducing power; however, several considerations must be taken into account when implementing it. Some of the important considerations are listed below:
1. The enable signal shall remain stable during the high period of the clock; it may switch only while the clock is low.
2. Glitches at the gated clock must be avoided.
3. Clock skew introduced by the gating circuitry must be avoided. Hence, the gating circuitry needs a careful design.
This chapter presents an exhaustive survey and discussion of several techniques for clock gating. The chapter also presents an analysis of the RTL designs and their simulation, and discusses some of the fundamental mechanisms employed for power reduction.
2 Timing Analysis
In the steady-state behavior of a combinational circuit, the output is a function of the inputs under the assumption that the inputs have been stable for a long time relative to the delays in the circuit. However, the actual delay from an input change to an output change in a real circuit is non-zero and depends on many factors.
Glitches—The unequal arrival of input signals produces transient behavior in a logic circuit that may differ from what is predicted by a steady-state analysis. As shown in Fig. 2, the output of a circuit may produce a short pulse, often called a glitch, at a time when steady-state analysis predicts that the output should not change [14].
Hazards—A hazard is a property of a circuit that may produce a glitch; a hazard exists when a circuit has some possibility of producing glitches [6, 14]. Such an unwanted glitch occurs due to unequal path lengths or unequal propagation delays through a combinational circuit. There are two types of hazards, as follows.
1. Static Hazard
(a) Static-1 hazard—If the output momentarily goes to state ‘0’ when it is expected to remain in state ‘1’ according to the steady-state analysis, the hazard is known as a static-1 hazard.
(b) Static-0 hazard—If the output momentarily goes to state ‘1’ when it is expected to remain in state ‘0’ according to the steady-state analysis, the hazard is known as a static-0 hazard.
2. Dynamic Hazard—When the output is supposed to change from 0 to 1 (or from 1
to 0), the circuit may go through three or more transients and produce more than
one glitch. Such multiple glitch situations are known as dynamic hazards.
In Fig. 3 we can see a glitch when en switches from high to low while CLK switches from low to high. In a similar fashion, a glitch can be seen in Fig. 4, where en and CLK both switch from high to low (Fig. 5).
Fig. 2 Glitches when En and CLK are applied randomly at the inputs of AND-based clock gating
Most clock gating is applied at the RTL level. In this section we present six different techniques for clock gating at the RTL level. RTL clock gating can be applied at three levels:
1. System-level
2. Sequential
3. Combinational
System-level clock gating stops the clock for an entire block of circuitry and thus effectively disables the entire functionality of that block, whereas combinational and sequential gating selectively suspend clocking while the block continues to produce an output. This chapter uses a counter to evaluate the various clock gating techniques. We start our discussion with the AND gate as the fundamental clock gating technique (Fig. 8).
Early on, many authors suggested the use of an AND gate for clock gating because of its simple logic design [4, 14, 15]. Here, we analyze the response of a sequential circuit when the fundamental AND-gate technique is applied for clock gating. In order to control the state of the clock, the AND gate needs an enable (En) input other
Fig. 3 Glitches when En and CLK are applied randomly at the inputs of AND-based gating
than the clock (CLK). To demonstrate the concept we present the schematic, the RTL code and the output waveform. Throughout the chapter a 4-bit counter is used to apply the clock gating techniques. Let us first analyze the basic 4-bit negative-edge counter shown in Fig. 6. As shown in Fig. 7, at reset = 0 the counter is initialized to 0; thereafter, when reset = 1, the counter increments at each negative edge of the clock.
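For reference, a minimal Verilog sketch of this arrangement (the 4-bit negative-edge counter driven by an AND-gated clock) is given below. It is an illustrative sketch consistent with the description, not the chapter's original RTL; module and signal names are assumptions.

// 4-bit negative-edge counter with plain AND-based clock gating.
// Illustrative sketch; names (clk, en, reset, gclk, q) are assumptions.
module counter4_and_gated (
  input  wire       clk,    // global clock
  input  wire       reset,  // active-low: counter is held at 0 while reset = 0
  input  wire       en,     // gating enable
  output reg  [3:0] q
);
  // AND-based gating: the gated clock is simply en AND clk.
  wire gclk = en & clk;

  // Negative-edge-triggered counter clocked by the gated clock.
  // Note: as discussed in the text, if en falls while clk is high, gclk also
  // falls and the counter sees one extra (spurious) active edge.
  always @(negedge gclk or negedge reset) begin
    if (!reset)
      q <= 4'b0000;
    else
      q <= q + 1'b1;
  end
endmodule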
Let us analyze the response of the counter circuit with AND clock gating as shown in Fig. 8. In Fig. 9 we can observe that the enable is stable at high when clk rises, and therefore the counter increments on the active edge of the clock. However, in Fig. 10, when en changes between one positive edge and the next positive edge, the counter increments one extra time due to a tiny glitch. Another scenario, with a negative-edge-triggered counter, is shown in Fig. 11: here en changes within a clock cycle that starts at a negative edge and ends at the next negative edge, and the output of the counter changes after one clock cycle. The counter therefore works correctly, because the inputs are supplied for a sufficient amount of time and the signal is stable, as shown in Fig. 11.
However, if the timing of en and Clk is not synchronized, the results may be unpredictable. In Fig. 12, any momentary change on the en signal while Clk is active produces a hazard on Gclk. This situation may be dangerous and could jeopardize the correct functioning of the entire system [6].
Fig. 4 Glitches when En and CLK are applied randomly at the inputs of NOR-based gating
Using a NOR gate for clock gating is a suitable technique for reducing power where the actions need to be performed on the positive edge of the global clock [6, 14]. We therefore now observe the response of the counter with NOR-based gating. The schematic of NOR-based clock gating is shown in Fig. 13; in this figure we can observe that the counter will work when the en signal is active, because an inverter is connected at one input of the NOR gate. The waveform in Fig. 14 shows an incorrect output of the counter when en changes to 1 between one negative edge of the clock and the next negative edge. In this case the counter is positive-edge triggered, and by observing GClk we can see that, due to a small glitch when the en signal becomes inactive at a negative edge of the clock, the counter increments once more. The important point is that this configuration can be used where we want to analyze the circuit response on the positive edge of the clock; however, in that case the target system should be negative-edge triggered with NOR clock gating, which can be verified from Fig. 16. Figure 15 shows the correct output of the counter when the counter is positive-edge triggered; here the output is correct because the en signal changes between one positive edge of the clock and the next positive edge. Figure 17 shows a major problem with hazards: any momentary hazard at the enable can be passed on to Gclk when clk = ‘0’. This situation is particularly dangerous and could jeopardize the correct functioning of the entire system [6].
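A minimal Verilog sketch of the NOR-based gating cell described above (with the enable inverted at one input of the NOR gate) might look as follows; the module and signal names are assumptions rather than the chapter's code.

// NOR-based clock gating with the enable inverted at one NOR input:
// gclk = ~(clk | ~en) = en & ~clk, i.e. the gated clock is the inverted
// global clock while en is high and constant 0 otherwise.
module nor_clock_gate (
  input  wire clk,
  input  wire en,
  output wire gclk
);
  assign gclk = ~(clk | ~en);
endmodule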
So far, we have analyzed two types of clock gating techniques, based on the AND and NOR gates. We now discuss the latch-based AND clock gating technique shown
in Fig. 18. In this design a latch is inserted between the En input signal and one input of the AND gate. In the previous designs En was connected directly to the AND gate input, but here it comes through a latch. The latch is needed for correct behavior, because En may carry hazards that must not propagate through the AND gate while the global clock GCLK is high [1, 11, 13]. Moreover, the delay of the logic computing En may lie on the critical path of the circuit, so its effect must be taken into account during timing verification [1, 5, 8, 11]. We can observe from Fig. 19 that the counter takes one extra clock cycle to change state, after which it works normally until En is de-asserted; at the time of de-assertion it again takes one extra clock cycle to change state.
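A minimal Verilog sketch of such a latch-based AND gating cell, consistent with the description above, is shown below (names are illustrative assumptions).

// Latch-based AND clock gating: a level-sensitive latch, transparent while
// the global clock is low, keeps hazards on En from reaching the AND gate
// during the high phase of the clock. Illustrative sketch; names assumed.
module latch_and_clock_gate (
  input  wire clk,   // global clock (GCLK)
  input  wire en,    // enable, possibly carrying hazards
  output wire gclk   // gated clock
);
  reg en_latched;

  always @(clk or en)
    if (!clk)
      en_latched <= en;   // latch is transparent only while clk is low

  assign gclk = en_latched & clk;
endmodule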
In Fig. 20 we have verified that the unwanted outputs due to hazards at En have been eliminated. In Fig. 21, we can observe that when the controlling latch is positive and the counter is also positive-edge triggered, the output of the counter is incorrect, because it increments one more time even when En is low, due to a tiny glitch (Fig. 22).
The latch-based NOR gating technique is shown in Fig. 23. As we can observe from the figure, a latch is inserted between the En input signal and one input of the NOR gate. In the plain NOR-based clock gating the En signal was connected directly to the NOR gate input, but in this design En comes through a latch.
We can observe from Fig. 24 that initially the counter takes one extra clock cycle to change state; thereafter it works normally until En is de-asserted. At the
Fig. 16 Output of counter when enable changes from positive edge to next positive edge but counter
is negative edge triggered
time of de-assertion it also takes one extra clock cycle to change state. In Fig. 25, we have verified that the unwanted outputs due to hazards at the En signal have been eliminated.
In Fig. 26, we can observe that when the controlling latch is negative and the counter is also negative-edge triggered, the output of the counter is incorrect, because it
Fig. 17 Hazards problem when NOR gate is used for clock gating
increments one extra time even when En is low, due to a tiny glitch, as shown in Fig. 27.
In multiplexer-based clock gating, a multiplexer is used to close and open a feedback loop around a basic D-type flip-flop under the control of the En signal, as shown in Fig. 28. The resulting circuit is simple, robust, and compliant with the rules of synchronous design. However, this approach is fairly expensive, because one multiplexer per bit is needed, and it is also energy inefficient: the clock input of a disabled flip-flop still switches, which wastes energy in discharging and recharging the associated node capacitances. Figures 29 and 30 show the negative- and positive-edge-triggered counters, respectively. We can observe from these waveforms that while En is high the counter increments at each negative or positive edge of the clock, respectively, and when En goes low the counter holds its state.
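A minimal Verilog sketch of the multiplexer-based scheme for the negative-edge-triggered 4-bit counter (Fig. 29) is given below; here the clock is never gated at all, the feedback multiplexer simply recirculates the register value while En is low. Names are illustrative assumptions.

// Multiplexer-based "gating": every flip-flop is clocked on every edge; a
// per-bit multiplexer selects between the incremented value (en = 1) and the
// current value (en = 0). Illustrative sketch; names assumed.
module counter4_mux_enable (
  input  wire       clk,
  input  wire       reset,  // active-low, as in the earlier counter
  input  wire       en,
  output reg  [3:0] q
);
  always @(negedge clk or negedge reset)
    if (!reset)
      q <= 4'b0000;
    else
      q <= en ? (q + 1'b1) : q;   // feedback mux around the register
endmodule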
3 New Design
In this section we discuss another efficient design that saves more power. In this circuit a clock gating cell similar to latch-based clock gating is used. The gated clock generation circuit is shown in Figs. 31 and 34, using a negative latch and a positive latch, respectively. These circuits also contain a comparison logic, a first clock generation logic and a second clock generation logic. The circuit saves power in such a way that while the target device's clock is ON, the controlling device's clock is OFF, and when the target device's clock is OFF, the controlling device's clock is OFF as well. In this way more power is saved by avoiding unnecessary switching on the clock signals [9]. When En becomes high while GEN is low, the XNOR produces x = ‘0’, which goes to the first clock generation logic that generates the clock for the controlling device (latch). The first logic is an OR gate with the global clock GCLK at its other input; it generates a clock pulse that drives the controlling latch when x turns low (Figs. 32 and 33).
Fig. 18 Clock gating of negative edge counter using negative latch based AND gate circuit
The second clock generation logic is an AND gate with GEN and the global clock GCLK at its inputs. On the next clock pulse, when GEN turns high, this AND gate generates the clock pulses that go to the target device. Since GEN is now high, the XNOR produces x = ‘1’, so the OR gate holds CClk constantly high until En turns low. In this way GClk keeps running while CClk stays at a constant high level, which means that the latch holds its state without any switching. The circuit shown in Fig. 34 performs a similar sequence of operations to the circuit shown in Fig. 31. When En turns high while GEN is low, the XOR produces x = ‘1’, which goes to the first clock generation logic that
Fig. 19 Normal output of the negative-edge counter when a negative-latch-based AND gated clock is used
Fig. 20 Output of negative edge counter when there are some random Hazards at En
Fig. 21 Clock gating of positive edge counter using positive latch based AND gate circuit
generates the clock for the controlling device (latch). Here the first logic is an AND gate with the global clock at its other input; it generates a clock pulse that drives the controlling latch when x turns high. On the next clock pulse, when GEN turns high, the second clock generation logic, which is an OR gate with Q* and the global clock at its inputs, takes over, and when Q* goes
Fig. 22 Output of counter when latch is positive and counter is also positive edge triggered
Fig. 23 Clock gating of negative edge counter using positive latch based NOR gate circuit
to ‘0’, it generates a clock pulse that goes to the target device. Since GEN is high, the XOR produces x = ‘0’, so CClk is held constantly low until En turns low. In this way GClk keeps running while CClk stays at a constant low level, which means that the latch holds its state without any switching.
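Based on our reading of the description of Fig. 31, a minimal Verilog sketch of the negative-latch variant could look as follows. This is a reconstruction under stated assumptions, not the chapter's original circuit, and all names are assumptions.

// "New design" gating cell (negative-latch variant, as we read Fig. 31):
// the controlling latch is itself clock-gated, so its clock cclk toggles
// only while en and gen differ. Illustrative sketch; names assumed.
module power_aware_clock_gate (
  input  wire clk,   // global clock (GCLK)
  input  wire en,    // enable (En)
  output wire gclk   // gated clock for the target device
);
  reg  gen;                    // output of the controlling (negative) latch
  wire x    = ~(en ^ gen);     // comparison logic: XNOR of En and GEN
  wire cclk = x | clk;         // first logic: OR gate; follows clk while
                               // x = 0, constant 1 while x = 1

  // Negative (transparent-low) latch: captures en only while cclk is low.
  always @(cclk or en)
    if (!cclk)
      gen <= en;

  assign gclk = gen & clk;     // second logic: AND gate driving the target
endmodule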
The output of the counter for the circuit shown in Fig. 31 is shown in Figs. 32 and 33. In Figs. 32 and 33 the enable signal changes from one negative edge to the next negative edge and from one positive edge to the next positive edge, respectively, and the target circuit is negative-edge and positive-edge triggered, respectively. The presented design produces the correct output, which solves the problem that persists in the previous four types of clock gating. The output of the counter for the circuit shown in Fig. 34 is shown in Figs. 35 and 36. In Figs. 35 and 36 the enable signal again changes from one negative edge to the next negative edge and from one positive edge to the next positive edge, respectively. The target circuitry is negative-edge and positive-edge
Fig. 24 Normal output of negative edge counter when positive latch based OR gated clock is used
Fig. 25 Output of negative edge counter when there are some random Hazards at En
Fig. 26 Clock gating of negative edge counter using negative latch based NOR gate circuit
triggered, respectively. Here as well, the presented design produces the correct output, which solves the problem that persists in the previous four types of clock gating.
Fig. 27 Output of the counter when the latch is negative and the counter is also negative edge triggered
4 Conclusions
Fig. 29 Output of negative edge triggered counter with multiplexer based clock gating
Fig. 30 Output of positive edge triggered counter with multiplexer based clock gating
Fig. 32 Output of negative edge counter with gated clock for circuit shown in Fig. 31
Fig. 33 Output of positive edge counter with gated clock for circuit shown in Fig. 31
Fig. 35 Output of negative edge counter with gated clock for circuit shown in Fig. 34
Fig. 36 Output of positive edge counter with gated clock for circuit shown in Fig. 34
References
Software Deployment for Automotive Applications
1 Introduction
In the past, automotive electronics and avionics systems were designed in a fed-
erated manner. Most functionality was implemented by special-purpose hardware
and hardware-tailored software. One control unit performed only one or at most a
limited number of individual functions, and functions had their own dedicated hard-
ware. As the functionality steadily increased, the number of control units has also
F. Pölzlbauer (B)
Virtual Vehicle, Graz, Austria
e-mail: [email protected]
I. Bate
Department of Computer Science, University of York, York, UK
e-mail: [email protected]
E. Brenner
Institute for Technical Informatics, Graz University of Technology, Graz, Austria
e-mail: [email protected]
increased. Nowadays cars contain up to 80 control units. During the last several years,
a paradigm shift has occurred. The design of electronics has moved from a hardware-
oriented to a software/function-oriented approach. This means that functionality
is mainly based on software which is executed on general-purpose hardware. In
order to enable this trend an interface layer (AUTOSAR [2]) was introduced which
separates the application software from the underlying hardware. At the same time,
software development steadily moves from hand-coded to model-driven. In model
driven development, system synthesis is an important design step to give a partition-
ing/allocation. The synthesis transforms the Platform Independent Model (PIM) of
the system, held in views such as UML’s class and sequence diagram, into a Platform
Specific Model (PSM), held in views such as UML’s deployment diagrams. Design-
languages which support model-driven development (such as UML, EAST-ADL,
MARTE, etc.) provide dedicated diagrams (e.g.: component, deployment, commu-
nication, timing).
In order to deploy the application software onto the execution platform, several
configuration steps need to be performed. In the literature this is often referred to
as the Task Allocation Problem (TAP). TAP is one of the classically studied prob-
lems in systems and software development. It basically involves two parts. Firstly
allocating tasks and messages to the resources, i.e. the processors and networks
respectively. Secondly, assigning attributes to the tasks and messages. Tasks represent software components, and are described by their timing and resource demand
(e.g. memory). Messages represent communication between tasks, and are described
by their data size and timing. Processors represent the computational units which
execute tasks, and are described by their processing power and memory. Networks
enable cross-processor communication, and are described by their bandwidth and
protocol-specific attributes. In its simplest form it is an example of the Bin Packing Problem (BPP), where the challenge is to avoid any resource becoming overloaded [11]. This “standard” version of the problem is recognised as being NP-hard.
Solutions normally involve three components: a means of describing the problem,
a fitness function that indicates how close a solution is to solving the problem, and
a search algorithm. A wide range of techniques have been used for searching for
solutions with heuristic search algorithms, branch and bound, and Integer Linear
Programming being the main ones. The problem was later expanded to cover:
1. hard real-time systems where schedulability analysis is used to ensure that the
system’s timing requirements are met as failing to meet them could lead to a
catastrophic effect [3]
2. reducing the energy used [1]
3. making them extensible [19], i.e. so that tasks' execution times can be increased while maintaining schedulability
4. handling change by reducing the likelihood of change and the size of the changes
when this is no longer possible [8]
5. supporting mode changes with the number of transitions minimized [6] and fault
tolerance [7].
networks). Software deployment (or task allocation) deals with the question of how the software application should be allocated onto the execution platform. Thereby,
objectives need to be optimized and constraints need to be satisfied. Applied to hard
real-time systems, the process consists of the following design decisions:
• task allocation: local allocation of tasks onto processors
• message routing: routing data from source processor to destination processor via
bus-systems and gateway-nodes
• frame packing: packing of application messages into bus frames
• scheduling: temporal planning of the system execution (task execution on proces-
sors, frame transmission on bus-systems).
These steps are followed by system performance and timing analysis in order to
guarantee real-time behaviour. Due to the different design decisions involved, the
terms software deployment or task allocation seem inappropriate to describe the
overall process. The term system configuration seems more adequate. This is why
it will be used throughout this chapter, from this point on.
cooling of a material. It has proven to be very robust, and can be tailored to specific
problems with low effort. However, the main reason for using SA is that it is shown
in [5] how SA can be tailored to address system configuration upgrade scenarios.
This aspect will be re-visited in Sect. 7.
In order to apply SA to a specific problem (here: system configuration), the fol-
lowing methods have to be implemented:
• neighbour: Which modifications can be applied to a system configuration in order to obtain a new system configuration? These represent the modifications an engineer would perform manually.
• energy (cost): How “good” is a system configuration? This represents the metrics
that are used to evaluate a system configuration.
Algorithm 1 shows the overall procedure. The search starts from an initial con-
figuration, which is either randomly generated or given by the user. By applying
the neighbour function, a new configuration is generated. By analysing its cost, the
SA determines whether or not to accept it. After a certain number of iterations, the
value of parameter t (which represents the temperature in the annealing process) is
decreased, which affects the acceptance rate of worse configurations. The overall search stops when a defined exit criterion is reached (usually a defined number of iterations or a defined cost limit).
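The acceptance decision itself is not detailed here; SA conventionally uses the Metropolis criterion, under which a configuration whose cost increases by Δcost is still accepted with a probability that shrinks as the temperature t decreases:

$$P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta\text{cost} \le 0 \\ \exp(-\Delta\text{cost}/t) & \text{otherwise} \end{cases}$$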
The following optimization objectives are encoded into the cost function.
• number of needed processors → min
• bus utilization → min
• processor CPU utilization → max and balanced.
The individual terms are combined into a single value, using a scaled weighted
sum. Determining adequate weights is challenging, and should be done systemati-
cally [15].
$$\text{cost} = \frac{\sum_i w_i \cdot \text{cost}_i}{\sum_i w_i} \qquad (1)$$
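As a purely illustrative calculation (the weights and cost terms below are invented), with weights $w = (0.5, 0.3, 0.2)$ and cost terms $(2, 10, 4)$ for the three objectives, the scaled weighted sum evaluates to

$$\text{cost} = \frac{0.5 \cdot 2 + 0.3 \cdot 10 + 0.2 \cdot 4}{0.5 + 0.3 + 0.2} = 4.8,$$

so the relative weights, not their absolute values, determine the trade-off.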
Electronics and embedded software were introduced into automotive systems in 1958.
The first application was the electronically controlled ignition system for combus-
tion engines. It was introduced in order to meet exhaust emission limits. Over the
years, a wide range of functionality of a car was implemented by electronics and
embedded software. Thereby, the embedded real-time software was developed in a
federated manner. Software was tailored to special-purpose hardware, and almost
each software-component was executed on a dedicated hardware-unit. As a con-
sequence, the software-components were quite hardware-dependent, which made
it hard to migrate software-components to new hardware-units. Also, the number
of hardware-units dramatically increased, which increased system cost and system
weight.
To overcome these issues, a paradigm shift has occurred during the last several
years. Software is executed on general-purpose hardware. Processing power of this
hardware steadily increases, thus allowing to execute several software-components
on a single hardware-unit. In order to make it easier to migrate software-components
to new hardware-units, software-components are separated from the underlying hard-
ware. This is enabled by the introduction of an interface layer, which abstracts
hardware-specific details, and provides standardized interfaces to the application-
layer software-components. In the automotive domain, the AUTOSAR standard [2]
has positioned itself as the leading interface-standard. The main part of AUTOSAR
with respect to TAP is the Virtual Function Bus (VFB). This allows two software-
components to communicate with each other without explicit naming and location
awareness. Basically, the source sends a system-signal to the destination via a VFB call. The actual details of where and how to send the signal are decided by the Runtime Environment (RTE), which instantiates the VFB calls on each proces-
sor. Taking this approach restricts changes to the RTE when a new task alloca-
tion is generated. In other domains, similar trends can be observed. In the avionics
domain, the Integrated Modular Avionics (IMA) standard implements a very similar
concept referred to as Virtual Channels (instead of VFB calls) and System Blueprints
(containing the lookup tables).
Challenges
In order to solve the automotive system configuration problem, the general sys-
tem configuration approaches need to be tailored to automotive-specific demands.
Thereby, the following issues are the most challenging ones. To a large degree, these
issues are not sufficiently covered in the literature.
• The configuration of the communication infrastructure needs to tackle details
of automotive bus protocols. Thereby, a special focus needs to be set on frame
packing.
industry. Second, the approach leads to a high number of frames. The frame-set has
poor bandwidth usage, since too much bandwidth is consumed by the protocol-
overhead data. The high number of frames also leads to increased interference
between frames, thus leading to increased response times (and decreased schedula-
bility). In order to overcome these issues, realistic frame packing must be performed
by system configuration approaches. Therefore, several messages must be packed
into a single bus frame. The packing objective should be to minimise the bandwidth demand of the resulting frame-set.
The Frame Packing Problem (FPP) is defined as follows: A set of messages
M = {m1 , m2 , . . . , mn } must be packed into a set of bus frames F = {f1 , f2 , . . . , fk },
subject to the constraint that the set of messages in any frame fits that frame’s max-
imum payload. Usually, the FPP is stated as an optimization problem. The most
common optimization objectives are: (1) minimize the number of needed frames; or
(2) maximize the schedulability of the resulting frame-set. A message is defined by
$m_i = [s_i, T_i, D_i]$. A frame is defined by $f_j = [pm_j, M_j, T_j, D_j]$. In general, each frame may have its individual maximum payload (depending on the bus protocol); usually, however, all frames on the same bus have the same maximum payload. Symbols are explained in Table 2.
Design constraints may have a wide variety of sources. Most relevant are:
• safety considerations: If safety analysis of the entire system has been performed
(e.g. hazard and risk analysis, in accordance with ISO 26262 [10]), safety require-
ments can be derived. These impose constraints on design decisions.
• compatibility to legacy systems: Automotive systems are usually designed in
an evolutionary fashion. A previous version of the system is taken as a start-
ing point and is extended with additional features, in order to satisfy current
demands/requirements. Thus, legacy components may impose constraints on
design decisions.
• engineer’s experience: Engineers who have been designing similar systems typi-
cally have figured out “best practices”. These may exclude certain design decisions,
thus imposing additional constraints.
• legal requirements: Certain design solutions may not be allowed, in order to comply with legal regulations.
Within the AUTOSAR standard, design constraints that might occur have been
specified in the AUTOSAR System Template. Therein, a variety of constraint-types can
be found. However, these constraints are not only relevant for automotive systems; they could easily be applied to other domains (e.g. rail, aerospace, automation, …). Table 1 provides a summary of the constraint-types, which can be categorized into 6 classes.
Since all embedded software must content itself with limited resources, these
constraints are well studied in the literature. Automotive systems must be reliable,
and thus have to satisfy additional constraints. Most safety-related functions must
guarantee real-time behaviour, especially if human life is at risk (e.g. drive-by-wire
application in a car). If a function is highly critical, redundancy may need to be applied. Therefore, replicated tasks must not reside on the same processor (task
separation), certain processors are inadequate for handling certain tasks (excluded
processors), and data must be transferred via separated buses, probably even within
separated bus frames.
It is interesting to note that several constraint-types are not addressed in the literature; especially constraints that focus on the configuration of the communication infrastructure have not been tackled. This can be explained by the fact that most works on system configuration (e.g. task allocation) use simplified models for cross-processor communication. These models do not cover all relevant details of the communication infrastructure, and thus the use of detailed constraints seems superfluous. In automotive systems, though, these constraints are of high importance.
The FPP can be seen as a special case of the Bin Packing Problem (BPP), which
is known to be a NP-hard optimization problem. In the literature there are sev-
eral heuristics for the BPP [4]. Well known on-line heuristics are: next fit, first fit,
best fit, etc. Off-line heuristics extend these approaches by applying initial sorting,
resulting in: next fit decreasing, best fit decreasing, etc. In general, off-line approaches
outperform on-line approaches, since off-line approaches can exploit global knowl-
edge, whereas on-line approaches have to take decisions step-by-step, and decisions
cannot be undone.
Inspired by the main concepts of BPP heuristics, heuristics for the FPP have been
developed. It is interesting to note that there are only a few works in the literature
addressing the FPP, although the FPP has significant impact on the performance of
the system. Most FPP algorithms mimic some BPP heuristic. Sandström et al. [17]
mimics next fit decreasing, where messages are sorted by their deadline. Saket and
Navet [16] mimics best fit decreasing, where messages are sorted by their periods. In
addition, the sorted message-list is processed alternately from the beginning and the
end. In [14] messages are sorted by their offsets. References [14, 16] combine the
FPP with the scheduling problem. References [18, 19] include the FPP into the TAP.
Thereby, FPP and TAP are formulated as a Mixed Integer Linear Program (MILP) and solved sequentially.
Besides these differences, all state-of-the-art FPP algorithms share one common
issue: The packing decision is made based on one condition only:
$$bw_f = \frac{pay_f + oh_f}{T_f} \qquad (4)$$
The payload contains all packed-in messages. The overhead contains all protocol-
specific data. Since a frame may have several messages packed-in, the frame must
be transmitted at a rate which satisfies the rate of all packed-in messages. Thus the
lowest message period determines the frame period.
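For illustration (the numbers are invented; the overhead is roughly that of a standard CAN data frame, ignoring bit stuffing), a frame with a 32 bit payload, 47 bit of protocol overhead and a period of 10 ms consumes

$$bw_f = \frac{32\,\text{bit} + 47\,\text{bit}}{10\,\text{ms}} = 7.9\,\text{kbit/s}.$$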
Note that the period of a frame is determined by the message with the lowest period inside the frame. By adding a message, the period of the extended frame, $T_f'$, may change. Originally it is $T_f$.
$$T_f' = \min(T_f, T_m) \qquad (7)$$
Depending on the relation between Tm and Tf , there are 3 cases for this packing
situation. For each of them, an optimal decision can be made.
Case I: Tm = Tf
If the periods of the frame and the message are equal, it is always beneficial to extend
an existing frame. Creating a new frame is never beneficial, because of the additional
overhead data.
$$\frac{pay_f + s_m + oh_f}{T} \;<\; \frac{pay_f + oh_f}{T} + \frac{s_m + oh_f}{T} \qquad (8)$$
$$oh_f < 2\, oh_f \qquad (9)$$
At the threshold period of the message, the two alternatives perform equally.
$$T_m^{*} = \frac{s_m + oh_f}{s_m} \, T_f \qquad (12)$$
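As an invented numerical illustration of Eq. (12): with $T_f = 10\,\text{ms}$, $s_m = 16\,\text{bit}$ and $oh_f = 48\,\text{bit}$,

$$T_m^{*} = \frac{16 + 48}{16} \cdot 10\,\text{ms} = 40\,\text{ms},$$

so a message whose period lies between the frame period and 40 ms is cheaper to pack into the existing frame, whereas a slower message is better placed in a frame of its own.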
The main issue of state-of-the-art frame packing heuristics is: During packing only the
necessary packing condition (see Eq. 3) is checked. In case the periods of messages
vary significantly, the approaches perform poorly, even if messages are sorted by
their periods.
To overcome this issue, the packing decision must also incorporate the trade-off optimality criteria derived above. The proposed frame packing heuristic
(see Algorithm 2) incorporates these criteria. Its structure is inspired by the Fixed
Frame Size approach of [17] which mimics next fit decreasing. However, messages are
not sorted by their deadline. Instead messages are sorted by their period, inspired by
Saket and Navet [16]. However, the packing procedure is not done in a bi-directional
way.
Within the ExtendOrNew method, the most beneficial decision is determined using
the optimality criteria presented earlier. This way each packing step has minimal
increase of bandwidth demand.
Due to the NP-hard nature of the FPP, the proposed approach cannot guarantee
an optimal packing. However, experimental evaluation shows that it outperforms
state-of-the-art approaches. On the one hand, the bandwidth demand of the resulting
frame-set is significantly decreased. On the other hand, the schedulability of the
frame-set is less sensitive to timing uncertainties.
Table 3 shows the improvements in bandwidth demand. On the left side,
improvements are shown as a function of the number of sending processors and the bus baud rate. The main improvements can be seen for systems with a higher number of sending processors; such systems will be used in future automotive applications. On the
Table 3 Improvement of Pölzlbauer et al. compared to state-of-the-art

# nodes   Improvement (%)              Message (bit)   Improvement (%)
          125 k    256 k    500 k
1…3       0.0      0.0      0.0        1…8             0…18
5         5.9      2.3      0.0        1…16            0…18
10        14.4     6.2      3.3        1…24            0…19
15        13.8     15.0     2.4        1…32            0…16
20        17.6     16.2     6.2        1…64            0…6
right side, improvements are shown as a function of message size. An interesting finding is that the improvements are almost the same for a wide range of message sizes. Currently, physical data is mainly encoded in variables of up to 16 bit. Future applications may need higher accuracy, so 32 bit variables may be used. The proposed approach
also handles these systems in an efficient way. More details on the evaluation can be
found in [13].
5 Handling Constraints
Basically, this extended SA search framework should be able to find system configurations which satisfy all constraints. Since constraint violations are penalized, the search should be directed towards regions of the design space where all constraints are satisfied. However, experimental evaluation reveals more diverse findings. For some constraint-types this approach works: configurations are modified until the number of constraint violations becomes quite low. The approach works quite well for E-1 (processor-internal communication only). This can be explained by the fact that this
In order to overcome these issues, and to handle constraints in an efficient way, two issues must be addressed:
1. neighbour: The neighbour-moves are not aware of the constraints, so infeasible configurations may be generated. Neighbour-moves should instead be aware of the constraints and only generate configurations within feasible boundaries.
2. pre-conditions: In order to be able to satisfy a constraint, a set of pre-conditions may need to be fulfilled (e.g. certain packing constraints need certain routing conditions). It is therefore highly important to fulfill these pre-conditions, otherwise the constraints can never be satisfied.
It is advised to split the entire system configuration problem into several sub-
problems, and to handle design decisions and constraints within these sub-problems.
Considering the various interactions, the following sub-problems seem appropriate:
• task allocation and message routing
• frame packing
• scheduling.
In order to increase the efficiency of the search, the neighbour-moves for task allocation are modified as follows: each task has an associated set of admissible processors, and only processors from this set are candidates for allocation modifications.
The question is: how can the sets of admissible processors be derived so that all allocation and routing constraints are satisfied? To achieve this, a set of rules is derived and applied. Most of these rules are applied before the search run, so it is a one-time effort.
E-1: By grouping the sender- and the receiver-task (forming a task-cluster), it
can be ensured that the task allocation algorithm will allocate both tasks to the
same processor. Thus, the communication between these tasks is always performed
processor-internal.
E-2 and E-3: Based on these sets, a set of admissible buses can be calculated for
each message.
$$B_{adm} = \begin{cases} B \setminus B_{ex} & \text{if } B_{ded} = \{\} \\ B_{ded} \setminus B_{ex} & \text{otherwise} \end{cases} \qquad (17)$$
Since a task may send and receive several messages, only the intersection X of the corresponding sets is potentially admissible for each task:
$$X = \bigcap P_{adm}^{(t \rightarrow m \rightarrow t)} \qquad (19)$$
E-4: Two messages can only be routed via the same bus, if their sender-tasks reside
on the same processor and also their receiver-tasks reside on the same processor.
Thus, E-4 can be satisfied by two D-1 constraints.
C-1 and C-2: Based on these sets, a set of admissible processors can be calculated
for each task. Thereby, the set of admissible buses (derived from E-2 and E-3) of
the sent/received messages has also to be taken into account.
$$P_{adm} = \begin{cases} (P \cap X) \setminus P_{ex} & \text{if } P_{ded} = \{\} \\ (P_{ded} \cap X) \setminus P_{ex} & \text{otherwise} \end{cases} \qquad (20)$$
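As an invented example of Eq. (20): with $P = \{p_1, p_2, p_3, p_4\}$, $X = \{p_1, p_2, p_3\}$, $P_{ex} = \{p_2\}$ and $P_{ded} = \{\}$,

$$P_{adm} = (P \cap X) \setminus P_{ex} = \{p_1, p_3\}.$$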
D-1: Similar to E-1, this constraint can be resolved by grouping the associated tasks
(forming a task-cluster). If tasks are grouped, the set of admissible processors for
a task cluster c is:
$$P_{adm}^{(c)} = \bigcap_{t \rightarrow c} P_{adm}^{(t)} \qquad (21)$$
D-2: The set of admissible processors can be updated dynamically (during the
design space exploration).
$$P_{ex.dyn} = P \text{ of tasks that the current task must be separated from} \qquad (23)$$
C-3: If an allocation is fixed, the task allocation algorithm will not modify that
allocation.
Based on the considerations and rules presented earlier, design space exploration can
be performed more efficiently. Exploration steps (performed via neighbour moves)
are performed based on the following principles:
• Task clusters are treated as single elements during task allocation. Therefore, if a
task cluster is re-allocated, all tasks inside that task cluster will be re-allocated to
the same processor.
• When picking a “new” processor for a task/task cluster, only processors from the
set of admissible processors are used as candidates.
• Frame packing is performed using the constraint-aware packing heuristic.
As a consequence, a large number of infeasible configurations is avoided, since
constraints are not violated. Thus, the efficiency of the search increases. In addition,
constraint satisfaction can be guaranteed for certain constraint-types.
Unfortunately, not all constraint-types can be resolved. For a set of constraint-types (A-1, A-2, A-3, B-1, B-2, B-3, E-5), no rules for how to satisfy them could be
architectures). The goal of this phase is to find a platform which is extensible for
future requirements. Within this phase, almost all design decisions may be mod-
ified. This phase may be called finding a system configuration which maximizes
extensibility.
• During the following years, this system configuration is used as the basis. New
technologies are rarely introduced. Modifications to the system configuration may
mainly be due to 2 scenarios:
– Minor modifications are applied to the system configuration. The goal is to
improve the system. This may be called system configuration improving.
Thereby, most design decision taken for the initial configuration must be treated
as constraints.
– Additional components (e.g. software) are added to the system, in order to meet
new requirements. The goal is to find a system configuration for the extended
system. This may be called system configuration upgrade. Thereby almost
all design decisions from the initial system must be treated as constraints.
In addition, the new configuration should be similar to the initial configura-
tion, in order to reduce effort for re-verification.
Consequently, future research activities have to be two-fold: On the one hand,
emerging technologies such as multi-core architectures and automotive Ethernet have
to be tackled. On the other hand, it must be investigated how these development-
scenarios can be addressed. Concerning the latter, basically most ingredients are
already at hand.
• In order to find a platform configuration, which is extensible for future modifi-
cations/extensions, the key issue is to have test-cases (i.e. software architectures)
which represent possible future requirements. This can be addressed by using
change scenarios [8]. In addition, the configurations can be analysed with respect
to parameter uncertainties. Well known approaches use sensitivity analysis for
task WCET [19]. This can be extended towards other parameters (e.g. periods),
thus resulting in multi-dimensional robustness analysis [9].
• The second issue is to actually perform a system configuration modification.
A typical improvement scenario could be: reassign priorities to tasks on a cer-
tain processor, in order to fix timing issues. Therefore, state-of-the-art optimiza-
tion approaches could be used, e.g. SA. Of course, the neighbour-moves must be
constraint-aware.
• Within a system configuration upgrade, both the software-architecture as well
as the hardware-architecture may be subject to changes. Typically new addi-
tional software-components (and communication between software-components)
are added. In order to provide sufficient execution resources, additional processors
may be needed. These scenarios can be addressed as follows: In [6] it is shown,
how a system configuration can be found for multi-mode systems. Thereby, the
goal is to have minimal changes between modes, thus enabling efficient mode-
switches. If the different versions of the system (initial system, extended system)
are treated as modes, similar methods can be used. However, there is one signifi-
cant difference: Emberson and Bate [6] assumed that the hardware-platform is the
same for all modes. In a system configuration upgrade scenario, this assumption
is no longer valid.
• Thus, when performing system configuration modifications and system configura-
tion extensions, the key issue is to deal with legacy decisions. These must be treated
as constraints. Therefore, constraint handling is needed. This can be addressed by
the methodology presented in Sect. 5.
In order to address and solve system configuration upgrade scenarios, the fol-
lowing next steps have to be performed: First, a metric for determining changes has
to be derived, and tailored to automotive needs. Second, the approach in [6] needs
to be extended, so that it can handle changes in the hardware-platform. Third, the
constraint handling approach needs to be incorporated with the part that will deal
with changes.
Acknowledgments The authors would like to acknowledge the financial support of the “COMET
K2—Competence Centres for Excellent Technologies Programme” of the Austrian Federal Ministry
for Transport, Innovation and Technology (BMVIT), the Austrian Federal Ministry of Economy,
Family and Youth (BMWFJ), the Austrian Research Promotion Agency (FFG), the Province of
Styria and the Styrian Business Promotion Agency (SFG). We also thank our supporting industrial
(AVL List) and scientific (Graz University of Technology) project partners.
References
11. D.S. Johnson, M.R. Garey, A 71/60 theorem for bin packing. J. Complex. 1(1), 65–106 (1985)
12. F. Pölzlbauer, I. Bate, E. Brenner, Efficient constraint handling during designing reliable automotive real-time systems, in International Conference on Reliable Software Technologies (Ada-Europe) (2012), pp. 207–220
13. F. Pölzlbauer, I. Bate, E. Brenner, Optimized frame packing for embedded systems. IEEE
Embed. Syst. Lett. 4(3), 65–68 (2012)
14. P. Pop, P. Eles, Z. Peng, Schedulability-driven frame packing for multicluster distributed em-
bedded systems. ACM Trans. Embed. Comput. Syst. 4(1), 112–140 (2005)
15. S. Poulding, P. Emberson, I. Bate, J. Clark, An efficient experimental methodology for configur-
ing search-based design algorithms, in IEEE High Assurance Systems Engineering Symposium
(HASE) (2007), pp. 53–62
16. R. Saket, N. Navet, Frame packing algorithms for automotive applications. Embed. Comput.
2(1), 93–102 (2006)
17. K. Sandström, C. Norström, M. Ahlmark, Frame packing in real-time communication, in International Conference on Real-Time Computing Systems and Applications (RTCSA) (2000), pp. 399–403
18. W. Zheng, Q. Zhu, M. Di Natale, A. Sangiovanni-Vincentelli, Definition of task allocation
and priority assignment in hard real-time distributed systems, in IEEE Real-Time Systems
Symposium (RTSS) (2007), pp. 161–170
19. Q. Zhu, Y. Yang, E. Scholte, M. Di Natale, A. Sangiovanni-Vincentelli, Optimizing extensibil-
ity in hard real-time distributed systems, in IEEE Real-Time and Embedded Technology and
Applications Symposium (RTAS) (2009), pp. 275–284
Editors Biography
Ashraf Darwish received the Ph.D. degree in computer science from Saint
Petersburg State University, Russian Federation in 2006 and is currently an
assistant professor at the Faculty of Science, Helwan University, Cairo, P.O. 1179,
Egypt. Dr. Darwish teaches artificial intelligence, information security, data and
web mining, intelligent computing, image processing (in particular image
retrieval, medical imaging), modeling and simulation, intelligent environment,
body sensor networking.
Ajith Abraham received the Ph.D. degree in Computer Science from Monash
University, Melbourne, Australia. He is currently the Director of Machine
Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and
Research Excellence (SNIRE), P.O. Box 2259, Auburn, Washington 98071,
USA, which has members from more than 100 countries. He serves/has served the
editorial board of over 50 International journals and has also guest edited 40
special issues on various topics. He has authored/co-authored more than 850
publications, and some of the works have also won best paper awards at
international conferences. His research and development experience includes more
than 23 years in the industry and academia. He works in a multidisciplinary
environment involving machine intelligence, network security, various aspects of
networks, e-commerce, Web intelligence, Web services, computational grids, data
mining, and their applications to various real-world problems. He has given more
than 60 plenary lectures and conference tutorials in these areas. He has an h-index
of 50+ with nearly 11K citations as per Google Scholar. Since 2008, Dr. Abraham has been the Chair of the IEEE Systems, Man and Cybernetics Society Technical Committee on Soft Computing, and he represented the IEEE Computer Society Distinguished Lecturer Programme during 2011–2013. He is a Senior Member of the IEEE, the
IEEE Computer Society, the Institution of Engineering and Technology (U.K.) and
the Institution of Engineers Australia (Australia), etc. He is actively involved in
the Hybrid Intelligent Systems (HIS); Intelligent Systems Design and Applications
(ISDA); Information Assurance and Security (IAS); and Next Generation Web
Services Practices (NWeSP) series of international conferences, in addition to
other conferences. More information at: http://www.softcomputing.net.