Security in Embedded Hardware
Daniel Ziener
Computer Architecture for Embedded Systems
University of Twente
Email: [email protected]
5 February 2019
Contents
1 Motivation 1
1.1 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 ASICs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Definitions 9
2.1 Dependability and its Attributes . . . . . . . . . . . . . . . . . . . 9
2.1.1 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3 Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.4 Confidentiality . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.5 Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.6 Maintainability . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.7 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Fault, Error, Failure . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.2 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Means to Attain Dependability . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Fault Prevention . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.2 Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.3 Fault Removal . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.4 Fault Forecasting . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Security Flaws and Attacks . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Area Overhead . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.2 Memory Overhead . . . . . . . . . . . . . . . . . . . . . . 18
2.5.3 Execution Time Overhead . . . . . . . . . . . . . . . . . . 18
2.6 IP Cores and Design Flow . . . . . . . . . . . . . . . . . . . . . . 19
5 IP Protection 63
5.1 Encryption of IP Cores . . . . . . . . . . . . . . . . . . . . . . . . 65
5.1.1 Encrypted HDL or Netlist Cores . . . . . . . . . . . . . . . 65
5.1.2 Encrypted FPGA Configurations . . . . . . . . . . . . . . . 67
5.2 Additive Watermarking of IP Cores . . . . . . . . . . . . . . . . . 68
5.2.1 HDL Cores . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.2 Netlist Cores . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.3 Bitfile Cores . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3 Constraint-Based Watermarking of IP Cores . . . . . . . . . . . . . 100
5.3.1 HDL Cores . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3.2 Netlist Cores . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.3 Bitfile and Layout Cores . . . . . . . . . . . . . . . . . . . 103
5.4 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Bibliography 107
1 Motivation
Since the invention of the transistor, the complexity of integrated circuits continues to
grow rapidly. First, only basic functions like discrete logic gates were implemented
as integrated circuits. With improvements in chip manufacturing, the size of the
transistors was drastically reduced and the maximum size of a die was increased.
Now, it is possible to integrate more than one billion transistors [Xil03] on one chip.
In the beginning, electric circuits (e.g., a central processing unit) consisted of dis-
crete electronic devices which were integrated on printed circuit boards (PCBs) and
consumed a lot of power. The invention of integrated circuits at the end of the 1950s also laid the cornerstone for the development of embedded systems. For the first time, the circuits were small enough and consumed little enough power that applications embedded into a device, such as production machines or consumer products, became possible.
An embedded system can be considered a complete special-purpose computer that may consist of one or more CPUs, memories, a bus structure, and special-purpose cores.
The first integrated circuits were able to integrate basic logic functions (e.g., AND-,
OR-gate) and flip-flops. With further integration, complex circuits, like processors,
could be implemented into one chip. Today, it is possible to integrate a whole system
with processors, buses, memories, and specific hardware cores on a single chip, a so-called system-on-chip (SoC).
These small, power- and cost-efficient, yet versatile embedded systems finally took off. Today, embedded systems are included in most electrical devices, from coffee machines and stereo systems to washing machines. The application field of embedded systems spans from consumer products, like mobile phones or television sets, over safety-critical applications, like automotive or nuclear plant control, to security applications, such as smart cards or identity cards.
As integration density grew, problems with heat dissipation arose. Embedding electronics into systems with little space and reduced cooling possibilities, or operating them in areas with extreme temperatures, intensifies this problem. Furthermore, an embedded system which is integrated into an environment with moving parts is exposed to shock. Thermal and shock problems have a high influence on the reliability of the system. On the other hand, a system that controls large machines or a dangerous process must have a high operational reliability. These are all reasons
that design for reliability is gaining more and more influence on the development of
embedded systems.
However, what is the need for reliability, if everyone may alter critical parameters
or shut down important functions? To solve these problems, we need access control
to the embedded system. But, today, embedded systems are also used to grant access
to other systems or buildings. One example is chip cards. Inside these cards, a
secret key is stored. It is important that no unauthorized persons or systems are able
to read this secret key. Thus, an embedded system should not only be reliable but
also secure.
Integrating functions that guarantee reliability and security also increases the complexity of the integrated system enormously and thus the design time. On the other hand, the market demands shorter product cycles. The only solution is to reuse cores which have been designed for other projects or were purchased from other companies. The number of reused cores constantly increases. The advan-
tages of IP core (Intellectual Property cores) reuse are substantial. For example, they
offer a modular concept and fast development cycles.
IP cores are licensed and distributed like software. One problem of IP core distribution, however, is the lack of protection against unlicensed usage, as cores can be easily copied. Future embedded systems should be able to prevent the usage of unlicensed cores, or core developers should be able to detect their cores inside an embedded system from third-party manufacturers.
Considering today's embedded systems, the integration of reliability- and security-enhancing functions depends on the application field. In the area of security-critical systems (e.g., chip cards, access systems, etc.), several security functions are implemented. We find additional reliability functions in systems where human life or valuable assets are at stake (e.g., power plants, banking mainframes, airplanes, etc.). On the other hand, the problem of all these additional functions is the requirement for additional chip area. For cost-sensitive products which are produced in huge volumes, like mobile phones or chip cards, developers must carefully weigh whether to integrate such additional functions.
1.1 Security
Security is becoming more and more important for computers and embedded systems. With the ongoing integration of personal computers and embedded systems into networks and finally into the Internet, security attacks on these systems arose. These networked, distributed devices may now process sensitive data from all over the world, and the attacker does not need to be physically present. Also, the increased complexity of these devices increases the probability of errors which can be used to break into a system. Figure 1.1 shows a classification of different types of attacks related to computer systems. This information is obtained from the CSI Computer Crime and
Security Survey [Ric08], where 522 US companies reported their experience with computer crime. Further, the integration of networking interfaces into embedded devices for which networking is not obviously necessary leads to strange attacks, for example, that someone can break into a coffee machine over the Internet and alter the composition of the coffee [Wri08].
Figure 1.1: Security attacks reported in the CSI Computer Crime and Security Survey [Ric08], where 522 US companies reported their experience with computer crime for the year 2008.
Within the last decade, the embedded software community has paid more attention to the security of software-based applications. Today, most software updates fix security bugs and provide only little additional functionality. At the same time, the number of embedded electronic devices including at least one processor is increasing.
The awareness of security in digital systems led to the investigation of secure communication standards, for example SSL (Secure Socket Layer) [FKK96], the implementation of cryptographic methods, for example AES (Advanced Encryption Standard) [Fed01], better reviews of software code to find vulnerabilities, and the integration of security measures into hardware. Nevertheless, Figure 1.2 shows that the vulnerability of digital systems has increased rapidly over the last years. The main cause of vulnerability is software errors through which a system may be compro-
mised. The software of embedded systems moves from monolithic software towards
module-based software organized and scheduled by an operating system. By means
of modern communication structures like the Internet, the software on embedded
systems may be updated, partially or completely. These update mechanisms and the
different communication possibilities open the door for software-based attacks on the embedded system. For example, the number of viruses and trojans on mobile phones has increased rapidly over the last years. One main gateway for these attacks is buffer overflows. A wrong jump destination or a wrong return address from a subroutine might cause the execution of infiltrated code (see also Section 3.1).
However, hardware errors can also lead to the vulnerability of a system. For example, Kaspersky shows that the execution of appropriate instruction sequences on a certain processor can enable an attacker to take over control of the system [KC08]. In this case, it does not matter which operating system or software security programs are running on the system.
A common objective of attackers is sensitive data stored inside a digital system. To reach this objective, attackers are not bound to software attacks alone. Hardware attacks, where the digital system is physically penetrated to gather information about the security facilities or to extract sensitive information, are also practical. If an embedded device stores secret data, like a cryptographic key, attackers may try
Figure 1.3: On the left side, the percentage of the usage of unlicensed software in different areas of the world is shown. On the right side, the corresponding losses in millions of US dollars are depicted [All07].
to read out this secret data by physically manipulating the processor on the embedded
device. This may be done by differential fault analysis (DFA) [BS97] or by specific local manipulation of control registers inside the processor (see also Section 3.2). The attacker's goal thereby is to execute infiltrated code or to deactivate the protection of the secured data, which may result from a manipulation of the program counter.
Another relevant security aspect in embedded systems is intellectual property pro-
tection (IPP). Due to shorter design cycles, many products can only be developed
with acquired hardware cores or software modules. Those companies selling these
cores and modules naturally have a high interest in securing their products against
unlicensed usage. Figure 1.3 shows the estimated percentage of unlicensed software
used in different areas of the world. Also, calculated revenue losses are shown. Ad-
ditionally, many unlicensed hardware IP cores are used in products. At the RSA conference in 1998, it was estimated that the losses due to the usage of unlicensed IP cores approach 1 billion US$ per day [All00].
1.2 FPGAs
FPGAs (Field Programmable Gate Arrays) have their roots in the area of PLDs (Pro-
grammable Logic Devices), such as PLAs (Programmable Logic Arrays) or PALs
(Programmable Array Logics). Today, FPGAs have a significant market segment in microelectronics and particularly in the embedded systems area. The advantages of FPGAs over ASICs are their flexibility, the reduced development costs, and the short implementation time. Also, developers face a limited implementation risk, a) because of the easy possibility to update an erroneous design, and b) because of the assurance that the silicon devices are proven and the underlying technology operates correctly under the specified terms.
The main advantage of FPGAs is their reconfigurability. According to the ITRS [ITR07], the demand for flexibility through reconfigurability will rise from 28% of all functionalities in 2007 to an estimated 68% in the year 2022. Note that the ITRS also takes into account software running on a microprocessor, which can be updated. Furthermore, many FPGA devices support dynamic partial reconfiguration, which means that the design, or a part of it, can be reconfigured during runtime. With this advantage, we can envisage new designs with new and improved possibilities and properties, like an adaptive design which can adapt itself to a new operating environment. Unfortunately, dynamic reconfiguration is currently used rarely due to the lack of mature design tools, which increases the development costs for dynamic reconfiguration. But now, improved design tools for partial reconfiguration are starting to become available, like the ReCoBus-Builder [KBT08, KHT08] or Xilinx PlanAhead [DSG05]. Nevertheless, dynamic reconfiguration for industrial designs is in its infancy, and it will take several years to exploit all these features of FPGAs.
In recent years, the main application areas of FPGAs were small-volume embedded systems and rapid prototyping platforms, where ASIC designs can be implemented and verified before the expensive masks are produced. Nevertheless, FPGA usage in higher-volume markets is rising, mainly due to lower FPGA prices, higher logic density, and lower power consumption. Furthermore, due to shorter time-to-market cycles and rising ASIC costs, FPGAs are breaking more and more into traditional ASIC domains. On the other hand, FPGAs face competition in the (reconfigurable) DSP domain from multi-core and coarse-grained reconfigurable architectures, as well as from graphics processing units (GPUs), onto which DSP algorithms are adapted. Nevertheless, these architectures suffer from a lack of flexibility, and today only FPGA technology is flexible enough to implement a heterogeneous reconfigurable system-on-a-chip.
1.3 ASICs
Besides the advantages and the success of FPGAs, there still exists a huge market for
traditional ASICs (Application-Specific Integrated Circuits). ASICs are designed for high-volume production, where a small cost per unit is important, as well as for low-power and high-performance applications and designs with a high logic density.
The implementation of a core on an ASIC instead of an FPGA (both 90 nm tech-
nology) may require 40 times less area, may speed up the critical path by a factor
between 3 and 4, and may reduce the power by a factor of about 12 [KR06]. Here,
we see that the big advantage of ASICs over FPGAs is the higher logic density, which
results in significantly lower production cost per unit. The disadvantages of ASICs
are the higher development and the higher initial production costs (e.g., masks, pack-
age design, test development [Kot06]). Therefore, the decision between ASICs and FPGAs with respect to minimizing the total costs is highly dependent on the production volume. Figure 1.4 shows a comparison of the total costs of ASICs and FPGAs in different technology generations over the production volume. The ASIC graphs start with higher costs due to the high initial production costs, but have a lower slope due to cheap production costs per unit. The initial cost of ASICs increases from technology generation to generation, mainly because of the increasing chip and technology complexity and logic density. FPGA designs have lower initial costs, but higher costs per unit. In summary, the total costs of a design using FPGA technology are lower until a certain production volume is reached. However, according to Xilinx [RBD+01], this point shifts towards higher volumes with each technology generation.
Figure 1.4: This figure from [RBD+01] shows a comparison of the total costs of FPGAs and ASICs in different technology generations over the production volume. With every new technology generation, the break-even point between the total costs of FPGA and ASIC designs is shifted more and more to the ASIC side. As an implication, one may expect the market for FPGAs to grow.
Nevertheless, besides the total costs discussion, there exist many design solutions,
especially in the area of embedded systems, which can only be implemented using
ASIC technology. Examples include very low power designs and high performance
designs.
Before summarizing the major contributions of the thesis with respect to the above
topic, a set of definitions is in order.
2 Definitions
In this section, we introduce necessary definitions of terms with respect to security and reliability of embedded systems that will be used throughout this thesis. First, definitions in the field of dependability and the difference between defects, faults, and errors are outlined. After the categorization of faults and errors, definitions stemming from the area of security attacks are presented. Finally, different types of overhead, which are indispensable for additional security and reliability functions, are described.
2.1 Dependability and its Attributes
2.1.1 Availability
Availability is considered as the readiness for correct service [ALR01]. This means that availability is a measure of the possibility to start a new function or task of the system. Usually, the availability is given as the percentage of time that a system is able to serve its intended function and can be calculated using the following formula:
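Written out in its common form, with the mean time to failure (MTTF) and the mean time to repair (MTTR) introduced in the following subsections:

\[
  A \;=\; \frac{\text{uptime}}{\text{uptime} + \text{downtime}}
    \;=\; \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}
\]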
Figure 2.1: The relationship between the dependability attributes, threats, and means [ALR01].
Table 2.1: The maximal annual downtime of a system for different values of avail-
ability, running either 8 hours or 24 hours per day [Rag06].
2.1.2 Reliability
Reliability is defined as the ability of a system or component to perform its required functions under well-defined conditions for a specified time period [Ger91]. Laprie and others describe reliability as the continuity of correct service [ALR01]. Important parameters of reliability are the failure rate and its inverse, the MTTF (mean time to failure). Other parameters, like the MTBF (mean time between failures), include the time which is necessary to repair the system. The MTBF is the sum of the MTTF and the MTTR (mean time to repair).
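Stated as formulas (with the constant failure rate λ as the usual simplifying assumption):

\[
  \text{MTTF} = \frac{1}{\lambda}, \qquad \text{MTBF} = \text{MTTF} + \text{MTTR}
\]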
2.1.3 Safety
Safety is the attribute of a safe system. This means that the system cannot lead to catastrophic consequences for the users or the environment. Safety is relative; the elimination of all possible risks is usually impossible. Furthermore, the safety of a system cannot be measured directly. It is rather a subjective confidence in the system. Whereas availability and reliability are concerned with avoiding all failures, safety is concerned only with avoiding catastrophic failures, which are a small subset.
2.1.4 Confidentiality
The confidentiality of a system describes the absence of unauthorized disclosure of information. The International Organization for Standardization (ISO) defines confidentiality as “ensuring that information is accessible only to those authorized to have access” [ISO05]. In many embedded systems (e.g., cryptographic systems), it is very important to secure the information stored inside the system (e.g., the secret key) against unauthorized access. The prevention of unlicensed usage of software programs or hardware cores is also a topic of confidentiality. Confidentiality is, like safety, subjective and cannot be measured directly.
2.1.5 Integrity
Integrity is the absence of improper system state alteration. This alteration can be an unauthorized access that alters system information which is necessary for the correctness of the system. Furthermore, the system state alteration can also be a damage or modification of the system. System integrity assures that no part of the system (software or hardware) can be altered without the necessary privileges. Also, the verification of an IP core to ensure the correct creator and the absence of unauthorized supplementary changes can elevate the integrity of a system. Integrity is the precondition for availability, reliability, and safety [ALR01].
2.1.6 Maintainability
Maintainability is the ability to undergo repairs and modifications. This can be done to repair errors, meet new requirements, make further maintenance easier, or to cope with a changed requirement or environment. A system with high maintainability typically has good documentation and a modular structure, is parameterizable, uses assertions, and implements built-in self-tests.
2.1.7 Security
Security is defined as the combination of the attributes (1) confidentiality (the prevention of the unauthorized disclosure of information), (2) integrity (the prevention of the unauthorized amendment or deletion of information), and (3) availability (the prevention of the unauthorized withholding of information) [ITS91]. An alternative definition of security is the absence of unauthorized access to the system state [ALR01]. The prevention or detection of the usage of unlicensed software or IP cores can also be seen as a security aspect (confidentiality), as can the prevention of the unauthorized alteration of software or IP cores (integrity). Like safety, security shall prevent only the class of failures which are caused by unauthorized access or unauthorized handling of information.
2.2 Fault, Error, Failure
2.2.1 Failure
A system is typically composed of different components. Each component can be
further subdivided into other components. All of these system components may have
internal states. If a system delivers its intended function, then the system is working
correctly. The intended function of a system can be described as an input/output
or interface specification which defines the behavior of the system on the system
boundaries with its users or other systems.
The system interface specification may not be complete. For example, it is spec-
ified that an event occurs on the output of the system, but the time of this event to
occur is not exactly specified. So, the system behavior can vary without violating the
specification. If the specification is violated, the system fails. A failure is an event
which occurs when the system deviates from its interface specification (see Figure
2.2).
2.2.2 Errors
If the internal state of a component deviates from the specification (the specification
of the states of the component), the component is erroneous and an error occurs.
An error is an unintended internal state whereas a failure is an unintended interface
behavior of the system. An error may lead to a failure. But it is also possible that
an error occurs and does not lead to a system failure, because the component is currently not used or the error is detected and corrected fast enough. Errors can be
Figure 2.2: Faults may lead to an error, which may also lead to a system failure.
2.2.3 Faults
A fault is defined as a defect that has the potential to cause an error [Jal94]. All errors
are caused by faults, but a fault may not lead to an error. In the latter case, the fault
is masked out and has no impact on the system.
For example, consider the control path of a processor core. A fault like a single
event transient fault, caused by an alpha particle impact, occurs on one signal of the
program counter between two pipeline stages. If the time of occurrence is near the
rising active clock edge, an error may occur. Otherwise, if the time of occurrence
is far away from the rising edge of the clock, the fault does not lead to an error.
The erroneous program counter value can now lead to a system failure, if the wrong
subroutine is executed and the interface behavior differs from the specification. Oth-
erwise, if an error detection technique, like a control flow checker, as introduced later
in Chapter 4.7.3, is used, the error can be detected after the fault appearance, and the
error may be corrected by a re-execution of the corresponding instruction. But, this
additional re-execution needs several clock cycles to restore the error free state. For
real-time systems with very critical timing requirements, the possible output events
might be too late and the system thus might still fail.
The next step is the recovery from the erroneous state. Recovery consists of two steps, namely error handling and fault handling. Error handling is usually accomplished by rollback or rollforward. Rollback is done by using error-free states which are stored at certain checkpoints to restore the state of the system to an older error-free state. Rollback comes at the cost of delaying the operation, which might be a problem in case of real-time applications. Rollforward uses a new error-free state to recover the system.
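A minimal sketch of error handling by rollback, written in C; the checkpoint granularity, the state layout, and the error-detection hook are assumptions made only for this illustration:

```c
#include <stdio.h>
#include <string.h>

#define STATE_SIZE 256

static unsigned char state[STATE_SIZE];       /* current system state        */
static unsigned char checkpoint[STATE_SIZE];  /* last known error-free state */

/* Store an error-free copy of the state at a checkpoint. */
static void save_checkpoint(void) { memcpy(checkpoint, state, STATE_SIZE); }

/* Rollback: discard the erroneous state and restore the checkpoint.  All
 * operations since the checkpoint must be re-executed, which delays the
 * output and may violate hard real-time deadlines. */
static void rollback(void) { memcpy(state, checkpoint, STATE_SIZE); }

/* Stub for one operation; a real system would signal detected errors here. */
static int run_step(int i) { state[i % STATE_SIZE]++; return i == 3; }

int main(void)
{
    save_checkpoint();
    for (int i = 0; i < 8; i++) {
        if (run_step(i)) {
            rollback();                  /* error handling by rollback      */
            printf("step %d: error detected, state restored\n", i);
        } else {
            save_checkpoint();           /* commit the new error-free state */
        }
    }
    return 0;
}
```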
If the cause of the error is a permanent or periodically recurring fault, we need fault handling to prevent the system from running into the same error state repeatedly. This is usually done by fault diagnosis, fault isolation, system reconfiguration, and system reinitialization. For example, in case of permanent errors in memory structures, the faulty memory column is identified and switched over to a reserved spare column. After the switch-over, the column content must be reinitialized.
It is important to notice that fault tolerance is a recursive concept. The techniques
and methods which provide fault tolerance should obviously themselves be resistant
against faults. This can, for example, be done by means of replication.
lead to this alteration and therefore to a security failure. The process of exploiting the flaw by a threat is called an attack (see Figure 2.3). A security failure occurs when a security goal is violated. The main security goals are the dependability attributes integrity, availability, and confidentiality. The difference between a flaw and a threat is that a flaw is a system characteristic, whereas a threat is an external event.
Figure 2.3: Flaws are security faults, which lead to errors if they are exploited by attacks. The state alteration in case of an attack may lead to a security failure.
2.5 Overhead
Methods for increasing security and reliability in embedded systems often have the drawback of additional overhead. To evaluate the additional costs of these methods, we can use the following criteria: area overhead, memory overhead, and execution time overhead.
2.6 IP Cores and Design Flow
Figure 2.4: A general design flow for FPGA and ASIC designs with the synthesis
and implementation steps and the different abstraction levels.
The mapping of these primitive cells to slices and configurable logic blocks (CLBs) is done in the implementation step [Xilb].
IP cores can be delivered at all these abstraction levels in the corresponding format: at the register-transfer level as VHDL or Verilog code, at the logic level as an EDIF netlist, or at the device level as mask files for the ASIC flow or as FPGA-dependent (partial) bitfiles.
IP cores can be further categorized into soft and hard cores. Hard cores are ready to use and are offered as a target-dependent layout or bitfile. All IP cores which are delivered in an HDL or netlist format belong to the soft cores. These cores need further transformations of the design flow to be usable. The advantages of soft cores are their flexibility for different target technologies and the fact that they can be parameterized. However, the timing and the area overhead are less predictable compared to hard cores due to the impact of the required design tools. Analog or mixed-signal IP cores are usually delivered as hard cores.
3 Attacks on Embedded Systems
There exist two ways to categorize attacks. The first way is to categorize attacks by the violated security goals. The other way is to describe how the attack is realized and which way the attacker chose to compromise the system [Rag06, RRKH04].
Using the first categorization scheme, the main security goals are integrity, availability, and confidentiality (see Figure 3.1 above, and Section 2.4). Attacks which compromise integrity can be further subdivided into manipulation of data, manipulation of software or IP cores, as well as forging of authorship. Attacks which may paralyze a system compromise the availability. Attacks to compromise the confidentiality of a system can be subdivided into gathering of sensitive data like passwords, keys, program code, or IP cores, and gaining access control to a system. Additionally, copyright infringement compromises the confidentiality of the author of the core.
The means used to launch the attacks, or the ways in which an attack is realized, can be categorized into invasive and non-invasive attacks (see Figure 3.1 below). Both groups can further be subdivided into logical and physical attacks [RRKH04]. Physical attacks typically require relatively expensive equipment and infrastructure, especially physical invasive attacks, whereas logical attacks usually require only a computer or the embedded system itself.
secured data, deactivate security protection, open gateways for the attackers, or load further infiltrated code from the Internet. The malicious code can be inside the processed input data which is loaded into the memory by the processor. The second step is bringing the processor into a state in which it executes the inserted attacker's code. This can be done by manipulation of the program flow.
One way to achieve this is by utilizing buffer overflows for smashing stacks. Most programs are structured into subroutines with their own local variables. These variables, and also the arguments and the return address, are stored in memory segments called stacks. The return address is usually the first entry on the stack and the local variables are concatenated below it. Normally, as in the C programming language, the contents of array variables are written from the bottom to the top, and if the range is not checked, the return address can be overwritten (see Figure 3.2). The attacker can manipulate the input data in a way that the return address is overwritten with the address of his malicious code. On the return, the malicious code is executed [Ale96, PB04]. Another possibility is to overwrite the frame pointer address instead of the return address [klo99].
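As an illustration, the following minimal C fragment (buffer size and input source are chosen only for this sketch) shows the kind of subroutine that is vulnerable to such a stack-based overflow:

```c
#include <string.h>

/* Vulnerable subroutine: the local array a[] lies below the saved
 * frame pointer and return address on the stack.  strcpy() performs
 * no range check, so an input longer than the buffer overwrites the
 * saved return address of copy_input(). */
static void copy_input(const char *input)
{
    char a[16];              /* local buffer on the stack              */
    strcpy(a, input);        /* no range check: classic stack smashing */
}

int main(int argc, char **argv)
{
    if (argc > 1)
        copy_input(argv[1]); /* attacker-controlled input              */
    return 0;
}
```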
Figure 3.2: On the left side, a part of the program memory is shown. Normally,
the subroutine is called and after its execution, the program counter
jumps back to the main program after the call instruction. However,
if the return address in the stack is overwritten by a buffer overflow of
the vector a[] (see right side), the erroneous return destination may
become the entry point of the malicious code (dashed line).
Heap-based buffer overflows are another class of code injection attacks. The memory heap is the dynamically allocated memory, managed in C by malloc() and free() and in C++ by new and delete. The heap consists of many memory blocks which are usually chained together by a doubly linked list. These memory blocks are managed by the dynamic memory allocator, which can insert, unlink, or merge blocks. The information of the linked list (pointers to the previous and next block) is stored in a header for each block.
A heap-based buffer overflow may overwrite this header information in a way that one pointer of the doubly linked list points to the stack segment just before the return address [Rag06]. If this block is now freed by the dynamic memory allocator, the header information of the previous and next block is updated. Because one pointer points to the stack segment due to the attack, the stack is updated in a way that the return address is overwritten with the address of a heap block, which is then executed after the control flow reaches a return [Rag06, PB04]. There exist many other possibilities to utilize heap-based buffer overflows [Con99, Dob03].
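The principle can be sketched with a deliberately simplified block header; the layout of real allocators such as glibc differs, the payload and sizes are assumptions for illustration, and the offsets assume a typical 64-bit machine:

```c
#include <stdio.h>
#include <string.h>

/* Strongly simplified heap block: the allocator keeps its doubly
 * linked list pointers in a header directly in front of the data. */
struct block {
    struct block *prev;
    struct block *next;
    char          data[16];
};

int main(void)
{
    struct block heap[2] = { 0 };   /* two adjacent blocks */

    /* 24 bytes of payload into a 16-byte buffer: the overflow runs into
     * heap[1] and overwrites its prev pointer.  When the allocator later
     * unlinks heap[1], it writes through the forged pointers, e.g., into
     * the stack just in front of a return address. */
    memcpy(heap[0].data, "AAAAAAAAAAAAAAAABBBBBBBB", 24);

    printf("heap[1].prev now holds %p\n", (void *)heap[1].prev);
    return 0;
}
```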
Arc injection or return-into-libc is an attack where a system function is exploited to spawn a new process which performs the attacker's desired operations. The name arc injection comes from inserting a new arc (control flow transfer) into the control flow graph (CFG) of a program. In the standard C library (libc on UNIX-based systems), there exists a function called system(). This function validates a process call given as an argument and, after successful validation, starts its execution as a new thread. The memory location of this standard function is usually known, and therefore also the starting address needed to bypass the validation of the argument. The return pointer on the stack can now be manipulated by using a stack-based buffer overflow to jump to the desired destination in the system function to execute a malicious process. The name of the malicious process can be transferred to the system function by using registers [PB04]. This attack is useful if the architecture or operating system prevents the stack or heap memory area from being executed.
Shacham generalized the return-into-libc attacks to show that it is possible to perform malicious computation without injecting malicious code [Sha07]. The idea is that, due to shared libraries, e.g., libc, many analyzable instruction snippets known to the attacker reside in memory. Shacham shows that an attacker can build an arbitrary program from these snippets which can perform arbitrary computation. This can be done by analyzing, for example, the libc library for code snippets which end with a return instruction. Moreover, Shacham shows that for the x86 architecture, it is possible to use only parts of instructions. The return instruction of the x86 architecture is a one-byte instruction encoded as 0xc3. However, longer instructions may also contain this byte value. By starting the sequence in the middle of an instruction, the original instruction alignment is bypassed, which gives the attacker additional new instruction sequences. From these building blocks, the attacker can build a program by chaining the snippets together by overwriting the stored return addresses. This so-called return-oriented programming has been successfully transferred to other processor architectures, e.g., SPARC. In [BRSS08], a compiler is introduced which is able to construct return-oriented exploits from a general purpose language. In summary, Shacham shows that preventing the injection of code is not sufficient for preventing malicious computation.
Pointer subterfuge is an attack technique where pointer values are changed. There
exist four varieties: function pointer clobbering, data pointer manipulation, excep-
tion handler hijacking and virtual pointer smashing [PB04].
Function pointer clobbering modifies a function pointer so that the pointer directs
to the malicious code. When the control flow reaches the modified function call, the
attacker’s function is called and his code is executed.
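A minimal sketch of function pointer clobbering; the structure layout and the direct out-of-bounds write merely simulate what an overlong, attacker-controlled input copied into buf would achieve:

```c
#include <stdio.h>
#include <string.h>

static void benign(void)    { puts("expected function"); }
static void malicious(void) { puts("attacker's code"); }

/* A function pointer stored directly behind an unchecked buffer. */
struct ctx {
    char   buf[16];
    void (*handler)(void);
};

int main(void)
{
    struct ctx c;
    c.handler = benign;

    /* Simulate the clobbering: the write past the end of buf lands on
     * handler.  A real attack reaches the same effect through an
     * overlong input copied into buf without a range check. */
    void (*evil)(void) = malicious;
    unsigned char *past_end = (unsigned char *)c.buf + sizeof c.buf;
    memcpy(past_end, &evil, sizeof evil);

    c.handler();   /* the indirect call now executes malicious() */
    return 0;
}
```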
Data pointer modifications can be used for arbitrary memory writes. This tech-
nique can be combined with other code injection attacks to launch complex attacks.
Exception handler hijacking modifies the thread environment block (in MS Win-
dows) that points to the list of registered exception handler functions. Because the list is stored on the stack, the entries can easily be manipulated by utilizing stack-based buffer overflows. This technique can be put to work to transfer the control flow to a malicious function. Within Linux, function pointers in the fnlist can be replaced to a similar effect.
Virtual pointer smashing replaces the virtual function table used in the C++ imple-
mentation of virtual functions. The virtual function table is used in C++ at runtime to
implement dynamic dispatch. Every C++ object has a virtual pointer, which points
to the appropriate virtual function table. By modifying the virtual pointer to direct to
an attacker’s virtual function table, malicious functions can be called when the next
virtual function is invoked.
Figure 3.3: A secondary electron image recorded with a focused ion beam (FIB). The FIB has previously cut a signal wire [Fra].
using bus probing. The problem here is the generation of the successive addresses to get a linear memory trace. The attacker can bypass the software by destroying or deactivating the transistor gates which are responsible for branches and jumps with an FIB. The result is a program counter which can only count up linearly, which fits perfectly for this attack [KK99]. Other attacks are reading ROM, reviving and using test modes, ROM overwriting by using a laser cutter, EEPROM overwriting, key retrieval using gate destruction, memory remanence, probing single bus bits, as well as EEPROM alteration [KK99, Hag].
calculation and registering of the new branch target address is a long combinatorial
path on many processor implementations [KK99].
4 Defenses Against Code Injection Attacks
In this section, we show measures against the different code injection attacks introduced in Section 3.1. A good overview of defenses against code injection attacks is further given in [Rag06] and [Erl07]. The related work in this section is divided into six groups: methods using an additional return stack, software encryption, safe languages, code analyzers, anomaly detection, as well as compiler, library, and operating system support. Control flow checking methods, which combine security and reliability issues, are discussed in Section 4.7.
Furthermore, Xu and others propose methods to divide the stack into a control and a data stack in [XKPI02]. The control stack stores the return addresses and stack pointers, whereas the data stack stores variables, e.g., buffers. This approach effectively solves the problem of buffer overflows. To achieve the stack split, Xu presents two different techniques: one modifies the compiler and the other is a hardware technique which modifies the processor.
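The protection goal of a separate control stack can also be illustrated in software with a shadow return stack; the sketch below assumes a GCC-style compiler (for __builtin_return_address) and manual instrumentation, whereas the approaches above insert the equivalent checks in the compiler or in the processor hardware:

```c
#include <stdio.h>
#include <stdlib.h>

#define SHADOW_DEPTH 64

/* Separate stack that only holds return addresses ("control stack"),
 * while ordinary buffers stay on the regular ("data") stack. */
static void *shadow[SHADOW_DEPTH];
static int   top;

static void shadow_push(void *ret) { shadow[top++] = ret; }

static void shadow_check(void *ret)
{
    if (shadow[--top] != ret) {
        fprintf(stderr, "return address tampered with - aborting\n");
        abort();
    }
}

/* Example of a manually instrumented function; the schemes above let
 * the compiler or the processor hardware perform these steps. */
static void worker(void)
{
    shadow_push(__builtin_return_address(0));
    /* ... function body, possibly containing overflowable buffers ... */
    shadow_check(__builtin_return_address(0));
}

int main(void)
{
    worker();
    puts("return address verified");
    return 0;
}
```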
Bhatkar and others [BDS03] and Xu and others [XKI03] propose methods for address obfuscation. To exploit buffer overflows and achieve the execution of malicious code, the attacker must know the memory layout. Due to address obfuscation, obtaining such information about the memory structure becomes enormously complicated for the attacker. In these methods, the program code is modified so that each time the code is executed, the virtual addresses of the code and data are randomized. These approaches randomize the base address of the stack and heap, the starting address of the dynamically linked library, as well as the location of static data. Also, the order of local and static variables as well as of the functions is permuted. For objects which cannot be rearranged, Bhatkar inserts random gaps by padding, e.g., in stack frames.
Shao and others propose hardware-assisted protection against function pointer and buffer smashing attacks [SZHS03, SXZ+04]. The function pointers are XORed with a key randomly assigned to each process, which is hard for the attacker to reconstruct. This is a countermeasure against function pointer clobbering (see Section 3.1). Furthermore, Shao introduces a hardware-based boundary checking method to avoid stack smashing: on each memory write, it is checked whether the write destination is outside the current stack frame, and if so, an exception is raised.
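In software, the XOR encoding of function pointers can be sketched as follows; the fixed key and the helper functions are assumptions made for illustration, while in the proposed hardware scheme the key is generated randomly per process and applied transparently:

```c
#include <stdio.h>
#include <stdint.h>

/* Per-process random key; a fixed value is used here only for
 * illustration.  In the hardware scheme the key is assigned randomly
 * for each process and cannot be read by the attacker. */
static const uintptr_t key = 0x5a5a5a5aUL;

static uintptr_t encode_ptr(void (*p)(void))
{
    return (uintptr_t)p ^ key;         /* store pointers only in XORed form */
}

static void (*decode_ptr(uintptr_t v))(void)
{
    return (void (*)(void))(v ^ key);  /* decode right before the call */
}

static void handler(void) { puts("handler called"); }

int main(void)
{
    uintptr_t stored = encode_ptr(handler);

    /* A pointer overwritten by an attacker decodes to a useless address,
     * because the attacker does not know the key. */
    void (*fp)(void) = decode_ptr(stored);
    fp();
    return 0;
}
```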
If the software is loaded encrypted into the memory and decrypted in the fetch stage, code injection attacks are impossible, because the attacker would need to inject his code in encrypted form. The key for encryption and decryption is different for each process, hence it is practically impossible for the attacker to encrypt his code properly. Injection of unencrypted code produces data garbage after decryption and results in a crash of the process. Barrantes and others propose a method which uses an x86 processor emulator to simulate the decryption in the fetch stage [BAFS05]; the process is encrypted at load time. Kc and others present an approach where the executable is stored encrypted on the hard disk [KKP03]. The proper key is stored in the executable header, which is loaded into a special register for decryption. However, the key is also easily extractable by an attacker, which lowers the effectiveness of this approach.
system call recording in a learning phase. Furthermore, techniques using sliding win-
dows which analyze the system calls inside the window are presented by Forrest and
others [FHSL96] and Wagner and others [WD01]. Forrest uses a dynamic learning
phase whereas Wagner uses static information derived from the control flow graph.
Feng and others use, besides the system call information, the return address from
the stack for anomaly detection [FKF+ 03]. Like the other methods, the checks are
done on system calls. During a learning phase, so-called virtual paths are recorded. A virtual path is built from all return addresses gathered from the stack on a system call. These return addresses correspond to all unreturned functions. During the detection phase, the virtual path is checked on every system call to detect anomalous behavior.
Zhang and others present a hardware approach for detecting anomalies in the program behavior [ZZPL04]. In this approach, the detection is done at the level of control flow instructions, which has a finer granularity than the other, system call-based approaches. Jump and branch information, like target addresses or favored conditional branch decisions, is stored additionally in the system memory. Fast memory access is assured through common cache structures of the processor. This method has some similarities with our method which will be introduced in Section 4.7.3. However, this approach does not store control flow graphs; rather, each branch or jump is separately looked up using a content addressable memory (CAM). Hereby, a hash of the branch or jump address is calculated which acts as an index into the branch table. Although this approach needs more memory and has a higher latency than our approach, there is no need for synchronization with the control flow of the executed program. The aim of the method is to recognize attacks by detecting anomalous behavior. Therefore, during a learning phase, the decisions of conditional branches are recorded and stored in the branch table. If the recorded decisions differ from the control flow behavior in the detection phase, a warning signal is raised, whereas if the control flow diverges from the stored jump and branch information, a threat is signaled. The approach was extended with an anomalous path detection which compares sequences of branch decisions of the executed program with the decisions recorded in the learning phase [ZZPL05]. In the second approach, general indirect jumps (other than returns from subroutines) are considered as well.
4.6 Compiler, Library, and Operating System Support
In this section we discuss countermeasures against code injection attacks through enhancement of compilers, libraries, or the operating system.
1 The term canary word corresponds to the miner’s canary which was used in coal mines as an early
warning system. If there were toxic gases in the mine, the birds died before the miners were
affected. Canaries sing a lot, which made them very suitable for a visual and audible warning
system. The last canaries in mines were phased out in 1986 in the UK [BBC05].
Lhee and others propose a compiler extension which inserts additional buffer size checks to prevent buffer overflows at runtime [LC02]. The buffer size information is obtained from a compilation of the program with debugging information. Using this information, additional checks are automatically inserted into the source code.
Erlingsson and others propose a fine-grained software-based memory access con-
trol technique called XFI [EAV+ 06, ABEL09]. This technique enriches the program
code to grant access to an arbitrary number of memory regions. Furthermore, the
entry and exit points of a program can be controlled using XFI. Budiu and others pro-
pose additional instructions to extend the instruction set architecture (ISA) for XFI
hardware support [BEA06].
Jones and Kelly propose a method to identify out-of-bounds pointers [JK97]. Every result of pointer arithmetic must reference the same object as the original pointer. If not, the pointer is out-of-bounds. Such pointers can be identified dynamically by additional instructions which are included at compile-time and a new object table which is maintained during the execution. If a pointer is out-of-bounds, its value is set to '-2'. The problem of this approach is that out-of-bounds pointers are not allowed in ANSI C; however, such pointers are used in many programs. Therefore, Ruwase and Lam extend this approach with an out-of-bounds object and call their approach C Range Error Detector (CRED) [RL04]. If a pointer becomes out-of-bounds, it is redirected to a special out-of-bounds object which keeps the original pointer value and the referenced data. This approach prevents buffer overflows, because all data written over the bounds of the buffer are automatically redirected to other memory locations managed by the out-of-bounds object.
the compilation, libverify embeds the verification code at the start of the process. The
advantage is that the code does not have to be recompiled which makes this approach
completely transparent to the user.
Robertson and others [RKMV03] and Krennmair [Kre03] propose countermeasures against heap-based attacks, described in Section 3.1. The allocation and deallocation routines of the standard C library are modified to protect the header of the heap segment. Robertson includes a padding mechanism and a checksum in the header on frame allocation and verifies this information when the segment is freed. Krennmair's technique, called ContraPolice, protects the heap pointer in the header of each heap segment with randomly generated canaries, similar to the StackGuard approach for stack-based headers.
Finally, the operating system can be enhanced to protect programs from code injection attacks. A non-executable stack prevents the execution of malicious code injected into the stack. However, this approach breaks some legitimate situations where code is executed on the stack. Examples are functional programming languages which generate code on the stack during runtime, function trampolines for nested functions used by the gcc compiler, and stack-based signal handling as used by Linux. A patch for a non-executable stack for the Linux operating system was provided in [Des97], which also handles the above-mentioned cases by disabling the protection in these situations. However, this approach was defeated by Wojtczuk [Woj98].
Lately, processor vendors have introduced hardware support to prevent the execution of code from the stack. With a new flag, the so-called NX (No eXecute) bit, memory regions can be declared page-wise as non-executable areas which are excluded from execution by the hardware. Non-executable stack approaches for the Linux operating system, like PaX [PAX03] or Exec Shield [vdV04], are able to use this NX bit, or to emulate it on processors without NX bit support. The technique can be combined with write protection to ensure that no memory location in the process is marked as both writable ('W') and executable ('X') at the same time. This so-called W⊕X protection prevents attackers from injecting malicious code and subsequently executing it. Nevertheless, Shacham demonstrated that it is not necessary to inject code in order to perform malicious computations [Sha07] (see also Section 3.1).
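The W⊕X discipline can be illustrated with the POSIX memory protection interface; the sketch below is Linux/POSIX-specific, uses a single x86-64 'ret' byte to stand in for legitimately generated code, and keeps the page writable or executable, but never both at the same time:

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);

    /* Stage 1: the page is writable, but not executable. */
    unsigned char *page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }

    page[0] = 0xc3;   /* x86-64 "ret", standing in for generated code */

    /* Stage 2: the write permission is dropped before execution, so the
     * page is never writable and executable at the same time.  Injected
     * data therefore cannot be executed while it is still writable. */
    if (mprotect(page, pagesz, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect"); return 1;
    }

    ((void (*)(void))page)();   /* execute the single "ret" instruction */
    puts("page executed under the W^X discipline");

    munmap(page, pagesz);
    return 0;
}
```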
StackGhost is an operating system-based approach to protect the stack frame for systems running on the SPARC architecture [FS01]. This method utilizes special SPARC features like the windowed register file and provides a redundant copy of the return address. StackGhost is available as a patch for the OpenBSD kernel.
4.7 Control Flow Checking
Figure 4.1: Control instructions are usually inserted before and after a basic block
(BB1-BB3) in software-based control flow checking approaches. The
additional control flow instructions check the executed control flow ac-
cording to the control flow graph.
and at the end of each block. Checking is done by storing a signature, e.g., the current
address, into a variable at the basic block entrance. Before leaving the basic block,
this signature is verified. The method can verify the subsequent linear execution of
the basic block. However, the processing of the correct basic block order cannot be
verified.
Oh and others introduce a technique called Control Flow Checking by Software
Signatures (CFCSS) [OSM02]. A unique signature is assigned to each basic block
and the signature is embedded with the signature difference to the predecessor block
in the code. During the execution, a runtime signature is calculated and stored in
a general purpose register. The signature of the last block and the stored signature
difference are used to calculate and verify the current runtime signature at each basic
block entrance. In other words, this approach checks if the correct predecessor of
the current basic block, according to the CFG, was processed. However, if a basic
block has more than two predecessors, the method is not applicable. In this case, an
additional runtime variable is introduced to resolve the problem. Borin and others propose an error classification for control flow checking and analyze the error coverage of most existing software-based approaches [BWWA06]. Furthermore, they introduce two methods which are enhancements of the original CFCSS method. The first method is called Edge Control Flow Checking (EdgCF) and the second is called the Region Based Control Flow (RCF) technique.
Other similar approaches are SWIFT [RCV+05] and YACCA [GRRV03, GRRV05]. All these methods insert control instructions at the basic block borders into the program code, as depicted in Figure 4.1.
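To illustrate the principle, the following sketch simulates such signature checking in plain C; the signature values, the single-predecessor assumption, and the explicit enter_block() calls are simplifications, since in the real methods the checking instructions are inserted by the compiler at the basic block borders:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Each basic block BBi carries a compile-time signature s[i] and the XOR
 * difference d[i] = s[pred(i)] ^ s[i] to its (single) predecessor. */
static const uint32_t s[3] = { 0x1, 0x2, 0x4 };
static const uint32_t d[3] = { 0x0, 0x1 ^ 0x2, 0x2 ^ 0x4 };

static uint32_t G;   /* runtime signature register */

static void enter_block(int i)
{
    /* instructions inserted at the entrance of BBi */
    G ^= d[i];
    if (G != s[i]) {
        /* BBi was reached from a block that is not its predecessor in
         * the control flow graph: signal a control flow error */
        fprintf(stderr, "control flow error before BB%d\n", i);
        abort();
    }
}

int main(void)
{
    G = s[0];          /* the program starts in BB0   */
    enter_block(1);    /* BB0 -> BB1: edge in the CFG */
    enter_block(2);    /* BB1 -> BB2: edge in the CFG */
    puts("control flow verified");
    return 0;
}
```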
Bagchi and others introduce Preemptive Control Signature (PECOS) checking, which is able to prevent jumps or branches in case of an error [BLW+01, BKIL03]. The program code is equipped with additional checker instructions before each control flow instruction. The runtime target address is determined and verified against the valid target addresses extracted from the compiled code. The list of valid target addresses is stored inside the code, whereas the runtime target address is determined by loading the control flow instruction into a register and decoding it in software. If the runtime target address is not in the list of valid addresses, an exception is triggered which prevents the execution of the erroneous jump or branch. The drawback of this approach is that only the integrity of the control flow instruction in the memory is checked. Transient faults, such as single event effects in the control path, cannot be detected by this method.
Abadi and others introduce a software-based CFC technique for security purposes called Control Flow Integrity (CFI) [ABEL05, ABEL09]. This method focuses on indirect calls and returns. The destinations of indirect calls and returns are determined at compile-time, and each of these jump destinations is labeled with a unique identifier in the code. Instructions to check the identifier of the destination are added to the program code before an indirect jump. Only if the identifier is correct is the jump executed. Budiu and others present an instruction set architecture (ISA) extension which introduces new instructions for CFI hardware support [BEA06].
or related design errors of the program or the processor due to the diversity of the processor architectures.
Watchdog processors, when used for control flow checking, execute a watchdog program which is derived from the control flow of the checked program. The control flow of a program can be represented as a graph whose nodes represent sequences of code, e.g., basic blocks or whole functions, and whose edges represent the control flow. An identifier, often called a signature, which is known to the watchdog program is attached to each node. The signatures can be assigned at random or they can be derived from the instructions inside a node. Techniques using arbitrarily assigned signatures are called assigned-signature control flow checking, and techniques using derived signatures are called derived-signature control flow checking [MM88]. The different watchdog processor approaches can be further categorized by the storage location of the watchdog signatures. Here, the methods can be divided into two groups, called Embedded Signature Monitoring (ESM) and Autonomous Signature Monitoring (ASM). ESM methods embed the watchdog signatures into the code of the checked program. To verify a signature, the corresponding signature must first be transferred to the watchdog or to the main processor, depending on the comparison location. The watchdog processors of the ASM methods have their own memory to store the signatures. Therefore, the watchdog must be initialized with all watchdog signatures before the program execution. In summary, there exist four categories for control flow checking with watchdog processors (see Table 4.1).
Table 4.1: Four different categories for control flow checking using watchdog processors with some example references. The methods are categorized by the different watchdog signature storage locations (embedded into the program: ESM; in additional memories of the watchdog processor: ASM) and the different types of signatures (derived, assigned), according to [MHPS96].
instruction sequence inside a node. The third category includes schemes which do
both [MM88].
During the execution, the arbitrarily assigned signatures used for assigned-signature CFC are transferred to the watchdog processor for verification. Usually, the transferred signatures are compared by the watchdog processor to the watchdog signatures stored in a separate watchdog memory (ASM method). The advantages of these methods are the ease of implementation and the possibility to perform runtime checks asynchronously. However, the drawbacks are the performance impact, due to the program-based transfer of the signatures to the watchdog with additional control flow instructions, and the low error coverage, since only the sequence of the signatures is checked.
The first known method was introduced by Yau and Chen [YC80]; it assigns prime numbers to loop-free intervals which are checked at runtime. Lu proposes a method called Structural Integrity Checking (SIC) [Lu82]. The method assigns labels to high-level control flow structures which are verified by the watchdog processor. The approach was enhanced by Majzik and Hohl into Extended Structural Integrity Checking (ESIC) [MH91].
An embedded signature monitoring approach for assigned-signature CFC is introduced by Pataricza and others, called Signature Encoded Instruction Stream (SEIS) [PMHH93, MHPS96]. In this approach, each basic block is assigned a unique signature which further encodes the successor basic block. The signatures are transferred to the watchdog processor during the execution, which verifies the control flow of the program using only the information encoded in the signatures. Therefore, the watchdog processor needs neither a signature storage memory nor an initialization phase.
Derived-signature CFC uses a signature calculated from the properties of the executed instructions inside a node. To check all instructions, a signature, e.g., an XOR, hash, or CRC value, of all instructions of a basic block is calculated offline (at compile-time). At runtime, a checker unit calculates the signature of the executed instructions in a basic block. When leaving a basic block, both signatures can be compared and errors inside the basic block can be detected. The derived-signature CFC methods can also be categorized by the storage location of the precalculated (golden) signature into ESM and ASM methods.
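As a minimal illustration of derived-signature checking, the following C sketch computes an XOR signature over the instruction words of a basic block at compile-time and compares it to the value accumulated at runtime; the data types and function names are illustrative and do not reproduce any particular published scheme.

#include <stdint.h>
#include <stddef.h>

/* Offline: golden signature over the instruction words of a basic block. */
uint32_t bb_signature(const uint32_t *instr, size_t count)
{
    uint32_t sig = 0;
    for (size_t k = 0; k < count; k++)
        sig ^= instr[k];
    return sig;
}

/* Online: the checker accumulates the executed instruction words... */
typedef struct {
    uint32_t runtime_sig;
} checker_t;

void checker_reset(checker_t *c)             { c->runtime_sig = 0; }
void checker_step(checker_t *c, uint32_t iw) { c->runtime_sig ^= iw; }

/* ...and compares against the golden value when the block is left. */
int checker_leave(const checker_t *c, uint32_t golden)
{
    return (c->runtime_sig == golden) ? 0 : -1;   /* 0 = ok, -1 = error */
}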
inserted instructions at the end or the beginning of each basic block. During runtime, the calculated signatures from the watchdog processor are compared to these embedded signatures. The advantage of these methods is that all instructions can be checked and a new program already contains the corresponding signatures (see Figure 4.2). The disadvantages are the performance impact and that a fault can only be detected at the end of a basic block, which may be too late. Also, a single event upset during the execution of the additionally inserted instructions can lead to a false detection or spoofing of an error.
embedded into NOP instructions [Gai94]. Meixner and others store the signatures for the Argus-1 checker in unused instruction bits of the SPARC ISA to reduce the performance and memory overhead of their ESM method [MBS07, MBS08]. If insufficient unused bits are available, they also embed the signature into NOP instructions.
Upadhyaya and Ramamurthy propose a derived-signature CFC technique using m-of-n codes [UR94]. An m-of-n code is an n-bit code word containing exactly m ones. At compile-time, the signature of a basic block is calculated, for example, by XORing the instructions. If the intermediate result is an m-of-n code, then this instruction is tagged. At runtime, the watchdog calculates the signature, recognizes the tagged instructions, and verifies at the tagged instructions whether the runtime signature is an m-of-n code. At the basic block borders, an additional byte is inserted which adjusts the current signature to an m-of-n code in order to force a check. The advantage is that the signature does not have to be stored in the program code. However, one additional byte per branch is necessary to force a check in order to restart the runtime signature calculation at a new basic block. A similar approach is presented by Ohlsson and Rimen, called Implicit Signature Checking (ISC) [OR95]. The implicit signatures are the current start addresses of the basic blocks. This can be achieved by using additional justified signatures embedded into the code.
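A small C helper illustrates the m-of-n property used in [UR94]: a word qualifies as an m-of-n code if exactly m of its n bits are set. The parameter choices and function names are illustrative.

#include <stdint.h>

static int popcount32(uint32_t x)
{
    int c = 0;
    while (x) { c += (int)(x & 1u); x >>= 1; }
    return c;
}

/* Returns 1 if the lowest n bits of sig contain exactly m ones. */
int is_m_of_n(uint32_t sig, int m, int n)
{
    uint32_t mask = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return popcount32(sig & mask) == m;
}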
Figure 4.3: In the derived-signature ASM approach, the watchdog processor has
a separate memory for storing the control instructions. The execution
must be synchronized between the CPU and the watchdog processor.
of the program is extracted and the signatures (CRC values) are calculated. During the execution, the runtime signatures are calculated with an additional signature generation unit and, at the end of a block, the calculated signature is sent to the watchdog processor. The watchdog compares the received signature to the signatures obtained from the control flow graph and stored in the watchdog memory. At branches, the received signature is compared with the two successor signatures of the current node in order to determine the next signature. This approach can also be used to check multi-processor systems, where each processor has its own signature generation unit and sends the signature via a signature queue to the watchdog processor which is responsible for the whole system. The approach is extended by Madeira and Silva, who introduce the Online Signature Learning and Checking (OSLC) technique. In this approach, the golden signatures are generated during a learning phase [MS91]. The learned signatures are stored in the watchdog memory of an RMP-like watchdog processor.
Arora and others describe an ASM approach for security applications in [ARRJ06]. This hardware approach consists of three parts: the Inter-Procedural CFC, the Intra-Procedural CFC, and the Instruction Stream Integrity Checker. The Inter-Procedural CFC verifies the function calls and returns by implementing the function call graph in hardware using content addressable memories (CAMs) and an FSM. The Intra-Procedural CFC checks the basic blocks against a compile-time generated control flow graph implemented in checker memories. Finally, the Instruction Stream Integrity Checker is similar to those of other ASM methods; however, hash functions are used to generate and verify the signatures.
Our new control flow checking approach introduced in Section 4.7.3 belongs to the class of derived-signature ASM methods. We propose the term checker unit for the
watchdog processor, because our unit is very simple and causes only little hardware overhead. However, as in the Cerberus-16 or the WDP approach, the control flow graph is mapped into microinstructions which are stored in a separate memory. Further advantages of our control flow checker are the fast error detection due to the tight integration into the processor, the possibility of error recovery, and the expandability with modules which support more control flow instructions or detect more errors. Moreover, like the other ASM methods, we have no performance impact in the error-free case and the program does not have to be altered.
Furthermore, the class of unconditional indirect jumps can be subdivided into returns from subroutine, register indirect calls, and other jumps. A return from subroutine is an example of an indirect jump, because the program counter jumps to the address where the routine was called from, and this address is only known at runtime. Register indirect calls are calls where the address of the called subroutine is determined at runtime. This usually happens in C++ if a virtual function is called.
Finally, jumps which are not triggered by an instruction can occur, such as interrupts and traps. The destinations of interrupts are typically given by the start address of the main interrupt service routine, and so interrupts belong to the class of direct jumps. Traps occur on exception conditions (like divide by zero). Here, the program is redirected to the address of an exception handler, and so traps can be treated as direct jumps, too.
Table 4.2 presents an analysis of the number of these different types of branches and jumps in code compiled for the SPARC V8 [SPA] architecture for a given list of programs from the SPEC CINT2000 benchmark [SPE]. As can be seen, indirect calls and jumps occur relatively rarely as opposed to direct branches and jumps.
Table 4.2: Accumulated number of all and different kinds of control flow instruc-
tions of benchmarks of the SPEC CINT2000 test suite [SPE] when com-
piled to the SPARC V8 [SPA] architecture.
3 Note that conditional indirect branches are not supported by any instruction set architecture that we
know of.
Control Flow Method First, a given compiled machine code is separated into a set of basic blocks (BB). A basic block is a sequence of code which is executed successively without any jumps or branches except, possibly, at the end. A basic block can only be left at its end and can only be entered at its beginning. Only the last instruction can be a jump or branch and only the first instruction can be a jump or branch destination. The following instructions define the beginning of a basic block [TH07]:
• the first instruction in a program or segment,
• the instruction following a control flow instruction,
• the instruction which is the destination of a control flow instruction.
From this information, the control flow graph CFG(BB, T) is built: Each node BBi ∈ BB of the control flow graph represents a basic block. The nodes are sorted by increasing start address of the corresponding basic block. Each edge tj ∈ T represents a transition of the control flow from one basic block to
another. If the last instruction of a basic block BBi is a direct branch instruction, the basic block has two successors: one is the next basic block in the list, BBi+1 (if the branch is not taken), and the other is the basic block whose first instruction is the branch destination (if the branch is taken). Jumps have only one successor, and if the last instruction is not a control flow instruction, the successor basic block is always the next basic block BBi+1. An example program, separated into basic blocks, and the corresponding CFG are shown in Figure 4.4.
Figure 4.4: An example program code is given on the left hand side together with
the corresponding assembler code. The CFIs are denoted A to C, and
the CFI destination addresses a to c. D denotes the end of the program
or segment to be checked. Furthermore, the code is divided into ba-
sic blocks BBi , i = 1, . . . , 6. On the right hand side, the corresponding
control flow graph (CFG) is shown.
With the given CFG, we have all the information needed to check a sequence of program counter values for correctness, leading to the specification of a proper control flow checker unit as follows: The information of the CFG can either be used to directly define a finite state machine (FSM) which checks the correctness of a sequence of control flow instructions. Alternatively, an implementation using micro-instructions of a micro-programmed circuit can be derived from the CFG.
For an implementation of the checker unit as a micro-programmed circuit, the
information of the CFG can be stored inside memories. For each basic block, we
need to store the start and the end address and also the indices of the successor basic
blocks. The start address of each basic block is the end address of the previous basic
block incremented by one. To minimize the memory overhead, we can store only the
end address and a global start address. Also, we only need to store one successor of
a basic block for branches because if the branch is not taken, the basic block with the
next index (BBi+1 ) is always executed.
With these memory overhead improvements, we need three memory items for each
basic block inside the memory:
• One for the basic block end address (addr),
• one for the index of the branch taken successor basic block (suc), and
• a flag (flag) which identifies the type of the last instruction of the basic block.
The flag is needed to choose the right transition to the next basic block (see Figure
4.5). Note that if the last instruction of a basic block is not a CFI, the successor basic
block index (suc) is not needed.
Figure 4.5: Three memory areas are necessary to store required information for
each CFG. In the first column (addr), the address of the last instruc-
tion of the basic block is stored. The successor basic block for a taken
branch is stored in the second column (suc). In the third column (flag),
a flag is stored which identifies the type of the last instruction of a basic
block. An N denotes a non control flow instruction, whereas a B de-
notes a branch. This example memory stores the values for the example
program in Figure 4.4.
The control flow checking algorithm is depicted in C language in Listing 4.1. For
checking the control flow, we need the current program counter (PC) and the follow-
ing program counter (nPC). The algorithm, implemented as a C function, returns 0
if the control flow for the program counter and its successor is correct, and −1 if
the control flow differs from its specification. Further, the index i of the current basic
block and the three memories (addr, suc, and flag) are needed. The function addr(i)
returns the entry with index i of the memory addr.
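Listing 4.1 itself is not reproduced in this excerpt; the following is a minimal C sketch of the described checking behavior. The line numbers cited in the next paragraphs refer to the original listing, not to this sketch, and the flag encoding as well as the handling of the first basic block via a global start address are assumptions.

#include <stdint.h>

#define NUM_BB 6                   /* number of basic blocks (example value) */

typedef enum { FLAG_N, FLAG_B, FLAG_J } bb_flag_t;  /* non-CFI, branch, jump */

static uint32_t addr[NUM_BB];      /* end address of each basic block        */
static uint32_t suc[NUM_BB];       /* index of the branch-taken successor    */
static bb_flag_t flag[NUM_BB];     /* type of the last instruction           */
static uint32_t start_address;     /* global start address of the segment    */

/* Start address of basic block idx: end address of the previous block + 1. */
static uint32_t bb_start(uint32_t idx)
{
    return (idx == 0) ? start_address : addr[idx - 1] + 1;
}

/* Returns 0 if the transition PC -> nPC conforms to the CFG, -1 otherwise.
 * i holds the index of the basic block currently being executed. */
int cf_check(uint32_t PC, uint32_t nPC, uint32_t *i)
{
    if (PC == addr[*i]) {                    /* basic block end reached */
        switch (flag[*i]) {
        case FLAG_J:                         /* unconditional jump or call */
            if (nPC != bb_start(suc[*i]))
                return -1;
            *i = suc[*i];
            return 0;
        case FLAG_B:                         /* conditional branch */
            if (nPC == bb_start(suc[*i])) {  /* branch taken */
                *i = suc[*i];
                return 0;
            }
            if (nPC == PC + 1) {             /* branch not taken */
                *i = *i + 1;
                return 0;
            }
            return -1;
        case FLAG_N:                         /* no CFI at the block end */
            if (nPC != PC + 1)
                return -1;
            *i = *i + 1;
            return 0;
        }
        return -1;
    }
    /* inside a basic block: only linear execution is allowed */
    return (nPC == PC + 1) ? 0 : -1;
}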
By looking up the basic block end address in the addr memory (addr(i)), we know when the basic block end is reached (Line 3). If the basic block end is not reached, the address of the next program counter must be the current address incremented by one (Line 23); if not, an error is detected. If the basic block end is reached, we must distinguish between the different types of the last instruction inside the basic block (Lines 4, 9, and 17). If this is an unconditional jump, like a call, only the jump target must be checked for correctness. The corresponding target address is the start address of the successor basic block, given by its index. To get this address, the end address of the basic block with the previous index is fetched and incremented by one (addr(suc(i) − 1) + 1, Line 5). Furthermore, the current index i must be updated to the successor basic block index (Line 6). If the last basic block instruction is a conditional branch, two possible successor basic blocks exist. If the branch is taken (Line 10), the handling is the same as for an unconditional jump. If the branch is not taken (Line 13), the next program counter value must be the current value incremented by one (nPC == PC + 1). Hence, the next instruction belongs to the successive basic block, and the basic block index i must also be updated (Line 14).
Finally, if the last instruction of a basic block is not a CFI, the checking behavior is the same as for conditional branches where the branch is not taken (Line 18).
A very similar CF method is described in [ARRJ06]. Here, the CFG is implemented in hardware by a finite state machine and a lookup table for resolving the control flow instruction addresses and indices (in memory). The disadvantage of this approach is that the checker unit must be synthesized anew for each program. In our memory-based approach, in contrast, only the contents of the memories need to be reconfigured in order to check a new program.
Control Flow Instruction Method In contrast to the control flow (CF) method, the control flow instruction (CFI) method is based on storing control flow instructions instead of basic blocks. In the case of direct branches and jumps, the start and target addresses are known at compile-time. So, it is possible to extract this information from the binary or the disassembled program code by decoding the instructions. The control flow instructions are then sorted by increasing address.
Then, the control flow instruction graph CFIG(CFI, T ) is built: Here, each control
flow instruction in the code which should be checked represents a node (CFIi ∈ CFI).
Directed edges t j ∈ T of the CFIG denote transitions to the next following control
flow instruction in the given code.
Like in a CFG, each node can have a maximum of two successors: two for a branch instruction and one in the case of a jump instruction. For a branch instruction CFIi, one successor is CFIi+1 (branch is not taken). The other successor of a direct branch or jump instruction is CFIn, which is the next control flow instruction in the program code after the branch destination (branch is taken). The CFIG of the example program code from Figure 4.4 is shown on the left side of Figure 4.6. Note that D is not a CFI; rather, it refers to the end of the checking segment or function.
Like in the CF method, the information of the CFIG can be used as a specification
of the correct branching behavior inside a control flow checker unit and implemented
either directly by an FSM or by micro-instructions of a micro-programmed circuit. In
case of a micro-programmed circuit implementation, we store for each CFI the start
and the target address in memory (addr and target in Figure 4.6). Also, the index of
the successor CFI must be stored inside this memory (suc in Figure 4.6). For direct
branches, we store the successor CFI for taken branches. If the branch is not taken,
the successor CFI is CFIi+1. Finally, we need a flag (flag) to distinguish between
the different CFI types.
A proper control flow instruction checking algorithm is shown in Listing 4.2. Like the CF algorithm, it takes the current program counter PC and the next program counter nPC as inputs, and its output is 0 in case of a correct control flow, or −1 in case of an error. The checking algorithm needs the four memory columns introduced in
Figure 4.6: For the example program code in Figure 4.4, the corresponding CFIG is
shown on the left hand side. The nodes correspond to the control flow
instructions, whereas the edges denote transitions. On the right hand
side, the four memory areas are shown which are necessary to store the
CFIG. In the addr memory, the address of the CFI is stored and the
corresponding target address is stored in the target column. In the third
column (suc), the successor CFI index is stored. Finally, the kind of
instruction is stored in the last column (flag).
Figure 4.6 and the index i, which denotes the next CFI from the current program flow
position.
The algorithm is quite similar to the CF method, the differences being the access to the jump or branch targets and the missing check of basic block ends with a non-CFI. In Line 3, we check if the currently executed instruction is a CFI. If it is not, the linear successive program flow is checked (Line 18). If the current program counter references a CFI, we must also distinguish between the different types of CFIs (Lines 4 and 9). If the CFI is an unconditional jump, the next program counter should be the value stored in the target memory column (target, Line 5). Also, we must update the index i to the index of the successive CFI (suc(i), Line 6). If the current CFI is a conditional branch, we must check if the branch is taken or not (Lines 10 and 13). If the branch is taken, the same checking strategy as in the case of unconditional jumps is used. If the branch is not taken, the next program counter must be the current one, incremented by one (Line 14). In both cases, the index i must be updated accordingly.
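As with Listing 4.1, the original Listing 4.2 is not reproduced here; the following C sketch reconstructs the described behavior. The cited line numbers refer to the original listing, and the flag encoding is an assumption.

#include <stdint.h>

#define NUM_CFI 4                    /* number of CFIs (example value) */

typedef enum { CFI_JUMP, CFI_BRANCH } cfi_flag_t;

static uint32_t cfi_addr[NUM_CFI];   /* address of the CFI                */
static uint32_t cfi_target[NUM_CFI]; /* branch or jump target address     */
static uint32_t cfi_suc[NUM_CFI];    /* index of the taken-path successor */
static cfi_flag_t cfi_flag[NUM_CFI]; /* type of the CFI                   */

/* Returns 0 if the transition PC -> nPC is correct, -1 otherwise.
 * i denotes the index of the next CFI expected in the program flow. */
int cfi_check(uint32_t PC, uint32_t nPC, uint32_t *i)
{
    if (PC != cfi_addr[*i])               /* current instruction is no CFI */
        return (nPC == PC + 1) ? 0 : -1;  /* linear successive flow        */

    if (cfi_flag[*i] == CFI_JUMP) {       /* unconditional jump or call */
        if (nPC != cfi_target[*i])
            return -1;
        *i = cfi_suc[*i];
        return 0;
    }
    /* conditional branch */
    if (nPC == cfi_target[*i]) {          /* branch taken */
        *i = cfi_suc[*i];
        return 0;
    }
    if (nPC == PC + 1) {                  /* branch not taken */
        *i = *i + 1;
        return 0;
    }
    return -1;
}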
successor block. In the CFI method, we need to store two addresses (control flow instruction address addr and target address target) and the index of the successor block for each CFIG node. Both methods also need bits to store the flags for distinguishing the different CFI or basic block types. Usually, the index needs fewer bits than the addresses of instructions, so the CF method uses less memory than the CFI method for this example.
For measuring the memory overhead for standard user programs, we use the programs from the SPEC CINT2000 [SPE] benchmark in the following (see Section 4.7.3). Table 4.3 shows the memory overhead caused by implementing the CF and the CFI method for the SPEC CINT2000 benchmark when compiled to the 32-bit SPARC V8 [SPA] architecture. The smallest possible index bit width is chosen for the given program to calculate the memory overhead in bits.
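The overhead numbers can be approximated with a simple model derived from the memory layouts above; the following C sketch assumes 32-bit instruction addresses and a 2-bit flag, which are assumptions rather than the exact values used for Table 4.3, and the program sizes in the example are hypothetical.

#include <stdio.h>

/* smallest index width able to address the given number of entries */
static unsigned index_width(unsigned entries)
{
    unsigned w = 0;
    while ((1u << w) < entries)
        w++;
    return w;
}

/* CF method: end address + successor index + flag per basic block. */
static unsigned long cf_overhead_bits(unsigned num_bb, unsigned addr_bits, unsigned flag_bits)
{
    return (unsigned long)num_bb * (addr_bits + index_width(num_bb) + flag_bits);
}

/* CFI method: CFI address + target address + successor index + flag per CFI. */
static unsigned long cfi_overhead_bits(unsigned num_cfi, unsigned addr_bits, unsigned flag_bits)
{
    return (unsigned long)num_cfi * (2 * addr_bits + index_width(num_cfi) + flag_bits);
}

int main(void)
{
    /* hypothetical program with 12,000 basic blocks and 10,000 CFIs */
    printf("CF:  %lu bits\n", cf_overhead_bits(12000, 32, 2));
    printf("CFI: %lu bits\n", cfi_overhead_bits(10000, 32, 2));
    return 0;
}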
Also, the memory overhead of the checking methods is compared with the memory usage of the test programs. The number of instructions of the test programs is presented in Table 4.2. On the SPARC V8 architecture, each instruction needs 32 bits of memory space. The additional memory overhead of the checker methods is shown in absolute values and as a percentage of the memory usage of the test program in Table 4.3.
The results in Table 4.3 show that the CF method usually produces a lower memory overhead than the CFI method, typically in a range of less than 20%. Note that
Table 4.3: Required memory overhead of the programs of the SPEC CINT2000
benchmark in bits for the CF and CFI method. Also, the number of basic
blocks and control flow instructions, and the corresponding index width
is shown. The memory overhead is shown in absolute values and as a percentage of the memory usage of the corresponding test program.
the shown overhead is for checking the whole program with all subroutines, which is not always the best choice. By restricting the checking to only a few subroutines which are executed very often and should have a high reliability and security, the memory overhead can be significantly reduced.
Conclusions Both introduced methods can only check direct branches and jumps, where start and destination addresses can be extracted from the compiled code.
The advantage of the CF method is that in most cases fewer additional memory resources are needed than for the CFI method. The disadvantage of the CF method is that the memory handling is more difficult. On many processor architectures, the fastest execution of one instruction is one clock cycle. Consider Algorithm 4.1, where we need to access the addr memory twice for each control flow instruction, once for the end address of the basic block (Line 3) and once for the start address of the successor basic block (Lines 5 and 10). To achieve this in a single clock cycle, we need a dual-port memory, which is more expensive than a single-port memory. Furthermore, for the second access to the memory, we first need the successor index from the suc memory. Doing both memory accesses in one clock cycle is nearly impossible on high-clocked processors. Furthermore, the access to the suc memory cannot be scheduled one clock cycle earlier, because if the current basic block consists of only one instruction and the previous basic block ends with a branch instruction, the current index i depends on the result of the executed branch (taken or not). This shows that we need at least two clock cycles to check a transition in the CFG. To ensure that at a basic block end the correct start and destination addresses are available, we might pre-read both values. This can also be done with a single-ported addr memory. In the first clock cycle, the basic block end address is read from the addr memory and the successor basic block index is read from the suc memory. In the second clock cycle, the target address is read from the addr memory. But this pre-read can only be done if the basic block consists of more than one instruction. If a basic block consists
of only one instruction, we must stall the processor pipeline to verify the control flow instruction in order to prevent possible erroneous behavior. Fortunately, basic blocks with only one instruction are very rare.
The CFI method, on the other hand, requires only one access per memory. In Line 3 of Algorithm 4.2, we access the addr memory to get the next CFI address. In the same clock cycle, we can access the target memory to fetch the correct destination address (Lines 5 and 10). With the CFI method, it is therefore possible to check a transition in the CFI graph within a single clock cycle, so the CFI method has no execution time overhead at all.
The advantages of the CFI method are that the checker unit is very simple and uses only few logic resources. Also, we have no performance impact, because the correct control flow instruction address and target address can be loaded from the memory in a single clock cycle. The disadvantages are that usually more memory resources are needed than for the CF method and that we are not able to check the integrity of non control flow instructions.
Finally, both introduced concepts for control flow checking have the big advantage over [ARRJ06] of being reprogrammable. Only the memory of the control flow checker unit needs to be reprogrammed in order to check a different program; no adjustments of the hardware are necessary. Moreover, unlike the software-based methods, we have no performance impact for verifying the control flow.
Indirect jumps are also used in jump tables to efficiently implement switch case clauses. Here, the alternative case targets are assembled in a jump table which is indexed by the previously calculated operand. Furthermore, the targets may be direct jumps which lead the control flow to the desired code segment (see Figure 4.7). Another way to use a jump table is to call different functions depending on an input. Here, the alternative function pointers are stored inside a jump table, whereas the index into the table is calculated from the input value. The address of the desired function is fetched from the table and is called with an indirect jump. Note that jump tables are not often used by compilers. Usually, switch case clauses are translated into an if .. else if tree. But, depending on the compiler and optimization parameters, indirect jumps might nevertheless occur. Indirect jumps which result from jump table implementations are listed in Table 4.2 under the category “other jumps”.
Finally, indirect jumps are also used as indirect calls (see Table 4.2). During an indirect call, the address of the target function is loaded into a register and the function is called with an indirect jump. This occurs mainly in object-oriented programming languages that support function pointers and virtual functions. However, functions called via a jump table also use indirect calls.
The methods for checking indirect jumps that will be described in the following can be categorized into methods using information which is gathered by analysis or simulation at compile-time and methods which only use runtime information.
Methods using runtime information do not need information from the compiled code. Here, we monitor the control flow at runtime to decide whether the execution of the indirect jump is correct or not.
Most indirect jumps are returns from subroutine (see Table 4.2). By executing the return from subroutine instruction, the program counter jumps to the next address after the instruction where the subroutine was called. The return address is typically stored in a register inside the CPU, so the return instruction is a special indirect jump. Returns can be verified by implementing an additional hardware stack [KE91]. On a call (direct or indirect), the return address is stored on the stack, and when the return instruction is executed, the back-jump can be verified.
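A minimal C model of such a hardware return-address stack is sketched below; the stack depth and interface names are illustrative and not taken from [KE91].

#include <stdint.h>

#define RAS_DEPTH 64

static uint32_t ras[RAS_DEPTH];
static unsigned ras_top = 0;

/* executed in parallel to a (direct or indirect) call instruction */
void ras_on_call(uint32_t return_address)
{
    if (ras_top < RAS_DEPTH)
        ras[ras_top++] = return_address;
}

/* executed on a return instruction; returns 0 if the back-jump matches the
 * recorded return address, -1 otherwise */
int ras_on_return(uint32_t jump_target)
{
    if (ras_top == 0)
        return -1;                   /* return without matching call */
    return (ras[--ras_top] == jump_target) ? 0 : -1;
}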
Furthermore, indirect branch prediction units can be used to evaluate an indirect jump address. Branch prediction is used in pipelined processors to avoid pipeline stalls on branches. A prediction is made whether a branch will be taken or not, and the next instructions are fetched according to the prediction. If the prediction was right, no stall occurs; if not, the pipeline must be stalled and the right instructions must be fetched.
Indirect branch prediction units predict the destinations of indirect jumps. The predictions are made based on the jump behavior in the past [CHP97, SFF+02, JMKP07]. The result of an indirect branch prediction unit might be used to evaluate how reliable the jump destination is. If the prediction is correct, then the probability that this jump is correct is high, but if the prediction is incorrect, the jump destination could be false. A non-predicted indirect jump target has a lower trustworthiness. With this method, no exact statement can be made, but, for example, a higher-level autonomic operating system can take this confidence information about the jump into account to increase the reliability of the whole system.
• The CPU can continue executing the code at a lower reliability level.
If an error in the control flow occurs, the faulty instruction might be re-executed as follows: The error should be detected fast enough to ensure that the state of the CPU is not altered by the erroneous instruction execution. To guarantee this, a possible checker unit must monitor the program counter in the first pipeline stage of a given RISC CPU. Unfortunately, in most architectures, the jump or branch instructions need more than one cycle to execute. So, until the error is detected, some other instructions after the jump might already have been executed. After error detection, the program counter is reset to a value previous to the error by looping back the program counter value from a subsequent pipeline stage or by a calculated value from the checker unit. The details of the re-execution process depend highly on the processor architecture and design.
For example, the SPARC V8 architecture allows one instruction after a branch instruction, or two instructions after a jump instruction, to be executed before the branch or jump is performed (see the SPARC Architecture Manual [SPA]). If an error is detected and the jump or branch instruction must be re-executed, these following instructions must be re-executed as well. It must also be ensured that these instructions cannot alter the state (e.g., register contents or memory operations) of the CPU before re-execution. If the retry also fails, the instruction cache can be invalidated to ensure that on the next re-execution, the instructions are transferred again from the memory. If a predefined number of retries fails, the checker unit can lead the CPU into a secure state. Also, the number of retries can be reported to the operating system to indicate how reliable the CPU is.
Another possibility to react in the case of an error is to transfer the CPU into a secure state. This state can be the reset state or any other state up to which the program was executed correctly. After reaching this state, the operating system can initialize the CPU with correct data and the CPU can start to execute from this clean state. The invalidation of the data resulting from the erroneous task can also be done by the operating system.
However, the CPU might continue executing the code at a lower reliability level with a deactivated checker if the task has a low reliability requirement or is further checked by another process.
In all cases, the operating system should be informed about the error and should update its internal reliability state of the CPU. If many errors occur, the CPU should only be allowed to execute tasks with low reliability requirements or unimportant tasks, or it should finally be excluded by the dispatcher and shut down.
Additional Reading
In [ZT08a, ZT09, Zie10, SBE+ 07b, SBE+ 07a, MWB+ 10], more information about
this control flow checker unit can be found.
IP Protection
5
Intellectual property (IP) denotes the absolute right to an intangible asset, like music, literature, artistic works, discoveries, inventions, words, phrases, symbols, designs, software, or IP cores. The owner of the IP can license his work to other people or companies. IPs are protected by law with patents, copyrights, trademarks, and industrial design rights.
Drimer defines the following protection or defense categories against IP theft or fraud [Dri09]: social, reactive, and active protection.
Social protection means that IP works are protected by laws, non-disclosure agreements, copyrights, trademarks, patents, contracts, and so on. The deterrents are conviction by a court of law and the loss of a good reputation. However, these deterrents are only effective if the misconduct can be proven and the appropriate laws exist. Furthermore, the laws must be enforced, which is handled differently from country to country.
Reactive protection means that the theft or fraud cannot be prevented; however, it can be detected and delivers evidence of the misconduct. Some reactive protection mechanisms deliver only suspicious facts which, however, may be enough to trigger further investigations. Furthermore, the presence of reactive protection mechanisms might deter would-be attackers.
Active protection means that physical or cryptographic mechanisms prevent the theft or fraudulent usage of the protected work. This category has the highest degree of deterrence. However, these mechanisms can be broken by attacks. Often the attack can be proven if the misconduct is detected.
In this work, we concentrate on the protection of the IP of hardware cores. These so-called IP cores are distributed like software and can easily be copied. Some core suppliers encrypt their cores and deliver special development tools which can handle encrypted cores. The disadvantage is that common tools cannot handle encrypted cores and that the shipped tools can be cracked so that unlicensed cores can be processed. Another approach is to hide a signature in the core, a so-called watermark, which can be used as a reactive proof of the original ownership. There exist many concepts and approaches for integrating a watermark into a core.
In general, hiding a unique signature in user data, such as pictures, video, audio, text, program code, or IP cores, is called watermarking. Embedding a watermark
into multimedia data is achieved by altering the data slightly at points where the human sense organs have a low perception sensitivity. For example, when coding an audio sequence into an MP3 file, one can remove frequencies which cannot be perceived by the human ear. It is thus possible to hide a signature in these frequencies without decreasing the quality of the coded audio sequence [BTH96].
One problem of watermarking is that for verification, the existence and the charac-
teristic of a watermark must be disclosed, which enables possible attackers to remove
the watermark. To overcome this obstacle, Adelsbach and others [ARS04] and Li and
others [LC06] presented so-called zero-knowledge watermark schemes which enable
the detection of the watermark without disclosing relevant information.
The watermarking of IP cores is different from multimedia watermarking, because the user data, which represents the circuit, must not be altered since functional correctness must be preserved. A fingerprint denotes a watermark which is varied for individual copies of a core. This technique can be used to identify individual authorized users. In case of an unauthorized copy, the user to whom the copied source belongs can be identified and the copyright infringement may be reconstructed. Watermarking procedures can be categorized into two groups of methods: additive methods and constraint-based methods.
In additive methods, the signature is added to the functional core, for example, by
using unused lookup-tables in an FPGA [LMSP98]. The constraint-based methods
were originally introduced by [KLMS+ 01] and restrict the solution space of an op-
timization algorithm by setting additional constraints which are used to encode the
signature.
A survey and analysis of watermarking techniques in the context of IP cores is provided by Abdel-Hamid and others [AHTA04]. Further, we refer to our own survey of watermarking techniques for FPGA designs [ZT05]. A survey of security topics for FPGAs is given by Drimer [Dri09], who also maintains the FPGA design security bibliography website: http://www.cl.cam.ac.uk/~sd410/fpgasec/.
In order to compare different watermarking strategies, some criteria are defined in
the following [HP99]:
Verifiability: The watermark should be embedded in such a way that the authorship can be verified easily. It should be possible to read out the watermark with only the given product and without any further information from the design flow, which would have to be requested from the company suspected of IP fraud.
Strong proof of authorship: The watermark should identify the author with a strong proof. It should be impossible for other persons to claim ownership of the core. The watermarking procedure must be resistant against tampering.
In this section, we first discuss IP protection methods using core encryption. After that, related work using additive and constraint-based watermarking methods is presented.
encrypted data, which means that the EDA tool routines must be protected against read-out attacks.
The problem is that no consistent industry standard for handling encrypted IP cores exists [Dau06]. This complicates the interoperability of IP cores and EDA tools.
Today, symmetric and asymmetric cryptographic approaches are used. With symmetric cryptographic approaches, the en- and decryption is done with the same key. The advantage of this approach is the reduced computational complexity compared to asymmetric approaches. One problem is the secure distribution and communication of the key. Furthermore, EDA tools must deal with different keys for different IP vendors, and if one key is cracked, usually all IP cores of the corresponding vendor lose their protection. Nevertheless, this approach is used, for example, by Xilinx to encrypt some of their parameterizable HDL IP cores, e.g., the Microblaze processor softcore [Xild].
Methods using asymmetric cryptography, also known as public key cryptography, need two keys, a private and a public key. The private key is used for decryption inside the EDA tools, whereas the encryption key is publicly available and is used by the IP core vendor. The EDA vendor creates the key pair and embeds the private key in his tools. The IP core developer can then use the public key for the encryption. The advantage is that the private decryption key does not have to be transferred over untrusted communication channels and is only known by the EDA vendor. The disadvantage is that asymmetric approaches have a high computational complexity, which results in long decryption runtimes of up to several hours for IP cores [Dau06]. Another drawback is that the IP vendor must create a separate version for each EDA tool, encrypted with the corresponding public key of the EDA vendor.
Dauman, Vice President of Synopsys’ Synplicity Business Group, introduced a hybrid approach [Dau06]. The IP core is encrypted with a symmetric cryptographic method, like Triple-DES or AES, using a key which is generated by the IP vendor. This key, referred to as the data key in the following, is encrypted with an asymmetric cryptographic method, like RSA [RSA78], using the public key of the EDA vendor. This approach is similar to the PGP approach [Zim95] for cryptographic privacy and authentication of messages. The decryption is done with the (decrypted) data key and the cryptographic method which is specified by the IP vendor. Inside the EDA tools, different symmetric cryptographic routines exist for the decryption of the core. The advantage is that the decryption with a symmetric algorithm is very fast and the computationally complex asymmetric method is only used for the data key, which is very small compared to the whole IP core. Synplicity suggested this approach as a future industry standard and includes this method, called ReadyIP, in the product Synplify Premier [Syn].
In 2007, an industry-wide panel discussion [Wil07] provided some insight into the perception of encrypted IP cores in the EDA industry. The conclusion was that the current social-based protection works well for large corporations. A better solution
is desirable but not necessarily urgent. However, they expressed reservations towards small companies or startups which are not known in the community and might not be willing to sell IP cores to such companies.
Barrick argues against the usage of encrypted netlist cores due to their hidden costs [Bar]. The disadvantages are the fixed constraints, the inability to reuse parts of the logic for other cores, slower simulation speed or inaccurate behavioral models, the restricted choice of EDA tools, and fewer debugging possibilities. However, sometimes encryption can be worthwhile due to reduced acquisition costs.
Upon every FPGA configuration during the power-up cycle, the bitfile is loaded and decrypted inside the FPGA. The advantage is that the key never leaves the FPGA.
Bossuet and others [BGB06] propose a method for using partial reconfiguration for the en- and decryption of FPGA bitfiles by user-defined soft cores. At power-up, the decryption core is initially loaded from the PROM and decrypts the bitfile with the user logic. Soudan and others [SAH] propose a method for the encryption of partially reconfigurable bitfiles using device-specific keys.
The disadvantage of these approaches is the usage of ports for signature verifica-
tion. This works only if the ports are reachable. If the core is embedded into other
cores, the ports of the watermarked core can be altered which falsifies or prevents
the detection of the signature in the output stream. This applies also to the signature
extraction sequence in the input stream.
from the original core, the ownership of the core can be proven by evaluating a de-
tector function D(IB , IL1 ).
Identifying the Core After the extraction of the contents of the lookup tables from a bitfile, we can compare the obtained values with the information in the netlist. The extraction of all lookup table contents from a bitfile is done as described in [Zie10]: LB(IB) = {xB1, xB2, . . . , xBq}. The contents of the lookup tables can easily be read out from a netlist file: LL(IL1) = {xL1, xL2, . . . , xLr}. For example, in an EDIF netlist for Xilinx FPGA devices, the lookup table contents appear after the INIT property of the lookup table instances. Unfortunately, the mapping tools do not necessarily keep these values unchanged. The mapping tool may merge lookup tables from different cores, convert one-, two-, or three-input lookup tables into four-input lookup tables, and permute the inputs to achieve a better routing.
All lookup tables of an FPGA have nl inputs. On most FPGA architectures, lookup tables have nl = 4 inputs. In a core netlist, lookup tables with fewer than nl inputs may also exist. These lookup tables must be mapped onto nl-input lookup tables. If one input is unused, only half of the memory is needed to store the function and the remaining space must be filled. In the case that a function uses fewer inputs than the underlying technology of the FPGA provides, it is desirable to turn the unused inputs into don't cares. Intuitively, this can be achieved rather easily by replicating the function table, as demonstrated in Figure 5.1.
Figure 5.1: Converting a two input lookup table into a three input lookup table with
unused input i2 .
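The replication step of Figure 5.1 can be expressed compactly; the following C sketch duplicates an n-input truth table so that one additional input becomes a don't care. The bit-vector representation (bit x holds the output for input combination x) is an assumption.

#include <stdint.h>

/* Expand an n-input truth table (2^n bits) to an (n+1)-input table
 * (2^(n+1) bits) in which the new, most significant input is a don't care. */
uint32_t lut_replicate(uint32_t table, unsigned n_inputs)
{
    uint32_t size = 1u << n_inputs;                 /* number of entries   */
    uint32_t mask = (size < 32) ? ((1u << size) - 1u) : 0xFFFFFFFFu;
    uint32_t low  = table & mask;
    return low | (low << size);                     /* duplicate the table */
}

For the two-input AND function (truth table 0x8), lut_replicate(0x8, 2) yields 0x88, a three-input table in which the added input has no influence on the output.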
The mapping tool can permute the inputs of the lookup tables, for example, to
achieve a better routing. In most FPGA architectures, the routing resources for
lookup table inputs are not equal, and so a permutation of the lookup table inputs
can lower the amount of used routing resources. Permutation of the inputs significantly alters the content of a lookup table. For nl inputs, nl! permutations exist and thus up to nl! different lookup table values for one so-called unique function. To compare the contents of a lookup table from the netlist and the bitfile, it must be checked whether one of these possible lookup table values for one unique function is equal to the value of the lookup table in the bitfile. This is done by creating a table with all possible values of lookup tables for all unique functions (see Figure 5.2).
Figure 5.2: Before the lookup table contents of the bitfile and the netlist are com-
pared, they are mapped into unique functions.
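One way to implement this comparison is to map every lookup table content to a canonical representative of its unique function; the following C sketch enumerates all 4! = 24 input permutations of a 16-bit lookup table content and keeps the numerically smallest result. This canonicalization strategy is one possible realization, not necessarily the one used by the referenced tools.

#include <stdint.h>

/* all 24 permutations of the four input indices */
static const int perms[24][4] = {
    {0,1,2,3},{0,1,3,2},{0,2,1,3},{0,2,3,1},{0,3,1,2},{0,3,2,1},
    {1,0,2,3},{1,0,3,2},{1,2,0,3},{1,2,3,0},{1,3,0,2},{1,3,2,0},
    {2,0,1,3},{2,0,3,1},{2,1,0,3},{2,1,3,0},{2,3,0,1},{2,3,1,0},
    {3,0,1,2},{3,0,2,1},{3,1,0,2},{3,1,2,0},{3,2,0,1},{3,2,1,0}
};

/* Apply one input permutation p to a 16-bit LUT content. */
static uint16_t lut_permute(uint16_t content, const int p[4])
{
    uint16_t out = 0;
    for (int x = 0; x < 16; x++) {
        int src = 0;
        for (int b = 0; b < 4; b++)        /* move input bit b to position p[b] */
            if (x & (1 << b))
                src |= 1 << p[b];
        if (content & (1 << src))
            out |= (uint16_t)(1 << x);
    }
    return out;
}

/* Canonical representative: minimum over all input permutations. */
uint16_t lut_canonical(uint16_t content)
{
    uint16_t best = lut_permute(content, perms[0]);
    for (int i = 1; i < 24; i++) {
        uint16_t c = lut_permute(content, perms[i]);
        if (c < best)
            best = c;
    }
    return best;
}

Two lookup table contents then implement the same unique function exactly when their canonical representatives are equal.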
After watermarking bitfile cores, we now watermark netlist cores. Netlist IP cores consist of primitive cells (e.g., LUT4, DFF, XORCY) of a certain FPGA family which covers many different FPGA devices, for example, the whole Xilinx Virtex-4 or Altera Stratix-II family with all different FPGA sizes. This means that one netlist core can be deployed for the whole family without changing the file. Once again, we use the Virtex-II and Virtex-II Pro families to demonstrate this approach. However, using other FPGA families should also be possible by adapting the methods to their primitive cells. Another big advantage of netlist cores over bitfile cores is that the bitfile creator (e.g., the product developer) can combine different cores.
As mentioned before in Section 5.2.2, FPGAs usually consist of one type of lookup table with regard to the number of inputs. For example, the Xilinx Virtex-II uses lookup tables with four inputs, whereas the Virtex-5 has lookup tables with six inputs. However, in common netlist cores, many logical lookup tables exist which have fewer inputs than the FPGA type provides. These lookup tables are mapped to the physical lookup tables of the FPGA. If the logical lookup table of the netlist core has fewer inputs than the physical one, the memory space which cannot be addressed remains unused. We use this memory space to embed a watermark into functional lookup tables.
One problem of watermarking netlist cores is that the core further traverses the design flow, which includes different optimization steps. Additive watermarking methods which use redundant structures or logic as a watermark have the problem that the global optimization steps may detect and remove this redundancy. Today's design tools are very sophisticated at finding redundant logic in a design. Even if a special redundant logic which can be used as a watermark is not removed by today's tools, it is not guaranteed that future versions or other tools will not detect and remove this logic. The challenge is to find an element or component which can be used as a watermark and is not altered by the design tools. For Xilinx FPGAs, such elements are shift registers and memories which are implemented in lookup tables.
In some FPGA architectures (e.g., all Xilinx Virtex architectures), the lookup tables (LUTs) can also be used as shift registers or distributed memory [Xilf]. For example, a 4-input lookup table can also be used as a 16-bit shift register (see Figure 5.3). The content of such a shift register can furthermore be addressed by the lookup table input ports, so the shift register can also be used as a functional lookup table. If the lookup table is used as a LUT primitive cell, the content is interpreted as logic by the design tools and is subject to optimization. However, if the same content is used as a shift register or memory primitive cell, the design tools do not touch the content. Using the unused memory space of functional lookup tables for watermarking without converting the lookup table either to a shift register or to distributed memory turns out to be not applicable, because the design flow tools identify the watermark as redundant and remove the content during optimization. Converting the watermarked
functional lookup tables into shift registers or memory cells prevents the watermark from being deleted by optimization.
Figure 5.3: In the Xilinx Virtex architecture, a lookup table (LUT4) can also be
configured as a 16-bit shift register (SRL16).
Embedding of the Watermark In this approach, we use Virtex-II Pro FPGAs and convert LUT1, LUT2, or LUT3 primitive cells, which can be found in netlists of IP cores, into the shift register primitive cell SRLC16E. Note that LUT1 has one input, LUT2 two, and so on. LUT4 has four inputs and uses the whole lookup table memory for its function, which makes this type uninteresting for our approach. The physical 4-input lookup table in an FPGA stores 16 bits. A LUT3 primitive cell uses only 8 bits, a LUT2 4 bits, and a LUT1 only 2 bits out of the 16 bits. The Xilinx mapping tool map duplicates the used memory area to the unused area if not all inputs are needed (see Section 5.2.2). Therefore, to use the unused memory space for embedding a watermark, we must restrict the memory reachability of the function by clamping the unused inputs to constant values. In Figure 5.4, we demonstrate this idea for an AND function implemented by a LUT2. By clamping inputs A3 and A4 to zero, we free 12 bits which can be used for carrying a watermark.
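The following C sketch shows how the 16-bit initialization value of such a watermarked cell could be assembled for a former LUT2 function: the four addressable entries keep the original truth table, and the remaining twelve bits carry a 4-bit ordering counter and eight signature bits (both introduced further below). The exact bit placement inside the real Xilinx INIT encoding is device-specific and not modeled here.

#include <stdint.h>

/* Assemble the INIT value of a watermarked SRLC16E that replaces a LUT2
 * (inputs A1, A2 used; A3, A4 clamped to zero). Bits 0..3 keep the original
 * function, bits 4..7 hold the ordering counter, bits 8..15 the signature. */
uint16_t srl_init_lut2(uint8_t func4, uint8_t counter4, uint8_t sig8)
{
    uint16_t init = 0;
    init |= (uint16_t)(func4 & 0x0Fu);           /* entries with A3 = A4 = 0 */
    init |= (uint16_t)(counter4 & 0x0Fu) << 4;   /* counter for reordering   */
    init |= (uint16_t)sig8 << 8;                 /* watermark payload        */
    return init;
}

For the AND example of Figure 5.4, func4 would be 0x8.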
Another problem of watermarking netlist cores is that the published core is combined with other cores and undergoes further design flow steps, like the placement of the lookup tables in the FPGA. Therefore, at the extraction of the watermark, we do not know the locations of the watermarks. To reduce the effort for identifying the watermarks after the extraction, we can cascade the watermarks over the shift-in (D) and shift-out (Q15) ports of the shift register cells. We assume that the design tools place these chains of watermarks close together, which greatly simplifies the extraction of the watermarks. Furthermore, for rebuilding the watermark from the individually extracted watermarked lookup tables, the sequence is important. To bring the different watermarks, which furthermore have different sizes depending on the original functional lookup table cell used, into the right order, we concatenate the watermark bits
[Figure 5.4 shows the truth table of the four-input lookup table implementing O <= A1 and A2 with A3 and A4 tied to '0': only the four entries with A3 = A4 = 0 define the AND function, while the remaining twelve entries are free.]
Figure 5.4: Example of implementing a two input AND gate using a four input
lookup table. Addressable storage is restricted by connecting the unused
inputs to zero [SZT08].
with a counter. Due to limited space, only a few counter bits can be used, which results in a repetition of the counter values. Nevertheless, this method further simplifies the detection and extraction of the watermark during the verification process.
The first step of embedding a watermark is to extract all lookup tables from a given netlist core IL: LL(IL) = {lutL1, lutL2, . . . , lutLr}, where L denotes the logic abstraction level used for netlist cores (see Figure 5.5). Each element lutLi denotes a lookup table primitive cell in the netlist (e.g., for Virtex-II devices, LUT1, LUT2, LUT3, or LUT4). A watermark generator GL(·, ·) must know the different lookup table cells with their functional content as well as the unique key K to generate the watermarks: GL(K, LL(IL)) = WL.
From the unique key K, a secure pseudo-random sequence is generated. Some or all of the extracted lookup table primitive cells are chosen to carry a watermark. Usually, a core which is worth being watermarked consists of many markable lookup tables. Transforming all of these lookup tables into shift registers restricts the optimization freedom of the tools and results in non-optimal timing behavior. Therefore, only a small subset of all suitable lookup tables is chosen. Note that the shift registers must never be shifted, because this would alter their functional part. Nevertheless, we connect the clock input to the clock, but the shift enable input to ground. Now, the transformed shift registers are ordered and the first 4 bits of the free space are used for the counter value. The other bits are initialized, according to the position, with values from the pseudo-random stream generated from the key K. Note that the number of bits which can be used for the random stream depends on the original functional lookup table type.
The generated watermark WL consists of the transformed shift registers: WL = {srlL1, srlL2, . . . , srlLk} with k ≤ r.
Figure 5.5: The netlist core watermarking system. In the embedding system, the lookup tables are extracted from the netlist core; the watermark generator selects suitable lookup tables, transforms them into shift registers, and adds the watermark. The embedder inserts the watermark. A product developer may obtain this watermarked netlist core and combine it with other cores into a product. The lookup tables can be extracted from the product and transformed so that the detector can decide whether the watermark is present or not.
The watermark embedder EL inserts the watermarks into the netlist core IL by replacing the corresponding original functional lookup tables with the shift registers: EL(IL, WL) = ĨL. The watermarked work ĨL can now be published and sold.
Extraction of the Watermark The purchased core ĨL can now be combined by a product developer with other purchased or self-developed cores and implemented into an FPGA bitfile: ÎB = TL→B(ĨL ◦ I′L1 ◦ I′L2 ◦ . . . ) (see Figure 5.5). An FPGA which is programmed with this bitfile ÎB may be part of a product. If the product developer is accused of using an unlicensed core, the product can be purchased and the bitfile can be read out, e.g., by wiretapping. The lookup table contents and the contents of the shift registers can be extracted from the bitfile: LB(ÎB) = {x̂B1, x̂B2, . . . , x̂Bq}. The lookup table or shift register elements x̂Bi belong to the device abstraction level B. The representation can differ from the representation of the same content at the logic abstraction level L. For example, in Xilinx Virtex-II FPGAs, the encoding of the shift registers differs from the encoding of the lookup tables: for shift registers, the bit order is reversed compared to the lookup table encoding. Therefore, the bitfile elements must be transferred to the logic level by the corresponding decoding. This can be done by the reverse engineering operator: TL←B(LB(ÎB)) = {x̂L1, x̂L2, . . . , x̂Lq}. Reverse engineering lookup table or shift register contents is, however, very simple compared to reverse engineering the whole bitfile. Now, the lookup table or shift register contents can be used by the watermark detector DL, which decides whether the watermark WL is embedded in the work or not: DL(WL, {x̂L1, x̂L2, . . . , x̂Lq}) = true/false.
The detector DL searches for the contents of the watermarked shift registers WL in the lookup table contents extracted from the bitfile. It might occur that certain watermarks are found in more than one location, because several watermarks with identical content exist, or a regular functional lookup table has, by chance, the value of a watermarked one. To simplify the extraction, the watermarks are chained together via the shift-in and shift-out ports. It is likely that these watermarks are placed close together. From the bitfile lookup table extraction LB, we also obtain the locations of the possible watermarks. Using these locations, we can in most cases identify the right watermark if duplicates exist. Note that this chaining approach is not mandatory, but it elevates the robustness of the approach against ambiguity attacks.
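A minimal C sketch of such a detector simply searches every expected 16-bit watermark content among the contents extracted from the bitfile; the chaining and location information described above would be used on top of this to resolve duplicates. The data layout and function name are illustrative.

#include <stdint.h>
#include <stddef.h>

/* Returns 1 if every watermark element of W_L is found among the lookup
 * table / shift register contents extracted from the bitfile, 0 otherwise. */
int detect_watermark(const uint16_t *wm, size_t wm_len,
                     const uint16_t *extracted, size_t ex_len)
{
    for (size_t i = 0; i < wm_len; i++) {
        int found = 0;
        for (size_t j = 0; j < ex_len; j++) {
            if (extracted[j] == wm[i]) { found = 1; break; }
        }
        if (!found)
            return 0;          /* at least one watermark element missing */
    }
    return 1;                  /* all watermark elements found           */
}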
After the detection of the watermark WL inside the bitfile ÎB, the watermark must be verified similarly to the watermarking approach for bitfile cores proposed in Section 5.2.3.
Additional Reading More information about this method can be found in [SZT08].
Power Watermarking
This section introduces watermarking techniques where a signature is verified via the power consumption pattern of an FPGA. These techniques may also be suitable for ASIC designs; however, we concentrate on FPGA designs and develop several enhancements which are exclusively related to the FPGA technology. The presented idea is new and differs from [KJJ99] and [AARR03], where the goal of using power analysis techniques is the detection of cryptographic keys and other security issues.
For power watermarking methods, the term signature refers to the part of the watermark which can be extracted and is needed for the detection and verification of the watermark. The signature is usually a bit sequence which is derived from the unique key for author and core identification.
First of all, a short introduction is given and the communication channel between the generation and the detection of the watermark is discussed. Next, the basic method is presented and, afterwards, several enhanced methods which increase the robustness of decoding the watermark in case of external or internal disturbances are introduced. Finally, multiplexing methods are discussed which enable the detection of more than one watermark if multiple watermarked cores are present in the design.
Figure 5.6: Watermark verification using power signature analysis: From a signa-
ture (watermark), a power pattern inside the core will be generated that
can be probed at the voltage supply pins of the FPGA. From the trace,
a detection algorithm verifies the existence of the watermark.
With every switching event triggered by a clock edge, the logic draws current from the supply, so the core supply voltage drops and rises (see Figure 5.7). In the frequency domain, the clock frequency with its harmonics and even integer fractions of it are present (see Figure 5.8). The real behavior of the core voltage depends on the individual FPGA, the individual printed circuit board, and the individual voltage supply circuits.
In the following, we seek techniques to encode a watermark such that the core voltage is subject to change once the watermark is processed within a core. In the first method, the frequency of the voltage drops is influenced; in the second, the amplitude of the voltage drops is manipulated.
In the first case, a watermark can be identified if we produce another frequency line in the spectrum of the core voltage which is not an integral multiple or a rational fraction of the clock frequency. To achieve this, we need a circuit that consumes a considerable amount of power and generates a signature-specific power pattern, and a clock which can be identified in the spectrum. The power consumer can be, for example, an additional shift register. If we derived the clock source from the operational clock, we would not be able to distinguish the frequency line in the spectrum from operational logic. Another option is to generate a clock using combinatorial logic.
Figure 5.7: A measured voltage signal at the voltage supply pin of an FPGA. The
core supply voltage drops and rises. Note that the DC component is
filtered out.
This clock could be identified as a watermark, but the jitter of a
combinatorial clock source might be very high, and no clean frequency line could be
seen in the spectrum. This means that we need a higher additional power consumer
to make the watermark readable. Another drawback is that we have only limited
possibilities to encode a signature reliably in these frequency lines.
In the following approaches, we alter the amplitude of the interferences in the
core voltage. The basic idea is to add a power pattern generator (e.g., a set of shift
registers), and clock it either with the operational clock or an integer division thereof.
Further, we control these power pattern generators according to the characteristics of
the data sequence which should be sent, respectively detected. A logical ’1’ lets the
power consumer operate one cycle (e.g., perform a shift), a ’0’ causes no operation.
We detect higher amplitudes in the voltage profile over time corresponding to the
ones and smaller amplitudes according to the zeros. Note that the amplitude for the
no-operation state is not zero, because the operational logic and the clock tree is still
active.
Figure 5.8: The spectrum of the measured signal in Figure 5.7. The clock frequency of 50 MHz and its harmonics can be seen. Also, a peak at half of the clock frequency is visible, which is caused by switching activities of the logic.
The advantage of power watermarking methods is that the signature can easily be read out from a given device. Only the core voltage of the FPGA must be measured and recorded. No bitfile is required which would need to be reverse engineered. Furthermore, these methods also work for encrypted bitfiles, whereas methods where the signature is extracted from the bitfile fail. Moreover, we are able to sign netlist cores, because our watermarking algorithm does not need any placement information. So, cores at this level can also be watermarked and thereby protected.
Basic Method In this section, we describe the basic method for power water-
marking of netlist cores. The concept, the embedding of the watermark, as well as the
detection and verification procedure are described. The encoding and decoding for
sending the signature through the FPGA power communication channel is relatively
simple and straightforward in the basic method and will be refined later on with the
enhanced methods. However, the basic concepts of embedding and the verification
are very similar in all methods.
For power watermarking, two shift registers are used, a large one for causing a rec-
ognizable signature-dependent power consumption pattern, and a shift register stor-
ing the signature itself (see Figure 5.6 in Section 5.2.2). The signature shift register
is clocked by the operational clock and the output bit enables the power pattern gen-
erator. If the output bit is a ’1’, the power pattern register will be shifted at the next
rising edge of the operational clock. At a ’0’, no shift is done. Therefore, the channel
encoding is Z = {(γ, 1, 1), (γ̄, 1, 1)}. To avoid interference from the operational logic
in the measured voltage, the signature is only generated during the reset phase of the
core.
As mentioned before in Section 5.2.2, a shift register can also be used as a lookup
table and vice versa in many FPGA architectures (see Figure 5.3 in Section 5.2.2).
A conversion of functional lookup tables into shift registers does not affect the func-
tionality if the new inputs are set correctly. This allows us to use functional logic
for implementing the power pattern generator. The core operates in two modes, the
functional mode and the reset mode. In the functional mode, the shift is disabled and
the shift register operates as a normal lookup table. In the reset mode, the content is
shifted according to the signature bits and consumes power which can be measured
outside of the FPGA. To prevent the loss of the content of the lookup table, the output
of the shift register is fed back to the input, so the content is shifted circularly. When
the core changes to the functional mode, the content must be shifted to the proper
position to have a functional lookup table for the core.
The amplitude of the generated power signature depends on the number and content of the converted lookup tables. It is assumed that the transitions between zeros and ones in the bit pattern of the lookup table contents are sufficient to produce a recognizable pattern on the supply voltage. Experimental results in [Bau08] show that, on average, 8 of a maximum of 16 transitions are generated in functional 4-input lookup tables of example cores when the content is shifted.
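As a simple illustration of why the lookup table content matters for the generated power pattern, the following Python sketch counts how many bits toggle when a 16-bit LUT content is rotated by one position; the example values are made up.

```python
# Illustrative sketch: count the 0/1 toggles that one circular shift of a 16-bit
# LUT content produces, as a rough proxy for the switching activity of the
# power pattern generator. The example INIT values below are arbitrary.

def transitions_per_rotation(lut_init: int, width: int = 16) -> int:
    """Number of bit positions that toggle when the content is rotated by one."""
    rotated = ((lut_init << 1) | (lut_init >> (width - 1))) & ((1 << width) - 1)
    return bin(lut_init ^ rotated).count("1")

for value in (0x0000, 0xFFFF, 0xAAAA, 0xB462):
    print(f"{value:#06x}: {transitions_per_rotation(value)} toggling bits per shift")
```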
To increase the robustness against removal and ambiguity attacks, the content of the power consumption shift register, which is also part of the functional logic, can be initialized in a shifted state. Only during the reset state, when the signature is transmitted, is the content of the functional lookup table shifted into the correct position. So, normal core operation cannot start before the signature has been transmitted completely. The advantage is that the core is only able to work after sending the signature. Furthermore, to avoid a reset time too short for the watermark to be detected reliably, the correct functionality is only established if the reset state is longer than a predefined time. This prevents the user from leaving out or shortening the reset state such that the signature cannot be detected properly.
The signature itself can be implemented as a part of the functional logic in the same
way. Some lookup tables are connected together and the content, the function of the
LUTs, represents the signature. Furthermore, techniques described in Section 5.2.2
can be used to combine an additional watermark and the functional part in a single
lookup table if not all lookup table inputs are used for the function. For example,
LUT2 primitives in Xilinx Virtex-II devices can be used to carry an additional 12-
bit watermark by restricting the reachability of the functional lookup table through
clamping certain signals to constant values. Therefore, the final sending sequence
consists of the functional part and the additional watermark. This principle makes it
almost impossible for an attacker to change the content of the signature shift register.
Altering the signature would also affect the functional core and thus result in a corrupt
core.
The advantages of using the functional logic of the core also as a shift register are
a reduced resource overhead for watermarking and the robustness of this method, be-
cause these shift registers are embedded in the functional design and it is hard, if not
impossible, to remove the shift registers without destroying the functionality of the
core. Furthermore, our watermarking procedure is difficult to detect in a netlist file, because the main part of the logic required for signature creation is part of the functional logic of the core itself. Another benefit is that our watermark cannot be removed by an optimization step during the mapping into CLBs (Configurable Logic Blocks). Nevertheless, if an attacker has special knowledge of the watermarking method and of the EDIF netlist format, he may reverse engineer the alterations made by the embedding algorithm and remove or disable the sending logic. This can be avoided by initializing the power pattern register with shifted lookup table contents (see above). If sending of the signature is prevented, the core will not function properly.
The watermark embedder $E_L(I_L, W_L) = \tilde{I}_L$ consists of two steps. First, the core $I_L$
must be embedded in a wrapper which contains the control logic for emitting the
signature. This step is done at the register-transfer level before synthesis. The second
step is at the logic level after the synthesis. A program converts suitable lookup tables
(for example LUT4 for Virtex-II FPGAs) into shift registers for the generation of the
power pattern and attaches the corresponding control signal from the control logic in
the wrapper (see Figure 5.9).
Figure 5.9: The core and the wrapper before (above) and after (below) the netlist alteration step. The signal "wmne" is an enable signal for shifting the power pattern generator shift register.
The wrapper contains the control logic for emitting the watermark and the shift
register, holding the signature. If functional lookup tables are used for implementing
the signature shift register, we add or convert this shift register in the second step
so that the wrapper contains only the control logic. Some control signals have no
sink yet, because the sink will be added in the second step (e.g., the power pattern
generator shift register). So we must use synthesis constraints to prevent the synthesis
tool from optimizing these signals away. The ports of the wrapper are the same as those of the core, so we can easily integrate this wrapper into the design hierarchy. The control logic shifts the signature shift register while the core is in the reset state. Also, the power pattern shift register is shifted according to the output of the signature shift register. When the reset input of the wrapper becomes inactive, the core cannot start operating in the same cycle, because the contents of the shift registers are not yet in the correct positions. The control logic first shifts the register contents into the correct position and then leaves the reset state to start the normal operation mode.
The translation of lookup tables of the functional logic into shift registers is done at the logic level. In Xilinx Virtex-II FPGAs, the usage of a LUT4 as a 16-bit shift register (SRL16) is only possible if the LUT4 is not part of a multiplexer structure, because the additional shift logic and the multiplexer share common resources in a slice. Also, if the lookup table is part of an adder, the mapping tool splits the lookup table and the carry chain. In these two cases, additional slices would be required, so we do not convert these lookup tables into shift registers.
The embedding procedure for Virtex-II netlist cores is done by a program which parses an EDIF netlist and writes back the modified EDIF netlist. First, the program reads all LUT4 instances and selects only those that are not part of a "MUXF5", "MUXCY", or "XORCY" structure. Then, the selected instances are converted into shift registers (SRL16), initialized with the shifted value if required, and connected to the clock and the watermark enable (wmne) signal according to Figure 5.9. Two shift registers are always connected together to rotate their contents. Finally, the modified netlist is written back.
The watermarked core IeL is now ready for purchase or publication.
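The following Python sketch illustrates the netlist alteration step under strong simplifications: it operates on a toy in-memory netlist instead of a real EDIF file, and the instance names, the exclusion flag, and the rotate-by-one initialization are assumptions made only for this example.

```python
# Illustrative sketch of the netlist alteration step, assuming a simplified
# in-memory netlist instead of a real EDIF parser. Instance names, the
# is_carry_or_mux flag, and the rotate-by-one initialisation are assumptions.

from dataclasses import dataclass

@dataclass
class LutInstance:
    name: str
    init: int                 # 16-bit INIT value of the LUT4
    is_carry_or_mux: bool     # part of MUXF5/MUXCY/XORCY structures -> skip

def rotate_right(value: int, n: int = 1, width: int = 16) -> int:
    mask = (1 << width) - 1
    return ((value >> n) | (value << (width - n))) & mask

def convert_to_power_pattern(luts):
    """Convert suitable LUT4s into SRL16s, pre-shifted by one position, and
    chain them pairwise so each pair rotates its content during reset."""
    candidates = [l for l in luts if not l.is_carry_or_mux]
    srl16s = []
    for a, b in zip(candidates[0::2], candidates[1::2]):
        for inst in (a, b):
            srl16s.append({"name": inst.name, "prim": "SRL16",
                           "init": rotate_right(inst.init),   # shifted start value
                           "ce": "wmne", "clk": "clk"})
        # connect the pair into a ring: Q of a feeds D of b and vice versa
        srl16s[-2]["d"] = f"{b.name}_q"
        srl16s[-1]["d"] = f"{a.name}_q"
    return srl16s

design = [LutInstance("u1", 0xB462, False), LutInstance("u2", 0x1E83, False),
          LutInstance("u3", 0xFFFF, True)]
print(convert_to_power_pattern(design))
```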
Detection Algorithm A company may obtain an unlicensed version of the core $\hat{I}_L$ and embed this core in a product: $\hat{I}_P = T_{L \to B}(\hat{I}_L \circ I'_{L_1} \circ I'_{L_2} \circ \ldots)$. If the core developer suspects an infringement, he can buy the product and verify whether his signature is inside the core using a detection function $D_P(\hat{I}_P, W_L) = \mathit{true}/\mathit{false}$.
To detect the basic power watermark, the measured voltage is probed, digitized, and decoded by a signature detection algorithm (see Figure 5.10). To decode the digitized voltage signal, the sampling rate, the clock frequency of the shifted signature, and the bit length of the signature are needed. The clock frequency can be extracted using the Fast Fourier Transform (FFT) of the measured signal. Our detection algorithm consists of five steps: down-sampling, a differential step, an accumulation step, phase detection, and quantization (see Figure 5.10). After successful extraction, the decoded signature can be compared to the signature inside the watermark $W_L$ to establish the ownership. Furthermore, the signature must be verified by cryptographic methods with the author's unique key $K$.
As mentioned before, the main characteristic caused by a switching event is a drop of the voltage followed by a subsequent overshoot. This results in steep slopes. The detection algorithm of the basic method can find each rising edge as follows:
Figure 5.10: The five steps of the watermark detection algorithm: downsam-
pling, differential and accumulation step, phase detection and finally
quantization.
First, the measured signal is down-sampled from the recorded sample rate to the quadruple of the clock frequency, so each signature bit is represented by four samples. Then, the discrete derivative of the signal is calculated, which transforms the rising edges of the switching events into peaks. The easiest way to calculate the discrete derivative at a discrete point in time is to take the difference of two subsequent samples (see Figure 5.11):

$$S_D[k] = S_{DS}[k] - S_{DS}[k-1],$$

where $S_{DS}$ is the down-sampled probed voltage signal and $k$ denotes the sample index.
Figure 5.11: An example voltage signal which represents the signature “0011”
(above). The example voltage signal after the differentiation step
(below).
Since the signature is repeated many times during the reset state, the signal can
be accumulated and averaged to reduce the noise level. To accumulate the coherent
pattern, we need to know the bit length of the signature. If we record a longer signal
sequence, we can accumulate more patterns and reduce noise as well as switching
events which do not belong to the power consumption register of the watermarking
algorithm. The disadvantage is that we would need a longer time for the reset phase.
After this third step, we have a signal in which each signature bit is represented by four samples, but only one sample carries the information of the rising edge. Since the measurement is not synchronized with the FPGA clock, the phase (position) of the relevant sample of a bit is unknown. We divide the signal into four new signals in which one signature bit is represented by one sample. The four signals have a phase shift of 90° to each other. Let

$$S_{AS}[k], \quad k = 0, 1, \ldots, 4m-1,$$

denote the sampled voltage signal after the accumulation step, where $m$ is the length of the signature. Then, we obtain the four following phase-shifted signals

$$S_0[j] = S_{AS}[4j], \quad S_{90}[j] = S_{AS}[4j+1], \quad S_{180}[j] = S_{AS}[4j+2], \quad S_{270}[j] = S_{AS}[4j+3], \quad j = 0, 1, \ldots, m-1,$$

where $S_{AS}$ is the accumulated signal and $S_0$, $S_{90}$, $S_{180}$, and $S_{270}$ are the phase signals (see Figure 5.12).
We are able to extract the right phase of the signal if we calculate the mean value of
each phase-shifted signal. The maximal mean value corresponds to the correct phase,
because the switching event should cause the greatest rising edge in the signal.
Now, we have a signal in which each sample represents the accumulated switching activity of one bit of the signature. The decision whether a sample corresponds to a signature bit '1' or '0' is made by comparing the sample value with the mean value of the signal. If the sample value is higher than the mean value, the algorithm decides a '1', otherwise a '0'.
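The five steps can be summarized in a short Python/NumPy sketch. The synthetic trace, the chosen frequencies, and the simple integer down-sampling are illustrative assumptions; the sketch only mirrors the structure of the described algorithm.

```python
# Sketch of the five detection steps described above, using NumPy. The trace,
# sampling rate, clock frequency, and signature length are assumed to be known.

import numpy as np

def detect_signature(trace, f_sample, f_clk, m, repetitions):
    """Recover an m-bit signature from a recorded core-voltage trace."""
    # 1) down-sample to four samples per signature bit (4 * f_clk)
    step = int(f_sample // (4 * f_clk))
    s_ds = np.asarray(trace, dtype=float)[::step]
    # 2) differential step: turn the rising edges of switching events into peaks
    s_d = np.diff(s_ds)
    # 3) accumulation step: sum 'repetitions' copies of the 4*m-sample pattern
    usable = (len(s_d) // (4 * m)) * 4 * m
    s_as = s_d[:usable].reshape(-1, 4 * m)[:repetitions].sum(axis=0)
    # 4) phase detection: keep the phase with the largest mean value
    phases = [s_as[p::4] for p in range(4)]
    s_phi = max(phases, key=lambda s: s.mean())
    # 5) quantization: values above the mean decode as '1', below as '0'
    return (s_phi > s_phi.mean()).astype(int)

# toy example: a synthetic trace with one voltage edge per '1' bit (illustrative only)
rng = np.random.default_rng(0)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
one_period = np.array([s for b in bits
                       for s in ([0.0, 1.0, 0.0, 0.0] if b else [0.0] * 4)])
trace = np.tile(one_period, 20) + 0.05 * rng.standard_normal(20 * one_period.size)
print(detect_signature(trace, f_sample=200e6, f_clk=50e6, m=len(bits), repetitions=15))
```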
Robustness Analysis The most common attacks against watermarking are re-
moval, ambiguity, key copy, and copy attacks. Once again, key copy attacks can be
prevented by asymmetric cryptographic methods, and there is no protection against
copy attacks.
Removal attacks most likely take place on the logic level instead of the device level
where it is really hard to alter the design. The signature and power shift registers as
well as the watermark sending control logic in the wrapper are mixed with functional
elements in the netlist. Therefore, they are not easy to detect. Even if an attacker
is able to identify the sending logic, a deactivation is useless if the content of the
power shift register is only shifted into correct positions after sending the signature.
By preventing the sending of the watermark, the core is unable to start. Another
possibility is to alter the signature inside the shift register. The attacker may analyze
the netlist to find the place where the signature is stored. This attack is only successful
Figure 5.12: The example voltage signal after the accumulation step (above) and
the four phase shifted signals (below). Here, S180 corresponds to the
right phasing.
if there is no functional logic part mixed with the signature. By mixing the random
bits with functional bits, it is hard to alter the signature without destroying the correct
functionality of the core. Therefore, this watermarking technique can be considered resistant against removal attacks.
In case of ambiguity attacks, an attacker analyzes the power consumption of the FPGA in order to find a fake watermark, or implements a core whose power pattern disturbs the detection of the watermark. In order to credibly fake watermarks inside the power consumption signal, the attacker must present the insertion and sending procedure, which should be impossible without using an additional core. Another possibility for the attacker is to implement a disturbance core which consumes a lot of power and makes the detection of the watermark impossible. In the next sections, enhanced robustness encoding methods are presented which increase the probability of decoding the signature, even if other cores are operating during the sending of the signature. Although a disturbance core might be successful, this core needs area and, most notably, power, which increases the costs of the product. The presence of a disturbance core in a product is also suspicious and might lead to further investigation if a copyright infringement has occurred. Finally, the attacker may watermark another
core with his watermark and claim that all cores belong to him. This can be prevented
by adding a hash value of the original core without the watermark to the signature
like in the bitfile watermarking method for netlist cores.
Figure 5.13: Measured voltage supply signal when sending “FFFF0000” with a
large power pattern generator shift register.
This fading-out amplitude stems from an overlaid frequency which might be produced by a resonant circuit consisting of the capacitances and resistances of the power supply plane and its blocking capacitors. This behavior depends on the printed circuit board and the power supply circuit.
To avoid such a false detection, the transmission time of one symbol is extended by the swing-out time of the printed circuit board, i.e., by sending the same signature
bit multiple times: Z = {(γ, 1, ω), (γ̄, 1, ω)}. The repetition rate for each signature
bit is ω clock cycles. If we connect two SRL16 together, one period for this shift
register needs 32 clock cycles. If the reset phase ends and we have finished sending
one bit, the content in the shift register which also represents a part of the logic of the
core is in the correct position.
The detection algorithm differs for this method. First, the signal is down-sampled and the approximate derivative is calculated as in the original method (see Section 5.2.2). Now, we average the signal to suppress the noise. Here, the length of one signature word is the length of the signature ($m$) multiplied by the number of times each bit is sent ($\omega$):

$$S_{AS}[k] = \sum_{i=0}^{n_s - 1} S_D[k + 4 m \omega \, i], \quad k = 0, 1, \ldots, 4 m \omega - 1,$$

where $S_D$ is the voltage signal after the differential step with index $k$ and $n_s$ is the number of repetitions of the pattern in $S_D$.
The phase detection of the shift clock is the same as in the original method (see
Section 5.2.2), but we also need the position p where a new signature bit starts. This
is done in a loop to detect this position. In the beginning, we assume that the starting
position is the beginning of our trace (p = 0). First, we accumulate ω successive
values where ω is the repetition of one bit:
$$S_p[j] = \sum_{i=0}^{\omega-1} S_\phi[i + p + \omega j], \quad j = 0, 1, \ldots, m-1 \qquad (5.10)$$
Here, $S_\phi$ denotes the signal after the phase detection step. Now, we subtract the mean value, take the absolute value, and calculate the sum of it:

$$F_p = \sum_{i=0}^{n-1} \left| S_p[i] - \frac{1}{n} \sum_{j=0}^{n-1} S_p[j] \right| \qquad (5.11)$$
$F_p$ indicates how well the assumed signature bit starting position $p$ fits the real position. Now, we shift our trace by one value ($p = 1$) and calculate the fitting value again, and so on. This is done $\omega$ times. The starting position with the best fitting value is used.
The decoding of the watermark signature is done like in the basic method (see
Section 5.2.2) by comparing the sample values with the mean value of the samples.
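A possible implementation of the bit-start search according to Equations (5.10) and (5.11) is sketched below in Python; the signal layout ($\omega$ samples per signature bit after phase detection) is taken from the description above, everything else is an assumption of this sketch.

```python
# Sketch of the bit-start search of the enhanced method (Eqs. (5.10)/(5.11)),
# assuming s_phi is the phase-detected signal with omega samples per signature bit.

import numpy as np

def best_bit_start(s_phi, m, omega):
    """Return the offset p (0..omega-1) that best matches the bit boundaries."""
    s_phi = np.asarray(s_phi, dtype=float)
    assert len(s_phi) >= omega * (m + 1), "trace too short for the search"
    best_p, best_fit = 0, -np.inf
    for p in range(omega):
        # Eq. (5.10): accumulate omega successive samples for each of the m bits
        s_p = np.array([s_phi[p + omega * j : p + omega * (j + 1)].sum()
                        for j in range(m)])
        # Eq. (5.11): sum of absolute deviations from the mean as fitting value
        fit = np.abs(s_p - s_p.mean()).sum()
        if fit > best_fit:
            best_p, best_fit = p, fit
    return best_p
```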
Figure 5.14: Shown is a carrier signal Scarrier and the BPSK modulated signal SBPSK. The signature bit value '0' is encoded with a phase of 0° and the value '1' with a phase of 180°.
To generate the new watermark signal, the power pattern generator is driven by the
signal SBPSK and performs the OOK modulation. The encoding scheme for the signal
SBPSK is: Z = {(γ, 1, ω), (γ̄, 1, ω)}, where ω is chosen as 10 in our case. To send the
signal SBPSK for one period, we first send five ones (the power pattern shift register is
shifted five times) and then five zeros (the power pattern shift register is not shifted)
in case the signature bit is ’1’. If the signature bit is ’0’, first five zeros and then
five ones are sent (see Figure 5.15). For each signature bit, we repeat this period 32
times to ensure that the content of the power pattern shift registers which are also
functional lookup tables are in the correct positions after sending one signature bit.
Repetition allows the signature to be detected with a higher probability. The decreased bit rate results in a smaller bandwidth for our watermarking signal. Using this method, we need more time to send the signature than with the previously presented methods. With one BPSK period of $\omega$ clock cycles and 32 repetitions per signature bit, the signature bit rate is $f_{wm} = f_{clk}/(32\,\omega)$, i.e., $f_{clk}/320$ for $\omega = 10$.
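The following Python sketch generates the resulting enable sequence for the power pattern generator; the values $\omega = 10$ and 32 repetitions per signature bit are taken from the description above.

```python
# Sketch of the BPSK/OOK enable sequence for one signature, assuming omega = 10
# and 32 repetitions per signature bit. A '1' in the output means the power
# pattern generator is shifted in that clock cycle.

def bpsk_ook_sequence(signature_bits, omega=10, repetitions=32):
    half = omega // 2
    sequence = []
    for bit in signature_bits:
        # bit '1': first half ones, then half zeros; bit '0': the inverse phase
        period = ([1] * half + [0] * half) if bit else ([0] * half + [1] * half)
        sequence.extend(period * repetitions)
    return sequence

seq = bpsk_ook_sequence([1, 0, 1])
print(len(seq), seq[:20])   # 3 bits * 10 * 32 = 960 enable values
```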
Figure 5.15: The signal SBPSK is the BPSK modulated signal of the signature above. The signal below is the voltage signal which is the OOK modulated signal of SBPSK. This figure also illustrates the different frequencies.
The watermark control inside the wrapper (see Section 5.2.2) is altered to control the power pattern generator in this way. Only a few additional resources are needed to implement this enhanced watermark protocol.
If we look at the spectrum of the recorded signal (see Figure 5.16), we detect the
clock frequency fclk and two side bands from the OOK modulation fclk − fBPSK and
fclk + fBPSK .
The detection algorithm for this method is different from the previous methods.
Only the first (down sampling) and the last steps (quantization) are identical (see
Figure 5.16: The spectrum of a measured signal. The clock frequency of 50 MHz and the two side bands of the modulated signal SBPSK are shown at 45 MHz and 55 MHz.
Figure 5.17). After down-sampling, the two side bands of the carrier signal are mixed down into the base band ($S_{sb1}$ and $S_{sb2}$) and are combined into $S_{cc}$. Here, $S_{DS}$ denotes the voltage signal after the down-sampling step. The clock frequency is $f_{clk} = \frac{1}{4} f_{sample}$, and the frequency $f_{BPSK} = \frac{1}{10} f_{clk} = \frac{1}{40} f_{sample}$, where $f_{sample}$ is the sample frequency of the recorded voltage signal. After low-pass filtering of $S_{cc}$, we get the complex carrier signal $S_{BPSK}$ (see Figure 5.18).
Scc is filtered using a matched filter to obtain the limits of one signature bit and
the correct sample point. All samples of SBPSK which belong to one signature bit
are summed up into this sample point by the matched filter. At the down sampling
step, only these points are used to represent the signature bits. Now, the angle of the
Figure 5.18: The constellation diagram of the down mixed complex signal SBPSK .
Here, the two different BPSK constellation points for the signature bit
’1’ and ’0’ are shown.
signal is calculated from the signature bit with the highest amplitude, and the signal
is rotated into the real plane. From the real valued signal, the value of the bits and
the quality of the signal are determined similar to the other detection algorithms (see
Section 5.2.2).
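A rough Python/NumPy sketch of this detection path is given below. The normalized mixing frequencies follow from $f_{clk} = f_{sample}/4$ and $f_{BPSK} = f_{clk}/10$; the moving-average low pass, the filter length, and the sign-based decision are simplifying assumptions and not the exact algorithm used here.

```python
# Compact sketch of the BPSK detection path, assuming the trace has already
# been down-sampled so that f_clk = f_sample/4 and f_BPSK = f_clk/10.
# Filter length and the boxcar low-pass are illustrative choices only.

import numpy as np

def decode_bpsk(s_ds, m, samples_per_bit):
    s_ds = np.asarray(s_ds, dtype=float)
    k = np.arange(len(s_ds))
    # mix both side bands (f_clk +/- f_BPSK) down to the base band and combine them
    f_lo1, f_lo2 = 1/4 - 1/40, 1/4 + 1/40          # normalized to f_sample
    s_cc = s_ds * (np.exp(-2j * np.pi * f_lo1 * k) + np.exp(-2j * np.pi * f_lo2 * k))
    # crude low-pass filter (moving average) standing in for a proper filter
    s_bpsk = np.convolve(s_cc, np.ones(8) / 8, mode="same")
    # matched filter: sum all samples that belong to one signature bit
    symbols = s_bpsk[:m * samples_per_bit].reshape(m, samples_per_bit).sum(axis=1)
    # rotate the constellation into the real plane using the strongest symbol
    rotation = np.exp(-1j * np.angle(symbols[np.argmax(np.abs(symbols))]))
    real_symbols = (symbols * rotation).real
    # sign of the real part distinguishes the two BPSK phases (bit changes only)
    return (real_symbols < 0).astype(int)
```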
The advantage of the BPSK method is its robustness with respect to interferences
coupled with the clock frequency. The disadvantages are the longer reset phase and
the fact that we can only detect bit value changes and not the signature bit value di-
rectly due to the BPSK modulation. Using proper encoding methods and preambles,
the bit values can be reconstructed.
5.2.3 Bitfile Cores
The approach of Lach and others watermarks bitfile cores by encoding the signature into unused lookup tables [LMSP98]. At first, the signature is hashed and coded with an error correction code (ECC) so that the signature can be reconstructed even if some lookup tables are lost, e.g., due to tampering. After the initial place and route pass, the number of unused lookup tables is determined. The signature is split into lookup-table-sized pieces and the corresponding LUTs are added to the design. Then, the place and route process is started again with the watermarked design. Later, the approach was improved by using many small watermarks instead of a single large one [LMSP99]. The size of each watermark should be limited by the size of a lookup table. The advantage is that small watermarks are easier to search for and, for verification, only a part of all watermark positions must be published. With the knowledge of a published position, the corresponding watermark can easily be removed by an attacker; however, at the verification process only a few positions of the watermark need to be disclosed to establish the ownership. A second improvement is that a fingerprinting technique is added to the approach which enables the owner to see which customer has given the core away [LMSP01]. The fingerprinting is achieved by dividing the FPGA into tiles. In each tile, one lookup table is reserved for the watermark. The position of the mark in the tile encodes the fingerprint. For verification, it is possible to read out the content of the lookup tables from a bitfile, so these methods are easy to verify. It is more difficult to determine the position of the watermark in a tile, but it is still generally possible. However, if an attacker knows the position of the watermark, it is easy to overwrite it.
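To illustrate the preparation of such lookup-table-sized watermarks, the following Python sketch hashes a signature and splits it into 16-bit words; the repetition code merely stands in for a real error correction code, and the key string is hypothetical.

```python
# Illustrative sketch of preparing Lach-style watermark words: hash the
# signature, apply a trivial repetition code as a stand-in for a real ECC,
# and split the result into 16-bit words that each fit one 4-input LUT.

import hashlib

def lut_watermarks(signature: bytes, repetitions: int = 3):
    digest = hashlib.sha256(signature).digest()
    coded = digest * repetitions                 # stand-in for a proper ECC
    bits = "".join(f"{byte:08b}" for byte in coded)
    return [int(bits[i:i + 16], 2) for i in range(0, len(bits), 16)]

marks = lut_watermarks(b"core: my_fir_core, author: Alice")
print(len(marks), [f"{m:#06x}" for m in marks[:4]])
```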
Saha and others present a watermarking strategy for FPGA bitfiles by subdividing the lookup table locations into sets of 2 × 2 tiles [SSK07]. The number of used lookup tables in a set is used as the signature. Starting from an initial fill level, additional lookup tables are added to achieve the fill level according to the signature. Their inputs and outputs are connected to don't-care inputs of the neighboring cells. Kahng and others show in [KLMS+98] that the configuration of the multiplexers of unused CLB outputs in FPGA bitfiles can carry a signature. The signature is embedded after the bitfile creation, provided the encoding of the bitfile is known. These configuration bits can later be extracted to verify the signature.
Van Le and Desmedt show that these additive watermark schemes for bitfile cores can easily be attacked by reverse engineering, watermark localization, and subsequent watermark removal [LD03]. A simple algorithm is introduced which identifies lookup tables or multiplexers whose outputs are not connected to any output pins. However, these attacks are only successful if reverse engineering of the bitfile is possible and the costs of reverse engineering are not too high.
Finally, Kean and others present a watermarking strategy where a signature is em-
bedded into an FPGA bitfile core or design [KMM08]. The read out of the signature
Concept In the following, the watermark approach is described in detail. For wa-
termarking a bitfile core, the watermarks which should be embedded into the unused
lookup tables must be generated. This is done by the watermark generator function $G_B(K) = W_B$. The generator needs a unique key $K$, which identifies the author as well as the core, and the author's private key as input. The output is a set of watermarks $W_B = \{w_{B_1}, w_{B_2}, \ldots, w_{B_m}\}$. Each element $w_{B_i}$ must fit into a single lookup table. For Xilinx Virtex-II and Virtex-II Pro FPGAs, which use 4-input lookup tables, the size is 16 bits.
Additionally, the number of usable lookup tables which can carry a watermark must be determined. This can be done by extracting all lookup table contents and coordinates: $L_B(I_B) = \{x_{B_1}, x_{B_2}, \ldots, x_{B_q}\}$. The next step is to find suitable location candidates which can carry a watermark. For Xilinx Virtex-II and Virtex-II Pro FPGAs, possible candidates are unused lookup tables in used slices. Such candidates can easily be determined, because they carry the initialization value 0xFFFF, whereas unused lookup tables in unused slices have 0x0000 as initialization value. The higher the number of location candidates, and therefore of watermarked lookup tables, the more reliable is the proof of authorship. For example, if only one lookup table candidate was found, only 16 watermark bits are available overall, which makes the proof of authorship easy to contest.
The content of the chosen locations of the bitfile core $I_B$ can be replaced by the watermarks $W_B$ with the embedder $\tilde{I}_B = E_B(I_B, W_B)$ (see Figure 5.19). The result is the watermarked bitfile core $\tilde{I}_B$. The distance $Dist_B(I_B, \tilde{I}_B)$ between the watermarked and the original core is low, because the functional correctness and all electrical properties of the core are preserved. Furthermore, if the watermarks are placed near the functional lookup tables, the watermarks cannot easily be distinguished from the functional lookup tables.
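A minimal Python sketch of the generator $G_B$ and embedder $E_B$ is shown below; the SHA-256-based derivation of the watermark words and the representation of the bitfile as a location-to-INIT map are assumptions of this sketch, not part of the original tool flow.

```python
# Minimal sketch of the bitfile watermarking concept above. The LUT map, the
# candidate criterion (INIT 0xFFFF in a used slice), and the key-to-watermark
# derivation via SHA-256 are illustrative assumptions.

import hashlib

def generate_watermarks(unique_key: bytes, m: int):
    """G_B(K): derive m 16-bit watermark words from the unique key."""
    words, counter = [], 0
    while len(words) < m:
        block = hashlib.sha256(unique_key + counter.to_bytes(4, "big")).digest()
        words += [int.from_bytes(block[i:i + 2], "big") for i in range(0, 32, 2)]
        counter += 1
    return words[:m]

def embed(luts: dict, unique_key: bytes) -> dict:
    """E_B(I_B, W_B): write watermark words into unused LUTs of used slices."""
    candidates = [loc for loc, init in luts.items() if init == 0xFFFF]
    marks = generate_watermarks(unique_key, len(candidates))
    watermarked = dict(luts)
    for loc, word in zip(candidates, marks):
        watermarked[loc] = word
    return watermarked

core = {(3, 7): 0xCAFE, (3, 8): 0xFFFF, (5, 1): 0x0000, (5, 2): 0xFFFF}
print(embed(core, b"author=Alice;core=fir32;signed=..."))
```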
For extracting the watermarks, we need the bitfile $\tilde{I}_B$ from the accused company and the locations of the watermarks (see Figure 5.19). The first step is to extract the content and coordinates of all lookup tables in $\tilde{I}_B$: $L_B(\tilde{I}_B) = \{\tilde{x}_{B_1}, \tilde{x}_{B_2}, \ldots, \tilde{x}_{B_q}\}$. Using the locations from the core developer, the watermarks $\tilde{W}_B$ can be identified. By comparing these watermarks to the watermarks $W_B$ of the core developer, the detection process $D_B(\tilde{I}_B, W_B) = \mathit{true}/\mathit{false}$ can be completed.
It is hard for an attacker to localize and remove the watermark, because the locations are near the functional lookup tables and the content
is not distinguishable from the other lookup tables. However, if the attacker is able to
reverse engineer the bitfile core to the logic level ($\tilde{I}_L = T_{L \leftarrow B}(\tilde{I}_B)$), the watermarks are easy to detect and can be removed. This task is, however, very expensive if no reverse engineering tool is available. For Virtex-II devices, the Xilinx "reverse engineering" tool JBits [Xilc] is available, which is in fact able to remove the watermarks.
The attacker may analyze the bitfile core and search for lookup table content which
he can present as his own watermark in case of ambiguity attacks. He can use the in-
serted watermarks and assert that these watermarks belong to him. To be successful
with such an attack, he must also present the procedure to generate the watermarks.
Hereby, the attacker must generate a signature or key which identifies him as the author and fits the watermarks inside the core. This is very hard to achieve due to
the usage of one way cryptographic functions. Furthermore, the attacker can present
some functional lookup tables as his watermarks. This should also be nearly impossi-
ble due to the characteristics of one-way cryptographic functions. Another possibility to counter this attack is to remove the claimed watermarks from the bitfile core. The correct watermarks are inserted after the implementation of the core, and therefore the core keeps its functional correctness, whereas the removal of the wrong watermarks, which are functional lookup table contents, destroys the core.
Using asymmetric public/private key cryptographic functions for the watermark generation and verification, and further storing information about the core in the unique key, successfully prevents key copy attacks.
Additional Reading More information about this method can be found in [SZT08].
5.3 Constraint-Based Watermarking of IP Cores
Figure 5.20: The solution space of an original and a watermarked design. If a de-
sign satisfies the original and the additional constraints, then the design
is protected by a watermark. The probability that the additional con-
straints are satisfied by chance should be low to have a strong proof of
authorship.
Even if an attacker is able to remove a watermark, for example, one embedded into the layout of a circuit, the watermarks added at higher abstraction levels are still present. However, Charbon focused more on layout, net, and latch watermarking techniques, which are only applicable to ASIC layout cores.
The verification of a constraint-based watermark is usually done with the watermarked core as it is. This means the watermarked core can be purchased or published, and the watermark can be verified from the distributed core. However, if the core is combined with other cores and traverses further design steps, the watermark information is usually lost or cannot be extracted.
Van Le and Desmedt [LD03] present an ambiguity attack on constraint-based watermarking techniques. The authors add further constraints to the watermarked solution while allowing only a minimal increase of the overhead. The result is a slightly degraded solution which satisfies many additional constraints. This means that a lot of different signatures can be found in this solution, which destroys the unique identification of the core developer. They demonstrate this, for example, on the constraint-based watermarking approach for graph coloring. Furthermore, this attack might be applicable to other constraint-based watermarking techniques.
As it was the case with additive watermarking strategies, constraint-based water-
marking strategies are applicable for HDL, netlist, and bitfile cores.
Scan chains are used to access the internal registers of a design for debugging purposes. The use of scan chains in FPGA designs is rather unusual, but might be helpful in some cases. At first, a number is assigned to each register and the registers are sorted. Then, a pseudo random sequence is generated from the signature, and registers are selected by an algorithm which uses this random sequence as input. For Kc scan chains, the first Kc selected registers are chosen as the first register of each chain. Depending on the signature, we obtain a variation of the scan chains which can be used to detect the watermark. It is possible that an unfortunately chosen start of a chain results in the allocation of more routing resources. Moreover, the maximum clock frequency for the scan chain can be limited. This approach is easy to verify if the scan chains can be accessed from outside of the chip. Problems occur if the scan chain is only used internally or is not connected to any device. In such a case, there is no verification possibility.
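A small Python sketch of the signature-driven selection of scan chain start registers is given below; deriving the pseudo random sequence from the signature via SHA-256 and the register names are illustrative assumptions.

```python
# Sketch of the signature-driven register selection described above. Deriving
# the pseudo random sequence from the signature via SHA-256 is an assumption.

import hashlib

def select_chain_starts(registers, signature: bytes, k_c: int):
    """Pick the first register of each of the k_c scan chains from the signature."""
    ordered = sorted(registers)                     # assign numbers by sorting
    seed = int.from_bytes(hashlib.sha256(signature).digest(), "big")
    starts = []
    while len(starts) < k_c and ordered:
        index = seed % len(ordered)                 # pseudo random selection
        starts.append(ordered.pop(index))
        seed //= len(ordered) + 1
    return starts

regs = ["ctrl_q", "acc_q", "state_q", "cnt_q", "flag_q"]
print(select_chain_starts(regs, b"unique author/core key", k_c=2))
```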
Some work has been done on watermarking digital signal processing (DSP) functions [RAMSP99, CD00]. This kind of watermarking has more in common with media watermarking than with IP watermarking: both approaches slightly alter the function of the core by embedding a watermark. In [RAMSP99], the coefficients of finite impulse response (FIR) filters are slightly varied according to the watermark. Additionally, the authors use different structures to build the FIR filter, which also correspond to the signature. In [CD00], these ideas are extended and proven correct by mathematical analysis.
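As a sketch of the coefficient-based idea, the following Python snippet embeds one watermark bit into the least significant bit of each quantized FIR coefficient; the 16-bit quantization and the example taps are assumptions and do not reproduce the exact scheme of [RAMSP99].

```python
# Illustrative sketch of coefficient watermarking: the least significant bit of
# each quantized FIR coefficient is set according to one watermark bit.
# The 16-bit fixed-point quantization is an assumption of this sketch.

def watermark_fir(coefficients, wm_bits, q: int = 15):
    """Quantize coefficients to (q+1)-bit fixed point and embed one bit per tap."""
    quantized = [round(c * (1 << q)) for c in coefficients]
    return [(c & ~1) | bit for c, bit in zip(quantized, wm_bits)]

taps = [0.0213, -0.0891, 0.3105, 0.5002, 0.3105, -0.0891, 0.0213]
print(watermark_fir(taps, [1, 0, 1, 1, 0, 0, 1]))
```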
Khan and others watermark netlist cores by rewiring them after synthesis [KT05]. Rewiring means that redundant connections between primitive cells are added to the netlist, which makes other, original connections redundant; these now-redundant original connections are then removed.
Bai and others introduce a method for watermarking transistor netlists of full custom designs [BGXC07]. The transistors are enumerated and sorted into a list as in the approach above. According to the pseudo random stream generated from the signature, the width of the transistor gates is altered: if a transistor is assigned a '1' from the random stream, its width is increased by a constant value.
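A toy Python sketch of this width modulation is shown below; the width delta and the name-based ordering of the transistors are assumptions for illustration only.

```python
# Sketch of the transistor-width watermarking idea: transistors assigned a '1'
# by the signature-derived pseudo random stream get a slightly larger width.
# The delta of 0.01 um and the name-based ordering are assumptions.

def watermark_widths(widths_um: dict, wm_stream, delta: float = 0.01) -> dict:
    ordered = sorted(widths_um)                     # enumerate the transistors
    return {name: widths_um[name] + (delta if bit else 0.0)
            for name, bit in zip(ordered, wm_stream)}

transistors = {"M1": 0.36, "M2": 0.48, "M3": 0.36, "M4": 1.20}
print(watermark_widths(transistors, [1, 0, 0, 1]))
```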
5.4 Other Approaches
Figure 5.21: An Arbiter-PUF consists of a flip-flop and two delay lines whose routing can be altered by different challenge values. An edge propagates through the multiplexer network to the flip-flop. The registered response is determined by which signal arrives first. The responses to different challenges are device dependent due to minimal, uncontrollable path delay variations between different devices.
Bibliography
[AARR03] Dakshi Agrawal, Bruce Archambeault, Josyula R. Rao, and Pankaj Ro-
hatgi. The EM Side-Channel(s). In CHES ’02: 4th International Work-
shop on Cryptographic Hardware and Embedded Systems, pages 29–
45, London, UK, 2003. Springer-Verlag.
[ABEL05] Martı́n Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-
flow Integrity. In CCS ’05: Proceedings of the 12th ACM Confer-
ence on Computer and Communications Security, pages 340–353, New
York, NY, USA, 2005. ACM Press.
[ABEL09] Martı́n Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-
flow Integrity: Principles, Implementations, and Applications. ACM
Trans. Inf. Syst. Secur., 13(1):1–40, 2009.
[ABMF04] Todd Austin, David Blaauw, Trevor Mudge, and Krisztián Flautner.
Making Typical Silicon Matter with Razor. Computer, 37(3):57–65,
2004.
[Ale96] Aleph One. Smashing the Stack for Fun and Profit. Phrack magazine,
49(7), 1996.
[All07] Business Software Alliance. Fifth Annual BSA and IDC Global Soft-
ware Piracy Study. Technical report, 2007.
[BDH+ 98] Feng Bao, Robert H. Deng, Yongfei Han, Albert B. Jeng, A. Desai
Narasimhalu, and Teow-Hin Ngair. Breaking Public Key Cryptosys-
tems on Tamper Resistant Devices in the Presence of Transient Faults.
In Proceedings of the 5th International Workshop on Security Proto-
cols, pages 115–124, London, UK, 1998. Springer-Verlag.
[BEA06] Mihai Budiu, Úlfar Erlingsson, and Martı́n Abadi. Architectural Sup-
port for Software-based Protection. In ASID ’06: Proceedings of the 1st
workshop on Architectural and system support for improving software
dependability, pages 42–51, New York, NY, USA, 2006. ACM.
[BGB06] Lilian Bossuet, Guy Gogniat, and Wayne Burleson. Dynamically Con-
figurable Security for SRAM FPGA Bitstreams. International Journal
of Embedded Systems, 2(1):73–85, 2006.
[BGXC07] Fujun Bai, Zhiqiang Gao, Yi Xu, and Xueyu Cai. A Watermarking
Technique for Hard IP Protection in Full-custom IC Design. In Interna-
tional Conference on Communications, Circuits and Systems (ICCCAS
2007), pages 1177–1180, 2007.
[BK00] Bulba and Kil3r. Bypassing Stackguard and Stackshield. Phrack Mag-
azine, 2000.
[BLW+ 01] Saurabh Bagchi, Y Liu, Keith Whisnant, Zbigniew Kalbarczyk, Rav-
ishankar K. Iyer, Y. Levendel, and Larry Votta. A Framework for
Database Audit and Control Flow Checking for a Wireless Telephone
Network Controller. In DSN ’01: Proceedings of the 2001 Interna-
tional Conference on Dependable Systems and Networks (formerly:
FTCS), pages 225–234, Washington, DC, USA, 2001. IEEE Computer
Society.
[BPS00] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A Static An-
alyzer for Finding Dynamic Programming Errors. Software-Practice
Experience, 30(7):775–802, 2000.
[BRSS08] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage.
When Good Instructions go Bad: Generalizing Return-oriented Pro-
gramming to RISC. In CCS ’08: Proceedings of the 15th ACM con-
ference on Computer and communications security, pages 27–38, New
York, NY, USA, 2008. ACM.
[BS97] Eli Biham and Adi Shamir. Differential Fault Analysis of Secret Key
Cryptosystems. In CRYPTO ’97: Proceedings of the 17th Annual In-
ternational Cryptology Conference on Advances in Cryptology, pages
513–525, London, UK, 1997. Springer-Verlag.
[BST00] Arash Baratloo, Navjot Singh, and Timothy Tsai. Transparent Run-
time Defense against Stack Smashing Attacks. In ATEC ’00: Proceed-
ings of the annual conference on USENIX Annual Technical Confer-
ence, pages 251–262, Berkeley, CA, USA, 2000. USENIX Associa-
tion.
[BWWA06] Edson Borin, Cheng Wang, Youfeng Wu, and Guido Araujo. Software-
Based Transparent and Comprehensive Control-Flow Error Detection.
In CGO ’06: Proceedings of the International Symposium on Code
Generation and Optimization, pages 333–345, Washington, DC, USA,
2006. IEEE Computer Society.
[CBB+ 01] Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman,
Mike Frantzen, and Jamie Lokier. FormatGuard: Automatic Protection
from printf Format String Vulnerabilities. In SSYM’01: Proceedings
of the 10th conference on USENIX Security Symposium, Berkeley, CA,
USA, 2001. USENIX Association.
[CBD+ 99] Crispin Cowan, Steve Beattie, Ryan Finnin Day, Calton Pu, Perry Wa-
gle, and Erik Walthinsen. Protecting Systems from Stack Smashing
Attacks with StackGuard. In Linux Expo, 1999.
[CBJW03] Crispin Cowan, Steve Beattie, John Johansen, and Perry Wagle. Point-
guardTM: Protecting Pointers from Buffer Overflow Vulnerabilities. In
SSYM’03: Proceedings of the 12th conference on USENIX Security
Symposium, Berkeley, CA, USA, 2003. USENIX Association.
[CE99] Cristina Cifuentes and Mike Van Emmerik. Recovery of Jump Table
Case Statements from Binary Code. In IWPC ’99: Proceedings of the
7th International Workshop on Program Comprehension, pages 192–
199, Washington, DC, USA, 1999. IEEE Computer Society.
[CHP97] Po-Yung Chang, Eric Hao, and Yale N. Patt. Target Prediction for Indi-
rect Jumps. Proceedings of the 24th Annual International Symposium
on Computer Architecture, 25(2):274–283, 1997.
[CPG+ 06] Encarnacion Castillo, Luis Parrilla, Antonio Garcia, Antonio Loris, and
Uwe Meyer-Baese. IPP Watermarking Technique for IP Core Pro-
tection on FPL Devices. In International Conference on Field Pro-
grammable Logic and Applications, 2006. FPL’06, pages 487–492,
2006.
[CPG+ 08] Encarnacion Castillo, Luis Parrilla, Antonio Garcia, Uwe Meyer-
Baese, Guillermo Botella, and Antonio Lloris. Automated Signature
Insertion in Combinational Logic Patterns for HDL IP Core Protection.
[DMW98] J. H. Daniel, D. F. Moore, and J. F. Walker. Focused Ion Beams for Mi-
crofabrication. Engineering Science and Education Journal, 7(2):53–
56, 1998.
[Dob03] Igor Dobrovitski. Exploit for CVS Double free() for Linux
pserver. Neohapsis Archives (http://www.security-express.
com/archives/fulldisclosure/2003-q1/0545.html), 2003.
[DSG05] Nij Dorairaj, Eric Shiflet, and Mark Goosman. PlanAhead Software
as a Platform for Partial Reconfiguration. Xilinx XCELL Journal, Art,
55:68–71, 2005.
[EAV+ 06] Úlfar Erlingsson, Martı́n Abadi, Michael Vrable, Mihai Budiu, and
George C. Necula. XFI: Software Guards for System Address Spaces.
In OSDI ’06: Proceedings of the 7th symposium on Operating systems
design and implementation, pages 75–88, Berkeley, CA, USA, 2006.
USENIX Association.
[EL02] David Evans and David Larochelle. Improving Security Using Exten-
sible Lightweight Static Analysis. IEEE Software, 19(1):42–51, 2002.
[ES84] James B. Eifert and John Paul Shen. Processor Monitoring Using Asyn-
chronous Signatured Instruction Streams. In Twenty-Fifth Interna-
tional Symposium on Fault-Tolerant Computing, 1995,’Highlights from
Twenty-Five Years’, Reprinted from FTGS-14 1984, pages 394–399,
1984.
[FKF+ 03] Henry Hanping Feng, Oleg M. Kolesnikov, Prahlad Fogla, Wenke Lee,
and Weibo Gong. Anomaly Detection Using Call Stack Information.
In SP ’03: Proceedings of the 2003 IEEE Symposium on Security and
Privacy, page 62, Washington, DC, USA, 2003. IEEE Computer Soci-
ety.
[FKK96] Alan O. Freier, Philip Karlton, and Paul C. Kocher. The SSL Proto-
col – Version 3.0. URL: http://www.mozilla.org/projects/
security/pki/nss/ssl/draft302.txt, 1996.
[GCvDD02] Blaise Gassend, Dwaine Clarke, Marten van Dijk, and Srinivas De-
vadas. Silicon Physical Random Functions. In CCS ’02: Proceedings
of the 9th ACM conference on Computer and communications security,
pages 148–160, New York, NY, USA, 2002. ACM.
[GDWL92] Daniel D. Gajski, Nikil D. Dutt, Allen C.-H. Wu, and Steve Y.-L. Lin.
High-level Synthesis: Introduction to Chip and System Design. Kluwer
Academic Publishers, Norwell, MA, USA, 1992.
[GHJM05] Dan Grossman, Michael Hicks, Trevor Jim, and Greg Morrisett. Cy-
clone: A Type-safe Dialect of C. C/C++ Users Journal, 23(1):112–
139, 2005.
[GKST07] Jorge Guajardo, Sandeep S. Kumar, Geert-Jan Schrijen, and Pim Tuyls.
FPGA Intrinsic PUFs and Their Use for IP Protection. In CHES ’07:
Proceedings of the 9th international workshop on Cryptographic Hard-
ware and Embedded Systems, pages 63–80, Berlin, Heidelberg, 2007.
Springer-Verlag.
[GO98] Anup K. Ghosh and Tom O’Connor. Analyzing Programs for Vulner-
ability to Buffer Overrun Attacks. In Proceedings of the 21st National
[HB03] Eric Haugh and Matt Bishop. Testing C Programs for Buffer Overflow
Vulnerabilities. In Proceedings of the Network and Distributed System
Security Symposium, volume 2. Citeseer, 2003.
[HBF07] Daniel E. Holcomb, Wayne P. Burleson, and Kevin Fu. Initial SRAM
State as a Fingerprint and Source of True Random Numbers for RFID
Tags. In Proceedings of the Conference on RFID Security. Citeseer,
2007.
[JMG+ 02] Trevor Jim, J. Greg Morrisett, Dan Grossman, Michael W. Hicks,
James Cheney, and Yanling Wang. Cyclone: A Safe Dialect of C. In
ATEC ’02: Proceedings of the General Track of the annual conference
on USENIX Annual Technical Conference, pages 275–288, Berkeley,
CA, USA, 2002. USENIX Association.
[JMKP07] José A. Joao, Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Dynamic
Predication of Indirect Jumps. IEEE Computer Architecture Letter,
6(2):25–28, 2007.
[JYPQ03] Adarsh K. Jain, Lin Yuan, Pushkin R. Pari, and Gang Qu. Zero Over-
head Watermarking Technique for FPGA Designs. In GLSVLSI ’03:
Proceedings of the 13th ACM Great Lakes symposium on VLSI, pages
147–152. ACM Press, 2003.
[KC08] Kris Kaspersky and Alice Chang. Remote Code Execution through In-
tel CPU Bugs. In Hack In The Box (HITB) 2008 Malaysia Conference,
2008.
[KE91] David R. Kaeli and Philip G. Emma. Branch History Table Predic-
tion of Moving Target Branches due to Subroutine Returns. SIGARCH
Computer Architecture News, 19(3):34–42, 1991.
[KLMS+ 01] Andrew Byun Kahng, John Lach, William Henry Mangione-Smith,
Stefanus Mantik, Igor Leonidovich Markov, Miodrag M. Potkonjak,
Paul Askeland Tucker, Huijuan Wang, and Gregory Wolfe. Constraint-
Based Watermarking Techniques for Design IP Protection. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and
Systems, 20(10):1236–1252, 2001.
[klo99] klog. The Frame Pointer Overwrite. Phrack magazine, 55(9), 1999.
[KMM+ 98] Andrew Byun Kahng, Stefanus Mantik, Igor Leonidovich Markov,
Miodrag M. Potkonjak, Paul Askeland Tucker, Huijuan Wang, and Gre-
gory Wolfe. Robust IP Watermarking Methodologies for Physical De-
sign. In DAC ’98: Proceedings of the 35th annual Design Automation
Conference, pages 782–787, New York, NY, USA, 1998. ACM.
[KMM08] Tom Kean, David McLaren, and Carol Marsh. Verifying the Authen-
ticity of Chip Designs with the DesignTag System. In HOST ’08:
Proceedings of the 2008 IEEE International Workshop on Hardware-
Oriented Security and Trust, pages 59–64, Washington, DC, USA,
2008. IEEE Computer Society.
[KR06] Ian Kuon and Jonathan Rose. Measuring the Gap between FPGAs and
ASICs. In FPGA ’06: Proceedings of the 2006 ACM/SIGDA 14th inter-
national symposium on Field programmable gate arrays, pages 21–30,
New York, NY, USA, 2006. ACM.
[LB94] James R. Larus and Thomas Ball. Rewriting Executable Files to Mea-
sure Program Behavior. Software-Practice and Experience, 24(2):197–
218, 1994.
[LBD+ 04] James R. Larus, Thomas Ball, Manuvir Das, Robert DeLine, Manuel
Fahndrich, Jon Pincus, Sriram K. Rajamani, and Ramanathan Venkat-
apathy. Righting Software. IEEE Software, 21(3):92–100, 2004.
[LKMS04] Ruby B. Lee, David K. Karig, John P. McGregor, and Zhijie Shi. En-
listing Hardware Architecture to Thwart Malicious Code Injection. Se-
curity in Pervasive Computing, pages 237–252, 2004.
[LLG+ 04] Jae W. Lee, Daihyun Lim, Blaise Gassend, G. Edward Suh, Marten
Van Dijk, and Srini Devadas. A Technique to Build a Secret Key in
[LLG+ 05] Daihyun Lim, Jae W. Lee, Blaise Gassend, G. Edward Suh, Marten
Van Dijk, and Srini Devadas. Extracting Secret Keys from Integrated
Circuits. IEEE Transactions on Very Large Scale Integration Systems,
13(10):1200, 2005.
[MBS07] Albert Meixner, Michael E. Bauer, and Daniel Sorin. Argus: Low-
Cost, Comprehensive Error Detection in Simple Cores. In MICRO ’07:
Proceedings of the 40th Annual IEEE/ACM International Symposium
on Microarchitecture, pages 210–222, Washington, DC, USA, 2007.
IEEE Computer Society.
[MBS08] Albert Meixner, Michael E. Bauer, and Daniel Sorin. Argus: Low-Cost,
Comprehensive Error Detection in Simple Cores. IEEE Micro-Institute
of Electrical and Electronics Engineers, 28(1):52–59, 2008.
[MdR99] Todd C. Miller and Theo de Raadt. strlcpy and strlcat: Consistent,
Safe, String Copy and Concatenation. In ATEC ’99: Proceedings of
the annual conference on USENIX Annual Technical Conference, pages
41–41, Berkeley, CA, USA, 1999. USENIX Association.
[MH91] Edgar Michel and Wolfgang Hohl. Concurrent Error Detection using
Watchdog Processors in the Multiprocessor System MEMSY. In Fault-
tolerant computing systems: tests, diagnosis, fault treatment: 5th Inter-
national GI/ITG/GMA Conference, Nürnberg, September 25-27, 1991:
proceedings, page 54. Springer, 1991.
[MHPS96] István Majzik, Wolfgang Hohl, András Pataricza, and Volker Sieh.
Multiprocessor Checking using Watchdog Processors. Computer Sys-
tems Science and Engineering, 11(5):301–310, 1996.
[MKGT92] Ghassem Miremadi, Johan Karlsson, Ulf Gunneflo, and Jan Torin. Two
Software Techniques for On-line Error Detection. In Digest of Papers,
Twenty-Second International Symposium on Fault-Tolerant Comput-
ing. FTCS-22., pages 328–335, 1992.
[MKSL03] John P. McGregor, David K. Karig, Zhijie Shi, and Ruby B. Lee. A Pro-
cessor Architecture Defense against Buffer Overflow Attacks. In Pro-
ceedings of the IEEE International Conference on Information Tech-
nology: Research and Education (ITRE 2003), pages 243–250. Cite-
seer, 2003.
[MLS91] Thierry Michel, Régis Leveugle, and Gabriele Saucier. A New Ap-
proach to Control Flow Checking Without Program Modification. In
Digest of Papers of Twenty-First International Symposium of Fault-
Tolerant Computing, pages 334–343, 1991.
[MS91] Henrique Madeira and João G. Silva. On-line Signature Learning and
Checking: Experimental Evaluation. In CompEuro’91: Proceedings of
the 5th Annual European Computer Conference of Advanced Computer
Technology, Reliable Systems and Applications, pages 642–646, 1991.
[MV05] Matt Messier and John Viega. Safe C String Library v1.0.3. URL:
http://www.zork.org/safestr/, 2005.
[MWB+ 10] Matthias May, Norbert Wehn, Abdelmajid Bouajila, Johannes Zeppen-
feld, Walter Stechele, Andreas Herkersdorf, Daniel Ziener, and Jürgen
Teich. A Rapid Prototyping System for Error-Resilient Multi-Processor
Systems-on-Chip. In Proceedings of DATE’10, pages 375–380, March
2010.
[NCH+ 05] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak,
and Westley Weimer. CCured: Type-safe Retrofitting of Legacy Soft-
ware. ACM Transactions on Programming Languages and Systems
(TOPLAS), 27(3):477–526, 2005.
[PB04] Jonathan Pincus and Brandon Baker. Beyond stack smashing: Recent
advances in exploiting buffer overruns. IEEE Security and Privacy,
02(4):20–27, 2004.
[PMHH93] András Pataricza, István Majzik, Wolfgang Hohl, and Joachim Hönig.
Watchdog Processors in Parallel Systems. Microprocessing and Micro-
programming, 39(2-5):69–74, 1993.
[RBD+ 01] Rob A. Rutenbar, Max Baron, Thomas Daniel, Rajeev Jayaraman, Zvi
Or-Bach, Jonathan Rose, and Carl Sechen. (When) will FPGAs kill
ASICs? (panel session). In DAC ’01: Proceedings of the 38th annual
Design Automation Conference, pages 321–322, New York, NY, USA,
2001. ACM.
[RCV+ 05] George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, and
David I. August. SWIFT: Software Implemented Fault Tolerance. In
CGO ’05: Proceedings of the international symposium on Code gener-
ation and optimization, pages 243–254, Washington, DC, USA, 2005.
IEEE Computer Society.
[Reu] Design & Reuse. Catalyst of Collaborative IP Based SoC Design. URL:
http://www.design-reuse.com/.
[Ric08] Robert Richardson. CSI Computer Crime and Security Survey. Tech-
nical report, 2008.
[RLC+ 07] Eduardo L. Rhod, Calisboa A. Lisbôa, L. Carro, Massimo Violante, and
Matteo Sonza Reorda. A Non-intrusive On-line Control Flow Error De-
tection Technique for SoCs. In IEEE Latin-Americam Test Workshop,
LATW, volume 8, 2007.
[RRKH04] Srivaths Ravi, Anand Raghunathan, Paul Kocher, and Sunil Hattan-
gady. Security in Embedded Systems: Design Challenges. ACM Trans-
action on Embedded Computer Systems, 3(3):461–491, 2004.
[RSA78] Ronald Linn Rivest, Adi Shamir, and Leonard Max Adleman. A
Method for Obtaining Digital Signatures and Public-key Cryptosys-
tems. Communications of the ACM, 21(2):120–126, 1978.
[SAH] Bassel Soudan, Wael Adi, and Abdulrahman Hanoun. Enabling Se-
cure Integration of Multiple IP Cores in the Same FPGA. D&R Indus-
try Articles. URL: http://www.design-reuse.com/articles/
21638/secure-integration-ip-cores-fpga.html.
[SBE+ 07a] Walter Stechele, Oliver Bringmann, Rolf Ernst, Andreas Herkersdorf,
Katharina Hojenski, Peter Janacik, Franz Rammig, Jürgen Teich, Nor-
bert Wehn, Johannes Zeppenfeld, and Daniel Ziener. Autonomic MP-
SoCs for Reliable Systems. In Proceedings of Zuverlässigkeit und En-
twurf (ZuD 2007), pages 137–138, Munich, Germany, March 2007.
[SBE+ 07b] Walter Stechele, Oliver Bringmann, Rolf Ernst, Andreas Herkersdorf,
Katharina Hojenski, Peter Janacik, Franz Rammig, Jürgen Teich, Nor-
bert Wehn, Johannes Zeppenfeld, and Daniel Ziener. Concepts for Au-
tonomic Integrated Systems. In Proceedings of edaWorkshop07, Mu-
nich, Germany, June 2007.
[SFF+ 02] Oliverio J. Santana, Ayose Falcón, Enrique Fernández, Pedro Medina,
Alex Ramírez, and Mateo Valero. A Comprehensive Analysis of Indi-
rect Branch Prediction. In ISHPC ’02: Proceedings of the 4th Interna-
tional Symposium on High Performance Computing, pages 133–145,
London, UK, 2002. Springer-Verlag.
[SPA] SPARC International, Inc. The SPARC Architecture Manual V8. URL:
http://www.sparc.com/standards/V8.pdf.
[SS87] Michael A. Schuette and John Paul Shen. Processor Control Flow
Monitoring using Signatured Instruction Streams. IEEE Transactions
on Computers, 36(3):264–277, 1987.
[SSK07] Debasri Saha and Susmita Sur-Kolay. Fast Robust Intellectual Property
Protection for VLSI Physical Design. In ICIT ’07: Proceedings of the
10th International Conference on Information Technology, pages 1–6,
Washington, DC, USA, 2007. IEEE Computer Society.
[ST87] John Paul Shen and Stephen P. Tomas. A Roving Monitoring Processor
for Detection of Control Flow Errors in Multiple Processor Systems.
Microprocessing and Microprogramming, 20(4-5):249–269, 1987.
[SXZ+ 04] Zili Shao, Chun Xue, Qingfeng Zhuge, Edwin Hsing Mean Sha, and
Bin Xiao. Security Protection and Checking in Embedded System In-
tegration Against Buffer Overflow Attacks. In ITCC ’04: Proceedings
of the International Conference on Information Technology: Coding
and Computing (ITCC’04) Volume 2, pages 409–412, Washington, DC,
USA, 2004. IEEE Computer Society.
[SZHS03] Zili Shao, Qingfeng Zhuge, Yi He, and Edwin Hsing Mean Sha. De-
fending Embedded Systems Against Buffer Overflow via Hardware/-
Software. In ACSAC ’03: Proceedings of the 19th Annual Computer
Security Applications Conference, 2003. IEEE Computer Society.
[WD01] David Wagner and Drew Dean. Intrusion Detection via Static Anal-
ysis. In SP ’01: Proceedings of the 2001 IEEE Symposium on Secu-
rity and Privacy, pages 156–169, Washington, DC, USA, 2001. IEEE
Computer Society.
[WFBA00] David Wagner, Jeffrey S. Foster, Eric A. Brewer, and Alexander Aiken.
A First Step towards Automated Detection of Buffer Overrun Vulnera-
bilities. In Network and Distributed System Security Symposium, pages
3–17, 2000.
[WS90] Kent Wilken and John Paul Shen. Continuous Signature Monitoring:
Low-cost Concurrent Detection of Processor Control Errors. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and
Systems, 9(6):629–641, 1990.
[Xila] Xilinx Inc. FPGA IFF Copy Protection Using Dallas Semiconductor/Maxim
DS2432 Secure EEPROMs. URL: http://www.xilinx.com/support/documentation/application_notes/xapp780.pdf.
[Xilb] Xilinx Inc. ISE Design Suite Software Manuals and Help - PDF Collection.
URL: http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/manuals.pdf.
[Xilc] Xilinx Inc. JBits 3.0 SDK for Virtex-II. URL: http://www.xilinx.com/labs/projects/jbits/.
[Xilf] Xilinx Inc. Virtex-II Platform FPGAs: Complete Data Sheet. URL:
http://www.xilinx.com/support/documentation/data_sheets/ds031.pdf.
[Xil03] Xilinx Inc. Next-Generation Virtex Family From Xilinx to Top One Billion
Transistor Mark. URL: http://www.xilinx.com/prs_rls/silicon_vir/03131_nextgen.htm, 2003.
[XKPI02] Jun Xu, Zbigniew Kalbarczyk, Sanjay Patel, and Ravishankar K. Iyer.
Architecture Support for Defending against Buffer Overflow Attacks.
In Workshop on Evaluating and Architecting Systems for Dependabil-
ity, 2002.
[ZAT06] Daniel Ziener, Stefan Aßmus, and Jürgen Teich. Identifying FPGA
IP-Cores based on Lookup Table Content Analysis. In Proceedings
of 16th International Conference on Field Programmable Logic and
Applications (FPL 2006), pages 481–486, Madrid, Spain, August 2006.
[Zim95] Philip R. Zimmermann. The Official PGP User’s Guide. MIT Press,
Cambridge, MA, USA, 1995.
[ZT06] Daniel Ziener and Jürgen Teich. FPGA Core Watermarking Based on
Power Signature Analysis. In Proceedings of IEEE International Con-
ference on Field-Programmable Technology (FPT 2006), pages 205–
212, Bangkok, Thailand, December 2006.
[ZT08a] Daniel Ziener and Jürgen Teich. Concepts for Autonomous Control
Flow Checking for Embedded CPUs. In Proceedings of the 5th Interna-
tional Conference on Autonomic and Trusted Computing (ATC 2008),
pages 234–248, Oslo, Norway, June 2008.
[ZT09] Daniel Ziener and Jürgen Teich. Concepts for Run-time and Error-
resilient Control Flow Checking of Embedded RISC CPUs. Int. Jour-
nal of Autonomous and Adaptive Communications Systems, 2(3):256–
275, July 2009.
[ZZPL04] Tao Zhang, Xiaotong Zhuang, Santosh Pande, and Wenke Lee. Hard-
ware Supported Anomaly Detection: Down to the Control Flow Level.
Technical report, Georgia Institute of Technology, 2004.
[ZZPL05] Tao Zhang, Xiaotong Zhuang, Santosh Pande, and Wenke Lee. Anoma-
lous Path Detection with Hardware Support. In CASES ’05: Proceed-
ings of the 2005 international conference on Compilers, architectures
and synthesis for embedded systems, pages 43–54, New York, NY,
USA, 2005. ACM.