Data Storage
Edited by
Prof. Florin Balasa
Published by In-Teh
Olajnica 19/2, 32000 Vukovar, Croatia
Abstracting and non-profit use of the material is permitted with credit to the source. Statements and
opinions expressed in the chapters are those of the individual contributors and not necessarily those of
the editors or publisher. No responsibility is accepted for the accuracy of information contained in the
published articles. The publisher assumes no responsibility or liability for any damage or injury to persons or
property arising out of the use of any materials, instructions, methods or ideas contained inside. After
this work has been published by In-Teh, authors have the right to republish it, in whole or in part, in any
publication of which they are an author or editor, and to make other personal use of the work.
© 2010 In-Teh
www.intechweb.org
Additional copies can be obtained from:
[email protected]
Data Storage,
Edited by Prof. Florin Balasa
p. cm.
ISBN 978-953-307-063-6
Preface
Many different forms of storage, based on various natural phenomena, have been invented.
So far, no practical universal storage medium exists, and all forms of storage have some
drawbacks. Therefore, a computer system usually contains several kinds of storage, each with
an individual purpose.
Traditionally, the most important part of a digital computer is the central processing unit
(CPU or, simply, a processor), because it actually operates on data, performs calculations, and
controls all the other components. Without a memory, a computer would merely be able to
perform fixed operations and immediately output the result. It would have to be reconfigured
to change its behavior. This is acceptable for devices such as desk calculators or simple digital
signal processors. Von Neumann machines differ in that they have a memory in which they
store their operating instructions and data. Such computers are more versatile in that they
do not need to have their hardware reconfigured for each new program, but can simply be
reprogrammed with new in-memory instructions. Most modern computers are von Neumann
machines.
In practice, almost all computers use a variety of memory types, organized in a storage
hierarchy around the CPU, as a trade-off between performance and cost. Generally, the lower
a storage is in the hierarchy, the lower its bandwidth (the amount of data transferred per time
unit) and the greater its latency (the time to access a particular storage location) from the
CPU. This traditional division of storage into primary, secondary, tertiary and off-line storage
is also guided by cost per bit.
Primary storage (or main memory, or internal memory), often referred to simply as memory,
is the only one directly accessible to the CPU. The CPU continuously reads instructions stored
there and executes them as required. Any data actively operated on is also stored there in
a uniform manner.
As the random-access memory (RAM) types used for primary storage are volatile (i.e., they
lose the information when not powered), a computer containing only such storage would not
have a source to read instructions from, in order to start the computer. Hence, non-volatile
primary storage containing a small startup program is used to bootstrap the computer, that is,
to read a larger program from non-volatile secondary storage to RAM and start to execute it.
Secondary storage (or external memory) differs from primary storage in that it is not directly
accessible by the CPU. The computer usually uses its input/output channels to access
secondary storage and transfers the desired data using an intermediate area in the primary
storage. The secondary storage does not lose the data when the device is powered down – it is
non-volatile. Per unit, it is typically also two orders of magnitude less expensive than primary
storage. In modern computers, hard disk drives are usually used as secondary storage. The
time taken to access a given byte of information stored on a hard disk is typically a few
milliseconds. By contrast, the time taken to access a given byte of information stored in a
RAM is measured in nanoseconds. This illustrates the very significant access-time difference
which distinguishes solid-state memory from rotating magnetic storage devices: hard disks
are typically about a million times slower than memory. Rotating optical storage devices, such
as CD and DVD drives, have even longer access times. Some other examples of secondary
storage technologies are: flash memory (e.g. USB flash drives), floppy disks, magnetic tape,
paper tape, punched cards, stand-alone RAM disks, and Zip drives.
Tertiary storage or tertiary memory provides a third level of storage. Typically it involves
a robotic mechanism which will mount (insert) and dismount removable mass storage
media into a storage device according to the system’s demands; this data is often copied to
secondary storage before use. It is primarily used for archival of rarely accessed information
since it is much slower than secondary storage (e.g. tens of seconds). This is primarily useful
for extraordinarily large data stores, accessed without human operators. Typical examples
include tape libraries and optical jukeboxes.
Off-line storage is computer data storage on a medium or a device that is not under the
control of a processing unit. In modern personal computers, most secondary and tertiary
storage media are also used for off-line storage. Optical discs and flash memory devices are
most popular, while in enterprise uses, magnetic tape is predominant.

This book presents
several advances in different research areas related to data storage, from the design of a
hierarchical memory subsystem in embedded signal processing systems for data-intensive
applications, to data representation in flash memories, to the data recording and retrieval
in conventional optical data storage systems and the more recent holographic systems, to
applications in medicine requiring massive image databases.
In optical storage systems, sensitive stored patterns can cause failure in data retrieval and
decrease the system reliability. Modulation codes play the role of shaping the characteristics
of stored data patterns. In conventional optical data storage systems, information is recorded
in a one-dimensional spiral stream. The major concern of modulation codes for these systems
is to separate the binary 1’s by a number of binary 0’s.
The holographic data storage systems are regarded as the next-generation optical data
storage due to an extremely high capacity and ultra-fast transfer rate. In holographic systems,
information is stored as pixels on two-dimensional pages. Different from the conventional
optical data storage, the additional dimension inevitably brings new considerations to the
design of modulation codes. The primary concern is that interferences between pixels are
omni-directional. Moreover, interferences between pixels are imbalanced: since pixels carry
different intensities to represent different bits of information, pixels with higher intensities
tend to corrupt the signal fidelity of pixels with lower intensities more than the other way
around.
Chapter 1 analyzes different modulation codes for optical data storage. It first addresses types
of constraints of modulation codes. Afterwards, the one-dimensional modulation codes,
adopted in the current optical storage systems (like EFM for CD’s, EFMPlus for DVD’s, 17PP
for Blu-ray discs), are presented. Next, the chapter focuses on two-dimensional modulation
codes for holographic data storage systems – the block codes and the strip codes. It further
charge level than desired) while programming cells, and also to better tolerate asymmetric
shifts of the cells’ charge levels.
Chapter 5 deals with data compression which is becoming an essential component of
high-speed data communication and storage. Lossless data compression is the process of
encoding (“compressing”) a body of data into a smaller body of data, which can, at a later
time, be uniquely decoded (“decompressed”) back to the original data. In contrast, lossy data
compression yields by decompression only some approximation of the original data. Several
lossless data compression techniques have been proposed in the past – starting with Huffman
code. This chapter focuses on a more recent lossless compression approach – the Lempel-Ziv
(LZ) algorithm – whose principle is to find the longest match between a recently received
string stored in the input buffer and an incoming string; once the match is located, the
incoming string is represented with a position tag and a length variable, linking it to the old
existing one, thus achieving a more concise representation than the input data. The chapter
presents an area- and speed-efficient systolic array implementation of the LZ compression
algorithm. The systolic array can operate at a higher clock rate than other architectures (due
to the nearest-neighbor communication) and can be easily implemented and tested due to
regularity and homogeneity.
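To make the matching step concrete, here is a minimal Python sketch of LZ77-style encoding that finds the longest match in a search window and emits (offset, length, next-symbol) triples; it illustrates the principle only, not the systolic-array architecture described above, and the window sizes are arbitrary:

def lz77_encode(data, window=255, max_len=15):
    """Toy LZ77 encoder: emit (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        # scan the search buffer for the longest match with the incoming string
        for j in range(start, i):
            k = 0
            while (k < max_len and i + k < len(data)
                   and data[j + k] == data[i + k]):
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        nxt = data[i + best_len] if i + best_len < len(data) else ''
        out.append((best_off, best_len, nxt))   # position tag + length variable
        i += best_len + 1
    return out

print(lz77_encode("abracadabra"))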
Although CPU performance has considerably improved, the speed at which the file system
manages huge volumes of information is considered the main factor affecting overall computer system
performance. The I/O bandwidth is limited by magnetic disks, whose rotation speed and seek
time have improved very slowly (although capacity and cost per megabyte have improved
much faster). Chapter 6 discusses problems related to the reliability of a redundant array of
independent disks (RAID) system, a widely used solution since 1988 based on
the parallel access of data distributed over several disks. A RAID system can be configured in
various ways to get a fair compromise between data access speed, system reliability, and size
of storage. The general trade-off is to increase data access speed by writing the data into more
places, hence increasing the amount of storage. On the other hand, more disks entail a lower
reliability; this, together with the data redundancy, creates a need for additional algorithms
to enhance the reliability of valuable data. The chapter presents recent solutions for the use
of Reed-Solomon codes in a RAID system in order to correct single random errors and double
erasures. The proposed RAID system is expandable in terms of correction capabilities and
presents an integrated concept: the modules at the disk level mainly deal with burst or random
errors in disks, while the control level does the corrections for multiple failures of the system.
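To illustrate the erasure-correction idea in its simplest form, the following Python sketch uses plain XOR parity to rebuild a single failed disk; the Reed-Solomon scheme of the chapter generalizes this to double erasures and random errors. The block contents here are hypothetical:

from functools import reduce

def parity(blocks):
    """XOR parity over the equal-sized data blocks of one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"disk0...", b"disk1...", b"disk2..."]   # three data disks, one stripe
p = parity(data)                                 # parity disk

# single-disk erasure: rebuild disk1 from the survivors and the parity
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]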
A grid is a collection of computers and storage resources, addressing collaboration, data
sharing, and other patterns of interaction that involve distributed resources. Since the grid
nowadays typically consists of hundreds or even thousands of storage and computing nodes,
key challenges faced by high-performance storage systems are manageability, scalable
administration, and monitoring of system state. Chapter 7 presents a scalable distributed
system consisting of an administration module that manages virtual storage resources
according to their workloads, based on the information collected by a monitoring module.
Since the major concern in a conventional data storage distributed system is the data access
performance, the focus was on static performance parameters not related to the nodes' load.
In contrast, the current system is designed based on the principle that the whole system’s
performance is affected not only by the behavior of each individual application, but also by
the execution of different applications combined together. The new system takes into account
the utilization percentage of system resources, such as CPU load, disk load, and network load. It
offers a flexible and simple model that collects information on the nodes’ state and uses its
monitoring knowledge together with a prediction model to efficiently place data during
runtime execution in order to improve the overall data access performance.
Many multidimensional signal processing systems, particularly in the areas of multimedia
and telecommunications, are synthesized to execute data-intensive applications, the data
transfer and storage having a significant impact on both the system performance and the
major cost parameters – power and area. In particular, the memory subsystem is, typically,
a major contributor to the overall energy budget of the entire system. The dynamic energy
consumption is caused by memory accesses, whereas the static energy consumption is due
to leakage currents. Savings of dynamic energy can be potentially obtained by accessing
frequently used data from smaller on-chip memories rather than from the large off-chip main
memory, the problem being how to optimally assign the data to the memory layers. As on-
chip storage, the scratch-pad memories (SPM’s) – compiler-controlled static RAM’s, more
energy-efficient than the hardware-managed caches – are widely used in embedded systems,
where caches incur a significant penalty in aspects like area cost, energy consumption, and hit
latency. Chapter 8 presents a power-aware memory allocation methodology. Starting from
the high-level behavioral specification of a given application, where the code is organized
in sequences of loop nests and the main data structures are multidimensional arrays, this
framework performs the assignment of the multidimensional signals to the memory
layers – the on-chip scratch-pad memory and the off-chip main memory – the goal being
the reduction of the dynamic energy consumption in the memory subsystem. Based on the
assignment results, the framework subsequently performs the mapping of signals into the
memory layers such that the overall amount of data storage is reduced. This software system
yields a complete allocation solution: the exact storage amount on each memory layer, the
mapping functions that determine the exact locations for any array element (scalar signal)
in the specification, metrics of quality for the allocation solution, and also an estimation of
the dynamic energy consumption in the memory subsystem using the CACTI power model.
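The flavor of such an assignment step can be conveyed by a simple greedy heuristic: arrays with the highest access count per byte go to the scratch-pad until it is full. This is a toy stand-in, not the chapter's actual methodology, and the array names and sizes below are hypothetical:

def assign_to_spm(arrays, spm_capacity):
    """arrays: (name, size_in_bytes, estimated_access_count) tuples.
    Greedily place the most access-dense arrays into the scratch-pad."""
    ranked = sorted(arrays, key=lambda a: a[2] / a[1], reverse=True)
    spm, main, free = [], [], spm_capacity
    for name, size, accesses in ranked:
        if size <= free:
            spm.append(name)
            free -= size
        else:
            main.append(name)          # falls back to off-chip main memory
    return spm, main

# hypothetical multidimensional arrays of a loop-nest specification
on_chip, off_chip = assign_to_spm([("A", 4096, 90000),
                                   ("B", 65536, 100000),
                                   ("C", 1024, 50000)], spm_capacity=8192)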
Network-on-a-Chip (NoC) is a new approach to design the communication subsystem
of System-on-a-Chip (SoC) – paradigm referring to the integration of all components of a
computer or other electronic system into a single integrated circuit, implementing digital,
analog, mixed-signal, and sometimes radio-frequency functions, on a single chip substrate.
In a NoC system, modules such as processor cores, memories, and specialized intellectual
property (IP) blocks exchange data using a network that brings notable improvements over
the conventional bus system. An NoC is constructed from multiple point-to-point data links
interconnected by temporary storage elements called switches or routers, such that messages
can be relayed from any source module to any destination module over several links, by
making routing decisions at the switches. The wires in the links of NoC are shared by many
signals, different from the traditional integrated circuits that have been designed with
dedicated point-to-point connections – with one wire dedicated to each signal. In this way,
a high level of parallelism is achieved, because all links in NoC can operate simultaneously
on different data packets. As the complexity of integrated circuits keeps growing, NoC
provides enhanced throughput and scalability in comparison with previous communication
architectures (dedicated signal wires, shared buses, segmented buses with bridges), reducing
also the dynamic power dissipation (as signal propagation in wires across the chip requires
multiple clock cycles).
Chapter 9 discusses different trade-offs in the design of efficient NoC, including both
elements of the network – interconnects and storage elements. This chapter introduces a high-
throughput architecture, which is applied to different NoC topologies: butterfly fat-tree (BFT),
a mesh interconnect topology called CLICHÉ, Octagon, and SPIN. A power dissipation
analysis for all these high-throughput architectures is presented, followed by a discussion on
the throughput improvement and an overhead analysis.
Data acquisition and data storage are requirements for many applications in the area of sensor
networks. Chapter 10 discusses different protocols for interfacing smart sensors and mobile
devices with non-volatile memories.
A “smart sensor” is a transducer that provides functions to generate a correct representation
of a sensed or controlled quantity. Networks of smart sensors have an inbuilt ability to sense
and process information, and also to send selected information to external receivers (including
other sensors). The nodes of such networks – the smart sensors – require memory capabilities
in order to store data, either temporarily or permanently. The non-volatile memories (whose
content need not be periodically refreshed) used in the architecture of smart sensors include
all forms of read-only memories (ROM’s) – such as programmable read-only memories
(PROM’s), erasable programmable read-only memories (EPROM’s), electrically erasable
programmable read-only memories (EEPROM’s) – and flash memories. Sometimes, random
access memories powered by batteries are used as well.
The communication between the sensor processing unit and the non-volatile memory units is
done using different communications protocols. For instance, the 1-wire interface protocol –
developed by Dallas Semiconductor – permits digital communications through twisted-pair
cables and 1-wire components over a 1-wire network; the 2-wire interface protocol – developed
by Philips – performs communication functions between intelligent control devices, general-
purpose (including memories) and application-specific circuits along a bi-directional 2-wire
bus (called the Inter-Integrated Circuit or the I2C-bus).
The chapter also presents memory interface protocols for mobile devices, like the CompactFlash
(CF) memory protocol – introduced by SanDisk Corporation – used in flash memory card
applications where small form factor, low-power dissipation, and ease-of-design are crucial
considerations; or the Secure Digital (SD) memory protocol – the result of a collaboration
between Toshiba, SanDisk, and MEI (Panasonic) – specifically designed to meet the security,
capacity, performance, and environment requirements inherent to the newly-emerging audio
and video consumer electronic devices, as well as smart sensing networks (that include smart
mobile devices, such as smart phones and PDA’s).
The next chapters present complex applications from different fields where data storage plays
a significant part in the system implementation.
Chapter 11 presents a complex application of an array of sensors, called the electronic nose.
This system is used for gas analysis when exposed to a gas mixture and water vapour in
a wide range of temperatures. The large amount of data collected by the electronic nose is
able to provide high-level information, to make characterizations of different gases. The
electronic nose utilizes an efficient and affordable sensor array that shows autonomous and
intelligent capabilities. An application of this system is the separation of butane and propane
– (normally) gaseous hydrocarbons derived from natural gas or refinery gas streams. A feed-forward
neural network with a back-propagation training algorithm is used to detect each gas
with different sensors. Fuzzy logic is also used because it enhances discrimination techniques
among sensed gases. A fuzzy controller is used to detect the concentration of each gas in the
mixture based on three parameters: temperature, output voltage of the microcontroller, and
the variable resistance related to each sensor.
With the steady progress of the technology of digital imaging and the rapid increase of
the data storage capacity, the capability to manipulate massive image databases over the
Internet is used nowadays both in clinical medicine – to assist diagnosis – and in medical
research and teaching. The traditional image has been replaced by Computed Radiography
or Digital Radiography derived imagery, Computed Tomography, Magnetic Resonance
Imaging (MRI), and Digital Subtraction Angiography. This advance reduces the space and
cost associated with X-ray films, speeds up hospital management procedures, and also improves
the quality of the medical imagery-based diagnosis. Chapter 12 presents a complex
system where a grid-distributed visual medical knowledge and medical image database was
integrated with radiology reports and clinical information on patients in order to support
the diagnostic decision making and therapeutic strategies, and to reduce the human, legal, and
financial consequences of medical errors. This system started being used in the treatment of
dementia – a degenerative brain disease or a group of symptoms caused by various diseases and
conditions – to establish a prototype system integrating content-based image retrieval (CBIR)
and clinical information from electronic medical records (EMR).
Chapter 13 describes a data storage application from the food industry. The food legislation
in the European Union imposes strict regulations on food traceability in all the stages of
production, transformation, and distribution. The back or suppliers traceability refers to
the capability of having knowledge of the products that enter a food enterprise and their
origin and suppliers. The internal or process traceability refers to the information about what
is made, how and when it is made, and the identification of the product. The forward or
client traceability refers to the capability of knowing the products delivered, when and to
whom they have been supplied. The chapter describes a traceability system based on radio-
frequency identification (RFID). This is a contactless method for data transfer, carried out
through electromagnetic waves. RFID tags (transponders), consisting of a microchip and an
antenna, represent the physical support for storing the information required to perform a
complete traceability of the products, as well as facilitate the collection of data required by
the technical experts of regulatory boards in charge of quality certification. The RFID read/
write device creates a weak electromagnetic field. When an RFID tag passes through this field,
the microchip of the transponder wakes up and can send/receive data without any contact
to the reader. The communication gets interrupted when the tag leaves the field, but the data
on the tag remains stored.
Contents
Preface

6. Design of Simple and High Speed Scheme to Protect Mass Storages
Ming-Haw Jing, Yan-Haw Chen, Zih-Heng Chen,
Jian-Hong Chen and Yaotsu Chang
Modulation Codes for Optical Data Storage
1. Introduction
In optical storage systems, sensitive stored patterns can cause failure in data retrieval and
decrease the system reliability. Modulation codes play the role of shaping the characteristics
of stored data patterns in optical storage systems. Among various optical storage systems,
holographic data storage is regarded as a promising candidate for next-generation optical
data storage due to its extremely high capacity and ultra-fast data transfer rate. In this
chapter we will cover modulation codes for optical data storage, especially on those
designed for holographic data storage.
In conventional optical data storage systems, information is recorded in a one-dimensional
spiral stream. The major concern of modulation codes for these optical data storage systems
is to separate binary ones by a number of binary zeroes, i.e., run-length-limited codes.
Examples are the eight-to-fourteen modulation (EFM) for CD (Immink et al., 1985), EFMPlus
for DVD (Immink, 1997), and the (1,7) parity-preserve/prohibit repeated minimum transition
run-length (17PP) code for Blu-ray disc (Blu-ray Disc Association, 2006). Setting constraints on the
minimum and maximum runs of binary zeros brings several advantages, including
increased data density, improved timing recovery and gain control, and reduced interference
between bits.
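For concreteness, a (d, k) run-length-limited bit stream keeps every run of zeros between consecutive ones at least d and at most k long; a small Python check (with hypothetical inputs) might look like:

def satisfies_rll(bits, d, k):
    """Check the (d, k) run-length-limited constraint: every run of zeros
    between consecutive ones is at least d and at most k long."""
    run, seen_one = 0, False
    for b in bits:
        if b == 1:
            if seen_one and not (d <= run <= k):
                return False          # interior zero-run too short (or too long)
            run, seen_one = 0, True
        else:
            run += 1
            if run > k:
                return False          # zero-run too long anywhere
    return True

# EFM-coded data obeys (d, k) = (2, 10)
print(satisfies_rll([1, 0, 0, 1, 0, 0, 0, 1], d=2, k=10))   # True
print(satisfies_rll([1, 0, 1], d=2, k=10))                  # False: run of one zero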
In holographic data storage systems, information is stored as pixels on two-dimensional (2-D)
pages. Different from conventional optical data storage, the additional dimension
inevitably brings new considerations to the design of modulation codes. The primary concern
is that interferences between pixels are omni-directional. Besides, since pixels carry different
intensities to represent different information bits, pixels with higher intensities intrinsically
corrupt the signal fidelity of those with lower intensities more than the other way around;
i.e., interferences among pixels are imbalanced. In addition to preventing vulnerable
patterns from suffering possible interferences, some modulation codes also focus on decoder
complexity, and yet others focus on achieving a high code rate. It is desirable to consider all
aspects, but trade-offs are a matter of course. Different priorities in design considerations result in
various modulation codes.
In this chapter, we will first introduce several modulation code constraints. Next, one-
dimensional modulation codes adopted in prevalent optical data storage systems are
discussed. Then we turn to the modulation codes designed for holographic data storage.
These modulation codes are classified according to the coding methods, i.e., block codes vs.
strip codes. For block codes, code blocks are independently produced and then tiled to form a
whole page. This guarantees a one-to-one relationship between the information bits and the
associated code blocks. On the contrary, strip codes produce code blocks by considering the
current group of information bits as well as other code blocks. This type of coding
complicates the encoding procedure but can ensure that the constraints are satisfied across
block boundaries. We will further discuss variable-length modulation codes, as a
contrast to fixed-length modulation codes. Variable-length modulation codes have more
freedom in the design of code blocks. With a given code rate, variable-length modulation
codes can provide better modulated pages when compared to fixed-length modulation
codes. However, variable-length modulation codes can suffer from the error propagation
problem where a decoding error of one code block can lead to several ensuing decoding
errors.
2. Constraints
Generally speaking, constraints of modulation codes are designed according to the channel
characteristics of the storage system. There are also other considerations such as decoder
complexity and code rate. In conventional optical data storage systems, information carried
in a binary data stream is recorded by creating marks on the disk with variable lengths and
spaces between them. On the other hand, information is stored in 2-D data pages consisting
of ON pixels and OFF pixels in holographic data storage systems. The holographic
modulation codes encode one-dimensional information streams into 2-D code blocks. The
modulated pages created by tiling code blocks comply with certain constraints, aiming at
reducing the risk of corrupting signal fidelity during writing and retrieving processes. Due
to the additional dimension, other considerations are required when designing constraints
for holographic modulation codes. In the following some commonly adopted constraints are
introduced.
Fig. 1. Illustration of the 2-D run-length limited constraint based on 2-D spatial distance with
lower and upper bounds of two and four, respectively.
Table 1. Low-pass constraints and examples of forbidden patterns.
2.5 Summary
We have introduced four types of constraints for modulation codes commonly adopted in
optical data storage: run-length limited, conservative, low-pass and constant-weight
constraints. The run-length limited constraint focuses on the density of ON pixels. The
conservative constraint considers the frequency of transitions between ON and OFF pixels.
The low-pass constraint helps avoid those patterns vulnerable to inter-pixel interference
effects in holographic data storage systems. As for the constant-weight constraint, it
enables a simple decoding scheme by sorting pixel intensities. Sparse modulation codes
further decrease the probability of vulnerable patterns and increase the number of
recordable pages.
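The sorting-based detection enabled by the constant-weight constraint can be sketched as follows, assuming each block is known to carry exactly w ON pixels (a toy illustration with made-up intensities):

import numpy as np

def detect_constant_weight(intensities, w):
    """Declare the w brightest pixels ON and the rest OFF; the known
    weight w makes an explicit intensity threshold unnecessary."""
    flat = np.asarray(intensities, dtype=float).ravel()
    decided = np.zeros(flat.size, dtype=int)
    decided[np.argsort(flat)[-w:]] = 1       # indices of the w largest intensities
    return decided.reshape(np.shape(intensities))

noisy_block = [[0.9, 0.1, 0.7],
               [0.2, 0.8, 0.3],
               [0.1, 0.2, 0.6]]
print(detect_constant_weight(noisy_block, w=4))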
Fig. 3. Actual patterns recorded on the disc through a change from the NRZ format to the
NRZI format.
Fig. 4. Illustration of DC control in the 17PP modulation code (Blu-ray Disc Association,
2006).
code design is a trade-off among higher code rate, simpler decoder and satisfactory BER
performance.
Block codes indeed provide a simple mapping in encoding and decoding. However, the risk
of violating constraints becomes significant when tiling multiple blocks together since illegal
patterns may occur across the boundary of neighboring blocks. To circumvent this problem,
additional constraints may be required. Unfortunately, more constraints can eliminate some
patterns that would have been legitimate. To maintain the code rate, a larger block size is
called for.
Fig. 6. (a) A code block in the 5:9 block code and (b) a code block in the 6:9 block code.
where Fn is the set of all binary column vectors and En is the subset of Fn consisting of all
even-weight vectors; σ(x) is the 1-bit cyclic downward shift of a column vector x. Given two
sets of codewords, C1 and C2, produced by two (t-2)-error correcting codes of length m and
n, respectively, two sets of column vectors {a1, a2,…, an} and {b1, b2, …, bm} are obtained by
the inverse mappings φ−1(C1) and φ−1(C2). A theorem in (Vardy et al., 1996) states that, for any
vector x in Fn, if $x \oplus a_i$ has fewer than t transitions, then $x \oplus a_j$ will have at least t transitions
for all j ≠ i.
Next, one constructs matrices A0, A1, …, An and B0, B1, …, Bm, all of size m×n, from the
aforementioned sets of column vectors, {a0, a1, …, an} and {b0, b1, …, bm}. Ai is formed by
tiling ai n times horizontally, while Bj is formed by tiling the transposed bj m times vertically:

$$A_i = \begin{bmatrix} a_i & a_i & \cdots & a_i \end{bmatrix}, \qquad i = 0, 1, \ldots, n \qquad (2)$$

$$B_j = \begin{bmatrix} b_j^T \\ b_j^T \\ \vdots \\ b_j^T \end{bmatrix}, \qquad j = 0, 1, \ldots, m \qquad (3)$$
Finally, a modulation block is obtained by the component-wise exclusive-OR of an Ai and a Bj.
In total, there are (m+1)×(n+1) such modulation blocks. With such a modulation block
construction, it can be shown that the pixel-wise exclusive-OR operation of an input
information block with a modulation block yields a modulated block that satisfies the
conservative constraint with strength t. To indicate which modulation block is applied,
several bits are inserted as an extra row or column on each recorded page.
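A small sketch of this construction (illustrative only; in a real code the column vectors a_i and b_j would be derived from the (t−2)-error-correcting codes as described above, and the example vectors here are arbitrary):

import numpy as np

def modulation_block(a_i, b_j):
    """Tile column vector a_i horizontally (Eq. (2)) and row vector b_j^T
    vertically (Eq. (3)), then XOR the two matrices component-wise."""
    m, n = len(a_i), len(b_j)
    A_i = np.tile(np.asarray(a_i, dtype=int).reshape(m, 1), (1, n))
    B_j = np.tile(np.asarray(b_j, dtype=int).reshape(1, n), (m, 1))
    return A_i ^ B_j

# hypothetical 4x4 example
info_block = np.array([[1, 0, 1, 0],
                       [0, 1, 1, 1],
                       [1, 1, 0, 0],
                       [0, 0, 1, 1]])
modulated = info_block ^ modulation_block([0, 1, 1, 0], [1, 0, 1, 0])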
Fig. 8. An example of 8×8 block with three all-ON sub-blocks (Malki et al., 2008).
In (Malki et al., 2008), a fixed number of all-ON sub-blocks are present in each code block.
This scheme lets the proposed code comply with the constant-weight constraint
automatically. An example of three 2×2 all-ON sub-blocks is shown in Fig. 8. For an 8×8
block without any constraints, there are (8-1)×(8-1)=49 possible positions for each sub-block,
denoted by filled circles. The value of dmin determines the available number of legal code
blocks.
The 9:12 pseudo-balanced code of (Hwang et al., 2002) is similar to the 8:12 balanced
modulation code, but achieves a higher code rate by relaxing the balanced constraint to allow a
certain degree of codeword imbalance. The 9:12 pseudo-balanced code maps information
bits to codewords according to a trellis with eight diverging and eight merging states. Each
branch represents a set of 64 pseudo-balanced 12-bit codewords with minimum Hamming
distance of four. These codewords are divided into 16 groups. The most-significant three
information bits of the current data word and the most-significant three bits of the previous
data word indicate the states in the previous stage and the current stage, respectively. The
codeword set is determined uniquely by these two states. The least-significant six bits of the
current information word are used to select a codeword within the set.
The decoding is similar to Viterbi decoding of trellis codes (Proakis, 2001). First, the best
codeword with respect to the acquired codeword is found for each codeword set. The index
of that best codeword and the associated distance are recorded. Then the Viterbi algorithm
finds the best state sequence using those aforementioned distances as branch metrics. In
(Hwang et al., 2002), it is shown that the 9:12 pseudo-balanced code, though with a higher
code rate, provides similar performance to the 8:12 balanced strip code.
Code block (top row / bottom row):  00/00  10/10  01/01  11/11  00/10  11/01  10/00  01/11  00/11  11/00
Information bits:                   000    001    010    011    100    100    101    101    110    111
Table 3. Decoder for the third constraint in Table 1 (Ashley & Marcus, 1998).
Besides, high-frequency patterns may appear not only within strips but also across
strips. The strip constraint is therefore very important when we require a low-pass property over
the whole modulated page. Below is an example of a strip constraint applied to a strip
code that satisfies the low-pass constraint. Given a forbidden pattern in Table 1 with height
L, a strip constraint bans the top (respectively, bottom) m rows of this forbidden pattern,
where m is between L/2 and L, from appearing at the top (respectively, bottom) of the current
strip. With this extra strip constraint, it is guaranteed that the low-pass constraint will be
satisfied across strip boundaries. Table 2 and Table 3 give an example of an encoder/decoder
designed for the third constraint in Table 1. Using a finite state machine with four states,
the encoder transforms three information bits into a 2×2 code block based on the current state.
The code block and the next state are designated in Table 2. The three information bits are
easily decoded from the single retrieved block. The mapping is shown in Table 3.
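Since Table 2 itself is not reproduced in this excerpt, the sketch below illustrates only the mechanism, with a hypothetical transition table: a four-state finite state machine that maps (current state, three information bits) to a 2×2 code block and a next state.

# Hypothetical transition table: (state, info bits) -> (2x2 code block, next state).
# The real Table 2 chooses blocks so that the constraint also holds across
# strip boundaries; the two entries below are placeholders only.
TABLE2 = {
    (0, (0, 0, 0)): (((0, 0), (0, 0)), 1),
    (0, (0, 0, 1)): (((1, 0), (1, 0)), 2),
    # ... the remaining (state, bits) pairs of the actual table ...
}

def encode_strip(bits, state=0):
    """Encode information bits three at a time into 2x2 code blocks."""
    blocks = []
    for i in range(0, len(bits) - len(bits) % 3, 3):
        block, state = TABLE2[(state, tuple(bits[i:i + 3]))]
        blocks.append(block)
    return blocks, state

blocks, state = encode_strip([0, 0, 0])   # uses only the entries defined above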
7. Conclusion
In this chapter, modulation codes for optical data storage have been discussed. At first, four
types of constraints are introduced, including run-length limited, conservative, low-pass
and constant-weight constraints. Since high spatial frequency components tend to be
attenuated during recording/reading procedures and long runs of OFF pixels increase
difficulty in tracking, the former three types of constraints are proposed to avoid these
adverse situations as much as possible. On the other hand, the constant-weight constraint
gives modulated pages that are easier to decode. In addition, experiments indicate that
better performance can be obtained for modulation codes that have sparse weight.
Based on the constraints, several modulation codes are discussed. The one-dimensional
modulation codes adopted in current optical storage systems, i.e., EFM for CD, EFMPlus for
DVD and 17PP for Blu-ray disc, are first introduced. All of these modulation codes are
developed for the run-length limited constraint.
Next, we focus on 2-D modulation codes for holographic data storage systems. They are
classified into block codes and strip codes. Information bits and code blocks have a one-to-
one relationship in block codes whose encoder/decoder can be simply realized by look-up
tables. However, block codes cannot guarantee that patterns across block borders comply
with the required constraints. This shortcoming can be circumvented by strip codes, which
produce code blocks based on not only the input information bits but also neighboring
modulated blocks. A finite state machine and a Viterbi decoder are typical schemes for the
encoding and decoding of the strip codes, respectively.
Variable-length modulation codes, in contrast to fixed-length modulation codes, do not fix
the number of input information bits or the code block size. The relaxed design increases the
number of legal patterns and provides better performance than the fixed-length modulation
codes with the same code rate. However, error propagation problems necessitate a more
elaborate decoding scheme.
Finally, comparisons among different types of modulation codes introduced in this chapter
are listed in Table 4 and Table 5.
                                             Fixed-length    Variable-length
Freedom of choosing legal pattern            Low             High
Error propagation problem during decoding    No              Yes
BER performance with the same code rate      Poor            Good
Code rate with the same BER performance      Low             High
Table 5. Comparison between fixed-length and variable-length modulation codes
8. References
Ashley, J. J. & Marcus, B. H. (1998). Two-dimensional low-pass filtering codes. IEEE Trans.
on Comm., Vol. 46, No. 6, pp. 724-727, ISSN 0090-6778
Blaum, M.; Siegel, P. H., Sincerbox, G. T., & Vardy, A. (1996). Method and apparatus for
modulation of multi-dimensional data in holographic storage. US patent (Apr. 1996)
5,510,912
Blu-ray Disc Association (2006). White paper Blu-ray disc format: 1. A physical format
specifications for BD-RE, 2nd edition
Burr, G. W.; Ashley, J., Coufal, H., Grygier, R. K., Hoffnagle, J. A., Jefferson, C. M., &
Marcus, B. (1997). Modulation coding for pixel-matched holographic data storage.
Optics Letters, Vol. 22, No. 9, pp. 639-641, ISSN 0146-9592
Burr, G. W. & Marcus, B. (1999). Coding tradeoffs for high density holographic data storage.
Proceedings of the SPIE, Vol. 3802, pp. 18–29, ISSN 0277-786X
Coufal, H. J.; Psaltis, D. & Sincerbox, G. T. (Eds.). (2000). Holographic data storage, Springer-
Verlag, ISBN 3-540-66691-5, New York
Chen, C.-Y. & Chiueh, T.-D. (2007). A low-complexity high-performance modulation code for
holographic data storage, Proceedings of IEEE International Conference on Electronics,
Circuits and Systems, pp. 788-791, ISBN 978-1-4244-1377-5, Dec. 2007, Morocco
Daiber, A. J.; McLeod, R. R. & Snyder, R. (2003). Sparse modulation codes for holographic
data storage. US patent (Apr. 2003) 6,549,664 B1
Hwang, E.; Kim, K., Kim, J., Park, J. & Jung, H. (2002). A new efficient error correctible
modulation code for holographic data storage. Jpn. J. Appl. Phys., Vol. 41, No. 3B,
pp. 1763-1766, ISSN 0021-4922
Hwang, E.; Roh, J., Kim, J., Cho, J., Park, J. & Jung, H. (2003). A new two-dimensional
pseudo-random modulation code for holographic data storage. Jpn. J. Appl. Phys.,
Vol. 42, No. 2B, pp. 1010-1013, ISSN 0021-4922
Immink, K. A.; Nijboer, J. G., Ogawa, H. & Odaka, K. (1985). Method of coding binary data.
U.S. Patent (Feb. 1985) 4,501,000
Immink, K. A. (1997). Method of converting a series of m-bit information words to a
modulated signal, method of producing a record carrier, coding device, decoding
device, recording device, reading device, signal, as well as record carrier. U.S.
Patent (Dec. 1997) 5,696,505
Kamabe, H. (2007). Representation of 2 dimensional RLL constraint, Proceedings of IEEE
International Symposium on Information Theory (ISIT), pp. 1171-1175, ISBN 978-1-
4244-1397-3, Jun. 2007, Nice
King, B. M. & Neifeld, M. A. (2000). Sparse modulation coding for increased capacity in
volume holographic storage. Applied Optics, Vol. 39, No. 35, pp. 6681-6688, ISSN
0003-6935
Kume, T.; Yagi, S., Imai, T. & Yamamoto, M. (2001). Digital holographic memory using two-
dimensional modulation code. Jpn. J. Appl. Phys., Vol. 40, No. 3B, pp. 1732-1736,
ISSN 0021-4922
Malki, O.; Knittel, J., Przygodda, F., Trautner, H. & Richter, H. (2008). Two-dimensional
modulation for holographic data storage systems. Jpn. J. Appl. Phys., Vol. 47, No. 7,
pp. 5993-5996, ISSN 0021-4922
McLaughlin, S. W. (1998). Shedding light on the future of SP for optical recording. IEEE
Signal Processing Magazine, Vol. 15, No. 4, pp. 83-94, ISSN 1053-5888
Pansatiankul, D. E. & Sawchuk, A. A. (2003). Variable-length two-dimensional modulation
coding for imaging page-oriented optical data storage systems. Applied Optics, Vol.
42, No. 26, pp. 5319-5333, ISSN 0003-6935
Proakis, J. G. (2001) Digital Communications, McGraw-Hill Science/Engineering/Math; 4th
edition, ISBN 0-07-232111-3, International
Roth, R. M.; Siegel, P. H. & Wolf, J. K. (2001). Efficient coding schemes for the hard-square
model. IEEE Trans. on Info. Theory, Vol. 47, No. 3, pp. 1166-1176, ISSN 0018-9448
Vadde, V. & Vijaya Kumar, B.V.K. (2000). Performance comparison of equalization and low-
pass coding for holographic storage, Conference Digest Optical Data Storage, pp. 113-
115, ISBN 0-7803-5950-X, May 2000
Vardy, A.; Blaum, M., Siegel, P. H., & Sincerbox, G. T. (1996). Conservative arrays:
multidimensional modulation codes for holographic recording. IEEE Trans. on Info.
Theory, Vol. 42, No. 1, pp. 227-230, ISSN 0018-9448
Signal Processing in Holographic Data Storage
1. Introduction
Holographic data storage (HDS) is regarded as a potential candidate for next-generation
optical data storage. It has features of extremely high capacity and ultra-fast data transfer
rate. Holographic data storage abandons the conventional method, which records
information in one-dimensional bit streams along a spiral, and exploits a two-dimensional (2-D)
data format instead. Page access provides holographic data storage with much higher
throughput via parallel processing of data streams. In addition, data are saved throughout
the volume of the storage medium by applying a specific physical principle, and this brings
the data capacity to the terabyte level.
Boosted data density, however, increases interferences between stored data pixels.
Moreover, the physical limits of mechanical/electrical/optical components also result in
misalignments in the retrieved images. Typical channel impairments in holographic data
storage systems include misalignment, inter-pixel interference and noise, which will be
discussed in the following. A channel model that includes these significant defects –
misalignment, crosstalk among pixels, finite pixel fill factors, limited contrast ratio and
noise – will also be introduced.
The overall signal processing for holographic data storage systems consists of three major
parts: modulation codes, misalignment compensation, and equalization and detection. A block
diagram of such a system is illustrated in Fig. 1. Note that in addition to the three parts,
error-correcting codes (ECC) help to keep the error rate of the retrieved information under
an acceptable level. This topic is beyond the scope of the chapter and interested readers are
referred to textbooks on error-correcting codes.
To help maintain signal fidelity of data pixels, modulation codes are designed to comply
with some constraints. These constraints are designed based on the consideration of
avoiding vulnerable patterns, facilitation of timing recovery, and simple decoder
implementation. Modulation codes satisfying one or more constraints must also maintain a
high enough code rate by using as few redundant pixels as possible. The details are
discussed in the chapter on “Modulation Codes for Optical Data Storage.”
Even with modulation codes, several defects, such as inter-pixel interferences and
misalignments, can still cause degradation to the retrieved image and detection
performance. Misalignments severely distort the retrieved image and thus are better
handled before regular equalization and detection. Iterative cancellation by decision
feedback, oversampling with resampling, as well as interpolation with rate conversion are
possible solutions for compensating misalignment.
Equalization and detection are the signal processing operations for final data decision in the
holographic data storage reading procedure. Over the years, many approaches have been
proposed for this purpose. For one, linear minimum mean squared error (LMMSE)
equalization is a classical de-convolution method. Building on LMMSE equalization,
nonlinear minimum mean squared error equalization can
handle situations where model mismatch exists. On the other hand, maximum
likelihood page detection method can attain the best error performance theoretically.
However, it suffers from huge computational complexity. Consequently, there have been
several detection algorithms which are modifications of the maximum likelihood page
detection method, such as parallel decision feedback equalization (PDFE) and two-
dimensional maximum a posteriori (2D-MAP) detection.
Fig. 2. Holographic data storage system: (a) recording process and (b) retrieving process
(Chen et al., 2008).
With the spatial light modulator well controlled by the computer, information-carrying
object beams can be created by passing a laser light through or being reflected by the spatial
light modulator. Next, the object beam is interfered with a reference beam, producing an
interference pattern, namely, a hologram, which then leads to a chemical and/or physical
change in the storage medium. By altering one or more characteristics of the reference beam,
e.g., angle, wavelength or phase, multiple data pages can be superimposed at
the same location. The process is called multiplexing. There are many multiplexing methods;
interested readers can refer to (Coufal et al., 2000) for more details.
Reference beams of data pages at a certain location in the recording medium are like
addresses of items in a table. In other words, data pages can be distinguished by the
reference beam which interfered with the corresponding object beam. Therefore, by
illuminating the recording medium with the reference beam from a proper incident angle,
with a proper wavelength and phase, a certain object beam can be retrieved and then
captured by the detector array at the receiving end. The stored information is processed by
transforming captured optical signals to electrical signals. The detector array is usually a
charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS)
image sensor. Minimum crosstalk from other pages when retrieving individual pages is
attributed to a physical property called Bragg effect. Fig. 2 shows an example of the
holographic data storage systems and Fig. 3 illustrates more details in the recording and
reading processes.
3. Channel Defects
Ideally, a pixel of the spatial light modulator is directly imaged onto a detector pixel in a
pixel-matched holographic data storage system. Such perfect correspondence is practically
difficult to maintain due to non-ideal effects in the holographic data storage channel that distort
the light signal and deteriorate signal fidelity. These non-ideal effects are called channel defects.
Like most communication systems, optical storage systems have channel defects. Boosted
storage density makes holographic data storage signals especially sensitive to interferences,
noise or any tiny errors in storage media and optical/mechanical fabrication. From the
spatial light modulator to the detector, from optical to mechanical, quite a few factors have
significant influence on retrieved signals.
Laser light intensity determines the signal-to-noise ratio (SNR) but the intensity may not be
uniform across the whole page. Usually, corners of data pages are darker due to the
Gaussian wave front of the light source. This phenomenon and variations of light intensity
over a page are both categorized as the non-uniformity effect of light intensity. Non-uniformity
of the retrieved signal level may cause burst errors at certain parts of data pages. Another
factor is the contrast ratio, which is defined as the intensity ratio between an ON pixel and
an OFF pixel. Ideally the contrast ratio is infinite. However, the OFF pixels actually have
non-zero intensity in the spatial light modulator plane. A finite contrast ratio between the
ON and OFF pixels makes the detection of pixel polarity more cumbersome. Another non-
ideal effect is related to fill factors of spatial light modulator pixels and detector pixels. A fill
factor is defined as the ratio of active area to the total pixel area, which is less than unity in
real implementation.
Crosstalk is actually the most commonly discussed channel defect. We discuss two kinds of
crosstalk here, including inter-pixel interference and inter-page interference. Inter-pixel
interference is the crosstalk among pixels on the same data page. It results from any one or a
combination of the following: band-limiting optical apertures, diffraction, defocus and other
optical aberrations. Inter-page interference is caused by energy leakage from other data
pages. This can be caused by inaccuracy of the reference beam when a certain data page is
being retrieved. As more pages are stored in the medium, inter-page interference becomes a
higher priority in holographic data storage retrieval. How well the interference can be
handled determines to a large extent the number of data pages that can be superimposed.
Mechanical inaccuracies bring about another type of channel defects. Errors in the optical
subsystem, mechanical vibration and media deformation can lead to severe misalignment,
namely magnification, translation and rotation, between the recorded images and the retrieved
images. Even without any inter-pixel interference or other optical distortions, misalignments
still destroy the retrieval results entirely, especially in pixel-matched systems. Fig. 4 shows
an example of misalignments (γx, γy, σx, σy, θ), where γx, γy are the magnification factors in
the X- and Y- directions, respectively; σx, σy are the translations in the range of ±0.5 pixel
along the X- and Y- directions, respectively; and the rotation angle, θ, is positive in the
counter-clockwise direction.
Fig. 4. Misalignment effects: magnification, translation and rotation of a detector page with
respect to a spatial light modulator page.
Optical and electrical noises are inevitable in holographic data storage systems. They
include crosstalk noise, scattering noise, shot noise and thermal noise. Optical noise occurs
during the read-out process when the storage medium is illuminated by coherent reference
beams, resulting from optical scatter, laser speckle, etc. Electrical noise, such as shot noise
and thermal noise, occurs when optical signals are captured by detector arrays and
converted into electrical signals.
4. Channel Model
A holographic data storage channel model proposed in (Chen et al., 2008), shown in Fig. 5,
includes several key defects mentioned in Section 3. Multiple holographic data storage
channel impairments including misalignment, inter-pixel interference, fill factors of spatial
light modulator and CCD pixels, finite contrast ratio, oversampling ratio and noises are
modeled. The input binary data sequence, A(i, j), takes on values in the set {1, 1/ε}, where ε
is a finite value called the amplitude contrast ratio. The spatial light modulator has a pixel
shape function p(x, y) given by
$$p(x, y) = \Pi\!\left(\frac{x}{f\!f_{SLM}\,\Delta},\ \frac{y}{f\!f_{SLM}\,\Delta}\right) \qquad (1)$$
where ffSLM represents the spatial light modulator’s linear fill factor, the symbol Δ represents
the pixel pitch and Π(.) is the unit rectangular function. Another factor that contributes to
inter-pixel interference is the point spread function, a low-pass spatial behavior with impulse
response hA(x, y) resulted from the limited aperture of the optics subsystem. This point
spread function is expressed as
$$h_A(x, y) = h_A(x)\, h_A(y), \qquad (2)$$

where

$$h_A(x) = \frac{D}{\lambda f_L}\,\mathrm{sinc}\!\left(\frac{x D}{\lambda f_L}\right). \qquad (3)$$
Note that D is the width of the square aperture, λ is the wavelength of the incident light, and
fL represents the focal length. The corresponding frequency-domain transfer function HA(fx,
fy) in this case is the ideal 2-D rectangular low-pass filter with a cut-off frequency equal to
D/2λfL.
A CCD/CMOS image sensor is inherently a square-law integration device that detects the
intensity of the incident light. The image sensor transforms the incoming signals from the
continuous spatial domain to the discrete spatial domain. Quantization in space causes
several errors due to offsets in sampling frequency, location, and orientation. Magnification,
translation and rotation are modeled as (γx, γy, σx, σy, θ), as explained previously. In addition,
the oversampling ratio M, the pixel ratio between spatial light modulator and CCD/CMOS
sensor, is another factor to be considered.
Taking all the aforementioned effects into account, we have the final image sensor output at
the (k, l)th pixel position given by

$$C_m(k, l) = \int_X \int_Y \left|\, \sum_a \sum_b A(i+a,\, j+b)\, h(x - a,\, y - b) + n_o(x', y') \right|^2 dy'\, dx' + n_e(k, l) \qquad (4)$$
where

$$X = \left[\frac{k}{M\gamma_x} + \sigma_x - \frac{f\!f_{CCD}}{2},\ \frac{k}{M\gamma_x} + \sigma_x + \frac{f\!f_{CCD}}{2}\right], \qquad Y = \left[\frac{l}{M\gamma_y} + \sigma_y - \frac{f\!f_{CCD}}{2},\ \frac{l}{M\gamma_y} + \sigma_y + \frac{f\!f_{CCD}}{2}\right] \qquad (5)$$
and

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta. \qquad (6)$$
The subscript m in Cm(k, l) indicates misalignment; h(x, y) in Eq. (4) is also known as the pixel
spread function, $h(x, y) = p(x, y) * h_A(x, y)$, where $*$ denotes 2-D convolution; ffCCD represents
the CCD image sensor's linear fill factor; no(x′, y′) and ne(k, l) represent the optical noise and the electrical noise
associated with the (k, l)th pixel, respectively. Translation and magnification effects are
represented by varying the range of integration square as in Eq. (5), while the rotational
effect is represented by transformation from the x-y coordinates to the x’-y’ coordinates as
in Eq. (6).
The probability density function of optical noise no(x‘, y‘) can be described as a circular
Gaussian distribution and its intensity distribution has Rician statistics. On the other hand,
the electrical noise ne(k, l) is normally modeled as an additive white Gaussian noise with
zero mean (Gu et al., 1996).
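The structure of Eq. (4) can be conveyed by a much-simplified discrete sketch: pixel-matched, no misalignment, unit fill factors, and real-valued optical noise only. This is a rough illustration, not the full model of (Chen et al., 2008), and all parameter values are arbitrary:

import numpy as np
from scipy.signal import convolve2d

def channel_output(page, psf, contrast=10.0, sigma_o=0.02, sigma_e=0.01, seed=0):
    """Binary page -> amplitudes {1, 1/contrast} -> blur -> |.|^2 -> + noise."""
    rng = np.random.default_rng(seed)
    amp = np.where(page == 1, 1.0, 1.0 / contrast)      # finite contrast ratio
    field = convolve2d(amp, psf, mode="same")           # inter-pixel interference
    field = field + rng.normal(0.0, sigma_o, page.shape)  # optical noise (real part only)
    return field**2 + rng.normal(0.0, sigma_e, page.shape)  # square-law + electrical noise

h = np.array([0.1, 0.8, 0.1])
psf = np.outer(h, h)            # separable 3x3 pixel spread function
page = np.random.default_rng(1).integers(0, 2, size=(8, 8))
received = channel_output(page, psf)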
5. Misalignment Compensation
Misalignments in retrieved images need to be detected and compensated. To avoid
information loss in the case that the detector array receives only part of a data page, a larger
detector array or redundant (guard) pixels surrounding the information-carrying spatial
light modulator pages can be employed. Moreover, a misalignment estimation procedure
based on training pixels is needed before actual compensation. In fact, the estimation needs
to be performed locally due to non-uniformity of channel effects. Toward this end, a page is
divided into blocks of a proper size and training pixels are deployed in every block for local
misalignment estimation. One possible estimation method is correlation based. In this
method, the correlations of the received pixels and the known training pixels with different
values of magnification, translation and rotation are first computed. The parameter setting
with the maximum correlation is regarded as the estimation outcome. With the
misalignments estimated, the retrieved images need to be compensated before the modulation
decoder decides the stored information. In the following, several compensation schemes will
be introduced.
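One way to realize the correlation-based search is sketched below for translation only; magnification and rotation would add further dimensions to the parameter grid, and the grid spacing here is arbitrary:

import numpy as np
from scipy.ndimage import shift as subpixel_shift

def estimate_translation(received, training, grid=np.arange(-0.5, 0.55, 0.1)):
    """Return the (sigma_x, sigma_y) pair whose shifted training page
    correlates best with the received training pixels."""
    best, best_corr = (0.0, 0.0), -np.inf
    for sx in grid:
        for sy in grid:
            candidate = subpixel_shift(training, (sy, sx), order=1)  # bilinear
            corr = float(np.sum(received * candidate))
            if corr > best_corr:
                best, best_corr = (sx, sy), corr
    return best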
Fig. 6. Illustration of one-dimensional translation effect with parameter σ (Menetrier & Burr,
2003).
For a purely horizontal translation σ, as in Fig. 6, the (k, l)th detector output can be written as

$$C_m(k, l) = \int_{-f\!f_{CCD}/2}^{f\!f_{CCD}/2} \int_{-f\!f_{CCD}/2}^{f\!f_{CCD}/2} \left|\, A(k, l)\, h(x - \sigma,\, y) + A(k-1, l)\, h(x - \sigma + 1,\, y) \right|^2 dx\, dy. \qquad (7)$$

Rewriting Eq. (7), we have

$$C_m(k, l) = A(k, l)^2 H_{00} + 2\, A(k, l)\, A(k-1, l)\, H_{01} + A(k-1, l)^2 H_{11}. \qquad (8)$$
It is clear that if Cm(k, l) and A(k−1, l) are known, A(k, l) can be calculated according to Eq. (8).
The decision feedback detection scheme is based on this observation. If σ is positive in the
horizontal direction, the scheme starts from the pixel at the top-left corner, Cm(0, 0): A(0, 0)
(the corner pixel) is first detected assuming that A(−1, 0) is zero. With A(0, 0) decided,
decision feedback detection moves on to detect the next pixel, A(1, 0), and repeats the same
process until all pixels are detected. When the translation is only in the horizontal
dimension, all rows can be processed simultaneously.
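A sketch of this one-dimensional decision feedback pass, hypothesis-testing each pixel against Eq. (8); it assumes binary amplitudes {0, 1} and that H00, H01 and H11 have been precomputed from the pixel spread function:

def detect_row(Cm_row, H00, H01, H11):
    """Left-to-right decision feedback detection of one row (Eq. (8));
    the pixel left of the page, A(-1), is taken as zero."""
    prev, decisions = 0.0, []
    for c in Cm_row:
        # hypothesis-test A(k) = 0 versus A(k) = 1 against the model of Eq. (8)
        err = [abs(c - (a * a * H00 + 2 * a * prev * H01 + prev * prev * H11))
               for a in (0.0, 1.0)]
        a_hat = 1.0 if err[1] < err[0] else 0.0
        decisions.append(int(a_hat))
        prev = a_hat
    return decisions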
Extending the above case to 2-D, the retrieved pixel is a function of four spatial light
modulator pixels
$$C_{m,s} = A_s^2 H_{ss} + A_h^2 H_{hh} + A_v^2 H_{vv} + A_d^2 H_{dd} + 2 A_s A_h H_{sh} + 2 A_s A_v H_{sv} + 2 A_s A_d H_{sd} + 2 A_h A_v H_{hv} + 2 A_h A_d H_{hd} + 2 A_v A_d H_{vd}, \qquad (9)$$
where subscript s is for self, h for horizontal, v for vertical and d for diagonal. With the same
principle used in the one-dimensional decision feedback detection, one must detect three
pixels, horizontal, vertical and diagonal, before detecting the intended pixel. If both σx and σy
are positive, again we start from the top-left pixel, calculating A(0, 0) assuming that pixels A(0,
-1), A(-1, -1), and A(-1, 0) are all zero. The process is repeated row by row until all pixels are
detected.
A similar detection scheme for images with rotational misalignment is proposed in
(Srinivasa & McLaughlin, 2005). The process is somewhat more complicated because a
pixel’s relationship with associated SLM pixels depends on its location. For example, if the
detector array has rotational misalignment of an angle in the clockwise direction as shown
in Fig. 7, a pixel in the top-left portion is a function of A(k, l), A(k, l-1), A(k+1, l-1) and A(k+1,
l), while a pixel in the bottom-right corner is a function of A(k, l), A(k-1, l), A(k-1, l+1) and A(k,
l+1). Therefore, iterative decision feedback detection has to be performed differently in different portions of the page.
Fig. 7. Rotational misalignment entails different scan orders for the decision feedback
cancellation detection in different portions of a page (Srinivasa & McLaughlin, 2005).
Fig. 8. Realignment by using a bilinear interpolator with local fractional displacement μx and
μ y.
The realigned pixels Z_m are computed with interpolation weights ν_x(p) and ν_y(p) that depend on the magnification factors γ_x and γ_y as well as on the oversampling ratio M.
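As a sketch of the idea in Fig. 8, the following Python fragment realigns an oversampled block by bilinear interpolation; for simplicity it assumes a single fractional displacement (μx, μy) for the whole block, whereas in practice the displacement is estimated locally per block:

```python
import numpy as np

def realign_bilinear(Z, mu_x, mu_y):
    """Bilinear interpolation of the oversampled image Z at fractional
    displacement (mu_x, mu_y), with 0 <= mu_x, mu_y < 1.
    The output is one row and one column smaller than the input."""
    top = (1 - mu_x) * Z[:-1, :-1] + mu_x * Z[:-1, 1:]   # blend along x
    bottom = (1 - mu_x) * Z[1:, :-1] + mu_x * Z[1:, 1:]
    return (1 - mu_y) * top + mu_y * bottom              # blend along y
```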
The realignment interpolator and the rate-conversion filter can be combined to reduce
complexity. First, both 2-D filters are separated into two respective one-dimensional
operations. Second, the realignment interpolators and the rate-conversion filters are
integrated to construct a misalignment-compensation block that consists of one-dimensional
compensation in the horizontal direction and one-dimensional compensation in the vertical
direction. With this rearrangement, an 84% reduction in additions and a 74% reduction in multiplications is achieved (Chen et al., 2008).
The linear minimum mean-squared-error (LMMSE) equalizer is designed based on the MMSE criterion. According to the orthogonality principle (Haykin, 2002), we can make use of the auto-correlation of the received pixels and the cross-correlation between the received and the desired pixels to solve for the optimal coefficients through the following equations:
R_{AZ}(p, q) = \sum_{m=-K}^{K} \sum_{n=-K}^{K} w(m, n)\, R_{ZZ}(p - m, q - n) = w(p, q) \otimes R_{ZZ}(p, q) ,   (18)

where ⊗ denotes convolution and

R_{AZ}(p, q) = E[ A(i, j)\, Z(i + p, j + q) ] ,
R_{ZZ}(p, q) = E[ Z(i, j)\, Z(i + p, j + q) ] .   (19)
(Keskinoz & Vijaya Kumar, 1999) provided a simple way to calculate the equalizer coefficients by applying the Fourier transform to Eq. (18). The equalizer coefficients can then be obtained by

w(p, q) = IFFT[ FFT( R_{AZ}(p, q) ) / FFT( R_{ZZ}(p, q) ) ] .   (20)
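A compact sketch of Eq. (20) in Python; the correlation estimates R_AZ and R_ZZ are assumed to be given on a common 2-D support, and the small epsilon guarding against division by near-zero frequency bins is our own addition:

```python
import numpy as np

def lmmse_coefficients(R_AZ, R_ZZ, eps=1e-9):
    """Equalizer taps per Eq. (20): w = IFFT( FFT(R_AZ) / FFT(R_ZZ) )."""
    W = np.fft.fft2(R_AZ) / (np.fft.fft2(R_ZZ) + eps)  # spectral division
    w = np.real(np.fft.ifft2(W))                       # back to tap domain
    return np.fft.fftshift(w)  # center the (2K+1) x (2K+1) tap window
```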
Unfortunately, linear equalizers can suffer from model mismatch, which renders them ineffective, since the holographic data storage channel is inherently nonlinear. To address this, nonlinear equalization has also been proposed (Nabavi & Vijaya Kumar, 2006; He & Mathew, 2006).
Assume that inter-pixel interference is confined to a range of 3×3 pixels, that there are no misalignment effects, and that the retrieved images have been made pixel-matched. Parallel decision feedback equalization (PDFE) starts by computing a hard decision for each pixel. Then two hypotheses are tested to find the best decision for the current pixel. The process is shown in Fig. 9. With the eight surrounding pixels given by decisions from the previous iteration, the central (current) pixel is decided as "1" or "0" according to
Â(i, j) = 1   if   |Z(i, j) − H(1, n_{ij})|² ≤ |Z(i, j) − H(0, n_{ij})|² ,
Â(i, j) = 0   if   |Z(i, j) − H(1, n_{ij})|² > |Z(i, j) − H(0, n_{ij})|² ,   (21)
where H(A(i, j), nij) is the inter-pixel-interference-inflicted channel output with A(i, j)=0 or 1
and a neighborhood pattern expressed as a binary vector, nij, consisting of the eight binary
pixels.
The performance of parallel decision feedback equalization depends on the correctness of the channel estimation and on a good initial condition. With inaccurate channel information or too many errors in the initial condition, PDFE will have poor overall performance, as initial errors can propagate throughout the entire page.
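The following Python sketch implements one vectorized version of this procedure under the stated assumptions (3×3 inter-pixel interference, pixel-matched images); the blur kernel h and the crude initial decision rule are illustrative assumptions:

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative 3x3 inter-pixel-interference kernel (from channel estimation
# in a real system); the center tap is the "self" contribution.
h = np.array([[0.02, 0.10, 0.02],
              [0.10, 0.52, 0.10],
              [0.02, 0.10, 0.02]])

def pdfe(Z, h, n_iter=10):
    hc = h[1, 1]                                       # center (self) tap
    A = (Z > 0.5 * h.sum()).astype(float)              # initial hard decisions
    for _ in range(n_iter):
        nbr = convolve2d(A, h, mode="same") - hc * A   # neighbor interference
        e0 = (Z - nbr) ** 2                            # metric for hypothesis "0"
        e1 = (Z - nbr - hc) ** 2                       # metric for hypothesis "1"
        A = (e1 <= e0).astype(float)                   # Eq. (21) decision
    return A.astype(int)

# Example: a random page through the blur channel plus mild noise.
rng = np.random.default_rng(0)
page = rng.integers(0, 2, (64, 64)).astype(float)
Z = convolve2d(page, h, mode="same") + 0.02 * rng.standard_normal((64, 64))
print((pdfe(Z, h) == page).mean())                     # fraction of correct pixels
```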
6.3 2D-MAP
Two-dimensional maximum a posteriori (2D-MAP) detection, proposed in (Chen et al., 1998) as
the 2-D4 (Two-Dimensional Distributed Data Detection) algorithm, is actually the well-
known max-log-MAP algorithm. It is also a simplified sub-optimal maximum likelihood
page detection algorithm. Different from PDFE, the extrinsic information of each pixel is
now taken into consideration during the search for optimal decisions. Therefore, more than
two cases are tested in 2D-MAP. In this algorithm, a log-likelihood ratio (LLR) for each pixel
is used as extrinsic information and is maintained throughout the iterative process. An LLR
with higher absolute value indicates a greater probability of the pixel being "1" or "0." As the iterations go on, the LLR value at each pixel is re-calculated based on the knowledge of the previous LLR values of its eight neighbors. This process tends to drive every LLR away from the origin, so that all pixel decisions become more and more certain.
The procedure of the 2D-MAP detection comprises likelihood computation and update. In a
binary holographic data storage system, the likelihood computation formulas are given by
LL1U^{(k)}(i, j) = \min_{n_{ij} \in N_{ij}} \left\{ \frac{1}{2 N_0} \big| Z(i, j) - H(1, n_{ij}) \big|^2 - d_{ij}^{(k-1)} \cdot n_{ij} \right\} ,

LL0U^{(k)}(i, j) = \min_{n_{ij} \in N_{ij}} \left\{ \frac{1}{2 N_0} \big| Z(i, j) - H(0, n_{ij}) \big|^2 - d_{ij}^{(k-1)} \cdot n_{ij} \right\} .   (22)
Again H(A(i, j), nij) is the inter-pixel interference-inflicted channel output with A(i, j) = 0 or 1
and a neighborhood pattern expressed as a binary vector, nij, consisting of the eight binary
pixels; N_ij is the set of all possible neighborhood patterns. In addition, d_ij^{(k−1)} is a vector consisting of the corresponding LLR values of the neighboring pixels in the (k−1)-th iteration, and the symbol '·' represents the inner product of two vectors. In the above, all misalignment effects and oversampling are assumed to have been properly handled, so that the only remaining channel effects are inter-pixel interference and noise. LLRs are updated at the end of each iteration. To avoid
sudden changes in the LLR values, a forgetting factor β is applied and the updated LLR
takes the form of
L^{(k)}(i, j) = (1 − β) L^{(k−1)}(i, j) + β [ LL1U^{(k)}(i, j) − LL0U^{(k)}(i, j) ] .   (23)
The value of β affects the speed and accuracy of convergence: a larger β leads to faster convergence but may degrade the detection performance.
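A minimal sketch of one 2D-MAP iteration per Eqs. (22)-(23); the simple additive channel model H and the parameter values are illustrative assumptions, and border pixels are skipped for brevity:

```python
import numpy as np
from itertools import product

W_SELF, W_NBR = 1.0, 0.1  # assumed interference weights of the channel model

def H(a, nbr):
    """Toy inter-pixel-interference channel output for center bit a and the
    8-bit neighborhood pattern nbr."""
    return W_SELF * a + W_NBR * nbr.sum()

def map2d_iteration(Z, L_prev, N0=0.05, beta=0.5):
    """Recompute every pixel's LLR from Eq. (22) and damp it per Eq. (23)."""
    L_new = L_prev.copy()
    patterns = [np.array(p) for p in product((0, 1), repeat=8)]
    for i in range(1, Z.shape[0] - 1):
        for j in range(1, Z.shape[1] - 1):
            # Previous-iteration LLRs of the eight neighbors.
            d = np.array([L_prev[i-1, j-1], L_prev[i-1, j], L_prev[i-1, j+1],
                          L_prev[i,   j-1],                 L_prev[i,   j+1],
                          L_prev[i+1, j-1], L_prev[i+1, j], L_prev[i+1, j+1]])
            ll = [min((Z[i, j] - H(a, n)) ** 2 / (2 * N0) - d @ n
                      for n in patterns) for a in (0, 1)]
            # Damped update with forgetting factor beta, Eq. (23).
            L_new[i, j] = (1 - beta) * L_prev[i, j] + beta * (ll[1] - ll[0])
    return L_new
```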
With the help of soft information, 2D-MAP indeed achieves better performance than PDFE, but at a much higher complexity. In (Chen et al., 2008), several complexity-reduction schemes, including iteration, candidate, neighborhood and addition reduction, are proposed; up to 95% of the complexity is saved without compromising the detection performance.
7. Conclusion
This chapter gives an overview of the processing of retrieved signals in holographic data storage systems. The information to be stored is arranged in a 2-D format as binary or gray-level pixels and recorded as interference patterns called holograms. The fact that multiple holograms can be superimposed at the same location of the recording medium leads to volume storage that provides very high storage capacity.
Two important channel defects, misalignments and inter-pixel interference, are major causes of degradation in detection performance, and their models are formulated mathematically. Several misalignment compensation algorithms are introduced. One algorithm adopts decision feedback to handle misalignments and interference simultaneously: pixels are detected one by one after cancelling the interference from neighboring pixels, and the scan orders should be carefully designed when misalignments may involve pixels coming from other directions. Another algorithm makes use of oversampling and then resamples at the proper locations. A third algorithm combines interpolation and rate conversion to compensate various misalignment effects.
Equalization and detection are crucial steps in restoring the stored information from the interference-inflicted signal. Despite its popularity and low complexity, the LMMSE equalization algorithm suffers from the problem of model mismatch, as the holographic data storage channel is inherently nonlinear. In light of this fact, two nonlinear detection algorithms, both simplified versions of the optimal maximum likelihood page detection, are introduced. They achieve better performance than the LMMSE method at the cost of higher complexity.
8. References
Ayres, M.; Hoskins, A., & Curtis, K. (2006a). Processing data pixels in a holographic data
storage system. WIPO patent (Sep. 2006) WO/2006/093945 A2
Ayres, M.; Hoskins, A., & Curtis, K. (2006b). Image oversampling for page-oriented optical
data storage. Applied Optics, Vol. 45, No. 11, pp. 2459–2464, ISSN 0003-6935
Burr, G. W.; Coufal, H., Hoffnagle, J. A., Jefferson, C. M., & Neifeld, M. A. (1998). Gray-scale
data pages for digital holographic data storage. Optics Letters, Vol. 23, No. 15, pp.
1218–1220, ISSN 0146-9592
Chen, X.; Chugg, K. M. & Neifeld, M. A. (1998). Near-optimal parallel distributed data
detection for page-oriented optical memories. IEEE J. Sel. Top. Quantum Electron.,
Vol. 4, No. 5, pp. 866–879, ISSN 1077-260X
Chen, C.-Y.; Fu, C.-C. & Chiueh, T.-D. (2008). Low-complexity pixel detection for images
with misalignments and inter-pixel interference in holographic data storage.
Applied Optics, vol. 47, no. 36, pp. 6784–6795, ISSN 0003-6935
Choi, A.-S. & Baek, W.-S. (2003). Minimum mean-square error and blind equalization for
digital holographic data storage with intersymbol interference. Jpn. J. Appl. Phys.,
Vol. 42, No. 10, pp. 6424–6427, ISSN 0021-4922
Chugg, K. M.; Chen, X., Neifeld, M. A. (1999). Two-dimensional equalization in coherent
and incoherent page-oriented optical memory. J. Opt. Soc. Am. A, Vol. 16, No. 3, pp.
549–562, ISSN 1084-7529
Coufal, H. J.; Psaltis, D. & Sincerbox, G. T. (Eds.). (2000). Holographic data storage, Springer-
Verlag, ISBN 3-540-66691-5, New York
Das, B.; Joseph, J., Singh, K. (2009). Phase modulated gray-scale data pages for digital
holographic data storage. Optics Communications, Vol. 282, No. 11, pp. 2147–2154,
ISSN 0030-4018
Gu, C.; Dai, F. & Hong, J. (1996). Statistics of both optical and electrical noise in digital volume holographic data storage. Electronics Letters, Vol. 32, No. 15, pp. 1400–1402,
ISSN 0013-5194
Haykin, S. (2002). Adaptive filter theory, Prentice Hall, 4th edition, ISBN 0130901261
He, A. & Mathew, G. (2006). Nonlinear equalization for holographic data storage systems.
Applied Optics, Vol. 45, No. 12, pp. 2731–2741, ISSN 0003-6935
InPhase website: http://www.inphase-technologies.com
Keskinoz, M. & Vijaya Kumar, B. V. K. (1999). Application of linear minimum mean-
squared-error equalization for volume holographic data storage. Applied Optics,
Vol. 38, No. 20, pp. 4387–4393, ISSN 0003-6935
Keskinoz, M. & Vijaya Kumar, B. V. K. (2004). Discrete magnitude-squared channel
modeling, equalization, and detection for volume holographic storage channels.
Applied Optics, Vol. 43, No. 6, pp. 1368–1378, ISSN 0003-6935
King, B. M. & Neifeld, M. A. (1998). Parallel detection algorithm for page-oriented optical
memories. Applied Optics, Vol. 37, No. 26, pp. 6275–6298, ISSN 0003-6935
Menetrier, L. & Burr, G. W. (2003). Density implications of shift compensation
postprocessing in holographic storage systems. Applied Optics, Vol. 42, No. 5, pp.
845–860, ISSN 0003-6935
Nabavi, S. & Vijaya Kumar, B. V. K. (2006). Application of linear and nonlinear equalization
methods for holographic data storage. Jpn. J. Appl. Phys., Vol. 45, No. 2B, pp. 1079–
1083, ISSN 0021-4922
Pharris, K. J. (2005). Methods and systems for holographic data recovery. U.S. Patent (Jan.
2005) 20050018263 A1
Singla, N. & O’Sullivan, J. A. (2004). Minimum mean squared error equalization using priors
for two-dimensional intersymbol interference. Proceedings of IEEE International
Symposium on Information Theory (ISIT), pp. 130, ISBN 0-7803-8280-3, Jun. 2004,
Chicago
Srinivasa, S. G. & McLaughlin, S. W. (2005). Signal recovery due to rotational pixel
misalignments. Proceedings of IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), Vol. 4, iv/121- iv/124, ISBN 0-7803-8874-7, Mar. 2005,
Philadelphia
Optical Data Storage in Photosensitive Glasses and Spin State Transition Compounds
1. Introduction
Up to now, the most common media for optical data storage have been discs (Blu-ray technology, for example). However, by definition, this technology is limited to two dimensions. The
necessity for increasing data storage capacity requires the use of three-dimensional (3D)
optically based systems. One of the methods for 3D optical data storage is based on volume
holography. The physical mechanism is photochromism, which is defined as a reversible
transformation of a single chemical species between two states that have different
absorption spectra and refractive indices. This allows for holographic multiplexing
recording and reading, such as wavelength (Rakuljic et al., 1992), angular (Mok, 1993), shift
(Psaltis et al., 1995) and phase encoding. Another promising 3D optical data storage system
is the bit-by-bit memory at the nanoscale (Li et al., 2007). It is based on the confinement of
multi-photon absorption to a very small volume because of its nonlinear dependence on
excitation intensity. This characteristic provides a means for activating chemical or physical processes with high spatial resolution in three dimensions. As a result there is less
cross talk between neighbouring data layers. Another advantage of multi-photon excitation
is the use of infrared (IR) illumination, which results in the reduction of scattering and
permits the recording of layers deep inside a thick material. Two-photon 3D bit
recording in photopolymerizable (Strickler & Webb, 1991), photobleaching (Pan et al., 1997;
Day & Gu, 1998) and void creation in transparent materials (Jiu et al., 2005; Squier & Muller,
1999) has been demonstrated with a femtosecond laser. Recording densities could reach
terabits per cubic centimeter. Nevertheless, these processes suffer from several drawbacks.
The index modulation associated with high bit density limits the real data storage volume
due to light scattering. The fluorescence can limit the data transfer rate and the lifetime of
the device.
Thanks to the variety of available compositions, their ease of implementation, stability and transparency, both organic and inorganic materials are convenient for 3D data storage. To be good candidates for 3D optical data storage, these materials must satisfy several requirements for storing and reading: resistance to ageing due to temperature and to repeated reading. Moreover, a high-speed response for a high data transfer rate, the absence of optical scattering for multilayer storage, erasability, and possibly grayscale recording to increase the data density would be decisive advantages.
Here, we present two particular media: a photosensitive glass (a zinc phosphate glass containing silver), in which the contrast mechanism is neither a change in refractive index nor a change in absorption but a change in the third-order susceptibility (χ(3)) induced by femtosecond laser irradiation (Canioni et al., 2008); and a spin state transition material.
2. Photosensitive glass
2.1. Data storage medium: Photosensitive zinc phosphate glass containing silver
Glasses with composition 40P2O5-4Ag2O-55ZnO-1Ga2O3 (mol%) were prepared for 3D data
storage using a standard melt quench technique. (NH4)2HPO4, ZnO, AgNO3 and Ga2O3 in
powder form were used as raw materials and placed with the appropriate amount in a
platinum crucible. The batch was heated at a rate of about 1 °C·min−1 up to 1000 °C. The melt was then kept at this temperature (1000 °C) for 24 to 48 hours. Following this step, the liquid was poured into a brass mold after a short temperature increase to 1100 °C in order to reach the appropriate viscosity. The glass samples obtained were annealed at 320 °C (55 °C below the glass transition temperature) for 3 hours, cut (0.5 to 1 mm thick) and optically polished. The glass possesses an absorption cut-off wavelength at
280 nm (due to the absorption band of silver ions around 260 nm) and emits fluorescence mainly around 380 nm when excited at 260 nm. This intrinsic fluorescence is due to isolated Ag+ ions in the glass (Belharouak et al., 1999).
The processed glass is highly photosensitive and was originally developed as a gamma
irradiation dosimeter (Schneckenburger et al., 1981; Dmitryuk et al., 1996). Following
exposure to gamma rays, the glass presents a broad UV absorption band and, when excited by UV radiation, emits a homogeneous fluorescence whose intensity is proportional to the irradiation dosage. This fluorescence is attributed to the presence of silver nanoclusters.
Fig. 1. (a) Differential absorbance spectrum between the irradiated and non-irradiated
regions; (b) Emission spectrum (excitation wavelength, 405 nm) of the irradiated region.
Experimental laser irradiation conditions: I = 6 TW·cm−2, N = 10⁶ pulses. Differential absorbance and emission spectra are assigned to laser-induced Ag clusters.
The optical properties of the silver nanoclusters are studied for different irradiation conditions by white-light, epifluorescence and third-harmonic generation (THG) microscopy. The glass is exposed to different irradiance levels (x axis) between 4 TW·cm−2 and 10 TW·cm−2 and different numbers of pulses (y axis) from 10² to 10⁶, as shown on the experimental map sketch in Fig. 2(a). The sample is manipulated through patterning of bits with a bit spacing of 20 µm using a precision xyz stage. Epi-white-light and epifluorescence microscopy are performed with commercial microscopes. A transmission confocal setup, with a 36×, 0.52-NA reflective objective, is used for THG data collection. The THG signal is filtered from the fundamental one by an emission bandpass filter at (350 ± 50) nm and is collected with a photomultiplier tube. In our case, THG is excited with the same laser at a low energy of 10 nJ/pulse, but in practice a cheaper laser, such as a femtosecond fiber laser, could be used. Indeed, with a minimum energy of 0.1 nJ and an irradiance of 10¹⁰ W·cm−2 (corresponding to an average power of 10 mW at a 100 MHz repetition rate), more than one third-harmonic photon per incoming pulse can be detected (Brocas et al., 2004).
Fig. 2. Microscopy imaging of laser-induced species following the experimental map sketch. (a) x axis, laser irradiance; y axis, number of laser pulses; 65-bit pattern; spacing, 20 µm. (b) Epi-white-light microscopy image reveals linear refractive index modifications; vertical dashed line, damage threshold. (c) Epifluorescence microscopy image (excitation wavelength, 365 nm; emission filter, (610 ± 40) nm). (d) THG image (excitation wavelength, 1030 nm; emission filter, (350 ± 50) nm); encircled area, data storage irradiation conditions (I = 6 TW·cm−2, N = 10⁶).
Transmission, fluorescence, and THG readout images of bits recorded with different irradiance levels and numbers of laser shots are presented in Figs. 2(b)–2(d). Figure 2(b) shows changes in refractive index. The damage threshold is reached at an irradiance of 9 TW·cm−2, which is delimited by a vertical dashed line. At irradiances below this threshold, no apparent modifications are observed except for the highly accumulated area in the upper part of Fig. 2(b). Nevertheless, in Fig. 2(c), we observe that fluorescence is obtained in regions where the refractive index is not modified. The same behavior is observed in the THG image in Fig. 2(d). The THG image confirms that the induced Ag clusters absorb at 343 nm, corresponding to the third harmonic of the 1030 nm excitation.
Fig. 3. Writing and reading processes. (a) Data are stored inside the photosensitive glass by focusing an IR femtosecond laser; one bit corresponds to an aggregation of silver nanoclusters that presents no refractive index modification. (b) Using the same laser with less energy, the third-harmonic signal can be collected.
As explained before, a 3D bit pattern embedded in the photosensitive glass is written and read by a THG imaging setup. To achieve a high bit density in the volume, the change in refractive index must be kept as low as possible to minimize the scattering of the reading beam, while the change in χ(3) must be as high as possible. We choose to work below the damage threshold to minimize the refractive index modification and optimize the creation of aggregates by the accumulation effect (the corresponding area is encircled in Fig. 2(d)). The sample is irradiated with a laser irradiance of 6 TW·cm−2 and with 10⁶ pulses. Three layers of data are embedded 200 µm inside the sample. Each layer contains a pattern of 12×12 bits with a bit spacing of 3 µm. The letters U, B, and the numeral 1 (for University Bordeaux 1) are recorded in the first, second, and third layers, respectively, with a layer spacing of 10 µm in the z direction. As shown previously, the same laser is used for the reading procedure but with a lower irradiance. By scanning the sample in xyz through the focus, the three layers (U, B, and 1) are reconstructed and presented in Fig. 4.
Fig. 4. THG readout of the three layers containing the bit patterns U, B, and 1, recorded in the bulk of the glass (bit spacing, 3 µm; layer spacing, 10 µm). Laser writing parameters: I = 6 TW·cm−2, N = 10⁶. The three THG images present a high signal-to-noise ratio and no cross talk.
As expected from the THG mechanism, images with high contrast and no cross talk are observed in Fig. 4. The main advantages of this technique compared to usual 3D data storage are no photobleaching, no change in linear refractive index, and therefore no scattering. Moreover, the THG signal is coherent and gives rise to a rather intense, directional, and weakly divergent beam with a high signal-to-noise ratio. Due to the fast THG response, the reading speed is limited only by the pulse duration.
Fig. 5. THG signal versus number of laser pulses with I = 6 TW·cm−2. The information can be grayscale encoded.
Above a certain threshold value, the whole system commutes from the low-spin (LS) to the high-spin (HS) state. According to these experiments, one may wonder whether a single laser pulse may also induce the same phenomena in a cooperative iron(II) spin-crossover (SCO) material. To demonstrate this, we selected the [Fe(PM-BiA)2(NCS)2] (PM-BiA = N-2'-pyridylmethylene-4-aminobiphenyl) complex, which exhibits a well-defined abrupt hysteresis around 170 K (Létard et al., 1998; Létard et al., 2003). We have shown that, under certain conditions, a single laser pulse applied in the center of the thermal hysteresis loop leads to an LS → HS photo-conversion (Freysz et al., 2004). Compared to the experiments reported on valence tautomeric compounds, our results indicated that the final state reached after a laser excitation is neither a pure HS nor a pure LS state, but is instead a "mixture" of HS/LS domains. Since the system is first photo-excited into the HS state and then slowly relaxes to a mixture of HS/LS states, our results cannot be accounted for by the so-called domino effect.
The set-up used to perform these experiments is sketched in Figure 6a. The sample, a powder composed of micro-crystallites (a few microns in radius), is sandwiched between two optical windows and placed in a cryostat. The specular light reflected by the sample was collected and sent to a 150 mm spectrometer to select the wavelength centered at 600 nm. The resolution of the spectrometer was set to ~2 nm. At the exit of the spectrometer, the light was collected by a photomultiplier connected to a 1 MΩ load. The voltage drop across this load was recorded versus the temperature of the sample in the cryostat. As shown in Figure 6b, this reflection set-up makes it possible to record the thermal hysteresis loop of the studied sample. To first record the LS to HS state transition, the sample is illuminated with a white light continuum, and the light reflected by the sample and transmitted at 600 nm through the spectrometer is recorded by the photomultiplier. The typical evolution of the reflectivity as the temperature increases or decreases is presented in Figure 6b. These data are in very good agreement with measurements performed with a SQUID.
Fig. 6. (a) Sketch of the experimental set-up used to induce and measure the laser-induced spin state transition. (b) Evolution of the reflected light versus the temperature; the temperature is either decreased from 180 K to 160 K or increased from 160 K to 180 K. x_HS is the molar fraction of molecules in the HS state.
Using this set-up, we have studied the effect of pulsed light on the sample. In this latter case, a single laser pulse (Q-switched, frequency-doubled Nd3+:YAG laser, λ = 0.532 µm, pulse width 8 ns, energy between 0.5 and 10 mJ) was focused on a spot of about 3 mm
in diameter, and only a small fraction of the light reflected by the sample at 600 nm within the laser spot area was imaged and recorded by the spectrometer. As shown in Figure 7a, when the sample is cooled to 140 K, a temperature below the hysteresis loop, we clearly record a temporal evolution of the reflectivity of the sample. According to our calibration, our data clearly show that the whole sample volume probed by the reflected light is brought into the HS state. Moreover, within 0.5 second, it relaxes back to the LS state. By studying the evolution of the HS fraction within the laser spot versus the energy of the laser pulse, we note that the number of HS particles steadily increases with the energy of the laser pulse up to an energy of ~1 mJ. Above 1 mJ, which corresponds to a fluence of 14 mJ·cm−2 per pulse, all the probed molecules are photo-converted into the HS state and the signal saturates. This situation persists up to 9 mJ, above which a surface photo-degradation of the sample occurs. In conclusion, below the thermal hysteresis loop and between 78 and 140 K, the photo-induced HS state is not stable. Its lifetime decreases from hours (at 60 K) to minutes (at 78 K) and to seconds (at 140 K) (Degert et al., 2005).
Fig. 7. Temporal evolution of the HS fraction x_HS after single laser pulse excitation (panels (a) and (b); see text).
Let us now consider the influence of a 1 mJ laser pulse when the sample is set at a temperature within the thermal hysteresis loop. The sample was first set into the LS configuration by slowly cooling it and then carefully warming it to T1 = 166 K, i.e. at the beginning of the thermal hysteresis loop. The sample was then excited with a single laser pulse. As reported in Figure 7a, immediately after the single laser pulse, all the probed particles reach the HS state, but they relax back to the LS state. This clearly indicates, firstly, that the photo-induced excited HS state is not stable, even within the hysteresis loop, and secondly, that the LS → HS transition cannot be induced by the photo-excitation of HS molecules. This observation also considerably limits the contribution of a domino effect associated with the excitation of the molecular HS state under our experimental conditions.
We then warmed the sample to 170 K, i.e. the center of the thermal hysteresis loop, and again excited it with a single laser pulse. Similarly to what is reported in Figure 7b, immediately after the single laser pulse, all the probed particles reach the HS state. But this time, they relax neither toward the pure HS state nor toward the initial LS state. The final situation after relaxation can be regarded as a "mixture" of HS/LS particles. At first sight, this result is interesting with regard to the open problem of the thermal effect accompanying the optical excitation of the sample. Indeed, if an artificial and large temperature increase (ΔT > 10 K) associated with the absorption of the particles brought the sample into the HS state, it
should, at first glance, remain within this state. This is clearly not the reported experimental result. Moreover, we have also shown that additional laser pulses do not affect the measured HS/LS ratio (Freysz et al., 2004). We have also demonstrated the stability of the photo-induced mixture state within the hysteresis loop. When one steadily changes the temperature of the sample, it remains within this "mixture" state until it reaches the descending or ascending branch of the hysteresis loop, respectively labelled T1/2↓ and T1/2↑. For temperatures higher than T1/2↑, all the particles of the sample are converted into the HS state. This latter result demonstrates, if need be, that the sample is not damaged by the laser pulse. This point was confirmed by repeating the experiment many times on the same sample, which did not show any degradation.
Although these experiments were interesting in demonstrating optical data recording in
spin state compounds within the hysteresis loop, their practical use was quite limited.
3.2. Data recording within the hysteresis loop and at room temperature
One year after our work, it was shown that different compounds can be photo-switched at room temperature using a single Nd:YAG Q-switched laser pulse (Bonhommeau et al., 2005). However, the details of the mechanisms giving rise to the switching of the spin state compound within the thermal hysteresis were difficult to establish (Fouché et al., 2009). In order to investigate these mechanisms, we have carried out an optical study of the SCO complex [Fe(NH2trz)3](NO3)2·3H2O coordination polymer (NH2trz = 4-amino-1,2,4-triazole), known to display a well-defined thermal hysteresis loop at room temperature.
Fig. 8. [Fe(NH2trz)3](NO3)2·3H2O sample in the LS state (a), in the HS state (c), and after photo-excitation by a single laser pulse within the hysteresis loop of the compound (b).
As shown in Fig. 8, the colour of the powder, which is pink when the sample is in the LS state (Fig. 8a), becomes almost completely white when the sample is in the HS state (Fig. 8c). Indeed, in the LS state this material displays a broad absorption band around 520 nm, characteristic of the d–d transition of Fe2+ (¹A₁g → ¹T₁g), while in the HS state a d–d transition is recorded at lower energy in the near-infrared region (830 nm). Figure 8b shows the state of the sample impacted by a sequence of pulses from a frequency-tripled Nd:YAG laser delivering pulses at 355 nm with a duration of ~6 ns and a fluence on the sample of Ep ~ 52 mJ·cm−2. The same experiment has also been performed using pulses with the same fluence but centred at 532 nm. In both cases, we noticed that the central part of the sample impacted by the laser pulse becomes white. The sample remains in this state as long as its temperature is kept within the thermal hysteresis loop. The whole sample recovers its original pink colour when its temperature drops below 10 °C. This clearly indicates that, within the hysteresis loop, the micro-crystallites of the compound impacted by the laser pulses are brought into the HS state.
Fig. 9. (I) and (VII): colour of the sample in the LS state. Evolution of the sample set in the thermal hysteresis loop and impacted by (II) one pulse, (III) two pulses, (IV) three pulses, (V) four pulses, (VI) five pulses. (VIII) Colour of the sample impacted by five laser pulses and cooled back into the LS state.
We have also recorded the evolution of the sample versus the laser excitation. Firstly, we noticed that below a certain laser fluence, the sample remains unchanged. Above this threshold fluence, the final state of the compound depends on both the laser fluence and the number of laser pulses used to excite the sample. As shown in Fig. 9, the central part of the sample illuminated by the laser beam, initially in the LS state (I), becomes whiter and whiter as the number of excitation pulses increases (Fig. 9 II to VI). The pictures in Figs. 9 VII and VIII underline that, for temperatures above or below the hysteresis loop, the sample recovers its HS or LS state, respectively. This indicates that the sample is not altered by the laser excitation. This experiment also indicates that grayscale encoding is possible in these materials.
Fig. 10. Solid line: thermal hysteresis loop deduced from the optical reflectivity. Symbols: irradiation studies performed at 328 K, 333 K and 338 K with one, two, three, four and five pulses at 355 nm (in blue) and 532 nm (in green) with a fluence of 52 mJ·cm−2. The inset shows the absorption of the powder in the HS and LS states.
To be more quantitative about the final state reached by the sample after each laser pulse, we replaced the monochromator by a spectrometer and recorded the spectrum of the light reflected by the sample after each laser pulse. This makes it possible to determine the state of the sample within the hysteresis loop (Fig. 10). These data clearly indicate that one can easily perform data recording in this compound. However, they also stress that data recording is easier when the absorption coefficient of the sample is higher. This also clearly underlines that the laser-induced heating of the sample is of central importance. To elucidate the mechanism responsible for data recording in these materials, we have performed a nanosecond time-resolved reflectivity measurement, described below.
Fig. 11. Sketch of the time-resolved reflectivity set-up: Nd:YAG pump laser, OPO probe, synchronization electronics triggering the boxcar detection, photodiodes (P.D.1, P.D.2) and thermally regulated oven.
The two laser systems are synchronized by an electronic device. This device, which also delivers the signal that triggers the boxcar gates, makes it possible to temporally delay, from a few nanoseconds up to a few seconds, the light pulses delivered by these two systems. To limit the impact of OPO fluctuations, we measure the energy of each OPO pulse and keep only the data corresponding to an OPO pulse energy within 5% of the mean OPO energy value. The signal-to-noise ratio of our data is further improved by averaging the data over ten laser shots. Finally, to measure the actual reflectivity change of the sample, we record for each pump-probe time delay the relative reflectivity of the sample with and without the pump pulse. In the actual set-up, the sample, a powder composed of micro-crystallites a few microns in radius, is sandwiched between two optical windows and placed in a thermally regulated oven. This set-up has mainly been used when the temperature of the sample is set slightly below the hysteresis loop. In such a case the sample recovers its initial LS state after each pump pulse. Therefore, under these experimental conditions, one records both the formation and the relaxation of the HS fraction induced by the pump pulse.
Fig. 13. Time dependence of the reflectivity change at 283 K after photo-excitation with a pump pulse of fluence Ep ~ 9 mJ·cm−2.
To grasp the complexity of the dynamics, we have chosen to present our data on a logarithmic time scale. On this scale, one clearly sees that the growth and the relaxation of the reflectivity are governed by different characteristic times. Figure 13 also indicates that, to register the complete evolution of the system after the pulse excitation, the reflectivity has to be recorded over at least five decades of time. This stresses the advantage of our experimental set-up, in which the sampling steps can be adjusted continuously along the experiment.
Fig. 14. Mechanisms that account for the laser-pulse-induced SCO in the vicinity of the thermal hysteresis loop.
As depicted in Fig. 14, this experiment also makes it possible to follow the temporal evolution of the sample within the (HS concentration, temperature), i.e. (C,T), phase diagram of the compound outside the hysteresis loop (Fouché et al., to be published). Upon laser excitation, absorption of the pump energy takes place in a very thin layer Lp (~400 nm). As a result, the temperature of this layer increases drastically, reaching its maximum about 40 ns after the laser excitation, and sets the sample layer at point B of the (C,T) diagram. The heat subsequently diffuses toward the bulk of the sample. Hence, the temperature of the layer steadily decreases, so that the sample layer moves within the phase diagram. The evolution of the HS fraction is less abrupt. During the heating of the layer, it
remains almost constant and then starts to increase. The growth lasts as long as the temperature of the layer has not reached the ascending branch of the hysteresis loop (point C of the phase diagram). Then, as long as the temperature of the layer remains within the thermal hysteresis loop (i.e. between points C and D of the phase diagram), the HS fraction remains constant. It decreases as soon as the temperature of the layer reaches the descending branch of the hysteresis loop (point D of the phase diagram).
3.2.4. Mechanisms responsible for data recording within the hysteresis loop
The results recorded when the temperature of the sample was set below the hysteresis loop, presented in the previous section, make it possible to understand the evolution of the sample when its temperature is set within the hysteresis loop, as depicted in Fig. 9. Figure 15 shows the evolution of the sample after each laser pulse.
Fig. 15. Path in the phase diagram followed by the sample after each laser shot, and evolution of the temperature (grey line) and the HS fraction (black line) of the sample after each laser shot.
At first, just after the laser excitation, the absorption of the pulse by a thin layer of the sample induces local heating. Within a hundred nanoseconds, the absorbed energy heats the system; its temperature rises very quickly to a temperature where the LS state is unstable. The higher the absorption coefficient, the higher the temperature increase. Thus the sample is thermally quenched (A → B), whereas only a very small HS fraction appears. Then, the sample cools and an increase of the HS fraction takes place (B → C). As shown in Fig. 14, this process lasts about 100 µs. Once the ascending branch of the hysteresis loop is reached, the HS fraction no longer evolves and the created HS fraction coexists with the LS fraction. Finally, the powder reaches the temperature of the oven (C → D). The whole
process takes about 1 ms. The sample has then reached a state where its absorption coefficient is lower. Indeed, as shown in the inset of Fig. 10, the absorption of the sample in the HS state is reduced at both 532 nm and 355 nm. Thus, for the subsequent laser pulses, the laser-induced heating becomes smaller and smaller, the growth lasts less and less time, and the increase of the HS fraction becomes smaller and smaller. After a few pulses, the sample has reached a state located within the thermal hysteresis loop (Fig. 14). In this case, one can further improve the LS to HS conversion efficiency by increasing the laser pulse energy. Based on this scenario, the higher efficiency recorded at 355 nm is directly accounted for by the higher absorption of the sample in the blue spectral range (inset of Fig. 10), leading to greater heating of the sample.
4. References
Barille, R.; Canioni, L.; Sarger, L. & Rivoire, G. (2002). Nonlinearity measurements of thin
films by third-harmonic-generation microscopy, Phys. Rev. E, 66, 067602
Belharouak, I.; Parent, C.; Tanguy, B.; Le Flem, G. & Couzy, M. (1999). Silver aggregates in
photoluminescent phosphate glasses of the ‘Ag2O-ZnO-P2O5’ system, J. Non-Cryst.
Solids., 244, 238-249
Bellec, M; Royon, A.; Bousquet, B.; Bourhis, K.; Treguer, M.; Cardinal, T.; Richardson, M. &
Canioni, L. (2009), Beat the diffraction limit in 3D direct laser writing in
photosensitive glass, Opt. Exp., 17, 10304-10318
Bousseksou, A; Negre, N.; Goiran, M.; Salmon, L.; Tuchagues, J.P.; Boillot, M.L.;
Boukheddaden, K. & Varret, J. F. (2000), Dynamic triggering of a spin-transition by
a pulsed magnetic field, Eur. Phys. J. B, 13, 451
Bonhommeau, S.; Molnar, G.; Galet, A.; Zwick, A.; Real, J.A.; McGarvey, J.J. & Bousseksou, A. (2005), One-Shot-Laser-Pulse-Induced Reversible Spin Transition in the Spin Crossover Complex {Fe(C4H4N2)[Pt(CN)4]} at Room Temperature, Angew. Chem. Int. Ed., 44, 4069
Brocas, A.; Canioni, L. & Sarger, L. (2004). Efficient selection of focusing optics in non linear
microscopy design through THG analysis, Opt. Exp., 12, 2317-2322
Canioni, L.; Bellec, M; Royon, A.; Bousquet, B. & Cardinal, T. (2008). Three-dimensional
optical data storage using third-harmonic generation in silver zinc phosphate glass.
Opt. Lett., 33, 360-362
Chen, J. & Xie, S. (2002). Green's function formulation of third-harmonic generation
microscopy, J. Opt. Soc. Am. B, 19, 1604
Dai, Y.; Hu, X.; Wang, C.; Chen, D.; Jiang, X.; Zhu, C.; Yu, B. & Qiu, J. (2007). Fluorescent Ag
nanoclusters in glass induced by an infrared femtosecond laser, Chem. Phys. Lett.,
439, 81
Day, D. & Gu, M. (1998). Effects of Refractive-Index Mismatch on Three-Dimensional
Optical Data-Storage Density in a Two-Photon Bleaching Polymer, Appl. Opt., 37,
6299-6304
Decurtins, S.; Gütlich, P.; Köhler, C.P.; Spiering, H. & Hauser, A. (1984), Light-induced
excited spin state trapping in a transition-metal complex: The hexa-1-
propyltetrazole-iron (II) tetrafluoroborate spin-crossover system, Chem. Phys. Lett.,
105, 1-4.
Degert, J.; Lascoux, N.; Montant, S.; Létard, S.; Freysz, E.; Chastanet, G. & Létard, J.-F. (2005),
Complete temperature study of the relaxation from the high-spin state to low-spin
state in a strongly cooperative spin crossover compound, Chem. Phys. Lett. 415, 206-
210
Dmitryuk, A. V.; Paramzina, S. E.; Perminov, A. S.; Solov’eva, N. D. & Timofeev, N. T.
(1996). The influence of glass composition on the properties of silver-doped
radiophotoluminescent phosphate glasses, J. Non-Cryst. Solids, 202, 173-177
Fouché, O.; Degert, J.; Jonusauskas, G.; Baldé, C.; Desplanche, C.; Létard, J.-F. & Freysz, E.
(2009), Laser induced spin state transition: Spectral and temporal evolution, Chem.
Phys. Lett., 469, 274-278
Freysz, E.; Montant, S.; Létard, S. & Létard, J.-F. (2004), Single laser pulse induces spin state transition within the hysteresis loop of an iron compound, Chem. Phys. Lett., 394, 318-323
Jiu, H.; Tang, H.; Zhou, J.; Xu, J.; Zhang, Q.; Xing, H.; Huang, W. & Xia, A. (2005).
Sm(DBM)3Phen-doped poly(methylmethacrylate) for three-dimensional
multilayered optical memory, Opt. Lett., 30, 774-776
Gütlich, P.; Hauser, A. & Spiering, H. (1994), Thermal and Optical Switching of Iron(II)
Complexes, Angew. Chem. Int. Ed. Engl., 33, 2024-2054
Hauser, A. (1986), Reversibility of light-induced excited spin state trapping in the
Fe(ptz)6(BF4)2, and the Zn1−xFex(ptz)6(BF4)2 spin-crossover systems, Chem. Phys.
Lett. 124, 543-548
Jones, H. D. & Reiss, H. R. (1977). Intense-field effects in solids, Phys. Rev. B, 16, 2466-2473
Keldysh, L. V. (1965). Ionization in the field of a strong electromagnetic wave, Sov. Phys. JETP, 20, 1307-1314
Koshino, K. & Ogawa, T (1998), Domino effects in photoinduced structural change in one-
dimensional systems, J. Phys. Soc. Jap. 67, 2174
Létard, J.-F.; Guionneau, P.; Rabardel, L.; Howard, J. A. K.; Goeta, A. E.; Chasseau, D. &
Kahn, O. (1998), Structural, Magnetic, and Photomagnetic Studies of a Mononuclear
Iron(II) Derivative Exhibiting an Exceptionally Abrupt Spin Transition. Light-
Induced Thermal Hysteresis Phenomenon, Inorg. Chem, 37, 4432-4441
Létard, J.-F.; Chastanet, G.; Nguyen, O.; Marcen, S.; Marchivie, M.; Guionneau, P.; Chasseau,
D.; Gütlich, P. (2003), Spin Crossover Properties of the [Fe (PM-BiA) 2 (NCS) 2]
Complex-Phases I and II, Monatshefte für Chemie 134, 165
Li, X.; Bullen, C.; Chon, J. W. M.; Evans, R. A. & Gu, M. (2007). Two-photon-induced three-
dimensional optical data storage in CdS quantum-dot doped photopolymer, Appl.
Phys. Lett., 90, 161116
Liu, H.W.; Matsuda, K.; Gu, Z.Z.; Takahashi, K.; Cui, A.L.; Nakajima, R.; Fujishima, A. &
Sato, O. (2003), Reversible Valence Tautomerism Induced by a Single-Shot Laser
Pulse in a Cobalt-Iron Prussian Blue Analog, Phys. Rev. Lett. 90, 167403
Mok, F. K. (1993). Angle-multiplexed storage of 5000 holograms in lithium niobate, Opt.
Lett., 18, 915-917
Pan, S.; Shih, A.; Liou, W.; Park, M.; Bhawalkar, J.; Swiatkiewicz, J.; Samarabandu, J.; Prasad,
P. N. & Cheng, P. C. (1997). Scanning 19, 156
Psaltis, D.; Levene, M.; Pu, A. & Barbastathis, G. (1995). Holographic storage using shift
multiplexing, Opt. Lett., 20, 782-784
Rakuljic, G. A.; Leyva, V. & Yariv, A. (1992). Optical data storage by using orthogonal
wavelength-multiplexed volume holograms, Optics Letters, 17, 1471-1473
Renz, F.; Spiering, H.; Goodwin, H.A. & Gütlich, P (2000), Light-perturbed hysteresis in an
iron (II) spin-crossover compound observed by the Mössbauer effect, Hyperfine
Interactions 126, 155
Schneckenburger, H.; Regulla, D. F. & Unsöld, E. (1981). Time-resolved investigations of
radiophotoluminescence in metaphosphate glass dosimeters, Appl. Phys. A, 26, 23-
26
Shen, Y. R. (1984). The Principles of Nonlinear Optics, Wiley
Squier, J. & Muller, M. (1999). Third-harmonic generation imaging of laser-induced
breakdown in glass, Appl. Opt., 38, 5789-5794
Strickler, J. H. & Webb, W. W. (1991). Three-dimensional optical data storage in refractive
media by two-photon point excitation, Opt. Lett., 16, 1780-1782
Stuart, B. C.; Feit, M. D.; Herman, S.; Rubenchik, A. M.; Shore, B. W. & Perry, M. D. (1996).
Nanosecond-to-femtosecond laser-induced breakdown in dielectrics, Phys. Rev. B,
53, 1749-1761
Sun, H. B.; Tanaka, T.; Takada, K. & Kawata, S. (2001). Finer features for functional
microdevices, Nature, 412, 697-698
Shimamoto, N.; Ohkoshi, S.-S.; Sato, O. & Hashimoto, K. (2002), One-Shot-Laser-Pulse-
Induced Cooperative Charge Transfer Accompanied by Spin Transition in a Co-Fe
Prussian Blue Analog at Room Temperature, Chem. Lett., 31, 486
Data Representation for Flash Memories
Jehoshua Bruck
Electrical Engineering Department
California Institute of Technology
Pasadena, CA 91125, U.S.A.
[email protected]
In this chapter, we introduce theories on data representation for flash memories. Flash mem-
ories are a milestone in the development of the data storage technology. The applications of
flash memories have expanded widely in recent years, and flash memories have become the
dominant member in the family of non-volatile memories. Compared to magnetic record-
ing and optical recording, flash memories are more suitable for many mobile-, embedded-
and mass-storage applications. The reasons include their high speed, physical robustness,
and easy integration with circuits.
The representation of data plays a key role in storage systems. Like magnetic recording and
optical recording, flash memories have their own distinct properties, including block erasure,
iterative cell programming, etc. These distinct properties introduce very interesting coding
problems that address many aspects of a successful storage system, which include efficient
data modification, error correction, and more. In this chapter, we first introduce the flash
memory model, then study some newly developed codes, including codes for rewriting data
and the rank modulation scheme. A main theme is understanding how to store information
in a medium that has asymmetric properties when it transits between different states.
A prominent property of flash memories is block erasure. In a flash memory, cells are organized
into blocks. A typical block contains about 10⁵ cells. While it is relatively easy to inject charge
into a cell, to remove charge from any cell, the whole block containing it must be erased to the
ground level (and then reprogrammed). This is called block erasure. The block erasure opera-
tion not only significantly reduces speed, but also reduces the lifetime of the flash memory [3].
This is because a block can only endure about 10⁴ ∼ 10⁶ erasures, after which the block may
break down. Since the breaking down of a single block can make the whole memory stop
working, it is important to balance the erasures performed to different blocks. This is called
wear leveling. A commonly used wear-leveling technique is to balance erasures by moving
data among the blocks, especially when the data are revised [10].
There are two main types of flash memories: NOR flash and NAND flash. A NOR flash
memory allows random access to its cells. A NAND flash partitions every block into multiple
sections called pages, and a page is the unit of a read or write operation. Compared to NOR
flash, NAND flash may be much more restrictive on how its pages can be programmed, such
as allowing a page to be programmed only a few times before erasure [10]. However, NAND
flash enjoys the advantage of higher cell density.
The programming of cells is a noisy process. When charge is injected into a cell, the actual
amount of injection is randomly distributed around the aimed value. An important thing to
avoid during programming is overshooting, because to lower a cell’s level, erasure is needed.
A commonly used approach to avoid overshooting is to program a cell using multiple rounds
of charge injection. In each round, a conservative amount of charge is injected into the cell.
Then the cell level is measured before the next round begins. With this approach, the charge
level can gradually approach the target value and the programming precision is improved.
The corresponding cost is the slowing down in the writing speed.
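A toy Python simulation of this multi-round programming strategy; the step fraction and the noise model are illustrative assumptions, chosen so that the level approaches the target monotonically without overshooting:

```python
import random

def program_cell(target, rounds=8, caution=0.7, noise=0.1, seed=0):
    """Simulate multi-round cell programming: each round injects a
    conservative fraction of the remaining gap, with multiplicative noise,
    so the charge level only rises and never exceeds the target."""
    rng = random.Random(seed)
    level = 0.0
    for _ in range(rounds):
        aim = caution * (target - level)            # conservative step
        level += aim * (1 + rng.uniform(-noise, noise))
        # (a real device re-measures 'level' here before the next round)
    return level

print(program_cell(1.0))   # converges toward 1.0 from below
```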
After cells are programmed, the data are not necessarily error-proof, because the cell levels
can be changed by various errors over time. Some important error sources include write dis-
turb and read disturb (disturbs caused by writing or reading), as well as leakage of charge from
the cells (called the retention problem) [3]. The changes in the cell levels often have an asym-
metric distribution in the up and the down directions, and the errors in different cells can be
correlated.
In summary, flash memory is a storage medium with asymmetric properties. It is easy to
increase a cell’s charge level (which we shall call cell level), but very costly to decrease it due to
block erasure. The NAND flash may have more restrictions on reading and writing compared
to NOR flash. The cell programming uses multiple rounds of charge injection to shift the
cell level monotonically up toward the target value, to avoid overshooting and improve the
precision. The cell levels can change over time due to various disturb mechanisms and the
retention problem, and the errors can be asymmetric or correlated.
Definition 1. WRITE ASYMMETRIC MEMORY (WAM) In a write asymmetric memory, there are
n cells. Every cell has q ≥ 2 levels: levels 0, 1, · · · , q − 1. The level of a cell can only increase, not
decrease.
The Write Asymmetric Memory models the monotonic change of flash memory cells before
the erasure operation. It is a special case of the generalized write-once memory (WOM) model,
which allows the state transitions of cells to be any acyclic directed graph [6, 8, 29].
Let us first look at an inspiring example. The code can write two bits twice in only three
single-level cells. It was proposed by Rivest and Shamir in their celebrated paper that started
the study of WOM codes [29].
We always assume that before data are written into the cells, the cells are at level 0.
Example 2. We store two bits in three single-level cells (i.e., n = 3 and q = 2). The code is shown in
Fig. 1. In the figure, the three numbers in a circle represent the three cell levels, and the two numbers
beside the circle represent the two bits. The arrows represent the transition of the cells. As we can see,
every cell level can only increase.
The code allows us to write the two bits at least twice. For example, if we want to write "10" and later rewrite it as "01", we first elevate the cell levels to "0,1,0" and then elevate them to "0,1,1".
Fig. 1. Code for writing two bits twice in three single-level cells.
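Since the text quotes only two of the transitions of Fig. 1 ("10" → (0,1,0), then "01" → (0,1,1)), the following Python sketch assumes first-write and second-write tables consistent with them; the assumed second-write codewords are the bitwise complements of the first-write ones:

```python
# Assumed encoding tables for the two-bits-twice WOM code of Fig. 1.
FIRST = {(0, 0): (0, 0, 0), (0, 1): (1, 0, 0),
         (1, 0): (0, 1, 0), (1, 1): (0, 0, 1)}
SECOND = {d: tuple(1 - x for x in c) for d, c in FIRST.items()}  # complements

def decode(cells):
    """Cell weight <= 1 means first generation; weight >= 2 means second."""
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(d for d, c in table.items() if c == cells)

def rewrite(cells, data):
    """Return a state >= cells (levels only increase) that stores data."""
    for table in (FIRST, SECOND):
        c = table[data]
        if all(a >= b for a, b in zip(c, cells)):
            return c
    raise ValueError("no third write without erasure")

s = rewrite((0, 0, 0), (1, 0))   # -> (0, 1, 0), as in the text
s = rewrite(s, (0, 1))           # -> (0, 1, 1), as in the text
print(decode(s))                 # (0, 1)
```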
In the above example, a rewrite can completely change the data. In practice, often multiple
data variables are stored by an application, and every rewrite changes only one of them. The
joint coding of these data variables is useful for increasing the number of rewrites supported by coding. The rewriting codes in this setting have been named Floating Codes [16].
Definition 3. FLOATING CODE A floating code stores k variables in n cells of q levels, where each variable takes its value in an alphabet {0, 1, · · · , ℓ − 1} of size ℓ and every rewrite changes the value of one variable. Given two memory states (c1, · · · , cn) and (c′1, · · · , c′n), we say (c1, · · · , cn) ≥ (c′1, · · · , c′n) if ci ≥ c′i for i = 1, · · · , n.
A floating code has a decoding function Fd and an update function Fu. The decoding function maps a memory state s ∈ {0, 1, · · · , q − 1}ⁿ to the stored data Fd(s) ∈ {0, 1, · · · , ℓ − 1}ᵏ. The update function (which represents a rewrite operation),

Fu : {0, 1, · · · , q − 1}ⁿ × {1, · · · , k} × {0, 1, · · · , ℓ − 1} → {0, 1, · · · , q − 1}ⁿ,

is defined as follows: if the current memory state is s and the rewrite changes the i-th variable to value j ∈ {0, 1, · · · , ℓ − 1}, then the rewrite operation will change the memory state to Fu(s, i, j) such that Fd(Fu(s, i, j)) is the data with the i-th variable changed to the value j. Naturally, since the memory is a write asymmetric memory, we require that Fu(s, i, j) ≥ s.
Let t denote the number of rewrites (including the first write) guaranteed by the code. A floating code
that maximizes t is called optimal.
The code in Fig. 1 is in fact a special case of a floating code, where the number of variables is only one. More specifically, its parameters are k = 1, ℓ = 4, n = 3, q = 2 and t = 2.
Let us look at an example of floating codes for two binary variables.
Example 4. We store two binary variables in a Write Asymmetric Memory with n cells of q levels.
Every rewrite changes the value of one variable. The Floating codes for n = 1, 2 and 3 are shown
in Fig. 2. As before, the numbers in a circle represent the memory state, the numbers beside a circle
represent the data, and the arrows represent the transition of the memory state triggered by rewriting.
With every rewrite, the memory state moves up by one layer in the figure; for example, if n = 3 and q ≥ 3, a sequence of rewrites moves the memory state along the corresponding arrows in Fig. 2(c).
The three codes in the figure all have a periodic structure, where every period contains 2n − 1 layers
(as shown in the figure) and has the same topological structure. From one period to the next, the only
difference is that the data (0, 0) is switched with (1, 0), and the data (1, 1) is switched with (0, 1).
Given the finite value of q, we just need to truncate the graph up to the cell level q − 1.
The floating codes in the above example are generalized in [16] for any value of n and q (but still with k = ℓ = 2), and are shown to guarantee

t = (n − 1)(q − 1) + ⌊(q − 1)/2⌋
rewrites. We now prove that this is optimal. First, we show an upper bound to t, the number
of guaranteed rewrites, for floating codes [16].
Fig. 2. Three examples of an optimal floating code for k = 2, ℓ = 2 and arbitrary n, q. (a) n = 1. (b) n = 2. (c) n = 3.
Theorem 5. If n ≥ k(ℓ − 1) − 1, then

t ≤ [n − k(ℓ − 1) + 1] · (q − 1) + ⌊ [k(ℓ − 1) − 1] · (q − 1) / 2 ⌋ ;

if n < k(ℓ − 1) − 1, then

t ≤ ⌊ n(q − 1) / 2 ⌋ .
Proof. First, consider the case where n ≥ k(ℓ − 1) − 1. Let (c1, c2, · · · , cn) denote the memory state. Let W_A = ∑_{i=1}^{k(ℓ−1)−1} c_i and W_B = ∑_{i=k(ℓ−1)}^{n} c_i. Call a rewrite operation "adversarial" if it either increases W_A by at least two or increases W_B by at least one. Since there are k variables and each variable has alphabet size ℓ, a rewrite can change the variable vector in k(ℓ − 1) different ways. However, since W_A is the summation of only k(ℓ − 1) − 1 cell levels, there are at most k(ℓ − 1) − 1 ways in which a rewrite can increase W_A by one. So there must be an "adversarial" choice for every rewrite.

Consider a sequence of adversarial rewrite operations applied to a generic floating code. Suppose that x of those rewrite operations increase W_A by at least two, and that y of them increase W_B by at least one. Since the maximum cell level is q − 1, we get x ≤ ⌊[k(ℓ − 1) − 1](q − 1)/2⌋ and y ≤ [n − k(ℓ − 1) + 1](q − 1). So the number of rewrites supported by a floating code is at most x + y ≤ [n − k(ℓ − 1) + 1](q − 1) + ⌊[k(ℓ − 1) − 1](q − 1)/2⌋.
The case where n < k(ℓ − 1) − 1 can be analyzed similarly. Call a rewrite operation "adversarial" if it increases ∑_{i=1}^{n} c_i by at least two. It can be shown that there is always an adversarial choice for every rewrite, and any floating code can support at most t ≤ ⌊n(q − 1)/2⌋ adversarial rewrites.
When k = ℓ = 2, the above theorem gives the bound t ≤ (n − 1)(q − 1) + ⌊(q − 1)/2⌋. It matches the number of rewrites guaranteed by the floating codes of Example 4 (and their generalization in [16]). So these codes are optimal.
Let us pause a little to consider the two given examples. In Example 2, two bits can be written
twice into the three single-level cells. The total number of bits written into the memory is
four (considering the whole history of rewriting), which is more than the number of bits the
memory can store at any given time (which is three). In Example 4, every rewrite changes one
of two binary variables and therefore reflects one bit of information. Since the code guarantees
(n − 1)(q − 1) + (q − 1)/2 rewrites, the total amount of information "recorded" by the memory
(again, over the history of rewriting) is (n − 1)(q − 1) + (q − 1)/2 ≈ nq bits. In comparison, the
number of bits the memory can store at any given time is only n log2 q.
Why can the total amount of information written into the memory (over the multiple rewrites)
exceed n log2 q, the maximum number of bits the memory can store at any given time? It is
because only the current value of the data needs to be remembered. Another way to understand
it is that we are not using the cells sequentially. As an example, if there are n single-level cells
and we increase one of their levels by one, there are n choices, which in fact reflects log2 n
bits of information (instead of one bit).
We now extend floating codes to a more general definition of rewriting codes. First, we use
a directed graph to represent how rewrites may change the stored data. This definition was
proposed in [19].
It is simple to see that when the above notion is applied to floating codes, the alphabet size
L = l^k, and the data graph D has constant in-degree and out-degree k(l − 1). The out-degree
of D reveals how much change in the data a rewrite operation can cause. It is an important
parameter. In the following, we show a rewriting code for this generalized rewriting model.
The code, called the Trajectory Code, was presented in [19].
Let (c1, · · · , cn) ∈ {0, 1, · · · , q − 1}^n denote the memory state. Let VD = {0, 1, · · · , L − 1} denote
the alphabet of the stored data. Let's present the trajectory code step by step, starting with its
basic building blocks.
The first building block is the Linear Code, which stores one of the n + 1 values {0, 1, · · · , n}
in n cells, using cell levels 0 and 1, by letting the memory state (c1, · · · , cn) represent the data
∑_{i=1}^{n} i·ci mod (n + 1). For every rewrite, change as few cells from level 0 to level 1 as
possible to get the new data. For example, with n = 7, if the rewrites change the data as
0 → 3 → 5 → 2 → 4, the memory state can change as
(0, 0, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0, 0) → (0, 1, 1, 0, 0, 0, 0) → (0, 1, 1, 0, 1, 0, 0) → (0, 1, 1, 1, 1, 1, 0).
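To make this building block concrete, the following is a minimal Python sketch of the Linear Code as described above; the function names and the erase fallback are our own illustration, not code from [19].

def decode(cells):
    # The data value represented by the memory state: sum of i * c_i mod (n + 1),
    # with cells indexed 1..n.
    n = len(cells)
    return sum(i * c for i, c in enumerate(cells, start=1)) % (n + 1)

def rewrite(cells, new_value):
    # Raise as few cells as possible from level 0 to level 1 so that
    # decode(cells) becomes new_value; at most two cells are needed while
    # at least (n + 1)/2 cells are still at level 0 (Theorem 9 below).
    n = len(cells)
    z = (new_value - decode(cells)) % (n + 1)
    if z == 0:
        return cells
    S = [i for i in range(1, n + 1) if cells[i - 1] == 0]  # cells at level 0
    if z in S:                           # a single raised cell suffices
        cells[z - 1] = 1
        return cells
    for s1 in S:                         # otherwise solve z = s1 + s2 mod (n + 1)
        s2 = (z - s1) % (n + 1)
        if s2 != s1 and s2 in S:
            cells[s1 - 1] = 1
            cells[s2 - 1] = 1
            return cells
    raise ValueError("no rewrite possible; a block erasure is needed")

cells = [0] * 7                          # n = 7, data alphabet {0, 1, ..., 7}
for v in [3, 5, 2, 4]:                   # the rewrite sequence of the example
    rewrite(cells, v)
    assert decode(cells) == v
print(cells)                             # [0, 1, 1, 1, 1, 1, 0], as above

Running the loop reproduces exactly the sequence of memory states listed above.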
The following theorem shows that the number of rewrites enabled by the Linear Code is
asymptotically optimal in n, the number of cells. It was proved in [29].
Theorem 9. The Linear Code guarantees at least (n + 1)/4 + 1 rewrites.
Proof. We show that as long as at least (n + 1)/2 cells are still of level 0, a rewrite will turn at most
two cells from level 0 to level 1. Let x ∈ {0, 1, · · · , n} denote the current data value, and let
y ∈ {0, 1, · · · , n} denote the new data value to be written, where y ≠ x. Let z denote
y − x mod (n + 1), let S denote the set of indices of the cells that are still at level 0, and let
T = {z − s mod (n + 1) | s ∈ S}.
If z ∈ S, raising the single cell z realizes the rewrite. Otherwise, since |S| = |T| ≥ (n + 1)/2 and
|S ∪ T| < n (zero and z are in neither set), the set S ∩ T must be nonempty. Their overlap
indicates a solution to the equation
z = s1 + s2 mod (n + 1)
with s1, s2 ∈ S, so raising the two cells s1 and s2 realizes the rewrite.
When n ≥ L and q ≥ 2, we can generalize the Linear Code in the following way [19]. First,
suppose n = L and q ≥ 2. We first use level 0 and level 1 to encode (as the Linear Code
does), and let the memory state represent the data ∑_{i=1}^{n} i·ci mod n. (Note that rewrites here
will not change cn.) When the code can no longer support rewriting, we increase all cell
levels (including cn) from 0 to 1, and start using cell levels 1 and 2 to store data in the same
way as above, except that now the data represented by the memory state (c1, · · · , cn) uses
the formula ∑_{i=1}^{n} i·(ci − 1) mod n. This process is repeated q − 1 times in total. The general
decoding function is therefore
∑_{i=1}^{n} i·(ci − cn) mod n.
Now we extend the above code to n ≥ L cells. We divide the n cells into b = ⌊n/L⌋ groups
of size L (some cells may remain unused), and sequentially apply the above code to the first
group of L cells, then to the second group, and so on. We call this code the Extended Linear
Code.
Theorem 10. Let 2 ≤ L ≤ n. The Extended Linear Code guarantees n(q − 1)/8 = Θ(nq) rewrites.
Proof. The Extended Linear Code essentially consists of (q − 1)·⌊n/L⌋ ≥ (q − 1)n/(2L) Linear
Codes, and by Theorem 9 each of them guarantees at least (L + 1)/4 rewrites, for a total of at
least ((q − 1)n/(2L)) · ((L + 1)/4) ≥ n(q − 1)/8 rewrites.
For the codes we have shown so far, we have not mentioned any constraint on the data graph
D . Therefore, the above results hold for the case where the data graph is a complete graph.
That is, a rewrite can change the data from any value to any other value. We now show that the
above codes are asymptotically optimal. For the Linear Code and the Extended Linear Code,
this is easy to see, because every rewrite needs to increase some cell level, so the number of
rewrites cannot exceed n(q − 1) = O(nq). The following theorem shows that the rewriting
code in Construction 11 is asymptotically optimal, too [19].
Theorem 13. When n < L − 1 and the data graph D is a complete graph, a rewriting code can
guarantee at most O(nq log n / log L) rewrites.
Proof. Let us consider some memory state s of the n flash cells, currently storing some value
v ∈ {0, 1, · · · , L − 1}. The next rewrite can change the data into any of the other L − 1 values.
If we allow ourselves r operations of increasing a single cell level of the n flash cells (perhaps
operating on the same cell more than once), we may reach C(n + r − 1, r) distinct new states.
Let r be the maximum integer satisfying C(n + r − 1, r) < L − 1. So for the next rewrite, we need
at least r + 1 such operations in the worst case. Since we have a total of n cells with q levels
each, the number of rewrite operations is upper bounded by
n(q − 1)/(r + 1) ≤ n(q − 1) / (log(L − 1)/(1 + log n) + 1) = O(nq log n / log L).
Intuitively, the first d rewrite operations are achieved by encoding the trajectory taken by the
input data sequence starting with the anchor data. After d such rewrites, we repeat the process
by rewriting the next input from {0, 1, · · · , L − 1} in the anchor S0 , and then continuing with
d edge labels in S1 , · · · , Sd .
Let us assume a sequence of s rewrites has been stored thus far. To decode the last stored
value, all we need to know is s mod (d + 1). This is easily achieved by using t/q more cells
(not included in the previous d + 1 registers), where t is the total number of rewrite operations
we would like to guarantee. For these t/q cells we employ a simple encoding scheme: in
every rewrite operation we arbitrarily choose one of those cells and raise its level by one.
Thus, the total level in these cells equals s.
The decoding process takes the value of the anchor S0 and then follows (s − 1) mod (d + 1)
edges which are read consecutively from S1 , S2 , · · · . Notice that this scheme is appealing in
cases where the maximum out-degree of D is significantly lower than the alphabet size L.
Note that each register Si , for i = 0, . . . , d, can be seen as a smaller rewriting code whose data
graph is a complete graph of either L vertices (for S0 ) or ∆ vertices (for S1 , . . . , Sd ). We use either
the Extended Linear Code or the code of Construction 11 for rewriting in the d + 1 registers.
The parameters of the Trajectory Code are shown by the following construction. We assume
that n ≤ L ≤ 2^√n.
Construction 14. TRAJECTORY CODE FOR n ≤ L ≤ 2^√n
If ∆ ≤ n log n / (2 log L), let
d = ⌊log L / log n⌋ = Θ(log L / log n).
If n log n / (2 log L) ≤ ∆ ≤ L, let
d = ⌊log L / log ∆⌋ = Θ(log L / log ∆).
In both cases, set the size of the d + 1 registers to n0 = n/2 and ni = n/(2d) for i = 1, . . . , d.
If ∆ ≤ n log n / (2 log L), apply the code of Construction 11 to register S0, and apply the Extended
Linear Code to registers S1, · · · , Sd. If n log n / (2 log L) ≤ ∆ ≤ L, apply the code of Construction 11
to all of the d + 1 registers S0, · · · , Sd.
The next three results show the asymptotic optimality of the Trajectory Code (when ∆ is
small and large, respectively) [19].
Theorem 15. Let ∆ ≤ n log n / (2 log L). The Trajectory Code of Construction 14 guarantees Θ(nq)
rewrites.
Proof. By Theorems 10 and 12, the number of rewrites possible in S0 is equal (up to constant
factors) to that of Si (i ≥ 1):
Θ(n0·q·log n0 / log L) = Θ(nq log n / log L) = Θ(nq/d) = Θ(ni·q).
Thus the total number of rewrites guaranteed by the Trajectory Code is d + 1 times the bound
for each register Si, which is Θ(nq).
Theorem 16. Let n log n / (2 log L) ≤ ∆ ≤ L. The Trajectory Code of Construction 14 guarantees
Θ(nq log n / log ∆) rewrites.
Proof. Here we use the fact that, as d ≤ log L, it holds that d = o(n) and log ni = Θ(log n − log d) =
Θ(log n). The bounds for register S0 and for the registers Si (i ≥ 1) are therefore equal up to
constant factors. Thus, as in Theorem 15, we conclude that the total number of rewrites
guaranteed by the Trajectory Code is d + 1 times the bound for each register Si, which is
Θ(nq log n / log ∆).
The rewriting performance shown in the above theorem matches the bound shown in the
following theorem. We omit its proof, which interested readers can find in [19].
Theorem 17. Let ∆ > n log n / (2 log L). There exist data graphs D of maximum out-degree ∆ such
that any rewriting code for D can guarantee at most O(nq log n / log ∆) rewrites.
3. Rank Modulation
We focus our attention now on a new data representation scheme called Rank Modulation [21,
23]. It uses the relative order of the cell levels, instead of their absolute values, to represent
data. Let us first understand the motivations for the scheme.
Fast and accurate programming schemes for multi-level flash memories are a topic of significant
research and design efforts. As mentioned before, the flash memory technology does not
support charge removal from individual cells due to block erasure. As a result, an iterative
cell programming method is used. To program a cell, a sequence of charge injection operations
is used to shift the cell level cautiously and monotonically toward the target charge level
from below, in order to avoid undesired global erasures in case of overshoots. Consequently,
programming a cell requires quite a few programming cycles, and it works only up
to a moderate number of levels per cell.
In addition to the need for accurate programming, the move to more levels in cells also
aggravates the reliability problem. Compared to single-level cells, the higher storage capacity of
multi-level cells is obtained at the cost of a smaller gap between adjacent cell levels. Many of
the errors in flash memories are asymmetric, meaning that they are more likely to shift the cell
levels in one direction (up or down) than in the other. Examples include write disturbs, read
disturbs, and charge leakage. It is therefore interesting to design coding schemes that tolerate
asymmetric errors better.
The Rank Modulation scheme proposed in [21, 23] therefore aims both to eliminate the problem
of overshooting while programming cells and to tolerate asymmetric errors better. In this
scheme, an ordered set of n cells stores the information in the permutation induced
by the charge levels of the cells. In this way, no discrete levels are needed (i.e., no need for
threshold levels) and only a basic charge-comparing operation (which is easy to implement) is
required to read the permutation. If we further assume that the only programming operation
allowed is raising the charge level of one of the cells above the current highest one (namely,
push-to-top), then the overshoot problem is no longer relevant. Additionally, the technology
may allow in the future the decrease of all the charge levels in a block of cells by a constant
amount smaller than the lowest charge level (block deflation), which would maintain their rela-
tive values, and thus leave the information unchanged. This can eliminate a designated erase
step, by deflating the entire block whenever the memory is not in use.
Let’s look at a simple example of rank modulation. Let [n] denote the set of integers
{1, 2, · · · , n}.
Example 18. We partition the cells into groups of three cells each. Denote the three cells in a group by
cell 1, cell 2, and cell 3. We use a permutation of [3] – [ a1 , a2 , a3 ] – to represent the relative order of the
three cell levels as follows: cell a1 has the highest level, and cell a3 has the lowest level. (The cell levels
considered in this section are real numbers. So no two cells can practically have the same level.)
The three cells in a group can represent six possible permutations:
[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]. So they can store up to log2 6 bits of
information. To write a permutation, we program the cells from the lowest level to the highest level. For
example, if the permutation to write is [2, 3, 1], we first program cell 3 to make its level higher than
that of cell 1, then program cell 2 to make its level higher than that of cell 3. This way, there is no risk
of overshooting.
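As an illustration, here is a minimal Python sketch of reading and writing such a group under the assumptions of the example (real-valued cell levels, and raising a cell above the current maximum as the only write primitive); all names are ours, and this writer conservatively pushes every cell of the permutation, from the lowest-ranked to the highest-ranked.

def read_permutation(levels):
    # Return [a1, ..., an]: a1 is the cell with the highest charge level.
    # Only pairwise charge comparisons are needed, no discrete thresholds.
    return sorted(range(1, len(levels) + 1),
                  key=lambda i: levels[i - 1], reverse=True)

def program_permutation(levels, perm, gap=1.0):
    # Write perm = [a1, ..., an] by programming cells from the lowest rank
    # to the highest: each cell is pushed above the current maximum level,
    # so an overshoot can never change the stored permutation.
    for cell in reversed(perm):
        levels[cell - 1] = max(levels) + gap
    return levels

levels = [0.4, 0.2, 0.3]                 # arbitrary initial analog levels
program_permutation(levels, [2, 3, 1])
assert read_permutation(levels) == [2, 3, 1]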
In this section, we use n to denote the number of cells in a group. As in the example, we use
a permutation of [n] – [ a1 , a2 , · · · , an ] – to denote the relative order of the cell levels such that
cell a1 has the highest level and cell an has the lowest level.
Once a new data representation method is defined, tools of coding are required to make it
useful. In this section, we focus on two tools: codes for rewriting, and codes for correcting
errors.
Definition 19. The decoding function Fd : Wn → [l] maps every state s ∈ Wn to a value Fd(s)
in [l]. Given an "old state" s ∈ Wn and a "new information symbol" i ∈ [l], the update function
Fu : Wn × [l] → Wn produces a state Fu(s, i) such that Fd(Fu(s, i)) = i.
The more push-to-top operations are used for rewriting, the closer the highest cell level is to
the maximum level possible. Once it reaches the maximum level possible, block erasure will
be needed for the next rewrite. So we define the rewriting cost by measuring the number of
push-to-top operations.
Definition 20. Given two states s1, s2 ∈ Wn, the cost of changing s1 into s2, denoted α(s1 → s2), is
defined as the minimum number of "push-to-the-top" operations needed to change s1 into s2.
Definition 21. The worst-case rewriting cost of a code is defined as max_{s∈Wn, i∈[l]} α(s → Fu(s, i)).
A code that minimizes this cost is called optimal.
Before we present an optimal rewriting code, let’s first present a lower bound on the worst-
case rewriting cost. Define the transition graph G = (V, E) as a directed graph with V = Sn ,
that is, with n! vertices representing the permutations in Sn . For any u, v ∈ V, there is a
directed edge from u to v iff α(u → v) = 1. G is a regular digraph, because every vertex has
n − 1 incoming edges and n − 1 outgoing edges. The diameter of G is maxu,v∈V α(u → v) =
n − 1.
Given a vertex u ∈ V and an integer r ∈ {0, 1, . . . , n − 1}, define the ball centered at u with
radius r as B_r^n(u) = {v ∈ V | α(u → v) ≤ r}, and define the sphere centered at u with radius r
as S_r^n(u) = {v ∈ V | α(u → v) = r}. Clearly,
B_r^n(u) = ∪_{0≤i≤r} S_i^n(u).
By a simple relabeling argument, both |B_r^n(u)| and |S_r^n(u)| are independent of u, and so will
be denoted by |B_r^n| and |S_r^n| respectively.
Lemma 22.
|B_r^n| = n!/(n − r)!,
|S_r^n| = 1 for r = 0, and |S_r^n| = n!/(n − r)! − n!/(n − r + 1)! for 1 ≤ r ≤ n − 1.
Proof. Fix a permutation u ∈ V. Let Pu be the set of permutations having the following prop-
erty: for each permutation v ∈ Pu , the elements appearing in its last n − r positions appear in
the same relative order in u. For example, if n = 5, r = 2, u = [1, 2, 3, 4, 5] and v = [5, 2, 1, 3, 4],
the last 3 elements of v – namely, 1, 3, 4 – have the same relative order in u. It is easy to see
that given u, when the elements occupying the first r positions in v ∈ Pu are chosen, the last
n − r positions become fixed. There are n(n − 1) · · · (n − r + 1) choices for occupying the first
r positions of v ∈ Pu, hence |Pu| = n!/(n − r)!. We will show that a vertex v is in B_r^n(u) if and
only if v ∈ Pu.
Suppose v ∈ B_r^n(u). It follows that v can be obtained from u with at most r "push-to-the-top"
operations. Those elements pushed to the top appear in the first r positions of v, so the last
n − r positions of v contain elements which have the same relative order in u; thus, v ∈ Pu.
Now suppose v ∈ Pu. For i ∈ [n], let vi denote the element in the i-th position of v. One can
transform u into v by sequentially pushing vr, vr−1, . . . , v1 to the top. Hence, v ∈ B_r^n(u).
We conclude that |B_r^n(u)| = |Pu| = n!/(n − r)!. Since B_r^n(u) = ∪_{0≤i≤r} S_i^n(u), the second
claim follows.
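The characterization just proved (v ∈ B_r^n(u) if and only if v ∈ Pu) gives a direct way to compute the cost α(u → v). Below is an illustrative Python sketch (the naming is ours), with a brute-force check of |B_r^n| = n!/(n − r)! for n = 4.

from itertools import permutations
from math import factorial

def cost(u, v):
    # alpha(u -> v): the smallest r such that the elements in the last n - r
    # positions of v appear in the same relative order in u.
    n = len(u)
    for r in range(n):
        suffix = list(v[r:])
        if [x for x in u if x in suffix] == suffix:
            return r
    return n - 1                          # never reached; the diameter is n - 1

u = (1, 2, 3, 4)
for r in range(4):
    ball = [v for v in permutations(u) if cost(u, v) <= r]
    assert len(ball) == factorial(4) // factorial(4 - r)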
The following lemma shows a lower bound to the worst-case rewriting cost.
Lemma 23. Fix integers n and l, and define ρ(n, l) to be the smallest integer such that |B_{ρ(n,l)}^n| ≥ l.
For any code Wn and any state s ∈ Wn, there exists i ∈ [l] such that α(s → Fu(s, i)) ≥ ρ(n, l), i.e.,
the worst-case rewriting cost of any code is at least ρ(n, l).
Proof. By the definition of ρ(n, l), |B_{ρ(n,l)−1}^n| < l. Hence, we can choose i ∈ [l] \ {Fd(s′) | s′ ∈
B_{ρ(n,l)−1}^n(s)}. Clearly, by our choice α(s → Fu(s, i)) ≥ ρ(n, l).
We now present a rewriting code construction. It will be shown that the code achieves the
minimum worst-case rewriting cost. First, let us define the following notation.
Definition 24. A prefix sequence θ = [ a(1) , a(2) , . . . , a(m) ] is a sequence of m ≤ n distinct symbols
from [n]. The prefix set Pn (θ ) ⊆ Sn is defined as all the permutations in Sn which start with the
sequence θ.
Theorem 26. The code in Construction 25 is optimal in terms of minimizing the worst-case rewriting
cost.
Proof. It is obvious from the description of Fu that the worst-case rewriting cost of the con-
struction is at most ρ(n, ). By Lemma 23 this is also the best we can hope for.
Example 27. Let n = 3 and l = 3. Since |B_1^3| = 3, it follows that ρ(n, l) = 1. We partition the n! = 6
states into n!/(n − ρ(n, l))! = 3 sets, which induce the mapping: the permutations in the prefix set
P3([i]) are mapped to value i, namely {[1, 2, 3], [1, 3, 2]} → 1, {[2, 1, 3], [2, 3, 1]} → 2, and
{[3, 1, 2], [3, 2, 1]} → 3.
If a probability distribution is known for the rewritten data, we can also define the performance
of codes based on the average rewriting cost. This is studied in [21], where a variable-length
prefix-free code is optimized. It is shown that the average rewriting cost of this prefix-free
code is within a small constant approximation ratio of the minimum possible cost over all
codes (prefix-free codes or not) under very mild conditions.
Basic Properties
Theorem 28. Let A = [a1, a2, . . . , an] and B = [b1, b2, . . . , bn] be two permutations. Let A′ =
[a1, . . . , an−1], let B′ be obtained from B by deleting an, and let p denote the position of an in B. Then
d(A, B) = d(A′, B′) + n − p.
The above theorem can be proved by induction. It shows a recursive algorithm for computing
the distance between two permutations. Let A = [ a1 , a2 , . . . , an ] and B = [b1 , b2 , . . . , bn ] be two
permutations. For 1 ≤ i ≤ n, let Ai denote [ a1 , a2 , . . . , ai ], let Bi denote the subsequence of B
that contains only those numbers in Ai , and let pi denote the position of ai in Bi . Then, since
d( A1 , B1 ) = 0 and d( Ai , Bi ) = d( Ai−1 , Bi−1 ) + i − pi , for i = 2, 3, . . . , n, we get
d(A, B) = d(An, Bn) = (n − 1)(n + 2)/2 − ∑_{i=2}^{n} p_i.
Example 29. Let A = [1, 2, 3, 4] and B = [4, 2, 3, 1]. Then A1 = [1], A2 = [1, 2], A3 = [1, 2, 3],
A4 = [1, 2, 3, 4], B1 = [1], B2 = [2, 1], B3 = [2, 3, 1], B4 = [4, 2, 3, 1]. We get
d( A1 , B1 ) = 0,
d( A2 , B2 ) = d( A1 , B1 ) + 2 − p2 = 0 + 2 − 1 = 1,
d( A3 , B3 ) = d( A2 , B2 ) + 3 − p3 = 1 + 3 − 2 = 2,
d( A4 , B4 ) = d( A3 , B3 ) + 4 − p4 = 2 + 4 − 1 = 5.
Indeed, d(A, B) = (n − 1)(n + 2)/2 − ∑_{i=2}^{n} p_i = (4 − 1)(4 + 2)/2 − (1 + 2 + 1) = 5.
We now define a coordinate system for permutations. We fix A = [1, 2, . . . , n]. For every
permutation B = [b1 , b2 , . . . , bn ], we define its coordinates as XB = (2 − p2 , 3 − p3 , . . . , n − pn ).
Here pi is defined as above for 2 ≤ i ≤ n. Clearly, if XB = ( x1 , x2 , . . . , xn−1 ), then 0 ≤ xi ≤ i
for 1 ≤ i ≤ n − 1.
Example 30. Let A = [1, 2, 3, 4, 5]. Then X A = (0, 0, 0, 0). If B = [3, 4, 2, 1, 5], then XB =
(1, 2, 2, 0). If B = [5, 4, 3, 2, 1], then XB = (1, 2, 3, 4). The full set of coordinates for n = 3 and n = 4
are shown in Fig. 3 (a) and (c), respectively.
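The coordinate map and the recursion behind Example 29 translate directly into code; this short Python sketch (names ours) computes X_B and d(A, B) for A = [1, 2, . . . , n].

def coordinates(B):
    # X_B = (2 - p_2, ..., n - p_n): p_i is the (1-based) position of i in
    # the subsequence B_i of B containing only the numbers 1, ..., i.
    n = len(B)
    X = []
    for i in range(2, n + 1):
        Bi = [b for b in B if b <= i]
        X.append(i - (Bi.index(i) + 1))
    return X

def distance(B):
    # d(A, B) for A = [1, ..., n] equals the sum of the coordinates of B.
    return sum(coordinates(B))

assert coordinates([3, 4, 2, 1, 5]) == [1, 2, 2, 0]   # Example 30
assert distance([4, 2, 3, 1]) == 5                    # Example 29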
The coordinate system is equivalent to a form of Lehmer code (or Lucas-Lehmer code, inver-
sion table) [26]. It is easy to see that two permutations are identical if and only if they have the
same coordinates, and any vector (y1 , y2 , . . . , yn−1 ), 0 ≤ yi ≤ i for 1 ≤ i ≤ n − 1, is the coor-
dinates of some permutation in Sn . So there is a one-to-one mapping between the coordinates
and the permutations.
Let A ∈ Sn be a permutation. For any 0 ≤ r ≤ n(n − 1)/2, the set B_r(A) = {B ∈ Sn | d(A, B) ≤ r}
is a ball of radius r centered at A. A simple relabeling argument suffices to show that the
size of a ball does not depend on the choice of center. We use |Br | to denote |Br ( A)| for any
A ∈ Sn . We are interested in finding the value of |Br |. The following theorem presents a way
to compute the size of a ball using polynomial multiplication.
Theorem 31. For 0 ≤ r ≤ n(n − 1)/2, let e_r denote the coefficient of x^r in the polynomial
∏_{i=1}^{n−1} (x^{i+1} − 1)/(x − 1). Then |B_r| = ∑_{i=0}^{r} e_i.
Proof. Let A = [1, 2, . . . , n]. Let B = [b1, b2, . . . , bn] be a generic permutation. Let XB =
(y1, y2, . . . , yn−1) be the coordinates of B. By the definition of coordinates, we get d(A, B) =
∑_{i=1}^{n−1} y_i. The number of permutations at distance r from A equals the number of integer
solutions to ∑_{i=1}^{n−1} y_i = r such that 0 ≤ y_i ≤ i. That is equal to the coefficient of x^r in
the polynomial ∏_{i=1}^{n−1} (x^i + x^{i−1} + · · · + 1) = ∏_{i=1}^{n−1} (x^{i+1} − 1)/(x − 1). Thus,
there are exactly e_r permutations at distance r from A, and |B_r| = ∑_{i=0}^{r} e_i.
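Theorem 31 is straightforward to evaluate numerically; a short Python sketch (ours) multiplies the factors coefficient by coefficient and accumulates the partial sums:

def ball_sizes(n):
    # e[r] = coefficient of x^r in prod_{i=1}^{n-1} (1 + x + ... + x^i);
    # |B_r| is then the partial sum e[0] + ... + e[r].
    e = [1]
    for i in range(1, n):
        new = [0] * (len(e) + i)
        for r, c in enumerate(e):
            for k in range(i + 1):        # multiply by 1 + x + ... + x^i
                new[r + k] += c
        e = new
    sizes, total = [], 0
    for er in e:
        total += er
        sizes.append(total)
    return sizes

# For n = 4: e = [1, 3, 5, 6, 5, 3, 1], so |B_1| = 4 and |B_6| = 4! = 24.
assert ball_sizes(4)[1] == 4
assert ball_sizes(4)[-1] == 24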
Theorem 31 induces an upper bound for the sizes of error-correcting rank-modulation codes.
By the sphere-packing principle, for such a code that can correct r errors, its size cannot exceed
n!/|Br |.
Lemma 32. For two permutations A = [a1, a2, . . . , an] and B = [b1, b2, . . . , bn], let their coordinates
be XA = (x1, x2, . . . , xn−1) and XB = (y1, y2, . . . , yn−1). If A and B are adjacent, then
∑_{i=1}^{n−1} |x_i − y_i| = 1.
The above lemma can be proved by induction. Interested readers can see [23] for details.
Let Ln = (VL , EL ) denote a 2 × 3 × · · · × n linear array graph. Ln has n! vertices VL . Each
vertex is assigned integer coordinates ( x1 , x2 , . . . , xn−1 ), where 0 ≤ xi ≤ i for 1 ≤ i ≤ n − 1.
The distance between vertices of Ln is the L1 distance, and two vertices are adjacent (i.e., have
an edge between them) if and only if their distance is one.
We now build a bijective map P : V → VL . Here V is the vertex set of the adjacency graph
of permutations G = (V, E). For any u ∈ V and v ∈ VL , P(u) = v if and only if u, v have
the same coordinates. By Lemma 32, if two permutations are adjacent, their coordinates are
adjacent in Ln . So we get:
Theorem 33. The adjacency graph of permutations is a subgraph of the 2 × 3 × · · · × n linear array.
We show some examples of the embedding in Fig. 3. It can be seen that while each permutation
has n − 1 adjacent permutations, a vertex in the array can have a degree varying from
n − 1 to 2n − 3. Some edges of the array do not exist in the adjacency graph of permutations.
The observation that the permutations’ adjacency graph is a subgraph of a linear array shows
an approach to design error-correcting rank-modulation codes based on Lee-metric codes. We
skip the proof of the following theorem due to its simplicity.
Theorem 34. Let C be a Lee-metric error-correcting code of length n − 1, alphabet size no less than n,
and minimum distance d. Let C′ be the subset of codewords of C that are contained in the array Ln.
Then C′ is an error-correcting rank-modulation code with minimum distance at least d.
Lemma 37. The rank-modulation code built in Construction 35 can correct one error.
Proof. It has been shown in [11] that for an infinite k-dimensional array, the vertices whose
coordinates (x1, x2, . . . , xk) satisfy the condition ∑_{i=1}^{k} i·x_i ≡ 0 (mod 2k + 1) have a
minimum L1 distance of 3. Let k = n − 1. Note that in Construction 35, the codewords of C1 are
a subset of the above vertices, while the codewords in C2 are a subset of the mirrored image of
the above vertices, where the last coordinate xn−1 is mapped to −xn−1. Since the permutations'
adjacency graph is a subgraph of the array, the minimum distance of C1 and C2 is at least 3.
Hence, the code built in Construction 35 can correct one error.
Proof. Every permutation has n − 1 adjacent permutations, so the size of a radius-1 ball, |B1|,
is n. By the sphere-packing bound, a single-error-correcting rank-modulation code can have at
most n!/n = (n − 1)! codewords. The code in Construction 35 has at least (n − 1)!/2 codewords.
The errors in flash cell levels often have an asymmetric property. In [4], error-correcting codes
that correct asymmetric errors of limited magnitude were designed for flash memories.
In a storage system, to avoid the accumulation of errors, a common practice is to write the
correct data back into the storage system once the errors accumulated in the data reach a certain
threshold. This is called memory scrubbing. In flash memories, however, memory scrubbing is
more difficult, because to write one correct codeword back into the system, the whole block
needs to be erased. A new type of error-correcting codes, called Error-Scrubbing Codes, was
defined in [20] for multi-level cells. It is shown that even if the only allowed operation is to
increase cell levels, a higher ECC rate can still be achieved by actively scrubbing errors.
The block erasure property of flash memories affects not only rewriting and cell programming,
but also data movement. In [22], it is shown that by appropriately using coding, the number
of erasures needed for moving data among n NAND flash blocks can be reduced by a factor
of O(log n).
The rewriting codes and the rank modulation scheme are useful not only for flash memories,
but also for other constrained memories. A prominent example is the phase-change memory
(PCM). In a phase-change memory, a memory cell can be switched between a crystalline state
and an amorphous state. Intermediate states are also possible. Since changing the cell to the
crystalline state takes a substantially longer time than changing it to the amorphous state,
the memory is closely related to the Write-Asymmetric Memory model. How to program
PCM cells with more intermediate states quickly and reliably is an active research topic. The
rank modulation scheme may provide an effective tool in this area.
Acknowledgment
We would like to thank all our co-authors for their collaborative work in this area. In particu-
lar, we would like to thank Mike Langberg and Moshe Schwartz for many of the main results
discussed in this chapter.
5. References
[1] R. Ahlswede and Z. Zhang, “On multiuser write-efficient memories,” IEEE Trans. on In-
form. Theory, vol. 40, no. 3, pp. 674–686, 1994.
[2] V. Bohossian, A. Jiang and J. Bruck, “Buffer codes for asymmetric multi-level memory,” in
Proc. IEEE International Symposium on Information Theory (ISIT), 2007, pp. 1186-1190.
[3] P. Cappelletti, C. Golla, P. Olivo and E. Zanoni (Ed.), Flash memories, Kluwer Academic
Publishers, 1st Edition, 1999.
[4] Y. Cassuto, M. Schwartz, V. Bohossian and J. Bruck, “Codes for multilevel flash memories:
Correcting asymmetric limited-magnitude errors,” in Proc. IEEE International Symposium
on Information Theory (ISIT), Nice, France, June 2007, pp. 1176-1180.
[5] G. D. Cohen, P. Godlewski, and F. Merkx, “Linear binary code for write-once memories,”
IEEE Trans. on Inform. Theory, vol. IT-32, no. 5, pp. 697–700, Sep. 1986.
[6] A. Fiat and A. Shamir, “Generalized “write-once” memories,” IEEE Trans. on Inform. The-
ory, vol. IT-30, no. 3, pp. 470–480, May 1984.
[7] H. Finucane, Z. Liu and M. Mitzenmacher, “Designing floating codes for expected perfor-
mance,” in Proc. 46th Annual Allerton Conference, 2008.
[8] F. Fu and A. J. Han Vinck, “On the capacity of generalized write-once memory with state
transitions described by an arbitrary directed acyclic graph,” IEEE Trans. on Inform. Theory,
vol. 45, no. 1, pp. 308–313, Jan. 1999.
[9] F. Fu and R. W. Yeung, “On the capacity and error-correcting codes of write-efficient mem-
ories,” IEEE Trans. on Inform. Theory, vol. 46, no. 7, pp. 2299–2314, Nov. 2000.
[10] E. Gal and S. Toledo, “Algorithms and data structures for flash memories,” in ACM Com-
puting Surveys, vol. 37, no. 2, pp. 138-163, June 2005.
[11] S. W. Golomb and L. R. Welch, “Perfect codes in the Lee metric and the packing of poly-
ominoes,” SIAM J. Appl. Math., vol. 18, no. 2, pp. 302–317, Jan. 1970.
[12] A. J. Han Vinck and A. V. Kuznetsov, “On the general defective channel with informed
encoder and capacities of some constrained memories,” IEEE Trans. on Inform. Theory,
vol. 40, no. 6, pp. 1866–1871, 1994.
[13] C. D. Heegard, “On the capacity of permanent memory,” IEEE Trans. on Inform. Theory,
vol. IT-31, no. 1, pp. 34–42, Jan. 1985.
[14] C. D. Heegard and A. A. E. Gamal, “On the capacity of computer memory with defects,”
IEEE Trans. on Inform. Theory, vol. IT-29, no. 5, pp. 731–739, Sep. 1983.
[15] A. Jiang, “On the generalization of error-correcting WOM codes,” in Proc. IEEE Interna-
tional Symposium on Information Theory (ISIT’07), 2007, pp. 1391-1395.
[16] A. Jiang, V. Bohossian and J. Bruck, “Floating codes for joint information storage in write
asymmetric memories,” in Proc. IEEE International Symposium on Information Theory (ISIT),
2007, pp. 1166-1170.
[17] A. Jiang and J. Bruck, “Joint coding for flash memory storage,” in Proc. IEEE International
Symposium on Information Theory (ISIT), 2008, pp. 1741-1745.
[18] A. Jiang and J. Bruck, “On the capacity of flash memories,” in Proc. International Sympo-
sium on Information Theory and Its Applications (ISITA), 2008, pp. 94-99.
[19] A. Jiang, M. Langberg, M. Schwartz and J. Bruck, “Universal rewriting in constrained
memories,” in Proc. IEEE International Symposium on Information Theory (ISIT), Seoul, Ko-
rea, June-July 2009. Technical report online at http://www.paradise.caltech.edu/
papers/etr096.pdf.
[20] A. Jiang, H. Li and Y. Wang, “Error scrubbing codes for flash memories,” in Proc. Canadian
Workshop on Information Theory (CWIT), May 2009, pp. 32-35.
[21] A. Jiang, R. Mateescu, M. Schwartz and J. Bruck, “Rank modulation for flash memories,”
in Proc. IEEE International Symposium on Information Theory (ISIT), 2008, pp. 1731-1735.
[22] A. Jiang, R. Mateescu, E. Yaakobi, J. Bruck, P. Siegel, A. Vardy and J. Wolf, “Storage coding
for wear leveling in flash memories,” in Proc. IEEE International Symposium on Information
Theory (ISIT’09), Seoul, Korea, June-July 2009.
[23] A. Jiang, M. Schwartz and J. Bruck, “Error-correcting codes for rank modulation,” in Proc.
IEEE International Symposium on Information Theory (ISIT), 2008, pp. 1736-1740.
[24] M. Kendall and J. D. Gibbons, Rank correlation methods. Oxford University Press, NY,
1990.
[25] A. V. Kuznetsov and B. S. Tsybakov, “Coding for memories with defective cells,” Problemy
Peredachi Informatsii, vol. 10, no. 2, pp. 52–60, 1974.
[26] D. H. Lehmer, “Teaching combinatorial tricks to a computer,” in Proc. Sympos. Appl. Math.
Combinatorial Analysis, vol. 10, Amer. Math. Soc., Providence, R.I., pp. 179-193, 1960.
[27] F. Merkx, “WOM codes constructed with projective geometries,” Traitment du Signal,
vol. 1, no. 2-2, pp. 227–231, 1984.
[28] W. M. C. J. van Overveld, “The four cases of write unidirectional memory codes over
arbitrary alphabets,” IEEE Trans. on Inform. Theory, vol. 37, no. 3, pp. 872–878, 1991.
[29] R. L. Rivest and A. Shamir, “How to reuse a ‘write-once’ memory,” Information and Con-
trol, vol. 55, pp. 1-19, 1982.
[30] G. Simonyi, “On write-unidirectional memory codes,” IEEE Trans. on Inform. Theory,
vol. 35, no. 3, pp. 663–667, May 1989.
[31] Z. Wang, A. Jiang and J. Bruck, “On the capacity of bounded rank modulation for flash
memories,” in Proc. IEEE International Symposium on Information Theory (ISIT), Seoul, Ko-
rea, June-July 2009.
[32] F. M. J. Willems and A. J. Vinck, “Repeated recording for an optical disk,” in Proc. 7th
Symp. Inform. Theory in the Benelux, May 1986, Delft Univ. Press, pp. 49-53.
[33] J. K. Wolf, A. D. Wyner, J. Ziv, and J. Korner, “Coding for a write-once memory,” AT&T
Bell Labs. Tech. J., vol. 63, no. 6, pp. 1089–1112, 1984.
[34] E. Yaakobi, A. Vardy, P. H. Siegel and J. K. Wolf, “Multidimensional flash codes,” in Proc.
46th Annual Allerton Conference, 2008.
[35] G. Zémor and G. Cohen, “Error-correcting WOM-codes,” IEEE Trans. on Inform. Theory,
vol. 37, no. 3, pp. 730–734, May 1991.
Design and Implementation of FPGA-based Systolic Array for LZ Data Compression
1. Introduction
Data compression is becoming an essential component of high speed data communications
and storage. Lossless data compression is the process of encoding ("compressing") a body of
data into a smaller body of data which can, at a later time, be uniquely decoded
("decompressed") back to the original data. In lossy compression, the decompressed data
contains some approximation of the original data.
Hardware implementation of data compression algorithms is receiving increasing attention
due to exponential expansion in network traffic and digital data storage usage. Many lossless
data compression techniques have been proposed in the past and widely used, e.g., Huffman
code (Huffman, 1952); (Gallager, 1978); (Park & Prasanna, 1993), arithmetic code (Bodden et
al., 2004); (Said, 2004); (Said, 2003); (Howard & Vetter, 1992), run-length code (Golomb, 1966),
and Lempel–Ziv (LZ) algorithms (Ziv & Lempel, 1977); (Ziv & Lempel, 1978); (Welch,
1984); (Salomon, 2004). Among those, LZ algorithms are the most popular when no prior
knowledge or statistical characteristics of the data being compressed are available. The
principle of the LZ algorithms is to find the longest match between the recently received
string which is stored in the input buffer and the incoming string. Once this match is located,
the incoming string is represented with a position tag and a length variable linking the new
string to the old existing one. Since the repeated data is linked to an older one, a more concise
representation is achieved and compression is performed. The latency of the compression
process is defined by the number of clock cycles needed to produce a codeword (matching
results).
To fulfill real-time requirements, several hardware realizations of LZ and its variants have
been presented in the literature. Different hardware architectures, including content
addressable memory (CAM) (Lin & Wu, 2000); (Jones, 1992); (Lee & Yang, 1995), systolic arrays
(Ranganathan & Henriques, 1993); (Jung & Burleson, 1998); (Hwang & Wu, 2001), and
embedded processors (Chang et al., 1994), have been proposed in the past. The microprocessor
approach is not attractive for real-time applications, since it does not fully exploit hardware
parallelism (Hwang & Wu, 2001). CAM has been considered one of the fastest architectures
for searching for a given string in a long word, which is a necessary process in LZ. A CAM-based
LZ data compressor can process one input symbol per clock cycle, regardless of the buffer
length and string length. A CAM-based LZ can achieve optimum speed for compression.
However, CAMs require highly complex hardware and dissipate high power. The CAM
approach performs string matching through full parallel search, while the systolic-array
approach exploits pipelining.
As compared to CAM-based designs, systolic-array-based designs are slower, but better in
hardware cost and testability (Hwang & Wu, 2001); (Hwang & Wu, 1995); (Hwang & Wu,
1997). Preliminary designs for systolic arrays contained thousands of processing elements (PEs)
(Ranganathan & Henriques, 1993). High-speed designs were reported later, requiring
only tens of PEs (Jung & Burleson, 1998); (Hwang et al., 2001). A technique to enhance the
efficiency of the systolic-array approach used to implement the Lempel-Ziv algorithm is
described in this chapter. A parallel technique for LZ-based data compression is presented.
The technique transforms a data-dependent algorithm into a data-independent algorithm.
A control variable is introduced to indicate early completion, which improves the
latency. The proposed implementation is area and speed efficient. The effect of the input
buffer length on the compression ratio is analyzed. An FPGA implementation of the
proposed technique is carried out. The implemented design verifies that data can be
compressed and decompressed on-the-fly, which opens new areas of research in data
compression.
The organization of this chapter is as follows: In Section 2, the LZ compression algorithm is
explained. The results and comments about some software simulations are discussed. The
dependency graph (DG) to investigate the data dependency of every computation step in the
algorithm is shown. The most recent systolic array architecture is described and an area and
speed efficient architecture is proposed in Section 3. In Section 4, the proposed systolic array
structure is compared with the most recent structures (Hwang et al., 2001) in terms of area,
and latency. An FPGA implementation for the proposed architecture showing the real time
operations is demonstrated in Section 5. Finally, conclusions are provided in Section 6.
Let us consider an example with window length of (n=9) and look-ahead buffer length (Ls=3)
shown in Fig. 1.
Let the content of the window be denoted as Xi, i = 0, 1, . . . , n−1, and that of the look-ahead
buffer be Yj, j = 0, 1, . . . , Ls−1 (i.e., Yj = Xj+n−Ls). According to the LZ algorithm, the content of
the look-ahead buffer is compared with the dictionary content starting from X0 to Xn−Ls−1 to find
the longest match length. If the best match in the window starts from position Ip and the match
length is Lmax, then Lmax symbols will be represented by a codeword (Ip, Lmax). The
codeword length is Lc:
Lc = 1 + [log2 (n−Ls)] + [log2 Ls] bits (1)
Lc is fixed. Assume w bits are required to represent a symbol in the window, l = [log2 Ls] bits
are required to represent Lmax, and p = [log2 (n−Ls)] bits are required to represent Ip. Then the
compression ratio is (l + p) / (Lmax × w), where 0 ≤ Lmax ≤ Ls. Hence the compression ratio
depends on the match situation.
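To illustrate a single compression step, here is a simplified Python sketch (our own; for simplicity it confines matches to the dictionary part of the window) that finds the longest match and returns the codeword (Ip, Lmax):

def lz_step(window, n, Ls):
    # The last Ls symbols of the window form the look-ahead buffer Y;
    # the first n - Ls symbols form the dictionary X.
    X, Y = window[:n - Ls], window[n - Ls:]
    Ip, Lmax = 0, 0
    for i in range(n - Ls):               # candidate positions X0 .. Xn-Ls-1
        length = 0
        while (length < Ls and i + length < n - Ls
               and X[i + length] == Y[length]):
            length += 1
        if length > Lmax:
            Ip, Lmax = i, length
    return Ip, Lmax                       # Lmax symbols are then consumed

window = list(b"abcabcabcabcabca")        # n = 16, Ls = 3
print(lz_step(window, 16, 3))             # -> (1, 3): "bca" found at position 1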
The codeword design and the choice of window length are crucial in achieving maximum
compression. The LZ technique involves the conversion of variable-length substrings into
fixed-length codewords that represent the pointer and the length of the match. Hence, the
selection of the values of n and Ls can greatly influence the compression efficiency of the LZ
algorithm.
[Figure: compression ratio versus Ls (4 to 128) for n = 256, 512, 1024.]
Fig. 2. The relationship between the compression ratio of the Calgary corpus and Ls for
different values of n.
[Figure: compression ratio versus Ls (4 to 128) for n = 256, 512, 1024.]
Fig. 3. The relationship between the compression ratio of the Silesia corpus and Ls for
different values of n.
3.1 Design-1
This architecture was first proposed in (Ziv & Lempel, 1977). The space-time diagram and
its final array architecture are given in Fig. 5, where D represents a unit delay on the signal
line between two processing elements. In Table 1, the six sets of comparisons have to be
done in sequence in order to find the maximum matching substring.
Let us consider six processing elements (PE’s) in parallel, each performing one vertical set of
comparisons. Each processing element would require 3 time units (Ls =3) to complete its set
of comparisons. As shown in Fig. 5, the delay blocks in each PE delay Y by two time
steps and X by one time step. A space-time diagram is used to illustrate the sequence of
comparisons as performed by each PE. The data brought into PE0 are routed systolically
through each processor from left to right. In the first time unit, X0 and Y0 are compared at
PE0. In the second time unit, X1 and Y1 are compared, X0 flows to PE1, and Y0 is delayed by
one cycle (time unit). In the third time unit, X2 and Y2 are compared at PE0. At this time, Y0
gets to PE1 along with X1, and PE1 performs its first comparison in the third cycle. By then, PE0
has completed all its required comparisons and stores an integer specifying the number of
successful comparisons in a register called Li. Another register, called Lmax, holds the
maximum matching length obtained from the previous PE's. In the fourth time unit, PE0
compares the values of Lmax (which for PE0 is 0) and Li, and the greater of the two is sent to
the Lmax register of the next PE. The result of the Li - Lmax comparison is sent to the next PE
after a delay of one time unit for proper synchronization. Finally, the Lmax value emerging
out from the last PE (PE5 in this case) is the length of the longest matching substring. There
is another register, called PE's id, whose contents are passed along with the Lmax value to the
next PE. Its contents indicate the id of the processing element where the Lmax value occurred,
which becomes the pointer to the match.
The functional block of the PE is shown in Fig. 6, in which the control circuit is not included.
Two comparators are needed in the PE: one is for the equality check of Yj and Xi, and the other,
together with two multiplexers, is for determining Lmax and Ip. If Yj and Xi are equal, a
counter is incremented each time until an unsuccessful comparison occurs. Sequences Xi and
Yj can be generated by the buffer shown in Fig. 7, which is organized in two levels: the upper
level of the buffer holds the incoming symbols to be compressed. The contents of the upper
level are copied into the lower level whenever the "load" line goes high. The lower level is
used to provide data to the PE's in the correct sequence.
[Fig. 6 diagram: an equality comparator (Yj = Xi) driving an incrementer and the Li register; a
second comparator (a > b) with two multiplexers selecting Lmax and Ip; a PE's id register; and
delay elements D.]
The operation of the buffer is as follows. When the longest match length is found, the same
number of symbols is shifted into the upper buffer from the source, and then the
symbols in the upper buffer are copied to the lower buffer in parallel to generate the next
sequence for the processor array. In the Design-1 array, the number of clock cycles needed
to produce a codeword is 2(n−Ls), so the utilization rate of each PE is Ls/[2(n−Ls)], which is
low, since the PE is idle from the moment when Li is determined until the time the codeword
is produced. The reason is that it seems impossible to compress subsequent input symbols
before the present compression process is completed, because the number of input symbols
that need to be shifted into the buffer is equal to the longest match length, which is not
available before the completion of the present compression process. Therefore, a design
with more than Ls pipeline stages must have some idle PEs before the present codeword is
produced.
3.2 Design-2:
The Design-2 was first proposed in (Hwang & Wu, 2001). The space-time diagram and its
array architecture are given in Fig. 8. It consists of Ls processing elements. The match
elements Yj stay in the PEs, and Xi and Li both flow leftwards with delays of 1 and 2 clock
cycles, respectively. The first Li from the leftmost PE will be obtained after 2Ls clock cycles.
After that, one Li will be obtained every clock cycle. The block diagram of the Design-2 PE is
shown in Fig. 9.
3.3 Design-3:
Design-3 was proposed in (Ranganathan & Henriques, 1993). The space-time diagram and
the resulting array are given in Fig. 10. The value of Yj stays in the PE. The buffer element Xi
moves systolically from right to left with a delay of 2 clock cycles. Li propagates from right to
left with a delay of 1 clock cycle. The first Li from the leftmost PE will be obtained after Ls
clock cycles. After that, the subsequent ones will be obtained every clock cycle. The structure
of the Design-3 PE is shown in Fig. 11. A match results block (MRB) is needed to determine
Lmax and Ip.
Fig. 10. The space-time diagram and the resulting array of Design-3.
[Fig. 11 diagram: a register holding Yj, an equality comparator (Yj = Xi) producing the match
signal Ei, an accumulator for Li, and delay elements D.]
Fig. 11. The structure of the Design-3 PE.
Before the encoding process, the Yis are preloaded, which takes Ls extra cycles. During the
encoding process, the time to preload new source symbols depends on how many source
symbols were compressed in the previous compression step, Lmax. The block diagram of the
processing element is shown in Fig. 14.
A match results block (MRB) is needed to determine Lmax among the serially produced Lis.
The MRB is shown in Fig. 15. The PEs do not need to store their ids to record the positions of
the Lis. A special counter is needed to generate the sequence which interleaves the positions of
the first half of the Lis and the positions of the second half, as shown in Fig. 16. The compression
time of the interleaved design is n clock cycles.
[Fig. 15 diagram: the MRB — a comparator (a > b) updating the Lmax register through a
multiplexer, a position counter with an Ip register, and a comparator (counter = n − Ls)
producing the codeword.]
The dictionary element Xi moves systolically from left to right with a delay of 1 clock cycle.
The match signal Ei of the processing element moves to the L-encoder. The output Li of the
encoder is the matching length resulting from the comparisons at step i−1. The first Li will be
obtained after one clock cycle, and the subsequent ones will be obtained every clock cycle.
Before the encoding process, the Yis are preloaded to be processed, and this takes Ls extra
cycles. During the encoding process, the time to preload new source symbols depends upon
how many source symbols were compressed in the previous compression step, Lmax.
The functional block of the PE is shown in Fig. 18. Only one equality comparator is needed
for comparing Yj and the incoming Xi. The comparator result Ei (match signal) propagates to
the L-encoder. The block diagram of the L-encoder is shown in Fig. 19. According to the Eis
(match signals), the L-encoder computes the match length Li corresponding to position i.
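Behaviorally, this reading of the L-encoder amounts to a leading-ones counter over the Ls match signals; a Python sketch of that interpretation (ours, not the authors' netlist):

def l_encoder(E):
    # E[j] = 1 if Y_j matched the dictionary symbol it was compared with;
    # the match length is the number of leading consecutive matches.
    L = 0
    for e in E:
        if not e:
            break
        L += 1
    return L

assert l_encoder([1, 1, 0, 1]) == 2       # the match breaks after 2 symbols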
As shown in Fig. 17, it is clear that the maximum matching length is not produced by the L-
encoder. So, a match results block (MRB) is needed, as shown in Fig. 20, to determine Lmax
among the serially produced Lis. Also, the PEs need not store their ids to record the position
of the Lis (Ip). Since p = [log2(n-Ls)] bits are required to represent Ip, only a p- bit counter is
required to provide the position i associated with each Li, since the time when Li is produced
corresponds to its position. MRB uses a comparator to compare the current input Li and the
present longest match length Lmax stored in the register. If the current input Li is larger than
Lmax, then Li is loaded into the register and the content of the position counter is also loaded
into another register which is used to store the present Ip. Another comparator is used to
determine whether the whole window has been searched. It compares the content of the
position counter with n − Ls, whose output is used as the codeword-ready signal. During the
searching process, Li might be equal to Ls when i < n − Ls, i.e., the content of the look-ahead
buffer can be fully matched to a subset of the dictionary, and hence searching the whole
window is not always necessary. An extra comparator is used to determine whether Lmax is
equal to Ls, in which case the string matching process is completed. Therefore, encoding a new
set of data can start immediately. This reduces the average compression time. The
number of clock cycles needed to produce a codeword is at most (n−Ls) + 1, so the
utilization rate of each PE is (n−Ls) / [(n−Ls) + 1], which is almost equal to one. This result is
consistent, since the PE is busy from the moment Li is determined until the time at which the
codeword is produced.
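The MRB behavior just described, including the early-completion check, can be modeled as follows (a Python sketch under our own naming; the hardware realizes it with the registers and comparators of Fig. 20):

def mrb(L_stream, n, Ls):
    # Track the running maximum of the serially produced Li, record its
    # position, and stop early when a full-length match (Li = Ls) occurs.
    Lmax, Ip = 0, 0
    for i, Li in enumerate(L_stream):
        if Li > Lmax:
            Lmax, Ip = Li, i              # load match length and position
        if Lmax == Ls:                    # early completion ("done" signal)
            break
        if i == n - Ls:                   # whole window searched
            break
    return Ip, Lmax

print(mrb([0, 2, 1, 3, 3, 0], n=9, Ls=3))  # -> (3, 3), stops early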
[Fig. 20 diagram: the proposed MRB — a comparator (a > b) updating the Lmax register
through a multiplexer; an extra comparator (Lmax = Ls) producing the early-completion signal
"done"; a p-bit position counter with an Ip register; and a comparator (counter = n − Ls)
producing the codeword-ready signal.]
5. FPGA Implementation
In this section, the proposed architecture for LZ is implemented on an FPGA. As shown in Fig.
22, the architecture consists of 3 major components: the systolic-array LZ component (SALZC),
block RAM, and a host controller. The length of the window (n) is assumed to be 1K, and the
length of the look-ahead buffer (Ls) is assumed to be 16. The SALZC contains 16 PEs and
implements the most cost-effective array architecture (Design-P). Full-custom layout is
straightforward since the array is very regular. Only a single cell (PE) was hand-laid out.
The other 15 PEs are copies of it. Since the array is systolic, routing is also simplified. The
block RAM that is used as the data buffer (dictionary) is not included in the SALZC.
Thereby, the dictionary length can be increased by directly replacing the block RAM with a
larger one. The dictionary length is a parameter to cover a broad range of applications, from
text compression to lossless image compression.
The implementations of the proposed design (Design-P) and the interleaved design (Design-i)
are carried out using a Xilinx Spartan II XC200 FPGA, for n = 1K, Ls = 16, w = 8. The
implementation results are shown in Table 3. The number of components and their
percentage relative to the available components on the chip are given.
The compression rate of a compressor is defined as the number of input bits which can be
compressed in one second. The compression rate (Rc) can be estimated as follows:
Rc = clk × [(Ls × w) / (n − Ls + 1)] (2)
where clk is the operating frequency. Note that only an estimated Rc can be obtained, since it
depends on the input data. It is not possible to predict exactly how many words will be
compressed (Ls at most) and how many clock cycles will be required ((n − Ls + 1) at most) for
every compression step. In the proposed implementation, with a window length (n) of 1K,
Ls = 16, w = 8, and clk = 105 MHz, Rc is about 13 Mbit per second.
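Plugging the implemented parameters into equation (2) reproduces the quoted figure (a quick check in Python, taking n = 1K as 1024):

clk, n, Ls, w = 105e6, 1024, 16, 8
Rc = clk * (Ls * w) / (n - Ls + 1)        # equation (2)
print(Rc / 1e6)                           # about 13.3 Mbit per second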
           Number of   Number of Slice   Number of      Number of   Maximum
           Slices      Flip Flops        4-input LUTs   BRAMs       Frequency
Design-P   310 (13%)   408 (8%)          419 (8%)       2 (14%)     105 MHz
Design-i   471 (20%)   511 (10%)         650 (13%)      2 (14%)     79 MHz
Table 3. The implementation results of Design-P and Design-i
In order to use a parallel scheme to increase the compression rate, the host controller can
be modified and an appropriate number of LZ compressor components can be connected
in parallel, as shown in Fig. 23. Note that the MRB now needs to determine Lmax among LI, LII
and LIII. For example, ten components could be implemented in one large chip to achieve a
compression rate of about 130 Mbit per second. Moreover, by modifying the host controller
and including, e.g., dictionaries, the proposed design can be used for other string-matching-based
LZ algorithms, such as LZ78 and LZW. The Design-P is flexible.
6. Conclusions
In this chapter, a parallel algorithm for LZ-based data compression is described, obtained by
transforming a data-dependent algorithm into a data-independent regular algorithm. To
further improve the latency, a control variable indicating early completion is used. The
proposed implementation is area and speed efficient. The compression rate is increased by
more than 40% and the design area is decreased by more than 30%. The design can be
integrated into real-time systems so that data can be compressed and decompressed
on-the-fly.
7. References
Abd El Ghany, M. A.; Salama, A. E. & Khalil, A. H. “Design and Implementation of FPGA-
based Systolic Array for LZ Data Compression” in Proc. IEEE International
Symposium on Circuits and Systems ISCAS07, May 2007
Arias, M.; Cumplido, R. & Feregrino C., (2004) "On the Design and Implementation of an
FPGA-based Lossless Data Compressor.", ReConFig’04, Colima, Mexico, Sep., 2004
Bodden, E.; Clasen, M.; Kneis, J. & Schluter, R., (2004) "Arithmetic coding revealed" pp. 8-57,
May 2004
Chang, J.; Jih, H. J & Liu, J. W., (1994) “A lossless data compression processor,” in Proc. 4th
VLSI Design/CAD Workshop, Aug. 1994, pp.134-137
Deorowicz, S., (2003) “Universal lossless data compression algorithms” Dissertation, Silesian
University, pp 92-119, 2003
Gallager, R. (1978)“Variations on a theme by Huffman,” IEEE Trans. Inform. Theory, vol. IT-
24, pp. 668-674, 1978
Golomb, S., (1966) “Run-length encoding,” IEEE Trans. Inform. Theory, Vol. IT-12, pp. 399-
401, July 1966
Howard, P. G. & Vetter, J. S., (1992) "Practical Implementations of Arithmetic Coding"
Brown University, Department of Computer Science, Technical Report No. (CS-91-45),
April 1992
Huffman, D. A. (1952) “A method for the construction of minimum-redundancy codes" Proc.
IRE, Vol. 40, pp. 1098-1101, September, 1952.
Hwang, S. –A & Wu, C. -W., (2001) “Unified VLSI Systolic Array Design for LZ Data
compression,” IEEE Trans. VLSI Systems, vol. 9, pp. 489- 499, August 2001
Jung, B. & Burleson, W. P., (1998) “Efficient VLSI for Lempel-Ziv compression in wireless
data communication networks,” IEEE Trans. VLSI Syst. Vol. 6, pp. 475-483, Sept.
1998
Lin, K. -J. & Wu, C.-W., (2000) "A low-power CAM design for LZ data compressor," IEEE
Transactions on Computers, Vol.49, No.10, pp. 1139-1145, Oct. 2000
Ranganathan, N. & Henriques, S., (1993) ”High-Speed VLSI Designs for Lempel-Ziv-Based
Data Compression,” IEEE Trans. Circuits and Systems. II: Analog and Digital Signal
Processing, vol. 40, pp.96-106, Feb. 1993
Park, H. & Prasanna, V.K. (1993) “Area efficient VLSI architectures for Huffman coding”,
IEEE Trans. Analog and Digital Signal Processing, Vol.40, pp.568-575,Sep.1993
Said, A., (2003), "Fast Arithmetic Coding," in Lossless Compression Handbook, 2003
Said, A., (2004) " Introduction to Arithmetic coding – Theory and Practice" Imaging Systems
Laboratory, HP Laboratories Palo Alto, April 2004
Salomon, D., (2004) "Data Compression" The complete reference, Computer Science
Department, California State University, Northridge, pp. 22-205, 2004
Sandoval, M. M. & Uribe, C. F. (2005) “A hardware Architecture for Elliptic Curve
Cryptography and Lossless Data Compression,” IEEE Computer Society, May 2005
Welch, T., (1984) "A technique for high-performance data compression," IEEE computer, Vol.
17, pp. 8-19, 1984
Ziv, J. & Lempel, A., (1977) “A universal algorithm for sequential data compression,” IEEE
Trans. Inform. Theory, Vol. IT-23, pp. 337-343, 1977
Ziv, J. & Lempel, A., (1978) “Compression of individual sequences via variable rate coding,”
IEEE Trans. Inform. Theory, Vol. IT-24, pp. 530-536, 1978
Design of Simple and High Speed Scheme to Protect Mass Storages
1. Introduction
The computer industry has entered a stage of unprecedented improvement in CPU
performance. However, the speed at which file systems manage huge amounts of information
is commonly considered the main factor that affects computer performance; for
example, the I/O bandwidth is limited by magnetic disks. The capacity and cost per megabyte
of magnetic disks have been continually improved, but the rotation speed and seek time
improve very slowly. Recently, many computers have become I/O bound in the
applications of video, audio, commercial databases, etc. If such an I/O crisis can be resolved,
computer system performance will be improved. In 1988, Patterson et al. proposed the
redundant array of independent disks (RAID) system, which allows the data to be separated
across several disks (Patterson et al., 1988). We can access the data in parallel, so that the
throughput of I/O systems is improved. On the other hand, more disks in a RAID
system imply a higher risk of losing data because of higher component failure rates. As a
result, safety and reliability have become the major issues in RAID systems.
When designing a highly available and reliable RAID system, the method of bit-wise parity
checking is mostly used to correct errors and to enhance the reliability of the RAID system.
However, the parity checking method is limited, so that only a single disk failure can be
tolerated. In 1995, Blaum et al. proposed a method called the even-odd code, which tolerates
up to two disk failures in the RAID system (Blaum et al., 1995). The even-odd code is the first
known scheme for tolerating single or double disk failures, providing an optimal solution
with regard to both storage and performance. However, the major problem concerning the
even-odd code is the variety of modes of operation when solving erasures or up to 2 disk
failures. In practice, it is not easy to integrate into a VLSI. On the other hand, the small-write
problem is difficult to solve with the even-odd code (Liao & Jing, 2002).
In 1997, Plank presented a tutorial on using the Reed-Solomon (RS) code to provide error
correction in the RAID system (Plank, 1997). In 2000, Jing et al. also proposed a simple
algorithm, called the RS-RAID system, to combine the RS codes with the RAID system (Jing
et al., 2000). In this chapter, we aim to improve the RS codec to design a fast error and erasure
correction for the RS-RAID system, and to solve the small-write problem in RS codes. In an RS
decoder, there are various algorithms to solve the error locator polynomial, which affect the
decoding speed and the hardware complexity.
zeros of the generator polynomial g(x) for the t-error-correcting RS code. The generator polynomial is the product of the associated minimal polynomials:

g(x) = Π_{j=1}^{2t} (x − α^j),  for GF(2^m).   (1)
evaluation for the error pattern e(x): if there is an error e(x), the syndrome will be nonzero, where X_i is the error locator for the ith error and Y_i is the magnitude of the ith error. If a random error e_i has been introduced into the received word as r_i = c_i + e_i, the syndromes from equation (4) can be represented as
S_1 = R(α) = e_i α^i   (6)

and

S_2 = R(α^2) = e_i α^{2i}.   (7)

From equations (6) and (7), the error e_i directly affects the syndromes S_1 and S_2. With a single error, the error magnitude e_i is substituted by Y_i and the error location α^i is substituted by X_i. Rearranging equations (6) and (7), we have

S_1 = Y_i X_i   (8)

and

S_2 = Y_i X_i^2.   (9)

Finally, the direct solutions for the error location X_i and the magnitude Y_i are obtained from equations (8) and (9) as

X_i = S_2 / S_1 = S_2 S_1^{-1}   (10)

and

Y_i = S_1^2 / S_2 = S_1^2 S_2^{-1}.   (11)
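To make equations (6)-(11) concrete, here is a minimal software sketch of single-error correction; it assumes GF(2^8) built on the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with α = 2 — a choice consistent with the constants α^{229} and α^{230} appearing later in equations (28) and (29) — and it is not the chapter's VLSI design.

# A minimal sketch of single-error correction via equations (6)-(11).
# Assumption: GF(2^8) on the primitive polynomial 0x11D with alpha = 2.

EXP = [0] * 512          # antilog table (doubled so products need no mod 255)
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D       # reduce modulo the primitive polynomial
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def poly_eval(r, a):
    # R(a) = sum_i r[i] * a^i over GF(2^8); addition is XOR
    y, p = 0, 1
    for coeff in r:
        y ^= gf_mul(coeff, p)
        p = gf_mul(p, a)
    return y

def correct_single_error(r):
    alpha = 2
    s1 = poly_eval(r, alpha)                  # S1 = R(alpha)       -- eq. (6)
    s2 = poly_eval(r, gf_mul(alpha, alpha))   # S2 = R(alpha^2)     -- eq. (7)
    if s1 == 0 and s2 == 0:
        return r                              # no error detected
    xi = gf_div(s2, s1)                       # X_i = S2 * S1^-1    -- eq. (10)
    yi = gf_div(gf_mul(s1, s1), s2)           # Y_i = S1^2 * S2^-1  -- eq. (11)
    r[LOG[xi]] ^= yi                          # X_i = alpha^i locates position i
    return r

# the all-zero word is a valid codeword; inject one error and recover it
word = [0] * 10
word[4] ^= 0x37
assert correct_single_error(word) == [0] * 10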
On the other hand, RAID-6 systems have two parities, and modifying the parities might require fetching a whole block of data to calculate the new parity; fetching that much data and encoding it again is inefficient. The proposed algorithm limits the number of accesses, so that only the changed data are read when calculating the new parities c_0' and c_1', which may differ from the original c_0 and c_1 produced by the encoder. Given the advances of VLSI, the proposed IP can be realized as a purely combinational circuit, providing high-speed operation with low delay and a simple interface to current RAID systems, as presented in the following sections.
corresponding parity symbols c_0' and c_1' to construct a new codeword from the original parity symbols c_0 and c_1. The fast algorithm is explained as follows.
First, the codeword C(x) can be expressed in terms of

C(α) = Σ_{i=0}^{n−1} c_i α^i = 0   (17)

and

C(α^2) = Σ_{i=0}^{n−1} c_i α^{2i} = 0.   (18)
We also know that c_0 and c_1 are the parity check symbols; thus equations (17) and (18) can be expressed in terms of

c_0 α^0 + c_1 α^1 + c_j α^j = Σ_{i=2, i≠j}^{n−1} c_i α^i   (19)

and

c_0 α^0 + c_1 α^2 + c_j α^{2j} = Σ_{i=2, i≠j}^{n−1} c_i α^{2i},   (20)
where j is the index (the location), c_j is the original coefficient, and c_j' is the new coefficient of the updated codeword.
Secondly, the new coefficients c_0' and c_1' can be expressed in terms of the equations c_0' = c_0 + Δ_0 and c_1' = c_1 + Δ_1. In a similar manner, c_j' = c_j + Δ_j, where Δ_j stands for the difference between the original and the new coefficient of the jth symbol in the codeword. Since Δ_j is known, we need to solve for Δ_0 and Δ_1; substituting Δ_0, Δ_1 and Δ_j into equations (19) and (20), we have
(c_0 + Δ_0) α^0 + (c_1 + Δ_1) α^1 + (c_j + Δ_j) α^j = Σ_{i=2, i≠j}^{n−1} c_i α^i   (21)

and

(c_0 + Δ_0) α^0 + (c_1 + Δ_1) α^2 + (c_j + Δ_j) α^{2j} = Σ_{i=2, i≠j}^{n−1} c_i α^{2i}.   (22)
To solve for Δ_0 and Δ_1, subtract equation (19) from equation (21), and equation (20) from equation (22) (over GF(2^m), subtraction coincides with addition); we obtain

Δ_0 α^0 + Δ_1 α^1 = Δ_j α^j   (23)

and

Δ_0 α^0 + Δ_1 α^2 = Δ_j α^{2j}.   (24)
From equations (23) and (24), it is found that we do not need the whole codeword to generate the new set of parity symbols. This is the key to calculating the new parity symbols online. To solve for Δ_0 and Δ_1, equations (23) and (24) can be solved simultaneously, and we have

Δ_1 = (c_j' − c_j) (α^j + α^{2j}) / (α + α^2).   (25)
Finally, the new parity check coefficient c_1' can be expressed in terms of

c_1' = (c_j' − c_j) (α^j + α^{2j}) / (α + α^2) + c_1.   (26)

Extending this representation to the new parity check coefficient c_0', we obtain

c_0' = (c_j' − c_j) α(α^j + α^{2j}) / (α + α^2) + (c_j' − c_j) α^j + c_0.   (27)
This shows that the decoder does not require a sequential stream of the whole codeword. If all the elements are over GF(2^8), equations (26) and (27) can be rearranged to obtain

c_1' = (c_j' − c_j)(α^j + α^{2j}) α^{229} + c_1   (28)

and

c_0' = (c_j' − c_j)(α^j + (α^j + α^{2j}) α^{230}) + c_0.   (29)

From equations (28) and (29), a combinational circuit can be constructed as a VLSI module to realize this function.
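Reusing the GF(2^8) helper tables from the sketch above, the small-write update of equations (28) and (29) can be exercised in software as follows; the codeword length and the updated position are illustrative assumptions.

# Small-write parity update per equations (28)-(29), reusing EXP, LOG,
# gf_mul and poly_eval from the sketch above. In GF(2^8) on 0x11D,
# alpha^229 = 1/(alpha + alpha^2) and alpha^230 = alpha/(alpha + alpha^2).

A229 = EXP[229]
A230 = EXP[230]

def update_parities(c0, c1, j, cj_old, cj_new):
    d = cj_old ^ cj_new                      # c_j' - c_j (XOR in GF(2^8))
    aj, a2j = EXP[j % 255], EXP[(2 * j) % 255]
    c1_new = gf_mul(gf_mul(d, aj ^ a2j), A229) ^ c1          # eq. (28)
    c0_new = gf_mul(d, aj ^ gf_mul(aj ^ a2j, A230)) ^ c0     # eq. (29)
    return c0_new, c1_new

# sanity check with the syndrome test of eqs. (6)-(7): update one data
# symbol of the all-zero codeword, recompute the parities, and verify
# that both syndromes of the resulting word vanish
word = [0] * 10
j, v = 5, 0x2A
word[j] = v
word[0], word[1] = update_parities(0, 0, j, 0, v)
assert poly_eval(word, 2) == 0 and poly_eval(word, gf_mul(2, 2)) == 0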
The architecture of RS-RAID is then partitioned into two levels, namely the system level and the disk level, as shown in Fig. 8.
In this design, all disks are treated as one large logical/unified storage. The host can access the data in RS-RAID through an IDE or SCSI interface. At the system level, there are n disks, and all data are encoded and decoded by the L1 (level one) RS-code codec through the PC interface. This design guarantees the reliability of data read from the large logical disk. The system cache memory, which temporarily stores the data encoded by the L1 RS-code codec, is used to buffer the currently used data. This cache buffer improves RS-RAID performance under frequent accesses to/from the system.
At the disk level, each disk has n stripes of space. When data are read from a disk, they are decoded by the L2 (level two) RS-code codec and then checked by the CRC-32 codec. If the number of errors exceeds the correcting capability of the L2 RS-code codec, the situation is detected by the CRC-32 codec. This guarantees that data are read from each individual disk under a reliable condition. The system offers high capacity, throughput, and reliability, because all encoding/decoding processes operate in real time.
where n is the total number of disks in the system and k is the number of data disks. When frequently rewritten data are sent to the system cache sequentially from the host, c_0' and c_1' become the new parity symbols and must be updated in real time, as shown in Fig. 9.
[Fig. 9. Real-time parity update datapath: encoder/decoder and syndrome generation (S1, S2; Xi, Yi) around the system cache, with write-back through the microprocessor to Disk0, Disk1, ..., carrying c0, c1 and Idisk(x).]
Before being written back to the disks, all data are temporarily stored in the system cache. During write-back, the data are first encoded into the outer code C_col^v(x) = {c_{n−1,v}, c_{n−2,v}, ..., c_{0,v}} for each column v of the stripe R_stripe^u(x) = {r_{u,n−1}, r_{u,n−2}, ..., r_{u,0}}, where 0 ≤ u < n and n is the total number of disks. Decoding R_stripe^u(x) follows the procedure of our previous research (Jing et al., 2001):
A. R_stripe^u(x) is first sent to the error corrector, which generates its syndromes S1 and S2; correction is applied immediately after completion of reading. The timing of this procedure is shown in Fig. 10. With such high-speed correction, this operation can be considered a real-time correction process.
At the disk level, as shown in Fig. 11, when blocks are written from the cache into the disks, the encoding procedure is as follows:
1. The data I_disk(x) = {I_{n−1}, I_{n−2}, ..., I_0} in the cache are written to the disks.
2. Each disk receives its own data I, which is then encoded by the CRC-32 codec.
3. The encoded data stream from the CRC-32 codec is encoded again by the L2 RS-code codec and stored on the disks.
Thus, the RS-PC encoding is finished.
When the system reads data from the disks into the cache, all data are checked by the CRC-32 codec and decoded by the L2 RS-code codec. The decoding procedure at the disk level is as follows:
1. The data on each disk are decoded and corrected by its own L2 RS-code codec.
2. The decoded data stream from each L2 RS-code codec is decoded again by its own CRC-32 codec.
Finally, if the number of errors is greater than the error-correcting capacity of the L2 RS-code codec, the CRC-32 codec detects the errors and reports to the system level, and the disk is marked as an erasure at the system level. This strategy enhances system reliability and increases data access speed, with less need to retry failed disk(s). The design thus provides double protection for the RAID system in real time.
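As a rough software illustration of this read path (not the chapter's hardware), the sketch below uses Python's zlib.crc32 to stand in for the CRC-32 codec and a placeholder rs_decode for the L2 RS-code codec.

# disk-level read check: RS-correct the payload, then use the CRC-32
# trailer to detect residual (uncorrectable) errors and report an erasure
import zlib

def read_disk_block(payload, stored_crc, rs_decode=lambda b: b):
    data = rs_decode(payload)              # L2 RS-code correction (placeholder)
    if zlib.crc32(data) != stored_crc:     # too many errors slipped through
        return None                        # mark this disk as an erasure
    return data

# example: an intact block passes, a corrupted one is flagged
block = b"stripe-data"
crc = zlib.crc32(block)
assert read_disk_block(block, crc) == block
assert read_disk_block(b"stripe-dat?", crc) is None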
6. Conclusion
This chapter has presented a coding design that implements an RS code in a redundant array of independent disks to correct a single random error and double erasures, and it points to new directions, such as the small-write module and higher correction capabilities, for the design of an RS-RAID system. The proposed RS-RAID system has the following advantages:
1. Expandable design: Like RAID-6, this design not only offers a solution for dual-disk failures, but it also adopts the PGZ algorithm to correct fewer than six or seven errors.
2. Integrated concept: The system presents a unified RS-PC concept that partitions the system into two levels of abstraction/structure. The modules at the disk level mainly deal with burst or random errors within disks, while the control at the system level performs the correction of multiple failures of the system. Moreover, each "disk" may even be a single disk surface, so that a fault-tolerant hard disk can be produced.
3. Real-time updating capability: In applications for banks, stock markets, hospitals, or military purposes, the system requires frequent transactions or updating of data. The small-write module can support the system cache under real-time requirements and handle the frequent update operations of a RAID system with very low overhead.
4. Suitability for co-design: The proposed algorithm is suitable for both hardware and software designs of RS-code modules using finite fields. Finite-field mathematics belongs to modern algebra, which has been widely applied to error correction codes and cryptography. This suggests that the hardware modules could be integrated into the math processors of future CPUs, and the reliability control could easily be integrated into microcontrollers and general-purpose processors.
5. More applications: Given the expandability and co-design suitability of the system, this concept may extend its applications to most memory systems, such as flash memory, DRAM, and so on.
7. Acknowledgments
This work was supported in part by the National Science Council, Taiwan, under grant NSC 91-2213-E-214-010.
8. References
Blaum, M.; Brady, J.; Bruck, J. & Menon, J. (1995). EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures. IEEE Transactions on Computers, Vol. 44, No. 2, pp. 192-202, ISSN: 0018-9340
Chien, R.T. (1964). Cyclic Decoding Procedure for the Bose-Chaudhuri-Hocquenghem Codes. IEEE Transactions on Information Theory, Vol. IT-10, No. 10, pp. 357-363, ISSN: 0018-9448
Forney, G.D. (1965). On Decoding BCH Codes. IEEE Transactions on Information Theory, Vol. IT-11, No. 4, pp. 549-557, ISSN: 0018-9448
Gorenstein, D. & Zierler, N. (1961). A Class of Error Correcting Codes in p^m Symbols. Journal of the Society for Industrial and Applied Mathematics, Vol. 9, No. 2, pp. 207-214, ISSN: 0368-4245
Jing, M.H.; Chen, Y.H. & Yuan, K.Y. (2000). The Comparison of Evenodd Code and RS Code
for RAID Applications, Asia Pacific Conference on Multimedia Technology and
Application, pp. 261-268, December, 2000, Kaohsiung, Taiwan
Jing, M.H.; Chen, Y.H. & Liao, J.E. (2001). A Fast Error and Erasure Correction Algorithm for
a Simple RS-RAID, Proceedings of the 2001 International Conferences on Info-tech and
Info-net, pp. 333-338, ISBN: 0-7803-7010-4, October, 2001, Beijing, China
Liao, J.E. & Jing, M.H. (2002). The Research and Implementation of High-Speed RAID Using Reed-Solomon Codes, Master's thesis, Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan
Patterson, D.A.; Gibson, G. & Katz, R.H. (1988). A Case for Redundant Arrays of
Inexpensive Disks (RAID), Proceedings of the 1988 ACM SIGMOD international
conference on Management of data, pp. 109-116, ISBN: 0-89791-268-3, June, 1988,
ACM, Chicago, Illinois
Peterson, W.W. (1960). Encoding and Error-Correction Procedures for the Bose-Chaudhuri Codes. IRE Transactions on Information Theory, Vol. IT-6, No. 4, pp. 459-470, ISSN: 0096-1000
Plank, J.S. (1997). A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like
Systems. Software - Practice and Experience, Vol. 27, No. 9, pp. 995-1012, ISSN: 0038-
0644
Wicker, S.B. & Bhargava, V.K. (1994). Reed-Solomon Codes and Their Applications, IEEE Press,
ISBN: 0-7803-5391-9, NJ
Wicker, S.B. (1995). Error Control Systems for Digital Communication and Storage, Prentice Hall,
ISBN: 0-13-308941-X, US
7
Administration and Monitoring Service for Storage Virtualization in Grid Environments
1. Introduction
A grid is a collection of computers and storage resources maintained to serve the needs of some community (Foster et al., 2002). It addresses collaboration, data sharing, cycle sharing, and other patterns of interaction that involve distributed resources.
Grids have actually become a de facto building block for high-performance storage systems. In the grid context, scale and reliability are key issues, as many independently failing and unreliable components need to be continuously accounted for and managed over time
(Porter & Katz, 2006). Manageability also becomes of paramount importance, since nowadays a grid commonly consists of hundreds or even thousands of storage and computing nodes (Foster et al., 2002). One of the key challenges faced by high-performance storage systems is scalable administration and monitoring of the system state.
A monitoring system captures a subset of the interactions among the myriad computational nodes, links, and storage devices. These interactions are interpreted in order to improve performance in the grid environment.
On a grid-wide basis, ViSaGe aims at providing grid users a transparent, reliable, and powerful storage system underpinned by a storage virtualization layer. ViSaGe is based on three services, namely: the administration and monitoring service, the storage virtualization service, and the distributed file system. The virtualization service aggregates storage resources distributed over the grid into virtual spaces. The distributed file system then attributes to these virtual spaces various qualities of service and data placement policies.
In this chapter, we present our scalable distributed system, Admon. Admon consists of an administration module that manages virtual storage resources according to their workloads, based on the information collected by a monitoring module. It is well known that the performance of a system depends deeply on the characteristics of its workload. Usually, a node's workload is associated with the service response time of a storage application: as the workload increases, the service response time becomes longer. However, the utilization percentages of system resources (CPU load, disk load, and network load) must also be taken into consideration. Therefore, Admon traces ViSaGe's applications and collects the utilization percentages of system resources (CPU, disk, network, etc.).
2. Related work
In a distributed system such as ViSaGe, storage management is principally founded on automatic decisions intended to improve performance. These decisions identify nodes that can be accessed efficiently. Therefore, analyzing a node's workload is essential to making adequate decisions, and this workload information is mainly supplied by monitoring systems. Several existing monitoring systems are available for monitoring grid resources (computing resources, storage resources, networks) and grid applications, for example the Network Weather Service (NWS) (Wolski et al., 1999), the Relational Grid Monitoring Architecture (R-GMA) (Cooke et al., 2004), and NetLogger (Gunter et al., 2000).
The Network Weather Service (NWS) is a popular distributed system for producing short-term performance forecasts based on historical performance measurements. NWS is specific to monitoring grid computing and network resources. It provides the CPU availability percentage rather than the CPU usage of a particular application, and it assumes that nodes are available for the entire application. This makes it too limited for achieving high throughput in a storage virtualization system like ViSaGe.
The Relational Grid Monitoring Architecture (R-GMA) is based on a relational model. It is an implementation of the GGF Grid Monitoring Architecture (GMA) and is used both for information about the grid and for application monitoring. The information collected by R-GMA was used only to discover which services are available at any time.
NetLogger is a monitoring and performance analysis tool for grids. It provides tracing services for distributed applications to detect bottlenecks. However, its performance analysis is carried out post mortem. In ViSaGe, we do not need a system for studying the application's state, but a system that analyzes the availability of grid resources.
The aforementioned monitoring systems are designed to produce and deliver measurements on grid resources and applications efficiently. Nevertheless, none of them presents all of the characteristics pertinent to a virtualization system like ViSaGe. ViSaGe needs a monitoring system providing a full view of each node's workload, in order to choose the better nodes according to its target (replicating data, distributing workload, etc.) and to follow nodes through workload variations. Therefore, our monitoring system, Admon, traces applications and collects information about storage resources, grid resources, and networks. It can be considered an intersection point of all the aforementioned monitoring systems. It uses its monitoring knowledge for choosing the appropriate node, guided by a prediction model, in order to place data efficiently during runtime execution.
3. ViSaGe Architecture
A grid consists of three levels: the node level represents storage and computing nodes, the site level represents a site's administrator of the grid, and the grid level represents the grid's administrator. The components of our storage virtualization system are distributed over this architecture (Fig. 1), which brings additional services for improving storage quality and access to the data.
At the node level, we have:
1. The virtualization service (Vrt).
2. The distributed file system service (Visagefs).
3. The administration and monitoring service (Admon).
At the upper levels (the site level and the grid level), we find the tools of the administration and monitoring service (Admon).
ViSaGe aggregates distributed storage resources and aims to provide a self-adaptive storage management system. The ViSaGe storage virtualization layer simplifies the management of shared storage resources: flexible and transparent access to the data, and high performance (data replication, link configuration, etc.).
A traditional application (file creation, reading/writing data) uses the virtual storage resources through our file system, and thus reaches the data by means of the virtualization layer. Furthermore, to handle the dynamic and inherent nature of the grid, and to control the virtualization (storage resource attribution; virtual storage resource creation, destruction, or modification), our storage virtualization system provides the administration and monitoring layer, Admon. Admon's design must therefore be strongly tied to the grid environment. Next, we present the hierarchical architecture and the fundamental concepts of this system.
The collection of traces and logs creates a complete view of what the system has been asked to achieve. The next step combines this system knowledge with the statistically gathered data in order to evaluate the stability of the virtual storage resources.
5. Experimental results
In a distributed data storage system, the major concern is data access performance. Many works on data access performance have concentrated on static values that are not related to node load. However, the whole system's performance is affected not only by the behavior of each single application, but also by the combined execution of several different applications. We have implemented Admon to study how the service response time of ViSaGe's applications is influenced by system resource utilization. Admon traces applications and collects the utilization percentages of system resources (CPU, disk, network, etc.). The contribution of our work is to uncover which nodes are dedicated to ViSaGe and which are not. Admon's performance metrics are the CPU, disk, and network loads. Admon sets a maximum value for each performance metric as a constraint for carrying out an experiment and making an adequate decision; this is how Admon identifies nodes dedicated to the grid.
Here we present a case study showing Admon's automatic instrumentation and the decision it leads to. This decision improves data storage management and performance in our data storage system. Before starting our experiment, we chose the CPU load (CPU < 70%) as the constraint; such a constraint acts like a user demand to improve the performance of the experiment.
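As a rough illustration of this thresholding idea (not ViSaGe code), the sketch below samples CPU utilization with the psutil library and reports whether a node still satisfies the CPU < 70% constraint; the sampling scheme and field names are assumptions.

# a monitoring-agent sketch: sample CPU utilization and report whether the
# node still satisfies the user-chosen constraint (CPU < 70%)
import psutil

CPU_LIMIT = 70.0    # the constraint chosen before the experiment

def node_state(samples=5):
    cpu = max(psutil.cpu_percent(interval=1) for _ in range(samples))
    return {"cpu": cpu, "usable": cpu < CPU_LIMIT}

# the administration module would exclude nodes reported as not "usable"
print(node_state())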
On the other hand, the second test, illustrated in Figure 4, shows from the workload of node2 (in terms of CPU load) that node2 is a dedicated node — that is, this node is used by other applications (local or on the grid).
Before launching the 2D phase, we launch on node2 applications simulating the use of this node by a virtual user. This user's application consumes the node's CPU and sends requests to the disk (reading or writing data). The goal of this operation is to increase the CPU load progressively until it stabilizes at a maximum value. The monitoring agent collects the node's CPU percentages and sends this significant change to the monitoring tools. The monitoring tools, from the information collected by each monitoring agent, test the node's state. To carry on with the 2D phase, the Admon administration therefore does not choose node2.
Since the other two nodes (node3 and node4) are not detected as dedicated nodes, Admon launches commands to mount Visagefs on them. To add these nodes to the same virtual space, Admon contacts the Vrt (the ViSaGe virtualization layer). The Vrt replicates data in order to launch the 2D phase.
We finalize our experiment with the 3D phase, which synthesizes the information on node1.
6. Conclusion
In this chapter, we have described a non-intrusive, scalable administration and monitoring system for high-performance grid environments. It is a fundamental building block for achieving and analyzing the performance of a storage virtualization system in a huge and heterogeneous grid storage environment. It offers a very flexible and simple model that collects node-state information and the requirements needed by the other services of our storage virtualization system, improving distributed data storage performance. It is based on a multi-tiered hierarchical architecture for the start-up of the monitoring and administration system.
Future work will focus on developing interactive jobs with the storage virtualization system, in order to study the relation between the different system resources (CPU, disk, networks) and the traced events, whereby Admon can change the distributed storage protocols (for instance, replication) during runtime execution. This work will be difficult, since many parameters must be taken into consideration. However, this mechanism will allow generalizing an I/O workload model in order to improve application throughput during workload variations in grid environments.
7. References
Foster, I. et al. (2002). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Intl. J. High-Performance Computing Applications, Vol. 15, No. 3, pp. 200-222.
Porter, G. & Katz, R.H. (2006). Effective Web Service Load Balancing Through Statistical
Monitoring. Communication of the ACM, March 2006.
Wolski, R.; Spring, N.T. & Hayes, J. (1999). The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Future Generation Computing Systems, Metacomputing Issue, 15(5-6): 757-768, Oct. 1999.
Gunter, D.; Tierney, B.; Crowley, B.; Holding, M. & Lee, J. (2000). NetLogger: A Toolkit for
Distributed System Performance Analysis, In Proceedings of the IEEE Mascots 2000,
Aug. 2000.
Cooke, A. et al. (2004). The Relational Grid Monitoring Architecture: Mediating information about
the grid, 2004
8
Power-Aware Memory Allocation for Embedded Data-Intensive Signal Processing Applications
Abstract
Many signal processing systems, particularly in the multimedia and telecommunication do-
mains, are synthesized to execute data-intensive applications: their cost related aspects –
namely power consumption and chip area – are heavily influenced, if not dominated, by the
data access and storage aspects. This chapter presents a power-aware memory allocation
methodology. Starting from the high-level behavioral specification of a given application, this
framework performs the assignment of the multidimensional signals to the memory layers
– the on-chip scratch-pad memory and the off-chip main memory – the goal being the reduc-
tion of the dynamic energy consumption in the memory subsystem. Based on the assignment
results, the framework subsequently performs the mapping of signals into the memory lay-
ers such that the overall amount of data storage be reduced. This software system yields a
complete allocation solution: the exact storage amount on each memory layer, the mapping
functions that determine the exact locations for any array element (scalar signal) in the spec-
ification, and, in addition, an estimation of the dynamic energy consumption in the memory
subsystem.
1. Introduction
Many multidimensional signal processing systems, particularly in the areas of multimedia
and telecommunications, are synthesized to execute data-intensive applications, the data
transfer and storage having a significant impact on both the system performance and the
major cost parameters – power and area.
In particular, the memory subsystem is, typically, a major contributor to the overall energy
budget of the entire system (8). The dynamic energy consumption is caused by memory ac-
cesses, whereas the static energy consumption is due to leakage currents. Savings of dynamic
energy can be potentially obtained by accessing frequently used data from smaller on-chip
memories rather than from the large off-chip main memory, the problem being how to op-
timally assign the data to the memory layers. Note that this problem is basically different
from caching for performance (15), (22), where the question is to find how to fill the cache
such that the needed data be loaded in advance from the main memory. As on-chip storage,
the scratch-pad memories (SPMs) – compiler-controlled static random-access memories, more
118 Data Storage
energy-efficient than the hardware-managed caches – are widely used in embedded systems,
where caches incur a significant penalty in aspects like area cost, energy consumption, hit
latency, and real-time guarantees. A detailed study (4) comparing the tradeoffs of caches as
compared to SPMs found in their experiments that the latter exhibit 34% smaller area and
40% lower power consumption than a cache of the same capacity. Even more surprisingly,
the runtime measured in cycles was 18% better with an SPM using a simple static knapsack-
based allocation algorithm. As a general conclusion, the authors of the study found absolutely
no advantage in using caches, even in high-end embedded systems in which performance is
important. 1 Different from caches, the SPM occupies a distinct part of the virtual address
space, with the rest of the address space occupied by the main memory. The consequence is
that there is no need to check for the availability of the data in the SPM. Hence, the SPM does
not possess a comparator and the miss/hit acknowledging circuitry (4). This contributes to
a significant energy (as well as area) reduction. Another consequence is that in cache mem-
ory systems, the mapping of data to the cache is done during the code execution, whereas in
SPM-based systems this can be done at compilation time, using a suitable algorithm – as this
chapter will show.
The energy-efficient assignment of signals to the on- and off-chip memories has been studied
since the late nineties. These previous works focused on partitioning the signals from the ap-
plication code into so-called copy candidates (since the on-chip memories were usually caches),
and on the optimal selection and assignment of these to different layers into the memory hier-
archy (32), (7), (18). For instance, Kandemir and Choudhary analyze and exploit the temporal
locality by inserting local copies (21). Their layer assignment builds a separate hierarchy per
loop nest and then combines them into a single hierarchy. However, the approach lacks a
global view on the lifetimes of array elements in applications having imperfect nested loops.
Brockmeyer et al. use the steering heuristic of assigning the arrays having the lowest access
number over size ratio to the lowest memory layer first, followed by incremental reassign-
ments (7). Hu et al. can use parts of arrays as copies, but they typically use cuts along the array
dimensions (18) (like rows and columns of matrices). Udayakumaran and Barua propose a
dynamic allocation model for SPM-based embedded systems (29), but the focus is on global and
stack data, rather than multidimensional signals. Issenin et al. perform a data reuse analysis
in a multi-layer memory organization (19), but the mapping of the signals into the hierarchi-
cal data storage is not considered. The energy-aware partitioning of an on-chip memory in
multiple banks has been studied by several research groups, as well. Techniques of an ex-
ploratory nature analyze possible partitions, matching them against the access patterns of the
application (25), (11). Other approaches exploit the properties of the dynamic energy cost and
the resulting structure of the partitioning space to come up with algorithms able to derive the
optimal partition for a given access pattern (6), (1).
Despite many advances in memory design techniques over the past two decades, existing
computer-aided design methodologies are still ineffective in many aspects. In several previ-
ous works, the reduction of the dynamic energy consumption in hierarchical memory sub-
systems is addressed using in part enumerative approaches, simulations, profiling, heuristic
explorations of the solution space, rather than a formal methodology. Also, several models of
mapping the multidimensional signals into the physical memory were proposed in the past
(see (12) for a good overview).
1 Caches have been a big success for desktops though, where the usual approach to adding SRAM is to
configure it as a cache.
2 Typical piece-wise linear operations can be transformed into affine specifications (17). In addition,
pointer accesses can be converted at compilation to array accesses with explicit index functions (16).
Moreover, specifications where not all loop structures are for loops and not all array indexes are affine
functions of the loop iterators can be transformed into affine specifications that capture all the memory
references amenable to optimization (20). Extensions to support a larger class of specifications are thus
possible, but they are orthogonal to the work presented in this chapter.
Fig. 1. Code derived from a motion detection (9) kernel (m = n = 16, M = N = 64) and the
exact map of memory read accesses (obtained by simulation) for the 2-D signal A.
center of the index space are accessed with high intensity (for instance, A[40][40] is accessed
2,178 times; A[16][40] is accessed 1,650 times), whereas the array elements at the periphery
are accessed with a significantly lower intensity (for instance, A[0][40] is accessed 33 times
and A[0][0] only once).
The drawbacks of such an approach are twofold. First, the simulated execution may be computationally expensive when the number of array elements is very significant, or when the application code contains deep loop nests. Second, even if the simulated execution were feasible, such a scalar-oriented technique would not be helpful, since the addressing hardware of the data memories would turn out to be very complex. An address generation unit (AGU) is typically implemented to compute arithmetic expressions in order to generate sequences of addresses (26); an arbitrary set of array elements is not a good input for the design of an efficient AGU.
Our proposed computation methodology for power-aware signal assignment to the memory
layers is described below, after defining a few basic concepts.
Each array reference M [ x1 (i1 , . . . , in )] · · · [ xm (i1 , . . . , in )] of an m-dimensional signal M, in the
scope of a nest of n loops having the iterators i1 , . . . , in , is characterized by an iterator
space and an index (or array) space. The iterator space signifies the set of all iterator vectors
i = (i1 , . . . , in ) ∈ Zn in the scope of the array reference, and it can be typically represented
by a so-called Z-polytope (a polyhedron bounded and closed, restricted to the set Zn ): { i ∈
Zn | A · i ≥ b }. The index space is the set of all index vectors x = ( x1 , . . . , xm ) ∈ Zm of
the array reference. When the indexes of an array reference are linear mappings with integer
coefficients of the loop iterators, the index space consists of one or several linearly bounded
lattices (27): { x = T · i + u ∈ Zm | A · i ≥ b , i ∈ Zn }. For instance, the array reference
A[i + 2 ∗ j + 3][ j + 2 ∗ k] from the loop nest
for (i=0; i<=2; i++)
for (j=0; j<=3; j++)
for (k=0; k<=4; k++)
if ( 6*i+4*j+3*k ≤ 12 ) ···
A[i+2*j+3][j+2*k] · · ·
has the iterator space

P = { (i, j, k)^T ∈ Z^3 | [1 0 0; 0 1 0; 0 0 1; −6 −4 −3] · (i, j, k)^T ≥ (0, 0, 0, −12)^T }.

(The inequalities i ≤ 2, j ≤ 3, and k ≤ 4 are redundant.)
Fig. 2. The mapping of the iterator space into the index space of the array reference A[i + 2 ∗
j + 3][ j + 2 ∗ k].
The A-elements of the array reference have the indices x, y:

(x, y)^T = T · i + u = [1 2 0; 0 1 2] · (i, j, k)^T + (3, 0)^T,  (i, j, k)^T ∈ P.

The points of the index
space lie inside the Z-polytope { x ≥ 3 , y ≥ 0 , 3x − 4y ≤ 15 , 5x + 6y ≤ 63 , x, y ∈ Z},
whose boundary is the image of the boundary of the iterator space P (see Fig. 2). However, it
can be shown that only those points (x,y) satisfying also the inequalities −6x + 8y ≥ 19k − 30,
x − 2y ≥ −4k + 3, and y ≥ 2k ≥ 0, for some positive integer k, belong to the index space;
these are the black points in the right quadrilateral from Fig. 2. In this example, each point in
the iterator space is mapped to a distinct point of the index space; this is not always the case,
though.
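As a quick cross-check of this example, the following sketch enumerates the iterator space by brute force and maps it through T · i + u; it merely re-derives the point sets and the injectivity remark above, not the chapter's analytical machinery.

# enumerate the iterator space of the example and its image (the index space)
iterator_space = [(i, j, k)
                  for i in range(3) for j in range(4) for k in range(5)
                  if 6*i + 4*j + 3*k <= 12]
index_space = {(i + 2*j + 3, j + 2*k) for (i, j, k) in iterator_space}
# here, each iterator vector maps to a distinct index point:
assert len(index_space) == len(iterator_space)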
Algorithm 1: Power-aware signal assignment to the SPM and off-chip memory layers
Step 1 Extract the array references from the given algorithmic specification and decompose the array
references for every indexed signal into disjoint lattices.
The motivation of the decomposition of the array references relies on the following intuitive
idea: the disjoint lattices belonging to many array references are actually those parts of arrays
more heavily accessed during the code execution. This decomposition can be analytically per-
formed, using intersections and differences of lattices – operations quite complex (3) involving
computations of Hermite Normal Forms and solving Diophantine linear systems (24), com-
puting the vertices of Z-polytopes (2) and their supporting polyhedral cones, counting the
integral points in Z-polyhedra (5; 10), and computing integer projections of polytopes (30).
Figure 3 shows the result of such a decomposition for the three array references of signal M.
The resulting lattices have the following expressions (in non-matrix format):
L1 = { x = 0, y = t | 5 ≥ t ≥ 0}
L2 = { x = t1 , y = t2 | 5 ≥ t2 ≥ 1 , 2t2 − 1 ≥ t1 ≥ 1}
L3 = { x = 2t, y = t | 5 ≥ t ≥ 1}
L4 = { x = 2t1 + 2, y = t2 | 4 ≥ t1 ≥ t2 ≥ 1}
L5 = { x = 2t1 + 1, y = t2 | 4 ≥ t1 ≥ t2 ≥ 1}
L6 = { x = t, y = 0 | 10 ≥ t ≥ 1}
Step 2 Compute the number of memory accesses for each disjoint lattice.
The total number of memory accesses to a given linearly bounded lattice of a signal is com-
puted as follows:
Step 2.1 Select an array reference of the signal and intersect the given lattice with it. If the
intersection is not empty, then the intersection is a linearly bounded lattice as well (27).
Step 2.2 Compute the number of points in the (non-empty) intersection: this is the number of
memory accesses to the given lattice (as part of the selected array reference).
Step 2.3 Repeat steps 2.1 and 2.2 for all the signal’s array references in the code, cumulating
the numbers of accesses.
For example, let us consider one of signal A’s lattices3 { 64 ≥ x , y ≥ 16} obtained in Step
1. Intersecting it with the array reference A[k][l ] (see the code in Fig. 1), we obtain the lattice
{i = t1 , j = t2 , k = t3 , l = t4 | 64 ≥ t1 , t2 , t3 , t4 ≥ 16, t1 + 16 ≥ t3 ≥ t1 − 16, t2 + 16 ≥
t4 ≥ t2 − 16}. The size of this set is 2,614,689 , which is the number of memory accesses to the
given lattice as part of the array reference A[k][l ]. Since the given lattice is also included in
the other array reference4 in the code – A[i ][ j], a similar computation yields 1,809,025 accesses
to the same lattice as part of A[i ][ j]. Hence, the total amount of memory accesses to the given
lattice is 2,614,689+1,809,025=4,423,714.
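The same counts can be reproduced by brute-force enumeration. In the sketch below, the loop bounds are assumptions reconstructed from the motion-detection kernel of Fig. 1 (M = N = 64, m = n = 16) so as to match the quoted figures; the chapter itself obtains these sizes analytically on the lattices.

# count the accesses to the lattice {64 >= x, y >= 16} of A by enumeration;
# the two index dimensions are independent, so each 4-D count is the square
# of a 1-D count
via_kl = sum(1 for t1 in range(16, 65)
               for t3 in range(t1 - 16, t1 + 17)) ** 2
via_ij = sum(1 for t1 in range(16, 65)
               for t3 in range(max(16, t1 - 16), min(64, t1 + 16) + 1)) ** 2
print(via_kl, via_ij, via_kl + via_ij)   # 2614689 1809025 4423714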
Figure 4 displays a computed map of memory accesses for the signal A, where A’s index space
is in the horizontal plane xOy and the numbers of memory accesses are on the vertical axis Oz.
This computed map is an approximation of the exact map in Fig. 1 since the accesses within
each lattice are considered uniform, equal to the average values obtained above. The advan-
tage of this map construction is that the (usually time-expensive) simulation is not needed any
more, being replaced by algebraic computations. Note that a finer granularity in the decom-
position of the index space of a signal in disjoint lattices entails a computed map of accesses
closer to the exact map.
Step 3 Select the lattices having the highest access numbers, whose total size does not exceed the
maximum SPM size (assumed to be a design constraint), and assign them to the SPM layer. The other
lattices will be assigned to the main memory.
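A minimal sketch of this greedy selection is given below; the lattice sizes and access counts are made-up inputs.

# Step 3 as a greedy pass: assign the most-accessed lattices to the SPM
# until the SPM size budget is exhausted; the rest go to main memory
lattices = [("L1", 6, 1200), ("L2", 25, 9800), ("L3", 5, 700)]  # (name, size, accesses)
SPM_SIZE = 30

def assign_to_spm(lattices, budget):
    spm, main = [], []
    for name, size, accesses in sorted(lattices, key=lambda l: l[2], reverse=True):
        if size <= budget:
            spm.append(name)
            budget -= size
        else:
            main.append(name)
    return spm, main

print(assign_to_spm(lattices, SPM_SIZE))   # (['L2', 'L3'], ['L1'])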
3 When the lattice has T=I – the identity matrix – and u=0, the lattice is actually a Z-polytope, like in this
example.
4 Note that in our case, due to Step 1, any disjoint lattice is either included in the array reference or disjoint
from it.
Fig. 4. Computed 3D map of memory read accesses for the signal A from the illustrative code
in Figure 1.
Storing all the signals on-chip is obviously the most desirable scenario from the point of view of dynamic energy consumption, but it is typically impossible. We assume here that the SPM size is constrained to values smaller than the overall storage requirement. In our tests, we computed the ratio between the dynamic energy reduction and the SPM size after mapping; the value of the SPM size maximizing this ratio was selected, the idea being to obtain the maximum energy benefit for the smallest SPM size.
5 For instance, a 2-D array can be typically linearized concatenating the rows or concatenating the
columns. In addition, the elements in a given dimension can be mapped in the increasing or decreasing
order of the respective index.
Fig. 5. (a-b) Illustrative examples having a similar code structure. The mapping model by
array linearization yields a better allocation solution for the former example, whereas the
bounding window model behaves better for the latter one.
operands in the next iterations), while the circles represent A-elements already ‘dead’ (i.e., not needed as operands any more). The light grey points to the right of the dashed line are A-elements still unborn (to be produced in the next iterations).
If we consider the array linearization by column concatenation in the increasing order of the
columns ((A[index1 ][index2 ], index1 =0,18), index2 =0,9), two elements simultaneously alive,
placed the farthest apart from each other, are A[9][0] and A[9][9]. The distance between them
is 9×19=171. Now, if we consider the array linearization by row concatenation in the increas-
ing order of the rows ((A[index1 ][index2 ], index2 =0,9), index1 =0,18), the maximum distance
between live elements is 99 (e.g., between A[4][5] and A[14][4]). For all the canonical lin-
earizations, the maximum distances have the values {99, 109, 171, 181}. The best canonical
linearization for the array A is the concatenation row by row, increasingly. A memory win-
dow WA of 99+1=100 successive locations (relative to a certain base address) is sufficient to
store the array without mapping conflicts: it is sufficient that any access to A[index1 ][index2 ]
be redirected to WA [(10 ∗ index1 + index2 ) mod 100].
In order to avoid the inconvenience of analyzing different linearization schemes, another
possibility is to compute a maximal bounding window WA = (w1 , . . . , wm ) large enough
to encompass at any time the simultaneously alive A-elements. An access to the element
A[index1 ] . . . [indexm ] can then be redirected without any conflict to the window location
WA [index1 mod w1 ] . . . [indexm mod wm ]; in its turn, the window is mapped, relative to a base
address, into the physical memory by a typical canonical linearization, like row or column
concatenation for 2-D arrays. Each window element wk is computed as the maximum differ-
ence in absolute value between the k-th indexes of any two A-elements (Ai ,A j ) simultaneously
alive, plus 1. More formally, wk = max { | xk ( Ai ) − xk ( A j )| } + 1, for k = 1, . . . , m. This
ensures that any two array elements simultaneously alive are mapped to distinct memory lo-
cations. The amount of data memory required for storing (after mapping) the array A is the
volume of the bounding window W_A, that is, |W_A| = Π_{k=1}^{m} w_k.
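For concreteness, the two address-redirection schemes can be sketched as follows, using the numbers of the example in Fig. 5(a) (a 19×10 array, a best linearization window of 100 locations, and a bounding window W_A = (11, 10)); the function names are illustrative only.

# the two mapping models as address redirections for a 2-D signal A
def linearized_addr(i1, i2, base=0, window=100, ncols=10):
    # row concatenation: redirect A[i1][i2] to W_A[(ncols*i1 + i2) mod window]
    return base + (ncols * i1 + i2) % window

def bounding_window_addr(i1, i2, w=(11, 10), base=0):
    # redirect A[i1][i2] to W_A[i1 mod w1][i2 mod w2], the window itself
    # being linearized row by row
    return base + (i1 % w[0]) * w[1] + (i2 % w[1])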
In the illustrative example shown in Fig. 5(a), the bounding window of the signal A is
WA = (11 , 10). It follows that the storage allocation for signal A is 100 locations if the
linearization model is used, and w1 × w2 =110 locations when the bounding window model is
applied. However, in the example shown in Fig. 5(b), where the code has a similar structure,
the bounding window model yields a better allocation result – 30 storage locations, since the
2-D window of A is WA = (5 , 6), whereas the linearization model yields 32 locations (the
best canonical linearization being the row concatenation in the increasing order of rows).
Our software system incorporates both mapping models, their implementation being based
on the same polyhedral framework operating with lattices, used also in Section 2. This is ad-
vantageous both from the point of view of computational efficiency and relative to the amount
of allocated data storage – since the mapping window for each signal is the smallest one of
the two models. Moreover, this methodology can be applied independently to the memory
layers, providing a complete storage allocation/assignment solution for distributed memory
organizations.
Before explaining the global flow of the algorithm, let us examine the simple case of a code
with only one array reference in it: take, for instance, the two nested loops from Fig. 5(b),
but without the second conditional statement that consumes the A-elements. In the bounding
window model, WA can be determined by computing the integer projections on the two axes
of the lattice of A[i ][ j], represented graphically by all the points inside the quadrilateral from
Fig. 5(b). It can be directly observed that the integer projections of this polygon have the
sizes: w1 = 11 and w2 = 7. In the linearization model, denoting x and y the two indexes,
the distance between two A-elements A1 ( x1 , y1 ) and A2 ( x2 , y2 ), assuming row concatenation
in the increasing order of the rows, is: dist( A1 , A2 ) = ( x2 − x1 )∆y + (y2 − y1 ), where ∆y
is the range of the second index (here, equal to 7) in the array space.6 Then, the A-elements
at a maximum distance have the minimum and, respectively, the maximum index vectors
relative to the lexicographic order. These array A-elements are represented by the points M =
A[2][7] and N = A[12][7] in Fig. 5(b), and dist(M, N) = (12 − 2)×7 + (7 − 7) = 70. Similarly, in the
linearization by column concatenation, the array elements at the maximum distance from each
other are still the elements with (lexicographically) minimum and maximum index vectors,
provided an interchange of the indexes is applied first. These are the points M = A[9][4] and
N = A[4][10] in Fig. 5(b). More generally, the maximum distance between the points of a live
lattice in a canonical linearization is the distance between the (lexicographically) minimum
and maximum index vectors, providing an index permutation is applied first. The distance
between the array elements A_i(x_1^i, x_2^i, ..., x_m^i) and A_j(x_1^j, x_2^j, ..., x_m^j) is:

dist(A_i, A_j) = (x_1^j − x_1^i) Δx_2 ··· Δx_m + (x_2^j − x_2^i) Δx_3 ··· Δx_m + ··· + (x_{m−1}^j − x_{m−1}^i) Δx_m + (x_m^j − x_m^i),

where the index vector of A_j is lexicographically larger than that of A_i (Δx_i is the range of x_i).
Algorithm 2: For each memory layer (SPM and main memory) compute the mapping windows for
every indexed signal having lattices assigned to that layer.
Step 1 Compute underestimations of the window sizes on the current memory layer for each indexed
signal, taking into account only the live signals at the boundaries between the loop nests.
Let A be an m-dimensional signal in the algorithmic specification, and let P A be the set of dis-
joint lattices partitioning the index space of A. A high-level pseudo-code of the computation
6 To ensure that the distance is a nonnegative number, we shall assume that [x_2 y_2]^T ≽ [x_1 y_1]^T relative to the lexicographic order. The vector y = [y_1, ..., y_m]^T is lexicographically larger than x = [x_1, ..., x_m]^T (written y ≻ x) if (y_1 > x_1), or (y_1 = x_1 and y_2 > x_2), ..., or (y_1 = x_1, ..., y_{m−1} = x_{m−1}, and y_m > x_m).
of A’s preliminary windows is given below. Preliminary window sizes for each canonical linearization according to De Greef’s model (13) are computed first, followed by the computation of the window size underestimate according to Tronçon’s model (28) in the same framework operating with lattices. The meanings of the variables are explained in the comments.
for ( each canonical linearization C ) {
  for ( each disjoint lattice L ∈ P_A )   // compute the (lexicographically) minimum and maximum
    compute x_min(L) and x_max(L) ;       // ... index vectors of L relative to C
  for ( each boundary n between the loop nests n and n+1 ) {   // the start of the code is boundary 0
    let P_A(n) be the collection of disjoint lattices of A which are alive at the boundary n ;
        // these are disjoint lattices produced before the boundary and consumed after it
    let X_n^min = min_{L ∈ P_A(n)} { x_min(L) } and X_n^max = max_{L ∈ P_A(n)} { x_max(L) } ;
    |W_C(n)| = dist( X_n^min , X_n^max ) + 1 ;   // the distance is computed in the canonical linearization C
  }
  |W_C| = max_n { |W_C(n)| } ;   // the window size according to (13) for the canonical linearization C
}                                // (possibly, an underestimate)
for ( each disjoint lattice L ∈ P_A )
  for ( each dimension k of signal A )
    compute x_k^min(L) and x_k^max(L) ;   // the extremes of the integer projection of L on the k-th axis
for ( each boundary n between the loop nests n and n+1 ) {   // the start of the code is boundary 0
  let P_A(n) be the collection of disjoint lattices of A which are alive at the boundary n ;
  for ( each dimension k of signal A ) {
    let X_k^min = min_{L ∈ P_A(n)} { x_k^min(L) } and X_k^max = max_{L ∈ P_A(n)} { x_k^max(L) } ;
    w_k(n) = X_k^max − X_k^min + 1 ;      // the k-th side of A's bounding window at boundary n
  }
}
for ( each dimension k of signal A ) w_k = max_n { w_k(n) } ;   // the k-th side of A's window over all boundaries
|W| = Π_{k=1}^{m} w_k ;   // the window size according to (28) (possibly, an underestimate)
Step 1 finds the exact values of the window sizes for both mapping models only when every
loop nest either produces or consumes (but not both!) the signal’s elements. Otherwise, when
in a certain loop nest elements of the signal are both produced and consumed (see the illus-
trative example from Fig. 5(a)), then the window sizes obtained at the end of Step 1 may be
only underestimates since an increase of the storage requirement can happen inside the loop
nest. Then, an additional step is required to find the exact values of the window sizes in both
mapping models.
Step 2 Update the mapping windows for each indexed signal in every loop nest producing and con-
suming elements of the signal.
The guiding idea is that local or global maxima of the bounding window size |W | are reached
immediately before the consumption of an A-element, which may entail a shrinkage of some
side of the bounding window encompassing the live elements. Similarly, the local or global
maxima of |WC | are reached immediately before the consumption of an A-element, which
may entail a decrease of the maximum distance between live elements. Consequently, for
each A-element consumed in a loop nest which also produces A-elements, we construct the
disjoint lattices partially produced and those partially consumed until the iteration when the
A-element is consumed. Afterwards, we do a similar computation as in Step 1 which may
result in increased values for |WC | and/or |W |.
Finally, the amount of data memory allocated for signal A on the current memory layer is
|WA | = min { |W | , minC { |WC | } }, that is, the smallest data storage provided by the bound-
ing window and the linearization mapping models. In principle, the overall amount of data
memory after mapping is ∑ A |WA | – the sum of the mapping window sizes of all the sig-
nals having lattices assigned to the current memory layer. In addition, a post-processing step attempts to further enhance the allocation solution: our polyhedral framework allows us to efficiently check whether two multidimensional signals have disjoint lifetimes, in which case the signals can share the largest of the two windows. More generally, an incompatibility graph (14) is used to optimize the memory sharing among all the signals at the level of the whole code.
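A toy sketch of this post-processing step follows; the window sizes and the incompatibility pair are made-up inputs, and the greedy grouping is only one simple way to exploit such a graph, not the chapter's exact procedure.

# signals with no edge in the incompatibility graph have disjoint lifetimes
# and may share one window; each group of compatible signals pays only for
# the largest window among its members
windows = {"A": 100, "B": 30, "C": 110}
incompatible = {("A", "C")}              # A and C are simultaneously alive

def shared_storage(windows, incompatible):
    groups = []                          # each group shares one window
    for s in sorted(windows, key=windows.get, reverse=True):
        for g in groups:
            if all((s, t) not in incompatible and (t, s) not in incompatible
                   for t in g):
                g.append(s)
                break
        else:
            groups.append([s])
    return sum(max(windows[t] for t in g) for g in groups)

print(shared_storage(windows, incompatible))   # 210 instead of 240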
4. Experimental results
A hierarchical memory allocation tool has been implemented in C++, incorporating the al-
gorithms described in this chapter. For the time being, the tool supports only a two-level
memory hierarchy, where an SPM is used between the main memory and the processor core.
The dynamic energy is computed based on the number of accesses to each memory layer. In
computing the dynamic energy consumptions for the SPM and the main (off-chip) memory,
the CACTI v5.3 power model (31) was used.
Table 1 summarizes the results of our experiments, carried out on a PC with an Intel Core 2
Duo 1.8 GHz processor and 512 MB RAM. The benchmarks used are: (1) a motion detection
algorithm used in the transmission of real-time video signals on data networks; (2) the kernel
of a motion estimation algorithm for moving objects (MPEG-4); (3) Durbin’s algorithm for
solving Toeplitz systems with N unknowns; (4) a singular value decomposition (SVD) up-
dating algorithm (23) used in spatial division multiplex access (SDMA) modulation in mobile
communication receivers, in beamforming, and Kalman filtering; (5) the kernel of a voice
coding application – essential component of a mobile radio terminal.
The table displays the total number of memory accesses, the data memory size (in storage
locations/bytes), and the dynamic energy consumption assuming only one (off-chip) memory
layer; in addition, the SPM size and the savings of dynamic energy applying, respectively, a
previous model steered by the total number of accesses for whole arrays (7), another previous
model steered by the most accessed array rows/columns (18), and the current model, versus
the single-layer memory scenario; the CPU times. The energy consumptions for the motion
estimation benchmark were, respectively, 1894, 1832, and 1522 µJ; the saved energies relative
to the energy in column 4 are displayed as percentages in columns 6-8. Our experiments
show that the savings of dynamic energy consumption are from 40% to over 70% relative
to the energy used in the case of a flat memory design. Although previous models produce
energy savings as well, our model led to 20%-33% better savings than them.
Different from the previous works on power-aware assignment to the memory layers, our
framework provides also the mapping functions that determine the exact locations for any
Application               #Memory    Mem.    Dyn. energy    SPM     Dyn. energy  Dyn. energy  Dyn. energy  CPU
(parameters)              accesses   size    1-layer [µJ]   size    saved (7)    saved (18)   saved        [sec]
Motion detection          136,242    2,740   486            841     30.2%        44.5%        49.2%        4
(M=N=32, m=n=4)
Motion estimation         864,900    3,624   3,088          1,416   38.7%        40.7%        50.7%        23
(M=32, N=16)
Durbin algorithm          1,004,993  1,249   3,588          764     55.2%        58.5%        73.2%        28
(N=500)
SVD updating              6,227,124  34,950  22,231         12,672  35.9%        38.4%        46.0%        37
(n=100)
Vocoder                   200,000    12,690  714            3,879   30.8%        32.5%        39.5%        8
Table 1. Experimental results.
array element in the specification. This provides the necessary information for the automated
design of the address generation unit, which is one of our future development directions.
Different from the previous works on signal-to-memory mapping, our framework offers a
hierarchical strategy and, also, two metrics of quality for the memory allocation solutions:
(a) the sum of the minimum array windows (that is, the optimum memory sharing between
elements of same arrays), and (b) the minimum storage requirement for the execution of the
application code (that is, the optimum memory sharing between all the scalar signals or array
elements in the code) (3).
5. Conclusions
This chapter has presented an integrated computer-aided design methodology for power-
aware memory allocation, targeting embedded data-intensive signal processing applications.
The memory management tasks – the signal assignment to the memory layers and their map-
ping to the physical memories – are efficiently addressed within a common polyhedral frame-
work.
6. References
[1] F. Angiolini, L. Benini, and A. Caprara, “An efficient profile-based algorithm for scratch-
pad memory partitioning,” IEEE Trans. Computer-Aided Design, vol. 24, no. 11, pp. 1660-
1676, Nov. 2005.
[2] D. Avis, “lrs: A revised implementation of the reverse search vertex enumeration al-
gorithm,” in Polytopes – Combinatorics and Computation, G. Kalai and G. Ziegler (eds.),
Birkhauser-Verlag, 2000, pp. 177-198.
[3] F. Balasa, H. Zhu, and I.I. Luican, “Computation of storage requirements for multi-
dimensional signal processing applications,” IEEE Trans. VLSI Systems, vol. 15, no. 4,
pp. 447-460, April 2007.
[4] R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel, “Scratchpad mem-
ory: A design alternative for cache on-chip memory in embedded systems,” in Proc. 10th
Int. Workshop on Hardware/Software Codesign, Estes Park CO, May 2002.
[5] A.I. Barvinok, “A polynomial time algorithm for counting integral points in polyhedra
when the dimension is fixed,” Mathematics of Operations Research, vol. 19, no. 4, pp. 769-
779, Nov. 1994.
[6] L. Benini, L. Macchiarulo, A. Macii, E. Macii, and M. Poncino, “Layout-driven memory
synthesis for embedded systems-on-chip,” IEEE Trans. VLSI Systems, vol. 10, no. 2, pp.
96-105, April 2002.
[7] E. Brockmeyer, M. Miranda, H. Corporaal, and F. Catthoor, “Layer assignment tech-
niques for low energy in multi-layered memory organisations,” in Proc. ACM/IEEE De-
sign, Automation & Test in Europe, Munich, Germany, Mar. 2003, pp. 1070-1075.
[8] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle,
Custom Memory Management Methodology: Exploration of Memory Organization for Embed-
ded Multimedia System Design, Boston: Kluwer Academic Publishers, 1998.
[9] E. Chan and S. Panchanathan, “Motion estimation architecture for video compression,”
IEEE Trans. on Consumer Electronics, vol. 39, pp. 292-297, Aug. 1993.
[10] Ph. Clauss and V. Loechner, “Parametric analysis of polyhedral iteration spaces,” J. VLSI
Signal Processing, vol. 19, no. 2, pp. 179-194, 1998.
[11] S. Coumeri and D.E. Thomas, “Memory modeling for system synthesis,” IEEE Trans.
VLSI Systems, vol. 8, no. 3, pp. 327-334, June 2000.
[12] A. Darte, R. Schreiber, and G. Villard, “Lattice-based memory allocation,” IEEE Trans.
Computers, vol. 54, pp. 1242-1257, Oct. 2005.
[13] E. De Greef, F. Catthoor, and H. De Man, “Memory size reduction through storage order
optimization for embedded parallel multimedia applications,” Parallel Computing, special
issue on “Parallel Processing and Multimedia,” Elsevier, vol. 23, no. 12, pp. 1811-1837,
Dec. 1997.
[14] G. De Micheli, Synthesis and Optimization of Digital Circuits, McGraw-Hill, 1994.
[15] J.Z. Fang and M. Lu, “An iteration partition approach for cache or local memory thrash-
ing on parallel processing,” IEEE Trans. Computers, vol. 42, no. 5, pp. 529-546, 1993.
[16] B. Franke and M. O’Boyle, “Compiler transformation of pointers to explicit array accesses
in DSP applications,” in Proc. Int. Conf. Compiler Construction, 2001.
[17] C. Ghez, M. Miranda, A. Vandecapelle, F. Catthoor, D. Verkest, “Systematic high-level
address code transformations for piece-wise linear indexing: illustration on a medical
imaging algorithm,” in Proc. IEEE Workshop on Signal Processing Systems, pp. 623-632,
Lafayette LA, Oct. 2000.
[18] Q. Hu, A. Vandecapelle, M. Palkovic, P.G. Kjeldsberg, E. Brockmeyer, and F. Catthoor,
“Hierarchical memory size estimation for loop fusion and loop shifting in data-
dominated applications,” in Proc. Asia-S. Pacific Design Automation Conf., Yokohama,
Japan, Jan. 2006, pp. 606-611.
[19] I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt, “Data reuse analysis technique for
software-controlled memory hierarchies,” in Proc. Design, Automation & Test in Europe,
2004.
[20] I. Issenin and N. Dutt, “FORAY-GEN: Automatic generation of affine functions for mem-
ory optimization,” Proc. Design, Automation & Test in Europe, 2005.
[21] M. Kandemir and A. Choudhary, “Compiler-directed scratch-pad memory hierarchy de-
sign and management,” in Proc. 39th ACM/IEEE Design Automation Conf., Las Vegas NV,
June 2002, pp. 690-695.
[22] N. Manjiakian and T. Abdelrahman, “Reduction of cache conflicts in loop nests,” Tech.
Report CSRI-318, Univ. Toronto, Canada, 1995.
[23] M. Moonen, P. V. Dooren, and J. Vandewalle, “An SVD updating algorithm for subspace
tracking,” SIAM J. Matrix Anal. Appl., vol. 13, no. 4, pp. 1015-1038, 1992.
[24] A. Schrijver, Theory of Linear and Integer Programming, New York: John Wiley, 1986.
[25] W. Shiue and C. Chakrabarti, “Memory exploration for low-power embedded systems,”
in Proc. 35th ACM/IEEE Design Automation Conf., June 1998, pp. 140-145.
[26] G. Talavera, M. Jayapala, J. Carrabina, and F. Catthoor, “Address generation optimiza-
tion for embedded high-performance processors: A survey,” J. Signal Processing Systems,
Springer, vol. 53, no. 3, pp. 271-284, Dec. 2008.
[27] L. Thiele, “Compiler techniques for massive parallel architectures,” in State-of-the-art in
Computer Science, P. Dewilde (ed.), Kluwer Acad. Publ., 1992.
[28] R. Tronçon, M. Bruynooghe, G. Janssens, and F. Catthoor, “Storage size reduction by in-
place mapping of arrays,” in Verification, Model Checking and Abstract Interpretation, A.
Coresi (ed.), 2002, pp. 167-181.
[29] S. Udayakumaran and R. Barua, “Compiler-decided dynamic memory allocation for
scratch-pad based embedded systems,” in Proc. Int. Conf. on Compilers, Architecture, and
Synthesis for Embedded Systems, pp. 276-286, New York NY, Oct. 2003.
9
High Throughput Architecture for High Performance NoC
1. Introduction
As the number and functionality of intellectual property blocks (IPs) in Systems on Chip
(SoCs) increase, the complexity of the interconnection architectures of these SoCs has also
increased. Various studies of high performance SoCs have been published; however, their
system scalability and bandwidth are limited. The Network on Chip (NoC) is emerging as
the best replacement for the existing interconnection architectures. A NoC is composed of a
network of interconnects and a number of temporary storage elements called switches. The
storage elements of the different NoC architectures have different numbers of ports. The
main components of a port are its virtual channels. A virtual channel consists of several
buffers controlled by a multiplexer and an arbiter which grants access to only one buffer at
a time according to the request priority. When the number of buffers is increased, the
throughput increases. High throughput and low latency are the desirable characteristics of a
multiprocessing system. More research is needed to enhance the performance of the NoC
components (the network of interconnects and the storage elements). Many NoC architectures
have been proposed in the past, e.g., SPIN (Guerrier & Greiner, 2000), CLICHÉ (Kumar et
al., 2002), Folded Torus (Dally & Towles, 2001), Octagon (Karim et al., 2002) and the Butterfly
fat-tree (BFT) (Pande et al., 2003a). Among those, the butterfly fat tree has found
extensive use in different parallel machines and has been shown to be hardware-efficient (Grecu et al.,
2004a). The main advantage of the butterfly fat tree is that the number of storage elements in
the network converges to a constant irrespective of the number of levels in the tree network.
In the SPIN architecture, redundant paths contained within the fat tree structure are utilized
to reduce contention in the network. CLICHÉ (Chip-Level Integration of Communicating
Heterogeneous Elements) is the simplest from a layout perspective, and the local interconnections
between resources and storage elements are independent of the size of the network. In the
Octagon architecture, the communication between any two nodes takes at most two hops
within the basic Octagon unit.
After the NoC design paradigm was proposed (Dally & Towles, 2001); (Kumar et al.,
2002); (Guerrier & Greiner, 2000); (Karim et al., 2002); (Pande et al., 2003); (Benini &
Micheli, 2002); (Grecu et al., 2004a), much research on architectural and conceptual aspects
of NoC was reported, such as topology selection (Murali & Micheli, 2004), quality of
service (QoS) (Bolotin et al., 2004), design automation (Bertozzi et al., 2005); (Liang et al.,
2004); (Pande et al., 2005a), performance evaluation (Pande et al., 2005b); (Salminen et al.,
2007); (Grecu et al., 2007a) and test and verification (Grecu et al., 2007b); (Kim et al., 2004);
(Murali et al., 2005). These studies took a top-down approach (a high-level analysis
of NoC) and did not touch on circuit-level issues. Only a little research has been reported
on circuit-level design issues in the implementation of NoC (Lee et
al., 2003); (Lee et al., 2004); (Lee & Kim et al., 2005); (Lee et al., 2006); (Lee; Lee & Yoo, 2005).
Although these designs were implemented and verified on silicon, they focused only on the
implementation of a limited set of architectures.
In large-scale NoC, power consumption should be minimized for cost-efficient
implementations. Although various NoC studies have been published, they focused
only on performance and scalability issues rather than power efficiency. Scaling
with power reduction is the trend in future technologies. Lowering the supply voltage is the
most effective way to reduce power consumption. As the supply voltage is lowered, the
threshold voltage (VTH) has to be decreased to meet high performance requirements.
Reducing VTH causes a significant increase in the leakage component. Several techniques
have been published for power minimization of high performance CMOS circuits (Khellah &
Elmasry, 1999); (Kao & Chandrakasan, 2000); (Kursun & Friedman, 2004).
In this chapter, the different tradeoffs in designing an efficient NoC, covering both elements
of the network (the interconnect network and the storage elements), are described, and the
building of a high performance NoC is presented. In addition, a high throughput architecture
is proposed; it can improve both the throughput and the latency of the network, and circuit
implementation issues are considered in its design. The switch structure along with the
interconnect architecture is shown in Fig. 1 for 2 IPs and 2 switches. The proposed
architecture is applied to different NoC topologies, and its efficiency and performance are
evaluated. To the best of our knowledge, this is the first in-depth circuit-level analysis
aimed at optimizing the performance of different NoC architectures.
This chapter is organized as follows: In Section 2, the proposed port architecture is presented.
The new High Throughput architecture is described in Section 3. In Section 4, power
characteristics for different high throughput architectures are provided. The performance and
overhead analysis of the proposed architecture are provided in Section 5. In Section 6, the
proposed design of low power NoC switch is described. Finally, conclusions are provided in
Section 7.
2. Port architecture
The switches of the different architectures have different numbers of ports. Each port of the
switch includes input virtual channels, output virtual channels, a header decoder, a controller,
an input arbiter and an output arbiter, as shown in (Pande et al., 2003a). When the number of
virtual channels is increased, the throughput increases. The input arbiter is used to allow only
one virtual channel to access a physical port; it consists of a priority matrix and
grant circuits (Pande et al., 2003b).
The priority matrix stores the priorities of the requests, and the grant circuits generate the
grant signals that allow only one virtual channel to access a physical port. The messages are
divided into fixed-length flow control units (flits). When the granted virtual channel stores
one whole flit, it sends a full signal to the controller. If it is a header flit, the header decoder
determines the destination and the controller checks the status of the destination port. If it is
available, the path between input and output is established, and all subsequent flits of the
corresponding packet are sent from input to output using the established path. Flits
from more than one input port may simultaneously try to access a particular output port;
the output arbiter is used to allow only one input port to access an output port.
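The flit-level flow just described can be summarized with a small control sketch. The C code below is illustrative only: the type names, the port count of six and the ownership table are assumptions made for the example, not the chapter's actual implementation.

#include <stdbool.h>

typedef enum { FLIT_HEADER, FLIT_BODY, FLIT_TAIL } flit_type_t;

typedef struct {
    flit_type_t type;
    int dest;                  /* destination port, decoded from a header flit */
} flit_t;

#define NPORTS 6               /* assumed port count; it varies per topology */

static int path_owner[NPORTS]; /* input port owning each output; -1 = free */

void switch_init(void) {
    for (int i = 0; i < NPORTS; i++) path_owner[i] = -1;
}

/* Called when a granted virtual channel of input port `in` holds one
   whole flit (the "full" signal). Returns true if the flit advances. */
bool port_forward(int in, const flit_t *f) {
    if (f->type == FLIT_HEADER) {
        if (path_owner[f->dest] != -1)    /* destination port busy */
            return false;                 /* stall until it is released */
        path_owner[f->dest] = in;         /* establish the input-output path */
        return true;
    }
    for (int out = 0; out < NPORTS; out++) {
        if (path_owner[out] == in) {      /* reuse the established path */
            if (f->type == FLIT_TAIL)
                path_owner[out] = -1;     /* tear the path down after the tail */
            return true;
        }
    }
    return false;                         /* no path established for this input */
}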
The virtual channels consist of several buffers controlled by a multiplexer and an arbiter which
grants access to only one virtual channel at a time according to the request priority. Once
a request succeeds, its priority is set to be the lowest among all other requests. In the
proposed architecture, rather than using one multiplexer and one arbiter to control the
virtual channels, two multiplexers and two arbiters are employed, as shown in Fig. 2. The
virtual channels are divided into two groups, each group controlled by one multiplexer and
one arbiter. Each group of virtual channels is supported by one interconnect bus, as
described in Section 3. Although it may look trivial, this port architecture has a great influence on
the switch frequency and the throughput of the network.
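As a minimal illustration of this arbitration rule (grant the highest-priority pending request, then demote the winner to the lowest priority), the C sketch below models the priority state of one arbiter as an ordered list. The list representation is an assumption made for clarity; the actual circuit uses a priority matrix and grant logic.

#define NVC 4                    /* virtual channels per group */

/* order[0] holds the highest-priority virtual channel; after a grant,
   the winner is rotated to the back, i.e. to the lowest priority. */
static int order[NVC] = {0, 1, 2, 3};

/* Returns the granted virtual channel, or -1 if none is requesting. */
int arbitrate(const int request[NVC]) {
    for (int i = 0; i < NVC; i++) {
        int vc = order[i];
        if (request[vc]) {
            for (int j = i; j < NVC - 1; j++)   /* demote the winner */
                order[j] = order[j + 1];
            order[NVC - 1] = vc;
            return vc;                          /* grant signal for vc */
        }
    }
    return -1;
}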
Let us consider an example with eight virtual channels. In the conventional NoC
architecture, an 8x8 input arbiter and an 8x1 multiplexer are needed to control the input virtual
channels, as shown in Fig. 2(a). The 8x8 input arbiter consists of an 8x8 grant circuit and an 8x8
priority matrix. In the proposed architecture, two 4x4 input arbiters, two 4x1 multiplexers,
a 2x1 multiplexer and a 2x2 grant circuit are integrated to allow only one virtual channel to
access a physical port, as shown in Fig. 2(b). Each 4x4 input arbiter consists of a 4x4 grant
circuit and a 4x4 priority matrix. The values of the grant signals are determined by the
priority matrix; the number of grant signals equals the number of requests and the
number of selection signals of the multiplexer. The area of the 8x8 input arbiter is larger than
the area of two 4x4 input arbiters, and the area of the 8x1 multiplexer is larger than the area of
two 4x1 multiplexers. Consequently, the area required to implement the switch
with the proposed architecture is less than the area required to implement the conventional
switch. In order to divide a 4x1 multiplexer into three 2x1 multiplexers, the 4x4 input arbiter
would have to be divided into three 2x2 input arbiters. The grant signals generated by three 2x2
input arbiters (6 signals) are not the same grant signals generated by the 4x4 input arbiter (4
signals). Therefore, the 4x4 input arbiter cannot be replaced by three 2x2 input arbiters unless
the number of interconnect buses is increased to equal the number of virtual channel
groups. By increasing the number of interconnect buses, the metal resources and power
dissipation are increased, as described in Section 5.
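The area argument can be made concrete under the assumption (not stated in the text) that a matrix arbiter stores one priority bit per pair of requests, i.e. N(N−1)/2 bits for an N×N arbiter:

\[
\underbrace{\tfrac{8 \cdot 7}{2}}_{8\times8\ \text{arbiter}} = 28\ \text{priority bits}
\qquad \text{vs.} \qquad
2 \cdot \tfrac{4 \cdot 3}{2} = 12\ \text{priority bits}
\]

so splitting the eight virtual channels into two groups of four roughly halves the priority-matrix storage, while the added 2x1 multiplexer and 2x2 grant circuit remain comparatively small.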
Without circuit optimization, in the BFT architecture the maximum frequency of
the switch changes with the number of virtual channels as shown in Fig. 3. When the number of
virtual channels is increased beyond four, the maximum frequency of the switch
decreases. The throughput also saturates when the number of virtual channels is increased
beyond four (Pande et al., 2005b) for different numbers of ports. On the other hand, the
average message latency increases with the number of virtual channels. To keep the latency
low while preserving the throughput, the number of virtual channels is constrained to four
(Pande et al., 2003b); (Pande et al., 2005b). Throughput is a parameter that measures the rate
at which message traffic can be sent across a communication network. It is defined by
(Pande et al., 2005b):

\[ TP = \frac{(\text{Total messages completed}) \times (\text{Message length})}{(\text{Number of IP blocks}) \times (\text{Total time})} \qquad (1) \]

To increase the throughput, the links (interconnects) connecting the switches with each other should be
increased in number. Since the number of virtual channels could be doubled (from four to eight),
doubling the number of interconnect buses between switches is proposed.
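As a numeric illustration of eq. (1) (the traffic figures below are invented for the example, not measured data):

\[
TP = \frac{20{,}000\ \text{messages} \times 64\ \text{flits/message}}
         {256\ \text{IPs} \times 100{,}000\ \text{cycles}}
   = 0.05\ \text{flits/cycle/IP}
\]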
Fig. 2. (a) Circuit diagram of a conventional switch port; (b) circuit diagram of the High
Throughput switch port.
Fig. 3. Maximum frequency (MHz) of a switch with different numbers of virtual channels for
BFT and HTBFT.
Fig. 4. Maximum frequency (MHz) of a switch with different numbers of virtual channels for
different NoC architectures (Octagon, CLICHÉ, SPIN and their HT variants).
Let us consider the example of the BFT architecture. The area required to implement the BFT
switch and the HT-BFT switch is shown for different numbers of virtual channels in Fig. 5. The
HT-BFT architecture decreases the area of the switch by 18%. Consequently, a system with eight
virtual channels achieves high throughput, high frequency and low latency while the area of
the design is optimized. The architectures of the different NoC topologies used to achieve a high
throughput network are discussed in Section 3.
Fig. 5. Area (number of transistors, x10^4) of the BFT and HTBFT switches for different
numbers of virtual channels.
3. High Throughput Architecture

Increasing the number of buses between two switches, combined with optimizing the design
of the switch at the circuit level as shown in Section 2, could improve the throughput.
However, using two buses to connect two switches implies consumption of metal resources
and possibly of silicon area for the repeaters within long interconnect buses. The overhead of the
proposed architecture is discussed in Section 5. Applying the proposed high throughput
architecture to different NoC topologies is presented in the following subsections.
In the SPIN fat-tree architecture, each switch is connected to 4 down links and 4 up links. Each
group of 4 leaf nodes needs one switch. At the next level, the same number of switches is
needed (every 4 switches on the lower level need 4 switches at the next level). This relation
continues with each succeeding level. The main rationale behind this approach is the
utilization of the redundant buses by the routers in order to reduce contention in the network.
Therefore, SPIN trades area overhead and extra power dissipation for higher throughput.
The interconnect template to integrate IP blocks using the High Throughput SPIN (HT-SPIN)
architecture is shown in Fig. 6(d). In the proposed HT-SPIN architecture, double the number
of buses is needed to connect each two switches, or an IP block and a switch. Due to the higher
usage of on-chip resources by the interswitch links, applying the high throughput architecture
to the SPIN topology is not efficient, since the improvement in throughput is insignificant, as
described in Section 5. The power characteristics of the different high throughput
architectures are provided in Section 4.
Fig. 6. Proposed interconnect architectures: (a) HT-BFT, (b) HT-CLICHÉ, (c) HT-Octagon,
(d) HT-SPIN.
4. Power Characteristics

Power dissipation is a primary concern in high speed, high complexity integrated circuits
(ICs). Power dissipation increases rapidly with the increase in frequency and transistor
density in integrated circuits. To achieve a power efficient NoC, the power dissipation needs
to be characterized for different topologies. An on-chip communication network contains three
primary parts: the network switches, the interswitch links (interconnects), and the repeaters
within the interswitch links, as shown in Fig. 7. Including the different sources of power
consumption in a NoC, the total power dissipation of the on-chip network is defined as follows:
\[ P_{total} = P_{switches} + P_{link} + P_{rep} \qquad (2) \]

\[ P_{switches} = P_{switching} + P_{leakage} \qquad (3) \]

where P_switches is the total power dissipation of the switches forming the network, i.e. the
summation of the switching power (including the dynamic and short-circuit components) and
the leakage power of the switches. P_link is the total power dissipation of the interswitch
links, and P_rep is the total power dissipation of the repeaters which are required for long
interconnects. The number of repeaters depends on the length of the interswitch link.
According to the topology of the NoC interconnects, the interswitch wire lengths, the number
of repeaters and the number of switches can be determined a priori.
The power consumption of the interswitch links, P_link, and the power consumption of the
repeaters are defined by (El-Moursy & Friedman, 2004):

\[ P_{link} = c_{int}\, L_{wire}\, V_{dd}^{2}\, f \qquad (4) \]

\[ P_{rep} = P_{dyn\text{-}rep} + P_{short\text{-}rep} + P_{leakage\text{-}rep} \qquad (5) \]

\[ P_{dyn\text{-}rep} = N_{rep}\, h_{opt}\, C_{0}\, V_{dd}^{2}\, f \qquad (6) \]

where c_int is the capacitance per unit length of the interswitch links, L_wire is their total
length, V_dd is the supply voltage and f is the clock frequency; P_dyn-rep is the total dynamic
power dissipation of the repeaters, N_rep is the number of repeaters, h_opt is the optimal
repeater size and C_0 is the input capacitance of a minimum size repeater. P_short-rep is the
total short-circuit power of the repeaters and P_leakage-rep is their total leakage power
dissipation; both are negligible as compared to the total dynamic power dissipation of the
repeaters [32]. The closed form equations for the power dissipation of the different high
throughput NoC architectures are described in the following subsections.
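Before turning to the per-topology expressions, eqs. (2)–(6) can be evaluated numerically with a small sketch like the one below. Every parameter value and name in it is an illustrative assumption, not a figure taken from this chapter, and the switch term is lumped into a single per-switch power.

#include <stdio.h>

int main(void) {
    double vdd    = 1.0;       /* supply voltage [V] (assumed) */
    double f      = 500e6;     /* clock frequency [Hz] (assumed) */
    int    n_sw   = 128;       /* number of switches (assumed) */
    double p_sw   = 10e-3;     /* power of one switch [W], lumping eq. (3) */
    double c_int  = 200e-12;   /* link capacitance per metre [F/m] (assumed) */
    double l_wire = 4.8;       /* total interswitch wire length [m] (assumed) */
    int    n_rep  = 960;       /* number of repeaters (assumed) */
    double h_opt  = 40.0;      /* optimal repeater size (assumed) */
    double c0     = 1e-15;     /* input cap of a minimum-size repeater [F] */

    double p_switches = n_sw * p_sw;                        /* eq. (3), lumped */
    double p_link     = c_int * l_wire * vdd * vdd * f;     /* eq. (4) */
    double p_rep      = n_rep * h_opt * c0 * vdd * vdd * f; /* eq. (6); the
                           short-circuit and leakage terms of eq. (5) are
                           neglected, as stated above */
    printf("P_total = %.3f W\n", p_switches + p_link + p_rep); /* eq. (2) */
    return 0;
}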
For the BFT-based architectures, the interswitch wire length between levels and the number of
switches follow from the tree structure:

\[ w_{a,a+1} = \frac{\sqrt{Area}}{2^{\,levels-a}} \qquad (7) \]

\[ N_{switch} = \frac{N}{4} \cdot \frac{1-(1/2)^{levels}}{1-1/2} \qquad (8) \]
where w_{a,a+1} is the length of the wire spanning the distance between the switches of level a
and level a+1, and a can take integer values between 0 and (levels−1). In the HT-BFT, the total
length of interconnect and the total number of repeaters can be determined from the
following equations:
\[ L_{wire}^{HT\text{-}BFT} = 2 \sum_{a=0}^{levels-1} N_a\, w_{a,a+1},
\qquad N_{rep}^{HT\text{-}BFT} = \frac{L_{wire}^{HT\text{-}BFT}}{l_{opt}} \]

where N_a is the number of interswitch links between levels a and a+1 and l_opt is the optimal
length of a repeated global interconnect. For the HT-CLICHÉ architecture, whose doubled mesh
links each have length √Area/√N, the total interconnect length and the number of repeaters are

\[ L_{wire}^{HT\text{-}CLICH\acute{E}} = 4\sqrt{N}(\sqrt{N}-1)\,\frac{\sqrt{Area}}{\sqrt{N}}
   = 4(\sqrt{N}-1)\sqrt{Area} \qquad (12) \]

\[ N_{rep}^{HT\text{-}CLICH\acute{E}} = 4\,\frac{\sqrt{Area}}{\sqrt{N}\, l_{opt}}\,\sqrt{N}(\sqrt{N}-1) \qquad (14) \]
Using the number of ports, the number of switches, the total length of interconnects and the
number of repeaters, the total power consumption of the HT-CLICHÉ architecture can be
determined by the following expression:

\[ P_{HT\text{-}CLICH\acute{E}} = N\,P_{switch}
   + 4(\sqrt{N}-1)\sqrt{Area}\; c_{int} V_{dd}^{2} f
   + 4\,\frac{\sqrt{Area}}{\sqrt{N}\, l_{opt}}\,\sqrt{N}(\sqrt{N}-1)\; h_{opt} C_{0} V_{dd}^{2} f \qquad (15) \]
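As a worked example, take N = 256 and an assumed die edge of √Area = 20 mm (a value chosen so that the link length matches the 1.25 mm quoted for the 256-IP system in Section 5.2.2):

\[
\frac{\sqrt{Area}}{\sqrt{N}} = \frac{20\ \text{mm}}{16} = 1.25\ \text{mm},
\qquad
4\sqrt{N}(\sqrt{N}-1) = 4 \cdot 16 \cdot 15 = 960\ \text{links},
\qquad
L_{wire} = 960 \times 1.25\ \text{mm} = 1200\ \text{mm}
\]

Since 1.25 mm is below the critical repeater insertion length in that technology, no repeaters are needed in this case (cf. Section 5.2.2).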
(connecting nodes 1-8 and 4-5), and fourth (connecting nodes 1-2, 2-3, 3-4, 5-6, 6-7 and 7-8).
The interswitch wire lengths w_1, ..., w_4 of these four link categories can be defined in terms
of L and w_s (eqs. (16)–(19)), with the shortest links equal to L/4, where L is the length spanned
by four nodes and w_s is the summation of the global interconnect width and space. Considering
these interswitch wire lengths and the optimal length of a repeated global interconnect l_opt,
the total length of interconnect of the HT-Octagon network is obtained by summing the lengths
of all (doubled) links over its N_octagon basic Octagon units, and the number of repeaters by
dividing that total length by l_opt (eqs. (20) and (21)),
where N_octagon is the number of basic Octagon units. The total power dissipation of the
HT-Octagon architecture can then be determined, as in eq. (15), by summing the switch power,
the interswitch link power and the repeater power:

\[ P_{HT\text{-}Octagon} = N\,P_{switch}
   + c_{int}\, L_{wire}^{HT\text{-}Octagon}\, V_{dd}^{2} f
   + N_{rep}^{HT\text{-}Octagon}\, h_{opt} C_{0} V_{dd}^{2} f \]
For the HT-SPIN architecture, the total length of the interswitch links and the total number of
repeaters are derived in the same manner from √Area and the number of tree levels (eqs. (22)
and (23)), and the total power dissipation again follows by summing the switch, link and
repeater contributions as in eqs. (2)–(6).
Fig. 8. Power dissipation of the different high throughput NoC architectures as a function of
the number of IP blocks.
The percentage of the power dissipation of the interswitch links and repeaters is shown in
Fig. 9. For the SPIN architecture, the power dissipation of the interswitch links and
repeaters equals 25% of the total power dissipation of the architecture. For the BFT,
CLICHÉ and Octagon architectures, the percentage of power dissipation of the interswitch
links and repeaters decreases with the number of IP blocks.
Fig. 9. Percentage of the total power dissipated in the interswitch links and repeaters for the
different NoC architectures, as a function of the number of IP blocks.
Fig. 10. Throughput (flits/cycle/IP) of BFT and HTBFT for different numbers of virtual
channels.
5.2.1 HT-BFT
It is possible to organize the butterfly fat tree so that it can be laid out in O(N) active area (IPs
and switches) and O(log(N)) wiring layers (Dehon, 2000). The basic strategy for the wiring is to
distribute the tree layers in pairs of wire layers: one for the horizontal wiring H_{a+1,a} and one
for the vertical wiring V_{a+1,a}. The length of the horizontal part H_{a+1,a} equals the length
of the vertical part V_{a+1,a}, given that the chip is square. More than one tree layer can share
the same wiring trace. The high throughput architecture has the same number of switches, but
the number of wires and repeaters is doubled. The length of an interswitch wire depends on the
number of levels in the BFT, which depends on the system size as shown in eq. (7).
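For a quick sanity check of eq. (7), assume a die edge of √Area = 20 mm (the same assumed value used earlier) and the seven levels of the 256-IP system below:

\[
w_{6,7} = \frac{20\ \text{mm}}{2^{\,7-6}} = 10\ \text{mm},
\qquad
w_{0,1} = \frac{20\ \text{mm}}{2^{\,7}} \approx 0.16\ \text{mm}
\]

The upper tree levels therefore span centimetre-scale wires, which is where the repeaters are mainly needed.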
In the circuit implementation of the HT-BFT, a bus between two switches has 12 wires: 8
for data and 4 for control signals. Considering a system of 256 IP blocks, the lengths of H_{a+1,a}
and V_{a+1,a} are calculated; the number of BFT levels is seven. Using the critical interconnect
length, the number of repeaters for the BFT equals 960. The area of the repeaters required to
implement the HT-BFT interswitch links equals 20,880 µm² (double the area of the
repeaters required for the BFT interswitch links). The power consumption of the repeaters and
switches required to implement the BFT and HT-BFT is presented in Table 2. The power
consumption required to implement the HT-BFT is increased by 7% as compared with the
power consumption of the BFT.
Architecture | No. of repeaters | Power dissipation of repeaters and interswitch links (mW) | Power dissipation of switches (mW) | Total power dissipation (mW) | Percentage of power dissipation of repeaters and interswitch links (%)
BFT | 960 | 1458.24 | 15663.68 | 17121.92 | 8.5
HT-BFT | 1920 | 2916.48 | 15674.84 | 18591.32 | 15.7
Table 2. Power consumption of repeaters and switches for BFT and HT-BFT.
The horizontal wiring is distributed in metal layer 11 and the vertical wiring in metal
layer 12. The total length of the horizontal wires needed equals 4800 mm (5% of the total
metal resources available in metal 11); likewise, the vertical wires require 5% of the total
metal resources available in metal 12. For the proposed design, double the number of
interswitch links is required to achieve the communication between each two switches.
Therefore, the total metal resources required to implement the proposed architecture will be
10%: the metal resources of the HT-BFT architecture equal double the metal resources of the
BFT architecture. The extra metal resources required to achieve the proposed architecture are
negligible as compared to the available metal resources.
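The 10% figure follows directly from the numbers just stated:

\[
\text{metal-11 capacity} \approx \frac{4800\ \text{mm}}{0.05} = 96{,}000\ \text{mm},
\qquad
\frac{2 \times 4800\ \text{mm}}{96{,}000\ \text{mm}} = 10\%
\]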
The percentage of the metal resources and the power consumption of the interswitch links and
repeaters for different technology nodes are shown in Table 3. With the advance in technology,
the available metal resources in the same die size are increased; therefore, the number of
IPs, and hence the number of switches, can be increased. The metal resources required to
implement the BFT and HT-BFT increase at a lower rate than the available metal resources
with the advance in technology, so the extra metal resources and power consumption required
to implement the HT-BFT decrease. The extra power consumption required to achieve the
proposed architecture is 1% of the total power consumption of the BFT architecture, and the
extra metal resources required for the HT-BFT are 3% of the metal resources. The HT-BFT
becomes more efficient with the advance in technology.
Technology node | No. of IPs | No. of levels | Power of interswitch links and repeaters, BFT (%) | Power of interswitch links and repeaters, HT-BFT (%) | BFT metal resources (%) | HT-BFT metal resources (%)
130 nm | 500 | 6 | 10.26 | 20.5 | 4.95 | 9.89
90 nm | 1000 | 7 | 4.49 | 8.98 | 4.02 | 8.04
65 nm | 2500 | 9 | 1.32 | 2.64 | 2.55 | 5.1
45 nm | 7500 | 10 | 0.59 | 1.19 | 2.97 | 5.94
Table 3. Metal resources and power consumption of interswitch links and repeaters for
HT-BFT and BFT.
5.2.2 HT-CLICHÉ
The CLICHÉ architecture with N IP blocks can be laid out in O(N) active area (IPs and
switches) with O(√N) interswitch links. In the circuit implementation of the HT-CLICHÉ, a bus
between two switches has 20 wires: 16 for data and 4 for control signals. Considering a
system of 256 IP blocks, the architecture consists of a 16x16 mesh of switches interconnecting
the IPs. The lengths of the horizontal and vertical links equal 1.25 mm; they are smaller
than the critical interconnect length, and therefore no repeaters are needed within the
interswitch links. The power dissipation of the network is presented in Table 4 for CLICHÉ
and HT-CLICHÉ. The extra power dissipation required to implement the HT-CLICHÉ for 256
IPs equals 5%.
Architecture | Power consumption of interswitch links and repeaters (mW) | Power consumption of switches (mW) | Total power dissipation (mW) | Percentage of power dissipation of repeaters and interswitch links (%)
CLICHÉ | 1398 | 24448 | 25846 | 5.4
HT-CLICHÉ | 2796 | 24471 | 27267 | 10.25
Table 4. Power consumption for CLICHÉ and HT-CLICHÉ architectures.
Using eq. (12), the total length of the interswitch links is calculated. Distributing the
horizontal and vertical interswitch links into metal 11 and metal 12 respectively, the metal
resources required to implement the horizontal wires equal 7% of the total metal
resources available in metal 11, and the metal resources required for the vertical wires
equal 7% of the total metal resources available in metal 12. Therefore, the total metal
resources required to implement the HT-CLICHÉ architecture will be 14%. The increase in
the percentage of the metal resources for the HT-CLICHÉ is small as compared to the available
metal resources.
Since the interswitch links are short enough that no repeaters are needed within the
interconnects, the power and metal resources consumed by CLICHÉ and HT-CLICHÉ are
shown in Table 5 for different technology nodes. With the advance in technology, the power
dissipation required to implement the HT-CLICHÉ is increased by less than 2% of the total
power consumption of the CLICHÉ architecture. The percentage of metal resources for the HT-
CLICHÉ is increased by 35% as compared with the metal resources of CLICHÉ. The HT-
CLICHÉ trades extra metal resources for higher throughput.
Technology node | No. of IPs | Power of interswitch links and repeaters, CLICHÉ (%) | Power of interswitch links and repeaters, HT-CLICHÉ (%) | CLICHÉ metal resources (%) | HT-CLICHÉ metal resources (%)
130 nm | 361 | 7.6 | 14.1 | 21 | 43
90 nm | 729 | 4.8 | 9.1 | 22 | 44
65 nm | 1849 | 2.7 | 5.2 | 28 | 57
45 nm | 5625 | 1.4 | 2.7 | 36 | 71
Table 5. Power consumption of interswitch links and repeaters, and metal resources, for
HT-CLICHÉ and CLICHÉ.
5.2.3 HT-Octagon
The percentages of power consumption and metal resources required to implement the
Octagon and HT-Octagon networks in different technologies are shown in Table 7. By
increasing the number of IP blocks with the advance in technology, the extra power
consumption required to implement the proposed architecture decreases; it is 2% of the
total power consumption of the Octagon architecture. The percentage of extra metal
resources for the HT-Octagon is 25% of the available metal resources.
Technology node | No. of IPs | Power of interswitch links and repeaters, Octagon (%) | Power of interswitch links and repeaters, HT-Octagon (%) | Octagon metal resources (%) | HT-Octagon metal resources (%)
130 nm | 361 | 7.6 | 14.1 | 13 | 26
90 nm | 729 | 4.78 | 9.1 | 13 | 26
65 nm | 1849 | 2.8 | 5.4 | 17 | 35
45 nm | 5625 | 1.6 | 3.1 | 25 | 50
Table 7. Power consumption of interswitch links and repeaters for HT-Octagon and Octagon.
5.2.4 HT-SPIN
By applying the high throughput architecture to the SPIN topology, the length of the interswitch
links and the number of repeaters are calculated by eq. (22) and eq. (23), respectively.
Considering a system of 256 IP blocks, the number of repeaters for the SPIN equals 12,288.
The area of the repeaters required to implement the HT-SPIN interswitch links equals 267,264
µm² (double the area of the repeaters required for the SPIN interswitch links). The
horizontal and vertical wires are distributed into metal 11 and metal 12, respectively.
The horizontal wires consume 28% of the total metal resources available
in metal 11, and the vertical wires consume 28% of the total metal resources
available in metal 12; the total metal resources required to implement the proposed HT-SPIN
architecture will thus be 56%. The power consumption of the interswitch links, repeaters and
switches required to implement the SPIN and HT-SPIN is presented in Table 8. The extra
power dissipated by the interswitch links and repeaters of the HT-SPIN architecture
(with 256 IPs) equals 15% as compared with the total power dissipation.
Architecture | No. of repeaters | Power dissipation of repeaters and interswitch links (mW) | Power dissipation of switches (mW) | Total power dissipation (mW) | Percentage of power dissipation of repeaters and interswitch links (%)
SPIN | 12288 | 10612.99 | 32263.68 | 42876.67 | 24.75
HT-SPIN | 24576 | 21225.98 | 32280.96 | 53506.94 | 39.67
Table 8. Power consumption of repeaters and switches for SPIN and HT-SPIN.
For different technologies, the power consumption and metal resources required to
implement the SPIN and HT-SPIN are shown in Table 9. With the advance in technology,
the extra power consumption required to achieve the proposed HT-SPIN architecture remains
about 15% of the total power consumption of the architecture, and the percentage of extra
metal resources needed exceeds 100% of the available metal resources. Therefore, the overhead
of the HT-SPIN is high, and applying the high throughput architecture to the SPIN topology is
not recommended.
Technology node | No. of IPs | Power of interswitch links and repeaters, SPIN (%) | Power of interswitch links and repeaters, HT-SPIN (%) | SPIN metal resources (%) | HT-SPIN metal resources (%)
130 nm | 400 | 41.2 | 58.4 | 21 | 42
90 nm | 800 | 33.6 | 50.3 | 30 | 59
65 nm | 2000 | 32.6 | 49.1 | 59 | 118
45 nm | 6000 | 28.1 | 43.8 | 126 | 253
Table 9. Power consumption of interswitch links and repeaters for HT-SPIN and SPIN.
Since the proposed architecture increases the power dissipation, a low power NoC switch is
proposed in Section 6.
6. Low Power NoC Switch Design

In the proposed low power switch design, stand-by transistors gate the supply of the port
circuits according to the mode of operation: the stand-by transistors (M1) disconnect the input
circuit from the supply voltage during the output mode, and the stand-by transistors (M2)
disconnect the output circuit from the supply voltage during the input mode. No new control
signals are needed to control the stand-by transistors (M1 and M2); the acknowledgment
signals (Ack_in and Ack_out) developed by the control unit are used to control the stand-by
transistors M1 and M2, respectively. For N virtual channels, the number of stand-by transistors
grows only linearly with N, and the number of virtual channels is limited (no more than 16
virtual channels (Abd El Ghany et al., 2009a)). Comparing the number of stand-by
transistors with the total number of transistors required to implement the NoC port (as
described in Section 2), the stand-by transistors amount to less than 1% of the total
number of transistors. Therefore, the area overhead of the proposed design is negligible as
compared to the area of the NoC switch, and the total power dissipation can be reduced by
using the power gating technique.
Using Cadence tools and a 90 nm technology node, the proposed low power NoC switch was
implemented. The total power dissipation of the conventional BFT switch equals 41.29 mW;
the total power dissipation of a port equals 6.79 mW during the input mode and 6.57 mW
during the output mode. For the proposed BFT switch design with one virtual channel,
the power dissipation of the main components of the port in the active mode and the sleep
mode is obtained as shown in Table 10. According to the mode of operation, the activation
of each component is determined. In the input mode, the input FIFO, header decoder and
crossbar are activated, while the output FIFO is switched to sleep mode; the power
dissipation of the port is then 5.68 mW. In the output mode, the output FIFO is activated
while the input FIFO, the header decoder and the crossbar are switched to sleep mode; the
power dissipation of the port then equals 3.79 mW. Therefore, the average power dissipation of
the proposed switch equals 29.59 mW, a decrease of 28.32% as compared to the average power
dissipation of the conventional BFT switch.
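The quoted average can be cross-checked arithmetically if one assumes a six-port BFT switch whose ports spend half the time in each mode (both are assumptions made here, not statements from the text):

\[
41.29 - 6 \cdot \frac{(6.79 - 5.68) + (6.57 - 3.79)}{2}
= 41.29 - 6 \cdot 1.945 \approx 29.6\ \text{mW}
\]

which matches the 29.59 mW figure within rounding.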
The power consumption of the BFT switch increases with the number of virtual channels, as
shown in Fig. 12. Applying the leakage power reduction technique to the BFT with different
numbers of virtual channels, the power reduction increases with the number of virtual
channels: the percentage of power reduction equals 28% when the number of virtual
channels equals one, and 45% for a BFT switch with 12 virtual channels. Increasing the
number of virtual channels can improve the throughput of an on-chip interconnect network;
by optimizing the design at the circuit level, the high throughput can be provided with eight
virtual channels (Abd El Ghany et al., 2009a). Using the leakage power reduction technique,
the power consumption of the BFT switch with 8 virtual channels is reduced by 44%.
With the advance in technology, the number of IPs implemented in the same system size
increases. The effect of the power gating technique on the HT-BFT is presented in Fig. 13. The
power consumption of the HT-BFT architecture using the leakage power reduction technique
(HT-BFT-PR) is less than the power consumption of the conventional BFT architecture.
Fig. 12. Power dissipation of a switch with different numbers of virtual channels, normalized
to a one-virtual-channel BFT switch, with and without the leakage power reduction technique.
Fig. 13. Power dissipation (W) of BFT, HTBFT and HTBFT-PR (HT-BFT using the power
reduction technique) as a function of the number of IP blocks.
The power consumption P_switches of the switches for the different high throughput
architectures is obtained as shown in Table 11. The power consumption of the switches is more
than 80% of the total power consumption of the on-chip network, so switching off their power
supply is an efficient technique to reduce the total power dissipation of the NoC. The minimum
power consumption is obtained with the BFT architecture, as presented in Table 11. Using
the leakage power reduction technique, the power consumption of the different NoC
architectures is determined; the overall power consumption, including that of the interswitch
links and repeaters, is decreased by up to 33%.
Network architecture | Total power (mW) | P_switches (mW) | P_switches (%) | Total power using power reduction technique (mW) | Power reduction (%)
HT-BFT | 18591.32 | 15674.84 | 84 | 15104.44 | 19
HT-SPIN | 53506.94 | 32280.96 | 60 | 46312.7 | 13
HT-CLICHÉ | 26148.64 | 24471 | 94 | 18608.16 | 29
HT-Octagon | 22072.32 | 19884.08 | 90 | 16253.04 | 26
Table 11. Total power consumption of the different network architectures with 256 IPs.
7. Conclusions
In this chapter, a high throughput NoC architecture is proposed to increase the throughput
of the switch in a NoC. The proposed architecture can also improve the latency of the network.
The proposed high throughput interconnect architecture is applied to different NoC
architectures. The architecture increases the throughput of the network by more than 38%
while preserving the average latency. The area of the high throughput NoC switch is decreased
by 18% as compared to the area of the BFT switch. The total metal resources required to
implement the proposed high throughput NoC are increased by less than 10% as compared to
the metal resources required to implement the conventional NoC design.
The power characterization of the different high throughput NoC architectures is developed.
The extra power consumption required to achieve the proposed high throughput NoC
architecture is less than 15% of the total power consumption of the NoC architecture. A low
power switch design is proposed, and the power reduction technique is applied to the different
high throughput NoC architectures; the technique reduces the overall power consumption of
the network by up to 29%.
The relation between throughput, number of virtual channels and switch frequency is
analyzed. The simulation results demonstrate the performance enhancements in terms of
throughput, number of virtual channels, switch frequency and power dissipation. It is shown
that optimizing the circuit can increase the number of virtual channels without degrading the
frequency. The throughput of different NoC architectures is also improved with the proposed
architecture. The minimum power consumption and the minimum area are obtained with the
HT-BFT as compared to the other high throughput NoC architectures. The extra metal
resources required to achieve the proposed HT-BFT are negligible as compared to the metal
resources of the network, and the extra power consumption required to achieve the proposed
HT-BFT is eliminated by using the leakage power reduction technique.
8. References
Abd El Ghany, M. A.; El-Moursy, M. & Ismail, M. (2009a) “High Throughput Architecture
for High Performance NoC” Proceedings of IEEE International Symposium on Circuits
and Systems (ISCAS), May, 2009 (in publication)
Abd El Ghany, M. A.; El-Moursy, M. & Ismail, M. (2009b) “High Throughput Architecture
for CLICHÉ Network on Chip” Proceedings of the IEEE International SoC Conference,
September, 2009
Benini, L. & Micheli, G. de (2002) “Networks on chips: A new SoC paradigm,” IEEE
Computer, vol. 35, no. 1, pp. 70–78, Jan. 2002
Bertozzi, D.; Jalabert, A. & Murali, S. et al., (2005) “NoC synthesis flow for customized
domain specific multiprocessor systems-on-chip,” IEEE transactions on Parallel and
Distributed Systems, vol. 16, no. 2, pp. 113–129, February 2005
Bolotin, E.; Cidon, I.; Ginosar, R. & Kolodny, A. (2004) “QNoC: QoS architecture and design
process for network on chip,” Journal of Systems Architecture, vol. 50, no. 2–3, pp.
105–128, February 2004
Dally, W. J. & Towles, B. (2001) “Route packets, not wires: on-chip interconnection
networks”, In Proceedings of Design Automation Conference, pp 684–689, June 2001
Dehon, A. (2000) “Compact, Multilayer layout for butterfly fat-tree”, In Proceedings of The
12th ACM Symposium on Parallel algorithm Architectures, pp. 206- 215, July 2000
El-Moursy, M. A. & Friedman, E. G. (2004) “optimum wire sizing of RLC interconnect with
repeaters”, Integration, the VLSI journal, vol. 38, no. 2, pp. 205-225, 2004
Grecu, C.; Pande, P. P.; Ivanov, A. & Saleh, R. (2004a) “Structured Interconnect Architecture:
A Solution for the Non-Scalability of Bus-Based SoCs,” Proceedings of Great Lakes
Symposium on VLSI, pp. 192-195, April 2004
Grecu, C.; Pande, P. P.; Ivanov, A. & Saleh, R. (2004b) “A Scalable Communication-Centric
SoC Interconnect Architecture”, In Proceedings of IEEE International Symposium On
Quality Electronic Design, pp. 22-24, March, 2004
Grecu, C.; Pande, P.; Ivanov, A.; Marculescu, R.; Salminen, E. & Jantsch, A. (2007a)
“Towards open network-on-chip benchmarks,” In Proceedings of the International
Symposium on Network on Chip, pp. 205, May 2007
Grecu, C.; Ivanov, A.; Saleh, R. & Pande, P. (2007b) “Testing network-on-chip
communication fabrics,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 26, no. 10, pp. 2201–2214, December 2007
Guerrier, P. & Greiner, A. (2000) “A generic architecture for on-chip packet-switched
interconnections”, In Proceedings of Design, Automation and Test in Europe Conference
and Exhibition, pp. 250–256, March 2000
ITRS 2007 Documents, http://itrs.net/Links/2007ITRS/Home2007.htm
Kao, J. T. & Chandrakasan, A. P. (2000) “Dual-Threshold Voltage Techniques for Low-Power
Digital Circuits “, IEEE Journal of Solid-State Circuits, vol. 35(7), pp. 1009- 1018, July
2000
Karim, F.; Nguyen, A. & Sujit Dey, (2002) “An Interconnect Architecture for Networking
Systems on Chips,” IEEE Micro, vol. 22, no. 5, pp. 36-45, September 2002
Khellah, M. M. & Elmasry, M. I. (1999) “Power minimization of high-performance
submicron CMOS circuits using a dual-Vdd dual-Vth (DVDV)
approach” In Proceedings of the International Symposium on Low Power Electronics and
Design, pp. 106-108, 1999
Kim, J.-S.; Hwang, M.-S & Roh, S. et al., (2004) “On-chip network based embedded core
testing,” In Proceedings of the IEEE International SoC Conference, pp. 223–226,
September 2004
Kumar, S. et al., (2002) “A Network on Chip Architecture and Design Methodology,” In
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 117-124,
2002
Kursun, V. & Friedman, E. G. (2004) “Sleep switch dual threshold voltage domino logic with
reduced standby leakage current,” IEEE transactions on VLSI systems, 12(5), pp. 485-
496, May 2004
Lee, S.-J.; Song, S.-J. & Lee, K. et al. (2003) “An 800MHz Star-Connected On-Chip Network
for Application to Systems on a chip”, IEEE Digest of International Solid State Circuits
Conference, vol. 1, pp. 468-489, February, 2003
Lee, K.; Lee, S.-J. & Kim, S.-E. et al. (2004) “A 51mW 1.6GHz On-Chip Network for Low
power Heterogeneous SoC Platform”, IEEE Digest of International Solid State Circuits
Conference, vol. 1, pp.152-518, February, 2004
Lee, S.-J.; Kim, K. & Kim, H. et al. (2005) “Adaptive Network-on-Chip with Wave-Front
Train Serialization Scheme”, IEEE Digest of Symposium on VLSI Circuits, pp. 104-107,
June, 2005
Lee, S.-J.; Lee, K. & Yoo, H.-J. (2005) “Analysis and Implementation of Practical Cost-
Effective Network-on-Chips”, IEEE Design & Test of Computers Magazine (Special
Issue for NoC), September 2005
Lee, K.; Lee, S.-J. & Yoo, H.-J. (2006) “Low-Power Networks-on-Chip for High-Performance
SoC Design”, IEEE Transactions on Very Large Scale Integration Systems, vol. 14, no.2,
pp.148-160, February 2006
Lee, S. & Bagherzadeh, N. (2006) “Increasing the Throughput of an Adaptive Router in
Network-on-Chip (NoC)”, In Proceedings of 4th International Conference on
Hardware/Software Codesign and System Synthesis CODES+ISSS’06, pp. 82-87, Oct.
2006
Liang, J.; Laffely, A.; Srinivasan, S. & Tessier, R. (2004) “An architecture and compiler for
scalable on-chip communication,” IEEE transactions on VLSI Systems, vol. 12, no. 7,
pp. 711–726, July 2004
Li, X.-C.; Mao, J.-F.; Huang, H.-F. & Liu, Y. (2005) “Global interconnect width and spacing
optimization for latency, bandwidth and power dissipation,” IEEE Transactions on
Electron Devices, vol. 52, no. 10, pp. 2272–2279, Oct. 2005
Murali, S. & De Micheli, G. (2004) “SUNMAP: A Tool for Automatic Topology Selection and
Generation for NoCs”, IEEE Proceedings of Design Automation conference, pp. 914-919,
June 2004
Murali, S.; Theocharides, T. & Vijaykrishnan, N. et al., (2005) “Analysis of error recovery
schemes for networks on chips,” IEEE Design and Test, vol. 22, no. 5, pp. 434–442,
October 2005
Pande, P. P.; Grecu, C.; Ivanov, A. & Saleh, R. (2003a) “Design of a Switch for Network on
Chip Applications,” In Proceedings of The 2003 International Symposium on Circuits
and Systems, vol. 5, pp. 217-220, May 2003
Pande, P. P.; Grecu, C.; Ivanov, A. & Saleh, R. (2003b) ”High-Throughput Switch-Based
Interconnect for Future SoCs”, the 3rd IEEE International workshop on SoC for real-time
Applications, pp 304-310, July 2003
Pande, P. P.; Grecu, C.; Ivanov, A. & Saleh, R. (2005a) “Design, synthesis, and test of
networks on chips,” IEEE Design and Test of Computer, vol. 22, no. 5, pp. 404–413,
Aug. 2005
Pande, P. P.; Grecu, C.; Jones, M.; Ivanov, A. & Saleh, R. (2005b) “Performance Evaluation
and Design Trade-Offs for Network-on-Chip Interconnect Architectures”, IEEE
Transaction on Computers, vol. 54, no. 8, Aug. 2005
Salminen, E.; Kulmala, A. & Hämäläinen, T. (2007) “On network-on-chip comparison,” In
Proceedings of the Euromicro conference on Digital System Design Architecture, August
2007, pp. 503–510
10
Non-volatile memory interface protocols for smart sensor networks and mobile devices
1. Introduction
Data acquisition, storage and transmission are mandatory requirements for different
applications in the area of smart sensors and sensor networks.
Different architectures and scenarios can be considered. Thus, for smart sensor architectures
(Frank, 2002) based on the IEEE 1451.X standards (IEEE, 2007), the acquired data can be
processed at the smart sensor level using the data of the so-called standard TEDS template
(Transducer Electronic Data Sheet) (Honeywell, 2009) stored in a non-volatile memory (Brewer
& Gill, 2008). The auto-identification of a smart sensor unit (Yurish & Gomes, 2003) within a
sensor network is based on the Basic TEDS, which also represents part of the stored information.
Considering smart sensor architectures (Song & Lee, 2008), the communication between the
sensor processing unit (e.g. a microcontroller) and one or multiple non-volatile memory units
(e.g. Flash EEPROM memory units) is done using different communication protocols, such as
SPI, I2C and 1-wire (Kalinsky & Kalinsky, 2002); (Paret & Fenger, 1997); (Linke, 2008). These
protocols are thus frequently used in smart sensor implementations (IEEE, 2004); (Ramos et
al., 2004).
As the name implies, smart sensor networks are networks of smart sensors, that is, of
devices that have an inbuilt ability to sense information, process it and send
selected information to an external receiver (including to other sensors). A smart sensor is
a transducer (or actuator) that provides functions beyond what is necessary to generate a
correct representation of a sensed or controlled quantity. This means that such nodes require
memory capabilities to store data temporarily or permanently.
In an increasing number of applications, the nodes are required to change their spatial
position (mobile nodes), which leads to wireless networks. The sensor network node data
management and advanced data processing are carried out by a host unit characterized by
high data processing capabilities, non-volatile data storage capabilities and data
communication capabilities. One kind of solution that materializes the host unit is mobile
devices (e.g. phones and PDAs) with special operating systems (e.g. Windows CE, Symbian,
BlackBerry OS) and internal and extended data storage memory capabilities (CF card
memory, SD card memory). Specific protocols, CompactFlash and Secure Digital (Compact
Flash, 2009); (SD Association, 2009), are associated with the memory card interfaces that are
used to perform the communication between the host unit processor and the memory units. In
wireless sensor networks, special attention is paid to the memory read/write operation
times and the associated power consumption, considering the mobile device
autonomy requirements.
Considering the importance of non-volatile memory and the communication protocols
associated with memory units as parts of smart sensors, sensor networks and distributed
mobile systems (e.g. wearable sensing systems for physiological parameter measurements),
this chapter briefly reviews the non-volatile memory solutions most used in those contexts.
IEEE1451.4
IEEE 1451.4 (dot 4 from now on) defines a mechanism for adding self-describing behaviour
to traditional transducers with an analogue signal interface. Dot 4 defines the concept of
a transducer that supplies both an analogue and a digital interface, namely a mixed-mode
interface (Fig. 2), where the location of the TEDS non-volatile memory is highlighted. In
this case, the non-volatile memory interface materializes the digital interface associated
with IEEE 1451.4.
Fig. 1. TEDS on the IEEE 1451 family of smart transducer interface standards network
(Txdcr – transducer, sensor or actuator).
Fig. 2. Non-volatile memory for TEDS associated with the mixed-mode interface for smart
sensors (NCAP – network capable application processor).
The memory interfaces that are mainly used in IEEE 1451.4 implementations are 1-wire
and I2C. In the following paragraphs, a brief description of the 1-wire and 2-wire (I2C)
memory interface protocols, particularly as used in smart sensor implementations, is
presented.
Fig. 3. 1-wire bus network: a master and n slave units (1-wire memories with an IO interface
and a memory array) sharing the data line, with a pull-up resistor R_PUP to V_CC.
The 1-wire network has three main components: a bus master with controlling software,
wiring and associated connectors, and 1-wire devices (unit 1, unit 2, ..., unit n) that can be
sensors, actuators and memories (e.g. the DS2430A from Maxim). The protocol uses
conventional CMOS/TTL logic levels (a maximum of 0.8 V for logic “zero” and a
minimum of 2.2 V for logic “one”), with operation specified over a supply voltage range of
2.8 V to 6 V. A system clock is not required; each 1-wire part is self-clocked by an internal
oscillator synchronized to the falling edge of the master.
Signalling on the 1-wire bus is divided into time slots of 60 µs; one data bit is transmitted on
the bus per time slot. Slave units are allowed to have a time base that differs significantly from
the nominal time base. This, however, requires the timing of the master to be very precise, to
ensure correct communication with slaves having different time bases.
Addressing
All 1-wire devices have a unique address laser-registered into the chip, and Dallas
Semiconductor guarantees that the address is unique. The individual address is expressed
by a 64-bit serial number stored in the device memory. It is composed of eight bytes
divided into three main sections, as presented in Table 1.
Starting with the least significant bit (LSB), the first byte stores the 8-bit family code that
identifies the device type. For the particular case of 1-wire memories, the family codes are
presented in Table 2.
The next six bytes store a customizable 48-bit individual address, or ID, that is guaranteed
unique within a family. A few types of chips have sequences of IDs reserved for special
manufacturing runs but, in general, there are no special characteristics to the ID. The last
byte, the most significant byte (MSB), contains a cyclic redundancy check (CRC) whose
value is based on the data contained in the first seven bytes. This allows the master to
determine whether an address was read without error.
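The CRC in question is the standard Dallas/Maxim 1-wire CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed LSB first). A compact C routine for it is sketched below; a master can verify a ROM address by checking that the CRC over the first seven bytes equals the eighth.

#include <stdint.h>

/* Dallas/Maxim 1-wire CRC-8, LSB-first bit order. */
uint8_t ow_crc8(const uint8_t *data, int len) {
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        uint8_t byte = data[i];
        for (int b = 0; b < 8; b++) {
            uint8_t mix = (crc ^ byte) & 0x01;
            crc >>= 1;
            if (mix) crc ^= 0x8C;   /* 0x8C = bit-reversed polynomial 0x31 */
            byte >>= 1;
        }
    }
    return crc;
}

For a ROM address rom[0..7] read from a device, ow_crc8(rom, 7) == rom[7] indicates an error-free read.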
With a 2^48 serial number pool, conflicting or duplicate node addresses on the net are never a
problem. For maximum data security, the 1-wire memories can implement the US government-
certified Secure Hash Algorithm (SHA-1).
Write 1 – The master pulls the bus low for 1 to 15 µs. It then releases the bus for the
rest of the time slot (Fig. 4).
Write 0 – The master pulls the bus low for a period of at least 60 µs, with a
maximum length of 120 µs (Fig. 5).
Read – The master pulls the bus low for 1 to 15 µs. The slave then holds the bus
low if it wants to send a “0”; if it wants to send a “1”, it simply releases the line. The
bus should be sampled 15 µs after it was pulled low. As seen from the
master’s side, the “Read” signal is in essence a “Write 1” signal; it is the internal
state of the slave, rather than the signal itself, that dictates whether it is a “Write 1”
or a “Read” signal (Fig. 6).
Reset & Presence – The master pulls the bus low for at least 8 time slots, or 480 µs,
and then releases it. This long low period is called the “Reset” signal. If a
slave is present, it should pull the bus low within 60 µs after the release by
the master and hold it low for at least 60 µs. This response is called a “Presence”
signal. If no presence signal is issued on the bus, the master must assume that no
device is present on the bus, and further communication is not possible (Fig. 7).
Fig. 7. Reset & Presence commands on the 1-wire bus.
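A minimal bit-banged sketch of these four signal types is shown below in C. gpio_low(), gpio_release(), gpio_read() and delay_us() are assumed platform helpers for an open-drain GPIO with an external pull-up, not a real vendor API, and the exact microsecond values are typical choices within the timing windows described above, not mandated ones.

#include <stdbool.h>

extern void gpio_low(void);      /* drive the bus low (assumed helper) */
extern void gpio_release(void);  /* release; the pull-up raises the bus */
extern bool gpio_read(void);     /* sample the bus level */
extern void delay_us(unsigned us);

void ow_write_bit(int bit) {
    gpio_low();
    delay_us(bit ? 6 : 60);      /* Write 1: 1-15 us low; Write 0: >=60 us low */
    gpio_release();
    delay_us(bit ? 64 : 10);     /* fill out the rest of the time slot */
}

bool ow_read_bit(void) {
    gpio_low();
    delay_us(6);                 /* a Read starts like a Write 1 */
    gpio_release();
    delay_us(9);
    bool bit = gpio_read();      /* sample ~15 us after the falling edge */
    delay_us(55);                /* let the slot finish */
    return bit;
}

bool ow_reset(void) {            /* returns true if a presence pulse is seen */
    gpio_low();
    delay_us(480);               /* reset pulse: at least 480 us low */
    gpio_release();
    delay_us(70);                /* slave answers within 60 us... */
    bool presence = !gpio_read();
    delay_us(410);               /* ...and holds the bus low for >=60 us */
    return presence;
}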
The communication between the master and a slave is performed using the 1-wire commands,
according to the flowchart presented in Fig. 8.
Fig. 8. 1-wire communication flowchart: reset, select a 1-wire device, perform a single device
operation.
The first step of the communication is the “reset” command delivered by the master,
synchronizing the entire 1-wire bus. One of the units 1, 2, ..., n (slave devices) is then selected
for the subsequent communication. The selection of the specific slave is done
using the serial number of the device or using a binary search algorithm (Maxim, 2002).
Once a specific device has been selected, all other devices drop out and ignore subsequent
communications until the next reset is carried out.
Because each device type performs different functions and serves a different purpose, each has a unique protocol once it has been selected. For the particular case of a non-volatile memory, a set of particular commands is mentioned:
Write Scratchpad [0Fh] – applies to the data memory and the writable addresses in the register page. After issuing the Write Scratchpad command, the master must first provide the 2-byte target address, followed by the data to be written to the scratchpad;
Read Scratchpad [AAh] – allows verifying the target address and the integrity of the scratchpad data. After issuing the command code, the master begins reading. The master should read through the end of the scratchpad, after which it receives an inverted CRC16 based on the data as it was sent by the 1-wire memory;
Copy Scratchpad [55h] – is used to copy data from the scratchpad to the data memory and the writable sections of the register page;
Read Memory [F0h] – is the general function to read from the 1-wire memory. After issuing the command, the master must provide a 2-byte target address, which should be in the range of 0000h to 0A3Fh. If the target address is higher than 0A3Fh, for the particular case of the DS28EC20 (1-wire memory), the upper four address bits are changed to "0". After the address is transmitted, the master reads data starting at the (modified) target address and can continue until address 0A3Fh. If the master continues reading, the result is FFh. The Read Memory command sequence can be ended at any point by issuing a reset pulse;
Extended Read Memory [A5h] – works essentially the same way as Read Memory, except for the 16-bit CRC that the DS28EC20 generates and transmits following the last data byte of a memory page. The Extended Read Memory command sequence can be ended at any point by issuing a reset pulse.
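A Read Memory transaction can then be sketched on top of the bit-level routines. The byte helpers and the Match ROM [55h] selection below follow generic 1-wire conventions (bytes sent LSB first, target address low byte first); this is an illustrative sketch, not vendor firmware:

```c
#include <stdint.h>

extern int  ow_reset(void);
extern void ow_write_bit(int bit);
extern int  ow_read_bit(void);

void ow_write_byte(uint8_t b)
{
    for (int i = 0; i < 8; i++)          /* 1-wire bytes go LSB first */
        ow_write_bit((b >> i) & 1);
}

uint8_t ow_read_byte(void)
{
    uint8_t b = 0;
    for (int i = 0; i < 8; i++)
        b |= (uint8_t)ow_read_bit() << i;
    return b;
}

/* Read 'len' bytes starting at 'addr' (0000h..0A3Fh for the DS28EC20). */
int ow_read_memory(const uint8_t rom[8], uint16_t addr, uint8_t *buf, int len)
{
    if (!ow_reset())
        return -1;                        /* no Presence pulse: no device */
    ow_write_byte(0x55);                  /* Match ROM selects one slave */
    for (int i = 0; i < 8; i++)
        ow_write_byte(rom[i]);
    ow_write_byte(0xF0);                  /* Read Memory command */
    ow_write_byte(addr & 0xFF);           /* target address, low byte first */
    ow_write_byte(addr >> 8);
    for (int i = 0; i < len; i++)
        buf[i] = ow_read_byte();          /* reads past 0A3Fh return FFh */
    ow_reset();                           /* any reset pulse ends the sequence */
    return 0;
}
```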
Fig. 9. Interfacing of the memory, as part of an IEEE 1451.4 smart sensor, to the microcontroller (μC): a μC I/O pin connects to the 1-wire I/O line of the DS28EC20 memory, with a pull-up resistor RPUP (300 Ω to 2.2 kΩ) to VCC; master on the left, slave on the right
Referring to the implementation of 1-wire software that works under the Windows OS, a 1-wire Software Development Kit (SDK) can be used to develop applications associated with IEEE 1451.4 smart sensor data management. The 1-wire SDK includes a 1-wire API for .NET and API examples in VB.NET and C#, along with TMEX API examples in C, C++, Pascal (Borland Delphi), and Microsoft Visual Basic, which assures a high degree of flexibility and reduces the time of software design and implementation.
Fig. 10. I2C device network including one master device (e.g. a microcontroller) and n slave devices (e.g. serial EEPROM memories) sharing the SDA and SCL lines (TRi, TR(i+1), ..., TRn – analog transducers; DAQ device – analog signal acquisition device)
Each device has a particular address. The I2C bus supports two addressing schemes: 7-bit addresses and 10-bit addresses. With 10-bit addressing, up to 1024 devices are allowed to be connected to the bus. The 7-bit address scheme has a shorter message length and requires less complex hardware. Devices with 7-bit and 10-bit addresses can be mixed in the same system.
The bus can operate in three modes with different data rates. Data on the bus can be transferred at rates of up to 100 kbit/s in standard mode, up to 400 kbit/s in fast mode, or up to 3.4 Mbit/s in high-speed mode. The number of interfaces connected to the bus is limited by the bus capacitance limit of 400 pF (Phillips, 2000).
Fig. 11. The SDA and SCL signals associated with the I2C bus communication: start condition, data bits D7-D0, ACK, stop condition
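As an illustration of the signalling in Fig. 11, a minimal bit-banged I2C master can be sketched as follows; the open-drain pin helpers and the delays are assumptions of this sketch, and clock stretching and multi-master arbitration are omitted:

```c
#include <stdint.h>

extern void SDA_HIGH(void), SDA_LOW(void);
extern int  SDA_READ(void);
extern void SCL_HIGH(void), SCL_LOW(void);
extern void delay_us(unsigned us);

void i2c_start(void)             /* start: SDA falls while SCL is high */
{
    SDA_HIGH(); SCL_HIGH(); delay_us(5);
    SDA_LOW();  delay_us(5);
    SCL_LOW();
}

void i2c_stop(void)              /* stop: SDA rises while SCL is high */
{
    SDA_LOW();  SCL_HIGH(); delay_us(5);
    SDA_HIGH(); delay_us(5);
}

int i2c_write_byte(uint8_t b)    /* returns 1 if the slave ACKed */
{
    for (int i = 7; i >= 0; i--) {       /* D7 first, as in Fig. 11 */
        if ((b >> i) & 1) SDA_HIGH(); else SDA_LOW();
        SCL_HIGH(); delay_us(5);
        SCL_LOW();  delay_us(5);
    }
    SDA_HIGH();                          /* release SDA for the ACK bit */
    SCL_HIGH(); delay_us(5);
    int ack = !SDA_READ();               /* slave pulls SDA low to ACK */
    SCL_LOW();
    return ack;
}
```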
The I2C bus does not support plug-and-play or interrupt functions, which are important for many sensor networks or memory networks. To save energy, some sensor nodes should be in sleep mode most of the time and be woken up by a timer or a sensing event. For each piece of transported information, the microcontroller must initiate the request and provide the clock to the sensors. To get updated information from a sensor, the microcontroller has to poll each sensor node connected to the bus very often to make sure it does not miss new information or unexpected events. These features make I2C unsuitable for applications with strict requirements on power efficiency and emergency processing. Although a sensor node can be attached to or detached from the system easily, the microcontroller cannot detect such an event or configure the system during operation. This limits the application of I2C in sensor networks, which are usually dynamically reconfigurable and/or demand high energy efficiency.
CF-based data storage is used especially at the master device level, such as PDAs (Postolache, 2006) or touch panel computers (e.g. TPC2106T) (Postolache, 2007).
CF electrical interface
The internal configuration of the CF protocol associated with CF non-volatile memory cards includes a CF controller connected through digital I/O lines to a host interface, which can be associated with PDAs, laptop computers, tablet PCs, etc. The CF block diagram and the CF connector pin-out are presented in Fig. 12.
Fig. 12. CompactFlash block diagram (host interface, CF controller, and storage module(s): flash, disk drive, etc., with data in/out and control paths) and CF connector pin-out (pins 1-25 and 26-50)
Pins 13 and 38 correspond to the power supply which, according to the standard, can be either 3.3 V or 5 V. Pins 1 and 50 correspond to GND. The data stored on the CF can be accessed through an 8-bit or 16-bit data bus associated with the CF connector. The data and address bits for 8-bit and 16-bit memory access are shown in Table 3.
Pins        21   22   23   2    3    4    5    6
Data lines  D00  D01  D02  D03  D04  D05  D06  D07
Pins        47   48   49   27   28   29   30   31
Data lines  D08  D09  D10  D11  D12  D13  D14  D15
Table 3. CF pins and data line correspondence for the 8-bit and 16-bit data buses for memory access
The address bus (A0 to A10 lines) pin assignment is given in Fig. 13.
Fig. 13. CF connector address bus (A0 to A10) pin-out: lines A9, A7, A5, A3, A1 on the pin 26-50 row and lines A10, A8, A6, A4, A2, A0 on the pin 1-25 row
SD electrical interface
A block diagram of an SD card interface associated with SD card non-volatile memory is presented in Fig. 14, and the corresponding pin-out, together with the pin functions, is presented in Table 4.
The SD card is clocked by an internal clock generator. The interface driver unit synchronizes the DAT and CMD signals from the external CLK to the internally used clock signal. The card is controlled by the six-line SD card interface containing the signals CMD, CLK, and DAT0~DAT3.
For the identification of an SD card in a stack of SD cards, a card identification register (CID) and a relative card address register (RCA) are foreseen (Kingmax Digital Inc., 2009). An additional register, the card-specific data register (CSD), contains different types of operation parameters.
Fig. 14. SD card block diagram: the interface driver connects the external lines (CLK, CMD, DAT0, DAT1, DAT2, CD/DAT3) to the card interface controller, which holds the OCR[31:0], CID[127:0], RCA[15:0], DSR[15:0], CSD[127:0] and SCR[63:0] registers and includes power-on detection, an internal clock unit, a memory interface and the memory core
The card has its own power-on detection unit; no additional master reset signal is required to set up the card after power-on. It is protected against short circuits during insertion and removal while the SD card system is powered up.
The communication using the SD card lines to access either the memory field or the registers is defined by the SD card standard. Different protocols are supported by SD cards.
SD 1-bit protocol
It is a synchronous serial protocol with one data line, used for bulk data transfers, one clock
line for synchronization, and one command line, used for sending command frames. The SD
1-bit protocol explicitly supports bus sharing. A simple single-master arbitration scheme
allows multiple SD cards to share a single clock and DAT0 line.
SD 4-bit protocol
It is nearly identical to the SD 1-bit protocol. The main difference is that bulk data transfers use a 4-bit parallel bus instead of a single wire. With proper design, this has the potential to quadruple the throughput of bulk data transfers. Both the SD 1-bit and 4-bit protocols by default require Cyclic Redundancy Check (CRC) protection of bulk data transfers.
A Cyclic Redundancy Check is a simple method for detecting the presence of simple bit-inversion errors in a transmitted block of data. In SD 4-bit mode, the data is multiplexed over the four bus (DAT) lines and the 16-bit CRC is calculated independently for each of the four lines, which implies an increased software complexity of the CRC calculation. A hardware implementation of the 4-bit parallel CRC calculation represents an interesting alternative and can be materialized using an Application-Specific Integrated Circuit (ASIC) or a field-programmable gate array (FPGA).
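For reference, the 16-bit CRC used on the DAT lines is the CCITT polynomial x^16 + x^12 + x^5 + 1 with a zero initial value. A minimal bit-wise C sketch follows; in 4-bit mode it would be applied once per DAT line, to the bits carried by that line:

```c
#include <stdint.h>

/* CRC-CCITT, polynomial 0x1021, initial value 0, bits shifted in MSB first,
 * as used for SD bulk data transfer protection. */
uint16_t sd_crc16(const uint8_t *data, int len)
{
    uint16_t crc = 0;
    for (int i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```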
5. References
Atmel (2004). AVR318: Dallas 1-Wire master 8-bit Microcontrollers, on-line at:
www.atmel.com/dyn/resources/prod_documents/doc2579.pdf, Sept-2004
Brewer, J. & Gill, M. (2008). Nonvolatile Memory Technologies with Emphasis on Flash: A
Comprehensive Guide to Understanding and Using Flash Memory Devices, Wiley-IEEE
Press, February 2008
Compact Flash Org. (2009). CF+ & CompactFlash Specification Revision 4.1, on line at
http://www.compactflash.org/spec_download.htm, May 2009
Flittner, P. (CSR, Bluetooth Subgroup Chair) & Brooks, T. (3eTI) (2007). IEEE P1451.5 Wireless Sensor Interface Working Group, Bluetooth Subgroup proposal, on-line at: grouper.ieee.org/groups/1451/5/.../P1451.5_Bluetooth2.pdf
Frank, R. (2002). Understanding Smart Sensors, Artech House Publishers, April 2002
Higuera J.; Polo J.; Gasulla M. (2009). A Zigbee wireless sensor network compliant with the
IEEE1451 standard, Proceedings of IEEE Sensors Applications Symposium, SAS 2009,
pp.309-313, 2009
Honeywell (2009). “TEDS - plug and play sensor configuration”, on-line at
http://www.sensotec.com/pnpterms.shtml, June 2009
IEEE Std 1451.4-2004 (2004). Standard for a Smart Transducer Interface for Sensors and Actuators – Mixed-Mode Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats, IEEE Standards Association, Piscataway, NJ, subclause 5.1.1, 2004
IEEE STD 1451.0-2007 (2007). Standard for a Smart Transducer Interface for Sensors and
Actuators – Common Functions, Communication Protocols, and Transducer
Electronic Data Sheet (TEDS) Formats, IEEE Instrumentation and Measurement
Society, TC-9, The Institute of Electrical and Electronics Engineers, Inc., New York,
N.Y. 10016, SH99684, October 5, 2007
IEEE Standard Association (2009). IEEE Standard 1451.4-2004 Tutorials, on-line at:
http://standards.ieee.org/regauth/1451/Tutorials.html, 2009
Kalinsky D. & Kalinsky R. (2002). Introduction to Serial Peripheral Interface, on-line at
http://www.embedded.com/story/OEG20020124S0116, January ‘02
Kingmax Digital Inc. (2009). SD Card Specification, on-line at: downloads.amilda.org/MODs/SDCard/SD.pdf
Leens F. (2009). An introduction to I2C and SPI protocols, Instrumentation and Measurement
Magazine, vol.12, No. 1, pp. 8-13, February 2009
Linke B. (2008). Overview of 1-Wire® Technology and Its Use, on-line at
http://www.maxim-ic.com/appnotes.cfm/an_pk/1796, June 2008
Maxim Inc. (2002). 1-Wire Search Algorithm, on-line at: http://www.maxim-ic.com/appnotes.cfm/an_pk/187, March 2002
Maxim Inc. (2003). 1-Wire Communication with a Microchip PICmicro Microcontroller, on-line at: http://www.maxim-ic.com/appnotes.cfm/an_pk/2420, Sept. 2003
Pardo A.; Camara L.; Cabre J.; Perera A.; Cano X.; Marco S.; Bosch J. (2006). Gas
measurement systems based on IEEE1451.2 standard, Sensors and Actuators B:
Chemical, Vol.116, Issues 1-2, pp. 11-16, July 2006
Paret D.& Fenger, C. (1997). The I2C Bus: From Theory to Practice, Wiley, John & Sons, 1997
Pfeiffer O.; Ayre A.; Keydel C. (2003) Embedded Networking with CAN and CANopen,
Annabooks Publisher, 2003
Phillips Semiconductor (2000). The I2C-Bus Specification, Version 2.1, 2000, on-line at: i2c2p.twibright.com/spec/i2c.pdf
11
Electronic Nose System and Artificial Intelligent Techniques for Gases Identification
1. Introduction
The electronic nose is an intelligent, sensor-based design intended to identify food flavors, cosmetics, and different gas odors. The continuous development of these sensors permits advanced control of air quality, as well as high sensitivity to chemical odors. Accordingly, a group of scientists have worked on improving the properties of sensors, while others have developed ways of manufacturing ultra-low-cost designs (Josphine & Subramanian, 2008); (Wilson et al., 2001).
In the design of an electronic nose, sampling, filtering and sensor modules, signal transducers, data preprocessing, feature extraction and feature classification are applied. (Getino et al., 1995) used an integrated sensor array for gas analysis in a combustion atmosphere, in the presence of humidity and temperature variations from 150 to 350 °C. The sensor array was exposed to a gas mixture formed by N2, O2, CO2, H2S, HCl and water vapour at a constant flow rate of 500 ml/min. (Marco et al., 1998) investigated gas identification with a tin oxide sensor array; in addition, several undesirable characteristics, such as slow response, non-linearities and long-term drift, were studied. Correction of the sensors' drift with adaptive self-organizing maps permitted success in gas classification problems. (Wilson et al., 2001) introduced a review of three commonly used gas sensor types: solid-state gas sensors, chemical sensors and optical sensors. Comparisons were made among them in terms of their ability to operate at low power, small size and relatively low cost under numerous interferences and variable ambient conditions. (Dong Lee & Sik Lee, 2001) depended on solid-state gas sensors, whereby the pollutants of the environment are monitored through the sensing mechanism and the sensing properties of solid-state gas sensors to environmental gases, such as NO, CO and volatile organic compounds. (Guardado et al., 2001) used a neural network for the detection of incipient faults in power transformers. The NN was trained according to five diagnosis criteria and then tested by using a new set of data. This study showed that the NN rate of successful diagnosis depends on the specific criterion under consideration, with values in the range of 87-100%. (Zylka & Mazurek, 2002) introduced a rapid analysis of gases by means of a portable analyzer fitted with
an electrochemical gas sensor. The analyzer, which was built, is controlled by a microprocessor, and the system incorporates only two gases, CO and H2.
The drawback is the lack of sensor selectivity, which is disadvantageous in most applications. (Belhovari et al., 2004; Belhovari et al., 2005) used a sensor array with gas identification and Gaussian mixture models. Some problems, such as the drift problem and slow response, were studied. Robust detection was applied through a drift counteraction approach based on extending the training data set using a simulated drift. In (Belhovari et al., 2006), gas identification is introduced using a sensor array and different neural network algorithms. Different classifiers (MLP, RBF, KNN, GMM and PPCA) were compared with each other using the same gas data set, allowing performance up to 97%. Electronic gas sensors based on tin oxide films are used for the identification of gases, the detection of toxic contaminants and the separation of gas mixtures (Kolen, 1994); (Belhovari et al., 2005); (Marco et al., 1998); (Getino et al., 1995); (Becker et al., 2000); (Amigoni et al., 2006).
The problem here is to identify, or to discriminate among, different gases, such as methane, propane, butane, carbon dioxide and hydrogen, using different concentrations. Taguchi gas sensors (TGS) are used; these are metal oxide semiconductor sensors based on tin oxide that have been commercially available from the Figaro engineering company (Figarosensor.com, on-line). In the design of electronic nose systems, power consumption, directly related to operating temperature, selectivity, sensitivity, and stability, typically has the most influence on the choice of metal oxide films for a particular application (Fleming, 2001); (Carullo, 2006). For electronic nose applications (Morsi-b, 2008); (Luo et al, 2002); (Bourgeois et al., 2003); (Carullo, 2006), metal oxide semiconductors are largely hampered by their power consumption demands. Thermal isolation and intermittent operation of the heaters reduce the power consumption of the sensors themselves, facilitating their use in portable applications. However, this also presents significant obstacles in terms of noise, drift, aging, and sensitivity to environmental parameters. The feed-forward back-propagation neural network, using the multilayer perceptron, is used to separate between the gases. Fuzzy logic is used to discriminate different gases and to detect the concentration of each gas. The electronic nose design provides rapid responses, ease of operation and sufficient detection limits. Data quality objectives (DQOs) of gases must be considered as a part of technology development, and a focus should be made on the most urgent problems.
2. Experimental Setup
The analysis and characterization of gases are carried out by building a prototype multi-sensor monitoring system (electronic nose) comprising TGS 822, TGS 3870, TGS 4160 and TGS 2600 sensors from Figaro, a temperature sensor, a humidity sensor, and a supply voltage equal to 5 V. Current monitoring methods are costly and time-intensive, and limitations in analytical techniques exist; clearly, a need exists for accurate, inexpensive, long-term monitoring using sensors. TGS 822 uses tin dioxide (SnO2) as the semiconductor; the sensor's conductivity increases with the gas concentration, and it has high sensitivity to the vapors of organic solvents as well as to combustible gases. TGS 2600 is comprised of a metal oxide semiconductor layer formed on an alumina substrate of a sensing chip, together with an integrated heater. TGS 3870 is a metal oxide semiconductor gas sensor embedding a micro-bead gas sensing structure. TGS 4160 is a hybrid sensor unit composed of a carbon dioxide sensitive element and a thermistor. All the presented sensors feature long life, low cost, small size, a simple electrical circuit, low power consumption, and commercial availability. The experimental equipment consists basically of the gas bottle mass flow controllers, a sensor chamber with a volume of 475 cm3, a supply voltage of 5 V, and a heating system (Morsi, 2007); (Morsi-a, 2008). The gases used in the experiment are carbon dioxide, hydrogen, methane, propane, butane, and a mixture of propane and butane. All measurements are presented at 45% relative humidity. All sensors are connected as an array and covered by a chamber which has inlet and outlet ports. The inlet is connected to the mass flow controllers to control the concentration of the input gas after purging with humidified air. All sensors are subjected to variations in temperature from ambient temperature and up (Clifford et al., 2005); (Fleming, 2001).
Four variable resistances are connected in series to the four sensors, placed outside the chamber, and are followed by the microcontroller, which controls and monitors the output of each sensor (Smulko, 2006). The output of the microcontroller is monitored and recorded every 20 s. Different gas concentrations are applied (100 ppm, 400 ppm, 700 ppm, 1000 ppm) at different environmental temperatures between 20 °C and 50 °C, with different load resistances for each sensor: RL = 1 kΩ, 3 kΩ, 5 kΩ, and 7 kΩ. The variable load resistance is used to control the conductivity and to increase the selectivity for each gas relative to the other gases. Sensitivity refers either to the lowest level of chemical concentration that can be detected or to the smallest increment of concentration that can be detected in the sensing environment, while selectivity refers to the ratio of the sensor's ability to detect what is of interest over its ability to detect what is not of interest (the interferents). Sensors for use in an electronic nose need partial selectivity, mimicking the responses of the olfactory receptors in the biological nose (Belhovari et al., 2006). Figure 1 shows the electronic nose gas system.
The hardware requirements for the system implementation include a PIC16F877A microcontroller with an embedded A/D converter. It was chosen for the implementation of this task due to its on-chip memory resources, as well as its high speed. The output data is transferred from the microcontroller to a PC via a serial RS-232 port at a baud rate of 2400. The software is developed in the C language and is compiled, assembled, and downloaded to the system. The output voltage of each sensor is collected, stored in memory and transferred to the microcontroller to be ready for processing; the temperature is also monitored via a temperature sensor and recorded (Ishida et al., 2005); (Weigong et al., 2006); (Smulko, 2006).
Fig. 1-a. Electronic nose for gas detection with chamber. Fig. 1-b. Electronic nose for gas detection without chamber
Measurements using the electronic nose gas detection system have been organized in three parts: a measurement part, a mathematical analysis part, and a presentation part. The system is supported by a collection of methods to improve uncertainty and reliability. Different processing techniques, such as self-calibration, self-validation and statistical analysis methods, are included. Data averaging and standard deviation calculation are used to test and evaluate the performance of the whole measuring system in order to minimize error (Morsi, 2007); (Morsi-a, 2008).
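As a minimal sketch of this data-averaging step, the following C routine computes the mean and the sample standard deviation of a series of recorded sensor voltages; the function name and interface are illustrative only:

```c
#include <math.h>

/* Mean and sample standard deviation of n recorded sensor voltages,
 * as used to evaluate the uncertainty of the measuring system. */
void sensor_stats(const double *v, int n, double *mean, double *sd)
{
    double sum = 0.0, sq = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    *mean = sum / n;
    for (int i = 0; i < n; i++)
        sq += (v[i] - *mean) * (v[i] - *mean);
    *sd = (n > 1) ? sqrt(sq / (n - 1)) : 0.0;
}
```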
Fig. 4. Methane gas with TGS 2600 at RL = 7 kΩ. Fig. 5. Methane gas with TGS 4160 at RL = 7 kΩ
Fig. 6. Hydrogen gas with TGS 3870 at RL = 7 kΩ. Fig. 7. Hydrogen gas with TGS 822 at RL = 7 kΩ
Fig. 8. Hydrogen gas with TGS 2600 at RL = 7 kΩ. Fig. 9. Hydrogen gas with TGS 4160 at RL = 7 kΩ
Figures 14, 15 and 16 show resistance fluctuations for different sensors and different gases at a constant concentration of 700 ppm (Morsi-a, 2008).
Fig. 10. Carbon dioxide gas with TGS 3870 at RL = 7 kΩ. Fig. 11. Carbon dioxide gas with TGS 822 at RL = 7 kΩ
Fig. 12. Carbon dioxide gas with TGS 2600 at RL = 7 kΩ. Fig. 13. Carbon dioxide gas with TGS 4160 at RL = 7 kΩ
Fig. 14. Methane gas at concentration 700 ppm. Fig. 15. Hydrogen gas at concentration 700 ppm
The electronic nose system used for gas detection depends on the resistance variation of the gas sensor, which gives the possibility of increasing selectivity and sensitivity. The portable and cheap electronic nose is based on commercially available gas sensors. The stability of the sensors, as well as their sensitivity, depends on the ratio between the sensor resistance in the presence and in the absence of the gas, which was experimentally characterized. Moreover, the method can reduce the number of gas sensors, limiting power consumption and maintenance costs. From Figs. 14, 15 and 16 it can be noticed that, by increasing the load resistance from 1 kΩ to 7 kΩ, the output voltage of TGS 822 and TGS 2600 is directly proportional (i.e., it increases with the load resistance), whereas TGS 3870 is inversely proportional to increasing load resistance in the case of hydrogen and carbon dioxide. TGS 4160 remains constant, with an average output voltage in the range 0.2-0.4 V, in the case of hydrogen, but in the case of carbon dioxide TGS 4160 gives a variation in the output voltage from 0.4 to 1.8 V. This variation is inversely proportional to the load resistance variation, which leads to the conclusion that this sensor is preferable for detecting carbon dioxide rather than other gases.
The results also illustrate the sensitivity of each sensor due to the resistance fluctuation. The output results include non-linear response, drift and slow response time. The main problem is the drift, which causes significant temporal variations of the sensor response when exposed to the same gas under identical conditions (Clifford et al., 2005); (Fleming, 2001). It is noticed that response times depend on many parameters, such as the material type, operating temperature, thickness of the semiconductor, variable resistance, humidity, as well as gas concentration. The sensor array reacts slowly and takes an average of 10 min to reach the stationary state. This time is a combination of the time to fill the chamber and the sensor response time. To achieve robust and fast identification of combustion gases with an array of sensors, a recent study suggested three main methods for reducing the response time:
1. Increasing the operating temperature.
2. Reducing the film thickness.
3. Using variable resistance to reduce the number of sensors and the power consumption of the system.
Linear
Y = b0 + b1x1 + b2x2 + b3x3                                            (1)
Interaction
Y = b0 + b1x1 + b2x2 + b3x3 + b12x1x2 + b13x1x3 + b23x2x3              (2)
Pure Quadratic
Y = b0 + b1x1 + b2x2 + b3x3 + b11x1^2 + b22x2^2 + b33x3^2              (3)
where
Y is the predicted concentration of each gas;
x1, x2, x3 are the voltage of each sensor, the load resistance, and the temperature, respectively;
bij is the effect of element i on element j (the interaction effect of input i with input j), with i and j taking values from 1 to 3.
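As an illustration, the full quadratic model of Table 1 (combining the interaction terms of Eq. (2) with the squared terms of Eq. (3)) can be evaluated as below; the coefficient array is a placeholder for values obtained from least-squares fitting of the recorded data:

```c
#include <math.h>

/* Full quadratic surface-response prediction for one reading.
 * x1 = sensor voltage, x2 = load resistance, x3 = temperature;
 * b[0..9] are fitted regression coefficients (placeholders here). */
double full_quadratic(const double b[10], double x1, double x2, double x3)
{
    return b[0]
         + b[1]*x1 + b[2]*x2 + b[3]*x3                 /* linear terms      */
         + b[4]*x1*x2 + b[5]*x1*x3 + b[6]*x2*x3        /* interaction terms */
         + b[7]*x1*x1 + b[8]*x2*x2 + b[9]*x3*x3;       /* quadratic terms   */
}

/* Percentage error between actual and predicted concentration. */
double pct_error(double actual, double predicted)
{
    return 100.0 * fabs(actual - predicted) / actual;
}
```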
More than 300 readings for each input are used to assess the error. The concentrations of each gas are stored in matrices, which are related to the output voltage of each sensor through a regression coefficient matrix, and the equations can be solved using surface response empirical models to predict the concentrations. Then, the percentage error between the actual concentration and the predicted concentration can be calculated to determine the best empirical modeling algorithm, i.e., the one which describes the surface response of each gas with the least error. Table 1 shows the percentage error for the different gases with the different surface response empirical modeling algorithms (Morsi-a, 2008).
The Empirical Modeling Algorithms   % Error (Hydrogen)   % Error (Methane)   % Error (Carbon Dioxide)

Sensor Type TGS 822
Linear           21.322%    13.745%     20.048%
Interactions     19.613%    0.2439%     21.175%
Pure Quadratic   17.7612%   11.2664%    17.9546%
Full Quadratic   14.2644%   0.096029%   14.426%

Sensor Type TGS 2600
Linear           18%        9.2%        2.2%
Interactions     11%        7.7%        5.5%
Pure Quadratic   30.5%      12.95%      5.8%
Full Quadratic   8%         5.5%        5.7%

Sensor Type TGS 3870
Linear           6.41%      1.66%       3.47%
Interactions     3.03%      1.1%        0.98%
Pure Quadratic   2.08%      2.3%        0.28%
Full Quadratic   1.68%      0.641%      0.02%

Sensor Type TGS 4160
Table 1. The percentage error for the different gases with the different surface response empirical modeling algorithms (Morsi-a, 2008)
It can be noticed that, for the three different gases and the four different algorithms used, the full quadratic algorithm predicts the concentration of each gas with less error than the other algorithms. From the percentage error, it is clear that the TGS 822, TGS 2600 and TGS 3870 gas sensors provide a lower error for methane than for hydrogen; these sensors are therefore preferred for detecting hydrogen and methane, but not for carbon dioxide. It is also noticed that the TGS 4160 gas sensor is preferable for carbon dioxide, whereas its highest errors are recorded in the case of hydrogen and methane.
From the above results, it can be concluded that the surface response modeling algorithms provide accurate detection of different concentrations of gases, by solving the regression matrices using the different equations and evaluating the percentage error between the actual and the predicted measurements. The key challenges in building regression algorithms are to determine the significant factors to be included in the final mathematical equation and to quantify the effects of those factors. Tables 2, 3 and 4 show the ANOVA results for each sensor with the different gases. The second column is the degrees of freedom (DF). The mean square (MS) is the variance of the data for each factor interaction. The sum of squares (SS) is computed as MS × DF. The F-statistic is defined as the ratio of the factor MS to the error MS. The P-value is the smallest significance level at which the null hypothesis can be rejected. Using the analysis of variance (ANOVA), it is possible to identify those effects that are statistically significant. It can be noticed that the variance is not constant: if the output voltage has a high value, the variance also has a high value, for the different gases. The resulting algorithm includes only those independent factors that are statistically significant (P-value < 0.05). Quantifying the main and two-way interaction effects of the independent factors is equivalent to using the well-known method of least squares fitting to compute the regression coefficients (Morsi-a, 2008).
Table 2-a. ANOVA of methane gas with the TGS 3870 gas sensor. Table 2-b. ANOVA of methane gas with the TGS 822 gas sensor. Table 2-c. ANOVA of methane gas with the TGS 2600 gas sensor. Table 2-d. ANOVA of methane gas with the TGS 4160 gas sensor
Table 3-a. ANOVA of hydrogen gas with the TGS 3870 gas sensor. Table 3-b. ANOVA of hydrogen gas with the TGS 822 gas sensor. Table 3-c. ANOVA of hydrogen gas with the TGS 2600 gas sensor. Table 3-d. ANOVA of hydrogen gas with the TGS 4160 gas sensor
Table 4-a. ANOVA of carbon dioxide gas with the TGS 3870 gas sensor. Table 4-b. ANOVA of carbon dioxide gas with the TGS 822 gas sensor. Table 4-c. ANOVA of carbon dioxide gas with the TGS 2600 gas sensor. Table 4-d. ANOVA of carbon dioxide gas with the TGS 4160 gas sensor
By increasing the load resistance, the sensitivity and conductivity decrease. The difference in the relationship among sensitivity, conductivity and load resistance between the two sensors allows the discrimination between propane and butane, depending on the variation of the resistance. For the TGS 2600 gas sensor (Figs. 25 to 28), the voltage increases with increasing concentration, temperature and load resistance for both gases.
With the TGS 4160 gas sensor (Figs. 29-32), for both butane and propane, there is no change in the output voltage with load resistance; therefore, we cannot depend on this sensor for discrimination.
The calibration of the pure gases on their semiconductor sensors predicts the correct sensor that should be used in classification. Figures 33 and 34 depict the mixture of propane and butane injected inside the chamber, with propane at a concentration of 600 ppm and butane at 400 ppm. It can be noticed that the output for the mixture is unstable, which leads to difficulties in discrimination. Neural Networks (NN) have been used extensively in applications where pattern recognition is needed. They are adaptive, capable of handling highly non-linear relationships and of generalizing solutions for a new set of data. In fact, NN do not need a predefined correspondence function, which means that there is no need for a physical model. A neuron model is the most basic information processing unit in a neural network. Depending on the problem complexity, neurons are organized in three or more layers: the input layer, the output layer and one or several hidden layers. Each neuron model receives input signals, which are multiplied by synaptic weights. An activation function transforms these signals into an output signal to the next neuron model. The Back Propagation learning algorithm is used due to its ability in pattern recognition. A sigmoid activation function was also used for two reasons: it is highly non-linear, and it has been reported to perform well when working with back propagation learning algorithms [45][46]. In order to avoid slow training, it was decided to use only three layers. During the training process, a vector from a training set (xi) representing a gas pattern is presented to the net. The winning neuron (the closest to the pattern in Euclidean distance) and its neighbours, within the neighborhood area, change their position, becoming closer to the input pattern according to the following learning rule:
Wji(new) = Wji(old) + α(t) · nb(t, d) · (xi − Wji(old))                (5)
where Wji are the weights of the neurons inside the neighborhood area, j is the index of the neuron, i is the index of the pattern, t is the time step, α(t) is the learning rate, nb(t, d) is the neighborhood function, and d is the distance between the neuron and the winner measured over the net. The learning rate and the neighborhood are monotonically decreasing functions along the training (Eberhart et al., 1996); (Wesley, 1997).
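A minimal sketch of the update in Eq. (5) is given below; the decay schedules of α(t) and nb(t, d) are left to the caller, since the text does not specify them:

```c
/* One application of the Eq. (5) learning rule: move the weight vector w of
 * a winning (or neighbouring) neuron toward input pattern x. alpha is the
 * learning rate a(t); nb is the neighbourhood factor nb(t, d). */
void som_update(double *w, const double *x, int dim, double alpha, double nb)
{
    for (int i = 0; i < dim; i++)
        w[i] += alpha * nb * (x[i] - w[i]);
}
```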
Both patterns constitute an NN training set. If NN training is slow or shows little convergence, the patterns are either poorly correlated or incorrect. This study is performed on pure butane and propane gases, and on a mixture of them. The data set is divided into two parts: the first is used to train the net, while the second is used to test it. Training patterns are chosen from different concentrations, different times and different temperatures. A large number of patterns have been selected from the extremes of the concentration range and from the initial parts of the dynamic response, to give a larger weight to the early, more difficult cases. The Neural Network is constructed as a feed-forward back propagation network composed of three layers: input, hidden and output. The input layer has three neurons corresponding to the output voltage of each sensor (x1), the temperature (x2) and the variable resistance (x3).
The hidden layer has five neurons, which use the tansig transfer function, a hyperbolic tangent sigmoid used to calculate the layer's output from its net input. One hidden layer with 5 neurons is used, as it gives the least mean square error (MSE) between the actual and the predicted data. The output layer has one neuron, corresponding to the concentration of the gas. The predicted performance metric, y, given by the neural network model is as follows:

y = purelin( Σi=1..5 w2i1 · tansig( Σj=1..3 w1ji xj + θ1i ) + θ21 )    (6)
where xj is the input of node j in the input layer; w1ji is the weight between node j in the input layer and node i in the hidden layer; θ1i is the bias of node i in the hidden layer; w2i1 is the weight between node i in the hidden layer and the node in the output layer; and θ21 is the bias of the node in the output layer. The numbers 5 and 3 are the numbers of nodes in the hidden layer and in the input layer, respectively. Using a simple linear transformation, all performance data are scaled to values between -1 and 1. Scaling is performed for two reasons: to provide commensurate data ranges, so that the regressions are not dominated by any variable that happens to be expressed in large numbers, and to avoid the asymptotes of the sigmoid function. During the training, the weights of the neural network are iteratively adjusted to minimize the network performance function, the MSE.
performance function MSE. The validation set is used to stop training early if the network
performance on the validation set fails to improve. Test set is used to provide an
independent assessment of the model predictive ability. The percentage error is important;
100% error means a zero prediction accuracy and error close to 0% means an increasing
prediction accuracy. For the proposed Neural Network model, the percentage error is found
to be 0.662%, 0.031%, 0.162%, 1.5% for sensors TGS 822, TGS 3870, TGS 2600 and TGS 4160,
respectively. MLP provides an optimized structure which provides linear discrimination
between both gases. Figs 35, 36, 37, and 38 depict the MLP results in separating butane and
propane gas by using TGS 822, TGS 3870 , TGS 2600 and TGS 4160 sensors respectively. The
sign circle indicates butane gas where as the sign plus indicates propane gas. Table 5 depicts
the results of classification for different gases among the different semiconductor sensors.
From the presented results, TGS 3870, TGS 2600 and TGS 822 gas sensor can discriminate
both gases rather than TGS 4160 which does not give different response with different
conditions. This conclusion is obtained from the multiplayer perception of Neural Network.
The Neural Network model does not yield a mathematical equation that can be
manipulated. However, its strength lies in its ability to accurately predict system
performance over the entire design space and its ability to compensate for the inherent
information inadequacy by requiring large and well spread training sets. Neural Network
with feed forward back propagation is used to detect the concentration of each gas. MLP is
able to separate between the mixtures with linear discrimination. TGS 3870 gives the
optimum classification with a percentage error of 0.031%, then, TGS 2600 gas sensor can be
classified between them with a percentage error 0.162%, then, TGS 822 gas sensor gives
percentage error of classification 0.662%. However, TGS 4160 gas sensor failed to
discriminate both gases and gives a percentage error of 40%.
The Neural Network with the MLP method is very robust against sensor nonlinearities and time effects, with an error rate depending on the selection of the data acquired by the different semiconductor gas sensors (Morsi-a, 2008).
Fig. 17. Butane with TGS 822 at RL = 1 kΩ. Fig. 18. Butane with TGS 822 at RL = 7 kΩ
Fig. 19. Propane with TGS 822 at RL = 1 kΩ. Fig. 20. Propane with TGS 822 at RL = 7 kΩ
Fig. 21. Butane with TGS 3870 at RL = 1 kΩ. Fig. 22. Butane with TGS 3870 at RL = 7 kΩ
Fig. 23. Propane with TGS 3870 at RL = 1 kΩ. Fig. 24. Propane with TGS 3870 at RL = 7 kΩ
Fig. 25. Butane with TGS 2600 at RL = 1 kΩ. Fig. 26. Butane with TGS 2600 at RL = 7 kΩ
Fig. 27. Propane with TGS 2600 at RL = 1 kΩ. Fig. 28. Propane with TGS 2600 at RL = 7 kΩ
Fig. 29. Butane with TGS 4160 at RL = 1 kΩ. Fig. 30. Butane with TGS 4160 at RL = 7 kΩ
Fig. 31. Propane with TGS 4160 at RL = 1 kΩ. Fig. 32. Propane with TGS 4160 at RL = 7 kΩ
Fig. 33. Mixture of propane and butane at RL = 1 kΩ. Fig. 34. Mixture of propane and butane at RL = 7 kΩ
Fig. 35. Separation between propane and butane using NN (MLP) with TGS 822 (vectors to be classified, P(1) vs. P(2))
Fig. 36. Separation between propane and butane using NN (MLP) with TGS 3870 (vectors to be classified, P(1) vs. P(2))
Fig. 37. Separation between propane and butane using NN (MLP) with TGS 2600 (vectors to be classified, P(1) vs. P(2))
Fig. 38. Separation between propane and butane using NN (MLP) with TGS 4160 (vectors to be classified, P(1) vs. P(2))
Sensors    No. of inputs     No. of outputs    Test        Epochs   % Error   % Classification   % Unclassification
TGS 822    3 (100 samples)   1 (100 samples)   30 samples  1000     0.662%    60%                40%
TGS 3870   3 (100 samples)   1 (100 samples)   30 samples  1000     0.031%    97%                3%
TGS 2600   3 (100 samples)   1 (100 samples)   30 samples  1000     0.162%    96%                14%
TGS 4160   3 (100 samples)   1 (100 samples)   30 samples  1000     1.5%      40%                60%
Table 5. The results of classification using Neural Networks (MLP) for the four different sensors
Step 1: Identify and name the input linguistic variables, the output linguistic variable and their numerical ranges. There are three input variables: the temperature, the output voltage read by the microcontroller, and the variable resistance related to each sensor. The output variable is the concentration of each gas. There are five identified ranges for each variable (Morsi, 2007).
Step 2: Define a set of fuzzy membership functions for each of the input and output variables. The low and high values are used to define triangular membership functions. The height of each function is one, and the function bounds do not exceed the high and low ranges listed above for each range. The membership functions must cover the dynamic ranges related to the minimum and maximum values of the inputs and outputs that represent the universe of discourse.
Step 3: Construct the rule base that will govern the controller's operation. The rule base is represented as a matrix of input and output variables; each matrix row associates different input variable ranges with one of the output variable ranges. All rules are activated and fired in parallel, whether they are relevant or not, and duplicate rules are removed to conserve computing time. Each rule is defined by ANDing together the inputs to produce each individual output response. For example: IF temperature is low AND voltage is low AND RL is low, THEN concentration is low.
Step 4: The control actions are combined to form the excited interface, as sketched below. The most common rule combination method is centroid defuzzification, which yields the crisp output value. This step is a repeated process; after all adjustments are made, the fuzzy expert system is able to discriminate and classify the data set patterns of the different gases.
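A minimal sketch of the triangular membership functions of Step 2 and the centroid defuzzification of Step 4 is given below; the breakpoints and the sampled output universe are placeholders for the ranges identified in Step 1:

```c
/* Triangular membership function with feet at a and c and peak of 1 at b. */
double tri_mf(double x, double a, double b, double c)
{
    if (x <= a || x >= c) return 0.0;
    return (x < b) ? (x - a) / (b - a) : (c - x) / (c - b);
}

/* Centroid defuzzification: crisp output from the aggregated membership
 * values mu[] sampled at points u[] of the output universe of discourse. */
double centroid(const double *u, const double *mu, int n)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; i++) {
        num += u[i] * mu[i];
        den += mu[i];
    }
    return den > 0.0 ? num / den : 0.0;
}
```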
Fig. 40. Fuzzy membership functions with three input variables (Temperature, variable
resistance, voltage of each sensor) and output variable (concentration).
A Mamdani-type fuzzy controller is used to construct the rules, which are extracted from the data delivered by the microcontroller. It can be used to discriminate and classify the data set patterns of the different gases according to the variation of different parameters, such as gas concentration, sensor resistance and the output voltage of the microcontroller at different temperatures, and to improve the sensors' selectivity for gas identification.
Due to the abundant number of membership function figures, the results are limited to representing the fuzzy logic output surface for each sensor. Fuzzy logic gives on-line prediction of the concentration depending on the behavior of each gas with the different sensors, as extracted from the experiments. The feature of each gas is detected based on the fuzzy system. The input and output surfaces of the fuzzy inference system are illustrated in Figs. 41-60 (Morsi, 2007).
Fig. 41. The behavior of TGS 822 output voltage with butane gas. Fig. 42. The behavior of TGS 822 output voltage with carbon dioxide. Fig. 46. The behavior of TGS 2600 output voltage with butane gas. Fig. 47. The behavior of TGS 2600 output voltage with carbon dioxide
7. Conclusion
The large scale of data is able to provide high-level information for making decisions about each gas, achieved by using efficient and affordable sensor systems that show autonomous and intelligent capabilities. Measurements have been done using an electronic nose design based on commercially available gas sensors. The enormous amount of data collected allows analyzing and characterizing different gases. Several tests have been carried out by eliminating from the complete time series a subset of data in a random way, in order to evaluate the reliability of the developed model. The data set is characterized by several problems which can generate errors during the processing and prediction phases, including:
Missing data, caused by the periodic setting or the stop function of instruments.
Incorrectly recorded data, caused by errors in transmission, recording and non-setting of equipment.
Some interpolation and validation techniques to address and solve the above problems depend on building a mathematical model, based on suitable regression analysis and interpolation, which is able to describe the behaviour of the daily average concentration. The aim of modeling in gas studies is to describe the peculiar characteristics of gases, such as alarm situations or risk events. The electronic nose provides low cost, low maintenance, small size and, in some cases, low power consumption. The system can handle problems such as sensor drift, noise and non-linearity. It is used as a simple alarm level based on index data. This could be used as a stepping stone for the development of more complex systems, which are needed for more demanding applications.
8. Acknowledgement
I would like to express my sincere gratitude to Prof. Dr. Moustafa Hussein, Professor of Optical Communications and OSA member, Vice Dean for Education and Research Affairs, College of Engineering and Technology, Arab Academy for Science, Technology and Maritime Transport (e-mail: [email protected]), for his tolerance in revising the chapter, as well as for his endless guidance and support.
9. References
Amigoni, F.; Brandolini, A.; Caglioti, V.; Lecce, V.; Guerriero, A.; Lazzaroni, M.; Lombardi,
F.; Ottoboni, R.; Pasero, E.; Piuri, V.; Scotti, O. & Somenzi, D. (2006). Agencies for
perception in environmental monitoring. IEEE Transactions on Instrumentation and
Measurement, Vol. 55. No.4. (august 2006), pp. (1038 – 1050), DOI 10.1109/ TIM,
2006. 877747.
Becker, I.; Mihlberger, S.; Bosch, C.; Braunmuhl, V., Muller, G.; Meckes, A. & Benecke, W.
(2000). Gas mixture analysis using Micro – reactor systems. IEEE Journal of micro
electromechanical systems, Vol.9, No.4, (december 2000), pp. (478 – 484), DOI S1057 –
7157 (00) 10867-4.
Bourgeois, W.; Romain, A.; Nicolas, J. & Stuetz, R. (2003). The use of sensors arrays for
environmental monitoring: interests and limitations. J. Environmental, Monitoring,
Vol.5, (october 2003), pp (852 – 860), ISSN: 10. 1039 / b 307905h.
Belhovari, S.; Bermak, A.; Wei, G. & Chan, P. (2004), Gas identification algorithms for
microelectronic gas sensor, Proceeding of instrumentation and Measurement
Technology, pp. 584 – 587, ISBN 0-7803 – 8248 Como, (may 2004), IMTC 2004, Italy.
Belhovari, S.; Shi, M.; Bermark, A. & Chan, P. (2005). Fast and robust gas identification
system using an integrated gas sensor technology and gaussian mixture models.
IEEE Sensors Journal, Vol.5, No.6, (december – 2005), pp. (1433 – 1444), DOI
510.1109/ JSEN, 2005. 858926.
Belhovari. S.; Shi, M.; Bermark, A. & Chan, P. (2006). Gas identification based on committee
machine for micro electronic gas sensor. IEEE Transactions on Instrumentation and
measurement, Vol. 55, No.5, (october 2006), pp (1786 – 1793), DOI S10.1109/TIM.
2006. 880956.
Clifford, K.H.; Robinson, A.; David, R. & Mary, J. (2005) Overview of sensors and needs for
environmental monitoring. Sensors, Vol. 5, No. 28, (february 2005), pp. (4-37), ISSN
1424 – 8220.
ISBN 978-1-4244-1766-7, Orlando, (November 2008), IEEE Industrial Electronics Society, Florida.
http://www.figarosensor.com (on line).
Smulko, J. (2006). The measurement setup for gas detection by resistance fluctuations of gas
sensors, Proceeding on Instrumentation and Measurement Technology, pp. 2028 – 2031,
ISBN 0-7803, 9360, 0106, Sorrento, (april 2006), IEEE instrumentation and
measurement society, Italy.
Timothy, J. (1995). Fuzzy Logic with Engineering Applications, McGraw- Hill INC, ISBN 0-07-
053917-0, New York.
Tanaka, K. (1996). An Introduction to Fuzzy Logic for Practical Applications, Springer, ISBN 0-
387-94804-4, New York.
Wesley, J. (1997). Fuzzy and Neural Approaches in Engineering, John Wiley & Sons INC, ISBN
0-471-19247-3, 1, USA.
Wilson, D.; Hoyt, S.; Janata, H.; Booksh, K. & Obando, L. (2001). Chemical sensors for
portable handheld field instruments. IEEE Sensors Journal, Vol.1, No.4 (december
2001), pp. (256-274), ISSN 1530 – 437X (01) 11120 – 6.
Weigong, J.; Chen, Q., Reu, M.; Liu, N. & Daoust, C. (2006). Temperature feedback control
for improving the stability of a semiconductor metal oxide (SMO) gas sensor, IEEE
Sensor Journal, Vol. 6, No.1, (february 2006), pp. 139 – 145, ISSN 10.1109/ JSEN,
2005. 844353.
Zylka, P. & Mazurek, B. (2002). Rapid dissolved gas analysis by means of electrochemical
gas sensor, Proceeding of 14th International conference on Dielectric Liquids, pp. 325 –
328, ISBN 0-7803 – 7350 – 2102, Graz, (july 2002), ICDL 2002, Austria.
12
Medical Image Intelligent Access Integrated with Electronic Medical Records System for Brain Degenerative Disease
Singapore; Faculty of Sciences, University of Besancon, France; National Taiwan University, Taipei, Taiwan
*Email: [email protected]
1. Introduction
As computer data storage capacity increases and the technology of digital imaging progresses rapidly, today we can access and manipulate massive image databases on the Internet. How to search and access content-based image databases intelligently has become a prominent focus in the field of multimedia research. In this chapter, to support diagnostic decision making, a new idea of a grid-distributed, contextual, intelligent information access framework for medical image databases, making use of specific visual medical knowledge, is integrated with radiology reports and clinical department information. It will assist in reducing the human, legal and financial consequences of medical errors.
A report by the American Hospital Association suggests that US hospitals use only 1.5%-2.5% of their hospital budgets on data information systems, which is less than the 5-10% of similar funds dedicated to such systems in the budgets of other industries. Moreover, as computer data storage capacity increases and the technology involving digital imaging progresses rapidly, the traditional image has been replaced by Computed Radiography (CR) or Digital Radiography (DR) derived imagery, Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and Digital Subtraction Angiography (DSA). This advance reduces the space and the cost associated with X-ray films and speeds up hospital management procedures, also improving the quality of the medical imagery-based diagnosis. American hospitals, in particular, often consult image diagnostic professionals from India via the Internet due to budget considerations. The clinical information gathered by this study will alleviate the budget issue, because tele-consultation regarding image diagnostics will then be feasible.
In a current medical image database system, image retrieval can be performed by using keywords to search pre-interpreted reports. However, the characteristics of free-text reports may compromise the effort to clarify low-level features, such as color, shape and texture, medium-level features described by collections and spatial relationships, or the highest-level abstract features, which are semantic or contextual. Content-based image retrieval (CBIR) started in the 1990s [1] and grew steadily over the following ten years [2]. There is growing interest in CBIR because of the limitations inherent in metadata-based systems. Textual information about images can be easily searched using existing technology, but it requires humans to personally describe every image in the database. A CBIR query uses a lot of computer power to process, but it can economize a large amount of manpower. Therefore, content-based retrieval of medical images has also become a significant field for medical assistance, medical education and medical research. At the clinic, when a physician diagnoses a patient through the patient's medical images, he has to enter the PACS and the RIS, and he makes the diagnosis in yet another system, the EMR (Electronic Medical Records) [3].
In Taiwan, the National Taiwan University Hospital (NTUH), which is traditionally the leading medical center, has recently adopted a complete health recording system, including an Electronic Medical Record (EMR) system linked to the Hospital Information System (HIS), the Radiology Information System (RIS), the Picture Archiving and Communication System (PACS), and other clinical information systems. National Taiwan University (NTU), one of the research partners of the ONtology and COntext related MEdical image Distributed Intelligent Access (ONCO-MEDIA) project, is leading a study of the integration between Content-Based medical Image Retrieval (CBIR) and EMR systems, as well as of its implications in providing improved clinical diagnostic and therapeutic decision support.
Dementia is the loss of mental functions, such as thinking, memory, and reasoning, that is severe enough to interfere with a person's daily functioning. Dementia is not a disease itself, but rather a group of symptoms that are caused by various diseases or conditions. Symptoms can also include changes in personality, mood, and behavior. In some cases, the dementia can be treated and cured because the cause is treatable. Dementia develops when the parts of the brain that are involved with learning, memory, decision-making, and language are affected by one or more of a variety of infections or diseases. The most common cause of dementia is Alzheimer's disease, but there are as many as 50 other known causes; most of these causes are very rare [5]. People with dementia may not be able to think well enough to do normal activities, such as getting dressed or eating. They may lose their ability to solve problems or control their emotions. Their personalities may change. They may become agitated or see things that are not there [6]. Dementia has become more and more prevalent in recent years. In the United States, there are approximately 5 million people suffering from dementia, and that number is projected to rise above 16 million by the year 2050. Presently, Americans pay US$5000 per patient per year for dementia medication and associated nursing care costs [9] [10].
In Taiwan, there were approximately 140 thousand people suffering from dementia in 2005, and there will be 650 thousand dementia cases by the year 2050. The MMSE (mini-mental state examination) is commonly used in medicine to screen for dementia. The MMSE is a brief 30-point questionnaire test that is used to screen for cognitive impairment. It is also used to estimate the severity of cognitive impairment at a given point in time and to follow the course of cognitive changes in an individual over time, thus making it an effective way to document an individual's response to treatment.
Since dementia, as a disease, is an important and long-term problem that causes significant burdens for families and societies, this study has endeavored to find a viable procedure to ameliorate the treatment of dementia patients and to enhance the early diagnosis and monitoring of its progression [11].
In this chapter, we chose dementia as a pathology model in order to elaborate a prototype system for CBIR integrated with the EMR. Dementia is a neurological disease, usually with a long clinical course, and it presents a variety of characteristic abnormalities in brain imagery such as CT (computed tomography), MRI (magnetic resonance imaging) or PET (positron emission tomography). In the course of the treatment, a doctor may need a series of images to make the proper diagnosis or to make critical decisions on therapeutic strategies. Therefore, an image database infused with clinical information could become a major component in the improvement of dementia patient care. In addition, such care usually requires intensive collaboration among neurologists, radiologists and other clinical specialties. This chapter describes a novel concept of Medical Image Distributed Intelligent Access Integrated with Electronic Medical Records, which is expected to enhance the early diagnosis and the monitoring of disease progress.
2. Methods
Integration of RIS, EMR, PACS and Clinics to Support Diagnosis
At present, most hospitals store the medical images from CT, MRI, DSA and X-ray film in
PACS. The clinicians make the differential diagnosis of a patient in the EMR system with reference to laboratory results and image reports. Therefore, we have to provide the essential
information from EMR, PACS and RIS to clinicians, such as neurologists, to support their
decision. On the other hand, in the department of medical imaging, the radiologists also
need to refer to the medical information recorded by other specialties to interpret medical images for their reports on the RIS [7]. The image reports by the radiologists could assist the
clinicians to make correct diagnoses; however, the correct image interpretation also depends
on the crucial medical information that clinical doctors must input. This co-dependence
demonstrates the need for two-way communications between imaging professionals such as
radiologists and their clinical counterparts that are treating the patients. Therefore, in this
study, the integration of EMR, RIS and PACS and clinics input was implemented to
establish a prototype model for intelligent access to the medical image database, with automatic retrieval of clinical information (Fig. 1). The integrated user interface used the ICD-9 (International Classification of Disease, Ninth Revision) code 331.0 to identify and query the related medical information and medical images from HIS, PACS, or RIS. Therefore, the important thing is to define what the essential clinical information is.
Inter-disciplinary collaboration
This process relied on inter-disciplinary collaboration among neurology, medical informatics, and radiology experts. In this study, a prototype model was designed to assist the procedure using the physician-in-the-loop approach [4]. In this prototype model, a group of neurologists and radiologists collaborated to establish a common language for image diagnosis. The integration of the EMR with the RIS and PACS was based on the diagnostic criteria and practice guidelines, so as to satisfy the diagnostic procedure and its requirements. Moreover, the ontology of dementia was established in accordance with the essential clinical information of dementia. The process flow chart is shown in Fig. 2.
3. Results
Essential Clinical Information
After the inter-disciplinary collaboration, we arrived at the final definition of the essential clinical information used when a neurologist diagnoses dementia in a clinic. In the model, the essential clinical information of dementia is summarized as follows:
A. Base information
    a. Sex
    b. Age
    c. Country
    d. Residency
    e. Education (yr)
    f. Occupation (pre-retirement work)
    g. Language (dialect)
B. Clinical history
    a. Handedness
    b. Age of onset
    c. Initial symptom sequence (multi-choice): i. memory, ii. personality, iii. language, iv. gait, v. bradykinesia
    d. Course: i. rapid progression (< 1 year), ii. chronic progression, iii. stepwise, iv. fluctuating
    e. Risk factors (multi-choice): i. CVA, ii. HTN, iii. DM, iv. cardiac disease, v. hyperlipidemia, vi. obesity, vii. physical inactivity, viii. vegetarian, ix. parkinsonism, x. family history
C. Clinical diagnosis
    a. Normal
    b. MCI (Mild Cognitive Impairment)
    c. AD (Alzheimer's Disease)
    d. VaD (Vascular Dementia)
    e. Mixed type
    f. FTD (Frontotemporal Dementia)
    g. DLB (Dementia with Lewy Bodies)
    h. Dementia, other types
D. Lab
    a. CBC
    b. Electrolytes
    c. BUN/Cre
    d. GOT/GPT
    e. T4/TSH
    f. B12/folate
    g. Lipid profile
    h. VDRL
    i. Hachinski ischemic score
    j. MMSE
    k. CDR
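For illustration, this data schema could be mirrored in software roughly as in the following minimal C++ sketch; all type and field names are our own assumptions for the example, not the actual NTUH EMR schema.

    #include <string>
    #include <vector>

    // Hypothetical types mirroring the essential clinical information of dementia.
    enum class ClinicalDiagnosis { Normal, MCI, AD, VaD, Mixed, FTD, DLB, Other };
    enum class Course { Rapid, Chronic, Stepwise, Fluctuating };

    struct BaseInformation {
        std::string sex, country, residency, occupation, language;
        int age = 0;
        int educationYears = 0;
    };

    struct ClinicalHistory {
        std::string handedness;
        int ageOfOnset = 0;
        std::vector<std::string> initialSymptoms; // memory, personality, language, gait, bradykinesia
        Course course = Course::Chronic;
        std::vector<std::string> riskFactors;     // CVA, HTN, DM, cardiac disease, ...
    };

    struct LabResults {     // only a few of the listed tests, for illustration
        int mmse = 0;       // mini-mental state examination score (0-30)
        double cdr = 0.0;   // clinical dementia rating
        int hachinski = 0;  // Hachinski ischemic score
    };

    struct EssentialClinicalInformation {
        BaseInformation base;
        ClinicalHistory history;
        ClinicalDiagnosis diagnosis = ClinicalDiagnosis::Normal;
        LabResults labs;
    };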
The prototype system is designed to recognize a positive dementia diagnosis and to reconfigure its patient information presentation accordingly. The International Statistical Classification of Diseases and Related Health Problems (ICD) provides codes to classify diseases and a wide variety of signs, symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease [14]. The ICD-9 (International
Classification of Disease, Ninth Revision) was published by the WHO in 1977 [15]. The EMR of National Taiwan University uses ICD-9. Each ICD-9 code has a precise meaning, and as a whole the classification covers practically all known symptoms. ICD-9 codes 290-319 denote mental disorders, the code for dementia being 290; codes 320-359 denote diseases of the nervous system, with code 331.0 denoting Alzheimer's disease. In the diagnosis flow of the model used in this research, the input of the ICD-9 code 331.0 for Alzheimer's disease will alter the displayed patient information, such as base information, clinical history, lab results, and medical images (Fig. 3).
Fig. 3. Diagnosis flow chart: a patient in the neurology clinic becomes a research case coded ICD 331.0 or ICD 290.XX; the flow loops until the diagnosis is finished, then ends.
On the other hand, if dementia has not been diagnosed yet, the neurologists would still like to know the patient's base information, clinical history, and lab results. Using this clinical history, the neurologists may record the initial symptom sequence, course, and risk factors, which will also support an early diagnosis in the future (Fig. 3).
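As a rough sketch of this branching behavior (the helper functions below are hypothetical stand-ins, not the actual EMR, PACS, or RIS interfaces):

    #include <iostream>
    #include <string>

    // Hypothetical display helpers standing in for the integrated user interface.
    void showBaseInfo()             { std::cout << "base information\n"; }
    void showClinicalHistory()      { std::cout << "clinical history\n"; }
    void showLabResults()           { std::cout << "lab results\n"; }
    void retrieveImagesAndReports() { std::cout << "images and radiology reports\n"; }

    void onDiagnosisCode(const std::string& icd9) {
        // Base information, clinical history and lab results are shown in
        // either case, since they support an early diagnosis later on.
        showBaseInfo();
        showClinicalHistory();
        showLabResults();
        // 331.0 (Alzheimer's disease) or a 290.xx dementia code also triggers
        // the automatic retrieval of the related images and image reports.
        if (icd9 == "331.0" || icd9.rfind("290", 0) == 0)
            retrieveImagesAndReports();
    }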
With the established prototype model, the data schema of the essential clinical information of dementia will be integrated into the EMR user interface of the clinical department in the EMR system of NTUH. Clinical doctors can input the essential clinical information, such as base information, clinical history, and clinical diagnosis, upon initiation of a dementia diagnosis, and the system will present a prompt with a pop-up menu for easy data entry. At the same time, the prototype model will automatically retrieve the related medical images and the image reports written by the radiologists in order to support the diagnosis (Fig. 4).
Moreover, when the radiologists get images from PACS with a differential diagnosis of dementia, the RIS can simultaneously and automatically retrieve the essential clinical information related to specific image characteristics defined by a consensus of the neurology and radiology experts (Fig. 5). In this way, both user interfaces are designed to open a pop-up window with a pull-down bar when the clinician or radiologist is ready to input data into the system (Figs 4-5) [8].
4. Discussion
In this study, the intelligent information access framework for medical image databases was
designed to integrate radiological reports and clinical information. The most important
concern in this approach is the interdisciplinary collaboration among neurology, medical
informatics and radiology experts. The second important concern is the implementation within the critical, service-oriented hospital information system. Therefore, we will test the system with the physician-in-the-loop approach to enhance diagnosis in practice, and revise the system accordingly.
Moreover, we focused on decision support for dementia diagnosis, teaching, and research. We would therefore retrieve the images of similar patients via a medical grid by querying keywords from the base information, clinical history, clinical diagnosis, and lab results, obtaining a newly retrieved image database for those patients. We will then analyze the correlations among their medical images by image processing, together with the relevance of the clinical data to the images, in order to assist diagnosis, research, and teaching.
Security and privacy are also very important issues in this field of research. First, we must build trusted electronic relationships between healthcare customers, employees, businesses, trading partners, and stakeholders. Therefore, when we use a patient's anamnesis, examination data, or medical images, whether we have the patient's permission or not, there must be a set of procedures to follow. We plan to incorporate security and privacy functions into the system in the future.
The implementation of this prototype system must be well organized, with the initial testing done on an offline system. The clinical data could be backed up and copied to a separate system for the purposes of the trial. When the first prototype model becomes fully
implemented, the system could be expanded by adding other neurodegenerative diseases
one by one to enhance the power and comprehensiveness of the intelligent retrieval for
clinical practice.
In the future, the system will be continuously developed to extend its spectrum of diseases
and clinical specialties.
5. Acknowledgements
This research work was partially supported by the u-Hospital project at National Taiwan University, funded by the Ministry of Education and the National Science Council (95R0062-AE00-05), Taiwan, and by the ICT-Asia ONCO-MEDIA project, France.
6. References
[1] H. Muller, N. Michoux, D. Bandon, A. Geissbuhler, “A Review of Content-Based Image Retrieval Systems in Medical Applications”, International Journal of Medical Informatics, vol. 73, page(s): 1-23, 2004.
[2] W. Niblack, R. Barber, W. Equitz, M. D. Flickner, E. H. Glasman, D. Petkovic, P. Yanker, C.
Faloutsos, and G. Taubin, QBIC project: querying images by content, using color,
texture, and shape, in: W. Niblack (Ed.), Storage and Retrieval for Image and Video
Databases, Vol. 1908 of SPIE Proceedings, pp. 173-187, 1993.
[3] Lei Zheng; Wetzel, A.W.; Gilbertson, J.; Becich, M.J.; “Design and analysis of a content-based pathology image retrieval system”, IEEE Trans. Information Technology in Biomedicine, vol. 7, no. 4, Dec. 2003, page(s): 240-255.
[4] Kak, A.; Pavlopoulou, C.; “Content-based image retrieval from large medical databases”,
Proceedings of the First International Symposium on 3D Data Processing
Visualization and Transmission, 19-21 June 2002 Page(s):138 – 147.
[5] Joseph R. Carcione, DO, MBA (January 01, 2007), “Alzheimer's Disease Guide”,
http://www.webmd.com/alzheimers/guide/alzheimers-dementia
[6] “Medline Plus: Dementia”, http://www.nlm.nih.gov/medlineplus/dementia.html
[7] Bueno, J.M.; Chino, F.; Traina, A.J.M.; Traina, C., Jr.; Azevedo-Marques, P.M., “How to
add content-based image retrieval capability in a PACS”, Proceedings of the 15th
IEEE Symposium on Computer-Based Medical Systems, 4-7 June 2002, page(s):321 –
326.
[8] Traina, A.; Rosa, N.A.; Traina, C., Jr., “Integrating images to patient electronic medical
records through content-based retrieval techniques”, Proceedings of the 16th IEEE
Symposium on Computer-Based Medical Systems, 26-27 June 2003, page(s):163 –
168.
[9] “1.7m 'will have dementia by 2051'”, http://news.bbc.co.uk/1/hi/health/6389977.stm, 27
February 2007, BBC NEWS.
[10] “Report of Alzheimer's Disease Facts and Figures 2007”, Alzheimer's Association, http://www.alz.org/
[11] “Taiwan dementia policy advocacy 2006”, Taiwan Alzheimer's Disease Association,
http://www.tada2002.org.tw/index.html
[12] http://www.nlm.nih.gov/research/umls/
[13] Le Thi Hoang Diem; Chevallet, J.-P.; Dong Thi Bich Thuy; “Thesaurus-based query and
document expansion in conceptual indexing with UMLS: Application in medical
information retrieval”, Research, Innovation and Vision for the Future, 2007 IEEE
International Conference on 5-9 March 2007 Page(s):242 - 246
[14] http://www.cdc.gov/nchs/about/major/dvs/icd9des.htm
[15] Yao-Yang Shieh; Shaw, B.C., Jr.; Roberson, G.H.; “Internet Web-based teaching for
cross-sectional anatomy and radiological imaging”, Computer-Based Medical
Systems, 1998. Proceedings. 11th IEEE Symposium, 12-14 June 1998 Page(s):192 - 197
13
Use of RFID tags for data storage on quality control in cheese industries
1. Introduction
The current laws regulating food safety in the European Union lay down the principles and general requirements of food legislation. In Article 3 of Regulation 178/2002, traceability is defined as the “ability to trace and follow a food, feed, food-producing animal or substance intended to be, or expected to be incorporated into a food or feed, through all stages of production, processing and distribution”. In addition, “the food traceability must be guaranteed in all the stages described above: production, transformation and distribution”, which implies the obligation of being able to identify every product in the company, providing complete information about it (Regulation 178, 2002).
Depending on the activity in the food chain, three different types of traceability can be distinguished (a rough data-model sketch follows the list):
1) Back traceability, also called “suppliers traceability”, refers to the possibility of knowing what products are coming into the company, where they are coming from, and which farmers are their suppliers.
2) Internal traceability or “process traceability” refers to the information about what is made, from what it is made, how and when it is made, and the identification of the product.
3) Forward traceability or “client traceability” means the possibility of knowing what products the company delivers, and when and to whom they have been supplied (Coscarón et al. 2007; Spanish Agency for Food Security, 2004).
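As a rough illustration of how the three views could be modelled in software (all names here are invented for the sketch; the regulation does not prescribe any data format):

    #include <string>
    #include <vector>

    struct BackTraceability {            // "suppliers traceability"
        std::string incomingProduct;     // what is coming into the company
        std::string origin;              // where it is coming from
        std::string supplierFarmer;      // which farmer supplied it
    };

    struct InternalTraceability {        // "process traceability"
        std::string product;             // what is made
        std::vector<std::string> inputs; // from what it is made
        std::string processAndDate;      // how and when it is made
        std::string productId;           // identification of the product
    };

    struct ForwardTraceability {         // "client traceability"
        std::string deliveredProduct;    // what the company delivers
        std::string deliveryDate;        // when it was supplied
        std::string client;              // to whom it was supplied
    };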
Although the law imposes a generic obligation of traceability, it does not specify how companies should achieve that goal (Decree 1808, 1991; Decree 217, 2004).
Nowadays, the cheese production sector has no procedure that exhaustively guarantees proper traceability throughout all the fabrication stages. The main problem is cheese ripening, done in special chambers as shown in Figure 1. The surrounding conditions in these rooms, such as humidity, temperature, product handling (turning), and mould and flora growth, prevent the individual labeling of the products. For this reason, production and quality control are performed by batches, and the storage of the information is done manually by the company staff.
To give an idea, the “Ibores” cheese (a Spanish Southwest P.D.O.) has to be made with milk of mountain goats, with a preservation temperature below 6 ºC, a minimum protein content of 6%, fat around 4%, pH higher than 6.5, and so on. One of the elaboration steps is the salting, which can be wet or dry, using only sodium chloride. In the case of wet salting, the maximum stay time shall be 24 hours in a saline solution with a maximum concentration of 20%. The ripening time should be at least sixty days, the finished product being cylindrical with a height from 6 to 9 cm, a diameter between 12 and 15 cm, and a weight from 750 to 1200 g. In the most traditional presentation, the cheese is coated with paprika or smeared with olive oil, diverse colorations appearing due to different moulds (Order 25/4, 1997). This is only a brief summary of the conditions that “Ibores” cheeses should meet in order to qualify as products with P.D.O.
The agency responsible for accrediting the P.D.O. is the Regulatory Board. Usually a technical expert visits the cheese industry, requesting from the manufacturer a collection of data corresponding to the elaboration process in order to certify, if appropriate, the guarantee of origin and quality. Besides, the Regulatory Boards of P.D.O. are recognized as product certification entities according to the European Standard EN 45011. This regulation shows in detail the criteria to be applied by these entities to perform a reliable and transparent certification of products. For instance, the document that details the criteria to be evaluated by the technical staff of the Regulatory Board of the P.D.O. “Torta del Casar”, another Spanish Southwest P.D.O. (Order 11/1, 1999; Order 9/10, 2001; Order 6/5, 2002; Regulation 1491, 2003), states that the Regulatory Board lays down a control based on sampling and analysis of milk and cheese. The results are used to develop a set of measurable data which serves as the basis for the accreditation process and to verify the adequacy of the dairy industry to the certification system. This system, in fact, is based on a self-control procedure that includes several actions to be done by the person in charge of quality
control in the company. These actions must be the periodic checking, as determined by the regulations, of the physical, physicochemical, microbiological, and organoleptic parameters of the batches to be issued to the market, keeping a record of the results (CC/3, 2005; CC/4, 2007).
As is clear from the above, it is desirable that the cheese industry and the Regulatory Board of the P.D.O. interact effectively and efficiently so that the final product reaches the consumer with a full guarantee of its origin, manufacturing process, and quality. Therefore, the main objective here is to join the efforts of the dairy companies, in terms of traceability, and of the Regulatory Boards of P.D.O., with respect to their quality systems for product certification. In this way, the Company-Regulatory Board binomial would have the necessary tools (hardware and software) to enable compliance with the current legislation regarding traceability, as well as continuous quality improvement thanks to the optimization of the technological process according to the information obtained. The key points here are the speed of getting the required information, the time saved, and the simplification of tasks that the implemented tool provides compared with the manual data records usually kept by both the employees of the cheese industry and the technical staff of the Regulatory Board of the P.D.O. in charge of quality certification.
To achieve this goal, a system based on the use of Radio Frequency IDentification (RFID) has been developed. RFID is a contactless method of data transfer for object identification; the data are transmitted through electromagnetic waves. The tag (transponder) consists of a microchip and an antenna, which are usually put in some form of casing for protection against different environments. The RFID read/write device creates a weak electromagnetic field. If an RFID tag passes through this field, the microchip of the transponder wakes up and can send or receive data without any contact with the reader. If the tag leaves the field, the communication is interrupted and the chip on the tag stops working, but the data on the tag remain stored (Finkenzeller, 2003). To provide the system with portability, a PDA (Personal Digital Assistant) with a reader and embedded antenna can be used. Figure 2 shows the usual components of an RFID system.
There are different RFID technologies available. Passive tags have no power source of their own and take their energy directly from the magnetic field of the reader. Passive RFID tags do not need any maintenance, but the reading distance depends on the size and frequency of the transponder and antenna. Active tags are much more complex than passive tags and have an internal battery to increase the reading distance; the lifetime of active tags is limited by the battery. Semi-passive tags contain a power source, such as a battery, to power the microchip's circuitry but, unlike active tags, do not use the battery to communicate with the reader: communication is done in the same manner as with passive tags. Semi-passive tags may remain dormant until activated by a signal from a reader, which conserves battery power and can lengthen the life of the tag. Systems are available at 134 kHz, 125 kHz, 13.56 MHz, 868 MHz, 915 MHz, and 2.45 GHz, with different properties and reading distances depending on the environmental conditions.
Hence, the main characteristics of an RFID system are:
- Communication without physical contact between the reader and the tags.
- Soil resistance.
- No line of sight required.
- Read/write memory.
- The possibility to store production and/or product data in the tag's memory.
- Multi read/write capability.
Obviously, the use of these “smart cards or tags” in dairy products allows obtaining the benefits that come from providing the product with a “certain kind of intelligence”, such as:
- Having a single identity.
- Being able to communicate with its environment in an efficient way.
- Being capable of obtaining and retaining information about itself.
Also in this context, the RFID solution offers the following advantages over the “traditional” bar code:
- A large amount of data in a reduced physical space.
- Automatic writing/reading, either individually or in batch mode; the latter is performed thanks to the use of anti-collision algorithms, which allow reading several RFID tags without interference.
- A bar code needs visibility to work properly, and the information stored on it cannot be changed; an RFID tag can be rewritten.
- A bar code identifies a type of object, whereas an RFID tag identifies one unique object in a unique way.
- A bar code is easily damaged in wet environments such as the ripening chambers where mould grows; an RFID tag, in contrast, can be wrapped, for example, in biofilm without affecting the reading/writing process.
Bar codes are of course cheap to create, but they are limited in their storage capacity and are not flexible when data need to be changed.
Taking into account all the features mentioned above, a system based on the use of RFID transponders seems the most appropriate to carry out the proposed tasks. The idea is to use RFID tags as the physical support for storing the information required to perform “complete traceability” in cheese industries, as well as to facilitate the collection of data required by the technical experts of the Regulatory Boards in charge of quality certification (Pérez-Aloe et al., 2007). The application will perform the three types of traceability (vendor, process and customer), as well as an individual register of the analyses and controls done in all the
production stages (milk reception, storage, fabrication, ripening, quality and yield control, and pH measures). In this way, the collected data on the process conditions (humidity, temperature, pressure, ventilation), including the microbiological, biochemical, and pH analyses, and the connection of these data with the products provided by the different suppliers, are enriched. Traceability is then guaranteed, both for batches and for individual cheeses. This contributes to assuring and certifying quality, making easier the location, immobilization and, in some cases, the effective and selective recall of products when problems arise.
In this way, the RFID device would operate as an interface, making the exchange of data and information between the company and the technical staff of the Regulatory Board of the P.D.O. easier and more efficient. A sketch of this procedure can be seen in Figure 3.
Fig. 3. Use of RFID tags in the integrated system of quality control Company-Regulatory Boards of P.D.O.: each side writes data that the other side reads.
2. System implementation
The system developed is similar to that shown in Figure 2. It consists of two complementary systems: one is based on a personal computer (PC) and the other on a PocketPC (PPC). The PocketPC utility was implemented as a complement to the PC version due to its portability. Figure 4 presents a simplified block diagram of the two systems implemented.
Fig. 4. Simplified block diagram: the software interface passes the data to write to the hardware interface (PPC platform with CF module).
The PC system includes the S6350 midrange reader and the series 6000 gate antenna (Texas Instruments_RFID, 2002). The reader operates at 13.56 MHz and is able to communicate with tags that comply with the ISO 15693 protocol. The communication with the reader is done through a PC serial port, using the RS-232 data transmission protocol, with one start bit, 8 data bits, 1 stop bit, and no parity. Several speeds can be selected within the range from 9600 bps to 57600 bps. The PC starts the communication with the reader through a pair of request/response sequences complying with the ISO 15693 standard, which establishes the request stream format as well as the field sizes. Some of the commands supported by the reader can be used with addressing, i.e., read, write, and lock. If addressing is used, the command is sent to a single tag; otherwise, the command is broadcast to all tags in the visible range of the reader. For example, Figure 5 shows the display of the reader utility after sending the inventory command. As can be seen, in response to this command all transponders visible to the reader appear on the screen.
The system is also able to detect errors: if an error occurs during the communication, the reader sends an error code in the response stream to the PC. Typical errors are no tag found in the vision range, a write attempt to a read-only block, or the addressing of a non-existing block.
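For illustration, opening and configuring the serial port with the framing described above could look like the following minimal Win32 sketch (standard Windows calls; the S6350 request/response framing itself is specified in the reader documentation and is not reproduced here):

    #include <windows.h>

    // Open a COM port at 9600 bps, 8 data bits, no parity, 1 stop bit,
    // matching the settings described in the text. Error handling is minimal.
    HANDLE openReaderPort(const char* portName /* e.g. "COM1" */) {
        HANDLE h = CreateFileA(portName, GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) return h;
        DCB dcb = {0};
        dcb.DCBlength = sizeof(dcb);
        GetCommState(h, &dcb);
        dcb.BaudRate = CBR_9600;     // any supported speed from 9600 to 57600 bps
        dcb.ByteSize = 8;            // 8 data bits
        dcb.Parity   = NOPARITY;     // no parity
        dcb.StopBits = ONESTOPBIT;   // 1 stop bit
        SetCommState(h, &dcb);
        return h;
    }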
Regarding the PPC system, it consists of:
- A PDA with Windows PocketPC 2003® (Hewlett Packard iPAQ, 2003).
- A Compact Flash reader with built-in antenna (ACG, 2006).
The hardware interface also runs at 13.56 MHz. Both ASCII and binary transmission modes are supported by the reader, but only the ASCII mode has been developed because it simplifies the process. The transmission protocol is ASCII mode at 9600 bps. The PocketPC and the PC versions are fully compatible, so a tag written by one application can be read by the other without problems. The software was implemented with the Microsoft® Embedded Visual C++ environment, and the source code includes the library supplied by the reader manufacturer. All the data stored in a tag are accessed and read in just 8 seconds.
Two different types of labels running at 13.56 MHz have been used in this application:
- Individual tags used to identify each cheese.
- Batch tags, which store all the data and parameters related to the manufacturing process of a batch, as well as information about the cheeses that belong to it.
Besides, two rounded-shape tags of different sizes have been tested as individual labels, both on a casein plate in order to improve the adherence to the cheese. The smaller one has a diameter of 9 mm and uses the I-Code SL2 ICS20 chip (Philips Semiconductors, 2003). The other one contains a similar chip and has a size of 20 mm (Labelys, 2007). The I-Code chip has a user memory of 128 bytes, divided into 32 blocks of 32 bits. Of these 128 bytes, only 108 bytes (27 x 32 bits) are addressable.
The HF-I ISO 15693 tags (Texas Instruments, 2005) have been selected as batch labels due to their thinness and flexibility. These tags have a 2-kbit user memory organized in 64 blocks of 32 bits, as shown in Figure 6 (a).
Fig. 6. Characteristics of the tags used: (a) memory organization; (b) list of commands.
Tags can contain read-only data (ROM) and read/write data (R/W). The data stored in blocks can be locked at the factory or by the user in an irreversible process, after which the data cannot be modified any more. Otherwise, tags can be reused in future applications.
All the tags have a locked field, the individual identification code UID (Unique IDentifier of the transponder), which is a 64-bit code provided by the manufacturer and defined in the ISO 15693 standard. The AFI (Application Family Identifier, such as “transportation”, “finance”, etc.) code stores the type of application, and the DSFID (Data Storage Format IDentifier) stores the data format. Each of them is a 1-byte block.
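In software, this layout can be mirrored with a few constants and a small structure; the sizes follow the figures given in the text, while the struct itself is just an illustrative assumption:

    #include <cstdint>

    const int kBlockCount    = 64;   // HF-I batch tag: 64 blocks...
    const int kBlockSizeBits = 32;   // ...of 32 bits each (2-kbit user memory)

    struct TagIdentification {
        uint64_t uid;    // Unique IDentifier: 64 bits, factory-locked (ISO 15693)
        uint8_t  afi;    // Application Family Identifier, 1 byte
        uint8_t  dsfid;  // Data Storage Format IDentifier, 1 byte
    };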
The command list of the HF-I tags is shown in Figure 6 (b). Basically, two modes of reading/writing tags are available: single and multiple tag operation. In single tag operation, the first action is to detect a single tag; the reader then identifies the UID, and the subsequent reading/writing commands refer exclusively to that tag. In multiple tag operation, on the other hand, the information is broadcast to all the tags in the reader range.
The PPC utility includes routines for data interpretation and for the communication protocol between the compact flash reader and the tags (ACG DLL, 2006). The procedure to establish communication between both devices and start sending commands is shown in Figure 7 (a). As can be seen, the port where the reader is connected has to be detected and opened. If the port has been opened successfully, the reader is activated and ready to receive a set of commands. Figure 7 (b) details the software code that performs the steps previously mentioned.
The “writing tag” command consists of 11 bytes in ASCII mode: the first two bytes indicate the address of the block to write, the following 8 bytes are the data to write, and the last byte is the NULL or stop character with all bits set to zero. The “reading tag” command, however, contains only 3 bytes, which refer to the block number to be read plus the NULL byte. Moreover, the reader has a 512-byte buffer, so once a command is sent and the reader receives the data, the utility can extract the required word from the buffer. The 512-byte size allows the hardware to read all the tag blocks. Data are sent to the tags by means of a software application, which is easy to use, so that any employee of the company can operate it.
Fig. 7. Communication between the PPC and the CF reader. (a) Flow: open the reader, send command, get data, close the reader. (b) Code detecting the PDA port where the CF reader is connected:

    char puerto[30];              // buffer for the port name
    RDR_DetectSerialPort(puerto); // detects the PDA port where the CF reader is connected
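Following the byte layout just described, the two ASCII-mode commands could be assembled as in the sketch below (the layout is taken from the text; representing the block address as two hexadecimal ASCII characters is our assumption, and the actual transmission is handled by the manufacturer's library):

    #include <cstdio>
    #include <cstring>

    // "Writing tag" command: 2 bytes of block address + 8 data bytes + NULL = 11 bytes.
    void buildWriteCommand(char out[11], int block, const char data[8]) {
        std::sprintf(out, "%02X", block & 0xFF); // block address as two ASCII characters
        std::memcpy(out + 2, data, 8);           // the eight data characters to write
        out[10] = '\0';                          // NULL byte (stop chain, all bits zero)
    }

    // "Reading tag" command: 2 bytes of block number + NULL = 3 bytes.
    void buildReadCommand(char out[3], int block) {
        std::sprintf(out, "%02X", block & 0xFF); // sprintf appends the NULL byte
    }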
The routine includes a “format tag” option which stores zeros in all blocks so that the tag can be reused for future activities. A scheme representing the way data are stored in the tag memory, and the appearance of the data on the PPC screen, can be observed in Figure 8.
Regarding the data to be recorded in the tags, the writing process has to be optimized in order to store as many cheese production parameters as possible. The fields saved on the tags can contain numerical values (temperature, fat, etc.), alphanumerical values (type of milk, manufacturer, batch, batch qualification, etc.), and data concerning key dates in production (elaboration, reception, ripening, etc.). Numerical data are codified in binary, differentiating between the integer and the decimal part. The number of bits required for each part depends on the range of values and the precision used. In some cases, a data manipulation is performed: for example, if a variable varies from 3 to 5, for a value of 4.57 the integer and the decimal part are codified separately.
Fig. 8. Data recorded in the tag and its representation on the PPC screen (data in the PPC vs. data provided by the factory; e.g., stage 6, milk reception: variable “Date”, format 00/00/00, range 01/01/11-31/12/99).
The effective range of the integer part has only 3 values (3, 4 and 5), and an offset of 3 is subtracted, so only 2 bits are needed (values 0 to 2) instead of the 3 bits needed if the offset is not considered (values 0 to 5). On the other hand, alphanumerical values are stored as a simple database, that is, assigning a numerical value to each datum; e.g., the type of milk can take 4 values, “null”, “cow”, “goat” and “sheep”, which are linked to the numerical values 0, 1, 2 and 3, respectively.
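The two encodings just described can be sketched as follows (function names are illustrative; the bit packing into tag blocks is omitted):

    #include <cstdint>
    #include <string>

    // Offset encoding for a numerical field with a known range: for a range of
    // 3..5, an offset of 3 leaves values 0..2, which fit in 2 bits. For 4.57,
    // encodeIntegerPart(4, 3) yields 1; the decimal part .57 is encoded
    // separately with the precision required.
    uint8_t encodeIntegerPart(int value, int rangeMinimum) {
        return static_cast<uint8_t>(value - rangeMinimum);
    }

    // Alphanumerical field stored as a simple lookup: "null", "cow", "goat",
    // "sheep" are linked to 0, 1, 2, 3 respectively.
    uint8_t encodeMilkType(const std::string& milk) {
        if (milk == "cow")   return 1;
        if (milk == "goat")  return 2;
        if (milk == "sheep") return 3;
        return 0; // "null"
    }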
Once all the fields and their associated numbers of bits are defined, the next step is to arrange the whole tag. The purpose is that each stage of the production process is associated with a collection of blocks. Around two hundred variables related to the parameters involved in the different phases of cheese fabrication can be recorded in a tag. Some of them appear in Table 1.
Variable                          Range
General data
  Product code                    1000-2000
  Lot                             010111-311299
  Volume                          500 L-2500 L
  Pieces                          50 units-300 units
  Fit for consumption?            yes/no
  Number of tags                  000000-999999
Specific data for the different stages
  Reception date                  01/01/11-31/12/99
  Raw milk supplier               alphabetical
  Tank                            alphanumerical
  pH                              4.00 upH-7.00 upH
  Acidity                         13.00 ºD-18.00 ºD
  Temperature                     02.0 ºC-12.0 ºC
Chemical analysis
  Fat                             3.00 %-8.00 %
  Protein matter                  2.50 %-7.00 %
  Lactose                         3.00 %-6.00 %
  Sodium chloride                 0.80 %-2.20 %
Microbiological analysis
  Listeria monocytogenes          yes/no
  Salmonella spp.                 yes/no
  Staphylococcus aureus           0-20,000 cfu/g
Yield control
  Lot weight                      50.0 kg-400.0 kg
  Lot yield (volume/weight)       4.00-12.00
  Average unit weight             750 g-1400 g
Control of clients
  Client reference                alphanumerical
  Order number (units/kg)         1-300 units / 0.80-400 kg
Table 1. Some parameters stored in the tags
3. Experimental results
The systems have been tested in the industries that collaborated in this work. Before placing the prototypes in the companies, exhaustive tests were done on the tags in the laboratory in order to prove proper operation under the same conditions as in the factory. The simulated conditions included temperature, humidity, biological contamination, acid corrosion, ammoniacal gases, immersion in saline solutions, inhibiting substances, sugars, colorant pigments, preservative substances, the action of oils (Figure 9 (a)), and mould growth (Figure 9 (b)).
Fig. 9. Tests on tags in the laboratory: (a) immersion in saline solution, olive oil and paprika; (b) mould growth.
Besides, physical tests have also been made, including friction and flexibility, because in the ripening chambers the cheeses with their corresponding tags are subject to turns, shelving changes, friction, and personnel manipulation. In all these cases, no significant negative effects on communication with the tags have been reported, with the exception of data reading with metallic materials in the range of the reader, which produced erroneous readings.
Finally, in order to confirm that tags were suitable to be attached to the products from the
start of the fabrication to the end, the tags were placed on the surface of the cheese at the
beginning of the production. As can be seen in Figure 10, which corresponds to photographs taken at different stages of production, both types of tags used as individual labels remained attached to the rind of the cheese until the end of the process, without reporting errors in the reading/writing process. Figure 10 also shows the containers with their batch labels.
Fig. 10. Tags (individual tags and batch tag) at different stages of production: (a) before molding; (b) after pressing; (c) at the end of production.
Regarding the signal range, the PC system is able to read/write tags within a radius of around 20 cm, whereas the PPC reader has a limited range of around 25 mm; in both cases this is enough for our purposes.
Finally, Figure 11 displays the way in which data are updated using the PPC system. The application is user-friendly, so that the employees of the factories in charge of quality control and the technical staff of the Regulatory Board of the P.D.O. responsible for quality certification can use the system easily.
4. Conclusion
Two different systems that perform the reading/writing task with RFID tags in a cheese industry have been implemented. One of them is based on a personal computer, whereas the other uses a PocketPC, providing the application with the required portability. The main objective has been to make available to the factory the complete traceability of the products, both individually and by batches, as well as to provide the technical staff of the Regulatory Boards of P.D.O. with a tool that facilitates the process of quality certification. The tags have been tested under different conditions of temperature, humidity, acid corrosion, and ammoniacal gases, and immersed in saline solutions with inhibiting substances, sugars, colorant pigments, preservative substances and oils. Besides, physical tests including friction and flexibility have also been made. In all the situations, no significant negative effects on communication with the tags have been reported, except for metallic materials in the range of the reader. Around two hundred variables related to the parameters involved in the different stages of cheese production can be stored in the tag, which considerably improves the quality and yield control of the production plant.
5. Acknowledgements
This work has been financially supported by the Ministry of Infrastructures and Technological Development of the Government of Extremadura, under Grants PDT05A042 and PDT08A041, with the economic assistance of the European Union (FEDER).
6. References
ACG Multi ISO Plug-In Reader, (2006). ACG Identification Technologies, RDHP-0206P0-02
ACG RFID Reader DLL, (2006). User Manual, Rev 1.0, January 2006
Certification Criteria CC/3, (2005), Requirements to be met by the dairy industry, Manual of
quality of the Regulatory Board of P.D.O., Torta del Casar, (4), 1-8, Casar de Cáceres
Certification Criteria CC/4, (2007), Characteristics of the "Torta del Casar" Produced. Manual
of quality of the Regulatory Board of P.D.O., Torta del Casar, (5), 1-6, Casar de Cáceres
Coscarón, C.; Gil, M. & Legaz, E. (2007). Revista Alimentaria, No. 383, May 2007, 48-55, ISSN
03005755
Decree 1808, (1991) Which governs the indications or marks identifying the batch to which a
foodstuff belongs
Decree 217, (2004), Which regulate the identification and registration of staff, facilities and
containers involved in the dairy sector, and record the movements of the milk
Finkenzeller, K. (2003). RFID Handbook: Fundamentals and Applications in contactless smart
cards and identification, John Wiley and Sons, ISBN 0-470-84402-7
Hewlett Packard iPAQ h5550 with Windows Pocket PC, (2003)
Labelys traçabilité, (2007). RFID casein plate
Order 25/4, (1997). Laying down the regulations of the Denomination of Origin “Ibores
cheese” and its Regulatory Board
Order 11/1, (1999). Official recognition of the Designation of Origin "Torta del Casar"
Order 9/10, (2001). Adoption of the Rules of the Designation of Origin "Torta del Casar"
Order 6/5, (2002). Ratification of the Rules of the Designation of Origin "Torta del Casar"
Pérez-Aloe, R.; Valverde, J. M.; Lara, A; Carrillo, J. M.; Roa, I. & González, J. (2007).
Application of RFID tags for the overall traceability of products in cheese
industries, Proceedings of 1st Annual RFID Eurasia Conference, pp. 268-272, ISBN 978-
975-01566-0-1, Turkey, September 2007, Istanbul
Philips Semiconductors, (2003). I-Code SL2 ICS20, Smart Label IC, Rev. 3.0, January 2003
Regulation 178, (2002). Laying down the principles and requirements of food law,
establishing the European Food Safety Authority and laying down procedures
relating to food safety
Regulation 1491, (2003). Registration of Protected Designation of Origin "Torta del Casar" in
the Register of Protected Designations of Origin and Protected Geographical
Indications
Spanish Agency for Food Security (2004). Guide for the application of traceability in food
companies. Coimán S. L., NIPO 355-04-001-9, Madrid
Texas Instruments_RFID, (2002). HF Reader System Series 6000, S6350 Midrange Reader
Module, September 2002
Texas Instruments_RFID, (2002). HF Reader System Series 6000, Gate Antenna, September 2002
Texas Instruments, (2005). Tag-itTM HF-I Plus Transponder Inlays, December 2005