Synthesis Lectures on Engineering, Science, and Technology

Number Systems for Deep Neural Network Architectures

Synthesis Lectures on Engineering, Science, and Technology
The focus of this series is general topics, and applications about, and for, engineers
and scientists on a wide array of applications, methods and advances. Most titles cover
subjects such as professional development, education, and study skills, as well as basic
introductory undergraduate material and other topics appropriate for a broader and less
technical audience.
Ghada Alsuhli · Vasilis Sakellariou · Hani Saleh · Mahmoud Al-Qutayri · Baker Mohammad · Thanos Stouraitis
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Acknowledgements
This work was supported by the Khalifa University of Science and Technology under
Award CIRA-2020-053.
Contents

1 Introduction
   1.1 Introduction to Number Systems for DNNs
   1.2 Book Organization
   References
2 Deep Neural Networks Overview
   2.1 DNN Evolution
   2.2 Basic DNN Operation
   2.3 Popular Network Types
      2.3.1 Convolutional Neural Networks
      2.3.2 Recurrent Neural Networks
      2.3.3 Transformers
   2.4 Activation Functions
   2.5 DNN Training
   References
3 Conventional Number Systems for DNN Architectures
   3.1 FLP for DNN Architectures
      3.1.1 Floating Point Arithmetic Operations
      3.1.2 FLP for DNNs
   3.2 FXP for DNN Architectures
      3.2.1 FXP for DNNs
   3.3 CNSs for DNNs Training and Inference
      3.3.1 DNNs Inference Based on CNSs
      3.3.2 DNNs Training Based on CNSs
   References
4 LNS for DNN Architectures
   4.1 Logarithmic Number System Overview
   4.2 End-to-End LNS-Based DNN Architectures
      4.2.1 Addition in LNS
Glossary
Acronyms
ADA Add-decode-accumulate
AI Artificial intelligence
ASICs Application-specific integrated circuits
BFP Block floating point
CLBs Configurable logic blocks
CNNs Convolutional neural networks
CPUs Central processing units
DFXP Dynamic fixed point
DNNs Deep neural networks
ELU Exponential linear unit
FLP Floating point
FPGAs Field programmable gate arrays
FPU Floating point unit
FXP Fixed point
GPUs Graphics processing units
K-L Kullback–Leibler
LNS Logarithmic number system
LSTM Long short-term memory
LUTs Look-up tables
MAC Multiply accumulate
NLP Natural Language Processing
PNS Posit Number System
ReLU Rectified linear unit
RNNs Recurrent neural networks
SOC System-on-chip
Unum Universal number
1 Introduction
Abstract
In this introductory chapter, we provide an overview of the main topics covered in this
book and the motivations to write it. The importance of efficient number systems for
Deep Neural Networks (DNNs) and their impact on hardware design and performance
are emphasized. In addition, we list the various number systems that will be discussed in
detail in the subsequent chapters. Finally, we outline the organization of the book with
a summary of the contents of each chapter, to offer readers a clear roadmap of what to
expect while exploring number systems for DNNs in this book.
1.1 Introduction to Number Systems for DNNs
During the past decade, DNNs have shown outstanding performance in a myriad of Artificial
Intelligence (AI) applications. Since their success in both speech [1] and image recognition
[2], great attention has been drawn to DNNs from academia and industry, which subsequently
led to a wide range of products that utilize them [3]. Although DNNs are inspired by the
deep hierarchical structures of the human brain, they have exceeded human accuracy in
a number of domains [4]. Nowadays, the contribution of DNNs is notable in many fields
including self-driving cars [5], speech recognition [6], computer vision [7], natural language
processing [8], and medical applications [9]. This DNN revolution is helped by the massive
accumulation of data and the rapid growth in computing power [10].
Due to the substantial computational complexity and memory demands, accelerating
DNN processing has typically relied on either high-performance general-purpose compute
engines like Central Processing Units (CPUs) and Graphics Processing Units (GPUs), or
customized hardware such as Field Programmable Gate Arrays (FPGAs) or Application-
Specific Integrated Circuits (ASICs) [11]. While general-purpose compute engines continue
to dominate DNN processing in academic settings, the industry places greater emphasis on
deploying DNNs in resource-constrained edge devices, such as smartphones or wearable
devices, which are commonly used for various practical applications [3]. Whether DNNs
are run on GPUs or dedicated accelerators, speeding up and/or increasing DNN hardware
efficiency without sacrificing their accuracy continues to be a demanding task. The literature
includes a large number of works that have been dedicated to highlighting the directions
that can be followed to reach these goals [3, 4, 12–15]. Some examples of these directions
are DNN model compression [16], quantization [13], and DNN efficient processing [4, 12].
One of the directions that has a great impact on the performance of DNNs, but has not yet been
comprehensively covered, is the DNN number representation.
As compute engines use a limited number of bits to represent values, real numbers cannot be represented with infinite precision. The mapping between a real number and the bits that represent it is called a number representation, number system, or data format [17]. Figure 1.1 shows an example that illustrates how a number can be represented differently with different number systems, and how the choice of the number system directly affects the number of bits required and causes different approximation errors. As a result, number representation
has a great impact on the performance of both general-purpose and customized compute
engines. DNNs encompass the learning of millions or even billions of parameters during
model construction. As a result, the sheer volume of data associated with DNNs becomes
substantial, requiring significant processing capabilities. Consequently, the choice of data
representation format becomes crucial, impacting various aspects such as data precision,
storage requirements, memory communication, and the implementation of arithmetic hardware [18]. These factors, in turn, significantly influence key performance metrics of DNN architectures, including accuracy, power consumption, throughput, latency, and cost [12].
Fig. 1.1 An illustrative example of how the number π ≈ 22/7 can be represented using three well-known number systems: fixed point, single-precision floating point, and double-precision floating point. The approximations associated with each of these representations are illustrated
To this end, there is a significant body of literature that has focused on assessing the
suitability of specific number systems for DNNs, modifying conventional number systems
to fit DNN workloads, or proposing new number systems tailored for DNNs. Some of the
leading companies, such as Google [19], NVIDIA [20], Microsoft [18], IBM [21], and Intel
[22–24], have contributed to advancing the research in this field. A comprehensive discussion of these works will be helpful in furthering this research.
While conventional number systems like Floating Point (FLP) and Fixed Point (FXP) rep-
resentations are frequently used for DNN engines, several unconventional number systems
are found to be more efficient for DNN implementation. Such alternative number systems
are presented in this book and include the Logarithmic Number System (LNS), Residue
Number System (RNS), Block Floating Point Number System (BFP), Dynamic Fixed Point
Number System (DFXP), and Posit Number System (PNS). This book aims to provide a
comprehensive discussion about alternative number systems for more efficient representa-
tions of DNN data. As an extension of our survey paper [25], it delves deeper into these
alternative representations, offering an expanded discussion. The impact of these number
systems on the performance and hardware design of DNNs is considered. In addition, this
book highlights the challenges associated with each number system and various solutions
that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNNs, learn about the widely used number systems for DNNs, understand the trade-offs between various number systems, and consider various
design aspects that affect the impact of number systems on DNN performance. In addition,
the recent trends and related research opportunities will be highlighted.

1.2 Book Organization
The rest of this book is organized as follows:
• Chapter 2 provides a background of DNNs including their basic operations, types, main
phases (training and inference), and an overview of their hardware implementations.
• Chapter 3 gives an overview of conventional number systems and their utilization for
DNNs.
• Chapter 4 classifies the DNNs that adopt the logarithmic number system.
• Chapter 5 describes the concepts behind the residue number system and its employment
for DNNs.
• Chapter 6 describes the block floating point representation and the efforts done to make
it suitable for DNNs implementation.
• Chapter 7 discusses the dynamic fixed point format and the work done to calibrate the
parameters associated with this format.
• Chapter 8 explains various DNN architectures that utilize Posits and the advantages and
disadvantages associated with these architectures.
• Chapter 9 concludes the book and provides insight into recent trends and research oppor-
tunities in the field of DNN number systems.
References
1. Deng, L., Li, J., Huang, J.T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J.,
et al.: Recent advances in deep learning for speech research at Microsoft. In: IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. 8604–8608. IEEE (2013)
2. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. Commun. ACM. 60(6), 84–90 (2017)
3. Guo, Y.: A survey on methods and theories of quantized neural networks (2018).
arXiv:1808.04752
4. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial
and survey. Proc. IEEE. 105(12), 2295–2329 (2017)
5. Gupta, A., Anpalagan, A., Guan, L., Khwaja, A.S.: Deep learning for object detection and scene
perception in self-driving cars: survey, challenges, and open issues. Array. 10, 100057 (2021)
6. Shewalkar, A.: Performance evaluation of deep neural networks applied to speech recognition:
RNN, LSTM and GRU. J. Artif. Intell. Soft Comput. Res. 9(4), 235–245 (2019)
7. Buhrmester, V., Münch, D., Arens, M.: Analysis of explainers of black box deep neural networks
for computer vision: a survey. Mach. Learn. Knowl. Extr. 3(4), 966–989 (2021)
8. Otter, D.W., Medina, J.R., Kalita, J.K.: A survey of the usages of deep learning for natural
language processing. IEEE Trans. Neural Netw. Learn. Syst. 32(2), 604–624 (2020)
9. Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Khanna, A., Shankar, K., Nguyen, G.N.: An effec-
tive training scheme for deep neural network in edge computing enabled internet of medical
things (IoMT) systems. IEEE Access. 8, 107112–107123 (2020)
10. Alam, M., Samad, M., Vidyaratne, L., Glandon, A., Iftekharuddin, K.: Survey on deep neural
networks in speech and vision systems. Neurocomputing. 417, 302–321 (2020)
11. LeCun, Y.: Deep learning hardware: past, present, and future. In: IEEE International Solid-State
Circuits Conference-(ISSCC), pp. 12–19. IEEE (2019)
12. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks. Synth.
Lect. Comput. Arch. 15(2), 1–341 (2020)
13. Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M.W., Keutzer, K.: A survey of quantization
methods for efficient neural network inference (2021). arXiv:2103.13630
14. Wu, C., Fresse, V., Suffran, B., Konik, H.: Accelerating DNNs from local to virtualized FPGA
in the cloud: a survey of trends. J. Syst. Arch. 119, 102257 (2021)
15. Ghimire, D., Kil, D., Kim, S.h.: A survey on efficient convolutional neural networks and hardware
acceleration. Electron. 11(6), 945 (2022)
16. Choudhary, T., Mishra, V., Goswami, A., Sarangapani, J.: A comprehensive survey on model
compression and acceleration. Artif. Intell. Rev. 53(7), 5113–5155 (2020)
17. Gohil, V., Walia, S., Mekie, J., Awasthi, M.: Fixed-posit: a floating-point representation for error-
resilient applications. IEEE Trans. Circuits Syst. II Express Briefs. 68(10), 3341–3345 (2021)
18. Darvish Rouhani, B., Lo, D., Zhao, R., Liu, M., Fowers, J., Ovtcharov, K., Vinogradsky, A.,
Massengill, S., Yang, L., Bittner, R., et al.: Pushing the limits of narrow precision inferencing
at cloud scale with Microsoft floating point. Adv. Neural Inf. Process. Syst. 33, 10271–10281
(2020)
19. Wang, S., Kanwar, P.: BFloat16: The secret to high performance on cloud TPUs. Google Cloud
Blog 30 (2019)
20. Choquette, J., Gandhi, W., Giroux, O., Stam, N., Krashinsky, R.: Nvidia A100 tensor core GPU:
performance and innovation. IEEE Micro. 41(2), 29–35 (2021)
21. Gupta, S., Agrawal, A., Gopalakrishnan, K., Narayanan, P.: Deep learning with limited numerical
precision. In: International Conference on Machine Learning, pp. 1737–1746. PMLR (2015)
22. Kalamkar, D., Mudigere, D., Mellempudi, N., Das, D., Banerjee, K., Avancha, S., Vooturi, D.T.,
Jammalamadaka, N., Huang, J., Yuen, H., et al.: A study of BFLOAT16 for deep learning training
(2019). arXiv:1905.12322
23. Köster, U., Webb, T., Wang, X., Nassar, M., Bansal, A.K., Constable, W., Elibol, O., Gray, S.,
Hall, S., Hornof, L., et al.: Flexpoint: An adaptive numerical format for efficient training of deep
neural networks. Adv. Neural Inf. Process. Syst. 30 (2017)
24. Popescu, V., Nassar, M., Wang, X., Tumer, E., Webb, T.: Flexpoint: Predictive numerics for deep
learning. In: 2018 IEEE 25th Symposium on Computer Arithmetic (ARITH), pp. 1–4. IEEE
(2018)
25. Alsuhli, G., Sakellariou, V., Saleh, H., Al-Qutayri, M., Mohammad, B., Stouraitis, T.: Number
systems for deep neural network architectures: a survey. arXiv preprint arXiv:2307.05035 (2023)
2 Deep Neural Networks Overview

Abstract
This chapter introduces the reader to basic DNN concepts. It provides the definition of the basic DNN operations, which are directly related to the underlying number system. It also describes popular DNN models used in modern AI systems. Finally, an overview of the standard training procedure (gradient descent, backpropagation) is given.
The term “Neural Networks” originated from the efforts to mathematically model the infor-
mation processing mechanism of biological systems. The McCulloch-Pitts model, developed
in 1943, proposed a basic function for a neuron that involves applying a linear function to an
input vector, followed by a non-linear decision or activation function that produces an out-
put. By connecting multiple computing units (neurons) and organizing them into layers, the
first forms of neural networks were created. Since then, neural network research has made
tremendous progress, with simple and small networks evolving into complex architectures
with multiple layers, forming the basis of deep learning. In Deep Neural Networks (DNNs),
the extraction of the essential input features becomes part of the training process, resulting
in impressive results in pattern recognition applications such as computer vision and natural
language processing. The progress in neural network research can be attributed to advance-
ments in network architectures, learning algorithms, and the availability of large training
datasets. Additionally, improvements in hardware, including CPUs and GPUs, as well as specialized Application-Specific Integrated Circuits (ASICs), have also been a major factor in the success of
DNN models. The new hardware architecture paradigms led to the efficient processing of
large amounts of data and facilitated the training of large models, resulting in remarkable
improvements in the accuracy of neural networks.
The basic arithmetic operation that a DNN node performs is the dot product of a weight vector and an input vector:

y = F\left( \sum_{i=0}^{L} W_i X_i + b \right)    (2.1)

X_i, W_i, and b denote the inputs and the parameters (weights and biases) of the DNN node, respectively. Then, a non-linear activation function F is applied to the intermediate dot-product result to give the final output y of each neuron. The selection of activation functions is presented in Sect. 2.4. Figure 2.1 depicts such a node, where the output y
feeds the next layer nodes. For a practical DNN implementation, the processing of the huge number of weights and biases, as well as the associated data transfers, becomes a bottleneck for DNN processing. DNN processing can refer either to inference, where a trained model makes predictions about a given task, or to training, where the model learns from data points of a given dataset related to a certain task.
Fig. 2.1 A typical DNN node computes the dot product of its weight and input vectors and applies
a non-linear activation function
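To make Eq. (2.1) concrete, the following is a minimal NumPy sketch of a single DNN node (not the book's code); the choice of ReLU for F and the vector sizes in the example are illustrative assumptions.

```python
import numpy as np

def dnn_node(X, W, b, F=lambda s: np.maximum(s, 0.0)):
    """Compute y = F(sum_i W_i * X_i + b) for one neuron; ReLU is an illustrative choice of F."""
    s = np.dot(W, X) + b      # weighted sum (dot product) of inputs plus bias
    return F(s)               # non-linear activation applied to the intermediate result

# Example: a node with 4 inputs (illustrative values)
X = np.array([0.5, -1.2, 3.0, 0.1])
W = np.array([0.2, 0.4, -0.1, 0.7])
y = dnn_node(X, W, b=0.05)
```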
2.3.1 Convolutional Neural Networks
CNNs are a fundamental component of deep learning and have revolutionized many areas
of machine learning, including computer vision and natural language processing. In CNNs,
the network itself extracts, through the iterative adaptation of the filter coefficients, the input
features with the highest information content. These features include edges, corners, and
textures which can then be used to classify, segment, or recognize objects. An important
property of convolution is its ability to preserve spatial information, enabling the use of
CNNs for tasks such as object detection and localization as well.
CNNs can capture spatial dependencies inside the input. They are most commonly used
in image and video processing tasks. Key aspects of CNNs are the sparse connections and the
parameter sharing scheme [1]: unlike traditional neural networks (Multi-Layer Perceptrons), in which dense matrix multiplications are used and each output unit interacts with each input unit, CNNs utilize smaller kernels with significantly fewer parameters (Fig. 2.2). In the case of a two-dimensional input image, for example, neighboring pixel regions are processed separately by neurons that share weights, i.e., transform the image in the same way. This not only dramatically decreases the computational complexity and memory requirements, but can also improve generalization and mitigate overfitting.
The elementary operation that each CNN neuron performs is the 2-d convolution. At each
layer, a 3-d input, typically referred to as feature map, consisting of a number of channels
is convolved with multiple kernels (filters). The convolution consists of a series of nested
loops, where each kernel slides over the two spatial dimensions to calculate each output pixel, according to Algorithm 1. Cin and Cout are the number of input and output channels,
X , Y are the feature map’s dimensions, FX , FY are the filter’s dimensions, I , O are the
input and output feature maps, respectively, and W the weight tensor. Thus, the convolution
simply consists of multiplying input feature maps with weights and accumulating these
partial products. It follows that the basic arithmetic operation, in all CNNs, is the multiply-accumulate (MAC) operation. Multiple convolutional layers are stacked together to create deeper models, where each layer is assumed to represent information at a different level of abstraction. In the initial layers, filters of larger dimensions are used, while in the final layers, smaller ones are used.
Fig. 2.2 Fully connected (left) vs. convolutional network (right) connectivity. The weight-sharing scheme of convolutional networks drastically reduces the number of parameters and the computational complexity of the model, and at the same time improves its generalization capability
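Algorithm 1 itself is not reproduced in this excerpt; the following is a minimal Python sketch of the nested-loop convolution described above, using the variable names from the text (Cin, Cout, FX, FY, I, O, W). Stride 1 and no padding are simplifying assumptions.

```python
import numpy as np

def conv2d_naive(I, W):
    """Naive 2-d convolution of an input feature map I (Cin, Y, X)
    with a weight tensor W (Cout, Cin, FY, FX); returns O (Cout, Y-FY+1, X-FX+1)."""
    Cin, Y, X = I.shape
    Cout, _, FY, FX = W.shape
    O = np.zeros((Cout, Y - FY + 1, X - FX + 1))
    for co in range(Cout):                      # loop over output channels (kernels)
        for y in range(Y - FY + 1):             # slide the kernel over the two spatial dimensions
            for x in range(X - FX + 1):
                for ci in range(Cin):           # accumulate over input channels
                    for fy in range(FY):
                        for fx in range(FX):
                            # the basic multiply-accumulate (MAC) operation
                            O[co, y, x] += I[ci, y + fy, x + fx] * W[co, ci, fy, fx]
    return O
```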
Typically, pooling layers are added between certain convolutional layers. These layers
calculate the mean or maximum of the input to gradually reduce its spatial dimensions
while preserving their important features. The pooling operation also helps the obtained
representation become approximately invariant to small translations of the input. A typical
CNN structure is depicted in Fig. 2.3.
Popular CNN architectures that have given impressive results in computer vision tasks are presented in Table 2.1. These networks are composed of millions of parameters and require billions of operations, mainly MAC operations. They are also typically used as benchmarks to assess the performance of different hardware platforms.
Fig. 2.3 Typical CNN structure. A series of convolutional and pooling layers is applied to the original image to generate feature maps at different abstraction levels. The spatial dimensions of the feature maps are gradually decreased using pooling layers or convolutions with stride greater than one, while the number of filters (kernels) increases when moving towards the end of the convolution pipeline. A fully connected layer, or more generally an MLP classifier, is used at the end of the network to obtain the final predictions
2.3.2 Recurrent Neural Networks
RNNs process information in successive time steps. The output of some neurons
at a certain time can be fed back as input to other neurons at the next time step [2]. They,
therefore, introduce a kind of memory in the processing of information and are particularly
suitable when input samples exhibit temporal dependencies, such as in speech recognition
or natural language processing applications. A typical RNN structure is depicted in Fig. 2.4.
One of the most common RNN units is the Long Short-Term Memory (LSTM) cell, which is used to capture temporal dependencies between the input samples. Assume an input sequence Y = \{y_1, y_2, \ldots, y_t\}, where y_t is the input of the RNN at time t. An LSTM is defined by the following set of equations:

i_t = \sigma(W^i x_t \oplus U^i h_{t-1} \oplus b^i), \qquad f_t = \sigma(W^f x_t \oplus U^f h_{t-1} \oplus b^f),
o_t = \sigma(W^o x_t \oplus U^o h_{t-1} \oplus b^o), \qquad c_t = f_t \odot c_{t-1} \oplus i_t \odot \tilde{c}_t,
\tilde{c}_t = \tanh(W^c x_t \oplus U^c h_{t-1} \oplus b^c), \qquad h_t = o_t \odot \tanh(c_t),

where W^k, U^k, and b^k, with k = i, f, o, c, are parameters of the RNN and are computed during the training process. Symbols \odot and \oplus denote element-wise multiplication and addition, respectively. The input of the LSTM layer is x_t and, for the input LSTM layer, it holds that y_t = x_t.
Fig. 2.4 Typical RNN structure. RNNs process sequential input and introduce a memory mechanism
with the hidden state h of the cell, which is updated according to the previous state and current input
(through trainable weight matrices W and U ). The output of the cell is also a function of current state
and input. Different RNN structures (like Gated Recurrent Unit or LSTM) are possible for the RNN
cells
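As an illustration of the LSTM equations above, the following is a minimal NumPy sketch of a single LSTM time step (the matrix shapes and random initialization are assumptions for the example; the element-wise operators become ordinary NumPy addition and multiplication).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the i, f, o, c-tilde equations in the text."""
    W, U, b = params                                          # dicts keyed by 'i', 'f', 'o', 'c'
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # forget gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    c_t = f_t * c_prev + i_t * c_tilde                        # new cell state
    h_t = o_t * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t

# Example with hidden size 3 and input size 2 (illustrative sizes)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 2)) for k in 'ifoc'}
U = {k: rng.standard_normal((3, 3)) for k in 'ifoc'}
b = {k: np.zeros(3) for k in 'ifoc'}
h, c = lstm_step(rng.standard_normal(2), np.zeros(3), np.zeros(3), (W, U, b))
```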
2.3.3 Transformers
Transformer neural networks are a type of deep learning model that have significantly
advanced natural language processing (NLP) tasks. They were first introduced in 2017 by
Vaswani et al. [3] and have since become the standard architecture for a broad range of NLP
tasks, such as machine translation, question answering, and language modeling.
In contrast to traditional neural networks that process input data sequentially, transform-
ers can simultaneously process all input tokens, thereby enabling more efficient computa-
tion. This is achieved through self-attention mechanisms that allow the model to weigh the
importance of each token in the input sequence when computing the output. This mechanism
allows capturing long-range dependencies in the input, which is crucial for NLP tasks where
understanding the context of the text is essential.
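As a brief illustration of the token weighting described above, the following is a simplified single-head sketch of scaled dot-product attention, which lies at the core of the self-attention mechanism; it is not the complete architecture of [3], and the sizes in the example are assumptions.

```python
import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: each output row is a weighted sum of the values V,
    with weights derived from the similarity of queries Q and keys K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over each row
    return weights @ V                                    # weighted combination of values

# Example: 4 tokens with embedding size 8 (illustrative sizes)
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 8))
out = self_attention(X, X, X)                             # self-attention: Q = K = V = X
```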
The transformer model is composed of an encoder and a decoder. The encoder processes
the input sequence and produces a representation of the input, while the decoder uses this
representation to generate an output sequence. The transformer has demonstrated state-
of-the-art performance on numerous NLP benchmarks, and has revolutionized the field of
NLP.
2.4 Activation Functions
Activation functions play a key role in NNs by introducing nonlinearity. This nonlinearity allows neural networks to generate more complex representations and approximate a wider variety of functions that would not be possible with a simple linear model. Common activation functions utilized in modern deep-learning models include the following (a short code sketch follows this list):
• Sigmoid Function
\sigma(x) = \frac{1}{1 + e^{-ax}}    (2.2)
The sigmoid function, also referred to as the logistic function, is one of the most commonly
used activation functions and is particularly suitable for binary classification tasks. The
sigmoid function maps its input to the range (0,1) and thus it can be directly interpreted
as the class probability.
• Hyperbolic Tangent Function
\tanh(x) = \frac{e^{ax} - e^{-ax}}{e^{ax} + e^{-ax}}    (2.3)
Unlike the sigmoid function, which only produces positive values, the output of tanh lies
between −1 and 1. It can sometimes result in faster training convergence. Sigmoid and
tanh functions are also used in recurrent networks, like long short-term memory (LSTM)
networks.
• Rectified linear activation function (ReLU)
The ReLU function is by far the most common choice in modern CNNs. It is a piecewise
linear function that outputs the input if it is positive, otherwise, it will output zero.
Unlike tanh and the sigmoid function, its output does not saturate. ReLU can overcome
the vanishing gradient problem and it generally leads to faster convergence and better
network performance. It is also more hardware friendly, thus it can further accelerate the
training process.
• Leaky ReLU
The Leaky ReLU is a variant of ReLU that allows a small, non-zero slope for negative inputs instead of outputting zero, which helps avoid permanently inactive ("dead") neurons.
• Softmax function
\sigma(X)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \quad \text{for } i = 1, \ldots, K    (2.6)
The softmax function is most commonly used in the final layer of a multi-class classi-
fier in order to normalize its outputs, converting them from weighted sum values into
probabilities that sum to 1. Softmax applies the exponential function to each element xi
of the input vector X and normalizes these values by dividing by the sum of all these
exponentials.
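The activation functions listed above can be written compactly as follows (a minimal NumPy sketch; the slope parameters a and alpha are illustrative assumptions).

```python
import numpy as np

def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * x))          # Eq. (2.2), maps to (0, 1)

def tanh_act(x, a=1.0):
    return np.tanh(a * x)                        # Eq. (2.3), maps to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                    # outputs the input if positive, else zero

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)         # small non-zero slope for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))                    # subtract the max for numerical stability
    return e / e.sum()                           # Eq. (2.6), outputs sum to 1
```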
2.5 DNN Training
The goal of any training process is to adjust the parameters of the network, i.e., the neuron's
weights so that the network approximates the desirable function based on the task definition.
Therefore, once a network topology is determined, an error function E must be defined,
which quantifies the deviation of the network output from the desired output for the set of
input examples. Then, an appropriate algorithm is selected, which optimizes the network
parameters with respect to the error function, i.e., it seeks the weights for which

\frac{\partial E}{\partial w} = 0    (2.7)
Since an analytical solution to this equation is usually not possible, iterative numerical
methods are used. Generally, the methods for training a neural network can be classified
into supervised, unsupervised, or competitive methods. The most commonly used training
method is based on supervised learning through the gradient descent algorithm, in combi-
nation with the error backpropagation method. Let E be a least-squares cost function:

E = \frac{1}{N} \sum_{i=1}^{N} \left(Y_{targ_i} - Y_{pred_i}\right)^2    (2.8)
E represents the mean squared error for the set of the training vectors and is a measure
of the distance between the network and the desired state. In the backpropagation training
algorithm, the weight update is calculated based on the contribution of each weight to the
total error. Considering the simplest form of ANN with two inputs and one output, the
square of the error as a function of the weights has the general form of a paraboloid opening upward. The delta rule, also known as the method of gradient descent, follows the negative
slope of the surface toward its minimum, moving the weight vector toward the ideal vector,
according to the partial derivative of the error with respect to each weight.
\Delta w_{ij} = -a \frac{\partial E}{\partial w_{ij}}    (2.9)
The weight updates are calculated iteratively using the chain derivation rule
\Delta w_{ij} = -a \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial w_{ij}}    (2.10)
where yi is the output of each neuron. By setting
d_i = -\frac{\partial E}{\partial y_i} = -\frac{\partial E}{\partial o_i} \frac{\partial o_i}{\partial y_i}    (2.11)
where oi is the intermediate result of the neuron, before the application of the non-linear
activation function.
For the output neurons, the coefficients di take the form
d_i = (b_i - o_i) \frac{\partial f(y_i)}{\partial y_i}    (2.12)
while for the hidden layer neurons, we have
d_i = \frac{\partial o_i}{\partial y_i} \sum_{k=1}^{K} d_k w_{ki}    (2.13)
The weight updates can then be obtained by combining Eqs. (2.9)–(2.13).
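As a concrete illustration of the gradient-descent update rule above, the following is a minimal NumPy sketch of one weight update for a single linear neuron trained on the mean-squared error of Eq. (2.8); the learning rate and the data are illustrative assumptions, not the book's example.

```python
import numpy as np

def gradient_descent_step(W, b, X, Y_targ, a=0.01):
    """One update Delta_w = -a * dE/dw for a linear neuron y = X @ W + b,
    with E being the mean-squared error of Eq. (2.8)."""
    N = X.shape[0]
    Y_pred = X @ W + b
    err = Y_pred - Y_targ
    dE_dW = (2.0 / N) * X.T @ err        # partial derivative of E with respect to each weight
    dE_db = (2.0 / N) * err.sum()
    return W - a * dE_dW, b - a * dE_db  # move against the gradient (delta rule)

# Example: fit y = 2*x1 - x2 on random data (illustrative)
rng = np.random.default_rng(2)
X = rng.standard_normal((32, 2))
Y = X @ np.array([2.0, -1.0])
W, b = np.zeros(2), 0.0
for _ in range(500):
    W, b = gradient_descent_step(W, b, X, Y)
```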
References
1. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
2. Bishop, C.: Pattern Recognition and Machine Learning. Springer (2006)
3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin,
I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
3 Conventional Number Systems for DNN Architectures

Abstract
The two conventional number systems, namely the floating point and fixed point, are com-
monly used in almost all general-purpose DNN engines. While the FLP representation
is usually used for modern computation platforms (e.g., CPUs and GPUs), where high
precision is required, FXP is more common in low-cost computation platforms used for
applications that require high speed, low power consumption, and small chip area. This
chapter introduces these two representations and briefly discusses their utilization for
implementing DNN hardware, in order to facilitate a comparison between conventional
and unconventional number systems presented in subsequent chapters.
3.1 FLP for DNN Architectures
In the FLP number system, a number n is represented using a sign s (1 bit), an exponent e (an unsigned integer of length es), and a mantissa m (an unsigned integer of length ms) (Fig. 3.1), and its value is given by

n = (-1)^s \times 2^{e - e_{max}} \times \left(1 + \frac{m}{2^{ms}}\right),    (3.1)

where e_{max} = 2^{es-1} - 1 is a bias used to ease the representation of both negative and positive exponents.
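As a concrete illustration of Eq. (3.1), the following sketch decodes a (sign, exponent, mantissa) field triple into its real value; the field widths es and ms are parameters, and special cases such as zeros, subnormals, infinities, and NaNs are deliberately ignored in this simplified sketch.

```python
def flp_value(s, e, m, es=8, ms=23):
    """Decode normalized FLP fields into a real value according to Eq. (3.1)."""
    e_max = 2 ** (es - 1) - 1                   # exponent bias
    return (-1) ** s * 2.0 ** (e - e_max) * (1 + m / 2 ** ms)

# Example: single-precision fields of 0.15625 = 1.25 * 2^(-3)
# s = 0, e = 124 (i.e., -3 + 127), m = 0x200000 (fraction 0.25)
print(flp_value(0, 124, 0x200000))              # 0.15625
```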
Although there are several FLP formats [1], the IEEE 754 FLP format [2], shown in
Fig. 3.2, is the most common representation used by modern computing platforms [3, 4].
According to IEEE 754, the FLP can be of half, single, double, or quad-precision depending
on the used bit-widths (e.g., for the single-precision FLP the bit-width is 32 bits and es = 8).
Fig. 3.1 The bit representation for the floating point number system
Fig. 3.2 The IEEE 754 floating point formats (half, single, double, and quad precision), each consisting of a sign bit, an exponent field, and a fraction field
The single-precision FLP, also called FLP32, is commonly used as a baseline to evaluate the efficiency of other number representations. Unless otherwise stated, the performance degra-
dation or enhancement is presented in comparison to the FLP32 format in this book as well.
Fig. 3.3 Block diagram of an arithmetic unit dedicated to FLP multiplication
For instance, Fig. 3.3 illustrates the block diagram of the FLP multiplier architecture.
It demonstrates that multiplying two FLP numbers involves adding their exponents, mul-
tiplying the mantissas, normalizing the resulting mantissa, and adjusting the exponent of
the product [5]. Similarly, FLP addition requires comparing the operand exponents, shifting
their mantissas (if the exponents differ), adding the mantissas, normalizing the sum mantissa,
and adjusting the sum exponent [1].
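To illustrate the multiplication steps of Fig. 3.3, the following is a minimal sketch operating on (sign, exponent, mantissa) field triples as defined by Eq. (3.1); rounding and special cases are omitted, and the field handling is an illustrative assumption rather than a particular hardware design.

```python
def flp_multiply(a, b, es=8, ms=23):
    """Multiply two FLP numbers given as (sign, exponent, mantissa) field tuples."""
    bias = 2 ** (es - 1) - 1                  # e_max in Eq. (3.1)
    sign = a[0] ^ b[0]                        # XOR of the operand signs
    exponent = a[1] + b[1] - bias             # add exponents, remove the doubled bias

    sig_a = (1 << ms) | a[2]                  # significand 1.m, scaled by 2^ms
    sig_b = (1 << ms) | b[2]
    product = sig_a * sig_b                   # mantissa multiplication

    if product >> (2 * ms + 1):               # normalization: significand product >= 2.0
        product >>= 1
        exponent += 1                         # exponent update

    mantissa = (product >> ms) & ((1 << ms) - 1)   # truncate (rounding omitted)
    return sign, exponent, mantissa

# Example: 1.5 * 2.5 = 3.75
x = (0, 127, 1 << 22)       # 1.5 = 1.10b * 2^0
y = (0, 128, 1 << 21)       # 2.5 = 1.01b * 2^1
print(flp_multiply(x, y))   # (0, 128, 7340032) -> 1.875 * 2^1 = 3.75
```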
The increased complexity of FLP32 arithmetic often necessitates the use of a dedicated
unit called a Floating Point Unit (FPU) to perform FLP calculations [6]. However, the
high power consumption and cost associated with the FPU limit its usage within embedded
processing units like FPGAs [7]. As a result, the standard FLP32 format is rarely employed
for building efficient DNN architectures [5].
To increase the efficiency of the FLP in DNN architectures, several custom FLP formats
[8–11] have been proposed. Also, new designs of the FLP arithmetic hardware (mainly the
multiplier) have been investigated [5, 12].
The standard FLP representations have a wide dynamic range,1 as demonstrated by
Table 3.1. However, these representations have a non-uniform gap between two representable
numbers, resulting in a non-uniform error. Figure 3.4 illustrates how an 8-bit non-standard
FLP representation represents the numbers between −15 and 15, and shows that the error
is smaller near zero but increases when the FLP is used to represent very small or large
numbers. As a result, for DNNs, quantization is typically used to scale the represented
values and bring them closer to zero, thereby taking advantage of the high precision near
zero. Therefore, the wide dynamic range of standard FLP representations is beyond what is
usually required for DNNs [5], resulting in a low information-per-bit metric, which means
an unnecessary increase in power consumption, area, and delay.
For this reason, the proposed custom FLP representations for DNNs mainly have reduced
bit-width and a different allocation of the bits to mantissa and exponent, than IEEE 754. The
bit-width is reduced to 19 bits in Nvidia’s TensorFloat32 [9] and 16 bits in Google’s Brain
FLP (bfloat16) [8] formats used in DNN training engines. 8-bit FLP has been proposed to
target the DNN inference in [10, 11]. These reduced FLP formats proved their efficiency
Table 3.1 A comparison between the smallest and largest numbers that can be represented using single and double precision FLP

                       Smallest representable number    Largest representable number
Single precision FLP   1.18 × 10^-38                     3.4 × 10^38
Double precision FLP   2.23 × 10^-308                    1.8 × 10^308
Fig. 3.4 Representable values of an 8-bit non-standard FLP on the number line in the range [−16, 16]. The non-uniform gap between representable numbers is noticeable
1 The dynamic range of a number system is the ratio of the largest value that can be represented with
this system to the smallest one.
in replacing FLP32 with comparable accuracy, higher throughput, and smaller hardware
footprint. It is worth noting that most of these custom FLP formats are used to represent data
stored in memory (i.e., weights and activations), whereas, for internal calculations (e.g.,
accumulation and weight updates), FLP32 is used instead to avoid accuracy degradation
[8, 11, 13].
In summary, the standard FLP representation has a massive dynamic range, which makes
it a good choice for computationally intensive algorithms that include a wide range of
values and require high precision. At the same time, the complex and power-hungry FLP
calculations make FLP less attractive for DNN accelerators. This leads to using narrower
custom FLP formats which require less hardware and memory footprint while preserving
the performance of the standard FLP32. However, the utilization of the FLP format for DNN
accelerators is relatively limited, and it loses ground to fixed point and other alternative
representations.
3.2 FXP for DNN Architectures
The power inefficiency of FLP arithmetic is the primary motivation to replace it with
the FXP format for designing energy-constrained DNN accelerators. The bit representation
for the fixed point number system is presented in Fig. 3.5. A real number n is represented
in FXP with the sign, the integer, and the fraction parts. The fixed point format is usually
indicated by < I , F > where I and F correspond to the number of bits allocated to the
integer and the fractional parts, respectively. In this book, we use the notations FXP8, for
example, to denote the FXP representation with bit-width equal to 8, i.e., I + F + 1 = 8.
FXP has a uniform gap between two representable numbers, equal to 2^-F, and, thus, a uniform error.
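As an illustration of the <I, F> format, the following sketch quantizes a real number to fixed point and recovers its approximate value; the signed two's-complement range handling and rounding mode are simplifying assumptions, since these details vary between implementations.

```python
def to_fxp(x, I=3, F=4):
    """Quantize x to a signed <I, F> fixed point integer (1 sign + I integer + F fraction bits)."""
    scale = 1 << F                       # uniform step of 2^-F between representable numbers
    q = round(x * scale)                 # nearest representable value
    lo, hi = -(1 << (I + F)), (1 << (I + F)) - 1
    return max(lo, min(hi, q))           # saturate to the representable range

def from_fxp(q, F=4):
    return q / (1 << F)                  # recover the (approximate) real value

# Example with FXP8, i.e., <3, 4>: pi is stored as the integer 50 and read back as 3.125
q = to_fxp(3.14159)
print(q, from_fxp(q))                    # 50 3.125
```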
In FXP format, the separation between the integer and the fractional parts is implicit
and usually done by specifying a scaling factor that is common for all data. Thus, the FXP
Fig. 3.5 The bit representation for the fixed point number system