Parallel Programming and Optimization with Intel Xeon Phi Coprocessors: Handbook on the Development and Optimization of Parallel Applications for Intel Xeon Processors and Intel Xeon Phi Coprocessors, 2nd Edition, Andrey Vladimirov
PARALLEL PROGRAMMING AND OPTIMIZATION WITH INTEL® XEON PHI™ COPROCESSORS
SECOND EDITION
HANDBOOK ON THE DEVELOPMENT AND OPTIMIZATION OF PARALLEL APPLICATIONS FOR INTEL XEON PROCESSORS AND INTEL XEON PHI COPROCESSORS
COLFAX INTERNATIONAL
ANDREY VLADIMIROV | RYO ASAI | VADIM KARPUSENKO
This electronic copy is built for
free distribution without modification
under a CC BY-ND 4.0 license.
Parallel Programming and Optimization with Intel® Xeon Phi™ Coprocessors
Second Edition
Terms of Use
This book is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License (CC BY-ND 4.0). You
may copy and redistribute the material in any medium or format. If you remix, transform, or build upon the material, you may not
distribute the modified material.
For more information, see https://creativecommons.org/licenses/by-nd/4.0/
ISBN: 978-0-9885234-2-5
About the Authors
Andrey Vladimirov, PhD, is Head of HPC Research at Colfax
International. His primary interest is the application of modern
computing technologies to computationally demanding scientific
problems. Prior to joining Colfax, A. Vladimirov was involved in
computational astrophysics research at Stanford University, North
Carolina State University, and the Ioffe Institute in Russia, where
he studied cosmic rays, collisionless plasmas and the interstellar
medium using computer simulations.
First Edition
The authors are sincerely grateful to James Reinders for supervising and directing the
creation of this book; to Albert Lee for his help with editing and error checking; to
specialists at Intel Corporation who contributed their time and shared with the authors
their expertise on MIC architecture programming: Bob Davies, Shannon Cepeda,
Pradeep Dubey, Ronald Green, James Jeffers, Taylor Kidd, Rakesh Krishnaiyer,
Chris (CJ) Newburn, Kevin O’Leary, Zhang Zhang; and to a great number of people,
mostly from Colfax International and Intel, who have ensured that gears were turning
and bits were churning during the production of the book, including Rajesh Agny, Mani
Anandan, Joe Curley, Roger Herrick, Richard Jackson, Mike Lafferty, Thomas
Lee, Belinda Liviero, Gary Paek, Troy Porter, Tim Puett, John Rinehimer,
Gautam Shah, Manish Shah, Bruce Shiu, Jimmy Tran, Achim Wengeler, and Desmond
Yuen.
Brief Table of Contents
1 Introduction
  1.1 Intel Xeon Phi Coprocessors
  1.2 MIC Architecture: Developer’s Perspective
  1.3 Applicability of the MIC Architecture
  1.4 Preparing for Future Parallel Architectures
  1.5 System Administration with Intel Xeon Phi Coprocessors
2 Programming Models
  2.1 Native Applications and MPI
  2.2 Explicit Offload Model
  2.3 Shared Virtual Memory Model
  2.4 Using Multiple Coprocessors
  2.5 Offload Programming with OpenMP 4.0
3 Expressing Parallelism
  3.1 Data Parallelism (Vectorization)
  3.2 Task Parallelism in Shared Memory: OpenMP
  3.3 Task Parallelism with Intel Cilk Plus
  3.4 Process Parallelism in Distributed Memory with MPI
4 Optimizing Parallel Applications
  4.1 Optimization Roadmap for Intel Xeon Phi Coprocessors
  4.2 Scalar and General Optimizations
  4.3 Optimizing Vectorization
  4.4 Optimization of Multi-Threading
  4.5 Memory Access Optimization
  4.6 Offload Traffic Control
  4.7 Optimization Strategies for MPI Applications
5 Software Development Tools
  5.1 Intel Math Kernel Library
  5.2 Intel VTune Amplifier XE
6 Summary and Resources
  6.1 Parallel Programming and Intel Xeon Phi Coprocessors
  6.2 Supplementary Code for Practical Exercises (“Labs”)
  6.3 Colfax Developer Training
  6.4 Additional Resources
Bibliography
Contents
1 Introduction
  1.1 Intel Xeon Phi Coprocessors
    1.1.1 Technology Overview
    1.1.2 Conventional Programming, Portable Code
    1.1.3 Heterogeneous Computing and Clustering
    1.1.4 Intel Xeon Phi Product Family
    1.1.5 Intel Xeon Processor E3, E5 and E7 Family
  1.2 MIC Architecture: Developer’s Perspective
    1.2.1 Knights Corner Die Organization
    1.2.2 Core Specifications
    1.2.3 Memory Hierarchy and Cache Properties
    1.2.4 Integration into the Host System through MPSS
    1.2.5 Networking with Coprocessors in Clusters
    1.2.6 File I/O on Coprocessors
    1.2.7 Common Software Development Tools
    1.2.8 Intel Xeon Processors versus Intel Xeon Phi Coprocessors: Developer Experience
  1.3 Applicability of the MIC Architecture
    1.3.1 Task Parallelism
    1.3.2 Data-Parallel Component
    1.3.3 Memory Access Pattern
    1.3.4 PCIe Bandwidth Considerations
  1.4 Preparing for Future Parallel Architectures
    1.4.1 Exascale Computing for the Rest of Us
    1.4.2 Second Generation MIC Processor, KNL
    1.4.3 Future-Proof Development Options
  1.5 System Administration with Intel Xeon Phi Coprocessors
    1.5.1 Hardware Compatibility
    1.5.2 Operating Systems
    1.5.3 Installation and Minimal Configuration of MPSS
    1.5.4 Controlling the MPSS service
    1.5.5 Integration of MPSS with InfiniBand: OFED
James R. Reinders
Co-author of “Intel® Xeon Phi™ Coprocessor High Performance Programming”
© 2013, Morgan Kaufmann Publishers
Intel Corporation
March 2013
1. The details unveiled by Intel of the present and future MIC processors, including
Knights Landing;
6. Deeper review of the Intel Math Kernel Library support for the MIC architecture;
7. More convenient page format and font size for on-screen reading, and
8. Numerous updates to the text improving the clarity and depth of the discussion.
We hope that you find this book to be a valuable resource on “all things Xeon Phi”,
and, as always, we value your feedback. The HPC research department of Colfax
International can be reached by email at [email protected], and the latest updates on
our work can be found at research.colfaxinternational.com.
• Chapter 1 presents the Intel Xeon Phi architecture overview and the environment
provided by the MIC Platform Software Stack (MPSS) and Intel Parallel Studio
XE on Many Integrated Core architecture (MIC). The purpose of Chapter 1 is
to outline what users may expect from Intel Xeon Phi coprocessors (technical
specifications, software stack, application domain).
• Chapter 2 allows the reader to experience the simplicity of Intel Xeon Phi usage
early on in the program. It describes the operating system running on the
coprocessor, the compilation of native applications, and the language extensions
and CPU-centric codes that utilize Intel Xeon Phi coprocessors: the offload and
virtual-shared memory programming models. In a nutshell, Chapter 2 demonstrates how
to write serial code that executes on Intel Xeon Phi coprocessors.
• Chapter 4 re-iterates the material of Chapter 3, this time delving deeper into the
topics of parallel programming and providing example-based optimization advice,
including the usage of the Intel Math Kernel Library. This chapter is the core of
the training. The topics discussed in Chapter 4 include:
i) scalar optimizations;
ii) improving data structures for streaming, unit-stride, local memory access;
iii) guiding automatic vectorization with language constructs and compiler hints;
iv) reducing synchronization in task-parallel algorithms by the use of reduction;
v) avoiding false sharing;
vi) increasing arithmetic intensity and reducing cache misses by loop blocking
and recursion;
vii) exposing the full scope of available parallelism;
viii) controlling process and thread affinity in OpenMP and MPI;
ix) reducing communication through data persistence on coprocessor;
x) scheduling practices for load balancing across cores and MPI processes;
xi) optimized usage of Intel Math Kernel Library functions, among others.
If Chapter 3 demonstrated how to write parallel code for Intel Xeon Phi coprocessors,
then Chapter 4 shows how to make this parallel code run fast.
Throughout the training, we emphasize the concept of portable parallel code. Portable
parallelism can be achieved by designing codes in a way that exposes the data and task
List of Abbreviations
ALU Arithmetic Logic Unit
AO Automatic Offload
CPU Central Processing Unit, used interchangeably with the terms “processor” and
“host” to indicate the Intel Xeon processor, as opposed to the Intel Xeon Phi
coprocessor
FP Floating-point
I/O Input/Output
IP Internet Protocol
OS operating system
TD Tag Directory
CHAPTER 1
Introduction
This chapter introduces the Intel manycore architecture and positions Intel
Xeon Phi coprocessors in the context of parallel programming.
Even though the focus of this book is on Intel Xeon Phi coprocessors, we
will also briefly discuss the Intel Xeon family CPUs. This is necessary to
put the performance characteristics of Intel Xeon Phi coprocessors in proper
perspective.
Our approach to comparing CPUs and the manycore architecture builds
upon the first question that the designer of a computing system may ask:
does it make more sense to spend the budget for setup costs and operational
expenses on all-CPU nodes, or purchase fewer nodes, but enhance them with
coprocessors? Naturally, technical specifications alone cannot be used to
answer this question. This question can be answered only by benchmarks
of specific applications in combination with power measurements, total cost
analysis, and additional factors such as development effort, available rack
space, administrative burden, etc.
This chapter will help to set expectations for the potential of the Intel
manycore architecture for the reader’s outstanding computing challenges.
Figure 1.1: Left: multi-core Intel Xeon processors (CPUs), Right: manycore Intel Xeon Phi
coprocessor. Relative sizes are not to scale.
The manycore architecture may yield more performance per watt of power
and per dollar of setup costs than traditional multi-core CPUs. However,
not every application can be accelerated by manycore coprocessors. Intel
Xeon Phi coprocessors derive their high performance from multiple cores,
dedicated vector arithmetic units with wide vector registers, and cached
onboard GDDR5. High energy efficiency is achieved through the use of
low clock speed x86 cores with lightweight design suitable for parallel
HPC applications. Therefore, only highly parallel applications supporting
vectorized arithmetic with well-behaved (or negligible) memory traffic will
thrive on the manycore architecture.
Figure 1.2: Examples of computing system solutions featuring the Intel Xeon Phi coprocessors.
Left: A Colfax Workstation CXP7450 with two Intel Xeon Phi coprocessors. Right: A Colfax
Server CXP9000 with eight Intel Xeon Phi coprocessors. Relative sizes not to scale.
First generation Intel Xeon Phi coprocessors based on the Knights Corner
(KNC) chip are end-point Peripheral Component Interconnect Express (PCIe)
devices. They can be installed on the PCIe bus and operated in coprocessor-
ready computing systems, including workstations (e.g., Figure 1.2, left) and
servers (e.g., Figure 1.2, right).
An Intel Xeon Phi coprocessor cannot operate without a CPU-based host
system, which is the reason for terming these products coprocessors. Because
they reside on the PCIe bus and have their own on-board RAM, coprocessors
do not share memory address space with the CPU. Consequently, the mere
presence of a coprocessor in a system does not automatically improve the
performance of applications running on the CPU. To utilize the MIC
architecture, the application or the cluster execution manager must be aware of
the presence of a coprocessor.
The usage model of the second generation Intel MIC based on the Knights
Landing (KNL) chip will be different. The second generation chip will be
available as a standalone processor, as well as a PCIe-endpoint device. For
the standalone processor version, applications need not be coprocessor-aware
in order to be accelerated. However, a prerequisite for accelerated
performance is optimization of the application code for multi-core and manycore
architectures. See Section 1.4 for more information.
At the same time, the optimization methods used in applications for Intel
Xeon Phi are the same methods that are used in applications for general-
purpose Intel architecture CPUs. Indeed, case studies show that a code
optimized for the MIC platform also runs significantly faster on a CPU
(for a synthetic example, see paper [1] illustrated in Figure 1.3; code for a
similar application is available among the Supplementary Code for Practical
Exercises as Lab 4.01 – see Section 6.2; for realistic examples, refer to [2]).
Figure 1.3: The same C language code used for a simple N-body simulation on the CPU and on
a coprocessor. See white paper [1] for more information.
Figure 1.4: Intel architecture benefit: wide range of development options. Breadth, depth,
familiar models meet varied application needs. Diagram based on Intel materials.
Figure 1.5: Five-character code identifying the model of an Intel Xeon Phi coprocessor.
The first character in the code stands for the performance shelf: 3, 5 or
7. The second character is the product generation. As of the writing of this
book (Feb 2015), only generation 1 (KNC) is available. Therefore, available
models can be organized into 3 groups: 3100, 5100 and 7100 series.
5100 Series is optimized for performance per watt. 5100 Series coprocessors
feature lower TDP, contain more memory and cores than the 3100
series, and perform better in memory bandwidth-bound and memory
capacity-bound workloads.
7100 Series is the top performing group. It has the greatest core count,
memory size and bandwidth of all series. It also comes at a higher
price than other series, and greater TDP than the 5100 series.
The third and fourth characters in the code are the SKU digits. These
generally indicate the product stepping, and they increase as minor
silicon-level improvements are made.
Finally, the fifth character is a letter, which indicates the cooling solution
or special usage case of the model.
A stands for active cooling. These coprocessors come inside a heat sink with
a built-in fan, and are suitable for usage in desktop workstations
(Figure 1.6, left). This cooling solution is not reliant on system fans,
and the built-in fan speed is controlled by an onboard sensor, which
allows these coprocessor cards to be quiet in the idle state.
P stands for passive cooling. These coprocessors come in a heat sink,
but have no fan (Figure 1.6, right). They are designed for servers and
cannot be used in workstations, where they would overheat.
D is the dense form factor model. It does not have a heat sink, and is smaller
in size than the X option. These models are designed for specialized
solutions capable of supporting a large density of thermal dissipation.
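The naming scheme described above can be sketched as a small decoder. This is only an illustration of the five-character layout as described in this section: the function name `decode_model`, the `ModelCode` record and the sample input "7120P" are assumptions introduced here, and only the suffix letters explained in the text (A, P, D) are mapped.

```python
from dataclasses import dataclass

# Suffix letters described in the text above; any other letter is reported
# as not covered here rather than guessed.
SUFFIX_MEANING = {
    "A": "active cooling (heat sink with built-in fan)",
    "P": "passive cooling (heat sink, no fan)",
    "D": "dense form factor (no heat sink)",
}


@dataclass(frozen=True)
class ModelCode:
    shelf: int       # performance shelf: 3, 5 or 7
    generation: int  # product generation; 1 corresponds to KNC
    sku: str         # the two SKU digits, kept as text
    suffix: str      # cooling solution or special usage letter
    series: str      # series name, e.g. "7100"
    cooling: str     # human-readable meaning of the suffix


def decode_model(code: str) -> ModelCode:
    """Split a five-character model code into the fields described in the text."""
    code = code.strip().upper()
    if len(code) != 5 or not code.isascii():
        raise ValueError(f"expected five ASCII characters, got {code!r}")
    digits, suffix = code[:4], code[4]
    if not digits.isdigit() or not suffix.isalpha():
        raise ValueError(f"expected four digits and a letter, got {code!r}")
    shelf = int(digits[0])
    if shelf not in (3, 5, 7):
        raise ValueError(f"unknown performance shelf {shelf}")
    generation = int(digits[1])
    return ModelCode(
        shelf=shelf,
        generation=generation,
        sku=digits[2:],
        suffix=suffix,
        series=f"{shelf}{generation}00",
        cooling=SUFFIX_MEANING.get(suffix, "not described in this section"),
    )


if __name__ == "__main__":
    # Hypothetical sample input used only to exercise the decoder.
    print(decode_model("7120P"))
```

Keeping the SKU digits as text avoids losing a leading zero, and unknown suffix letters are passed through rather than rejected, since this section does not list every suffix.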
Figure 1.6: Active and passive cooling solutions of Intel Xeon Phi coprocessors.
Table 1.1: Models of Intel Xeon Phi coprocessors available as of May 2014. Columns contain:
model name, thermal design power (TDP) in Watts, number of physical cores, their clock
speed, Intel Turbo Boost technology support, onboard memory size in GiB, maximum memory
bandwidth (MMB) in GB/s, double precision (DP) theoretical peak performance (TPP) in
GFLOP/s, and RCP. RCP is price guidance for bulk purchases by direct Intel customers, subject
to change without notice, not a formal pricing offer from Intel or Colfax International.
Table 1.1 summarizes the currently available models of Intel Xeon Phi
coprocessors and their specifications. In this table, all quantities are obtained
from the Intel Xeon Phi Product Family page, except for the Theoretical
Peak Performance (TPP), which is estimated according to Equation (1.1):
The Intel Xeon family adheres to the numbering scheme shown in Figure 1.7.
The product line (E3, E5 and E7) for Intel Xeon CPUs is similar to the
performance shelf for Intel Xeon Phi coprocessors: E3 is the lowest-cost
option, E5 is optimized for best power consumption, and E7 is the top
performing line.
Wayness is the maximum number of CPU sockets per node. The two digits
of the processor SKU place the CPU within its family. The differences
between SKUs are mostly quantitative. The SKU determines the
number of cores, clock speed, maximum memory bandwidth, and cache size.
After the SKU, in some CPU models, an additional suffix “L” is present,
indicating a low power consumption model.
Finally, the version of the CPU (v1, v2 or v3) determines the type of
processor microarchitecture used in the chip: Sandy Bridge (v1), Ivy Bridge
(v2) or Haswell (v3). The difference between versions depends on whether
the version update was a “tick” or a “tock”. For instance, Sandy Bridge
to Ivy Bridge development was a “tick”, i.e., a newer, smaller transistor
technology was used in v2. As a result, v2 CPUs may have more cores,
greater performance and lower power consumption than v1; however, the
instruction set is unchanged. In contrast, the Ivy Bridge to Haswell update was
a “tock”, i.e., the same transistor technology as in Ivy Bridge was used to
produce an architecturally improved chip. As a result, v3 CPUs support
additional instruction sets (in this case, AVX2) and features (e.g., TSX), and
operate with a different chipset.
Model         TDP   Cores   Clock   Cache   MMB      DP TPP      RCP
              (W)           (GHz)   (MiB)   (GB/s)   (GFLOP/s)
E5-2603        80     4     1.8     10      34.1      57.6       $198
E5-2690       135     8     2.9     20      51.2     185.6       $2057
E5-2603 v2     80     4     1.8     10      42.6      57.6       $202
E5-2697 v2    130    12     2.7     30      59.7     259.2       $2614
E5-2603 v3     85     6     1.6     15      68.0      76.8       $217
E5-2697 v3    145    14     2.6     35      51.0     291.2       $2706
Table 1.2: Some of the models of Intel Xeon processors available as of April 2015. Columns
as in Table 1.1. RCP is price guidance for bulk purchases by direct Intel customers, subject to
change without notice, not a formal pricing offer from Intel or Colfax International. Values are
per socket; double all values for a dual-socket CPU.
Of the multitude of Intel Xeon SKUs, the most important for the discussion
in this book are two-way multi-core CPUs. This is because their TDP
and cost are comparable to those of a single Intel Xeon Phi coprocessor (see
also Section 4.1.2).
Table 1.2 lists key technical specifications of a few selected two-way
models of Intel Xeon processors. Note that the quantities in Table 1.2 are
reported per socket, so for a two-way machine, they must be multiplied
by 2. DP TPP is estimated similarly to Equation (1.1), with SIMD Register
Size=256 bits, and an additional factor of ×2 to account for two ALUs
in Sandy Bridge and Ivy Bridge architectures, or for FMA in the Haswell
architecture (see Section 4.5).
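The estimate described above can be checked numerically against the DP TPP column of Table 1.2. The sketch below assumes the estimate is the product of core count, clock speed, the number of 64-bit values per SIMD register, and the additional factor of 2. Equation (1.1) itself is not reproduced in this excerpt, so the exact form used here is an assumption, and the function name `dp_tpp_gflops` is introduced only for illustration.

```python
def dp_tpp_gflops(cores, clock_ghz, simd_bits=256, extra_factor=2):
    """Estimate double precision theoretical peak performance per socket.

    simd_bits    -- SIMD register size in bits (256 for the CPUs in Table 1.2)
    extra_factor -- 2 for two ALUs (Sandy Bridge, Ivy Bridge) or FMA (Haswell)
    """
    lanes = simd_bits // 64  # 64-bit double precision values per register
    return cores * clock_ghz * lanes * extra_factor


# (model, cores, clock in GHz, DP TPP in GFLOP/s as listed in Table 1.2)
TABLE_1_2 = [
    ("E5-2603",    4,  1.8,  57.6),
    ("E5-2690",    8,  2.9, 185.6),
    ("E5-2603 v2", 4,  1.8,  57.6),
    ("E5-2697 v2", 12, 2.7, 259.2),
    ("E5-2603 v3", 6,  1.6,  76.8),
    ("E5-2697 v3", 14, 2.6, 291.2),
]

if __name__ == "__main__":
    for model, cores, clock, listed in TABLE_1_2:
        estimate = dp_tpp_gflops(cores, clock)
        print(f"{model:12s} estimate {estimate:6.1f} GFLOP/s, table {listed:6.1f} GFLOP/s")
```

For a two-way machine, the per-socket result is multiplied by 2, as noted in the text.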
For complete information on the technical specifications of other Intel
processors, refer to http://ark.intel.com/.
[Diagram: cores with L2 caches and tag directories (TD) placed along the Core Ring Interconnect (CRI), which carries data, address and coherence traffic; GBOX memory controllers connected to GDDR5; SBOX with the PCIe v2.0 controller and DMA engines.]
Figure 1.8: Knights Corner die organization. A bi-directional ring interconnects cores, tag
directories, onboard memory controllers and PCIe/DMA engines.
In addition to cores, the CRI contains devices that allow the chip to operate
as a symmetric multiprocessor:
i) A distributed Tag Directory (TD): multiple TD devices maintain information
about cache lines in the L2 caches and their states. Together,
all TDs form a Distributed Tag Directory (DTD), responsible for
maintaining global cache coherency.
ii) 6 to 8 GBOX units, which are memory controllers for onboard GDDR5
RAM. Each controller has two 32-bit channels delivering up to 5.5 GT/s.
The RAM has the Error Correction Code (ECC) capability.
iii) An SBOX (system box) unit, supporting a PCI Express v2.0 logic with
eight Direct Memory Access (DMA) channels for data transfer from
system to GDDR5 memory.
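The memory controller figures in item ii) imply an upper bound on aggregate memory bandwidth. The sketch below is a back-of-the-envelope calculation and makes two assumptions: every transfer moves the full 32-bit channel width, and all channels run at the maximum 5.5 GT/s rate. The function name `peak_bandwidth_gb_s` is introduced here for illustration, and bandwidth achieved in practice will be lower than this theoretical bound.

```python
def peak_bandwidth_gb_s(controllers,
                        channels_per_controller=2,
                        channel_bits=32,
                        transfer_rate_gt_s=5.5):
    """Theoretical peak memory bandwidth in GB/s for a number of GBOX units."""
    bytes_per_transfer = channel_bits // 8  # 32-bit channel -> 4 bytes
    return (controllers * channels_per_controller
            * bytes_per_transfer * transfer_rate_gt_s)


if __name__ == "__main__":
    for gboxes in (6, 8):
        bound = peak_bandwidth_gb_s(gboxes)
        print(f"{gboxes} GBOX units: up to {bound:.0f} GB/s")
```

With 8 controllers the bound is 8 × 2 × 4 bytes × 5.5 GT/s = 352 GB/s; with 6 controllers at the same rate it is 264 GB/s.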
Parallel Programming and Optimization with Intel Xeon Phi Coprocessors. Second Edition
Other documents randomly have
different content
finance, and social and economic institutions. The literary value of
these records is not, however, to be inconsiderately judged from
their bulk. Times and standards in American historiography have
changed. Among the multitude of authors one must not look for
many names which may be written down with those of Prescott,
Motley, and Parkman. Not that the modern period is wanting in good
work or able writers. These are to be found in abundance. But most
of the work belongs to science and not to letters; and besides,
eminence is not fostered by the catholic distribution of talent and
training. Jameson picks up Amiel’s blunt opinion that “the era of
mediocrity in all things is commencing” and applies it to American
historians. At the same time, this wise critic inclines to the belief that
the vast improvement in technical process and workmanship realised
within the present generation is the natural means to the
development of a more substantial and more profound school of
historians than the West has thus far created. The term “mediocrity”
does not, indeed, do full justice to the period and the authors in
question, and we must seek other grounds of excuse for the brevity
of our review of them. These grounds are found, first, in the indirect
importance to literature of the great mass of recent work, and,
secondly, in the impossibility of setting the achievements of
contemporary workers in just perspective.
The writers, great and little, of the periods already surveyed
were, in large measure, self-trained. Until the last two or three
decades, colleges and universities offered little incentive to
methodical work upon historical subjects. Even Harvard, from whose
doors went one after another the men who were to make the New
England School famous, taught history only incidentally. Now, an
academic school has arisen. Young men and women are trained in
undergraduate and graduate studies by teachers who are
themselves historical writers and investigators. Students are taught
the discriminating use of historical instruments, and sound methods
of reconstruction and interpretation. The change has been wrought
under the unequal pressure of external influence, emphasis laid
upon scientific method, a quickened consciousness of the
importance and dignity of American history, and, finally, the example
of those graceful and inspiring writers who gave to Western
historiography an honourable place in the world’s literature. The
academic school owes its existence to no single founder. It is, by its
nature, a school of coöperative endeavour,—coöperation, first,
between teacher and pupil, and coöperation, later, in the conjoint
and organised labour of productive hands and brains. Among its
early advocates and promoters were Charles Kendall Adams,
university professor and president, teacher and historian, who
adapted the German seminary method to the American university;
Henry Adams, professor at Harvard University and author of a
brilliant history in nine volumes (1889–91) of the country under
Jefferson and Madison (1801–17); Justin Winsor, librarian,
bibliographer, and editor of the useful and scholarly “Narrative and
Critical History of America” (1884–89), and Herbert Baxter Adams, of
Johns Hopkins, historian and instructor of historical students. The
coöperative labours of the period have borne abundant fruit. Besides
Winsor’s volumes should be mentioned “The American Nation: a
History from Original Sources by Associated Scholars,” a gigantic
work in twenty-seven volumes just finished (1904–8) under the
editorship of Albert Bushnell Hart. The authorship is divided among a
number of competent historical writers. The collection lays claim to
being “the first comprehensive history of the United States, now
completed, which covers the whole period” from the discovery of
America to the present. Similar undertakings are, however, in
progress, and a number of coöperative works of smaller scope are
already in print. Other notable histories covering comparatively long
periods of time are Edward Channing’s “A History of the United
States,” to be completed in eight volumes; a series of nine volumes
relating to preconstitutional times written by John Fiske, after the
manner of Parkman, and including “The Critical Period of American
History” (1888), “The Beginnings of New England” (1889), “The
American Revolution” (1891), “The Discovery of America” (1892),
etc.; James Schouler’s “History of the United States under the
Constitution” (1880–99); “A Popular History of the United States”
(1876–81), by William Cullen Bryant and Sydney H. Gay; “A History
of the People of the United States from the Revolution to the Civil
War” (6 of the 7 volumes published, 1883–1906), by John B.
McMaster; “The Constitutional and Political History of the United
States,” (1877–92), by Hermann E. von Holst, and “A History of the
American People” (1902), by President Woodrow Wilson of Princeton
University. Channing’s attempt to cover, by the labours of a single
competent scholar, the entire history of the country is comparable to
that of George Bancroft. John Fiske wrote readable and popular
narratives of historical events. He did much, both by books and
lectures, to arouse general interest in matters of American life past
and present. McMaster’s substantial and illuminating history is social
rather than political. He seeks to portray the whole life of the people.
Von Holst’s aim was, on the other hand, political. The author was a
German-American. He held, among academic posts, professorships
at Freiburg and the University of Chicago. His critical review, often
disparaging to democratic institutions, may be taken as a
counterblast to the ebullient patriotism of earlier, native writers. As
the work of a foreign observer of American affairs, it suggests the
reflections of de Tocqueville, of James Bryce, and of Goldwin Smith.
President Wilson’s five volumes contain a wise and judicial
commentary, in the form of a long and attractive essay, on the main
course of events since the days of discovery. For the multitude of
American historical writers who have treated single epochs, space
permits mention of only one or two names. James Ford Rhodes’
“History of the United States from the Compromise of 1850” (7
volumes, 1893–1906), the work of “nineteen years’ almost exclusive
devotion,” is commonly regarded as the most thorough and best
balanced study of the Civil War, its causes and its consequences.
Henry Adams has, in his “History of the United States,” etc.,
investigated with competence and penetration the administrations of
Jefferson and Madison.
This meagre list of the more important productions of the
academic school clearly reveals the attraction of the American theme
for the present American historian. Capable and impressive studies
of foreign subjects there have been, it is true;—David Jayne Hill’s
“History of Diplomacy in the International Development of Europe”
and Henry C. Lea’s work on the medieval church are conspicuous
instances;—but the great mass of research and writing has been
gathered at home. Governmental affairs and political events loom
large. Less interest has been taken in the subtler phases of national
character and individual motive; although Fiske and McMaster and
Woodrow Wilson and certain of the best biographers (whose
important service to literature deserves separate consideration)
represent a current tendency toward reflective and philosophical
writing of a literary quality, which augurs well for the future of
American historiography.
Thus it was that for a long time Defoe and Fielding, Smollett and
Sterne found no imitators in America. The American novel-reader, for
the most part, was content with British provender, and satisfied his
appetite for the marvellous with Walpole’s “Castle of Otranto,” Lewis’
“Monk,” and Mrs. Radcliffe’s “Romance of the Forest” and “The
Mysteries of Udolpho.” Toward the end of the eighteenth century
several writers essayed the novel, but not with lasting success. In
“The Foresters” (published serially in The Columbian Magazine, and
in book form in 1792), Jeremy Belknap (1744–98) produced an
ingenious though trivial allegorical tale of the colonisation of America
and the rebellion of the colonies. In this, Peter Bullfrog stood for
New York, Ethan Greenwood for Vermont, Walter Pipeweed for
Virginia, Charles Indigo for South Carolina, and so on. Ann Eliza
Bleecker (1752–83) was the author of “The History of Maria Kittle,”
which in the form of a letter sets forth some harrowing experiences
among the savages during the French and Indian War; and of “The
Story of Henry and Anne,” a tale, “founded on fact,” of the
misfortunes of some German peasants who finally settled in
America; both of these were published posthumously in her “Works”
in 1793. Mrs. Susanna Haswell Rowson’s “Charlotte Temple” (1790),
a story of love, betrayal, and desertion, despite its absurdly stilted
phrases and its long-drawn melancholy, has ever been popular with
a certain class of readers; the editor of the latest edition (1905), Mr.
Francis W. Halsey, has examined 104 editions, and his list is
incomplete. An avowed antidote to “Charlotte Temple,” Mrs. Tabitha
G. Tenney’s satirical “Female Quixotism” (1808), suggests to
Professor Trent “an expurgated Smollett”; it is now unknown. Mrs.
Hannah W. Foster, the wife of a clergyman in Massachusetts, wrote
“The Coquette, or The History of Eliza Wharton, a Novel Founded on
Fact” (1797), a story of desertion, showing the marked influence of
Richardson. In the same year appeared “The Algerine Captive,” by
Royall Tyler, who was one of the first to turn to American life as a
fruitful subject for fiction. His story is a broadly humorous picaresque
tale, of the Smollett type, which introduces rather too many
wearisome details of customs in Algiers; a fault for which his
generally spirited style and his powerful description of the horrors of
a slave-ship partially atone.
Hugh Henry Brackenridge (1748–1816), the classmate at
Princeton of James Madison and Philip Freneau, wrote “Modern
Chivalry, or The Adventures of Captain John Farrago and Teague
O’Regan, His Servant” (Philadelphia and Pittsburgh, published in four
parts, 1792–7), a modern “Don Quixote” narrating his experiences in
the Whisky Insurrection of 1794. Though widely read in its day,
especially by artisans and farmers, its literary worth was not
sufficient to preserve it. “The Gamesters,” published in 1805 by Mrs.
Catharine Warren, was likewise popular in its day; it attempted “to
blend instruction with amusement.”