
Parallel Programming and Optimization with Intel Xeon Phi Coprocessors: Handbook on the Development and Optimization of Parallel Applications for Intel Xeon Processors and Intel Xeon Phi Coprocessors, 2nd Edition, Andrey Vladimirov

The document provides information about the 'Parallel Programming and Optimization with Intel Xeon Phi Coprocessors' handbook, which focuses on the development and optimization of parallel applications for Intel Xeon processors and Intel Xeon Phi coprocessors. The handbook is authored by Andrey Vladimirov, Ryo Asai, and Vadim Karpusenko and is licensed under Creative Commons.


PARALLEL PROGRAMMING AND OPTIMIZATION WITH INTEL® XEON PHI™ COPROCESSORS

HANDBOOK ON THE DEVELOPMENT AND OPTIMIZATION OF PARALLEL APPLICATIONS
FOR INTEL XEON PROCESSORS AND INTEL XEON PHI COPROCESSORS

SECOND EDITION

COLFAX INTERNATIONAL
ANDREY VLADIMIROV | RYO ASAI | VADIM KARPUSENKO
This electronic copy is built for
free distribution without modification
under a CC BY-ND 4.0 license.
Parallel Programming and Optimization
with Intel® Xeon Phi™ Coprocessors

Handbook on the Development and Optimization
of Parallel Applications
for Intel® Xeon® Processors
and Intel® Xeon Phi™ Coprocessors

Second Edition

Andrey Vladimirov, Ryo Asai and Vadim Karpusenko

© Colfax International, 2013–2015

Electronic book built: January 4, 2019
Last revision date: January 4, 2019
Copyrighted Material
Copyright © 2013–2015, Colfax International. All rights reserved.
Cover image Copyright © pio3, 2013. Used under license from Shutterstock.com.
Published by Colfax International, 750 Palomar Ave, Sunnyvale, CA 94085, USA.
All Rights Reserved.
No part of this book (or publication) may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission
from the publisher, except for the inclusion of brief quotations in a review.
Intel, Xeon and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
All trademarks and registered trademarks appearing in this publication are the property of their respective owners.

Terms of Use
This book is licensed under the Creative Commons Attribution-NoDerivatives International License (CC BY-ND 4.0). You
may copy and redistribute the material in any medium or format. If you remix, transform, or build upon the material, you may not
distribute the modified material.
For more information, see https://creativecommons.org/licenses/by-nd/4.0/

Disclaimer and Legal Notices


While best efforts have been used in preparing this book, the publisher makes no representations or warranties of any kind
and assumes no liabilities of any kind with respect to the accuracy or completeness of the contents and specifically disclaims
any implied warranties of merchantability or fitness of use for a particular purpose. The publisher shall not be held liable or
responsible to any person or entity with respect to any loss or incidental or consequential damages caused, or alleged to have been
caused, directly or indirectly, by the information or programs contained herein. No warranty may be created or extended by sales
representatives or written sales materials.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products.
Results have been simulated and are provided for informational purposes only. Results were derived using simulations run
on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual
performance.
Because of the evolutionary nature of technology, knowledge and best practices described at the time of this writing may
become outdated or simply inapplicable at a later date. Summaries, strategies, tips and tricks are only recommendations by the
publisher, and reading this eBook does not guarantee that one’s results will exactly mirror our own results. Every company
is different and the advice and strategies contained herein may not be suitable for your situation. References are provided for
informational purposes only and do not constitute endorsement of any websites or other sources.
The products described in this document may contain design defects or errors known as errata which may cause the product
to deviate from published specifications. All products, computer systems, dates, and figures specified are preliminary based on
current expectations, and are subject to change without notice.

ISBN: 978-0-9885234-2-5
About the Authors
Andrey Vladimirov, PhD, is Head of HPC Research at Colfax
International. His primary interest is the application of modern
computing technologies to computationally demanding scientific
problems. Prior to joining Colfax, A. Vladimirov was involved in
computational astrophysics research at Stanford University, North
Carolina State University, and the Ioffe Institute in Russia, where
he studied cosmic rays, collisionless plasmas and the interstellar
medium using computer simulations.

Ryo Asai is a Researcher at Colfax International. He develops
optimization methods for scientific applications targeting emerging
parallel computing platforms, computing accelerators and
interconnect technologies. Ryo holds a B.S. degree in Physics from
University of California, Berkeley.

Vadim Karpusenko, PhD, is Principal HPC Research Engineer at
Colfax International involved in training and consultancy projects
on data mining, software development and statistical analysis of
complex systems. His research interests are in the area of physical
modeling with HPC clusters, highly parallel architectures, and code
optimization. Vadim holds a PhD from North Carolina State
University for his research in the field of computational biophysics
on the free energy and stability of helical secondary structures of
proteins.

Additional publications by these authors
related to Intel MIC architecture programming
may be found at
http://colfaxresearch.com/
Acknowledgements
Second Edition
We cannot thank enough the people who have contributed their valuable time and
expertise to write technical reviews of the 2nd edition of this book. They have provided
guidance, fixed misconceptions, future-proofed the messages and caught countless bugs:
Ilya Burylov, Gennady Fedorov, Alexandr Kalinkin, Alexandr Kobotov, Vadim
Pirogov (Intel/MKL), Joseph Curley (Intel), Rob Farber (TechEnablement.com),
Rakesh Krishnaiyer (Intel), Lawrence Meadows (Intel), John Pennycook (Intel),
Troy Porter (Stanford University), Frances Roth (Intel), Jason Sewall (Intel),
Gergana Slavova (Intel). Thank you all very much!

First Edition
The authors are sincerely grateful to James Reinders for supervising and directing the
creation of this book; to Albert Lee for his help with editing and error checking; to
specialists at Intel Corporation who contributed their time and shared with the authors
their expertise on MIC architecture programming: Bob Davies, Shannon Cepeda,
Pradeep Dubey, Ronald Green, James Jeffers, Taylor Kidd, Rakesh Krishnaiyer,
Chris (CJ) Newburn, Kevin O’Leary, Zhang Zhang; and to a great number of people,
mostly from Colfax International and Intel, who have ensured that gears were turning
and bits were churning during the production of the book, including Rajesh Agny, Mani
Anandan, Joe Curley, Roger Herrick, Richard Jackson, Mike Lafferty, Thomas
Lee, Belinda Liviero, Gary Paek, Troy Porter, Tim Puett, John Rinehimer,
Gautam Shah, Manish Shah, Bruce Shiu, Jimmy Tran, Achim Wengeler, and Desmond
Yuen.
BRIEF TABLE OF CONTENTS

1 Introduction
1.1 Intel Xeon Phi Coprocessors
1.2 MIC Architecture: Developer’s Perspective
1.3 Applicability of the MIC Architecture
1.4 Preparing for Future Parallel Architectures
1.5 System Administration with Intel Xeon Phi Coprocessors
2 Programming Models
2.1 Native Applications and MPI
2.2 Explicit Offload Model
2.3 Shared Virtual Memory Model
2.4 Using Multiple Coprocessors
2.5 Offload Programming with OpenMP 4.0
3 Expressing Parallelism
3.1 Data Parallelism (Vectorization)
3.2 Task Parallelism in Shared Memory: OpenMP
3.3 Task Parallelism with Intel Cilk Plus
3.4 Process Parallelism in Distributed Memory with MPI
4 Optimizing Parallel Applications
4.1 Optimization Roadmap for Intel Xeon Phi Coprocessors
4.2 Scalar and General Optimizations
4.3 Optimizing Vectorization
4.4 Optimization of Multi-Threading
4.5 Memory Access Optimization
4.6 Offload Traffic Control
4.7 Optimization Strategies for MPI Applications
5 Software Development Tools
5.1 Intel Math Kernel Library
5.2 Intel VTune Amplifier XE
6 Summary and Resources
6.1 Parallel Programming and Intel Xeon Phi Coprocessors
6.2 Supplementary Code for Practical Exercises (“Labs”)
6.3 Colfax Developer Training
6.4 Additional Resources
Bibliography


Contents

1 Introduction
1.1 Intel Xeon Phi Coprocessors
1.1.1 Technology Overview
1.1.2 Conventional Programming, Portable Code
1.1.3 Heterogeneous Computing and Clustering
1.1.4 Intel Xeon Phi Product Family
1.1.5 Intel Xeon Processor E3, E5 and E7 Family
1.2 MIC Architecture: Developer’s Perspective
1.2.1 Knights Corner Die Organization
1.2.2 Core Specifications
1.2.3 Memory Hierarchy and Cache Properties
1.2.4 Integration into the Host System through MPSS
1.2.5 Networking with Coprocessors in Clusters
1.2.6 File I/O on Coprocessors
1.2.7 Common Software Development Tools
1.2.8 Intel Xeon Processors versus Intel Xeon Phi Coprocessors: Developer Experience
1.3 Applicability of the MIC Architecture
1.3.1 Task Parallelism
1.3.2 Data-Parallel Component
1.3.3 Memory Access Pattern
1.3.4 PCIe Bandwidth Considerations
1.4 Preparing for Future Parallel Architectures
1.4.1 Exascale Computing for the Rest of Us
1.4.2 Second Generation MIC Processor, KNL
1.4.3 Future-Proof Development Options
1.5 System Administration with Intel Xeon Phi Coprocessors
1.5.1 Hardware Compatibility
1.5.2 Operating Systems
1.5.3 Installation and Minimal Configuration of MPSS
1.5.4 Controlling the MPSS service
1.5.5 Integration of MPSS with InfiniBand: OFED


1.5.6 Restoring MPSS Functionality after Kernel Updates
1.5.7 Installation of Intel Compilers
1.5.8 Installing the OpenCL Runtime and CodeBuilder
1.5.9 Quick Functionality Check
1.5.10 Overview of Intel MPSS Tools
1.5.11 miccheck: Basic Troubleshooting
1.5.12 micctrl: Coprocessor OS Configuration
1.5.13 micflash: Coprocessor Firmware Updates
1.5.14 micinfo: Coprocessor, Firmware, Driver Info
1.5.15 micrasd: Reliability Monitor, Error Logging
1.5.16 micsmc: Real-Time Monitoring Tool
1.5.17 User Management on Intel Xeon Phi Coprocessors
1.5.18 SSH Client Configuration
1.5.19 NFS Mounting a Host Export
1.5.20 Sharing a Local Disk with VirtIO Block Device
1.5.21 Bridged Networking in Clusters with Coprocessors
1.5.22 Peer to Peer Communication between Coprocessors
1.5.23 Manual Customization of the coprocessor OS
2 Programming Models
2.1 Native Applications and MPI
2.1.1 Using Compiler Argument -mmic to Compile Native Applications for Intel® Xeon Phi™ Coprocessors
2.1.2 Running Native Applications on Using SSH
2.1.3 Running Native Applications with micnativeloadex
2.1.4 Monitoring the Coprocessor Activity with micsmc
2.1.5 MPI Applications on Intel Xeon Phi Coprocessors
2.2 Explicit Offload Model
2.2.1 “Hello World” Example in the Explicit Offload Model
2.2.2 Offloading Functions
2.2.3 Offloading Bitwise-Copyable Data
2.2.4 Data and Memory Persistence Between Offloads
2.2.5 Asynchronous Offload
2.2.6 Target-Specific Code
2.2.7 Optional and Conditional Offload, Fall-Back to Host
2.2.8 Offload Diagnostics
2.2.9 Environment Variables and MIC_ENV_PREFIX


2.2.10 Proxy Console I/O
2.2.11 Review: Explicit Offload Model
2.3 Shared Virtual Memory Model
2.3.1 Offloading Functions
2.3.2 Sharing and Offloading Objects
2.3.3 Dynamic Allocation in Shared Virtual Memory
2.3.4 Classes in Shared Virtual Memory
2.3.5 Placement Operator new for Shared Classes
2.3.6 Asynchronous Offload
2.3.7 Summary for Shared Virtual Memory Model
2.4 Using Multiple Coprocessors
2.4.1 Multiple Coprocessors with Explicit Offload
2.4.2 Multiple Coprocessors in the Shared Virtual Memory Model
2.4.3 Multiple Coprocessors with MPI
2.5 Offload Programming with OpenMP 4.0
2.5.1 Offload with Pragma Target
2.5.2 Data Persistence with Pragma Target Data
3 Expressing Parallelism
3.1 Data Parallelism (Vectorization)
3.1.1 Vector Instructions: Concept and History
3.1.2 Intel Architecture Vector Instruction Sets
3.1.3 Is Your Code Using Vectorization?
3.1.4 Data Alignment
3.1.5 Vector Instructions using Inline Assembly, Compiler Intrinsics and Class Libraries
3.1.6 Automatic Vectorization of Loops
3.1.7 Extensions for Array Notation in Intel Cilk Plus
3.1.8 SIMD-Enabled Functions
3.1.9 Assumed Vector Dependence
3.1.10 Vectorization Pragmas, Keywords and Compiler Arguments
3.1.11 Exclusive Features of the IMCI Instruction Set
3.2 Task Parallelism in Shared Memory: OpenMP
3.2.1 Multiple Cores and Task Parallelism
3.2.2 “Hello World” with OpenMP
3.2.3 For-Loops in OpenMP
3.2.4 Tasks in OpenMP


3.2.5 Shared and Private Variables
3.2.6 Synchronization: Avoiding Unpredictable Behavior
3.2.7 Reduction: Avoiding Synchronization
3.3 Task Parallelism with Intel Cilk Plus
3.3.1 “Hello World” in Intel Cilk Plus
3.3.2 For-Loops in Intel Cilk Plus
3.3.3 Fork-Join Model and Spawning in Intel Cilk Plus
3.3.4 Synchronization with Spawned Tasks
3.3.5 Reduction: Avoiding Synchronization
3.3.6 OpenMP versus Intel Cilk Plus
3.3.7 Additional Resources on Shared Memory Parallelism
3.4 Process Parallelism in Distributed Memory with MPI
3.4.1 Parallel Computing in Clusters with Multi-Core and Many-Core Nodes
3.4.2 Program Structure in MPI
3.4.3 Point-to-Point Communication
3.4.4 MPI Communication Modes
3.4.5 Collective Communication and Reduction
3.4.6 Further Reading
4 Optimizing Parallel Applications
4.1 Optimization Roadmap for Intel Xeon Phi Coprocessors
4.1.1 Optimization Checklist
4.1.2 Expectations
4.1.3 Benchmark Methodology
4.1.4 Benchmark Computing System
4.2 Scalar and General Optimizations
4.2.1 Compiler Controls for Optimization
4.2.2 Compiler Controls for Precision
4.2.3 Optimizing Arithmetic Expressions
4.2.4 Programming Practices for High Performance
4.2.5 Math Kernel Library for Scalar Arithmetic
4.3 Optimizing Vectorization
4.3.1 Diagnosing the Utilization of Vector Instructions
4.3.2 Unit-Stride Access and Spatial Locality of Reference
4.3.3 Regularizing Vectorization Pattern
4.3.4 Compiler Hints: Aligned Data Notice


4.3.5 Compiler Hints: Pointer Disambiguation
4.3.6 Strip-Mining for Vectorization
4.3.7 Additional “Tuning Knobs” for Vectorization
4.4 Optimization of Multi-Threading
4.4.1 Avoiding Synchronization through Parallel Reduction
4.4.2 Elimination of False Sharing with Padding
4.4.3 Resolving Load Imbalance with Scheduling Control
4.4.4 Dealing with Insufficient Parallelism
4.4.5 Thread Affinity Optimization
4.4.6 Diagnosing Parallel Efficiency, Scalability Tests
4.5 Memory Access Optimization
4.5.1 General Considerations
4.5.2 Loop Tiling
4.5.3 Cache-Oblivious Recursive Methods
4.5.4 First Touch Allocation and NUMA Policy
4.5.5 Cross-Procedural Loop Fusion
4.5.6 Advanced Topic: Prefetching
4.6 Offload Traffic Control
4.6.1 Bandwidth Optimization with Persistent Buffers
4.6.2 Masking Offload Latency with Double Buffering
4.7 Optimization Strategies for MPI Applications
4.7.1 Static Load Balancing
4.7.2 Dynamic Work Scheduling
4.7.3 Multi-threading within MPI Processes
4.7.4 Fabric Control
5 Software Development Tools
5.1 Intel Math Kernel Library
5.1.1 Functions Offered by MKL
5.1.2 Linking Applications with MKL. Link Line Advisor
5.1.3 MKL on Intel Xeon Phi Coprocessors
5.1.4 Automatic offload
5.1.5 Compiler-Assisted Offload
5.1.6 Native Execution
5.1.7 Benchmarks of Select MKL Functions
5.2 Intel VTune Amplifier XE
5.2.1 System Administration


5.2.2 Running VTune
5.2.3 Project Management
5.2.4 Analysis on the Host CPU
5.2.5 Analysis on an Intel Xeon Phi Coprocessor
6 Summary and Resources
6.1 Parallel Programming and Intel Xeon Phi Coprocessors
6.2 Supplementary Code for Practical Exercises (“Labs”)
6.3 Colfax Developer Training
6.4 Additional Resources
Bibliography


Foreword to the First Edition


We live in exciting times; the amount of computing power available for sciences and
engineering is reaching enormous heights through parallel computing. Parallel computing is driving
discovery in many endeavors, but remains a relatively new area of computing. As such,
software developers are part of an industry that is still growing and evolving as parallel computing
becomes more commonplace.
The added challenges involved in parallel programming are being eased by four key trends
in the industry: emergence of better tools, wide-spread usage of better programming models,
availability of significantly more hardware parallelism, and more teaching material promising to
yield better-educated programmers. We have seen recent innovations in tools and programming
models including OpenMP and Intel Threading Building Blocks. Now, the Intel® Xeon Phi™
coprocessor certainly provides a huge leap in hardware parallelism with its general purpose
hardware thread counts being as high as 244 (up to 61 cores, 4 threads each).
This leaves the challenge of creating better-educated programmers. This handbook from
Colfax, with a subtitle of “Handbook on the Development and Optimization of Parallel Applications
for Intel Xeon Processors and Intel Xeon Phi Coprocessors” is an example-based course for the
optimization of parallel applications for platforms with Intel Xeon processors and Intel Xeon Phi
coprocessors.
This handbook serves as practical training covering understandable computing problems for
C and C++ programmers. The authors at Colfax have developed sample problems to illustrate
key challenges and offer their own guidelines to assist in optimization work. They provide
easy-to-follow instructions that allow the reader to understand solutions to the problems posed as well
as inviting the reader to experiment further. Colfax’s examples and guidelines complement those
found in our recent book on programming the Intel Xeon Phi Coprocessor by Jim Jeffers and
myself by adding another perspective to the teaching materials available from which to learn.
In the quest to learn, it takes multiple teaching methods to reach everyone. I applaud these
authors in their efforts to bring forth more examples to enable either self-directed or
classroom-oriented hands-on learning of the joys of parallel programming.

James R. Reinders
Co-author of “Intel® Xeon Phi™ Coprocessor High Performance Programming”
© 2013, Morgan Kaufmann Publishers
Intel Corporation
March 2013


Preface to the Second Edition


A lot has happened in Intel’s “parallel universe” since the publication of the first
edition of this book in March 2013. The family of Intel Xeon Phi coprocessors has grown
to three series: 3100, 5100 and 7100, offering a range of performance tiers and prices.
Active-cooling Intel Xeon Phi coprocessors were introduced, allowing workstation users
to take advantage of the Intel Many Integrated Core (MIC) architecture. Plans were
released for future Intel MIC architecture products, based on the Knights Landing chip,
and capable of acting as a stand-alone CPU. In the CPU domain, Intel Xeon processors
based on the Haswell architecture were released, supporting a new instruction set AVX2
and new functionality.
On the software tools side, the Intel Parallel Studio XE 2015 suite was improved to
accommodate the new parallel framework standards: OpenMP 4.0 and MPI 3.0. The
evolution of Intel VTune Amplifier XE has added many useful functions for automated
diagnostics of performance issues. Intel compilers produce more user-friendly optimiza-
tion reports than before, and have become even smarter about automatic vectorization
and other optimizations.
The work in the users’ domain did not stand still, either. With a large number of
case studies and research articles on applications for the Intel MIC architecture, it is
accurate to say that the developer ecosystem has been established. We are proud to say
that Colfax has made a considerable contribution to this progress with the first edition
of “Parallel Programming and Optimization with Intel Xeon Phi Coprocessors”. In the
years 2013 and 2014, over 1000 science and industry experts at tens of locations across
North America have been students of the Colfax Developer Training based on this book.
Their experience and feedback, along with the innovations in the Intel tools, have built
a solid case for the publication of the second edition of “Parallel Programming and
Optimization with Intel Xeon Phi Coprocessors”.
Among the numerous new features of the second edition, the ones that stand out are:

1. The details unveiled by Intel of the present and future MIC processors, including
Knights Landing;

2. Discussion of configuration and system administration of clusters with Intel Xeon
Phi coprocessors, including InfiniBand support, bridged network configuration
and storage setup;

3. Additional applications based on case studies of our research in 2013–2014
included in the text as references, as well as practical exercises;

4. Console listings, example codes and hyperlinks to online manuals accurate as of
Intel Parallel Studio XE 2015, Intel MPSS 3.4.1 and CentOS 7.0 Linux;

5. New programming models made available in OpenMP 4.0;

6. Deeper review of the Intel Math Kernel Library support for the MIC architecture;

7. More convenient page format and font size for on-screen reading, and

8. Numerous updates to the text improving the clarity and depth of the discussion.

We hope that you find this book to be a valuable resource on “all things Xeon Phi”,
and, as always, we value your feedback. The HPC research department of Colfax
International can be reached by email at [email protected], and the latest updates on
our work can be found at research.colfaxinternational.com.

Parallel Programming and Optimization with Intel Xeon Phi Coprocessors. Second Edition

Preface to the First Edition


Welcome to the Colfax Developer Training! You are holding in your hands or
browsing on your computer screen a comprehensive set of training materials for this
training program. This document will guide you to the mastery of parallel programming
with Intel® Xeon® family products: Intel® Xeon® processors and Intel® Xeon Phi™
coprocessors. The curriculum includes a detailed presentation of the programming
paradigm for the Intel Xeon product family, optimization guidelines, and hands-on exercises
on systems equipped with Intel Xeon Phi coprocessors, as well as instructions on using
Intel® software development tools and libraries included in Intel® Parallel Studio XE.
These training materials are targeted toward developers familiar with C/C++ program-
ming in Linux. Developers with little parallel programming experience will be able to
grasp the core concepts of this subject from the detailed commentary in Chapter 3. For
advanced developers familiar with multi-core and/or GPU programming, the training
offers materials specific to the Intel compilers and Intel Xeon family products, as well
as optimization advice pertinent to the Many Integrated Core (MIC) architecture.
We have written these materials relying on key elements for efficient learning: practice
and repetition. As a consequence, the reader will find a large number of code listings in
the main section of these materials. In the extended Appendix, we provided numerous
hands-on exercises that one can complete either under an instructor’s supervision, or
autonomously in a self-study training.
This document is different from a typical book on computer science, because we
intended it to be used as a lecture plan in an intensive learning course. Speaking in
programming terms, a typical book traverses material with a “depth-first algorithm”,
describing every detail of each method or concept before moving on to the next method.
In contrast, this document traverses the scope of material with a “breadth-first” algorithm.
First, we give an overview of multiple methods to address a certain issue. In the
subsequent chapter, we re-visit these methods, this time in greater detail. We may go
into even more depth down the line. In this way, we expect that students will have
enough time to absorb and comprehend the variety of programming and optimization
methods presented here. The course road map is outlined in the following list.

• Chapter 1 presents the Intel Xeon Phi architecture overview and the environment
provided by the Manycore Platform Software Stack (MPSS) and Intel Parallel Studio
XE on the Many Integrated Core (MIC) architecture. The purpose of Chapter 1 is
to outline what users may expect from Intel Xeon Phi coprocessors (technical
specifications, software stack, application domain).

• Chapter 2 allows the reader to experience the simplicity of Intel Xeon Phi usage
early on in the program. It describes the operating system running on the
coprocessor, the compilation of native applications, and the language extensions
for CPU-centric codes that utilize Intel Xeon Phi coprocessors: the offload and
virtual-shared memory programming models. In a nutshell, Chapter 2 demonstrates how
to write serial code that executes on Intel Xeon Phi coprocessors.

• Chapter 3 introduces Single Instruction Multiple Data (SIMD) parallelism and
automatic vectorization, thread parallelism with OpenMP and Intel Cilk Plus, and
distributed-memory parallelization with MPI. In brief, Chapter 3 shows how to
write parallel code (vectorization, OpenMP, Intel Cilk Plus, MPI).

• Chapter 4 re-iterates the material of Chapter 3, this time delving deeper into the
topics of parallel programming and providing example-based optimization advice,
including the usage of the Intel Math Kernel Library. This chapter is the core of
the training. The topics discussed in Chapter 4 include:
i) scalar optimizations;
ii) improving data structures for streaming, unit-stride, local memory access;
iii) guiding automatic vectorization with language constructs and compiler hints;
iv) reducing synchronization in task-parallel algorithms by the use of reduction;
v) avoiding false sharing;
vi) increasing arithmetic intensity and reducing cache misses by loop blocking
and recursion;
vii) exposing the full scope of available parallelism;
viii) controlling process and thread affinity in OpenMP and MPI;
ix) reducing communication through data persistence on coprocessor;
x) scheduling practices for load balancing across cores and MPI processes;
xi) optimized Intel Math Kernel Library function usage, and others.

If Chapter 3 demonstrated how to write parallel code for Intel Xeon Phi coproces-
sors, then Chapter 4 shows how to make this parallel code run fast.

• Chapter 6 summarizes the course and provides pointers to additional resources.

Throughout the training, we emphasize the concept of portable parallel code. Portable
parallelism can be achieved by designing codes in a way that exposes the data and task
parallelism of the underlying algorithm, and by using language extensions such as
OpenMP pragmas and Intel Cilk Plus. The resulting code can be run on processors as
well as on coprocessors, and can be ported with only recompilation to future generations
of multi- and many-core processors with SIMD capabilities. Even though the Colfax
Developer Training program touches on low-level programming using intrinsic functions,
it focuses on achieving high performance by writing highly parallel code and utilizing
the Intel compiler’s automatic vectorization functionality and parallel frameworks.
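As a minimal sketch of what such portable code can look like, the loop below exposes both thread and data parallelism with a single OpenMP 4.0 pragma. The function name scaled_sum and its arguments are illustrative assumptions and do not come from the training materials.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
   The combined OpenMP 4.0 construct distributes iterations across threads
   and asks the compiler to vectorize the iterations given to each thread.
   If OpenMP is not enabled at compile time, the pragma is ignored and the
   loop runs serially with the same result. */
void scaled_sum(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for simd
    for (long i = 0; i < (long)n; i++)
        y[i] = a * x[i] + y[i];
}
```

The same source file can be compiled for a processor or a coprocessor; only the compiler arguments change, which is the property the paragraph above calls portable parallelism.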
The handbook of the Colfax Developer Training is an essential component of a
comprehensive, hands-on course. While the handbook has value outside a training
environment as a reference guide, the full utility of the training is greatly enhanced by
students’ access to individual computing systems equipped with Intel Xeon processors,
Intel Xeon Phi coprocessors and Intel software development tools. Please check the Web
page of the Colfax Developer Training for additional information:
http://www.colfax-intl.com/xeonphi/
Welcome to the exciting world of parallel programming!


List of Abbreviations
ALU Arithmetic Logic Unit

AO Automatic Offload

AVX Advanced Vector Extensions (SIMD standard)

BLAS Basic Linear Algebra Subprograms

CAO Compiler Assisted Offload

CCL Coprocessor Communication Link

CFD Computational Fluid Dynamics

CLI Command Line Interface

CPI cycles per instruction

CPU Central Processing Unit, used interchangeably with the terms “processor” and
“host” to indicate the Intel Xeon processor, as opposed to the Intel Xeon Phi
coprocessor

CRI Core Ring Interconnect

DAPL Direct Access Programming Library

DFFT Discrete Fast Fourier Transform

DGEMM Double-precision General Matrix-Matrix Multiply

DMA Direct Memory Access

DSS Direct Sparse Solver

DTD Distributed Tag Directory

ECC Error Correction Code

FFT Fast Fourier Transform


FMA Fused Multiply-Add

FP Floating-point

FPGA Field Programmable Gate Array

GCC GNU Compiler Collection

GDDR Graphics Double Data Rate memory

GFLOP Gigaflop, 10^9 floating point operations.

GFLOP/s Performance metric. Unless stated otherwise, refers to theoretical peak
performance of the multiply and add operation(s), or to the performance of the
HPC Linpack Benchmark

GPGPU General-Purpose Graphics Processing Unit

GUI Graphical User Interface

HPC High Performance Computing

I/O Input/Output

IMCI Initial Many-Core Instructions

IP Internet Protocol

ISA Instruction Set Architecture

ITAC Intel Trace Analyzer and Collector

KNC Knights Corner

KNL Knights Landing

LAPACK Linear Algebra Package

LRU Least Recently Used, a cache replacement policy

MESI Modified/Exclusive/Shared/Invalid, a cache coherency protocol

MKL Math Kernel Library


MMB Maximum Memory Bandwidth

MMIO memory-mapped I/O

MMX Multimedia Extensions (SIMD standard)

MPI Message Passing Interface

MPSS Manycore Platform Software Stack

NFS Network File System protocol

NUMA Non-Uniform Memory Access

OEM Original Equipment Manufacturer

OFED OpenFabrics Enterprise Distribution

OpenCL Open Computing Language

OS operating system

PARDISO Parallel Direct Sparse Solver

PCIe Peripheral Component Interconnect Express

PFLOP Petaflop, 10^15 floating point operations. See also GFLOP

PMU Performance Monitoring Unit

PSM Performance Scaled Messaging

QPI Quick Path Interconnect

RAM Random Access Memory

RCI ISS Iterative Sparse Solvers based on Reverse Communication Interface

RCP Recommended Customer Price

RDMA Remote Direct Memory Access

RNG Random Number Generator


ScaLAPACK Scalable Linear Algebra Package

SIMD Single Instruction Multiple Data

SMP Symmetric Multiprocessor

SSE Streaming SIMD Extensions (SIMD standard)

SSH Secure Shell protocol

SVML Short Vector Math Library

TD Tag Directory

TDP thermal design power

TFLOP Teraflop, 10^12 floating point operations. See also GFLOP

TLB Translation Lookaside Buffer

TMI Tag Matching Interface

TPP Theoretical Peak Performance

TSX Transactional Synchronization Extensions

VML Vector Mathematical Library

VSL Vector Statistical Library


CHAPTER 1
Introduction
This chapter introduces the Intel manycore architecture and positions Intel
Xeon Phi coprocessors in the context of parallel programming.
Even though the focus of this book is on Intel Xeon Phi coprocessors, we
will also briefly discuss the Intel Xeon family CPUs. This is necessary to
put the performance characteristics of Intel Xeon Phi coprocessors in proper
perspective.
Our approach to comparing CPUs and the manycore architecture builds
upon the first question that the designer of a computing system may ask:
does it make more sense to spend the budget for setup costs and operational
expenses on all-CPU nodes, or purchase fewer nodes, but enhance them with
coprocessors? Naturally, technical specifications alone cannot be used to
answer this question. This question can be answered only by benchmarks
of specific applications in combination with power measurements, total cost
analysis, and additional factors such as development effort, available rack
space, administrative burden, etc.
This chapter will help to set expectations for the potential of the Intel
manycore architecture for the reader’s outstanding computing challenges.


1.1. Intel Xeon Phi Coprocessors

1.1.1. Technology Overview

Intel Xeon Phi coprocessors have been designed by Intel Corporation as
a supplement to the Intel Xeon processor family. The coprocessors feature
the Intel manycore architecture, which enables fast and energy-efficient
execution of some High Performance Computing (HPC) applications.
In most Intel communications, the term “manycore” refers to the
architecture of the Intel Xeon Phi product family, while “multi-core” architecture
refers to the Intel Xeon family processors.

Figure 1.1: Left: multi-core Intel Xeon processors (CPUs), Right: manycore Intel Xeon Phi
coprocessor. Relative sizes are not to scale.

The manycore architecture may yield more performance per watt of power
and per dollar of setup costs than traditional multi-core CPUs. However,
not every application can be accelerated by manycore coprocessors. Intel
Xeon Phi coprocessors derive their high performance from multiple cores,
dedicated vector arithmetic units with wide vector registers, and cached
onboard GDDR5. High energy efficiency is achieved through the use of
low clock speed x86 cores with lightweight design suitable for parallel
HPC applications. Therefore, only highly parallel applications supporting
vectorized arithmetic with well-behaved (or negligible) memory traffic will
thrive on the manycore architecture.


Figure 1.2: Examples of computing system solutions featuring the Intel Xeon Phi coprocessors.
Left: A Colfax Workstation CXP7450 with two Intel Xeon Phi coprocessors. Right: A Colfax
Server CXP9000 with eight Intel Xeon Phi coprocessors. Relative sizes not to scale.

First generation Intel Xeon Phi coprocessors based on the Knights Corner
(KNC) chip are end-point Peripheral Component Interconnect Express (PCIe)
devices. They can be installed on the PCIe bus and operated in coprocessor-
ready computing systems, including workstations (e.g., Figure 1.2, left) and
servers (e.g., Figure 1.2, right).
An Intel Xeon Phi coprocessor cannot operate without a CPU-based host
system, which is the reason for terming these products coprocessors. Because
they reside on the PCIe bus and have their own on-board RAM, coprocessors
do not share memory address space with the CPU. Consequently, the mere
presence of a coprocessor in a system does not automatically improve the
performance of applications running on the CPU. To utilize the MIC archi-
tecture, the application or the cluster execution manager must be aware of
the presence of a coprocessor.
The usage model of the second generation Intel MIC based on the Knights
Landing (KNL) chip will be different. The second generation chip will be
available as a standalone processor, as well as a PCIe-endpoint device. For
the standalone processor version, applications need not be coprocessor-aware
in order to be accelerated. However, a prerequisite for accelerated perfor-
mance is optimization of the application code for multi-core and manycore
architectures. See Section 1.4 for more information.


1.1.2. Conventional Programming, Portable Code


This section describes the value proposition of the Intel MIC architecture.
Established Programming Models
Because of the similarity of the manycore and multi-core architectures,
an Intel Xeon Phi coprocessor can execute applications compiled from the
same C/C++ or Fortran code as an Intel Xeon processor. Furthermore,
Intel Xeon processors and Intel Xeon Phi coprocessors support the same
parallel frameworks and require similar code optimization methods. This is
a significant advantage of the Intel manycore architecture over computing
accelerator technologies (GPGPUs and FPGAs).
The process of application porting to GPGPUs typically involves dis-
carding and re-writing from scratch the compute-intensive pieces of code.
This process is time consuming and prone to the introduction of new bugs,
because the application cannot be tested until porting is complete.
In contrast, it is usually possible to port a code designed for multi-core
systems to the MIC architecture. After that, the programmer can incrementally
adapt (optimize) the application to the coprocessor platform. Such easy
porting is very important for projects that require modernization of millions
of lines of legacy code in scientific and industrial applications.
It is fair to say that Intel Xeon Phi coprocessors are easy to program
because they use the same languages, frameworks and principles as general-
purpose Intel architecture CPUs, which are familiar to the overwhelming
majority of developers. At the same time, Intel Xeon Phi coprocessors are
only useful in the context of parallel programming, which is not the comfort
zone for the majority of CPU application developers. This book aims to
assist the developers in understanding the programming methods required
to leverage parallelism in both Intel Xeon processors and Intel Xeon Phi
coprocessors.
Common Optimization Requirements
It is incorrect to think that the ability to run legacy code “out of the
box” on Intel Xeon Phi coprocessors means immediate acceleration. On
the contrary, in many cases, the performance of applications just ported to
the MIC architecture is disappointing, and code optimization is required.
Optimization is often a significantly greater effort than initial porting.


At the same time, the optimization methods used in applications for Intel
Xeon Phi are the same methods that are used in applications for general-
purpose Intel architecture CPUs. Indeed, case studies show that a code
optimized for the MIC platform also runs significantly faster on a CPU
(for a synthetic example, see paper [1] illustrated in Figure 1.3; code for a
similar application is available among the Supplementary Code for Practical
Exercises as Lab 4.01 – see Section 6.2; for realistic examples, refer to [2]).

[Figure 1.3 plot: “N-body simulation performance with N=30000”. Vertical axis: steps per
second (higher is better). Bars compare two Intel(R) Xeon(R) E5-2680 processors with one
Intel(R) Xeon Phi(TM) B1QS-5110P coprocessor for three versions of the code: unoptimized
code; unit-stride access; accuracy control, -xhost.]

Figure 1.3: The same C language code used for a simple N-body simulation on the CPU and on
a coprocessor. See white paper [1] for more information.
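The “unit-stride access” step named in Figure 1.3 can be illustrated with a small sketch. The code below is not taken from paper [1]; the type and function names are hypothetical, and the calculation (a sum of squared distances from the origin) is chosen only to show the two data layouts side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structures layout: the x-coordinates of consecutive particles
   are 3 floats apart in memory, so a loop over particles reads x with
   a stride of 3 elements. */
typedef struct { float x, y, z; } ParticleAoS;

/* Structure-of-arrays layout: x[i] and x[i+1] are adjacent in memory,
   so a loop over i reads each array with unit stride. */
typedef struct { float *x, *y, *z; } ParticlesSoA;

/* Sum of squared distances from the origin, strided access. */
float sum_r2_aos(size_t n, const ParticleAoS *p)
{
    assert(p != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x * p[i].x + p[i].y * p[i].y + p[i].z * p[i].z;
    return sum;
}

/* The same calculation with unit-stride access to x, y and z. */
float sum_r2_soa(size_t n, const ParticlesSoA *p)
{
    assert(p != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p->x[i] * p->x[i] + p->y[i] * p->y[i] + p->z[i] * p->z[i];
    return sum;
}
```

Both functions compute the same quantity; only the memory access pattern differs. Whether and how well either loop is vectorized depends on the compiler and its arguments.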


Heterogeneous and Accelerated Computing
From the development and maintenance point of view, having a single code
for the main processor and for the coprocessor opens doors to heterogeneous
computing and public code distribution. A heterogeneous application may
utilize the CPU together with the MIC coprocessor, wasting no resources.
Public code with support for Intel Xeon Phi coprocessors has the advantage
that for users who do not own a coprocessor, the execution can seamlessly
fall back to the CPU.
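A minimal sketch of such a fallback is shown below. It substitutes the standard OpenMP 4.0 target construct for the Intel-specific offload extensions discussed later in this book, and the function name offload_sum is a hypothetical one introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum of the first n elements of data.
   With an OpenMP 4.0 compiler and an available target device, the region
   below may be offloaded; the map clauses describe the data transfers.
   If no device is present, or OpenMP is not enabled at compile time,
   the same loop runs on the host CPU. */
double offload_sum(int n, const double *data)
{
    assert(n > 0 && data != NULL);
    double sum = 0.0;
    #pragma omp target map(to: data[0:n]) map(tofrom: sum)
    {
        for (int i = 0; i < n; i++)
            sum += data[i];
    }
    return sum;
}
```

The caller does not need to know where the loop was executed, which is what allows one public code base to serve users with and without a coprocessor.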
If an application is developed from scratch, rather than ported from a
legacy C, C++ or Fortran code, then developers have additional options for
ensuring code portability. For example, the OpenCL parallel framework can
be used to design a single code for multiple platforms, including the Intel
MIC architecture. In practice, however, even though an OpenCL application
can run on a CPU as well as on a GPGPU or a MIC coprocessor, it has to be
tuned for each platform. At the same time, the similarity of the multi-core
CPU and the MIC architectures ensures that a high-level language code
optimized for the MIC architecture is also optimal for the CPU.
Portability and Future-Proofing
Portability is an important consideration for many developers. Ideally, a
high-level language code developed once should run, with minimal modifica-
tions, on other manufacturers’ processor architectures, as well as on older
and future computing platforms.
Intel Xeon Phi coprocessors are based on the basic architectural elements
common in Intel 64 and Itanium architectures, AMD x86 processors, Sun
SPARC, IBM Blue Gene, Power architecture, and other general purpose
processors: cores, threads, cached memory, vectors. Even though instruction
sets and quantitative aspects of technical specifications are not compatible
across these architectures, the approach to programming computers based
on these architectural elements is common.
Future Intel parallel architectures will evolve using the same architectural
elements (see Section 1.4). This ensures longevity of high-level language
codes developed for today’s Intel Xeon Phi coprocessors. See Section 1.4.3
for an extended discussion of this topic.


1.1.3. Heterogeneous Computing and Clustering


Programming models for Intel Xeon Phi coprocessors include native
execution and offload-based approaches. These approaches enable developers
to design a spectrum of hybrid computing models, ranging from multi-core-
hosted (i.e., only employing the CPU) to multi-core-centric (i.e., executing
on the host system with some operations performed on the coprocessor) to
symmetric (i.e., employing the host and the coprocessor on an equal basis)
and manycore-hosted (i.e., executing exclusively on a set of coprocessors).
The choice of work division between the host and the coprocessor is dic-
tated by the nature of the application. Highly parallel, vectorized workloads
(e.g., linear algebraic calculations) can be executed on the coprocessor as
well as on the host. However, serial segments of an application perform
significantly better on Intel Xeon processors, and so do applications with
stochastic memory access patterns. The overhead of data transport over the
PCIe bus should also be taken into consideration.
Figure 1.4 summarizes the development options for systems enabled with
Intel Xeon Phi coprocessors.

Figure 1.4: Intel architecture benefit: wide range of development options. Breadth, depth,
familiar models meet varied application needs. Diagram based on Intel materials.

Intel Xeon Phi coprocessors are Internet Protocol (IP)-addressable devices
running a Linux operating system (OS). This property enables straightforward
porting of code written for the Intel Xeon architecture to the MIC
architecture. This, combined with code portability, makes Intel Xeon Phi
coprocessors a compelling platform for heterogeneous clustering. In hetero-
geneous cluster applications, host processors and MIC coprocessors can be
used on an equal basis as individual compute nodes.


1.1.4. Intel Xeon Phi Product Family


Intel Xeon Phi coprocessors come in a range of models featuring different
thermal design power (TDP), different theoretical peak performance and
different memory capacities. Each model is identified by a 5-character code
as shown in Figure 1.5.

[Figure 1.5 diagram: the name “Intel® Xeon Phi™ coprocessor 7120P” annotated with its
fields. Brand (the family of products). Performance shelf: 7 - best performance,
5 - best performance/watt, 3 - best value. Generation: 1 = Knights Corner,
2 = Knights Landing. SKU digits. Product line suffix: A/P/X = active/passive/no cooling,
D = dense form-factor.]

Figure 1.5: Five-character code identifying the model of an Intel Xeon Phi coprocessor.

The first character in the code stands for the performance shelf: 3, 5 or
7. The second character is the product generation. As of the writing of this
book (Feb 2015), only generation 1 (KNC) is available. Therefore, available
models can be organized into 3 groups: 3100, 5100 and 7100 series.

3100 Series is designed as the price-optimal group. Models in this series
contain fewer active cores, less onboard memory, and feature a lower
memory bandwidth than in other series. This series is a good choice
for compute-bound workloads.

5100 Series is optimized for performance per watt. 5100 Series coproces-
sors feature lower TDP, contain more memory and cores than the 3100
series, and perform better in memory bandwidth-bound and memory
capacity-bound workloads.

7100 Series is the top performing group. It has the greatest core count,
memory size and bandwidth of all series. It also comes at a higher
price than other series, and greater TDP than the 5100 series.

The third and fourth characters in the code are the SKU digits. These
generally indicate the product stepping, and they increase as minor
silicon-level improvements are made.
Finally, the fifth character is a letter, which indicates the cooling solution
or special usage case of the model.

A stands for active cooling. These coprocessors come with a heat sink and
a built-in fan, and are suitable for usage in desktop workstations
(Figure 1.6, left). This cooling solution is not reliant on system fans,
and the built-in fan speed is controlled by an onboard sensor, which
allows these coprocessor cards to be quiet in the idle state.

P stands for passive cooling. These coprocessors come with a heat sink,
but have no fan (Figure 1.6, right). They cannot be used in workstations
because of imminent overheating, and are designed for servers.

X indicates that no cooling solution is provided, i.e., there is no heat sink on
the card. These coprocessors can be used only with custom cooling
solutions such as liquid cooling, because normal airflow from common
system fans is not sufficient for heat removal.

D is the dense form factor model. It does not have a heat sink, and is smaller
in size than the X option. These models are designed for specialized
solutions capable of supporting a large density of thermal dissipation.
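The naming scheme described above can be summarized in a short decoding routine. This is a sketch under the assumptions stated in the text (shelf 3, 5 or 7; generation 1 or 2; two SKU digits; suffix A, P, X or D); the PhiModel type and the parse_phi_model function are hypothetical names, not part of any Intel tool.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical record for the fields of the five-character model code. */
typedef struct {
    int  shelf;       /* performance shelf: 3, 5 or 7 */
    int  generation;  /* 1 = Knights Corner, 2 = Knights Landing */
    int  sku;         /* the two SKU digits as a number, e.g. 20 */
    char suffix;      /* A, P, X or D */
} PhiModel;

/* Parses a code such as "7120P". Returns 0 on success and -1 if the
   string does not follow the scheme described in the text. */
int parse_phi_model(const char *code, PhiModel *out)
{
    assert(out != NULL);
    if (code == NULL || strlen(code) != 5)
        return -1;
    for (int i = 0; i < 4; i++)
        if (!isdigit((unsigned char)code[i]))
            return -1;

    int shelf = code[0] - '0';
    int generation = code[1] - '0';
    if (shelf != 3 && shelf != 5 && shelf != 7)
        return -1;
    if (generation != 1 && generation != 2)
        return -1;
    if (strchr("APXD", code[4]) == NULL)
        return -1;

    out->shelf = shelf;
    out->generation = generation;
    out->sku = (code[2] - '0') * 10 + (code[3] - '0');
    out->suffix = code[4];
    return 0;
}
```

For the code 7120P, the routine reports shelf 7, generation 1 (Knights Corner), SKU digits 20 and the passive cooling suffix P.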

Figure 1.6: Active and passive cooling solutions of Intel Xeon Phi coprocessors.


Model   TDP   Cores   Clock         Turbo   RAM     MMB      DP TPP      RCP
        (W)           (GHz)         Boost   (GiB)   (GB/s)   (GFLOP/s)
3120P   300   57      1.100         no      6       240      1003.2      $1695
3120A   300   57      1.100         no      6       240      1003.2      $1695–1960
5120D   245   60      1.053         no      8       352      1010.9      $2759
5110P   225   60      1.053         no      8       320      1010.9      $2437–2649
7120X   300   61      1.238–1.333   1.0     16      352      1208.3      $4129
7120P   300   61      1.238–1.333   1.0     16      352      1208.3      $4129
7120D   270   61      1.238–1.333   1.0     16      352      1208.3      $4235
7120A   300   61      1.238–1.333   1.0     16      352      1208.3      $4129

Table 1.1: Models of Intel Xeon Phi coprocessors available as of May 2014. Columns contain:
model name, thermal design power (TDP) in Watts, number of physical cores, their clock
speed, Intel Turbo Boost technology support, onboard memory size in GiB, maximum memory
bandwidth (MMB) in GB/s, double precision (DP) theoretical peak performance (TPP) in
GFLOP/s, and RCP. RCP is price guidance for bulk purchases by direct Intel customers, subject
to change without notice, not a formal pricing offer from Intel or Colfax International.

Table 1.1 summarizes the currently available models of Intel Xeon Phi
coprocessors and their specifications. In this table, all quantities are obtained
from the Intel Xeon Phi Product Family page, except for the Theoretical
Peak Performance (TPP), which is estimated according to Equation (1.1):

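$$
\frac{\mathrm{TPP}}{\mathrm{GFLOP/s}} = \frac{\text{Clock Speed}}{\mathrm{GHz}} \times \mathrm{FMA} \times \frac{\text{SIMD Register Size}}{\texttt{sizeof(TYPE)}} \times \mathrm{Cores}. \qquad (1.1)
$$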
For Intel Xeon Phi coprocessors, FMA = 2 (the fused multiply-add operation
is performed in one cycle), SIMD register size is 512 bits (64 bytes), and the
size of double precision numbers is 64 bits (8 bytes). See Sections 1.3 and
4.5 for additional discussion.
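Equation (1.1) can be evaluated with a few lines of C. The sketch below is only an illustration of the arithmetic; the function name tpp_gflops and its parameter names are assumptions introduced here.

```c
#include <assert.h>

/* Equation (1.1): theoretical peak performance in GFLOP/s.
   clock_ghz     - clock speed in GHz
   fma_factor    - 2 if a fused multiply-add completes in one cycle
   simd_bytes    - SIMD register size in bytes
   element_bytes - size of one number of the chosen type in bytes
   cores         - number of physical cores */
double tpp_gflops(double clock_ghz, int fma_factor,
                  int simd_bytes, int element_bytes, int cores)
{
    assert(element_bytes > 0);
    return clock_ghz * fma_factor * (simd_bytes / element_bytes) * cores;
}
```

For the 3120P row of Table 1.1, tpp_gflops(1.100, 2, 64, 8, 57) gives 1003.2. For the 5110P and 7120P rows the results are 1010.88 and 1208.288, which round to the tabulated 1010.9 and 1208.3 GFLOP/s.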


1.1.5. Intel Xeon Processor E3, E5 and E7 Family


Only the Intel Xeon family server processors are considered in this book
in conjunction with Intel Xeon Phi coprocessors. Desktop and mobile device
product lines (Intel® Core™, Intel® Atom™, Intel® Pentium® and Intel®
Celeron®) are not discussed, because
a) generally, there is no support for Intel Xeon Phi coprocessors in boards,
chipsets and BIOS software compatible with consumer CPUs,
b) the set of features, TDP, and cost of consumer processors are very different
from those of server CPUs, which precludes a meaningful comparison.

The Intel Xeon family adheres to the numbering scheme shown in Figure 1.7.

[Figure 1.7 diagram: the name “Intel® Xeon® processor E5-2670 v2” annotated with its
fields. Brand (the family of products). Product line: E7 - best performance,
E5 - best performance/watt, E3 - best value. Wayness (1, 2, 4 or 8). Socket type
(2, 4, 6 or 8). Processor SKU (10, 20, ...). Version: v1 = Sandy Bridge,
v2 = Ivy Bridge, v3 = Haswell.]

Figure 1.7: Codes identifying the model of an Intel Xeon CPU.

The product line (E3, E5 and E7) for Intel Xeon CPUs is similar to the
performance shelf for Intel Xeon Phi coprocessors: E3 is the lowest-cost
option, E5 is optimized for performance per watt, and E7 is the top
performing line.
Wayness is the maximum number of CPU sockets per node. The two digits
of the processor SKU place the CPU within its family. The differences
between SKUs are mostly quantitative. The SKU determines the
number of cores, clock speed, maximum memory bandwidth, and cache size.
After the SKU, in some CPU models, an additional suffix “L” is present,
indicating a low power consumption model.
Finally, the version of the CPU (v1, v2 or v3) determines the type of
processor microarchitecture used in the chip: Sandy Bridge (v1), Ivy Bridge
(v2) or Haswell (v3). The difference between versions depends on whether

c Colfax International, 2013–2015


12 CHAPTER 1. INTRODUCTION

the version update was a “tick” or a “tock”. For instance, the Sandy Bridge
to Ivy Bridge development was a “tick”, i.e., a newer, smaller transistor
technology was used in v2. As a result, v2 CPUs may have more cores,
greater performance and lower power consumption than v1; however, the
instruction set is unchanged. In contrast, the Ivy Bridge to Haswell update
was a “tock”, i.e., the same transistor technology as in Ivy Bridge was used
to produce an architecturally improved chip. As a result, v3 CPUs support
additional instruction sets (in this case, AVX2) and features (e.g., TSX), and
operate with a different chipset.
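
The numbering scheme of Figure 1.7 can be illustrated with a short Python sketch that splits a model number into its fields. The function and class names are hypothetical and serve only this illustration. The sketch also assumes that a model number without a version suffix denotes a v1 (Sandy Bridge) part, which is consistent with Table 1.2 but is not stated explicitly in the text.

```python
import re
from dataclasses import dataclass

# Version suffix -> microarchitecture, as listed in Figure 1.7.
MICROARCHITECTURES = {1: "Sandy Bridge", 2: "Ivy Bridge", 3: "Haswell"}

# Product line, wayness digit, socket-type digit, two SKU digits,
# optional "L" (low power) suffix, optional version suffix such as " v2".
_MODEL_RE = re.compile(r"^(E[357])-(\d)(\d)(\d\d)(L?)(?:\s+v(\d))?$")


@dataclass(frozen=True)
class XeonModel:
    product_line: str       # "E3", "E5" or "E7"
    wayness: int            # maximum number of CPU sockets per node
    socket_type: int
    sku: int                # two-digit SKU within the family
    low_power: bool         # True if the "L" suffix is present
    version: int
    microarchitecture: str


def parse_xeon_model(name: str) -> XeonModel:
    """Split a model number such as 'E5-2670 v2' into the fields of Figure 1.7."""
    match = _MODEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized model number: {name!r}")
    line, way, socket, sku, suffix, ver = match.groups()
    # Assumption: no version suffix means a v1 part.
    version = int(ver) if ver else 1
    return XeonModel(
        product_line=line,
        wayness=int(way),
        socket_type=int(socket),
        sku=int(sku),
        low_power=(suffix == "L"),
        version=version,
        microarchitecture=MICROARCHITECTURES.get(version, "unknown"),
    )


if __name__ == "__main__":
    print(parse_xeon_model("E5-2670 v2"))
```

For the example in Figure 1.7, the parser reports product line E5, wayness 2, socket type 6, SKU 70 and version 2 (Ivy Bridge). Versions beyond those listed in the figure are reported as "unknown" rather than guessed.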
Model         TDP   Cores   Clock   Cache   MMB      DP TPP      RCP
              (W)           (GHz)   (MiB)   (GB/s)   (GFLOP/s)
E5-2603        80     4     1.8     10      34.1      57.6       $198
E5-2690       135     8     2.9     20      51.2     185.6      $2057
E5-2603 v2     80     4     1.8     10      42.6      57.6       $202
E5-2697 v2    130    12     2.7     30      59.7     259.2      $2614
E5-2603 v3     85     6     1.6     15      68.0      76.8       $217
E5-2697 v3    145    14     2.6     35      51.0     291.2      $2706

Table 1.2: Some of the models of Intel Xeon processors available as of April 2015. Columns
as in Table 1.1. RCP is price guidance for bulk purchases by direct Intel customers, subject to
change without notice, not a formal pricing offer from Intel or Colfax International. Values are
per socket; double all values for a dual-socket CPU.

Of the multitude of Intel Xeon SKUs, the most important for the discus-
sion in this book are two-way multi-core CPUs. This is because their TDP
and cost are comparable to those of a single Intel Xeon Phi coprocessor (see
also Section 4.1.2).
Table 1.2 lists key technical specifications of a few selected two-way
models of Intel Xeon processors. Note that the quantities in Table 1.2 are
reported per socket, so for a two-way machine, they must be multiplied
by 2. DP TPP is estimated similarly to Equation (1.1), with SIMD Register
Size=256 bits, and an additional factor of ×2 to account for two ALUs
in Sandy Bridge and Ivy Bridge architectures, or for FMA in the Haswell
architecture (see Section 4.5).
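
As a worked illustration of this estimate, the sketch below recomputes the DP TPP column of Table 1.2. Equation (1.1) is not reproduced here, so the sketch assumes it has the form cores × clock × (SIMD register size / 64 bits); the function and parameter names are illustrative only.

```python
def dp_tpp_gflops(cores, clock_ghz, simd_bits=256, extra_factor=2):
    """Estimate double precision theoretical peak performance in GFLOP/s.

    Assumed form: cores x clock x (SIMD register size / 64 bits) x extra factor.
    The extra factor of 2 stands for two ALUs (Sandy Bridge, Ivy Bridge)
    or for FMA (Haswell), as described in the text.
    """
    doubles_per_register = simd_bits // 64  # 64-bit doubles per SIMD register
    return cores * clock_ghz * doubles_per_register * extra_factor


if __name__ == "__main__":
    # (model, cores, clock in GHz) taken from Table 1.2
    for model, cores, clock in [
        ("E5-2603", 4, 1.8),
        ("E5-2690", 8, 2.9),
        ("E5-2697 v2", 12, 2.7),
        ("E5-2697 v3", 14, 2.6),
    ]:
        print(f"{model:12s} {dp_tpp_gflops(cores, clock):7.1f} GFLOP/s per socket")
```

Under these assumptions the sketch reproduces the tabulated per-socket values, for example 4 × 1.8 × 4 × 2 = 57.6 GFLOP/s for the E5-2603. For a two-way machine the result is doubled.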
For complete information on the technical specifications of other Intel
processors, refer to http://ark.intel.com/.

Parallel Programming and Optimization with Intel Xeon Phi Coprocessors. Second Edition

1.2. MIC Architecture: Developer’s Perspective


Programming applications for Intel Xeon Phi coprocessors is not signifi-
cantly different from programming for Intel Xeon processors. Indeed, both
devices feature the x86 architecture, support for C, C++ and Fortran, and
common parallelization libraries. Therefore, only familiarity with multi-
core processor programming is required. However, in order to optimize
applications, it is helpful to know some of the architectural properties of the
coprocessor. Relevant properties are described in this section.

1.2.1. Knights Corner Die Organization


The KNC die is manufactured using the 22 nm process technology with
3-D Trigate transistors. This technology makes it possible to fit 62 cores
and up to 16 GiB of cached GDDR5 memory on a single die. In most production
coprocessor models, 57 to 61 cores are active, and 6 to 16 GiB of
RAM is available. The cores and GDDR5 memory controllers are connected
via a bi-directional Core Ring Interconnect (CRI) (see Figure 1.8).

[Diagram: cores (CORE), each with an L2 cache (L2) and a tag directory (TD),
are placed around the Core Ring Interconnect (CRI), which comprises the DATA,
ADDRESS and COHERENCE rings. The TDs together form the distributed tag
directory (DTD). GBOX units (memory controllers) connect the GDDR5 memory, and
the SBOX contains the PCIe v2.0 controller and DMA engines.]

Figure 1.8: Knights Corner die organization. A bi-directional ring interconnects cores, tag
directories, onboard memory controllers and PCIe/DMA engines.

c Colfax International, 2013–2015



The CRI consists of three bi-directional rings:


1. the data ring, as the name suggests, carries application data between cores
and memory controllers;
2. the address ring carries commands from cores to other devices for memory
fetches, and
3. the acknowledgement ring is used for cache coherency traffic.

In addition to cores, the CRI contains devices that allow the chip to operate
as a symmetric multiprocessor:
i) Tag Directories (TDs): multiple TD devices maintain information
about cache lines in the L2 caches and about their states. Together,
all TDs form a Distributed Tag Directory (DTD), responsible for
maintaining global cache coherency.
ii) 6 to 8 GBOX units, which are memory controllers for onboard GDDR5
RAM. Each controller has two 32-bit channels delivering up to 5.5 GT/s.
The RAM has the Error Correction Code (ECC) capability.
iii) An SBOX (system box) unit, providing PCI Express v2.0 logic with
eight Direct Memory Access (DMA) channels for data transfer from
system to GDDR5 memory.
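
The figures quoted for the GBOX units imply an upper bound on memory bandwidth, which the following sketch works out. It is only an estimate: it assumes that every channel transfers at the maximum rate of 5.5 GT/s, and the function name is illustrative. Specific coprocessor models may operate at lower transfer rates, and achievable bandwidth is lower than the theoretical peak.

```python
def peak_gddr5_bandwidth_gbs(controllers, channels_per_controller=2,
                             channel_bits=32, transfers_gt_s=5.5):
    """Theoretical peak GDDR5 bandwidth in GB/s.

    Each transfer on a channel moves channel_bits / 8 bytes, so the peak is
    controllers x channels x bytes per transfer x transfers per second.
    """
    bytes_per_transfer = channel_bits / 8
    return controllers * channels_per_controller * bytes_per_transfer * transfers_gt_s


if __name__ == "__main__":
    for gbox_units in (6, 8):
        print(gbox_units, "GBOX units:", peak_gddr5_bandwidth_gbs(gbox_units), "GB/s")
```

With 8 controllers the bound is 8 × 2 × 4 B × 5.5 GT/s = 352 GB/s; with 6 controllers at the same rate it is 264 GB/s.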

From the programmer’s perspective, this architecture is less uniform
than the symmetric architecture of an Intel Xeon CPU. Indeed, the latency
and bandwidth of communication between two cores depend on the distance
between them on the CRI. Therefore, applications for the MIC architecture
must, whenever possible, maintain good data locality and avoid
synchronization. This will be discussed in greater detail in Chapter 4.
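
To make the notion of distance on a bi-directional ring concrete, the sketch below counts the hops between two ring stops. This is a simplified model and not a description of the actual hardware: it assumes that the stops are numbered consecutively around the ring and that traffic takes the shorter direction. The real CRI also carries tag directories, memory controllers and the SBOX, and the mapping of core numbers to ring positions is not given in the text.

```python
def ring_hops(src, dst, stops):
    """Hops between two stops on a bi-directional ring.

    Hypothetical model: stops are numbered 0 .. stops-1 in ring order,
    and a message travels in whichever direction is shorter.
    """
    if stops <= 0:
        raise ValueError("the ring must have at least one stop")
    if not (0 <= src < stops and 0 <= dst < stops):
        raise ValueError("stop index out of range")
    clockwise = (dst - src) % stops
    return min(clockwise, stops - clockwise)


if __name__ == "__main__":
    stops = 60  # illustrative number of ring stops
    for a, b in [(0, 1), (0, 30), (0, 59)]:
        print(f"stops {a} and {b}: {ring_hops(a, b, stops)} hop(s)")
```

In this model, neighbouring stops are one hop apart, while stops on opposite sides of a 60-stop ring are 30 hops apart, which is why keeping communicating threads and their data close together can matter.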

finance, and social and economic institutions. The literary value of
these records is not, however, to be inconsiderately judged from
their bulk. Times and standards in American historiography have
changed. Among the multitude of authors one must not look for
many names which may be written down with those of Prescott,
Motley, and Parkman. Not that the modern period is wanting in good
work or able writers. These are to be found in abundance. But most
of the work belongs to science and not to letters; and besides,
eminence is not fostered by the catholic distribution of talent and
training. Jameson picks up Amiel’s blunt opinion that “the era of
mediocrity in all things is commencing” and applies it to American
historians. At the same time, this wise critic inclines to the belief that
the vast improvement in technical process and workmanship realised
within the present generation is the natural means to the
development of a more substantial and more profound school of
historians than the West has thus far created. The term “mediocrity”
does not, indeed, do full justice to the period and the authors in
question, and we must seek other grounds of excuse for the brevity
of our review of them. These grounds are found, first, in the indirect
importance to literature of the great mass of recent work, and,
secondly, in the impossibility of setting the achievements of
contemporary workers in just perspective.
The writers, great and little, of the periods already surveyed
were, in large measure, self-trained. Until the last two or three
decades, colleges and universities offered little incentive to
methodical work upon historical subjects. Even Harvard, from whose
doors went one after another the men who were to make the New
England School famous, taught history only incidentally. Now, an
academic school has arisen. Young men and women are trained in
undergraduate and graduate studies by teachers who are
themselves historical writers and investigators. Students are taught
the discriminating use of historical instruments, and sound methods
of reconstruction and interpretation. The change has been wrought
under the unequal pressure of external influence, emphasis laid
upon scientific method, a quickened consciousness of the
importance and dignity of American history, and, finally, the example
of those graceful and inspiring writers who gave to Western
historiography an honourable place in the world’s literature. The
academic school owes its existence to no single founder. It is, by its
nature, a school of coöperative endeavour,—coöperation, first,
between teacher and pupil, and coöperation, later, in the conjoint
and organised labour of productive hands and brains. Among its
early advocates and promoters were Charles Kendall Adams,
university professor and president, teacher and historian, who
adapted the German seminary method to the American university;
Henry Adams, professor at Harvard University and author of a
brilliant history in nine volumes (1889–91) of the country under
Jefferson and Madison (1801–17); Justin Winsor, librarian,
bibliographer, and editor of the useful and scholarly “Narrative and
Critical History of America” (1884–89), and Herbert Baxter Adams, of
Johns Hopkins, historian and instructor of historical students. The
coöperative labours of the period have borne abundant fruit. Besides
Winsor’s volumes should be mentioned “The American Nation: a
History from Original Sources by Associated Scholars,” a gigantic
work in twenty-seven volumes just finished (1904–8) under the
editorship of Albert Bushnell Hart. The authorship is divided among a
number of competent historical writers. The collection lays claim to
being “the first comprehensive history of the United States, now
completed, which covers the whole period” from the discovery of
America to the present. Similar undertakings are, however, in
progress, and a number of coöperative works of smaller scope are
already in print. Other notable histories covering comparatively long
periods of time are Edward Channing’s “A History of the United
States,” to be completed in eight volumes; a series of nine volumes
relating to preconstitutional times written by John Fiske, after the
manner of Parkman, and including “The Critical Period of American
History” (1888), “The Beginnings of New England” (1889), “The
American Revolution” (1891), “The Discovery of America” (1892),
etc.; James Schouler’s “History of the United States under the
Constitution” (1880–99); “A Popular History of the United States”
(1876–81), by William Cullen Bryant and Sydney H. Gay; “A History
of the People of the United States from the Revolution to the Civil
War” (6 of the 7 volumes published, 1883–1906), by John B.
McMaster; “The Constitutional and Political History of the United
States,” (1877–92), by Hermann E. von Holst, and “A History of the
American People” (1902), by President Woodrow Wilson of Princeton
University. Channing’s attempt to cover, by the labours of a single
competent scholar, the entire history of the country is comparable to
that of George Bancroft. John Fiske wrote readable and popular
narratives of historical events. He did much, both by books and
lectures, to arouse general interest in matters of American life past
and present. McMaster’s substantial and illuminating history is social
rather than political. He seeks to portray the whole life of the people.
Von Holst’s aim was, on the other hand, political. The author was a
German-American. He held, among academic posts, professorships
at Freiburg and the University of Chicago. His critical review, often
disparaging to democratic institutions, may be taken as a
counterblast to the ebullient patriotism of earlier, native writers. As
the work of a foreign observer of American affairs, it suggests the
reflections of de Tocqueville, of James Bryce, and of Goldwin Smith.
President Wilson’s five volumes contain a wise and judicial
commentary, in the form of a long and attractive essay, on the main
course of events since the days of discovery. For the multitude of
American historical writers who have treated single epochs, space
permits mention of only one or two names. James Ford Rhodes’
“History of the United States from the Compromise of 1850” (7
volumes, 1902–6), the work of “nineteen years’ almost exclusive
devotion,” is commonly regarded as the most thorough and best
balanced study of the Civil War, its causes and its consequences.
Henry Adams has, in his “History of the United States,” etc.,
investigated with competence and penetration the administrations of
Jefferson and Madison.
This meagre list of the more important productions of the
academic school clearly reveals the attraction of the American theme
for the present American historian. Capable and impressive studies
of foreign subjects there have been, it is true;—David Jayne Hill’s
“History of Diplomacy in the International Development of Europe”
and Henry C. Lea’s work on the medieval church are conspicuous
instances;—but the great mass of research and writing has been
gathered at home. Governmental affairs and political events loom
large. Less interest has been taken in the subtler phases of national
character and individual motive; although Fiske and McMaster and
Woodrow Wilson and certain of the best biographers (whose
important service to literature deserves separate consideration)
represent a current tendency toward reflective and philosophical
writing of a literary quality, which augurs well for the future of
American historiography.

II. THE NOVELISTS


The Beginnings.—American fiction was one of the latest types
of native literature to appear. The hard conditions of life imposed on
the colonists by the necessity of clearing the forests and keeping the
Indians in check were evidently unfavourable to sustained efforts in
imaginative writing. And there were other reasons for the late
growth of the novel. Except as they had a religious turn or an
evident moral, stories were likely to be looked upon by the Puritans
as a species of useless frivolity, which could have no part in the
saving of souls. Again, in the struggle with the mother country the
robust and scholarly intellects of America had other matters to think
of besides the elements of pure literature. The rights of man, the
basis of resistance to tyranny, the principles of statecraft, the
elements of democracy, were among the interests that absorbed the
Washingtons, the Otises, and the Hamiltons of the latter part of the
eighteenth century. But perhaps the most important reason for the
tardy appearance of American fiction was the lack of tradition and
legend. Of this Hawthorne complained as late as 1859, in the
preface to “The Marble Faun”:
No author, without a trial, can conceive of the difficulty of
writing a romance about a country where there is no shadow,
no antiquity, no mystery, no picturesque and gloomy wrong,
nor anything but a commonplace prosperity, in broad and
simple daylight, as is happily the case with my dear native
land. It will be very long, I trust, before romance-writers may
find congenial and easily handled themes, either in the annals
of our stalwart republic, or in any characteristic and probable
events of our individual lives. Romance and poetry, ivy,
lichens, and wall-flowers need ruin to make them grow.

Thus it was that for a long time Defoe and Fielding, Smollett and
Sterne found no imitators in America. The American novel-reader, for
the most part, was content with British provender, and satisfied his
appetite for the marvellous with Walpole’s “Castle of Otranto,” Lewis’
“Monk,” and Mrs. Radcliffe’s “Romance of the Forest” and “The
Mysteries of Udolpho.” Toward the end of the eighteenth century
several writers essayed the novel, but not with lasting success. In
“The Foresters” (published serially in The Columbian Magazine, and
in book form in 1792), Jeremy Belknap (1744–98) produced an
ingenious though trivial allegorical tale of the colonisation of America
and the rebellion of the colonies. In this, Peter Bullfrog stood for
New York, Ethan Greenwood for Vermont, Walter Pipeweed for
Virginia, Charles Indigo for South Carolina, and so on. Ann Eliza
Bleecker (1752–83) was the author of “The History of Maria Kittle,”
which in the form of a letter sets forth some harrowing experiences
among the savages during the French and Indian War; and of “The
Story of Henry and Anne,” a tale, “founded on fact,” of the
misfortunes of some German peasants who finally settled in
America; both of these were published posthumously in her “Works”
in 1793. Mrs. Susanna Haswell Rowson’s “Charlotte Temple” (1790),
a story of love, betrayal, and desertion, despite its absurdly stilted
phrases and its long-drawn melancholy, has ever been popular with
a certain class of readers; the editor of the latest edition (1905), Mr.
Francis W. Halsey, has examined 104 editions, and his list is
incomplete. An avowed antidote to “Charlotte Temple,” Mrs. Tabitha
G. Tenney’s satirical “Female Quixotism” (1808), suggests to
Professor Trent “an expurgated Smollett”; it is now unknown. Mrs.
Hannah W. Foster, the wife of a clergyman in Massachusetts, wrote
“The Coquette, or The History of Eliza Wharton, a Novel Founded on
Fact” (1797), a story of desertion, showing the marked influence of
Richardson. In the same year, appeared “The Algerine Captive,” by
Royall Tyler, who was one of the first to turn to American life as a
fruitful subject for fiction. His story is a broadly humorous picaresque
tale, of the Smollett type, which introduces rather too many
wearisome details of customs in Algiers; a fault for which his
generally spirited style and his powerful description of the horrors of
a slave-ship partially atone.
Hugh Henry Brackenridge (1748–1816), the classmate at
Princeton of James Madison and Philip Freneau, wrote “Modern
Chivalry, or The Adventures of Captain John Farrago and Teague
O’Regan, His Servant” (Philadelphia and Pittsburgh, published in four
parts, 1792–7), a modern “Don Quixote” narrating his experiences in
the Whisky Insurrection of 1794. Though widely read in its day,
especially by artisans and farmers, its literary worth was not
sufficient to preserve it. “The Gamesters,” published in 1805 by Mrs.
Catharine Warren, was likewise popular in its day; it attempted “to
blend instruction with amusement.”

Charles Brockden Brown.—The history of the novel in America,
therefore, properly begins with Charles Brockden Brown (1771–
1810), who has been called “the first professional man of letters and
important creative writer of the English-speaking portion of the New
World.” He was born in Philadelphia of a good Quaker family; just
forty years earlier, his uncle, Charles Brockden, had drawn up the
constitution of the old Philadelphia Library Company. From early
childhood, books were familiar to the youthful Brown, who became
an omnivorous reader, and at Robert Proud’s school undermined his
health by excessive devotion to reading and study, so that he was
always an invalid. He took up the study of law, but soon abandoned
it, despite the protest of his family, for the career of “book-making.”
After some writing of verse and of essays, he published in 1798 a
successful novel, “Wieland, or The Transformation,” and at once
followed this with five others, “Ormond, or The Secret Witness,”
(1799), “Arthur Mervyn, or Memoirs of the Year 1793” (1799–1800),
in which he gave an account of the ravages of the yellow fever in
Philadelphia, “Edgar Huntly, or The Adventures of a Sleep-Walker,”
“Clara Howard” (1801), and “Jane Talbot” (published in England in
1804). From 1798 till 1801, Brown lived amid congenial surroundings
in New York; in the former year he nearly died of yellow fever, to
which his friend Dr. Elihu H. Smith succumbed. Returning to
Philadelphia in 1801, he spent the remainder of his life there;
marrying happily in 1804, editing The Literary Magazine, and writing
political pamphlets and works on geography and Roman history, until
consumption brought his busy and useful life to a premature end.
Brown’s novels mostly belong with the “tales of terror” so
popular in his day. A radical thinker and analyst, he rejects
supernatural agencies in his explanation of events, and relies wholly
on natural causes; but this does not diminish the number of marvels
in his tales. The plots of one or two of his stories will give an idea of
the character of all. The scene of “Wieland” is laid on the banks of
the Schuylkill, in Pennsylvania. The Wielands are a cultivated
German family. Wieland’s father has died mysteriously by what is
explained as self- or spontaneous combustion, and the son has
inherited a melancholy and superstitious mind, which develops into
fanaticism. The family hear strange voices giving commands or
warnings or telling of events beyond the reach of human knowledge.
A mysterious man, Carwin, appears, with such powers of pleasing
that he becomes very intimate with the family. At length Wieland, at
the command of what he takes to be a heavenly voice, sacrifices to
God his wife and children. Confined in a maniac’s dungeon, he bears
his fate with a sense of moral exaltation. Having escaped, he
attempts to offer up also his sister, the narrator of the story, when
he learns that he has been deceived by the ventriloquism of Carwin,
whom malice has thus led to trick the family. In a frenzy, Wieland
kills himself; Carwin disappears; and the story ends with the
marriage of the sister and Pleyel, a brother of Wieland’s late wife
and now a widower. Less powerful than “Wieland,” but still superior
to Brown’s other works, is “Ormond.” An artist, Stephen Dudley,
engaging in pharmacy to support his family, is brought to beggary
through the villainy of his partner. His daughter Constantia bears up
bravely through severe trials. Just when life appears brighter,
Ormond comes upon the scene, a mysteriously powerful man, much
like Falkland in Godwin’s “Caleb Williams,” of great wealth, strong
mind, and base morals; he deserts Helena Cleves, who commits
suicide, and pursues Constantia. Stephen Dudley is murdered by an
unknown hand. Having a legacy from Helena, Constantia is about to
sail for Europe with her friend (who narrates the story) when
Ormond, finding her invincible, assaults her in a lonely house and
meets death by her hand, after he has himself slain Craig, now
revealed as the assassin of Dudley at Ormond’s instigation.
Constantia afterward lives quietly with her friend in Europe. Brown’s
plots are usually disfigured by irrelevant incidents and superfluous
characters; he frequently changed his plans and even his heroines,
and, writing with great rapidity, often with a greedy printer at his
elbow, he utterly failed to weld together the elements of his stories
and often to give them proper motivation. His characters are drawn
in bold and clear outlines, but are frequently uninteresting—being
too sentimental or inconsistent, or given to long and prosy
soliloquies. It cannot be affirmed that Brown understood human
nature well. Of style he had none; his pages are innocent of epigram
or humorous turn; he employs very little dialogue and makes but
scanty and awkward use of dialect. Yet in certain passages, in
describing great crises, he exhibits considerable vividness and power.
Brown’s chief merit consists in the sense of reality with which he
contrives to invest his scenes of gloom and terror.

The power possessed by this rare genius, says Mr. James
H. Morse, of throwing gloomy characteristics into his theme,
was equalled by no other American writer. In the matter of
morbid analysis, Poe, in comparison with Brown, was
superficial, Hawthorne was cheerful, and the modern school
of French writers are feeble. With Poe, we can see that the
gloom came by an effort of a spurred imagination; with
Hawthorne, that it was the work of an artistic sense; but with
Brown, it seems to have been constitutional—the gift at once
of temperament and circumstances.

Brown was an admirer of William Godwin and obviously imitated not
only his method of developing characters but also his style. It may
be added that Brown in turn found many readers in England, where
several of his novels were republished and where, as we have seen,
“Jane Talbot” was first published. Professor Dowden quotes Peacock
as saying that of all the works with which Shelley was familiar, those
which took the deepest root in his mind were Brown’s four novels,
Schiller’s “Robbers,” and Goethe’s “Faust.” Brown’s influence upon
subsequent American writers, moreover, was not inconsiderable, and
his place in our literature, if not high, is at least honourable.

John Davis, an Englishman about whom little is known, wrote
several novels of American life, most of which were published here,
and became somewhat popular. He lived in the United States from
1798 till 1802, and travelled over a large part of the country. His first
novel, “The Original Letters of Ferdinand and Elizabeth” (1798), was
a conventional story of seduction and suicide. It was followed by
“The Farmer of New Jersey” (1800), “The First Settlers of Virginia”
(1805), a pioneer historical novel, crude and ill managed, “Walter
Kennedy, an American Tale” (London, 1805), and “The Post Captain”
(1813). The most that can be said of these stories is that their
author was shrewd and observant, and had some journalistic skill.

Mrs. Sally Keating Wood (1760–1855), wife of General Abiel
Wood, of Maine, may be mentioned as the author of “Julia and the
Illuminated Baron” (1800), which recalls the mysterious evil power
and atheistic tendencies attributed to the Bavarian order of the
Illuminati, established in 1775, which, though suppressed in 1780 by
the Elector, was supposed to have secretly persisted and spread over
Europe. Mrs. Wood wrote also “Dorval, or The Speculator” (1801),
“Amelia, or The Influence of Virtue” (1802), “Ferdinand and Elmira, a
Russian Story” (1804), and “Tales of the Night” (1827), besides
several novels that were never published. Mrs. Wood placed many of
her scenes in Europe.

Isaac Mitchell.—At Poughkeepsie, New York, in 1811, was
published in two volumes “The Asylum, or Alonzo and Melissa, an
American Tale, Founded on Fact.” Of the author of this Gothic
romance, Isaac Mitchell, little is known save that he was
successively the editor of The Farmer’s Journal, The Political
Barometer, and The Republican Crisis, all of Albany, New York, and
that after losing his position through political changes, he moved to
Poughkeepsie. The story was later abridged and compressed into
one volume by Daniel Jackson, Jr. (Mitchell’s name disappearing
from the title-page), and in this form was long popular throughout
America; Mr. Reed thinks that for nearly a quarter of a century a new
edition appeared practically every year. The narrative is full of
elaborate descriptions of nature.

Washington Irving.—In general Irving will be discussed rather
with the essayists than with the novelists; but his stories and tales
must be considered here. They have contributed largely if not chiefly
to his enduring reputation. His first book, “Knickerbocker’s History of
New York” (1809), in which he works out a grotesquely humorous
drama of the Dutch fathers wrestling with the weighty problems of
statecraft, is of course in the main fictitious. No doubt it is at times
pretentious or overdone, and the humour is occasionally a little too
broad for the decorum of to-day; but the irrepressible spirit of
comedy, the delightfully burlesqued descriptions of stolid Dutch
character, the vivid though leisurely narrative, give it a supreme
place in our humorous literature. “Rip Van Winkle” and “The Legend
of Sleepy Hollow” are doubtless the most read parts of “The Sketch
Book” and have long since become classics; no more faithful
narratives of quaint old Dutch life have ever been written. In them
the boisterous exuberance of the “History” gives way to a more
graceful, refined, and mature style, which invests the homely
simplicity and contentment of colonial Dutch life with a kind of idyllic
charm. Only a little less successful were Irving’s other stories of early
New Amsterdam life—notably “The Money-Diggers” in “Tales of a
Traveller,” and “Dolph Heyliger” in “Bracebridge Hall.” Inferior
because more conventional and less spontaneous are the first three
parts of the “Tales”; yet even here, in dealing with the sentimental
and the terrible, Irving compares favourably with other story-tellers
of his day. In the stories scattered through “The Alhambra,” Irving
showed clearly that he had found another source of inspiration in the
romantic legends of Spain and the Moors—legends full of Oriental
mystery and of the splendid glories of old Spain, so charmingly and
truthfully set forth that the Spaniards themselves spoke of him as
“the poet Irving.” And “poet” he is in the large sense that he has
created imperishable scenes and characters in that realm of romance
in which we delight to wander, far from the prosaic world and the
madding crowd.

James Kirke Paulding.—A contrast with Irving in more than one
respect is afforded by James K. Paulding (1778–1860), the friend
and collaborator of Washington Irving and the brother-in-law of
William Irving. The author of “The Sketch Book” gave his whole life
to the profession of letters; for Paulding, on the other hand, literary
composition was only an avocation. The genial humour of Irving,
too, differs from the satirical and ironical vein too often indulged in
by his friend. Born in Dutchess County, New York, Paulding went to
New York City while a young man and became associated with the
Irvings in writing Salmagundi, the success of which gave Paulding
confidence in himself and led him to further literary efforts. “The
Diverting History of John Bull and Brother Jonathan” (1812), a
loosely constructed and amateurish satire in the style of Arbuthnot,
became very popular both in America and in England.
“Koningsmarke, the Long Finne” (1823), now remembered only for
the familiar assertion that “Peter Piper picked a peck of pickled
peppers,” was a burlesque on Cooper’s “Pioneers.” Paulding’s most
successful work, which deserves to live, was “The Dutchman’s
Fireside” (1831), in which are charming descriptions of quaint Dutch
customs and personages, of the picturesque scenery of the Hudson,
and of the vast expanse of wilderness that stretched to the
westward. In general, however, Paulding’s work was characterised by
a too harsh and obstreperous Americanism, an immoderate and
amusing hostility to foreigners, and a carelessness of workmanship
which prevented it from enduring long.

Samuel Woodworth.—As a curiosity must here be mentioned the
long-forgotten “Champions of Freedom” (1816) of Samuel
Woodworth (1785–1842). It was his one essay in fiction; a history of
the War of 1812 in the style of a romance. It must be described as a
chaotic miscellany, blending wild romance with commonplace
realism, and conducting the reader from ballroom to battlefield and
back again with the least possible suspicion of method or motive.

John Neal.—Born in Portland, Maine, and beginning life as a
shop-boy in Boston, John Neal (1793–1876) became in turn a
wholesale dry-goods merchant, a lawyer, and a voluminous critic,
poet, and novelist. He boasted that in thirty-six years he had written
enough altogether to fill a hundred octavo volumes; yet to-day he is
little more than a name. His first novel, “Keep Cool,” which he
afterward spoke of with justice as a “paltry, contemptible affair,”
appeared in 1817. His best novels are “Seventy-Six” (1823), a lively
story of the Revolution, “Rachel Dyer” (1828), a story of the Salem
Witchcraft, and “The Down-Easters” (1833), an extravagant tale
which deals with the ways of steamboat passengers, and into which
he manages to introduce plenty of horrors. Neal has been well styled
“the universal Yankee, whittling his way through creation, with a
half-genius for everything, a robust genius for nothing.” He is said to
have been the originator of the woman’s suffrage movement, the
first person to establish a gymnasium in America, and the first to
encourage Edgar A. Poe.

James Fenimore Cooper.—The first American to win universal
recognition as a powerful novelist was James Fenimore Cooper. Born
at Burlington, New Jersey, on September 15, 1789, of English
Quaker and Swedish parentage, he was taken, when a year old, to
the Central New York wilderness, where his father, having become
the owner of large tracts of land, had laid out the village of
Cooperstown. Here, on the shores of the beautiful Otsego Lake, in a
motley frontier settlement, the boy Cooper passed his earliest years.
In due time entering the family of an Albany clergyman as a private
pupil, Cooper proceeded in 1803 to Yale College, where he became a
member of the class of 1806. An escapade in his third year led to his
dismissal; after which he served a marine apprenticeship of a year
and then entered the navy, serving as midshipman for nearly four
years. In 1811, he married Susan A. De Lancey, a lady of Huguenot
and Tory family, and a sister of Bishop De Lancey of Western New
York; and at her request resigned his commission, to become an
amateur farmer, successively at Mamaroneck, on Long Island Sound,
at Cooperstown, and at Scarsdale, Westchester County, all in New
York State. Thus he arrived at the age of thirty without having even
dreamed of a career of authorship. One day, reading a novel
descriptive of English society, he impatiently threw down the book
and exclaimed that he could write a better story himself. Challenged
by his wife to do so, he wrote and published “Precaution” (1820), a
dull and conventional story of English social life, purporting to be the
work of an Englishman. Although the novel was not very successful,
his friends urged Cooper to try again, and this time to write of
scenes of which he had some personal knowledge. The publication
of “The Spy, a Tale of the Neutral Ground,” in December, 1821,
marks the beginning of a long series of successes. “The Spy” met
with a large sale both in America and in England. It was soon
translated into most of the cultivated languages of Europe; and its
popularity has never greatly waned. It is a story of the American
Revolution, in which the patriotic hero, Harvey Birch, signally aids
the American cause and exhibits a rare combination of the spy and
the gentleman.
During the twenty-nine years remaining to Cooper, he produced
thirty-two further volumes, chiefly romances. Of these, many are
now rarely read, but the following have retained their popularity for
successive generations:
“The Spy,” already referred to.
“The Leatherstocking Tales,” comprising (in the chronological
order not of their production, but of the narrative):
“The Deerslayer, or The First War Path,” 1841.
“The Last of the Mohicans, a Narrative of 1757,” 1826.
“The Pathfinder, or The Inland Sea,” 1840.
“The Pioneers,” 1823, and “The Prairie,” 1827; and ten volumes
of the “Sea Tales”:
“The Pilot,” 1823.
“The Red Rover,” 1828.
“The Two Admirals,” 1842.
“Homeward Bound, or The Chase,” 1838.
“The Water-Witch, or The Skimmer of the Seas,” 1830.
“The Wing-and-Wing, or Le Feu-Follet,” 1842.
“Afloat and Ashore,” 1844.
“Miles Wallingford,” 1844, published in England as “Lucy
Hardinge.” A sequel to “Afloat and Ashore.”
“Jack Tier, or The Florida Reefs,” 1848.
“The Sea Lions, or The Lost Sealers,” 1849.
The popularity which Cooper achieved, and which reached its
height with the publication of “The Last of the Mohicans,” was most
remarkable; no other American has ever enjoyed anything like it. Not
only were his stories read in well-nigh every household, but they
were promptly dramatised, and furnished subjects for numerous
paintings and poetical effusions. In Europe, his fame fairly rivalled
that of Scott. In 1833, Samuel F. B. Morse, the inventor of the
electric telegraph, wrote: “In every city of Europe that I visited the
works of Cooper were conspicuously placed in the windows of every
bookshop. They are published, as soon as he produces them, in
thirty-four different places in Europe. They have been seen by
American travellers in the languages of Turkey and Persia, in
Constantinople, in Egypt, at Jerusalem, at Ispahan.”
In 1822 Cooper removed with his family to New York, in order to
be near his publisher and to put his daughters into school. There he
founded a club, commonly known as the Bread and Cheese, to
which many of the noted men of the time belonged. The years
1826–33 he spent in Europe, being for a part of this time United
States consul at Lyons. On his return, he lived a few winters in New
York; he then took up his permanent residence at Otsego Hall,
Cooperstown, where he died in September, 1851.
In his later years, Cooper presented the singular spectacle of a
popular novelist who was the most cordially hated man of his time.
The fact is significant and helps to account for the failure of many of
Cooper’s later stories. An ardent lover of his country and its
republican institutions, he boldly rebuked the ignorance and
supercilious condescension of European critics; he wrote “The Bravo”
(1831), “The Heidenmauer” (1832), and “The Headsman” (1833),
for the avowed purpose of assailing monarchical and praising
democratic institutions, and kept this purpose in mind much too
constantly to produce artistic work. On his return to America,
contrasting the restless exertion and bustle, the material progress
which obscured higher ideals than money-making, with the leisure
and dignified culture of European lands, he did not hesitate to speak
plainly of the defects in the American character. This naturally
brought him much abuse from the press; and an unfortunate dispute
with the citizens of Cooperstown over the ownership of Three-Mile
Point on Otsego Lake, though the right was wholly on his side, only
made him more intensely disliked.
In the early ’40’s, certain issues arose in New York State
between the tenants of the old Patroons who held their large estates
under original grants, and their landlords, the tenants attempting to
secure under State legislation a title in fee to their rented lands.
Cooper, whose family interests were themselves likely to be affected
by these claims, threw himself with full force and bitterness into the
contest. In addition to a number of magazine articles and speeches,
he devoted three volumes to the presentation of the claims of the
landlords, volumes which are now read but little, excepting by
special students of the subject. They are entitled respectively:
“Satanstoe, or The Littlepage Manuscripts,” 1845;
“The Chainbearer,” 1846; and
“The Redskins, or Indian and Injin,” 1846.
“The Ways of the Hour” (1850) was also a novel with a purpose,
which overweighted its interest as a story; the purpose was the
reform of court procedure in the State of New York.
In “Homeward Bound” and its sequel, “Home as Found” (1838),
the latter being one of his worst stories, Cooper lashed the petty
vices of his countrymen and sought to show them what ought to be.
As he might have expected, he only confirmed the public in its
hatred of him, while he materially impaired his reputation as a
story-teller. Had he been more tactful, philosophical, and far-seeing, he
would have saved himself years of stormy conflict.
In Lakewood Cemetery at Cooperstown, on the hill overlooking
Otsego Lake, is a majestic monument to Fenimore Cooper,
twenty-five feet in height, and surmounted by a statue of the hunter
Leatherstocking and his dog. As enduring as bronze is this character
in our American fiction; the hero that will live longest of Cooper’s
creations. In him Lowell found “the protagonist of our New World
epic, a figure as poetic as that of Achilles, as ideally representative
as that of Don Quixote, as romantic in his relation to our homespun
and plebeian myths as Arthur in his to the mailed and plumed cycle
of chivalry.” The series in which he appears, “The Deerslayer,” “The
Pathfinder,” “The Last of the Mohicans,” “The Pioneers,” and “The
Prairie,” the group which Cooper himself preferred to his other
stories, is now (excepting always “The Spy”) more read than all
Cooper’s other works put together. Drawn at first from life, Natty
Bumppo becomes an idealised character, the perfect type of the bold
frontiersman and scout, who read nature as an open book, and who
was most at home when farthest from the haunts of the civilised
world. Worthy to stand by his side is the noble Indian Chingachgook,
“grave, silent, acute, self-contained,” as Mr. James H. Morse says of
him; “sufficiently lofty-minded to take in the greatness of the
Indian’s past, and sufficiently farsighted to see the hopelessness of
his future,—with nobility of soul enough to grasp the white man’s
virtues, and with inherited wildness enough to keep him true to the
instincts of his own race.” Famous among Cooper’s sailor folk is Long
Tom Coffin, of “The Pilot”—type of the rough but honest seaman,
superstitious like all seamen but devoutly religious, faithful to the
last and capable of the most heroic self-sacrifice. Other characters
scarcely less well drawn, if less famous, move through Cooper’s
pages—rough, uncouth waifs and strays of border life, grizzled old
sea-dogs, soldiers’ and sailors’ wives and sweethearts, such as the
wife of Ishmael Bush, Hetty and Judith Hutter, and Dew-of-June.
That he exhibited marked imperfections in style and technique
no one will deny. He wrote too rapidly to attain to anything like
elegance of style, and he is not infrequently obscure. He continually
repeats words and expressions, to the great annoyance of the
reader. The same carelessness that characterises his style is
occasionally seen in the construction of his stories. Scenes are
repeated. Mistakes due to forgetfulness occur, as in “Mercedes of
Castile,” where the heroine presents her lover, on his outward
voyage, with a cross of sapphire stones, emblems, she tells him, of
fidelity, which later appear as turquoise stones. Peculiarities of habit
or manner are referred to so continually that the reader becomes
weary and disgusted. Numerous characters are, it must be admitted,
conventional in the extreme. Cooper failed signally in his fine
women. They are not creatures of flesh and blood; they are purely
imaginary creatures in petticoats, mere simulacra, invariably
paragons of sweetness, discretion, and artlessness, ever saying and
doing the correct thing until the reader longs for a little less of the
angel and a good deal more of Mother Eve. Finally, his introductions
are exceedingly prolix and tedious, though in this respect he sinned
in company with Scott and many another of the time.
But we must not let this catalogue of Cooper’s defects obscure
his virtues. In spite of occasional carelessness of construction, all his
best stories are highly interesting; he spins a good yarn. Never
straining after effects, never loading his sentences with ornaments,
when once started he moves straight ahead to his goal; one stirring
scene follows another; there is wonderful fertility of resource, set
forth with the confidence that begets faith. His was a large genius,
which, though unsuccessful at miniature work, could manage a large
canvas marvellously well. It must not be forgotten that Cooper was a
pioneer; that he was the creator of our American romance of forest
and prairie and sea. His descriptions of nature are done with the
hand of a master. “If Cooper,” remarked Balzac, “had succeeded in
the painting of character to the same extent that he did in the
painting of the phenomena of nature, he would have uttered the last
word of our art.” Moreover, Cooper’s stories are honest and
wholesome like himself; they breathe the same genuineness, the
same sincerity and hatred of shams and meanness; they uniformly
hold up noble and worthy ideals; their tone is always as healthful
and invigorating as a breath of ozone. As Professor Trent remarks,
he “lifted the story of adventure into the realms of poetry”; and as
the poet of the primeval American forest he has never been
superseded.
Professor Lounsbury, whose Life of Cooper, in the “American Men
of Letters” Series, remains the authoritative biography, sums up the
man and his work as follows:

America has had among her representatives of the
irritable race of writers many who have shown far more ability
to get on pleasantly with their fellows than Cooper. She has
had several gifted with higher spiritual insight than he, with
broader and juster views of life, with finer ideals of literary
art, and, above all, with far greater delicacy of taste. But she
counts on the scanty roll of her men of letters the name of no
one who acted from purer patriotism or loftier principle. She
finds among them all no manlier nature, and no more heroic
soul.

Mr. W. C. Brownell prepared for the Iroquois Edition of Cooper’s
Works a critical introduction which may safely be accepted as the
most just, most delicate, and most comprehensive analysis of the
man and of his work. Mr. Brownell writes:

There is a quality in Cooper’s romance, however, that
gives it as romance an almost unique distinction. I mean its
solid and substantial alliance with reality. It is thoroughly
romantic, and yet—very likely owing to his imaginative
deficiency, if anything can be so owing—it produces, for
romance, an almost unequalled illusion of life itself....
Cooper’s ... work is in no sense a jardin des plantes; it is like
the woods and sea that mainly form its subject and
substance. Only critical myopia can be blind to the
magnificent forest, with its pioneer clearings, its fringe of
“settlements,” its wood-embosomed lakes, its neighbouring
prairie on the one side, and on the other the distant ocean
with the cities of its farther shore—the splendid panorama of
man, of nature, and of human life unrolled for us by this large
intelligence and noble imagination, this manly and patriotic
American representative in the literary parliament of the
world.

The Elder Dana.—Richard Henry Dana (1787–1879), lawyer,
politician, poet, critic, and novelist, was one of the group of Boston
writers that laid the foundations of New England literature. His tales,
“Tom Thornton” and “Paul Felton,” are romantic stories of villainy
and insanity, and give evidence of the influence of Brockden Brown.
The narrative has at times an impetuous sweep that hurries the
reader along in spite of himself; and the characterisation is wrought
with powerful strokes. A collective edition of his “Poems and Prose
Writings” appeared in 1833.

Miss Sedgwick and Mrs. Child.—Catherine Maria Sedgwick
(1789–1867) was the daughter of Judge Theodore Sedgwick and
was born at Stockbridge, Massachusetts, where she was principal of
a young ladies’ school for half a century. Her duties as a teacher did
not prevent her from becoming a voluminous novelist. Her first story
was “A New England Tale” (1822), which at once found favour.
“Redwood” (1824) was translated into three or four Continental
languages; on the title-page of the French translation, the novel was
ascribed to Fenimore Cooper. Other novels which achieved great
popularity for their faithful portraiture of early and contemporary
New England life were “Hope Leslie, or Early Times in
Massachusetts” (1827), “Clarence, a Tale of Our Own Times” (1830),
“The Linwoods, or Sixty Years Since in America” (1835), and
“Married or Single” (1857). While Miss Sedgwick never rises to the
height of absorbing interest, she is rarely dull, and some of her
women, if we allow for the difference in time, do not suffer in
comparison with those of Mrs. Stowe and Mrs. Wilkins Freeman. Her
descriptions of simple country life were superior to any that had
hitherto appeared. Mrs. Child, born Lydia Maria Francis (1802–1880),
who likewise spent her life in Massachusetts, began writing early,
producing her first novel, “Hobomok,” in 1824 and her second, “The
Rebels,” a year later. The former deals with Salem life in colonial
times; the latter is a story of the Revolution, describing the sack of
Governor Hutchinson’s house and the Boston massacre. Although
they give true pictures of early Puritan customs, they are not
powerful as fiction. In 1836 she essayed a more ambitious flight in
“Philothea,” a romance of the days of Pericles, which, in spite of its
stilted rhetoric, reveals some imaginative power and deserves
mention as a pioneer attempt to interpret Greek life to America.

Timothy Flint.—A voluminous writer and in his day a well-known
figure was Timothy Flint (1780–1840), a native of Reading,
Massachusetts, and a graduate of Harvard in the class of 1800.
Becoming a Congregational minister, in 1815, in search for health, he
crossed the Alleghany Mountains with his family, and after travelling
in Ohio, Indiana, and Illinois, became a missionary, first at St.
Charles, Missouri, and then in Arkansas. The success of his
“Recollections of the Last Ten Years” (1826) led him to publish a
novel, “Francis Berrian, or The Mexican Patriot” (1826), dealing with
adventures with the Comanche Indians, and with the Mexican
struggle of 1821, which resulted in the fall of Iturbide. The story was
crude and improbable, but some of its descriptions found favour.
“Arthur Clenning,” his second novel, published in 1828, includes a
shipwreck in the Southern Ocean, after which the hero and heroine
arrive in New Holland and later settle in Illinois. He wrote some
other novels, but none has survived. For a time (1833), Flint edited
The Knickerbocker; and in 1835 he contributed some “Sketches of
the Literature of the United States” to the London Athenæum.

William Austin (1788–1841), a lawyer of Charlestown,
Massachusetts, deserves to be noticed for the remarkable story of
“Peter Rugg, the Missing Man,” which he wrote for The New England
Galaxy (1827–8; reprinted in “The Boston Book,” 1841, and in other
books and papers). The theme is the same as that of “The
Wandering Jew.” While “originating in the inventive genius of its
author,” as Joseph Buckingham says of it, it doubtless owed
something also to German romance.

Nathaniel Hawthorne.—The greatest genius among American
writers of romance, by many held to be the supreme literary artist of
America, was Nathaniel Hawthorne. He was peculiarly a product of
New England and frankly admitted that New England was quite as
large a lump of earth as his heart could take in. His ancestor, William
Hathorne, came to the New World in 1630, in the ship with John
Winthrop and Thomas Dudley, and became a leader in the colony.
Hathorne’s son John was one of the judges in the witchcraft trials at
Salem in 1692. The grandfather and father of Nathaniel Hawthorne
were both sea-captains. The novelist was born at Salem on July 4,
1804. Four years later, his father, never apparently a robust man,
died at Surinam, and the widowed mother began to live in a deep
seclusion which could not fail to have its effect upon the quick
sensibilities of her son. In 1818, the family removed to Raymond, on
the shore of Sebago Lake, in Maine, where his grandfather Manning
owned large tracts of land. Hawthorne’s boyhood environment,
therefore, was not widely different from that of Fenimore Cooper.
But he was more of a reader than Cooper. As a boy, he became
familiar with Shakespeare, Milton, Bunyan, Clarendon, Froissart,
Rousseau, and Godwin. Entering Bowdoin College, he was graduated
in 1825 in the class with Longfellow. While he did not distinguish
himself in his studies, he became a respectable Latin and English
scholar; and he devoted much time to reading in the little library of
the Athenæan Society. At graduation, he ranked eighteenth in a
class of thirty-eight. Meanwhile his family had returned to Salem,
and thither Hawthorne now went, to begin a period of literary
apprenticeship. It was seemingly a bold undertaking to attempt to
live by his pen; however, he seems to have drifted into the attempt
through aversion to a more active life. In 1828, he published
anonymously a novel called “Fanshawe,” dealing with some of his
college experiences and recalling vaguely the methods of Scott.
Some characters, it must be said, are vigorously conceived, and here
and there the volume gave promise of the author’s future skill; but
there is about the whole a suggestion of unreality, not to say
crudeness. The book found, as it deserved, an indifferent public, and
Hawthorne subsequently recalled as many copies as he could
procure and burned them. For several years, he continued to live in
seclusion, contributing stories and sketches to various annuals and
periodicals. For the stories he got $35 each. In March, 1837, having
been encouraged by his friend Horatio Bridge, he published the first
volume to appear with his name, “Twice-Told Tales.” They were
eighteen in number, being only half of the stories he is known to
have printed up to this time. The “Tales” gave Hawthorne a
considerable reputation; Longfellow praised them in The North American
Review, then influential in literary affairs. Again helped by his
friends, in January, 1839, Hawthorne assumed the position of
weigher and gauger in the Boston Custom-House. At first, the
novelty of contact with the practical world interested him; but he
soon found that his work, always monotonous, left him no time or
strength for writing, and he was not sorry to lose his post when the
Whigs came into power in 1841. For a few months, he tried life at
Brook Farm, thinking that in this new community he should find a
suitable way of combining manual and intellectual labour; but the
work was too hard, and he had too little opportunity for writing.
Accordingly in 1842 he left the Farm, married Miss Sophia A.
Peabody, to whom he had been engaged for four years, and settled
at the Old Manse, an idyllic retreat at Concord, Massachusetts.
Meanwhile, he had published (1841) two volumes of historical tales
for young people, “Grandfather’s Chair” and “Famous Old People”;
and to these he now added a third series, “The Liberty Tree,” as well
as a second series of “Twice-Told Tales” and a volume of
“Biographical Stories for Children” (1842). Of these, none except the
“Tales” rises much above the level of respectable writing to sell. In
the next four years, Hawthorne wrote for periodicals some eighteen
more tales, which, together with a number of earlier uncollected
stories, he republished in 1846 as “Mosses from an Old Manse.”
Hawthorne now returned to his native Salem as surveyor of customs
(1846–9), and proved an able administrator of the office. Another
period of literary barrenness ensued, but in 1847 he resumed his
writing and produced a few tales. The idea of a longer romance had
come to him, and after his dismissal from office in 1849 he found
the leisure necessary for writing “The Scarlet Letter.” Once more,
then, he exchanged the world of affairs for that realm of the
imagination where he was so much more at home. Working
resolutely amid sickness and poverty, he at length completed the
splendid romance, the publication of which distinguishes the year
1850 in American letters as Tennyson’s “In Memoriam” and
Wordsworth’s “Prelude” do in English poetry. Hawthorne had now
entered upon a period of great productivity. In the next two years he
published “The House of the Seven Gables” (1851), “A Wonder-Book
for Girls and Boys” (1851), “The Snow Image and Other Tales”
(1851), “Tanglewood Tales” (1852), “The Blithedale Romance”
(1852), a tale based on his Brook Farm life, and a campaign “Life of
Franklin Pierce,” his college friend, now a candidate for the
Presidency. Promptly after his election, President Pierce made
Hawthorne consul at Liverpool, an office which he held from July,
1853, until September, 1857. Though rich in experience and in
fruitful observation, his life in England was outwardly quiet and
uneventful. The years 1857–9 the Hawthornes spent in Italy, where
they mingled somewhat more with the world than had been their
wont. The fruit of the Italian life was “The Marble Faun” (1860),
written in Italy and at Redcar, on the shore of the North Sea, and
published in England as “Transformation.” Returning to America in
1860, Hawthorne passed the next four years at the Wayside,
Concord. In 1863 he contributed “Our Old Home” to The Atlantic
Monthly and began “The Dolliver Romance,” which he was destined
not to finish. He died suddenly on May 18, 1864, at Plymouth, N. H.,
while on a journey to the New Hampshire lakes in search of health.
His literary remains must be at least mentioned. In 1868
appeared “Passages from American Note-Books”; in 1870, “Passages
from English Note-Books”; and in 1871, “Passages from French and
Italian Note-Books.” These volumes throw much light on
Hawthorne’s favourite haunts and wandering propensities, as well as
his eagerness for minute observation. “Septimius Felton, or, The
Elixir of Life” (1871) was to be a story, placed in Revolutionary times,
of a man who sought earthly immortality. The theme was a powerful
one; but Hawthorne’s strength was evidently exhausted, and the
story must be pronounced a failure. The last works to appear were
“The Dolliver Romance” (1876) and “Doctor Grimshaw’s Secret,”
which are fragmentary and ineffective studies of the same theme as
“Septimius Felton.” Their failure, in all probability, was due not only
to the waning of Hawthorne’s powers but also to the difficulties
attending the theme itself.
Hawthorne was one of the shyest of men. Kenyon, in “The
Marble Faun,” says: “Between man and man there is always an
insuperable gulf”; such a gulf at any rate separated Kenyon’s creator
from the rest of mankind. Always fond of solitude, he lived in a
world of his own, apart from humankind; longing at times for more
familiar converse with men, but never quite successful in
establishing cordial relations (outside of his own family) with any but
a few friends. Possessed of an exquisitely sensitive nature, he made
no effort to conceal the pleasure which honest praise afforded him;
and he was easily rebuffed by the coolness of his public. Perhaps the
bane of his life was self-distrust. Each of his books when first written
seemed to him well-nigh worthless. James T. Fields has told of the
difficulty with which he extracted from Hawthorne the first
manuscript of “The Scarlet Letter.” “Thus it is with winged horses,”
says Hawthorne in “The Chimæra,” “and with such wild and solitary
creatures. If you can catch and overcome them, it is the surest way
to win their love.” Such was the devotion with which Hawthorne
repaid those who had “captured” him that their confident
encouragement greatly strengthened and inspired him. As might be
supposed, however, with the world at large he was lacking in
sympathy. His point of view was fixed; he could not see the world
with the eyes of another. This helps to account for the effect of
harshness and asperity which his chapter on “The Custom House” in
“The Scarlet Letter” had upon the people of Salem whom he there
described; and for the similar effect of the descriptions of English life
