Parallel & Distributed Computing
Lecture No. 01
                      Introduction
          Farhad M. Riaz
          [email protected]
          Department of Computer Science
                NUML, Islamabad
Course Pre-requisites
 •  Programming experience (preferably Python/C++/Java)
 •  Understanding of Computer Organization and Architecture
 •  Understanding of Operating Systems
Requirements & Grading
 •  Roughly:
     –  50% Final Exam
     –  25% Internal Evaluation
         •  Quiz: 8 marks
         •  Assignments: 8 marks
         •  Project: 9 marks
     –  25% Midterm Exam
Books
 •  Some good books are:
     –  Distributed Systems, 3rd Edition
     –  Principles of Parallel Programming
     –  Designing and Building Parallel Programs
     –  Distributed and Cloud Computing
Course Project
 •  At the end of the semester, students need to submit a semester project on a topic such as:
     –  Distributed computing & smart city services
     –  Large-scale convolutional neural networks
     –  Distributed computing with delay-tolerant networks
Course Overview
 •  This course covers the following main concepts:
     –  Concepts of parallel and distributed computing
     –  Analysis and profiling of applications
     –  Shared memory concepts
     –  Distributed memory concepts
     –  Parallel and distributed programming (OpenMP, MPI)
     –  GPU-based computing and programming (CUDA)
     –  Virtualization
     –  Cloud computing, MapReduce
     –  Grid computing
     –  Peer-to-peer computing
     –  Future trends in computing
Recommended Material
 •  Distributed Systems, Maarten van Steen & Andrew S. Tanenbaum, 3rd Edition (2020), Pearson.
 •  Parallel Programming: Concepts and Practice, Bertil Schmidt, Jorge Gonzalez-Dominguez, Christian Hundt & Moritz Schlarb, 1st Edition (2018), Elsevier.
 •  Parallel and High-Performance Computing, Robert Robey & Yuliana Zamora, 1st Edition (2021), Manning.
 •  Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang, Jack Dongarra & Geoffrey Fox, 1st Edition (2012), Elsevier.
 •  Multicore and GPU Programming: An Integrated Approach, Gerassimos Barlas, 2nd Edition (2015), Elsevier.
 •  Parallel Programming: For Multicore and Cluster Systems, Thomas Rauber & Gudula Rünger (2013), Springer.
Recent Jobs
Research in Parallel & Distributed Computing
Single Processor Architecture
Memory Hierarchy
5 Years of Technology Advance
Productivity Gap
Pipelining
Multicore Trend
Application Partitioning
High-Performance Computing (HPC)
 •  HPC is the use of parallel processing to run advanced application programs efficiently, reliably, and quickly.
 •  It applies especially to systems that operate above a teraFLOPS (10^12 floating-point operations per second).
 •  The term HPC is occasionally used as a synonym for supercomputing, although technically a supercomputer is a system that performs at or near the currently highest operational rate for computers.
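To make "floating-point operations per second" concrete, here is a minimal back-of-envelope C++ sketch (an illustration, not from the course material; the loop body and sizes are arbitrary) that times a serial loop and reports an achieved FLOP rate. A single core running this dependent chain lands well under the teraFLOPS threshold above:

    #include <chrono>
    #include <cstdio>

    int main() {
        const long n = 200'000'000;
        double a = 1.0;
        const double b = 1.000000001;
        auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < n; ++i)
            a = a * b + 1e-9;            // 2 floating-point ops per iteration
        auto t1 = std::chrono::steady_clock::now();
        double sec = std::chrono::duration<double>(t1 - t0).count();
        // Printing 'a' keeps the compiler from optimizing the loop away.
        std::printf("%.2f GFLOP/s (check value %.6f)\n", 2.0 * n / sec / 1e9, a);
        return 0;
    }

An HPC system reaches teraFLOPS and beyond not by making this loop faster but by running many independent streams of such work in parallel.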
GPU-accelerated Computing
 •  GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate deep learning, analytics, and engineering applications.
 •  Pioneered in 2007 by NVIDIA, GPU accelerators now power energy-efficient data centers in government labs, universities, enterprises, and small and medium businesses around the world.
 •  They play a huge role in accelerating applications in platforms ranging from artificial intelligence to cars, drones, and robots.
What is a GPU?
 •  It is a processor optimized for 2D/3D graphics, video, visual computing, and display.
 •  It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.
 •  It provides real-time visual interaction with computed objects via graphics, images, and video.
 •  It serves as both a programmable graphics processor and a scalable parallel computing platform.
 •  Heterogeneous systems combine a GPU with a CPU.
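To show what "a GPU together with a CPU" looks like in code, here is a minimal CUDA C++ sketch (an illustration under assumed names and sizes, not part of the slides): the CPU sets up the data, and the GPU runs one lightweight thread per array element:

    #include <cstdio>

    // Kernel: runs on the GPU, one thread per array element.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);   // unified memory visible to CPU and GPU
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // 4096 blocks x 256 threads
        cudaDeviceSynchronize();                       // wait for the GPU to finish
        std::printf("c[0] = %f\n", c[0]);              // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

The <<<blocks, threads>>> launch is what turns one loop into roughly a million concurrent GPU threads; the CPU remains in charge of allocation, launch, and synchronization.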
SGI Altix supercomputer with 2,300 processors
HPC System Composition
Parallel Computers
 •  Virtually all stand-alone computers today are parallel from a hardware perspective:
     –  Multiple functional units (L1 cache, L2 cache, branch, pre-fetch, decode, floating-point, graphics processing (GPU), integer, etc.)
     –  Multiple execution units/cores
     –  Multiple hardware threads (a short sketch below shows how software can query this)
IBM BG/Q Compute Chip with 18 cores (PU) and 16 L2 Cache units (L2)
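A one-line way to see this hardware parallelism from software, as a minimal C++ sketch (an illustration, not from the slides):

    #include <iostream>
    #include <thread>

    int main() {
        // Number of concurrent hardware threads (cores x SMT) the OS reports;
        // 0 means the value could not be determined.
        unsigned n = std::thread::hardware_concurrency();
        std::cout << "Hardware threads available: " << n << "\n";
        return 0;
    }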
Parallel Computers
 •  Networks connect multiple stand-alone computers (nodes) to make larger parallel computer clusters.
 •  Parallel computer cluster:
     –  Each compute node is a multi-processor parallel computer in itself.
     –  Multiple compute nodes are networked together with an InfiniBand network.
     –  Special-purpose nodes, also multi-processor, are used for other purposes.
Types of Parallel and Distributed Computing
 •  Parallel Computing
     –  Shared memory
     –  Distributed memory
 •  Distributed Computing
     –  Cluster computing
     –  Grid computing
     –  Cloud computing
     –  Distributed pervasive systems
Parallel Computing
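As a concrete taste of the shared-memory model in the taxonomy above, here is a minimal OpenMP sketch (an illustration, assuming a compiler with OpenMP support, e.g. g++ -fopenmp): all threads share one address space and split the loop's iterations among the cores of a single machine:

    #include <cstdio>
    #include <omp.h>
    #include <vector>

    int main() {
        std::vector<double> x(1'000'000, 1.0);   // data visible to every thread
        double sum = 0.0;
        // OpenMP divides the iterations among the available cores;
        // reduction(+:sum) gives each thread a private partial sum.
        #pragma omp parallel for reduction(+ : sum)
        for (long i = 0; i < (long)x.size(); ++i)
            sum += x[i];
        std::printf("sum = %.0f (max threads = %d)\n", sum, omp_get_max_threads());
        return 0;
    }

Contrast this with the distributed-memory MPI sketch on the cluster slide below, where data would have to be explicitly partitioned and communicated.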
Distributed (Cluster) Computing
 •  Essentially a group of high-end systems connected through a LAN
 •  Homogeneous: same OS, near-identical hardware
 •  Single managing node
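In code, the cluster model is distributed memory: each process owns its own address space, possibly on a different node, and processes cooperate by passing messages. A minimal MPI sketch (an illustration, assuming an MPI installation; build with mpicxx, run with e.g. mpirun -np 4):

    #include <cstdio>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's id
        MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes
        char name[MPI_MAX_PROCESSOR_NAME];
        int len;
        MPI_Get_processor_name(name, &len);     // which node we are on
        std::printf("Hello from rank %d of %d on node %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }

Launched across a cluster, each rank prints a different hostname, because the processes really do live on different machines.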
Distributed (Grid) Computing
 •  Lots of nodes from everywhere:
     –  Heterogeneous
     –  Dispersed across several organizations
     –  Can easily span a wide-area network
 •  To allow for collaboration, grids generally use virtual organizations.
 •  In essence, this is a grouping of users (or their IDs) that allows for authorization on resource allocation.
Distributed (Cloud) Computing
Distributed (Pervasive) Computing
 •  Emerging next generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user's environment.
 •  Three subtypes:
     –  Ubiquitous computing systems: pervasive and continuously present, i.e., there is continuous interaction between system and user.
     –  Mobile computing systems: pervasive, but the emphasis is on the fact that devices are inherently mobile.
     –  Sensor (and actuator) networks: pervasive, with emphasis on the actual (collaborative) sensing and actuation of the environment.
Why Use Parallel Computing?
The Real World is Massively Parallel
 •  In the natural world, many complex, interrelated events are happening at the same time, yet within a temporal sequence.
 •  Compared to serial computing, parallel computing is much better suited for modeling, simulating, and understanding complex, real-world phenomena.
 •  For example, imagine modeling such phenomena serially, one event at a time.
Save Time and/or Money (Main Reasons)
 •  In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings.
 •  Parallel computers can be built from cheap, commodity components.
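The "in theory" caveat can be made precise with Amdahl's law (standard background, not stated on the slide): if a fraction f of a program parallelizes perfectly across p processors, the speedup is

    S(p) = \frac{1}{(1 - f) + f/p}

For example, with f = 0.9 and p = 8, S = 1 / (0.1 + 0.9/8) ≈ 4.7, well short of 8x: the serial 10% caps the benefit of adding more resources.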
Solve Larger / More Complex Problems (Main Reasons)
 •  Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory.
 •  Example: web search engines/databases processing millions of transactions every second.
Provide Concurrency (Main Reasons)
 •  A single compute resource can only do one thing at a time. Multiple compute resources can do many things simultaneously.
 •  Example: collaborative networks provide a global venue where people from around the world can meet and conduct work "virtually".
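A minimal C++ sketch of multiple compute resources doing two things at once (an illustration; the thread bodies are arbitrary):

    #include <cstdio>
    #include <thread>

    // Each std::thread can be scheduled onto its own core, so the two
    // calls to work() may truly run at the same time.
    void work(const char* who) {
        std::printf("%s is working\n", who);  // one printf call per line
    }

    int main() {
        std::thread t1(work, "thread 1");
        std::thread t2(work, "thread 2");
        t1.join();   // wait for both threads to finish
        t2.join();
        return 0;
    }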
Make Better Use of Underlying Parallel Hardware (Main Reasons)
 •  Modern computers, even laptops, are parallel in architecture, with multiple processors/cores.
 •  Parallel software is specifically intended for parallel hardware with multiple cores, threads, etc.
 •  In most cases, serial programs running on modern computers "waste" potential computing power.
Intel Xeon processor with 6 cores and 6 L3 cache units
The Future (Main Reasons)
 •  During the past 20+ years, the trends indicated by ever-faster networks, distributed systems, and multi-processor computer architectures (even at the desktop level) clearly show that parallelism is the future of computing.
 •  In this same time period, there has been a greater than 500,000x increase in supercomputer performance, with no end currently in sight.
 •  The race is already on for exascale computing!
 •  1 exaFLOPS = 10^18 floating-point calculations per second (a million teraFLOPS).
That’s all for today!!