General Parallel File System
Agenda
- What is GPFS? A file system for deep computing
- GPFS uses
- General architecture
- How does GPFS meet its challenges - architectural issues:
  - performance
  - scalability
  - high availability
  - concurrency control
Scalable parallel computing enables I/O-intensive applications:
- Deep computing - simulation, seismic analysis, data mining
- Server consolidation - aggregating file and web servers onto a centrally-managed machine
- Streaming video and audio for multimedia presentation
- Scalable object store for large digital libraries, web servers, databases, ...
What is GPFS?
Scalability
Scales up to 512 nodes (N-way SMP): storage nodes, file system nodes, disks, adapters...
High Availability
- Fault tolerance via logging, replication, and RAID support; survives node and disk failures
Uniform access
- Via shared disks - single-image file system
High capacity
- Multiple TB per file system, 100s of GB per file
Standards compliant
- X/Open 4.0 ("POSIX") with minor exceptions
What is GPFS?
Native AIX File System (JFS)
- No file sharing - an application can only access files on its own node
- Applications must do their own data partitioning
DCE Distributed File System (follow-on to AFS)
- Application nodes (DCE clients) share files on a server node
- Switch is used as a fast LAN
- Coarse-grained (file or segment level) parallelism
- Server node is a performance and capacity bottleneck
- 6 Mbit/sec MPEG video streams
- 100 simultaneous viewers (75 MB/sec; see the arithmetic below)
- 200 hours of video online (700 GB)
- 12-node SP-2 (7 distribution, 5 storage)
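The quoted aggregate rate is simply the stream count times the per-stream bit rate, converted to bytes:

$$100 \times 6\ \mathrm{Mbit/s} = 600\ \mathrm{Mbit/s} = 75\ \mathrm{MB/s}$$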
Engineering Design
Major aircraft manufacturer
- Using GPFS to store CATIA designs and structural modeling data
- GPFS allows all nodes to share designs and models
- Using CATIA for large designs, Elfini for structural modeling and analysis
- SP used for modeling/analysis
GPFS uses
File systems consist of one or more shared disks
- An individual disk can contain data, metadata, or both
- Each disk is assigned to a failure group
- Data and metadata are striped to balance load and maximize parallelism
Disks are physically attached to SP nodes
- VSD (Virtual Shared Disk) allows clients to access disks over the SP switch
- The VSD client looks like a disk device driver on the client node (see the sketch below)
- The VSD server executes I/O requests on the storage node
- VSD supports JBOD or RAID volumes, fencing, and multipathing (where the physical hardware permits)
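The client/server split behind VSD can be pictured with a small sketch (Python, illustration only; the file path, block size, and class names are invented for the example, and the real VSD runs in the kernel over the SP switch): the client exposes block read/write calls that are shipped to the storage node, where the server performs the actual I/O.

```python
# Minimal sketch of the VSD client/server split (illustration only).
import os

BLOCK_SIZE = 4096  # hypothetical sector-aligned request size

class VsdServer:
    """Runs on the storage node; performs the actual disk I/O."""
    def __init__(self, path):
        # A plain file stands in for the physical disk in this sketch.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)

    def handle(self, op, block_no, data=None):
        os.lseek(self.fd, block_no * BLOCK_SIZE, os.SEEK_SET)
        if op == "read":
            return os.read(self.fd, BLOCK_SIZE)
        os.write(self.fd, data)

class VsdClient:
    """Runs on a file system node; looks like a local disk device driver."""
    def __init__(self, server):
        # In GPFS the request would travel over the SP switch; here we call
        # the server object directly to keep the sketch self-contained.
        self.server = server

    def read_block(self, block_no):
        return self.server.handle("read", block_no)

    def write_block(self, block_no, data):
        self.server.handle("write", block_no, data)

# Usage: every client node sees the same blocks through its VSD client.
server = VsdServer("/tmp/vsd_disk.img")
client = VsdClient(server)
client.write_block(0, b"x" * BLOCK_SIZE)
assert client.read_block(0)[:1] == b"x"
```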
Implications of the shared disk model
- All data and metadata reside on globally accessible disks (VSD)
- All access to permanent data goes through the disk I/O interface
- Distributed protocols, e.g. distributed locking, coordinate disk access from multiple nodes
- Fine-grained locking allows parallel access by multiple clients
- Logging and shadowing restore consistency after node failures
General architecture
File system nodes
- run user programs, read/write data to/from storage nodes
- implement the virtual file system interface
- cooperate with manager nodes to perform metadata operations
Storage nodes
- implement the block I/O interface
- shared access from file system and manager nodes
- interact with manager nodes for recovery (e.g. fencing)
- file data and metadata are striped across multiple disks on multiple storage nodes
General architecture
General architecture
- Large block size allows efficient use of disk bandwidth
- Fragments reduce space overhead for small files
- No designated "mirror", no fixed placement function:
  - Flexible replication (e.g., replicate only metadata, or only important files)
  - Dynamic reconfiguration: data can migrate block-by-block
- Multi-level indirect blocks (sketched below)
  - Each disk address: list of pointers to replicas
  - Each pointer: disk id + sector number
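A rough picture of these addressing structures, with made-up field names (this is not the actual GPFS inode layout): each logical block of a file maps to a disk address, which is simply a list of replica pointers, and large files reach their addresses through one or more levels of indirect blocks.

```python
# Sketch of the addressing scheme described above (field names invented).
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReplicaPointer:
    disk_id: int     # which shared disk holds this copy
    sector_no: int   # where on that disk the block starts

# A "disk address" is a list of replica pointers: one per copy of the block.
DiskAddress = List[ReplicaPointer]

@dataclass
class IndirectBlock:
    # Each entry is either a data-block address or, one level up,
    # the address of another indirect block.
    entries: List[DiskAddress] = field(default_factory=list)

@dataclass
class Inode:
    direct: List[DiskAddress] = field(default_factory=list)      # small files
    indirect: List[IndirectBlock] = field(default_factory=list)  # large files

# Example: block 0 of a file replicated on disks 3 and 7.
inode = Inode(direct=[[ReplicaPointer(disk_id=3, sector_no=4096),
                       ReplicaPointer(disk_id=7, sector_no=8192)]])
```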
General architecture
Conventional file systems store data in small blocks to pack data more densely. GPFS uses large blocks (256 KB default) to optimize disk transfer speed.
[Chart: disk throughput (0-7 MB/sec) vs. block size (128 KB to 1024 KB)]
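The benefit of large blocks comes from amortizing per-I/O positioning time over more data. As a rough model (the seek time and bandwidth numbers are illustrative, not measurements from the slide):

$$t(B) = t_{\mathrm{seek}} + \frac{B}{r}, \qquad \mathrm{throughput}(B) = \frac{B}{t(B)}$$

With, say, $t_{\mathrm{seek}} = 10\,\mathrm{ms}$ and $r = 10\,\mathrm{MB/s}$, a 4 KB block achieves about $0.4\,\mathrm{MB/s}$ while a 256 KB block achieves about $7\,\mathrm{MB/s}$, which is why the curve above climbs steeply with block size.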
Performance
- Distributed locking - acquire an appropriate lock for every operation; used for updates to user data (see the sketch below)
- Centralized management - conflicting operations forwarded to a designated node; used for file metadata
- Distributed locking + centralized hints - used for space allocation
- Central coordinator - used for configuration changes
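One way to picture the distributed-locking approach is a central token server that grants lock tokens which nodes then cache and reuse locally until another node needs a conflicting one (a simplified sketch; class and method names are invented, and the real protocol handles byte ranges, lock modes, and revocation callbacks):

```python
# Simplified token-based locking sketch (single-threaded, invented API).
class TokenServer:
    """Grants tokens; knows which node currently holds each one."""
    def __init__(self):
        self.holder = {}          # token name -> holding node

    def acquire(self, node, name):
        owner = self.holder.get(name)
        if owner is not None and owner != node:
            # Conflicting holder: in a real system the server would ask the
            # owner to flush dirty data and give the token up (a "steal").
            owner.relinquish(name)
        self.holder[name] = node
        return name

class Node:
    """A file system node that caches tokens it has been granted."""
    def __init__(self, server):
        self.server = server
        self.tokens = set()

    def lock(self, name):
        # Reuse a cached token without talking to the server if possible.
        if name not in self.tokens:
            self.tokens.add(self.server.acquire(self, name))

    def relinquish(self, name):
        self.tokens.discard(name)

# Two nodes updating the same file take turns holding its token.
server = TokenServer()
a, b = Node(server), Node(server)
a.lock("file42")          # server grants the token to node a
a.lock("file42")          # cached: no server traffic
b.lock("file42")          # server steals the token from a, grants it to b
assert "file42" in b.tokens and "file42" not in a.tokens
```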
Performance
- GPFS stripes successive blocks across successive disks (see the striping sketch below)
- Disk I/O for sequential reads and writes is done in parallel
- GPFS measures application "think time", disk throughput, and cache state to automatically determine the optimal degree of parallelism
- Prefetch algorithms now recognize strided and reverse-sequential access
- Accepts hints
- Write-behind policy
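Striping itself is just a round-robin mapping from a file's block index to a disk, which is what lets a sequential run be issued to several disks at once (a minimal sketch with invented names; real placement also accounts for replication and failure groups):

```python
# Round-robin striping sketch: block i of a file lands on disk i mod N.
from concurrent.futures import ThreadPoolExecutor

NUM_DISKS = 4
BLOCK_SIZE = 256 * 1024  # 256 KB, the default block size mentioned above

def disk_for_block(block_index: int) -> int:
    """Map a file block index to the disk that stores it."""
    return block_index % NUM_DISKS

def read_block(disk: int, block_index: int) -> bytes:
    # Placeholder for a real VSD read; returns dummy data in this sketch.
    return bytes(BLOCK_SIZE)

def sequential_read(first_block: int, count: int) -> list:
    """Issue the reads for a sequential run in parallel across the disks."""
    with ThreadPoolExecutor(max_workers=NUM_DISKS) as pool:
        futures = [pool.submit(read_block, disk_for_block(b), b)
                   for b in range(first_block, first_block + count)]
        return [f.result() for f in futures]

# Reading blocks 0..7 touches all four disks twice, in parallel.
data = sequential_read(0, 8)
assert len(data) == 8
```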
GPFS Throughput Scaling for Non-cached Files
- Hardware: Power2 wide nodes, SSA disks
- Experiment: sequential read/write from a large number of GPFS nodes to a varying number of storage nodes
- Result: throughput increases nearly linearly with the number of storage nodes
- Bottlenecks:
  - microchannel limits node throughput to 50 MB/s
  - system throughput limited by available storage nodes
Scalability
The block allocation map is divided into segments:
- Each segment contains bits representing blocks on all disks
- Each segment is a separately lockable unit
- Minimizes contention for the allocation map when writing files on multiple nodes
- Allocation manager service provides hints on which segments to try (a sketch follows below)
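A toy version of such a segmented allocation map (names and sizes invented): each node can allocate out of its own segment under that segment's lock, so concurrent writers rarely contend, yet any single segment still covers blocks on every disk.

```python
# Toy segmented allocation map: one lock per segment, blocks from all disks
# interleaved into every segment (sizes and names are invented).
import threading

NUM_DISKS = 4
BLOCKS_PER_DISK = 1024
NUM_SEGMENTS = 8

class Segment:
    """One lockable slice of the allocation map, covering blocks on all disks."""
    def __init__(self, seg_no):
        self.lock = threading.Lock()
        # Free-block "bits" for every disk that fall into this segment, so a
        # node holding one segment can still stripe new blocks across disks.
        self.free = {(d, b) for d in range(NUM_DISKS)
                     for b in range(seg_no, BLOCKS_PER_DISK, NUM_SEGMENTS)}

    def allocate(self, disk):
        with self.lock:
            choice = next((blk for blk in self.free if blk[0] == disk), None)
            if choice is not None:
                self.free.discard(choice)
            return choice

class AllocationMap:
    def __init__(self):
        self.segments = [Segment(i) for i in range(NUM_SEGMENTS)]

    def allocate(self, disk, hint):
        # 'hint' stands in for the allocation manager's advice about which
        # segment this node should try first, keeping nodes out of each
        # other's way.
        for i in range(NUM_SEGMENTS):
            blk = self.segments[(hint + i) % NUM_SEGMENTS].allocate(disk)
            if blk is not None:
                return blk
        return None

amap = AllocationMap()
print(amap.allocate(disk=2, hint=5))   # e.g. (2, 5): a block of disk 2 from segment 5
```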
Problem: detect/fix file system inconsistencies after a failure of one or more nodes
- All updates that may leave inconsistencies if uncompleted are logged
- Write-ahead logging policy: the log record is forced to disk before the dirty metadata is written (a sketch follows below)
- Redo log: replaying all log records at recovery time restores file system consistency
Logged updates:
- I/O to replicated data
- directory operations (create, delete, move, ...)
- allocation map changes
Other techniques:
- ordered writes
- shadowing
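The write-ahead rule is easy to state in code (a minimal sketch with an in-memory metadata store and an invented record format; the real GPFS logs are per-node and record metadata updates only): force the log record to stable storage, then apply the update, and on recovery redo every logged record.

```python
# Minimal write-ahead / redo logging sketch (invented record format).
import json, os

class MetadataStore:
    """Stands in for on-disk metadata in this sketch."""
    def __init__(self):
        self.data = {}

    def apply(self, record):
        self.data[record["key"]] = record["value"]

class RedoLog:
    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Write-ahead policy: the record reaches stable storage (fsync)
        # before the corresponding dirty metadata may be written.
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self, store):
        # Redo: re-applying every record is idempotent, so a crash at any
        # point between logging and applying leaves the store recoverable.
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                store.apply(json.loads(line))

log = RedoLog("/tmp/gpfs_sketch.log")
store = MetadataStore()
record = {"key": "inode42.size", "value": 1048576}
log.append(record)     # 1. force the log record to disk
store.apply(record)    # 2. only then update the metadata
log.replay(store)      # after a crash, replaying restores consistency
```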
High Availability
Application node failure:
- force-on-steal policy ensures that all changes visible to other nodes have been written to disk and will not be lost
- all potential inconsistencies are protected by a token and are logged
- file system manager runs log recovery on behalf of the failed node
- after successful log recovery, tokens held by the failed node are released
- actions taken: restore metadata being updated by the failed node to a consistent state, release resources held by the failed node
File system manager failure:
- a new node is appointed to take over
- the new file system manager restores volatile state by querying other nodes
- the new file system manager may have to undo or finish a partially completed configuration change (e.g., add/delete disk)
Storage node failure:
- Dual-attached disk: use alternate path (VSD)
- Single-attached disk: treat as a disk failure
High Availability
When a disk failure is detected
- The node that detects the failure informs the file system manager
- The file system manager updates the configuration data to mark the failed disk as "down" (quorum algorithm)
While a disk is down
- Read one / write all available copies (see the sketch below)
- "Missing update" bit set in the inode of modified files
When the failed disk comes back
- File system manager searches the inode file for missing update bits
- All data & metadata of files with missing updates are copied back to the recovering disk (one file at a time, normal locking protocol)
- Until missing update recovery is complete, data on the recovering disk is treated as write-only
If the disk does not recover
- Failed disk is deleted from the configuration or replaced by a new one
- New replicas are created on the replacement or on other disks
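The "read one / write all available" rule plus the missing-update bit can be sketched in a few lines (disk objects, field names, and the recovery helper are invented for illustration; the real mechanism works per block under the normal locking protocol):

```python
# Sketch of "read one / write all available" replication with a
# missing-update flag (disk objects and field names are invented).
class Disk:
    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.up = True
        self.blocks = {}

class ReplicatedFile:
    def __init__(self, disks):
        self.disks = disks            # the disks holding this file's replicas
        self.missing_update = False   # would live in the inode

    def write(self, block_no, data):
        wrote_everywhere = True
        for disk in self.disks:
            if disk.up:
                disk.blocks[block_no] = data
            else:
                wrote_everywhere = False
        if not wrote_everywhere:
            # A replica was skipped: remember that this file needs
            # to be copied back when the disk recovers.
            self.missing_update = True

    def read(self, block_no):
        for disk in self.disks:
            if disk.up and block_no in disk.blocks:
                return disk.blocks[block_no]
        raise IOError("no available replica")

    def recover(self, recovering_disk):
        # Copy current data back to the recovering disk, then clear the bit.
        if self.missing_update:
            for block_no in self.read_all_blocks():
                recovering_disk.blocks[block_no] = self.read(block_no)
            self.missing_update = False

    def read_all_blocks(self):
        return {b for d in self.disks if d.up for b in d.blocks}

d1, d2 = Disk(1), Disk(2)
f = ReplicatedFile([d1, d2])
d2.up = False                 # disk 2 goes down
f.write(0, b"new data")       # write-all-available sets the missing-update bit
d2.up = True                  # disk 2 comes back
f.recover(d2)                 # copy data back, clear the bit
assert d2.blocks[0] == b"new data" and not f.missing_update
```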
Cache Management
[Diagram: total cache divided into a general pool and several block-size pools; each pool keeps a clock list and sequential/random statistics, with optimal and total sizes tracked per pool; the general pool additionally supports merge and re-map]
- Balance dynamically according to usage patterns
- Avoid fragmentation - internal and external
- Unified steal (see the clock-list sketch below)
- Periodic re-balancing
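The "clock list" each pool keeps is the classic second-chance eviction scheme. A minimal sketch of one pool follows (names invented; the real pagepool additionally steals across pools and re-balances their sizes based on the sequential/random statistics above):

```python
# Second-chance ("clock") buffer list for one cache pool (names invented).
class ClockPool:
    def __init__(self, capacity):
        self.frames = [None] * capacity   # each frame: [block_id, ref_bit, data]
        self.hand = 0

    def access(self, block_id, load):
        # Hit: give the buffer its second chance and return the cached data.
        for frame in self.frames:
            if frame is not None and frame[0] == block_id:
                frame[1] = True
                return frame[2]
        # Miss: advance the clock hand past recently referenced buffers,
        # clearing their bits, until a victim frame is found.
        while self.frames[self.hand] is not None and self.frames[self.hand][1]:
            self.frames[self.hand][1] = False
            self.hand = (self.hand + 1) % len(self.frames)
        data = load(block_id)
        self.frames[self.hand] = [block_id, True, data]
        self.hand = (self.hand + 1) % len(self.frames)
        return data

pool = ClockPool(capacity=3)
for blk in ["a", "b", "c"]:
    pool.access(blk, lambda b: b.encode())
pool.access("d", lambda b: b.encode())   # all bits were set: "a" is evicted
pool.access("b", lambda b: b.encode())   # hit: "b" earns a second chance
pool.access("e", lambda b: b.encode())   # "b" survives, "c" is evicted instead
assert {f[0] for f in pool.frames} == {"d", "b", "e"}
```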
Epilogue
- Used on six of the ten most powerful supercomputers in the world, including the largest (ASCI White)
- Installed at several hundred customer sites, on clusters ranging from a few nodes with less than a TB of disk up to 512 nodes with 140 TB of disk in 2 file systems
- IP rich - ~20 filed patents
- State of the art
TeraSort
- world record of 17 minutes
- using a 488-node SP: 432 file system and 56 storage nodes (604e 332 MHz)
- total 6 TB of disk space
References
- GPFS home page: http://www.haifa.il.ibm.com/projects/storage/gpfs.html
- FAST 2002: http://www.usenix.org/events/fast/schmuck.html
- TeraSort: http://www.almaden.ibm.com/cs/gpfs-spsort.html
- Tiger Shark: http://www.research.ibm.com/journal/rd/422/haskin.html