Lustre, in a nutshell
References
[1] Lustre: A Scalable, High-Performance File System (2002)
https://www.cse.buffalo.edu/faculty/tkosar/cse710/papers/lustre-whitepaper.pdf
[2] Lustre: Building a File System for 1,000-node Clusters (2003)
https://www.kernel.org/doc/ols/2003/ols2003-pages-380-386.pdf
[3] Peta-Scale I/O With the Lustre File System (2008)
http://wiki.lustre.org/images/9/90/Peta-Scale_wp.pdf
[4] Architecting a High Performance Storage System (2014)
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/architecting-lustre-storage-white-paper.pdf
[5] Programming Locking Applications (2009)
http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf
[6] Understanding Lustre Filesystem Internals (2009)
http://wiki.lustre.org/images/d/da/Understanding_Lustre_Filesystem_Internals.pdf
→ A parallel file system for HPC: a high-bandwidth, low-latency, site-shared storage system.
→ At its core
- it separates metadata operations from data operations (the MDS is distinct from the OSSs),
- multiple storage servers can serve clients in parallel,
- multiple storage targets can be attached to each storage server.
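To see that split in practice, a client can ask for a file's layout (metadata, answered by the MDS) and then do I/O directly against the OST objects it names. A minimal sketch using liblustreapi; the program assumes it runs on a Lustre client with the lustre headers and library installed:

    /* Sketch: query a file's striping layout via liblustreapi.
     * The layout is metadata (from the MDS); the listed OST objects
     * are where the file's data actually lives.
     * Build (on a Lustre client) with: cc stripe_info.c -llustreapi */
    #include <stdio.h>
    #include <stdlib.h>
    #include <lustre/lustreapi.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file-on-lustre>\n", argv[0]);
            return EXIT_FAILURE;
        }

        /* Buffer large enough for the maximum number of stripe objects. */
        size_t lum_size = sizeof(struct lov_user_md) +
                          LOV_MAX_STRIPE_COUNT * sizeof(struct lov_user_ost_data);
        struct lov_user_md *lum = malloc(lum_size);
        if (lum == NULL)
            return EXIT_FAILURE;

        int rc = llapi_file_get_stripe(argv[1], lum);
        if (rc < 0) {
            fprintf(stderr, "llapi_file_get_stripe failed: %d\n", rc);
            return EXIT_FAILURE;
        }

        printf("stripe_size  = %u bytes\n", lum->lmm_stripe_size);
        printf("stripe_count = %u\n", lum->lmm_stripe_count);
        for (int i = 0; i < lum->lmm_stripe_count; i++)
            printf("  stripe %d -> OST index %u\n", i,
                   lum->lmm_objects[i].l_ost_idx);

        free(lum);
        return EXIT_SUCCESS;
    }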
→ More
- storage servers mainly means higher aggregate bandwidth/throughput (up to the limits set by the network and by the applications' I/O patterns)
- storage targets per storage server means more capacity
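A back-of-envelope sketch of that scaling; all numbers here are hypothetical, the point is only that bandwidth scales with OSSs until the network caps it, while capacity scales with OSTs:

    /* Back-of-envelope scaling model (hypothetical numbers). */
    #include <stdio.h>

    int main(void)
    {
        int    n_oss         = 8;    /* storage servers */
        int    n_ost_per_oss = 4;    /* storage targets per server */
        double oss_bw_gbs    = 3.0;  /* GB/s one OSS can deliver */
        double net_limit_gbs = 20.0; /* what the fabric can carry */
        double ost_cap_tb    = 16.0; /* TB per OST */

        /* Aggregate bandwidth grows with OSS count, up to the network limit. */
        double raw_bw   = n_oss * oss_bw_gbs;
        double agg_bw   = raw_bw < net_limit_gbs ? raw_bw : net_limit_gbs;
        /* Capacity grows with the total number of OSTs. */
        double capacity = n_oss * n_ost_per_oss * ost_cap_tb;

        printf("aggregate bandwidth: %.1f GB/s (network-capped from %.1f GB/s)\n",
               agg_bw, raw_bw);
        printf("capacity: %.0f TB\n", capacity);
        return 0;
    }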
→ All of these components are interconnected over the network: see Architecting a High Performance Storage System [4], slide 7.
In addition to that, it can provide
- no single point of failure (high availability) through failover and replicated transactional records (mainly for metadata servers)
- dedicated metadata servers for hot subdirectories
- NO data replication: reliability is expected to be ensured by the storage targets themselves (SAN, RAID, etc.)
From a client perspective
→ a completely transparent, POSIX-compliant file system (through its DLM [5]), as if it were local
→ a client can talk directly to one or multiple storage servers
→ to exploit the parallelism of the PFS, Lustre clients stripe files across multiple storage servers
- that is the single most important factor for achieving better performance with Lustre!
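Since striping matters that much, here is a minimal sketch of creating a file with an explicit stripe layout through liblustreapi; the mount point /mnt/lustre and the chosen values are hypothetical:

    /* Sketch: create a file striped across 4 OSTs with 1 MiB stripes.
     * Roughly equivalent shell command: lfs setstripe -c 4 -S 1M <file>
     * Build (on a Lustre client) with: cc mkstriped.c -llustreapi */
    #include <stdio.h>
    #include <stdlib.h>
    #include <lustre/lustreapi.h>

    int main(void)
    {
        const char *path = "/mnt/lustre/striped_file"; /* hypothetical mount */

        int rc = llapi_file_create(path,
                                   1 << 20, /* stripe_size: 1 MiB */
                                   -1,      /* stripe_offset: let the MDS pick */
                                   4,       /* stripe_count: spread over 4 OSTs */
                                   0);      /* stripe_pattern: default RAID0 */
        if (rc < 0) {
            fprintf(stderr, "llapi_file_create failed: %d\n", rc);
            return EXIT_FAILURE;
        }
        printf("created %s with a 4-way, 1 MiB stripe layout\n", path);
        return EXIT_SUCCESS;
    }

As a rule of thumb, wide striping helps large, parallel I/O; small files are usually better left on a single OST.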
Lustre internal components: can be grouped into three categories [6]
- Lustre client (VFS, llite, LOV, MDC)
- Lustre server
  - server fronts (OSS, OST, MDS, MGS)
  - server backends (LDLM, OBDFilter, Journal, ldiskfs, VFS)
- and LNET + RPC: orthogonal to all
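To make the LOV's job in that stack concrete: for the default RAID0 pattern it maps a logical file offset to (stripe object, offset within that object). A sketch of that arithmetic, with hypothetical layout values:

    /* Sketch: RAID0 stripe mapping as performed by the LOV layer.
     * Given a logical file offset, find which stripe object holds it
     * and at what offset inside that object. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long stripe_size  = 1 << 20; /* 1 MiB */
        unsigned int       stripe_count = 4;
        unsigned long long offset       = 5 * (1ULL << 20) + 123;

        unsigned long long stripe_no  = offset / stripe_size;     /* global stripe # */
        unsigned int       obj_index  = stripe_no % stripe_count; /* which object/OST */
        unsigned long long obj_offset = (stripe_no / stripe_count) * stripe_size
                                        + offset % stripe_size;   /* offset in object */

        printf("file offset %llu -> object %u, object offset %llu\n",
               offset, obj_index, obj_offset);
        return 0;
    }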
Architecting a High Performance Storage System [4]
→ A systematic approach to storage system design
1. Requirements analysis: top-down, creating a complete view of the system
2. Pipeline approach to evaluate components: in order, by following the path of a byte of data as it flows from a disk, through the intervening components, to the application
3. Iterative design process (S-C-E cycle)
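One way to read the pipeline evaluation in [4]: the sustainable end-to-end throughput is bounded by the slowest stage a byte traverses. A sketch of that reasoning with hypothetical per-stage numbers:

    /* Sketch of the "follow a byte" pipeline evaluation:
     * end-to-end throughput is bounded by the slowest stage.
     * Stage names and numbers are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        const char *stage[] = { "disks", "RAID controller", "OSS node",
                                "network fabric", "client" };
        double gbs[]        = { 6.0, 4.5, 5.0, 3.2, 8.0 };
        int n = sizeof(gbs) / sizeof(gbs[0]);

        /* Find the stage with the lowest sustainable throughput. */
        int bottleneck = 0;
        for (int i = 1; i < n; i++)
            if (gbs[i] < gbs[bottleneck])
                bottleneck = i;

        printf("end-to-end limit: %.1f GB/s, set by the %s\n",
               gbs[bottleneck], stage[bottleneck]);
        return 0;
    }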