Module 6 - Simple Storage Service (S3)
Advantages of S3
Amazon S3 is intentionally built with a minimal feature set that focuses on simplicity and robustness. Core capabilities include the following:
- Creating buckets – Create and name a bucket that stores your data. Buckets are the fundamental containers in Amazon S3 for data storage.
- Storing data – Store a virtually unlimited amount of data in a bucket. Upload as many objects as you like into an Amazon S3 bucket. Each object can contain up to 5 TB of data.
- Downloading data – Download your data at any time, or enable others to do the same.
- Permissions – Grant or deny access to others who want to upload data to or download data from your Amazon S3 bucket (a short sketch of these operations follows this list).
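As an illustration only, here is a minimal sketch of this workflow using the AWS SDK for Python (boto3); the bucket name my-example-bucket, the object keys, and the local file paths are placeholders, and the bucket is assumed to already exist.

```python
import boto3

s3 = boto3.client("s3")

# Storing data: upload a local file as an object (each object can be up to 5 TB).
s3.upload_file("photos/puppy.jpg", "my-example-bucket", "photos/puppy.jpg")

# Downloading data: retrieve the object back to a local file.
s3.download_file("my-example-bucket", "photos/puppy.jpg", "puppy-copy.jpg")

# Permissions: hand out a time-limited presigned URL so someone else can download
# the object without needing their own AWS credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "photos/puppy.jpg"},
    ExpiresIn=3600,  # the link stays valid for one hour
)
print(url)
```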
Amazon S3 concepts
Buckets
- To upload your data (photos, videos, documents, etc.) to Amazon S3, you must first create an S3 bucket in one of the AWS Regions.
- A bucket is Region-specific; it is created in the AWS Region that you choose.
- A bucket is a container for objects stored in Amazon S3.
- Every object is contained in a bucket.
- By default, you can create up to 100 buckets in each of your AWS accounts. If you need more buckets,
you can increase your account bucket limit to a maximum of 1,000 buckets by submitting a service
limit increase.
- For example, if the object named photos/puppy.jpg is stored in the john bucket in the US West
(Oregon) Region, then it is addressable using the URL
https://john.s3.us-west-2.amazonaws.com/photos/puppy.jpg
Region
- You can choose the geographical AWS Region where Amazon S3 will store the buckets that you
create.
- You might choose a Region to optimize latency, minimize costs, or address regulatory requirements.
- Objects stored in a Region never leave the Region unless you explicitly transfer them to another Region.
- For example, objects stored in the Europe (Ireland) Region never leave it. (A bucket-creation sketch follows.)
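For illustration, a minimal boto3 sketch of creating a bucket in a chosen Region; the bucket name and the eu-west-1 Region are arbitrary placeholders.

```python
import boto3

# Outside us-east-1, the target Region must be passed as a LocationConstraint.
s3 = boto3.client("s3", region_name="eu-west-1")
s3.create_bucket(
    Bucket="my-example-bucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```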
Object
- Amazon S3 is a simple key-value store designed to store as many objects as you want.
- You store these objects in one or more buckets.
- S3 provides object-level storage; that is, it stores each file as a whole object and does not divide it into blocks.
- An object can range in size from 0 bytes to 5 TB.
- When you upload an object to a bucket, Amazon S3 automatically stores redundant copies of it across multiple Availability Zones in the same Region.
- An object consists of the following: a key (the object name), a version ID, a value (the data itself), metadata, subresources, and access control information.
- When you upload an object using a key name that already exists in the bucket, the new object replaces the existing object in its entirety.
Versioning
- You can use versioning to keep multiple versions of an object in one bucket.
- For example, you could store my-image.jpg (version 111111) and my-image.jpg (version 222222) in a
single bucket.
- Versioning protects you from the consequences of unintended overwrites and deletions.
- You must explicitly enable versioning on your bucket. By default, versioning is disabled.
- Regardless of whether you have enabled versioning, each object in your bucket has a version ID.
- If you have not enabled versioning, Amazon S3 sets the value of the version ID to null. If you have
enabled versioning, Amazon S3 assigns a unique version ID value for the object.
- When you enable versioning on a bucket, objects already stored in the bucket are unchanged. The
version IDs (null), contents, and permissions remain the same.
- Enabling and suspending versioning is done at the bucket level.
- When you enable versioning for a bucket, all objects added to it
will have a unique version ID. Unique version IDs are randomly
generated.
- This functionality prevents you from accidentally overwriting or
deleting objects and affords you the opportunity to retrieve a
previous version of an object.
- When you DELETE an object in a versioning-enabled bucket, all versions remain in the bucket and Amazon S3 inserts a delete marker.
- The delete marker becomes the current version of the object. You can, however, GET a noncurrent version of an object by specifying its version ID. (A versioning sketch follows.)
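A minimal boto3 sketch of this behaviour, assuming an existing bucket; the bucket name and object contents are placeholders.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# Enable versioning at the bucket level (it is disabled by default).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload the same key twice; each upload now receives a unique version ID.
v1 = s3.put_object(Bucket=bucket, Key="my-image.jpg", Body=b"first upload")
v2 = s3.put_object(Bucket=bucket, Key="my-image.jpg", Body=b"second upload")
print(v1["VersionId"], v2["VersionId"])

# A plain DELETE only inserts a delete marker; both versions remain in the bucket.
s3.delete_object(Bucket=bucket, Key="my-image.jpg")

# A noncurrent version can still be retrieved by specifying its version ID.
old = s3.get_object(Bucket=bucket, Key="my-image.jpg", VersionId=v1["VersionId"])
print(old["Body"].read())
```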
Data Protection
- Data protection refers to protecting data while in transit (as it travels to and from Amazon S3) and at rest (while it is stored on disks in Amazon S3 data centers).
- Server-side encryption – Amazon S3 encrypts your objects before saving them on disks in AWS data centers and then decrypts the objects when you download them.
- Client-side encryption – You encrypt your data on the client side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools. (A server-side encryption sketch follows.)
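For illustration, a boto3 sketch of requesting server-side encryption at upload time; the bucket, key names, and KMS key alias are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Server-side encryption with Amazon S3 managed keys (SSE-S3).
s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/confidential.csv",
    Body=b"sensitive data",
    ServerSideEncryption="AES256",
)

# Server-side encryption with an AWS KMS key (SSE-KMS); the key alias is a placeholder.
s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/confidential-kms.csv",
    Body=b"sensitive data",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-kms-key",
)
```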
Static Website Hosting
- You can use an Amazon S3 bucket to host a static website: you enable website hosting on the bucket, upload your HTML, CSS, JavaScript, and image files, and S3 serves them over HTTP (a configuration sketch follows).
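A minimal boto3 sketch of enabling website hosting on an existing bucket; the bucket and document names are placeholders, and the bucket would additionally need public read access (or a CloudFront distribution in front of it) before visitors could reach the content.

```python
import boto3

s3 = boto3.client("s3")

# Turn an existing bucket into a static website endpoint.
s3.put_bucket_website(
    Bucket="my-website-bucket",  # placeholder name
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},     # served for 4xx errors
    },
)
```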
Storage classes for frequently accessed objects
- For performance-sensitive use cases (those that require millisecond access times) and frequently accessed data, Amazon S3 provides the following storage class:
- S3 Standard – The default storage class. If you don't specify a storage class when you upload an object, Amazon S3 assigns the S3 Standard storage class.
Storage classes for infrequently accessed objects
- The S3 Standard-IA and S3 One Zone-IA storage classes are designed for long-lived but infrequently accessed data (IA stands for infrequent access).
- S3 Standard-IA and S3 One Zone-IA objects are available for millisecond access (similar to the S3 Standard storage class).
- Amazon S3 charges a retrieval fee for these objects, so they are most suitable for infrequently accessed data. (An upload sketch follows this list.)
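As a sketch, choosing a storage class at upload time with boto3; the bucket, key, and file names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Omit StorageClass to get S3 Standard; pass it explicitly for other classes.
with open("monthly-report.pdf", "rb") as body:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="reports/monthly-report.pdf",
        Body=body,
        StorageClass="STANDARD_IA",  # or "ONEZONE_IA" for single-AZ, lower-cost storage
    )
```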
Storage classes for archiving objects
The S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive storage
classes are designed for low-cost data archiving. These storage classes offer the same durability and
resiliency as the S3 Standard and S3 Standard-IA storage classes.
- S3 Glacier Instant Retrieval – Use for archiving data that is rarely accessed but requires retrieval in milliseconds. S3 Glacier Instant Retrieval has higher data access costs than S3 Standard-IA.
- S3 Glacier Flexible Retrieval – Use for archives where portions of the data might need to be retrieved in minutes. Data stored in the S3 Glacier Flexible Retrieval storage class has a minimum storage duration period of 90 days and can be accessed in as little as 1-5 minutes by using an expedited retrieval. The retrieval time is flexible, and you can request free bulk retrievals that complete within 5-12 hours.
- S3 Glacier Deep Archive – Use for archiving data that rarely needs to be accessed. Data stored in the S3 Glacier Deep Archive storage class has a minimum storage duration period of 180 days and a default retrieval time of 12 hours. (A restore sketch follows this list.)
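For illustration, a boto3 sketch of restoring an object from one of the asynchronous archive classes (Glacier Flexible Retrieval or Deep Archive) so it can be downloaded; the bucket and key names are placeholders, and objects in Glacier Instant Retrieval need no restore step.

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to stage a temporary, downloadable copy of the archived object.
s3.restore_object(
    Bucket="my-archive-bucket",
    Key="backups/2020.tar",
    RestoreRequest={
        "Days": 7,                                    # keep the restored copy for 7 days
        "GlacierJobParameters": {"Tier": "Standard"}, # or "Expedited" / "Bulk"
    },
)

# Poll the restore status; the Restore field reports ongoing-request="true"/"false".
head = s3.head_object(Bucket="my-archive-bucket", Key="backups/2020.tar")
print(head.get("Restore"))
```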
AWS S3 Glacier
- Amazon S3 Glacier is AWS's low-cost, durable storage for data archiving and long-term backup; it underlies the three Glacier storage classes described above.
S3 Intelligent-Tiering
- S3 Intelligent-Tiering automatically moves objects between access tiers based on how often they are accessed, in order to optimize storage costs.
- Archive Access – S3 Intelligent-Tiering provides you with the option to activate the Archive Access tier for data that can be accessed asynchronously. After activation, the Archive Access tier automatically archives objects that have not been accessed for a minimum of 90 consecutive days.
- Deep Archive Access – S3 Intelligent-Tiering provides you with the option to activate the Deep Archive Access tier for data that can be accessed asynchronously. After activation, the Deep Archive Access tier automatically archives objects that have not been accessed for a minimum of 180 consecutive days. (A configuration sketch follows.)
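A boto3 sketch of activating both optional archive tiers with a bucket-level Intelligent-Tiering configuration; the bucket name and configuration ID are placeholders, and 90/180 days are the minimum allowed values.

```python
import boto3

s3 = boto3.client("s3")

# Opt the bucket's objects into the optional archive tiers of S3 Intelligent-Tiering.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-example-bucket",
    Id="archive-config",
    IntelligentTieringConfiguration={
        "Id": "archive-config",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```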
Object Life Cycle Management
- An S3 Lifecycle configuration is a set of rules that define actions Amazon S3 applies to a group of objects.
- Transition actions move objects to another storage class after a defined period (for example, to S3 Standard-IA after 30 days).
- Expiration actions delete (expire) objects on your behalf after a defined period. (A rule sketch follows.)
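For illustration, a boto3 sketch of a single lifecycle rule; the bucket name, prefix, and day counts are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# One rule: transition logs/ objects to Standard-IA after 30 days, to Glacier
# Flexible Retrieval after 90 days, and delete them after one year.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```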
Replication
You can replicate objects between different AWS Regions or within the same AWS Region. Replication requires versioning to be enabled on both the source and destination buckets.
1. Cross-Region Replication (CRR) is used to copy objects across Amazon S3 buckets in different AWS Regions.
2. Same-Region Replication (SRR) is used to copy objects across Amazon S3 buckets in the same AWS Region. (A configuration sketch follows.)
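A boto3 sketch of a replication rule; the bucket names, account ID, and IAM role ARN are placeholders, and the role must grant Amazon S3 permission to replicate on your behalf.

```python
import boto3

s3 = boto3.client("s3")

# Replicate every object in the source bucket to the destination bucket.
# For CRR the destination bucket lives in a different Region; for SRR, the same one.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = apply the rule to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }
        ],
    },
)
```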
Bucket Policies
- A bucket policy is a resource-based policy, written in JSON, that you attach to a bucket to grant or deny access to the bucket and the objects in it. Similar policies can be attached to S3 access points.
- Example: The following access point policy grants IAM user Jane in account 123456789012 permission to GET and PUT objects with the prefix Jane/ through the access point my-access-point in account 123456789012 (sketched below).
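A sketch of that policy and of attaching it with boto3; the us-west-2 Region in the access point ARN is an assumption, since the example only names the account and access point.

```python
import json
import boto3

s3control = boto3.client("s3control")

# Allow IAM user Jane to GET and PUT objects under the Jane/ prefix
# through the access point my-access-point.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/Jane"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:us-west-2:123456789012:accesspoint/my-access-point/object/Jane/*",
        }
    ],
}

s3control.put_access_point_policy(
    AccountId="123456789012",
    Name="my-access-point",
    Policy=json.dumps(policy),
)
```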
Amazon S3 Transfer Acceleration
- Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket by routing transfers through Amazon CloudFront edge locations (a sketch follows).
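For illustration, a boto3 sketch of enabling Transfer Acceleration on a bucket and then uploading through the accelerate endpoint; the bucket and file names are placeholders.

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable Transfer Acceleration on the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Create a client that routes transfers through the accelerate endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "my-example-bucket", "uploads/big-file.bin")
```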
Deleting objects and buckets
- You can delete objects individually, or you can empty a bucket, which deletes all the objects in the bucket without deleting the bucket itself.
- You can also delete a bucket along with all the objects it contains (see the sketch after this list).
- If you want to keep using the same bucket, don't delete it; empty it and keep it instead.
- After you delete a bucket, its name becomes available for reuse, but the name might not be available for you to reuse for various reasons. For example, it might take some time before the name can be reused, and some other account could create a bucket with that name before you do.
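A minimal boto3 sketch of emptying and then deleting a bucket; the bucket name is a placeholder, and a versioning-enabled bucket would also need its object versions and delete markers removed first.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# Empty the bucket: list every object (paginated) and delete in batches of up to 1,000.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if objects:
        s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})

# delete_bucket fails on a non-empty bucket, so it must be emptied first.
s3.delete_bucket(Bucket=bucket)
```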
CloudFront
Introduction to CloudFront distributions and CloudFront edge locations
- Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content, such as .html, .css, .js, and image files, to your users.
- CloudFront delivers your content through a worldwide network of data centers called edge locations.
- When a user requests content that you're serving with CloudFront, the user is routed to the edge location that provides the lowest latency (time delay), so the content is delivered with the best possible performance.
- If the content is already in the edge location with the lowest latency, CloudFront delivers it immediately.
- If the content is not in that edge location, CloudFront retrieves it from an origin that you've defined, such as an Amazon S3 bucket, an AWS Elemental MediaPackage channel, or an HTTP server (for example, a web server).
- You also get increased reliability and availability because copies of your files (also known as objects) are now held (or cached) in multiple edge locations around the world. (A distribution-creation sketch follows.)
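As a rough sketch (not a production setup), creating a CloudFront distribution with boto3 that uses an S3 bucket as its origin; the bucket domain name is a placeholder, and a real deployment would normally also configure origin access control, TLS certificates, and a cache policy.

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

origin_domain = "my-example-bucket.s3.us-west-2.amazonaws.com"  # placeholder origin

response = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": str(time.time()),  # unique token that makes the request idempotent
        "Comment": "Distribution in front of an S3 origin",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [
                {
                    "Id": "s3-origin",
                    "DomainName": origin_domain,
                    "S3OriginConfig": {"OriginAccessIdentity": ""},
                }
            ],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            # Minimal legacy cache settings, kept simple for the sketch.
            "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
            "MinTTL": 0,
        },
    }
)
print(response["Distribution"]["DomainName"])  # the *.cloudfront.net domain that serves the content
```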
AWS Data Migration Services
AWS Storage Gateway
- Hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage
- Provides a standard set of storage protocols such as iSCSI, SMB, and NFS, which allow you to use AWS
storage without rewriting your existing applications
- Provides low-latency performance by caching frequently accessed data on premises, while storing
data securely and durably
- Amazon S3 File Gateway – Enables you to store and retrieve objects in Amazon Simple Storage Service (S3) using file protocols such as Network File System (NFS) and Server Message Block (SMB). Objects written through S3 File Gateway can be directly accessed in S3.
- Amazon FSx File Gateway – Enables you to store and retrieve files in Amazon FSx for Windows File Server using the SMB protocol. Files written through Amazon FSx File Gateway are directly accessible in Amazon FSx for Windows File Server.
- Tape Gateway – Provides your backup application with an iSCSI virtual tape library (VTL) interface, consisting of a virtual media changer, virtual tape drives, and virtual tapes. Virtual tapes are stored in Amazon S3 and can be archived to Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.
AWS DataSync
- An online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services such as Amazon S3, Amazon EFS, and Amazon FSx.
AWS Snow Family
- Move petabytes of data to and from AWS, or process data at the edge.
- AWS Snow Family devices are physical devices.
AWS Snowcone
- A small, rugged, and secure device offering edge computing, data storage, and data transfer on the go, in austere environments with little or no connectivity.
AWS Snowball
- A rugged, suitcase-sized device for transferring terabytes to petabytes of data into and out of AWS; Snowball Edge variants also provide on-board compute for edge workloads.
AWS Snowmobile
- An exabyte-scale data transfer service that moves extremely large amounts of data to AWS in a 45-foot ruggedized shipping container hauled by a truck; a single Snowmobile can transfer up to 100 PB.
Review Questions
1. What are the bucket naming guidelines?
2. What is a bucket and what is an object?
3. What are the minimum and maximum sizes of an object?
4. What is the maximum size of an S3 bucket?
5. What is the maximum number of buckets per account (soft and hard limits)?
6. What is multipart upload?
7. What is versioning in S3?
8. What are the different storage classes and their uses?
9. What is lifecycle management?
10. What are cross-Region and same-Region replication?
11. What are the different Storage Gateway options available?
12. Define AWS DataSync.
13. What are the different types of data migration services available?