Cloud Infrastructure Services (CIS) Course Outcome (CO) 2: (Session 7)
Glacier – Content Delivery Platforms
AWS Glacier:
Amazon Glacier is a storage service optimized for archival, infrequently used data, or “cold
data.”
Glacier is an extremely low-cost storage service that provides durable storage with security
features for data archiving and backup.
With Amazon Glacier, one can reliably store their data for as little as $0.007 per gigabyte per month.
It offloads the administrative burdens of operating and scaling storage to AWS.
There is no need to worry about capacity planning, hardware provisioning, data replication,
hardware failure detection and repair, or time-consuming hardware migrations.
Amazon Glacier is designed for use with other Amazon Web Services.
Data can be moved between Amazon Glacier and Amazon S3 by using S3 data
lifecycle policies.
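As a rough illustration of such a lifecycle policy, the sketch below builds the JSON shape that S3 accepts for a rule that transitions objects to the GLACIER storage class. The rule ID, the `logs/` prefix, and the 30-day threshold are hypothetical values chosen for the example, and the helper name `glacier_transition_rule` is not part of any AWS library.

```python
import json


def glacier_transition_rule(rule_id: str, prefix: str, days: int) -> dict:
    """Return an S3 lifecycle configuration with one Glacier transition rule."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,                  # hypothetical rule name
                "Filter": {"Prefix": prefix},   # objects the rule applies to
                "Status": "Enabled",
                # Move matching objects to Glacier once they are `days` old.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_transition_rule("archive-old-logs", "logs/", 30)
print(json.dumps(config, indent=2))
```

A document of this shape would then be attached to a bucket through the S3 console, CLI, or an SDK.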
Glacier can store virtually any kind of data in any format.
All data is encrypted on the server side, with Glacier handling key management and key protection.
It uses AES-256, one of the strongest block ciphers available.
Glacier allows interaction through the AWS Management Console, the Command Line Interface (CLI),
SDKs, or REST-based APIs.
The Management Console is used only to create and delete vaults. All other operations, such as
uploading archives, creating jobs, and retrieving data, require the CLI, an SDK, or the REST-based APIs.
Use cases
Digital media archives
Data that must be retained for regulatory compliance
Financial and healthcare records
Raw genomic sequence data
Long-term database backups
Amazon Glacier Data Model
Amazon Glacier data model core concepts include vaults, archives, jobs, and
notification-configuration resources.
Vault –
o A vault is a container for storing archives.
o Each vault resource has a unique address, which comprises the region in which the vault was
created and the vault name, unique within that region and account, e.g.
https://glacier.us-west-2.amazonaws.com/111122223333/vaults/examplevault
o A vault can store an unlimited number of archives.
o Glacier supports various vault operations, which are region specific.
o An AWS account can create up to 1,000 vaults per region.
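The way a vault address is composed from the region, account ID, and vault name can be sketched as below. The function name `vault_address` is hypothetical. Using the `amazonaws.com.cn` domain for the China regions is an added detail that the text above does not state.

```python
def vault_address(region: str, account_id: str, vault_name: str) -> str:
    """Compose the address of a vault from its region, account, and name."""
    # Assumption: China regions are served from the amazonaws.com.cn domain.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"https://glacier.{region}.{domain}/{account_id}/vaults/{vault_name}"


# Example account ID and vault name as used in the vault description above.
print(vault_address("us-west-2", "111122223333", "examplevault"))
```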
Archive –
o An archive can be any data, such as a photo, video, or document, and is the base unit
of storage in Glacier.
o Each archive has a unique ID and an optional description, which can be specified only
during the upload of the archive.
o Glacier assigns the archive an ID that is unique within the AWS region in which it is
stored.
o An archive can be uploaded in a single request. For large archives, Glacier
provides a multipart upload API that enables uploading an archive in parts.
Jobs –
o A job is required to retrieve an archive or a vault inventory list.
o Jobs are asynchronous.
o Most jobs take about four hours to complete.
o A job is first initiated, and its output is downloaded after the job completes.
o Vault inventory jobs need the vault name.
o Data retrieval jobs need both the vault name and the archive ID, with an optional description.
o A vault can have multiple jobs in progress at any point in time; each is identified by a job ID,
assigned when the job is created, for tracking.
o Glacier maintains job information such as job type, description, creation date, completion date, and
job status, all of which can be queried.
o After the job completes, the job output can be downloaded in full or partially by specifying a byte
range.
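The inputs of the two job kinds described above can be sketched as the JSON bodies sent when a job is initiated. The field names `Type`, `ArchiveId`, and `Description` follow the Glacier REST API as far as I know; the vault name is not part of the body because it travels in the request path. The helper names and the archive ID are hypothetical.

```python
import json
from typing import Optional


def archive_retrieval_job(archive_id: str, description: Optional[str] = None) -> dict:
    """Body for a job that retrieves one archive."""
    if not archive_id:
        raise ValueError("an archive retrieval job needs an archive ID")
    body = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if description is not None:
        body["Description"] = description  # optional, used for tracking
    return body


def inventory_retrieval_job(description: Optional[str] = None) -> dict:
    """Body for a job that retrieves the vault inventory list."""
    body = {"Type": "inventory-retrieval"}
    if description is not None:
        body["Description"] = description
    return body


# "EXAMPLE-ARCHIVE-ID" is a placeholder, not a real archive ID.
print(json.dumps(archive_retrieval_job("EXAMPLE-ARCHIVE-ID", "restore Q1 backup")))
print(json.dumps(inventory_retrieval_job()))
```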
Notification Configuration –
o Because jobs are asynchronous, Glacier supports a notification mechanism that publishes to an SNS
topic when a job completes.
o The SNS topic for notification can be specified either with each individual job request
or with the vault.
o Glacier stores the notification configuration as a JSON document.
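A minimal sketch of what such a JSON notification document might look like is given below. The `SNSTopic` and `Events` fields and the two event names follow the Glacier API as I understand it; the topic ARN is a made-up placeholder and `notification_config` is a hypothetical helper.

```python
import json
from typing import Iterable

# Job-completion events a vault can publish.
ALLOWED_EVENTS = {"ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"}


def notification_config(sns_topic_arn: str, events: Iterable[str]) -> dict:
    """Build a vault notification configuration document."""
    selected = set(events)
    unknown = selected - ALLOWED_EVENTS
    if unknown:
        raise ValueError(f"unsupported events: {sorted(unknown)}")
    return {"SNSTopic": sns_topic_arn, "Events": sorted(selected)}


# Placeholder ARN for illustration only.
doc = notification_config(
    "arn:aws:sns:us-west-2:111122223333:glacier-jobs",
    ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
)
print(json.dumps(doc, indent=2))
```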
Glacier Data Retrieval Options
•Glacier provides three options for retrieving data with varying access times and cost: Expedited,
Standard, and Bulk retrievals.
•Standard retrievals –
• Standard retrievals allow access to any of the archives within several hours.
• Standard retrievals typically complete within 3-5 hours.
•Bulk retrievals –
• Bulk retrievals are Glacier’s lowest-cost retrieval option, enabling inexpensive retrieval of
large amounts of data, even petabytes, in a day.
• Bulk retrievals typically complete within 5-12 hours.
•Expedited retrievals –
• Expedited retrievals allow quick access to the data when occasional urgent requests for a
subset of archives are required.
• For all but the largest archives (250 MB+), data accessed using Expedited retrievals is
typically made available within 1-5 minutes.
• There are two types: On-Demand and Provisioned.
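The trade-off between the three options can be illustrated with a small selection helper: given a deadline, pick the cheapest option whose typical worst-case completion time still fits. The upper bounds come from the figures above. The ordering of Standard as cheaper than Expedited is an assumption here, since the text only states that Bulk is the lowest-cost option. `cheapest_tier` is a hypothetical name.

```python
from typing import Optional

# (option, typical upper-bound completion time in hours), cheapest first.
TIERS = [
    ("Bulk", 12.0),            # 5-12 hours
    ("Standard", 5.0),         # 3-5 hours
    ("Expedited", 5.0 / 60.0), # 1-5 minutes
]


def cheapest_tier(deadline_hours: float) -> Optional[str]:
    """Return the cheapest option that typically finishes by the deadline."""
    for name, worst_case_hours in TIERS:
        if worst_case_hours <= deadline_hours:
            return name
    return None  # no option typically completes this quickly


for deadline in (24, 6, 1):
    print(deadline, "hours ->", cheapest_tier(deadline))
```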
Glacier Supported Operations
•Vault Operations
•Archive Operations
•Jobs
Vault Operations
Glacier provides operations to create and delete vaults.
A vault can be deleted only if there are no archives in the vault as of the last computed inventory and there have been no writes to the vault
since the last inventory (as the inventory is prepared periodically)
Vault Inventory
A vault inventory retrieves the list of archives in a vault, with information such as the archive ID, creation date, and size of each
archive.
The inventory for each vault is prepared periodically, approximately once every 24 hours, starting on the day the first archive is uploaded to the vault.
Glacier returns the last inventory it generated, which is a point-in-time snapshot and not real-time data.
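To make the snapshot nature of the inventory concrete, the sketch below summarizes an inventory document. The field names (`InventoryDate`, `ArchiveList`, `ArchiveId`, `CreationDate`, `Size`) follow the JSON inventory format as I understand it; the sample data and the `summarize_inventory` helper are invented for illustration.

```python
import json

# Hand-written sample in the shape of a vault inventory; not real data.
sample_inventory = json.dumps({
    "VaultARN": "arn:aws:glacier:us-west-2:111122223333:vaults/examplevault",
    "InventoryDate": "2024-01-15T08:00:00Z",
    "ArchiveList": [
        {"ArchiveId": "EXAMPLE-ID-1", "ArchiveDescription": "photos",
         "CreationDate": "2024-01-02T10:00:00Z", "Size": 1048576},
        {"ArchiveId": "EXAMPLE-ID-2", "ArchiveDescription": "db backup",
         "CreationDate": "2024-01-10T22:30:00Z", "Size": 5242880},
    ],
})


def summarize_inventory(inventory_json: str) -> dict:
    """Summarize an inventory; the result is only valid as of `as_of`."""
    inventory = json.loads(inventory_json)
    archives = inventory.get("ArchiveList", [])
    return {
        "as_of": inventory.get("InventoryDate"),
        "archive_count": len(archives),
        "total_bytes": sum(a["Size"] for a in archives),
    }


summary = summarize_inventory(sample_inventory)
print(summary)
```

Archives uploaded after the `as_of` timestamp would not appear until the next inventory is generated.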
Vault Metadata or Description can also be obtained for a specific vault or for all vaults in a region, which provides information such as
creation date,
number of archives in the vault,
total size in bytes used by all the archives in the vault,
and the date the vault inventory was generated
Glacier also provides operations to set, retrieve, and delete a notification configuration on the vault. Notifications can be used to identify
vault events.
Archive Operations
Glacier provides operations to upload, download and delete archives.
Uploading an Archive -
An archive can be uploaded in a single operation (from 1 byte up to 4 GB in size) or in parts, referred to as a multipart
upload (up to about 40 TB)
Multipart upload helps to
improve the upload experience for larger archives
upload archives in parts, independently, in parallel, and in any order
recover faster, by re-uploading only the part that failed and not the entire archive
upload archives without knowing the total size in advance
upload archives from 1 byte to about 40,000 GB (10,000 parts * 4 GB) in size
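The 10,000 parts * 4 GB limit can be worked through with a short calculation. The sketch assumes, as Glacier's API documentation describes but the text above does not state, that a part size must be 1 MiB multiplied by a power of two, up to 4 GiB. The function names are hypothetical.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB


def smallest_part_size(archive_size: int) -> int:
    """Smallest allowed part size that fits the archive in 10,000 parts."""
    if archive_size < 1:
        raise ValueError("archive must be at least 1 byte")
    part_size = MIB
    # Double the part size until 10,000 parts are enough to hold the archive.
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2
        if part_size > MAX_PART_SIZE:
            raise ValueError("archive exceeds 10,000 parts of 4 GiB")
    return part_size


def part_count(archive_size: int, part_size: int) -> int:
    """Number of parts needed; the last part may be smaller than the rest."""
    return -(-archive_size // part_size)  # ceiling division


size = 100 * 1024 * MIB  # a 100 GiB archive
chosen = smallest_part_size(size)
print(chosen // MIB, "MiB parts,", part_count(size, chosen), "parts")
```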
Glacier returns a response that includes an archive ID which is unique in the region in which the archive is stored
Glacier does not support any additional metadata information apart from an optional description. Any additional
metadata information required should be maintained at client side
Downloading an Archive –
Downloading an archive is an asynchronous operation and is a two-step process
Initiate an archive retrieval job
o When a job is initiated, a job ID is returned as part of the response
o The job is executed asynchronously, and the output can be downloaded after the job completes
o A job can be initiated to download the entire archive or a portion of the archive
After the job completes, download the bytes
o The archive can be downloaded in full, or a specific byte range can be requested to download only a portion of the output
o Downloading the archive in chunks helps in the event of a download failure, as only the failed chunk needs to be
downloaded again
o Job completion can be detected either by checking the job status explicitly (not recommended) or through a
completion notification
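Downloading the job output in chunks comes down to computing a list of byte ranges and requesting each one separately, so that a failed chunk can be retried on its own. The sketch below only computes the range strings in the standard HTTP `bytes=start-end` form. The function name and the sizes are illustrative.

```python
from typing import List


def byte_ranges(total_size: int, chunk_size: int) -> List[str]:
    """Split `total_size` bytes into inclusive HTTP-style byte ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    ranges = []
    for start in range(0, total_size, chunk_size):
        # The end offset is inclusive and must not run past the last byte.
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(10, 4))
```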
Deleting an Archive –
Archives can be deleted from a vault only one at a time
This operation is idempotent: deleting an already-deleted archive does not result in an error
AWS applies a pro-rated charge for items that are deleted within 90 days of upload, as Glacier is meant for long-term
storage
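A rough worked example of the pro-rated charge is sketched below. It rests on three assumptions that the text does not spell out: the charge equals the storage cost for the days remaining until day 90, a month is counted as 30 days, and the price is the $0.007 per gigabyte per month quoted earlier. Actual billing rules may differ.

```python
MIN_STORAGE_DAYS = 90
PRICE_PER_GB_MONTH = 0.007  # figure quoted earlier in these notes
DAYS_PER_MONTH = 30         # simplifying assumption


def early_deletion_fee(size_gb: float, days_stored: int) -> float:
    """Estimated pro-rated charge for deleting an archive early."""
    remaining_days = max(0, MIN_STORAGE_DAYS - days_stored)
    return size_gb * PRICE_PER_GB_MONTH * remaining_days / DAYS_PER_MONTH


# 1,000 GB deleted after 30 days: about two months of storage is charged.
print(round(early_deletion_fee(1000, 30), 2))
```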
Updating an Archive –
An existing archive cannot be updated; it must be deleted and re-uploaded.
Glacier returns a checksum in the response, which can be used to verify that the download contains no errors.
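The checksum Glacier uses is, according to its API documentation, a SHA-256 tree hash: the data is split into 1 MiB chunks, each chunk is hashed, and adjacent hashes are combined pairwise until one remains. The text above only says "checksum", so treat the sketch below as an illustration of that scheme rather than a drop-in verifier.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data: bytes) -> str:
    """Compute a SHA-256 tree hash over 1 MiB chunks, as a hex string."""
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Hash the concatenation of two adjacent digests.
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # An unpaired digest is carried up unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0].hex()


downloaded = b"example archive bytes"
print(tree_hash(downloaded))
```

Comparing the locally computed value with the checksum returned by Glacier indicates whether the downloaded bytes arrived intact.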
Retrieving a range of an archive, rather than the whole archive, can help to:
• Control bandwidth costs
• Manage your data downloads
• Retrieve a targeted part of a large archive
Control bandwidth costs
• Glacier allows retrieval of up to 5 percent of the average monthly storage for free each month.
• Spreading out the data requested helps to stay within the monthly free allowance of 5 percent.
• If the amount of data retrieved exceeds the free allowance, scheduling range retrievals
reduces the peak retrieval rate, which determines the retrieval fees.
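The 5 percent allowance can be worked through numerically. The sketch below computes the monthly free allowance for a given average storage and spreads a retrieval evenly over a number of days. The even split and the 30-day month are simplifying assumptions, and the function names are hypothetical.

```python
from typing import List

FREE_PERCENT = 5  # free retrieval allowance stated above


def free_allowance_gb(avg_monthly_storage_gb: float) -> float:
    """Data that can be retrieved for free in a month, in GB."""
    return avg_monthly_storage_gb * FREE_PERCENT / 100


def even_schedule(total_gb: float, days: int) -> List[float]:
    """Spread a retrieval evenly over `days` to keep the peak rate low."""
    if days <= 0:
        raise ValueError("days must be positive")
    return [total_gb / days] * days


allowance = free_allowance_gb(12_000)   # 12,000 GB stored on average
plan = even_schedule(allowance, 30)     # assume a 30-day month
print(allowance, "GB free per month,", plan[0], "GB per day")
```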
Manage your data downloads
• Glacier allows retrieved data to be downloaded for 24 hours after the retrieval request completes.
• Retrieving only portions of the archive allows the schedule of downloads to be managed within the
24-hour download window.
credentials.
S3 Glacier provides a console to create and delete vaults.
Archive and job operations require code written against either the REST API or
the AWS SDK wrapper libraries.
Regions and Endpoints
A vault can be created in a specific AWS Region.
Glacier requests are sent to an endpoint which is specific to an AWS region.
The regions and their codes are listed in the table below.
Region Name                   Code
US East (Ohio)                us-east-2
US East (N. Virginia)         us-east-1
US West (N. California)       us-west-1
US West (Oregon)              us-west-2
Africa (Cape Town)            af-south-1
Asia Pacific (Hong Kong)      ap-east-1
Asia Pacific (Mumbai)         ap-south-1
Asia Pacific (Osaka)          ap-northeast-3
Asia Pacific (Seoul)          ap-northeast-2
Asia Pacific (Singapore)      ap-southeast-1
Asia Pacific (Sydney)         ap-southeast-2
Asia Pacific (Tokyo)          ap-northeast-1
Canada (Central)              ca-central-1
China (Beijing)               cn-north-1
China (Ningxia)               cn-northwest-1
Europe (Frankfurt)            eu-central-1
Europe (Ireland)              eu-west-1
Europe (London)               eu-west-2
Europe (Milan)                eu-south-1
Europe (Paris)                eu-west-3
Europe (Stockholm)            eu-north-1
Middle East (Bahrain)         me-south-1
South America (São Paulo)     sa-east-1
Performance
Because Glacier is a low-cost storage service designed for archival rather than fast access, Amazon Glacier retrieval
jobs typically complete in 3 to 5 hours.
The upload experience for larger archives can be improved by using multipart upload.
Durability and Availability
Amazon Glacier is designed to provide average annual durability of 99.999999999 percent (11 nines).
or down as needed.
Security
By default, only we can access our Amazon Glacier data.
If others need access to our data, we can set up data access control in Amazon Glacier on a
per-vault basis.
Amazon Glacier uses server-side encryption to encrypt all data at rest.
It handles key management and key protection by using one of the strongest block ciphers available, AES-256.
compliance rules.
We can set the compliance controls on individual Amazon Glacier vaults and enforce
these by using lockable policies.
For example, one might specify controls such as “undeletable records” or “time based
data retention” in a Vault Lock policy and then lock the policy from future edits.
After it is locked, the policy becomes immutable, and Amazon Glacier enforces the
prescribed controls to help achieve the compliance objectives.
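A "time based data retention" control of the kind mentioned above might look roughly like the policy document built below, which denies archive deletion until an archive is at least 365 days old. The `glacier:DeleteArchive` action and the `glacier:ArchiveAgeInDays` condition key are Glacier's documented names as far as I know. The account ID, vault name, statement ID, and 365-day period are placeholder values, and `retention_policy` is a hypothetical helper.

```python
import json


def retention_policy(vault_arn: str, min_age_days: int) -> dict:
    """Build a Vault Lock style policy that blocks early archive deletion."""
    if min_age_days <= 0:
        raise ValueError("min_age_days must be positive")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",  # placeholder ID
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                # Deny applies while the archive is younger than the limit.
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(min_age_days)
                    }
                },
            }
        ],
    }


policy = retention_policy(
    "arn:aws:glacier:us-west-2:111122223333:vaults/examplevault", 365
)
print(json.dumps(policy, indent=2))
```

Once a policy like this is locked, it can no longer be edited, which is what makes the retention rule enforceable.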
To monitor data access, Amazon Glacier is integrated with AWS CloudTrail, allowing
any API calls made to Amazon Glacier in our AWS account to be captured and stored in
log files that are delivered to an Amazon S3 bucket that we specify.
Interfaces
There are two ways to use Amazon Glacier, each with its own interfaces.
The Amazon Glacier API provides both management and data operations.
First, Amazon Glacier provides a native, standards-based REST web services interface.
This interface can be accessed using the Java SDK or the .NET SDK.
The AWS Management Console or Amazon Glacier API actions can be used to create vaults
to organize the archives in Amazon Glacier.
The Amazon Glacier API actions are then used to upload and retrieve archives, to monitor the
status of our jobs, and to configure our vault to send a notification through Amazon SNS
when a job is complete.
Second, Amazon Glacier can be used as a storage class in Amazon S3 by using object
lifecycle management that provides automatic, policy-driven archiving from Amazon S3 to
Amazon Glacier.
We simply set one or more lifecycle rules for an Amazon S3 bucket, defining what objects