Cloud Infrastructure Services (CIS) Course Outcome (CO) 2: (Session 7)

Glacier is AWS's low-cost storage service optimized for infrequently accessed data. It allows storing unlimited data for as little as $0.007 per GB per month. Data is stored in vaults which can contain archives (files). Archives can be retrieved using jobs that typically complete within 3-5 hours for standard retrievals. Glacier provides APIs to perform vault operations like creation/deletion and archive operations like upload/download/deletion using jobs. Notifications can be configured on vaults. Range retrievals allow downloading portions of archives.

Uploaded by

srinivas tikkana
Cloud Infrastructure Services (CIS)

Course Outcome (CO) 2
Session 7
Glacier – Content Delivery Platforms
AWS Glacier:
Amazon Glacier is a storage service optimized for archival, infrequently used data, or “cold data.”

Glacier is an extremely low-cost storage service that provides durable storage with security
features for data archiving and backup.
With Amazon Glacier, one can reliably store their data for as little as $0.007 per gigabyte per month.


It lets users offload the administrative burden of operating and scaling storage to AWS.

There is no need to worry about capacity planning, hardware provisioning, data replication,
hardware failure detection and repair, or time-consuming hardware migrations.

Amazon Glacier is designed for use with other Amazon web services.

The data can be moved between Amazon Glacier and Amazon S3 by using S3 data
lifecycle policies.

Glacier can store virtually any kind of data in any format.

All data is encrypted on the server side, with Glacier handling key management and key protection.
It uses AES-256, one of the strongest block ciphers available.

Glacier can be accessed through the AWS Management Console, the Command Line Interface (CLI),
the SDKs, or the REST-based API.

The Management Console is used only to create and delete vaults. All other operations, such as uploading
archives, creating jobs, and downloading job output, require the CLI, an SDK, or the REST API.
Use cases


Digital media archives

Data that must be retained for regulatory compliance

Financial and healthcare records

Raw genomic sequence data

Long-term database backups
Amazon Glacier Data Model
Amazon Glacier data model core concepts include vaults, archives, jobs, and notification-configuration resources.
Vault –
o A vault is a container for storing archives.
o Each vault resource has a unique address, which comprises the region in which the vault was created, the account ID, and the vault name, which is unique within that region and account, e.g. https://glacier.us-west-2.amazonaws.com/111122223333/vaults/examplevault
o A vault can store an unlimited number of archives.
o Glacier supports various vault operations, all of which are region specific.
o An AWS account can create up to 1,000 vaults per region.
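The address format described above can be sketched in a few lines of Python. This is a minimal illustration, not an official helper: the function names `vault_address` and `parse_vault_address` are hypothetical, and the host pattern `glacier.<region>.amazonaws.com` is assumed to hold only for the standard AWS partition (the China regions, for example, are not handled).

```python
from urllib.parse import urlparse


def vault_address(region: str, account_id: str, vault_name: str) -> str:
    """Build a vault address from region, account ID, and vault name.

    Assumes the standard-partition host pattern glacier.<region>.amazonaws.com.
    """
    return f"https://glacier.{region}.amazonaws.com/{account_id}/vaults/{vault_name}"


def parse_vault_address(url: str) -> tuple:
    """Split a vault address back into (region, account_id, vault_name)."""
    parsed = urlparse(url)
    host_parts = parsed.netloc.split(".")
    if len(host_parts) < 4 or host_parts[0] != "glacier":
        raise ValueError(f"not a Glacier endpoint: {parsed.netloc!r}")
    path_parts = parsed.path.strip("/").split("/")
    if len(path_parts) != 3 or path_parts[1] != "vaults":
        raise ValueError(f"not a vault path: {parsed.path!r}")
    return host_parts[1], path_parts[0], path_parts[2]


if __name__ == "__main__":
    print(vault_address("us-west-2", "111122223333", "examplevault"))
```

The vault name only needs to be unique per region and account, which is why both of those values appear in the address alongside it.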

Archive –
o An archive can be any data, such as a photo, video, or document, and is the base unit of storage in Glacier.
o Each archive has a unique ID and an optional description, which can be specified only when the archive is uploaded.
o Glacier assigns the archive an ID that is unique within the AWS region in which it is stored.
o An archive can be uploaded in a single request. For large archives, Glacier provides a multipart upload API that enables uploading an archive in parts.

Jobs –
o A job is required to retrieve an archive or a vault inventory list.
o Jobs are asynchronous.
o Most jobs take about four hours to complete.
o A job is first initiated; its output is downloaded after the job completes.
o Vault inventory jobs need the vault name.
o Data retrieval jobs need both the vault name and the archive ID, with an optional description.
o A vault can have multiple jobs in progress at any point in time. Each job is identified by a job ID, which is assigned when the job is created and is used for tracking.
o Glacier maintains job information such as job type, description, creation date, completion date, and job status, all of which can be queried.
o After the job completes, the job output can be downloaded in full or partially by specifying a byte range.

Notification Configuration –
o Because jobs are asynchronous, Glacier can send a notification to an SNS topic when a job completes.
o The SNS topic for notifications can be specified either with each individual job request or on the vault.
o Glacier stores the notification configuration as a JSON document.
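As an illustration of what such a JSON document might look like, the sketch below builds one in Python. The field names `SNSTopic` and `Events` and the two event names are given as recalled from the Glacier API documentation and should be checked against it; the function name and the topic ARN are hypothetical.

```python
import json

# Event names as recalled from the Glacier API documentation (verify before use).
ALLOWED_EVENTS = {"ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"}


def notification_configuration(sns_topic_arn: str, events: list) -> str:
    """Return a vault notification configuration as a JSON string."""
    unknown = set(events) - ALLOWED_EVENTS
    if unknown:
        raise ValueError(f"unsupported events: {sorted(unknown)}")
    return json.dumps({"SNSTopic": sns_topic_arn, "Events": list(events)}, indent=2)


if __name__ == "__main__":
    # Hypothetical topic ARN, used only for illustration.
    print(notification_configuration(
        "arn:aws:sns:us-west-2:111122223333:glacier-jobs",
        ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
    ))
```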
Glacier Data Retrieval Options
•Glacier provides three options for retrieving data with varying access times and cost: Expedited,
Standard, and Bulk retrievals.
•Standard retrievals –
• Standard retrievals allow access to any of the archives within several hours.
• Standard retrievals typically complete within 3-5 hours.
•Bulk retrievals –
• Bulk retrievals are Glacier’s lowest-cost retrieval option, enabling retrieval of large amounts,
even petabytes of data per day inexpensively.
• Bulk retrievals typically complete within 5-12 hours.
•Expedited retrievals –
• Expedited retrievals allow quick access to data when a subset of archives is occasionally needed urgently.
• For all but the largest archives (250MB+), data accessed using Expedited retrievals is typically made available within 1-5 minutes.
• There are two types: On-Demand and Provisioned.
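The trade-off between the three options can be expressed as a small lookup, sketched here under stated assumptions. The upper bounds come from the typical completion times quoted above. The cost ordering Bulk < Standard < Expedited is an assumption (the text only states that Bulk is the lowest-cost option), and the names `Tier`, `TIERS`, and `cheapest_tier` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_typical_minutes: int  # upper end of the typical completion time


# Ordered cheapest first. Only "Bulk is cheapest" comes from the text;
# the rest of the ordering is an assumption.
TIERS = [
    Tier("Bulk", 12 * 60),      # typically 5-12 hours
    Tier("Standard", 5 * 60),   # typically 3-5 hours
    Tier("Expedited", 5),       # typically 1-5 minutes (archives under 250 MB)
]


def cheapest_tier(deadline_minutes: int) -> Optional[str]:
    """Pick the cheapest tier whose typical completion time fits the deadline."""
    for tier in TIERS:
        if tier.max_typical_minutes <= deadline_minutes:
            return tier.name
    return None  # no tier typically completes this fast


if __name__ == "__main__":
    for deadline in (24 * 60, 6 * 60, 10):
        print(deadline, "minutes ->", cheapest_tier(deadline))
```

These are typical times rather than guarantees, so a real scheduler would leave some margin before the deadline.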
Glacier Supported Operations
•Vault Operations

•Archive Operations

•Jobs
Vault Operations
Glacier provides operations to create and delete vaults.

A vault can be deleted only if there are no archives in the vault as of the last computed inventory and there have been no writes to the vault
since the last inventory (as the inventory is prepared periodically)
Vault Inventory
• A vault inventory lists the archives in a vault, with information such as the archive ID, creation date, and size of each archive.
• Glacier prepares the inventory for each vault periodically, approximately once every 24 hours, starting on the day the first archive is uploaded to the vault.
• Glacier returns the last inventory it generated, which is a point-in-time snapshot and not real-time data.
Vault metadata (the vault description) can also be obtained for a specific vault or for all vaults in a region. It provides information such as
• the creation date,
• the number of archives in the vault,
• the total size in bytes used by all the archives in the vault,
• and the date the vault inventory was generated.
Glacier also provides operations to set, retrieve, and delete a notification configuration on the vault. Notifications can be used to identify vault events.
Archive Operations
Glacier provides operations to upload, download and delete archives.

Uploading an Archive -
• An archive can be uploaded in a single operation (from 1 byte up to 4 GB in size) or in parts, referred to as a multipart upload (up to about 40 TB).
• Multipart upload helps to
o improve the upload experience for larger archives,
o upload parts independently, in parallel, and in any order,
o recover faster, because only the part that failed needs to be uploaded again and not the entire archive,
o upload archives without knowing the final size in advance,
o upload archives from 1 byte to about 40,000 GB (10,000 parts * 4 GB) in size.
• Glacier returns a response that includes an archive ID, which is unique within the region in which the archive is stored.
• Glacier does not support any metadata apart from the optional description. Any additional metadata must be maintained on the client side.
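The multipart limits above (at most 10,000 parts of at most 4 GB each) imply a minimum part size for any given archive. The sketch below picks one. It additionally assumes that a part size must be 1 MiB multiplied by a power of two, which is how the rule is recalled from the Glacier API documentation and should be verified there. The function names are hypothetical.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB


def choose_part_size(archive_size: int) -> int:
    """Smallest power-of-two multiple of 1 MiB that fits the archive in 10,000 parts."""
    if archive_size < 1:
        raise ValueError("archive must contain at least 1 byte")
    part_size = MIB
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2
        if part_size > MAX_PART_SIZE:
            raise ValueError("archive exceeds 10,000 parts of 4 GiB")
    return part_size


def part_ranges(archive_size: int, part_size: int) -> list:
    """Inclusive (first_byte, last_byte) ranges, one per part; the last may be short."""
    return [
        (start, min(start + part_size, archive_size) - 1)
        for start in range(0, archive_size, part_size)
    ]


if __name__ == "__main__":
    size = 25 * 1024 * MIB  # a 25 GiB archive
    part = choose_part_size(size)
    print(part // MIB, "MiB parts,", len(part_ranges(size, part)), "parts")
```

Because the parts are independent byte ranges, they can be uploaded in parallel and in any order, and a failed part can be retried on its own.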
Downloading an Archive –
Downloading an archive is an asynchronous operation and is a two-step process:
• Initiate an archive retrieval job
o When a job is initiated, a job ID is returned as part of the response.
o The job is executed asynchronously, and the output can be downloaded after the job completes.
o A job can be initiated to retrieve the entire archive or a portion of it.
o Job completion can be detected either by checking the status explicitly (not recommended) or by waiting for a completion notification.
• After the job completes, download the bytes
o The output can be downloaded in full, or a byte range can be specified to download only a portion of it.
o Downloading the output in chunks helps if a download fails, because only the failed chunk needs to be downloaded again.
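The two-step flow can be sketched with the boto3 Glacier client. The calls `initiate_job`, `describe_job`, and `get_job_output` exist in boto3, but the function `retrieve_archive`, its parameters, and the polling interval are illustrative assumptions. The sketch polls for simplicity even though the text above recommends notifications, and it takes the client as an argument so that it can be tried with a stub.

```python
import time


def retrieve_archive(client, vault_name, archive_id, poll_seconds=900, sleep=time.sleep):
    """Initiate an archive-retrieval job, wait for it, and return the bytes.

    `client` is expected to behave like boto3.client("glacier").
    accountId="-" means "the account that owns the credentials".
    """
    # Step 1: initiate the job; the response carries the job ID.
    job = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": "Standard",
        },
    )
    job_id = job["jobId"]

    # Explicit polling (simple, but an SNS notification is preferable).
    while not client.describe_job(
        accountId="-", vaultName=vault_name, jobId=job_id
    )["Completed"]:
        sleep(poll_seconds)

    # Step 2: download the job output once the job has completed.
    output = client.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

With real credentials this would be called as `retrieve_archive(boto3.client("glacier"), "examplevault", archive_id)`. To fetch only part of the output, `get_job_output` also accepts a `range` argument such as `"bytes=0-1048575"`.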
Deleting an Archive –

Archives can be deleted from a vault only one at a time.

This operation is idempotent: deleting an already-deleted archive does not result in an error.

AWS applies a pro-rated charge for archives that are deleted within 90 days of upload, because Glacier is meant for long-term storage.
Updating an Archive –
An existing archive cannot be updated. It must be deleted and re-uploaded, and the re-uploaded data is assigned a new archive ID.


Range retrievals –
Amazon Glacier allows retrieving an archive either in whole or in part, by specifying a byte range.

Range retrievals require the range to be megabyte aligned.

Glacier returns a checksum in the response, which can be compared with a checksum computed on the client side to detect download errors.
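Both checks can be sketched on the client side. The alignment rule used here (start divisible by 1 MiB, and end + 1 divisible by 1 MiB unless the range ends at the last byte of the archive) and the SHA-256 tree hash are given as recalled from the Glacier API documentation, not from the text above, so treat them as assumptions to verify. The function names are hypothetical.

```python
import hashlib

MIB = 1024 * 1024


def is_megabyte_aligned(start: int, end: int, archive_size: int) -> bool:
    """Check an inclusive byte range against the assumed alignment rule."""
    if not 0 <= start <= end < archive_size:
        return False
    ends_on_boundary = (end + 1) % MIB == 0 or end == archive_size - 1
    return start % MIB == 0 and ends_on_boundary


def tree_hash(data: bytes) -> str:
    """SHA-256 tree hash: hash each 1 MiB chunk, then combine hashes pairwise."""
    hashes = [
        hashlib.sha256(data[i:i + MIB]).digest() for i in range(0, len(data), MIB)
    ] or [hashlib.sha256(b"").digest()]
    while len(hashes) > 1:
        combined = []
        for i in range(0, len(hashes), 2):
            if i + 1 < len(hashes):
                combined.append(hashlib.sha256(hashes[i] + hashes[i + 1]).digest())
            else:
                combined.append(hashes[i])  # odd one out is carried up unchanged
        hashes = combined
    return hashes[0].hex()


if __name__ == "__main__":
    print(is_megabyte_aligned(0, MIB - 1, 3 * MIB))
    print(tree_hash(b"example payload"))
```

A client would compute `tree_hash` over the downloaded bytes and compare the result with the checksum returned in the response.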


Specifying a range of bytes can be helpful in order to:


Control bandwidth costs

Manage your data downloads

Retrieve a targeted part of a large archive
Control bandwidth costs
Glacier allows retrieval of up to 5 percent of the average monthly storage for free each month.

Scheduling range retrievals can help in two ways.


• Stay within the monthly free allowance of 5 percent by spreading out the data requested.
• If the amount of data retrieved exceeds the free allowance, scheduling range retrievals
reduces the peak retrieval rate, which determines the retrieval fees.
Manage your data downloads
Glacier allows retrieved data to be downloaded for 24 hours after the retrieval request completes.

Retrieving only portions of the archive at a time allows the schedule of downloads to be managed within the given download window.


Retrieve a targeted part of a large archive
•Retrieving a range of an archive can be useful if the archive was uploaded as an aggregate of multiple individual
files and only a few of those files need to be retrieved.
Jobs
An S3 Glacier job can be initiated to perform a select query on an archive, retrieve an archive, or get an inventory of a vault.

There are three types of Glacier jobs:

Select – To perform a select query on an archive.

Archive-retrieval – To retrieve an archive.

Inventory-retrieval – To get an inventory of a vault.
Accessing Amazon S3 Glacier
Amazon S3 Glacier is a RESTful web service that uses HTTP and HTTPS as a transport and JavaScript Object Notation (JSON) as a message serialization format.

Application code can make requests directly to the S3 Glacier web service API.

When using the REST API directly, the necessary code must be written to sign and authenticate the requests.

Alternatively, application development can be simplified by using the AWS SDKs, which wrap the S3 Glacier REST API calls.
These libraries take care of authentication and request signing once credentials are provided.

S3 Glacier provides a console to create and delete vaults.

Archive and job operations require code written against either the REST API or the AWS SDK wrapper libraries.
Regions and Endpoints

A vault can be created in a specific AWS Region.

Glacier requests are sent to an endpoint which is specific to an AWS region.

The regions and their codes are listed in the table below.

Region Name                  Code
US East (Ohio)               us-east-2
US East (N. Virginia)        us-east-1
US West (N. California)      us-west-1
US West (Oregon)             us-west-2
Africa (Cape Town)           af-south-1
Asia Pacific (Hong Kong)     ap-east-1
Asia Pacific (Mumbai)        ap-south-1
Asia Pacific (Osaka)         ap-northeast-3
Asia Pacific (Seoul)         ap-northeast-2
Asia Pacific (Singapore)     ap-southeast-1
Asia Pacific (Sydney)        ap-southeast-2
Asia Pacific (Tokyo)         ap-northeast-1
Canada (Central)             ca-central-1
China (Beijing)              cn-north-1
China (Ningxia)              cn-northwest-1
Europe (Frankfurt)           eu-central-1
Europe (Ireland)             eu-west-1
Europe (London)              eu-west-2
Europe (Milan)               eu-south-1
Europe (Paris)               eu-west-3
Europe (Stockholm)           eu-north-1
Middle East (Bahrain)        me-south-1
South America (São Paulo)    sa-east-1
Performance

Because Glacier is a low-cost service designed for archival storage rather than fast access, retrieval
jobs typically complete in 3 to 5 hours.
The upload experience for larger archives can be improved by using multipart upload for archives up to about 40TB.

Separate parts of a large archive can be uploaded independently, in any order, and in parallel.
Range retrievals can also be performed on archives by specifying a range or portion of the archive.
Durability and Availability
Amazon Glacier is designed to provide average annual durability of 99.999999999 percent for an archive.

It redundantly stores data in multiple facilities and on multiple devices within each facility.

To increase durability, Glacier synchronously stores data across multiple facilities before
returning SUCCESS on uploading an archive.
Unlike traditional systems, Amazon Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.


Scalability and Elasticity

Amazon Glacier scales to meet growing and often unpredictable storage requirements.

A single archive is limited to 40TB in size, but there is no limit to the total amount of data
that can be stored in the service.
Whether storing petabytes or gigabytes, Amazon Glacier automatically scales the storage up or down as needed.
Security

By default, only we can access our Amazon Glacier data.
If others need to access our data, we can set up data access control in Amazon Glacier by using the AWS Identity and Access Management (IAM) service.

We create an IAM policy that specifies which account users have rights to which operations on a given vault.
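A policy of this kind might look like the sketch below, which grants only the retrieval-related actions on a single vault. The action names and the vault ARN format are given as recalled from the AWS documentation. The function name `vault_read_policy` and the example account and vault values are illustrative assumptions.

```python
import json


def vault_read_policy(region: str, account_id: str, vault_name: str) -> dict:
    """IAM policy document allowing retrieval jobs on one vault only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glacier:InitiateJob",
                    "glacier:DescribeJob",
                    "glacier:GetJobOutput",
                ],
                "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(vault_read_policy("us-west-2", "111122223333", "examplevault"), indent=2))
```

Attaching this document to a user or group limits them to read-style operations on `examplevault`; upload and delete actions are not listed and are therefore not granted by it.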

Amazon Glacier uses server-side encryption to encrypt all data at rest.
It handles key management and key protection by using one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256).

Glacier also allows vaults to be locked where long-term records retention is mandated by regulations or compliance rules.

We can set the compliance controls on individual Amazon Glacier vaults and enforce
these by using lockable policies.

For example, one might specify controls such as “undeletable records” or “time based
data retention” in a Vault Lock policy and then lock the policy from future edits.

After it is locked, the policy becomes immutable, and Amazon Glacier enforces the
prescribed controls to help achieve the compliance objectives.
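A "time based data retention" control of the kind mentioned above could be written as the following Vault Lock policy sketch. The condition key `glacier:ArchiveAgeInDays` and the overall shape follow the example recalled from the AWS documentation. The function name and the 365-day figure are illustrative assumptions.

```python
import json


def retention_lock_policy(region: str, account_id: str, vault_name: str, days: int) -> dict:
    """Vault Lock policy denying deletion of archives younger than `days`."""
    if days < 1:
        raise ValueError("retention period must be at least one day")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-based-on-archive-age",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [
                    f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
                ],
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(days)}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(
        retention_lock_policy("us-west-2", "111122223333", "examplevault", 365), indent=2
    ))
```

Since a locked policy cannot be edited, the retention period should be reviewed carefully before the lock is completed.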

To monitor data access, Amazon Glacier is integrated with AWS CloudTrail, allowing
any API calls made to Amazon Glacier in our AWS account to be captured and stored in
log files that are delivered to an Amazon S3 bucket that we specify.
Interfaces

There are two ways to use Amazon Glacier, each with its own interfaces.

The Amazon Glacier API provides both management and data operations.
First, Amazon Glacier provides a native, standard-based REST web services interface.
This interface can be accessed using the Java SDK or the .NET SDK.


The AWS Management Console or Amazon Glacier API actions can be used to create vaults
to organize the archives in Amazon Glacier.
Then use the Amazon Glacier API actions to upload and retrieve archives, to monitor the status of our jobs, and to configure our vault to send a notification through Amazon SNS when a job is complete.
Second, Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management, which provides automatic, policy-driven archiving from Amazon S3 to Amazon Glacier.
We simply set one or more lifecycle rules for an Amazon S3 bucket, defining which objects should be transitioned to Amazon Glacier and when.

We can specify either an absolute or a relative time period for the transition.
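A lifecycle rule with either kind of time period can be sketched as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The helper name `glacier_transition_rule`, the rule IDs, and the `logs/` prefix are hypothetical. `Days` expresses a relative period and `Date` an absolute one.

```python
from datetime import datetime, timezone


def glacier_transition_rule(rule_id, prefix, days=None, date=None):
    """Build one S3 lifecycle rule that transitions matching objects to Glacier.

    Exactly one of `days` (relative) or `date` (absolute) must be given.
    """
    if (days is None) == (date is None):
        raise ValueError("give exactly one of days or date")
    transition = {"StorageClass": "GLACIER"}
    if days is not None:
        transition["Days"] = days
    else:
        transition["Date"] = date
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [transition],
    }


if __name__ == "__main__":
    relative = glacier_transition_rule("archive-logs", "logs/", days=30)
    absolute = glacier_transition_rule(
        "archive-2020", "2020/", date=datetime(2025, 1, 1, tzinfo=timezone.utc)
    )
    print({"Rules": [relative, absolute]})
```

With real credentials, the rules would be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration={"Rules": [...]})`.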

The Amazon S3 API includes a RESTORE operation.

The retrieval process from Amazon Glacier using RESTORE takes three to five hours.

This process puts a copy of the retrieved object in Amazon S3 Reduced Redundancy Storage
(RRS) for a specified retention period.

The original archived object remains stored in Amazon Glacier.

When Amazon Glacier is used as a storage class in Amazon S3, the Amazon S3 API is used.

When Amazon Glacier is used natively, the Amazon Glacier API is used.

For example, objects archived to Amazon Glacier using Amazon S3 lifecycle policies can
only be listed and retrieved by using the Amazon S3 API or the Amazon S3 console.

We cannot see them as archives in an Amazon Glacier vault.
Cost Model

With Amazon Glacier, we pay only for what we use and there is no minimum fee.

In normal use, Amazon Glacier has three pricing components:

Storage – per GB per month

Data transfer out – per GB per month

Requests – per thousand UPLOAD and RETRIEVAL requests per month
Amazon Glacier is designed with the expectation that retrievals are infrequent and unusual and that data will be stored for extended periods of time.

One can retrieve up to 5 percent of average monthly storage for free each month.
If we retrieve more than this amount of data in a month, we are charged an additional (per GB) retrieval fee.
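The arithmetic behind the free allowance can be illustrated with a short sketch. It uses only the figures quoted in these notes (5 percent free retrieval, $0.007 per GB per month for storage), which may not match current AWS pricing. The function names are hypothetical, and the per-GB retrieval fee is left out because the notes do not state it.

```python
FREE_RETRIEVAL_PERCENT = 5          # from the notes above
STORAGE_PRICE_PER_GB_MONTH = 0.007  # from the notes above, in USD


def billable_retrieval_gb(average_stored_gb: float, retrieved_gb: float) -> float:
    """GB retrieved in a month beyond the free 5 percent allowance."""
    free_gb = average_stored_gb * FREE_RETRIEVAL_PERCENT / 100
    return max(0.0, retrieved_gb - free_gb)


def monthly_storage_cost(stored_gb: float) -> float:
    """Storage component of the bill for one month, in USD."""
    return stored_gb * STORAGE_PRICE_PER_GB_MONTH


if __name__ == "__main__":
    stored = 1000  # GB stored on average over the month
    print("storage cost (USD):", round(monthly_storage_cost(stored), 2))
    print("billable GB when retrieving 30 GB:", billable_retrieval_gb(stored, 30))
    print("billable GB when retrieving 80 GB:", billable_retrieval_gb(stored, 80))
```

With 1,000 GB stored, the free allowance is 50 GB, so a 30 GB retrieval incurs no retrieval fee while an 80 GB retrieval is billed for the 30 GB above the allowance.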
