Cloud Computing
Amazon S3
• Scalable
Amazon S3 can scale in terms of storage, request rate, and users to support an
unlimited number of web-scale applications.
• Reliable
Store data durably, with 99.99 percent availability. The system is designed so that
failures are tolerated and repaired without downtime.
• Fast
Amazon S3 was designed to be fast enough to support high-performance
applications. Server-side latency must be insignificant relative to Internet latency.
Any performance bottlenecks can be fixed by simply adding nodes to the system.
• Inexpensive
Amazon S3 is built from inexpensive commodity hardware components.
• Simple
Building highly scalable, reliable, fast, and inexpensive storage is difficult. Doing
so in a way that makes it easy to use for any application anywhere is more
difficult. Amazon S3 must do both.
Design Principles
• Amazon S3's design aims to provide scalability, high availability, and low
latency at commodity costs.
• To store your data in Amazon S3, you work with resources known as
buckets and objects.
• A bucket is a container for objects.
• An object is a file and any metadata that describes that file.
• Originally, S3 stored arbitrary objects of up to 5 GB in size, each accompanied by
up to 2 KB of metadata (the object size limit has since been raised to 5 TB, as noted later).
• Objects are organized by buckets.
• Each bucket is owned by an AWS account and is identified by a unique,
user-assigned key.
• To store an object in Amazon S3, we first create a bucket and then upload the
object to it.
• Once the object is in the bucket, we can open it, download it, and move it.
• When we no longer need an object or a bucket, we can clean up these resources
(see the sketch below).
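A minimal sketch of this create/upload/download/delete lifecycle using boto3, the AWS SDK for Python; the bucket and file names here are hypothetical placeholders.

```python
# Minimal sketch of the S3 bucket/object lifecycle with boto3.
# Bucket and file names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

# Create a bucket (bucket names must be globally unique; outside us-east-1
# a CreateBucketConfiguration with a LocationConstraint is also required).
s3.create_bucket(Bucket="example-lecture-bucket")

# Upload a local file as an object.
s3.upload_file("photo.jpg", "example-lecture-bucket", "photos/photo.jpg")

# Download the object back to a local file.
s3.download_file("example-lecture-bucket", "photos/photo.jpg", "photo-copy.jpg")

# Clean up: delete the object first, then the now-empty bucket.
s3.delete_object(Bucket="example-lecture-bucket", Key="photos/photo.jpg")
s3.delete_bucket(Bucket="example-lecture-bucket")
```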
S3 - Working
• Buckets and objects are created, listed, and
retrieved using either a REST-style or SOAP
interface.
• Objects can also be retrieved using the HTTP GET
interface or via BitTorrent (see the sketch after this list).
• An access control list restricts who can access the
data in each bucket.
• To upload your data (photos, videos, documents
etc.) to Amazon S3, you must first create an S3
bucket in one of the AWS Regions.
• You can then upload any number of objects to the
bucket.
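As an illustration of the plain HTTP GET interface, the sketch below fetches a hypothetical publicly readable object with the Python requests library; virtual-hosted-style S3 object URLs follow the https://&lt;bucket&gt;.s3.&lt;region&gt;.amazonaws.com/&lt;key&gt; pattern.

```python
# Sketch: retrieving a publicly readable S3 object over plain HTTP GET.
# The bucket, region, and key are hypothetical placeholders.
import requests

url = "https://example-lecture-bucket.s3.us-east-1.amazonaws.com/photos/photo.jpg"
resp = requests.get(url)
resp.raise_for_status()   # returns 403 unless the ACL permits public reads

with open("photo.jpg", "wb") as f:
    f.write(resp.content)
```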
S3 - Working
• An object is a file and any metadata that describes that file.
• Amazon S3 is an object store that uses unique key-values to store as many objects
as you want.
• You store these objects in one or more buckets, and each object can be up to 5 TB
in size.
• An object consists of the following:
• Key - The name that you assign to an object. You use the object key to retrieve the
object.
• Version ID - Within a bucket, a key and version ID uniquely identify an object. The
version ID is a string that Amazon S3 generates when you add an object to a
bucket.
• Value - The content that you are storing. An object value can be any sequence of
bytes. Objects can range in size from zero to 5 TB.
• Metadata - A set of name-value pairs with which you can store information
about the object.
• You can assign metadata, referred to as user-defined metadata, to your objects in
Amazon S3.
• Amazon S3 also assigns system metadata to these objects, which it uses for
managing them (see the sketch below).
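The sketch below stores user-defined metadata with an object and reads it back alongside system metadata; all names are hypothetical. boto3 exposes user-defined metadata (sent as x-amz-meta-* headers) in the Metadata dict.

```python
# Sketch: user-defined metadata on upload, read back with a HEAD request.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-lecture-bucket",             # hypothetical bucket
    Key="docs/report.txt",                       # the object key
    Body=b"object value: any sequence of bytes",
    Metadata={"author": "alice", "department": "engineering"},
)

head = s3.head_object(Bucket="example-lecture-bucket", Key="docs/report.txt")
print(head["Metadata"])        # user-defined metadata: {'author': 'alice', ...}
print(head["ContentLength"])   # system metadata maintained by S3
print(head.get("VersionId"))   # set only when bucket versioning is enabled
```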
S3 - Working
• Working with metadata:
• Subresources - Amazon S3 uses the subresource mechanism to store
object-specific additional information.
• Access control information - You can control access to the objects you
store in Amazon S3 (a sketch follows).
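A sketch of reading the ACL subresource of an object with boto3 (hypothetical names):

```python
# Sketch: inspecting an object's ACL subresource.
import boto3

s3 = boto3.client("s3")
acl = s3.get_object_acl(Bucket="example-lecture-bucket", Key="docs/report.txt")
for grant in acl["Grants"]:
    print(grant["Grantee"], "->", grant["Permission"])
```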
Security
• Amazon S3 supports both server-side encryption, with
three key management options (SSE-KMS, SSE-C,
SSE-S3), and client-side encryption for data uploads
(see the sketch after this list).
• Amazon S3 offers flexible security features to block
unauthorized users from accessing your data.
• Amazon S3 provides management features so that
you can optimize, organize, and configure access to
your data to meet your specific business,
organizational, and compliance requirements.
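A sketch of requesting server-side encryption on a per-object basis (hypothetical names); "AES256" selects SSE-S3, while "aws:kms" would select SSE-KMS.

```python
# Sketch: per-object server-side encryption with S3-managed keys (SSE-S3).
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-lecture-bucket",   # hypothetical bucket
    Key="secure/data.bin",
    Body=b"sensitive payload",
    ServerSideEncryption="AES256",     # "aws:kms" would use SSE-KMS instead
)
```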
Features of Amazon S3
• Storage classes
• Amazon S3 offers a range of storage classes designed for
different use cases.
• For example, one can store mission-critical production data
in S3 Standard for frequent access and save costs by storing
infrequently accessed data in S3 Standard-IA or S3 One Zone-IA.
• We can store data with changing or unknown access patterns
in S3 Intelligent-Tiering, which optimizes storage costs by
automatically moving your data between four access tiers
when your access patterns change.
• These four access tiers include two low-latency access tiers
optimized for frequent and infrequent access, and two opt-in
archive access tiers designed for asynchronous access for
rarely accessed data.
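The storage class is chosen per object at upload time; a sketch with hypothetical names:

```python
# Sketch: uploading directly into S3 Intelligent-Tiering.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-lecture-bucket",      # hypothetical bucket
    Key="archive/logs-2023.gz",
    Body=b"...",
    StorageClass="INTELLIGENT_TIERING",   # or "STANDARD_IA", "GLACIER", ...
)
```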
Features of Amazon S3
• Storage management
• Amazon S3 has storage management features that you can use to manage costs,
meet regulatory requirements, reduce latency, and save multiple distinct copies of
your data for compliance requirements.
• S3 Lifecycle – Configure a lifecycle policy to manage your objects and store them
cost effectively throughout their lifecycle. You can transition objects to other S3
storage classes or expire objects that reach the end of their lifetimes.
• S3 Object Lock – Prevent Amazon S3 objects from being deleted or overwritten for
a fixed amount of time or indefinitely. You can use Object Lock to help meet
regulatory requirements that require write-once-read-many (WORM) storage or to
simply add another layer of protection against object changes and deletions.
• S3 Replication – Replicate objects and their respective metadata and object tags to
one or more destination buckets in the same or different AWS Regions for reduced
latency, compliance, security, and other use cases.
• S3 Batch Operations – Manage billions of objects at scale with a single S3 API
request or a few clicks in the Amazon S3 console. You can use Batch Operations to
perform operations such as Copy, Invoke AWS Lambda function, and Restore on
millions or billions of objects.
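A sketch of the S3 Lifecycle feature described above: one rule (hypothetical names and periods) that transitions objects under logs/ to S3 Glacier after 90 days and expires them after a year.

```python
# Sketch: one lifecycle rule combining a transition and an expiration.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-lecture-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```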
Features of Amazon S3
• Access management
• Amazon S3 provides features for auditing and managing access to your
buckets and objects.
• By default, S3 buckets and the objects in them are private.
• You have access only to the S3 resources that you create.
• To grant granular resource permissions that support your specific use case
or to audit the permissions of your Amazon S3 resources, you can use the
following features.
– S3 Block Public Access
– AWS Identity and Access Management (IAM)
– Bucket policies
– Amazon S3 access points
– Access control lists (ACLs)
– S3 Object Ownership
– Access Analyzer for S3
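As an example of one of these features, the sketch below attaches a bucket policy granting read-only access to a hypothetical IAM user; the account ID and all names are placeholders.

```python
# Sketch: a bucket policy that lets one IAM user read objects.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/example-reader"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-lecture-bucket/*",
        }
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="example-lecture-bucket", Policy=json.dumps(policy))
```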
Features of Amazon S3
• Data processing
• To transform data and trigger workflows to
automate a variety of other processing
activities at scale, you can use the following
features.
– S3 Object Lambda
– Event notifications
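A sketch of the event notification feature: route every object-created event under uploads/ to a hypothetical SQS queue.

```python
# Sketch: S3 event notifications to an SQS queue (the ARN is a placeholder;
# the queue's own policy must allow S3 to send messages to it).
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="example-lecture-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:example-queue",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}
                },
            }
        ]
    },
)
```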
Features of Amazon S3
• Storage logging and monitoring
• Amazon S3 provides logging and monitoring
tools that you can use to monitor and control
how your Amazon S3 resources are being
used.
• Automated monitoring tools:
– Amazon CloudWatch metrics for Amazon S3
– AWS CloudTrail
• Manual monitoring tools:
– Server access logging
– AWS Trusted Advisor
Features of Amazon S3
• Analytics and insights
• Amazon S3 offers features to help you gain
visibility into your storage usage, which
empowers you to better understand, analyze,
and optimize your storage at scale.
– Amazon S3 Storage Lens
– Storage Class Analysis
– S3 Inventory with Inventory reports
Features of Amazon S3
• Strong consistency
• Amazon S3 provides strong read-after-write
consistency for PUT and DELETE requests of
objects in your Amazon S3 bucket in all AWS
Regions.
• This behavior applies to both writes of new
objects as well as PUT requests that overwrite
existing objects and DELETE requests.
• In addition, read operations on Amazon S3
Select, Amazon S3 access control lists (ACLs),
Amazon S3 Object Tags, and object metadata
are strongly consistent (see the sketch below).
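Strong read-after-write consistency means a GET issued immediately after a successful PUT sees the new data; a sketch with hypothetical names:

```python
# Sketch: no stale reads after a successful write.
import boto3

s3 = boto3.client("s3")
s3.put_object(Bucket="example-lecture-bucket", Key="state.txt", Body=b"v2")

body = s3.get_object(Bucket="example-lecture-bucket", Key="state.txt")["Body"].read()
assert body == b"v2"   # guaranteed: the overwrite is immediately visible
```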
Google Bigtable Datastore
• Cloud Bigtable is a sparsely populated table.
• It can scale to billions of rows and thousands of columns, enabling you to
store terabytes or even petabytes of data.
• A single value in each row is indexed; this value is known as the row key.
• The row key is the only index value that appears in every row.
Google Bigtable Datastore
• Low-latency storage for massive amounts of single-keyed data is made
possible by Google Cloud Bigtable.
• It is an ideal data source for MapReduce processes since it offers high
read and write throughput at low latency.
• A MapReduce program executes in three stages: a map stage, a shuffle
stage, and a reduce stage.
• Applications can access Google Cloud Bigtable through a variety of client
libraries, including a supported Java extension to the Apache HBase library
(see the sketch after this list).
• Because of this, it is compatible with the current Apache ecosystem of
open-source big data software.
• Applications that require high throughput and scalability for key/value
data, where each value is typically no more than 10 MB, should use
Google Cloud BigTable.
• Additionally, Google Cloud Bigtable excels as a storage engine for
machine learning, stream processing, and batch MapReduce operations.
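Besides the HBase-compatible Java client, Google also ships native clients; the sketch below uses the google-cloud-bigtable Python client, with hypothetical project, instance, and table IDs.

```python
# Sketch: writing and reading one row with the google-cloud-bigtable client.
from google.cloud import bigtable

client = bigtable.Client(project="example-project")
instance = client.instance("example-instance")
table = instance.table("example-table")

# Write: one cell under column family "stats", qualifier "clicks".
row = table.direct_row(b"user#1234")        # the row key
row.set_cell("stats", b"clicks", b"42")
row.commit()

# Read the row back by its key (the single indexed value).
result = table.read_row(b"user#1234")
cell = result.cells["stats"][b"clicks"][0]  # newest cell version first
print(cell.value, cell.timestamp)
```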
BigTable Storage Concept
• Each massively scalable table in Google Cloud Bigtable is a sorted key/value map
that holds the data.
• The table is made up of rows, each of which typically describes a single entity,
and columns, which contain individual values for each row.
• Each row is indexed by a single row key, and columns that are related to one
another are typically grouped into a column family.
• Each column is identified by a combination of the column family and a column
qualifier, a unique name within the column family.
• Multiple cells may be present at each row/column intersection.
• A distinct timestamped copy of the data for that row and column is present in
each cell.
• When many cells are put in a column, a history of the recorded data for that row
and column is preserved.
• Google Cloud Bigtable tables are sparse; if a column is not used in a particular
row, it takes up no space (see the sketch below).
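A sketch of this storage model in plain Python, with a hypothetical row: the map is sorted by row key, and each (column family, qualifier) intersection holds timestamped cell versions.

```python
# Sketch of Bigtable's data model as nested Python dicts (illustration only).
table = {
    b"user#1234": {                       # row key: the only indexed value
        "profile": {                      # column family
            b"name": [                    # column qualifier
                (1700000000, b"Alice"),   # (timestamp, value) = newest cell
                (1690000000, b"Ali"),     # older cell kept as history
            ],
        },
        # Columns unused in this row simply do not appear: the table is sparse.
    },
}

# The newest value for profile:name is the first cell in the list.
print(table[b"user#1234"]["profile"][b"name"][0][1])   # b'Alice'
```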