1. Why Cloud Technology is Transforming Business
- key terms related to the cloud and digital transformation
- the benefits of cloud technology for an organization's digital transformation
- the differences between on-premises infrastructure, public cloud, private
cloud, hybrid cloud, and multicloud
- the drivers and challenges that lead organizations to undergo a digital
transformation
2. Digital transformation helps organizations:
- change how they operate
- redefine relationships with their customers, employees, and partners
- modernize their applications
- create new services
- deliver value
3. What is the cloud and cloud technology, exactly?
The cloud is a metaphor for the network of data centers which store and
compute information that’s available through the internet.
Essentially, instead of describing a complex web of software, servers,
computers, networks, and security systems, all of that has been combined
into one word: “cloud.”
It might help to explore the different ways organizations can implement their
information technology (or IT) infrastructure: on-premises, private cloud,
public cloud, hybrid cloud, and multicloud implementations.
3.1 On-premises IT infrastructure (on-prem) refers to hardware and software
applications that are hosted on-site, located and operated within an
organization's data center to serve its unique needs. This
implementation is the traditional way of managing IT infrastructure. The
benefit of on-premises is that it doesn’t require third-party access which
gives owners physical control over the server hardware and software and
doesn’t require them to pay for ongoing access. However, to have the
computing power to run their required workloads, organizations must buy
physical servers and other infrastructure through procurement processes
that can take months. These systems require physical space, typically a
specialized room with sufficient power and cooling. After configuring
and deploying the systems, businesses then need expert personnel to
manage them. This long process is difficult to scale when demand spikes
or business expands. Cloud computing addresses these issues by offering
computing resources as scalable, on-demand services.
3.2 A private cloud is a type of cloud computing where the infrastructure is
dedicated to a single organization instead of the general public. This type
is also known as single-tenant or corporate cloud. Typically, an
organization has to perform the same kind of ongoing maintenance and
management for a private cloud as it would for traditional on-premises
infrastructure. A private cloud is hosted on an organization's own
servers, either at the organization's own data center, at a third-
party colocation facility, or through a private cloud provider. Private
cloud computing gives businesses many of the benefits of a public cloud
—including self-service, scalability, and elasticity—with more
customization available from dedicated on-premises infrastructure.
Organizations might use private cloud if they have already made
significant investments in their own infrastructure or if, for regulatory
reasons, data must be kept on-premises or hosted in a certain way.
3.3 The public cloud is where on-demand computing services and
infrastructure are managed by a third-party provider, such as Google
Cloud, and shared with multiple organizations or “tenants” through the
public internet. This sharing is why public cloud is known as multi-tenant
cloud infrastructure, but each tenant’s data and applications running in
the cloud are hidden from other tenants.
Because public cloud has on-demand availability of computing and
infrastructure resources, organizations don't need to acquire, configure, or
manage those resources themselves, and they only pay for what they use.
There are typically three types of cloud computing service models
available in public cloud: The first is infrastructure as a service (IaaS),
which offers compute and storage services. The second is platform as a
service (PaaS), which offers a develop-and-deploy environment to build
cloud apps. And the third is software as a service (SaaS), which delivers
apps as services, where users get access to software on a subscription
basis.
3.4 Hybrid cloud: In a hybrid cloud, applications run in a combination of
different environments. The most common hybrid cloud example combines a
private cloud environment, like an on-premises data center, with a public
cloud computing environment, like Google Cloud.
3.5 Multicloud describes architectures that combine at least two public
cloud providers. Organizations might operate a combination of on-
premises and multiple public cloud environments, therefore
implementing both hybrid and multicloud simultaneously.
So, although hybrid cloud and multicloud are related, they aren’t
interchangeable terms. Today, most organizations embrace a multicloud
strategy.
4. What are the benefits of cloud computing compared to traditional on-
premises infrastructure?
4.1 It's scalable: Cloud computing gives organizations access to scalable
resources and the latest technologies on-demand, so they don’t need to
worry about capital expenditures or limited fixed infrastructure. This can
significantly accelerate infrastructure deployment time.
4.2 It's flexible: Organizations and their users can access cloud services
from anywhere, scaling services up or down as needed to meet business
requirements.
4.3 It's agile: Organizations can develop new applications and rapidly get
them into production, without worrying about the underlying
infrastructure.
4.4 It offers strategic value: Because cloud providers stay updated with the
latest innovations and offer them as services to customers, organizations
can gain a competitive advantage and a higher return on investment
than if they'd invested in soon-to-be-obsolete technologies. This lets
organizations innovate and try new ideas faster.
4.5 It's secure: Cloud computing security is recognized as stronger than that
in enterprise data centers, because of the depth and breadth of the
security mechanisms and dedicated teams that cloud providers
implement.
4.6 Finally, it's cost-effective: No matter which cloud computing service
model organizations implement, they only pay for the computing
resources they use. They don’t need to overbuild data center capacity to
handle sudden spikes in demand or business growth, and they can deploy
IT staff to work on more strategic initiatives.
5. Why it is critical to transform and embrace new technology
6. Cloud era
A transformation cloud is a new approach to digital transformation.
7. Challenges that lead to a digital transformation
What are the types of problems and questions that make organizations
undergo a digital transformation?
7.1 First, they want to be the best at understanding and using data: Today,
organizations must unify data across streams, lakes, warehouses, and
databases so that they can quickly and easily break down data silos, generate
real-time insights, and make better business decisions, thus reducing cost
and inefficiencies.
7.2 Second, they want the best technology infrastructure: Organizations are
looking for a cloud platform that will serve as their foundation for growth
and has the flexibility to innovate securely and adapt quickly based on
market needs.
7.3 Third, they want to create the best hybrid workplace: The fundamental
shift in how and where we work has digitized many interactions that used to
take place in person. This change requires more intentional connections and
collaboration.
7.4 Fourth, it's critical for organizations to know that their data, systems,
and users are secure: The digital world is seeing more severe security
issues, so companies are rethinking their security posture. They must
find ways to identify and protect everything from people and customers
to data and transactions in a fast-changing environment.
7.5 Finally, organizations are prioritizing sustainability as a critical, board-
level topic: They want to create a more sustainable future through products
and services that minimize environmental impact.
8. Google’s transformation cloud
There are five primary capabilities that form the basis of the transformation
cloud: data, open infrastructure, collaboration, trust, and sustainable
technology and solutions.
8.1 Data is the key to unlocking value from AI, making it critical for
innovation and differentiation.
A data cloud is a unified solution to manage data across the entire data
lifecycle, regardless of whether it sits in Google Cloud or in other clouds. It
lets organizations identify and process data with great scale, speed, security,
and reliability.
8.2 Open infrastructure: Organizations choose to modernize their IT systems
on Google’s open infrastructure cloud because it gives them freedom to
securely innovate and scale from on-premises, to edge, to cloud on an
easy, transformative, and open platform. Open infrastructure cloud brings
Google Cloud services to different physical locations, while leaving the
operation, governance, and evolution of the services to Google Cloud.
Instead of relying on a single service provider or closed technology stack,
today most organizations want the freedom to run applications in the
place that makes the most sense, using hybrid and multicloud approaches
based on open source software.
Open standards and open source
An open standard is a publicly available specification that anyone can
implement, while open source refers to software whose source code is
publicly accessible and free for anyone to use, modify, and share.
8.3 Collaboration helps transform how people connect, create, and
collaborate. With the definition of the workplace forever changed, it's
essential that information and frontline workers across regions and
industries connect, create, and collaborate securely from anywhere, and on
any device.
8.4 A trusted cloud helps organizations protect what's important with
advanced security tools. Organizations see the cloud as more secure than
on-premises, and they want to make it simple so that employees,
customers, and contractors can safely access their services.
8.5 A transformation cloud is built on a sustainable foundation, using
technology and solutions that help organizations build and work more
sustainably.
9. The Google Cloud Adoption Framework
9.1 Risks: The challenge is multi-dimensional, with far-reaching
implications not only for the solutions that will run in the cloud and the
technologies that support them, but also for the people who must implement
them and the processes that govern them.
The value of the Google Cloud Adoption Framework is that it serves as a
map to help organizations adopt the cloud quickly and effectively by
creating a comprehensive action plan for accelerating cloud adoption. It
does this by structuring and aligning short-term tactical, mid-term
strategic, and long-term transformational business objectives.
Summary:
Chapter 2 Fundamental cloud concepts
1. Total cost of ownership (TCO)
For on-premises, TCO is associated with assessing the cost of static
resources throughout their lifetime. However, due to the dynamic
nature of the cloud, predicting future costs can be challenging. A
common mistake that organizations make when attempting to
calculate cloud TCO is to directly compare the running costs of the
cloud against their on-premises system. These costs are not
equivalent.
The cost of on-premises infrastructure is dominated by the initial
purchase of hardware and software, but cloud computing costs are
based on monthly subscriptions or pay-per-use models.
It's also important to consider all the operational costs of running your
own data center, such as power, cooling, maintenance, and other
support services.
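As a rough back-of-the-envelope sketch of that comparison, the snippet below contrasts a hypothetical three-year on-premises TCO (upfront hardware plus recurring operations) with a pay-per-use cloud estimate. All figures are purely illustrative, not real vendor pricing.

```python
# Illustrative only: hypothetical figures, not real pricing.

def on_prem_tco(hardware_purchase, ops_per_year, years=3):
    """Upfront hardware purchase plus recurring operational costs
    (power, cooling, maintenance, support staff)."""
    return hardware_purchase + ops_per_year * years

def cloud_tco(monthly_cost, years=3):
    """Pay-per-use / subscription costs accrued month by month."""
    return monthly_cost * 12 * years

# Hypothetical numbers for a small workload over three years.
print(on_prem_tco(hardware_purchase=120_000, ops_per_year=30_000))  # 210000
print(cloud_tco(monthly_cost=4_000))                                # 144000
```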
2. Capital expenditures (CapEx) versus operating expenses (OpEx)
With organizations moving from on-premises infrastructure to on-
demand cloud services, there’s a major shift in spending from capital
expenditures to operating expenses.
Capital expenditures, or CapEx, are upfront business expenses put
toward fixed assets. Organizations buy these items once, and they
benefit their business for many years. Maintaining these assets is also
considered CapEx because it extends their lifetime and usefulness.
Small businesses can find CapEx spending challenging because large
one-time purchases are often high cost.
And then there are operating expenses, or OpEx, which are recurring
costs for a more immediate benefit. This represents the day-to-day
expenses to run a business. In IT, these expenses might be yearly
services like website hosting or domain registrations, or the
subscription fee for cloud services. OpEx covers spending on
pay-as-you-go items, which are not considered major long-term
investments like CapEx items.
In the on-premises CapEx model, cost management and budgeting are
a one-time operational process completed annually. Data centers
require a huge CapEx investment up front as organizations purchase
space, equipment, and software and hire a workforce to run and
maintain everything. Forecasting is based on a metric such as historic
growth to determine the needs for the next month, quarter, year, or
even multiple years.
Moving to cloud’s on-demand OpEx model enables organizations to
pay only for what they use and only when they use it. Budgeting is no
longer a one-time operational process completed annually. Instead,
spending must be monitored and controlled on an ongoing basis due
to the dynamic nature of cloud use within organizations.
How infrastructure is procured has radically changed, too. In a more
decentralized cloud world, any employee can create resources in
seconds on infrastructure owned and managed by a cloud provider.
Organizations save on power, cooling, and floor space; they also save on
management because they don't have to install, operate, upgrade, and
troubleshoot the infrastructure themselves.
3. Private cloud, hybrid cloud, and multi-cloud strategies
Private cloud computing gives an organization many of the benefits of
a public cloud — including self-service, scalability, and elasticity —
with more customization available than from dedicated on-premises
infrastructure.
A hybrid cloud is one in which applications are running in a
combination of different environments. The most common example is
combining a private and public cloud environment, like an on-
premises data center, and a public cloud computing environment like
Google Cloud.
Finally, there’s multicloud, which describes architectures that combine
at least two public cloud providers, such as Google Cloud, Amazon
Web Services, Microsoft Azure, or others.
So, what is a hybrid or multicloud strategy used for?
Let's explore some different business requirements, drivers, and use
cases that lead an organization to choose this kind of approach.
- Access to the latest technologies: Running workloads in multiple
clouds empowers organizations to leverage the latest innovations and
capabilities from each cloud provider, thus taking a best-in-class
approach to cloud features and obtaining the scale, security, and
agility to innovate fast.
- Modernize at the right pace
- Improved return on investment
- Flexibility through choice of tools
- Improved reliability and resiliency
- Maintain regulatory compliance
- Running apps on-premises
- Running apps at remote edge locations
4. How a network supports digital transformation
A fast, reliable, and low-latency global network ensures exceptional
user experience and high performance. It also makes it easier to
communicate and manage data globally. With ever more distributed
workforces and online businesses, having virtual network services that
can easily scale without adding hardware ensures that organizations
can adapt.
How does a network operate?
5. Network performance: bandwidth and latency
Two important terms in networking are bandwidth and latency.
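Bandwidth is how much data a link can move per second, and latency is the delay before data begins to arrive. A simplified sketch of how they combine (ignoring protocol overhead and congestion):

```python
def transfer_time_seconds(size_mb, bandwidth_mbps, latency_ms):
    """Approximate time to move a payload: propagation delay
    plus transmission time. Simplified; ignores protocol overhead."""
    size_megabits = size_mb * 8
    return latency_ms / 1000 + size_megabits / bandwidth_mbps

# A 500 MB file over a 100 Mbps link with 40 ms of latency.
print(transfer_time_seconds(500, 100, 40))  # about 40.04 seconds
```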
6. Google Cloud regions and zones
It’s designed to give customers the highest possible throughput and
lowest possible latencies for their applications.
Google Cloud's infrastructure is based in five major geographic
locations: North America, South America, Europe, Asia, and Australia.
Having multiple service locations is important because choosing
where to locate applications affects qualities like availability,
durability, and latency, the latter of which measures the time a packet
of information takes to travel from its source to its destination.
Each of these locations is divided into several different regions and
zones. Regions represent independent geographic areas and are
composed of zones.
For example, London, or europe-west2, is a region that currently
comprises three different zones.
A zone is an area where Google Cloud resources are deployed.
For example, if you launch a virtual machine using Compute Engine, it
will run in the zone that you specify, which helps ensure resource redundancy.
You can also run resources in different regions. This is useful for bringing
applications closer to users around the world, and also for protection
in case there are issues with an entire region, such as a natural
disaster.
Some of Google Cloud’s services support placing resources in what
we call a multi-region.
You can find the most up-to-date numbers for Google Cloud regions
and zones at cloud.google.com/about/locations.
7. Google’s edge network
A recommended best practice is for organizations to keep their traffic
on Google's private network for most of its journey. Using the same
network that powers products like Gmail, Google Search, and
YouTube lets organizations take advantage of the performance that this
global infrastructure provides.
When a user opens a Google app or Web page, Google responds to
that request from an edge network location that will provide the
lowest latency.
Understanding Google's Edge Network and how it maintains caches
that store popular content near its users helps organizations choose
when to hand off traffic to Google.
A network's edge is defined as a place where a device or an
organization's network connects to the Internet. It's called "the edge"
because it's the entry point to the network.
Google's Edge Network is how we connect with ISPs to get traffic to
and from users.
It's made up of network infrastructure that organizations can hand off
traffic to based on user needs, performance, and cost.
Summary
Chapter 3 Cloud computing models and shared responsibility
1. Introduction to cloud computing models and shared responsibility
2. Cloud computing service models
Cloud computing allows for a third party to be responsible for
some part of the infrastructure. This means that organizations then
have more time to focus on their core business.
2.1 Infrastructure as a service, or IaaS, which offers infrastructure
resources such as compute and storage.
2.2 Platform as a service, or PaaS, which offers a develop-and-
deploy environment to build cloud apps.
2.3 Software as a service, or SaaS, which delivers complete
applications as services.
3. Infrastructure as a service (IaaS)
IaaS is a computing model that offers the on-demand availability
of almost infinitely scalable infrastructure resources, such as
compute, networking, storage, and databases as services over the
internet.
It provides the same technologies and capabilities as a traditional
data center without having to physically maintain or manage all of
it.
One of the main reasons businesses choose IaaS is to reduce their
capital expenditures and transform them into operational expenses.
Compute Engine and Cloud Storage are examples of Google Cloud
IaaS products.
So, what scenarios would IaaS be good for?
The flexibility and scalability of IaaS is useful for organizations that:
- Have unpredictable workload volumes or need to move quickly in response to business
fluctuations.
- Require more infrastructure scalability and agility than traditional data centers can
provide.
- Have high business growth that outpaces infrastructure capabilities.
- Experience unpredictable spikes in demand for infrastructure services.
- See low utilization of existing infrastructure resources.
4. Platform as a service (PaaS)
PaaS is appealing because it provides a platform for developers to develop, run,
and manage their own apps without having to build and maintain the associated
infrastructure. They can also use built-in software components to build their
applications, which reduces the amount of code they have to write.
By abstracting the management of underlying resources even further than IaaS,
PaaS offloads infrastructure management, patches, updates, and other
administrative tasks to the cloud service provider.
So, what scenarios would PaaS be good for?
PaaS is suitable for organizations that:
- Want to create unique and custom applications without investing a lot in owning and
managing infrastructure.
- Want to rapidly test and deploy applications.
- Have many legacy applications and want to reduce the cost of operations.
- Have a new app project that they want to deploy quickly, growing and updating
the app as fast as possible.
- Want to only pay for resources while they're being used.
- Want to offload time-consuming tasks such as setting up and maintaining
application servers and development and testing environments.
5. Software as a service (SaaS)
Software as a service, or SaaS, is a computing model that offers an
entire application, managed by a cloud provider, through a web
browser. The cloud provider hosts the application software in the
cloud and delivers it through a browser. With this model, you don’t
need to download or install any of it.
Google Workspace, which includes tools such as Gmail, Google Drive, Google
Docs, and Google Meet, is a Google Cloud SaaS product.
And what scenarios would SaaS be good for?
Well, SaaS is suitable for organizations that:
-Want to use standard software solutions that require minimal customization.
-Don’t want to invest time or internal expertise in maintaining applications or
infrastructure.
-Need more time for IT teams to focus on strategic projects.
-And need to access apps from various devices and locations.
6. Choosing a cloud computing model
Which model should an organization choose? The answer depends on its business
needs, required functionality, and available expertise.
7. The shared responsibility model
8. How the shared responsibility model works
Lesson 2 Exploring Data Transformation with Google Cloud
1. Introduction
2. Unlocking business value of data
Data can be categorized into three main types: structured, semi-structured, and
unstructured.
Structured data is highly organized and well-defined.
Semi-structured data falls somewhere in between structured and unstructured data.
Examples include emails, HTML, JSON, and XML files.
Unstructured data is information that either doesn’t have a predefined data model
or isn’t organized in a predefined manner.
Categories include: Text, which is the most common, and is often generated and
collected from sources like documents, presentations, or even social media posts.
Data files, like images, audio files, and videos.
3. Data management concepts
3.1 Databases: Let's examine two types of databases: relational and non-
relational.
A relational database stores and provides access to data points that are
related to one another. This means storing information in tables, rows, and
columns that have a clearly defined schema that represents the structure or
logical configuration of the database. A relational database can establish
links—or relationships–between information by joining tables, and
structured query language, or SQL, can be used to query and manipulate
data. Relational databases are highly consistent, reliable, and best suited for
dealing with large amounts of structured data.
A non-relational database, sometimes known as a NoSQL database, is less
structured in format and doesn’t use a tabular format of rows and columns
like relational databases. Instead, non-relational databases follow a flexible
data model, which makes them ideal for storing data that changes its
organization frequently or for applications that handle diverse types of data.
This includes when large quantities of complex and diverse data need to be
organized, or when the data regularly evolves to meet new business
requirements.
Google Cloud relational database products include Cloud SQL and Spanner,
while Bigtable is a non-relational database product.
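To make the relational ideas above concrete (tables with a defined schema, relationships between them, and SQL queries), here is a small, self-contained sketch using Python's built-in sqlite3 module. It is not a Google Cloud product; it only illustrates the relational model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A join expresses the relationship between the two tables.
for name, spent in conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
"""):
    print(name, spent)
```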
3.2 Data warehouse: Like a database, a data warehouse is a place to store data.
However, while a database is designed to capture data for storage, retrieval,
and use, a data warehouse is designed to analyze data.
BigQuery is Google Cloud's data warehouse offering.
3.3 Data lake: A data lake is a repository designed to ingest, store, explore,
process, and analyze any type or volume of raw data, regardless of the
source, like operational systems, web sources, social media, or the Internet
of Things (IoT). It can store different types of data in their original format,
regardless of size limits, and without much pre-processing or added structure.
4. The role of data in digital transformation
5. The data value chain
6. Data governance
Data governance means setting internal standards—data policies—that apply
to how data is gathered, stored, processed, and disposed of.
It governs who can access certain data and what data is under governance.
It also involves complying with external standards set by industry
associations, government agencies, and other stakeholders.
Data governance brings several benefits.
It's possible that organizations without an effective data governance program
will suffer from compliance violations.
Chapter 2 Google Cloud data management solutions
1. Unstructured data storage
Google Cloud offers several core storage products. This list includes Cloud
Storage, Cloud SQL, Spanner, BigQuery, Firestore, and Bigtable.
1.1 Let's begin with Cloud Storage, which is a service that offers developers
and IT organizations durable and highly available object storage.
There are four primary storage classes in Cloud Storage.
Cloud Storage also provides a feature called Autoclass, which automatically
transitions objects to the most appropriate storage class based on each
object's access pattern.
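A minimal sketch of using Cloud Storage from Python, assuming the google-cloud-storage client library is installed, credentials are configured, and a bucket already exists; the bucket and object names are hypothetical.

```python
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()
bucket = client.bucket("my-demo-bucket")        # hypothetical bucket name

blob = bucket.blob("reports/2024/summary.csv")  # hypothetical object path
blob.upload_from_filename("summary.csv")        # uploads in the bucket's default class

# Move an infrequently accessed object to a colder storage class.
blob.update_storage_class("NEARLINE")
print(blob.storage_class)
```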
2. Structured data storage
Earlier in the course, we mentioned that a relational database stores
information in tables, rows, and columns that have a clearly defined schema
that represents the structure or logical configuration of the database. Cloud
SQL offers fully managed relational databases, including MySQL,
PostgreSQL, and SQL Server as a service.
When considering which option is best for your business, consider this: if
you have outgrown any relational database, are sharding your databases for
high throughput, need transactional consistency, global data,
and strong consistency, or just want to consolidate your databases, consider
using Spanner.
If you don’t need horizontal scaling or a globally available system, Cloud
SQL is a cost-effective solution.
The final structured data storage solution that we’ll explore is BigQuery.
BigQuery is a fully-managed data warehouse.
As we’ve already learned, a data warehouse is a large store that contains
petabytes of data gathered from a wide range of sources within an
organization and is used to guide management decisions.
Because it’s fully managed, BigQuery takes care of the underlying
infrastructure, so users can focus on using SQL queries to answer business
questions, without having to worry about deployment, scalability, and
security.
BigQuery provides two services in one: storage and analytics.
BigQuery also has built-in machine learning features so that ML models can
be written directly in BigQuery by using SQL. And if other professional
tools—such as Vertex AI from Google Cloud—are used to train ML models,
datasets can be exported from BigQuery directly into Vertex AI for a
seamless integration across the data-to-AI lifecycle.
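As a minimal sketch, the BigQuery Python client can run a SQL query against one of the public sample datasets (assumes google-cloud-bigquery is installed and credentials are configured).

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# Runs the query job and waits for the result rows.
for row in client.query(query).result():
    print(row.name, row.total)
```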
3. Semi-structured data storage
Google Cloud offers two semi-structured data storage products: Firestore
and Bigtable.
Firestore is a flexible, scalable NoSQL document database designed for
automatic scaling, high performance, and ease of application development.
And then there's Bigtable, Google's NoSQL big data database service. It's
the same database that powers many core Google services, including Search,
Analytics, Maps, and Gmail.
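A minimal sketch of writing and reading a document with the Firestore Python client, the document-oriented option mentioned above (assumes google-cloud-firestore is installed and credentials are configured; collection and field names are hypothetical).

```python
from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()

# Documents are flexible: fields don't have to follow a fixed schema.
db.collection("users").document("alice").set({
    "name": "Alice",
    "plan": "premium",
    "signup_year": 2024,
})

doc = db.collection("users").document("alice").get()
print(doc.to_dict())
```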
4. Choosing the right storage product
If data is unstructured, then Cloud Storage is the most appropriate option.
You have to decide on a storage class: Standard, Nearline, Coldline, or Archive,
or whether to let the Autoclass feature decide that for you.
If data is structured or semi-structured, choosing a storage product will
depend on whether workloads are transactional or analytical.
Transactional workloads stem from online transaction processing, or OLTP,
systems, which are used when fast data inserts and updates are required to
build row-based records. An example of this is point-of-sale transaction
records.
Then there are analytical workloads, which stem from online analytical
processing, or OLAP systems, which are used when entire datasets need to
be read. They often require complex queries, for example, aggregations. An
example here would be analyzing sales history to see trends and aggregated
views.
After you determine if the workloads are transactional or analytical, you
must determine whether the data will be accessed by using SQL. So, if your
data is transactional and you need to access it by using SQL, then Cloud
SQL and Spanner are two options.
5. Database migration and modernization
Running modern applications on legacy, on-premises databases requires
overcoming expensive, time-consuming challenges around latency,
throughput, availability, and scaling.
There are different ways that an organization can migrate or modernize their
current database in the cloud.
The most straightforward method is a lift and shift platform migration.
Google Cloud’s Database Migration Service (DMS) can easily migrate your
databases to Google Cloud, or Datastream can be used to synchronize data
across databases, storage systems, and applications.
Chapter 3 Making data useful and accessible
1. Business intelligence and insights using Looker
Looker is a Google Cloud business intelligence (BI) platform designed to
help individuals and teams analyze, visualize, and share data.
2. Streaming analytics
Generally, streaming analytics is useful for the types of data sources that
send data in small sizes, often in kilobytes, in a continuous flow as the
data is generated.
Sources of streaming data include equipment sensors, clickstreams, social
media feeds, stock market quotes, app activity, and more.
Companies use streaming analytics to analyze data in real time and
provide insights into a wide range of activities, such as metering, server
activity, geolocation of devices, or website clicks.
Use cases include:
Ecommerce: User clickstreams can be analyzed to optimize the shopping
experience with real-time pricing, promotions, and inventory
management.
Financial services: Account activity can be analyzed to detect abnormal
behavior in the data stream and generate a security alert.
Investment services: Market changes can be tracked and settings adjusted
to customer portfolios based on configured constraints, such as selling
when a certain stock value is reached.
News media: User click records can be streamed from various news
source platforms and the data can then be enriched with demographic
information to better serve articles that are relevant to the targeted
audience.
Utilities: Throughput across a power grid can be monitored and alerts
generated or workflows initiated when established thresholds are
reached.
Google Cloud offers two main streaming analytics products to ingest,
process, and analyze event streams in real time, which makes data more
useful and accessible from the instant it’s generated.
Pub/Sub ingests hundreds of millions of events per second, while Dataflow
unifies streaming and batch data analysis and builds cohesive data
pipelines.
3. Pub/Sub and Dataflow
One of the early stages in a data pipeline is data ingestion, which is
where large amounts of streaming data are received. Data, however, may
not always come from a single, structured database.
“Process” in this case refers to the steps to extract, transform, and load
data, sometimes referred to as ETL.
A popular solution for pipeline design is Apache Beam.
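A minimal Apache Beam sketch of the extract-transform-load idea. It runs locally with the DirectRunner; the same pipeline could be submitted to Dataflow by changing the pipeline options. The sample records are hypothetical.

```python
import apache_beam as beam  # pip install apache-beam

def parse(line):
    """Transform step: split a raw CSV record into (customer, amount)."""
    customer, amount = line.split(",")
    return customer, float(amount)

with beam.Pipeline() as pipeline:  # DirectRunner by default
    (
        pipeline
        | "Extract" >> beam.Create(["alice,12.50", "bob,7.25", "alice,3.00"])
        | "Transform" >> beam.Map(parse)
        | "Aggregate" >> beam.CombinePerKey(sum)
        | "Load" >> beam.Map(print)  # a real pipeline would write to BigQuery, etc.
    )
```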
Lesson 3 Innovating with Google Cloud Artificial Intelligence
Chapter 1 AI and ML Fundamentals
1. AI and ML defined
1.1 Artificial intelligence is a broad field that refers to the use of
technologies to build machines and computers that can mimic
cognitive functions associated with human intelligence. These
functions include being able to see, understand, and respond to
spoken or written language, analyze data, make recommendations, and
more.
1.2 Machine learning is a subset of AI that lets a machine learn from data
without being explicitly programmed.
1.3 Another area of AI you may be hearing a lot about is generative AI.
This is a type of artificial intelligence that can produce new content,
including text, images, audio, and synthetic data.
2. How AI and ML differ from data analytics and business intelligence
Most data analysis and business intelligence is based on historical
data, used to calculate metrics or identify trends. But to create value in
your business, you need to use that data to make decisions for future
business. This is where artificial intelligence and machine learning
come in.
3. Problems that ML is suited to solve
ML is suited to solve four common business problems.
-The first is replacing or simplifying rule-based systems.
-A second business problem ML can help solve relates to automating
processes.
-A third type of business problem that ML can help solve is
understanding unstructured data like images, videos, and audio.
-And finally, there's personalization.
4. Why ML requires high-quality data
An ML model can't make accurate predictions by learning from
incorrect data.
To assess its quality, data is evaluated against six dimensions:
completeness, uniqueness, timeliness, validity, accuracy, and
consistency.
5. The importance of responsible and explainable AI
The principles state that AI should be socially beneficial, avoid
creating or reinforcing unfair bias, be built and tested for safety, be
accountable to people, incorporate privacy design principles, uphold
high standards of scientific excellence, and be made available for
uses that accord with these principles.
In addition to these principles, Google will not design or deploy AI in
the following application areas.
Chapter 2 Google Cloud's AI and ML Solutions
Google Cloud offers four options for building machine learning models.
-The first option is BigQuery ML. This is a tool for using SQL queries to
create and execute machine learning models in BigQuery.
-The second option is to use pre-trained APIs, or application programming
interfaces. This option lets you use machine learning models that were built
and trained by Google, so you don't have to build your own ML models if
you don't have enough training data or sufficient machine learning expertise
in-house.
- The third option is AutoML, which is a no-code solution, letting you build
your own machine learning models on Vertex AI through a point-and-click
interface.
- And finally, there's custom training through which you can code your very
own machine learning environment, the training, and the deployment, which
gives you flexibility and provides control over the ML pipeline.
1. BigQuery ML
BigQuery ML democratizes the use of machine learning by empowering
data analysts, the primary data warehouse users, to build and run models by
using existing business intelligence tools and spreadsheets. Predictive
analytics can guide business decision-making across the organization. Using
Python or Java to program an ML solution isn't necessary. Models are
trained and accessed directly in BigQuery by using SQL, which is a language
familiar to data analysts.
BigQuery ML brings machine learning to the data. It reduces complexity
because fewer tools are required. It also increases speed of production
because moving and formatting large amounts of data for Python-based ML
frameworks is not required for model training in BigQuery. BigQuery ML
also integrates with Vertex AI, Google Cloud's end-to-end AI and ML
platform.
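A minimal sketch of what this looks like in practice: the model is created and queried with SQL, here submitted through the BigQuery Python client. The dataset, table, and column names are hypothetical and assume the dataset and training table already exist.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# CREATE MODEL is plain SQL; no Python or Java ML code is required.
create_model_sql = """
    CREATE OR REPLACE MODEL `my_dataset.sales_forecast`   -- hypothetical names
    OPTIONS(model_type='linear_reg', input_label_cols=['sales']) AS
    SELECT ad_spend, holiday_flag, sales
    FROM `my_dataset.weekly_sales`
"""
client.query(create_model_sql).result()

# Predictions are also just SQL.
predict_sql = """
    SELECT * FROM ML.PREDICT(MODEL `my_dataset.sales_forecast`,
        (SELECT 1000.0 AS ad_spend, 0 AS holiday_flag))
"""
for row in client.query(predict_sql).result():
    print(dict(row))
```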
2. Pre-trained APIs
Google Cloud's pre-trained APIs are a great option if you don't have your
own training data.
Google Cloud's pre-trained APIs can help developers build smart apps
quickly by providing access to ML models for common tasks like analyzing
images, videos, and text.
Google also offers several other pre-trained APIs.
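For instance, a minimal sketch of calling the pre-trained Vision API for image labels (assumes google-cloud-vision is installed and credentials are configured; the image URI is hypothetical).

```python
from google.cloud import vision  # pip install google-cloud-vision

client = vision.ImageAnnotatorClient()

image = vision.Image()
image.source.image_uri = "gs://my-demo-bucket/storefront.jpg"  # hypothetical object

# The model is already trained by Google; no training data is needed.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))
```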
3. AutoML
Another more custom way to use machine learning to solve problems is
to train models by using your own data. This is where Vertex AI comes
in. Vertex AI brings together Google Cloud services for building ML
under one unified user interface.
4. Custom models
Vertex AI is also the essential platform for creating custom end-to-end
machine learning models.
Vertex AI provides a suite of products to help at each stage of the ML
workflow, from gathering data to feature engineering, building models,
and finally, deploying and monitoring those models.
5. TensorFlow
TensorFlow has a flexible ecosystem of tools, libraries, and community
resources that enable researchers to innovate in ML and developers to
build and deploy ML-powered applications.
TensorFlow takes advantage of the Tensor Processing Unit, or TPU,
which is Google's custom-developed, application-specific integrated
circuit used to accelerate machine learning workloads. TPUs act as
domain-specific hardware, as opposed to general-purpose hardware like
CPUs and GPUs. With TPUs, the computing speed increases more than
200 times.
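A minimal Keras sketch of defining and training a small TensorFlow model on synthetic data, purely to illustrate the programming model.

```python
import numpy as np
import tensorflow as tf  # pip install tensorflow

# Synthetic data: learn y = 3x + 1 from noisy samples.
x = np.random.rand(256, 1).astype("float32")
y = 3 * x + 1 + 0.05 * np.random.randn(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=20, verbose=0)

# Should predict a value close to 3 * 2 + 1 = 7.
print(model.predict(np.array([[2.0]], dtype="float32")))
```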
6. AI solutions
Beyond the customizable options, Google Cloud has also created a set of
full AI solutions aimed to solve specific business needs.
Contact Center AI provides models for speaking with customers and
assisting human agents, increasing operational efficiency, and
personalizing customer care to transform your contact center.
Document AI unlocks insights by extracting and classifying information
from unstructured documents such as invoices, receipts, forms, letters,
and reports. The extracted data can then be saved in a database or
exported to another application for further analysis.
Discovery AI for retail uses machine learning to select the optimal
ordering of products on a retailer's e-commerce site when shoppers
choose a category like winter jackets or kitchenware.
And Cloud Talent Solution uses AI with job search and talent acquisition
capabilities, matches candidates to ideal jobs faster, and allows
employers to attract and convert higher quality candidates.
7. Considerations when selecting Google Cloud AI/ML solutions
- The first consideration is speed: How quickly do you need to get your
model to production? AI projects can typically take anywhere from 3 to 36
months to plan and implement, depending on the scope and complexity
of the use case. But business decision makers often underestimate the
time it will take. Pre-trained APIs require no model training, because that
time-consuming task has already been carried out. Custom training
usually takes the longest time because it builds the ML model from the
beginning, unlike AutoML and BigQuery ML.
- The next consideration is differentiation: How unique is your model, or
how unique does it need to be? Google Cloud offers a range of out-of-the-
box solutions for organizations that want to quickly use ML models in
their day-to-day business operations. These include image recognition
solutions and chatbots, which are quick to deploy and can be applied in
various use cases. Alternatively, Vertex AI, which is Google Cloud's
unified platform for building, deploying, and managing AI solutions, can
give ML engineers and data scientists full control of the ML workflow.
Vertex AI custom training lets you train and serve custom models with
code on Vertex AI Workbench, which results in highly bespoke ML models.
- The next consideration is the expertise required when embarking on an AI or
ML project: Infusing AI into business processes requires roles such as data
engineers, data scientists, and machine learning engineers, among others.
Organizations should consider their current team and then determine a
people strategy, which could include reusing or repurposing existing
resources, upskilling and training current staff, or hiring or working with
outside consultants or contractors. Google Cloud's AI and ML products
vary from those that can be employed by data analysts and business
intelligence teams, right up to those more suited to ML engineers and
data scientists.
- The final consideration is the effort required to build an AI solution. This depends on several
factors, including the complexity of the problem, the amount of data
available, and the experience of the team.
Lesson 4 Modernize Infrastructure and Applications with Google
Cloud
Chapter 1 Modernizing Infrastructure in the Cloud
In the context of the cloud, compute refers to a machine's ability to
process information.
In this section of the course, you'll learn about the benefits that cloud computing
can bring to an organization and explore three cloud computing options: virtual
machines, containers, and serverless.
1. The benefits of running compute workloads in the cloud
Let's explore some benefits that running compute workloads in the Cloud
can bring to an organization.
We'll begin with total cost of ownership, or TCO, which is a measure of the
total cost of a system or solution over its lifetime.
Cloud computing can help businesses save money on IT costs by eliminating
the need to purchase and maintain physical infrastructure. Cloud providers
offer a pay-as-you-go model, which means that organizations only pay for
the resources used. They also offer discounts for long term commitments,
which can further reduce TCO for businesses that are planning to use Cloud
services for a long period.
Next, there is scalability, which refers to the ability to increase or decrease
the number of resources such as servers, storage, and bandwidth that are
available to a Cloud-based application to meet changing demand.
Another benefit to Cloud computing is reliability. Cloud providers offer a
high degree of reliability and up-time, which gives businesses confidence
that their data and applications will be available when they need them.
Cloud providers have many ways to ensure the reliability of their services.
Google Cloud for example has multiple data centers located in different
parts of the world. This helps to ensure that if one data center goes down,
the others can continue to operate. Cloud providers also use various
technologies to monitor their services and automatically detect and fix
problems.
Next is security. Cloud computing providers offer a high level of security for
data and applications. Organizations need to be sure that their data is being
kept safe. In addition to physical data center security, Cloud security
features include data encryption, identity and access management, network
security, virtual private Clouds, and monitoring services that can detect and
respond to security threats in real time.
Running compute workloads in the Cloud offers a high degree of flexibility
for organizations. Organizations can choose the Cloud services that best
meet their needs at any point in time, and then change or adapt those
services when necessary. For example, a business that needs to increase the
amount of storage space that it uses can easily add more storage space to its
Cloud storage service.
Finally, another benefit of running compute workloads in the Cloud is
abstraction. Abstraction refers to how Cloud providers remove the need for
customers to understand the finer details of the infrastructure
implementation by providing management of the hardware, software, and
certain aspects of security and networking. For example, a Cloud storage
provider might provide a way for customers to store files so that they don't
have to worry about the finer details of how the files are stored on the Cloud
providers' infrastructure. Abstraction also lets Cloud providers offer many
services. For example, Google Workspace lets customers run productivity
applications so that they don't have to worry about the details of how the
applications are actually run or maintained on Google's infrastructure.
Running compute workloads in the Cloud can help organizations get their
products and services to market faster by eliminating the need to develop
and maintain their own infrastructure. At the same time, it provides a
platform for innovation by providing access to the latest technologies and
tools as and when they are released.
2. Virtual machines
Virtualization is a form of resource optimization that lets multiple systems
run on the same hardware. These systems are called Virtual Machines or
VMs. This means that they share the same pool of processing, storage, and
networking resources.
VMs enable organizations to run multiple applications at the same time on a
server in a way that is efficient and manageable.
Compute Engine is Google Cloud's infrastructure-as-a-service product that
lets users create and run virtual machines on Google infrastructure.
An API, or application programming interface, is a set of instructions that
allows different software programs to communicate with each other.
When you use virtual machines, Compute Engine bills by the second, with a
one-minute minimum, and sustained-use discounts start to apply
automatically to virtual machines the longer they run: for each VM that runs
for more than 25% of a month, Compute Engine automatically applies a
discount for every incremental hour of use.
Compute Engine also offers committed-use discounts. This means that when
committing to use resources for either a one-year or three-year period,
discounts are offered over the on-demand prices.
And then there are preemptible and spot VMs.
Let's say that a workload doesn't require a human to sit and wait for it to
finish, such as a batch job analyzing a large dataset. Costs can be reduced, in
some cases by up to 90%, by choosing preemptible or spot VMs to run the
job. A preemptible or spot VM differs from an ordinary Compute Engine
VM in only one respect: Compute Engine has permission to terminate the VM
if its resources are needed elsewhere. Although savings are possible with
preemptible or spot VMs, you need to ensure that a job can be stopped
and restarted without impact.
Spot VMs differ from preemptible VMs by offering more features. For
example, preemptible VMs can only run for up to 24 hours at a time, but
spot VMs don't have a maximum run time. However, the pricing is currently
the same for both.
Finally, Compute Engine lets users choose the machine properties of their
instances, like the number of virtual CPUs, the operating system, and the
amount of memory by using a set of predefined machine types, or by
creating custom machine types.
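A minimal sketch of creating a VM with a predefined machine type using the Compute Engine Python client (assumes google-cloud-compute is installed and credentials are configured; the project, zone, and VM name are hypothetical).

```python
from google.cloud import compute_v1  # pip install google-cloud-compute

project, zone = "my-project", "us-central1-a"  # hypothetical project and zone

boot_disk = compute_v1.AttachedDisk(
    boot=True,
    auto_delete=True,
    initialize_params=compute_v1.AttachedDiskInitializeParams(
        source_image="projects/debian-cloud/global/images/family/debian-12",
        disk_size_gb=10,
    ),
)

instance = compute_v1.Instance(
    name="demo-vm",
    machine_type=f"zones/{zone}/machineTypes/e2-medium",  # predefined machine type
    disks=[boot_disk],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the create operation finishes
print("created", instance.name, "in", zone)
```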
3. Containers
Infrastructure as a service, or IaaS, lets users share compute resources with
other developers by using virtual machines to virtualize the hardware.
Containers follow the same principle as virtual machines. They provide
isolated environments to run software services and optimize resources from
one piece of hardware.
However, they're even more efficient.
The key difference between virtual machines and containers is that virtual
machines virtualize an entire machine down to the hardware layers, whereas
containers only virtualize the software layers above the operating system level.
Containers start faster and use a fraction of the memory compared to booting
an entire operating system. A container is packaged with your application
and all of its dependencies, so it has everything it needs to run.
A microservices architecture is made up of smaller, individual services that run
containerized applications and communicate with each other through APIs
or other lightweight communication methods, such as REST or gRPC.
4. Managing containers
Containers improve agility, enhance security, optimize resources and
simplify managing applications in the cloud.
Many organizations have a mix of virtual machines and containers. However, as
their IT infrastructure setup becomes more complex, they often need a way to
manage their services and machines.
Kubernetes, originally developed by Google, is an open-source platform for
managing containerized workloads and services. It makes it easy to orchestrate
many containers on many hosts, scale them, and easily deploy rollouts and
rollbacks. This improves application reliability and reduces the time and resources
needed to spend on management and operations.
Google Kubernetes Engine, or GKE, is a Google-hosted, managed Kubernetes
service in the cloud. The GKE environment consists of multiple machines,
specifically Compute Engine instances, grouped together to form a cluster. GKE
clusters can be customized, and they support different machine types, numbers of
nodes, and network settings. GKE makes it easy to deploy applications by providing
an API and a web-based console. Applications can be deployed in minutes and can
be scaled up or down as needed. GKE also provides many features that can help
monitor applications, manage resources, and troubleshoot problems.
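As a sketch of what orchestrating containers looks like in practice, the official Kubernetes Python client below creates a small Deployment on an existing cluster. It assumes the kubernetes package is installed and your kubeconfig already points at a GKE cluster (for example, after running gcloud container clusters get-credentials); the names are illustrative and the image is a public Google sample app.

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()  # uses the cluster selected in your local kubeconfig

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="hello-web"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # Kubernetes keeps three replicas running and replaces failed ones
        selector=client.V1LabelSelector(match_labels={"app": "hello-web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "hello-web"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="hello-web",
                        image="gcr.io/google-samples/hello-app:1.0",  # public sample image
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ]
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```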
Another popular option for running containerized applications on Google Cloud is
Cloud Run. Cloud Run is a fully managed serverless platform to deploy and run
containerized applications without needing to worry about the underlying
infrastructure.
5. Serverless computing
Serverless computing doesn't mean there's no server, it means that resources
like compute power are automatically provisioned in the background as
needed.
The advantage here is that organizations won't pay for compute power unless
they're running a query or application. At its simplest definition, serverless
means that businesses provide the code for whatever function they want and
the public Cloud provider does everything else.
One type of serverless computing solution is called functions as a service.
Some functions run in response to specific events, like file uploads to Cloud
Storage or changes to database records.
Google Cloud offers many serverless computing products.
The first is Cloud Run, which is a fully managed environment for running
containerized applications. With this product, you don't have to worry about
the underlying infrastructure.
Then there is Cloud Functions, which is a platform for hosting simple,
single-purpose functions that are attached to events emitted from your cloud
infrastructure and services.
There is also App Engine, which is a service to build and deploy web
applications.
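A minimal sketch of a single-purpose HTTP function using the Functions Framework for Python, the same programming model used by Cloud Functions; the function name and behavior are illustrative.

```python
import functions_framework  # pip install functions-framework

@functions_framework.http
def hello(request):
    """Responds to an HTTP request; no servers to provision or manage."""
    name = request.args.get("name", "world")
    return f"Hello, {name}!"

# Local test: run `functions-framework --target=hello`
# then visit http://localhost:8080/?name=Cloud
```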
Chapter 2 Modernizing Applications in the Cloud
1. The benefits of modern cloud application development
There are many benefits to the modern cloud application development approach.
We'll begin with architecture. Modern cloud applications are typically built as a
collection of microservices. Microservices are independently deployable, scalable
and maintainable components that can be used to build a wide range of
applications. This can help organizations bring business value to market faster
because features can be released as they're completed without waiting for the rest
of the application to be complete.
Regarding deployment, modern applications are typically deployed to the cloud
and can use managed or partially managed services. Managed services take care of
the day-to-day management of cloud-based infrastructure, such as patching,
upgrades, and monitoring. This can free up staff to focus on other tasks, such as
developing new applications. Partially managed services offer a hybrid approach,
where businesses manage some aspects of their cloud-based applications
themselves and the cloud provider manages others.
In terms of cost, modern cloud applications use a pay-as-you-go pricing model,
which can make them extremely cost-effective when configured efficiently. That
means that organizations don't always need to pay for resources they aren't fully
utilizing. Developers can also use prebuilt APIs, which we'll explore later in this
section of the course, and other tools offered by the cloud provider to build and
deploy their applications quicker.
And then there's scalability. Modern cloud-based applications can easily be scaled
up or down to meet user demands. Modern cloud applications are designed to be
highly available and resilient, with built-in features like load balancing, which is the
process of distributing network traffic evenly across multiple servers that support
an application, and automatic failover, which is a process that allows a cloud-
based application to automatically switch to a backup server if a failure occurs.
Additionally, cloud service providers typically offer robust monitoring and
management tools that allow developers to quickly identify and respond to issues,
which can further improve the reliability of cloud applications.
2. Rehosting legacy application in the cloud
When a business decides to modernize and move its operations to the
cloud, it might be running several specialized legacy applications that
aren’t compatible with cloud-native applications. In these situations, a
business might take a rehost migration path, commonly referred to as
lift and shift, where an application is moved from an on-premises
environment to a cloud environment without making any changes to the
application itself.
Rehosting applications brings with it the many benefits of cloud
computing that we explored earlier, such as cost savings, scalability,
reliability, and security.
However, there are also some potential drawbacks to choosing a rehost
migration path for legacy applications, including:
Complexity: rehosting can be a complex process. Businesses need to
carefully plan the migration process and ensure that they have the right
resources in place.
Risk: migrating applications to the cloud always involves some risk.
Businesses need to carefully assess and identify potential risks and
ensure that they have a plan in place in case of any problems.
Vendor lock-in: by moving applications to the cloud, businesses might
become locked into a particular cloud provider.
This can potentially make it difficult to switch providers later. Google
Cloud offers many solutions for rehosting specialized legacy applications.
The first is Google Cloud VMware Engine, which helps migrate existing
VMware workloads to the cloud without having to rearchitect the
applications or retool operations. With Google Cloud VMware Engine,
organizations can maintain their existing VMware environments and
operational processes, while benefiting from the scalability, security,
reliability of Google Cloud. By doing this, organizations can also
access a range of Google Cloud services such as BigQuery, AI/ML, and
Google Kubernetes Engine, which lets them modernize their application
environment and use new capabilities and technologies as needed.
And for organizations with legacy applications on Oracle, Google Cloud
offers Bare Metal Solution. This is a fully managed cloud
infrastructure solution that lets organizations run their Oracle workloads
on dedicated, bare metal servers in the cloud.
3. Application programming interfaces (APIs)
Implementing a software service can be complex and changeable. And if
each software service that an organization uses has to be coded for each
implementation, the result can be fragile and error-prone. One way to
make things easier is to use APIs or application programming interfaces.
An API is a set of instructions that lets different software programs
communicate with each other.
Google itself provides many APIs that let developers access its products
and services. These include APIs that use the power of Google to search
across a website or collection of websites, APIs that let developers access
Google Maps data such as maps, directions and traffic information, and
APIs that let developers translate text from one language to another.
Using APIs can create new business opportunities for organizations and
improve online experiences for users.
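For example, a minimal sketch of calling the Cloud Translation API from Python, one of the Google APIs mentioned above (assumes google-cloud-translate is installed and credentials are configured).

```python
from google.cloud import translate_v2 as translate  # pip install google-cloud-translate

client = translate.Client()

result = client.translate("Where is the nearest train station?", target_language="de")
print(result["translatedText"])
print(result["detectedSourceLanguage"])  # the source language is detected automatically
```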
4. Apigee and multi- cloud
When an organization has implemented APIs, it's important to maintain
and manage them effectively. This can be done with a platform such as
Apigee, Google Cloud's API management service, which lets organizations
operate APIs with enhanced scale, security, and automation.
Apigee is a popular choice for organizations that need to manage their
APIs because it offers many benefits. It helps organizations secure their
APIs by providing features such as authentication, authorization, and data
encryption. It tracks and analyzes API usage with real-time analytics and
historical reporting. It helps with developing and deploying APIs through
a visual API editor and a test sandbox. It offers API versioning, API
documentation, and even API throttling, which is the process of limiting
the number of API requests a user can make in a certain period.
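Apigee applies throttling as a managed policy, so you don't implement it yourself; purely to illustrate the idea behind throttling, here is a minimal fixed-window rate limiter sketch in Python. The window length and request limit are invented for illustration and are not Apigee defaults.
    # Illustrative only: a fixed-window rate limiter showing the idea behind
    # API throttling. This is NOT how Apigee implements its throttling policy.
    import time
    from collections import defaultdict

    WINDOW_SECONDS = 60   # length of the counting window
    MAX_REQUESTS = 100    # allowed requests per user per window

    request_counts = defaultdict(int)
    window_start = defaultdict(float)

    def allow_request(user_id: str) -> bool:
        """Return True if the user is still under the per-window limit."""
        now = time.time()
        if now - window_start[user_id] >= WINDOW_SECONDS:
            window_start[user_id] = now
            request_counts[user_id] = 0
        request_counts[user_id] += 1
        return request_counts[user_id] <= MAX_REQUESTS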
5. Hybrid and multi-cloud
How can organizations modernize their IT infrastructure without
completely migrating to the cloud? How can they maintain flexibility and
avoid lock-in?
Two options are hybrid and multi-cloud solutions.
A hybrid cloud environment comprises some combination of on-premises
or private cloud infrastructure and public cloud services. Interconnects
between the private and public clouds allow interoperability.
A multi-cloud environment is where an organization uses multiple public
cloud providers as part of its architecture. This is ideal for organizations
that need flexibility and secure connectivity between the different
networks.
An organization might choose to use hybrid cloud, multi-cloud, or a
combination of both if it wants to incorporate specific elements of a
public cloud to benefit from the main strengths of that provider.
This lets organizations keep parts of their systems' infrastructure
on-premises while they move other parts to the cloud, creating an
environment that is uniquely suited to the organization's needs. They can
move only specific workloads to the cloud, because a full-scale migration
is not required for this approach to work; benefit from the flexibility,
scalability, and lower computing costs offered by cloud services for
running specific workloads; and add specialized services, such as machine
learning, content caching, data analysis, long-term storage, and IoT
(Internet of Things), to the organization's computing resources toolkit.
How can Google Cloud help in this context?
Google's answer to modern hybrid and multi-cloud distributed systems
and services management is called GKE Enterprise. GKE Enterprise is a
managed, production-ready platform for running Kubernetes applications
across multiple cloud environments. It provides a consistent way to
manage Kubernetes clusters, applications, and services regardless of
where they are running. Some of the benefits of GKE Enterprise include:
-Multi-cloud and hybrid-cloud support: GKE Enterprise can run
Kubernetes clusters on Google Cloud, AWS, Azure, and other public
clouds.
-Centralized management: GKE Enterprise provides a single, centralized
console for managing Kubernetes clusters and applications.
-Security and compliance: GKE Enterprise includes many features that
help secure Kubernetes clusters and applications and comply with
industry regulations.
-Networking and load balancing: GKE Enterprise includes a number of
features that help network and load balance Kubernetes applications.
-Monitoring and logging: GKE Enterprise provides a rich set of tools for
monitoring and maintaining application consistency across an entire
network, whether on-premises, in the cloud, or in multiple clouds.
Lesson 5 Trust and Security with Google Cloud
Chapter 1 Trust and Security in the Cloud
1. Key security terms and concepts
In the field of cloud security, understanding the terminology is crucial
to navigating the landscape effectively. In this lesson, we introduce
essential security terms and concepts that are commonly encountered
when discussing cloud security.
1.1The first three concepts relate to reducing the risk of unauthorized
access to sensitive data.
The privileged access security model grants specific users access
to a broader set of resources than ordinary users. For example, a
system administrator may have privileged access to perform tasks
such as troubleshooting and data restoration. However, the misuse
of privileged access can pose risks, so it’s essential to manage and
monitor such access carefully.
The least privilege security principle advocates granting users only
the access they need to perform their job responsibilities. By
providing the minimum required access, organizations can reduce
the risk of unauthorized access to sensitive data.
The zero-trust architecture security model assumes that no user or
device can be trusted by default. Every user and device must be
authenticated and authorized before accessing resources. Zero-trust
architecture helps ensure robust security by implementing strict
access controls and continuously verifying user identities.
1.2 These next three concepts relate to how an organization can
protect itself from cyber threats.
-Security by default is a principle that emphasizes integrating
security measures into systems and applications from the initial
stages of development.
By prioritizing security throughout the entire process,
organizations can establish a strong security foundation in their
cloud environments.
-Security posture refers to the overall security status of a cloud
environment. It indicates how well an organization is prepared to
defend against cyber attacks by evaluating their security controls,
policies, and practices.
-Cyber resilience refers to an organization's ability to withstand
and recover quickly from cyber attacks. It involves identifying,
assessing, and mitigating risks, responding to incidents effectively,
and recovering from disruptions quickly.
1.3 Finally, let's explore essential security measures to protect
cloud resources from unauthorized access.
-A firewall is a network device that regulates traffic based on
predefined security rules.
-Encryption is the process of converting data into an unreadable
format by using an encryption algorithm.
-Decryption, however, is the reverse process, which uses an
encryption key to restore encrypted data to its original form.
Safeguarding the encryption key is crucial, because it is the secret
needed to decrypt the data.
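To make the encryption and decryption concepts concrete, here is a small, hedged Python sketch using the widely available cryptography library; the sample plaintext is invented, and the key handling shown is illustrative only, not a recommendation for production key management.
    # Illustrative sketch of encryption and decryption with a symmetric key,
    # using the third-party "cryptography" library (pip install cryptography).
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # the secret that must be safeguarded
    cipher = Fernet(key)

    ciphertext = cipher.encrypt(b"account number: 12345")  # unreadable without the key
    plaintext = cipher.decrypt(ciphertext)                  # restored with the same key
    print(plaintext)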
2. Cloud security components
We’ll first explore three essential aspects of security: Confidentiality,
Integrity, and Availability.
-Confidentiality is about keeping important information safe and secret. It ensures
that only authorized people can access sensitive data, no matter where it's stored or
sent. Confidentiality is of utmost importance in the cloud, as sensitive information
stored and transmitted across cloud environments must be protected from
unauthorized access or disclosure.
-Encryption plays a crucial role in ensuring confidentiality in the cloud. By using
encryption techniques and safeguarding encryption keys, organizations can ensure
that only authorized individuals can access and decrypt sensitive data, effectively
mitigating the risk of data breaches in the cloud.
-Integrity means keeping data accurate and trustworthy. It ensures that information
doesn't get changed or corrupted, no matter where it's stored or how it's moved
around. You can think of it like making sure a message doesn't get altered during
delivery. Integrity in the cloud involves ensuring the accuracy and trustworthiness
of data throughout its lifecycle. Implementing data integrity controls, such as
checksums or digital signatures, enables organizations to verify the authenticity
and reliability of their data in the cloud (a short checksum sketch follows below).
This helps prevent unauthorized modifications or tampering, ensuring the integrity
of critical information stored and processed in cloud environments.
-Availability is all about making sure that cloud systems and services are always
accessible and ready for use by the right people when needed. It's like having a
reliable electricity supply that never goes out. Cloud environments must be
designed with redundancy, failover mechanisms, and disaster recovery plans to
maximize availability and minimize downtime.
By implementing these measures, organizations can ensure that their systems and
applications in the cloud remain accessible whenever needed, promoting business
continuity even in the face of potential disruptions.
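As a small, hedged illustration of the integrity controls mentioned above, the following Python sketch computes a SHA-256 checksum before and after transfer; any mismatch indicates the data was altered. The sample data is an assumption for illustration.
    # Illustrative sketch: verifying data integrity with a SHA-256 checksum.
    import hashlib

    original = b"quarterly sales report"
    checksum_before = hashlib.sha256(original).hexdigest()

    # ... the data is stored or transmitted ...
    received = b"quarterly sales report"
    checksum_after = hashlib.sha256(received).hexdigest()

    # A mismatch would indicate the data was modified or corrupted along the way.
    assert checksum_before == checksum_after, "integrity check failed"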
Control refers to the measures and processes implemented to manage and mitigate
security risks. It involves establishing policies, procedures, and technical
safeguards to protect against unauthorized access, misuse, and potential threats.
Control measures in the cloud include implementing robust authentication
mechanisms, access restrictions, and security awareness training.
These measures help organizations manage and mitigate security risks associated
with cloud-based systems. By ensuring that only authorized individuals have
access to sensitive data and systems in the cloud, organizations can reduce the risk
of data breaches and unauthorized activities.
Finally, compliance relates to adhering to industry regulations, legal requirements,
and organizational policies. It involves ensuring that security practices and
measures align with established standards and guidelines. Meeting compliance
standards in the cloud demonstrates an organization's commitment to data privacy
and security, building trust with stakeholders, and minimizing legal and financial
risks.
By integrating these principles into a comprehensive cloud security model,
organizations can establish a strong foundation for protecting their data,
maintaining data integrity, and ensuring continuous access to critical resources.
3. Cloud security versus traditional on-premises security
Let's explore these important differences.
3.1The first is location. Cloud security involves hosting and managing
data and applications in off-site data centers operated by cloud
service providers. The responsibility for securing the infrastructure
and underlying hardware lies with the cloud provider. Conversely,
traditional on-premises security involves hosting and managing
data and applications locally on an organization's own servers and
infrastructure, granting direct control and responsibility for
securing the physical and virtual environment.
3.2Next is responsibility. In a cloud model, the cloud service provider
is responsible for securing the infrastructure, network, and physical
facilities. The customer is typically responsible for securing their
data, applications, user access, and configurations. On the other
hand, in an on-premises setup, the organization is responsible for
securing the entire infrastructure, including hardware, network,
operating systems, applications, and data.
3.3The next difference is scalability. Cloud security offers scalability
and elasticity, which allows organizations to easily scale their
resources up or down based on demand.
This flexibility is suitable for dynamic workloads and rapid
growth. In contrast, on-premises security requires organizations to
provision and maintain their own infrastructure, which can be more
time-consuming and costly when they scale up or down.
3.4 Next is maintenance and updates. Cloud service providers handle
infrastructure maintenance, including security updates, patching,
and software upgrades. Customers can focus on managing their
applications and data without worrying about the underlying
infrastructure.
On-premises environments require organizations to maintain and
update their own infrastructure, involving regular tasks such as
patching, software updates, and hardware upgrades.
3.5The final difference is capital expenditure. Cloud security follows
an operational expenditure (OpEx) model, where organizations pay
for the services they consume on a subscription basis. This
eliminates the need for large upfront capital investments in
physical security infrastructure. Traditional on-premises security
models involve significant capital expenditure (CapEx), because
organizations must purchase and maintain their own security
infrastructure.
4. Cybersecurity threats
In today's fast-paced digital world, we’re bombarded with attention-
grabbing headlines.
So, what are some common cybersecurity threats faced by
organizations?
4.1First is deceptive social engineering. Imagine that a skilled
manipulator is seeking to extract confidential system information
from unsuspecting individuals. These cybercriminals employ
“phishing attacks,” which collect personal details about you, your
employees, or your students. They skillfully craft tailored emails
and mimic authenticity to deceive their targets. Therefore, anyone
within your organization can be tricked into inadvertently
downloading malicious attachments, divulging passwords, or
compromising sensitive data.
4.2Next is physical damage. Whether it be damage to hardware
components, power disruptions, or natural disasters such as floods,
fires, and earthquakes, organizations are responsible for
safeguarding data even in the face of physical adversity. You can
think of this as protecting precious assets amidst nature's
unforgiving forces.
4.3Another threat is malware, viruses, and ransomware. These digital
adversaries architect chaos within the cyber domain. Employing
malicious software, they aim to disrupt operations, inflict damage,
or gain unauthorized access to computer systems. The most
insidious of these is ransomware, where crucial files are held
hostage until a considerable ransom is paid. It's like witnessing the
digital equivalent of a calculated extortion scheme.
4.4The next cybersecurity threat is vulnerable third-party systems.
Imagine inviting a trusted ally into your domain, only to discover
that they inadvertently compromise your security. Many
organizations rely on third-party systems for essential functions
such as finance, inventory management, or account operations.
However, without adequate security measures and regular
evaluations, these systems can transform into potential threats,
leaving data security vulnerable. It's like using a tool that
unwittingly introduces risks to your own treasured possessions.
4.5The final threat is configuration mishaps. Even the most seasoned
experts make mistakes. Misconfiguration occurs when errors arise
during the setup or configuration of resources, which inadvertently
exposes sensitive data and systems to unauthorized access. Surveys
consistently identify misconfiguration as the most prominent threat
to cloud security.
In turn, adopting the principles of least privilege and privileged access
is imperative, because they allow resource access only when
explicitly required and authorized. This is like granting access only
to those who have earned your trust. As technology continues to
advance at an astonishing pace, organizations must invest in the
right expertise to assess, develop, implement, and maintain robust
data security plans.
Chapter 2 Google Trusted Infrastructure
Our multilayered strategy builds progressive security layers,
combining global data centers, purpose-built servers, custom
security hardware and software, and two-step authentication.
1. Data centers
Google is committed to minimizing the environmental impact of data
centers. By using cutting-edge technologies and renewable energy sources,
we strive to reduce our ecological footprint.
Let's explore the benefits of Google designing and building its own data
centers, using purpose-built servers, advanced networking solutions, and
custom security hardware and software.
One of the greatest advantages of Google's data centers is the
implementation of a zero-trust architecture, which ensures enhanced security
at every level. Our custom hardware and software are purpose-built with
features like tamper-evident hardware, secure boot, and hardware-based
encryption, which establish a strong security posture within the data center
environment.
Physical security is paramount as well, with robust access control measures
and biometric authentication in place. By adopting the principle of least
privilege, only authorized personnel have access to the data centers, which
minimizes the risk of physical breaches and maintains a privileged access
framework.
Furthermore, our data centers embody the concept of security by default.
From the moment you step into a Google data center, you can trust that
every aspect has been designed and implemented with your security in mind.
With cyber resilience as a core principle, our data centers are equipped to
withstand and recover from potential security incidents, and ensure the
continuity and integrity of your data.
Efficiency is another important aspect of our data center design. Purpose-
built servers are optimized for specific tasks, which allows them to perform
at great speed and with exceptional efficiency. This reduces energy
consumption, cuts down on operating costs, and saves resources and the
environment. In fact, we measure our success through the Power Usage
Effectiveness (PUE) score. By continually striving for the lowest PUE
scores, we ensure maximum efficiency in our data centers, leading to
significant cost savings and a reduced carbon footprint.
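As a brief, hedged illustration of how a PUE score is read, the sketch below divides total facility energy by the energy consumed by IT equipment; the figures are invented for illustration and are not Google data.
    # Illustrative only: Power Usage Effectiveness (PUE) is total facility
    # energy divided by the energy used by IT equipment. Values approaching
    # 1.0 mean almost all power goes to computing rather than overhead such
    # as cooling. The numbers below are made up.
    total_facility_energy_kwh = 1_100_000
    it_equipment_energy_kwh = 1_000_000

    pue = total_facility_energy_kwh / it_equipment_energy_kwh
    print(f"PUE = {pue:.2f}")  # 1.10 means 10% overhead beyond the IT load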
Scalability is another benefit. Our data centers can quickly and seamlessly
accommodate new hardware and servers, which allows us to scale up
computing resources on demand. This flexibility is critical for Google to
handle massive data volumes and traffic without any disruptions to services.
Furthermore, managing our own servers and network provides us with
unparalleled customization capabilities. This level of flexibility empowers us
to deliver unique services and capabilities that are not available from other
providers, giving you access to exclusive features and innovations.
Although designing and building data centers requires significant upfront
investment, the long-term benefits are substantial. By optimizing resources
for efficiency and scalability, Google can significantly reduce energy
consumption and operating costs, which results in remarkable savings over
time.
2. Secure storage
Previously, we learned that encryption is like a secret code that transforms
data into an unreadable format using special algorithms. This process
ensures that only those with the right key or password can make sense of the
data.
Let's take a closer look at how encryption protects your data in different
states.
When data is at rest, it's stored on physical devices like computers or
servers. By encrypting data at rest, even if someone gains physical access to
the device, they won't be able to decipher the data without the encryption
key. At Google Cloud, we automatically encrypt all customer content at rest,
without any effort required from you.
It's a free and built-in feature that adds an extra layer of protection to your
valuable data. And if you prefer to manage your encryption keys yourself,
you can use our Cloud Key Management Service (Cloud KMS) for added
control. When data is in transit, it's moving over networks or the internet.
Encryption plays a crucial role here by shielding your data from interception
by cybercriminals or unauthorized parties.
It's like sending your information in a locked box that only the intended
recipient can open. At Google Cloud, we employ robust security measures to
ensure the authenticity, integrity, and privacy of your data during transit.
We encrypt and authenticate data at multiple network layers, especially
when it travels outside the physical boundaries we control. This way, your
information remains safe and secure as it journeys through the digital world.
Data in use refers to data being actively processed by a computer.
Encrypting data in use adds another layer of protection, especially against
unauthorized users who might physically access the computer. We use a
technique called memory encryption, which locks your data inside the
computer's memory, making it nearly impossible for unauthorized users to
gain access to it. When it comes to encryption algorithms, the Advanced
Encryption Standard (AES) takes center stage. AES is a powerful encryption
algorithm trusted by governments and businesses worldwide. It's like having
a top-secret encryption method that keeps your data safe from prying eyes.
So, whether your data is resting, traveling, or actively in use, encryption acts
as your loyal guardian, because it ensures its confidentiality and protection.
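If you choose to manage keys yourself with Cloud KMS, as mentioned above, an encrypt call can be made from the Python client library roughly as sketched below. The project, location, key ring, and key names are placeholders, and the exact call shape should be checked against the current client library documentation.
    # Hedged sketch: encrypting data with a key managed in Cloud KMS, using
    # the google-cloud-kms Python client. Resource names are placeholders.
    from google.cloud import kms

    client = kms.KeyManagementServiceClient()
    key_name = client.crypto_key_path("my-project", "us-central1", "my-key-ring", "my-key")

    response = client.encrypt(request={"name": key_name, "plaintext": b"sensitive data"})
    ciphertext = response.ciphertext  # store this; only the managed key can decrypt it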
3. Identity
Often referred to as the three A’s, authentication, authorization, and auditing
are important aspects of cloud identity management used to ensure secure
access, manage user privileges, and monitor resource usage.
3.1Let's begin with the first A, authentication.
It serves as the gatekeeper, because it verifies the identity of users or
systems that seek access. Authentication involves presenting unique
credentials, such as passwords, physical tokens, or biometric data like
fingerprints or voice recognition. Think of it as presenting your
identification card before entering a restricted area. Two-step verification
(2SV), which you may also hear referred to as two-factor authentication or
multi-factor authentication, is a security feature that adds an extra layer
of protection to cloud-based systems. With 2SV enabled, users need to
provide two different pieces of information to log in. For example, it
could be a combination of a password and a code sent to their phone
through text message, voice call, or an app like Google Authenticator.
This powerful feature makes unauthorized access more difficult, even if
someone manages to obtain your password.
3.2The second A is authorization.
After a user's identity is authenticated, authorization steps in to determine
what that user or system is allowed to do within the system. Think of it as
the access control mechanism. Different permissions are assigned to
individuals or groups based on their roles, responsibilities, and
organizational hierarchy. For example, a system administrator might have
the authority to create and remove user accounts, whereas a standard user
might only be able to view a list of other users. This fine-grained control
ensures that each user has the appropriate level of access to perform their
tasks while preventing unauthorized actions.
3.3The third A, auditing (sometimes referred to as accounting), plays a
critical role in monitoring and tracking user activities within a system. By
collecting and analyzing logs of user activity, system events, and other
data, auditing helps organizations detect anomalies, security breaches,
and policy violations.
It provides a comprehensive record of actions taken on a system or
resource, which proves invaluable during security incident investigations,
compliance tracking, and system performance evaluation. Just like the
surveillance cameras in a shopping mall, auditing keeps a watchful eye
on activities happening within your system. To provide granular control
over who has access to Google Cloud resources and what they can do
with those resources, organizations can use Identity and Access
Management (IAM). With IAM, you can create and manage user
accounts, assign roles to users, grant and revoke permissions to
resources, audit user activity, and monitor your security posture. It
provides a centralized and efficient approach to managing access control
within your Google Cloud environment.
Imagine IAM as your organization's security headquarters, equipped with
robust tools to manage and safeguard your digital assets. By integrating
IAM into your Google Cloud security strategy, you can ensure fine-
grained access control, enhanced visibility, and centralized resource
management. This empowers you to protect your organization's sensitive
data and resources effectively.
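As a hedged illustration of how IAM expresses authorization, the snippet below shows the general shape of an IAM policy binding as a Python dictionary; the roles, members, and addresses are hypothetical examples, not recommendations.
    # Illustrative only: the general shape of an IAM policy, which binds
    # members (identities) to roles (sets of permissions) on a resource.
    example_policy = {
        "bindings": [
            {
                "role": "roles/storage.objectViewer",         # what they can do
                "members": ["user:alice@example.com"],         # who can do it
            },
            {
                "role": "roles/owner",
                "members": ["group:cloud-admins@example.com"],
            },
        ]
    }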
4. Network security
When you expand your network to include cloud environments, security
considerations take on a whole new dimension. Unlike traditional on-
premises setups with clear perimeters, the cloud brings new possibilities and
challenges.
Let's explore some strategies to secure your organization's network and
ensure the safety of your valuable data and workloads in Google Cloud.
Embrace the power of zero trust networks. In the world of security, trust
shouldn't be given freely. With Google Cloud's BeyondCorp Enterprise, you
can implement a zero trust security model. It means that every access request
is thoroughly verified, and both the user's identity and context are
considered. This way, you maintain strict control over who can access your
network and resources, both inside and outside your organization.
Secure your connections to on-premises and multi-cloud environments.
Many organizations have a mix of cloud and on-premises workloads, or they
use multiple cloud providers for resiliency. Ensuring secure connectivity
across these environments is crucial.
Google Cloud provides private access methods through services like Cloud
VPN and Cloud Interconnect, which let you establish secure connections
between your on-premises networks and Google Cloud resources.
Protect your perimeter with Google Cloud's powerful tools.
Google Cloud offers various methods to help secure your perimeter,
including firewalls and Virtual Private Cloud (VPC) Service Controls, which
help you divide your cloud into different sections and keep them secure. You
can also utilize Shared VPC, which is like having a large fence that separates
each Google Cloud Project, so they can work independently and safely. With
these tools, you can keep your cloud environment protected and give
different teams their own space to work in.
Stay ahead with a web application firewall. External web applications and
services are often targeted by cyber threats, including DDoS attacks. DDoS,
which stands for distributed denial-of-service, is a cyber attack that uses
multiple compromised computer systems to flood a target with more traffic
than it can handle, which causes a denial of service to legitimate users.
Google Cloud Armor comes to the rescue by providing robust DDoS
protection.
It’s like a force field that stops harmful attacks and keeps your website or
application safe from things that could make it stop working properly.
Automate infrastructure provisioning for enhanced security. By adopting
automation tools, you can create immutable infrastructure, which means that
it can't be changed after provisioning. Think of infrastructure provisioning
tools as your personal assistants for setting up and maintaining your cloud
environment. When you use tools like Terraform, Jenkins, and Cloud Build,
they handle all the behind-the-scenes work to create a secure and reliable
cloud environment. It's like having a team of efficient workers who build
and organize everything you need to run your environment smoothly. With
these tools, your cloud environment becomes like a well-designed
workspace where everything has its place and functions perfectly. And the
best part is, when it's set up, it stays that way. No unexpected changes or
disruptions. If anything does go wrong, these tools are there to quickly
identify and fix any issue and ensure that your cloud environment keeps
running smoothly.
Your specific network setup and security measures will depend on your
unique business requirements and risk tolerance.
5. Security operations
SecOps—short for Security Operations—is all about protecting your
organization's data and systems in the cloud. It involves a combination of
processes and technologies that help reduce the risk of data breaches, system
outages, and other security incidents.
Think of it as your secret weapon for keeping your valuable data safe.
Let's explore some of the essential activities involved in SecOps.
Vulnerability management is the process of identifying and fixing security
vulnerabilities in cloud infrastructure and applications.
It’s like regularly checking your castle walls for weak spots.
Google Cloud's Security Command Center (SCC) provides a centralized
view of your security posture. It helps to identify and fix vulnerabilities, and
it ensures that your infrastructure remains solid and protected.
Another crucial activity is log management.
It's like having a watchful eye on your castle grounds, looking out for any
suspicious activity. Google Cloud offers Cloud Logging, a service to collect
and analyze security logs from your entire Google Cloud environment. It
helps you detect and respond to any signs of trouble and ensures that you
anticipate any potential threats.
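As a hedged sketch of how application logs can be written to Cloud Logging from Python, the snippet below uses the google-cloud-logging client library; the log name and the log entry contents are assumptions for illustration, and default credentials are assumed.
    # Hedged sketch: writing a structured log entry to Cloud Logging with the
    # google-cloud-logging client library.
    from google.cloud import logging

    client = logging.Client()
    logger = client.logger("security-events")  # hypothetical log name

    logger.log_struct(
        {"event": "failed_login", "user": "alice@example.com", "attempts": 5},
        severity="WARNING",
    )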
Of course, being prepared for security incidents is equally important.
This is where incident response comes in. Imagine having a team of knights
ready to defend your castle at a moment's notice. Google Cloud has expert
incident responders across various domains, who are equipped with the
knowledge and tools to tackle any security incident swiftly and effectively.
Another crucial aspect of SecOps is educating your employees on security
best practices. Just like teaching everyone in the castle to be vigilant and
lock the gates, security awareness training helps prevent incidents by raising
awareness and empowering employees to protect themselves and the
organization.
Now, you might be wondering, why should your organization implement
SecOps?
Well, here are the benefits.
-Reduced risk of data breaches: SecOps helps identify and fix
vulnerabilities, which significantly reduces the risk of data breaches.
-Increased uptime: A swift and effective incident response minimizes the
impact of outages on your business operations, which ensures smoother and
uninterrupted services.
-Improved compliance: SecOps helps with meeting security regulations,
such as the General Data Protection Regulation (GDPR), and keeps your
organization in good standing.
-Enhanced employee productivity: By educating employees on security best
practices, SecOps minimizes the risk of human error and promotes a more
secure and productive work environment.
SecOps is an integral part of your organization's security strategy. By
implementing SecOps practices, you can fortify your defenses, reduce
security risks, and protect your data in the ever-changing landscape of cloud
security.
Chapter 3 Google Cloud’s Trust Principles and Compliance
Customers need to be sure that their data and applications are safe and
secure, and so Google Cloud has a strong set of trust principles and
compliance programs in place, which are designed to protect customer data
and meet the needs of a wide range of customers, from small businesses to
large enterprises.
1. The Google Cloud Trust Principles and Transparency Reports
1. You own your data, not Google. We prioritize your control and let you
access, export, delete, and manage data permissions within Google
Cloud.
2. Google does not sell customer data to third parties. We safeguard your
data from being used for Google's marketing or advertising purposes.
3. Google Cloud does not use customer data for advertising. Your data
remains confidential, because Google Cloud ensures that it’s never
utilized to target ads.
4. All customer data is encrypted by default. Your data is protected with
robust encryption, because Google Cloud safeguards it even in the
unlikely event of unauthorized access.
5. We guard against insider access to your data. We implement stringent
security measures to prevent unauthorized employee access to customer
data.
6. We never give any government entity "backdoor" access. Your data
remains secure, and no government entity can access it without proper
authorization.
7. Our privacy practices are audited against international standards. We
undergo regular audits to ensure compliance with rigorous privacy
standards.
Transparency reports and independent audits are a core
element of our commitment to trust. We provide valuable insights and
accountability through our transparency reports, which shed light on
government and corporate actions that affect privacy, security, and access
to information.
These reports let you stay informed and maintain trust in our services.
Additionally, Google Cloud undergoes independent, third-party audits
and certifications.
This verification process ensures that our data protection practices align
with our commitments and industry standards. Our participation in
initiatives like the EU Cloud Code of Conduct further reinforces our
dedication to accountability, compliance support, and robust data
protection principles.
2. Data residence and data sovereignty
When it comes to storing data and keeping it secure, data sovereignty and
data residency are two important concepts to understand.
Data sovereignty refers to the legal concept that data is subject to the
laws and regulations of the country where it resides. For example, the
General Data Protection Regulation (GDPR) in the European Union
requires companies to comply with data protection laws when processing
or storing personal data of EU citizens, regardless of their location. This
ensures that individuals have control over their personal data and its
usage.
In contrast, data residency refers to the physical location where data is
stored or processed. Some countries or regions have laws or regulations
that require data to be stored within their borders. For instance, some
countries mandate that the personal data of their citizens must be stored on
servers within the country. This ensures data remains within the
jurisdiction of local laws.
Now, let's explore how Google Cloud addresses data residency
requirements. We offer a range of options to control the physical location
of your data through regions. Each region consists of one or more data
centers, which lets you choose where your data resides.
When you configure your resources in specific regions, Google ensures that
your data is stored only within the selected region, as stated in our
Service Specific Terms. Additionally, Google Cloud provides
Organization Policy constraints, coupled with IAM configuration, to
prevent accidental data storage in the wrong region. These controls offer
peace of mind and reinforce your data residency requirements.
Furthermore, Google Cloud offers features like VPC Service Controls,
which let you restrict network access to data based on defined perimeters.
You can limit user access through IP address filtering, even if they have
authorization. Google Cloud Armor lets you restrict traffic locations for
your external load balancer, adding an extra layer of protection. By
using these capabilities, organizations can adhere to data residency and
data sovereignty requirements, ensure compliance, and maintain control
over their valuable data within the Google Cloud ecosystem.
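As a hedged illustration of pinning data to a region, the sketch below creates a Cloud Storage bucket in a specific location with the Python client library; the bucket name and region are placeholders, and your own residency requirements should drive the region choice.
    # Hedged sketch: choosing where data resides by creating a Cloud Storage
    # bucket in a specific region (bucket name and region are placeholders).
    from google.cloud import storage

    client = storage.Client()
    bucket = client.create_bucket("example-residency-bucket", location="europe-west3")
    print(bucket.location)  # objects in this bucket are stored in the chosen region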
3. Industry and regional compliance
As organizations migrate to the cloud, it becomes essential to protect
sensitive workloads while ensuring compliance with diverse regulatory
requirements and guidelines. Compliance is a critical aspect of the cloud
journey, because not meeting regulatory obligations can have far-
reaching consequences. To assist you in achieving compliance, Google
Cloud offers robust resources and tools tailored to support your specific
needs.
First, let's explore the Google Cloud compliance resource center. This
comprehensive hub provides detailed information on the certifications
and compliance standards we satisfy. You can find mappings of our
security, privacy, and compliance controls to global standards. This
transparency lets you validate our adherence to industry-leading
practices. The resource center also offers valuable documentation on
regional and sector-specific regulations, and empowers you to navigate
complex compliance landscapes.
Google Cloud's compliance resource center is your go-to source for
actionable information and support. In addition to the resource center, we
provide the Compliance Reports Manager, a powerful tool at your
disposal. This intuitive platform offers easy, on-demand access to critical
compliance resources at no extra cost. Within the Compliance Reports
Manager, you'll discover our latest ISO/IEC certificates, SOC reports,
and self-assessments. These resources provide evidence of our adherence
to rigorous compliance standards and help streamline your own reporting
and compliance efforts. Imagine you're an enterprise seeking ISO/IEC
27001 certification. The Compliance Reports Manager lets you access the
necessary documentation efficiently, and it saves you time and effort in
the certification process. With this tool, we aim to simplify your
compliance journey and empower you to meet your regulatory
obligations effectively. By using the Google Cloud compliance resource
center and the Compliance Reports Manager, you can navigate the
complex realm of industry and regional compliance with confidence. Our
dedicated teams of engineers and compliance experts work hand in hand
with you to address your specific regulatory needs. Together, we create
an integrated controls and governance framework, while we ensure a
robust compliance posture.
You can visit the compliance resource center at
[Link]/security/compliance and explore the Compliance
Reports Manager at [Link]/security/compliance/compliance-
reports-manager.
Lesson 6 Scaling with Google Cloud Operations
Cloud operations refer to the set of practices and strategies employed to
ensure the smooth functioning, optimization, and scalability of cloud-
based systems. It involves managing and monitoring the infrastructure,
applications, and services that run in the cloud, while adhering to best
practices for reliability, performance, security, and cost optimization.
Cloud operations play a pivotal role in enabling organizations to achieve
digital transformation goals, because they ensure the availability,
efficiency, and resilience of critical systems. “Scaling with Google Cloud
Operations” was designed to help you learn how Google Cloud supports
an organization's ability to control their cloud costs through financial
governance, understand the fundamental concepts of modern operations,
reliability, and resilience in the cloud, and explore how Google Cloud
works to reduce our environmental impact and help organizations meet
sustainability goals.
Chapter 1 Financial Governance and Managing Cloud Costs
1. Fundamentals of cloud financial governance
Easy access to cloud resources presents a need for precise, real-time
control of what’s being consumed. Having cloud financial
governance, which is in part a set of processes and controls that
organizations use to manage cloud spend, can mean the difference
between peace of mind and spiraling costs that lead to budget
overruns. As an organization adapts, it'll need a core team across
technology, finance, and business functions to work together to stay
on top of cloud costs and make decisions in real time. The variable
nature of cloud costs impacts people, process, and technology.
Let’s explore these three areas, starting with people.
People refers to the different roles involved in managing cloud costs.
For small organizations, one person might fulfill multiple roles and be
responsible for managing all aspects of a cloud infrastructure and
associated finance, from budgeting to procurement, tracking,
optimization, and more.
Large organizations, however, will likely look to a finance team to
take on a financial planning and advisory role. Using business
priorities, a finance team is expected to make data-driven decisions on
cloud spending, but they might struggle to understand or monitor
cloud spend on a daily, weekly, or monthly basis. Then there are
members of technology and line of business teams. They can advise
on how cloud resources are being used to meet the organization's
overall business strategy and what additional resources might be
needed throughout the upcoming year. However, they don’t
necessarily factor costs into their decision making.
To manage cloud costs effectively, a partnership across finance,
technology, and business functions is required. This partnership might
already exist, or it may take the form of a centralized hub, such as a
cloud center of excellence.
The central team would consist of several experts who ensure that best
practices are in place across the organization and that there's visibility
into the ongoing cloud spend. The centralized group would also be
able to make real-time decisions and discuss trade-offs when spending
is higher than planned.
Now let’s transition from people to process.
On a daily or weekly basis, organizations should monitor and analyze
their cloud usage and costs. Then, on a weekly or monthly basis, the
finance team should analyze the results, charge back the costs through
the appropriate teams, and determine whether any changes are needed
to ensure that the organization's cloud spend is optimized. Having a
culture of accountability in place across teams helps organizations
recognize waste, quickly act to eliminate it, and ensure they're
maximizing their cloud investment. It will also help drive cross-group
collaboration across technology, finance, and business teams to ensure
that their cloud spend aligns with broader business objectives.
And finally, there’s technology.
Google Cloud provides built-in tools to help organizations monitor
and manage costs. These tools help organizations gain greater
visibility, drive a culture of accountability for cloud spending across
the organization, control costs to reduce risks of overspending, and
provide intelligent recommendations to optimize cost and usage.
2. Cloud financial governance best practices
Let’s explore some cloud financial governance best practices that
organizations can adopt to increase the predictability and control of
their cloud resources.
The first best practice is to identify who manages cloud costs. If it's a
team, it should ideally be a mix of IT managers and financial
controllers. Because cloud spending is decentralized and variable, it's
important to establish a culture of accountability for costs across the
organization. Defining clear ownership for projects and sharing cost
views with the departments and teams that are using cloud resources
helps establish this accountability culture and more responsible
spending. As well as making teams accountable for their spending,
Google Cloud financial governance policies and permissions make it
easy to control who can spend and view costs across your
organization.
In addition, Google Cloud offers flexible options to organize
resources and allocate costs to individual departments and teams. For
example, budgets notify key stakeholders based on your actual or
forecasted cloud costs. Creating multiple budgets with meaningful
alerts is an important practice for staying on top of your cloud costs.
The second best practice is to understand what kind of information
can be found in an invoice versus cost management tools. They’re not
the same concept. An invoice is a document that is sent by a cloud
service provider to a customer to request payment for the services that
were used. However, a cost management tool is software to help track,
analyze, and optimize cloud spend. An organization is rarely only
interested in how much they spend. They also want to know why they
spent that much. Cost management tools, like those built into the
Google Cloud console, are effective for answering the why. They can
provide granular data, uncover trends, and identify actions to take to
control or optimize costs.
And this brings us to the third best practice for increasing the
predictability and control of cloud resources: use Google Cloud’s cost
management tools. Google Cloud believes in supporting organizations
by providing strong financial governance tools that make it easier for
customers to align their strategic priorities with their cloud usage.
Before organizations can optimize their cloud costs, they first need to
understand what they're spending, whether there are any trends, and
what their forecasted costs are.
So, how can this be done?
Start by capturing what cloud resources are being used, by whom, for
what purpose, and at what cost.
From there, determine who will be responsible for monitoring that
information, who will be involved in managing costs, and how the
spending information will be reported on an ongoing basis.
It's also important to set up the cadence and format for ongoing
communication with main cloud stakeholders.
Having this plan outlined up front helps ensure that managing costs
isn't an afterthought.
And how can you monitor current cost trends and identify areas of
waste that could be improved?
Google Cloud provides built-in reporting capabilities, which can help
your team gain visibility into costs. Ideally, reports should be
reviewed weekly, at a minimum. One powerful tool is the Google
Cloud Pricing Calculator. The Pricing Calculator lets you estimate
how changes to cloud usage will affect costs.
The calculator is available at [Link]/products/calculator.
3. Using the resource hierarchy to control access
One important cloud computing consideration involves controlling
access to resources. With on-premises infrastructure, physical access
controls were used. This method, however, is not as effective with
resources stored in the cloud.
The Google Cloud resource hierarchy is a powerful tool that can be
used to control access to cloud resources.
Much like the folder structure you use to organize and control access
to your own files, this resource hierarchy is a tree-like structure that
organizes resources into logical groups.
Google Cloud’s resource hierarchy contains four levels, and starting
from the bottom up they are: resources, projects, folders, and an
organization node.
The first level, resources, represents virtual machines, Cloud Storage
buckets, tables in BigQuery, or anything else in Google Cloud.
Resources are organized into projects, which sit on the second level.
Projects can be organized into folders, or even subfolders.
These sit at the third level.
And then at the top level is an organization node, which encompasses
all the projects, folders, and resources in your organization.
It’s important to understand this resource hierarchy because it directly
relates to how policies are managed and applied when you use Google
Cloud.
A policy is a set of rules that define who can access a resource and
what they can do with it.
Policies can be defined at the project, folder, and organization node
levels. Some Google Cloud services can also apply policies to
individual resources.
The third level of the Google Cloud resource hierarchy is folders.
Folders let you assign policies to resources at the level of granularity
that you choose.
The resources in a folder inherit policies and permissions assigned to
that folder. A folder can contain projects, other folders, or a
combination of both.
Now that you understand the structure of the Google Cloud resource
hierarchy, let’s explore some additional benefits of using it to control
access to cloud resources.
First, the resource hierarchy provides granular access control,
meaning you can assign roles and permissions at different levels of the
hierarchy, such as at the folder, project, or individual resource level.
Second, because the resource hierarchy follows inheritance and
propagation rules, permissions set at higher levels of the resource
hierarchy are automatically inherited by lower-level resources. For
example, if you grant a user access at the folder level, all projects and
resources within that folder inherit those permissions by default. This
inheritance simplifies access management and reduces the need for
manual configuration at each individual resource level.
Third, the resource hierarchy enhances security and compliance
through least privilege principles. By assigning access permissions at
the appropriate level in the hierarchy, you can ensure that users only
have the necessary privileges to perform their tasks. This reduces the
risk of unauthorized access and helps maintain regulatory compliance.
Finally, the resource hierarchy provides strong visibility and auditing
capabilities. You can track access permissions and changes across
different levels of the hierarchy, which makes it easier to monitor and
review access controls.
This improves accountability and helps identify and address potential
security issues.
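Purely as an illustration of the hierarchy and policy inheritance described above (not an actual Google Cloud API), the sketch below models an organization node, a folder, and a project, and shows a folder-level grant flowing down to the project; all names are hypothetical.
    # Illustrative model only: how a policy granted on a folder is inherited
    # by the projects (and resources) beneath it in the hierarchy.
    hierarchy = {
        "organization": "example.com",
        "folders": [
            {
                "name": "engineering",
                "policy": {"roles/viewer": ["user:alice@example.com"]},  # granted here
                "projects": [
                    {"name": "web-app-prod", "policy": {}},  # inherits the folder grant
                ],
            }
        ],
    }

    def effective_members(folder, project, role):
        """Combine grants set on the project with those inherited from its folder."""
        inherited = folder["policy"].get(role, [])
        direct = project["policy"].get(role, [])
        return sorted(set(inherited + direct))

    folder = hierarchy["folders"][0]
    print(effective_members(folder, folder["projects"][0], "roles/viewer"))
    # prints ['user:alice@example.com']: the project inherits the folder-level grant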
4. Controlling cloud consumption
Organizations want to control cloud consumption for many reasons. It
could be about cost savings by ensuring they’re not overspending on
unnecessary resources, increased visibility by providing a better
understanding of how resources are being used and identifying areas
to reduce costs, or improved compliance by ensuring your cloud
environment is compliant with industry regulations. Google Cloud
offers several tools to help control cloud consumption, including
resource quota policies, budget threshold rules, and Cloud Billing
reports.
4.1Resource quota policies let you set limits on the amount of
resources that can be used by a project or user. They can help
prevent overspending on cloud resources; therefore, they help you
ensure that your cloud usage is within your budget.
4.2Then there are budget threshold rules, which let you set alerts to be
informed when your cloud costs exceed a certain threshold. They
can act as an early warning for potential cost overruns, and let you
take corrective action before costs get out of control.
Both resource quota policies and budget threshold rules are set in
the Google Cloud console.
4.3And then there are Cloud Billing reports. Whereas resource quota
policies and budget threshold rules provide proactive means to
control cloud consumption, Cloud Billing reports offer a reactive
method to help you track and understand what you’ve already
spent on Google Cloud resources and provide ways to help
optimize your costs. You can use Cloud Billing reports to monitor
costs by exporting billing data to BigQuery. This means exporting
usage and cost data to a BigQuery dataset, and then using the
dataset for detailed analyses. You can also visualize data with tools
like Looker Studio.
After analyzing how you're spending on cloud resources, you
might realize that your organization can optimize costs through
committed use discounts (CUDs). If your workloads have
predictable resource needs, you can purchase a Google Cloud
commitment, which gives you discounted prices in exchange for
your commitment to use a minimum level of resources for a
specific term.
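As a hedged sketch of the billing-export analysis described above, the snippet below summarizes exported billing data by service with the BigQuery Python client. The project, dataset, and table names are placeholders, and the column names assume the standard billing export schema.
    # Hedged sketch: summarizing exported billing data by service with the
    # google-cloud-bigquery client. Dataset and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT service.description AS service, ROUND(SUM(cost), 2) AS total_cost
        FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX`
        GROUP BY service
        ORDER BY total_cost DESC
    """
    for row in client.query(query).result():
        print(row.service, row.total_cost)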
Chapter 2 Operational Excellence and Reliability at Scale
In today's rapidly evolving digital landscape, organizations
increasingly use cloud technology to drive innovation, agility, and
efficiency. However, harnessing
the true power of the cloud requires a comprehensive
understanding of operational excellence and reliability at scale.
Operational excellence and reliability refers to the ability of
organizations to optimize their operations and ensure uninterrupted
service delivery, even as they handle increasing workloads and
complexities in the cloud.
Consider, for example, an ecommerce platform facing a surge in traffic
during a major sales event. To meet the increased demand, the platform
needs to scale its
resources rapidly while ensuring uninterrupted service availability.
Operational excellence here involves efficiently scaling the
underlying infrastructure, automating resource provisioning, and
implementing load balancing mechanisms.
Reliability focuses on minimizing downtime, employing fault-tolerant
systems, and implementing disaster recovery strategies.
By excelling in these areas, the ecommerce platform can handle
the increased load seamlessly, deliver a consistently positive user
experience, and avoid revenue loss or reputational damage.
1. Fundamentals of cloud reliability
Within any IT team, developers are responsible for writing code
for systems and applications, and operators are responsible for
ensuring that those systems and applications operate reliably.
Developers are expected to be agile and are often pushed to write
and deploy code quickly. Their aim is to release new functions
frequently, increase core business value with new features, and
release fixes fast for an overall better user experience.
In contrast, operators are expected to keep the system stable, and
so they often prefer to work more slowly to ensure reliability and
consistency. Traditionally, developers pushed their code to
operators who often had little understanding of how the code
would run in a production or live environment.
When problems arise, it can be very difficult for either group to
identify the source of the problem and resolve it quickly. Worse,
accountability between the teams isn’t always clear.
DevOps is a software development approach that emphasizes
collaboration and communication between development and
operations teams to enhance the efficiency, speed, and reliability of
software delivery.
It aims to break down silos between these teams and foster a
culture of shared responsibility, automation, and continuous
improvement.
One particular concept within the DevOps framework is Site
Reliability Engineering, or SRE, which ensures the reliability,
availability, and efficiency of software systems and services
deployed in the cloud.
SRE combines aspects of software engineering and operations to
design, build, and maintain scalable and reliable infrastructure.
Monitoring is the foundation of product reliability. It reveals what
needs urgent attention and shows trends in application usage
patterns, which can yield better capacity planning and generally
help improve an application client's experience and lessen their
pain.
There are “Four Golden Signals” that measure a system’s
performance and reliability. They are latency, traffic, saturation,
and errors.
Latency measures how long it takes for a particular part of a system to return a
result. Latency is important because it directly affects the user experience, changes
could indicate emerging issues, its values might be tied to capacity demands, and it
can be used to measure system improvements.
Traffic measures how many requests reach your system. Traffic is important
because it’s an indicator of current system demand, its historical trends are used for
capacity planning, and it’s a core measure when calculating infrastructure spend.
Saturation measures how close to capacity a system is. It’s important to note,
though, that capacity is often a subjective measure that depends on the underlying
service or application. Saturation is important because it's an indicator of how full
the service is, it focuses on the most constrained resources, and it’s frequently tied
to degrading performance as capacity is reached.
And errors are events that measure system failures or other issues. Errors are often
raised when a flaw, failure, or fault in a computer program or system causes it to
produce incorrect or unexpected results, or behave in unintended ways. Errors are
important because they can indicate something is failing, configuration or capacity
issues, service level objective violations, or that it's time to send an alert.
Three main concepts in site reliability engineering are service-level indicators
(SLIs), service-level objectives (SLOs), and service-level agreements (SLAs).
They are all types of targets set for a system’s Four Golden Signal metrics.
Service level indicators are measurements that show how well a system or service
is performing. They’re specific metrics like response time, error rate, or percentage
uptime (the amount of time a system is available for use) that help us
understand the system's behavior and performance.
Service level objectives are the goals that we set for a system's performance based
on SLIs. They define the level of reliability or performance that we want to
achieve. For example, an SLO might state that the system should be available
99.9% of the time in a month.
Service level agreements are agreements between a cloud service provider and its
customers. They outline the promises and guarantees regarding the quality of
service. SLAs include the agreed-upon SLOs, performance metrics, uptime
guarantees, and any penalties or remedies if the provider fails to meet those
commitments. This might include refunds or credits when the service has an outage
that’s longer than this agreement allows.
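To make the 99.9% example concrete, here is a small, hedged calculation of the downtime such an SLO would allow; the 30-day month is an assumption for illustration.
    # Illustrative calculation: allowed downtime ("error budget") implied by
    # a 99.9% monthly availability SLO, assuming a 30-day month.
    slo = 0.999
    minutes_in_month = 30 * 24 * 60          # 43,200 minutes

    allowed_downtime = (1 - slo) * minutes_in_month
    print(f"{allowed_downtime:.1f} minutes of downtime per month")  # about 43.2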
2. Designing resilient infrastructure and processes
Infrastructure and processes in a cloud environment need to be designed
to be resilient, fault-tolerant, and scalable to support high availability
and disaster recovery.
Let's explore some of the key design considerations and their significance in more
detail.
-Redundancy refers to duplicating critical components or resources to provide
backup alternatives. Redundancy can be implemented at various levels, such as
hardware, network, or application layers. For example, having redundant power
supplies, network switches, or load balancers ensures that if one fails, the
redundant component takes over seamlessly. Redundancy enhances system
reliability and mitigates the impact of single points of failure.
-Replication involves creating multiple copies of data or services and distributing
them across different servers or locations. It ensures redundancy and fault tolerance
by allowing systems to continue functioning even if certain components or servers
fail. By replicating data across multiple servers, the impact of hardware failures or
outages is minimized, and the availability of services is improved.
-Regional distribution: Cloud service providers offer multiple regions or data center locations spread
across different geographic areas. By distributing resources across regions,
businesses can ensure that if an entire region becomes unavailable due to natural
disasters, network issues, or other incidents, their services can continue running
from another region. This approach improves resilience and reduces the risk of
prolonged service interruptions.
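To illustrate how redundancy, replication, and multi-region distribution come together at the application layer, here is a minimal Python sketch; the endpoints, regions, and data path are hypothetical, and a real deployment would typically rely on load balancers and managed failover instead.

import urllib.request

# Hypothetical replicas of the same service, hosted in different regions.
REPLICAS = [
    "https://primary.example.com/data",
    "https://replica-europe.example.com/data",
    "https://replica-us.example.com/data",
]

def fetch(url: str, timeout: float = 2.0) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def read_with_failover(urls: list[str]) -> bytes:
    last_error = None
    for url in urls:
        try:
            return fetch(url)       # the first healthy replica serves the request
        except OSError as error:    # connection failures, timeouts, HTTP errors
            last_error = error      # remember the failure and try the next copy
    raise RuntimeError("All replicas failed") from last_error

# data = read_with_failover(REPLICAS)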
Building a scalable infrastructure allows organizations to handle varying
workloads and accommodate increased demand without compromising
performance or availability. Cloud technologies enable the dynamic allocation and
deallocation of resources based on workload fluctuations. Autoscaling mechanisms
can automatically adjust resource capacity to match demand, ensuring that services
remain available and responsive during peak periods or sudden spikes in traffic.
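The scaling decision itself can be reduced to a simple control loop. The sketch below uses assumed utilization values, thresholds, and instance limits; managed autoscalers implement far more sophisticated versions of this idea.

import math

def desired_instances(current: int, cpu_utilization: float,
                      target: float = 0.6, min_n: int = 2, max_n: int = 20) -> int:
    # Scale so that average utilization moves back toward the target level.
    wanted = math.ceil(current * cpu_utilization / target)
    return max(min_n, min(max_n, wanted))

print(desired_instances(current=4, cpu_utilization=0.90))  # demand spike: grow to 6 instances
print(desired_instances(current=4, cpu_utilization=0.30))  # quiet period: shrink to the minimum of 2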
Regular backups of critical data and configurations are crucial to ensure that if data
loss, hardware failures, or cyber-attacks occur, organizations can restore their
systems to a previous state. Cloud providers often offer backup services that let organizations automate backups, store them securely, and easily restore data when needed.
Backups should be stored in geographically separate locations to protect against
regional outages or disasters.
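As a purely conceptual sketch (the retention window, backup timestamps, and location names are assumptions), the logic behind an automated backup policy often boils down to keeping recent copies, pruning old ones, and storing copies in more than one location.

from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)
LOCATIONS = ["backups-europe-west1", "backups-us-central1"]  # geographically separate copies

def backups_to_delete(backup_timestamps: list[datetime], now: datetime) -> list[datetime]:
    """Return backups that fall outside the retention window."""
    return [ts for ts in backup_timestamps if now - ts > RETENTION]

now = datetime.now(timezone.utc)
existing = [now - timedelta(days=d) for d in (1, 7, 29, 31, 90)]
print([str(ts.date()) for ts in backups_to_delete(existing, now)])  # prunes the 31- and 90-day-old copies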
These measures improve high availability, allow for rapid recovery from disasters
or failures, and minimize downtime and data loss. It’s important to regularly test
and validate these processes to ensure that they function as expected during real-
world incidents. Also, monitoring, alerting, and incident response mechanisms
should be implemented to identify and address issues promptly, further enhancing
the overall resilience and availability of the cloud infrastructure.
3. Modernizing operations by using Google Cloud
If you've ever worked with on-premises environments, you know
that you can physically touch the servers. If an application
becomes unresponsive, someone can physically determine why
that happened.
In the cloud though, the servers aren't yours—they belong to the
cloud
provider—and you can’t physically inspect them. So the question
becomes: how do you know what's happening with your server,
database, or application? The answer is: by using Google’s
integrated observability tools.
Observability involves collecting, analyzing, and visualizing data
from various sources within a system to gain insights into its
performance, health, and behavior.
To achieve this, Google Cloud offers Google Cloud Observability,
which is a comprehensive set of monitoring, logging, and
diagnostics tools. It offers a unified platform for managing and
gaining insights into the performance, availability, and health of
applications and infrastructure deployed on Google Cloud.
Let's look at some of the managed services that constitute Google
Cloud Observability.
-Cloud Monitoring provides a comprehensive view of your cloud infrastructure
and applications. It collects metrics, logs, and traces from your applications and
infrastructure, and provides you with insights into their performance, health, and
availability. It also lets you create alerting policies to notify you when metrics,
health check results, and uptime check results meet specified criteria.
-Cloud Logging collects and stores all application and infrastructure logs. With
real-time insights, you can use Cloud Logging to troubleshoot issues, identify
trends, and comply with regulations.
-Cloud Trace helps identify performance bottlenecks in applications. It collects latency data from applications and provides insights into how they're performing.
-Cloud Profiler identifies how much CPU power, memory, and other resources an
application uses. It continuously gathers CPU usage and memory-allocation
information from production applications and provides insights into how
applications are using resources.
-Error Reporting counts, analyzes, and aggregates the crashes in running cloud
services in real-time. A centralized error management interface displays the results
with sorting and filtering capabilities. A dedicated view shows the error details:
time chart, occurrences, affected user count, first- and last-seen dates, and a
cleaned exception stack trace. Error Reporting supports email and mobile alert notifications through its API.
Google's integrated observability tools provided by Google Cloud Observability
offer valuable insights into the performance and health of applications and
infrastructure in the cloud.
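For example, the metrics that Cloud Monitoring collects can also be read programmatically. The following minimal sketch assumes the google-cloud-monitoring Python client library and a placeholder project ID, and lists the last hour of Compute Engine CPU utilization data points.

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project-id"  # placeholder project ID

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 3600},  # the last hour
    }
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.double_value)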
4. Google Cloud Customer Care
Any cloud adoption program can encounter challenges, so it's
important to have an effective and efficient support plan from your
cloud provider.
Google Cloud Customer Care can simplify and streamline your
support experience with scalable and flexible services built with
your business needs at the center. There are four different service
levels, which lets you choose the one that’s right for your
organization.
Basic Support is free and is included for all Google Cloud
customers. It provides access to documentation, community
support, Cloud Billing Support, and Active Assist
recommendations.
Active Assist is the portfolio of tools used in Google Cloud to
generate insights and recommendations to help you optimize your
cloud projects.
Standard Support is recommended for workloads under
development. You can kickstart your cloud journey with unlimited
access to tech support, which lets you troubleshoot, test, and
explore. It offers unlimited individual access to English-speaking
support representatives during working hours, 5 days a week.
Standard support also provides access to the Cloud Support API,
which lets you integrate Cloud Customer Care with your
organization's customer relationship management (CRM) system.
Enhanced Support is designed for workloads in production, with
fast response times and additional services to optimize your
experience with high-quality, robust support. Support is available
24/7 in a selection of languages, and initial response times are
quicker than those provided by Standard Support. Enhanced
Support also offers technical support escalations and third-party
technology support to help you resolve multi-vendor issues.
Premium Support is designed for enterprises with critical
workloads. It features the fastest response time, Customer Aware
Support, and a dedicated Technical Account Manager. Our Premium Support level also offers credit for the Google Cloud Skills Boost training platform; an event management service for planned peak events, such as a product launch or a major sales event; and operational health reviews to help you measure your progress and proactively address blockers to your goals with Google Cloud. With Customer Aware Support, Customer Care learns and maintains information about your architecture, partners, and Google Cloud projects, which ensures that our support experts can resolve your cases promptly and efficiently.
Both the Enhanced and Premium support plans offer Value-Add
Services that are available for additional purchase.
You can learn more about the value-add services and all Google
Cloud Customer Care support offerings at
[Link]/support.
5. The life of a support case
Any Google Cloud customer on the Standard, Enhanced, or
Premium Support plan can use the Google Cloud console to create
and manage support cases. Outside of filing a support case through
the Google Cloud console, Customer Care also offers other contact options for live interactions with support staff, such as phone and video call support.
The life of a support case during the Google Cloud Customer Care
process typically involves several stages and interactions between
the customer and the support team.
Here's an overview of the typical journey of a support case.
First, the customer initiates the support request by creating a case
in the Google Cloud Console.
Only users who were assigned the Tech Support Editor role within
an organization can do this. The customer provides relevant details
about the issue they are experiencing, including any error
messages, logs, or steps to reproduce the problem.
It’s important for the user to select a priority from P4, which means
low impact, up to P1, which means critical impact, because this
will influence response times from the Customer Care team.
After the case is created, it goes through a triage process. The team
reviews the information provided by the customer to understand
the problem and determine its severity and impact on the
customer's business operations. The team might request additional
information or clarification from the customer at this stage. In
many cases, the Customer Care representative will resolve the
case, but for more complex issues, the case is assigned to a support
engineer with the appropriate level of expertise.
After the case is assigned, the team starts the troubleshooting and
investigation process. They analyze the provided information,
review system logs, and conduct various diagnostic tests to identify
the root cause of the issue. Depending on the complexity of the
problem, this stage might involve collaboration with other internal
teams or experts.
Throughout the investigation, the Customer Care team maintains
regular communication with the customer. They provide updates
on the progress, share findings, and request additional information
or actions from the customer when needed. Escalation is meant for flagging process breaks, or for the rare occasion when a case is stuck because the customer and the Customer Care team aren't fully in sync about the next steps, despite actively communicating about the issue.
However, it’s important to note that escalation isn’t always the best
solution, and with high-impact issues, escalation might not make
the case go faster.
This is because escalation can disrupt the workflow of the
Customer Care team and lead to delays in other cases. The best
solution for high-impact issues is to ensure that the case is set to
the appropriate priority, ensuring that the case is assigned to the
right resources as quickly as possible.
Escalation remains a useful tool for regaining traction on a stuck case, but it should be used sparingly and only when it's absolutely necessary.
When the root cause is identified, the team works on resolving the
issue or providing a mitigation plan.
They might provide the customer with step-by-step instructions,
configuration changes, or workaround suggestions to address the
problem. In some cases, they might consult higher-level support or engineering teams for further assistance. The
Customer Care team might also need to submit a feature request to
the Google Cloud engineering team. After implementing the
resolution or mitigation plan, the Customer Care team collaborates
with the customer to validate the effectiveness of the solution.
They might request the customer to perform specific tests or
provide feedback on the outcome. This step ensures that the
problem is fully resolved and meets the customer's expectations.
When the customer confirms that the issue is resolved, the support
case is closed. The team provides a summary of the resolution,
documents the steps taken, and ensures that the customer is
satisfied with the outcome.
If needed, they might also offer recommendations for preventive
measures or future best practices to avoid similar issues. The
customer also receives a feedback survey, so the support team can
learn what they did well and what needs improvement. Throughout
the entire lifecycle of the support case, Google Cloud’s Customer
Care team aims to provide timely and effective assistance to the
customer. They prioritize customer satisfaction and responsiveness, and strive to address the technical challenges customers face when they use Google Cloud services.
Chapter 3 Sustainability with Google Cloud
As we get closer to the end of this Cloud Digital Leader training,
where you’ve explored how cloud computing can help transform
the way you do business, it’s important that we underscore our
technology efforts at Google with our commitment to the
environment and sustainability. The virtual world, which includes
Google Cloud’s network, is built on physical infrastructure, and all
those racks of humming servers use huge amounts of energy.
Altogether, existing data centers use nearly 2% of the world’s
electricity. With this in mind, Google works to make our data
centers run as efficiently as possible.
Just like our customers, Google is trying to look after the planet.
We understand that Google Cloud customers have environmental
goals of their own, and running their workloads on Google Cloud
can be a part of meeting those goals. It's also useful to note
that Google's data centers were the first to achieve ISO 14001
certification, which is a standard that outlines a framework for an
organization to enhance its environmental performance through
improving resource efficiency and reducing waste.
As an example of how this is being done, here’s Google’s data
center in Hamina, Finland. This facility is one of the most
advanced and efficient data centers in the Google fleet. Its cooling
system, which uses sea water from the Bay of Finland, reduces
energy use and is the first of its kind anywhere in the world.
In our founding decade, Google became the first major company to
be carbon neutral. In our second decade, we were the first
company to achieve 100% renewable energy. And by 2030, we aim
to be the first major company to operate completely carbon free.
We meet the challenges posed by climate change and the need for
resource efficiency by working to empower everyone—businesses,
governments, nonprofit organizations, communities, and
individuals—to use Google technology to create a more
sustainable world.
So, what does that look like in practice?
Let’s explore an example of how one customer, Kaluza, uses
Google Cloud technology to launch smart electric vehicle charging
programs that help customers save money while reducing their
carbon footprint. Electric vehicles already account for one in seven
car sales globally, and with new gas and diesel cars being phased
out across the world, global sales are forecast to reach 73 million
units by 2040. But with power grids becoming increasingly
dependent on variable energy sources such as wind and solar,
rising demand from electric vehicles risks overstraining grids at
peak times, which can potentially lead to power outages. Launched
by OVO Energy in 2019, Kaluza has drawn on its deep understanding of the energy market to partner with some of the world's major
energy suppliers and vehicle manufacturers.
With a program called Charge Anytime, customers use Kaluza to
smart-charge their electric vehicle, and they pay just about a third
of their household electricity rate to do so.
This means that if the customer plugs in their vehicle to charge
when they get home from work at, say, 6:00 p.m.—a time when
both demand and the carbon intensity on the grid are at their
highest—their vehicle will then be smartly charged at the lowest
cost and greenest periods throughout the night, which leaves it
ready for when they need it in the morning.
Behind Kaluza’s smart charging solution lies some sophisticated
technology, all of which is built on Google Cloud. Their core
optimization engine gathers real-time data from a wide range of
sources, including battery and charging data from the electric
vehicles, and data from the energy suppliers and grid operators,
such as the carbon intensity, and price forecasts.
That data is stored in BigQuery where it’s used to train and
validate the smart charging optimization models. These models are
then deployed with Google Kubernetes Engine so that whenever a
customer plugs in an electric vehicle, data from that vehicle passes
in real-time through their optimization engine to calculate the ideal
charging schedule for that vehicle, which ensures that it uses the
cheapest, least carbon-intensive energy available.
And as for the grid operators and energy companies, the Kaluza
platform lets them visualize how many participating electric
vehicles are plugged into the network at any one time. BigQuery
and Looker Studio dashboards provide granular insights, such as
how many vehicles are idle, how many are charging, and how well the optimization engine is working. At Google, we remain
committed to sustainability and continue to lead and encourage
others, like Kaluza, to join us in improving the health of our planet.