App Dev & Infrastructure playbook

Build the enterprise of tomorrow
In designing high-performance applications and infrastructure, today’s technology
leaders face a careful balancing act. They are expected to leverage emerging
technologies like generative AI to drive productivity, while refining core applications for
optimal customer experiences. And they need to modernize legacy systems to stay
competitive, while minimizing costs, managing scalability demands, and ensuring
seamless integration across platforms.
“The dramatic impact of generative AI really can’t be
overstated. This is a once-in-a-generation inflection
point and it’s impacting all of us. It feels a bit sudden,
but we know the technology behind it has been the
result of decades of technology advancements.”
Amin Vahdat
VP/GM ML, Systems & Cloud AI, Google Cloud
In this guide, we invite you to follow the path of Google leaders who have navigated this journey firsthand, as they share tips and helpful tools to keep you ahead.
If you'd like more information, you can watch all the 2024 Google Cloud App Dev &
Infrastructure Summit talks on-demand, or connect with a specialist to discuss your AI
development goals.
Contents

Meet the fellows
Three Google Fellows share how Google's rich history of innovation has helped pave a path towards its future.

Infrastructure

Optimize traditional workloads
Use Google's latest infrastructure innovations and partnerships to handle performance-intensive workloads.

Migrate your VMware workloads to the cloud
Drive strategic innovation and competitiveness with VMware.

Build for cost-efficient AI workloads
Learn three strategies for navigating heterogeneous workloads in artificial intelligence.

Build for hybrid and multicloud
Tackle the complexities of distributed applications head-on.

Build for on-premises
Extend your AI-enabled infrastructure from the cloud to on-premises environments.

Application development

Build a container platform
Build, deploy, and scale software for thousands of clusters.

Enhance customer experience
Deliver the prompt interactions and personalized experiences your customers expect.

Build for developer success
Equip your developers with the right tools to help them get AI apps to market.

Fine-tune AI models
Join the Kaggle community to experiment with AI and machine learning.
Meet the fellows
Google Fellows are some of our longest-tenured and most innovative technical leaders.
In this exclusive panel discussion, they share the lessons they've learned, the
cutting-edge innovations they're developing, and how you can translate their
experience into actionable strategies for your own organization.
Watch the full session
Jeff Dean
Chief Scientist, Google DeepMind and Google Research

Carrie Grimes Bostock
VP, Engineering Fellow

Eric Brewer
VP and Fellow
Infrastructure
Given AI's transformative potential, it is our responsibility to ensure it is built on strong foundations for long-term sustainability. Cost efficiency is a key component here, helping to meet the demands of today's applications while ensuring teams aren't overextended.
Before deploying AI workloads, it’s critical to modernize traditional systems like Oracle
and VMware. With this core in place, enterprises can optimize hybrid and multicloud
networks to seamlessly deploy AI-enabled apps across any environment—from cloud to
on-premises.
This section explores a few ways you can boost productivity while enhancing cost efficiency, paving the way for sustained innovation and a competitive edge.
"A lot of things we're doing in AI today rely on the infrastructure and systems we built
and improved over many decades.
In the earliest days of Google, we were essentially a single product search engine trying
to figure out how to build and scale systems that could crawl the web, index it, and
serve queries to an ever-growing number of users per second. At the same time, we
were trying to scale the size of our index because more pages meant better search
quality. We were handling more queries at lower latency for an index that could be
updated more often. And that combination of factors was really one of the exciting
scaling challenges in computing.
Now, we're at a similar point with AI, where we're working on building highly
scalable AI systems at a scale the world has not attempted before.
There's always things that go wrong, but I think it's an exciting time. We now have
systems that can see and understand language better than anything we've built before,
and that's exciting for what we can do with computers."
Jeff Dean
Chief Scientist,
Google DeepMind and Google Research
Optimize traditional workloads
Many enterprises are using traditional workloads, yet are eager to take advantage of
cloud infrastructure. For these legacy systems, modernization isn’t as simple as flipping a
switch. Increasingly, companies are turning to infrastructure as a service (IaaS) to
maintain their existing workloads while reaping the security, reliability, and performance
benefits of cloud.
Get the best of Oracle
Google and Oracle’s groundbreaking multicloud partnership means that Oracle DB and
apps are authorized and supported on Google Cloud—simplifying cloud migration,
multicloud deployment, and management.
Watch the full session
Migrate your VMware workloads to the cloud
Migrating your VMware workloads to the cloud can help you realize cost efficiencies,
accelerate innovation cycles, and free your business from the restrictions of on-premises
infrastructure.
Here are four proven ways that migrating to Google Cloud VMware Engine can
benefit your enterprise.
Migrate at speed and with ease
with 58% faster application migrations and deployments

Reduce security and performance risks
with a 52% efficiency boost across security teams

Increase flexibility and scalability
with 74% less time spent on operational tasks

Reduce OpEx and increase savings
with a 391% estimated three-year ROI for companies that migrate to Google Cloud VMware Engine
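To make the 391% three-year ROI figure concrete, here is a minimal sketch of how an ROI percentage is calculated. The dollar amounts are hypothetical, chosen only so the formula yields 391%; they are not figures from the study behind this guide.

```python
def three_year_roi(total_benefits: float, total_costs: float) -> float:
    """Return ROI as a percentage: (benefits - costs) / costs * 100."""
    return (total_benefits - total_costs) / total_costs * 100

# Hypothetical three-year figures: $4.91M in benefits against $1.0M in costs
roi = three_year_roi(4_910_000, 1_000_000)
print(round(roi))  # 391
```

In other words, an organization in this scenario would recoup its migration costs nearly five times over within three years.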
Watch the full session
Build for cost-efficient AI workloads
Working with heterogeneous AI workloads is a challenge.
Here are three ways your development team can optimize them.
01 Boost efficiency with AI Hypercomputer
Google's AI Hypercomputer architecture enables choice and flexibility to scale
across AI training, tuning, and serving applications.
02 Manage costs with scheduled workloads
Dynamic Workload Scheduler (DWS) works across Google Kubernetes Engine (GKE) and Vertex AI, offering assured job start times through future reservations while improving the economics and obtainability of on-demand resources.
03 Customize storage and networking for AI workloads
GPUs and TPUs require high-performance storage and networking to succeed. Use Cloud Storage FUSE caching, Parallelstore caching, and Hyperdisk ML to improve training, fine-tuning, and model load times, respectively.
“The new DWS scheduling capabilities have been a game-changer
in procuring sufficient GPU capacity for our training runs. We
didn’t have to worry about wasting money on idle GPUs while
refreshing the page hoping for sufficient compute resources to
become available.”
Sahil Chopra
Co-Founder & CEO, Linum AI
Watch the full session
Build for hybrid and multicloud
Anyone running distributed applications has likely encountered the challenges that make network security complex and expensive: degraded application performance, unpredictable data transfer costs, lack of deep observability, and complex service customization.
Google's Cross-Cloud Network can ease the workload with:
● Fixed port pricing for more predictable monthly costs
● VPC flow logs for more granular visibility over hybrid traffic
● Traffic marking to prioritize bandwidth for critical applications
Watch the full session
Build for on-premises
Google Distributed Cloud extends AI-enabled infrastructure and AI models from the cloud to your on-premises environments.
This allows your developers to build using the same tools, frameworks, and operational
practices they are used to, so they can:
● Innovate faster with enterprise-ready gen AI
● Scale anywhere with cloud-native agility
● Address data residency and operational sovereignty needs
● Enable modern retail and manufacturing experiences
● Run modern applications in an open and consistent environment
● Build apps with an air-gapped option
Watch the full session
"When we first worked on Search, there were many things we had to come to terms
with. It had to be a good quality product. You could index everything in the world but
you had to produce something sensible that people could rely on and that they think is
high quality. Then you need to deal with spam and people trying to game the system,
and I think those challenges are really similar to what we're confronting with AI today.
The second thing that really came up is the uncertainty. When we first tried index
selection assisted by machine learning, that was a huge challenge because there was so
much institutional knowledge built in. We had to be ready to take a risk in that we were
not just going to use an algorithm to choose the index, but we were going to do it online
all the time, with trillions of pages per day, and we had to be convinced that over hours,
minutes, days, that those decisions were the right ones.
We're really dealing with something similar in AI. How do you make it into a
product? How do you make it a product somebody wants to use? How do you
manage all the data you're moving around and making sure it's the right data at
the right place and at the right time?”
Carrie Grimes Bostock
VP, Engineering Fellow
Application development
Businesses (and developers) are walking a tightrope, developing scalable and secure applications that meet evolving customer demands. As these expectations change, organizations are prioritizing robust application strategies to stay ahead, and container platforms form an essential foundation to build from.
From a strong containerized foundation, teams can streamline development while also
bolstering security and accelerating innovation—particularly in AI-powered applications.
And once established, these platforms open doors for further AI-driven enhancements,
both in internal processes and user-facing experiences.
Build a container platform
Building, deploying, and scaling software for thousands of clusters is complex.
Transformative results require more than just adding AI to existing processes—and
managing hardware and software for fast transactions, predictive analytics, and visual
inspections only adds to the challenge. By leveraging AI-enabled cloud infrastructure,
advanced models, and on-premises computing, businesses can unlock greater value and
take a leap of imagination beyond incremental improvement.
Simplify scaling with a container platform
Container platforms can scale to thousands of clusters, and Google Kubernetes Engine (GKE) pushes this further by supporting clusters of up to 65,000 nodes (10x larger scale than the two largest public cloud providers offer). Plus, Cloud Run enhances the container experience by enabling 98% of apps to deploy successfully in under 5 minutes on their first attempt.
The 3 P's: building blocks for successful AI applications

Proximity
Bring your data and applications close together, enabling improved customer experiences through real-time capabilities and personalization.

Platform
Leverage logical platforms that combine data, LLMs, apps, and APIs to help innovate and deliver value for crucial use cases.

Productivity
Build a spirit of collaboration through AI-assisted tools, creating more time for innovation and less time on operations.
Watch the full session
Enhance customer experience
Today’s customers crave prompt interactions and personalized experiences, and businesses are
relying on generative AI to deliver them faster than ever.
Here are four tips for enhancing your customer experience with Google Cloud and gen AI.
Enhance personalization through publicly available AI models
Use Vertex AI Agent Builder to integrate and deploy cutting-edge models that can help serve relevant content, recommend products, and make offers in real time.
Deliver lightning-fast response times
Use fine-tuned models and tools like Agent Assist to create customer interactions that feel instantaneous and effortless.
Humanize the digital journey with AI capabilities
Leverage Google Cloud’s Conversational Agents and Dialogflow for AI-powered
chatbots, personalized content recommendations, and automated content
generation to redirect your teams toward higher-value customer interactions.
Scale personalization without limits
Grow your AI initiatives effectively, ensuring consistently fast and personalized
experiences even during peak demand.
Quickly deploy open models on GKE
Open model repositories like Model Garden and Hugging Face offer hundreds of thousands of open models, ready for direct deployment to GKE with optimized configurations, making it easy to jumpstart your AI/ML projects.
Watch the full session
Build for developer success
If you’re struggling to get AI apps to market, your development process might need a
refresh. Gemini Code Assist works with software development life cycle (SDLC) tools to
make your developers more productive, accelerating your AI innovation journey.
Here are four new ways that Gemini Code Assist is helping your development
team find success.
01 Powered by Gemini 1.5 Pro
Gemini Code Assist now leverages Google DeepMind's latest foundation model, enabling access to a 2 million token context window for more accurate and contextually relevant code generation.

02 Local codebase awareness
Code Assist can now include files on your Google Drive as an input into its responses, offering your team a whole new level of personalization.

03 Code customization
Securely share your pre-existing codebase with Gemini, so your entire company can use it as context to help generate net-new code.

04 Code transformation
Take existing codebases and transform them to bring the power of gen AI models to your coding interface.
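For a rough sense of what a 2 million token context window means in practice, here is a back-of-the-envelope estimate. The tokens-per-line figure is an assumption for typical source code, not a number from this guide:

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # Gemini 1.5 Pro context window cited above
TOKENS_PER_LINE = 10               # rough assumption for a typical line of code

# Approximate number of source lines that fit into a single prompt
lines_of_code = CONTEXT_WINDOW_TOKENS // TOKENS_PER_LINE
print(lines_of_code)  # 200000
```

By this estimate, an entire mid-sized codebase can fit in one prompt, which is what makes whole-codebase context features practical.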
Watch the full session
Fine-tune AI models
Did you know you can build AI-enabled applications without the usual training and
development process? With the help of the online Kaggle community, you can learn and
experiment with fine-tuning to best suit your needs.
Here are two ways you can improve your productivity through fine-tuning with Kaggle.
01 Find resources similar to your use case with Kaggle competitions
Search competitions to see winning responses (as voted by the Kaggle community),
download best-practice models, engage in technical discussion, and see example code
for fine-tuning.
02 Scale your performance with MaxText on Kaggle
Fine-tune large AI models on TPU clusters with MaxText for fast, scalable experimentation and optimized performance, or learn from pre-existing fine-tuned models uploaded to KaggleHub.
Watch the full session
"I wanted to build large scale systems that were easy to program and could solve really
large problems. The fact that I was working on Search even before Google existed was
because it was the first problem we had in the world which really required giant
infrastructure, which made it super appealing.
That work indirectly led me to Borg, but in Google there is very much a focus on
containers and processes and APIs and all our software is built that way, so it's natural
to say 'how far can you push that model?'. And it's a great model for large scale systems;
it’s really the only model that's worked well. So that really led to Kubernetes, and now
we're seeing Kubernetes being used largely for AI. All the largest players in this
industry are building with Kubernetes. So it's great to see that transition and GKE is by
far the best way to do that today, and it's evolving very quickly.”
Eric Brewer
VP and Fellow
Thank you
We’re living in an exciting time. Google’s mission to organize the world’s information and
make it universally accessible has been driven by an AI-first approach, giving us a unique
advantage in understanding the fundamental challenges of this field.
However, our mission means little without our community, whose contributions push the
boundaries of innovation every day.
The Google Cloud App Dev & Infrastructure Summit was designed to celebrate this
community—your community—while offering insights and inspiration to help elevate your
work.
We invite you to watch the full slate of talks on-demand and connect with one of
our specialists to explore how Google Cloud can help optimize your AI
development.