Infrastructure Options For Serving Advertising Workloads (Part 1)
Ad servers and bidders are often complementary platforms with overlapping technology. To
avoid duplicating content, this article and its companion, part 2
(/solutions/infrastructure-options-for-data-pipelines-in-advertising), provide shared context for the series.
Whether you are working on the buy side or the sell side of these platforms, consider the following:
https://cloud.google.com/solutions/infrastructure-options-for-serving-advertising-workloads 1/21
26/10/2020 Infrastructure options for serving advertising workloads (part 1)
Reproducibility: When you replicate a system in different regions across the globe, the
ability to deploy the same infrastructure in the same way keeps the platform consistent.
Load balancing: A single machine cannot handle ad-tech loads. Distribute both
internal and external requests across multiple servers.
Autoscaling: Ad-request loads fluctuate over the course of the day. You can reduce
costs and increase availability by automatically scaling your system up and down.
Google Cloud offers several options for running your computational workloads. Consider
the following options:
App Engine (/appengine) for running a web user interface (UI) without most of the
operational overhead.
These options are recommendations and are often interchangeable. Your requirements are
ultimately the deciding factor, whether that factor is cost, operational overhead, or
performance.
Compute Engine and GKE both support preemptible VMs (/preemptible-vms), which are often
used in ad-tech workloads to save on cost. Preemptible VMs can be preempted with only a
one-minute warning, however, so you might want to do the following:
If you use Compute Engine, two different managed instance groups (one preemptible
and the other with standard VMs) can reside behind the same load balancer. By
making sure that one group consists of standard VMs, you ensure that your frontend
is always able to handle incoming requests. The following diagram shows this
approach.
[Diagram: Cloud Load Balancing with a single anycast IP distributing requests across a standard instance group and a preemptible instance group.]
If you use GKE, you can balance cost and availability by creating both non-
preemptible and preemptible node pools
(/kubernetes-engine/docs/how-to/preemptible-vms) in your GKE cluster.
Geographic locations
Advertisers might want to target customers in all regions around the globe. Adding a few
extra milliseconds to one of the platform's UI frontends won't degrade your advertisers'
experience when, for example, they are visualizing performance data. Be careful, however, if
additional networking distance adds a few extra milliseconds to the bidding response.
Those few milliseconds might be the difference in whether the advertiser's bid gets
accepted and their ad gets served to the customer.
If latency is critical, you might want to distribute some of your services across zones
in different regions. You can customize your setup based on your needs. For example, you
could decide to have some frontend servers in us-east4 and us-west1, but have data
stored in a database in us-central1. In some cases, you could replicate some of the
database data locally onto the frontend servers; alternatively, you might consider a multi-
regional Cloud Spanner instance (/spanner/docs/instances#multi-region_configurations).
Reproducibility
Reproducibility offers simpler maintenance and deployment, and having the platform run in
all relevant geographic locations is key to returning bids before the critical deadline. To
ensure reproducibility, every region must perform similar work. The main difference is the
workload, and how many machines and zones are required to scale to meet the regional
demand.
[Diagram: a single instance template used to deploy identical bidder groups in us-west1 and us-east1.]
Load balancing
For external traffic, such as ad or bid requests, you can use Cloud Load Balancing
(/load-balancing), which provides HTTP(S) and Network load balancing capabilities.
With no pre-warming required, and with global load balancing and a single anycast IP
(/load-balancing/docs/load-balancing-overview#types-of-cloud-load-balancing), your system
can receive millions of requests per second from anywhere in the world.
For internal traffic between services, Google Cloud provides Internal load balancing
(/load-balancing/docs/internal).
If you decide to use Kubernetes for some parts of the infrastructure, we recommend that
you use GKE. Some Kubernetes features might need some extra implementation if your
provider does not natively support them. With GKE, Kubernetes can use native Google
Cloud features:
Creating an Ingress (https://kubernetes.io/docs/concepts/services-networking/ingress/) or
Istio Ingress Gateway
(https://istio.io/docs/tasks/traffic-management/ingress/ingress-control/) uses HTTP load
balancing.
Scaling
Because your platform must be able to parse and calculate billions of ad requests per day,
load balancing is a must; a single machine would be inadequate for that task.
However, the number of requests tends to change throughout the day, so your
infrastructure must be able to scale up and down based on demand.
If you decide to use Compute Engine, you can create autoscaling managed instance groups
(/compute/docs/autoscaler) from instance templates (/compute/docs/instance-templates). You
can then scale those groups on different metrics, such as HTTP load, CPU utilization, or Cloud
Monitoring custom metrics (for example, application latency). You can also scale these groups
on a combination of these metrics.
Autoscaling decisions are based on the metric average over the last ten minutes and are
made every minute using a sliding window. Every instance group can have its own set of
scaling rules.
Network communication
Virtual Private Clouds (VPCs) can span multiple regions. In other words, if you have
database read replicas in us-east and a master database in asia-southeast within the
same VPC, they can securely communicate using their private IPs or hostnames without
ever leaving the Google network.
In the following diagram, all instances are in the same VPC and can communicate directly
without the need of VPN, even if they are in different regions.
[Diagram: a single VPC (10.0.0.0/16) spanning us-central1 (database on Compute Engine), us-west1, and us-east1 (bidders on Compute Engine), fronted by Cloud Load Balancing.]
GKE clusters are assigned a VPC when they are created and can use many of those existing
networking features.
When you use managed products such as Cloud Bigtable, Cloud Storage, or BigQuery,
Google Cloud provides Private Google Access (/vpc/docs/configure-private-google-access) to those
products through the VPC.
User frontend
Your user frontend is important, but it requires the least technical overhead because it
handles much smaller workloads. The user frontend offers platform users the ability to
administer advertising resources such as campaigns, creatives, billing, or bids. The
frontend also offers the ability to interact with reporting tools, for example, to monitor
campaigns or ad performance.
Both of these features require web serving to provide a UI to the platform user, and
datastores to store either transactional or analytical data.
Web serving
Web serving provides a UI to the platform user.
Your interface likely includes functionality such as a dashboard and pages to set up
advertisers, campaigns, and their related components. The UI design itself is a separate
discipline and beyond the scope of this article.
Datastores
Handling requests
Frontends
Requests are sent for processing to an HTTP(S) endpoint that your platform provides. The
key components are as follows:
A pool of instances that can scale up and down quickly based on various KPIs.
Both Compute Engine and GKE are good options as computing platforms:
GKE uses Cloud Load Balancing and Ingress (or Istio Ingress Gateway), Horizontal
Pod Autoscaler, and Cluster Autoscaler.
Because pod scaling is faster than node scaling, GKE might offer faster autoscaling on a
per-service level. GKE also supports container-native load balancing
(/kubernetes-engine/docs/how-to/container-native-load-balancing) to optimize request routing
directly to an instance that hosts a pod of the relevant service.
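The per-service scaling described above is typically configured with a HorizontalPodAutoscaler. The following manifest is a minimal sketch; the Deployment name `bidder`, the replica bounds, and the 60% CPU target are illustrative assumptions, not values from this article.

```yaml
# Sketch: scale a hypothetical "bidder" Deployment on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bidder-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bidder        # assumed Deployment name
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```

The Cluster Autoscaler then adds or removes nodes when the scheduled pods no longer fit the node pool.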
Parsing
Ad requests are commonly formatted in JSON or protobuf format with information such as
IP address, user agent, or site category. It's critical to extract this data, which might also
contain details on the (unique) user, to then retrieve segments to select and filter ads.
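A minimal parsing sketch for the JSON case follows. The field names (`device.ip`, `device.ua`, `site.cat`, `user.id`) loosely follow OpenRTB-style requests and are assumptions, not a schema prescribed by this article.

```python
import json

def parse_ad_request(raw):
    """Extract the fields the selection pipeline needs from a raw JSON request."""
    req = json.loads(raw)
    return {
        "ip": req.get("device", {}).get("ip"),
        "user_agent": req.get("device", {}).get("ua"),
        "site_category": req.get("site", {}).get("cat", []),
        "user_id": req.get("user", {}).get("id"),  # the (unique) user, if present
    }

raw = ('{"device": {"ip": "203.0.113.7", "ua": "Mozilla/5.0"},'
       ' "site": {"cat": ["IAB2"]}, "user": {"id": "u-42"}}')
parsed = parse_ad_request(raw)
print(parsed["site_category"])  # ['IAB2']
```

For protobuf-formatted requests, the same extraction would use generated message classes instead of `json.loads`.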
Static filtering
Some requests, typically received on the buyer side, can be discarded by making use of
static rules. Such early filtering can reduce the amount of data and the complex processing
that is required downstream.
Rules might be publisher blacklists or content-type exclusions. During initialization, the
workers can pull and load these rules from a file hosted on Cloud Storage (/storage).
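Such a filter can be sketched as a pure function over the loaded rules. The rule structure here (a blocked-publisher set and an excluded-content-type set) is an assumption for illustration; in practice the workers would deserialize it from the Cloud Storage file at startup.

```python
# Assumed rule structure, loaded once at worker initialization.
BLOCK_RULES = {
    "blocked_publishers": {"pub-123", "pub-666"},
    "excluded_content_types": {"adult", "gambling"},
}

def passes_static_filters(request, rules=BLOCK_RULES):
    """Discard requests early, before any expensive downstream processing."""
    if request.get("publisher_id") in rules["blocked_publishers"]:
        return False
    if request.get("content_type") in rules["excluded_content_types"]:
        return False
    return True

print(passes_static_filters({"publisher_id": "pub-123", "content_type": "news"}))  # False
print(passes_static_filters({"publisher_id": "pub-9", "content_type": "news"}))    # True
```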
Ad selection
Ad selection can be performed in various services or platforms, including the publisher
ad server, the advertiser ad server, or the DSP. There are different levels of complexity when
selecting an ad:
More advanced selections incorporate user attributes and segments and potentially
involve machine learning–based ad-recommendation systems.
RTB systems usually make the most complex decisions. Ads are selected based on
attributes such as (unique) user segments and previous bid prices. The selection also
includes a bid calculation to optimize the bid price on a per-request basis.
Choosing the relevant ad is the core function of your system. You have many factors to
consider, including advanced rules-based or ML-based selection algorithms. This article, however,
continues to focus on the high-level process and the interactions with the different
datastores.
1. Retrieve the segments associated with the targeted user from the (unique) user
profile store (#unique_user_profile_store).
2. Select the campaigns or ads that are a good match with the user's segments. This
selection requires reading metadata from the metadata management store.
3. Filter the selected campaigns or ads in alignment with the metrics, for example the
remaining budget, stored in one of the context stores (#context_stores).
Bidders have more steps related to bids and auctions, and they have stricter latency
requirements. For more details about bidder requirements during ad selection, see
Infrastructure options for RTB bidders (part 4) (/solutions/infrastructure-options-for-rtb-bidders).
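The three numbered steps above can be sketched end to end. The dictionaries stand in for the profile, metadata, and context stores; the campaign fields and matching rule (segment-set overlap) are illustrative assumptions.

```python
# Stand-ins for the real datastores (profile store, metadata store, context store).
USER_PROFILES = {"u-42": {"segments": {"sports", "travel"}}}
CAMPAIGNS = [
    {"id": "c-1", "target_segments": {"sports"}, "remaining_budget": 120.0},
    {"id": "c-2", "target_segments": {"finance"}, "remaining_budget": 500.0},
    {"id": "c-3", "target_segments": {"travel"}, "remaining_budget": 0.0},
]

def select_ads(user_id):
    # Step 1: retrieve the targeted user's segments from the profile store.
    segments = USER_PROFILES.get(user_id, {}).get("segments", set())
    # Step 2: select campaigns whose targeting overlaps those segments.
    matched = [c for c in CAMPAIGNS if c["target_segments"] & segments]
    # Step 3: filter on metrics from the context store, here the remaining budget.
    return [c["id"] for c in matched if c["remaining_budget"] > 0]

print(select_ads("u-42"))  # ['c-1']
```

A bidder would add bid-price calculation and auction logic after step 3.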
Most of the decisions made when selecting an ad require intelligent data that:
How you choose your datastore depends on how you prioritize the following requirements:
Minimizing read or write latencies: If latency is critical, you need a store that is close
to your servers and that can also handle fast reads or writes at scale.
Minimizing operational overhead: If you have a small engineering team, you might
need a fully managed database.
Scaling: To support either millions of targeted users or billions of events per day, the
datastore must scale horizontally.
Adapting querying style: Some queries can use a specific key, whereas others might
need to retrieve records that meet a different set of conditions. In some cases, query
data can be encoded in the key. In other cases, the queries need SQL-like capabilities.
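Encoding query data in the key is a common wide-column pattern: a row key such as `<user>#<reversed-timestamp>` lets a prefix scan return a user's most recent events without a secondary index. The key layout below is illustrative, not a prescribed schema.

```python
MAX_TS = 10**10  # any constant larger than every expected Unix timestamp

def event_row_key(user_id, unix_ts):
    """Build a row key whose lexicographic order puts the newest event first."""
    # Subtracting from a constant reverses the sort order of the timestamp;
    # zero-padding keeps string comparison consistent with numeric comparison.
    return f"{user_id}#{MAX_TS - unix_ts:010d}"

k_old = event_row_key("u-42", 1600000000)
k_new = event_row_key("u-42", 1600000100)
print(k_new < k_old)  # True: the later event sorts before the earlier one
```

With this layout, scanning rows prefixed by `u-42#` yields that user's events newest-first.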
There are different options to address heavy-read requirements. These include read
replicas, caching systems, in-memory NoSQL databases, and managed wide-column
NoSQL databases.
When using Cloud SQL (/sql) (or an equivalent RDBMS installed and managed on Compute
Engine), you can offload the reads from the master instance. Many databases natively
support this feature. Workers can then query read replicas for the information they need.
[Diagram: a rule database on Cloud SQL serving worker reads through read replicas.]
Read replicas are designed to serve heavy read traffic, but scalability is not linear, and
performance can suffer with a larger number of replicas. If you need either reads or writes
that can scale, with global consistency and minimal operational overhead, then consider
using Cloud Spanner (/spanner).
You can use a caching layer such as Redis on Compute Engine, with optional local replicas
on the workers. This layer can greatly reduce both read latency and network latency. The
following diagram shows what this layer looks like.
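The usual pattern for such a layer is cache-aside: workers read from a fast local cache and fall back to the rule database on a miss. The sketch below uses an in-process dictionary standing in for Redis; the TTL and loader are illustrative assumptions.

```python
import time

class TtlCache:
    """Cache-aside sketch: serve from memory, reload from the database on expiry."""

    def __init__(self, loader, ttl_seconds=30.0):
        self._loader = loader          # fallback read path (the rule database)
        self._ttl = ttl_seconds
        self._entries = {}             # key -> (value, expiry)

    def get(self, key):
        value, expiry = self._entries.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value               # cache hit: no network round trip
        value = self._loader(key)      # miss: read from the database
        self._entries[key] = (value, time.monotonic() + self._ttl)
        return value

db_reads = []
cache = TtlCache(loader=lambda k: db_reads.append(k) or f"rules-for-{k}")
cache.get("campaign-1")
cache.get("campaign-1")
print(len(db_reads))  # 1: the second read was served from the cache
```

The TTL bounds how stale a worker's view of the rules can be, trading freshness for latency.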
[Diagram: a rule database on Cloud SQL feeding a NoSQL master on Compute Engine, which replicates to NoSQL slaves running locally alongside the frontends.]
Deploy an in-memory database such as Aerospike or Redis to provide fast reads at scale.
This solution can be useful for regional data and counters. If you are concerned about the size
of the data structures stored, you can also leverage in-memory databases that can write to
SSD disks. The following diagram shows what this solution looks like.
[Diagram: frontends on Kubernetes Engine reading from an in-memory database that is refreshed from the rule database on Cloud SQL.]
Wide-column datastores are key-value stores that can provide fast reads and writes at
scale. You can install a common open source database such as Cassandra or HBase.
[Diagram: a frontend on Kubernetes Engine reading intelligence tables from Cloud Bigtable.]
For static data that can be saved in protobuf, Avro, or JSON format, workers can load the data
from Cloud Storage (/storage) during initialization and persist the content in RAM. The following
diagram shows what this process looks like.
[Diagram: frontends on Compute Engine loading static data from Cloud Storage at startup.]
There is no one-size-fits-all solution. Choose between the solutions based on your priorities,
and balance latency, cost, operations, read/write performance, and data size.
Solution | Latency | Cost | Operational overhead | Read/write performance | Data size
In-memory stores | Sub-millisecond | Compute based | High | Scales with number of nodes | Scales with number of nodes
This section covers data storage options that address one of three different scenarios:
Analytical stores are used offline through ad hoc queries or batch data pipelines. They
support hundreds of terabytes of data stored daily.
The (unique) user profile store (#unique_user_profile_store) is used when profiling the
(unique) user to match the user with an ad. Data is updated mostly using events
(/solutions/infrastructure-options-for-data-pipelines-in-advertising#event_lifecycle) (in part 2).
This data requires persistence.
The metadata management store contains the reference data to which you apply rules
when making the ad selection. Some resources stored here are specific to a platform, but
others might overlap:
For a buy-side ad server, buyers manage data about campaigns, creatives, advertisers,
and pricing. Advertisers can often update this data themselves through a UI.
For a DSP, buyers manage data about campaigns, creatives, advertisers, and bid
prices. Advertisers can often update the data themselves through a UI.
Writes are the result of platform user edits through the frontend and happen
infrequently.
Focusing on the user frontend, the campaign metadata database must be able to manage
resource relationships and hierarchies, and store megabytes to a few gigabytes of data.
The database must also provide reads and writes in the range of hundreds of QPS. To
address these requirements, Google Cloud offers several database options, both managed
and unmanaged:
Cloud SQL (/sql): A fully managed database service that can run MySQL or
PostgreSQL.
Custom: You can also install and manage many of the open source or proprietary
databases (such as MySQL, PostgreSQL, MongoDB, or Couchbase) on Compute
Engine or GKE.
Your requirements can help narrow down options, but at a high level you could use Cloud
SQL due to its support for relational data. Cloud SQL is also managed and provides read
replica options.
As mentioned previously, the metadata store is used not only by platform users for
reporting or administration, but also by the servers that select ads. Those reads happen
billions of times a day. There are two main ways to approach that heavy-read requirement:
Using a database, such as Spanner, that can handle globally consistent writes and
billions of reads per day.
Decoupling reads and writes. This approach is possible because metadata is not
changed often. You can read more about this approach in exports
(/solutions/infrastructure-options-for-data-pipelines-in-advertising#exports) (in part 2).
This store contains (unique) users and their associated information, which provides key
insights to select a campaign or ad on request. Information can include the (unique) user's
attributes, your own segments, or segments imported from third parties. In RTB, imported
segments often include bid price recommendations.
This datastore must be able to store hundreds of gigabytes, possibly terabytes of data. The
datastore must also be able to retrieve single records in, at most, single-digit-millisecond
speeds. How much data you store depends on how detailed your (unique) user information
is. At a minimum, you should be able to retrieve a list of segments for the targeted user.
The store is updated frequently based on the (unique) user's interactions with ads, sites they
visit, or actions they take. The more information, the better the targeting. You might also
want to use third-party data management platforms (DMPs) to enrich your first-party data.
Cloud Bigtable and Datastore are common databases to use for (unique) user data. Both
databases are well suited to random reads and writes of single records. Consider using
Cloud Bigtable only if you have at least several hundred gigabytes of data.
Other common databases such as MongoDB, Couchbase, Cassandra, or Aerospike can also
be installed on Compute Engine or GKE. Although these databases often require more
management, some might provide more flexibility, possibly lower latency, and in some
cases, cross-region replication.
Context stores
The context store is often used to store counters, for example, frequency caps and
remaining budget. How often the data is refreshed in the context store varies. For example,
a daily cap can be propagated daily, whereas the remaining campaign budget must be
recalculated and propagated as soon as possible.
By using horizontal scaling, these stores can handle writes and reads at scale.
For more information, see Infrastructure options for ad servers (part 3) (/solutions/infrastructure-options-for-ad-servers).
You set up budgets in the campaign management tool. You don't want your campaigns to
overspend because, most of the time, advertisers will not pay for those extra impressions.
But in a distributed environment, it can be challenging to aggregate counters such as
remaining budget, especially when the system can receive hundreds of thousands of ad
requests per second. Campaigns can overspend in a few seconds if the global
remaining budget is not consolidated quickly.
By default, a worker spends slices of the budget without knowing how much its sibling workers
have spent. That lack of communication can lead to a situation where a worker spends money
that is no longer available.
There are different ways to handle this problem. Both of the following options implement a
global budget manager, but they behave differently.
1. Notifying workers about budget exhaustion: The budget manager tracks spending
and pushes notifications to each of the workers when the campaign budget has been
exhausted. Due to the likelihood of high levels of QPS, notifications should happen
within a second in order to quickly limit overspend. The following diagram shows how this
process works.
[Diagram: events (impressions, clicks) refresh a budget store; the budget manager on Compute Engine reads the store and sends stop notifications to the workers.]
2. Recurrently allocating slices of budget to each worker: The budget manager breaks
the overall remaining budget into smaller amounts that are allocated to each worker
individually. Workers spend their own, local budget, and when it is exhausted, they can
request more. This option has a couple of advantages:
a. Before being able to spend again, workers need to wait for the budget manager
to allocate them a new amount. This approach prevents overspending even if
some workers remain idle for a while.
b. The budget manager can adapt the allocated amount sent to each worker based
on the worker's spending behavior at each cycle. The following diagram shows
this process.
[Diagram: events (impressions, clicks) refresh the budget store; the budget manager allocates slices of the remaining budget (for example, $1.50, $2, and $1) to individual workers.]
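The slice-allocation option can be sketched as follows. A central budget manager hands each worker a small slice; workers spend locally and request more when their slice is exhausted. The slice size is an illustrative tuning parameter, and real workers would make these requests over the network rather than via method calls.

```python
class BudgetManager:
    """Holds the global remaining budget and hands out bounded slices."""

    def __init__(self, total_budget, slice_size):
        self.remaining = total_budget
        self.slice_size = slice_size

    def request_slice(self):
        grant = min(self.slice_size, self.remaining)
        self.remaining -= grant
        return grant  # 0 once the campaign budget is exhausted

class Worker:
    """Spends from a local slice; asks the manager for more when it runs out."""

    def __init__(self, manager):
        self.manager = manager
        self.local_budget = 0.0

    def spend(self, amount):
        if self.local_budget < amount:
            self.local_budget += self.manager.request_slice()
        if self.local_budget >= amount:
            self.local_budget -= amount
            return True
        return False  # no budget left anywhere: the ad is not served

manager = BudgetManager(total_budget=5.0, slice_size=2.0)
workers = [Worker(manager) for _ in range(3)]
served = sum(w.spend(1.0) for w in workers for _ in range(10))
print(served)  # 5: total spend never exceeds the global budget
```

Because each worker can hold at most one unspent slice, the worst-case overspend is bounded by the slice size times the number of workers, rather than being unbounded.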
Whichever option you choose, budget counters are calculated based on the pricing model
agreed upon by the publisher and advertiser. For example, if the model is:
CPM based, a billable impression sends an event to the system that decreases the
remaining budget based on the price set per thousand impressions.
CPC based, a user click sends an event to the system that decreases the remaining
budget based on the price set per click.
CPA based, a tracking pixel placed on the advertiser property sends an event to the
system that decreases the budget based on the price per action.
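The three pricing models above reduce to a per-event budget decrement. The following sketch makes that arithmetic explicit; the prices are illustrative.

```python
def budget_decrement(event_type, pricing_model, price):
    """Amount to subtract from the remaining budget for one billable event."""
    if pricing_model == "CPM" and event_type == "impression":
        return price / 1000.0   # price is set per thousand impressions
    if pricing_model == "CPC" and event_type == "click":
        return price            # price is set per click
    if pricing_model == "CPA" and event_type == "action":
        return price            # price is set per tracked action
    return 0.0                  # event does not bill under this model

print(budget_decrement("impression", "CPM", 2.50))  # 0.0025
```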
The number of impressions is often a few orders of magnitude higher than the number of clicks,
and the number of clicks is often a few orders of magnitude higher than the number of conversions.
Ingesting these events requires a scalable event-processing infrastructure. This approach is discussed in
Infrastructure options for data pipelines in advertising (part 2)
(/solutions/infrastructure-options-for-data-pipelines-in-advertising).
Analytical store
(Unique) user data is processed offline to determine the associated segments, which
in turn are copied to faster databases, such as the user profile database, for serving.
This process is explained in exports
(/solutions/infrastructure-options-for-data-pipelines-in-advertising#exports).
Joining requests with impressions and (unique) user actions in order to aggregate
offline counters used in a context store.
The reporting/dashboarding store is used in the user frontend and provides insight on how
well campaigns or inventories perform. There are different reporting types. You might want
to offer some or all of them, including custom analytics capabilities and semi-static
dashboards updated every few seconds or in real time.
You can use BigQuery for its analytics capabilities. If you leverage views
(/bigquery/docs/views-intro) to limit data access and share them
(/bigquery/docs/view-access-controls) with your customers, you can give your
platform users ad hoc analytical capabilities through your own UI or their own visualization
tools. Not every company offers this option, but it is a great addition to be able to offer your
customers. Consider using flat-rate pricing (/bigquery/pricing#flat_rate_pricing) for this use
case.
If you want to offer OLAP capabilities to your customers with millisecond-to-second latency,
consider using a database in front of BigQuery. You can aggregate data for reporting and
export it from BigQuery to your choice of database. Relational databases, such as those
supported by Cloud SQL, are often used for this purpose.
Because App Engine (or any other frontend that acts on behalf of the user) uses a service
account (/iam/docs/understanding-service-accounts), BigQuery sees queries as coming from a
single user. As a result, BigQuery can cache (/bigquery/docs/cached-results) some
queries and return previously calculated results faster.
Semi-static dashboards are also commonly used. These dashboards use pre-aggregated
KPIs written by a data pipeline process. Stores are most likely NoSQL-based, such as
Firestore (/firestore) for easier real-time updates
(https://firebase.google.com/docs/firestore/query-data/listen), or caching layers such as
Memorystore (/memorystore). The freshness of the data typically depends on the
frequency of the updates and the duration of the window used to aggregate your data.
What's next
For information about managing data in ad tech, see Infrastructure options for data
pipelines in advertising (part 2)
(/solutions/infrastructure-options-for-data-pipelines-in-advertising).
For information about serving ad requests, see Infrastructure options for ad servers
(part 3) (/solutions/infrastructure-options-for-ad-servers).
For information about serving bid requests or building your own DSP, see
Infrastructure options for RTB bidders (part 4)
(/solutions/infrastructure-options-for-rtb-bidders).
Read this solution about using Google Cloud services from GKE
(/solutions/using-gcp-services-from-gke).
Try out other Google Cloud features for yourself. Have a look at our tutorials
(/docs/tutorials).
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0
License (https://creativecommons.org/licenses/by/4.0/), and code samples are licensed under the Apache
2.0 License (https://www.apache.org/licenses/LICENSE-2.0). For details, see the Google Developers Site
Policies (https://developers.google.com/site-policies). Java is a registered trademark of Oracle and/or its
affiliates.