Functional Bid Landscape Forecasting for Display Advertising⋆

Yuchen Wang†, Kan Ren†, Weinan Zhang†, Jun Wang‡, Yong Yu†
†Shanghai Jiao Tong University, ‡University College London
{zeromike,kren,wnzhang,yyu}@apex.sjtu.edu.cn, [email protected]

⋆ W. Zhang and Y. Yu are the corresponding authors.
1 Introduction
Popularized since 2011, real-time bidding (RTB) has become one of the most
important media buying mechanisms in display advertising [7]. In RTB, each ad
display opportunity, i.e., an ad impression, is traded through a real-time auction,
where each advertiser submits a bid price based on the impression features, and
the one with the highest bid wins the auction and displays her ad to the user
[20]. Apparently, the bidding strategy that determines how much to bid for each
specific ad impression is a core component in RTB display advertising [16].
As pointed out in [22], the two key factors determining the optimal bid price
in a specific ad auction are utility and cost. The utility factor measures the value
of an ad impression, normally quantified as the user's response rate to the displayed ad,
such as click-through rate (CTR) or conversion rate (CVR) [12]. The cost factor,
on the other hand, estimates how much the advertiser would need to pay to win
the ad auction [3]. From an advertiser's perspective, the market price is defined
as the highest bid price from her competitors¹. In the widely used second-price
auctions, the winner needs to pay the second highest bid price in the auction,
i.e., the market price [4]. Market price estimation is a difficult problem: the
market price is the highest bid from hundreds or even thousands of advertisers
for a specific ad impression, which is highly dynamic, and it is almost impossible
to predict by modeling each advertiser's strategy [2]. Thus, the practical solution
is to model the market price as a stochastic variable and to predict its distribution
given each ad impression, referred to as the bid landscape.
¹ The terms 'market price' and 'winning (bid) price' are used interchangeably in
related literature [2],[3],[18]. In this paper, we use 'market price'.
Previous work on bid landscape modeling is normally based on a predefined
parameterized distribution form, such as the Gaussian [18] or log-normal [3]
distribution. However, as pointed out in [19], such assumptions are
too strong and often rejected by statistical tests. Another practical problem is
that the observed market price is right-censored: only when the advertiser wins
the auction can she observe the market price (by checking the auction cost),
and when she loses, she only knows that the underlying market price is higher than
her bid. Such censored observations directly lead to biased landscape models.
In this paper, we present a novel functional bid landscape forecasting model
to address these two problems. Decision trees are commonly used in data
mining [5],[17]. By building a decision tree, the function mapping from the auctioned
ad impression features to the corresponding market price distribution is
automatically learned, without any functional assumption or restriction. More
specifically, to deal with the categorical features that are quite common in online
advertising tasks, we propose a novel node splitting scheme that performs
clustering over the attribute values, e.g., clustering and splitting the cities.
The learning criterion of the tree model is based on the KL-Divergence [10] between
the market price distributions of the child nodes. Furthermore, to model the
censored market price distribution of each leaf node, we adopt non-parametric
survival models [9], which significantly reduce the modeling bias by leveraging the
lost bid information.
The experiments on a 9-advertiser dataset demonstrate that our proposed
solution with automatic tree learning and survival modeling leads to a 30.7%
improvement in data log-likelihood and a 77.6% drop in KL-Divergence compared
with the state-of-the-art model [18].
In sum, the technical contributions of this paper are three-fold.
– Automatic function learning: a decision tree model is proposed to automatically
learn the function mapping from the input ad impression features to
the market price distribution, without any functional assumption.
– Node splitting via clustering: the node splitting scheme of the proposed
tree model is based on KL-Divergence maximization between the split
data via a K-means clustering of attribute values, which naturally bypasses
the scalability problem of tree models working on categorical data.
– Efficient censorship handling: with a non-parametric survival model, both
the data of observed market prices and lost bid prices are fed into the decision
tree learning to reduce the model bias caused by the censored market price
observations.
The rest of this paper is organized as follows. We discuss related work
and compare it with ours in Section 2. Then we propose our solution in Section 3.
The experimental results and detailed discussions are provided in Section 4. We
finally conclude this paper and discuss future work in Section 5.
2 Related Work
Learning over Censored Data. In the machine learning field, dealing with censored
data is sometimes regarded as handling missing data, which is a well-studied
problem [6]. Item recommendation with implicit feedback is a classic problem of
dealing with missing data: [15] proposed uniformly sampling negative feedback
items to complement the user's positive ones [14]. In the online advertising
field, the authors in [18] proposed a regression model with a censored regression
module that uses the lost auction data to fix the biased data problem. However, the
Gaussian conditional distribution assumption turns out to be too strong, which
results in weak performance in our experiments. The authors in [2] implemented a
product-limit estimator [9] to handle data censorship in sponsored search,
but their bid landscape is built at the search keyword level, which is not fine-grained
enough for RTB display advertising. We transfer the survival analysis method
from [2] to the RTB environment and compare with [18] in our experiments.
3 Methodology
3.1 Problem Definition
The goal of bid landscape forecasting is to predict the probability density
function (p.d.f.) $p_x(z)$ of the market price $z$ given the ad auction information
represented by a high-dimensional feature vector $x$.
Each auction $x$ carries multiple pieces of side information, e.g. user agent, region,
city, user tags, ad slot information, etc. In Table 1, we present the attributes
contained in the dataset with the corresponding number of values. We can easily
find that different attributes vary in both diversity and quantity. Moreover, the
bid price distribution of a given request may differ across attribute values.
Take the field Region as an example: the bid distribution of the samples with
region Beijing is quite different from that of Xizang, as illustrated
in Figure 1. Previous work focuses only on heuristic forms of the distribution
(e.g. log-normal [3] or a unary function [22]) and cannot effectively capture the
divergence within the data.
Moreover, in the RTB marketplace, the advertiser proposes a bid price $b$; she wins
and pays $z$ for the ad impression if $b > z$, and loses without knowing the exact
value of $z$ if $b \leq z$, where $z$ represents the market price, i.e., the highest bid price
from the competitors. Apparently, the true market price is only observable to
the winner of the corresponding auction. As for the lost auctions, the advertiser
only knows the lower bound of the market price, which results in the problem
of right-censored data [2]. The censorship from the lost auctions may heavily
influence the forecasting performance in online prediction [18].
Fig. 1. Market price distribution (left) and winning probability (right) of auction
samples with region Beijing vs. Xizang.

In this paper, we address these two problems. First, we propose to automatically
build the function mapping from the given ad auction $x$ to
the market price distribution $p_x(z)$ without any functional form assumption,
generally represented as

$$p_x(z) = T_p(x). \quad (1)$$

Second, we leverage both the observed market prices of winning auctions
and the censored ones of losing auctions to train a less biased function $T_p(x)$.
We use a binary decision tree to represent $T_p(x)$. More precisely, every node
represents a set of auction samples. For each node $O_i$, we split the contained
samples into two sets $\{S_{ij}^t\}$ according to value sets of attribute $A_j$ (e.g. Region
with values {Xizang, Beijing, ...}), where $t \in \{1, 2\}$, $A_j \in \Theta$ and $\Theta$ is the attribute
space. For each subset $S_{ij}^t$, the corresponding market price distribution $p_x^t(z)$
can be statistically built. Intuitively, different subsets have diverse distributions
while the samples within the same subset are similar to each other, which requires
an effective clustering and node splitting scheme. KL-Divergence [10] is a
reasonable metric to measure the divergence of the split data, so we choose
the best splitting $\pi_i$ with the highest KL-Divergence value $D_{KL}^i$ calculated
between the two resulting subsets $S_{i\cdot}^1$ and $S_{i\cdot}^2$ of node $O_i$. Essentially, our goal is to
seek the splitting strategy $\pi = \cup_{i \in I} \pi_i$, where each splitting action $\pi_i$ maximizes
the KL-Divergence $D_{KL}$ between the two child sets of node $O_i$. Mathematically, our
functional bid landscape forecasting system is built as

$$T_p^\pi(x) = \arg\max_\pi \sum_{i=1}^{l} D_{KL}^i, \quad (2)$$

$$D_{KL}^i = \max\{D_{KL}^{i1}, D_{KL}^{i2}, \ldots, D_{KL}^{ij}, \ldots, D_{KL}^{iN}\}, \quad (3)$$

$$D_{KL}^{ij} = \sum_{z=1}^{z_{max}} p_x(z) \log \frac{p_x(z)}{q_x(z)}, \quad (4)$$

where $p$ and $q$ are the probability distributions of the two split subsets, $z_{max}$
represents the maximum market price, $D_{KL}^{ij}$ denotes the maximum KL-Divergence
of splitting the sample set of node $i$ over attribute $A_j$, $N = |\Theta|$ is the number
of attributes and $l$ is the number of splitting nodes.
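To make the split criterion concrete, below is a minimal sketch (in Python with NumPy) of how Eq. (4) could be evaluated for one candidate binary split. The function names, the additive smoothing constant and the toy data are our own illustrative choices, not part of the proposed system.

```python
import numpy as np

def price_pdf(prices, z_max=300):
    """Empirical market-price distribution over integer prices 1..z_max,
    with additive smoothing (an assumption) so Eq. (4) stays finite."""
    prices = np.clip(np.asarray(prices, dtype=int), 1, z_max)
    counts = np.bincount(prices, minlength=z_max + 1)[1:].astype(float)
    counts += 1e-6
    return counts / counts.sum()

def kl_divergence(p, q):
    """D_KL(p || q) = sum_z p(z) log(p(z) / q(z)), cf. Eq. (4)."""
    return float(np.sum(p * np.log(p / q)))

def split_score(prices_left, prices_right):
    """Score of a candidate binary split: KL divergence between the
    market-price distributions of the two resulting subsets."""
    return kl_divergence(price_pdf(prices_left), price_pdf(prices_right))

# toy usage: two sample groups with clearly different price levels
rng = np.random.default_rng(0)
left = rng.poisson(40, 1000) + 1    # prices of one candidate child
right = rng.poisson(90, 1000) + 1   # prices of the other candidate child
print(split_score(left, right))     # large value: a good split
```

A split separating dissimilar price distributions yields a high score, which is exactly what Eqs. (2)-(3) maximize over attributes and nodes.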
When forecasting, every auction instance follows a path from the root to
a leaf, routed by the splitting strategy according to the attribute values it
contains. The bid landscape $p_x(z)$ is finally predicted at the leaf node.
Building the Decision Tree. The combined scheme of building the decision tree
with K-means clustering based node splitting is described in Algorithm 2. In Algorithm 2,
we first find the splitting attribute with the highest KL-Divergence. Then,
we perform the binary splitting of the data by maximizing the KL-Divergence
between the two leaf nodes with K-means clustering. The sub-tree keeps growing until
the number of samples in a leaf node falls below a predefined value. Finally, we
prune the tree using the reduced error pruning method. Compared with the standard
decision tree algorithm, the main differences of our proposed scheme are the binary
node splitting scheme with K-means clustering and the usage of KL-Divergence
as the attribute selection criterion.
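As a rough illustration (not the authors' Algorithm 2 itself), the sketch below grows one tree in this spirit: each value of a categorical attribute is represented by its empirical price histogram, K-means groups the values into two clusters, and the attribute whose induced split maximizes the KL-Divergence is chosen. It reuses price_pdf and kl_divergence from the earlier sketch; min_leaf, the histogram features and the dictionary-based tree encoding are assumptions, and pruning, censorship handling and EM refinement are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def best_split(samples, prices, attrs):
    """Pick the attribute whose 2-means value clustering yields the
    largest KL divergence between the two children (Eqs. 2-3)."""
    best = None
    for a in attrs:
        values = sorted({s[a] for s in samples})
        if len(values) < 2:
            continue
        # describe each attribute value by its market-price histogram
        hists = np.array([price_pdf(prices[np.array([s[a] == v for s in samples])])
                          for v in values])
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(hists)
        left_vals = {v for v, c in zip(values, labels) if c == 0}
        mask = np.array([s[a] in left_vals for s in samples])
        if mask.all() or not mask.any():
            continue
        score = kl_divergence(price_pdf(prices[mask]), price_pdf(prices[~mask]))
        if best is None or score > best[0]:
            best = (score, a, left_vals, mask)
    return best

def build_tree(samples, prices, attrs, min_leaf=3000):
    """Grow the binary tree until a node holds too few samples."""
    split = None if len(samples) < 2 * min_leaf else best_split(samples, prices, attrs)
    if split is None:
        return {"leaf": True, "pdf": price_pdf(prices)}
    _, a, left_vals, mask = split
    return {"leaf": False, "attr": a, "left_vals": left_vals,
            "left": build_tree([s for s, m in zip(samples, mask) if m],
                               prices[mask], attrs, min_leaf),
            "right": build_tree([s for s, m in zip(samples, mask) if not m],
                                prices[~mask], attrs, min_leaf)}
```

Clustering attribute values before splitting keeps each node binary regardless of how many values the attribute has, which is where the scalability benefit on categorical data comes from.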
In the test process, one problem is that a test instance may carry attribute values
that do not match any node of the decision tree learned from the training data.
To handle this, we randomly route such an instance to one of the two children,
which is equivalent to not splitting on that attribute. The experimental results
show that this random method works well on the real-world dataset.
Figure 2 shows an example of the decision tree. As we can see, for each node, we
illustrate its best splitting attribute and the corresponding KL-Divergence. The
red box shows the KL-Divergence value for each attribute, and the splitting
attribute with the highest KL-Divergence is chosen.
In real-time bidding, an advertiser only observes the market prices of the auctions
that she wins. For the lost auctions, she only knows a lower bound of the
market price, i.e., her bid price. Such data is known as right-censored data
[18]. However, the partial information in lost auctions is still of high value. To
better estimate the bid distribution, we introduce survival models [8] to model
the censored data. We implement a non-parametric method to model the real
market price distribution, transferring survival analysis from keyword search
advertising [2] to the RTB environment. That is, given the observed impressions
and the lost bid requests, the winning probability can be estimated with the
non-parametric Kaplan-Meier Product-Limit method [9].
Suppose we have sequential bidding logs in the form $\{b_i, w_i, m_i\}_{i=1,2,\ldots,M}$,
where $b_i$ is the bid price in auction $i$, $w_i$ is a boolean value indicating whether
we won the auction, and $m_i$ is the market price (unknown if $w_i = 0$). We then
transform the data into the form $\{b_j, d_j, n_j\}_{j=1,2,\ldots,N}$, where the
bid prices satisfy $b_j < b_{j+1}$, $d_j$ represents the number of winning auctions
with bid price $b_j - 1$, and $n_j$ is the number of auctions that cannot be won
with bid price $b_j - 1$. Then the probability of losing an auction with bid
price $b_x$ is

$$l(b_x) = \prod_{b_j < b_x} \frac{n_j - d_j}{n_j}. \quad (5)$$

Thus the winning probability $w(b_x)$ and the integer² market price p.d.f. $p(z)$ are

$$w(b_x) = 1 - \prod_{b_j < b_x} \frac{n_j - d_j}{n_j}, \qquad p(z) = w(z+1) - w(z). \quad (6)$$

² In practice, the bid prices in various RTB ad auctions are required to be integers.
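A minimal sketch of this Kaplan-Meier Product-Limit estimation could look as follows. The log format (bid, won, market_price) mirrors the {b_i, w_i, m_i} tuples above, while the array layout and the at-risk bookkeeping convention are our own assumptions.

```python
import numpy as np
from collections import Counter

def km_estimate(logs, z_max=300):
    """Kaplan-Meier product-limit estimate, cf. Eqs. (5)-(6).
    logs: iterable of (bid, won, market_price); market_price is only
    meaningful when won is True (right-censored otherwise)."""
    wins = Counter()      # observed market prices from won auctions
    censored = Counter()  # lost auctions: market price >= own bid
    for bid, won, mp in logs:
        if won:
            wins[mp] += 1
        else:
            censored[bid] += 1
    n = sum(wins.values()) + sum(censored.values())  # auctions at risk
    lose = 1.0               # running estimate of P(market price >= b)
    w = np.zeros(z_max + 2)  # w[b]: estimated winning probability at bid b
    for b in range(z_max + 1):
        w[b] = 1.0 - lose
        d = wins[b]              # "deaths": market price exactly b observed
        if n > 0:
            lose *= (n - d) / n  # product-limit factor (n_j - d_j) / n_j
        n -= d + censored[b]     # drop deaths and newly censored auctions
    w[z_max + 1] = 1.0 - lose
    pdf = np.diff(w)             # p(z) = w(z + 1) - w(z), Eq. (6)
    return w, pdf
```

Under this sketch, w[b] estimates the probability of winning with bid b and pdf[z] the market-price p.m.f.; leaving the censored counter empty roughly reduces the estimate to one built on winning observations only.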
4 Experiments
In this section, we introduce the experimental setup and analyze the results³.
We compare the overall performance of 5 different bid landscape forecasting
models, and further analyze the performance of our proposed model against
different hyperparameters (e.g. tree depth, leaf size).
³ The experiment code is available at http://goo.gl/h130Z0.
4.1 Dataset
For the following experiments, we use the real-world bidding logs from the iPinYou
RTB dataset⁴. It contains 64.7M bidding records, 19.5M impressions, 14.79K
clicks and 16.0K CNY expense on 9 campaigns from different advertisers during
10 days in 2013. Each bidding record has 26 attributes, including weekday, hour,
user agent, region, slot ID, etc. More details of the data are provided in [13].
⁴ Dataset link: http://data.computational-advertising.org.
In order to simulate the real bidding market and show the advantages of our
survival model, we take the original impression log as the full-volume auction
data, and perform a truthful bidding strategy [12] to simulate the bidding
process, which produces the winning bid dataset W and the lost bid dataset L,
respectively. For each data sample x_win ∈ W, the simulated market price z_win
is known to the advertiser, while the corresponding market price z_lose remains
unknown for x_lose ∈ L. This reproduces the situation faced by advertisers in
the real-world marketplace.
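A sketch of this simulation, under the assumption that the truthful strategy of [12] can be approximated by bidding in proportion to the predicted CTR (the field names, base_bid and avg_ctr are illustrative stand-ins):

```python
def simulate_bidding(impressions, base_bid=80.0, avg_ctr=5e-4):
    """Split a fully observed impression log into a winning set W
    (market price revealed) and a lost set L (market price censored)."""
    W, L = [], []
    for imp in impressions:  # imp: dict with 'pctr' and logged 'market_price'
        bid = int(base_bid * imp["pctr"] / avg_ctr)  # bid proportional to value
        if bid > imp["market_price"]:
            W.append((bid, True, imp["market_price"]))  # win: pay and observe z
        else:
            L.append((bid, False, None))  # lose: only the bid (a lower bound) known
    return W, L
```

The tuples match the (bid, won, market_price) log format of the Kaplan-Meier sketch above, so the two sets can be fed directly into the survival estimation.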
In the test phase, the market price distribution $p_x(z)$ of each sample $x$ in the
test data is estimated by each of the compared models. We assess the performance
of the different settings with several measurements, as listed in the next subsection.
Finally, we study the performance of our proposed model under different
hyperparameters, e.g., the tree depth and the maximum size of each leaf.
The goal of this paper is to improve the performance of market price distribution
forecasting. We use two evaluation metrics to measure the forecasting error. The
first one is the Average Negative Log Probability (ANLP). After classifying each
sample into a leaf node with the tree model, the sum of log probabilities over
all samples, $P_{nl}$, is given by Eq. (7), and the average negative log probability
$\bar{P}_{nl}$ by Eq. (8):

$$P_{nl} = \sum_{i=1}^{k} \sum_{j=1}^{z_{max}} N_{ij} \log P_{ij}, \quad (7)$$

$$\bar{P}_{nl} = -\frac{1}{N} P_{nl}, \quad (8)$$

where $k$ denotes the number of sub bid landscapes, $z_{max}$ represents the maximum
market price, $P_{ij}$ is the probability of price $j$ under the training distribution
of the $i$-th leaf node, $N_{ij}$ is the number of test samples with price $j$ in the
$i$-th leaf node, and $N$ is the total number of test samples.

We also calculate the overall KL-Divergence to measure the objective forecasting
error; $D_{KL}$ is given by Eq. (9):

$$D_{KL} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{z_{max}} N_i P_{ij} \log \frac{P_{ij}}{Q_{ij}}, \quad (9)$$

where $N_i$ is the number of test samples in the $i$-th leaf node and $Q_{ij}$ is the
probability of price $j$ under the test distribution of the $i$-th leaf node.
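The two metrics could be computed per leaf as in the sketch below, where leaf_train_pdfs hold the smoothed training distributions (e.g. from price_pdf above) and the grouping of test samples by leaf is assumed to come from routing them through the tree; the 1-based price indexing is our convention.

```python
import numpy as np

def anlp(leaf_train_pdfs, leaf_test_prices):
    """Average Negative Log Probability, Eqs. (7)-(8): each test sample
    is scored by the training distribution of the leaf it falls into."""
    total_log_prob, n = 0.0, 0
    for pdf, prices in zip(leaf_train_pdfs, leaf_test_prices):
        total_log_prob += float(np.sum(np.log(pdf[prices - 1])))  # sum_j N_ij log P_ij
        n += len(prices)
    return -total_log_prob / n

def overall_kld(leaf_train_pdfs, leaf_test_pdfs, leaf_test_counts):
    """Overall KL-Divergence, Eq. (9): each leaf's divergence between
    training (P) and test (Q) distributions, weighted by test size N_i."""
    N = float(sum(leaf_test_counts))
    total = 0.0
    for p, q, n_i in zip(leaf_train_pdfs, leaf_test_pdfs, leaf_test_counts):
        total += n_i * float(np.sum(p * np.log(p / q)))
    return total / N
```

Both distributions need smoothing (as in price_pdf) so that zero test or training probabilities do not make the logarithms diverge.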
The compared bid landscape forecasting settings are as follows.
NM - The Normal Model predicts the bid landscape based on the observed
market prices from the simulated impression log W, without using the lost bid
request data in L. This model uses a non-parametric method to directly draw
the probability function w.r.t. the market price from the winning dataset.
SM - The Survival Model forecasts the bid landscape with survival analysis,
learning from both the observed market prices in the impression log and the
lost bid request data using the Kaplan-Meier estimation [2], as discussed in
Section 3.3.
MM - The Mixture Model uses linear regression and censored regression to predict
the bid landscape respectively, and combines the two models, weighted by the
winning probability, into a mixture model [18] to predict the final bid landscape.
NTM - The Normal Tree Model predicts the bid landscape using only our proposed
tree model, without survival analysis, as described in Section 3.2.
STM - The Survival Tree Model predicts the bid landscape with the proposed
survival analysis embedded in our tree model, which is our final model.
Fig. 3. Four examples of sub bid landscapes: market price distributions of training
and test samples at different leaf nodes.
Data Analysis. Table 2 shows the overall statistics of the dataset, where each
row presents the statistical information of the corresponding advertiser in the
first column. In Table 2, Num of bids is the number of total bids, and Num of
win bids is the number of winning bids in the full simulated dataset W ∪ L. WR
is the winning rate, calculated as |W| / |W ∪ L|. AMP is the average market price
over all bids. AMP on W and AMP on L are the average market prices for the
winning bid set W and the lost bid set L, respectively.
We can easily find that the winning rates of all campaigns are low, which
is practically reasonable since a real-world advertiser can only win a small
proportion of the full auction volume; the market prices of most impressions are
unavailable to the advertiser. We also observe that the average market price
on winning bids (AMP on W) is much lower than that on lost bids (AMP on L).
This verifies the bias between the observed market price distribution and the
true market price distribution.
Bid Landscapes of Leaf Nodes. Figure 3 shows four examples of the bid landscapes
of training and test samples. From the figures, we can find that the bid landscape
of each leaf node is quite different from those of the other leaf nodes. In particular,
some sub bid landscapes concentrate a large probability mass on certain prices,
and the training distribution fits the test distribution very well. This result
suggests that we can predict the bid landscape more accurately with tree models.
Fig. 4. Comparison of the curves of market price distribution and winning probability
(campaigns 2259 and 2261).
Table 3. Performance comparison on ANLP and KLD (lower is better).

                         ANLP                                      KLD
Campaign   MM      NM      SM      NTM     STM       MM      NM      SM      NTM     STM
1458       5.7887  5.3662  4.7885  4.7160  4.3308    0.7323  0.7463  0.2367  0.6591  0.2095
2259       7.3285  6.7686  5.8204  5.4943  5.4021    0.8264  0.9633  0.3709  0.8757  0.1668
2261       7.0205  5.5310  5.1053  4.4444  4.3137    1.0181  0.4029  0.2943  0.3165  0.1222
2821       7.2628  6.5508  5.6710  5.4196  5.3721    0.7816  0.9671  0.3562  0.6170  0.2880
2997       6.7024  5.3642  5.1411  5.1626  5.0944    0.7450  0.4526  0.1399  0.3312  0.1214
3358       7.1779  5.8345  5.2771  4.8377  4.6168    1.4968  0.8367  0.5148  0.8367  0.3900
3386       6.1418  5.2791  4.8721  4.6698  4.2577    0.8761  0.6811  0.3474  0.6064  0.2236
3427       6.1852  4.8838  4.6453  4.1047  4.0580    1.0564  0.3247  0.1478  0.3247  0.1478
3476       6.0220  5.2884  4.7535  4.3516  4.2951    0.9821  0.6134  0.2239  0.5650  0.2238
overall    6.5520  5.6635  5.0997  4.7792  4.6065    0.9239  0.6898  0.2927  0.5834  0.2160

Table 4. p-values of the significance test of STM against each compared model.

Model   MM        NM        SM        NTM
STM     < 10^-6   < 10^-6   < 10^-6   < 10^-6
(iii) NTM outperforms NM, and STM outperforms SM, which means the tree model
effectively improves the performance of bid landscape forecasting. (iv) STM is the
combination of SM and NTM, both of which contribute to better performance
as mentioned in (ii) and (iii). Thus it is reasonable that STM achieves the best
performance: it has the advantages of both SM and NTM, i.e., handling the
difference in bid distributions across attribute values and learning from the
censored data.
For KLD, we can also find that STM achieves the best performance. The
results of the other models are similar to those on ANLP, but there are some
interesting differences. Note that for campaign 3427, the KL-Divergence values of
NM and NTM are equal, as are those of SM and STM. The KL-Divergence
values of SM and STM for campaign 3476 are also nearly the same. That is
because the optimal tree depth in these cases is 1. We should notice that NM
and SM are actually special cases of NTM and STM, respectively, when the tree
depths of the latter two models are equal to 1. This raises the question of how
to decide the optimal tree depth. We here take the tree depth as a hyperparameter
and leave the detailed discussion to the next subsection.
As mentioned above, in terms of KLD, SM and STM for campaign 3476 are
actually the same model, since the optimal tree depth of STM for campaign 3476
is 1. One may still find that the KLD values of SM and STM for campaign 3476 differ
slightly. That is caused by the handling method for unmatched feature values
described in Section 3.2. As the experimental results show, the influence is negligible.
We perform a t-test on the negative log probability between our proposed
model STM and each of the other compared settings to check the statistical
significance of the improvement. Table 4 shows that the p-value of each test is
lower than 10⁻⁶, which means the improvement is statistically significant. The
significance test on KL-Divergence is not performed because KLD is not a metric
calculated on each data instance.
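For reference, a paired t-test on the per-sample negative log probabilities is one plausible instantiation of this test (the arrays below are hypothetical stand-ins, not the paper's data):

```python
import numpy as np
from scipy import stats

def significance_test(nlp_stm, nlp_other):
    """Paired t-test on per-sample negative log probabilities;
    returns the p-value reported in the style of Table 4."""
    return stats.ttest_rel(nlp_stm, nlp_other).pvalue

# hypothetical usage with stand-in per-sample -log p(z) arrays
rng = np.random.default_rng(0)
nlp_stm = rng.normal(4.6, 1.0, size=10000)
nlp_mm = nlp_stm + np.abs(rng.normal(1.9, 1.0, size=10000))
print(significance_test(nlp_stm, nlp_mm))  # far below 1e-6 here
```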
Fig. 5. Forecasting performance of NTM and STM against tree depth for campaign
3358 under leaf sizes 3000, 6000 and 10000.
Table 5. The average optimal tree depth and leaf numbers for different models.
Fig. 6. Relationship between the leaf number and the tree depth (NTM vs. STM,
campaigns 3358 and 3427, leaf sizes 3000, 6000 and 10000).
STM learns from both the winning bids and the lost bids. As the tree grows, it
reaches its best performance earlier than NTM, which only learns from the
winning bids.
We also experimentally illustrate the EM convergence of the tree model in
Figure 7, which shows how the KL-Divergence value changes over EM training
rounds. We observe that our optimization converges within about 6 EM rounds,
and the fluctuation is small. In our experiments, the EM algorithm is quite
efficient and converges quickly; the average number of training rounds is about 4.
5 Conclusion
In this paper, we proposed a functional bid landscape forecasting model for RTB
display advertising. The proposed model significantly improves the forecasting
performance over the baselines and the state-of-the-art models in various metrics.
In future work, we plan to combine functional bid landscape forecasting
with utility (e.g. click-through rate, conversion rate) estimation models, aiming
to make more reasonable and informative decisions in bidding strategies.
References
1. Agarwal, D., Ghosh, S., Wei, K., You, S.: Budget pacing for targeted online advertisements at LinkedIn. In: KDD (2014)
2. Amin, K., Kearns, M., Key, P., Schwaighofer, A.: Budget optimization for sponsored search: Censored learning in MDPs. arXiv preprint arXiv:1210.4847 (2012)
3. Cui, Y., Zhang, R., Li, W., Mao, J.: Bid landscape forecasting in online ad exchange marketplace. In: KDD (2011)
4. Edelman, B., Ostrovsky, M., Schwarz, M.: Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Tech. rep., National Bureau of Economic Research (2005)
5. Faddoul, J.B., Chidlovskii, B., Gilleron, R., Torre, F.: Learning multiple tasks with boosted decision trees. In: ECML-PKDD (2012)
6. García-Laencina, P.J., Sancho-Gómez, J.L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Computing and Applications (2010)
7. Google: The arrival of real-time bidding (2011)
8. Johnson, N.L.: Survival models and data analysis. John Wiley & Sons (1999)
9. Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations. Journal of the American Statistical Association (1958)
10. Kullback, S.: Letter to the editor: The Kullback-Leibler distance (1987)
11. Lee, K.C., Jalali, A., Dasdan, A.: Real time bid optimization with smooth budget delivery in online advertising. In: ADKDD (2013)
12. Lee, K.C., Orten, B.B., Dasdan, A., Li, W.: Estimating conversion rate in display advertising from past performance data (2012)
13. Liao, H., Peng, L., Liu, Z., Shen, X.: iPinYou global RTB bidding algorithm competition dataset. In: ADKDD (2014)
14. Marlin, B.M., Zemel, R.S.: Collaborative prediction and ranking with non-random missing data. In: RecSys (2009)
15. Pan, R., Zhou, Y., Cao, B., Liu, N.N., Lukose, R., Scholz, M., Yang, Q.: One-class collaborative filtering. In: ICDM (2008)
16. Perlich, C., Dalessandro, B., Hook, R., Stitelman, O., Raeder, T., Provost, F.: Bid optimizing and inventory scoring in targeted online advertising. In: KDD (2012)
17. Stojanova, D., Ceci, M., Appice, A., Džeroski, S.: Network regression with predictive clustering trees. In: ECML-PKDD (2011)
18. Wu, W.C.H., Yeh, M.Y., Chen, M.S.: Predicting winning price in real time bidding with censored data. In: KDD (2015)
19. Yuan, S., Wang, J., Chen, B., Mason, P., Seljan, S.: An empirical study of reserve price optimisation in real-time bidding. In: KDD (2014)
20. Yuan, S., Wang, J., Zhao, X.: Real-time bidding for online advertising: measurement and analysis. In: ADKDD (2013)
21. Zhang, W., Wang, J.: Statistical arbitrage mining for display advertising. In: KDD (2015)
22. Zhang, W., Yuan, S., Wang, J.: Optimal real-time bidding for display advertising. In: KDD (2014)