Game analytics 100: The retention curve

by Russell Ovans, East Side Games
April 2024

Gameanalytics.com
Table of contents

Introduction
Preliminary definitions
How GameAnalytics displays retention
The retention curve
Fitting a retention curve in Excel
Constructing a retention curve in Tableau from (more) historical data
Predicting DAU with a retention curve
Player duration as the summation of the retention curve
Retention benchmarks
Summary

Portions of this paper previously appeared in the book Game Analytics: Retention and Monetization in Free-to-Play Games. Reprinted with permission of Thought Pilots.

Introduction
Retaining customers is the lifeblood of any business, let alone a game studio. Generating new customers is an expensive endeavour as it requires outlays of cash on advertising. As such, it is generally more economical to keep the customers you have than it is to buy new ones. Or, as my Uncle Bob explained to me years ago, “Once the customer enters your business, all of your effort shifts to selling them one thing: a return visit.”

The topic of this paper is mobile game customer retention: how do we quantify and model the rate at which users return to play our games? You can only monetize the users you have, so if your game doesn’t retain its players, you will struggle to generate revenue. Most game analysts are surely familiar with the notion of day-n retention; i.e., the proportion of your players who return to play exactly n days after they install and play their first session. For example, if 100 users install your game on July 1st, and 20 of those players return to play on July 8th, then day 7 retention (D7R) is 0.20. In this paper, we generalize the concept of day-n retention with a retention curve, a simple formula to model and predict player retention for any day after install. We describe how a retention curve is derived from a set of historical data, plus two important applications of the curve: predicting daily active users (DAU) from a constant number of daily installs, and estimating the expected number of distinct days a new user will play your game as the summation of the curve.

Preliminary definitions
Data analysts tend not to think too much about individual players. Instead, descriptive statistics are drawn from a group of players called a cohort. A cohort is a set of players who have something in common. Normally, this is their install date, but additional attributes can be used to determine membership in a cohort. For example, a cohort might consist of all the Android players from the US who installed version 1.26.5 on July 1, 2023. Cohorts define the players used in the calculation of averages and other descriptive statistics that make up key performance indicators (KPIs) such as day-n retention.

Retention is a KPI that is modified with a “days since install” index, which we denote with the variable n. You never talk about retention without specifying a particular day after a cohort’s install date, which is indicated by “Dn.” Dn retention (DnR) is the proportion – often expressed as a percentage – of a cohort that plays exactly n days after their install date: not the day before, nor the day after. By definition, D0R is always 1.0 since the install date is equivalent to the date of a player’s first session. D1R is the proportion of installs who played at any time in the calendar date immediately following their install date. The higher your retention, the better. For the idle-genre titles published by East Side Games, a good D1R is around 40%. For D7R, we aim for 20%.

Dn retention as a KPI is measured at a standard set of days since install, typically for n ∈ {1, 7, 30, 90}. But in general, DnR can be measured for any day since install as

$$\mathrm{DnR} = \frac{\mathrm{DAU}_n}{\mathrm{installs}}$$

where installs is the size of the cohort, and $\mathrm{DAU}_n$ is a count of the daily active users from the cohort who played on the nth day after their install date. Note that $\mathrm{DAU}_0 \equiv \mathrm{installs}$.

For a cohort to have a value for any Dn retention metric their install date must be more than n days ago. For example, we can’t calculate D7R for a cohort of installs until eight days after their install date, at which point we say the cohort and its users are “seven days fully baked.” If a cohort is not fully baked with respect to a KPI, the value is null.

A time series is made up of successive measurements – over consistent intervals – of the same variable. When charted or graphed, time is the independent variable and occupies the x-axis. The x-axis usually refers to a specific date when a game was played; e.g., DAU by day. But for other time-series KPIs, the x-axis can represent a cohort install date that measures the evolving, future behaviour of only those users who installed the game on that date. Day-n retention is one such cohort-based time series. If we generate a time-series graph of D30 retention, the value on July 1st represents the proportion of the installs from that date who played on July 31st. The value on July 2nd represents the proportion who installed the game that date and played on August 1st, and so on. (The presented data is representative, and not from any particular game.)

[Figure: time series of D30 retention by cohort install date, Jul 9 to Jul 30; the y-axis spans 5% to 9%.]

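The definition of DnR and the fully-baked rule can be sketched as a small R function. This is only an illustration: the function name `dn_retention` and its `today` argument are hypothetical and not part of the paper's code.

```r
# Day-n retention for one install cohort.
# Returns NA when the cohort is not yet fully baked for day n.
# `today` is a hypothetical argument standing in for the current date.
dn_retention <- function(dau_n, installs, install_date, n, today = Sys.Date()) {
  days_elapsed <- as.integer(as.Date(today) - as.Date(install_date))
  if (days_elapsed <= n) {
    return(NA_real_)   # the install date must be more than n days ago
  }
  dau_n / installs
}

# 100 installs on July 1st, 20 of them play on July 8th.
print(dn_retention(20, 100, "2023-07-01", n = 7, today = "2023-07-09"))  # 0.2
print(dn_retention(20, 100, "2023-07-01", n = 7, today = "2023-07-08"))  # NA
```

Returning NA rather than 0 keeps a not-yet-baked cohort from dragging down any average computed later.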
Retention is an aggregate measure applied to a group of players. But at the individual user level, Dn retention is simply a binary variable: the user either played on that day (indicated by the value 1), they did not play (indicated by a 0), or if not yet fully baked, we don’t know yet (indicated by a null value). Here is sample retention data for some random players.

user_id   install_date   d1   d7   d30   d90
858993    2017-04-23     0    1    0     0
2630131   2017-06-28     1    0    0     1
8554854   2021-11-19     0    0    0     0
226962    2017-04-20     1    1    1     0
7706121   2021-02-06     0    0    0     0
9025650   2023-09-28     0    0    0     0
3221035   2017-08-19     1    1    0     0

Rather than a time series of daily retention values, a retention profile for a game as a whole is constructed by taking a weighted average of Dn retention values over multiple install cohorts. Given a database table similar to the one above, this is trivial to compute with a SQL query, the result set of which might appear as follows:

d1r      d7r      d30r     d90r
0.3712   0.1626   0.0787   0.0394

Typically, users are taken from a range of consecutive install dates (as specified by a WHERE clause in the SQL) that are at least 90 days old to ensure only fully-baked cohorts are included, but if not, the nulls are conveniently skipped over and not included in the calculation of the average.

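The paper computes the profile in SQL; as a rough R equivalent, averaging the user-level flags with `na.rm = TRUE` skips nulls in the same way. The sketch below uses the seven sample rows plus one hypothetical recent install (user_id 1) added only to show the null handling. The data frame name `players` is an assumption.

```r
# Seven sample rows from the text, plus one hypothetical recent install
# (user_id 1) whose d30 and d90 flags are not yet fully baked (NA).
players <- data.frame(
  user_id = c(858993, 2630131, 8554854, 226962, 7706121, 9025650, 3221035, 1),
  d1  = c(0, 1, 0, 1, 0, 0, 1, 1),
  d7  = c(1, 0, 0, 1, 0, 0, 1, 0),
  d30 = c(0, 0, 0, 1, 0, 0, 0, NA),
  d90 = c(0, 1, 0, 0, 0, 0, 0, NA)
)

# Mean of the 0/1 flags per column; NA (null) values are skipped.
profile <- colMeans(players[, c("d1", "d7", "d30", "d90")], na.rm = TRUE)
print(round(profile, 4))
```

Averaging over users rather than over per-cohort DnR values is what makes the result a cohort-size-weighted average.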
How GameAnalytics displays retention
In GameAnalytics, retention metrics are displayed using cohorts, which are groups of players who share common attributes, such as install date or specific in-game actions. Retention metrics such as day-n retention (DnR) are calculated from the behavior of these cohorts over time. GameAnalytics displays retention metrics using its default triggers, such as the first session after install.

GameAnalytics Pro users have additional options, including the ability to define custom start trigger event conditions and custom return trigger event conditions. This allows for more granular analysis and customization of retention metrics based on specific player actions or behaviors.

Additionally, GameAnalytics lets users apply global filters on static dimensions such as country, so retention metrics can be analyzed across different player segments.

By utilizing GameAnalytics, developers can gain insights into player retention patterns, identify areas for improvement, and optimize their games to enhance player engagement and satisfaction.

The retention curve
By convention, the days 1, 7, 30, and 90 after install are included in a game’s core set of KPIs. But what about all the other days in between or after these specific days? We could measure every DnR from the cohort data, but is there a closed form function that can tell us with reasonable accuracy what Dn retention is for any day n? It turns out there is, and this function defines the retention curve. The retention curve is a mechanism to fully describe a historical retention profile and predict the expected number of days each new user will engage with your game before churning.

Retention curves are built from a retention profile of observed Dn values. Each value is a dot, and the retention curve connects the dots. Here is a curve (the green line) fit to observed retention values of 0.4, 0.23, and 0.16 at D1, D3, and D7, respectively.

[Figure: retention curve (green line) fit to the observed D1, D3, and D7 values; the x-axis is days since install (Day 0, Day 1, Day 3, Day 7) and the y-axis is retention from 0.1 to 1.0.]

Unlike the time-series graph introduced previously, the x-axis is n, representing a discrete day since install. Note how the curve predicts retention values well beyond the data points used in its construction. The curves that best fit mobile game retention profiles tend to be real-valued power functions. The generic form of a retention power curve r is a function r: ℕ→[0, 1] of days since install n:

$$r(n) = a n^{b}$$

where $a \in (0, 1]$, $b \in [-1, 0)$, and $r(0) = 1.0$ by definition, since 0 raised to a negative power is undefined.

The parameters a and b are referred to as the coefficient and exponent, respectively. The coefficient’s value mimics D1R. The exponent is negative, which means that retention starts out high but decays over time. Values returned by the function are proportions between 0 and 1. For example, a retention curve for a mobile game might be:

$$r(n) = 0.4\,n^{-0.5}$$

This specific function evaluates to a D1 retention of 40.0%, D7R of 15.12%, and D180R of 2.98%. But it can also calculate retention for any arbitrary day after install; e.g., D53R is 5.49%.

Fitting a retention curve $r(n) = a n^{b}$ to a set of DnR observations is an exercise in statistical regression: determine values for a and b that minimize the residual error between the observations (the actual data) and the predictions (the values returned by the function). We now consider two tools a data analyst might employ in performing this regression: Excel and Tableau.

Fitting a retention curve in Excel

Assume we have the following observed retention profile for a game in soft launch, which we have entered into an Excel spreadsheet. The observations are for D1, D3, and D7 retention.

[Spreadsheet image: column A holds days since install and column B the observed retention; rows 3 to 5 hold the D1, D3, and D7 observations.]

What is a power function $r(n) = a n^{b}$ that best describes these data points? In Excel, we can deploy the LINEST function to fit a line or curve to an array of known y- and x-values, where y is the dependent variable and x is the independent variable. In our case, n (the days since install) is the independent variable. As its name suggests, the LINEST function – by default – fits a line and returns an array (two adjacent cells) containing the slope and y-intercept of the formula that best fits the data. To instead fit a power function to our data, we need to pass LINEST the logarithm of the array of known values. Because the logarithm of 0 is undefined, we only include three data points:

=LINEST(LN(B3:B5),LN(A3:A5))

If we enter that function into cell A7, Excel returns the exponent b and the natural log of the coefficient a in the cells A7 and B7, respectively. Cell C7 contains the function =EXP(B7) in order to convert the coefficient to the correct form.

Our retention curve is thus:

$$r(n) = 0.396\,n^{-0.472}$$

And that is how you use Excel to determine a formula for a game’s retention curve based on a small sample of DnR values. Congratulations – achievement unlocked!

This retention curve – based on observed D1, D3, and D7 retention metrics only – predicts the following values for long-tail retention:

Day n   Predicted Dn retention
14      0.114
30      0.079
60      0.057
90      0.047
180     0.034
360     0.025
720     0.018

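The same log-log regression can be reproduced outside Excel. The sketch below assumes the spreadsheet holds the D1, D3, and D7 observations quoted for the earlier chart (0.4, 0.23, 0.16); the variable names are illustrative. It uses base R's `lm` as a stand-in for LINEST.

```r
# Observed soft-launch retention for D1, D3, and D7 (assumed to match the spreadsheet).
day      <- c(1, 3, 7)
observed <- c(0.40, 0.23, 0.16)

# Ordinary least squares on the log-log scale: log(r) = log(a) + b * log(n).
fit <- lm(log(observed) ~ log(day))
a <- unname(exp(coef(fit)[1]))   # exp(intercept) is the coefficient
b <- unname(coef(fit)[2])        # slope is the exponent

# Evaluate the fitted power curve at the long-tail days.
r <- function(n) a * n^b
long_tail <- c(14, 30, 60, 90, 180, 360, 720)
print(round(c(a = a, b = b), 3))
print(data.frame(day = long_tail, predicted = round(r(long_tail), 3)))
```

With these three observations the fit rounds to a = 0.396 and b = -0.472, matching the curve above.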
1.8% D720 retention? Really? When modeled as a power function, there is no terminal day beyond which every player is guaranteed to have churned, i.e., r(n)>0 for all n ∈ ℕ. Depending on your game’s live operations, elder-game mechanics, and content release schedule, this may or may not be a reasonable assumption.

Constructing a retention curve in Tableau from (more) historical data

How well can we trust these estimates for D90+ retention? They seem optimistic, perhaps owing to the very early and limited number of data points. With more and later observations of retention, the regression analysis should become more accurate. Let’s consider a mature game that launched 90 days ago and start by capturing retention data for each user’s first 30 days. We include only those users who had a chance to play 30 days after install (i.e., those who installed between 90- and 31-days prior) and simply count the number of those users who played n days after their install date.

days_since_install   players   installs   retention
0                    7891      7891       1.0
1                    2929      7891       0.3712
2                    2120      7891       0.2687
3                    1769      7891       0.2242
4                    1559      7891       0.1976
5                    1409      7891       0.1786
6                    1326      7891       0.1680
7                    1283      7891       0.1626
...                  ...       ...        ...
28                   658       7891       0.0834
29                   651       7891       0.0825
30                   621       7891       0.0787

Notice we do not cohort by install date – all we care about is how many users played exactly n days after they installed, regardless of each user’s specific install date.

The first row (days_since_install = 0) is the total number of users who installed between 90- and 31-days prior. In this case, 7,891 installs are included in our analysis; this is the size of the entire cohort spanning multiple install dates. To calculate retention as a proportion of cohort size, we divide the players by the installs. For example, we see that 1,283 played seven days after they installed: D7 retention is 1283/7891 = 16.26%.

Utilizing Tableau to explore this dataset, we plot retention as a function of days_since_install (see the top chart).

Note: Some might object to the use of a line chart instead of a bar chart given that the x-axis is discrete and not continuous; i.e., we don’t deal in fractional days since install. My use of a continuous line is not meant to imply that something is happening between samples; it is simply an aid to help visualize a trend in the data. Besides, Tableau won’t overlay a trend line on a bar chart.

Now from the Analytics tab we add a Trend Line of type Power to fit a curve to the data (see the bottom chart).

[Figure, top chart: retention plotted against days_since_install (0 to 30) as a line chart.]

[Figure, bottom chart: the same plot with a dashed power trend line overlaid.]

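The division of players by installs, and a power trend over the result, can be sketched in R. Only the rows printed in the table are used here, since days 8 to 27 are elided in the source. The fitted parameters therefore will not match Tableau's exactly; the log-log `lm` fit is an assumed stand-in for Tableau's Power trend line.

```r
# Rows visible in the table; days 8 to 27 are elided in the source and omitted here.
days_since_install <- c(0:7, 28:30)
players <- c(7891, 2929, 2120, 1769, 1559, 1409, 1326, 1283, 658, 651, 621)
installs <- players[1]            # the day-0 row is the cohort size

retention <- players / installs
print(round(retention, 4))

# Power trend on days >= 1 only, because log(0) is undefined.
obs <- data.frame(n = days_since_install[-1], retention = retention[-1])
fit <- lm(log(retention) ~ log(n), data = obs)
a <- unname(exp(coef(fit)[1]))
b <- unname(coef(fit)[2])
print(round(c(a = a, b = b), 3))
```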
Tableau uses the same method of least-squares as Excel to find values for the coefficient a and exponent b that best fit these observations. If we hover over the dashed line, a pop-up indicates the formula that Tableau has decided is the best fit for our data.

retention=0.371848*days_since_install^-0.442077
R-Squared: 0.99592
P-value: < 0.0001

To three decimal places of accuracy, we have:

$$r(n) = 0.372\,n^{-0.442}$$

This curve predicts D90R of 5.09% and D180R of 3.75%. We’ll take that!

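To see how the extra observations change the long tail, the two fitted curves can be evaluated side by side. This is a small sketch using the rounded parameters quoted in the text; the helper name `power_curve` is hypothetical.

```r
# Build a retention function r(n) = a * n^b, with r(0) fixed at 1.
power_curve <- function(a, b) {
  function(n) ifelse(n == 0, 1, a * n^b)
}

tableau_fit <- power_curve(0.372, -0.442)   # 30 days of historical data
excel_fit   <- power_curve(0.396, -0.472)   # D1, D3, and D7 only

days <- c(30, 90, 180)
print(data.frame(
  day     = days,
  tableau = round(tableau_fit(days), 4),
  excel   = round(excel_fit(days), 4)
))
```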
Predicting DAU with a retention curve
By now we should all be very comfortable with the idea that a retention curve is a model – derived from historical data – defined by a power function $r(n) = a n^{b}$. This function defines the probability a player has a session exactly n days after their install date. Therefore, when applied to a cohort of installs, the expected number of DAU from that cohort on day n after install is:

$$\mathrm{DAU}_n = r(n) \cdot \text{cohort size}$$

Calculating the steady-state daily active users given a retention formula and a constant number of daily new installs can be accomplished with lots of copying and pasting in a spreadsheet, but that approach is tedious and error prone. For example, assume a new game launches with a retention curve defined by $0.4\,n^{-0.5}$ and 100 installs per day. After seven days, a spreadsheet can tell us the expected total DAU from the overlapping cohorts is 260. See the table, where each column represents a single install cohort (c0 to c7 are the cohorts installed on days 0 to 7 after launch), and the last column is the total DAU by day n after launch.

n   c0    c1    c2    c3    c4    c5    c6    c7    DAU
0   100                                             100
1   40    100                                       140
2   28    40    100                                 168
3   23    28    40    100                           191
4   20    23    28    40    100                     211
5   18    20    23    28    40    100               229
6   16    18    20    23    28    40    100         245
7   15    16    18    20    23    28    40    100   260

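The overlapping-cohort table need not be built by hand. As a sketch, assuming the same curve and 100 installs per day, the cohort ages can be laid out with `outer` and summed by row. The variable names are illustrative.

```r
# Retention curve from the example: r(0) = 1, r(n) = 0.4 * n^-0.5 otherwise.
r <- function(n) ifelse(n == 0, 1, 0.4 * n^-0.5)
installs <- 100
days <- 0:7

# age[i, j]: days since install of cohort j on launch day i
# (negative means that cohort has not installed yet).
age <- outer(days, days, "-")

# Retained players per cohort, rounded to whole players as in the table.
cohort_dau <- ifelse(age >= 0, round(installs * r(pmax(age, 0))), 0)
total_dau <- rowSums(cohort_dau)
print(cbind(n = days, cohort_dau, DAU = total_dau))
```

The row sums give the DAU column of the table, ending at 260 on day 7.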
By examination of this table, we notice a pattern: the DAU by day n is a function of only the original cohort’s retained users and the DAU on the prior day. For example, DAU after seven days is 260, comprised of 15 players still remaining from the very first cohort, plus the 245 DAU from the prior day. This pattern is succinctly expressed as a recurrence relationship:

$$\mathrm{DAU}_0 = 100$$

$$\mathrm{DAU}_n = \left(r(n) \cdot \mathrm{DAU}_0\right) + \mathrm{DAU}_{n-1}$$

What is this sorcery! I’m not enough of a mathematician to come up with a closed form solution to that recurrence relation, but as a recovering computer scientist I know how to convert it to a recursive function. Here it is in R:

    # Expected total DAU n days after launch, for 100 installs per day
    # and the retention curve r(n) = 0.4 * n^-0.5.
    DAU_n <- function(n)
    {
      if (n == 0) {
        return (100)    # launch day: only the first cohort exists
      }
      # retained players from the first cohort, plus the prior day's DAU
      return (round(100 * 0.4 * n ^ -0.5) + DAU_n(n-1))
    }

    R> for (i in 0:7) print(c(i, DAU_n(n=i)))
    [1] 0 100
    [1] 1 140
    [1] 2 168
    [1] 3 191
    [1] 4 211
    [1] 5 229
    [1] 6 245
    [1] 7 260

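Unrolling the recurrence shows why a running sum of the retention curve is all that is needed. The short derivation below assumes $r(0) = 1$ and ignores the per-cohort rounding used in the R function. The result is a summation, not a true closed form.

```latex
\begin{aligned}
\mathrm{DAU}_n &= r(n)\,\mathrm{DAU}_0 + \mathrm{DAU}_{n-1} \\
               &= r(n)\,\mathrm{DAU}_0 + r(n-1)\,\mathrm{DAU}_0 + \mathrm{DAU}_{n-2} \\
               &= \dots \\
               &= \mathrm{DAU}_0 \sum_{i=1}^{n} r(i) + \mathrm{DAU}_0
                = \mathrm{DAU}_0 \sum_{i=0}^{n} r(i), \qquad r(0) = 1 .
\end{aligned}
```

In words: expected DAU on day n after launch is the daily install count multiplied by the running sum of the retention curve through day n.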
Goodbye spreadsheet! It’s a straightforward exercise to generalize this function to work for any retention curve – a power function parameterized by coefficient a and exponent b – and any number of installs per day. First we define two functions: one to generate a list of a retention curve’s daily values, and another to generate a list of the running sum of these daily values. Both functions are recursive and work backwards from n to 0.

    # Daily retention values r(0), r(1), ..., r(n)
    ret_curve <- function(a, b, n)
    {
      if (n <= 0) return (1)
      return(c(ret_curve(a, b, n-1), a * n^b))
    }

    # Running sum of the daily retention values from day 0 to day n
    sum_ret_curve <- function(a, b, n)
    {
      if (n >= 0)
        ret_curve(a, b, n) + c(0, sum_ret_curve(a, b, n-1))
    }

Here are the functions in action, revealing r(n) and ∑ r(n) for the first n=7 days after install for a retention curve defined by a=0.4 and b=-0.5.

    R> options(digits=4)
    R> ret_curve(a=0.4, b=-0.5, n=7)
    [1] 1.0000 0.4000 0.2828 0.2309 0.2000 0.1789 0.1633 0.1512

    R> sum_ret_curve(a=0.4, b=-0.5, n=7)
    [1] 1.000 1.400 1.683 1.914 2.114 2.293 2.456 2.607

Finally, we can compute a list of expected DAU from game launch to n days after, assuming installs/day and a retention curve defined by $a n^{b}$:

    # Expected DAU for each day from launch to day n
    dau <- function(installs, a, b, n)
    {
      floor(installs * sum_ret_curve(a, b, n))
    }

For example, the predicted DAU after 30 days for a new game with 500 installs per day and a retention profile of a=0.372, b=-0.442 is 2,510:

    R> dau(installs=500, a=0.372, b=-0.442, n=30)
     [1]  500  686  822  937 1038 1129 1213 1292 1366 1437 1504 1568 1630 1690 1748
    [16] 1804 1859 1912 1964 2014 2064 2112 2160 2206 2252 2297 2341 2384 2427 2469 2510

Because the retention curve only approaches 0 asymptotically as n→∞ and never reaches it, this DAU model grows forever. A simple fix is to define a terminal date after which all players are assumed churned, then update the function ret_curve accordingly.

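One way the terminal-date fix might look is sketched below. The `terminal` parameter and the function names `ret_curve_capped` and `dau_capped` are hypothetical, not from the paper. The sketch is vectorised with `cumsum` rather than recursive, a swapped-in technique that yields the same running sum.

```r
# Retention curve with a terminal day: r(n) = 0 for every n > terminal.
ret_curve_capped <- function(a, b, n, terminal) {
  days <- 0:n
  r <- ifelse(days == 0, 1, a * days^b)   # r(0) = 1 by definition
  r[days > terminal] <- 0                 # everyone is assumed churned
  r
}

# Expected DAU from launch to day n; plateaus once the oldest cohorts churn out.
dau_capped <- function(installs, a, b, n, terminal) {
  floor(installs * cumsum(ret_curve_capped(a, b, n, terminal)))
}

print(dau_capped(installs = 100, a = 0.4, b = -0.5, n = 10, terminal = 7))
```

With a cap in place, expected DAU stops growing after the terminal day, because each day's new cohort is offset by an old cohort churning out.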
Player duration as the summation of the retention curve
The retention curve defines the probability that a user drawn at random plays exactly n days after their install date. As such, an important use of the retention curve is in predicting future engagement of new and existing players. In particular, it predicts the number of distinct dates we can expect each new user to play the game during their first n days after install.

A play date occurs when a user has at least one session on a specific date. The player duration (PD) is the count of a user’s distinct play dates from install. Let $\mathrm{PD}_n$ be the average player duration within n days of install for a cohort of users. Then

$$\mathrm{PD}_n = \sum_{i=0}^{n} \mathrm{D}i\mathrm{R} = \sum_{i=0}^{n} r(i)$$

As a cohort metric, $\mathrm{PD}_n$ is a random variable with an expected value defined by the summation of the retention curve r(n). Once you have a function r that estimates the retention profile of your game, you can use this curve to predict the player duration of new installs, and lifetime value (LTV) if you multiply by average revenue per daily active user (ARPDAU). For example, what is the average player duration during the first 30 days after install for a cohort whose retention curve is defined by $r(n) = 0.372\,n^{-0.442}$?

We reuse one of our R functions and determine this value is approximately five:

    R> sum_ret_curve(a=0.372, b=-0.442, n=30)
     [1] 1.000 1.372 1.646 1.875 2.076 2.259 2.427 2.585 2.733 2.874 3.009 3.137 3.261 3.381 3.497 3.609 3.719 3.825 3.929
    [20] 4.030 4.129 4.226 4.321 4.414 4.505 4.595 4.683 4.769 4.855 4.939 5.021

Note: In the literature you will often see reference to the “area under the curve”, or “the integral of the retention curve.” Strictly speaking this is incorrect, as the retention function is only defined for integer values of n. If you plug a retention curve formula into a symbolic integration tool, you will find the answer is typically inflated compared to a discrete summation.

These five play dates can occur anywhere within a user’s first 30 days and are not necessarily consecutive. Keep in mind that this is a statistical mean and by no means reflects the median number of days one can expect a random user to play during the first 30 days after install. Most installs only play one or two days before churning, but this mean is skewed by a few regular users who like the game and play nearly every day.

Why is $\mathrm{PD}_n$ important? Well, if we know that ARPDAU is $1.00, we can assume that $\mathrm{LTV}_{30}$ for this cohort is likely to be 5.021 * $1.00 = $5.02. In practice, when launching a new version of a game, we can extrapolate a retention curve r(n) from its early DnR signals and take an average ARPDAU from existing players; a reasonable estimate of LTV is then given by

$$\mathrm{LTV}_n = \mathrm{ARPDAU} \cdot \sum_{i=0}^{n} r(i)$$

Rather than model the retention curve and perform a summation, the expected value of $\mathrm{PD}_n$ can be modelled by fitting a curve directly to the running sum of observed daily retention values. In other words, instead of building a retention curve from the daily retention rates, we fit a curve to the running sum of these same retention rates. The resulting closed form function PD(n) can estimate expected player duration for any value of n without requiring the calculation of a summation.

To illustrate, refer to the following chart where both the retention (in orange) and sum of retention (in blue) are plotted on the same synchronized axes:

[Figure: retention (orange) and running sum of retention (blue) plotted against days since install (0 to 34); the y-axis runs from 0 to 5.]

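The player duration and LTV arithmetic, and the note about integrals, can be checked with a few lines of R. This is a sketch with illustrative variable names; the ARPDAU of $1.00 is the assumed value from the text. The integral uses the closed form of the power function from 0 to n, which is valid because b > -1.

```r
a <- 0.372; b <- -0.442; n <- 30

r  <- c(1, a * (1:n)^b)      # r(0), r(1), ..., r(n)
pd <- cumsum(r)              # running sum: PD_0, PD_1, ..., PD_n

arpdau <- 1.00               # assumed ARPDAU in dollars
ltv_30 <- arpdau * pd[n + 1]
print(round(c(PD30 = pd[n + 1], LTV30 = ltv_30), 3))

# Integral of a * x^b from 0 to n versus the discrete sum over days 1..n.
integral <- a * n^(b + 1) / (b + 1)
discrete <- sum(r[-1])
print(round(c(integral = integral, discrete = discrete), 3))
```

For this curve the integral comes out larger than the discrete sum over the same days, which is the inflation the note warns about.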
The blue curve is implemented in Tableau with a Table Calculation that builds a summation over the individual retention metrics that make up the orange curve. At day zero, we have 100% retention and 1.00 player day. By day 30, the player days are 5.017. (This is the sum of the actual DnR observations, whereas 5.021 is the sum of the retention curve that estimates this data.) The dashed lines are Trend Lines of type Power, resulting in the following closed form estimate of $\mathrm{PD}_n$:

$$\mathrm{PD}(n) = 1.21\,n^{0.41}$$

We can use this formula to extrapolate player duration to D90 (=7.65) and D180 (=10.17). If average ARPDAU is $1.00, then $\mathrm{LTV}_{90}$ and $\mathrm{LTV}_{180}$ are estimated to be $7.65 and $10.17, respectively.

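The closed form and the explicit summation can be compared directly. The sketch below uses the rounded parameters printed in the text, so results may differ from the paper's figures in the last digit. The names `pd_fit` and `pd_sum` are hypothetical.

```r
# Closed-form player duration quoted in the text.
pd_fit <- function(n) 1.21 * n^0.41

# Explicit summation of the retention curve r(n) = 0.372 * n^-0.442, with r(0) = 1.
pd_sum <- function(n) sum(c(1, 0.372 * seq_len(n)^-0.442))

arpdau <- 1.00   # assumed ARPDAU in dollars
days <- c(30, 90, 180)
print(data.frame(
  day        = days,
  pd_closed  = round(pd_fit(days), 2),
  pd_summed  = round(sapply(days, pd_sum), 2),
  ltv_closed = round(arpdau * pd_fit(days), 2)
))
```

The two estimates are not expected to agree exactly, since one trend line is fit to daily retention and the other to its running sum.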
Retention benchmarks
GameAnalytics offers users access to comprehensive Retention benchmarks, providing valuable insights into player engagement and retention performance across the gaming industry. These benchmarks serve as a reference point for developers to compare their game's retention metrics against industry standards and identify areas for improvement.

These benchmarks are derived from the GameAnalytics dataset, compiled from various games and developers worldwide, offering a comprehensive overview of retention trends across different genres, platforms, and player demographics.

One of the key features of Retention benchmarks in GameAnalytics is the ability to customize and filter the data based on specific criteria. Users can adjust filters such as genre, platform, region, and player demographics to refine their benchmark comparisons and identify relevant insights for their game.

Additionally, GameAnalytics provides Engagement, Monetization, and Advertising benchmarks.

While the importance of paying attention to your game's performance cannot be overstated, it is equally important to understand player preferences and industry standards. Conveniently packaged in GameAnalytics Pro, our industry benchmarks can help you uncover players’ behavior patterns, alongside access to next-gen analytics solutions for your game.

Summary
Retaining existing users is fundamental to the success of any mobile game since you can only monetize the players you have. Retention is measured as the proportion of a cohort that plays exactly n days after their install date:

$$\mathrm{DnR} = \frac{\mathrm{DAU}_n}{\mathrm{DAU}_0}$$

where $\mathrm{DAU}_0$ is the size of the cohort, and $\mathrm{DAU}_n$ is a count of the daily active users from the cohort who played on the nth day after their install date.

The retention profile for a game is a weighted average of the observed DnR values for installs over a range of historical dates, typically calculated for days 1, 7, 30, and 90 since install.

A non-linear regression function fit to the retention profile succinctly captures expected player interaction for all past and future installs. This function is referred to as the retention curve. For mobile games, the retention curve is typically a negative-exponent power function $r(n) = a n^{b}$, which defines a proportion as a function of n, the days since install. It follows that DnR = r(n). Each game will have its own values for a and b that best fit its retention profile.

Since it defines the probability that a user plays exactly n days after install, expected DAU by day n after game launch is dependent on the retention curve. The recurrence relationship is $\mathrm{DAU}_n = (r(n) \cdot \mathrm{DAU}_0) + \mathrm{DAU}_{n-1}$, which can be implemented as a recursive function in any programming language.

Player duration (PD) is the number of distinct dates a new user plays over their lifetime (i.e., until churn) or within their first n days after install. The expected value of PD by day n is the summation of the retention curve from days 0 to n. This summation can either be performed iteratively with a programming language or estimated analytically by fitting a curve PD(n) to the running sum of r(n). $\mathrm{LTV}_n$ is predicted by multiplying PD(n) by ARPDAU.

About the book

Portions of this paper previously appeared in the book Game Analytics: Retention and Monetization in Free-to-Play Games. Reprinted with permission of Thought Pilots.

Mobile games are big business, and the landscape is more competitive than ever. With an in-depth focus on the core areas of user retention and predicting customer lifetime value, Game Analytics contains the hands-on SQL queries, R scripts, statistical theory, full-colour Tableau visualizations, and insider tips and tricks you need to succeed as a data analyst, product manager, or user acquisition manager in free-to-play games.

Game Analytics describes in detail how successful game studios make money, collect and query player data, define key performance indicators (KPIs), build dashboards and predictive models of retention and monetization, measure and predict return on ad spend (ROAS), and use statistics to analyze A/B tests designed to improve retention and monetization.

The book is available on Amazon in various countries.

About the author

Russell Ovans, Ph.D., was the Director of Analytics at East Side Games, developers of hit mobile games such as The Office: Somehow We Manage and Trailer Park Boys: Greasy Money. He is a computer scientist and has worked as both a software engineering professor and programmer for over 35 years. In 2007, he founded Backstage Technologies, a social game studio that pioneered the monetization of free-to-play games on Facebook. Best known for its Family Feud app, Backstage was acquired by RealNetworks in 2010, after which Russ returned to teaching college, worked as an executive-in-residence at a tech incubator, and opened a brewery. He returned to the games industry in 2018 to lead analytics, growth, and ad monetization at ESG, a tenure during which the company quadrupled revenue and went public.

He welcomes your feedback: [email protected].
