Russell Ovans Gameanalytics Retention
Gameanalytics.com
Table of contents
Introduction
Preliminary definitions
The retention curve
Fitting a retention curve in Excel
Predicting DAU with a retention curve
Player duration as the summation of the retention curve
Retention benchmarks
Summary
Introduction
Retaining customers is the lifeblood of any business, especially a game studio. Generating new customers is an expensive endeavour, as it requires outlays of cash on advertising. As such, it is generally more economical to keep the customers you have than it is to buy new ones. Or, as my Uncle Bob explained to me years ago, “Once the customer enters your business, all of your effort shifts to selling them one thing: a return visit.”

… daily active users (DAU) from a constant number of daily installs; and the summation of this curve is the expected number of distinct days a new user will play your game.
Preliminary definitions
Data analysts tend not to think too much about individual players. Instead, descriptive statistics are drawn from a group of players who have something in common. Normally, this is their install date, but additional attributes can be used to determine membership in a cohort. For example, a cohort might consist of all the Android players from the US who installed version 1.26.5 on July 1, 2023. Cohorts define the players used in the calculation of averages and other descriptive statistics that make up key performance indicators (KPIs) such as day-n retention.

… published by East Side Games, a good D1R is around 40%. For D7R, we aim for 20%. Dn retention as a KPI is measured at a standard set of days since install, typically for n ∈ {1, 7, 30, 90}. But in general, DnR can be measured for any day since install as

$$D_nR = \frac{DAU_n}{\text{installs}}$$

where DAU_n is the number of those installs who played on the nth day after their install date.
… consistent intervals – of the same variable. When charted or graphed, time is the independent variable and occupies the x-axis. The x-axis usually refers to a specific date when a game was played; e.g., DAU by day. But for other time-series KPIs, the x-axis can represent a cohort install date. In that case the chart tracks the evolving, future behaviour of only those users who installed the game on that date. Day-n retention is one such cohort-based … represents the proportion who installed the game that date and played on August 1st, and so on. (The presented data is representative, and not from any particular game.)

[Figure: retention by cohort install date, Jul 9 to Jul 30 (y-axis roughly 5% to 9%)]
Retention is an aggregate measure applied to a group of players. But at the individual user level, Dn retention is simply a binary variable: the user either played on that day (indicated by the value 1), did not play (indicated by a 0), or, if the cohort is not yet fully baked, we don’t know yet (indicated by a null value). Here is sample retention data for some random players:

player_id  install_date  D1  D7  D30  D90
2630131    2017-06-28    1   0   0    1
8554854    2021-11-19    0   0   0    0
226962     2017-04-20    1   1   1    0
7706121    2021-02-06    0   0   0    0
9025650    2023-09-28    0   0   0    0
3221035    2017-08-19    1   1   0    0

Rather than a time series of daily retention values, a retention profile for a game as a whole is constructed by taking a weighted average of Dn retention values over multiple install cohorts. Given a database table similar to the one above, this is trivial to compute with a SQL query, the result set of which might appear as follows:

…

Typically, users are taken from a range of consecutive install dates (as specified by a WHERE clause in the SQL) that are at least 90 days old to ensure only fully baked cohorts are included. If not, the nulls are conveniently skipped over and not included in the calculation of the average.
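The following is a minimal sketch of the averaging step. It loads the sample rows above into an in-memory SQLite table and averages each Dn column with SQL. Several details are hypothetical: the table name `retention`, its column names, and the extra not-yet-baked player with NULL D30 and D90 values. SQL's `AVG()` ignores NULLs, so unbaked values drop out of the average rather than counting as churn. Averaging per-player 0/1 values across several install dates also weights each cohort's DnR by its number of installs, which is the weighted average described above.

```python
import sqlite3

# Sample rows from the table above, plus one hypothetical player whose
# D30 and D90 are not yet baked (NULL in SQL, None in Python).
ROWS = [
    (2630131, "2017-06-28", 1, 0, 0, 1),
    (8554854, "2021-11-19", 0, 0, 0, 0),
    (226962, "2017-04-20", 1, 1, 1, 0),
    (7706121, "2021-02-06", 0, 0, 0, 0),
    (9025650, "2023-09-28", 0, 0, 0, 0),
    (3221035, "2017-08-19", 1, 1, 0, 0),
    (1000001, "2023-12-01", 1, 1, None, None),  # hypothetical, not fully baked
]


def retention_profile(rows, start, end):
    """Return (D1R, D7R, D30R, D90R) averaged over installs in [start, end]."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE retention (player_id INTEGER, install_date TEXT, "
            "d1 INTEGER, d7 INTEGER, d30 INTEGER, d90 INTEGER)"
        )
        con.executemany("INSERT INTO retention VALUES (?, ?, ?, ?, ?, ?)", rows)
        # ISO-8601 date strings sort chronologically, so BETWEEN works on TEXT.
        # AVG() skips NULLs, so unbaked days are left out of the average.
        return con.execute(
            "SELECT AVG(d1), AVG(d7), AVG(d30), AVG(d90) FROM retention "
            "WHERE install_date BETWEEN ? AND ?",
            (start, end),
        ).fetchone()
    finally:
        con.close()


print(retention_profile(ROWS, "2017-01-01", "2023-12-31"))
```

Here D30R comes out as 1/6 rather than 1/7, because the unbaked player is skipped. Restricting the WHERE range to cohorts at least 90 days old, as recommended above, avoids relying on that behaviour at all.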
How GameAnalytics displays retention
In GameAnalytics, retention metrics are displayed using cohorts, which are groups of players who share common attributes, such as install date or specific in-game actions. The retention metrics, such as day-n retention (DnR), are calculated from the behaviour of these cohorts over time. By default, GameAnalytics lets users view retention metrics based on predefined triggers, such as the first session after install.
The retention curve
By convention, days 1, 7, 30, and 90 after install are included in a game’s core set of KPIs. But what about all the other days in between or after these specific days? We could measure every DnR from the cohort data, but is there a closed-form function that can tell us with reasonable accuracy what Dn retention is for any day n? It turns out there is, and this function defines the retention curve. The retention curve is a mechanism to fully … before churning.

… the dots. Here is a curve (the green line) fit to observed retention values of 0.4, 0.23, and 0.16 at D1, D3, and D7, respectively.

[Figure: retention curve (green line) fit to observed D1, D3, and D7 retention]
… profiles tend to be real-valued power functions. The generic form of a retention power curve r is a function r: ℕ → [0, 1] of days since install n:

$$r(n) = a\, n^{b}$$

For example:

$$r(n) = 0.4\, n^{-0.5}$$

… b that minimize the residual error between the observations (the actual data) and the predictions (the values returned by the function). We now consider two tools a data analyst might employ in performing this regression: Excel and Tableau.
Fitting a retention curve in Excel
… dependent variable and x is the independent variable. In our case, n (the days since install) is the independent variable. As its name suggests, the LINEST function – by default – fits a line and returns an array (two adjacent cells) containing the slope and y-intercept of the formula that best fits the data. To instead fit a power function to our data, we need to pass LINEST the …

If we enter that function into cell A7, Excel returns the exponent b and the natural log of the coefficient a in cells A7 and B7, respectively. Cell C7 contains the function =EXP(B7) to convert the coefficient to the correct form. Our retention curve is thus:

$$r(n) = 0.396\, n^{-0.472}$$

And that is how you use Excel to determine a formula for a game’s retention curve based on a small sample of DnR values. Congratulations – achievement unlocked!

…
n     r(n)
14    0.114
30    0.079
60    0.057
90    0.047
180   0.034
360   0.025
720   0.018
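The Excel steps above amount to an ordinary least-squares fit of a straight line to ln r against ln n. The slope of that line is b, and the intercept is ln a. The exact formula passed to LINEST is not preserved here. The common idiom is `=LINEST(LN(known_ys), LN(known_xs))`, and this Python sketch assumes that is what is meant; the function name `fit_power_curve` is illustrative. Run on the D1, D3, and D7 observations from the retention curve section, it recovers the coefficients above, and its predictions agree with the table to three decimals.

```python
import math


def fit_power_curve(days, retention):
    """Fit r(n) = a * n**b by least squares on log-log axes.

    Mirrors LINEST on logged data: the slope of ln(r) vs ln(n) is b,
    and the intercept is ln(a), which EXP() converts back to a.
    """
    xs = [math.log(n) for n in days]
    ys = [math.log(r) for r in retention]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


a, b = fit_power_curve([1, 3, 7], [0.40, 0.23, 0.16])
print(f"r(n) = {a:.3f} * n^{b:.3f}")  # r(n) = 0.396 * n^-0.472

# Predicted retention for the days listed in the table above.
for n in (14, 30, 60, 90, 180, 360, 720):
    print(n, round(a * n ** b, 3))
```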
1.8% D720 retention? Really? When modeled as a power function, there is no terminal day beyond which every player is guaranteed to have churned, i.e., r(n) > 0 for all n ∈ ℕ. Depending on your game’s live operations, elder-game mechanics, and content release schedule, this may or may not be a reasonable assumption.

… days. We include only those users who had a chance to play 30 days after install (i.e., those who installed between 90 and 31 days prior) and simply count the number of those users who played n days after their install date.

days_since_install  players  installs  retention
0                   7891     7891      1.0
1                   2929     7891      0.3712
2                   2120     7891      0.2687
…
30                  621      7891      0.0787

Notice we do not cohort by install date – all we care about is how many users played exactly n days after they installed, regardless of each user’s specific install date.
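The SQL behind this dataset is not shown. This small Python sketch implements the same counting logic under stated assumptions:

- installs arrive as a dict of player ID to install date;
- sessions arrive as a set of (player ID, play date) pairs;
- only players who installed between 90 and 31 days before the analysis date are counted.

The function name, its parameters, and the toy data are hypothetical.

```python
from datetime import date, timedelta


def retention_by_day(install_dates, sessions, today,
                     max_day=30, min_age=31, max_age=90):
    """Return (days_since_install, players, installs, retention) rows.

    Players are pooled regardless of install date; the only filter is
    that each one installed between max_age and min_age days before
    `today`, so all of them had a chance to play on day max_day.
    """
    eligible = {
        pid: d for pid, d in install_dates.items()
        if min_age <= (today - d).days <= max_age
    }
    installs = len(eligible)
    if installs == 0:
        return []
    rows = []
    for n in range(max_day + 1):
        # Count players with a session exactly n days after their install.
        players = sum(
            1 for pid, d in eligible.items()
            if (pid, d + timedelta(days=n)) in sessions
        )
        rows.append((n, players, installs, players / installs))
    return rows


# Toy data (hypothetical): days after install on which each player played.
install_dates = {
    "u1": date(2024, 4, 1),   # 61 days before `today`: eligible
    "u2": date(2024, 3, 15),  # 78 days: eligible
    "u3": date(2024, 5, 20),  # 12 days: too recent, excluded
    "u4": date(2024, 1, 1),   # 152 days: too old, excluded
}
play_days = {"u1": [0, 1, 2], "u2": [0, 3], "u3": [0, 1], "u4": [0]}
sessions = {
    (pid, install_dates[pid] + timedelta(days=k))
    for pid, days in play_days.items()
    for k in days
}

for row in retention_by_day(install_dates, sessions, date(2024, 6, 1))[:5]:
    print(row)
```

Each player counts toward the denominator once, whatever their install date. That pooling is what the text means by not cohorting by install date.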
Utilizing Tableau to explore this dataset, we plot retention as a …

Note: Some might object to the use of a line chart instead of a bar chart, given that the x-axis is discrete and not continuous; i.e., we don’t deal in fractional days since install. My use of a continuous line is not meant to imply that something is happening between samples; it is simply an aid to help visualize a trend in the data.

Now from the Analytics tab we add a Trend Line of type Power to fit a curve to the data (see the bottom chart).

[Figure: retention by days_since_install (days 0 to 29); the bottom chart adds a power trend line]
Tableau uses the same method of least squares as Excel to find values for the coefficient a and exponent b that best fit these observations. If we hover over the dashed line, a pop-up indicates the formula that Tableau has decided is the best fit for our data:

retention = 0.371848*days_since_install^-0.442077
R-Squared: 0.99592

Rounded to three decimals:

$$r(n) = 0.372\, n^{-0.442}$$

… take that!
Predicting DAU with a retention curve
By now we should all be very comfortable with the idea that a retention curve is a model – derived from historical data – defined by a power function $r(n) = a\, n^{b}$. This function defines the probability a player has a session exactly n days after their install date. Therefore, when applied to a cohort of installs, the expected number of DAU from that cohort on day n after install is:

$$DAU_n = r(n) \times \text{cohort size}$$

Calculating the steady-state daily active users given a retention formula and a constant number of daily new installs can be accomplished with lots of copying and pasting in a spreadsheet, but that approach is tedious and error-prone. For example, assume a new game launches with a retention curve defined by $0.4\, n^{-0.5}$ and 100 installs per day. After seven days, a spreadsheet can tell us the expected total DAU from the overlapping cohorts is 260. See the table, where each column represents a single install cohort (cK is the cohort that installed on day K after launch), and the last column is the total DAU by day n after launch.

n    c0   c1   c2   c3   c4   c5   c6   c7   DAU
0    100                                     100
1    40   100                                140
2    28   40   100                           168
3    23   28   40   100                      191
4    20   23   28   40   100                 211
5    18   20   23   28   40   100            229
6    16   18   20   23   28   40   100       245
7    15   16   18   20   23   28   40   100  260
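Here is a minimal Python sketch of the spreadsheet above. It assumes r(0) = 1 on install day and r(n) = an^b afterwards, and rounds each cell to whole players as in the table. The function name `cohort_dau_table` is illustrative. Row n holds each cohort's expected DAU on launch day n, and the row sum is total DAU.

```python
def cohort_dau_table(installs, a, b, days):
    """Rows of expected DAU by install cohort for days 0..days after launch.

    The cohort that installed on launch day k is n - k days old on day n,
    so it contributes round(installs * r(n - k)) players.
    """
    def r(age):
        # Everyone plays on install day; afterwards follow the power curve.
        return 1.0 if age == 0 else a * age ** b

    return [
        [round(installs * r(n - k)) for k in range(n + 1)]
        for n in range(days + 1)
    ]


table = cohort_dau_table(installs=100, a=0.4, b=-0.5, days=7)
for n, row in enumerate(table):
    print(n, *row, sum(row))  # last printed line: 7 15 16 18 20 23 28 40 100 260
```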
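By examination of this table, we notice a pattern: the DAU by day n is the DAU contributed by the oldest cohort on day n, plus the DAU on the prior day. For example, DAU after seven days is 260: the 15 DAU from the oldest cohort, plus the 245 DAU from the prior day. This pattern is a recurrence relation:

$$DAU_0 = 100, \qquad DAU_n = 100 \cdot 0.4\, n^{-0.5} + DAU_{n-1}$$

What is this sorcery! I’m not enough of a mathematician to come up with a closed-form solution to that recurrence relation, but as …

DAU_n <- function(n)
{
  # base case: the launch-day cohort of 100 installs
  if (n == 0) {
    return (100)
  }
  # oldest cohort's expected DAU on day n, plus the prior day's total
  return (round(100 * 0.4 * n ^ -0.5) + DAU_n(n - 1))
}

R> for (n in 0:6) print(c(n, DAU_n(n)))
[1] 0 100
[1] 1 140
[1] 2 168
[1] 3 191
[1] 4 211
[1] 5 229
[1] 6 245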
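The same recurrence can be written as a loop instead of recursion, which avoids deep call stacks for large n. This Python sketch turns the R function's hard-coded 100 installs and 0.4 and -0.5 coefficients into parameters. The function name `dau_recurrence` is illustrative. Each daily increment is rounded to whole players, as in the R version.

```python
def dau_recurrence(installs, a, b, days):
    """DAU_0 = installs; DAU_n = round(installs * a * n**b) + DAU_(n-1)."""
    dau = [installs]
    for n in range(1, days + 1):
        # Day n adds the oldest cohort's contribution, installs * r(n),
        # to the previous day's total.
        dau.append(round(installs * a * n ** b) + dau[-1])
    return dau


print(dau_recurrence(installs=100, a=0.4, b=-0.5, days=7))
# [100, 140, 168, 191, 211, 229, 245, 260]
```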
Goodbye spreadsheet! It’s a straightforward exercise to generalize this function to work for any retention curve – a … values, and another to generate a list of the running sum of these daily values. Both functions are recursive and work backwards from n to 0.

ret_curve <- function(a, b, n)
{
  # daily retention r(0..n), with r(0) = 1 and r(i) = a * i^b
  if (n == 0) {
    return (1)
  }
  return (c(ret_curve(a, b, n - 1), a * n ^ b))
}

Here are the functions in action, revealing r(n) and ∑ r(n) for the first n = 7 days after install for a retention curve defined by a=0.4 and b=-0.5.

R> options(digits=4)
R> ret_curve(a=0.4, b=-0.5, n=7)
…
[1] 1.000 1.400 1.683 1.914 2.114 2.293 2.456 2.607

Finally, we can compute a list of expected DAU from game …

For example, the predicted DAU after 30 days for a new game with 500 installs per day and a retention profile of a=0.372, b=-0.442 is 2,510:
R> dau(installs=500, a=0.372, b=-0.442, n=30)
[1] 500 686 822 937 1038 1129 1213 1292 1366 1437 1504 1568 1630 1690 1748
[16] 1804 1859 1912 1964 2014 2064 2112 2160 2206 2252 2297 2341 2384 2427 2469 2510
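Only part of the R implementation survives above, so here is a Python sketch of all three functions under stated assumptions:

- `ret_curve` and `dau` keep the names and parameters used in the R session.
- `sum_ret_curve` is a hypothetical name for the running-sum function.
- The sketch builds lists iteratively rather than recursively.
- The printed DAU values appear to be truncated rather than rounded (day 2 prints 822, where the unrounded value is about 822.9), so `dau` takes the floor. That reading is an assumption.

```python
import math
from itertools import accumulate


def ret_curve(a, b, n):
    """Daily retention r(0..n): r(0) = 1 and r(i) = a * i**b for i >= 1."""
    return [1.0] + [a * i ** b for i in range(1, n + 1)]


def sum_ret_curve(a, b, n):
    """Running sum of r(0..n); hypothetical name for the second function."""
    return list(accumulate(ret_curve(a, b, n)))


def dau(installs, a, b, n):
    """Expected DAU on days 0..n after launch with constant daily installs.

    Truncates to whole players (assumed from the R output). Rounding to
    6 decimals first stops float noise such as 685.9999999 losing a player.
    """
    return [math.floor(round(installs * s, 6)) for s in sum_ret_curve(a, b, n)]


print([round(s, 3) for s in sum_ret_curve(a=0.4, b=-0.5, n=7)])
# [1.0, 1.4, 1.683, 1.914, 2.114, 2.293, 2.456, 2.607]
print(dau(installs=500, a=0.372, b=-0.442, n=30)[-1])  # 2510
```

Multiplying the running sum by daily installs gives the same totals as summing the cohort table column by column. This works because every cohort follows the same curve.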
Player duration as the summation of the retention curve
The retention curve defines the probability that a user drawn at random plays exactly n days after their install date. As such, an … distinct play dates from install. Let PDn be the average player duration within n days of install for a cohort of users. Then

$$PD_n = \sum_{i=0}^{n} D_iR = \sum_{i=0}^{n} r(i)$$

As a cohort metric, PDn is a random variable with an expected value defined by the summation of the retention curve r(n). Once you have a function r that estimates the retention profile of your game, you can use this curve to predict the player duration of new …

… average player duration during the first 30 days after install for a cohort whose retention curve is defined by $r(n) = 0.372\, n^{-0.442}$?

…
[20] 4.030 4.129 4.226 4.321 4.414 4.505 4.595 4.683 4.769 4.855 4.939 5.021

Note: In the literature you will often see reference to the “area under the curve”, or “the integral of the retention curve.” Strictly speaking, this is incorrect, as the retention function is only defined for integer values of n. If you plug a retention curve formula into a symbolic integration tool, you will find the answer is typically inflated compared to a discrete summation.
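To make the note concrete, this Python sketch compares two estimates for r(n) = 0.372n^-0.442:

- the discrete summation PD30;
- a continuous estimate built from the symbolic integral of a·x^b, which is a·n^(b+1)/(b+1) for -1 < b < 0.

Because a·x^b is unbounded at x = 0, the sketch keeps day 0 as a separate r(0) = 1 term and integrates the power curve from 0 to n. That split is an assumption for illustration, and both function names are hypothetical.

```python
def player_duration(a, b, n):
    """PD_n = r(0) + r(1) + ... + r(n), with r(0) = 1 and r(i) = a * i**b."""
    return 1.0 + sum(a * i ** b for i in range(1, n + 1))


def integral_estimate(a, b, n):
    """1 + integral of a * x**b over [0, n]; finite only for b > -1."""
    return 1.0 + a * n ** (b + 1) / (b + 1)


a, b = 0.372, -0.442
print(f"{player_duration(a, b, 30):.3f}")    # 5.021
print(f"{integral_estimate(a, b, 30):.3f}")  # 5.448
```

Here the integral overstates PD30 by roughly 0.43 player days. For a decreasing curve, the area over each interval [i - 1, i] is at least r(i), so the integral can only run ahead of the sum.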
… and are not necessarily consecutive. Keep in mind that this is a statistical mean and by no means reflects the median number of days one can expect a random user to play during the first 30 days after install. Most installs only play one or two days before churning, but this mean is skewed by a few regular users who like the game and play nearly every day.

… game, extrapolating a retention curve r(n) from its early DnR … PD(n) can estimate expected player duration for any value of n without requiring the calculation of a summation.

To illustrate, refer to the following chart where both the retention (in orange) and sum of retention (in blue) are plotted …

[Figure: retention (orange) and sum of retention (blue) by days since install n]
The blue curve is implemented in Tableau with a Table
Calculation that builds a summation over the individual retention
metrics that make up the orange curve. At day zero, we have
100% retention and 1.00 player day. By day 30, the player days
are 5.017. (This is the sum of the actual DnR observations,
whereas 5.021 is the sum of the retention curve that estimates
this data.) The dashed lines are Trend Lines of type Power,
resulting in the following closed form estimate of PDn:
$$PD(n) = 1.21\, n^{0.41}$$
Retention benchmarks
GameAnalytics offers users access to comprehensive retention
benchmarks, providing insights into player engagement
and retention performance across the gaming industry. These
benchmarks serve as a reference point for developers to
compare their game's retention metrics against industry
standards and identify areas for improvement.
Summary
Retaining existing users is fundamental to the success of any mobile game, since you can only monetize the players you have.

$$D_nR = \frac{DAU_n}{DAU_0}$$

…where DAU_0 is the size of the cohort, and DAU_n is a count of the daily active users from the cohort who played on the nth day after their install date.

The retention profile for a game is a weighted average of the observed DnR values for installs over a range of historical dates, typically calculated for days 1, 7, 30, and 90 since install.

… that DnR = r(n). Each game will have its own values for a and b that best fit its retention profile.

Player duration (PD) is the number of distinct dates a new user plays over their lifetime (i.e., until churn) or within their first n days after install. The expected value of PD by day n is the summation of the retention curve from days 0 to n. This summation can be performed both iteratively with a …
About the book
Portions of this paper previously appeared in the book Game Analytics: Retention and Monetization in Free-to-Play Games. Reprinted with permission of Thought Pilots.

Mobile games are big business, and the landscape is more competitive than ever. With an in-depth focus on the core areas of user retention and predicting customer lifetime value, Game Analytics contains the hands-on SQL queries, R scripts, statistical theory, full-colour Tableau visualizations, and insider tips and tricks you need to succeed as a data analyst, product manager, or user acquisition manager in free-to-play games.

Game Analytics describes in detail how successful game studios make money, collect and query player data, define key performance indicators (KPIs), build dashboards and predictive models of retention and monetization, measure and predict …

The book is available on Amazon in various countries.

About the author
Russell Ovans, Ph.D., was the Director of Analytics at East Side Games, developers of hit mobile games such as The Office: Somehow We Manage and Trailer Park Boys: Greasy Money. He is a computer scientist and has worked as both a software engineering professor and programmer for over 35 years. In 2007, he founded Backstage Technologies, a social game studio that pioneered the monetization of free-to-play games on Facebook. Best known for its Family Feud app, Backstage was acquired by RealNetworks in 2010, after which Russ returned to teaching college, worked as an executive-in-residence at a tech incubator, and opened a brewery. He returned to the games industry in 2018 to lead analytics, growth, and ad monetization at ESG, a tenure during which the company quadrupled revenue and went public.