Application Monitoring With Prometheus: Intro, Practical Tips, and Adform's Experience
1
Agenda
• Adform
• History leading up to Prometheus
• Prometheus
• Grafana
• Practical tips
• Adform's experience
2
https://bit.ly/2JKq6qN
3
Adform
4
Advertising Industry
5
Key Milestones
Innovating the Automation of Buying and Selling Advertising
[Timeline figure: 2002 to 2018, starting with "Adform Founded"]
6
Global Infrastructure
7
Monitoring
8
Control center
9
Why Is Monitoring Important?
Problems that we are trying to solve
10
History leading to Prometheus from
the perspective of Unix and Unix-like
systems
11
Lesson
12
History leading to Prometheus
Since the beginning... (almost)
13
What if the structure of the
messages were
standardized?
14
How to control who sends
how many messages? How to
know if they are legit?
15
Let's turn around the whole
process and make the
structure according to our
requirements!
16
Introducing Prometheus
• Free software project that combines ideas from
its predecessors: Borgmon, Graphite, etc.
• Its underlying time series database took
inspiration from Facebook's Gorilla time series
database (white paper:
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)
• Based on a sliding time window
• Hugely popular: 22k+ stars on GitHub, and
many large organizations use it
• Coupled with a few other components, it
addresses the problems outlined before
17
What does the data look like?
18
Example metrics
19
Structure of Prometheus data
• We call one data point a "metric"
• A metric is identified by its name and a set of
labels with their values; each data point also
carries a value (floating point number)
• Metric and label names use ASCII letters, digits,
and underscores; label values can be any text
• Different labels provide different dimensions to
data
• The time-series database is specialized for this
kind of data and provides a high level of
compression
• Example:
current_wind_speed{city="Kaunas"} 10
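The example line above can be taken apart and put back together with a few lines of Python. This is only a minimal sketch of the name/labels/value structure: the helper names parse_sample and format_sample are made up for illustration, and the parser ignores timestamps, escape sequences, and comment lines that the real text format allows.

```python
import re

# Hypothetical helpers for illustration; they cover only the simple
# "name{label="value",...} number" shape shown on the slide.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one metric line into (name, labels dict, float value)."""
    match = _SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric line: {line!r}")
    labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def format_sample(name, labels, value):
    """Render a metric back into the one-line text form."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    series = f"{name}{{{label_text}}}" if labels else name
    return f"{series} {value:g}"


if __name__ == "__main__":
    name, labels, value = parse_sample('current_wind_speed{city="Kaunas"} 10')
    print(name, labels, value)
    print(format_sample(name, labels, value))
```

Each distinct combination of name and label values is its own time series, which is why adding a label such as city gives the data another dimension.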
20
How do we know that the data
is real?
21
Process of collecting metrics
• Prometheus itself sends HTTP GET requests to
the specified end-points and parses the
metrics data
• It all happens periodically, and the time of
each collection gets written to the time series
database along with the value (that's where the
word time comes from)
• We know that the data is legit because the
requests come from the Prometheus side; we do
not trust random senders
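The pull model described above can be sketched in Python as follows. This is an illustration under assumptions, not how Prometheus is implemented: the function names, the 5 second timeout, and the tuple layout are invented here, and the parsing handles only simple "series value" lines.

```python
import time
from urllib.request import urlopen


def http_get(url):
    """Fetch the body of a metrics end-point with a plain HTTP GET."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def scrape(url, fetch=http_get, now=time.time):
    """Pull metrics from one target and stamp every sample with the scrape time.

    The collector decides when and from whom to read; targets never push.
    """
    scraped_at = now()
    samples = []
    for line in fetch(url).splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value), scraped_at))
    return samples
```

Calling scrape on a fixed list of known URLs at a regular interval gives the periodic collection the slide describes. The fetch and now parameters exist only so the sketch can be exercised without a network.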
22
How to query the data?
23
Prometheus query language -
PromQL
• Uses a syntax very similar to the metrics
• A duration in square brackets selects a range
vector. Example: ticket_price[5m]
• Plethora of functions and operators for
aggregation: sum, avg, count,
histogram_quantile, et cetera
• Label matchers support equality (=, !=) and
regular expressions (=~, !~)
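To make the matcher and range-vector ideas concrete, here is a small Python emulation over in-memory samples. It is a sketch under assumptions: the sample layout and function names are hypothetical, Python's re module stands in for the regular expression engine Prometheus uses, and the exact handling of the window boundary is simplified.

```python
import re


def matches(labels, matchers):
    """Check labels against (label, op, pattern) triples; op is =, !=, =~ or !~."""
    for label, op, pattern in matchers:
        actual = labels.get(label, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == pattern
        elif op == "!=":
            ok = actual != pattern
        elif op == "=~":
            ok = re.fullmatch(pattern, actual) is not None  # regexes are fully anchored
        elif op == "!~":
            ok = re.fullmatch(pattern, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


def range_vector(samples, name, matchers, window_seconds, now):
    """Rough equivalent of name{matchers}[window] evaluated at time `now`.

    `samples` is a list of (labels dict, unix timestamp, value) tuples.
    Returns {series key: [(timestamp, value), ...]} for the matching series.
    """
    selected = {}
    for labels, timestamp, value in samples:
        if labels.get("__name__") != name or not matches(labels, matchers):
            continue
        if now - window_seconds <= timestamp <= now:
            key = tuple(sorted(labels.items()))
            selected.setdefault(key, []).append((timestamp, value))
    return selected
```

With this sketch, ticket_price[5m] corresponds to range_vector(samples, "ticket_price", [], 300, now), and adding a matcher such as ("city", "=~", "Kau.*") narrows the result to series whose hypothetical city label matches the whole pattern.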
24
Useful: simple query in the Prometheus UI
25
Useful: calculate network usage
26
Introducing Grafana
27
Grafana
• Grafana is an open-source, feature-rich
metrics dashboard and graph editor for
Graphite, Elasticsearch, OpenTSDB,
Prometheus, and InfluxDB.
• Very user-friendly
- Dashboards which you can view are
composed of panels
- Different panels can show different
information in any appropriate way
- Has a concept of "organizations" so all of the
dashboards are separated
28
Grafana dashboard example
29
Intuitive interface
30
Alerting in Prometheus
31
Alerts
- Alert conditions are ordinary PromQL expressions
- The only extra things you need to define are:
  - The threshold, as part of the expression
  - Annotations that give useful information
    to the person receiving the alert
  - The name of the alert and its group
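The pieces listed above map onto the fields of an alerting rule. The Python sketch below builds that structure as a plain dictionary; the alert name, group name, threshold of 20, and summary text are all invented for illustration, while alert, expr, for, labels, and annotations are the field names used in Prometheus rule files. Real rule files are written in YAML; JSON is used here only to print the structure.

```python
import json


def alerting_rule(name, expr, for_duration, severity, summary):
    """Build one alerting rule: a PromQL condition plus metadata for the receiver."""
    return {
        "alert": name,                       # name of the alert
        "expr": expr,                        # PromQL expression including the threshold
        "for": for_duration,                 # how long the condition must hold before firing
        "labels": {"severity": severity},    # extra labels attached to the alert
        "annotations": {"summary": summary}, # human-readable context
    }


def rule_group(group_name, rules):
    """Wrap rules in a named group, mirroring the layout of a rule file."""
    return {"groups": [{"name": group_name, "rules": list(rules)}]}


if __name__ == "__main__":
    group = rule_group(
        "weather",  # hypothetical group name
        [
            alerting_rule(
                "HighWindSpeed",  # hypothetical alert name
                'current_wind_speed{city="Kaunas"} > 20',  # threshold chosen arbitrarily
                "5m",
                "warning",
                "Wind speed in Kaunas has been above 20 for 5 minutes",
            )
        ],
    )
    print(json.dumps(group, indent=2))
```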
32
Alerting rule example
33
Adform's experience so far
34
Actively used central monitoring service
35
Available all over the world for developers/IT
36
Alerts
37
Alerts in Slack
38
Visibility and transparency
39
[email protected]
https://giedrius.blog
@stag1e