Baseline Monitoring Anomaly Detection
ANOMALY DETECTION
ZABBIX 6.0 LTS
One of the new features of Zabbix 6.0 LTS is a focus on anomaly detection
01
BASELINE MONITORING
OVERVIEW
BASELINE AND ANOMALIES
Anomaly detection is a type of data analytics whose goal is detecting unusual patterns in a dataset:
Data must be normally distributed, following a set of rules
Anomaly detection measures how far a data point is away from the mean
When the value deviates too much from the mean, it is considered to be anomalous
BASELINE VS FIXED TRIGGERS
BASELINE MONITORING CAN MONITOR VALUES WHICH FOLLOW A PATTERN
Fixed trigger thresholds will either give false alarms or ignore the problem
Baseline monitoring can adapt to such situations
ANOMALY DETECTION OVERVIEW
Anomaly detection is based on a set of statistical functions using:
Standard deviation (σ)
Mean absolute deviation (MAD)
Weighted moving average (WMA)
Seasonal and Trend decomposition using Loess (STL)
STANDARD DEVIATION
The standard deviation σ measures how widely values are spread around the mean
When a metric is normally distributed, it follows some useful rules:
• about 68% of all values fall between [mean-σ, mean+σ]
• about 95% of all values fall between [mean-2*σ, mean+2*σ]
• about 99.7% of all values fall between [mean-3*σ, mean+3*σ]
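The percentages above can be checked numerically. The following is a minimal Python sketch, not Zabbix code: the helper names coverage() and is_anomalous() are made up for illustration, and the 3σ cut-off is just an example threshold.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


def is_anomalous(value: float, mean: float, sigma: float,
                 max_deviations: float = 3.0) -> bool:
    """Flag a value lying more than max_deviations * sigma away from the mean.

    Hypothetical helper for illustration; the threshold of 3 is an assumption.
    """
    return abs(value - mean) > max_deviations * sigma


for k in (1, 2, 3):
    # Prints 68.3%, 95.4% and 99.7% for k = 1, 2, 3
    print(f"within {k} sigma: {coverage(k):.1%}")
```

With mean 0 and σ 3, a value of 10 would be flagged (more than 9 away from the mean), while a value of 8 would not.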
MEAN ABSOLUTE DEVIATION (MAD)
The mean absolute deviation (MAD) of a dataset is the average distance between each data point and the mean
• Calculate the mean
• Calculate how far away each data point is from the mean using positive distances (deviations)
• Sum those deviations together
• Divide the sum by the number of data points
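The four steps can be sketched in a few lines of Python. This follows the steps exactly as listed on this slide (distance from the mean); the function name is illustrative and this is not the Zabbix implementation.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each data point from the mean."""
    if not values:
        raise ValueError("values must not be empty")
    # Step 1: calculate the mean
    mean = sum(values) / len(values)
    # Step 2: positive distance of each point from the mean
    deviations = [abs(v - mean) for v in values]
    # Steps 3 and 4: sum the deviations and divide by the number of points
    return sum(deviations) / len(values)


# Mean is 5, deviations are 3, 1, 1, 3, so the result is 8 / 4 = 2.0
print(mean_absolute_deviation([2, 4, 6, 8]))
```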
WEIGHTED MOVING AVERAGE ALGORITHM
A weighted moving average (WMA) puts more weight on recent data and less on past data
The most recent data is more heavily weighted, and contributes more to the final WMA value
The weighting factor used to calculate the WMA is determined by the period
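As an illustration, here is a minimal Python sketch of a weighted moving average. It assumes the common linear weighting scheme (weights 1, 2, ..., n for a period of n values, with the newest value weighted n); the slides do not state the exact weights Zabbix uses, so treat the scheme and the function name as assumptions.

```python
def weighted_moving_average(values):
    """Linearly weighted moving average; values are ordered oldest to newest.

    Assumed scheme: the i-th oldest value gets weight i, so the newest value
    gets weight n and the weights sum to n * (n + 1) / 2.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    weighted_sum = sum(weight * value
                       for weight, value in zip(range(1, n + 1), values))
    return weighted_sum / (n * (n + 1) / 2)


# (1*10 + 2*20 + 3*30) / 6 = 140 / 6, which is above the plain average of 20
print(weighted_moving_average([10, 20, 30]))
```

Because the newest value carries the largest weight, a rising series yields a WMA above its plain average, and a falling series yields one below it.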
02
TIMESHIFTS
TIMESHIFT SYNTAX
Zabbix can use absolute or relative timeshift to compare current and past periods of data
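Relative timeshift specifies a time period relative to the current time:
trendavg(/host/key,1d:now-3d)
Absolute timeshift rounds the current time to a calendar boundary (here, the start of the day) before shifting:
trendavg(/host/key,1d:now/d-3d)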
RELATIVE TIMESHIFT
Relative timeshift specifies a sliding time period relative to the current time:
trendavg(//key,1d:now-3d) (1d period, shifted back 3 days)
trendavg(//key,1d) (the most recent 1d period)
ABSOLUTE TIMESHIFT
Absolute timeshift specifies a fixed time period whose end is aligned to a calendar boundary, such as midnight (now/d):
trendavg(//key,1d:now/d-3d) (shifted back 3 days)
trendavg(//key,1d:now/d+1d)
Three days ago       Two days ago         Yesterday            Today                Now                  Tomorrow
2022-04-24 00:00:00  2022-04-25 00:00:00  2022-04-26 00:00:00  2022-04-27 00:00:00  2022-04-27 17:00:00  2022-04-28 00:00:00
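To make the difference concrete, the following Python sketch computes the time windows for both styles using the "Now" value from the timeline (2022-04-27 17:00:00). It assumes that the time shift marks the end of the period and that now/d rounds down to midnight; the function names are made up for illustration and this is not how Zabbix is implemented.

```python
from datetime import datetime, timedelta


def relative_window(now, period, shift):
    """Window for an expression like 1d:now-3d; the end slides with 'now'."""
    end = now - shift
    return end - period, end


def absolute_day_window(now, period, shift_days):
    """Window for an expression like 1d:now/d-3d; the end is fixed to a midnight.

    Assumption: now/d rounds the current time down to the start of the day.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    end = midnight + timedelta(days=shift_days)
    return end - period, end


now = datetime(2022, 4, 27, 17, 0, 0)
one_day = timedelta(days=1)

# 1d:now-3d   -> 2022-04-23 17:00:00 to 2022-04-24 17:00:00
print(relative_window(now, one_day, timedelta(days=3)))
# 1d:now/d-3d -> 2022-04-23 00:00:00 to 2022-04-24 00:00:00
print(absolute_day_window(now, one_day, -3))
# 1d:now/d+1d -> 2022-04-27 00:00:00 to 2022-04-28 00:00:00 (today)
print(absolute_day_window(now, one_day, 1))
```

The relative window moves every time the expression is evaluated, while the absolute window stays the same for the whole day.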
03
TREND FUNCTIONS
Zabbix 6.0 offers eight different trend functions for long-term data analysis
Trends analysis:
• trendsum(/host/key,time period:time shift)
• trendavg(/host/key,time period:time shift)
• trendcount(/host/key,time period:time shift)
• trendmax(/host/key,time period:time shift)
• trendmin(/host/key,time period:time shift)
• trendstl(/host/key,eval period:time shift,detection period,season,<dev>,<devalg>,<s_window>)
Baseline calculation:
• baselinedev(/host/key,data period:time shift,season_unit,num_seasons)
• baselinewma(/host/key,data period:time shift,season_unit,num_seasons)
STORING TRENDS
Trends in the trends cache are calculated in real time, independently of history data
The trend cache always holds the current trend values for every item
• To calculate the new average after the nth number, multiply the old average by n−1, add the new number, and divide the total by n.
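The running-average rule can be sketched as follows. This is a minimal Python illustration of the arithmetic only; the function name is hypothetical and the Zabbix trend cache itself is not shown.

```python
def update_average(old_avg, n, new_value):
    """Return the average of n numbers, given the average of the first n - 1.

    n counts the new value as well, so the first value is added with n = 1.
    """
    return (old_avg * (n - 1) + new_value) / n


avg = 0.0
for n, value in enumerate([4, 8, 6], start=1):
    avg = update_average(avg, n, value)

# Same result as (4 + 8 + 6) / 3 = 6.0, without storing the individual values
print(avg)
```

Only the current average and the count need to be kept, which is why the average can be maintained without rereading earlier values.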
TREND CACHES
Zabbix trends are written to the database at the beginning of each hour
Trends for the current hour are unavailable for trend functions
Two different Zabbix server internal caches are used by trends
TRENDS VS HISTORY
TRENDS UTILIZE LESS SPACE THAN HISTORY IN BOTH DATABASE AND MEMORY CACHES
When functions with a time shift are used for analysis, all data between now and the start of the function period is stored in the memory cache
History data could take up gigabytes in such scenarios; trends are much more efficient
trendsum(//key,1d:now/d-3d)
baseline*(/host/key,1h:now/h,"d",3)
baseline function based on the last full hour within the last 3-day period
MORE BASELINE FUNCTION EXAMPLES
baseline*(/host/key,1d:now/d,"M",6)
baseline function based on the previous day and the same day of month in the previous 6 months
If the date does not exist in a previous month, the last day of that month is used
baseline*(/host/key,2h:now/h,"d",7)
baseline function based on the last two hours and the same hours within a 7-day period
baseline*(/host/key,1w:now/w,"M",3)
baseline function based on the previous week and other weeks within a 3-month period
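To show which time windows such an expression refers to, here is a Python sketch for the daily case, modelled on baseline*(/host/key,1h:now/h,"d",3). It assumes the seasons are the same hours on the preceding days and do not include the current data period; the function name and that interpretation are assumptions for illustration, not Zabbix code.

```python
from datetime import datetime, timedelta


def daily_season_windows(now, period_hours, num_seasons):
    """List (start, end) windows for the same hours on each preceding day.

    Assumptions: now/h rounds down to the start of the current hour, the data
    period ends there, and each season is one day further back.
    """
    data_period_end = now.replace(minute=0, second=0, microsecond=0)
    period = timedelta(hours=period_hours)
    windows = []
    for days_back in range(1, num_seasons + 1):
        end = data_period_end - timedelta(days=days_back)
        windows.append((end - period, end))
    return windows


# Evaluated at 2022-04-27 17:30, the data period is 16:00 to 17:00 today,
# and the seasons are 16:00 to 17:00 on April 26, 25 and 24.
for start, end in daily_season_windows(datetime(2022, 4, 27, 17, 30), 1, 3):
    print(start, "to", end)
```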
BASELINEWMA() FUNCTION
Calculates baseline data by averaging data from the same timeframe in multiple equal time periods (seasons)
The weighted moving average (WMA) algorithm is used
The baseline can be compared to recent trends data to detect anomalies
BASELINEDEV() FUNCTION
Calculates the number of deviations (σ) between the last data period and the same periods in preceding seasons
The stddevpop algorithm is used (calculates standard deviation based on the entire population)
A high number of deviations indicates an anomaly
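The idea can be sketched in Python with the standard library, where statistics.pstdev is the population standard deviation. This is a simplified illustration under the assumption that the last period's value is compared with the mean of the season values; the exact calculation inside Zabbix may differ, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def deviations_from_baseline(last_value, season_values):
    """Number of population standard deviations between last_value and the
    mean of the values observed in preceding seasons."""
    sigma = pstdev(season_values)
    center = mean(season_values)
    if sigma == 0:
        # All seasons identical: any difference is infinitely many deviations
        return 0.0 if last_value == center else float("inf")
    return abs(last_value - center) / sigma


# Seasons average 100 with a population standard deviation of about 7.07,
# so a value of 130 lies about 4.24 deviations away from the baseline.
print(deviations_from_baseline(130, [100, 110, 90, 100]))
```

A trigger could then compare the result with a chosen limit, for example 3 deviations.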
05
ANOMALY DETECTION USING PATTERNS
TRENDSTL FUNCTION
STL decomposes data in predefined intervals and finds anomalies based on a repeating pattern.
The trendstl() function uses a deviation algorithm to detect anomalies and returns the anomaly rate (0 to 1):
• The function compares a smaller detection period to a larger evaluation period
• The deviation measures how far values are from the average
• By default, the MAD algorithm is used; stddevpop or stddevsamp can be used instead
• The number of deviations can be specified (default is 3)
• s_window is the span (in lags) of the loess window for seasonal extraction
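The anomaly rate itself is simply the number of anomalous values divided by the total number of values in the detection period. The Python sketch below shows only that final step and makes several assumptions: the STL decomposition is not performed (the residuals are taken as given), the stddevpop variant is used instead of the default MAD, and the function name is made up for illustration.

```python
from statistics import mean, pstdev


def anomaly_rate(eval_residuals, detection_count, deviations=3.0):
    """Share of detection-period residuals lying more than
    deviations * sigma away from the evaluation-period mean.

    eval_residuals: remainder values for the whole evaluation period, oldest
    to newest; the last detection_count of them form the detection period.
    """
    if not 0 < detection_count <= len(eval_residuals):
        raise ValueError("detection_count must be between 1 and len(eval_residuals)")
    center = mean(eval_residuals)
    limit = deviations * pstdev(eval_residuals)
    detection = eval_residuals[-detection_count:]
    anomalies = sum(1 for r in detection if abs(r - center) > limit)
    return anomalies / len(detection)


# One of the last four residuals (10.0) is far outside the 3-deviation band
residuals = [0.0] * 20 + [0.1, -0.1, 10.0, 0.0]
print(anomaly_rate(residuals, detection_count=4))  # 0.25
```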
SEASONAL-TREND DECOMPOSITION USING LOESS
Seasonal-Trend decomposition using LOESS (STL) is a robust method of time series decomposition
The STL method uses locally fitted regression models to decompose a time series into:
• trend components
• seasonal components
• residual components
TRENDSTL FUNCTION EXAMPLE
Thank you
www.zabbix.com