IT Alert Operations: Standard Operating Procedure
Author
[COMPANY NAME] | [COMPANY ADDRESS]
9/24/2019 1:40:00 AM
Table of Contents
RDSProxy Back-end
  Description
    Dashboard Links
  Alert Definition
    State: Down
  Symptoms
  Recovery Process
[Element]
  Description
    Dashboard Links
  Alert Definition
    State [Warning/Critical/Down/Unreachable]
  Symptoms
  Recovery Process
Version Date Editor
1
RDSProxy Back-end
Description
RDSProxy is an instance running HAProxy in TCP proxy mode: it binds a locally listening socket to a remote
socket on a "back-end" host and then steps away, allowing the native transmission to occur on the wire.
HAProxy monitors the back-end RDS instances by making a MySQL client connection to each of them as the
haproxy_check user.
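To confirm by hand that a back-end accepts connections as the health-check user, a probe along the following lines may help. It is a minimal sketch, not HAProxy's own check: the helper name check_backend, the MYSQL_BIN override, the default port 3306, and the assumption that haproxy_check logs in without a password are all illustrative and should be adjusted to the actual environment.

```shell
# Hypothetical helper: probe one back-end with the mysql command-line client.
# Assumptions: haproxy_check needs no password; 3306 is the listening port.
check_backend() {
  host="$1"
  port="${2:-3306}"
  # --connect-timeout keeps the probe from hanging on an unreachable host.
  if "${MYSQL_BIN:-mysql}" --host="$host" --port="$port" \
       --user=haproxy_check --connect-timeout=5 \
       --execute='SELECT 1' >/dev/null 2>&1; then
    echo "$host:$port up"
  else
    echo "$host:$port down"
    return 1
  fi
}

# Example (placeholder host name):
#   check_backend my-rds-instance.example.internal
```

A non-zero return status makes the helper easy to use inside a loop over several back-ends.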
Dashboard Links
RDSProxy Dashboard
MySQL Dashboard
Alert Definition
State: Down
Trigger
If 3 sequential health checks fail for a given back-end RDS instance, HAProxy marks that system "down" and
sends no additional traffic to it. Once a back-end server is marked down, HAProxy will not attempt to
re-enable it. You must do this manually.
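One possible way to perform the manual re-enable is HAProxy's runtime API, which accepts an "enable server" command over its admin socket. The sketch below assumes that a stats socket with admin level is configured at /var/run/haproxy.sock and that socat is installed. The helper name enable_backend and the backend and server names are hypothetical, and this may not be the method used in this environment.

```shell
# Hypothetical helper: ask HAProxy to put a back-end server back in rotation.
# Assumptions: admin-level stats socket at /var/run/haproxy.sock; socat present.
enable_backend() {
  backend="${1:-}"
  server="${2:-}"
  sock="${HAPROXY_SOCK:-/var/run/haproxy.sock}"
  if [ -z "$backend" ] || [ -z "$server" ]; then
    echo "usage: enable_backend <backend> <server>" >&2
    return 2
  fi
  # "enable server <backend>/<server>" is a HAProxy runtime API command.
  printf 'enable server %s/%s\n' "$backend" "$server" | socat stdio "$sock"
}

# Example (placeholder names; take the real ones from haproxy.cfg):
#   enable_backend rds_backend rds1
```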
Notification
Team: MySQL Administrators
Escalation
If two or more systems alert with this message, escalate immediately to Senior DBAs.
If the alert is not acknowledged or resolved within 30 minutes, escalate the notification to Senior DBAs.
Reset Condition
The health check reports the back-end's status as "up".
Symptoms
Remote calls to the affected instance may return results slowly. Multiple RDSProxy back-end failures will
degrade performance.
Recovery Process
Re-enable the back-end server manually (HAProxy will not do so on its own), then follow the haproxy.log file
to confirm that the back-end host has been re-established:
tail -f /var/log/haproxy.log
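When the log is busy, it can be easier to reduce it to the most recent state change per back-end server. The following is a rough sketch that assumes HAProxy's usual "Server <backend>/<server> is UP" and "... is DOWN" log lines. The helper name last_backend_states is hypothetical, and the pattern may need adjusting if the local log format differs.

```shell
# Hypothetical helper: print the latest UP/DOWN transition seen per server.
# Assumes log lines containing "Server <backend>/<server> is UP|DOWN".
last_backend_states() {
  log="${1:-/var/log/haproxy.log}"
  # Keep only the transition phrase, then let the last entry per server win.
  grep -Eo 'Server [^ ]+ is (UP|DOWN)' "$log" |
    awk '{ state[$2] = $4 } END { for (s in state) print s, state[s] }' |
    sort
}

# Example:
#   last_backend_states /var/log/haproxy.log
```

A server whose last recorded transition is DOWN has not yet been re-established.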
[Element]
Description
Describe the affected element and what it entails. Include a plain-language description of the element and
its role in the organization.
Dashboard Links
Links to a dashboard for monitoring the service
Alert Definition
State [Warning/Critical/Down/Unreachable]
Trigger
Trigger conditions for the above state. Repeat the “State” header with additional statuses if this element can
trigger more than one state.
Notification
Who will get the notifications and which transport will be used.
Escalation
If there are any escalation paths, define here.
Reset Condition
How do we know that this is resolved?
Symptoms
What are the symptoms seen by IT, end-users, external services, etc.?
Recovery Process
How do you recover from this alert? If things are done automatically via the NMS, define them here.