Owein Reese’s Post

Software, Data, and a whole lot of SaaS

I was once in a small debate with one of the SRE managers at MediaMath about how to do an application gut check on deploy. I liked building a "sanity" endpoint that made the app run its dependency checks and ack the results. He preferred sticking to vanilla health checks and monitoring. Naturally, I went with what he said; he had more experience and had lived in that world longer than I had. Still, on more than one occasion, pinging each node in a cluster and finding the one that acked "err: no db" saved my bacon in the clutch.

Now that we live in a world of pods on transient nodes in managed clusters, I think the "gut check" endpoint is much less valuable. Would I first hit up kubectl to see what was running, then hit that endpoint, only to find out that the problematic server had already been vacuumed away? I'd be better off spending my effort on building the right kind of monitoring and telemetry so that the magic can happen without me.
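For readers who haven't built one, here is a minimal sketch of what a "sanity" endpoint along these lines might look like. It is not the implementation described above. It assumes Python's standard-library HTTP server, a hypothetical `/sanity` path, and hypothetical check callables that return `None` when a dependency is reachable or a short error string (like "no db") when it isn't. Unlike a plain liveness probe, it actually exercises each dependency and reports the results back.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Optional

# A dependency check returns None when healthy, or a short error message.
Check = Callable[[], Optional[str]]


def run_sanity(checks: Dict[str, Check]) -> dict:
    """Run every dependency check and collect an ack-style report."""
    results = {}
    for name, check in checks.items():
        try:
            err = check()
        except Exception as exc:  # a crashing check counts as a failure
            err = str(exc)
        results[name] = "ok" if err is None else f"err: {err}"
    status = "ok" if all(v == "ok" for v in results.values()) else "err"
    return {"status": status, "checks": results}


def make_handler(checks: Dict[str, Check]):
    """Build a request handler that serves the report at the hypothetical /sanity path."""

    class SanityHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/sanity":
                self.send_error(404)
                return
            report = run_sanity(checks)
            body = json.dumps(report).encode()
            # A 503 makes the broken node easy to spot when sweeping a cluster.
            self.send_response(200 if report["status"] == "ok" else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return SanityHandler


def serve(checks: Dict[str, Check], port: int = 8080) -> None:
    """Serve the sanity endpoint until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), make_handler(checks)).serve_forever()
```

As a usage example, `serve({"db": ping_db})`, where `ping_db` is a hypothetical function that tries a cheap query, would answer `GET /sanity` with something like `{"status": "err", "checks": {"db": "err: no db"}}` on the node that lost its database. That is exactly the signal that is easy to get from long-lived servers but often gone by the time you look when pods are rescheduled.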
