CC Image courtesy of Buster Benson on Flickr
Databases like SQL Server, MySQL and Oracle are the workhorses of IT operations. They are the reliable back-end for applications, services, web sites and many other kinds of systems so it’s no wonder they have to be up, running and available 24 x 7. But despite being workhorses they are still fragile. All it takes is an unexpected traffic spike or even worse, a runaway application, and suddenly response times start to lag, systems start to fail and end users start to complain.
Effective database monitoring means looking beyond system health metrics.
The key to effective database monitoring is to make sure that you are checking the right things and to look beyond simple system health metrics. Don’t get me wrong, system health metrics are great and a necessary part of core monitoring for any system, but even if CPU usage is low, drives have lots of space and memory is plentiful, it doesn’t mean that your databases and the systems that depend on them are healthy.
Two simple techniques can give you deeper visibility and better warning about potential issues with your database server:
First, create sample queries and run them on a regular basis. For effective database monitoring we recommend running actual queries against the database. The main thing you want to check is how long it took the query to complete. Ideally you’ll have a graph of that value over time so you can detect patterns and make an estimate of what normal conditions are. It’s usually a good idea to run separate queries against multiple tables and have different thresholds as some may exhibit different behavior under normal conditions. Set your monitoring software to alert you when the thresholds have been exceeded. Ideally your monitoring configuration will be set to only warn you after several failures in a row. After all, you don’t want to be alerted about momentary spikes, but you do want to be alerted about sustained abnormal conditions.
Second, run health check stored procedures. If your vendor (or internal development team) has not yet implemented health check procedures then it’s time to pressure them now. A health check is a stored procedure that runs internal tests reproducing common database operations that higher level components use. The health check should return a success/failure indicator and text strings indicating what passed and failed, if anything. Run the health check stored procedures on a regular basis and set your monitoring tool to warn you based on what they return.
Get a clear view of normal operations and advanced notice about potential issues.
By implementing the above techniques you’ll get a clear view of what normal operations look like. You’ll also get advanced notice about potential issues as they start to build up and that will give you a better ability to find and fix issues before they become big problems.
Our software, FrameFlow Server Monitor, can do all of the above of course. If you’re new to FrameFlow, take our 30-day trial edition for a spin. If you’ve already purchased a license, book a call with us and one of our reps will help you to implement everything we’ve described here.