Metrics give us quick feedback. Metrics together with small changes make data-driven development possible, instead of the more common opinion-based development that large changes and slow feedback lead to. Metrics data is transparent and available to everybody.
We are monitoring the health of both the infrastructure (the production server) and the different functions of the application itself.
The goal is to have automated alerts (for instance emails, SMS messages or messages in a Slack channel) that notify us of a potential problem before any user notices it (proactive notifications), so we can fix it while it is still small. We can for instance get:
- Threshold warnings: CPU usage exceeds 90%
- Rate of change warnings: CPU usage has increased by 25% in the last 10 minutes
- Slow function warnings: The Find Owners operation in the PetClinic application takes more than 2 seconds on average
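
The three kinds of warnings listed above can be sketched as simple checks. This is only an illustration, not how any particular monitoring tool implements them: the function names, the sample format (timestamp in seconds, value) and the default limits are assumptions, and the 25% increase is read here as a relative increase compared to the oldest reading in the window.

```python
from statistics import mean


def threshold_warning(cpu_percent, limit=90.0):
    """Threshold warning: True when the CPU reading exceeds the limit."""
    return cpu_percent > limit


def rate_of_change_warning(samples, window_seconds=600, max_increase=0.25):
    """Rate of change warning.

    samples: list of (timestamp_seconds, cpu_percent), oldest first
    (hypothetical format). Returns True when the latest reading is more
    than max_increase (relative) above the oldest reading in the window.
    """
    if not samples:
        return False
    latest_time, latest_value = samples[-1]
    # Keep only the readings that fall inside the time window.
    in_window = [(t, v) for t, v in samples if latest_time - t <= window_seconds]
    _, baseline = in_window[0]
    if baseline <= 0:
        return False  # a relative increase is undefined for a zero baseline
    return (latest_value - baseline) / baseline > max_increase


def slow_function_warning(durations_seconds, limit_seconds=2.0):
    """Slow function warning: True when the average duration exceeds the limit."""
    if not durations_seconds:
        return False
    return mean(durations_seconds) > limit_seconds
```

A monitoring job could run checks like these on a schedule and send a notification whenever one of them returns True, for example `slow_function_warning(find_owners_durations)`, where `find_owners_durations` is a hypothetical list of recent response times for the Find Owners operation.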