The Trade Desk recently moved from an old monitoring system based on Nagios, Graphite, and a number of homegrown pieces of software, to something more standard, based on Prometheus. SRE Patrick O’Brien talked about the lessons they learned along the way.
The KubeCon + CloudNativeCon caravan heads back to Europe this month, bringing an expected 10,000 cloud native enthusiasts to Barcelona’s Fira Gran Via. Already registered and packed your bags? Here’s where you will find Grafana Labs team members during the conference.
Polystat The grafana-polystat-panel plugin was created to provide a way to roll up multiple metrics and implement flexible drilldowns to other dashboards.
This example will focus on creating a panel for Cassandra using real data from Prometheus collected from our Kubernetes clusters. We’ll focus on the basic metrics for CPU/Memory/Disk coming from cAdvisor, but a well-instrumented service will have many metrics that indicate overall health, such as requests per second, error rates, and more.
Grafana Labs has created the Explore UI, which allows you to iterate quickly through Prometheus queries, while leaving your dashboards intact. This post will walk you through a demo and show you how to try it out yourself.
Øredev - Carl Bergquist - Monitoring for Everyone What is monitoring?
What do the terms log, metric, and distributed tracing actually mean?
What makes a good alert?
Why should I care?
At a recent developer conference in Malmö, Sweden, I gave a presentation on monitoring and observability to discuss the high level concepts and common tools that are out there.
Monitoring and observability can easily become quite complex, but at the heart of it, we simply want to know how our systems are performing, and when performance drops – be able to find out why.
It used to be that each machine had one purpose. Your load balancer machines talked to your web server machines which talked to your database machines. Instrumentation from these systems didn’t tend to be spectacular, so the most practical approach was to rely on and page on machine metrics such as CPU. Sure it wasn’t perfect, but it caught most of the times the service got overloaded.
There’s the odd false positive.
Grafana is used by hundreds of thousands of users on a wide variety of data sources. Among these there is a division in approaches to collecting the data. These are logging as exemplified by Elasticsearch as part of the ELK stack (Elasticsearch, Logstash and Kibana), and metrics as exemplified by Prometheus.
What do I mean by monitoring? Monitoring means knowing what’s going on inside your system, how much traffic it’s getting, how it’s performing, how many errors there are.