Grafana Tutorial: Simple Synthetic Monitoring for Applications

Published: 18 Jun 2019 by Brian Gann

Monitoring with Synthetics

Often there’s a focus on how a service is running from the perspective of the organization. But what does service health monitoring look like from the perspective of a user?

There are many metrics that indicate the overall health of a container, VM, or application, but on their own they do not indicate whether the system is functioning correctly.

Often these metrics (CPU, disk, memory) are too narrow, and they can be poor indicators. High CPU may be desirable or bursts of memory usage may be normal.

Synthetic metrics address the user experience, whether measuring a simple API call or logging into an application and viewing a dashboard.

In this example, we’ll use hosted Grafana since the entire process is well-known. This will demonstrate the common steps and metrics collected that can be used to monitor service health and, as a by-product, show where bottlenecks exist.

Here’s the final dashboard:

final dashboard

What are Synthetic Metrics?

Synthetic metrics are a collection of multi-stage steps required to complete an API call or transaction.

A set of metrics for an API call would contain:

1. Time to connect to API (connect latency)
2. Duration of request (response latency)
3. Size of response payload
4. Result code of request (200, 204, 400, 500, etc.)
5. Success/failure state of the request

That’s a very high-level synthetic and can be used as a model for more complex API calls.
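As a rough illustration of that model, a single check might be sketched in Python as follows. The names `http_get` and `probe` are hypothetical and not taken from the original script. This sketch records total request duration only; it does not measure connect latency separately, and `urllib` follows redirects, so a step that expects a 302 would need a client that does not.

```python
import time
import urllib.error
import urllib.request


def http_get(url, timeout=10):
    """Return (status_code, body_bytes); HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def probe(url, expected_status=200, fetch=http_get):
    """Run one synthetic check and return its metrics as a dict."""
    start = time.monotonic()
    try:
        status, body = fetch(url)
    except OSError:
        # Connection-level failure (DNS, refused, timeout): no status, no body.
        status, body = 0, b""
    duration_ms = int((time.monotonic() - start) * 1000)
    return {
        "result": 1 if status == expected_status else 0,  # success/failure state
        "duration": duration_ms,                          # response latency in ms
        "status_code": status,                            # HTTP result code
        "content_size": len(body),                        # payload size in bytes
    }
```

Passing `fetch` as a parameter keeps the timing and bookkeeping separate from the HTTP client, so the same function can wrap a POST or an authenticated call.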

Taking this idea further, an API call may require authentication before making a request. The user making the request may have a valid authentication token but not the authorization to make some API calls.

A “read only” user would not be modifying data but could make some useful queries.

Why Use Synthetics?

User experience is the most important aspect of service offerings. As long as the user can perform their tasks according to expectations, a service is healthy.

From the SRE viewpoint, these are ways a service can be “degraded” but remain working:

1. A database could be degraded (Two out of three nodes in a cluster are healthy, but the third is offline)
2. Kafka replication may not be working, but enough is online to continue working
3. Cassandra storage may be running out (It always does over time, particularly when you are on-call next)
4. Kubernetes Masters are offline (This does happen, even in the best of clouds)

From the user experience, none of the above issues matter as long as the service is functioning.

Synthetic Metrics with Hosted Grafana

A very basic Python script will be used to traverse the 10 steps required to log in and validate a session with a hosted Grafana instance. The metrics generated by the script are in Graphite format and will be sent to a hosted metrics instance with tags enabled.

The same script can be adapted to send this data to InfluxDB or provide a metrics API that can be scraped by Prometheus.

Time Series Databases

Grafana offers hosted metrics for both Graphite and Prometheus. The script currently generates metrics suitable for Graphite with tags enabled.
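If the Prometheus route is preferred, the same measurements could be rendered in the Prometheus text exposition format and served from an HTTP endpoint. The sketch below only covers the rendering; the metric name `hosted_grafana_duration_ms` and the label names are assumptions for illustration, not something the original script produces.

```python
def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return (str(value).replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n"))


def prometheus_sample(name, labels, value):
    """Render one sample line: name{label="value",...} value"""
    label_part = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels.items())
    return f"{name}{{{label_part}}} {value}"


def render(metric, samples):
    """Render a gauge with its TYPE line followed by one line per sample."""
    lines = [f"# TYPE {metric} gauge"]
    lines += [prometheus_sample(metric, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render("hosted_grafana_duration_ms", [
        ({"step": "step_01", "request_method": "GET"}, 567),
        ({"step": "step_02", "request_method": "GET"}, 155),
    ]))
```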

10 Steps to Success

There are 10 steps for the entire process, with a final step that parses the result and ensures the login has succeeded.

These steps were discovered by reproducing the login flow with a combination of Chrome Developer Tools and Postman.

Step 1: Target: https://bkgann3.grafana.net
SYNTHETIC GET - STEP 1: https://bkgann3.grafana.net
SYNTHETIC GET - STEP 1: https://bkgann3.grafana.net  DURATION: 567ms
Step 2: Target: https://bkgann3.grafana.net/login
SYNTHETIC GET - STEP 2: https://bkgann3.grafana.net/login
SYNTHETIC GET - STEP 2: https://bkgann3.grafana.net/login  DURATION: 155ms
Step 3: Target: https://bkgann3.grafana.net/login/grafana_com
SYNTHETIC GET - STEP 3: https://bkgann3.grafana.net/login/grafana_com
SYNTHETIC GET - STEP 3: https://bkgann3.grafana.net/login/grafana_com  DURATION: 138ms
Step 4: Target: https://grafana.com/oauth2/authorize?access_type=online&client_id=4579dc0323c2042eb808&redirect_uri=https%3A%2F%2Fbkgann3.grafana.net%2Flogin%2Fgrafana_com&response_type=code&scope=user%3Aemail&state=PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%3D
SYNTHETIC GET - STEP 4: https://grafana.com/oauth2/authorize?access_type=online&client_id=4579dc0323c2042eb808&redirect_uri=https%3A%2F%2Fbkgann3.grafana.net%2Flogin%2Fgrafana_com&response_type=code&scope=user%3Aemail&state=PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%3D
SYNTHETIC GET - STEP 4: https://grafana.com/oauth2/authorize?access_type=online&client_id=4579dc0323c2042eb808&redirect_uri=https%3A%2F%2Fbkgann3.grafana.net%2Flogin%2Fgrafana_com&response_type=code&scope=user%3Aemail&state=PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%3D  DURATION: 158ms
Step 5: Target: https://grafana.com/login?to=%2Foauth2%2Fauthorize%3Faccess_type%3Donline%26amp%253Bclient_id%3D4579dc0323c2042eb808%26amp%253Bredirect_uri%3Dhttps%253A%252F%252Fbkgann3.grafana.net%252Flogin%252Fgrafana_com%26amp%253Bresponse_type%3Dcode%26amp%253Bscope%3Duser%253Aemail%26amp%253Bstate%3DPuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%253D
SYNTHETIC GET - STEP 5: https://grafana.com/login?to=%2Foauth2%2Fauthorize%3Faccess_type%3Donline%26amp%253Bclient_id%3D4579dc0323c2042eb808%26amp%253Bredirect_uri%3Dhttps%253A%252F%252Fbkgann3.grafana.net%252Flogin%252Fgrafana_com%26amp%253Bresponse_type%3Dcode%26amp%253Bscope%3Duser%253Aemail%26amp%253Bstate%3DPuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%253D
SYNTHETIC GET - STEP 5: https://grafana.com/login?to=%2Foauth2%2Fauthorize%3Faccess_type%3Donline%26amp%253Bclient_id%3D4579dc0323c2042eb808%26amp%253Bredirect_uri%3Dhttps%253A%252F%252Fbkgann3.grafana.net%252Flogin%252Fgrafana_com%26amp%253Bresponse_type%3Dcode%26amp%253Bscope%3Duser%253Aemail%26amp%253Bstate%3DPuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%253D  DURATION: 174ms
Step 6: Target: https://grafana.com/api/login
SYNTHETIC POST - STEP 6: https://grafana.com/api/login DURATION: 383ms
Step 7: Target: https://grafana.com/api/oauth2/clients/4579dc0323c2042eb808
SYNTHETIC GET - STEP 7: https://grafana.com/api/oauth2/clients/4579dc0323c2042eb808
SYNTHETIC GET - STEP 7: https://grafana.com/api/oauth2/clients/4579dc0323c2042eb808  DURATION: 145ms
Step 8: Target: https://grafana.com/api/oauth2/grants?clientId=4579dc0323c2042eb808
SYNTHETIC GET - STEP 8: https://grafana.com/api/oauth2/grants?clientId=4579dc0323c2042eb808
SYNTHETIC GET - STEP 8: https://grafana.com/api/oauth2/grants?clientId=4579dc0323c2042eb808  DURATION: 146ms
{u'instanceId': 73788, u'name': u'bkgann3.grafana.net', u'links': [{u'href': u'/oauth2/clients/4579dc0323c2042eb808', u'rel': u'self'}, {u'href': u'/orgs/bkgann', u'rel': u'org'}], u'url': u'https://bkgann3.grafana.net', u'orgSlug': u'bkgann', u'id': u'4579dc0323c2042eb808', u'orgName': u'Brian Gann', u'orgId': 127614, u'updatedAt': u'2019-01-15T00:01:15.000Z', u'redirectUri': u'https://bkgann3.grafana.net/login/grafana_com', u'createdAt': u'2018-12-14T22:02:08.000Z', u'description': u''}
Step 9: Target: https://grafana.com/api/oauth2/authorize
SYNTHETIC POST - STEP 9: https://grafana.com/api/oauth2/authorize DURATION: 162ms
Step 10: Target: https://bkgann3.grafana.net/login/grafana_com?code=a5c9c606f8fbfa61367c3806899ae9ad70d430f5&state=PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%3D
SYNTHETIC GET - STEP 10: https://bkgann3.grafana.net/login/grafana_com?code=a5c9c606f8fbfa61367c3806899ae9ad70d430f5&state=PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%3D
SYNTHETIC GET - STEP 10: https://bkgann3.grafana.net/login/grafana_com?code=a5c9c606f8fbfa61367c3806899ae9ad70d430f5&state=PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U%3D  DURATION: 738ms
Step 10: Target: https://bkgann3.grafana.net
SYNTHETIC GET - STEP 11: https://bkgann3.grafana.net
SYNTHETIC GET - STEP 11: https://bkgann3.grafana.net  DURATION: 162ms
YES!

Metrics Generated

The following metrics are generated:

name          unit          description
result        boolean       0 for failure, 1 for success
duration      milliseconds  time to perform request
status_code   integer       HTTP response code
content_size  bytes         size of content returned

hosted_grafana.step_01.result;step=hosted_grafana.step_01;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_01.duration;step=hosted_grafana.step_01;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 567 1560461215
hosted_grafana.step_01.status_code;step=hosted_grafana.step_01;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 302 1560461215
hosted_grafana.step_01.content_size;step=hosted_grafana.step_01;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 29 1560461215
hosted_grafana.step_02.result;step=hosted_grafana.step_02;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_02.duration;step=hosted_grafana.step_02;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 155 1560461215
hosted_grafana.step_02.status_code;step=hosted_grafana.step_02;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_02.content_size;step=hosted_grafana.step_02;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 30408 1560461215
hosted_grafana.step_03.result;step=hosted_grafana.step_03;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_03.duration;step=hosted_grafana.step_03;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 138 1560461215
hosted_grafana.step_03.status_code;step=hosted_grafana.step_03;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 302 1560461215
hosted_grafana.step_03.content_size;step=hosted_grafana.step_03;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 289 1560461215
hosted_grafana.step_04.result;step=hosted_grafana.step_04;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_04.duration;step=hosted_grafana.step_04;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 158 1560461215
hosted_grafana.step_04.status_code;step=hosted_grafana.step_04;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 302 1560461215
hosted_grafana.step_04.content_size;step=hosted_grafana.step_04;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 682 1560461215
hosted_grafana.step_05.result;step=hosted_grafana.step_05;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_05.duration;step=hosted_grafana.step_05;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 174 1560461215
hosted_grafana.step_05.status_code;step=hosted_grafana.step_05;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_05.content_size;step=hosted_grafana.step_05;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 30761 1560461215
hosted_grafana.step_06.result;step=hosted_grafana.step_06;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_06.duration;step=hosted_grafana.step_06;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 383 1560461215
hosted_grafana.step_06.status_code;step=hosted_grafana.step_06;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_06.content_size;step=hosted_grafana.step_06;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 5257 1560461215
hosted_grafana.step_07.result;step=hosted_grafana.step_07;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_07.duration;step=hosted_grafana.step_07;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 145 1560461215
hosted_grafana.step_07.status_code;step=hosted_grafana.step_07;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_07.content_size;step=hosted_grafana.step_07;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 538 1560461215
hosted_grafana.step_08.result;step=hosted_grafana.step_08;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_08.duration;step=hosted_grafana.step_08;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 146 1560461215
hosted_grafana.step_08.status_code;step=hosted_grafana.step_08;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_08.content_size;step=hosted_grafana.step_08;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 941 1560461215
hosted_grafana.step_09.result;step=hosted_grafana.step_09;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_09.duration;step=hosted_grafana.step_09;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 162 1560461215
hosted_grafana.step_09.status_code;step=hosted_grafana.step_09;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_09.content_size;step=hosted_grafana.step_09;runner=ares.local;request_method=POST;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 430 1560461215
hosted_grafana.step_10.result;step=hosted_grafana.step_10;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_10.duration;step=hosted_grafana.step_10;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 738 1560461215
hosted_grafana.step_10.status_code;step=hosted_grafana.step_10;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 302 1560461215
hosted_grafana.step_10.content_size;step=hosted_grafana.step_10;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 24 1560461215
hosted_grafana.step_11.result;step=hosted_grafana.step_11;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=boolean 1 1560461215
hosted_grafana.step_11.duration;step=hosted_grafana.step_11;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=ms 162 1560461215
hosted_grafana.step_11.status_code;step=hosted_grafana.step_11;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=integer 200 1560461215
hosted_grafana.step_11.content_size;step=hosted_grafana.step_11;runner=ares.local;request_method=GET;info=HostedGrafanaSynthetic;instance_name=bkgann3;org_id=4068;mtype=gauge;unit=B 40699 1560461215
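
Each line above follows the tagged Graphite layout: the metric name, then `;tag=value` pairs, then the value and a Unix timestamp. A minimal sketch of a formatter that produces such a line is shown here; the function name `graphite_line` is hypothetical and the original script may build its output differently.

```python
import time


def graphite_line(name, value, tags, timestamp=None):
    """Format one tagged Graphite sample: name;k=v;k=v value timestamp"""
    if timestamp is None:
        timestamp = int(time.time())
    tag_part = "".join(f";{key}={val}" for key, val in tags.items())
    return f"{name}{tag_part} {value} {timestamp}"


if __name__ == "__main__":
    tags = {
        "step": "hosted_grafana.step_01",
        "runner": "ares.local",
        "request_method": "GET",
        "info": "HostedGrafanaSynthetic",
        "instance_name": "bkgann3",
        "org_id": 4068,
        "mtype": "gauge",
        "unit": "ms",
    }
    # Reproduces the step_01 duration line from the sample run.
    print(graphite_line("hosted_grafana.step_01.duration", 567, tags, 1560461215))
```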

Basic Dashboard

A dashboard can be built with Grafana using a combination of SingleStat panels and Graph panels.

The top portion of the dashboard displays the overall health of a hosted Grafana instance:

overview

The next section displays OK/CRIT for each stage of the synthetic operation:

summary

The last section gives more detail in graph format:

detail

Queries

The Graphite queries are very basic. A few are shown below. See the supplied dashboard JSON for more details.

Step 1 Results

averageSeries(seriesByTag('name=hosted_grafana.step_01.result'))

Step 1 Durations

alias(averageSeries(seriesByTag('name=hosted_grafana.step_01.duration')), 'duration')

Time to Login - COLD (clean browser, no cookies, session, etc), which is the sum of the duration of steps 1 through 10.

sumSeries(seriesByTag('name=~hosted_grafana.(step_0\d|step_10).duration'))
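
To sanity-check the cold login figure outside of Graphite, the same sum can be computed directly from metric lines. This is a sketch only: the helper names are hypothetical, the regular expression mirrors the step selection described above (steps 1 through 10), and the durations are the ones from the sample run earlier in this post.

```python
import re

# Matches the duration metrics for steps 01-09 and step 10, but not step 11.
COLD_STEPS = re.compile(r"hosted_grafana\.(step_0\d|step_10)\.duration")


def parse_line(line):
    """Split a tagged Graphite line into (metric name, numeric value)."""
    path, value, _timestamp = line.rsplit(" ", 2)
    name = path.split(";", 1)[0]  # drop the ;tag=value pairs
    return name, float(value)


def cold_login_ms(lines):
    """Sum the durations of the steps that make up a cold login."""
    total = 0.0
    for line in lines:
        name, value = parse_line(line)
        if COLD_STEPS.fullmatch(name):
            total += value
    return total


if __name__ == "__main__":
    durations = {1: 567, 2: 155, 3: 138, 4: 158, 5: 174, 6: 383,
                 7: 145, 8: 146, 9: 162, 10: 738, 11: 162}
    lines = [f"hosted_grafana.step_{n:02d}.duration;unit=ms {ms} 1560461215"
             for n, ms in durations.items()]
    print(cold_login_ms(lines))  # 2766.0
```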

Time to Login - HOT (cookie/session/etc cached) is the time it takes to hit the instance when a session is already active.

alias(sumSeries(seriesByTag('name=~hosted_grafana.step_10.duration')), 'duration')

Deep Dive into Creating Synthetics

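Here’s the general process used to figure out each step for a hosted Grafana login. The script that performs each step is written in Python, but could easily be written in other languages. Note that this walkthrough does not cover the GET of the grafana.com login page separately (step 5 in the script output above), so walkthrough steps 5 through 10 correspond to script steps 6 through 11.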

Step 1

Start with Chrome with DevTools open, enable “Preserve log,” and visit the destination, in this case https://bkgann3.grafana.net

In DevTools you’ll see a 302 (redirect) as the response. The response will also include the redirect_url. With those two items, we can test step 1 by connecting, checking for a 302 HTTP response code (anything else is an error), and getting the redirect_url, which we’ll use in the next step.

synthetic1
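
A check like step 1 needs an HTTP client that does not follow redirects automatically, so that the 302 itself is visible. Here is one possible sketch using only the standard library. The function names are hypothetical, and it assumes the redirect target arrives in the standard `Location` header; the original script may do this differently.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def get_no_redirect(url, timeout=10):
    """GET a URL without following redirects; return (status, location, body)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def next_url(url, status, location):
    """Validate a redirect response and return the absolute redirect target."""
    if status != 302:
        raise ValueError(f"expected 302 from {url}, got {status}")
    if not location:
        raise ValueError(f"302 from {url} carried no Location header")
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(url, location)
```

In use, the two pieces chain together: `status, location, _ = get_no_redirect(url)` followed by `next_url(url, status, location)` yields the URL for the next step or raises on anything other than a 302.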

Step 2

Connecting to the redirect_url from step 1, we’ll be sent to the login path, in this case https://bkgann3.grafana.net/login. We get a 200 response from this step. Anything else is an error.

synthetic2

Step 3

This step requires minor digging through the web page. Inspecting the login button gives us our next URL to query: https://bkgann3.grafana.net/login/grafana_com

synthetic3 synthetic4

Connecting to that URL will respond with a 302 (redirect) and another URL to visit.
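
The manual inspection of the login button could also be automated by scanning the page for a matching link. The sketch below assumes the button is rendered as a plain `<a href="...">` element in the returned HTML, which may not hold for every Grafana version; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_login_url(page_url, html, suffix="login/grafana_com"):
    """Return the absolute URL of the first link ending in `suffix`, or None."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        if href.rstrip("/").endswith(suffix):
            return urljoin(page_url, href)
    return None
```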

Step 4

synthetic5

The redirect from step 3 points to grafana.com, which handles the OAuth login. This request returns another 302 with a redirect URL.

Step 5

Next we’ll post our login info to https://grafana.com/api/login, expecting a 200 response.

synthetic6
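
A login POST of this kind might be built as shown below. This is a sketch under assumptions: the JSON field names `login` and `password` are placeholders (the real payload should be copied from what DevTools or Postman shows), and the function name is hypothetical.

```python
import json
import urllib.request


def build_login_request(url, username, password):
    """Build (but do not send) a JSON POST carrying the login credentials."""
    # Field names are assumed for illustration; match them to the real request.
    payload = json.dumps({"login": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Building the request separately from sending it makes it easy to keep credentials out of the script, for example by reading them from environment variables just before the call.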

Step 6

Next we’ll query the API for the client ID: https://grafana.com/api/oauth2/clients/4579dc0323c2042eb808. This returns a 200 response with a payload we’ll use next.

synthetic7

Step 7

To get the “grants,” we next query https://grafana.com/api/oauth2/grants?clientId=4579dc0323c2042eb808

This responds with a 200 and a session cookie.

synthetic8

Step 8

Next we use the cookie and authorize by querying https://grafana.com/api/oauth2/authorize?access_type=online&client_id=4579dc0323c2042eb808 (There is more in the query; the important part is to get a 200 response and the session.)

synthetic9

Step 9

We’ll now use the code and session cookie from step 8 and try to log in again: https://bkgann3.grafana.net/login/grafana_com?code=X&state=Y

synthetic10

We’ll get a 302 redirect and a URL again, which is exactly where we started off!
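
The callback URL for this step can be assembled from the `code` and `state` values with proper URL encoding. A small sketch, with a hypothetical function name, using the values from the sample run:

```python
from urllib.parse import urlencode


def callback_url(instance_url, code, state):
    """Build the login callback URL, percent-encoding the query values."""
    query = urlencode({"code": code, "state": state})
    return f"{instance_url.rstrip('/')}/login/grafana_com?{query}"


if __name__ == "__main__":
    print(callback_url(
        "https://bkgann3.grafana.net",
        "a5c9c606f8fbfa61367c3806899ae9ad70d430f5",
        "PuaU_YRJSko1-yV1UtBCM_9rUMeVOMBjBmfCmG9DT7U=",
    ))
```

Note that the trailing `=` in the state value is encoded as `%3D`, matching the URL seen in the script output.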

Step 10

Now that we have authorization and a good session established, we can connect and get back a 200 response.

synthetic11

Step 11

The 11th step is to parse the body of step 10 for a successful login string, which is easy to locate:

"isSignedIn": true

synthetic12

If we see this string in the body, we’ve completed our login successfully.
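
A minimal sketch of that final check is shown here. The function name is hypothetical, and the pattern tolerates differences in whitespace around the colon.

```python
import re

# Matches "isSignedIn": true with optional whitespace around the colon.
SIGNED_IN = re.compile(r'"isSignedIn"\s*:\s*true\b')


def login_succeeded(body):
    """Return True when the page body reports a signed-in session."""
    return SIGNED_IN.search(body) is not None
```

The return value maps directly onto the result metric: 1 when the string is found, 0 otherwise.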

Wrapping it All Up

In this example, the end-user experience is measured and provides real feedback on site reliability.

Granular metrics like CPU, disk, and memory are also collected but only leveraged by an SRE when looking for opportunities to optimize the service. The synthetics can provide insight as to where to start looking.

The synthetic script can be cloned from this repo.

Use it to monitor your own experience with hosted Grafana or adapt it for your application!