Blog  /  Engineering

Monitoring COVID-19 virus levels in wastewater using Grafana, Databricks, and the Sqlyze plugin

18 May 2022 5 min read

The new Sqlyze data source plugin (in beta) allows you to connect your Grafana instance to all your favorite SQL databases, many NoSQL databases, and many other non-SQL data sources — from document databases, to ERP systems, to even Slack. You don’t have to know the native query syntax for these data sources; you can just use SQL.

The Sqlyze plugin uses ODBC at its core. Hundreds of ODBC drivers are available for various databases/data sources. So if an ODBC driver exists, you can now query it in Grafana with SQL.

What is Databricks?

Databricks is a popular data lake/data warehouse/AI platform that uses Apache Spark to rip through huge amounts of data and provide super fast metrics. Pairing this with Grafana, we can build some nice looking, lightning fast dashboards that slice through mountains of data and instantly give you some great insights.

Sounds great right? Let’s look at how we can connect to Databricks and run some SQL queries.

Set up and connect the Sqlyze plugin and Databricks

Step 1: Install the driver (this is currently manual but will be easier on future versions). You can obtain the driver here.

For Linux users only:

  • Install unixODBC using the package manager for your Linux distribution
apt-get install unixodbc   or
yum install unixODBC
  • Add an entry to your odbcinst.ini file. This defines where the driver is installed.

    To find your odbcinst.ini use the following command:

odbcinst -j

dbcinst.ini (change the path below to your driver)

[ODBC Drivers]

Step 2: Get your driver settings from Databricks

  • Go to your cluster and select the JDBC/ODBC tab
Screenshot of Databricks UI for driver settings.

Step 3: Install the Sqlyze plugin on Grafana

Step 4: Create a new Sqlyze data source with your driver settings. See example below.  Note: Driver must match the entry name in the odbcinst.ini file.

Screenshot showing how to add a new Sqlyze plugin to Databricks

Step 5: Save and test

Screenshot of UX when a data source is added successfully.

Create a Grafana dashboard and run SQL

After you’ve created your dashboard and panel in Grafana, select your new data source. Once selected, you will be presented with a SQL editor with some nice color coding for SQL.

Importing COVID-19 wastewater data from the CDC

For this example, we’ve decided to look at a timely topic: Tracking the community spread of COVID-19 by importing COVID-19 wastewater surveillance data from the Centers for Disease Control and Prevention into Databricks. People who are infected with COVID-19 shed the virus in their stool, so wastewater surveillance has become one method that public health officials use to understand, monitor, and even predict the spread of COVID-19 within a community — especially as home-testing has become more popular but harder to record and track.

Here, we are going to visualize the percentage of change of the COVID-19 virus in wastewater over time at a particular wastewater treatment plant. First of all, it’s important to note what this “percentage” metric is telling us. This metric is defined by the CDC as follows:

This metric shows whether SARS-CoV-2 virus levels at a site are currently higher or lower than past historical levels at the same site. 0% means levels are the lowest they have been at the site; 100% means levels are the highest they have been at the site. Public health officials watch for increasing levels of the virus in wastewater over time and use this data to help make public health decisions.

I’ve written a query (below) to show the percentage changes for one treatment plant in Michigan and adjusted the time range to a 5-month window.

Dashboard of query showing COVID-19 virus in one wastewater treatment plant in Michigan.

So, using Grafana, we can monitor these changes over time. We can even set up alerts if the percentage change hits a certain threshold.

How to query COVID-19 wastewater data by state and county

This dataset also provides the population of those within this wastewater plant’s jurisdiction. Let’s add another query and stat panel to show that metric.

Dashboard of query showing total population served in COVID-19 virus in wastewater.

We can also see the specific counties that are within the jurisdiction of this treatment plant. It would be interesting to show the counties that are affected. Let’s add another panel and show a table since there can be multiple counties.

Dashboard of query showing counties within the jurisdiction of one wastewater treatment plant in Michigan.

Let’s also add a text panel to describe the metric. Now we’ve got a comprehensive dashboard taking shape.

Dashboard showing a text panel describing the metrics visualized.

I’ve hardcoded the queries to the state of Michigan. Now let’s add some template variables where users can choose the state and county, so they can see the virus levels at treatment plants in other places, perhaps closer to their homes or in areas they may be traveling to in the future.

First, we’ll add a query type variable to get a list of states.

Dashboard with template variables so users can see COVID-19 virus levels in wastewater by state and county.

Now we’ll add a query type variable to get a list of counties. In this query, we will reference our state variable.

Dropdown menu showing counties within a selected state.

Now we will edit our time series panel and update the query to use our new variables.

select date_start as time, wwtp_jurisdiction, percentile
from default.nwss_public_sars_cov_2_wastewater_metric_data
where wwtp_jurisdiction = '$state'
and county_names = '$county'
order by date_start

Let’s also update the stats panel query to reference the variables.

select max(population_served) from default.nwss_public_sars_cov_2_wastewater_metric_data
where wwtp_jurisdiction = '$state'
and county_names = '$county'

Now we can choose state and county from our dashboard. Since we can choose the county now, I’ve removed the table of counties panel.

Dashboard showing COVID-19 virus in wastewater within state and county selected by user.

This is nice, but can we compare different counties? Yes! Let’s change our time series query a bit and enable our county variable to select multiple values.

Dashboard comparing COVID-19 virus levels in wastewater in different counties.

So there you have it — using the Sqlyze data source plugin, we can query COVID-19 wastewater surveillance data on Databricks. Remember the Sqlyze plugin can connect to many other sources.  Stay tuned for more examples!

Grafana Cloud is the easiest way to get started with metrics, logs, traces, and dashboards. We have a generous free forever tier and plans for every use case. Sign up for free now!