Plugins 〉Parquet-S3-Datasource


Developer

Tobiasworkstech

Sign up to receive occasional product news and updates:



Data Source
community

Parquet-S3-Datasource

  • Overview
  • Installation
  • Change log
  • Related content

Parquet-S3-Datasource for Grafana

Query and visualize Apache Parquet files stored in Amazon S3 (or any S3-compatible storage such as MinIO, Wasabi, Cloudflare R2, or DigitalOcean Spaces) directly in Grafana — without an intermediate database.

The datasource reads Parquet footers and column chunks straight from object storage, converts them through Apache Arrow into Grafana data frames, and runs a built-in lightweight SQL engine over the result so you can build dashboards and alerts from your data lake.

Overview

Query Editor

Typical use cases:

  • Ad-hoc exploration of Parquet exports (analytics dumps, ML datasets, ETL outputs) sitting in S3.
  • Long-term observability archives — keep recent metrics in Prometheus and roll older data into Parquet on S3.
  • Multi-tenant data lakes where each customer's data lives in its own bucket or prefix.
  • Alerting on data that already lives in S3, without standing up a query engine.

Features

  • Direct Parquet access over S3 — no Athena, Trino, or DuckDB required.
  • S3-compatible storage support: Amazon S3, MinIO, Wasabi, Cloudflare R2, DigitalOcean Spaces, Backblaze B2, etc. Custom endpoints and path-style URLs are supported.
  • Apache Arrow pipeline for efficient columnar reads and type-correct Grafana frames (timestamps, numerics, booleans, strings).
  • Built-in SQL subset with SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, the aggregates COUNT, SUM, AVG, MIN, MAX, column aliasing, and the comparison/logic operators =, !=, >, <, >=, <=, LIKE, IN, IS NULL, IS NOT NULL, AND, OR.
  • Visual query builder in Builder mode, raw SQL in Code mode, with a SQL preview as you build.
  • Template variables: buckets(), files(prefix), prefixes(prefix), and arbitrary SQL-driven variables.
  • Alerting: the datasource implements Grafana's alerting interface, so any SQL query can drive an alert rule.
  • AWS-SDK auth via grafana-aws-sdk (access key, default chain, assume-role, instance profile, named profile).

The SQL engine is a custom in-memory executor shipped with the plugin — it is not DuckDB. DuckDB-specific functions, window functions, CTEs, and JOINs are not available.

Screenshots

Datasource configuration

Datasource Configuration

Example dashboards (included as provisioned samples)

Iris dataset

Titanic dataset

Server metrics time-series

Template variables

Template Variables

Requirements

  • Grafana >= 11.0.0
  • An S3 or S3-compatible bucket containing Parquet files
  • Credentials with s3:ListBucket and s3:GetObject on that bucket

Installation

Using the Grafana CLI:

grafana-cli plugins install tobiasworkstech-parquets3-datasource

Or with the official Grafana Docker image:

docker run -d -p 3000:3000 \
  -e GF_INSTALL_PLUGINS=tobiasworkstech-parquets3-datasource \
  grafana/grafana

Getting started

  1. Add the datasource: in Grafana, go to Connections → Data sources → Add data source and pick Parquet-S3-Datasource.

  2. Configure the connection:

    • Region — the AWS region your bucket lives in (e.g. us-east-1). Required by the SDK even for S3-compatible providers.
    • Bucket — the bucket to query.
    • Endpoint (optional) — override the S3 endpoint URL for non-AWS providers (e.g. https://minio.example.com, https://<account>.r2.cloudflarestorage.com).
    • Authentication type — choose between AWS SDK Default, Access & Secret Key, Credentials File, EC2 IAM Role, or Assume Role. For static keys, supply the Access Key and Secret Key.
  3. Click Save & test. A green check means Grafana could authenticate and list the bucket.

  4. Build a query: add a panel, pick the datasource, choose a Parquet file, then either let the builder generate SQL or switch to Code mode and write your own:

    SELECT timestamp, host, cpu_percent
    FROM parquet
    WHERE host = 'web-01'
    ORDER BY timestamp DESC
    LIMIT 500
    

    FROM parquet is a placeholder — the actual Parquet file is determined by the Path field in the query editor.

Query examples

-- All rows, all columns
SELECT * FROM parquet

– Filter + sort SELECT name, value FROM parquet WHERE value > 100 ORDER BY value DESC

– GROUP BY + aggregates + alias SELECT category, COUNT(*) AS count, AVG(price) AS avg_price FROM parquet GROUP BY category

– Most recent N rows SELECT * FROM parquet ORDER BY timestamp DESC LIMIT 100

Template variables

Query TypeWhat it returnsInputs
List FilesObject keys in the bucket matching a glob (*.parquet default)Optional prefix, file pattern
PrefixesTop-level "folders" under a prefix (uses Delimiter=/)Optional prefix
SQL QueryFirst column of a SQL query, deduplicatedPath + SQL query

Examples:

  • All parquet files in the bucket: Query Type = List Files, File Pattern = *.parquet.
  • Files in a folder: Query Type = List Files, Prefix = data/2026/, File Pattern = *.parquet.
  • Distinct values from a column: Query Type = SQL Query, Path = data.parquet, SQL = SELECT DISTINCT category FROM parquet.

Configuration examples

Amazon S3

Region:     us-east-1
Bucket:     my-data-lake
Endpoint:   (leave empty)
Auth:       Access & Secret Key   (or Assume Role / Default chain)

MinIO (self-hosted, path-style)

Region:     us-east-1
Bucket:     parquet-data
Endpoint:   http://minio.internal:9000
Auth:       Access & Secret Key

Path-style routing is enabled automatically when a custom endpoint is set.

Cloudflare R2

Region:     auto
Bucket:     my-bucket
Endpoint:   https://<account-id>.r2.cloudflarestorage.com
Auth:       Access & Secret Key

Wasabi

Region:     us-east-1
Bucket:     my-bucket
Endpoint:   https://s3.wasabisys.com
Auth:       Access & Secret Key

Alerting

plugin.json declares "alerting": true, so any query you can write in this datasource can drive a Grafana alert rule. The bundled example (provisioning/alerting/alerts.yml in the repo) shows a "High CPU" rule built from SELECT AVG(cpu_percent) FROM parquet.

Supported Parquet features

  • Primitive types: INT8/16/32/64, UINT32/64, FLOAT32/64, BOOLEAN, STRING, LARGE_STRING, TIMESTAMP (ns/µs/ms/s).
  • Compression codecs handled by Arrow: SNAPPY, GZIP, LZ4, ZSTD, BROTLI.
  • Column pruning when only a subset of columns is selected.
  • Schema discovery via the Parquet footer — the column list for a file is fetched without downloading the whole file.

Nested types (STRUCT, LIST, MAP) are read using their Arrow string representation; explicit nested-projection support is on the roadmap.

Troubleshooting

  • Save & test fails with connection error — verify the region, bucket name, endpoint URL, and that the credentials have s3:ListBucket and s3:GetObject. For self-signed endpoints, make sure the Grafana host trusts the certificate.
  • No data returned — confirm the file path matches an object key in the bucket (case-sensitive, no leading slash). Use a List Files template variable to discover keys.
  • SQL errors — column names are case-insensitive but must exist; quote names with special characters using double quotes ("foo.bar"). Only the SQL subset documented above is supported.
  • Slow queries — the engine reads the full file for non-aggregated queries. Push selectivity into a smaller file or partition by writing your Parquet files with a prefix per time window (e.g. metrics/2026/05/20/...).

Source, issues, and contributing

The plugin is open-source (Apache 2.0). Source, issue tracker, and release notes live at:

https://github.com/tobiasworkstech/tobiasworkstech-parquets3-datasource

Bug reports, feature requests, and pull requests are welcome.

Install on Grafana Cloud

Plugins can be installed directly from within your Grafana instance or automated using the Cloud API or Terraform.

Learn more about plugin installation

For more information, visit the docs on plugin installation.

Changelog

All notable changes to this project will be documented in this file.

[1.2.15] - 2026-05-20

Build

  • Drop 32-bit linux/arm binary. Apache Thrift v0.23.0 (required to fix CVE-2026-41602) uses math.MaxUint32 as an untyped int constant in framed_transport.go, which overflows on 32-bit architectures. 32-bit ARM is rarely used to run Grafana (RPi 4+ is arm64), so the artifact is dropped rather than staying on the vulnerable Thrift version. A custom BuildAll target in Magefile.go builds the remaining five platforms (linux amd64/arm64, darwin amd64/arm64, windows amd64).
  • Refresh grafana-plugin-sdk-go to v0.292.0 to clear the validator's "SDK older than 2 months" warning.

[1.2.13] - 2026-05-20

Bug Fixes

  • S3 list pagination: listPrefixes and listFiles now page through ListObjectsV2 results so buckets with more than 1000 objects are not silently truncated.

Improvements

  • Schema discovery without full read: SELECT * FROM parquet LIMIT 0 (used by the frontend to fetch column names) now reads only the Parquet footer via arrowReader.Schema() instead of downloading the entire file from S3.
  • Renamed pkg/duckdbpkg/sqlexec: the package never used DuckDB; the new name reflects what it actually is — a small in-memory SQL executor.
  • Plugin README: replaced the @grafana/create-plugin template src/README.md with proper, screenshot-rich documentation that is bundled into the published plugin.

[1.2.0] - 2026-02-14

Bug Fixes

  • Fix LARGE_STRING Parquet panic: Pandas-generated Parquet files use LARGE_STRING Arrow type, which caused a panic in the Parquet reader. Separated arrow.LARGE_STRING and arrow.STRING handling with proper type assertions.
  • Fix GROUP BY not aggregating: GROUP BY queries returned one row per record instead of grouped results. The SQL executor was using pointer addresses as group keys (*int64, *string). Added formatValue() helper to dereference pointers before grouping.
  • Fix Titanic dashboard bar charts: Grafana barchart panels require a string x-axis but Pclass is int64. Changed to table panels with gauge cell rendering.

Improvements

  • Added Playwright E2E tests verifying all 4 provisioned dashboards load with correct data
  • Added Docker Compose dev environment (docker/docker-compose.yml) with MinIO and sample data generation
  • Updated all provisioned dashboard JSON files for Grafana compatibility
  • Added CLAUDE.md project documentation

[1.1.0] - 2026-02-02

Features

  • SQL Query Support: Built-in, in-memory SQL engine (a custom lightweight executor, not DuckDB)
    • SELECT, WHERE, GROUP BY, ORDER BY, LIMIT clauses
    • Aggregation functions (COUNT, SUM, AVG, MIN, MAX)
    • Filtering and sorting
  • Visual Query Builder: PostgreSQL-style query builder interface
    • Column selection with aggregations
    • Filter toggle with condition builder
    • Group By toggle with multi-column selection
    • Order By toggle with ASC/DESC
    • SQL Preview panel showing generated query
  • Template Variables: Support for dashboard template variables
    • List files in bucket (with regex filtering)
    • List prefixes/folders
    • SQL-based variable queries
  • Explore View: Enhanced query editor for Grafana Explore
    • File selector with regex filtering
    • Refresh button to reload file list
    • Builder and Code modes
  • Sample Dashboards: Pre-built dashboards demonstrating plugin capabilities
    • Iris Dataset dashboard
    • Titanic Survival Dataset dashboard
    • Time Series Metrics dashboard

Improvements

  • Reduced Grafana version requirement to 11.0.0+
  • Better error handling and logging
  • Improved path-style routing for S3-compatible storage
  • ARM64 binary support for Apple Silicon

[1.0.0] - 2025-12-22

Features

  • Initial Release: Parquet-S3-Datasource plugin for Grafana
  • S3 Connectivity: Support for Amazon S3 and S3-compatible storage providers:
    • Amazon S3
    • MinIO
    • Wasabi
    • DigitalOcean Spaces
    • Any S3-compatible storage
  • Direct Parquet Querying: Read Apache Parquet files directly from S3 without intermediate databases
  • Apache Arrow Integration: Efficient columnar data processing using Apache Arrow
  • Custom Endpoints: Configurable S3 endpoints for private cloud deployments
  • Path-Style Routing: Automatic configuration for storage systems requiring path-style URLs
  • Data Types Support: All Parquet primitive types, nested structures, and compression codecs
  • Grafana 11.6+: Fully compatible with Grafana 11.6.0 and above

Components

  • React-based frontend with TypeScript
  • Go backend utilizing AWS SDK v2
  • Docker Compose setup for local development with MinIO
  • Comprehensive documentation and configuration examples
  • Demo video and screenshots

Configuration

  • Region selection
  • Bucket specification
  • Custom endpoint URLs
  • Secure credential management (Access Key/Secret Key)

Development

  • Provisioned datasource for quick Docker setup
  • Sample Parquet file generation script
  • Minio upload utilities
  • End-to-end testing capabilities