Private data source connect (PDC)
Private data source connect, or PDC, is a way for you to establish a private, secured connection between a Grafana Cloud instance, or stack, and data sources secured within a private network.
Observability data is often located within private networks, such as on-premises networks and Virtual Private Clouds (VPCs) hosted by AWS, Azure, Google Cloud Platform, or other public cloud providers. For example, you might host your Splunk or Elasticsearch service on your private network, or you might want to visualize data from Amazon RDS hosted in a VPC. PDC lets you connect to any network-secured data source, regardless of which cloud provider you use or whether you host your own data in an on-premises network.
Private data source connect (PDC) concepts
PDC allows you to connect to any network-secured data source, regardless of whether it is hosted in an AWS, GCP, or Azure VPC, an on-premises network, or even your local computer.
Instead of implementing a VPN solution, Grafana has built a solution that allows you to route queries to many isolated networks without having to worry about overlapping subnets. Queries and data are routed and encrypted from your Grafana instance via the PDC agent deployed within your network. As a result, PDC is entirely within your control, since you deploy and manage the agent.
PDC relies on customer-operated SSH tunnels that carry SOCKS5 traffic. Each tunnel connects to a tunneling reverse proxy managed by Grafana Cloud, which communicates directly with your Grafana Cloud instance. Traffic is encrypted with a customer-provided SSH key.
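To make the customer-operated tunnel more concrete, the following minimal sketch assembles an OpenSSH client command line that requests reverse dynamic forwarding. In OpenSSH, `-R <port>` without an explicit destination makes the client act as a SOCKS proxy for connections arriving from the server side. The function name `build_tunnel_command`, the user, the gateway host name, and the key path are hypothetical placeholders, and the real PDC agent may pass different or additional options.

```python
import shlex


def build_tunnel_command(user, gateway_host, key_path):
    """Return an ssh argv list for a reverse dynamic (SOCKS) tunnel.

    All argument values are illustrative placeholders, not real
    Grafana Cloud endpoints or credentials.
    """
    return [
        "ssh",
        "-N",            # do not run a remote command, only forward
        "-i", key_path,  # customer-provided private key
        "-R", "0",       # reverse dynamic forwarding; 0 lets the server pick the listen port
        f"{user}@{gateway_host}",
    ]


if __name__ == "__main__":
    argv = build_tunnel_command(
        "stack-123",                    # hypothetical user name
        "pdc-gateway.example.invalid",  # hypothetical gateway host
        "/etc/pdc/id_ed25519",          # hypothetical key path
    )
    # Print a shell-quoted version for inspection; nothing is executed.
    print(shlex.join(argv))
```

The sketch only builds and prints the command. Because the agent side starts the connection, stopping that process is enough to close the tunnel.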
PDC operates at a high level in the following manner:
1. The PDC agent initiates an SSH connection with the Grafana data source connect service. The SSH client running in your network is configured with reverse dynamic forwarding (the -R <port> option). In this mode, SSH acts as a SOCKS proxy and forwards connections to destinations requested by Grafana.
2. Whenever your Grafana instance needs to query your private data source, the TCP connection is wrapped in a secure SOCKS connection and then routed to the Grafana PDC service.
3. SOCKS packets are forwarded to the PDC agent through the SSH connection.
4. The PDC agent resolves the DNS of the data source endpoint and establishes a secure connection to the data source.
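The steps above depend on the SOCKS5 request carrying a host name rather than an IP address, so that name resolution can happen inside your network. As a minimal sketch, assuming the standard SOCKS5 CONNECT layout from RFC 1928 with the domain-name address type, the request for a data source could be encoded and decoded as follows. The helper names are hypothetical and this is not the PDC implementation.

```python
import struct
from typing import Tuple

SOCKS_VERSION = 5  # SOCKS protocol version
CMD_CONNECT = 1    # CONNECT command
ATYP_DOMAIN = 3    # address type: fully qualified domain name


def build_connect_request(host: str, port: int) -> bytes:
    """Encode a SOCKS5 CONNECT request that names the target by host name."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1 to 255 bytes")
    if not 0 < port <= 65535:
        raise ValueError("port out of range")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)


def parse_connect_request(data: bytes) -> Tuple[str, int]:
    """Decode the request on the receiving side, before any DNS lookup."""
    version, command, _reserved, address_type = data[:4]
    if (version, command, address_type) != (SOCKS_VERSION, CMD_CONNECT, ATYP_DOMAIN):
        raise ValueError("unsupported SOCKS request")
    length = data[4]
    host = data[5:5 + length].decode("ascii")
    (port,) = struct.unpack("!H", data[5 + length:7 + length])
    return host, port


if __name__ == "__main__":
    request = build_connect_request("mysql.your.domain", 3306)
    print(parse_connect_request(request))
```

The host name travels through the tunnel unresolved, so the receiving side can look it up with the DNS servers available in the private network.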
PDC advantages
There are several advantages to Grafana’s PDC solution:
- The monitoring and supervision of the SSH tunnel are delegated to an agent running inside your private network. At any time, you can shut off the agent, which terminates the connection. PDC is entirely within your control, since you deploy and manage the agent.
- The agent running inside your private network is horizontally scalable, which provides fault tolerance. You can deploy multiple agents within the same network, and Grafana Cloud load balances across them automatically.
- Traffic is encrypted all the way from your Grafana Cloud stack to the SSH client running in your private network. If the private data source supports encryption (for example, HTTPS), traffic is encrypted end-to-end.
- In your Grafana Cloud instance, you can configure compatible data sources to route requests through the SSH tunnel. Each data source is configured using its internal DNS name (for example, mysql.your.domain:3306), as if Grafana were running directly inside the private network.
- You can restrict the destinations reachable by Grafana Cloud over this tunnel using the PermitRemoteOpen SSH option. For example, you can restrict the agent to permit access to only certain hostnames, ports, or IP addresses.
- You can route each session transparently and securely to the correct connection without having to deal with CIDR ranges.
- Routing requests through PDC has a negligible effect on query time. PDC only increases request time by tens of milliseconds on average.
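To illustrate how the PermitRemoteOpen restriction mentioned above narrows the reachable destinations, the sketch below mimics the semantics described in the OpenSSH ssh_config manual: whitespace-separated host:port entries, the keywords any and none, and * as a wildcard for either the host or the port, with no other pattern matching. The function `is_destination_permitted` is a hypothetical helper written for illustration, not OpenSSH or PDC code.

```python
def is_destination_permitted(permit_remote_open: str, host: str, port: int) -> bool:
    """Return True if host:port matches a PermitRemoteOpen-style value.

    Illustrative only: names are compared exactly, and IPv6 literals are
    expected in the bracketed [addr]:port form.
    """
    tokens = permit_remote_open.split()
    if tokens == ["any"]:
        return True   # no restrictions
    if tokens == ["none"]:
        return False  # all forwarding requests prohibited
    for token in tokens:
        # Split on the last colon so bracketed IPv6 addresses stay intact.
        pattern_host, _, pattern_port = token.rpartition(":")
        pattern_host = pattern_host.strip("[]")
        host_matches = pattern_host in ("*", host)
        port_matches = pattern_port in ("*", str(port))
        if host_matches and port_matches:
            return True
    return False


if __name__ == "__main__":
    rule = "mysql.your.domain:3306 10.0.0.5:*"
    for target in [("mysql.your.domain", 3306), ("mysql.your.domain", 5432), ("10.0.0.5", 9200)]:
        print(target, is_destination_permitted(rule, *target))
```

With a rule such as the one in the example, a request for any destination outside the list would be refused on the agent side, regardless of what is configured in Grafana Cloud.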
PDC known limitations
PDC has the following known limitations:
- The SOCKS5 protocol can add some latency to queries, although this should be minor and not readily noticeable by users.
The following data sources are currently supported:
Open Source | Enterprise |
---|---|
AWS CloudWatch | AppDynamics |
Azure Data Explorer | BigQuery |
Azure Monitor | Databricks |
ClickHouse | Dynatrace |
Elasticsearch | GitLab |
Falcon LogScale | Jira |
Google Cloud Monitoring | MongoDB |
Graphite | Oracle |
Infinity | Salesforce |
InfluxDB | SAP HANA |
Jaeger | ServiceNow |
Loki | Snowflake |
Mimir | Splunk |
MSSQL | Splunk Infrastructure Monitoring |
MySQL | Wavefront |
OpenSearch | |
OpenTSDB | |
Parca | |
PostgreSQL | |
Prometheus | |
Pyroscope | |
Sentry | |
Tempo | |
Zabbix | |
Zipkin | |