How to automate image analysis with the ChatGPT vision API and Grafana Cloud Metrics

• 2024-03-27 • 5 min

OpenAI’s ChatGPT has an extraordinary ability to process natural language, reason about a user’s prompts, and generate human-like conversation in response. However, as the saying goes, “a picture is worth a thousand words” — and perhaps an even more significant achievement is ChatGPT’s ability to understand and answer questions about images.

In this post, we’ll walk through an example of how to use ChatGPT’s vision capabilities (officially called GPT-4 with vision, or GPT-4V) to identify objects in images and then automatically plot the results as metrics in Grafana Cloud. Our example uses publicly available images, including a webcam feed from the U.S. National Park Service, but by the end, you’ll be able to apply these computer vision techniques and Grafana Cloud Metrics to your own use cases.

Example: count the number of vehicles entering Yellowstone

The following example shows how to automate image analysis, a task that has traditionally been manual and time-intensive. In effect, we’ll have an AI agent acting as an intelligence analyst that reviews each new image for us.

Let’s get started.

Task

We want to count the number of vehicles waiting to enter Yellowstone National Park from the North Gate, often referred to as the Roosevelt Arch. We will then save the output in a metric time series that we can view in a graph using Grafana Cloud.

Prerequisites

- Beginner programming skills in Ruby
- A free Grafana Cloud account
- An OpenAI API key with access to the GPT-4 vision model

Inputs provided

Example input images for your AI agent to match
- Vehicle 1
- Vehicle 2
An image to inspect that refreshes over time
- Yellowstone North Gate Security Camera feed

Process

Step 1: Initialize your OpenAI API client.

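# Requires the ruby-openai gem (gem install ruby-openai)
require "openai"

# Credentials are read from environment variables
@openai_client = OpenAI::Client.new(
  access_token: ENV["OPENAI_API_KEY"],
  organization_id: ENV["OPENAI_API_ORGANIZATION_ID"]
)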
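The client above reads its credentials from environment variables, so a missing `OPENAI_API_KEY` may only surface later, when the first API call fails. A minimal sketch of failing fast instead is below; `require_env` is a hypothetical helper, not part of the original example or of the ruby-openai gem.

```ruby
# Hypothetical helper (not from the original post): returns the values of the
# named environment variables, or raises one KeyError listing every variable
# that is unset or empty.
def require_env(*names)
  missing = names.select { |name| ENV[name].to_s.empty? }
  raise KeyError, "Missing environment variables: #{missing.join(', ')}" unless missing.empty?

  names.map { |name| ENV[name] }
end

# Example usage, before creating the OpenAI client:
# api_key, org_id = require_env("OPENAI_API_KEY", "OPENAI_API_ORGANIZATION_ID")
```

If your account does not use an organization ID, you can drop `OPENAI_API_ORGANIZATION_ID` from the list.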

Step 2: Prepare your ChatGPT vision prompts. These prompts tell ChatGPT what to do with the images provided.

Note that we provide two images of example vehicles that we’d like ChatGPT to identify. The third image comes from a security camera at the Yellowstone Roosevelt Arch entrance and updates every few minutes as new images of the entrance become available. Finally, we give OpenAI an example of the expected output: a machine-readable JSON object with a single key, "matches".

system_context = "You are an expert image analyst capable of identifying patterns between images. You count a match when you find an object in the third image that looks like a car or truck from the first or second images. Only count a match if you're very confident a match exists."

user_messages = [
  # Instruction for the model
  { "type": "text", "text": "How many times does the object from the first or second image appear in the third image? Be precise." },
  # First example vehicle
  { "type": "image_url",
    "image_url": {
      "url": "https://images.unsplash.com/photo-1616549972169-0a0d961c9905"
    }
  },
  # Second example vehicle
  { "type": "image_url",
    "image_url": {
      "url": "https://images.unsplash.com/photo-1544601640-b256c49a192d"
    }
  },
  # Image to inspect: Roosevelt Arch webcam, refreshed every few minutes
  { "type": "image_url",
    "image_url": {
      "url": "https://www.nps.gov/webcams-yell/mammoth_arch.jpg"
    }
  }
]

example_output = '
  Example response object:
  {
    "matches": integer
  }
'
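If you want to swap in different reference vehicles or a different camera, building the user message from a list of URLs avoids hand-editing nested hashes. This is a minimal sketch; `build_user_messages` is a hypothetical helper that produces the same structure as `user_messages` above (a text part followed by `image_url` parts), with the image to inspect placed last so the prompt's "third image" wording still holds.

```ruby
# Hypothetical helper (not from the original post): builds the multimodal
# content array for the user message. Reference images come first and the
# image to inspect comes last, matching the wording of the prompt.
def build_user_messages(question, reference_urls, target_url)
  image_parts = (reference_urls + [target_url]).map do |url|
    { type: "image_url", image_url: { url: url } }
  end

  [{ type: "text", text: question }] + image_parts
end

user_messages = build_user_messages(
  "How many times does the object from the first or second image appear in the third image? Be precise.",
  [
    "https://images.unsplash.com/photo-1616549972169-0a0d961c9905",
    "https://images.unsplash.com/photo-1544601640-b256c49a192d"
  ],
  "https://www.nps.gov/webcams-yell/mammoth_arch.jpg"
)
```

If you change the number of reference images, remember to update the question text as well, since it refers to images by position.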

Step 3: Call the ChatGPT vision API: gpt-4-vision-preview.

require "logger"

logger = Logger.new($stdout)

begin
  response = @openai_client.chat(
    parameters: {
      model: "gpt-4-vision-preview",
      messages: [
        { role: "system", content: system_context },
        { role: "system", content: example_output },
        { role: "user", content: user_messages }
      ],
      temperature: 0.4, # lower temperature for more consistent answers
      max_tokens: 100   # the expected reply is a small JSON object
    }
  )
rescue => err
  logger.fatal(err)
  exit 1
else
  logger.info("OpenAI API response received")
  logger.info("Response:\n#{response}")
end
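The `begin`/`rescue` above gives up on the first error. Hosted APIs can fail transiently (rate limits, network hiccups), so you may prefer to retry a few times before giving up. This is a minimal sketch using exponential backoff; `with_retries` is a hypothetical helper, not part of ruby-openai.

```ruby
# Hypothetical helper (not from the original post): calls the block, retrying
# up to `attempts` times. Waits base_delay, 2 * base_delay, ... between
# failures and re-raises the last error if every attempt fails.
def with_retries(attempts: 3, base_delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    raise if tries >= attempts

    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

# Example usage (wrapping the call from Step 3):
# response = with_retries(attempts: 3) do
#   @openai_client.chat(parameters: { model: "gpt-4-vision-preview", messages: [...] })
# end
```

Keep `attempts` low, since each retry sends all three images to the API again.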

Step 4: Parse the response from OpenAI, which should be a JSON object with a single key/value pair, e.g. { "matches": integer }. Make sure to look at the input images yourself to verify the accuracy of ChatGPT’s results.

require "json"

# The model's reply text is in choices[0].message.content
hash_results = JSON.parse(response.dig("choices", 0, "message", "content"))
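`JSON.parse` above assumes the reply is bare JSON. Models may sometimes wrap JSON in a Markdown code fence or return something unexpected, so a slightly more defensive parser can help. This is a sketch under that assumption; `parse_matches` is a hypothetical helper, not part of the original example.

```ruby
require "json"

# Hypothetical helper (not from the original post): returns the "matches"
# count from the model's reply as an Integer.
def parse_matches(content)
  # Strip an optional surrounding Markdown fence: three backticks, optionally
  # followed by "json", at the start, and three backticks at the end.
  json_text = content.strip
                     .sub(/\A`{3}(?:json)?\s*/, "")
                     .sub(/\s*`{3}\z/, "")
  # fetch raises KeyError if "matches" is missing; Integer() raises if the
  # value is not numeric, so a bad reply never becomes a bogus data point.
  Integer(JSON.parse(json_text).fetch("matches"))
end

# Example usage:
# hash_results = { "matches" => parse_matches(response.dig("choices", 0, "message", "content")) }
```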

Step 5: Push the matches result from OpenAI’s inspection of the sample image to Grafana Cloud Metrics.

We use the Influx Line Protocol format below to write a single data point, which Grafana Cloud’s backend translates into a Prometheus metric. The measurement name (nps_entrance) and label (park=yellowstone) follow a naming convention, so you can extend this single example to track more than one entrance in the same graph. You can find your endpoint URL, as well as the required metrics write credentials, in your Grafana Cloud portal.
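The push code below sends `@grafana_base64_encoded_auth_token` in the `Authorization` header without showing how it is built. HTTP Basic authentication expects the Base64 encoding of `user:token`; for Grafana Cloud, the user and token come from the metrics write credentials in your portal. This is a minimal sketch; `basic_auth_token` and the environment variable names in the usage comment are hypothetical.

```ruby
# Hypothetical helper (not from the original post): returns the Basic auth
# value Base64("user:token"). pack("m0") is strict Base64 with no line
# breaks, equivalent to Base64.strict_encode64 without needing the base64 gem.
def basic_auth_token(user, token)
  ["#{user}:#{token}"].pack("m0")
end

# Example usage (hypothetical environment variable names):
# @grafana_base64_encoded_auth_token = basic_auth_token(
#   ENV["GRAFANA_CLOUD_METRICS_USER"], ENV["GRAFANA_CLOUD_METRICS_TOKEN"]
# )
```

Alternatively, Net::HTTP can do the encoding for you via `request.basic_auth(user, token)` instead of setting the header by hand.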

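require "net/http"
require "uri"

#
# Save metric in the Influx Line Protocol format: measurement,tag=value field=value
# Integer() and fetch raise if "matches" is missing or not a number, rather than
# sending an invalid line such as "vehicles="
#
metrics_payload = "nps_entrance,park=yellowstone vehicles=#{Integer(hash_results.fetch('matches'))}"

#
# Push metric to Grafana Cloud using the Influx Line Protocol
#
begin
  uri = URI.parse(ENV["GRAFANA_CLOUD_METRICS_INFLUX_PROXY_ENDPOINT"])
  grafana_response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |client|
    request                  = Net::HTTP::Post.new(uri.path)
    request.body             = metrics_payload
    # Base64-encoded "user:token" built from your Grafana Cloud metrics write credentials
    request["Authorization"] = "Basic #{@grafana_base64_encoded_auth_token}"
    request["Content-Type"]  = "text/plain"
    client.request(request)
  end
rescue => err
  logger.fatal(err)
  exit 1
else
  if grafana_response.is_a?(Net::HTTPSuccess)
    logger.info("Grafana Cloud accepted the metric (HTTP #{grafana_response.code})")
  else
    logger.error("Grafana Cloud rejected the metric (HTTP #{grafana_response.code}): #{grafana_response.body}")
  end
end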
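The payload above is built with plain string interpolation, which works for simple names like `park=yellowstone`. If you extend the convention to more entrances whose label values contain spaces or commas, line protocol requires those characters to be escaped with a backslash (commas and spaces in the measurement; commas, equals signs, and spaces in tag keys, tag values, and field keys). This is a minimal sketch; `line_protocol` and its escaping helpers are hypothetical, and numbers are written exactly as in the original payload, without an integer suffix.

```ruby
# Hypothetical helpers (not from the original post).
# Measurement names: escape commas and spaces.
def escape_measurement(s)
  s.to_s.gsub(/[, ]/) { |c| "\\#{c}" }
end

# Tag keys, tag values, and field keys: escape commas, equals signs, and spaces.
def escape_key(s)
  s.to_s.gsub(/[,= ]/) { |c| "\\#{c}" }
end

# Builds one record: measurement,tag1=v1,tag2=v2 field1=value1,field2=value2
# Tags are sorted by key, which line protocol recommends for performance.
def line_protocol(measurement, tags, fields)
  tag_part = tags.sort.map { |k, v| ",#{escape_key(k)}=#{escape_key(v)}" }.join
  field_part = fields.map { |k, v| "#{escape_key(k)}=#{v}" }.join(",")
  "#{escape_measurement(measurement)}#{tag_part} #{field_part}"
end

# Example usage, equivalent to the payload in Step 5:
# metrics_payload = line_protocol("nps_entrance", { park: "yellowstone" }, { vehicles: hash_results["matches"] })
```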

Step 6: Visit your Grafana Cloud instance to graph your metric(s) over time. We suggest starting by using the Explore page and then selecting the example metric from the dropdown list.

A screenshot of the Explore view in Grafana Cloud.

A screenshot of a line graph in Grafana Cloud Metrics.

To record the number of vehicle matches over an extended period of time, we suggest setting up this program to execute every 3-5 minutes as the camera feed refreshes.
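One way to run the program every few minutes is a long-running Ruby loop; a cron job or another scheduler works just as well. This is a minimal sketch; `run_periodically` is a hypothetical helper, and `run_vehicle_count` in the usage comment stands for a method you would write that wraps Steps 2 through 5.

```ruby
# Hypothetical helper (not from the original post): runs the block every
# `interval` seconds. `iterations:` bounds the loop (useful for testing);
# nil runs forever. An error in one run is reported and does not stop the
# next run. Returns the number of runs performed.
def run_periodically(interval:, iterations: nil)
  count = 0
  loop do
    break if iterations && count >= iterations

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      yield count
    rescue StandardError => e
      warn "run #{count} failed: #{e.message}"
    end
    count += 1
    break if iterations && count >= iterations

    # Subtract the time the run took so runs start roughly `interval` apart
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    sleep([interval - elapsed, 0].max)
  end
  count
end

# Example usage: check the Roosevelt Arch camera every 4 minutes
# run_periodically(interval: 240) { run_vehicle_count }
```

Because each run catches its own errors, the Step 3 and Step 5 error handling would need to raise rather than call `exit` for this loop to keep going after a failure.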

That’s it! For a code-complete version of this example using ChatGPT vision capabilities and Grafana Cloud Metrics, please visit the following GitHub Gist. If you want to go one step further, and monitor the costs and resource usage of your OpenAI scripts, check out our OpenAI Integration.

If you have questions or get stuck, feel free to ask for help in our Community Forums or reach out to our Support team and we’ll be glad to help.

