5 tips to write better browser tests for performance testing and synthetic monitoring

2024-11-21 21 min

Given the complexity of modern websites, browser testing is essential to ensure a positive user experience. With the Grafana k6 browser module, you can interact with real web browsers and simulate user interactions — like clicking, typing, or navigating pages — to collect frontend metrics, increase site reliability, and fix performance issues before they ever impact your users.

Part of Grafana k6 OSS, the k6 browser module is also compatible with our fully managed Grafana Cloud k6 solution, as well as Grafana Cloud Synthetic Monitoring, for out-of-the-box monitoring capabilities. This means you can re-use your browser testing scripts across those three environments, based on your testing needs.

With an API that’s similar to Playwright’s, the k6 browser module makes it easy for those familiar with the Playwright web testing framework to write their first browser tests. That said, we know a few best practices can go a long way when it comes to authoring browser test scripts and integrating them into your workflow.

In this blog, we outline five tips you can apply today to write better browser tests in Grafana k6, Grafana Cloud k6, and Grafana Cloud Synthetic Monitoring.

Tip 1. Run the test locally for a faster feedback loop

Our cloud offerings, including Grafana Cloud k6 and Synthetic Monitoring, are the best way to visualize the output of your script, but running tests locally on your machine is a great way to debug and iterate faster.

When debugging browser scripts locally, make sure your test scenario runs only one virtual user (VU) and one iteration. The shared-iterations executor uses these options by default, so it’s the perfect choice for debugging. You can run tests quickly, read your log output, and keep the feedback loop short as you author your tests.

As your tests grow in length and complexity, the logs might become too noisy to efficiently use, or might be missing critical pieces of information to pinpoint potential issues. Take this script, for example, which tests the federated login flow for grafana.com:

federated-login-test.js

JavaScript
import { browser } from "k6/browser"
import { check } from "https://jslib.k6.io/k6-utils/1.6.0/index.js"

export const options = {
  scenarios: {
    ui: {
      executor: "shared-iterations",
      options: {
        browser: {
          type: "chromium",
        },
      },
    },
  },
}

export default async function () {
  const context = await browser.newContext()
  const page = await context.newPage()

  await page.goto(`https://grafana.com/`)
  await page.locator(`#menu__login-link`).click()
  await page.waitForNavigation()

  await page.locator(`button[aria-label="Login using Google"]`).click()
  await page.waitForNavigation()

  // You will have to add or replace the __ENV variables as appropriate.
  await page
    .locator(`input[aria-label="Email or phone"]`)
    .fill(__ENV.GRAFANA_STACK_USER_EMAIL)

  await page.locator(`//*[text()="Next"]`).click()

  await check(page.locator(`form h2`), {
    "Form heading text is correct": async (lo) =>
      (await lo.textContent()) === `Grafana Labs`,
  })

  await page.close()
}

You can run the test by executing k6 run federated-login-test.js in your terminal, but the test seemingly ends prematurely. The check never runs, so no failure is reported, and if you inspect the logs you see:

Uncaught (in promise) getting text content of "h2": Inspected target navigated or closed

It’s a cryptic log, especially if you are unfamiliar with the underlying API and what it’s doing. If you follow the steps by clicking along in your own browser, it isn’t clear what the problem is, as you end up on a page where document.querySelector('form h2').textContent is valid and returns the expected result.

Using the k6 browser CLI options, you can disable headless mode and watch your Chromium instance on your screen, so you can follow along as the bot steps through executing the script.

K6_BROWSER_HEADLESS=false k6 run {{scriptName}}.js

This often provides invaluable insights to uncover a problem that the logs themselves weren’t able to fully convey.

In the example above, it is now easy to establish what the problem is: the last page is never reached. We forgot to wait for a navigation change after the Next button was clicked, and the asynchronous check function couldn’t complete successfully because, as it was executing, its page context was canceled.

The solution is to add page.waitForNavigation() after clicking the Next button, and now the test passes. 🎉
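In context, the fix is a single extra line after the Next button click:

JavaScript
  await page.locator(`//*[text()="Next"]`).click()
  // Wait for the next page in the login flow to load before running the check
  await page.waitForNavigation()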

Bots are much quicker than humans when interacting with a browser. You may want to use the page.screenshot() method or artificially increase the time between asynchronous steps using page.waitForTimeout(), so you can process what is going on. However, if you do inject timeouts into your script to help with debugging, remember to remove them afterwards because…

Tip 2. Timeouts and sleep are anti-patterns!

If you’ve seen a k6 script before, there is a good chance it included sleep(t). In the context of non-browser scripts that are focused on testing APIs at the protocol level, this is often a necessary step to simulate a real user’s input delay. However, when working with the browser, this can be a source of flaky and unreliable tests that become difficult to untangle and debug. This is because the k6 script pauses, but the browser continues to execute its tasks and processes at an unknown rate, which may be different from one test to the next.

If, in the federated-login-test.js script, you added sleep(5) instead of page.waitForNavigation(), the test would often pass; however, you have now introduced one of the following problems every time it runs:

  • An iteration in your script is now running longer than necessary (so it is going to cost you money)
  • An iteration in your script is not running long enough and reports failure (so it is going to cost you money)

The more sleep(t) or page.waitForTimeout() calls you add, the more these issues compound and introduce further problems. If you are using your script with Grafana Cloud Synthetic Monitoring, browser checks currently have a limit on running time, so you may be reaching that limit unintentionally. For k6 browser tests, you are increasing memory and CPU usage unnecessarily, which might not just fail the current iteration, but make your script report failure altogether.

It is also important to note that sleep(t) is a synchronous function native to k6 that blocks the JavaScript event loop. If you are using the k6 timeline feature, the sleep time will be included in the trace and you will lose valuable insight into when your services are really slowing down.

What to do instead?

In the federated-login-test.js example above, adding page.waitForNavigation() was the solution. This method waits for the browser’s window load event to fire, which makes the test reliable: it waits neither too little nor too long before the browser script continues its execution.

If you aren’t waiting for a page navigation event, but for some in-page content that hasn’t been rendered to the page yet, you want to use page.locator() and its associated methods.

As an example, here’s a browser script checking if the performance testing plugin is displaying the expected information within Grafana.

performance-app-renders.js

JavaScript
import { browser } from "k6/browser"
import { check } from "k6"

export const options = {
  scenarios: {
    ui: {
      executor: "shared-iterations",
      options: {
        browser: {
          type: "chromium",
        },
      },
    },
  },
}

export default async function () {
  const context = await browser.newContext()
  const page = await context.newPage()

  // You will have to add or replace the __ENV variables as appropriate.
  await page.goto(__ENV.GRAFANA_APP_URL)

  // login
  await page
    .locator(`[data-testid="data-testid Username input field"]`)
    .fill("admin")
  await page
    .locator(`[data-testid="data-testid Password input field"]`)
    .fill(__ENV.GRAFANA_APP_PASSWORD)
  await page.locator(`[data-testid="data-testid Login button"]`).click()
  await page.waitForNavigation()

  // navigate to Performance page
  await page.locator(`[data-testid="data-testid Toggle menu"]`).click()
  await page
    .locator(`[aria-label="Expand section Testing & synthetics"]`)
    .click()

  await page.locator(`//*[text()="Performance"]`).click()

  const perfH1Text = await page
    .locator(`//h1[text()="Performance"]`)
    .textContent()

  check(perfH1Text, {
    "Performance page heading is correct": (text) => text === "Performance",
  })

  await page.close()
}

If you follow along manually (or by turning off headless mode, as suggested in tip 1) you will notice that, at every stage, you have to wait for something to be rendered to the page before continuing. By using page.locator() and its provided methods, the script waits for the content to be available before proceeding.

By using a combination of page.waitForNavigation() and page.locator() methods, you now have a reliable script without having to add arbitrary and unreliable timeout functions.

Tip 3: Think about the asynchronous execution of your test

When writing your browser tests, it is important to evaluate and consider:

  • What is the next function call really doing?
  • What is happening from the script’s point of view?

The k6 browser API is purposefully designed to be compatible with the Playwright API for Node.js. Grafana k6, however, does not run in a Node.js environment; it is written in Go and runs your scripts with Sobek, its JavaScript engine. Why is this important to know?

k6 has evolved over time and two significant milestones for browser checks were introducing the event loop and implementing all the browser APIs as async methods. For the most part, writing a browser test should be indistinguishable from writing native JavaScript, but there can be a few opaque ‘gotchas’ that are difficult to recognize and identify.

The native k6/check does not support passing asynchronous functions

Because all of the k6 browser API methods are asynchronous, it feels intuitive to write a check like this:

JavaScript
import { check } from 'k6'

...


// doesn't work
await check(page.locator(`h1`), {
  'text content is correct': async(lo) => (await lo.textContent()) === 'Expected content'
})

This will not work, however, because the native k6 check is synchronous. To fix this problem, you can replace the native k6 check with our jslib.k6.io version, which is a compatible, drop-in replacement for any of your existing tests.

JavaScript
import { check } from "https://jslib.k6.io/k6-utils/1.6.0/index.js"

...


// works!
await check(page.locator(`h1`), {
  'text content is correct': async(lo) => (await lo.textContent()) === 'Expected content'
})

Don’t use page.waitForNavigation unnecessarily

It is important to know the underlying architecture of your website when you are authoring browser tests. Are you using a Single Page Application (SPA), such as React, Angular or Vue, or a traditional Multi Page Application (MPA)? Or maybe a mixture of both?

You may expect that any time the URL changes in your browser, a page load event is triggered — but if your application is an SPA, it uses the browser’s history.pushState method instead, so the page.waitForNavigation() call will never resolve; it will block the rest of your script’s execution and eventually time out. This is especially important to note with the next ‘gotcha’ below.
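When testing an SPA route change, wait for something the new view actually renders instead. Here is a minimal sketch with hypothetical selectors:

JavaScript
// SPA navigation: don't wait for a load event that will never fire.
// Instead, wait for content that the new route renders.
await page.locator(`a[href="/dashboards"]`).click()
await page.locator(`//h1[text()="Dashboards"]`).waitFor()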

k6 doesn’t know what is in the script until it is encountered

If your script execution aborts part-way through, none of the checks, logs, or custom metrics declared after that point will be registered. If you have set up your script so it only fails when a check failure threshold is reached, you are going to have false positives in your test results. Note: see the bonus tip below to solve this.

Have a ‘bots-eye’ view

If you take the performance-app-renders.js script from above and change the locator for the h1, the script now fails!

JavaScript
const perfH1Text = await page
    .locator(`#pageContent h1`)
    .textContent()

There are no logs indicating what has gone wrong — just a failed check. If we summarize the steps from a human perspective:

  • Go to your Grafana URL
  • Fill in username, fill in password, and click login
  • Wait for page navigation
  • Click the menu toggle, expand Testing & Synthetics, click Performance
  • Check if the h1’s text content is “Performance”

If you follow along in your browser, open your devTools console, and run document.querySelector('#pageContent h1').textContent — it returns “Performance”! So what is going on?

The problem is that the bot executing the script “thinks,” “sees,” and “evaluates” things a lot quicker than humans do. If you add a debug snippet at the beginning of the script and open the devTools console, the execution pauses, allowing you to investigate what has happened.
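A minimal sketch of such a snippet (an assumption on our part: it relies on headless mode being disabled, as in tip 1, and the devtools window being open) is to evaluate a debugger statement in the page:

JavaScript
// Hypothetical debug helper: with devtools open, the debugger statement
// pauses the page so you can inspect the DOM at this exact moment.
await page.evaluate(() => {
  debugger
})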

A screenshot of debugging Grafana with Chrome DevTools: the screen is split in two, with Grafana displayed on the left (an h1 with zero width and zero height highlighted) and the Elements panel on the right showing the DOM tree. The h1 contains no text content.

While waiting for the plugin to load, for a split second, Grafana has rendered an empty h1 to the screen that fulfills the selector we are looking for, so the script moves on to executing the check comparison: the text content of the h1 is not “Performance,” so the check, and ultimately our test, fails.

And that brings us nicely into our fourth tip…

Tip 4: Write unique and durable selectors for page.locator()

We have a guide on best practices for selecting elements, where the fundamental idea is to create durable selectors that:

  • Are unique to the page (if not the whole journey, where possible, as shown by the bug above!)
  • Are guaranteed to be stable
  • Convey intent in the script

If you were to write a browser script to test the login of Grafana, when inspecting the Document Object Model (DOM), you would see markup similar to this for the Username input field:

<input class="css-8tk2dk-input-input" name="user" id=":r0:" autocapitalize="none" placeholder="email or username" data-testid="data-testid Username input field">

The input element has several attributes on it. To decide which would be appropriate to use, you can assign each a score:

Attribute      | Unique to the page | Value is stable | Conveys intent | Total
autocapitalize | ❌                 | ❌              | ❌             | 0
class          | ⚠️                 | ⚠️              | ❌             | 1
id             | ✅                 | ⚠️              | ⚠️             | 2
name           | ✅                 | ⚠️              | ✅             | 2.5
placeholder    | ✅                 | ⚠️              | ✅             | 2.5
data-testid    | ✅                 | ✅              | ✅             | 3

❌ = 0, ⚠️ = 0.5, ✅ = 1

  • The autocapitalize attribute can immediately be ruled out. There is no guarantee it is unique to the page, it has no bearing on what you are trying to select, and it would be difficult to infer your intent when reviewing the test script.

  • The class attribute might be considered. It is often used as a selector by tag manager tools or other rudimentary recorders. But there are several problems with using this attribute:

    • There is no guarantee that it is unique to the page (and it isn’t, in this case, as the password input shares the same classes)
    • Classes are often considered an implementation detail by developers and are subject to change (especially in this case, where the value is generated by the build process)
    • Even if it was unique and stable, it makes your test scripts less maintainable. How would someone else reading your script be able to infer you are selecting the username input?
  • The id attribute should be unique, but it has similar problems to class in this case. Those familiar with React APIs will recognize that this ID was generated by the useId hook, so if the developers used the hook elsewhere on the page, for a component rendered before the input field, the id value would shift. It is therefore not guaranteed to be a stable selector, and it doesn’t convey any meaning when reading the test script, so it isn’t a good candidate.

  • The name and placeholder attributes have a similar weighting and both would be strong candidates for a stable selector that conveys meaning in the test script. There are some additional considerations:

    • If you want to reuse the test script for testing the page in a different language, the placeholder selector would have to be updated in the script.
    • If you haven’t communicated with your development and/or content teams, they might be unaware of the testing contract you have just enrolled them in. They might have good reason in the future to update the value of either of these attributes and be unaware your script is about to break.
  • This brings us to the data-testid. It is unique in the DOM, it conveys intent in the script, and it has an implicit contract with other teams that its only purpose is for use in testing scripts, so it will remain stable. This is the ideal selector you should be using.

JavaScript
page.locator(`[data-testid="data-testid Username input field"]`)

This process works well with elements that have suitable attributes or well-labeled IDs or classes. You could strengthen and further clarify intent by adding the element’s tag name.
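For example, prefixing the selector above with the expected tag name keeps it just as unique while making the intent even clearer:

JavaScript
page.locator(`input[data-testid="data-testid Username input field"]`)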

But what happens if you don’t have any useful attributes available? There are two options, which can be combined:

  1. Look up the node’s parent tree to find a suitable selector. If you use this method, it is important not to write selectors that are too closely tied to your DOM structure. Selectors tightly coupled with the DOM are brittle and prone to breaking, as they depend on your developers’ implementation, which may change at any point.

JavaScript
// good
page.locator(`#product-detail h2`)

// bad
page.locator(`#product-detail > section > div > div.arbitraryClass > h2`)

  2. Use an xpath selector with a text() node test. page.locator() supports xpath selectors, which can be very powerful. Targeting elements based on their text content is highly recommended because it decouples the test script from your implementation details and focuses on writing locators the same way a user would find things on the page.

JavaScript
page.locator(`//h2[text()="Product Title"]`)

Tip 5: Use threshold options or the fail() method to ensure you get the right results

Whenever you author a test, you should think about the acceptance criteria for considering it a success. But how do we define what a pass or a failure looks like for a browser test?

k6 provides a checks API where you can define at any point in the script’s execution if something has passed or failed. A check can be as simple as confirming if an API request returned a successful status code, whether an element has the correct text content, or if a user journey took an acceptable amount of time.

You may think any failed check in your script should mean the whole test fails, but if that were the default behavior, most load tests would result in failure. When writing a k6 script, you should identify what context it is running in (i.e., is it a performance test or for synthetic monitoring?), as well as what scenarios, VUs, and iterations are being used.

After establishing these details, you can determine the appropriate ways to think about success and failure, which checks to add, and appropriate thresholds — for example, is this check a warning, or is it a critical failure for our test?

Here are two examples — one for performance testing, and one for synthetic monitoring — that support the k6 browser module and demonstrate how you could use the checks API.

Grafana k6 performance tests using the browser module

A k6 performance test is a versatile way to define any number of scenarios, VUs, and iterations. In the following example, we have a demo e-commerce website and we are using a hybrid test to observe what happens if our recommended-product API unexpectedly comes under heavy load. Our recommendation engine is an important part of the application, but it’s not as business-critical as ensuring customers can place orders and check out.

recommend-product-spike.js

JavaScript
import { browser } from "k6/browser"
import http from "k6/http"
import { check } from "https://jslib.k6.io/k6-utils/1.6.0/index.js"

// PRODUCT_IDS is expected to be a comma-separated list of product IDs passed via the -e flag
const PRODUCT_IDS = (__ENV.PRODUCT_IDS || "").split(",")

const HAS_SOME_LEEWAY = `warn`
const SUPER_IMPORTANT_CHECK = `critical`
const LESS_IMPORTANT = `info`

export const options = {
  scenarios: {
    ui: {
      executor: "constant-vus",
      duration: "1m",
      vus: 3,
      options: {
        browser: {
          type: "chromium",
        },
      },
      exec: "checkoutCompletion",
    },
    "spike-api": {
      executor: "ramping-vus",
      startVUs: 0,
      stages: [
        { duration: "10s", target: 10 },
        { duration: "40s", target: 30 },
        { duration: "10s", target: 10 },
      ],
      gracefulRampDown: "10s",
      exec: "spikeApi",
    },
  },
  thresholds: {
    [`checks{importance:${SUPER_IMPORTANT_CHECK}}`]: ["rate==1.0"],
    [`checks{importance:${HAS_SOME_LEEWAY}}`]: ["rate>=0.95"],
    [`checks{importance:${LESS_IMPORTANT}}`]: ["rate>=0.9"],
  },
}

export function spikeApi() {
  const randomProduct =
    PRODUCT_IDS[Math.floor(Math.random() * PRODUCT_IDS.length)]
  const res = http.get(`https://otel-demo.field-eng.grafana.net/api/recommendations?productIds=${randomProduct}`)

  check(
    res,
    {
      "status code is 200": (r) => r.status === 200,
    },
    { importance: HAS_SOME_LEEWAY }
  )
}

export async function checkoutCompletion() {
  const context = await browser.newContext()
  const page = await context.newPage()

  await page.goto(`https://otel-demo.field-eng.grafana.net/`)
  await page.locator(`//*[text()="Go Shopping"]`).click()

  await Promise.all([
    page
      .locator(`//*[text()="Starsense Explorer Refractor Telescope"]`)
      .click(),
    page.waitForNavigation(),
  ])

  // less important check
  await checkForRecommendedProducts(page, `Product page`)

  await Promise.all([
    page.locator(`//*[text()=" Add To Cart"]`).click(),
    page.waitForNavigation(),
  ])

  // less important check
  await checkForRecommendedProducts(page, `Shipping form`)

  await Promise.all([
    page.locator(`//*[text()="Place Order"]`).click(),
    page.waitForNavigation(),
  ])

  // Super important check
  await check(
    page.locator(`h1`),
    {
      "Place order page was reached": async (lo) =>
        (await lo.textContent()) === "Your order is complete!",
    },
    { importance: SUPER_IMPORTANT_CHECK }
  )

  // less important check
  await checkForRecommendedProducts(page, `Order confirmation`)
  await page.close()
}

const TWO_SECONDS = 2000

async function checkForRecommendedProducts(page, step) {
  try {
    await page
      .locator(
        `[data-cy="recommendation-list"] [data-cy="product-card"]:first-of-type`
      )
      .waitFor({ timeout: TWO_SECONDS })
  } catch (e) {
    await page.screenshot({ path: `./screenshots/${step}.png` })
  } finally {
    const cards = await page.$$(`[data-cy="product-card"]`)
    console.log(step, cards.length)

    check(
      cards.length,
      {
        "4 recommended products are displayed": (length) => length === 4,
      },
      {
        importance: LESS_IMPORTANT,
      }
    )
  }
}

In the script above, two scenarios form a hybrid test: spike-api, which uses the k6 http module to simulate a spike of traffic, and ui, a browser scenario that runs the checkoutCompletion function to simulate the checkout flow. There are three checks, each tagged with an importance key whose value varies depending on the level of success we require for that check.

In the options object, a threshold has been set for each of these key/value pairs that will dictate whether the test has passed or failed. The abortOnFail option is not being used in the thresholds because valuable data would be lost if the test ended prematurely. If failures started occurring, it would be useful to know what the extent of the failure looks like for the API and what is happening on the frontend in the browser.
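If you did want a critical check failure to stop the test early, the threshold could be written in its object form instead. A sketch of what that might look like:

JavaScript
thresholds: {
  [`checks{importance:${SUPER_IMPORTANT_CHECK}}`]: [
    // abortOnFail stops the whole test as soon as this threshold is crossed
    { threshold: "rate==1.0", abortOnFail: true },
  ],
},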

Synthetic Monitoring check

If you are running a Synthetic Monitoring browser check, it will always be one scenario with one VU and one iteration. Synthetic Monitoring doesn’t currently support the thresholds object in the options declaration (but it’s coming soon!), so you have to use the explicit fail() method to let the probe know the check has failed.

Synthetic Monitoring browser checks behave differently than k6 checks because they primarily assess if your test passed or failed based on your definition of uptime. For non-scripted Synthetic Monitoring checks, you define uptime with a set of assertions in its own step during check creation, but because browser checks are written as a script, it is up to you to mark out explicitly what you want to count towards defining uptime.

If you take our recommend-product-spike.js script above and extract the checkout flow scenario, it would only need a small modification to suit the needs of a Synthetic Monitoring browser check. By adding the fail() method to the check that confirms the order confirmation page was reached, the Synthetic Monitoring probe will understand what constitutes an uptime failure and report correctly.

JavaScript
// fail() comes from the core "k6" module
if (
  !(await check(page.locator(`h1`), {
    "Place order page was reached": async (lo) =>
      (await lo.textContent()) === "Your order is complete!",
  }))
) {
  fail(`Order completion page was not reached`)
}

We can leave our less important checks in the script without adding an explicit fail to them, as that’s just a bonus we are receiving from this set-up. We could even add a custom alert using Grafana Alerting to know if the recommended products aren’t rendering consistently.

This kind of flexibility in your Synthetic Monitoring browser checks means you can set primary and secondary assertions in just one execution, saving you both time and money!

Bonus tip: don’t just account for the ‘happy’ path

A notorious mistake in any kind of testing is assuming tests will always succeed. The real value of testing shows when the inevitable failure occurs: how well do the tests inform us of the problem they encountered?

If you look at the checkout scenario in recommend-product-spike.js, there are several page.locator().click() calls. What happens if our application has an error, the elements don’t display correctly, and the selectors fail?

Each iteration will wait the default 30 seconds before timing out and throwing a failure. This could be an issue for our test because it “blocks” a VU for 30 seconds before ending the iteration and starting again. The first part of the problem is this test only runs for one minute, so if it encounters a problem early, we lose a lot of potential iterations and all the additional data they would generate.

There are two ways of solving this problem:

  1. Providing each page.locator().click() (and similar methods) with its own appropriate timeout
  2. Setting a default timeout value on the page using page.setDefaultTimeout()

Each of these approaches has its pros and cons, and it is up to you to decide which works best for you.
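A rough sketch of both options (the 10-second values are purely illustrative):

JavaScript
// Option 1: give an individual action its own timeout
await page.locator(`//*[text()="Place Order"]`).click({ timeout: 10000 })

// Option 2: set a default timeout for every subsequent action on this page
page.setDefaultTimeout(10000)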

The second problem is outlined above in tip three: the test execution was halted, the check confirming the order page was reached is never encountered, and this failure goes unreported in the test’s metrics. The solution is to wrap the scenario’s steps in a try / finally statement, if you haven’t already, and add a catch block:

JavaScript
try {
...
} catch (e) {
  console.error(e)
  await page.screenshot({
    path: `./screenshots/${__VU}_${__ITER}-failure.png`,
  })

  check(
    null,
    {
      "Place order page was reached": false,
    },
    { importance: SUPER_IMPORTANT_CHECK }
  )
} finally {
  ...
}

Note: It is good practice to take a screenshot to make your debugging sessions easier.

Synthetic Monitoring reports an uptime failure if any execution results in an unhandled exception, so if you do add this catch block, remember to rethrow the error (with your own logging if you favor it over the default) or call k6’s fail() method.
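For a Synthetic Monitoring browser check, the end of that catch block might therefore look something like this (a sketch; fail() is imported from the core k6 module):

JavaScript
} catch (e) {
  await page.screenshot({ path: `./screenshots/${__VU}_${__ITER}-failure.png` })
  // Let the probe register this execution as an uptime failure
  fail(`Checkout flow failed: ${e}`) // or simply re-throw with `throw e`
}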

Summing up

The k6 browser API is a versatile tool to help monitor your website’s performance and reliability. With the tips above, you can ensure a tight feedback loop when authoring your tests, while also keeping them adaptable and bug-free. Ultimately, these best practices will make it easier for you to identify potential performance issues and optimize your end-user experiences.

Grafana Cloud is the easiest way to get started with browser testing. With Grafana Cloud k6, you can effortlessly combine frontend and backend testing in a single cloud-based test. Grafana Cloud Synthetic Monitoring enables continuous monitoring of your critical journeys in a production environment. We offer a generous forever-free tier and plans tailored for every use case. Sign up for free now (Grafana Cloud k6 or Grafana Cloud Synthetic Monitoring)!