Testing shift left observability with the Grafana Stack, OpenTelemetry, and k6
Development is no longer a linear journey from point A to point B. As more projects shift into a state of organic growth, user feedback and constant experimentation are becoming the norm, if not the standard, for engineering.
“In order to support this rapid experimentation, we’re beginning to embrace new working methods and practices,” said Vinodh Ravi, Executive Director of Platform Engineering at JPMorgan Chase.
But there are two distinct cognitive loops that have to interconnect: Developers create applications by building and deploying continuously, while DevOps engineers maintain those applications and make the code more reliable and resilient.
“Our systems today have so many moving parts that are constantly in flux,” said Ravi, which is why he believes experimental observations and developer narratives during development can be used as effective storytelling to make predictions about a product.
These predictions can then be used to build reliable systems to detect and measure defects, using load testing services such as k6 coupled with chaos engineering platforms like Gremlin to help the code evolve and create better feedback loops.
In his ObservabilityCON 2021 session titled “Observability-driven development at JPMorgan Chase,” Ravi highlighted how observability is also vital to this incremental learning process, emphasizing “the benefits from the shift left resiliency practices and how, by doing so, we can enable our business to move rapidly.”
Why observability = infrastructure innovation
With incremental learning, there is a high volume of data that can often be distracting to teams as they try to suss out what’s beneficial to move the project forward and what’s just noise.
“In order to amplify and dampen the signals coming out of those feedback loops, we need good observability tooling,” said Ravi. “Embracing open source community projects like Loki, Tempo, and k6 is the way forward towards infrastructure automation and infrastructure innovation.”
Ravi outlined three key components to his observability-driven development strategy:
- Leverage open standards like OpenTelemetry and open source libraries such as Sqlcommenter
- Incorporate open source tools like Loki, Prometheus, and Tempo
- Introduce specs like Open Annotation and projects like Hypothesis to give teams a platform for meaningful conversations from which context can be derived
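To make the first component more concrete, here is a minimal sketch of the idea behind Sqlcommenter: attaching trace context to a SQL statement as a trailing comment so that a slow query can be tied back to the request that issued it. This is not the Sqlcommenter library itself, and it is not taken from Ravi's demo. The function names `make_traceparent` and `add_sql_comment` are hypothetical, and the sketch only assumes the published conventions: a W3C `traceparent` value, plus URL-encoded, key-sorted, single-quoted tags.

```python
import re
from urllib.parse import quote


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a W3C Trace Context 'traceparent' value (version 00)."""
    if not re.fullmatch(r"[0-9a-f]{32}", trace_id):
        raise ValueError("trace_id must be 32 lowercase hex characters")
    if not re.fullmatch(r"[0-9a-f]{16}", span_id):
        raise ValueError("span_id must be 16 lowercase hex characters")
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def add_sql_comment(sql: str, **tags: str) -> str:
    """Append tags to a SQL statement as a sqlcommenter-style comment.

    Hypothetical helper: keys are sorted, keys and values are URL-encoded,
    and values are wrapped in single quotes.
    """
    if not tags:
        return sql
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(value, safe='')}'"
        for key, value in sorted(tags.items())
    )
    body = sql.rstrip()
    semicolon = ""
    if body.endswith(";"):
        # Keep the statement terminator after the comment.
        body, semicolon = body[:-1].rstrip(), ";"
    return f"{body} /*{pairs}*/{semicolon}"


if __name__ == "__main__":
    traceparent = make_traceparent(
        "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"
    )
    print(add_sql_comment("SELECT 1;", route="/meals", traceparent=traceparent))
```

In a real service, the trace and span IDs would come from the active OpenTelemetry span rather than from hard-coded strings, and an existing Sqlcommenter integration would normally do the tagging.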
Ravi then walked through an end-to-end demo of a hypothetical project called Climeat, whose mission is to discourage people from eating meat and effectively reduce gas emissions.
Simulating a three-person team, Ravi ran through experiments on a distributed system instrumented with OpenTelemetry by using k6 to simulate load and Gremlin to run chaos scenarios. He then leveraged the native data correlation capabilities of Loki and Tempo to automatically gather insights from the experiment.
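The correlation step can be pictured with a small sketch. It does not use the Loki or Tempo APIs and is not the code from the demo. It only illustrates the underlying idea, assuming that each log line carries a `trace_id=...` field and that trace data is available as simple records. The log format, the record fields, and both function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed log convention: each line may carry a 32-hex-character trace ID.
TRACE_ID_PATTERN = re.compile(r"\btrace_id=([0-9a-f]{32})\b")


def group_logs_by_trace(log_lines):
    """Group log lines by the trace ID they mention; skip lines without one."""
    grouped = defaultdict(list)
    for line in log_lines:
        match = TRACE_ID_PATTERN.search(line)
        if match:
            grouped[match.group(1)].append(line)
    return dict(grouped)


def slow_traces_with_logs(spans, log_lines, threshold_ms):
    """Return {trace_id: log lines} for spans slower than threshold_ms."""
    logs = group_logs_by_trace(log_lines)
    return {
        span["trace_id"]: logs.get(span["trace_id"], [])
        for span in spans
        if span["duration_ms"] > threshold_ms
    }


if __name__ == "__main__":
    fast, slow = "a" * 32, "b" * 32
    log_lines = [
        f"level=info trace_id={fast} msg=ok",
        f"level=error trace_id={slow} msg=timeout",
        "level=info msg=startup",
    ]
    spans = [
        {"trace_id": fast, "duration_ms": 40},
        {"trace_id": slow, "duration_ms": 2300},
    ]
    for trace_id, lines in slow_traces_with_logs(spans, log_lines, 1000).items():
        print(trace_id, lines)
```

During a load test or chaos experiment, a shared trace ID like this is what lets a team move from a slow request straight to the log lines that explain it, instead of searching the two data sets separately.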
The result? Reduced time to detection by combining the data with feedback and collaboration.
By applying the same rigor to observability tooling that is usually reserved for critical software, “we embrace the complexity of constantly trying to deal with complex problems,” said Ravi. “The solution is for us to start thinking differently and building in this type of hygiene early on in the design development phases.”
Ravi’s advice for how to get started? “Start placing trust in the open source community and pay attention to all the wonderful projects and contributions happening right here.”
To see Ravi’s full end-to-end demo and hear more about his approach to observability-driven development, watch his ObservabilityCON talk. All of our ObservabilityCON 2021 sessions are now available to view on demand.