
Using LLMs to detect flaky test patterns — any practical experience?

May 11, 2026

I had an idea: take our flaky test history (test name, failure messages, stack traces, timestamps) and feed it to an LLM to identify common patterns we're missing. Hypothesis: an LLM might surface "these 12 tests all fail between 2–4 AM UTC when the nightly batch job runs" or "these tests share a fixture that does network calls", patterns that are hard to spot by reading logs manually. Has anyone tried something like this? Or is the failure data too noisy and unstructured for an LLM to extract a useful signal from?
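One way to address the "too noisy" worry is to pre-aggregate the history before it reaches the model, so the LLM sees grouped failure signatures with their UTC-hour distribution instead of thousands of raw log lines. Below is a minimal sketch, assuming each failure record has a test name, a message, and a timezone-aware timestamp. The names `FailureRecord`, `normalize`, `summarize`, and `build_prompt`, and the noise-stripping regexes, are all hypothetical choices for illustration; stack traces could be normalized the same way. The actual LLM call is left out because it depends on whichever provider you use.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FailureRecord:
    # Hypothetical record shape; adapt to however your CI stores failures.
    test_name: str
    message: str
    timestamp: datetime  # assumed timezone-aware


# Hypothetical noise patterns: volatile tokens that make otherwise
# identical failures look unique. Order matters: addresses and UUIDs
# are replaced before bare numbers.
_NOISE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<ADDR>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"\d+"), "<N>"),
]


def normalize(message: str) -> str:
    """Collapse volatile tokens so similar failures share one signature."""
    for pattern, token in _NOISE:
        message = pattern.sub(token, message)
    return message.strip()


def summarize(records: list[FailureRecord], top_n: int = 5) -> str:
    """Group failures by normalized signature, most frequent first,
    listing the affected tests and the UTC hours the failures occurred in."""
    by_sig: dict[str, list[FailureRecord]] = defaultdict(list)
    for rec in records:
        by_sig[normalize(rec.message)].append(rec)

    ranked = sorted(by_sig.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_n]
    lines = []
    for sig, recs in ranked:
        tests = sorted({r.test_name for r in recs})
        hours = Counter(r.timestamp.astimezone(timezone.utc).hour for r in recs)
        hour_str = ", ".join(f"{h:02d}h x{c}" for h, c in sorted(hours.items()))
        lines.append(
            f"- {len(recs)} failures | signature: {sig} | "
            f"tests: {', '.join(tests)} | UTC hours: {hour_str}"
        )
    return "\n".join(lines)


def build_prompt(records: list[FailureRecord]) -> str:
    """Wrap the compact summary in a question for the model (wording is illustrative)."""
    return (
        "Below are grouped flaky-test failures with the UTC hours they occurred.\n"
        "Identify shared causes (time windows, common fixtures, external calls).\n\n"
        + summarize(records)
    )
```

Grouping before prompting also means a strong time-window cluster is already visible in the summary itself, so you can check whether the model's explanation matches what the counts show rather than taking its answer on faith.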


