AI Agent + Product Manager = QA Test Engineer

Wed, 08 Oct 2025 20:39:15 +0800

In September, our company organized a discussion on applying AI in the workplace. I happened to be researching end-to-end testing at the time, so I tried OpenCode with Playwright, and the results were astonishingly good.

I chose OpenCode over other AI agent frameworks (such as Claude Code) because it can integrate with the company’s enterprise GitHub Copilot account, which means we can use models like GPT-4 and Claude Sonnet without limits on the corporate intranet.

Playwright, built by Microsoft, is a browser automation and testing framework. Compared to Selenium, it is lighter, more actively maintained, and pairs better with large language models (there is an official MCP server). Playwright also downloads and manages its own browser builds, so there is no separate WebDriver to install, which spares a lot of environment configuration.

With OpenCode and the Playwright MCP server, and a few well-crafted prompt templates, you can run a complete set of web UI end-to-end test cases without writing a single line of test code. That would have been unthinkable in the past.

I have long believed that asking programmers to write E2E test code is laborious and does more harm than good. For edge cases and performance, unit tests and API tests cover more than 90% of the needs. The real value of E2E testing is in catching issues in UI interaction and integration. Covering those scenarios with automated E2E test code carries an extremely high maintenance cost: every tiny UI tweak can break the test code. Statistically, more than half of the failing cases in a test suite are not caused by functional defects at all, but by UI load latency, renamed frontend variables, slow test environments, and so on.

For the real corner cases that threaten the integrated environment (for example, request retries caused by network interruptions, or out-of-range parameters after an interface change), writing E2E tests is less efficient than writing unit tests and API tests. For these reasons I have always encouraged the team to hire a full-time test engineer rather than reserving part of every sprint for developers to maintain E2E tests.
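To make the retry corner case concrete, here is a minimal sketch of how a unit test can pin it down without a browser or a test environment. The `fetch_with_retry` helper, the `NetworkError` exception, and the `FlakyTransport` fake are all hypothetical names invented for this illustration; a real project would substitute its own client code.

```python
from typing import Callable, Optional


class NetworkError(Exception):
    """Stand-in for a dropped connection (hypothetical, for illustration)."""


def fetch_with_retry(send: Callable[[], str], max_attempts: int = 3) -> str:
    """Call send() until it succeeds or max_attempts is exhausted."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return send()
        except NetworkError as exc:
            last_error = exc
    raise NetworkError(f"gave up after {max_attempts} attempts") from last_error


class FlakyTransport:
    """Fake transport that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise NetworkError("connection reset")
        return "ok"


def test_recovers_after_two_interruptions() -> None:
    transport = FlakyTransport(failures=2)
    assert fetch_with_retry(transport, max_attempts=3) == "ok"
    assert transport.calls == 3


def test_gives_up_when_interruptions_persist() -> None:
    transport = FlakyTransport(failures=5)
    try:
        fetch_with_retry(transport, max_attempts=3)
    except NetworkError:
        # The helper must stop at the attempt limit instead of looping on.
        assert transport.calls == 3
    else:
        raise AssertionError("expected NetworkError")
```

Simulating the same interruption through a real browser would mean cutting the network at exactly the right moment, which is the kind of setup that makes E2E suites slow and flaky.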

On the other hand, as a project lead, I care more about whether requirements are truly understood and delivered, and how to verify what the engineers actually built.

The arrival of AI agents has changed the agile workflow. With a combination like OpenCode + Playwright MCP server, the AI only needs to read user documentation to pick up the basics of UI operations. It can then open a browser, follow the natural-language description of a test case, and click through page elements step by step to complete an entire business flow. With a bit of guidance it can also produce the exact steps it took, the results, the issues it hit, and a complete test report. This is not far from hiring a junior QA engineer.

Because the maintenance cost drops dramatically (you only maintain a Markdown file describing the test cases), a lot of detailed UI test scenarios that were previously impractical can now be covered by an AI agent. Most importantly, this work does not depend on engineers at all — product managers, POs, or BAs can write test cases directly in natural language, closing the loop between writing user stories and verifying features, and removing the ambiguity that comes from requirements being relayed between business, engineering, and QA.
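As a rough sketch of what "maintaining a Markdown file" could look like in practice, the snippet below parses a small test-case file and turns each case into a natural-language prompt. The file layout (`## ` headings, numbered steps, an `Expected:` line), the `Scenario` class, and the prompt wording are assumptions made for illustration, not a format that OpenCode or Playwright prescribes. How the prompt is handed to the agent is left out.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical test-case file a product manager might maintain.
TEST_CASES_MD = """\
## Login with a valid account
1. Open the login page
2. Enter a valid username and password
3. Click "Sign in"
Expected: the dashboard is shown

## Login with a wrong password
1. Open the login page
2. Enter a valid username and a wrong password
3. Click "Sign in"
Expected: an error message is shown and the user stays on the login page
"""


@dataclass
class Scenario:
    title: str
    steps: List[str] = field(default_factory=list)
    expected: str = ""


def parse_scenarios(markdown: str) -> List[Scenario]:
    """Split the Markdown text into scenarios, one per '## ' heading."""
    scenarios: List[Scenario] = []
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            scenarios.append(Scenario(title=line[3:].strip()))
        elif not scenarios or not line:
            continue  # skip blank lines and text before the first heading
        elif line.lower().startswith("expected:"):
            scenarios[-1].expected = line.split(":", 1)[1].strip()
        else:
            step = re.match(r"\d+\.\s+(.*)", line)
            if step:
                scenarios[-1].steps.append(step.group(1))
    return scenarios


def build_prompt(scenario: Scenario) -> str:
    """Render one scenario as instructions for a browser-driving agent."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(scenario.steps, 1))
    return (
        f"Test case: {scenario.title}\n"
        f"Perform these steps in the browser:\n{steps}\n"
        f"Then check: {scenario.expected}\n"
        "Report each step you took, the result, and any issue you hit."
    )
```

The point of the design is that the only artifact anyone edits is the Markdown text; a UI change that does not alter the business flow requires no change at all.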

The Toyota Production System lists several sources of waste in production:

Overproduction
Waiting
Unnecessary transport
Over-processing
Excess inventory
Unnecessary motion
Defects

AI agents address, to some extent, three of these wastes: “overproduction” (writing test code over and over), “waiting” (waiting from requirements to implementation to test cases before a feature can be verified), and “unnecessary transport” (business requirements being passed between different people).

Why You Shouldn't Let AI Generate Your Unit Tests

Thu, 01 May 2025 09:27:36 +0800

Recently I heard Hailong Zhang, founder of Gru.ai, mention in a podcast that automatically generating unit tests is the main direction they are pursuing in AI coding.

Gru.ai’s website has these two lines:

Forget about unit testing – get covered automatically
Harness the expertise of AI engineers to boost your team’s testing efficiency while reducing costs and ensuring top-notch quality.

Zhang’s insights on AI coding are inspiring. I am skeptical, though, of the claim that using AI to write tests cuts cost and boosts efficiency. I think they themselves weren’t fully confident when writing the second line — they couldn’t help tacking on “ensuring top-notch quality” for reassurance.

Unit tests make requirements concrete. They are the finest-grained constraint in the entire testing system and the one closest to the code. Unit tests are used not only to check whether code meets requirements, but more often to probe corner cases, because what makes a program reliable is that it doesn’t break at the boundaries. Knowing where those boundaries are is also what distinguishes an experienced engineer from a junior one.

But what Gru.ai is doing is using AI to raise unit test coverage. As we all know, higher coverage does not equal higher testing efficiency, let alone higher quality.

Letting AI automatically write runnable unit tests from a single prompt is very tempting for junior developers. It’s like a shooter trying to improve their accuracy by firing the gun first and then drawing the bullseye around the bullet hole.
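A contrived illustration of the bullseye problem, written by hand rather than taken from any real tool: the function below has a deliberate bug, and the two test functions show the difference between a test derived from what the code does and a test derived from what the requirement says. All names are hypothetical.

```python
def is_leap_year(year: int) -> bool:
    """Buggy on purpose: ignores the century rules of the Gregorian calendar."""
    return year % 4 == 0


def characterization_test() -> None:
    """A test derived only from the code's current behavior.

    Every expected value is whatever the function returns today, so the
    test passes and covers every line, bug included.
    """
    assert is_leap_year(2024) is True
    assert is_leap_year(2023) is False
    assert is_leap_year(1900) is True  # wrong per the calendar, yet "green"


def requirement_test() -> None:
    """A test derived from the requirement, written by someone who knows it."""
    assert is_leap_year(2024) is True
    assert is_leap_year(2023) is False
    assert is_leap_year(1900) is False  # century years must divide by 400
    assert is_leap_year(2000) is True
```

Both tests reach full line coverage of `is_leap_year`. Only the second one fails, and it fails because a human supplied the boundary case that the code itself gives no hint of.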

The purpose of improving test coverage is to push human engineers to think carefully about edge cases. Using AI to help humans generate tests as a time-saver is perfectly fine, but Gru.ai instead tells us to “forget about unit testing, get covered automatically.” The AI usually doesn’t know the edge cases unless a human explicitly tells it. So how does the AI infer the edge cases on its own? And how do we know the AI’s inferred edge cases are correct? If the AI tests the code, who tests the AI?

If products like Cursor embody the Silicon Valley imagination of vibe coding, then Gru.ai embodies the Chinese programmers’ “rosy expectations” of vibe testing.

Testing on Steve Sun
