Patrol autofix
How a failed Patrol E2E run becomes an investigated, fixed draft PR — without a human sitting in the slow edit-and-rerun loop.
Most Patrol failures are minor test-side issues (a waitUntilVisible hitting a release
animation, a dropdown overlay off-screen) rather than real product bugs. Fixing them is
slow only because the feedback loop is: edit the test → push → wait minutes for the
Patrol job → see if it's green → repeat. This pipeline removes the human from that loop:
on any Patrol failure it investigates whether the failure is a flake or a real break, and
either opens a draft PR with a test fix or reports the broken functionality (and attempts
a minimal product fix).
It reuses the security model of the Datadog auto-fix pipeline: the agent runs edit-only, the workflow owns git, and the draft PR is opened with a GitHub App token so downstream CI runs (a PR opened with the default GITHUB_TOKEN would not trigger further workflow runs). Nothing merges without human review.
Flow
flowchart TD
patrol[".github/workflows/flutter-patrol-tests.yml<br/>(Patrol job, per app)"] -->|"job fails"| autofix[".github/workflows/patrol-autofix.yml<br/>(workflow_call, per app)"]
autofix --> artifacts["download failing run's<br/>logs + Playwright report"]
artifacts --> resolve["resolve failing titles -> Dart targets"]
resolve -->|"no failures for this app"| noop["no-op (matrix leg for a passing app)"]
resolve -->|"failures"| debug["launch debug app<br/>(marionette VM service)"]
debug --> agent[".github/actions/run-claude-agent<br/>patrol-fix agent (edit-only)"]
agent --> gate["verify gate:<br/>re-run failing target (workers=1)"]
gate -->|"workflow commits + App token opens"| pr["draft PR -> develop"]
pr --> human["human review"]
Components
- Trigger: an autofix job in flutter-patrol-tests.yml (needs: [setup, patrol]) that runs when needs.patrol.result == 'failure'. It is matrixed over the same apps; the reusable workflow no-ops for an app whose run had no failures (matrix legs can't tell which app failed, so each filters itself).
- Reusable workflow: .github/workflows/patrol-autofix.yml (workflow_call). Mints the GitHub App token, checks out the PR head, downloads the failing run's <app>-test-logs artifact, resolves failing test titles back to Dart target files, branches off develop, sets up the Patrol toolchain, launches a debug app for marionette, runs the agent, re-runs the failing target as the gate, and opens the draft PR.
- Composite action: .github/actions/run-claude-agent runs Claude Code headless in edit-only mode (strips the agent file's frontmatter, appends a run-context data block, claude -p --permission-mode acceptEdits). Shared with the Datadog pipeline so the "run the agent" step has one source of truth.
- Agent: .github/agents/patrol-fix.agent.md (prompt) + .github/agents/patrol-fix.mcp.json (marionette MCP). Classifies flake vs real break, applies the smallest fix reusing integration_test_shared helpers and the existing page objects, validates with a scoped patrol re-run, and writes a JSON verdict the workflow reads.
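The prompt assembly described for the composite action (strip the agent file's frontmatter, then append a run-context data block) can be illustrated with a minimal Python sketch. This is not the action's actual implementation, which the text does not show. The function names, the `<run-context>` delimiter tags, and the example context keys are all hypothetical.

```python
import json


def strip_frontmatter(markdown: str) -> str:
    """Drop a leading '---' ... '---' frontmatter block, if one is present."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return markdown  # no frontmatter: return the text unchanged
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Keep everything after the closing delimiter.
            return "\n".join(lines[i + 1:]).lstrip("\n")
    return markdown  # unterminated frontmatter: leave the file untouched


def build_prompt(agent_markdown: str, run_context: dict) -> str:
    """Return the agent prompt body followed by a delimited data block."""
    body = strip_frontmatter(agent_markdown).rstrip()
    context = json.dumps(run_context, indent=2, sort_keys=True)
    # Hypothetical delimiter tags: they mark the block as data, not instructions.
    return f"{body}\n\n<run-context>\n{context}\n</run-context>\n"


# Hypothetical agent file and run context, for illustration only.
agent_file = "---\nname: patrol-fix\n---\nClassify the failure, then fix it.\n"
prompt = build_prompt(agent_file, {"app": "example_app", "target": "example_test.dart"})
print(prompt)
```

Keeping the run context in a clearly delimited block separates data coming from the failing run from the instructions in the agent file.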
How the agent decides flake vs. real break
The agent reads the failing step from the log and Playwright report, then probes the live
debug app with marionette (connect, get_interactive_elements, tap, …). If the flow
works interactively, it's a flake and the fix belongs in the test; if the feature is
genuinely broken, it's a real break and the agent attempts a minimal first-party product
fix plus a written report. Known flake root causes and their fixes are encoded in the
agent prompt (dashboard card waitUntilVisible, provider-read-during-build, off-screen
dropdown overlay, double-dispose, un-awaited sign-out).
The authoritative gate is the workflow re-running the failing target with
--web-workers=1. The agent's own re-run is just for fast iteration; the workflow's gate
decides whether the PR is labelled needs-human.
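The decision rule above is small enough to write down. The Python sketch below is only an illustration: the real agent follows a prompt, not this code, and the names `Decision`, `classify`, and `gate_satisfied` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    classification: str  # "flake" or "real_break"
    fix_location: str    # "test" or "product"
    needs_report: bool   # a written report accompanies a real break


def classify(flow_works_interactively: bool) -> Decision:
    """Classify a failure from the result of probing the live debug app."""
    if flow_works_interactively:
        # The feature works by hand, so the test is at fault.
        return Decision("flake", "test", False)
    # The feature is broken: minimal first-party product fix plus a report.
    return Decision("real_break", "product", True)


def gate_satisfied(agent_rerun_passed: bool, workflow_gate_passed: bool) -> bool:
    """Only the workflow's own re-run is authoritative.

    The agent's re-run is accepted as an argument to make the precedence
    explicit, but it never influences the result.
    """
    return workflow_gate_passed


print(classify(True))
print(gate_satisfied(agent_rerun_passed=True, workflow_gate_passed=False))
```

Treating the agent's re-run as advisory means a fix that only appears green inside the agent's session is still caught by the workflow's gate.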
Outcomes & labels
- Flake, gate passes: draft PR to develop, labels patrol-autofix, auto-fix.
- Real break: draft PR with the minimal fix + a report, additionally labelled needs-human; the report is also commented on the originating PR.
- Gate fails / inconclusive / no change: draft PR (if any change) labelled needs-human; the job fails so the failure stays visible.
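The three outcomes can be read as one mapping from the agent's verdict and the gate result to PR labels and job status. The Python sketch below shows one possible reading, not the workflow's actual logic. It makes three assumptions the text does not confirm: the base labels also apply when the gate fails, the report comment is posted for any real break, and a real break whose gate passes does not fail the job. The names `Outcome` and `decide_outcome` are hypothetical.

```python
from dataclasses import dataclass, field

BASE_LABELS = ["patrol-autofix", "auto-fix"]
KNOWN_CLASSIFICATIONS = ("flake", "real_break")


@dataclass
class Outcome:
    open_draft_pr: bool
    labels: list[str] = field(default_factory=list)
    comment_report_on_origin_pr: bool = False
    fail_job: bool = False


def decide_outcome(classification: str, gate_passed: bool, has_changes: bool) -> Outcome:
    """Map the agent verdict and the workflow gate result to an outcome."""
    is_real_break = classification == "real_break"
    inconclusive = classification not in KNOWN_CLASSIFICATIONS

    # Gate fails / inconclusive / no change: needs-human, and the job fails.
    if not gate_passed or inconclusive or not has_changes:
        labels = (BASE_LABELS + ["needs-human"]) if has_changes else []
        return Outcome(
            open_draft_pr=has_changes,
            labels=labels,
            comment_report_on_origin_pr=is_real_break,  # assumption
            fail_job=True,
        )

    # Real break with a passing gate: the PR still needs a human decision.
    if is_real_break:
        return Outcome(True, BASE_LABELS + ["needs-human"], comment_report_on_origin_pr=True)

    # Flake with a passing gate: a plain autofix draft PR.
    return Outcome(True, list(BASE_LABELS))


print(decide_outcome("flake", gate_passed=True, has_changes=True))
```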
Configuration
No new secrets. Reuses the Datadog pipeline's ANTHROPIC_API_KEY, AUTOFIX_APP_ID,
AUTOFIX_APP_PRIVATE_KEY, and the optional OTEL vars. The autofix job runs on every
Patrol failure; to pause it, disable or remove the autofix job in
flutter-patrol-tests.yml.
Boundaries
- Edit-only agent; first-party files only (no generated clients, lockfiles, vendored code).
- Fixes always target develop (consistent with the Datadog pipeline). A flake fix won't auto-unblock the originating release/* PR; the reviewer routes/cherry-picks it.
- Never writes to main, release/*, or hotfix/*, and never merges.
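The boundaries above amount to pattern checks on branch names and file paths. The Python sketch below shows how such guards might look and is not taken from the pipeline. The protected branch patterns come from the list above. The file patterns for generated clients, lockfiles, and vendored code are hypothetical examples, as are the function names.

```python
from fnmatch import fnmatchcase

# Branches the pipeline must never write to (from the boundaries list).
PROTECTED_BRANCHES = ["main", "release/*", "hotfix/*"]

# Hypothetical patterns for files that are not first-party.
NON_FIRST_PARTY = ["*.g.dart", "pubspec.lock", "*/pubspec.lock", "vendor/*"]

PR_BASE = "develop"  # fixes always target develop


def is_protected(branch: str) -> bool:
    """True if the branch matches a pattern the pipeline must not write to."""
    return any(fnmatchcase(branch, pattern) for pattern in PROTECTED_BRANCHES)


def is_first_party(path: str) -> bool:
    """True if the agent may edit this path under the assumed patterns."""
    return not any(fnmatchcase(path, pattern) for pattern in NON_FIRST_PARTY)


def rejected_edits(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that fall outside the first-party boundary."""
    return [path for path in changed_paths if not is_first_party(path)]


print(is_protected("release/1.0"))
print(rejected_edits(["integration_test/login_test.dart", "pubspec.lock"]))
```

A check like `rejected_edits` would run in the workflow after the agent finishes, since the agent only edits files and the workflow owns git.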