Testing & the CI/CD release gate
This page is the source of truth for how the Flutter apps (perci-platform-members
and perci-platform-clinicians) are tested and what must be green before a change
can be released. The goal is continuous delivery: a change merges and releases as
soon as the gate is green, multiple times a day.
Test types
Both apps use the same four layers, and the same tooling (patrol, golden_toolkit,
mockito).
| Layer | Tool | Lives in | Runs in CI |
|---|---|---|---|
| Unit (domain/data) | flutter_test | test/features/**/domain, **/data | every PR (blocking) |
| Widget | flutter_test + ProviderScope overrides | test/features/**/presentation | every PR (blocking) |
| Golden | golden_toolkit (@Tags(['golden'])) | next to the widget, in goldens/ | every PR (see Goldens) |
| E2E | patrol (Chrome/web) | patrol_test/ | pre-release / on-label (separate) |
Mocking convention
- Prefer hand-rolled fakes that implement the domain repository interface for unit/provider/widget tests - they are explicit, fast, and need no codegen.
- Use mockito (@GenerateMocks + build_runner) for new tests where call verification or mocking a concrete SDK/generated client adds real value.
- Providers are tested with a ProviderContainer (or ProviderScope) overriding the repository/datasource provider with a fake. For autoDispose providers, hold a listener before awaiting so the provider is not disposed mid-load.
Shared harness
packages/perci_platform_test_shared is the single source of truth for the
fiddly, app-agnostic test setup: the silent network-image HTTP layer, the Firebase
Analytics fake, the package_info / secure_storage / datadog channel mocks,
golden_toolkit configuration, device presets and the Firebase core mocks. Each
app keeps a thin test/.../golden_harness.dart that calls
GoldenHarnessBase.baseGlobalSetUp() then wires its own Firebase init, auth
manager and FFAppState. Patrol widget wrappers stay per-app (they embed each
app's root widget).
The release gate (.github/workflows/flutter-pr-checks.yml)
A PR to develop or main must pass:
- Code generation - melos run build_runner (openapi + freezed + riverpod).
- Analyze (errors block) - flutter analyze --no-fatal-infos --no-fatal-warnings. Errors fail the build. Warnings/infos are reported but not yet fatal - they are a ratcheting backlog (see Analyze warning ratchet). Switch to --fatal-warnings once the count reaches zero.
- Unit + widget tests - melos run test (excludes goldens), with coverage.
- Coverage threshold - total line coverage must be >= FLUTTER_MIN_COVERAGE (a repo variable). Ratchet this up toward 80%; never lower it.
- Goldens - a separate, currently-advisory job (see Goldens).
Patrol E2E runs in a separate workflow, not on every PR (see E2E).
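The coverage threshold step can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the workflow's actual step: it assumes an lcov tracefile (the format flutter test --coverage writes to coverage/lcov.info), and the function names lcov_line_coverage and coverage_meets_floor are hypothetical.

```shell
# Sketch of a coverage floor check (hypothetical helper names).
# Assumes an lcov tracefile in which each source file has
# LF:<lines found> and LH:<lines hit> records.

# Print total line coverage of the tracefile "$1" as a percentage
# with two decimals (0.00 when the file records no lines).
lcov_line_coverage() {
  awk -F: '
    /^LF:/ { found += $2 }
    /^LH:/ { hit += $2 }
    END {
      if (found == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * hit / found
    }
  ' "$1"
}

# Exit 0 when coverage "$1" is >= the minimum "$2", non-zero otherwise.
# Both arguments are plain decimal numbers, e.g. 11.54 and 9.
coverage_meets_floor() {
  awk -v cov="$1" -v min="$2" 'BEGIN { exit !(cov + 0 >= min + 0) }'
}
```

In a CI step this might be called as coverage_meets_floor "$(lcov_line_coverage coverage/lcov.info)" "$FLUTTER_MIN_COVERAGE", with a non-zero exit failing the job. The location of the merged tracefile is an assumption.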
Coverage baseline & ratchet
Set the repo variable FLUTTER_MIN_COVERAGE to the current measured floor, then
raise it as the backlog is burned down. The immediate purpose of the gate is
non-regression (coverage may not drop); the long-term target is 80% on
hand-written code, reached by ratcheting.
The floor follows the coverage actually present on the target branch. The
foundation lands at develop's current ~9.54%, so FLUTTER_MIN_COVERAGE starts at
9 and is ratcheted to 11 once the stacked test PRs (993 tests) are on
develop, then upward toward 80%. Raise it after each coverage-improving PR; never
lower it below the branch's real coverage.
| Scope | Line coverage |
|---|---|
| Merged (what the gate checks) | 11.54% (7695/66664) |
| perci-platform-members | 13.85% raw / 13.26% testable |
| perci-platform-clinicians | 9.66% raw / 10.10% testable |
Feature areas that are already well covered: members video_call 94%, scans_and_tests
47%, checkout 39%, code_signup 31%; clinicians payments, documents,
screening, member_dashboard. The gap to 80% is almost entirely legacy
FlutterFlow UI (onboarding, main_pages, appointments, new_a_p_p) - best covered
by widget + golden + patrol rather than unit tests, and tracked in the backlog below.
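The arithmetic behind the merged figure and the floor values quoted in this section can be reproduced with a short sketch. The helper names coverage_percent and coverage_floor are hypothetical, and rounding the floor down to a whole number is an assumption inferred from the 9.54% -> 9 and 11.54% -> 11 values in the text.

```shell
# Line coverage as a percentage (two decimals) from hit and total
# line counts. Prints 0.00 when the total is zero.
coverage_percent() {
  awk -v hit="$1" -v total="$2" 'BEGIN {
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * hit / total
  }'
}

# Largest whole-number floor that does not exceed the measured
# coverage, so the gate never starts above what the branch has.
coverage_floor() {
  awk -v pct="$1" 'BEGIN { printf "%d\n", int(pct) }'
}
```

With the merged counts from the table, coverage_percent 7695 66664 prints 11.54, and coverage_floor 11.54 prints 11; coverage_floor 9.54 prints 9, matching the starting value of FLUTTER_MIN_COVERAGE.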
Analyze warning ratchet
melos analyze currently reports ~360 warnings/infos across the workspace, almost
all pre-existing in legacy FlutterFlow code (perci_library_9rk85z) and a few in
older test infra. There are no error-severity issues, so the error-only gate
passes today. Burn the warning count down (a chunk is auto-fixable via
dart fix --apply), then make warnings fatal in the gate.
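Until warnings become fatal, the count itself could be ratcheted in the same way as coverage. The sketch below is one possible approach, not something the workflow is known to do: it assumes analyzer output in which each issue line starts with its severity (for example "   info • message • path:line:col • rule"), and the helper names and the baseline value are hypothetical.

```shell
# Count warning- and info-severity issues in analyzer output read from
# stdin. Error-severity lines and summary lines are not counted.
# "|| true" keeps the function from failing when the count is zero,
# because grep -c exits non-zero when nothing matches.
count_analyzer_issues() {
  grep -Ec '^[[:space:]]*(warning|info) ' || true
}

# Exit 0 when the current count "$1" does not exceed the baseline "$2".
issues_within_baseline() {
  [ "$1" -le "$2" ]
}
```

A CI step might pipe the analyzer output through count_analyzer_issues and compare the result against a stored baseline (for example 360), lowering the baseline as the backlog is burned down.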
Goldens
Golden tests run on every PR for both apps via flutter test --tags golden (the
golden job in flutter-pr-checks.yml). The job is non-blocking during the
PPL-2637 test-coverage rollout - it validates both apps, but the consistent
baselines and dropped non-deterministic scenarios land across the stacked test
PRs, so it can only pass on the fully merged set. Make it blocking again (remove
continue-on-error) once both apps' golden suites are on develop.
Cross-platform: render in boxes, not real fonts
Real fonts rasterise differently per OS (Windows DirectWrite, macOS CoreText, Linux
FreeType), so golden PNGs made on one machine never match another - a mixed Win/Mac/Linux
team plus Linux CI can't share real-font baselines. So our goldens do not load real
fonts: the harness never calls loadAppFonts(), so Flutter's test environment renders
all text in the Ahem font (every glyph a fixed square). Ahem output is identical on
every platform, so a baseline generated on any dev machine matches CI exactly. (This is the
same trick as Alchemist's "CI mode"; we do it directly rather than add the dependency.)
Consequence: goldens verify layout, sizing, colour and structure - not readable text
(text shows as boxes). Text content is asserted by widget tests. A small 3% tolerance
comparator (test/flutter_test_config.dart) absorbs residual sub-pixel anti-aliasing at
box/shape edges.
- Regenerate on any OS with flutter test <path> --tags golden --update-goldens - boxes are platform-independent, so the result matches CI. The Flutter - Update Goldens workflow does the same on CI and opens a PR.
- Not golden-tested (non-deterministic regardless of fonts): live camera/video widgets (the old meeting_room/waiting_room goldens) and the animated WelcomePage were dropped; a couple of ultra-narrow scenarios are skipped where the wider Ahem glyphs tip a flex-less row into overflow (covered by their wider siblings + widget tests).
E2E (patrol)
Patrol tests run against Chrome (web) via Playwright and are slow + device-bound,
so they do not run on every PR. Both apps share one workflow,
.github/workflows/flutter-patrol-tests.yml, which inspects the PR's changed files and runs a
matrix job per affected app: a change under packages/ or the root pubspec.yaml runs both,
an app-only change runs just that app, and the generated clinical BFF spec runs clinicians. It
runs on release/* and hotfix/* PRs into main, and on demand via the preview label.
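The changed-files rule described above can be sketched as a small shell function. This is an illustration only, not the workflow's actual logic: the apps/perci-platform-clinicians path is assumed by analogy with the members app directory, and BFF_SPEC_PREFIX is a hypothetical location for the generated clinical BFF spec.

```shell
# Hypothetical path prefix of the generated clinical BFF spec.
BFF_SPEC_PREFIX="${BFF_SPEC_PREFIX:-specs/clinical-bff/}"

# Read changed file paths (one per line) from stdin and print the apps
# whose Patrol suite should run, one per line, sorted and de-duplicated.
affected_apps() {
  while IFS= read -r path; do
    case "$path" in
      # Shared packages or the root pubspec affect both apps.
      packages/*|pubspec.yaml)
        echo members
        echo clinicians ;;
      # App-only changes run just that app.
      apps/perci-platform-members/*)
        echo members ;;
      apps/perci-platform-clinicians/*)
        echo clinicians ;;
      # The generated clinical BFF spec only affects clinicians.
      "$BFF_SPEC_PREFIX"*)
        echo clinicians ;;
    esac
  done | sort -u
}
```

For example, git diff --name-only origin/main...HEAD | affected_apps would list the apps to place in the job matrix; paths that match none of the rules produce no output, so no Patrol job is scheduled for them.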
The shared harness lives in patrol_test/ (patrol_setup.dart, patrol_widget_wrapper.dart,
helpers/clinician_session.dart, pages/). patrol test discovers patrol_test/
by default. Clinician flows covered: sign-in (+ forgot-password), sign-out, members-list
search, members-list filter, member details (contact + demographic), member medical
record (documents + screening sections), appointments (tabs), messages, payments,
and the Learn hub (open + search). Flows reuse the proven selectors from the existing
integration_test/pages/; section/tab flows are permission-gated and skip cleanly when
a role lacks access. These flows are authored and pass static analysis; running them
is the job of the patrol CI workflow.
Running locally
melos bootstrap # resolve deps (first time / after pubspec changes)
melos run build_runner # generate openapi + freezed + riverpod
melos run analyze # static analysis
melos run test # unit + widget (excludes goldens)
# single app, with coverage
cd apps/perci-platform-members && fvm flutter test --coverage --exclude-tags golden
# goldens for one file
fvm flutter test <path> --tags golden # compare
fvm flutter test <path> --tags golden --update-goldens # regenerate
Coverage backlog (path to "all functionality covered")
Comprehensive coverage is delivered by ratcheting FLUTTER_MIN_COVERAGE, not in a
single pass. Remaining gaps are tracked as tickets; priority order:
- Release-critical flows: auth, booking, payments, care flows, chat, documents.
- Per-feature domain + data layers (cheap, high-value unit coverage).
- Presentation/provider logic.
- Legacy FlutterFlow widgets - covered by golden + patrol where unit testing is impractical.