Testing & the CI/CD release gate

This page is the source of truth for how the Flutter apps (perci-platform-members and perci-platform-clinicians) are tested and what must be green before a change can be released. The goal is continuous delivery: a change merges and releases as soon as the gate is green, multiple times a day.

Test types

Both apps use the same four layers, and the same tooling (patrol, golden_toolkit, mockito).

| Layer | Tool | Lives in | Runs in CI |
| --- | --- | --- | --- |
| Unit (domain/data) | flutter_test | test/features/**/domain, **/data | every PR (blocking) |
| Widget | flutter_test + ProviderScope overrides | test/features/**/presentation | every PR (blocking) |
| Golden | golden_toolkit (@Tags(['golden'])) | next to the widget, in goldens/ | every PR (see Goldens) |
| E2E | patrol (Chrome/web) | patrol_test/ | pre-release / on-label (separate) |

Mocking convention

  • Prefer hand-rolled fakes that implement the domain repository interface for unit/provider/widget tests - they are explicit, fast, and need no codegen.
  • Use mockito (@GenerateMocks + build_runner) for new tests where call verification or mocking a concrete SDK/generated client adds real value.
  • Providers are tested with a ProviderContainer (or ProviderScope) overriding the repository/datasource provider with a fake. For autoDispose providers, hold a listener before awaiting so the provider is not disposed mid-load.

Shared harness

packages/perci_platform_test_shared is the single source of truth for the fiddly, app-agnostic test setup: the silent network-image HTTP layer, the Firebase Analytics fake, the package_info / secure_storage / datadog channel mocks, golden_toolkit configuration, device presets and the Firebase core mocks. Each app keeps a thin test/.../golden_harness.dart that calls GoldenHarnessBase.baseGlobalSetUp() then wires its own Firebase init, auth manager and FFAppState. Patrol widget wrappers stay per-app (they embed each app's root widget).

The release gate (.github/workflows/flutter-pr-checks.yml)

A PR to develop or main must pass:

  1. Code generation - melos run build_runner (openapi + freezed + riverpod).
  2. Analyze (errors block) - flutter analyze --no-fatal-infos --no-fatal-warnings. Errors fail the build. Warnings and infos are reported but not yet fatal; they are a ratcheting backlog (see Analyze warning ratchet). Make warnings fatal (drop --no-fatal-warnings) once the count hits zero.
  3. Unit + widget tests - melos run test (excludes goldens), with coverage.
  4. Coverage threshold - total line coverage must be >= FLUTTER_MIN_COVERAGE (a repo variable). Ratchet this up toward 80%; never lower it.
  5. Goldens - a separate, currently-advisory job (see Goldens).
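Step 4 can be sketched as a small shell check. This is a minimal sketch under stated assumptions, not the workflow's actual step: it assumes the coverage report is an lcov tracefile (the format flutter test --coverage produces), and the function names lcov_line_coverage and coverage_meets_floor are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the workflow.

# Print total line coverage (percent, two decimals) for an lcov tracefile.
# Each source-file record carries LF:<lines found> and LH:<lines hit>.
lcov_line_coverage() {
  awk -F: '
    $1 == "LF" { found += $2 }
    $1 == "LH" { hit += $2 }
    END {
      if (found == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * hit / found
    }
  ' "$1"
}

# Exit 0 when coverage ($1) is >= the floor ($2), non-zero otherwise.
coverage_meets_floor() {
  awk -v cov="$1" -v min="$2" 'BEGIN { exit !(cov + 0 >= min + 0) }'
}

# Possible use in a CI step (the report path is an assumption):
#   cov="$(lcov_line_coverage coverage/lcov.info)"
#   coverage_meets_floor "$cov" "$FLUTTER_MIN_COVERAGE" || exit 1
```

Summing LF and LH across all records gives the total figure rather than an average of per-file percentages, so large files weigh proportionally more.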

Patrol E2E runs in a separate workflow, not on every PR (see E2E).

Coverage baseline & ratchet

Set the repo variable FLUTTER_MIN_COVERAGE to the current measured floor, then raise it as the backlog is burned down. The immediate purpose of the gate is non-regression (coverage may not drop); the long-term target is 80% on hand-written code, reached by ratcheting.

The floor follows the coverage actually present on the target branch. The foundation lands at develop's current ~9.54%, so FLUTTER_MIN_COVERAGE starts at 9. Once the stacked test PRs (993 tests) are on develop, ratchet it to 11, then upward toward 80%. Raise it after each coverage-improving PR; never lower it below the branch's real coverage.

| Scope | Line coverage |
| --- | --- |
| Merged (what the gate checks) | 11.54% (7695/66664) |
| perci-platform-members | 13.85% raw / 13.26% testable |
| perci-platform-clinicians | 9.66% raw / 10.10% testable |
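The floor values quoted in this section (9 from ~9.54%, 11 from 11.54%) are consistent with truncating the measured coverage to a whole number and never moving the floor down. A minimal sketch of that rule follows. It assumes truncation is the intended rounding, which this page does not state explicitly, and the function name next_coverage_floor is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the rounding rule is an assumption inferred from the
# 9.54 -> 9 and 11.54 -> 11 figures; the function name is hypothetical.

# Propose the next FLUTTER_MIN_COVERAGE.
#   $1 = current floor (whole number), $2 = measured coverage percent
# The result is the larger of the current floor and the truncated measurement,
# so the floor can rise but never drops.
next_coverage_floor() {
  awk -v cur="$1" -v cov="$2" 'BEGIN {
    candidate = int(cov)                      # truncate, e.g. 11.54 -> 11
    print (candidate > cur ? candidate : cur)
  }'
}
```

With a current floor of 9 and measured coverage of 11.54, the function prints 11. With a current floor of 11 and a lower measurement, it keeps 11, which mirrors the "never lower it" rule.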

Feature areas that are already well covered: members video_call 94%, scans_and_tests 47%, checkout 39%, code_signup 31%; clinicians payments, documents, screening, member_dashboard. The gap to 80% is almost entirely legacy FlutterFlow UI (onboarding, main_pages, appointments, new_a_p_p), which is best covered by widget, golden and patrol tests rather than unit tests and is tracked in the backlog below.

Analyze warning ratchet

melos analyze currently reports ~360 warnings/infos across the workspace, almost all pre-existing in legacy FlutterFlow code (perci_library_9rk85z) and a few in older test infra. There are no error-severity issues, so the error-only gate passes today. Burn the warning count down (a chunk is auto-fixable via dart fix --apply), then make warnings fatal in the gate.

Goldens

Golden tests run on every PR for both apps via flutter test --tags golden (the golden job in flutter-pr-checks.yml). The job is non-blocking during the PPL-2637 test-coverage rollout: it validates both apps, but the consistent baselines and the removal of non-deterministic scenarios land across the stacked test PRs, so it can only pass on the fully merged set. Make it blocking again (remove continue-on-error) once both apps' golden suites are on develop.

Cross-platform: render in boxes, not real fonts

Real fonts rasterise differently per OS (Windows DirectWrite, macOS CoreText, Linux FreeType), so golden PNGs made on one machine never match another - a mixed Win/Mac/Linux team plus Linux CI can't share real-font baselines. So our goldens do not load real fonts: the harness never calls loadAppFonts(), so Flutter's test environment renders all text in the Ahem font (every glyph a fixed square). Ahem output is identical on every platform, so a baseline generated on any dev machine matches CI exactly. (This is the same trick as Alchemist's "CI mode"; we do it directly rather than add the dependency.)

Consequence: goldens verify layout, sizing, colour and structure - not readable text (text shows as boxes). Text content is asserted by widget tests. A small 3% tolerance comparator (test/flutter_test_config.dart) absorbs residual sub-pixel anti-aliasing at box/shape edges.

  • Regenerate on any OS with flutter test <path> --tags golden --update-goldens - boxes are platform-independent, so the result matches CI. The Flutter - Update Goldens workflow does the same on CI and opens a PR.
  • Not golden-tested (non-deterministic regardless of fonts): live camera/video widgets (the old meeting_room / waiting_room goldens) and the animated WelcomePage were dropped. A couple of ultra-narrow scenarios are skipped where the wider Ahem glyphs tip a flex-less row into overflow (covered by their wider siblings + widget tests).

E2E (patrol)

Patrol tests run against Chrome (web) via Playwright and are slow + device-bound, so they do not run on every PR. Both apps share one workflow, .github/workflows/flutter-patrol-tests.yml, which inspects the PR's changed files and runs a matrix job per affected app: a change under packages/ or the root pubspec.yaml runs both, an app-only change runs just that app, and the generated clinical BFF spec runs clinicians. It runs on release/* and hotfix/* PRs into main, and on demand via the preview label.
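The changed-file routing described above can be sketched as a shell function. This is an illustration, not the workflow's actual logic: the function name affected_apps is hypothetical, the clinicians app directory is assumed to mirror apps/perci-platform-members, and the location of the generated clinical BFF spec is not given on this page, so it is passed in through a hypothetical CLINICAL_BFF_SPEC variable with a placeholder default.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: app directories live under apps/, and the
# clinical BFF spec path below is a placeholder, not the real location.

# Read changed file paths on stdin (one per line) and print the apps whose
# patrol suite should run, one per line.
affected_apps() {
  local spec="${CLINICAL_BFF_SPEC:-path/to/clinical-bff-spec.yaml}"  # hypothetical
  local members=0 clinicians=0 path
  while IFS= read -r path; do
    case "$path" in
      packages/*|pubspec.yaml)          members=1; clinicians=1 ;;  # shared code: both apps
      apps/perci-platform-members/*)    members=1 ;;                # app-only change
      apps/perci-platform-clinicians/*) clinicians=1 ;;             # app-only change
      "$spec")                          clinicians=1 ;;             # generated clinical BFF spec
    esac
  done
  if [ "$members" -eq 1 ]; then echo perci-platform-members; fi
  if [ "$clinicians" -eq 1 ]; then echo perci-platform-clinicians; fi
  return 0
}

# Possible use (the base branch name is an assumption):
#   git diff --name-only origin/main...HEAD | affected_apps
```

Paths that match none of the patterns produce no output, which a matrix job could treat as "nothing to run".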

The shared harness lives in patrol_test/ (patrol_setup.dart, patrol_widget_wrapper.dart, helpers/clinician_session.dart, pages/). patrol test discovers patrol_test/ by default. Clinician flows covered: sign-in (+ forgot-password), sign-out, members-list search, members-list filter, member details (contact + demographic), member medical record (documents + screening sections), appointments (tabs), messages, payments, and the Learn hub (open + search). Flows reuse the proven selectors from the existing integration_test/pages/; section/tab flows are permission-gated and skip cleanly when a role lacks access. These flows have been authored and pass static analysis, but they are only executed at runtime by the patrol CI job.

Running locally

melos bootstrap            # resolve deps (first time / after pubspec changes)
melos run build_runner     # generate openapi + freezed + riverpod
melos run analyze          # static analysis
melos run test             # unit + widget (excludes goldens)

# single app, with coverage
cd apps/perci-platform-members && fvm flutter test --coverage --exclude-tags golden
# goldens for one file
fvm flutter test <path> --tags golden            # compare
fvm flutter test <path> --tags golden --update-goldens   # regenerate

Coverage backlog (path to "all functionality covered")

Comprehensive coverage is delivered by ratcheting FLUTTER_MIN_COVERAGE, not in a single pass. Remaining gaps are tracked as tickets; priority order:

  1. Release-critical flows: auth, booking, payments, care flows, chat, documents.
  2. Per-feature domain + data layers (cheap, high-value unit coverage).
  3. Presentation/provider logic.
  4. Legacy FlutterFlow widgets - covered by golden + patrol where unit testing is impractical.