Datadog Backend Observability (Firebase Functions Gen2)¶
This document defines backend Datadog setup for:
- Logs: Cloud Logging -> Pub/Sub -> Dataflow -> Datadog
- Traces: post-deploy Cloud Run instrumentation via datadog-ci
Infrastructure (Terraform)¶
Terraform creates the GCP + Datadog log pipeline in:
- infrastructure/terraform/modules/datadog-observability
- infrastructure/terraform/environments/staging
- infrastructure/terraform/environments/production
Required Terraform workspace variables:
- datadog_logs_api_key (sensitive)
Required Terraform workspace environment variables for Datadog provider:
- DATADOG_API_KEY
- DATADOG_APP_KEY
CI/CD Instrumentation¶
Backend deploy workflows run Datadog instrumentation after Firebase Functions deploy:
- .github/workflows/backend-deploy-staging.yml
- .github/workflows/backend-deploy-production.yml
The command used is:
- npx -y @datadog/datadog-ci cloud-run instrument ... --tracing true
Current services instrumented:
- api
- bff
- bff_clinical
- bff_clinical_media
If a new HTTP function is added, update both workflows to include its service name.
Verification Checklist¶
Run after staging deploy, then production deploy.
- Confirm Dataflow is healthy
- GCP Console -> Dataflow -> job
datadog-export-job-staging/datadog-export-job-prod -
Status should be
Runningwith no sustained errors. -
Confirm logs arrive in Datadog
- Query in Datadog Logs:
service:perci-platform-backend env:staging-
service:perci-platform-backend env:production -
Confirm traces arrive in Datadog APM
- Service catalog should show
perci-platform-backend. -
Check recent traces for endpoints under
api,bff,bff_clinical,bff_clinical_media. -
Confirm log/trace correlation
- Open a trace span and verify related logs are linked.
- Open a log entry and verify
dd.trace_idanddd.span_idexist in payload.
Rollout Gates¶
Use this sequence for safe rollout:
- Apply Terraform in staging.
- Deploy staging backend and verify checklist above.
- Keep staging stable for at least one deploy cycle.
- Apply Terraform in production.
- Deploy production backend and verify checklist above.
Failure Recovery¶
If instrumentation fails in CI:
1. Re-run backend deploy workflow.
2. Run manual command with dry-run first:
- npx -y @datadog/datadog-ci cloud-run instrument --project <project> --region europe-west2 --service <service> --tracing true --env <env> --version <sha> --dry-run
3. If needed, temporarily disable Datadog instrumentation step and proceed with deploy, then remediate in a follow-up run.
If log forwarding fails:
1. Check Dataflow worker logs.
2. Validate Secret Manager access for Dataflow worker SA.
3. Validate sink writer has roles/pubsub.publisher on datadog-export-topic.