I'd use Pub/Sub for ingestion since it handles high-throughput message streaming and decouples the mobile clients from the processing layer. From there, I'd run a Dataflow streaming pipeline to parse, validate, and transform events before landing them in BigQuery. For freshness, I'd target a five-minute end-to-end latency using streaming inserts into BigQuery. I'd partition the BigQuery table by event date and cluster on event type to keep query costs down. For reliability, I'd set up Cloud Monitoring alerts on pipeline lag and route malformed events to a dead-letter topic.
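To make the parse, validate, and dead-letter step concrete, here is a minimal sketch in plain Python rather than actual Dataflow code. The field names (`event_id`, `event_type`, `event_ts`), the epoch-seconds timestamp format, and the `parse_event` helper are all assumptions made for illustration; the real event schema isn't specified here.

```python
import json
from datetime import datetime, timezone

# Hypothetical required fields; the real schema is an assumption.
REQUIRED_FIELDS = {"event_id", "event_type", "event_ts"}


def parse_event(raw: bytes):
    """Return ("ok", row) for a valid event, else ("dead_letter", record)."""
    try:
        event = json.loads(raw)
        if not isinstance(event, dict):
            raise ValueError("payload is not a JSON object")
        missing = REQUIRED_FIELDS - set(event)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        # Assumes event_ts is epoch seconds; normalize to UTC.
        ts = datetime.fromtimestamp(float(event["event_ts"]), tz=timezone.utc)
    except (ValueError, TypeError, OverflowError, OSError) as exc:
        # Keep the raw payload and the reason so the event can be inspected later.
        return "dead_letter", {
            "raw": raw.decode("utf-8", "replace"),
            "error": str(exc),
        }
    row = {
        "event_id": str(event["event_id"]),
        "event_type": str(event["event_type"]),  # clustering column
        "event_ts": ts.isoformat(),
        "event_date": ts.date().isoformat(),  # partitioning column
    }
    return "ok", row
```

In a streaming pipeline the two tags would map to two outputs: valid rows go on to BigQuery, and dead-letter records go to a separate topic that can be monitored and replayed.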
Before I pick services, let me clarify constraints: freshness SLA, tolerable duplicate rate, and whether schema changes are expected. At one billion events per day, which averages roughly 11,600 per second before accounting for peaks, I'd use Pub/Sub for ingestion, with clients attaching a unique event ID to each message so that downstream writes can be idempotent. Dataflow would handle streaming processing with exactly-once semantics using its native checkpointing. For schema evolution, I'd enforce backward-compatible changes through Pub/Sub schemas and tag each event with a schema ID so Dataflow can route it to the correct transformation logic rather than failing and forcing a reprocess. BigQuery receives streaming inserts into a date-partitioned table; I'd measure freshness lag via a Cloud Monitoring SLO with a five-minute P99 target. Critically, I'd build a Dataflow batch backfill job from day one, triggered off Cloud Composer, so that any pipeline outage can be replayed from Pub/Sub's seven-day retention without manual intervention. I've run this pattern at roughly two billion events per day and kept freshness under four minutes P95.
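The idempotent-write and schema-routing ideas can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Dataflow code: the `schema_id` values, the `transform_v1` and `transform_v2` functions, the field rename between versions, and the in-memory `seen_ids` set (standing in for durable, bounded deduplication state) are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Event = Dict[str, object]


# Hypothetical per-version transforms: v1 named the field "type",
# v2 renamed it to "event_type". Both emit the same output shape.
def transform_v1(e: Event) -> Event:
    return {"event_id": e["event_id"], "event_type": e["type"]}


def transform_v2(e: Event) -> Event:
    return {"event_id": e["event_id"], "event_type": e["event_type"]}


TRANSFORMS: Dict[int, Callable[[Event], Event]] = {1: transform_v1, 2: transform_v2}


def process(events: Iterable[Event], seen_ids: Set[object]) -> Tuple[List[Event], List[Event]]:
    """Deduplicate by event_id and route each event by schema_id.

    Returns (rows, unroutable). Assumes events were already validated.
    """
    rows: List[Event] = []
    unroutable: List[Event] = []
    for e in events:
        eid = e.get("event_id")
        if eid in seen_ids:
            continue  # duplicate delivery or replay: drop it
        transform = TRANSFORMS.get(e.get("schema_id"))
        if transform is None:
            unroutable.append(e)  # unknown version: set aside, don't crash
            continue
        rows.append(transform(e))
        seen_ids.add(eid)
    return rows, unroutable
```

Because duplicates are dropped by ID, feeding the same events through again, as a backfill after an outage would, produces no additional rows. That property is what makes replay safe.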