We had a Dataflow pipeline ingesting clickstream events into BigQuery. One morning a schema change upstream dropped a required field and the pipeline started writing nulls into our conversion metrics table. I caught it about two hours in when our dashboard numbers looked off. I rolled back the pipeline, patched the schema, backfilled the affected rows, and worked with the upstream team to add schema versioning. The fix took about four hours total. After that we agreed to communicate schema changes in advance.
A Dataflow pipeline I owned was writing clickstream events to BigQuery when an upstream team silently dropped a required field. Nulls propagated into our conversion metrics for two hours before downstream reports broke — we had no schema validation at ingestion. I stopped the pipeline, backfilled roughly 1.4 million affected rows from raw Pub/Sub replay, and got the dashboard current within three hours. But the real fix was structural: I added a schema registry check as a pipeline precondition so any undeclared field change fails fast at ingestion, not two hours later in a BI tool. I also wrote the incident post-mortem and proposed the pattern across three other pipelines in our area, two of which adopted it within the sprint. We went from zero schema violation alerts to catching four incidents pre-propagation in the following quarter.