We had a pipeline that was aggregating order data for a downstream reporting team. A schema change was pushed by another team without a migration script, and it caused a silent data drop for about six hours. I caught it during the next morning's data validation check and immediately looped in the team. We patched the schema, backfilled the missing data, and filed a ticket to improve the schema change process. I also suggested we add schema validation checks in our CI pipeline going forward. It was a rough morning, but we got it resolved.
This one is on me. I owned the ingestion pipeline for order data, and I had not implemented schema validation at the entry point — I assumed upstream teams would flag breaking changes. They didn't, and a silent schema drift caused a six-hour data drop that corrupted downstream reports for two business teams. I diagnosed it, drove the backfill personally, and then did not stop there. I wrote a schema contract enforcement layer into the pipeline — deployed within 48 hours — and added a P99 data freshness alert that pages me directly. Silent failures in that pipeline are no longer possible. Data quality incidents in that domain have been zero for eight months.