During a backend refactor, I introduced a caching bug that caused intermittent data inconsistencies in production. My manager flagged it in a code review after the change had already shipped. I dug into the logs, found the root cause (I had misunderstood the cache invalidation contract for that service), and pushed a fix within the day. I also wrote up a post-mortem and shared it with the team. Looking back, I should have tested the edge cases more carefully, and the experience reminded me to always validate my assumptions during code review.
During a backend refactor, I introduced a cache invalidation bug that caused stale reads for roughly twelve percent of requests over two days before my manager caught it in review. I owned it fully: I had made an undocumented assumption about the invalidation contract without reading the service spec or writing an integration test that would have caught it. The real root cause was not the bug itself; it was my habit of treating internal service contracts as obvious rather than explicit. I drove a systemic fix: I added a required integration test gate to our PR checklist for any service boundary change, wrote the internal guidance doc, and ran a team session on contract-first design. Eighteen months later, that checklist has caught three similar issues from other engineers. The lesson was not 'test more carefully'; it was that I was skipping a class of risk entirely, and I needed a mechanism, not just more diligence.