About two years ago I introduced a caching bug during a backend refactor that caused some users to see stale data. I caught it during code review follow-up — a colleague noticed something off in the logs. We rolled back the change within the hour and pushed a corrected version the same day. After that I added a unit test to cover that specific cache invalidation path and made sure to double-check similar patterns in future PRs. It was a good reminder to be more careful with caching logic.
During a backend refactor I introduced a cache invalidation bug that served stale friend-list data to roughly 400,000 users for about 90 minutes before I caught it in our latency dashboards. The root cause was a TTL override I added that silently bypassed our standard invalidation path. I owned the incident end-to-end — wrote the postmortem, led the rollback, and personally communicated the scope to the downstream teams whose pipelines consumed that data. The permanent fix was a shared invalidation wrapper that enforced TTL contracts at the library level, plus a canary alert that fires whenever cache hit-rate drops more than 15 percent on that service. Six months later, zero recurrence.