On a checkout flow, our QA team flagged intermittent order failures affecting about two percent of transactions. I was assigned to investigate. After digging through logs, I found a race condition between the payment confirmation callback and an inventory update — they were both writing to the same record without a lock. I added a mutex, wrote a regression test, and the failures stopped. The team was relieved and we shipped the fix in the next sprint. It was a good reminder to always check concurrency in shared state.
During routine profiling — nobody asked me to — I noticed our on-device image pipeline's memory footprint spiked unpredictably on iPhone 12 but not 14. That asymmetry bothered me enough to dig three levels deeper: the surface metric, then the allocator behavior under memory pressure, then finally a path where we fell back from a Neural Engine-accelerated path to a CPU path without releasing the Neural Engine buffer first. A double-hold nobody had written a test for because it only triggered under a specific thermal state. I wrote a targeted stress harness to reproduce it reliably, fixed the release ordering, and did not ship until I had verified the fix held across every supported device in our matrix — not just the two I had access to. Memory spikes dropped forty percent on constrained devices. I would not have accepted shipping it earlier even if the PM had asked me to.