We were working on a new engagement feature for our feed ranking model. Offline AUC looked promising, but we only had two weeks of training data, so the signal was noisy. I pushed to run a small holdout experiment to gather more signal before committing to a full A/B test. Once we had four weeks of data, AUC stabilized and we felt confident enough to launch the A/B test. The test ran for two weeks, we saw a 2% CTR lift, and we shipped it. The whole cycle probably took six weeks.
Our Reels re-ranking model showed a 1.8-point NDCG improvement offline, but we only had three weeks of label data after a label pipeline change, which was not enough to be certain. Rather than waiting, I defined explicit guardrails upfront: a 1% engagement floor and a p99 latency budget of 120 milliseconds. I launched a 5% A/B test with automated rollback wired to those thresholds. Within 72 hours we had directional signal (engagement up 3.1%, latency within budget), so I escalated to 50% traffic and shipped to 100% that week. Total cycle: nine days.
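As a rough illustration of what "automated rollback wired to those thresholds" could look like, here is a minimal Python sketch. It is not the actual system: the names `Guardrails` and `breached_guardrails` are hypothetical, and it assumes the "1% engagement floor" means treatment engagement may not fall more than 1% below control. It also ignores statistical significance, which a real check would need to handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guardrails:
    # Assumption: the engagement floor is a maximum allowed relative drop
    # of treatment versus control, expressed in percent.
    max_engagement_drop_pct: float = 1.0
    # p99 latency budget for the treatment arm, in milliseconds.
    p99_latency_budget_ms: float = 120.0


def relative_change_pct(control: float, treatment: float) -> float:
    """Relative change of treatment versus control, in percent."""
    if control <= 0:
        raise ValueError("control metric must be positive")
    return (treatment - control) / control * 100.0


def breached_guardrails(
    control_engagement: float,
    treatment_engagement: float,
    treatment_p99_latency_ms: float,
    guardrails: Guardrails = Guardrails(),
) -> List[str]:
    """Return the names of breached guardrails; an empty list means keep running."""
    breaches = []
    change = relative_change_pct(control_engagement, treatment_engagement)
    if change < -guardrails.max_engagement_drop_pct:
        breaches.append("engagement")
    if treatment_p99_latency_ms > guardrails.p99_latency_budget_ms:
        breaches.append("latency")
    return breaches
```

A scheduler could call `breached_guardrails` on each metrics refresh and trigger a rollback whenever the returned list is non-empty. With engagement up 3.1% and p99 latency under 120 ms, as in the story above, the list would be empty and the experiment would continue.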