We were about to launch a new notifications feature and my analysis showed that the engagement lift in the experiment was concentrated in a segment of highly active users — not the broader population the PM was citing. I flagged this in a review meeting and explained the segmentation issue. There was some pushback, and the PM felt the overall numbers still supported the launch. I made sure my concerns were documented, and we ended up launching with a plan to monitor closely post-launch. The metric did flatten out after a few weeks, which validated what I had seen.
We were three weeks from launching a feed ranking change. My experiment readout showed a 4% DAU lift overall, but when I segmented by user tenure, the lift evaporated for users under 90 days — our highest-retention-risk cohort. The PM wanted to proceed. I blocked the launch in the review doc and requested a second meeting with the HM present. I ran a secondary analysis showing that new-user 30-day retention was down 1.8 percentage points in the treatment group. That is the metric that compounds. I held the position that we were trading long-term retention for short-term DAU, and I would not sign off until we either fixed the ranking logic for new users or scoped the rollout to exclude that cohort. We scoped the rollout. Six weeks later, new-user retention in the excluded cohort was 2.1 points higher than the launched segment.