I'd start by looking at the usual suspects: data distribution shift between training and serving, maybe a feature engineering mismatch, or possibly label leakage in the training set. I'd also check whether the serving pipeline preprocesses inputs the same way as training. Position bias is another thing worth considering if the new model is surfacing content the old ranker never showed. Once I'd narrowed it down, I'd file a bug, work with the data team to pull logs, and iterate from there.
The first thing I do is check whether this is a serving incident or a model quality issue: I pull the prediction score distribution from our serving logs and compare it against the offline holdout distribution. If those diverge, I know I have a serving skew problem before I even look at features. My priority order is serving preprocessing mismatch first, because it's the most common cause and the fastest to verify, then feature distribution drift, then label leakage. I'd have a monitoring dashboard that already surfaces feature statistics at serving time versus training time, so I can isolate the failing feature in under an hour rather than guessing. If I suspect position bias, I check whether our training labels were collected under the previous ranker's distribution and run a swap test. Within 24 hours I'd have a root cause hypothesis, a rollback decision point, and a fix scoped. Going forward, I'd add a serving-versus-training skew alert to the launch checklist so no model ships without those checks in place.
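To make the serving-versus-training comparison concrete, here is a minimal sketch of how that check might look in code. The answer does not name a drift metric, so the Population Stability Index (PSI) used here is an assumption, as are the function names `psi` and `rank_features_by_drift`, the feature names `ctr_7d` and `age_days`, and the simulated data. The same `psi` function could be applied to prediction scores (serving logs versus offline holdout) before drilling into individual features.

```python
import math
import random
from bisect import bisect_right


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric value.

    Bin edges are quantiles of `expected` (the training or holdout sample),
    so each bin holds roughly the same share of the reference data.
    """
    ref = sorted(expected)
    # Interior quantile cut points taken from the reference sample.
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Clamp to eps so an empty bin does not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def rank_features_by_drift(train_rows, serving_rows):
    """Return (feature, psi) pairs sorted from most to least drifted."""
    names = train_rows[0].keys()
    scores = {
        n: psi([r[n] for r in train_rows], [r[n] for r in serving_rows])
        for n in names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical features; real rows would come from training data and serving logs.
    train = [
        {"ctr_7d": rng.gauss(0.0, 1.0), "age_days": rng.gauss(5.0, 2.0)}
        for _ in range(5000)
    ]
    # Simulated skew: ctr_7d is shifted and rescaled at serving time,
    # while age_days follows the same distribution as in training.
    serving = [
        {"ctr_7d": rng.gauss(3.0, 2.0), "age_days": rng.gauss(5.0, 2.0)}
        for _ in range(5000)
    ]
    for name, value in rank_features_by_drift(train, serving):
        print(f"{name}: PSI={value:.3f}")
```

In this sketch the skewed feature should sort to the top of the list with a large PSI, while the unchanged feature stays near zero. A common rule of thumb treats PSI above roughly 0.25 as a significant shift, but the right alert threshold depends on the feature and sample size and would need tuning against real logs.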