The Bar Raiser's Debrief · Amazon Data Engineer

"How would you design a data pipeline that needs to guarantee exactly-once delivery at Amazon scale?"

Dive Deep · Data Engineer · 5–7 min
Why candidates fail: They recite textbook definitions of exactly-once semantics without acknowledging the real cost tradeoffs, and without showing that they have built and operated a system that keeps the guarantee under failure conditions.
Two voices. One question. The insider reaction you don't usually see.
Competency tested: Dive Deep
Who asks it: Bar Raiser · HM · Peer
What they're really asking: Have you owned this guarantee under real failure?
The answer that fails — and why
Candidate answer · Does not raise the bar · Dive Deep

To guarantee exactly-once delivery, I would use idempotent writes and track message offsets carefully. With Kafka, you can enable exactly-once semantics using transactional producers and idempotent consumers. On the sink side, you assign each message a unique key and use a deduplication window to catch any replays. I would also implement checkpointing in the stream processor so we can replay from a known state after failure. For monitoring, I would set up lag metrics on the consumer group and alert on processing delays.

Bar Raiser evaluation
Describes configuration options — no evidence of having built this in production
Deduplication window sizing not addressed — the hardest operational decision
No acknowledgment of cost or latency tradeoff exactly-once imposes at scale
Monitoring scoped to lag only — no mention of idempotency key collision or downstream impact
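To see why window sizing is the hard part, here is a minimal in-memory sketch of a time-bounded deduplication window. The `DedupWindow` class, the 60-second window, and the key `order-42:charge` are all hypothetical, chosen only for illustration; a production system would keep this state in a durable store. The point it shows is that a replay arriving after the window has expired is accepted as if it were new.

```python
class DedupWindow:
    """Remembers idempotency keys for window_seconds, then forgets them.

    Illustrative only: state lives in process memory and is lost on restart.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._seen = {}  # idempotency key -> time the key was first accepted

    def accept(self, key, now):
        """Return True if the message should be applied, False if it is a duplicate."""
        # Evict keys whose window has expired; after this they are unknown again.
        expired = [k for k, t in self._seen.items() if now - t > self.window_seconds]
        for k in expired:
            del self._seen[k]

        if key in self._seen:
            return False  # replay inside the window: suppressed
        self._seen[key] = now
        return True


window = DedupWindow(window_seconds=60)
first = window.accept("order-42:charge", now=0)          # first delivery
fast_replay = window.accept("order-42:charge", now=30)   # retry inside the window
late_replay = window.accept("order-42:charge", now=600)  # retry after the window expired

print(first, fast_replay, late_replay)  # True False True
```

The last call returns True: the late replay is applied a second time. A window shorter than the slowest retry path turns exactly-once back into at-least-once without any error being raised.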
Amazon debrief · DE loop · Bar Raiser evaluation · Below Bar
Leadership Principle: Dive Deep
Does not demonstrate Dive Deep.
Recites Kafka transactional API — no evidence of production ownership or failure experience
Deduplication window sizing omitted — this is where exactly-once breaks in practice
No cost or latency tradeoff articulated — exactly-once is not free at Amazon scale
Monitoring answer is shallow — lag metrics alone do not protect the downstream contract
interview101.com · Dive Deep · Amazon DE · Bar Raiser debrief reference
Now here's what a strong answer actually sounds like
The answer that works — in full
Strong answer · Raises the bar · Dive Deep

I have built this guarantee in production and the first thing I learned is that exactly-once is a contract with downstream consumers, not a property of the pipeline alone. The design starts with idempotency keys scoped to the consumer's use case — not a generic message ID. For Kafka, I use transactional producers with offset commits inside the same transaction, and I size the deduplication window based on the maximum observed retry interval under our worst incident — not a default TTL. The real cost is write amplification and increased P99 latency; on our highest-throughput pipeline we saw a forty percent latency increase, so we negotiated exactly-once only on the paths where downstream duplication caused financial impact. I also instrumented idempotency key collision rates as a first-class metric, because silent key collisions are how this guarantee fails without alerting.

Bar Raiser evaluation
Frames exactly-once as a downstream consumer contract — correct mental model
Idempotency key design scoped to consumer use case, not generic — shows depth
Deduplication window sized from real incident data — production ownership evident
Latency cost quantified and tradeoff negotiated with stakeholders — Dive Deep and Ownership
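The "offset commits inside the same transaction" part of the strong answer can be sketched as a consume-transform-produce step. This is a sketch under assumptions, not a reference implementation: the method names mirror the confluent-kafka Python client (`begin_transaction`, `send_offsets_to_transaction`, `consumer_group_metadata`, and so on), while `process_batch`, `transform`, and `output_topic` are hypothetical names. It assumes the producer was configured with a `transactional.id` and has already called `init_transactions()`, and that the consumer has auto-commit disabled.

```python
def process_batch(consumer, producer, transform, output_topic,
                  batch_size=100, poll_timeout=1.0):
    """Consume a batch, produce results, and commit input offsets atomically.

    `consumer` and `producer` are expected to behave like confluent-kafka's
    Consumer and Producer (an assumption of this sketch).
    """
    messages = []
    while len(messages) < batch_size:
        msg = consumer.poll(poll_timeout)
        if msg is None:
            break  # nothing more available right now
        if msg.error():
            raise RuntimeError(f"consumer error: {msg.error()}")
        messages.append(msg)

    if not messages:
        return 0

    producer.begin_transaction()
    try:
        for msg in messages:
            producer.produce(output_topic, key=msg.key(), value=transform(msg.value()))

        # The consumed positions are committed as part of the producer's
        # transaction, so outputs and offsets become visible together or not at all.
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        # Outputs and offsets are discarded together. A real pipeline would
        # also rewind the consumer to its last committed offsets before retrying.
        producer.abort_transaction()
        raise
    return len(messages)
```

Downstream readers only benefit if they consume with `isolation.level=read_committed`, and the guarantee ends at the Kafka boundary. Writes to an external sink still need idempotency keys, which is why the strong answer treats exactly-once as a contract with the consumer.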
Amazon debrief · DE loop · Bar Raiser evaluation · Raises Bar
Leadership Principle: Dive Deep
Strong signal. Raises the bar.
Frames exactly-once as a downstream contract — demonstrates correct system-level mental model
Idempotency key design driven by consumer use case, not default configuration
Deduplication window sized from observed incident data — clear production ownership
Latency cost quantified; tradeoff scoped to financially impactful paths — Dive Deep and Frugality
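Sizing the deduplication window from incident data rather than a default TTL can be as simple as the sketch below. Everything in it is an assumption made for illustration: the `size_dedup_window` helper, the 1.5x safety factor, and the sample retry gaps are hypothetical, not figures from a real pipeline.

```python
import math


def size_dedup_window(retry_gaps_seconds, safety_factor=1.5):
    """Return a window (in whole seconds) that covers the slowest observed retry.

    retry_gaps_seconds: gaps between the first delivery of a message and its
    latest redelivery, collected from past incidents (hypothetical input).
    """
    if not retry_gaps_seconds:
        raise ValueError("need at least one observed retry gap")
    # Size from the worst case, not the average: one late replay is enough
    # to break the guarantee.
    return math.ceil(max(retry_gaps_seconds) * safety_factor)


# Hypothetical gaps: most retries land within minutes, but one incident
# redelivered a message 90 minutes (5400 seconds) after the original.
observed_gaps = [2.0, 45.0, 310.0, 5400.0]
window_seconds = size_dedup_window(observed_gaps)
print(window_seconds)  # 8100
```

The design choice worth defending in the interview is the use of the maximum. A percentile looks cheaper on storage, but it knowingly admits duplicates on exactly the incidents where the guarantee matters most.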
Run your story through these three questions
1. Can you name the specific failure mode that broke your exactly-once guarantee?
If you cannot, the Bar Raiser will assume you have never actually operated this system.
2. Have you quantified the latency or cost tradeoff of maintaining this guarantee at scale?
Without a number, your design answer sounds like architecture theory, not production engineering.
3. Do you monitor idempotency key collisions as a first-class production metric?
If not, you cannot actually prove the guarantee is holding for downstream consumers.
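For the third question, one way to make key collisions measurable is to store a payload fingerprint next to each idempotency key and count the cases where the same key arrives with a different payload. The sketch below is a hypothetical in-memory illustration: the event fields (`order_id`, `action`, `amount_cents`), the key format, and the `CollisionMonitor` class are assumptions, not a description of any particular production system.

```python
import hashlib


def idempotency_key(event):
    """Key scoped to the business effect: one action per order (assumed use case)."""
    return f"{event['order_id']}:{event['action']}"


def fingerprint(event):
    """Hash of the fields that must match for two deliveries to be true replays."""
    canonical = f"{event['order_id']}|{event['action']}|{event['amount_cents']}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CollisionMonitor:
    """Classifies each event as new, duplicate, or collision and keeps counts."""

    def __init__(self):
        self._fingerprints = {}  # idempotency key -> fingerprint of first event
        self.new = 0
        self.duplicates = 0
        self.collisions = 0

    def observe(self, event):
        key = idempotency_key(event)
        fp = fingerprint(event)
        known = self._fingerprints.get(key)
        if known is None:
            self._fingerprints[key] = fp
            self.new += 1
            return "new"
        if known == fp:
            self.duplicates += 1
            return "duplicate"  # safe to drop: same key, same content
        self.collisions += 1
        return "collision"  # same key, different content: dropping loses data

    def collision_rate(self):
        total = self.new + self.duplicates + self.collisions
        return self.collisions / total if total else 0.0


monitor = CollisionMonitor()
events = [
    {"order_id": "A1", "action": "charge", "amount_cents": 500},
    {"order_id": "A1", "action": "charge", "amount_cents": 500},  # true replay
    {"order_id": "A1", "action": "charge", "amount_cents": 700},  # key reused
]
results = [monitor.observe(e) for e in events]
print(results)  # ['new', 'duplicate', 'collision']
```

A deduplicator that checks the key alone would silently drop the third event as a replay. Exporting `collision_rate()` as a metric with an alert is what turns that silent loss into something an on-call engineer can see.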