Tags

design architecture

Answers (4)

Accepted Answer
December 18, 2025 Score: 19 Rep: 46,830 Quality: Expert Completeness: 50%

You're running face-first into the Two Generals Problem. In order to safely retry the operation, you need to detect not just if the call failed, but how as well.

You can't.

The best you can do is track the state of each call so you only send it once. You will also need some mechanism to detect that the service processed the request. If the service sends a message back to you, that message can serve as your "ACK": a response confirming that the request was processed successfully.

After some period of time, you'll need to trigger a workflow that might require manual intervention by people to investigate and correct the problem; how you do this is entirely dependent on the systems you are interacting with.

Another angle on the problem is to send duplicate requests if you don't receive confirmation things were processed, track this on your end, and send additional requests to undo the duplicates. Note that this is very risky, depending on the nature of the downstream system. I definitely wouldn't do this for financial transactions.

You cannot fake the idempotency of another system. My honest advice is to fire this request once, and track its state on your side from "sent" to whatever positive or negative end result they send back. Beware that you might not ever get a response, so "no response" should be treated as a failure after some amount of time.

Failures in network transmission are not really recoverable programmatically because you often cannot detect if the call failed or how. Humans will need to be in the loop to rectify this, which might require you to build additional use cases that allow people to investigate and ultimately fix or retry the failed request, presumably after contacting someone who works at the organization that owns the other service to confirm the state of that request.

Unfortunately, the Two Generals Problem teaches us that you cannot with 100% certainty know your message made it through, nor can the other side know their response made it back. Instead, you need to record the current state of things and allow for human intervention.

December 19, 2025 Score: 9 Rep: 119,848 Quality: High Completeness: 80%

It might be possible even if the API doesn't explicitly support idempotency. It depends on what you tell it to do.

Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.

Wikipedia.com

Create a new resource and tell me what you called it. Not idempotent.
Create a resource called Resource A. Idempotent. There can be only one.

Update count so that it increments its current value. Not idempotent.
Update count to be 5 after learning that it was 4. Idempotent.

Replaying the idempotent forms won't do anything if they were previously successful (well, except waste time).
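To make the four cases above concrete, here is a toy sketch. The `Store` class and its method names are made up for illustration and do not correspond to any real API; the point is only what happens when each call is replayed three times.

```python
class Store:
    """Toy stand-in for a remote service (hypothetical, for illustration only)."""

    def __init__(self):
        self.resources = {}
        self.count = 4
        self._next_id = 0

    def create_anonymous(self):
        # "Create a new resource and tell me what you called it."
        # Not idempotent: every replay creates one more resource.
        self._next_id += 1
        name = f"resource-{self._next_id}"
        self.resources[name] = {}
        return name

    def create_named(self, name):
        # "Create a resource called Resource A."
        # Idempotent: a replay finds the resource already there.
        self.resources.setdefault(name, {})
        return name

    def increment(self):
        # Not idempotent: every replay adds one more.
        self.count += 1
        return self.count

    def set_count(self, value):
        # Idempotent: setting the same absolute value twice changes nothing.
        self.count = value
        return self.count


replays = 3

a = Store()
for _ in range(replays):
    a.create_anonymous()
print(len(a.resources))  # 3 resources: the replays piled up

b = Store()
for _ in range(replays):
    b.create_named("Resource A")
print(len(b.resources))  # 1 resource

c = Store()
for _ in range(replays):
    c.increment()
print(c.count)  # 7, not the intended 5

d = Store()
for _ in range(replays):
    d.set_count(5)
print(d.count)  # 5
```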

Where that count solution gets interesting is how you learned it was 4. How do you know it's still 4? Same issue with how you decided to call it Resource A. If that name isn't permanently reserved for you then every check you make that it's available will be stale and unreliable by the time you use it.

This is why it's unreliable to test if a file exists before writing to it. Race conditions are easy to ignore and hard to debug. But the issue still doesn't demand a special API. It's about controlling who all is allowed to mess with count or Resource A. If controllers can't coordinate there should be only one.

But these are concerns that go beyond idempotency. All idempotency gives you is that repeats don't matter.

Many APIs that were not designed to be idempotent can still be carefully used in an idempotent way. But you have to know what you're doing.

However, if you have code on the other side of the noisy network that can reliably use the API then you can safely squelch replays by simply numbering your requests. Now you can safely say 101:increment x a dozen times (while waiting for an ack) because that code ignores every repeat once it has seen the first request numbered 101.
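A minimal sketch of that numbering trick, assuming you control a small piece of code sitting next to the API. `Counter` stands in for the non-idempotent API and `DedupProxy` for your code on the far side; both names are hypothetical. This version answers a replay with the result it remembered from the first time instead of staying silent, which is one possible choice, not the only one.

```python
class Counter:
    """Stand-in for an API whose operation is not idempotent."""

    def __init__(self):
        self.x = 0

    def increment(self):
        self.x += 1
        return self.x


class DedupProxy:
    """Hypothetical code on the far side of the network, next to the API."""

    def __init__(self, api):
        self.api = api
        self.seen = {}  # request number -> result of the first execution

    def handle(self, request_number, operation):
        if request_number in self.seen:
            # Replay: do not touch the API again, just repeat the old answer.
            return self.seen[request_number]
        result = getattr(self.api, operation)()
        self.seen[request_number] = result
        return result


counter = Counter()
proxy = DedupProxy(counter)

# "101:increment x" sent a dozen times while waiting for an ack.
for _ in range(12):
    proxy.handle(101, "increment")

print(counter.x)  # 1
```

In practice `seen` would need to survive restarts and be pruned at some point, neither of which is shown here.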

Achieving idempotency doesn't solve every concurrency/unreliable network problem. But it helps when impatient users repeatedly pound buttons.

December 19, 2025 Score: 2 Rep: 49,577 Quality: Medium Completeness: 30%

Say you have a transaction that needs to modify two databases. For each change, you send a request through the network, the database makes the change, then sends a message back that it made the change. If and when you receive two replies, you know your transaction has been performed.

What can go wrong with one change? You can lose network access, and know the request was never sent out. You may lose network access just after sending the request, and it never arrives at the database. The database is turned off just before or after handling your request. It may never send back the confirmation. The database or you lose the network connection just after the confirmation is sent.

Since you have two requests, there may be two confirmations that you are missing. At the same time each request may or may not have been performed.

A simple solution is to create a unique ID for each transaction. If everything works you get two confirmations with that same ID back. If not then you resend the requests where you didn’t get a confirmation, changing the handling of the request so that nothing is done if the database already processed a request with that ID.

What you need to do is ensure that each database doesn't reprocess a request whose ID it has already seen (but it must still send a confirmation saying that the request was performed earlier). And you must re-send the request after a reasonable time, for example after you regained network access if you lost it, or you could ask the database if it has network access (and no answer means it has no network access).

Also each database must be stable in the sense that it either processes a request or does nothing.
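Here is a small sketch of the scheme described above, under some loud assumptions: `Database` is a fake that can be told to "lose" its first few confirmations, `run_transaction` is a made-up helper, and the retry loop has no waiting between attempts (a real one would back off for a reasonable time). The databases themselves must support the ID check; you cannot bolt this onto a system you do not control.

```python
import uuid


class Database:
    """Fake database that remembers which transaction IDs it has processed."""

    def __init__(self, drop_first_confirmations=0):
        self.balance = 0
        self.applied = {}  # transaction id -> amount
        self._drops = drop_first_confirmations  # simulated lost confirmations

    def handle(self, txn_id, amount):
        """Return a confirmation string, or None if the confirmation got lost."""
        if txn_id in self.applied:
            status = "already_performed"  # known ID: confirm, but change nothing
        else:
            self.balance += amount
            self.applied[txn_id] = amount
            status = "performed"
        if self._drops > 0:
            self._drops -= 1
            return None  # the work was done, but the sender never hears about it
        return status


def run_transaction(databases, amount, max_attempts=5):
    """Send one change to every database; re-send only where no confirmation came back."""
    txn_id = str(uuid.uuid4())  # one unique ID for the whole transaction
    pending = list(databases)
    for _ in range(max_attempts):
        pending = [db for db in pending if db.handle(txn_id, amount) is None]
        if not pending:
            return txn_id
    raise TimeoutError(f"transaction {txn_id}: {len(pending)} confirmation(s) still missing")


db_a = Database(drop_first_confirmations=2)  # loses two confirmations
db_b = Database()
run_transaction([db_a, db_b], amount=10)
print(db_a.balance, db_b.balance)  # 10 10
```

Even though `db_a` received the request three times, it applied the change once, because the second and third deliveries carried an ID it already knew.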

At one place I worked, burglars stole all the RAM from the servers. If you design it well, your computer will retry every hour until the company has bought new RAM, plugged it in, and restarted the servers.

December 19, 2025 Score: 2 Rep: 4,556 Quality: Low Completeness: 10%

I find it unlikely that each of your agents needs truly autonomous access to an external API.

Synchronize writes via an internal journalling gateway.

Journalling makes arbitrary operations idempotent so long as access is synchronized (which your gateway would do).
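One way to read that suggestion, sketched very roughly: all agents go through a single gateway that holds a lock, writes a journal entry before touching the external API, and completes the entry afterwards. The `JournallingGateway` class, the operation IDs, and the `started`/`done` statuses are all assumptions of this sketch, and the journal is an in-memory dict where a real one would be durable storage.

```python
import threading


class JournallingGateway:
    """Hypothetical single entry point for all writes to an external API."""

    def __init__(self, external_call):
        self._external_call = external_call  # the real, non-idempotent call
        self._lock = threading.Lock()        # synchronizes all agents
        self._journal = {}                   # operation id -> journal entry

    def submit(self, op_id, payload):
        with self._lock:
            entry = self._journal.get(op_id)
            if entry is not None and entry["status"] == "done":
                # Already performed: hand back the recorded result.
                return entry["result"]
            if entry is not None and entry["status"] == "started":
                # An earlier attempt began and never finished, so the outcome
                # is unknown. Do not guess; someone has to investigate.
                raise RuntimeError(f"{op_id}: outcome unknown, needs investigation")
            # Journal the intent first, then call out, then record the result.
            self._journal[op_id] = {"status": "started", "result": None}
            result = self._external_call(payload)
            self._journal[op_id] = {"status": "done", "result": result}
            return result
```

If the external call raises or the process dies mid-call, the entry stays at `started`. The journal cannot tell you whether the other side processed the request; it only guarantees the gateway will not blindly send it again.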