Question Details

No question body available.

Tags

distributed-system apache-kafka eventual-consistency system-architecture

Answers (3)

Accepted Answer
November 20, 2025 Score: 10 Rep: 46,830 Quality: Expert Completeness: 70%

To be honest, I don't think the example application is modeling the requirements correctly. From what I gather, there are two events: one to produce income, and another to balance the account. The context of your example is finance, and we have specific rules for this: accounting.

You can still have a distributed system here - banks were arguably the original distributed system - I just don't think you have two message queues. You only need one queue that accepts transactions. Each transaction will add to or take away from the account. Each transaction needs a timestamp, so that when some background process runs at the start of each month, it can grab all the transactions from the last month and calculate the balance. You also need rules about how to handle late transactions; for that you would need to consult someone who runs a bank (again, we have specific rules governing accounting and bank transactions).
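For example, a rough sketch of that shape, assuming each transaction carries a signed amount and a timestamp of when it happened (the Transaction record and monthly_balance helper below are hypothetical, not any particular banking API):

    from dataclasses import dataclass
    from datetime import datetime, timezone

    # Hypothetical transaction record: every message on the single queue carries
    # a signed amount (positive = income, negative = spending) and a timestamp
    # of when the transaction happened, not when it arrived.
    @dataclass(frozen=True)
    class Transaction:
        account_id: str
        amount: int            # in cents, signed
        occurred_at: datetime

    def monthly_balance(transactions, account_id, year, month):
        """Sum all transactions for one account that fall inside the given month.
        A real bank would also need rules for late-arriving transactions."""
        return sum(
            t.amount
            for t in transactions
            if t.account_id == account_id
            and t.occurred_at.year == year
            and t.occurred_at.month == month
        )

    # The start-of-February job computes January's balance.
    ledger = [
        Transaction("acct-1", 1000_00, datetime(2025, 1, 15, tzinfo=timezone.utc)),
        Transaction("acct-1", -250_00, datetime(2025, 1, 28, tzinfo=timezone.utc)),
    ]
    print(monthly_balance(ledger, "acct-1", 2025, 1))  # 75000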

Obviously, you don't want the message queue to be your permanent storage. The queue can be a buffer so you can accept messages while you persist others incrementally.

Instead of working through a concrete design, have a look at the following technologies and concepts:

  • Eventual consistency — this is the heart and soul of microservices, or of any system that gives up a single ACID transaction because its work is distributed across services.

  • Event sourcing — captures every change to application state as a sequence of events (see the sketch after this list).

  • Time series database — essentially a data store optimized for storing timestamps with corresponding values. This isn't a silver bullet, but when you need to capture time-related data it's another option that might work better than your run-of-the-mill relational database.
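As a rough illustration of the event sourcing idea, here is a minimal sketch (the event names income_received and payment_made are invented): the balance is never stored directly; it is rebuilt by replaying an append-only event log.

    # Append-only event log; state is rebuilt by replaying it, never updated in place.
    events = []

    def append_event(event_type, amount):
        events.append({"type": event_type, "amount": amount})

    def current_balance():
        """Fold the whole event log into the current account balance."""
        balance = 0
        for e in events:
            if e["type"] == "income_received":
                balance += e["amount"]
            elif e["type"] == "payment_made":
                balance -= e["amount"]
        return balance

    append_event("income_received", 1000)
    append_event("payment_made", 300)
    print(current_balance())  # 700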

There is no single way of designing distributed systems, and when you do design such a system, you need to fully understand the real world requirements. This implies you need access to subject matter experts so you can fully quantify the constraints your system needs to operate within. Beyond that, the concepts above, while abstract, provide the foundation for tackling these kinds of problems.

November 20, 2025 Score: 7 Rep: 12,721 Quality: High Completeness: 30%

There essentially is no strong ordering with a distributed system. A strong ordering is a property of a centralised system, because a centralised system is capable of deciding a single consistent ordering, and in such a system some central function has the authority to make such decisions and impose those decisions upon the other peripheral parts.

The pattern these developments often follow is that the developer starts with a "distributed" system, realises it can't do what is necessary for their purposes, then tries to build their own ad-hoc centralised system using non-local hardware elements, rather than simply restarting the development with standard centralised technologies running on a single machine, with telecoms to allow remote control.

Your example also isn't a "real world" example of how bank accounts really work.

All systems have the concept of a "settlement time": the budgeted duration between something happening and the effects of that happening propagating to all other necessary parts of the system, when the system is working correctly.

In centralised systems which are highly non-local - that is, there are geographically distributed parts which are yoked together by telecoms links into one system - the settlement time can be large and variable, and the risk of breakdown (that is, a condition in which the system has detectably failed to settle within the duration expected in its design) can be very high.
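A minimal sketch of what detecting such a breakdown could look like, assuming we know when an event was emitted, when (if ever) its effect was observed downstream, and a budgeted settlement window (all names here are illustrative):

    from datetime import datetime, timedelta, timezone

    # Budgeted settlement window for this hypothetical system.
    SETTLEMENT_BUDGET = timedelta(minutes=5)

    def is_breakdown(emitted_at, observed_at=None, now=None):
        """True if an event's effect failed to propagate within the budgeted settlement time."""
        now = now or datetime.now(timezone.utc)
        deadline = emitted_at + SETTLEMENT_BUDGET
        if observed_at is not None:
            return observed_at > deadline   # settled, but later than budgeted
        return now > deadline               # not yet settled and the budget has expired

    emitted = datetime(2025, 2, 1, 12, 0, tzinfo=timezone.utc)
    print(is_breakdown(emitted, observed_at=emitted + timedelta(minutes=2)))  # False
    print(is_breakdown(emitted, now=emitted + timedelta(minutes=10)))         # True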

November 20, 2025 Score: 5 Rep: 8,110 Quality: Medium Completeness: 70%

Your contrived example doesn't appear to be about GAAP financial accounting, and is just trying to get at causality in a distributed system. So I will ignore the XY aspect and move on, inventing some tighter assumptions for the example.

Thank you for the narrative description of the problem setup; that's good. However, we don't quite see what the Invariants and the Correctness Requirements are, so I will flesh those out. Typically it is reasoning about invariants that will let us decide whether a distributed system is correct or is buggy / racy.

A (not shown) actor will consume the Account Balance channel and take actions like sending a statement via Postal Service or via e-mail, for January and later for February. That balance consumer should avoid sending twice in a month, and it must send an accurate account balance figure which goes with the given month. The consumer can have a local state machine which helps it suppress duplicate sends, and helps it retry / recover when it reboots or other exciting events happen, like healing of a network partition.
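A minimal sketch of that local state machine, assuming the consumer only needs to remember which months it has already sent (BalanceConsumer and its method names are invented for illustration):

    # The consumer remembers which months it has already sent a statement for,
    # so a redelivered balance message (after a crash, retry, or partition heal)
    # is not acted on twice.
    class BalanceConsumer:
        def __init__(self):
            self.sent_months = set()   # in practice, persisted local state

        def on_balance_message(self, month, balance, send_statement):
            if month in self.sent_months:
                return False           # duplicate delivery: suppress the resend
            send_statement(month, balance)
            self.sent_months.add(month)
            return True

    consumer = BalanceConsumer()
    send = lambda month, balance: print(f"statement for {month}: {balance}")
    consumer.on_balance_message("2025-01", 1000, send)   # sends
    consumer.on_balance_message("2025-01", 1000, send)   # suppressed duplicate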

The chief difficulty with your "Correct" and "Incorrect Order" examples is that those messages omit info which the balance consumer would need in order to properly interpret their meaning. You have stripped timestamp / causality information from them. So let's revisit the design of this Public API and its Messages.

An actor hears a message and in response sends an Income message. The income message should mention the clock message which triggered it, perhaps as the tuple:
(end_of_January, 1000).

When the Account Service sends a message, it should mention the clock message that triggered it, as well:
(start_of_February, (end_of_January, 1000))
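Concretely, those messages could be shaped something like the following records (the class and field names are invented; the point is only that each message carries the message that triggered it):

    from dataclasses import dataclass

    # Each message carries the message that triggered it, so downstream
    # consumers can reconstruct the causal chain.
    @dataclass(frozen=True)
    class ClockMessage:
        tick: str                      # e.g. "end_of_January", "start_of_February"

    @dataclass(frozen=True)
    class IncomeMessage:
        triggered_by: ClockMessage
        amount: int

    @dataclass(frozen=True)
    class BalanceMessage:
        triggered_by: ClockMessage
        income: IncomeMessage

    income = IncomeMessage(ClockMessage("end_of_January"), 1000)
    balance = BalanceMessage(ClockMessage("start_of_February"), income)
    print(balance)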

We're worried about this racy sequence of arrivals heard by the Account Service:

  • Clock: start_of_February (what?!? no income?), followed by
  • Actor: 1000

Depending on the business rules of your Use Case, and CAP requirements, you may wish for the Account Service to be in an "inhibited" state until all Actors have sent in an end_of_January income update. Keep that Clock message buffered, ready to be delivered as soon as we're no longer inhibited.
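A sketch of that inhibited state, assuming the Account Service knows the full set of Actors it expects to hear from (the class and method names are invented):

    # The service buffers the start_of_February clock tick until every known
    # actor has reported its end_of_January income, then releases the tick and
    # emits the balance.
    class AccountService:
        def __init__(self, expected_actors):
            self.expected_actors = set(expected_actors)
            self.incomes = {}          # actor -> amount for the closing month
            self.buffered_tick = None

        def on_income(self, actor, amount):
            self.incomes[actor] = amount
            self._maybe_release()

        def on_clock(self, tick):
            self.buffered_tick = tick  # hold the tick while inhibited
            self._maybe_release()

        def _maybe_release(self):
            if self.buffered_tick and self.expected_actors.issubset(self.incomes):
                print(f"{self.buffered_tick}: balance = {sum(self.incomes.values())}")
                self.buffered_tick = None

    svc = AccountService(expected_actors=["actor-1"])
    svc.on_clock("start_of_February")  # arrives first: inhibited, so buffered
    svc.on_income("actor-1", 1000)     # now the buffered tick is released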


As a practical matter, many distributed system servers will locally increment a serial number and send that as a message ID. That lets peers conveniently summarize "I have seen all of Alice's messages up through serial 7", or report holes such as "I have seen through 7, and also 9, but not yet 8". TCP SACK does this all the time.
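That bookkeeping might look something like this sketch (ReceivedTracker is a made-up helper, not TCP's actual implementation):

    # Track the highest serial seen contiguously from one peer, plus any
    # out-of-order serials beyond that ("holes" are the gaps in between).
    class ReceivedTracker:
        def __init__(self):
            self.contiguous = 0        # "I have seen everything up through this serial"
            self.out_of_order = set()  # serials seen beyond the contiguous prefix

        def record(self, serial):
            self.out_of_order.add(serial)
            # Advance the high-water mark as holes get filled.
            while self.contiguous + 1 in self.out_of_order:
                self.contiguous += 1
                self.out_of_order.remove(self.contiguous)

        def summary(self):
            return f"seen through {self.contiguous}, also {sorted(self.out_of_order)}"

    tracker = ReceivedTracker()
    for serial in [1, 2, 3, 4, 5, 6, 7, 9]:   # 8 has not arrived yet
        tracker.record(serial)
    print(tracker.summary())  # seen through 7, also [9]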

To properly interpret and act on a message, an actor like the Balance Consumer may need to see several IDs in that message. Perhaps IDs from Clock, Actor, Clock, and Account Balance. We might call this a Vector Clock, or Lamport Clock. A well-designed Account Balance service might offer the computational service of eliminating some race ambiguity, such as by being inhibited or by retrying, and that can let it get away with sending smaller messages containing fewer upstream IDs.


To clarify, I'm not contemplating sending NTP synchronized wallclock readings anywhere in this, as they only cause trouble when trying to reason about causality. When I speak of a Network Clock or a Lamport Clock, that is in the generally accepted sense of some server locally incrementing a sequence number and sending that number as its clock. Essentially time ticks forward by one with every message sent by that server.

Suppose that Alice's server A was at sequence number 7, and Bob's server B was at 1003. Then we might see messages like

  • A: (7, launch_missile)
  • B: (1003, missile_launched)
  • A: (8, acknowledged)

We tend to think of wallclock time as global. Network clocks are always w.r.t. the sending server. So we might speak of these three observed network clocks:

  • (A, 7)
  • (B, 1003)
  • (A, 8)

Viewing a number like 7 or 8 in isolation isn't meaningful, because a distributed system doesn't really have any "global" timestamps. We always mention the sender when logging a network clock or when sending it in a message. It is legitimate for Bob to forward Alice's (A, 7) onward to Charlie's server C, to show Charlie the causal chain.
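Putting that together, here is a minimal sketch of such a per-sender network clock (server names and payloads are invented), where every clock reading travels as a (sender, counter) pair and can be forwarded onward intact:

    # Each server increments its own counter on every send; a clock reading is
    # always the pair (sender, counter), never a bare global number.
    class Server:
        def __init__(self, name, counter=0):
            self.name = name
            self.counter = counter

        def send(self, payload):
            self.counter += 1
            return (self.name, self.counter, payload)

    alice = Server("A", counter=6)     # next send will be serial 7
    bob = Server("B", counter=1002)

    m1 = alice.send("launch_missile")          # ("A", 7, "launch_missile")
    m2 = bob.send("missile_launched")          # ("B", 1003, "missile_launched")
    m3 = alice.send("acknowledged")            # ("A", 8, "acknowledged")

    # Bob can forward Alice's (A, 7) clock onward to Charlie; because the clock
    # already names its sender, the causal chain survives the hop.
    m4 = bob.send({"forwarding": ("A", 7)})    # ("B", 1004, {...})
    print(m1, m2, m3, m4)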