Things that will go wrong in a distributed system
A (very) incomplete list of things that will go wrong in any distributed system.
Feel free to submit a PR to add more failure cases to this list.
Network
- The network will be partitioned
- Latency will grow more than expected
- Timeouts will happen on nodes that are alive
- Your network bandwidth is limited, and you will hit that limit
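
Since a timeout only says that no answer arrived in time, not that the node is dead, callers usually have to retry. Below is a minimal sketch of one way to do that, assuming the operation is idempotent and signals a timeout by raising `TimeoutError`. The helper name `call_with_retries` and its parameters are hypothetical, chosen only for illustration.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.05, max_delay=1.0,
                      sleep=time.sleep):
    """Call operation(); on TimeoutError, retry with capped exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # A timeout means "no answer yet": the node may be alive and may
            # even have applied the request, so operation must be idempotent
            # (assumption of this sketch).
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

The backoff cap and the jitter are there to avoid adding load to a network that is already slow or saturated.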
Time
- Clocks will go backward
- Monotonic clocks will go backward [1], [2]
- Clocks will be out of sync, sometimes by more than a few seconds
- Your NTP server will die
- You will have timezone issues
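
A small defensive sketch for two of the items above: measure durations on a monotonic clock instead of the wall clock, and store timestamps as timezone-aware UTC. The function names are hypothetical. Clamping negative durations to zero is an assumption of this sketch: it hides a backward step rather than fixing it.

```python
import time
from datetime import datetime, timezone


def elapsed_seconds(start, clock=time.monotonic):
    """Duration since start, read from a monotonic clock and clamped at zero."""
    # Defensive assumption: if the clock reading is somehow earlier than
    # start, report 0.0 instead of a negative duration.
    return max(0.0, clock() - start)


def utc_now():
    """Timezone-aware UTC timestamp, so stored values carry an explicit offset."""
    return datetime.now(timezone.utc)
```

A typical use would be `start = time.monotonic()` before the work and `elapsed_seconds(start)` after it. The `clock` parameter exists only so that the behaviour can be exercised with a fake clock.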
Hardware
Databases
- Without SSI (serializable snapshot isolation), you will have inconsistencies
- Without SSI, you will lose data
- Without a proper consensus algorithm, you will have more than one leader
- With a proper consensus algorithm, you will have issues too
- Without linearizability, clients will time travel
- Without 2PC (two-phase commit), you will have inconsistencies
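
One common way to limit the damage when two nodes both believe they are the leader is a fencing token. This list does not name that technique, so it is introduced here only as an illustration. The toy sketch below assumes that each newly elected leader receives a strictly larger token from whatever elects leaders, and that the storage layer checks the token on every write. The class `FencedStore` is hypothetical.

```python
class FencedStore:
    """Toy store that rejects writes carrying a token older than the newest one seen."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        """Apply the write and return True, or return False if the token is stale."""
        if token < self.highest_token:
            return False  # stale leader: a newer leader has already written
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
store.write(1, "config", "from old leader")      # accepted
store.write(2, "config", "from new leader")      # accepted, token 2 is now the newest
store.write(1, "config", "old leader wakes up")  # rejected, token 1 is stale
```

This does not prevent a second leader from existing. It only keeps the stale one from overwriting newer data.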