The paper CAP Twelve Years Later: How the “Rules” Have Changed (2012) by Eric Brewer is a discussion of the CAP theorem and how it has influenced the development of modern distributed systems. The CAP theorem asserts a hard limitation on the properties a distributed storage system can provide, and since its introduction developers and researchers have based their strategies for distributed storage on this explicit limitation. This paper suggests, however, that this seemingly unavoidable limitation isn’t as hard a requirement as people believe it to be, and that taking a more realistic approach to these problems can lead to better results that don’t conflict with the premise of the CAP theorem.
The CAP theorem states that any networked shared-data system can have at most two of three desirable properties: consistency (C) equivalent to having a single up-to-date copy of the data, high availability (A) of the data (for updates), and tolerance to network partitions (P). This expression of the CAP theorem was introduced to open the minds of designers to a wider range of systems and tradeoffs, pushing developers and researchers to consider the structure of their systems and what is actually possible to achieve in these settings. However, the CAP theorem is an oversimplification of the realities of system development, and by following the “rules” laid out by the CAP theorem, developers and researchers are hamstringing their own efforts to design and create better distributed systems.
The author suggests that the “2 of 3” rule is misleading, noting that the original reason the CAP theorem existed was to highlight the role of partitions in the growing space of cloud computing. When partitions don’t exist, there is usually no reason to forfeit consistency or availability, meaning that in classical systems the CAP theorem isn’t very applicable. When partitions are introduced, however, complications arise that can’t be solved in general. But even in these scenarios, consistency and availability aren’t necessarily binary. Availability is continuous (ranging from 0 to 100 percent) and there are many levels of consistency. Moreover, consistency and availability need not be static: subsystems can make different choices among C, A, and P, and these choices may change depending on the operation, the data, and even the user. As such, the naive understanding of the CAP theorem lacks a complete picture of the situation, and the author asserts that exploring these nuances requires pushing beyond the traditional way of dealing with partitions.
The author states that, operationally, the essence of CAP takes place during a timeout, which is a period when the program must make the partition decision: cancel the operation (and decrease availability) or proceed with the operation (and risk inconsistency). Thus, pragmatically, a partition is a time bound on communication. This view means that there is no global notion of a partition (since some nodes may detect a partition while others do not) and that nodes can detect a partition and enter a partition mode. As a result, designers can set time bounds according to target response times, which allow the designers of these systems to have explicit control over consistency and availability as they need it. Thus, the challenge for designers is to mitigate a partition’s effects on consistency and availability.
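The timeout-as-partition-decision view described above can be sketched as a small function. This is a minimal illustration with hypothetical names (`replicate_write`, `peer.send`, `PARTITION_TIMEOUT` are assumptions, not from the paper): a node attempts to reach its peer within a time bound, and when the bound expires it must either cancel (forfeiting availability) or proceed locally (risking inconsistency).

```python
# Sketch of the partition decision at a timeout (all names hypothetical).
PARTITION_TIMEOUT = 0.5  # seconds; set according to the target response time

def replicate_write(peer, payload, prefer_consistency=True):
    """Try to replicate a write to a peer within the time bound."""
    try:
        peer.send(payload, timeout=PARTITION_TIMEOUT)
        return "committed"            # peer acknowledged in time
    except TimeoutError:
        # The time bound expired: this *is* the partition decision.
        if prefer_consistency:
            return "cancelled"        # forfeit availability, stay consistent
        return "committed-locally"    # stay available, risk inconsistency
```

Note that the decision is purely local: only this node has observed the timeout, which is why there is no global notion of a partition.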
Essentially, it would seem the issue with the CAP theorem, when interpreted directly, is that it omits the context of the system, which is arguably the most important part. Logically, the CAP theorem is not incorrect, but it exists in the vacuum of theory. A real-life distributed system is almost guaranteed to serve some purpose, and the goal of a systems designer is to fulfill these purposes rather than appease the conditions of a theorem. By focusing on the goal of the system, one can better understand how the limitations prescribed by the CAP theorem are not all-encompassing.
Throughout the paper, the author breaks down the management of a partition. He states that this approach has three steps: (1) detect the start of a partition, (2) enter an explicit partition mode that may limit some operations, and (3) initiate partition recovery when communication is restored. By analyzing a partition from this perspective, the author is able to offer various solutions and approaches to these steps which can potentially resolve the practical issues of distributed systems and avoid the limitations of the CAP theorem to some extent.
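The three steps above can be viewed as a node-local state machine. The following is a minimal sketch under that assumption (the state names and transition events are my own illustration, not the author's):

```python
from enum import Enum, auto

class Mode(Enum):
    NORMAL = auto()
    PARTITION = auto()   # step 2: limit some operations
    RECOVERY = auto()    # step 3: reconcile state, compensate for mistakes

class Node:
    """Hypothetical node tracking the three-step partition cycle."""
    def __init__(self):
        self.mode = Mode.NORMAL

    def on_peer_timeout(self):
        # Step 1: a timeout is how this node detects the start of a partition.
        if self.mode is Mode.NORMAL:
            self.mode = Mode.PARTITION

    def on_communication_restored(self):
        # Step 3: communication is back, so begin partition recovery.
        if self.mode is Mode.PARTITION:
            self.mode = Mode.RECOVERY

    def on_recovery_complete(self):
        if self.mode is Mode.RECOVERY:
            self.mode = Mode.NORMAL
```

Each node runs this cycle independently, consistent with the earlier point that some nodes may detect a partition while others do not.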
When a node in a system detects a partition, it can enter a partition mode. Here, the operations of the node can be limited to ensure consistency while maintaining some amount of availability. As an extreme example, when a node detects a partition it can simply refuse to perform any operation, ensuring consistency at the cost of any availability. Realistically, though, it is feasible to imagine limiting the node to operations which are safe to perform despite the partition. What constitutes safe, of course, is dependent on the application.
Consider, as an example, a user-based messaging platform. We can imagine that most messages will not be updated once they’ve been created, so allowing the user to read messages during a partition is a relatively safe procedure. It may also be considered safe to allow users to send messages in a partition since the insertion of one message should not affect the functioning of other users in the app. An example of an operation which may be considered unsafe, though, may be to change properties on channels. If two users are able to change the name of a channel, and these users become partitioned, they may both change the name of the channel simultaneously, resulting in a conflict.
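The safe/unsafe distinction for the messaging example can be sketched as a simple operation gate during partition mode. The operation names and the contents of the safe set below are assumptions chosen to match the example, not a real API:

```python
# Hypothetical operation gating for a messaging node in partition mode.
# Reads and message sends are treated as safe; channel edits are deferred
# because concurrent edits on both sides of a partition can conflict.
SAFE_IN_PARTITION = {"read_messages", "send_message"}

def execute(op, partitioned, handlers):
    """Run op via its handler, refusing unsafe operations while partitioned."""
    if partitioned and op not in SAFE_IN_PARTITION:
        raise PermissionError(f"'{op}' is deferred until the partition heals")
    return handlers[op]()
```

In a real system the gate might queue the unsafe operation for recovery rather than reject it outright; refusing it here keeps the sketch short.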
Once a node resumes communication, it leaves the partition mode and enters a recovery mode. Here, the system can attempt to resolve the operations of the partitioned nodes to resume a consistent global state. The author states that the designer must solve two hard problems during recovery: (1) the state on both sides of the partition must become consistent and (2) there must be compensation for the mistakes made during partition mode.
Going back to our user-based messaging example, a node can be made consistent by inserting (and possibly reordering) messages that were created during the partition. If every message has a timestamp, we can ensure that users are able to read the messages in the correct order they were created, which is the expected functionality of the system.
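This recovery step for the messaging example, merging the message logs from both sides of the partition by timestamp, can be sketched in a few lines (the `(timestamp, text)` representation is an assumption for illustration):

```python
import heapq

def merge_logs(side_a, side_b):
    """Merge two timestamp-sorted message logs from the two sides of a
    partition into one log in creation order. Each message is a
    (timestamp, text) pair, so tuples compare by timestamp first."""
    return list(heapq.merge(side_a, side_b))
```

Because each side's log is already in local order, a sorted merge suffices; no message needs to be rewritten, only reordered.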
On the other hand, we may face difficulties if users have updated data in conflicting ways. In our example, if two users edit the metadata on a channel (e.g., the channel’s name) during a partition, then we may not have a straightforward way of resolving this during recovery. Approaches such as “last writer wins” may be tolerable, but will ultimately result in data loss. Of course, in the example of a messaging app, this data loss may be acceptable.
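A “last writer wins” resolution for the conflicting channel-name edits is trivially small, which is part of its appeal; the `(timestamp, value)` representation is an assumption for illustration:

```python
def last_writer_wins(edit_a, edit_b):
    """Each edit is a (timestamp, value) pair; keep the later one.
    The losing edit is silently discarded, which is exactly the
    data-loss tradeoff of this strategy."""
    return max(edit_a, edit_b, key=lambda e: e[0])
```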
I believe there are many different ways of classifying these systems that can inform designers of how partitions should be managed. This paper offers some examples of such systems and approaches, but doesn’t touch on classification in this respect. From my perspective, there is a disconnect between the theory and the application that can (and should) be mended. For example, formulating a distributed system that attempts to achieve as much consistency and availability as possible almost requires rejecting consideration of specific use cases, since there are infinitely many of them. Solving the problem in general removes this issue, but at the cost of leveraging the properties of those specific systems. On the other hand, building a system for one specific use case means that (a) the system will likely not be reusable and (b) any situation that falls outside these considerations may be improperly handled.
I propose a second layer of refinement to the CAP theorem which acknowledges these potential specifications without losing generality. If we consider the CAP theorem to represent the most idealistic version of a distributed system, then we can possibly branch from this system into weaker versions based on their specifications.
For example, in the user-based messaging example, we can reasonably allow for some level of stochasticity in message timestamps. In such a system, it can seem critical that messages are inserted in the correct relative order; otherwise it may appear that a user is replying to a message sent after the fact. In reality, however, there will almost always be a substantial amount of lag between such messages. Outside the bounds of the system, a user can only reply to a message that they’ve read, which implies that the reply will necessarily come after the original message. On top of this, the user who replies must first read the message, think of a reply, and type the response, which can be enough lag to compensate for any small inconsistencies in timestamps that may occur as a result of partitioning. This can result in a perceived availability from the user which might not actually exist, assuming we can keep latency within a reasonable bound.
This tolerance for temporal stochasticity is one property of weaker distributed systems which may be used to classify applications, and it is a characteristic of many modern web applications. Such applications do not require 100% availability or perfect consistency, but are often forced to work with systems and frameworks which hinder productivity for the sake of reaching theoretical limits. This paper highlights this issue comprehensively, and I hope others take note of the low-hanging fruit being ignored by such systems.