Paxos is a recent breakthrough in consensus algorithms used in distributed systems. But, even with sound logic and theoretical backing, an algorithm can only grow in popularity when practitioners are able to leverage it. With the ever increasing demand for cloud computing services, Paxos can start to really see value in the business sense. The paper ZooKeeper: Wait-free coordination for internet-scale systems by P. Hunt, M. Konar, F. P. Junqueira, and B. Reed (2010) describes a system built on Paxos that attempts to do just this.
ZooKeeper is a service for coordinating processes of distributed applications. The keyword here is “service,” which implies a non-atomic use of this technology. The intention is for ZooKeeper to be used to build other distributed applications, allowing people to leverage the power of Paxos without having to re-implement it from scratch. In contrast to the Paxos implementations found in applications such as Google’s Chubby, ZooKeeper isn’t meant to solve a specific problem but, rather, to provide a framework for others to be able to use (one questions if Google would have used ZooKeeper in-place of building their own Paxos-based database for use with Chubby). The paper provides three main contributions: coordination kernel, coordination recipes, and experience with coordination.
A coordination kernel works to provide flexibility to users of ZooKeeper so that they can provide new primitives without requiring changes to the service core. This allows applications comprised of a large number of processes to use the coordination kernel to manage all of it’s aspects of coordination. Coordination recipes allow ZooKeeper to be used to build higher level coordination primitives. The authors even note that primitives such as blocking can be implemented, which means that Chubby could have been designed using ZooKeeper. As such, ZooKeeper can be seen as a more generalized version of the Paxos implementation used to build Chubby, which speaks for the goal of ZooKeeper being an extendible solution to distributed system coordination.
This paper does a very meticulous job of describing how ZooKeeper works and how it manages to incorporate the fault-tolerance of Paxos in an extensible system that can be used by others without being confined to one specific approach or strategy. They keep it flexible through the use of a hierarchical name space via a set of data nodes which they call znodes. The client manipulates these znodes through the ZooKeeper API, and the underpinning algorithms maintaining the guarantees found with Paxos.
Personally, I think these graphical approach is a welcomed addition to the Paxos ethos (for lack of a better phrase), but the paper fails to emphasize the role the znodes takes in the overall algorithm. I feel a graphical explanation of Paxos may allow for readers less familiar with distributed systems (such as myself) to get a better barring on how this system works and what one can do with it. Instead, it would seem the paper treats znodes as simply an implementation detail rather than an explanation.
The paper also discusses two basic ordering guarantees of ZooKeeper: linearizable writes and FIFO client order. Linearizable writes says that all requests that update the state of ZooKeeper are serializable with respect to precedence and FIFO client order says that all requests from a given client are executed in the order that they were sent by the client.
The linearizability of writes seems to be an important feature of ZooKeeper’s architecture, while the FIFO guarantee seems to be derivative in that it was a choice made to facilitate the linearizable guarantee of the system. They describe a scenario in which we see these two guarantees interact where a new leader takes charge of the system and must change configuration parameters for the system. This process has two important requirements: processes should not use the configuration that is being changed and if the leader dies before the configuration has fully updated, processes should not use the partially changed configuration.
At first glance, this process (and it’s requirements) seem to be naturally handled by Paxos. If the process of updating a configuration can be viewed as an update within the Paxos system, then we’d know that those changes couldn’t be utilized by the system until a majority of nodes commit to the change. However, the situation described here is slightly more complex. The configuration that the leader attempts to change is not an element of the underlying application, but of the system itself. In other words, we can’t guarantee the fault tolerance of Paxos if the Paxos system, itself, is not logically sound, and the issues addressed present such a case.
The authors note that locking, such as those provided by Chubby, can help with the first requirement but can’t help with the second. Instead, ZooKeeper has the new leader designate a path to the ready znode, and other processes will only use the configuration when it exists. The new leader makes the configuration change by deleting the ready, updating the configuration, then creating ready. Because of the ordering guarantee, when a process sees the ready znode, it must also see the configuration changes made by the new leader.
Once again, the use of znodes in this graphical architecture seems to make good use of the intuitive nature of graphs in such an abstract setting. However, the authors fail to elaborate on the mechanics of such a system. I think, in this way, the use of znodes becomes more cumbersome rather than useful: instead of either thinking of this system in an abstract manner or thinking of it in terms of graphs, I’m instead having to make the translation myself. Their use of an example to demonstrate these guarantees, rather than a formal proof or a logical walk-through, adds to the confusion.
As a technical paper on a complex system, ZooKeeper does a great job of explaining the aspects of the system in a tangible way. However, it’s approach seems muddled by having to navigate between different abstract structures to be able to explain it’s mechanisms. Although it may not be feasible (or possible, even), it would be in everyone’s benefit if there were a unified approach to describing this system. This could lead to other researchers adopting this framework, making ZooKeeper’s infrastructure close to the theory and providing the field with a unified description of Paxos.