Times, Clocks, and the Ordering of Events - Review

The paper Time, Clocks, and the Ordering of Events in a Distributed System (1978) by Leslie Lamport describes the concept of logical clocks in distributed systems in order to provide a total ordering of events in the system.  Lamport discusses the partial ordering defined by the “happened before” relation and then extends it through to a consistent total ordering of all events.  Lamport illustrates this algorithm through a simple method for solving synchronization problems, and avoids issues with anomalous behavior introduced when the ordering obtained by the algorithm differs from that perceived by the user by leveraging physical clocks to keep the logical clocks synchronized.

First, Lamport defines the “happened before” relation which does not rely on physical clocks.  The definition depends on a system which is composed of a collection of processes, and each process consists of a sequence of events.  Sending and receiving a message is also considered an event in a process.  Lamport gives a formal definition of this relationship, which says that if a and b are in the same process, then a happened before b if a comes before b in the sequence of events in the process, if a is the sending of a message and b is the receipt of a message then a happened before b, and if a happened before b and b happened before c then a happened before c.  Two events a and b are concurrent if a didn’t happen before b and b didn’t happen before a.

Next, Lamport introduces logical clocks into the system.  He defines a clock for each process to be a function which assigns a number to any event in that process.  The system of clocks is said to be correct if it meets the clock condition, which states that if a happened before b, then the value assigned by the clock to a is less than the value assigned by the clock to b.  He notes that the converse is not necessarily true, and that the clock condition is satisfied if it is true that if a and b are events in the same process and a comes before b then the value assigned to a is less than the value assigned to b and if a is the sending of a message by one process and b is receiving the message by another process, then the value assigned to a is less than the value assigned to b.  Lamport suggests that the clock condition can be guranteed if we implement clocks following two rules: each process increments it’s clock value between any two successive events and if event a is the sending of a message then the message will contain the clock assignment for a, too, and if event b is the receipt of the message, then the clock assignment for b is set to a value greater than or equal to that of b’s processes current clock value and greater than that of the clock value for a.  By following this implementation, one can yield a total ordering over all events in the system, which can be used to solve non-trivial tasks for distributed systems.

Lamport gives an example of an algorithm that leverages the use of logical clocks to produce a total ordering, but notes anomalous behaviour that can happen as a result.  He gives an example of a user who sends a request to a remote computer, then tells another use from a third computer to send a message to the same remote computer.  Due to the nature of logical clocks, it is possible for the second request to receive a lower timestamp than the first request.  This happens because the system has no way of knowing that the first event actually preceded the second event since that information is based on messaging external to the system (i.e., the first user telling the second user to send the second message).  Lamport introduces the concept of a strong clock condition, which, although similar to the previous clock condition, utilizes external information which can provide us with the information needed to avoid such anomalous scenarios.

Lamport, then, turns his attention to physical clocks in order to provide the needed information to satisfy the strong clock condition.  For this to work, the physical clocks in question need to satisfy two conditions: that their rate of change is similar and that their actual values are similar.  To ensure this, we must find a value μ such that if event a which occurs at time t happened before b, then then b occurs at a later physical time than t + μ (i.e., μ is less than the shortest transmission time for the interprocess messages).  To avoid anomalous behavior, we need to ensure that the clock value assigned at t + μ is greater than that assigned at t.

In order to ensure that the strong clock conditions hold, Lamport defines the total delay of the message as the difference between the send time and receive time and assumes that each receiving process knows some minimum delay such that the total delay is less than or equal to the minimum delay.  A message is sent along with it’s timestamp, and the receiving process assigns a clock value equal to the maximum of either the processes perceived timestamp or the received timestamp plus the minimum delay.  Lamport is able to prove that, by following these rules, anomalous behavior can be avoided.

One thing that stands out to me about this paper is the use of rather simple and intuitive ideas to construct very strong and useful algorithms that are still being used today over 40 years later.  Obviously much work has been done since the introduction of logical clocks, but many of these concepts seem to stand on their own in our current systems.

I question how (or if) more modern techniques and approaches can be used to supplement these algorithms.  Specifically, Lamport’s use of a minimum (and unpredictable) delay seems to be a possible point of inefficiency.  Although we have access to more accurate physical clocks, it would seem that Lamport’s suggestion is static and provides an upper bound for error which may be ineffective in settings where the delay can be significant.

To overcome this, perhaps each process can utilize machine learning to provide a more accurate estimate of the unpredictable delay.  Global state information can be utilized to predict the value (and possibly bounds on error), and supplemental information acquired through the transmission process (e.g., appended to the sent state information by routers, etc.) can be leveraged to produce a more accurate value.  In tasks where knowing the precise delay is important, this approach may be suitable and flexible.

Show Comments