Permission is granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) at http://creativecommons.org/licenses/by-nc-sa/4.0.

Table of Contents

List of Figures

   Figure 2.1 [code/hello1.hny] Hello World!
   Figure 2.2 [code/hello3.hny] Harmony program with two possible outputs
   Figure 2.3 [code/hello4.hny] Harmony program with an infinite number of possible outputs
   Figure 2.4 Demonstrating Harmony methods and threads
   Figure 2.5 [code/hello7.hny] Various interleavings of threads
   Figure 2.6 [code/hello8.hny] Making groups of operations atomic reduces interleaving
   Figure 2.7 [code/triangle.hny] Computing triangle numbers
   Figure 2.8 Running the code in Figure 2.7
   Figure 2.9 Running the code in Figure 2.7 for N = 100
   Figure 3.1 A sequential and a concurrent program
   Figure 3.2 The output of running the code in Figure 3.1(b)
   Figure 3.3 [code/Up.hny] Incrementing the same variable twice in parallel
   Figure 3.4 [code/Upr.hny] What actually happens in Figure 3.3
   Figure 3.5 The output of running the code in Figure 3.3
   Figure 3.6 [code/Upf.hny] Demonstrating the finally clause
   Figure 3.7 [python/Up.py] Python implementation of Figure 3.3
   Figure 3.8 [python/UpMany.py] Using Python to increment N times
   Figure 4.1 The first part of the HVM bytecode corresponding to Figure 3.3
   Figure 4.2 The HTML output of running Harmony on Figure 3.3
   Figure 4.3 [code/UpEnter.hny] Incorrect attempt at fixing the code of Figure 3.3
   Figure 5.1 [code/csbarebones.hny] Modeling a critical section
   Figure 5.2 [code/cs.hny] Harmony model of a critical section
   Figure 5.3 [code/naiveLock.hny] Naïve implementation of a shared lock and the markdown output of running Harmony
   Figure 5.4 [code/naiveFlags.hny] Naïve use of flags to solve mutual exclusion
   Figure 5.5 [code/naiveTurn.hny] Naïve use of turn variable to solve mutual exclusion
   Figure 5.6 [code/Peterson.hny] Peterson's Algorithm
   Figure 5.7 [code/csonebit.hny] Mutual exclusion using a flag per thread
   Figure 6.1 [code/PetersonMethod.hny] Peterson's Algorithm accessed through methods
   Figure 6.2 [code/hanoi.hny] Towers of Hanoi
   Figure 6.3 [code/clock.hny] Harmony program that finds page replacement anomalies
   Figure 7.1 [code/lock.hny] Specification of a lock
   Figure 7.2 [code/lock_demo.hny] Using a lock to implement a critical section
   Figure 7.3 [code/UpLock.hny] Figure 3.3 fixed with a lock
   Figure 8.1 [code/spinlock.hny] Mutual Exclusion using a "spinlock" based on atomic swap
   Figure 8.2 [code/lock_tas.hny] Implementation of the lock specification in Figure 7.1 using a spinlock based on test-and-set
   Figure 8.3 [code/lock_ticket.hny] Implementation of the lock specification in Figure 7.1 using a ticket lock
   Figure 8.4 [code/lock_test1.hny] A test program for locks (based on Figure 5.2)
   Figure 8.5 [modules/lock_susp.hny] Lock implementation using suspension
   Figure 8.6 [code/xy.hny] Incomplete code for Exercise 8.2 with desired invariant x + y = 100
   Figure 8.7 [code/atm.hny] Withdrawing money from an ATM
   Figure 9.1 A sequential and a concurrent specification of a queue
   Figure 9.2 [code/queue_test1.hny] Using a concurrent queue
   Figure 9.3 [code/queue_lock.hny] An implementation of a concurrent queue data structure and a depiction of a queue with three elements
   Figure 9.4 [code/queue_MS.hny] A queue with separate locks for enqueuing and dequeuing items and a depiction of a queue with two elements
   Figure 9.5 [code/setobj.hny] Specification of a concurrent set object
   Figure 9.6 [code/setobj_test1.hny] Test code for set objects
   Figure 9.7 [code/setobj_linkedlist.hny] Implementation of a set of values using a linked list with fine-grained locking
   Figure 10.1 [code/queue_test_seq.hny] Sequential queue test
   Figure 10.2 [code/queue_btest2.hny] Concurrent queue test. The behavior DFA is for N_PUT = N_GET = 1.
   Figure 10.3 [python/queue_btest2.py] Python implementation of Figure 10.2
   Figure 10.4 [code/queue_broken1.hny] A correct sequential but not a correct concurrent queue implementation
   Figure 11.1 [code/queue_broken2.hny] A buggy queue implementation
   Figure 11.2 Running Figure 10.2 against Figure 11.1
   Figure 11.3 HTML output of Figure 11.2 but for N_PUT=2 and N_GET=1
   Figure 11.4 [code/queue_fix.hny] Queue implementation with hand-over-hand locking
   Figure 12.1 [code/rwlock.hny] Specification of reader/writer locks
   Figure 12.2 [code/rwlock_test1.hny] Test code for reader/writer locks
   Figure 12.3 [code/rwlock_cheat.hny] "Cheating" reader/writer lock
   Figure 12.4 [code/rwlock_btest.hny] A behavioral test of reader/writer locks
   Figure 12.5 [code/rwlock_busy.hny] Busy waiting reader/writer lock
   Figure 12.6 [code/boundedbuffer.hny] Bounded buffer specification
   Figure 13.1 [code/rwlock_cv.hny] Reader/Writer Lock using Mesa-style condition variables
   Figure 13.2 [modules/synch.hny] Implementation of condition variables in the synch module
   Figure 13.3 [code/gpu.hny] A thread-unsafe GPU allocator
   Figure 13.4 [code/qsort.hny] Iterative qsort() implementation
   Figure 13.5 [code/qsorttest.hny] Test program for Figure 13.4
   Figure 14.1 [code/rwlock_cv_fair.hny] Reader/Writer Lock implementation addressing fairness (part 1)
   Figure 14.2 [code/rwlock_cv_fair.hny] Reader/Writer Lock implementation addressing fairness (part 2)
   Figure 15.1 [code/Diners.hny] Dining Philosophers
   Figure 15.2 [code/DinersCV.hny] Dining Philosophers that grab both forks at the same time
   Figure 15.3 [code/DinersAvoid.hny] Dining Philosophers solution that avoids getting into a deadlock by allowing at most N-1 philosophers to start eating at a time
   Figure 15.4 [code/bank.hny] Bank accounts
   Figure 16.1 Depiction of three actors. The producer does not receive messages.
   Figure 16.2 [code/counter.hny] An illustration of the actor approach
   Figure 17.1 [code/barrier_test1.hny] Test program for Figure 17.2
   Figure 17.2 [code/barrier.hny] Barrier implementation
   Figure 17.3 [code/barrier_test2.hny] Demonstrating the double-barrier pattern
   Figure 17.4 [code/bsort.hny] Parallel bubble sort
   Figure 18.1 [code/rollercoaster.hny] Modeling a roller coaster
   Figure 19.1 [code/file.hny] Specification of the file system
   Figure 19.2 [code/file_btest.hny] Test program for a concurrent file system
   Figure 19.3 [code/disk.hny] Specification of a disk
   Figure 19.4 The file system data structure: (a) disk layout (1 superblock, n blocks, m inode blocks, 4 inodes per block); (b) inode for a file with 3 data blocks
   Figure 19.5 [code/file_inode.hny] File system implementation preamble
   Figure 19.6 [code/file_inode.hny] File system interface implementation
   Figure 19.7 [code/file_inode.hny] File server and worker threads
   Figure 19.8 [code/file_inode.hny] File system initialization
   Figure 19.9 [code/file_inode.hny] File system utility functions
   Figure 19.10 [code/file_inode.hny] Handling of file read requests
   Figure 19.11 [code/file_inode.hny] Handling of file write and delete requests
   Figure 20.1 [code/trap.hny] How to use trap
   Figure 20.2 [code/trap2.hny] A race condition with interrupts
   Figure 20.3 [code/trap3.hny] Locks do not work with interrupts
   Figure 20.4 [code/trap4.hny] Disabling and enabling interrupts
   Figure 20.5 [code/trap5.hny] Example of an interrupt-safe method
   Figure 20.6 [code/trap6.hny] Code that is both interrupt-safe and thread-safe
   Figure 21.1 [code/hw.hny] Non-blocking queue
   Figure 22.1 [code/abp.hny] Alternating Bit Protocol
   Figure 22.2 [code/abptest.hny] Test code for alternating bit protocol
   Figure 23.1 [code/leader.hny] A leader election protocol on a ring
   Figure 24.1 [code/2pc.hny] Two Phase Commit protocol: code for banks
   Figure 24.2 [code/2pc.hny] Two Phase Commit protocol: code for transaction coordinators
   Figure 25.1 [code/rsm.hny] Replicated State Machine
   Figure 25.2 The DFA generated by Figure 25.1 when NOPS=2 and NREPLICAS=2
   Figure 25.3 [code/chain.hny] Chain Replication (part 1)
   Figure 25.4 [code/chain.hny] Chain Replication (part 2)
   Figure 26.1 [code/chainaction.hny] Chain Replication specification using actions (part 1)
   Figure 26.2 [code/chainaction.hny] Chain Replication specification using actions (part 2)
   Figure 27.1 [code/register.hny] An atomic read/write register
   Figure 27.2 [code/abdtest.hny] Behavioral test for atomic read/write registers and the output for the case that NREADERS = NWRITERS = 1
   Figure 27.3 [code/abd.hny] An implementation of a replicated atomic read/write register
   Figure 28.1 [code/consensus.hny] Distributed consensus code and behavior DFA
   Figure 28.2 [code/bosco.hny] A crash-tolerant consensus protocol
   Figure 28.3 The behavior DFA for Figure 28.2
   Figure 28.4 [code/bosco2.hny] Reducing the state space
   Figure 29.1 [code/paxos.hny] A version of the Paxos protocol
   Figure 30.1 [code/needhamschroeder.hny] Needham-Schroeder protocol and an attack
   Figure B.1 Using save and go to implement fork()
   Figure B.2 [code/stacktest.hny] Testing a stack implementation
   Figure B.3 [code/stack1.hny] Stack implemented using a dynamically updated list
   Figure B.4 [code/stack2.hny] Stack implemented using static lists
   Figure B.5 [code/stack3.hny] Stack implemented using a recursive tuple data structure
   Figure B.6 [code/stack4.hny] Stack implemented using a linked list
   Figure G.1 [code/queuelin.hny] Queue implementation with linearization points
   Figure G.2 [code/qtestconc.hny] Concurrent queue test
   Figure I.1 Venn diagram classifying all states and a trace
   Figure J.1 [code/rwlock_sbs.hny] Reader/Writer Lock using Split Binary Semaphores
   Figure J.2 [modules/hoare.hny] Implementation of Hoare monitors
   Figure J.3 [code/boundedbuffer_hoare.hny] Bounded Buffer implemented using a Hoare monitor

Chapter 1. On Concurrent Programming

Programming with concurrency is hard. Concurrency can make programs faster than sequential ones, but having multiple threads read and update shared variables concurrently and synchronize with one another makes programs more complicated than programs where only one thing happens at a time. Why are concurrent programs more complicated than sequential ones? There are at least two reasons:

  • Concurrent programs are non-deterministic: the instructions of different threads can be interleaved in many different ways, so the same program can behave differently from one run to the next.
  • Statements that look atomic in the source code are not: even incrementing a shared variable compiles to multiple machine instructions, and another thread can run in between them.

The lack of determinism and atomicity in concurrent programs makes them not only hard to reason about, but also hard to test. Running the same test of concurrent code twice is likely to produce two different results. More problematically, a test may trigger a bug only in certain "lucky" executions. Due to the probabilistic nature of concurrent code, some bugs may be highly unlikely to be triggered even when a test is run millions of times. And even if a bug is triggered, its source may be hard to find because the failure is hard to reproduce.

This book is intended to help people understand and develop concurrent code, which includes programs for distributed systems. In particular, it comes with a tool called Harmony that helps with testing concurrent code. The approach is based on model checking [8]: instead of relying on luck, Harmony runs all possible executions of a particular test program. So, even if a bug is unlikely to occur, Harmony will find it if the test can expose it. Moreover, when a bug is found, the model checker shows precisely how to trigger it and tries to minimize the number of steps needed to do so.

Model checking is not a replacement for formal verification. Formal verification proves that a program is correct. Model checking only verifies that a program is correct for some model. Think of a model as a test program. Because model checking tries every possible execution, the test program needs to be simple---otherwise checking may take longer than we care to wait, or run out of memory. In particular, the model needs to have a relatively small number of reachable states.

If model checking does not prove a program correct, why is it useful? To answer that question, consider a sorting algorithm. Suppose we create a test program, a model, that tries sorting all lists of up to five numbers chosen from the set { 1, 2, 3, 4, 5 }. Model checking proves that for those particular scenarios the sorting algorithm works: the output is a sorted permutation of the input. In some sense it is an excellent test: it considers all corner cases, including lists where all numbers are the same, lists that are already sorted or sorted in reverse order, and so on. If there is a bug in the sorting algorithm, most likely it will be triggered, and the model checker will produce a scenario that makes it easy to find the source of the bug.

However, if the model checker does not find any bugs, we do not know for sure that the algorithm works for lists of more than five numbers or for lists with values other than the numbers 1 through 5. Still, we would expect the likelihood of bugs remaining in the sorting algorithm to be small. That said, it would be easy to write a program that sorts all lists of up to five numbers correctly but fails to do so for a list of six numbers. (Hint: simply use an if statement.) The model checker would not find a bug in this adversarial program.
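
To make this concrete, here is a minimal sketch of what such a model might look like in Harmony. The counting-style sort and all names here are illustrative assumptions rather than code from this book; the point is the shape of the model: construct an arbitrary input with choose, run the algorithm, and assert the desired property.

def sort(a) returns result:       # a deliberately simple sort for values 1..5
    result = []
    for v in [1, 2, 3, 4, 5]:     # consider the values in increasing order
        for x in a:
            if x == v:
                result += [v,]

# build a nondeterministic input: any list of up to five numbers from 1..5
input = []
while ((len input) < 5) and choose({ False, True }):
    input += [choose({ 1, 2, 3, 4, 5 }),]

output = sort(input)
for i in {0..(len output) - 2}:   # check that the output is sorted
    assert output[i] <= output[i+1]

A complete model would also assert that the output is a permutation of the input; that check is omitted here for brevity.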

While model checking does not in general prove an algorithm correct, it can help with proving an algorithm correct. The reason is that many correctness properties can be proved using invariants: predicates that must hold for every state in the execution of a program. A model checker can find violations of proposed invariants when evaluating a model and provide valuable early feedback to somebody who is trying to construct a proof, even an informal one. We will include examples of such invariants as they often provide excellent insight into why a particular algorithm works.
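
For example, here is a tiny sketch of a Harmony model with an invariant (a preview of features covered in later chapters; the names are illustrative). The model checker verifies the invariant in every reachable state, across all interleavings of the two threads:

count = 0
invariant count in { 0, 1, 2 }    # must hold in every reachable state

def incr():
    atomically count += 1         # increment count atomically

spawn incr()
spawn incr()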

So, what is Harmony? Harmony is a concurrent programming and specification language with a model checker. It was designed to teach the basics of concurrent and distributed programming, but it is also useful for testing new concurrent algorithms or even sequential and distributed algorithms. Harmony programs are not intended to be "run" like programs in most other programming languages---instead Harmony programs are model checked to test that the program has certain desirable properties and does not suffer from bugs.

The syntax and semantics of the Harmony programming language are similar to that of Python. Python is familiar to many programmers and is easy to learn and use. We will assume that the reader is familiar with the basics of Python programming. We also will assume that the reader understands some basics of machine architecture and how programs are executed. For example, we assume that the reader is familiar with the concepts of CPU, memory, register, stack, and machine instructions.

Harmony is heavily influenced by Leslie Lamport's work on TLA+, TLC, and PlusCal [29, 30], Gerard Holzmann's work on Promela and SPIN [25], and University of Washington's DSLabs system [35]. Some of the examples in this book are derived from those sources. Harmony is designed to have a lower learning curve than those systems, but may not be as powerful for specific problems. When you finish this book and want to learn more, we strongly encourage checking those out. Another excellent resource is Fred Schneider's book ``On Concurrent Programming'' [40]. (This chapter is named after that book.)

The book proceeds as follows:

If you already know about concurrent and distributed programming and are just interested in a "speed course" on Harmony, I would recommend reading Chapter 2, Chapter 4, Chapter 6, Chapter 7, and Chapter 9. The appendices contain various details about Harmony itself, including an appendix on convenient Harmony modules (Appendix B), and an appendix that explains how Harmony works (Appendix D).

Chapter 2. Hello World!

print "hello world"

Figure 2.1. [code/hello1.hny] Hello World!

The first programming book that I read cover to cover was The C Programming Language (first edition) by Brian W. Kernighan and Dennis M. Ritchie; that was around 1980. I did not know at the time that 10 years later Dennis, the designer of the C programming language, would be my boss at AT&T Bell Labs in Murray Hill, NJ, while Brian would be my colleague in the same lab. The first C program in the book printed the string "hello, world". Since then, most programming tutorials for pretty much any programming language have started with that example.

Harmony, too, has a Hello World program. Figure 2.1 shows the program and the corresponding output. After installation (see https://harmony.cs.cornell.edu), you can run it as follows from the command line:

$ harmony code/hello1.hny
Try it out (here $ represents a shell prompt). For this to work, make sure harmony is in your command shell's search path. The code for the examples in this book can be found in the code folder under the name listed in the caption of each example. If you need to, you can download the sources separately from https://harmony.cs.cornell.edu/sources.zip. In this case, the file code/hello1.hny contains the code in Figure 2.1. The output is a deterministic finite automaton (DFA). The green circle represents the initial state and the double circle represents the final state. There is one transition, labeled with the string "hello world". The DFA describes (or recognizes) all possible outputs that the program can generate. In this case, there is only one.

But programs can usually have more than one execution and can therefore produce multiple different outputs. This is usually the result of different inputs, but Harmony programs do not have inputs. Instead, Figure 2.2 demonstrates nondeterministic choice in Harmony programs. In this case, the program chooses to print either "hello" or "world". The corresponding DFA captures both possibilities. You can think of the choose operator as enumerating all possible inputs to the program.

Figure 2.3 shows a program that has an infinite number of possible outputs by using a loop with a non-deterministic stopping condition. Harmony usually requires that any program must be able to terminate, so the loop is conditioned on a nondeterministic choice between False and True. The possible outputs consist of zero or more copies of the string "hello world". Note that this single-state DFA (where the initial state and the final state happen to be the same) captures an infinite number of possible executions of the program.

print choose { "hello", "world" }

Figure 2.2. [code/hello3.hny] Harmony program with two possible outputs

while choose { False, True }:
    print "hello world"

Figure 2.3. [code/hello4.hny] Harmony program with an infinite number of possible outputs

def p(s):
    print s

p("hello")
p("world")

(a) [code/hello5.hny]

def p(s):
    print s

spawn p("hello")
spawn p("world")

(b) [code/hello6.hny]

Figure 2.4. Demonstrating Harmony methods and threads

Figure 2.4 demonstrates methods and threads in Harmony. In Figure 2.4(a), the code simply prints the strings "hello" and "world", in that order. Notice that this leads to an intermediate state after "hello" is printed but before "world" is. However, there is still only one execution possible. Figure 2.4(b) shows two threads, one printing "hello" and one printing "world". Because the threads run concurrently, the program can either output "hello world" or "world hello". Printing in Harmony is atomic, so "hweolrllod" is not a possible output.

def hello(name):
    print "hello"
    print name

spawn hello("Lesley")
spawn hello("Robbert")

Figure 2.5. [code/hello7.hny] Various interleavings of threads

def hello(name):
    atomically:
        print "hello"
        print name

spawn hello("Lesley")
spawn hello("Robbert")

Figure 2.6. [code/hello8.hny] Making groups of operations atomic reduces interleaving

Figure 2.5 shows two threads, one printing the strings "hello" and "Robbert", while the other prints "hello" and "Lesley". Now there are four possible outputs depending on how the two threads are interleaved, including "hello hello Lesley Robbert". This is probably not what the programmer wanted. Figure 2.6 shows another important feature of Harmony: atomic blocks. The program is similar to Figure 2.5, but the programmer specified that the two print statements in a thread should be executed as an atomic unit. As a result, there are only two interleavings possible.

Harmony is a programming language that borrows much of Python's syntax. Like Python, Harmony is an imperative, dynamically typed programming language. There are also some important differences:

  • Harmony programs are model checked rather than executed: Harmony explores every possible execution of a program.
  • Harmony supports nondeterministic choice through the choose operator.
  • Harmony methods return their result through a declared result variable; there is no return statement.
  • Harmony has no floating point numbers, and lists and tuples are the same type.
  • Harmony is not object-oriented: objects do not have built-in methods.

There are also less important differences that you will discover as you get more familiar with programming in Harmony.

const N = 10

def triangle(n) returns result:   # computes the n'th triangle number
    result = 0
    for i in {1..n}:     # for each integer from 1 to n inclusive
        result += i      # add i to result

x = choose {0..N}        # select an x between 0 and N inclusive
assert triangle(x) == ((x * (x + 1)) / 2)

Figure 2.7. [code/triangle.hny] Computing triangle numbers

$ harmony --noweb code/triangle.hny
  • Phase 1: compile Harmony program to bytecode
  • Phase 2: run the model checker (nworkers = 6)
    • 13 states (time 0.00s, mem=0.000GB)
    • 12/12 computations/edges
    • 2 rounds, 98 values
  • Phase 3: Scan states
  • Phase 4: Further analysis
    • cycles impossible
    • No issues found
  • Phase 4a: convert Kripke structure to DFA
  • Phase 5: write results to code/triangle.hco
  • Phase 6: loading code/triangle.hco

open file://.../code/triangle.htm for detailed information

Figure 2.8. Running the code in Figure 2.7

$ harmony -c N=100 --noweb code/triangle.hny
  • Phase 1: compile Harmony program to bytecode
  • Phase 2: run the model checker (nworkers = 6)
    • 103 states (time 0.00s, mem=0.000GB)
    • 102/102 computations/edges
    • 2 rounds, 728 values
  • Phase 3: Scan states
  • Phase 4: Further analysis
    • cycles impossible
    • No issues found
  • Phase 4a: convert Kripke structure to DFA
  • Phase 5: write results to code/triangle.hco
  • Phase 6: loading code/triangle.hco

open file://.../code/triangle.htm for detailed information

Figure 2.9. Running the code in Figure 2.7 for N = 100

Figure 2.7 shows another example of a Harmony program. The example is a sequential program with a method triangle that takes an integer as argument. The method declares a variable called result that eventually contains the result of the method (there is no return statement in Harmony). The method also has a bound variable called n containing the value of the argument. The { x..y } notation represents a set containing the integers from x to y (inclusive). (Harmony does not have a range operator like Python.) The last two lines in the program are the most interesting. The first assigns to x some unspecified value in the range 0..N and the second verifies that triangle(x) equals x(x + 1)/2.

Running this Harmony program (Figure 2.8) will try all possible executions, which includes all possible values for x. The --noweb flag tells Harmony not to automatically pop up the web browser window. The text output from running Harmony is in Markdown format.

The assert statement checks that the output is correct. If the program is correct, Harmony reports the size of the "state graph" (13 states in this case). If not, Harmony also reports what went wrong, typically by displaying a summary of an execution in which something went wrong.

In Harmony, each constant has a default value specified in the program, but it can be overridden on the command line using the -c option. Figure 2.9 shows how to test the code for N = 100.

Exercises

Chapter 3. The Problem of Concurrent Programming

shared = True

def f(): assert shared
def g(): shared = False

f()
g()

(a) [code/prog1.hny] Sequential

shared = True

def f(): assert shared
def g(): shared = False

spawn f()
spawn g()

(b) [code/prog2.hny] Concurrent

Figure 3.1. A sequential and a concurrent program

  • Phase 1: compile Harmony program to bytecode
  • Phase 2: run the model checker (nworkers = 6)
    • 6 states (time 0.00s, mem=0.000GB)
    • 4/5 computations/edges
    • 3 rounds, 33 values
  • Phase 3: Scan states
  • Phase 4: Further analysis
    • cycles impossible
  • Phase 5: write results to code/prog2.hco
    • Safety Violation
  • Phase 6: loading code/prog2.hco

Summary: something went wrong in an execution

Here is a summary of an execution that exhibits the issue:

  • Schedule thread T0: init()
    • Line 1: Initialize shared to True
    • Thread terminated
  • Schedule thread T2: g()
    • Line 4: Set shared to False (was True)
    • Thread terminated
  • Schedule thread T1: f()
    • Line 3: Harmony assertion failed

open file://.../code/prog2.htm for detailed information

Figure 3.2. The output of running the code in Figure 3.1(b)

Concurrent programming, aka multithreaded programming, involves multiple threads running in parallel while sharing variables. Figure 3.1 shows two programs. Program (a) is sequential. It sets shared to True, asserts that shared is True, and finally sets shared to False. If you run the program through Harmony, it will not find any problems, because there is only one possible execution and, in that execution, (1) the assertion does not fail and (2) the execution terminates. Program (b) is concurrent---it executes methods f() and g() in parallel. If method g() runs and completes before f(), then the assertion in f() will fail when f() runs. This problem is an example of non-determinism: methods f() and g() can run in either order. In one order the assertion fails, while in the other it does not. But since Harmony checks all possible executions, it will always find the problematic one.

Figure 3.2 shows the output of running Figure 3.1(b) through Harmony. Underneath the line, there is a summary of what happened in one of the executions. First, the initialization thread (T0) runs and sets the global variable shared to True. Then, the thread running g() runs to completion, setting shared to False. Finally, the thread running f() runs, and the assertion fails.

Figure 3.3 presents a more subtle example that illustrates non-atomicity. The program initializes two shared variables: an integer count and an array done with two booleans. The program then spawns two threads. The first runs incrementer(0); the second runs incrementer(1).

Method incrementer takes a parameter called self. It increments count and sets done[self] to True. It then waits until the other thread is done. (await c is shorthand for while not c: pass.) After that, method incrementer verifies that the value of count equals 2.

Note that although the threads are spawned one at a time, they will execute concurrently. It is, for example, quite possible that incrementer(1) finishes before incrementer(0) even gets going. And because Harmony tries every possible execution, it will consider that particular execution as well. What would the value of count be at the end of that execution?

count = 0
done = [ False, False ]

def incrementer(self):
    count = count + 1
    done[self] = True
    await done[1 - self]
    assert count == 2

spawn incrementer(0)
spawn incrementer(1)

Figure 3.3. [code/Up.hny] Incrementing the same variable twice in parallel

  • Before you run the program, what do you think will happen? Is the program correct in that count will always end up being 2? (You may assume that load and store instructions of the underlying virtual machine architecture are atomic (indivisible)---in fact they are.)
count = 0
done = [ False, False ]

def incrementer(self):
    var register = count    # load shared variable count into a private register
    register += 1           # increment the register
    count = register        # store its value into variable count
    done[self] = True
    await done[1 - self]
    assert count == 2

spawn incrementer(0)
spawn incrementer(1)

Figure 3.4. [code/Upr.hny] What actually happens in Figure 3.3

What is going on is that the Harmony program is compiled to machine instructions, and it is the machine instructions that are executed by the underlying Harmony machine. The details of this appear in Chapter 4, but suffice it to say that the machine has instructions that load values from memory and store values into memory. Importantly, it does not have instructions to atomically increment or decrement values in shared memory locations. So, to increment a value in memory, the machine must do at least three machine instructions. Figure 3.4 illustrates this. (The var statement declares a new local variable register.) Conceptually, the machine

  1. loads the value of count from its memory location into a register;
  2. adds 1 to the register;
  3. stores the new value into the memory location of count.
When running multiple threads, each thread essentially runs its own instantiation of the machine, and the threads do so in parallel. As they execute, their machine instructions are interleaved in unspecified and often unpredictable ways. A program is correct if it works for any interleaving of threads. Harmony will try all possible interleavings of the threads executing machine instructions.

If the threads run one at a time to completion, then count will be incremented twice and end up being 2. However, the following is also a possible interleaving of incrementer(0) and incrementer(1):

  1. incrementer(1) loads the value of count, which is 0;
  2. incrementer(0) loads the value of count, which is still 0;
  3. incrementer(0) adds 1 to the value that it loaded (0), and stores 1 into count;
  4. incrementer(1) adds 1 to the value that it loaded (0), and stores 1 into count;
  5. incrementer(1) sets done[1] to True;
  6. incrementer(0) sets done[0] to True.
The result of this particular interleaving is that count ends up being 1. This is known as a race condition. When you run Harmony, it reports violations of assertions and provides an example of an interleaving, like the one above, in which an assertion fails. Figure 3.5 shows the output of running Figure 3.3 through Harmony.

Summary: something went wrong in an execution

Here is a summary of an execution that exhibits the issue:

  • Schedule thread T0: init()
    • Line 1: Initialize count to 0
    • Line 2: Initialize done to [ False, False ]
    • Thread terminated
  • Schedule thread T2: incrementer(1)
    • Preempted in incrementer(1) about to store 1 into count in line 5
  • Schedule thread T1: incrementer(0)
    • Line 5: Set count to 1 (was 0)
    • Line 6: Set done[0] to True (was False)
    • Preempted in incrementer(0) about to load variable done[1] in line 7
  • Schedule thread T2: incrementer(1)
    • Line 5: Set count to 1 (unchanged)
    • Line 6: Set done[1] to True (was False)
    • Preempted in incrementer(1)
  • Schedule thread T1: incrementer(0)
    • Line 8: Harmony assertion failed

Figure 3.5. The output of running the code in Figure 3.3

If one thinks of the assertion as providing the specification of the program, then clearly its implementation does not satisfy its specification. Either the specification or the implementation (or both) must have a bug. We could change the specification by changing the assertion as follows:

assert (count == 1) or (count == 2)
This would fix the issue,[footnote: Actually, Harmony still complains, this time about a data race, about which you will learn in Chapter 4.] but more likely it is the program that must be fixed, not the specification.

count = 0

finally count == 2

def incrementer():
    count = count + 1

spawn incrementer()
spawn incrementer()

Figure 3.6. [code/Upf.hny] Demonstrating the finally clause.

Figure 3.3 uses flags done[0] and done[1] to check if both threads have finished incrementing count. Harmony provides a more convenient way to check if some condition holds when all threads have terminated. Figure 3.6 demonstrates the Harmony finally clause. The finally clause is like the assert clause, but the condition is only checked when all threads have finished. This eliminates the need for a shared variable like done, simplifies the code, and makes the intention clearer.

The exercises below have you try the same thing (having threads concurrently increment an integer variable) in Python. As you will see, the bug is not easily triggered when you run a Python version of the program. But in Harmony Murphy's Law applies: if something can go wrong, it will. Usually that is not a good thing, but in Harmony it is. It allows you to find bugs in your concurrent programs much more easily than using a conventional programming environment.

Exercises

import threading

count = 0
done = [ False, False ]

def incrementer(self):
    global count
    count = count + 1
    done[self] = True
    while not done[1 - self]:
        pass
    assert count == 2

threading.Thread(target=incrementer, args=(0,)).start()
threading.Thread(target=incrementer, args=(1,)).start()

Figure 3.7. [python/Up.py] Python implementation of Figure 3.3

import threading

N = 1000000
count = 0
done = [ False, False ]

def incrementer(self):
    global count
    for i in range(N):
        count = count + 1
    done[self] = True
    while not done[1 - self]:
        pass
    assert count == 2*N, count

threading.Thread(target=incrementer, args=(0,)).start()
threading.Thread(target=incrementer, args=(1,)).start()

Figure 3.8. [python/UpMany.py] Using Python to increment N times

Chapter 4. The Harmony Virtual Machine

Harmony programs are compiled to Harmony bytecode (a list of machine instructions for a virtual machine), which in turn is executed by the Harmony virtual machine (HVM). The Harmony compiler, harmony, places the bytecode for file x.hny in file x.hvm. The model checker (called Charm) executes the code in x.hvm and places its output in a file called x.hco. From the x.hco file, harmony creates a detailed human-readable output file in x.hvb and an interactive HTML file called x.htm. The x.htm file is automatically opened in your default web browser unless you specify the --noweb flag to harmony.

To understand the problem of concurrent computing, it is important to have a basic understanding of machine instructions, and in our case those of the HVM.

Harmony Values

Harmony programs, and indeed the HVM, manipulate Harmony values. Harmony values are recursively defined: they include booleans (False and True), integers (but not floating point numbers), strings (enclosed by single or double quotes), sets and lists of Harmony values, and dictionaries that map Harmony values to other Harmony values. Strings that start with a letter or an underscore and contain only letters, digits, and underscores can be written without quotes by preceding them with a dot. So, .example is the same string as "example".
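
As a quick illustration, the following assertion holds:

assert .example == "example"    # a dot-string is just an ordinary string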

A dictionary maps keys to values. Unlike Python, which requires that keys must be hashable, any Harmony value can be a key, including another dictionary. Dictionaries are written as {k0 : v0, k1 : v1, ...} . The empty dictionary is written as {:}. If d is a dictionary, and k is a key, then the following expression retrieves the Harmony value that k maps to in d:

d k
The meaning of d a b ... is (((d a) b) ... ). This notation is unfamiliar to Python programmers, but in Harmony square brackets can be used in the same way as parentheses, so you can express the same thing in the form that is familiar to Python programmers:
d[k]
However, if d = { .count: 3 }, then you can write d.count (which has value 3) instead of having to write d[.count] or d["count"] (although any of those will work). Thus a dictionary can be made to look much like a Python object.
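
As a small illustration, the following lookups are all equivalent:

d = { .count: 3 }
assert d.count == 3       # object-like dot notation
assert d["count"] == 3    # Python-style brackets
assert d[.count] == 3     # .count is the same string as "count"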

In Harmony (unlike Python), lists and tuples are the same type. As in Python, you can create a singleton tuple (or list) by including a comma. For example, (1,) is a tuple consisting of just the number 1. Importantly, (1) equals 1, not (1,). Because square brackets and parentheses work the same in Harmony, [a, b, c] (which looks like a Python list) is the same Harmony value as (a, b, c) (which looks like a Python tuple). So, if x = [False, True], then x[0] = False and x[1] = True, just like in Python. However, when creating a singleton list, make sure you include the comma, as in [False,]. The expression [False] just means False.
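
The following sketch summarizes these rules:

x = [ False, True ]
assert x == (False, True)     # lists and tuples are the same type
assert (1,) != 1              # a singleton tuple requires the comma
assert [False,] != False      # ... and so does a singleton list
assert [False] == False       # without the comma, brackets act like parentheses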

Harmony is not an object-oriented language, so objects don't have built-in methods. However, Harmony does have some powerful operators to make up for some of that. For example, dictionaries have two handy unary operators. If d is a dictionary, then keys d (or equivalently keys(d)) returns the set of keys and len d returns the size of this set.
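
For example:

d = { .a: 1, .b: 2 }
assert (keys d) == { .a, .b }    # the set of keys
assert (len d) == 2              # the number of keys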

Section A.1 provides details on all the types of values that Harmony supports.

Harmony Bytecode

A Harmony program is translated into HVM bytecode. To make it amenable to efficient model checking, the HVM is not an ordinary virtual machine, but its architecture is nonetheless representative of conventional computers and virtual machines such as the Java Virtual Machine.

Instead of bits and bytes, an HVM manipulates Harmony values. An HVM has the following components:

  • Code: This is an immutable and finite list of HVM instructions, generated from a Harmony program. The types of instructions will be described later.
  • Shared memory: A HVM has just one memory location containing a Harmony dictionary value.
  • Threads: Any thread can spawn an unbounded number of other threads and threads may terminate. Each thread has a program counter that indexes into the code, a stack of Harmony values, and a private register that contains a Harmony dictionary value.
The register of a thread contains the local variables of the method that the thread is currently executing. It is saved and restored by method invocations. The state of a thread is called a context (aka continuation): it contains the values of its program counter, stack, and register. The HVM state consists of the value of its memory and the multiset (or bag) of contexts. It is a multiset of contexts because two threads can have the same context at the same time.

   0 Frame __init__ ()
code/Up.hny:1 count = 0
   1 Push 0
   2 Store count
code/Up.hny:2 done = [ False, False ]
   3 Push [False, False]
   4 Store done
code/Up.hny:4 def incrementer(self):
   5 Jump 35
   6 Frame incrementer self
code/Up.hny:5     count = count + 1
   7 Load count
   8 Push 1
   9 2-ary +
   10 Store count

Figure 4.1. The first part of the HVM bytecode corresponding to Figure 3.3

It may seem strange that there is only one memory location. However, this is not a limitation because Harmony values are unbounded trees. The shared memory is a dictionary that maps strings (names of shared variables) to other Harmony values. We call this a directory. Thus, a directory represents the state of a collection of variables named by the strings. Because directories are Harmony values themselves and Harmony values include dictionaries and lists that themselves contain other Harmony values, directories can be organized into a tree. Each node in a directory tree is then identified by a sequence of Harmony values, like a path name in the file system hierarchy. We call such a sequence an address. For example, in Figure 3.3 the memory is a dictionary with two entries: .count and .done. And the value of entry .done is a list with indexes 0 and 1. So, for example, the address of done[0] is the sequence [.done, 0]. As we will see later, an address is itself a Harmony value.
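
As a hedged sketch of addresses as first-class values (this uses the ? address-of and ! dereference operators, which are covered later in the book; the method name is illustrative):

done = [ False, False ]

def setflag(p):        # p is an address, e.g., the sequence [.done, 0]
    !p = True          # store True at that address

setflag(?done[0])      # pass the address of done[0]
assert done[0]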

Compiling the code in Figure 3.3 results in the HVM bytecode listed in Figure 4.1. You can obtain this code by invoking harmony with the -a flag like so:

harmony -a Up.hny
Each thread in the HVM is predominantly a stack machine, but each thread also has a register. Like shared memory, the register contains a dictionary, so it can represent the values of multiple named variables. All individual instructions are executed atomically. The Harmony memory model is sequentially consistent: all accesses are in program order. At first there is one thread, named __init__, which initializes the state. It starts executing at instruction 0 and keeps executing until it reaches the last instruction in the program. In this case, it executes instructions 0 through 5 first. The last instruction in that sequence is a Jump instruction that sets the program counter to 35 (skipping over the code of the incrementer method). The __init__ thread then executes the remaining instructions and finishes. Once initialization completes, any threads that were spawned (in this case incrementer(0) and incrementer(1)) can run.

At program counter 6 is the code for the incrementer method. All methods start with a Frame instruction and end with a Return instruction. Section C.1 provides a list of all HVM machine instructions, in case you want to read about the details. The Frame instruction lists the name of the method and the names of its arguments. The code generated for count = count + 1 in Line 5 of Up.hny is as follows (see Figure 4.1):

  1. The Load instruction pushes the value of the count variable onto the stack.
  2. The Push instruction pushes the constant 1 onto the stack of the thread.
  3. 2-ary is a + operation with 2 arguments. It pops two values from the stack (the value of count and 1), adds them, and pushes the result back onto the stack.
  4. The Store instruction pops a Harmony value (the sum of the count variable and 1) and stores it in the count variable.
You can think of Harmony as trying every possible interleaving of threads executing instructions. Harmony can report several types of failure, including safety violations (such as a failed assertion or invariant), executions that cannot terminate, active busy waiting, and data races. Harmony checks for these types of failure conditions in the given order: if there are multiple failure conditions, only the first is reported. Active busy waiting (Chapter 12) is not technically an indication of a synchronization problem, but rather an indication of an inefficient solution to a synchronization problem---one that uses up CPU cycles unnecessarily. A data race indicates that the execution depends on the semantics of the underlying memory operations and is therefore undesirable. Harmony may also warn about the set of behaviors, in particular if the set of generated behaviors is only a subset of all correct behaviors.

Figure 4.2. The HTML output of running Harmony on Figure 3.3

Harmony generates a detailed and self-explanatory text output file (see code/Up.hvb) and an interactive HTML file that allows exploring more details of the execution. Open the suggested HTML file and you should see something like Figure 4.2.

In the top right, the HTML file contains the reported issue in red. Underneath it, a table shows the five turns in the execution, with the current turn highlighted in yellow. Each turn displays a block for each instruction executed in that turn; a white block indicates that the instruction has not yet been executed. We call this the timeline. You can click on such a block to see the state of the Harmony virtual machine just after executing the corresponding instruction. If a thread has finished its turn, there is also information on the status of that thread. For example, at the end of turn 2, incrementer(1) is about to store the value 1 in variable count, but at that point is preempted by incrementer(0). The table also lists the program counter of the thread at each turn, the values of the shared variables, and any values the thread may have printed (none in this case). Underneath the table, the line of Harmony source code that is being executed is shown in blue (with the specific part of the line that is being evaluated in green), and the HVM instruction that is about to be executed is shown in green (along with an explanation in parentheses).

The bottom left shows the bytecode of the program being executed. The instruction that is being or about to be executed, if any, is highlighted in red. If you hover the mouse over a machine instruction, it provides a brief explanation of what the instruction does.

The bottom right contains a table with the state of each thread. The thread that is executing is highlighted in yellow. Status information for a thread can include whether it is runnable, blocked (for example, waiting for an await condition to hold), or terminated.

The stack of each thread is subdivided into two parts: the stack trace and the stack top. A stack trace is a list of methods that are being invoked. In this case, the incrementer method does not invoke any other methods, and so the list is of length 1. For each entry in the stack trace, it shows the method name and arguments, as well as the variables of the method. The stack top shows the values on the stack beyond the stack trace.

When you load the HTML file, it shows the final state. As mentioned above, you can go to any point in the execution by clicking on one of the blocks in the timeline. When you do so, the current turn and thread will be highlighted in yellow. There are also various handy keyboard shortcuts:

  • Right arrow: go to the next instruction;
  • Left arrow: go back to the previous instruction;
  • Down arrow: finish executing the current method;
  • Up arrow: go back to the beginning of the current method;
  • Enter (aka Return): go to the next line of Harmony code;
  • 0: go to the initial state.
If you want to see an animation of the entire execution, one instruction at a time, you can first hit 0 and then hold down the right arrow. If you want to see it one line of Harmony code at a time, hold down the enter (aka return) key instead. If you hold down the down arrow key, the movie will go by very quickly.

Exercises

count = 0

entered = done = [ False, False ]

def incrementer(self):
    entered[self] = True
    if entered[1 - self]:        # if the other thread has already started
        await done[1 - self]     # wait until it is done
    count = count + 1
    done[self] = True
    await done[1 - self]
    assert count == 2

spawn incrementer(0)
spawn incrementer(1)

Figure 4.3. [code/UpEnter.hny] Incorrect attempt at fixing the code of Figure 3.3

Chapter 5. Critical Sections

def thread():
    while True:
        # Critical section is here
        pass
    
spawn thread()
spawn thread()

Figure 5.1. [code/csbarebones.hny] Modeling a critical section

Hopefully you have started thinking about how to solve the concurrency problem, and you may already have prototyped some solutions. In this chapter, we will go through a few reasonable but broken attempts before arriving at a working solution. In the case of Figure 3.3, the problem is that we would like to make sure that, while the count variable is being updated, no other thread is trying to do the same thing. The code that must be isolated is called a critical section (aka critical region) [13, 15, 16]: a region of code where only one thread is allowed to execute at a time and that must be executed until completion. In the case of Figure 3.3, the critical section is Line 5.

Critical sections are useful when accessing a shared data structure, particularly when that access requires multiple underlying machine instructions. A counter is a very simple example of a data structure (it is a list of bits), but---as we have seen---incrementing it requires multiple instructions. A more involved one would be accessing a binary tree. Adding a node to a binary tree, or re-balancing a tree, often requires multiple operations. Maintaining "consistency" is certainly much easier if during this time no other thread also tries to access the binary tree. Typically, you want some property of the data structure to hold at the beginning and at the end of the critical section, but in the middle the property may be temporarily invalid---this is not a problem as critical sections guarantee that no other thread will be able to see the intermediate state of the data structure. An implementation of a data structure that can be safely accessed by multiple threads and is free of race conditions is called thread-safe.

A critical section is often modeled as threads in an infinite loop entering and exiting the critical section. Figure 5.1 shows the Harmony code. What we need to ensure is that there can never be two or more threads in the critical section. This property is called mutual exclusion. Mutual exclusion by itself is easy to ensure. For example, we could insert the following code to enter the critical section:

await False
This code will surely prevent two or more threads from executing in the critical section at the same time. But it does so by preventing any thread from reaching the critical section. We clearly need another property besides mutual exclusion.

# number of threads in the critical section
in_cs = 0
invariant in_cs in { 0, 1 }

def thread():
    while choose { False, True }:
        # Enter critical section
        atomically in_cs += 1

        # Critical section is here
        pass

        # Exit critical section
        atomically in_cs -= 1
    
spawn thread()
spawn thread()

Figure 5.2. [code/cs.hny] Harmony model of a critical section

Mutual exclusion is an example of a safety property, a property that ensures that nothing bad will happen, in this case two threads being in the critical section. What we need now is a liveness property: we want to ensure that eventually something good will happen. There are various possible liveness properties we could use, but here we will propose the following informally: if (1) there exists a non-empty set S of threads that are trying to enter the critical section and (2) threads in the critical section always leave eventually, then eventually one thread in S will enter the critical section. We call this progress.

In order to detect violations of progress, and other liveness problems in algorithms in general, Harmony requires that every execution must be able to reach a state in which all threads have terminated. Clearly, even if mutual exclusion holds in Figure 5.1, the spawned threads never terminate.

We will instead model threads in critical sections using the framework in Figure 5.2: a thread can choose to enter a critical section more than once, but it can also choose to terminate, even without entering the critical section ever. (Recall that Harmony will try every possible execution, and so it will evaluate both choices.) As it turns out, there is an advantage to doing it this way: we also test if a thread can enter when there is no other thread trying to enter the critical section. As we will see below, this is not always obvious.

This code specifies that at most one thread can be executing in the critical section by using a counter in_cs that a thread atomically increments when entering the critical section and atomically decrements when it leaves. The code specifies the invariant that in_cs must be either 0 or 1. You can think of this as the type of in_cs.

We will now consider various approaches toward implementing this specification.

in_cs = 0
invariant in_cs in { 0, 1 }

lockTaken = False

def thread(self):
    while choose({ False, True }):
        # Enter critical section
        await not lockTaken
        lockTaken = True

        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        # Leave critical section
        lockTaken = False
    
spawn thread(0)
spawn thread(1)

Summary: something went wrong in an execution

Here is a summary of an execution that exhibits the issue:

  • Schedule thread T0: init()
    • Line 1: Initialize in_cs to 0
    • Line 4: Initialize lockTaken to False
    • Thread terminated
  • Schedule thread T3: thread(1)
    • Line 7: Choose True
    • Preempted in thread(1) about to store True into lockTaken in line 10
  • Schedule thread T2: thread(0)
    • Line 7: Choose True
    • Line 10: Set lockTaken to True (was False)
    • Line 12: Set in_cs to 1 (was 0)
    • Preempted in thread(0) about to execute atomic section in line 14
  • Schedule thread T3: thread(1)
    • Line 10: Set lockTaken to True (unchanged)
    • Line 12: Set in_cs to 2 (was 1)
    • Preempted in thread(1) about to execute atomic section in line 14
  • Schedule thread T1: invariant()
    • Line 2: Harmony assertion failed

Figure 5.3. [code/naiveLock.hny] Naïve implementation of a shared lock and the markdown output of running Harmony

in_cs = 0
invariant in_cs in { 0, 1 }

flags = [ False, False ]

def thread(self):
    while choose({ False, True }):
        # Enter critical section
        flags[self] = True
        await not flags[1 - self]

        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        # Leave critical section
        flags[self] = False

spawn thread(0)
spawn thread(1)

Summary: some execution cannot terminate

Here is a summary of an execution that exhibits the issue:

  • Schedule thread T0: init()
    • Line 1: Initialize in_cs to 0
    • Line 4: Initialize flags to [ False, False ]
    • Thread terminated
  • Schedule thread T1: thread(0)
    • Line 7: Choose True
    • Line 9: Set flags[0] to True (was False)
    • Preempted in thread(0) about to load variable flags[1] in line 10
  • Schedule thread T2: thread(1)
    • Line 7: Choose True
    • Line 9: Set flags[1] to True (was False)
    • Preempted in thread(1) about to load variable flags[0] in line 10

Final state (all threads have terminated or are blocked):

  • Threads:
    • T1: (blocked) thread(0)
      • about to load variable flags[1] in line 10
    • T2: (blocked) thread(1)
      • about to load variable flags[0] in line 10

Figure 5.4. [code/naiveFlags.hny] Naïve use of flags to solve mutual exclusion

in_cs = 0
invariant in_cs in { 0, 1 }

turn = 0

def thread(self):
    while choose({ False, True }):
        # Enter critical section
        turn = 1 - self
        await turn == self

        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        # Leave critical section

spawn thread(0)
spawn thread(1)

Summary: some execution cannot terminate

Here is a summary of an execution that exhibits the issue:

  • Schedule thread T0: init()
    • Line 1: Initialize in_cs to 0
    • Line 4: Initialize turn to 0
    • Thread terminated
  • Schedule thread T2: thread(1)
    • Line 7: Choose False
    • Thread terminated
  • Schedule thread T1: thread(0)
    • Line 7: Choose True
    • Line 9: Set turn to 1 (was 0)
    • Preempted in thread(0) about to load variable turn in line 10

Final state (all threads have terminated or are blocked):

  • Threads:
    • T1: (blocked) thread(0)
      • about to load variable turn in line 10
    • T2: (terminated) thread(1)

Figure 5.5. [code/naiveTurn.hny] Naïve use of turn variable to solve mutual exclusion

You may already have heard of the concept of a lock and realized that it could be used to implement a critical section. The idea is that the lock is like a baton that at most one thread can own (or hold) at a time. A thread that wants to enter the critical section must first obtain the lock and must release it upon exiting the critical section. Using a lock is a good thought, but how does one implement one? Figure 5.3 presents an attempt at mutual exclusion based on a naïve (and, as it turns out, incorrect) implementation of a lock. Initially the lock is not owned, indicated by lockTaken being False. To enter the critical section, a thread waits until lockTaken is False and then sets it to True to indicate that the lock has been taken. The thread then executes the critical section. Finally, the thread releases the lock by setting lockTaken back to False.

Unfortunately, if we run the program through Harmony, we find that the assertion fails. Figure 5.3 also shows the Harmony output. thread(1) finds that the lock is available, but just before it stores True in lockTaken, thread(0) gets to run. Because lockTaken is still False, it too believes it can acquire the lock, and stores True in lockTaken and moves on to the critical section. Finally, thread(1) moves on, also stores True into lockTaken and also moves into the critical section. The lockTaken variable suffers from the same sort of race condition as the count variable in Figure 3.3: testing and setting the lock consists of several instructions. It is thus possible for both threads to believe the lock is available and to obtain the lock at the same time.

To prevent multiple threads from updating the same variable, Figure 5.4 presents a solution based on each thread having a flag indicating that it is trying to enter the critical section. A thread can write its own flag and read the flag of its peer. After setting its flag, the thread waits until the other thread ( 1 - self ) is not trying to enter the critical section. If we run this program, the assertion does not fail. In fact, this solution does prevent both threads from being in the critical section at the same time.

To see why, first note the following invariant: if thread i is in the critical section, then flags[i] = True. Without loss of generality, suppose that thread 0 sets flags[0] at time t0. Thread 0 can only reach the critical section if at some time t1, t1 > t0, it finds that flags[1] = False. Because of the invariant, flags[1] = False implies that thread 1 is not in the critical section at time t1. Let t2 be the time at which thread 0 sets flags[0] to False. Thread 0 is in the critical section sometime between t1 and t2. It is easy to see that thread 1 cannot enter the critical section between t1 and t2, because flags[1] = False at time t1. To reach the critical section between t1 and t2, it would first have to set flags[1] to True and then wait until flags[0] = False. But that does not happen until time t2.

However, if you run the program through Harmony, it turns out the solution does have a problem: if both try to enter the critical section at the same time, they may end up waiting for one another indefinitely. (This is a form of deadlock, which will be discussed in Chapter 15.) Thus the solution violates progress.

The final naïve solution that we propose is based on a variable called turn. Each thread politely lets the other thread have a turn first. When turn = i, thread i can enter the critical section, while thread 1 - i has to wait. An invariant of this solution is that while thread i is in the critical section, turn = i. Since turn cannot be 0 and 1 at the same time, mutual exclusion is satisfied. The solution also has the nice property that the thread that has been waiting the longest to enter the critical section can go next.

Run the program through Harmony. It turns out that this solution also violates progress, albeit for a different reason: if thread i terminates instead of entering the critical section, thread 1 - i, politely, ends up waiting indefinitely for its turn. Too bad, because it would have been a great solution if both threads tried to enter the critical section ad infinitum.

in_cs = 0
invariant in_cs in { 0, 1 }

sequential flags, turn
flags = [ False, False ]
turn = choose({0, 1})

def thread(self):
    while choose({ False, True }):
        # Enter critical section
        flags[self] = True
        turn = 1 - self
        await (not flags[1 - self]) or (turn == self)

        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        # Leave critical section
        flags[self] = False

spawn thread(0)
spawn thread(1)

Figure 5.6. [code/Peterson.hny] Peterson's Algorithm

You may now start to think that a solution to the mutual exclusion problem without special hardware support is impossible. However, in 1981, Gary L. Peterson came up with a beautiful solution, now known as "Peterson's Algorithm" [37]. The algorithm is an amalgam of the (incorrect) algorithms in Figure 5.4 and Figure 5.5, and is presented in Figure 5.6. (The first line specifies that the flags and turn variables are assumed to satisfy sequential consistency; this prevents Harmony from complaining about data races involving these variables, as explained in Chapter 8.)

A thread first indicates its interest in entering the critical section by setting its flag. It then politely gives way to the other thread should it also want to enter the critical section---if both do so at the same time, one will win because writes to memory in Harmony are atomic. The thread continues to be polite, waiting until either the other thread is nowhere near the critical section (flags[1 - self] = False) or has given way (turn = self).

Running the algorithm with Harmony shows that it satisfies both mutual exclusion and progress. Why does it work? Unfortunately, proving the correctness of Peterson's algorithm is not trivial. You can find an informal proof in Appendix H.

Peterson's algorithm implements a critical section, but it is not efficient, especially if generalized to more than two threads. Worse, Peterson's algorithm relies on load and store operations being executed atomically, which may not be the case. There are a variety of possible reasons for this.

Peterson's algorithm relies on a sequentially consistent memory model and hence the sequential statement: without it, Harmony will complain about data races. More precisely, the sequential statement says that the program relies on memory load and store instructions operating on the indicated variables to be performed atomically (so concurrently invoked instructions are sequentialized), and that this order is consistent with the order of instructions invoked on each thread. The default memory models of C and Java are not sequentially consistent. The volatile keyword in Java has a similar function to Harmony's sequential keyword. Like many constructions in Java, the volatile keyword was borrowed from C and C++. However, in C and C++, volatile does not provide sequential consistency, and one cannot implement Peterson's algorithm directly in those languages.

Exercises

in_cs = 0
invariant in_cs in { 0, 1 }

sequential flags
flags = [ False, False ]

def thread(self):
    while choose({ False, True }):
        # Enter critical section
        flags[self] = True
        while flags[1 - self]:
            flags[self] = False
            flags[self] = True
        
        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        # Leave critical section
        flags[self] = False

spawn thread(0)
spawn thread(1)

Figure 5.7. [code/csonebit.hny] Mutual exclusion using a flag per thread

Chapter 6. Harmony Methods and Pointers

A method m with argument a is invoked in its most basic form as follows (assigning the result to r).

r = m a
That's right, no parentheses are required. In fact, if you invoke m(a), the argument is (a), which is the same as a. If you invoke m(), the argument is (), which is the empty tuple. If you invoke m(a, b), the argument is (a, b), the tuple consisting of values a and b.

You may note that all this looks familiar. Indeed, the syntax is the same as that for dictionaries and lists (see Chapter 4). Dictionaries, lists, and methods all map Harmony values to Harmony values, and their syntax is indistinguishable. If f is a method, list, or dictionary, and x is some Harmony value, then f x, f(x), and f[x] are all the same expression in Harmony.
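For example, the following sketch (with made-up names) shows a method and a list being applied with the same syntax:

def triple(x) returns result:
    result = 3 * x

values = [ 10, 20, 30 ]

assert (triple 5) == 15    # method applied without parentheses
assert triple(5) == 15     # the same invocation, with parentheses
assert values[1] == 20     # list indexed with brackets
assert values(1) == 20     # ... or applied like a method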

def Peterson_enter(pm, pid):
    pm->flags[pid] = True
    pm->turn = 1 - pid
    await (not pm->flags[1 - pid]) or (pm->turn == pid)

def Peterson_exit(pm, pid):
    pm->flags[pid] = False

def Peterson_mutex() returns result:
    result = { .turn: choose({0, 1}), .flags: [ False, False ] }

#### The code above can go into its own Harmony module ####

in_cs = 0
invariant in_cs in { 0, 1 }

sequential mutex
mutex = Peterson_mutex()

def thread(self):
    while choose({ False, True }):
        Peterson_enter(?mutex, self)

        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        Peterson_exit(?mutex, self)

spawn thread(0)
spawn thread(1)

Figure 6.1. [code/PetersonMethod.hny] Peterson's Algorithm accessed through methods

Harmony does not have a return statement. Using the returns clause of def, a result variable can be declared, for example: def f() returns something. The result of the method should be assigned to variable something. Harmony also does not support break or continue statements in loops. One reason for their absence is that, particularly in concurrent programming, such control flow directions are highly error-prone. It's too easy to forget to, say, release a lock when returning a value in the middle of a method---a major source of bugs in practice.
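For example, here is a minimal sketch (with a hypothetical method name) that uses a result variable instead of a return statement:

def maximum(a, b) returns largest:
    if a > b:
        largest = a
    else:
        largest = b

assert maximum(3, 5) == 5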

Harmony is not an object-oriented language like Python is. In Python, you can pass an object reference to a method, and that method can then update the object. In Harmony, it is also sometimes convenient to have a method update a shared variable specified as an argument. For this, as mentioned in Chapter 4, each shared variable has an address, itself a Harmony value. If x is a shared variable, then the expression ?x is the address of x. If a variable contains an address, we call that variable a pointer. If p is a pointer to a shared variable, then the expression !p is the value of the shared variable. In particular, !?x = x. This is similar to how C pointers work (*&x = x).
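For instance, here is a small sketch (with a hypothetical variable) of addresses and pointers at work:

count = 0
p = ?count                  # p holds the address of shared variable count
!p += 1                     # update count through the pointer
assert count == 1
assert (!?count) == count   # dereferencing an address yields the variable's value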

Often, pointers point to dictionaries, and so if p is such a pointer, then (!p).field would evaluate to the specified field in the dictionary. Note that the parentheses in this expression are needed, as !p.field would wrongly evaluate !(p.field). (!p).field is such a common expression that, like C, Harmony supports the shorthand p->field, which greatly improves readability.

Figure 6.1 again shows Peterson's algorithm, but this time with methods defined to enter and exit the critical section. The name mutex is often used to denote a variable or value that is used for mutual exclusion. Peterson_mutex is a method that returns a "mutex," which, in this case, is a dictionary that contains the shared state of Peterson's Algorithm: a turn variable and two flags. Both methods Peterson_enter and Peterson_exit take two arguments: a pointer to a mutex and the thread identifier (0 or 1). pm->turn is the value of the .turn key in the dictionary that pm points to.

You can put the first three methods in their own Harmony source file and include them using the Harmony import statement. This would make the code usable by multiple applications.
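For example, assuming the three methods are saved in a hypothetical module file peterson.hny on Harmony's module search path, an application could use them like this:

import peterson

sequential mutex
mutex = peterson.Peterson_mutex()

def worker(self):
    peterson.Peterson_enter(?mutex, self)
    # critical section
    peterson.Peterson_exit(?mutex, self)

spawn worker(0)
spawn worker(1)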

Finally, methods can have local variables. Method variables are either mutable (writable) or immutable (read-only). The arguments to a method and the bound variable (or variables) within a for statement are immutable; the result variable is mutable. Using the var statement, new mutable local variables can be declared. For example, var x = 3 declares a new mutable local variable x. The let statement allows declaring new immutable local variables. For example: let x = 3: y += x adds 3 to the global variable y.
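For example, here is a small sketch (with made-up names) combining both kinds of local variables:

def sum_list(lst) returns total:
    total = 0             # total is the mutable result variable
    var i = 0             # a mutable local variable
    while i < len lst:
        let v = lst[i]:   # an immutable local variable
            total += v
        i += 1

assert sum_list([ 1, 2, 3 ]) == 6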

current = [ [1, 2, 3], [], [] ]

while current[2] != [1, 2, 3]:
    let moves = { (s, d) for s in {0..2} for d in {0..2}
        where current[s] != []
        where (current[d] == []) or (current[s][0] < current[d][0]) }
    let (src,dst) = choose moves:
        print str(src) + " -> " + str(dst)
        current[dst] = [current[src][0],] + current[dst]
        del current[src][0]

Figure 6.2. [code/hanoi.hny] Towers of Hanoi

As an example of using let, Figure 6.2 solves the Towers of Hanoi problem. If you are not familiar with this problem: there are three towers with disks of varying sizes. In the initial configuration, the first tower has three disks (of sizes 1, 2, and 3), with the largest disk at the bottom, while the other two towers are empty. You are allowed to move a top disk from one tower to another, but you are not allowed to stack a larger disk on a smaller one. The objective is to move the disks from the first tower to the third one. The program tries valid moves at random until it finds a solution. Since Harmony tries all possible executions, the output of this program shows all possible solutions to the problem.

The program also contains an example of set comprehension (Lines 4 to 6). This is similar to set comprehension in Python, except that Harmony uses the keyword where instead of if (simplifying parsing).
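For example, a comprehension that selects the even numbers in a range could read:

evens = { x for x in {0..10} where (x % 2) == 0 }
assert evens == { 0, 2, 4, 6, 8, 10 }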

const FIFO = False

def CLOCK(n) returns result:
    result = { .entries: [None,] * n, .recent: {}, .hand: 0, .misses: 0 }

def ref(clock, x):
    if x not in clock->entries:
        while clock->entries[clock->hand] in clock->recent:
            clock->recent -= {clock->entries[clock->hand]}
            clock->hand = (clock->hand + 1) % len(clock->entries)
        clock->entries[clock->hand] = x
        clock->hand = (clock->hand + 1) % len(clock->entries)
        clock->misses += 1
    if not FIFO:
        clock->recent |= {x}

clock3, clock4, refs = CLOCK(3), CLOCK(4), []

const VALUES = { 1..5 }

var last = {}
for i in {1..100}:
    let x = i if i < 5 else choose(VALUES - last):
        refs = refs + [x,]
        ref(?clock3, x); ref(?clock4, x)
        assert(clock4.misses <= clock3.misses)
        last = {x}

Figure 6.3. [code/clock.hny] Harmony program that finds page replacement anomalies

If you are ready to learn about how locks are implemented in practice, you can now skip the rest of this chapter. But if you would like to see a cool example of using the concepts introduced in this chapter, hang on for a sequential Harmony program that finds anomalies in page replacement algorithms. In 1969, Bélády et al. published a paper [3] showing that making a cache larger does not necessarily lead to a higher hit ratio. They showed this for a scenario using a FIFO replacement policy when the cache is full. The program in Figure 6.3 will find exactly the same scenario if you define FIFO to be True. Moreover, if you define FIFO to be False, it will find a scenario for the CLOCK replacement policy [10].

In this program, CLOCK maintains the state of a cache (in practice, typically pages in memory). The set recent maintains whether an access to the cache for a particular reference was recent or not. (It is not used if FIFO is True.) The integer misses maintains the number of cache misses. Method ref(ck, x) is invoked when x is referenced and checked against the cache ck.

The program declares two caches: one with 3 entries (clock3) and one with 4 entries (clock4). The interesting part is in the last block of code. It runs every sequence of references of up to 100 entries, using references in the range 1 through 5. Note that all the constants chosen in this program (3, 4, 5, 100) are the result of some experimentation---you can try other ones. To reduce the search space, the first four references are pinned to 1, 2, 3, and 4. Further reducing the search space, the program never repeats the same reference twice in a row (using the local variable last).

The two things to note here are the use of the choose expression and the assert statement. Using choose, we are able to express searching through all possible strings of references without a complicated nested iteration. Using assert, we are able to express the anomaly we are looking for.
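As a simpler sketch of the same pattern, the following (hypothetical) program uses choose to explore all eight strings of three coin flips; the assert states a property that must hold in every one of those executions:

var flips = []
for i in {1..3}:
    flips += [ choose({ 0, 1 }), ]   # the model checker explores both choices
assert len flips == 3                # checked in each of the 8 executions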

In case you want to check whether you get the right results: for FIFO, the program finds the same anomaly that Bélády et al. found: 1 2 3 4 1 2 5 1 2 3 4 5. For the CLOCK algorithm, the program actually finds a shorter reference string: 1 2 3 4 2 1 2 5 1 2.

Exercises

Chapter 7. Specifying a Lock

def Lock() returns result:
    result = False

def acquire(lk):
    atomically when not !lk:
        !lk = True

def release(lk):
    assert !lk
    atomically !lk = False

Figure 7.1. [code/lock.hny] Specification of a lock

from synch import Lock, acquire, release

const NTHREADS = 5

thelock = Lock()

def thread():
    acquire(?thelock)
    pass             # critical section is here
    release(?thelock)

for i in {1..NTHREADS}:
    spawn thread()

Figure 7.2. [code/lock_demo.hny] Using a lock to implement a critical section

from synch import Lock, acquire, release

sequential done

count = 0
countlock = Lock()
done = [ False, False ]

def thread(self):
    acquire(?countlock)
    count = count + 1
    release(?countlock)
    done[self] = True
    await done[1 - self]
    assert count == 2

spawn thread(0)
spawn thread(1)

Figure 7.3. [code/UpLock.hny] Figure 3.3 fixed with a lock

So far, we have used Harmony to implement various algorithms. But Harmony can also be used to specify what an algorithm is supposed to do. For example, Figure 7.1 specifies the intended behavior of a lock. In this case, a lock is a boolean, initially False, with two operations, acquire() and release(). The acquire() operation waits until the lock is False and then sets it to True in an atomic operation. The release() operation sets the lock back to False. The code is similar to Figure 5.3, except that waiting for the lock to become available and taking it is executed as an atomic operation.

The code in Figure 7.1 is also in Harmony's synch module. Figure 7.2 shows how locks may be used to implement a critical section. Figure 7.3 gives an example of how locks may be used to fix the program of Figure 3.3.

Note that the code of Figure 7.1 is executable in Harmony. However, the atomically keyword is not available in general programming languages and should not be used for implementation. Peterson's algorithm is an implementation of a lock, although only for two processes. In the following chapters, we will look at more general ways of implementing locks using atomic constructions that are usually available in the underlying hardware.

In Harmony, any statement can be preceded by the atomically keyword. It means that the statement as a whole is to be executed atomically. (Harmony can accomplish this by controlling how threads are scheduled.) The atomically keyword can be used to specify the behavior of methods such as acquire and release. But an actual executable program should not use the atomically keyword because---on a normal machine---it cannot be directly compiled into machine code. If we want to make the program executable on hardware, we have to show how Lock, acquire, and release are implemented, not just how they are specified. Chapter 8 presents such implementations.

The code in Figure 7.1 also uses the Harmony when statement. A when statement waits until a time at which the condition holds (not necessarily the first such time) and then executes the statement block. The "await condition" statement is the same as "when condition: pass". Combined with the atomically keyword, the entire statement is executed atomically at a time when the condition holds.
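For example, here is a minimal sketch (with hypothetical names): consume waits for a counter to be positive and decrements it in one atomic step, while observe merely waits, using await as shorthand for a when statement with an empty block:

sequential counter
counter = 2

def consume():
    atomically when counter > 0:   # wait and decrement in one atomic step
        counter -= 1

def observe():
    await counter > 0              # same as "when counter > 0: pass"

spawn consume()
spawn observe()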

It is important to appreciate the difference between an atomic section (the statements executed within an atomic block of statements) and a critical section (protected by a lock of some sort). The former ensures that while the statements in the block are executing no other thread can execute. The latter allows multiple threads to run concurrently, just not within the critical section. The former is rarely available to a programmer (e.g., none of Python, C, or Java support it), while the latter is very common.

Atomic statements are not intended to replace locks or other synchronization primitives. When implementing synchronization solutions, you should not directly use atomic statements but use the synchronization primitives that are available to you. But if you want to specify a synchronization primitive, then use atomically by all means. You can also use atomic statements in your test code. In fact, assert statements are executed atomically and specify that some condition must hold in every execution of the program. Note two important differences with other programming languages: in most programming languages, an assert statement only checks the current execution, and the condition is not evaluated atomically.

Chapter 8. Lock Implementations

const N = 5

in_cs = 0
invariant in_cs in { 0, 1 }

shared = False
private = [ True, ] * N
invariant [x for x in [shared,] + private where not x] == [False,]

def swap(s, p):
    atomically !p, !s = !s, !p

def thread(self):
    while choose({ False, True }):
        # Enter critical section
        while private[self]:
            swap(?shared, ?private[self])

        atomically in_cs += 1
        assert not private[self]
        atomically in_cs -= 1

        # Leave critical section
        swap(?shared, ?private[self])

for i in {0..N-1}:
    spawn thread(i)

Figure 8.1. [code/spinlock.hny] Mutual Exclusion using a "spinlock" based on atomic swap

Locks are the most prevalent and basic form of synchronization in concurrent programs. Typically, whenever you have a shared data structure, you want to protect the data structure with a lock and acquire the lock before access and release it immediately afterward. In other words, you want the access to the data structure to be a critical section. That way, when a thread makes modifications to the data structure that take multiple steps, other threads will not see the intermediate inconsistent states of the data structure. (An inconsistent state of a data structure is either an invalid state or a state that does not reflect a possible value in the execution.)

When there is a bug in a program because some code omitted obtaining a lock before accessing a shared data structure, that is known as a data race. More precisely, a data race happens when there is a state in which multiple threads are trying to access the same variable, at least one of those accesses updates the variable, and at least one of those accesses is non-atomic. In many environments, including C and Java programs, concurrent load and store operations have tricky or even undefined semantics. One should therefore avoid data races, which is why Harmony reports them even though Harmony itself has sequentially consistent memory.

Because atomic operations cannot overlap, data races can be avoided by making operations atomic. In Harmony, this can be done in two ways. First, using the sequential statement, you can specify that concurrent load and store operations to the specified variables are considered atomic. Second, you can make accesses atomic using the atomically keyword. However, these are Harmony specification constructs. In a practical implementation, a programmer will often use one or more locks to prevent overlapping access to shared variables.

To implement locks, multi-core processors provide so-called atomic instructions: special machine instructions that can read memory and then write it in an indivisible step. While the HVM does not have any specific built-in atomic instructions besides loading and storing variables, it does have support for executing multiple instructions atomically. We can use the atomically keyword to specify a wide variety of atomic operations. For example, we could fix the program in Figure 3.3 by constructing an atomic increment operation for a counter, like so:

def atomic_inc(ptr):
    atomically !ptr += 1
    
count = 0
atomic_inc(?count)
To support implementing locks, some CPUs have an atomic "swap" operation. Method swap in Figure 8.1 shows its specification. Here s points to a shared boolean variable (the lock) and p to a private boolean variable, belonging to some thread. The operation exchanges the value of the shared variable and the private variable.
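To illustrate, the following sketch (with hypothetical variables) exchanges a False and a True value:

shared = False
mine = True

def swap(s, p):
    atomically !p, !s = !s, !p

swap(?shared, ?mine)
assert (shared, mine) == (True, False)   # the two values are exchanged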

Figure 8.1 goes on to implement mutual exclusion for a set of N threads. The approach is called spinlock, because each thread is "spinning" (executing a tight loop) until it can acquire the lock. The program uses N + 1 boolean variables. Variable shared (the lock) is initialized to False while private[i] for each thread i is initialized to True.

An important invariant, I1, of the program is that at any time exactly one of these variables is False (expressed in Harmony in Line 8). Another invariant, I2(i), is that if thread i is in the critical section, then private[i] = False (expressed in Line 20). Between the two (i.e., I1 ∧ ∀i: I2(i)), it is clear that at most one thread can be in the critical section at the same time. To see that invariant I1 is maintained, we can use induction. Note that !p = True upon entry of swap (because of the condition on the while loop from which the swap method is invoked). There are two cases:

  1. !s is False upon entry to swap. Then upon exit !p = False and !s = True, maintaining the invariant.
  2. !s is True upon entry to swap. Then upon exit nothing has changed, maintaining the invariant.
Invariant I1 is also easy to verify for exiting the critical section: by invariant I2(i), private[i] is False just before the thread exits, and so, by the induction hypothesis (I1), shared must be True. The swap on exit thus sets private[i] to True and shared to False, maintaining the invariant. Invariant I2(i) is obvious as (i) thread i only proceeds to the critical section if private[i] is False, and (ii) no other thread modifies private[i].

Harmony can check these invariants. I1 is specified in Line 8, while I2(i) is expressed in Line 20 using an assert statement in the critical section. The expression in Line 8 counts the number of False values in the list consisting of the shared variable and the private variables and checks that the result is 1.

def test_and_set(s) returns oldvalue:
    atomically:
        oldvalue = !s
        !s = True

def atomic_store(p, v):
    atomically !p = v

def Lock() returns initvalue:
    initvalue = False

def acquire(lk):
    while test_and_set(lk):
        pass

def release(lk):
    atomic_store(lk, False)

Figure 8.2. [code/lock_tas.hny] Implementation of the lock specification in Figure 7.1 using a spinlock based on test-and-set

const MAX_THREADS = 8

def fetch_and_increment(p) returns oldvalue:
    atomically:
        oldvalue = !p
        !p = (!p + 1) % MAX_THREADS

def atomic_load(p) returns value:
    atomically value = !p

def Lock() returns result:
    result = { .counter: 0, .dispenser: 0 }

def acquire(lk):
    let my_ticket = fetch_and_increment(?lk->dispenser):
        while atomic_load(?lk->counter) != my_ticket:
            pass

def release(lk):
    fetch_and_increment(?lk->counter)

Figure 8.3. [code/lock_ticket.hny] Implementation of the lock specification in Figure 7.1 using a ticket lock

import lock

const N = 5

in_cs = 0
invariant in_cs in { 0, 1 }

thelock = lock.Lock()

def thread():
    while choose({ False, True }):
        lock.acquire(?thelock)

        atomically in_cs += 1
        # Critical section
        atomically in_cs -= 1

        lock.release(?thelock)

for i in {1..N}:
    spawn thread()

Figure 8.4. [code/lock_test1.hny] A test program for locks (based on Figure 5.2)

The lock implementation in Figure 8.1 uses a shared variable and a private variable for each thread. The private variables themselves are actually implemented as shared variables, but they are accessed only by their respective threads. A thread usually does not need to keep explicit track of whether it has a lock or not, because it is implied by the control flow of the program---a thread implicitly knows that when it is executing in a critical section it has the lock. There is no need to keep private as a shared variable---we only did so to be able to show and check the invariants. But in practice the swap-based approach is expensive: an atomic swap operation requires two memory loads and two memory stores.

A more common solution to implementing a lock uses an atomic test-and-set instruction provided by most CPUs. Figure 8.2 shows the specification of test_and_set and the implementation of a spinlock based on it. Method test_and_set involves only a single variable: the method sets the variable to True and returns the old value of the variable. The variable represents the lock: if free, the value is False; if taken, the value is True. Method Lock() returns the initial value of a lock, which is False. Method acquire(lk) invokes test_and_set(lk) until it returns False, meaning that the lock variable pointed to by lk was available (but now taken). Method release(lk) releases the lock by setting !lk back to False. The lock is cleared in an atomic statement to prevent a data race. (CPUs typically provide an atomic store operation.) Besides being more efficient than the solution in Figure 8.1, this solution works for any number of threads.

You can test the spinlock with the program in Figure 8.4 using the command harmony -m synch=lock_tas code/lock_test1.hny. The -m flag tells Harmony to use the lock_tas.hny file for the synch module rather than the standard synch module (which contains only a specification of the lock methods). The test program has a collection of threads repeatedly enter a critical section and tests that there is at most one thread in the critical section at any time.

The spinlock implementation potentially suffers from starvation: an unlucky thread may never be able to get the lock while other threads successfully acquire the lock one after another. It could even happen with just two threads: one thread might successfully acquire the lock repeatedly in a loop, while another thread is never lucky enough to acquire the lock in between. A ticket lock (Figure 8.3) is an implementation of a lock that prevents starvation using an atomic fetch-and-increment operator. It is inspired by European bakeries. A European bakery often has a clearly displayed counter (usually just two digits) and a ticket dispenser. Tickets are numbered 0 through 99 and repeat over and over again (in the case of a two-digit counter). When a customer walks into the bakery, they draw a number from the dispenser and wait until their number comes up. Every time a customer has been helped, the counter is incremented. (Note that this only works if there can be no more than 100 customers in the bakery at a time.)

Figure 8.3 uses two variables for a ticket lock, counter and dispenser. When a thread tries to acquire the lock, it first fetches the current value of the dispenser variable (i.e., the ticket) and increments the variable modulo MAX_THREADS, all in one atomic operation. In practice, MAX_THREADS would be a number like 2^32 or 2^64, but since the Harmony model checker checks every possible state, limiting MAX_THREADS to a small number helps to keep the time and memory needed to model check a Harmony program within reason. Moreover, a small bound makes it easy to check that the algorithm fails when run with more than MAX_THREADS threads. After obtaining the ticket, the thread waits until its ticket value equals the counter. Note that loading the counter must also be done atomically in order to avoid a data race. To release the lock, it suffices to atomically increment the counter. You can test the implementation using the command harmony -m synch=lock_ticket code/lock_test1.hny. To see it fail, try harmony -c N=10 -m synch=lock_ticket code/lock_test1.hny.

We now turn to a radically different way of implementing locks, one that is commonly provided by operating systems to user processes. We call a thread blocked if it cannot change the state or terminate unless another thread changes the state first. A thread trying to acquire a lock held by another thread is a good example of a thread being blocked: the only way forward is if the other thread releases the lock.

def Lock() returns result:
    result = { .acquired: False, .suspended: [] }

def acquire(lk):
    atomically:
        if lk->acquired:
            stop ?lk->suspended[len lk->suspended]
            assert lk->acquired
        else:
            lk->acquired = True

def release(lk):
    atomically:
        assert lk->acquired
        if lk->suspended == []:
            lk->acquired = False
        else:
            go (lk->suspended[0]) ()
            del lk->suspended[0]

Figure 8.5. [modules/lock_susp.hny] Lock implementation using suspension

In most operating systems, threads are virtual (as opposed to "raw CPU cores") and can be suspended until some condition changes. For example, threads are blocked while they wait for a disk block they are trying to read or while they wait for a network message to arrive. Similarly, a thread that is trying to acquire a lock can be suspended until the lock is available. In Harmony, a thread can suspend itself and save its context (state) in a shared variable. Recall that the context of a thread contains its program counter, stack, and register (containing the current method's variables). In Harmony, a context is a Harmony value and can be saved in a variable just like any other value. The syntax of the expression that a thread executes to suspend itself is as follows:

stop s
Here s is a pointer to some variable. The expression causes the context of the thread to be saved in !s and the thread to be no longer running. Another thread can revive the thread using the go statement:

go !s r
Here r is any Harmony value. It causes a thread with the context contained in !s to be added to the state. That thread resumes at the stop s expression it last executed, which completes by returning the value r.
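Here is a minimal sketch (with hypothetical names) of one thread suspending itself and another reviving it. The sequential declaration prevents Harmony from complaining about a data race on the shared context variable:

sequential susp
susp = None

def sleeper():
    stop ?susp                # suspend; our context is saved in susp
    # execution resumes here once another thread executes go

def waker():
    atomically when susp != None:
        go (!susp) ()         # revive sleeper; its stop expression returns ()

spawn sleeper()
spawn waker()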

Figure 8.5 shows a lock implementation using suspension. A lock is represented by a dictionary with a boolean acquired, which indicates whether the lock is currently held, and a list suspended, which holds the contexts of threads waiting for the lock. A thread that tries to acquire a lock that is already held suspends itself, saving its context at the tail of the suspended list. A thread that releases the lock either clears acquired (if no threads are waiting) or uses go to resume the first thread on the suspended list.

Selecting the first thread is a design choice. Another implementation could have picked the last one, and yet another implementation could have used choose to pick an arbitrary one. Selecting the first (FIFO) is a common choice in lock implementations as it is easy and prevents starvation.

You will find that using the implementation of a lock instead of the specification of a lock (in the synch module) often leads to the model checker searching a significantly larger state space. Thus it makes sense to model check larger programs in a modular fashion: model check one module implementation at a time, using specifications for the other modules.

Exercises

x, y = 0, 100

def setX(a):
    x = a
    y = 100 - a

def getXY() returns xy:
    xy = [x, y]

def checker():
    let xy = getXY():
        assert (xy[0] + xy[1]) == 100, xy
    
spawn checker()
spawn setX(50)

Figure 8.6. [code/xy.hny] Incomplete code for Exercise 8.2 with desired invariant x + y = 100

from synch import Lock, acquire, release

const N_ACCOUNTS = 2
const N_CUSTOMERS = 2
const N_ATMS = 2
const MAX_BALANCE = 1

accounts = [ { .lock: Lock(), .balance: choose({0..MAX_BALANCE})}
                            for i in {1..N_ACCOUNTS} ]

invariant min(accounts[acct].balance for acct in {0..N_ACCOUNTS-1}) >= 0

def atm_check_balance(acct) returns balance:  # return the balance on acct
    acquire(?accounts[acct].lock)
    balance = accounts[acct].balance
    release(?accounts[acct].lock)

def atm_withdraw(acct, amount) returns success: # withdraw amount from acct
    assert amount >= 0
    acquire(?accounts[acct].lock)
    accounts[acct].balance -= amount
    release(?accounts[acct].lock)
    success = True

def customer(atm, acct, amount):
    assert amount >= 0
    let bal = atm_check_balance(acct):
        if amount <= bal:
            atm_withdraw(acct, amount)
        
for i in {1..N_ATMS}:
    spawn customer(i, choose({0..N_ACCOUNTS-1}),
                      choose({0..MAX_BALANCE}))

Figure 8.7. [code/atm.hny] Withdrawing money from an ATM

Chapter 9. Concurrent Data Structures

def Queue() returns empty:
    empty = []

def put(q, v):
    !q += [v,]

def get(q) returns next:
    if !q == []:
        next = None
    else:
        next = (!q)[0]
        del (!q)[0]

def Queue() returns empty:
    empty = []

def put(q, v):
    atomically !q += [v,]

def get(q) returns next:
    atomically:
        if !q == []:
            next = None
        else:
            next = (!q)[0]
            del (!q)[0]
(a) [code/queue_nonatom.hny] Sequential (b) [code/queue.hny] Concurrent

Figure 9.1. A sequential and a concurrent specification of a queue

import queue

def sender(q, v):
    queue.put(q, v)

def receiver(q):
    let v = queue.get(q):
        assert v in { None, 1, 2 }

demoq = queue.Queue()
spawn sender(?demoq, 1)
spawn sender(?demoq, 2)
spawn receiver(?demoq)
spawn receiver(?demoq)

Figure 9.2. [code/queue_test1.hny] Using a concurrent queue

The most common use for locks is in building concurrent data structures. By way of example, we will first demonstrate how to build a concurrent queue. Concurrent queues turn out to be very important in many concurrent and distributed programs, particularly in so-called producer/consumer scenarios where one or more threads are producing items and one or more other threads consume them.

The queue module interface can be summarized as follows: Queue() returns a new, empty queue; put(q, v) appends value v to the tail of the queue that q points to; and get(q) removes and returns the value at the head of that queue, returning None if the queue is empty.

Figure 9.1(a) shows a sequential specification for such a queue in Harmony. Sequential specifications are useful to write down. They tend to be easy to understand as you only need to understand each operation individually without having to worry about interactions between operations, because we are assuming that operations are executed one at a time. Also, when writing down a specification, you do not have to worry about efficiency. In this particular case, we capture the state of a queue by a list. As a result, in the absence of concurrency, the sequential specification of the queue is also a credible, if inefficient, queue implementation.

Unfortunately, this sequential specification does not work well with threads concurrently accessing the queue, because operations can overlap in execution, and so one operation can witness the intermediate state of another operation. A compiler is free to implement the sequential specification in any way it wants: it gets to decide what data structures to use to store the list and what machine instructions to use to implement the operations. For a sequential specification that is not a problem, because we only need to understand the state in between operations, but in a concurrent setting Figure 9.1(a) is underspecified: it does not say what happens when two or more operations overlap.

Figure 9.1(b) shows the corresponding concurrent specification. It is almost identical to the sequential specification, but it states that the execution of each method must appear to happen atomically, sometime between invoking the method and the method completing. It cannot be used as an implementation for a queue, as processors generally do not have atomic operations on lists, but it works well as a specification. See Figure 9.2 for a simple demonstration program that uses a concurrent queue; the specification can be used with any concurrent program that uses a queue. But this concurrent specification is too much for compilers to handle; we have to write down how to implement it.

While in the concurrent specification operations cannot overlap execution, the same need not be true for an implementation. Indeed, we will show two implementations of a concurrent queue. The first uses critical sections, ensuring that operations cannot overlap. The second allows a get and a put operation on a non-empty queue to execute simultaneously. Both are correct implementations of the concurrent queue specification.

In both implementations, the queue data structure is maintained as a linked list. They use the alloc module for dynamic allocation of nodes in the list, which provides methods malloc() and free(). malloc(v) returns a new memory location initialized to v, which should be released with free() when it is no longer in use.
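As a tiny sketch of the alloc interface:

from alloc import malloc, free

p = malloc(0)      # p points to a fresh memory location holding 0
!p += 1
assert !p == 1
free(p)            # release the location when it is no longer needed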

The implementation in Figure 9.3 uses a lock-based critical section to operate on the data structure. The implementation maintains a head pointer to the first element in the list and a tail pointer to the last element in the list. The head pointer is None if and only if the queue is empty. (None is a special address value that is not the address of any memory location.)

Queue() returns the initial value for a queue object consisting of a None head and tail pointer and a lock. The put(q, v) and get(q) methods both take a pointer q to the queue object. It has to be a pointer rather than the value of the queue object because both methods must be able to modify the queue object. Before a method accesses the value of the head or tail of the queue, it first obtains the lock. When the method is done, it releases the lock. Doing so prevents concurrently executing methods from witnessing an intermediate, inconsistent state of the linked list.

Importantly, note Lines 7 and 8 in Figure 9.2. It would be incorrect to replace these by:

assert queue.get(q) in { None, 1, 2 }
The reason is that queue.get() changes the state by acquiring a lock, but the expressions in assert statements (or invariant and finally statements) must be predicates over the state and thus are not allowed to change the state.

The implementation in Figure 9.3 uses a single lock to protect the queue data structure. A thread acquires the lock before any access and releases it afterwards. This is possibly the simplest way to make a data structure concurrent. However, it can cause significant contention over the lock between a producer and a consumer, reducing performance by serializing operations.

Figure 9.4 shows another concurrent queue implementation that addresses this problem [34]. It is based on the insight that a put method operates on the tail of a queue, while a get method operates on the head, and so in theory they should be able to execute concurrently. Although the algorithm is well-known, what is not often realized (because it is not stated explicitly in the paper) is that it requires sequentially consistent memory. The algorithm must be coded carefully to work correctly with modern programming languages and computer hardware that generally lack sequentially consistent memory.

The implementation uses separate locks for the head and the tail, allowing a put and a get operation to proceed concurrently. The implementation uses a dummy node (aka sentinel node) at the head of the linked list. Except initially, the dummy node is the last node that was dequeued. Using dummy nodes can simplify implementations because they reduce the number of exceptional conditions. In this case, note that neither the head nor the tail pointer is ever None, reducing the need for if statements in the code.

The problem with the original specification of the algorithm is when the queue is empty and there are concurrent get and put operations. They obtain separate locks and then concurrently access the next field in the dummy node---a data race with undefined semantics in most modern environments. To get around this problem, the implementation in Figure 9.4 uses atomic_load and atomic_store from the synch module.

from synch import Lock, acquire, release
from alloc import malloc, free

def Queue() returns empty:
    empty = { .head: None, .tail: None, .lock: Lock() }

def put(q, v):
    let node = malloc({ .value: v, .next: None }):
        acquire(?q->lock)
        if q->tail == None:
            q->tail = q->head = node
        else:
            q->tail->next = node
            q->tail = node
        release(?q->lock)
    
def get(q) returns next:
    acquire(?q->lock)
    let node = q->head:
        if node == None:
            next = None
        else:
            next = node->value
            q->head = node->next
            if q->head == None:
                q->tail = None
            free(node)
    release(?q->lock)

Figure 9.3. [code/queue_lock.hny] An implementation of a concurrent queue data structure and a depiction of a queue with three elements

from synch import Lock, acquire, release, atomic_load, atomic_store
from alloc import malloc, free

def Queue() returns empty:
    let dummy = malloc({ .value: (), .next: None }):
        empty = { .head: dummy, .tail: dummy, .hdlock: Lock(), .tllock: Lock() }

def put(q, v):
    let node = malloc({ .value: v, .next: None }):
        acquire(?q->tllock)
        atomic_store(?q->tail->next,  node)
        q->tail = node
        release(?q->tllock)

def get(q) returns next:
    acquire(?q->hdlock)
    let dummy = q->head
    let node = atomic_load(?dummy->next):
        if node == None:
            next = None
            release(?q->hdlock)
        else:
            next = node->value
            q->head = node
            release(?q->hdlock)
            free(dummy)

Figure 9.4. [code/queue_MS.hny] A queue with separate locks for enqueuing and dequeuing items and a depiction of a queue with two elements

from alloc import malloc

def SetObject() returns object:
    object = malloc({})

def insert(s, v):
    atomically !s |= {v}

def remove(s, v):
    atomically !s -= {v}

def contains(s, v) returns present:
    atomically present = v in !s

Figure 9.5. [code/setobj.hny] Specification of a concurrent set object

from setobj import *

myset = SetObject()

def thread1():
    insert(myset, 1)
    let x = contains(myset, 1):
        assert x

def thread2(v):
    insert(myset, v)
    remove(myset, v)

spawn thread1()
spawn thread2(0)
spawn thread2(2)

Figure 9.6. [code/setobj_test1.hny] Test code for set objects

from synch import Lock, acquire, release
from alloc import malloc, free

def _node(v, n) returns node:   # allocate and initialize a new list node
    node = malloc({ .lock: Lock(), .value: v, .next: n })

def _find(lst, v) returns pair:
    var before = lst
    acquire(?before->lock)
    var after = before->next
    acquire(?after->lock)
    while after->value < (0, v):
        release(?before->lock)
        before = after
        after = before->next
        acquire(?after->lock)
    pair = (before, after)

def SetObject() returns object:
    object = _node((-1, None), _node((1, None), None))

def insert(lst, v):
    let before, after = _find(lst, v):
        if after->value != (0, v):
            before->next = _node((0, v), after)
        release(?after->lock)
        release(?before->lock)

def remove(lst, v):
    let before, after = _find(lst, v):
        if after->value == (0, v):
            before->next = after->next
            free(after)
        release(?before->lock)

def contains(lst, v) returns present:
    let before, after = _find(lst, v):
        present = after->value == (0, v)
        release(?after->lock)
        release(?before->lock)

Figure 9.7. [code/setobj_linkedlist.hny] Implementation of a set of values using a linked list with fine-grained locking

A queue has the nice property that only the head or the tail can be accessed. However, in many data structures it is necessary to "walk" the entire data structure, an operation that can take significant time. In such a case, a single lock (known as a "big lock") for the entire data structure might restrict concurrency to an unacceptable level. To reduce the granularity of locking, each node in the data structure must be endowed with its own lock instead.

Figure 9.5 gives the specification of a concurrent set object. SetObject() returns a pointer to a variable that contains an empty set, rather than returning an empty set value. As such, it is more like an object in an object-oriented language than like a value in its own right. Values can be added to the set object using insert() or deleted using remove(). Method contains() checks if a particular value is in the set. Figure 9.6 contains a simple (although not very thorough) test program to demonstrate the use of set objects.

Figure 9.7 implements a concurrent set object using an ordered linked list without duplicates. The list has two dummy "book-end" nodes with values (-1, None) and (1, None). A value v is stored as (0, v)---note that for any value v, (-1, None) < (0, v) < (1, None) (because of lexicographical ordering of tuples). An invariant of the algorithm is that at any point in time the list is "valid," starting with a (-1, None) node and ending with a (1, None) node.

Each node has a lock, a value, and next, a pointer to the next node (which is None for the (1, None) node to mark the end of the list). The _find(lst, v) helper method first finds and locks two consecutive nodes before and after such that before->value < (0, v) <= after->value. It does so by performing something called hand-over-hand locking. It first locks the first node, which is the (-1, None) node. Then, iteratively, it obtains a lock on the next node and releases the lock on the previous one, and so on, similar to climbing a rope hand-over-hand. Using _find, the insert, remove, and contains methods are fairly straightforward to implement.

Exercises

Chapter 10. Testing: Checking Behaviors

import queue, queueconc

const NOPS = 4
const VALUES = { 1..NOPS }

specq = queue.Queue()
implq = queueconc.Queue()

for i in {1..NOPS}:
    let op = choose({ "get", "put" }):
        if op == "put":
            let v = choose(VALUES):
                queueconc.put(?implq, v)
                queue.put(?specq, v)
        else:
            let v = queueconc.get(?implq)
            let w = queue.get(?specq):
                assert v == w

Figure 10.1. [code/queue_test_seq.hny] Sequential queue test

Testing is a way to increase confidence in the correctness of an implementation. Figure 9.2 demonstrates how concurrent queues may be used, but it is not a very thorough test program for an implementation such as the one in Figure 9.3 and does little to increase our confidence in its correctness. To wit, if get() always returned 1, the program would find no problems. In this chapter, we will look at approaches to testing concurrent code.

Checking the behavior of a concurrent queue can be tricky. For example, suppose that, in some execution of Figure 9.2, thread 1 invokes put(?demoq, 1) and then thread 2 invokes put(?demoq, 2) and then thread 3 invokes get(?demoq). What value should thread 3 obtain? The answer is: it depends, or in other words, not enough information is provided. All of 1, 2, and None are possible outcomes. Certainly, if put(?demoq, 1) completes before put(?demoq, 2) is invoked and put(?demoq, 2) completes before get(?demoq) is invoked, then 1 is the only correct result. However, if put(?demoq, 1) is invoked first, and then put(?demoq, 2) is invoked before put(?demoq, 1) has completed, then those two operations are executing concurrently, and it might come to pass that 2 is enqueued before 1. In that case, if get(?demoq) is invoked after both those operations complete, both 1 and 2 are possible results, but None cannot be. However, if get(?demoq) is invoked before the other two operations complete, then None is a valid outcome as well, even if the other two operations were invoked before get(?demoq) was invoked.

The issue is that operations take time, and therefore operations can overlap. While locks reduce the amount by which operations can overlap, they do not completely eliminate it, and obtaining a lock takes time as well. Unlike with critical sections, checking the correct behavior of a queue turns out to be difficult to do with assertions alone, because it is hard to characterize the set of correct behaviors of a concurrent queue.

Behaviors say something about how we got to a state. The same state can be reached by multiple behaviors, and the behaviors are often an integral part of whether a program is correct or not. Just because a state satisfies some invariant---however important---does not mean that the state is valid given the sequence of operations. For example, a state in which the queue is empty is certainly a valid state in its own right, but if the last operation to get there was an enqueue operation, there must be a bug in the program. It can therefore be important to capture the behaviors. We could store behaviors in the state itself by adding what is known as a history variable that keeps track of all the operations. While this can be useful for correctness proofs, for model checking this approach presents a problem: introducing this additional state can lead to state explosion or even turn a finite model (a model with a finite number of states) into an infinite one.

Fortunately, if we have a specification of a queue, we can generate the set of correct behaviors of a queue. For simplicity, we will first check if the queue implementation in Figure 9.3 meets the sequential queue specification in Figure 9.1(a). To check if the queue implementation meets the specification, we need to see if any sequence of queue operations in the implementation matches a corresponding sequence in the specification. We say that the implementation and the specification have the same behaviors or are behaviorally equivalent. This is called differential testing.

Figure 10.1 presents a differential test program for sequences of up to NOPS queue operations. It maintains two queues: specq, a queue as defined by the specification in the queue module, and implq, a queue as defined by the implementation under test in the queueconc module.

For each operation, the code first chooses whether to perform a get or put operation. In the case of a put operation, the code also chooses which value to append to the queue. All operations are performed on both the queue implementation and the queue specification. In the case of get, the results of the operation on both the implementation and specification are checked against one another.

Test programs themselves should be tested. Just because a test program works with a particular implementation does not mean the implementation is correct---it may be that the implementation is incorrect but the test program does not have enough coverage to find any bugs in the implementation. So, run a test program like this with a variety of queue implementations that have known bugs in them and make sure that the test program finds them. Conversely, a test program may be broken in that it finds bugs that do not exist. In my experience, it is often harder to implement the test program than the algorithm that the test program tests.

As with any other test program, Figure 10.1 may not trigger extant bugs, but it nonetheless inspires reasonable confidence that the queue implementation is correct, at least sequentially. The higher NOPS, the higher the confidence. It is possible to write similar programs in other languages such as Python, but the choose expression in Harmony makes it relatively easy to explore all corner cases. For example, a common programming mistake is to forget to update the tail pointer in get() in case the queue becomes empty. Normally, it is a surprisingly tricky bug to find. You can comment out those lines in Figure 9.3 and run the test program---it should easily find the bug and explain exactly how the bug manifests itself, adding confidence that the test program is reasonably thorough.

The test program also finds some common mistakes in using locks, such as forgetting to release a lock when the queue is empty, but it is not designed to find concurrency bugs in general. If you remove all acquire() and release() calls from Figure 9.3, the test program will not (and should not) find any errors, but it would be an incorrect implementation of a concurrent queue.

import queue

const N_PUT = 2
const N_GET = 2
q = queue.Queue()

def put_test(self):
    print("call put", self)
    queue.put(?q, self)
    print("done put", self)

def get_test(self):
    print("call get", self)
    let v = queue.get(?q):
        print("done get", self, v)

for i in {1..N_PUT}:
    spawn put_test(i)
for i in {1..N_GET}:
    spawn get_test(i)

Figure 10.2. [code/queue_btest2.hny] Concurrent queue test. The behavior DFA is for N_PUT = N_GET = 1.

import queue, threading, random

N_PUT = 3
N_GET = 3
q = queue.Queue()

def put_test(self):
    print("call put", self)
    q.put(self)
    print("done put", self)

def get_test(self):
    print("call get", self)
    try:
        v = q.get(block=False)
        print("done get", self, v)
    except queue.Empty:
        print("done get empty", self)

for i in range(N_PUT):
    threading.Thread(target=put_test, args=(i,)).start()
for i in range(N_GET):
    threading.Thread(target=get_test, args=(i,)).start()

Figure 10.3. [python/queue_btest2.py] Python implementation of Figure 10.2

from synch import Lock, acquire, release

def Queue() returns empty:
    empty = { .list: [], .lock: Lock() }

def put(q, v):
    acquire(?q->lock)
    q->list += [v,]
    release(?q->lock)

def get(q) returns next:
    acquire(?q->lock)
    if q->list == []:
        next = None
    else:
        next = q->list[0]
    release(?q->lock)
    acquire(?q->lock)
    if q->list != []:
        del q->list[0]
    release(?q->lock)

Figure 10.4. [code/queue_broken1.hny] A correct sequential but not a correct concurrent queue implementation

We will now see how to test whether the queue implementation meets the concurrent queue specification or not. Figure 9.1(b) shows the concurrent queue specification. It is similar to the sequential specification in Figure 9.1(a) but makes all operations (except instantiation itself) atomic. This means that the effect of an operation appears to happen instantaneously, sometime between invoking the operation and the operation completing; in particular, the effects cannot appear to overlap, even though the operations themselves can. Testing the implementation of a concurrent queue specification is trickier than testing the implementation of a sequential one because there are many more scenarios to check.

We would like a way that---similar to the sequential test---systematically compares behaviors of the concurrent queue implementation with behaviors of the concurrent queue specification. But we cannot do this by composing the specification and the implementation and simply running the same test operations on both as we did before: concurrency makes the operations non-deterministic, and thus the specification and the implementation might produce different results in a single execution, even if both are correct. Instead, we will create a test program that tries various concurrent combinations of queue operations, and run it twice: once against the specification of the concurrent queue and once against the implementation. In the second phase, we check if the behaviors obtained from running the implementation are also behaviors obtained from the specification.

Figure 10.2 shows the test program. It starts N_PUT threads doing a put operation and N_GET threads doing a get operation. In the case of a put operation, the thread enqueues its own name (which is provided as an argument to the thread). In order to capture the behaviors, each thread prints what operation it is about to perform, and afterwards it prints that the operation has completed (including the return value, if any). The figure also shows the behavior DFA that captures the generated behaviors in case N_PUT = N_GET = 1. You can see that if get() completes before put(1) is invoked, then get() must return None. Conversely, if put(1) completes before get() is invoked, then get() must return 1. Otherwise get() can return either None or 1, depending on how the two operations are serialized.

This is probably much like what you would do if you were trying to find a bug in a program. Figure 10.3 shows a Python implementation of the same test program. You can run it a bunch of times and manually check the output. There are, however, two problems with this approach. First, it is often difficult to check whether the behaviors you find are correct ones, and it is easy to overlook problems in the output. Second, the test program may not check all possible behaviors.

Using Harmony, these problems can be avoided. One approach is to compare the two behavior DFAs by manual inspection:

$ harmony -c N_PUT=1 -c N_GET=1 code/queue_btest2.hny
$ harmony -c N_PUT=1 -c N_GET=1 -m queue=queue_lock code/queue_btest2.hny
You can try this for various N_PUT and N_GET, slowly increasing their values, although it gets increasingly harder to check by hand that the generated DFAs are the same. For example, run the test program against Figure 10.4, which is an incorrect implementation of the concurrent queue specification in that it cannot handle concurrent get operations correctly. Try, for example:

$ harmony -c N_GET=2 -o x.png code/queue_btest2.hny
$ harmony -c N_GET=2 -o y.png -m queue=queue_broken1 code/queue_btest2.hny
and compare the finite state automata in x.png and y.png. If you look carefully, you will find that the two behaviors are not the same, although the differences are hard to spot. Harmony does not necessarily detect any problems itself this way. This is because the test program only outputs the behaviors---it does not check if they are correct.

Harmony does have a way to check the behaviors of one program against the behaviors of another. In particular, we want to check whether the behaviors of an implementation match the behaviors of the specification. The following shows, for example, how to check the queue_lock.hny implementation on the command line:

$ harmony -o queue.hfa code/queue_btest2.hny
$ harmony -B queue.hfa -m queue=queue_lock code/queue_btest2.hny
The first command runs the code/queue_btest2.hny program (with the default 4 threads) and writes a representation of the behavior DFA to the file queue.hfa. The second command runs the same test program, but using the queue implementation in the file code/queue_lock.hny. Moreover, it reads the DFA in queue.hfa to check whether every behavior of the second run of the test program is also a behavior of the first run.

Now try the same using the queue_broken1.hny implementation:

$ harmony -o queue.hfa code/queue_btest2.hny
$ harmony -B queue.hfa -m queue=queue_broken1 code/queue_btest2.hny
This way, Harmony will find a "behavior violation" that is hard to spot manually.

Exercises

Chapter 11. Debugging

from synch import Lock, acquire, release
from alloc import malloc, free

def Queue() returns empty:
    empty = { .next: None, .value: None, .lock: Lock() }

def put(q, v):
    let node = malloc({ .next: None, .value: v, .lock: Lock() }):
        var nq = q
        while nq != None:
            acquire(?nq->lock)
            let n = nq->next:
                if n == None:
                    nq->next = node
                release(?nq->lock)
                nq = n

def get(q) returns next:
    acquire(?q->lock)
    if q->next == None:
        next = None
    else:
        let node = q->next:
            q->next = node->next
            next = node->value
            free(node)
    release(?q->lock)

Figure 11.1. [code/queue_broken2.hny] A buggy queue implementation

$ harmony -m queue=queue_broken2 code/queue_btest2.hny
  • Phase 1: compile Harmony program to bytecode
  • Phase 2: run the model checker (nworkers = 8)
    • 4021 states (time 0.00s, mem=0.000GB)
    • 781/11513 computations/edges
  • Phase 3: analysis
    • Safety Violation

Summary: something went wrong in an execution

Here is a summary of an execution that exhibits the issue:

  • Schedule thread T0: init()
    • Line alloc/1: Initialize alloc$pool to {:}
    • Line alloc/2: Initialize alloc$next to 0
    • Line 5: Initialize q to { "lock": False, "next": None, "value": None }
    • Thread terminated
  • Schedule thread T1: put_test(1)
    • Line 8: Print [ "call put", 1 ]
    • Preempted in put_test(1) -- put(?q, 1) -- malloc({ "lock": False, "next": None, "value": 1 }) about to execute atomic section in line alloc/7
  • Schedule thread T4: get_test(2)
    • Line 13: Print [ "call get", 2 ]
    • Preempted in get_test(2) -- get(?q) -- acquire(?q["lock"]) about to execute atomic section in line synch/35
  • Schedule thread T1: put_test(1) -- put(?q, 1) -- malloc({ "lock": False, "next": None, "value": 1 })
    • Line alloc/8: Initialize alloc$pool[0] to { "lock": False, "next": None, "value": 1 }
    • Line alloc/10: Set alloc$next to 1 (was 0)
    • Line synch/36: Set q["lock"] to True (was False)
    • Line queue/14: Set q["next"] to ?alloc$pool[0] (was None)
    • Line synch/41: Set q["lock"] to False (was True)
    • Preempted in put_test(1) about to print [ "done put", 1 ] in line 10
  • Schedule thread T2: put_test(2)
    • Line 8: Print [ "call put", 2 ]
    • Line alloc/8: Initialize alloc$pool[1] to { "lock": False, "next": None, "value": 2 }
    • Line alloc/10: Set alloc$next to 2 (was 1)
    • Line synch/36: Set q["lock"] to True (was False)
    • Line synch/41: Set q["lock"] to False (was True)
    • Preempted in put_test(2) -- put(?q, 2) -- acquire(?alloc$pool[0]["lock"]) about to execute atomic section in line synch/35
...

Figure 11.2. Running Figure 10.2 against Figure 11.1

Figure 11.3. HTML output of Figure 11.2 but for N_PUT=2 and N_GET=1

So, you wrote a Harmony program and Harmony reports a problem. Often you may be able to figure it out just by staring at the code and going through some simple scenarios, but what if you can't? The output of Harmony can be helpful in that case.

Figure 11.1 contains an attempt at a queue implemented as a linked list, with the first node being a dummy node to prevent data races. Each node in the list contains a lock. The put() method walks the list until it gets to the last node, each time acquiring the node's lock to access its fields. When put() gets to the last node in the list, it appends a new one. The get() method locks the first (dummy) node, removes the second node from the list, and frees it. The method returns the value from the removed node.

Let us run the code through the test programs in the last chapter. Harmony does not detect any issues with the sequential test in Figure 10.1. (Run this using the -m flag like this: harmony -m queue=queue_broken2 code/qtestseq.hny) However, when we run the new queue code through the test in Figure 10.2, Harmony reports a safety violation (even without specifying a behavior). Figure 11.2 shows the command line to reproduce this and the first few lines of markdown output.

Before we look at the details of what went wrong, we want to make sure that we generate the simplest possible scenario. So, we first explore how small the N_PUT and N_GET parameters can be. With some experimentation, we find that N_PUT = 2 and N_GET = 1 already triggers a problem (harmony -m queue=queue_broken2 -c N_PUT=2 -c N_GET=1 code/queue_btest2.hny). Figure 11.3 shows the HTML output.

There is quite a bit of information in the HTML output. Let's start by looking at the red text. Harmony found a safety violation (something bad happened during one of the possible executions): put_test(1) (thread T1) was trying to dereference the address ?alloc$pool[0]["lock"]. The alloc module maintains a shared array pool that it uses for dynamic allocation. Apparently T1 tried to access pool[0], but it does not exist, meaning that either it had not yet been allocated, or it had been freed since it was allocated. When we look at the top half of the figure under "Shared Variables", we see that thread T2 in fact allocated pool[0] in turn 2 (during put_test(2)), but T3 freed it in turn 4 (during get_test(1)). Looking at the stack traces in the bottom table, we can see that T3 was in the process of executing release(?q.lock) within get(?q). T1 is currently executing acquire(?alloc$pool[0].lock) within put(?q, 1), but alloc$pool[0] does not exist.

So, how did we get there? In the top we can see that the order of events was the following:

  1. initialization completed, with q being { .lock: False, .next: None, .value: None };
  2. thread T2 (put_test(2)) ran and finished executing put(2) (and is about to print ["done put", 2]). We can see that q.next now points to alloc$pool[0], which the thread must have allocated. Its contents are { .lock: False, .next: None, .value: 2 }, as expected;
  3. thread T1 (put_test(1)) started running, calling put(?q, 1). We can see it got as far as allocating a node, but it has not yet added the node to the end of the queue. It then tries to acquire alloc$pool[0].lock;
  4. thread T3 (get_test(1)) started running, calling get(?q). We can also see that it freed pool[0], and is now releasing q.lock;
  5. thread T1 resumes and tries to access alloc$pool[0], which no longer exists (because T3 just freed it).
Clearly there was a race in which T1 was trying to lock pool[0].lock (pool[0] being the node with the value 2) while T3 was freeing that very same node, and T1 lost the race. More precisely, T1 was executing put(?q, 1) when T3 preempted it with get(?q) and removed the node that T1 was trying to access. But why did the locks not prevent this?

It is time to start stepping through the code that was executed before things went wrong. This is sometimes known as reverse debugging; in fact, Harmony allows you to step through an execution both forwards and backwards. In this case, we first want to see what T1 is doing. You can click on its first (top-left) orange box to time-travel to that point in the execution. The boxes are color-coded: each method has its own color. Now, by hitting the return key repeatedly, we can quickly step through the code, line by line. T1 first calls put(?q, 1) and then allocates a new node initialized with a lock. Keep stepping until it executes nq = q. Hit return once more and inspect the state of T1 in the lower-right corner. You can see that variable nq is initialized to ?q. T1 then enters the while loop and tries to acquire nq->lock. This succeeds, and next T1 executes let n = nq->next. Now n = ?alloc$pool[0], which is not None. It then releases nq->lock (nq points to q) and sets nq to n, which is ?alloc$pool[0]. Finally, it calls acquire(?nq->lock). But before it can complete that operation, T3 runs next.

T3 then goes on to invoke get(?q). It first successfully acquires q->lock. T3 then finds that q->next points to alloc$pool[0]. It sets node to alloc$pool[0] as well and sets q->next to node->next. It sets the method result next to node->value (which is 2) and then frees node. This is where the problem is---T1 is about to acquire the lock in that very node.

To fix the code without changing the data structure, we can use hand-over-hand locking (Chapter 9). Figure 11.4 shows an implementation that uses hand-over-hand locking both for put() and for get(). It passes all tests.

from synch import Lock, acquire, release
from alloc import malloc, free

def Queue() returns empty:
    empty = { .next: None, .value: None, .lock: Lock() }

def put(q, v):
    var nq = q
    let node = malloc({ .next: None, .value: v, .lock: Lock() }):
        acquire(?nq->lock)
        var n = nq->next
        while n != None:
            acquire(?n->lock)
            release(?nq->lock)
            nq = n
            n = n->next
        nq->next = node
        release(?nq->lock)

def get(q) returns next:
    acquire(?q->lock)
    if q->next == None:
        next = None
    else:
        let node = q->next:
            acquire(?node->lock)
            q->next = node->next
            next = node->value
            release(?node->lock)
            free(node)
    release(?q->lock)

Figure 11.4. [code/queue_fix.hny] Queue implementation with hand-over-hand locking

Chapter 12. Conditional Waiting

Critical sections enable multiple threads to easily share data structures whose modification requires multiple steps. A critical section only allows one thread to execute the code of the critical section at a time. Therefore, when a thread arrives at a critical section, the thread waits until there is no other thread in the critical section.

A thread that is waiting for a condition without changing shared variables is considered blocked. In other words, a blocked thread can only change shared variables if some other thread changes them first. A spinlock to protect a critical section is a good example: a thread that is waiting for a spinlock does not change the state of the lock variable until it acquires the lock.

Sometimes it is useful for a thread to wait for additional conditions besides the critical section being unoccupied. For example, when dequeuing from an empty shared queue, a thread may want to block until the queue is non-empty. The alternative to blocking is active busy waiting (often just called busy waiting), where the thread repeatedly tries to dequeue an item until it succeeds. A thread that is busy waiting until the queue is non-empty cannot make progress until another thread enqueues an item. However, the thread is not considered blocked, because it is changing the shared state by repeatedly acquiring and releasing the lock. Doing so wastes CPU cycles and adds contention to queue access.
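To make the contrast concrete, here is a minimal Python sketch of active busy waiting on a shared queue protected by a lock (the names busy_get and items are ours, for illustration only):

import threading, collections

lock = threading.Lock()
items = collections.deque()

def busy_get():
    # Active busy waiting: repeatedly acquire and release the lock until
    # an item appears. The thread never blocks on a condition, but it
    # burns CPU cycles and adds contention on the lock.
    while True:
        with lock:
            if items:
                return items.popleft()
        # lock released; loop around and try again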

We would like to find a solution to conditional waiting so that a thread blocks until the condition holds---or at least most of the time. Before we do so, we will give two classic examples of synchronization problems that involve conditional waiting: reader/writer locks and bounded buffers.

def RWlock() returns lock:
    lock = { .nreaders: 0, .nwriters: 0 }

def read_acquire(rw):
    atomically when rw->nwriters == 0:
        rw->nreaders += 1

def read_release(rw):
    atomically rw->nreaders -= 1

def write_acquire(rw):
    atomically when ((rw->nreaders == 0) and (rw->nwriters == 0)):
        rw->nwriters = 1

def write_release(rw):
    atomically rw->nwriters = 0

Figure 12.1. [code/rwlock.hny] Specification of reader/writer locks

import rwlock

nreaders = nwriters = 0
invariant ((nreaders >= 0) and (nwriters == 0)) or
            ((nreaders == 0) and (nwriters == 1))

const NOPS = 4

rw = rwlock.RWlock()

def thread():
    while choose({ False, True }):
        if choose({ "read", "write" }) == "read":
            rwlock.read_acquire(?rw)
            atomically nreaders += 1
            atomically nreaders -= 1
            rwlock.read_release(?rw)
        else:                       # write
            rwlock.write_acquire(?rw)
            atomically nwriters += 1
            atomically nwriters -= 1
            rwlock.write_release(?rw)

for i in {1..NOPS}:
    spawn thread()

Figure 12.2. [code/rwlock_test1.hny] Test code for reader/writer locks

Section 12.1. Reader/Writer Locks

Locks are useful when accessing a shared data structure. By preventing more than one thread from accessing the data structure at the same time, conflicting accesses are avoided. However, not all concurrent accesses conflict, and opportunities for concurrency may be lost, hurting performance. One important case is when multiple threads are simply reading the data structure. In many applications, reads are the majority of all accesses, and read operations do not conflict with one another. Allowing reads to proceed concurrently can significantly improve performance.

What we want is a special kind of lock that allows either (i) one writer or (ii) one or more readers to acquire the lock. This is called a reader/writer lock [11]. A reader/writer lock is an object whose abstract state contains two integer counters (see Figure 12.1):

  1. nreaders: the number of readers
  2. nwriters: the number of writers
satisfying the following invariant: (nreaders ≥ 0 ∧ nwriters = 0) ∨ (nreaders = 0 ∧ nwriters = 1).

There are four operations on a reader/writer lock rw:

  • read_acquire(rw): waits until nwriters = 0 and then increments nreaders;
  • read_release(rw): decrements nreaders;
  • write_acquire(rw): waits until nreaders = nwriters = 0 and then sets nwriters to 1;
  • write_release(rw): sets nwriters to 0.
Similar to ordinary locks, a thread is restricted in how it may invoke these operations. In particular, a thread may only release a reader/writer lock for reading if it acquired the lock for reading, and similarly for writing.

import synch

def RWlock() returns lock:
    lock = synch.Lock()

def read_acquire(rw):
    synch.acquire(rw)

def read_release(rw):
    synch.release(rw)

def write_acquire(rw):
    synch.acquire(rw)

def write_release(rw):
    synch.release(rw)

Figure 12.3. [code/rwlock_cheat.hny] "Cheating" reader/writer lock

import rwlock

const NOPS = 3

rw = rwlock.RWlock()

def thread(self):
    while choose({ False, True }):
        if choose({ "read", "write" }) == "read":
            print(self, "enter ra")
            rwlock.read_acquire(?rw)
            print(self, "exit ra")

            print(self, "enter rr")
            rwlock.read_release(?rw)
            print(self, "exit rr")
        else:                       # write
            print(self, "enter wa")
            rwlock.write_acquire(?rw)
            print(self, "exit wa")

            print(self, "enter wr")
            rwlock.write_release(?rw)
            print(self, "enter wr")

for i in {1..NOPS}:
    spawn thread(i)

Figure 12.4. [code/rwlock_btest.hny] A behavioral test of reader/writer locks

from synch import Lock, acquire, release

def RWlock() returns lock:
    lock = { .lock: Lock(), .nreaders: 0, .nwriters: 0 }

def read_acquire(rw):
    acquire(?rw->lock)
    while rw->nwriters > 0:
        release(?rw->lock)
        acquire(?rw->lock)
    rw->nreaders += 1
    release(?rw->lock)

def read_release(rw):
    acquire(?rw->lock)
    rw->nreaders -= 1
    release(?rw->lock)

def write_acquire(rw):
    acquire(?rw->lock)
    while (rw->nreaders > 0) or (rw->nwriters > 0):
        release(?rw->lock)
        acquire(?rw->lock)
    rw->nwriters = 1
    release(?rw->lock)

def write_release(rw):
    acquire(?rw->lock)
    rw->nwriters = 0
    release(?rw->lock)

Figure 12.5. [code/rwlock_busy.hny] Busy waiting reader/writer lock

Figure 12.2 shows how the reader/writer lock operations may be tested. A problem with this test is that it does not catch an incorrect implementation like the one in Figure 12.3, which implements a reader/writer lock as an ordinary lock and thus lets only one thread into the critical section at a time. In some sense, the implementation is correct because it satisfies the invariant, but it is not a desirable implementation because it does not allow multiple readers to enter the critical section at the same time. For a case like this, it is better to compare behaviors between the specification and the implementation.

Figure 12.4 is the same test as Figure 12.2, but it prints identifying information before and after every lock operation. Now we can compare behaviors as follows:

$ harmony -o rw.hfa -cNOPS=2 code/rwlock_btest.hny
$ harmony -B rw.hfa -cNOPS=2 -m rwlock=rwlock_cheat code/rwlock_btest.hny
The second command will print a warning that there are behaviors in the specification that are not achieved by the implementation.

Figure 12.5 illustrates an implementation of a reader/writer lock that uses active busy waiting. The solution is careful to access the shared variables only while holding a lock, and at the same time careful to release the lock regularly to allow other threads to change the state. Nonetheless, it is an undesirable solution, as it wastes CPU cycles, and Harmony complains about the active busy waiting.
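For comparison, the same busy-waiting approach might be transliterated to Python roughly as follows; this is a sketch mirroring Figure 12.5, not production code:

import threading

class BusyRWLock:
    def __init__(self):
        self.lock = threading.Lock()     # protects the two counters
        self.nreaders = 0
        self.nwriters = 0

    def read_acquire(self):
        while True:
            with self.lock:              # counters touched only while holding the lock
                if self.nwriters == 0:
                    self.nreaders += 1
                    return
            # lock released; retry (active busy waiting)

    def read_release(self):
        with self.lock:
            self.nreaders -= 1

    def write_acquire(self):
        while True:
            with self.lock:
                if self.nreaders == 0 and self.nwriters == 0:
                    self.nwriters = 1
                    return

    def write_release(self):
        with self.lock:
            self.nwriters = 0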

def BoundedBuffer(size) returns buffer:
    buffer = { .buffer: [], .size: size }

def put(bb, v):
    atomically when len(bb->buffer) < bb->size:
        bb->buffer += [v,]

def get(bb) returns next:
    atomically when bb->buffer != []:
        next = bb->buffer[0]
        del bb->buffer[0]

Figure 12.6. [code/boundedbuffer.hny] Bounded buffer specification

Section 12.2. Bounded Buffer

A bounded buffer is a queue with the usual put/get interface, but implemented using a buffer of a given maximum length. If the buffer is full, an enqueuer must wait; if the buffer is empty, a dequeuer must wait. Figure 12.6 specifies a bounded buffer. It is similar to the queue specification in Figure 9.1(b) but adds bounds checking. Coming up with a good implementation is known as the "Producer/Consumer Problem," first posed by Dijkstra [17]. Multiple producers and multiple consumers may all share the same bounded buffer.

The producer/consumer pattern is common. Threads may be arranged in pipelines, where each upstream thread is a producer and each downstream thread is a consumer. Or threads may be arranged in a manager/worker pattern, with a manager producing jobs and workers consuming and executing them in parallel. Or, in the client/server model, some thread may act as a server that clients can send requests to and receive responses from. In that case, there is a bounded buffer for each client/server pair. Clients produce requests and consume responses, while the server consumes requests and produces responses.

Unlike an ordinary queue, which can grow arbitrarily, bounded buffers provide flow control: if the consumer runs faster than the producer (or producers), it automatically blocks until there are new requests. Similarly, if the producers add requests at a higher rate than the consumers can handle, the producers are blocked. While a buffer of size 1 already provides these properties, a larger buffer can absorb short load spikes without blocking producers.
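In Python, the standard library's queue.Queue already behaves as a bounded buffer when given a maxsize: put blocks while the buffer is full, and get blocks while it is empty. A minimal producer/consumer sketch:

import queue, threading

bb = queue.Queue(maxsize=2)        # bounded buffer with room for two items

def producer():
    for v in range(5):
        bb.put(v)                  # blocks while the buffer is full

def consumer():
    for _ in range(5):
        print("got", bb.get())     # blocks while the buffer is empty

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()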

Exercises

Chapter 13. Condition Variables

The last chapter introduced conditional waiting, but the implementations provided were unsatisfactory, either because they unnecessarily restricted behaviors or because they were inefficient. Condition variables are a synchronization primitive designed to allow for general and efficient solutions to conditional waiting problems.

In the late 70s, researchers at Xerox PARC, where among other things the personal workstation and Ethernet were invented, developed a new programming language called Mesa [31]. Mesa introduced various important concepts to programming languages, including software exceptions and incremental compilation. The Mesa language also incorporated a version of condition variables. There are two main classes of condition variable semantics: Appendix I describes Hoare condition variables, but most programming languages today support the semantics of Mesa's condition variables. Below, we will use the semantics that Mesa provides.

A condition variable works in conjunction with a lock, often called mutex (for mutual exclusion) in this context. A thread p that holds the mutex can invoke the wait operation on a condition variable. This temporarily releases the mutex and places thread p on a queue associated with the condition variable so p no longer executes. Another thread can notify the condition variable. If the condition variable's queue is non-empty, this operation removes a thread q from the queue (according to some scheduling policy) and makes q runnable again. Thread q first re-acquires the mutex before resuming from the wait operation it invoked. If, on the other hand, the condition variable's queue is empty, the notify operation is a no-op.

Condition variables also allow notifying multiple threads. For example, a thread can invoke notify twice---if two or more threads are waiting on the condition variable, two will be resumed. Operation notifyAll (aka broadcast) notifies all threads that are waiting on the condition.

from synch import *

def RWlock() returns lock:
    lock = {
            .nreaders: 0, .nwriters: 0, .mutex: Lock(),
            .r_cond: Condition(), .w_cond: Condition()
        }
    
def read_acquire(rw):
    acquire(?rw->mutex)
    while rw->nwriters > 0:
        wait(?rw->r_cond, ?rw->mutex)
    rw->nreaders += 1
    release(?rw->mutex)

def read_release(rw):
    acquire(?rw->mutex)
    rw->nreaders -= 1
    if rw->nreaders == 0:
        notify(?rw->w_cond)
    release(?rw->mutex)

def write_acquire(rw):
    acquire(?rw->mutex)
    while (rw->nreaders > 0) or (rw->nwriters > 0):
        wait(?rw->w_cond, ?rw->mutex)
    rw->nwriters = 1
    release(?rw->mutex)

def write_release(rw):
    acquire(?rw->mutex)
    rw->nwriters = 0
    notifyAll(?rw->r_cond)
    notify(?rw->w_cond)
    release(?rw->mutex)

Figure 13.1. [code/rwlock_cv.hny] Reader/Writer Lock using Mesa-style condition variables

We demonstrate Mesa condition variables using an implementation of reader/writer locks. Figure 13.1 shows the code, which, the reader should note, is similar to the busy waiting solution in Figure 12.5. mutex is the shared lock that protects the critical region. There are two condition variables: readers wait on r_cond and writers wait on w_cond. The implementation also keeps track of the number of readers and writers in the critical section.

Note that wait is always invoked within a while loop whose condition is the negation of the condition that the thread is waiting for. It is imperative that such a while loop surrounds every invocation of wait. Many implementations of Mesa condition variables depend on this, and optimized implementations often allow so-called "spurious wakeups," where wait may sometimes return even if the condition variable has not been notified. As a rule of thumb, one should always be able to replace wait by release followed by acquire. This turns the solution into a busy-waiting one: inefficient, but still correct.

In read_release, notify(?w_cond) is invoked when there are no readers left, without checking if there are writers waiting to enter. This is ok, because calling notify is a no-op if no thread is waiting.

write_release executes notifyAll(?r_cond) as well as notify(?w_cond). Because we do not keep track of the number of waiting readers or writers, we have to conservatively assume that all waiting readers can enter, or, alternatively, up to one waiting writer can enter. Awakening both readers and writers is ok because both execute wait within a while loop, re-checking the condition that they are waiting for. So, if both types of threads are waiting, either all the readers get to enter next or one of the writers gets to enter next. (If you want to prevent waking up both readers and a writer, then you can keep track of how many threads are waiting for each condition.)

When using condition variables, you have to be careful to invoke notify or notifyAll in the right places. Much of the complexity of programming with condition variables is in figuring out when to invoke notify and when to invoke notifyAll. As a rule of thumb: be conservative---it is better to wake up too many threads than too few. In case of doubt, use notifyAll. For example, in Figure 13.1 it is ok, if inefficient, to replace the notify operations with notifyAll, but it would be incorrect to replace the notifyAll operation by notify. Waking up too many threads may lead to some inefficiency, but waking up too few may cause the application to get stuck. Harmony can be particularly helpful here, as it examines each and every corner case. You can try to replace each notifyAll with notify and see if every possible execution of the application still terminates.

Andrew Birrell's paper on Programming with Threads gives an excellent introduction to working with Mesa-style condition variables [5]. They have been adopted by all major programming languages. In Java, each object has a hidden lock and a hidden condition variable associated with it. Methods declared with the synchronized keyword automatically obtain the lock. Java objects also support wait, notify, and notifyAll. In addition, Java supports explicit allocations of locks and condition variables. In Python, locks and condition variables must be explicitly declared. The with statement makes it easy to acquire and release a lock for a section of code. In C and C++, support for locks and condition variables is entirely through libraries.
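For example, a blocking get on a shared queue might look as follows in Python (a sketch; note the with statement and the while loop around wait, per the rule above):

import threading, collections

mutex = threading.Lock()
nonempty = threading.Condition(mutex)   # condition variable tied to mutex
items = collections.deque()

def put(v):
    with mutex:                  # 'with' acquires and releases the mutex
        items.append(v)
        nonempty.notify()        # wake up one waiting thread, if any

def get():
    with mutex:
        while not items:         # always re-check the condition
            nonempty.wait()      # releases mutex while waiting, re-acquires after
        return items.popleft()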

def Condition() returns condition:
    condition = bag.empty()

def wait(c, lk):
    var cnt = 0
    let _, ctx = save():
        atomically:
            cnt = bag.multiplicity(!c, ctx)
            !c = bag.add(!c, ctx)
            !lk = False
        atomically when (not !lk) and (bag.multiplicity(!c, ctx) <= cnt):
            !lk = True

def notify(c):
    atomically if !c != bag.empty():
        !c = bag.remove(!c, bag.bchoose(!c))
        
def notifyAll(c):
    !c = bag.empty()

Figure 13.2. [modules/synch.hny] Implementation of condition variables in the synch module

Harmony provides support for Mesa monitors through the Harmony synch module. Figure 13.2 shows the implementation of condition variables in the synch module. Condition() creates a new condition variable, represented by a bag of contexts of the threads waiting on the condition variable. In Harmony, a bag is usually represented by a dictionary that maps the elements of the bag to their multiplicities. For example, the value { .a: 2, .b: 3 } represents a bag with two copies of .a and three copies of .b. The bag module (Section B.3) contains a variety of handy functions on bags.
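For readers who prefer Python, such a bag corresponds roughly to a collections.Counter; a small sketch of the operations used below (the correspondence to the bag module's functions is ours):

from collections import Counter

b = Counter()                    # the empty bag, like bag.empty()
b["a"] += 2                      # add two copies of "a", like bag.add()
b["b"] += 3
print(b["a"])                    # multiplicity of "a" (2), like bag.multiplicity()
b["b"] -= 1                      # remove one copy of "b", like bag.remove()
print(b == Counter(a=2, b=2))    # prints True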

Method wait adds the context of the thread---used as a unique identifier for the thread---to the bag, incrementing the number of threads in the bag with the same context. The Harmony save expression (Section C.3) returns a tuple containing a value (in this case ()) and the context of the thread. wait then loops until that count is restored to the value that it had upon entry to wait. Method notify removes an arbitrary context from the bag, allowing one of the threads with that context to resume and re-acquire the lock associated with the monitor. notifyAll empties out the entire bag, allowing all threads in the bag to resume.

Exercises

const N = 10

availGPUs = {1..N}

def gpuAlloc() returns gpu:
    gpu = choose(availGPUs)
    availGPUs -= { gpu }

def gpuRelease(gpu):
    availGPUs |= { gpu }

Figure 13.3. [code/gpu.hny] A thread-unsafe GPU allocator

def Qsort(arr) returns state:
    state = { .arr: arr, .todo: { (0, len(arr) - 1) } }

def swap(p, q):               # swap !p and !q
    !p, !q = !q, !p

def partition(qs, lo, hi) returns pivot:
    pivot = lo
    for i in {lo..hi - 1}:
        if qs->arr[i] <= qs->arr[hi]:
            swap(?qs->arr[pivot], ?qs->arr[i])
            pivot += 1
    swap(?qs->arr[pivot], ?qs->arr[hi])

def sortrange(qs, range):
    let lo, hi = range let pivot = partition(qs, lo, hi):
        if (pivot - 1) > lo:
            qs->todo |= { (lo, pivot - 1) }
        if (pivot + 1) < hi:
            qs->todo |= { (pivot + 1, hi) }

def sort(qs) returns sorted:
    while qs->todo != {}:
        let range = choose(qs->todo):
            qs->todo -= { range }
            sortrange(qs, range)
    sorted = qs->arr

Figure 13.4. [code/qsort.hny] Iterative qsort() implementation

import qsort, bag

const NITEMS = 4

a = [ choose({1..NITEMS}) for i in {1..choose({1..NITEMS})} ]
testqs = qsort.Qsort(a)
sa = qsort.sort(?testqs)
assert all(sa[i - 1] <= sa[i] for i in {1..len(sa)-1}) # sorted?
assert bag.fromList(a) == bag.fromList(sa) # is it a permutation?

Figure 13.5. [code/qsorttest.hny] Test program for Figure 13.4

Chapter 14. Starvation

from synch import *

def RWlock() returns lock:
    lock = {
            .nreaders: 0, .nwriters: 0, .mutex: Lock(),
            .r_waiting: 0, .r_cleared: 0, .w_waiting: 0, .w_cleared: 0,
            .r_cond: Condition(), .w_cond: Condition()
        }
    
def read_acquire(rw):
    acquire(?rw->mutex)
    if (rw->nwriters == 0) and (rw->w_waiting == 0):
        assert rw->r_waiting == 0
        rw->nreaders += 1
    else:
        rw->r_waiting += 1
        while rw->r_cleared == 0:
            wait(?rw->r_cond, ?rw->mutex)
        rw->r_cleared -= 1
    assert rw->nreaders > 0
    assert rw->nwriters == 0
    release(?rw->mutex)

def read_release(rw):
    acquire(?rw->mutex)
    assert rw->nreaders > 0
    assert rw->nwriters == 0
    rw->nreaders -= 1
    if (rw->nreaders == 0) and (rw->w_waiting > 0):
        rw->w_cleared = rw->nwriters = 1
        rw->w_waiting -= 1
        notify(?rw->w_cond)
    release(?rw->mutex)

Figure 14.1. [code/rwlock_cv_fair.hny] Reader/Writer Lock implementation addressing fairness (part 1)

def write_acquire(rw):
    acquire(?rw->mutex)
    if (rw->nreaders == 0) and (rw->nwriters == 0):
        assert rw->r_waiting == rw->w_waiting == 0
        rw->nwriters = 1
    else:
        rw->w_waiting += 1
        while rw->w_cleared == 0:
            wait(?rw->w_cond, ?rw->mutex)
        rw->w_cleared = 0
    assert rw->nreaders == 0
    assert rw->nwriters == 1
    release(?rw->mutex)

def write_release(rw):
    acquire(?rw->mutex)
    assert rw->nreaders == 0
    assert rw->nwriters == 1
    if rw->r_waiting > 0:
        rw->nwriters = 0
        rw->r_cleared = rw->nreaders = rw->r_waiting
        rw->r_waiting = 0
        notifyAll(?rw->r_cond)
    elif rw->w_waiting > 0:
        rw->w_waiting -= 1
        rw->w_cleared = 1
        notify(?rw->w_cond)
    else:
        rw->nwriters = 0
    release(?rw->mutex)

Figure 14.2. [code/rwlock_cv_fair.hny] Reader/Writer Lock implementation addressing fairness (part 2)

A property is a set of traces. If a program has a certain property, that means that the traces that that program allows are a subset of the traces in the property. So far, we have pursued two properties: mutual exclusion and progress. The former is an example of a safety property---it prevents something "bad" from happening, like a reader and writer thread both acquiring a reader/writer lock. The progress property is an example of a liveness property---guaranteeing that something good eventually happens. Informally (and inexactly), progress states that if no threads are in the critical section, then some thread that wants to enter can.

Progress says that some thread can eventually enter, but it does not rule out scenarios in which a particular thread is never able to. For example, suppose three threads repeatedly try to enter a critical section using a spinlock. Two of the threads successfully keep entering, alternating, but the third thread never gets a turn. This is an example of starvation. With a spinlock, this scenario can even happen with two threads. Initially both threads try to acquire the spinlock. One of the threads succeeds and enters. After that thread leaves, it immediately tries to re-enter. This state is identical to the initial state, and nothing prevents the same thread from acquiring the lock yet again.

Peterson's Algorithm (Figure 5.6) does not suffer from starvation, thanks to the turn variable that alternates between 0 and 1 when two threads are contending for the critical section. Ticket locks (Figure 8.3) are also free from starvation.

While spinlocks allow starvation, lock acquisition is a uniformly random process in which each thread has an equal chance of entering the critical section. Thus the probability that some thread is starved for k consecutive rounds vanishes exponentially in k. We shall call such a solution fair (although it does not quite match the usual formal or vernacular concepts of fairness).

Unfortunately, such is not the case for the reader/writer solution that we presented in Figure 13.1. Consider this scenario: there are two readers and one writer. One reader is in the critical section while the writer is waiting. Now the second reader tries to enter and is able to. The first reader leaves. We are now in a situation similar to the initial state, with one reader in the critical section and the writer waiting, but it is not the same reader. Unfortunately for the writer, this scenario can repeat itself indefinitely. So, even though neither reader was in the critical section the whole time, and the second reader arrived well after the writer, the writer never had a chance.

Figure 14.1 and Figure 14.2 present a fair implementation of a reader/writer lock. When there is contention between readers and writers, readers and writers end up alternating in entering the critical section. While readers can still starve other readers and writers can still starve other writers, readers can no longer starve writers nor vice versa. Any remaining fairness depends on the fairness of scheduling the threads themselves.

Besides the number of readers and writers in the critical section, the implementation keeps track of the number of readers and writers that are waiting to enter the critical section, and the number of readers and writers that are cleared to enter the critical section.

Starting with read_acquire in Figure 14.1, the thread first checks to see if there are no writers in the critical section and there are no writers waiting to enter the critical section (Line 14). If so, the thread can enter the critical section. Otherwise it waits until it is cleared to enter the critical section. read_release checks to see if the thread is the last one to leave the critical section and there are writers waiting to enter. If so, it clears one of the writers.

write_acquire in Figure 14.2 first checks to see that there are no other threads in the critical section. If so, the thread enters the critical section. Otherwise, the thread waits until it is cleared to enter. write_release first checks to see if there are readers waiting to enter. If so, it clears all the readers. If not, it checks to see if there are writers waiting to enter. If so, it clears one of the writers.

Exercises

Chapter 15. Deadlock

from synch import Lock, acquire, release

const N = 5

forks = [Lock(),] * N

def diner(which):
    let left, right = (which, (which + 1) % N):
        while choose({ False, True }):
            acquire(?forks[left])
            acquire(?forks[right])
            # dine
            release(?forks[left])
            release(?forks[right])
            # think

for i in {0..N-1}:
    spawn diner(i)

Figure 15.1. [code/Diners.hny] Dining Philosophers

When multiple threads are synchronizing access to shared resources, they may end up in a deadlock situation, in which one or more of the threads are blocked indefinitely because each is waiting for another to give up a resource. The famous Dutch computer scientist Edsger W. Dijkstra illustrated this using a scenario he called "Dining Philosophers."

Imagine five philosophers sitting around a table, each with a plate of food in front of them and a fork between every two plates. Each philosopher requires two forks to eat. To start eating, a philosopher first picks up the fork on the left, then the fork on the right. Each philosopher likes to take breaks from eating to think for a while. To do so, the philosopher puts down both forks. Each philosopher repeats this procedure. Dijkstra had them repeat this forever, but for the purposes of this book, philosophers can---if they wish---leave the table when they are not using any forks.

Figure 15.1 implements the dining philosophers in Harmony, using a thread for each philosopher and a lock for each fork. If you run it, Harmony complains that the execution may not be able to terminate, with all five threads being blocked trying to acquire the lock.

  • Do you see what the problem is?
  • Does it depend on N, the number of philosophers?
import synch

const N = 5

mutex = synch.Lock()
forks = [False,] * N
conds = [synch.Condition(),] * N

def diner(which):
    let left, right = (which, (which + 1) % N):
        while choose({ False, True }):
            synch.acquire(?mutex)
            while forks[left] or forks[right]:
                if forks[left]:
                    synch.wait(?conds[left], ?mutex)
                if forks[right]:
                    synch.wait(?conds[right], ?mutex)
            assert not (forks[left] or forks[right])
            forks[left] = forks[right] = True
            synch.release(?mutex)
            # dine
            synch.acquire(?mutex)
            forks[left] = forks[right] = False
            synch.notify(?conds[left])
            synch.notify(?conds[right])
            synch.release(?mutex)
            # think

for i in {0..N-1}:
    spawn diner(i)

Figure 15.2. [code/DinersCV.hny] Dining Philosophers that grab both forks at the same time

There are four conditions that must hold for deadlock to occur, sometimes known as the Coffman Conditions [9]:
  1. Mutual Exclusion: each resource can only be used by one thread at a time;
  2. Hold and Wait: each thread holds resources it already allocated while it waits for other resources that it needs;
  3. No Preemption: resources cannot be forcibly taken away from threads that allocated them;
  4. Circular Wait: there exists a directed circular chain of threads, each waiting to allocate a resource held by the next.
Preventing deadlock thus means preventing one of these conditions from occurring. However, mutual exclusion is not easily prevented in general, because many resources allow only one thread to use them at a time (although for some resources it is possible, as demonstrated in Chapter 21). Havender proposed techniques that avoid each of the remaining three conditions [21].

To implement a No Hold and Wait solution, a philosopher would need a way to lock both the left and right forks at the same time. Locks do not have such an ability, so we re-implement the Dining Philosophers using condition variables that allow one to wait for arbitrary application-specific conditions. Figure 15.2 demonstrates how this might be done. We use a single mutex for the diners, and, for each fork, a boolean and a condition variable. The boolean indicates if the fork has been taken. Each diner waits if either the left or right fork is already taken. But which condition variable to wait on? The code demonstrates an important technique to use when waiting for multiple conditions. The condition in the while statement is the negation of the condition that the diner is waiting for and consists of two disjuncts. Within the while statement, there is an if statement for each disjunct. The code waits for either or both forks if necessary. After that, it goes back to the top of the while loop.

A common mistake is to write the following code instead:

while forks[left]:
    synch.wait(?conds[left], ?mutex)
while forks[right]:
    synch.wait(?conds[right], ?mutex)
  • Can you see why this does not work? What can go wrong?
  • Run it through Harmony in case you are not sure!
The Preemption approach suggested by Havender is to allow threads to back out. While this could be done, it invariably leads to a busy waiting solution in which a thread keeps obtaining locks and releasing them again until it is finally able to get all of them.

The No Circular Waiting approach is to prevent a cycle from forming in which each thread is waiting for the next thread on the cycle. We can do this by establishing an ordering among the resources (in this case the forks) and, when needing more than one resource, always acquiring them in that order. In the case of the philosophers, they could prevent deadlock by always picking up the lower-numbered fork before the higher-numbered fork, like so:

if left < right:
    synch.acquire(?forks[left])
    synch.acquire(?forks[right])
else:
    synch.acquire(?forks[right])
    synch.acquire(?forks[left])
or, equivalently, like so:

synch.acquire(?forks[min(left, right)])
synch.acquire(?forks[max(left, right)])
This completes all the Havender methods. There is, however, another approach, which is sometimes called deadlock avoidance instead of deadlock prevention. In the case of the Dining Philosophers, we want to avoid the situation where each diner picks up a fork. If we can prevent more than four diners from starting to eat at the same time, then we can avoid the conditions for deadlock from ever happening. Figure 15.3 demonstrates this concept. It uses a counting semaphore to restrict the number of diners at any time to four. A counting semaphore is a resource that can be procured a given number of times. It is supported by the synch module. The P or "procure" operation acquires a counting semaphore, blocking if too many threads have procured it already. The V or "vacate" operation releases the semaphore.

This avoidance technique can be generalized using something called the Banker's Algorithm [14], but that is outside the scope of this book. The problem with these kinds of schemes is that one needs to know ahead of time the set of threads and the maximum number of resources each thread might allocate, which usually makes them impractical.

from synch import *

const N = 5

forks = [Lock(),] * N
sema = Semaphore(N - 1)     # can be procured up to N-1 times

def diner(which):
    let left, right = (which, (which + 1) % N):
        while choose({ False, True }):
            P(?sema)                # procure counting semaphore
            acquire(?forks[left])
            acquire(?forks[right])
            # dine
            release(?forks[left])
            release(?forks[right])
            V(?sema)                # vacate counting semaphore
            # think

for i in {0..N-1}:
    spawn diner(i)

Figure 15.3. [code/DinersAvoid.hny] Dining Philosophers solutions that avoids getting into a deadlock by allowing at most N-1 philosophers to start eating at a time
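The same avoidance idea can be sketched in Python, using threading.BoundedSemaphore to admit at most N-1 diners at a time (an illustration mirroring Figure 15.3, not the figure's actual code):

import threading

N = 5
forks = [threading.Lock() for _ in range(N)]
sema = threading.BoundedSemaphore(N - 1)   # at most N-1 diners at a time

def diner(which):
    left, right = which, (which + 1) % N
    with sema:                   # procure (P) on entry, vacate (V) on exit
        with forks[left]:
            with forks[right]:
                pass             # dine

threads = [threading.Thread(target=diner, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()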

Exercises

from synch import Lock, acquire, release

const MAX_BALANCE = 2
const N_ACCOUNTS = 2
const N_THREADS = 2

accounts = [ { .lock: Lock(), .balance: choose({0..MAX_BALANCE})}
                            for i in {1..N_ACCOUNTS} ]

def transfer(a1, a2, amount) returns success:
    acquire(?accounts[a1].lock)
    if amount <= accounts[a1].balance:
        accounts[a1].balance -= amount 
        acquire(?accounts[a2].lock)
        accounts[a2].balance += amount 
        release(?accounts[a2].lock)
        success = True
    else:
        success = False
    release(?accounts[a1].lock)

def thread():
    let a1 = choose({0..N_ACCOUNTS-1})
    let a2 = choose({0..N_ACCOUNTS-1} - { a1 }):
        transfer(a1, a2, choose({1..MAX_BALANCE}))

for i in {1..N_THREADS}:
    spawn thread()

Figure 15.4. [code/bank.hny] Bank accounts

Chapter 16. Actors and Message Passing

Figure 16.1. Depiction of three actors. The producer does not receive messages.

import synch

const NCLIENTS = 3

server_queue = synch.Queue()

def server():
    var counter = 0
    while True:
        let q = synch.get(?server_queue):   # await request
            synch.put(q, counter)           # send response
            counter += 1

def client(client_queue):
    synch.put(?server_queue, client_queue)      # send request
    let response = synch.get(client_queue):     # await response
        print(response)

spawn eternal server()

alice_queue = synch.Queue()
spawn client(?alice_queue)
bob_queue = synch.Queue()
spawn client(?bob_queue)
charlie_queue = synch.Queue()
spawn client(?charlie_queue)

Figure 16.2. [code/counter.hny] An illustration of the actor approach

So far we have focused on using critical sections to synchronize threads that access shared data structures. Some programming languages favor a different approach, using so-called actors [23, 1]. Actors are threads that have only private memory and communicate through message passing. See Figure 16.1 for an illustration. Given that there is no shared memory in the actor model (other than the message queues, which have built-in synchronization), there is no need for critical sections. Instead, some sequential thread owns a particular piece of data, and other threads access it by sending request messages to that thread and optionally waiting for response messages. Each thread handles one message at a time, serializing all access to the data it owns. As message queues are FIFO (First-In-First-Out), starvation is prevented as well.

The actor synchronization model is popular in a variety of programming languages, including Erlang and Scala. Actor support is also available through popular libraries such as Akka, which is available for various programming languages. In Python, Java, and C/C++, actors can be easily emulated using threads and synchronized queues (aka blocking queues) for messaging. Each thread would have one such queue for receiving messages. Dequeuing from an empty synchronized queue blocks the thread until another thread enqueues a message on the queue.
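For instance, the counter server of Figure 16.2 can be emulated in Python roughly as follows (a sketch; the daemon flag plays the role of spawn eternal):

import queue, threading

server_queue = queue.Queue()

def server():
    counter = 0
    while True:
        q = server_queue.get()       # await request: the client's reply queue
        q.put(counter)               # send response
        counter += 1

def client(client_queue):
    server_queue.put(client_queue)   # send request
    print(client_queue.get())        # await response

threading.Thread(target=server, daemon=True).start()
for _ in range(3):
    threading.Thread(target=client, args=(queue.Queue(),)).start()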

The synch library supports a synchronized message queue, similar to the Queue object in Python. Its interface is as follows:

  • Queue() returns a new synchronized queue;
  • put(q, item) adds item to the queue pointed to by q;
  • get(q) waits until the queue pointed to by q is non-empty, then removes and returns the item at its head.

For those familiar with counting semaphores: note that a Queue behaves much like a zero-initialized counting semaphore. put is much like V, except that it is accompanied by data. get is much like P, except that it also returns data. Thus, synchronized queues can be considered a generalization of counting semaphores.
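To drive the analogy home, here is a sketch of a counting semaphore built out of a synchronized queue in Python, where each queued token represents one unit of the resource (the class name is ours):

import queue

class QueueSemaphore:
    def __init__(self, initial=0):
        self.tokens = queue.Queue()
        for _ in range(initial):
            self.tokens.put(None)    # pre-load tokens

    def V(self):
        self.tokens.put(None)        # put is V, with irrelevant data

    def P(self):
        self.tokens.get()            # get is P: blocks until a token arrives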

Figure 16.2 illustrates the actor approach. There are three client threads that each want to be assigned a unique identifier from the set { 0, 1, 2 }. Normally one would use a shared 0-initialized counter and a lock. Each client would acquire the lock, get the value of the counter and increment it, and release the lock. Instead, in the actor approach the counter is managed by a separate server thread. The server never terminates, so it is spawned with the keyword eternal to suppress non-terminating state warnings. Each client sends a request to the server, consisting in this case of simply the queue to which the server must send the response. The server maintains a local, zero-initialized counter variable. Upon receiving a request, it returns a response with the value of the counter and increments the counter. No lock is required.

This illustration is an example of the client/server model. Here a single actor implements some service, and clients send request messages and receive response messages. The model is particularly popular in distributed systems, where each actor runs on a separate machine and the queues are message channels. For example, the server can be a web server, and its clients are web browsers.

Exercises

Chapter 17. Barrier Synchronization

import barrier

const NTHREADS = 3
const NROUNDS = 4

barr = barrier.Barrier(NTHREADS)
round = [0,] * NTHREADS
in_b = [False,] * NTHREADS

invariant all((in_b[i] and in_b[j]) => (round[i] == round[j])
        for i in { 0 .. NTHREADS - 1 } for j in { 0 .. NTHREADS - 1 })

def thread(self):
    for r in {0..NROUNDS-1}:
        round[self] += 1
        barrier.bwait(?barr)
        in_b[self] = True
        pass
        in_b[self] = False

for i in {0..NTHREADS-1}:
    spawn thread(i)

Figure 17.1. [code/barrier_test1.hny] Test program for Figure 17.2

from synch import *

def Barrier(required) returns barrier:
    barrier = {
        .mutex: Lock(), .cond: Condition(),
        .required: required, .left: required, .cycle: 0
    }

def bwait(b):
    acquire(?b->mutex)
    b->left -= 1
    if b->left == 0:
        b->cycle = (b->cycle + 1) % 2
        b->left = b->required
        notifyAll(?b->cond)
    else:
        let cycle = b->cycle:
            while b->cycle == cycle:
                wait(?b->cond, ?b->mutex)
    release(?b->mutex)

Figure 17.2. [code/barrier.hny] Barrier implementation

import barrier

const NTHREADS = 3
const NROUNDS = 4

barr = barrier.Barrier(NTHREADS)
round = [0,] * NTHREADS
phase = 0

def thread(self):
    for r in {0..NROUNDS-1}:
        round[self] += 1
        if self == 0:                # coordinator prepares
            phase += 1
        barrier.bwait(?barr)         # enter parallel work
        assert round[self] == phase
        barrier.bwait(?barr)         # exit parallel work

for i in {0..NTHREADS-1}:
    spawn thread(i)

Figure 17.3. [code/barrier_test2.hny] Demonstrating the double-barrier pattern

from barrier import *

const N = 5     # size of list to be sorted

list = [ choose({ 1 .. N }) for i in { 1 .. N } ]

finally all(list[i-1] <= list[i] for i in { 1 .. N - 1 })

const NTHREADS = N / 2
bar = Barrier(NTHREADS)
count = 0                   # to detect termination

def fetch_and_increment(p): # atomic increment
    atomically !p += 1

def sorter(i):
    var sorted = False
    var oldcount = 0
    while not sorted:
        # Even phase
        if list[i - 1] > list[i]:
            list[i - 1], list[i] = list[i], list[i - 1]
            fetch_and_increment(?count)

        bwait(?bar)

        # Odd phase
        if (i < (N - 1)) and (list[i] > list[i + 1]):
            list[i], list[i + 1] = list[i + 1], list[i]
            fetch_and_increment(?count)

        bwait(?bar)

        # Sorted if nobody swapped anything
        sorted = count == oldcount
        oldcount = count

        bwait(?bar)

for k in { 0 .. NTHREADS - 1 }:
    spawn sorter((2*k) + 1)

Figure 17.4. [code/bsort.hny] Parallel bubble sort

Barrier synchronization is a problem that comes up in high-performance parallel computing. The Harmony model checker uses it internally to synchronize its worker threads. A barrier is almost the opposite of a critical section: the intention is to get a group of threads to run some code at the same time, instead of having them execute it one at a time. More precisely, with barrier synchronization, the threads execute in rounds. Between each round, there is a so-called barrier where threads wait until all threads have completed the previous round and reached the barrier---before they start the next round. For example, in an iterative matrix algorithm, the matrix may be cut up into fragments. During a round, the threads run concurrently, one for each fragment. The next round is not allowed to start until all threads have completed processing their fragment.

A barrier is used as follows: b = barrier.Barrier(n) initializes a barrier b for n threads, and each thread calls barrier.bwait(?b) when it reaches the barrier, blocking until all n threads have done so (see Figure 17.1 for an example).
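Python offers a comparable primitive, threading.Barrier; a minimal sketch of the same usage pattern:

import threading

NTHREADS = 3
barr = threading.Barrier(NTHREADS)   # reusable barrier for three threads

def thread(self_id):
    for r in range(4):               # four rounds
        # ... work on this round's fragment ...
        barr.wait()                  # block until all threads reach the barrier

for i in range(NTHREADS):
    threading.Thread(target=thread, args=(i,)).start()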

Figure 17.1 is a test program for barriers. It uses an integer array round with one entry per thread, all initialized to 0. It also uses a boolean array in_b with one entry per thread. Each thread, in a loop, increments its round number and then sets its in_b entry to True. The pass statement models the code that each thread must execute after passing the barrier. After this statement, each thread sets its in_b entry back to False. If bwait() is implemented correctly, then whenever two threads are executing the pass statement, they should have the same round number. This is captured by the invariant in the code.

When implementing a barrier, a complication to worry about is that a barrier can be used over and over again. If this were not the case, then a solution based on a lock, a condition variable, and a counter initialized to the number of threads could be used. The threads would decrement the counter and wait on the condition variable until the counter reaches 0.
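Such a single-use barrier might look like this in Python (a sketch; as explained next, it cannot safely be reused):

import threading

class OneShotBarrier:
    def __init__(self, n):
        self.mutex = threading.Lock()
        self.cond = threading.Condition(self.mutex)
        self.left = n                    # threads that still must arrive

    def bwait(self):
        with self.mutex:
            self.left -= 1
            if self.left == 0:
                self.cond.notify_all()   # last arrival wakes everyone
            while self.left > 0:         # wait wrapped in a while loop
                self.cond.wait()

Reuse fails because left never returns to n; the cycle counter in Figure 17.2 addresses exactly this complication.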

Figure 17.2 shows how one might implement a reusable barrier. Besides a counter .left that counts how many threads still have to reach the barrier, it uses a counter .cycle that is incremented after each use of the barrier---to deal with the complication above. The last thread that reaches the barrier restores .left to the number of threads (.required) and increments the cycle counter. The other threads are waiting for the cycle counter to be incremented. The cycle counter is allowed to wrap around---in fact, a single bit suffices for the counter.

A common design pattern with barriers in parallel programs, demonstrated in Figure 17.3, is to use the barrier twice in each round. Before a round starts, one of the threads---let's call it the coordinator---sets up the work that needs to be done while the other threads wait. Then all threads do the work and go on until they reach a second barrier. The second barrier is used so the coordinator can wait for all threads to be done before setting up the work for the next round.

Figure 17.4 shows an implementation of a parallel sorting algorithm based on bubblesort. The threads (one for every two elements) go through three phases. In the first phase, the threads swap entries 0 and 1, 2 and 3, ... as needed. In the second phase, they swap entries 1 and 2, 3 and 4, ... as needed. Finally, they check if any elements were swapped. If so, they repeat the phases.

Chapter 18. Advanced Barrier Synchronization

from synch import *

def RollerCoaster(nseats): result = {
    .mutex: Lock(), .nseats: nseats, .entered: 0, .left: nseats,
    .empty: Condition(), .full: Condition()
}

def enter(b):
    acquire(?b->mutex)
    while b->entered == b->nseats:  # wait for car to empty out
        wait(?b->empty, ?b->mutex)
    b->entered += 1
    if b->entered != b->nseats:     # wait for car to fill up
        while b->entered < b->nseats:
            wait(?b->full, ?b->mutex)
    else:                           # car is ready to go
        b->left = 0
        notifyAll(?b->full)         # wake up others waiting in car
    release(?b->mutex)

def exit(b):
    acquire(?b->mutex)
    b->left += 1
    if b->left == b->nseats:    # car is empty
        b->entered = 0
        notifyAll(?b->empty)    # wake up riders wanting to go
    release(?b->mutex)

Figure 18.1. [code/rollercoaster.hny] Modeling a roller coaster

In the previous chapter, we presented barrier synchronization for the case where the barrier waits for exactly as many threads as exist. However, when modeling certain scenarios there are often more threads than the barrier is waiting for. Consider, for example, the following problem taken from an exam:

Given is a roller coaster with a single car. Safety precautions require the following: each ride requires that all seats are filled. That is, partially filled cars are not allowed to ride. After a ride, the car must completely empty out before new riders are allowed to enter the car. Write code to model this.
In such problems there is a shared resource (in this case the car) that is used repeatedly in what we shall call rounds. In each round, the resource usually has to be used to capacity. The users (the riders in this case) can be modeled by threads. It is often useful to think of these problems as having a section that threads must enter and exit, much like a critical section. In this case, entering the section means going for a ride, and exiting the section means being done with the ride.

Entering a section consists of two phases. A thread should first wait until any threads that already entered the section have exited. Then, it should wait until conditions are met, in this case until there are enough users (threads) to fill up the car. Not following this recipe might lead to the following problem: as the car is emptying out, new riders may try to enter the car because it is no longer full.

In the first phase, a thread needs to be able to detect that all threads that have entered previously have now left. In the second phase, a thread should check to see if it is the last thread to enter the second phase. If not, it should wait. If so, it should notify the ones that are waiting. When leaving the section, a thread should check if it is the last thread to leave. If so, it should notify the threads that are waiting in the first phase.

Figure 18.1 presents a solution. Method RollerCoaster(nseats) returns the initial state of a roller coaster with nseats seats to a car. Method enter(b) takes a pointer b to a roller coaster variable. Following the recipe sketched above, a thread that calls enter(b) blocks until

  1. all previous riders have left the car, and
  2. the car has filled up again.
After "doing the ride," the thread must call method exit(b).

An important part of solving the problem is figuring out what state must be kept about the resource. The code keeps track of the number of threads that have entered the car (a count that includes threads that have since left the section) and the number of threads that have left the section. The first number is only reset after the last thread has left the section. The second number is reset after the last thread has entered the car.

In this exam question there is just a single instance of the resource, and all users are interchangeable. The problem set below includes a problem with multiple resources and a problem in which not all users are the same. Nonetheless, a similar approach to the one illustrated above can be used.

Exercises

Chapter 19. Example: A Concurrent File Service

This chapter presents a concurrent file service to illustrate many of the techniques we have discussed inside a single example. We will cover the specification of such a service as well as that of a disk, and show how the specification can be implemented on top of the disk. The file service implementation will use a collection of worker threads synchronizing using both ordinary locks and reader/writer locks. Clients of the file service implementation (threads themselves) use blocking synchronized queues to communicate with the workers. The example will also illustrate modular model checking, as the disk, the locks, and the queues are only specified.

from alloc import malloc

def File(n_files) returns fs:
    fs = malloc([ [], ] * n_files)

def read(fs, ino, offset) returns result:
    atomically result = (!fs)[ino][offset] if 0 <= offset < len (!fs)[ino] else None

def write(fs, ino, offset, data) returns result:
    atomically:
        let n = len (!fs)[ino]:
            if 0 <= offset < n:
                (!fs)[ino][offset] = data
            else:
                (!fs)[ino] += ([ None, ] * (offset - n)) + [data,]
        result = "ok"
            

def delete(fs, ino) returns result:
    atomically:
        (!fs)[ino] = []
        result = "ok"

Figure 19.1. [code/file.hny] Specification of the file system

In practice, there are many aspects to a file system. We will focus here on a low-level notion of a file, where the file abstraction is identified by a number (the so-called "inode number" or ino) and consists of a sequence of fixed-size blocks. In our abstraction, each block holds an arbitrary Harmony value. To remain more truthful to reality, you could instead store only fixed-length lists of numbers in each block, representing blocks of bytes. A more complete file system would keep track of various additional information about each file, such as its size in bytes, its owner, its access rights, and when the file was last modified. Moreover, a system of folders (aka directories) built on top of the files would associate user-readable names with the files. Such abstractions can be modularly implemented on top.

Figure 19.1 shows the file system interface. Just like in Unix-like file systems, you have to specify the (maximum) number of files when you initialize the file system. File(n) returns a handle that must be passed to file operations, where n is the maximum number of files. For our example, we have only included three operations on files. read(fs, ino, offset) returns the block of file ino at the given offset, or None if nothing has been stored at that offset. write(fs, ino, offset, data) stores data at the given offset in file ino. If needed, the file is grown to include the given offset. "Holes" (unwritten blocks) are plugged with None values. delete(fs, ino) deletes a file.

from file import *

const N_FILES = 2
const MAX_FILE_SIZE = 2

const N_READ = 1
const N_WRITE = 1
const N_DELETE = 1

system = File(N_FILES)

def read_test(i):
    let ino = choose { 0 .. N_FILES - 1 }
    let offset = choose { 0 .. MAX_FILE_SIZE - 1 }:
        print(i, "read", ino, offset)
        let data = read(system, ino, offset):
            print(i, "read done", ino, offset, data)

def write_test(i):
    let ino = choose { 0 .. N_FILES - 1 }
    let offset = choose { 0 .. MAX_FILE_SIZE - 1 }:
        print(i, "write", ino, offset)
        write(system, ino, offset, i)
        print(i, "write done", ino, offset)

def delete_test(i):
    let ino = choose { 0 .. N_FILES - 1 }:
        print(i, "delete", ino)
        delete(system, ino)
        print(i, "delete done", ino)

for i in { 1 .. N_READ }:
    spawn read_test(i)
for i in { 1 .. N_WRITE }:
    spawn write_test(i)
for i in { 1 .. N_DELETE }:
    spawn delete_test(i)

Figure 19.2. [code/file_btest.hny] Test program for a concurrent file system

Figure 19.2 shows how the file system may be tested and illustrates how the file system interface is used. As shown in Chapter 10, we can test a concurrent system by checking all interleavings of some selection of its operations. We can do this for both the specification and implementation of the file system and check that every behavior of the implementation is also a behavior of the specification.

from alloc import malloc

def new(n_blocks) returns disk:
    disk = malloc([ None, ] * n_blocks)

def getsize(disk) returns size:
    size = len !disk

def read(disk, bno) returns contents:
    contents = (!disk)[bno]

def write(disk, bno, block):
    (!disk)[bno] = block

Figure 19.3. [code/disk.hny] Specification of a disk

To store the file system, we will use a disk. Like a file, a disk is an array of blocks, albeit one of fixed length. Figure 19.3 specifies a disk. The interface is similar to that of files, except that there are no inode numbers. Each block is identified by its offset, or block number. For example, read(disk, bno) retrieves the value of block bno on the given disk. Note that the operations are not atomic. For example, two threads concurrently writing the same block can result in chaos. It is up to the file system implementation to ensure that this does not happen. Of course, more than one thread can read the same block at the same time. This is only a specification of a disk---an implementation may want to include a cache of blocks for good performance. Certain operations may also be re-ordered to further improve performance.

For the implementation, we will use a simplified Unix file system. In a Unix file system, the disk is subdivided into three parts (Figure 19.4(a)):

  1. The superblock, at offset 0.
  2. An array of fixed-sized inodes, stored in a range of m disk blocks starting at block 1. An inode number indexes into this array. Each inode maintains some information about a file, including the size and where to find the data.
  3. The remaining blocks, starting at offset 1 + m. Each of these blocks either stores data, stores metadata, or is free. Metadata blocks contain a list of block numbers.
The superblock specifies the number of inode blocks and the start of a linked list of free blocks.

Figure 19.4. The file system data structure: (a) disk layout (1 superblock, n blocks, m inode blocks, 4 inodes per block); (b) inode for a file with 3 data blocks

In this simplified file system, each inode contains just three pieces of information about the file (Figure 19.4(b)): the size of the file in blocks, the block number of the first data block, and the block number of an indirect block---a metadata block that contains block numbers of additional data blocks. Any block number may be None to indicate a hole in the file (unused blocks). Note that a Unix file is essentially implemented as a tree of blocks.
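
For example, with INDIR_PER_BLOCK = 4, a file occupying three data blocks stored in (hypothetical) disk blocks 5, 8, and 9 could be represented by the inode { .direct: 5, .indir: 7, .size: 3 }, where metadata block 7 contains the list [ 8, 9, None, None ].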

Free blocks are maintained in a simple linked list. The superblock contains the block number of the first free block, and every block on the free list contains the block number of the next free block, or None for the end of the list. (In a more realistic Unix file system, each block on the free list would maintain pointers to additional free blocks.)

Note that the entire file system data structure is essentially a tree of blocks, with the superblock acting as the root. The superblock points to the free list and the inode blocks. The inode blocks point to the blocks that are allocated. An invariant of the data structure is that all blocks are in the tree and each block (except the superblock) is pointed to exactly once. The invariant may not hold while the data structure is being updated: for example, a block may temporarily be both on the free list and part of a file, or not be referenced at all.

Figure 19.5 shows the modules that the file system implementation will use and some constants. The implementation uses the actor model (Chapter 16)---the synch module provides blocking multi-reader/multi-writer queues that the actors will use for messaging. The file server itself is implemented as a multithreaded actor. The threads synchronize using a plain lock for the free list and reader/writer locks for each of the inode blocks. N_BLOCKS specifies the size of the disk to be used in blocks. INODES_PER_BLOCK specifies how many inodes fit in an inode block. INDIR_PER_BLOCK specifies how many block numbers fit in a metadata block. Note that the maximum file size in this simplified file system is 1 + INDIR_PER_BLOCK blocks. In a more realistic Unix file system, indirect blocks can point to other indirect blocks, allowing for much larger files. N_WORKERS specifies the number of worker threads or actors.

from synch import *             # shared queue for file server and lock for superblock
from rwlock import *            # read/write locks for inode blocks
from alloc import *             # malloc/free
from list import subseq         # list slicing
import disk                     # reading and writing blocks

const N_BLOCKS = 10             # total number of disk blocks
const INODES_PER_BLOCK = 2      # number of inodes that fit in a block
const INDIR_PER_BLOCK  = 4      # number of block pointers per block
const N_WORKERS = 2             # number of file server worker threads

Figure 19.5. [code/file_inode.hny] File system implementation preamble

def File(n_files) returns req_q:
    req_q = malloc(Queue())
    let n_inode_blocks = (n_files + (INODES_PER_BLOCK - 1)) / INODES_PER_BLOCK
    let n_workers = N_WORKERS
    let d = disk.new(N_BLOCKS):
        # Initialize the file system on disk
        fs_init(d, n_inode_blocks)

        # Allocate the in-memory shared state of the file server
        let fs_state = malloc({ .next: None,
                .disk: d, .req_q: req_q, .free_lock: Lock(),
                .n_inode_blocks: n_inode_blocks,
                .ib_locks: [ RWlock(), ] * n_inode_blocks }):

            # Start worker threads to handle client requests
            for i in { 1 .. n_workers }:
                spawn eternal fs_worker(fs_state, i)

def read(req_q, ino, offset) returns result:
    let res_q = malloc(Queue()):
        put(req_q, { .type: "read", .ino: ino, .offset: offset, .q: res_q })
        result = get(res_q)
        free(res_q)

def write(req_q, ino, offset, data) returns result:
    let res_q = malloc(Queue()):
        put(req_q, { .type: "write", .ino: ino, .offset: offset, .data: data, .q: res_q })
        result = get(res_q)
        free(res_q)

def delete(req_q, ino) returns result:
    let res_q = malloc(Queue()):
        put(req_q, { .type: "delete", .ino: ino, .q: res_q })
        result = get(res_q)
        free(res_q)

Figure 19.6. [code/file_inode.hny] File system interface implementation

Figure 19.6 shows the implementation of the file system interface, which is the same interface as the specification (Figure 19.1) but a different implementation. File(), instead of returning an object containing an array of files, returns an object containing a queue used to communicate with the file system worker threads. The first argument is the maximum number of files. The number of inode blocks can be computed from the number of files by dividing by INODES_PER_BLOCK and rounding up. The function also initializes a disk object using fs_init and then allocates some shared state to be used by the worker threads. The shared state includes the following information:

  1. disk: the handle of the underlying disk;
  2. req_q: the request queue shared with the clients;
  3. free_lock: a lock protecting the free list on disk;
  4. n_inode_blocks: the number of inode blocks;
  5. ib_locks: a list with one reader/writer lock per inode block.

Finally, File() spawns the fs_worker() threads that will handle requests.

The remaining interfaces simply put a request on the request queue and wait for a response on another queue res_q that is allocated just for this purpose. Note that the request queue has concurrent producers (the clients) and concurrent consumers (the worker threads). The response queues are single use only and have a single producer (a worker thread) and a single consumer (the client).

# A worker thread handles client requests
def fs_worker(fs_state_init, id):
    var fs_state = fs_state_init
    while True:
        let req = get(fs_state->req_q)
        let ib = req.ino / INODES_PER_BLOCK:
            if req.type in { "write", "delete" }:
                write_acquire(?fs_state->ib_locks[ib])
                fs_update_request(fs_state, id, req, ib)
                write_release(?fs_state->ib_locks[ib])
                put(req.q, "ok")
            else:
                assert req.type == "read"
                read_acquire(?fs_state->ib_locks[ib])
                let response = fs_read_request(fs_state, req, ib):
                    read_release(?fs_state->ib_locks[ib])
                    put(req.q, response)

Figure 19.7. [code/file_inode.hny] File server and worker threads

Each worker thread executes an infinite loop, obtaining client requests and handling them. Each request is for a particular inode. The worker first determines which inode block needs to be locked. Depending on the request, it obtains either a read lock or a write lock on that block. In practice, files are read much more frequently than written, so reader/writer locks can significantly improve the potential for concurrent access compared to regular locks. The requests themselves are handled in the fs_read_request() and fs_update_request() methods, which we describe below.

# Initialize the file system by writing the i-node blocks, the free list,
# and the superblock.
def fs_init(d, n_inode_blocks):
    # Initialize the i-node blocks
    for i in { 1 .. n_inode_blocks }:
        disk.write(d, i, [ { .direct: None, .indir: None, .size: 0 }, ] * INODES_PER_BLOCK)

    # Free the data blocks
    var free_list = None
    for i in { n_inode_blocks + 1 .. N_BLOCKS - 1 }:
        disk.write(d, i, free_list)
        free_list = i

    # Write the superblock
    disk.write(d, 0, { .n_inode_blocks: n_inode_blocks, .free_list: free_list })

Figure 19.8. [code/file_inode.hny] File system initialization

Figure 19.8 shows how the disk is initialized with a fresh file system. First, the inode blocks are initialized, each containing only empty files. Next, the remaining blocks are strung together in a linked list of free blocks. Finally, the superblock is written, containing the number of inode blocks and the head of the free list.

# Write to the disk
def do_write(fs_state, bno, block):
    disk.write(fs_state->disk, bno, block)

# Allocate a disk block.
def fs_alloc(fs_state) returns bno:
    acquire(?fs_state->free_lock)
    var super = disk.read(fs_state->disk, 0)
    bno = super.free_list
    if bno != None:
        super.free_list = disk.read(fs_state->disk, bno)
        do_write(fs_state, 0, super)
    release(?fs_state->free_lock)

# Free block bno.
def fs_free(fs_state, bno):
    acquire(?fs_state->free_lock)
    var super = disk.read(fs_state->disk, 0)
    do_write(fs_state, bno, super.free_list)
    super.free_list = bno
    do_write(fs_state, 0, super)
    release(?fs_state->free_lock)

# Read inode ino in block ib.
def fs_get_inode(fs_state, ib, ino) returns inode:
    let inode_block = disk.read(fs_state->disk, 1 + ib):
        inode = inode_block[ino % INODES_PER_BLOCK]

# Write inode ino in block ib.
def fs_put_inode(fs_state, ib, ino, inode):
    var inode_block = disk.read(fs_state->disk, 1 + ib)
    inode_block[ino % INODES_PER_BLOCK] = inode
    do_write(fs_state, 1 + ib, inode_block)

Figure 19.9. [code/file_inode.hny] File system utility functions

Figure 19.9 contains the code for allocating and freeing blocks. The methods first acquire the free list lock and then perform simple linked list operations. Block allocation could be made much more efficient if each worker thread maintained a small cache of free blocks that it could allocate from without having to coordinate with the other workers.
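
For example, one could imagine a variant along the following lines; fs_alloc_cached, pcache, and CACHE_SIZE are hypothetical and not part of the original code:

const CACHE_SIZE = 4        # assumed capacity of a worker's private cache

# Hypothetical: allocate from a private set of free blocks, refilling the
# set from the shared free list only when the set is empty
def fs_alloc_cached(fs_state, pcache) returns bno:
    if !pcache == {}:
        acquire(?fs_state->free_lock)
        var super = disk.read(fs_state->disk, 0)
        while (len(!pcache) < CACHE_SIZE) and (super.free_list != None):
            !pcache |= { super.free_list }
            super.free_list = disk.read(fs_state->disk, super.free_list)
        do_write(fs_state, 0, super)
        release(?fs_state->free_lock)
    if !pcache == {}:       # the disk is out of free blocks
        bno = None
    else:                   # allocate from the private cache
        bno = min(!pcache)
        !pcache -= { bno }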

# Handle a read request.  A read lock on i-node block ib has been acquired.
def fs_read_request(fs_state, req, ib) returns result:
    # Read the inode block and extract the inode
    let inode = fs_get_inode(fs_state, ib, req.ino):
        # Read the direct block.  Return None if there is no direct block.
        if req.offset == 0:
            if inode.direct == None:
                result = None
            else:
                result = disk.read(fs_state->disk, inode.direct)

        # Read indirectly.  If there is no indirect block return None
        elif inode.indir == None:
            result = None

        # Read the indirect block and get the pointer to the data block,
        # which may be None.
        else:
            let indir = disk.read(fs_state->disk, inode.indir):
                if indir[req.offset - 1] == None:
                    result = None
                else:
                    result = disk.read(fs_state->disk, indir[req.offset - 1])

Figure 19.10. [code/file_inode.hny] Handling of file read requests

Figure 19.10 shows the code for read-only operations on files, which at present is just reading a block from a file. The method first needs to read the block that contains the inode. Argument ib contains the index of the inode block, computed by dividing the inode number by INODES_PER_BLOCK; because block 0 holds the superblock, the inode block itself is disk block 1 + ib. The index of the inode within that block is the remainder of the same division. Handling of a read request depends on the offset. If the offset is 0, then the request accesses the data in the direct block. Otherwise, it is necessary to read the indirect block first. If any block number is None along the way, the response is None.
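
In code, this computation might be captured by a small helper along these lines (inode_location is hypothetical and not part of the original code):

# Hypothetical helper: maps an inode number to the disk block that holds
# its inode and the index of the inode within that block
def inode_location(ino) returns loc:
    loc = (1 + (ino / INODES_PER_BLOCK), ino % INODES_PER_BLOCK)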

# Handle an update request.  A write lock on i-node block ib has been acquired.
def fs_update_request(fs_state, id, req, ib):
    var inode = fs_get_inode(fs_state, ib, req.ino)
    if req.type == "write":
        if req.offset == 0:
            if inode.direct == None:
                inode.direct = fs_alloc(fs_state)
                inode.size = max(inode.size, 1)
                fs_put_inode(fs_state, ib, req.ino, inode)
            do_write(fs_state, inode.direct, req.data)
        else:
            if inode.indir == None:
                inode.indir = fs_alloc(fs_state)
                inode.size = max(inode.size, req.offset + 1)
                fs_put_inode(fs_state, ib, req.ino, inode)
                let bno = fs_alloc(fs_state)
                let indir = [ bno if i == (req.offset - 1) else None
                                for i in { 0 .. INDIR_PER_BLOCK - 1 } ]:
                    do_write(fs_state, bno, req.data)
                    do_write(fs_state, inode.indir, indir)
            else:
                var indir = disk.read(fs_state->disk, inode.indir)
                if indir[req.offset - 1] == None:
                    indir[req.offset - 1] = fs_alloc(fs_state)
                    do_write(fs_state, inode.indir, indir)
                do_write(fs_state, indir[req.offset - 1], req.data)
                if inode.size <= req.offset:
                    inode.size = req.offset + 1
                    fs_put_inode(fs_state, ib, req.ino, inode)
    else:
        assert req.type == "delete"
        if inode.direct != None:
            fs_free(fs_state, inode.direct)
        if inode.indir != None:
            let indir = disk.read(fs_state->disk, inode.indir):
                for bno in indir:
                    if bno != None:
                        fs_free(fs_state, bno)
        inode.direct = inode.indir = None
        inode.size = 0
        fs_put_inode(fs_state, ib, req.ino, inode)

Figure 19.11. [code/file_inode.hny] Handling of file write and delete requests

Finally, Figure 19.11 contains the code to write to a file or to delete a file. A write operation first checks whether it updates the direct block or a block accessible through the indirect block. If it is the direct block, the code checks whether that block has already been allocated. Otherwise, it must check whether the indirect block has been allocated, as well as the data block. Data blocks, the indirect block, and even the inode block may all have to be updated as part of this operation. Deleting a file puts all its blocks back on the free list and clears the inode.

Chapter 20. Interrupts

count = 0
done = False

finally count == 1

def handler():
    count += 1
    done = True

def main():
    trap handler()
    await done

spawn main()

Figure 20.1. [code/trap.hny] How to use trap

count = 0
done = False

finally count == 2

def handler():
    count += 1
    done = True

def main():
    trap handler()
    count += 1
    await done

spawn main()

Figure 20.2. [code/trap2.hny] A race condition with interrupts

from synch import Lock, acquire, release

countlock = Lock()
count = 0
done = False

finally count == 2

def handler():
    acquire(?countlock)
    count += 1
    release(?countlock)
    done = True

def main():
    trap handler()
    acquire(?countlock)
    count += 1
    release(?countlock)
    await done

spawn main()

Figure 20.3. [code/trap3.hny] Locks do not work with interrupts

count = 0
done = False

finally count == 2

def handler():
    count += 1
    done = True

def main():
    trap handler()
    setintlevel(True)
    count += 1
    setintlevel(False)
    await done

spawn main()

Figure 20.4. [code/trap4.hny] Disabling and enabling interrupts

Threads can be interrupted. An interrupt is a notification of some event such as a keystroke, a timer expiring, the reception of a network packet, the completion of a disk operation, and so on. We distinguish interrupts and exceptions. An exception is caused by the thread executing an invalid machine instruction such as divide-by-zero. An interrupt is caused by some peripheral device and can be handled in Harmony. In other words: an interrupt is a notification, while an exception is an error.

Harmony allows modeling interrupts using the trap statement:

trap handler argument
invokes handler argument at some later, unspecified time. Thus you can think of trap as setting a timer. Only one of these asynchronous events can be outstanding at a time; a new call to trap overwrites any outstanding one. Figure 20.1 gives an example of how trap might be used. Here, the main() thread loops until the interrupt has occurred and the done flag has been set. After this, count must equal 1.

But now consider Figure 20.2. The difference with Figure 20.1 is that both the main() and handler() methods increment count. This is not unlike the example we gave in Figure 3.3, except that only a single thread is involved now. And, indeed, it suffers from a similar race condition; run it through Harmony to see for yourself. If the interrupt occurs after main() reads count (and thus still has value 0) but before main() writes the updated value 1, then the interrupt handler will also read value 0 and write value 1. We say that the code in Figure 20.2 is not interrupt-safe (as opposed to not being thread-safe).

You would be excused if you wanted to solve the problem using locks, similar to Figure 7.3. Figure 20.3 shows how one might go about this. But locks are intended to solve synchronization issues between multiple threads, and an interrupt handler is not run by another thread---it is run by the same thread that experienced the interrupt. If you run the code through Harmony, you will find that the code may not terminate. The issue is that a thread can acquire a lock only once. If the interrupt happens after main() acquires the lock but before main() releases it, the handler() method will block trying to acquire the lock, even though it is the same thread that already holds it.

Instead, the way one fixes interrupt-safety issues is by disabling interrupts temporarily. In Harmony, this can be done by setting the interrupt level of a thread to True using the setintlevel interface. Figure 20.4 illustrates how this is done. Note that it is not necessary to change the interrupt level while servicing an interrupt, because it is automatically set to True upon entry to the interrupt handler and restored to False upon exit. It is important that the main() code re-enables interrupts after incrementing count. What would happen if main() left interrupts disabled?

setintlevel(il) sets the interrupt level to il and returns the prior interrupt level. Returning the old level is handy when writing interrupt-safe methods that can be called from ordinary code as well as from an interrupt handler. Figure 20.5 shows how one might write an interrupt-safe method to increment the counter.

count = 0
done = False

finally count == 2

def increment():
    let prior = setintlevel(True):
        count += 1
        setintlevel(prior)

def handler():
    increment()
    done = True

def main():
    trap handler()
    increment()
    await done

spawn main()

Figure 20.5. [code/trap5.hny] Example of an interrupt-safe method

from synch import Lock, acquire, release

count = 0
countlock = Lock()
done = [ False, False ]

finally count == 4

def increment():
    let prior = setintlevel(True):
        acquire(?countlock)
        count += 1
        release(?countlock)
        setintlevel(prior)

def handler(self):
    increment()
    done[self] = True

def thread(self):
    trap handler(self)
    increment()
    await done[self]

spawn thread(0)
spawn thread(1)

Figure 20.6. [code/trap6.hny] Code that is both interrupt-safe and thread-safe

It will often be necessary to write code that is both interrupt-safe and thread-safe. As you might expect, this involves both managing locks and interrupt levels. To increment count, the interrupt level must be True and countlock must be held. Figure 20.6 gives an example of how this might be done. One important rule to remember is that a thread should disable interrupts before attempting to acquire a lock. Try moving acquire() to the beginning of the increment method and release() to the end of increment and see what happens. This incorrect code can lead to threads getting blocked indefinitely.
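
The incorrect variant referred to above would look roughly as follows; the problem is that the interrupt may fire while the lock is already held:

def increment():            # BROKEN: may block indefinitely
    acquire(?countlock)     # the interrupt may fire here, with the lock held
    let prior = setintlevel(True):
        count += 1
        setintlevel(prior)
    release(?countlock)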

(Another option is to use synchronization techniques that do not use locks. See Chapter 21 for more information.)

There is another important rule to keep in mind. Just like locks should never be held for long, interrupts should never be disabled for long. With locks the issue is to maximize concurrent performance. For interrupts the issue is fast response to asynchronous events. Because interrupts may be disabled only briefly, interrupt handlers must run quickly and cannot wait for other events.

Exercises

Chapter 21. Non-Blocking Synchronization

const MAX_ITEMS = 3

sequential back, items
back = 0
items = [None,] * MAX_ITEMS

def inc(pcnt) returns prior:
    atomically:
        prior = !pcnt
        !pcnt += 1

def exch(pv) returns prior:
    atomically:
        prior = !pv
        !pv = None

def produce(item):
    items[inc(?back)] = item

def consume() returns next:
    next = None
    while next == None:
        var i = 0
        while (i < back) and (next == None):
            next = exch(?items[i])
            i += 1

for i in {1..MAX_ITEMS}:
    spawn produce(i)
for i in {1..choose({0..MAX_ITEMS})}:
    spawn consume()

Figure 21.1. [code/hw.hny] Non-blocking queue

So far, we have concentrated on critical sections to synchronize multiple threads. Certainly, preventing multiple threads from accessing certain code at the same time simplifies how to think about synchronization. However, it can lead to starvation. Even in the absence of starvation, if some thread is slow for some reason while being in the critical section, the other threads have to wait for it to finish executing the critical section. Also, using synchronization primitives in interrupt handlers is tricky to get right (Chapter 20) and might be too slow. In this chapter, we will have a look at how one can develop concurrent code in which threads do not have to wait for other threads (or interrupt handlers) to complete their ongoing operations.

As an example, we will revisit the producer/consumer problem. The code in Figure 21.1 is based on code developed by Herlihy and Wing [22]. The code is a "proof of existence" for non-blocking synchronization; it is not necessarily practical. There are two variables. items is an unbounded array with each entry initialized to None. back is an index into the array and points to the next slot where a new value is inserted. The code uses two atomic operations:

  1. inc(pcnt) atomically increments !pcnt and returns its prior value;
  2. exch(pv) atomically sets !pv to None and returns its prior value.

Method produce(item) uses inc(?back) to allocate the next available slot in the items array and stores the item in that slot. Method consume() repeatedly scans the array, up to the back index, trying to find an item to return. To check an entry, it uses exch() to atomically remove an item from a slot, if there is one. This way, if two or more threads attempt to extract an item from the same slot, at most one will succeed.

There is no critical section. If one thread is executing instructions very slowly, this does not negatively impact the other threads, as it would with solutions based on critical sections. On the contrary, it helps them, because it creates less contention. Unfortunately, the solution is not practical, for the following reasons:

  1. the items array must be unbounded, since back is incremented for every item ever produced and slots are never reused;
  2. consume() busy-waits, repeatedly scanning an ever-growing prefix of the array until it finds an item.

However, in the literature you can find examples of practical non-blocking (aka wait-free) synchronization algorithms.

Exercises

Chapter 22. Alternating Bit Protocol

def net_send(pchan, msg):
    atomically:
        !pchan = msg if choose({ False, True }) else ()

def net_recv(pchan) returns msg:
    atomically:
        msg = !pchan
        !pchan = ()

def app_send(net, seq, payload):
    !seq = 1 - !seq
    let m = { .seq: !seq, .payload: payload }:
        var blocked = True
        while blocked:
            net_send(?net->s_chan, m)
            let response = net_recv(?net->r_chan):
                blocked = (response == ()) or (response.ack != !seq)
            
def app_recv(net, seq) returns payload:
    !seq = 1 - !seq
    var blocked = True
    while blocked:
        let m = net_recv(?net->s_chan):
            if m != ():
                net_send(?net->r_chan, { .ack: m.seq })
                if m.seq == !seq:
                    payload = m.payload
                    blocked = False

Figure 22.1. [code/abp.hny] Alternating Bit Protocol

import abp

const NMSGS = 10

invariant s_seq in { 0, 1 }
invariant r_seq in { 0, 1 }

network = { .s_chan: (), .r_chan: () }
s_seq = r_seq = 0

def sender():
    for i in {1..NMSGS}:
        abp.app_send(?network, ?s_seq, i)
    
def receiver():
    var i = 1
    while True:
        let payload = abp.app_recv(?network, ?r_seq):
            assert payload == i
        i += 1

spawn sender()
spawn eternal receiver()

Figure 22.2. [code/abptest.hny] Test code for alternating bit protocol

A distributed system is a concurrent system in which a collection of threads communicate by message passing, much the same as in the actor model. The most important difference between distributed and concurrent systems is that the former takes failures into account, including failures of threads and failures of shared memory. In this chapter, we will consider two actors, Alice and Bob. Alice wants to send a sequence of application messages to Bob, but the underlying network may lose messages. The network does not re-order messages: if messages m1 and m2 are sent in that order and both are received, then m1 is received before m2. Also, the network does not create messages out of nothing: if message m is received, then message m was sent.

It is useful to create an abstract network that reliably sends messages between threads, much like the FIFO queue in the synch module. For this, we need a network protocol that Alice and Bob can run. In particular, if Alice sends application messages m1,...,mn in that order and Bob receives an application message m, then m = mi for some i and, unless m is the very first message, Bob will already have received application messages m1,...,mi-1 (safety). Also, if the network is fair and Alice sends application message m, then eventually Bob should deliver m (liveness).
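
To make the goal concrete, here is a minimal sketch of what such a reliable channel might look like as a specification; the names reliable_chan, reliable_send, and reliable_recv are assumptions:

reliable_chan = []          # all application messages sent so far, in order

def reliable_send(msg):
    atomically reliable_chan += [msg,]

def reliable_recv(pnext) returns msg:   # pnext points to the receiver's counter
    atomically when len(reliable_chan) > !pnext:
        msg = reliable_chan[!pnext]
        !pnext += 1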

The Alternating Bit Protocol is suitable for our purposes. We assume that there are two unreliable network channels: one from Alice to Bob and one from Bob to Alice. Alice and Bob each maintain a zero-initialized sequence number, s_seq and r_seq respectively. Alice sends a network message to Bob containing an application message as payload and Alice's sequence number as header. When Bob receives such a network message, Bob returns an acknowledgment to Alice: a network message containing the same sequence number as the message that Bob received.

In the protocol, Alice keeps sending the same network message until she receives an acknowledgment with the same sequence number. This is called retransmission. When she receives the desired acknowledgment, Alice increments her sequence number and is ready to send the next application message to Bob. Bob, on the other hand, waits until he receives a message matching his own sequence number. When he does, Bob delivers the payload in the message and increments his sequence number. Because of the network properties, a one-bit sequence number suffices.

We can model each channel as a variable that either contains a network message or nothing (we use () to represent nothing in the model). Let s_chan be the channel from Alice to Bob and r_chan the channel from Bob to Alice. net_send(pchan, m) models sending a message m to !pchan, where pchan is either ?s_chan or ?r_chan. The method places either m (to model a successful send) or () (to model loss) in !pchan. net_recv(pchan) models checking !pchan for the next message. If there is a message, it sets !pchan to ().

Method app_send(net, seq, msg) retransmits { .seq: !seq, .payload: msg } until an acknowledgment for !seq is received. Method app_recv(net, seq) returns the next successfully received message (with the given sequence number bit) if any. Figure 22.2 shows how the methods may be used to send and receive a stream of NMSGS messages reliably. It has to be bounded, because model checking requires a finite model. Note that the receiver() is spawned as eternal because it does not terminate.

Exercises

Chapter 23. Leader Election

const NIDS = 5      # number of identifiers

network = {}        # the network is a set of messages
leader = 0          # used for checking correctness

def send(msg):
    atomically network |= { msg }

def receive(self) returns msg:
    msg = { (id, found) for (dst, id, found) in network where dst == self }

def processor(self, succ):
    send(succ, self, False)
    var working = True
    while working:
        atomically when exists (id, found) in receive(self):
            if id == self:
                assert self == leader
                send(succ, id, True)
            elif id > self:
                assert self != leader
                send(succ, id, found)
            if found:
                working = False

var ids, nprocs, procs = { 1 .. NIDS }, choose({ 1 .. NIDS }), []
for i in { 0 .. nprocs - 1 }:
    let next = choose(ids):
        ids -= { next }
        procs += [ next, ]
        if next > leader:
            leader = next
for i in { 0 .. nprocs - 1 }:
    spawn processor(procs[i], procs[(i + 1) % nprocs])

Figure 23.1. [code/leader.hny] A leader election protocol on a ring

Leader election is the problem of electing a unique leader in a network of processors. Typically this is challenging because the processors have only limited information. In the version that we present, each processor has a unique identifier. The processors are organized in a ring, but each processor only knows its own identifier and the identifier of its successor on the ring. Having already looked into the problem of how to make the network reliable, we assume here that each processor can reliably send messages to its successor.

The protocol that we present elects as leader the processor with the highest identifier [7] and works in two phases: in the first phase, each processor sends its identifier to its successor. When a processor receives an identifier that is larger than its own identifier, it forwards that identifier to its successor as well. If a processor receives its own identifier, it discovers that it is the leader. That processor then starts the second phase by sending a message around the ring notifying the other processors of the leader's identifier.

Figure 23.1 describes the protocol and its test cases in Harmony. In Harmony, processors can be modeled by threads and there are a variety of ways in which one can model a network using shared variables. Here, we model the network as a set of messages. The send method atomically adds a message to this set. Messages are tuples with three fields: (dst, id, found). dst is the identifier of the destination processor; id is the identifier that is being forwarded; and found is a boolean indicating the second phase of the protocol. The receive(self) method looks for all messages destined for the processor with identifier self.

To test the protocol, the code first chooses the number of processors and generates an identifier for each processor, chosen non-deterministically from a set of NIDS identifiers. It also keeps track in the variable leader of what the highest identifier is, so it can later be checked.

Method processor(self, succ) is the code for a processor with identifier self and successor succ. It starts simply by sending its own identifier to its successor. The processor then loops until it discovers the identifier of the leader in the second phase of the protocol. A processor waits for a message using the Harmony atomically when exists statement. This statement takes the form

atomically when exists v in s: statement block
where s is a set and v is a variable that is bound to an element of s. The statement has the following properties:

  1. if s is empty, the thread waits until s is non-empty;
  2. the element is chosen non-deterministically (the model checker explores every possible choice);
  3. the statement block is executed atomically, with v bound to the chosen element.

If a processor receives its own identifier, it knows it is the leader. The Harmony code checks this using an assertion. In real code the processor could not do this, as it does not know the identifier of the leader, but assertions are only there to check correctness. The processor then sends a message to its successor that the leader has been found. If the processor receives an identifier higher than its own, the processor knows that it cannot be the leader. In that case, it simply forwards the message. A processor stops when it receives a message indicating that the leader has been identified.

Note that there is a lot of non-determinism in the specification, leading to a lot of executions that must be checked. First, every possible permutation of identifiers for the processors is tried. When there are multiple messages to receive by a processor, every possible order is tried (including receiving the same message multiple times). Fortunately, the atomically when exists statement is executed atomically, otherwise the body of the statement could lead to additional thread interleavings. Because in practice the different processors do not share memory, it is not necessary to check those interleavings.

Exercises

Chapter 24. Transactions and Two Phase Commit

const NBANKS = 3
const NCOORDS = 2
const MAX_BALANCE = 1

network = {}

def send(msg):
    atomically network |= { msg }

def bank(self, _balance):
    var balance = _balance
    var status, received = (), {}
    while True:
        atomically when exists req in network - received when req.dst == self:
            received |= { req }
            if req.request == "withdraw":
                if (status != ()) or (req.amount > balance):
                    send({ .dst: req.src, .src: self, .response: "no" })
                else:
                    status = balance
                    balance -= req.amount
                    send({ .dst: req.src, .src: self, .response: "yes", .funds: balance })
            elif req.request == "deposit":
                if status != ():
                    send({ .dst: req.src, .src: self, .response: "no" })
                else:
                    status = balance
                    balance += req.amount
                    send({ .dst: req.src, .src: self, .response: "yes", .funds: balance })
            elif req.request == "commit":
                assert status != ()
                status = ()
            else:
                assert (status != ()) and (req.request == "abort")
                balance, status = status, ()

Figure 24.1. [code/2pc.hny] Two Phase Commit protocol: code for banks

import list

def transfer(self, b1, b2, amt):
    send({ .dst: b1, .src: self, .request: "withdraw", .amount: amt })
    send({ .dst: b2, .src: self, .request: "deposit", .amount: amt })
    atomically let msgs = { m for m in network where m.dst == self }
    when { m.src for m in msgs } == { b1, b2 }:
        if all(m.response == "yes" for m in msgs):
            for m in msgs where m.response == "yes":
                send({ .dst: m.src, .src: self, .request: "commit" })
        else:
            for m in msgs where m.response == "yes":
                send({ .dst: m.src, .src: self, .request: "abort" })

def check(self, total):
    let allbanks = { (.bank, i) for i in { 0 .. NBANKS - 1} }:
        for b in allbanks:
            send({ .dst: b, .src: self, .request: "withdraw", .amount: 0 })
        atomically let msgs = { m for m in network where m.dst == self }
        when { m.src for m in msgs } == allbanks:
            assert all(m.response == "yes" for m in msgs) =>
                        (list.sum(m.funds for m in msgs) == total)
            for m in msgs where m.response == "yes":
                send({ .dst: m.src, .src: self, .request: "abort" })

let balances = [ choose({ 0 .. MAX_BALANCE }) for _ in { 0 .. NBANKS - 1} ]:
    for i in { 0 .. NBANKS - 1 }:
        spawn eternal bank((.bank, i), balances[i])
    for i in { 1 .. NCOORDS }:
        if choose({ "transfer", "check" }) == .transfer:
            let b1 = choose({ (.bank, j) for j in { 0 .. NBANKS - 1}})
            let b2 = choose({ (.bank, j) for j in { 0 .. NBANKS - 1}} - { b1 }):
                spawn transfer((.coord, i), b1, b2, 1)
        else:
            spawn check((.coord, i), list.sum(balances))

Figure 24.2. [code/2pc.hny] Two Phase Commit protocol: code for transaction coordinators

Modern databases support multiple clients concurrently accessing the data. They store data on disk, but we will ignore that in this book. (If you want to model a disk, this is probably best done as a separate thread that maintains the contents of the disk.) The complication we address here is that databases may be sharded: different parts of the data are stored on different servers. The different servers may even be under different authoritative domains, such as multiple banks maintaining the accounts of their clients.

In database terminology, a transaction is an operation on a database. The operation can be quite complex, and the execution of a transaction should have the following two properties:

  1. All-or-nothing: either the transaction executes to completion, or it has no effect whatsoever;
  2. Serializability: concurrently executed transactions should behave as if they were executed one at a time.

We will use as an example a distributed database that maintains bank accounts. For simplicity, we will model this as a collection of banks, each maintaining a single account. There are two kinds of transactions: transfer (similar to Exercise 15.2) and check. In this example, a transfer is a transaction that moves some funds between two accounts. A check is a transaction over all accounts and checks that the sum of the balances across the accounts remains the same.

Executing such transactions must be done with care. Consider what would happen if transactions are not all-or-nothing or are not serialized. A transfer consists of two operations: withdrawing funds from one account and depositing the same amount of funds in the other. These two operations can be done concurrently, but if the withdrawal fails (for example, because there are not sufficient funds in the source account) then the whole transaction should fail and become a no-op. Even if this is not the case, a concurrent check transaction may accidentally witness a state in which either the withdrawal or the deposit happened, but not both. And matters get more complicated with multiple concurrent transfers.

The Two-Phase Commit protocol [20] is a protocol that can be used to implement transactions across multiple database servers---banks in this case. Each transaction has a coordinator that sends a PREPARE message to each of the servers involved in the transaction, asking them to prepare to commit to their part in a particular transaction. A server can either respond with YES if it is ready to commit and will avoid doing anything that might jeopardize this (like committing a conflicting transaction), or with NO if it does not want to participate in the transaction. If all servers respond with YES, then the coordinator can decide to commit the transaction. Otherwise the coordinator must decide to abort the transaction. In the second phase, the servers that responded with YES (if any) must be notified to inform them of the coordinator's decision.

Different transactions can have different coordinators. In our implementation, each bank and each coordinator is a thread. Figure 24.1 shows the code for a bank. The state of a bank consists of the following local variables:

  1. balance: the current balance of the account;
  2. status: () if there is no ongoing transaction, and otherwise the balance at the start of the ongoing transaction (so that it can be restored if the transaction aborts);
  3. received: the set of requests that the bank has already handled.

Messages sent to a bank are dictionaries with the following fields: .dst and .src identify the destination and the source of the message; .request is one of "withdraw", "deposit", "commit", or "abort"; and .amount, present in withdraw and deposit requests, is the amount of funds involved. A bank waits for a message destined to itself that it has not yet received. In the case of a withdrawal, if the bank is idle and there are sufficient funds, the bank saves the current balance in status, indicating both that there is an ongoing transaction and what the original balance was, and responds with a "yes" message to the coordinator, including the new balance. Otherwise, the bank responds with a "no" message. Deposits are similar, except that there is no need to check for sufficient funds. On a "commit" message, the bank sets status back to (), indicating that there is no ongoing transaction. On an "abort" message, the bank first restores balance from status.

Figure 24.2 contains the code for transfers and balance checks, as well as tests. A coordinator waits for the responses to a transaction in an atomically let ... when statement: it collects the set of messages destined to itself and blocks until the set of senders of those messages equals the set of banks involved in the transaction.

The transfer() method contains the code for the transfer transaction. Argument self is the identifier of the coordinator, b1 is the source bank, b2 is the destination bank, and amt is the amount to transfer. The coordinator sends a PREPARE message containing a .withdraw request to b1 and a PREPARE message containing a .deposit request to b2. It then waits for responses from each. If both responses are .yes, then it commits the transaction, otherwise it aborts the transaction.

The check() method checks whether the sum of the balances equals total, the sum of the initial balances. The code is similar to transfer(), except that the transaction is always aborted---there is never a need to commit it. As a code-saving hack, the balance inquiry is done by withdrawing $0.

As for testing, the initial balances are picked arbitrarily between 0 and MAX_BALANCE (and Harmony as always will try every possible set of choices). Each coordinator chooses whether to do a transfer or a check. In case of a transfer, it also chooses the source bank and the destination bank.

While the protocol perhaps seems simple enough, there are a lot of if statements in the code, making it hard to reason about correctness. Model checking is useful to see if there are corner cases where the code does not work. While confidence increases by increasing the number of banks or the number of coordinators, doing so quickly increases the number of possible states so that model checking may become infeasible.

Exercises

Chapter 25. Chain Replication

const NREPLICAS = 3     # number of replicas
const NOPS = 2          # number of operations

network = []            # the network is a queue of messages

def crash():
    stop()

def send(msg):
    atomically network += [msg,]

def replica(self, immortal):
    if not immortal:
        trap crash()
    var delivered = 0
    while True:
        atomically when len(network) > delivered:
            let msg = network[delivered]:
                print(self, msg)
                delivered += 1

def client(self):
    print(self)
    send(self)

let immortal = choose {1..NREPLICAS}:
    for i in {1..NREPLICAS}:
        spawn eternal replica(i, i == immortal)
for i in {1..NOPS}:
    spawn client(i)

Figure 25.1. [code/rsm.hny] Replicated State Machine

Figure 25.2. The DFA generated by Figure 25.1 when NOPS=2 and NREPLICAS=2

const NREPLICAS = 3     # number of replicas
const NOPS = 2          # number of operations (or clients)

network = {}            # the network is a set of messages

def send(self, dst, msg):           # send msg to replica dst
    atomically network |= { (dst, (self, msg)) }

def broadcast(self, msg):           # broadcast msg to all
    atomically for dst in {1..NREPLICAS}:
        network |= { (dst, (self, msg)) }

def receive(self) returns msgs:     # return messages for me
    msgs = { payload for (dst, payload) in network where (dst == self) }

def crash(self):                    # server 'self' is crashing
    broadcast(self, "crash")        # notify all other replicas
    stop()

def is_prefix(hist1, hist2) returns result: # hist1 is a strict prefix of hist2
    result = (len(hist1) < len(hist2)) and
                all(hist1[i] == hist2[i] for i in {0..len(hist1)-1})

Figure 25.3. [code/chain.hny] Chain Replication (part 1)

Figure 25.4. [code/chain.hny] Chain Replication (part 2)

As you have probably experienced, computers can crash. If you are running a web service, you may not be able to afford a long outage. If you are running software that flies a plane, then an outage for any length of time could lead to a disaster. To deal with service outages caused by computers crashing, you may want to replicate the service onto multiple computers. As long as one of the computers survives, the service remains available.

Besides availability, it is usually important that the replicated service acts as if it were a single one. This requires that the replicas of the service coordinate their actions. The Replicated State Machine Approach [27, 39] is a general approach to do just this. First, you model your service as a deterministic state machine. The replicas each run a copy of the state machine, started in the same state. As long as the replicas handle the same inputs in the same order, determinism guarantees that they produce the same outputs in the same order.

Figure 25.1 presents a Harmony specification of state machine replication. We model the state machine as a history: a sequence of operations. In a replicated state machine, the abstract network maintains this history as an ordered queue of messages. NOPS clients each place an operation on the network. The replicas process messages from the ordered network.

All but one of the replicas are allowed to crash. Crashes are modeled as interrupts, so we use Harmony's trap clause to schedule one. When crashing, a replica simply stops. The model chooses one of the replicas that is not allowed to crash. Of course, in practice a replica does not know whether it is immortal or not---it should just assume that it is. The immortality of one of the replicas is only used for modeling the assumptions we make about the system.

The behavior is captured as before. Before an operation is added to the network, a client prints the operation (in this case, its own identifier). After a replica processes an operation, it prints a pair consisting of its own identifier and the operation. All replicas print the same operations in the same order until they crash. Figure 25.2 shows the allowed behaviors in case there are just two clients and two replicas. Because one of the replicas is immortal and clients do not crash, at least one of the replicas will print both operations (liveness). If both do, they do so in the same order (safety).

But in reality the network is not an ordered queue and is better modeled as a set of messages. The trick now is to ensure that all replicas handle the same requests in the same order, and to do so in a way that continues to work even if some strict subset of the replicas crash. Chain Replication [41] is such a replication protocol. In Chain Replication, the replicas are organized in a linear chain, which may change as replicas crash. Importantly, at any point in time there is only one head and one tail replica.

Only the head is allowed to accept new operations from clients. When it does so, it adds the operation to the end of its history and sends the history to its successor on the chain. When the direct successor receives such a history, it makes sure that the history is an extension of its own and, if so, replaces its own history with the one received. It then sends the history on to its successor, if any. When an operation reaches the tail, the operation is what is known as stable---it has been reliably ordered and persisted.

In order for this to work, each replica needs to know who its predecessor and its successor are. So, when a replica fails, its neighbors should find out about it. In practice, one server can detect the failure of another server by pinging it. If a server does not receive a response to its ping within some maximum amount of time, then the server considers its peer crashed. Note that this, in general, is not a safe thing to do---the network or the peer may be temporarily slow, and the peer is not necessarily crashed when the timer expires.

Nonetheless, we will assume here that failure detection does not make mistakes and that eventually every failure is detected. This is called the Fail-Stop failure model [38], which is distinct from the often more realistic Crash Failure model where processes can crash but accurate detection is not available. We will consider that more realistic failure model in the upcoming chapters. For chain replication, when a replica crashes, it reliably notifies the other replicas by broadcasting a message over the reliable network. Because failure detection is accurate, at most one replica can think it is the head at any time (and, if so, it is in fact the head). Moreover, eventually some replica will have detected the failure of all its predecessors and thus consider itself the head. The same is true for the tail.

Figure 25.3 and Figure 25.4 show an implementation of chain replication. The network is modeled as an append-only set of messages of the form (destination, (source, payload)). When sending, a message is atomically added to this set. A client broadcasts its operation to all replicas.

Each replica maintains its own history hist and a chain configuration config. The replica executes a loop in which it receives and atomically handles messages until it crashes. As before, one of the replicas cannot crash. Because replicas do not want to handle the same message twice, each replica maintains a set received of messages it has already handled. Each replica then waits for a message on the network that it has not handled before.

When a replica receives a client request, it adds the request to a set requests that it maintains. A replica can only handle such a request if it is the head, but any replica can eventually become the head, so it should carefully save all requests. (In theory, it can remove a request as soon as the request is on its history.) When a replica receives a failure notification, it updates its configuration accordingly. When a non-head replica receives a history that extends its own history, the replica adopts the received history.

Next, if a replica is the head, it adds any requests it has received to its history, unless they are already on it. If a replica is the tail, it "delivers" operations on its history (by printing them) that it has not already delivered. For this, it maintains a counter delivered that counts the number of delivered requests. Any replica that is not the tail sends its history to its successor in the chain.

The question is whether chain replication has the same behavior as the replicated state machine specification of Figure 25.1. This can be checked using the following two Harmony commands:

$ harmony -o rsm.hfa code/rsm.hny
$ harmony -B rsm.hfa code/chain.hny

The first command outputs the behavior DFA of code/rsm.hny in the file rsm.hfa. The second command checks that the behavior of code/chain.hny satisfies the DFA in rsm.hfa. Note that chain replication does not produce all the possible behaviors of a replicated state machine, but all its behaviors are valid.

The model has each replica send its entire history each time it extends its history. This is fine for modeling, but in practice that would not scale. In practice, a predecessor would set up a TCP connection to its successor and only send updates to its history along the TCP connection. Because TCP connections guarantee FIFO order, this would be identical to the predecessor sending a series of histories, but much more efficient.
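
For concreteness, here is a small sketch of what sending just the new part of the history might look like. It is not part of the model in Figure 25.3; the message format and the acked counter (the number of history entries the successor is known to have) are made up for illustration:

def send_suffix(self, succ, hist, acked):
    # send only the operations that the successor does not yet have
    for i in { acked .. len(hist) - 1 }:
        send(self, succ, ("update", i, hist[i]))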

Chapter 26. Working with Actions

import list, action

const NREPLICAS = 3
const NOPS = 2

# Global state
let immortal = choose {1..NREPLICAS}:
    replicas = { p: { .immortal: immortal == p, .crashed: False,
                .requests: {}, .hist: [], .config: {1..NREPLICAS},
                .received: {}, .delivered: 0 } for p in {1..NREPLICAS} }
clients = { c: { .sent_request: False } for c in {1..NOPS} }

const is_head = lambda(p): p == min(replicas[p].config) end
const is_tail = lambda(p): p == max(replicas[p].config) end

def is_successor(self, p) returns result:
    let succ = { q for q in replicas[self].config where q > self }:
        result = False if succ == {} else (p == min(succ))

def do_sendOperation(c):
    print(c)
    clients[c].sent_request = True
    for p in {1..NREPLICAS}:
        replicas[p].requests |= { c }

const sendOperation = lambda(): { ?do_sendOperation(c)
    for c in {1..NOPS} where not clients[c].sent_request } end

def do_gotOperation(self, op):
    replicas[self].hist += [op,]

const gotOperation = lambda(): { ?do_gotOperation(p, op)
    for p in {1..NREPLICAS}
      where not replicas[p].crashed and is_head(p)
    for op in replicas[p].requests
      where op not in replicas[p].hist } end

def do_sendHist(self, p):
    replicas[p].received |= { replicas[self].hist }

Figure 26.1. [code/chainaction.hny] Chain Replication specification using actions (part 1)

const sendHist = lambda(): { ?do_sendHist(p, q)
    for p in {1..NREPLICAS}
      where not replicas[p].crashed
    for q in {1..NREPLICAS}
      where is_successor(p, q) and (replicas[p].hist not in replicas[q].received)
  } end

def do_gotHist(self, hist):
    replicas[self].hist = hist

const gotHist = lambda(): { ?do_gotHist(p, hist)
    for p in {1..NREPLICAS} where not replicas[p].crashed
    for hist in replicas[p].received where (len(replicas[p].hist) < len(hist))
                        and list.startswith(hist, replicas[p].hist) } end

def do_deliver(self):
    print(self, replicas[self].hist[replicas[self].delivered])
    replicas[self].delivered += 1

const deliver = lambda(): { ?do_deliver(p)
    for p in {1..NREPLICAS} where not replicas[p].crashed and
        is_tail(p) and (len(replicas[p].hist) > replicas[p].delivered) } end

def do_crash(self):
    replicas[self].crashed = True

const crash = lambda(): { ?do_crash(p)
    for p in {1..NREPLICAS}
        where not replicas[p].crashed and not replicas[p].immortal } end

def do_detect(self, p):
    replicas[self].config -= { p }

const detect = lambda(): { ?do_detect(p, q)
    for p in {1..NREPLICAS} where not replicas[p].crashed
    for q in {1..NREPLICAS} where replicas[q].crashed and
                (q in replicas[p].config) } end

action.explore({sendOperation, gotOperation, sendHist,
                    gotHist, deliver, crash, detect})

Figure 26.2. [code/chainaction.hny] Chain Replication specification using actions (part 2)

So far we have mostly modeled concurrent activities using threads. Another way of modeling is by enumerating all the possible state transitions from any given state. For example, this is how things are commonly specified in TLA+. As in TLA+, we call such state transitions actions. In this chapter we will revisit modeling chain replication, but this time using actions.

Figure 26.1 and Figure 26.2 contain the new specification. The state of the replicas and the clients is stored in the variables replicas and clients, respectively. Each type of action is captured using a lambda and a method. The method updates the state, while the lambda enumerates the possible actions of this type.

For example, consider the crash action. All replicas, except the replica that is immortal and the replicas that have already crashed, can crash. There is a lambda crash that generates the set of all possible crashes. Each element in the set is a thunk, that is, a delayed call of a method and an argument [26]. For example, ?do_crash(1) is the action representing replica 1 failing. If we look at the do_crash(p) method, all it does is set the crashed field of the replica. The specification does this for every type of action: sendOperation, gotOperation, sendHist, gotHist, deliver, crash, and detect.

The Harmony action module explores all possible behaviors of such a specification. It has a single method explore that takes a set of lambdas, each of which returns a set of possible actions.
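
To make the pattern concrete, here is a minimal, hypothetical action-based model (not from the chain replication example): a single shared counter that either of two processes may increment, bounded by a made-up constant MAX to keep the state space finite:

import action

const MAX = 2

count = 0

def do_incr(p):
    count += 1

const incr = lambda(): { ?do_incr(p)
    for p in { 1, 2 } where count < MAX } end

action.explore({ incr })

The lambda enumerates the ?do_incr(p) thunks that are enabled in the current state, and action.explore tries them in every possible order.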

So, which of the two types of specification do you prefer? One metric is readability, but that is subjective and depends on what you have experience with. Another is the size of the state space and, more generally, control over the state space that is being explored. Threads have hidden state such as their stacks, program counters, and local variables, adding to the state space in sometimes unexpected ways. With an action-based specification, all state is explicit and all state changes are explicit. This can be advantageous. On the other hand, the thread-based specification is easier to turn into an actual running implementation.

Chapter 27. Replicated Atomic Read/Write Register

reg = None

def init():
    pass

def read(uid) returns contents:
    atomically contents = reg

def write(uid, v):
    atomically reg = v

Figure 27.1. [code/register.hny] An atomic read/write register

import register

const NREADERS = 2
const NWRITERS = 1

def reader(i):
    print(i, "reads")
    let v = register.read(i):
        print(i, "read", v)
 
def writer(i):
    print(i, "writes")
    register.write(i, i)
    print(i, "wrote")
        
register.init()
for i in { 1 .. NREADERS }:
    spawn reader(i)
for i in { 1 .. NWRITERS }:
    spawn writer(-i)

Figure 27.2. [code/abdtest.hny] Behavioral test for atomic read/write registers and the output for the case that NREADERS = NWRITERS = 1

import bag

const F = 1
const N = (2 * F) + 1

network = bag.empty()

def send(m): atomically network = bag.add(network, m)

def server():
    var t, v, received = (0, None), None, {}
    while True:
        atomically when exists m in { k for k in keys network - received
                            where k.type in {"read", "write"} }:
            received |= { m }
            if (m.type == "write") and (m.value[0] > t):
                t, v = m.value
            send({ .type: .response, .dst: m.src, .value: (t, v) })

def init():
    for i in { 1 .. N }: spawn eternal server()

def receive(uid, phase) returns quorums:
    let msgs = { m:c for m:c in network
                where (m.type == .response) and (m.dst == (uid, phase)) }:
        quorums = bag.combinations(msgs, N - F)

def read(uid) returns contents:
    send({ .type: "read", .src: (uid, 1) })
    atomically when exists msgs in receive(uid, 1):
        let (t, v) = max(m.value for m in keys msgs):
            send({ .type: "write", .src: (uid, 2), .value: (t, v) })
            contents = v
    atomically when exists msgs in receive(uid, 2):
        pass

def write(uid, v):
    send({ .type: "read", .src: (uid, 1) })
    atomically when exists msgs in receive(uid, 1):
        let (t, _) = max(m.value for m in keys msgs)
        let nt = (t[0] + 1, uid):
            send({ .type: "write", .src: (uid, 2), .value: (nt, v) })
    atomically when exists msgs in receive(uid, 2):
        pass

Figure 27.3. [code/abd.hny] An implementation of a replicated atomic read/write register

A register is an object that you can read or write. In a distributed system, examples include a shared file system (each file is a register) or a key/value store (each key corresponds to a register). A simple shared register implementation would have its value maintained by a server, and clients can read or write the shared register by exchanging messages with the server. We call two operations concurrent if neither one finishes before the other starts. Since messages are delivered one at a time to the server, concurrent operations on the shared register appear atomic. In particular, we have the following three desirable properties:
  1. All write operations are ordered;
  2. A read operation returns either the last value written or the value of a concurrent write operation;
  3. If read operation r1 finishes before read operation r2 starts, then r2 cannot return a value that is older than the value returned by r1.
It is instructive to look at the test program and its output in Figure 27.2. This is for the case when there is only a single reader thread (identified as "1") and a single writer thread (identified as "-1"), but already there are many cases to consider. Each thread prints information just before and just after doing its single operation. The output shows all possible interleavings in the form of a DFA. Note that if the read operation starts after the write operation has completed, then the read operation must return the new value. However, if the two operations interleave in some way, then the read operation can return either the old or the new value.

Unfortunately, a server is a single point of failure: if it fails, all its clients suffer. We would therefore like to find a solution that can survive the crash of a server. While we could use Chain Replication to replicate the register, in this chapter we will use a solution that does not assume that crashes can be accurately detected.

We will again replicate the register object: maintain multiple copies, but we will not use the replicated state machine approach. One could, for example, imagine that clients write to all copies and read from any single one. While this solves the single-point-of-failure problem, we lose all the nice properties above. For one, it is not guaranteed that all servers receive and process all write operations in the same order.

We present a protocol preserving these properties that is based on the work by Hagit Attiya, Amotz Bar-Noy, and Danny Dolev [2]. In order to tolerate F failures, it uses N = 2F + 1 servers. In other words, the register survives as long as a strict majority of its copies survive. All write operations are ordered by a unique logical timestamp (see also Chapter 10). Each server maintains not only the value of the object, but also the timestamp of its corresponding write operation.

Each read and write operation consists of two phases. In a phase, a client broadcasts a request to all servers and waits for responses from a majority (N - F, or equivalently F + 1, servers). Because we assume that no more than F servers can fail, doing so is safe, in that a client cannot block indefinitely as long as that assumption holds.
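
To see why hearing from N - F servers in each phase suffices, note that with N = 2F + 1 any two sets of N - F = F + 1 servers must overlap: (F + 1) + (F + 1) = 2F + 2 > 2F + 1 = N, so the two sets share at least one server. In particular, the first phase of a read intersects the second phase of every completed write in at least one server, which is what allows a read to observe the latest completed write.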

In the first phase, a client asks each server for its current timestamp and value. After receiving N - F responses, the client determines the response with the highest timestamp. In the case of a write operation, the client then computes a new unique timestamp that is strictly higher than the highest it has seen. To make this work, timestamps are actually lexicographically ordered tuples consisting of an integer and the unique identifier of the client that writes the value. So, if (t, c) is the highest timestamp observed by client c, and c needs to create a new timestamp, it can select (t + 1, c). After all, (t + 1, c) > (t, u) for any client u, and no other client will create the same timestamp.
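
The following hypothetical assertions illustrate the lexicographic order on such timestamps, using made-up client identifiers .alice and .bob:

assert (3 + 1, .alice) > (3, .bob)      # a higher counter always wins
assert (3, .bob) > (3, .alice)          # ties are broken by the client identifier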

Suppose client c is trying to write a value v. In phase 2, client c broadcasts a request containing timestamp (t + 1, c) and v. Each server that receives the request compares (t + 1, c) against its current timestamp. If (t + 1, c) is larger than its current timestamp, the server adopts the new timestamp and its corresponding value v. In either case, the server responds to the client. Once c has received responses from N - F servers, the write operation completes. In the case of a read operation, client c simply writes back the highest timestamp it saw in the first phase along with its corresponding value.

Figure 27.3 contains the code for a server, as well as the code for read and write operations. For efficiency of model checking, the servers are anonymous---otherwise we would have to consider every permutation of states of those servers. Because the servers are anonymous, they may end up sending the same exact message, but clients are waiting for a particular number of messages. Because of this, we will model the network as a bag of messages.

A server initializes its timestamp t to (0, None) and its value to None. Each server also keeps track of all the requests it has already received so it does not handle the same request twice. The rest of the code is fairly straightforward.

Read and write operations are both invoked with a unique identifier uid. Both start by broadcasting a .read request to all servers and then waiting for a response from N - F servers. The receive() function uses the bag.combinations method to find all combinations of subsets of responses of size N - F. The second phase of each operation is similar.
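
As an illustration of what bag.combinations computes (based on the description above; this example is not from a particular figure), consider a bag containing the value 10 twice and the value 12 once. Its sub-bags of size 2 should be:

import bag

const b = { 10: 2, 12: 1 }
assert bag.combinations(b, 2) == { { 10: 2 }, { 10: 1, 12: 1 } }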

Figure 27.2 can be used to test this protocol, although you will notice that the model checker cannot deal with cases involving more than three client threads. Three is just enough to check the third property listed above (using one writer and two readers). Doing so illustrates the importance of the second phase of the read operation. You can comment out Lines 34, 36, and 37 in Figure 27.3 to elide the second phase of the read operation and see what goes wrong.

One may wonder how failures can occur in this model. They are not explicitly modeled, but Harmony tries every possible execution. This includes executions in which the clients terminate before F of the servers start executing. To the clients, this is indistinguishable from executions in which those servers have failed.

Chapter 28. Distributed Consensus

const N = 4

proposals = [ choose({0, 1}) for i in {0..N-1} ]
decision = choose { x for x in proposals }

def processor(proposal):
    if choose { False, True }:
        print decision

print proposals
for i in {0..N-1}:
    spawn processor(proposals[i])

Figure 28.1. [code/consensus.hny] Distributed consensus code and behavior DFA

Distributed consensus is the problem of having a collection of processors agree on a single value over a network. For example, in state machine replication, the state machines have to agree on which operation to apply next. Without failures, this can be solved using leader election: first elect a leader, then have that leader decide a value. But consensus often has to be done in adverse circumstances, for example in the face of processor failures.

Each processor proposes a value, which we assume here to be from the set { 0, 1 }. By the usual definition of consensus, we want the following three properties:

  1. Validity: a processor can only decide a value that has been proposed;
  2. Agreement: if two processors decide, then they decide the same value;
  3. Termination: each processor eventually decides.
The consensus problem is impossible to solve in the face of processor failures and without making assumptions about how long it takes to send and receive a message [19]. Here we will not worry about Termination.

Figure 28.1 presents a specification for binary consensus---the proposals are from the set {0, 1}. In this case there are four processors. The proposal of processor i is in proposals[i]. The decision is chosen from the set of proposals. Each processor may or may not print the decision---capturing the absence of the Termination property. It may be that no decisions are made, but that does not violate either Validity or Agreement. Thus the behavior of the program is to first print the array of proposals, followed by some subset of processors printing their decision.

This is just the specification---in practice we do not have a shared variable in which we can store the decision a priori. We will present a simple consensus algorithm that can tolerate fewer than one third of the processors failing by crashing. More precisely, constant F contains the maximum number of failures, and we will assume there are N = 3F + 1 processors.

import bag

const F = 1
const N = (3 * F) + 1
const NROUNDS = 3

proposals = [ choose({0, 1}) for i in {0..N-1} ]
network = bag.empty()

def broadcast(msg):
    atomically network = bag.add(network, msg)

def receive(round, k) returns quorum:
    let msgs = { e:c for (r,e):c in network where r == round }:
        quorum = bag.combinations(msgs, k)

def processor(proposal):
    var estimate, decided = proposal, False
    broadcast(0, estimate)
    for round in {0..NROUNDS-1}:
        atomically when exists quorum in receive(round, N - F):
            let count = [ bag.multiplicity(quorum, i) for i in { 0..1 } ]:
                assert count[0] != count[1]
                estimate = 0 if count[0] > count[1] else 1
                if count[estimate] == (N - F):
                    if not decided:
                        print estimate
                        decided = True
                    assert estimate in proposals   # check validity
                broadcast(round + 1, estimate)

print proposals
for i in {0..N-1}:
    spawn processor(proposals[i])

Figure 28.2. [code/bosco.hny] A crash-tolerant consensus protocol

Figure 28.3. The behavior DFA for Figure 28.2

Figure 28.2 presents our algorithm. Besides the network variable, it uses a shared list of proposals. In this particular algorithm, all messages are broadcast to all processors, so they do not require a destination address. The N processors go through a sequence of rounds in which they wait for N - F messages, update their state based on the messages, and broadcast messages containing their new state. The reason that a processor waits for N - F rather than N messages is failures: up to F processors may never send a message, so it would be unwise to wait for all N. You might be tempted to use a timer and time out on waiting for a particular processor. But how would you initialize that timer? While we will assume that the network is reliable, there is no guarantee that messages arrive within a particular time. We call a set of N - F processors a quorum. A quorum must suffice for the algorithm to make progress.

The state of a processor consists of its current round number (initially 0) and an estimate (initially the proposal). Therefore, messages contain a round number and an estimate. To start things, each processor first broadcasts its initial round number and initial estimate. The number of rounds that are necessary to achieve consensus is not bounded. But Harmony can only check finite models, so there is a constant NROUNDS that limits the number of rounds.

In Line 21, a processor waits for N - F messages using the Harmony atomically when exists statement. Since Harmony has to check all possible executions of the protocol, the receive(round, k) method returns all subbags of messages for the given round that have size k = N - F. The method uses a dictionary comprehension to filter out all messages for the given round and then uses the bag.combinations method to find all combinations of size k. The atomically when exists statement waits until there is at least one such combination and then chooses an element, which is bound to the quorum variable. The body of the statement is then executed atomically. This is usually how distributed algorithms are modeled, because they can only interact through the network. There is no need to interleave the different processes other than when messages are delivered. By executing the body atomically, a lot of unnecessary interleavings are avoided and this reduces the state space that must be explored by the model checker significantly.

The body of the atomically when exists statement contains the core of the algorithm. Note that N - F = 2F + 1, so the number of messages received in a round is guaranteed to be odd. Also, because there are only 0 and 1 values, there must exist a majority of zeroes or ones. Variable count[0] stores the number of zeroes and count[1] stores the number of ones received in the round. The rules of the algorithm are simple (they can be read off the code in Figure 28.2):

  1. set the estimate to the majority value among the N - F received messages;
  2. if all N - F received messages contain the same value, decide that value (and print it, if no decision was printed before);
  3. broadcast a message containing the new estimate for the next round.

After that, proceed with the next round.

To check for correct behavior, run the following two commands:

$ harmony -o consensus.hfa code/consensus.hny
$ harmony -B consensus.hfa code/bosco.hny

Note that the second command prints a warning: "behavior warning: strict subset of specified behavior." Thus, the set of behaviors that our algorithm generates is a subset of the behaviors that the specification allows. Figure 28.3 shows the behavior, and indeed it is not the same as the behavior of Figure 28.1. This is because in our algorithm the outcome is decided a priori if more than two-thirds of the processors have the same proposal, whereas in the consensus specification the outcome is only decided a priori if the processors are initially unanimous. Another difference is that if the outcome is decided a priori, all processors are guaranteed to decide.

import bag

const F = 1
const N = (3 * F) + 1
const NROUNDS = 3

let n_zeroes = choose { 0 .. N / 2 }:
    proposals = ([0,] * n_zeroes) + ([1,] * (N - n_zeroes))
network = bag.empty()

def broadcast(msg):
    atomically network = bag.add(network, msg)

def receive(round) returns quorum:
    let msgs = { e:c for (r,e):c in network where r == round }:
        quorum = {} if bag.size(msgs) < N else { msgs }

def processor(proposal):
    var estimate, decided = proposal, False
    broadcast(0, estimate)
    for round in {0..NROUNDS-1}:
        atomically when exists msgs in receive(round):
            let choices = bag.combinations(msgs, N - F)
            let quorum = choose(choices)
            let count = [ bag.multiplicity(quorum, i) for i in { 0..1 } ]:
                assert count[0] != count[1]
                estimate = 0 if count[0] > count[1] else 1
                if count[estimate] == (N - F):
                    if not decided:
                        print estimate
                        decided = True
                    assert estimate in proposals           # validity
                broadcast(round + 1, estimate)

print proposals
for i in {0..N-1}:
    spawn processor(proposals[i])

Figure 28.4. [code/bosco2.hny] Reducing the state space

While one can run this code in little time for F = 1, for F = 2 the state space to explore is already quite large. One way to reduce the state space is the following realization: each processor only considers messages for the round that it is in. If a message is for an old round, the processor will ignore it; if a message is for a future round, the processor will buffer it. So, one can simplify the model and have each processor wait for all N messages in a round instead of N - F. It would still have to choose to consider just N - F out of those N messages, but executions in which some processors are left behind in all rounds are no longer considered. The model still includes executions in which some subset of N - F processors only choose each other's messages and essentially ignore the messages of the remaining F processors, so the resulting model is just as good.

Another way to reduce the state space to explore is to leverage symmetry. First of all, it does not matter which processor proposes a particular value. Also, the values 0 and 1 are not important to how the protocol operates. So, instead of letting every processor choose its proposal independently, we only need to explore the cases where 0, 1, ..., up to N / 2 of the processors propose 0---the remaining cases are symmetric.

Figure 28.4 shows the code for this optimized model. Running this with F = 2 does not take very long and this approach is a good blueprint for testing other round-based protocols (of which there are many).

Exercises

Chapter 29. Paxos

import bag

const F = 1
const NACCEPTORS = (2 * F) + 1
const NLEADERS = F + 1
const NBALLOTS = 2

network = bag.empty()

proposals = [ choose({0, 1}) for i in {0..NLEADERS-1} ]

def send(msg):
    atomically network = bag.add(network, msg)

def receive(ballot, phase) returns quorum:
    let msgs = { e:c for (b,p,t,e):c in network
                        where (b,p,t) == (ballot, phase, "B") }:
        quorum = bag.combinations(msgs, NACCEPTORS - F)

def leader(self, proposal):
    var ballot, estimate, decided = self, proposal, False
    send(ballot, 1, "A", None)
    while ballot <= NBALLOTS:
        atomically when exists quorum in receive(ballot, 1):
            let accepted = { e for e:_ in quorum where e != None }:
                if accepted != {}:
                    _, estimate = max(accepted)
            send(ballot, 2, "A", estimate)
        atomically when exists quorum in receive(ballot, 2):
            if bag.multiplicity(quorum, (ballot, estimate)) == (NACCEPTORS - F):
                assert estimate in proposals    # validity
                if not decided:
                    print estimate
                    decided = True
            ballot += NLEADERS
            if ballot <= NBALLOTS:
                send(ballot, 1, "A", None)

def acceptor():
    var ballot, last_accepted, received = 0, None, {}
    while True:
        atomically when exists b,p,e in { (bb,pp,ee) for bb,pp,tt,ee:_ in network
                    where ((bb,pp) not in received) and (tt == "A") }:
            received |= { (b, p) }
            if b >= ballot:
                ballot = b
                if p == 2:
                    last_accepted = (ballot, e)
            send(b, p, "B", last_accepted)

Figure 29.1. [code/paxos.hny] A version of the Paxos protocol

Paxos [28] is the most well-known family of consensus protocols for environments in which few or no assumptions are made about timing. In this chapter, we present a basic version of a Paxos protocol, one that is single-decree (only tries to make a single decision). It uses two kinds of processors: leaders and acceptors. In order to tolerate F crash failures, you need at least F + 1 leaders and 2F + 1 acceptors, but leaders and acceptors can be colocated, so in total only 2F + 1 independently failing processors are needed. Here we provide only a rudimentary introduction to Paxos; for more detailed information refer to [28].

As in the consensus protocol of Chapter 28, Paxos uses rounds of messaging. The communication pattern, however, is different. Similar to the atomic read/write register protocol in Chapter 27, Paxos uses two kinds of rounds: "Phase 1" and "Phase 2" rounds. Rounds are identified by a so-called ballot number combined with the phase number. Different leaders are in charge of different ballot numbers: in Figure 29.1, the leader with identifier i starts with ballot i and repeatedly increments it by NLEADERS (see the ballot += NLEADERS statement), so no two leaders ever use the same ballot number. Leaders broadcast "Type A" messages to the acceptors, which respond point-to-point with "Type B" messages.

Figure 29.1 contains the code for this Paxos protocol. Paxos is perhaps best understood starting with the second phase. At the end of the first phase, the leader broadcasts a 2.A message (Phase 2, Type A) to the acceptors containing the ballot number and a proposal, and then waits for NACCEPTORS - F (that is, F + 1) matching 2.B responses from the acceptors. If each response contains the ballot number and the proposal, then the proposal is deemed decided. But one or more of the responses can contain a higher ballot number, in which case the leader has to try again with an even higher ballot number. This is where the first phase comes in.

It is not possible for an acceptor to respond with a smaller ballot number. This is because acceptors maintain two state variables. One is ballot, the highest ballot number the acceptor has seen. The other is a variable called last_accepted that, if not None, contains the last proposal the acceptor has accepted and the corresponding ballot number. The acceptor also maintains a set received of (ballot, phase) tuples identifying all rounds that the acceptor has already participated in. An acceptor waits for a message for a round that is not in received. If its ballot number is higher than what the acceptor has seen before, the acceptor moves into that ballot. If the phase is 2, then the acceptor accepts the proposal and remembers when it did so by saving the (ballot, proposal) tuple in last_accepted. In all cases, the acceptor responds with the current values of ballot and last_accepted.

In its first phase, a leader of a ballot must come up with a proposal that cannot conflict with a proposal of an earlier ballot that may already have been decided. To this end, the leader broadcasts a 1.A message to the acceptors and awaits NACCEPTORS - F of their last_accepted values. If all those acceptors responded with None, then the leader is free to choose its own proposal. Otherwise the leader updates its proposal with the one corresponding to the highest ballot number. The leader then moves on to the second phase.

To run and check the Paxos code, do the following (leveraging the consensus specification of Figure 28.1):

$ harmony -o consensus.hfa -cN=2 code/consensus.hny
$ harmony -B consensus.hfa code/paxos.hny

You should get a warning that our implementation of Paxos does not generate all possible behaviors. This is because we only run the protocol for a finite number of ballots, and therefore at least one of the ballots will be successful. With an unlimited number of ballots, Paxos may never decide unless you make some liveness assumptions.

Exercises

Chapter 30. Needham-Schroeder Authentication Protocol

network = {}

dest = choose({ None, "Bob", "Corey" })

def send(msg):
    atomically network |= { msg }

def alice():
    if dest != None:
        send({ .dst: dest,
            .contents: { .type: 1, .nonce: "nonceA", .initiator: "Alice" } })
        atomically when exists m in network when (m.dst == "Alice")
                    and (m.contents.type == 2) and (m.contents.nonce == "nonceA"):
            send({ .dst: dest, .contents: { .type: 3, .nonce: m.contents.nonce2 } })

def bob():
    atomically when exists m in network when (m.dst == "Bob")
                    and (m.contents.type == 1) and (m.contents.initiator == "Alice"):
        send({ .dst: "Alice",
            .contents: { .type: 2, .nonce: m.contents.nonce, .nonce2: "nonceB" } })
    atomically when exists m in network when (m.dst == "Bob")
                    and (m.contents.type == 3) and (m.contents.nonce == "nonceB"):
        assert dest == "Bob"

def corey():
    var received, nonces, msgs = {}, { "nonceC" }, {}
    while True:
        atomically when exists m in network - received when m.dst == "Corey":
            received |= { m }
            nonces |= { m.contents.nonce }
            if m.contents.type == 2:
                nonces |= { m.contents.nonce2 }
            for dst in { "Alice", "Bob" } for n in nonces:
                msgs |= {{ .dst: dst, .contents: { .type: 1, .nonce: n, .initiator: ini }}
                                    for ini in { "Alice", "Bob" }}
                msgs |= {{ .dst: dst, .contents: { .type: 2, .nonce: n, .nonce2: n2 }}
                                    for n2 in nonces }
                msgs |= {{ .dst: dst, .contents: { .type: 3, .nonce: n }}}
            send(choose(msgs - network))

spawn alice()
spawn eternal bob()
spawn eternal corey()

Figure 30.1. [code/needhamschroeder.hny] Needham-Schroeder protocol and an attack

The Needham-Schroeder protocol [36] is a security protocol in which two parties authenticate one another by exchanging large, freshly created random numbers called nonces that nobody else should be able to read. The nonces should only be used once for an instantiation of the protocol between honest participants (i.e., participants that follow the protocol). The version we describe here uses public key cryptography [12]: with public key cryptography it is possible to create a message for a particular destination that only that destination can read. We denote by ⟨m⟩p a message m encrypted for p, so that only p can decrypt the message and see that it contains m.

Suppose Alice wants to communicate with Bob. The three critical steps in the Needham-Schroeder protocol are as follows:

  1. Alice creates a new nonce NA and sends ⟨1, A, NA⟩Bob to Bob;
  2. Upon receipt, Bob creates a new nonce NB and sends ⟨2, NA, NB⟩Alice to Alice;
  3. Alice sends ⟨3, NB⟩Bob to Bob.
When Bob receives ⟨1, A, NA⟩Bob, Bob does not know for sure that the message came from Alice, and even if it did, Bob does not know whether Alice sent the message recently or whether it was replayed by some adversary. When Alice receives ⟨2, NA, NB⟩Alice, Alice does know that, if Bob is honest, (1) Bob and only Bob could have created this message, and (2) Bob must have done so recently (since Alice created NA recently). When Bob receives ⟨3, NB⟩Bob, Bob decides that it is Alice who is trying to communicate at this time. Since Bob created NB recently and sent it encrypted to Alice, Bob does not have to worry that the type 3 message is an old message replayed by some adversary. Also, if Alice is honest, it seems that only Alice can have seen the message containing NB.

Thus, the intended security properties of this protocol are symmetric. Assuming Alice and Bob are both honest:

  • if Alice finishes the protocol with Bob, then nobody but Alice and Bob knows NA;
  • if Bob finishes the protocol with Alice, then nobody but Alice and Bob knows NB.

After the protocol, Alice can include NA in messages to Bob and Bob can include NB in messages to Alice to authenticate the sources of those messages to one another.

Figure 30.1 shows the protocol implemented in Harmony. A message ⟨m⟩p is encoded in Harmony as the dictionary { .dst: p, .contents: m }. The code for Alice and Bob simply follows the steps listed above.
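
For example, the first protocol message ⟨1, A, NA⟩Bob corresponds to the following send in alice() (taken from Figure 30.1, for the case dest == "Bob"):

send({ .dst: "Bob",
    .contents: { .type: 1, .nonce: "nonceA", .initiator: "Alice" } })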

Unfortunately, the protocol turns out to be incorrect, but it took 17 years before somebody noticed [32]. Model checking can be used to find the bug [33]. To demonstrate the bug, we need to model the environment. In particular, we introduce a third party, which we will call Corey. We want to make sure that Corey cannot impersonate Alice or Bob. However, it is possible that Alice tries to set up an authenticated connection to Corey using the Needham-Schroeder protocol. That in itself should not be a problem if the protocol were correct.

The code in Figure 30.1 has Alice either do nothing, or try to set up a connection to either Bob or Corey. Bob only accepts connections from Alice. Corey, when receiving a message that it can decrypt, will try to find an attack by sending every possible message to every possible destination. In particular, it keeps track of every nonce that it has seen and will try to construct messages with them to send to Alice and Bob. If Bob finishes the protocol, it checks whether Alice actually tried to connect to Bob. If not, the assertion fails and an attack is found.

Running the code in Figure 30.1 quickly finds a viable attack. The attack goes like this:

  1. Alice creates a new nonce NA and sends ⟨1, A, NA⟩Corey to Corey;
  2. Upon receipt, Corey sends ⟨1, A, NA⟩Bob to Bob;
  3. Upon receipt, Bob, believing it is engaging in the protocol with Alice, creates a new nonce NB and sends ⟨2, NA, NB⟩Alice to Alice;
  4. Alice thinks the message came from Corey (because it contains NA, which Alice created for Corey and sent to Corey) and sends ⟨3, NB⟩Corey to Corey;
  5. Corey learns NB and sends ⟨3, NB⟩Bob to Bob;
  6. Bob receiving ⟨3, NB⟩Bob is identical to the situation in which Alice tried to set up a connection to Bob, so Bob now thinks it is talking to Alice, even though Alice never tried to communicate with Bob.
The security property is violated. In particular, Bob, duped by Corey, finished the protocol with Alice and received NA, and even though Bob and Alice are both honest, Corey has a copy of NA. So, Corey is now able to impersonate Alice to Bob (but not vice versa, because Alice did not try to authenticate Bob).

Exercises

References

  1. Gul Agha, "Actors: A Model of Concurrent Computation in Distributed Systems (doctoral dissertation)", 1986.
  2. Hagit Attiya, Amotz Bar-Noy, and Danny Dolev, "Sharing Memory Robustly in Message-Passing Systems", 1995.
  3. László A. Bélády, R. A. Nelson, and G. S. Shedler, "An Anomaly in Space-Time Characteristics of Certain Programs Running in a Paging Machine", 1969.
  4. Michael Ben-Or, "Another Advantage of Free Choice (Extended Abstract): Completely Asynchronous Agreement Protocols", in Proceedings of the 2nd Annual ACM Symposium on Principles of Distributed Computing, 1983.
  5. Andrew D. Birrell, "An introduction to programming with threads", 1989.
  6. Per Brinch Hansen, "Operating System Principles", 1973.
  7. Ernest Chang and Rosemary Roberts, "An Improved Algorithm for Decentralized Extrema-Finding in Circular Configurations of Processes", 1979.
  8. Edmund M. Clarke, E. Allen Emerson, and A. Prasad Sistla, "Automatic Verification of Finite-State Concurrent Systems Using Temporal Logic Specifications", 1986.
  9. Edward G. Coffman, Melanie Elphick, and Arie Shoshani, "System Deadlocks", 1971.
  10. Fernando J. Corbató, "A Paging Experiment with the Multics System", in In Honor of Philip M. Morse, 1969.
  11. Pierre-Jacques Courtois, Frans Heymans, and David L. Parnas, "Concurrent Control with "Readers" and "Writers"", 1971.
  12. Whitfield Diffie and Martin E. Hellman, "New directions in cryptography", 1976.
  13. Edsger W. Dijkstra, "EWD-35: Over de sequentialiteit van procesbeschrijvingen", 1962.
  14. Edsger W. Dijkstra, "EWD-108: Een algorithme ter voorkoming van de dodelijke omarming", 1964.
  15. Edsger W. Dijkstra, "Solution of a Problem in Concurrent Programming Control", 1965.
  16. Edsger W. Dijkstra, "EWD-123: Cooperating Sequential Processes", 1965.
  17. Edsger W. Dijkstra, "EWD-329: Information streams sharing a finite buffer", 1972.
  18. Edsger W. Dijkstra, "EWD-703: A tutorial on the split binary semaphore", 1979.
  19. Michael J. Fischer, Nancy A. Lynch, and Mike Paterson, "Impossibility of Distributed Consensus with One Faulty Process", 1985.
  20. Jim N. Gray, "Notes on data base operating systems", in Operating Systems: An Advanced Course, 1978.
  21. James W. Havender, "Avoiding Deadlock in Multitasking Systems", 1968.
  22. Maurice P. Herlihy and Jeannette M. Wing, "Axioms for Concurrent Objects", in Proceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, 1987.
  23. Carl Hewitt, Peter Bishop, and Richard Steiger, "A Universal Modular ACTOR Formalism for Artificial Intelligence", in Proceedings of the 3rd International Joint Conference on Artificial Intelligence, 1973.
  24. C. A. R. Hoare, "Monitors: An Operating System Structuring Concept", 1974.
  25. Gerard Holzmann, "The SPIN Model Checker: Primer and Reference Manual", 2011.
  26. Peter Z. Ingerman, "Thunks: A Way of Compiling Procedure Statements with Some Comments on Procedure Declarations", 1961.
  27. Leslie Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", 1978.
  28. Leslie Lamport, "The Part-Time Parliament", 1998.
  29. Leslie Lamport, "Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers", 2002.
  30. Leslie Lamport, "The PlusCal Algorithm Language", in Theoretical Aspects of Computing - ICTAC 2009, 2009.
  31. Butler W. Lampson and David D. Redell, "Experience with Processes and Monitors in Mesa", 1980.
  32. Gavin Lowe, "An Attack on the Needham-Schroeder Public-Key Authentication Protocol", 1995.
  33. Gavin Lowe, "Breaking and fixing the Needham-Schroeder Public-Key Protocol using FDR", in Tools and Algorithms for the Construction and Analysis of Systems, 1996.
  34. Maged M. Michael and Michael L. Scott, "Simple, fast, and practical non-blocking and blocking concurrent queue algorithms", in Proceedings of the 15th annual ACM Symposium on Principles of Distributed Computing (PODC), 1996.
  35. Ellis Michael, Doug Woos, Thomas Anderson, Michael D. Ernst, and Zachary Tatlock, "Teaching Rigorous Distributed Systems With Efficient Model Checking", in Proceedings of the Fourteenth EuroSys Conference 2019, 2019.
  36. Roger M. Needham and Michael D. Schroeder, "Using Encryption for Authentication in Large Networks of Computers", 1978.
  37. Gary L. Peterson, "Myths about the mutual exclusion problem", 1981.
  38. Richard D. Schlichting and Fred B. Schneider, "Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems", 1983.
  39. Fred B. Schneider, "Implementing fault-tolerant services using the state machine approach: A tutorial", 1990.
  40. Fred B. Schneider, "On Concurrent Programming", 1997.
  41. Robbert van Renesse and Fred B. Schneider, "Chain Replication for Supporting High Throughput and Availability", in 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, December 6-8, 2004, 2004.

Appendix A. Harmony Language Details

Section A.1. Value Types and Operators

Chapter 4 provides an introduction to Harmony values. Below is a complete list of Harmony value types with examples:

Type Name Example
Boolean "bool" False, True
Integer "int" ..., -2, -1, 0, 1, 2, ...
String "str" "example", .example
Program Counter "pc" (method names, lambdas, and labels)
List "list" [ 1, 2, 3, ], ( (1, 2), 3 ), [1,], ()
Dictionary "dict" { .account: 12345, .valid: False }, {:}
Set "set" {}, { 1, 2, 3 }, { False, .id, 3 }
Address "address" ?lock, ?flags[2], None
Context "context" (generated by stop or save expression)
In Harmony, there is no distinction between tuples (denoted with parentheses) and lists (denoted by square brackets). That is, their format is either (e, e, ..., e,) or [e, e, ..., e,]. They map indexes (starting at 0) to Harmony values. If the list has two or more elements, then the final comma is optional.

The method type e returns the type name of e.

All Harmony values are ordered with respect to one another. First they are ordered by type according to the table above. So, for example, True < 0 < .xyz < { 0 }. Within types, the following rules apply:

  • False < True;
  • integers are ordered in the natural way;
  • strings are lexicographically ordered;
  • program counters are ordered by their integer values;
  • lists are lexicographically ordered;
  • dictionaries are first converted into a list of ordered (key, value) pairs. Then two dictionaries are lexicographically ordered by this representation;
  • a set is first converted into an ordered list, then lexicographically ordered;
  • except for None, an address is a pair of a function and a list of arguments; addresses are lexicographically ordered accordingly, and None is the smallest address;
  • contexts (Section C.3) are ordered deterministically in an unspecified way.
Harmony supports the following comparison operators:

e == e equivalence
e != e inequivalence
e < e, e <= e, e > e, e >= e ordering
In Harmony programs, the ordering operators are only allowed on pairs of integers, pairs of strings, pairs of lists, or pairs of sets. In the case of strings and lists, the ordering is lexicographical. In the case of sets, the ordering is partial based on the subset relation. Comparison operators can be chained: x <= y == z means x <= y and y == z, although y is evaluated once in the former and twice in the latter expression. Note that evaluation of a chain stops as soon as one of the comparisons fails. So, 1 < 0 < x does not evaluate x.

Harmony supports atomic expression evaluation using the following syntax: atomically e, where e is some expression.

Boolean

The boolean type has only two possible values: False and True. Unlike Python, in Harmony booleans are distinct from integers, and in particular False < 0 . In statements and expressions where booleans are expected, it is not possible to substitute values of other types.

Operations on booleans include:

e and e and ... conjuction
e or e or ... disjunction
e => e, e not => e implication
not e negation
v if e else v' v or v' depending on e
any s, all s disjunction / conjunction for set or list s
The meanings of or, and, and => are perhaps best explained by putting them in terms of the ternary if else operator:

  • x or y means True if x else y
  • x and y means False if not x else y
  • x => y means True if not x else y
Note that this means that the result of the operation may not be a Boolean. For example, (False or 2) == 2. Also, the operators are not commutative: 2 or False is an illegal expression because you cannot use an integer as a condition for if. (Unlike in Python, in Harmony only True is "truthy" and only False is "falsy.") We recommend using only Booleans for these operators, so that their outputs are also guaranteed to be Booleans. Finally, note that the right-hand side of the expression may not be evaluated. For example, True or x evaluates to True without evaluating x.

Integer

The integer type supports any whole number. Harmony supports decimal integers, hexadecimal integers (starting with 0x), binary integers (starting with 0b), and octal integers (starting with 0o).

In the C-based model checker, integers are currently implemented by two's complement 60-bit words. The model checker checks for overflow on various operations.

Operations on integers include:

-e negation
abs e absolute value
e + e + ... sum
e - e difference
e * e * ... product
e / e, e // e integer division
e % e, e mod e integer division remainder
e ** e power
~e binary inversion
e & e & ... binary and
e | e | ... binary or
e ^ e ^ ... binary exclusive or
e << e binary shift left
e >> e binary shift right
{ e..e' } set of integers from e to e' inclusive
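
A few examples (these assertions are illustrative, not from a particular figure):

assert 7 / 2 == 3                   # / and // both denote integer division
assert 7 % 2 == 1                   # remainder
assert 2 ** 10 == 1024              # power
assert { 1 .. 3 } == { 1, 2, 3 }    # an integer range denotes a set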

String

A string is a sequence of zero or more unicode characters. If it consists entirely of alphanumerical characters or underscore characters and does not start with a digit, then a string can be represented by a "." followed by the characters. For example, .example is the same as the string "example".

Native operations on strings include the following:

s k indexing
s k = v updating the character at index k to string v
s s ... concatenation
s + s + ... concatenation
s * n n copies of s concatenated
v [not] in s check if v is [not] a substring in s
del s[x] remove the character at index x
len s the length of s
keys s the set { 0 .. len(s) - 1 }
str e string representation of any value e
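
For example (illustrative assertions, not from a particular figure):

assert .example == "example"    # dot notation for identifier-like strings
assert "abc"[1] == "b"          # indexing yields a one-character string
assert "ab" * 2 == "abab"       # repetition
assert (len "abc") == 3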

Set

In Harmony you can create a set of any collection of Harmony values. Its syntax is { v0, v1, ... }. Python users: note that in Harmony the empty set is denoted as {}. (In Python, {} means the empty dictionary, which is represented as {:} in Harmony.)

The set module (Section B.7) contains various convenient routines that operate on sets. Native operations on sets include:

len s cardinality
s < s strict subset
s <= s subset
s > s strict superset
s >= s superset
s - s set difference
s & s & ... intersection
s | s | ... union
s ^ s ^ ... inclusion/exclusion (elements in odd number of sets)
choose s select an element (Harmony will try all)
min s minimum element
max s maximum element
any s True if any value is True
all s True if all values are True
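
For example (illustrative assertions, not from a particular figure):

assert ({ 1, 2 } | { 2, 3 }) == { 1, 2, 3 }    # union
assert ({ 1, 2 } & { 2, 3 }) == { 2 }          # intersection
assert ({ 1, 2 } ^ { 2, 3 }) == { 1, 3 }       # in an odd number of sets
assert { 1, 2 } <= { 1, 2, 3 }                 # subset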

Harmony also supports set comprehension. In its simplest form, {f(v) for v in s} returns a set that is constructed by applying f to all elements in s (where s is a set or a list). This is known as mapping. But set comprehension is much more powerful and can include joining multiple sets (using nested for loops) and filtering (using the where keyword).

For example: { x + y for x in s for y in s where (x * y) == 4 } returns a set that is constructed by summing pairs of elements from s that, when multiplied, have the value 4.

List or Tuple

In Harmony, there is no distinction between a list and a tuple. You can denote a list by a sequence of values, each value terminated by a comma. As usual, you can use brackets or parentheses at your discretion. For Python users, the important thing to note is that a singleton list in Harmony must contain a comma. For example, [1,] is a list containing the value 1, while [1] is simply the value 1.

The list module (Section B.6) contains various convenient routines that operate on lists or tuples. Native operations on lists or tuples include the following:

t k indexing
t k = v updating entry k to v
t t ... concatenation
t + t + ... concatenation
t * n n copies of t concatenated
v [not] in t check if v is [not] a value in t
del t[x] remove the entry at index x
len t the length of t
keys t the set { 0 .. len(t) - 1 }
min t the minimum value in t
max t the maximum value in t
any t True if any value is True
all t True if all values are True
Lists and tuples support comprehension. In its most basic form: [f(v) for v in t]. For example, to check if any element in a list t is even, you can write: any((x % 2) == 0 for x in t).

The domain of a list L of length n, interpreted as a function, is the set of integers 0..n - 1. It is illegal to read L[n]. However, unlike in Python, it is possible to write into L[n]. For example, if variable x contains [1, 2], then the statement x[2] = 3 results in x having the value [1, 2, 3].

Dictionary

A dictionary maps a set of values (known as keys) to another set of values. The generic syntax of a dictionary is { k0: v0, k1: v1, ... }. Different from Python, the empty dictionary is written as {:} (because {} is the empty set in Harmony). If the same key appears more than once, then only the entry with the maximum value survives. Therefore the order of the keys in the dictionary does not matter.

Dictionaries support comprehension. The basic form is: { f(v):g(v) for v in s }.
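
For example (an illustrative assertion, not from a particular figure), mapping each element of a set to its square:

assert { x: x * x for x in { 1, 2, 3 } } == { 1: 1, 2: 4, 3: 9 }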

There are various special cases of dictionaries, including lists, tuples, strings, and bags (multisets) that are individually described below.

Operations on dictionaries include the following:

d k indexing
d k = v updating field k to v
del d[k] remove the entry with key k
len d the number of keys in d
keys d the set of keys in d
k [not] in d check if k is [not] a key in d
min d the minimum value in d
max d the maximum value in d
any d True if any value is True
all d True if all values are True
d & d & ... dictionary intersection
d | d | ... dictionary union
Because in Harmony brackets are used for parsing purposes only, you can write d[k] (or d(k)) instead of d k. However, if k is a string like .id, then you might prefer the notation d.id.

Dictionary intersection and dictionary union are defined so that they work well with bags. With disjoint dictionaries, intersection and union work as expected. If there is a key in the intersection, then dictionary intersection retains the minimum value while dictionary union retains the maximum value. Unlike Python, Harmony dictionary intersection and union are commutative and associative.
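
For example (illustrative assertions, based on our reading of the rules above; note how the common key .b gets the maximum value under union and the minimum value under intersection):

assert ({ .a: 1, .b: 2 } | { .b: 3, .c: 4 }) == { .a: 1, .b: 3, .c: 4 }
assert ({ .a: 1, .b: 2 } & { .b: 3 }) == { .b: 2 }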

A bag is represented by a dictionary that maps each element to its multiplicity. For example, { 10:2, 12:1 } is the bag containing two copies of 10 and one copy of 12. The bag module (Section B.3) contains various convenient routines that operate on bags. Native operations on bags include the following:

v [not] in b check if v is [not] in b
t & t & ... bag intersection
t | t | ... bag union

Program Counter

A program counter is an integer that can be used to index into Harmony bytecode. When you define a method, a lambda function, or a label, you are creating a constant of the program counter type. You can create lambda functions similarly to Python, except that the expression has to end on the keyword end. For example: lambda(x,y): x+y end.

Address

A Harmony address is a type of thunk consisting of a curried function and a list of arguments [26]. A thunk delays the invocation of the curried function. A function can be a constant or a variable and the arguments are all Harmony values. Given an address p = ?a[b][c]..., you can use the notation !p to evaluate it. Harmony will first evaluate a, then apply the result to b, then apply the result to c, and so on.

As a simple example, ?5 is the address of the constant 5, and therefore !?5 evaluates to 5. Now consider the following program:

let p = ?5:
    assert !p == 5
    !p = 5
    !p = 4
The only line in this program that fails is the last one, as you are not allowed to store 4 at the address of 5.

a can be a constant that maps Harmony values to Harmony values: dictionaries, lists, and strings. In that case, ?a[b] refers to the value of entry b in a.

The most common use of addresses is when a is a shared variable. In that case !?a evaluates to the current value of a.

Finally, a can be a program counter value (method or lambda). ?a(b) is then the thunk representing a delayed call to method a with argument b. In this case, !?a(b) evaluates a(b). For example, the following Harmony program, perhaps surprisingly, does not run into failing assertions:

counter = 0

def f():
    counter += 1
    result = counter

let p = ?f():
    if !p != 1: assert False
    if !p != 2: assert False
    if !p != 3: assert False
Internally, Harmony uses addresses of method variables, and you may sometimes see them on the stack during a computation. If k is a method variable, then its address is output as ?@k. However, at the Harmony language level there is no such thing as the address of a local variable. Consider the following two programs:

x = 1
let p = ?x:
    x = 2
    assert !p == 2

var x = 1
let p = ?x:
    x = 2
    assert !p == 1
In the first program, x is a shared variable, and ?x is the location of variable x. In the second program, x is a local variable. ?x evaluates x and then takes its address, so in this case ?x equals ?1.

Like C, Harmony supports the shorthand p->v for the expression (!p).v.
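
For example, a minimal sketch using a (hypothetical) shared dictionary x:

x = { .value: 3 }
let p = ?x:
    assert p->value == 3    # equivalent to (!p).value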

Context

A context value (aka continuation) captures the state of a thread. A context is itself composed of various Harmony values. The following operations generate contexts:

save e returns a Harmony value (see below)
stop p saves context in !p and stops the thread (see below)
The save e expression, when invoked, returns a tuple (e, c) where c is the context value of the thread right after the save instruction finished. Using go c r the context can be turned into a new thread that, instead, returns r from the save e expression. See Figure B.1 for an example of how this could be used to fork a thread into two threads.

The stop p expression stores the context of the thread right after the expression in !p (i.e., p must be an address value) and terminates the thread. The thread can later be reinstantiated with go !p r, in which case the stop expression returns r. A thread can be suspended forever using stop None or just stop().

Section A.2. Statements

Harmony currently supports the following statements (below, S is a list of statements and an lvalue is an expression you can use on the left-hand side of an assignment statement):

e e is an expression
lv = [lv =]... e lv is an lvalue and e is an expression
lv [op]= e op is one of +, -, *, /, //, %, &, |, ^, and, or
assert b [, e] b is a boolean. Optionally report value of expression e
await b b is a boolean
const a = e a is a bound variable, e is a constant expression
def m a [returns v]: S m is an identifier, a a bound variable, and v a variable
del lv delete lv
finally e e is a boolean expression that must hold in each final state
for a[:b] in e [where c]: S a and b are bound variables, e is a set, list, dictionary, or string
from m import ... m identifies a module
global v, ... v is a shared global variable
go c e c is a context, e is an expression
if b: S else: S b is a boolean, S is a list of statements
import m, ... m identifies a module
invariant e e is an invariant (must always hold)
let a = e: S a is a bound variable, e is an expression
pass do nothing
print e e is an expression
sequential v, ... v has sequential consistency
spawn [eternal] lv [, t] lv is an lvalue expression, t is the thread-local state
trap lv lv is an lvalue expression
var v = e v is a new variable, e is an expression
when b: S b is a boolean, S a list of statements
when exists a in e: S a is a bound variable, e is an expression
while b: S b is a boolean, S a list of statements

  • Bound variables are read-only.
  • A statement can be preceded by the atomically keyword to make the statement atomic.
  • Multiple for statements can be combined into a single statement.
  • Multiple let and when statements can be combined into a single statement.

Single expression evaluation

Any expression by itself can be used as a statement. The most common form of this is a function application, for example: f(). This statement evaluates f() but ignores its result. It is equivalent to the assignment statement _ = f().

Assignment

The statement x = 3 changes the state by assigning 3 to variable x (assuming x was not already 3). x may be a local variable or a shared variable. The statement x = y = 3 first updates y, then x. The statement x[f()] = y[g()] = h() first computes the address of x[f()], then computes the address of y[g()], then evaluates h(), then assigns the resulting value to y[g()] (using its previously computed address), and finally assigns the same value to x[f()] (again using its previously computed address). The statement a, b = c assumes that c is a tuple with two values. It evaluates the addresses of a and b, and then assigns to b first and to a second. If c is not a tuple with two values, then Harmony reports an error.
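
The following minimal sketch (with hypothetical variables a and b) illustrates this evaluation order:

a = [0, 0]
b = [0, 0]
a[0] = b[1] = 42        # b[1] is assigned first, then a[0]
assert (a == [42, 0]) and (b == [0, 42])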

Assigning to _ (underscore) evaluates the right-hand side expression but is otherwise a no-op. The left-hand side can also contain constants. For example, (3, x) = (3, True) assigns True to x. However, (3, x) = (4, True) fails.

The statement x += 3 loads x, adds 3, and then stores the result in x. In this case, it is equivalent to x = x + 3. However, in general this is not so. For example, x[f()] += 3 only evaluates f() once. Unlike Python, however, x += [3,] is equivalent to x = x + [3,] in Harmony. (In Python, the following two compound statements lead to different results for y: x = y = []; x += [3] and x = y = []; x = x + [3].)

assert

The statement assert b evaluates b and reports an error if b is false. It should be considered a no-op---it is part of the specification, not part of the implementation of the algorithm. In particular, it specifies an invariant: whenever the program counter is at the location where the assert statement is, then b is always true.

The expression b is evaluated atomically. Moreover, it is not allowed to change the state; if it does, Harmony reports an error as well.

As in Python, you can specify an additional expression: assert b, e. The value of e will be reported as part of the error should b evaluate to false.

atomically

A statement can be preceded by the atomically keyword to make the statement atomic. The statement atomically: S1; S2; ... evaluates statements S1, S2, ... atomically. This means that the statement runs indivisibly---no other thread can interleave in the atomic statement. The only exception to this is if the atomic block executes a stop expression. In this case, another thread can run. When the original thread is resumed (using a go statement), it is once again atomically executing.

atomically statements are useful for writing specifications and for implementing synchronization primitives such as test-and-set. They are also useful for testing. However, atomically is not a replacement for lock/unlock and should generally not be used for synchronization otherwise: lock/unlock allows other threads to run concurrently---just not in the same critical section.

await

The statement await b is equivalent to when b: pass. It is intended to improve readability of your code.

const

The expression const N = 3 introduces a new constant N with the value 3. Evaluating N does not lead to loading from a memory location. The assignment can be overridden with the -c flag: harmony -cN=4 executes the model checker with 4 assigned to N instead of 3. Harmony also supports const N, M = 3, 4, which assigns 3 to N and 4 to M. Harmony has limited support for constant folding. For example, const N = 3 + 4 assigns value 7 to constant N.

def

The statement def m a [returns r]: S1; S2; ... defines a new program counter constant m referring to a method that takes an argument a and executes the statements S1, S2, .... The argument a can be a tuple pattern similar to those used in let and for statements. Examples include (), (x,), (x, y), and (x, (y, z)). The given local variable names are assigned upon application and are read-only. Optionally, a result variable r can be declared. If not declared, there is (for backwards compatibility) a default result variable called result, initialized to None. Harmony does not support a return statement that breaks out of the code before executing the last statement.
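
For example, the following minimal sketch contrasts an explicit result variable with the default one:

def square(x) returns r:
    r = x * x

def cube(x):                  # no returns clause: default result variable
    result = x * x * x

assert square(3) == 9
assert cube(2) == 8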

del

The statement del x removes variable x from the state. x can be either a local or a shared variable. For example, the statement del x.age removes the .age field from dictionary x. Harmony automatically removes top-level local variables that are no longer in use from the state in order to attempt to reduce the number of states that are evaluated during model checking.

del can also be used to remove elements from a list. x = [.a, .b, .c]; del x[1] results in x having value [.a, .c].

finally

The statement finally c declares that boolean expression c must hold in each final state. c is only allowed to read shared variables and is evaluated in each final state. If it evaluates to False, Harmony reports an error. Harmony also reports an error if the expression evaluates to a value other than False or True.

for ... in ... [where ...]

The statement for x in y: S1; S2; ... iterates over y and executes the statements S1, S2, ... for each element. y must be a set, list, dictionary, or string. y is evaluated only once, at the beginning of the evaluation of this statement. In the case of a set, the statement iterates over the elements in sorted order (using Harmony's global order on all values). In the case of a dictionary, the statement iterates over the keys in order. For each element, the statements S1, S2, ... are executed with local variable x having the value of the element. x can be a pattern such as (a) or (a, (b, c)). If the pattern cannot be matched, Harmony detects an error. It is allowed, but discouraged, to assign different values to x within statements S1, S2, ....

Harmony also supports the form for k:v in y: S1; S2; .... This works similarly, except that k is bound to the key and v is bound to the value. If y is not a dictionary, then k ranges from 0 to len(y) - 1.

The statement also supports nesting and filtering. Nesting is of the form for x1 in y1 for x2 in y2: S1; S2; ..., which is equivalent to the statement for x1 in y1: for x2 in y2: S1; S2; .... Filtering is of the form for x in y where z: S1; S2; .... For example, for x in 1 .. 10 where (x % 2) == 0: S1; S2; ... only evaluates statements S1, S2, ... for even x, that is, 2, 4, 6, 8, and 10.

Harmony does not support break or continue statements.

from ... import

The statement from x import a, b, ... imports module x and makes its constants a, b, ... also constants in the current module. If a module is imported more than once, its code is only included the first time. The constants will typically be the names of methods (program counter constants) within the module.

You can import all constants from a module m (including program counter constants) using the statement from m import *. This, however, excludes constants whose names start with the character _: those are considered private to the module.

global

The statement global v, ... tells the compiler that the given variables are shared global variables.

go

def fork():
    atomically:
        let (r, ctx) = save True:
            result = r
            if r:
                go ctx (False, None)

def main():
    if fork():
        print "parent"
    else:
        print "child"

spawn eternal main()

Figure B.1. Using save and go to implement fork()

The statement go c e starts a thread with context c that has executed a stop or save expression. The stop or save expression returns value e. The same context can be started multiple times, allowing threads to fork. See Figure B.1 for an example.

if ... [elif ...]* [else]

Harmony supports if statements. In its most basic form, if c: S1; S2; ... evaluates c and executes statements S1, S2, ... if and only if boolean expression c evaluates to True. Harmony checks that c is either False or True---if neither is the case, Harmony reports an error. The statement if c: S1; S2; ... else: T1; T2; ... is similar, but executes statements T1, T2, ... if and only if c evaluated to False. You can think of elif c: as shorthand for else: if c:.

import

The statement import m1, m2, ... imports modules m1, m2, ... in that order. If a module is imported more than once, its code is only included the first time. The constants (including method constants) and shared variables declared in that module can subsequently be referenced by prepending "m.". For example, method f() in imported module m is invoked by calling m.f(). If you would prefer to invoke it simply as f(), then you have to import using the statement from m import f.

invariant

The statement invariant c declares that boolean expression c is an invariant. c is only allowed to read shared variables and is evaluated atomically after every state change. If it ever evaluates to False, Harmony reports an error. Harmony also reports an error if the expression evaluates to a value other than False or True.

Invariants can be useful to specify the type of a global variable. For example, you can write invariant (type(x) == "int") and ((x % 2) == 0) to state that x is an even integer variable.

let

You can introduce new bound variables in a method using the let statement. The statement let a = b: S1; S2; ... evaluates b, assigns the result to read-only variable a, and evaluates statements S1, S2, .... let supports pattern matching, so you can write let x, (y, z) = b: S1; S2; .... This will only work if b is a tuple with two elements, the second of which is also a tuple with two elements---if not, Harmony will report an error.

let statements may be nested, such as let a1 = b1 let a2 = b2: S1; S2; .... Doing so can improve readability by reducing indentation compared to writing them as separate statements. Compare the following two examples:

let a = y:
    let b = z:
        ...
let a = y
let b = z:
    ...

pass

The pass statement does nothing.

print

The statement print e evaluates e and adds the result to the print log. The print log is used to create an "external behavior DFA" for the Harmony program.

sequential

In Harmony, shared variable Load and Store operations are atomic and sequentially consistent. Nonetheless, Harmony checks for data races. A data race occurs when two or more threads simultaneously access the same shared variable, with at least one of the accesses being a Store operation outside of an atomic block. When Harmony detects a data race, it reports an error. The error can be suppressed by declaring the shared variable as sequential. In particular, the statement sequential x, y, ... specifies that the algorithm assumes that the given variables have sequential consistency.

Note that few modern processors support sequentially consistent memory by default, as doing so would lead to high overhead.
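
The following minimal sketch (with hypothetical methods waiter and signaler) shows a typical use. Without the sequential declaration, Harmony would report a data race between the two threads accessing done:

sequential done
done = False

def waiter():
    await done        # reads done while another thread may write it

def signaler():
    done = True

spawn waiter()
spawn signaler()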

spawn

The statement spawn lv starts a new thread that evaluates lvalue expression lv. The most typical form is spawn f(a), where f is some method called with argument a. However, if c is a thunk, one could also call spawn !c, say.

The thread-local state of the thread, called self, is by default the empty dictionary. It can be specified by adding a parameter: spawn m a, e specifies that e should be the initial value of the thread-local state.

Harmony normally checks that all threads eventually terminate. If a thread may never terminate, you should spawn it with spawn eternal m a to suppress those checks.

trap

The statement trap lv specifies that the current thread should evaluate lv at some future, unspecified, time. It models a timer interrupt or any kind of asynchronous event to be handled by the thread. Such interrupts can be disabled by setting the interrupt level of the thread to True using the setintlevel operator.
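
A minimal sketch (the variable count and method handler are hypothetical):

count = 0

def handler():
    count += 1

def main():
    trap handler()    # handler() runs in this thread at some later time
    await count == 1

spawn main()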

var

You can introduce new local variables in a method using the var statement. The statement var a = b evaluates b and assigns the result to local variable a. var supports pattern matching, so you can write var x, (y, z) = b. This will only work if b is a tuple with two elements, the second of which is also a tuple with two elements---if not, Harmony will report an error.

when

The statement when c: S1; S2; ... executes statements S1, S2, ... after waiting until c evaluates to True. when statements are most useful when combined with the atomically keyword. If waiting is an unused local variable, then atomically when c: S1; S2; ... is equivalent to

var waiting = True
while waiting:
    atomically:
        if c:
            S1
            S2
            ...
            waiting = False
Multiple let and when statements can be combined. The expressions before the colon are re-evaluated repeatedly until all when conditions are satisfied.

when exists ... in ...

The statement when exists x in y: S1; S2; ... requires that y evaluates to a set value. The statement does the following three things:

  • it waits until y is non-empty;
  • it selects one element of y non-deterministically (using a choose expression);
  • it executes statements S1, S2, ... with the selected element bound to read-only variable x.
x may be a pattern, like in let, for, and def statements. Harmony reports an error if y evaluates to a value that is not a set.

when statements are most useful when combined with the atomically keyword. If waiting is an unused local variable, then atomically when exists x in y: S1; S2; ... is equivalent to

var waiting = True
while waiting:
    atomically:
        if y != {}:
            let x = choose(y):
                S1
                S2
                ...
            waiting = False
The statement is particularly useful in programming network protocols when having to wait for one or more messages and executing a set of actions atomically after the desired messages have arrived.

while

The statement while c: S1; S2; ... executes statements S1, S2, ... repeatedly as long as c evaluates to True. Harmony does not support break or continue statements.

Section A.3. Harmony is not object-oriented

Python is object-oriented, but Harmony is not. For Python programmers, this can lead to some unexpected differences. For example, consider the following code:

x = y = [ 1, 2 ]
x[0] = 3
assert y[0] == 1
In Python, lists are objects. Thus x and y point to the same list, and the assertion would fail if executed by Python. In Harmony, lists are values. So, when x is updated in Line 2, it does not affect the value of y. The assertion succeeds. Harmony supports references to values (Chapter 6), allowing programs to implement shared objects.

Because Harmony does not have objects, it also does not have object methods. However, Harmony methods and lambdas are program counter constants. These constants can be added to dictionaries. For example, in Figure 6.1 you can add the Peterson_enter and Peterson_exit methods to the Peterson_mutex dictionary like so:

{ .turn: 0, .flags: [ False, False ], .enter: Peterson_enter, .exit: Peterson_exit }
That would allow you to simulate object methods.

There are at least two reasons why Harmony is not object-oriented. First, object-orientation often adds layers of indirection that would make it harder to model check and also to interpret the results. Consider, for example, a lock. In Python, a lock is an object. A lock variable would contain a reference to a lock object. In Harmony, a lock variable contains the value of the lock itself. Thus, the following statement means something quite different in Python and Harmony:

x = y = Lock()
In Python, this creates two variables x and y referring to the same lock. In Harmony, the two variables will be two different locks. If you want two variables referring to the same lock in Harmony, you might write:

lock = Lock()
x = y = ?lock
or, using the alloc module,

from alloc import malloc
x = y = malloc(Lock())
The second reason for Harmony not being object-oriented is that many concurrency solutions in the literature are expressed in C or some other low-level language that does not support object-orientation, but instead use malloc and free.

Section A.4. Constants, Global and Local Variables

Each (non-reserved) identifier in a Harmony program refers to either a global constant, a global shared variable, a local bound variable, a local mutable variable, or a module. Constants are declared using const statements. Those constants are evaluated at compile-time.

Mutable method variables can be declared using the returns clause of a def statement or using var. Bound variables, which are immutable, can be declared in def statements (i.e., arguments), let statements, for loops, and when exists statements. Each thread has a mutable variable called this that contains the thread-local state. Method variables are tightly scoped and cannot be shared between threads. While in theory one method can be declared within another, they cannot share local variables either. All other variables are global and must be initialized before spawned threads start executing.

Section A.5. Operator Precedence

In Harmony, there is no syntactic difference between applying an argument to a function or an index to a dictionary. Both use the syntax a b c .... We call this application, and application is left-associative. So, a b c is interpreted as (a b) c: b is applied to a, and then c is applied to the result. For readability, it may help to write a(b) for function application and a[b] for indexing. In case b is a simple string, you can also write a.b for indexing.

There are three levels of precedence. Application has the highest precedence. So, !a b is interpreted as !(a b) and a b + c d is interpreted as (a b) + (c d). Unary operators have the next highest precedence, and the remaining operators have the lowest precedence. For example, -2 + 3 evaluates to 1, not -5 .

Associative operators (+, *, |, &, ^, and, or) are interpreted as general n-ary operators, and you are allowed to write a + b + c. However, ambiguous expressions such as a - b - c are illegal, as is any combination of different binary operators, such as a + b < c. In such cases you have to add parentheses or brackets to indicate the intended evaluation order, such as (a + b) < c.

In almost all expressions, subexpressions are evaluated left to right. So, a[b] + c first evaluates a, then b (and then applies b to a), and then c. The one exception is the expression a if c else b, where c is evaluated first. In that expression, only a or b is evaluated depending on the value of c. In the expression a and b and ... , evaluation is left to right but stops once one of the subexpressions evaluates to False. Similarly for or, where evaluation stops once one of the subexpressions evaluates to True. A sequence of comparison operations, such as a < b < c , is evaluated left to right but stops as soon as one of the comparisons fails.

Section A.6. Tuples, Lists, and Pattern Matching

Harmony tuples and lists are equivalent. They can be bracketed either by '(' and ')' or by '[' and ']', but the brackets are often optional. Importantly, with a singleton list, the one element must be followed by a comma. For example, the statement x = 1, assigns a singleton tuple (or list) to x.

Harmony does not support Python's special slicing syntax. To take slices of lists, use the subseq method in the list module (Section B.6).

Harmony allows pattern matching against nested tuples in various language constructs. The following are the same in Python and Harmony:

  • x, = 1,: assigns 1 to x;
  • x, y = 1, (2, 3): assigns 1 to x and (2, 3) to y;
  • x, (y, z) = 1, (2, 3): assigns 1 to x, 2 to y, and 3 to z ;
  • x, (y, z) = 1, 2: generates a runtime error because 2 cannot be matched with (y, z);
  • x[0], x[1] = x[1], x[0]: swaps the first two elements of list x.
As in Python, pattern matching can also be used in for statements. For example:
for key, value in [ (1, 2), (3, 4) ]: ...
Harmony (but not Python) also allows pattern matching in defining and invoking methods. For example, you can write:
def f[a, (b, c)]: ...
and then call f[1, (2, 3)]. Note that the more familiar def g(a) defines a method g with a single argument a. Invoking g(1, 2) would assign the tuple (1, 2) to a, which is not consistent with Python syntax. For single-argument methods, you may want to declare g as follows: def g(a,). Calling g(1,) then assigns 1 to a, while calling g(1, 2) results in a runtime error because (1, 2) cannot be matched with (a,).

Pattern matching can also be used in const, let, and when exists statements.

Section A.7. Dynamic Allocation

from stack import Stack, push, pop

teststack = Stack()
push(?teststack, 1)
push(?teststack, 2)
v = pop(?teststack)
assert v == 2
push(?teststack, 3)
v = pop(?teststack)
assert v == 3
v = pop(?teststack)
assert v == 1

Figure B.2. [code/stacktest.hny] Testing a stack implementation.

def Stack() returns stack:
    stack = []

def push(st, v):
    (!st)[len(!st)] = v

def pop(st) returns next:
    let n = len(!st) - 1:
        next = (!st)[n]
        del (!st)[n]

Figure B.3. [code/stack1.hny] Stack implemented using a dynamically updated list.

import list

def Stack() returns stack:
    stack = []

def push(st, v):
    !st += [v,]

def pop(st) returns next:
    let n = len(!st) - 1:
        next = (!st)[n]
        !st = list.subseq(!st, 0, n)

Figure B.4. [code/stack2.hny] Stack implemented using static lists.

def Stack() returns stack:
    stack = ()

def push(st, v):
    (!st) = (v, !st)

def pop(st) returns next:
    let (top, rest) = !st:
        next = top
        !st = rest

Figure B.5. [code/stack3.hny] Stack implemented using a recursive tuple data structure.

from alloc import malloc, free

def Stack() returns stack:
    stack = None

def push(st, v):
    !st = malloc({ .value: v, .rest: !st })

def pop(st) returns next:
    let node = !st:
        next = node->value
        !st = node->rest
        free(node)

Figure B.6. [code/stack4.hny] Stack implemented using a linked list.

Harmony supports various options for dynamic allocation. By way of example, consider a stack. Figure B.2 presents a test program for a stack. We present four different stack implementations to illustrate options for dynamic allocation:
  1. Figure B.3 uses a single list to represent the stack. It is updated to perform push and pop operations;
  2. Figure B.4 also uses a list but, instead of updating the list, it replaces the list with a new one for each operation;
  3. Figure B.5 represents a stack as a recursively nested tuple (v, r), where v is the element on top of the stack and r is a stack that is the remainder;
  4. Figure B.6 implements a stack as a linked list with nodes allocated using the alloc module.
While the last option is the most versatile (it allows cyclic data structures), Harmony does not support garbage collection for memory allocated this way and so allocated memory that is no longer in use must be explicitly released using free.

Section A.8. Comments

Harmony supports the same commenting conventions as Python. In particular, anything after a # character on a line is ignored. You can also enclose comments on separate lines within triple quotes. In addition, Harmony supports nested multi-line comments of the form (* comment *).

Section A.9. Type Checking

Harmony is dynamically typed. You can add type annotations to your program in the form of assertions and invariants. For example:

invariant (type(x) == "int") and ((x % 2) == 0)
x = choose { 0, 2, 4, 6 }

def double(n) returns result:
    assert type(n) == "int"
    result = n * 2
    assert type(result) == "int"

def main():
    x = double(x)

spawn main()
The invariant in Line 1 states that x is an even integer. The assertion in Line 5 states that the argument to function double is an integer. The assertion in Line 7 states that the return value of the function is also an integer. Harmony checks these types as it evaluates the program.

Appendix B. Modules

Harmony modules provide convenient access to various data structures, algorithms, and synchronization paradigms. They are all implemented in the Harmony language itself (so you can look at their code), although some methods have also been implemented directly in the underlying model checker to make model checking more efficient.

Currently there are the following modules:

action Section B.1 support for action-based specifications
alloc Section B.2 dynamic memory allocation
bag Section B.3 multi-sets
fork Section B.4 fork/join interface to threads
hoare Section B.5 Hoare module interface
list Section B.6 common operations on lists
set Section B.7 common operations on sets
synch Section B.8 synchronization

Section B.1. The action module

The action module supports action-based specification. Such a specification consists of an explicit global state and rules for how to make state transitions. Chapter 26 provides an example. The module has only one method:

explore(x) explore the state space
Here x is a set of lambdas, each of which can return a set of thunks, each representing a possible action (state change). The union of the results of the lambdas should generate all possible actions. A thunk represents a method and its arguments that updates the state accordingly.

Section B.2. The alloc module

The alloc module supports thread-safe (but not interrupt-safe) dynamic allocation of shared memory locations. There are just two methods:

malloc(v) return a pointer to a memory location initialized to v
free(p) free an allocated memory location p
The usage is similar to malloc and free in C. malloc() is specified to return None when memory runs out, although this cannot happen in the current implementation of the module.
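
For example (a minimal sketch):

from alloc import malloc, free

let p = malloc(0):    # p points to a fresh shared location containing 0
    !p += 1
    assert !p == 1
    free(p)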

Section B.3. The bag module

The bag module has various useful methods that operate on bags or multisets:

empty() returns an empty bag
fromSet(s) create a bag from set s
fromList(t) convert list t into a bag
multiplicity(b, e) count how many times e occurs in bag b
bchoose(b) like choose(s), but applied to a bag
add(b, e) add one copy of e to bag b
remove(b, e) remove one copy of e from bag b
combinations(b, k) return set of all subbags of size k
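
For example (a minimal sketch):

import bag

let b = bag.fromList([10, 10, 12]):
    assert b == { 10: 2, 12: 1 }
    assert bag.multiplicity(b, 10) == 2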

Section B.4. The fork module

The fork module implements the fork/join interface to threads.

fork(thunk) spawn thunk and return a thread handle
join(handle) wait for the thread to finish and return its result
For example, the following code doubles each element of data in parallel and then sums the result when done.

from fork import *
from list import *

data = { 1, 2, 4 }

def main():
    let double = lambda x: 2*x end
    let map = { fork(?double(k)) for k in data }:
        print sum(join(t) for t in map)

spawn main()

Section B.5. The hoare module

The hoare module implements support for Hoare-style monitors and condition variables.

Monitor() return a monitor mutex
enter(m) enter a monitor. m points to a monitor mutex
exit(m) exit a monitor
Condition() return a condition variable
wait(c, m) wait on condition variable pointed to by c in monitor pointed to by m
signal(c, m) signal a condition variable

Section B.6. The list module

The list module has various useful methods that operate on lists or tuples:

subseq(t, b, f) return a slice of list t starting at index b and ending just before f
append(t, e) returns t + [e,]
head(t) return the first element of list t
tail(t) return all but the first element of list t
index(t, e) return the index of element e in list t
startswith(t, s) returns whether s is a prefix of t
filter(f, t) returns a list of elements of t satisfying function f
map(f, t) returns a list of elements of t mapped by function f
permuted(t) returns a permutation of set t
reversed(t) returns the elements of list t in reverse order
sorted(t) returns a sorted list from the elements of set or list t
set(t) convert a list into a set
list(t) convert a set into a list
values(t) convert values of a dict into a list sorted by key
items(t) convert dict into (key, value) list sorted by key
enumerate(t) like Python enumerate
sum(t) returns the sum of all elements in t
qsort(t) returns a copy of t sorted using quicksort
foldl(t, f, z) left fold with f a binary method and z the initial value
foldr(t, f, z) right fold with f a binary method and z the initial value
reduce(f, t, z) same as foldl(t, f, z)
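
For example (a minimal sketch):

import list

let t = [3, 1, 2]:
    assert list.sorted(t) == [1, 2, 3]
    assert list.subseq(t, 1, 3) == [1, 2]    # slice from index 1 up to, but not including, index 3
    assert list.sum(t) == 6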

Section B.7. The set module

The set module implements the following methods:

add(s, e) returns s ∪{e}
remove(s, e) returns s \ {e}
subsets(s) returns the set of subsets of s
union(s) returns the union of the elements of s
filter(f, s) returns a set of elements of s satisfying function f
map(f, s) returns a set of elements of s mapped by function f
cartesian(d) d is a list of sets. Returns the Cartesian product.
combinations(s, k) returns set of all subsets of size k
reduce(f, t, z) same as Python's functools reduce()

Section B.8. The synch module

The synch module provides the following methods:

atomic_load(p) atomically evaluate !p
atomic_store(p, v) atomically assign !p = v
tas(lk) test-and-set on !lk
cas(ptr, old, new) compare-and-swap on !ptr
BinSema(v) return a binary semaphore initialized to v
Lock() return a binary semaphore initialized to False
acquire(bs) acquire binary semaphore !bs
release(bs) release binary semaphore !bs
Condition() return a condition variable
wait(c, lk) wait on condition variable !c and lock lk
notify(c) notify a thread waiting on condition variable !c
notifyAll(c) notify all threads waiting on condition variable !c
Semaphore(cnt) return a counting semaphore initialized to cnt
P(sema) procure !sema
V(sema) vacate !sema
Queue() return a synchronized queue object
get(q) return next element of q, blocking if empty
put(q, item) add item to q
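
For example, the following minimal sketch uses a lock to protect a shared counter:

from synch import Lock, acquire, release

count = 0
lock = Lock()

def incr():
    acquire(?lock)
    count += 1
    release(?lock)

spawn incr()
spawn incr()

finally count == 2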

Appendix C. The Harmony Virtual Machine

The Harmony Virtual Machine (HVM, Chapter 4) has the following state:

code a list of HVM machine instructions
variables a dictionary mapping strings to values
ctxbag a bag of runnable contexts
stopbag a bag of stopped contexts
choosing if not None, indicates a context that is choosing
There is initially a single context with name __init__() and program counter 0. It starts executing in atomic mode until it finishes executing the last Return instruction. Other threads, created through spawn statements, do not start executing until then.

A step is the execution of a single HVM machine instruction by a context. Each step generates a new state. When there are multiple contexts, the HVM can interleave them. However, trying to interleave every step would be needlessly expensive, as many steps involve changes to a context that are invisible to other contexts.

A stride can involve multiple steps. The following instructions start a new stride: Load, Store, AtomicInc, and Continue. The HVM interleaves strides, not steps. Like steps, each stride involves a single context. Unlike a step, a stride can leave the state unchanged (because its steps lead back to where the stride started).

Executing a Harmony program results in a graph where the nodes are Harmony states and the edges are strides. When a state is choosing, all outgoing edges of that state belong to the single choosing context, one for each choice. Otherwise, there is one outgoing edge per context.

Consecutive strides by the same thread are called a turn. Each state maintains the shortest path to it from the initial state in terms of turns. The diameter of the graph is the length of the longest path found in terms of turns.

If some states have a problem, the state with the shortest path is reported. Problematic states include states that experienced exceptions. If there are no exceptions, Harmony computes the strongly connected components (SCCs) of the graph (the number of such components are printed as part of the output). The sink SCCs should each consist of a terminal state without any threads. If not, again the state with the shortest path is reported.

If there are no problematic states, Harmony reports "no issues found" and outputs in the HTML file the state with the longest path.

Section C.1. Machine Instructions

Apply m call method m
Assert, Assert2 pop b and check that it is True. Assert2 also pops value to print
AtomicInc/Dec increment/decrement the atomic counter of this context
Continue no-op (but causes a context switch)
Choose choose an element from the set on top of the stack
Cut retrieve an element from an iterable type
Del [v] delete shared variable v
DelVar [v] delete thread variable v
Dup duplicate the top element of the stack
Finally pc pc is the pc of a lambda that returns a boolean
Frame m a start method m with arguments a, initializing variables
Go pop context and value, push value on context's stack, and add to context bag
Invariant pc pc is the pc of a lambda that takes arguments pre, post and returns a boolean
Jump p set program counter to p
JumpCond e p pop expression and, if equal to e, set program counter to p
Load [v] evaluate the address on the stack (or load shared variable v)
LoadVar v push the value of a thread variable onto the stack
Move i move stack element at offset i to top of the stack
n-ary op apply n-ary operator op to the top n elements on the stack
Pop pop a value off the stack and discard it
Print pop a value and add to the print history
Push c push constant c onto the stack
ReadonlyInc/Dec increment/decrement the read-only counter of this context
Return [v [, d]] pop return address, push v (or default value d), and restore pc
Sequential pop an address of a variable that has sequential consistency
SetIntLevel pop e, set interrupt level to e, and push old interrupt level
Spawn [eternal] pop initial thread-local state, argument, and method and spawn a new context
Split pop tuple and push its elements
Stop [v] save context into shared variable v and remove from context bag
Store [v] pop a value from the stack and store it in a shared variable
StoreVar [v] pop a value from the stack and store it in a thread variable
Trap pop interrupt argument and method

Clarifications:

  • Even though Harmony code does not allow taking addresses of thread variables, both shared and thread variables can have addresses.
  • The Load, Del, DelVar, and Stop instructions have an optional variable name: if omitted the top of the stack must contain the address of the variable.
  • The Store instruction has an optional variable name. The StoreVar instruction can even have a nested tuple of variable names such as (a, (b, c)). In both cases the value to be assigned is on the top of the stack. If the name is omitted, the address is underneath that value on the stack.
  • The Frame instruction pushes the value of the thread register (i.e., the values of the thread variables) onto the stack. The Return instruction restores the thread register by popping its value off the stack.
  • All method calls have exactly one argument, although it sometimes appears otherwise:
    • m() invokes method m with the empty dictionary () as argument;
    • m(a) invokes method m with argument a;
    • m(a, b, c) invokes method m with tuple (a, b, c) as argument.
    The Frame instruction unpacks the argument to the method and places its components into thread variables with the given names.
  • The Apply instruction is unnecessary as it can be implemented using 2-ary Closure and Load. However, method calls are frequent enough to warrant a faster mechanism, reducing model checking time.
  • The Return instruction has an optional result variable and default value. If neither is specified, the result value is on top of the stack. Otherwise it tries to read the local variable. If the variable does not exist, the default value is used or an error is thrown.
  • Every Stop instruction must immediately be followed by a Continue instruction.
  • There are two versions of AtomicInc: lazy or eager. When eager, an atomic section immediately causes a switch point (switch between threads). When lazy, the state change does not happen until the first Load, Store, or Print instruction. If there are no such instructions, the atomic section may not even cause a switch point.
The n-ary instruction can have many different operators as argument. Section A.1 describes many of these operators, but some are used internally only. The current set of such operators is as follows:

AddArg pop an argument and an address and push an address with the argument added
Closure pop an argument and a function and push an address with the single argument
DictAdd pop a value, a key, and a dictionary, and push an updated dictionary
ListAdd pop a value and a list, and push a new list with the given value added to the end
SetAdd pop a value and a set, and push a new set with the given value added

Section C.2. Addresses and Method Calls

Syntactically, Harmony does not make a distinction between method calls and indexing into Harmony dictionaries, lists, and strings. This is because Harmony makes all four look like functions that map a value to another value. Because Harmony is dynamically typed, an expression like a b could mean that variable a contains a program counter value and a method call must be made with b as argument, or that index b must be looked up in the value a. Things can get more complicated for an expression like a b c, which means ((a b) c): a b could return a program counter value or an indexable Harmony value.

To deal with this, Harmony has a fairly unique address type. An address consists of a function and a list of arguments, which we will denote here as f, [a0, a1, ...]. If a is a shared variable, then the address of a b c is $, ["a", b, c], where $ is the function that maps the names of shared variables to their values. In particular, $("a") is the value of variable a. A function can also be a program counter value or an indexable Harmony value. So, if a is instead a method (i.e., a program counter constant), then the address would be a, [b, c]. In the Harmony Virtual Machine, the $ function is represented as the program counter value -1.

To evaluate the Harmony expression a b c, Harmony first generates its address (evaluating the expression left to right). If a is a variable name, then the function in the address depends on whether it is a shared variable or a thread variable. After the address is computed and pushed onto the stack, the Load instruction evaluates the address, possibly in multiple steps in an iterative manner.

A basic step of evaluating function,arguments proceeds as follows:

  1. If arguments is empty, replace the address by function and proceed to the next instruction.
  2. If function is an indexable Harmony value (list, string, or dictionary), arg is the first argument, and remainder are the remaining arguments, then replace the address by function[arg],remainder and repeat.
  3. If function is $, then replace the address by $[arg],remainder and repeat.
  4. If function is a program counter value, then push remainder, the current program counter (still pointing to the Load instruction), and arg onto the stack and set the program counter to function. The Return instruction pushes r,remainder , where r is the result of the function, and restores the program counter so it executes the Load instruction again.
The Harmony Virtual Machine can sometimes perform multiple of these basic steps in one big step. For example, if a b c is a memory address, the Load instruction will finish in a single atomic step. Both Load and Return are optimized in such ways.
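
As an illustration, suppose shared variable a holds the dictionary { .b: [10, 20] } (a hypothetical value). The address of a.b[1] is $, ["a", "b", 1], which the Load instruction reduces step by step:

$, ["a", "b", 1]              # apply $ to "a", yielding { .b: [10, 20] }
{ .b: [10, 20] }, ["b", 1]    # index with "b", yielding [10, 20]
[10, 20], [1]                 # index with 1, yielding 20
20, []                        # no arguments left: the result is 20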

Section C.3. Contexts and Threads

A context captures the state of a thread. Each time the thread executes an instruction, it goes from one context to another. All instructions update the program counter (Jump instructions are not allowed to jump to their own locations), and so no instruction leaves the context the same. There may be multiple threads with the same state at the same time. A context consists of the following:

program counter an integer value pointing into the code
atomic if non-zero, the thread is in atomic mode
readonly if non-zero, the thread is in read-only mode
stack a list of Harmony values
method variables a dictionary mapping strings (names of method variables) to values
thread-local variables a dictionary mapping strings (names of thread-local variables) to values
stopped a boolean indicating if the context is stopped
failure if not None, string that describes how the thread failed
Details:

  • A thread terminates when it reaches the Return instruction of the top-level method (when the stack frame is of type thread) or when it hits an exception. Exceptions include divide by zero, reading a non-existent key in a dictionary, accessing a non-existent variable, as well as when an assertion fails;
  • The execution of a thread in atomic mode does not get interleaved with that of other threads.
  • The execution of a thread in read-only mode is not allowed to update shared variables or spawn threads.
  • The register of a thread always contains a dictionary, mapping strings to arbitrary values. The strings correspond to the variable names in a Harmony program.

Section C.4. Formal Specification

Most of the Harmony Virtual Machine is specified in TLA+. Given a Harmony program, you can output the TLA+ specification for the program using the following command:

$ harmony -o program.tla program.hny
For most Harmony programs, including Peterson's algorithm and the Dining Philosophers in this book, the result is complete enough to run through the TLC model checker.

Appendix D. How Harmony Works

This appendix gives a very brief overview of how Harmony works. In a nutshell, Harmony goes through the following three phases:

  1. The Harmony compiler turns your Harmony program into bytecode. A recursive descent parser and code generator written in Python (see harmony.py) turns an x.hny program into x.hvm, a JSON file containing the corresponding bytecode.
  2. The Harmony model checker evaluates the state space that the program (now in bytecode) can generate. The model checker is written in C as it needs to be highly efficient (see charm.c). The model checker starts from the initial state, and then, iteratively, checks for each state that it has found what next steps are possible and generates the next states using the Harmony virtual machine (Appendix C). If the model is finite, eventually the model checker will generate a graph with all possible states. If there is a problematic path in this graph (see below), then it will report the shortest such path in the x.hco output file in JSON format.
  3. The x.hco output file is processed by harmony.py in two ways. There is a so-called brief output that is written to standard output. The rest depends on whether there was a problem with the execution or not. If there was a problem, the more comprehensive output is placed in the x.htm HTML output file, allowing you to navigate the problematic path and all the details of each of the states on the path. If not, a DFA of the print behavior is generated and compared with a provided DFA if specified with the -B flag.

Section D.1. Compiler

The Harmony compiler, in order to stay true to the Harmony source program, does not do much in the way of optimizations. The main optimizations that it does are:

  • Constant folding: (simple) expressions consisting only of constants are evaluated by the compiler rather than by the model checker;
  • Jump threading: Harmony eliminates jump to jump instructions;
  • Dead variable elimination: Harmony removes method variables that are no longer in use from the state in order to reduce the state space to be explored.

Section D.2. Model Checker

The Harmony model checker, called Charm, takes the output from the compiler and explores the entire state space in breadth-first order. Even though Harmony does not really support input, there are three sources of non-determinism that make this exploration non-trivial:

  • choose expressions: Harmony's ability to let the program choose a value from a set;
  • thread interleaving: different threads run pseudo-concurrently with their instructions interleaved in arbitrary ways;
  • interrupts: Harmony programs can set interrupts that can go off at arbitrary times.
A thread can be in atomic mode or not. In atomic mode, the execution of the thread is not interleaved with other threads. A thread can also be in read-only mode or not. In read-only mode, the thread cannot write or delete shared variables.

Charm has some tricks to significantly reduce the state space to explore.

  • A thread can have local state (program counter, stack, method variables, and thread-local state variables). That state is called the context of the thread. The context of a thread cannot be accessed by other threads, nor by invariant or finally statements. So, the model checker only interleaves threads at Load, Store, and Del instructions where a thread interacts with global variables.
  • Threads are anonymous, and therefore two or more threads can have the same context. The state of the model checker therefore maintains a bag (multiset) of contexts rather than a set of contexts. Thus even if there are hundreds of threads, there may be only tens of possible context states.
That said, state space explosion is still a possibility, and Harmony programmers should keep this in mind when writing and testing their programs. Do not be too ambitious: start with small tests and gradually build them up as necessary.

The model checker stops either when it finds a failing execution or when it has explored the entire state space, whichever comes first. An execution can fail for a variety of reasons:

  • An invariant failing: Harmony evaluates all invariants in all states that it finds---if one fails, Harmony stops further exploration;
  • An assertion failing;
  • A behavior violation: this is when the sequence of printed values is not recognized by the provided DFA (using the -B flag);
  • A silly error: this includes reading variables that have not been assigned, trying to add a set to an integer, taking the length of something that is not a set or a dictionary, and so on;
  • An infinite loop: a thread goes into an infinite loop without accessing shared variables.

Section D.3. Model Checker Output Analysis

The output of the model checker is a graph (a so-called Kripke structure) that is typically very large. If some execution failed, then Harmony will simply report the path of that failing execution. But otherwise there may be the following outcomes:

  • No issues: no failing executions and each program can terminate;
  • Non-terminating states: some executions lead to some form of deadlock or other issue that causes some (non-eternal) threads not to be able to terminate;
  • Race conditions: there are executions in which two threads access the same shared state variable, with at least one of those accesses being a Store operation;
  • Busy waiting: executions in which threads are actively waiting for some condition, usually by releasing and reacquiring locks.
In order to diagnose these outcomes, Harmony must analyze the graph.

The first thing that Harmony does is to locate non-terminating states, if any. To do this, Harmony first determines the strongly connected components of the graph using Kosaraju's algorithm. A component (subgraph) of a graph is strongly connected if each vertex (state) in the component can be reached from each other vertex. The components then form a Directed Acyclic Graph (DAG). The DAG is easier to analyze than the original graph. One can easily determine the sink components (the components with no outgoing edges). If such a component has non-eternal threads in it, then each state in that component is a non-terminating state.

To find race conditions, the model checker looks in the graph for states in which there are multiple threads that can make a step. If there is a step in which multiple threads access the same shared variable, at least one of those accesses is a store operation, and at least one of those threads is not in atomic mode, then Harmony reports the shortest path to such a state.

To show how Harmony detects busy waiting, we will first show how Harmony determines if a thread is blocked or not. A thread is considered blocked if it cannot terminate without the help of another thread. For example, a thread waiting for a lock is blocked and cannot terminate until another thread releases the lock. Determining whether a thread is blocked in a particular state can be done within the confines of the connected component: the analyzer tries all possible executions of the thread. If it cannot "escape" the connected component by doing so, it is considered blocked. A thread is considered busy waiting if it is blocked, but it is also changing the shared state while doing so. A thread that is waiting on a spinlock only observes the state.

In the output, each thread has a unique identifier: T0 is the initialization thread; Tn is the nth spawned thread that executes. This seems to contradict the fact that Harmony threads are anonymous. The output analyzer assigns these identifiers a posteriori to the threads in the state graph by keeping track, along the reported execution path, of what state each thread is in. So, by examining the initial context of the thread that is running from some particular state, it can determine if that context corresponds to the current context of some thread that ran previously or if the context belongs to a new thread that has not run before.

If there are no issues, Harmony also generates a DFA of the print behavior. Starting with the original state graph or Kripke structure, the edges are inspected. If there are multiple print operations on an edge, additional states are inserted so that there are either 0 or 1 print operations on an edge. This graph of nodes (states) and edges (transitions) forms a Non-deterministic Finite Automaton (NFA) with ϵ-transitions (transitions without print operations). Harmony turns the NFA into a DFA and by default also minimizes the DFA (although this is not strictly necessary). The DFA can be fed into another run of the model checker to check that its print operations are consistent with the provided DFA.

Appendix E. Simplified Grammar

The next pages show a compact version of the complete Harmony grammar. The precedence rules are loosely as follows. Application binds most strongly. Next are unary operators. Next are binary operators. Thus -a[1] - a[2] parses as (-(a[1])) - (a[2]). !a[1] parses as !(a[1]). Harmony will complain about ambiguities such as a - b + c. Avoiding other ambiguities, Harmony does not allow expressions of the form a @b where @ is some kind of unary operator. You have to write this as either a[@b] or a(@b). The simplified grammar ignores indentation rules.

block: statement [[NEWLINE | ';'] statement]*;

statement
    : e     # expression evaluation, usually a function call

Appendix F. Directly checking linearizability

from synch import Lock, acquire, release
from alloc import malloc, free

def Queue():
    result = { .head: None, .tail: None, .lock: Lock(), .time: 0 }

def _linpoint(q):
    atomically:
        this.qtime = q->time
        q->time += 1

def put(q, v):
    let node = malloc({ .value: v, .next: None }):
        acquire(?q->lock)
        if q->tail == None:
            q->tail = q->head = node
        else:
            q->tail->next = node
            q->tail = node
        _linpoint(q)
        release(?q->lock)
    
def get(q):
    acquire(?q->lock)
    let node = q->head:
        if node == None:
            result = None
        else:
            result = node->value
            q->head = node->next
            if q->head == None:
                q->tail = None
            free(node)
    _linpoint(q)
    release(?q->lock)

Figure G.1. [code/queuelin.hny] Queue implementation with linearization points

import queuelin, queuespec

const NOPS = 4
const VALUES = { 1..NOPS }

sequential qtime
qtime = 0

implq = queuelin.Queue()
specq = queuespec.Queue()

def thread():
    let op = choose({ "get", "put" }):
        if op == "put":
            let v = choose(VALUES):
                queuelin.put(?implq, v)
                await qtime == this.qtime
                queuespec.put(?specq, v)
        else:
            let v = queuelin.get(?implq):
                await qtime == this.qtime
                let w = queuespec.get(?specq):
                    assert v == w
    atomically qtime += 1

for i in {1..NOPS}:
    spawn thread()

Figure G.2. [code/qtestconc.hny] Concurrent queue test

We want a concurrent queue to behave consistently with a sequential queue in that all put and get operations should appear to happen in a total order. Moreover, we want to make sure that if some put or get operation o1 finished before another operation o2 started, then o1 should appear to happen before o2 in the total order. If these two conditions are met, then we say that the concurrent queue implementation is linearizable.

In general, if a data structure is protected by a single lock and every operation on that data structure starts with acquiring the lock and ends with releasing the lock, it will automatically be linearizable. The queue implementation in Figure 9.3 does not quite match this pattern, as the put operation allocates a new node before acquiring the lock. However, in this case that is not a problem, as the new node has no dependencies on the queue when it is allocated.

Still, it would be useful to check in Harmony that Figure 9.3 is linearizable. To do this, instead of applying the operations sequentially, we want the test program to invoke the operations concurrently, consider all possible interleavings, and see if the result is consistent with an appropriate sequential execution of the operations.

Harmony provides support for testing linearizability, but it requires that the programmer identify what are known as linearization points in the implementation, which indicate exactly which sequential execution the concurrent execution must align with. Figure G.1 is a copy of Figure 9.3 extended with linearization points. For each operation (get and put), the corresponding linearization point must occur somewhere between acquiring and releasing the lock. Each linearization point execution is assigned a logical timestamp, numbered 0, 1, .... To implement this, we have added a counter (time) to the Queue. Method _linpoint saves the current counter in this.qtime and increments the counter. The this dictionary maintains thread-local state associated with the thread (Chapter 4)---it contains variables that can be accessed by any method in the thread.
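
As a minimal sketch (hypothetical method names outer and inner), state stored in this by one method is visible to any other method called by the same thread, but not to other threads:

def inner():
    this.calls += 1     # updates the thread-local counter set in outer

def outer():
    this.calls = 0      # thread-local: other threads have their own copy
    inner()
    assert this.calls == 1

spawn outer()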

Given the linearization points, Figure G.2 shows how linearizability can be tested. The test program is similar to the sequential test program (Figure 10.1) but starts a thread for each operation. The operations are executed concurrently on the concurrent queue implementation of Figure G.1, but sequentially on the sequential queue specification of Figure 9.1(a). To that end, the test program maintains a global time variable qtime, and each thread waits until the timestamp assigned to its concurrent queue operation matches qtime before invoking the sequential operation on the specification. Afterward, it atomically increments the shared qtime variable. As a result, the operations are executed against the sequential specification in the same order as the linearization points of the concurrent implementation.

Appendix G. Manual Pages

NAME

Harmony --- the Harmony compiler and model checker

SYNOPSIS

harmony [options] filename

DESCRIPTION

harmony is a compiler and model checker for the Harmony programming language. harmony compiles Harmony into bytecode and then model checks the bytecode. The result is analyzed for failing assertions and invariants, non-termination (such as deadlock and infinite loops), race conditions, deviations from specifications, and busy waiting. There are three phases:

  • compile: parses Harmony source code and generates Harmony virtual machine code;
  • model check: generates a graph of all reachable states from the Harmony virtual machine code while checking for safety violations;
  • analysis: checks the graph for non-termination, race conditions, and busy waiting.
The Harmony file name extensions are as follows:

  • .hny: Harmony source code;
  • .hvm: Harmony virtual machine code (in JSON format);
  • .hco: Harmony output (in JSON format);
  • .hvb: Harmony verbose output (human readable);
  • .hfa: Harmony finite automaton, describing the possible print outputs (in JSON format).
In addition, harmony can also generate .tla (TLA+), .htm (HTML), .gv (Graphviz DOT version of the .hfa output), .png (PNG version of the .hfa output), and .tex (LaTeX formatted source code) files.

By default, running ``harmony x.hny'' will generate x.hvm, x.hco, x.hvb, x.png, and x.htm files. Harmony will also, by default, automatically start a web browser to display the x.htm file. Various options can be used to change this behavior.

When importing a module using import x, harmony will try to find the corresponding .hny file in the following order:

  1. check if the module file is specified with the -m or --module option;
  2. see if a file by the name x.hny is present in the same directory as the source file;
  3. see if a file by the name x.hny is present in the installation's modules directory.

OPTIONS

Output file options:

  • -o filename.gv: specify the name of the file where the graphviz (DOT) output should be stored;
  • -o filename.hco: specify the name of the file where model checker output should be stored;
  • -o filename.hfa: specify the name of the file where the Harmony finite automaton should be stored;
  • -o filename.htm: specify the name of the file where the HTML output should be stored;
  • -o filename.hvb: specify the name of the file where the verbose output should be stored;
  • -o filename.hvm: specify the name of the file where the Harmony virtual machine code should be stored;
  • -o filename.png: specify the name of the file where the PNG output should be stored;
  • -o filename.tla: generate a TLA+ file specifying the behaviors of the Harmony virtual machine code;
  • -o filename.tex: generate a LaTeX file containing the formatted source code.

Other options:

  • -a: compile only and list machine code (with labels);
  • -A: compile only and list machine code (without labels);
  • -B filename.hfa: check Harmony code against output behaviors described in filename.hfa (result of another Harmony run);
  • -c, --const constant=expression: set the value of the given constant (which must be defined in the code) to the result of evaluating the given expression;
  • -m, --module module=filename.hny: load the given module instead of looking in default locations;
  • --noweb: do not start a web browser upon completion;
  • -v, --version: print the harmony version number;
  • -w #workers: specify the number of concurrent threads the model checker uses.
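
Some example invocations (file and constant names are hypothetical; all options are listed above):

harmony x.hny                   # model check x.hny and open the HTML output
harmony --noweb x.hny           # same, but do not start a web browser
harmony -c NOPS=2 x.hny         # override the constant NOPS
harmony -o spec.hfa x.hny       # store the print behavior DFA in spec.hfa
harmony -B spec.hfa impl.hny    # check impl.hny against that DFA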

Appendix H. Peterson's Algorithm

Figure I.1. Venn diagram classifying all states and a trace

In 1981, Gary L. Peterson came up with a beautiful solution to the mutual exclusion problem, now known as ``Peterson's Algorithm'' [37]. Figure 5.6 presents the algorithm. Why does it work? We will focus here on how one might go about proving mutual exclusion for an algorithm such as Peterson's. It turns out that doing so is not trivial, and you have to understand a little bit about how the Harmony virtual machine (HVM) works. In Chapter 4 we talked about the concept of state: at any point in time the HVM is in a specific state. A state comprises the values of the shared variables as well as the values of the thread variables of each thread, including its program counter and the contents of its stack. Each time a thread executes an HVM machine instruction, the state changes (if only because the program counter of the thread changes). We call that a step. Steps in Harmony are atomic.

The HVM starts in an initial state in which there is only one thread (__init__()) and its program counter is 0. A trace is a sequence of steps starting from the initial state, resulting in a sequence of states. When making a step, there are two sources of non-determinism in Harmony. One is when there is more than one thread that can make a step. The other is when a thread executes a choose operation and there is more than one choice. Because there is non-determinism, there are multiple possible traces. We call a state reachable if it is either the initial state or it can be reached from the initial state through a finite trace. A state is final when there are no threads left to make state changes.

It is often useful to classify states. Initial, final, reachable, and unreachable are all examples of classes of states. Figure I.1 depicts a Venn diagram of various classes of states and a trace. One way to classify states is to define a predicate over states. All states in which x = 1, or all states in which two or more threads are executing, are examples of such predicates. For our purposes, it is useful to define a predicate that says that at most one thread is in the critical section. We shall call such states exclusive.

An invariant of a program is a predicate that holds over all states that are reachable by that program. We want to show that exclusivity is an invariant because mutual exclusion means that all reachable states are exclusive. In other words, we want to show that the set of reachable states of executing the program is a subset of the set of states where there is at most one thread in the critical section.

One way to prove that a predicate is an invariant is through induction on the number of steps. First you prove that the predicate holds over the initial state. Then you prove that for every reachable state, and for every step from that reachable state, the predicate also holds over the resulting state. For this to work you would need a predicate that describes exactly which states are reachable. But we do not have such a predicate: we know how to define the set of reachable states inductively, but---given an arbitrary state---it is not easy to see whether it is reachable or not.

To solve this problem, we will use what is called an inductive invariant. An inductive invariant I is a predicate over states that satisfies the following:

  1. I holds in the initial state;
  2. for any state in which I holds and any step taken from that state, I also holds in the resulting state.

One candidate for such a predicate is exclusivity itself. After all, it certainly holds over the initial state. And as Harmony has already determined, exclusivity is an invariant: it holds over every reachable state. Unfortunately, exclusivity is not an inductive invariant. To see why, consider the following state s: let thread 0 be at label cs and thread 1 be at the start of the await statement, and let turn = 1. Now let thread 1 make a step. Because turn = 1, thread 1 will stop waiting and also enter the critical section, resulting in a state that is not exclusive. So, exclusivity is an invariant (it holds over every reachable state, as demonstrated by Harmony), but not an inductive invariant. It turns out that s is not reachable.

We are looking for an inductive invariant that implies exclusivity. In other words, the set of states where the inductive invariant holds must be a subset of the set of states where there is at most one thread in the critical section.

Let us begin by considering the following important property: F(i) = thread(i)@[10⋅⋅⋅17] ⇒ flags[i], that is, if thread i is executing in lines 10 through 17, then flags[i] is set. Although it does not, by itself, imply exclusivity, we can show that F(i) is an inductive invariant (for both threads 0 and 1). To wit, it holds in the initial state, because in the initial state thread i does not even exist yet. Now we have to show that if F(i) holds in some state, then F(i) also holds in the next state. Since only thread i ever changes flags[i], we only need to consider steps by thread i. Since F(i) holds, there are two cases to consider:

  1. states in which flags[i] = true
  2. states in which ¬thread(i)@[10⋅⋅⋅17] (because false implies anything)
In each case, we need to show that if thread i takes a step, then F(i) still holds. In the first case, there is only one step that thread i can take that would set flags[i] to false: the step from line 17 to line 18. But executing that line would also take the thread out of lines 10⋅⋅⋅17, so F(i) continues to hold. In the second case (thread i is not executing in lines 10⋅⋅⋅17), the only step that would cause thread i to execute in lines 10⋅⋅⋅17 is the step in line 9. But in that case flags[i] would end up being true, so F(i) continues to hold as well. So, F(i) is an inductive invariant (for both threads 0 and 1).

While F(i) does not imply mutual exclusion, it does imply the following useful invariant: thread(i)@cs ⇒ flags[i]: when thread i is at the critical section, flags[i] is set. This seems obvious from the code, but now you know how to prove it. We will use a similar technique to prove that exclusivity is an invariant.

We need a stronger inductive invariant than F(i) to prove mutual exclusion. What else do we know when thread i is in the critical section? Let C(i) = ¬flags[1 - i] ∨ (turn = i), that is, the condition of the await statement for thread i. In a sequential program, C(i) would clearly hold if thread i were in the critical section: thread(i)@cs ⇒ C(i). However, because thread 1 - i is executing concurrently, this property does not hold. You can use Harmony to verify this. Just place the following command in the critical section of the program:

assert (not flags[1 - self]) or (turn == self)
When running Harmony, this assertion will fail. You can check the HTML output to see what happened. Suppose thread 0 is at the critical section, flags[0] = true, turn = 1, and thread 1 just finished the step in line 7, setting flags[1] to true. Then C(0) is violated. But the counterexample suggests a new property: G(i) = thread(i)@cs ⇒ C(i) ∨ thread(1 - i)@12. That is, if thread i is at the critical section, then either C(i) holds or thread 1 - i is about to execute line 12.

G(i) is an invariant for i = 0, 1. Moreover, if F(i) and G(i) both hold for i = 0, 1, then mutual exclusion holds. We can show this using proof by contradiction. Suppose mutual exclusion is violated and thus both threads are in the critical section. By F it must be the case that both flags are true. By G and the fact that neither thread is about to execute line 12, we know that both C(0) and C(1) must hold. This then implies that turn = 0 ∧ turn = 1, providing the desired contradiction.

We claim that G(i) is an inductive invariant. First, since neither thread is in the critical section in the initial state, it is clear that G(i) holds in the initial state. Without loss of generality, suppose i = 0 (a benefit of the fact that the algorithm is symmetric for both threads). We still have to show that if we are in a state in which G(0) holds, then any step will result in a state in which G(0) still holds.

First consider the case that thread 0 is at label cs. If thread 0 were to take a step, then in the next state thread 0 would no longer be at that label, and G(0) would hold trivially over the next state. Therefore we only need to consider a step by thread 1. From G we know that one of the following three cases must hold before thread 1 takes a step:

  1. flags[1] = False;
  2. turn = 0;
  3. thread 1 is about to execute line 12.
Let us consider each of these cases. We have to show that if thread 1 takes a step, then one of these cases must hold after the step. In the first case, if thread 1 takes a step, there are two possibilities: either flags[1] will still be False (in which case the first case continues to hold), or flags[1] will be True and thread 1 will be about to execute line 12 (in which case the third case will hold). We know that thread 1 never sets turn to 1, so if the second case holds before the step, it will also hold after the step. Finally, if thread 1 is about to execute line 12 before the step, then after the step turn will equal 0, and therefore the second case will hold after the step.

Now consider the case where thread 0 is not in the critical section, and therefore G(0) holds trivially because false implies anything. There are three cases to consider:

  1. Thread 1 takes a step. But then thread 0 is still not in the critical section and G(0) continues to hold;
  2. Thread 0 takes a step but still is not in the critical section. Then again G(0) continues to hold;
  3. Thread 0 takes a step and ends up in the critical section. Because thread 0 entered the critical section, we know that flags[1] = False or turn = 0 because of the await condition. Hence G(0) continues to hold in that case as well.
We have now demonstrated mutual exclusion in Peterson's Algorithm in two different ways: one by letting Harmony explore all possible executions, the other using inductive invariants and proof by induction. The former is certainly easier, but it does not provide intuition for why the algorithm works. The second provides much more insight.

Even though they are not strictly necessary, we encourage you to include invariants in your Harmony code. They can provide important insights into why the code works.
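
A minimal sketch (hypothetical program, assuming Harmony's invariant statement, which checks a predicate over every reachable state):

count = 0

invariant (0 <= count) and (count <= 2)    # checked in every reachable state

def incr():
    atomically count += 1

spawn incr()
spawn incr()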

A cool anecdote is the following. When the author of Harmony had to teach Peterson's Algorithm, he refreshed his memory by looking at the Wikipedia page. The page claimed that the following predicate is an invariant: if thread i is in the critical section, then C(i) (i.e., G without the disjunct that thread 1 - i is about to execute line 12). We already saw that this is not an invariant. (The author fixed the Wikipedia page with the help of Fred B. Schneider.)

This anecdote suggests the following. If you need to do a proof by induction of an algorithm, you have to come up with an inductive invariant. Before trying to prove the algorithm correct, you can check that the predicate is at least an invariant by testing it with Harmony. Doing so could avoid wasting your time on a proof that cannot work because the predicate is not an invariant, and therefore not an inductive invariant either. Moreover, analyzing the counterexample provided by Harmony may well suggest how to fix the predicate.

Exercises

Appendix I. Split Binary Semaphores

from synch import BinSema, acquire, release

def RWlock() returns lock:
    lock = {
            .nreaders: 0, .nwriters: 0, .mutex: BinSema(False),
            .r_gate: { .sema: BinSema(True), .count: 0 },
            .w_gate: { .sema: BinSema(True), .count: 0 }
        }

def _release_one(rw):
    if (rw->nwriters == 0) and (rw->r_gate.count > 0):
        release(?rw->r_gate.sema)
    elif ((rw->nreaders + rw->nwriters) == 0) and (rw->w_gate.count > 0):
        release(?rw->w_gate.sema)
    else:
        release(?rw->mutex)

def read_acquire(rw):
    acquire(?rw->mutex)
    if rw->nwriters > 0:
        rw->r_gate.count += 1; _release_one(rw)
        acquire(?rw->r_gate.sema); rw->r_gate.count -= 1
    rw->nreaders += 1
    _release_one(rw)

def read_release(rw):
    acquire(?rw->mutex); rw->nreaders -= 1; _release_one(rw)

def write_acquire(rw):
    acquire(?rw->mutex)
    if (rw->nreaders > 0) or (rw->nwriters > 0):
        rw->w_gate.count += 1; _release_one(rw)
        acquire(?rw->w_gate.sema); rw->w_gate.count -= 1
    rw->nwriters += 1
    _release_one(rw)

def write_release(rw):
    acquire(?rw->mutex); rw->nwriters -= 1; _release_one(rw)

Figure J.1. [code/rwlock_sbs.hny] Reader/Writer Lock using Split Binary Semaphores

The Split Binary Semaphore (SBS) approach is a general technique for implementing conditional waiting. It was originally proposed by Tony Hoare and popularized by Edsger Dijkstra [18]. A binary semaphore is a generalization of a lock. While a lock is always initialized in the released state, a binary semaphore---if so desired---can be initialized in the acquired state. SBS is an extension of a critical section that is protected by a lock. If there are n waiting conditions, then SBS uses n + 1 binary semaphores to protect the critical section. An ordinary critical section has no waiting conditions and therefore uses just one binary semaphore (because n = 0 ). But, for example, a bounded buffer has two waiting conditions:

  1. consumers waiting for the buffer to be non-empty;
  2. producers waiting for an empty slot in the buffer.
So, it will require 3 binary semaphores if the SBS technique is applied.

Think of each of these binary semaphores as a gate that a thread must go through in order to enter the critical section. A gate is either open or closed. Initially, exactly one gate, the main gate, is open. Each of the other gates, the waiting gates, is associated with a waiting condition. When a gate is open, one thread can enter the critical section, closing the gate behind it.

When leaving the critical section, the thread must open exactly one of the gates, but it does not have to be the gate that it used to enter the critical section. In particular, when a thread leaves the critical section, it should check for each waiting gate if its waiting condition holds and if there are threads trying to get through the gate. If there is such a gate, then it must select one and open that gate. If there is no such gate, then it must open the main gate.

Finally, if a thread is executing in the critical section and needs to wait for a particular condition, then it leaves the critical section and waits for the gate associated with that condition to open.

The following invariants hold:

  1. at most one gate is open at any time;
  2. a gate can only be open when no thread is in the critical section;
  3. a waiting gate is opened only when its waiting condition holds and at least one thread is waiting behind it.

The main gate is implemented by a binary semaphore, initialized in the released state (signifying that the gate is open). The waiting gates each consist of a pair: a counter that counts how many threads are waiting behind the gate, and a binary semaphore initialized in the acquired state (signifying that the gate is closed).
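
The following sketch (hypothetical names) shows this representation for a critical section with n waiting gates, mirroring the initialization in Figure J.1:

from synch import BinSema

def Gates(n) returns gates:
    gates = {
        .main: BinSema(False),      # main gate: initially open (released)
        .waiting: { i: { .sema: BinSema(True), .count: 0 }
                    for i in { 1 .. n } }   # waiting gates: initially closed
    }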

We will illustrate the technique using the reader/writer problem. Figure J.1 shows the code. The first step is to enumerate all waiting conditions. In the case of the reader/writer problem, there are two: a thread that wants to read may have to wait for a writer to leave the critical section, while a thread that wants to write may have to wait until all readers have left the critical section or until a writer has left. The state of a reader/writer lock thus consists of the following:

  • nreaders: the number of readers in the critical section;
  • nwriters: the number of writers in the critical section (0 or 1);
  • mutex: the binary semaphore that implements the main gate;
  • r_gate: the waiting gate used by readers, consisting of a binary semaphore and the number of readers waiting behind it;
  • w_gate: the waiting gate used by writers, consisting of a binary semaphore and the number of writers waiting behind it.

Each of the read_acquire, read_release, write_acquire, and write_release methods must maintain this state. First, each has to acquire the mutex (i.e., enter the main gate). After acquiring the mutex, read_acquire and write_acquire must check whether the thread has to wait. If so, the thread increments the count associated with its respective gate, opens one of the gates (using method _release_one), and then blocks until its waiting gate opens up.

_release_one() is the function that a thread uses when leaving the critical section. It must check to see if there is a waiting gate that has threads waiting behind it and whose condition is met. If so, it selects one and opens that gate. In the given code, _release_one() first checks the readers' gate and then the writers' gate, but the other way around works as well. If neither waiting gate qualifies, then _release_one() has to open the main gate (i.e., release mutex).

Let us examine read_acquire more carefully. First, the method acquires mutex. Then, if the thread finds that there is a writer in the critical section (nwriters > 0), it increments the counter associated with the readers' gate, leaves the critical section (_release_one), and then tries to acquire the binary semaphore associated with its waiting gate. This causes the thread to block until some other thread opens that gate.

Now consider the case where there is a writer in the critical section and there are two readers waiting. Let us see what happens when the writer calls write_release:

  1. After acquiring mutex, the writer decrements nwriters, which must be 1 at this time, and thus becomes 0.
  2. It then calls _release_one(). _release_one() finds that there are no writers in the critical section and there are two readers waiting. It therefore releases not mutex but the readers' gate's binary semaphore.
  3. One of the waiting readers can now re-enter the critical section. When it does, the reader decrements the gate's counter (from 2 to 1) and increments nreaders (from 0 to 1). The reader finally calls _release_one().
  4. Again, _release_one() finds that there are no writers and that there are readers waiting, so again it releases the readers' semaphore.
  5. The second reader can now enter the critical section. It decrements the gate's count from 1 to 0 and increments nreaders from 1 to 2.
  6. Finally, the second reader calls _release_one(). This time _release_one() does not find any threads waiting, and so it releases mutex. There are now two reader threads that are holding the reader/writer lock.
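
As a usage sketch (a hypothetical test, assuming the definitions of Figure J.1 are in scope), readers may hold the lock concurrently while writers are exclusive:

rw = RWlock()

def reader():
    read_acquire(?rw)
    # multiple readers may be here at the same time
    read_release(?rw)

def writer():
    write_acquire(?rw)
    # a writer excludes all readers and other writers
    write_release(?rw)

spawn reader()
spawn reader()
spawn writer()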

import synch

def Monitor() returns monitor:
    monitor = synch.Lock()

def enter(mon):
    synch.acquire(mon)

def exit(mon):
    synch.release(mon)

def Condition() returns condition:
    condition = { .sema: synch.BinSema(True), .count: 0 }

def wait(cond, mon):
    cond->count += 1
    exit(mon)
    synch.acquire(?cond->sema)
    cond->count -= 1

def signal(cond, mon):
    if cond->count > 0:
        synch.release(?cond->sema)
        enter(mon)

Figure J.2. [modules/hoare.hny] Implementation of Hoare monitors

import hoare

def BoundedBuffer(size) returns buffer:
    buffer = {
            .mon: hoare.Monitor(),
            .prod: hoare.Condition(), .cons: hoare.Condition(),
            .buf: { x:() for x in {1..size} },
            .head: 1, .tail: 1,
            .count: 0, .size: size
        }

def Queue() returns empty:
    empty = BoundedBuffer(4)
    
def put(bb, item):
    hoare.enter(?bb->mon)
    if bb->count == bb->size:
        hoare.wait(?bb->prod, ?bb->mon)
    bb->buf[bb->tail] = item
    bb->tail = (bb->tail % bb->size) + 1
    bb->count += 1
    hoare.signal(?bb->cons, ?bb->mon)
    hoare.exit(?bb->mon)

def get(bb) returns next:
    hoare.enter(?bb->mon)
    if bb->count == 0:
        hoare.wait(?bb->cons, ?bb->mon)
    next = bb->buf[bb->head]
    bb->head = (bb->head % bb->size) + 1
    bb->count -= 1
    hoare.signal(?bb->prod, ?bb->mon)
    hoare.exit(?bb->mon)

Figure J.3. [code/boundedbuffer_hoare.hny] Bounded Buffer implemented using a Hoare monitor

Tony Hoare, who came up with the concept of split binary semaphores, devised an abstraction of the concept in a programming language paradigm called monitors [24]. (A similar construct was independently invented by Per Brinch Hansen [6].) A monitor is a special version of an object-oriented class, comprising a set of variables and methods that operate on those variables. A monitor also has special variables called condition variables, one per waiting condition. There are two operations on condition variables: wait and signal.

Harmony does not have language support for monitors, but it has a module called hoare. Figure J.2 shows its implementation. A Hoare monitor uses a hidden split binary semaphore. The mutex semaphore is acquired when entering a monitor and released upon exit. Each condition variable maintains a binary semaphore and a counter for the number of threads waiting on the condition. Method wait increments the condition's counter, releases the monitor mutex, blocks while trying to acquire the condition's semaphore, and upon resuming decrements the counter---in much the same way as we have seen for SBS. Method signal checks whether the condition's count is non-zero; if so, it releases the condition's semaphore and then blocks by trying to acquire the mutex again.

Figure J.3 presents a bounded buffer implemented using a Hoare monitor. It is written in much the same way as with the SBS technique (see Exercise 9.2). However, there is no _release_one method. Instead, put guarantees that the queue will be non-empty and signals this condition; signal checks whether any threads are waiting for this event. If so, signal passes control to one such thread and, unlike _release_one, re-enters the critical section afterward by acquiring the mutex.

Implementing a reader/writer lock with Hoare monitors is not quite so straightforward, unfortunately. When a writer releases the lock, it has to choose whether to signal a reader or another writer, and for that it needs to know whether a reader or a writer is waiting. The simplest solution would be to peek at the counters inside the respective condition variables, but that breaks the abstraction. The alternative is for the reader/writer implementation to keep track of that state explicitly, which complicates the code. Also, it requires a deep understanding of the SBS method to remember to place a call to signal in the read_acquire method, so that additional readers waiting to acquire the lock are released.

Exercises

Acknowledgments

I received considerable help and inspiration from various people while writing this book.

First and foremost I would like to thank my student Haobin Ni, with whom I've had numerous discussions about the initial design of Harmony. Haobin even contributed some code to the Harmony compiler. Many thanks are also due to William Ma, who refactored the Harmony code to make it easier to maintain. He also wrote the first version of the behavior automaton generator and created the first graphs using the graphviz tool. I have had lots of discussions with him about a wide range of improvements to the Harmony language, many of which came to fruition. I also want to thank Ariel Kellison, with whom I discussed approaches to formally specifying the Harmony virtual machine in TLA+.

Kevin Sun and Anthony Yang built a beautiful VSCode extension for Harmony called HarmonyLang and proceeded to build an animator for Harmony executions and two cloud-based Harmony offerings, which you can learn about at http://harmony.cs.cornell.edu. They also developed much of that web site and made valuable suggestions for improvements to the Harmony language. Later they were joined by Shi Chong Zhao and Robin Li, who also made significant contributions. Kevin, Anthony, and Robin continue to make great contributions to the Harmony distribution.

I also would like to acknowledge my regular conversations about Harmony with Sasha Sandler of the Oracle Cloud Infrastructure group. He is an early industrial adopter of Harmony and has used it successfully to find and fix bugs in industrial settings. His insights have been invaluable.

Most of what I know about concurrent programming I learned from my colleague Fred Schneider. He suggested I write this book after demonstrating Harmony to him. Being a foremost security expert, he also assisted significantly with the chapter on the Needham-Schroeder protocol.

Leslie Lamport introduced me to using model checking to test properties of a concurrent system. My experimentation with using TLC on Peterson's Algorithm became an aha moment for me. I have learned so much from his papers.

I first demonstrated Harmony to the students in my CS6480 class on systems and formal verification and received valuable feedback from them. The following people contributed by making comments on or finding bugs in early drafts of the book: Alex Chang, Anneke van Renesse, Brendon Nguyen, CJ Lee, Harshul Sahni, Hartek Sabharwal, Heather Zheng, Jack Rehmann, Jacob Brugh, Liam Arzola, Lorenzo Alvisi, Maria Martucci, Nalu Concepcion, Phillip O'Reggio, Saleh Hassen, Sunwook Kim, Terryn Jung, Melissa Reifman, Trishita Tiwari, Xiangyu Zhang, Yidan Wang, Zach Garcia, Zhuoyu Xu, and Zoltan Csaki.

Finally, I would like to thank my family who had to suffer as I obsessed over writing the code and the book, at home, during the turbulent months of May and June 2020.

Index

Glossary