# Testing: Checking Behaviors

qtestseq.hny
import queue, queueconc

const NOPS = 4
const VALUES = { 1..NOPS }

specq = queue.Queue()
implq = queueconc.Queue()

for i in {1..NOPS}:
let op = choose({ "get", "put" }):
if op == "put":
let v = choose(VALUES):
queueconc.put(?implq, v)
queue.put(?specq, v)
else:
let v = queueconc.get(?implq)
let w = queue.get(?specq):
assert v == w


Testing is a way to increase confidence in the correctness of an implementation. Figure 11.2 demonstrates how concurrent queues may be used, but it is not a very thorough test program for an implementation such as the one in Figure 11.3 and does little to increase our confidence in its correctness. To wit, if get() always returned 1, the program would find no problems. Similarly, Figure 12.2 is not a good test program for something as complicated as Figure 12.3. In this chapter, we will look at approaches to testing concurrent code.

As with critical sections---when testing a concurrent data structure---we need a specification. For example, Figure 11.1(a) shows a sequential specification of a queue in Harmony. First, we can check if the queue implementation in Figure 11.3 meets the sequential queue specification in Figure 11.1(a). To check if the queue implementation meets the specification, we need to see if any sequence of queue operations in the implementation matches a corresponding sequence in the specification. We say that the implementation and the specification have the same behaviors or are behaviorally equivalent.

Behaviors say something about how we got to a state. The same state can be reaching by multiple behaviors, and the behaviors are often an integral part of whether a program is correct or not. Just because a state satisfies some invariant---however important---does not mean that the state is valid given the sequence of operations. For example, a state in which the queue is empty is certainly a valid state in its own right, but if the last operation to get there was an enqueue operation, there must be a bug in the program. It can therefore be important to capture the behaviors. We could store behaviors in the state itself by adding what is known as a history variable that keeps track of all the operations. While this can be useful for correctness proofs, for model checking this approach presents a problem: introducing this additional state can lead to state explosion or even turn a finite model (a model with a finite number of states) into an infinite one. We therefore use a different approach: composing an implementation with its specification to ensure that accept the same behaviors.

Figure 13.1 presents a test program that does exactly this, for sequences of up to NOPS queue operations. It maintains two queues:

• specq: the queue of the specification;

• implq: the queue of the implementation.

For each operation, the code first chooses whether to perform a get or put operation. In the case of a put operation, the code also chooses which value to append to the queue. All operations are performed on both the queue implementation and the queue specification. In the case of get, the results of the operation on both the implementation and specification are checked against one another.

Test programs themselves should be tested. Just because a test program works with a particular implementation does not mean the implementation is correct---it may be that the implementation is incorrect but the test program does not have enough coverage to find any bugs in the implementation. So, run a test program like this with a variety of queue implementations that have known bugs in them and make sure that the test program finds them. Conversely, a test program may be broken in that it finds bugs that do not exist. In my experience, it is often harder to implement the test program than the algorithm that the test program tests.

As with any other test program, Figure 13.1 may not trigger extant bugs, but it nonetheless inspires reasonable confidence that the queue implementation is correct, at least sequentially. The higher NOPS, the higher the confidence. It is possible to write similar programs in other languages such as Python, but the choose expression in Harmony makes it relatively easy to explore all corner cases. For example, a common programming mistake is to forget to update the tail pointer in get() in case the queue becomes empty. Normally, it is a surprisingly tricky bug to find. You can comment out those lines in Figure 11.3 and run the test program---it should easily find the bug and explain exactly how the bug manifests itself, adding confidence that the test program is reasonably thorough.

The test program also finds some common mistakes in using locks, such as forgetting to release a lock when the queue is empty, but it is not designed to find concurrency bugs in general. If you remove all acquire() and release() calls from Figure 11.3, the test program will not (and should not) find any errors, but it would be an incorrect implementation of a concurrent queue.

The next step is to test if the queue implementation meets the concurrent queue specification or not. Figure 11.1(b) shows the concurrent queue specification. It is similar to the sequential specification in Figure 11.1(a) but makes all operations (except instantiation itself) atomic. Testing the implementation of a concurrent queue specification is trickier than testing the implementation of a sequential one because there are many more scenarios to check.

We would like a way that---similar to the sequential test---systematically compares behaviors of the concurrent queue implementation with behaviors of the concurrent queue specification. But we cannot do this by composing the specification and the implementation and simply run the same test operations on both as we did before---concurrency make the operations non-determistic and thus the specification and implementation of a single execution might produce different results, even if both are correct. Instead, we will create a test program that tries various concurrent combinations of queue operations, and run it twice: once against the specification of the concurrent queue and once against the implementation. We will then check if the behaviors obtained from running the implementation are also behaviors obtained from the specification.

qtestpar.hny
import queue

const NOPS = 4
q = queue.Queue()

def put_test(self):
print("call put", self)
queue.put(?q, self)
print("done put", self)

def get_test(self):
print("call get", self)
let v = queue.get(?q):
print("done get", self, v)

nputs = choose {1..NOPS-1}
for i in {1..nputs}:
spawn put_test(i)
for i in {1..NOPS-nputs}:
spawn get_test(i)

queueseq.hny
def Queue():
result = { .data: [], .head: 0, .tail: 0 }

def put(q, v):
let i = q->tail:
q->data[i] = v
q->tail = i + 1

def get(q):
if i == q->tail:
result = None
else:
result = q->data[i]


We will start with a test program that tries concurrent combinations of various queue operations. Figure 13.2 shows the test program. It starts NOPS threads doing either a put or a get operation. It selects the fraction of put over get operations nondetermistically, although avoiding the uninteresting case in which there are none of one of them. In case of a put operation, the thread enqueues its own name (which is provided as an argument to the thread). In order to capture the behaviors, each thread prints what operation it is about to perform, and afterwards it prints that the operation has completed (including the return value if any). This is probably much like you would do if you were trying to find a bug in a program.

Figure 13.2 also shows the deterministic finite automaton that describes the possible outputs when the test program is run against the specification in the case NOPS = 2---for NOPS = 4 it would be much too large to print here. Since there are no cycles in the DFA, you can follow some paths through this DFA and see that they are valid interleavings of the threads. You can obtain this output yourself by running

$harmony -c NOPS=2 -o spec.png code/qtestpar.hny  If you run the same test program against the implementation of Figure 11.3, you will get the same output: $ harmony -c NOPS=2 -o impl.png -m queue=queueconc code/qtestpar.hny


You can try this for various NOPS, although it gets increasingly harder to check by hand that the generated DFAs are the same as NOPS increases. Now run the test program against Figure 13.3, which is clearly an incorrect implementation of the concurrent queue specification because it contains no synchronization code. However, Harmony does not immediately detect any problems. In fact, for $$\mathtt{NOPS} = 2$$ it even generates the same set of behaviors. This is because the test program only outputs the behaviors---it does not check if they are correct.

Harmony does have a way to check the behaviors of one program against the behaviors of another. In particular, we want to check if the behaviors of the implementations we have match behaviors of the specification. The following shows, for example, how to check the queueconc.hny implementation on the command line:

$harmony -o queue4.hfa code/qtestpar.hny$ harmony -B queue4.hfa -m queue=queueconc code/qtestpar.hny


The first command runs the code/qtestpar.hny program (with the default 4 threads) and writes a representation of the output DFA in the file queue4.hfa. The second command runs the same test program, but using the queue implementation in the file code/queueconc.hny. Moreover, it reads the DFA in queue4.hfa to check if every behavior of the second run of the test program is also a behavior of the first run. You can try the same using the code/queueseq.hny implementation and find that this implementation has behaviors that are not allowed by the specification.

## Exercises

13.1 Figure 8.1 shows a specification of a lock. Write a program that checks the behaviors of lock implementations such as Figure 10.1 and Figure 10.2. That is, it should not rely on assertions such as in Figure 5.2.

13.2 Write a Harmony program that checks if Figure 12.3 satisfies the specification of Figure 12.1 sequentially.

13.3 Write a Harmony program that checks if Figure 12.3 satisfies the specification of Figure 12.1 concurrently.

13.4 Rewrite Figure 13.1 so it only imports queue and runs NOPS nondeterministically chosen operations against it (similar in style to Figure 13.2 but without threads). Then use behaviors to check that Figure 11.3 and Figure 13.3 are correct sequential implementations of the queue. Check your test program by also trying it on one or two buggy queue implementations.