Saturday, May 29, 2010

Word Tearing

... Or why you are not guaranteed a coherent value when you are reading longs and doubles in Java.

How is that possible?

This is mainly due to a language compromise to deal with 'legacy' 32-bit CPU systems, where a single 64-bit memory action is divided into two 32-bit memory actions, historically for efficiency reasons (refer to §17.7, Non-Atomic Treatment of double and long, of the Java Language Specification for an explicit answer as to why). In practice this is typically not a problem on native 64-bit architectures, where JVMs usually perform 64-bit accesses atomically, though the specification itself does not require them to.

So how does this translate to a possible problem?

Let's try to simplify this problem by illustrating the same scenario with a 2-bit memory action that is split into two 1-bit memory actions. So assume that we have a memory location m, which looks something like this:

m:  [ m[0] | m[1] ]
From the diagram, m is logically separated into two 1-bit locations, m[0] and m[1]. Let's assume that m starts off with 00b as its initial value. Then m looks like this:

m:  [ 0 | 0 ]   (= 00b)
Now suppose that we have 2 CPUs concurrently executing the following code:

CPU1          CPU2
R(m)          W(m = 11b)

R(m) is a function that reads the value of m, while W(m) is a function that writes a value to m. In our example, CPU1 is reading the value of m while CPU2 is writing the binary value 11b into m.

Logically, if atomicity were enforced, then the only possible values that could be observed at any given time are 00b and 11b. But in the case of longs and doubles, the specification explicitly states that no guarantee of atomicity is made.

Hence it is possible to timeslice memory updates in the following order:

CPU2: W(m[1] = 1)     m is now 01b
CPU1: R(m)            observes 01b
CPU2: W(m[0] = 1)     m is now 11b
As CPU2's execution is timesliced in between CPU1's, R(m) returns the value 01b, a transient value that is never a proper value to be observed.

It can be said that CPU1 happens to be reading at the right place but at the wrong time. The volatile keyword in Java resolves this problem; the specification guarantees that reads and writes of volatile longs and doubles are always atomic. Volatile also does not allow the reordering of read and write actions, which implicitly disallows a read in between the two halves of a write and vice versa.

This has the effect of making read and write actions appear atomic, not that the read and write actions are implicitly atomic themselves. For a better understanding of what I mean by atomic, see my explanation on the difference between Atomic and Volatile.
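
For completeness, here is a minimal sketch of a program that could expose word tearing; the class name WordTearingDemo and the chosen bit patterns are my own, not from the specification. A writer thread flips a shared, non-volatile long between all-zeros and all-ones, while the reader checks for a value that is neither. On most modern 64-bit JVMs this will spin forever without detecting a tear, since those implementations happen to perform 64-bit accesses atomically; declaring the field volatile makes the atomicity guarantee explicit.

public class WordTearingDemo {
    static long shared = 0L; // deliberately NOT volatile

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            while (true) {
                shared = 0L;  // both 32-bit halves zero
                shared = -1L; // both 32-bit halves set (0xFFFFFFFFFFFFFFFF)
            }
        });
        writer.setDaemon(true);
        writer.start();

        while (true) {
            long v = shared; // on a 32-bit JVM, this read may be torn
            if (v != 0L && v != -1L) {
                System.out.printf("Torn read: 0x%016x%n", v);
                return;
            }
        }
    }
}

Note that the JIT is also free to hoist or coalesce these non-volatile accesses, so treat this strictly as an illustration of the model rather than a reliable reproduction.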
Saturday, May 22, 2010

Google Docs's Equation Editor does not Create Permalink Images

I initially thought I could get away with linking the LaTeX images for my equations directly from Google Docs to Blogger, but it seems that the image URLs are not permalinks, causing the images in my blog posts to go missing after a while.

WordPress has a plugin that supports this directly, but there's nary a solution for Blogger.

Still trying to look for possible workarounds to this problem. Hopefully it's not one that involves manually copying and linking the images directly. :(
Saturday, May 15, 2010

Difference between Atomic and Volatile?

There seems to be a tendency for people to confuse an atomic action with a volatile memory access. This is quite natural, since the two exhibit similar properties, which makes it hard to distinguish between them.

So what is the common property between an atomic action and a volatile memory access? They both exhibit the property of total ordering. Let A be the set of possible actions a program can execute, let a, b, c \in A, and let a \rightarrow b denote that action a precedes action b. Then the property of total order means:

(1) a \rightarrow b \wedge b \rightarrow a \Rightarrow a = b

(2) a \rightarrow b \wedge b \rightarrow c \Rightarrow a \rightarrow c

(3) a \rightarrow b \vee b \rightarrow a

Literally, the set of rules means: (1) if action a precedes action b and b precedes a, then a must necessarily be the same action as b (antisymmetry); (2) if a precedes b, and b precedes c, then a must precede c (transitivity); (3) either a precedes b or b precedes a (totality). These are just mathematical (or convoluted) formalisms for saying that every action is guaranteed to occur in a single sequential order.

But that's where the similarity ends.

An atomic action is literally what it means: an action that is indivisible. Volatile memory only provides a guarantee of sequential order; it offers no guarantee that an interleaving of actions cannot occur. For an illustration, let's take a look at the following code:

a = 0;
a++;

Assume that 'a' is a volatile variable. From the code, we naturally think that the increment operator (++) will atomically increment the value of 'a' to 1, but this assumption is slightly faulty:

[a] = 0;        // write 0 to memory location a
r0 = [a];       // read the value of a from memory into register r0
r0 = r0 + 1;    // increment within the register
[a] = r0;       // write the result back to memory

The square brackets around a denote a value accessed from memory, while r0 represents the register in which the addition actually occurs. An increment operation is actually a composite action, composed of a read, a modification, and then a write. (By the way, the register operation r0 = r0 + 1 is taken to be atomic, even though the increment as a whole is composite, because an operation within a register cannot be directly observed from outside.)

So increment is a composite action rather than an atomic one; what's the harm? Let's go through the same example, but with 2 concurrent processors executing simultaneously:

CPU1          CPU2
a++;          a++;

Let's assume that a is initially 0. Given that the actions from CPU1 and CPU2 are both volatile and atomic, the only possible value that can be observed after execution is 2.

However, if the actions from CPU1 and CPU2 are only volatile, then the execution can translate to this:

CPU1            CPU2
r0 = a;         r1 = a;
r0 = r0 + 1;    r1 = r1 + 1;
a = r0;         a = r1;

For all the possible decompositions of this concurrent execution into uniprocessor-MT interleavings, one can see that it is possible for registers r0 and r1 to both read a as 0 simultaneously, which causes the resultant value to be either 1 or 2. Clearly, seeing 1 is wrong: if the operation were atomic, then 1 could not be a legal value at the end of the execution.
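
To see this effect empirically, here is a small sketch of my own (the class name LostUpdateDemo and the iteration count are arbitrary choices): two threads each increment a shared volatile int a million times. If the increments were atomic, the result would always be 2,000,000; in practice, lost updates usually leave it lower.

public class LostUpdateDemo {
    static volatile int a = 0;

    public static void main(String[] args) throws InterruptedException {
        final int N = 1_000_000;
        Runnable inc = () -> {
            for (int i = 0; i < N; i++) {
                a++; // read a, add 1, write a: three actions, not one
            }
        };
        Thread t1 = new Thread(inc);
        Thread t2 = new Thread(inc);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("expected " + (2 * N) + ", got " + a);
    }
}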

Conversely, a volatile variable can sometimes be deemed atomic. Consider the example below:

CPU1          CPU2
a = 1;        a = 2;

In this case, the only possible terminal values are 1 and 2, which is exactly what you would get with an atomic action. Therefore, a volatile memory access is atomic iff you perform only a single read or write action, which is atomic in nature; a volatile variable is no guarantee of atomicity when a composite action is executed.

Hence the label 'atomic action' is a slight misnomer: it doesn't imply that the underlying action is indivisible by nature, but rather that a composite action is guaranteed to execute as if it were a single atomic action. A well known example of such a composite action is the Compare-and-Swap instruction.
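
As a sketch of how a composite read-modify-write can be made to execute atomically, here is the classic compare-and-swap retry loop, written against java.util.concurrent.atomic.AtomicInteger; the class and method names are illustrative choices of mine.

import java.util.concurrent.atomic.AtomicInteger;

public class CasIncrement {
    static final AtomicInteger a = new AtomicInteger(0);

    static int increment() {
        int current, next;
        do {
            current = a.get();   // read
            next = current + 1;  // modify, in private (unobservable) state
        } while (!a.compareAndSet(current, next)); // write only if a is unchanged
        return next;
    }

    public static void main(String[] args) {
        System.out.println(increment()); // prints 1
    }
}

The compareAndSet call publishes the new value only if no other thread has changed a since it was read, so the composite action appears indivisible to every observer.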

For simplicity, just remember that atomicity is not a property of memory, but a property of its actions, which a volatile memory access may or may not uphold.
Saturday, May 08, 2010

Concurrency vs. Multi-Threading

A lot of people tend to treat concurrency and multi-threading as the same thing. In many ways they are correct, given the similarities between the two. But when trying to reason about program behaviour, teasing out the distinction is somewhat important. Concurrency, C, is by definition multi-threading; but multi-threading, MT, is not necessarily concurrent.

To make it clear why concurrency is not strictly multi-threading, we start off by defining what concurrency is. Let P be the set of all program actions, let A, B \subseteq P, and let a, b, c \in P. As before, a \rightarrow b denotes that action a precedes action b. The ordering obeys:

(1) A \rightarrow B \wedge B \rightarrow A \Rightarrow A = B

(2) a \rightarrow b \wedge b \rightarrow c \Rightarrow a \rightarrow c

(3) A \rightarrow A

The explanation is actually straightforward when expressed in plain English.

The set of rules says: (1) if all the actions in set A precede all the actions in set B, and all the actions in B precede all the actions in A, then the two sets of actions are equivalent; (2) if action a precedes action b, and b precedes action c, then a precedes c. This is known as a transitive relation; while it is stated here for individual actions, the relationship holds between sets of actions as well. (3) The set of actions A precedes itself, which loosely means that if an action has already happened, it cannot happen again; this is used to describe how a program flows. Note that, unlike a total order, there is no rule requiring A \rightarrow B \vee B \rightarrow A: two sets of actions may simply be unordered with respect to each other, and that is precisely what allows them to be concurrent.

To put it simply, concurrency means that within a given quantum of time, one or more program actions can be happening simultaneously. A set of actions on a given thread T1 will be causally ordered only by the thread itself; however, this set of actions may or may not have an ordering relationship with the set of actions on another thread T2.

Now that we understand what concurrency is, how is multi-threading different?

Consider multi-threading on a single-CPU machine. On a uni-core CPU, all actions are totally ordered, i.e. no two actions can happen within the same quantum of time; this is not the case for a multi-core (concurrent) CPU. Obviously, we can easily make a multi-core CPU function like a single-core CPU, but certainly not the other way around, which is why C \geq MT.
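
As a rough sketch of that first direction (the class name TotalOrderDemo is mine), a single-threaded executor imposes a total order on tasks that a thread pool could otherwise run concurrently; the converse, making a strictly sequential machine execute two actions in the same instant, is impossible.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TotalOrderDemo {
    public static void main(String[] args) {
        // One worker thread: submitted tasks execute one at a time,
        // in submission order, i.e. in a total order.
        ExecutorService single = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 4; i++) {
            final int n = i;
            single.submit(() -> System.out.println("task " + n)); // always prints 0, 1, 2, 3
        }
        single.shutdown();
    }
}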

Even though C-type programs are usually what we are interested in, we often assign an arbitrary program order to them so that they functionally behave like single-core MT programs, which are much easier to reason about. You see these C to totally ordered MT conversions from time to time, such as in my discussion of the Implementation of Volatile in the JVM.

To put it simply, threading is a way of describing program actions for concurrent programs; sequential programs can also be reasoned about in the context of threads, but not the other way round. For a simple illustration, consider the following program:


public class MyLock {
    private volatile long thread_id;

    public void lock ( )
    {
        long my_id = Thread.currentThread ( ).getId ( );
        thread_id = my_id;                  // announce this thread as the latest arrival
        while ( thread_id == my_id ) { }    // wait until another thread overwrites thread_id
    }
}


The code is multi-threaded; assume that there are 2 threads executing at the moment, with threads T1 and T2 returning IDs 1 and 2 respectively.

There is an interesting property in this piece of code: if the threads run concurrently, then the thread_id value will eventually change, so the code never gets stuck (and, as an interesting side effect, the threads will be fairly scheduled). But if the threads run sequentially, then the first thread to enter the lock ( ) code halts the processor in its wait loop.
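
As a hypothetical driver of my own (the class name MyLockDemo is not from the post), the following launches two threads through lock ( ). When the threads truly run concurrently, the later write to thread_id releases the other thread from its wait loop; and since this is only a fragment of Peterson's algorithm, the thread whose ID was written last keeps spinning until some other thread calls lock ( ) again.

public class MyLockDemo {
    public static void main(String[] args) {
        final MyLock lock = new MyLock();
        Runnable r = () -> {
            lock.lock(); // returns once another thread overwrites thread_id
            System.out.println(Thread.currentThread().getId() + " got past lock()");
        };
        new Thread(r).start();
        new Thread(r).start();
    }
}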

For those who may be interested, the code in this example is part of an implementation of Peterson's algorithm, and a simple but poignant example of how concurrent programs can differ from multi-threaded ones.