Multithreading: I heard I could use lockfree programming

Lock-free means that you do not need system's mutual exclusion when manipulating a structure or a set of data.
Lock-free programming relies, very often, on an Atomic Compare-And-Swap operation.

The Atomicity also implies that no other core/processor will access the data while the instruction is being executed. It is sometimes implicit, sometimes not. For this reason; if you develop on windows, you need to use: "Interlocked" prefixes…

This is confusing, because sometime lock-free implementation use "Interlocked" functions

Many articles showed lock-free structure to be somewhat useful when you try to scale to N processors… Unfortunately, many free implementation were removed because of patents. Up to this date, I have met no free and efficient c++ based implementation, but I might be wrong (surely) and it doesn't mean it does not exist (surely exists)… Feel free to mail me any implementation

Simple lock-free: static arrays, vector, matrices

Lets take an example: you store results of ray casts onto a big static array, no need of special synchronization, every thread write on the vector, and when they are all done, you show results on screen…

Another good example is double buffering. it is a common technique in computer graphics. You use two buffers: one for reading; and one for writing. When you are done writing on the second buffer, you swap them. You are then sure that no reading/writing is done on the same buffer simultaneously.

More advanced lock-free: lists, queue

Microsoft(Yes I am sold to Microsoft) offers a single linked interlocked list

However there exists another implementation on the internet, this implementation rely on a Critical Section to implement an atomic Compare-And-Swap. You can always replace it with your os/platform primitives.

Sadly, even with that, it showed less efficient than a Mutex based implementation in my benchmarks… but hey! It is proof of concept :-)

Lock-free ? Really a kick-ass performance gain ?

Well, if 4 threads are going to use a list about 10 times/second and they will hardly do it at the exact same moment, your mutex/critical section implementation will rarely delay threads.
The mutex will introduce cpu cycle but each core/processor is executing about billions of cycles.

With a Lock-free implementation, you might be saving (10 * 2250 cycles * 4 cores/processors) 100.000 cpu cycles. Fine, but it is still less than 1% of the total computation…
Lock-free is may be not needed in this case. It is bit like using a truck to pick some fries in a McDrive….

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License