DISCLAIMER: This is probably not interesting to anyone who reads this blog - but it's a curious mental exercise for myself (and you)!
A common problem in multi-threaded design is locking... as in two or more threads/processes/CPUs sharing access to a common piece of information - be it a structure, a block of memory, what have you.
And lately we have been using mutex locks (mutually exclusive) - meaning exactly one execution task can OWN the data for reading and writing - or perhaps semaphores, which allow a bounded number of execution tasks to own it at once.
A more interesting variation is the read/write lock - used when readers outnumber writers (which is the common case). Basically you have many tasks reading this shared data - but only a few writing to it.
There are existing read/write lock implementations - specifically pthread_rwlock in pthreads (unix, typically) and ERESOURCE in the windows kernel. I was forced recently to write my own because the windows kernel doesn't provide a read/write lock that's usable at DISPATCH/DPC interrupt level.
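For reference, here's what the user-mode pthreads flavor looks like - a minimal sketch of the standard API, not the kernel-mode lock this post is about:

#include <pthread.h>

pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;

void reader(void) {
    pthread_rwlock_rdlock(&rw);    // any number of readers at once
    /* ... read the shared data ... */
    pthread_rwlock_unlock(&rw);
}

void writer(void) {
    pthread_rwlock_wrlock(&rw);    // exclusive; waits for readers to drain
    /* ... modify the shared data ... */
    pthread_rwlock_unlock(&rw);
}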
Background: The Windows XP kernel uses the same mechanism for locking as it does for controlling pre-emption (executing higher-priority code while in the middle of doing something else) - taking a spin lock raises the IRQL to DISPATCH_LEVEL, which blocks everything else from being scheduled on that CPU. This "feature" means that when a thread takes a lock around something that runs a long while (oh, like say, encryption), you are in effect killing everything else on that particular CPU.
So, the solution I propose is an atomic read/write lock that does not touch DISPATCH/DPC level at all. Where this gets tricky is that to pull it off, you have to compare and store both the read lock and the write lock atomically - at the same time, in a single operation.
So, here's the bit of hackery! It turns out there is no "dual operand" compare and store on X86. There was one on the 68040 (CAS2) - but that chip is long dead. What we DO have is a long long compare and store - essentially a 64-bit CAS (compare-and-swap) instruction called CMPXCHG8B.
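The snippets below use CAS() and XCHG() as thin wrappers. Their definitions aren't shown here, so the following is just one plausible way to spell them with the MSVC interlocked intrinsics (my assumption - any wrapper that compiles down to LOCK CMPXCHG8B and XCHG will do):

#include <intrin.h>

// CAS(dest, exchange, comparand): atomically, if *dest == comparand then
// *dest = exchange. Always returns the value *dest held BEFORE the call.
// On 32-bit x86 this intrinsic emits LOCK CMPXCHG8B.
#define CAS(dest, exchange, comparand) \
    _InterlockedCompareExchange64((volatile __int64 *)(dest), \
                                  (__int64)(exchange), (__int64)(comparand))

// XCHG(dest, value): atomically store value, returning the old contents.
#define XCHG(dest, value) \
    _InterlockedExchange((volatile long *)(dest), (long)(value))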
We're creating a pair of spin locks here - the read lock is a counter that admits up to 2^32 - 1 simultaneous readers, and the write lock is a gate for a single writer. The writer gets precedence over readers: once the gate is set, new readers are forced to wait until the writer completes.
The hackery comes into play in that we implement both locks in a single structure and make its total length 64 bits, like so:
// Two 32-bit halves of one 64-bit word. On little-endian x86, ro_locks
// (the first field) is the LOW half and rw_gate is the HIGH half.
// 8-byte aligned so CMPXCHG8B operates on a naturally aligned quadword.
typedef __declspec(align(8)) struct {
    LONG ro_locks;   // count of active readers
    LONG rw_gate;    // 1 when a writer holds (or is draining) the lock
} GATE_LOCK;
Ah HA! So this suddenly becomes a much simpler problem. Now we can operate on both the read/write locks atomically with a single instruction!
So, to create a read lock - we do this:
volatile __int64 c, old;

// Initial guess at the lock's value: the current reader count, widened
// to 64 bits with a zero high half - i.e. we assume the gate is clear
c = lock->gateLock.ro_locks;

// Try to swap in c+1 (one more reader). CAS returns the PREVIOUS 64-bit
// value: getting back c means we won. Anything else means the count
// moved or a writer holds the gate - refresh the guess and retry.
while( (old = CAS( (volatile unsigned __int64 *) & lock->gateLock,
                   c + 1, c )) != c ) {
    c = old & 0xffffffff;
} // while RO_LOCKS
Notice the 0xffffffff - it masks off (sets to zero) the top 32 bits of the 64-bit value, i.e. the write gate. Because the comparand always has a zero high half, the CAS can only succeed while no write lock is in place - readers spin until the writer is done.
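The matching read unlock isn't shown above, so this is my assumption about the intended design: releasing is just an atomic decrement of the reader count - no 64-bit CAS needed, since a reader leaving never conflicts with the gate:

// Release a read lock: one fewer reader. A writer spinning in its
// drain loop will observe the count hit zero and proceed.
InterlockedDecrement( (volatile LONG *) &lock->gateLock.ro_locks );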
Here's the write lock:
// Set the gate lock - from here on, new readers spin in their CAS loop
XCHG( (volatile LONG *) &lock->gateLock.rw_gate, 1 );

// Wait for the readers to empty out: spin until the whole 64-bit word
// is exactly the gate bit - rw_gate == 1 up top, ro_locks == 0 below.
// Exchange and comparand are the same value, so a successful CAS leaves
// the word untouched; we're using it purely as an atomic 64-bit test.
while( CAS( (volatile unsigned __int64 *) & lock->gateLock,
            (__int64) 1 << 32,       // exchange: leave the value as-is
            (__int64) 1 << 32 )      // comparand: gate set, no readers
       != (__int64) 1 << 32 )
    ; // while RW_GATE
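Releasing the write lock isn't shown either (again, my assumption about the intended design), but it should just be a matter of clearing the gate, which lets the readers' spinning CAS succeed again:

// Release the write lock: clear the gate, letting readers back in
XCHG( (volatile LONG *) &lock->gateLock.rw_gate, 0 );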
Right now I'm using a basic mutex to ensure that writers are only let in one at a time - but I'll update this code to be mutex-free soon. That will involve atomically testing the gate for zero and updating it to 1 - a basic spin lock, in essence.
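As a sketch, that mutex-free writer admission could be a 32-bit interlocked compare-exchange on the gate itself (my guess at the intended change, not tested code):

// Admit one writer at a time: atomically flip rw_gate from 0 to 1.
// InterlockedCompareExchange returns the previous value - anything
// other than 0 means another writer beat us to it, so retry.
while( InterlockedCompareExchange( (volatile LONG *) &lock->gateLock.rw_gate,
                                   1, 0 ) != 0 )
    ; // spin until we win the gate

This would also replace the plain XCHG at the top of the write path, since winning the 0-to-1 transition both sets the gate and excludes other writers in one step.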