Conviviality: XP Kernel

Having been the lead architect on more than a few security projects - I can really appreciate the aims of the windows security kernel API.

A token to represent every object in the system. Objects live everywhere, as files, users, domains, et cetera.

But somewhere along the development timeline, vain attempts to obfuscate the true meaning of certain elements were chosen over safe techniques, using provably secure algorithms and clear documentation.

Simply speaking, it became an extreme example of complication with varying true security value. Drivers which run within the kernel are treated as aliens, pretending to be running in restrictions which are entirely fictitious fantasies - leading to insecurities and ultimately more bugs.

Provable security is difficult enough within the computer environment - safe timing, "true" randomness and safe, tested cryptographic algorithms are difficult enough as it is.

Don't get me wrong - the security paradigm of Windows is arguably much better (at least theoretically) than Unix. It's just done very badly.

For example, having to decide if I need a Security Context, a SID, LUID, SecurityUserData, or perhaps an Access token is inane and utter bull. All these types could easily be represented as a single type, with perhaps a non-referenced user-mode handle for safety.

So, for now, I put this down on my " Top Ten Horrors of the Security World."

So I made an innovative discovery last month... probably nothing you all are interested in, but I wanted to document it as mine :-). I've never seen this done before and I'm a bit happy about it.

DISCLAIMER: This is probably not interesting to anyone who reads this blog - but it's a curious mental exercise for myself (and you)!

A common problem in multi-threaded design is locking... as in two or more threads/processes/CPU's are sharing access to a common piece of information - be it a structure, block of memory, what have you.

And lately we have been using several mutex locks (mutually exclusive) - meaning one execution task can OWN it for reading and writing. Or perhaps we use semaphores (many shared) which allows several execution tasks to own it.

A more interesting innovation is a read/write lock - which is used when readers outnumber writers (in general). Basically you have many tasks reading this shared data - but only a few writing to it.

There are existing systems for read/write locks in place - specifically in pthreads (unix typically) and ERESOURCES in windows. I was forced recently to write my own due to the inadequacy of the windows kernel to provide a usable lock in DISPATCH/DPC interrupt mode.

Background: The Windows XP kernel uses the same mechanism to lock as it does for pre-emption (executing higher-priority code while in the middle of doing something else). This "feature" means that when a thread wants to lock for something that takes a long while (oh, like say, encryption) that in effect you are killing everything else running on that particular CPU.

So, the solution I propose is to use an atomic read/write lock that does not utilize DISPATCH/DPC mode at all. Where this is tricky though - is that in order to utilize this - you have to atomically compare and store both the read lock and write lock - at the same time.

So, here's the bit of hackery! It turns out there is no "dual operand" compare and store on X86. There was on the 68040 - but it's long dead. But what we DO have is a long long compare and store instruction - essentially a 64 bit CAS (compare and store) called CMPXCHG8B.

We're creating a pair of spin locks here - the read lock allows up to 2^32 readers and the write lock is for a single writer. The writer gets precedence over readers by forcing them to wait until completion.

The hackery comes in play in that we implement the locks in a structure like this - and make it's total length 64 bits like so:


// Packed on 4 byte boundary
__declspec(align(4)) typedef struct {
        LONG        ro_locks;
        LONG        rw_gate;
    } GATE_LOCK;

Ah HA! So this suddenly becomes a much simpler problem. Now we can operate on both the read/write locks atomically with a single instruction!

So, to create a read lock - we do this:


volatile __int64 c, old;
        // c is volatile, should be atomic
        c = lock->gateLock.ro_locks;
        // We're comparing the entire gate lock with the ro_locks count
        // Should block if a read-write lock has been set

        while(  (old = CAS(     ( volatile unsigned __int64 *) & lock->gateLock,
                                c + 1, c )) != c+1) {
                c = old & 0xffffffff;
        } // while RO_LOCKS

Notice the 0xffffffff - this is masking (setting to zero) the top bits in the 64 bit integer - thus requiring that the write lock is not in place.

Here's the write lock:


        // Set the gate lock, drain out the readers
        XCHNG( (volatile LONG *) &lock->gateLock.rw_gate, 1);
        // Wait for the readers to empty out
        while( (__int64)1<<32 !=    CAS(    (volatile unsigned __int64 *) & lock->gateLock,
                                (__int64 ) 1 << 32,
                                (__int64 ) lock->gateLock.rw_gate << 31 )) 
} // while RW_GATE

Right now I'm using a basic mutex lock to ensure that writers are only let in one at a time - but I'll update this code to be mutex-free soon. This will involve testing for zero and updating the value to 1 - a basic spin-lock in essence.

Conviviality

Tuesday, November 03, 2009

Notes on Windows Kernel Security...

Monday, September 24, 2007

CompSci Research interest...

Contributors

Blog Archive