09/23: Test-and-set and performance

Caching Locks

Spin lock: to acquire a lock, a process may enter an infinite loop that keeps attempting a read-modify till it succeeds

If the lock is in memory, there is heavy bus traffic -> other processes make little forward progress

Locks can be cached (Still quite expensive):
•    Cache coherence ensures that a lock update is seen by other processors
•    The process that acquires the lock in exclusive state gets to update the lock first
•    Spin on a local copy – the external bus sees little traffic


  1. Perform test and set on P1
  2. request a lock
  3. change the state to Modify
  1. Perform test and set on P2
  2. Request state change
  3. Change the state on P1 from Modify to Invalid
  4. Change P2 state to Modify


Coherence Traffic for a Lock (test and test & set)

If every process spins on an exchange, every exchange instruction will attempt a write ? many invalidates and the lock value keeps changing ownership (Each thread is not going to do test and set right away)

Hence, each process keeps reading the lock value – each read does not generate coherence traffic and every process spins on its locally cached copy

When the lock owner releases the lock by writing a 0, other copies are invalidated, each spinning process generates a read miss, acquires a new copy, sees the 0, attempts an exchange (requires acquiring the block in exclusive state so the write can happen), first process to acquire the block in exclusive state acquires the lock, others keep spinning.

Test and Test and Set

test     register, location
bnz      register, lock
t&s      register, location
bnz      register, lock
St       location, #0

The performance is better because the traffic in the network gets busy only when the lock is released. For the remaining of the time, each thread will just keep spinning on their lock.


Reference Video