09/23: Test-and-set and performance

Caching Locks

Spin lock: to acquire a lock, a process may enter an infinite loop that keeps attempting a read-modify till it succeeds

If the lock is in memory, there is heavy bus traffic -> other processes make little forward progress

Locks can be cached (Still quite expensive):
•    Cache coherence ensures that a lock update is seen by other processors
•    The process that acquires the lock in exclusive state gets to update the lock first
•    Spin on a local copy – the external bus sees little traffic


  1. Perform test and set on P1
  2. request a lock
  3. change the state to Modify
  1. Perform test and set on P2
  2. Request state change
  3. Change the state on P1 from Modify to Invalid
  4. Change P2 state to Modify


Coherence Traffic for a Lock (test and test & set)

If every process spins on an exchange, every exchange instruction will attempt a write ? many invalidates and the lock value keeps changing ownership (Each thread is not going to do test and set right away)

Hence, each process keeps reading the lock value – each read does not generate coherence traffic and every process spins on its locally cached copy

When the lock owner releases the lock by writing a 0, other copies are invalidated, each spinning process generates a read miss, acquires a new copy, sees the 0, attempts an exchange (requires acquiring the block in exclusive state so the write can happen), first process to acquire the block in exclusive state acquires the lock, others keep spinning.

Test and Test and Set

test     register, location
bnz      register, lock
t&s      register, location
bnz      register, lock
St       location, #0

The performance is better because the traffic in the network gets busy only when the lock is released. For the remaining of the time, each thread will just keep spinning on their lock.


Reference Video

09/18 – Producer and Cosumer


09/09: Lab – Java threading exercise

Inspiration from Jamie King’s Youtube tutorials on Threading – Divide and Conqueror Example. Instead of using C#, the below code is in Java (yay!).

import java.util.Random;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ThreadExercise extends Thread {

	private final static Logger LOGGER = Logger.getLogger(ThreadExercise.class.getName());

	static int cores = Runtime.getRuntime().availableProcessors();

	final static int NUM_OF_VALUES =  cores * 55000000;
	static int[] values = new int[NUM_OF_VALUES]; 

	static int portionSize = values.length / cores;
	static int[] portionResults = new int[cores];
	private int portionNumber;

	public ThreadExercise(int portionNumber) {
		this.portionNumber = portionNumber;

	public void run() {
		// Summing each portion in multi-thread
		int sum =0;
		int baseIndex = portionNumber * portionSize;
		for( int i=baseIndex; i<baseIndex + portionSize; i++) {
			sum += values[i];
		portionResults[portionNumber] = sum;
		LOGGER.info(portionNumber + ": " + sum);

	public static void generateInts() {
		Random randomGenerator = new Random();
		for( int i=0; i<values.length; i++ ) {
			int randomInt = randomGenerator.nextInt(10);
			values[i] = randomInt;

	public static void main(String args[]) {


				Using one thread
			System.out.println("Summing with one thread");
			int total = 0;
			long startTime = System.currentTimeMillis();

			for( int i=0; i<values.length; i++ ) {
				total += values[i];

			long elapsedTime = System.currentTimeMillis() - startTime;

			System.out.println("Total value is: " + total);
			System.out.println("Time to sum: " + elapsedTime + " ms\n");

				Using multiple threads
			System.out.println("Summing with multiple threads");
			int total = 0;
			long startTime = System.currentTimeMillis();

			Thread[] thread = new Thread[cores];

			// multi-threading: summing in each portion
			for( int i=0; i<cores; i++ ) {
				thread[i] = new ThreadExercise(i);

			// join threads to wait for all threads to be done
			// before proceeding the next step
			for( int i=0; i<cores; i++ ) {
				try {
				} catch (InterruptedException e) {

			// sum portion to compute final result
			for( int i=0; i<cores; i++ ) {
				total += portionResults[i];

			long elapsedTime = System.currentTimeMillis() - startTime;

			System.out.println("Total value is: " + total);
			System.out.println("Time to sum: " + elapsedTime + " ms");


My system has 4 cores and here’s the result:

Summing with one thread
Total value is: 990026095
Time to sum: 101 ms

Summing with multiple threads
Total value is: 990026095
Time to sum: 76 ms





09/04: Two Professor’s Problem

The two professor’s problem is basically a two general’s problem except that the generals are replaced with professors and the soldiers are now birds.

Proposed algorithm (in sudo code):


chosse dates
order list in ascending order
send list to the other professor


receive list
wait until timeout
if (rcv)
choose the earliest acceptable date
resend msg
goto wait


Other possible enhacement to be made:

  • Assume bound communication
  • Adaptive timeout
  • Fault masking:
    • message retransmission
    • send heartbeat messages (telling the other professor that I have not yet crushed but still working on the schedule)
  • Reliability:
    • Use more birds to flood the message
    • Quorum: Increase the number of professors
      So we only need the majority votes and ignored the crushed professor
  • Fault detector:
    • Use admin to check if the prof is still there