Lecture Notes
Dr. Tong Lai Yu, March 2010
    0. Review and Overview
    1. An Introduction to Distributed Systems
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Processes
    5. Communication
    6. Distributed OS Theories
        7. Distributed Mutual Exclusions
    8. Agreement Protocols
    9. Distributed Scheduling
    10. Distributed Resource Management
    11. Recovery and Fault Tolerance
    12. Security and Protection
    Distributed Mutual Exclusion
    Life consists not in holding good cards but in playing those you hold well.
    							Josh Billings
    1. Introduction

    2. A Centralized Algorithm
      • One process is elected as the coordinator.
      • Whenever a process wants to access a shared-resource, it sends request
        to the coordinator to ask for permission.
      • Coordinator may queue requests.
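      The coordinator's behaviour can be sketched as below. This is a
      minimal single-machine illustration, not a networked implementation;
      the class name Coordinator and its methods request and release are
      hypothetical names chosen for this sketch, and a FIFO queue is
      assumed for the deferred requests.

```python
from collections import deque


class Coordinator:
    """Grants one shared resource to one process at a time (sketch)."""

    def __init__(self):
        self.holder = None       # process currently in the CS, if any
        self.waiting = deque()   # FIFO queue of deferred requests

    def request(self, pid):
        """Return True if permission is granted now, False if queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Release the resource; return the next process granted, or None."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
print(c.request("P1"))   # True: P1 enters the CS
print(c.request("P2"))   # False: P2 is queued
print(c.release("P1"))   # P2: permission passes to the queued process
```

      Each CS entry costs a request, a grant and a release, but the
      coordinator remains a single point of failure.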

    3. Decentralized
      • nontoken-based
      • token-based

    4. Requirements of Mutual Exclusion Algorithms
      • only one request accesses the CS at a time ( primary goal )
      • Freedom from deadlocks
      • Freedom from starvation
      • Fairness
      • Fault Tolerance

    5. Performance of a mutual exclusion algorithm

      System throughput S ( rate at which the system executes requests for the CS )

        S = 1 / ( Sd + E )
      Sd = synchronization delay
      E = average execution time
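
      As a quick numeric illustration of the throughput formula, the
      sketch below evaluates S for made-up values of Sd and E (the numbers
      and the function name throughput are assumptions for this example).

```python
def throughput(sd, e):
    """System throughput S = 1 / (Sd + E), in CS executions per time unit."""
    return 1.0 / (sd + e)


# Assumed values: Sd = 2 ms, E = 8 ms, so one CS execution every 10 ms.
print(throughput(sd=2.0, e=8.0))   # 0.1 CS executions per ms
```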
    6. low load and high load performance
    7. best and worst case performance; if fluctuates statistically, take average
    8. Election Algorithms

      An algorithm requires that some process acts as a coordinator. The question
      is how to select this special process dynamically.

      In many systems the coordinator is chosen by hand (e.g. file servers). This
      leads to centralized solutions, and hence a single point of failure.

      After a network partition, the leader-less partition must elect a leader.

      Election by bullying

      Each process has an associated priority (weight). The process with
      the highest priority should always be elected as the coordinator.

      How do we find the heaviest process?

    9. Any process can just start an election by sending an election
      message to all other processes with higher numbers.
    10. If a process Pheavy receives an election message from a lighter
      process Plight, it sends a take-over message to Plight. Plight is out of
      the race.
    11. If a process doesn't get a take-over message back, it wins, and
      sends a victory message to all other processes.
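
      The three steps above can be sketched as a small simulation. It is
      only an illustration under simplifying assumptions: message passing
      is modelled by recursion, no process crashes during the election,
      and the function name bully_election is hypothetical.

```python
def bully_election(initiator, alive):
    """Return the coordinator chosen when `initiator` starts an election.

    `alive` is the set of ids of processes that are up; a higher id means
    a higher priority.
    """
    higher = [p for p in alive if p > initiator]
    if not higher:
        # Nobody sent a take-over message back: the initiator wins.
        return initiator
    # Every live higher process replies with take-over and then holds its
    # own election; all of them converge on the same winner.
    winners = {bully_election(p, alive) for p in higher}
    assert len(winners) == 1
    return winners.pop()


# Processes 0..6 are up (a process 7 is assumed to have crashed).
print(bully_election(4, {0, 1, 2, 3, 4, 5, 6}))   # 6
```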

      (a) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop.
      (c) Now 5 and 6 each hold an election. (d) Process 6 tells 5 to stop.
      (e) Process 6 wins and tells everyone.
    12. Issue
      Suppose a crashed node comes back online:

    13. Sends a new election message to higher numbered processes
    14. Repeat until only one process left standing
    15. Announces victory by sending message saying that it is coordinator (if not already coordinator)
    16. Existing (lower numbered) coordinator yields
      Hence the term 'bully'
    17. Election in a ring

      Process priority is obtained by organizing processes into a (logical)
      ring. The process with the highest priority should be elected as
      coordinator.

    18. Any process can start an election by sending an election message
      to its successor. If a successor is down, the message is passed
      on to the next successor.
    19. If a message is passed on, the sender adds itself to the list. When
      it gets back to the initiator, everyone had a chance to make its
      presence known.
    20. The initiator sends a coordinator message around the ring
      containing a list of all living processes. The one with the highest
      priority is elected as coordinator.
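
      A minimal sketch of one election round on the ring follows. It
      assumes the ring order is known to every process, that the initiator
      is alive, and that no process fails while the message circulates;
      ring_election is a hypothetical name.

```python
def ring_election(ring, alive, initiator):
    """Circulate an election message once around a logical ring.

    `ring` lists process ids in ring order, `alive` is the set of ids that
    are up. Returns (coordinator, participants).
    """
    n = len(ring)
    participants = [initiator]
    pos = (ring.index(initiator) + 1) % n
    while ring[pos] != initiator:
        if ring[pos] in alive:            # crashed successors are skipped
            participants.append(ring[pos])
        pos = (pos + 1) % n
    # Back at the initiator: the list names every live process.
    return max(participants), participants


ring = list(range(8))
alive = {0, 1, 2, 3, 4, 5, 6}             # process 7 is assumed to be down
print(ring_election(ring, alive, 2))      # (6, [2, 3, 4, 5, 6, 0, 1])
print(ring_election(ring, alive, 5)[0])   # 6
```

      Two independent initiators collect the same set of live processes,
      so they announce the same coordinator.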

    21. 2 and 5 start election message independently.

      Both messages continue to circulate.

      Eventually, both messages will go all the way around.

      2 and 5 will convert Election messages to COORDINATOR messages.
      All processes recognize highest numbered process as new coordinator.

      Does it matter if two processes initiate an election?

      What happens if a process crashes during the election?

      Superpeer election

      How can we select superpeers such that:

    22. Normal nodes have low-latency access to superpeers
    23. Superpeers are evenly distributed across the overlay network
    24. There is a predefined fraction of superpeers
    25. Each superpeer should not need to serve more than a fixed
      number of normal nodes
    26. DHTs
      Reserve a fixed part of the ID space for superpeers. Example: if S
      superpeers are needed for a system that uses m-bit identifiers, simply
      reserve the k = ⌈log2 S⌉ leftmost bits for superpeers. With N nodes,
      we'll have, on average, 2^(k-m) N superpeers.

      Routing to superpeer
      Send message for key p to node responsible for
      p AND 11...1100...00
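
      The bit arithmetic can be sketched as follows. The function names
      and the sample values ( m = 8, S = 8, N = 1024 ) are assumptions made
      for illustration only.

```python
def superpeer_bits(s):
    """k = ceil(log2 S): number of leftmost bits reserved for superpeers."""
    return (s - 1).bit_length()


def superpeer_key(p, k, m):
    """Clear the m-k rightmost bits of key p ( p AND 11...1100...00 )."""
    mask = ((1 << k) - 1) << (m - k)
    return p & mask


def expected_superpeers(n, k, m):
    """Average number of superpeers among n nodes: 2^(k-m) * n."""
    return n * 2 ** (k - m)


k = superpeer_bits(8)
print(k)                                     # 3
print(bin(superpeer_key(0b10110101, k, 8)))  # 0b10100000
print(expected_superpeers(1024, k, 8))       # 32.0
```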

      Node Positioning Approach

    27. N tokens are spread across N randomly chosen nodes.
    28. No node can hold more than 1 token.
    29. Each token represents a repelling force.
    30. If force on token-holder node exceeds a threshold, token moves away.
    31. Eventually, they will spread evenly across the network.
    32. Election in Wireless Environment

    33. Node waits for neighbours' replies before replying to parent.

    34. Node a is the source; (a) is the initial network.
    35. (b) - (e) Tree building phase
    36. (f) Reporting of best node to source.
    37. Non-token-based algorithms

    38. Lamport's Algorithm

      Si -- site, N sites

      each site maintains a request set

        Ri = { S1, S2, ..., SN }
      request-queuei containing mutual exclusion requests ordered by their timestamps,
      use => total order relation ( with Lamport's clock )
      tsi -- timestamp of site i
      • messages are received in the same order as they are sent
      • eventually every message is received

      1. To request entering the CS, process Pi sends a REQUEST( tsi, i ) message to every process ( including itself ), puts the request on request-queuei

      2. When process Pj receives REQUEST(tsi, i ), it places it on its request-queuej and sends a timestamped REPLY ( acknowledgement ) to Pi

      3. Process Pi enters CS when the following 2 conditions are satisfied:
        • Pi's request is at the head of request-queuei
        • Pi has received a ( REPLY ) message from every other process time-stamped later than tsi

      4. When exiting the CS, process Pi removes its request from head of its request-queue and sends a timestamped RELEASE to every other process

      5. When Pj receives a RELEASE from Pi, it removes Pi's request from its request queue.

        for each CS invocation

        		(N-1) REQUEST
        		(N-1) REPLY
        		(N-1) RELEASE
        		total 3(N-1) messages
        	  synchronization delay Sd = T ( average message delay )
        Ricart, Agrawala optimized Lamport's algorithm by merging the RELEASE and REPLY messages.
        (See example below.)
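
        The entry test in step 3 of Lamport's algorithm can be sketched as
        a pure function over one site's local state. The function and
        parameter names are hypothetical; requests are modelled as
        ( timestamp, site ) tuples so that Python's tuple ordering gives
        the total order, with ties broken by site number.

```python
def can_enter(my_request, request_queue, latest_ts_from):
    """Lamport's entry test for one site (sketch).

    my_request     -- (timestamp, site_id) of this site's own request
    request_queue  -- all pending requests known here, as (timestamp, site_id)
    latest_ts_from -- {other_site: largest timestamp received from it}
    """
    at_head = min(request_queue) == my_request
    heard_from_all = all(ts > my_request[0] for ts in latest_ts_from.values())
    return at_head and heard_from_all


queue = [(1, 2), (1, 0)]   # sites 0 and 2 both requested at time 1
print(can_enter((1, 0), queue, {1: 2, 2: 2}))   # True: site 0 wins the tie
print(can_enter((1, 2), queue, {0: 2, 1: 2}))   # False: not at the head
```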


          (a) Two processes want to access a shared resource at the same time
          (b) Process 0 has the lowest timestamp, so it wins
          (c) When process 0 is done, it sends an OK also, so 2 can now go ahead

    39. Maekawa's Voting Algorithm

      Voting Algorithms:

    40. Lamport's algorithm requires a process to get permission from all other processes, which is overkill.
    41. A different approach is to let processes compete for votes. If a process has received more votes than any other process, it can enter the CS. If it does not have enough votes, it waits until the process in the CS is done and releases its votes.
    42. Quorums have the property that any two groups have a non-empty intersection.
    43. Simple majorities are quorums. Any 2 sets whose sizes are simple majorities must have at least one element in common.

        12 nodes, so majority is 7

    44. Grid quorum: arrange nodes in a logical grid (square). A quorum is all of a row and all of a column. Quorum size is 2√N - 1.
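
      The grid construction can be checked with a short sketch. It assumes
      N is a perfect square and numbers the nodes 0 .. N-1 row by row;
      grid_quorums is a hypothetical helper name.

```python
import math
from itertools import combinations


def grid_quorums(n):
    """Return {node: quorum} for n nodes on a sqrt(n) x sqrt(n) grid."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch needs a perfect-square node count")
    quorums = {}
    for node in range(n):
        r, c = divmod(node, side)
        row = {r * side + j for j in range(side)}
        col = {i * side + c for i in range(side)}
        quorums[node] = row | col         # the node's whole row and column
    return quorums


q = grid_quorums(16)
print(sorted(q[5]))                                         # [1, 4, 5, 6, 7, 9, 13]
print(all(len(s) == 2 * 4 - 1 for s in q.values()))         # True
print(all(a & b for a, b in combinations(q.values(), 2)))   # True
```

      Any two quorums intersect because the row of one always crosses the
      column of the other.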

    45. Principles:
    46. To get access to a CS, not all processes have to agree
    47. Suffices to split set of processes up into subsets ("voting sets") that overlap
    48. Suffices that there is consensus within every subset
    49. When a process wishes to enter the CS, it sends a vote request to every member of its voting district.
    50. When the process receives replies from all the members of the district, it can enter the CS.
    51. When a process receives a vote request, it responds with a "YES" vote if it has not already cast its vote.
    52. When a process exits the CS, it informs the voting district, which can then vote for other candidates.
    53. May have deadlock.
    54. Request sets

      N = { 1, 2, ..., N }

      Ri ∩ Rj ≠ ∅   all i, j ∈ N

      A site can send a REPLY ( LOCKED ) message only if it has not been LOCKED (i.e. has not cast the vote).


      1. Ri ∩ Rj ≠ ∅
      2. Si ∈ Ri
      3. |Ri| = K     for all i ∈ N
      4. any site Si is in K number of Ri's

      Maekawa found that:

        N = K ( K - 1 ) + 1

        or K = |Ri| ≈ √N

      Messages exchange:

        Failed -- F,   Sj cannot grant permission to Sk because Sj has granted permission to a site with higher request priority.
        Inquire -- I,   Sj wants to find out if Sk has successfully locked all sites. ( the outstanding grant to Sk has a lower priority than the new request )
        Yield -- Y,   Sj yields to Sk ( Sj has received a failed message from some other site or Sj has sent a yield to some other site but has not received a new grant )
      ( The request's priority is determined by its sequence number ( timestamp ); the smaller the sequence number, the higher the priority; if the sequence numbers are the same, the request from the site with the smaller site number has higher priority )


      1. A site Si requests access to CS by sending REQUEST(i) messages to all the sites in its request set Ri
      2. When a site Sj receives the REQUEST(i) message, it sends a REPLY(j) message to Si provided it hasn't sent any REPLY to any site since last RELEASE. Otherwise, it queues up the REQUEST.
      3. Site Si could access the CS only after it has received REPLY from all sites in Ri

      Deadlock Handling:

      1. When a REQUEST(i) from Si blocks at site Sj because Sj has currently granted permission to site Sk then Sj sends FAILED(j) message to Si if Si has lower priority. Otherwise Sj sends an INQUIRE (j) message to Sk.
      2. In response to an INQUIRE(j) from Sj, site Sk sends YIELD(k) to Sj, provided Sk has received a FAILED message or has sent a YIELD to another site, but has not received a new REPLY from it.
      3. In response to a YIELD(k) message from Sk, site Sj assumes it has been released by Sk, places the request of Sk at the appropriate location in the request queue, and sends a REPLY(j) to the top request's site in the queue.

      13 nodes, 13 = 4(4-1) + 1, thus K = 4

        R1 = { 1, 2, 3, 4 }
        R2 = { 2, 5, 8, 11 }
        R3 = { 3, 6, 8, 13 }
        R4 = { 4, 6, 10, 11 }
        R5 = { 1, 5, 6, 7 }
        R6 = { 2, 6, 9, 12 }
        R7 = { 2, 7, 10, 13 }
        R8 = { 1, 8, 9, 10 }
        R9 = { 3, 7, 9, 11 }
        R10 = { 3, 5, 10, 12 }
        R11 = { 1, 11, 12, 13 }
        R12 = { 4, 7, 8, 12 }
        R13 = { 4, 5, 9, 13 }
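
      The four request-set conditions, and the relation N = K ( K - 1 ) + 1,
      can be checked mechanically for the 13 sets listed above. The sketch
      below is only a verification aid; check_request_sets is a
      hypothetical helper name.

```python
from itertools import combinations

R = {
    1: {1, 2, 3, 4},     2: {2, 5, 8, 11},    3: {3, 6, 8, 13},
    4: {4, 6, 10, 11},   5: {1, 5, 6, 7},     6: {2, 6, 9, 12},
    7: {2, 7, 10, 13},   8: {1, 8, 9, 10},    9: {3, 7, 9, 11},
    10: {3, 5, 10, 12},  11: {1, 11, 12, 13}, 12: {4, 7, 8, 12},
    13: {4, 5, 9, 13},
}


def check_request_sets(r):
    """Check Maekawa's conditions on a {site: request set} mapping."""
    sites = sorted(r)
    k = len(r[sites[0]])
    return {
        "pairwise_intersect": all(r[i] & r[j] for i, j in combinations(sites, 2)),
        "contains_self": all(i in r[i] for i in sites),
        "equal_size": all(len(r[i]) == k for i in sites),
        "equal_load": all(sum(i in s for s in r.values()) == k for i in sites),
        "n_is_k_times_k_minus_1_plus_1": len(sites) == k * (k - 1) + 1,
    }


for name, ok in check_request_sets(R).items():
    print(name, ok)   # every condition prints True for these sets
```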

      Suppose sites 11, 8, 7 want to enter CS; they all send requests with sequence number 1. ( 7 has highest priority, 8 next, 11 lowest )

      1. site 11 wants to enter; requests have arrived at 12, 13; R to 1 is on the way
      2. 7 wants to enter CS; R arrived at 2 and 10 but R to 13 is on its way
      3. 8 also wants to enter CS; sends R to 1, 9, 10 but fails to lock 10 because 10 has been locked by 7 with higher priority

      4. R from 11 finally arrived at 1 and R from 7 arrived at 13

        11, 7, 8 are circularly locked:

      5. 8 receives F and cannot enter CS
      6. 11 receives F and cannot enter CS
      7. 7 cannot enter CS because it has not received all REPLY ( LOCKED) messages
      8. 13 is locked by 11 ( has lower priority than 7 ) and receives request from 7, so it sends an INQUIRE to 11 to ask it to yield
      9. When 11 receives an INQUIRE, it knows that it cannot enter CS; therefore it sends a YIELD to 13
      10. then 13 can send L to 7 which enters CS
      11. when 7 finished, sends RELEASE
      12. then 8 locks all members, ... , sends RELEASE
      13. then 11 enters

    55. Token-based algorithms

    56. Principles
      • one token, shared among all sites
      • site can enter its CS iff it holds token
      • The major difference is the way the token is searched
      • use sequence numbers instead of timestamps
        o used to distinguish requests from same site
        o kept independently for each site
        o use sequence number to distinguish between old and current requests
      • The proof of mutual exclusion is trivial
      • The proof of other issues (deadlock and starvation) may be less so

        (a) An unordered group of processes on a network.
        (b) A logical ring connected in software.

    57. a) Suzuki-Kasami's Broadcast Algorithm

      • TOKEN -- a special PRIVILEGE message
      • the node that owns the TOKEN can enter the CS
      • initially node 1 has the TOKEN
      • node holding TOKEN can execute CS repeatedly if no request from others comes
      • if a node wants TOKEN, it broadcasts a REQUEST message to all other nodes
      • node:
        REQUEST(j, n)
          node j requesting n-th CS invocation n = 1, 2, 3, ... , sequence #
        node i receives REQUEST from j update RNi[j] = max ( RNi[j], n )
        RNi[j] = largest seq # received so far from node j

      • TOKEN:
        TOKEN(Q, LN ) ( suppose at node i ) Q -- queue of requesting nodes
        LN -- array of size N such that
           LN[j] = the seq # of the request of node j granted most recently
        When node i finished executing CS, it does the following
        1. set LN[i] = RNi[i] to indicate that current request of node i has been granted ( executed )
        2. every node k such that RNi[k] > LN[k] ( i.e. node k has an outstanding request ) is appended to Q if it is not already there
        When these updates are complete, if Q is not empty, the front node is deleted and TOKEN is sent there
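
        The bookkeeping on RN, LN and Q can be sketched as below. This is
        an illustration under simplifying assumptions, not a networked
        implementation: nodes are indexed from 0, message delivery is a
        direct method call, and the hand-off from an idle token holder is
        done by hand with the hypothetical helper pass_token. The queueing
        test compares RN[k] with LN[k], the token's entry for node k.

```python
from collections import deque


class Token:
    def __init__(self, n):
        self.LN = [0] * n    # LN[j]: seq # of node j's most recently granted request
        self.Q = deque()     # queue of nodes waiting for the token


class Node:
    def __init__(self, i, n):
        self.i = i
        self.RN = [0] * n    # RN[j]: largest seq # received so far from node j
        self.token = None

    def make_request(self):
        """Start a new CS request; return the REQUEST(j, n) to broadcast."""
        self.RN[self.i] += 1
        return self.i, self.RN[self.i]

    def on_request(self, j, n):
        self.RN[j] = max(self.RN[j], n)

    def exit_cs(self):
        """Update the token after the CS; return the next holder or None."""
        t = self.token
        t.LN[self.i] = self.RN[self.i]
        for k in range(len(self.RN)):
            if k != self.i and k not in t.Q and self.RN[k] > t.LN[k]:
                t.Q.append(k)
        return t.Q.popleft() if t.Q else None


def pass_token(src, dst):
    dst.token, src.token = src.token, None


# Three nodes p1, p2, p3 as indices 0, 1, 2; the token starts at p2.
p1, p2, p3 = (Node(i, 3) for i in range(3))
p2.token = Token(3)
req1 = p1.make_request()
req3 = p3.make_request()
p2.on_request(*req1)
p1.on_request(*req3)
p3.on_request(*req1)
pass_token(p2, p1)      # idle holder p2 hands the token to p1
nxt = p1.exit_cs()      # p1 leaves the CS; index 2 ( p3 ) is next
pass_token(p1, p3)
p3.exit_cs()
print(nxt, p3.token.LN, list(p3.token.Q))   # 2 [1, 0, 1] []
```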



        There are three processes, p1, p2, and p3.
        p1 and p3 seek mutually exclusive access to a shared resource.

        Initially: the token is at p2 and the token's state is LN = [0, 0, 0] and Q empty;

        p1's state is: n1 ( seq # ) = 0, RN1 = [0, 0, 0];
        p2's state is: n2 = 0, RN2 = [0, 0, 0];
        p3's state is: n3 = 0, RN3 = [0, 0, 0];

      • p1 sends REQUEST(1, 1) to p2 and p3; p1: n1 = 1, RN1 = [ 1, 0, 0 ]

      • p3 sends REQUEST(3, 1) to p1 and p2; p3: n3 = 1, RN3 = [ 0, 0, 1 ]

      • p2 receives REQUEST(1, 1) from p1; p2: n2 = 0, RN2 = [ 1, 0, 0 ], holding token

      • p2 sends the token to p1

      • p1 receives REQUEST(3, 1) from p3: n1 = 1, RN1 = [ 1, 0, 1 ]

      • p3 receives REQUEST(1, 1) from p1; p3: n3 = 1, RN3 = [ 1, 0, 1 ]

      • p1 receives the token from p2

      • p1 enters the critical section

      • p1 exits the critical section and sets the token's state to LN = [ 1, 0, 0 ] and Q = ( 3 )
      • p1 sends the token to p3; p1: n1 = 1, RN1 = [ 1, 0, 1 ], no longer holding token; token's state is LN = [ 1, 0, 0 ] and Q empty

      • p3 receives the token from p1; p3: n3 = 1, RN3 = [ 1, 0, 1 ], holding token
      • p3 enters the critical section

      • p3 exits the critical section and sets the token's state to LN = [ 1, 0, 1 ] and Q empty
      • Performance:
      • It requires at most N message exchanges per CS execution ( (N-1) REQUEST messages + 1 TOKEN message )
      • or 0 messages if the TOKEN is already at the requesting site
      • synchronization delay is 0 or T
      • deadlock free ( because of TOKEN requirement )
      • no starvation ( i.e. a requesting site enters CS in finite time )
      • Comparison of Lamport and Suzuki-Kasami Algorithms

        The essential difference is in who keeps the queue. In one case every site keeps its own local copy of the queue. In the other case, the queue is passed around within the token.