
Lecture Notes
Dr. Tong Lai Yu, March 2010
    0. Review and Overview
    1. An Introduction to Distributed Systems
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Processes
    5. Communication
    6. Distributed OS Theories
    7. Distributed Mutual Exclusions
    8. Agreement Protocols
    9. Distributed Scheduling
    10. Distributed Resource Management
    11. Recovery and Fault Tolerance
    12. Security and Protection
    1. Layered Protocols

    2. Low-level layers
    3. Transport layer
    4. Application layer
    5. Middleware layer
    6. Basic networking model
      ISO OSI model

    7. Drawbacks:
      • Focus on message-passing only
      • Often unneeded or unwanted functionality

    8. Low-level layers
      • Physical layer: contains the specification and implementation of
        bits, and their transmission between sender and receiver
      • Data link layer: prescribes the transmission of a series of bits into
        a frame to allow for error and flow control
      • Network layer: describes how packets in a network of computers
        are to be routed.

      For many distributed systems, the lowest-level interface
      is that of the network layer.

    9. Transport Layer
      The transport layer provides the actual communication facilities for
      most distributed systems.

      Standard Internet protocols:

      • TCP: connection-oriented, reliable, stream-oriented
      • UDP: unreliable (best-effort) datagram communication

      IP multicasting is often considered a standard available service (which
      may be dangerous to assume).

      Middleware Layer

      Middleware was invented to provide common services and protocols
      that can be used by many different applications:

      • A rich set of communication protocols
      • (Un)marshaling of data, necessary for integrated systems
      • Naming protocols, to allow easy sharing of resources
      • Security protocols for secure communication
      • Scaling mechanisms, such as for replication and caching

      What remains are truly application-specific protocols...
      such as?
    10. Types of communication

      We can view the middleware as an additional service in client-server computing.
      (Consider, for example, an email system.)

      Traditional Client-Server

      Client-server with Middleware


    11. Transient versus persistent communication
    12. Asynchronous versus synchronous communication
    13. Transient versus persistent:

    14. Transient communication: A communication server discards a message when
      it cannot be delivered at the next server, or at the receiver.
    15. Persistent communication: A message is stored at a communication
      server as long as it takes to deliver it.
    16. Asynchronous versus synchronous:

    17. Asynchronous communication: A sender continues immediately after
      it has submitted the message for transmission.
    18. Synchronous communication: The sender is blocked until its request
      is known to be accepted. There are three places where synchronization can take place
      (see figure above):
      • At request submission
      • At request delivery
      • After request processing
    19. Client/Server
      Some observations
      Client/Server computing is generally based on a model of transient
      synchronous communication:

    20. Client and server have to be active at the time of communication
    21. Client issues request and blocks until it receives reply
    22. Server essentially waits only for incoming requests, and
      subsequently processes them

    24. Drawbacks of synchronous communication
    25. Client cannot do any other work while waiting for reply
    26. Failures have to be handled immediately: the client is waiting
    27. The model may simply not be appropriate (mail, news)
    28. Messaging
      Message-oriented middleware ( MOM )
      Aims at high-level persistent asynchronous communication:

    29. Processes send each other messages, which are queued
    30. Sender need not wait for immediate reply, but can do other things
    31. Middleware often ensures fault tolerance
    32. Remote Procedure Call (RPC)

      Basic RPC operation

    33. Application developers are familiar with simple procedure model
    34. Well-engineered procedures operate in isolation (black box)
    35. There is no fundamental reason not to execute procedures on
      separate machine
    36. Conclusion
      Communication between caller &
      callee can be hidden by using
      procedure-call mechanism.

      1 Client procedure calls client stub.
      2 Stub builds message; calls local OS.
      3 OS sends message to remote OS.
      4 Remote OS gives message to stub.
      5 Stub unpacks parameters and calls server.
      6 Server returns result to stub.
      7 Stub builds message; calls OS.
      8 OS sends message to client's OS.
      9 Client's OS gives message to stub.
      10 Client stub unpacks result and returns to
          the client
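The ten steps above can be sketched in a single process. This is only an illustration under stated assumptions: the procedure `add`, the JSON message format, and the `transport` function (which stands in for the OS-to-OS message transfer) are hypothetical and not part of any particular RPC system.

```python
import json

def add(a, b):
    """The actual procedure on the server side."""
    return a + b

# Hypothetical table of procedures the server exports.
PROCEDURES = {"add": add}

def server_stub(request_bytes):
    """Steps 5-7: unpack parameters, call the procedure, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    """Stand-in for steps 3-4 and 8-9 (message transfer between the two OSes)."""
    return server_stub(request_bytes)

def client_stub_add(a, b):
    """Steps 1-2 and 10: pack the call, send it, unpack the reply."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
    reply = transport(request)
    return json.loads(reply.decode("utf-8"))["result"]

# To the caller this looks like an ordinary local procedure call.
print(client_stub_add(2, 3))
```

The caller never sees the message format; only the two stubs need to agree on it.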

      RPC: Parameter passing

      Parameter marshaling
      There's more than just wrapping parameters into a message:

    37. Client and server machines may have different data
      representations (think of byte ordering)
    38. Wrapping a parameter means transforming a value into a
      sequence of bytes
    39. Client and server have to agree on the same encoding:
      • How are basic data values represented (integers, floats, characters)
      • How are complex data values represented (arrays, unions)
      • Client and server need to properly interpret messages,
        transforming them into machine-dependent representations.
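The byte-ordering issue can be made concrete with Python's standard `struct` module. The sketch below assumes, purely for illustration, that both sides agree on big-endian ("network") order and on a message layout of one 32-bit integer followed by one 64-bit float; `marshal_call` and `unmarshal_call` are hypothetical helper names.

```python
import struct

value = 1

big = struct.pack(">i", value)      # big-endian (network) byte order
little = struct.pack("<i", value)   # little-endian byte order

print(big.hex())     # 00000001
print(little.hex())  # 01000000

# Decoding with the wrong byte order silently yields a different integer.
misread = struct.unpack("<i", big)[0]
print(misread)       # 16777216

def marshal_call(x, y):
    """Encode an (int, float) parameter pair in the agreed big-endian layout."""
    return struct.pack(">id", x, y)

def unmarshal_call(data):
    """Decode the parameter pair into the local machine representation."""
    return struct.unpack(">id", data)

print(unmarshal_call(marshal_call(7, 2.5)))
```

Because both stubs use the same explicit format string, the result does not depend on the native byte order of either machine.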
    40. RPC parameter passing: some assumptions

    41. Copy in/copy out semantics: while procedure is executed, nothing can
      be assumed about parameter values.
    42. All data that is to be operated on is passed by parameters. Excludes
      passing references to (global) data.
    43. Conclusion
      Full access transparency cannot be realized.

      A remote reference mechanism enhances access transparency:

    44. Remote reference offers unified access to remote data
    45. Remote references can be passed as parameter in RPCs
    46. Asynchronous RPCs

      Try to get rid of the strict request-reply behavior, but let the client
      continue without waiting for an answer from the server.

        (a) Traditional RPC
        (b) Asynchronous RPC ( no returned result required )

      Deferred Synchronous RPCs

      Client can also do a (non)blocking poll at the server to see whether
      results are available.
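A deferred synchronous call can be approximated locally with a future. In this sketch `remote_square` is a hypothetical stand-in for a remote procedure, and a thread pool plays the role of the RPC runtime; a real system would send the request over the network instead.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_square(x):
    """Stand-in for a remote procedure; the sleep mimics network and server time."""
    time.sleep(0.1)
    return x * x

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_square, 7)   # the call returns immediately
    polled_early = future.done()             # nonblocking poll for the result
    local_work = sum(range(1000))            # the client keeps working meanwhile
    result = future.result()                 # blocking wait until the result arrives

print(result)
```

`future.done()` corresponds to the nonblocking poll and `future.result()` to the blocking one.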

      RPC in Practice

      Client-to-server binding (DCE)

      Issues: (1) Client must locate the server machine, and (2) locate the server.

    47. Message-Oriented Communication

    48. Transient Messaging
    49. Message-Queuing System
    50. Message Brokers
    51. Example: IBM Websphere
    52. Transient messaging: sockets

      Berkeley socket interface
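The socket primitives (socket, bind, listen, accept, connect, send, receive, close) can be exercised with Python's standard `socket` module. This is a minimal sketch, assuming a loopback connection and a one-shot server that upper-cases whatever it receives; the helper name `echo_once` is hypothetical.

```python
import socket
import threading

def echo_once(server_sock):
    """Server side: accept one connection, read a message, reply in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server: socket, bind, listen
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# Client: socket, connect, send, receive, close
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()

worker.join()
server.close()
print(reply)
```

The communication is transient: if the server is not listening when the client connects, the message is not stored anywhere.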

      Message-Queuing Model

        Loosely coupled communication using queues.
        Sender and receiver can execute completely independently of each other.

      Asynchronous persistent communication through support of
      middleware-level queues. Queues correspond to buffers at
      communication servers.

      Basic interface to a queue in message-queuing system
      PUT      Append a message to a specified queue
      GET      Block until the specified queue is nonempty, and remove
               the first message
      POLL     Check a specified queue for messages, and remove
               the first. Never block
      NOTIFY   Install a handler to be called when a message is put
               into the specified queue
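The four primitives map naturally onto a small class. This is a sketch only: the class name `MessageQueue` is hypothetical, the queue lives in one process rather than at a communication server, and it is assumed that a NOTIFY handler receives the message that was just put.

```python
import queue

class MessageQueue:
    """In-process sketch of the PUT / GET / POLL / NOTIFY interface."""

    def __init__(self):
        self._q = queue.Queue()
        self._handlers = []

    def put(self, msg):
        """PUT: append a message, then call any installed handlers."""
        self._q.put(msg)
        for handler in list(self._handlers):
            handler(msg)

    def get(self):
        """GET: block until the queue is nonempty, then remove the first message."""
        return self._q.get()

    def poll(self):
        """POLL: remove the first message if there is one; never block."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """NOTIFY: install a handler that is called whenever a message is put."""
        self._handlers.append(handler)

mq = MessageQueue()
seen = []
mq.notify(seen.append)
mq.put("m1")
mq.put("m2")
first = mq.get()
second = mq.poll()
empty = mq.poll()
print(first, second, empty, seen)
```

A persistent system would additionally store queued messages on stable storage so that they survive a crash of the sender, the receiver, or the queue manager.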

      Message Broker

      Message queuing systems assume a common messaging protocol: all
      applications agree on the message format (i.e., structure and data
      representation). In other words, the sender needs to produce its outgoing
      messages in the same format as that expected by the receiver.

      Message broker
      Centralized component that takes care of application heterogeneity in
      an MQ system:

      • Transforms incoming messages to target format
      • Very often acts as an application gateway
      • May provide subject-based routing capabilities => Enterprise
        Application Integration
        ( publish/subscribe )
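The two broker roles listed above, format conversion and subject-based routing, can be sketched together. Everything here is hypothetical: the class `MessageBroker`, the subject name "orders", and the two target formats are chosen only for illustration.

```python
import json
from collections import defaultdict

class MessageBroker:
    """Sketch of a broker: routes by subject and converts per subscriber."""

    def __init__(self):
        # subject -> list of (convert, deliver) pairs
        self._subscribers = defaultdict(list)

    def subscribe(self, subject, convert, deliver):
        """Register a subscriber with the converter for its target format."""
        self._subscribers[subject].append((convert, deliver))

    def publish(self, subject, message):
        """Convert the message for each subscriber of the subject and deliver it."""
        for convert, deliver in self._subscribers[subject]:
            deliver(convert(message))
        return len(self._subscribers[subject])

def to_csv(msg):
    return f"{msg['id']},{msg['amount']}"

def to_json(msg):
    return json.dumps(msg, sort_keys=True)

csv_inbox, json_inbox = [], []
broker = MessageBroker()
broker.subscribe("orders", to_csv, csv_inbox.append)
broker.subscribe("orders", to_json, json_inbox.append)
delivered = broker.publish("orders", {"id": 17, "amount": 250})
print(delivered, csv_inbox, json_inbox)
```

The publisher does not need to know who subscribes or which format each subscriber expects; that knowledge is concentrated in the broker.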

      IBM's WebSphere MQ

      Basic concepts:

    53. All queues are managed by queue managers
    54. Application-specific messages are put into, and removed from, queues
    55. Queues reside under the regime of a queue manager
    56. Processes can put messages only in local queues, or through an
      RPC mechanism
    57. Message transfer:

    58. Messages are transferred between queues
    59. Message transfer between queues at different processes requires
      a channel
    60. At each endpoint of channel is a message channel agent
    61. Message channel agents (MCAs ) are responsible for:
      • Setting up channels using lower-level network communication
        facilities (e.g., TCP/IP)
      • (Un)wrapping messages from/in transport-level packets
      • Sending/receiving packets
      • Channels are inherently unidirectional
      • Automatically start MCAs when messages arrive
      • Any network of queue managers can be created
      • Routes are set up manually (system administration)
    62. Routing
      By using logical names, in combination with name resolution to local queues,
      it is possible to put a message in a remote queue.
      Entry in a routing table: (destQM, sendQ).
      Local aliases for queue manager names are used to improve management flexibility.

    63. Stream-oriented communication

    64. Support for continuous media
    65. Streams in distributed systems
    66. Stream management
    67. Continuous media

      All communication facilities discussed so far are essentially based on a
      discrete, that is, time-independent, exchange of information.

      Continuous media
      Characterized by the fact that values are time dependent:

    68. Audio
    69. Video
    70. Animations
    71. Sensor data (temperature, pressure, etc.)
    72. Transmission modes
      Different timing guarantees with respect to data transfer:

    73. Asynchronous: no restrictions with respect to when data is to be delivered
    74. Synchronous: define a maximum end-to-end delay for individual
      data packets
    75. Isochronous: define a maximum end-to-end delay and maximum delay variance
      (jitter is bounded)
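The difference between the synchronous and isochronous guarantees can be expressed as two checks on measured per-packet delays. This sketch assumes delays in milliseconds and takes jitter to be the spread between the largest and smallest delay; the function names and sample values are hypothetical.

```python
def jitter(delays):
    """Delay variance, taken here as the spread between slowest and fastest packet."""
    return max(delays) - min(delays)

def meets_synchronous(delays, max_delay):
    """Synchronous mode: every packet arrives within the maximum end-to-end delay."""
    return max(delays) <= max_delay

def meets_isochronous(delays, max_delay, max_jitter):
    """Isochronous mode: bounded delay and, in addition, bounded jitter."""
    return meets_synchronous(delays, max_delay) and jitter(delays) <= max_jitter

delays_ms = [40, 55, 48, 90]    # made-up end-to-end delays
print(meets_synchronous(delays_ms, 100))
print(meets_isochronous(delays_ms, 100, 20))
```

The sample trace satisfies the synchronous bound of 100 ms but not the isochronous one, because its jitter is 50 ms.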
    76. Stream

      A (continuous) data stream is a connection-oriented communication
      facility that supports isochronous data transmission.

      Some common stream characteristics

    77. Streams are unidirectional
    78. There is generally a single source, and one or more sinks
    79. Often, the sink, the source, or both are wrappers around hardware
      (e.g., camera, CD device, TV monitor)
    80. Simple stream: a single flow of data, e.g., audio or video
    81. Complex stream: multiple data flows, e.g., stereo audio or
      combination audio/video
    82. Streams and QoS

      Streams are all about timely delivery of data. How do you specify this
      Quality of Service (QoS)? Basics:

    83. The required bit rate at which data should be transported.
    84. The maximum delay until a session has been set up (i.e., when an
      application can start sending data).
    85. The maximum end-to-end delay (i.e., how long it will take until a
      data unit makes it to a recipient).
    86. The maximum delay variance, or jitter.
    87. The maximum round-trip delay.
    88. Enforcing QoS

      There are various network-level tools, such as differentiated services
      by which certain packets can be prioritized.

      Use buffers to reduce jitter:

      How to reduce the effects of packet loss (when multiple samples are in
      a single packet)?

        The effect of packet loss in (a)noninterleaved transmission and
        (b) interleaved transmission
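The benefit of interleaving can be checked with a small experiment. The sketch assumes 16 frames, 4 frames per packet, and the loss of one packet; `packetize` and `longest_gap` are hypothetical helper names, and the frame count is assumed to be a multiple of the packet size.

```python
def packetize(frames, per_packet, interleave):
    """Split frames into packets, either consecutively or interleaved."""
    n_packets = len(frames) // per_packet
    if interleave:
        # packet i carries frames i, i + n_packets, i + 2 * n_packets, ...
        return [frames[i::n_packets] for i in range(n_packets)]
    return [frames[i * per_packet:(i + 1) * per_packet] for i in range(n_packets)]

def longest_gap(lost_frames):
    """Length of the longest run of consecutive lost frames."""
    longest = run = 0
    previous = None
    for frame in sorted(lost_frames):
        run = run + 1 if previous is not None and frame == previous + 1 else 1
        longest = max(longest, run)
        previous = frame
    return longest

frames = list(range(16))
lost_index = 2                      # assume the third packet is lost

plain = packetize(frames, 4, interleave=False)
mixed = packetize(frames, 4, interleave=True)

print(plain[lost_index], longest_gap(plain[lost_index]))
print(mixed[lost_index], longest_gap(mixed[lost_index]))
```

Both schemes lose four frames, but interleaving spreads them out as four gaps of one frame instead of one gap of four. The price is that the receiver must buffer several packets before it can start playing.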

      Stream synchronization

      Given a complex stream, how do you keep the different substreams in sync?

      Think of playing out two channels, that together form stereo sound.
      Difference should be less than 20-30 μsec!

      Multiplex all substreams into a single stream, and demultiplex at the
      receiver. Synchronization is handled at multiplexing/demultiplexing
      point (MPEG).

        Time-division multiplexing

    89. Multicast communication

      Multicast communication

    90. Application-level multicasting
    91. Gossip-based data dissemination
    92. Application-level multicasting ( ALM )


      Chord-based tree building

      1 Initiator generates a multicast identifier ( mid ).
      2 Lookup succ(mid), the node responsible for mid (see also previous chapter),
        and promote that node to become the root of the tree.
      3 Request is routed to succ(mid), which has become the root.
      4 If P wants to join, it sends a join request to the root.
      5 When request arrives at Q:
      • If Q has not seen a join request before, it becomes a forwarder for the group;
        P becomes a child of Q. The join request continues to be forwarded.
      • If Q already knows about the tree, P becomes a child of Q. No need to forward
        the join request anymore.

      ALM: Some Costs

    93. Link stress: How often does an ALM message cross the same
      physical link? Example: message from A to D needs to cross
      (Ra,Rb) twice.
    94. Stretch: Ratio in delay between ALM-level path and network-level
      path. Example: messages B to C
    95. B --> Rb --> Ra --> Rc --> C (total cost =59)
    96. B --> Rb --> Rd --> Rc --> C (total cost = 47)
    97. => stretch = 59/47 = 1.255.
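Both cost metrics are simple to compute once the paths are known. In the sketch below the totals 59 and 47 come from the example above; the list of crossed links passed to `link_stress`, including the router name "Rx", is hypothetical and serves only to show the counting.

```python
from collections import Counter

def stretch(overlay_cost, network_cost):
    """Ratio between the delay of the ALM-level path and the network-level path."""
    return overlay_cost / network_cost

def link_stress(crossed_links):
    """Count how often each physical link is crossed, ignoring direction."""
    return Counter(tuple(sorted(link)) for link in crossed_links)

print(round(stretch(59, 47), 3))

# Hypothetical sequence of physical links crossed by one ALM message.
crossings = [("Ra", "Rb"), ("Rb", "Rx"), ("Rb", "Ra")]
stress = link_stress(crossings)
print(stress[("Ra", "Rb")])
```

A stretch of 1 and a link stress of 1 on every link would mean the overlay is as efficient as network-level multicast.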
    98. Epidemic Algorithms

    99. General background
    100. Update models
    101. Removing objects
    102. Basic idea
      Assume there are no write conflicts:

    103. Update operations are performed at a single server
    104. A replica passes updated state to only a few neighbors
    105. Update propagation is lazy, i.e., not immediate
    106. Eventually, each update should reach every replica
    107. Two forms of epidemics

    108. Anti-entropy: Each replica regularly chooses another replica at random,
      and exchanges state differences, leading to identical states at both
    109. Gossiping: A replica which has just been updated (i.e., has been
      contaminated), tells a number of other replicas about its update
      (contaminating them as well).
    110. Anti-entropy

      Principle operations:

    111. A node P selects another node Q from the system at random.
    112. Push: P only sends its updates to Q
    113. Pull: P only retrieves updates from Q
    114. Push-Pull: P and Q exchange mutual updates (after which they
      hold the same information).
    115. Observation
      For push-pull it takes O(log(N)) rounds to disseminate updates to all
      N nodes (round = when every node has taken the initiative to start an
      exchange).
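The logarithmic behaviour of push-pull can be observed in a small simulation. This is a sketch under assumptions: a single node starts with the update, peers are chosen uniformly at random, and an exchange takes effect immediately within the round; the function name and the seed are arbitrary.

```python
import random

def push_pull_rounds(n, seed=0, max_rounds=1000):
    """Count rounds until all n nodes (n >= 2) hold the update under push-pull."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True                       # one node starts with the update
    rounds = 0
    while not all(updated) and rounds < max_rounds:
        rounds += 1
        for p in range(n):                  # every node takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                      # pick a peer other than p
            if updated[p] or updated[q]:    # push-pull: both end up updated
                updated[p] = updated[q] = True
    return rounds

for n in (10, 100, 1000):
    print(n, push_pull_rounds(n))
```

Growing the system by a factor of ten adds only a few rounds, which is what O(log(N)) predicts.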


      Basic model
      A server S having an update to report, contacts other servers. If a
      server is contacted to which the update has already propagated, S
      stops contacting other servers with probability 1/k.

      If s is the fraction of ignorant servers (i.e., which are unaware of the
      update), it can be shown that with many servers

      $s = e^{-(k+1)(1-s)}$
      If we really have to ensure that all servers are eventually updated,
      gossiping alone is not enough
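The equation for s has no closed-form solution, but it can be solved numerically. The sketch below uses plain fixed-point iteration starting from s = 0; this method and the function name are my choice for illustration, not something the notes prescribe. Note that s = 1 also satisfies the equation, so the starting point matters.

```python
import math

def ignorant_fraction(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) by fixed-point iteration from s = 0."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1.0 - s))
    return s

for k in (1, 2, 4):
    print(k, round(ignorant_fraction(k), 4))
```

For k = 1 roughly 20% of the servers never hear about the update, and even for k = 4 a small fraction (below 1%) remains ignorant, which is why gossiping alone cannot guarantee that every server is updated.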

      Deleting Values

      Fundamental problem
      We cannot remove an old value from a server and expect the removal
      to propagate. Instead, mere removal will be undone in due time using
      epidemic algorithms

      Removal has to be registered as a special update by inserting a death
      certificate.

    116. Naming

      1. Naming Entities
      2. Names are used to denote entities in a distributed system.
        To operate on an entity, we need to access it at an access point.
        Access points are entities that are named by means of an address.
      3. A location-independent name for an entity E, is independent from the
        addresses of the access points offered by E.
      4. Identifier

        A name with the following properties:
      5. Each identifier refers to at most one entity
      6. Each entity is referred to by at most one identifier
      7. An identifier always refers to the same entity (prohibits reusing an identifier)
      8. Flat Naming

        Given an essentially unstructured name (e.g., an identifier), how can we locate its associated access point?

      9. Simple solutions:

        • Broadcasting: cannot scale beyond a LAN
        • Forwarding pointers: When an entity moves, it leaves behind a pointer to its next location

      10. Home-based approaches

          Use a home location to keep track of the current location of an entity.

      11. Distributed Hash Tables (DHT) (e.g. Chord system)

        Organize many nodes into a logical ring:
        • Each node is assigned a random m-bit identifier.
        • Every entity is assigned a unique m-bit key.
        • The entity with key k is associated with the node with the smallest id ≥ k
          (called its successor, denoted by succ(k)).
        • Nonsolution: Let node p keep track of succ(p + 1) as well
          as its predecessor pred(p), and start a linear search along the ring.
        • Use finger tables:
        • Each node p maintains a finger table $FT_p[]$ with at most m entries (using mod $2^m$ arithmetic):
            $FT_p[i] = \mathrm{succ}(p + 2^{i-1})$,       1 ≤ i ≤ m
          Note: $FT_p[i]$ points to the first node succeeding p by at least $2^{i-1}$.
        • To look up a key k, node p forwards the request to the node q with index j satisfying
            $q = FT_p[j] \le k < FT_p[j+1]$
          (The search stops when k ≤ q; q is then the actual node for k.)
        • If $p < k < FT_p[1]$, the request is also forwarded to $FT_p[1]$.
        • e.g. Consider resolving k = 26 from node 1.
        • Improvements (with modifications):
          • topology-based assignment of node identifiers
          • proximity routing
          • proximity neighbour selection
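The finger-table rules above can be tried out on a small ring. The node set {1, 4, 9, 11, 14, 18, 20, 21, 28} with m = 5 is an assumption made for this sketch, since the figure that belongs to the k = 26 example is not reproduced here. The class simulates all nodes in one process; `ChordRing` and `in_interval` are hypothetical names.

```python
def in_interval(x, a, b, size):
    """Return True if x lies in the ring interval (a, b], modulo size."""
    x, a, b = x % size, a % size, b % size
    if a < b:
        return a < x <= b
    return x > a or x <= b          # the interval wraps past zero


class ChordRing:
    def __init__(self, node_ids, m):
        self.m = m
        self.size = 2 ** m
        self.nodes = sorted(node_ids)

    def succ(self, k):
        """Node with the smallest id >= k, wrapping around the ring."""
        k %= self.size
        for n in self.nodes:
            if n >= k:
                return n
        return self.nodes[0]

    def pred(self, p):
        """Node that precedes node p on the ring."""
        return self.nodes[self.nodes.index(p) - 1]

    def finger_table(self, p):
        """FT_p[i] = succ(p + 2^(i-1)) for i = 1..m (list index 0 holds FT_p[1])."""
        return [self.succ(p + 2 ** (i - 1)) for i in range(1, self.m + 1)]

    def lookup(self, start, k):
        """Return the list of nodes visited while resolving key k from start."""
        path = [start]
        p = start
        while not in_interval(k, self.pred(p), p, self.size):
            ft = self.finger_table(p)
            if in_interval(k, p, ft[0], self.size):
                nxt = ft[0]         # k lies between p and its successor
            else:
                # farthest finger that does not overshoot k
                nxt = next(f for f in reversed(ft)
                           if in_interval(f, p, k, self.size))
            path.append(nxt)
            p = nxt
        return path


ring = ChordRing([1, 4, 9, 11, 14, 18, 20, 21, 28], m=5)
print(ring.finger_table(1))
print(ring.lookup(1, 26))
```

On this assumed ring, resolving k = 26 from node 1 visits 1, 18, 20, 21, 28: each hop roughly halves the remaining distance, which is where the O(log N) lookup cost of Chord comes from.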
      12. Hierarchical Location Service (HLS)

        • Build a large-scale search tree for which the underlying network is
          divided into hierarchical domains. Each domain is represented by a
          separate directory node.
            The root knows every entity location (only up to next level)!
        • HLS Tree organization
          Address of entity E is stored in a leaf or intermediate node
          Intermediate nodes contain a pointer to a child iff the subtree rooted at
          the child stores an address of the entity
          The root knows about all entities
        • HLS Lookup operation
          Basic principles
          Start lookup at local leaf node
          Node knows about E => follow downward pointer, else go up
          Upward lookup always stops at root
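The lookup principle (go up until a node knows E, then follow pointers down) can be sketched with a small tree. The sketch makes simplifying assumptions: addresses are stored only at leaf nodes, every ancestor up to the root gets a pointer on insert, and the domain names are made up.

```python
class DirectoryNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # entity -> child DirectoryNode (pointer), or an address string at a leaf
        self.records = {}

def insert(leaf, entity, address):
    """Store the address at the leaf and install pointers on the path to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.records[entity] = child
        child, node = node, node.parent

def lookup(start_leaf, entity):
    """Return (address, visited node names); address is None if unknown."""
    node = start_leaf
    visited = [node.name]
    while entity not in node.records:           # go up until a node knows the entity
        if node.parent is None:
            return None, visited                # the upward search stops at the root
        node = node.parent
        visited.append(node.name)
    while isinstance(node.records[entity], DirectoryNode):
        node = node.records[entity]             # follow downward pointers
        visited.append(node.name)
    return node.records[entity], visited

root = DirectoryNode("root")
west = DirectoryNode("west", root)
east = DirectoryNode("east", root)
w1, w2 = DirectoryNode("w1", west), DirectoryNode("w2", west)
e1 = DirectoryNode("e1", east)

insert(w2, "E", "addr-in-w2")
print(lookup(w1, "E"))
print(lookup(e1, "E"))
```

A lookup that starts close to the entity (here from w1) never reaches the root, which is how the hierarchy exploits locality.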

      13. DNS vs. Chord

        DNS                                                | Chord
        ---------------------------------------------------+------------------------------------------------
        provides a host name to IP address mapping         | can provide same service: name = key, value = IP
        relies on a set of special root servers            | requires no special servers
        names reflect administrative boundaries            | has no naming structure
        is specialized to finding named hosts or services  | can also be used to find data objects that are
                                                           | not tied to certain machines