
Lecture Notes
Dr. Tong Lai Yu, March 2010
    0. Review and Overview
    1. An Introduction to Distributed Systems
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Processes
    5. Communication
    6. Distributed OS Theories
    7. Distributed Mutual Exclusions
    8. Agreement Protocols
    9. Distributed Scheduling
    10. Distributed Resource Management
    11. Recovery and Fault Tolerance
    12. Security and Protection
     
    Distributed Resource Management
    
    To fight a disease after it has occurred is like trying to dig a well
    when one is thirsty or forging a weapon once a war has begun.
    
    						-- Chinese Medicine
    
    1. Distributed File Systems

      Overview
      • the most visible part of the OS; organized as a tree-structured name space

      Architecture
      • file servers and file clients interconnected by a communication network
      • two most important components: the name server and the cache manager
      • name server: maps names to stored objects (files, directories)
      • cache manager: performs file caching; can be present on both servers and clients
          o a cache at the client hides network latency
          o a cache at the server hides disk latency

      Typical steps to access data
      • check the client cache; if the data are present, return them
      • check the local disk; if present, load the data into the local cache and return them
      • send a request to the file server
      • ...
      • the server checks its cache; if the data are present, they are loaded into the client cache and returned
      • otherwise the server reads the data from disk
      • loads them into the server cache
      • loads them into the client cache
      • and returns them
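      As an illustration of this read path (not part of the original notes), here is a
      minimal Python sketch; the class and method names (FileServer, FileClient, read)
      and the dictionary-based caches are invented for the example.

        # Hypothetical sketch of the read path described above (all names invented).
        class FileServer:
            def __init__(self, disk):
                self.disk = disk                    # stands in for the server's disk
                self.cache = {}                     # server cache hides disk latency

            def read(self, block):
                if block not in self.cache:         # server cache miss
                    self.cache[block] = self.disk[block]   # disk read, load server cache
                return self.cache[block]

        class FileClient:
            def __init__(self, server):
                self.server = server
                self.local_disk = {}                # optional on-disk client cache
                self.cache = {}                     # client cache hides network latency

            def read(self, block):
                if block in self.cache:             # 1. check the client cache
                    return self.cache[block]
                if block in self.local_disk:        # 2. check the local disk
                    self.cache[block] = self.local_disk[block]
                    return self.cache[block]
                data = self.server.read(block)      # 3. send request to the file server
                self.cache[block] = data            # 4. load into the client cache
                return data                         # 5. return the data

        # usage
        server = FileServer(disk={"/etc/motd": b"hello"})
        client = FileClient(server)
        print(client.read("/etc/motd"))   # first read goes to the server
        print(client.read("/etc/motd"))   # second read is served from the client cache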
          (a) The remote access model (e.g. NFS)     (b) The upload/download model (e.g. FTP service)

          The basic NFS architecture for UNIX systems.
        • Virtual File System (VFS): standard for interfacing to different (distributed) file systems
        • stateless NFS server up to version 3
        • stateful server from version 4 onward
        • Open Network Computing RPC (ONC RPC) protocol is used for communications between a client and a server

      Mechanisms
      • Mounting: binding together different filename spaces to form a single, hierarchically structured name space (a minimal name-resolution sketch follows this list)
          o The mount table maintains the mapping from mount points to storage devices.
          o It can be used in a distributed file system to do name resolution. Where should the mount information be maintained?
              • at the clients (as in NFS): different clients may see different filename spaces
              • at the server: all clients see the same filename space; good when files move around among the servers
      • Caching: a common technique to exploit locality and reduce delays.
      • Hints: caching without cache coherence. Cache-coherence operations are too expensive in distributed systems (too much communication). Hints work well for applications that can recover from invalid data (e.g. address mapping).
      • Bulk data transfer: communication cost = S + C*B, where S is the startup cost (mostly software), C the per-byte cost, and B the number of bytes to send; transferring data in bulk amortizes the startup cost S over many bytes.
      • Encryption: the typical method to enforce security.
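      The mount-table sketch referred to above (not part of the original notes; the table
      entries and server names are invented): name resolution maps a path to the file
      system that serves it by finding the longest matching mount point.

        # Hypothetical mount table: mount point -> (server, path exported by that server).
        MOUNT_TABLE = {
            "/":          ("localdisk",    "/"),
            "/home":      ("nfs-server-1", "/export/home"),
            "/home/docs": ("nfs-server-2", "/export/docs"),
        }

        def resolve(path):
            """Map a path to (server, remote_path) via the longest matching mount point."""
            best = max((mp for mp in MOUNT_TABLE
                        if path == mp or path.startswith(mp.rstrip("/") + "/")),
                       key=len)
            server, root = MOUNT_TABLE[best]
            suffix = path[len(best):].lstrip("/")
            remote = root.rstrip("/") + "/" + suffix if suffix else root
            return server, remote

        print(resolve("/home/alice/a.txt"))   # ('nfs-server-1', '/export/home/alice/a.txt')
        print(resolve("/home/docs/b.txt"))    # ('nfs-server-2', '/export/docs/b.txt')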

      Concerns
      • naming
      • data transfer
      • locking and access synchronization
      • coherency
      • security

    2. Data Replication

      • enhances reliability
      • improves performance
      • protects against data corruption
      • replicas allow data to reside close to where they are used
      • directly supports distributed systems
      • enhances scalability
          o e.g. serving more web clients
          o when a system grows (more users) it slows down; replication helps by moving data closer to the users

    3. Data-Centric Consistency Models

      • A data store can be read from or written to by any process in the distributed system.
      • A local copy of the data store (a replica) supports "fast reads".
      • However, a write to a local replica must be propagated to all remote replicas.
      • Consistency model:
          o a contract between the processes and the data store
          o if the processes obey certain rules, the data store works correctly
      • All models attempt to return, for a read operation, the results of the last write
          o they differ in how the "last" write is determined/defined
      Strict Consistency

        (a) A strictly consistent store.    (b) A store that is not strictly consistent.
        Notation: W(x)a denotes a write by the process of value a to data item x;
        R(x)a denotes a read of data item x by the process that returns a.

      Sequential Consistency

        (a) A sequentially consistent data store.     (b) A data store that is not sequentially consistent.
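      To make the sequential-consistency figure concrete, here is a small brute-force
      checker (not part of the original notes; all names are invented, and the example
      histories are the classic four-process case, which may not match the figure
      exactly). A set of per-process histories is sequentially consistent if there is one
      interleaving that preserves every process's program order and in which each read
      returns the most recently written value.

        def interleavings(seqs):
            """All merges of the sequences that preserve each sequence's internal order."""
            seqs = [s for s in seqs if s]
            if not seqs:
                yield []
                return
            for i, s in enumerate(seqs):
                rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
                for tail in interleavings(rest):
                    yield [s[0]] + tail

        def legal(history):
            """True if every read returns the value of the most recent preceding write."""
            value = {}
            for op, item, val in history:
                if op == 'W':
                    value[item] = val
                elif value.get(item) != val:
                    return False
            return True

        def sequentially_consistent(process_histories):
            return any(legal(h) for h in interleavings(process_histories))

        # Operations are (op, item, value); W(x)a becomes ('W', 'x', 'a').
        P1 = [('W', 'x', 'a')]
        P2 = [('W', 'x', 'b')]
        P3 = [('R', 'x', 'b'), ('R', 'x', 'a')]
        P4 = [('R', 'x', 'b'), ('R', 'x', 'a')]     # same read order as P3
        P4bad = [('R', 'x', 'a'), ('R', 'x', 'b')]  # opposite read order

        print(sequentially_consistent([P1, P2, P3, P4]))     # True: one interleaving explains all reads
        print(sequentially_consistent([P1, P2, P3, P4bad]))  # False: no single interleaving works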

      Causal Consistency

        This sequence is allowed with a causally-consistent store, but not with a sequentially consistent store.


        (a) A violation of a causally-consistent store.     (b) A correct sequence of events in a causally-consistent store.
    4. Client-Centric Consistency Models

      Goal
      Show how we can perhaps avoid systemwide consistency, by
      concentrating on what specific clients want, instead of what should be
      maintained by servers.
      Eventual consistency:

      • very weak consistency (e.g. suitable for data stores that are read most of the time)
      • little or no simultaneous update
          o e.g. a web page is typically maintained by a single person
      • the only requirement is that all replicas eventually become the same
      • all updates must be guaranteed to propagate to all replicas ... eventually!
      • this works well if every client always updates the same replica
      • things become a little more difficult if clients are mobile
      Example:
      Consider a distributed database to which you have access through
      your notebook. Assume your notebook acts as a front end to the
      database.

      • At location A you access the database, doing reads and updates.
      • At location B you continue your work, but unless you access the same
        server as the one at location A, you may observe inconsistencies:
          o your updates at A may not yet have been propagated to B
          o you may be reading newer entries than the ones available at A
          o your updates at B may eventually conflict with those at A
      • This problem is alleviated by the client-centric consistency models:
      • When the system can guarantee that a single client sees accesses to the
        data store in a consistent way, we say that client-centric consistency holds.
      • A client should never be able to move "back in time" with respect to the data store.
      • The emphasis is on maintaining a consistent view of things for the
        individual client process that is currently operating on the data store.
      • Monotonic Reads
        Monotonic-read consistency: If a process reads x, any successive read on x by that process will always return that same value or a more recent value.
          The R(x) operations are performed by the same process P
          at sites L1 and L2.

        a) A monotonic-read consistent data store.
        b) A data store that does not provide monotonic reads.
        WS: the write set, i.e. the series of write operations performed at a site since initialization.
      • Monotonic Writes
        Monotonic-write consistency: a write operation by a process on x is completed before any successive write on x by the same process.
          The W(x) operations are performed by the same process P
          at sites L1 and L2.

        a) A monotonic-write consistent data store.
        b) A data store that does not provide monotonic-write consistency.
      • Read Your Writes: The effect of a write operation by a process on x will always be seen by a successive read operation on x by the same process (a small sketch of the monotonic-read and read-your-writes guarantees follows this list).

        (a) A data store that provides read-your-writes consistency.
        (b) A data store that does not.
      • Writes Follow Reads: A write operation by a process on x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.

        (a) A data store that provides writes-follow-reads consistency.
        (b) A data store that does not.
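      A minimal sketch (not part of the original notes; the Replica and Session classes
      and the per-item version counters are invented) of how a client-side front end can
      enforce two of these guarantees -- monotonic reads and read-your-writes -- by
      remembering, per data item, the newest version it has written or observed and
      refusing to read from a replica that has not yet seen that version.

        class Replica:
            def __init__(self):
                self.store = {}                       # item -> (version, value)

            def version(self, item):
                return self.store.get(item, (0, None))[0]

            def read(self, item):
                return self.store.get(item, (0, None))

            def apply(self, item, version, value):
                if version > self.version(item):      # apply only newer updates
                    self.store[item] = (version, value)

        class Session:
            """Per-client state enforcing monotonic reads and read-your-writes."""
            def __init__(self):
                self.min_version = {}                 # item -> lowest acceptable version

            def read(self, replica, item):
                version, value = replica.read(item)
                if version < self.min_version.get(item, 0):
                    raise RuntimeError("replica too stale for this session; try another one")
                self.min_version[item] = version      # monotonic reads: never move back in time
                return value

            def write(self, replica, item, value):
                version = replica.version(item) + 1   # naive versioning, fine for a sketch
                replica.apply(item, version, value)
                self.min_version[item] = max(self.min_version.get(item, 0), version)
                return version                        # read-your-writes: later reads need >= this

        # usage
        r1, r2 = Replica(), Replica()
        s = Session()
        v = s.write(r1, "x", "a")     # write at replica r1
        r2.apply("x", v, "a")         # propagation to r2 (eventually)
        print(s.read(r2, "x"))        # 'a' -- r2 has caught up; if it had not,
                                      # the session would reject the stale read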
    5. Distributed Shared Memory (DSM)


      Each node has its own local memory and a memory-mapping manager; together
      the mapping managers implement a single shared memory across the nodes.
      Messages are hidden from programmers.

      Advantages of DSM:

      • Easier programming:
          o no need to deal with communication details; unlike the message-passing model (e.g. RPC), data movement is transparent to users
          o complex data structures are easy to handle
      • DSM systems are much cheaper than tightly coupled multiprocessor systems (DSMs can be built from commodity components).
      • DSM takes advantage of locality of memory references -- data are moved in units of pages.
      • The memories of the nodes together form one large physical memory.
      • Programs written for shared-memory multiprocessors can easily be ported to a DSM.
      Challenges of DSM:

      • How to keep track of the location of remote data?
      • How to overcome the communication delays and the high overhead associated with references to remote data?
      • How to allow "controlled" concurrent accesses to shared data?
      Mapping manager:

      The shared memory is partitioned into pages:
      • read-only pages -- copies may reside in the physical memories of many processors at the same time
      • writable pages -- may reside in only one processor's physical memory at a time
      A memory reference causes a page fault when the page containing the referenced location is not in the processor's current physical memory.
      When this happens, the memory-mapping manager (MMM) retrieves the page either from disk or from the memory of another processor.
      If the page also has copies at other nodes, some work must be done to keep the copies coherent (a small page-fault sketch follows below).
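      A much simplified sketch of the page-fault handling just described (not part of the
      original notes; the Directory and Node classes are invented). Read faults fetch a
      shared copy from the current owner; write faults invalidate the other copies so the
      page is writable at only one node, as required by the read-only / write
      classification above.

        class Directory:
            def __init__(self):
                self.owner = {}      # page -> node allowed to write it
                self.copies = {}     # page -> set of nodes holding read-only copies

        class Node:
            def __init__(self, name, directory):
                self.name, self.dir = name, directory
                self.memory = {}     # pages currently resident at this node

            def read(self, page):
                if page not in self.memory:                 # page fault on read
                    owner = self.dir.owner.get(page)
                    self.memory[page] = owner.memory[page] if owner else None
                    self.dir.copies.setdefault(page, set()).add(self)
                return self.memory[page]

            def write(self, page, data):
                if self.dir.owner.get(page) is not self:    # page fault on write
                    holders = set(self.dir.copies.get(page, set()))
                    if page in self.dir.owner:
                        holders.add(self.dir.owner[page])
                    for node in holders:                    # invalidate all other copies
                        if node is not self:
                            node.memory.pop(page, None)
                    self.dir.copies[page] = {self}
                    self.dir.owner[page] = self             # become the single writer
                self.memory[page] = data

        # usage
        d = Directory()
        n1, n2 = Node("n1", d), Node("n2", d)
        n1.write("p0", 42)          # n1 owns page p0
        print(n2.read("p0"))        # 42 -- n2 faults and fetches a read-only copy
        n2.write("p0", 43)          # n2 takes ownership; n1's copy is invalidated
        print("p0" in n1.memory)    # False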

      A parallel program is a set of threads or processes that share a virtual address space;
      a DSM allows the processes of one program to execute on different processors in parallel.

      Implementations

      Central Server
        • A central server (a centralized manager on a single processor) maintains all the shared data.
        • The manager enforces mutually exclusive access to the data.
        • read: the server just returns the data
        • write: the server updates the data and sends an acknowledgement to the client
        • The data can also be distributed over several servers -- this needs a directory to record the location of each page, maintained as a table with one entry per page.
        • The central manager can become a bottleneck.
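      A minimal sketch of the central-server scheme (not part of the original notes; the
      CentralServer and Client classes are invented, and a thread lock stands in for the
      mutual exclusion the manager enforces). In a real system each call would be a
      request message over the network.

        import threading

        class CentralServer:
            """All shared data lives here; the lock serializes accesses."""
            def __init__(self):
                self.data = {}                       # page/item -> value
                self.lock = threading.Lock()

            def handle_read(self, item):
                with self.lock:
                    return self.data.get(item)       # read: just return the data

            def handle_write(self, item, value):
                with self.lock:
                    self.data[item] = value          # write: update the data ...
                    return "ACK"                     # ... and acknowledge to the client

        class Client:
            def __init__(self, server):
                self.server = server                 # would be a network address in practice

            def read(self, item):
                return self.server.handle_read(item)

            def write(self, item, value):
                assert self.server.handle_write(item, value) == "ACK"

        # usage
        server = CentralServer()
        a, b = Client(server), Client(server)
        a.write("x", 1)
        print(b.read("x"))    # 1 -- every access goes through the server, the potential bottleneck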

      Data Migration
        • Each node has its own mapping manager (MM).
        • Data blocks (pages) are sent to the requesting location; subsequent accesses can be performed locally.
        • For both reads and writes: bring the remote page to the local machine, then perform the operation.
        • Keeping track of a page's location: a location service, a home machine for each page, or broadcast.
        • Problems: thrashing (pages move between nodes too frequently) and false sharing.
        • Multiple readers are costly, since the single copy keeps moving.
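      A minimal sketch of the migration scheme (not part of the original notes; the
      HomeDirectory and MigratingNode classes are invented). A single copy of each page
      exists; a directory records where it currently lives, and any access first migrates
      the page to the requesting node. Alternating accesses from two nodes produce the
      thrashing mentioned above.

        class HomeDirectory:
            """Location service: page -> node currently holding the single copy."""
            def __init__(self):
                self.location = {}

        class MigratingNode:
            def __init__(self, name, directory):
                self.name, self.dir = name, directory
                self.pages = {}                               # pages resident here

            def _ensure_local(self, page):
                holder = self.dir.location.get(page)
                if holder is not None and holder is not self: # page fault: migrate the page here
                    self.pages[page] = holder.pages.pop(page)
                self.dir.location[page] = self
                self.pages.setdefault(page, None)

            def read(self, page):
                self._ensure_local(page)
                return self.pages[page]

            def write(self, page, data):
                self._ensure_local(page)
                self.pages[page] = data

        # usage
        d = HomeDirectory()
        n1, n2 = MigratingNode("n1", d), MigratingNode("n2", d)
        n1.write("p0", "hello")
        print(n2.read("p0"))   # page migrates from n1 to n2, then is read locally
        n1.read("p0")          # ...and migrates back: alternating accesses cause thrashing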
      Read-replication
        • In the previous approaches, only processes on one node could access a shared data item at any one moment.
        • Read-replication replicates data blocks, allowing multiple nodes to have read access while one node has write access.
        • A write to a copy causes all other copies of the data to be updated or invalidated.
        • The locations of all copies must be tracked (location service / home machines).
      Full-replication
        • Allows multiple readers and multiple writers concurrently.
        • Access to the shared memory must be controlled to maintain consistency.
        • e.g. use a gap-free sequencer: every node wishing to modify shared data sends the modification to a sequencer, which multicasts the modifications with consecutive sequence numbers.
        • A gap between sequence numbers means something is missing; the node then requests retransmission of the modifications it has missed.
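      A minimal sketch of the gap-free-sequencer idea (not part of the original notes; the
      Sequencer and Replica classes are invented, and delivery is simulated by direct
      calls). Writers submit modifications to the sequencer, which stamps them with
      consecutive sequence numbers and multicasts them; a replica that notices a gap asks
      for the modifications it missed.

        class Sequencer:
            def __init__(self):
                self.log = []                    # all modifications, in sequence order
                self.replicas = []

            def submit(self, item, value):
                update = (len(self.log), item, value)   # next gap-free sequence number
                self.log.append(update)
                for r in self.replicas:          # multicast the modification
                    r.deliver(update)

            def retransmit(self, replica, from_seq):
                for update in self.log[from_seq:]:
                    replica.deliver(update)

        class Replica:
            def __init__(self, sequencer):
                self.expected = 0                # next sequence number we expect
                self.data = {}
                self.pending = {}                # out-of-order updates, keyed by seq
                self.sequencer = sequencer
                sequencer.replicas.append(self)

            def deliver(self, update):
                seq, item, value = update
                if seq > self.expected:          # gap detected: something was missed
                    self.pending[seq] = update
                    self.sequencer.retransmit(self, self.expected)
                elif seq == self.expected:
                    self.data[item] = value
                    self.expected += 1
                    while self.expected in self.pending:     # drain buffered updates
                        _, i, v = self.pending.pop(self.expected)
                        self.data[i] = v
                        self.expected += 1
                # seq < expected: duplicate, ignore

        # usage: r2 joins late and so "misses" the first modification
        s = Sequencer()
        r1 = Replica(s)
        s.submit("x", 1)                 # only r1 receives sequence number 0
        r2 = Replica(s)
        s.submit("y", 2)                 # r2 sees seq 1 first, detects the gap, recovers seq 0
        print(r1.data, r2.data)          # {'x': 1, 'y': 2} {'x': 1, 'y': 2}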