
Lecture Notes
Dr. Tong Lai Yu, March 2010
    0. Review and Overview
    1. An Introduction to Distributed Systems
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Processes
    5. Communication
    6. Distributed OS Theories
    7. Distributed Mutual Exclusions
    8. Agreement Protocols
    9. Distributed Scheduling
    10. Distributed Resource Management
    11. Recovery and Fault Tolerance
    12. Security and Protection
    A man who gives in to temptation after five minutes simply
    does not know what it would have been like an hour later.
    That is why bad people, in one sense, know very little about badness.
    They have lived a sheltered life by always giving in.
    						C.S. Lewis
    Recovery and Fault Tolerance
    1. Basic Concepts
    2. A system consists of a set of hardware and software components and is designed to provide a specified service.
    3. Failure of a system occurs when the system does not perform its services in the manner specified.
    4. An erroneous state of the system is a state that could lead to a system failure by a sequence of valid state transitions.
    5. A fault is an anomalous physical condition.
    6. An error is a manifestation of a fault in a system, which can lead to system failure.
    7. Failure recovery is a process that involves restoring an erroneous state to an error-free state.
    8. Failure Classification
    9. process failure
    10. system failure
    11. secondary storage failure
    12. communication medium failure
    13. System Model

      Assume stable storage:

    14. does not lose information in the event of system failure
    15. is used to keep logs & recovery points

    16. Stable Storage:

      After a crash:

    17. If both disks are identical: you're in good shape.
    18. If one is bad, but the other is okay (checksums): choose the good one.
    19. If both seem okay, but are different: choose the main disk.
    20. If both aren't good: you're in bad shape.
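The four cases above can be sketched as a small decision function. This is a minimal sketch, assuming stable storage is built from two mirrored disks and that each block carries a CRC-32 checksum; the names `make_block`, `checksum_ok` and `recover` are hypothetical and chosen only for illustration.

```python
import zlib


def make_block(data):
    """Pair a block of bytes with its CRC-32 checksum (assumed block format)."""
    return (data, zlib.crc32(data))


def checksum_ok(block):
    """A block is considered good if its stored checksum matches its data."""
    data, crc = block
    return zlib.crc32(data) == crc


def recover(main, backup):
    """Pick the copy to keep after a crash; None means both copies are bad."""
    main_ok = checksum_ok(main)
    backup_ok = checksum_ok(backup)
    if main_ok:
        # Covers "both identical" and "both okay but different":
        # in either case the main disk wins.
        return main[0]
    if backup_ok:
        # Main copy is bad, backup is good: choose the good one.
        return backup[0]
    return None  # both copies are bad


new = make_block(b"new value")
old = make_block(b"old value")
bad = (b"new value", zlib.crc32(b"new value") ^ 1)  # deliberately wrong checksum
```

Here `recover(new, old)` keeps the main copy, `recover(bad, old)` falls back to the backup, and `recover(bad, bad)` reports that nothing can be salvaged.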
    21. Recovery

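      Forward recovery

      The system tries to move forward from the erroneous state to a correct new state instead of going back, e.g. when sending signals to a satellite.

      Needs error-correcting codes.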

      Backward recovery

    22. rollback recovery
    23. based on recovery points
    24. two approaches:
      1. operation-based recovery

        record all modifications in sufficient detail so that a previous state of the process can be restored by reversing all the changes

      2. state-based recovery

        the complete state of a process is saved at various checkpoints
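To make the contrast concrete, here is a minimal sketch of operation-based recovery, assuming the process state is a flat dictionary and every modification is a single-key write. The class `UndoLog` and its methods are hypothetical names used only for illustration; a state-based scheme would instead save a full copy of `state` at each checkpoint.

```python
class UndoLog:
    """Operation-based recovery: record enough about each write to reverse it."""

    def __init__(self):
        self.state = {}
        self.log = []  # entries are (key, key_existed, old_value)

    def write(self, key, value):
        # Log the previous value before overwriting it.
        self.log.append((key, key in self.state, self.state.get(key)))
        self.state[key] = value

    def recovery_point(self):
        """A recovery point is simply the current length of the log."""
        return len(self.log)

    def rollback(self, point):
        """Reverse all changes made after the given recovery point."""
        while len(self.log) > point:
            key, existed, old = self.log.pop()
            if existed:
                self.state[key] = old
            else:
                del self.state[key]


p = UndoLog()
p.write("x", 1)
rp = p.recovery_point()
p.write("x", 2)
p.write("y", 3)
p.rollback(rp)
```

After the rollback, `p.state` is `{"x": 1}` again: the second write to `x` is reversed and `y` is removed.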

      A process takes a checkpoint from time to time by saving its state in stable storage

      need consistent global state

      the state of channels corresponding to a global state is the set of messages sent but not yet received

      A checkpoint is saved as a local state of a process.

      A set of checkpoints, one per process in the system, is consistent if the saved states form a consistent global state.

      Two approaches to creating checkpoints:

      1. State-based or operation-based: processes take checkpoints independently and save all checkpoints in stable storage (asynchronous).
      2. Global checkpointing: processes coordinate their checkpointing actions such that each process saves only its most recent checkpoint, and the set of checkpoints in the system is guaranteed to be consistent.

      Independent Checkpointing

      Each process independently takes checkpoints, with the risk of a cascaded
      rollback to system startup:
      1. Let CP[i](m) denote the mth checkpoint of process Pi and INT[i](m) the interval between CP[i](m-1) and CP[i](m)
      2. When process Pi sends a message in interval INT[i](m), it piggybacks (i,m)
      3. When process Pj receives a message in interval INT[j](n), it records the dependency INT[i](m) → INT[j](n)
      4. The dependency INT[i](m)→ INT[j](n) is saved to stable storage when taking checkpoint CP[j](n)

        If process Pi rolls back to CP[i](m-1), Pj must roll back to CP[j](n-1).
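The rollback rule can be sketched as a fixed-point computation over the recorded dependencies: rolling Pi back to a checkpoint taken before interval INT[i](m) undoes that interval, which forces Pj back to the checkpoint taken before INT[j](n). This is only a sketch; the function name `propagate_rollback`, the tuple encoding of intervals and the example dependencies are assumptions made for illustration.

```python
def propagate_rollback(deps, start):
    """Compute how far each process must roll back.

    deps:  list of ((i, m), (j, n)) pairs, meaning INT[i](m) -> INT[j](n).
    start: dict {process: index of the checkpoint it rolls back to}.
    Returns a dict with the checkpoint index for every affected process.
    """
    target = dict(start)
    changed = True
    while changed:
        changed = False
        for (i, m), (j, n) in deps:
            # INT[i](m) is undone if Pi rolls back to a checkpoint before CP[i](m).
            if i in target and target[i] < m:
                # Pj must then go back to CP[j](n-1), or further if already required.
                if target.get(j, float("inf")) > n - 1:
                    target[j] = n - 1
                    changed = True
    return target


deps = [
    (("P1", 2), ("P2", 3)),  # P2 received, in interval 3, a message P1 sent in interval 2
    (("P2", 2), ("P3", 1)),
    (("P2", 4), ("P3", 5)),
]
result = propagate_rollback(deps, {"P1": 1})
```

In this example P1 rolls back to CP[1](1), so P2 must return to CP[2](2), which in turn undoes INT[2](4) and forces P3 back to CP[3](4). Long dependency chains of this kind are what make cascaded rollbacks possible.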

      Orphan messages and domino effect:

      Cascaded rollbacks may lead to unacceptable delays.

      • After failure, Y has to roll back to y2.
      • If X only rolls back to x3, its state records a message m that Y, after rolling back, has no record of sending (an orphan message).
      • So X must roll back to x2.

      Lost messages:

      • If the system is restored to state {x1, y1}, message m is lost, because X has passed the point where it sends m.


      Livelocks:

      A situation in which a single failure can cause an infinite number of rollbacks, preventing the system from making progress.

      • Y fails before receiving message n1 (upper figure).
      • When Y rolls back to y1, there's no record of sending m1. So X has to roll back to x1.
      • When Y recovers, it sends out m2 and receives n1 (lower figure).
      • X sends out n2 and receives m2.
      • But X has no record of sending out n1, so Y has to roll back,
      • which forces X to roll back also ....
      So state-based or operation-based (independent) checkpointing is not adequate in many cases.

      Coordinated Checkpointing

      Each process takes a checkpoint after a globally coordinated action.

      Strongly Consistent Set of Checkpoints

      Consistent recovery state

      Every message that has been received is also shown to have been
      sent in the state of the sender (i.e. strongly consistent set of checkpoints).
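The condition that every received message must also be recorded as sent can be checked mechanically. The sketch below assumes each checkpoint records the identifiers of the messages its process has sent and received so far; the dictionary layout and the function name `check_checkpoints` are hypothetical.

```python
def check_checkpoints(checkpoints):
    """Inspect one checkpoint per process.

    checkpoints: dict {process: {"sent": set_of_ids, "received": set_of_ids}}.
    Returns (orphans, in_transit):
      orphans    - messages recorded as received but not as sent
      in_transit - messages recorded as sent but not yet received
                   (the channel state of this global state)
    """
    sent = set().union(*(cp["sent"] for cp in checkpoints.values()))
    received = set().union(*(cp["received"] for cp in checkpoints.values()))
    return received - sent, sent - received


cps = {
    "X": {"sent": {"m1", "m3"}, "received": set()},
    "Y": {"sent": set(), "received": {"m1", "m2"}},
}
orphans, in_transit = check_checkpoints(cps)
```

Here `m2` is an orphan (Y recorded its receipt, nobody recorded sending it), so this set of checkpoints is not a consistent recovery state, while `m3` is merely in transit.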

      Recovery line

      Assuming processes regularly checkpoint their state, the recovery line is the most recent
      consistent global checkpoint.
          Figure: A recovery line

      If and only if the system provides reliable communication should
      sent messages also be received in a consistent state.

      Message logging

      Instead of taking an (expensive) checkpoint, try to replay your
      (communication) behavior from the most recent checkpoint: store messages in a log.


      We assume a piecewise deterministic execution model:
    25. The execution of each process can be considered as a sequence of state intervals
    26. Each state interval starts with a nondeterministic event (e.g., message receipt)
    27. Execution in a state interval is deterministic
    28. Conclusion

      If we record nondeterministic events (to replay them later), we obtain a
      deterministic execution model that will allow us to do a complete replay.
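A minimal sketch of this idea, assuming message receipt is the only nondeterministic event and that an in-memory list stands in for a log on stable storage. The class `LoggedProcess` and its transition function are invented for illustration.

```python
class LoggedProcess:
    """Piecewise deterministic process: its state depends only on the
    last checkpoint and on the sequence of messages delivered since then."""

    def __init__(self):
        self.state = 0
        self.checkpoint = 0
        self.log = []  # stand-in for a message log on stable storage

    def step(self, msg):
        # Deterministic execution within one state interval (arbitrary example).
        self.state = self.state * 31 + msg

    def deliver(self, msg):
        self.log.append(msg)  # record the nondeterministic event first
        self.step(msg)

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.log.clear()  # messages before the checkpoint are no longer needed

    def recover(self):
        # Restore the checkpoint, then replay the logged messages in order.
        self.state = self.checkpoint
        for msg in self.log:
            self.step(msg)


p = LoggedProcess()
p.deliver(3)
p.take_checkpoint()
p.deliver(5)
p.deliver(7)
before_crash = p.state
p.state = None  # simulate losing the volatile state
p.recover()
```

After `recover()`, `p.state` equals `before_crash`, because replaying the same messages in the same order reproduces the same state intervals.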

      Message logging and consistency

      When should we actually log messages?
      Issue: Avoid orphans:
    29. Process Q has just received and subsequently delivered messages m1 and m2
    30. Assume that m2 is never logged.
    31. After delivering m1 and m2, Q sends message m3 to process R
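    32. Process R receives and subsequently delivers m3. If Q now crashes and recovers, only the logged message m1 can be replayed; without m2, Q may never send m3 again, so R is left depending on a message that was never sent as far as the recovered Q is concerned.
      • Figure: Incorrect replay of messages after recovery, leading to an orphan process.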
    33. Message-logging schemes

    34. HDR[m]: The header of message m containing its source, destination, sequence number, and delivery number
      The header contains all information for resending a message and
      delivering it in the correct order (assume data is reproduced by the application)
      A message m is stable if HDR[m] cannot be lost (e.g., because it has been written to stable storage)
    35. DEP[m]: The set of processes to which message m has been delivered, together with the processes to which any message that causally depends on the delivery of m has been delivered
    36. COPY[m]: The set of processes that have a copy of HDR[m] in their volatile memory


      If C is a collection of crashed processes, then Q ∉ C is an orphan if there is a message m such that Q ∈ DEP[m] and COPY[m] ⊆ C (i.e. all processes that have a copy of m have crashed and Q depends on m).


      We want ∀m ∀C :: COPY[m] ⊆ C ⇒ DEP[m] ⊆ C. This is the same as saying that ∀m :: DEP[m] ⊆ COPY[m]: taking C = COPY[m] gives one direction, and DEP[m] ⊆ COPY[m] ⊆ C gives the other.


      No orphans means that for each message m,
        DEP[m] ⊆ COPY[m]
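The orphan definition and the no-orphans condition translate directly into set operations. This is a minimal sketch, assuming DEP and COPY are given as dictionaries from (unstable) message names to sets of process names; the function names are hypothetical.

```python
def find_orphans(dep, copy, crashed):
    """Return every process Q not in C with Q in DEP[m] and COPY[m] ⊆ C for some m."""
    result = set()
    for m in dep:
        if copy[m] <= crashed:          # every copy of HDR[m] is lost
            result |= dep[m] - crashed  # surviving dependents are orphans
    return result


def orphan_free(dep, copy):
    """No crash set C can create an orphan iff DEP[m] ⊆ COPY[m] for every m."""
    return all(dep[m] <= copy[m] for m in dep)


# Q delivered m2 without logging it, then sent m3 to R, so R depends on m2.
dep = {"m2": {"Q", "R"}}
copy = {"m2": {"Q"}}
orphans_after_q_crash = find_orphans(dep, copy, {"Q"})
```

With these sets, a crash of Q alone makes R an orphan and `orphan_free(dep, copy)` is `False`. If R also held a copy of HDR[m2], the condition DEP[m2] ⊆ COPY[m2] would hold and no orphan could arise.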

      Pessimistic protocol

      For each unstable message m, there is at most one process dependent on m, that is |DEP[m]| ≤ 1.


      An unstable message in a pessimistic protocol must be made stable before the next message is sent.
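One way to picture this rule is a process that flushes its unstable message headers before every send. The sketch below is an illustration under simplifying assumptions, not a full protocol: plain lists stand in for stable storage and for the network, and the class `PessimisticProcess` is a hypothetical name.

```python
class PessimisticProcess:
    """Makes every delivered message stable before sending anything new."""

    def __init__(self):
        self.unstable = []    # headers delivered but not yet on stable storage
        self.stable_log = []  # stand-in for stable storage
        self.outbox = []      # stand-in for the network

    def deliver(self, hdr):
        # After delivery, only this process depends on the message.
        self.unstable.append(hdr)

    def send(self, msg):
        # Flush first, so no other process can come to depend on an unstable message.
        self.stable_log.extend(self.unstable)
        self.unstable.clear()
        self.outbox.append(msg)


q = PessimisticProcess()
q.deliver("HDR[m1]")
q.deliver("HDR[m2]")
unstable_before_send = list(q.unstable)
q.send("m3")
```

Because the flush happens before `m3` leaves the process, the receiver of `m3` can only ever depend on stable messages, which keeps |DEP[m]| ≤ 1 for every unstable m.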

      Optimistic protocol

      For each unstable message m, we ensure that if COPY[m] ⊆ C, then eventually also DEP[m] ⊆ C, where C denotes a set of processes that have been marked as faulty.


      To guarantee that DEP[m] ⊆ C, we generally roll back each orphan process Q until Q ∉ DEP[m].