Lecture Notes
Dr. Tong Lai Yu, March 2010
    0. Review and Overview
    1. An Introduction to Distributed Systems
    2. Deadlocks
    3. Distributed Systems Architecture
    4. Processes
    5. Communication
    6. Distributed OS Theories
    7. Distributed Mutual Exclusions
    8. Agreement Protocols
    9. Distributed Scheduling
    10. Distributed Resource Management
    11. Recovery and Fault Tolerance
    12. Security and Protection
     
    Distributed Scheduling
    1. Load Balancing

      In a distributed system some nodes may be lightly loaded, some moderately loaded, and some heavily loaded at the same time.
    2. The difference between one processor that is N times faster and a pool of N processors is instructive.
    3. The pool's arrival rate is N times the individual rate, but its service rate reaches N times the individual rate only when every processor in the pool is busy.
    4. However, one N-times-faster processor can be much more expensive than N slow processors, or may not exist at all.
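The comparison in items 2 to 4 can be made concrete with standard queueing formulas. The sketch below is illustrative only: it assumes Poisson arrivals at total rate `lam`, exponential service at rate `mu` per slow processor, and steady state. The function names are made up for this example. It computes the mean response time for one fast M/M/1 server, for a pool sharing a single queue (M/M/N, using the Erlang C formula), and for N isolated M/M/1 servers.

```python
from math import factorial


def erlang_c(n, offered_load):
    """Probability that an arriving task must wait in an M/M/n queue."""
    a = offered_load  # a = lam / mu; stability requires a < n
    if not 0 <= a < n:
        raise ValueError("offered load must satisfy 0 <= a < n")
    top = a**n / factorial(n) * n / (n - a)
    bottom = sum(a**k / factorial(k) for k in range(n)) + top
    return top / bottom


def response_fast_single(n, lam, mu):
    """One processor n times faster: M/M/1 with service rate n * mu."""
    return 1.0 / (n * mu - lam)


def response_pool(n, lam, mu):
    """Pool of n processors fed from one shared queue: M/M/n."""
    mean_wait = erlang_c(n, lam / mu) / (n * mu - lam)
    return mean_wait + 1.0 / mu


def response_isolated(n, lam, mu):
    """n separate M/M/1 systems, each receiving lam / n, no load sharing."""
    return 1.0 / (mu - lam / n)


if __name__ == "__main__":
    n, lam, mu = 4, 3.2, 1.0  # 80% utilization in all three setups
    print(response_fast_single(n, lam, mu))
    print(response_pool(n, lam, mu))
    print(response_isolated(n, lam, mu))
```

Under these assumptions the fast single processor gives the shortest mean response time, the isolated servers the longest, and the shared pool falls in between, which is the gap that load distribution tries to close.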
    5. Let's analyze N isolated systems to see the problem of underutilization in the absence of load balancing -- consider a system of N identical and independent M/M/1 servers:

        [Figure: probability of at least one task waiting and at least one processor idle]
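The probability named in the figure caption can be computed directly. The sketch below is a derivation by inclusion-exclusion, not a formula quoted from these notes. It assumes every server is an independent M/M/1 queue in steady state with the same utilization `rho`, so a server is idle with probability 1 - rho, holds exactly one task with probability (1 - rho) * rho, and has at least one task waiting with probability rho**2.

```python
def prob_idle_while_waiting(n, rho):
    """P(at least one server idle AND at least one server has a task waiting)."""
    if n < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need n >= 1 and 0 <= rho < 1")
    no_server_idle = rho**n                    # every server is busy
    no_task_waiting = (1.0 - rho**2) ** n      # every server holds 0 or 1 task
    exactly_one_each = (rho * (1.0 - rho)) ** n  # both of the above at once
    return 1.0 - no_server_idle - no_task_waiting + exactly_one_each


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, round(prob_idle_while_waiting(n, 0.5), 4))
```

With a single server the probability is zero, since one server cannot be idle and backlogged at once. As N grows at moderate utilization the probability approaches one, which is the underutilization problem that motivates load distribution.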

    6. Issues in Load Distribution

      Load

    7. Resource and CPU queue lengths are good indicators of load.
    8. Artificially increment the CPU queue length for transferred jobs that are still on their way to the node.
    9. Set timeouts for such jobs to safeguard against transfer failures.
    10. For interactive jobs there is little correlation between queue length and CPU utilization: use utilization instead.
    11. Monitoring CPU utilization is expensive.
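The idea of counting in-transit jobs and timing them out can be sketched as a small bookkeeping class. This is a minimal illustration under stated assumptions, not code from the course: the class name `LoadIndex`, its methods, the default timeout value, and the injectable `clock` parameter are all hypothetical.

```python
import time


class LoadIndex:
    """CPU queue length plus reservations for tasks still in transit."""

    def __init__(self, transfer_timeout=5.0, clock=time.monotonic):
        self.queue_length = 0
        self.transfer_timeout = transfer_timeout  # seconds; assumed value
        self._clock = clock
        self._in_transit = {}  # task id -> time at which the reservation expires

    def expect_transfer(self, task_id):
        """Called when this node agrees to accept a task from another node."""
        self._in_transit[task_id] = self._clock() + self.transfer_timeout

    def task_arrived(self, task_id):
        """A task joined the CPU queue; drop its reservation if it had one."""
        self._in_transit.pop(task_id, None)
        self.queue_length += 1

    def task_finished(self):
        self.queue_length = max(0, self.queue_length - 1)

    def load(self):
        """Queue length, counting only reservations that have not timed out."""
        now = self._clock()
        self._in_transit = {
            task: deadline
            for task, deadline in self._in_transit.items()
            if deadline > now
        }
        return self.queue_length + len(self._in_transit)
```

Counting the reservation keeps a node from accepting several transfers at once on the basis of a stale queue length, and the timeout keeps a failed transfer from inflating the load forever.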
    12. Modeling -- Poisson Process, Markov process, M/M/1 queue, M/M/N
    13. Classification of Algorithms

    14. Static -- decisions hard-wired into algorithm using prior knowledge of system
    15. Dynamic -- use state information to make decisions.
    16. Adaptive -- special case of dynamic algorithms; dynamically change parameters of the algorithm
    17. Load Sharing vs. Load Balancing

    18. Load Sharing -- reduce the likelihood of an unshared state (a node sits idle while tasks wait at another node) by transferring tasks to lightly loaded nodes
    19. Load Balancing -- try to make each node carry approximately the same load
    20. Preemptive vs. Nonpreemptive

    21. Preemptive transfers -- transfer of a task that is partially executed, expensive due to collection of task's state
    22. Nonpreemptive transfers -- only transfer tasks that have not begun execution.
    23. Components of Load Distribution

    24. Transfer policy -- threshold based, determine if a process should be executed remotely or locally
    25. Selection policy -- which task should be picked, overhead in transfer of selected task should be offset by reduction in its response time
    26. Location policy -- which node a task should be sent to, possibly using polling to find a suitable node
    27. Information policy -- when information about other nodes should be collected: demand-driven, periodic, or state-change-driven
      Demand-driven:
        nodes gather information about other nodes
      • sender initiated
      • receiver initiated
      • symmetrically initiated
      Periodic :
        nodes exchange information periodically
      State-change-driven :
        nodes disseminate information when their state changes
    28. Stability -- queueing-theoretic, or algorithmic perspective
    29. Sender Initiated Algorithms

    30. overloaded node -- when a new task makes the queue length ≥ threshold T
    31. underloaded node -- if accepting a task still keeps the queue length < threshold T
    32. overloaded node attempts to send task to underloaded node
    33. only newly arrived tasks considered for transfers
    34. location policies:
      • random -- no information about other nodes
      • threshold -- poll a node to determine whether it is a receiver ( underloaded )
      • shortest -- a number of nodes are polled at random to determine their queue lengths, and the one with the shortest queue is chosen
    35. information policy: demand-driven
    36. stability: polling increases activity and can render the system unstable at high load
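The sender-initiated rules above can be sketched as follows. This is a simplified illustration, not the course's implementation: the function names, the `poll_limit` parameter, and the use of a `queue_lengths` dictionary in place of real poll messages are assumptions made for the example.

```python
import random


def is_sender(queue_length_after_arrival, threshold):
    """A node is overloaded when a new task makes its queue length >= T."""
    return queue_length_after_arrival >= threshold


def is_receiver(queue_length, threshold):
    """A node is underloaded if accepting one task keeps its queue below T."""
    return queue_length + 1 < threshold


def pick_destination(policy, peers, queue_lengths, threshold,
                     poll_limit=3, rng=random):
    """Return the peer to transfer to, or None to run the task locally.

    queue_lengths[p] stands in for the reply to a poll sent to peer p.
    """
    if policy == "random":
        return rng.choice(peers)  # no information about other nodes is used
    polled = rng.sample(peers, min(poll_limit, len(peers)))
    if policy == "threshold":
        for peer in polled:
            if is_receiver(queue_lengths[peer], threshold):
                return peer  # first polled node that is a receiver
        return None
    if policy == "shortest":
        best = min(polled, key=lambda p: queue_lengths[p])
        return best if is_receiver(queue_lengths[best], threshold) else None
    raise ValueError("unknown location policy: " + policy)
```

Bounding the number of polls per task is one way to limit the extra activity that item 36 warns about. At high load most polls still fail, so the overhead is spent without finding a receiver.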
    37. Receiver Initiated Algorithms

    38. initiated by underloaded nodes
    39. underloaded node tries to obtain task from overloaded node
    40. initiate search for sender either on a task departure or after a predetermined period
    41. information policy: demand-driven
    42. stability: remain stable at high and low loads
    43. disadvantage: most task transfers are preemptive
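A receiver-initiated search can be sketched in the same style. The names below, the `poll_limit` parameter, and the rule that a polled node gives up a task only if its queue length stays at or above the threshold afterwards are assumptions for illustration. The notes themselves only say that an underloaded node searches for an overloaded one.

```python
import random


def find_sender(local_queue_length, peers, queue_lengths, threshold,
                poll_limit=3, rng=random):
    """Run on a task departure: return a peer to take a task from, or None.

    queue_lengths[p] stands in for the reply to a poll sent to peer p.
    """
    if local_queue_length >= threshold:
        return None  # this node is not underloaded, so it does not search
    for peer in rng.sample(peers, min(poll_limit, len(peers))):
        # Assumed rule: the peer stays at or above T after giving up one task.
        if queue_lengths[peer] - 1 >= threshold:
            return peer
    return None  # retry on the next departure or after a fixed period
```

When the search succeeds, the task obtained from the sender has often already started running there, which is why most transfers under this scheme are preemptive.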
    44. Symmetrically Initiated Algorithms

    45. senders search for receivers -- good in low load situations, but high polling overhead in high load situations
    46. receivers search for senders -- useful in high load situations, but a preemptive task transfer facility is necessary
    47. Stable Symmetrically Initiated Algorithms

    48. use the information gathered during polling to classify nodes in system as either Sender/overload, Receiver/underload, or OK
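The classification step can be sketched as below. The two thresholds `lower` and `upper`, the class and method names, and the choice to start with every peer marked as a receiver are assumptions for this example. The notes state only that polling information is used to sort nodes into the three classes.

```python
from enum import Enum


class NodeState(Enum):
    SENDER = "sender"      # overloaded
    RECEIVER = "receiver"  # underloaded
    OK = "ok"


def classify(queue_length, lower, upper):
    """Map a reported queue length to one of the three classes."""
    if queue_length > upper:
        return NodeState.SENDER
    if queue_length < lower:
        return NodeState.RECEIVER
    return NodeState.OK


class PeerDirectory:
    """What one node currently believes about the other nodes."""

    def __init__(self, peers):
        # Assumed starting point: treat every peer as a receiver until polled.
        self.state = {peer: NodeState.RECEIVER for peer in peers}

    def record_poll(self, peer, queue_length, lower, upper):
        """Update the belief about a peer from the reply to a poll."""
        self.state[peer] = classify(queue_length, lower, upper)

    def candidates(self, wanted):
        """Peers currently believed to be in the wanted class."""
        return [p for p, s in self.state.items() if s is wanted]
```

Because a sender polls only peers it believes to be receivers, and the reverse for receivers, fewer polls are wasted at high load than with purely random polling.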
    49. Comparisons

    50. Any load distribution algorithm is better than no load distribution
    51. Receiver-initiated is better than sender-initiated under high load

    52. RAND -- Sender initiated algorithm with random location policy
      SEND -- Sender initiated with threshold policy

      Comparing with Symmetrically Initiated Load Sharing
    53. better than sender-initiated
    54. but unstable at high-load

    55. Comparing with Stable Symmetrically Initiated Algorithms
    56. ADSYM ( stable SYM ) very good

    57. Performance under Heterogeneous Workloads
    58. some nodes generate load while others generate nothing
    59. system load = 0.85
    60. nodes shown originate none of the system workload, while the remaining nodes originate all of the system workload
    61. RECV very unstable (random probes unlikely to find work)
    62. SEND also unstable, can't get rid of load fast enough
    63. ADSYM very good