
Lecture Notes
Dr. Tong Lai Yu, 2010

  1. Introduction
  2. Processes
  3. Inter-Process Communication
  4. Deadlocks
  5. Memory Management
  6. File Systems
  7. Protection and Security
  8. I/O Systems


The fruit of silence is PRAYER. The fruit of prayer is
FAITH. The fruit of faith is LOVE. The fruit of love
is SERVICE.  The fruit of service is PEACE.

				MOTHER TERESA
Memory Management
  1. Memory Hierarchy

    NUMA -- Non-Uniform Memory Access, meaning that different types of memory have different access times.

    The part of the OS that manages the memory hierarchy is called the memory manager.

  2. Introduction

    Memory management without swapping or paging:

  3. Monoprogramming
    e.g. MS-DOS, running one process at a time
    No memory abstraction -- every program sees physical memory directly:
      MOV register1, [1000]
          moves the contents of memory location 1000 to register1.

  4. Multiprogramming
    Advantages:
    • easier to program by splitting a job into 2 or more processes
    • processes spend time waiting for I/O to complete; while one process waits, another can use the CPU

      Example:

      A process spends fraction p of its time in I/O wait.

      With n processes in memory, the CPU is idle only when all n processes are waiting at the same time (assuming the waits are independent):

        Probability Pwait = p^n

        Thus CPU utilization α = 1 - Pwait = 1 - p^n

      n -- degree of multiprogramming

      e.g. p = 1/2

        n     α
        1     1/2   ( 50% )
        2     3/4   ( 75% )
        3     7/8   ( 87.5% )
        ..    ..
        10    1023/1024   ( ~99.9% )
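
      A quick check of these numbers, as a minimal C sketch ( p = 1/2 as in the example; the range of n is arbitrary ):

      #include <stdio.h>
      #include <math.h>

      int main(void) {
          double p = 0.5;                  /* fraction of time a process waits on I/O */
          for (int n = 1; n <= 10; n++)    /* n = degree of multiprogramming */
              printf("n = %2d   utilization = %.1f%%\n",
                     n, 100.0 * (1.0 - pow(p, n)));
          return 0;
      }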

  5. Swapping

    moving processes from main memory to disk and back

    variable-sized partitions

    Example: memory initially holds processes A, B, and C; after A and C are swapped out and D and E are swapped in, holes of various sizes are left scattered between the remaining allocations → external fragmentation

    memory compaction -- combine all the holes into one big hole ( it may require a lot of CPU time; some mainframes had special hardware to handle compaction )

    Internal fragmentation

    consider a hole of ~ 10,244 bytes

    the next process requires 10,242 bytes

    2 bytes would be left over, and the overhead to keep track of those 2 bytes is much larger than 2 bytes

    so it is better to allocate the extra 2 bytes to the process → internal fragmentation

    Memory allocation with swapping

    Use a linked list to handle dynamic storage allocation:

    a set of holes of various sizes is scattered throughout memory; common strategies for choosing a hole are listed below, followed by a first-fit sketch.

    • first fit -- allocate the first hole that is big enough; the hole is broken into 2 pieces, one for the process and one left as a smaller hole

    • next fit -- like first fit, but starts searching from where the previous search left off

    • best fit -- allocate the smallest hole that is big enough => smallest leftover hole

    • worst fit -- allocate the largest hole => largest leftover hole

    • quick fit -- maintain separate lists of holes of similar sizes
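
    A minimal first-fit sketch in C over a linked list of holes ( illustrative; next fit would additionally remember where the previous search stopped, and best fit would scan the whole list for the smallest adequate hole ):

      #include <stdio.h>
      #include <stddef.h>

      /* A free-list node: [start, start + size) is a hole. */
      struct hole { size_t start, size; struct hole *next; };

      /* First fit: take the first hole big enough, splitting off the remainder.
         Returns the start address, or (size_t)-1 if no hole fits. */
      size_t first_fit(struct hole **list, size_t request) {
          for (struct hole **pp = list; *pp; pp = &(*pp)->next) {
              struct hole *h = *pp;
              if (h->size >= request) {
                  size_t addr = h->start;
                  h->start += request;             /* shrink the hole ...        */
                  h->size  -= request;
                  if (h->size == 0) *pp = h->next; /* ... or remove it entirely  */
                  return addr;
              }
          }
          return (size_t)-1;                       /* no hole large enough */
      }

      int main(void) {
          struct hole h2 = {900, 300, NULL};       /* hole at 900, 300 bytes */
          struct hole h1 = {500, 100, &h2};        /* hole at 500, 100 bytes */
          struct hole *list = &h1;
          printf("alloc 200 -> %zu\n", first_fit(&list, 200));  /* 900: first big-enough hole */
          printf("alloc  50 -> %zu\n", first_fit(&list, 50));   /* 500 */
          return 0;
      }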
  6. Virtual Memory

  7. Paging

    • in the old days, a large program was split into pieces called overlays ( overlay 0, overlay 1, ... ), which the OS swapped in and out, but the splitting had to be done by the programmer
    • modern computers have special hardware called a memory management unit ( MMU ). Whenever the CPU wants to access memory ( whether to fetch an instruction or to load or store data ), it sends the desired memory address to the MMU, which translates it to another address before passing it on to the memory unit.
    • address generated by the CPU -- virtual address
    • address translated to by the MMU -- physical address
    • the virtual address space is broken up into equal-sized units -- pages
    • the corresponding units in physical memory are called page frames
    • page size = page frame size
    • page tables map pages to page frames
    • a logical address ( virtual address ) is thus mapped into a physical address
       

      Example: a machine with a 64K virtual address space, 32K of main memory, and 4K pages, i.e. 16 virtual pages and 8 page frames ( X = not mapped ):

        virtual page ( addresses )     page frame
        0    (  0K -  4K )             2
        1    (  4K -  8K )             1
        2    (  8K - 12K )             6
        3    ( 12K - 16K )             0
        4    ( 16K - 20K )             4
        5    ( 20K - 24K )             3
        6    ( 24K - 28K )             X
        7    ( 28K - 32K )             X
        8    ( 32K - 36K )             X
        9    ( 36K - 40K )             5
        10   ( 40K - 44K )             X
        11   ( 44K - 48K )             7
        12   ( 48K - 52K )             X
        13   ( 52K - 56K )             X
        14   ( 56K - 60K )             X
        15   ( 60K - 64K )             X

      Main memory holds: frame 0 ( 0K - 4K ) ← page 3, frame 1 ← page 1, frame 2 ← page 0, frame 3 ← page 5, frame 4 ← page 4, frame 5 ← page 9, frame 6 ← page 2, frame 7 ← page 11.
    • MOV register, [ 36K + 10 ]
        36K + 10 lies in page 9; the page table => page frame 5

        Thus, the MMU maps the address to 20K + 10 and puts it on the bus

    • MOV register, [ 52K ]
        52K lies in page 13, which is unmapped => page fault

        the CPU traps to the OS, which
        writes a page frame back to disk
        and brings the required page into main memory

      e.g. suppose we want to access virtual address 8196

      8196 = 8192 + 4 = 8K + 4,     so it is in page 2

      Virtual Address ( 16 bits: 4-bit page #, 12-bit offset within a 4K page )

        incoming virtual address 8196:   0010 | 0000 0000 0100     ( page 2, offset 4 )

      Page table ( 3-bit page frame, followed by a valid/invalid ( present/absent ) bit ):

        page    frame   valid/invalid
        0       010     1
        1       001     1
        2       110     1
        3       000     1
        4       100     1
        5       011     1
        6       000     0
        7       000     0
        8       000     0
        9       101     1
        10      000     0
        11      111     1
        12      000     0
        13      000     0
        14      000     0
        15      000     0

        outgoing physical address:   110 | 0000 0000 0100     ( frame 6, offset 4 = 24580 )
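
      The translation can be sketched in C using the page table above ( 4K pages and the trap-on-absent behavior follow the example; the function name and fault handling are illustrative ):

      #include <stdio.h>
      #include <stdint.h>
      #include <stdlib.h>

      #define PAGE_SHIFT 12           /* 4K pages: low 12 bits are the offset */
      #define NPAGES     16           /* 64K virtual address space */

      struct pte { uint8_t frame; uint8_t present; };

      /* Page table from the example above. */
      struct pte page_table[NPAGES] = {
          {2,1},{1,1},{6,1},{0,1},{4,1},{3,1},{0,0},{0,0},
          {0,0},{5,1},{0,0},{7,1},{0,0},{0,0},{0,0},{0,0},
      };

      /* Translate a virtual address; report a page fault if the page is absent. */
      uint32_t translate(uint32_t vaddr) {
          uint32_t page   = vaddr >> PAGE_SHIFT;
          uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
          if (!page_table[page].present) {
              fprintf(stderr, "page fault at address %u (page %u)\n", vaddr, page);
              exit(1);
          }
          return ((uint32_t)page_table[page].frame << PAGE_SHIFT) | offset;
      }

      int main(void) {
          printf("%u -> %u\n", 8196u, translate(8196u));  /* page 2 -> frame 6: 24580 */
          return 0;
      }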

    • The tables used by the MMU have a valid bit for each page in the virtual address space. If this bit is set, translation of virtual addresses on that page proceeds as normal. If it is clear, any attempt by the CPU to access an address on the page generates an interrupt called a page fault trap.
    • Heavily used programs such as assemblers, compilers, database systems and so on can be shared among different users. The only condition is that the code must be reentrant:
    • it is written so that its code cannot be modified ( no values in it are changed )
    • it keeps no state of its own
    • calling programs keep track of their own progress
    • thus one copy of a reentrant routine can be shared by many users or processes
    • It is crucial to the correct functioning of a shared paging scheme that the shared pages remain unchanged. If one user were to change a location, it would be changed for all other users.

  13. Segmentation
    In an OS, information is grouped into blocks.
    When the blocks are of fixed size, they are called pages and the
    associated virtual memory organization is called paging. ( See the previous section. )
    When the blocks may be of different sizes, they are called segments and the
    associated virtual memory organization is called segmentation.
    In block mapping, the system represents a virtual address v as an ordered pair:
      v = ( b, d )
    where b = block number, d = displacement ( offset ) within the block

    In paging, the user's view of memory is separated from the actual physical memory.

    The user's view of memory is a collection of variable-sized segments, e.g. stack, data, symbol table, main program, functions, ...

    Segmentation is a memory-management scheme that supports this user view of memory; a logical address space is a collection of segments:

    • A logical address consists of a segment number s and an offset d.
    • The segment number s is used as an index into the segment table.
    • Each entry in the segment table has a segment base, which gives the starting physical address of the segment, and a segment limit, which gives the segment's length. Therefore the offset d must be between 0 and the limit.

    Implementation of segment tables:

    • The segment table can be kept either in fast registers or in main memory.
    • kept in registers: can be referenced very quickly
    • kept in main memory: needed when a program has a large number of segments; two registers then locate and bound the table:
    • the Segment Table Base Register ( STBR ) points to the segment table's location in memory
    • the Segment Table Length Register ( STLR ) indicates the number of segments used by the program:
        a segment number s is legal if s < STLR
    ( a translation sketch follows )
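
    A minimal sketch of segment translation in C ( the base/limit values are illustrative, not from these notes ):

      #include <stdio.h>
      #include <stdint.h>
      #include <stdlib.h>

      struct segment { uint32_t base; uint32_t limit; };

      /* Hypothetical segment table: base and limit values are illustrative. */
      struct segment seg_table[] = {
          {1400, 1000},   /* segment 0 */
          {6300,  400},   /* segment 1 */
          {4300, 1100},   /* segment 2 */
      };
      const uint32_t STLR = 3;   /* number of segments in use */

      uint32_t translate(uint32_t s, uint32_t d) {
          if (s >= STLR)              { fprintf(stderr, "invalid segment %u\n", s); exit(1); }
          if (d >= seg_table[s].limit){ fprintf(stderr, "offset %u out of range\n", d); exit(1); }
          return seg_table[s].base + d;    /* physical address = base + offset */
      }

      int main(void) {
          printf("( 2, 53 ) -> %u\n", translate(2, 53));   /* 4300 + 53 = 4353 */
          return 0;
      }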

  14. Page Replacement Algorithms

    • Optimal Page Replacement

      replace the page that will not be used for the longest period of time

      guarantees the lowest page-fault rate but is almost impossible to implement ( it requires knowing future references )

      used mainly as a benchmark for comparing other algorithms

    • Not-Recently-Used ( NRU ) Page Replacement

      associate 2 bits with each page:

      R -- referenced bit
      M -- modified bit ( dirty bit )

      R is set whenever the page is referenced ( read or written )

      M is set when the page is written

      periodically ( e.g. on each clock interrupt ) the OS clears all the R bits; this is how a modified page can end up with R = 0

      class    meaning                            R   M
      0        not referenced, not modified       0   0
      1        not referenced, modified           0   1
      2        referenced, not modified           1   0
      3        referenced, modified               1   1

      NRU removes a page at random from the lowest-numbered nonempty class ( a sketch follows )

      easy to understand

      efficient to implement
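
      A sketch of the NRU choice in C ( the R and M bit values are illustrative ):

      #include <stdio.h>
      #include <stdlib.h>

      #define NPAGES 8

      int R[NPAGES] = {1,0,0,1,0,1,0,0};   /* referenced bits (illustrative) */
      int M[NPAGES] = {1,0,1,0,0,1,0,1};   /* modified bits   (illustrative) */

      /* NRU: pick a random page from the lowest nonempty class (class = 2R + M). */
      int nru_victim(void) {
          int candidates[NPAGES], n = 0;
          for (int c = 0; c < 4 && n == 0; c++)     /* stop at first nonempty class */
              for (int p = 0; p < NPAGES; p++)
                  if (2 * R[p] + M[p] == c) candidates[n++] = p;
          return candidates[rand() % n];
      }

      int main(void) {
          printf("evict page %d\n", nru_victim());  /* one of pages 1, 4, 6 (class 0) */
          return 0;
      }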

    • First-In First-Out ( FIFO ) Replacement

      replace the oldest page

      Belady's anomaly: adding page frames can increase the number of faults. Consider the reference string 0 1 2 3 0 1 4 0 1 2 3 4 with 5 distinct pages ( simulated in code below ):

      3 page frames ( youngest → oldest ):

        reference   frames     fault?
        0           0 - -      F
        1           1 0 -      F
        2           2 1 0      F
        3           3 2 1      F
        0           0 3 2      F
        1           1 0 3      F
        4           4 1 0      F
        0           4 1 0      -
        1           4 1 0      -
        2           2 4 1      F
        3           3 2 4      F
        4           3 2 4      -

        9 faults

      4 page frames ( youngest → oldest ):

        reference   frames     fault?
        0           0 - - -    F
        1           1 0 - -    F
        2           2 1 0 -    F
        3           3 2 1 0    F
        0           3 2 1 0    -
        1           3 2 1 0    -
        4           4 3 2 1    F
        0           0 4 3 2    F
        1           1 0 4 3    F
        2           2 1 0 4    F
        3           3 2 1 0    F
        4           4 3 2 1    F

        10 faults
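
      A small C simulation reproducing these counts ( the reference string is the one tabulated above ):

      #include <stdio.h>

      /* Simulate FIFO replacement; returns the number of page faults. */
      int fifo_faults(const int *refs, int nrefs, int nframes) {
          int frames[16], used = 0, next = 0, faults = 0;
          for (int i = 0; i < nrefs; i++) {
              int hit = 0;
              for (int j = 0; j < used; j++)
                  if (frames[j] == refs[i]) { hit = 1; break; }
              if (hit) continue;
              faults++;
              if (used < nframes) frames[used++] = refs[i];     /* fill an empty frame  */
              else { frames[next] = refs[i];                    /* evict the oldest ... */
                     next = (next + 1) % nframes; }             /* ... circularly       */
          }
          return faults;
      }

      int main(void) {
          int refs[] = {0,1,2,3,0,1,4,0,1,2,3,4};
          int n = sizeof refs / sizeof refs[0];
          printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));   /* 9  */
          printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));   /* 10 */
          return 0;
      }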

    • Second-Chance Replacement -- a simple modification of FIFO

      uses the reference bit R:
      inspect the oldest page;
      if R = 0, the page is replaced immediately;
      if R = 1, clear R ( R → 0 ), put the page at the end of the queue, and inspect the next oldest page

    • Least-Recently-Used ( LRU ) Replacement

      select the page that has not been used for the longest time
      expensive in software: needs a linked list of all pages in memory, updated on every reference
      A hardware implementation ( sketched in code below ):

    • n page frames
    • hardware maintains an n x n matrix of bits, all initialized to 0
    • when page frame k is referenced, set all bits of row k to 1, then set all bits of column k to 0
    • the row with the lowest binary value corresponds to the least recently used frame


      Example: pages accessed in order 0 1 2 3 2 1 0 3 2 3; afterwards, row 1 has the lowest value, so frame 1 is the least recently used.
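
      A C sketch of the matrix method for this sequence ( 4 frames; rows and columns are frame numbers ):

      #include <stdio.h>

      #define N 4                      /* number of page frames */

      unsigned matrix[N][N];           /* bit matrix, all 0 initially */

      /* On a reference to frame k: set row k to all 1s, then clear column k. */
      void reference(int k) {
          for (int j = 0; j < N; j++) matrix[k][j] = 1;
          for (int i = 0; i < N; i++) matrix[i][k] = 0;
      }

      /* The frame whose row has the lowest binary value is the LRU frame. */
      int lru_frame(void) {
          int best = 0; unsigned bestval = ~0u;
          for (int i = 0; i < N; i++) {
              unsigned val = 0;
              for (int j = 0; j < N; j++) val = (val << 1) | matrix[i][j];
              if (val < bestval) { bestval = val; best = i; }
          }
          return best;
      }

      int main(void) {
          int refs[] = {0,1,2,3,2,1,0,3,2,3};
          for (int i = 0; i < 10; i++) reference(refs[i]);
          printf("LRU frame: %d\n", lru_frame());   /* frame 1: referenced longest ago */
          return 0;
      }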

    • clock algorithm -- an efficient way to approximate LRU

         uses a circular queue of page frames and the reference bits ( a code sketch follows the tables )
      
         current   |  new value
         ref. bit  |  ref. bit       action
        -----------+------------------------------------------------------
                   |
             0     |      0       replace this page, move pointer forward
                   |
             1     |      0       skip this page for now, continue search
      
         
      two-handed clock uses both reference and modified bits
      
          current   |  new value
         ref. mod.  |  ref. mod.       actions
        ------------+------------------------------------------------------
                    |
          0    0    |   0    0      replace this page (when any scheduled
                    |                  write back on it is completed), move
                    |                  pointer forward
                    |
          0    1    |   0    0      clean the page (i.e., write back), skip
                    |                  this page for now, continue search
                    |
          1    0    |   0    0      skip this page for now, continue search
                    |
          1    1    |   0    1      skip this page for now, continue search
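
      A sketch of the one-handed clock in C ( the resident pages and reference bits are illustrative ):

      #include <stdio.h>

      #define NFRAMES 4

      int page_in[NFRAMES] = {3, 7, 5, 2};   /* pages resident (illustrative) */
      int ref_bit[NFRAMES] = {1, 0, 1, 1};   /* reference bits */
      int hand = 0;                           /* clock hand */

      /* Sweep until a frame with R = 0 is found, clearing R bits along the
         way (second chance); that frame is the victim. */
      int clock_victim(void) {
          for (;;) {
              if (ref_bit[hand] == 0) {
                  int victim = hand;
                  hand = (hand + 1) % NFRAMES;
                  return victim;
              }
              ref_bit[hand] = 0;              /* give the page a second chance */
              hand = (hand + 1) % NFRAMES;
          }
      }

      int main(void) {
          int v = clock_victim();
          printf("replace frame %d ( page %d )\n", v, page_in[v]);  /* frame 1, page 7 */
          return 0;
      }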
         
  10. Design Issues for Paging Systems

  11. When to Load a Page

    Demand paging ( lazy loading ): the OS loads a page the first time it is referenced.
    • a page fault occurs when the page is first touched
    • the OS may remove a page from memory to release space for the new page
    • the process gives up the CPU while the page is being loaded

    Prepaging: the OS guesses which pages the process will need and pre-loads them into memory ( see the working set model below ).
    • if the OS guesses wrong, page faults still occur
    • it is difficult to guess right because of branches in the code
  12. The Working Set Model

    • temporal ( time ) locality -- storage locations referenced recently are likely to be referenced in the near future, e.g.
    • stacks
    • subroutines
    • loops
    • spatial locality -- storage references tend to be clustered, so that once a location is referenced it is highly likely that nearby locations will be referenced, e.g.
    • array traversals
    • sequential code execution

    Working set -- the set of pages that a process is currently using

    Denning's definition:

    W( Δ , t ) -- the set of pages referenced during the most recent Δ memory references, evaluated at time t

    Δ -- the working set window

    A process is loaded into RAM only if all of the pages that it is currently using ( often approximated by the most recently used pages ) can be in RAM.
    If a process needs more pages to run and there is no room in RAM, the process is swapped out of memory to free memory for other processes to use.

    A process starts with no pages in memory and incurs page faults until its working set is in memory. ( In pure swapping, all memory pages of a process are moved from secondary storage to main memory at once. )

    If too many processes are running, page faults may occur every few instructions; this happens when the total size of all the working sets exceeds the number of available page frames → thrashing

    To avoid thrashing, we can keep track of each process's working set and make sure it is in memory before letting the process run ( prepaging ).
    A process is never executed unless its working set is resident in main memory; pages outside the working set may be discarded at any time.

    Use the recent needs of a process to predict its future needs ( a sketch of computing W( Δ , t ) follows ).

    aging bit -- if a page is not referenced for a certain amount of time, it is dropped from the working set
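
    A minimal sketch of computing W( Δ , t ) over a recorded reference string in C ( the reference string and window are illustrative ):

      #include <stdio.h>

      #define NPAGES 8

      int in_working_set[NPAGES];

      /* W(delta, t): the pages referenced in the last delta references up to time t. */
      void working_set(const int *refs, int t, int delta) {
          for (int p = 0; p < NPAGES; p++) in_working_set[p] = 0;
          int start = t - delta + 1;
          if (start < 0) start = 0;
          for (int i = start; i <= t; i++) in_working_set[refs[i]] = 1;
      }

      int main(void) {
          int refs[] = {1,2,1,3,4,2,2,2,1,5};
          working_set(refs, 9, 4);          /* window of the last 4 references at t = 9 */
          printf("W( 4, 9 ) = {");
          for (int p = 0; p < NPAGES; p++)
              if (in_working_set[p]) printf(" %d", p);
          printf(" }\n");                   /* pages 1, 2, 5 */
          return 0;
      }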

  13. Local versus Global Allocation

    local -- each process is allocated a fixed amount of memory

    global -- page frames are dynamically allocated among the runnable processes

    in general, global allocation works better

    One way to manage global allocation is the PFF ( Page Fault Frequency ) algorithm, which tells us when to increase or decrease a process's page allocation ( a sketch follows ):

  14. Page Size

    on average, half of a process's final page is wasted ( internal fragmentation )

    if a process has n segments and the page size is p bytes, then

      n x p / 2   bytes are wasted

    this may suggest that a smaller page size is better

    but small pages => a large page table

    suppose

      each page table entry requires e bytes

      page size = p bytes

      wasted memory in each process's final page = p / 2 on average

      average process size = s bytes

      number of pages needed per process = s / p

      total page table space = s x e / p

      total overhead:

        H = ( s x e ) / p  +  p / 2

      To minimize, set the derivative with respect to p to zero:

        dH / dp = - ( s x e ) / p^2  +  1 / 2  =  0

        =>  p = ( 2se )^(1/2)

      e.g. if s = 32K and e = 8 bytes, p ~ ( 2 x 32K x 8 )^(1/2) ~ 724 ( bytes )

      practical page sizes: 512 bytes, 1K, 2K, 4K
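
      Checking the arithmetic in C:

      #include <stdio.h>
      #include <math.h>

      int main(void) {
          double s = 32 * 1024;   /* average process size in bytes */
          double e = 8;           /* bytes per page table entry */
          printf("optimal page size ~ %.0f bytes\n", sqrt(2 * s * e));  /* ~724 */
          return 0;
      }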

  24. Class Exercises

    1. The low cost of main memory coupled with the increase in memory capacity in most systems has obviated the need for memory management strategies. True or false?
        False. Despite the low cost and high capacity of main memory, there continue to be environments that consume all available memory. Also, memory management strategies should be applied to cache, which consists of more expensive, low-capacity memory. In either case, when memory becomes full, a system must implement memory management strategies to obtain the best possible use of memory.
    2. Why is first-fit an appealing strategy?
        Because it does not require that the free memory list be sorted, so it incurs little overhead. However, it may operate slowly if the holes that are too small to hold the incoming job are at the front of the free memory list.
    3. The number of faults for a particular process always decreases as the number of page frames allocated to a process increases. True or false?
        False. This is indeed the normal behavior, but if the algorithm is subject to Belady's Anomaly, the number of faults might increase as frames are added.
    4. Does looping through an array exhibit both spatial and temporal locality? Why?
        Yes. It exhibits spatial locality because the elements of an array are contiguous in virtual memory. It exhibits temporal locality because the elements are generally much smaller than a page. Therefore references to two consecutive elements usually result in the same page being referenced twice within a short period of time.
    5. LRU is designed to benefit processes that exhibit spatial locality. True or false?
        False. LRU benefits processes that exhibit temporal locality.
    6. Suppose a block mapping system represents a virtual address v = ( b, d ) using 32 bits.
      If d is n bits, how many blocks does the virtual address space contain?
      Discuss how setting n = 6, n = 12, and n = 24 affects memory fragmentation and the overhead
      incurred by mapping information.
        2^(32-n) blocks. If n = 6, the block size would be small and there would be little internal fragmentation,
        but the number of blocks ( 2^26 ) would be so large as to make implementation infeasible.
        If n = 24, there would be quite a lot of internal fragmentation, but the block mapping table would not consume much memory.
        n = 12 strikes a balance; page size 2^12 = 4096 bytes has been a popular choice.