Linux VM System
Transcription
Why?
• OS theory is all fine and good, but the real world is messier.
– The Linux VM is really complicated, so we'll just get a glimpse of the structure.
• OS research and innovation is still important.
– The Linux VM has been replaced at least 3 times in the last few versions. (I've lost track.)
• Caveat: I am glossing over many details, and may get some things wrong.
• A problem with Open Source is that documentation is thin or lags behind development.

You have seen this before…
[Figure: the 32-bit virtual address space, from 0xFFFFFFFF down to 0x00000000]
• Kernel: mapped into the upper 1 GB of every address space (0xC0000000 up to 0xFFFFFFFF).
• Stack: sits just below the kernel and grows down.
• Heap: dynamically allocated; grows up, with pages mapped into the virtual address space on demand.
• BSS segment, data segment, and text segment at the bottom, above 0x00000000.
(A small user-space program that prints where each segment lands is sketched below, after the Page Fault Handling slide.)

Physical Memory Layout
[Figure: physical memory from address 0 up to the size of RAM]
• 0 to 1 MB: BIOS comm area, used by some devices.
• 1 MB to 8 MB: the kernel code image, the kernel binary itself.
• 8 MB to 16 MB: DMA region, reserved for DMA.
• 16 MB and up: mem_map, the physical memory map, followed by dynamically allocated memory up to the size of RAM.
• The kernel address space is 1 GB, so any physical memory above 1 GB is mapped into the virtual address space on demand.

VM Structures Overview
[Figure: task_struct points to mm_struct, which heads a list of vm_area_structs; mem_map is an array of page structs. Summarized here and sketched in C below.]
• task_struct: PID; state (runnable, etc.); user id, grp id, etc.; sched priority; mm_struct *mm, the address space.
• mm_struct: the virtual mem areas (a list of "Segment Info" vm_area_structs); the top-level page table; a refcount (# of threads); the start of the code section, start of the heap, etc.
• vm_area_struct: start and end vaddr; vm_file, the file struct corresponding to the VMA (NULL if an anonymous region); flags (readonly, etc.); vm_ops with open(), close(), and nopage(), which is called on page fault; vm_next, the next VMA in the list.
• page struct: ref count (# of processes using the page); index in the addr space (mappings for mmapped files); flags (dirty, locked, etc.); lru, the head of the LRU list this page is on.

vm_area_struct
• One vm_area_struct per segment in the address space
– The list of VMAs comprises the entire address space
– VMAs cannot overlap

Physical Memory Map
• Linux maintains a global array, mem_map, consisting of one entry for each physical page in the system.

• Linux supports three-level page tables in software
– These map onto two-level page tables on x86.
[Figure: a linear address is split into PGD, PMD, and PTE indices plus a page offset; mm_struct->pgd locates the top-level table, and the selected PTE yields a PFN that is combined with the offset to form the physical address. A toy walk in C follows below.]

Page Fault Handling
[Flowchart, rendered as steps; a C sketch of this flow follows below.]
• Hardware page fault trap.
• Is the faulting address in a valid VMA? If not: SEGV.
• If so, allocate page table entries as needed.
• Page present? If yes and this is a write access: copy on write.
• Page not present but on disk: read the page from disk.
• Page not present and not on disk: allocate and zero a new page.
• Mark the page as referenced and return (OK).
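The address-space picture is easy to poke at from user space. The short C program below is a sketch of that: it prints the address of one symbol from each segment. Exact values vary with platform and ASLR, but on 32-bit Linux the ordering matches the figure, with the text segment lowest and the stack far above the heap.

/* layout.c -- print a representative address from each segment.
 * A user-space sketch of the "You have seen this before..." figure;
 * exact addresses vary with platform and ASLR. */
#include <stdio.h>
#include <stdlib.h>

int initialized = 42;   /* data segment */
int uninitialized;      /* BSS segment  */

int main(void) {
    int local;                          /* stack */
    int *heap = malloc(sizeof *heap);   /* heap  */

    printf("text  (main)          : %p\n", (void *)main);
    printf("data  (initialized)   : %p\n", (void *)&initialized);
    printf("bss   (uninitialized) : %p\n", (void *)&uninitialized);
    printf("heap  (malloc)        : %p\n", (void *)heap);
    printf("stack (local)         : %p\n", (void *)&local);

    free(heap);
    return 0;
}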
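To make the VM Structures Overview concrete, here is a much-simplified C rendering of the four structures and how they point at one another. The field names are abbreviated and the types are toy stand-ins; this mirrors the shape of the diagram, not the real kernel headers.

/* vmstructs.c -- toy versions of the structures in the overview figure. */
struct file;                     /* file backing an mmapped region */
typedef unsigned long pgd_t;     /* stand-in for a top-level page-table entry */

struct page {                    /* one mem_map entry per physical page */
    int           count;         /* ref count: # of processes using the page */
    unsigned long index;         /* index in the addr space (mmapped files)  */
    unsigned long flags;         /* dirty, locked, etc.                      */
    struct page  *lru;           /* the LRU list this page is on             */
};

extern struct page mem_map[];    /* global array, one entry per physical page */

struct vm_area_struct;           /* forward declaration for vm_ops */

struct vm_operations {
    void (*open)(struct vm_area_struct *vma);
    void (*close)(struct vm_area_struct *vma);
    struct page *(*nopage)(struct vm_area_struct *vma,
                           unsigned long addr);   /* called on page fault */
};

struct vm_area_struct {          /* one per segment ("Segment Info") */
    unsigned long          vm_start, vm_end;   /* start and end vaddr */
    unsigned long          vm_flags;           /* readonly, etc.      */
    struct file           *vm_file;   /* NULL if anonymous region     */
    struct vm_operations  *vm_ops;
    struct vm_area_struct *vm_next;   /* the VMA list covers the whole
                                         address space; no overlaps   */
};

struct mm_struct {               /* one per address space */
    struct vm_area_struct *mmap;  /* list of virtual mem areas */
    pgd_t                 *pgd;   /* top-level page table      */
    int                    count; /* refcount (# of threads)   */
    unsigned long          start_code, start_brk;  /* code, heap, ... */
};

struct task_struct {             /* one per process */
    int               pid;
    long              state;     /* runnable, etc. */
    struct mm_struct *mm;        /* user/group ids, sched priority omitted */
};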
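The three-level walk can be written in a few lines. The 4/8/8/12 bit split and the entry encoding below are invented for illustration (they are not Linux's PGD/PMD/PTE macros); the point is the shape of the walk from mm_struct->pgd down to a PFN plus offset.

/* pagewalk.c -- a toy three-level translation, PGD -> PMD -> PTE. */
#include <stdint.h>

#define PAGE_SHIFT 12            /* 4 KB pages: offset is bits 0-11   */
#define PTE_SHIFT  12            /* PTE index:  bits 12-19 (8 bits)   */
#define PMD_SHIFT  20            /* PMD index:  bits 20-27 (8 bits)   */
#define PGD_SHIFT  28            /* PGD index:  bits 28-31 (4 bits)   */

typedef uintptr_t entry_t;       /* toy: a next-table pointer, or a PFN */

/* Returns the physical address for vaddr, or 0 when a level is
 * missing -- the case where the real kernel takes the fault path. */
uintptr_t translate(entry_t *pgd, uintptr_t vaddr)
{
    entry_t *pmd = (entry_t *)pgd[(vaddr >> PGD_SHIFT) & 0xf];
    if (!pmd)
        return 0;
    entry_t *pte = (entry_t *)pmd[(vaddr >> PMD_SHIFT) & 0xff];
    if (!pte)
        return 0;
    entry_t pfn = pte[(vaddr >> PTE_SHIFT) & 0xff];
    if (!pfn)
        return 0;
    return (pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}

On x86 the same code structure survives, but the PMD level is defined as a single entry, which is how three software levels collapse onto the hardware's two.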
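The fault-handling flowchart translates almost directly into control flow. In the sketch below every type and helper is a toy stand-in rather than a kernel API; only the decision structure (valid VMA? present? write? on disk?) is taken from the slide.

/* fault.c -- the page-fault flowchart rendered as C control flow. */
#include <stdbool.h>

#define PTE_PRESENT 0x1
#define PTE_ONDISK  0x2

struct vma { unsigned long start, end; };
typedef unsigned long pte_t;

/* Toy helpers standing in for the real work at each flowchart box. */
static void copy_on_write(pte_t *pte)     { *pte |= PTE_PRESENT; }
static void read_from_disk(pte_t *pte)    { *pte |= PTE_PRESENT; }
static void alloc_zeroed_page(pte_t *pte) { *pte |= PTE_PRESENT; }
static void mark_referenced(pte_t *pte)   { (void)pte; }

enum result { OK, SEGV };

enum result handle_fault(struct vma *vma, pte_t *pte,
                         unsigned long addr, bool write)
{
    /* Valid VMA?  No: segmentation violation. */
    if (!vma || addr < vma->start || addr >= vma->end)
        return SEGV;

    /* (Page-table entries for addr would be allocated here.) */

    if (*pte & PTE_PRESENT) {
        /* Present but faulting: a write to a read-only PTE inside a
         * writable VMA means copy on write. */
        if (write)
            copy_on_write(pte);
    } else if (*pte & PTE_ONDISK) {
        read_from_disk(pte);      /* file page or swap space */
    } else {
        alloc_zeroed_page(pte);   /* first touch: allocate and zero */
    }

    mark_referenced(pte);         /* feeds the replacement algorithm */
    return OK;
}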
Reclaiming Page Frames
• kswapd: the kernel pageout daemon
– A thread that executes in the kernel to free up physical pages.
• Each physical memory "zone" maintains a count of free pages.
– Zones correspond to DMA memory, "main" memory (up to 1 GB), and "high" memory (above 1 GB).
• The kernel maintains a "page cache"
– A set of physical pages corresponding to blocks on disk, either files or swap space.
– Before doing any disk I/O, always check the page cache for the page.
• The page cache keeps two lists of pages: "active" and "inactive"
– Active pages have been used recently.
– Inactive pages have not been used recently and may be swapped out.

Shrinking the page cache
• To reclaim physical memory, first try to shrink the page cache
– kswapd has a target for the number of pages to free.
– If a page has its reference bit set to 0, move it to the "inactive" list.
– Otherwise, move it to the front of the "active" list.
• This is essentially the Clock algorithm. (A toy version in C is sketched below.)
• Next step: decide which pages on the inactive list to swap out
– Tries to minimize disk I/O.
– If not enough pages were freed, try again with a higher target.
– If enough pages were freed, lower the target.

Shrinking the page cache
• Other kernel caches are shrunk in a similar manner
– e.g., special caches for file pages, the kernel VM allocator, and others.
• Only if we cannot free enough memory from these caches is process memory freed
– Scan over process page tables and try to free inactive pages.
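A toy version of the Clock-style pass over the active list might look like the following. The list handling and names are invented for illustration: pages are examined from the tail (the oldest end); a page whose reference bit is clear moves to the inactive list, while a referenced page has the bit cleared and goes back to the front of the active list for a second chance.

/* shrink.c -- a toy refill pass from the active to the inactive list. */
#include <stdbool.h>
#include <stddef.h>

struct page {
    bool         referenced;   /* the "reference bit" */
    struct page *next;
};

struct list { struct page *head; };

static void push_front(struct list *l, struct page *p)
{
    p->next = l->head;
    l->head = p;
}

/* O(n) tail removal; fine for a sketch. */
static struct page *pop_back(struct list *l)
{
    struct page *p = l->head, *prev = NULL;
    if (!p)
        return NULL;
    while (p->next) {
        prev = p;
        p = p->next;
    }
    if (prev)
        prev->next = NULL;
    else
        l->head = NULL;
    return p;
}

/* Examine up to nscan of the oldest active pages; kswapd would raise
 * nscan and run again if not enough pages ended up freeable. */
void refill_inactive(struct list *active, struct list *inactive, int nscan)
{
    while (nscan-- > 0) {
        struct page *p = pop_back(active);
        if (!p)
            break;
        if (p->referenced) {
            p->referenced = false;     /* second chance */
            push_front(active, p);
        } else {
            push_front(inactive, p);   /* candidate for swap-out */
        }
    }
}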