Learn how NT keeps tabs on memory
Last month, I began this two-part series about memory management in Windows NT by introducing the concept of virtual memory. I reviewed the way x86 and Alpha processors use a two-level virtual address-to-physical address translation scheme. I discussed paging and introduced two powerful features of the Memory Manager: memory-mapped files and copy-on-write memory.
This month, I'll go into more detail about the internal data structures the Memory Manager uses to track the state of memory. I'll discuss working sets and the Page Frame Number (PFN) Database. I'll wrap up with a tour inside the additional data structures the Memory Manager uses to track memory shared by two or more applications, and I'll discuss Section Objects, the data structures the PFN Database uses to implement memory-mapped files.
Working Sets
The biggest effect the Memory Manager has on individual applications'
performance and on the system is in its allocation of physical memory to each active process. The amount of memory the Memory Manager assigns to a process is called the working set. Every process has a working set. A special working set, called the system working set, is physical memory that belongs to parts of the NT Executive, device drivers, and the Cache Manager. If a process' working set is too small, the process will incur a high number of page faults as it accesses page table entries that describe data not present in memory but located either in the process' executable image on disk or in a paging file. For each such access, the Memory Manager must intervene and perform disk I/O to retrieve the data. If a process' working set is too large, the process will not incur page faults, but its physical memory might be holding data that the process will not access for some time, and that other processes might require. Thus, the Memory Manager must try to achieve a balance for all processes according to their memory usage patterns.
The choice NT makes for two memory-management policies--the page fetch policy and the page replacement policy--affects the way NT manages working sets. The page fetch policy is the method NT uses to bring in a process' data. NT uses the most common page fetch policy, demand paging. In demand paging, a process' data is paged-in as a process accesses it. A different page fetch policy, prefetching, requires the Memory Manager to aggressively bring in a process' data before the process asks for it. Prefetching can improve the performance of an application at the expense of other applications, because the Memory Manager's prediction of what data a process will ask for is likely to be imperfect and can waste physical memory in storing unused data. NT therefore implements a slight variation on demand paging: clustered demand paging. Instead of bringing into a process' working set only the page that a process accesses, the Memory Manager will attempt to bring in pages that surround the requested page. The idea behind clustered demand paging is that if a process accesses a page, it will likely also access the memory just preceding or following that page. The pages the Memory Manager thus brings in are known as a cluster. In NT, clusters can vary between 0 pages and 7 pages, depending on the amount of physical memory present on the system and whether the system is accessing code or data.
The page replacement policy affects the way the Memory Manager decides
which page to remove from a working set. The Memory Manager must remove pages
whenever physical memory is full of in-use pages and space is necessary to
accommodate a page of data a process is accessing. Two characterizations for
replacement policies are global and local. In a global
replacement policy, the Memory Manager considers all pages of physical memory as
replacement candidates, without regard to which working set the pages belong to.
In a local replacement policy, the Memory Manager considers as replacement
candidates only pages in the working set of the process that is accessing the
page to be brought in. The disadvantage of global policy is that a rogue process
can, by accessing large amounts of memory very quickly, adversely affect all
other processes in the system by forcing their data out of memory. As with its
page fetch policy, NT implements a variation of the replacement policy. This
variation is local, because it first considers the pages in the working set of
the process that is accessing the data. But the variation is also global in that
it removes pages from the working sets of other processes if their memory
requirements are low.
Replacement policies are further characterized by the method with which
they choose a page to replace out of the set of local or global candidates. One
policy, first in, first out (FIFO), removes pages in the order in which
they were added to the working set (i.e., the first page added is the first page
removed). An algorithm that has a more positive effect on the performance of
applications is least recently used (LRU). The LRU algorithm requires
the aid of the MMU to give the Memory Manager an idea of how recently a process
accessed a particular page. LRU replaces first those pages that processes have
not accessed for the longest period of time. On uniprocessors, NT uses the clock
algorithm, a simple form of LRU replacement that is based on the limited access
tracking x86 and Alpha MMUs provide. Whenever a process references a page, these
MMUs set the Accessed flag bit in the page table entries (PTEs) of the page. If
a page's Accessed flag is not set, the Memory Manager knows that no processes
have accessed the page since the last time the Memory Manager examined the
page's PTE.
The working set of every process, including the system working set, has a
minimum size, a current size, and a maximum size. On a typical NT system with
64MB of physical memory, the default minimum size of a working set is 200KB, and
the default maximum size is 1.4MB. Processes with the PROCESS_SET_QUOTA
privilege can override these values using a Win32 API. When a process starts, it
generates a large number of page faults as it references data in its virtual
memory map (most often in its executable image) that it must bring into physical
memory. The Memory Manager lets a process' current working set size grow as
needed to its working set maximum. When a working set reaches its maximum size,
the Memory Manager will allow additional pages into the working set only if
plenty of free (unused) pages are available. If free pages are not available,
the Memory Manager uses the replacement policies I just described to determine
which page in the working set to replace. The pages in the working set are
linked in a list that the Memory Manager scans. On a uniprocessor, if the Memory
Manager finds a page with its Accessed flag set, the Memory Manager clears the
flag and proceeds to the following pages, selecting for replacement the next
page it finds with a cleared Accessed flag. Thus, the Memory Manager avoids
pages that the process has accessed since the previous scan, because the Memory
Manager assumes that the process will access those pages again. If the Memory
Manager finds no pages to select for replacement, it restarts the scan and
almost certainly finds a page whose Accessed flag it cleared on the previous
pass and the process has not accessed since the first scan.
On a multiprocessor system, the Memory Manager does not clear Accessed
flags during scans, because any modification to a PTE on a multiprocessor system
invalidates Translation Look-aside Buffer (TLB) entries (which I described last
month) on all processors. If the Memory Manager invalidates an address' cached
translation by clearing its Accessed flag, the address must again undergo the
slow three-step virtual memory translation process the next time a process
references it. The result is an effectively random page-replacement algorithm on
multiprocessors. Microsoft is developing more effective replacement algorithms
for the multiprocessor version of NT 5.0.
The Balance Set Manager
The Memory Manager adjusts the sizes of working sets once every second, in response to page-in operations or when free memory drops below a certain
threshold. The Memory Manager also examines all working sets to determine
whether the Balance Set Manager thread needs to trim them. If free memory is
plentiful, the Balance Set Manager removes pages from only the working sets of processes whose current size is above minimum that have not recently incurred many page faults. However, if the Memory Manager awakens the Balance Set Manager thread because free memory is scarce, the Balance Set Manager can trim pages from any process until it creates an adequate number of free pages--even to the point of dropping working sets below their minimum size.