Linux-Kongreß '97

Linux Virtual Memory

This session will talk about the Linux Virtual Memory (VM) subsystem. In particular, we will look at how things have changed since the 1.2 kernels: the new, streamlined VM system in 2.0 offers a cleaner design with substantially better performance and functionality than the old one.

First of all, we will look at the functionality managed by the VM system. The principle of VM will be familiar to many readers: the operating system gives each process its own private view of memory, so that memory owned by one process is invisible to another. Furthermore, the memory is virtual: a page owned by one process need not correspond to exactly one page of physical memory. Unused process memory may be evicted to disk, and several processes may see a single physical page at once. Finally, the VM system supports memory mapping: a file on disk may be mapped into memory so that a process can access and modify the file simply by reading and writing a range of memory.

The basic structure by which this is managed has changed significantly since the 1.2 kernels. On 1.2, the filesystem and VM systems were entirely separate components of the kernel. Caching within the filesystem was handled by a dedicated buffer cache, and the VM system kept its pages in separate areas of memory. The only concession to page sharing was that a properly aligned set of buffers could be mapped, read-only, into a process's address space, providing a restricted (but still very useful) form of memory mapping. The buffer cache and VM systems had their own separate interfaces into the block device IO system, with the buffer cache tied very closely to block devices.

In the 1.3.50 kernels, a new structure appeared: the page cache. As its name suggests, the page cache stores entire pages of file data at once. Unlike the buffer cache, however, it is not limited to caching block device contents. It offers a number of advantages over the simpler, single-cache system of the 1.2 kernels:

We can cache non-block-device filesystems, such as NFS
All file data is automatically stored page-aligned, so memory mapping is much easier
Read/write memory mapping is supported
Data lookup is much faster

We will look at how this new structure fits into the kernel. In particular, the old dedicated method of reading and writing pages to and from disk is gone: a single scheme of buffer IO handles all requests for block devices, with temporary buffer descriptors created on the fly when performing IO from the page cache. The old buffer cache is still preserved, however, both to handle delayed (write-back) disk writes and to cache filesystem metadata such as directory and inode blocks. Finally, we will compare the performance of the 1.2 and 2.0 kernels. The page cache is not the only major improvement to VM performance in Linux 2.0, and we will look at some of the other changes to swapping and paging performance, some already implemented in 2.0 and some planned for future kernels.