43. Virtual mapping

43.1. Introduction

.intro: This is the design of the virtual mapping module.

.readership: Any MPS developer; anyone porting the MPS to a new platform.

.overview: The virtual mapping module provides a simple, portable, low-level interface to address space, with functions for reserving, releasing, mapping and unmapping ranges of addresses.

.motivation: The virtual mapping module is heavily used by the VM Arena Class (see design.mps.arena.vm).

43.2. Requirements

.req.granularity: The virtual mapping module must report the granularity with which address space can be managed. (This is necessary for the arena to be able to portably determine its grain size; see design.mps.arena.def.grain.)

.req.reserve: The reserve operation must reserve a chunk of address space.

.req.reserve.exclusive: The MPS should have exclusive use of the reserved chunk. (None of our supported operating systems can actually provide this feature, alas. We rely on co-operation with the client program.)

.req.reserve.contiguous: The reserved chunk is a contiguous portion of address space. (Contiguity is needed for zones to work; see design.mps.arena.vm.overview.gc.zone.)

.req.reserve.size: The reserved chunk is at least a specified size. (This is necessary for zones to work.)

.req.reserve.align: The reserved chunk is aligned to a specified alignment. (This is necessary for the arena to be able to manage address space in terms of grains.)

.req.reserve.overhead: The reserved chunk is not much larger than specified, preferably with no more than a grain of overhead. (This is necessary in order to allow the client program to specify the amount of address space the MPS uses, so that it can co-operate with other subsystems that use address space.)

.req.reserve.address.not: There is no requirement to be able to reserve address space at a particular address. (The zone implementation uses bits from the middle of the address, so can cope wherever the portion is placed in the address space.)

.req.reserve.map.not: The reserve operation should not map the chunk into main memory or swap space. (The zone strategy is most efficient if address space is used sparsely, but main memory is a limited resource.)

.req.release: The release operation should release a previously reserved chunk of address space so that it may be used by other subsystems of the client program. (This is needed to support client programs on systems where address space is tight, and the client’s subsystems need to co-operate in their use of address space.)

.req.reserved: The virtual mapping module must report the total amount of reserved memory in each chunk of address space. (This is needed to implement mps_arena_reserved().)

.req.map: The map operation must arrange for a (previously reserved) range of address space to be mapped into main memory or swap space, so that addresses in the range can be read and written.

.req.unmap: The unmap operation should arrange for a previously mapped range of address space to no longer be mapped into main memory or swap space. (This is needed to support client programs on systems where main memory is scarce, and the client’s subsystems need to co-operate in their use of main memory.)

.req.mapped: The virtual mapping module must maintain the total amount of mapped memory in each chunk of address space. (This is needed to allow the client program to limit the use of main memory by the MPS via the “commit limit” mechanism.)

.req.bootstrap: The virtual mapping module must be usable without allocating heap memory. (This is necessary for the VM arena to get off the ground.)

.req.params: The interface should make it possible for MPS to allow the client program to modify the behaviour of the virtual mapping implementation. (This is needed to implement the MPS_KEY_VMW3_MEM_TOP_DOWN keyword argument.)

.req.prot.exec: The virtual mapping module should allow mutators to write machine code into memory allocated by the MPS and then execute that code, for example, to implement just-in-time translation, or other forms of dynamic compilation. Compare design.mps.prot.req.prot.exec.

43.3. Design

.sol.overhead: To meet .req.reserve.contiguous, .req.reserve.align and .req.reserve.overhead, most VM implementations ask the operating system for size + grainSize - pageSize bytes of address space. This ensures that wherever the operating system places the reserved address space, it contains a contiguous region of size bytes aligned to a multiple of grainSize. The overhead is thus grainSize - pageSize, and in the common case where grainSize is equal to pageSize, this is zero.
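
The arithmetic in .sol.overhead can be sketched as follows. This is an illustration only, not the code of any VM implementation: the names align_up and reserve_size are hypothetical, and the sketch assumes that grainSize is a power of two and a multiple of pageSize, and that the operating system returns page-aligned addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a power of two. */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Number of bytes to request from the operating system so that a
 * grain-aligned region of `size` bytes fits wherever the (page-aligned)
 * reservation is placed. */
static size_t reserve_size(size_t size, size_t grain_size, size_t page_size)
{
    assert(page_size != 0);
    assert(grain_size >= page_size && grain_size % page_size == 0);
    return size + grain_size - page_size;
}
```

In the worst case the reservation starts one page past a grain boundary, so the aligned base is grainSize - pageSize bytes into the reservation, and the aligned region ends exactly at the end of the reservation.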

.sol.bootstrap: To meet .req.bootstrap, the interface provides the function VMCopy(). This allows the initialization of a VMChunk to proceed in four steps. First, allocate space for a temporary VM descriptor on the stack. Second, call VMInit() to reserve address space and initialize the temporary VM descriptor. Third, call VMMap() on the new VM to map enough memory to store a VMChunk. Fourth, call VMCopy() to copy the temporary VM descriptor into its place in the VMChunk.
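
The four steps in .sol.bootstrap can be sketched with toy stand-ins. Everything named Toy or toy_ below is hypothetical and is not part of the MPS: the toy VM "reserves" address space with malloc() (in the spirit of the generic implementation) and uses a made-up page size, so the sketch shows only the order of operations and not the real VMInit(), VMMap() or VMCopy().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE ((size_t)4096)  /* hypothetical fake page size */

typedef struct ToyVMStruct {
    char *base;     /* start of the "reserved" range */
    char *limit;    /* end of the "reserved" range */
    size_t mapped;  /* bytes currently "mapped" */
} ToyVMStruct, *ToyVM;

typedef struct ToyChunkStruct {
    ToyVMStruct vmStruct;  /* descriptor stored inline in the chunk */
} ToyChunkStruct, *ToyChunk;

/* "Reserve" size bytes and initialize the descriptor; 1 on success. */
static int toy_vm_init(ToyVM vm, size_t size)
{
    char *p = malloc(size);
    if (p == NULL)
        return 0;
    vm->base = p;
    vm->limit = p + size;
    vm->mapped = 0;
    return 1;
}

/* "Map" [base, limit): zero the range and account for it. */
static void toy_vm_map(ToyVM vm, char *base, char *limit)
{
    assert(vm->base <= base && base < limit && limit <= vm->limit);
    memset(base, 0, (size_t)(limit - base));
    vm->mapped += (size_t)(limit - base);
}

static void toy_vm_copy(ToyVM dest, ToyVM src)
{
    *dest = *src;
}

static ToyChunk toy_chunk_create(size_t size)
{
    ToyVMStruct tmp;  /* step 1: temporary descriptor on the stack */
    size_t need = (sizeof(ToyChunkStruct) + TOY_PAGE_SIZE - 1)
                  / TOY_PAGE_SIZE * TOY_PAGE_SIZE;
    ToyChunk chunk;

    if (size < need || !toy_vm_init(&tmp, size))  /* step 2: reserve */
        return NULL;
    toy_vm_map(&tmp, tmp.base, tmp.base + need);  /* step 3: map room for the chunk */
    chunk = (ToyChunk)(void *)tmp.base;
    toy_vm_copy(&chunk->vmStruct, &tmp);          /* step 4: copy descriptor into place */
    return chunk;
}

static void toy_chunk_destroy(ToyChunk chunk)
{
    ToyVMStruct tmp;
    toy_vm_copy(&tmp, &chunk->vmStruct);  /* copy out before freeing the storage */
    free(tmp.base);
}
```

No heap allocation is needed for the descriptor itself: it lives on the stack until there is mapped memory inside the chunk to hold it, and teardown reverses the copy for the same reason.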

.sol.params: To meet .req.params, the interface provides the function VMParamFromArgs(), which decodes relevant keyword arguments into a temporary buffer provided by the caller; this buffer is then passed to VMInit(). The size of the buffer must be statically determinable so that the caller can allocate it on the stack: it is given by the constant VMParamSize. Since this is potentially platform-dependent it is defined in config.h.

.sol.prot.exec: The virtual mapping module maps memory as executable, if this is supported by the platform.

43.4. Interface

typedef VMStruct *VM

.if.vm: VM is a descriptor for a reserved chunk of address space. It points to a VMStruct structure, which is defined in vm.h so that it can be inlined in the VMChunkStruct by the VM arena class.

Size PageSize(void)

.if.page.size: Return the “page size”: that is, the granularity with which the operating system can reserve and map address space.

.if.page.size.cache: On some systems (for example, Windows), determining the page size requires a system call, so for speed the page size is cached in each VM descriptor and should be retrieved by calling the VMPageSize() function.

Res VMParamFromArgs(void *params, size_t paramSize, ArgList args)

.if.param.from.args: Decode the relevant keyword arguments in the args parameter, and store a description of them in the buffer pointed to by params (which is paramSize bytes long). It is an error if the buffer is not big enough to store the parameters for the VM implementation.

Res VMInit(VM vm, Size size, Size grainSize, void *params)

.if.init: Reserve a chunk of address space that contains at least size addresses, starting at an address which is a multiple of grainSize. The params argument points to a parameter block that was initialized by a call to VMParamFromArgs(). If successful, update vm to describe the reserved chunk, and return ResOK. Otherwise, return ResRESOURCE.

void VMFinish(VM vm)

.if.finish: Release the chunk of address space described by vm. Any addresses that were mapped through this VM are now unmapped.

Res VMMap(VM vm, Addr base, Addr limit)

.if.map: Map the range of addresses from base (inclusive) to limit (exclusive) into main memory. It is an error if the range does not lie between VMBase(vm) and VMLimit(vm), or if base and limit are not multiples of VMPageSize(vm). Return ResOK if successful, ResMEMORY otherwise.

void VMUnmap(VM vm, Addr base, Addr limit)

.if.unmap: Unmap the range of addresses from base (inclusive) to limit (exclusive). The conditions are the same as for VMMap().
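
The conditions shared by VMMap() and VMUnmap() amount to a simple range check, sketched below. The function name range_is_mappable is hypothetical, addresses are modelled as uintptr_t rather than the MPS Addr type, and the requirement that the range be non-empty is an assumption of the sketch rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if [base, limit) lies within [vm_base, vm_limit) and both
 * ends are multiples of page_size; 0 otherwise.
 * Assumption: an empty range (base == limit) is rejected. */
static int range_is_mappable(uintptr_t vm_base, uintptr_t vm_limit,
                             size_t page_size,
                             uintptr_t base, uintptr_t limit)
{
    assert(page_size != 0);
    return vm_base <= base && base < limit && limit <= vm_limit
        && base % page_size == 0 && limit % page_size == 0;
}
```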

Addr VMBase(VM vm)

.if.base: Return the base address of the VM (the lowest address in the VM that is a multiple of the grain size).

Addr VMLimit(VM vm)

.if.limit: Return the limit address of the VM (the limit of the last grain that is wholly inside the VM).

Size VMReserved(VM vm)

.if.reserved: Return the amount of address space (in bytes) reserved by the VM. This may include addresses that are not available for mapping because of the requirement for VMBase(vm) and VMLimit(vm) to be multiples of the grain size.
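
The relationship between the base, the limit and the reserved size can be sketched as follows. The SketchVM structure and sketch_vm_describe function are hypothetical and do not reflect the fields of VMStruct; the sketch assumes that the grain size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchVM {
    uintptr_t block;   /* address returned by the operating system */
    size_t reserved;   /* bytes reserved from the operating system */
    uintptr_t base;    /* lowest grain-aligned address in the block */
    uintptr_t limit;   /* limit of the last grain wholly inside the block */
} SketchVM;

/* Fill in a descriptor for a block of `reserved` bytes at `block`.
 * Assumption: grain_size is a power of two. */
static void sketch_vm_describe(SketchVM *vm, uintptr_t block,
                               size_t reserved, size_t grain_size)
{
    uintptr_t mask = (uintptr_t)grain_size - 1;
    assert(grain_size != 0 && (grain_size & (grain_size - 1)) == 0);
    vm->block = block;
    vm->reserved = reserved;
    vm->base = (block + mask) & ~mask;       /* round up to a grain */
    vm->limit = (block + reserved) & ~mask;  /* round down to a grain */
}
```

The mappable range limit - base is never larger than the reserved size, and the difference is the overhead described in .sol.overhead.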

Size VMMapped(VM vm)

.if.mapped: Return the amount of address space (in bytes) currently mapped into memory by the VM.

void VMCopy(VM dest, VM src)

.if.copy: Copy the VM descriptor from src to dest.

43.5. Implementations

43.5.1. Generic implementation

.impl.an: In vman.c.

.impl.an.page.size: The generic VM uses a fake page size, given by the constant VMAN_PAGE_SIZE in config.h.

.impl.an.param: Decodes no keyword arguments.

.impl.an.reserve: Address space is “reserved” by calling malloc().

.impl.an.release: Address space is “released” by calling free().

.impl.an.map: Mapping (and unmapping) just fills the mapped region with copies of VMJunkBYTE to emulate the erasure of freshly mapped pages by virtual memory systems.

43.5.2. Unix implementation

.impl.ix: In vmix.c.

.impl.ix.page.size: The page size is given by sysconf(_SC_PAGESIZE). We avoid getpagesize(), which is a legacy function in POSIX:

“Applications should use the sysconf() function instead.” (The Single UNIX® Specification, Version 2)

.impl.ix.param: Decodes no keyword arguments.

.impl.ix.reserve: Address space is reserved by calling mmap(), passing PROT_NONE and MAP_PRIVATE | MAP_ANON.

.impl.ix.anon.trans: Note that MAP_ANON (“map anonymous memory not associated with any specific file”) is an extension to POSIX, but it is supported by FreeBSD, Linux, and macOS. A work-around that was formerly used on systems lacking MAP_ANON was to map the file /dev/zero.

.impl.ix.release: Address space is released by calling munmap().

.impl.ix.map: Address space is mapped to main memory by calling mmap(), passing PROT_READ | PROT_WRITE | PROT_EXEC and MAP_ANON | MAP_PRIVATE | MAP_FIXED.

.impl.ix.unmap: Address space is unmapped from main memory by calling mmap(), passing PROT_NONE and MAP_ANON | MAP_PRIVATE | MAP_FIXED.
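
The system calls named above can be sketched as follows. This is a minimal illustration, not the code in vmix.c: the sketch_ function names are hypothetical, there is no descriptor or accounting, and falling back to a non-executable mapping whenever the first mmap() fails is a simplification assumed here, not the detection logic the MPS uses.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* expose MAP_ANON on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Page size as reported by sysconf(_SC_PAGESIZE). */
static size_t sketch_page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    assert(n > 0);
    return (size_t)n;
}

/* Reserve address space without making it accessible; NULL on failure. */
static void *sketch_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Map part of a reservation so it can be read and written.
 * Simplification: retry without PROT_EXEC if the first attempt fails. */
static int sketch_map(void *base, size_t size)
{
    int flags = MAP_ANON | MAP_PRIVATE | MAP_FIXED;
    void *p = mmap(base, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   flags, -1, 0);
    if (p == MAP_FAILED)
        p = mmap(base, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p != MAP_FAILED;
}

/* Unmap by replacing the range with an inaccessible mapping, so the
 * addresses stay reserved. */
static int sketch_unmap(void *base, size_t size)
{
    void *p = mmap(base, size, PROT_NONE,
                   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
    return p != MAP_FAILED;
}

/* Release the whole reservation. */
static int sketch_release(void *base, size_t size)
{
    return munmap(base, size) == 0;
}
```

Unmapping with mmap() rather than munmap() keeps the address range reserved, so that no other part of the process can be given those addresses while the range is unmapped.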

.impl.xc.prot.exec: The approach in .sol.prot.exec of always making memory executable causes a difficulty on macOS on Apple Silicon. The virtual mapping module uses the same solution as the protection module, that is, detecting Apple Hardened Runtime, and retrying without the request for the memory to be executable. See design.mps.prot.impl.xc.prot.exec for details.

43.5.3. Windows implementation

.impl.w3: In vmw3.c.

.impl.w3.page.size: The page size is retrieved by calling GetSystemInfo() and consulting SYSTEM_INFO.dwPageSize.

.impl.w3.param: Decodes the keyword argument MPS_KEY_VMW3_MEM_TOP_DOWN, and if it is set, arranges for VMInit() to pass the MEM_TOP_DOWN flag to VirtualAlloc().

.impl.w3.reserve: Address space is reserved by calling VirtualAlloc(), passing MEM_RESERVE (and optionally MEM_TOP_DOWN) and PAGE_NOACCESS.

.impl.w3.release: Address space is released by calling VirtualFree(), passing MEM_RELEASE.

.impl.w3.map: Address space is mapped to main memory by calling VirtualAlloc(), passing MEM_COMMIT and PAGE_EXECUTE_READWRITE.

.impl.w3.unmap: Address space is unmapped from main memory by calling VirtualFree(), passing MEM_DECOMMIT.

43.6. Testing

.testing: It is important to test that a VM implementation works in extreme cases.

.testing.large: It must be able to reserve a large address space. Clients will want multi-gigabyte spaces, possibly more than the operating system will allow. If they ask for too much, mps_arena_create() (and hence VMInit()) must fail in a predictable way.

.testing.larger: It must be possible to allocate in a large space; sometimes committing will fail, because there’s not enough space to replace the “reserve” mapping. See request.epcore.160201 for details.

.testing.lots: It must be possible to have lots of mappings. The OS must either combine adjacent mappings or have lots of space in the kernel tables. See request.epcore.160117 for ideas on how to test this.