.. mode: -*- rst -*-

Virtual mapping
===============

:Tag: design.mps.vm
:Author: richard
:Date: 1998-05-11
:Status: complete design
:Revision: $Id: //info.ravenbrook.com/project/mps/master/design/vm.txt#24 $
:Copyright: See `Copyright and License`_.
:Index terms: pair: virtual mapping; design


Introduction
------------

_`.intro`: This is the design of the virtual mapping module.

_`.readership`: Any MPS developer; anyone porting the MPS to a new
platform.

_`.overview`: The virtual mapping module provides a simple, portable,
low-level interface to address space, with functions for reserving,
releasing, mapping and unmapping ranges of addresses.

_`.motivation`: The virtual mapping module is heavily used by the VM
Arena Class (see design.mps.arena.vm_).

.. _design.mps.arena.vm: arenavm


Requirements
------------

_`.req.granularity`: The virtual mapping module must report the
*granularity* with which address space can be managed. (This is
necessary for the arena to be able to portably determine its grain
size; see design.mps.arena.def.grain_.)

.. _design.mps.arena.def.grain: arena#.def.grain

_`.req.reserve`: The *reserve* operation must reserve a chunk of
address space.

_`.req.reserve.exclusive`: The MPS should have exclusive use of the
reserved chunk. (None of our supported operating systems can actually
provide this feature, alas. We rely on co-operation with the client
program.)

_`.req.reserve.contiguous`: The reserved chunk is a *contiguous*
portion of address space. (Contiguity is needed for zones to work; see
design.mps.arena.vm.overview.gc.zone_.)

.. _design.mps.arena.vm.overview.gc.zone: arenavm#overview.gc.zone

_`.req.reserve.size`: The reserved chunk is at least a *specified
size*. (This is necessary for zones to work.)

_`.req.reserve.align`: The reserved chunk is aligned to a *specified
alignment*. (This is necessary for the arena to be able to manage
address space in terms of grains.)

_`.req.reserve.overhead`: The reserved chunk is not much larger than
specified, preferably with no more than a grain of overhead. (This is
necessary in order to allow the client program to specify the amount
of address space the MPS uses, so that it can co-operate with other
subsystems that use address space.)

_`.req.reserve.address.not`: There is no requirement to be able to
reserve address space at a particular address. (The zone
implementation uses bits from the middle of the address, so can cope
wherever the portion is placed in the address space.)

_`.req.reserve.map.not`: The reserve operation should not map the
chunk into main memory or swap space. (The zone strategy is most
efficient if address space is used sparsely, but main memory is a
limited resource.)

_`.req.release`: The *release* operation should release a previously
reserved chunk of address space so that it may be used by other
subsystems of the client program. (This is needed to support client
programs on systems where address space is tight, and the client's
subsystems need to co-operate in their use of address space.)

_`.req.reserved`: The virtual mapping module must report the total
amount of reserved memory in each chunk of address space. (This is
needed to implement ``mps_arena_reserved()``.)

_`.req.map`: The *map* operation must arrange for a (previously
reserved) range of address space to be mapped into main memory or swap
space, so that addresses in the range can be read and written.

_`.req.unmap`: The *unmap* operation should arrange for a previously
mapped range of address space to no longer be mapped into main memory
or swap space. (This is needed to support client programs on systems
where main memory is scarce, and the client's subsystems need to
co-operate in their use of main memory.)

_`.req.mapped`: The virtual mapping module must maintain the total
amount of mapped memory in each chunk of address space.
(This is needed to allow the client program to limit the use of main
memory by the MPS via the "commit limit" mechanism.)

_`.req.bootstrap`: The virtual mapping module must be usable without
allocating heap memory. (This is necessary for the VM arena to get off
the ground.)

_`.req.params`: The interface should make it possible for MPS to allow
the client program to modify the behaviour of the virtual mapping
implementation. (This is needed to implement the
``MPS_KEY_VMW3_MEM_TOP_DOWN`` keyword argument.)

_`.req.prot.exec`: The virtual mapping module should allow mutators to
write machine code into memory allocated by the MPS and then execute
that code, for example, to implement just-in-time translation, or
other forms of dynamic compilation. Compare
design.mps.prot.req.prot.exec_.

.. _design.mps.prot.req.prot.exec: prot#.req.prot.exec


Design
------

_`.sol.overhead`: To meet `.req.reserve.contiguous`_,
`.req.reserve.align`_ and `.req.reserve.overhead`_, most VM
implementations ask the operating system for ``size + grainSize -
pageSize`` bytes of address space. This ensures that wherever the
operating system places the reserved address space, it contains a
contiguous region of ``size`` bytes aligned to a multiple of
``grainSize``. The overhead is thus ``grainSize - pageSize``, and in
the common case where ``grainSize`` is equal to ``pageSize``, this is
zero.

_`.sol.bootstrap`: To meet `.req.bootstrap`_, the interface provides
the function ``VMCopy()``. This allows the initialization of a
``VMChunk`` to proceed in four steps. First, allocate space for a
temporary VM descriptor on the stack. Second, call ``VMInit()`` to
reserve address space and initialize the temporary VM descriptor.
Third, call ``VMMap()`` on the new VM to map enough memory to store a
``VMChunk``. Fourth, call ``VMCopy()`` to copy the temporary VM
descriptor into its place in the ``VMChunk``.
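
The four bootstrap steps in `.sol.bootstrap`_ can be illustrated with
a toy model. This is only a sketch under stated assumptions, not the
MPS implementation: every ``Toy``/``toy`` name is hypothetical,
``malloc()`` stands in for reserving address space, and "mapping" is
reduced to bookkeeping. What it shows is the ordering: the descriptor
starts on the stack and is copied into the chunk only once memory for
the chunk exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the MPS types; not the real definitions. */
typedef struct ToyVMStruct {
  char *block;      /* start of the "reserved" block */
  size_t reserved;  /* bytes reserved */
  size_t mapped;    /* bytes currently "mapped" */
} ToyVMStruct, *ToyVM;

typedef struct ToyChunkStruct {
  ToyVMStruct vmStruct;  /* descriptor inlined in the chunk */
  size_t size;
} ToyChunkStruct, *ToyChunk;

/* Stand-in for VMInit(): "reserve" address space with malloc(). */
static int toyVMInit(ToyVM vm, size_t size)
{
  vm->block = malloc(size);
  if (vm->block == NULL)
    return 0;
  vm->reserved = size;
  vm->mapped = 0;
  return 1;
}

/* Stand-in for VMMap(): only performs the accounting. */
static int toyVMMap(ToyVM vm, char *base, char *limit)
{
  assert(vm->block <= base && base <= limit
         && limit <= vm->block + vm->reserved);
  vm->mapped += (size_t)(limit - base);
  return 1;
}

/* Stand-in for VMCopy(): copy the descriptor from src to dest. */
static void toyVMCopy(ToyVM dest, ToyVM src)
{
  *dest = *src;
}

/* The four bootstrap steps; returns NULL on failure. */
static ToyChunk toyChunkCreate(size_t size)
{
  ToyVMStruct vmStruct;             /* 1. temporary descriptor on the stack */
  ToyChunk chunk;

  if (size < sizeof(ToyChunkStruct))
    return NULL;
  if (!toyVMInit(&vmStruct, size))  /* 2. reserve address space */
    return NULL;
  /* 3. map enough memory to hold the chunk structure */
  if (!toyVMMap(&vmStruct, vmStruct.block,
                vmStruct.block + sizeof(ToyChunkStruct))) {
    free(vmStruct.block);
    return NULL;
  }
  chunk = (ToyChunk)vmStruct.block;  /* malloc() memory is suitably aligned */
  toyVMCopy(&chunk->vmStruct, &vmStruct);  /* 4. copy descriptor into place */
  chunk->size = size;
  return chunk;
}

/* Release the block; the chunk structure lives inside it. */
static void toyChunkDestroy(ToyChunk chunk)
{
  free(chunk->vmStruct.block);
}
```

In this toy model no heap allocation is needed for the descriptor
itself: it lives on the stack until step four and inside the reserved
block afterwards.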

_`.sol.params`: To meet `.req.params`_, the interface provides the
function ``VMParamFromArgs()``, which decodes relevant keyword
arguments into a temporary buffer provided by the caller; this buffer
is then passed to ``VMInit()``. The size of the buffer must be
statically determinable so that the caller can allocate it on the
stack: it is given by the constant ``VMParamSize``. Since this is
potentially platform-dependent it is defined in ``config.h``.

_`.sol.prot.exec`: The virtual mapping module maps memory as
executable, if this is supported by the platform.


Interface
---------

``typedef VMStruct *VM``

_`.if.vm`: ``VM`` is a descriptor for a reserved chunk of address
space. It points to a ``VMStruct`` structure, which is defined in
``vm.h`` so that it can be inlined in the ``VMChunkStruct`` by the VM
arena class.

``Size PageSize(void)``

_`.if.page.size`: Return the "page size": that is, the granularity
with which the operating system can reserve and map address space.

_`.if.page.size.cache`: On some systems (for example, Windows),
determining the page size requires a system call, so for speed the
page size is cached in each VM descriptor and should be retrieved by
calling the ``VMPageSize()`` function.

``Res VMParamFromArgs(void *params, size_t paramSize, ArgList args)``

_`.if.param.from.args`: Decode the relevant keyword arguments in the
``args`` parameter, and store a description of them in the buffer
pointed to by ``params`` (which is ``paramSize`` bytes long). It is an
error if the buffer is not big enough to store the parameters for the
VM implementation.

``Res VMInit(VM vm, Size size, Size grainSize, void *params)``

_`.if.init`: Reserve a chunk of address space that contains at least
``size`` addresses, starting at an address which is a multiple of
``grainSize``. The ``params`` argument points to a parameter block
that was initialized by a call to ``VMParamFromArgs()``. If
successful, update ``vm`` to describe the reserved chunk, and
return ``ResOK``.
Otherwise, return ``ResRESOURCE``.

``void VMFinish(VM vm)``

_`.if.finish`: Release the chunk of address space described by ``vm``.
Any addresses that were mapped through this VM are now unmapped.

``Res VMMap(VM vm, Addr base, Addr limit)``

_`.if.map`: Map the range of addresses from ``base`` (inclusive) to
``limit`` (exclusive) into main memory. It is an error if the range
does not lie between ``VMBase(vm)`` and ``VMLimit(vm)``, or if
``base`` and ``limit`` are not multiples of ``VMPageSize(vm)``. Return
``ResOK`` if successful, ``ResMEMORY`` otherwise.

``void VMUnmap(VM vm, Addr base, Addr limit)``

_`.if.unmap`: Unmap the range of addresses from ``base`` (inclusive)
to ``limit`` (exclusive). The conditions are the same as for
``VMMap()``.

``Addr VMBase(VM vm)``

_`.if.base`: Return the base address of the VM (the lowest address in
the VM that is a multiple of the grain size).

``Addr VMLimit(VM vm)``

_`.if.limit`: Return the limit address of the VM (the limit of the
last grain that is wholly inside the VM).

``Size VMReserved(VM vm)``

_`.if.reserved`: Return the amount of address space (in bytes)
reserved by the VM. This may include addresses that are not available
for mapping because of the requirement for ``VMBase(vm)`` and
``VMLimit(vm)`` to be multiples of the grain size.

``Size VMMapped(VM vm)``

_`.if.mapped`: Return the amount of address space (in bytes) currently
mapped into memory by the VM.

``void VMCopy(VM dest, VM src)``

_`.if.copy`: Copy the VM descriptor from ``src`` to ``dest``.


Implementations
---------------

Generic implementation
......................

_`.impl.an`: In ``vman.c``.

_`.impl.an.page.size`: The generic VM uses a fake page size, given by
the constant ``VMAN_PAGE_SIZE`` in ``config.h``.

_`.impl.an.param`: Decodes no keyword arguments.

_`.impl.an.reserve`: Address space is "reserved" by calling
``malloc()``.

_`.impl.an.release`: Address space is "released" by calling
``free()``.
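
To make `.impl.an.reserve`_ and `.impl.an.release`_ concrete, here is
a minimal sketch of a ``malloc()``-based "reserve" that also computes
a base and limit in the sense of `.if.base`_ and `.if.limit`_. It is
not the code in ``vman.c``: the ``Toy``/``toy`` names are hypothetical,
and the choice to over-allocate by one whole grain (because
``malloc()`` makes no promise of grain alignment) is an assumption of
the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor for this sketch; the real VMStruct is in vm.h. */
typedef struct ToyVMStruct {
  void *block;      /* pointer returned by malloc() */
  uintptr_t base;   /* lowest multiple of the grain size in the block */
  uintptr_t limit;  /* limit of the last grain wholly inside the block */
  size_t reserved;  /* total bytes obtained, including alignment slack */
} ToyVMStruct;

/* Round addr up to a multiple of align, which must be a power of two. */
static uintptr_t toyAlignUp(uintptr_t addr, size_t align)
{
  return (addr + (align - 1)) & ~((uintptr_t)align - 1);
}

/* Round addr down to a multiple of align, which must be a power of two. */
static uintptr_t toyAlignDown(uintptr_t addr, size_t align)
{
  return addr & ~((uintptr_t)align - 1);
}

/* "Reserve" at least size bytes aligned to grainSize.
 * Returns 1 on success, 0 on failure. */
static int toyReserve(ToyVMStruct *vm, size_t size, size_t grainSize)
{
  size_t reserved;

  assert(grainSize != 0 && (grainSize & (grainSize - 1)) == 0);
  assert(size != 0 && size % grainSize == 0);
  if (size > SIZE_MAX - grainSize)
    return 0;                     /* size + grainSize would overflow */
  reserved = size + grainSize;    /* one extra grain of alignment slack */
  vm->block = malloc(reserved);
  if (vm->block == NULL)
    return 0;
  vm->base = toyAlignUp((uintptr_t)vm->block, grainSize);
  vm->limit = toyAlignDown((uintptr_t)vm->block + reserved, grainSize);
  vm->reserved = reserved;
  return 1;
}

/* "Release" the address space obtained by toyReserve(). */
static void toyRelease(ToyVMStruct *vm)
{
  free(vm->block);
  vm->block = NULL;
}
```

With these definitions, ``limit - base`` is always at least ``size``,
while ``reserved`` may be larger than ``limit - base``. That
difference is the situation `.if.reserved`_ describes.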

_`.impl.an.map`: Mapping (and unmapping) just fills the mapped region
with copies of ``VMJunkBYTE`` to emulate the erasure of freshly mapped
pages by virtual memory systems.


Unix implementation
...................

_`.impl.ix`: In ``vmix.c``.

_`.impl.ix.page.size`: The page size is given by
``sysconf(_SC_PAGESIZE)``. We avoid ``getpagesize()``, which is a
legacy function in POSIX:

    Applications should use the sysconf() function instead.

    -- The Single UNIX ® Specification, Version 2

_`.impl.ix.param`: Decodes no keyword arguments.

_`.impl.ix.reserve`: Address space is reserved by calling |mmap|_,
passing ``PROT_NONE`` and ``MAP_PRIVATE | MAP_ANON``.

.. |mmap| replace:: ``mmap()``
.. _mmap: https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

_`.impl.ix.anon.trans`: Note that ``MAP_ANON`` ("map anonymous memory
not associated with any specific file") is an extension to POSIX, but
it is supported by FreeBSD, Linux, and macOS. A work-around that was
formerly used on systems lacking ``MAP_ANON`` was to map the file
``/dev/zero``.

_`.impl.ix.release`: Address space is released by calling |munmap|_.

.. |munmap| replace:: ``munmap()``
.. _munmap: https://pubs.opengroup.org/onlinepubs/9699919799/functions/munmap.html

_`.impl.ix.map`: Address space is mapped to main memory by calling
|mmap|_, passing ``PROT_READ | PROT_WRITE | PROT_EXEC`` and ``MAP_ANON
| MAP_PRIVATE | MAP_FIXED``.

_`.impl.ix.unmap`: Address space is unmapped from main memory by
calling |mmap|_, passing ``PROT_NONE`` and ``MAP_ANON | MAP_PRIVATE |
MAP_FIXED``.

_`.impl.xc.prot.exec`: The approach in `.sol.prot.exec`_ of always
making memory executable causes a difficulty on macOS on Apple
Silicon. The virtual mapping module uses the same solution as the
protection module, that is, detecting Apple Hardened Runtime, and
retrying without the request for the memory to be executable. See
design.mps.prot.impl.xc.prot.exec_ for details.

.. _design.mps.prot.impl.xc.prot.exec: prot#.impl.xc.prot.exec


Windows implementation
......................

_`.impl.w3`: In ``vmw3.c``.

_`.impl.w3.page.size`: The page size is retrieved by calling
|GetSystemInfo|_ and consulting ``SYSTEM_INFO.dwPageSize``.

.. |GetSystemInfo| replace:: ``GetSystemInfo()``
.. _GetSystemInfo: https://docs.microsoft.com/en-us/windows/desktop/api/sysinfoapi/nf-sysinfoapi-getsysteminfo

_`.impl.w3.param`: Decodes the keyword argument
``MPS_KEY_VMW3_MEM_TOP_DOWN``, and if it is set, arranges for
``VMInit()`` to pass the ``MEM_TOP_DOWN`` flag to |VirtualAlloc|_.

_`.impl.w3.reserve`: Address space is reserved by calling
|VirtualAlloc|_, passing ``MEM_RESERVE`` (and optionally
``MEM_TOP_DOWN``) and ``PAGE_NOACCESS``.

.. |VirtualAlloc| replace:: ``VirtualAlloc()``
.. _VirtualAlloc: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366887.aspx

_`.impl.w3.release`: Address space is released by calling
|VirtualFree|_, passing ``MEM_RELEASE``.

.. |VirtualFree| replace:: ``VirtualFree()``
.. _VirtualFree: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366892.aspx

_`.impl.w3.map`: Address space is mapped to main memory by calling
|VirtualAlloc|_, passing ``MEM_COMMIT`` and
``PAGE_EXECUTE_READWRITE``.

_`.impl.w3.unmap`: Address space is unmapped from main memory by
calling |VirtualFree|_, passing ``MEM_DECOMMIT``.


Testing
-------

_`.testing`: It is important to test that a VM implementation works in
extreme cases.

_`.testing.large`: It must be able to reserve a large address space.
Clients will want multi-GB spaces, more than operating systems will
allow. If they ask for too much, ``mps_arena_create()`` (and hence
``VMInit()``) must fail in a predictable way.

_`.testing.larger`: It must be possible to allocate in a large space;
sometimes committing will fail, because there's not enough space to
replace the "reserve" mapping. See request.epcore.160201_ for details.

.. _request.epcore.160201: https://info.ravenbrook.com/project/mps/import/2001-11-05/mmprevol/request/epcore/160201

_`.testing.lots`: It must be possible to have lots of mappings. The OS
must either combine adjacent mappings or have lots of space in the
kernel tables. See request.epcore.160117_ for ideas on how to test
this.

.. _request.epcore.160117: https://info.ravenbrook.com/project/mps/import/2001-11-05/mmprevol/request/epcore/160117


Document History
----------------

- 1998-05-11 RB_ Incomplete design.

- 2002-06-07 RB_ Converted from MMInfo database design document.

- 2013-05-23 GDR_ Converted to reStructuredText.

- 2014-06-16 GDR_ Document the whole interface.

- 2014-10-22 GDR_ Refactor module description into requirements.

.. _RB: https://www.ravenbrook.com/consultants/rb/
.. _GDR: https://www.ravenbrook.com/consultants/gdr/


Copyright and License
---------------------

Copyright © 2013–2020 Ravenbrook Limited.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the
   distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.