MPS issue job003654

Title	Bit table scanning on segment allocation is not scalable
Status	closed
Priority	essential
Assigned user	David Lovemore
Organization	Ravenbrook
Description	When allocating segments, large amounts of bit tables are scanned. This causes performance problems for large amounts of memory.
Analysis	I remember the days when 32Mb was more memory than anyone had. If you have 4Kb pages, and have a bit table of mapped pages then each byte represents 8 4Kb pages, or 32Kb. A 1Kb table is sufficient for 32Mb. If you scan from the bit table from the beginning every allocation, then you have O(n**2) behaviour. Not a problem if you are allocating a 4Kb page and the most you can scan is 1Kb. But this has become a problem with larger memory sizes. If you have 256Kb of memory used, the page table now becomes 8Kb, twice the amount that you are probably allocating. Under certain conditions then, especially where memory is being allocated and not freed, we scan the whole table to fill a 4K buffer. This could typically happen while collecting. When you reach 1Gb of memory, the situation is quite bad. Every buffer fill, a segment allocation happens, which may cause the level 1 cache to be totally filled with bit table. I have 16Gb on my machine. That would need 512Kb of bit table, enough to fill your level 2 cache. That would require a lot of scanning to allocate a 4K segment. What seems to make sense is to remember where the first free bit in the table is. This is made complicated by the preferred zone allocation policy. A quick fix is to keep a record per zone of where it is full up to and also a bitset of full zones that is updated whenever we allocate. One complication is that there can be multiple zones per chunk because the chunk size can be bigger than the initial chunk. Unfortunately we allocate in zones going up but within a set of zone ranges going down. Perhaps it would make sense to fix that. A fix with a better complexity would maybe be a version of the MV pool that combined edges with bit tables. (Or didn't use bit tables at all) EP complained about bit table scanning taking up lots of time back when they starting using the MPS in 1996. ".analysis: The profiles show that ChunkSegAlloc is taking almost all of the MM time. ChunkSegAlloc spends all its time iterating over a bit table looking for a set range (effectively BTFindSetRange)." [1] The response was to speed up the bit scanning code. That may have worked at the time and it was using AMS. Now we typically have 1000x more memory. More evidence of this problem in [2]. Note that the profiles for [2] was from a heap with a maximum amount of allocated memory of 3Mb. Mails [3], [4] notice but don't diagnose this problem in CET. We ended up finding a problem with hash tables but this was never fixed. Notice we are about 10x slower when allocating with a 700Mb heap than with a 50Mb heap.
How found	inspection
Evidence	[1] "Performance for typical test jobs is too slow" <`https://info.ravenbrook.com/project/mp...01-11-05/mmprevol/request/epcore/170534`> [2] "Re: MVT and MVFF performance (was Re: Merges pending)" <`https://info.ravenbrook.com/mail/2013/06/08/13-06-53/0/`> [3] "allocTest session notes" <`https://info.ravenbrook.com/mail/2013/06/05/12-10-27/0/`> [4] "Re: allocTest session notes" <`https://info.ravenbrook.com/mail/2013/06/05/18-23-28/0/`>
Created by	David Lovemore
Created on	2013-12-28 01:24:22
Last modified by	Gareth Rees
Last modified on	2014-03-17 13:54:15
History	2013-12-28 DL Created 2013-12-29 DL Added analysis about EP. 2013-12-29 DL Added evidence from e-mails

Fixes

Change	Effect	Date	User	Description
184783	closed	2014-03-13 15:28:06	Richard Brooksby	Merging branch/2014-01-17/cbs-tract-alloc into master.
184359	open	2014-02-17 17:05:37	Richard Brooksby	Merging spare-ring branch into master, from //info.ravenbrook.com/project/mps/branch/2014-01-25/spare-ring/...