MPS issue job003435

TitleRehashing large hash tables breaks nursery collection strategy
Assigned userNick Barnes
DescriptionSimple use of location dependencies in flat hash tables has a very bad performance conflict with our generational collection scheduling strategy, once a hash table is large enough for the main hash table array to be a significant fraction of the nursery size (assuming that at least some of the objects in the hash table are new or nearly new).
AnalysisThe hash table LD refset includes (at least) the refset of the nursery (because objects in the hash table are new). On a rehash, the hash table array is reallocated in the nursery, adding to the nursery generation's "new size". A nursery collection is scheduled once this "new size" exceeds the nursery's capacity (or will do so "soon"). That collection makes the hash table's LD stale (for all references) because the collection puts the nursery refset into the arena's LD history. Then the next time the hash table's stale test is taken (e.g. on the next insert or miss), another rehash will be triggered. For a hash table which is a significant fraction of the nursery size, this causes a lot of rehashes. Hash table code might even go into an endless loop (because a rehash involves a lot of insert operations, which - if not written with this possibility in mind - could trigger another rehash).
There are several possible solutions. A quick-and-dirty hack being applied on a custom branch is for buffer fills greater than a fraction of the size of the containing generation not to count towards that generation's "new size".
How foundcustomer
Observed in1.110.0
Created byNick Barnes
Created on2013-03-07 19:50:31
Last modified byRichard Brooksby
Last modified on2013-05-01 16:36:17
History2013-03-07 NB Created.


Change Effect Date User Description
181142 closed 2013-03-13 15:24:45 Nick Barnes Fix a brain-damaged patch with a correct (but still flaky) one.
181082 open 2013-03-07 19:51:51 Nick Barnes Dirty hack workaround for job003435