The Chunk Memory Management System


Synopsis:

The chunk memory management system has three goals: to reduce overall memory consumption, to improve locality of reference, and to allow less-used sections of memory to be paged out to disk. These three goals are accomplished by implementing an allocation management layer on top of malloc(), with significantly reduced overhead and the ability to rearrange allocations to actively control fragmentation and increase locality.

Basic operation:

The managed memory pool is divided into regions of approximately 64K bytes each. Each of these regions contains variable-size chunks representing allocated and available (free) memory. No individual allocation may be larger than will fit in a single region, and no allocation may be smaller than two bytes. Each chunk has between two and four bytes of overhead (indicating the used/free status, the size of the chunk, and the number of dereferences for the chunk), and each region has additional overhead of about 32 bytes.
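
To put rough numbers on that layout, here is an illustrative set of constants; the names are invented for this sketch, and the real definitions in PennMUSH's chunk.c may differ.

    /* Invented names, for illustration only; see chunk.c for the real values. */
    #define REGION_CAPACITY    (64 * 1024)  /* each region is roughly 64K */
    #define REGION_OVERHEAD    32           /* approximate bookkeeping per region */
    #define MIN_CHUNK_DATA     2            /* no allocation smaller than two bytes */
    #define CHUNK_OVERHEAD_MIN 2            /* used/free flag, length, deref count */
    #define CHUNK_OVERHEAD_MAX 4            /* bigger chunks need more length bytes */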

Allocations are made with the chunk_create() call, which is given the size of the data, the data value to be stored, and an initial dereference count to be assigned to the chunk. Once created, the value of a chunk cannot be changed; the storage is immutable. chunk_create() returns an integral reference value that can be used to retrieve or free the allocation.
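
For illustration, a minimal chunk_create() call might look like the sketch below. The reference type and the exact prototype are assumptions drawn from this description, not necessarily the declarations in chunk.h.

    #include <string.h>

    /* Assumed prototype, modeled on the description above. */
    typedef unsigned int chunk_reference_t;
    chunk_reference_t chunk_create(const unsigned char *data,
                                   unsigned short len,
                                   unsigned char derefs);

    /* Store an attribute value in the chunk pool with an initial
     * dereference count of zero, keeping the returned reference. */
    chunk_reference_t
    store_value(const char *value)
    {
      return chunk_create((const unsigned char *) value,
                          (unsigned short) strlen(value), 0);
    }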

Allocations are accessed with the chunk_fetch(), chunk_len(), and chunk_derefs() calls. Each of these is given a reference (as returned by chunk_create()), and chunk_fetch() is additionally given a buffer and its length, which it fills with the allocated value. Both chunk_fetch() and chunk_len() increment a chunk's dereference count (up to the maximum of 255), which is used during migration to improve locality.

Allocations are freed with the chunk_delete() call, which also requires a reference as input.
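
Continuing that sketch, access and deletion might look like this; again, the prototypes and the buffer convention are assumptions based on the description, not the authoritative API.

    /* Assumed prototypes, following the description above. */
    typedef unsigned int chunk_reference_t;
    unsigned short chunk_fetch(chunk_reference_t ref,
                               unsigned char *buffer, unsigned short buffer_len);
    unsigned short chunk_len(chunk_reference_t ref);
    unsigned char chunk_derefs(chunk_reference_t ref);
    void chunk_delete(chunk_reference_t ref);

    /* Copy a stored value into a local buffer, then free the allocation.
     * Note that chunk_fetch() and chunk_len() both bump the deref count. */
    void
    read_and_free(chunk_reference_t ref)
    {
      unsigned char buffer[8192];
      unsigned short len = chunk_fetch(ref, buffer, sizeof buffer);
      /* ... use the first len bytes of buffer ... */
      (void) len;
      chunk_delete(ref);
    }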

Finally, allocations are allowed to rearrange themselves with the chunk_migration() call. chunk_migration() takes an array of pointers to chunk references as input, and examines each of the indicated chunks to see which need to be moved to improve the distribution of allocations. If any allocations are moved, then the references to the moved allocations are updated in place (hence the array of pointers to references, instead of just an array of references). Migration may be done incrementally by submitting only a portion of the allocations with each call to chunk_migration(); however, _all_ allocations made with chunk_create() must eventually be submitted for migration in order to maintain the memory pool in a non-fragmented state.
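
As a sketch of incremental migration, the loop below submits allocations in small batches. The chunk_migration() prototype shown (a count plus an array of pointers to references) is an assumption based on the description above, and MIGRATE_BATCH is an invented tuning knob.

    typedef unsigned int chunk_reference_t;      /* as in the earlier sketches */
    void chunk_migration(int count, chunk_reference_t **refs);

    #define MIGRATE_BATCH 64

    /* Offer a small batch of references for migration.  The allocator may
     * rewrite any of them in place, which is why we pass pointers to the
     * fields that hold the references. */
    void
    migrate_some(chunk_reference_t *all_refs[], int total, int *cursor)
    {
      chunk_reference_t *batch[MIGRATE_BATCH];
      int n = 0;
      while (n < MIGRATE_BATCH && *cursor < total)
        batch[n++] = all_refs[(*cursor)++];
      if (n > 0)
        chunk_migration(n, batch);
      if (*cursor >= total)
        *cursor = 0;   /* wrap so every allocation is eventually submitted */
    }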

Migration:

Under normal conditions, extended use of this chunk allocation system would lead to a significantly fragmented data store unless there were some means to defragment the storage arena. Left to run long enough, that fragmentation would waste space and undo the gains in locality. Calling chunk_migration() gives the allocator permission to move allocations around, both to defragment the arena and to improve locality of reference (by making sure that infrequently used chunks are segregated from the chunks in active use). Of course, moving all the allocated chunks at once would be a slow and painful process. Instead, migration may be done incrementally, giving permission to move a small number of chunks at any one time and spreading out the cost of defragmenting the data store.

Just because you give permission to move a chunk doesn't mean that it will be moved. The chunk may be perfectly happy where it is, with no need to move it elsewhere. Chunks are only moved when their personal happiness would be improved by a move. In general, maximizing the happiness of individual chunks will improve the happiness of the whole.

There are several things that factor into a chunk's happiness. The things that make a chunk unhappy are:

  • Having free space both before and after the chunk in its region.
  • Having only one allocated neighbor (or worse, none). The edges of a region count as allocated neighbors.
  • Having a dereference count different from the region average. The greater the difference, the more unhappy the chunk is.
  • Being in a sparsely populated region. The more free space in a region, the more unhappy the chunks in it.
  • Being away from the other chunks migrated at the same time. If some of the other chunks allowed to migrate are in the same region as a chunk, then it is happier. (This is specifically to improve locality during dumps.)

None of these factors are absolute; all of them have different weights that add into a general unhappiness for the chunk. The lower the unhappiness, the better.
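
As a rough illustration of how such a score might be computed, here is a sketch of a weighted sum over the factors above. The weights, parameter names, and the function itself are entirely hypothetical; the real scoring in chunk.c is certainly tuned differently.

    #include <stdlib.h>

    /* Hypothetical weights, for illustration only. */
    #define W_SPLIT_FREE 4   /* free space on both sides */
    #define W_LONELY     3   /* per missing allocated neighbor */
    #define W_DEREF_DIFF 1   /* per point of difference from the region average */
    #define W_SPARSE     2   /* per KB of free space in the region */
    #define W_ISOLATED   2   /* no co-migrating chunk in the same region */

    static int
    chunk_unhappiness(int free_before, int free_after, int allocated_neighbors,
                      int derefs, int region_avg_derefs,
                      int region_free_kb, int comigrants_in_region)
    {
      int score = 0;
      if (free_before && free_after)
        score += W_SPLIT_FREE;
      if (allocated_neighbors < 2)
        score += W_LONELY * (2 - allocated_neighbors);
      score += W_DEREF_DIFF * abs(derefs - region_avg_derefs);
      score += W_SPARSE * region_free_kb;
      if (comigrants_in_region == 0)
        score += W_ISOLATED;
      return score;   /* lower is happier */
    }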

Over time and usage, the dereference counts for chunks increase and eventually reach the maximum value of 255. (The count is limited by the fact that it's stored in a single byte per chunk.) If this were left unchecked, eventually every chunk would have a dereference count of 255, and the counts would be useless for improving locality. To counteract this, when the average dereference count for a certain number of regions exceeds 128, the 'migration period' is incremented and all chunk dereference counts are halved. The critical number of regions is determined from the cache size and the total number of regions. In general, period changes should be driven primarily by the frequency of database dumps (which increment the dereference count on every chunk, and thus raise the average in every region). Given a dump frequency of once per hour (the default), there should be a period change about every 2.6 days: halving pushes the average back down to roughly 64, and it then takes on the order of 64 hourly dumps to climb past 128 again.

Statistics:

The chunk memory management system keeps several statistics about the allocation pool, both to maintain good operation through active encouragement of locality, and to satisfy the curiosity of people using the system (and its designer ;-)). These statistics are reported (in PennMUSH) through the use of the @stats command with the /chunks switch.

@stats/chunks generates output similar to this:

Chunks:         99407 allocated (   8372875 bytes,     223808 ( 2%) overhead)
                74413 short     (   1530973 bytes,     148826 ( 9%) overhead)
                24994 medium    (   6841902 bytes,      74982 ( 1%) overhead)
                    0 long      (         0 bytes,          0 ( 0%) overhead)
                  128 free      (   1319349 bytes,      23058 ( 1%) fragmented)
Regions:          147 total,       16 cached
Paging:        158686 out,     158554 in
Storage:      9628500 total (86% saturation)

Period:             1 (   5791834 accesses so far,       1085 chunks at max)
Migration:     245543 moves this period
               145536 slide
                   45 away
                30719 fill exact
                69243 fill inexact

First, the number of allocated chunks is given, along with their total size and overhead. The allocated chunks are then broken down by size range: short chunks (2 to 63 bytes) with two bytes of overhead each, medium chunks (64 to 8191 bytes) with three bytes of overhead each, and long chunks (8192 to ~64K bytes) with four bytes of overhead each. (In the sample above, the 74413 short chunks at two bytes apiece account for exactly the 148826 bytes of short-chunk overhead shown.) Rounding out the individual chunk statistics are the number of free chunks, their total size, and the amount of fragmented free space (free space not in the largest free chunk of its region is considered fragmented).

Next come statistics on regions: the number of regions in use and the number held in the memory cache. All regions not in the cache are paged out to disk. Paging statistics follow, listing the number of times a region has been moved out of or into the memory cache. After that, the total amount of storage used (in memory or on disk) is given, along with the saturation rate, where saturation is the fraction of that space actually allocated in chunks; in the sample above, 8372875 allocated bytes out of 9628500 total gives the 86% figure shown.

Finally come statistics on migration and the migration period. The period number is listed, along with the total number of dereferences in the period and how many chunks have reached the maximum dereference count of 255. Then the amount of migration movement is listed, both in total and broken down by category. Slides occur when an allocation is shifted to the other side of a neighboring free space. Away moves happen when an allocation is extremely unhappy where it is and is pushed out to somewhere else. Fills occur when an allocation is moved to fill in a free space; the space can be either exactly filled by the move, or inexactly filled (leaving some free space behind).

Histograms:

The chunk memory management system can also display a few histograms about itself. These histograms are reported (in PennMUSH) through the use of the @stats command, with the /regions, /freespace, or /paging switches.

All of @stats/regions, @stats/freespace, and @stats/paging produce histograms against region average dereference count. The histograms use buckets four counts wide, so all regions with averages of 0-3 fall in the first bucket, 4-7 in the second, and so on, up to 252-255 in the last. If the heights of the buckets differ greatly, the highest spikes are allowed to extend off the top of the histogram (with their real values labeled in parentheses next to them).

@stats/regions is a histogram of how many regions at each count currently exist. In a healthy game, there should be a large spike at some dereference count between 64 and 128 (representing the largely unused portion of the database), a lesser spike at 255 (representing the portion of the database that's used very frequently), and a smattering of regions at other counts, with either new areas of the database (below the large spike) or semi-frequently used areas (above the large spike). New migration periods occur when the large spike would pass 128, at which point everything is halved and the spike is pushed back down to 64.

@stats/freespace is a histogram of how much free space exists in regions at each dereference count. This histogram is included to aid in diagnosis of the cause for dropping saturation rates. Ideally, the free space in any bucket should be less than 120K.

@stats/paging is a histogram of the number of regions being paged in or out at each dereference count. As of this writing, a very unhealthy behaviour is observed: the histogram shows a trapezoid between 64 and 128, drowning out most of the rest of the chart. This indicates that, as time goes on, the attributes associated with a single low-use object get scattered randomly throughout all the low-use regions, so when dumps occur (with their linear enumeration of all attributes on objects) the low-use regions thrash in and out of the cache. This can be very detrimental to dump performance. Something will have to be done to fix this tendency of migration; the unhappiness based on other chunks being migrated in the same region is the current attempt. Healthy behaviour would produce some other, as yet undetermined, pattern in the paging histogram.