Allocating and tracking memory


Penn includes a basic memory allocation debugger, enabled by setting mem_check to yes in mush.cnf and rebooting. However, to take advantage of this, you have to use special functions to allocate and free memory.

It also features (as of 1.8.3p4) a slab allocator meant for space-efficient allocation of small, fixed-size objects.

Slabs


Each time you malloc() some memory, a bit more than what you requested is actually used: the malloc system reserves some bytes before or after (or both) what it returns for its own bookkeeping. With large allocations this doesn't matter much, but with lots of small allocations the waste can add up significantly. Some malloc implementations handle this better than others. A slab allocator can be used to reduce much of this overhead no matter how the underlying malloc implementation behaves.

Before 1.8.3p4, attribute and lock structures were stored in their own simple slabs. In 1.8.3p4, a more general-purpose and powerful allocator was added, and many areas of the source were converted to use it.

The slab allocator is intended for small, fixed-size objects (structures, basically). Strings are not suitable unless they are all the same length. Object sizes should be well under the page size of the host computer's virtual memory system, which is usually 4K and sometimes 8K.

The allocator uses malloc() to allocate a page's worth of memory at a time and packs as many objects into it as will fit. It keeps track of how many objects in each page are in use and how many are free, so it can release any unused pages.

When allocating objects from a slab, you can hint at which page should be looked at first for a free object. This is used to improve cache behavior: for example, when iterating through all attributes on a database object, it's faster if most of the structs are on the same page.
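
As an illustration, here is a minimal sketch of the intended usage pattern. The names slab, slab_create(), slab_malloc(), and slab_free(), along with their signatures, are assumptions based on the description above rather than confirmed API documentation (check the source headers for the real prototypes), and struct mynode is made up for the example.

    /* Sketch only: slab type and function signatures are assumed. */
    struct mynode {
      int value;
      struct mynode *next;
    };

    static slab *mynode_slab = NULL;

    struct mynode *
    new_node(const struct mynode *hint)
    {
      if (!mynode_slab)
        mynode_slab = slab_create("mynode structs", sizeof(struct mynode));
      /* The hint names an existing object whose page is searched first,
       * so related nodes tend to end up on the same page. */
      return slab_malloc(mynode_slab, hint);
    }

    void
    free_node(struct mynode *n)
    {
      slab_free(mynode_slab, n);
    }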

API

Coming later.

Resources

mush_malloc() and memcheck


Penn includes a basic memory allocation debugger, enabled by setting mem_check to yes in mush.cnf and rebooting. However, to take advantage of this, you have to use special functions to allocate and free memory.

The memcheck system uses reference counting, incrementing the count for a particular tag each time memory is allocated with it and decrementing the count when that memory is freed. You can see the current counts with @list allocations as an admin character (in older versions, use @stats/tables as God). Each line looks like:

TAG : COUNT.

A constantly increasing count where there shouldn't be one means you're not freeing memory after you're done with it. Attempts to free memory with a tag that has a zero or negative count get logged in game/log/checkpt.log, along with where in the source this was done (the latter as of 1.8.3p4).

Tags are just strings describing what's been allocated. More specific tags, for example "fun_align.string", make tracking down leaks easier than generic tags like "string", but they increase the size of the data structures that store the reference counts.

API

To associate memory with a tag, use the following wrapper functions, prototyped in externs.h. All take a tag argument, which is a string describing what's being allocated or freed. The same tag must be used to free a chunk of memory as was used when it was allocated.

void* mush_malloc(size_t bytes, const char *tag)
Wrapper for malloc(): allocates the requested number of bytes and returns a pointer to them, or NULL on failure.
void* mush_calloc(size_t count, size_t size, const char *tag)
Allocates an array of count elements of size bytes each and returns a pointer to the first, or NULL on failure. The returned memory is zero-filled. (Added in 1.8.3p3)
void* mush_realloc(void * restrict ptr, size_t size, const char * restrict tag)
Wrapper for realloc(): Allocates, resizes, or frees memory depending on the values of ptr and size. The tag and pointer cannot be the same string. (Added in 1.8.3p4)
char* mush_strdup(const char * restrict string, const char * restrict tag)
Returns a newly allocated copy of a C string (or NULL). The arguments cannot be the same string. Does not work with multibyte character strings.
void mush_free(void * restrict ptr, const char * restrict tag)
Frees memory. The arguments cannot be the same string.

Even if an allocation function returns NULL, it still increases the reference count of that tag.
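
As a minimal sketch of a typical allocate/use/free cycle using the wrappers listed above (struct widget and its tags are made-up names for illustration):

    /* "widget" and its tags are invented for this example. The tag
     * passed to mush_free() must match the one used at allocation. */
    struct widget {
      int id;
      char *name;
    };

    struct widget *
    make_widget(int id, const char *name)
    {
      struct widget *w = mush_malloc(sizeof *w, "widget");
      if (!w)
        return NULL;        /* the tag's count was still incremented */
      w->id = id;
      w->name = mush_strdup(name, "widget.name");
      return w;
    }

    void
    free_widget(struct widget *w)
    {
      mush_free(w->name, "widget.name");
      mush_free(w, "widget");
    }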

If you have no control over the allocation of some memory (such as something returned by a foreign library), you can add a reference for it with void add_check(const char *tag) and decrement it with void del_check(const char *tag). Finally, void list_mem_check(dbref player) displays the current counts to a player, and void log_mem_check(void) writes them to a log file.
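
When wrapping memory handed out by another library, the pattern might look like the sketch below; libfoo and its functions are hypothetical, standing in for any foreign allocator.

    /* Hypothetical: libfoo_compile() returns memory that must later be
     * released with libfoo_release(). It can't go through mush_malloc(),
     * but we can still count it under a tag of our own. */
    foo_pattern *pat = libfoo_compile(pattern_string);
    if (pat)
      add_check("libfoo.pattern");

    /* ... later, when finished with it ... */
    if (pat) {
      libfoo_release(pat);
      del_check("libfoo.pattern");
    }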

See the Source documentation for more on the functions mentioned in the above paragraph.

Other options

Penn comes with a malloc implementation that includes optional memory debugging (CSRImalloc in debug mode; #define MALLOC_PACKAGE 2 in options.h). I've never used it.

(TODO: Links for the things mentioned below)

There are a variety of third-party tools for detecting memory leaks and other allocation-related problems, such as the MallocDebug program and library that come with the Mac OS X developer tools, the Boehm garbage collection library in leak-detection mode, and more.

The Chunk Memory Management System


Synopsis:

The chunk memory management system has three goals: to reduce overall memory consumption, to improve locality of reference, and to allow less-used sections of memory to be paged out to disk. These three goals are accomplished by implementing an allocation management layer on top of malloc(), with significantly reduced overhead and the ability to rearrange allocations to actively control fragmentation and increase locality.

Basic operation:

The managed memory pool is divided into regions of approximately 64K each. Each of these regions contains variable-size chunks representing allocated and available (free) memory. No individual allocation may be larger than will fit in a single region, and no allocation may be smaller than two bytes. Each chunk has between two and four bytes of overhead (indicating the used/free status, the size of the chunk, and the number of dereferences for the chunk), and each region has additional overhead of about 32 bytes.

Allocations are made with the chunk_create() call, which is given the size of the data, the data value to be stored, and an initial dereference count to be assigned to the chunk. Once created, the value of a chunk cannot be changed; the storage is immutable. chunk_create() returns an integral reference value that can be used to retrieve or free the allocation.

Allocations are accessed with the chunk_fetch(), chunk_len(), and chunk_derefs() calls. Each of these is given a reference (as returned by chunk_create()), and chunk_fetch() is additionally given a buffer and length to fill with the allocated value. Both chunk_fetch() and chunk_len() increment a chunk's dereference count (up to the maximum of 255), which is used in migration to improve locality.

Allocations are freed with the chunk_delete() call, which also requires a reference as input.

Finally, allocations are allowed to rearrange themselves with the chunk_migration() call. chunk_migration() takes an array of pointers to chunk references as input, and examines each of the indicated chunks to see which need to be moved to improve the distribution of allocations. If any allocations are moved, then the references to the moved allocations are updated in place (hence the array of pointers to references, instead of just an array of references). Migration may be done incrementally by submitting only a portion of the allocations with each call to chunk_migration(); however, _all_ allocations made with chunk_create() must eventually be submitted for migration in order to maintain the memory pool in a non-fragmented state.
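
A short sketch of that life cycle follows. The exact types and signatures here are guesses based on the descriptions above, not authoritative documentation; consult the chunk header in the source for the real prototypes.

    /* Assumed types and signatures, for illustration only. */
    char value[] = "A brief description";
    char buffer[BUFFER_LEN];          /* BUFFER_LEN: Penn's standard buffer size */
    chunk_reference_t ref;
    chunk_reference_t *to_migrate[1];
    size_t len;

    /* Store the value as an immutable chunk, with an initial deref count of 0. */
    ref = chunk_create(value, strlen(value), 0);

    /* Read it back; this bumps the chunk's dereference count (max 255). */
    len = chunk_fetch(ref, buffer, sizeof buffer);

    /* Offer the chunk up for migration. The allocator may move it and
     * update ref in place, which is why a pointer to ref is passed. */
    to_migrate[0] = &ref;
    chunk_migration(1, to_migrate);

    /* Done with it. */
    chunk_delete(ref);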

Migration:

Under normal conditions, extended use of this chunk allocation system would lead to a significantly fragmented datastore, unless there was some means to defragment the storage arena. In the long run, this could be very bad, leading to quite a mess. Calling chunk_migration() gives the allocator permission to move allocations around both to defragment the arena and to improve locality of reference (by making sure that all the infrequently used chunks are segregated from the chunks in active use). Of course, moving all the allocated chunks at once would be a slow and painful process. Instead, migration may be done incrementally, giving permission to move a small number of chunks at any one time, and spreading out the cost of defragmenting the data store.

Just because you give permission to move a chunk doesn't mean that it will be moved. The chunk may be perfectly happy where it is, with no need to move it elsewhere. Chunks are only moved when their personal happiness would be improved by a move. In general, maximizing the happiness of individual chunks will improve the happiness of the whole.

There are several things that factor into a chunk's happiness. The things that make a chunk unhappy are:

  • Having free space both before and after the chunk in its region.
  • Having only one allocated neighbor (or worse, none). The edges of a region count as allocated neighbors.
  • Having a dereference count different from the region average. The greater the difference, the more unhappy the chunk is.
  • Being in a sparsely populated region. The more free space in a region, the more unhappy the chunks in it.
  • Being away from the other chunks migrated at the same time. If some of the other chunks allowed to migrate are in the same region as a chunk, then it is happier. (This is specifically to improve locality during dumps.)

None of these factors are absolute; all of them have different weights that add into a general unhappiness for the chunk. The lower the unhappiness, the better.

Over time and usage, the dereference counts for chunks will increase and eventually reach the maximum value of 255. (The count is limited by the fact that it's stored in a single byte per chunk.) If this were left unchecked, eventually all chunks would have a dereference count of 255, and the counts would be useless for improving locality. To counteract this, when the average dereference count for a certain number of regions exceeds 128, the 'migration period' is incremented and all chunk dereference counts are halved. The critical number of regions is determined based on the cache size and the total number of regions. In general, period changes should be driven primarily by the frequency of database dumps (which end up incrementing the dereference count on all chunks, and thus all regions). Given a dump frequency of once per hour (the default), there should be a period change about every 2.6 days: after a halving the typical count sits near 64, and roughly 64 more hourly dumps are needed to push the average back past 128.

Statistics:

The chunk memory management system keeps several statistics about the allocation pool, both to maintain good operation through active encouragement of locality, and to satisfy the curiosity of people using the system (and its designer ;-)). These statistics are reported (in PennMUSH) through the use of the @stats command with the /chunks switch.

@stats/chunks generates output similar to this:

Chunks:         99407 allocated (   8372875 bytes,     223808 ( 2%) overhead)
                74413 short     (   1530973 bytes,     148826 ( 9%) overhead)
                24994 medium    (   6841902 bytes,      74982 ( 1%) overhead)
                    0 long      (         0 bytes,          0 ( 0%) overhead)
                  128 free      (   1319349 bytes,      23058 ( 1%) fragmented)
Regions:          147 total,       16 cached
Paging:        158686 out,     158554 in
Storage:      9628500 total (86% saturation)

Period:             1 (   5791834 accesses so far,       1085 chunks at max)
Migration:     245543 moves this period
               145536 slide
                   45 away
                30719 fill exact
                69243 fill inexact

First, the number of allocated chunks is given, along with their total size and overhead. Then the allocated chunks are broken down by size range: short chunks (2 to 63 bytes) with two bytes of overhead each, medium chunks (64 to 8191 bytes) with three bytes of overhead each, and long chunks (8192 to ~64K bytes) with four bytes of overhead each. (These per-chunk overheads account for the figures above: 74413 short chunks at two bytes each is 148826 bytes, and 24994 medium chunks at three bytes each is 74982 bytes.) Rounding out the individual chunk statistics is the number of free chunks, their total size, and the amount of fragmented free space (free space not in the largest free chunk for its region is considered fragmented).

Next come statistics on regions: the number of regions in use and the number held in the memory cache. All regions not in the cache are paged out to disk. Paging statistics follow, listing the number of times a region has been moved out of or into the memory cache. After that, the total amount of storage (in memory or on disk) used is given, along with the saturation rate (where saturation is indicated by what fraction of the used space is actually allocated in chunks).

Finally come statistics on migration and the migration period. The period number is listed, along with the total number of dereferences in the period and how many chunks have the maximum dereference count of 255. Then the amount of migration movement is listed, both in total and broken down by category. Slides occur when an allocation is shifted to the other side of a neighboring free space. Away moves are made when an allocation is extremely unhappy where it is, and is pushed out to somewhere else. Fills are when an allocation is moved in order to fill in a free space; the space can be either exactly filled by the move, or inexactly filled (leaving some remaining free space).

Histograms:

The chunk memory management system can also display a few histograms about itself. These histograms are reported (in PennMUSH) through the use of the @stats command, with the /regions, /freespace, or /paging switches.

All of @stats/regions, @stats/freespace, and @stats/paging produce histograms against the region average dereference count. The histograms use buckets four counts wide, so all regions with averages from 0-3 fall in the first bucket, 4-7 in the second, and so on up to 252-255 in the last bucket. If the heights of the buckets are significantly different, then the highest spikes are allowed to extend off the top of the histogram (with their real values labeled in parentheses next to them).

@stats/regions is a histogram of how many regions at each count currently exist. In a healthy game, there should be a large spike at some dereference count between 64 and 128 (representing the largely unused portion of the database), a lesser spike at 255 (representing the portion of the database that's used very frequently), and a smattering of regions at other counts, with either new areas of the database (below the large spike) or semi-frequently used areas (above the large spike). New migration periods occur when the large spike would pass 128, at which point everything is halved and the spike is pushed back down to 64.

@stats/freespace is a histogram of how much free space exists in regions at each dereference count. This histogram is included to aid in diagnosis of the cause for dropping saturation rates. Ideally, the free space in any bucket should be less than 120K.

@stats/paging is a histogram of the number of regions being paged in or out at each dereference count. As of this writing, a very unhealthy behaviour is observed, wherein the histogram shows a trapezoid between 64 and 128, drowning out most of the rest of the chart. This indicates that as time goes on, the attributes associated with a single low-use object are getting scattered randomly throughout all the low-use regions, and thus when dumps occur (with their linear enumeration of all attributes on objects) the low-use regions thrash in and out of cache. This can be very detrimental to dump performance. Something will have to be done to fix this tendency of migration; the unhappiness based on other chunks being migrated in the same region is the current attempt. Healthy behaviour will make some other pattern in the paging histogram which has not yet been determined.