Data structures

Data structures raevnos Thu, 2007-05-24 16:27

The pages in this chapter describe the various data structures used in Penn. They include hash tables, shared strings, unique-prefix lookup tables, and more.

Hash tables

Hash tables raevnos Mon, 2007-06-04 22:26

Hash tables are a (usually) fast lookup function for (string key, value) tuples. They are used for things like looking up softcode functions. They support lookup of exact key names, and iteration through all elements in a random order.

API

To use hash tables, #include "htab.h" if it isn't already. Hash tables can then be declared with the HASHTAB type.

For example:

HASHTAB htab_sample;

Before using a hash table, it must be initialized with hash_init. This takes three arguments: A pointer to the hash table, an initial size of the table, and a pointer to a function that is passed the data stored in an entry when it's deleted (Or NULL). hashinit() is a macro that passes a NULL cleanup function and only takes the first two arguments.

Example:
hash_init(&htab_sample, 256, NULL);

When done with a hashtable, call hash_free() on it.

Once you've initialized a table, you can add elements to it with hashadd(). It takes three arguments: The string to use as a key, a pointer to the data to store, and a pointer to the table. The key is copied; the data pointer is stored untouched. It is this that's passed to a cleanup function if given in hash_init().

Example:
hashadd("foo", &some_value, &htab_sample);.

You can look up elements with hashfind(). Its arguments are the key to look up and a pointer to the table. It returns the data pointer for the associated key, or NULL if the key was not found. Hence, it's not a good idea to use a NULL data pointer.

Example:
if (hashfind("foo", &htab_sample)) { item_present(); }

Delete elements from the table with hashdelete(), which takes the key and pointer to table arguments just like hashadd().

Iterate through all data pointers in the hash table with hash_firstentry()/hash_nextentry(), which returns the first data pointer, followed by every other data pointer, returning NULL after the last. hash_firstentry_key()/hash_nextentry_key() does the same thing for the keys; you cannot currently (1.8.3p3) get both key and data in one iteration; this might change in the future.

Internals

Hash tables use cuckoo hashing, which produces very dense memory-saving tables. They strongly favor usages where there are few insertions (Which can potentially be very costly), and many lookups (Which are very fast and cheap).

Perfect Hash Tables

Perfect Hash Tables raevnos Sat, 2011-04-16 02:04

For cases where we have to check to see if a string is one of a fixed set of values that will never change while a MUSH is running, Penn uses a tool called gperf to generate lookup functions that use a perfect hash table to quickly and efficiently check for a match.

For details on the format of a gperf file, see its documentation.

To include a gperf-generated table in Penn:

  1. Create a new foo.gperf gperf file. See the existing ones for templates.
  2. Look for the existing gperf entries in src/Makefile.in and add a new one following their template to generate foo.c.
  3. Run ./config.status to rebuild src/Makefile.
  4. In the source file that uses the lookup function, #include "foo.c"

Examples can be seen in src/funmath.c and src/markup.c, among others.

Integer Maps

Integer Maps raevnos Thu, 2008-01-31 14:53

Integer Maps map unsigned 32-bit integer keys to arbitrary pointers. They were added in 1.8.3p7. They're used in a number of places; for example, in queue entry process id related code (src/cque.c), and in looking up the structure representing a connected socket via its descriptor number (src/bsd.c).

API

To use integer maps, you have to #include "intmap.h". The type of an integer map is a intmap *. The type is opaque; you cannot directly access elements of the structure.

Basic functions

intmap * im_new(void)
Return a new, empty intmap structure.
bool im_insert(intmap *, uint32_t, void *)
Insert a new key,value pair into the map. Returns true on success, false on failure (Usually because of a duplicate key).
void * im_find(intmap *, uint32_t)
Return the pointer associated with the given key, or NULL if the key isn't present.
bool im_exists(intmap *, uint32_t)
Returns true if a key is present in the map.
int64_t im_count(intmap *)
Returns the number of keys stored in the map.
bool im_delete(intmap *, uint32_t)
Removes a mapping from the map. Returns true on success, false on failure (If the key didn't exist.) Deletion of memory allocated in the data pointer is left up to the user.

Less common functions

void im_destroy(intmap *)
Delete an entire intmap. The intmap passed to this function becomes unusable. Memory allocated in the data pointers must be freed by the user.
void im_dump_graph(intmap *, const char *)
Dump a representation of the map to the given filename. The dump is in the dot language, which can be rendered by tools in the graphviz package, specifically dot(1). Useful for debugging the internal tree structure.

Implementation

Integer maps use a slight modification of radix trees (AKA patricia trees), a self-balancing binary tree where branching is based on the status of particular bits in the key.

Prefix tables

Prefix tables raevnos Thu, 2007-05-24 17:03

Prefix tables, aka ptabs, are a (string key,value) lookup structure used to find the entry for a name that is the unique prefix of a key in the table. For example, they're used by @set for flag names so that you don't have to get the full name of the flag.

API

To use a ptab, #include "ptab.h" if it isn't already in the source file. Prefix tables then can be declare using the PTAB type.

For example:
PTAB ptab_sample;

Before using any other functions on a new ptab, it must be initialized, with ptab_init(), which takes a pointer to a PTAB as its argument. Unless otherwise mentioned, all ptab functions take a pointer to a PTAB as their first argument.

Example:
ptab_init(&ptab_sample);

When done with the table, call ptab_free() to release allocated memory:

ptab_free(&ptab_sample);

Before looking up anything, you have to add (key,value) tuples to the table. There are three steps involved in this. First, you have to call ptab_start_inserts(), then add new entries with ptab_insert(ptab, key, data). Once done inserting items, call ptab_end_inserts(). You can only insert items between a start/end; lookups are disallowed.

Example:


ptab_start_inserts(&ptab_sample);
for (n = 0; n < initarray_size; n++) {
ptab_insert(&ptab_sample, initarray[n].key,
initarray[n].value);
}
ptab_end_inserts(&ptab_sample);

With version 1.8.3p3 and later, you can use ptab_insert_one() to insert a single new item. It's better to use ptab_insert() when inserting multiple entries at one time.

Example:
ptab_insert_one(&ptab_sample, "I AM A UNIQUE PREFIX", &stdin);

You can look up an entry with ptab_find(). If an entry matching the given key is found, that element's data pointer is returned. If it's not found or there are more than one keys with the given lookup key as a prefix, NULL is returned. Note that this means that if you use a NULL data pointer, there is no way to tell it apart from a failed lookup.

Example:
data = ptab_find(&ptab_sample, "I AM A");

You can also do a lookup testing for an exact match only with ptab_find_exact():

data = ptab_find_exact(&ptab_sample, "I AM A UNIQUE PREFIX");

Delete entries from a table with ptab_delete(). The key must be an exact match, and no cleanup is done of the data pointer. If it needs to be freed, that has to be done first.

Example:
ptab_delete(&ptab_sample, "I AM A UNIQUE PREFIX");

Finally, you can iterate through all entries in a table with the ptab_firstentry_new() and ptab_nextentry_new() functions. The former returns the data of the first element, and the latter subsequent data. It returns NULL once all elements have been returned. The second argument to each, if non-NULL, must be large enough to hold the longest key in the table. If you don't care about the key, use ptab_firstentry() and ptab_nextentry().

Internals
Internally, ptabs are implemented as a sorted array. Exact lookups are done with a binary search and thus have a O(log N) performance. Prefix lookups use a modified binary search that finds the first entry with a prefix of the lookup key and checks the elements before and after. to see if it's unique.

When new entries are added, the key (A string) is copied. The calling function can safely free the memory used as the key argument if needed. The data pointer is merely stored.

ptab_start_inserts() marks the table as being unsorted. ptab_insert() adds new elements to the end of the array, and ptab_end_inserts() sorts it again. ptab_insert_one() does a single insertion into the proper place in the table.

See the documentation of ptab.c for more information.

Shared strings

Shared strings raevnos Thu, 2007-05-24 18:28

Shared string trees, aka StrTrees or just string trees, allow you to cache frequently-repeated strings instead of having dozens or hundreds of copies of the same one. They are used for things like attribute and object names.

API

Use string trees by putting a #include "strtree.h" in your source if it isn't there already. String tree variables have the type StrTree. Unless otherwise mentioned,all string tree functions take a pointer to a StrTree as their last argument.

StrTree st_sample;

Before using one, it must be initialized by calling st_init().

st_init(&st_sample);

New strings are added to the tree with st_insert(). The string that it returns is the persistent one that should be saved by you.

Example:
mystruct->name = st_insert("FOO", &st_sample);

When done with a particular copy of a shared string, call st_delete().

Example:
st_delete(mystruct->name, &st_sample);

To test if a given string is in the tree, use st_find(). It returns a pointer to the shared string and NULL if not. This pointer should not be saved beyond the immediate function it's used in. Use st_insert() for that.

Implementation

The string tree is a red-black tree (Hence the name) of reference-counted strings. Every time a string is inserted, its count goes up, and when it's deleted, the count goes down. A string is removed from the tree when its count goes to 0. If a string's reference count goes to 127, it becomes a permanent entry and will never be deleted.

See the the documentation of strtree.c for more information.