Lag

javelin, Thu, 2004-03-18 13:45

One of the most pernicious problems common to MUSHes is lag, the condition in which the MUSH feels slow in responding to player input. There are a number of things which contribute to lag, and once you identify the culprit, you can decide if you can improve the situation.

Network lag

Network lag is caused by difficulties in the network connection between the player and the MUSH host machine. For example, a router on the internet between the two might be dropping packets, or a segment of the network might be overloaded with packets.

The characteristic property of netlag is that you won't experience it if you're connecting to the MUSH from the MUSH machine itself. If you can get a lagless connection by doing a 'telnet localhost <port>' on the MUSH host (where <port> is the port your MUSH listens on) while remote connections are lagging, netlag is responsible.

If your host has the ping command (most do), you can test how long it takes a packet to travel between your host and another machine, and try to identify how slow things are going to be. If you happen to have the traceroute command, you can see exactly where (between which routers) the network is lagged.
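If you don't have ping handy, you can get a rough feel for the same thing by timing how long it takes to open a connection to the MUSH. Here's a minimal Python sketch of that idea; the function name connect_time is made up for illustration, and the number it gives includes any hostname lookup, so treat it as an approximation rather than a true packet round-trip time.

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection to host:port.

    Raises OSError (including a timeout) if the connection fails.
    The measured time includes name resolution for host.
    """
    start = time.monotonic()
    # create_connection resolves the name and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start
```

Run it once on the MUSH host against localhost and once from a player's machine against the MUSH's hostname, using your MUSH's port both times. If the remote figure is much larger than the local one, the network is the likely culprit.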

Unless you happen to be a network administrator of the problem stretch of network (or the problem is just the local connection to your machine), there's not much you can do. If the problem is your particular net connection, you can probably spend more money and get one with a higher bandwidth, or reduce other things your machine does that require the net (email, etc.), but neither of these is usually worth it for a MUSH. :(

DNS and IDENT lag

Another network-related source of lag involves domain name service lookups. When a player connects to the MUSH, the MUSH knows the player's IP address, and queries the DNS to get the player's hostname. This is called a "reverse hostname lookup".

Some hosts, however, have very slow nameservers. Sometimes this is because the nameservers are behind slow internet connections with heavy traffic. The MUSH stops while a reverse hostname lookup is going on, so if it takes more than a second, you will feel lag.

If you've got your MUSH configured to use IDENT lookups, the same kind of problem applies. In addition, IDENT lookups of non-Unix systems can hang until the lookup times out.

There are a few ways you might deal with this problem:

  • #define INFO_SLAVE in options.h. This creates a separate process that handles lookups, so your MUSH won't have to. This doesn't work well on win32 systems.
  • Turn off ident or reduce the ident_timeout value in mush.cnf.
  • If you don't need hostnames, set use_dns to "no" in mush.cnf and no reverse hostname lookups will be performed.
  • If there are particular sites that are causing you trouble, and if you have access to your system's hosts file (/etc/hosts on most Unix systems), you could try making an entry for the troublesome system in the hosts file. Many DNS setups will check the hosts file before asking the nameserver.
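Before changing any of these settings, it can help to find out how slow your reverse lookups actually are. The following is a small Python sketch, not anything PennMUSH itself provides: time_reverse_lookup is a hypothetical helper that times one reverse hostname lookup using the host's normal resolver. The resolver argument exists only so you can swap in a stand-in when experimenting.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, seconds) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        hostname = None
    return hostname, time.monotonic() - start
```

Call it from the MUSH host with the IP address of a player whose connections seem to stall the game. Lookups that regularly take a second or more are the kind you'll feel as lag.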

CPU lag

CPU lag occurs when the MUSH machine has to split its time among many tasks, or among tasks that need a lot of CPU time. If your MUSH is on a machine which has a lot of users, this is more likely. If the users are programmers who run things like compilers regularly, this becomes much more likely.

You can examine the CPU load in a few ways. The uptime command will display an interesting, if non-objective, statistic called "load average", measured over the last minute, 5 minutes, and 15 minutes. If you know what typical load average looks like, you'll be able to recognize abnormally high load. Loads over 3, especially in the 5/15 minute entries, tend to make for a slow game.
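If you'd rather check the load average from a script than eyeball uptime, Python exposes the same three numbers through os.getloadavg() on Unix systems. The sketch below applies the rule of thumb above; the function names are invented for illustration, and looking only at the 5 and 15 minute figures is a simplification of "especially in the 5/15 minute entries".

```python
import os


def looks_overloaded(loads, threshold=3.0):
    """True if the 5- or 15-minute load average exceeds threshold.

    loads is a (1min, 5min, 15min) tuple, as returned by os.getloadavg().
    """
    _one, five, fifteen = loads
    return five > threshold or fifteen > threshold


def host_looks_overloaded(threshold=3.0):
    """Apply looks_overloaded to this machine's current load averages."""
    return looks_overloaded(os.getloadavg(), threshold)
```

The threshold of 3 is only a starting point. What counts as a high load depends on the machine, so adjust it once you know what normal looks like on yours.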

But what's causing the load? Here you can use ps -auxw (BSD) or ps -elf (SysV) to see all the running processes and how much CPU time they're getting at that moment. This static picture can be deceiving, but is a good start. Read the man page for details.
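Reading a long ps listing by eye gets tedious, so here's a small Python sketch that picks the busiest processes out of ps output. It assumes you've captured text in the shape produced by 'ps -A -o pid,pcpu,comm' (a header line, then one process per line), which isn't one of the invocations mentioned above; top_cpu is a hypothetical helper name.

```python
def top_cpu(ps_output, count=5):
    """Return the count busiest (pid, pcpu, command) rows from ps output.

    Expects text shaped like `ps -A -o pid,pcpu,comm`: one header line,
    then one process per line.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, pcpu, command = parts
        try:
            rows.append((int(pid), float(pcpu), command.strip()))
        except ValueError:
            continue  # skip rows whose numeric columns don't parse
    # Highest CPU percentage first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:count]
```

Like ps itself, this gives you a single snapshot, so run it a few times before blaming any one process.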

If you're responsible for the load due to your own compiling and such (or if you need to decrease the CPU load your MUSH puts on the machine), read the man page for the nice and renice commands, which let you tell the system that your compilation (or MUSH process) should be nice about using CPU, and the CPU should give it lower priority.
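The same idea is available from inside a program. As a rough Python illustration (not a replacement for reading the man pages), os.nice() raises the calling process's niceness, which is what starting it under nice would do; lower_priority is just an illustrative wrapper name.

```python
import os


def lower_priority(increment=5):
    """Raise this process's niceness by increment; return the new niceness.

    A higher niceness means a lower scheduling priority. Ordinary users
    can generally only increase niceness, not decrease it again.
    """
    return os.nice(increment)
```

A script that calls lower_priority() before kicking off a long compile will pass the lowered priority on to the processes it starts.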

Disk swapping

Unix systems have a limited amount of memory in which to run their programs. Memory is also expensive. So Unix systems use a part of the disk as "virtual memory" or "swap space". Processes send parts of their memory that haven't been accessed in a while off to virtual memory, a process called paging. Paging helps make all the programs work together gracefully.

MUSHes can be pretty big programs, though, and sometimes another program (some compilers and editors, as well as statistical software and other such things, for example) has to be granted more memory than can be recovered even with paging. In these situations, the whole MUSH process may be "swapped out" to the disk, temporarily put on hold until it can be swapped back into memory. While the MUSH is swapped out, the game is frozen, and if it stays swapped long enough and often enough, you experience lag.

Dealing with swapping and paging is beyond the scope of this guide. Read the man pages for ps, vmstat, pstat, and iostat, or pick up a book on Unix System Administration or Performance Tuning (say, the O'Reilly Handbooks, which are great) if you're the system administrator of your MUSH machine. If not, purge unused objects from your database and hope.

Because PennMUSH now performs its own swapping of certain parts of memory out to disk (attribute values, locks, and mail texts), the operating system is less likely to need to page or swap the MUSH process. On the other hand, because PennMUSH is itself using disk, the speed of disk I/O is still important. On a machine with lots of memory and slow disks, turning off Penn's own chunk swapping may be a good thing, and is done by setting the 'chunk_cache_memory' option in mush.cnf to a very high number (2000000000 is recommended).

Queue lag

Queue lag occurs when the MUSH's queue becomes clogged, and the MUSH can no longer keep up with servicing the queue and player input. Players get priority, so keyboard response will be good, but if they try to use $commands, which go on the object queue, they won't get a response for some time.

@kick can be used as a temporary fix for this, though see the concerns above in the section on Wizcommands. A more permanent solution might be to adjust the values of queue_chunk and (especially) active_queue_chunk in the mush.cnf file. This will make keyboard response slightly worse, but will usually fix the queue clogging.
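To see why a clogged queue never clears on its own, and why running more queued commands per cycle helps, here's a toy model in Python. It is only an illustration under simplifying assumptions: it is not PennMUSH's actual scheduler, and its "chunk" parameter is a stand-in for the general idea of running more queue entries per cycle, not a faithful model of queue_chunk or active_queue_chunk.

```python
def backlog_after(ticks, backlog, arrivals_per_tick, chunk):
    """Return the queue length after the given number of ticks.

    Toy model: each tick, arrivals_per_tick commands are queued and up
    to chunk commands are run. This is not PennMUSH's real scheduler.
    """
    for _ in range(ticks):
        backlog += arrivals_per_tick
        backlog -= min(backlog, chunk)  # can't run more than is queued
    return backlog
```

In this model, a queue that receives 5 commands per tick but runs only 3 grows without limit, while the same queue running 15 per tick drains steadily. That's the trade you're making when you raise the chunk values: more time spent on the queue, a little less on keyboard input.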