Diffs, patches, and source code control

javelin, Tue, 2003-01-28 23:06

Sure, you want to add your own hacks to PennMUSH, but you also want to keep up with the patchlevels that the developers release so you get the benefit of bugfixes and new features, right? In this section, we'll look at how that's done, and you'll learn about generating your own patches, applying patches, and more sophisticated mechanisms of source code control.

I'll refer to the distributed PennMUSH source code and patches as the 'distributed branch' or 'dist'. We'll be looking at the situation where you have modified the dist to create your 'local branch' or 'local'. If you start out with the dist at patchlevel n (we'll call that dist[n]), and you add some hacks to it, you've created local[n] -- dist[n] + your local changes. Suddenly, the developers release dist[n+1] and a patch dist[n]->dist[n+1]. Now what?

There are two main strategies to upgrade:

  1. Take your copy of local[n] and apply dist[n]->dist[n+1]. If it succeeds, you've probably got local[n+1]. If not, you'll have to fix the failed chunks by hand.
  2. Get a copy of dist[n] and a copy of local[n] and generate a patch dist[n]->local[n]. This patch essentially contains your changes to dist[n]. Then get a copy of dist[n+1], and apply the patch. If it succeeds, you've probably got local[n+1]. If not, you'll have to fix the failed chunks by hand.
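The second strategy can be sketched with diff and patch, both of which are covered in more detail later in this section. This is only a toy illustration: the directory names (upgrade-demo, dist-n, local-n, dist-n1, local-n1) and the file contents are made up, and a real PennMUSH tree will be much larger.

```shell
# Build three toy trees in a scratch directory (all names are hypothetical).
rm -rf upgrade-demo
mkdir -p upgrade-demo/dist-n/src
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo "/* original line $i */"
done > upgrade-demo/dist-n/src/game.c

# local[n] = dist[n] + your hack (here: line 2 is changed).
cp -R upgrade-demo/dist-n upgrade-demo/local-n
sed 's|original line 2 |my local hack |' upgrade-demo/dist-n/src/game.c \
  > upgrade-demo/local-n/src/game.c

# dist[n+1] = dist[n] + the developers' change (here: line 11 is changed).
cp -R upgrade-demo/dist-n upgrade-demo/dist-n1
sed 's|original line 11 |upstream bugfix |' upgrade-demo/dist-n/src/game.c \
  > upgrade-demo/dist-n1/src/game.c

# Step 1: capture your changes as a patch dist[n] -> local[n].
# diff exits with status 1 when the trees differ, which is expected here.
( cd upgrade-demo && diff -cr dist-n local-n > local.patch ) || [ "$?" -eq 1 ]

# Step 2: apply that patch to a copy of dist[n+1] to get local[n+1].
cp -R upgrade-demo/dist-n1 upgrade-demo/local-n1
( cd upgrade-demo/local-n1 && patch -p1 < ../local.patch )
```

Because the two changes are far apart in the file, the patch applies cleanly and local-n1 ends up with both the local hack and the upstream bugfix. If they had touched the same lines, you would be fixing a failed hunk by hand instead.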

Each approach has pros and cons. The first method doesn't require you to keep many copies of the source code; you just use the patches as they come out. The downside is that if patch hunks fail, you'll have to apply them by hand, and if you didn't back up your source tree, you may not be able to get up and running if there's a serious failure.

The second method requires you to keep two or three source trees, which is both more annoying and safer. On the plus side, if a patch fails, you'll be applying your hack back by hand -- code that you probably know quite well. And if you want to contribute your hack to PennMUSH (or to another user), you've already got a patch.

In general, the first method works best when your local changes are small and well-contained (in fact, if they're all in the *local.c files, the first method is ideal, as those files are never patched directly). When your local changes are extensive, the second method wins.

Even better, you can automate the second method by using source code control software, which we'll discuss later. First, a look at how to make a patch file, and how patches are applied.

Making patches: the diff program

javelin, Tue, 2003-01-28 23:18

The "diff" program (a standard unix utility) can produce a patchfile, given the original and revised source code files. For example, if you revise player.c, and save the older version as player.c.orig, you could make a patchfile like this:

    diff -c player.c.orig player.c > patchfile

The "-c" switch indicates that you want a context diff, which is more detailed than an ordinary diff and better for patches. If you're going to publicly distribute the patch, be sure it's a context diff! (Another diff format, unified diff, is also appropriate. You can use "-u" to get a unified diff. Some people find them easier to read, and some -- like me -- don't.)

The order of the files is important. The patchfile will be written to apply the differences between player.c.orig and player.c so that player.c will be the end result.
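A quick way to check that you got the order right is to look at the first two lines of the patchfile. The sketch below uses a made-up scratch directory (order-demo) and one-line stand-ins for the real files; the old file should appear on the "***" line and the new file on the "---" line.

```shell
# Scratch directory with a stand-in original and revised file.
rm -rf order-demo && mkdir order-demo
printf '%s\n' 'int x = 1;' > order-demo/player.c.orig
printf '%s\n' 'int x = 2;' > order-demo/player.c

# Old file first, new file second. diff exits 1 when the files differ.
( cd order-demo && diff -c player.c.orig player.c > patchfile ) || [ "$?" -eq 1 ]

# The header names the old file after "***" and the new file after "---".
head -n 2 order-demo/patchfile
```

If the two names come out the other way around, the patch would turn your new code back into the old code.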

If there's more than one source file changed, you can do this:

    diff -c player.c.orig player.c > patchfile
    diff -c game.c.orig game.c >> patchfile

You may be able to quickly make a collection of diffs across the whole PennMUSH source tree by using David Cheatham's mkpatch shell script, which is available at http://ftp.pennmush.org/Accessories/mkpatch
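If you keep .orig copies next to every file you change, as in the examples above, a small loop can collect all the diffs for you. This is a minimal sketch and not a substitute for mkpatch; the scratch directory (multi-demo) and the file contents are made up.

```shell
# Scratch directory with two changed files and their saved originals.
rm -rf multi-demo && mkdir multi-demo
printf '%s\n' 'old player code' > multi-demo/player.c.orig
printf '%s\n' 'new player code' > multi-demo/player.c
printf '%s\n' 'old game code' > multi-demo/game.c.orig
printf '%s\n' 'new game code' > multi-demo/game.c

(
  cd multi-demo || exit 2
  : > patchfile                        # start with an empty patchfile
  for orig in *.c.orig; do
    # Strip the .orig suffix to get the revised file's name, then append
    # the context diff. Exit status 1 from diff only means "files differ".
    diff -c "$orig" "${orig%.orig}" >> patchfile || [ "$?" -eq 1 ]
  done
)
```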

Applying patches: the patch program

javelin, Tue, 2003-01-28 23:36

Changes and bugfixes to the MUSH code are often distributed as patches in context diff format. Diffs are files which describe how the source code should be changed. The program "patch" (by Larry Wall, distributed by the GNU project) automatically reads these files and makes the changes to your source code. If you don't have the patch program, ask your system administrator to get it! (A version for Win32 systems is available at http://unxutils.sourceforge.net)

Typically, you can get patches to PennMUSH from the pennmush mailing list or FTP site. To apply a patch, place it in your top-level pennmush directory. Then READ THE PATCH FILE.

PennMUSH patches contain instructions at the top of the file. These instructions can be very important, and can differ from patch to patch. Always read the patch file.

That said, the most common instruction is to apply the patch by typing:

    patch -p1 < patchfile

from your top-level pennmush directory.
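Here is a small sketch of what that command does. The "-p1" switch tells patch to strip the first directory component from the filenames in the patch, so a patch made from outside the source tree can be applied from inside it. The "--dry-run" switch is a GNU patch feature (other versions of patch may not have it) that reports what would happen without changing any files. The directory names (apply-demo, old, new, pennmush) and the file contents are made up.

```shell
# Build an "old" and a "new" tree, and a patch between them.
rm -rf apply-demo && mkdir -p apply-demo/old/src apply-demo/new/src
printf '%s\n' 'a' 'b' 'c' > apply-demo/old/src/bsd.c
printf '%s\n' 'a' 'B' 'c' > apply-demo/new/src/bsd.c
( cd apply-demo && diff -cr old new > patchfile ) || [ "$?" -eq 1 ]

# Your working tree starts out as a copy of the old tree.
cp -R apply-demo/old apply-demo/pennmush
cp apply-demo/patchfile apply-demo/pennmush/patchfile

(
  cd apply-demo/pennmush || exit 2
  # The patch refers to old/src/bsd.c and new/src/bsd.c; -p1 strips the
  # leading "old/" or "new/" so that patch looks for src/bsd.c here.
  # Try it without touching anything first (GNU patch), then apply for real.
  patch -p1 --dry-run < patchfile && patch -p1 < patchfile
)
```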

The patch program is very smart. The changes in the context diff are split into hunks. Each hunk represents a section of code that's been changed (and that's far enough away from any other code changes to be handled separately). A hunk includes information about the line numbers in the original file where the change was made, along with a few lines of context -- lines of code that haven't changed but surround the changed lines. Using this context, patch can apply the changes in the right place even if you've modified other sections of the program.

Sometimes, however, you may have made changes that overlap the changes in a patch hunk. In this case, the hunk will fail to patch because the patch program won't find the appropriate context. When this happens, you need to read the failed hunk yourself and apply the changes by hand. Failed hunks are usually named for the file they were intended for, but end in .rej (e.g. "bsd.c.rej").
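To see a failed hunk for yourself, you can set one up on purpose. In this sketch (the scratch directory reject-demo, the file names, and the contents are all made up), the patch and the local hack both change the same line, so patch cannot find the text it expects and writes the hunk to a .rej file.

```shell
rm -rf reject-demo && mkdir reject-demo
printf '%s\n' 'one' 'two' 'three' 'four' 'five' > reject-demo/bsd.c.dist-old

# The distributed change rewrites line 3 ...
sed 's/^three$/THREE (upstream)/' reject-demo/bsd.c.dist-old \
  > reject-demo/bsd.c.dist-new
( cd reject-demo && diff -c bsd.c.dist-old bsd.c.dist-new > patchfile ) \
  || [ "$?" -eq 1 ]

# ... but your local copy has already changed the same line.
sed 's/^three$/three (local hack)/' reject-demo/bsd.c.dist-old \
  > reject-demo/bsd.c

# patch exits with a non-zero status when a hunk fails.
( cd reject-demo && patch bsd.c < patchfile ) \
  || echo "patch reported a failed hunk"

# The rejected hunk is saved next to the file it was meant for.
ls reject-demo/bsd.c.rej
```

The local file keeps your hack, and bsd.c.rej holds the change you now have to merge by hand.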

Here's how to read a failed hunk (this is based on an email message by T. Alexander Popiel):

  • If the diff begins with the line "Prereq: ", then it means that in the file that follows, the string should be present within the first few lines, or the diff should not be applied. Patch checks for prerequisites to help ensure that you're patching the right version of the source code. For example, PennMUSH patches always start by patching the Patchlevel file, and use the contents of that file as a prerequisite to be sure you don't apply 1.7.7-patch08 to the 1.7.7p4 version by mistake.
  • For each file in the diff, there are two header lines, which look like this:
    *** filename1	date1
    --- filename2	date2

    This indicates that the diff is the difference between filename1 and filename2, and should usually be applied to filename2.

  • After the header lines, the diff will indicate which line numbers in filename1 (the source) are to be examined, what should be changed or deleted, what the resulting line numbers in filename2 (the destination) are, and what should be changed or added:
    • A '-' in the first column of the source part of the patch indicates a line deletion.
    • A '+' in the first column of the destination part of the patch indicates a line addition.
    • A set of '!'s in the first column in both the source and the destination indicate a line-group replacement; a group of consecutive lines in the source are replaced with a corresponding group of lines in the destination.
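The markers described above are easiest to learn by producing a diff that contains all three. In this sketch the scratch directory (hunk-demo) and the file contents are made up: one line is deleted, one is changed, and one is added.

```shell
rm -rf hunk-demo && mkdir hunk-demo
printf '%s\n' 'line1' 'remove me' 'line3' 'line4' 'line5' \
  'old text' 'line7' 'line8' 'line9' > hunk-demo/old.c
printf '%s\n' 'line1' 'line3' 'line4' 'line5' \
  'new text' 'line7' 'line8' 'added line' 'line9' > hunk-demo/new.c

# diff exits 1 when the files differ, which is what we want here.
( cd hunk-demo && diff -c old.c new.c > patchfile ) || [ "$?" -eq 1 ]

# The source part of the hunk shows "- remove me" and "! old text";
# the destination part shows "! new text" and "+ added line".
cat hunk-demo/patchfile
```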

The patch program can also "reverse" a patch, applying it backward to turn the new version back into the old version. To reverse a patch, feed it to the patch program with the "-R" switch.
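A round trip shows the effect of "-R". In this sketch (the scratch directory reverse-demo and the file names are made up), a patch is applied to a working copy and then reversed, leaving the working copy identical to the original.

```shell
rm -rf reverse-demo && mkdir reverse-demo
printf '%s\n' 'alpha' 'beta' 'gamma' > reverse-demo/old.c
printf '%s\n' 'alpha' 'BETA' 'gamma' > reverse-demo/new.c
( cd reverse-demo && diff -c old.c new.c > patchfile ) || [ "$?" -eq 1 ]

# Apply the patch to a working copy of the old file ...
cp reverse-demo/old.c reverse-demo/work.c
( cd reverse-demo && patch work.c < patchfile )
cmp reverse-demo/work.c reverse-demo/new.c    # work.c now matches new.c

# ... then back it out again with -R.
( cd reverse-demo && patch -R work.c < patchfile )
cmp reverse-demo/work.c reverse-demo/old.c    # work.c is back to old.c
```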

Source code control

javelin, Tue, 2003-01-28 23:51

You can go a long way with diff and patch, but if you're making serious changes, you need more powerful tools. If you hack at the PennMUSH source code long enough, you will eventually make a mistake, and want to go back to an earlier version of your work. Or you'll make a really good but extensive change, and want to keep up with patchlevels or produce your own patchfiles to distribute it to others.

You can make your life a lot easier with some form of "source code control" or "revision management" software. Here are some common revision management approaches:

  1. Backups. Make a directory alongside your pennmush directory called "oldpenn", and put a copy of your source code into it. When you make changes, you can recover your old files from oldpenn, use it to produce patches, and eventually copy your new files into oldpenn when you're sure they work. Many people keep a "clean" source directory containing the original dist code of their version, in case they need it. Of course, if you need to go back more than one revision, you're in trouble unless you clutter your disk with many many oldpenn directories. A variant of this strategy involves storing older versions as compressed tar files.
  2. SCCS. SCCS (source code control system) is a more sophisticated way to manage source code. It stores changes from version to version in a subdirectory. You "check out" files to work on them, and "check in" files that you've hacked. You can revert to any revision at any time. This is good. Many major unix systems (Ultrix, SunOS, HP-UX) come with sccs installed. Read the man pages for info.
  3. RCS. RCS (revision control system) is the GNU project's free replacement for SCCS, available from http://www.cvshome.org. The commands are different from SCCS, and some things are easier to do. RCS is standard with Linux. RCS can be used to ease upgrading to a new patchlevel by preserving your hacks to the older patchlevel.
  4. CVS. CVS (also from GNU) is the "concurrent version system". It uses RCS to store revisions, but provides a higher-level concept of a project version (rather than just single files) and has better support for multiple programmers concurrently changing files (including making changes to the same file). Finally, CVS repositories can be made accessible over the Internet.
  5. prcs. prcs (project revision control system) is the PennMUSH devteam's current favorite. It's available from http://prcs.sourceforge.net. Like CVS, prcs uses RCS, but provides a high-level concept of a project rather than individual files. It also has excellent support for automatically merging code changes (such as new PennMUSH releases) into your locally modified version.
  6. Subversion. Subversion (svn) is a version control system built with the intention of being a compelling replacement for CVS. It's available from http://subversion.tigris.org under an Apache/BSD-style license. It includes many of the key features of CVS, including the higher-level concept of projects rather than individual files. It also tracks meta-data for directories, renames, and files, has truly atomic commits, and offers cheap, easy branching and tagging. Repositories are accessible locally, through http(s) with WebDAV/DeltaV, and via its own svn server protocol for easy remote usage.
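The simplest approach in the list, backups stored as compressed tar files, might look something like the sketch below. The scratch directory (backup-demo), the archive naming scheme (oldpenn- plus a timestamp), and the restore directory are all my own choices for illustration; use whatever layout suits you.

```shell
# A stand-in for your hacked source tree.
rm -rf backup-demo && mkdir -p backup-demo/pennmush/src
echo '/* hacked source */' > backup-demo/pennmush/src/local.c

# Save a timestamped, compressed snapshot alongside the source directory.
stamp=$(date +%Y%m%d-%H%M%S)
( cd backup-demo && tar -czf "oldpenn-$stamp.tar.gz" pennmush )

# List what the snapshot contains.
tar -tzf "backup-demo/oldpenn-$stamp.tar.gz"

# To recover old files, unpack the snapshot somewhere out of the way.
mkdir backup-demo/restore
tar -xzf "backup-demo/oldpenn-$stamp.tar.gz" -C backup-demo/restore
```

Unpacking into a separate directory means you can diff the restored tree against your current one before deciding what to copy back.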

If you choose to use a source code control system (and I can't recommend it highly enough), discipline yourself to always check in code after each revision, so that you can undo each step. If you have multiple people hacking (especially from different accounts on the machine), you can take advantage of the fact that RCS and SCCS will "lock" revisions so that only the person who checked it out can modify it and check it back in, preventing two people from making inconsistent changes. Or use CVS or prcs, which allow (and expect) multiple people to change things at once, and try to help deal with possibly conflicting changes.

If you can't decide what software to use, CVS has a very large userbase who can probably be helpful, but prcs has the PennMUSH devteam who can advise you. Your call.

When you get your first pennmush distribution, check in the entire source directory. With prcs, that's:

    prcs checkout pennmush
    prcs populate
    prcs checkin

If you make some changes and then want to produce a diff of your changes:

    prcs diff > patchfile

makes a diff from the last checked-in revision to the current version of the project. If you read the man page for prcs, you'll see that it can also make diffs between checked-in revisions, for single files, etc.

prcs supports the notion of multiple branches. You can store a branch that tracks the distributed PennMUSH source code, and a second branch that tracks your locally hacked code. You can then produce diffs between them at any time using prcs diff, or merge changes to the dist into your local code using prcs merge. See the man page for details.

#ifdef and #define

javelin, Tue, 2003-01-28 23:56

You can save yourself a lot of hassle if you're careful in how you hack the PennMUSH code. When you decide to add new code, or change old code, add a #define into options.h which will turn your code change on or off. For example, if you're adding a new feature to change the WHO format, put something like this into options.h:

    /* If defined, the WHO commands will use a new format */
    #define NEWWHO

Then, surround your additions with #ifdef NEWWHO...#endif pairs. For changes, use #ifdef NEWWHO...#else...#endif. For deletions from the original code, use #ifndef NEWWHO...#endif around the code to delete:

    #ifdef NEWWHO
      this is code that you've added
    #endif
    ...
    #ifdef NEWWHO
      this is code you've changed
    #else
      this is the original code
    #endif
    ...
    #ifndef NEWWHO
      this is original code you want deleted
    #endif

This allows you to preserve the original PennMUSH coding, should you ever need to refer back to it (if, for example, you're trying to apply someone else's patch to something you've already changed), and allows you to turn on and off your feature as necessary.