Code tip: Data Factory

Data handling is one of the most awkward things in MUSH. You want your data to be compact, so you want to try to avoid splattering it across a zillion individual attributes. But you also need your data model to be flexible, so that you can add fields to your data structure over time. If you shove the entirety of a data structure into a list, you can often end up with code that's hard to write and debug, because you're constantly trying to find and edit elements embedded within that list.

My belief is that one of the reasons people find MUSHcode extremely time-consuming to write, as well as hard to maintain, is that their data models, and the way they handle, store, and manipulate data, simply aren't very good. Moreover, it is incredibly easy to write obfuscated MUSHcode.

My solution to this is a layer of what I call Data Factory code. What follows is an explanation plus the code for it.

The Data Factory Concept

At minimum, every data entity consists of a data structure type, its ID number (specific to type), the dbref of the object that it is stored on, the structure version number, an owner, the last object to edit it, and the last edit time. Every data entity also contains a set of named fields, specific to its type (and to the version number of that type).

So, for instance, I can have a data structure type called "ship". Version 1 of the ship structure might just contain "xpos ypos" as its two named additional elements, indicating the ship's coordinates. Version 2 might be "xpos ypos homeport"; using this system, I can make that modification and have the data migration to the new format be automatic and invisible.

In addition to a list of named fields, data structures can be customized with a set of labels for those fields (used for the default print function for that type), and a set of defaults to populate in those fields when a new instance of that type is created. I can also set a maximum number of IDs to store on a data object for entities of that type. If this is set, the MUSH automatically creates an additional data object every time that maximum is exceeded.

Every time I create or load a data entity, I can give it an arbitrary identifier, like "myship". All fields within the entity are accessed as identifier.field -- so, for instance, if identifier "myship" is of type "ship", then the global register myship.homeport is the value of its homeport field.

My generic MUSHcode SAVE_FN takes an identifier and saves it. Since the fields are just global registers, you can change them with setq() and friends, and the save will write them out automatically; since the entity carries its ID and datastore object on it as part of its fields, the programmer does not have to worry about the details. Similarly, when you call LOAD_FN, you just need to provide a type and the ID; it knows how to derive the data object it's stored on.

Because I'm working with named global registers, and each such register is very clearly associated with a particular entity (attacker.weapon is obviously different than defender.weapon, say), making any changes to data is just a matter of writing to a register, then saving the whole blob. It saves me a gigantic amount of time, and the resulting code is a lot easier to read, too.

So, to take the ship example, if I wanted to create a new ship, owned by me (the enactor), give it initial coordinates of (4201, 6250) and a home port of Mars, and be able to readily refer to it as 'newship' in the remainder of this block of code, I'd do this sequence of functions (the setq calls can be combined if desired):

[u(NEW_FN, %#, newship, ship)]
[setq(newship.xpos, 4201)]
[setq(newship.ypos, 6250)]
[setq(newship.homeport, Mars)]
[u(SAVE_FN, newship)]

I can access the newly-saved ship's ID with r(newship.id), and that number is how I access that data later. So at some future point, if I know that this is ship ID 123, I could do [u(LOAD_FN, oldship, ship, 123)] to load it with the identifier "oldship", allowing me to access the data with r(oldship.xpos), and so on. If I want to save any modifications, it's as easy as calling [u(SAVE_FN, oldship)].
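
Since identifiers are just register-name prefixes, two entities can be held at once without their fields colliding. As a small sketch (the IDs 123 and 456 and the identifiers "flagship" and "escort" are made up for illustration, and both IDs are assumed to be saved ships), this copies one ship's home port onto another:

[u(LOAD_FN, flagship, ship, 123)]
[u(LOAD_FN, escort, ship, 456)]
[setq(escort.homeport, r(flagship.homeport))]
[u(SAVE_FN, escort)]

Only the escort is saved here, because only its fields were changed; the flagship was merely read.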

This kind of technique can not only save you a lot of coding time, but it can also make your code much more readable.

The Data Factory Code

This code is for TinyMUSH 3.1, but can be adapted for any current MUSH flavor. It's intended to be quoted through Adam Dray's Unformat. (There are places where %q substitutions have been done with r() instead, to make this easier to post in HTML format.)

If you make heavy use of this, instantiating lots of different entities in a single pass, you may need to bump up the number of global registers that an object is allowed to use.

For convenience, let's start by defining a couple of global named references, which allow us to assign names to dbrefs. #67 and #68 just happen to be the dbrefs on my database; substitute your own objects. If your codebase doesn't support nrefs, just use normal dbrefs; the references simply let us use #__factory and #__meta to refer to the objects.

@reference _factory = #67
@reference _meta = #68

#__factory is the object that's going to contain our data factory code, along with our data definitions. #__meta is for our metadata; it's the object that is going to contain all the actual runtime data, like where the actual entity data is stored, and how many IDs of a particular type we've created. We're also going to use the metadata object as a storage container for the data objects. Both of these objects should be flagged HALT and SAFE.

We'll begin with the function that creates a brand-new entity. Every entity has an ID number, which is permanent; this is distinct from its identifier, which is the handle for the instance. We can normally allow the function to simply take care of where to store the data, but it can take an optional final parameter naming the data object to use. It will also populate the new entity with the defaults for that type.

# Call as: u(NEW_FN, owner_dbref, identifier, type)
# Returns: Nothing.
# Side-effects:
#   - Sets identifier.various_fields registers to type defaults.
#   - Increments the top ID of the type.
#   - May create a new data object, if we've exceeded a max-IDs breakpoint.
#
# %0 - owning player, %1 - string identifier
# %2 - data structure type, %3 - data object (optional)
#
&NEW_FN #__factory=
[setq(%1.owner,%0)]
[setq(%1.editor,#-1)]
[setq(%1.etime,secs())]
[setq(%1.type,%2)]
[setq(%1.obj,usetrue(%3,last(setr(lo,get(#__meta/%2_OBJ)))))]
[setq(%1.id,inc(get(#__meta/%2_TOP)))]
[set(#__meta,%2_TOP:[r(%1.id)])]
[nonzero(cand(notbool(%3),
              setr(ld,v(%2_MAX)),
              gte(r(%1.id),r(ld)),
              eq(mod(r(%1.id),r(ld)),0)
         ),
  [setq(%1.obj,create([capstr(lcstr(%2))] Data [inc(words(r(lo)))],10))]
  [set(#__meta,%2_OBJ:[r(lo)] [r(%1.obj)])]
  [set(r(%1.obj),halt)][set(r(%1.obj),safe)][tel(r(%1.obj),#__meta)]
)]
[qvars(iter(v(%2_DATA),%1.##),v(%2_DEF),`)]
-

Creating an entity doesn't actually save it permanently; the assumption is that you'll create, alter the fields as need be, and then save it. So our next thing is our save function, which we call with the entity's identifier, and the dbref of the player (or object) that we want to note is responsible for the change.

# Call as: u(SAVE_FN, identifier, editor_dbref)
# Returns: Nothing.
# Side-effects: Saves the entity to an object, as attr type_ID
#
# %0 - identifier, %1 - editor
#
&SAVE_FN #__factory=
[case(,
  r(%0.obj),#-1 NO OBJECT,
  r(%0.id),#-1 NO ID,
  set(r(%0.obj),
    [r(%0.type)]_[r(%0.id)]:
      [default([r(%0.type)]_V,1)]`[r(%0.owner)]`[usetrue(%1,%#)]`[secs()]`
      [iter([v([r(%0.type)]_DATA)],edit(r(%0.##),`,'),,`)]
  )
)]
-

As a word of warning: because the backtick ` is used to separate data fields, no field value may contain one when it is saved. At the moment, the code takes care of this for you by replacing ` with ' at save time. If you would rather handle that yourself, replace the line:

[iter([v([r(%0.type)]_DATA)],edit(r(%0.##),`,'),,`)]

with:

[iter([v([r(%0.type)]_DATA)],r(%0.##),,`)]

and keep your data clean by checking it before saving it.
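
If you do take the simpler line, one way to keep data clean is to strip backticks at the point where user input enters a register, using the same edit() call the stock SAVE_FN uses. A minimal sketch, assuming %0 holds a user-supplied ship name and "this" is an already-created ship entity:

[setq(this.name,edit(%0,`,'))]
[u(SAVE_FN,this,%#)]

The trade-off is that every place that writes user input into a field has to remember to do this, whereas the stock SAVE_FN line catches everything in one spot.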

Now that we can save data, we need to be able to load it. We call this with the identifier we want to associate with this instance of the entity, the entity's type, and the entity's ID. We can normally allow it to just figure out what data object to read it from, but we can also specify it with an optional final parameter. Our load function is also able to automatically migrate data in an old format to the current version of that type. (Note that when we update the data format, we need to keep the attribute type_DATA_version on #__factory in order to know how to load that previous version.)

# Call as: u(LOAD_FN, identifier, type, ID)
# Returns: Nothing.
# Side-effects:
#   - Success: Sets identifier.various_fields registers to data.
#   - Failure: Sets identifier.various_fields registers to null.
#
# %0 - string identifier, %1 - data structure type, %2 - ID, %3 - data object
#
&LOAD_FN #__factory=
[nonzero(neq(words(setr(lo,usetrue(%3,get(#__meta/%1_OBJ)))),1),
  setq(lo,extract(r(lo),inc(div(%2,v(%1_MAX))),1))
)]
[nonzero(setr(ld,get(r(lo)/%1_%2)),
  nonzero(qvars(iter(v owner editor etime [v(%1_DATA)],%0.##),r(ld),`),
    /@@ read failed - wrong data version - upgrade automatically @@/
    [qvars(iter(v(%1_DATA),%0.##),v(%1_DEF),`)]
    [qvars(iter(v owner editor etime [v(%1_DATA_[first(r(ld),`)])],%0.##),
           r(ld),`
    )]
  ),
  /@@ no data, return empty @@/
  [setq(%0.v,-1)][setq(%0.owner,#-1)][setq(%0.editor,#-1)][setq(%0.etime,-1)]
  [null(iter(v(%1_DATA),setq(%0.##,)))]
)]
[setq(%0.type,%1)][setq(%0.id,%2)][setq(%0.obj,r(lo))]
-

It's useful to have a wrapper function that does a load, and tells us whether the load succeeded or not. So we make a function that returns 0 or 1, indicating failure and success. We'll probably rarely call LOAD_FN directly, since we usually care about knowing whether or not we have an error to handle.

# Call as: u(OK_LOAD_FN, identifier, type, ID)
# Returns: 0 if the load failed, and 1 if the load succeeded.
# Side-effects:
#   - Success: Sets identifier.various_fields registers to data.
#   - Failure: Sets identifier.various_fields registers to null.
#
# %0 - string identifier, %1 - data structure type, %2 - ID, %3 - data object
#
&OK_LOAD_FN #__factory=[u(LOAD_FN,%0,%1,%2,%3)][gt(r(%0.v),0)]
-

In many cases, we'll want to check whether a particular entity exists or not, before attempting to do some operation on it. So we have a function that simply checks whether a given ID number of a specific type exists (where "exists" means "has been saved and exists as an attribute on the data object"). As usual, we can let the function just take care of finding the appropriate data object, but it can be specified as an optional final parameter if desired.

# Call as: u(EXISTS_FN, type, ID)
# Returns: 0 if ID of type does not exist, and 1 if it does.
# Side-effects: None.
#
# %0 - data structure type, %1 - ID, %2 - data object
#
&EXISTS_FN #__factory=
[nonzero(neq(words(setr(lo,usetrue(%2,get(#__meta/%0_OBJ)))),1),
  setq(lo,extract(r(lo),inc(div(%1,v(%0_MAX))),1))
)]
[hasattr(r(lo),%0_%1)]
-

One last piece of magic: Every data type can have up to 32 flags; "flags" must be one of the field names chosen in order to enable this. These flags are stored as a bitfield. So we need a couple of functions to manipulate flags.

We create a generic function that we use to set and unset flags, calling it with an identifier and a list of flags that we want to set or unset; to unset a flag, just precede its name with a !.

# Call as: u(FLAGMOD_FN, identifier, list_of_flags)
#          list_of_flags can mix flag and !flag entries,
#          which set and unset flags, respectively.
# Returns: Nothing.
# Side-effects: Modifies identifier.flags global register.
#
# %0 - identifier, %1 - flag list
#
&FLAGMOD_FN #__factory=
[setq(%0._f,v([r(%0.type)]_FLAGS))]
[setq(%0._d,elements(%1,matchall(%1,!*)))]
[setq(%0._u,setdiff(%1,r(%0._d)))]
[nonzero(r(%0._d),
  setq(%0.flags,
    bnand(r(%0.flags),
          ladd(iter(r(%0._d),
               iftrue(match(r(%0._f),delete(##,0,1)),power(2,dec(#$)),0)))
    )
  )
)]
[nonzero(r(%0._u),
  setq(%0.flags,
    bor(r(%0.flags),
          ladd(iter(r(%0._u),iftrue(match(r(%0._f),##),power(2,dec(#$)),0)))
    )
  )
)]
[setq(%0._f,,%0._d,,%0._u,)]
-

Then we need a function to check if an entity possesses a flag. We can just call it with the identifier and the flag we want to check for.

# Call as: u(FLAGGED_FN, identifier, flag_name)
# Returns: 0 if the entity doesn't have the flag, non-zero (the flag's
#          bit value) if it does.
# Side-effects: None.
#
# %0 - identifier, %1 - flag to check for
#
&FLAGGED_FN #__factory=
[iftrue(match(v([r(%0.type)]_FLAGS),%1),
  band(r(%0.flags),power(2,dec(#$))),
  0
)]
-
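
To make the bitfield concrete: a flag's bit is 2 raised to the power of (its position in type_FLAGS, minus one). Assuming a FLAGS list of "needs_repair in_hyperspace stolen" (the list the ship example later in this article uses), the bits work out as:

needs_repair    position 1    power(2,0) = 1
in_hyperspace   position 2    power(2,1) = 2
stolen          position 3    power(2,2) = 4

So an entity whose flags field is 5 has needs_repair and stolen set (1 + 4). Checking for stolen by hand, with the value 5 written in literally purely for illustration, mirrors what FLAGGED_FN does:

[band(5,power(2,dec(match(needs_repair in_hyperspace stolen,stolen))))]

Here match() gives position 3, so the test is band(5,4), which is 4: non-zero, meaning the flag is set. The same check for in_hyperspace is band(5,2), which is 0.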

Finally, we want to have a quick-and-dirty way to display all data associated with an entity. We'll almost certainly write our own custom data views, but this is very handy for debugging purposes, and we'll try to make the format nice enough that it's a reasonable view until you get around to writing something nicer for a given data type. For a bit of customization without having to write something totally different, you can set the register identifier.show, which should be formatted text to show between separators, after the main body of data is shown.

# Call as: ulocal(SHOW_FN, identifier)
# Returns: Displays dump of data for an entity.
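# Side-effects: None intended; call with ulocal().
# Note: Color() is a local softcode function of mine that colors names
#       according to idle time; use name() instead if you lack one.
#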
&SHOW_FN #__factory=
[setq(f,iter(v([r(%0.type)]_DATA),capstr(##),%b,`))]
[setq(l,usetrue(v([r(%0.type)]_LABELS),%qf))]
[nonzero(setr(m,match(%qf,flags,`)),
  [setq(f,replace(%qf,%qm,flagwords,`))]
  [setq(%0.flagwords,
    iter2(setr(b,v([r(%0.type)]_FLAGS)),iter(%qb,power(2,dec(#@))),
      nonzero(band(r(%0.flags),#+),##)
    )
  )]
)]
[setq(w,add(2,lmax([strlen([r(%0.type)] ID)] [iter(%ql,strlen(##),`)])))]
[setq(r,sub(40,%qw))]
%xb[repeat(-,78)]%xn%r
[ljust(%xr[capstr(r(%0.type))] ID:%xn,%qw)] [ljust(r(%0.id),%qr)] /@@ @@/
[ljust(%xrEditor:%xn,11)] [Color(r(%0.editor))]%r
[ljust(%xrOwner:%xn,%qw)] [ljust(Color(r(%0.owner)),%qr)] /@@ @@/
[ljust(%xrEdit Time:%xn,11)] [convsecs(r(%0.etime))]%r
[iter2(%ql,%qf,
  [ljust(%xr##:%xn,%qw)] [r(%0.#+)],
  `,%r
)]%r
[nonzero(r(%0.show),
  %xb[repeat(=,78)]%xn%r[r(%0.show)]%r
)]
%xb[repeat(-,78)]%xn
-
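
As a sketch of the identifier.show hook, the following loads a ship and adds one extra line beneath the standard dump. The ID 123 and the wording of the extra line are made up for illustration; note that the text passed to setq() must not contain a bare comma:

[u(LOAD_FN,this,ship,123)]
[setq(this.show,Registered to [name(r(this.owner))].)]
[ulocal(SHOW_FN,this)]

Because SHOW_FN uses short scratch registers such as f, l, and w internally, calling it through ulocal() keeps those from leaking into the caller.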

And that's it. All we have to do now is to define data types.

Defining a Data Type

All information for data types is stored on #__factory. A type name is a single word; for convenience, it should probably be a short word, like "ship". The definitions consist of the following attributes:

type_DATA: a space-separated list of field names
type_LABELS: a `-separated list of user-friendly field labels
type_DEF: a `-separated list of field defaults
type_FLAGS: optional; a space-separated list of flag names
type_MAX: optional; the maximum IDs to store on one data object

All field and flag names should be lowercased. Also, make sure that no field name ever starts with an underscore _, because that's used for variables internal to the factory code.

An Example of Usage

Here's a ship example:

&SHIP_DATA #__factory = name xpos ypos homeport
-
&SHIP_LABELS #__factory = Ship Name`X Coord`Y Coord`Home Port
-
&SHIP_DEF #__factory = Unnamed Ship`100`100`Earth
-
&SHIP_FLAGS #__factory = needs_repair in_hyperspace stolen
-
&SHIP_MAX #__factory = 100
-

You also need to seed the data objects by creating a data object, and doing:

&SHIP_OBJ #__meta = dbref

Some very crude examples of use (#__globals is the global command object, which we @parent to #__factory), that allow us to create, display, change the home port of a ship, and violently take a ship out of hyperspace and flag it as needing to be repaired:

# Command: +ship/create ship_name for player
#
&DO_SHIP_CREATE #__globals = $+ship/create * for *: @pemit %#=
case(0,
  hasflag(%#,Wizard),Only wizards can create ships.,
  t(setr(0,num(*%1))),'%1' is not a valid player.,
  [u(NEW_FN,%q0,this,ship)]
  [setq(this.name,%0)]
  [u(SAVE_FN,this,%#)]
  New ship created for [name(%q0)]. ID number is [r(this.id)].
)
-

# Command: +ship/show ID
#
&DO_SHIP_SHOW #__globals = $+ship/show *: @pemit %#=
case(0,
  u(OK_LOAD_FN,this,ship,%0),That is not a valid ship ID.,
  controls(%#,r(this.owner)),Permission denied.,
  ulocal(SHOW_FN,this)
)
-

# Command: +ship/port ID at port
#
&DO_SHIP_PORT #__globals = $+ship/port * at *: @pemit %#=
case(0,
  hasflag(%#,Wizard),Only wizards can change the home port of ships.,
  u(OK_LOAD_FN,this,ship,%0),That is not a valid ship ID.,
  [setq(this.homeport,%1)]
  [u(SAVE_FN,this,%#)]
  Home port of ship '[r(this.name)]' changed.
)
-

# Command: +ship/crash ID
#
&DO_SHIP_CRASH #__globals = $+ship/crash *: @pemit %#=
case(0,
  hasflag(%#,Wizard),Only wizards can crash ships.,
  u(OK_LOAD_FN,this,ship,%0),That is not a valid ship ID.,
  u(FLAGGED_FN,this,in_hyperspace),That ship is not in hyperspace.,
  [u(FLAGMOD_FN,this,!in_hyperspace needs_repair)]
  [u(SAVE_FN,this,%#)]
  You crash the ship '[r(this.name)]'.
)
-
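
To illustrate the automatic migration described earlier, here is a sketch of adding a field to the ship type after ships have already been saved. The "captain" field, its label, and its default are hypothetical; the SHIP_DATA_1 and SHIP_V attribute names follow from how LOAD_FN looks up type_DATA_version and how SAVE_FN reads type_V. First preserve the old field list under its version number, then extend the definitions and bump the version:

&SHIP_DATA_1 #__factory = name xpos ypos homeport
-
&SHIP_DATA #__factory = name xpos ypos homeport captain
-
&SHIP_LABELS #__factory = Ship Name`X Coord`Y Coord`Home Port`Captain
-
&SHIP_DEF #__factory = Unnamed Ship`100`100`Earth`Nobody
-
&SHIP_V #__factory = 2
-

If LOAD_FN behaves as described above, a ship saved under version 1 should then load with captain set to its default, and the next SAVE_FN on it should write it back out in the version 2 format. None of the +ship commands need to change.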

Easy, yes? Hopefully you'll find this kind of approach useful in your own coding.

um, wow....

This is impressive, Amberyl.

Thank you for this!

Very cool, thanks heaps for this.

One minor query: in SHOW_FN you have
[Color(r(%0.editor))]

Where is the Color() function from?
It doesn't seem to be part of tm3.1p6
Is it a Pennmush thing?

Whups. That is a softcode function of mine that colors names according to idle time. Use name() instead. :)

sad hack's picture

When did (will?) Penn get long register names?

In a previous incarnation (easily a decade ago on M*U*S*H), I cried out to the PennMUSH devs to implement long register names. I continued to cry out for a long time here and there (in all the wrong places ... like the +softcode channel, rather than the devs mailing list). Then, one day, Javelin magically tweaked in the addition of a-z to the pre-existing 0-9 single character %q-registers. I was (and still am) extremely grateful for this (tons of my code underwent rewrites to change attribute setting stuff to register usage, where more than 10 items were being jostled... as well as many, many simple intuitive changes like: %qp for 'player' or %qd for #dbref).

This code (thanks for putting it up here!! I didn't see any license mention. Is it public domain? Is it deferred to the c.p.o site license?) uses long register names such as...

[setq(%1.owner,%0)]

where the register name is "<foo>.owner" rather than the previous restriction of a single character (0-9, a-z).

Question (since this IS the "PennMUSH" community site):

When did PennMUSH get long register names? (And *pout* *foot stomp* why wasn't I informed?)

Or, is this simply an oversight from a programmer already adjusted to something Penn doesn't yet have (but, if they'd listened to me a decade ago could have had FIRST!)?

If it's the latter, is there any anticipation of adding long register names to Penn?

p.s. I've been off-Net for the past couple years, so I may simply need to do some of my own catching up (yes, I did download and install a copy of the Win32 executable a couple weeks ago on an old friend's PC, so I'm beginning to make strides toward a reincarnation).

javelin's picture

Not yet

That code's written for TinyMUSH. Penn does not have long register names. You'd have to check with the devs for plans to implement them. Because it's a good and obvious idea, and because Penn prefers to be softcode compatible with other mush flavors as much as possible, there is likely no philosophical objection to long register names -- it just may be lower on the priority list than the available dev time, and no one else submitted a patch for it.

Looking back, there was a discussion of one implementation of this in 2005 (ticket 6784), but the dev majority felt that particular implementation was too wasteful of memory in the stack. Walker suggested another implementation that would be more memory-efficient, but there appears to have been no progress in actually doing it since then. Maybe this will spur someone to do it.

Aside for other readers: Just because this is the PennMUSH community site doesn't mean that stuff that uses other mush flavors shouldn't appear. It's very welcome.

I keep forgetting that PennMUSH has not done named registers. It is a TinyMUSH addition from 2002, and I think names are a big key to readable MUSHcode.
