On formats

Submitted by Cheetah on Mon, 2011-08-15 09:35

So last time I pulled a markup language out of thin air and presented it as the most likely candidate so far for representing the data for the new protocol. Like I'd mentioned, this wasn't really something I planned to think about this early in the development stages but hey, guess plans don't always work out. Most of the reason was that I'd already given it some thought, and I needed a quick example to show, because it's hard to communicate ideas at the level of abstraction of 'there is no representation, just think of data hanging off a bit of text'. Yeah.. In related news, there is no spoon.

Anyway.. I've already fielded some questions on why I didn't pick one existing format or another, and since these questions will keep coming up, let me answer them here now.

There is already another format that can do this.

Technically not really a question, but it comes up, and to avoid duplication for all covered formats as well as some I don't mention, I'll cover it first.

Yeah, there are many formats out there to do things that look, to a lesser or greater degree, like what I'm doing here. But please, before you suggest one, ask yourself: do you understand the format? Can you see how the format would cover the needs of the new protocol? If you can't give specific, technical reasons why a format would be useful, please look into the format first and then get back to me.

Why not XML?

Well, aside from the problems often noted with XML, there are a few reasons specific to the problem at hand.

XML is ridiculously over-featured for this. We don't need namespaces or validation or DTDs. We just need to toss a little bit of markup over some bits of text. Entities are pretty meh as well, as they add a separate class of things we need to parse, more characters that can't be used as a literal and thus add more noise, which doesn't help with debugging. In fact, even attributes are way overkill. Most use cases need only one at most. In addition, XML is much weightier than ESE, what with all the extra header bits needed and closing tags and all that. Though we live in a time when most users have broadband, that's no reason to toss more bytes over the line than we need to.

In addition, it incurs the cost of having an XML parser. These are not trivial to write. Of course a library could be used for that, but then a codebase's support for the new protocol would rely on the chosen library being installed. This would be less of an issue for clients, which can bundle the library easily. Still, both client and codebase devs would be required to learn the API of some third party's code. The example implementation of ESE is, depending on some details, not much larger than, and potentially smaller than, doing the same thing with a third party XML library. The difference is that it represents all the code required to do the job, and the person implementing it will have written the implementation themselves, making the entirety of what it does easier to understand.

Finally, among the people I personally talk to who will have to implement the protocol should I get it anywhere, none that I know of are proponents of XML, and quite a few of them dislike it.

But XML is a standard.

Correct. But just because it's a standard doesn't mean it's good. EBCDIC is a standard, but it's not good. And just because it's a standard doesn't mean it's suited to the task. SMTP (to take an extreme example) is a standard, but it's clearly unsuited for this task. Though it's of course far closer than SMTP is, I don't feel XML is suited to this task, either.

Why not MXP?

MXP, more than XML, suffers from having too many things that aren't related to what I'm trying to achieve. Meanwhile, the benefits are limited because a client that already supports MXP will still need to implement the specifics of my protocol, so the only thing we save is the relatively simple ESE parser.

Worse, clients that don't support MXP (the majority of them, I'm given to understand) would first have to implement MXP (including its trickier and, for this protocol, unnecessary bits) and then mine. This would only decrease the odds that a client dev would adopt my protocol, unless they already support MXP.

In addition, MXP sends some non-printing sequences which would make debugging more bothersome. It doesn't mesh well with the security model I have in mind at all, either. And generally it's just not that well suited to MUSHes. I'll consider the fact that, to my knowledge, none of the MUSH family of codebases (my target audience) have implemented MXP as evidence for this. In fact, MXP suffers from the same 'not designed with MUSH in mind' problems that Pueblo suffers from. We need a protocol that can be implemented safely and securely on a MUSH, i.e. without either making it unusable for mortals or making it dangerous.

And again, I've heard little love for MXP from the people who'd have to implement it.

Why not JSON?

JSON is not a markup language. It's a data interchange format. There's nothing wrong with that, but that means it might turn

<Softcode> Cheetah says, "This is text."

into

{ "type": "channel", "channel": "Softcode", "player": "Cheetah", "say": "This is text." }

for example. The problem is that I'm no longer marking up bits but I'm supplying them in a way that no longer resembles the original. The client would now need to know every single type of message that it could possibly receive, or it can't turn it into something useful. This is fine with something like IRC, which has done things in a vaguely similar way from day one, but it completely wrecks any interoperability with old clients and makes it impossible to slowly transition an existing client to using the new protocol. It has to be done all or nothing. And in practice this will result in it being nothing. The main reason my protocol is getting a foot in the door at all is because it'll be very simple to implement a bare minimum to go back to a situation where nothing is lost, but functionality can be added gradually.
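To make the 'client needs to know every single type of message' point a bit more concrete, here's a minimal Python sketch. The RENDERERS table, the render function and the "page" message type are hypothetical names made up for illustration, loosely based on the example above; none of this is part of any actual protocol.

```python
import json

# Hypothetical per-type renderers; a real client would need one entry for
# every message type a server could ever send.
RENDERERS = {
    "channel": lambda m: f'<{m["channel"]}> {m["player"]} says, "{m["say"]}"',
}


def render(raw):
    """Turn a JSON message back into display text, or None if the type is unknown."""
    message = json.loads(raw)
    renderer = RENDERERS.get(message.get("type"))
    if renderer is None:
        # The original line of text isn't in the message, so there is
        # nothing sensible to fall back on.
        return None
    return renderer(message)


known = '{"type": "channel", "channel": "Softcode", "player": "Cheetah", "say": "This is text."}'
unknown = '{"type": "page", "player": "Cheetah", "say": "This is text."}'

print(render(known))
print(render(unknown))
```

The first message can be rebuilt because the client happens to know the "channel" type. The second one comes back as None, which is roughly where an old or half-converted client would end up for anything it hasn't been taught yet.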

Edit: I forgot about the possibility, but someone was kind enough to remind me.. Of course you could still include the entire text in the JSON in a standardized way, but that would cause a lot of duplication and interesting issues when nesting. It might help if you could include multiple instances of 'text: "<"' and such to rebuild the original string that way, but I'm not sure that's allowed, nor am I sure JSON guarantees keeping the same order. In either case, it still seems like it's not the best tool for the job, as it neither is nor aspires to be a markup language.
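On the repeated keys and ordering question: as far as I can tell, the JSON spec says names within an object should be unique and describes objects as unordered, while arrays are ordered. Here's a quick check with Python's standard json module. Other parsers may treat repeated keys differently, and the list-of-pairs layout below is just something made up for illustration, not a proposal.

```python
import json

# Repeated keys in one object: Python's json module silently keeps only
# the last value for "text", so the first piece is lost.
duplicated = json.loads('{"text": "<", "channel": "Softcode", "text": "> "}')

# An array of [kind, value] pairs instead: arrays are ordered, so the
# original text can be rebuilt by joining the values in sequence.
pieces = json.loads('[["text", "<"], ["channel", "Softcode"], ["text", "> "]]')
rebuilt = "".join(value for _kind, value in pieces)

print(duplicated)
print(rebuilt)
```

So repeated keys don't survive parsing here, and getting the string back means moving to an array of pieces, at which point the JSON is being used to imitate markup rather than doing what it's good at.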