Sunday, December 24, 2017

Further Exploration of Needs

Last time I talked about requirements in terms of some user stories and a general description of what I need. Now, putting on the hat of a developer/analyst, I really need to dig a little deeper and start to put together a design. The design activity will then feed into some system architecture, which is mostly going to consist of assessing various tools and picking some.

So, what do I actually want here? What really is the genesis of the need, the itch that wants to be scratched, which leads to putting in the time and effort to design a whole web-based application? In a nutshell, I want to be able to author content in a way that is reasonably simple and yields good results. I'd like the material to be in a format which could potentially be useful in other contexts (i.e., some sort of fairly standard and well-supported markup from which I can generate PDF, PostScript, LaTeX, or something similar that can potentially be published or merged with other content). This desire comes from an appreciation of the value of flexibility in information representation and the great utility of reuse. If I'm going to author content, I want it to have maximum value and yet be simple to create and straightforward to reuse. Again, this is why most CMS/Wiki solutions don't work: the results are anything but portable!

One big area of research this brings up is obviously what sort of content data representation to use. The world is replete with different options for markup languages, WYSIWYG editor/word processor formats, etc. Word processors don't seem to be of much use here. Sure, you can author most anything in Word, or OO Writer, or utilize something online such as Google Docs, which can support export/import in several formats. Most of these tools will interoperate to some extent, but they all eventually fall short in some area or other. It is difficult to produce good-quality online content with any of them. Beyond that, I haven't seen much evidence that word processors are really very adept at producing book-length content at the level of quality needed to do something like print publishing. Certainly it's a huge challenge, and if you've invested your time and effort into carefully formatting/laying out your content in, say, Word, then you're likely to have to do it all again in Scribus or InDesign to produce high-quality output. Word processors also tend to be difficult applications to get your data out of. You can export, but the target formats and results are usually too limited and messy to be useful in a toolchain, or else you simply lose all the formatting you created in the first place. So the actual content generation, writing text, could be done in OO Writer, for example, but you'd basically be using it as a text editor, so why bother?

The alternative is markup languages. HTML is one which we're all familiar with, but it's not an easy language to author in, and it doesn't tend to lend itself to particularly well-structured results which can be easily moved into other formats and reused. There are MANY 'wiki formats', basically small markup languages which are intended to render as HTML, such as TWiki markup. These are OK, but are generally only well-supported by one or a very few applications, with options for conversion to other formats either limited or of low quality.

This leaves roughly three contenders for a good markup language that will support flexible reuse and work pretty well with HTML: LaTeX, Docbook, and Markdown. LaTeX is the grand-daddy of all markups (well, there is runoff and its ilk, but these are little-used today). It certainly has the expressive power required to do anything you want. You could author the Encyclopedia Britannica in LaTeX (and maybe they do, I have no idea). Of course this is also a double-edged sword. LaTeX is powerful, but it is also really complex. You pretty much have to be a LaTeX guru, or have access to one, to do a good job of authoring LaTeX documents.

Docbook is another option, an XML-based markup; this one is much better structured and was designed with publishing toolchains in mind. It naturally converts to HTML or other XML-based markups pretty easily. It can also be converted to LaTeX without a vast amount of trouble. It's fairly standardized, and it inherently supports breaking down projects into chapters/sections, which fits pretty well with the idea of displaying our content in a sort of wiki-like form.

Markdown is the other option that I've really looked at. It is another of the sort of 'wiki editor' languages. Its main claim to fame is ease of authoring. Like most wiki-type formats it can be typed in quickly using pretty much any sort of text editor, or even a basic HTML textarea control. Its main weakness is more limited expressiveness than the other options. While you can create a lot of content in something like GFM (GitHub Flavored Markdown), it lacks features like being able to specify headers and footers, floating elements left and right, borders and backgrounds, and any number of other fairly basic formatting capabilities you might want in a published document. Still, it is widely supported, albeit in a number of semi-compatible dialects, and works well online.
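To make the trade-off between the three a little more concrete, here is a rough sketch that writes the same tiny fragment (a section title plus one paragraph with an emphasized word) in each markup and counts how many characters are markup rather than text. The sample fragment, the function names, and the "overhead" measure are all my own illustration, not part of any tool; the Docbook sample is a bare fragment without the namespace or version attributes a real document would carry.

```python
import xml.etree.ElementTree as ET

# The same small fragment in each candidate markup (illustrative samples only).
SAMPLES = {
    "Markdown": "## Needs\n\nContent should be *portable*.\n",
    "LaTeX": "\\section{Needs}\n\nContent should be \\emph{portable}.\n",
    "Docbook": (
        "<section><title>Needs</title>"
        "<para>Content should be <emphasis>portable</emphasis>.</para>"
        "</section>"
    ),
}

# The bare text that every sample carries, with no markup at all.
PLAIN_TEXT = "Needs Content should be portable."


def without_whitespace(text: str) -> str:
    """Drop all whitespace so line breaks do not count as markup."""
    return "".join(text.split())


def markup_overhead(source: str) -> int:
    """Number of non-whitespace characters beyond the bare text."""
    return len(without_whitespace(source)) - len(without_whitespace(PLAIN_TEXT))


def docbook_parts(source: str) -> tuple[str, str]:
    """Parse the Docbook fragment as XML and pull out title and paragraph text.

    Because Docbook is XML, a stock XML parser recovers the structure;
    no format-specific tooling is needed for this step.
    """
    root = ET.fromstring(source)
    title = root.findtext("title")
    paragraph = "".join(root.find("para").itertext())
    return title, paragraph


for name, source in SAMPLES.items():
    print(f"{name:9s} markup overhead: {markup_overhead(source)} characters")

print(docbook_parts(SAMPLES["Docbook"]))
```

On a fragment this small the numbers only hint at the pattern: Markdown is the lightest to type, Docbook the heaviest, and Docbook is the one whose structure a generic parser can recover directly.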

I don't think there's a perfect answer here. As with many such cases, maybe it is best to work with a few of the options and see how they pan out. It's not clear that we must lock ourselves into a single choice that cannot ever be revisited. It might be hard at some point to pull together content in several formats into a single cohesive document (if we wanted to publish our content in print, for example), but at the same time premature decisions are bad decisions. So let's consider what we might do about this, look at the solution space and architecture, and then perhaps perform some experiments (spikes, as we say).
