He has no friends, but he gets a lot of mail.

Thursday, 3 December 2009

Single sourcing is a bad name for multiple sourcing

Michael Hiatt's The Myth of Single Source Authoring has caused a bit of comment in the technical communication arena.  But it's not an attack on single sourcing, at least the way I understand it. The clue is in his second paragraph:
And who will be our emerging heroes to fill the promise of content reuse and localization savings? Knowledge mashups and applications using cloud-based linked data and the emergence of the semantic Web.
Wait, what? Single sourcing is essential for knowledge mashups! Let me explain:



Single sourcing may be a bad name. Single sourcing does not mean "a tightly-controlled, single, authoritative source for all information, presented in a canonical form which will be used regardless of the output format or the audience." It certainly doesn't mean, as he puts it, "the belief that static authoring from a single vantage point from a single author paid by a single organization is a workable system". Of course it isn't. Wasn't that the precise thing single sourcing was developed to overcome?


For me, single sourcing means "for each piece of information, having an identifiable owner, and empowering that owner to act as a single source for that information, in whatever information use environment it is presented."
 

In the old days, every document had a single author (paid by a single organization), which meant that the same information was presented in different ways in different documents. And this is the most important point that Michael makes: there was nothing wrong with this, because a well-designed documentation set is broken up into documents aimed at different use environments, therefore each document should be written in a different way. The biggest mistake in the single-sourcing world is the idea that you can reuse authored topics effectively between use environments, but even Wikipedia knows that.


The author's job, in both traditional and single-sourced contexts, is to identify an information use environment, in fact, to enact an information use environment. "Information use environment" includes audience, language, culture, expectations, anything which affects how someone uses information. There are technically as many information use environments as there are occasions a person has to use information; but readers are malleable, and willing to mould their environment to some extent to fit with the information they have access to.


Once the use environment has been enacted, and agreed between author and reader, then the author can "suit" the information she presents to that environment. The difference between traditional authoring and single-source authoring is that the process of "suiting" information in a single-sourced system occurs at the single, original, source of the information, or at least at the point where it enters the author's domain, not at the point where it leaves to be assembled into a document.


For material which is authored in-house, the difference is small, since it is the author who originated the material. Likewise, for material which an organization aims at a very specific information use environment, the difference is small, since it is the author's organization that enacted that use environment.


So how can we benefit from single sourcing? The key is in that action of enacting. In a small company, the sole technical author bears responsibility for enacting the use environment. Because she enacted it, she finds she cannot reuse anyone else's information. As organizations grow, authoring teams share house standards, and the enacted use environments get codified so that authoring teams can successfully collaborate. If, as Michael says, "a writer seldom grabs a topic wholesale and places it into his or her document. Topics rarely meet all needs of the author and usually throw off the context and purpose of the document", this is a symptom of a lack of standards in the organization, so that individual authors are making their own decisions about the target audience.


Sometimes this is the right thing to do, and as an organization grows, naturally the number of use environments it is exposed to also grows; but there is always a core, the "standard documentation set", whose use environment has been fully enacted and formalized.


Which is where mashups come in. We cannot expect a mashup to be successful unless we share enacted use environments between organizations; ultimately, globally. But when this happens, it will be a revolution, because readers of any content from any organization will understand their role in the environment: it will become part of the culture of information use, not just part of the house style of an organization. (Look at the "enacted use environment" of pictograms in airports and other places: a truly global standard, which almost everyone thinks of as "intuitive" purely because it's so culturally ingrained.)


With single sourcing, once we have agreed (to whatever extent) to enact a particular use environment and write content for it, an organization will be able to re-use content from any other organization, anywhere, and it will fit in seamlessly. Without it, an organization will always have to rewrite information so that it speaks their language.


And that is why single sourcing is really multiple sourcing.

Monday, 16 November 2009

Wikis and the rule of the feature

In Alfresco Share*, teams do their work in separate shared spaces known as Sites. As well as a library for traditional documents, every Site has a selection of components which allow different types of collaboration:
  • blog
  • wiki
  • forum
So far so fair enough; but when you provide a fairly common front-end to them, these three features become very similar. In fact, I can summarize the entire set of differences pretty easily:
  1. Wiki pages can't have the same title as each other.
  2. You can't comment on wiki pages.
  3. You can't link from one blog post to another, or from one forum thread to another.
  4. You can't sort by "largest number of comments" in the blog (or, of course, the wiki.)
  5. You can have a "draft" blog post, but not a draft of the other two.
  6. You can publish a blog post to an external web site, but not a wiki page or a forum thread.
Notice something about this? All of these differences are limitations imposed on "bunch of documents with comments." When choosing to post content, I need to choose which set of limitations to work within. Alfresco Share is frustrating not because it imposes these limitations, but because it makes it so obvious that the limitations are arbitrary.**

Wiki pages can't have the same title as each other. This rule is only necessary because of the [square bracket] notation for creating links as you type, a feature that survives from the original Wiki philosophy. Ward Cunningham's wiki was designed from the first to encourage linking by making it easy. It was also designed to be really simple to code, so he decided to use a simple text box and apply a markup rule (in his case, RunningWordsTogether) to denote links. He also invented some actual markup, which has been superseded in Share by a javascript rich text editor. But the link markup remains as a relic. Linking could be made more intuitive with an application of some more javascript, and the constraint would not then be necessary.
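To see why link-by-title forces unique titles, here is a minimal Python sketch of the two link conventions mentioned above. The regular expressions, the `find_links` and `resolve` helpers and the sample pages are all my own assumptions for illustration; this is not how Share or Ward Cunningham's wiki is actually coded.

```python
import re

# Hypothetical patterns, for illustration only; real wiki engines differ in detail.
BRACKET_LINK = re.compile(r"\[([^\[\]]+)\]")              # [Page Title]
CAMEL_CASE_LINK = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")  # RunningWordsTogether


def find_links(text):
    """Return the page titles referenced by either link convention."""
    return BRACKET_LINK.findall(text) + CAMEL_CASE_LINK.findall(text)


def resolve(title, pages):
    """Look a link target up by title alone.

    `pages` maps title -> page body. The title is the only key, so two
    pages with the same title could not both be linked to.
    """
    return pages.get(title)


pages = {"Release Notes": "...", "FrontPage": "..."}
links = find_links("See [Release Notes] or start at FrontPage.")
print(links)
```

A link widget that stored a page identifier instead of the typed title would not need the title to be unique, which is the point of the paragraph above.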

You can't comment on wiki pages. Traditionally you would edit the actual page to reflect a discussion, which would then get refactored from "Thread Mode" into "Document Mode" as the outcome of the discussion. You can still do this in Share, if you have the rights to. But if you have these rights, you can also go into a blog post or a forum and edit it. Limiting the wiki feature like this is purely to force people to use the wiki like a wiki is traditionally used. (And in many modern wiki implementations, you can comment on wiki pages.)

You can't link from one blog post to another, or from one forum thread to another. Actually, you can, as long as you're happy typing in the URL. This restriction only exists because the wiki link markup convention doesn't apply to the blog or the forum. Again, a decent "make link" widget would completely remove this problem.

You can't sort by "largest number of comments" in the blog. This is the only real difference between the forum and the blog, except for:

You can have a "draft" blog post, but not a draft of the other two. It's a useful feature, so why implement it in only one component? Many forum apps have draft functionality.

You can publish a blog post to an external web site, but not a wiki page or a forum thread. Another useful feature cruelly denied to users of 2 out of 3 components.

I don't want to appear like I am bashing Share here. In fact, we accept the arbitrary restrictions because they are a core part of the feature's identity. A wiki would not qualify as a wiki, in the minds of the Alfresco Share developers, if it completely removed the Wiki Markup. So it remains.**

The blog/forum distinction is even more subtle, because the core difference here is not even one of functionality. Simply, we are encouraged to pay more attention to the initiating post than to the comments in a blog, and to the thread as a whole in a forum. There is a whole continuum operating here. Often the first post in a forum thread is edited by the original poster to summarize discussion in the thread; likewise, some blogs are as well known for the activity in the comments section as for the original content. Political Betting is an example: the blog's authors often have to create new posts purely as a way of separating diverging threads of discussion.

Presented with a choice of site features in a system like Share, the interplay of limitations and expectations can make it hard to decide which component to use for a particular piece of information. But it seems to be important enough to us that we have a feature we can name, and assign a purpose to, that we will accept these issues anyway.***

Google Wave, to take a counterexample, has tried to approach the problem in the classic computer scientist way: by abstracting out as many of the restrictions as possible. As a result, people have struggled with working out how they are supposed to use it. Perhaps we can't win.

* Alfresco is a document management system, and Share is its "groupware" front-end. We've been trialling it as a collaborative tool as well as an improvement to the shared drive for documents. It's available under an open source license, which is important for small and cheap companies like us.

**But a wiki would not qualify as a wiki, in the mind of Ward Cunningham, if (as in Share) only a restricted set of people can edit it. One for a subsequent post.

***Look at Twitter; you could have done the same thing for years in Blogger or Livejournal just by ignoring the "post body" box. But you didn't. 

Friday, 13 November 2009

Regression

We are in regression this week.

This doesn't mean we behave more childishly than usual (though maybe we do), but that the developers have merged down all their features into the trunk in source control, and now they're testing whether anything has broken.

As part of that, all my topics have also moved into trunk, and I've been building the first feature-complete documents for the upcoming release, for QA to review.

A bit of an oversight in the whole source control plan is that this leaves me working on trunk for the week. This shouldn't be allowed, but they let me because fixes to documentation bugs don't need testing.

What should happen is for me to do all my reactive work on my own development branch which stays open after the rest of the product is frozen. When the release goes out, there will always be tweaks to be made which don't go with a feature, so it's good for that too.

Indexes

Compiling indexes must be one of the least-missed aspects of traditional document writing. It's the sort of thing a professor might get a grad student to do in order to break his spirit.

Nowadays we don't need to do the actual sorting and compilation, and in an environment that promotes re-use, we can make bits of content automatically produce consistent index entries. So far so good!

But as I've been going along I've realised I missed an important aspect of index writing: index entries are not tags.

Having previously used any excuse to avoid compiling an index, and as a child (or at least cousin) of Web 2.0, I hadn't really understood what they were about, and gaily added generic index terms to my topics, such as "security".

This led to my index looking like this:

security,10
security,10
security,11
security,12
security,15
security,21
security,22
security,46

No worries, I thought, I can get the style sheet to bunch up all the duplicate entries and produce something more index-like:

security,10,11,12,15,21,22,46

I've seen this in lots of books, so it must be okay, yes?

Trouble is, it's really unusable. If I look up "security" in the index, I then have to visit 8 places throughout the document to see if that page happens to mention it in the way I need. What would be better would be to provide more context:

security, 10
security, policy, 10
security, over the network, 11
and so on.

An index is used like a search engine, and since it's not interactive, such a search engine must be able to tell you in advance which queries are the "good" ones, where a query is only good if it narrows down the results to just one.

After you've disambiguated everything, then, you can start bunching up entries:

security, 10; policy, 10; over the network, 11,...
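As a rough sketch of that bunching step, here it is in Python rather than in a style sheet. The `(term, subterm, page)` tuple format, the function names and the choice to order sub-entries by first page are assumptions of mine, not what my actual build does.

```python
from collections import defaultdict


def format_pages(pages):
    """Render a set of page numbers as '10, 11, 12'."""
    return ", ".join(str(p) for p in sorted(pages))


def build_index(entries):
    """Bunch (term, subterm, page) tuples into one line per main term.

    `subterm` is None for a bare reference to the main term.
    Duplicate entries collapse because pages are kept in sets.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for term, subterm, page in entries:
        grouped[term][subterm].add(page)

    lines = []
    for term in sorted(grouped):
        subterms = grouped[term]
        # Bare references to the main term come first, if there are any.
        if None in subterms:
            parts = [f"{term}, {format_pages(subterms[None])}"]
        else:
            parts = [term]
        # Sub-entries follow, ordered by first page (one possible choice).
        named = [s for s in subterms if s is not None]
        for subterm in sorted(named, key=lambda s: (min(subterms[s]), s)):
            parts.append(f"{subterm}, {format_pages(subterms[subterm])}")
        lines.append("; ".join(parts))
    return lines


entries = [
    ("security", None, 10),
    ("security", "policy", 10),
    ("security", "over the network", 11),
]
for line in build_index(entries):
    print(line)
```

Note that the bunching only helps once every entry carries a sub-term that tells it apart; fed the original eight bare "security" entries, it would just reproduce the unusable list of page numbers.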

This approach clearly needs to be applied throughout the topic base, so now I'm considering doing an audit of the whole thing to see which index entries are duplicated and disambiguating them - since I don't know in advance which entries are going in which document.
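The audit itself could be as simple as the following Python sketch. It assumes the index terms have already been pulled out of each topic into a dictionary keyed by topic name; the topic names and the `find_ambiguous_terms` helper are made up for the example.

```python
from collections import defaultdict


def find_ambiguous_terms(topic_terms):
    """Return {term: sorted topic names} for terms indexed by more than one topic.

    `topic_terms` maps a topic name to the list of index terms it declares.
    """
    topics_by_term = defaultdict(set)
    for topic, terms in topic_terms.items():
        for term in terms:
            topics_by_term[term].add(topic)
    return {
        term: sorted(topics)
        for term, topics in sorted(topics_by_term.items())
        if len(topics) > 1
    }


# Hypothetical topic base, for illustration only.
topic_terms = {
    "password-policy": ["security", "passwords"],
    "network-encryption": ["security"],
    "backup-schedule": ["backup"],
}
ambiguous = find_ambiguous_terms(topic_terms)
print(ambiguous)
```

Every term this reports is a candidate for a sub-term, whichever documents the topics end up in.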


The exercise of writing good index entries is also helpful when writing topics, as it forces you to really think about what this topic is about, and how it distinguishes itself from all the other topics in the repository.

Tuesday, 3 November 2009

Content independence

Today's post from Tom Johnson of I'd Rather Be Writing talks about "content independence" and how a wiki can help you achieve it.

Tom's argument starts with this observation:
  • Authors need to update documentation after release.
This is because documents have bugs in. They have more bugs in than they might, because authors don't get time to fully check their work before release. OK, so this really means:
  • The time required to create full documentation with no bugs is too much for a project manager to include in the plan.
Why's this? Because the product is ready; we need to release it or risk being behind the market. But we spent a lot of time in QA fixing the bugs in the product; why are bugs in the documentation worth any less to the project manager? I think there are two perceptions at work:
  1. Documentation is not absolutely essential. The product is "good enough", and will work even if nobody knows how to use it. In the worst case, we can send a support engineer over to the customer to make it work for them.
  2. Bugs in documentation are easy to fix. Most problems with documentation don't require a complete rewrite of large chunks of material, whilst bugs in software can sometimes be traced back to bad design assumptions and lead to large refactoring projects.
Also, and because of these two:
  • Fixes to documentation bugs don't need testing. Because of 2., the fix has already been adequately "unit tested" by the writer, which is normally enough, and because of 1., the product won't break if we incorporate it.
And because of this:
  • Documentation doesn't need release management. Project managers allow authors to provide what they can with the release, and fix later.
This is where Tom comes in: working in this environment, he naturally looks to wikis and other online tools to make fixing easier. But there are environments where the core assumptions don't hold, for example:

  • Correct documentation may be a legal or contractual requirement. In some fields, particularly safety-critical ones, the documentation must be correct on the first try. Many clients like to see full lists of changes to documentation as a condition of acceptance. This breaks assumption 1.
  • The supply channels for updates may be limited. In a world of wikis, we forget that people still naturally turn to the materials provided with the product when there is a problem, if only to find out where the wiki is. Also, some products are used in situations where it's impractical to check online for changes to documentation. (This leads to situations like "If your modem is not working, check our website for instructions".) Again, this is assumption 1 failing.
  • Documentation may be widely translated. A small change to a document may require weeks of sending round to multiple translation agencies to implement it. This breaks assumption 2. (Though if you don't mind some translations lagging behind the "native" documentation, you can split the task up so that it's not a problem.)
Ultimately, whether or not documentation should be release managed depends on the product and its position in the market as much as on the documentation itself; but authors often end up making this decision on their own.

In my company, as for many smaller software businesses, I think the assumptions hold, and I'm working at establishing an online knowledge base which is updated independently of the release process. I hope that it won't take me five years to get the idea past IT, though.

Friday, 30 October 2009

Proactive and reactive

Writing the previous post, it occurred to me that I work in two modes most of the time:

  • First, I work with architects and developers, helping a feature get designed for usability, and writing as we go. My work as part of that team is part of the feature development. This is the proactive mode.
  • Later, I step back to view the product as a whole, and reactively describe the feature in that context. Developers can deliver a feature, say, "that's enough", and move on to the next one. I don't have that luxury; people will experience the feature as part of the complete system, and they need to know how to find it and work with it in the context of everything else the system is doing.
The "structured" / "traditional" authoring debate is a struggle between these two modes. Each style pulls in favour of one mode, although we always need both.

  • Structured authoring, as exemplified by DITA, is a proactive mindset. Topics should stand alone and tell people exactly what they need right now. They are authored separately from each other, and pulling a document together is deferred until the last possible moment (and may never happen, in the case of a knowledge base.)
  • Traditional authoring is a reactive mindset. The product is delivered fully-formed and is documented as a unit. In the OpenOffice authoring project authors work on entire chapters, which are organized according to user needs. Despite being an important part of a huge, global open source project, it's entirely divorced from the collaborative, proactive mode which is natural for the developers of the actual product.
Personally, I like being able to work with developers, and to promote design choices which improve usability, and the reuse abilities of structured authoring allow me to maintain, on my own, a much larger information base than I would be able to if I had to manage documents individually.

But it's in the reactive mode that information gets consolidated and where, in the end, the user's experience of the product is decided for better or for worse.

Source control

A big advantage of using structured authoring for me is the ability to use our existing source control system for documentation.

Source control is all about people working simultaneously on a large set of files. An SCM tool is a collaboration tool, not a software development tool. Sure, it's quite a technical one, but SCM tools have evolved to deal with enormous codebases, and thousands of developers working on the same product.

As a software company, our developers work in source control all the time. A team goes off and develops a feature, and comes back, and merges the feature back into the product. That merge process is largely automatic, and once it's done, the feature gets incorporated into the product with no fuss.

I can do exactly the same for the documentation. I don't even have to be present when the feature lands: a lot of the time, the only file that conflicts is the map file, and the SCM system is smart enough to figure it out for itself.

Even if I wasn't at a software company, this idea that you can develop new features separately and easily combine them later is useful enough that I'd probably want to maintain an SCM system of my own to do it with.





* Actually, it doesn't even ask this most of the time, but if you have one file for each topic, it makes merging easier.
