get the facts: Ship XML or HTML? That is the question.

Sean Wheller sean at inwords.co.za
Sun Jun 19 18:39:27 UTC 2005


Recent discussions around help viewers have resulted in much confusion,
miscommunication and division. The disarray is both internal to the Ubuntu
documentation team and external, involving a few influential individuals in
the development team. The root cause of this disarray begins and ends with
me. I made what I now see is a controversial decision to ship Ubuntu
documents as HTML instead of XML. When I made this decision, it was not
without due diligence or consultation with the community, but I now see that
people either have not understood my previous communications or have
forgotten the discussions we have had in the past. Whatever the reason, not
understanding the reasoning behind this decision, people have jumped to all
sorts of conclusions and mixed up a number of issues. As a result, some
members of the Ubuntu Documentation team want to hold a technical board
meeting in order to decide whether Ubuntu documents should be shipped as XML
or HTML. Recent conversation has failed to change the position of these
members.

I am not convinced that we need a technical board meeting to decide this
matter; I would much rather this decision be made within the team. I do not
see the need to invoke structures such as the technical board without good
cause. However, people want to go ahead with a technical board meeting, which
leaves me with little choice but to go with the flow on this one. So here is
a document that, I hope, will convey not only my logic for not wanting a
technical board meeting but also my reasoning for wanting to ship Ubuntu
documents in HTML as opposed to XML.

Before starting I feel that some historical context is required. I would 
therefore like readers to please read the following artifacts (message items) 
and their resulting threads.

1. On 11/01/2005 Sean Wheller posted a Request for Comment (RFC), “Online Help
Systems.” [http://lists.ubuntu.com/archives/ubuntu-doc/2005-January/000944.html]

2. On 19/01/2005 Nick Loeve posted, “format of distributed
docs” [http://lists.ubuntu.com/archives/ubuntu-doc/2005-January/001020.html]

3. On 02/03/2005 Jeff Schering posted, “yelp doesn't do xref or
trademarks” [http://lists.ubuntu.com/archives/ubuntu-doc/2005-March/001291.html]

4. On 04/03/2005 Sean Wheller posted, “To yelp or not to Yelp? that is the
question” [http://lists.ubuntu.com/archives/ubuntu-doc/2005-March/001322.html]

5. On 13/04/2005 Sean Wheller posted, “[wanted] thoughts on repos
structure” [http://lists.ubuntu.com/archives/ubuntu-doc/2005-March/001528.html]

6. On 01/06/2005 Mathew East posted, “Possible documentation
viewer” [http://lists.ubuntu.com/archives/ubuntu-doc/2005-June/002457.html]

Hopefully you have read these artifacts, for they serve to show that this is
not a new discussion but one that has been on the burner since January 2005.
In addition to these artifacts, numerous discussions were held on #ubuntu-doc
on or around the dates listed above. For the purpose of brevity I will not
list them here. A number of communications have also taken place off-list.
For privacy reasons these messages are not admissible here. However, one
artifact definitely worth noting is the Documentation Team Meeting held on
12/03/2005. The summary of this meeting can be reviewed here
[https://wiki.ubuntu.com/DocumentationTeamMeetingSummary3]. Within this
summary I would like to draw your attention to the following topics:

* Open bugs needing to be solved for Hoary

* Desktop-neutral documentation format

The first topic indicates some of the problems encountered at the last minute
in Hoary. The second records the agreement of the Ubuntu Documentation Team
to make “documentation desktop-neutral, at least in the format of the
sources.”

With these artifacts and the agreement of the team in the meeting of
12/03/2005, the decision to create desktop-neutral documentation sources was
accepted. Based on these proceedings I made the decision to ship HTML, the
justification for which I shall now provide.

So what is the problem? Why ship HTML instead of XML?

To answer these questions we need to understand some technical fundamentals 
and build from this understanding:

1. How do Yelp and Docbook XML work?

2. What are the advantages and disadvantages of this approach?

Yelp is a Help Viewer for the GNOME Desktop Environment. As with most
open-source projects, the GNOME Documentation Team uses Docbook XML as the
standard, presentation-neutral format in which to store the user manuals for
applications. Please note I have said “presentation-neutral format.” This is
intentional, since XML was not designed as a format for viewing. Instead it
was designed to separate the concerns of the data and presentation layers.
Unlike HTML, XML does not define how data will be presented. Many may be
confused at this point, for what they see in Yelp is nicely formatted text
complete with fonts, styles and colors. To understand, let's look at what
Yelp does to create this presentation.

First, the Yelp team has developed a set of XSLT files. These files are not
the same as the XSLT files developed by Norman Walsh that form a part of the
Docbook project at SourceForge. The Yelp XSLs (Yelp stylesheets) are compiled
into Yelp. Their job is to transform the Docbook XML files created by the
GNOME Documentation Project into a presentable layout and formatting under
Yelp. So when a user requests the User Manual for a GNOME application, Yelp
reads the XML file and transforms it using the Yelp stylesheets. The
advantage of this approach is that XML is dynamically transformed, at time of
viewing, into a presentational format that users can read. It saves GNOME
developers and authors from having to first transform to a presentational
format such as HTML before packaging and shipping the User Manuals for GNOME.
Other than this advantage, there is really no other benefit to this approach.
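
To make the transformation step concrete, here is a minimal sketch in Python.
It is not how Yelp works internally (Yelp uses its own XSLT stylesheets); it
is a toy stand-in that maps a handful of Docbook elements to HTML using only
the standard library. The sample document, the TAG_MAP table and the function
names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced Docbook fragment (not from any real manual).
DOCBOOK = """<article>
  <title>Foo App Manual</title>
  <sect1 id="intro">
    <title>Introduction</title>
    <para>Foo App does foo.</para>
  </sect1>
</article>"""

# Toy mapping from Docbook element names to HTML element names.
# Real stylesheets cover hundreds of elements; this covers three plus title.
TAG_MAP = {"article": "body", "sect1": "div", "para": "p"}


def to_html(node, depth=0):
    """Recursively copy a Docbook node into an equivalent HTML node."""
    if node.tag == "title":
        # Nesting depth picks the heading level: article title -> h1,
        # sect1 title -> h2.
        html = ET.Element("h%d" % max(depth, 1))
    else:
        # Unknown elements fall back to <span> instead of being dropped.
        html = ET.Element(TAG_MAP.get(node.tag, "span"))
        if "id" in node.attrib:
            html.set("id", node.attrib["id"])
    html.text = node.text
    html.tail = node.tail
    for child in node:
        html.append(to_html(child, depth + 1))
    return html


def render(docbook_source):
    """Parse Docbook text and return the transformed HTML as a string."""
    root = ET.fromstring(docbook_source)
    return ET.tostring(to_html(root), encoding="unicode")


if __name__ == "__main__":
    print(render(DOCBOOK))
```

The point of the sketch is where render() gets called. When XML is shipped,
the equivalent of render() runs on the user's machine every time a document
is opened. When HTML is shipped, it runs once, on the packager's machine.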

While the Yelp approach of dynamically transforming XML into a usable 
presentational format is cool, even the way to go for the long term, it does
have some drawbacks. Please note I do say, “the way to go.” I therefore agree 
that from a technological perspective the direction shown by Yelp is a good 
one. So what's the problem? I seem to be agreeing with Yelp and supporting 
the case to ship XML.

There is no single problem, but rather a collection of problems.

The first problem is a simple one, but nevertheless a complaint raised by
users. In order for Yelp to make a document presentable it must transform the
document. This means that as a user moves between documents, Yelp must
transform and then render each target document before it can be read. The
result is slow performance that many users on slow computers find very
annoying, a less than favorable user experience and an eventual reluctance to
use the help system.

Next problem. The Yelp stylesheets do not support all Docbook elements and
their organization as defined by the Docbook Document Type Definition (DTD).
For the record, the DTD is a standard managed by the OASIS Docbook Technical
Committee. The problem here is that authors cannot use Docbook as it is
defined in the DTD. As a result, many features are not available to an
author.

Why is it this way? Well, the fact is that “Yelp is a Help Viewer for the
GNOME Desktop Environment.” Docbook is large and very powerful, and the Yelp
Developers and the GNOME Documentation Team have neither needed nor wanted to
use all the features of Docbook. So they have focused their attention on
developing support for those features and functions required by GNOME. This
approach is totally understandable, as it would take an enormous effort for
anyone to develop full Docbook compatibility. The Yelp developers are
therefore focused on GNOME and do not have the capacity to accommodate the
needs and wants of every project.

While I understand and agree with the reasoning of the GNOME Documentation
Team and the Yelp Developers not to support all of the Docbook standard, it
does not help solve the problems and wants of people at ubuntu-doc. I have
not listed the unsupported things here as they are discussed in the artifacts
listed above, in particular the artifact of 01/06/2005, Mathew East,
“Possible documentation viewer.”

Next problem. In addition to not implementing full support for Docbook, the
GNOME implementation also uses three methods that are proprietary to Yelp.
The first is a processing instruction that determines the level to which the
table of contents will be expanded or collapsed in the tree view pane of the
Yelp workspace. The instruction looks like this:

	<?yelp:chunk-depth 3?>

The second method proprietary to Yelp is its implementation of inter-document
cross references. To create external cross references GNOME has used the
xref element, which looks like this:

	<xref linkend=""/>

This is a standard and valid Docbook element. However, in order to create an 
inter-document reference Yelp uses the following syntax in the value of the 
linkend attribute.

	<xref linkend="ghelp:foo-app"/>

This means that the xref will only work in the runtime environment that is 
Yelp.

The third method used by Yelp is not so much proprietary as a way of forcing
authors to comply with Yelp's requirements for generating a toc. Every
chapter or sect* node in the document must declare an id attribute in order
for it to be displayed in the toc. When an id attribute is missing, the node
is simply not displayed. On the face of it this is not a big problem, but it
does force authors to add id attributes that nothing other than Yelp
requires.
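
As an illustration of what this requirement means for authors, the following
Python sketch lists the chapter and sect* nodes that carry no id attribute
and would therefore be missing from the toc under the behaviour described
above. The sample document and the function name are hypothetical; only the
standard library is used.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: one sect1 and one chapter have no id attribute.
SAMPLE = """<book>
  <chapter id="install">
    <title>Installing</title>
    <sect1><title>From CD</title></sect1>
    <sect1 id="network"><title>From the network</title></sect1>
  </chapter>
  <chapter><title>Troubleshooting</title></chapter>
</book>"""

# Element names treated as toc entries: chapter, sect1, sect2, ...
TOC_NODE = re.compile(r"^(chapter|sect\d)$")


def nodes_missing_id(source):
    """Return (tag, title) pairs for toc nodes without an id attribute."""
    root = ET.fromstring(source)
    missing = []
    for node in root.iter():  # document order
        if TOC_NODE.match(node.tag) and "id" not in node.attrib:
            missing.append((node.tag, node.findtext("title", default="")))
    return missing


if __name__ == "__main__":
    for tag, title in nodes_missing_id(SAMPLE):
        print("no id on <%s>: %s" % (tag, title))
```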

The overall effect of these problems is that they create incompatibility.
First, the XML is Yelp-specific, and second, if you were to try to transform
it to another format the xrefs would not work. The processing instruction
also needs to be commented out in order to validate the document. The use of
the ghelp feature also defines a namespace that is not part of the Docbook
XML standard. In order for it to work, document declarations must be expanded
to include the new namespace.
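
The xref problem can be shown with a short check. This Python sketch, using
only the standard library, sorts the xref elements of a document into those
that point at an id inside the same document, those that use the ghelp:
prefix and so only resolve inside Yelp, and those that point nowhere. The
sample document and the report keys are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one internal xref and one Yelp-style xref.
SAMPLE = """<article>
  <sect1 id="intro">
    <para>See <xref linkend="setup"/> and <xref linkend="ghelp:foo-app"/>.</para>
  </sect1>
  <sect1 id="setup"><para>Setup steps.</para></sect1>
</article>"""


def check_xrefs(source):
    """Classify every xref by whether its linkend resolves in this document."""
    root = ET.fromstring(source)
    # Every id declared anywhere in the document is a valid internal target.
    ids = {n.attrib["id"] for n in root.iter() if "id" in n.attrib}
    report = {"ok": [], "yelp_only": [], "dangling": []}
    for xref in root.iter("xref"):
        target = xref.get("linkend", "")
        if target.startswith("ghelp:"):
            report["yelp_only"].append(target)  # resolves only inside Yelp
        elif target in ids:
            report["ok"].append(target)
        else:
            report["dangling"].append(target)
    return report


if __name__ == "__main__":
    print(check_xrefs(SAMPLE))
```

Any tool other than Yelp that processes the sample would see ghelp:foo-app as
a linkend with no matching id, which is why such references do not survive a
transformation to another format.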

Next problem. When working solely in a GNOME environment, the issues
discussed so far are not a problem, but what happens when you have more than
one desktop environment to cater for? I remind you that Yelp is a Help Viewer
for the GNOME Desktop Environment. Yelp-compliant XML is not supported by KDE
or any other desktop environment. So how does KDE do it? Well, while they
also store documents in Docbook XML, they ship HTML. This means all of the
KDE help is available in any application capable of rendering HTML. Oh, did I
mention that Yelp 2.10 can render HTML? I think it was mentioned in one of
the artifacts presented earlier. It is worth noting that in KDE the Help
Viewer is the KHelpCenter. It does not have the capability to dynamically
transform Docbook XML into a presentational format. Interestingly enough, in
KDE an application's help can be accessed in more ways than just
KHelpCenter. For example with Konqueror, the KDE file manager, you can use
kioslaves to call a user manual using the syntax help:foo-app.

In addition to the fact that Docbook is a standard for doing documentation in
open-source projects, there are a number of reasons why it is good to use
Docbook. I will not list them all here; I refer only to features we can use
at ubuntu-doc and explain in short the benefits to us.

The first is content reuse. With Docbook as an XML format we can easily
reuse or re-purpose content. So, for example, content that is the same
between Ubuntu and Kubuntu can be easily shared between documents. Sometimes
the node being reused may contain a cross-reference between documents. If the
ghelp method were used, we would not be able to use the content in this way.
Another example is where documents are so much the same that it is worthwhile
managing both GNOME and KDE versions in the same XML instance. In this case
we use profiling. Profiling entails marking XML nodes so that, at time of
processing, more than one variant of the document can be output depending on
the profile selected. Yelp has no support for this feature; as a result we
would have to duplicate content and maintain the duplicates over time.
Updating information is then not just a case of changing something in one
place, but of seeking out each instance of that something and updating it.
Over time this overhead can be very time consuming.
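
For readers unfamiliar with profiling, here is a minimal sketch of the idea
in Python, using only the standard library. It assumes the Docbook condition
attribute is the marker and that several values may be separated by a
semicolon; which attribute and separator ubuntu-doc actually uses is not
stated here, so treat both as assumptions. The sample text is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical single source holding both the GNOME and the KDE variant.
SOURCE = """<article>
  <para>Open the file manager.</para>
  <para condition="gnome">Nautilus opens.</para>
  <para condition="kde">Konqueror opens.</para>
</article>"""


def _prune(node, wanted):
    """Remove children whose condition attribute excludes the profile."""
    for child in list(node):  # copy the list: we remove while iterating
        cond = child.get("condition")
        if cond is not None and wanted not in cond.split(";"):
            node.remove(child)
        else:
            _prune(child, wanted)


def profile(source, wanted):
    """Return the document tree for one profile, e.g. 'gnome' or 'kde'."""
    root = ET.fromstring(source)
    _prune(root, wanted)
    return root


if __name__ == "__main__":
    for name in ("gnome", "kde"):
        print(name, [p.text for p in profile(SOURCE, name).iter("para")])
```

The shared first paragraph is written once and appears in both outputs, which
is the maintenance saving described above.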

Moving on. We need to ask the question, “What are the advantages and
disadvantages of shipping HTML?” Again, to do this we need to look at the
technical issues.

1. The disadvantage of shipping HTML is that you must transform to HTML before
creating a package. This is the only disadvantage I can see.

2. There are many advantages to shipping HTML.

	a) Rendering is fast as there is no need for transformation from XML to HTML.
	b) Ability to customize formatting using CSS.
	c) Ability to customize layout and interactivity.
	d) Ability to be viewed with multiple user agents, including Yelp.

From a payload perspective, the difference in size between an XML package and
an HTML package is negligible.

Having said all this, let me now put technical issues aside and address why
I feel that a technical board should not be invoked in making the choice to
ship HTML. I do ask, though, that one not forget the technical explanations I
have provided and that the artifacts presented also be kept in mind.

My primary problem with invoking a technical board is that while I have
provided convincing, technically founded arguments to justify my reasoning
for creating desktop-neutral source formats and shipping HTML in order to
ensure maximum interoperability between desktop environments and user agents,
I have yet to receive an opposing argument of the same nature. As this is the
case, I feel that no evidence has been presented that would lead me to
reconsider my position prior to invoking a technical board hearing. Since an
opposing argument has not been presented, I feel that the technical board has
nothing to compare and therefore no basis on which to make an educated
decision between XML and HTML.

Further to this, I am concerned that a technical board would be composed of
Ubuntu members. It is no secret that Ubuntu and its community members are
extremely pro-GNOME. My fear is therefore that religious passion and a
tendency to protect GNOME interests, combined with a lack of technical
understanding and depth of perspective on the subject, will impair a
technical board's ability to make an informed and fair judgment on this
matter.

In closing I would like to make the following points:

1. My proposal is not without history or consultation with the community. I
have therefore not made a unilateral decision.

2. My proposal is for ubuntu-docs, not gnome or kde docs. For the purpose of
clarity these terms are used as follows:

* ubuntu documents (ubuntu docs) – documents that are specific to ubuntu or 
kubuntu. These documents are not developed nor maintained upstream and have 
no place moving upstream.

* gnome documents (gnome docs) – documents that are specific to a gnome 
application. These documents may be developed at ubuntu-doc but will move 
upstream when the gnome application does.

* kde documents (kde docs, kdocs) – documents that are specific to a KDE 
application. These documents may be developed at ubuntu-doc but will move 
upstream when the KDE application does.

3. While there has been discussion regarding various help viewer applications,
my proposal is aimed toward “any browser” compatibility. Since Yelp can
render HTML, I am not advocating the elimination of Yelp as a tool for viewing
ubuntu-docs.

4. I have made mention of creating a web-based application. This has been
incorrectly interpreted as a proposal to build a new help viewer agent
application. For the record, the term web-based application was used to
describe a help system based on HTML, employing CSS for formatting and
possibly JavaScript to implement interactive functionality within pages.

* web-based application – an application that is delivered over the Internet.
Typically browser-based. http://en.wikipedia.org/wiki/Browser-based

Understanding of these terms and points is important and I believe will go a 
long way to dispelling part of the current confusion.

In light of the current situation I have announced in private to core members
of the Ubuntu Documentation team that I am reconsidering my position within
the team and the project as a whole. I feel that there have been several
instances where I have embarked on initiatives in service of the project and
they have been a total waste of time, because people external to the team
have chosen to completely disregard the communication and collaboration that
had taken place in order to start such initiatives. My feeling is that such
people have not taken the time to understand or ask about the history and
reasoning behind such initiatives. Instead they have taken the influence of
their position or standing in the community as a warrant to oppose.

By the same token, it concerns me that the Ubuntu Documentation team has not
rallied to support such initiatives and has, I think, based on the standing
or influence of such individuals, decided to side with them. Such abandonment
of historical team decisions is concerning but, more than that, it is
disruptive to the vision and direction of the team.

In such circumstances I find myself, once again, feeling like I have been full
gas in neutral for the past few months. I do not wish to be in this position
now or in the future and have therefore decided to take a break from
activities until such time as a resolution is found to the current situation.
The direction in which the decision to ship XML or HTML is made will help me
decide whether or not I continue to be an active member of the Ubuntu
Documentation Team. I wish to stress that this is not blackmail. The
community has the right to decide democratically on the technical direction
and vision of the documentation project. However, should the project decide
to ship XML, a decision which demands Yelp compatibility, then my work of the
past half year will have been a waste of my time. I personally do not wish to
undo this work. I also do not wish to be put in a position where I feel that
I am having to settle for a lesser solution than the one I know is possible.

I hope this document has explained my actions and position on the topic. I
welcome discussion and questions. If the community still feels the need to
invoke a technical board, I shall respect that decision and, in the event of
a motion in favor of shipping XML, I trust that the community will respect my
choice to abstain from the project.

Sincerely,

-- 
Sean Wheller