discuss: HTML-to-docbook


Previous by date: 5 Aug 2001 01:24:40 -0000 HTML-to-docbook, David Lawyer
Next by date: 5 Aug 2001 01:24:40 -0000 Re: HTML-to-docbook, Poet/Joshua Drake
Previous in thread: 5 Aug 2001 01:24:40 -0000 HTML-to-docbook, David Lawyer
Next in thread: 5 Aug 2001 01:24:40 -0000 Re: HTML-to-docbook, Poet/Joshua Drake

Subject: Re: HTML-to-docbook
From: Sandy Harris ####@####.####
Date: 5 Aug 2001 01:24:40 -0000
Message-Id: <3B6CA075.9671EB14@storm.ca>

David Lawyer wrote:

> You know that a presentation-based markup like HTML can't convert into
> Docbook.

HTML was designed as a structure-tagging system where the browser handles
the presentation. See W3C's Guidelines for authoring:
http://www.w3.org/MarkUp/#guidelines

> Suppose the author has used a certain
> color-coding/font-coding for content.  The converter will not
> understand most of this.  So you may have a converter but it will be
> very lossy.

Good.

A section of those guidelines is titled "Font tag considered harmful".
I've had to write sed scripts to remove font tags and other rubbish 
from HTML. Dropping those things gives you better HTML.

Dropping them during a translation really is a feature, not a bug.
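
As a rough illustration of that kind of cleanup, here is a minimal
sketch in Python rather than sed. The function name and sample input
are made up for this example, and the regular expression assumes no
attribute value contains a literal ">" character.

```python
import re

# Matches opening <font ...> and closing </font> tags, case-insensitively.
# Assumption: no attribute value contains a literal ">" character.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(html: str) -> str:
    """Remove <font> tags but keep the text they wrapped."""
    return FONT_TAG.sub("", html)


sample = '<p>Press <FONT color="red" size="+1">Enter</FONT> to continue.</p>'
print(strip_font_tags(sample))  # prints: <p>Press Enter to continue.</p>
```

The structural markup (the paragraph) survives; only the presentation
is lost.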

> Look at the docbook tags and tell me what the equivalents
> are in HTML.  Most of them have no HTML equivalent and thus can't be
> mapped from HTML to docbook.  Of course it can be done the other way
> around (DocBook->HTML) by selecting a mapping from content to
> presentation.

You have it backwards. DocBook tags that have no HTML equivalent are
not needed in an HTML->DocBook mapping. The real problem would be a
useful, and therefore structural, HTML tag for which there was no
reasonable DocBook equivalent.
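
To make the direction of the mapping concrete, here is a small Python
sketch. The table is an illustrative and deliberately incomplete
assumption of mine, not a vetted mapping, and the class and function
names are hypothetical. It lists the HTML tags in a document that the
table does not cover, which are the only ones a translator would need
to worry about.

```python
from html.parser import HTMLParser

# Illustrative, incomplete HTML -> DocBook table (an assumption for this sketch).
HTML_TO_DOCBOOK = {
    "p": "para",
    "ul": "itemizedlist",
    "ol": "orderedlist",
    "li": "listitem",
    "pre": "programlisting",
    "em": "emphasis",
    "code": "literal",
    "blockquote": "blockquote",
}


class UnmappedTagFinder(HTMLParser):
    """Collects the names of HTML tags that have no entry in the table."""

    def __init__(self):
        super().__init__()
        self.unmapped = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag not in HTML_TO_DOCBOOK:
            self.unmapped.add(tag)


def unmapped_tags(html):
    """Return a sorted list of tags in html that the table cannot translate."""
    finder = UnmappedTagFinder()
    finder.feed(html)
    finder.close()
    return sorted(finder.unmapped)


print(unmapped_tags("<p>Use <code>ls</code> or <blink>this</blink>.</p><hr>"))
# prints: ['blink', 'hr']
```

DocBook elements that never appear on the right-hand side of the table
simply go unused.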

Another problem is missing close tags in HTML. Most browsers are
quite tolerant of this, but it would be hard to write a translator
that coped well in all cases. An easy solution is to run the HTML
through Amaya (the open source browser/editor from the W3C) first.
Amaya parses the document, inserts the missing tags, and can save
the result as HTML or XHTML.
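
For a sense of what inserting the missing tags involves, here is a
minimal Python sketch using only the standard library. It is not what
Amaya does internally and is no substitute for a real HTML parser. It
handles just one narrow case, an unclosed <p> or <li> followed by
another of the same kind or by the end of its parent, and it drops
comments and doctype declarations. All names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take an end tag.
VOID = {"br", "hr", "img", "input", "meta", "link"}
# Elements whose end tag is often omitted; a new one closes the open one.
IMPLICIT_CLOSE = {"p", "li"}


class TagCloser(HTMLParser):
    """Re-emits HTML, adding end tags for elements left open."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []    # output fragments
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in IMPLICIT_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it
        # Close everything opened inside this element, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def close_open_tags(html):
    """Return html with end tags added for elements that were left open."""
    parser = TagCloser()
    parser.feed(html)
    parser.close()
    while parser.stack:  # close whatever is still open at end of input
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


print(close_open_tags("<ul><li>one<li>two</ul><p>first<p>second"))
# prints: <ul><li>one</li><li>two</li></ul><p>first</p><p>second</p>
```

Even this toy version needs a stack of open elements, which is why
handing the job to an existing parser is the easier route.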

Of course there may well be some manual work to do on the DocBook
text after translation, but the bulk of a conversion should be
automatable.



  ©The Linux Documentation Project, 2014. Listserver maintained by dr Serge Victor on ibiblio.org servers.