discuss: automation cleanup of source LDP tree (informational)
Subject:
automation cleanup of source LDP tree (informational)
From:
"Martin A. Brown" ####@####.####
Date:
27 Jan 2016 01:11:23 +0000
Message-Id: <alpine.LSU.2.11.1601261656560.2025@znpeba.jbaqresebt.arg>
Hello,
> 1. automation: Be able to (re-)process and (re-)publish all of
> our existing documentation in an automated fashion.
This is a description of work I have already accomplished and
committed to my own git repository.
Automation cleanup (source):
----------------------------
Many of the documents at git HEAD [0] in our main LDP/howto tree
sport validation errors when processed with toolchains running on
modern Linux releases (i.e., OpenSUSE-13.2 and Ubuntu-14.04.3). I
have a (local) git repository with hundreds of corrections to
source files in all formats (Linuxdoc, DocBook SGML and DocBook
XML).
I would characterize these corrections as non-editorial--i.e. they
are technical only, to allow each document to validate and to allow
the processor to generate outputs.
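For illustration only (this is not the tooling used for the commits described here), a minimal Python sketch of the kind of check that exposes a missing closing tag. It tests XML well-formedness only, which is weaker than validating against the DocBook DTD, and it does not apply to the SGML or Linuxdoc sources. The function name and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical fragments: the second one is missing </para>.
good = "<article><sect1><title>Intro</title><para>Hello</para></sect1></article>"
bad = "<article><sect1><title>Intro</title><para>Hello</sect1></article>"

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

A real document with a DOCTYPE and named entities would need a validating parser; this sketch only shows where an unclosed element gets reported.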
The only substantive change I have made in the cleanups is to move
any <graphic/>, <mediaobject/>, or <inlinemediaobject/> images into
an ./images/ directory, which is copied to the HTML (output) tree.
Otherwise, images are not visible in the output. Not desirable.
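As a rough sketch of that copy step, assuming each document lives in its own source directory with an images/ subdirectory and has a matching output directory (the function name and the paths are hypothetical, not taken from the repository):

```python
import shutil
import tempfile
from pathlib import Path


def copy_images(source_dir, output_dir):
    """Copy source_dir/images into output_dir/images, if it exists.

    Returns the destination path, or None when the document has no
    images directory.
    """
    src = Path(source_dir) / "images"
    dst = Path(output_dir) / "images"
    if not src.is_dir():
        return None
    # dirs_exist_ok (Python 3.8+) lets a re-run overwrite an earlier copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Small self-contained demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "Example-HOWTO"
    out = Path(tmp) / "html" / "Example-HOWTO"
    (doc / "images").mkdir(parents=True)
    (doc / "images" / "diagram.png").write_bytes(b"placeholder")
    out.mkdir(parents=True)
    copied = copy_images(doc, out)
    print(sorted(p.name for p in copied.iterdir()))
```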
My cleanup changes (about 200 commits) are at:
https://github.com/martin-a-brown/LDP
Since I doubt anybody wants to read through the entire git log,
here's a shorter description of the various classes of changes that
I have made to the individual documents:
* adding countless closing tags, such as </sect1>, </sect2>,
</sect3>, </listitem>, </para>, </varlistentry>
* switching to entities for reserved characters, e.g. & to &amp;,
<> to &lt;&gt;, [] to &lsqb;&rsqb;, etc. (particularly where
people had left email addresses in angle brackets)
* renaming files containing XML from stem.sgml to stem.xml
* character set encoding; using entities in ASCII, converting to
Unicode with Byte Order Mark (BOM) where possible
* corrections to many DOCTYPE definitions
* "upgrading" DocBook versions when authors used elements or
features from a newer DocBook standard (e.g. 3
* substituting dash for underscore in the id attribute ([open]jade
refuses _ in id=)
* committing converted image files (e.g. eps) to the repo for
documents (processors do not generate them on the fly; did they
use to?)
* adding XML/SGML comment closures -->, where accidentally
omitted; removing stray '--' which was confusing SGML/XML
processors
* wrapping large blocks of <programlisting/> code with
<![CDATA[]]>
* replacing non-DocBook XML elements with DocBook equivalents,
e.g. <xlink:href/> becomes <ulink/>; replacing HTML elements
<a href=""> with <url url=""> in Linuxdoc documents
* removing extra (and sometimes empty) tags which confused the
processor
* and probably many other small errors that jade or xsltproc
complained about...
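A few of the classes above are mechanical enough to sketch in code. The following Python is a hedged illustration, not the scripts behind the commits (the corrections may well have been made by hand); all function names are hypothetical. It covers escaping reserved characters, swapping underscores for dashes in id attributes, and wrapping code in CDATA.

```python
import re
from xml.sax.saxutils import escape

# Matches id="..." and linkend="..." so cross-references stay in sync.
# Assumption: attribute values use double quotes.
ID_ATTR = re.compile(r'\b(id|linkend)="([^"]*)"')


def escape_reserved(text):
    """Escape &, < and > in character data (brackets are not handled)."""
    return escape(text)


def dash_ids(markup):
    """Replace underscores with dashes inside id/linkend attribute values."""
    def fix(match):
        name, value = match.group(1), match.group(2)
        return '{}="{}"'.format(name, value.replace("_", "-"))
    return ID_ATTR.sub(fix, markup)


def wrap_cdata(code):
    """Wrap code in a CDATA section, splitting any embedded ']]>'."""
    return "<![CDATA[" + code.replace("]]>", "]]]]><![CDATA[>") + "]]>"


print(escape_reserved("Jane Doe <jane@example.org> & friends"))
print(dash_ids('<sect1 id="intro_part_one">'))
print(wrap_cdata("if (a < b && c) { run(); }"))
```

Note that escape_reserved is meant for text content only; running it over a whole file would also escape the markup itself.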
I will observe that the vast majority of these corrections were on
DocBook (both SGML and XML) files.
Several Linuxdoc files also required adding missing tags, correcting
a few tag names and even fixing a few entities. I guess
that earlier SGML processors (or their operating configurations)
were more forgiving of many of these errors.
This message covers only the cleanup needed in the source tree.
There is separate work for the cleanup of the output tree, lots of
old documents that maybe should be archived, etc.
-Martin
[0] https://github.com/martin-a-brown/LDP
--
Martin A. Brown
http://linux-ip.net/