Subject:
The metadata question was [ progress update: 1. automation of
source to output publication]
From:
"Martin A. Brown" ####@####.####
Date:
6 Mar 2016 03:13:29 +0000
Message-Id: <alpine.LSU.2.11.1603051804520.19013@znpeba.jbaqresebt.arg>
Good evening again David,
Items covered:
1. Lampadas
2. Document metadata (scrollkeeper/OMF, DublinCore)
3. Open question, what metadata are important to TLDP?
>> Automation software
>> -------------------
>> I started by writing something in shell, but it quickly got rather
>> unwieldy and tangled. Therefore, I switched to Python, even though,
>> many of the core features are called out in other programs (like
>> 'sgml2html', 'xsltproc', and 'html2text').
>
>I'm a little concerned that there might be some duplication of
>effort here.
Yes, there has most definitely been duplication of effort.
Item #1: Lampadas
-----------------
>The Lampadas project for LDP was to use Plone.
For as long as I have been lurking on (and contributing to) the LDP
mailing list (since late 2002), Lampadas has been mentioned as
in-progress, abandoned or defunct.
To my knowledge, no TLDP volunteer has since been willing to pick up
(or capable of picking up) the lampadas (Lampadas?) project. So,
that's why we are where we are today.
I suspect that Lampadas is dead software.
The Plone project lives on and seems to be thriving [0]. I looked
at what others have contributed to the Plone project having anything
to do with DocBook (since that is more widely known than the other
documentation formats supported by TLDP). The result is not
inspiring: only one Plone add-on, called 'collective.transform', which
handles bi-directional HTML <-> DocBook XML transformation.
If there were a volunteer who knew and understood Plone, s/he might
be able to resurrect Lampadas. If there were somebody who knew and
understood Lampadas, s/he could learn Plone.
>When looking over the lampadas folder bear in mind that it likely
>contains python code for the non-Plone version of lampadas which
>was rejected for incomplete object persistence (I only have a vague
>idea of what this term means).
I have a fairly good idea of what that means and I can see why this
might pose a problem for an upstream project reviewing a patch or
submission. Essentially, the upstream was probably saying:
Dude, you need to track anything for which you are assuming
responsibility in the module you are writing.
From a practical, software-development perspective, it is very
difficult to jump into two unknown software projects with nobody to
guide the new person as to the rationale for the project's
existence. With David Merrill's departure, there's little knowledge
to transfer about Lampadas itself (and why, specifically, Plone).
So, let's rewind the discussion to the fundamental purpose behind
the tool....
Item #2: Document metadata (scrollkeeper/OMF, DublinCore)
----------------------------------------------------------
>If we were to use Plone, then it also provides for publishing LDP
>docs and in addition has metadata.
TLDP's biggest lack, historically, has been some sort of metadata
management tool.
This kind of document metadata is the sort of record that
transcends the memories of individuals and allows smoother
coordination across many timezones and people. Why else do we have
computers?
Earlier this year, when I was familiarizing myself with TLDP's
history (I read all of the mailing list that I had access to),
background (I read everything on the tldp.org site) and production
software (I read Greg Ferguson's scripts), I learned intimately
about this metadata lack.
For us, metadata management has been a hard problem. This may be
related to our distributed nature, our volunteer composition, our
divergent expectations or, perhaps, merely technical shortcomings.
I'm still not sure which of the above, if any, are the primary
reasons why we still have a metadata management problem, but we
still feel this lack.
>The metadata part per D. Merrill was created for LDP by ibiblio
>(based on DublinCore) and is known as Open Metadata Framework = OMF
>(not to be confused with the OMF video game). This OMF is used by
>Gnome, etc.
<digression fork="OMF">
I learned about OMF (Open Metadata Framework) earlier this year. I
get the impression that it is stone-cold dead. The implementation
(in C) that we used was scrollkeeper [1], last updated in 2003. It
seems to have been in use by both GNOME and TLDP, but after 2002 (or
so), a piece of software called rarian [2] (last updated in 2008)
seems to have supplanted scrollkeeper.
Why did scrollkeeper die? I was paying attention to other things at
the time, but in trying to reconstruct what happened, I conclude
that OMF was too much of a niche language. In the mid-2000s, there
was widespread adoption of the more general RDF [3] data model,
usually serialized as XML (later also RDFa [4]), which is also
capable of expressing DublinCore [5] (to which you alluded). Around
the same time, microformats [6] appeared as a lighter-weight way to
embed metadata in HTML.
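
To make the DublinCore-in-RDF idea a little more concrete, here is a
minimal sketch in Python (standard library only) that writes a few
DublinCore elements as RDF/XML. The document URI, the field values and
the function name dublin_core_record are all made up for illustration;
this is not something TLDP produces today.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_record(about, fields):
    """Return an RDF/XML string describing one document.

    `about` is the URI of the document; `fields` maps Dublin Core
    element names (title, creator, date, ...) to plain-text values.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description")
    desc.set(f"{{{RDF_NS}}}about", about)
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical document and values, for illustration only.
record = dublin_core_record(
    "https://example.org/HOWTO/Example-HOWTO/",
    {"title": "Example HOWTO", "creator": "A. N. Author", "date": "2016-03-06"},
)
print(record)
```

A library such as rdflib would probably be the better choice for real
use; the point here is only that a DublinCore description is a small,
regular structure.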
</digression>
The scrollkeeper software (OMF data structure) addressed one problem
quite well: description of document content, relationships and
linking. [I'm not certain we ever used scrollkeeper in the
production workflow. I just can't quite tell. It looks like we
didn't even though it was in the software repository.]
But scrollkeeper/OMF did not address the problem of "who is
reviewing this document," "when was the author last contacted," or
"who reviewed this thing, anyway?"
Item #3: Open question, what metadata are important to TLDP?
------------------------------------------------------------
With all of the above said, I think I'm going to stop here with one
final question.
What metadata are important to TLDP?
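
As one possible starting point, the three editorial questions that
scrollkeeper/OMF did not address could be captured in a very small
record. The following Python sketch is only an illustration: the class
name, the field names and the sample documents are all hypothetical,
and the one-year threshold is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ReviewRecord:
    """Editorial state for one document (all field names are hypothetical)."""
    document: str
    current_reviewer: Optional[str] = None        # who is reviewing this document
    author_last_contacted: Optional[date] = None  # when was the author last contacted
    past_reviewers: List[str] = field(default_factory=list)  # who reviewed it before


def needs_author_contact(records, today, max_days=365):
    """Return documents whose author was never contacted, or not recently."""
    stale = []
    for rec in records:
        last = rec.author_last_contacted
        if last is None or (today - last).days > max_days:
            stale.append(rec.document)
    return stale


# Made-up sample data, for illustration only.
records = [
    ReviewRecord("Example-HOWTO", "reviewer-a", date(2015, 12, 1), ["reviewer-b"]),
    ReviewRecord("Another-HOWTO"),
    ReviewRecord("Old-HOWTO", author_last_contacted=date(2010, 5, 17)),
]
print(needs_author_contact(records, today=date(2016, 3, 6)))
```

That covers only three fields, of course; the broader question remains
open.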
I think this will take time to answer properly, and then we should
(re-)examine the tools that are available today and see if any of
them can help us with the metadata questions.
We may find that the wiki (that we already have) is "good enough".
We may find that we want to use comment strings checked into each
document, and that will be "good enough".
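
To show what the comment-string option might look like, here is a
small Python sketch that reads metadata out of comments in a
document's source. The "tldp-meta KEY: VALUE" convention is purely an
assumption invented for this example; no such convention exists in the
LDP repository today, and the key names are hypothetical.

```python
import re

# Hypothetical convention: one "tldp-meta KEY: VALUE" pair per XML/SGML comment.
META_COMMENT = re.compile(r"<!--\s*tldp-meta\s+([\w-]+)\s*:\s*(.*?)\s*-->")


def read_comment_metadata(source_text):
    """Collect key/value pairs from metadata comments in a document's source."""
    return {key: value for key, value in META_COMMENT.findall(source_text)}


# Made-up document source, for illustration only.
sample = """<?xml version="1.0"?>
<!-- tldp-meta reviewer: reviewer-a -->
<!-- tldp-meta author-last-contacted: 2015-12-01 -->
<article>
  <title>Example HOWTO</title>
  <!-- an ordinary comment, ignored -->
</article>
"""
print(read_comment_metadata(sample))
```

The appeal of this approach is that the metadata travels with the
document under version control; the cost is that every tool has to
agree on the comment format.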
We may find that we actually need a Content Management System (CMS).
But it has been approximately 15 years since we had this
discussion, so maybe it is time to have it again.
In the meantime, I'll continue finishing the work on the tool I have
written to automate publication of documents from the LDP repository
while thinking about the metadata problem, taking notes on what
people suggest on this mailing list and reading what I can about any
technologies (e.g. RDFa, DublinCore) and tools (e.g. Plone, rdflib)
that people may propose.
Best regards,
-Martin
[0] https://plone.org/
[1] https://sourceforge.net/projects/scrollkeeper/files/
[2] https://rarian.freedesktop.org/
    https://rarian.freedesktop.org/Releases/
[3] https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
[4] https://www.w3.org/TR/xhtml-rdfa-primer/
    https://rdfa.info/
[5] http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#
[6] http://microformats.org/wiki/faqs-for-rdf
--
Martin A. Brown
http://linux-ip.net/