discuss: Thread: The metadata question was [ progress update: 1. automation of source to output publication]

Subject: The metadata question was [ progress update: 1. automation of source to output publication]
From: "Martin A. Brown" ####@####.####
Date: 6 Mar 2016 03:13:29 +0000
Message-Id: <alpine.LSU.2.11.1603051804520.19013@znpeba.jbaqresebt.arg>
Good evening again David,

Items covered:

  1. Lampadas
  2. Document metadata (scrollkeeper/OMF, DublinCore)
  3. Open question, what metadata are important to TLDP?

>> Automation software
>> -------------------
>> I started by writing something in shell, but it quickly got rather 
>> unwieldy and tangled.  Therefore, I switched to Python, even though 
>> many of the core features are handled by calling out to other programs (like 
>> 'sgml2html', 'xsltproc', and 'html2text').
>
>I'm a little concerned that there might be some duplication of 
>effort here.

Yes, there has most definitely been duplication of effort.


Item #1: Lampadas
-----------------

>The Lampadas project for LDP was to use Plone. 

Ever since I began lurking on (and contributing to) the LDP mailing 
list in late 2002, Lampadas has been described as in-progress, 
abandoned or defunct.

To my knowledge, no TLDP volunteer has since been willing to pick up 
(or capable of picking up) the lampadas (Lampadas?) project.  So, 
that's why we are where we are today.

I suspect that Lampadas is dead software.

The Plone project lives on and seems to be thriving [0].  I looked 
for Plone add-ons contributed by others that have anything to do 
with DocBook (since that is more widely known than the other 
documentation formats supported by TLDP).  The result is not 
inspiring: there is only one Plone add-on, 'collective.transform', 
which handles bi-directional HTML <-> DocBook XML transformation.

If there were a volunteer who knew and understood Plone, s/he might 
be able to resurrect Lampadas.  If there were somebody who knew and 
understood Lampadas, s/he could learn Plone.

>When looking over the lampadas folder bear in mind that it likely 
>contains python code for the non-Plone version of lampadas which 
>was rejected for incomplete object persistence (I only have a vague 
>idea of what this term means). 

I have a fairly good idea of what that means and I can see why this 
might pose a problem for an upstream project reviewing a patch or 
submission.  Essentially, the upstream was probably saying:

  Dude, you need to track anything for which you are assuming 
  responsibility in the module you are writing.
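For readers who, like David, have only a vague idea of the term, here is a minimal Python sketch of one common way persistence ends up incomplete. It uses the standard `shelve` module and a hypothetical document key `Some-HOWTO`; it illustrates the general concept only and is not a reconstruction of what Lampadas did or why it was rejected.

```python
import os
import shelve
import tempfile

# Illustration only: a generic "incomplete persistence" pitfall using
# Python's standard shelve module. This is NOT how Lampadas stored data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs")

    with shelve.open(path) as db:  # writeback defaults to False
        db["Some-HOWTO"] = {"title": "Some HOWTO"}
        # This mutates a temporary unpickled copy, so the change is lost.
        db["Some-HOWTO"]["reviewer"] = "alice"
        lost = "reviewer" not in db["Some-HOWTO"]

        # Correct pattern: fetch, modify, then assign back to the shelf.
        record = db["Some-HOWTO"]
        record["reviewer"] = "alice"
        db["Some-HOWTO"] = record

    # Reopen the shelf to confirm what actually reached disk.
    with shelve.open(path) as db:
        saved = dict(db["Some-HOWTO"])

print(lost)   # True
print(saved)  # {'title': 'Some HOWTO', 'reviewer': 'alice'}
```

The module "forgot" a change to an object it had handed out, which is the flavour of problem an upstream reviewer would object to.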

From a practical, software-development perspective, it is very 
difficult to jump into two unfamiliar software projects with nobody 
to explain to the newcomer why each project exists.  With David 
Merrill's departure, there's little knowledge to transfer about 
Lampadas itself (and why, specifically, Plone).

So, let's rewind the discussion to the fundamental purpose behind 
the tool....


Item #2: Document metadata (scrollkeeper/OMF, DublinCore)
---------------------------------------------------------

>If we were to use Plone, then it also provides for publishing LDP 
>docs and in addition has metadata. 

TLDP's biggest lack, historically, has been some sort of metadata 
management tool.

Document metadata of this sort is a kind of memory that transcends 
the memories of individuals and allows smoother coordination across 
many timezones and people.  Why else do we have computers?

Earlier this year, when I was familiarizing myself with TLDP's 
history (I read all of the mailing list archives that I had access 
to), background (I read everything on the tldp.org site) and 
production software (I read Greg Ferguson's scripts), I became 
intimately familiar with this lack of metadata.

For us, metadata management has been a hard problem.  This may be 
related to our distributed nature, our volunteer composition, our 
divergent expectations or, perhaps, merely technical shortcomings.  

I'm still not sure which, if any, of the above are the primary 
reasons we have a metadata management problem, but we still feel 
this lack.

>The metadata part per D. Merrill was created for LDP by ibiblio 
>(based on DublinCore) and is known as Open Metadata Framework = OMF 
>(not to be confused with the OMF video game).  This OMF is used by 
>Gnome, etc.

<digression fork="OMF">
I learned about OMF (Open Metadata Framework) earlier this year.  I 
get the impression that it is stone-cold dead.  The implementation 
(in C) that we used was scrollkeeper [1], last updated in 2003.  It 
seems to have been in use by both GNOME and TLDP, but after 2002 (or 
so), a piece of software called rarian [2] (last updated in 2008) 
seems to have supplanted scrollkeeper.

Why did scrollkeeper die?  I was paying attention to other things at 
the time, but in trying to reconstruct what happened, I conclude 
that OMF was too much of a niche format.  In the mid-2000s, there 
was widespread adoption of the more general RDF [3] data model 
(commonly serialized as XML, and later embeddable in HTML via RDFa 
[4]), which is also capable of expressing DublinCore [5] (to which 
you alluded).  Around the same time, microformats [6] offered a 
lighter-weight way to embed similar metadata directly in HTML.
</digression>
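As a concrete illustration of RDF carrying Dublin Core, here is a minimal Python sketch that uses only the standard library to emit an RDF/XML description of one document with Dublin Core element-set 1.1 properties. The document URL and all field values are made up for the example; a real tool might prefer a dedicated RDF library such as rdflib.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set 1.1
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_rdf(about, fields):
    """Return an RDF/XML string describing one document with DC elements."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    # One dc:<name> child element per metadata field.
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical document and values, for illustration only.
xml_text = dublin_core_rdf(
    "https://example.org/HOWTO/Example-HOWTO/",
    {"title": "Example HOWTO", "creator": "A. Author",
     "date": "2016-03-06", "format": "text/html", "language": "en"},
)
print(xml_text)
```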

The scrollkeeper software (OMF data structure) addressed one problem 
quite well: description of document content, relationships and 
linking.  [I'm not certain we ever used scrollkeeper in the 
production workflow.  I just can't quite tell.  It looks like we 
didn't, even though it was in the software repository.]

But scrollkeeper/OMF did not address questions such as "who is 
reviewing this document?", "when was the author last contacted?" 
or "who reviewed this thing, anyway?"
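To make the gap concrete, here is a minimal Python sketch of the kind of workflow metadata OMF lacked. The `DocStatus` record, its field names, the sample documents and the 365-day staleness threshold are all hypothetical choices for illustration, not an existing TLDP schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DocStatus:
    """Hypothetical workflow metadata that OMF did not capture."""
    doc_id: str
    reviewer: Optional[str]                # who is reviewing it now
    last_reviewed: Optional[date]          # when it was last reviewed
    author_last_contacted: Optional[date]  # when the author was last contacted


def needs_attention(docs, today, max_age=timedelta(days=365)):
    """Return doc_ids with no reviewer or a stale/unknown author contact."""
    flagged = []
    for d in docs:
        stale = (d.author_last_contacted is None
                 or today - d.author_last_contacted > max_age)
        if d.reviewer is None or stale:
            flagged.append(d.doc_id)
    return flagged


docs = [
    DocStatus("Alpha-HOWTO", "bob", date(2015, 11, 1), date(2015, 12, 1)),
    DocStatus("Beta-HOWTO", None, None, date(2016, 1, 15)),
    DocStatus("Gamma-HOWTO", "carol", date(2013, 5, 2), date(2013, 5, 2)),
]
print(needs_attention(docs, today=date(2016, 3, 6)))
# ['Beta-HOWTO', 'Gamma-HOWTO']
```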


Item #3: Open question, what metadata are important to TLDP?
------------------------------------------------------------

With all of the above said, I think I'm going to stop here with one 
final question.

  What metadata are important to TLDP?

I think this will take time to answer properly.  Then we should 
(re-)examine the tools that are available today and see whether any 
of them can help us with the metadata questions.

We may find that the wiki (that we already have) is "good enough".

We may find that comment strings, checked into each document, are 
"good enough".
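If the comment-string route were chosen, a publication tool could harvest the values with very little code. This is a sketch under an assumed convention: the `tldp-meta:` prefix, the `key = value` syntax and the sample document are hypothetical, invented only for this example.

```python
import re

# Hypothetical convention: one key/value pair per comment, e.g.
#   <!-- tldp-meta: reviewer = alice -->
META_RE = re.compile(r"<!--\s*tldp-meta:\s*([\w-]+)\s*=\s*(.*?)\s*-->")


def read_meta(source_text):
    """Collect tldp-meta comment pairs from a DocBook/LinuxDoc source."""
    return {key: value for key, value in META_RE.findall(source_text)}


sample = """<?xml version="1.0"?>
<!-- tldp-meta: reviewer = alice -->
<!-- tldp-meta: author-last-contacted = 2016-02-20 -->
<article><title>Example HOWTO</title></article>
"""
print(read_meta(sample))
# {'reviewer': 'alice', 'author-last-contacted': '2016-02-20'}
```

Because the metadata would travel with the document in version control, this needs no extra service, at the cost of weaker querying than a wiki or CMS might offer.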

We may find that we actually need a Content Management System (CMS).

But, it has been approximately 15 years since we had this 
discussion, so maybe it is time to have it again.

In the meantime, I'll keep working to finish the tool I have 
written to automate publication of documents from the LDP 
repository.  I'll also think about the metadata problem, take notes 
on what people suggest on this mailing list and read what I can 
about any technologies (e.g. RDFa, DublinCore) and tools (e.g. 
Plone, rdflib) that people may propose.

Best regards,

-Martin

 [0] https://plone.org/
 [1] https://sourceforge.net/projects/scrollkeeper/files/
 [2] https://rarian.freedesktop.org/
     https://rarian.freedesktop.org/Releases/
 [3] https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
 [4] https://www.w3.org/TR/xhtml-rdfa-primer/
     https://rdfa.info/
 [5] http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#
 [6] http://microformats.org/wiki/faqs-for-rdf

-- 
Martin A. Brown
http://linux-ip.net/
©The Linux Documentation Project, 2014. Listserver maintained by dr Serge Victor on ibiblio.org servers.