Subject: The metadata question was [ progress update: 1. automation of source to output publication]
From: "Martin A. Brown" ####@####.####
Date: 6 Mar 2016 03:13:29 +0000
Message-Id: <alpine.LSU.2.11.1603051804520.19013@znpeba.jbaqresebt.arg>

Good evening again David,

Items covered:

  1. Lampadas
  2. Document metadata (scrollkeeper/OMF, DublinCore)
  3. Open question, what metadata are important to TLDP?

>> Automation software
>> -------------------
>> I started by writing something in shell, but it quickly got rather
>> unwieldy and tangled. Therefore, I switched to Python, even though
>> many of the core features are called out in other programs (like
>> 'sgml2html', 'xsltproc', and 'html2text').
>
>I'm a little concerned that there might be some duplication of
>effort here.

Yes, there has most definitely been duplication of effort.

Item #1: Lampadas
-----------------

>The Lampadas project for LDP was to use Plone.

For as long as I have been lurking on (and contributing to) the LDP mailing list (since late 2002), Lampadas has been mentioned as in-progress, abandoned or defunct. To my knowledge, no TLDP volunteer has since been willing (or able) to pick up the Lampadas project. So, that's why we are where we are today. I suspect that Lampadas is dead software.

The Plone project lives on and seems to be thriving [0]. I looked for anything others have contributed to the Plone project that has to do with DocBook (since DocBook is more widely known than the other documentation formats supported by TLDP). The result is not inspiring: only one Plone add-on, 'collective.transform', which handles bi-directional HTML <-> DocBook XML transformation.

If there were a volunteer who knew and understood Plone, s/he might be able to resurrect Lampadas. If there were somebody who knew and understood Lampadas, s/he could learn Plone.
>When looking over the lampadas folder bear in mind that it likely
>contains python code for the non-Plone version of lampadas which
>was rejected for incomplete object persistence (I only have a vague
>idea of what this term means).

I have a fairly good idea of what that means, and I can see why it might pose a problem for an upstream project reviewing a patch or submission. Essentially, the upstream was probably saying: "Dude, you need to track anything for which you are assuming responsibility in the module you are writing."

From a practical, software-development perspective, it is very difficult to jump into two unknown software projects with nobody to explain why each project exists. With David Merrill's departure, there is little knowledge to transfer about Lampadas itself (or about why, specifically, Plone was chosen). So, let's rewind the discussion to the fundamental purpose behind the tool.

Item #2: Document metadata (scrollkeeper/OMF, DublinCore)
----------------------------------------------------------

>If we were to use Plone, then it also provides for publishing LDP
>docs and in addition has metadata.

Historically, TLDP's biggest gap has been the lack of some sort of metadata management tool. A shared record of document metadata outlasts the memories of individuals and allows smoother coordination across many people and time zones. Why else do we have computers?

Earlier this year, when I was familiarizing myself with TLDP's history (I read all of the mailing list archives I had access to), background (I read everything on the tldp.org site) and production software (I read Greg Ferguson's scripts), I learned firsthand about this gap.

For us, metadata management has been a hard problem. This may be related to our distributed nature, our volunteer composition, our divergent expectations or, perhaps, merely technical shortcomings.
I'm still not sure which, if any, of these are the primary reasons we still have a metadata management problem, but we still feel the lack.

>The metadata part per D. Merrill was created for LDP by ibiblio
>(based on DublinCore) and is known as Open Metadata Framework = OMF
>(not to be confused with the OMF video game). This OMF is used by
>Gnome, etc.

<digression fork="OMF">
I learned about OMF (Open Metadata Framework) earlier this year. I get the impression that it is stone-cold dead. The C implementation, which was in our software repository, was scrollkeeper [1], last updated in 2003. It seems to have been in use by both GNOME and TLDP, but later a piece of software called rarian [2] (last updated in 2008) seems to have supplanted scrollkeeper.

Why did scrollkeeper die? I was paying attention to other things at the time, but in trying to reconstruct what happened, I conclude that OMF was too much of a niche format. In the mid-2000s, the more general RDF [3] (with its XML serialization, and later RDFa [4] for embedding in HTML) saw widespread adoption; it is also capable of expressing DublinCore [5] (to which you alluded). Microformats [6] were a related, lighter-weight approach to embedding metadata in HTML.
</digression>

The scrollkeeper software (OMF data structure) addressed one problem quite well: describing document content, relationships and links. [I'm not certain we ever used scrollkeeper in the production workflow; I just can't quite tell. It looks like we didn't, even though it was in the software repository.]

But scrollkeeper/OMF did not address questions such as "who is reviewing this document?", "when was the author last contacted?" or "who reviewed this thing, anyway?"

Item #3: Open question, what metadata are important to TLDP?
------------------------------------------------------------

With all of the above said, I'll stop here with one final question: what metadata are important to TLDP?
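One hedged way to make the question concrete is to separate the two kinds of metadata discussed above: descriptive metadata (what OMF and Dublin Core were designed to carry) and workflow metadata (current reviewer, last reviewer, last contact with the author), which scrollkeeper/OMF never covered. This Python sketch is an illustration under stated assumptions, not a proposal: the `DocMetadata` class, its workflow field names, the `open_questions()` helper and the `metadata` wrapper element are all hypothetical. Only the Dublin Core element names (`title`, `creator`, `subject`, `identifier`) and the namespace URI `http://purl.org/dc/elements/1.1/` come from the DCMI element set.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Dublin Core element set 1.1 namespace (DCMI).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


@dataclass
class DocMetadata:
    """Hypothetical per-document record; field names are illustrative only."""

    # Descriptive metadata: the kind OMF/Dublin Core was designed to carry.
    title: str
    creator: str
    identifier: str
    subjects: List[str] = field(default_factory=list)

    # Workflow metadata: the kind scrollkeeper/OMF did not address.
    current_reviewer: Optional[str] = None
    author_last_contacted: Optional[date] = None
    last_reviewed_by: Optional[str] = None

    def to_dublin_core(self) -> ET.Element:
        """Serialize only the descriptive fields as Dublin Core elements."""
        # The <metadata> wrapper is an arbitrary choice, not a DCMI requirement.
        record = ET.Element("metadata")
        for name, value in (
            ("title", self.title),
            ("creator", self.creator),
            ("identifier", self.identifier),
        ):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
        for subject in self.subjects:
            ET.SubElement(record, f"{{{DC_NS}}}subject").text = subject
        return record

    def open_questions(self) -> List[str]:
        """List the workflow questions this record cannot yet answer."""
        questions = []
        if self.current_reviewer is None:
            questions.append("who is reviewing this document?")
        if self.author_last_contacted is None:
            questions.append("when was the author last contacted?")
        if self.last_reviewed_by is None:
            questions.append("who reviewed this thing, anyway?")
        return questions


if __name__ == "__main__":
    doc = DocMetadata(
        title="Example-HOWTO",  # hypothetical document
        creator="A. Volunteer",  # hypothetical author
        identifier="Example-HOWTO",
        subjects=["networking"],
    )
    print(ET.tostring(doc.to_dublin_core(), encoding="unicode"))
    print(doc.open_questions())
```

Keeping the workflow fields out of the Dublin Core output mirrors the gap described above: they describe the project's process rather than the document itself, so they could just as well live in the wiki, in comment strings checked into each document, or in a CMS.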
I think this will take time to answer properly. Once we have an answer, we should (re-)examine the tools available today and see whether any of them can help us with the metadata questions.

We may find that the wiki we already have is "good enough". We may find that comment strings checked into each document are "good enough". We may find that we actually need a Content Management System (CMS). It has been approximately 15 years since we last had this discussion, so maybe it is time to have it again.

In the meantime, I'll finish the work on the tool I have written to automate publication of documents from the LDP repository. While doing so, I'll think about the metadata problem, take notes on what people suggest on this mailing list and read what I can about any technologies (e.g. RDFa, DublinCore) and tools (e.g. Plone, rdflib) that people propose.

Best regards,

-Martin

[0] https://plone.org/
[1] https://sourceforge.net/projects/scrollkeeper/files/
[2] https://rarian.freedesktop.org/
    https://rarian.freedesktop.org/Releases/
[3] https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
[4] https://www.w3.org/TR/xhtml-rdfa-primer/
    https://rdfa.info/
[5] http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#
[6] http://microformats.org/wiki/faqs-for-rdf

-- 
Martin A. Brown
http://linux-ip.net/