discuss: ePUB format(s); description of options and hurdles


Previous by date: 14 Mar 2016 20:02:00 +0000 status update on LDP needed tasks, Martin A. Brown
Next by date: 14 Mar 2016 20:02:00 +0000 Re: ePUB format(s); description of options and hurdles, Leo Noordergraaf
Previous in thread:
Next in thread: 14 Mar 2016 20:02:00 +0000 Re: ePUB format(s); description of options and hurdles, Leo Noordergraaf

Subject: ePUB format(s); description of options and hurdles
From: "Martin A. Brown" ####@####.####
Date: 14 Mar 2016 20:02:00 +0000
Message-Id: <alpine.LSU.2.11.1603141231150.12423@znpeba.jbaqresebt.arg>

Hello all,

I have examined two ePUB specifications, both EPUB 2.0.1 (epub2) [0] 
and EPUB 3.0.1 (epub3) [1].  I have not studied EPUB 1.0.1 [2].

Here's what I have learned.

The standard for epub3 is newer and includes features that LDP is 
unlikely to use.  These include media overlay (which defines a 
format for synchronizing text and audio [3]) and content obfuscation 
(in lieu of full DRM).

While the docbook-xsl-stylesheets project (Bob Stayton) provides 
support and a handy README for generating epub3 content, there does 
not appear to be an (upstream, distribution-supplied) tool that can 
generate epub3.

Available tools:

  * xmlto generates epub1; it only reads XML docs, which, for us,
    would mean support only for DocBook XML 4.x and DocBook XML 5.0

  * a2x generates epub1; internally, a2x converts asciidoc to 
    DocBook 4.5 XML before producing the epub

  * docbook-xsl-stylesheets can generate XHTML suitable for epub3;
    the user still needs to package up the .epub file; would
    mean support only for DocBook XML and asciidoc files
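
For the packaging step in the last item, here is a rough sketch in
Python (standard library only).  It relies on the one container rule
that matters most: the "mimetype" entry has to be the first entry in
the zip archive and stored uncompressed.  The function name and the
directory layout (META-INF/, OEBPS/) are my assumptions about what the
stylesheets leave on disk, not something I have verified.

```python
import os
import zipfile


def package_epub(source_dir, epub_path):
    """Zip an already-generated EPUB directory tree into epub_path.

    Assumes source_dir holds META-INF/container.xml and the content
    files (hypothetical layout); any mimetype file found there is
    replaced by a freshly written one.
    """
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry must come first and must not be compressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # deterministic archive order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                arcname = arcname.replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                archive.write(full_path, arcname,
                              compress_type=zipfile.ZIP_DEFLATED)
```

Something like package_epub("build/HOWTO", "HOWTO.epub") would then
produce the file; write it outside the source directory so the archive
does not try to include itself.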

In addition to the question of epub3 vs. epub2, there's the problem 
of the HTML outputs from the SGML documents.  These are not XHTML 
and would need to be converted to XHTML before being included in any 
epub document.
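
To give a feel for that conversion, here is a minimal sketch using
only the Python standard library.  The class and function names are
made up for illustration.  It only aims at well-formed output
(lowercase tags, quoted attributes, every element closed); it does not
promise valid XHTML, it drops comments and the doctype, and it has not
been run against our real SGML-derived HTML.

```python
import html
from html.parser import HTMLParser

# Elements that never have content and must be self-closed in XHTML.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link",
        "meta", "param"}

# Start tags that implicitly end an open element of the listed kinds.
AUTO_CLOSE = {"p": {"p"}, "li": {"li"},
              "dt": {"dt", "dd"}, "dd": {"dt", "dd"}}


class XhtmlWriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open elements

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attr_text = "".join(
            ' %s="%s"' % (name, html.escape(
                name if value is None else value, quote=True))
            for name, value in attrs)
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
            return
        if self.stack and self.stack[-1] in AUTO_CLOSE.get(tag, ()):
            self.out.append("</%s>" % self.stack.pop())
        self.stack.append(tag)
        self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        while True:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def to_xhtml_fragment(text):
    writer = XhtmlWriter()
    writer.feed(text)
    writer.close()  # flush any buffered text
    while writer.stack:  # close whatever is still open
        writer.out.append("</%s>" % writer.stack.pop())
    return "".join(writer.out)
```

A real conversion would still need the XML declaration, the XHTML
namespace and a check of the nesting rules, so a dedicated tool may
well be the better choice.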

My summary of the situation is roughly like this:

  * We could, probably, fairly easily support epub outputs for each
    of the DocBook XML and Asciidoc formats.  Fastest solution would
    probably be using xmlto.  But, that's no solution for the 
    Linuxdoc and DocBookSGML sources.

  * Convert the HTML outputs (from SGML sources) to XHTML.  Then, we
    are building our own epub generation tool.  If so (and if I were
    undertaking this), there's a pretty good-looking library called
    python-epub [4] which generates epub2.
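
For the first option, the dispatch could be as small as the sketch
below.  The function names are hypothetical and the file extensions
are my guess at how our sources are named.  It uses xmlto's epub
output format with -o, and a2x's -f epub with -D, to choose the output
directory.  Linuxdoc and DocBook SGML sources simply fall through
unsupported.

```python
import os
import subprocess


def epub_command(source_path, output_dir):
    """Return the command line for building an epub, or None."""
    ext = os.path.splitext(source_path)[1].lower()
    if ext == ".xml":
        # DocBook XML 4.x / 5.0 (assumed extension)
        return ["xmlto", "-o", output_dir, "epub", source_path]
    if ext in (".txt", ".adoc", ".asciidoc"):
        # Asciidoc sources (assumed extensions)
        return ["a2x", "-f", "epub", "-D", output_dir, source_path]
    return None  # Linuxdoc, DocBook SGML: no direct tool


def build_epub(source_path, output_dir):
    """Run the matching tool; return False for unsupported sources."""
    command = epub_command(source_path, output_dir)
    if command is None:
        return False
    subprocess.run(command, check=True)
    return True
```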

I am definitely interested in this epub nonsense, but it seems 
there's quite a bit of work to support epub for our entire 
collection.  Partial support of our source set (XML sources) would 
not be as tricky (but somehow that bothers me a bit). I'd be 
interested in any thoughts people have about which of the many paths 
we might take from here.

-Martin

 [0] http://idpf.org/epub/201
 [1] http://idpf.org/epub/301
 [2] http://www.digitalpreservation.gov/formats/fdd/fdd000054.shtml

 [3] Which reminds me of the old synchronized 78 rpm records that 
     had stories like Bozo the Clown Under the Sea.
     https://www.youtube.com/watch?v=lgJmBrW4D80

 [4] https://bitbucket.org/exirel/epub
     http://epub.exirel.me/  # -- in French

-- 
Martin A. Brown
http://linux-ip.net/



  ©The Linux Documentation Project, 2014. Listserver maintained by dr Serge Victor on ibiblio.org servers.