discuss: ePUB format(s); description of options and hurdles
Subject:
ePUB format(s); description of options and hurdles
From:
"Martin A. Brown" ####@####.####
Date:
14 Mar 2016 20:02:00 +0000
Message-Id: <alpine.LSU.2.11.1603141231150.12423@znpeba.jbaqresebt.arg>
Hello all,
I have examined two ePUB specifications: EPUB 2.0.1 (epub2) [0]
and EPUB 3.0.1 (epub3) [1]. I have not studied EPUB 1.0.1 [2].
Here's what I have learned.
The standard for epub3 is newer and includes features that LDP is
unlikely to use. These include media overlay (which defines a
format for synchronizing text and audio [3]) and content obfuscation
(in lieu of full DRM).
While the docbook-xsl-stylesheets project (Bob Stayton) provides
support and a handy README for generating epub3 content, there does
not appear to be an (upstream, distribution-supplied) tool that can
generate epub3.
Available tools:
* xmlto generates epub1; it only reads XML docs, so for us that
  would mean support only for DocBook XML 4.x and DocBook XML 5.0
* a2x generates epub1; internally, a2x converts asciidoc to
  DocBook 4.5 XML before producing the epub
* docbook-xsl-stylesheets can generate XHTML suitable for epub3;
  user still needs to package up the .epub file; would
  mean support only for DocBook XML and asciidoc files
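For a sense of how small the "package up the .epub file" step is, here
is a minimal sketch in Python. It assumes the stylesheets have already
written an unpacked tree (META-INF/container.xml, the content files, and
so on) into a directory. The function name package_epub and the
directory layout are hypothetical, not anything the stylesheets define.
The one fixed rule it relies on comes from the EPUB container format:
the "mimetype" entry must come first in the archive and must be stored
uncompressed.

```python
import os
import zipfile


def package_epub(src_dir, epub_path):
    """Zip an unpacked EPUB tree (src_dir) into a single .epub file.

    Hypothetical helper; epub_path should live outside src_dir.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # 'mimetype' must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk in a stable, repeatable order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Something like package_epub("build/howto", "howto.epub") would then
produce the final file. Validating the result (for example with
epubcheck) would still be a separate step.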
In addition to the question of epub3 vs. epub2, there's the problem
of the HTML outputs from the SGML documents. These are not XHTML
and would need to be converted to XHTML before being included in any
epub document.
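To make the size of that conversion concrete, here is a rough sketch
using only the Python standard library. It re-serializes tag soup as
well-formed XML: tag and attribute names are lowercased, attribute
values are quoted, void elements such as <br> are self-closed, and
elements left open are closed. The names XhtmlWriter and html_to_xhtml
are hypothetical. This is a sketch under simplifying assumptions, not a
real converter: it drops comments and the doctype, does not add the
XHTML namespace, and closes a dangling <p> or <li> only when an
enclosing element closes, so nesting can differ from what a browser
would infer.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content or a closing tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class XhtmlWriter(HTMLParser):
    """Collects a well-formed serialization of the parsed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names.
        # A bare attribute (e.g. <td nowrap>) becomes nowrap="nowrap".
        rendered = "".join(
            ' %s="%s"' % (k, escape(k if v is None else v, quote=True))
            for k, v in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # stray close tag; ignore it
        # Close anything still open inside this element, then the element.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()  # flushes any buffered text
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())


def html_to_xhtml(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)
```

A real conversion would more likely lean on an existing tool (HTML Tidy
or an lxml-based cleaner, for instance), but the sketch shows the kind
of repairs involved.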
My summary of the situation is roughly like this:
* We could, probably, fairly easily support epub outputs for each
  of the DocBook XML and Asciidoc formats. Fastest solution would
  probably be using xmlto. But, that's no solution for the
  Linuxdoc and DocBook SGML sources.
* Convert the HTML outputs (from SGML sources) to XHTML. Then, we
  are building our own epub generation tool. If so (and if I were
  undertaking this), there's a pretty good-looking library called
  python-epub [4] which generates epub2.
I am definitely interested in this epub nonsense, but it seems
there's quite a bit of work to support epub for our entire
collection. Partial support of our source set (XML sources) would
not be as tricky (but somehow that bothers me a bit). I'd be
interested in any thoughts people have about which of the many paths
we might take from here.
-Martin
[0] http://idpf.org/epub/201
[1] http://idpf.org/epub/301
[2] http://www.digitalpreservation.gov/formats/fdd/fdd000054.shtml
[3] Which reminds me of the old synchronized 78 rpm records that
had stories like Bozo the Clown Under the Sea.
https://www.youtube.com/watch?v=lgJmBrW4D80
[4] https://bitbucket.org/exirel/epub
http://epub.exirel.me/ # -- in French
--
Martin A. Brown
http://linux-ip.net/