discuss: Thread: ePUB format(s); description of options and hurdles

Subject: ePUB format(s); description of options and hurdles
From: "Martin A. Brown" ####@####.####
Date: 14 Mar 2016 20:02:00 +0000
Message-Id: <alpine.LSU.2.11.1603141231150.12423@znpeba.jbaqresebt.arg>

Hello all,

I have examined two ePUB specifications, both EPUB 2.0.1 (epub2) [0] 
and EPUB 3.0.1 (epub3) [1].  I have not studied EPUB 1.0.1 [2].

Here's what I have learned.

The standard for epub3 is newer and includes features that LDP is 
unlikely to use.  These include media overlay (which defines a 
format for synchronizing text and audio [3]) and content obfuscation 
(in lieu of full DRM).

While the docbook-xsl-stylesheets project (Bob Stayton) provides 
support and a handy README for generating epub3 content, there does 
not appear to be an (upstream, distribution-supplied) tool that can 
generate epub3.

Available tools:

  * xmlto generates epub1; it only reads XML documents, so for us
    that would mean support only for DocBook XML 4.x and DocBook XML 5.0

  * a2x generates epub1; internally, a2x converts asciidoc to 
    DocBook 4.5 XML before producing the epub

  * docbook-xsl-stylesheets can generate XHTML suitable for epub3;
    the user still needs to package up the .epub file; this would
    mean support only for DocBook XML and Asciidoc files
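Packaging the .epub by hand is mostly a matter of following the OCF container rules: a ZIP file whose first entry is an uncompressed `mimetype` file containing `application/epub+zip`, plus a `META-INF/container.xml` that points at the OPF package document. Below is a minimal Python sketch. It assumes the stylesheet output already sits in one directory and that the OPF lives at `OEBPS/content.opf` (both assumptions). `package_epub` is a hypothetical helper, not part of any tool named above. Any `mimetype` or `container.xml` already in the tree is replaced by the ones written here.

```python
import zipfile
from pathlib import Path

# OCF container file: tells a reading system where the OPF package lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf_path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(content_dir, epub_path, opf_path="OEBPS/content.opf"):
    """Zip an already-generated content tree into an .epub (sketch).

    content_dir is assumed to contain opf_path and every file it lists.
    """
    content_dir = Path(content_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry must be first and stored without compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml",
                    CONTAINER_XML.format(opf_path=opf_path),
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else may be deflated; skip files we just wrote.
        for path in sorted(content_dir.rglob("*")):
            rel = path.relative_to(content_dir).as_posix()
            if path.is_file() and rel not in ("mimetype", "META-INF/container.xml"):
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Writing `mimetype` through an explicit `ZipInfo` keeps it first and uncompressed, which is what lets readers sniff the file type from a fixed offset.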

In addition to the question of epub3 vs. epub2, there's the problem 
of the HTML outputs from the SGML documents.  These are not XHTML 
and would need to be converted to XHTML before being included in any 
epub document.
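One way to do that conversion, sketched in Python with only the standard library, is to re-serialize the HTML through `html.parser`. The sketch lower-cases names, quotes attribute values, self-closes void elements such as `<br>`, and supplies end tags that HTML lets authors omit. It assumes the SGML-generated HTML is fairly regular. It only infers a missing end tag when a `<p>` or `<li>` is followed by another of the same kind, and it drops comments and the DOCTYPE. A mature tool such as HTML Tidy (with its `-asxhtml` option) would likely be more robust. `XHTMLWriter` and `html_to_xhtml` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# HTML elements that never have content; XHTML writes them as <br />.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}
# Elements whose end tag HTML may omit when the same element starts again.
AUTO_CLOSE = {"p", "li"}


class XHTMLWriter(HTMLParser):
    """Re-serialize loose HTML as well-formed XML (rough sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []     # serialized output fragments
        self.stack = []   # currently open non-void elements

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lower-cased tag and attribute names.
        if tag in AUTO_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        parts = [tag]
        for name, value in attrs:
            # Minimized attributes such as <input checked> arrive as None.
            value = name if value is None else value
            parts.append(f'{name}="{escape(value, quote=True)}"')
        if tag in VOID_ELEMENTS:
            self.out.append("<" + " ".join(parts) + " />")
        else:
            self.out.append("<" + " ".join(parts) + ">")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Skip end tags of void elements and end tags with no open element.
        if tag in VOID_ELEMENTS or tag not in self.stack:
            return
        # Close anything left open inside this element, then the element.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape & < >.
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")


def html_to_xhtml(markup):
    writer = XHTMLWriter()
    writer.feed(markup)
    writer.close()
    return "".join(writer.out)
```

For example, `html_to_xhtml("<P>Hello<BR>world")` yields `<p>Hello<br />world</p>`.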

My summary of the situation is roughly like this:

  * We could, probably, fairly easily support epub outputs for each
    of the DocBook XML and Asciidoc formats.  The fastest solution
    would probably be using xmlto.  But that's no solution for the
    Linuxdoc and DocBook SGML sources.

  * Convert the HTML outputs (from SGML sources) to XHTML.  Then we
    are building our own epub generation tool.  If so (and if I were
    undertaking this), there's a pretty good-looking library called
    python-epub [4] which generates epub2.
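For a sense of what our own tool would have to emit, here is a minimal Python sketch of the EPUB 2.0.1 OPF package document (metadata, manifest and spine) built with `xml.etree.ElementTree`. It is not python-epub's API. `build_opf` is a hypothetical name, and the sketch assumes an NCX table of contents called `toc.ncx` is written separately, since EPUB 2 expects one.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Serialize OPF as the default namespace and Dublin Core with the dc: prefix.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def build_opf(title, identifier, chapters, language="en"):
    """Return a minimal EPUB 2.0.1 OPF package document as a string.

    chapters: list of (item_id, href) pairs for XHTML files, in reading
    order.  A toc.ncx file is assumed to be generated separately.
    """
    def opf(tag):
        return f"{{{OPF_NS}}}{tag}"

    package = ET.Element(opf("package"),
                         {"version": "2.0", "unique-identifier": "BookId"})
    metadata = ET.SubElement(package, opf("metadata"))
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}language").text = language
    # unique-identifier above must match this element's id.
    ET.SubElement(metadata, f"{{{DC_NS}}}identifier",
                  {"id": "BookId"}).text = identifier

    manifest = ET.SubElement(package, opf("manifest"))
    ET.SubElement(manifest, opf("item"), {
        "id": "ncx", "href": "toc.ncx",
        "media-type": "application/x-dtbncx+xml"})
    # The spine gives the linear reading order; toc names the NCX item.
    spine = ET.SubElement(package, opf("spine"), {"toc": "ncx"})
    for item_id, href in chapters:
        ET.SubElement(manifest, opf("item"), {
            "id": item_id, "href": href,
            "media-type": "application/xhtml+xml"})
        ET.SubElement(spine, opf("itemref"), {"idref": item_id})

    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(package, encoding="unicode"))
```

The XML declaration is prepended by hand because `ET.tostring(..., encoding="unicode", xml_declaration=True)` would declare the platform's locale encoding rather than UTF-8.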

I am definitely interested in this epub nonsense, but it seems 
there's quite a bit of work to support epub for our entire 
collection.  Partial support of our source set (XML sources) would 
not be as tricky (but somehow that bothers me a bit). I'd be 
interested in any thoughts people have about which of the many paths 
we might take from here.

-Martin

 [0] http://idpf.org/epub/201
 [1] http://idpf.org/epub/301
 [2] http://www.digitalpreservation.gov/formats/fdd/fdd000054.shtml

 [3] Which reminds me of the old synchronized 78 rpm records that 
     had stories like Bozo the Clown Under the Sea.
     https://www.youtube.com/watch?v=lgJmBrW4D80

 [4] https://bitbucket.org/exirel/epub
     http://epub.exirel.me/  (in French)

-- 
Martin A. Brown
http://linux-ip.net/

Subject: Re: ePUB format(s); description of options and hurdles
From: Leo Noordergraaf ####@####.####
Date: 16 Mar 2016 21:26:46 +0000
Message-Id: <56E9CFD4.8080006@noordergraaf.net>

Dear Martin (and all),

I came to like epub a lot as it allows me to carry my library in an
e-reader. So I do hope that tldp will support epub.

As far as generators go, it is unfortunate that not all accepted source
formats are easily converted. I understand your discomfort regarding
only partial support for epub.

So there are basically three choices:
1) do not support epub,
2) create a generator suite that can handle all accepted source formats,
3) use multiple generators, one for each source format and perhaps some
are not available yet.

Going for 1) is a pity in my opinion. TLDP isn't really in the business
of creating epub converters, so let's drop 2). That leaves the third
option.

I suppose that at the moment there are more pressing things to do than
worry about a single output format. The pdf and html formats are far
more important and those should be supported for all documents.

So my suggestion is to support epub as an output format for those source
formats where it is easily supported and strive to include all source
formats eventually or otherwise drop epub support completely.

Leo

Subject: Re: ePUB format(s); description of options and hurdles
From: "Martin A. Brown" ####@####.####
Date: 18 Mar 2016 23:32:32 +0000
Message-Id: <alpine.LSU.2.11.1603181615520.12423@znpeba.jbaqresebt.arg>

Hello Leo (et alia),

>I came to like epub a lot as it allows me to carry my library in an
>e-reader. So I do hope that tldp will support epub.

I am in agreement with you here.  It would be good to be able to
support an epub format (any epub format), especially since we have,
effectively, dropped the (ahead-of-its-time) PluckerDB format.

>As far as generators go, it is unfortunate that not all accepted 
>source formats are easily converted. I understand your discomfort 
>regarding only partial support for epub.
>
>1) do not support epub,
>
>2) create a generator suite that can handle all accepted source 
>   formats,
>
>3) use multiple generators, one for each source format and perhaps 
>   some are not available yet.

Yes, that's our set of options.

>Going for 1) is a pity in my opinion.

I agree completely.  It would be better to offer partial support for 
one of the EPUB standards (1.0.1, 2.0.1 or 3.0.1) than to avoid it 
entirely.

>TLDP isn't really in the business of creating epub converters,
>so let's drop 2).

Maybe.  I'm still thinking about that.  Option 2 is my most desired
outcome, but it represents work that goes beyond the scope of TLDP
(and possibly beyond my capabilities).  But a tool that could
process arbitrary HTML (or XHTML) and turn it into an epub would
also be generally useful.

>That leaves the third option. I suppose that at the moment there
>are more pressing things to do than worry about a single output
>format.

But it is better to have agreement on a plan, even if the plan has
not yet been set in motion.  So, thank you for replying!

I think the least-effort path to support EPUB from our source
collection would involve partial support for EPUB 2.0.1, but I
thought I'd wait before engaging in any effort, to see what other 
TLDP members thought and whether others knew of tools or efforts 
that are invisible to me.

I have not done any work on additional output format support in the 
last week or so, as I have moved on to the question of the output 
tree and overall tldp.org website organization.

>So my suggestion is to support epub as an output format for those 
>source formats where it is easily supported and strive to include 
>all source formats eventually or otherwise drop epub support 
>completely.

I hear one vote and recommendation for supporting EPUB 
opportunistically, wherever the input format allows.  In our case, 
that would probably mean EPUB outputs could be generated from 
Asciidoc and any of the DocBook XML formats, but not from any of the 
SGML-based formats (DocBook 3.x, DocBook 4.x and Linuxdoc).
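Supporting EPUB opportunistically amounts to a per-format dispatch table, roughly Leo's option 3. Below is a Python sketch. The format labels (`asciidoc`, `docbook4xml`, `docbook5xml`, `linuxdoc`) and the helper names are hypothetical. It assumes only the basic `xmlto epub FILE` and `a2x -f epub FILE` invocations. SGML sources simply get no EPUB.

```python
import subprocess
from pathlib import Path

# Hypothetical source-format labels mapped to the command that would build
# an .epub.  Only the XML-based formats appear; the SGML formats (Linuxdoc,
# DocBook SGML) have no entry, so they are skipped.
EPUB_GENERATORS = {
    "asciidoc": lambda src: ["a2x", "-f", "epub", str(src)],
    "docbook4xml": lambda src: ["xmlto", "epub", str(src)],
    "docbook5xml": lambda src: ["xmlto", "epub", str(src)],
}


def epub_command(source_format, source_path):
    """Return the argv that would build an EPUB, or None if unsupported."""
    make = EPUB_GENERATORS.get(source_format)
    return None if make is None else make(Path(source_path))


def build_epub(source_format, source_path):
    """Build an EPUB where the input format allows; report whether we did."""
    cmd = epub_command(source_format, source_path)
    if cmd is None:
        return False  # no EPUB for this source format (yet)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

Keeping the mapping in one table means a later SGML path (for example via HTML and an external converter) would be a single new entry rather than a change to the build logic.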

Thank you for your thoughts, Leo,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/

Subject: Re: ePUB format(s); description of options and hurdles
From: jdd ####@####.####
Date: 19 Mar 2016 06:57:31 +0000
Message-Id: <56ECF89D.1050803@dodin.org>

On 19/03/2016 00:33, Martin A. Brown wrote:

> I think the least-effort path to support EPUB from our source

http://manual.calibre-ebook.com/faq.html#what-formats-does-app-support-conversion-to-from

calibre can do most of what we need, and I think there is a command-line
version.

It's written in Python, so it works everywhere, but it is very slow;
still, we don't usually have many documents to convert at the same time.

jdd

Subject: Re: ePUB format(s); description of options and hurdles
From: "Martin A. Brown" ####@####.####
Date: 19 Mar 2016 21:43:32 +0000
Message-Id: <alpine.LSU.2.11.1603191440590.12423@znpeba.jbaqresebt.arg>

Hello and greetings jdd,

>> I think the least-effort path to support EPUB from our source
>
> http://manual.calibre-ebook.com/faq.html#what-formats-does-app-support-conversion-to-from
>
> calibre can do most of what we need and I think there is a command 
> line version.

New software!  Yay!  Thank you for the reference!

> It's written in Python, so it works everywhere, but it is very slow;
> still, we don't usually have many documents to convert at the same time.

I have a copy of 'calibre' (1.48.0) on my box and it seems to come 
with something called 'ebook-convert'.  I will be spending some time 
with it to see how predictable it is and to see how well it handles 
the outputs generated from SGML (DocBook SGML and Linuxdoc).  This 
may be a good solution, jdd.
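If ebook-convert handles the SGML-derived HTML well, the missing piece for Linuxdoc and DocBook SGML could be a thin wrapper. Below is a Python sketch. It assumes only the basic `ebook-convert INPUT OUTPUT` form, where calibre picks the output format from the output file's extension, and it passes no extra conversion options. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def ebook_convert_command(html_path, epub_path=None):
    """Build the argv for calibre's ebook-convert (input file, output file).

    The .epub extension on the output is what selects the EPUB format.
    """
    html_path = Path(html_path)
    epub_path = Path(epub_path) if epub_path else html_path.with_suffix(".epub")
    return ["ebook-convert", str(html_path), str(epub_path)]


def convert_html_to_epub(html_path, epub_path=None):
    """Run ebook-convert if it is installed; return the output path or None."""
    if shutil.which("ebook-convert") is None:
        return None  # calibre is not installed on this machine
    cmd = ebook_convert_command(html_path, epub_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[2])
```

Checking for the binary first lets a build script skip EPUB quietly on machines without calibre instead of failing the whole run.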

Thanks for the pointer,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/

©The Linux Documentation Project, 2014. Listserver maintained by dr Serge Victor on ibiblio.org servers.