editors: Thread: MS Word to XML


Subject: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 14 Nov 2003 00:55:23 -0000
Message-Id: <1068771294.20384.8.camel@mysticchild>

Hi all,

I've been exploring Windows (and Linux) solutions for transforming MS
Word documents into XML, preferably DocBook XML 4.2.

I tried XMLSPY, which can only be evaluated for 30 days, and I've tried
Morphon, which is nice for working in XML, but I couldn't figure out
what to do with the MS Word doc.

When I've used my Linux tools to convert, I've ended up with an XML
file, but it's so awful thanks to all the junk in MS Word, it makes me
want to scrap it and just cut/paste everything in, writing the tags in
myself.

Anybody have better luck finding an easy way to convert?  Your
suggestions are most welcome, and the sooner the better.  I have a guide
that needs conversion to XML before month-end.

For the benefit of our reviewers, many of whom use Windows, please use
"Reply All" if you have ideas to share on this subject!

Much thanks!
Tabatha

-- 
Tabatha Marshall
Web: www.merlinmonroe.com
Linux Documentation Project Review Coordinator (http://www.tldp.org)
Linux Counter Area Manager US:wa (http://counter.li.org)

Subject: Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 14 Nov 2003 03:00:52 -0000
Message-Id: <1068778823.20382.95.camel@mysticchild>

On Thu, 2003-11-13 at 18:29, Saqib Ali wrote:
> Hello Tabatha,
> 
> OpenOffice(OO) 1.1 supports exporting to DocBook XML 4.2. You might wanna
> try opening the MS Word document in OO and try exporting to DocBook.

I went to xml.openoffice.org and looked into all that, but was very
lost.  I downloaded some things (*I think*) but couldn't figure out
their instructions to make it all work.  If you have any good
references, please let me know.

> Another option is to save the file in OO format (XML file compressed in a
> ZIP), and try using an XSLT to go from OO XML to DocBook XML.

I saved the MS Word document in OpenOffice and unzipped the .sxw file,
since everything's done in XML, but it was still very ugly.  I couldn't
figure out how to just get the doctype statement reading "book" with the
proper link to oasis-open, other than to manually put it in.  The whole
thing overall left me a little muddled.
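
For anyone retracing these two manual steps, here is a minimal Python
sketch, not a converter. It assumes the .sxw file is an ordinary ZIP
archive holding a content.xml member, and that the DocBook file you end up
with has no DOCTYPE yet. The function names are made up for illustration.
Note that content.xml is still OpenOffice XML; the DOCTYPE helper is meant
for the DocBook file produced afterwards.

```python
import zipfile

# Public and system identifiers for the DocBook XML 4.2 DTD.
DOCBOOK_42_DOCTYPE = (
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"\n'
    '  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">'
)


def read_sxw_content(sxw_file):
    """Return the text of content.xml from an OpenOffice.org 1.x .sxw file.

    sxw_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(sxw_file) as archive:
        return archive.read("content.xml").decode("utf-8")


def with_docbook_doctype(xml_text):
    """Put the DocBook 4.2 "book" DOCTYPE in front of the root element.

    Assumes xml_text has no DOCTYPE yet; an existing XML declaration
    (<?xml ...?>) is kept as the first line.
    """
    body = xml_text.lstrip()
    declaration = ""
    if body.startswith("<?xml"):
        end = body.index("?>") + 2
        declaration, body = body[:end] + "\n", body[end:].lstrip()
    return declaration + DOCBOOK_42_DOCTYPE + "\n" + body
```

Calling with_docbook_doctype() on a file whose root element is <book>
gives the same result as typing the declaration in by hand.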

> Also try w2XML from http://www.docsoft.com/w2xmlv2.htm

Off to look at it right now.  Thanks!

> http://validate.sf.net <---- HTML/XHTML/DocBook Validator

I'm gonna take a peek at that too.  :D


Subject: Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 14 Nov 2003 03:44:54 -0000
Message-Id: <1068781465.20382.98.camel@mysticchild>

On Thu, 2003-11-13 at 19:00, Tabatha Marshall wrote:
> > Also try w2XML from http://www.docsoft.com/w2xmlv2.htm

Nope.  When I tried to install it I was told I need the .Net framework
and the install aborted.  I was going for the trial version of W2XML.

Did I try the wrong thing?



Subject: Re: MS Word to XML
From: Randy Kramer ####@####.####
Date: 14 Nov 2003 04:18:43 -0000
Message-Id: <200311132329.31036.rhkramer@fast.net>

On Thursday 13 November 2003 09:29 pm, Saqib Ali wrote:
> OpenOffice(OO) 1.1 supports exporting to DocBook XML 4.2. You might wanna
> try opening the MS Word document in OO and try exporting to DocBook.
>
> Another option is to save the file in OO format (XML file compressed in a
> ZIP), and try using an XSLT to go from OO XML to DocBook XML.
>
> Also try w2XML from http://www.docsoft.com/w2xmlv2.htm

Hmm, just realized that AbiWord can import and export DocBook (via a plugin) 
-- see http://www.abisource.com/twiki/bin/view/Abiword/PluginMatrix.  (Which, 
BTW, is a TWiki -- if you go there, also check out 
http://www.abisource.com/twiki/bin/view/Abiword/AbiWordFAQ for something of a 
TWiki "application").

Randy Kramer

Two asides:  

In some sense, AbiWord is something of a native DocBook editor then, as, IIUC, 
import and export "filters" in AbiWord don't translate to and from the native 
AbiWord file format, but instead directly load / unload data from AbiWord's 
internal piece table. 

And having reminded myself of that, if I could build an import and export 
filter for "TWikiText" ...

Subject: Re: MS Word to XML
From: Randy Kramer ####@####.####
Date: 14 Nov 2003 12:25:50 -0000
Message-Id: <200311140736.50113.rhkramer@fast.net>

I will re-read my emails before posting.
I will re-read my emails before posting.
I will re-read my emails before posting.
...

<edited>
On Thursday 13 November 2003 11:29 pm, Randy Kramer wrote:
> Hmm, just realized that AbiWord can import and export DocBook (via a
> plugin) -- see
> http://www.abisource.com/twiki/bin/view/Abiword/PluginMatrix.  (Which, BTW,
> is a TWiki -- if you go there, also check out
> http://www.abisource.com/twiki/bin/view/Abiword/AbiWordFAQ for an example
> of a TWiki "application").
>
> Randy Kramer
>
> Two asides:
>
> In some sense, AbiWord is a native DocBook editor as,
> IIUC, import and export "filters" in AbiWord don't translate to and from
> the native AbiWord file format, but instead directly load / unload data
> from AbiWord's internal piece table.
>
> And having reminded myself of that, if I could build an import and export
> filter for "TWikiText" ...


Subject: Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 15 Nov 2003 09:11:55 -0000
Message-Id: <1068887483.21346.122.camel@mysticchild>

Thanks for the info, Bob!

I tried a few tools, and was actually waiting for the UpCast license in
my email when I got this message from you.  

I loaded an MS Word doc into UpCast and managed to get the settings
correct for conversion.  I learned that you have to convert to the
UpCast DTD first; the resulting XML file is then run through a second
conversion to the DocBook DTD, which I understand is the way you are
supposed to get DocBook output.
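
The second stage can be pictured with a small Python sketch. It is only an
illustration of the two-step idea: per Bob's note below, the real second
step is an XSL stylesheet, and this sketch swaps in plain element renaming
instead. The generic element names are invented for the example and are
NOT UpCast's actual vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for the intermediate format. These are NOT
# UpCast's real vocabulary; they only stand in for "some generic XML".
GENERIC_TO_DOCBOOK = {
    "document": "book",
    "section": "chapter",
    "heading": "title",
    "paragraph": "para",
}


def generic_to_docbook(generic_xml):
    """Second stage: rename generic elements to their DocBook equivalents."""
    root = ET.fromstring(generic_xml)
    for element in root.iter():
        # Elements without a mapping keep their original name.
        element.tag = GENERIC_TO_DOCBOOK.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")
```

A real stylesheet also has to restructure content, not just rename it,
which is where the odd nesting described next tends to come from.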

But something happened that I didn't expect.  Unfortunately, the MS Word
document properties were interpreted in a way that ruined the metadata of
the resulting XML file.  It attempted to use unusual combinations of
nested tags to do things that would take me only one or two tags to do in
good old XEmacs.

Since we have reviewers who don't have Linux but run Windows, I
followed the suggested link to www.morphon.com.

I am VERY VERY pleased with this program!

It's being offered under a free license.  It is able to parse and validate
DocBook just fine.  It also offers alternative views other than source
with markup tags, for those Windows users who aren't comfortable working
that way.  I found, though, that the other views seemed to hide URL links
provided in the document.

This seems to be a good solution for reviewers who are still only
comfortable using Windows applications, and will allow them to make
their revisions without adding any proprietary data to the source file. 
And since the reviewers are working from a copy of the original, we can
still easily provide diff files to the authors to compare against the
original, as we always send these with the revisions (at least that's
always been my practice).
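
For reviewers without the usual diff tool on Windows, a minimal Python
sketch that produces the same kind of unified diff from the standard
library. The function name and the default file name "guide.xml" are
placeholders, not part of any existing review tooling.

```python
import difflib


def make_review_diff(original_text, revised_text, name="guide.xml"):
    """Return a unified diff between the author's original and the review copy."""
    diff = difflib.unified_diff(
        original_text.splitlines(keepends=True),
        revised_text.splitlines(keepends=True),
        fromfile=name + ".orig",
        tofile=name,
    )
    # An empty string means the two texts are identical.
    return "".join(diff)
```

Read both files into strings, pass them in, and write the result to a
.diff file to send along with the revision.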

I just wanted to make sure that both lists found out about this tool.  I
thought it might make the newer reviewers feel better about not having
Linux tools.

Thanks for all the help and references!
Tab


On Fri, 2003-11-14 at 10:09, Bob Stayton wrote:
> You could check the DocBookWiki tools page, which includes
> several "up" conversion tools:
> 
> http://docbook.org/wiki/moin.cgi/DocBookTools
> 
> I've used UpCast with some success.  It converts a Word
> file to an XML file in its own generalized UpCast DTD,
> and then you can get an XSL stylesheet from them that
> converts the UpCast document to a DocBook document.
> 
> Bob Stayton                                 400 Encinal Street
> Publications Architect                      Santa Cruz, CA  95060
> Technical Publications                      voice: (831) 427-7796
> The SCO Group                               fax:   (831) 429-1887
>                                             email: ####@####.####

Subject: Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 15 Nov 2003 09:16:37 -0000
Message-Id: <1068887769.21346.124.camel@mysticchild>

On Sat, 2003-11-15 at 01:11, Tabatha Marshall wrote:
> I loaded an MS Word doc into Upcast, and managed to get the settings
> correct for conversion, learning that you first have to use the Upcast
> DTD first, then noting it uses the resulting xml file to run it through
> the DocBook DTD, which I understand *IS THE* way you are supposed to get a
> DocBook output.

:D
Tab



  ©The Linux Documentation Project, 2014. Listserver maintained by dr Serge Victor on ibiblio.org servers.