Subject:
MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 14 Nov 2003 00:55:23 -0000
Message-Id: <1068771294.20384.8.camel@mysticchild>

Hi all,

I've been exploring Windows (and Linux) solutions for transforming MS
Word documents into XML, preferably DocBook XML 4.2.

I tried XMLSPY, which can only be evaluated for 30 days, I've tried
Morphon, which is nice for working in XML, but I couldn't figure out
what to do with the MS Word doc.

When I've used my Linux tools to convert, I've ended up with an XML
file, but it's so awful thanks to all the junk in MS Word, it makes me
want to scrap it and just cut/paste everything in, writing the tags in
myself.

Anybody have better luck finding an easy way to convert? Your
suggestions are most welcome, and the sooner the better. I have a guide
that needs conversion to XML before month-end.

For the benefit of our reviewers, many of whom use Windows, please use
"Reply All" if you have ideas to share on this subject!

Much thanks!
Tabatha

--
Tabatha Marshall
Web: www.merlinmonroe.com
Linux Documentation Project Review Coordinator (http://www.tldp.org)
Linux Counter Area Manager US:wa (http://counter.li.org)
Subject:
Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 14 Nov 2003 03:00:52 -0000
Message-Id: <1068778823.20382.95.camel@mysticchild>

On Thu, 2003-11-13 at 18:29, Saqib Ali wrote:
> Hello Tabatha,
>
> OpenOffice(OO) 1.1 supports exporting to DocBook XML 4.2. You might wanna
> try opening the MS Word document in OO and try exporting to DocBook.

I went to xml.openoffice.org and looked into all that, but was very
lost. I downloaded some things (*I think*) but couldn't figure out
their instructions to make it all work. If you have any good
references, please let me know.

> Another option is Save the file OO format (XML file compressed in a ZIP),
> and try using an XSLT to go from OO XML to DocBook XML.

I saved the MS Word document in OpenOffice and unzipped the .sxw file,
since everything's done in XML, but it was still very ugly. I couldn't
figure out how to just get the doctype statement reading "book" with
the proper link to oasis-open, other than to manually put it in. The
whole thing overall left me a little muddled.

> Also try w2XML from http://www.docsoft.com/w2xmlv2.htm

Off to look at it right now. Thanks!

> http://validate.sf.net <---- HTML/XHTML/DocBook Validator

I'm gonna take a peek at that too. :D
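The unzip-and-add-the-doctype route described above can be sketched in Python. This is only a rough sketch, not the OpenOffice exporter: it assumes an OpenOffice.org 1.x .sxw file whose content.xml uses the http://openoffice.org/2000/text namespace, it keeps paragraph text only (headings, lists and tables are ignored), and the function names read_paragraphs and to_docbook are made up for illustration. The DOCTYPE line is the standard declaration for DocBook XML 4.2, so it does not have to be typed in by hand.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard public and system identifiers for the DocBook XML 4.2 DTD.
DOCBOOK_DOCTYPE = (
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"\n'
    '  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">'
)

# Assumption: text namespace used by OpenOffice.org 1.x in content.xml.
TEXT_NS = "http://openoffice.org/2000/text"


def read_paragraphs(sxw):
    """Return the text of each non-empty paragraph in an .sxw file.

    `sxw` may be a path or a file-like object; an .sxw file is a ZIP
    archive whose document body lives in content.xml.
    """
    with zipfile.ZipFile(sxw) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    paragraphs = []
    for para in root.iter(f"{{{TEXT_NS}}}p"):
        # itertext() also collects text inside nested spans.
        text = "".join(para.itertext()).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


def to_docbook(title, paragraphs):
    """Wrap plain paragraphs in a one-chapter DocBook <book> skeleton."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    chapter = ET.SubElement(book, "chapter")
    ET.SubElement(chapter, "title").text = title
    for text in paragraphs:
        ET.SubElement(chapter, "para").text = text
    body = ET.tostring(book, encoding="unicode")
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{DOCBOOK_DOCTYPE}\n{body}\n'
```

Calling to_docbook("My Guide", read_paragraphs("guide.sxw")), where guide.sxw is a hypothetical file name, returns a string that starts with the XML declaration and the "book" doctype. All Word styling is thrown away on purpose, so sections and inline markup still have to be tagged afterwards.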
Subject:
Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 14 Nov 2003 03:44:54 -0000
Message-Id: <1068781465.20382.98.camel@mysticchild>

On Thu, 2003-11-13 at 19:00, Tabatha Marshall wrote:
> > Also try w2XML from http://www.docsoft.com/w2xmlv2.htm

Nope. When I tried to install it I was told I need the .Net framework
and the install aborted. I was going for the trial version of W2XML.
Did I try the wrong thing?
Subject:
Re: MS Word to XML
From: Randy Kramer ####@####.####
Date: 14 Nov 2003 04:18:43 -0000
Message-Id: <200311132329.31036.rhkramer@fast.net>

On Thursday 13 November 2003 09:29 pm, Saqib Ali wrote:
> OpenOffice(OO) 1.1 supports exporting to DocBook XML 4.2. You might wanna
> try opening the MS Word document in OO and try exporting to DocBook.
>
> Another option is Save the file OO format (XML file compressed in a ZIP),
> and try using an XSLT to go from OO XML to DocBook XML.
>
> Also try w2XML from http://www.docsoft.com/w2xmlv2.htm

Hmm, just realized that AbiWord can import and export DocBook (via a
plugin) -- see
http://www.abisource.com/twiki/bin/view/Abiword/PluginMatrix. (Which, BTW,
is a TWiki -- if you go there, also check out
http://www.abisource.com/twiki/bin/view/Abiword/AbiWordFAQ for something of
a TWiki "application".

Randy Kramer

Two asides:

In some sense, AbiWord is something of a native DocBook editor then, as,
IIUC, import and export "filters" in AbiWord don't translate to and from
the native AbiWord file format, but instead directly load / unload data
from AbiWords internal piece table.

And having reminded myself of that, if I could built an import and export
filter for "TWikiText" ...
Subject:
Re: MS Word to XML
From: Randy Kramer ####@####.####
Date: 14 Nov 2003 12:25:50 -0000
Message-Id: <200311140736.50113.rhkramer@fast.net>

I will re-read my emails before posting.
I will re-read my emails before posting.
I will re-read my emails before posting.
...

<edited>

On Thursday 13 November 2003 11:29 pm, Randy Kramer wrote:
> Hmm, just realized that AbiWord can import and export DocBook (via a
> plugin) -- see
> http://www.abisource.com/twiki/bin/view/Abiword/PluginMatrix. (Which, BTW,
> is a TWiki -- if you go there, also check out
> http://www.abisource.com/twiki/bin/view/Abiword/AbiWordFAQ for an example
> of a TWiki "application".
>
> Randy Kramer
>
> Two asides:
>
> In some sense, AbiWord is a native DocBook editor as,
> IIUC, import and export "filters" in AbiWord don't translate to and from
> the native AbiWord file format, but instead directly load / unload data
> from AbiWord's internal piece table.
>
> And having reminded myself of that, if I could build an import and export
> filter for "TWikiText" ...
Subject:
Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 15 Nov 2003 09:11:55 -0000
Message-Id: <1068887483.21346.122.camel@mysticchild>

Thanks for the info, Bob!

I tried a few tools, and was actually waiting for the Upcast license in
my email when I got this message from you.

I loaded an MS Word doc into Upcast, and managed to get the settings
correct for conversion, learning that you first have to use the Upcast
DTD, then noting it uses the resulting XML file to run it through
the DocBook DTD, which I understand is the way you are supposed to get
DocBook output.

But something happened that I didn't expect. Unfortunately, the MS Word
document properties were interpreted in a way that ruined the metadata
of the resulting XML file. It attempted to use unusual combinations of
nested tags to do things that would take me only one or two tags to do
in good old XEmacs.

Since we have reviewers who don't have Linux but run Windows, I
followed the suggested link to www.morphon.com.

I am VERY VERY pleased with this program! It's being offered under a
free license. It is able to parse and validate DocBook just fine. It
also offers alternative views other than source with markup tags, for
those Windows users who aren't comfortable working that way. I found,
though, that the other views seemed to hide URL links provided in the
document.

This seems to be a good solution for reviewers who are still only
comfortable using Windows applications, and will allow them to make
their revisions without adding any proprietary data to the source file.
And since the reviewers are working from a copy of the original, we can
still easily provide diff files to the authors to compare against the
original, as we always send these with the revisions (at least that's
always been my practice).

I just wanted to make sure that both lists found out about this tool.
I thought it might make the newer reviewers feel better about not
having Linux tools.

Thanks for all the help and references!

Tab

On Fri, 2003-11-14 at 10:09, Bob Stayton wrote:
> You could check the DocBookWiki tools page, which includes
> several "up" conversion tools:
>
> http://docbook.org/wiki/moin.cgi/DocBookTools
>
> I've used UpCast with some success. It converts a Word
> file to an XML file in its own generalized UpCast DTD,
> and then you can get an XSL stylesheet from them that
> converts the UpCast document to a DocBook document.
>
> Bob Stayton                      400 Encinal Street
> Publications Architect           Santa Cruz, CA 95060
> Technical Publications           voice: (831) 427-7796
> The SCO Group                    fax: (831) 429-1887
>                                  email: ####@####.####
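The two-stage route Bob describes (Word to UpCast XML, then an XSL stylesheet to DocBook) ends with an ordinary XSLT run, followed by a validation pass. Here is a rough Python sketch of those last two steps, assuming the xsltproc and xmllint command-line tools from libxslt/libxml2 are installed. The helper names and every file name are hypothetical, and the stylesheet itself comes from the UpCast vendor and is not shown.

```python
import shutil
import subprocess


def xslt_command(stylesheet, source, output):
    """Build the xsltproc call that applies `stylesheet` to `source`."""
    return ["xsltproc", "--output", output, stylesheet, source]


def validate_command(document):
    """Build the xmllint call that validates `document` against its DOCTYPE."""
    return ["xmllint", "--noout", "--valid", document]


def run(command):
    """Run one of the commands above, failing loudly if the tool is missing."""
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, check=True)
```

With a hypothetical stylesheet upcast2docbook.xsl and intermediate file guide-upcast.xml, run(xslt_command("upcast2docbook.xsl", "guide-upcast.xml", "guide.xml")) would write guide.xml, and run(validate_command("guide.xml")) would then check it against the DTD named in its DOCTYPE. Validation only confirms the markup is legal DocBook; odd nesting or mangled metadata like that described above can still pass and needs a manual look.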
Subject:
Re: MS Word to XML
From: Tabatha Marshall ####@####.####
Date: 15 Nov 2003 09:16:37 -0000
Message-Id: <1068887769.21346.124.camel@mysticchild>

On Sat, 2003-11-15 at 01:11, Tabatha Marshall wrote:
> I loaded an MS Word doc into Upcast, and managed to get the settings
> correct for conversion, learning that you first have to use the Upcast
> DTD first, then noting it uses the resulting xml file to run it through
> the DocBook DTD, which I understand *IS THE* way you are supposed to get a
> DocBook output.

:D

Tab