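A while back I noticed that Google can search Office documents, and I wondered: for Google, the hard part must be converting .doc files to plain text, so how is that actually done?
In fact, this step can be handled by the COM interfaces that Office itself provides. Using C# as an example, let's see how to convert a Word document into various other formats.
First, add a reference (right-click in Solution Explorer and choose Add Reference…) and select Microsoft Word 11.0 Object Library from the COM list.
Then add the code: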
[coolcode lang="java"]
// Assumes the COM reference exposes the "Word" namespace; with the Office PIAs,
// add: using Word = Microsoft.Office.Interop.Word;
// Placeholder passed for every optional COM argument we do not set
object missing = System.Reflection.Missing.Value;
// A full path is safer here, since Word resolves relative paths itself
object objFileName = "sampleWordFile.doc";
Word.Application Word_App = new Word.Application();
Word_App.Visible = false;
Word.Documents Docs = Word_App.Documents;
//Open File
Word.Document Word_doc = Docs.Open(ref objFileName, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing);
// Choose the target format and the output path
object format = Word.WdSaveFormat.wdFormatText;
object tempfile = "c:/temp.txt";
Word_doc.SaveAs(ref tempfile, ref format, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
ref missing, ref missing, ref missing, ref missing, ref missing);
Word_doc.Close(ref missing, ref missing, ref missing);
// Quit Word, otherwise a hidden WINWORD.EXE process keeps running
Word_App.Quit(ref missing, ref missing, ref missing);
[/coolcode]
This converts sampleWordFile.doc into c:/temp.txt. Similarly, we can set
[coolcode lang="java"]
format = Word.WdSaveFormat.wdFormatRTF;
[/coolcode]
This produces RTF output instead. HTML, XML, and other formats work the same way, using other Word.WdSaveFormat values.
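To make the format switch concrete, here is a minimal sketch that picks the WdSaveFormat value from the target file's extension and wraps the whole open/save/close sequence in one call. The class WordConverter and its methods FormatFor and Convert are hypothetical names of my own, not part of the Word object model. The sketch also assumes the reference is the Office primary interop assembly (namespace Microsoft.Office.Interop.Word) and that Word 2003 is installed on the machine.

```csharp
using System;
using System.IO;
using System.Reflection;
// Assumption: the Office PIA namespace; adjust if your reference generates "Word" directly.
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper for illustration, not part of the Word API.
static class WordConverter
{
    // Map the target file's extension to a Word save format.
    public static Word.WdSaveFormat FormatFor(string targetPath)
    {
        switch (Path.GetExtension(targetPath).ToLowerInvariant())
        {
            case ".txt": return Word.WdSaveFormat.wdFormatText;
            case ".rtf": return Word.WdSaveFormat.wdFormatRTF;
            case ".htm":
            case ".html": return Word.WdSaveFormat.wdFormatHTML;
            case ".xml": return Word.WdSaveFormat.wdFormatXML;
            default:
                throw new ArgumentException("Unsupported extension: " + targetPath);
        }
    }

    // Open sourcePath in a hidden Word instance and save it as targetPath.
    public static void Convert(string sourcePath, string targetPath)
    {
        object missing = Missing.Value;
        object noSave = Word.WdSaveOptions.wdDoNotSaveChanges;
        // Word resolves relative paths against its own working directory,
        // so hand it absolute paths.
        object source = Path.GetFullPath(sourcePath);
        object target = Path.GetFullPath(targetPath);
        object format = FormatFor(targetPath);

        Word.Application app = new Word.Application();
        try
        {
            app.Visible = false;
            Word.Document doc = app.Documents.Open(ref source, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing);
            doc.SaveAs(ref target, ref format, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing);
            doc.Close(ref noSave, ref missing, ref missing);
        }
        finally
        {
            // Always shut Word down, even if opening or saving failed.
            app.Quit(ref noSave, ref missing, ref missing);
        }
    }
}
```

With this in place, a call such as WordConverter.Convert("sampleWordFile.doc", @"c:\temp.rtf") should write the RTF version. The try/finally is there so that a failed conversion does not leave a hidden Word process behind.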

BTW, CoolCode has no dedicated C# syntax, so I had to use java as a stand-in, and it works quite well. The two syntaxes really are alike, yikes…