[JCR-1894] Word doc extraction problem - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: core 1.4.3
Fix Version/s: None
Component/s: jackrabbit-text-extractors
Labels: None
Environment: OS: Windows 2003 SP2, MyEclipse 6.0 / Tomcat 5.5 and Athelon500+

Description

Hi,
I have a .doc file that contains data inside a table, and I want to parse the table to get its values. Normal parsing (I mean using a string tokenizer) does not work for the table because it yields unwanted special characters. So I want to convert the .doc file to a .txt file, since only then is it easy to split the values, but I can't get that to work. Can anyone please tell me how to parse the values of an MS Word table?

We need to know how to index a .doc file with the special characters excluded.
When we show the excerpt, these special characters make it unreadable.

Thanks in advance.
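
For illustration only, here is a minimal sketch of the kind of post-processing the description asks for. It assumes the extracted text still contains Word's table cell mark, the control character U+0007, which is a common source of such "special characters" in text pulled from binary .doc tables. The class `WordTextCleaner` and its methods are hypothetical names, not part of Jackrabbit or its text extractors, and the sketch uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Jackrabbit or any extractor library.
public class WordTextCleaner {

    // Assumption: the extractor leaves Word's cell mark (U+0007) in the text.
    private static final char CELL_MARK = '\u0007';

    /** Splits the extracted text of one table row into trimmed cell values. */
    public static List<String> splitCells(String rowText) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < rowText.length(); i++) {
            char c = rowText.charAt(i);
            if (c == CELL_MARK) {
                // A cell mark closes the current cell (empty cells are kept).
                cells.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Keep trailing text that was not closed by a cell mark.
        String rest = current.toString().trim();
        if (!rest.isEmpty()) {
            cells.add(rest);
        }
        return cells;
    }

    /** Replaces ASCII control characters with spaces and collapses whitespace. */
    public static String cleanForExcerpt(String text) {
        String noControls = text.replaceAll("\\p{Cntrl}", " ");
        return noControls.replaceAll("\\s+", " ").trim();
    }
}
```

With this sketch, `splitCells` would be applied to row text when the table values are needed, and `cleanForExcerpt` to the text that is indexed or shown in an excerpt. Whether the unwanted characters in a given document are in fact cell marks would need to be checked against the actual extractor output.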

People

Assignee: Unassigned

Reporter: Rajesh Upadhyay

Votes: 0

Watchers: 0

Dates

Created: 03/Dec/08 10:28

Updated: 27/Sep/11 21:39

Resolved: 29/Nov/10 14:13