  1. Jackrabbit Content Repository
  2. JCR-1894

Word doc extraction problem

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: core 1.4.3
    • None
    • None
    • Environment: Windows 2003 SP2, MyEclipse 6.0 / Tomcat 5.5, Athlon 500+

    Description

      Hi,
      I have a .doc file that contains data inside a table, and I want to parse the table to get its values. Plain parsing (I mean using StringTokenizer) does not work for the table because it returns unwanted special characters. So I want to convert the .doc file to a .txt file first, since the values are then easy to split, but I can't get that to work. Can anyone please tell me how to parse the values of an MS Word table?

      We also need to know how to index a .doc file while excluding these special characters.
      When we show the excerpt, the special characters make it unreadable.

      Thanks in advance.
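
      For illustration only, here is a minimal sketch of the kind of cleanup described above. It assumes the text has already been extracted from the .doc file as a plain String and that the unwanted characters are control characters, with U+0007 marking the end of each table cell (an assumption about the extracted text, not something confirmed in this report). The class WordTextCleaner and its methods are hypothetical names and are not part of Jackrabbit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Jackrabbit API: cleans text extracted from a .doc file.
final class WordTextCleaner {

    // Assumption: the extracted text uses U+0007 (BEL) as the table cell mark.
    private static final char CELL_MARK = '\u0007';

    private WordTextCleaner() {
    }

    // Splits raw table text into cell values, dropping empty pieces
    // (so empty cells and row-end marks are not preserved).
    static List<String> splitCells(String raw) {
        List<String> cells = new ArrayList<>();
        for (String part : raw.split(String.valueOf(CELL_MARK))) {
            String cell = clean(part);
            if (!cell.isEmpty()) {
                cells.add(cell);
            }
        }
        return cells;
    }

    // Replaces ASCII control characters with spaces and collapses whitespace,
    // producing text that is safer to index or show in an excerpt.
    static String clean(String raw) {
        return raw.replaceAll("\\p{Cntrl}", " ")
                  .replaceAll("\\s+", " ")
                  .trim();
    }
}
```

      With this sketch, WordTextCleaner.clean(text) would be applied before the text is indexed or shown as an excerpt, and WordTextCleaner.splitCells(text) would take the place of the StringTokenizer step. Because empty pieces are dropped, row boundaries and empty cells are lost; keeping them would need a parser that understands the Word table structure.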

          People

            Assignee: Unassigned
            Reporter: Rajesh Upadhyay (raj.upa8)