Tika / TIKA-1440

Auto-Paragraph numbers not extracted from Word Document

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser
    • Environment:

      Windows 7, Windows Server 2008, Tomcat

      Description

      When text is extracted from a Microsoft Word document that uses automatic paragraph numbering, the automatically generated numbers are missing from the output. Because these numbers can be critical to the meaning of the document (for example, when they are the targets of cross-references), they should be calculated and included in the extracted text if at all possible.
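
      To illustrate what "calculated" could mean here, the following is a minimal sketch of deriving decimal outline numbers from list nesting levels. It is not Tika's or Word's implementation: the function name `number_paragraphs` and the `(level, text)` input shape are hypothetical, and the sketch ignores real Word numbering features such as custom formats (roman numerals, letters), explicit restarts, and start-at values.

```python
from typing import Iterable, List, Optional, Tuple


def number_paragraphs(paragraphs: Iterable[Tuple[Optional[int], str]]) -> List[str]:
    """Prefix each auto-numbered paragraph with a computed decimal label.

    Each item is (level, text): level is the zero-based list nesting depth,
    or None for a paragraph that carries no automatic number.
    Hypothetical helper for illustration only.
    """
    counters: List[int] = []  # counters[i] is the current count at level i
    out: List[str] = []
    for level, text in paragraphs:
        if level is None:
            # Unnumbered paragraphs pass through and do not reset the list.
            out.append(text)
            continue
        # Drop counters deeper than this level so they restart later.
        del counters[level + 1:]
        # If the input skips a level, assume the missing parent is item 1.
        while len(counters) < level:
            counters.append(1)
        # First paragraph seen at this level: start its counter at zero.
        if len(counters) == level:
            counters.append(0)
        counters[level] += 1
        label = ".".join(str(n) for n in counters)
        out.append(f"{label}. {text}")
    return out
```

      With labels computed this way, a cross-reference such as "See section 1.2." in the body text would still point at something in the extracted output, which is the concern raised above.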

        Attachments

        1. Tika test 2003.doc
          24 kB
          Steve Gullion
        2. Tika Test.docx
          15 kB
          Steve Gullion

              People

              • Assignee: Unassigned
              • Reporter: Steve Gullion (gullbyrd)
              • Votes: 1
              • Watchers: 3
