Tika / TIKA-1440

Auto-Paragraph numbers not extracted from Word Document

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • None
    • None
    • Component/s: parser
    • Environment: Windows 7, Windows Server 2008, Tomcat

Description

    When text is extracted from a Microsoft Word document that uses automatic numbering, the automatically generated numbers are missing from the output. Because these numbers can be critical to the meaning of the document (for example, when cross-references cite numbered paragraphs), they should be computed and included in the extracted text wherever possible.
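To illustrate what "calculating" the numbers involves: in a .docx file, a numbered paragraph does not contain its number as text. It carries a reference (w:numPr with w:numId and w:ilvl) to a numbering definition in numbering.xml, whose levels (w:lvl) give a starting value (w:start), a number format (w:numFmt) and a label template (w:lvlText, e.g. "%1.%2."). The sketch below shows only the counter logic, assuming the level definitions have already been parsed. The `Level` and `ListNumberer` names are hypothetical and not part of Tika's API, and the sketch ignores level overrides, w:lvlRestart, w:isLgl and the binary .doc format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    """Hypothetical, simplified model of one w:lvl entry from numbering.xml."""
    fmt: str        # w:numFmt value: decimal, lowerLetter, upperLetter, lowerRoman, upperRoman
    text: str       # w:lvlText template, e.g. "%1." or "%1.%2."
    start: int = 1  # w:start value


def to_roman(n: int) -> str:
    """Convert a positive integer to upper-case Roman numerals."""
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in table:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n: int) -> str:
    """1 -> a, 26 -> z; this sketch repeats the letter after z (27 -> aa, 28 -> bb)."""
    letter = chr(ord("a") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def format_number(n: int, fmt: str) -> str:
    """Render a counter value in the given w:numFmt style (small subset only)."""
    if fmt == "decimal":
        return str(n)
    if fmt == "lowerLetter":
        return to_letters(n)
    if fmt == "upperLetter":
        return to_letters(n).upper()
    if fmt == "lowerRoman":
        return to_roman(n).lower()
    if fmt == "upperRoman":
        return to_roman(n)
    raise ValueError(f"unsupported numFmt: {fmt}")


class ListNumberer:
    """Tracks one counter per level for a single list (one w:numId)."""

    def __init__(self, levels: List[Level]):
        self.levels = levels
        self.counters: List[Optional[int]] = [None] * len(levels)

    def next_label(self, ilvl: int) -> str:
        """Advance the counter at level ilvl and return the rendered label."""
        if self.counters[ilvl] is None:
            self.counters[ilvl] = self.levels[ilvl].start
        else:
            self.counters[ilvl] += 1
        # A new item at this level restarts every deeper level.
        for deeper in range(ilvl + 1, len(self.levels)):
            self.counters[deeper] = None
        # Replace %1..%N with the current value of each ancestor level,
        # each formatted with that level's own numFmt.
        label = self.levels[ilvl].text
        for i in range(ilvl + 1):
            value = self.counters[i]
            if value is None:  # parent level not used yet: assume its start value
                value = self.levels[i].start
            label = label.replace(f"%{i + 1}", format_number(value, self.levels[i].fmt))
        return label
```

For example, levels with templates "%1.", "(%2)" and "%3." (decimal, lowerLetter, lowerRoman) applied to paragraphs at levels 0, 1, 1, 0, 1, 2, 2 yield the labels 1., (a), (b), 2., (a), i., ii.; an extractor could then emit each label in front of the paragraph text.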

Attachments

    1. Tika test 2003.doc (24 kB, Steve Gullion)
    2. Tika Test.docx (15 kB, Steve Gullion)

People

    Assignee: Unassigned
    Reporter: Steve Gullion (gullbyrd)
    Votes: 1
    Watchers: 2