Tika / TIKA-2435

docx parser missing content when multiple body sections

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17
    • Component/s: None
    • Labels: None

    Description

      On https://bz.apache.org/bugzilla/show_bug.cgi?id=61354, kramachandran@commvault.com reported that our DOM parser was missing every "body" section after the first one in docx files. PJ Fanning applied the patch, and the fix will be available when we upgrade to POI 3.17-beta2.

      As a side note, the experimental SAX parser was correctly extracting all text from the triggering document.
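
To illustrate the failure mode described above, here is a minimal sketch using only the JDK's DOM API. It is not the POI patch or Tika's parser code; the class `DocxBodies`, its methods, and the sample XML are hypothetical and exist only to show how reading just the first `w:body` element drops text that a walk over every `w:body` element would keep.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only: not POI's or Tika's actual implementation.
final class DocxBodies {
    // WordprocessingML main namespace used by word/document.xml.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up document.xml content with two body sections.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W_NS + "\">"
        + "<w:body><w:p><w:r><w:t>first</w:t></w:r></w:p></w:body>"
        + "<w:body><w:p><w:r><w:t>second</w:t></w:r></w:p></w:body>"
        + "</w:document>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Concatenates the text of every w:t element under one body.
    static String textOf(Element body) {
        StringBuilder sb = new StringBuilder();
        NodeList texts = body.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getTextContent());
        }
        return sb.toString();
    }

    // Mimics the reported bug: only the first body section is read.
    static String firstBodyOnly(String xml) {
        NodeList bodies = parse(xml).getElementsByTagNameNS(W_NS, "body");
        return bodies.getLength() == 0 ? "" : textOf((Element) bodies.item(0));
    }

    // Reads every body section, one line of output per body.
    static String allBodies(String xml) {
        NodeList bodies = parse(xml).getElementsByTagNameNS(W_NS, "body");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bodies.getLength(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(textOf((Element) bodies.item(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("first body only: " + firstBodyOnly(SAMPLE));
        System.out.println("all bodies:\n" + allBodies(SAMPLE));
    }
}
```

A regression test for this issue could follow the same shape: extract text from a document with more than one body section and check that content from the later sections is present.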

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)