TIKA-2265

Problem with footnotes/endnotes in Tika.parseToString with MS Word (.docx) files

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.14
    • Fix Version/s: None
    • Component/s: parser
    • N/A

    Description

      It appears that a footnote numbered "1" in the original document is output by Tika.parseToString() as "2", both in the footnote reference and in the corresponding footnote body text. Likewise, footnote "2" becomes "3", "3" becomes "4", and so on. I have not yet looked at the source code, but I can't imagine this would be difficult to correct.

      Attachments

        1. test.docx
          15 kB
          Mike Rodent
        2. test shorter.docx
          15 kB
          Mike Rodent

        Activity

          People

            tallison Tim Allison
            mrodent Mike Rodent
            Votes: 0
            Watchers: 3
