  1. Tika
  2. TIKA-3156

Missing content from .odt file with hyperlinked image

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.24.1
    • Fix Version/s: 1.25
    • Component/s: parser
    • Labels: None

    Description

      The attached file was created in Google Docs with an image inside and saved as an .odt file. After saving, I opened the file with LibreOffice and added a hyperlink to the image.
       
      When I parse the file with Tika, neither LinkContentHandler nor ToXMLContentHandler shows any trace of the hyperlink.
       
      The link is clickable when I open the document, and it appears inside content.xml as:
      <draw:a xlink:type="simple" xlink:href="http://example.test/">
       
      I tried enabling all options in OfficeParserConfig and OOXMLParser but the link is still not extracted.
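
To confirm independently of Tika that the hyperlink really is stored in the file, one could read content.xml straight out of the .odt archive and list the targets of its draw:a elements. The following is a minimal sketch using only the Java standard library (Java 9 or later, for readAllBytes). The class and method names (OdtDrawLinks, drawLinkTargets) are hypothetical and not part of Tika; the namespace URIs are the standard ODF drawing and XLink namespaces.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; it does not use or mirror Tika's API.
class OdtDrawLinks {
    // Standard ODF drawing namespace and XLink namespace.
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Returns the xlink:href value of every <draw:a> element in content.xml.
    static List<String> drawLinkTargets(InputStream odt) throws Exception {
        List<String> targets = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml));
                NodeList anchors = doc.getElementsByTagNameNS(DRAW_NS, "a");
                for (int i = 0; i < anchors.getLength(); i++) {
                    Element anchor = (Element) anchors.item(i);
                    targets.add(anchor.getAttributeNS(XLINK_NS, "href"));
                }
            }
        }
        return targets;
    }
}
```

Calling OdtDrawLinks.drawLinkTargets(new java.io.FileInputStream("link-gdocs.odt")) on the attached file should then list the link target if the draw:a element is present as described; comparing that list with what LinkContentHandler collects shows whether the parser is dropping the element.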

      Attachments

        1. link-gdocs.odt
          43 kB
          Robert Kaulbach

          People

            Assignee: Dave Meikle (davemeikle)
            Reporter: Robert Kaulbach (rkv)
            Votes: 0
            Watchers: 3