Tika / TIKA-2143

POI deprecated method used in TIKA 1.13

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9, 1.13
    • Fix Version/s: 1.13
    • Component/s: parser
    • Labels: None
    • Environment: Windows Java application

      Description

      Tika throws a long list of errors when extracting text from PPT files. When we tested with the standalone Tika application (1.13), we could not reproduce the issue.
      We looked at the POI source code for the class "HSLFSlideShow" and observed the following deprecated method:

      /**
       * Get the lookup from slide numbers to their offsets inside
       * _ptrData, used when adding or moving slides.
       *
       * @deprecated since POI 3.11, not supported anymore
       */
      @Deprecated
      public Hashtable<Integer,Integer> getSlideOffsetDataLocationsLookup() {
          throw new UnsupportedOperationException("PersistPtrHolder.getSlideOffsetDataLocationsLookup() is not supported since 3.12-Beta1");
      }

      We suspect that the Tika library is still calling this deprecated method, which causes the following runtime exception:

      Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@204c3b78
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at com.searchtechnologies.aspire.docprocessing.extracttext.ExtractTextStage.process(ExtractTextStage.java:140)
      ... 14 more
      Caused by: java.lang.UnsupportedOperationException
      at java.util.AbstractMap$SimpleImmutableEntry.setValue(Unknown Source)
      at org.apache.poi.hslf.HSLFSlideShow.read(HSLFSlideShow.java:293)
      at org.apache.poi.hslf.HSLFSlideShow.buildRecords(HSLFSlideShow.java:273)
      at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:188)
      at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
      ... 17 more
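
The innermost frame of the trace above is java.util.AbstractMap$SimpleImmutableEntry.setValue, a JDK method that always throws UnsupportedOperationException. The sketch below illustrates that behaviour using only the JDK. It also includes a small helper that prints where a class was loaded from. Checking this is a suggestion rather than something the report confirms, on the assumption that the embedding application may load a different POI jar than the standalone Tika 1.13 application does. The class name ImmutableEntryDemo and the helpers rejectsSetValue and loadedFrom are hypothetical names introduced for illustration.

```java
import java.security.CodeSource;
import java.util.AbstractMap;
import java.util.Map;

public class ImmutableEntryDemo {

    /** Returns true if calling setValue on the entry throws UnsupportedOperationException. */
    static boolean rejectsSetValue(Map.Entry<Integer, Integer> entry) {
        try {
            entry.setValue(0);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    /** Returns the location a class was loaded from, or a note when the JVM does not report one. */
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws Exception {
        // Same entry type as the innermost frame of the stack trace.
        Map.Entry<Integer, Integer> immutable = new AbstractMap.SimpleImmutableEntry<>(1, 100);
        // Mutable counterpart, shown for contrast.
        Map.Entry<Integer, Integer> mutable = new AbstractMap.SimpleEntry<>(1, 100);

        System.out.println("immutable rejects setValue: " + rejectsSetValue(immutable));
        System.out.println("mutable rejects setValue: " + rejectsSetValue(mutable));

        // Optional: pass fully qualified class names to see which jar each was loaded from.
        for (String name : args) {
            System.out.println(name + " <- " + loadedFrom(Class.forName(name)));
        }
    }
}
```

Run inside the affected application's classpath with a class name taken from the trace, for example `java ImmutableEntryDemo org.apache.poi.hslf.HSLFSlideShow`, the helper prints the jar that class comes from, which can then be compared with the POI version bundled in the standalone Tika application.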

            People

            • Assignee: Unassigned
            • Reporter: sbathrutheen