Tika / TIKA-2143

POI deprecated method used in TIKA 1.13

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9, 1.13
    • Fix Version/s: 1.13
    • Component/s: parser
    • Labels: None
    • Environment: Windows Java application

    Description

      We see that Tika throws a long list of errors when extracting PPT files. We tested with the standalone Tika application (1.13) and could not reproduce the issue there.
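      Because the standalone application behaves differently, it may help to check which JAR actually supplies the POI classes inside the failing application. The following is a rough diagnostic sketch, not part of Tika or POI: the class name `WhichJar` and the helper `locate` are hypothetical, and the looked-up class name is the one that appears in the stack trace further down.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns where a class would be loaded from, or a short note if that is unavailable. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK's bootstrap loader have no CodeSource.
            return src == null ? "(bootstrap/JDK class)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this report.
        System.out.println(locate("org.apache.poi.hslf.HSLFSlideShow"));
    }
}
```

      Running this inside the embedding application and inside the standalone application, and comparing the printed locations, would show whether both are using the same POI JAR.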
      We took a look at the POI source code and observed that the class "HSLFSlideShow" defines the deprecated method below:

      /**
       * Get the lookup from slide numbers to their offsets inside
       * _ptrData, used when adding or moving slides.
       *
       * @deprecated since POI 3.11, not supported anymore
       */
      @Deprecated
      public Hashtable<Integer,Integer> getSlideOffsetDataLocationsLookup() {
          throw new UnsupportedOperationException("PersistPtrHolder.getSlideOffsetDataLocationsLookup() is not supported since 3.12-Beta1");
      }

      We suspect that the Tika library is still calling this deprecated method, which causes the following runtime exception:

      Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@204c3b78
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at com.searchtechnologies.aspire.docprocessing.extracttext.ExtractTextStage.process(ExtractTextStage.java:140)
      ... 14 more
      Caused by: java.lang.UnsupportedOperationException
      at java.util.AbstractMap$SimpleImmutableEntry.setValue(Unknown Source)
      at org.apache.poi.hslf.HSLFSlideShow.read(HSLFSlideShow.java:293)
      at org.apache.poi.hslf.HSLFSlideShow.buildRecords(HSLFSlideShow.java:273)
      at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:188)
      at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
      ... 17 more
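
      Note that the innermost frame of the trace above is java.util.AbstractMap$SimpleImmutableEntry.setValue, which the JDK documents as always throwing UnsupportedOperationException, rather than the deprecated getter quoted earlier. The following is a minimal, self-contained sketch of that JDK behaviour only. The class name `ImmutableEntryDemo` and its helper are hypothetical, and the sketch does not attempt to reproduce what HSLFSlideShow.read actually does.

```java
import java.util.AbstractMap;
import java.util.Map;

public class ImmutableEntryDemo {
    /** Returns true if calling setValue on the entry throws UnsupportedOperationException. */
    static boolean setValueRejected(Map.Entry<Integer, Integer> entry) {
        try {
            entry.setValue(42);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // SimpleImmutableEntry does not support setValue.
        Map.Entry<Integer, Integer> immutable = new AbstractMap.SimpleImmutableEntry<>(1, 100);
        // SimpleEntry is the mutable counterpart and accepts setValue.
        Map.Entry<Integer, Integer> mutable = new AbstractMap.SimpleEntry<>(1, 100);

        System.out.println("immutable entry rejected setValue: " + setValueRejected(immutable));
        System.out.println("mutable entry rejected setValue: " + setValueRejected(mutable));
    }
}
```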

          People

            Assignee: Unassigned
            Reporter: sbathrutheen
            Votes: 0
            Watchers: 2
