  1. Tika
  2. TIKA-3128

MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.24.1
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      The attached MOV file produces a RuntimeException when parsed with Tika 1.24.1.

      The same file parses without any issues with Tika 1.19.1.

      Tika 1.19.1 standalone app, successful run:

      [sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
      Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
      See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
      for optional dependencies.
      Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      WARNING: org.xerial's sqlite-jdbc is not loaded.
      Please provide the jar on your classpath to parse sqlite files.
      See tika-parsers/pom.xml for the correct version.
      Content-Length: 51066400
      Content-Type: application/mp4
      Creation-Date: 2015-05-18T16:23:25Z
      Last-Modified: 2015-05-18T16:31:09Z
      Last-Save-Date: 2015-05-18T16:31:09Z
      X-Parsed-By: org.apache.tika.parser.DefaultParser
      X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
      date: 2015-05-18T16:31:09Z
      dcterms:created: 2015-05-18T16:23:25Z
      dcterms:modified: 2015-05-18T16:31:09Z
      meta:creation-date: 2015-05-18T16:23:25Z
      meta:save-date: 2015-05-18T16:31:09Z
      modified: 2015-05-18T16:31:09Z
      resourceName: HDSIT_157516.mov
      tiff:ImageLength: 1080
      tiff:ImageWidth: 1920
      xmpDM:audioSampleRate: 30000
      xmpDM:duration: 125.99
       

      Tika 1.24.1 standalone app, run failing with a RuntimeException:

      [sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
      Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
      See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
      for optional dependencies.
      
      Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      WARNING: org.xerial's sqlite-jdbc is not loaded.
      Please provide the jar on your classpath to parse sqlite files.
      See tika-parsers/pom.xml for the correct version.
      Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
      	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
      	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
      Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
      	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
      	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
      	at org.mp4parser.IsoFile.<init>(IsoFile.java:58)
      	at org.mp4parser.IsoFile.<init>(IsoFile.java:45)
      	at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:130)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	... 5 more
      

      Commit 8e2eb05292bc35503a3d82a908c426854e23ac83, included in 1.24.1, switched the MP4 parser library from the googlecode version to the tallison version and appears to be directly responsible for the change in behavior.
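
      The "Caused by" message above refers to the size field of a box header. In the ISO base media file format, a 32-bit size of 0 means the box extends to the end of the file, and a size of 1 means a 64-bit size follows the four-character type. The sketch below shows one way a reader could resolve both cases. It is only an illustration, not the org.mp4parser or Tika code: the class BoxSizeScanner and its helpers are hypothetical names, and it walks a single level of sibling boxes in memory, whereas the failure in the stack trace occurs in a box nested inside a VisualSampleEntry.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Tika or org.mp4parser.
public class BoxSizeScanner {

    /** One box header with its size field resolved to a byte count. */
    public static final class Box {
        public final String type;         // four-character box type
        public final int offset;          // start of the box in the buffer
        public final long size;           // total size, header included
        public final boolean sizeWasZero; // true if the stored size field was 0

        Box(String type, int offset, long size, boolean sizeWasZero) {
            this.type = type;
            this.offset = offset;
            this.size = size;
            this.sizeWasZero = sizeWasZero;
        }
    }

    /** Walks the sibling boxes stored in data[start, end). */
    public static List<Box> scan(byte[] data, int start, int end) {
        List<Box> boxes = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        int pos = start;
        while (end - pos >= 8) {
            long size = buf.getInt(pos) & 0xFFFFFFFFL; // unsigned 32-bit size
            String type = new String(data, pos + 4, 4, StandardCharsets.ISO_8859_1);
            int headerLen = 8;
            boolean sizeWasZero = false;
            if (size == 1) {
                // A 64-bit "largesize" follows the type field.
                if (end - pos < 16) {
                    throw new IllegalArgumentException("truncated largesize at offset " + pos);
                }
                size = buf.getLong(pos + 8);
                headerLen = 16;
            } else if (size == 0) {
                // Size 0: the box runs to the end of the enclosing range.
                size = end - pos;
                sizeWasZero = true;
            }
            if (size < headerLen || size > end - pos) {
                throw new IllegalArgumentException(
                        "bad size " + size + " for box '" + type + "' at offset " + pos);
            }
            boxes.add(new Box(type, pos, size, sizeWasZero));
            pos += (int) size;
        }
        return boxes;
    }

    /** Synthetic 28-byte buffer: a 16-byte 'ftyp' box, then an 'mdat' box with size field 0. */
    public static byte[] demoBytes() {
        return ByteBuffer.allocate(28)
                .putInt(16).put("ftyp".getBytes(StandardCharsets.US_ASCII)).put(new byte[8])
                .putInt(0).put("mdat".getBytes(StandardCharsets.US_ASCII)).put(new byte[4])
                .array();
    }

    public static void main(String[] args) {
        byte[] data = demoBytes();
        for (Box b : scan(data, 0, data.length)) {
            System.out.println(b.type + " offset=" + b.offset + " size=" + b.size
                    + (b.sizeWasZero ? " (size field was 0)" : ""));
        }
    }
}
```

      Treating a zero size as "the rest of the enclosing range" is an assumption made for this sketch. Whether that is the right recovery for the nested box in the attached file would need to be checked against the file itself.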

      Attachments

        1. HDSIT_157516.mov
          48.70 MB
          Sameer Apte


          People

            Assignee: Unassigned
            Reporter: Sameer Apte (samapt0100)
            Votes: 0
            Watchers: 2
