Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.24.1
Fix Version/s: None
Component/s: None
Description
The attached .mov file produces a RuntimeException when parsed with Tika v1.24.1.
The same file parses without any issues with Tika v1.19.1.
Tika 1.19.1 standalone app SUCCESSFUL run
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
Content-Length: 51066400
Content-Type: application/mp4
Creation-Date: 2015-05-18T16:23:25Z
Last-Modified: 2015-05-18T16:31:09Z
Last-Save-Date: 2015-05-18T16:31:09Z
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
date: 2015-05-18T16:31:09Z
dcterms:created: 2015-05-18T16:23:25Z
dcterms:modified: 2015-05-18T16:31:09Z
meta:creation-date: 2015-05-18T16:23:25Z
meta:save-date: 2015-05-18T16:31:09Z
modified: 2015-05-18T16:31:09Z
resourceName: HDSIT_157516.mov
tiff:ImageLength: 1080
tiff:ImageWidth: 1920
xmpDM:audioSampleRate: 30000
xmpDM:duration: 125.99
Tika 1.24.1 standalone app RUNTIMEEXCEPTION run
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
    at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
    at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
    at org.mp4parser.IsoFile.<init>(IsoFile.java:58)
    at org.mp4parser.IsoFile.<init>(IsoFile.java:45)
    at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:130)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
Commit 8e2eb05292bc35503a3d82a908c426854e23ac83 in v1.24.1, which switched the MP4 parser from googlecode to tallison, appears to be directly responsible for the change in behavior.
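
For context on the "box size of zero" message in the trace: in the ISO base media file format, each box header starts with a 32-bit big-endian size followed by a four-character type, and a size of 0 means the box runs to the end of the file (a size of 1 means a 64-bit size follows). The sketch below only illustrates that header rule on an in-memory byte array. The class and method names (BoxHeaders, describe, scan) are hypothetical and are not part of Tika or mp4parser, and it walks sibling boxes at one level only, whereas the box that fails here is nested inside a VisualSampleEntry.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Tika or mp4parser code.
final class BoxHeaders {

    /** Explains how the 32-bit size field of a box header is interpreted. */
    static String describe(String type, long size) {
        if (size == 0) {
            return type + ": size 0, box runs to end of file";
        }
        if (size == 1) {
            return type + ": size 1, 64-bit largesize follows";
        }
        return type + ": " + size + " bytes";
    }

    /** Walks sibling box headers; stops at a size-0 or size-1 box, or on malformed data. */
    static List<String> scan(byte[] data) {
        List<String> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);          // big-endian by default
        while (buf.remaining() >= 8) {
            long size = buf.getInt() & 0xFFFFFFFFL;      // treat the size as unsigned
            byte[] fourcc = new byte[4];
            buf.get(fourcc);
            String type = new String(fourcc, StandardCharsets.ISO_8859_1);
            out.add(describe(type, size));
            if (size == 0 || size == 1) {
                break;                                   // not followed further in this sketch
            }
            long payload = size - 8;                     // bytes after the 8-byte header
            if (payload < 0 || payload > buf.remaining()) {
                break;                                   // malformed or truncated box
            }
            buf.position(buf.position() + (int) payload);
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic data: a 12-byte 'ftyp' box, then an 'mdat' box with size 0.
        byte[] sample = {
            0, 0, 0, 12, 'f', 't', 'y', 'p', 'q', 't', ' ', ' ',
            0, 0, 0, 0, 'm', 'd', 'a', 't', 1, 2, 3
        };
        scan(sample).forEach(System.out::println);
    }
}
```

Run against the synthetic sample, this reports the ftyp box as 12 bytes and flags the mdat box as running to end of file. To locate the offending box in HDSIT_157516.mov, the same header check would have to be applied recursively inside the container boxes listed in the stack trace.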