  1. Tika
  2. TIKA-1179

A corrupt mp3 file can cause an infinite loop in Mp3Parser

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels:
      None

      Description

      I have a thread that indexes (among other things) files using Apache Solr. The thread hangs (still running, but making no progress) when trying to extract metadata from the MP3 file attached to this issue. Here are a few thread dumps taken at different moments:

      "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.commons.io.input.AutoCloseInputStream.close(AutoCloseInputStream.java:63)
      	at org.apache.commons.io.input.AutoCloseInputStream.afterRead(AutoCloseInputStream.java:77)
      	at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:99)
      	at java.io.BufferedInputStream.fill(Unknown Source)
      	at java.io.BufferedInputStream.read1(Unknown Source)
      	at java.io.BufferedInputStream.read(Unknown Source)
      	- locked <0x00000000cb7094e8> (a java.io.BufferedInputStream)
      	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
      	at java.io.FilterInputStream.read(Unknown Source)
      	at org.apache.tika.io.TailStream.read(TailStream.java:117)
      	at org.apache.tika.io.TailStream.skip(TailStream.java:140)
      	at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
      	at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
      	at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
      	at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      	at org.apache.tika.Tika.parseToString(Tika.java:380)
      	...
      
      "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4618000]
         java.lang.Thread.State: RUNNABLE
      	at org.apache.tika.io.TailStream.skip(TailStream.java:133)
      	at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
      	at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
      	at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
      	at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      	at org.apache.tika.Tika.parseToString(Tika.java:380)
      	...
      
      "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
         java.lang.Thread.State: RUNNABLE
      	at java.io.BufferedInputStream.read1(Unknown Source)
      	at java.io.BufferedInputStream.read(Unknown Source)
      	- locked <0x00000000cb1be170> (a java.io.BufferedInputStream)
      	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
      	at java.io.FilterInputStream.read(Unknown Source)
      	at org.apache.tika.io.TailStream.read(TailStream.java:117)
      	at org.apache.tika.io.TailStream.skip(TailStream.java:140)
      	at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
      	at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
      	at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
      	at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      	at org.apache.tika.Tika.parseToString(Tika.java:380)
      	...
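
All three dumps show the thread inside TailStream.skip, reached from MpegStream.skipStream, which suggests a skip loop that keeps running without making progress. Whether that is the actual cause is not confirmed in this report. As an illustration only, here is a minimal sketch of a read-based skip that stops at end of stream. The class BoundedSkip and the method skipFully are hypothetical names and are not Tika code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: illustrates a skip loop that cannot spin at EOF.
public class BoundedSkip {

    /**
     * Skips up to n bytes by reading them and returns the number actually skipped.
     * The loop ends as soon as read() reports end of stream (-1).
     */
    public static long skipFully(InputStream in, long n) throws IOException {
        byte[] buf = new byte[4096];
        long remaining = n;
        while (remaining > 0) {
            int len = (int) Math.min(buf.length, remaining);
            int read = in.read(buf, 0, len);
            if (read < 0) {
                break; // end of stream: stop instead of retrying forever
            }
            remaining -= read;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        // Ask to skip 100 bytes from a 10-byte stream, as a corrupt frame header might.
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(skipFully(in, 100)); // prints 10
    }
}
```

The caller can compare the return value with the requested count and treat a short skip as a truncated or corrupt frame.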
      

      This makes our Solr indexer very fragile: a single corrupt file blocks the indexing thread, so no further files are indexed and search results are incomplete.
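
Until the parser itself is fixed, a caller could protect the indexing thread by running each extraction with a timeout. The following is a rough sketch under assumptions: GuardedExtraction and extractOrDefault are hypothetical names, the timeout value is arbitrary, and the sleeping task only stands in for a hanging parse. A real caller might pass a task that calls Tika.parseToString instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika or XWiki.
public class GuardedExtraction {

    // Daemon worker threads, so a parse stuck in a loop cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "guarded-extraction");
        t.setDaemon(true);
        return t;
    });

    /** Returns the task's result, or the fallback if the task fails or exceeds the timeout. */
    public static String extractOrDefault(Callable<String> task, long timeoutMillis, String fallback) {
        Future<String> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts; a busy loop may ignore the interrupt
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task threw, e.g. a parse error
        }
    }

    public static void main(String[] args) {
        // Stand-in for a hanging parse; a real caller might use
        // () -> new org.apache.tika.Tika().parseToString(file) here.
        Callable<String> stuck = () -> { Thread.sleep(60_000); return "never"; };
        System.out.println(extractOrDefault(stuck, 200, "(timed out)"));
        System.out.println(extractOrDefault(() -> "text", 200, "(timed out)"));
    }
}
```

This keeps the indexer moving, but it does not stop the stuck work: a loop that never checks the interrupt flag will keep consuming a CPU core in its worker thread. Parsing in a separate process that can be killed (Tika provides org.apache.tika.fork.ForkParser for this purpose) is a stronger form of isolation.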

        Attachments

        1. corrupt.mp3
          2.11 MB
          Marius Dumitru Florea

              People

              • Assignee:
                rgauss Ray Gauss II
              • Reporter:
                mflorea Marius Dumitru Florea
              • Votes:
                0
              • Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved: