Increase buffer size for mime type sniffing

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: mime
    • Labels: None
    • Environment: My local MacBook Pro laptop.

    Description

      While working on TIKA-357, which addresses a similar problem in charset detection, I found that mime type identification suffers from the same general problem. Tika currently sniffs the mime type using only the first MimeTypes#getMinLength() bytes of a file, matching them against magic headers. The example file attached from Ken Krugler shows that the current minimum length of 4 * 1024 bytes (4K) isn't enough. Extending it to 8K (8 * 1024 bytes) fixes detection for this file and seems to allow more files to be identified, at little overhead cost.
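
      The effect of the sniffing window can be illustrated with a minimal, self-contained sketch. It is not Tika's implementation and does not use Tika's API: the class SniffBufferDemo and all of its methods are hypothetical, and the sample data assumes, purely for illustration, that the identifying bytes sit just past the first 4K of the file. The ticket does not say where they sit in the attached file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Tika.
public class SniffBufferDemo {

    // Reads at most `limit` bytes from the start of the stream, then rewinds
    // the stream so a later parser still sees the full content.
    public static byte[] readPrefix(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);
        try {
            byte[] buf = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n == -1) {
                    break; // stream is shorter than the sniffing window
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset();
        }
    }

    // Returns true if `magic` occurs anywhere inside `prefix`.
    public static boolean containsMagic(byte[] prefix, byte[] magic) {
        outer:
        for (int i = 0; i + magic.length <= prefix.length; i++) {
            for (int j = 0; j < magic.length; j++) {
                if (prefix[i + j] != magic[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    // Sniffs `document` for `magic`, looking only at the first `bufferSize` bytes.
    public static boolean sniff(byte[] document, byte[] magic, int bufferSize) throws IOException {
        InputStream in = new ByteArrayInputStream(document);
        return containsMagic(readPrefix(in, bufferSize), magic);
    }

    // Builds an illustrative document: `padding` space bytes followed by `marker`.
    public static byte[] sampleDocument(int padding, String marker) {
        byte[] tail = marker.getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[padding + tail.length];
        Arrays.fill(doc, 0, padding, (byte) ' ');
        System.arraycopy(tail, 0, doc, padding, tail.length);
        return doc;
    }

    public static void main(String[] args) throws IOException {
        byte[] magic = "<html".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = sampleDocument(5000, "<html>"); // marker starts past 4K (assumed layout)
        System.out.println("4K window finds magic: " + sniff(doc, magic, 4 * 1024));
        System.out.println("8K window finds magic: " + sniff(doc, magic, 8 * 1024));
    }
}
```

      In this sketch the marker starts at offset 5000, so a 4K window never sees it while an 8K window does. The extra cost is bounded by the larger window: at most 4K more bytes read and scanned per document.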

            People

              Assignee: Chris A. Mattmann (chrismattmann)
              Reporter: Chris A. Mattmann (chrismattmann)