TIKA-366

Increase buffer size for mime type sniffing

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: mime
    • Labels:
      None
    • Environment:

      My local MacBook Pro laptop.

      Description

      While working on TIKA-357, which addresses a similar problem for charset detection, I found that mime type identification suffers from the same general problem. Tika currently uses only the first MimeTypes#getMinLength() bytes of a stream when matching magic headers to sniff the mime type. The example file attached from Ken Krugler shows that the current minimum length of 4 * 1024 bytes is not enough. Extending it to 8K (8 * 1024 bytes) fixes detection for this file and appears to improve mime detection more broadly, at little overhead cost.
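      The effect of the buffer size can be illustrated with a minimal sketch in plain Java. It does not use or reproduce Tika's actual detection code: the class SniffBufferDemo, its methods, the "<html" marker, and the 5000-byte offset are all hypothetical, chosen only to show a marker that falls outside a 4K sniffing window but inside an 8K one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; this is NOT Tika code, only an illustration.
class SniffBufferDemo {

    // Reads up to `limit` bytes from the start of the stream, then rewinds it
    // so a later parser still sees the whole document.
    static byte[] readPrefix(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(limit);
            byte[] buf = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n == -1) {
                    break; // stream is shorter than the sniffing window
                }
                total += n;
            }
            in.reset();
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Naive search for a magic byte sequence anywhere in the prefix.
    static boolean containsMagic(byte[] prefix, byte[] magic) {
        outer:
        for (int i = 0; i + magic.length <= prefix.length; i++) {
            for (int j = 0; j < magic.length; j++) {
                if (prefix[i + j] != magic[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    // Builds a made-up document: `offset` bytes of whitespace, then "<html".
    static byte[] sampleDocument(int offset) {
        byte[] marker = "<html".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[offset + marker.length];
        Arrays.fill(doc, (byte) ' ');
        System.arraycopy(marker, 0, doc, offset, marker.length);
        return doc;
    }

    public static void main(String[] args) {
        byte[] magic = "<html".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(sampleDocument(5000));
        // The marker sits at offset 5000, beyond a 4K window but within 8K.
        System.out.println(containsMagic(readPrefix(in, 4 * 1024), magic));
        System.out.println(containsMagic(readPrefix(in, 8 * 1024), magic));
    }
}
```

      In this sketch the 4K window misses the marker and the 8K window finds it, at the cost of buffering 4K more per stream. The attached example file may differ in its details; the offset here is an assumption made for illustration.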

          Activity

          Chris A. Mattmann added a comment - fixed in r901033

            People

            • Assignee:
              Chris A. Mattmann
              Reporter:
              Chris A. Mattmann
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: