Nutch / NUTCH-1991

Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1, 1.10, 1.11, 2.3.1
    • Fix Version/s: 1.10
    • Component/s: util
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      From Nutch 1.5 onwards, the MimeUtil.java class, which acts as a facade to Tika for MIME type detection, attempts a match using the MIME type returned by the server, the filename, and the content. NUTCH-1045 added support for an external tika-mimetypes.xml file that configures this process. However, content-based detection did not use this file and instead fell back to the configuration bundled with the Tika library. Consequently, any content-based match rules added to the Nutch version of the configuration file were ignored.
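
The behaviour described above can be illustrated with a small, self-contained model. This is only a sketch: it does not use the Tika or Nutch APIs, and every name in it (MagicRules, bundledRules, customRules, detectBuggy, detectFixed, the "#MYFMT" signature and the application/x-myformat type) is hypothetical. It assumes that a content rule is simply a byte prefix mapped to a MIME type, which is far simpler than what a real tika-mimetypes.xml can express.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MimeDetectionSketch {

    /** Hypothetical stand-in for a set of content rules: byte prefix -> MIME type. */
    static final class MagicRules {
        private final Map<String, String> prefixToType = new LinkedHashMap<>();

        MagicRules add(String prefix, String type) {
            prefixToType.put(prefix, type);
            return this;
        }

        /** Returns the type of the first matching prefix, or a generic fallback. */
        String detect(byte[] content) {
            String head = new String(content, StandardCharsets.ISO_8859_1);
            for (Map.Entry<String, String> rule : prefixToType.entrySet()) {
                if (head.startsWith(rule.getKey())) {
                    return rule.getValue();
                }
            }
            return "application/octet-stream";
        }
    }

    /** Models the rules shipped inside the library (assumed content). */
    static MagicRules bundledRules() {
        return new MagicRules().add("%PDF-", "application/pdf");
    }

    /** Models a site-specific rules file that adds one extra content rule. */
    static MagicRules customRules() {
        return bundledRules().add("#MYFMT", "application/x-myformat");
    }

    /** Reported behaviour: the custom rules are accepted but never consulted. */
    static String detectBuggy(byte[] content, MagicRules custom) {
        return bundledRules().detect(content); // 'custom' is deliberately ignored
    }

    /** Intended behaviour: content detection uses the supplied rules. */
    static String detectFixed(byte[] content, MagicRules custom) {
        return custom.detect(content);
    }

    public static void main(String[] args) {
        byte[] doc = "#MYFMT v1".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detectBuggy(doc, customRules())); // application/octet-stream
        System.out.println(detectFixed(doc, customRules())); // application/x-myformat
    }
}
```

In this model the custom rule only takes effect once the detector is built from the supplied rules rather than from the bundled defaults, which is the symptom the description reports for content-based matches.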

        Attachments

        1. NUTCH-1991-trunk.v2.patch
          1 kB
          Sebastian Nagel
        2. NUTCH-1991-1.6.patch
          2 kB
          Iain Lopata

            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Iain Lopata (ilopata1)
            • Votes: 0
            • Watchers: 4

              Dates

              • Created:
                Updated:
                Resolved: