Nutch
NUTCH-1259

Store detected content type in crawldatum metadata

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels:
      None
    • Patch Info:
      Patch Available

    Description

    The MIME type detected by Tika's Detect() API is never added to a Parse's ContentMetaData or ParseMetaData. As a result, incorrect Content-Type values end up in the documents.
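A minimal Java sketch of the behaviour the issue title asks for, under stated assumptions. It is not the actual Nutch patch. To stay self-contained it swaps Tika's detector for the JDK's `URLConnection.guessContentTypeFromStream`, which recognises far fewer formats. A plain `Map` stands in for CrawlDatum metadata. The class name `DetectedTypeSketch`, the helper names, the `Content-Type` key, and the rule "the detected type wins over the server header" are all hypothetical choices made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class DetectedTypeSketch {

    // Hypothetical metadata key; the key used by the real fix is not stated in this issue.
    static final String CONTENT_TYPE_KEY = "Content-Type";

    // Sniffs the MIME type from the leading bytes. Stand-in for Tika's detector:
    // the JDK heuristic knows only a few formats and returns null otherwise.
    static String detect(byte[] content) {
        try (InputStream in = new ByteArrayInputStream(content)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            return null;
        }
    }

    // Builds a metadata map (standing in for CrawlDatum metadata) that records the
    // detected type, falling back to the server-reported header only when detection fails.
    static Map<String, String> recordDetectedType(byte[] content, String headerType) {
        Map<String, String> meta = new HashMap<>();
        String detected = detect(content);
        String type = (detected != null) ? detected : headerType;
        if (type != null) {
            meta.put(CONTENT_TYPE_KEY, type);
        }
        return meta;
    }

    public static void main(String[] args) {
        byte[] page = "<html><body>hi</body></html>".getBytes(StandardCharsets.US_ASCII);
        // The server mislabels the page as text/plain; the detected type is stored instead.
        System.out.println(recordDetectedType(page, "text/plain"));
    }
}
```

Running `main` prints `{Content-Type=text/html}`. Recording the detected value next to the fetched data is what keeps a wrong server header from propagating into the documents. The precedence rule shown here is an assumption and may differ from what Nutch actually does.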


    People

    • Assignee: Julien Nioche
    • Reporter: Markus Jelsma
    • Votes: 0
    • Watchers: 0
