Apache NiFi / NIFI-4718

Improve Detection of FlowFile V3 in IdentifyMimeType

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-M1, 1.21.0
    • Component/s: Extensions
    • Labels: None

    Description

      IdentifyMimeType uses Apache Tika, configured with a custom-mimetypes.xml [1], to define (among others) the flowfile-v* MIME types. However, these definitions do not specify priorities. As a result, a NiFi FlowFile V3 package whose payload contains, for example, HTML including the string:

      <html xmlns=

      will be identified as "application/xhtml+xml" [2]. That result matches the pattern, but it is less correct than identifying the package as application/flowfile-v3. To fix this, I believe we need to specify a higher priority for the FlowFile V3 "magic"...

      [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml#L26-L31
      [2] https://gitbox.apache.org/repos/asf?p=tika.git;a=blob;f=tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml;hb=refs/heads/master
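
The priority problem described above can be illustrated with a small, self-contained model of priority-ordered magic matching. This is a simplified sketch, not Tika's actual code: the `Magic` class, the `detect` function, the tie-break rule (longer pattern first), the default priority of 50, the raised priority of 60, and the sample payload bytes are all assumptions made for illustration. Only the two MIME type names and the `<html xmlns=` pattern come from the description; `NiFiFF3` is assumed to be the FlowFile V3 header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Magic:
    """Hypothetical stand-in for one magic rule in a MIME type definition."""
    mime_type: str
    pattern: bytes
    start: int          # first offset at which the pattern may begin
    end: int            # last offset at which the pattern may begin
    priority: int = 50  # assumed default priority

    def matches(self, data: bytes) -> bool:
        # Search only the window in which the pattern is allowed to start.
        window = data[self.start:self.end + len(self.pattern)]
        return self.pattern in window


def detect(data: bytes, magics: list[Magic]) -> str:
    # Highest priority first; ties go to the longer pattern (assumption).
    ordered = sorted(magics, key=lambda m: (-m.priority, -len(m.pattern)))
    for magic in ordered:
        if magic.matches(data):
            return magic.mime_type
    return "application/octet-stream"


# Simplified stand-in for a FlowFile V3 package with an HTML payload;
# the real package layout is more involved than this.
payload = b"NiFiFF3" + b"\x00\x01" + b'<html xmlns="http://www.w3.org/1999/xhtml">'

XHTML = Magic("application/xhtml+xml", b"<html xmlns=", 0, 8192)
FLOWFILE_V3_DEFAULT = Magic("application/flowfile-v3", b"NiFiFF3", 0, 0)
FLOWFILE_V3_RAISED = Magic("application/flowfile-v3", b"NiFiFF3", 0, 0, priority=60)

# Equal priorities: the XHTML rule is tried first and wins.
print(detect(payload, [FLOWFILE_V3_DEFAULT, XHTML]))  # application/xhtml+xml

# Raised priority: the FlowFile V3 rule is tried first and wins.
print(detect(payload, [FLOWFILE_V3_RAISED, XHTML]))   # application/flowfile-v3
```

In this model both rules match the same bytes, so the outcome depends only on the order in which the rules are tried. Giving the FlowFile V3 rule a higher priority makes that order explicit instead of leaving it to a tie-break.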

            People

              Assignee: Nissim Shiman
              Reporter: Brandon Rhys DeVries
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m