Apache NiFi / NIFI-11466

Add a ModifyCompression processor


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-M1, 1.22.0
    • Component/s: Extensions
    • Labels: None

    Description

      To convert from one compression format to another, a user currently has to use one CompressContent processor to decompress and a second CompressContent processor to compress into the new format. That takes two processors, and the disk I/O for the intermediate FlowFiles and their underlying content claims can make the conversion I/O intensive.

      Instead, a new ModifyCompression processor is proposed that both decompresses the incoming FlowFile and compresses the outgoing FlowFile, using appropriate memory buffers for the decompression and recompression. Adding "no decompression" and "no compression" options to the respective properties would let this processor function as CompressContent does now, with the added ability to convert from one compression format (e.g., gzip) to another (e.g., snappy-hadoop). One use case where this would help is an I/O-bound flow that moves compressed data from a legacy source system into HDFS for faster (and larger-volume, distributed) processing of the data.
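
      The single-pass idea described above can be illustrated with a small standalone sketch. This is not the processor's implementation and does not use the NiFi API; it only shows a decompressing input stream feeding a compressing output stream through one in-memory buffer. The class name, the Format enum, and the method names are all hypothetical, and only formats available in java.util.zip (gzip and zlib/deflate) are used as stand-ins, since formats such as snappy-hadoop need third-party libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class RecompressSketch {

    // Hypothetical format names. NONE stands in for the proposed
    // "no decompression" / "no compression" options.
    enum Format { NONE, GZIP, DEFLATE }

    // Wrap the raw input in a decompressing stream for the source format.
    static InputStream decompressing(InputStream in, Format format) throws IOException {
        switch (format) {
            case GZIP:    return new GZIPInputStream(in);
            case DEFLATE: return new InflaterInputStream(in);
            default:      return in;
        }
    }

    // Wrap the raw output in a compressing stream for the target format.
    static OutputStream compressing(OutputStream out, Format format) throws IOException {
        switch (format) {
            case GZIP:    return new GZIPOutputStream(out);
            case DEFLATE: return new DeflaterOutputStream(out);
            default:      return out;
        }
    }

    // Decompress and recompress in one pass: decompressed bytes only ever
    // live in the fixed-size buffer, never in an intermediate file.
    static void convert(InputStream rawIn, Format from, OutputStream rawOut, Format to)
            throws IOException {
        try (InputStream in = decompressing(rawIn, from);
             OutputStream out = compressing(rawOut, to)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    // Convenience wrapper for in-memory data; rethrows I/O errors unchecked.
    static byte[] convertBytes(byte[] data, Format from, Format to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            convert(new ByteArrayInputStream(data), from, sink, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "hello, flowfile".getBytes(StandardCharsets.UTF_8);
        byte[] gz = convertBytes(original, Format.NONE, Format.GZIP);         // compress only
        byte[] deflated = convertBytes(gz, Format.GZIP, Format.DEFLATE);      // format conversion
        byte[] roundTrip = convertBytes(deflated, Format.DEFLATE, Format.NONE); // decompress only
        System.out.println(Arrays.equals(original, roundTrip));
    }
}
```

      With Format.NONE on one side, the same loop behaves like a plain compress or decompress step, which mirrors how the proposed "no decompression" and "no compression" options would let one processor cover both the existing and the new use cases.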


          People

            Assignee: Matt Burgess (mattyb149)
            Reporter: Matt Burgess (mattyb149)
            Votes: 0
            Watchers: 2


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m
