Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Today the PutHDFS Processor merely places a file into HDFS from NiFi. There are times when we may want to move Files/Directories around on HDFS as part of a workflow. This could be after the PutHDFS processor has placed the file, or from some other trigger.

      Initially we are targeting to take a flow file attribute of an absolute HDFS path, and be able to move it to a target HDFS Path using the Filesystem RENAME API.

        Activity

        Hide
        jtstorck Jeff Storck added a comment -

        Gray Gwizdz Thanks for your contribution! In doing an initial review of the patch, I have a few questions/comments:

        • In MoveHDFS#processBatchOfFiles, line 354, there's a for loop that tries 10 times to rename/move the file. Do the rename/copy attempts return false often enough that the for loop is required? Have you investigated why those attempts return false?
        • Can you add a test for the copy operation, in addition to the existing test for the move operation?
        • PutHDFS properties for REMOTE_OWNER and REMOTE_GROUP both allow expression language. Have you considered allowing that for the same properties in MoveHDFS?
        • There are three public static variables that appear to be unused, and should be removed: "MAX_WORKING_QUEUE_SIZE", "BUFFER_SIZE_KEY", "BUFFER_SIZE_DEFAULT".
        Show
        jtstorck Jeff Storck added a comment - Gray Gwizdz Thanks for your contribution! In doing an initial review of the patch, I have a few questions/comments: In MoveHDFS#processBatchOfFiles, line 354, there's a for loop that tries 10 times to rename/move the file. Do the rename/copy attempts return false often enough that the for loop is required? Have you investigated why those attempts return false? Can you add a test for the copy operation, in addition to the existing test for the move operation? PutHDFS properties for REMOTE_OWNER and REMOTE_GROUP both allow expression language. Have you considered allowing that for the same properties in MoveHDFS? There are three public static variables that appear to be unused, and should be removed: "MAX_WORKING_QUEUE_SIZE", "BUFFER_SIZE_KEY", "BUFFER_SIZE_DEFAULT".
        Hide
        jtstorck Jeff Storck added a comment -

        Gray Gwizdz Very sorry for the delay on this! I had some other tasks that pulled me away from reviewing your PR, but I'll be picking it back up tomorrow.

        Show
        jtstorck Jeff Storck added a comment - Gray Gwizdz Very sorry for the delay on this! I had some other tasks that pulled me away from reviewing your PR, but I'll be picking it back up tomorrow.
        Hide
        jtstorck Jeff Storck added a comment -

        Reviewing...

        Show
        jtstorck Jeff Storck added a comment - Reviewing...
        Hide
        ggwizdz1 Gray Gwizdz added a comment -

        The most recent patch will have all of the code!

        Show
        ggwizdz1 Gray Gwizdz added a comment - The most recent patch will have all of the code!
        Hide
        joewitt Joseph Witt added a comment -

        Joseph Niemiec ah thanks for the bump. We have a bit of an easier time tracking the github ones but the patch available thing we should see. Anyway, do these patches build on eachother or is one the right one and the other and old one?

        Show
        joewitt Joseph Witt added a comment - Joseph Niemiec ah thanks for the bump. We have a bit of an easier time tracking the github ones but the patch available thing we should see. Anyway, do these patches build on eachother or is one the right one and the other and old one?
        Hide
        josephxsxn Joseph Niemiec added a comment -

        Hey guys,

        any change in the review of this processor? Its been stuck with a patch available for a while now.

        Show
        josephxsxn Joseph Niemiec added a comment - Hey guys, any change in the review of this processor? Its been stuck with a patch available for a while now.
        Hide
        ggwizdz1 Gray Gwizdz added a comment -

        Hi Peter,

        Please see the attached patch that include your recommendations, thanks for the ideas!

        Gray

        Show
        ggwizdz1 Gray Gwizdz added a comment - Hi Peter, Please see the attached patch that include your recommendations, thanks for the ideas! Gray
        Hide
        ggwizdz1 Gray Gwizdz added a comment -

        Hi Peter,

        Thank you for the recommendations! I agree with your thinking, let me spend some time working through these changes and I'll send another patch once I've made progress.

        Thanks!
        Gray

        Show
        ggwizdz1 Gray Gwizdz added a comment - Hi Peter, Thank you for the recommendations! I agree with your thinking, let me spend some time working through these changes and I'll send another patch once I've made progress. Thanks! Gray
        Hide
        patricker Peter Wicks added a comment -

        Gray Gwizdz, thanks for the contribution. Couple of thoughts, let me know what you think/if you'd be willing to do some additional work:

        • Make Input Folder optional, and if not present use an incoming FlowFile, either it's value or an attribute on it.
        • Allow for Copying along with Moving. This is a common use case I run into where right now I end up Putting to HDFS twice (once for HIVE ingest and once for Archive) where a copy would be a great second step.

        Copy could be a separate ticket, but it might be nice to have a Copy/Move processor all in one.

        Thoughts?

        Show
        patricker Peter Wicks added a comment - Gray Gwizdz , thanks for the contribution. Couple of thoughts, let me know what you think/if you'd be willing to do some additional work: Make Input Folder optional, and if not present use an incoming FlowFile, either it's value or an attribute on it. Allow for Copying along with Moving. This is a common use case I run into where right now I end up Putting to HDFS twice (once for HIVE ingest and once for Archive) where a copy would be a great second step. Copy could be a separate ticket, but it might be nice to have a Copy/Move processor all in one. Thoughts?
        Hide
        ggwizdz1 Gray Gwizdz added a comment -

        Attached patch demonstrates the ability to move a file from one directory to another on HDFS.

        Show
        ggwizdz1 Gray Gwizdz added a comment - Attached patch demonstrates the ability to move a file from one directory to another on HDFS.
        Hide
        ggwizdz1 Gray Gwizdz added a comment -

        Ford Motor Company is interested in this functionality, and has developed a patch to fix this need.

        Show
        ggwizdz1 Gray Gwizdz added a comment - Ford Motor Company is interested in this functionality, and has developed a patch to fix this need.

          People

          • Assignee:
            ggwizdz1 Gray Gwizdz
            Reporter:
            josephxsxn Joseph Niemiec
          • Votes:
            3 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Development