Flume / FLUME-2613

Tool/script for deleting individual message from queue

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels: None

      Description

      We had a situation where one of our Flume agents got stuck on a message because of an unexpected format. To get things moving again, I stopped the Flume agent, moved the file-backed channel data out of the way and restarted the agent. I'd like to pop the bad message from the queue data on disk, ideally with a recommended tool or script.

      1. FLUME-2613-0.patch
        22 kB
        Ashish Paliwal
      2. FLUME-2613-1.patch
        14 kB
        Ashish Paliwal
      3. FLUME-2613-2.patch
        14 kB
        Ashish Paliwal
      4. FLUME-2613-3.patch
        16 kB
        Ashish Paliwal


          Activity

          Hudson added a comment:

          SUCCESS: Integrated in Flume-trunk-hbase-98 #85 (See https://builds.apache.org/job/Flume-trunk-hbase-98/85/)
          FLUME-2613. Add support in FileChannelIntegrityTool to remove invalid events from the channel. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=91ec5794589bf3711cca2a251a511fa360e5ac30)

          • flume-tools/src/main/java/org/apache/flume/tools/EventValidator.java
          • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestEventUtils.java
          • flume-tools/src/test/java/org/apache/flume/tools/TestFileChannelIntegrityTool.java
          • flume-tools/src/main/java/org/apache/flume/tools/FileChannelIntegrityTool.java
          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventUtils.java

          Hudson added a comment:

          SUCCESS: Integrated in flume-trunk #728 (See https://builds.apache.org/job/flume-trunk/728/)
          FLUME-2613. Add support in FileChannelIntegrityTool to remove invalid events from the channel. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=91ec5794589bf3711cca2a251a511fa360e5ac30)

          • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventUtils.java
          • flume-tools/src/test/java/org/apache/flume/tools/TestFileChannelIntegrityTool.java
          • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestEventUtils.java
          • flume-tools/src/main/java/org/apache/flume/tools/FileChannelIntegrityTool.java
          • flume-tools/src/main/java/org/apache/flume/tools/EventValidator.java

          Hari Shreedharan added a comment:

          Committed! Thanks Ashish!

          ASF subversion and git services added a comment:

          Commit 6019fcf4c9f773b521813145e026c10c96584527 in flume's branch refs/heads/flume-1.6 from Hari Shreedharan
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=6019fcf ]

          FLUME-2613. Add support in FileChannelIntegrityTool to remove invalid events from the channel.

          (Ashish Paliwal via Hari)

          ASF subversion and git services added a comment:

          Commit 91ec5794589bf3711cca2a251a511fa360e5ac30 in flume's branch refs/heads/trunk from Hari Shreedharan
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=91ec579 ]

          FLUME-2613. Add support in FileChannelIntegrityTool to remove invalid events from the channel.

          (Ashish Paliwal via Hari)

          Hari Shreedharan added a comment:

          +1. I am making a couple of minor changes and committing this.

          Ashish Paliwal added a comment:

          Added handling for command line parameters. Parameters for the Builder can be passed using the -D option, and they are passed on to the EventValidator Builder.
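A minimal sketch of this `-D` hand-off, assuming the builder accepts a plain map of settings. `ValidatorBuilder`, `SystemPropertyConfig` and the `validator.` key prefix are hypothetical names chosen for illustration; they are not Flume's actual classes or options.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical builder contract: it receives plain key/value settings.
interface ValidatorBuilder {
  void configure(Map<String, String> settings);
}

final class SystemPropertyConfig {
  private SystemPropertyConfig() {}

  // Collects every property whose key starts with the given prefix,
  // strips the prefix, and returns the remaining key/value pairs.
  static Map<String, String> collect(Properties props, String prefix) {
    Map<String, String> settings = new HashMap<>();
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(prefix)) {
        settings.put(key.substring(prefix.length()), props.getProperty(key));
      }
    }
    return settings;
  }

  // Hands the -D settings (read from the JVM's system properties) to a builder.
  static void applyTo(ValidatorBuilder builder, String prefix) {
    builder.configure(collect(System.getProperties(), prefix));
  }
}
```

For example, running with a hypothetical `-Dvalidator.maxBodySize=1024` and calling `applyTo(builder, "validator.")` would pass `{maxBodySize=1024}` to the builder. Filtering on a prefix keeps unrelated JVM properties such as `java.home` out of the validator's configuration.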

          Ashish Paliwal added a comment:

          Writing in a customized format came up in the context of a JIRA where log lines needed to be extracted from events; writing back into the channel is tricky. The idea is to get the data and write the event payload (for example, a log line) to a file.

          Command line options were the way I wanted to go, but they were making things a bit messy, so I let go of them. I shall work on it.

          Hari Shreedharan added a comment:

          Updating events in the channel would be really tricky/difficult, since that would require the entire remaining part of the file to be re-written and would also need to adjust for the file reaching max length etc. Also, the checkpoint would have to be re-written. I don't think that is something we want to worry about. A validator can write out the file to a different file if it needs to.

          We should also allow the validator to parse command line options, since it might be useful to make custom changes based on config.
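Since updating events in place would mean rewriting the rest of the log file and the checkpoint, a validator that needs to recover data can copy payloads to a separate file instead. A minimal sketch, assuming events expose a header map and a byte-array body; `RecoveredEvent` and `PayloadExtractor` are hypothetical stand-ins, not Flume classes.

```java
import java.io.PrintStream;
import java.util.Map;

// Hypothetical stand-in for an event read back from the channel.
final class RecoveredEvent {
  final Map<String, String> headers;
  final byte[] body;

  RecoveredEvent(Map<String, String> headers, byte[] body) {
    this.headers = headers;
    this.body = body;
  }
}

// Validator-style hook that copies each usable payload to a separate
// output, one event per line, and never touches the channel's own
// data or checkpoint files.
final class PayloadExtractor {
  private final PrintStream out;

  PayloadExtractor(PrintStream out) {
    this.out = out;
  }

  // Returns false (so the caller could treat the event as invalid) when the
  // body is empty; otherwise writes the body plus '\n' and returns true.
  boolean extract(RecoveredEvent event) {
    if (event.body == null || event.body.length == 0) {
      return false;
    }
    out.write(event.body, 0, event.body.length);
    out.write('\n');
    return true;
  }
}
```

In practice the `PrintStream` would wrap a `FileOutputStream` pointing at a recovery file outside the channel's data directories.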

          Ashish Paliwal added a comment:

          Just a wild thought: the Event Validator implementation could also be used to validate event data and write it in a customised format, if the need is to recover the data.

          Ashish Paliwal added a comment:

          Updated per review comments

          Ashish Paliwal added a comment:

          Updated as per the comment. The tool is now part of the File Channel Integrity Tool. The default validator is NOOP, which does nothing; if an event validator is provided, it is used instead.
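A sketch of the optional-validator behaviour: with no validator class supplied, the tool falls back to a NOOP validator; otherwise it loads the named class reflectively. `EventCheck`, `SimpleEvent` and `EventCheckLoader` are hypothetical names; the interface actually added by this patch may differ.

```java
import java.util.Map;

// Hypothetical minimal view of an event handed to a validator.
interface SimpleEvent {
  Map<String, String> getHeaders();
  byte[] getBody();
}

// Hypothetical validator contract (names are illustrative only).
interface EventCheck {
  // Default: accept every event, so the tool behaves exactly as before.
  EventCheck NOOP = event -> true;

  boolean isValid(SimpleEvent event);
}

final class EventCheckLoader {
  private EventCheckLoader() {}

  // No class name given: fall back to NOOP. Otherwise load the named class,
  // check that it implements EventCheck, and call its no-arg constructor.
  static EventCheck load(String className) {
    if (className == null || className.isEmpty()) {
      return EventCheck.NOOP;
    }
    try {
      Class<? extends EventCheck> clazz =
          Class.forName(className).asSubclass(EventCheck.class);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load validator " + className, e);
    }
  }
}
```

Keeping the class name optional means existing invocations of the tool are unaffected; a class that does not implement the interface fails fast with a `ClassCastException` from `asSubclass`.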

          Ashish Paliwal added a comment:

          Yup. The code shall be simple. The check would be similar to the new tool, and the default would be a NOOP validator. If a validator is provided, it would replace the NOOP validator.

          Hari Shreedharan added a comment:

          We should be careful to ensure that the new parameter is not mandatory. If it is added, the event's integrity is checked; if it is not there, the tool continues to work as before.

          Ashish Paliwal added a comment:

          Yup we can do that. Let me get a patch with that. Shouldn't take long. Shall add a flag and required parameter to the file channel tool.

          Hari Shreedharan added a comment:

          Ashish Paliwal - How about adding this functionality to the existing tool? We could just add one parameter that tells the tool to check event integrity using the specified class. Much of the code seems to be duplicated anyway, and we should probably avoid that. I think we should be able to do that, correct?

          Ashish Paliwal added a comment:

          Review request created, working on adding documentation. I didn't test it with real data, just relied on test cases. Shall submit documentation along with review comments.

          Ashish Paliwal added a comment:

          Working on test cases; got caught up in other things. Shall try to submit a patch today.

          Hari Shreedharan added a comment:

          Any updates, Ashish Paliwal?

          Ashish Paliwal added a comment:

          Yup, I thought the same, but since I was not sure, I needed confirmation. Let me try to get a working patch before the weekend so that it makes it into 1.6.

          Hari Shreedharan added a comment:

          The rest (Take, Commit, Rollback) can be ignored, since you don't care about those.

          Hari Shreedharan added a comment:

          Yes, so TransactionEventRecord is the parent class of Put. You can get a TER and check whether it is a Put using instanceof. If it is, build an Event (don't expose the Put) by grabbing the headers and body from the Put, and pass it to the verifier. Does that make sense?
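A sketch of the `instanceof` dispatch described here, using simplified, hypothetical stand-ins for the file channel's record classes (the real `TransactionEventRecord` and `Put` are internal to the file channel and are not reproduced). Only a Put yields an event, and the verifier receives a plain copy of its headers and body rather than the Put itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified, hypothetical stand-ins for the channel's record types.
abstract class TxRecord {}

final class PutRecord extends TxRecord {
  final Map<String, String> headers;
  final byte[] body;

  PutRecord(Map<String, String> headers, byte[] body) {
    this.headers = headers;
    this.body = body;
  }
}

final class TakeRecord extends TxRecord {}
final class CommitRecord extends TxRecord {}
final class RollbackRecord extends TxRecord {}

// Plain event handed to the verifier; the Put itself is never exposed.
final class PlainEvent {
  final Map<String, String> headers;
  final byte[] body;

  PlainEvent(Map<String, String> headers, byte[] body) {
    this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
    this.body = body.clone();
  }
}

final class RecordConverter {
  private RecordConverter() {}

  // Only Put records carry event data; Take, Commit and Rollback are skipped.
  static Optional<PlainEvent> toEvent(TxRecord record) {
    if (record instanceof PutRecord) {
      PutRecord put = (PutRecord) record;
      return Optional.of(new PlainEvent(put.headers, put.body));
    }
    return Optional.empty();
  }
}
```

Copying the headers and body keeps the verifier from mutating or depending on the channel's internal record objects.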

          Ashish Paliwal added a comment:

          record.getEvent() returns a TransactionEventRecord, which doesn't carry the Event info; only the Put class does. This is where I am losing track: the record can be a Put, Commit, Rollback or Take, and I am looking for some help on how to handle this. If I can get through it, I shall submit a patch today.

          Hari Shreedharan added a comment:

          record.getEvent() on line 89 returns the actual event. Just pass it to the verifier class, and if you get false, mark it as NOOP.

          Hari Shreedharan added a comment:

          I think the logic is exactly the same as the current logic in the tool: loop through the events, but for each event, run the custom code instead of verifying the checksum. So I think all you need is an interface with one method, boolean verify(Event e). If that returns false, mark the record as NOOP; otherwise just keep moving.
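A minimal sketch of the suggested loop, assuming a single-method verifier and a mutable flag standing in for "mark the record as NOOP". `LoggedRecord`, `Verifier` and `IntegrityPass` are hypothetical; in the real tool the verifier would receive a full `Event` with headers rather than just the body.

```java
import java.util.List;

// Hypothetical record: an event body plus a flag standing in for
// "this record has been overwritten with a NOOP".
final class LoggedRecord {
  final byte[] body;
  boolean noop;

  LoggedRecord(byte[] body) {
    this.body = body;
  }
}

// The single-method interface suggested above (body-only for brevity).
interface Verifier {
  boolean verify(byte[] body);
}

final class IntegrityPass {
  private IntegrityPass() {}

  // Walks the records in order; any record the verifier rejects is marked
  // NOOP, everything else is left alone. Returns the number newly marked.
  static int run(List<LoggedRecord> records, Verifier verifier) {
    int marked = 0;
    for (LoggedRecord r : records) {
      if (!r.noop && !verifier.verify(r.body)) {
        r.noop = true;
        marked++;
      }
    }
    return marked;
  }
}
```

Because `Verifier` has a single abstract method, a caller can pass a lambda such as `body -> body.length > 0`.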

          Ashish Paliwal added a comment:

          Are there any classes we can reuse from File Channel itself that could ease development?

          Hari Shreedharan added a comment:

          Sounds good to me sir.

          Ashish Paliwal added a comment:

          I couldn't find an easy way to do the job using the existing code in the File Channel Integrity Tool. Most of the files that could be used have package scope in the file channel. I found one way to do this:

          1. Reuse the existing File Channel Integrity Tool code base.
          2. Inside the loop where the file is verified for non-corrupted events, open a RandomAccessFile on the same datadir in read-only mode.
          3. We already get the event position inside the loop; reuse it to seek to that offset in the file opened in step 2.
          4. Reuse the LogFile.java#get(int offset) code to get the Flume Event.
          5. Apply the user-supplied logic to validate the Event data.
          6. If the Event is invalid, mark the record as NOOP.

          Hari Shreedharan / Roshan Naik: any suggestions on the approach? I am going nuts trying to figure out an easy solution for this.
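Steps 2 to 4 can be sketched with `java.io.RandomAccessFile`: open the data file in read-only (`"r"`) mode, `seek` to the offset found during the integrity pass, and read the record stored there. The `[int length][payload]` layout and the `OffsetReader` class are assumptions for illustration only; the on-disk format that `LogFile` actually reads carries more fields.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class OffsetReader {
  private OffsetReader() {}

  // Opens the data file read-only, seeks to the given offset, and reads one
  // length-prefixed record. The [int length][payload] layout is hypothetical.
  static byte[] readAt(File dataFile, long offset) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(dataFile, "r")) {
      raf.seek(offset);
      int length = raf.readInt();
      // Reject lengths that are negative or run past the end of the file.
      if (length < 0 || length > raf.length() - raf.getFilePointer()) {
        throw new IOException("Corrupt length " + length + " at offset " + offset);
      }
      byte[] payload = new byte[length];
      raf.readFully(payload);
      return payload;
    }
  }
}
```

Opening the file in `"r"` mode guarantees this read path cannot modify the channel's data; marking a record as NOOP would still go through the tool's existing write path.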


            People

            • Assignee: Ashish Paliwal
            • Reporter: Charles McLaughlin
            • Votes: 1
            • Watchers: 7
