Flume / FLUME-2450

Improve replay index insertion speed.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels: None

      Description

      Insertion into the replay index can sometimes take a long time because we use a file-based index and a tree set. We should replace these with a memory-mapped DB and a hash set.
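
To illustrate the tree set versus hash set part of the description, here is a minimal sketch using only the Java standard library. It is not the code from the attached patch: the class name `ReplaySetSketch`, its methods, and the use of random `long` values as stand-ins for event pointers are all assumptions made for illustration. A `TreeSet` insert costs O(log n) comparisons, while a `HashSet` insert is O(1) on average, which is the general reason a hash set can help when many entries are inserted during replay.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not taken from FLUME-2450.patch.
final class ReplaySetSketch {

    // Generates n pseudo-random values standing in for event pointers (an assumption).
    static long[] randomPointers(int n, long seed) {
        Random random = new Random(seed);
        long[] pointers = new long[n];
        for (int i = 0; i < n; i++) {
            pointers[i] = random.nextLong();
        }
        return pointers;
    }

    // Inserts every pointer into the given set and returns the elapsed time in nanoseconds.
    static long timeInserts(Set<Long> index, long[] pointers) {
        long start = System.nanoTime();
        for (long pointer : pointers) {
            index.add(pointer);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] pointers = randomPointers(1_000_000, 42L);
        Set<Long> tree = new TreeSet<>();
        Set<Long> hash = new HashSet<>();
        // Single-shot timings like these are only indicative; JIT warm-up and GC affect them.
        System.out.printf("TreeSet inserts: %d ms%n", timeInserts(tree, pointers) / 1_000_000);
        System.out.printf("HashSet inserts: %d ms%n", timeInserts(hash, pointers) / 1_000_000);
        System.out.println("Same contents: " + tree.equals(hash));
    }
}
```

The trade-off is that a hash set gives up sorted iteration, so this swap only fits where the index is used for membership checks and removals rather than ordered traversal.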

      1. FLUME-2450.patch
        0.9 kB
        Hari Shreedharan

        Activity

        gvarada75 gautham varada added a comment -

        > Could you share how many events were in the queue? Also, was that for a full replay? Are you using backup checkpoints?

        I'm not using backup checkpoints, and yes, it was for the full replay. How do I check the number of events in the queue? I can recreate the scenario.

        hshreedharan Hari Shreedharan added a comment -

        Gary Malouf - Did you try a build with this patch? What kind of performance did you see?

        The faster alternative I can see is the one that uses direct memory instead of mmap, though I can't be sure how much faster it would be. My guesstimate is that the mmap version is reasonably fast, since most of these operations would take place in the page cache and not on the file system.

        The direct memory one is tricky since the user has to start the application with more direct memory. Even if we check whether there is enough direct memory when the replay starts, it is difficult to be sure there is enough as the replay goes on, since multiple file channels could be replaying at the same time.
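
The difference between the two approaches discussed above can be sketched with the standard `java.nio` API. This is an illustration only and not the patch's code: the class `BufferChoiceSketch`, its method names, and the temp-file prefix are hypothetical. A direct buffer from `ByteBuffer.allocateDirect` counts against the JVM's direct memory limit (on HotSpot, the `-XX:MaxDirectMemorySize` startup flag), whereas a buffer from `FileChannel.map` is backed by a file and served from the OS page cache, so it needs no extra startup configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not taken from FLUME-2450.patch.
final class BufferChoiceSketch {

    // Off-heap buffer; allocation fails with OutOfMemoryError if the direct memory limit is exceeded.
    static ByteBuffer direct(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    // Buffer backed by a memory-mapped temporary file; the mapping stays valid after the channel closes.
    static ByteBuffer mappedTemp(int bytes) {
        try {
            Path file = Files.createTempFile("replay-index-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                return channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the values 0..count-1 as longs, reads them back, and returns their sum.
    // The buffer must hold at least count * 8 bytes.
    static long fillAndSum(ByteBuffer buffer, int count) {
        for (int i = 0; i < count; i++) {
            buffer.putLong(i * 8, (long) i);
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buffer.getLong(i * 8);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 100;
        // Both buffers expose the same ByteBuffer API; only the backing memory differs.
        System.out.println("direct sum: " + fillAndSum(direct(count * 8), count));
        System.out.println("mapped sum: " + fillAndSum(mappedTemp(count * 8), count));
    }
}
```

Because both variants share the `ByteBuffer` interface, the choice between them could in principle sit behind a single switch, but the sizing concern raised above (several channels replaying concurrently against one direct memory limit) applies only to the direct variant.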

        brocknoland Brock Noland added a comment -

        > I had 45 GB of data parked in the file channel; with the patch, Flume took about 25 minutes to figure itself out

        Could you share how many events were in the queue? Also, was that for a full replay? Are you using backup checkpoints?

        > The frustration right now for us is that our flume nodes are basically 'down' until this recovery completes.

        Are your nodes performing a full recovery often? Are you using backup checkpoints? Unless both the checkpoint and the backup checkpoint are gone, a replay should be quite fast.

        > Make a new config option to run the version that requires extending the amount of JVM memory

        This actually would not improve recovery much.

        gmalouf Gary Malouf added a comment -

        The frustration right now for us is that our flume nodes are basically 'down' until this recovery completes. Could it make sense to do one or both of the following:

        1) Make a new config option to run the version that requires extending the amount of JVM memory. This would allow for faster recovery than the current solution, but would only be used if it is explicitly set.

        2) Open the ports/start consuming while recovery is taking place.

        gvarada75 gautham varada added a comment -

        I tried the patch and it's definitely much faster. I had 45 GB of data parked in the file channel; with the patch, Flume took about 25 minutes to figure itself out and for the sinks to start pulling data from the channel. However, the Avro source port and the JSON reporting port opened only after the sinks started to pull data from the channel.

        hudson Hudson added a comment -

        UNSTABLE: Integrated in flume-trunk #654 (See https://builds.apache.org/job/flume-trunk/654/)
        FLUME-2450 - Improve replay index insertion speed. (Hari via Brock) (brock: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=5c5b96a8c89d2fe58f1425a4ece8160b76f03f26)

        • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Flume-trunk-hbase-98 #14 (See https://builds.apache.org/job/Flume-trunk-hbase-98/14/)
        FLUME-2450 - Improve replay index insertion speed. (Hari via Brock) (brock: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=5c5b96a8c89d2fe58f1425a4ece8160b76f03f26)

        • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
        brocknoland Brock Noland added a comment -

        Committed to trunk and flume 1.6!

        jira-bot ASF subversion and git services added a comment -

        Commit c0364d0a9f89706ce05d310af5af61888d22ce6d in flume's branch refs/heads/flume-1.6 from Brock Noland
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=c0364d0 ]

        FLUME-2450 - Improve replay index insertion speed. (Hari via Brock)

        jira-bot ASF subversion and git services added a comment -

        Commit 5c5b96a8c89d2fe58f1425a4ece8160b76f03f26 in flume's branch refs/heads/trunk from Brock Noland
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=5c5b96a ]

        FLUME-2450 - Improve replay index insertion speed. (Hari via Brock)

        brocknoland Brock Noland added a comment -

        +1

        hshreedharan Hari Shreedharan added a comment -

        Brock Noland - Please take a look when you get a chance.

        gautham varada - It will be part of the next release if a committer reviews and commits this soon. I can't be sure how much performance improvement you'd see. You can try it out yourself. Gary Malouf did try an earlier version of this patch that I sent to him directly, which uses only direct memory - and the performance improved by at least an order of magnitude, but that would require the user to configure more direct memory on startup, or the channel would never start. So this one does mmap of the files and uses a hash map, which should give some improvement - though how much it would improve is something I am not sure of.

        gvarada75 gautham varada added a comment -

        Hi Hari, will this patch be part of a minor release in the near future? We really need it to be in a stable release before we can run it in production.


          People

          • Assignee:
            hshreedharan Hari Shreedharan
            Reporter:
            hshreedharan Hari Shreedharan
          • Votes:
            1
            Watchers:
            6
