Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-3020

Improve HDFSEventSink Escape Ingestion by more then 10x by not getting InetAddress on every record

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      If you are use escaping the current code will call InetAddress on every record. Which will result is a huge impact to performance.

      TotalTime,8403,
      totalEventTakeTime,1498,
      totalWriteTime,1981,
      totalWriterSetupTime,65,
      commitTime,201,
      flushTime,18,
      startTrans,7,

      The rest is all InetAddress

      1. flume-3020.patch
        2 kB
        Theodore michael Malaska
      2. flume-3020.patch.2
        2 kB
        Theodore michael Malaska

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          ABORTED: Integrated in Jenkins build Flume-trunk-hbase-1 #227 (See https://builds.apache.org/job/Flume-trunk-hbase-1/227/)
          FLUME-3020. Improve HDFS Sink escape sequence substitution (bessbd: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=9868c4231362e5568a1675a604288d60cbadd7fe)

          • (edit) flume-ng-core/src/main/java/org/apache/flume/formatter/output/BucketPath.java
          Show
          hudson Hudson added a comment - ABORTED: Integrated in Jenkins build Flume-trunk-hbase-1 #227 (See https://builds.apache.org/job/Flume-trunk-hbase-1/227/ ) FLUME-3020 . Improve HDFS Sink escape sequence substitution (bessbd: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=9868c4231362e5568a1675a604288d60cbadd7fe ) (edit) flume-ng-core/src/main/java/org/apache/flume/formatter/output/BucketPath.java
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Hari Shreedharan, Jeff Holoman: thank you for the review!

          Show
          bessbd Bessenyei Balázs Donát added a comment - Hari Shreedharan , Jeff Holoman : thank you for the review!
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Theodore michael Malaska: thank you for the patch!

          Show
          bessbd Bessenyei Balázs Donát added a comment - Theodore michael Malaska : thank you for the patch!
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 9868c4231362e5568a1675a604288d60cbadd7fe in flume's branch refs/heads/trunk from Ted Malaska
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=9868c42 ]

          FLUME-3020. Improve HDFS Sink escape sequence substitution

          When using escape sequences, the current code will call InetAddress
          for every event which results in a huge impact to performance.

          This patch fixes that issue by caching the local host in a static variable.
          We can do this because there is zero chance the local host will change for a life of a JVM.

          Reviewers: Hari Shreedharan, Jeff Holoman, Bessenyei Balázs Donát

          (Theodore michael Malaska via Bessenyei Balázs Donát)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 9868c4231362e5568a1675a604288d60cbadd7fe in flume's branch refs/heads/trunk from Ted Malaska [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=9868c42 ] FLUME-3020 . Improve HDFS Sink escape sequence substitution When using escape sequences, the current code will call InetAddress for every event which results in a huge impact to performance. This patch fixes that issue by caching the local host in a static variable. We can do this because there is zero chance the local host will change for a life of a JVM. Reviewers: Hari Shreedharan, Jeff Holoman, Bessenyei Balázs Donát (Theodore michael Malaska via Bessenyei Balázs Donát)
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          +1, LGTM

          I'm about to commit this.

          (Checkstyle is complaining about the order of modifiers 'static' modifier out of order with the JLS suggestions. I'll reoder them before committing if you don't mind, Theodore michael Malaska)

          Show
          bessbd Bessenyei Balázs Donát added a comment - +1, LGTM I'm about to commit this. (Checkstyle is complaining about the order of modifiers 'static' modifier out of order with the JLS suggestions . I'll reoder them before committing if you don't mind, Theodore michael Malaska )
          Hide
          ted.m Theodore michael Malaska added a comment -

          Added private and final to new patch

          Show
          ted.m Theodore michael Malaska added a comment - Added private and final to new patch
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Theodore michael Malaska: Jeff Holoman has left a comment on your review request.

          I'll let you respond to that issue. That should leave enough time for anyone else to review it if they want to.

          After you have reacted to it, I think the patch should be ready to be committed. (Which I'd be happy to do.)

          Show
          bessbd Bessenyei Balázs Donát added a comment - Theodore michael Malaska : Jeff Holoman has left a comment on your review request. I'll let you respond to that issue. That should leave enough time for anyone else to review it if they want to. After you have reacted to it, I think the patch should be ready to be committed. (Which I'd be happy to do.)
          Hide
          ted.m Theodore michael Malaska added a comment -

          Thanks Bessenyei Balázs Donát, let me know if there is anything else needed before we commit this ticket

          Show
          ted.m Theodore michael Malaska added a comment - Thanks Bessenyei Balázs Donát , let me know if there is anything else needed before we commit this ticket
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Theodore michael Malaska: If it's really only initialized at those assignments, then please ignore my previous comment.

          Thank you for the clarification!

          The change looks good to me

          Show
          bessbd Bessenyei Balázs Donát added a comment - Theodore michael Malaska : If it's really only initialized at those assignments, then please ignore my previous comment. Thank you for the clarification! The change looks good to me
          Hide
          ted.m Theodore michael Malaska added a comment -

          Hey Bessenyei Balázs Donát

          So on your latest comment. I'm not sure where you would want the StaticClass.init() function to be called from.

          I tested the code and it looks like the static class will initiate when one of the followings lines are hit for the first time.

          replacementString = InetAddressCache.hostName;
          replacementString = InetAddressCache.hostAddress;
          replacementString = InetAddressCache.canonicalHostName;

          The only place I would see to add the StaticClass.init would be at the init of the parent class, but that will take the small hit of the one time lookup even if the values are never used.

          I'm personally in different. Just let me know where you would like the StaticClass.init and let me know the value you are aiming to get from its placement.

          Thanks

          Show
          ted.m Theodore michael Malaska added a comment - Hey Bessenyei Balázs Donát So on your latest comment. I'm not sure where you would want the StaticClass.init() function to be called from. I tested the code and it looks like the static class will initiate when one of the followings lines are hit for the first time. replacementString = InetAddressCache.hostName; replacementString = InetAddressCache.hostAddress; replacementString = InetAddressCache.canonicalHostName; The only place I would see to add the StaticClass.init would be at the init of the parent class, but that will take the small hit of the one time lookup even if the values are never used. I'm personally in different. Just let me know where you would like the StaticClass.init and let me know the value you are aiming to get from its placement. Thanks
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Theodore michael Malaska: I like your latest version!

          Just one more thing: as per http://stackoverflow.com/a/9130560/5323166 , wouldn't it be better to make the evaluation (and caching) explicit? (

          "which occurs the first time it is referenced in code" — incorrect, implementation is free to decide when to load the class.

          )

          Show
          bessbd Bessenyei Balázs Donát added a comment - Theodore michael Malaska : I like your latest version! Just one more thing: as per http://stackoverflow.com/a/9130560/5323166 , wouldn't it be better to make the evaluation (and caching) explicit? ( "which occurs the first time it is referenced in code" — incorrect, implementation is free to decide when to load the class. )
          Hide
          ted.m Theodore michael Malaska added a comment -

          Let me know if you like the latest version

          Show
          ted.m Theodore michael Malaska added a comment - Let me know if you like the latest version
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Awesome! Thank you, Theodore michael Malaska

          Show
          bessbd Bessenyei Balázs Donát added a comment - Awesome! Thank you, Theodore michael Malaska
          Hide
          ted.m Theodore michael Malaska added a comment -

          Sounds good for both. I will make another patch in a little bite

          Show
          ted.m Theodore michael Malaska added a comment - Sounds good for both. I will make another patch in a little bite
          Hide
          bessbd Bessenyei Balázs Donát added a comment -

          Under what circumstances does InetAddress.getLocalHost() throw UnknownHostException?

          If the chances are not negligible, maybe we should just go "lazy" and cache it when it's first required (possibly as String).

          Show
          bessbd Bessenyei Balázs Donát added a comment - Under what circumstances does InetAddress.getLocalHost() throw UnknownHostException ? If the chances are not negligible, maybe we should just go "lazy" and cache it when it's first required (possibly as String).
          Hide
          hshreedharan Hari Shreedharan added a comment -

          This looks good. We could actually cache even the replacement strings in local static variables to save that cost as well (which of course is trivial compared to the lookup cost).

          Show
          hshreedharan Hari Shreedharan added a comment - This looks good. We could actually cache even the replacement strings in local static variables to save that cost as well (which of course is trivial compared to the lookup cost).
          Hide
          ted.m Theodore michael Malaska added a comment -

          Cache the local host in a static variable. We can do this because there is zero chance the local host will change for a life of a JVM.

          Show
          ted.m Theodore michael Malaska added a comment - Cache the local host in a static variable. We can do this because there is zero chance the local host will change for a life of a JVM.

            People

            • Assignee:
              ted.m Theodore michael Malaska
              Reporter:
              ted.m Theodore michael Malaska
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development