Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.1.0
    • None
    • None

    Description

      Off late netty 4 has earned so much praise in the community. For example,
      https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead

      A switch to netty 4 enables a significant reduction in gc pressure and also huge performance gains. I started working on this last Sunday and have a patch that shows performance gains on the order of 15-25% (total execution time) for some applications at Facebook. However, I only tested it with hadoop_facebook. So there might be issues with SASL path.

      I will release the patch today and want to open up a discussion if anyone is using the secure feature anymore. If not we can just deprecate it.

      Attachments

        1. GIRAPH-840.patch
          116 kB
          Eugene Joseph Koontz
        2. GIRAPH-840.patch
          117 kB
          Pavan Kumar

        Issue Links

          Activity

            Hi Pavan,
            Looking forward to seeing your patch. Let's see if we can get it to work with other profiles and with SASL.
            -Eugene

            ekoontz Eugene Joseph Koontz added a comment - Hi Pavan, Looking forward to seeing your patch. Let's see if we can get it to work with other profiles and with SASL. -Eugene
            pavanka Pavan Kumar added a comment -

            https://reviews.apache.org/r/17644/
            There are some small changes like making direct/undirect configurable, looking at support for secure versions etc. I just submitted it as an initial version so I can get more suggestions based on above description.
            [However note that this diff is stable and works on long running apps currently with expected performance gains]

            pavanka Pavan Kumar added a comment - https://reviews.apache.org/r/17644/ There are some small changes like making direct/undirect configurable, looking at support for secure versions etc. I just submitted it as an initial version so I can get more suggestions based on above description. [However note that this diff is stable and works on long running apps currently with expected performance gains]
            pavanka Pavan Kumar added a comment -

            Note that https://issues.apache.org/jira/browse/GIRAPH-839 must be fixed before this patch will work. If not, we will see some wierd deadlocking issues.

            pavanka Pavan Kumar added a comment - Note that https://issues.apache.org/jira/browse/GIRAPH-839 must be fixed before this patch will work. If not, we will see some wierd deadlocking issues.
            pavanka Pavan Kumar added a comment -

            [ps: by direct/undirect in above comment I mean direct memory or heap memory]

            pavanka Pavan Kumar added a comment - [ps: by direct/undirect in above comment I mean direct memory or heap memory]
            aching Avery Ching added a comment -

            This is awesome work. Eugene, would you mind taking a look at the patch to see is SASL works?

            aching Avery Ching added a comment - This is awesome work. Eugene, would you mind taking a look at the patch to see is SASL works?

            Pavan,
            Impressive, looks like a lot of work went into this!

            Avery,

            mvn -P2.2.0 test

            and

            mvn -Phadoop_yarn -Dhadoop.version=2.2.0 test

            work for me, provided I use Pavan's patch from this issue together with my GIRAPH-841 patch.

            I have not yet looked at the SASL-specific tests, but I assume those passed if the whole thing passes. Will look more at it later tonight.

            ekoontz Eugene Joseph Koontz added a comment - Pavan, Impressive, looks like a lot of work went into this! Avery, mvn -P2.2.0 test and mvn -Phadoop_yarn -Dhadoop.version=2.2.0 test work for me, provided I use Pavan's patch from this issue together with my GIRAPH-841 patch. I have not yet looked at the SASL-specific tests, but I assume those passed if the whole thing passes. Will look more at it later tonight.
            aching Avery Ching added a comment -

            Thanks Eugene!

            aching Avery Ching added a comment - Thanks Eugene!

            Gettting a failure with mvn -Phadoop_1.0 clean test:

            https://gist.github.com/ekoontz/8799669

            ekoontz Eugene Joseph Koontz added a comment - Gettting a failure with mvn -Phadoop_1.0 clean test : https://gist.github.com/ekoontz/8799669

            I've seen that failure as well, to correct it I had to change the maven-antrun-plugin call to org.apache.giraph.conf.AllOptions in giraph-core/pom.xml to reference "${maven.dependency.org.apache.hadoop.hadoop-core.jar.path}" instead of "${maven.dependency.org.apache.hadoop.hadoop-common.jar.path}".

            cmuchinsky@initiatesystems.com Craig Muchinsky added a comment - I've seen that failure as well, to correct it I had to change the maven-antrun-plugin call to org.apache.giraph.conf.AllOptions in giraph-core/pom.xml to reference "${maven.dependency.org.apache.hadoop.hadoop-core.jar.path}" instead of "${maven.dependency.org.apache.hadoop.hadoop-common.jar.path}".
            aching Avery Ching added a comment -

            ekoontz, any update? This looks good to go. Otherwise, we can fix any issues going forward if there are any...

            aching Avery Ching added a comment - ekoontz , any update? This looks good to go. Otherwise, we can fix any issues going forward if there are any...

            Pavan thanks for the updating the review. It looks good and I recognize the significant work it took; especially having to mess with the SASL channel handlers

            My only problem is that I cannot apply the latest patch:

            https://gist.github.com/ekoontz/8950523

            Am I doing something wrong? Do I have the wrong URL for the latest patch?

            ekoontz Eugene Joseph Koontz added a comment - Pavan thanks for the updating the review. It looks good and I recognize the significant work it took; especially having to mess with the SASL channel handlers My only problem is that I cannot apply the latest patch: https://gist.github.com/ekoontz/8950523 Am I doing something wrong? Do I have the wrong URL for the latest patch?
            pavanka Pavan Kumar added a comment -

            Eugune,

            My diff was based off of http://pastebin.com/uxtK6TPz
            There is a merge conflict in pom.xml

            So please apply against the above version then rebase. (or I just attached a non-conflicting patch in jira for convenience)
            Otherwise I thought the review will get messed up also showing changes from commits after the base commit (which I did not make, so confusing the reviewers; sorry first time here)

            pavanka Pavan Kumar added a comment - Eugune, My diff was based off of http://pastebin.com/uxtK6TPz There is a merge conflict in pom.xml So please apply against the above version then rebase. (or I just attached a non-conflicting patch in jira for convenience) Otherwise I thought the review will get messed up also showing changes from commits after the base commit (which I did not make, so confusing the reviewers; sorry first time here)
            pavanka Pavan Kumar added a comment -

            rebased against trunk

            pavanka Pavan Kumar added a comment - rebased against trunk

            Re-created Pavan's patch, works for me now.

            ekoontz Eugene Joseph Koontz added a comment - Re-created Pavan's patch, works for me now.
            pavanka Pavan Kumar added a comment -

            Eugene -
            did you make any changes in the patch? (against one I uploaded)
            also please tell me if you verified the working of sasl?
            thanks

            pavanka Pavan Kumar added a comment - Eugene - did you make any changes in the patch? (against one I uploaded) also please tell me if you verified the working of sasl? thanks
            pavanka Pavan Kumar added a comment -

            I see that the commit message was removed. Thanks, forgot to do so while uploading the patch.

            pavanka Pavan Kumar added a comment - I see that the commit message was removed. Thanks, forgot to do so while uploading the patch.

            Hi Pavan,

            Currently there are no SASL unit tests - I added GIRAPH-852 for these. I have not had a chance to test your patch with an existing SASL setup. However the changes you made related to SASL look fine to me.

            ekoontz Eugene Joseph Koontz added a comment - Hi Pavan, Currently there are no SASL unit tests - I added GIRAPH-852 for these. I have not had a chance to test your patch with an existing SASL setup. However the changes you made related to SASL look fine to me.
            aching Avery Ching added a comment -

            Thanks Eugene!

            aching Avery Ching added a comment - Thanks Eugene!
            majakabiljo Maja Kabiljo added a comment -

            Thanks, +1, committing!

            majakabiljo Maja Kabiljo added a comment - Thanks, +1, committing!
            hudson Hudson added a comment -

            ABORTED: Integrated in Giraph-trunk-Commit #1414 (See https://builds.apache.org/job/Giraph-trunk-Commit/1414/)
            GIRAPH-840: Upgrade to netty 4 (pavanka via majakabiljo) (majakabiljo: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=62f9fd3fd3c13c8387e7448c4f3ea6b8f65b8a8c)

            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/SaslServerHandler.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ResponseEncoder.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/OutboundByteCounter.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestInfo.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/ByteCounterDelegate.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ResponseClientHandler.java
            • giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/SaslClientHandler.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java
            • giraph-core/src/main/java/org/apache/giraph/utils/DynamicChannelBufferInputStream.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
            • giraph-core/pom.xml
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/InboundByteCounter.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/WrappedAdaptiveReceiveBufferSizePredictorFactory.java
            • giraph-core/src/main/java/org/apache/giraph/utils/DynamicChannelBufferOutputStream.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/ByteCounter.java
            • giraph-core/src/main/java/org/apache/giraph/utils/PipelineUtils.java
            • CHANGELOG
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java
            • giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
            • giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/AuthorizeServerHandler.java
            • pom.xml
            • giraph-core/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
            hudson Hudson added a comment - ABORTED: Integrated in Giraph-trunk-Commit #1414 (See https://builds.apache.org/job/Giraph-trunk-Commit/1414/ ) GIRAPH-840 : Upgrade to netty 4 (pavanka via majakabiljo) (majakabiljo: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=62f9fd3fd3c13c8387e7448c4f3ea6b8f65b8a8c ) giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/SaslServerHandler.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ResponseEncoder.java giraph-core/src/main/java/org/apache/giraph/comm/netty/OutboundByteCounter.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestInfo.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java giraph-core/src/main/java/org/apache/giraph/comm/netty/ByteCounterDelegate.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ResponseClientHandler.java giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/SaslClientHandler.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java giraph-core/src/main/java/org/apache/giraph/utils/DynamicChannelBufferInputStream.java giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java giraph-core/pom.xml giraph-core/src/main/java/org/apache/giraph/comm/netty/InboundByteCounter.java giraph-core/src/main/java/org/apache/giraph/comm/netty/WrappedAdaptiveReceiveBufferSizePredictorFactory.java giraph-core/src/main/java/org/apache/giraph/utils/DynamicChannelBufferOutputStream.java giraph-core/src/main/java/org/apache/giraph/comm/netty/ByteCounter.java giraph-core/src/main/java/org/apache/giraph/utils/PipelineUtils.java CHANGELOG giraph-core/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/AuthorizeServerHandler.java pom.xml giraph-core/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
            majakabiljo Maja Kabiljo added a comment -

            It seems this is the problem:
            14/02/14 16:57:16 WARN netty.NettyClient: checkRequestsForProblems: Problem with request id (destTask=-1,reqId=0) connected = true, future done = true, success = false, cause = java.lang.OutOfMemoryError: Direct buffer memory, elapsed time = 15000, destination = hemera.apache.org/140.211.11.27:29999
            On my machine tests pass. pavanka can you please take a look, is there some setting we should change?

            majakabiljo Maja Kabiljo added a comment - It seems this is the problem: 14/02/14 16:57:16 WARN netty.NettyClient: checkRequestsForProblems: Problem with request id (destTask=-1,reqId=0) connected = true, future done = true, success = false, cause = java.lang.OutOfMemoryError: Direct buffer memory, elapsed time = 15000, destination = hemera.apache.org/140.211.11.27:29999 On my machine tests pass. pavanka can you please take a look, is there some setting we should change?
            pavanka Pavan Kumar added a comment -

            I saw that tests might fail/pass due to out of memory or gc overhead based on the machine configuration.
            So I guess, one way is to change options to default to use unpooled heap memory, in which case I hope tests should pass.
            So, how do I make that change?

            pavanka Pavan Kumar added a comment - I saw that tests might fail/pass due to out of memory or gc overhead based on the machine configuration. So I guess, one way is to change options to default to use unpooled heap memory, in which case I hope tests should pass. So, how do I make that change?
            pavanka Pavan Kumar added a comment - added a patch at https://issues.apache.org/jira/browse/GIRAPH-854
            hudson Hudson added a comment -

            FAILURE: Integrated in Giraph-trunk-Commit #1416 (See https://builds.apache.org/job/Giraph-trunk-Commit/1416/)
            GIRAPH-854: fix for test fail due to GIRAPH-840 (pavanka via majakabiljo) (majakabiljo: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=94033dbbd6f20e03b64b63095a52939a654e52ac)

            • CHANGELOG
            • giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
            hudson Hudson added a comment - FAILURE: Integrated in Giraph-trunk-Commit #1416 (See https://builds.apache.org/job/Giraph-trunk-Commit/1416/ ) GIRAPH-854 : fix for test fail due to GIRAPH-840 (pavanka via majakabiljo) (majakabiljo: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=94033dbbd6f20e03b64b63095a52939a654e52ac ) CHANGELOG giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
            pavanka Pavan Kumar added a comment -

            I just ran

            mvn clean verify -Phadoop_0.20.203
            and
            mvn clean verify -Phadoop_0.23

            both succeed

            Avery can you please help understand the issue.
            I did built the dependency tree for netty in rexter-io and see this
            http://pastebin.com/q0FpwWwu

            I find this wierd since it has "io.netty:netty:jar:3.6.3.Final:test" which is incorrect.
            Unable to reproduce locally since tests succeed.

            pavanka Pavan Kumar added a comment - I just ran mvn clean verify -Phadoop_0.20.203 and mvn clean verify -Phadoop_0.23 both succeed Avery can you please help understand the issue. I did built the dependency tree for netty in rexter-io and see this http://pastebin.com/q0FpwWwu I find this wierd since it has "io.netty:netty:jar:3.6.3.Final:test" which is incorrect. Unable to reproduce locally since tests succeed.

            pavanka please make sure that hadoop_1 and hadoop_2 profiles work. These are the ones we absolutely need to make work for the 1.1.0 release.

            Thanks,
            Roman.

            rvs Roman Shaposhnik added a comment - pavanka please make sure that hadoop_1 and hadoop_2 profiles work. These are the ones we absolutely need to make work for the 1.1.0 release. Thanks, Roman.
            pavanka Pavan Kumar added a comment -

            Roman,

            Please see that outputs
            http://pastebin.com/n5N1zHeW (hadoop_1)
            http://pastebin.com/32Mq6J1p (hadoop_2) --> this had been failing before 840 commit. (git log shows 642)

            If the break is fixed I feel that it should also pass with 840 since it passes for other profiles when tested.
            Thanks

            pavanka Pavan Kumar added a comment - Roman, Please see that outputs http://pastebin.com/n5N1zHeW (hadoop_1) http://pastebin.com/32Mq6J1p (hadoop_2) --> this had been failing before 840 commit. (git log shows 642) If the break is fixed I feel that it should also pass with 840 since it passes for other profiles when tested. Thanks

            pavanka please run hadoop_2 profile tests with -fae so we can see what else is failing

            rvs Roman Shaposhnik added a comment - pavanka please run hadoop_2 profile tests with -fae so we can see what else is failing

            Btw, there's currently no way to fix the breaking tests in hadoop_2 profile. The versions of HBase/Hive,etc we're using are currently NOT providing artifacts compiled against Hadoop 2 APIs.

            rvs Roman Shaposhnik added a comment - Btw, there's currently no way to fix the breaking tests in hadoop_2 profile. The versions of HBase/Hive,etc we're using are currently NOT providing artifacts compiled against Hadoop 2 APIs.
            pavanka Pavan Kumar added a comment -

            Result of the build
            http://pastebin.com/T0eUMdxR

            pavanka Pavan Kumar added a comment - Result of the build http://pastebin.com/T0eUMdxR

            LGTM! +1 for committing.

            rvs Roman Shaposhnik added a comment - LGTM! +1 for committing.

            People

              pavanka Pavan Kumar
              pavanka Pavan Kumar
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: