Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Of late, Netty 4 has earned a lot of praise in the community. For example,
https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead
A switch to Netty 4 enables a significant reduction in GC pressure as well as large performance gains. I started working on this last Sunday and have a patch that shows performance gains on the order of 15-25% (total execution time) for some applications at Facebook. However, I have only tested it with hadoop_facebook, so there might be issues with the SASL path.
I will release the patch today and want to open a discussion on whether anyone is still using the secure feature. If not, we can just deprecate it.
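To give a sense of what the upgrade touches (this is an illustrative Java sketch, not code from the patch, and the class name is hypothetical), Netty 4 changes the inbound handler contract: Netty 3's messageReceived(ChannelHandlerContext, MessageEvent) becomes channelRead(ChannelHandlerContext, Object), and pooled ByteBufs must be released explicitly:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Illustrative only: the shape a Netty 4 inbound handler takes after the upgrade.
public class ExampleInboundHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    ByteBuf buf = (ByteBuf) msg;
    try {
      // ... decode and process the request bytes ...
    } finally {
      buf.release(); // Netty 4 buffers are reference-counted and must be released
    }
  }
}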
Attachments
- GIRAPH-840.patch (116 kB, Eugene Joseph Koontz)
- GIRAPH-840.patch (117 kB, Pavan Kumar)
Issue Links
- relates to GIRAPH-871 Map task jvm never exits since netty 4 upgrade (Resolved)
- relates to GIRAPH-872 Minor inconsistencies with netty handler logic after netty 4 upgrade (Resolved)
Activity
https://reviews.apache.org/r/17644/
There are some small changes remaining, like making direct/undirect configurable, looking at support for secure versions, etc. I just submitted this as an initial version so I can get more suggestions based on the above description.
[However, note that this diff is stable and currently works on long-running apps with the expected performance gains.]
Note that https://issues.apache.org/jira/browse/GIRAPH-839 must be fixed before this patch will work. If not, we will see some weird deadlocking issues.
[PS: by direct/undirect in the above comment I mean direct memory vs. heap memory.]
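As a minimal sketch of the direct-vs-heap knob mentioned above (the flag name useDirectMemory is hypothetical, not the actual Giraph option), in Netty 4 the choice is made through the channel's ByteBufAllocator:

import io.netty.bootstrap.Bootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;

public class AllocatorConfigSketch {
  // preferDirect = true allocates direct (off-heap) buffers; false allocates heap buffers.
  static void configureAllocator(Bootstrap bootstrap, boolean useDirectMemory) {
    bootstrap.option(ChannelOption.ALLOCATOR,
        new PooledByteBufAllocator(useDirectMemory));
  }
}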
This is awesome work. Eugene, would you mind taking a look at the patch to see if SASL works?
Pavan,
Impressive, looks like a lot of work went into this!
Avery,
mvn -P2.2.0 test
and
mvn -Phadoop_yarn -Dhadoop.version=2.2.0 test
work for me, provided I use Pavan's patch from this issue together with my GIRAPH-841 patch.
I have not yet looked at the SASL-specific tests, but I assume those passed if the whole thing passes. Will look more at it later tonight.
I've seen that failure as well; to correct it I had to change the maven-antrun-plugin call to org.apache.giraph.conf.AllOptions in giraph-core/pom.xml to reference "${maven.dependency.org.apache.hadoop.hadoop-core.jar.path}" instead of "${maven.dependency.org.apache.hadoop.hadoop-common.jar.path}".
ekoontz, any update? This looks good to go. Otherwise, we can fix any issues going forward if there are any...
Pavan, thanks for updating the review. It looks good, and I recognize the significant work it took, especially having to mess with the SASL channel handlers.
My only problem is that I cannot apply the latest patch:
https://gist.github.com/ekoontz/8950523
Am I doing something wrong? Do I have the wrong URL for the latest patch?
Eugene,
My diff was based off of http://pastebin.com/uxtK6TPz
There is a merge conflict in pom.xml, so please apply against the above version and then rebase (or use the non-conflicting patch I just attached to the JIRA for convenience).
Otherwise, I thought the review would get messed up by also showing changes from commits after the base commit, which I did not make, confusing the reviewers. (Sorry, first time here.)
Eugene,
Did you make any changes in the patch (against the one I uploaded)?
Also, please tell me whether you verified that SASL works.
Thanks
I see that the commit message was removed. Thanks; I forgot to do that while uploading the patch.
Hi Pavan,
Currently there are no SASL unit tests; I added GIRAPH-852 for these. I have not had a chance to test your patch with an existing SASL setup. However, the changes you made related to SASL look fine to me.
ABORTED: Integrated in Giraph-trunk-Commit #1414 (See https://builds.apache.org/job/Giraph-trunk-Commit/1414/)
GIRAPH-840: Upgrade to netty 4 (pavanka via majakabiljo) (majakabiljo: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=62f9fd3fd3c13c8387e7448c4f3ea6b8f65b8a8c)
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/SaslServerHandler.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ResponseEncoder.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/OutboundByteCounter.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestInfo.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestEncoder.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/ByteCounterDelegate.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ResponseClientHandler.java
- giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/SaslClientHandler.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestDecoder.java
- giraph-core/src/main/java/org/apache/giraph/utils/DynamicChannelBufferInputStream.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
- giraph-core/pom.xml
- giraph-core/src/main/java/org/apache/giraph/comm/netty/InboundByteCounter.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/WrappedAdaptiveReceiveBufferSizePredictorFactory.java
- giraph-core/src/main/java/org/apache/giraph/utils/DynamicChannelBufferOutputStream.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/ByteCounter.java
- giraph-core/src/main/java/org/apache/giraph/utils/PipelineUtils.java
- CHANGELOG
- giraph-core/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java
- giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
- giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/AuthorizeServerHandler.java
- pom.xml
- giraph-core/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
It seems this is the problem:
14/02/14 16:57:16 WARN netty.NettyClient: checkRequestsForProblems: Problem with request id (destTask=-1,reqId=0) connected = true, future done = true, success = false, cause = java.lang.OutOfMemoryError: Direct buffer memory, elapsed time = 15000, destination = hemera.apache.org/140.211.11.27:29999
On my machine the tests pass. pavanka, can you please take a look? Is there some setting we should change?
I have seen tests fail or pass due to out-of-memory or GC-overhead errors depending on the machine configuration.
So I guess one way is to change the options to default to unpooled heap memory, in which case I hope the tests will pass.
So, how do I make that change?
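As a hedged sketch of the direction described above (the actual change landed as GIRAPH-854, and its option wiring may differ), defaulting to unpooled heap buffers in Netty 4 looks roughly like this:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.UnpooledByteBufAllocator;
import io.netty.channel.ChannelOption;

public class HeapBufferDefaultSketch {
  static void useUnpooledHeapBuffers(ServerBootstrap bootstrap) {
    // preferDirect = false keeps buffers on the heap, avoiding
    // "java.lang.OutOfMemoryError: Direct buffer memory".
    UnpooledByteBufAllocator heapAllocator = new UnpooledByteBufAllocator(false);
    bootstrap.option(ChannelOption.ALLOCATOR, heapAllocator);
    bootstrap.childOption(ChannelOption.ALLOCATOR, heapAllocator);
  }
}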
FAILURE: Integrated in Giraph-trunk-Commit #1416 (See https://builds.apache.org/job/Giraph-trunk-Commit/1416/)
GIRAPH-854: fix for test fail due to GIRAPH-840 (pavanka via majakabiljo) (majakabiljo: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=94033dbbd6f20e03b64b63095a52939a654e52ac)
- CHANGELOG
- giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
I just ran
mvn clean verify -Phadoop_0.20.203
and
mvn clean verify -Phadoop_0.23
both succeed
Avery, can you please help me understand the issue?
I built the dependency tree for netty in rexter-io and see this:
http://pastebin.com/q0FpwWwu
I find this weird since it has "io.netty:netty:jar:3.6.3.Final:test", which is incorrect.
I am unable to reproduce this locally since the tests succeed for me.
pavanka, please make sure that the hadoop_1 and hadoop_2 profiles work. These are the ones we absolutely need to work for the 1.1.0 release.
Thanks,
Roman.
Roman,
Please see these outputs:
http://pastebin.com/n5N1zHeW (hadoop_1)
http://pastebin.com/32Mq6J1p (hadoop_2) --> this had been failing before the 840 commit (git log shows 642).
If that breakage is fixed, I feel it should also pass with 840, since it passes for the other profiles when tested.
Thanks
pavanka, please run the hadoop_2 profile tests with -fae so we can see what else is failing.
Btw, there's currently no way to fix the breaking tests in the hadoop_2 profile. The versions of HBase, Hive, etc. we're using are currently NOT providing artifacts compiled against Hadoop 2 APIs.
Hi Pavan,
Looking forward to seeing your patch. Let's see if we can get it to work with other profiles and with SASL.
-Eugene