
minikdc-related unit tests fail consistently on some platforms

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: kms, test
    • Labels: None

    Description

      On some platforms all unit tests that use minikdc fail consistently. Those tests include TestKMS, TestSaslDataTransfer, TestTimelineAuthenticationFilter, etc.

      Typical failures on the unit tests:

      java.lang.AssertionError: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Cannot get a KDC reply)
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.apache.hadoop.crypto.key.kms.server.TestKMS$8$4.run(TestKMS.java:1154)
      	at org.apache.hadoop.crypto.key.kms.server.TestKMS$8$4.run(TestKMS.java:1145)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
      	at org.apache.hadoop.crypto.key.kms.server.TestKMS.doAs(TestKMS.java:261)
      	at org.apache.hadoop.crypto.key.kms.server.TestKMS.access$100(TestKMS.java:76)
      

      The underlying error on the KDC server side (in minikdc) is a NullPointerException:

      org.apache.mina.filter.codec.ProtocolDecoderException: java.lang.NullPointerException: message (Hexdump: ...)
      	at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:234)
      	at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
      	at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:48)
      	at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:802)
      	at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:120)
      	at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
      	at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:426)
      	at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:604)
      	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:564)
      	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:553)
      	at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$400(AbstractPollingIoProcessor.java:57)
      	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:892)
      	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:65)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.NullPointerException: message
      	at org.apache.mina.filter.codec.AbstractProtocolDecoderOutput.write(AbstractProtocolDecoderOutput.java:44)
      	at org.apache.directory.server.kerberos.protocol.codec.MinaKerberosDecoder.decode(MinaKerberosDecoder.java:65)
      	at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:224)
      	... 15 more
      

      Attachments

        1. HADOOP-12090.002.patch
          5 kB
          Sangjin Lee
        2. HADOOP-12090.001.patch
          5 kB
          Sangjin Lee

        Issue Links

          Activity

            sjlee0 Sangjin Lee added a comment -

            This is caused by fragmented TCP packets for the kerberos authentication request.

            In the problem situation, the kerberos authentication request sent by the client gets fragmented into 2 packets although the size is tiny (e.g. 584 bytes). It gets split into one packet with 570 bytes of data and another with 14 bytes in this case. Tcpdump output:

            10:30:32.358645 IP localhost.50199 > localhost.60538: Flags [S], seq 1804572222, win 32792, options [mss 16396,sackOK,TS val 566449661 ecr 0,nop,wscale 8], length 0
            10:30:32.358661 IP localhost.60538 > localhost.50199: Flags [S.], seq 2381946627, ack 1804572223, win 1140, options [mss 16396,sackOK,TS val 566449661 ecr 566449661,nop,wscale 0], length 0
            10:30:32.358672 IP localhost.50199 > localhost.60538: Flags [.], ack 1, win 129, options [nop,nop,TS val 566449661 ecr 566449661], length 0
            10:30:32.358788 IP localhost.50199 > localhost.60538: Flags [.], seq 1:571, ack 1, win 129, options [nop,nop,TS val 566449661 ecr 566449661], length 570
            10:30:32.358796 IP localhost.60538 > localhost.50199: Flags [.], ack 571, win 570, options [nop,nop,TS val 566449661 ecr 566449661], length 0
            10:30:32.358801 IP localhost.50199 > localhost.60538: Flags [P.], seq 571:585, ack 1, win 129, options [nop,nop,TS val 566449661 ecr 566449661], length 14
            

            It turns out there is a bug in apacheds (on which minikdc is based) where the kerberos message decoding fails with an NPE if the kerberos message is not contained in a single TCP packet (DIRSERVER-2071).

            Furthermore, the TCP fragmentation itself is also related to apacheds. Mina, the underlying I/O framework for apacheds, sets a pretty small receive/send buffer size by default (1 KB). This has the effect of reducing the TCP window size significantly, as evidenced by the tcpdump output above (the server advertises win 1140 in its SYN-ACK). This is what causes the fragmentation.
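
To illustrate the decoding problem described above, here is a minimal sketch of a decoder that tolerates a message arriving in more than one read. It relies on the fact that Kerberos over TCP prefixes each message with a 4-byte big-endian length (RFC 4120). The class name `LengthPrefixedReassembler` and its `feed` method are hypothetical and exist only for this illustration; this is not the ApacheDS decoder and not the fix made in DIRSERVER-2071.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of ApacheDS, Mina, or Hadoop. */
final class LengthPrefixedReassembler {
    // Bytes received so far that do not yet form a complete message.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /**
     * Feeds one chunk, as delivered by a single socket read, and returns
     * every message body that is now complete (possibly none).
     */
    List<byte[]> feed(byte[] chunk) {
        pending.write(chunk, 0, chunk.length);
        List<byte[]> complete = new ArrayList<>();
        // ByteBuffer is big-endian by default, matching network byte order.
        ByteBuffer buf = ByteBuffer.wrap(pending.toByteArray());
        while (buf.remaining() >= 4) {
            buf.mark();
            int length = buf.getInt();
            if (length < 0) {
                throw new IllegalStateException("invalid length prefix: " + length);
            }
            if (buf.remaining() < length) {
                // Body is incomplete: rewind to the prefix and wait for more data
                // instead of handing a null or partial message downstream.
                buf.reset();
                break;
            }
            byte[] message = new byte[length];
            buf.get(message);
            complete.add(message);
        }
        // Keep only the unconsumed tail for the next call.
        byte[] rest = new byte[buf.remaining()];
        buf.get(rest);
        pending.reset();
        pending.write(rest, 0, rest.length);
        return complete;
    }
}
```

Feeding a 584-byte frame as a 570-byte chunk followed by a 14-byte chunk, as in the capture above, yields no message after the first chunk and one complete message after the second.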

            sjlee0 Sangjin Lee added a comment -

            Both of these issues need to be fixed in apacheds. This may entail upgrading apacheds from 2.0.0-M15 to a future version (2.0.0-M21). There are some API-incompatible changes between those versions, so some work would be required.

            I propose a workaround in our minikdc code that configures the receive/send buffer size to be larger than the default, e.g. 32 KB or 64 KB. I have confirmed that this change makes all the failing unit tests pass again. So until we upgrade apacheds to pick up the fix, it'd be good to have this workaround.

            I'll come up with a patch that does that soon.
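
As a rough illustration of the workaround idea, the sketch below requests a 64 KB receive buffer on a listening socket using only the JDK's `java.net.ServerSocket`. This is an assumption-laden stand-in: the actual patch configures the buffer sizes through Mina inside minikdc, and that code is not reproduced here. The class and method names (`SocketBufferSketch`, `requestReceiveBuffer`) are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical illustration using plain JDK sockets; not the minikdc patch itself. */
final class SocketBufferSketch {
    // 64 KB, one of the sizes proposed in the comment above.
    static final int BUFFER_SIZE = 64 * 1024;

    /**
     * Requests the given SO_RCVBUF size on an unbound server socket and
     * returns the size the OS reports afterwards. The OS treats the request
     * as a hint, so the reported value may differ from the requested one.
     */
    static int requestReceiveBuffer(int bytes) {
        try (ServerSocket server = new ServerSocket()) {
            // Set before bind() so that accepted connections start from this value.
            server.setReceiveBufferSize(bytes);
            return server.getReceiveBufferSize();
        } catch (IOException e) {
            throw new IllegalStateException("could not configure socket", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("effective SO_RCVBUF: " + requestReceiveBuffer(BUFFER_SIZE));
    }
}
```

The reported value depends on the platform and its limits, so it is worth printing rather than assuming the request was honored exactly.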

            sjlee0 Sangjin Lee added a comment -

            Patch v.1 posted.

            hadoopqa Hadoop QA added a comment -



            -1 overall



            Vote Subsystem Runtime Comment
            0 pre-patch 15m 59s Pre-patch trunk compilation is healthy.
            +1 @author 0m 0s The patch does not contain any @author tags.
            -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
            +1 javac 7m 50s There were no new javac warning messages.
            +1 javadoc 10m 0s There were no new javadoc warning messages.
            +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
            -1 checkstyle 0m 16s The applied patch generated 1 new checkstyle issues (total was 9, now 10).
            +1 whitespace 0m 0s The patch has no lines that end in whitespace.
            +1 install 1m 35s mvn install still works.
            +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
            +1 findbugs 0m 41s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1 common tests 0m 32s Tests passed in hadoop-minikdc.
                37m 56s  



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12739670/HADOOP-12090.001.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / 32ffda1
            checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/6965/artifact/patchprocess/diffcheckstylehadoop-minikdc.txt
            hadoop-minikdc test log https://builds.apache.org/job/PreCommit-HADOOP-Build/6965/artifact/patchprocess/testrun_hadoop-minikdc.txt
            Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/6965/testReport/
            Java 1.7.0_55
            uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/6965/console

            This message was automatically generated.

            sjlee0 Sangjin Lee added a comment -

            Addressed the checkstyle violation.

            hadoopqa Hadoop QA added a comment -



            -1 overall



            Vote Subsystem Runtime Comment
            0 pre-patch 15m 28s Pre-patch trunk compilation is healthy.
            +1 @author 0m 0s The patch does not contain any @author tags.
            -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
            +1 javac 7m 38s There were no new javac warning messages.
            +1 javadoc 9m 44s There were no new javadoc warning messages.
            +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
            +1 checkstyle 0m 16s There were no new checkstyle issues.
            +1 whitespace 0m 0s The patch has no lines that end in whitespace.
            +1 install 1m 34s mvn install still works.
            +1 eclipse:eclipse 0m 36s The patch built with eclipse:eclipse.
            +1 findbugs 0m 40s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
            +1 common tests 0m 31s Tests passed in hadoop-minikdc.
                36m 54s  



            Subsystem Report/Notes
            Patch URL http://issues.apache.org/jira/secure/attachment/12739687/HADOOP-12090.002.patch
            Optional Tests javadoc javac unit findbugs checkstyle
            git revision trunk / 75a2560
            hadoop-minikdc test log https://builds.apache.org/job/PreCommit-HADOOP-Build/6967/artifact/patchprocess/testrun_hadoop-minikdc.txt
            Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/6967/testReport/
            Java 1.7.0_55
            uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/6967/console

            This message was automatically generated.

            sjlee0 Sangjin Lee added a comment -

            Due to the nature of the change, I'm not sure if there is a useful unit test that can test this, other than the fact that the patch makes a number of unit tests pass on some platforms.

            Committers, could you kindly review the patch and let me know your feedback? Thanks!

            sjlee0 Sangjin Lee added a comment -

            Ping? Any feedback on this patch? Thanks in advance.

            sjlee0 Sangjin Lee added a comment -

            Ping?

            wheat9 Haohui Mai added a comment -

            Thanks for the work. I'll take a look later today.

            sjlee0 Sangjin Lee added a comment -

            Thanks!

            wheat9 Haohui Mai added a comment -

            I think it might make more sense to upgrade apacheds to the latest version to resolve the issue. There are no backward compatibility concerns, as apacheds is only used by minikdc.

            Tweaking the buffer size might work most of the time, but it does not seem to guarantee the packet will not be fragmented due to timing issues.

            sjlee0 Sangjin Lee added a comment -

            Thanks for the comments wheat9. FWIW, it's not clear when the fixed version of apacheds (2.0.0-M21) will be released. Those of us who cannot wait for it would need this patch internally to work around the issue.

            Also, regarding the packet fragmentation, it's correct that packets can be fragmented for other reasons as well. That said, the risk is mitigated by 2 facts: (1) typical kerberos authentication request messages are much smaller (~500 bytes) than the proposed window size (64 KB), and (2) minikdc connections are essentially loopback connections, so there is little risk of fragmentation unless software (e.g. apacheds) sets the window size arbitrarily small.

            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            -1 patch 0m 3s HADOOP-12090 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



            Subsystem Report/Notes
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12739687/HADOOP-12090.002.patch
            JIRA Issue HADOOP-12090
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8677/console
            Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            drankye Kai Zheng added a comment -

            Just made a proposal suggesting that the related code be rebased on Apache Kerby, as the Apache Directory project has shifted its Kerberos-related effort to that sub-project.

            jzhuge John Zhuge added a comment -

            sjlee0 On which platforms do you see this problem? We are seeing the issue 10% of the time with a source tree based on 2.6.

            Since patch 002 increased the socket buffer size for minikdc, does it mean a certain socket buffer size can reliably reproduce the problem on some platforms? What is that size? 1140 bytes?

            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            -1 patch 0m 5s HADOOP-12090 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



            Subsystem Report/Notes
            JIRA Issue HADOOP-12090
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12739687/HADOOP-12090.002.patch
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10695/console
            Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            sjlee0 Sangjin Lee added a comment -

            Sangjin Lee On which platforms do you see this problem? We are seeing the issue 10% of the time with a source tree based on 2.6.

            I think I saw this on Centos 6.

            Since patch 002 increased the socket buffer size for minikdc, does it mean a certain socket buffer size can reliably reproduce the problem on some platforms? What is that size? 1140 bytes?

            The default would reproduce this (at least in my problem environment). The default is 1 KB if I'm not mistaken. To some extent, this is OS sensitive because it has something to do with the TCP window management. For example, I was not able to reproduce this on mac.

            drankye Kai Zheng added a comment -

            I'm wondering if this can be reproduced on trunk, with the updated MiniKDC.

            jzhuge John Zhuge added a comment - - edited

            I reproduced TestKMS failures with CDH (2.6.0 based) on Ubuntu 12.02 and 14.04, but not on Mac and Centos 6.6.

            drankye Kai Zheng added a comment -

            Thanks John. I suppose you meant it on trunk. Did you see errors or logs similar to those described in this issue? Being familiar with the MiniKDC/Kerby implementation, I believe it shouldn't be caused by the same thing, i.e. TCP packet fragmentation.

            jzhuge John Zhuge added a comment -

            Just tried trunk, TestKMS, TestSaslDataTransfer, TestTimelineAuthenticationFilter were fine.

            jzhuge John Zhuge added a comment -

            drankye, Sorry I didn't mention "CDH (2.6.0 based)" in my last comment. Corrected.

            So the MiniKDC in trunk seems fine.

            Will test the commit before HADOOP-12911 (Upgrade Hadoop MiniKDC with Kerby).

            drankye Kai Zheng added a comment -

            Thanks jzhuge for the clarification and the verification tests. The results sound reasonable to me.

            jzhuge John Zhuge added a comment -

            Sync'd my trunk to "f71eb51 HADOOP-10134 [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments.", TestKMS still passed. Maybe not the changes in minikdc itself but rather updated dependencies fixed the issue?

            drankye Kai Zheng added a comment -

            Did you run mvn clean package install when you sync'd to that revision? Maybe the test only fails intermittently?

            jzhuge John Zhuge added a comment -

            Did clean install. However, the test is flaky, so it doesn't necessarily fail each time. I will repeat the test 10 times.

            jzhuge John Zhuge added a comment -

            By default Linux automatically adjusts the socket buffer size starting from the default value; see tcp(7). The 3 values in tcp_rmem and tcp_wmem are: min, default, max.

            [jzhuge@jzhuge-ubuntu hadoop2]((8fca972...))$ cat /proc/sys/net/ipv4/tcp_rmem
            4096    87380   6291456
            [jzhuge@jzhuge-ubuntu hadoop2]((8fca972...))$ cat /proc/sys/net/ipv4/tcp_wmem
            4096    16384   4194304
            [jzhuge@jzhuge-ubuntu hadoop2]((8fca972...))$ cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf 
            1
            

            Setting SO_SNDBUF and SO_RCVBUF will turn off auto adjustment. The max for SO_RCVBUF or SO_SNDBUF is limited by /proc/sys/net/core/rmem_max or /proc/sys/net/core/wmem_max.
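
For anyone who wants to check these limits from Java rather than from a shell, the following minimal sketch parses the three whitespace-separated values (min, default, max) from a line such as the tcp_rmem output above. The class name `TcpMemLimits` and its members are hypothetical and exist only for this illustration; the `main` method assumes a Linux host where `/proc/sys/net/ipv4/tcp_rmem` is readable and simply reports otherwise.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical holder for the min/default/max triple of tcp_rmem or tcp_wmem. */
final class TcpMemLimits {
    final long min;
    final long def;
    final long max;

    TcpMemLimits(long min, long def, long max) {
        this.min = min;
        this.def = def;
        this.max = max;
    }

    /** Parses a line like "4096    87380   6291456" (any whitespace between fields). */
    static TcpMemLimits parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 values, got " + fields.length);
        }
        return new TcpMemLimits(
            Long.parseLong(fields[0]),
            Long.parseLong(fields[1]),
            Long.parseLong(fields[2]));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Linux procfs path; absent on other platforms such as macOS.
        Path path = Paths.get("/proc/sys/net/ipv4/tcp_rmem");
        if (!Files.isReadable(path)) {
            System.out.println(path + " is not available on this platform");
            return;
        }
        String line = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
        TcpMemLimits limits = parse(line);
        System.out.println("min=" + limits.min + " default=" + limits.def + " max=" + limits.max);
    }
}
```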

            sjlee0 Sangjin Lee added a comment - - edited

            Just to be clear, this issue is caused because Mina (the networking stack on which ApacheDS depends) explicitly sets the send and receive buffer size to 1 KB (see DIRSERVER-2074 for more detail). If we move away from that behavior, by using different libraries or otherwise, the problem may go away.

            jzhuge John Zhuge added a comment -

            Thanks sjlee0 for the clarification. Increasing socket buffer size to 64K does fix my issues!

            BTW, what is the purpose of the change to hadoop-minikdc/pom.xml in patch 002?

            sjlee0 Sangjin Lee added a comment -

            The patch adds references to SocketAcceptor and SocketSessionConfig which are classes in Mina. Since these are new direct references, I added the explicit dependency.

            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            -1 patch 0m 6s HADOOP-12090 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



            Subsystem Report/Notes
            JIRA Issue HADOOP-12090
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12739687/HADOOP-12090.002.patch
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11967/console
            Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            -1 patch 0m 6s HADOOP-12090 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



            Subsystem Report/Notes
            JIRA Issue HADOOP-12090
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12739687/HADOOP-12090.002.patch
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/13231/console
            Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.


            People

              sjlee0 Sangjin Lee
              sjlee0 Sangjin Lee
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated: