Hadoop HDFS
HDFS-1583

Improve backup-node sync performance by wrapping RPC parameters

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: namenode
    • Labels: None

      Description

      The journal edit records are sent from the active name-node to the backup-node via the following RPC:

        public void journal(NamenodeRegistration registration,
                            int jAction,
                            int length,
                            byte[] records) throws IOException;
      

      During the name-node throughput benchmark, the size of the byte-array records is around 8000, so serialization and deserialization become time-consuming. I wrote a simple application to test RPC with a byte-array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node needs only 0.3~0.4 ms to sync 8 kB to local disk.
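
      A minimal sketch of the wrapping idea, assuming Hadoop's Writable API; the NamenodeRegistration field is omitted for brevity, and the field layout of the actual attached patch may differ:

        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;

        import org.apache.hadoop.io.Writable;

        // Hypothetical wrapper for the journal() parameters; illustrative only.
        public class JournalArgs implements Writable {
          private int jAction;
          private byte[] records;

          public JournalArgs() {}                 // required for deserialization

          public JournalArgs(int jAction, byte[] records) {
            this.jAction = jAction;
            this.records = records;
          }

          @Override
          public void write(DataOutput out) throws IOException {
            out.writeInt(jAction);
            out.writeInt(records.length);
            out.write(records);                   // raw bytes, no per-element type tags
          }

          @Override
          public void readFields(DataInput in) throws IOException {
            jAction = in.readInt();
            records = new byte[in.readInt()];
            in.readFully(records);
          }
        }

      Because the whole object is serialized by its own write() method, the byte array crosses the wire as a 4-byte length plus the raw payload.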

      1. HDFS-1583-1.patch
        7 kB
        Liyin Liang
      2. HDFS-1583-2.patch
        9 kB
        Liyin Liang
      3. test-rpc.diff
        8 kB
        Liyin Liang

        Issue Links

        • This issue relates to HADOOP-4539
        • This issue is blocked by HADOOP-6949

          Activity

          Liyin Liang created issue -
          Liyin Liang made changes -
          Field Original Value New Value
          Link This issue relates to HADOOP-4539 [ HADOOP-4539 ]
          Liyin Liang added a comment -

          The backup-node is introduced by HADOOP-4539.

          Liyin Liang added a comment -

          Attaching a patch that wraps the journal function's parameters in a Writable class, JournalArgs. The following table shows the performance of the RPC call with and without this patch.
          Each cell is the average time per RPC call, i.e. the total elapsed time (ms) divided by the 10,000 calls.

          array size     1k      2k      3k      4k      5k      6k      7k      8k
          without patch  1.2212  1.9266  2.5415  3.2025  4.8677  4.5679  5.2211  5.9386
          with patch     0.4774  0.4087  0.4521  0.4375  0.4215  0.4616  0.4551  0.4844

          Liyin Liang made changes -
          Attachment HDFS-1583-1.patch [ 12468224 ]
          Liyin Liang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.23.0 [ 12315571 ]
          Todd Lipcon added a comment -

          Why is our RPC framework so bad at byte[] parameters? Could we optimize this at the IPC layer?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468224/HDFS-1583-1.patch
          against trunk revision 1058402.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.namenode.TestStorageRestore

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/104//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/104//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/104//console

          This message is automatically generated.

          Liyin Liang added a comment -

          Hi Todd,
          This is mainly caused by the serialization of the array, which is done by:

          ObjectWritable::writeObject(DataOutput out, Object instance,
                                           Class declaredClass, 
                                           Configuration conf)
          

          This function traverses the array and serializes each element as an object. In my test, a byte array with 8,000 elements grew to 56,008 bytes after serialization (2.4 ms), while the wrapped object serialized to only 8,094 bytes (0.03 ms).

          By the way, there is already an array wrapper class:

          public class ArrayWritable implements Writable
          

          This class is used in FSEditLog to log operations, e.g. FSEditLog::logMkDir(String path, INode newNode).
          I'll update the patch to use ArrayWritable.
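
          The 56,008 figure is consistent with ObjectWritable writing the declared class name before every element: each byte costs a 6-byte "byte" type tag plus 1 payload byte, i.e. 8,000 × 7 = 56,000 bytes, plus an 8-byte array header. A back-of-the-envelope check, assuming UTF8.writeString frames short ASCII names the same way as DataOutput.writeUTF (2-byte length prefix plus the characters):

            import java.io.ByteArrayOutputStream;
            import java.io.DataOutputStream;
            import java.io.IOException;

            public class ByteArrayOverhead {
              public static void main(String[] args) throws IOException {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(buf);
                out.writeUTF("[B");              // declared class of the array: 4 bytes
                out.writeInt(8000);              // array length:                4 bytes
                for (int i = 0; i < 8000; i++) {
                  out.writeUTF("byte");          // per-element type tag: 6 bytes
                  out.writeByte(0);              // the actual payload:   1 byte
                }
                System.out.println(buf.size());  // prints 56008 = 8 + 8000 * 7
              }
            }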

          Liyin Liang added a comment -

          Added a test case for JournalArgs.
          ArrayWritable only accepts Writable objects, so we cannot use it to wrap a byte array.

          Liyin Liang made changes -
          Attachment HDFS-1583-2.patch [ 12468257 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468257/HDFS-1583-2.patch
          against trunk revision 1058402.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestHDFSCLI
          org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
          org.apache.hadoop.hdfs.server.namenode.TestStorageRestore

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          Liyin, this is a nice optimization; thanks for measuring the performance.
          I think this is critical for the 0.22 release. Writing edits to the BN should not be slower than writing to disk.

          > byte array with 8000 elements will grow up to 56008 elements after serialization

          This makes RPC very inefficient.
          Java arrays cannot hold instances of different types (see ArrayStoreException for reference types), so serializing the type name for each element does not make sense. The type should be stored only once for the entire array.
          Does anybody remember discussions or jiras opened for this?

          We need to decide whether this should be fixed in RPC or locally for the BackupNode only. An RPC-level fix would optimize communication in general, but would be massively backward-incompatible. It could be a good time to do that now, before the major release.

          Todd Lipcon added a comment -

          We did this optimization for the RPC layer in HBase long ago (HBASE-82). Here's the current code:

          https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java#L387

          Is there a way to do the change to ObjectWritable in a way that the new version can still read old data?

          Todd Lipcon made changes -
          Assignee Liyin Liang [ liangly ]
          Doug Cutting added a comment -

          ObjectWritable might be used by applications, and changing its format may be hard to do compatibly. But it would be relatively easy to switch RPC to a different format: we could move ObjectWritable's array I/O into protected methods, add a subclass that overrides them with an optimized implementation, and then use that subclass in RPC in place of ObjectWritable.
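
          A sketch of that subclass idea, assuming ObjectWritable's array I/O has first been refactored into a protected, overridable method (the writeArray hook below is hypothetical; this is not the code later committed in HADOOP-6949):

            import java.io.DataOutput;
            import java.io.IOException;

            import org.apache.hadoop.io.ObjectWritable;

            // Hypothetical subclass per Doug's suggestion; used only by RPC,
            // so the on-disk format of plain ObjectWritable is untouched.
            public class RpcObjectWritable extends ObjectWritable {

              protected void writeArray(DataOutput out, Object array,
                                        Class<?> componentType) throws IOException {
                if (componentType == Byte.TYPE) {
                  byte[] bytes = (byte[]) array;
                  out.writeInt(bytes.length);
                  out.write(bytes);             // one header, zero per-element tags
                } else {
                  // fall back to the existing element-by-element format
                  ObjectWritable.writeObject(out, array, array.getClass(), getConf());
                }
              }
            }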

          Konstantin Shvachko added a comment -

          Looks like there is a jira for that: HADOOP-6949.
          I'll reopen it, since we have more benchmarks now; let's move the RPC discussion there.

          Konstantin Shvachko made changes -
          Link This issue is blocked by HADOOP-6949 [ HADOOP-6949 ]
          Konstantin Shvachko added a comment -

          Liyin, what benchmarks did you use to measure RPC performance in this case? Could you attach them as a patch, so that anybody can reproduce the results and perhaps apply the same approach to blockReport() or other methods?

          Liyin Liang added a comment -

          Hi Konstantin,
          I'll attach the benchmark code as a diff file; it is a simple client-server application based on the trunk common project.
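
          For reference, a minimal sketch of such a client-server micro-benchmark against the pre-protobuf Hadoop RPC API; the protocol, port, and class names are illustrative assumptions, not the contents of test-rpc.diff:

            import java.io.IOException;
            import java.net.InetSocketAddress;

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.ipc.RPC;
            import org.apache.hadoop.ipc.Server;
            import org.apache.hadoop.ipc.VersionedProtocol;

            public class RpcByteArrayBenchmark {

              // Hypothetical protocol with the byte[] parameter under test.
              public interface TestProtocol extends VersionedProtocol {
                long VERSION = 1L;
                void journal(byte[] records) throws IOException;
              }

              static class TestImpl implements TestProtocol {
                public void journal(byte[] records) { /* discard */ }
                public long getProtocolVersion(String protocol, long clientVersion) {
                  return VERSION;
                }
              }

              public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Server server = RPC.getServer(new TestImpl(), "0.0.0.0", 9123, conf);
                server.start();

                TestProtocol proxy = (TestProtocol) RPC.getProxy(TestProtocol.class,
                    TestProtocol.VERSION, new InetSocketAddress("localhost", 9123), conf);

                byte[] records = new byte[8000];   // vary 1k..8k as in the table above
                int calls = 10000;
                long start = System.nanoTime();
                for (int i = 0; i < calls; i++) {
                  proxy.journal(records);          // round-trip dominated by serialization
                }
                double avgMs = (System.nanoTime() - start) / 1e6 / calls;
                System.out.printf("avg per call: %.4f ms%n", avgMs);

                RPC.stopProxy(proxy);
                server.stop();
              }
            }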

          Liyin Liang made changes -
          Attachment test-rpc.diff [ 12468458 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468458/test-rpc.diff
          against trunk revision 1058402.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.fs.permission.TestStickyBit
          org.apache.hadoop.hdfs.security.TestDelegationToken
          org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          org.apache.hadoop.hdfs.server.namenode.TestBackupNode
          org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks
          org.apache.hadoop.hdfs.server.namenode.TestBlockTokenWithDFS
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
          org.apache.hadoop.hdfs.server.namenode.TestFsck
          org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
          org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
          org.apache.hadoop.hdfs.server.namenode.TestUnderReplicatedBlocks
          org.apache.hadoop.hdfs.TestCrcCorruption
          org.apache.hadoop.hdfs.TestDatanodeBlockScanner
          org.apache.hadoop.hdfs.TestDatanodeDeath
          org.apache.hadoop.hdfs.TestDFSClientRetries
          org.apache.hadoop.hdfs.TestDFSFinalize
          org.apache.hadoop.hdfs.TestDFSRollback
          org.apache.hadoop.hdfs.TestDFSShell
          org.apache.hadoop.hdfs.TestDFSStartupVersions
          org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
          org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
          org.apache.hadoop.hdfs.TestDFSUpgrade
          org.apache.hadoop.hdfs.TestDistributedFileSystem
          org.apache.hadoop.hdfs.TestFileAppend2
          org.apache.hadoop.hdfs.TestFileAppend3
          org.apache.hadoop.hdfs.TestFileAppend4
          org.apache.hadoop.hdfs.TestFileAppend
          org.apache.hadoop.hdfs.TestFileConcurrentReader
          org.apache.hadoop.hdfs.TestFileCreationNamenodeRestart
          org.apache.hadoop.hdfs.TestFileCreation
          org.apache.hadoop.hdfs.TestHDFSFileSystemContract
          org.apache.hadoop.hdfs.TestHDFSTrash
          org.apache.hadoop.hdfs.TestPread
          org.apache.hadoop.hdfs.TestQuota
          org.apache.hadoop.hdfs.TestReplication
          org.apache.hadoop.hdfs.TestRestartDFS
          org.apache.hadoop.hdfs.TestSetrepDecreasing
          org.apache.hadoop.hdfs.TestSetrepIncreasing
          org.apache.hadoop.hdfs.TestWriteConfigurationToDFS

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/111//testReport/
          Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/111//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/111//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/111//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468458/test-rpc.diff
          against trunk revision 1059508.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.namenode.TestLargeDirectoryDelete
          org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
          org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
          org.apache.hadoop.hdfs.TestDatanodeBlockScanner
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/114//testReport/
          Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/114//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/114//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/114//console

          This message is automatically generated.

          Matt Foley added a comment -

          A patch with an alternative solution is attached to HADOOP-6949. It is a generalized fix for any protocol that pumps arrays of primitives (long[], byte[], etc.) through the RPC and ObjectWritable mechanisms, including block reports and journaling.

          It has the following benefits:
          1. It is fully backward compatible with any persisted data that may be out there, because it can still successfully receive/read the old format.
          2. It makes minimal changes to the ObjectWritable module and doesn't require any changes to protocols or APIs.
          3. It is consistent with current practice regarding *Writable classes.
          4. It automatically benefits all protocols that send arrays of primitives through the RPC and ObjectWritable mechanisms, without changes to the protocol APIs.
          5. It has a fully optimized wire format, with almost no object overhead, unlike solutions based on ArrayWritable.

          That said, while it is backward compatible with old stored data, it is admittedly not backward compatible with older versions of Client software: A NEWER RPC RECEIVER can correctly receive data from an OLDER SENDER, but an OLDER RECEIVER will not correctly receive arrays of primitives sent from a NEWER SENDER. I haven't been able to think of a solution without getting into protocol negotiation schemes.

          I understand that Liyin's proposed solution, by adding new signatures to the journaling protocol, could have the benefit of backward compatibility with old Clients. If that is critical, then we must go that way. But that way only benefits the specific APIs that get modified, and adds significant maintenance complexity in the long run. I would also point out that Liyin's current patch does NOT provide that backward compatibility, because it replaces the protocol signatures rather than extending them. If the current patch is acceptable, meaning we don't need Client API backward compatibility, then I strongly recommend the more general solution of HADOOP-6949 instead.

          As far as performance goes, I measured the RPC overhead time in sending a 50,000-block Block Report in the form of an array of long[150000]. (That's the problem that got me working on this.) The current RPC round-trip overhead in a system with no resource contention is about 180 msec. With my proposed patch, and no code change to the calling code, this is reduced to only 50 msec, i.e. an 80% reduction. The size reduction is comparable to what Liyin measured, for similar reasons.

          Matt Foley added a comment -

          D'uh typo: That's a 70% reduction, not 80%.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468458/test-rpc.diff
          against trunk revision 1072023.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings).

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/176//testReport/
          Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/176//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/176//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/176//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12468458/test-rpc.diff
          against trunk revision 1074282.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.balancer.TestBalancer
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/212//testReport/
          Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/212//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/212//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/212//console

          This message is automatically generated.

          Matt Foley added a comment -

          HADOOP-6949 has been accepted and committed. Respectfully recommend that HDFS-1583-2.patch be withdrawn and this bug resolved as Fixed.

          Liyin Liang added a comment -

          HADOOP-6949 has been accepted and committed.

          Liyin Liang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Liyin Liang added a comment -

          This bug has been fixed by HADOOP-6949.

          Liyin Liang made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Transition                 Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open → Patch Available     26m 6s                  1                 Liyin Liang     13/Jan/11 06:13
          Patch Available → Open     67d 23h 46m             1                 Liyin Liang     22/Mar/11 06:00
          Open → Resolved            1m 9s                   1                 Liyin Liang     22/Mar/11 06:01

            People

            • Assignee: Liyin Liang
            • Reporter: Liyin Liang
            • Votes: 0
            • Watchers: 9
