Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-2421

Improve the concurrency of SerialNumberMap in NameNode

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      After enabled permission checking in our HDFS test cluster, our benchmark observed a significant reduced concurrency in NameNode. Investigation showed that most threads were blocked at acquiring the lock of org.apache.hadoop.hdfs.server.namenode.SerialNumberManager$SerialNumberMap. We used concurrentHashMap to replace Hashmap + synchronized methods, which greatly improved the situation.

      1. TestSerialNumberManager.java
        4 kB
        Jing Zhao
      2. performance_SerialNumberManager.png
        55 kB
        Jing Zhao
      3. performance_100000_op.png
        48 kB
        Jing Zhao
      4. performance_10000_op.png
        59 kB
        Jing Zhao
      5. HDFS-2421.patch
        2 kB
        Jing Zhao

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #347 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/347/)
          svn merge -c 1374127. FIXES: HDFS-2421. Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374417)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374417
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #347 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/347/ ) svn merge -c 1374127. FIXES: HDFS-2421 . Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374417) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374417 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Hide
          Daryn Sharp added a comment -

          Merged to branch 23.

          Show
          Daryn Sharp added a comment - Merged to branch 23.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1169 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1169/)
          HDFS-2421. Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1169 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1169/ ) HDFS-2421 . Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1137 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1137/)
          HDFS-2421. Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1137 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1137/ ) HDFS-2421 . Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2623 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2623/)
          HDFS-2421. Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2623 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2623/ ) HDFS-2421 . Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2657 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2657/)
          HDFS-2421. Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2657 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2657/ ) HDFS-2421 . Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2592 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2592/)
          HDFS-2421. Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2592 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2592/ ) HDFS-2421 . Improve the concurrency of SerialNumberMap in NameNode. Contributed by Jing Zhao and Weiyan Wang (Revision 1374127) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374127 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Jing Zhao and Weiyan Wang!

          Show
          Tsz Wo Nicholas Sze added a comment - I have committed this. Thanks, Jing Zhao and Weiyan Wang!
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 patch looks good.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541170/HDFS-2421.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3020//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3020//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12541170/HDFS-2421.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3020//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3020//console This message is automatically generated.
          Hide
          Jing Zhao added a comment -

          Upload the patch for Weiyan's code.

          Show
          Jing Zhao added a comment - Upload the patch for Weiyan's code.
          Hide
          Suresh Srinivas added a comment -

          Jing, can you attach a patch based on code that Weiyan posted.

          Show
          Suresh Srinivas added a comment - Jing, can you attach a patch based on code that Weiyan posted.
          Hide
          Jing Zhao added a comment -

          Nicholas, my previous patch was just for a simple performance testing, and did not consider too much about synchronization semantic. I guess you may want to review Weiyan's code at https://github.com/facebook/hadoop-20/blob/develop/src/hdfs/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java, and I think it should be fine in synchronization and can improve the read performance (although the value of serial number is not increasing in the original way).

          Show
          Jing Zhao added a comment - Nicholas, my previous patch was just for a simple performance testing, and did not consider too much about synchronization semantic. I guess you may want to review Weiyan's code at https://github.com/facebook/hadoop-20/blob/develop/src/hdfs/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java , and I think it should be fine in synchronization and can improve the read performance (although the value of serial number is not increasing in the original way).
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Hi Jing, there is a race condition even for writing different names since nextSerialNumber() is not synchronized. It could be easily fixed by adding "synchronized" or changing max to AtomicInteger.

          Show
          Tsz Wo Nicholas Sze added a comment - Hi Jing, there is a race condition even for writing different names since nextSerialNumber() is not synchronized. It could be easily fixed by adding "synchronized" or changing max to AtomicInteger.
          Hide
          Weiyan Wang added a comment -

          I think most of operations are just permission checking (read/get). This patch is aimed at improving concurrent read

          Show
          Weiyan Wang added a comment - I think most of operations are just permission checking (read/get). This patch is aimed at improving concurrent read
          Hide
          Jing Zhao added a comment -

          also attach the performance measurement for 10,000 operation version (in the same cluster node). Seems the read/get can get a big improvement under this scenario.

          Show
          Jing Zhao added a comment - also attach the performance measurement for 10,000 operation version (in the same cluster node). Seems the read/get can get a big improvement under this scenario.
          Hide
          Jing Zhao added a comment -

          Measure the time performance with the same test code, but run 100,000 sets of operations per thread. Run the program in a cluster node with 2.5 GHz CPU and 16 GB mem.

          Show
          Jing Zhao added a comment - Measure the time performance with the same test code, but run 100,000 sets of operations per thread. Run the program in a cluster node with 2.5 GHz CPU and 16 GB mem.
          Hide
          Jing Zhao added a comment -

          And Weiyan's patch should be fine with the synchronization. I will test his code's performance and post the result.

          Show
          Jing Zhao added a comment - And Weiyan's patch should be fine with the synchronization. I will test his code's performance and post the result.
          Hide
          Jing Zhao added a comment -

          Hi Nicholas, I agree. So I said there would be a synchronization semantic difference if we remove the "synchronized" keyword. Two threads concurrently writing with the same name may raise issue.

          Show
          Jing Zhao added a comment - Hi Nicholas, I agree. So I said there would be a synchronization semantic difference if we remove the "synchronized" keyword. Two threads concurrently writing with the same name may raise issue.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Hi Jing, I think we cannot simply remove "synchronized" from get(T t). Otherwise, there are race conditions in the following code.

                if (sn == null) {
                  sn = nextSerialNumber();
                  t2i.put(t, sn);
                  i2t.put(sn, t);
                }
          
          Show
          Tsz Wo Nicholas Sze added a comment - Hi Jing, I think we cannot simply remove "synchronized" from get(T t). Otherwise, there are race conditions in the following code. if (sn == null ) { sn = nextSerialNumber(); t2i.put(t, sn); i2t.put(sn, t); }
          Hide
          Weiyan Wang added a comment -

          Sorry, I don't have time to submit a patch.
          Here is our running version of SerialNumberManager
          https://github.com/facebook/hadoop-20/blob/develop/src/hdfs/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java
          Unfortunately, we don't do much performance analysis on it, we just observe one thread is no longer blocking other threads any more.

          Show
          Weiyan Wang added a comment - Sorry, I don't have time to submit a patch. Here is our running version of SerialNumberManager https://github.com/facebook/hadoop-20/blob/develop/src/hdfs/org/apache/hadoop/hdfs/server/namenode/SerialNumberManager.java Unfortunately, we don't do much performance analysis on it, we just observe one thread is no longer blocking other threads any more.
          Hide
          Suresh Srinivas added a comment -

          Jing, given the performance numbers you have posted for the map is being used in a tight loop, I doubt if this makes such a big difference for the overall NameNode performance.

          Hairong or Weiyan, do you have any numbers to substantiate the improvement? BTW we should create a separate jira to improve the get method which checks the map entries two times, once with contains then to get.

          Show
          Suresh Srinivas added a comment - Jing, given the performance numbers you have posted for the map is being used in a tight loop, I doubt if this makes such a big difference for the overall NameNode performance. Hairong or Weiyan, do you have any numbers to substantiate the improvement? BTW we should create a separate jira to improve the get method which checks the map entries two times, once with contains then to get.
          Hide
          Jing Zhao added a comment -

          Write a simple test program to compare the two programs: ConcurrentHashMap vs. HashMap + synchronized. The TestSerialNumberManager.java is the test code. The ConcurrentHashMap version changes the original synchronization semantic, but seems fine if every thread is generating serial number for different names.

          A simple performance measurement is included (results are in milliseconds). For both read and write, every thread is doing
          10000 sets of r/w operations, and I tested scenarios with 10, 20, 50, and 100 threads. The test is on a MAC with 8G memory and a 2.2 GHz i7 processor.

          Show
          Jing Zhao added a comment - Write a simple test program to compare the two programs: ConcurrentHashMap vs. HashMap + synchronized. The TestSerialNumberManager.java is the test code. The ConcurrentHashMap version changes the original synchronization semantic, but seems fine if every thread is generating serial number for different names. A simple performance measurement is included (results are in milliseconds). For both read and write, every thread is doing 10000 sets of r/w operations, and I tested scenarios with 10, 20, 50, and 100 threads. The test is on a MAC with 8G memory and a 2.2 GHz i7 processor.
          Hide
          luoli added a comment -

          any performance different details?

          Show
          luoli added a comment - any performance different details?
          Hide
          luoli added a comment -

          hey weiyan ,any patch for this?

          Show
          luoli added a comment - hey weiyan ,any patch for this?

            People

            • Assignee:
              Jing Zhao
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development