Hadoop HDFS / HDFS-1605

Convert DFSInputStream synchronized sections to a ReadWrite lock

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:

      Description

      HBase does concurrent preads from multiple threads to different blocks of the same HDFS file. Each of these pread calls invokes DFSInputStream.getFileLength() and DFSInputStream.getBlockAt(). These methods are "synchronized", causing all the concurrent threads to serialize. Converting this to a read/write lock would help performance.
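      The proposed conversion can be sketched as follows. This is a minimal illustration, not the actual DFSInputStream code; the class and field names are hypothetical. Read-mostly accessors such as getFileLength() take a shared read lock, so concurrent pread threads no longer serialize, while methods that mutate stream state take the exclusive write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the proposed change: a formerly "synchronized"
// accessor becomes a read-locked method, and mutation takes the write lock.
class LockedInputStream {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private long fileLength; // guarded by rwLock

    LockedInputStream(long fileLength) {
        this.fileLength = fileLength;
    }

    // Formerly synchronized: many pread threads can now run this concurrently.
    long getFileLength() {
        rwLock.readLock().lock();
        try {
            return fileLength;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // State mutation still excludes all readers and other writers.
    void updateLength(long newLength) {
        rwLock.writeLock().lock();
        try {
            fileLength = newLength;
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}
```

      Multiple threads may hold the read lock simultaneously, which is exactly the property the HBase pread path needs; the write lock preserves the old mutual exclusion for updates.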

      Attachments

      1. HDFS-1605.txt (45 kB, Liang Xie)
      2. HADOOP-1605-trunk-2.patch (46 kB, kartheek muthyala)
      3. HADOOP-1605-trunk-1.patch (80 kB, kartheek muthyala)
      4. HADOOP-1605-trunk.patch (45 kB, kartheek muthyala)
      5. DFSClientRWlock.3.txt (11 kB, dhruba borthakur)
      6. DFSClientRWlock.1.txt (11 kB, dhruba borthakur)

        Issue Links

          Activity

          Transition | Time In Source Status | Execution Times | Last Executor | Last Execution Date
          Patch Available -> Open | 514d 15h 48m | 1 | Ravi Prakash | 08/May/15 19:25
          Open -> Patch Available | 1048d 22h 32m | 2 | Allen Wittenauer | 12/May/15 23:25
          Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 37s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 33s There were no new javac warning messages.
          +1 javadoc 9m 33s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 2m 16s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 3s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 native 3m 12s Pre-build of native portion
          +1 hdfs tests 164m 46s Tests passed in hadoop-hdfs.
              207m 34s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12733470/HADOOP-1605-trunk-2.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a46506d
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11026/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11026/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11026/console

          This message was automatically generated.

          kartheek muthyala made changes -
          Attachment HADOOP-1605-trunk-2.patch [ 12733470 ]
          kartheek muthyala added a comment -

          Whitespace and checkstyle issues fixed.

          Ravi Prakash added a comment -

          Kartheek! Thanks. I also did not look through all possible cases of new race conditions. Did you?

          Vinayakumar B added a comment -

          kartheek muthyala, thanks for the patch.
          It would be a lot better if the patch excluded whitespace and formatting changes and were limited to changes related to the current JIRA, because many of the diff lines are whitespace and formatting changes.
          I assume you may have reformatted the entire file.

          Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 39s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 30s There were no new javac warning messages.
          +1 javadoc 9m 40s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 12s The applied patch generated 13 new checkstyle issues (total was 89, now 41).
          -1 whitespace 0m 1s The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 6s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 native 3m 13s Pre-build of native portion
          -1 hdfs tests 165m 58s Tests failed in hadoop-hdfs.
              208m 52s  



          Reason Tests
          Failed unit tests hadoop.tracing.TestTraceAdmin
            hadoop.tools.TestHdfsConfigFields



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12732197/HADOOP-1605-trunk-1.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f24452d
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10940/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/10940/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10940/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10940/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10940/console

          This message was automatically generated.

          Allen Wittenauer made changes -
          Assignee kartheek muthyala [ kartheek ]
          Allen Wittenauer made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          kartheek muthyala made changes -
          Labels BB2015-05-TBR BB2015-05-RFC
          kartheek muthyala made changes -
          Attachment HADOOP-1605-trunk-1.patch [ 12732197 ]
          kartheek muthyala added a comment -

          New patch to handle the issues raised by Ravi Prakash.

          kartheek muthyala added a comment -

          Hi Ravi Prakash, thanks for pointing out the bug. Yes, these two methods also need to be synchronized through a single lock. We can extend the same logic and take a lock in the getCurrentDataNode function too. Fundamentally, the patch holds write locks in the formerly synchronized functions and read locks in the functions that threads from HBase invoke, such as getFileLength() and getBlockAt(). I can submit a new patch resolving the bug. Let me know.

          Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 37s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 28s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 14s The applied patch generated 554 new checkstyle issues (total was 627, now 621).
          -1 whitespace 0m 1s The patch has 34 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 3m 4s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 native 3m 13s Pre-build of native portion
          -1 hdfs tests 166m 11s Tests failed in hadoop-hdfs.
              208m 55s  



          Reason Tests
          Failed unit tests hadoop.tracing.TestTraceAdmin



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12731455/HADOOP-1605-trunk.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 6f62267
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10872/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/10872/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10872/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10872/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10872/console

          This message was automatically generated.

          Ravi Prakash made changes -
          Labels BB2015-05-RFC BB2015-05-TBR
          Ravi Prakash made changes -
          Labels BB2015-05-RFC
          Ravi Prakash made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Assignee dhruba borthakur [ dhruba ]
          Ravi Prakash added a comment -

          Thanks for the patch, kartheek muthyala.
          I'm afraid we may have changed the synchronization model of DFSInputStream in this patch. For example, earlier two threads, one calling getCurrentDatanode() and another seekToNewSource(), would have synchronized on a single lock. Now, however, the two threads can proceed at the same time, so there is a race condition on currentNode. Is my understanding correct?
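          The hazard described above can be sketched like this (an illustrative toy, not the real DFSInputStream; names are hypothetical). Because currentNode is shared mutable state, both methods must keep coordinating on the same lock after the conversion: the read-modify-write in seekToNewSource() needs the write lock for its whole duration, while a pure read like getCurrentDatanode() can safely use the read lock. Giving the two methods independent locks, or wrapping the read-modify-write in only a read lock, would silently lose the old mutual exclusion.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: one ReadWriteLock guards currentNode for both methods.
class CurrentNodeHolder {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private String currentNode = "dn1"; // stand-in for a DatanodeInfo reference

    // Pure read: the shared read lock gives visibility without serializing readers.
    String getCurrentDatanode() {
        rwLock.readLock().lock();
        try {
            return currentNode;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Read-modify-write: must hold the write lock for the whole operation,
    // excluding concurrent readers of currentNode.
    String seekToNewSource(String replacement) {
        rwLock.writeLock().lock();
        try {
            String old = currentNode;
            currentNode = replacement;
            return old;
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}
```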

          Ravi Prakash made changes -
          Labels BB2015-05-RFC
          kartheek muthyala made changes -
          Attachment HADOOP-1605-trunk.patch [ 12731455 ]
          kartheek muthyala added a comment -

          The patch available for this jira was old. Creating a new patch for the latest trunk. Most of these changes are pulled from the patches submitted earlier.

          kartheek muthyala made changes -
          Labels BB2015-05-RFC BB2015-05-TBR BB2015-05-RFC
          kartheek muthyala made changes -
          Labels BB2015-05-TBR BB2015-05-RFC BB2015-05-TBR
          Allen Wittenauer made changes -
          Labels BB2015-05-TBR
          Liang Xie added a comment -

          We should write a test to demonstrate the improvement. I can do that unless you have one around the place?

          I think it's still too early to do that. We need to fix another hotspot, in BlockReaderLocal, as well; just filed HDFS-5664, FYI.

          Liang Xie added a comment -

          Have you deployed this on your serving cluster

          Our HDFS production version upgrades have a long interval/window; this patch has not been run on a production cluster yet.

          Liang Xie added a comment -

          FYI, HDFS-5663 will track the waitFor(4000) issue.

          Liang Xie added a comment -

          waitFor(4000);

          Totally agreed with your point; it should have a different configurable value for an online app like HBase versus an offline processing app.
          I can file a minor JIRA for this. Thanks, stack.
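          A minimal sketch of what a configurable wait might look like. This is illustrative only: the property name and helper class are invented, and real HDFS client code would read the value from org.apache.hadoop.conf.Configuration rather than java.util.Properties.

```java
import java.util.Properties;

// Hypothetical sketch: replace the hard-coded waitFor(4000) with a value read
// from configuration, so an online application like HBase can pick a short
// wait while an offline processing job keeps the conservative default.
// The property name below is invented for illustration.
class RetryWait {
    static final int DEFAULT_WAIT_MS = 4000; // the current hard-coded value

    static int waitMillis(Properties conf) {
        String v = conf.getProperty("dfs.client.retry.wait.ms");
        return (v == null) ? DEFAULT_WAIT_MS : Integer.parseInt(v);
    }
}
```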

          stack added a comment -

          I like this one:

          + waitFor(4000);

          (I know you are only indenting, and the above is in the indent, and retaining the old code as-is when you are changing something else is the way to go, but... we should fix the above; it could explain a few of those 99th percentiles)

          Patch looks good on a first glance. You are conservative in the changes you make and it seems like the methods are small enough they are easy to cast as methods that require read or write lock. Will check closer soon.

          Have you deployed this on your serving cluster, Liang Xie?

          We should write a test to demonstrate the improvement. I can do that unless you have one around the place?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617846/HDFS-1605.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5684//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5684//console

          This message is automatically generated.

          Liang Xie made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Liang Xie added a comment -

          Considering risk, I changed most synchronized sections to use writeLock; only a few places (like getFileLength and some simple getter methods) use readLock in the current implementation. I think that's OK, since both dhruba borthakur and I observed the very same hotspot, getFileLength, in a similar scenario: HBase reads.

          Liang Xie made changes -
          Attachment HDFS-1605.txt [ 12617846 ]
          Liang Xie added a comment -

          Attached is a patch against trunk.

          The getFileLength hotspot pattern looks like the following:

          "IPC Server handler 88 on 12600" daemon prio=10 tid=0x00007f82fc241580 nid=0x4ddc waiting for monitor entry [0x00007f821eefb000]
          java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.hadoop.hdfs.DFSInputStream.getFileLength(DFSInputStream.java:242)

          • waiting to lock <0x00000004404597a8> (a org.apache.hadoop.hdfs.DFSInputStream)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
            at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:73)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1393)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1829)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1673)
            at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:341)
            at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
            at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:485)
            at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:506)
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:146)
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:354)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:385)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:344)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:304)
            at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:584)
          • locked <0x00000004285cb478> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
            at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:446)
          • locked <0x00000004285cb478> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:164)
            at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3658)
            at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3590)
          • locked <0x00000004285cb310> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
            at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3615)
          • locked <0x00000004285cb310> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
            at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2513)
            at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:460)
            at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1457)

          "IPC Server handler 192 on 12600" daemon prio=10 tid=0x00007f82fc2dd910 nid=0x4e44 runnable [0x00007f821d493000]
          java.lang.Thread.State: RUNNABLE
          at java.io.FileInputStream.readBytes(Native Method)
          at java.io.FileInputStream.read(FileInputStream.java:220)
          at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:568)

          • locked <0x00000004287fb350> (a org.apache.hadoop.hdfs.BlockReaderLocal)
            at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:542)
            at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:594)
          • locked <0x00000004404597a8> (a org.apache.hadoop.hdfs.DFSInputStream)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
          • locked <0x00000004404597a8> (a org.apache.hadoop.hdfs.DFSInputStream)
            at java.io.DataInputStream.read(DataInputStream.java:132)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:614)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1384)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1829)
            at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1673)
            at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:341)
            at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
            at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:485)
            at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:506)
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:146)
            at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:354)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:385)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:344)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:304)
            at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:584)
          • locked <0x0000000428374d70> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
            at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:446)
          • locked <0x0000000428374d70> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
            at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:164)
            at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3658)
            at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3590)
          • locked <0x0000000428374c08> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
            at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3615)
          • locked <0x0000000428374c08> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
            at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2513)
            at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:460)
            at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1457)
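The traces above show one IPC handler holding the DFSInputStream monitor while it performs a read, with other handlers BLOCKED on that same monitor inside getFileLength(). A minimal sketch of that serialization pattern (hypothetical class and method names, not the actual HDFS code):

```java
// Hypothetical sketch (not the actual HDFS code) of the contention
// pattern in the traces above: every pread must acquire the single
// per-stream monitor, even to read state that rarely changes.
public class MonitorContentionSketch {
    static class MonitorStream {
        private long fileLength;

        // All concurrent preads serialize here -- this is the frame
        // the BLOCKED handlers are waiting in.
        synchronized long getFileLength() {
            return fileLength;
        }

        synchronized void setFileLength(long n) {
            fileLength = n;
        }

        // While one handler sits in a slow disk read here, every
        // other handler blocks in getFileLength() above.
        synchronized int read(int len) {
            return (int) Math.min(len, fileLength);
        }
    }

    public static void main(String[] args) {
        MonitorStream s = new MonitorStream();
        s.setFileLength(128);
        System.out.println(s.getFileLength()); // prints 128
        System.out.println(s.read(64));        // prints 64
    }
}
```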
          Liang Xie added a comment -

          Attached is a patch against trunk. The getFileLength hotspot pattern is shown in the stack traces above.
          Liang Xie added a comment -

          We observed this hotspot in our production cluster recently. Most of the threads waiting on the lock look like the trace below:

          19205.637:"IPC Server handler 27 on 12600" daemon prio=10 tid=0x00007f82fc1e5750 nid=0x4d9b waiting for monitor entry [0x00007f821fe78000]
          19205.637- java.lang.Thread.State: BLOCKED (on object monitor)
          19205.637- at org.apache.hadoop.hdfs.DFSInputStream.getFileLength(DFSInputStream.java:242)
          19205.637- - waiting to lock <0x000000044e20d238> (a org.apache.hadoop.hdfs.DFSInputStream)
          19205.637- at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
          19205.637- at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:73)
          19205.637- at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1393)

          and the lock holder was performing an actual read at the time.

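The conversion this issue proposes is to guard the read-mostly state with a java.util.concurrent.locks.ReentrantReadWriteLock, so that concurrent preads take the shared read lock and only mutations take the exclusive write lock. A minimal sketch of the idea (illustrative names, not the actual patch):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockStreamSketch {
    static class RWStream {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private long fileLength;

        // Many readers may hold the read lock at once, so concurrent
        // preads no longer serialize on a single monitor.
        long getFileLength() {
            lock.readLock().lock();
            try {
                return fileLength;
            } finally {
                lock.readLock().unlock();
            }
        }

        // Mutations (e.g. refreshing located blocks) take the
        // exclusive write lock.
        void setFileLength(long n) {
            lock.writeLock().lock();
            try {
                fileLength = n;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        RWStream s = new RWStream();
        s.setFileLength(256);
        System.out.println(s.getFileLength()); // prints 256
    }
}
```

The try/finally shape matters: the unlock must run even if the guarded section throws, or the stream deadlocks for all subsequent callers.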
          dhruba borthakur added a comment -

          Hi Todd, it will take me quite a while to get performance numbers for you. But it is fine if you do not want this in the 0.20-append branch; no problem on my end.

          Hairong Kuang added a comment -

          +1. The latest patch looks good to me.

          Todd Lipcon added a comment -

          Hey Dhruba. I've never seen this lock actually be a problem in practice given current performance issues in DFSInputStream. I understand from Jonathan Gray that you're also testing some other patches to improve DFSInputStream performance. Maybe it would make sense to put those upstream first?

          (Or do you have benchmarks showing that the rwlock helps even with an otherwise "stock" DFSInputStream?)

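For illustration only (this is not the cluster benchmark Todd is asking for, and JMH would be the proper tool): a toy harness showing the shape of such a comparison, counting read-only operations per interval under the two locking schemes. All names here are hypothetical; timing results will vary by machine and should not be read as evidence either way.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockThroughputSketch {
    static final Object monitor = new Object();
    static final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    static long state = 42;

    // Read-only access through a plain monitor (the current code shape).
    static long readSync() {
        synchronized (monitor) { return state; }
    }

    // Read-only access through the shared read lock (the proposed shape).
    static long readRw() {
        rw.readLock().lock();
        try { return state; } finally { rw.readLock().unlock(); }
    }

    // Hammer the given read operation from N threads for `millis` ms
    // and return the total number of completed reads.
    static long measure(Callable<Long> op, int threads, long millis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong ops = new AtomicLong();
        long deadline = System.currentTimeMillis() + millis;
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                while (System.currentTimeMillis() < deadline) {
                    try { op.call(); } catch (Exception ignored) { }
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 1000, TimeUnit.MILLISECONDS);
        return ops.get();
    }

    public static void main(String[] args) throws Exception {
        int threads = 8;
        System.out.println("synchronized ops: " + measure(LockThroughputSketch::readSync, threads, 500));
        System.out.println("rwlock ops:       " + measure(LockThroughputSketch::readRw, threads, 500));
    }
}
```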
          dhruba borthakur made changes -
          Attachment DFSClientRWlock.3.txt [ 12469965 ]
          dhruba borthakur added a comment -

          Merged patch with latest branch.

          dhruba borthakur made changes -
          Attachment DFSClientRWlock.1.txt [ 12469867 ]
          dhruba borthakur added a comment -

          Patch for hadoop 0.20 branch

          dhruba borthakur made changes -
          Field Original Value New Value
          Link This issue is related to HDFS-1599 [ HDFS-1599 ]
          dhruba borthakur created issue -

            People

            • Assignee: kartheek muthyala
            • Reporter: dhruba borthakur
            • Votes: 0
            • Watchers: 17