Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6450

Support non-positional hedged reads in HDFS

    Details

    • Type: Improvement Improvement
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      HDFS-5776 added support for hedged positional reads. We should also support hedged non-position reads (aka regular reads).

        Issue Links

          Activity

          Hide
          Liang Xie added a comment -

          will dive into it a couple of days later.

          Show
          Liang Xie added a comment - will dive into it a couple of days later.
          Hide
          Liang Xie added a comment -

          Colin Patrick McCabe Stack I made a prototype already(still adding more test cases), but i think we need to clear one thing right now: how to handle the block reader for read().
          In current impl, the read() maintains a blockreader to reuse for furture read calls, and pread() will create a new blockreader in each call always.
          So if we want to enhance read() to have hedged read ability, should we assign the first completed read request's block reader to above maintained blockreader variable?(i guess most of situations, this behavior probably will be like change the local block reader to a remote block reader?)

          Show
          Liang Xie added a comment - Colin Patrick McCabe Stack I made a prototype already(still adding more test cases), but i think we need to clear one thing right now: how to handle the block reader for read(). In current impl, the read() maintains a blockreader to reuse for furture read calls, and pread() will create a new blockreader in each call always. So if we want to enhance read() to have hedged read ability, should we assign the first completed read request's block reader to above maintained blockreader variable?(i guess most of situations, this behavior probably will be like change the local block reader to a remote block reader?)
          Hide
          Liang Xie added a comment -

          Attached is a preliminary patch similar with pread() doing, means creating block reader always. It should be the simiplest way, but not very friendly for read(), since no block reader be reusable...

          Show
          Liang Xie added a comment - Attached is a preliminary patch similar with pread() doing, means creating block reader always. It should be the simiplest way, but not very friendly for read(), since no block reader be reusable...
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12654295/HDFS-6450-like-pread.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7285//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7285//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7285//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654295/HDFS-6450-like-pread.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7285//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7285//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7285//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12654948/HDFS-6450-like-pread.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7320//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7320//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654948/HDFS-6450-like-pread.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7320//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7320//console This message is automatically generated.
          Hide
          Liang Xie added a comment -

          After a deep looking, it's kind of hard to reuse/maintain block reader as before.
          In pread(), we don't have this trouble, because we always create new block reader.
          In read(), if we want to support hedged read ability, in general:
          1) first read(r1) using the old block reader if possible, then wait hedged read timeout setting
          2) second read(r2) must create a new block reader, and submit into thread pool
          3) wait the first completed task, and return final read result to client side. Here we need to set(remember) this task's block reader to DFIS's block reader variable, and should keep it open, but we also need to close the other block reader to avoid leak.

          Another thing need to know is that if we remember the faster block reader, if it's a remote block reader, then the following read() will bypass local read in the following r1 operations...
          Any thought ? Colin Patrick McCabe, Stack ...

          Show
          Liang Xie added a comment - After a deep looking, it's kind of hard to reuse/maintain block reader as before. In pread(), we don't have this trouble, because we always create new block reader. In read(), if we want to support hedged read ability, in general: 1) first read(r1) using the old block reader if possible, then wait hedged read timeout setting 2) second read(r2) must create a new block reader, and submit into thread pool 3) wait the first completed task, and return final read result to client side. Here we need to set(remember) this task's block reader to DFIS's block reader variable, and should keep it open, but we also need to close the other block reader to avoid leak. Another thing need to know is that if we remember the faster block reader, if it's a remote block reader, then the following read() will bypass local read in the following r1 operations... Any thought ? Colin Patrick McCabe , Stack ...
          Hide
          Colin Patrick McCabe added a comment -

          Yeah... if the first read, done with the old block reader, doesn't finish first, then I think we should forget about that block reader... even if it's a BlockReaderLocal and the new one is remote. After all, slow or misbehaving local disks are one of the main problems we're trying to cover up with hedged reads. As you pointed out, we need to close the old block reader in this situation.

          I think from an implementation point of view, you might consider unsetting DFSInputStream#blockReader at the beginning of the hedged read (set to null and move the old block reader into the first future). Then have the "winning" future set it to the block reader that it used.

          There will be some performance regression just due to using threads and futures here, instead of just doing the read from the same thread that needs the data. Passing data between threads is slower because you might be going between CPU cores. I don't know if there's really a good way to address this without doing something like HDFS-6695, which is out of scope for this change. I think in the short term, applications will have to turn on non-positional hedged reads explicitly, and accept some small loss in throughput for a major reduction in long-tail latency.

          Show
          Colin Patrick McCabe added a comment - Yeah... if the first read, done with the old block reader, doesn't finish first, then I think we should forget about that block reader... even if it's a BlockReaderLocal and the new one is remote. After all, slow or misbehaving local disks are one of the main problems we're trying to cover up with hedged reads. As you pointed out, we need to close the old block reader in this situation. I think from an implementation point of view, you might consider unsetting DFSInputStream#blockReader at the beginning of the hedged read (set to null and move the old block reader into the first future). Then have the "winning" future set it to the block reader that it used. There will be some performance regression just due to using threads and futures here, instead of just doing the read from the same thread that needs the data. Passing data between threads is slower because you might be going between CPU cores. I don't know if there's really a good way to address this without doing something like HDFS-6695 , which is out of scope for this change. I think in the short term, applications will have to turn on non-positional hedged reads explicitly, and accept some small loss in throughput for a major reduction in long-tail latency.
          Hide
          Liang Xie added a comment -

          I think in the short term, applications will have to turn on non-positional hedged reads explicitly, and accept some small loss in throughput for a major reduction in long-tail latency.

          yes, it's a trade-off

          Show
          Liang Xie added a comment - I think in the short term, applications will have to turn on non-positional hedged reads explicitly, and accept some small loss in throughput for a major reduction in long-tail latency. yes, it's a trade-off
          Hide
          stack added a comment -

          I like Colin Patrick McCabe's take and his suggestion on unsetting current blockReader while the race is on.

          In the patch, nit, you have to make two instances of Random? Can't share?

          How is testHedgedReadBasic exercising the new functionality? Or is it just verifying nothing broke when it is turned on?

          readWithStrategy has synchronized added (you've moved the synchronize down a level it seems) but hedgedReadWithStrategy is not synchronized. Makes for interesting synchronizes inside hedgedReadWithStrategy. Could do w/ comments explaining what is going on. For example, the below is a little baffling until I look closely and see pos accesses are always inside synchronized methods (I think).

          + synchronized (this) {
          + if (pos >= getFileLength())

          { + return -1; + }

          + }

          I took a look at this seems to be the only data member that needs synchronize protection in this method. Any danger if someone else changes it under you while this method is running? (between sync blocks)? And it is ok having multiple threads inside hedgedReadWithStrategy at the one time when say readWithStrategy doesn't allow this to happen?

          I like your use of a an old english catchall when problem WARN logging a hedgedReadWithStrategy return. Might want to change that in v2 (smile).

          Lots of overlap w/ the pread version. Would be coolio if could have the two methods share code.

          Thanks for working on this one Liang Xie

          Show
          stack added a comment - I like Colin Patrick McCabe 's take and his suggestion on unsetting current blockReader while the race is on. In the patch, nit, you have to make two instances of Random? Can't share? How is testHedgedReadBasic exercising the new functionality? Or is it just verifying nothing broke when it is turned on? readWithStrategy has synchronized added (you've moved the synchronize down a level it seems) but hedgedReadWithStrategy is not synchronized. Makes for interesting synchronizes inside hedgedReadWithStrategy. Could do w/ comments explaining what is going on. For example, the below is a little baffling until I look closely and see pos accesses are always inside synchronized methods (I think). + synchronized (this) { + if (pos >= getFileLength()) { + return -1; + } + } I took a look at this seems to be the only data member that needs synchronize protection in this method. Any danger if someone else changes it under you while this method is running? (between sync blocks)? And it is ok having multiple threads inside hedgedReadWithStrategy at the one time when say readWithStrategy doesn't allow this to happen? I like your use of a an old english catchall when problem WARN logging a hedgedReadWithStrategy return. Might want to change that in v2 (smile). Lots of overlap w/ the pread version. Would be coolio if could have the two methods share code. Thanks for working on this one Liang Xie
          Hide
          Liang Xie added a comment -

          How is testHedgedReadBasic exercising the new functionality? Or is it just verifying nothing broke when it is turned on?

          It was not finished yet, this patch is a preliminary one, and i hold on due to the block reader stuff, i think i have got a feasible direction per Colin's last comments, i'll continue to do the rest asap

          Show
          Liang Xie added a comment - How is testHedgedReadBasic exercising the new functionality? Or is it just verifying nothing broke when it is turned on? It was not finished yet, this patch is a preliminary one, and i hold on due to the block reader stuff, i think i have got a feasible direction per Colin's last comments, i'll continue to do the rest asap

            People

            • Assignee:
              Liang Xie
              Reporter:
              Colin Patrick McCabe
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:

                Development