Hadoop HDFS / HDFS-5788

listLocatedStatus response can be very large

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 0.23.10, 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Currently we limit the size of listStatus responses to a default of 1000 entries. This works fine except in the case of listLocatedStatus, where the location information can be quite large. As an example, for a directory with 7000 entries, 4 blocks each, and 3-way replication, a listLocatedStatus response is over 1MB. This can chew up very large amounts of memory in the NN if lots of clients try to do this simultaneously.

      Seems like it would be better if we also considered the amount of location information being returned when deciding how many files to return.

      Patch will follow shortly.
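
As a rough check on the example in the description, the number of replica locations carried by a listing can be counted directly. This is only a sketch: `ListingMath` and `locations` are hypothetical names, and it assumes every file has exactly 4 blocks with 3 replicas each, as in the example.

```java
// Back-of-the-envelope count of replica locations in a located listing.
final class ListingMath {
    /** Replica locations carried by `entries` files of `blocksPerFile` blocks each. */
    static long locations(long entries, long blocksPerFile, long replication) {
        return entries * blocksPerFile * replication;
    }

    public static void main(String[] args) {
        // One response capped at the default limit of 1000 entries.
        System.out.println(locations(1000, 4, 3)); // prints 12000
        // The whole 7000-entry directory from the example.
        System.out.println(locations(7000, 4, 3)); // prints 84000
    }
}
```

So a response capped only by entry count can still carry 12,000 locations in this example, which is why the entry limit alone does not bound the payload.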

      1. HDFS-5788.patch
        6 kB
        Nathan Roberts

        Issue Links

          Activity

          Nathan Roberts created issue -
          Suresh Srinivas added a comment -

          a listLocatedStatus response is over 1MB

          These are short-lived objects and are garbage collected in the young generation. Does this cause a lot of issues?

          Seems like it would be better if we also considered the amount of location information being returned when deciding how many files to return.

          Can you please add details about the solution?

          Jason Lowe made changes -
          Field Original Value New Value
          Link This issue relates to HADOOP-8942 [ HADOOP-8942 ]
          Jason Lowe added a comment -

          They are usually short-lived but a bit longer-lived when we can't push them out the network in a timely manner. Then due to lack of flow control in the RPC layer we can fill up the heap with these given a large enough average response buffer per call and enough clients. See HADOOP-8942.

          This change mitigates the issue for listLocatedStatus since a much smaller response payload means it takes a lot more simultaneous clients to consume an equal amount of heap space.
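
The trade-off described here can be put in numbers with a small sketch. The figures are hypothetical (8 GiB of heap headroom, 1 MiB versus 100 KiB responses), and `HeapPressure` and `clientsToFill` are illustrative names, not part of Hadoop.

```java
// How many simultaneously queued responses fit in a given amount of heap.
final class HeapPressure {
    /** Number of whole responses of `responseBytes` that fit in `heapBytes`. */
    static long clientsToFill(long heapBytes, long responseBytes) {
        return heapBytes / responseBytes;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;                       // hypothetical 8 GiB of headroom
        System.out.println(clientsToFill(heap, 1024L * 1024));     // 1 MiB responses: prints 8192
        System.out.println(clientsToFill(heap, 100L * 1024));      // 100 KiB responses: prints 83886
    }
}
```

Shrinking each response by about a factor of ten raises the number of stalled clients needed to fill the same heap by about the same factor.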

          Suresh Srinivas added a comment -

          Then due to lack of flow control in the RPC layer we can fill up the heap with these given a large enough average response buffer per call and enough clients.

          Jason Lowe, thanks for the pointer.

          We can certainly reduce the number of files returned in each iteration. However, that would increase the number of requests the NameNode has to process.

          Nathan Roberts added a comment -

          A simple solution is:
          Restrict the size to dfs.ls.limit (default 1000) files OR dfs.ls.limit block locations, whichever comes first (obviously always returning only whole entries, so we could send more than this number of locations)

          Yes, it will require more RPCs. However, it would seem to lower the risk of a DoS.
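
The rule proposed above can be sketched roughly as follows. This is not the attached patch: `LocatedListingLimiter`, `Entry`, and `nextBatch` are hypothetical names, and the per-file location count is assumed to be estimated as block count times replication factor. A batch ends once either `lsLimit` entries have been added or the location budget (also `lsLimit`) is used up, and an entry is always added whole, so the last one may push the total past the budget.

```java
import java.util.ArrayList;
import java.util.List;

final class LocatedListingLimiter {
    /** Hypothetical stand-in for one directory entry. */
    static final class Entry {
        final String name;
        final int blocks;
        final int replication;

        Entry(String name, int blocks, int replication) {
            this.name = name;
            this.blocks = blocks;
            this.replication = replication;
        }

        /** Assumed estimate: every block has exactly `replication` locations. */
        int estimatedLocations() {
            return blocks * replication;
        }
    }

    /** Returns the next batch of entries starting at index `start`. */
    static List<Entry> nextBatch(List<Entry> children, int start, int lsLimit) {
        List<Entry> batch = new ArrayList<>();
        int locationBudget = lsLimit;
        // Stop on whichever limit is hit first: entry count or location budget.
        for (int i = start;
             i < children.size() && batch.size() < lsLimit && locationBudget > 0;
             i++) {
            Entry e = children.get(i);
            batch.add(e);                                 // whole entries only
            locationBudget -= e.estimatedLocations();     // may go below zero
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Entry> dir = new ArrayList<>();
        for (int i = 0; i < 7000; i++) {
            dir.add(new Entry("file-" + i, 4, 3));
        }
        System.out.println(nextBatch(dir, 0, 1000).size()); // prints 84
    }
}
```

With 7000 files of 4 blocks each at 3-way replication and a limit of 1000, this sketch returns 84 files (1008 estimated locations) per call, so listing the whole directory would take 84 calls instead of 7. Entries with no blocks consume no budget and are still capped by the entry count.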

          Nathan Roberts added a comment -

          patch for trunk.

          Nathan Roberts made changes -
          Attachment HDFS-5788.patch [ 12624374 ]
          Nathan Roberts made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12624374/HDFS-5788.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5936//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5936//console

          This message is automatically generated.

          Daryn Sharp added a comment -

          For a bit more context, we had roughly 6-7k tasks (erroneously) issuing listLocatedStatus. Each limited response was over 1MB. The handler attempts a non-blocking write for the response. If the entire response cannot be written, the call is added to the background responder thread. The kernel accepts well below 1MB for a non-blocking write, so all the responses were added to the responder thread.

          The call response byte buffers track the position of the last write, thus the entire response buffer is retained until the full response is sent. Re-allocating a buffer with the unsent response will likely introduce additional memory pressure, so the most logical/simplistic change is limiting the response size of the located status.
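
The retention effect can be illustrated with a plain `java.nio.ByteBuffer`. This sketch does not use Hadoop's RPC server classes: `PartialWriteDemo` and `pretendWrite` are hypothetical, and the 64 KiB accepted by the simulated write is an arbitrary stand-in for "well below 1M".

```java
import java.nio.ByteBuffer;

final class PartialWriteDemo {
    /** Simulates a non-blocking write that accepts at most `accepted` bytes. */
    static int pretendWrite(ByteBuffer response, int accepted) {
        int n = Math.min(accepted, response.remaining());
        // A real channel write advances the position in the same way.
        response.position(response.position() + n);
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer response = ByteBuffer.allocate(1 << 20); // a 1 MiB response
        pretendWrite(response, 64 * 1024);                  // only 64 KiB goes out
        System.out.println(response.remaining());           // prints 983040 (still unsent)
        System.out.println(response.capacity());            // prints 1048576 (still held on the heap)
    }
}
```

Only the position moves after a partial write; the backing storage stays at its full size until the buffer is released, so every stalled call keeps its whole response on the heap.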

          The end result in our case was the heap bloating by over 8GB. Full GC kicked in. The NN was unresponsive for up to 5 minutes at a time. Each time it woke up it marked DNs as dead, causing a flurry of replications which further aggravated the memory issue. Due to other exposed bugs, the NN required a restart.

          Although more RPCs are required to satisfy the large requests, I believe the tradeoff is reasonable. It's also not likely to be a common occurrence.

          Kihwal Lee added a comment -

          The location counting can be off if blocks are under-replicated or over-replicated, but spending more cycles to make it perfect will be a waste. So I am okay with this approach.

          +1
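
To make the inaccuracy concrete, here is a small sketch comparing an estimated count with an actual one. It assumes the estimate is block count times replication factor, which this thread does not spell out; `LocationEstimate`, `estimated`, and `actual` are hypothetical names.

```java
final class LocationEstimate {
    /** Assumed cheap estimate: every block has exactly `replication` replicas. */
    static int estimated(int blockCount, int replication) {
        return blockCount * replication;
    }

    /** Exact count, which would require looking at every block's live replicas. */
    static int actual(int[] liveReplicasPerBlock) {
        int total = 0;
        for (int replicas : liveReplicasPerBlock) {
            total += replicas;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(estimated(4, 3));               // prints 12
        System.out.println(actual(new int[] {3, 2, 3, 3})); // one under-replicated block: prints 11
        System.out.println(actual(new int[] {3, 4, 3, 3})); // one over-replicated block: prints 13
    }
}
```

The estimate is off by one location in each case, which is small next to a budget of 1000.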

          Kihwal Lee added a comment -

          Thanks for working on the issue, Nathan. I've committed it to trunk and branch-2.

          Kihwal Lee made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Fix Version/s 2.4.0 [ 12324588 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5036/)
          HDFS-5788. listLocatedStatus response can be very large. Contributed by Nathan Roberts. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560750)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #461 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/461/)
          HDFS-5788. listLocatedStatus response can be very large. Contributed by Nathan Roberts. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560750)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1678 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1678/)
          HDFS-5788. listLocatedStatus response can be very large. Contributed by Nathan Roberts. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560750)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1653 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1653/)
          HDFS-5788. listLocatedStatus response can be very large. Contributed by Nathan Roberts. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560750)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java
          Arun C Murthy made changes -
          Fix Version/s 2.3.0 [ 12325255 ]
          Fix Version/s 2.4.0 [ 12324588 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Allen Wittenauer made changes -
          Fix Version/s 3.0.0 [ 12320356 ]

            People

            • Assignee:
              Nathan Roberts
              Reporter:
              Nathan Roberts
            • Votes:
              0
              Watchers:
              7
