Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.203.0, 0.23.0, 1.0.2
    • Fix Version/s: 1.0.3
    • Component/s: namenode
    • Labels: None
    • Environment: 0.20.203 with HDFS-1377 and HDFS-2053 patches applied

      Description

      There appears to be a condition under which an HDFS directory with a space quota set reaches a state where its cached size permanently differs from the computed value. When this happens, the following command:

      hadoop fs -count -q /tmp/quota-test
      

      results in the following output in the NameNode logs:

      WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Inconsistent diskspace for directory quota-test. Cached: 6000 Computed: 6072
      

      I've observed both transient and persistent instances of this happening. In the transient instances this warning goes away, but in the persistent instances every invocation of the fs -count -q command yields the above warning.
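
      The cached and computed values can be pulled out of such a warning line mechanically, which helps when tracking whether the gap is transient or persistent. A minimal sketch, assuming the message format stays exactly as quoted above; the class and method names are illustrative and not part of Hadoop:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Hadoop): parses the NameNode warning quoted above. */
final class QuotaWarningParser {
    // Matches "... directory <name>. Cached: <n> Computed: <n>" anywhere in the line.
    private static final Pattern WARNING = Pattern.compile(
        "Inconsistent diskspace for directory (.+?)\\. Cached: (\\d+) Computed: (\\d+)");

    /** Returns cached minus computed bytes, or null if the line is not a quota warning. */
    static Long cachedMinusComputed(String logLine) {
        Matcher m = WARNING.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        long cached = Long.parseLong(m.group(2));
        long computed = Long.parseLong(m.group(3));
        return cached - computed;
    }

    public static void main(String[] args) {
        String line = "WARN org.apache.hadoop.hdfs.server.namenode.NameNode: "
            + "Inconsistent diskspace for directory quota-test. Cached: 6000 Computed: 6072";
        System.out.println(cachedMinusComputed(line)); // prints -72
    }
}
```

      A delta that stays non-zero across repeated fs -count -q invocations would correspond to the persistent case described here.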

      I've seen instances where the actual disk usage of a directory is 25% of the cached value in INodeDirectory, which creates problems since the quota code uses this cached value to determine whether block write requests are permitted.

      This isn't easy to reproduce - I am able to (inconsistently) get HDFS into this state with a simple program which:

      1. Writes files into HDFS
      2. When a DSQuotaExceededException is encountered, removes all files created in step 1
      3. Repeats from step 1

      I'm going to try and come up with a more repeatable test case to reproduce this issue.

      Attachments

      1. QuotaTestSimple.java (4 kB, Alex Holmes)
      2. hdfs-3061-branch-1.patch (3 kB, Kihwal Lee)

        Issue Links

          Activity

          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open -> Patch Available | 57d 22h 26m | 1 | Kihwal Lee | 04/May/12 23:38
          Patch Available -> Resolved | 1d 22h 1m | 1 | Matt Foley | 06/May/12 21:39
          Resolved -> Closed | 10d 5m | 1 | Matt Foley | 16/May/12 21:45
          Eli Collins made changes -
          Summary changed from "Cached directory size in INodeDirectory can get permanently out of sync with computed size, causing quota issues" to "Backport HDFS-1487 to branch-1"
          Eli Collins added a comment -

          Updating summary to reflect that this patch is the HDFS-1487 patch applied to branch-1.

          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Matt Foley added a comment -

          Closed upon release of Hadoop-1.0.3.

          Matt Foley made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 1.1.0 [ 12317959 ]
          Resolution Fixed [ 1 ]
          Matt Foley added a comment -

          +1. Looks like the correct port of HDFS-1487 to branch-1. Committing to branch-1 and branch-1.0. Thanks, Kihwal!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525690/hdfs-3061-branch-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2380//console

          This message is automatically generated.

          Kihwal Lee made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Kihwal Lee made changes -
          Attachment hdfs-3061-branch-1.patch [ 12525690 ]
          Kihwal Lee made changes -
          Fix Version/s 0.23.3 [ 12320052 ]
          Fix Version/s 2.0.0 [ 12320353 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Kihwal Lee added a comment -

          0.23.3, 2.0 and trunk have the fix. Only branch-1 is missing it.

          Kihwal Lee added a comment -

          Since HDFS-1487 is closed, I will track the work in this jira.

          Kihwal Lee made changes -
          Fix Version/s 1.1.0 [ 12317959 ]
          Fix Version/s 0.23.3 [ 12320052 ]
          Fix Version/s 1.0.3 [ 12320249 ]
          Fix Version/s 2.0.0 [ 12320353 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Affects Version/s 1.0.2 [ 12320051 ]
          Affects Version/s 0.23.0 [ 12315571 ]
          Priority Major [ 3 ] Blocker [ 1 ]
          Kihwal Lee made changes -
          Link This issue is related to HDFS-1487 [ HDFS-1487 ]
          Kihwal Lee made changes -
          Assignee Kihwal Lee [ kihwal ]
          Kihwal Lee added a comment -

          We (mostly Koji) have tracked it down to abandonBlock(). Branch-1 is missing HDFS-1487.
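
          The drift this would cause can be pictured with a toy model. The sketch below is not NameNode code; it assumes, as the reference to HDFS-1487 suggests, that on branch-1 the abandonBlock() path removes a block without returning its space to the directory's cached counter. All class and method names are hypothetical, and block sizes are arbitrary units.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a directory with a space quota; names are hypothetical, not NameNode classes. */
final class QuotaDirModel {
    private final long quota;
    private final boolean refundOnAbandon; // false models the assumed branch-1 behaviour
    private final List<Long> blocks = new ArrayList<>();
    private long cached; // running counter consulted by the quota check

    QuotaDirModel(long quota, boolean refundOnAbandon) {
        this.quota = quota;
        this.refundOnAbandon = refundOnAbandon;
    }

    /** Charges the block to the cached counter; rejects it if the quota would be exceeded. */
    boolean addBlock(long size) {
        if (cached + size > quota) {
            return false;
        }
        cached += size;
        blocks.add(size);
        return true;
    }

    /** Drops the most recently added block, as a client abandoning a block would. */
    void abandonLastBlock() {
        if (blocks.isEmpty()) {
            return;
        }
        long size = blocks.remove(blocks.size() - 1);
        if (refundOnAbandon) {
            cached -= size;
        }
    }

    long cached() {
        return cached;
    }

    /** Recomputes usage from the blocks that actually remain. */
    long computed() {
        long sum = 0;
        for (long size : blocks) {
            sum += size;
        }
        return sum;
    }

    /** Writes one block, adds and abandons two more, then attempts a final write. */
    static String scenario(boolean refundOnAbandon) {
        QuotaDirModel dir = new QuotaDirModel(300, refundOnAbandon);
        dir.addBlock(100);
        for (int i = 0; i < 2; i++) {
            dir.addBlock(100);
            dir.abandonLastBlock();
        }
        boolean accepted = dir.addBlock(100);
        return "cached=" + dir.cached() + " computed=" + dir.computed() + " accepted=" + accepted;
    }

    public static void main(String[] args) {
        System.out.println(scenario(false)); // cached=300 computed=100 accepted=false
        System.out.println(scenario(true));  // cached=200 computed=200 accepted=true
    }
}
```

          In the model without the refund, the cached counter only ever grows on abandon, so a write is rejected even though the surviving blocks use a third of the quota.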

          Kihwal Lee added a comment -

          One example:

           
          012-04-18 00:06:04,246 WARN org.apache.hadoop.hdfs.server.namenode.NameNode:
          Inconsistent diskspace for directory xxxxx. Cached: 7877252111770 Computed:
          4159086535
          

          The delta grows every day.

          Kihwal Lee added a comment -

          We are seeing this in 1.0.

          Eli Collins made changes -
          Link This issue relates to HDFS-1377 [ HDFS-1377 ]
          Eli Collins made changes -
          Link This issue relates to HDFS-2053 [ HDFS-2053 ]
          Alex Holmes made changes -
          Environment 0.20.203 with HDFS-1377 and HDFS-2053 patches applied
          Alex Holmes added a comment -

          We're running on 0.20.203 with HDFS-1377 and HDFS-2053 backported - sorry, I should have mentioned that in the description. Would HDFS-1189 apply, since the test case which (sometimes) reproduces this condition doesn't clear the quota?

          Todd Lipcon added a comment -

          Possibly this is HDFS-1189 or HDFS-1377? Can you reproduce it on 0.20.204 or later?

          Alex Holmes made changes -
          Attachment QuotaTestSimple.java [ 12517489 ]
          Alex Holmes added a comment -

          Sample Java class which may eventually reproduce the problem.

          Alex Holmes made changes -
          Description changed: in the first sentence, "with a quota set" became "with a space quota set". The rest of the text is identical to the Description above.
          Alex Holmes created issue -

            People

            • Assignee: Kihwal Lee
            • Reporter: Alex Holmes
            • Votes: 0
            • Watchers: 9
