Hadoop HDFS
HDFS-1542

Deadlock in Configuration.writeXml when serialized form is larger than one DFS block

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: 0.22.0
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Configuration.writeXml holds a lock on itself and then writes the XML to an output stream, during which DFSOutputStream will try to take a lock on its ackQueue/dataQueue. Meanwhile, the DataStreamer thread will call methods like conf.getInt() on what may be the same Configuration object, and the two threads deadlock against each other.

      This causes a deterministic deadlock whenever the serialized form is larger than the DFS block size.
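
      A minimal, self-contained sketch of the lock cycle (illustrative Java only; the names mirror the report, not the actual Hadoop source):

          // Two monitors standing in for the Configuration object and the
          // DFSOutputStream dataQueue. With the sleeps, each thread takes one
          // lock and then blocks forever on the other.
          public class DeadlockSketch {
              static final Object conf = new Object();      // Configuration monitor
              static final Object dataQueue = new Object(); // DFSOutputStream queue

              public static void main(String[] args) {
                  Thread writer = new Thread(() -> {        // Configuration.writeXml
                      synchronized (conf) {                 // writeXml is synchronized
                          pause();
                          synchronized (dataQueue) { }      // stream write blocks here
                      }
                  });
                  Thread streamer = new Thread(() -> {      // DataStreamer
                      synchronized (dataQueue) {
                          pause();
                          synchronized (conf) { }           // conf.getInt() blocks here
                      }
                  });
                  writer.start();
                  streamer.start();
              }

              static void pause() {
                  try { Thread.sleep(100); } catch (InterruptedException ignored) { }
              }
          }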

      Attachments

      1. Test.java
        0.5 kB
        Todd Lipcon
      2. deadlock.txt
        82 kB
        Amit Nithian
      3. hdfs-1542.txt
        6 kB
        Todd Lipcon
      4. hdfs1542_cdh3b3.txt
        1 kB
        Amit Nithian
      5. hdfs-1542.txt
        6 kB
        Todd Lipcon
      6. hdfs-1542.txt
        6 kB
        Todd Lipcon
      7. hdfs-1542.txt
        2 kB
        Todd Lipcon


          Activity

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

          Todd Lipcon added a comment -

          Committed to trunk and 0.22. Thanks Dhruba.

          dhruba borthakur added a comment -

          +1 to the unit test code.

          Todd Lipcon added a comment -

          Here's the patch with just the unit test, so we can make sure it doesn't regress on the common side.

          Todd Lipcon added a comment -

          This is now blocked on HADOOP-7082. Dhruba, do you have any thoughts on my new proposal above to fix this on the Configuration locking side?

          Todd Lipcon added a comment -

          Some of those failures are legitimate, because of the behavior change here.

          On second thought, rather than make this breaking change to such an important API, I think it makes more sense to fix this at the Configuration level. Looking at that code, we already build up an XML Document object before we output it, so there's no need to hold the Configuration lock while outputting. Let's use this JIRA for the new test case, and I'll open a new one for the Configuration change.
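
          A sketch of the shape of that Configuration-side fix (simplified; asXmlDocument() is a hypothetical helper that snapshots the properties into a DOM while holding the lock):

              import java.io.IOException;
              import java.io.OutputStream;
              import javax.xml.transform.Transformer;
              import javax.xml.transform.TransformerException;
              import javax.xml.transform.TransformerFactory;
              import javax.xml.transform.dom.DOMSource;
              import javax.xml.transform.stream.StreamResult;
              import org.w3c.dom.Document;

              // Inside org.apache.hadoop.conf.Configuration (sketch):
              public void writeXml(OutputStream out) throws IOException {
                  Document doc;
                  synchronized (this) {
                      doc = asXmlDocument(); // hypothetical helper: copy properties into a DOM
                  }
                  // Serialization happens outside the Configuration lock, so a blocking
                  // stream write can no longer deadlock against synchronized conf getters.
                  try {
                      Transformer t = TransformerFactory.newInstance().newTransformer();
                      t.transform(new DOMSource(doc), new StreamResult(out));
                  } catch (TransformerException e) {
                      throw new IOException(e);
                  }
              }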

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467232/hdfs-1542.txt
          against trunk revision 1053214.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade
          org.apache.hadoop.hdfs.server.namenode.TestLargeDirectoryDelete
          org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
          org.apache.hadoop.hdfs.TestDFSPermission
          org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
          org.apache.hadoop.hdfs.TestFileAppend3

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/63//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/63//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/63//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Reuploading again to retrigger Hudson; the previous build timed out, see HADOOP-7081.

          Todd Lipcon added a comment -

          Reuploading the same patch so the QA bot tests it (it picked up the backport patch last time).

          dhruba borthakur added a comment -

          +1 to the patch provided by Todd. I think this is the right approach (to make DFSClient make a new copy of the Conf object). Once Hudson passes the unit tests, I will commit it.

          Jeremy Carroll added a comment -

          +1

          Just compiled the patch and it looks good to me. Running hadoop-0.20-0.20.2+737. I could not recreate the issue after the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467003/hdfs1542_cdh3b3.txt
          against trunk revision 1052823.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/47//console

          This message is automatically generated.

          Amit Nithian added a comment -

          Todd,

          Thanks for this patch. I took it and applied it to CDH3b3, and I have seen much better results from my jobs that were previously deadlocking. Attached is the same patch, except applied to the CDH3b3 source, for those who are using this release and seeing the deadlock.

          Thanks!
          Amit

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12466854/hdfs-1542.txt
          against trunk revision 1052823.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
          org.apache.hadoop.hdfs.TestDFSPermission

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/45//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/45//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/45//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Yes, removing the synchronized keyword will fix the deadlock, though you may see ConcurrentModificationExceptions (CMEs) in some unusual circumstances.

          Amit Nithian added a comment -

          Hey guys,

          Not sure where we are on this, but it just hit me: if I unapply the aforementioned patch that added the synchronized keyword to writeXml, would the deadlock not happen? I don't care about dumping the configuration at runtime for the moment, but if I can get my jobs to run, that would free up a lot of other internal tasks my team and I are doing. I suspect I just need to do this and rebuild Hadoop on the JobTracker machine, which should minimize code changes across my cluster.

          Happy holidays!
          Amit

          Peter Voss added a comment -

          Hi Todd,

          I might be the wrong person to judge, but I totally agree with you that avoiding synchronization and locking in the first place is a very good idea. Therefore it seems totally valid to make a copy of the Configuration. I don't like the "feature" that changing settings in the configuration after the FS was created is reflected in open streams. I think this adds unnecessary complexity and synchronization issues. I like your solution.

          Todd Lipcon added a comment -

          Here's one possible fix.

          It's very slightly incompatible in that it used to be that we could make a FileSystem using a Configuration, then change settings in that Configuration (after the FS was created) and they'd be reflected in open streams. This was only true for some of the configurations, so I think it was probably more of a bug than a feature.

          What do you think? I think fixing it at this layer makes more sense than changing Configuration.writeXML to copy itself first.
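
          A sketch of that approach (hypothetical; the essential move is a defensive copy, so the client's worker threads never synchronize on the caller's object):

              // Inside DFSClient (sketch). Configuration has a copy constructor, and
              // the copy shares no monitor with the caller's conf, so a caller blocked
              // in writeXml can no longer stall the DataStreamer thread.
              public DFSClient(Configuration conf) throws IOException {
                  this.conf = new Configuration(conf);
                  // ... the rest of the constructor reads settings from this.conf only.
              }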

          Amit Nithian added a comment -

          Thanks to everyone who has been looking at this. I'm glad that it was introduced in a CDH3b3 patch, or else I'd be worried about some wacky random condition!

          Todd Lipcon added a comment -

          Aha, you're right, this was introduced by HADOOP-6408, which added the synchronized keyword to writeXml to fix a potential CME issue. So this exists in Apache 0.21 and 0.22, but not 0.20.

          Peter Voss added a comment -

          Hi Todd,

          I have just tried to reproduce this today using your test program. I didn't get a deadlock on the original Hadoop 0.20.2 release. But I did get a deadlock on CDH3B3.

          Todd Lipcon added a comment -

          Amit: as far as I can tell this has always been present in 0.20.x, but I may be incorrect, and didn't try reproducing on an old version.

          Todd Lipcon added a comment -

          Nicholas/Dhruba - I see you're watching this ticket. Which do you think is better? (a) have Configuration.writeXml copy itself to a new object first, or (b) fix up DFSOutputStream to either copy the Configuration or never call conf.get() from the DataStreamer/ResponseProcessor threads?

          Amit Nithian added a comment -

          How can I verify the latter condition (a DN in the pipeline being inaccessible)? I restarted the cluster just in case, but are there some simple checks? I am also curious whether this bug has always been in the source code or was recently introduced.

          Todd Lipcon added a comment -

          It will also happen if any DN in the pipeline fails or is inaccessible from the client, I believe. Basically anything that makes DFSOutputStream go and ask the conf for something will cause this issue.

          Peter Voss added a comment -

          Hi Todd,

          Please note that in Amit's case the configuration is just 50 KB when serialized to XML, which means that this deadlock doesn't only happen when the conf size is greater than the block size.

          Todd Lipcon added a comment -

          Yep, looks from your trace that it's the same bug.

          Amit Nithian added a comment -

          Todd, thanks! I was just curious whether we are indeed seeing the same issue. As an aside, I also tried to drop the block size of another job down to 1024; it didn't deadlock, but it didn't make a ton of progress either, hence why I was trying to reproduce. I was just making sure that the deadlock I am seeing is caused by the reason you mentioned, because the job conf XML for the specific job(s) is nowhere near 64 MB in size.

          Todd Lipcon added a comment -

          Hi Amit. There's no need to reproduce on your cluster; this is guaranteed to happen. I will work on a patch soon.

          Amit Nithian added a comment -

          Attached is an example thread dump of the JobTracker when submitting a job that causes the tracker to deadlock. This is consistent with the one that was posted on the mailing lists to kick off this issue. I tried increasing my block size to 128 MB to no avail.

          BTW, how can I use your test program to reproduce the problem on my cluster? I successfully ran it locally, but I'm not sure how to run it in distributed mode to test.

          Todd Lipcon added a comment -

          Yes, you can probably set the DFS block size very large for your job, but it will have the side effect of producing very large blocks in the output as well.

          The other workaround is to not use Configuration to store very large objects. Instead, please consider using the DistributedCache API; it should be more efficient anyway.
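
          A sketch of that workaround (paths and property names are illustrative):

              import java.net.URI;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.filecache.DistributedCache;

              public class CacheWorkaround {
                  public static void main(String[] args) throws Exception {
                      Configuration conf = new Configuration();
                      // Instead of conf.set("my.big.payload", hugeString), which bloats
                      // the serialized job.xml, ship the payload as a cached file:
                      DistributedCache.addCacheFile(new URI("/user/amit/big-payload.dat"), conf);
                      // Tasks later read the local copy via
                      // DistributedCache.getLocalCacheFiles(conf).
                  }
              }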

          Amit Nithian added a comment -

          Is there a temporary workaround, like changing the block size or something? This is a blocker for some stuff we are doing in production.

          Todd Lipcon added a comment -

          Here's a test program which illustrates the deadlock.
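
          The attachment itself is authoritative; a hypothetical reconstruction of its shape, based on the description in this issue, might look like:

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FSDataOutputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              public class Test {
                  public static void main(String[] args) throws Exception {
                      Configuration conf = new Configuration();
                      conf.setInt("dfs.block.size", 1024 * 1024); // shrink blocks to 1 MB
                      // Pad the conf so its serialized XML exceeds one block:
                      StringBuilder sb = new StringBuilder();
                      for (int i = 0; i < 2 * 1024 * 1024; i++) {
                          sb.append('x');
                      }
                      conf.set("test.padding", sb.toString());

                      FileSystem fs = FileSystem.get(conf);
                      FSDataOutputStream out = fs.create(new Path("/tmp/conf.xml"));
                      conf.writeXml(out); // hangs here on affected versions
                      out.close();
                  }
              }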


            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 1
            • Watchers: 11

