Hadoop HDFS / HDFS-1878

TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0
    • Fix Version/s: 0.20.204.0
    • Component/s: namenode
    • Labels: None

    Description

      In 0.20.204, TestHDFSServerPorts was observed to throw a NullPointerException intermittently. The exception occurs only when FSNamesystem.close() is called, which happens at Namenode shutdown, so it is not a serious bug for 0.20.204. TestHDFSServerPorts is more likely than normal execution to trigger the race because it runs two Namenodes in the same JVM, which produces more thread interleaving and more opportunity for the race to surface.

      The race is in FSNamesystem.close(). At line 566 we have:
      if (replthread != null) replthread.interrupt();
      if (replmon != null) replmon = null;

      Because close() does not wait for the interrupted replthread to exit, replmon can be set to null while replthread is still running. replthread dereferences replmon in computeDatanodeWork(), which is where the NullPointerException occurs.

      The fix is either to wait for replthread to exit or simply not to null replmon. The latter is preferred, since close() does not wait on any of the sibling Namenode processing threads.
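
      The race and the preferred fix can be illustrated with a minimal, self-contained sketch. This is not the FSNamesystem code or the attached patch: the classes CloseRaceSketch, Monitor and Namesystem, the nullOnClose flag, and the closeDone latch are all hypothetical, and the latch exists only to force the unlucky interleaving (close() finishing before the worker's next dereference) so that the outcome is repeatable.

```java
import java.util.concurrent.CountDownLatch;

public class CloseRaceSketch {
    /** Hypothetical stand-in for the monitor object that the worker dereferences. */
    static final class Monitor {
        int rounds;
    }

    /** Hypothetical, heavily simplified stand-in for the namesystem. */
    static final class Namesystem {
        private volatile Monitor replmon = new Monitor();
        private volatile Throwable failure;
        private final boolean nullOnClose;
        // Test hook only: lets us force close() to finish before the worker's next round.
        private final CountDownLatch closeDone = new CountDownLatch(1);
        private final Thread replthread = new Thread(this::workLoop, "replthread");

        Namesystem(boolean nullOnClose) {
            this.nullOnClose = nullOnClose;
        }

        private void workLoop() {
            try {
                // Models a thread that has been interrupted but has not died yet.
                awaitIgnoringInterrupts(closeDone);
                computeDatanodeWork();
            } catch (Throwable t) {
                failure = t; // record what killed the worker, if anything
            }
        }

        private void computeDatanodeWork() {
            replmon.rounds++; // throws NullPointerException if replmon was nulled
        }

        void close() {
            replthread.interrupt();
            if (nullOnClose) {
                replmon = null; // the behaviour described in this issue
            }
            closeDone.countDown();
        }
    }

    /** Waits for the latch, deferring any interrupt until the wait is over. */
    static void awaitIgnoringInterrupts(CountDownLatch latch) {
        boolean interrupted = false;
        while (true) {
            try {
                latch.await();
                break;
            } catch (InterruptedException e) {
                interrupted = true; // remember it and keep waiting
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs one start/close cycle and returns the worker's failure, or null if it exited cleanly. */
    static Throwable runOnce(boolean nullOnClose) {
        Namesystem ns = new Namesystem(nullOnClose);
        ns.replthread.start();
        ns.close();
        try {
            ns.replthread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return ns.failure;
    }

    public static void main(String[] args) {
        System.out.println("nulling on close: " + runOnce(true));
        System.out.println("not nulling:      " + runOnce(false));
    }
}
```

      In this sketch, runOnce(true) reports a NullPointerException from the worker, while runOnce(false) reports no failure. Leaving the reference in place costs nothing at shutdown, since the object becomes unreachable once the worker thread exits anyway.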

      I'll attach a patch for .205.

      Attachments

        1. 1878-1.patch (0.7 kB, Matthew Foley)

          People

            Assignee: Matthew Foley (mattf)
            Reporter: Matthew Foley (mattf)
            Votes: 0
            Watchers: 2
