  1. Hadoop HDFS
  2. HDFS-14575

LeaseRenewer#daemon threads leak in DFSClient

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.4.0, 3.2.3, 3.3.2, 3.2.4
    • Component/s: dfsclient
    • Labels: None

    Description

      Currently, a LeaseRenewer (and its daemon thread) that has no clients should be terminated after a grace period, which defaults to 60 seconds. A race condition can occur when a new request arrives just after the LeaseRenewer has expired.
      Steps to reproduce the race condition:

      1. Client#1 creates File#1, which creates LeaseRenewer#1 and starts the Daemon#1 thread. After a few seconds File#1 is closed, so LeaseRenewer#1 now has no clients.
      2. 60 seconds (the grace period) later, LeaseRenewer#1 has just expired but the Daemon#1 thread is still sleeping. Client#1 creates File#2, which leads to the creation of Daemon#2.
      3. Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from the factory.
      4. File#2 is closed after a few seconds. LeaseRenewer#2 is created because the existing renewer can no longer be obtained from the factory.

      The Daemon#2 thread leaks from this point on: the Client#1 registered with it can never be removed, so the thread never gets a chance to stop.
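
      The interleaving in steps 1-4 can be replayed deterministically in a small toy model. This is only a sketch: the Renewer and Factory classes, the "user" key, and the replay method are hypothetical stand-ins written for illustration, not the real LeaseRenewer or DFSClient code, and no real threads are started (daemons are tracked as plain integer ids).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeaseRenewerRaceSketch {

    /** Hypothetical stand-in for a LeaseRenewer: its clients and running daemons. */
    static final class Renewer {
        final int id;
        final List<String> clients = new ArrayList<>();
        final List<Integer> runningDaemons = new ArrayList<>();

        Renewer(int id) {
            this.id = id;
        }
    }

    /** Hypothetical stand-in for the factory: at most one renewer per key. */
    static final class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();
        private int nextId = 1;

        Renewer get(String key) {
            Renewer r = renewers.get(key);
            if (r == null) {
                r = new Renewer(nextId++);
                renewers.put(key, r);
            }
            return r;
        }

        void remove(String key) {
            renewers.remove(key);
        }
    }

    /** Replays steps 1-4 in order and returns the first (orphaned) renewer. */
    static Renewer replay(Factory factory) {
        String key = "user"; // assumed factory key, for illustration only

        // Step 1: File#1 is created (renewer 1, daemon 1), then closed.
        Renewer r1 = factory.get(key);
        r1.clients.add("Client#1");
        r1.runningDaemons.add(1);
        factory.get(key).clients.remove("Client#1");

        // Step 2: renewer 1 has expired but is still in the factory, so
        // File#2 is registered on it and a second daemon is started.
        Renewer reused = factory.get(key);
        reused.clients.add("Client#1");
        reused.runningDaemons.add(2);

        // Step 3: daemon 1 wakes up, exits, and removes renewer 1 from the factory.
        r1.runningDaemons.remove(Integer.valueOf(1));
        factory.remove(key);

        // Step 4: closing File#2 looks up the factory, gets a brand-new renewer,
        // and removes the client from that one instead of from renewer 1.
        Renewer r2 = factory.get(key);
        r2.clients.remove("Client#1");

        return r1;
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        Renewer orphan = replay(factory);
        System.out.println("orphan renewer id        = " + orphan.id);
        System.out.println("clients still registered = " + orphan.clients);
        System.out.println("daemons still running    = " + orphan.runningDaemons);
        System.out.println("factory now holds id     = " + factory.get("user").id);
    }
}
```

      In this model the first renewer ends up unreachable from the factory while still holding Client#1 and a running daemon id, which mirrors the leak described above.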

      To solve this problem, IIUC, a simple approach is to make sure that all clients are cleared when a LeaseRenewer is removed from the factory. Please feel free to give your suggestions. Thanks!
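
      The suggestion above can be illustrated in a toy model. This is a sketch of the idea only and does not claim to reflect what the attached patches do: the Renewer and Factory classes, the clearClients flag, and the rule that a daemon may exit once its renewer has no clients are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ClearOnRemoveSketch {

    /** Hypothetical stand-in for a LeaseRenewer. */
    static final class Renewer {
        final Set<String> clients = new HashSet<>();

        /** In this toy model a daemon may exit only once no clients remain. */
        boolean daemonMayExit() {
            return clients.isEmpty();
        }
    }

    /** Hypothetical stand-in for the factory: at most one renewer per key. */
    static final class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();

        Renewer get(String key) {
            return renewers.computeIfAbsent(key, k -> new Renewer());
        }

        /** Removes the renewer and, if requested, clears its clients too. */
        void remove(String key, boolean clearClients) {
            Renewer removed = renewers.remove(key);
            if (removed != null && clearClients) {
                removed.clients.clear();
            }
        }
    }

    /** Returns true if the removed renewer's remaining daemon is able to exit. */
    static boolean daemonCanExitAfterRemoval(boolean clearClients) {
        Factory factory = new Factory();

        // File#2 is registered on the expired renewer that is still in the factory.
        Renewer r1 = factory.get("user");
        r1.clients.add("Client#1");

        // The older daemon exits and removes the renewer from the factory.
        factory.remove("user", clearClients);

        // File#2 is closed through a newly created renewer, not through r1.
        factory.get("user").clients.remove("Client#1");

        return r1.daemonMayExit();
    }

    public static void main(String[] args) {
        System.out.println("without clearing: " + daemonCanExitAfterRemoval(false));
        System.out.println("with clearing:    " + daemonCanExitAfterRemoval(true));
    }
}
```

      Without clearing, the removed renewer keeps Client#1 forever and its daemon can never satisfy the exit condition; with clearing, the exit condition becomes reachable. Whether dropping clients at removal time is safe for a file that is still open is a separate question that this sketch does not address.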

      Attachments

        1. HDFS-14575.004.patch
          10 kB
          Renukaprasad C
        2. HDFS-14575.003.patch
          10 kB
          Renukaprasad C
        3. HDFS-14575.002.patch
          5 kB
          Tao Yang
        4. HDFS-14575.001.patch
          4 kB
          Tao Yang

            People

              Assignee: Renukaprasad C (prasad-acit)
              Reporter: Tao Yang
              Votes: 0
              Watchers: 13