  1. Hadoop HDFS
  2. HDFS-10688

BPServiceActor may run into a tight loop for sending block report when hitting IOException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently in BPServiceActor#offerService, when the DataNode hits a local IOException, it only logs the exception and immediately re-enters the while loop. Unlike the RemoteException branch, the IOException branch does not sleep before retrying:

            } catch(RemoteException re) {
              .......
              LOG.warn("RemoteException in offerService", re);
              try {
                long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
                Thread.sleep(sleepTime);
              } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
              }
            } catch (IOException e) {
              LOG.warn("IOException in offerService", e);
            }
      

      This tight loop can cause real problems. For example, in a production cluster we saw a DataNode hit an exception while doing a Kerberos realm lookup. Because the loop retried without any delay, the DataNode ended up sending hundreds of DNS lookup queries per second.
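      One possible way to avoid the tight loop is to apply the same bounded sleep in the IOException branch that the RemoteException branch already uses. The following is a minimal, self-contained sketch of that idea, not the committed patch: the class OfferServiceBackoffSketch and the helpers backoffMillis, sleepAfterException and runFailingLoop are hypothetical names used only for illustration, and the heartbeat interval is passed in as a plain parameter rather than read from dnConf.

```java
import java.io.IOException;

/** Illustrative sketch only; names here are hypothetical, not from the HDFS code base. */
public class OfferServiceBackoffSketch {

    /** Same bound as the RemoteException branch: at most 1000 ms, at most one heartbeat interval. */
    static long backoffMillis(long heartBeatIntervalMs) {
        return Math.min(1000, heartBeatIntervalMs);
    }

    /** Sleeps for the given time and restores the interrupt flag if interrupted. */
    static void sleepAfterException(long sleepMs) {
        try {
            Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Simulates a service loop whose body fails with a local IOException on
     * every pass. Each failure is followed by a sleep, so the loop cannot spin.
     * Returns the elapsed wall-clock time in milliseconds.
     */
    static long runFailingLoop(int iterations, long sleepMs) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                throw new IOException("simulated local failure");
            } catch (IOException e) {
                // The original code only logged here; the sketch also backs off.
                sleepAfterException(sleepMs);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long sleepMs = backoffMillis(3000); // assumed heartbeat interval of 3000 ms
        long elapsed = runFailingLoop(3, sleepMs);
        System.out.println("3 failing passes took about " + elapsed + " ms");
    }
}
```

      With a sleep in both catch branches, a persistent local failure retries roughly once per second at most instead of as fast as the CPU allows. Factoring the sleep into one helper also keeps the two branches from drifting apart.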

        Attachments

        1. HDFS-10688.002.patch
          1 kB
          Chen Liang
        2. HDFS-10688.001.patch
          1 kB
          Chen Liang

            People

            • Assignee: Chen Liang (vagarychen)
            • Reporter: Jing Zhao (jingzhao)
            • Votes: 0
            • Watchers: 10
