Hadoop HDFS / HDFS-13828

DataNode breaching Xceiver Count


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      We observed the xceiver count limit of 4096 being breached on a particular set of 5 to 8 nodes in a 900-node cluster.
      We stopped the DataNode services on those nodes so that their blocks would be re-replicated across the cluster. Even after that, we observed the same issue on a new set of nodes.
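      For reference, 4096 is the Hadoop 2.x default of dfs.datanode.max.transfer.threads (deprecated name: dfs.datanode.max.xcievers), the maximum number of threads a DataNode uses for transferring data in and out. A minimal sketch, assuming the value is set (if at all) directly in a plain hdfs-site.xml with no variable substitution, final flags, or Hadoop's own deprecation precedence rules, that reports the effective limit; the function name max_xceivers is hypothetical:

```python
import xml.etree.ElementTree as ET

# Default of dfs.datanode.max.transfer.threads in hdfs-default.xml (Hadoop 2.x).
DEFAULT_MAX_TRANSFER_THREADS = 4096

# Current key first, then its deprecated predecessor (note the historical
# misspelling "xcievers"). Preferring the current key is a simplification.
KEYS = ("dfs.datanode.max.transfer.threads", "dfs.datanode.max.xcievers")


def max_xceivers(hdfs_site_xml: str) -> int:
    """Return the effective DataNode xceiver limit from hdfs-site.xml text."""
    root = ET.fromstring(hdfs_site_xml.strip())
    props = {}
    # Hadoop config files are <configuration><property><name/><value/></property>...
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value:
            props[name.strip()] = value.strip()
    for key in KEYS:
        if key in props:
            return int(props[key])
    # Not set explicitly: fall back to the shipped default.
    return DEFAULT_MAX_TRANSFER_THREADS
```

      If this returns 4096 for the affected nodes, they were running with the default limit.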

      Q1: Why does this happen on a particular set of nodes? After decommissioning those nodes, their data should have been re-replicated across the cluster, so why does the issue reappear on a different set of nodes?

      Assumptions:
      Repeated reads of a particular block or piece of data on those nodes might be the cause, but in that case decommissioning should have mitigated it, and it did not. Because these MR jobs are triggered from Hive, we suspect the query might be referencing the same block multiple times in different stages, causing this issue.
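      The hotspot-moving pattern in Q1 can be illustrated with a toy model. Everything in it is hypothetical: random replica placement and round-robin replica selection stand in for HDFS's real rack-aware placement and closest-replica reads. If many concurrent readers target the same block, their connections pile up on the few nodes holding its replicas; removing those nodes re-replicates the block elsewhere, so the pile-up moves to the new replica holders instead of disappearing.

```python
import random
from collections import Counter

REPLICATION = 3  # HDFS's default dfs.replication


def place_replicas(nodes, rng):
    """Toy placement (hypothetical): pick REPLICATION distinct nodes at random."""
    return rng.sample(nodes, REPLICATION)


def connections_per_node(replicas, readers):
    """Toy reads (hypothetical): each reader opens one connection, round-robin over replicas."""
    load = Counter()
    for i in range(readers):
        load[replicas[i % len(replicas)]] += 1
    return load


def simulate(num_nodes=900, readers=4000, seed=1):
    rng = random.Random(seed)
    nodes = [f"dn{i:03d}" for i in range(num_nodes)]
    hot = place_replicas(nodes, rng)
    before = connections_per_node(hot, readers)
    # "Decommission" the hot nodes: the block is re-replicated onto other nodes,
    # and the same readers now hit the new replica holders.
    remaining = [n for n in nodes if n not in hot]
    after = connections_per_node(place_replicas(remaining, rng), readers)
    return before, after


before, after = simulate()
print(before.most_common())
print(after.most_common())
```

      In this model the set of hot nodes changes after decommissioning, while each replica holder still carries roughly a third of the 4000 connections.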

      From the thread dump:

      The DataNode thread dump shows that, of the 4090+ xceiver threads created on that node, about 4000 belonged to multiple mappers of the same application ID, all in a "no operation" state.
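      To break down a thread dump like this one, a sketch along these lines can group DataXceiver threads by YARN application. It assumes (verify against your own dump) that jstack lists each DataXceiver thread under a name of the form "DataXceiver for client <clientName> at <remoteAddress> [<status>]", that MapReduce task clients embed an attempt_<clusterTimestamp>_<jobNumber>_... ID, and that a status beginning with "Waiting for operation" is what the dump's idle "no operation" state refers to. The function name summarize_xceivers is hypothetical.

```python
import re
from collections import Counter

# Assumed jstack header for a DataXceiver thread; the "<client> at " part may be absent.
XCEIVER_RE = re.compile(
    r'^"DataXceiver for client (?:(\S+) at )?(\S+)(?: \[([^\]]*)\])?"'
)
# MapReduce task-attempt IDs embed the YARN application's timestamp and sequence number.
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_")


def summarize_xceivers(jstack_text):
    """Return (total DataXceiver threads, idle threads, Counter of threads per app)."""
    per_app = Counter()
    total = idle = 0
    for line in jstack_text.splitlines():
        m = XCEIVER_RE.match(line.lstrip())
        if not m:
            continue  # not a DataXceiver thread header
        total += 1
        client, _remote, status = m.groups()
        attempt = ATTEMPT_RE.search(client or "")
        if attempt:
            app = f"application_{attempt.group(1)}_{attempt.group(2)}"
        else:
            app = client or "unknown"
        per_app[app] += 1
        # Assumption: "Waiting for operation #N" marks an idle connection.
        if status and status.startswith("Waiting for operation"):
            idle += 1
    return total, idle, per_app
```

      For example, save the output of jstack <datanode-pid> to a file, pass its contents in, and print per_app.most_common(5) to see which applications hold the most xceiver threads.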

      Any suggestions on this?

People

    • Assignee: Unassigned
    • Reporter: Amithsha
    • Votes: 0
    • Watchers: 2