Hadoop HDFS / HDFS-15419

RBF: Router should retry communicating with NN when cluster is unavailable, using a configurable time interval


Details

    Description

      When a cluster is unavailable, Router-to-NameNode communication retries only once, with no interval between attempts. That is not reasonable.

      For example, my company runs several HDFS clusters with more than 1000 nodes, and we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 to 30 seconds. During that window almost all RPC requests to the Router fail, because the Router retries only once and without any interval.

      It would be better to enhance the Router retry strategy so that it retries communication with the NN using a configurable time interval and a configurable maximum number of retries.
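      The proposed strategy can be sketched in Java as a bounded, fixed-interval retry loop. This is a minimal, hypothetical illustration, not the Router's actual code: the class FixedIntervalRetry, the interface IoCall, and the parameters maxRetries and intervalMs are illustrative names, not Hadoop configuration keys or APIs, and treating IOException as the retriable failure is an assumption. Hadoop's org.apache.hadoop.io.retry.RetryPolicies.retryUpToMaximumCountWithFixedSleep expresses a similar policy.

```java
import java.io.IOException;

/**
 * Hypothetical helper: retry a call a bounded number of times,
 * sleeping a fixed interval between attempts.
 */
final class FixedIntervalRetry {

    /** A remote call that may fail with an IOException (e.g. an RPC to a NameNode). */
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    private FixedIntervalRetry() {}

    /**
     * Invokes {@code call}. On IOException, sleeps {@code intervalMs} and tries again,
     * making at most {@code maxRetries} additional attempts; then rethrows the last failure.
     */
    static <T> T callWithRetry(IoCall<T> call, int maxRetries, long intervalMs)
            throws IOException, InterruptedException {
        if (maxRetries < 0 || intervalMs < 0) {
            throw new IllegalArgumentException("maxRetries and intervalMs must be >= 0");
        }
        int retriesDone = 0;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) {
                if (retriesDone >= maxRetries) {
                    throw e; // retry budget exhausted: surface the last failure
                }
                retriesDone++;
                Thread.sleep(intervalMs); // give the cluster time to recover
            }
        }
    }
}
```

      As a rough sizing example, maxRetries = 3 with intervalMs = 10_000 sleeps for about 30 seconds in total, which would cover the 10 to 30 second outages described above, at the cost of holding the client's request for that long. A fixed interval keeps the behavior predictable; exponential backoff would be an alternative if many Routers retrying in lockstep were a concern.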

       


            People

              Assignee: Unassigned
              Reporter: bhji123
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m