  1. Hadoop YARN
  2. YARN-5718

TimelineClient (and other places in YARN) shouldn't overwrite HDFS client retry settings, which could cause unexpected behavior


Details

    • Incompatible change

    Description

      In an HA cluster, after the NameNode (NN) failed over, we noticed that jobs were failing because TimelineClient failed to retry its connection against the proper NN. This happens because we overwrite the HDFS client settings and hard-code the retry policy to be enabled, which conflicts with the NN failover case: the HDFS client should fail fast so that it can retry on the other NN.
      We shouldn't assume any retry policy for the HDFS client anywhere in YARN. YARN should stay consistent with the HDFS settings, which use different retry policies in different deployments. Thus, we should clean up these hard-coded settings in YARN, including those in FileSystemTimelineWriter, FileSystemRMStateStore and FileSystemNodeLabelsStore.
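
      The difference between the two approaches can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches: a plain Map stands in for Hadoop's Configuration object, and the class and method names are hypothetical. The only name taken from HDFS is the client setting key dfs.client.retry.policy.enabled. The first method models the problematic pattern (forcing the retry policy on regardless of the deployment); the second models the proposed behavior (leaving the cluster's HDFS setting untouched).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a Map is used as a stand-in for Hadoop's Configuration.
public class RetrySettingsSketch {
    // HDFS client key that controls whether the client-side retry policy is enabled.
    public static final String RETRY_ENABLED_KEY = "dfs.client.retry.policy.enabled";

    // Problematic pattern: copy the cluster configuration and force retries on,
    // ignoring whatever the deployment (for example an HA cluster) configured.
    public static Map<String, String> withHardCodedRetry(Map<String, String> clusterConf) {
        Map<String, String> copy = new HashMap<>(clusterConf);
        copy.put(RETRY_ENABLED_KEY, "true");
        return copy;
    }

    // Proposed pattern: copy the cluster configuration without touching retry settings,
    // so the HDFS client behaves as the deployment intended.
    public static Map<String, String> withInheritedRetry(Map<String, String> clusterConf) {
        return new HashMap<>(clusterConf);
    }

    public static void main(String[] args) {
        // Assumed HA deployment where the client retry policy is disabled
        // so that the client fails fast and fails over to the other NN.
        Map<String, String> haConf = new HashMap<>();
        haConf.put(RETRY_ENABLED_KEY, "false");

        System.out.println(withHardCodedRetry(haConf).get(RETRY_ENABLED_KEY)); // prints "true"
        System.out.println(withInheritedRetry(haConf).get(RETRY_ENABLED_KEY)); // prints "false"
    }
}
```

      In this sketch the hard-coded variant silently flips the HA cluster's setting, which is the kind of override the description asks to remove from YARN.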

      Attachments

        1. YARN-5718-v2.patch
          9 kB
          Junping Du
        2. YARN-5718-v2.1.patch
          9 kB
          Junping Du
        3. YARN-5718.patch
          4 kB
          Junping Du


          People

            junping_du Junping Du
            junping_du Junping Du
            Votes: 0
            Watchers: 8

