Hadoop Common > HADOOP-18072 Über-JIRA: abfs phase III: Hadoop 3.4.0 features & fixes > HADOOP-18872

ABFS: Misreporting Retry Count for Sub-sequential and Parallel Operations


Details

    • Reviewed

    Description

      A bug was identified where the retry count in the client correlation ID was wrongly reported for sub-sequential and parallel operations triggered by a single file system call. This was caused by reusing the same tracing context for all such calls: a new tracing context (TC) is created when the HDFS call arrives, and that same TC is then passed along to every client call it triggers.

      For instance, when we get a createFile call, we first issue metadata operations. If those metadata operations succeed after a few retries, the tracing context records that retry count. When the actual create call is then made, the same retry count is used to construct the headers (clientCorrelationId). Although the create operation itself never failed, we still see the retry count from the previous request.

      The fix is to use a new tracing context object for each network call made. All the sub-sequential and parallel operations will share the same primary request ID to correlate them, yet each will track its own retry count.
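      The fix above can be sketched as follows. This is a minimal, hypothetical stand-in for the real org.apache.hadoop.fs.azurebfs.utils.TracingContext; the field and method names (primaryRequestId, retryCount, clientCorrelationId) are illustrative, not the actual Hadoop API.

```java
import java.util.UUID;

// Hypothetical, simplified stand-in for the ABFS TracingContext.
class TracingContext {
    final String primaryRequestId; // shared by all ops from one filesystem call
    int retryCount = 0;            // per-network-call retry counter

    TracingContext(String primaryRequestId) {
        this.primaryRequestId = primaryRequestId;
    }

    // Copy constructor used for each network call: same primaryRequestId
    // for correlation, but a fresh retryCount of 0.
    TracingContext(TracingContext toCopy) {
        this.primaryRequestId = toCopy.primaryRequestId;
    }

    // Simplified header: primaryRequestId plus this call's retry count.
    String clientCorrelationId() {
        return primaryRequestId + ":" + retryCount;
    }
}

public class RetryCountDemo {
    public static void main(String[] args) {
        // One tracing context created when the filesystem call arrives.
        TracingContext fsCallTc = new TracingContext(UUID.randomUUID().toString());

        // Metadata operation gets its own copy; suppose it succeeded
        // only after 2 retries.
        TracingContext metadataTc = new TracingContext(fsCallTc);
        metadataTc.retryCount = 2;

        // The actual create call gets a fresh copy: its retryCount starts
        // at 0 instead of inheriting the metadata operation's 2.
        TracingContext createTc = new TracingContext(fsCallTc);

        System.out.println("metadata header: " + metadataTc.clientCorrelationId());
        System.out.println("create header:   " + createTc.clientCorrelationId());
    }
}
```

      With the buggy behavior, the create call would have reused metadataTc itself and reported retry count 2; with a fresh copy it reports 0 while the shared primaryRequestId still correlates both operations.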

Attachments

Activity

People

    Assignee: anujmodi2021 Anuj Modi
    Reporter: asrani_anmol Anmol Asrani
    Votes: 0
    Watchers: 3
