  1. Hadoop Common
  2. HADOOP-15763 Über-JIRA: abfs phase II: Hadoop 3.3 features & fixes
  3. HADOOP-16852

ABFS: Send error back to client for Read Ahead request failure


Details

    Description

      Issue seen by a customer:

      The failed requests we were seeing in the AbfsClient logging never actually made it out over the wire. We have found an issue with ADLS passthrough and the 8 read-ahead threads that ADLSv2 spawns in ReadBufferManager.java. We depend on thread-local storage to get the right JWT token, and those threads do not have the right information in their thread-local storage. As a result, when they pick up a task from the read-ahead queue, they fail with an AzureCredentialNotFoundException thrown in AbfsRestOperation.executeHttpOperation(), where it calls client.getAccessToken(). The read-ahead threads silently swallow this exception in ReadBufferWorker.run().

      Every read-ahead attempt therefore results in a failed executeHttpOperation() that still calls AbfsClientThrottlingIntercept.updateMetrics() and contributes to throttling, even though the request never went out over the wire. After the read-aheads fail, the main task thread performs the read with the right thread-local storage information and succeeds, but only after sleeping for up to 10 seconds because of the throttling.
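The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the ABFS code: `ReadAheadSketch`, `TOKEN`, `remoteRead`, and `readWithReadAhead` are hypothetical names, and a plain `ThreadLocal<String>` stands in for the credential store. It shows that a pool thread does not see the caller's thread-local value, and one possible way of sending the error back to the caller (through a `Future`) instead of swallowing it in the worker.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReadAheadSketch {
    // Hypothetical stand-in for the per-thread credential (the JWT token in the report).
    static final ThreadLocal<String> TOKEN = new ThreadLocal<>();

    // Hypothetical stand-in for a remote read that needs the current thread's token.
    static byte[] remoteRead(int length) throws IOException {
        if (TOKEN.get() == null) {
            throw new IOException("no credential on thread "
                    + Thread.currentThread().getName());
        }
        return new byte[length];
    }

    // Tries a read-ahead on a worker thread. If the worker fails, the failure
    // reaches the caller as an ExecutionException, and the caller falls back
    // to a synchronous read on its own thread.
    static String readWithReadAhead(int length)
            throws IOException, InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(1);
        try {
            Callable<byte[]> task = () -> remoteRead(length);
            Future<byte[]> prefetch = workers.submit(task);
            try {
                return "prefetch:" + prefetch.get().length;
            } catch (ExecutionException e) {
                // The worker's exception is visible here rather than being lost.
                byte[] data = remoteRead(length);
                return "fallback:" + data.length
                        + " (read-ahead failed: " + e.getCause().getMessage() + ")";
            }
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        TOKEN.set("caller-token"); // set only on the calling thread
        System.out.println(readWithReadAhead(16));
    }
}
```

Because the value is set only on the calling thread, the worker's `remoteRead` throws and the call takes the fallback path. Whether the actual fix records the failure on the read buffer, rethrows it, or handles it some other way is not stated in this issue; the `Future` here is only one way to illustrate the idea in the title.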


          People

            snvijaya Sneha Vijayarajan
            snvijaya Sneha Vijayarajan
            Votes: 0
            Watchers: 5


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 10m
