  HADOOP-16185 (Hadoop Common; sub-task of HADOOP-15619, Über-JIRA: S3Guard Phase IV: Hadoop 3.3 features)

S3Guard: Optimize performance of handling OOB operations in non-authoritative mode


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      HADOOP-15999 changes S3Guard's non-authoritative mode so that every fs.getFileStatus call also checks S3, because in that mode the MetadataStore is not treated as the single source of truth. The extra S3 request on each call has a negative performance impact.


      In other words, HADOOP-15999 will reinstate the HEAD request on every read, which makes non-authoritative S3Guard somewhat slower. One way to address this is to move the check into the input stream itself: the first GET that returns data would also act as the metadata check. That would mean updating the read context with a callback, such as "metastoreProcessHeader", to invoke on the first GET.
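The idea of letting the first GET double as the metadata check can be sketched in Java. This is a minimal, hypothetical illustration, not the S3A input stream or the AWS SDK: ObjectStore, GetResponse, ObjectHeaders and LazyCheckedInputStream are invented names, and the store is an in-memory map. Only the callback name metastoreProcessHeader comes from the description above. In S3Guard the callback would presumably compare or update the MetadataStore entry; here it only records the headers so the demo can show that one GET, and no HEAD, was issued.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class FirstGetMetadataCheck {

    /** Hypothetical subset of the headers returned with a GET response. */
    record ObjectHeaders(String key, long contentLength, String eTag) {}

    /** Hypothetical GET response: headers plus the object body. */
    record GetResponse(ObjectHeaders headers, InputStream body) {}

    /** Hypothetical minimal store interface; not the S3A or AWS SDK API. */
    interface ObjectStore {
        GetResponse get(String key) throws IOException;
    }

    /**
     * Opens the object lazily. The headers of the first GET are handed to the
     * callback, so no separate HEAD request is issued before reading.
     */
    static final class LazyCheckedInputStream extends InputStream {
        private final ObjectStore store;
        private final String key;
        private final Consumer<ObjectHeaders> metastoreProcessHeader;
        private InputStream wrapped; // null until the first GET

        LazyCheckedInputStream(ObjectStore store, String key,
                               Consumer<ObjectHeaders> metastoreProcessHeader) {
            this.store = store;
            this.key = key;
            this.metastoreProcessHeader = metastoreProcessHeader;
        }

        private void ensureOpen() throws IOException {
            if (wrapped == null) {
                GetResponse response = store.get(key);
                // The metadata check piggybacks on the GET response headers.
                metastoreProcessHeader.accept(response.headers());
                wrapped = response.body();
            }
        }

        @Override
        public int read() throws IOException {
            ensureOpen();
            return wrapped.read();
        }

        @Override
        public void close() throws IOException {
            if (wrapped != null) {
                wrapped.close();
            }
        }
    }

    /** Reads one object through the stream and reports what happened. */
    static String demo() throws IOException {
        Map<String, byte[]> objects = new HashMap<>();
        objects.put("data/file.txt", "hello".getBytes(StandardCharsets.UTF_8));
        int[] gets = {0}; // counts GET requests served by the fake store
        ObjectStore store = key -> {
            gets[0]++;
            byte[] data = objects.get(key);
            if (data == null) {
                throw new FileNotFoundException(key);
            }
            return new GetResponse(new ObjectHeaders(key, data.length, "etag-1"),
                    new ByteArrayInputStream(data));
        };
        ObjectHeaders[] seen = {null}; // headers passed to the callback
        StringBuilder text = new StringBuilder();
        try (InputStream in = new LazyCheckedInputStream(store, "data/file.txt", h -> seen[0] = h)) {
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
        }
        return text + " length=" + seen[0].contentLength() + " GETs=" + gets[0];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "hello length=5 GETs=1"
    }
}
```

A real stream that reopens the object on seek would issue further GETs, so it would also need to decide whether the callback runs only on the first one, as the description suggests.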

      The good news is that, because this is a file read, the check costs only one HTTP HEAD request: the other two directory probes are not needed unless the file is not there.
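The request count above can be illustrated with a small Java sketch. It assumes, for illustration only, that the two directory probes are a HEAD on the key with a trailing "/" (a directory marker) and a LIST of that prefix; the issue does not spell them out. FakeBucket, probe and Kind are hypothetical names, and the bucket is an in-memory set that logs each request instead of talking to S3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ProbeOrder {

    enum Kind { FILE, DIRECTORY, NOT_FOUND }

    /** Hypothetical in-memory bucket that logs every request it serves. */
    static final class FakeBucket {
        final TreeSet<String> keys = new TreeSet<>();
        final List<String> requests = new ArrayList<>();

        boolean head(String key) {
            requests.add("HEAD " + key);
            return keys.contains(key);
        }

        boolean listHasEntries(String prefix) {
            requests.add("LIST " + prefix);
            // The smallest key >= prefix starts with prefix iff any key does.
            String first = keys.ceiling(prefix);
            return first != null && first.startsWith(prefix);
        }
    }

    /** Assumed probe order: file first, directory probes only on a miss. */
    static Kind probe(FakeBucket bucket, String key) {
        if (bucket.head(key)) {
            return Kind.FILE; // existing file: one HEAD, nothing else
        }
        String dirPrefix = key + "/";
        if (bucket.head(dirPrefix)) {
            return Kind.DIRECTORY; // directory marker object
        }
        if (bucket.listHasEntries(dirPrefix)) {
            return Kind.DIRECTORY; // implicit directory: objects under the prefix
        }
        return Kind.NOT_FOUND;
    }

    public static void main(String[] args) {
        FakeBucket bucket = new FakeBucket();
        bucket.keys.add("data/file.txt");
        bucket.keys.add("data/sub/part-0000");
        for (String key : List.of("data/file.txt", "data/sub", "missing")) {
            bucket.requests.clear();
            Kind kind = probe(bucket, key);
            System.out.println(key + " -> " + kind + " after " + bucket.requests);
        }
    }
}
```

For the existing file the log shows a single HEAD; the directory and missing-key cases each take three requests, which is the cost the file-read path avoids.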


              People

              • Assignee: Unassigned
              • Reporter: Gabor Bota (gabor.bota)
              • Votes: 0
              • Watchers: 2
