  1. Hadoop Common
  2. HADOOP-14825 Über-JIRA: S3Guard Phase II: Hadoop 3.1 features
  3. HADOOP-13761

S3Guard: implement retries for DDB failures and throttling; translate exceptions


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: 3.1.0
    • Component/s: fs/s3
    • Labels: None

    Description

      Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.

      In HADOOP-13651, I added TODO comments in most of the places retry loops are needed, including:

      • open(path). If MetadataStore reflects recent create/move of file path, but we fail to read it from S3, retry.
      • delete(path). If deleteObject() on S3 fails, but MetadataStore shows the file exists, retry.
      • rename(src,dest). If source path is not visible in S3 yet, retry.
      • listFiles(). Skip for now. Not currently implemented in S3Guard. I will create a separate JIRA for this as it will likely require interface changes (i.e. prefix or subtree scan).
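
      The open(path) case above can be sketched roughly as follows. This is a minimal illustration, not the S3A implementation: the class ConsistencyRetry, the method retryWhileExpected, and the storeSaysExists callback are hypothetical names invented here, and the fixed attempt count and sleep interval are assumptions.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;

/** Hypothetical sketch; these names are illustrative, not the S3A API. */
public class ConsistencyRetry {

  /**
   * Runs op. If it throws FileNotFoundException while the metadata store
   * still reports that the path exists, sleeps and tries again, up to
   * maxAttempts attempts in total. If the store agrees the path is gone,
   * the exception is rethrown immediately.
   */
  public static <T> T retryWhileExpected(Callable<Boolean> storeSaysExists,
      Callable<T> op, int maxAttempts, long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    FileNotFoundException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (FileNotFoundException e) {
        last = e;
        if (!storeSaysExists.call()) {
          throw e; // store and S3 agree: a genuine "not found"
        }
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis); // wait for S3 to become consistent
        }
      }
    }
    throw last; // attempts exhausted while S3 still lags the store
  }
}
```

      The same shape would apply to delete(path) and rename(src,dest), with the store lookup and the S3 call swapped for the relevant operations.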

      We may miss some cases initially and we should do failure injection testing to make sure we're covered. Failure injection tests can be a separate JIRA to make this easier to review.

      We also need basic configuration parameters for the retry policy. There should be a way to specify a maximum retry duration, as some applications would rather receive an error eventually than wait indefinitely. We should also keep statistics on how often an inconsistency is detected and a retry loop is entered.
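
      A duration-bounded policy with a retry counter might look like the sketch below. The class BoundedRetry and its parameters are hypothetical; in practice the two values would be read from configuration, and the counter would feed whatever statistics mechanism the filesystem already uses. Both are assumptions here, not part of any patch on this issue.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a retry policy bounded by total elapsed time. */
public class BoundedRetry {
  private final long maxDurationMillis; // give up once this much time has passed
  private final long intervalMillis;    // fixed sleep between attempts
  private final AtomicLong retries = new AtomicLong(); // statistic: retries taken

  public BoundedRetry(long maxDurationMillis, long intervalMillis) {
    this.maxDurationMillis = maxDurationMillis;
    this.intervalMillis = intervalMillis;
  }

  /** Retries op on IOException until it succeeds or the deadline passes. */
  public <T> T run(Callable<T> op) throws Exception {
    long deadline = System.nanoTime() + maxDurationMillis * 1_000_000L;
    while (true) {
      try {
        return op.call();
      } catch (IOException e) {
        // Overflow-safe comparison, as recommended for System.nanoTime().
        if (System.nanoTime() - deadline >= 0) {
          throw e; // surface the error rather than wait indefinitely
        }
        retries.incrementAndGet();
        Thread.sleep(intervalMillis);
      }
    }
  }

  /** Number of retries performed so far across all calls to run(). */
  public long getRetryCount() {
    return retries.get();
  }
}
```

      A fixed interval keeps the sketch short; a production policy would more likely use exponential backoff with jitter, especially for throttling.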

      Attachments

        1. HADOOP-13761.001.patch
          11 kB
          Aaron Fabbri
        2. HADOOP-13761.002.patch
          11 kB
          Aaron Fabbri
        3. HADOOP-13761.003.patch
          60 kB
          Aaron Fabbri
        4. HADOOP-13761.004.patch
          62 kB
          Aaron Fabbri
        5. HADOOP-13761-004-to-005.patch
          11 kB
          Steve Loughran
        6. HADOOP-13761-005.patch
          62 kB
          Steve Loughran
        7. HADOOP-13761-005-to-006-approx.diff.txt
          17 kB
          Aaron Fabbri
        8. HADOOP-13761-006.patch
          67 kB
          Aaron Fabbri
        9. HADOOP-13761-007.patch
          74 kB
          Aaron Fabbri
        10. HADOOP-13761-008.patch
          74 kB
          Aaron Fabbri
        11. HADOOP-13761-009.patch
          75 kB
          Aaron Fabbri
        12. HADOOP-13761-010.patch
          75 kB
          Steve Loughran
        13. HADOOP-13761-010.patch
          75 kB
          Steve Loughran
        14. HADOOP-13761-011.patch
          75 kB
          Steve Loughran
        15. HADOOP-13761-012.patch
          76 kB
          Aaron Fabbri
        16. HADOOP-13761-013.patch
          76 kB
          Steve Loughran
        17. HADOOP-15183-013.patch
          76 kB
          Steve Loughran

            People

              Assignee: Aaron Fabbri
              Reporter: Aaron Fabbri
              Votes: 0
              Watchers: 6
