  1. Hadoop Common
  2. HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features
  3. HADOOP-15625

S3A input stream to use etags/version number to detect changed source files

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0, 3.2.1
    • Component/s: fs/s3
    • Labels: None

    Description

      S3A input stream doesn't handle changing source files any better than the other cloud store connectors. Specifically: it doesn't notice that the file has changed, it caches the length from startup, and whenever a seek triggers a new GET, you may get old data, new data, or perhaps even go from new data back to old data due to eventual consistency.

      We can't do anything to stop this, but we could detect changes by:

      1. caching the etag of the first HEAD/GET (we don't get that HEAD on open with S3Guard, BTW)
      2. on future GET requests, verify the etag of the response
      3. raise an IOE if the remote file changed during the read.

      It's a more dramatic failure, but it stops changes silently corrupting things.
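
      The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class names `ETagChangeTracker` and `SourceChangedException` and the method `processResponse` are hypothetical, and the sketch assumes the caller extracts the etag from each HEAD/GET response and passes it in as a string.

```java
import java.io.IOException;

/** Hypothetical exception for this sketch; raised when the etag differs. */
class SourceChangedException extends IOException {
  SourceChangedException(String path, String expected, String actual) {
    super("Remote file " + path + " changed during read: expected etag "
        + expected + " but got " + actual);
  }
}

/** Hypothetical helper: remembers the first etag seen and checks later ones. */
class ETagChangeTracker {
  private final String path;
  private String knownETag; // null until the first HEAD/GET supplies one

  ETagChangeTracker(String path) {
    this.path = path;
  }

  /** Call with the etag of every HEAD/GET response for this stream. */
  void processResponse(String etag) throws IOException {
    if (etag == null) {
      return; // assumption: a store that returns no etag cannot be checked
    }
    if (knownETag == null) {
      knownETag = etag; // step 1: cache the etag of the first response
      return;
    }
    if (!knownETag.equals(etag)) {
      // steps 2 and 3: the response etag differs, so fail the read
      throw new SourceChangedException(path, knownETag, etag);
    }
  }

  String getKnownETag() {
    return knownETag;
  }
}
```

      In such a design the stream would call `processResponse` after each reopen triggered by a seek, so a changed object surfaces as an IOException on the next GET rather than as mixed old and new data.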

      Attachments

        1. HADOOP-15625-branch-3.2-018.patch
          86 kB
          Steve Loughran
        2. HADOOP-15625-017.patch
          84 kB
          Steve Loughran
        3. HADOOP-15625-016.patch
          84 kB
          Ben Roling
        4. HADOOP-15625-015.patch
          83 kB
          Gabor Bota
        5. HADOOP-15625-015.patch
          83 kB
          Ben Roling
        6. HADOOP-15625-014.patch
          84 kB
          Steve Loughran
        7. HADOOP-15625-013.patch
          84 kB
          Steve Loughran
        8. HADOOP-15625-013-delta.patch
          20 kB
          Steve Loughran
        9. HADOOP-15625-012.patch
          76 kB
          Ben Roling
        10. HADOOP-15625-011.patch
          76 kB
          Ben Roling
        11. HADOOP-15625-010.patch
          76 kB
          Ben Roling
        12. HADOOP-15625-009.patch
          76 kB
          Ben Roling
        13. HADOOP-15625-008.patch
          76 kB
          Ben Roling
        14. HADOOP-15625-007.patch
          65 kB
          Ben Roling
        15. HADOOP--15625-006.patch
          60 kB
          Steve Loughran
        16. HADOOP-15625-006.patch
          38 kB
          Ben Roling
        17. HADOOP-15625-005.patch
          38 kB
          Ben Roling
        18. HADOOP-15625-004.patch
          38 kB
          Ben Roling
        19. HADOOP-15625-003.patch
          12 kB
          Steve Loughran
        20. HADOOP-15625-002.patch
          10 kB
          Brahma Reddy Battula
        21. HADOOP-15625-001.patch
          9 kB
          Brahma Reddy Battula

            People

              ben.roling Ben Roling
              brahmareddy Brahma Reddy Battula