Hadoop Map/Reduce / MAPREDUCE-7182

MapReduce input format/record readers to support S3 select queries

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      HADOOP-15229 adds S3 Select through the (new) asynchronous openFile API, but the classic RecordReader and related classes can't handle it because:

      1. the files read back shorter than the length reported by getFileStatus, and the readers assume that an EOFException is an error in that situation
      2. everything assumes plain text is splittable
      3. everything assumes that if a file has a .gz extension, the gunzip codec should be used, which breaks on transcoded/uncompressed data
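
The first point can be illustrated without any Hadoop classes. The sketch below is plain Java and only an illustration: the class name `ShortStreamDemo` and both methods are hypothetical, and the "declared length" stands in for whatever length a getFileStatus call would report. It contrasts a reader that trusts the declared length with one that treats it as an upper bound.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration only; not part of Hadoop.
class ShortStreamDemo {

    /**
     * Strict read: expects exactly declaredLength bytes, the way a reader
     * that trusts the listed file length would.
     */
    static byte[] readStrict(InputStream in, int declaredLength) throws IOException {
        byte[] buf = new byte[declaredLength];
        // readFully throws EOFException if the stream ends early
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    /**
     * Tolerant read: treats declaredLength as an upper bound and stops
     * quietly at end of stream.
     */
    static byte[] readTolerant(InputStream in, int declaredLength) throws IOException {
        byte[] buf = new byte[declaredLength];
        int total = 0;
        while (total < declaredLength) {
            int n = in.read(buf, total, declaredLength - total);
            if (n < 0) {
                break; // end of stream before the declared length: not an error here
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }
}
```

With a filtered result that is shorter than the listed object, `readStrict` fails with an EOFException while `readTolerant` returns only the bytes that actually arrived.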

      To handle S3 Select data sources we need to address these problems, either through changes to the existing code (danger?) or through some new readers.
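
As a rough sketch of what a new reader might decide differently for the splitting and codec assumptions, the plain-Java class below separates the two decisions from the file extension when the source is opened as a select query. Everything here is hypothetical (the `SelectInputPolicy` class, its `Source` type, and the example paths), and it assumes for illustration that select results come back uncompressed and do not map onto byte offsets in the source object.

```java
// Hypothetical illustration only; not part of Hadoop.
class SelectInputPolicy {

    /** Minimal description of an input: its path and whether it is read via a select query. */
    static final class Source {
        final String path;
        final boolean s3Select;

        Source(String path, boolean s3Select) {
            this.path = path;
            this.s3Select = s3Select;
        }
    }

    /** Can the input be divided into byte-range splits? */
    static boolean isSplittable(Source s) {
        if (s.s3Select) {
            // Assumption: filtered output has no fixed relation to source offsets
            return false;
        }
        // Classic behaviour sketched here: uncompressed text is treated as splittable
        return !s.path.endsWith(".gz");
    }

    /** Should the reader wrap the stream in a gunzip codec? */
    static boolean needsGunzip(Source s) {
        if (s.s3Select) {
            // Assumption: the service has already decompressed/transcoded the data
            return false;
        }
        // Classic behaviour sketched here: the extension picks the codec
        return s.path.endsWith(".gz");
    }
}
```

For a path such as `s3a://bucket/data.csv.gz` (a made-up example), the classic branch would pick gunzip, while the select branch would read the stream as-is and as a single split.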

            People

              Assignee: Unassigned
              Reporter: Steve Loughran (stevel@apache.org)
              Votes: 0
              Watchers: 2
