  1. Hadoop Map/Reduce
  2. MAPREDUCE-7182

MapReduce input format/record readers to support S3 select queries


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      HADOOP-15229 adds S3 Select through the (new) async openFile API, but the classic RecordReader classes etc. can't handle it because:

      1. The selected data is shorter than the file length reported by getFileStatus, and the readers assume that an EOFException in that situation is an error.
      2. Everything assumes plain text is splittable.
      3. The readers assume that if a file has a .gz extension, the gzip codec should be used, which breaks on transcoded/uncompressed data.

      To handle S3 Select data sources we need to be able to address these problems, either through changes to the existing code (danger?) or through some new readers.
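
As a rough illustration of what a "new reader" would have to do differently, here is a minimal plain-Java sketch. It is not Hadoop code and does not use the Hadoop or S3A APIs: the class `SelectLineReaderSketch`, its methods, and the sample path are all hypothetical names invented for this example. It only shows the three behaviours listed above: ending cleanly when the stream ends before the reported length, refusing to split, and ignoring the file extension when decoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not part of Hadoop): a line reader for select-style results. */
public class SelectLineReaderSketch {

    /**
     * Select output is a single filtered stream, so byte offsets of the
     * source object do not map onto it. This sketch never splits.
     */
    public static boolean isSplittable() {
        return false;
    }

    /**
     * Reads lines until the stream ends.
     *
     * @param in             the (already filtered/transcoded) result stream
     * @param path           source path; deliberately NOT used to pick a codec,
     *                       even if it ends in ".gz"
     * @param reportedLength length of the source object as a status call would
     *                       report it; deliberately NOT used as a read limit or
     *                       completeness check
     */
    public static List<String> readAll(InputStream in, String path, long reportedLength)
            throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // End of stream is the normal end of the records, however many
            // bytes were read compared with reportedLength.
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Two selected rows, far shorter than the length reported for the source.
        byte[] selected = "a,1\nb,2\n".getBytes(StandardCharsets.UTF_8);
        List<String> records =
            readAll(new ByteArrayInputStream(selected), "data/records.csv.gz", 1_000_000L);
        System.out.println(records.size() + " records, splittable=" + isSplittable());
    }
}
```

A real implementation would need to do this inside the MapReduce InputFormat/RecordReader contracts, which is exactly the part this issue leaves open.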


          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)
            Votes: 0
            Watchers: 2
