[MAPREDUCE-7182] MapReduce input format/record readers to support S3 select queries - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: mrv2
Labels: None
Target Version/s: 3.4.0

Description

HADOOP-15229 adds S3 Select through the (new) async openFile API, but the classic RecordReader classes etc. can't handle it because:

- the files are shorter than the length reported by getFileStatus, and the readers treat an EOFException as an error in that situation
- everything assumes plain text is splittable
- everything assumes that if a file has a .gz extension, the gunzip codec should be used, which breaks on transcoded/uncompressed data

To handle S3 Select data sources we need to be able to address these problems, either through changes to the existing code (dangerous?) or through some new readers.
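
As a rough illustration of what a "new reader" would have to do differently, here is a minimal plain-Java sketch. It is not Hadoop code and does not use the Hadoop RecordReader or openFile APIs. The class `SelectTolerantLineReader` and its methods are hypothetical names invented for this example. It assumes the select result arrives as a plain UTF-8 text stream and shows the three behaviours listed above being relaxed: the declared file length is ignored, the input is never split, and the file extension does not pick a decompression codec.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not part of Hadoop or of any patch on this issue. */
final class SelectTolerantLineReader {

    private SelectTolerantLineReader() {
    }

    /**
     * Reads lines until the stream itself reports end-of-stream.
     * declaredLength stands in for the length a file status would report;
     * it is deliberately ignored, because a select result is usually
     * shorter than the source object and an early end is not an error.
     */
    static List<String> readAll(InputStream in, long declaredLength) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** A select result has no meaningful byte offsets into the source, so never split it. */
    static boolean isSplittable(boolean isSelectQuery) {
        return !isSelectQuery;
    }

    /**
     * Decides whether to apply gzip decompression on the client.
     * Assumption for this sketch: a select result is already transcoded
     * to uncompressed text, so the ".gz" extension is ignored for it.
     */
    static boolean shouldGunzip(String fileName, boolean isSelectQuery) {
        return !isSelectQuery && fileName.endsWith(".gz");
    }
}
```

In this sketch a caller would pass the stream of query results to `readAll` together with whatever length the listing reported, and consult `isSplittable` and `shouldGunzip` before planning the read. A real implementation would need to fit the existing input format and record reader contracts, which is exactly where the risk mentioned above lies.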

Issue Links

depends upon

HADOOP-15364 Add support for S3 Select to S3A

Resolved

HADOOP-15229 Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API.

Resolved

is depended upon by

HADOOP-13887 Encrypt S3A data client-side with AWS SDK (S3-CSE)

Resolved

HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features

Resolved

People

Assignee: Unassigned

Reporter: Steve Loughran

Votes: 0

Watchers: 2

Dates

Created: 04/Feb/19 17:32

Updated: 11/Mar/21 15:23

Resolved: 11/Mar/21 15:23