Description
HADOOP-15229 adds S3 Select through the (new) async openFile() API, but the classic RecordReader classes and related code can't handle it because:
- the returned data is shorter than the length reported by getFileStatus(), and the readers assume that an EOFException is an error in that situation
- everything assumes plain text is splittable
- the readers assume that a file with a .gz extension must be read through the gunzip codec, which breaks when the select output is transcoded/uncompressed data
To handle S3 Select data sources we need to be able to address these issues, either through changes to the existing code (danger?) or through some new readers.
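As a rough illustration of what a more tolerant reader might do about the first and third assumptions above, here is a minimal sketch in plain Java. It does not use any Hadoop API: the class `SelectStreamSketch` and its methods are hypothetical names invented for this example. It reads lines until the stream ends instead of relying on a length taken from file status, and it decides whether to gunzip by looking at the gzip magic bytes (0x1f 0x8b) rather than at the file extension.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for illustration only; not part of Hadoop. */
class SelectStreamSketch {

  /**
   * Wrap the stream in a gunzip decoder only if the payload itself starts
   * with the gzip magic bytes; the file name is never consulted.
   */
  static InputStream maybeGunzip(InputStream raw) throws IOException {
    PushbackInputStream in = new PushbackInputStream(raw, 2);
    int b0 = in.read();
    int b1 = (b0 < 0) ? -1 : in.read();
    // Push the peeked bytes back in reverse order so they are re-read in order.
    if (b1 >= 0) {
      in.unread(b1);
    }
    if (b0 >= 0) {
      in.unread(b0);
    }
    boolean gzip = (b0 == 0x1f) && (b1 == 0x8b);
    return gzip ? new GZIPInputStream(in) : in;
  }

  /**
   * Read every line until the stream ends. The loop stops on end of stream,
   * not on a byte count, so a result shorter than the object's reported
   * length is treated as normal completion rather than an error.
   */
  static List<String> readLines(InputStream raw) throws IOException {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(maybeGunzip(raw), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  /** Demo helper: gzip a byte array in memory. */
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] text = "a,1\nb,2\n".getBytes(StandardCharsets.UTF_8);
    // Uncompressed payload, as if returned for an object named "data.csv.gz".
    System.out.println(readLines(new ByteArrayInputStream(text)));
    // Genuinely gzipped payload goes through the same entry point.
    System.out.println(readLines(new ByteArrayInputStream(gzip(text))));
  }
}
```

A reader built along these lines would also have to treat the whole stream as a single split, since offsets in the select output presumably do not map back to offsets in the source object.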
Issue Links
- depends upon
  - HADOOP-15364 Add support for S3 Select to S3A (Resolved)
  - HADOOP-15229 Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API. (Resolved)
- is depended upon by
  - HADOOP-13887 Encrypt S3A data client-side with AWS SDK (S3-CSE) (Resolved)
  - HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features (Resolved)