Details
- Type: Bug
- Status: Closed
- Priority: Trivial
- Resolution: Fixed
- None
- None
- Patch
Description
The S3A filesystem client (inherited from Hadoop) supports the notion of input policies.
These policies tune the behaviour of the HTTP requests used to read different file types, such as TEXT or ORC.
For formats such as ORC and Parquet, which perform many seek operations, there is an optimized RANDOM mode that reads only parts of a file instead of the whole file (the default).
I suggest adding logic to HiveInputFormat so that RecordReader requests are optimized for random IO when the data is stored on S3A in a format such as ORC or Parquet.
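A minimal sketch of the kind of check proposed above, not the actual patch. It assumes a plain Map as a stand-in for the Hadoop job configuration, and the class name, method names and format labels are hypothetical. The option name fs.s3a.experimental.input.fadvise and its value random are the S3A settings for the input policy.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Hive or Hadoop.
final class S3aInputPolicySketch {

    // S3A option that selects the input policy.
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";
    static final String RANDOM_POLICY = "random";

    // Assumption: formats are identified by simple labels; the real code
    // would more likely inspect the InputFormat class in use.
    private static final Set<String> SEEK_HEAVY_FORMATS = Set.of("ORC", "PARQUET");

    private S3aInputPolicySketch() {
    }

    // True when the path is on S3A and the format does many seeks.
    static boolean shouldUseRandomPolicy(URI path, String format) {
        if (path == null || format == null) {
            return false;
        }
        return "s3a".equalsIgnoreCase(path.getScheme())
                && SEEK_HEAVY_FORMATS.contains(format.toUpperCase(Locale.ROOT));
    }

    // Returns a copy of the configuration, with the random input policy set
    // when the check above passes. The input map is left unchanged.
    static Map<String, String> withInputPolicy(Map<String, String> conf, URI path, String format) {
        Map<String, String> copy = new HashMap<>(conf);
        if (shouldUseRandomPolicy(path, format)) {
            copy.put(FADVISE_KEY, RANDOM_POLICY);
        }
        return copy;
    }
}
```

With this sketch, an ORC file under s3a:// gets the random policy, while a TEXT file on S3A or an ORC file on HDFS keeps whatever policy is already configured.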
Attachments
Issue Links
- is fixed by
  - HIVE-24225 FIX S3A recordReader policy selection (Closed)
- relates to
  - HIVE-23393 LLapInputFormat reader policy for Random IO formats (Closed)
- links to