  1. Hive
  2. HIVE-23158

Optimize S3A recordReader policy for Random IO formats


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: None
    • Target Version/s:
    • Flags:
      Patch

      Description

      The S3A filesystem client (inherited from Hadoop) supports the notion of input policies.
      These policies tune the HTTP requests used to read different file formats, such as TEXT or ORC.

      For formats such as ORC and Parquet that perform many seek operations, there is an optimized RANDOM mode that reads only the requested parts of a file instead of the whole file (the default).

      I suggest adding some extra logic to HiveInputFormat so that RecordReader requests are optimized for random IO when the data is stored on S3A in formats such as ORC or Parquet.
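
      The idea could be sketched roughly as follows. This is not the code from the attached patch: the class, the method names, and the format list are hypothetical, and a plain Map stands in for the Hadoop job configuration so the example stays self-contained. The only real setting assumed here is the S3A option fs.s3a.experimental.input.fadvise with the value random, as described in the Hadoop S3A documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the logic from HIVE-23158.01.patch.
class S3aInputPolicySketch {
    // S3A option that selects the input policy (assumed from the Hadoop S3A docs).
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";
    static final String RANDOM_POLICY = "random";

    // Assumed set of formats that benefit from random IO.
    static final Set<String> RANDOM_IO_FORMATS = Set.of("orc", "parquet");

    // True when the data lives on S3A and the format is seek-heavy.
    static boolean shouldUseRandomPolicy(String scheme, String format) {
        if (scheme == null || format == null) {
            return false;
        }
        return "s3a".equalsIgnoreCase(scheme)
                && RANDOM_IO_FORMATS.contains(format.toLowerCase(Locale.ROOT));
    }

    // Returns a copy of the job settings, with the random policy set when it applies.
    static Map<String, String> tuneForRead(Map<String, String> jobConf,
                                           String scheme, String format) {
        Map<String, String> tuned = new HashMap<>(jobConf);
        if (shouldUseRandomPolicy(scheme, format)) {
            tuned.put(FADVISE_KEY, RANDOM_POLICY);
        }
        return tuned;
    }

    public static void main(String[] args) {
        Map<String, String> conf = tuneForRead(new HashMap<>(), "s3a", "ORC");
        System.out.println(conf.get(FADVISE_KEY));
    }
}
```

      In a real input format, the check would presumably run just before the RecordReader is created, using the scheme of the split's path and the table's file format, so that text files on S3A and ORC files on other filesystems keep their existing settings.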

        Attachments

        1. HIVE-23158.01.patch
          12 kB
          Panagiotis Garefalakis


              People

              • Assignee:
                pgaref Panagiotis Garefalakis
                Reporter:
                pgaref Panagiotis Garefalakis
              • Votes:
                0
                Watchers:
                3


                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  20m