HIVE-5562: Provide stripe level column statistics in ORC

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: File Formats
    • Labels:

      Description

      ORC maintains two levels of column statistics: index statistics (one set for every row group) and file-level column statistics for the entire file. It would be useful to also have stripe-level column statistics, which sit between index and file statistics. The reason to maintain stripe-level statistics is that the current input split computation logic is based on stripe boundaries. If stripe-level statistics are available and a stripe cannot satisfy a predicate condition, then that entire stripe (and therefore its split) can be eliminated during split computation.
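
      The elimination idea described above can be illustrated with a minimal sketch. It is not the ORC or Hive implementation: `StripeStats`, `stripe_may_match`, and `select_stripes` are hypothetical names, and the statistics are assumed to be a simple min/max pair for a single integer column, checked against a range predicate `lo <= col <= hi`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StripeStats:
    """Hypothetical per-stripe min/max statistics for one integer column."""
    stripe_id: int
    min_value: int
    max_value: int


def stripe_may_match(stats: StripeStats, lo: int, hi: int) -> bool:
    """Return True if the stripe's [min, max] range overlaps [lo, hi].

    A False result means no row in the stripe can satisfy lo <= col <= hi,
    so the stripe can be skipped without being read.
    """
    return stats.max_value >= lo and stats.min_value <= hi


def select_stripes(stripes: List[StripeStats], lo: int, hi: int) -> List[int]:
    """Keep only the ids of stripes that might contain matching rows."""
    return [s.stripe_id for s in stripes if stripe_may_match(s, lo, hi)]


# Assumed example data: three stripes with disjoint value ranges.
stripes = [
    StripeStats(stripe_id=0, min_value=1, max_value=100),
    StripeStats(stripe_id=1, min_value=101, max_value=200),
    StripeStats(stripe_id=2, min_value=201, max_value=300),
]

# Predicate: 150 <= col <= 180. Only stripe 1 overlaps that range.
print(select_stripes(stripes, 150, 180))
```

      Because splits follow stripe boundaries, every stripe dropped by a check of this kind corresponds to a split that would not need to be scheduled. Min/max overlap can only rule stripes out; a stripe that passes the check may still contain no matching rows.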

        Attachments

        1. HIVE-5562.2.patch.txt
          100 kB
          Prasanth Jayachandran
        2. HIVE-5562.1.patch.txt
          104 kB
          Prasanth Jayachandran

              People

              • Assignee:
                prasanth_j Prasanth Jayachandran
                Reporter:
                prasanth_j Prasanth Jayachandran
              • Votes:
                0
                Watchers:
                3