Hive / HIVE-3874

Create a new Optimized Row Columnar file format for Hive

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Labels: None

      Description

      There are several limitations of the current RC File format that I'd like to address by creating a new format:

      • each column value is stored as a binary blob, which means:
        • the entire column value must be read, decompressed, and deserialized
        • the file format can't use smarter type-specific compression
        • push-down filters can't be evaluated
      • the start of each row group needs to be found by scanning
      • user metadata can only be added to the file when the file is created
      • the file doesn't store the number of rows per file or row group
      • there is no mechanism for seeking to a particular row number, which is required for external indexes.
      • there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups.
      • the type of the rows isn't stored in the file
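
      To make the last few points concrete, here is a minimal sketch of how per-row-group row counts and min/max statistics could support both filter push-down and seeking to a row number. It is an illustration only, not the layout proposed in the attached patches: RowGroupStats, build_index, groups_for_range, and locate_row are hypothetical names, and the column is assumed to hold plain integers.

```python
from bisect import bisect_right
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical index entry for one row group (not the real file layout)."""
    row_count: int
    min_value: int
    max_value: int


def build_index(values: List[int], rows_per_group: int) -> List[RowGroupStats]:
    """Split one column into fixed-size row groups and record count, min, max."""
    index = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        index.append(RowGroupStats(len(chunk), min(chunk), max(chunk)))
    return index


def groups_for_range(index: List[RowGroupStats], low: int, high: int) -> List[int]:
    """Return the row groups whose [min, max] overlaps the filter low <= x <= high.

    Every other row group can be skipped without being read or decompressed.
    """
    return [i for i, g in enumerate(index)
            if g.max_value >= low and g.min_value <= high]


def locate_row(index: List[RowGroupStats], row_number: int) -> Tuple[int, int]:
    """Map a zero-based row number to (row group, offset within that group)."""
    ends = list(accumulate(g.row_count for g in index))  # cumulative row counts
    total = ends[-1] if ends else 0
    if not 0 <= row_number < total:
        raise IndexError("row number out of range")
    group = bisect_right(ends, row_number)
    group_start = ends[group] - index[group].row_count
    return group, row_number - group_start


column = [3, 7, 5, 12, 15, 11, 21, 29, 25, 40]
index = build_index(column, rows_per_group=3)
print(groups_for_range(index, 10, 16))  # [1]
print(locate_row(index, 7))             # (2, 1)
```

      With the statistics stored alongside the data, a filter such as 10 <= x <= 16 touches one row group out of four, and a row number resolves to a row group by binary search over the cumulative counts rather than by scanning the file.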

        Attachments

        1. OrcFileIntro.pptx
          1.10 MB
          Owen O'Malley
        2. orc.tgz
          49 kB
          Owen O'Malley
        3. hive.3874.2.patch
          666 kB
          Namit Jain
        4. HIVE-3874.D8529.1.patch
          735 kB
          Phabricator
        5. HIVE-3874.D8529.2.patch
          740 kB
          Phabricator
        6. HIVE-3874.D8529.3.patch
          741 kB
          Phabricator
        7. HIVE-3874.D8871.1.patch
          12 kB
          Phabricator
        8. HIVE-3874.D8529.4.patch
          745 kB
          Phabricator

              People

              • Assignee: Owen O'Malley (owen.omalley)
              • Reporter: Owen O'Malley (owen.omalley)
              • Votes: 6
              • Watchers: 61
