Hive / HIVE-3874

Create a new Optimized Row Columnar file format for Hive

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Labels: None

      Description

      There are several limitations of the current RC File format that I'd like to address by creating a new format:

      • each column value is stored as a binary blob, which means:
        • the entire column value must be read, decompressed, and deserialized
        • the file format can't use smarter type-specific compression
        • push-down filters can't be evaluated
      • the start of each row group needs to be found by scanning
      • user metadata can only be added to the file when the file is created
      • the file doesn't store the number of rows per file or per row group
      • there is no mechanism for seeking to a particular row number, which is required for external indexes.
      • there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups.
      • the type of the rows isn't stored in the file
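
To make the last few points concrete, here is a minimal sketch of how per-row-group row counts and min/max statistics could let a reader skip row groups for a push-down filter and seek to a row number. It illustrates the idea only and is not the format's actual layout: `RowGroupStats`, `build_index`, `groups_for_range`, and `seek_row` are hypothetical names, and the column is assumed to be a typed list of integers held in memory.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical index entry for one row group (not an on-disk layout)."""
    first_row: int   # absolute row number of the first row in the group
    row_count: int   # number of rows stored in the group
    min_value: int   # smallest column value in the group
    max_value: int   # largest column value in the group


def build_index(values: List[int], group_size: int) -> List[RowGroupStats]:
    """Split a typed integer column into row groups and record stats for each."""
    index = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        index.append(RowGroupStats(start, len(chunk), min(chunk), max(chunk)))
    return index


def groups_for_range(index: List[RowGroupStats], low: int, high: int) -> List[int]:
    """Return positions of row groups that may hold a value in [low, high].

    Groups whose [min, max] interval misses the range are skipped unread.
    """
    return [i for i, g in enumerate(index)
            if g.max_value >= low and g.min_value <= high]


def seek_row(index: List[RowGroupStats], row_number: int) -> Tuple[int, int]:
    """Map an absolute row number to (row group position, offset in group)."""
    total = index[-1].first_row + index[-1].row_count if index else 0
    if not 0 <= row_number < total:
        raise IndexError(f"row {row_number} is outside 0..{total - 1}")
    starts = [g.first_row for g in index]
    pos = bisect_right(starts, row_number) - 1  # last group starting at or before the row
    return pos, row_number - index[pos].first_row


column = list(range(100))                # toy data: values 0..99
index = build_index(column, 25)          # four row groups of 25 rows each
print(groups_for_range(index, 30, 40))   # → [1]
print(seek_row(index, 60))               # → (2, 10)
```

With an opaque binary blob per column value, neither function could be written: the statistics need typed values, and the seek needs stored row counts.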

        Attachments

        1. hive.3874.2.patch
          666 kB
          Namit Jain
        2. HIVE-3874.D8529.1.patch
          735 kB
          Phabricator
        3. HIVE-3874.D8529.2.patch
          740 kB
          Phabricator
        4. HIVE-3874.D8529.3.patch
          741 kB
          Phabricator
        5. HIVE-3874.D8529.4.patch
          745 kB
          Phabricator
        6. HIVE-3874.D8871.1.patch
          12 kB
          Phabricator
        7. orc.tgz
          49 kB
          Owen O'Malley
        8. OrcFileIntro.pptx
          1.10 MB
          Owen O'Malley

              People

              • Assignee: Owen O'Malley (owen.omalley)
              • Reporter: Owen O'Malley (owen.omalley)
              • Votes: 6
              • Watchers: 61
