  HIVE-3874 (project: Hive)

Create a new Optimized Row Columnar file format for Hive


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 0.11.0
    • None

    Description

      There are several limitations of the current RC File format that I'd like to address by creating a new format:

      • each column value is stored as a binary blob, which means:
        • the entire column value must be read, decompressed, and deserialized
        • the file format can't use smarter type-specific compression
        • push-down filters can't be evaluated
      • the start of each row group needs to be found by scanning
      • user metadata can only be added to the file when the file is created
      • the file doesn't store the number of rows per file or per row group
      • there is no mechanism for seeking to a particular row number, which is required for external indexes
      • there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups
      • the type of the rows isn't stored in the file
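
      To make the last few points concrete, here is a minimal sketch of how a lightweight per-row-group index could let a push-down filter skip whole row groups. It is not taken from the attached patches. The names RowGroup, build_row_groups, and scan_equal are hypothetical, and the sketch assumes a single integer column held in memory with a min/max pair as the only index.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical row group: one integer column plus a tiny index."""
    first_row: int      # row number of the first value, which allows seeking
    values: List[int]   # column values (kept in memory for this sketch)
    min_value: int      # lightweight index: smallest value in the group
    max_value: int      # lightweight index: largest value in the group


def build_row_groups(column: List[int], rows_per_group: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups and record min/max for each."""
    groups = []
    for start in range(0, len(column), rows_per_group):
        chunk = column[start:start + rows_per_group]
        groups.append(RowGroup(start, chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return matching row numbers and the number of groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Push-down filter: the index proves the group cannot contain target.
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        for offset, value in enumerate(group.values):
            if value == target:
                matches.append(group.first_row + offset)
    return matches, groups_read


column = [1, 3, 5, 7, 20, 22, 24, 26, 40, 42, 44, 46]
groups = build_row_groups(column, rows_per_group=4)
rows, groups_read = scan_equal(groups, target=22)
```

      In this toy data only the middle row group overlaps the value 22, so the other two groups are skipped without reading their values. Storing first_row per group is also what would make seeking to a particular row number cheap.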

      Attachments

        1. hive.3874.2.patch (666 kB, Namit Jain)
        2. HIVE-3874.D8529.1.patch (735 kB, Phabricator)
        3. HIVE-3874.D8529.2.patch (740 kB, Phabricator)
        4. HIVE-3874.D8529.3.patch (741 kB, Phabricator)
        5. HIVE-3874.D8529.4.patch (745 kB, Phabricator)
        6. HIVE-3874.D8871.1.patch (12 kB, Phabricator)
        7. orc.tgz (49 kB, Owen O'Malley)
        8. OrcFileIntro.pptx (1.10 MB, Owen O'Malley)

            People

              omalley Owen O'Malley
              omalley Owen O'Malley
              Votes: 6
              Watchers: 60