Optimize RCFile reading by using column pruning results

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0
    • None
    • Hadoop Flags: Reviewed
    • Release Note: HIVE-461. Optimize RCFile reading by using column pruning results. (Yongqiang He via zshao)

    Description

      RCFile is a column-based file format introduced in HIVE-352. Column-based storage has shown a better compression ratio. On our internal data set (30 columns, most of them short integer strings), gzip-compressed RCFile is at least 20% smaller than gzip-compressed SequenceFile.

      RCFile also has the potential to improve reading efficiency substantially, since it compresses each column separately: a reader can skip decompressing the columns a query does not need.

      We should integrate RCFile with the column pruning results from Hive to make the reading faster.
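
      The idea can be illustrated with a minimal sketch. It is a toy model, not the RCFile on-disk layout and not Hive's API: the function names `write_row_group` and `read_row_group` and the `needed` parameter are hypothetical, and values are assumed to contain no newlines. Each column is compressed on its own, and the reader decompresses only the columns that column pruning reports as needed.

```python
import zlib


def write_row_group(rows, num_columns):
    """Store one group of rows column by column, compressing each column separately."""
    columns = []
    for col in range(num_columns):
        # Toy encoding: join the column's values with newlines (assumes no embedded newlines).
        raw = "\n".join(row[col] for row in rows).encode("utf-8")
        columns.append(zlib.compress(raw))
    return columns


def read_row_group(columns, needed):
    """Decompress only the column indexes in `needed`; skipped columns come back as None."""
    result = []
    for col, blob in enumerate(columns):
        if col in needed:
            result.append(zlib.decompress(blob).decode("utf-8").split("\n"))
        else:
            result.append(None)  # pruned column: its bytes are never decompressed
    return result


rows = [("1", "alice", "30"), ("2", "bob", "25"), ("3", "carol", "41")]
group = write_row_group(rows, 3)

# A query that touches only columns 0 and 2 passes that set to the reader.
pruned = read_row_group(group, needed={0, 2})
```

      In this sketch the cost of reading scales with the number of needed columns rather than the total column count, which is the saving this issue aims for when a query uses only a few of the 30 columns.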

      Attachments

        1. hive-461-2009-07-04.patch
          358 kB
          He Yongqiang
        2. hive-461-2009-06-27.patch
          478 kB
          He Yongqiang
        3. hive-461-2009-06-26.patch
          484 kB
          He Yongqiang
        4. hive-461-2009-05-26.patch
          13 kB
          He Yongqiang

            People

              He Yongqiang (he yongqiang)
              Zheng Shao (zshao)
              Votes: 0
              Watchers: 3