  1. Hive
  2. HIVE-5102

ORC getSplits should create splits based on the stripes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: File Formats
    • Labels: None

    Description

      Currently, ORC inherits getSplits from FileInputFormat, which basically makes one split per HDFS block. This can create too little parallelism; it would be better to have getSplits look at the file footer and create splits based on the stripes.
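
      The stripe-based approach described above can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the class StripeSplitSketch, the Range type, the stripeSplits method, and the maxSplitSize parameter are all hypothetical names, and the stripe offsets are made-up values standing in for what would be read from the file footer. The sketch packs consecutive stripes into splits up to a size cap and never cuts a stripe in half.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSplitSketch {

    /** A byte range in the file: one stripe from the footer, or one generated split. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /**
     * Packs consecutive stripes into splits of at most maxSplitSize bytes.
     * A stripe is never cut, so a stripe larger than the cap gets its own split.
     * (Hypothetical helper; the stripe list is assumed to be sorted by offset.)
     */
    static List<Range> stripeSplits(List<Range> stripes, long maxSplitSize) {
        List<Range> splits = new ArrayList<>();
        long start = -1; // -1 means no split is currently open
        long end = -1;
        for (Range stripe : stripes) {
            // Close the open split if adding this stripe would exceed the cap.
            if (start >= 0 && stripe.end() - start > maxSplitSize) {
                splits.add(new Range(start, end - start));
                start = -1;
            }
            // Open a new split at this stripe's first byte if none is open.
            if (start < 0) {
                start = stripe.offset;
            }
            end = stripe.end();
        }
        // Emit whatever is left in the last open split.
        if (start >= 0) {
            splits.add(new Range(start, end - start));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four made-up 100-byte stripes starting at offset 3, with a 250-byte cap.
        List<Range> stripes = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            stripes.add(new Range(3 + 100L * i, 100));
        }
        for (Range split : stripeSplits(stripes, 250)) {
            System.out.println(split.offset + "+" + split.length);
        }
    }
}
```

      With these made-up inputs the sketch yields two splits of two stripes each (3+200 and 203+200), and every split boundary falls on a stripe boundary regardless of where HDFS block boundaries lie.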

      Attachments

        1. HIVE-5102.D12579.1.patch
          42 kB
          Phabricator
        2. HIVE-5102.D12579.2.patch
          43 kB
          Phabricator
        3. HIVE-5102.D12849.1.patch
          44 kB
          Phabricator
        4. HIVE-5102.D12849.2.patch
          44 kB
          Phabricator

            People

              Assignee: Owen O'Malley (omalley)
              Reporter: Owen O'Malley (omalley)
              Votes: 0
              Watchers: 9
