  1. Hive
  2. HIVE-4868

When an MR job reads an ORC file, some Mappers may be assigned splits that contain no data to process

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Let's say a stripe of an ORC file is 256 MB and we set the split size for an MR job to 64 MB. Right now, splits are created based on byte ranges.
      Here is an example:

      |<-The start of a stripe                |<-The end of a stripe
      v                                       v
      |---------------------------------------|
         ^                        ^ 
         |<- The start of a split |<- The end of a split
      

      A stripe is read only by the Mapper whose split contains the start of that stripe. So, for some Mappers, no stripe starts within the byte range of their split, and those Mappers will process 0 records. We can improve how splits are created for ORC.
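
      The effect can be illustrated with a small sketch. It is not Hive's split-generation code: the function names, the stripe offsets, and the file size are assumptions made for illustration, and the ORC header and footer are ignored. It cuts a hypothetical 512 MB file holding two 256 MB stripes into 64 MB byte-range splits and counts how many splits contain no stripe start.

```python
MB = 1024 * 1024


def byte_range_splits(file_length, split_size):
    """Cut the file into consecutive (start, length) byte ranges."""
    return [(start, min(split_size, file_length - start))
            for start in range(0, file_length, split_size)]


def stripes_per_split(stripe_offsets, splits):
    """Assign each stripe to the split whose byte range contains its start offset."""
    return [[off for off in stripe_offsets if start <= off < start + length]
            for start, length in splits]


# Hypothetical file: two 256 MB stripes back to back, header and footer ignored.
stripe_offsets = [0, 256 * MB]
splits = byte_range_splits(512 * MB, 64 * MB)
assignment = stripes_per_split(stripe_offsets, splits)

# A split with an empty list corresponds to a Mapper that reads no records.
empty = sum(1 for stripes in assignment if not stripes)
print(f"{len(splits)} splits, {empty} with no stripe to read")
```

      Under these assumptions, only the splits that begin at offsets 0 and 256 MB receive a stripe; the remaining six Mappers are launched but read nothing.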

              People

              • Assignee:
                yhuai Yin Huai
                Reporter:
                yhuai Yin Huai