Hive / HIVE-4868

When an MR job reads an ORC file, some Mappers may not process any data

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Let's say a stripe of an ORC file is 256 MB and we set the split size for an MR job to 64 MB. Right now, splits are created based on byte ranges.
      Here is an example:

      |<-The start of a stripe                |<-The end of a stripe
      v                                       v
      |---------------------------------------|
         ^                        ^ 
         |<- The start of a split |<- The end of a split
      

      So, for some Mappers, the byte range of their split may not contain the start of any stripe. Those Mappers will process 0 records. We can improve how splits are created for ORC files.
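
The scenario above can be sketched in a few lines. This is only an illustration, assuming (as the description implies) that a stripe is read by the split whose byte range contains the stripe's start offset. The helper names `byte_range_splits` and `stripes_per_split` are hypothetical and are not part of Hive or ORC, and the ORC file header and footer are ignored for simplicity.

```python
MB = 1024 * 1024


def byte_range_splits(file_length, split_size):
    """Cut [0, file_length) into consecutive (start, end) byte ranges."""
    return [(start, min(start + split_size, file_length))
            for start in range(0, file_length, split_size)]


def stripes_per_split(stripe_starts, splits):
    """Map each split to the stripe start offsets that fall inside it.

    Assumption: a stripe belongs to the split that contains its start offset.
    """
    assignment = {split: [] for split in splits}
    for offset in stripe_starts:
        for start, end in splits:
            if start <= offset < end:
                assignment[(start, end)].append(offset)
                break
    return assignment


# One 256 MB stripe starting at offset 0, read with a 64 MB split size.
splits = byte_range_splits(256 * MB, 64 * MB)
assignment = stripes_per_split([0], splits)

# Splits that contain no stripe start; their Mappers would read no records.
empty_splits = [split for split, stripes in assignment.items() if not stripes]

for (start, end), stripes in assignment.items():
    print(f"split [{start // MB} MB, {end // MB} MB): {len(stripes)} stripe(s)")
```

Under these assumptions, only the first of the four splits contains a stripe start, so the other three Mappers are launched but have nothing to read.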


            People

            • Assignee:
              Yin Huai
              Reporter:
              Yin Huai
            • Votes:
              0
              Watchers:
              1
