Hive / HIVE-4868

When an MR job reads an ORC file, some Mappers may receive no data to process


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      Suppose a stripe of an ORC file is 256 MB and the split size for an MR job is set to 64 MB. Currently, splits are created purely from byte ranges, without regard to stripe boundaries.
      Here is an example:

      |<-The start of a stripe                |<-The end of a stripe
      v                                       v
      |---------------------------------------|
         ^                        ^ 
         |<- The start of a split |<- The end of a split
      

      So, for some Mappers, the byte range of the split may contain no stripe start. Those Mappers will process zero records. We can improve how splits are created for ORC.
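
      The effect can be reproduced with a small model. The following is only an illustrative sketch in Python, not Hive's actual split code: the function names are hypothetical, the file is assumed to hold two 256 MB stripes starting at offset 0 (the real ORC header and footer are ignored), and a split is assumed to read exactly those stripes whose start offset falls inside its byte range, as described above. The last function shows one possible improvement, a split per stripe, purely as an assumption about what better split creation could look like.

```python
MB = 1024 * 1024


def byte_range_splits(file_len, split_size):
    """Cut [0, file_len) into consecutive (start, end) byte ranges."""
    return [(start, min(start + split_size, file_len))
            for start in range(0, file_len, split_size)]


def stripes_for_split(split, stripe_offsets):
    """Return the stripes whose first byte lies inside the split."""
    start, end = split
    return [off for off in stripe_offsets if start <= off < end]


def stripe_aligned_splits(stripe_offsets, file_len):
    """Hypothetical alternative: one split per stripe."""
    bounds = list(stripe_offsets) + [file_len]
    return [(bounds[i], bounds[i + 1]) for i in range(len(stripe_offsets))]


# Assumed layout: two 256 MB stripes, 64 MB split size.
stripe_offsets = [0, 256 * MB]
file_len = 512 * MB

splits = byte_range_splits(file_len, 64 * MB)
empty = [s for s in splits if not stripes_for_split(s, stripe_offsets)]
print(f"{len(splits)} splits, {len(empty)} with no stripe start")

aligned = stripe_aligned_splits(stripe_offsets, file_len)
print(f"{len(aligned)} stripe-aligned splits")
```

      Under these assumptions, 6 of the 8 byte-range splits contain no stripe start, so their Mappers would have nothing to read, while the stripe-aligned variant yields 2 splits that each cover exactly one stripe.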

            People

              yhuai Yin Huai