Hive / HIVE-20254

CheckNonCombinablePathCallable is buggy

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels:
      None

      Description

      CombineHiveInputFormat lets users exclude some of their inputs from being combined (by implementing AvoidSplitCombination).

      We spotted a problem with this when our query tried to read a lot of partitions (more than 100). When there are more than 100 input paths, the combinability check is run in parallel by:

      • dividing the input path array into several chunks (each chunk with no more than 100 paths)
      • submitting each chunk to a CheckNonCombinablePathCallable
      • having each CheckNonCombinablePathCallable return the set of indices of the paths that must not be combined
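
      The chunking step above can be sketched roughly as follows. This is a minimal illustration, not Hive's actual code: the class name ChunkSketch, the method chunkRanges, and the constant MAX_PATHS_PER_CHUNK are hypothetical, and the chunk size of 100 is taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {
    // Assumed chunk size; the description mentions chunks of at most 100 paths.
    static final int MAX_PATHS_PER_CHUNK = 100;

    /** Returns {start, length} pairs that together cover indices 0..numPaths-1. */
    static List<int[]> chunkRanges(int numPaths) {
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < numPaths; start += MAX_PATHS_PER_CHUNK) {
            int length = Math.min(MAX_PATHS_PER_CHUNK, numPaths - start);
            ranges.add(new int[] {start, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 250 paths are split into three chunks: 0..99, 100..199, 200..249.
        for (int[] range : chunkRanges(250)) {
            System.out.println("start=" + range[0] + " length=" + range[1]);
        }
    }
}
```

      Each {start, length} pair would then be handed to one callable, which is why the start offset matters when the per-chunk results are merged.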

      The problem is that CheckNonCombinablePathCallable returns a set of relative indices (positions inside the chunk) instead of absolute indices into the full path array. The returned indices are therefore always smaller than 100, so paths at position 100 or beyond in the array are never taken into account when deciding which inputs to avoid combining.
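
      The effect of relative versus absolute indices can be reproduced with a small simulation. This is a hedged sketch under stated assumptions, not the Hive implementation or its actual fix: IndexOffsetSketch, checkChunk, and collect are hypothetical names, the predicate stands in for AvoidSplitCombination, and the chunks are evaluated inline rather than on a thread pool to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.function.IntPredicate;

public class IndexOffsetSketch {
    /** Hypothetical stand-in for the per-chunk check described above. */
    static Callable<Set<Integer>> checkChunk(int start, int length,
                                             IntPredicate nonCombinable,
                                             boolean absolute) {
        return () -> {
            Set<Integer> result = new HashSet<>();
            for (int i = 0; i < length; i++) {
                if (nonCombinable.test(start + i)) {
                    // Relative variant records i (position in the chunk);
                    // absolute variant records start + i (position in the array).
                    result.add(absolute ? start + i : i);
                }
            }
            return result;
        };
    }

    /** Merges the per-chunk results for numPaths paths, 100 paths per chunk. */
    static Set<Integer> collect(int numPaths, IntPredicate nonCombinable, boolean absolute) {
        Set<Integer> all = new HashSet<>();
        for (int start = 0; start < numPaths; start += 100) {
            int length = Math.min(100, numPaths - start);
            try {
                all.addAll(checkChunk(start, length, nonCombinable, absolute).call());
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        IntPredicate onlyPath150 = i -> i == 150;
        System.out.println(collect(250, onlyPath150, false)); // prints [50]
        System.out.println(collect(250, onlyPath150, true));  // prints [150]
    }
}
```

      With relative indices, path 150 is reported as index 50, so the wrong path is excluded from combining and the intended one is not.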

              People

              • Assignee:
                Unassigned
                Reporter:
                q.xu Qinghui Xu
              • Votes:
                0
                Watchers:
                2
