SQOOP-603

Support small intervals in IntegerSplitter implementation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.2
    • Fix Version/s: 1.4.3
    • Component/s: None
    • Labels: None

      Description

      IntegerSplitter currently creates splits of the following form:

      minimal value <= x < splitPoint1
      splitPoint1 <= x < splitPoint2
      ...
      splitPointN <= x <= maximal value
      

      Note that the upper bound always uses the condition "<", except in the last split, which uses "<=". This is perfectly fine when creating a reasonable number of splits over a very large interval.
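
The boundary rule described above can be sketched in a few lines of Java. This is only an illustration, not Sqoop's actual code: the class `SplitConditions`, the helper `conditions`, and the column name `x` are hypothetical, and the boundaries are passed in directly rather than computed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitConditions {
    // Hypothetical helper (not part of Sqoop): turns an ordered list of
    // boundaries (minimal value, split points, maximal value) into one
    // condition per split.
    static List<String> conditions(String column, List<Long> boundaries) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            long low = boundaries.get(i);
            long high = boundaries.get(i + 1);
            boolean last = (i + 2 == boundaries.size());
            // Every split excludes its upper bound, except the last one.
            String upperOp = last ? " <= " : " < ";
            result.add(column + " >= " + low + " AND " + column + upperOp + high);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative boundaries on a larger interval.
        for (String c : conditions("x", Arrays.asList(0L, 100L, 200L, 300L))) {
            System.out.println(c);
        }
    }
}
```

On a wide interval the single extra value in the last split is negligible compared with the width of each split.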

      This approach, however, causes issues on very small intervals. For example, the following splits will be created on the interval [0, 5] with 5 splits:

      • 0 <= x < 1
      • 1 <= x < 2
      • 2 <= x < 3
      • 3 <= x < 4
      • 4 <= x <= 5

      Notice that every split contains exactly one value except the last one, which contains two: 4 and 5. This becomes a serious problem when, for example, the user needs one split per partition: one mapper ends up moving two partitions, so it takes twice as long as the others and holds up the entire job.
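
The imbalance can be checked with a small Java sketch. The first method counts the integers in each split under the current scheme. The second shows one possible way to balance the ranges by making every split a closed interval. It is a hypothetical alternative for illustration and is not taken from the attached patch; all names here (`SmallIntervalDemo`, `valuesPerSplit`, `balancedClosedRanges`) are assumptions, and the sketch assumes min <= max and at least one split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmallIntervalDemo {
    // Number of integers in each split when every split is [low, high)
    // except the last one, which is [low, high].
    static List<Long> valuesPerSplit(List<Long> boundaries) {
        List<Long> counts = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            long width = boundaries.get(i + 1) - boundaries.get(i);
            boolean last = (i + 2 == boundaries.size());
            counts.add(last ? width + 1 : width);
        }
        return counts;
    }

    // Hypothetical alternative (not the SQOOP-603 patch): closed ranges
    // [low, high] whose sizes differ by at most one value.
    static List<long[]> balancedClosedRanges(long min, long max, int numSplits) {
        long total = max - min + 1;            // count of integers in [min, max]
        long n = Math.min(numSplits, total);   // never create empty splits
        long base = total / n;
        long extra = total % n;                // the first 'extra' splits get one more value
        List<long[]> ranges = new ArrayList<>();
        long low = min;
        for (long i = 0; i < n; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[] {low, low + size - 1});
            low += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Boundaries from the example above: interval [0, 5] with 5 splits.
        System.out.println(valuesPerSplit(Arrays.asList(0L, 1L, 2L, 3L, 4L, 5L)));
        // With closed ranges, 6 splits on [0, 5] hold one value each.
        for (long[] r : balancedClosedRanges(0, 5, 6)) {
            System.out.println(r[0] + " <= x <= " + r[1]);
        }
    }
}
```

The first line printed is `[1, 1, 1, 1, 2]`, matching the example. With closed ranges, six splits on [0, 5] each cover a single value, which is the one-split-per-partition layout the description asks for.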

      Jarcec

      1. SQOOP-603.patch
        3 kB
        Jarek Jarcec Cecho

          Activity

          Jarek Jarcec Cecho created issue
          Jarek Jarcec Cecho made changes
            Remote Link: (none) -> This issue links to "Review board (Web Link)" [ 11149 ]
          Jarek Jarcec Cecho made changes
            Attachment: (none) -> SQOOP-603.patch [ 12545921 ]
          Jarek Jarcec Cecho made changes
            Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Cheolsoo Park made changes
            Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
            Resolution: (none) -> Fixed [ 1 ]

            People

            • Assignee: Jarek Jarcec Cecho
            • Reporter: Jarek Jarcec Cecho
            • Votes: 1
            • Watchers: 5
