  1. Sqoop (Retired)
  2. SQOOP-1714

DateSplitter makes wrong splits

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.4
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: CentOS 6.4 CDH-5.1.0

    Description

      If the split-by column is a Date type, Sqoop sends a query to read MIN(date) and MAX(date) and passes those two values to DateSplitter. DateSplitter converts them to long values and divides the range by num-mappers. This method is wrong. If MIN(date) and MAX(date) are 2013-09-26 and 2013-09-28, the range covers 3 days. But java.sql.Date#getTime for 2013-09-28 returns the value for 2013-09-28 00:00:00, so maxVal - minVal covers only two days.
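
      The off-by-one can be checked with a small sketch. The class and method names here are hypothetical and are not part of Sqoop; the millisecond span is rounded to whole days so that the local time zone does not affect the result.

```java
import java.sql.Date;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DateRangeDemo {
    static final long MS_PER_DAY = 24L * 3600 * 1000;

    // Span between the two dates as seen through getTime(); both values are
    // local midnight, so the result is a whole number of days after rounding.
    static long millisSpanInDays(String min, String max) {
        long minVal = Date.valueOf(min).getTime();
        long maxVal = Date.valueOf(max).getTime();
        return Math.round((maxVal - minVal) / (double) MS_PER_DAY);
    }

    // Number of distinct calendar days in the inclusive range [min, max].
    static long inclusiveDayCount(String min, String max) {
        return ChronoUnit.DAYS.between(LocalDate.parse(min), LocalDate.parse(max)) + 1;
    }

    public static void main(String[] args) {
        System.out.println(millisSpanInDays("2013-09-26", "2013-09-28"));  // 2
        System.out.println(inclusiveDayCount("2013-09-26", "2013-09-28")); // 3
    }
}
```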

      I encountered this issue when importing a Teradata table. Given dates between 2013-09-26 and 2013-09-28 and num-mappers=3, there are 3 tasks with the following conditions:

      1. date >= 2013-09-26 and date < 2013-09-26
      2. date >= 2013-09-26 and date < 2013-09-27
      3. date >= 2013-09-27 and date <= 2013-09-28

      The first task matches no rows, and the last one covers two days.

      Because the difference between minVal and maxVal is two days (2*24*3600*1000 ms), the split size is 2/3 of a day. When minVal plus one split size is converted back to a Date, it is still 2013-09-26, which is why the first partition is empty.
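
      The three conditions can be reproduced with a minimal sketch of this kind of millisecond-based splitting. This is not Sqoop's DateSplitter source: the class and method names are hypothetical, and midnight UTC is assumed as a stand-in for java.sql.Date#getTime so that the output is the same on any machine.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

public class NaiveDateSplitSketch {
    static final long MS_PER_DAY = 24L * 3600 * 1000;

    // Midnight UTC of the given date (assumed stand-in for java.sql.Date#getTime).
    static long toMillis(String isoDate) {
        return LocalDate.parse(isoDate).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Drops the time-of-day part, as happens when a long is rendered as a date.
    static LocalDate toDate(long millis) {
        return LocalDate.ofEpochDay(Math.floorDiv(millis, MS_PER_DAY));
    }

    // Builds one WHERE condition per split by dividing the millisecond range evenly.
    static List<String> splitConditions(String col, String min, String max, int numSplits) {
        long minVal = toMillis(min);
        long maxVal = toMillis(max);
        long tick = Math.max(1, (maxVal - minVal) / numSplits); // 2/3 day in the example
        List<String> conditions = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            boolean last = (i == numSplits - 1);
            LocalDate lo = toDate(minVal + i * tick);
            LocalDate hi = last ? toDate(maxVal) : toDate(minVal + (i + 1) * tick);
            conditions.add(col + " >= " + lo + " AND " + col + (last ? " <= " : " < ") + hi);
        }
        return conditions;
    }

    public static void main(String[] args) {
        // Prints the three conditions listed in the description above.
        splitConditions("date", "2013-09-26", "2013-09-28", 3).forEach(System.out::println);
    }
}
```

      With these inputs the split points fall at 2013-09-26 00:00, 2013-09-26 16:00, 2013-09-27 08:00 and 2013-09-28 00:00; truncating each to a date gives an empty first range and a two-day last range.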

          People

            Assignee: Unassigned
            Reporter: Benyi Wang (bewang.tech)