Uploaded image for project: 'Sqoop (Retired)'
  1. Sqoop (Retired)
  2. SQOOP-1946

DateSplitter relies on database string-to-date conversion when creating splits based on date columns

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.4.5
    • None
    • connectors/generic
    • None

    Description

      When running a sqoop import job with a split-by column of data type DATE, the DateSplitter relies on the database implicit string-to-date convertion when generating the lower and upper bound clauses for the splits. The splits are generated with clauses similar to the following:

      date_col >= '2013-08-26 00:00:00.0'
      date_col <= '2013-08-26 00:00:00.0'
      

      This forces the database to either implcitly convert the date_col to string or implicitly cast the literal string to a date type. In case the database default date format is not the expected, this could lead to either exceptions, as in the example below, or to unexpected behavior (string comparison of to dates in different string formats).

      For Oracle databases, for example, we may see the following errors when the database cannot implicitly convert the string above to date:

      2014-12-23 12:38:25,690 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
      
      2014-12-23 12:38:25,745 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
      
      2014-12-23 12:38:25,860 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: select * from schema.TABLE where ( date_col >= '2013-08-26 00:00:00.0' ) AND ( date_col <= '2013-08-26 00:00:00.0' )
      
      2014-12-23 12:38:25,909 ERROR [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Top level exception:  
      
      java.sql.SQLDataException: ORA-01861: literal does not match format string 
      

      A workaround for that is to set the default date format for the database user used by Sqoop to the "expected" format using a logon trigger in Oracle, like the one below:

      CREATE OR REPLACE TRIGGER tr_a_l_set_date_format
         AFTER LOGON ON DATABASE WHEN (USER = 'SQOOP_USER')
         BEGIN
           execute immediate 'alter session set nls_date_format="yyyy-mm-dd hh24:mi:ss"';
         END;
      

      A better form, though, would be to explicitly convert the date string literal to a DATE using a specific format, in the exact same way that OracleManager.datetimeToQueryString() does.

      Attachments

        1. SQOOP-1946.patch
          0.8 kB
          Michael Hsu

        Issue Links

          Activity

            People

              Unassigned Unassigned
              asdaraujo Andre Araujo
              Votes:
              2 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m