  Sqoop / SQOOP-423

Sqoop import of timestamps to Avro from Postgres - Timezone Issue

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I am running sqoop-1.3.0-cdh3u2 on a Mac. When I run a Sqoop import from a Postgres table with columns of type 'timestamp without time zone', the values are converted to longs in my local operating system's time zone, even after starting Hadoop with TZ=GMT or passing HADOOP_OPTS="-Duser.timezone=GMT". My goal is to import long representations in GMT rather than in my operating system's time zone.

      Postgres example:

      acamp_id |     start_time      |      end_time       
      ----------+---------------------+---------------------
             1 | 2008-01-01 00:00:00 | 2011-12-16 00:00:00
      

      After import, the values are 8 hours ahead, even with TZ=GMT and user.timezone set properly (this is the JSON representation of the parsed imported Avro file):

      {"acamp_id": 1, "end_time": 1324022400000, "start_time": 1199174400000}
      

      date utility invocation:

      lynngoh@unknown:~$ date -u -r 1199174400
      Tue Jan  1 08:00:00 UTC 2008
      

        Activity

        Marcio Silva added a comment:

        A workaround you can try is passing the user.timezone property as a MapReduce child JVM option. We saw a similar error with Oracle Date mappings in Avro and "resolved" it by specifying the system time zone that was being used to create the java.sql.Timestamp values.

        The HADOOP_OPTS setting only affects the job-submission process; since the actual import code runs on the cluster, you need to change the setting there as well:

          
          -D mapred.child.java.opts="-Duser.timezone=GMT"
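
        The effect described above can be reproduced outside Sqoop with a few lines of plain Java. This is a standalone sketch, not Sqoop's actual code path: it assumes the JDBC driver materializes the database's wall-clock value via something equivalent to Timestamp.valueOf in the JVM's default time zone, which is why the child JVM's user.timezone determines the stored long.

        ```java
        import java.sql.Timestamp;
        import java.util.TimeZone;

        public class TimezoneShiftDemo {
            // Epoch millis produced for the same wall-clock string under a given zone.
            static long epochFor(String zoneId) {
                TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
                // Same wall-clock value Postgres returned for start_time.
                return Timestamp.valueOf("2008-01-01 00:00:00").getTime();
            }

            public static void main(String[] args) {
                long pacific = epochFor("America/Los_Angeles"); // 1199174400000
                long gmt = epochFor("GMT");                     // 1199145600000
                System.out.println(pacific - gmt);             // 28800000 ms = 8 hours
            }
        }
        ```

        The 1199174400000 produced under a Pacific-time zone matches the value in the imported Avro record above, which is why forcing -Duser.timezone=GMT on the child JVMs changes the result.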
        
        Paul Mazak added a comment:

        This was a major issue for us. We had to re-import a bunch of data.
        Sqoop imports a SQL Server timestamp, which the driver turns into a java.sql.Timestamp, and then calls ((Timestamp) o).getTime(), which returns a long relative to the time zone of the executing datanode, not the time zone of the SQL Server instance where the value was created. ((Timestamp) o).toString() would be better, since it reproduces what SQL Server shows when selecting the column.

        We've since added the workaround suggested above. BTW, the property is now called:

        -D mapreduce.map.java.opts="-Duser.timezone=GMT"
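
        The getTime()-versus-toString() point above can be demonstrated in isolation. Again a standalone sketch, not Sqoop's code: toString() reproduces the wall-clock text the database showed, while getTime() yields a zone-dependent epoch value.

        ```java
        import java.sql.Timestamp;
        import java.util.TimeZone;

        public class WallClockVsEpochDemo {
            public static void main(String[] args) {
                for (String zone : new String[] {"GMT", "America/Los_Angeles"}) {
                    TimeZone.setDefault(TimeZone.getTimeZone(zone));
                    Timestamp ts = Timestamp.valueOf("2008-01-01 00:00:00");
                    // toString() is the same in both zones; getTime() differs by 8 hours.
                    System.out.println(zone + ": " + ts + " -> " + ts.getTime());
                }
            }
        }
        ```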
        simran added a comment (edited):

        But how do I add it to a sqoop job, Paul Mazak / Marcio Silva?

        I tried:

        sqoop job -Duser.timezone=GMT --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username root --password 'PASSWORD' --table TABLE_NAME --incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir PATH_TO_DIRECTORY --hive-database DB_NAME --hive-drop-import-delims --null-string '\N' --null-non-string '\N' --fields-terminated-by '\001' --input-null-string '\N' --input-null-non-string '\N' --input-null-non-string '\N' --input-fields-terminated-by '\001'

        and

        sqoop job -Dmapred.child.java.opts="-Duser.timezone=GMT" --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username root --password 'PASSWORD' --table TABLE_NAME --incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir PATH_TO_DIRECTORY --hive-database DB_NAME --hive-drop-import-delims --null-string '\N' --null-non-string '\N' --fields-terminated-by '\001' --input-null-string '\N' --input-null-non-string '\N' --input-null-non-string '\N' --input-fields-terminated-by '\001'

        but none of these really worked.


          People

          • Assignee: Unassigned
          • Reporter: Lynn Goh
          • Votes: 3
          • Watchers: 5
