  1. Sqoop (Retired)
  2. SQOOP-3245

Documentation for timezone handling with Oracle and Parquet may be confusing

Details

    • Improvement
    • Status: Open
    • Minor
    • Resolution: Unresolved

    Description

      The current documentation does not mention that --as-parquetfile converts all date/timestamp columns to Long values (milliseconds since epoch), and that the values are adjusted to the session timezone during the conversion.

      This can cause inconsistencies when the data was inserted in a timezone that differs from the timezone of the host running the sqoop command.
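
      The effect described above can be illustrated with a minimal sketch. This is not Sqoop's code; it only assumes that a zone-less timestamp is read as local time in the session timezone before being turned into milliseconds since epoch. The helper to_epoch_millis, the sample value, and the fixed UTC-08:00 offset are all hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(wall_clock: str, utc_offset_hours: int) -> int:
    """Read a zone-less timestamp as local time at a fixed UTC offset
    and return milliseconds since the Unix epoch."""
    session_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S").replace(tzinfo=session_tz)
    return (local - EPOCH) // timedelta(milliseconds=1)


# Hypothetical value stored in a DATE/TIMESTAMP column (no zone information).
stored = "2017-10-01 12:00:00"

as_gmt = to_epoch_millis(stored, 0)    # session timezone GMT
as_west = to_epoch_millis(stored, -8)  # session timezone at UTC-08:00

# Same stored value, two different Longs: they differ by the 8-hour offset.
print(as_gmt, as_west, (as_west - as_gmt) // 3_600_000)
```

      Under this assumption, two imports of the same row from hosts or sessions with different timezones would write different Long values to the Parquet file, which is the inconsistency this issue asks the documentation to call out.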

      In addition, the current documentation around Oracle says the below:

      Oracle also includes the additional date/time types TIMESTAMP WITH TIMEZONE and TIMESTAMP WITH LOCAL TIMEZONE. To support these types, the user’s session timezone must be specified. By default, Sqoop will specify the timezone "GMT" to Oracle. You can override this setting by specifying a Hadoop property oracle.sessionTimeZone on the command-line when running a Sqoop job. For example:
      

      What is not mentioned is that this applies only when OraOop (--direct) is enabled. The passage can also be read as saying that only 'TIMESTAMP WITH TIMEZONE' and 'TIMESTAMP WITH LOCAL TIMEZONE' columns are affected, rather than the entire session being set to the GMT timezone.

          People

            Unassigned Unassigned
            jphelps Jason Phelps
            Dates

              Created:
              Updated: