Sqoop
  1. Sqoop
  2. SQOOP-456

Generated Java class does not distinguish date and timestamp in Oracle DB

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: codegen
    • Labels:
      None

      Description

      In genereated Java class, both date and timestamp are considered as timestamps. For examples, the following getter method is generated for a date column:

       
      public void readFields(ResultSet __dbResults) throws SQLException {
        this.__cur_result_set = __dbResults;
        this.DATA_COL0 = JdbcWritableBridge.readTimestamp(1, __dbResults);
      }
      

      This happens because both date and timestamp is labeled as Types.TIMESTAMP by ConnManager. The fix seems straightforward. In ClassWriter, the new method getColTypeNamesForTable introduced by Bilung at SQOOP-352 should replace getColTypes().

        Activity

        Hide
        Cheolsoo Park added a comment -

        The Oracle JDBC driver has a long history on date and timestamp. In shorts, versions < 9.2 && > 11.1 auto-cast date to timestamp. But the rest of versions do not.

        For more details, please refer to the following link: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#08_01

        Show
        Cheolsoo Park added a comment - The Oracle JDBC driver has a long history on date and timestamp. In shorts, versions < 9.2 && > 11.1 auto-cast date to timestamp. But the rest of versions do not. For more details, please refer to the following link: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#08_01
        Hide
        Cheolsoo Park added a comment -

        The best way to fix the problems is probably to override the getColumnTypes() method in OracleManager. If the type is timestamp, but the type name is "DATE", the column type should be labelled as date not timestamp. Since getColumnTypes() is used everywhere to determine the type of columns in Sqoop, if we fixed this method, ClassWriter would generate code that reads dates.

        Show
        Cheolsoo Park added a comment - The best way to fix the problems is probably to override the getColumnTypes() method in OracleManager. If the type is timestamp, but the type name is "DATE", the column type should be labelled as date not timestamp. Since getColumnTypes() is used everywhere to determine the type of columns in Sqoop, if we fixed this method, ClassWriter would generate code that reads dates.
        Hide
        Arvind Prabhakar added a comment -

        Good find Cheolsoo. Changing the code generator will fix this problem, but will introduce backward incompatibility for previously generated records. Any suggestion on how to handle that?

        Show
        Arvind Prabhakar added a comment - Good find Cheolsoo. Changing the code generator will fix this problem, but will introduce backward incompatibility for previously generated records. Any suggestion on how to handle that?
        Hide
        Cheolsoo Park added a comment - - edited

        Hi Arvind,

        Thanks for your comment. I think that we would need to introduce a new option something like --mapDateToTimestamp. In fact, this is the property that Oracle provides for this particular problem. By default, we could set --mapDatetoTimestamp to true so that backward incompatibility wouldn't be introduced.

        Show
        Cheolsoo Park added a comment - - edited Hi Arvind, Thanks for your comment. I think that we would need to introduce a new option something like --mapDateToTimestamp. In fact, this is the property that Oracle provides for this particular problem. By default, we could set --mapDatetoTimestamp to true so that backward incompatibility wouldn't be introduced.
        Hide
        Arvind Prabhakar added a comment -

        Sounds good to me. As long as we are backward compatible by default, the user should have the ability to work around this issue. Introducing a option that allows user to override this functionality sounds like a good plan.

        Show
        Arvind Prabhakar added a comment - Sounds good to me. As long as we are backward compatible by default, the user should have the ability to work around this issue. Introducing a option that allows user to override this functionality sounds like a good plan.
        Hide
        Jarek Jarcec Cecho added a comment -

        It appears to me that one of the recommended solutions is to provide property -Doracle.jdbc.V8Compatible="true". As we are supporting -D options (properties) on the command line does it make sense to introduce another command line option?

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - It appears to me that one of the recommended solutions is to provide property -Doracle.jdbc.V8Compatible="true". As we are supporting -D options (properties) on the command line does it make sense to introduce another command line option? Jarcec
        Hide
        Cheolsoo Park added a comment -

        Agreed. I am marking this jira as a won't fix.

        Thanks for your comment Jarcec!

        Show
        Cheolsoo Park added a comment - Agreed. I am marking this jira as a won't fix. Thanks for your comment Jarcec!
        Hide
        Jonathan Leech added a comment -

        Problem as I see it even with system properties / jdbc driver hacks / column mappings, is that with splits on an Oracle date column, Sqoop's generated sql statements have something like to_timestamp('format', 'the date of the split as text') and Oracle ain't never gonna treat that as a date no matter what, at least for the purposes of using either a date in an index or to go to a range of date partitions.

        I may be missing something but my only way to avoid full scans on tables was to manually split the data with multiple invocations of Sqoop with different output directories and --where clauses, which worked but was annoying.

        The different flavors of -D props for the Oracle driver would work if the splits queries used a bound parameter with a java.sql.Timestamp instead of to_timestamp, with the added benefit of a slight gain from Oracle not having to hard parse, and even crazy DBA stuff like query baselining etc. could then be possible. For that matter another command-line arg --hints would be pretty useful.

        If you reopen this and let me loose on your codebase I will fix it for you. I would love to make it that much easier for people to migrate from Oracle.

        Show
        Jonathan Leech added a comment - Problem as I see it even with system properties / jdbc driver hacks / column mappings, is that with splits on an Oracle date column, Sqoop's generated sql statements have something like to_timestamp('format', 'the date of the split as text') and Oracle ain't never gonna treat that as a date no matter what, at least for the purposes of using either a date in an index or to go to a range of date partitions. I may be missing something but my only way to avoid full scans on tables was to manually split the data with multiple invocations of Sqoop with different output directories and --where clauses, which worked but was annoying. The different flavors of -D props for the Oracle driver would work if the splits queries used a bound parameter with a java.sql.Timestamp instead of to_timestamp, with the added benefit of a slight gain from Oracle not having to hard parse, and even crazy DBA stuff like query baselining etc. could then be possible. For that matter another command-line arg --hints would be pretty useful. If you reopen this and let me loose on your codebase I will fix it for you. I would love to make it that much easier for people to migrate from Oracle.
        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Jonathan Leech,
        thank you very much for your feedback! Sqoop should always follow the data types provided by the JDBC drivers (or the overrides on command line) and that should be accomplished by the property Doracle.jdbc.V8Compatible. If this is not the case and sqoop is still using different type than please create a new JIRA with description on what circumstances this occurs and we will be more than happy to take a look!

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Jonathan Leech , thank you very much for your feedback! Sqoop should always follow the data types provided by the JDBC drivers (or the overrides on command line) and that should be accomplished by the property Doracle.jdbc.V8Compatible . If this is not the case and sqoop is still using different type than please create a new JIRA with description on what circumstances this occurs and we will be more than happy to take a look! Jarcec

          People

          • Assignee:
            Unassigned
            Reporter:
            Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development