Sqoop
  1. Sqoop
  2. SQOOP-481

Sqoop import with --hive-import using wrong column names in --columns throws a NPE

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.2
    • Component/s: None
    • Labels:
      None

      Description

      To reproduce the error,

      1) Create a table "foo" with a column name "I" on Oracle DB
      2) Run sqoop import --connect jdbc:oracle:thin:@//localhost/xe --username **** --password **** --verbose --table foo --split-by i --columns i --hive-import

      This generates the following call stack:

      12/05/01 16:12:00 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
      java.lang.NullPointerException
      	at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:162)
      	at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:195)
      	at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:394)
      	at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:455)
      	at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      	at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
      	at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
      	at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
      	at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
      

      The reason is simple. In the following lines of code:

      Integer colType = columnTypes.get(col);
      ...
      tring hiveColType = connManager.toHiveType(colType);
      

      colType is null because column "i" does not exist in the table "foo" but "I" exists. Now toHiveType(int colType) tries to autocast a null to a primitive int, resulting a NPE.

      It would be better if more informative message is provided rather than a random NPE.

      1. SQOOP_481.patch
        6 kB
        Cheolsoo Park
      2. SQOOP_481-2.patch
        6 kB
        Cheolsoo Park

        Activity

        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6 #117 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/117/)
        SQOOP-481. Sqoop import with --hive-import using wrong column names in --columns throws a NPE.

        (Cheolsoo Park via Jarek Jarcec Cecho) (Revision 1345225)

        Result = SUCCESS
        jarcec :
        Files :

        • /sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java
        • /sqoop/trunk/src/java/org/apache/sqoop/util/SqlTypeMap.java
        • /sqoop/trunk/src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6 #117 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/117/ ) SQOOP-481 . Sqoop import with --hive-import using wrong column names in --columns throws a NPE. (Cheolsoo Park via Jarek Jarcec Cecho) (Revision 1345225) Result = SUCCESS jarcec : Files : /sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java /sqoop/trunk/src/java/org/apache/sqoop/util/SqlTypeMap.java /sqoop/trunk/src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java
        Jarek Jarcec Cecho made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 1.4.2-incubating [ 12320141 ]
        Resolution Fixed [ 1 ]
        Hide
        Jarek Jarcec Cecho added a comment -

        Committed revision 1345225.

        Thank you Cheolsoo for your patch!

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Committed revision 1345225. Thank you Cheolsoo for your patch! Jarcec
        Hide
        Jarek Jarcec Cecho added a comment -

        I personally like this approach as it solve the issue with proper error message, it's quite clean and I do not believe that it will introduce backward incompatibility.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - I personally like this approach as it solve the issue with proper error message, it's quite clean and I do not believe that it will introduce backward incompatibility. Jarcec
        Cheolsoo Park made changes -
        Attachment SQOOP_481-2.patch [ 12529907 ]
        Hide
        Cheolsoo Park added a comment -

        In my patch, I introduced a new class SqlTypeMap that sub-classes Java HashMap and encapsulates validation logic.

        The main reason why I took this approach is because it is much cleaner - eliminating need to add a null check to every auto-unboxing code.

        I'd like to know what others think.

        Review board:
        https://reviews.apache.org/r/4974/

        Show
        Cheolsoo Park added a comment - In my patch, I introduced a new class SqlTypeMap that sub-classes Java HashMap and encapsulates validation logic. The main reason why I took this approach is because it is much cleaner - eliminating need to add a null check to every auto-unboxing code. I'd like to know what others think. Review board: https://reviews.apache.org/r/4974/
        Cheolsoo Park made changes -
        Attachment SQOOP_481.patch [ 12525372 ]
        Cheolsoo Park made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4974/
        -----------------------------------------------------------

        Review request for Sqoop, Arvind Prabhakar and Jarek Cecho.

        Summary
        -------

        The changes include:

        1) Introduce a new class SqlTypeMap (subclass of HashMap) that validates values inside the get() and put() methods. This guarantees that the values in the map are always valid (i.e. not null) so that a NPE during auto unboxing can be prevented.

        2) Replace HashMap<String, Integer> with SqlTypeMap<String, Integer> in code.

        This addresses bug SQOOP-481.
        https://issues.apache.org/jira/browse/SQOOP-481

        Diffs


        /src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 1333183
        /src/java/org/apache/sqoop/manager/SqlManager.java 1333183
        /src/java/org/apache/sqoop/util/SqlTypeMap.java PRE-CREATION

        Diff: https://reviews.apache.org/r/4974/diff

        Testing
        -------

        Ran ant test, ant test -Dthirdparty=true, and ant checkstyle.

        Thanks,

        Cheolsoo

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4974/ ----------------------------------------------------------- Review request for Sqoop, Arvind Prabhakar and Jarek Cecho. Summary ------- The changes include: 1) Introduce a new class SqlTypeMap (subclass of HashMap) that validates values inside the get() and put() methods. This guarantees that the values in the map are always valid (i.e. not null) so that a NPE during auto unboxing can be prevented. 2) Replace HashMap<String, Integer> with SqlTypeMap<String, Integer> in code. This addresses bug SQOOP-481 . https://issues.apache.org/jira/browse/SQOOP-481 Diffs /src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 1333183 /src/java/org/apache/sqoop/manager/SqlManager.java 1333183 /src/java/org/apache/sqoop/util/SqlTypeMap.java PRE-CREATION Diff: https://reviews.apache.org/r/4974/diff Testing ------- Ran ant test, ant test -Dthirdparty=true, and ant checkstyle. Thanks, Cheolsoo
        Hide
        Cheolsoo Park added a comment - - edited

        Hi Arvind and Jarec, thank you for your suggestions!

        Indeed, failing fast is a good thing to do, and we should always prevent maps from having null values. In fact, fast-fast logic is already in place. For example, we call valueOf() when putting a new value into a map, and valueOf does not return a null.

        protected Map<String, Integer> getColumnTypesForRawQuery(String stmt)
        colTypes.put(colName, Integer.valueOf(typeId));
        

        But the problem remains because it is still possible for maps to not have specific keys, which cannot be detected until a look-up happens. For nonexistent keys, when the get() call is made is the earliest point we can fail. So I believe that adding a null check after get() is the best we can do.

        Integer sqlType = columnTypes.get(col);
        if (sqlType == null) {
           throw new IOException("Column " + col + " does not exist in table " + tableName);
        }
        String javaType = toJavaType(col, sqlType);
        

        Please correct me know if I misunderstand your suggestions.

        Show
        Cheolsoo Park added a comment - - edited Hi Arvind and Jarec, thank you for your suggestions! Indeed, failing fast is a good thing to do, and we should always prevent maps from having null values. In fact, fast-fast logic is already in place. For example, we call valueOf() when putting a new value into a map, and valueOf does not return a null. protected Map<String, Integer> getColumnTypesForRawQuery(String stmt) colTypes.put(colName, Integer .valueOf(typeId)); But the problem remains because it is still possible for maps to not have specific keys, which cannot be detected until a look-up happens. For nonexistent keys, when the get() call is made is the earliest point we can fail. So I believe that adding a null check after get() is the best we can do. Integer sqlType = columnTypes.get(col); if (sqlType == null ) { throw new IOException( "Column " + col + " does not exist in table " + tableName); } String javaType = toJavaType(col, sqlType); Please correct me know if I misunderstand your suggestions.
        Hide
        Jarek Jarcec Cecho added a comment -

        I would also prefer failing as soon as possible with proper description what has happened.

        E.g. Arvind +1

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - I would also prefer failing as soon as possible with proper description what has happened. E.g. Arvind +1 Jarcec
        Cheolsoo Park made changes -
        Summary Sqoop import with --have-import using wrong column names in --columns throws a NPE Sqoop import with --hive-import using wrong column names in --columns throws a NPE
        Hide
        Arvind Prabhakar added a comment -

        Thanks for finding this and the analysis Cheolsoo. One alternate way to consider handling this issue is to implement fail-fast logic. Which is to say that the moment a column map is constructed with empty or null column names in it, we should immediately register a failure.

        Show
        Arvind Prabhakar added a comment - Thanks for finding this and the analysis Cheolsoo. One alternate way to consider handling this issue is to implement fail-fast logic. Which is to say that the moment a column map is constructed with empty or null column names in it, we should immediately register a failure.
        Hide
        Cheolsoo Park added a comment -

        In fact, there are multiple places where the same issue can occur due to use of autoboxing. For example, in the following lines of code in ClassWriter.generateLoadLargeObjects() can also throw a NPE if "col" does not exist in the table because columnTypes.get() will return a null that in turn will be autocated to a primitive integer:

        int sqlType = columnTypes.get(col);
        String javaType = toJavaType(col, sqlType);
        

        Even worse, in some DBs including Oracle, it is very easy to run into this problem because column names are case-sensitive. For example, table has a column "i" but the user may specify "I" in an option.

        What I think we should do is:
        1) Identify all places where autoboxing takes place in Sqoop
        2) Surround them with a try-catch block (or add a null check) and print an informative error message such as: column 'x' does not exist in table 'y'.

        Please let me know if anyone has a better suggestion.

        Show
        Cheolsoo Park added a comment - In fact, there are multiple places where the same issue can occur due to use of autoboxing. For example, in the following lines of code in ClassWriter.generateLoadLargeObjects() can also throw a NPE if "col" does not exist in the table because columnTypes.get() will return a null that in turn will be autocated to a primitive integer: int sqlType = columnTypes.get(col); String javaType = toJavaType(col, sqlType); Even worse, in some DBs including Oracle, it is very easy to run into this problem because column names are case-sensitive. For example, table has a column "i" but the user may specify "I" in an option. What I think we should do is: 1) Identify all places where autoboxing takes place in Sqoop 2) Surround them with a try-catch block (or add a null check) and print an informative error message such as: column 'x' does not exist in table 'y'. Please let me know if anyone has a better suggestion.
        Cheolsoo Park made changes -
        Field Original Value New Value
        Description To reproduce the error,

        1) Create a table "foo" with a column name "I" on Oracle DB
        2) Run sqoop import --connect jdbc:oracle:thin:@//localhost/xe --username **** --password **** --verbose --table foo --split-by i --columns i --hive-import

        This generates the following call stack:

        12/05/01 16:12:00 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
        java.lang.NullPointerException
        at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:162)
        at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:195)
        at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:394)
        at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:455)
        at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
        at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
        at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)

        The reason is simple. In the following lines of code:

        {code}
        Integer colType = columnTypes.get(col);
        ...
        tring hiveColType = connManager.toHiveType(colType);
        {code}

        colType is null because column "i" does not exist in the table "foo" but "I" exists. Now toHiveType(int colType) tries to autocast a null to a primitive int, resulting a NPE.

        It would be better if more informative message is provided rather than a random NPE.
        To reproduce the error,

        1) Create a table "foo" with a column name "I" on Oracle DB
        2) Run sqoop import --connect jdbc:oracle:thin:@//localhost/xe --username **** --password **** --verbose --table foo --split-by i --columns i --hive-import

        This generates the following call stack:

        {code}
        12/05/01 16:12:00 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
        java.lang.NullPointerException
        at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:162)
        at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:195)
        at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:394)
        at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:455)
        at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
        at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
        at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
        {code}

        The reason is simple. In the following lines of code:

        {code}
        Integer colType = columnTypes.get(col);
        ...
        tring hiveColType = connManager.toHiveType(colType);
        {code}

        colType is null because column "i" does not exist in the table "foo" but "I" exists. Now toHiveType(int colType) tries to autocast a null to a primitive int, resulting a NPE.

        It would be better if more informative message is provided rather than a random NPE.
        Cheolsoo Park created issue -

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development