Sqoop
  1. Sqoop
  2. SQOOP-934

JDBC Connection can timeout after import but before hive import

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.2
    • Fix Version/s: 1.4.4
    • Component/s: None
    • Labels:
      None

      Description

      Our current import rutine imports data into HDFS and then tries to do Hive import. As the connection to the remote server is opened only once at the begging it might timeout during very long mapreduce job. I believe that we should ensure that the connection is still valid before performing the hive import.

      1. SQOOP-934-2.patch
        3 kB
        Raghav Kumar Gautam
      2. SQOOP-934.patch
        3 kB
        Raghav Kumar Gautam

        Issue Links

          Activity

          Hide
          Raghav Kumar Gautam added a comment -

          This patch discards the connection prior to hive script generation. So, a fresh connection will be used for generating hive script.

          Show
          Raghav Kumar Gautam added a comment - This patch discards the connection prior to hive script generation. So, a fresh connection will be used for generating hive script.
          Hide
          Venkat Ranganathan added a comment -

          Raghav, can you please create a review board link? Also, catching the exception from close if any is a good idea (if the connection is stale, this will force the next getConnection to return a valid connection as this will guarantee that we null out the connection variable.

          Show
          Venkat Ranganathan added a comment - Raghav, can you please create a review board link? Also, catching the exception from close if any is a good idea (if the connection is stale, this will force the next getConnection to return a valid connection as this will guarantee that we null out the connection variable.
          Hide
          Raghav Kumar Gautam added a comment -

          Link for review request:
          https://reviews.apache.org/r/11940/

          Show
          Raghav Kumar Gautam added a comment - Link for review request: https://reviews.apache.org/r/11940/
          Hide
          Raghav Kumar Gautam added a comment -

          Attaching updated patch.

          Show
          Raghav Kumar Gautam added a comment - Attaching updated patch.
          Hide
          ASF subversion and git services added a comment -

          Commit 217e2af3f8aa0329d370240749afd01b1e36d1db in branch refs/heads/trunk from Jarek Jarcec Cecho
          [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=217e2af ]

          SQOOP-934: JDBC Connection can timeout after import but before hive import

          (Raghav Kumar Gautam via Jarek Jarcec Cecho)

          Show
          ASF subversion and git services added a comment - Commit 217e2af3f8aa0329d370240749afd01b1e36d1db in branch refs/heads/trunk from Jarek Jarcec Cecho [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=217e2af ] SQOOP-934 : JDBC Connection can timeout after import but before hive import (Raghav Kumar Gautam via Jarek Jarcec Cecho)
          Hide
          Jarek Jarcec Cecho added a comment -

          Thank you Raghav for your contribution!

          Show
          Jarek Jarcec Cecho added a comment - Thank you Raghav for your contribution!
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop23 #947 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/947/)
          SQOOP-934: JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db)

          Result = SUCCESS
          jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db
          Files :

          • src/java/org/apache/sqoop/manager/GenericJdbcManager.java
          • src/java/org/apache/sqoop/manager/ConnManager.java
          • src/java/org/apache/sqoop/hive/HiveImport.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop23 #947 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/947/ ) SQOOP-934 : JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db Files : src/java/org/apache/sqoop/manager/GenericJdbcManager.java src/java/org/apache/sqoop/manager/ConnManager.java src/java/org/apache/sqoop/hive/HiveImport.java
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop20 #741 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/741/)
          SQOOP-934: JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db)

          Result = SUCCESS
          jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db
          Files :

          • src/java/org/apache/sqoop/manager/GenericJdbcManager.java
          • src/java/org/apache/sqoop/hive/HiveImport.java
          • src/java/org/apache/sqoop/manager/ConnManager.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop20 #741 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/741/ ) SQOOP-934 : JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db Files : src/java/org/apache/sqoop/manager/GenericJdbcManager.java src/java/org/apache/sqoop/hive/HiveImport.java src/java/org/apache/sqoop/manager/ConnManager.java
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop200 #756 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/756/)
          SQOOP-934: JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db)

          Result = SUCCESS
          jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db
          Files :

          • src/java/org/apache/sqoop/manager/ConnManager.java
          • src/java/org/apache/sqoop/hive/HiveImport.java
          • src/java/org/apache/sqoop/manager/GenericJdbcManager.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop200 #756 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/756/ ) SQOOP-934 : JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db Files : src/java/org/apache/sqoop/manager/ConnManager.java src/java/org/apache/sqoop/hive/HiveImport.java src/java/org/apache/sqoop/manager/GenericJdbcManager.java
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop100 #722 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/722/)
          SQOOP-934: JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db)

          Result = SUCCESS
          jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db
          Files :

          • src/java/org/apache/sqoop/manager/GenericJdbcManager.java
          • src/java/org/apache/sqoop/manager/ConnManager.java
          • src/java/org/apache/sqoop/hive/HiveImport.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop100 #722 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/722/ ) SQOOP-934 : JDBC Connection can timeout after import but before hive import (Revision 217e2af3f8aa0329d370240749afd01b1e36d1db) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=217e2af3f8aa0329d370240749afd01b1e36d1db Files : src/java/org/apache/sqoop/manager/GenericJdbcManager.java src/java/org/apache/sqoop/manager/ConnManager.java src/java/org/apache/sqoop/hive/HiveImport.java
          Hide
          Andrey Dmitriev added a comment -

          Hi,

          I'm sorry if this is wrong place to write, but we are using Sqoop v1.4.4

          14/05/27 13:49:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.0
          Sqoop 1.4.4-cdh5.0.0
          git commit id 8e266e052e423af592871e2dfe09d54c03f6a0e8

          And when I importing a table from Oracle which takes more than 1 hour to extract, I'm getting following error message at the stage when Sqoop tries to import data from temporary HDFS location to Hive:

          14/05/27 13:05:51 INFO mapreduce.ImportJobBase: Transferred 47.2606 GB in 6,389.4644 seconds (6.7206 MB/sec)
          14/05/27 13:05:51 INFO mapreduce.ImportJobBase: Retrieved 98235461 records.
          14/05/27 13:05:51 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@566d0085
          14/05/27 13:05:51 DEBUG hive.HiveImport: Hive.inputTable: WAREHOUSE.MY_BIG_TABLE
          14/05/27 13:05:51 DEBUG hive.HiveImport: Hive.outputTable: WAREHOUSE.MY_BIG_TABLE
          14/05/27 13:05:51 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM WAREHOUSE.MY_BIG_TABLE t WHERE 1=0
          14/05/27 13:05:51 DEBUG manager.SqlManager: Execute getColumnTypesRawQuery : SELECT t.* FROM WAREHOUSE.MY_BIG_TABLE t WHERE 1=0
          14/05/27 13:05:51 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-02396: exceeded maximum idle time, please connect again

          java.sql.SQLException: ORA-02396: exceeded maximum idle time, please connect again

          at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
          at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:389)

          With small tables (under 1 hour) everything is fine.

          This problems looks exactly as described in this (SQOOP-934) issue.

          Show
          Andrey Dmitriev added a comment - Hi, I'm sorry if this is wrong place to write, but we are using Sqoop v1.4.4 14/05/27 13:49:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.0 Sqoop 1.4.4-cdh5.0.0 git commit id 8e266e052e423af592871e2dfe09d54c03f6a0e8 And when I importing a table from Oracle which takes more than 1 hour to extract, I'm getting following error message at the stage when Sqoop tries to import data from temporary HDFS location to Hive: 14/05/27 13:05:51 INFO mapreduce.ImportJobBase: Transferred 47.2606 GB in 6,389.4644 seconds (6.7206 MB/sec) 14/05/27 13:05:51 INFO mapreduce.ImportJobBase: Retrieved 98235461 records. 14/05/27 13:05:51 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@566d0085 14/05/27 13:05:51 DEBUG hive.HiveImport: Hive.inputTable: WAREHOUSE.MY_BIG_TABLE 14/05/27 13:05:51 DEBUG hive.HiveImport: Hive.outputTable: WAREHOUSE.MY_BIG_TABLE 14/05/27 13:05:51 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM WAREHOUSE.MY_BIG_TABLE t WHERE 1=0 14/05/27 13:05:51 DEBUG manager.SqlManager: Execute getColumnTypesRawQuery : SELECT t.* FROM WAREHOUSE.MY_BIG_TABLE t WHERE 1=0 14/05/27 13:05:51 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-02396: exceeded maximum idle time, please connect again java.sql.SQLException: ORA-02396: exceeded maximum idle time, please connect again at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:389) With small tables (under 1 hour) everything is fine. This problems looks exactly as described in this ( SQOOP-934 ) issue.
          Hide
          Gwen Shapira added a comment -

          Andrey,

          Indeed, the patch does not solve the problem for Oracle.
          It fixes the issue for the genericJDBC connection manager, but unfortunately the OracleManager (which implements its own connection pool) was not fixed in this patch.

          Since this JIRA is already marked as resolved, it will be nice if you can open a new JIRA specifically for the Oracle case.

          As a work-around, I'd use Oraoop - which inherits from the generic connection manager (where the problem is resolved) without implementing its own connection pool.

          Show
          Gwen Shapira added a comment - Andrey, Indeed, the patch does not solve the problem for Oracle. It fixes the issue for the genericJDBC connection manager, but unfortunately the OracleManager (which implements its own connection pool) was not fixed in this patch. Since this JIRA is already marked as resolved, it will be nice if you can open a new JIRA specifically for the Oracle case. As a work-around, I'd use Oraoop - which inherits from the generic connection manager (where the problem is resolved) without implementing its own connection pool.
          Hide
          Andrey Dmitriev added a comment -

          Thank you Gwen for clarification. I opened new JIRA for the Oracle case SQOOP-1329

          Show
          Andrey Dmitriev added a comment - Thank you Gwen for clarification. I opened new JIRA for the Oracle case SQOOP-1329

            People

            • Assignee:
              Raghav Kumar Gautam
              Reporter:
              Jarek Jarcec Cecho
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development