Uploaded image for project: 'Sqoop'
  1. Sqoop
  2. SQOOP-3014

Sqoop with HCatalog import loose precision for large numbers that does not fit into double

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.6
    • Fix Version/s: 1.5.0
    • Component/s: hive-integration
    • Labels:
      None

      Description

      When using sqoop with HCatalog to import data from JDBC (I have tried Oracle11) all numbers that does not fit into double are loosing its precision or are distorted.

      Steps to reproduce:

      1) Create test table in Oracle and fill it with test data

      CREATE TABLE TEST_SQOOP_ERROR(ID VARCHAR(10), TYPE_NUMBER DECIMAL(22,5))
       
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-1', 454018528782.42006329)
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-2', 87658675864540185.123456789123456789) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-3', 87658675864540185.12345)
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-4', 87658675864540185.123)
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-5', 7658675864540185.12345) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-6', 7658675864540185.123456789) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-7', 658675864540185.12345)
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-8', 58675864540185.12345)
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-9', 8675864540185.12345) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-10', 675864540185.12345) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-11', 75864540185.12345) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-12', 35864540185.12345) 
      INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-13', 5864540185.12345) 
      

      2) Create table in Hive database

      CREATE TABLE pbe_test_sqoop_error(id string, type_number decimal(22,5)) STORED AS ORC;
      

      3) Import data from Oracle to Hive using sqoop

      export HADOOP_CLASSPATH=/opt/mantis/jdbc/oracle-11.2/ojdbc6.jar
      sqoop import -connect jdbc:oracle:thin:@//1.1.1.1:1521/XE --username  XXX --password  XXX --hcatalog-database  default --hcatalog-table pbe_test_sqoop_error  --driver oracle.jdbc.OracleDriver  --query 'SELECT id, type_number FROM MMDINGEST.TEST_SQOOP_ERROR WHERE $CONDITIONS' -m 1
      

      4) Display data from Hive table

      hive> select * from pbe_test_sqoop_error;
      OK
      row-1   454018528782.42004
      row-2   87658675864540192
      row-3   87658675864540192
      row-4   87658675864540192
      row-5   7658675864540185
      row-6   7658675864540185
      row-7   658675864540185.125
      row-8   58675864540185.125
      row-9   8675864540185.12305
      row-10  675864540185.12341
      row-11  75864540185.12344
      row-12  35864540185.12345
      row-13  5864540185.12345
      Time taken: 0.455 seconds, Fetched: 13 row(s)
      

      Only the values at line 1, 12, 13 are correct. At the lines 2-4 even the part of the number before dot is wrong. All looks correctly in Oracle as can be seen on the attached screenshot.

      The problem seems to be in the java class https://www.codatlas.com/github.com/apache/sqoop/branch-1.4.6/src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportHelper.java starting at line 437.

          } else if (hfsType == HCatFieldSchema.Type.DECIMAL) {
            BigDecimal bd = new BigDecimal(n.doubleValue(),
              MathContext.DECIMAL128);
            return HiveDecimal.create(bd);
          }
      

      all numbers, even those that are stored in BigDecimal are squeezed through double which leads to the precision lost The same issue could be at some places when working with large numbers.

      The following code fixes this issue:

      } else if (hfsType == HCatFieldSchema.Type.DECIMAL) {
            BigDecimal bd = val instanceof BigDecimal ?  (BigDecimal) val : new BigDecimal(n.doubleValue(), MathContext.DECIMAL128);
            return HiveDecimal.create(bd);
          }
      

      .

        Attachments

        1. SQOOP-3014.patch
          6 kB
          Zoltán Tóth
        2. oracle-sqoop-error.png
          53 kB
          Pavel Benes

          Issue Links

            Activity

              People

              • Assignee:
                Swank Zoltán Tóth
                Reporter:
                benesp Pavel Benes
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: