Sqoop (Retired) / SQOOP-3182

Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1')


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: no-release

    Description

      Sqoop1 (import + --incremental + --merge-key + --as-parquetfile) fails with (Can't parse input data: 'PAR1'). See the test case below.
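
      For context: 'PAR1' is the four-byte magic number that begins and ends every Parquet file, so this error indicates that the merge job is reading the existing Parquet output as plain text. A minimal check (using the target directory from the test case below):

      # Parquet files start with the magic bytes "PAR1"; a text-oriented
      # reader hands those bytes to the record parser as field data.
      hdfs dfs -cat /user/root/t1/*.parquet | head -c 4    # prints: PAR1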

      Test Case

      #################
      # STEP 01 - Create Table and Data
      #################
      
      export MYCONN=jdbc:oracle:thin:@oracle.sqoop.com:1521/db11g;
      export MYUSER=sqoop
      export MYPSWD=sqoop
      
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 date, c3 varchar(10), c4 timestamp)"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, sysdate, 'NEW ROW 1', sysdate)"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
      
      Output:
      ----------------------------------------------------------------
      | C1 | C2                    | C3        | C4                  |
      ----------------------------------------------------------------
      | 1  | 2017-05-06 06:59:02.0 | NEW ROW 1 | 2017-05-06 06:59:02 |
      ----------------------------------------------------------------
      
      #################
      # STEP 02 - Import Data into HDFS 
      #################
      
      hdfs dfs -rm -r /user/root/t1
      sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-01-01 00:00:00.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose 
      hdfs dfs -ls /user/root/t1/*.parquet
      parquet-tools cat --json 'hdfs://namenode/user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet'
      
      Output:
      17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Transferred 2.627 KB in 23.6174 seconds (113.8988 bytes/sec)
      17/05/06 07:01:34 INFO mapreduce.ImportJobBase: Retrieved 1 records.
      17/05/06 07:01:34 INFO tool.ImportTool:   --last-value 2017-05-06 07:01:09.0
      ~~~~~
      -rw-r--r--   3 root root       1144 2017-05-06 07:01 /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
      ~~~~~
      {"C1":"1","C2":"2017-05-06 06:59:02.0","C3":"NEW ROW 1","C4":"2017-05-06 06:59:02"}
      
      #################
      # STEP 03 - Insert New Row and Update Existing Row
      #################
      
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (2, sysdate, 'NEW ROW 2', sysdate)"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "update t1 set c3 = 'UPDATE 1', c4 = sysdate where c1 = 1"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1 order by c1"
      
      Output:
      ----------------------------------------------------------------
      | C1 | C2                    | C3        | C4                  |
      ----------------------------------------------------------------
      | 1  | 2017-05-06 06:59:02.0 | UPDATE 1  | 2017-05-06 07:04:40 |
      | 2  | 2017-05-06 07:04:38.0 | NEW ROW 2 | 2017-05-06 07:04:38 |
      ----------------------------------------------------------------
      
      #################
      # STEP 04 - Import Data into HDFS and Merge changes 
      #################
      
      sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose 
      
      Output:
      17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 27.4934 seconds (99.1148 bytes/sec)
      17/05/06 07:06:43 INFO mapreduce.ImportJobBase: Retrieved 2 records.
      17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Restoring classloader: java.net.FactoryURLClassLoader@121fdcee
      17/05/06 07:06:43 INFO tool.ImportTool: Final destination exists, will run merge job.
      17/05/06 07:06:43 DEBUG tool.ImportTool: Using temporary folder: 4bc6b65cd0194b81938f4660974ee392_T1
      17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Checking for existing class: T1
      17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Attempting to load jar through URL: jar:file:/tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar!/
      17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Previous classloader is java.net.FactoryURLClassLoader@121fdcee
      17/05/06 07:06:43 DEBUG util.ClassLoaderStack: Testing class in jar: T1
      17/05/06 07:06:43 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not load jar /tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar into JVM. (Could not find class T1.)
      	at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:92)
      	at com.cloudera.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:36)
      	at org.apache.sqoop.tool.ImportTool.loadJars(ImportTool.java:120)
      	at org.apache.sqoop.tool.ImportTool.lastModifiedMerge(ImportTool.java:456)
      	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
      	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
      	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
      	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
      	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
      	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
      Caused by: java.lang.ClassNotFoundException: T1
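
      The merge step fails here because the class T1 is not present in the codegen jar produced by the Parquet import path. One way to confirm what the jar actually contains (jar path taken from the log above; it will differ per run):

      # List the classes packaged into the codegen jar; if T1.class is
      # absent, ImportTool.loadJars() cannot load it for the merge job.
      jar tf /tmp/sqoop-root/compile/6ed24910abcbc6ea38a1963bfce9a92d/codegen_T1.jar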
      
      #################
      # STEP 05 - Generate Java Class (using codegen)
      #################
      
      hdfs dfs -ls /user/root/t1/*.parquet
      rm -rf /root/sqoop/jobs/* /root/sqoop/bindir/*
      sqoop codegen --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --map-column-java C2=String,C4=String --class-name T1 --bindir /root/sqoop/bindir/ --outdir /root/sqoop/jobs/
      ls -l /root/sqoop/jobs/T1* /root/sqoop/bindir/T1*
      
      Output:
      -rw-r--r--   3 root root       1144 2017-05-06 07:01 /user/root/t1/b65c1ca5-c8f0-44c6-8c60-8ee83161347f.parquet
      ~~~~~
      -rw-r--r-- 1 root root   562 May  6 07:35 /root/sqoop/bindir/T1$1.class
      -rw-r--r-- 1 root root   550 May  6 07:35 /root/sqoop/bindir/T1$2.class
      -rw-r--r-- 1 root root   550 May  6 07:35 /root/sqoop/bindir/T1$3.class
      -rw-r--r-- 1 root root   550 May  6 07:35 /root/sqoop/bindir/T1$4.class
      -rw-r--r-- 1 root root 10891 May  6 07:35 /root/sqoop/bindir/T1.class
      -rw-r--r-- 1 root root   207 May  6 07:35 /root/sqoop/bindir/T1$FieldSetterCommand.class
      -rw-r--r-- 1 root root  6563 May  6 07:35 /root/sqoop/bindir/T1.jar
      -rw-r--r-- 1 root root 14170 May  6 07:35 /root/sqoop/jobs/T1.java
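
      Before rerunning the import it is worth confirming that this jar, unlike the one from STEP 04, really contains the T1 class (a minimal check, assuming the --bindir path above):

      # T1.class and its inner classes should all be listed.
      jar tf /root/sqoop/bindir/T1.jar | grep 'T1'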
      
      #################
      # STEP 06 - Import Data into HDFS and Merge changes
      #################
      
      hdfs dfs -ls /user/root/t1/*.parquet
      hdfs dfs -text /user/root/t1/*.parquet
      sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --target-dir /user/root/t1 --incremental lastmodified --check-column C4 --merge-key C1 --last-value '2017-05-06 07:01:09.0' --as-parquetfile --map-column-java C2=String,C4=String --num-mappers 1 --verbose --jar-file /root/sqoop/bindir/T1.jar --class-name T1
      
      Output:
      -rw-r--r--   3 root root       1144 2017-04-11 13:35 /user/root/t1/54614794-b970-4e74-a0d7-669105f29a2b.parquet
      ~~~~~
      PAR111
            (1>B2017-05-06 06:59:02.02017-05-06 06:59:02.0x2017-05-06 06:59:02.0&*,   NEW ROW 1	NEW ROW 1H	NEW ROW 1:>2017-05-06 06:59:022017-05-06 06:59:02p2017-05-06 06:59:02\HT1
      <SNIP>      
      ~~~~~
      17/05/06 07:44:31 INFO mapreduce.ImportJobBase: Transferred 2.6611 KB in 25.6244 seconds (106.3442 bytes/sec)
      17/05/06 07:44:31 INFO mapreduce.ImportJobBase: Retrieved 2 records.
      ~~~~~
      17/05/06 07:44:57 INFO mapreduce.Job: Task Id : attempt_1491569377351_0244_m_000001_0, Status : FAILED
      Error: java.lang.RuntimeException: Can't parse input data: 'PAR1'
      	at T1.__loadFromFields(T1.java:365)
      	at T1.parse(T1.java:298)
      	at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
      	at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: java.lang.NumberFormatException
      	at java.math.BigDecimal.<init>(BigDecimal.java:470)
      	at java.math.BigDecimal.<init>(BigDecimal.java:739)
      	at T1.__loadFromFields(T1.java:347)
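
      This failure chain is consistent with the --as-parquetfile output being pushed through the text-based merge path: MergeTextMapper.map() treats each chunk of the Parquet file as a delimited text record, T1.parse() splits it into fields, and __loadFromFields() then tries to construct the numeric C1 column, effectively new BigDecimal("PAR1"), which throws NumberFormatException. The raw bytes the mapper sees can be shown directly (same command as in STEP 06):

      # -text has no Parquet support, so it streams the raw file bytes,
      # the first four of which are the Parquet magic "PAR1".
      hdfs dfs -text /user/root/t1/*.parquet | head -c 4    # prints: PAR1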
      

    People

      Assignee: Sandish Kumar HN (sanysandish@gmail.com)
      Reporter: Markus Kemper (markuskemper@me.com)
      Votes: 0
      Watchers: 3