Sqoop > SQOOP-3123

Import from Oracle to Avro using OraOop with map-column-java fails if special characters are encountered in the table name or column names

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.6, 1.4.7
    • Fix Version/s: 1.4.7
    • Component/s: codegen
    • Labels:
      None
    • Flags:
      Patch

      Description

I'm trying to import data from Oracle to Avro using OraOop.

      My table:

      CREATE TABLE "IBS"."BRITISH#CATS"
      (    "ID" NUMBER,
           "C_CODE" VARCHAR2(10),
           "C_USE_START#DATE" DATE,
           "C_USE_USE#NEXT_DAY" VARCHAR2(1),
           "C_LIM_MIN#DAT" DATE,
           "C_LIM_MIN#TIME" TIMESTAMP,
           "C_LIM_MIN#SUM" NUMBER,
           "C_OWNCODE" VARCHAR2(1),
           "C_LIMIT#SUM_LIMIT" NUMBER(17,2),
           "C_L@M" NUMBER(17,2),
           "C_1_THROW" NUMBER NOT NULL ENABLE,
           "C_#_LIMITS" NUMBER NOT NULL ENABLE
      ) SEGMENT CREATION IMMEDIATE
      PCTFREE 70 PCTUSED 40 INITRANS 2 MAXTRANS 255
      NOCOMPRESS LOGGING
      STORAGE(INITIAL 2097152 NEXT 524288 MINEXTENTS 1 MAXEXTENTS 2147483645
      PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
      BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
      TABLESPACE "WORK" ;
      

      My first script is:

./sqoop import \
        -Doraoop.timestamp.string=false \
        --direct \
        --connect jdbc:oracle:thin:@localhost:49161:XE \
        --username system \
        --password oracle \
        --table IBS.BRITISH#CATS \
        --target-dir /Users/Dmitry/Developer/Java/sqoop/bin/imported \
        --as-avrodatafile \
        --map-column-java ID=String,C_CODE=String,C_USE_START#DATE=String,C_USE_USE#NEXT_DAY=String,C_LIM_MIN#DAT=String,C_LIM_MIN#TIME=String,C_LIM_MIN#SUM=String,C_OWNCODE=String,C_LIMIT#SUM_LIMIT=String,C_L_M=String,C_1_THROW=String,C_#_LIMITS=String
      

      fails with

      2017-01-13 16:11:21,348 ERROR [main] tool.ImportTool (ImportTool.java:run(625)) - Import failed: No column by the name C_LIMIT#SUM_LIMITfound while importing data; expecting one of [C_LIMIT_SUM_LIMIT, C_OWNCODE, C_L_M, C___LIMITS, C_LIM_MIN_DAT, C_1_THROW, C_CODE, C_USE_START_DATE, C_LIM_MIN_SUM, ID, C_LIM_MIN_TIME, C_USE_USE_NEXT_DAY]
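
      The "expecting one of" list shows what happened: when generating the Avro schema, Sqoop rewrites every character that is not legal in an Avro identifier to an underscore, so C_LIMIT#SUM_LIMIT becomes C_LIMIT_SUM_LIMIT and C_#_LIMITS becomes C___LIMITS, while --map-column-java is still given the raw column names. A minimal sketch of that sanitization (an approximation for illustration, not Sqoop's actual code):

```python
import re

def to_avro_identifier(name):
    """Rough approximation of how Sqoop maps a column name to a legal
    Avro field name: every character outside [A-Za-z0-9_] becomes an
    underscore, and a leading digit gets an underscore prefix."""
    s = re.sub(r'[^A-Za-z0-9_]', '_', name)
    return s if not s[0].isdigit() else '_' + s

for col in ["C_LIMIT#SUM_LIMIT", "C_L@M", "C_#_LIMITS"]:
    print(col, "->", to_avro_identifier(col))
# C_LIMIT#SUM_LIMIT -> C_LIMIT_SUM_LIMIT
# C_L@M -> C_L_M
# C_#_LIMITS -> C___LIMITS
```

      These sanitized names are exactly the ones listed in the error above, which is why the second script below switches to them.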
      

After that I found that Sqoop had replaced all special characters with underscores, so my second script is:

./sqoop import \
        -D oraoop.timestamp.string=false \
        --direct \
        --connect jdbc:oracle:thin:@localhost:49161:XE \
        --username system \
        --password oracle \
        --table IBS.BRITISH#CATS \
        --target-dir /Users/Dmitry/Developer/Java/sqoop/bin/imported \
        --as-avrodatafile \
        --map-column-java ID=String,C_CODE=String,C_USE_START_DATE=String,C_USE_USE_NEXT_DAY=String,C_LIM_MIN_DAT=String,C_LIM_MIN_TIME=String,C_LIM_MIN_SUM=String,C_OWNCODE=String,C_LIMIT_SUM_LIMIT=String,C_L_M=String,C_1_THROW=String,C___LIMITS=String \
        --verbose
      

This one fails with an org.apache.avro.UnresolvedUnionException:

      2017-01-13 16:14:54,687 WARN  [Thread-26] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1372531461_0001
      java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 2017-01-13 11:22:53.0
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
      Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 2017-01-13 11:22:53.0
      	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
      	at org.apache.sqoop.mapreduce.AvroOutputFormat$1.write(AvroOutputFormat.java:112)
      	at org.apache.sqoop.mapreduce.AvroOutputFormat$1.write(AvroOutputFormat.java:108)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
      	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
      	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
      	at org.apache.sqoop.mapreduce.AvroImportMapper.map(AvroImportMapper.java:73)
      	at org.apache.sqoop.mapreduce.AvroImportMapper.map(AvroImportMapper.java:39)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 2017-01-13 11:22:53.0
      	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:709)
      	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:192)
      	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:110)
      	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
      	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:150)
      	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
      	at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90)
      	at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:182)
      	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
      	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
      	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
      	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:150)
      	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
      	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
      	... 17 more
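
      The trace shows Avro's union resolution rejecting the value: the generated schema for the timestamp column is the union ["null","long"], but the mapper hands the writer a java.sql.Timestamp (printed as 2017-01-13 11:22:53.0), which matches neither branch. A rough sketch of the union-matching idea (an illustration only, not Avro's GenericData code):

```python
def resolve_union(branches, value):
    """Pick the union branch matching the runtime type of `value`,
    loosely mirroring Avro's GenericData.resolveUnion behaviour."""
    checks = {
        "null": lambda v: v is None,
        "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "string": lambda v: isinstance(v, str),
    }
    for branch in branches:
        if checks.get(branch, lambda v: False)(value):
            return branch
    # Mirrors the "Not in union" failure seen in the stack trace.
    raise TypeError("Not in union %s: %r" % (branches, value))

print(resolve_union(["null", "long"], 1484306573000))  # a long fits the union

try:
    # A timestamp rendered as a string matches neither branch.
    resolve_union(["null", "long"], "2017-01-13 11:22:53.0")
except TypeError as e:
    print(e)
```

      Mapping the column to String (or converting the timestamp to epoch millis) would make the value match one branch, which is what the --map-column-java attempt above is trying to achieve.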
      

I've found an older report of this problem which says that oraoop.timestamp.string=false should solve it, but it does not.

      What do you think?
      Also please assign this problem to me.

  Attachments

        1. SQOOP_3123.patch (4 kB), Dmitry Zagorulkin


              People

              • Assignee: Unassigned
              • Reporter: hddimon (Dmitry Zagorulkin)
              • Votes: 2
              • Watchers: 7
