Sqoop / SQOOP-1283

Export doesn't detect Avro files without .avro extension (i.e. created by Hive)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.3
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      CDH 4.5

      Description

      When exporting to PostgreSQL, Sqoop does not detect Avro files that lack the .avro extension (for example, files named 000000_0 in HDFS because they were created by Hive). It falls back to the unknown file type in the code and then attempts to use the Text export mapper, which fails with a parse exception:

      java.io.IOException: Can't export data, please check failed map task logs
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
      at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)
      Caused by: java.lang.RuntimeException: Can't parse input data: 'Objavro.codecdeflateavro.schema�{"type":"record","name":"<scrubbed>","namespace":"<scrubbed>.avro","fields":[{"name":"pane
      14/02/03 17:13:52 INFO mapred.JobClient: Task Id : attempt_201312101527_93532_m_000000_0, Status : FAILED
      java.io.IOException: Can't export data, please check failed map task logs
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
      at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)
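
The 'Obj' at the start of the unparsable record in the trace is the Avro object container magic: per the Avro specification, every container file begins with the four bytes 'O', 'b', 'j', 0x01. As a minimal sketch of content-based detection (not Sqoop's actual code, and not the SQOOP-1282 patch), a check could look at those bytes instead of the file name. The class and method names below are hypothetical, and a plain InputStream stands in for an HDFS stream:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Sqoop.
public class AvroMagicCheck {
    // First four bytes of every Avro object container file: 'O', 'b', 'j', 0x01.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes start with the Avro container magic. */
    public static boolean hasAvroMagic(byte[] prefix) {
        if (prefix == null || prefix.length < AVRO_MAGIC.length) {
            return false; // too short to hold the magic
        }
        for (int i = 0; i < AVRO_MAGIC.length; i++) {
            if (prefix[i] != AVRO_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first four bytes of the stream and checks them against the magic. */
    public static boolean looksLikeAvro(InputStream in) throws IOException {
        byte[] header = new byte[AVRO_MAGIC.length];
        try {
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // stream ended before four bytes were available
        }
        return hasAvroMagic(header);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the first bytes of a Hive-written 000000_0 file and a text file.
        byte[] avroLike = {'O', 'b', 'j', 1, 4, 'a'};
        byte[] textLike = "1,foo,bar\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeAvro(new ByteArrayInputStream(avroLike)));
        System.out.println(looksLikeAvro(new ByteArrayInputStream(textLike)));
    }
}
```

With a check along these lines, a file named 000000_0 would be classified by its contents, so the missing .avro extension would not matter.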

      Thanks

      Hari Sekhon
      http://www.linkedin.com/in/harisekhon

          Activity

          Hari Sekhon added a comment -

          I see you raised the other ticket an hour earlier and have already posted a patch. Great work as usual, Harsh.

          Thanks

          Hari

          Harsh J added a comment -

          Closing as dupe of SQOOP-1282.

          P.S. Setting "hive.output.file.extension" to ".avro" will give Hive-produced files the .avro extension, as added by HIVE-2457.


            People

            • Assignee: Unassigned
            • Reporter: Hari Sekhon
            • Votes: 0
            • Watchers: 2
