Sqoop / SQOOP-1283

Export doesn't detect Avro files without .avro extension (ie created by Hive)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.3
    • Fix Version/s: None
    • Labels: None
    • Environment: CDH 4.5

Description

      When exporting to PostgreSQL, Sqoop does not detect Avro files properly if they lack the .avro extension (i.e. they are named 000000_0 in HDFS because they were created by Hive). It falls back to the unknown file type in the code and then attempts to use the Text export mapper, which fails with a parse exception:

      java.io.IOException: Can't export data, please check failed map task logs
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
      at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)
      Caused by: java.lang.RuntimeException: Can't parse input data: 'Objavro.codecdeflateavro.schema�{"type":"record","name":"<scrubbed>","namespace":"<scrubbed>.avro","fields":[{"name":"pane
      14/02/03 17:13:52 INFO mapred.JobClient: Task Id : attempt_201312101527_93532_m_000000_0, Status : FAILED
      java.io.IOException: Can't export data, please check failed map task logs
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
      at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
      at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)

      Thanks

      Hari Sekhon
      http://www.linkedin.com/in/harisekhon
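
For context: Avro data files always begin with the 4-byte magic header 'O' 'b' 'j' 0x01, which is what surfaces as the garbled "Objavro.codec..." text in the parse error above, so the format can be recognized from the file contents rather than the file name. The sketch below illustrates such extension-independent detection against HDFS; the class and method names (AvroMagicSniffer, isAvroDataFile) are hypothetical and this is not the patch referenced in the comments below.

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /**
     * Hypothetical helper: decide whether an HDFS file is an Avro data file
     * by reading its 4-byte magic header instead of trusting the extension.
     * Hive-created files such as 000000_0 carry no extension, so name-based
     * detection misses them.
     */
    public class AvroMagicSniffer {

        // The same magic bytes Avro writes at the start of every data file.
        private static final byte[] AVRO_MAGIC = new byte[] { 'O', 'b', 'j', 1 };

        public static boolean isAvroDataFile(Configuration conf, Path file) throws IOException {
            FileSystem fs = file.getFileSystem(conf);
            byte[] header = new byte[AVRO_MAGIC.length];
            try (FSDataInputStream in = fs.open(file)) {
                // Read the first four bytes of the file.
                in.readFully(0, header);
            } catch (IOException e) {
                // Too short or unreadable: treat it as "not Avro".
                return false;
            }
            return Arrays.equals(header, AVRO_MAGIC);
        }
    }

Presumably a check like this would have to run at the point where Sqoop decides which export mapper to launch, rather than per record.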

Issue Links

    • This issue duplicates SQOOP-1282

Activity

Hari Sekhon created issue

Hari Sekhon made changes -
Component/s: hive-integration

Harsh J made changes -
Link: This issue duplicates SQOOP-1282

Harsh J made changes -
Status: Open → Resolved
Resolution: Duplicate

Hari Sekhon made changes -
Comment: Thanks Harsh! I'd prefer it if Sqoop did the detection regardless of the file extension... it's one less thing for users to worry about. If you've already got the backing files without .avro then having to transform a large table is annoying...

EDIT: I see you have posted a patch to do just that, thanks!

People

    • Assignee: Unassigned
    • Reporter: Hari Sekhon
    • Votes: 0
    • Watchers: 2
