SQOOP-3147

Import data to Hive Table in S3 in Parquet format

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.6
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The import command below succeeds only if the Hive table's location is on HDFS. If the table is backed by S3, the job fails with an exception while trying to move the data from the HDFS temporary directory to S3:

      Job job_1486539699686_3090 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Dataset merge failed
      at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:333)
      at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:56)
      at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:370)
      at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
      at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: Dataset merge failed during rename of hdfs://hdfs-path/tmp/dev_kamal/.temp/job_1486539699686_3090/mr/job_1486539699686_3090/0192f987-bd4c-4cb7-836f-562ac483e008.parquet to s3://bucket_name/dev_kamal/address/0192f987-bd4c-4cb7-836f-562ac483e008.parquet
      at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:329)
      ... 7 more

      sqoop import --connect "jdbc:mysql://connectionUrl" --table "tableName" --as-parquetfile --verbose --username=uname --password=pass --hive-import --delete-target-dir --hive-database dev_kamal --hive-table tableName --hive-overwrite -m 150
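      For context (my reading of the stack trace, not something stated in the Sqoop code): Kite's FileSystemDataset.merge commits the job by renaming the temporary files into the dataset directory, and a bare rename cannot cross filesystem boundaries (hdfs:// to s3://). A minimal local-filesystem sketch of the distinction:

      ```python
      import os
      import shutil

      def merge_by_rename(src: str, dst: str) -> None:
          # A bare rename only succeeds when src and dst live on the same
          # filesystem; across filesystems it raises OSError (EXDEV) --
          # analogous to the failed HDFS -> S3 rename in the trace above.
          os.rename(src, dst)

      def merge_by_copy(src: str, dst: str) -> None:
          # shutil.move falls back to copy-then-delete when a rename is
          # impossible, so it also works across filesystem boundaries.
          shutil.move(src, dst)
      ```

      A cross-filesystem "move" has to be a copy followed by a delete; a fix on the Kite/Sqoop side would presumably need the equivalent of FileUtil.copy rather than FileSystem.rename in the commit path.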

      Another issue I noticed: Sqoop stores the Avro schema in the table's TBLPROPERTIES under the avro.schema.literal key. If the table has many columns, the stored schema literal gets truncated, and reading the table later fails with a confusing parse exception like this one:

      17/03/07 12:13:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://url:9083
      17/03/07 12:13:13 INFO hive.metastore: Opened a connection to metastore, current connections: 1
      17/03/07 12:13:13 INFO hive.metastore: Connected to metastore.
      17/03/07 12:13:17 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3e9b1010
      17/03/07 12:13:17 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
      at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
      org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
      at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
      at org.apache.avro.Schema$Parser.parse(Schema.java:929)
      at org.apache.avro.Schema$Parser.parse(Schema.java:917)
      at org.kitesdk.data.DatasetDescriptor$Builder.schemaLiteral(DatasetDescriptor.java:475)
      at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:154)
      at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104)
      at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:192)
      at org.kitesdk.data.Datasets.load(Datasets.java:108)
      at org.kitesdk.data.Datasets.load(Datasets.java:165)
      at org.kitesdk.data.Datasets.load(Datasets.java:187)
      at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:78)
      at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
      at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
      at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
      at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
      at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
      at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
      at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
      at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
      at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
      at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
      Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
      at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
      at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
      at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
      at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454)
      at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:1342)
      at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:1330)
      at org.codehaus.jackson.impl.ReaderBasedParser.getText(ReaderBasedParser.java:200)
      at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:203)
      at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:224)
      at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:200)
      at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
      at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
      at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
      at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1344)
      at org.apache.avro.Schema$Parser.parse(Schema.java:927)
      ... 21 more
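      To make the truncation failure concrete, here is a sketch with assumed numbers (not Sqoop code; actual metastore property-length limits vary by schema, and the error above points at column 6001): a wide table's Avro schema serialized as JSON, cut off at a fixed length, is no longer valid JSON, which is exactly the "Unexpected end-of-input" the Avro parser reports.

      ```python
      import json

      # Build a wide-table Avro-style schema as a JSON string.
      fields = [{"name": f"col_{i}", "type": "string"} for i in range(300)]
      schema = json.dumps({"type": "record", "name": "tableName", "fields": fields})

      # Simulate the metastore truncating the property value (4000 chars is
      # an assumed limit for illustration).
      truncated = schema[:4000]

      try:
          json.loads(truncated)
      except json.JSONDecodeError as e:
          # Analogous to Avro's SchemaParseException wrapping the Jackson
          # "Unexpected end-of-input" error in the trace above.
          print("parse failed:", e.msg)
      ```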


            People

            • Assignee:
              Sandish Kumar HN (sanysandish@gmail.com)
            • Reporter:
              Ahmed Kamal (akamal)
            • Votes:
              0
            • Watchers:
              2
