Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.6
    • Component/s: tools
    • Labels: None

      Description

      Add support for different compression codecs

      1. SQOOP-1391.patch (6 kB, Qian Xu)
      2. SQOOP-1391.2.patch (7 kB, Qian Xu)
      3. SQOOP-1391.3.patch (6 kB, Qian Xu)

          Activity

          Qian Xu added a comment -

          Regarding SQOOP-1390, the Parquet support requires the Kite SDK. Currently Parquet files are created with the Snappy codec. Kite plans to make the compression codec configurable in 0.17.0. See also https://issues.cloudera.org/browser/CDK-299

          Pratik Khadloya added a comment -

          From my understanding, it wasn't compressed. Is there a way of verifying after the file is created?

          Joey Echeverria added a comment -

          FYI, CDK-299 was committed upstream.

          The Parquet files are compressed with Snappy.

          Jarek Jarcec Cecho added a comment -

          I'm seeing one test failure in TestHiveImport:

          Testcase: testNormalHiveImportAsParquet took 0.665 sec
          	Caused an ERROR
          Failure during job; return status 1
          java.io.IOException: Failure during job; return status 1
          	at com.cloudera.sqoop.testutil.ImportJobTestCase.runImport(ImportJobTestCase.java:228)
          	at com.cloudera.sqoop.hive.TestHiveImport.runImportTest(TestHiveImport.java:214)
          	at com.cloudera.sqoop.hive.TestHiveImport.testNormalHiveImportAsParquet(TestHiveImport.java:278)
          

          With stack trace:

          Got exception running Sqoop: java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI
          java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI
          	at org.kitesdk.data.spi.hive.MetaStoreUtil.<init>(MetaStoreUtil.java:78)
          	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.getMetaStoreUtil(HiveAbstractMetadataProvider.java:56)
          	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:237)
          	at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:43)
          	at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:77)
          	at org.kitesdk.data.Datasets.create(Datasets.java:166)
          	at org.kitesdk.data.Datasets.create(Datasets.java:206)
          	at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:102)
          	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
          	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
          	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
          	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
          	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
          	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
          	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
          	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
          	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
          	at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:45)
          	at com.cloudera.sqoop.testutil.ImportJobTestCase.runImport(ImportJobTestCase.java:219)
          	at com.cloudera.sqoop.hive.TestHiveImport.runImportTest(TestHiveImport.java:214)
          	at com.cloudera.sqoop.hive.TestHiveImport.testNormalHiveImportAsParquet(TestHiveImport.java:278)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:606)
          	at junit.framework.TestCase.runTest(TestCase.java:176)
          	at junit.framework.TestCase.runBare(TestCase.java:141)
          	at junit.framework.TestResult$1.protect(TestResult.java:122)
          	at junit.framework.TestResult.runProtected(TestResult.java:142)
          	at junit.framework.TestResult.run(TestResult.java:125)
          	at junit.framework.TestCase.run(TestCase.java:129)
          	at junit.framework.TestSuite.runTest(TestSuite.java:255)
          	at junit.framework.TestSuite.run(TestSuite.java:250)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
          

          Could you take a look, Qian Xu?

          Qian Xu added a comment -

          You should specify the environment variables HCAT_HOME and HIVE_HOME.

          Jarek Jarcec Cecho added a comment -

          I was looking into this problem a bit, and I don't think the problem is with HCAT_HOME and HIVE_HOME. The Kite folks have added a protection in CDK-651 (commit) that will throw an exception if a local repository is used and the property kite.allow.test-metastore is not set. I believe that this is our case, and perhaps we should set that property prior to calling the Kite libraries, Qian Xu?

          Qian Xu added a comment -

          Jarek Jarcec Cecho, you're right. We should provide a hive-site.xml in the setup stage with kite.allow.test-metastore=true.
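
For illustration only, here is a minimal sketch of what such a setup step could look like. It writes a hive-site.xml in the standard Hadoop configuration format with the property named in this thread. The class name TestHiveSiteWriter, the helper methods, and the temporary directory are hypothetical and are not taken from the attached patches. The sketch also assumes the test harness puts the generated directory on the classpath so that the Hive configuration loader can find the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Sqoop: generates a hive-site.xml for tests.
public class TestHiveSiteWriter {

    /** Renders a Hadoop-style configuration document with a single property. */
    static String renderHiveSite(String name, String value) {
        return "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "  <property>\n"
            + "    <name>" + name + "</name>\n"
            + "    <value>" + value + "</value>\n"
            + "  </property>\n"
            + "</configuration>\n";
    }

    /** Writes hive-site.xml into confDir (created if missing) and returns its path. */
    static Path writeHiveSite(Path confDir) throws IOException {
        Files.createDirectories(confDir);
        Path file = confDir.resolve("hive-site.xml");
        // Property name and value as quoted in the comment above.
        String xml = renderHiveSite("kite.allow.test-metastore", "true");
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a throwaway directory stands in for the test's conf dir.
        Path confDir = Files.createTempDirectory("sqoop-test-conf");
        System.out.println("Wrote " + writeHiveSite(confDir));
    }
}
```

A test's setup method could call writeHiveSite before the first Kite call. Whether the file is actually picked up depends on how the test classpath is assembled, which this sketch does not cover.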

          Jarek Jarcec Cecho added a comment -

          I'll be happy to finish my review if you can make that change, Qian Xu.

          Jarek Jarcec Cecho added a comment -

          Now that we've covered the Kite 0.17 upgrade via SQOOP-1693, I think that we have to update this patch by removing that functionality, Qian Xu.

          Jarek Jarcec Cecho added a comment -

          Thank you for updating the patch Qian Xu, +1!

          ASF subversion and git services added a comment -

          Commit ad13ad08106fe907c76fa3df4c7f5123874952fa in sqoop's branch refs/heads/trunk from Jarek Jarcec Cecho
          [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=ad13ad0 ]

          SQOOP-1391: Compression codec handling

          (Qian Xu via Jarek Jarcec Cecho)

          Jarek Jarcec Cecho added a comment -

          Thank you for your contribution Qian Xu!

          Hudson added a comment -

          FAILURE: Integrated in Sqoop-ant-jdk-1.6-hadoop100 #912 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/912/)
          SQOOP-1391: Compression codec handling (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad13ad08106fe907c76fa3df4c7f5123874952fa)

          • src/java/org/apache/sqoop/mapreduce/ParquetJob.java
          • src/test/com/cloudera/sqoop/TestParquetImport.java
          • src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
          Hudson added a comment -

          SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop200 #953 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/953/)
          SQOOP-1391: Compression codec handling (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad13ad08106fe907c76fa3df4c7f5123874952fa)

          • src/test/com/cloudera/sqoop/TestParquetImport.java
          • src/java/org/apache/sqoop/mapreduce/ParquetJob.java
          • src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
          Hudson added a comment -

          FAILURE: Integrated in Sqoop-ant-jdk-1.6-hadoop20 #946 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/946/)
          SQOOP-1391: Compression codec handling (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad13ad08106fe907c76fa3df4c7f5123874952fa)

          • src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
          • src/test/com/cloudera/sqoop/TestParquetImport.java
          • src/java/org/apache/sqoop/mapreduce/ParquetJob.java
          Hudson added a comment -

          SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop23 #1149 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/1149/)
          SQOOP-1391: Compression codec handling (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad13ad08106fe907c76fa3df4c7f5123874952fa)

          • src/test/com/cloudera/sqoop/TestParquetImport.java
          • src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
          • src/java/org/apache/sqoop/mapreduce/ParquetJob.java

            People

            • Assignee: Qian Xu
            • Reporter: Qian Xu
            • Votes: 1
            • Watchers: 7
