SQOOP-3445

Spark with Sqoop and Kite - Parquet Mismatch in Command?


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.4.7
    • Fix Version/s: None
    • Component/s: sqoop2-kite-connector
    • Labels: None

    Description

      I am not sure whether the error is deep in Sqoop or in Kite, so I have cross-posted it here: https://github.com/kite-sdk/kite/issues/490.

      I am reading from a MySQL database and trying to write out to Parquet. When writing to Avro there are no issues, but as soon as Kite is involved (Parquet) everything breaks. I first had to manually add a number of JARs just to get the job to run, but that all seems resolved now.

      Please note that I have tried various versions of the installed dependencies, downgrading and upgrading Sqoop accordingly.

      When Sqoop is used without Kite (i.e. Avro rather than Parquet) there are no issues. The moment the job writes out to Parquet, everything fails. It seems like Kite may be the offender, but the problem may also be in how the Sqoop code invokes Kite; a sketch of the failing command is included after the dependency list below.

      System:

      • Debian 9
      • Hadoop 2.9
      • Spark 2.3

      Installed Dependencies (JARs):

      • sqoop-1.4.7-hadoop260
      • kite-data-mapreduce-1.1.0
      • kite-hadoop-compatibility-1.1.0.jar
      • kite-data-crunch-1.1.0
      • kite-data-core-1.1.0
      • avro-tools-1.8.2.jar
      • mysql-connector-java-5.1.42
      • parquet-tools-1.8.3
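
      For reference, the failing import has roughly the following shape. The connection details, table name, and JAR paths are placeholders (the exact command is assembled inside the parallel process mentioned at the end), but swapping --as-parquetfile for --as-avrodatafile is the difference between failure and success:

          sqoop import \
              -libjars /path/to/kite-data-mapreduce-1.1.0.jar,/path/to/kite-data-core-1.1.0.jar,/path/to/kite-hadoop-compatibility-1.1.0.jar \
              --connect jdbc:mysql://<mysql-host>/<database> \
              --username <user> --password <password> \
              --table <table> \
              --target-dir /user/<user>/<table>_parquet \
              --as-parquetfile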

      Error:
       

      19/07/09 17:55:28 INFO mapreduce.Job: Job job_1562682312457_0020 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Parquet only supports generic and specific data models, type parameter must implement IndexedRecord
          at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
          at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:96)
          at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
          at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
          at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
          at org.kitesdk.data.Datasets.load(Datasets.java:108)
          at org.kitesdk.data.Datasets.load(Datasets.java:165)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:542)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:569)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
          at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
          at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      19/07/09 17:55:28 INFO mapreduce.Job: Counters: 2
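
      From the trace, the failing check looks like a type guard in Kite's FileSystemDataset: the entity class handed to the Parquet dataset does not implement Avro's IndexedRecord. The snippet below is only my paraphrase of that failure mode based on the message and the stack trace, not Kite's actual source; the class and method names are made up for illustration, and it assumes avro and guava on the classpath:

          import org.apache.avro.generic.GenericData;
          import org.apache.avro.generic.IndexedRecord;

          import com.google.common.base.Preconditions;

          // Illustration of the guard implied by the stack trace above, not Kite source.
          public class ParquetTypeGuardDemo {

              // Parquet datasets appear to accept only entity classes that implement
              // Avro's IndexedRecord (generic or specific records), or a plain Object.
              static void checkEntityType(Class<?> type) {
                  Preconditions.checkArgument(
                      IndexedRecord.class.isAssignableFrom(type) || type == Object.class,
                      "Parquet only supports generic and specific data models, "
                          + "type parameter must implement IndexedRecord");
              }

              public static void main(String[] args) {
                  checkEntityType(GenericData.Record.class); // passes: implements IndexedRecord
                  checkEntityType(String.class);             // throws the IllegalArgumentException seen above
              }
          }

      If that is the mechanism, the question becomes why the class Sqoop passes to DatasetKeyOutputFormat does not implement IndexedRecord when --as-parquetfile is used, while the Avro path works fine.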

      Again, it only fails on the final Parquet conversion. I am not sure of the full command details since it is built inside a parallel process. Any direction would be appreciated.

          People

            Assignee: Unassigned
            Reporter: Dovy Paukstys (dovy)
