SQOOP-3445

Spark with Sqoop and Kite - Parquet Mismatch in Command?


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.4.7
    • Fix Version/s: None
    • Component/s: sqoop2-kite-connector
    • Labels: None

    Description

      I am not sure whether the error is deep in Sqoop or in Kite, so I have cross-posted it here: https://github.com/kite-sdk/kite/issues/490.

      I am reading from a MySQL database and trying to write out to Parquet. When writing to Avro there are no issues, but as soon as Kite is involved (Parquet) everything breaks. I first had to manually add a number of JARs just to get the job to run, but that all seems resolved now.

      Please note that I have tried various versions of the installed dependencies, downgrading and upgrading Sqoop accordingly.

      When Sqoop is used without Kite (i.e. Avro rather than Parquet) there are no issues. The moment the job writes out to Parquet, everything fails. It seems like Kite may be the offender, but the problem may also be in how the Sqoop code invokes Kite; a sketch of the failing command is included after the dependency list below.

      System:

      • Debian 9
      • Hadoop 2.9
      • Spark 2.3

      Installed Dependencies (JARs):

      • sqoop-1.4.7-hadoop260
      • kite-data-mapreduce-1.1.0
      • kite-hadoop-compatibility-1.1.0.jar
      • kite-data-crunch-1.1.0
      • kite-data-core-1.1.0
      • avro-tools-1.8.2.jar
      • mysql-connector-java-5.1.42
      • parquet-tools-1.8.3
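
      For reference, the failing import has roughly the following shape. The connection details, table name, and JAR paths are placeholders (the exact command is assembled inside the parallel process mentioned at the end), but swapping --as-parquetfile for --as-avrodatafile is the difference between failure and success:

          sqoop import \
              -libjars /path/to/kite-data-mapreduce-1.1.0.jar,/path/to/kite-data-core-1.1.0.jar,/path/to/kite-hadoop-compatibility-1.1.0.jar \
              --connect jdbc:mysql://<mysql-host>/<database> \
              --username <user> --password <password> \
              --table <table> \
              --target-dir /user/<user>/<table>_parquet \
              --as-parquetfile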

      Error:
       

      19/07/09 17:55:28 INFO mapreduce.Job: Job job_1562682312457_0020 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Parquet only supports generic and specific data models, type parameter must implement IndexedRecord
          at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
          at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:96)
          at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
          at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
          at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
          at org.kitesdk.data.Datasets.load(Datasets.java:108)
          at org.kitesdk.data.Datasets.load(Datasets.java:165)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:542)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:569)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
          at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
          at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
          at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      19/07/09 17:55:28 INFO mapreduce.Job: Counters: 2
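
      From the trace, the failing check looks like a type guard in Kite's FileSystemDataset: the entity class handed to the Parquet dataset does not implement Avro's IndexedRecord. The snippet below is only my paraphrase of that failure mode based on the message and the stack trace, not Kite's actual source; the class and method names are made up for illustration, and it assumes avro and guava on the classpath:

          import org.apache.avro.generic.GenericData;
          import org.apache.avro.generic.IndexedRecord;

          import com.google.common.base.Preconditions;

          // Illustration of the guard implied by the stack trace above, not Kite source.
          public class ParquetTypeGuardDemo {

              // Parquet datasets appear to accept only entity classes that implement
              // Avro's IndexedRecord (generic or specific records), or a plain Object.
              static void checkEntityType(Class<?> type) {
                  Preconditions.checkArgument(
                      IndexedRecord.class.isAssignableFrom(type) || type == Object.class,
                      "Parquet only supports generic and specific data models, "
                          + "type parameter must implement IndexedRecord");
              }

              public static void main(String[] args) {
                  checkEntityType(GenericData.Record.class); // passes: implements IndexedRecord
                  checkEntityType(String.class);             // throws the IllegalArgumentException seen above
              }
          }

      If that is the mechanism, the question becomes why the class Sqoop passes to DatasetKeyOutputFormat does not implement IndexedRecord when --as-parquetfile is used, while the Avro path works fine.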

      Again, it only fails on the final Parquet conversion. I am not sure of the full command details since it is built inside a parallel process. Any direction would be appreciated.

          People

            Assignee: Unassigned
            Reporter: Dovy Paukstys (dovy)
