Details
Type: Bug
Priority: Major
Status: Resolved
Resolution: Fixed
Description
This is a follow-up to SQOOP-2582, where we added support for query imports into Parquet. When run on a real cluster (rather than a mini-cluster), the job fails with an exception similar to this one:
16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
	at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
	at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
	at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
	at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
	at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
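Note how the two schemas differ: the "Type schema" appears to have been derived by reflecting over the generated QueryResult class, so it picks up Sqoop's internal bookkeeping fields (__cur_result_set, __parser, PROTOCOL_VERSION), while the "Dataset schema" contains only the query columns. A minimal stdlib-only sketch of why a reflection-based view of such a class cannot match the column-only schema (the QueryResult class below is a simplified stand-in, not Sqoop's actual generated code):

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ReflectedFields {
    // Simplified stand-in for a Sqoop-generated record class: the real one
    // also carries internal helper fields alongside the column fields.
    static class QueryResult {
        private Object __cur_result_set; // internal JDBC state, not a column
        private Object __parser;         // internal parser state, not a column
        private Integer c1_int;          // actual query column
        private String c4_varchar20;     // actual query column
    }

    // Collect field names the way a reflection-based schema generator would:
    // it sees every declared field, internal or not.
    static List<String> fieldNames(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Internal fields leak into the reflected view, so a schema built
        // this way cannot match the column-only dataset schema.
        System.out.println(fieldNames(QueryResult.class));
    }
}
```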
Looking into the Sqoop and Kite source code, I was not able to pinpoint where the problem is, until I found SQOOP-1395/SQOOP-2294, which describe a similar problem for table-based imports. It is not clear to me why the test added in SQOOP-2582 does not fail, but I assume that is due to classpath differences between the mini-cluster and a real cluster.
I would suggest changing the generated Avro schema name from QueryResult to something more generic, such as AutoGeneratedSchema, which will avoid this problem. I'm not particularly concerned about backward compatibility here, because it doesn't make much sense to depend on a name that is regenerated for every query-based import.
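The proposed rename can be sketched as follows; the schemaName helper here is hypothetical (it is not Sqoop's actual code) and only illustrates the idea that query imports, which have no stable table name, get a fixed generic Avro record name instead of one that collides with the generated class:

```java
public class SchemaName {
    // Hypothetical helper: pick the Avro record name for an import.
    // Free-form query imports (-query) have no table name, so a fixed
    // generic name avoids colliding with the generated QueryResult class.
    static String schemaName(String tableName) {
        return (tableName == null || tableName.isEmpty())
                ? "AutoGeneratedSchema" // proposed generic name for query imports
                : tableName;            // table-based imports keep the table name
    }

    public static void main(String[] args) {
        System.out.println(schemaName(null));        // query import
        System.out.println(schemaName("EMPLOYEES")); // table import
    }
}
```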
Attachments
Issue Links
- relates to SQOOP-3134: --class-name should override default Avro schema name (Resolved)