Sqoop / SQOOP-2783

Query import with parquet fails on incompatible schema

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.7
    • Component/s: None
    • Labels:
      None

      Description

      This is a follow-up to SQOOP-2582, where we added support for query import into Parquet. It seems that when run on a real cluster (rather than a mini-cluster), the job fails with an exception similar to this one:

      16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
      Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
      Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
      Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
      	at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
      	at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
      	at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
      	at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
      	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
      	at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      

      Looking into the Sqoop and Kite source code, I was not able to pinpoint the problem until I found SQOOP-1395 and SQOOP-2294, which describe a similar problem for table-based import. I do not clearly understand why the test added in SQOOP-2582 is not failing, but I assume that it's due to differences in the classpath on a mini-cluster versus a real cluster.

      I suggest changing the Avro schema name generated for query imports from QueryResult to something more generic, such as AutoGeneratedSchema, which will avoid this problem. I'm not particularly concerned about backward compatibility here, because it doesn't make much sense to depend on a name that is generated for every single query-based import.
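
      The name clash can be illustrated with a minimal, self-contained sketch. It assumes (the description above does not confirm this) that the writer side looks up a Java class by the Avro schema name and, when one is found, derives the type schema from that class by reflection. That would explain why the type schema in the error contains internal fields such as PROTOCOL_VERSION, __cur_result_set, and __parser. SchemaNameClash, resolveFieldNames, and the nested QueryResult class are hypothetical stand-ins for illustration, not Sqoop or Kite APIs.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaNameClash {

    // Hypothetical stand-in for the record class generated for a query import.
    // Field names are copied from the type schema in the error message.
    static class QueryResult {
        int PROTOCOL_VERSION;
        Object __cur_result_set;
        Integer c1_int;
        Object c2_date;
        Object c3_timestamp;
        String c4_varchar20;
        Object __parser;
    }

    // Assumed lookup: if a class named after the schema can be loaded, use its
    // declared fields; otherwise fall back to the fields stored with the dataset.
    static List<String> resolveFieldNames(String schemaName, List<String> datasetFields) {
        try {
            Class<?> type =
                Class.forName(SchemaNameClash.class.getName() + "$" + schemaName);
            List<String> names = new ArrayList<>();
            for (Field field : type.getDeclaredFields()) {
                if (!field.isSynthetic()) {
                    names.add(field.getName());
                }
            }
            return names;
        } catch (ClassNotFoundException e) {
            // No class shares the schema name, so nothing shadows the dataset schema.
            return datasetFields;
        }
    }

    public static void main(String[] args) {
        List<String> datasetFields =
            Arrays.asList("c1_int", "c2_date", "c3_timestamp", "c4_varchar20");
        System.out.println(resolveFieldNames("QueryResult", datasetFields));
        System.out.println(resolveFieldNames("AutoGeneratedSchema", datasetFields));
    }
}
```

      In this sketch, the schema name QueryResult matches the generated class, so the lookup returns its internal fields alongside the columns. With AutoGeneratedSchema no class is found and the dataset's own fields are used, which is the effect the proposed rename relies on.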

        Attachments

        1. SQOOP-2783.patch
          1 kB
          Jarek Jarcec Cecho

              People

              • Assignee:
                jarcec Jarek Jarcec Cecho
                Reporter:
                jarcec Jarek Jarcec Cecho
              • Votes:
                0
                Watchers:
                6
