Sqoop (Retired) / SQOOP-2783

Query import with parquet fails on incompatible schema


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.7
    • Component/s: None
    • Labels: None

    Description

      This is a follow-up to SQOOP-2582, where we added support for query imports into parquet. It seems that when run on a real cluster (rather than a mini-cluster), the job fails with an exception similar to this one:

      16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
      Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
      Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
      Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
      	at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
      	at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
      	at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
      	at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
      	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
      	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
      	at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
      	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
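      The failing scenario is a query-based parquet import along these lines (a sketch only; the connection string, credentials, and paths are placeholders, and the column names follow the schemas above):

```shell
sqoop import \
  --connect jdbc:mysql://db.example.com/testdb \
  --username sqoop \
  --password-file /user/sqoop/.password \
  --query 'SELECT c1_int, c2_date, c3_timestamp, c4_varchar20 FROM t WHERE $CONDITIONS' \
  --split-by c1_int \
  --target-dir /user/sqoop/query_import \
  --as-parquetfile
```

      With --query, Sqoop names the generated record class (and thus its Avro schema) QueryResult, which is the name that appears in both schemas in the exception above.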
      

      Looking into the Sqoop and Kite source code, I was not able to precisely identify where the problem is, not until I found SQOOP-1395/SQOOP-2294, which describe a similar problem, just for table-based imports. I do not clearly understand why the test added back in SQOOP-2582 is not failing, but I assume that is due to differences in the classpath between the mini-cluster and a real cluster.
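      To make the mismatch concrete, here is a plain-Python comparison of the field lists of the two schemas in the exception above (type descriptions abbreviated by hand; note that both records carry the same name, QueryResult):

```python
# Field lists transcribed from the "Type schema" and "Dataset schema"
# in the stack trace; the type strings are informal abbreviations.
type_fields = {
    "PROTOCOL_VERSION": "int",
    "__cur_result_set": "java.sql.ResultSet record",
    "c1_int": "int",
    "c2_date": "java.sql.Date record",
    "c3_timestamp": "java.sql.Timestamp record",
    "c4_varchar20": "string",
    "__parser": "RecordParser record",
}
dataset_fields = {
    "c1_int": "int",
    "c2_date": "long",
    "c3_timestamp": "long",
    "c4_varchar20": "string",
}

# Fields present only in the reflection-derived type schema:
print(sorted(set(type_fields) - set(dataset_fields)))
# ['PROTOCOL_VERSION', '__cur_result_set', '__parser']

# Shared fields whose types disagree:
print(sorted(f for f in dataset_fields if type_fields[f] != dataset_fields[f]))
# ['c2_date', 'c3_timestamp']
```

      The reflection-derived type schema carries Sqoop's internal ORM fields and java.sql record types, while the dataset schema has only the four columns with Avro-native types, so Kite's compatibility check rejects the pair.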

      I would suggest changing the generated Avro schema name from QueryResult to something more generic, such as AutoGeneratedSchema, which will avoid this problem. I'm not particularly concerned about backward compatibility here, because it doesn't make much sense to depend on a name that is regenerated for every single query-based import.

      Attachments

        1. SQOOP-2783.patch (1 kB, Jarek Jarcec Cecho)


            People

              Assignee: Jarek Jarcec Cecho (jarcec)
              Reporter: Jarek Jarcec Cecho (jarcec)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: