Sqoop / SQOOP-2408

Sqoop doesn't support --as-parquetfile with the --query option.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Flags:
      Patch

      Description

      Sqoop doesn't support the --as-parquetfile option together with --query, but it works fine with --table.
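
      For reference, an import of the following form (hypothetical connection string, credentials, and query) is the kind of command affected; the equivalent import using --table together with --as-parquetfile works:

        sqoop import \
          --connect jdbc:mysql://db.example.com/sales \
          --username sqoop_user --password ****** \
          --query 'SELECT id, name FROM customers WHERE $CONDITIONS' \
          --split-by id \
          --target-dir /user/sqoop/customers_parquet \
          --as-parquetfile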

      Attachments

      1. SQOOP-2408.patch
        4 kB
        Sergey Svinarchuk

        Activity

        Satyajit varma added a comment -

        Hi all,

        I need help working on this issue.
        I have made changes so that --as-parquetfile works with the --query option. Everything works fine when running the code from IntelliJ (LocalJobRunner), and I was able to create a Parquet file on HDFS.

        However, when I try running the same import from the command line, I get the error below:
        15/07/21 18:56:46 INFO mapreduce.Job: Task Id : attempt_1437440812635_0010_m_000000_2, Status : FAILED
        Error: org.apache.avro.AvroRuntimeException: Not a Specific class: char
        at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:213)
        at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:303)
        at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:430)
        at org.kitesdk.data.spi.DataModelUtil$AllowNulls.createFieldSchema(DataModelUtil.java:54)
        at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:354)
        at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:430)
        at org.kitesdk.data.spi.DataModelUtil$AllowNulls.createFieldSchema(DataModelUtil.java:54)
        at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:354)
        at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:430)
        at org.kitesdk.data.spi.DataModelUtil$AllowNulls.createFieldSchema(DataModelUtil.java:54)
        at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:354)
        at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:154)
        at org.kitesdk.data.spi.DataModelUtil.getReaderSchema(DataModelUtil.java:178)
        at org.kitesdk.data.spi.AbstractDataset.<init>(AbstractDataset.java:50)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:94)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
        at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
        at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:544)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:555)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:568)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:426)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:644)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

        Regards,
        Satyajit.

        Sergey Svinarchuk added a comment -

        I attached a patch with a fix for this issue. The problem was that Avro doesn't support the primitive type 'char', but when we run an import with the --query parameter, Avro tries to create a schema for the fields of the DelimiterSet class, which are chars.
        This bug does not reproduce in the unit tests, but it does reproduce on a Hadoop cluster.
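
        For context, a minimal sketch (hypothetical class name; assumes the Avro 1.7.x reflect API that Kite uses in the stack trace above) of the underlying limitation: Avro reflection has no mapping for the primitive char type, so building a schema for a class with char fields, such as Sqoop's DelimiterSet, fails.

          import org.apache.avro.Schema;
          import org.apache.avro.reflect.ReflectData;

          public class CharSchemaRepro {
            // Hypothetical stand-in for a class with char-typed fields,
            // loosely modelled on Sqoop's DelimiterSet (field/record delimiters).
            static class DelimitersLike {
              char fieldDelim;
              char recordDelim;
            }

            public static void main(String[] args) {
              // With Avro 1.7.x this is expected to fail with
              // org.apache.avro.AvroRuntimeException: Not a Specific class: char
              // because reflection-based schema generation cannot map char.
              Schema schema = ReflectData.get().getSchema(DelimitersLike.class);
              System.out.println(schema.toString(true));
            }
          }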


          People

          • Assignee:
            Unassigned
          • Reporter:
            Satyajit varma
          • Votes:
            0
          • Watchers:
            3
