Sqoop / SQOOP-332

Cannot use --as-avrodatafile with --query

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0-incubating
    • Component/s: codegen
    • Labels: None

      Description

      Using sqoop with --as-avrodatafile and --query to specify a freeform query causes an exception:

      11/08/30 19:55:28 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.AvroRuntimeException: Can't set a property to null: tableName
      org.apache.avro.AvroRuntimeException: Can't set a property to null: tableName
      	at org.apache.avro.Schema$Props.add(Schema.java:124)
      	at org.apache.avro.Schema.addProp(Schema.java:166)
      	at com.cloudera.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:69)
      	at com.cloudera.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:78)
      	at com.cloudera.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:175)
      	at com.cloudera.sqoop.manager.SqlManager.importQuery(SqlManager.java:442)
      	at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:352)
      	at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
      	at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      	at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
      	at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
      	at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
      	at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
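The trace suggests that the schema generator passes a null table name to a property setter that rejects null values. A minimal sketch of that failure mode follows. `PropsStandIn` and `NullTableNameDemo` are hypothetical names used only for illustration; the stand-in imitates the null check implied by the exception message and is not Avro's or Sqoop's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a property map that rejects null values,
// imitating the behaviour implied by the exception message in the trace.
final class PropsStandIn {
    private final Map<String, String> props = new LinkedHashMap<>();

    void add(String name, String value) {
        if (value == null) {
            throw new IllegalArgumentException("Can't set a property to null: " + name);
        }
        props.put(name, value);
    }

    String get(String name) {
        return props.get(name);
    }
}

final class NullTableNameDemo {
    // Returns the error message raised when the table name is rejected,
    // or null if the property was stored successfully.
    static String trySetTableName(String tableName) {
        PropsStandIn props = new PropsStandIn();
        try {
            props.add("tableName", tableName);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumption: with --query and no --table, the table name is null.
        System.out.println(trySetTableName(null));
        // With a table name present, the property is accepted.
        System.out.println(trySetTableName("foo"));
    }
}
```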
      Attachments

      1. SQOOP-332.patch (0.9 kB, Joseph Boyd)
      2. SQOOP-332.patch (5 kB, Joseph Boyd)

        Activity

        Hudson added a comment -

        Integrated in Sqoop-jdk-1.6 #22 (See https://builds.apache.org/job/Sqoop-jdk-1.6/22/)
        SQOOP-332. Cannot use --as-avrodatafile with --query.

        (Joseph Boyd via Arvind Prabhakar)

        arvind : http://svn.apache.org/viewvc/?view=rev&rev=1170977
        Files :

        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/orm/AvroSchemaGenerator.java
        • /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestAvroImportExportRoundtrip.java
        Arvind Prabhakar added a comment -

        Patch committed. Thanks Joseph!

        Tom White added a comment -

        +1 looks good to me.

        Joseph Boyd added a comment -

        I've added a new patch that includes a test of import-export round trip, using --query instead of --table.

        (I've tried my best to maintain the style of the existing test, but I'll admit I stopped short of understanding when includeHadoopFlags, rowsPerStmt, statementsPerTx args might be used)

        Joseph Boyd added a comment -

        Updated patch, includes a quick test.

        Joseph Boyd added a comment -

        Adding a new patch that includes a test.

        Aaron Kimball added a comment -

        That sounds good to me.

        Joseph Boyd added a comment -

        I'll see if I can work up a test. (Running the existing test suite now)

        Perhaps a version of TestAvroImportExportRoundtrip that specifies --query (select * from foo) instead of --table would be a good start.

        Aaron Kimball added a comment -

        +1 sounds good to me. It would be good to see a unit test for this too before committing it.

        Joseph Boyd added a comment -

        I think I'll leave the patch as-is, and ask for any comments on it.

        I looked at ClassWriter a bit. ClassWriter is using com.cloudera.sqoop.orm.TableClassName internally to pick a class name. TableClassName is defaulting to a class named 'QueryResult' when no table name was specified on the command line, AND no class name was given on the command line.

        That look at ClassWriter left me thinking that a simple default like 'QueryResult' is useful enough for this JIRA:

        • Forcing a match to TableClassName's choice probably wouldn't suit some people
        • Adding another sqoop option (--avro-record-name or similar) just for setting the avro record name seems a bit too much at this point.
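The defaulting described in this comment can be sketched roughly as follows. This is a simplified illustration, not the real TableClassName code: `ClassNameDefaults` and `resolve` are hypothetical names, and the precedence of an explicit class name over the table name is an assumption. The only behaviour taken from the comment is that 'QueryResult' is used when neither a table name nor a class name is given.

```java
// Simplified, hypothetical sketch of the class-name defaulting described above.
final class ClassNameDefaults {
    static final String QUERY_RESULT = "QueryResult";

    // Assumed precedence: explicit class name, then table name, then the default.
    static String resolve(String explicitClassName, String tableName) {
        if (explicitClassName != null && !explicitClassName.isEmpty()) {
            return explicitClassName;
        }
        if (tableName != null && !tableName.isEmpty()) {
            return tableName;
        }
        // Neither was supplied, as happens with a free-form --query import.
        return QUERY_RESULT;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null));
        System.out.println(resolve(null, "foo"));
        System.out.println(resolve("MyRecord", "foo"));
    }
}
```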
        Arvind Prabhakar added a comment -

        Thanks for taking this up, Joseph! Looking forward to seeing the patch.

        Joseph Boyd added a comment -

        Attached patch fixes the problem, defaulting the table name to 'QueryResult' if it's not set.

        As I mentioned earlier, I'd like to take another look at how ClassWriter picks its defaults, to make sure they both match (if the code for that wouldn't be too crazy).
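As a rough illustration of the approach described here, the null check might look something like the sketch below. `AvroSchemaNameDefault`, `effectiveName`, and `schemaProps` are hypothetical names, and the plain map merely stands in for the schema's name and 'tableName' settings; this is not the contents of the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: fall back to "QueryResult" when no table name is set.
final class AvroSchemaNameDefault {
    static final String DEFAULT_NAME = "QueryResult";

    // Returns the table name if present, otherwise the default.
    static String effectiveName(String tableName) {
        return tableName != null ? tableName : DEFAULT_NAME;
    }

    // Builds the two name-related settings using the null-safe name.
    // The map is only a stand-in for a real schema object.
    static Map<String, String> schemaProps(String tableName) {
        String name = effectiveName(tableName);
        Map<String, String> props = new LinkedHashMap<>();
        props.put("name", name);
        props.put("tableName", name);
        return props;
    }

    public static void main(String[] args) {
        // A --query import has no table name, so both settings use the default.
        System.out.println(schemaProps(null));
        // A --table import keeps its own name.
        System.out.println(schemaProps("foo"));
    }
}
```

With the default in place, the property setter never receives a null value, so the exception in the description would not be raised under this sketch's assumptions.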

        Joseph Boyd added a comment -

        Looking at the code, it seems we need to pick a default name for the avro 'tableName' and 'name' properties when no table name is supplied, similar to how ClassWriter picks 'QueryResult' for the java class name when no table name is supplied.

        I've a quick, hack-ish patch that does this (which I'll attach), but it'd probably be best for me to take another look at how ClassWriter picks its default class name and see if I can't make the default that AvroSchemaGenerator picks match.


          People

          • Assignee: Joseph Boyd
          • Reporter: Aaron Kimball
          • Votes: 0
          • Watchers: 3