Pig
  1. Pig
  2. PIG-2509

Util.getSchemaFromString fails with java.lang.NullPointerException when a tuple in a bag has no name (as when used in MongoStorage UDF)

    Details

    • Hadoop Flags:
      Reviewed

      Description

      The MongoStorage UDF ( https://github.com/mongodb/mongo-hadoop ) fails on its call to Util.getSchemaFromString when a tuple is not named. The call succeeds when the tuple is explicitly named. As tuple naming is optional, this is a bug.

      See http://www.mail-archive.com/user%40pig.apache.org/msg04199.html and http://www.mail-archive.com/dev%40pig.apache.org/msg08016.html

      This makes me sad

      1. PIG-2509-0.patch
        2 kB
        Jonathan Coveney

        Activity

        Russell Jurney created issue -
        Hide
        Russell Jurney added a comment -
        Show
        Russell Jurney added a comment - See also https://jira.mongodb.org/browse/HADOOP-19
        Hide
        Russell Jurney added a comment -

        The offending code that is called is here: class org.apache.pig.newplan.logical.Util

        ...

        public static LogicalSchema.LogicalFieldSchema translateFieldSchema(Schema.FieldSchema fs) {
        LogicalSchema newSchema = null;
        if (fs.schema!=null)

        { newSchema = translateSchema(fs.schema); }

        LogicalSchema.LogicalFieldSchema newFs = new LogicalSchema.LogicalFieldSchema(fs.alias, newSchema, fs.type);
        return newFs;
        }

        /**

        • This function translates the new LogicalSchema into old Schema format required
        • by PhysicalOperators
        • @param schema LogicalSchema to be converted to Schema
        • @return Schema that is converted from LogicalSchema
        • @throws FrontendException
          */
          public static Schema translateSchema(LogicalSchema schema) {
          if (schema == null) { return null; }

        Schema s2 = new Schema();
        List<LogicalSchema.LogicalFieldSchema> ll = schema.getFields();
        for (LogicalSchema.LogicalFieldSchema f: ll) {
        Schema.FieldSchema f2 = null;
        try

        { f2 = new Schema.FieldSchema(f.alias, translateSchema(f.schema), f.type); f2.canonicalName = ((Long)f.uid).toString(); s2.add(f2); }

        catch (FrontendException e) {
        }
        }

        return s2;
        }

        Show
        Russell Jurney added a comment - The offending code that is called is here: class org.apache.pig.newplan.logical.Util ... public static LogicalSchema.LogicalFieldSchema translateFieldSchema(Schema.FieldSchema fs) { LogicalSchema newSchema = null; if (fs.schema!=null) { newSchema = translateSchema(fs.schema); } LogicalSchema.LogicalFieldSchema newFs = new LogicalSchema.LogicalFieldSchema(fs.alias, newSchema, fs.type); return newFs; } /** This function translates the new LogicalSchema into old Schema format required by PhysicalOperators @param schema LogicalSchema to be converted to Schema @return Schema that is converted from LogicalSchema @throws FrontendException */ public static Schema translateSchema(LogicalSchema schema) { if (schema == null) { return null; } Schema s2 = new Schema(); List<LogicalSchema.LogicalFieldSchema> ll = schema.getFields(); for (LogicalSchema.LogicalFieldSchema f: ll) { Schema.FieldSchema f2 = null; try { f2 = new Schema.FieldSchema(f.alias, translateSchema(f.schema), f.type); f2.canonicalName = ((Long)f.uid).toString(); s2.add(f2); } catch (FrontendException e) { } } return s2; }
        Hide
        Jonathan Coveney added a comment -

        The fix basically just allows a tuple where it is explicitly called "null:" as in the case of "b:

        {null:(a:int)}

        ". In such a case, it will just throw out the tuple name, which will make it equivalent to "b:

        {(a:int)}

        ". Russell ran it and it worked. Russell, I did change it slightly, so it'd be lovely if you can try it again just to be sure.

        I successfully applied to 0.9 and ran "ant test-commit", and applied it to 0.10 and 0.11 and ran "ant -Dtestcase=TestSchema test."

        Show
        Jonathan Coveney added a comment - The fix basically just allows a tuple where it is explicitly called "null:" as in the case of "b: {null:(a:int)} ". In such a case, it will just throw out the tuple name, which will make it equivalent to "b: {(a:int)} ". Russell ran it and it worked. Russell, I did change it slightly, so it'd be lovely if you can try it again just to be sure. I successfully applied to 0.9 and ran "ant test-commit", and applied it to 0.10 and 0.11 and ran "ant -Dtestcase=TestSchema test."
        Jonathan Coveney made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Jonathan Coveney [ jcoveney ]
        Fix Version/s 0.10 [ 12316246 ]
        Fix Version/s 0.9.3 [ 12319456 ]
        Fix Version/s 0.11 [ 12318878 ]
        Jonathan Coveney made changes -
        Attachment PIG-2509-0.patch [ 12513845 ]
        Hide
        Russell Jurney added a comment -

        I have applied this patch, and have been using it since. It works. Tests pass. Can we resolve and include this in 0.10?

        Show
        Russell Jurney added a comment - I have applied this patch, and have been using it since. It works. Tests pass. Can we resolve and include this in 0.10?
        Hide
        Russell Jurney added a comment -

        Tested. Working well in MongoStorage.

        Show
        Russell Jurney added a comment - Tested. Working well in MongoStorage.
        Russell Jurney made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Jonathan Coveney added a comment -

        This was closed incorrectly, and needs to be reviewed and committed

        Show
        Jonathan Coveney added a comment - This was closed incorrectly, and needs to be reviewed and committed
        Jonathan Coveney made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Jonathan Coveney made changes -
        Status Reopened [ 4 ] Patch Available [ 10002 ]
        Hide
        Daniel Dai added a comment -

        Unit test pass. test-patch:
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] -1 javac. The applied patch generated 904 javac compiler warnings (more than the trunk's current 900 warnings).
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 release audit. The applied patch generated 533 release audit warnings (more than the trunk's current 530 warnings).

        javac warning is caused by Antlr. javadoc and release audit warning is not related.

        I don't feel a compelling reason to check it into 0.9 branch. Commit into 0.10/trunk first.

        Thanks Jonathan!

        Show
        Daniel Dai added a comment - Unit test pass. test-patch: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] -1 javac. The applied patch generated 904 javac compiler warnings (more than the trunk's current 900 warnings). [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 533 release audit warnings (more than the trunk's current 530 warnings). javac warning is caused by Antlr. javadoc and release audit warning is not related. I don't feel a compelling reason to check it into 0.9 branch. Commit into 0.10/trunk first. Thanks Jonathan!
        Daniel Dai made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.9.3 [ 12319456 ]
        Resolution Fixed [ 1 ]
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Russell Jurney added a comment -

        MongoStorage is still not working when tuples aren't named, but as there is no longer a call to Util.getSchemaFromString at all in mongo-hadoop, this problem appears to be unrelated.

        Show
        Russell Jurney added a comment - MongoStorage is still not working when tuples aren't named, but as there is no longer a call to Util.getSchemaFromString at all in mongo-hadoop, this problem appears to be unrelated.

          People

          • Assignee:
            Jonathan Coveney
            Reporter:
            Russell Jurney
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development