CASSANDRA-5488

CassandraStorage throws NullPointerException (NPE) when widerows is set to 'true'

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.1.12, 1.2.6
    • Component/s: None
    • Labels:
    • Environment:

      Ubuntu 12.04.1 x64, Cassandra 1.2.4

      Description

      CassandraStorage throws NPE when widerows is set to 'true'.

      2 problems in getNextWide:
      1. Creation of tuple without specifying size
      2. Calling addKeyToTuple on lastKey instead of key
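      The two fixes can be illustrated with a small self-contained sketch. This is plain Python standing in for Pig's Tuple API — `make_tuple`, `add_key_to_tuple`, and the values are hypothetical stand-ins, not the actual CassandraStorage code:

      ```python
      # Hypothetical stand-ins for Pig's Tuple API; not the actual CassandraStorage code.

      def make_tuple(size=None):
          # Bug 1: without a size, the tuple has no slots for positional writes.
          # The fix pre-sizes it so tup[i] = value works.
          return [None] * size if size is not None else []

      def add_key_to_tuple(tup, key):
          # Mirrors addKeyToTuple: decoding a None key raises (the NPE in Java).
          if key is None:
              raise TypeError("key is None -> NullPointerException in Java")
          tup[0] = key.decode("utf-8")

      last_key = None             # lastKey is null before the first wide row is read
      key = b"row_key1"           # the current row's key

      tup = make_tuple(2)         # fix 1: size the tuple up front
      add_key_to_tuple(tup, key)  # fix 2: pass key, not last_key
      tup[1] = [("1", "val")]
      ```

      Calling `add_key_to_tuple(tup, last_key)` here raises exactly as the Java code NPEs, since `last_key` is still unset on the first row.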

      java.lang.NullPointerException
      at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
      at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
      at org.apache.cassandra.cql.jdbc.JdbcUTF8.getString(JdbcUTF8.java:73)
      at org.apache.cassandra.cql.jdbc.JdbcUTF8.compose(JdbcUTF8.java:93)
      at org.apache.cassandra.db.marshal.UTF8Type.compose(UTF8Type.java:34)
      at org.apache.cassandra.db.marshal.UTF8Type.compose(UTF8Type.java:26)
      at org.apache.cassandra.hadoop.pig.CassandraStorage.addKeyToTuple(CassandraStorage.java:313)
      at org.apache.cassandra.hadoop.pig.CassandraStorage.getNextWide(CassandraStorage.java:196)
      at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(CassandraStorage.java:224)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
      at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
      at org.apache.hadoop.mapred.Child.main(Child.java:249)
      2013-04-16 12:28:03,671 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

      1. 5488-2.txt
        4 kB
        Jeremy Hanna
      2. 5488.txt
        3 kB
        Sheetal Gosrani

        Activity

        Sheetal Gosrani created issue -
        Sheetal Gosrani added a comment -

        This patch (5488.txt) fixes the issue.

        Sheetal Gosrani made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Sheetal Gosrani made changes -
        Attachment 5488.txt [ 12579184 ]
        Sheetal Gosrani made changes -
        Attachment 5488.txt [ 12579184 ]
        Sheetal Gosrani made changes -
        Attachment 5488.txt [ 12579212 ]
        Jonathan Ellis made changes -
        Fix Version/s 1.2.5 [ 12324301 ]
        Priority Major [ 3 ] Minor [ 4 ]
        Reviewer brandon.williams
        Brandon Williams added a comment -

        Can you add a test to examples/pig/test/test_storage.pig that demonstrates the problem?

        Jeremy Hanna added a comment -

        I've reproduced this with 1.1.9 as well.

        Jeremy Hanna made changes -
        Affects Version/s 1.1.9 [ 12323843 ]
        Jeremy Hanna added a comment -

        Looks like it's from CASSANDRA-5098

        Sylvain Lebresne made changes -
        Fix Version/s 1.2.6 [ 12324449 ]
        Fix Version/s 1.2.5 [ 12324301 ]
        Jeremy Hanna added a comment -

        An alternative approach: consolidate the two methods and check for null in the combined method.

        Jeremy Hanna made changes -
        Attachment 5488-2.txt [ 12584029 ]
        Brandon Williams added a comment -

        Committed v2, and also flipped the copy test to use widerow mode as a smoke test.

        Brandon Williams made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 1.1.12 [ 12324332 ]
        Resolution Fixed [ 1 ]
        Brandon Williams made changes -
        Assignee Sheetal Gosrani [ sgosrani ]
        Jeremy Hanna added a comment -

        There ended up being a secondary problem that was hidden by the first NPE. It appears to be related to getting the AbstractType. The NPE came from this line: https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java#L307 — I decomposed it to find out what was NPEing, and got this:

                    List<AbstractType> atList = getDefaultMarshallers(cfDef);
                    AbstractType at = atList.get(2);
                    Object o = at.compose(key); //NPE from this line
                    setTupleValue(tuple, 0, o);
                    //setTupleValue(tuple, 0, getDefaultMarshallers(cfDef).get(2).compose(key));
        

        So it seems unrelated to the original NPE, but still matches the description of this ticket.
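        This secondary NPE fits the same shape: `at.compose(key)` is handed a null key buffer. A minimal sketch of the kind of null check a consolidated key-setting method can add — `compose_utf8`, `set_tuple_key`, and the marshaller dict are hypothetical names, not the committed patch:

        ```python
        def compose_utf8(buf):
            # Stand-in for UTF8Type.compose(): decoding a null buffer NPEs in Java.
            if buf is None:
                raise TypeError("null ByteBuffer -> NullPointerException in Java")
            return buf.decode("utf-8")

        def set_tuple_key(tup, key, marshallers):
            # Defensive variant: only compose when the key buffer is present.
            if key is None:
                tup[0] = None
                return tup
            tup[0] = marshallers[2](key)  # index 2: the key validator marshaller
            return tup

        t = set_tuple_key([None, None], b"row_key1", {2: compose_utf8})
        ```

        With the check in place a missing key degrades to a null tuple field instead of killing the mapper.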

        To reproduce, here is my schema:

        CREATE KEYSPACE circus
        with placement_strategy = 'SimpleStrategy'
        and strategy_options = {replication_factor:1};
        
        use circus;
        
        CREATE COLUMN FAMILY acrobats
        WITH comparator = UTF8Type
        AND key_validation_class=UTF8Type
        AND default_validation_class = UTF8Type;
        

        Here is a pycassa script to create the data:

        from pycassa.pool import ConnectionPool
        from pycassa.columnfamily import ColumnFamily
        
        pool = ConnectionPool('circus')
        col_fam = ColumnFamily(pool, 'acrobats')
        
        for i in range(1, 10):
            for j in range(1, 200000):
                col_fam.insert('row_key' + str(i), {str(j): 'val'})
        

        Here is the pig (0.9.2) that I'm running in local mode:

        rows = LOAD 'cassandra://circus/acrobats?widerows=true&limit=200000' USING CassandraStorage();
        filtered = filter rows by key == 'row_key1';
        columns = foreach filtered generate flatten(columns);
        counted = foreach (group columns all) generate COUNT($1);
        dump counted;
        
        Jeremy Hanna made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Aleksey Yeschenko made changes -
        Assignee Sheetal Gosrani [ sgosrani ] Aleksey Yeschenko [ iamaleksey ]
        Brandon Williams added a comment -

        v2 was a little too aggressive in function consolidation. I reverted it and applied v1.

        Brandon Williams made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Aleksey Yeschenko made changes -
        Assignee Aleksey Yeschenko [ iamaleksey ] Brandon Williams [ brandon.williams ]
        Brandon Williams made changes -
        Assignee Brandon Williams [ brandon.williams ] Sheetal Gosrani [ sgosrani ]
        Aleksey Yeschenko made changes -
        Component/s Hadoop [ 12313540 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open → Patch Available | 13m 10s | 1 | Sheetal Gosrani | 17/Apr/13 20:04
        Patch Available → Resolved | 33d 20h 10m | 1 | Brandon Williams | 21/May/13 16:15
        Resolved → Reopened | 18h 29m | 1 | Jeremy Hanna | 22/May/13 10:45
        Reopened → Resolved | 5h 36m | 1 | Brandon Williams | 22/May/13 16:21

          People

          • Assignee:
            Sheetal Gosrani
            Reporter:
            Sheetal Gosrani
            Reviewer:
            Brandon Williams
          • Votes:
            0
            Watchers:
            2
