CASSANDRA-3962

CassandraStorage can't cast fields from a CF correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.0.9, 1.1.0
    • Component/s: Hadoop
    • Labels: hadoop, pig
    • Environment:

      OSX 10.6.latest, Pig 0.9.2.

      Description

      The attached scripts demonstrate the problem. Whether or not the key is cast to chararray, the Pig scripts fail with:

      java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
      	at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:72)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:117)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
      
      1. 0002-Compose-key-from-marshaller.txt
        2 kB
        Brandon Williams
      2. 0001-Add-LoadCaster-to-CassandraStorage.txt
        4 kB
        Brandon Williams
      3. test.pig
        0.4 kB
        Janne Jalkanen
      4. test.cli
        0.4 kB
        Janne Jalkanen

        Activity

        Transition                  | Time In Source Status | Execution Times | Last Executer    | Last Execution Date
        Open -> Patch Available     | 18h 55m               | 1               | Brandon Williams | 27/Feb/12 15:46
        Patch Available -> Resolved | 1h 17m                | 1               | Brandon Williams | 27/Feb/12 17:04
        Aleksey Yeschenko made changes -
        Component/s Hadoop [ 12313540 ]
        Brandon Williams made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Brandon Williams added a comment -

        Committed 0002, and also incorporated Janne's test into the existing tests.

        Pavel Yaskevich added a comment -

        +1

        Brandon Williams made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Brandon Williams made changes -
        Fix Version/s 1.0.9 [ 12319856 ]
        Fix Version/s 1.1.0 [ 12317615 ]
        Reviewer xedin
        Brandon Williams made changes -
        Attachment 0002-Compose-key-from-marshaller.txt [ 12516167 ]
        Brandon Williams added a comment -

        You're totally right. The problem wasn't casting the key from Bytes, but trying to use the one from U8. The attached patch composes the key from the marshaller.

        Janne Jalkanen added a comment -

        Could the fact that these are row keys, not columns, have something to do with the issue? Looking at CassandraStorage.getNext(), there's a line

        // set the key
        tuple.append(new DataByteArray(ByteBufferUtil.getArray(key)));

        So it looks to me like the key is always added as a DataByteArray, regardless of its actual type? getSchema() does seem to read the value from CfDef correctly, though.
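
The observation above can be illustrated with a minimal sketch in which the key value is built from the key validator rather than always wrapped as raw bytes. Everything here is a stand-in for illustration: `KeyMarshaller`, `KeyComposeSketch`, and `composeKey` are hypothetical names, not the real Pig or Cassandra classes, and this is not the code from the attached patches.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Cassandra marshaller; only a compose step
// is modelled here.
interface KeyMarshaller {
    Object compose(ByteBuffer raw);
}

class KeyComposeSketch {
    // Stand-in for a UTF8 key validator: yields a String.
    // duplicate() keeps the caller's buffer position untouched.
    static final KeyMarshaller UTF8 =
            raw -> StandardCharsets.UTF_8.decode(raw.duplicate()).toString();

    // Stand-in for a bytes key validator: yields a copy of the raw bytes.
    static final KeyMarshaller BYTES = raw -> {
        byte[] out = new byte[raw.remaining()];
        raw.duplicate().get(out);
        return out;
    };

    // Build the key value from the validator instead of always wrapping
    // the raw bytes, so the runtime type matches the declared schema type.
    static Object composeKey(ByteBuffer key, KeyMarshaller validator) {
        return validator.compose(key);
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println(composeKey(key, UTF8).getClass().getSimpleName());  // String
        System.out.println(composeKey(key, BYTES).getClass().getSimpleName()); // byte[]
    }
}
```

With a value whose runtime type agrees with the schema, a downstream cast to String would no longer see a byte-array wrapper.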

        Brandon Williams made changes -
        Brandon Williams added a comment -

        Patch to add a LoadCaster. It does get used and converts the byte[] to String, but the join still fails with the same error.

        Brandon Williams added a comment -

        I think implementing LoadCaster will fix this, but it's strange to me that Pig doesn't allow going the other way, casting a chararray to a bytearray, since that's the only thing guaranteed to work here in case the Bytes CF has keys that won't map to UTF8.
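
The concern about keys that won't map to UTF8 can be sketched as a strict byte-to-string conversion that rejects malformed input instead of silently replacing it. This is a standalone illustration, not the attached patch: the class name `StrictUtf8CastSketch` is hypothetical, and the method is only named `bytesToCharArray` after the LoadCaster conversion it imitates.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8CastSketch {
    // Decode bytes as UTF-8, failing loudly on malformed sequences rather
    // than substituting replacement characters.
    static String bytesToCharArray(byte[] b) throws IOException {
        if (b == null) {
            return null;
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IOException("key is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesToCharArray("row1".getBytes(StandardCharsets.UTF_8)));
        try {
            // 0xC3 must be followed by a continuation byte; 0x28 is not one.
            bytesToCharArray(new byte[] {(byte) 0xC3, (byte) 0x28});
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether a real caster should throw, return null, or fall back to raw bytes for such keys is a design choice this sketch does not settle.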

        Janne Jalkanen made changes -
        Labels hadoop pig
        Brandon Williams made changes -
        Assignee Brandon Williams [ brandon.williams ]
        Janne Jalkanen added a comment -

        Relevant IRC log from #cassandra

        [22:26] driftx: hmm, I think this is the udfcontext signature reuse problem
        [22:26] driftx: jeromatron: what's the workaround for that again?
        [22:26] Ecyrd: Is there an open bug?
        [22:28] driftx: #2869, but we fixed it. hmm.
        [22:28] CassBotJr: https://issues.apache.org/jira/browse/CASSANDRA-2869 : CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues
        [22:34] driftx:         case DataType.CHARARRAY:
        [22:34] driftx:             return new NullableText((String)o);
        [22:34] driftx: so it thinks it has a chararray, but it still has a bytearray
        [22:42] driftx: I think we have to implement a LoadCaster to get around this
        [22:43] Ecyrd: So I'm not insane, this is a real bug 
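
The mismatch described in the log (the declared type is chararray, but the runtime value is still a byte-array wrapper) can be reproduced in isolation. This is a minimal sketch with stand-in types: `FakeDataByteArray` and `CastMismatchSketch` are hypothetical names, not the Pig classes from the stack trace.

```java
import java.nio.charset.StandardCharsets;

// Stand-in for a byte-array wrapper type; not the real Pig class.
final class FakeDataByteArray {
    private final byte[] data;

    FakeDataByteArray(byte[] data) {
        this.data = data;
    }

    byte[] get() {
        return data;
    }
}

class CastMismatchSketch {
    // Mirrors the shape of the quoted snippet: the declared type says
    // "chararray", so the value is cast to String without checking it.
    static String asCharArray(Object o) {
        return (String) o;
    }

    // Returns true if the unchecked cast fails for the given value.
    static boolean castFails(Object o) {
        try {
            asCharArray(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object key = new FakeDataByteArray("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println(castFails(key));    // true
        System.out.println(castFails("row1")); // false
    }
}
```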
        
        Janne Jalkanen made changes -
        Attachment test.pig [ 12516116 ]
        Janne Jalkanen added a comment -

        The test Pig script for both cases. You might want to comment out the "dump a" to let it continue.

        Janne Jalkanen made changes -
        Attachment test.cli [ 12516115 ]
        Janne Jalkanen added a comment -

        Generate the test CFs.

        Janne Jalkanen created issue -

          People

          • Assignee:
            Brandon Williams
            Reporter:
            Janne Jalkanen
            Reviewer:
            Pavel Yaskevich