CASSANDRA-3962

CassandraStorage can't cast fields from a CF correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.0.9, 1.1.0
    • Component/s: Hadoop
    • Labels:
    • Environment:

      OSX 10.6.latest, Pig 0.9.2.

      Description

      The attached scripts demonstrate the problem. Whether or not the key is cast to a chararray, the Pig scripts fail with:

      java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
      	at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:72)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:117)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
      
      1. test.cli
        0.4 kB
        Janne Jalkanen
      2. test.pig
        0.4 kB
        Janne Jalkanen
      3. 0001-Add-LoadCaster-to-CassandraStorage.txt
        4 kB
        Brandon Williams
      4. 0002-Compose-key-from-marshaller.txt
        2 kB
        Brandon Williams

        Activity

        Janne Jalkanen added a comment -

        Generate the test CFs.

        Janne Jalkanen added a comment -

        The test Pig script for both cases. You might want to comment out the "dump a" to let it continue.

        Janne Jalkanen added a comment -

        Relevant IRC log from #cassandra

        [22:26] driftx: hmm, I think this is the udfcontext signature reuse problem
        [22:26] driftx: jeromatron: what's the workaround for that again?
        [22:26] Ecyrd: Is there an open bug?
        [22:28] driftx: #2869, but we fixed it. hmm.
        [22:28] CassBotJr: https://issues.apache.org/jira/browse/CASSANDRA-2869 : CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues
        [22:34] driftx:         case DataType.CHARARRAY:
        [22:34] driftx:             return new NullableText((String)o);
        [22:34] driftx: so it thinks it has a chararray, but it still has a bytearray
        [22:42] driftx: I think we have to implement a LoadCaster to get around this
        [22:43] Ecyrd: So I'm not insane, this is a real bug 
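
        The failure mode described in the log can be reproduced in isolation. The following is a minimal sketch, not Pig's code: BytesValue is a hypothetical stand-in for Pig's DataByteArray, and asCharArray only mirrors the shape of the quoted cast. It shows that a value declared as chararray but still held as a byte wrapper fails at the cast.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Pig's DataByteArray; not the real class.
final class BytesValue {
    final byte[] data;

    BytesValue(byte[] data) {
        this.data = data;
    }
}

class CastMismatchDemo {
    // Mirrors the shape of the quoted snippet: trust the declared type and cast.
    static String asCharArray(Object o) {
        return (String) o; // throws ClassCastException if o is not a String
    }

    // Returns true when the blind cast fails for the given value.
    static boolean castFails(Object o) {
        try {
            asCharArray(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The schema may say "chararray", but the tuple still carries raw bytes.
        Object key = new BytesValue("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println("cast fails for byte wrapper: " + castFails(key));
        System.out.println("cast fails for String: " + castFails("row1"));
    }
}
```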
        
        Brandon Williams added a comment -

        I think implementing LoadCaster will fix this, but it's strange to me that Pig doesn't allow going the other way, casting a chararray to a bytearray, since that's the only thing guaranteed to work here in case the Bytes CF has keys that won't map to UTF8.
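
        The concern about keys that won't map to UTF8 can be illustrated with the JDK alone. This is a sketch under assumptions, not the patch: StrictUtf8 and its methods are hypothetical names, and it only shows that arbitrary bytes do not always decode to a string, whereas any string can be encoded to bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    // Decodes bytes as UTF-8 and reports malformed input instead of
    // silently substituting a replacement character.
    static String decode(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Returns true when the bytes form a valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decode(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xFF, (byte) 0xFE}; // 0xFF never occurs in valid UTF-8
        System.out.println("text key valid: " + isValidUtf8(text));
        System.out.println("binary key valid: " + isValidUtf8(binary));
    }
}
```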

        Brandon Williams added a comment -

        Patch to add a LoadCaster. It does get used and converts the byte[] to String, but the join still fails with the same error.

        Janne Jalkanen added a comment -

        Could the fact that these are row keys, not columns, have something to do with the issue? Looking at CassandraStorage.getNext(), there's a line

        // set the key
        tuple.append(new DataByteArray(ByteBufferUtil.getArray(key)));

        So it looks to me like the key is always added as a DataByteArray, regardless of its actual type? getSchema() does seem to read the value from CfDef correctly, though.

        Brandon Williams added a comment -

        You're totally right. The problem wasn't casting the key from Bytes, but trying to use the one from U8. Patch to compose the key from the marshaller.
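
        The idea of composing the key from the marshaller can be sketched as follows. This is an illustration only and is not taken from the 0002 patch: KeyMarshaller, KeyComposeDemo, and keyFor are hypothetical names standing in for the column family's key type, and the point is simply that the value appended to the tuple should match the type the schema reports.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a key marshaller; not Cassandra's actual API.
interface KeyMarshaller {
    Object compose(ByteBuffer raw);
}

class KeyComposeDemo {
    // A UTF8-typed key is composed into a String.
    static final KeyMarshaller UTF8 =
            raw -> StandardCharsets.UTF_8.decode(raw.duplicate()).toString();

    // A bytes-typed key stays raw; it is copied out of the buffer unchanged.
    static final KeyMarshaller BYTES = raw -> {
        byte[] copy = new byte[raw.remaining()];
        raw.duplicate().get(copy);
        return copy;
    };

    // Instead of always wrapping the raw key bytes, ask the marshaller
    // for a value of the type the schema advertises.
    static Object keyFor(KeyMarshaller marshaller, ByteBuffer rawKey) {
        return marshaller.compose(rawKey);
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.wrap("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println(keyFor(UTF8, raw).getClass().getSimpleName());
        System.out.println(keyFor(BYTES, raw).getClass().getSimpleName());
    }
}
```

        With the key value and the declared schema type in agreement, a downstream cast to String no longer sees a byte wrapper for a UTF8-keyed column family.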

        Pavel Yaskevich added a comment -

        +1

        Brandon Williams added a comment -

        Committed 0002, and also incorporated Janne's test into the existing tests.


          People

          • Assignee:
            Brandon Williams
            Reporter:
            Janne Jalkanen
            Reviewer:
            Pavel Yaskevich