Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-2810

RuntimeException in Pig when using "dump" command on column name

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.8.7
    • Component/s: None
    • Labels:
      None
    • Environment:

      Ubuntu 10.10, 32 bits
      java version "1.6.0_24"
      Brisk beta-2 installed from Debian packages

      Description

      This bug was previously report on Brisk bug tracker.

      In cassandra-cli:

      [default@unknown] create keyspace Test
          with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
          and strategy_options = [{replication_factor:1}];
      
      [default@unknown] use Test;
      Authenticated to keyspace: Test
      
      [default@Test] create column family test;
      
      [default@Test] set test[ascii('row1')][long(1)]=integer(35);
      set test[ascii('row1')][long(2)]=integer(36);
      set test[ascii('row1')][long(3)]=integer(38);
      set test[ascii('row2')][long(1)]=integer(45);
      set test[ascii('row2')][long(2)]=integer(42);
      set test[ascii('row2')][long(3)]=integer(33);
      
      [default@Test] list test;
      Using default limit of 100
      -------------------
      RowKey: 726f7731
      => (column=0000000000000001, value=35, timestamp=1308744931122000)
      => (column=0000000000000002, value=36, timestamp=1308744931124000)
      => (column=0000000000000003, value=38, timestamp=1308744931125000)
      -------------------
      RowKey: 726f7732
      => (column=0000000000000001, value=45, timestamp=1308744931127000)
      => (column=0000000000000002, value=42, timestamp=1308744931128000)
      => (column=0000000000000003, value=33, timestamp=1308744932722000)
      
      2 Rows Returned.
      
      [default@Test] describe keyspace;
      Keyspace: Test:
        Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
        Durable Writes: true
          Options: [replication_factor:1]
        Column Families:
          ColumnFamily: test
            Key Validation Class: org.apache.cassandra.db.marshal.BytesType
            Default column value validator: org.apache.cassandra.db.marshal.BytesType
            Columns sorted by: org.apache.cassandra.db.marshal.BytesType
            Row cache size / save period in seconds: 0.0/0
            Key cache size / save period in seconds: 200000.0/14400
            Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
            GC grace seconds: 864000
            Compaction min/max thresholds: 4/32
            Read repair chance: 1.0
            Replicate on write: false
            Built indexes: []
      

      In Pig command line:

      grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
      
      grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
      
      grunt> dump value_test;
      

      In /var/log/cassandra/system.log, I have severals time this exception:

      INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551) Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data type -1 found in stream.
      	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
      	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
      	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
      	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
      	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
      	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
      	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
      	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
      	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      	at org.apache.hadoop.mapred.Child.main(Child.java:253)
      

      and the request failed.

      grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
      
      grunt> value_test = foreach test generate rowkey, columns.value;
      
      grunt> dump value_test;
      

      This time, without the column name, it's work (but the value are displayed as char instead of integer). Result:

      (row1,{(#),($),(&)})
      (row2,{(-),(*),(!)})
      

      Now we do the same test but we set a comparator to the CF.

      [default@Test] create column family test with comparator = 'LongType';
      
      [default@Test] set test[ascii('row1')][long(1)]=integer(35);
      set test[ascii('row1')][long(2)]=integer(36);
      set test[ascii('row1')][long(3)]=integer(38);
      set test[ascii('row2')][long(1)]=integer(45);
      set test[ascii('row2')][long(2)]=integer(42);
      set test[ascii('row2')][long(3)]=integer(33);
      
      [default@Test] list test;
      Using default limit of 100
      -------------------
      RowKey: 726f7731
      => (column=1, value=35, timestamp=1308748643506000)
      => (column=2, value=36, timestamp=1308748643508000)
      => (column=3, value=38, timestamp=1308748643509000)
      -------------------
      RowKey: 726f7732
      => (column=1, value=45, timestamp=1308748643510000)
      => (column=2, value=42, timestamp=1308748643512000)
      => (column=3, value=33, timestamp=1308748645138000)
      
      2 Rows Returned.
      
      [default@Test] describe keyspace;
      Keyspace: Test:
        Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
        Durable Writes: true
          Options: [replication_factor:1]
        Column Families:
          ColumnFamily: test
            Key Validation Class: org.apache.cassandra.db.marshal.BytesType
            Default column value validator: org.apache.cassandra.db.marshal.BytesType
            Columns sorted by: org.apache.cassandra.db.marshal.LongType
            Row cache size / save period in seconds: 0.0/0
            Key cache size / save period in seconds: 200000.0/14400
            Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
            GC grace seconds: 864000
            Compaction min/max thresholds: 4/32
            Read repair chance: 1.0
            Replicate on write: false
            Built indexes: []
      
      grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
      
      grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
      
      grunt> dump value_test;
      

      This time it's work as expected (appart from the value displayed as char). Result:

      (row1,{(1),(2),(3)},{(#),($),(&)})
      (row2,{(1),(2),(3)},{(-),(*),(!)})
      

        Attachments

        1. 2810.txt
          5 kB
          Brandon Williams
        2. 2810-v2.txt
          4 kB
          Brandon Williams
        3. 2810-v3.txt
          4 kB
          Brandon Williams

          Activity

            People

            • Assignee:
              brandon.williams Brandon Williams
              Reporter:
              silvere Silvère Lestang
              Reviewer:
              Jeremy Hanna
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: