Cassandra
CASSANDRA-5234

Tables created through CQL3 are not accessible to Pig 0.10

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.7
    • Component/s: Hadoop
    • Labels:
      None
    • Environment:

      Red hat linux 5

      Description

      Hi,
      I have faced a bug when creating a table through CQL3 and trying to load data through Pig 0.10, as follows:
      java.lang.RuntimeException: Column family 'abc' not found in keyspace 'XYZ'
      at org.apache.cassandra.hadoop.pig.CassandraStorage.initSchema(CassandraStorage.java:1112)
      at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(CassandraStorage.java:615).
      This affects everything from simple tables to tables with compound keys.

      1. pigCounter-patch.txt
        0.8 kB
        Christopher Smith
      2. fix_where_clause.patch
        3 kB
        Shamim Ahmed
      3. 5234-3-trunk.txt
        98 kB
        Alex Liu
      4. 5234-3-1.2branch.txt
        96 kB
        Alex Liu
      5. 5234-2-1.2branch.txt
        94 kB
        Alex Liu
      6. 5234-1-1.2-patch.txt
        94 kB
        Alex Liu
      7. 5234-1.2-patch.txt
        93 kB
        Alex Liu
      8. 5234.tx
        89 kB
        Alex Liu

        Activity

        Shamim Ahmed created issue -
        Aleksey Yeschenko added a comment -

        This is not a bug - CQL3 tables are intentionally not included in thrift describe_keyspace(s) (CASSANDRA-4377).

        Aleksey Yeschenko made changes -
        Field Original Value New Value
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Not A Problem [ 8 ]
        Brandon Williams added a comment -

        See CASSANDRA-4421
        Cyril Scetbon added a comment -

        It means that CQL3 column families are not accessible through thrift, and for me that's an issue (I do not agree with your Resolution label). That's why Pig 0.11 cannot use them. Is there a way to solve it?
        I can help you if necessary.

        Cyril Scetbon added a comment -

        It should be fixed after CASSANDRA-4421

        Cyril Scetbon made changes -
        Resolution Not A Problem [ 8 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Alex Liu added a comment -

        I will work on it soon, once I am done with other assignments.

        Cyril Scetbon added a comment -

        thanks Alex !

        Alex Liu added a comment -

        To fix it, we need to modify CassandraStorage to get the CF metadata from the system tables instead of thrift describe_keyspace, because CQL3 tables don't show up in the thrift describe_keyspace call.

        Alex Liu added a comment -

        Patch is attached. It's on top of the CASSANDRA-4421 patch.

        Alex Liu made changes -
        Attachment 5234.tx [ 12585444 ]
        Alex Liu added a comment - - edited

        pull @ https://github.com/alexliu68/cassandra/pull/3

        Use CassandraStorage for any CQL3 table, and you will have composite columns in the "columns" bag.

        Use CqlStorage for any CQL3 table:

        cassandra://[username:password@]<keyspace>/<columnfamily>[?[page_size=<size>]
        [&columns=<col1,col2>][&output_query=<prepared_statement>]
        [&where_clause=<clause>][&split_size=<size>][&partitioner=<partitioner>]]
        

        where

        page_size is the number of CQL3 rows per page (optional; the default is 1000)

        columns is the list of column names for the CQL3 select query (optional)

        where_clause is a user-defined where clause on the indexed columns (optional)

        split_size is the number of C* rows per split, which can be used to tune the number of mappers

        output_query is the prepared query for inserting data into a CQL3 table (replace = with @ and ? with #,
        because Pig can't take = and ? as URL parameter values)

        Output rows are in the following format:

        (((name, value), (name, value)), (value ... value), (value...value))
        

        where the name and value tuples are key name and value pairs.

        The input schema is ((name, value), (name, value), (name, value)), where the keys come first.
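        As a concrete illustration of the output_query substitution described above (Pig can't take = and ? as URL parameter values, so the prepared statement is written with @ for = and # for ?), here is a minimal sketch; the class and method names are hypothetical, not part of the patch:

```java
// Hypothetical helper illustrating the output_query convention described above:
// '=' is written as '@' and '?' as '#' in the storage URL, then decoded back.
public class OutputQueryCodec {
    // Encode a CQL prepared statement for use in the storage URL.
    static String encode(String preparedStatement) {
        return preparedStatement.replace('=', '@').replace('?', '#');
    }

    // Decode the URL parameter value back into the real prepared statement.
    static String decode(String urlValue) {
        return urlValue.replace('@', '=').replace('#', '?');
    }

    public static void main(String[] args) {
        String cql = "UPDATE ks.cf SET value_1 = ?";
        String encoded = encode(cql);
        System.out.println(encoded);                     // UPDATE ks.cf SET value_1 @ #
        System.out.println(decode(encoded).equals(cql)); // true
    }
}
```

        Note this simple character swap assumes the statement itself contains no literal @ or # characters.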

        Alex Liu added a comment -

        I attach 5234-1.2-patch.txt to patch the 1.2 branch. It updates the last patch with the latest CASSANDRA-4421 changes.

        Alex Liu made changes -
        Attachment 5234-1.2-patch.txt [ 12587779 ]
        Alex Liu made changes -
        Status Reopened [ 4 ] Patch Available [ 10002 ]
        Alex Liu added a comment -

        The patch resolves the following issues:

        1. Allows access to CQL3 tables through CassandraStorage.

        2. Creates a new CqlStorage for easy access to CQL3 tables.

        Alex Liu made changes -
        Attachment 5234-1.2-patch.txt [ 12587779 ]
        Alex Liu made changes -
        Attachment 5234-1.2-patch.txt [ 12587782 ]
        Cyril Scetbon added a comment -

        Do you think I can test them now?

        Alex Liu added a comment -

        Yes, I have done some testing.

        Jeremiah Jordan made changes -
        Fix Version/s 1.2.6 [ 12324449 ]
        Fix Version/s 1.2.2 [ 12323924 ]
        Cyril Scetbon added a comment -

        Okay, I'll give it a try
        thanks

        Cyril Scetbon added a comment - - edited

        My Pig script doesn't work anymore. I suppose you changed the input format?
        I get:
        Projected field [filtre] does not exist in schema: key:chararray,columns:bag{:tuple(name:tuple(),value:bytearray)}
        when the column family is created without COMPACT STORAGE.

        Something weird is that I get no values when I should get some:
        cqlsh:k1> SELECT * FROM cf1 WHERE ISE='XXXX';

         ise  | filtre | value_1
        ------+--------+---------
         XXXX |      1 |   81056

        2013-06-17 13:13:01,372 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
        (XXXX,

        {((),),((filtre),),((value_1),<?)}

        ) <-- value not printable for filtre?

        Alex Liu added a comment -

        Can you post your schema for the table and your Pig script, so I can test it?

        Cyril Scetbon added a comment - - edited

        http://pastebin.com/Fub9t6j9 <-- my column family
        http://pastebin.com/HwKxsC4f <-- my pig script

        Alex Liu added a comment -

        Can you just dump(data) to check whether you have all the data, and then do the filter scripts?

        Cyril Scetbon added a comment - - edited

        You definitely broke something, as I get different results when dumping data for CQL2 and CQL3 tables (same structure).

        CQL2:

        * format is
        x: {key: chararray,value_1: (name: chararray,value: int),filtre: (name: chararray,value: int),columns: {(name: chararray,value: bytearray)}}
        * rows are
        (XXXX,(value_1,18584),(filtre,0),{})
        (YYYY,(value_1,49926),(filtre,2),{})

        CQL3:

        * format is
        x: {key: chararray,columns: {(name: (),value: bytearray)}}
        * rows are
        (XXXX,{((),),((filtre),),((value_1),??)})
        (YYYY,{((),),((filtre),),((value_1),??)})
        Alex Liu added a comment -

        CQL2 has a different structure from CQL3; CQL2 is closer to the legacy thrift-type CF (check the developer blog from DataStax). You should follow the CQL3 format to change your Pig script, or use CqlStorage for easy data mapping for CQL3-type tables.

        Alex Liu added a comment -

        Use the following script to find the structure of a CQL3 table, then change your Pig script:

           rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage();
           dump(rows);
        
        Cyril Scetbon added a comment - - edited

        Great, CqlStorage() helps a lot to get same input.

        Thanks

        Alex Liu added a comment -

        5234-1-1.2-patch.txt is attached to allow CassandraStorage to accept the partitioner as a parameter.

        Alex Liu made changes -
        Attachment 5234-1-1.2-patch.txt [ 12588217 ]
        Alex Liu made changes -
        Attachment 5234-1-1.2-patch.txt [ 12588217 ]
        Alex Liu made changes -
        Attachment 5234-1-1.2-patch.txt [ 12588404 ]
        Alex Liu made changes -
        Attachment 5234-1-1.2-patch.txt [ 12588404 ]
        Alex Liu made changes -
        Attachment 5234-1-1.2-patch.txt [ 12588405 ]
        Brandon Williams made changes -
        Assignee Alex Liu [ alexliu68 ]
        Reviewer brandon.williams
        Cyril Scetbon made changes -
        Comment [ [~alexliu68] When using a DC dedicated to hadoop tasks, do nodes from this DC contact nodes from other DCs when using Hadoop jobs ? If yes, I see you use ConsistencyLevel.ONE, can you provide a way to override it to use LOCAL_QUORUM ? We don't want to load other nodes in case of performance are bad for example. ]
        Alex Liu added a comment -

        5234-2-1.2branch.txt is attached to use "cql://" instead of "cassandra://" for CqlStorage

        Alex Liu made changes -
        Attachment 5234-2-1.2branch.txt [ 12588946 ]
        Brandon Williams added a comment -

        Hmm, I'm seeing errors when running the examples/pig/test tests that don't use cql3.

        Cyril Scetbon made changes -
        Comment [ In my previous comment I asked for a consistency parameter and found there is already one in hadoop configuration, so never mind. We won't be able to use it as we have Replication-Factor=1 :( I sent an email on user@apache.cassandra.org to ask for a workaround. ]
        Alex Liu added a comment -

        I will fix it today.

        Alex Liu added a comment -

        The failing test is the one with a filter on COUNT(columns):

        -- filter to fully visible rows (no uuid columns) and dump
        visible = FILTER rows BY COUNT(columns) == 0;
        dump visible;
        
        Alex Liu made changes -
        Attachment 5234-3-1.2branch.txt [ 12589677 ]
        Alex Liu added a comment - - edited

        It turns out to be a wide-row schema issue, which could be an error from a previous version.

        5234-3-1.2branch.txt is attached to fix the failure when running examples/pig/test

        Sylvain Lebresne made changes -
        Fix Version/s 1.2.7 [ 12324628 ]
        Fix Version/s 1.2.6 [ 12324449 ]
        Deepak Rosario Pancras added a comment -

        @Alex I pulled the branch CASSANDRA-5234 and tried building it. But the build fails when compiling.

        Alex Liu added a comment -

        It works for me.

        Try the following commands:

        git checkout cassandra-1.2
        patch -p1 < 5234-3-1.2branch.txt 
        
        Deepak Rosario Pancras added a comment -

        @Alex: I did check out cassandra-1.2 and tried to apply the patch using "patch -p1 < 5234-3-1.2branch.txt", but it didn't work; it threw an exception: "< is reserved for future use".
        So I downloaded the patch 5234-3-1.2branch.txt, did a git apply, and then committed the unstaged changes. Now when I browse the file location src/java/org/apache/cassandra/hadoop/pig/, the patch is not applied. Sorry for my naive questions; I am very new to git and Cassandra.

        Deepak Rosario Pancras added a comment -

        The patch is applied. But do I have to worry about the following trailing-whitespace warnings?

        5234-3-1.2branch.txt:22: trailing whitespace.
        partitioner = FBUtilities.newPartitioner(client.describe_partitioner());
        5234-3-1.2branch.txt:735: trailing whitespace.

        5234-3-1.2branch.txt:925: trailing whitespace.

        5234-3-1.2branch.txt:1297: trailing whitespace.
        }
        5234-3-1.2branch.txt:1321: trailing whitespace.

        Brandon Williams added a comment -

        Committed, and created CASSANDRA-5709 for example follow-up. Thanks!

        Brandon Williams added a comment -

        Deepak Rosario Pancras, it's in the 1.2 branch now; just do a git pull.

        Deepak Rosario Pancras added a comment -

        Thanks Brandon

        Alex Liu added a comment -

        5234-3-trunk.txt patch for trunk branch is attached.

        Alex Liu made changes -
        Attachment 5234-3-trunk.txt [ 12589810 ]
        Alex Liu made changes -
        Attachment 5234-3-trunk.txt [ 12589810 ]
        Alex Liu made changes -
        Attachment 5234-3-trunk.txt [ 12589811 ]
        Deepak Rosario Pancras added a comment -

        @Brandon. Thanks for your patience. I did a git pull on 1.2 and ran the build. It fails again.

        Brandon Williams added a comment -

        I recommend sending an email to the user list with the details, as the 1.2 branch is obviously compiling for everyone and JIRA isn't a support forum.

        Brandon Williams made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Shamim Ahmed added a comment -

        Hello Alex!
        I got an error when trying to send a where_clause in the CqlStorage URL as follows:
        rows = LOAD 'cql://keyspace1/test?page_size=1&columns=title,age&split_size=4&where_clause=age=41' USING CqlStorage();
        and haven't found any way to escape the equals operator.
        It would be better to send the where_clause with URL encoding, for example where_clause=age%3D41.
        However, as a quick fix I have slightly modified the method getQueryMap as follows:

        /** Decompose the query to store the parameters in a map */
        public static Map<String, String> getQueryMap(String query) throws Exception
        {
            String[] params = query.split("&");
            Map<String, String> map = new HashMap<String, String>();
            for (String param : params)
            {
                String[] keyValue = param.split("=");
                map.put(keyValue[0], URLDecoder.decode(keyValue[1], "UTF-8"));
            }
            return map;
        }

        and now I can send the query with URL-encoded characters.
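        A self-contained, runnable version of this workaround might look like the sketch below; the class name is illustrative, and the method mirrors the getQueryMap shown above (with a limit of 2 on the split so only the first = separates key from value):

```java
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Runnable sketch of the URL-decoding workaround: each parameter value is
// URL-decoded, so where_clause=age%3D41 arrives as "age=41".
public class UrlParamDemo {
    public static Map<String, String> getQueryMap(String query) throws Exception {
        Map<String, String> map = new HashMap<String, String>();
        for (String param : query.split("&")) {
            // Split only on the first '=' so the key/value boundary is unambiguous.
            String[] keyValue = param.split("=", 2);
            map.put(keyValue[0], URLDecoder.decode(keyValue[1], "UTF-8"));
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> m = getQueryMap("split_size=4&where_clause=age%3D41");
        System.out.println(m.get("where_clause")); // age=41
    }
}
```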

        Alex Liu added a comment -

        Thx, I will update the patch for it.

        Shamim Ahmed made changes -
        Attachment fix_where_clause.patch [ 12591607 ]
        Shamim Ahmed added a comment -

        add the patch as a temporary fix

        Konrad Kurdej added a comment - - edited

        I was trying it locally and I'm not sure whether it fully works. My stack trace and Cassandra table schema are here: http://pastebin.com/uPUAs9T2 (I tried using the old cassandra:// URL - it worked for another table) and the output when I used cql:// is here: http://pastebin.com/b0bKd7G3 . I worked on this with Pig in local mode. It might or might not be important: this table contained counters.

        I was working on cassandra-1.2 with the latest commit 27efded38d855b24f41e5332ffb29cd13d98f8da

        Christopher Smith added a comment -

        Just adding that I'm getting the same result as Konrad Kurdej. I'm using DSE 3.1, but I think this is the same bug. I was able to isolate the problem specifically to counter fields. Here's a simple setup and test case:

        cqlsh> create keyspace pigtest with REPLICATION =
          { 'class': 'SimpleStrategy', 'replication_factor': 1 };
        cqlsh> use pigtest;
        cqlsh:pigtest> create table foo ( key_1 text primary key, value_1 bigint );
        cqlsh:pigtest> create table foo2 ( key_1 text primary key, value_1 counter );
        cqlsh:pigtest> update foo set value_1 = 1 where key_1 = 'foo';
        cqlsh:pigtest> update foo2 set value_1 = value_1 + 2 where key_1 = 'foo2';
        cqlsh:pigtest> select * from foo;

         key_1 | value_1
        -------+---------
           foo |       1

        cqlsh:pigtest> select * from foo2;

         key_1 | value_1
        -------+---------
          foo2 |       2

        Now, the following grunt commands:

        counts = LOAD 'cassandra://pigtest/foo' USING CassandraStorage();
        dump counts;

        Will work, but:

        counts = LOAD 'cassandra://pigtest/foo2' USING CassandraStorage();
        dump counts;

        Will fail with the same stack trace that Konrad mentioned.

        Christopher Smith added a comment -

        This patch against head appears to fix the problem for me. I applied it to DSE 3.1 and it worked there as well.

        The basic idea is to use LongType for validation too.

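        The "use LongType for validation" idea can be sketched as follows; this is a simplified, illustrative stand-in (not the actual patch), assuming only that a counter's aggregated total is a 64-bit long:

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: a counter column's aggregated total is a 64-bit
// long, so composing it with long semantics (as LongType does) avoids trying
// to decode the raw counter context bytes.
public class CounterTotalDemo {
    // Compose an 8-byte buffer as a long, the way a LongType-style compose would.
    static long composeAsLong(ByteBuffer bytes) {
        return bytes.duplicate().getLong();
    }

    public static void main(String[] args) {
        // A counter total of 2, as an 8-byte big-endian buffer.
        ByteBuffer total = ByteBuffer.allocate(8).putLong(0, 2L);
        System.out.println(composeAsLong(total)); // 2
    }
}
```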
        Christopher Smith made changes -
        Attachment pigCounter-patch.txt [ 12594708 ]
        Alex Liu added a comment -

        Use CqlStorage for your use case. CassandraStorage has some drawbacks.

        Alex Liu added a comment -

        CassandraStorage is legacy for non-CQL3 tables. Use CqlStorage for CQL3 tables.

        Marcos Trama added a comment -

        Cassandra 1.2.8 was released (after the regression in 1.2.7), but this last patch for counters appears to have been left out. Can anyone confirm this?

        I downloaded the .deb from http://people.apache.org/~eevans/ but when I dump a table with a counter column in grunt, I get the error:

        java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:537)
        at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410)
        at org.apache.cassandra.db.context.CounterContext.total(CounterContext.java:477)
        at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:34)
        at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:25)
        at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.columnToTuple(AbstractCassandraStorage.java:137)
        at org.apache.cassandra.hadoop.pig.CqlStorage.getNext(CqlStorage.java:110)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

        My table:
        cqlsh:test> desc table votes_count_period3;

        CREATE TABLE votes_count_period3 (
          period text,
          poll timeuuid,
          votes counter,
          PRIMARY KEY (period, poll)
        ) WITH
          bloom_filter_fp_chance=0.010000 AND
          caching='KEYS_ONLY' AND
          comment='' AND
          dclocal_read_repair_chance=0.000000 AND
          gc_grace_seconds=864000 AND
          read_repair_chance=0.100000 AND
          replicate_on_write='true' AND
          populate_io_cache_on_flush='false' AND
          compaction={'class': 'SizeTieredCompactionStrategy'} AND
          compression={'sstable_compression': 'SnappyCompressor'};

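The IndexOutOfBoundsException in the trace above comes from decoding a counter column's value as if it were a bare 8-byte long: a counter value is actually a serialized "context" of per-replica shards whose counts must be summed. The sketch below illustrates that distinction with a deliberately simplified, made-up layout (Cassandra's real serialization also carries 16-byte counter ids and header flags); the class and method names are hypothetical, not the actual Cassandra code.

```java
import java.nio.ByteBuffer;

public class CounterContextSketch {
    // Assumed toy layout: a 2-byte shard count, then per shard an
    // 8-byte clock and an 8-byte count.
    static ByteBuffer context(long[] counts) {
        ByteBuffer bb = ByteBuffer.allocate(2 + counts.length * 16);
        bb.putShort((short) counts.length);
        for (long c : counts) {
            bb.putLong(System.nanoTime()); // fake clock value
            bb.putLong(c);                 // this shard's count
        }
        bb.flip();
        return bb;
    }

    // Counter-aware decoding: walk the shards and sum their counts,
    // rather than calling getLong() at a fixed offset and running
    // past the buffer (the failure mode in the stack trace).
    static long total(ByteBuffer ctx) {
        ByteBuffer bb = ctx.duplicate();
        int shards = bb.getShort();
        long sum = 0;
        for (int i = 0; i < shards; i++) {
            bb.getLong();        // skip the clock
            sum += bb.getLong(); // accumulate the count
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer ctx = context(new long[] { 3, 4 });
        System.out.println(total(ctx)); // prints 7
    }
}
```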
        Marcos Trama added a comment -

        I confirmed this issue. I downloaded the source, applied the patch, and after building the Cassandra jar, Pig works with counters. Will this be patched in 1.2.9?

        Alex Liu added a comment -

        +1

        Christopher Smith added a comment -

        Alex Liu: the patch is to the base class shared by CqlStorage and CassandraStorage. They were both broken, and now they are both fixed.

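The shared-base-class point can be sketched as follows; the names are simplified stand-ins for AbstractCassandraStorage and its two subclasses, not the real code.

```java
// Hypothetical sketch: the value-conversion routine lives in one shared
// base class, so a single fix there repairs both Pig storage handlers.
abstract class AbstractStorageSketch {
    // Before the patch this method decoded counter values incorrectly;
    // fixing it once here corrects every subclass that inherits it.
    String columnToTuple(byte[] value, boolean isCounter) {
        return isCounter ? "counter:" + value.length : "bytes:" + value.length;
    }
}

class CassandraStorageSketch extends AbstractStorageSketch { }

class CqlStorageSketch extends AbstractStorageSketch { }
```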
        Jacques Lemire added a comment -

        The last patch for counters to AbstractCassandraStorage.java has not been applied to either cassandra-1.2 or trunk. Consequently, the problem still exists in 1.2.9, as I verified myself today. I found another recent report of the same problem on Stack Overflow: http://stackoverflow.com/questions/18553230/error-with-cassandra-pig-cql-counter-column.

        Should I open a new bug report for the counters bug even though we have a working patch, or will you reopen the current issue?

        Transition                      Time In Source Status  Execution Times  Last Executer      Last Execution Date
        Open → Resolved                 7h 58m                 1                Aleksey Yeschenko  08/Feb/13 16:12
        Resolved → Reopened             93d 16h 35m            1                Cyril Scetbon      13/May/13 09:48
        Reopened → Patch Available      31d 21h 3m             1                Alex Liu           14/Jun/13 06:51
        Patch Available → Resolved      14d 17h 17m            1                Brandon Williams   29/Jun/13 00:09

          People

          • Assignee: Alex Liu
          • Reporter: Shamim Ahmed
          • Reviewer: Brandon Williams
          • Votes: 4
          • Watchers: 10
