PIG-1849

Pig cannot dereference Cassandra subcolumns in a Super Column Family

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: data
    • Labels:
    • Environment:

      Ubuntu 10, Cassandra 0.7, Hadoop

      Description

      When using the ColumnFamilyInputFormat to load data from a Cassandra Super Column Family, the subcolumns always return in a bag where individual values cannot be dereferenced, no matter what schema is used. Flattening does not solve the issue.

          Activity

          Olga Natkovich added a comment -

          Do you have a loader that wraps the input format? I suspect the problem is in the loader that produces the data.

          Also, please add the script and the error that you are getting.

          Robbie Strickland added a comment -

          I am using the input format directly, with this sample data:

          (6B108476-1C40-4847-A1B0-9DA4B0B0BF83,{(12345,{(TestColumn,This is a test),(TestColumn2,This is a test 2)}),(12346,{(TestColumn1,This is a test 1),(TestColumn2,This is a test 2)})})

          and this load statement:

          rows = LOAD 'cassandra://E3/StreamByProfile' USING CassandraStorage() AS (objectid, scolumns: bag {ST: tuple(timestamp, columns: bag {T: tuple(name:chararray, value)})});

          I have tried quite a number of different schema possibilities, but all produce effectively the same result. They don't produce an error; when you attempt to reference individual items in a bag you still get the full bag (even though it allows the syntax). Attempts to flatten create the same issue.
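
          For reference, the nesting in the sample row above can be modelled in plain Python to show what dereferencing individual subcolumns would be expected to yield. This is only a sketch of the data shape, not Pig or CassandraStorage code: bags are represented as lists, tuples as tuples, and the helper name flatten_subcolumns is hypothetical. The field names objectid, timestamp, name, and value follow the schema in the load statement, and the super column names are written as integers purely for readability.

```python
# Hypothetical model of the sample row: Pig bags as lists, Pig tuples as tuples.
row = (
    "6B108476-1C40-4847-A1B0-9DA4B0B0BF83",
    [
        (12345, [("TestColumn", "This is a test"),
                 ("TestColumn2", "This is a test 2")]),
        (12346, [("TestColumn1", "This is a test 1"),
                 ("TestColumn2", "This is a test 2")]),
    ],
)


def flatten_subcolumns(row):
    """Yield one (objectid, timestamp, name, value) record per subcolumn."""
    objectid, scolumns = row
    for timestamp, columns in scolumns:   # outer bag: one entry per super column
        for name, value in columns:       # inner bag: one entry per subcolumn
            yield (objectid, timestamp, name, value)


flat = list(flatten_subcolumns(row))
for record in flat:
    print(record)
```

          Each output record carries a single subcolumn name and value, which is the shape the reported behaviour fails to reach: the inner bag comes back whole instead.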

          Jeremy Hanna added a comment -

          I wonder if this is fixed by PIG-1866.

          Robbie Strickland added a comment -

          Looks like it probably is. Once there's a fix available I'll give it a try.

          Jeremy Hanna added a comment -

          Robbie - look at Daniel's last comment. Looks like a fix is in trunk for dereferencing bags inside a tuple - in case you wanted to try it.

          Robbie Strickland added a comment -

          PIG-1866 may be the underlying cause of this issue.

          Fabio Souto added a comment -

          Did PIG-1866 finally solve this issue?


            People

            • Assignee: Unassigned
            • Reporter: Robbie Strickland
            • Votes: 1
            • Watchers: 1
