Pig / PIG-1849

Pig cannot dereference Cassandra subcolumns in a Super Column Family

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: data
    • Labels:
    • Environment:

      Ubuntu 10, Cassandra 0.7, Hadoop

      Description

      When using the ColumnFamilyInputFormat to load data from a Cassandra Super Column Family, the subcolumns always return in a bag where individual values cannot be dereferenced, no matter what schema is used. Flattening does not solve the issue.

          Activity

          fabio.souto Fabio Souto added a comment -

          Did PIG-1866 finally solve this issue?

          rstrickland Robbie Strickland added a comment -

          PIG-1866 may be the underlying cause of this issue.

          jeromatron Jeremy Hanna added a comment -

          Robbie - look at Daniel's last comment. Looks like a fix is in trunk for dereferencing bags inside a tuple - in case you wanted to try it.

          rstrickland Robbie Strickland added a comment -

          Looks like it probably is. Once there's a fix available I'll give it a try.

          jeromatron Jeremy Hanna added a comment -

          I wonder if this is fixed by PIG-1866.

          rstrickland Robbie Strickland added a comment -

          I am using the input format directly, with this sample data:

          (6B108476-1C40-4847-A1B0-9DA4B0B0BF83,{(12345,{(TestColumn,This is a test),(TestColumn2,This is a test 2)}),(12346,{(TestColumn1,This is a test 1),(TestColumn2,This is a test 2)})})

          and this load statement:

          rows = LOAD 'cassandra://E3/StreamByProfile' USING CassandraStorage() AS (objectid, scolumns: bag {ST: tuple(timestamp, columns: bag {T: tuple(name:chararray, value)})});

          I have tried quite a number of different schema possibilities, but all produce effectively the same result. They don't produce an error; when you attempt to reference individual items in a bag you still get the full bag (even though it allows the syntax). Attempts to flatten create the same issue.
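
          For reference, the shape of the sample row and the result one would expect from flattening both bags can be modeled outside Pig. The following is only an illustrative Python sketch, not Pig or Cassandra code: tuples stand in for Pig tuples, lists stand in for Pig bags, and the helper name flatten_row is hypothetical. The field names follow the schema in the load statement above.

```python
# Hypothetical in-memory model of the sample row from the comment above.
# Python tuples stand in for Pig tuples; Python lists stand in for Pig bags.
row = (
    "6B108476-1C40-4847-A1B0-9DA4B0B0BF83",
    [
        (12345, [("TestColumn", "This is a test"),
                 ("TestColumn2", "This is a test 2")]),
        (12346, [("TestColumn1", "This is a test 1"),
                 ("TestColumn2", "This is a test 2")]),
    ],
)


def flatten_row(row):
    """Un-nest both bags, yielding one flat tuple per subcolumn."""
    objectid, scolumns = row
    return [
        (objectid, timestamp, name, value)
        for timestamp, columns in scolumns   # outer bag: super columns
        for name, value in columns           # inner bag: subcolumns
    ]


flat = flatten_row(row)
for record in flat:
    print(record)
```

          Under this model, each subcolumn becomes its own record with an individually addressable name and value, which is the behavior the report says Pig does not deliver for this data.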

          olgan Olga Natkovich added a comment -

          Do you have a loader that wraps the input format? I would suspect that the problem is in the loader that produces the data.

          Also, please add the script and the error that you are getting.


            People

            • Assignee: Unassigned
            • Reporter: rstrickland Robbie Strickland
            • Votes: 1
            • Watchers: 1
