CASSANDRA-2777

Pig storage handler should implement LoadMetadata

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.8.7
    • Component/s: None
    • Labels: None

      Description

      The reason for this is that many builtin functions like SUM won't work on longs (you can work around it using LongSum, but that's lame) because the query planner doesn't know the types beforehand, even though we are casting to native longs.

      There is some impact to this, though. With LoadMetadata (LM) implemented, existing scripts that specify a schema will need to remove it (since LM is doing it for them) and will need to conform to LM's terminology (key, columns, name, value) within the script. This is trivial to change, however, and the increased functionality is worth the switch.

      1. 2777-v2.txt
        6 kB
        Brandon Williams
      2. 2777.txt
        5 kB
        Brandon Williams

        Activity

        brandon.williams Brandon Williams added a comment -

        Patch implements the LoadMetadata interface. Doesn't handle supercolumns since we already punted on deserializing those.

        jeromatron Jeremy Hanna added a comment - - edited

        While we're at it, can we remove the redundant addMutation call on line 505 and, on line 513, add the e param to:

        throw new IOException(e + " Output must be (key, {(column,value)...}) for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for SuperColumnFamily", e);
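
The point of passing e as the second argument is that the original exception is kept as the cause, so its stack trace is not lost when the IOException is logged. A minimal sketch of that pattern follows; the class name, the wrap helper, and the sample ClassCastException are hypothetical and are not taken from CassandraStorage.java.

```java
import java.io.IOException;

// Hypothetical sketch: names here are illustrative, not from CassandraStorage.java.
class ChainedCauseSketch {
    static final String HINT =
        " Output must be (key, {(column,value)...}) for ColumnFamily";

    // Wraps a low-level failure and passes it as the cause,
    // so the original stack trace stays attached to the IOException.
    static IOException wrap(Exception e) {
        return new IOException(e + HINT, e);
    }

    public static void main(String[] args) {
        Exception original = new ClassCastException("expected a bag of columns");
        IOException wrapped = wrap(original);
        // The cause is the very same object that was caught: prints true
        System.out.println(wrapped.getCause() == original);
        System.out.println(wrapped.getMessage());
    }
}
```

Without the second argument, getCause() would return null and only the string form of the original exception would survive in the message.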
        
        brandon.williams Brandon Williams added a comment -

        Sure, will add those on commit.

        jbellis Jonathan Ellis added a comment -

        Is that +1 otherwise, Jeremy?

        jeromatron Jeremy Hanna added a comment -

        Brandon and I were still trying to track down a problem that I was seeing in one of the tests I was running. I'd like to get that resolved before it gets in if possible.

        brandon.williams Brandon Williams added a comment -

        v2 rebased.

        steeve Steeve Morin added a comment - - edited

        Fixed it for me on Pig 0.8.3 and Cassandra 0.8.6 (Brisk).

        Pig 0.9 complains:
        2011-10-03 14:41:21,033 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file test.pig, line 8, column 78> mismatched input ')' expecting IDENTIFIER_L

        jeromatron Jeremy Hanna added a comment -

        +1 - if we find any issues with it in production, we'll submit bug reports.

        brandon.williams Brandon Williams added a comment -

        Committed.

        hudson Hudson added a comment -

        Integrated in Cassandra-0.8 #348 (See https://builds.apache.org/job/Cassandra-0.8/348/)
        Pig storage handler implements LoadMetadata interface.
        Patch by brandonwilliams, reviewed by Jeremy Hanna for CASSANDRA-2777

        brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177083
        Files :

        • /cassandra/branches/cassandra-0.8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
        steeve Steeve Morin added a comment - - edited

        Please note that this patch doesn't work for Pig 0.9; it doesn't like the empty AS (); clause:
        2011-10-03 14:41:21,033 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file test.pig, line 8, column 78> mismatched input ')' expecting IDENTIFIER_L

        dberg Adam Denenberg added a comment -

        Same here for 0.9.1:

        [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 65> mismatched input '(' expecting SEMI_COLON


          People

          • Assignee: Brandon Williams
          • Reporter: Brandon Williams
          • Reviewer: Jeremy Hanna