Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.7.4
    • Component/s: Examples, Hadoop
    • Labels:
      None

      Description

      Now that we have a ColumnFamilyOutputFormat, we can write data back to Cassandra in MapReduce jobs; however, we can only do this in Java. It would be nice if Pig could also output to Cassandra.

      1. 0001-add-storage-ability-to-pig-CassandraStorage.txt
        11 kB
        Brandon Williams
      2. 0003-StoreFunc_with_deletion.txt
        12 kB
        Eldon Stegall
      3. 0002-Fix-build-bin-script.txt
        2 kB
        Brandon Williams

        Activity

        Brandon Williams created issue -
        Stu Hood made changes -
        Field Original Value New Value
        Component/s Hadoop [ 12313540 ]
        Jonathan Ellis made changes -
        Original Estimate 32h [ 115200 ]
        Remaining Estimate 32h [ 115200 ]
        Jonathan Ellis made changes -
        Fix Version/s 0.7.1 [ 12315199 ]
        Jonathan Ellis made changes -
        Fix Version/s 0.7.2 [ 12316100 ]
        Brandon Williams added a comment - - edited

        Patch to allow storing output to Cassandra, so long as the output matches what you would get from loading either a CF or SCF. Note that I had to custom-build Pig with Jackson 1.4, since Pig bundles its own Jackson 1.0.1, which Avro does not seem to like.

        Brandon Williams made changes -
        Attachment 0001-add-storage-ability-to-pig-CassandraStorage.txt [ 12470952 ]
        Attachment 0002-Fix-build-bin-script.txt [ 12470953 ]
        Brandon Williams added a comment -

        The way to rebuild Pig is to edit ivy/libraries.properties, bump the Jackson version to 1.4.0, and then run ant.

        Brandon Williams made changes -
        Fix Version/s 0.7.3 [ 12316182 ]
        Fix Version/s 0.7.2 [ 12316100 ]
        Eldon Stegall added a comment -

        Should add deletion to the storage function. This patch applies cleanly to the 0.7.0 tag.

        Eldon Stegall made changes -
        Attachment 0003-StoreFunc_with_deletion.txt [ 12471864 ]
        Eldon Stegall made changes -
        Attachment 0003-StoreFunc_with_deletion.txt [ 12471864 ]
        Eldon Stegall made changes -
        Attachment 0003-StoreFunc_with_deletion.txt [ 12471867 ]
        Jonathan Ellis made changes -
        Fix Version/s 0.7.4 [ 12316241 ]
        Fix Version/s 0.7.3 [ 12316182 ]
        Brandon Williams made changes -
        Attachment 0001-add-storage-ability-to-pig-CassandraStorage.txt [ 12470952 ]
        Brandon Williams added a comment - - edited

        Couldn't get Eldon's patch to apply, but updated 0001 with his changes to add deletions and explicitly cast String, as well as other cleanups. Only 0001 and 0002 are part of the patchset; 0003 is now an outdated conglomeration of the two.

        Brandon Williams made changes -
        Jeremy Hanna added a comment -

        +1

        2 comments:

        • I really like exceptions for invalid configuration, e.g. "PIG_INITIAL_ADDRESS environment variable not set"
        • Why not have getOutputFormat just return new ColumnFamilyOutputFormat()?
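
        A minimal sketch of the first point, for readers who have not opened the patch. It is not the committed code: the class name PigEnvConfig and the method require are hypothetical, and only the variable name and the error message come from the comment above. The idea is to fail early with a message that names the missing setting.

        ```java
        import java.io.IOException;
        import java.util.Map;

        /** Hypothetical helper; not taken from the CassandraStorage patch. */
        public final class PigEnvConfig {
            private PigEnvConfig() {}

            /**
             * Returns the value of a required setting from the given environment map,
             * or throws an IOException whose message names the missing variable.
             */
            public static String require(Map<String, String> env, String name) throws IOException {
                String value = env.get(name);
                if (value == null || value.trim().isEmpty()) {
                    throw new IOException(name + " environment variable not set");
                }
                return value;
            }
        }
        ```

        A caller could pass System.getenv() as the map, for example PigEnvConfig.require(System.getenv(), "PIG_INITIAL_ADDRESS"). Taking the map as a parameter is a design choice made here so the check can be exercised without touching the real process environment.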
        Jeremy Hanna made changes -
        Reviewer jeromatron
        Brandon Williams added a comment -

        Committed with the second point change (also in the input format) and updated README

        Brandon Williams made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Cassandra-0.7 #345 (See https://hudson.apache.org/hudson/job/Cassandra-0.7/345/)
        Pig storefunc.
        Patch by brandonwilliams, reviewed by Jeremy Hanna for CASSANDRA-1828.

        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12539446 ] patch-available, re-open possible [ 12752537 ]
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12752537 ] reopen-resolved, no closed status, patch-avail, testing [ 12755411 ]

          People

          • Assignee:
            Brandon Williams
            Reporter:
            Brandon Williams
            Reviewer:
            Jeremy Hanna
          • Votes:
            2
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 32h
              Remaining: 32h
              Logged: Not Specified
