Cassandra / CASSANDRA-2869

CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.7.9, 0.8.2
    • Component/s: Examples
    • Labels:
      None

      Description

      CassandraStorage appears to have threading issues along the lines of those described at http://pig.markmail.org/message/oz7oz2x2dwp66eoz due to the sharing of the UDFContext.

      I believe the fix lies in implementing

      public void setStoreFuncUDFContextSignature(String signature)
      {
      }
      

      and then using that signature when getting the UDFContext.

      From the Pig manual:

      setStoreFuncUDFContextSignature(): This method will be called by Pig both in the front end and back end to pass a unique signature to the Storer. The signature can be used to store into the UDFContext any information which the Storer needs to store between various method invocations in the front end and back end. The default implementation in StoreFunc has an empty body. This method will be called before other methods.
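
      To illustrate why the signature matters, here is a minimal, self-contained sketch rather than the actual patch. It does not use Pig itself: FakeUdfContext is a hypothetical stand-in that only imitates the idea of one Properties bag per (class, args) key, and DemoStorage, the "columnFamily" property, and the "store-1"/"store-2" signatures are made up for the example. It shows two storers in one script overwriting each other when they share a key, and staying separate once each scopes its properties by the signature it was given.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SignatureDemo {

    /** Hypothetical stand-in for a shared UDF context: one Properties bag per (class, args) key. */
    static final class FakeUdfContext {
        private final Map<String, Properties> bags = new HashMap<>();

        Properties getUDFProperties(Class<?> udfClass, String[] args) {
            String key = udfClass.getName() + ":" + String.join(",", args);
            return bags.computeIfAbsent(key, k -> new Properties());
        }
    }

    /** Hypothetical storer that remembers its column family in the shared context. */
    static final class DemoStorage {
        private final FakeUdfContext context;
        private final boolean useSignature;
        private String signature = "";

        DemoStorage(FakeUdfContext context, boolean useSignature) {
            this.context = context;
            this.useSignature = useSignature;
        }

        // Same name as the StoreFunc hook quoted above; here it just records the value.
        public void setStoreFuncUDFContextSignature(String signature) {
            this.signature = signature;
        }

        // Without the signature, every instance of this class resolves to the same bag.
        private Properties properties() {
            String[] scope = useSignature ? new String[] { signature } : new String[0];
            return context.getUDFProperties(DemoStorage.class, scope);
        }

        void rememberColumnFamily(String columnFamily) {
            properties().setProperty("columnFamily", columnFamily);
        }

        String columnFamily() {
            return properties().getProperty("columnFamily");
        }
    }

    /** Runs two storers against one context and reports what the first one reads back. */
    static String firstStorerSees(boolean useSignature) {
        FakeUdfContext context = new FakeUdfContext();
        DemoStorage first = new DemoStorage(context, useSignature);
        DemoStorage second = new DemoStorage(context, useSignature);
        first.setStoreFuncUDFContextSignature("store-1");
        second.setStoreFuncUDFContextSignature("store-2");
        first.rememberColumnFamily("users");
        second.rememberColumnFamily("events");
        return first.columnFamily();
    }

    public static void main(String[] args) {
        System.out.println("shared key:     " + firstStorerSees(false)); // prints "events"
        System.out.println("with signature: " + firstStorerSees(true));  // prints "users"
    }
}
```

      In the shared-key case the first storer reads back the second storer's value, which is the kind of cross-talk this ticket describes; scoping by signature keeps the two apart.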

      1. 2869-2.txt
        5 kB
        Jeremy Hanna
      2. 2869.txt
        4 kB
        Jeremy Hanna

        Activity

        Jeremy Hanna added a comment -

        Simple patch to use the load and store signatures instead of the UDF context property keys we had been using. We're running this in our data pipeline and it appears to work correctly. However, I haven't found evidence that the old way wasn't working; that seems to be more related to the read consistency level we were using. But this is probably the way we should be doing it, as it appears to be the Pig approach. Also, there could be some corner cases that might trip up the current approach.

        Brandon Williams added a comment -

        Looks like we can remove UDFCONTEXT_SCHEMA_KEY_PREFIX now too, no?

        Jeremy Hanna added a comment -

        Yes. I was about to post an updated patch last night but got sidetracked. Do you mind removing that if it's otherwise good to go? Otherwise I can do that later today.

        Jeremy Hanna added a comment -

        Removed that String. Also removed the code that added the mutation twice and, in putNext, passed the nested exception into the IOException. We've been meaning to add those last two items to one of these tickets.

        Brandon Williams added a comment -

        Committed

        Hudson added a comment -

        Integrated in Cassandra-0.7 #534 (See https://builds.apache.org/job/Cassandra-0.7/534/)
        Use a UDF-specific context signature.
        Patch by Jeremy Hanna, reviewed by brandonwilliams for CASSANDRA-2869

        brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149341
        Files :

        • /cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

          People

          • Assignee:
            Jeremy Hanna
            Reporter:
            Grant Ingersoll
            Reviewer:
            Brandon Williams
          • Votes:
            0
            Watchers:
            0
