Pig
  1. Pig
  2. PIG-3573

Provide StoreFunc and LoadFunc for Accumulo

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: internal-udfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Accumulo has some code to allow reading and writing from it through Pig. I've been working on making it more robust and would like to try to get it included into Pig (to avoid the necessity to bundle additional jars).

      Some info on what currently exists http://people.apache.org/~elserj/accumulo-pig/, and the current code https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git

      1. Need to translate Maven build into Ant+Ivy
      2. Need to figure out how to support Accumulo 1.4 and 1.5 builds (differences in dependencies and APIs)

        Issue Links

          Activity

          Hide
          Josh Elser added a comment -

          First run at patch for pulling the Accumulo Store/Load Func code into Pig.

          Show
          Josh Elser added a comment - First run at patch for pulling the Accumulo Store/Load Func code into Pig.
          Hide
          Josh Elser added a comment -

          A few changes that would likely raise some questions if not otherwise explained:

          1. It looks like the Hadoop 1.0 line doesn't have the Configuration#unset method which I was using. I might be able to come up with an alternative way to address it, but I figured I'd test the waters to see if there are any reasons to not just upgrade the dependency.

          2. Upped ZooKeeper version from 3.4.4 to 3.4.5 because I, for the life of me, couldn't get Accumulo's standalone "minicluster" testing harness (used in AccumuloPigClusterTest) to work with 3.4.4. By the looks of ZooKeeper's release page, 3.4.4 might have been a bad release (as they don't have it listed on the prominently).

          3. I'm a Maven person so I tried my best to do the right thing with Ant+Ivy. In the end, I got the dependencies I needed, but I'm not sure if I did it the "right" way.

          Overall, this is still very much a bare-bones implementation. I wanted to make sure I have a good basic set of tested functionality before getting into new functionality/usability. Feedback is greatly appreciated

          Show
          Josh Elser added a comment - A few changes that would likely raise some questions if not otherwise explained: 1. It looks like the Hadoop 1.0 line doesn't have the Configuration#unset method which I was using. I might be able to come up with an alternative way to address it, but I figured I'd test the waters to see if there are any reasons to not just upgrade the dependency. 2. Upped ZooKeeper version from 3.4.4 to 3.4.5 because I, for the life of me, couldn't get Accumulo's standalone "minicluster" testing harness (used in AccumuloPigClusterTest) to work with 3.4.4. By the looks of ZooKeeper's release page, 3.4.4 might have been a bad release (as they don't have it listed on the prominently). 3. I'm a Maven person so I tried my best to do the right thing with Ant+Ivy. In the end, I got the dependencies I needed, but I'm not sure if I did it the "right" way. Overall, this is still very much a bare-bones implementation. I wanted to make sure I have a good basic set of tested functionality before getting into new functionality/usability. Feedback is greatly appreciated
          Hide
          Josh Elser added a comment -

          Also, this patch depends on a SNAPSHOT version of Accumulo 1.5. I had to make some upstream changes in how our InputFormat works which hasn't been officially released yet. We're in the midst of closing out a 1.6.0 release, but I want to get a 1.5.1 release in the near future as well. My thought was to just use reflection as needed to support any API changes we have in Accumulo between our 1.5 and 1.6 lines, so the patch will need to change a bit more as I get tagged releases I can depend on to provide to you all.

          Show
          Josh Elser added a comment - Also, this patch depends on a SNAPSHOT version of Accumulo 1.5. I had to make some upstream changes in how our InputFormat works which hasn't been officially released yet. We're in the midst of closing out a 1.6.0 release, but I want to get a 1.5.1 release in the near future as well. My thought was to just use reflection as needed to support any API changes we have in Accumulo between our 1.5 and 1.6 lines, so the patch will need to change a bit more as I get tagged releases I can depend on to provide to you all.
          Hide
          Josh Elser added a comment -

          A friend was talking to me about this and I think there are some places that I can clean things up to 1) update less dependencies and 2) simplify the implementation. Expect an updated patch soon.

          Show
          Josh Elser added a comment - A friend was talking to me about this and I think there are some places that I can clean things up to 1) update less dependencies and 2) simplify the implementation. Expect an updated patch soon.
          Hide
          Josh Elser added a comment -

          Rebase'ed on top of the upstream Pig changes. In addition to the changes introduced in the first patch, this patch:

          1. Removes the necessity for hadoop version changes by using some reflection to work around API differences with tests
          2. Eliminates the need to get a new version of Accumulo released (works against the already released 1.5.0 version).
          3. Missing license notices on some source files.
          Show
          Josh Elser added a comment - Rebase'ed on top of the upstream Pig changes. In addition to the changes introduced in the first patch, this patch: Removes the necessity for hadoop version changes by using some reflection to work around API differences with tests Eliminates the need to get a new version of Accumulo released (works against the already released 1.5.0 version). Missing license notices on some source files.
          Hide
          Cheolsoo Park added a comment -

          Josh Elser, thank you very much for the patch, and sorry for the delay in review.

          Do you mind uploading it to the review board? It's relatively big, so it would be easier in the review board.

          Show
          Cheolsoo Park added a comment - Josh Elser , thank you very much for the patch, and sorry for the delay in review. Do you mind uploading it to the review board ? It's relatively big, so it would be easier in the review board.
          Hide
          Josh Elser added a comment -

          sorry for the delay in review.

          No worries! These things happen. I made some time to revisit this with a fresh mind – good all around.

          Do you mind uploading it to the review board? It's relatively big, so it would be easier in the review board.

          Of course. I set it to the "pig" group on RB, and linked it to this ticket.

          Show
          Josh Elser added a comment - sorry for the delay in review. No worries! These things happen. I made some time to revisit this with a fresh mind – good all around. Do you mind uploading it to the review board? It's relatively big, so it would be easier in the review board. Of course. I set it to the "pig" group on RB, and linked it to this ticket.
          Hide
          Daniel Dai added a comment -
          Show
          Daniel Dai added a comment - RB link: https://reviews.apache.org/r/16705/
          Hide
          Josh Elser added a comment -

          Daniel Dai, should I close out the other RB that I linked on this issue? (https://reviews.apache.org/r/16533/)

          Show
          Josh Elser added a comment - Daniel Dai , should I close out the other RB that I linked on this issue? ( https://reviews.apache.org/r/16533/ )
          Hide
          Daniel Dai added a comment -

          Oh, sorry, didn't see that. Let's close that one to avoid confusion. Thanks!

          Show
          Daniel Dai added a comment - Oh, sorry, didn't see that. Let's close that one to avoid confusion. Thanks!
          Hide
          Josh Elser added a comment -

          Let's close that one to avoid confusion

          I can, but it looks like the RB you made was off of the old patch on this ticket. The new one has a slightly different implementation with less ivy changes. I'd much rather use the newer patch. I can still close mine if you want to update the one you made. Either way – thanks.

          Show
          Josh Elser added a comment - Let's close that one to avoid confusion I can, but it looks like the RB you made was off of the old patch on this ticket. The new one has a slightly different implementation with less ivy changes. I'd much rather use the newer patch. I can still close mine if you want to update the one you made. Either way – thanks.
          Hide
          Daniel Dai added a comment -

          Then I can close mine and use yours instead: https://reviews.apache.org/r/16533/

          Show
          Daniel Dai added a comment - Then I can close mine and use yours instead: https://reviews.apache.org/r/16533/
          Hide
          Josh Elser added a comment -

          Updated patch from RB for convenience.

          Show
          Josh Elser added a comment - Updated patch from RB for convenience.
          Hide
          Daniel Dai added a comment -

          Attach the final patch after review.

          Show
          Daniel Dai added a comment - Attach the final patch after review.
          Hide
          Daniel Dai added a comment -

          Patch committed to trunk. Created PIG-3674 to fix TestAccumuloPigCluster on Hadoop 2 and PIG-3675 for documentation. Thanks Josh!

          Show
          Daniel Dai added a comment - Patch committed to trunk. Created PIG-3674 to fix TestAccumuloPigCluster on Hadoop 2 and PIG-3675 for documentation. Thanks Josh!
          Hide
          Cheolsoo Park added a comment -

          Daniel Dai and Josh Elser, this commit breaks unit tests in jenkins. Since TestAccumuloPigCluster does not shut down the mini cluster, sub-sequential tests fail with the following error-

          java.net.ConnectException: Call to localhost/127.0.0.1:54029 failed on connection exception: java.net.ConnectException: Connection refused
          

          Here 127.0.0.1:54029 is the NN that is used by TestAccumuloPigCluster and no longer exists, but sub-sequential tests (e.g. TestInputSizeReducerEstimator) try to talk to it.

          Attaching a fix.

          Show
          Cheolsoo Park added a comment - Daniel Dai and Josh Elser , this commit breaks unit tests in jenkins. Since TestAccumuloPigCluster does not shut down the mini cluster, sub-sequential tests fail with the following error- java.net.ConnectException: Call to localhost/127.0.0.1:54029 failed on connection exception: java.net.ConnectException: Connection refused Here 127.0.0.1:54029 is the NN that is used by TestAccumuloPigCluster and no longer exists, but sub-sequential tests (e.g. TestInputSizeReducerEstimator) try to talk to it. Attaching a fix.
          Hide
          Cheolsoo Park added a comment -

          Please review PIG-3573-hotfix.patch

          Show
          Cheolsoo Park added a comment - Please review PIG-3573 -hotfix.patch
          Hide
          Josh Elser added a comment -

          Please review PIG-3573-hotfix.patch

          LGTM – Sorry for missing that, Cheolsoo Park.

          Show
          Josh Elser added a comment - Please review PIG-3573 -hotfix.patch LGTM – Sorry for missing that, Cheolsoo Park .
          Hide
          Cheolsoo Park added a comment -

          No worries. I committed the hotfix to trunk.

          Show
          Cheolsoo Park added a comment - No worries. I committed the hotfix to trunk.
          Hide
          Daniel Dai added a comment -

          Thanks Cheolsoo Park!

          Show
          Daniel Dai added a comment - Thanks Cheolsoo Park !

            People

            • Assignee:
              Josh Elser
              Reporter:
              Josh Elser
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development