Pig
  1. Pig
  2. PIG-3573

Provide StoreFunc and LoadFunc for Accumulo

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: internal-udfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Accumulo has some code to allow reading and writing from it through Pig. I've been working on making it more robust and would like to try to get it included into Pig (to avoid the necessity to bundle additional jars).

      Some info on what currently exists http://people.apache.org/~elserj/accumulo-pig/, and the current code https://git-wip-us.apache.org/repos/asf?p=accumulo-pig.git

      1. Need to translate Maven build into Ant+Ivy
      2. Need to figure out how to support Accumulo 1.4 and 1.5 builds (differences in dependencies and APIs)

        Issue Links

          Activity

          Josh Elser created issue -
          Hide
          Josh Elser added a comment -

          First run at patch for pulling the Accumulo Store/Load Func code into Pig.

          Show
          Josh Elser added a comment - First run at patch for pulling the Accumulo Store/Load Func code into Pig.
          Josh Elser made changes -
          Field Original Value New Value
          Attachment 0001-PIG-3573-Initial-contribution-of-accumulo-store-load.patch [ 12615792 ]
          Josh Elser made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.13.0 [ 12324971 ]
          Hide
          Josh Elser added a comment -

          A few changes that would likely raise some questions if not otherwise explained:

          1. It looks like the Hadoop 1.0 line doesn't have the Configuration#unset method which I was using. I might be able to come up with an alternative way to address it, but I figured I'd test the waters to see if there are any reasons to not just upgrade the dependency.

          2. Upped ZooKeeper version from 3.4.4 to 3.4.5 because I, for the life of me, couldn't get Accumulo's standalone "minicluster" testing harness (used in AccumuloPigClusterTest) to work with 3.4.4. By the looks of ZooKeeper's release page, 3.4.4 might have been a bad release (as they don't have it listed on the prominently).

          3. I'm a Maven person so I tried my best to do the right thing with Ant+Ivy. In the end, I got the dependencies I needed, but I'm not sure if I did it the "right" way.

          Overall, this is still very much a bare-bones implementation. I wanted to make sure I have a good basic set of tested functionality before getting into new functionality/usability. Feedback is greatly appreciated

          Show
          Josh Elser added a comment - A few changes that would likely raise some questions if not otherwise explained: 1. It looks like the Hadoop 1.0 line doesn't have the Configuration#unset method which I was using. I might be able to come up with an alternative way to address it, but I figured I'd test the waters to see if there are any reasons to not just upgrade the dependency. 2. Upped ZooKeeper version from 3.4.4 to 3.4.5 because I, for the life of me, couldn't get Accumulo's standalone "minicluster" testing harness (used in AccumuloPigClusterTest) to work with 3.4.4. By the looks of ZooKeeper's release page, 3.4.4 might have been a bad release (as they don't have it listed on the prominently). 3. I'm a Maven person so I tried my best to do the right thing with Ant+Ivy. In the end, I got the dependencies I needed, but I'm not sure if I did it the "right" way. Overall, this is still very much a bare-bones implementation. I wanted to make sure I have a good basic set of tested functionality before getting into new functionality/usability. Feedback is greatly appreciated
          Hide
          Josh Elser added a comment -

          Also, this patch depends on a SNAPSHOT version of Accumulo 1.5. I had to make some upstream changes in how our InputFormat works which hasn't been officially released yet. We're in the midst of closing out a 1.6.0 release, but I want to get a 1.5.1 release in the near future as well. My thought was to just use reflection as needed to support any API changes we have in Accumulo between our 1.5 and 1.6 lines, so the patch will need to change a bit more as I get tagged releases I can depend on to provide to you all.

          Show
          Josh Elser added a comment - Also, this patch depends on a SNAPSHOT version of Accumulo 1.5. I had to make some upstream changes in how our InputFormat works which hasn't been officially released yet. We're in the midst of closing out a 1.6.0 release, but I want to get a 1.5.1 release in the near future as well. My thought was to just use reflection as needed to support any API changes we have in Accumulo between our 1.5 and 1.6 lines, so the patch will need to change a bit more as I get tagged releases I can depend on to provide to you all.
          Hide
          Josh Elser added a comment -

          A friend was talking to me about this and I think there are some places that I can clean things up to 1) update less dependencies and 2) simplify the implementation. Expect an updated patch soon.

          Show
          Josh Elser added a comment - A friend was talking to me about this and I think there are some places that I can clean things up to 1) update less dependencies and 2) simplify the implementation. Expect an updated patch soon.
          Hide
          Josh Elser added a comment -

          Rebase'ed on top of the upstream Pig changes. In addition to the changes introduced in the first patch, this patch:

          1. Removes the necessity for hadoop version changes by using some reflection to work around API differences with tests
          2. Eliminates the need to get a new version of Accumulo released (works against the already released 1.5.0 version).
          3. Missing license notices on some source files.
          Show
          Josh Elser added a comment - Rebase'ed on top of the upstream Pig changes. In addition to the changes introduced in the first patch, this patch: Removes the necessity for hadoop version changes by using some reflection to work around API differences with tests Eliminates the need to get a new version of Accumulo released (works against the already released 1.5.0 version). Missing license notices on some source files.
          Josh Elser made changes -
          Hide
          Cheolsoo Park added a comment -

          Josh Elser, thank you very much for the patch, and sorry for the delay in review.

          Do you mind uploading it to the review board? It's relatively big, so it would be easier in the review board.

          Show
          Cheolsoo Park added a comment - Josh Elser , thank you very much for the patch, and sorry for the delay in review. Do you mind uploading it to the review board ? It's relatively big, so it would be easier in the review board.
          Josh Elser made changes -
          Remote Link This issue links to "ReviewBoard (Web Link)" [ 13614 ]
          Hide
          Josh Elser added a comment -

          sorry for the delay in review.

          No worries! These things happen. I made some time to revisit this with a fresh mind – good all around.

          Do you mind uploading it to the review board? It's relatively big, so it would be easier in the review board.

          Of course. I set it to the "pig" group on RB, and linked it to this ticket.

          Show
          Josh Elser added a comment - sorry for the delay in review. No worries! These things happen. I made some time to revisit this with a fresh mind – good all around. Do you mind uploading it to the review board? It's relatively big, so it would be easier in the review board. Of course. I set it to the "pig" group on RB, and linked it to this ticket.
          Hide
          Daniel Dai added a comment -
          Show
          Daniel Dai added a comment - RB link: https://reviews.apache.org/r/16705/
          Hide
          Josh Elser added a comment -

          Daniel Dai, should I close out the other RB that I linked on this issue? (https://reviews.apache.org/r/16533/)

          Show
          Josh Elser added a comment - Daniel Dai , should I close out the other RB that I linked on this issue? ( https://reviews.apache.org/r/16533/ )
          Hide
          Daniel Dai added a comment -

          Oh, sorry, didn't see that. Let's close that one to avoid confusion. Thanks!

          Show
          Daniel Dai added a comment - Oh, sorry, didn't see that. Let's close that one to avoid confusion. Thanks!
          Hide
          Josh Elser added a comment -

          Let's close that one to avoid confusion

          I can, but it looks like the RB you made was off of the old patch on this ticket. The new one has a slightly different implementation with less ivy changes. I'd much rather use the newer patch. I can still close mine if you want to update the one you made. Either way – thanks.

          Show
          Josh Elser added a comment - Let's close that one to avoid confusion I can, but it looks like the RB you made was off of the old patch on this ticket. The new one has a slightly different implementation with less ivy changes. I'd much rather use the newer patch. I can still close mine if you want to update the one you made. Either way – thanks.
          Hide
          Daniel Dai added a comment -

          Then I can close mine and use yours instead: https://reviews.apache.org/r/16533/

          Show
          Daniel Dai added a comment - Then I can close mine and use yours instead: https://reviews.apache.org/r/16533/
          Hide
          Josh Elser added a comment -

          Updated patch from RB for convenience.

          Show
          Josh Elser added a comment - Updated patch from RB for convenience.
          Josh Elser made changes -
          Hide
          Daniel Dai added a comment -

          Attach the final patch after review.

          Show
          Daniel Dai added a comment - Attach the final patch after review.
          Daniel Dai made changes -
          Attachment PIG-3573-1.patch [ 12623591 ]
          Daniel Dai made changes -
          Link This issue is depended upon by PIG-3675 [ PIG-3675 ]
          Hide
          Daniel Dai added a comment -

          Patch committed to trunk. Created PIG-3674 to fix TestAccumuloPigCluster on Hadoop 2 and PIG-3675 for documentation. Thanks Josh!

          Show
          Daniel Dai added a comment - Patch committed to trunk. Created PIG-3674 to fix TestAccumuloPigCluster on Hadoop 2 and PIG-3675 for documentation. Thanks Josh!
          Daniel Dai made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Component/s internal-udfs [ 12315411 ]
          Hide
          Cheolsoo Park added a comment -

          Daniel Dai and Josh Elser, this commit breaks unit tests in jenkins. Since TestAccumuloPigCluster does not shut down the mini cluster, sub-sequential tests fail with the following error-

          java.net.ConnectException: Call to localhost/127.0.0.1:54029 failed on connection exception: java.net.ConnectException: Connection refused
          

          Here 127.0.0.1:54029 is the NN that is used by TestAccumuloPigCluster and no longer exists, but sub-sequential tests (e.g. TestInputSizeReducerEstimator) try to talk to it.

          Attaching a fix.

          Show
          Cheolsoo Park added a comment - Daniel Dai and Josh Elser , this commit breaks unit tests in jenkins. Since TestAccumuloPigCluster does not shut down the mini cluster, sub-sequential tests fail with the following error- java.net.ConnectException: Call to localhost/127.0.0.1:54029 failed on connection exception: java.net.ConnectException: Connection refused Here 127.0.0.1:54029 is the NN that is used by TestAccumuloPigCluster and no longer exists, but sub-sequential tests (e.g. TestInputSizeReducerEstimator) try to talk to it. Attaching a fix.
          Cheolsoo Park made changes -
          Attachment PIG-3573-hostfix.patch [ 12623872 ]
          Hide
          Cheolsoo Park added a comment -

          Please review PIG-3573-hotfix.patch

          Show
          Cheolsoo Park added a comment - Please review PIG-3573 -hotfix.patch
          Cheolsoo Park made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Josh Elser added a comment -

          Please review PIG-3573-hotfix.patch

          LGTM – Sorry for missing that, Cheolsoo Park.

          Show
          Josh Elser added a comment - Please review PIG-3573 -hotfix.patch LGTM – Sorry for missing that, Cheolsoo Park .
          Hide
          Cheolsoo Park added a comment -

          No worries. I committed the hotfix to trunk.

          Show
          Cheolsoo Park added a comment - No worries. I committed the hotfix to trunk.
          Cheolsoo Park made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Daniel Dai added a comment -

          Thanks Cheolsoo Park!

          Show
          Daniel Dai added a comment - Thanks Cheolsoo Park !
          Cheolsoo Park made changes -
          Link This issue breaks PIG-3776 [ PIG-3776 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          11d 10h 47m 1 Josh Elser 26/Nov/13 04:50
          Patch Available Patch Available Resolved Resolved
          52d 1h 51m 1 Daniel Dai 17/Jan/14 06:42
          Resolved Resolved Reopened Reopened
          2d 15h 54m 1 Cheolsoo Park 19/Jan/14 22:37
          Reopened Reopened Resolved Resolved
          8h 35m 1 Cheolsoo Park 20/Jan/14 07:12
          Resolved Resolved Closed Closed
          168d 10h 55m 1 Daniel Dai 07/Jul/14 19:08

            People

            • Assignee:
              Josh Elser
              Reporter:
              Josh Elser
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development