Pig
  1. Pig
  2. PIG-1680

Pig 0.8 HBaseStorage may not against HBase 0.89

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.1
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      This patch bumps HBase version to 0.90. Versions 0.20.x are no longer supported.
    • Tags:
      hbase

      Description

      HBaseStorage is currently coded against the hbase 0.20.6 API. The hbase 0.89 API deprecates some methods and outright removes some others which causes HBaseStorage to no longer compile.

      It is unclear whether one can run an HBase 0.20.6 client against a running 0.89 hbase instance. In my experience, it does not work. Therefore, HBaseStorage has to be compiled against 0.89.

      Attached is a proposed patch to make 0.8.0 trunk compatible with hbase 0.89 as well as a script to help automate the upgrade.

      1. pig-hbase-0.89.SNAPSHOT.patch
        10 kB
        George P. Stathis
      2. apply-PIG-1680-patch.sh
        1 kB
        George P. Stathis
      3. pig-0.8.0-hbase-0.89.SNAPSHOT.patch
        10 kB
        Bill Graham
      4. apply-PIG-1680-patch-0.8.0.sh
        2 kB
        Bill Graham
      5. PIG_1680.patch
        59 kB
        Dmitriy V. Ryaboy
      6. PIG_1680.patch
        91 kB
        Dmitriy V. Ryaboy
      7. PIG_1680.patch
        23 kB
        Dmitriy V. Ryaboy
      8. PIG-1680.3.patch
        25 kB
        Dmitriy V. Ryaboy
      9. hbase-0.90.0-tests.jar
        880 kB
        Dmitriy V. Ryaboy
      10. pig-1680-minicluster-fixes.txt
        2 kB
        Todd Lipcon
      11. pig-1680-with-ivy.xml
        28 kB
        Todd Lipcon
      12. pig_1680.final.patch
        27 kB
        Dmitriy V. Ryaboy

        Issue Links

          Activity

          Hide
          George P. Stathis added a comment -

          See attached patch.

          Also the hbase tests jar contains a test hbase-site.xml which uses 21810 as the zookeeper port instead of the default 2181. This prevents pig from connecting with default zookeeper installations out of the box. The attached shell script puts an hbase-site.xml under $PIG_HOME/conf and changes the port back to the default.

          Show
          George P. Stathis added a comment - See attached patch. Also the hbase tests jar contains a test hbase-site.xml which uses 21810 as the zookeeper port instead of the default 2181. This prevents pig from connecting with default zookeeper installations out of the box. The attached shell script puts an hbase-site.xml under $PIG_HOME/conf and changes the port back to the default.
          Hide
          Dmitriy V. Ryaboy added a comment -

          George,
          I am not sure about replacing Hadoop 20.2 with the appends version.

          Can we instead push the incompatibilities off into a shim layer and use the appropriate methods depending on the version of HBase/Hadoop that are on the classpath at runtime? We did something like that for Pig when folks were slowly switching over from Hadoop 18 to 20.

          Once the dust settles, we'll remove the shims and put HBase 89 + Hadoop with appends in, but not while there are a lot of people still on the old versions.

          Show
          Dmitriy V. Ryaboy added a comment - George, I am not sure about replacing Hadoop 20.2 with the appends version. Can we instead push the incompatibilities off into a shim layer and use the appropriate methods depending on the version of HBase/Hadoop that are on the classpath at runtime? We did something like that for Pig when folks were slowly switching over from Hadoop 18 to 20. Once the dust settles, we'll remove the shims and put HBase 89 + Hadoop with appends in, but not while there are a lot of people still on the old versions.
          Hide
          George P. Stathis added a comment -

          Absolutely, it makes sense. I uploaded the patch as a starting point. It woud be nice if someone other than me verified that it works for them as well.

          As for the shim layer, if there is an example of how others have done it in the past, I can spend some time taking a look and adapting the patch to it.

          Show
          George P. Stathis added a comment - Absolutely, it makes sense. I uploaded the patch as a starting point. It woud be nice if someone other than me verified that it works for them as well. As for the shim layer, if there is an example of how others have done it in the past, I can spend some time taking a look and adapting the patch to it.
          Hide
          Dmitriy V. Ryaboy added a comment -

          The old shim layer I am referring to is in PIG-924 (as you can see by the
          exciting discussion in that ticket, it wound up not going into trunk, but
          did go into a Cloudera distribution of Pig).

          -D

          On Thu, Oct 14, 2010 at 10:42 AM, George P. Stathis (JIRA)

          Show
          Dmitriy V. Ryaboy added a comment - The old shim layer I am referring to is in PIG-924 (as you can see by the exciting discussion in that ticket, it wound up not going into trunk, but did go into a Cloudera distribution of Pig). -D On Thu, Oct 14, 2010 at 10:42 AM, George P. Stathis (JIRA)
          Hide
          Olga Natkovich added a comment -

          Is somebody actually planning to address this for 0.8 or should it be moved to 0.9. I think previously we discussed to start the release process for 0.8 at the end of October

          Show
          Olga Natkovich added a comment - Is somebody actually planning to address this for 0.8 or should it be moved to 0.9. I think previously we discussed to start the release process for 0.8 at the end of October
          Hide
          Dmitriy V. Ryaboy added a comment -

          Moved to 9.
          It's easy to make work against 89 (using the patch provided by George), but the shim layer is a fair bit of work.

          Show
          Dmitriy V. Ryaboy added a comment - Moved to 9. It's easy to make work against 89 (using the patch provided by George), but the shim layer is a fair bit of work.
          Hide
          Jeremy Hinegardner added a comment -

          Hi,

          I am testing this patch + 0.8.0 branch against hbase 0.89.20100924 and all appears to be working correctly, except it only appears to run against a single hbase region. Only one map and one reduce job is kicked off, and the number of rows processed matches up with the amount of data in a single region.

          I'm attempting to dig and and figure out where this is happening, but I am quite new to the Pig codebase.

          Thanks,

          -jeremy

          Show
          Jeremy Hinegardner added a comment - Hi, I am testing this patch + 0.8.0 branch against hbase 0.89.20100924 and all appears to be working correctly, except it only appears to run against a single hbase region. Only one map and one reduce job is kicked off, and the number of rows processed matches up with the amount of data in a single region. I'm attempting to dig and and figure out where this is happening, but I am quite new to the Pig codebase. Thanks, -jeremy
          Hide
          Bill Graham added a comment -

          FWIW, this patch also worked for me against HBase 0.90.0-rc1.

          The hardest part was applying the patch actually. I tried to manually apply it to both the Pig 0.7.0 and the 0.8.0 branches and neither was working. There were just too many dependent patches required. Once I used the attached script instead, all worked fine after I updated the HBase snapshot jars to the versions currently in the repos.

          It would be very helpful to get an updated version of the patch that works against the current Pig 0.8.0 branch.

          Show
          Bill Graham added a comment - FWIW, this patch also worked for me against HBase 0.90.0-rc1. The hardest part was applying the patch actually. I tried to manually apply it to both the Pig 0.7.0 and the 0.8.0 branches and neither was working. There were just too many dependent patches required. Once I used the attached script instead, all worked fine after I updated the HBase snapshot jars to the versions currently in the repos. It would be very helpful to get an updated version of the patch that works against the current Pig 0.8.0 branch.
          Hide
          Bill Graham added a comment -

          Attached are both an updated patch and an updated script to apply it. Patch functionality is the same (and has been tested), but this patch can be applied to the current Pig 0.8.0 branch, the Pig 0.8.0 distro, or the current Pig trunk. Hopefully this helps bleeding-edgers who can't wait for the shims impl.

          The script permits the ability to apply the patch to a working copy of Pig (with the -d option), instead of doing an svn checkout, which is still the default. The apply script also now takes the HBase version from the patched ivy properties file.

          Show
          Bill Graham added a comment - Attached are both an updated patch and an updated script to apply it. Patch functionality is the same (and has been tested), but this patch can be applied to the current Pig 0.8.0 branch, the Pig 0.8.0 distro, or the current Pig trunk. Hopefully this helps bleeding-edgers who can't wait for the shims impl. The script permits the ability to apply the patch to a working copy of Pig (with the -d option), instead of doing an svn checkout, which is still the default. The apply script also now takes the HBase version from the patched ivy properties file.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Attached patch fixes problems with storing to HBase in 90, and sets up ivy to pull down hbase 90.

          We still have to manually check in hbase-0.90.0-tests.jar, as it's published in a way that doesn't lend itself to ivifycation easily.

          I also rolled in the critical changes required to make PIG-1828 work. Sorry, remembered too late that svn doesn't have an equivalent to git stash.

          Please review.

          Show
          Dmitriy V. Ryaboy added a comment - Attached patch fixes problems with storing to HBase in 90, and sets up ivy to pull down hbase 90. We still have to manually check in hbase-0.90.0-tests.jar, as it's published in a way that doesn't lend itself to ivifycation easily. I also rolled in the critical changes required to make PIG-1828 work. Sorry, remembered too late that svn doesn't have an equivalent to git stash. Please review.
          Hide
          Ashutosh Chauhan added a comment -

          Reviewed PIG-1828 part of it. Cool that you got it working from setLocation() with no additional changes. Your comment on PIG-1828 was suggesting otherwise.

          Show
          Ashutosh Chauhan added a comment - Reviewed PIG-1828 part of it. Cool that you got it working from setLocation() with no additional changes. Your comment on PIG-1828 was suggesting otherwise.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Oops! Forgot to attach the changes to jobControlCompiler

          Show
          Dmitriy V. Ryaboy added a comment - Oops! Forgot to attach the changes to jobControlCompiler
          Hide
          Dmitriy V. Ryaboy added a comment -

          I put a slightly updated version of the patch, as applied to trunk, here: https://github.com/dvryaboy/pig/tree/pig_1680
          Makes it easier to work with binary dependencies, etc.
          Could those interested try pulling that branch and seeing if the NPEs come back?

          Show
          Dmitriy V. Ryaboy added a comment - I put a slightly updated version of the patch, as applied to trunk, here: https://github.com/dvryaboy/pig/tree/pig_1680 Makes it easier to work with binary dependencies, etc. Could those interested try pulling that branch and seeing if the NPEs come back?
          Hide
          Bill Graham added a comment -

          @Dmitriy, your git branch is working for me. No NPEs.

          Show
          Bill Graham added a comment - @Dmitriy, your git branch is working for me. No NPEs.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I think this is ready to commit; note that there is a binary dependency (hbase-0.90.0-tests.jar).

          Show
          Dmitriy V. Ryaboy added a comment - I think this is ready to commit; note that there is a binary dependency (hbase-0.90.0-tests.jar).
          Hide
          Dmitriy V. Ryaboy added a comment -

          hbase is an apache project, this jar is provided here as a convenience.

          Show
          Dmitriy V. Ryaboy added a comment - hbase is an apache project, this jar is provided here as a convenience.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I'll backport these changes to 0.8 branch in a different patch.

          Ready for review.

          Show
          Dmitriy V. Ryaboy added a comment - I'll backport these changes to 0.8 branch in a different patch. Ready for review.
          Hide
          Alan Gates added a comment -

          Which patch should I review?

          Also, it looks like hbase is now in maven. See https://repository.apache.org/content/repositories/releases/org/apache/hbase/hbase/ We should move to pulling it from maven instead of checking it into our lib directory.

          Show
          Alan Gates added a comment - Which patch should I review? Also, it looks like hbase is now in maven. See https://repository.apache.org/content/repositories/releases/org/apache/hbase/hbase/ We should move to pulling it from maven instead of checking it into our lib directory.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Alan, the latest patch is PIG-1680.3.patch

          I set up ivy to pull hbase from maven, but we also need the testing jar, which they don't publish.

          Show
          Dmitriy V. Ryaboy added a comment - Alan, the latest patch is PIG-1680 .3.patch I set up ivy to pull hbase from maven, but we also need the testing jar, which they don't publish.
          Hide
          Ted Dunning added a comment -

          Hbase is definitely available in warden.

          Another question is the use of the hbase test jar for all purposes in this patch. Is that correct.

          Show
          Ted Dunning added a comment - Hbase is definitely available in warden. Another question is the use of the hbase test jar for all purposes in this patch. Is that correct.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Ted I am not sure what you mean. The hbase test jar is used for the unit tests (we borrow the ability to start a mini hbase cluster, etc).

          Show
          Dmitriy V. Ryaboy added a comment - Ted I am not sure what you mean. The hbase test jar is used for the unit tests (we borrow the ability to start a mini hbase cluster, etc).
          Show
          Alan Gates added a comment - https://repository.apache.org/content/repositories/releases/org/apache/hbase/hbase/0.90.0/hbase-0.90.0-tests.jar isn't the right jar?
          Hide
          Ted Dunning added a comment -

          Warden = maven (don't type while others are talking is the lesson)

          Show
          Ted Dunning added a comment - Warden = maven (don't type while others are talking is the lesson)
          Hide
          Ted Dunning added a comment -

          There are two hbase jars. Hbase-0.90-test.jar and hbase-0.90.jar. I believe that the -test version is a superset that includes the test cases that Dmitriy wants.

          The patch in question replaces lines that used to differentiate between the two with a single reference to the -test jar. I only started tracing things through so I can't say definitively what needs to happen, but that stood out.

          For example, we have this part of the patch:

               <property name="automaton.jarfile" value="automaton.jar" />
          -    <property name="hbase.jarfile" value="hbase-0.20.6.jar" />
          -    <property name="hbase.test.jarfile" value="hbase-0.20.6-test.jar" />
          -	<property name="zookeeper.jarfile" value="zookeeper-hbase-1329.jar" />
          -	
          +    <property name="hbase.test.jarfile" value="hbase-0.90.0-tests.jar" />
          

          This loses the paired hbase.jarfile and hbase.test.jarfile properties in favor of the single hbase.jarfile property that points to the test jar.

          Show
          Ted Dunning added a comment - There are two hbase jars. Hbase-0.90-test.jar and hbase-0.90.jar. I believe that the -test version is a superset that includes the test cases that Dmitriy wants. The patch in question replaces lines that used to differentiate between the two with a single reference to the -test jar. I only started tracing things through so I can't say definitively what needs to happen, but that stood out. For example, we have this part of the patch: <property name= "automaton.jarfile" value= "automaton.jar" /> - <property name= "hbase.jarfile" value= "hbase-0.20.6.jar" /> - <property name= "hbase.test.jarfile" value= "hbase-0.20.6-test.jar" /> - <property name= "zookeeper.jarfile" value= "zookeeper-hbase-1329.jar" /> - + <property name= "hbase.test.jarfile" value= "hbase-0.90.0-tests.jar" /> This loses the paired hbase.jarfile and hbase.test.jarfile properties in favor of the single hbase.jarfile property that points to the test jar.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Ah, I didn't realize that the tests jar was a superset, I thought it was only tests and supporting classes, not hbase itself.

          We still pull in hbase-0.90.jar, we just no longer get it from lib/hbase-0.20.6.jar and instead fetch via maven.

          Does hbase have a maven target that only jars up the classes needed to complete hbase "core" to run tests?

          Show
          Dmitriy V. Ryaboy added a comment - Ah, I didn't realize that the tests jar was a superset, I thought it was only tests and supporting classes, not hbase itself. We still pull in hbase-0.90.jar, we just no longer get it from lib/hbase-0.20.6.jar and instead fetch via maven. Does hbase have a maven target that only jars up the classes needed to complete hbase "core" to run tests?
          Hide
          Dmitriy V. Ryaboy added a comment -

          Alan it's the correct jar but it's not published in the pom, or rather it is but only as a facet (or some other maven fancyness). I am not good enough with ivy yet to pull it down right now :-/. If Giri has a sec to look that would be awesome.

          Show
          Dmitriy V. Ryaboy added a comment - Alan it's the correct jar but it's not published in the pom, or rather it is but only as a facet (or some other maven fancyness). I am not good enough with ivy yet to pull it down right now :-/. If Giri has a sec to look that would be awesome.
          Hide
          Alan Gates added a comment -

          Since you added an additional call to setLocation I want to test this against other loaders that use setLocation (e.g. Howl). I'm starting those tests now. The rest of the code looks fine, with the caveat that I did not review HBase specific code.

          Show
          Alan Gates added a comment - Since you added an additional call to setLocation I want to test this against other loaders that use setLocation (e.g. Howl). I'm starting those tests now. The rest of the code looks fine, with the caveat that I did not review HBase specific code.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Alan,
          How did the tests go?

          Show
          Dmitriy V. Ryaboy added a comment - Alan, How did the tests go?
          Hide
          Alan Gates added a comment -

          Sorry Dmitriy, I still haven't gotten to testing it. Go ahead and check it in. We'll catch any issues when we do Howl/Pig 0.9 integration testing here in a few months. I don't want to hold up your progress anymore.

          Show
          Alan Gates added a comment - Sorry Dmitriy, I still haven't gotten to testing it. Go ahead and check it in. We'll catch any issues when we do Howl/Pig 0.9 integration testing here in a few months. I don't want to hold up your progress anymore.
          Hide
          Todd Lipcon added a comment -

          Hey Dmitriy,

          Something strange is up with these tests - they seem to use the MiniCluster class from Pig to start a MiniDFS and MiniMR cluster, and then they use HBaseTestingUtility.startMiniCluster which goes and starts another DFS cluster. On my system (against a CDH hadoop core) this is causing some issues where the job jar ends up on the wrong HDFS and the test fails.

          Show
          Todd Lipcon added a comment - Hey Dmitriy, Something strange is up with these tests - they seem to use the MiniCluster class from Pig to start a MiniDFS and MiniMR cluster, and then they use HBaseTestingUtility.startMiniCluster which goes and starts another DFS cluster. On my system (against a CDH hadoop core) this is causing some issues where the job jar ends up on the wrong HDFS and the test fails.
          Hide
          Todd Lipcon added a comment -

          With these fixes in place it fixed my problem.

          Show
          Todd Lipcon added a comment - With these fixes in place it fixed my problem.
          Hide
          Todd Lipcon added a comment -

          Here's a combined patch that also consumes hbase and zookeeper via ivy.

          I had to make a few changes to the ivy settings (with the help of my esteemed coworker Arvind) but it seems to be working now.

          Show
          Todd Lipcon added a comment - Here's a combined patch that also consumes hbase and zookeeper via ivy. I had to make a few changes to the ivy settings (with the help of my esteemed coworker Arvind) but it seems to be working now.
          Hide
          Todd Lipcon added a comment -

          Should we also update the deprecated "kitchen sink" jar to include the hbase jar?

          Show
          Todd Lipcon added a comment - Should we also update the deprecated "kitchen sink" jar to include the hbase jar?
          Hide
          Dmitriy V. Ryaboy added a comment -

          Yes, and probably guava (though I can also add directly into the job via reflection, the way you did for the bulk loader tsv job in HBase).

          I will do both tonight and commit this.

          Show
          Dmitriy V. Ryaboy added a comment - Yes, and probably guava (though I can also add directly into the job via reflection, the way you did for the bulk loader tsv job in HBase). I will do both tonight and commit this.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Attaching the hopefully-named final patch.

          Committing it.

          Show
          Dmitriy V. Ryaboy added a comment - Attaching the hopefully-named final patch. Committing it.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Committed to trunk.
          Leaving ticket open until a 0.8 version is committed.

          Show
          Dmitriy V. Ryaboy added a comment - Committed to trunk. Leaving ticket open until a 0.8 version is committed.
          Hide
          Todd Lipcon added a comment -

          Hey Dmitriy. One note - it seems we also should be adding the zookeeper jar to the uberjar, and also including ZK in the addDependencyJars() call.

          Show
          Todd Lipcon added a comment - Hey Dmitriy. One note - it seems we also should be adding the zookeeper jar to the uberjar, and also including ZK in the addDependencyJars() call.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Committed to the 0.8 branch. Closing.

          Show
          Dmitriy V. Ryaboy added a comment - Committed to the 0.8 branch. Closing.

            People

            • Assignee:
              Dmitriy V. Ryaboy
              Reporter:
              George P. Stathis
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development