PIG-2277

Make Pig compile against Hadoop 0.22

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: build
    • Labels: None

      Description

      Currently Pig depends on Hadoop 0.20.
      It is relatively easy to pass a property to the build (through -D or through build.properties) that sets hadoop-core.version to something else, but since Hadoop 0.22 the project has been split.
      There is no longer a single hadoop-core jar to build against.
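      To make the split concrete, here is a minimal Python sketch of which Hadoop artifacts a build would have to resolve for a given release line. The module names come from this issue (hadoop-core before the split; hadoop-common, hadoop-hdfs and hadoop-mapred for 0.22). The helper names are hypothetical, only plain dotted numeric versions are handled, and releases after 0.22 are deliberately rejected because their module layout is not modelled here.

```python
# Hypothetical helper: which Hadoop modules a Pig build must resolve
# for a given Hadoop release. Only plain dotted numeric versions such
# as "0.20.2" are handled.

PRE_SPLIT_MODULES = ["hadoop-core"]
POST_SPLIT_MODULES = ["hadoop-common", "hadoop-hdfs", "hadoop-mapred"]


def release_line(version):
    """Return the (major, minor) pair, e.g. "0.22.0" -> (0, 22)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def hadoop_modules(version):
    """List the Hadoop artifacts to depend on for `version`."""
    line = release_line(version)
    if line < (0, 22):
        # Before the split, everything ships in one hadoop-core jar.
        return list(PRE_SPLIT_MODULES)
    if line == (0, 22):
        # 0.22 is split into separate common, hdfs and mapred modules.
        return list(POST_SPLIT_MODULES)
    # Later release lines use a different layout; not modelled here.
    raise ValueError("release line %d.%d is not covered by this sketch" % line)


print(hadoop_modules("0.20.2"))  # ['hadoop-core']
print(hadoop_modules("0.22.0"))  # ['hadoop-common', 'hadoop-hdfs', 'hadoop-mapred']
```

      In the actual build this decision lives in ivy.xml and ivy/library.properties rather than in code. The same mapping applies to the HBase exclusion discussed in the comments: exclude hadoop-core while HBase is built against 0.20, and the three post-split modules once it is built against 0.22.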

      1. PIG-2277-branch-0.8.patch (28 kB, Joep Rottinghuis)
      2. PIG-2277-branch-0.8.patch (25 kB, Joep Rottinghuis)

          Activity

          Joep Rottinghuis added a comment -

          As far as I can tell, ivy.xml needs to be updated along with ivy/library.properties.
          Also, hadoop-core is referenced in build.xml, in the buildJar target, when a single jar is created.
          I'm happy to submit a patch for this if that sounds like a reasonable approach.

          Joep Rottinghuis added a comment -

          Another interesting challenge is that Pig appears to depend on HBase, which in turn depends on Hadoop.
          The hadoop-core module is currently excluded from that dependency. This should stay as-is until HBase is built against 0.22; at that point the exclusion needs to be switched to the post-split modules: hadoop-common, hadoop-hdfs, and hadoop-mapred.

          Milind Bhandarkar added a comment -

          Joep, it might be easier to change bin/pig to use pig-withouthadoop.jar (the patch recently went into Pig trunk and could easily be backported). That way, Pig does not bundle Hadoop and uses whatever Hadoop version is installed.

          Joep Rottinghuis added a comment -

          That seems like a reasonable approach for the scripts at run time.

          Still, I need to make the proposed changes; otherwise Pig will simply not build (Ivy cannot resolve the hadoop-core module).
          Of course, we could take a pre-packaged build and run it, but I prefer to build from source when running on a larger cluster, because that puts us in a position to quickly check in bug fixes and roll out new builds.

          Or are you suggesting to build Pig against 0.20 and run against 0.22?

          Milind Bhandarkar added a comment -

          > Or are you suggesting to build Pig against 0.20 and run against 0.22?

          No, that will not work at all. It needs to be built against 0.22. I take my earlier comment back.

          Daniel Dai added a comment -

          PIG-2239 changed bin/pig to pick up the Hadoop installation in HADOOP_HOME. I am not sure whether the API changed between 0.20 and 0.22; if it did, you may need a shim layer, as we did in PIG-2125.
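          A shim, in this sense, is a thin layer with one stable interface that the rest of the code calls, plus one implementation per Hadoop line that absorbs the API differences. The Python sketch below illustrates only that pattern: the class names, the create_job_context method and the returned dictionaries are hypothetical stand-ins, not Pig's or Hadoop's API, and it selects the implementation at run time for brevity. The mechanism actually used in PIG-2125 may differ.

```python
from abc import ABC, abstractmethod


class HadoopShims(ABC):
    """Stable interface the rest of the code calls (hypothetical)."""

    @abstractmethod
    def create_job_context(self, conf, job_id):
        """Build a job-context object the way this Hadoop line expects."""


class Hadoop20Shims(HadoopShims):
    def create_job_context(self, conf, job_id):
        # Stand-in for the 0.20-style construction call.
        return {"api": "0.20", "conf": conf, "job_id": job_id}


class Hadoop22Shims(HadoopShims):
    def create_job_context(self, conf, job_id):
        # Stand-in for the 0.22-style construction call.
        return {"api": "0.22", "conf": conf, "job_id": job_id}


def load_shims(version):
    """Pick the shim implementation for a dotted Hadoop version string."""
    major, minor = (int(p) for p in version.split(".")[:2])
    if (major, minor) < (0, 22):
        return Hadoop20Shims()
    if (major, minor) == (0, 22):
        return Hadoop22Shims()
    raise ValueError("no shim for Hadoop " + version)


# Calling code stays the same whichever Hadoop line is present.
shims = load_shims("0.22.0")
print(shims.create_job_context({}, "job_1")["api"])  # 0.22
```

          Keeping every version check inside one factory lets the rest of the code stay version-agnostic. It does not solve the build problem this issue is about, since the right jars still have to be resolved at compile time.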

          Joep Rottinghuis added a comment -

          OK, as far as the APIs are concerned, we'll have to jump off that bridge when we get to it.

          For now, this JIRA is about building Pig. I'm hoping to do this without having to manually copy any jars into the lib directory.
          I'll create a patch and upload it for folks to comment on.

          Dmitriy V. Ryaboy added a comment -

          FWIW, the Giraph project handles building against multiple Hadoop targets pretty elegantly; we should check out what Avery did there.

          Joep Rottinghuis added a comment -

          Attaching a patch for branch-0.8.
          Updated build.xml and the Ivy settings/properties, and fixed compilation issues.
          Some APIs did change, and I adjusted the code accordingly.
          This builds and packages fine for us.
          Will be doing additional testing on our cluster.

          Joep Rottinghuis added a comment -

          Looks like I missed the eclipse.templates/.classpath file.
          Will update that and attach a new patch tomorrow.

          Alan Gates added a comment -

          Joep, I have a few questions about your plans for a 0.22 release of Hadoop. Are you planning on releasing a version of Pig 0.8 that works with 0.22? I would suggest working with 0.9 or trunk instead, since a number of significant features that your users will find useful (macros, Python embedding, HCatalog integration, etc.) have been added. You are of course free to pursue whichever branch you prefer.

          Also, I'm curious how you plan to test this release. I think it would be really good if you could get the end-to-end tests running against it before releasing, to ensure you have adequate test coverage.

          Daniel Dai added a comment -

          Hi Joep, is the patch intended to be committed? It seems it will only compile against 22, not 20.2.

          Joep Rottinghuis added a comment -

          @Daniel: at the moment it is an either-or proposition between 22 and 20.2, because of the project split and the API changes.
          Pig depends either on a single hadoop-core jar or on a set of jars (because of the project split).
          The API changes could possibly be shimmed with additional effort.

          For now I would not recommend applying the patch to Pig 0.8 in SVN until Hadoop 22 is released.
          People who want to build against 22 can apply the patch themselves.

          Joep Rottinghuis added a comment -

          Fixed the Eclipse template and resolved version mismatches in additional libraries that caused builds to fail (e.g. jackson 1.7.3 evicted the specified 1.4.2).
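          For readers unfamiliar with Ivy evictions: when several modules in the dependency graph ask for different revisions of the same library, Ivy's default conflict manager keeps the latest revision and marks the others as evicted. The Python sketch below models only that rule. The names resolve and revision_key are hypothetical, the scenario in the usage lines is assumed rather than taken from the patch, and real Ivy revision comparison also handles non-numeric revision parts, which this sketch does not.

```python
# Minimal model of "latest revision wins" conflict resolution.
# Only purely numeric dotted revisions such as "1.7.3" are supported.


def revision_key(revision):
    """Compare revisions numerically, so "1.10.0" > "1.9.0"."""
    return tuple(int(part) for part in revision.split("."))


def resolve(requests):
    """Resolve (module, revision) requests gathered from the graph.

    Returns (selected, evicted): `selected` maps each module to the
    revision kept; `evicted` lists the requests that lost.
    """
    selected = {}
    for module, revision in requests:
        kept = selected.get(module)
        if kept is None or revision_key(revision) > revision_key(kept):
            selected[module] = revision
    evicted = sorted({(m, r) for m, r in requests if r != selected[m]})
    return selected, evicted


# Assumed scenario: the build specifies jackson 1.4.2 while another
# dependency requests 1.7.3.
selected, evicted = resolve([("jackson", "1.4.2"), ("jackson", "1.7.3")])
print(selected)  # {'jackson': '1.7.3'}
print(evicted)   # [('jackson', '1.4.2')]
```

          So specifying 1.4.2 in the build does not guarantee that 1.4.2 is what gets resolved; aligning the specified version with the one that wins is one way to clear such a mismatch.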

          Joep Rottinghuis added a comment -

          Alan, good point. We had quite a few users on Pig 0.7, so making things work with 0.8 was the smallest step.
          It probably makes sense to have Pig 0.9 working against 0.22 too, so I'll look into porting the patch there as well.

          Thomas Weise added a comment -

          The changes to make Pig 0.9 and later work with Hadoop 0.23 were committed in the meantime.

          Alan Gates added a comment -

          Given that this patch can't be applied to 0.8 as is, and there's been no movement on it in almost a year, I'm going to cancel the Patch Available status.

          Joep Rottinghuis added a comment -

          Given that Pig 0.9 compiles against 0.23, this bug can be closed as Won't Fix as far as I am concerned.


            People

            • Assignee: Unassigned
            • Reporter: Joep Rottinghuis
            • Votes: 0
            • Watchers: 4
