Pig / PIG-1334

Make pig artifacts available through maven

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      ant mvn-install: installs the artifacts to the local filesystem.
      ant mvn-deploy: deploys snapshots to the Apache Nexus repository (looks for authentication in ~/.m2/settings.xml).
      ant mvn-deploy -Drepo=staging: deploys artifacts for voting before a release; this also requires authentication configured in ~/.m2/settings.xml.
      Deploying artifacts to the staging repository requires signing them with GPG keys; the mvn-deploy target takes care of the signing. When run with -Drepo=staging, mvn-deploy prompts for the GPG passphrase, which must be typed in. Once the deployment succeeds, make the artifacts available in the staging repository by logging in at http://repository.apache.org and closing the staging repository (right-click the staged artifact).
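The release note says the deploy targets read credentials from ~/.m2/settings.xml but does not show the file. A minimal sketch follows; the two server ids are assumptions (they are the ids commonly used for the Apache Nexus snapshot and staging repositories) and must match whatever ids the Pig build actually passes to the deploy step. The username and password values are placeholders.

```xml
<!-- ~/.m2/settings.xml (sketch only).
     Assumption: the server ids below match the repository ids used by the build. -->
<settings>
  <servers>
    <server>
      <!-- assumed id for snapshot deployment (ant mvn-deploy) -->
      <id>apache.snapshots.https</id>
      <username>your-apache-id</username>
      <password>your-password</password>
    </server>
    <server>
      <!-- assumed id for staging deployment (ant mvn-deploy -Drepo=staging) -->
      <id>apache.staging.https</id>
      <username>your-apache-id</username>
      <password>your-password</password>
    </server>
  </servers>
</settings>
```

Maven matches a server entry to a repository by id only, so a mismatch between these ids and the ones in the build typically shows up as an authentication failure at deploy time.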
    1. mvn_pig_2.patch
      6 kB
      niraj rai
    2. mvn_pig_3.patch
      7 kB
      niraj rai
    3. mvn_pig_4.patch
      9 kB
      niraj rai
    4. mvn_pig_5.patch
      10 kB
      niraj rai
    5. mvn_pig_6.patch
      10 kB
      niraj rai
    6. mvn-pig.patch
      8 kB
      niraj rai

      Activity

      Scott Carey added a comment -

      This would be very nice. It does not have to be in a future release however – one can package 0.7 and/or 0.6 and make an official release for a maven repository after the fact.

      Don't forget to package and publish the javadoc and source too – it's wonderful when a development environment automatically pulls those down for reference too!

      Johannes Rußek added a comment -

      agreed! I would love to pull in PigServer etc from a maven repository +1!

      Jeremy Hanna added a comment -

      Yes - 0.7.0 would be very nice to have in the public maven repos.

      Jeremy Hanna added a comment -

      To clarify our need - the Cassandra project would like to use pig 0.7.0 using ivy as a build dependency.

      Olga Natkovich added a comment -

      We need to look at how this is done by Hadoop. build.xml in the hadoop-0.20 branch has the right information.

      We also need to update our release process similarly to Hadoop's: http://wiki.apache.org/hadoop/HowToRelease

      niraj rai added a comment -

      Hi,
      I have made a change to make the core pig.jar available in the Maven repository. I have made it depend on the hadoop jar so that it pulls in the hadoop jar. Please send me your feedback.
      Thanks
      Niraj

      niraj rai added a comment -

      Based on the feedback, I am renaming the pig jars to the old names. I had changed the names to make them compatible with the Maven naming standard. I am also putting pig.jar, rather than pig-core-{version}.jar, in the Maven repository, as UDF builders need the full jar rather than just the core jar.

      niraj rai added a comment -

      New patch after the review recommendations.

      niraj rai added a comment -

      Added an mvn-deploy task to load the jar into the Apache repos.
      Giri, can you test this patch? I don't have permission to run this test.

      Giridharan Kesavan added a comment -

      patch fails with current trunk.

      niraj rai added a comment -

      implemented pgp signature

      Richard Ding added a comment -

      I ran mvn-deploy target. It succeeded and the pig jar and other artifacts were deployed to

      https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/pig/0.8.0-SNAPSHOT/
      

      Giri, can you review the new patch?
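For readers who want to consume the snapshot deployed above, a pom.xml fragment might look like the following sketch. The coordinates are inferred from the repository path (org/apache/hadoop/pig/0.8.0-SNAPSHOT); the repository id is an arbitrary, made-up label.

```xml
<!-- pom.xml fragment (sketch). groupId/artifactId/version are inferred from the
     snapshot URL path; the repository id "apache-snapshots" is a hypothetical label. -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>pig</artifactId>
    <version>0.8.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```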

      Jeremy Hanna added a comment -

      Is there a reason why this couldn't be used on 0.7 as well as 0.8+?

      Olga Natkovich added a comment -

      This will be supported with all releases of 0.8 and later. For 0.7, we need a volunteer to backport it to the 0.7 branch.

      Jeremy Hanna added a comment -

      I can take a look at backporting the patch once it is accepted as part of 0.8.

      Olga Natkovich added a comment -

      Sounds great!

      Scott Carey added a comment -

      -1, based on the output at https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/pig/0.8.0-SNAPSHOT/

      Some problems here:

      1. There are no -sources and -javadoc jars generated.
      2. This jar is 11MB and includes a bunch of dependencies, many of which are optional:
      com.jcraft stuff, com.google stuff, and some com.sun stuff.
      It also includes slf4j, mockito, junit, hamcrest, etc. Oh, and of course some javax.servlet classes that will break using it in a webapp container.

      In short, this is an improperly packaged maven/ivy jar.

      It should have only org.apache.pig, and specify other things as dependencies. Users can optionally package multiple things up into one jar. Pig can package a jar file that has all the required dependencies in it for the command line use (pig.jar). But the maven repo jars (pig-0.8.0.jar, pig-0.8.0-sources.jar, and pig-0.8.0-javadoc.jar) need to be pig and pig only.

      3. Any artifacts that are not needed at runtime should not be ordinary dependencies. JUnit must be specified as test scope; webapp stuff (javax.*) is typically 'provided' scope (the container provides it).
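Point 3 can be illustrated with a POM fragment. This is a sketch under stated assumptions, not Pig's actual POM: the version numbers are illustrative, and treating the com.jcraft library as optional follows the comment above rather than any confirmed analysis of Pig's needs. Maven's scope name for dependencies supplied by the container is `provided`.

```xml
<!-- pom.xml fragment (sketch; all version numbers are illustrative assumptions) -->
<dependencies>
  <!-- needed only to compile and run tests; never passed on to consumers -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.5</version>
    <scope>test</scope>
  </dependency>

  <!-- supplied by the servlet container at runtime, so it is not bundled -->
  <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>servlet-api</artifactId>
    <version>2.5</version>
    <scope>provided</scope>
  </dependency>

  <!-- used when building Pig itself; consumers must opt in explicitly -->
  <dependency>
    <groupId>com.jcraft</groupId>
    <artifactId>jsch</artifactId>
    <version>0.1.38</version>
    <optional>true</optional>
  </dependency>
</dependencies>
```

With declarations like these, none of the three libraries is pulled transitively into a project that depends on the published artifact.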

      Richard Ding added a comment -

      2. This jar is 11MB and includes a bunch of dependencies, many of which are optional:

      We should deploy pig-0.8.0-SNAPSHOT-core.jar (which contains only Pig classes) instead of pig-0.8.0-SNAPSHOT.jar (which also contains dependent jars).

      Jeremy Hanna added a comment -

      Maybe I'm missing something, but why not take the required dependencies and make them into transitive dependencies and leave the optional ones out?

      Scott Carey added a comment -

      Sure, pig-<version>-core.jar can be deployed to address this. Deploying pig-<version>.jar with packaged dependencies in addition to -core is OK but non-standard practice. A -sources and a -javadoc jar are trivial to add and very high value (my IDE will automatically link source and javadoc, for example).

      Maven/Ivy has "optional" dependencies, where a consumer must opt in to pulling them but a direct build of Pig will pull them. There is also 'provided' scope, which has a slightly different meaning with a similar effect.

      Other things like JUnit are clearly test scope and should not be in any published jar.

      I am coming at this from the POV of a developer that wants to have Pig as a dependency for writing LoadFuncs and UDFs. For that, I only need the pig classes which should be available without baggage via a quick Ivy/Maven dependency declaration.
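The "quick Ivy/Maven dependency declaration" mentioned above could look like this ivy.xml sketch. The organisation and module in the info element are hypothetical, the Pig coordinates are inferred from the snapshot repository path in this ticket, and `transitive="false"` is one possible way to get only the Pig classes without baggage.

```xml
<!-- ivy.xml (sketch). "com.example" / "my-pig-udfs" are hypothetical names;
     the Pig coordinates are inferred from org/apache/hadoop/pig/0.8.0-SNAPSHOT. -->
<ivy-module version="2.0">
  <info organisation="com.example" module="my-pig-udfs"/>
  <dependencies>
    <!-- transitive="false": fetch the Pig artifact only, not its dependencies -->
    <dependency org="org.apache.hadoop" name="pig" rev="0.8.0-SNAPSHOT" transitive="false"/>
  </dependencies>
</ivy-module>
```

Ivy would also need a resolver that points at the repository hosting the artifact; that part of the setup is not shown here.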

      niraj rai added a comment -

      Attaching the patch with a source jar. The source jar also includes docs. I have also switched to the pigwithouthadoop jar.

      niraj rai added a comment -

      Attaching another patch after merging with the latest build.xml.

      niraj rai added a comment -

      Attaching the source, core and the pom jar. It also has the staging attachment URL.

      Richard Ding added a comment -

      The new output is at https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/pig/0.8.0-SNAPSHOT/

      Richard Ding added a comment -

      The patch is committed to the trunk. Thanks Niraj for making this feature available.

      Scott Carey added a comment -

      This ticket is incomplete.

      • It did not properly package javadoc.
      • JUnit is not marked as a test-time dependency, but as a runtime dependency.
      • I suspect HBase is not a runtime dependency, but an 'optional' (non-transitive) or 'provided' dependency.

      Should this be reopened, or should I open a new ticket?

      There is a -sources.jar that has java source and additionally other documentation, but no javadoc that I can find, and if it is in there it doesn't have the right folder structure.

      A properly packaged Maven javadoc jar has a file structure like this:
      https://repository.apache.org/content/repositories/public/org/apache/avro/avro/1.4.0-SNAPSHOT/avro-1.4.0-20100825.231911-4-javadoc.jar

      When packaged properly, third-party tools (IDEs like Eclipse) will automatically import the javadoc and java sources for the dependency, making them automatically available in the IDE when coding or debugging.
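Since Pig builds with Ant, separate -javadoc and -sources jars could be produced with a target along these lines. This is only a sketch: the target name and the properties `src.dir`, `build.dir` and `version` are hypothetical and would need to be mapped to the names used in Pig's real build.xml.

```xml
<!-- build.xml fragment (sketch). Target and property names are hypothetical. -->
<target name="javadoc-and-sources-jars"
        description="Package javadoc and sources as separate jars for Maven">
  <mkdir dir="${build.dir}/javadoc"/>

  <!-- generate API docs for the Pig packages only -->
  <javadoc sourcepath="${src.dir}"
           destdir="${build.dir}/javadoc"
           packagenames="org.apache.pig.*"/>

  <!-- javadoc jar: the generated HTML sits at the jar root -->
  <jar destfile="${build.dir}/pig-${version}-javadoc.jar"
       basedir="${build.dir}/javadoc"/>

  <!-- sources jar: .java files only, laid out by package -->
  <jar destfile="${build.dir}/pig-${version}-sources.jar"
       basedir="${src.dir}"
       includes="**/*.java"/>
</target>
```

Keeping the generated HTML at the root of the javadoc jar, and the sources in package layout, is what lets IDEs attach them to the main artifact automatically.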

      Richard Ding added a comment -

      Scott,

      Please create a new Jira for this. Another follow-up jira (PIG-1562) has already been opened.

      -Richard


        People

        • Assignee:
          niraj rai
          Reporter:
          Olga Natkovich
        • Votes:
          2
          Watchers:
          5

          Dates

          • Created:
            Updated:
            Resolved:
