AVRO-647: Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: java
    • Labels: None

      Description

      Our dependencies are starting to get a little complicated on the Java side.

      I propose we build two (possibly more) jars related to our major dependencies and functions.

      1. avro.jar (or perhaps avro-core.jar)
      This contains all of the core avro functionality for using avro as a library. This excludes the specific compiler, avro idl, and other build-time or development tools, as well as avro packages for third party integration such as hadoop. This jar should then have a minimal set of dependencies (jackson, jetty, SLF4J ?).

      2. avro-dev.jar
      This would contain compilers, idl, development tools, etc. Most applications will not need this, but build systems and developers will.

      3. avro-hadoop.jar
      This would contain the hadoop API and possibly pig/hive/whatever related to that. This makes it easier for pig/hive/hadoop to consume avro-core without circular dependencies.

      Attachments

      1. AVRO-647.patch
        51 kB
        Scott Carey
      2. AVRO-647.patch
        45 kB
        Scott Carey
      3. AVRO-647.patch
        43 kB
        Scott Carey
      4. migrateAvro.sh
        8 kB
        Scott Carey
      5. migrateAvro.sh
        8 kB
        Scott Carey

          Activity

          Scott Carey added a comment -

           I made one more commit to this JIRA to remove some accidentally added Eclipse .project files. These are on the svn:ignore list, yet were in svn.

          Scott Carey added a comment -

           Looks good; clearer than mine.

           There is one error, however. It turns out that I forgot to make avro-tools.jar include all of its dependencies. But I think it makes sense to leave that alone and make another jar for this purpose, as discussed at AVRO-663.

          In that case, the documentation for avro-tools.jar should not contain "Embeds Avro components and dependencies" but should say "Depends on all other Avro jars".

          Doug Cutting added a comment -

          Scott, I've re-formatted and edited the CHANGES.txt message as follows:

              AVRO-647. Java: Break avro.jar up into multiple parts: avro.jar,
              avro-compiler.jar, avro-ipc.jar, avro-mapred.jar, avro-tools.jar,
              and avro-maven-plugin.jar.
              
              Summary of artifacts: 
              * avro.jar
                Contains 'core' avro features:  schemas, data files,
                specific, generic, and reflect APIs.
                Dependencies: slf4j, Paranamer, Jackson.
              * avro-ipc.jar
                Contains Transceivers, Requestors, and Responders.
                Dependencies: avro.jar, Jetty, Netty, and Velocity.
              * avro-compiler.jar
                Contains SpecificCompiler, IDL compiler and Ant tasks.
                Dependencies: avro.jar, commons-lang, and Velocity.
              * avro-maven-plugin.jar
                A Maven plugin for Avro's compiler.
                Dependencies: avro-compiler.jar
              * avro-mapred.jar
                API for Hadoop MapReduce with Avro data.
                Dependencies: avro-ipc.jar, hadoop-core, and jopt-simple.
              * avro-tools.jar
                Avro command-line tools.  Embeds Avro components and dependencies.
          
              (scottcarey)
          

          Does this look reasonable to you? If so, I'll commit it.

          Scott Carey added a comment -

          Committed to trunk.

          Scott Carey added a comment -

          'mvn clean verify' does everything 'install' does other than push the snapshot.

          http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#Lifecycle_Reference

          We can tie the checkstyle stuff to the 'test' phase if we want.

           Adding javadoc and sources jar construction should be very easy; just add the plugins to the base pom. I'll have a quick look at that, or we can commit this now and make those changes in a follow-on ticket required for 1.5.
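For context, a sketch of what wiring those two plugins into the base pom might look like; the placement and execution ids are assumptions, not part of this patch:

```xml
<!-- Sketch: attach -sources and -javadoc jars from the parent pom.
     Execution ids are illustrative; the actual patch may differ. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-source-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-sources</id>
          <goals><goal>jar-no-fork</goal></goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-javadocs</id>
          <goals><goal>jar</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With these bound, 'mvn install' would also install the -sources and -javadoc classifier jars to the local repo.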

          Thoughts? If I don't hear from anyone, I'll commit this tonight and create another ticket for follow-on tasks.

          Doug Cutting added a comment -

          This works for me. +1

          Today the suggested pre-commit invocation is 'ant clean test', which runs unit tests, javadoc, and checkstyle. It seems like 'mvn clean install' is close, but doesn't yet run javadoc and also has the side effect of installing the snapshot.

          Doug Cutting added a comment -

          RAT is invoked by the top-level build target to check the entire source distribution, not just Java. The excludes list is in share/rat-excludes.txt. Dunno if Maven will easily support this mode of operation, but, if not, we might just move that portion of build.xml to a share/rat/build.xml or somesuch.

          Scott Carey added a comment -

          Patch available.

          Scott Carey added a comment -

          Same instructions as before. Both the shell script and the patch need to run from the lang/java directory.

          $ cd lang/java
          $ ../../migrateAvro.sh
           $ patch -p0 < ../../AVRO-647.patch
          $ svn add pom.xml avro/pom.xml compiler/pom.xml maven-plugin/pom.xml ipc/pom.xml mapred/pom.xml tools/pom.xml
          $ svn add maven-plugin/src/main/java/org/apache/avro/mojo/*
          

          To test it, just 'mvn install'. That should build, test, checkstyle, and install snapshot jars to your local maven repo.

          Maven 3.0.1 is recommended. It should work with Maven 2.2.1 for now though.

          Scott Carey added a comment -

           Checkstyle just needed a minor tweak; it works now. In the next patch it is bound to the Maven 'validate' phase.

          RAT already works, via "mvn rat:check", because we inherit from the Apache master pom which sets that up for us. We probably want to run that in the validate phase too once it is configured right – it fails now due to several files that we should fix or ignore.
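A sketch of what binding RAT to the validate phase could look like; the exclude shown is hypothetical, and the exact plugin coordinates/configuration should be checked against the apache-rat-plugin docs:

```xml
<!-- Sketch: run RAT automatically during validate once the excludes
     are configured. Exclude pattern is hypothetical. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <executions>
    <execution>
      <phase>validate</phase>
      <goals><goal>check</goal></goals>
    </execution>
  </executions>
  <configuration>
    <excludes>
      <exclude>**/*.avsc</exclude> <!-- hypothetical; real list TBD -->
    </excludes>
  </configuration>
</plugin>
```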

           After the migration, I validated that no java files were forgotten. The next patch will include 'svn delete lang/java/src'. We can remove the ivy stuff and the bits of the ant script we no longer want in another ticket. This is also not wired up to the base project build.sh yet, but that should be an easy follow-on.

          Other follow-ons:
          Document the build process in the wiki.
          Ensure that javadoc/source jars are created in a release build.

          If this checks out and we have a +1 or two I'll commit it.

          Doug Cutting added a comment -

          Tests now pass for me. Thanks, Scott!

          Checkstyle & RAT should certainly be added soon, but we could perhaps commit this now without those?

          Scott Carey added a comment -

          Updated patch. migrateAvro.sh is unchanged. This should build, with all tests passing. Checkstyle and RAT are not yet working.

          Scott Carey added a comment -

          I have all tests passing now.

          The StatsPlugin needed to have Velocity configured to avoid auto-detection of the logging context, just like SpecificCompiler. That fixes the TestStatsPluginAndServlet test.
          The SpecificCompiler test was failing because javac did not get the classpath passed into it properly. The surefire plugin uses a manifest jar to set the classpath, and javac does not like this (it does not work with absolute paths).
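For reference, Velocity's logging auto-detection is controlled by its log-system property; a minimal sketch of pinning it (the exact class used in the patch is not stated here, and the class name varies across Velocity versions):

```properties
# Sketch: pin Velocity's log system so it does not probe for logging
# backends at init time. Class name is illustrative (Velocity 1.x).
runtime.log.logsystem.class = org.apache.velocity.runtime.log.NullLogSystem
```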

          I'll upload a patch and shell script shortly with these changes.

          Doug Cutting added a comment -

          Scott, I'd really like to get this committed in the next week so that we can get a release out before the end of the year.

          What can I do to help? Do you want me to try to debug the tests that are failing after this is applied?

          Doug Cutting added a comment -

           Probably the IPC tests should be restructured. The way they work is ugly and perhaps fragile. The base test class has static variables for the client and server. Then there's an @Before method which creates and starts the appropriate kind of server and client if they're null, setting the static variables. Then there's an @AfterClass method that closes the client and server. The @Test methods are inherited by subclasses so that the same tests can be run with different clients and servers. Perhaps this should be switched to use @RunWith(Parameterized.class)?
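A stdlib-only sketch of that factory-parameterized idea. All names here (ParamSketch, EchoServer, the factories) are invented for illustration; the real change would use JUnit 4's @RunWith(Parameterized.class) runner with an @Parameters method supplying the client/server factories:

```java
import java.util.List;
import java.util.function.Supplier;

public class ParamSketch {
    // Hypothetical stand-in for a real Server/Transceiver pair.
    interface EchoServer {
        String echo(String msg);
        void close();
    }

    // One factory per transport; JUnit's Parameterized runner would get
    // these from an @Parameters method instead of a hardcoded list.
    static final List<Supplier<EchoServer>> FACTORIES = List.of(
        () -> new EchoServer() {            // stand-in for a socket server
            public String echo(String m) { return m; }
            public void close() {}
        },
        () -> new EchoServer() {            // stand-in for an HTTP server
            public String echo(String m) { return m; }
            public void close() {}
        });

    // Runs each "test" once per factory, with per-parameter setup and
    // teardown replacing the static @Before/@AfterClass dance above.
    static int runAll() {
        int passed = 0;
        for (Supplier<EchoServer> factory : FACTORIES) {
            EchoServer server = factory.get();   // fresh server per parameter
            try {
                if (!"bob".equals(server.echo("bob")))
                    throw new AssertionError("echo failed");
                passed++;
            } finally {
                server.close();                  // deterministic teardown
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " transports passed");
    }
}
```

The point of the pattern: each transport gets a fresh client/server and a guaranteed close, instead of lazily-initialized statics shared across subclasses.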

          Scott Carey added a comment -

          Re: Failing tests.

           Mac OS X Snow Leopard (latest OS X version).

           I have test failures as described at: https://issues.apache.org/jira/browse/AVRO-668?focusedCommentId=12910239&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12910239

           There is a JIRA for one of them from a while back, but it's not what is blocking me now:
           https://issues.apache.org/jira/browse/AVRO-644

           We can work on that in another JIRA, but from what I recall, we'll probably have to cooperate via IRC. For the most part it boils down to ports and/or files not being there when the tests think they should be.

              [junit] Running org.apache.avro.TestProtocolGenericMeta
              [junit] Tests run: 6, Failures: 0, Errors: 5, Time elapsed: 0.018 sec
              [junit] Error: 
              [junit] 6057 [main] INFO org.apache.avro.ipc.SocketTransceiver - open to 0.0.0.0/0.0.0.0:61911
              [junit] 6058 [Connection to /10.0.0.231:61914] INFO org.apache.avro.ipc.SocketTransceiver - open to /10.0.0.231:61914
              [junit] 6061 [Connection to /10.0.0.231:61914] INFO org.apache.avro.TestProtocolGeneric - hello: bob
              [junit] 6062 [main] INFO org.apache.avro.ipc.SocketTransceiver - closing to 0.0.0.0/0.0.0.0:61911
              [junit] 
              [junit] TEST org.apache.avro.TestProtocolGenericMeta FAILED
          

          and more of a problem is that many tests related to ipc hang forever.
          At this time, I have this running indefinitely:

              [junit] Running org.apache.avro.TestProtocolHttp
              [junit] 6062 [main] INFO org.apache.avro.ipc.SocketTransceiver - closing to 0.0.0.0/0.0.0.0:61911
              [junit] 6072 [main] INFO org.apache.avro.ipc.DatagramTransceiver - sent to /127.0.0.1:11543
          

          in the stack trace:

          "main" prio=5 tid=102800800 nid=0x100501000 runnable [1004ff000]
             java.lang.Thread.State: RUNNABLE
          	at sun.nio.ch.DatagramChannelImpl.receive0(Native Method)
          	at sun.nio.ch.DatagramChannelImpl.receiveIntoNativeBuffer(DatagramChannelImpl.java:202)
          	at sun.nio.ch.DatagramChannelImpl.receive(DatagramChannelImpl.java:188)
          	at sun.nio.ch.DatagramChannelImpl.receive(DatagramChannelImpl.java:132)
          	- locked <10860b878> (a java.lang.Object)
          	at org.apache.avro.ipc.DatagramTransceiver.readBuffers(DatagramTransceiver.java:56)
          	- locked <10860b6c8> (a org.apache.avro.ipc.DatagramTransceiver)
          	at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:39)
          	- locked <10860b6c8> (a org.apache.avro.ipc.DatagramTransceiver)
          	at org.apache.avro.ipc.Requestor.request(Requestor.java:123)
          	- locked <10860f9b0> (a org.apache.avro.specific.SpecificRequestor)
          	at org.apache.avro.specific.SpecificRequestor.invoke(SpecificRequestor.java:52)
          	at $Proxy12.echo(Unknown Source)
          	at org.apache.avro.TestProtocolSpecific.testEcho(TestProtocolSpecific.java:108)
          

           I can spin up a linux VM to test before committing. It would definitely be good if we figured out what is going on, however; I'm probably not the only Mac user with this problem.

          Doug Cutting added a comment -

          > all tests run other than some IPC ones I need help with. Those don't work for me (and have not for 6+ months on my machine)

          I think we should get tests to pass for you before we commit this. Which tests fail and how? Is there a Jira issue for this? What platform are you running on? If the OS is the problem, an expedient short-term alternative might be to use a VM. Tests pass for me on Ubuntu 10.10 without problems.

          Doug Cutting added a comment -

          I finally had a chance to try this. It worries me that a bunch of tests don't pass. I'll try to have a look more at these after the weekend. Once we get tests passing then we should perhaps just commit this and then work hard to clean up the remaining issues ASAP afterwards. Scott, will you have time in the next few weeks to devote to this? I can work on it, but I'm a Maven newbie.

          Doug Cutting added a comment -

          > avro-ipc or avro-rpc ?

          I don't care either. I don't think it's worth renaming the Java package, yet RPC is the better-known term and what we tend to use in documentation. It's unfortunate to have two terms for the same thing, but I don't see how we can easily rectify that now.

           > avro-mapred – [...] I stuck with the package name of the hadoop api.

          +1

          Scott Carey added a comment -

          Another minor detail, naming:

          avro-ipc or avro-rpc ? I really don't care.
          avro-mapred – we might end up with avro-mapreduce as well for the newer api, so I stuck with the package name of the hadoop api.

          Scott Carey added a comment -

           More remaining bits:

           adding RAT to the build:
           http://incubator.apache.org/rat/apache-rat-plugin/index.html

           fixing checkstyle
           adding the Apache license header to the pom files – though the parent pom build process does seem to add that somehow.

           More notes:
           I had to copy a couple of common test classes; we might want a build-only artifact for test-tools and test-resources.

          Scott Carey added a comment -

          Avro maven build patch and jar split-up.

          This is a mostly complete patch for splitting the Java portion of the avro project up into 6 sub-projects.
          This requires a maven plugin, and so solves AVRO-159 and part of AVRO-572 as well.

          The project structure is as follows:

           Each entry gives: path from lang/java – artifactId (name, artifact type): notes.

           • / – avro-parent ("Apache Avro Parent", pom): parent, inherits from Apache master pom, sets common build properties and versions
           • /avro/ – avro ("Apache Avro", jar): discussed as "avro-core" previously
           • /compiler/ – avro-compiler ("Apache Avro Compiler", jar): Avro IDL compiler and Specific compiler, including ant tasks
           • /maven-plugin/ – avro-maven-plugin ("Apache Avro Maven Plugin", maven-plugin): Maven mojos for avpr > java; avsc > java; avdl > java
           • /ipc/ – avro-ipc ("Apache Avro IPC", jar): Avro IPC components, protocols, transceivers, etc.
           • /mapred/ – avro-mapred ("Apache Avro Mapred API", jar): an org.apache.hadoop.mapred API using Avro serialization
           • /tools/ – avro-tools ("Apache Avro Tools", jar with dependencies): a single jar containing all of Avro and dependencies, with command-line tools
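One payoff of the split, sketched as a hypothetical consumer pom fragment (the version shown is illustrative): a MapReduce job can depend on the Hadoop-facing artifact alone instead of a monolithic avro.jar:

```xml
<!-- Hypothetical consumer: pull in only avro-mapred; its own
     dependencies (avro, avro-ipc, etc.) come in transitively. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.5.0</version> <!-- illustrative version -->
</dependency>
```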

          Status

           • Compiles; all tests run other than some IPC ones I need help with. Those don't work for me (and have not for 6+ months on my machine).
           • This is not integrated with the other language builds yet. There is a little work left to tie the master build to this.
           • This does not yet delete the old directory structure, so side-by-side comparison is possible.
           • There are other changes/enhancements to the build and test process that can leverage this. I'm trying to get a commit done with the basics soon; we can open other tickets for cleanup and enhancements. This is a big checkin with guaranteed merge issues. If we can get most of it in, that will solve the merge difficulties.
           • I have not gotten the 'with dependencies' part of avro-tools complete; that should not block reviewers from having a look.

          Patching Instructions

           Example instructions. Change to the lang/java directory, run the shell script, then the patch, then add the new items. The patch and script are based off of lang/java.

          $ cd lang/java
          $ ./migrateAvro.sh
           $ patch -p0 < ../../AVRO-647.patch
          $ svn add pom.xml avro/pom.xml compiler/pom.xml maven-plugin/pom.xml ipc/pom.xml mapred/pom.xml tools/pom.xml
          $ svn add maven-plugin/src/main/java/org/apache/avro/mojo/*
          

          Building Instructions: command-line

          To clean build all components without testing and install them in your local repository:

          $ mvn clean install -Dtest=false -DfailIfNoTests=false
          

          To compile only:

          $ mvn compile
          

          To run tests:

          $ mvn test
          

          To install to local repo, including running tests:

          $ mvn install
          

          Other useful mvn commands:

          $ mvn clean
          $ mvn validate
          $ mvn help:effective-pom
          $ mvn site
          $ mvn generate-resources
          

          To download all available javadoc and source of dependent projects into your local repo:

          $ mvn dependency:resolve -Dclassifier=javadoc
          $ mvn dependency:resolve -Dclassifier=sources
          

          Building Instructions: Eclipse

          Use Eclipse 3.6 Helios: http://www.eclipse.org/downloads/
          Use the m2Eclipse plugin, latest version.

          • Load the projects into the workspace using the "Import ..." dialog, and select "Existing Maven Projects"
          • Select the lang/java directory, and it should show all 7 projects including the parent. Import all of these.
          • After the load and first build, it will not completely compile. To fix it up to compile, select all of the projects and right-click. Select Maven > Update Project Configuration.

          More maven information:

          These are a good start:
          http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
          http://maven.apache.org/guides/introduction/introduction-to-the-pom.html

          For those new and experienced, "Maven By Example" is a very good intro – especially chapters 3+
          http://www.sonatype.com/books/mvnex-book/reference/public-book.html

          Apache's maven policies, tips, etc:
          http://www.apache.org/dev/publishing-maven-artifacts.html#inherit-parent

          Plugins used include:
          http://mojo.codehaus.org/javacc-maven-plugin/
          http://maven.apache.org/plugins/maven-surefire-plugin/
          http://maven.apache.org/plugins/maven-checkstyle-plugin/
          http://paranamer.codehaus.org/

          Other useful plugins:
          http://mojo.codehaus.org/build-helper-maven-plugin/usage.html
          http://mojo.codehaus.org/cobertura-maven-plugin/
          http://maven.apache.org/plugins/maven-shade-plugin/

          Documentation

          Much of this message is preliminary documentation. Please comment on it as well.

          Doug Cutting added a comment -

          Sounds great. Thanks for the update!

          Scott Carey added a comment -

          I'll have time in the first week of November to work on it and produce a patch to review.

          About half the work is shared in both solutions: Splitting up the projects into directories, moving classes around, keeping track of all the 'svn mv' and 'svn add' commands that will be required. That, in combination with figuring out some of the more complicated testing bits, is what primarily stalled me.
          However, I think I already got past the most difficult parts related to Requestor/Responder, but I won't know for sure until I try to tie the rest of it together.

          Doug Cutting added a comment -

          Scott, are you going to be able to complete this as a Maven conversion of the Java build, or should I tackle it with Ivy & Ant?

          Scott Carey added a comment -

          Yes, the path I have gone down so far has split the classes in packages amongst the jars. Moving Requestor means moving several things that go with it. There is no way at this time to split by package in most places.

          Having it be easier in a few places would be nice, however. The list of 'svn cp's to run before applying the patch is getting very messy! If there are a few places where an entire package can move save for one or two classes, it might be worthwhile to move those classes. specific/reflect/generic are going to be split no matter what as far as I can tell – and rightfully so at this time.

          Doug Cutting added a comment -

          > ByteBufferInputStream and ByteBufferOutputStream are used by BinaryDecoder and BinaryEncoder and we should consider moving them to util or io.

          Those should perhaps move to the io package.

          > In order to make a 'core' library I moved Requestor and Responder to avro-ipc.

          You moved these along with the various implementations of Requestor and Responder, so the jar splits don't correspond to java packages, right? If we embrace that approach generally, then we wouldn't move any classes to different packages at this stage. Rather the different trees and jars can overlap in the java packages they contain. The only incompatibility we create at this point will be in packaging, not in any APIs. It would be good to separate API changes from packaging changes.

          So we'd then leave ByteBufferInputStream, ByteBufferOutputStream and AvroRemoteException in the ipc package, but include them in the core jar & tree.

          Scott Carey added a comment - - edited

          Which classes are you thinking of?

          ByteBufferInputStream and ByteBufferOutputStream are used by BinaryDecoder and BinaryEncoder and we should consider moving them to util or io.
          AvroRemoteException is referenced in many places as well.

          Generic, specific and reflect all depend on ipc for Requestor and Responder. The complicated bit is that ipc depends on the specific compiler for Handshake{Request,Response}. So perhaps {Generic,Specific,Reflect}{Requestor,Responder} should all move to ipc to remove that circularity. That would make the build easier.

          In order to make a 'core' library I moved Requestor and Responder to avro-ipc. It was the cleanest break that allowed the Generic/Specific/Reflect API to otherwise remain.

          Moving them all to ipc doesn't remove the circularity: you still can't build Requestor/Responder without first building SpecificCompiler and generating classes. With Specific in 'core', ant tasks / maven plugins for the SpecificCompiler can be built off of core, and then ipc can be built after generating the classes that Requestor/Responder need using the just-built ant/maven tool.

          Unless we figure out how to extract the dependency on generated code in Requestor/Responder (wrappers?), it looks like we have to build the SpecificCompiler before Requestor/Responder.

          Doug Cutting added a comment -

          > separating specific, generic, and reflect is meaningful.

          I agree they're logically separate, but I think we want to avoid slicing things into 20 logically distinct jars.

          > There are also dependencies on the o.a.a.ipc package from all over the place due to having utility classes there that should be in .util instead. [ ... ]

          Which classes are you thinking of? I think we should resist the tendency to move things into util when we can't figure out where they belong.

          Generic, specific and reflect all depend on ipc for Requestor and Responder. The complicated bit is that ipc depends on the specific compiler for Handshake{Request,Response}. So perhaps {Generic,Specific,Reflect}{Requestor,Responder} should all move to ipc to remove that circularity. That would make the build easier.

          Scott Carey added a comment -

          Do you mean separating out all three from the 'inner' decoder/encoder/schema layer? Or separating out those individually?

          Separating Specific from the rest was easy. However, it turned out to only be a handful of classes without external dependencies, so there wasn't much of a point.

          There are also dependencies on the o.a.a.ipc package from all over the place due to having utility classes there that should be in .util instead. I think what I might try to do first is some refactoring to clean up that sort of stuff.

          Philip Zeyliger added a comment -

          BTW, I don't know how easy it is to separate (I suspect not easy), but separating specific, generic, and reflect is meaningful.

          For testing, I think it's not harmful, in large part, for the test targets to depend on everything.

          Scott Carey added a comment -

          More experimentation on the Avro front has brought out some interesting quirks in our dependencies.

          1. I'm not sure it makes sense to separate IDL and Specific from Core. It turns out that the only extra library required as a runtime dependency for those two is commons-lang, and the one class and method used there we could simply copy into our code to avoid the dependency. Javacc is a build-time-only dependency that should not show up in our POM at all. Paranamer-ant is the same. Both have Maven plugins. The upcoming templating version of SpecificCompiler might change what we want to do though.
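          A minimal sketch of what inlining that one commons-lang method might look like. The class name here is hypothetical, and commons-lang's StringEscapeUtils.escapeJava handles a few more cases (and uppercases the hex digits), so this is illustrative rather than a drop-in replacement.

```java
// Hypothetical stand-in for the single commons-lang method the IDL compiler
// uses (StringEscapeUtils.escapeJava), sketched to show how the dependency
// could be dropped by copying the logic into our own code.
public final class JavaStringEscaper {
    private JavaStringEscaper() {}

    /** Escapes a string so it can appear inside a Java string literal. */
    public static String escapeJava(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // Render control and non-ASCII characters as \uXXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```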

          2. A LOT of our testing requires use of the Specific Compiler. Most of the ipc package depends on output of the Specific Compiler to compile; Requestor/Responder are at the heart of that. This would require that these be in a separate artifact. The Maven artifacts would be

          avro-core (possibly with IDL)
          avro-compile (optional, current version can be in core, template based one may require separation or shading)
          avro-maven-plugin (Maven plugins for idl, specific compiler; depends on core and compile)
          avro-ant (the two classes for Ant tasks; depends on core, compile)
          avro-ipc (IPC w/ netty/jetty; depends on core, compile, uses maven-plugin; most testing is not possible until here!)
          avro-mapred (including tether, or that separate?)
          avro-tools

          That is a lot of stuff, but really only 4 libraries that others can depend on, two build tools, and one command-line tool.
          The part that is a bit of a problem is that most of our testing of core can't happen in the core project because of its dependencies on specific compiler output.

          Scott Carey added a comment -

          I think what is key here is whether the maintenance can be shared easily. Only one person has to build this out (me for Maven, or likely Doug if Ant/Ivy).

          Both take significantly less effort or expertise to modify and tweak once it's mostly set up.

          Even if we go with Maven, Ant will be around to deal with things Maven doesn't do well. They are complementary tools.

          At this point I've got Maven working as far as it will easily go without moving source trees around and splitting up the build. That is a significantly larger time investment. It doesn't look too difficult to keep going however.

          Philip Zeyliger added a comment -

          To be clear, I'm too much of a maven incompetent to volunteer. I would be happy to test it out after the fact, though.

          BTW, it would be totally acceptable and desirable for the maven plugins for avro code generation to be part of Avro's build. Patrick, who wrote the plugin, would be happy to contribute it, if he hasn't already. That solves a versioning problem for the plugin, too.

          Scott Carey added a comment -

          It sounds like there is at least consensus to split the source tree up. This will make either Ant or Maven easier to deal with to get the job done. So that rules out #1.

          > To wire up IDL and the Specific compiler, Maven plugins would be required. Interop testing would probably still require ant.

          Can you please explain these more?

          IDL and the Specific compiler depend on Avro core to run. We have a multi-step build: build the classes that don't {depends.on.generated}, then generate some stuff, then build those classes.

          In Maven it's not strictly required, but very difficult, to do something like the above without declaring the dependency and making it its own artifact. Basically, the easy way is to split things up into core, rpc, idl, mapred, and tools and build them in the right order as separate components with explicit dependencies.
          The easy way to do code generation is to make a maven plugin like AVRO-159 and use it in the build. Fortunately, that means that Maven plugins for Specific and IDL are part of our own build and thus natural for us to maintain.

          I have made a pom.xml that will build avro, but it excludes the {depends.on.generated} stuff and doesn't do any tests that require code generation or interop.

          I haven't looked at how to do interop testing yet, but it seems like something that is at a higher level than the Java build. Maven doesn't naturally pull data from anywhere that is not within the project or a declared artifact. That might end up being easier to wire up with the other language builds using ant or shell scripts.
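          For a consumer's build, wiring in such a code-generation plugin would look something like the fragment below. This is a sketch only: the plugin coordinates, goal name, and version are assumptions about what an AVRO-159-style plugin would provide, not settled names.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.5.0</version>
      <executions>
        <execution>
          <!-- Hypothetical goal: compile .avsc/.avdl sources to Java
               during the generate-sources phase, before compilation. -->
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```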

          Doug Cutting added a comment -

          I'm +0 for a full-Maven Java build. I'd not oppose it if someone else implements it and it's easy to maintain, supports what's required, etc.

          If I were to do it myself, I'd probably use Ant, split the tree in four (core, idl+rpc, mapred, tools), have each import a shared build.xml file then have a top-level build.xml that calls the others. I would be willing to do this over the coming month if no one else volunteers.

          But if someone else (Scott?, Philip?) volunteers to implement this using Maven, I'd not get in their way.

          > To wire up IDL and the Specific compiler, Maven plugins would be required. Interop testing would probably still require ant.

          Can you please explain these more?

          Philip Zeyliger added a comment -

          I would be +1 full-maven for Java. Amongst the evils available, it's one of the least objectionable. I'm using it on another project now, and, well, I hate that I don't know what it's doing half the time, but it removes a considerable amount of the Ivy and Ant boilerplate.

          Scott Carey added a comment -

          Are there any strong feelings on the three choices above? To some extent I favor just going all the way to a maven build. That makes dependency management easy, but does add baggage otherwise and is a learning curve for some.

          Doug Cutting added a comment -

          > Instead, we could simply document this all clearly so that users are armed with the information necessary to configure their builds to exclude transitive dependencies they don't use.

          That might be a useful short-term strategy: make more dependencies optional and document which features require what dependencies.
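          With that short-term strategy, a user who doesn't need a given feature can already prune its transitive dependencies in their own POM. A generic sketch (the Jetty artifact shown is just an example of what an RPC-free consumer might exclude, not a recommendation from this discussion):

```xml
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.4.0</version>
  <exclusions>
    <!-- Example: a consumer that never uses Avro's HTTP RPC can drop Jetty. -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```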

          Scott Carey added a comment -

          Yeah, I'm actually using a custom avro-maven-plugin based on the earlier versions of that for my build (the early versions did not compile avsc, only avpr). So that part should not be too hard. It would be a very radical change from ant/ivy though and there are bound to be some tricky things in a big change like that.

          Philip Zeyliger added a comment -

          If you go the mvn route (the one thing I love about maven is that it reliably puts the sources of the jars we depend on in my Eclipse workspace), http://github.com/phunt/avro-maven-plugin handles some of the specific stuff.

          Scott Carey added a comment -

          > Finally, to be clear, is there a motive for this beyond better expressing dependencies? Functionally sticking everything in a single jar with lots of optional dependencies works fine, but folks then have to guess which dependencies they actually need, and that's the primary problem this seeks to solve. Is that right, or are there other problems too?

          That is the main case here. Dependencies become more explicit. Users should be able to consume the parts they need without too much accidental baggage. Instead, we could simply document this all clearly so that users are armed with the information necessary to configure their builds to exclude transitive dependencies they don't use.

          However, Avro is by nature something that many things will depend on, and portions of Avro might themselves depend on some of those things. In particular, making it easy to avoid circular dependencies is a plus. As we have seen (https://issues.apache.org/jira/browse/AVRO-545), even if it is possible to use ivy/maven features to prevent circular dependency, it makes users uneasy.

          The guidelines I use for my projects are:

          • If the cascaded set of dependencies is large and likely to conflict with other things, it should be easy to separate (for Avro, this is the hadoop dependency).
          • If the dependency is physically large (large jar file), consider making it easy to separate.
          • If the dependency is for a minor rarely used feature, be careful. For example Jackson 1.0.1 being used by hadoop 0.20+ for dumping configuration files to JSON causes problems.

          So for the case of Reflect, if paranamer doesn't have a lot of cascaded dependencies itself, nor is a large jar on its own, then including it in avro-data is not going to be a big deal.

          If we separate jars, it might be good to split the build-time classpath in the same manner, by splitting the src tree.

          We have three choices, I think:
          1. Leave the source tree as-is, and have the build use ant file excludes/includes to define what is packaged in each one. Managing the excludes/includes will be troublesome and would be easier if the split was cleanly done by package. Not much else would have to change – the compile and test phases would stay the same. There would also be the downside that tests would not implicitly test the packaging boundaries.
          2. Break it into different source trees and continue using ant/ivy. This is more work and means we would be breaking up tests and compile phases too.
          3. Break it into different source trees and use maven. Maven is a natural fit for this sort of thing and I'm experienced with it, but it is not trivial and others here aren't as familiar with it. To wire up IDL and the Specific compiler, Maven plugins would be required. Interop testing would probably still require ant.

          Philip Zeyliger added a comment -

          I may be missing something: what's http-client used for in the tools category?

          Doug Cutting added a comment -

          A breakdown by use-case might be:

          • avro-data (core & data files)
          • avro-rpc (includes netty, jetty) depends on avro-data
          • avro-mapred (mapreduce APIs) depends on avro-data
          • avro-mapred-tether (RPC-based mapred API) depends on avro-mapred & avro-rpc
          • avro-dev (specific & idl compiler, ant tasks) depends on avro-data

          About dependencies:

          • paranamer is used by reflect, to get the names of method parameters. Perhaps avro-reflect should be made a separate jar?
          • velocity is used by RPC stats charting stuff and by AVRO-648 (template-based specific compiler)
          • commons-lang is used by the IDL compiler for StringEscapeUtils

          If we separate jars, it might be good to split the build-time classpath in the same manner, by splitting the src tree. The build order would then be: data, mapred, dev, rpc, mapred-tether, since rpc depends on dev to compile the handshake. Note that this would split packages among trees, as specific has some data classes and some rpc classes.
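
          The split-tree option could be sketched as a Maven multi-module parent POM; the module names mirror the proposed jars, and the coordinates and layout here are illustrative rather than a settled proposal:

          ```xml
          <!-- Hypothetical parent pom.xml for a split source tree.
               Maven orders the reactor build by inter-module dependencies,
               which would yield the data, mapred, dev, rpc, mapred-tether
               order described above. -->
          <project xmlns="http://maven.apache.org/POM/4.0.0">
            <modelVersion>4.0.0</modelVersion>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-parent</artifactId>
            <version>1.5.0-SNAPSHOT</version>
            <packaging>pom</packaging>
            <modules>
              <module>avro-data</module>
              <module>avro-mapred</module>
              <module>avro-dev</module>
              <module>avro-rpc</module>
              <module>avro-mapred-tether</module>
            </modules>
          </project>
          ```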

          Finally, to be clear, is there a motive for this beyond better expressing dependencies? Functionally sticking everything in a single jar with lots of optional dependencies works fine, but folks then have to guess which dependencies they actually need, and that's the primary problem this seeks to solve. Is that right, or are there other problems too?

          Scott Carey added a comment -

          So this is a rundown of what I know of the dependencies and what features use them:

          core requirements:
          jackson – JSON
          SLF4J – logging
          jetty – HTTP transport
          netty – Socket transport

          development:
          javacc

          tools:
          commons-httpclient
          jopt-simple

          build/test only:
          junit
          maven
          ant-eclipse
          rat
          checkstyle

          I'm not sure about:
          paranamer
          velocity
          commons-lang
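
          One way to express these groupings without yet splitting jars is Ivy configurations. A minimal sketch, assuming illustrative organisation/module names and revisions (not exact coordinates):

          ```xml
          <!-- Sketch of an ivy.xml mapping dependencies to usage groups. -->
          <ivy-module version="2.0">
            <info organisation="org.apache.avro" module="avro"/>
            <configurations>
              <conf name="core" description="runtime library use"/>
              <conf name="dev" extends="core" description="compilers, IDL"/>
              <conf name="tools" extends="core" description="command-line tools"/>
              <conf name="test" extends="dev,tools" description="build/test only"/>
            </configurations>
            <dependencies>
              <dependency org="org.codehaus.jackson" name="jackson-mapper-asl"
                          rev="1.5.4" conf="core->default"/>
              <dependency org="net.java.dev.javacc" name="javacc"
                          rev="4.2" conf="dev->default"/>
              <dependency org="commons-httpclient" name="commons-httpclient"
                          rev="3.1" conf="tools->default"/>
              <dependency org="junit" name="junit"
                          rev="4.8.1" conf="test->default"/>
            </dependencies>
          </ivy-module>
          ```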

          Scott Carey added a comment -

          Thoughts?

          I know how to do the above with Maven directly, but I'm not as familiar with Ivy. Would we need one ivy.xml file per jar/pom combination we want to build? For some things this clearly breaks up by package:

          o.a.a.mapred
          o.a.a.mapred.tether
          o.a.a.pig
          >>> avro-hadoop.jar

          But some things such as the dev tools would be more difficult. I'm not sure we would choose to separate those from core. We could instead specify the dev dependencies such as javacc as 'optional' in the pom / 'transitive=false' in ivy.
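
          For reference, the two declarations look like this (javacc version is a placeholder):

          ```xml
          <!-- Maven: downstream consumers do not inherit javacc
               unless they declare it themselves. -->
          <dependency>
            <groupId>net.java.dev.javacc</groupId>
            <artifactId>javacc</artifactId>
            <version>4.2</version>
            <optional>true</optional>
          </dependency>

          <!-- Ivy: do not resolve this dependency's own dependencies. -->
          <dependency org="net.java.dev.javacc" name="javacc"
                      rev="4.2" transitive="false"/>
          ```

          Note these aren't exact equivalents: Maven's 'optional' hides the dependency from downstream consumers, while Ivy's transitive="false" stops resolution of that dependency's own transitive closure.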

          Two things jump out as definitely important to separate:
          1. Hadoop, etc.
          2. A future maven plugin for idl/specific compilers.

          Before I add a pig dependency I'd like to sort out our packaging and dependency strategy here.
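
          For point 2, consumer usage might eventually look something like the following; the plugin coordinates, goal name, and phase binding here are hypothetical, since no such plugin exists yet:

          ```xml
          <!-- Hypothetical future Avro compiler plugin, run during
               code generation; artifactId and goal are illustrative. -->
          <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.5.0</version>
            <executions>
              <execution>
                <phase>generate-sources</phase>
                <goals>
                  <goal>idl</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          ```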

          Philip Zeyliger added a comment -

          Definitely +1 to the idea.


            People

            • Assignee:
              Scott Carey
              Reporter:
              Scott Carey
            • Votes: 0
              Watchers: 5
