Related to: https://issues.apache.org/jira/browse/AVRO-647
None of the bits of Avro that Hadoop will use depend on Hadoop.
None of the bits of Avro that use Hadoop will be called by Hadoop.
If this were not so, then moving it out of the jar would not be a viable solution to the problem.
There is no circular dependency between Hadoop itself and Avro unless Hadoop decides to use classes in o.a.a.mapred, or unless a user decides to call those classes in a Task. But in that case the inclusion might actually be desired!
Because of the way that Hadoop works, putting all of its dependencies at the front of the classpath for a Task, no user will be able to run a newer version of Avro than the one bundled with Hadoop. With the mapred package broken out, a user at least has a chance of using a different version of it, provided it was compatible with the 'avro-core' version Hadoop was using; but the safe bet would be to force the exact same version and bundle it with Hadoop.
So at first I thought this was important to break out for Hadoop's sake, but now I don't think so. It's important for Avro's sake, for users and applications that don't use Hadoop.
It might be the other way around: using Avro in Hadoop is blocked by Hadoop sorting out its classpath issues, since it currently forces all user Tasks to run with its dependencies (there is no separate classloader for Tasks, for instance).
There is a Hadoop ticket for that; I can't seem to locate it right now.
Avro does not lie about its dependency on Hadoop. There is never a time that an Avro user needs Hadoop. Although "provided" scope might be more appropriate than "optional", it's functionally the same in this case.
The only way a user can execute any avro.mapred code is to run inside Hadoop, where the Hadoop jars and dependencies come from Hadoop and not from any packaging the user may attempt. Specifying the dependency as a runtime dependency would be a lie: the execution context (Hadoop) is expected to provide it.
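To illustrate the scoping argument above, here is a minimal sketch of how a POM might declare Hadoop as a "provided" dependency. The artifact coordinates and version shown are illustrative assumptions, not the actual Avro build configuration:

```xml
<!-- Hypothetical POM fragment. "provided" scope puts the Hadoop jars
     on the compile classpath but does not bundle them or propagate
     them transitively: the execution environment (the Hadoop cluster)
     is expected to supply them at runtime, which matches how Tasks
     actually run. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version> <!-- illustrative version -->
  <scope>provided</scope>
</dependency>
```

By contrast, "optional" keeps the jar on the compile classpath and suppresses transitive propagation, but still implies downstream users could choose to pull it in themselves; in this case both scopes yield the same effective classpath, which is why the distinction is functionally moot here.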