Doug, I think you pinpointed the heart of the build issue, and my understanding of it was wrong. Specifically:
[Paranamer] only needs to run it on Java interfaces used as protocols, a set that intersects generated code, but is neither a proper superset nor subset.
This is what I hadn't internalized. Currently, paranamer only runs on generated test code, because that's the only place where we use an interface as a protocol in the code base. Would you agree that we should change the build so that paranamer runs only on the set of files that we need it to run on? I'm not quite sure how to make that happen, actually, since the ParanamerGeneratorTask seems to take entire directories.
Here's the paranamer source I was looking at:
If we can separate out the files that paranamer runs on, then since they're already enumerated in the build, I can figure out how to make Eclipse understand them.
How do you think we should move forward? Write a paranamer macro that copies one file out of a tree into a parallel temporary tree, works on it, and pushes it back? Something way more clever?
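For what it's worth, a rough ant sketch of that copy-into-a-parallel-tree idea might look like the following. The target name, the property names, and the ParanamerGeneratorTask attribute names are assumptions on my part (I haven't checked them against the paranamer-ant jar), and the `**/*Protocol.java` pattern is just a placeholder for however we enumerate the protocol interfaces:

```xml
<!-- Hypothetical sketch: isolate the protocol interfaces in a scratch tree,
     then point the directory-oriented paranamer task at only that tree. -->
<taskdef name="paranamer"
         classname="com.thoughtworks.paranamer.ant.ParanamerGeneratorTask"
         classpathref="paranamer.classpath"/>

<target name="paranamer-protocols" depends="compile">
  <!-- 1. Copy only the interfaces used as protocols into a parallel tree. -->
  <copy todir="${build.dir}/paranamer-src">
    <fileset dir="${build.dir}/src" includes="**/*Protocol.java"/>
  </copy>
  <!-- 2. Run paranamer over the isolated tree only; attribute names
       here are a guess at the task's configuration. -->
  <paranamer sourceDirectory="${build.dir}/paranamer-src"
             outputDirectory="${build.dir}/classes"/>
</target>
```

That keeps the "set of files paranamer runs on" explicit in the build, which is exactly what Eclipse would need to see.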
Some misgivings about paranamer
This discussion has given me some misgivings about first generating Java code and then running paranamer on it. If we're generating code, can't we just generate the parameter information, stuff it into a static field in the interface, and be done with it? (We could even use paranamer's own convention and add a __PARANAMER_DATA field, so that there's only one read path.)

People with trivial build systems will want to punt: generate the code with a command-line tool once, check in the generated code as if it's their own, and be done with it. (Checking in generated code happens a lot with Thrift, in part because Thrift has, in the past, been recalcitrant to build.) But that's not enough here: they'd also have to run paranamer, at which point they're checking in .class files, and that's not nice. More advanced users will generate code from schemas at build time, and some of those people will be doing so with Eclipse... I'm a little worried that we're imposing more hoops on those people than they deserve.
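To make the "generate the parameter information at codegen time" idea concrete, here's a minimal sketch. The interface, the method, and the exact layout of the data string are all invented for illustration (I haven't verified paranamer's real __PARANAMER_DATA format); the point is only that a generated constant plus reflection gives you parameter names with no bytecode post-processing step:

```java
import java.lang.reflect.Field;

// Hypothetical generated protocol interface: the code generator bakes the
// parameter names into a constant, using the same field name paranamer's
// reader looks for, so a single read path could serve both cases.
interface FooProtocol {
    // Assumed layout, one line per method:
    // "<methodName> <paramTypesCSV> <paramNamesCSV> \n"
    String __PARANAMER_DATA = "send java.lang.String,int message,count \n";

    void send(String message, int count);
}

public class ParanamerDataDemo {
    // Minimal reader sketch: recover parameter names for a method by
    // reflecting on the generated __PARANAMER_DATA constant.
    static String[] lookupParameterNames(Class<?> c, String methodName)
            throws Exception {
        Field f = c.getField("__PARANAMER_DATA");
        String data = (String) f.get(null); // static field, no instance needed
        for (String line : data.split("\n")) {
            String[] parts = line.trim().split(" ");
            if (parts.length >= 3 && parts[0].equals(methodName)) {
                return parts[2].split(",");
            }
        }
        return new String[0];
    }

    public static void main(String[] args) throws Exception {
        String[] names = lookupParameterNames(FooProtocol.class, "send");
        System.out.println(String.join(",", names)); // prints "message,count"
    }
}
```

No .class files get rewritten after javac runs, so checked-in generated source would be enough on its own.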
Anyway, that's neither here nor there for Eclipse support. I think it'll get worked out as we build some tutorials of how to use Avro.
You asked some specific questions about Eclipse; I've done my best to answer below. Sorry it's so wordy.
[You] don't want Eclipse to compile everything via Ant (my first choice) [because] this would somehow disable lots of Eclipse awesomeness. Is that right?
Yep, that's right. Part of that awesomeness is speed (calling out to ant is slower), and part of that is that Eclipse's native compiler integrates with the editor and the debugger.
can Eclipse actually not handle multiple source trees that share a classes directory?
The Eclipse background
The problem is a bit more nuanced than that. What Eclipse doesn't like is the overlap of "sources" and "libraries". "Sources" are directories containing a Java source tree; "libraries" are either classes/ directories or jar files. The classes/ directory built from the sources doesn't get included in the configuration--Eclipse prefers to compile the sources itself. (Eclipse is configured to build its .class tree in .eclipse/something-or-other, and I don't want to colocate that with where ant builds, because I want to very explicitly avoid, at all costs, accidentally packaging something compiled by Eclipse.)
I think a specific example would help. Say we have two classes. FooProtocol.java is generated and compiled by ant (because it needs paranamer); it lives in build/src/FooProtocol.java and classes/FooProtocol.class. SpecificCompiler.java is ordinary source, in src/SpecificCompiler.java and classes/SpecificCompiler.class. If I tell Eclipse that "src" is one of the source folders and "classes" is one of the libraries, then a lookup for "SpecificCompiler" gives me two things, both SpecificCompiler.java and SpecificCompiler.class, because both are visible to it. It's possible to convince it that SpecificCompiler.java is "ahead of" SpecificCompiler.class, but that makes Eclipse work worse, and if you get the classpath misconfigured, it can get confused about which one it should run. So the best thing to do is to not tell Eclipse about SpecificCompiler.class at all. You can do this by explicitly excluding it (hard to do when there are 100 of these), or by putting it in a different directory.
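Concretely, the "different directory" route comes down to something like this in Eclipse's .classpath file (the paths here are illustrative, not our actual layout):

```xml
<!-- Sketch of a .classpath that keeps Eclipse away from ant's output. -->
<classpath>
  <!-- Eclipse compiles src/ itself; never also expose ant's copy of
       these classes as a library. -->
  <classpathentry kind="src" path="src"/>
  <!-- Only the generated, paranamer-enhanced classes come in as a
       library, from a directory that holds nothing else. -->
  <classpathentry kind="lib" path="build/paranamer-classes"/>
  <!-- Separate output dir, so Eclipse-built classes never get packaged. -->
  <classpathentry kind="output" path=".eclipse/classes"/>
</classpath>
```

This only works if the paranamer-processed classes land in their own directory, which is why separating them in the ant build matters.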
Does that make sense? Anyway, that's the problem.
I think I can make Eclipse work with however AVRO sets up the build.
We had Eclipse support built into Hadoop a long time ago.
Two observations here:
- There was and is an Eclipse plug-in for Hadoop. It suffers from lack of use and development.
- There is an ant target called "eclipse-files". It doesn't always work, but I can tell you that it works right now in mapred and hdfs. I'm one of the people who tries to post a patch every time I run into it being broken. The most common failure is that the ivy dependencies have changed; if the way Avro manages that turns out to work, I'll try to port it over to Hadoop as well, which would fix the most common error. Besides that frequent but easy-to-fix problem, the Eclipse stuff has mostly been working. The JIRA process (and the fact that Hadoop stores the Eclipse templates in a separate directory, so they don't show up in "svn diff") imposes some friction on trivial fixes, so I sense that the people using Eclipse are well-versed enough with it to just work around problems as they come.
since they're unlikely to re-generate their project.
I tend to regenerate frequently, but it depends on the person. It helps that "ant clean" will blow it away.