Thanks for the review!
libxml2 isn't always available (at least, it isn't installed on my OS X box). Is it possible to use the "xml" module instead? Or does it lack features you need? (And is it always installed?)
libxml2 enables XPath queries, which simplify the POM content checks. I thought libxml2 was generally installed (it is in my Cygwin installation), but I guess not. I tried the "lxml" module, since it also supports XPath and is said by several random Internet denizens to have a more pythonic API than libxml2, but "lxml" is not installed in my Cygwin distribution, and my (admittedly low-effort) attempt to install it wasn't successful.
Mike, do you know of any surveys of which Python modules are included in different distributions?
I'll look into switching to the "xml" module and using DOM rather than XPath queries.
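For illustration, here is a minimal sketch of how a DOM-based POM check might look with `xml.dom.minidom`, which is part of the Python standard library. The function names (`child_element`, `child_text`, `pom_coordinates`) are hypothetical and not taken from the smoke tester. One convenience worth noting: minidom's `tagName` is the name as written in the file, so the POM's default `xmlns` does not have to be spelled out in each lookup.

```python
import xml.dom.minidom

def child_element(parent, tag):
    """Return the first direct child element of `parent` named `tag`, or None.
    Only direct children are searched, so <parent><version> cannot be
    mistaken for the project's own <version>."""
    for node in parent.childNodes:
        if node.nodeType == node.ELEMENT_NODE and node.tagName == tag:
            return node
    return None

def child_text(parent, tag):
    """Concatenated, stripped text content of a direct child, or None."""
    element = child_element(parent, tag)
    if element is None:
        return None
    return ''.join(n.data for n in element.childNodes
                   if n.nodeType == n.TEXT_NODE).strip()

def pom_coordinates(pom_text):
    """(groupId, artifactId, version) of a POM, falling back to <parent>
    for groupId and version, which Maven lets child POMs inherit."""
    project = xml.dom.minidom.parseString(pom_text).documentElement
    parent = child_element(project, 'parent')
    coords = []
    for tag in ('groupId', 'artifactId', 'version'):
        value = child_text(project, tag)
        if value is None and parent is not None and tag != 'artifactId':
            value = child_text(parent, tag)
        coords.append(value)
    return tuple(coords)
```

Walking direct children by hand is wordier than an XPath expression, but it needs nothing beyond what ships with Python.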
That sure is a LOT of Python code! Maven requires a lot of verifying, I guess...
Three sources of code volume here:
- I tried to minimize changes in existing parts of the script, so there is duplication in several places (e.g. signature and hash checks).
- I attempted to isolate each type of check to minimize function length and simplify maintenance; as a result, setup code is duplicated.
- As you say, there's lots of verifying to do:
  - Unlike the Lucene/Solr release packages, the Maven release artifacts are deployed in deep directory hierarchies, so a recursive crawl is required to collect them.
  - Each artifact has detached metadata (the POM), plus source and javadoc jars, all of which need to be validated.
  - Since the deployed POMs can't tell me whether anything is missing, I have to recursively crawl the Subversion release branch to collect the POM templates, which define what should be deployed.
  - Most of the Maven artifacts are copies of those in the Lucene/Solr distributions, so, unlike in the regular binary distributions' case, the Maven copies have to be verified as identical to their sources. For the non-Mavenized dependencies that are published as Lucene and Solr artifacts, the deployed Maven .jar names differ from their sources, so a map is needed to trace the Maven artifact copies back to their sources.
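To make the crawl and copy-verification steps above concrete, here is a minimal sketch. It assumes the standard Maven layout (`name-version.pom` next to `name-version.jar`, `-sources.jar` and `-javadoc.jar`) and an already-unpacked binary distribution on local disk. All function names, and the `renamed` map from Maven jar names to distribution jar names, are hypothetical illustrations rather than the smoke tester's actual code.

```python
import hashlib
import os

def sha1_of(path):
    """SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def collect_poms(maven_root):
    """Recursively find every deployed .pom under a deep Maven repository tree.
    Returns {pom_path: set of sibling file names}."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(maven_root):
        for name in filenames:
            if name.endswith('.pom'):
                found[os.path.join(dirpath, name)] = set(filenames)
    return found

def missing_companions(pom_path, siblings):
    """List the jar, sources jar and javadoc jar expected next to a POM but absent.
    Assumes jar packaging; POM-only artifacts (e.g. parent POMs) would need skipping."""
    base = os.path.basename(pom_path)[:-len('.pom')]
    return [base + s for s in ('.jar', '-sources.jar', '-javadoc.jar')
            if base + s not in siblings]

def index_by_name(dist_root):
    """Map jar file name -> path for every jar in an unpacked binary distribution."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(dist_root):
        for name in filenames:
            if name.endswith('.jar'):
                index[name] = os.path.join(dirpath, name)
    return index

def verify_copy(maven_jar, dist_index, renamed):
    """Check that a Maven jar is byte-identical to its distribution source.
    `renamed` (hypothetical) maps Maven jar names to distribution jar names for
    dependencies republished under new names; other names are looked up as-is."""
    name = os.path.basename(maven_jar)
    source = dist_index.get(renamed.get(name, name))
    if source is None:
        return 'no source found for %s' % name
    if sha1_of(source) != sha1_of(maven_jar):
        return '%s differs from %s' % (maven_jar, source)
    return None
```

Indexing the distribution by file name once, rather than searching it per artifact, keeps the rename map a simple name-to-name dictionary.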
The first of these could be addressed by refactoring. The second could be addressed, without creating huge function bodies, by merging functions that share setup code and then extracting new functions that are called from the inner loops. The third is just the nature of the beast: I guess we could do less verifying, but that direction wouldn't get my vote.
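As a rough illustration of that refactoring, the sketch below shares one digest helper between the MD5 and SHA-1 checks and runs the setup (the recursive crawl) once, calling small per-file check functions from the inner loop. It assumes the usual Maven convention of `.md5` and `.sha1` files deployed beside each artifact; every function name here is hypothetical.

```python
import hashlib
import os

def digest_matches(path, algorithm):
    """Shared by the MD5 and SHA-1 checks: compare a file against the hex
    digest stored in its sibling <path>.<algorithm> file (if present)."""
    digest_file = path + '.' + algorithm
    if not os.path.exists(digest_file):
        return False
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    with open(digest_file) as f:
        # Digest files may carry a trailing file name after the hash
        fields = f.read().split()
    return bool(fields) and h.hexdigest() == fields[0].lower()

def check_md5(path):
    return None if digest_matches(path, 'md5') else 'MD5 check failed: ' + path

def check_sha1(path):
    return None if digest_matches(path, 'sha1') else 'SHA-1 check failed: ' + path

def verify_maven_dir(root, checks=(check_md5, check_sha1)):
    """Setup (the recursive crawl) happens once; each per-file check
    is a small function called from the inner loop."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(('.jar', '.pom')):
                path = os.path.join(dirpath, name)
                for check in checks:
                    problem = check(path)
                    if problem:
                        problems.append(problem)
    return problems
```

New kinds of verification then become another small function in `checks` rather than another copy of the crawl.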