This looks like a great start. I've read through the patch and believe I understand most of how it works. I don't have any major architectural concerns, but there are a number of style issues that I think should be addressed before this is committed. All of these are outlined below. Comments are listed in the sequential order presented by your patch file.
(As for your question about test failures, build #511 has already been deleted by Hudson, so I can't check that.)
Do you actually depend on hsqldb?
Hadoop test classes typically go in the same package as the code they test (e.g., o.a.h.vertica), not a separate package like o.a.h.vertica.tests. This would save you a lot of imports in tests, and allows package-private things to be used for testing. (This applies to all your test classes.)
- The method name setUp() has special meaning in JUnit. Since your setup() method isn't a setUp(), can you change this to something less misleadingly-similar?
- The test description string in suite() should say o.a.h.vertica, not o.a.h.sqoop.
- re your "TODO: figure out jdbc jar packaging:" Based on Hadoop source tree style, I recommend creating a src/contrib/vertica/lib directory, checking any external jars you need in there, and modifying src/contrib/vertica/build.xml to include the jars from that dir on the classpath for building, testing, etc. Since patches are text not binary, you should attach the jar to the JIRA issue separately. Note that adding a jar requires its license to be compatible with the Apache License 2.0. See http://www.apache.org/legal/resolved.html#category-a for a list of licenses that external dependencies may carry.
TestExample.Reduce.setup(): I suggest that AllTests.setup(); go in a static initializer block in TestExample rather than getting called in every Reduce.setup() call. Given that you actually require AllTests.setup() in virtually all your tests, I would suggest creating a VerticaTestCase class that subclasses TestCase, have this class call AllTests.setup() in a static initializer block, and then have all your Test* classes subclass VerticaTestCase instead of TestCase. This way you won't worry about missing a call somewhere.
Also in this same method, why catch Exception e and print its stack trace? If Reduce.setup() fails with an exception, why shouldn't the whole test fail?
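To illustrate the VerticaTestCase suggestion, here is a rough sketch of the static-initializer pattern. It deliberately avoids the JUnit dependency so it stands alone: in the real patch the base class would extend TestCase and the static block would call AllTests.setup(). The class names and the setupCalls counter are hypothetical stand-ins of mine.

```java
// Sketch only. In the patch this would extend junit.framework.TestCase
// and the static block would call AllTests.setup().
abstract class VerticaTestCaseSketch {
  // Hypothetical counter standing in for the work done by setup.
  static int setupCalls = 0;

  static {
    // Runs exactly once, when the JVM first initializes this class,
    // no matter how many subclasses or instances are created later.
    setupCalls++;
  }
}

// Test classes inherit the one-time setup simply by subclassing.
class TestExampleSketch extends VerticaTestCaseSketch {
}

class TestOtherSketch extends VerticaTestCaseSketch {
}
```

Creating any number of instances of either subclass leaves the counter at one, which is the property you want from the shared setup.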
TestExample.Reduce.reduce(): Style nit: One-line if statements should still use curly-braces around the "then" clause. See http://java.sun.com/docs/codeconv/html/CodeConventions.doc6.html#449. Hadoop source should follow all Sun Java style conventions except for an indentation width of two spaces.
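For reference, the brace style I mean looks like this. The method is a made-up example, not code from your patch.

```java
final class BraceStyleSketch {
  // Hypothetical helper, used only to show the formatting.
  static int clampToZero(int value) {
    // Braces are used even though the body is a single statement.
    if (value < 0) {
      return 0;
    }
    return value;
  }
}
```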
I don't think TestExample, etc, should have a run() method.
TestVertica.testVerticaRecord(): why are values of DATE, TIME, etc, commented out? Dead code should be removed, not commented out. Also, why catch the IOException and return? Why doesn't this method just throw IOException (and implicitly fail the test)?
In recordTest(), don't use "assert values.equals(new_values)", use JUnit: assertEquals("failure message", values, new_values);
Same with testVerticaSplit(), validateInput(), etc...
VerticaStreamingRecordWriter.java: Please use Java "lowerCamelCase" style for field and variable names, not "under_scores" (see writer_table, copy_stmt, etc. These should be writerTable and copyStmt respectively.)
In the constructor, the RuntimeException description "Vertica Formatter requies a the Vertica jdbc driver" contains typos ("requies", "a the"). Perhaps "Vertica Formatter requires the Vertica JDBC driver"?
close() method: if statement should use curly-brace style described above.
write() method: Materializing record.toString() in LOG.debug() for every call to write is expensive. Consider wrapping this statement in a call to LOG.isDebugEnabled().
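A sketch of the guard I have in mind follows. To keep it self-contained I have used java.util.logging, where the check is isLoggable(Level.FINE); with the commons-logging Log used in Hadoop, the equivalent guard is LOG.isDebugEnabled(). The CountingRecord class is a hypothetical stand-in for your record type that counts how often it is stringified.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedDebugLogging {
  // java.util.logging is an assumption made to keep this standalone.
  static final Logger LOG =
      Logger.getLogger(GuardedDebugLogging.class.getName());

  /** Hypothetical record that counts calls to toString(). */
  static final class CountingRecord {
    int toStringCalls = 0;

    @Override
    public String toString() {
      toStringCalls++;
      return "record";
    }
  }

  static void write(CountingRecord record) {
    // The string is only built when debug-level output is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("writing " + record.toString());
    }
    // ... the actual write would go here ...
  }
}
```

With the logger at its usual INFO level, write() never calls toString() at all.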
VerticaConfiguration: comment above definition of DELIMITER has typos.
input_query.charAt(input_query.length() - 1) == ';' ... perhaps inputQuery.endsWith(";") instead? (Note that endsWith() takes a String, not a char.)
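A small sketch of the endsWith() form is below. I am assuming the check exists to strip a trailing semicolon from the query; the method name is hypothetical. One side benefit is that endsWith() is safe on an empty string, whereas charAt(length() - 1) throws.

```java
final class QueryStringSketch {
  // Assumption: the patch wants the query without a trailing ';'.
  static String stripTrailingSemicolon(String inputQuery) {
    if (inputQuery.endsWith(";")) {
      return inputQuery.substring(0, inputQuery.length() - 1);
    }
    return inputQuery;
  }
}
```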
getInputParameters() has meaningless javadoc attributes. See also getInputDelmiter() (which is a typo'd method name), setInputDelimiter(), etc... that all have empty @return attributes.
VerticaInputFormat: DateFormat is not thread-safe. datefmt should not be a static member.
This class also has a lot of under_score field and parameter names.
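On the DateFormat point, the simplest fix is probably one formatter per instance, roughly as sketched here. The pattern string and the UTC time zone are assumptions of mine (UTC only makes the example reproducible); use whatever the patch actually needs. If instances may themselves be shared across threads, a ThreadLocal holding the formatter is another option.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class PerInstanceDateFormat {
  // An instance field, so no formatter is shared through a static.
  private final DateFormat dateFmt;

  PerInstanceDateFormat() {
    // Pattern and time zone are assumptions for this sketch.
    dateFmt = new SimpleDateFormat("yyyy-MM-dd");
    dateFmt.setTimeZone(TimeZone.getTimeZone("UTC"));
  }

  String format(Date date) {
    return dateFmt.format(date);
  }
}
```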
Can your javadoc comment for optimize() suggest when it is appropriate to call this, vs. when you would be better off not doing so? What's the heuristic a programmer should keep in mind?
This method also contains a lot of commented-out code. Please remove it entirely.
conn.wait(1000); should pull out 1000 into a static final constant, or even better, make it configurable.
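Roughly what I mean for the timeout is sketched below. I have used java.util.Properties so the sketch is standalone; in the patch you would read the value from the job's Configuration (conf.getInt(name, defaultValue)) instead. The property name is a hypothetical placeholder.

```java
import java.util.Properties;

final class WaitTimeoutSketch {
  // Hypothetical property name; pick one that fits your conventions.
  static final String WAIT_MILLIS_KEY = "vertica.wait.millis";

  // The magic number lives in one named place.
  static final long DEFAULT_WAIT_MILLIS = 1000L;

  // Returns the configured wait, or the default if none is set.
  static long getWaitMillis(Properties props) {
    String raw = props.getProperty(WAIT_MILLIS_KEY);
    if (raw == null) {
      return DEFAULT_WAIT_MILLIS;
    }
    return Long.parseLong(raw.trim());
  }
}
```

The call site then becomes conn.wait(getWaitMillis(...)) rather than a bare literal.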
VerticaRecordWriter.getValue() has hairy braces in an if..else statement. (You do this in write() as well.)
Also, what happens in the case where writer_table.split() returns a 0-length array? In this same method, can you please add a comment explaining why you're pulling rs.getString(4) and rs.getInt(5)? These seem arbitrary as-written.
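To make the zero-length concern concrete: String.split() drops trailing empty strings, so an input such as "." yields an empty array. The sketch below assumes the table name is split on "." into schema and table parts, which is my guess at what the patch does; the method name is hypothetical.

```java
final class TableNameSketch {
  // Assumption: names look like "schema.table" or just "table".
  static String tableOnly(String writerTable) {
    String[] parts = writerTable.split("\\.");
    if (parts.length == 0) {
      // Reached for input such as ".", where every piece is empty.
      throw new IllegalArgumentException(
          "Invalid table name: '" + writerTable + "'");
    }
    return parts[parts.length - 1];
  }
}
```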
VerticaInputSplit.executeQuery() has javadoc typos.
VerticaUtil uses tabs instead of spaces, and includes empty lines with leading whitespace. Various block statements and curly braces also require reformatting here, as well as variable_names.
VerticaRecord constructor has meaningless javadoc attributes.
Also, please obey 80-column limit in this class (as well as elsewhere).
In objectTypes(), why not use else if statements instead of just a series of if statements? You could then drop all the continue statements, which make for awkward flow. Also, include a case at the end for an unknown type where you throw an exception, rather than letting the types ArrayList fall out of alignment with the values ArrayList.
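The shape I am suggesting for objectTypes() is sketched here. The particular Java-to-SQL type mappings are illustrative guesses, not taken from your patch; the point is the else-if chain and the final else that refuses unknown types, so types and values always have the same length.

```java
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

final class ObjectTypesSketch {
  // Mappings below are assumptions chosen for illustration.
  static List<Integer> objectTypes(List<Object> values) {
    List<Integer> types = new ArrayList<Integer>();
    for (Object value : values) {
      if (value == null) {
        types.add(Types.NULL);
      } else if (value instanceof String) {
        types.add(Types.VARCHAR);
      } else if (value instanceof Long) {
        types.add(Types.BIGINT);
      } else if (value instanceof Boolean) {
        types.add(Types.BOOLEAN);
      } else {
        // Fail loudly instead of silently skipping the value.
        throw new IllegalArgumentException(
            "Unsupported type: " + value.getClass().getName());
      }
    }
    return types;
  }
}
```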
toSQLString(): Please do not start variable names with underscore. I suggest myDelimiter to differentiate it from delimiter.
Also, are the fall-throughs in the switch statement intentional? If so, please mark each one with a comment.
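For the fall-through comment, something like the following is what I would expect to see. The switch here is an invented example, not the one in toSQLString().

```java
import java.sql.Types;

final class FallThroughSketch {
  // Hypothetical method, used only to show the comment convention.
  static String describe(int sqlType) {
    StringBuilder sb = new StringBuilder();
    switch (sqlType) {
      case Types.TIMESTAMP:
        sb.append("date+");
        // fall through: a TIMESTAMP also has a time component
      case Types.TIME:
        sb.append("time");
        break;
      default:
        sb.append("other");
        break;
    }
    return sb.toString();
  }
}
```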