|
[
Permalink
| « Hide
]
Owen O'Malley added a comment - 21/Jul/09 10:30 PM
I'd suggest putting this in a contrib module rather than the main source tree.
src/contrib/vertica and package org.apache.hadoop.vertica?
Patch introduces Vertica optimized input and output formatters. Includes unit tests (requires vertica jdbc drivers and database) and example as unit test.
We can provide software and license for anyone wanting to run the unit test. All bindings are reflected so the driver is not required to compile.
Omer Trajman made changes - 27/Jul/09 04:26 AM
Omer Trajman made changes - 27/Jul/09 04:26 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12414581/MAPREDUCE-775.patch against trunk revision 800232. -1 @author. The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 204 release audit warnings (more than the trunk's current 203 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/435/testReport/ This message is automatically generated. Most likely one of the newer files in the patch is missing the Apache License header...
Arun C Murthy made changes - 03/Aug/09 04:37 PM
SQL test script missing the header, eclipse codegen put @author in and I fat fingered the AllTest. Will fix/retest/repatch ASAP.
Patch to address reported issues including missing license in sql init script and conditional unit tests that won't fail if there's no jdbc driver.
Omer Trajman made changes - 04/Aug/09 08:52 PM
Omer Trajman made changes - 04/Aug/09 08:52 PM
Omer Trajman made changes - 04/Aug/09 08:54 PM
Fixing issues with new patch. I seem to have replaced the original instead of adding a .N.patch - sorry for the confusion.
Omer Trajman made changes - 24/Aug/09 09:27 PM
Omer Trajman made changes - 24/Aug/09 09:27 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12415514/MAPREDUCE-775.patch against trunk revision 807165. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 16 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/511/testReport/ This message is automatically generated. Looks like test failures are existing and not related to the patch. Can anyone more familiar with core validate?
Omer,
This looks like a great start. I've read through the patch and believe I understand most of how it works. I don't have any major architectural concerns, but there are a number of style issues that I think should be addressed before this is committed. All of these are outlined below. Comments are listed in the sequential order presented by your patch file. (As for your question about test failures, build #511 has already been deleted by Hudson, so I can't check that.) ivy.xml: Hadoop test classes typically go in the same package as that which they test (e.g., o.a.h.vertica), not a separate package like o.a.h.vertica.tests. This would save you a lot of imports in tests, and allows package-public things to be used for testing. (This applies to all your test classes) AllTests.java
TestExample.Reduce.setup(): I suggest that AllTests.setup(); go in a static initializer block in TestExample rather than getting called in every Reduce.setup() call. Given that you actually require AllTests.setup() in virtually all your tests, I would suggest creating a VerticaTestCase class that subclasses TestCase, have this class call AllTests.setup() in a static initializer block, and then have all your Test* classes subclass VerticaTestCase instead of TestCase. This way you won't worry about missing a call somewhere. Also in this same method, why catch Exception e and print its stack trace? If Reduce.setup() fails for an exception, why shouldn't the whole test fail? TestExample.Reduce.reduce(): Style nit: One-line if statements should still use curly-braces around the "then" clause. See http://java.sun.com/docs/codeconv/html/CodeConventions.doc6.html#449 I don't think TestExample, etc, should have a run() method. TestVertica.testVerticaRecord(): why are values of DATE, TIME, etc, commented out? Dead code should be removed, not commented out. Also, why catch the IOException and return? Why doesn't this method just throw IOException (and implicitly fail the test)? In recordTest(), don't use "assert values.equals(new_values)", use JUnit: assertEquals("failure message", values, new_values); Same with testVerticaSplit(), validateInput(), etc... VerticaStreamingRecordWriter.java: Please use Java "lowerCamelCase" style for field and variable names, not "under_scores" (see writer_table, copy_stmt, etc. These should be writerTable and copyStmt respectively.) In the constructor, the RuntimeException description "Vertica Formatter requies a the Vertica jdbc driver" contains a bunch of typos. close() method: if statement should use curly-brace style described above. write() method: Materializing record.toString() in LOG.debug() for every call to write is expensive. Consider wrapping this statement in a call to LOG.isDebugEnabled(). VerticaConfiguration: comment above definition of DELIMITER has typos. (input_query.charAt(input_query.length() - 1) == ';' ... perhaps inputQuery.endsWith(';'); ? getInputParameters() has meaningless javadoc attributes. See also getInputDelmiter() (which is a typo'd method name), setInputDelimiter(), etc... that all have empty @return attributes. VerticaInputFormat: DateFormat is not thread-safe. datefmt should not be a static member. This class also has a lot of under_score field and parameter names. Can your javadoc comment for optimize() suggest when it is appropriate to call this, vs. when you would be better off not doing so? What's the heuristic a programmer should keep in mind? This method also contains a lot of commented-out code. Please remove it entirely. conn.wait(1000); should pull out 1000 into a static final constant, or even better, make it configurable. VerticaRecordWriter.getValue() has hairy braces in an if..else statement. (You do this in write() as well.) Also, what happens in the case where writer_table.split() returns a 0-length array? In this same method, can you please add a comment explaining why you're pulling rs.getString(4) and rs.getInt(5)? These seem arbitrary as-written. VerticaInputSplit.executeQuery() has javadoc typos. VerticaUtil uses tabs instead of spaces, and includes empty lines with leading whitespace. Various block statements and curly braces also require reformatting here, as well as variable_names. VerticaRecord constructor has meaningless javadoc attributes. in objectTypes(), why not use else if statements instead of just a series of if statements? You could then drop all the continue statements which make for awkward flow. Also, include a case at the end for unknown type where you throw an exception, rather than misalign the types ArrayList from the values ArrayList. toSQLString(): Please do not start variable names with underscore. I suggest myDelimiter to differentiate it from delimiter. Also, are fall-thrus in the case block intentional? If so, please mark this with a comment.
Tom White made changes - 09/Sep/09 08:00 AM
Aaron - thank you for the excellent and thorough comments. All accepted and fixed.
Omer Trajman made changes - 12/Sep/09 04:28 AM
Updated patch addressing Aaron's feedback.
Omer Trajman made changes - 12/Sep/09 04:28 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419368/MAPREDUCE-775.2.patch against trunk revision 813944. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 19 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/68/console This message is automatically generated. whoops...wrong base dir. replacing with same patch from up level up
Omer Trajman made changes - 12/Sep/09 12:13 PM
Omer Trajman made changes - 12/Sep/09 12:13 PM
Still v2 patch but created from root dir
Omer Trajman made changes - 12/Sep/09 12:14 PM
Omer Trajman made changes - 12/Sep/09 12:16 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419380/MAPREDUCE-775.2.patch against trunk revision 814122. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 19 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/69/testReport/ This message is automatically generated. need to fix issue in ivy.xml
Omer Trajman made changes - 12/Sep/09 08:34 PM
Omer Trajman made changes - 12/Sep/09 08:35 PM
Omer Trajman made changes - 12/Sep/09 08:35 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419401/MAPREDUCE-775.3.patch against trunk revision 814122. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 19 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 221 release audit warnings (more than the trunk's current 220 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/70/testReport/ This message is automatically generated. forgot apache license header in new fine
Omer Trajman made changes - 13/Sep/09 02:41 PM
Omer Trajman made changes - 13/Sep/09 02:41 PM
Omer Trajman made changes - 13/Sep/09 02:45 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419420/MAPREDUCE-775.4.patch against trunk revision 814122. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 19 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/71/testReport/ This message is automatically generated. Thanks for making those changes. Looks good; +1.
I will commit it later tonight if nobody gets to this one before that time.
I just committed this. Thanks, Omer!
Devaraj Das made changes - 18/Sep/09 06:24 PM
Integrated in Hadoop-Mapreduce-trunk-Commit #57 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/57/
. Add native and streaming support for Vertica as an input or output format taking advantage of parallel read and write properties of the DBMS. Contributed by Omer Trajman.
Chris Douglas made changes - 25/Jan/10 04:52 AM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||