Fredrik Hedberg added a comment - 07/Jan/08 11:28 AM
Initial code. Attached as an archive, since I didn't want to create a patch before we know where in the source tree it should go.
Example. Identity MapReduce from one table to another.
Oh... sorry, I was just about to watch it (I missed the 'Assign' button).
I'm sorry this bug seems to have been forgotten. I'd suggest putting the code into org.apache.hadoop.mapred.lib.jdbc.*
I'd suggest getting rid of the JDBCMapper and JDBCReducer and moving initJob into static methods of JDBCInputFormat and JDBCOutputFormat. That is, have
public static void setInput(JobConf job, String table, JDBCField keyField, JDBCField[] fields) { ... }
and a corresponding setOutput method in JDBCOutputFormat. The preferred style is to have getters and setters for the configuration rather than public string constants for the property names. You should also use your own property for the table name rather than the input/output path, because reusing the path might be confusing.
New version of the JDBC layer for Hadoop. Took care of the issues pointed out by Owen and made some other changes that substantially improved performance.
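A minimal sketch of the suggested configuration style, assuming plain java.util.Properties as a stand-in for Hadoop's JobConf so it runs without Hadoop on the classpath. The class names (JdbcInputFormatSketch, JdbcOutputFormatSketch) and the property keys are hypothetical, and String column names replace the patch's JDBCField type. Each format exposes one static setter plus getters, and the table name gets its own property instead of reusing the input/output path.

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical sketch: Properties stands in for JobConf; keys are illustrative only.
class JdbcInputFormatSketch {
  // Private keys: callers go through the setter/getters, not public constants.
  private static final String TABLE = "example.jdbc.input.table";
  private static final String KEY_FIELD = "example.jdbc.input.key.field";
  private static final String FIELDS = "example.jdbc.input.fields";

  // Replaces a separate initJob helper with one static call on the input format.
  static void setInput(Properties job, String table, String keyField, String[] fields) {
    job.setProperty(TABLE, table); // dedicated property, not the input path
    job.setProperty(KEY_FIELD, keyField);
    // Assumes column names contain no commas.
    job.setProperty(FIELDS, String.join(",", fields));
  }

  static String getInputTable(Properties job) {
    return job.getProperty(TABLE);
  }

  static String getInputKeyField(Properties job) {
    return job.getProperty(KEY_FIELD);
  }

  static String[] getInputFields(Properties job) {
    String joined = job.getProperty(FIELDS, "");
    return joined.isEmpty() ? new String[0] : joined.split(",");
  }

  public static void main(String[] args) {
    Properties job = new Properties();
    setInput(job, "access", "url", new String[] {"url", "referrer", "time"});
    JdbcOutputFormatSketch.setOutput(job, "pageview", new String[] {"url", "pageview"});
    System.out.println(getInputTable(job) + " " + Arrays.toString(getInputFields(job)));
    System.out.println(JdbcOutputFormatSketch.getOutputTable(job));
  }
}

// The corresponding output side, following the same pattern.
class JdbcOutputFormatSketch {
  private static final String TABLE = "example.jdbc.output.table";
  private static final String FIELDS = "example.jdbc.output.fields";

  static void setOutput(Properties job, String table, String[] fields) {
    job.setProperty(TABLE, table);
    job.setProperty(FIELDS, String.join(",", fields));
  }

  static String getOutputTable(Properties job) {
    return job.getProperty(TABLE);
  }

  static String[] getOutputFields(Properties job) {
    String joined = job.getProperty(FIELDS, "");
    return joined.isEmpty() ? new String[0] : joined.split(",");
  }
}
```

Keeping the key strings private means they can be renamed later without breaking callers that only use the setters and getters.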
Updated example. Identity MapReduce from one table to another.
When patches are ready, you need to submit them to make them "Patch Available".
OK, just wanted to get your input before doing so.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383367/Driver.java against trunk revision 663079.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2572/console
This message is automatically generated.
Hudson tried to apply the example. Removed the example and resubmitted.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383366/database.diff against trunk revision 663337.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 3 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2575/testReport/
This message is automatically generated.
Fixed two out of three FindBugs issues. The last one is rather hard to avoid.
Also, Hudson complains about the lack of unit-tests. Bar the inclusion of an embedded SQL database, I can't really think of anything non-trivial in this case. Comments?
> Bar the inclusion of an embedded SQL database, [ ... ]
We could add Derby to src/test/lib for this. This would add about 3MB of jar files to Hadoop...
> Also, Hudson complains about the lack of unit-tests. Bar the inclusion of an embedded SQL database, I can't really think of anything non-trivial in this case.
We could implement a MiniDBMS with very limited capabilities (e.g. using an array or a Java collection to store data in memory), implement a java.sql.Driver, and register it with java.sql.DriverManager. Then use it for testing. More embedded SQL options are listed at:
http://java-source.net/open-source/database-engines
TinySQL looks attractive. Its jar is less than 100kB.
Thanks for the input. I think I'll use HSQLDB instead of TinySQL: despite its larger footprint (600kB), it seems a lot more mature and is apparently widely used in its embedded form.
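The MiniDBMS idea suggested earlier (implement a java.sql.Driver and register it with java.sql.DriverManager) can be sketched as a bare driver skeleton. This is a sketch under stated assumptions, not a working database: the class name MiniDbDriver and the jdbc:minidb: URL prefix are hypothetical, and the Connection it returns is a java.lang.reflect.Proxy that supports only close() and isClosed(). A real in-memory MiniDBMS would also need createStatement() and a ResultSet backed by its Java collections.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical test-only driver skeleton; the URL prefix is made up.
class MiniDbDriver implements Driver {
  static final String URL_PREFIX = "jdbc:minidb:";
  private static boolean registered = false;

  // Register once; real drivers usually do this in a static initializer.
  static synchronized void register() throws SQLException {
    if (!registered) {
      DriverManager.registerDriver(new MiniDbDriver());
      registered = true;
    }
  }

  @Override
  public boolean acceptsURL(String url) {
    return url != null && url.startsWith(URL_PREFIX);
  }

  @Override
  public Connection connect(String url, Properties info) {
    // JDBC convention: return null for URLs this driver does not handle.
    return acceptsURL(url) ? newConnection() : null;
  }

  // A dynamic proxy avoids hand-writing every Connection method.
  private static Connection newConnection() {
    final boolean[] closed = {false};
    InvocationHandler handler = (proxy, method, args) -> {
      switch (method.getName()) {
        case "close": closed[0] = true; return null;
        case "isClosed": return closed[0];
        case "toString": return "MiniDbConnection";
        case "hashCode": return System.identityHashCode(proxy);
        case "equals": return proxy == args[0];
        // Statements, transactions, etc. are not implemented in this skeleton.
        default: throw new UnsupportedOperationException(method.getName());
      }
    };
    return (Connection) Proxy.newProxyInstance(
        MiniDbDriver.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
  }

  @Override
  public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
    return new DriverPropertyInfo[0];
  }

  @Override public int getMajorVersion() { return 0; }
  @Override public int getMinorVersion() { return 1; }
  // Not a full JDBC implementation, so do not claim compliance.
  @Override public boolean jdbcCompliant() { return false; }

  @Override
  public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    throw new SQLFeatureNotSupportedException("no logging support");
  }

  public static void main(String[] args) throws SQLException {
    register();
    try (Connection conn = DriverManager.getConnection("jdbc:minidb:test")) {
      System.out.println(conn + " closed=" + conn.isClosed());
    }
  }
}
```

Returning null from connect() for foreign URLs matters because DriverManager tries each registered driver in turn and relies on that convention to move on to the next one.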
When we move to Java 6 (
> I think I'll use HSQLDB instead of TinySQL [...]
Good choice, since HSQLDB's license is BSD, not LGPL; the LGPL would rule TinySQL out.
> When we move to Java 6 (
That would be nice. Perhaps we should make this issue dependent on HADOOP-2235?
Thanks for the useful patch!
I think we should iron out a few issues before this issue gets in:
# It has been discussed in several blogs that LIMIT and OFFSET should not be used without an ORDER BY clause, since the query execution plan might choose different row orderings across executions (http://azimbabu.blogspot.com/2008/03/sqllimit-offset-without-order-by.html)
Enis,
Thanks for the comments. Those are all very valid points; I couldn't agree more. Unfortunately, I don't have the time to take this further at the moment, so if anyone else would like to continue working on this, I'd be happy to answer any questions. Not surprisingly, I think this functionality could be quite useful, so bringing this up to commit quality would be sweet.
Since Fredrik said that he cannot continue to work on the patch, I have updated it with some changes.
The changes include:
Derby does not support LIMIT ... OFFSET clauses, so the patch uses HSQLDB, which has a BSD-like license.
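Since the patch relies on LIMIT ... OFFSET, each split's query also needs a stable row order, which is the ORDER BY concern raised earlier in the thread. The sketch below is hypothetical (SplitQuerySketch, makeSplits, selectQuery, and the table and column names are illustrative, not the committed code): it divides a row count into contiguous ranges, gives the remainder to the last split, and emits one query per range with ORDER BY placed before LIMIT/OFFSET. It assumes the ORDER BY column uniquely orders rows; without that, separate queries could skip or repeat rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-split LIMIT/OFFSET queries with a stable ordering.
class SplitQuerySketch {

  // One split covers the half-open row range [start, end).
  static final class Split {
    final long start;
    final long end;
    Split(long start, long end) { this.start = start; this.end = end; }
    long length() { return end - start; }
  }

  // Divide rowCount rows into numSplits (>= 1) ranges; the last takes the remainder.
  static List<Split> makeSplits(long rowCount, int numSplits) {
    List<Split> splits = new ArrayList<>();
    long chunk = rowCount / numSplits;
    for (int i = 0; i < numSplits; i++) {
      long start = i * chunk;
      long end = (i == numSplits - 1) ? rowCount : start + chunk;
      splits.add(new Split(start, end));
    }
    return splits;
  }

  // ORDER BY on a unique key makes each LIMIT/OFFSET window deterministic,
  // so independently executed queries neither skip nor duplicate rows.
  static String selectQuery(String table, String[] fields, String orderBy, Split split) {
    return "SELECT " + String.join(", ", fields)
        + " FROM " + table
        + " ORDER BY " + orderBy
        + " LIMIT " + split.length()
        + " OFFSET " + split.start;
  }

  public static void main(String[] args) {
    String[] fields = {"id", "url"};
    for (Split s : makeSplits(10, 3)) {
      System.out.println(selectQuery("access_log", fields, "id", s));
    }
  }
}
```

For 10 rows and 3 splits this yields windows of 3, 3, and 4 rows (OFFSET 0, 3, and 6).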
I have included the jar and license for HSQLDB. The patch will fail without these.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389832/hsqldb.tar.gz against trunk revision 694459.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3248/console
This message is automatically generated.
Nice work, Enis. I can't test it right now, but it looks good.
I manually tested the patch (since Hudson will fail to build it due to the hsqldb dependency). The tests and the release audit pass with:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
I just committed this. Thanks, Fredrik and Enis!
Integrated in Hadoop-trunk #611 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/)
We need to declare hsqldb.jar in the Eclipse plugin. See
Fredrik or Enis, do you have any usage examples by any chance?
Indeed, there is an example checked in with the patch. You can find it at src/examples/org/apache/hadoop/examples/DBCountPageView.java. You can run the example against a local HSQLDB instance, or configure it to use an external DB.
The