Aaron Kimball added a comment - 12/May/09 05:55 PM
Attaching a patch that contains Sqoop; it adds the project to src/contrib/sqoop/.
Aaron Kimball made changes - 12/May/09 05:55 PM
Aaron Kimball made changes - 12/May/09 05:56 PM
There is a tool called DataImportHandler, used successfully in Solr, which imports data from RDBMSs, HTTP URLs, etc. If necessary, we can reuse large parts of it.
http://wiki.apache.org/solr/DataImportHandler
There is a plan to make it available as a library that can be used to import into any kind of document database (Solr, CouchDB, Hadoop, etc.).
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407903/HADOOP-5815.patch against trunk revision 774138.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 28 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
-1 release audit. The applied patch generated 489 release audit warnings (more than the trunk's current 486 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/testReport/
This message is automatically generated.
Hi Noble,
I've read through your document and the related JIRA item in Solr. I'm a bit confused as to how it applies here; maybe you could explain further.
As I understand it, the DataImportHandler is designed to ingest data from various sources in a manner that is user-configured on a per-table basis, and to incorporate that data into indices that are then readable from the rest of the Solr system. (Disclaimer: I have very little understanding of Solr's goals and features. As I understand it, it's a search engine front-end.)
Sqoop's goal (already met by this implementation) is to do ad-hoc loading of database tables into HDFS by performing a straightforward translation of rows to text while physically moving the bits from the database into flat files in HDFS. HDFS does not naturally include any indexing or other higher-level structures over a data set.
Can you please explain further where you see integration points between these two tools? Thanks!
New patch to fix release audit warnings.
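To make the "straightforward translation of rows to text" in Aaron's description of Sqoop concrete, here is a minimal Java sketch of what one such translation could look like: each column of a JDBC row is read with `ResultSet.getObject`, and the values are joined into one delimited line of text, ready to be appended to a flat file. The class `RowToText`, its methods, and the choice to write SQL NULL as the literal string "null" are hypothetical illustrations, not Sqoop's actual code or output format.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a row-to-text translation: one database row
 * becomes one delimited line of text. Names are illustrative only and
 * do not reflect Sqoop's real API.
 */
public class RowToText {

    /** Reads every column of the current row of a JDBC ResultSet. */
    public static List<Object> readRow(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        List<Object> values = new ArrayList<>(columns);
        for (int i = 1; i <= columns; i++) { // JDBC column indices start at 1
            values.add(rs.getObject(i));
        }
        return values;
    }

    /**
     * Joins column values with the given delimiter. SQL NULLs are written
     * as the literal string "null" (an assumption made for this sketch).
     */
    public static String toLine(List<Object> values, char delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            Object v = values.get(i);
            sb.append(v == null ? "null" : v.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row values standing in for one row of a table.
        List<Object> row = new ArrayList<>();
        row.add(42);
        row.add("Alice");
        row.add(null);
        System.out.println(toLine(row, ',')); // prints 42,Alice,null
    }
}
```

In this sketch, a record in HDFS is nothing more than a line of text; any structure (column order, delimiter, NULL encoding) lives in the convention shared by the writer and whoever reads the file later.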
Aaron Kimball made changes - 13/May/09 08:05 PM
Aaron Kimball made changes - 13/May/09 08:05 PM
Aaron Kimball made changes - 13/May/09 08:05 PM
DIH (DataImportHandler) is a small tool to extract data out of various structured data sources (RDBMS, XML, etc.) into flat documents. A document is nothing but a Map<String,Object>. The key is the field name and the value can be a single object or a list of objects.
DIH is about collecting data from various sources using a config script (for example, you can mix and match data from an XML file and a DB) to produce a record. What does a record look like in Hadoop?
-1 overall. Here are the results of testing the latest attachment
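The flat document described here, a Map<String,Object> whose values are either a single object or a list of objects, can be sketched in a few lines of Java. The class `FlatDocument` and its `addField` helper are hypothetical and are not DIH's actual API; the sketch simply assumes that adding a second value for an existing field turns that field into a list, and that callers pass single (non-list) values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a flat document: field name -> single value,
 * or field name -> list of values for multi-valued fields.
 * Not DIH's real API.
 */
public class FlatDocument {
    // LinkedHashMap keeps fields in insertion order for readable output.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Adds a single (non-list) value; a repeated field becomes a list. */
    @SuppressWarnings("unchecked")
    public void addField(String name, Object value) {
        if (!fields.containsKey(name)) {
            fields.put(name, value); // first value is stored as-is
            return;
        }
        Object existing = fields.get(name);
        if (existing instanceof List) {
            ((List<Object>) existing).add(value); // already multi-valued
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            fields.put(name, values); // promote the single value to a list
        }
    }

    public Map<String, Object> asMap() {
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical values, e.g. mixed from a DB row and an XML file.
        FlatDocument doc = new FlatDocument();
        doc.addField("id", "book-1");
        doc.addField("author", "A. Writer");
        doc.addField("author", "B. Writer");
        System.out.println(doc.asMap()); // prints {id=book-1, author=[A. Writer, B. Writer]}
    }
}
```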
http://issues.apache.org/jira/secure/attachment/12408037/HADOOP-5815.2.patch against trunk revision 774625.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 28 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/336/testReport/
This message is automatically generated.
Contrib test failures are unrelated (streaming).
Aaron Kimball made changes - 15/May/09 12:45 AM
Aaron Kimball made changes - 21/May/09 09:24 PM
Tom White made changes - 26/May/09 10:31 AM
Editorial pass over all release notes prior to publication of 0.21.
Robert Chansler made changes - 29/Sep/09 10:04 PM