Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: All environments

      Description

      I've written some code for HBase, a BigTable-like file store. It's not perfect, but it's ready for other people to play with and examine.

      The attached tarball has the source and a README.

      Attachments

      1. hbase.tar.gz
        51 kB
        Mike Cafarella
      2. hbase.patch
        249 kB
        Jim Kellerman
      3. hbase.patch
        254 kB
        Jim Kellerman
      4. hbase.patch
        261 kB
        Jim Kellerman
      5. hbase.patch
        262 kB
        Jim Kellerman
      6. hbase.patch
        253 kB
        Jim Kellerman
      7. hbase.patch
        253 kB
        Jim Kellerman
      8. hbase.patch
        259 kB
        Jim Kellerman
      9. hbase.patch
        265 kB
        Jim Kellerman
      10. hbase.patch
        274 kB
        Jim Kellerman
      11. hbase.patch
        298 kB
        Jim Kellerman
      12. hbase.patch
        298 kB
        Jim Kellerman

        Issue Links

          Activity

          Edward J. Yoon added a comment -

          Also, please check my "BigTable Simulation Code Source".
          You can download it here: http://www.hadoop.co.kr/moin.cgi/Maplib?action=AttachFile&do=get&target=maplib-0.0.1-alpha.tgz
          It is intended to be used with MapReduce.

          I expect it to record multiple <K,V> maps in a single pass while parsing a tab-delimited text file.

          Table t = new Table();
          t.Create("webtable", new String[] {"anchor", "language"});

          // Write a new anchor, language
          RowMutation r1 = new RowMutation(t.OpenOrDie("webtable"), "udanax.org");

          r1.Set("anchor:hadoop.co.kr", "hadoop korean user group");
          r1.Set("anchor:joinc.co.kr", "joinc");
          r1.Set("language:kr", "euc-kr");
          r1.Set("anchor:hadoop.co.kr", "hadoop");
          r1.Set("anchor:naver.com", "naver");

          Operation op = new Operation();
          op.Apply(r1);

          // Reading from the table
          Scanner stream = new Scanner("webtable");
          stream.FetchColumnFamily("anchor");
          stream.Lookup("udanax.org");

          while (stream.next()) {
            System.out.print("(RowName):");
            System.out.print(stream.RowName());
            System.out.print(" (ColumnName):");
            System.out.print(stream.ColumnName());
            System.out.print(" (Value):");
            System.out.println(stream.Value());
          }

          stream.close();

          Edward J. Yoon added a comment -

          I suggest you check out the Yahoo Research website (http://research.yahoo.com/project/pig).
          It is very impressive.

          I've recently started a similar project to produce reports of aggregate statistics,
          but it is still in a rough, early state.

          Hadoop QA added a comment -

          Integrated in Hadoop-Nightly #47 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/47/)

          Doug Cutting added a comment -

          I just committed this. Thanks, Jim!

          Hadoop QA added a comment -

          +1, because http://issues.apache.org/jira/secure/attachment/12354799/hbase.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524929. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Jim Kellerman added a comment -

          Fixed javadoc problem

          Jim Kellerman added a comment -

          Fix javadoc problem

          Jim Kellerman added a comment -

          Problem with javadoc, ... fixing

          Hadoop QA added a comment -

          -1, because the javadoc tool appears to have generated warning messages when testing the latest attachment http://issues.apache.org/jira/secure/attachment/12354796/hbase.patch against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524929. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Jim Kellerman added a comment -

          This is HBase revision 0.0.0.

          It is code complete, but the distributed functionality needs debugging and unit tests. However, it is a very good starting point for collaboration.

          Jim Kellerman added a comment -

          Changes in latest patch:

          • Scanners can now be started at a specific row and do not have to
            scan a whole table.
          • The client-server code is now complete but needs to be debugged and
            tests need to be written for it.
          • There is a JUnit test for the base classes that covers most of the
            non-distributed functionality: writing, reading, flushing,
            log-rolling, and scanning. If the environment variable
            DEBUGGING=TRUE is set when running the test, it runs a more
            extensive test that includes writing and reading 10^6 rows,
            compaction, splitting, and merging. The extensive test is not
            enabled by default as it takes over 10 minutes to run.
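
          A minimal sketch of how a test can key off the DEBUGGING environment variable described above; the class and method names below are illustrative, not the actual TestHRegion code:

          public class TestHRegionSketch extends junit.framework.TestCase {
            public void testHRegion() throws Exception {
              runBasicTests();   // writing, reading, flushing, log-rolling, scanning
              if ("TRUE".equalsIgnoreCase(System.getenv("DEBUGGING"))) {
                runExtensiveTests();   // 10^6 rows, compaction, splitting, merging
              }
            }
            private void runBasicTests() { /* ... */ }
            private void runExtensiveTests() { /* ... */ }
          }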

          Changes to MapFile.Reader:

          • Added public method midKey() which returns a key from approximately
            the middle of the MapFile.
          • Added private method int seekInternal(WritableComparable) whose body
            is most of what was in the public seek method. The difference is
            that seekInternal returns an integer value of the comparison.
          • Modified the public seek method to call seekInternal and do the
            boolean comparison for an exact match.
          • Added public method getClosest which uses seekInternal to find the
            record whose key is the closest match to the supplied key (unlike
            get, which requires an exact key match).

          See the Wiki (http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture#status)
          for more information about the current project status and todo list.
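
          As an illustration of the new midKey() method, a split-point lookup might look like the sketch below; the reader setup is an assumption for the example, not code from the patch:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.io.MapFile;
          import org.apache.hadoop.io.WritableComparable;

          public class MidKeyDemo {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);
              // args[0] names an existing MapFile directory; purely illustrative.
              MapFile.Reader reader = new MapFile.Reader(fs, args[0], conf);
              try {
                // midKey() returns a key from approximately the middle of the
                // MapFile, a natural candidate for a region split point.
                WritableComparable middle = reader.midKey();
                System.out.println("candidate split key: " + middle);
              } finally {
                reader.close();
              }
            }
          }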

          Jim Kellerman added a comment -

          Scanners can now start from a specified row instead of just the beginning of a region (or the memcache). Added test for same.

          Some additional progress on the client-server code, but still more to do.

          Jim Kellerman added a comment -

          Starting on client and servers

          Jim Kellerman added a comment -

          Added logging; added midKey computation.

          Jim Kellerman added a comment -

          Move HTableDescriptor.extractFamily to HStoreKey

          Refactor HMemcacheScanner and HStoreScanner so that they have an abstract base class (HAbstractScanner)

          Add support for column family regexes and for scanners that iterate over an entire column family.

          Update unit tests for the changes above. Also add a test for fetching an entire column family.

          Unit tests for the base functions are pretty much complete.
          For the base functions, we still need to look at performance and memory consumption.

          Jim Kellerman added a comment -

          HRegion.java: remove test code that is now included in unit test.
          HStore.java: add comment where part of the work to enhance scanners needs to happen.
          HStoreKey.java: add toString() method

          TestHRegion.java: added test for scanners, added test to verify that all that was written in the other tests is still there after splitting and merging.

          Jim Kellerman added a comment -

          TODO: the current HBase scanner interface allows you to scan through multiple rows for an explicit set of column family members (e.g., contents:firstcolumn and anchor:secondcolumn), but it doesn't let you iterate over all the members of a column family for a particular row unless you explicitly enumerate them. This is a problem because you may not know a priori the names of all the family members.

          The Bigtable paper states: "For example, we could restrict the scan above to only produce anchors whose columns match the regular expression anchor:*.cnn.com" (ignore for the moment that if "anchor:*.cnn.com" were applied literally as a regular expression, ':*' means zero or more ':' characters, and the '.' between the '*' and 'cnn' and between 'cnn' and 'com' matches any character). You should be able to say 'anchor:', meaning every member of the anchor family, or 'anchor:anchornum-[0-9]+', which would match every anchor family member whose name starts with 'anchornum-' followed by one or more digits.
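
          For illustration only (this is not the patch's matching code), standard java.util.regex patterns express the two cases described above:

          import java.util.regex.Pattern;

          public class ColumnFamilyRegexDemo {
            public static void main(String[] args) {
              // 'anchor:.*' matches every member of the anchor family.
              Pattern wholeFamily = Pattern.compile("anchor:.*");
              // Matches anchor members named 'anchornum-' followed by one or more digits.
              Pattern numbered = Pattern.compile("anchor:anchornum-[0-9]+");

              System.out.println(wholeFamily.matcher("anchor:cnnsi.com").matches());   // true
              System.out.println(numbered.matcher("anchor:anchornum-42").matches());   // true
              System.out.println(numbered.matcher("anchor:my.look.ca").matches());     // false
            }
          }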

          This was uncovered in unit testing using the tests that were commented out in HRegion.java

          Jim Kellerman added a comment -

          With respect to block caching, it is HStore that 'talks' to MapFile. It has no notion of blocks, as it just does a seek and a get. It would appear then that what is needed is a MapFile.CachingReader, which in turn would require a SequenceFile.CachingReader, since MapFile.Reader merely tells the underlying SequenceFile to seek and return the record at the appropriate location. If a SequenceFile.CachingReader could keep some (configurable) number of blocks in memory, it would not have to keep fetching them from DFS. This is safe because SequenceFiles are immutable once they are closed after writing.
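
          A minimal sketch of the block-cache idea described above: a small LRU map from block offset to block bytes that a hypothetical SequenceFile.CachingReader could fill once from DFS and reuse. This class is an assumption for illustration, not part of the attached patch:

          import java.util.LinkedHashMap;
          import java.util.Map;

          public class BlockCache {
            private final int maxBlocks;
            private final Map<Long, byte[]> cache;

            public BlockCache(final int maxBlocks) {
              this.maxBlocks = maxBlocks;
              // access-ordered LinkedHashMap gives simple LRU eviction
              this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                  return size() > BlockCache.this.maxBlocks;
                }
              };
            }

            /** Return the cached block at this offset, or null if it must be read from DFS. */
            public synchronized byte[] get(long blockOffset) {
              return cache.get(blockOffset);
            }

            /** Remember a block just read from DFS (safe: closed SequenceFiles are immutable). */
            public synchronized void put(long blockOffset, byte[] block) {
              cache.put(blockOffset, block);
            }
          }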

          Jim Kellerman added a comment -

          TODO: implement some kind of block caching in HRegion. While the DFS isn't hitting the disk to fetch blocks, HRegion is making IPC calls to DFS (via MapFile). This causes random reads to be very slow if you do a lot of them.

          Jim Kellerman added a comment -

          Fixes a nasty bug around compaction. The bug was that records with zero-length keys and values would get written. This created widespread unhappiness.
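
          One way to guard against this class of bug (a hypothetical helper, not necessarily the patch's actual fix) is to check record lengths before anything is appended during compaction:

          public class CompactionGuard {
            /** Return true only if this record is safe to append during compaction. */
            public static boolean isWritable(byte[] key, byte[] value) {
              return key != null && value != null && key.length > 0 && value.length > 0;
            }
          }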

          Jim Kellerman added a comment -

          The latest patch also includes three unit tests, one of which is currently failing and is under investigation.

          Jim Kellerman added a comment -
          • Fix bug that caused a null pointer exception in HRegion.closeAndMerge
          • Add detail to exception thrown from HRegion.checkRow
          • Change map.seek(); map.next() to map.getClosest() in HStore.getFull,
            HStore.get
          • Add constructor and suite() to TestHRegion
          • Removed some constants that are no longer needed
          • Modify MapFile.Reader:
            o Added private int seekInternal(WritableComparable), which is basically
              the body of public boolean seek(WritableComparable) but returns the
              integer value of the comparison. This is now called by boolean seek().
            o Added public Writable getClosest(WritableComparable, Writable). Unlike
              get, which only returns a value on an exact match, getClosest returns
              the value for the key which is closest to the requested key.
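
          A minimal sketch of the caller-side change from map.seek(); map.next() to map.getClosest() (HStore.getFull / HStore.get style); the value type here is an assumption for illustration:

          import java.io.IOException;
          import org.apache.hadoop.io.BytesWritable;
          import org.apache.hadoop.io.MapFile;
          import org.apache.hadoop.io.Writable;
          import org.apache.hadoop.io.WritableComparable;

          public class ClosestLookup {
            /**
             * Old pattern: map.seek(key) succeeded only on an exact match, then
             * map.next(key, val) read the value. New pattern: getClosest fills in
             * the value for the key closest to the one requested, or returns null.
             */
            public static BytesWritable lookup(MapFile.Reader map, WritableComparable key)
                throws IOException {
              BytesWritable value = new BytesWritable();
              Writable found = map.getClosest(key, value);
              return found == null ? null : value;
            }
          }
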
          Jim Kellerman added a comment -

          Latest patch also includes one unit test.

          Jim Kellerman added a comment -

          HRegion: remove an unnecessary cast.
          HRegionServer: add stub method openScanner.
          HStoreFile: change the constructor so that it creates new Text objects for regionName and colFamily, since calling Text.set throws a NullPointerException if the member was never initialized.
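
          A before/after sketch of that constructor change; the field names match the description above, but the class body is otherwise illustrative:

          import org.apache.hadoop.io.Text;

          public class HStoreFileSketch {
            private Text regionName;
            private Text colFamily;

            public HStoreFileSketch(Text regionName, Text colFamily) {
              // Before: this.regionName.set(regionName);  // NullPointerException --
              //         the field was never initialized, so there is nothing to set.
              // After: allocate new Text objects that copy the arguments.
              this.regionName = new Text(regionName);
              this.colFamily = new Text(colFamily);
            }
          }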

          Jim Kellerman added a comment -

          I'm either missing something or my environment is messed up.

          On test cases, the wiki instructs: "By default, do not let tests write any temporary files to /tmp. Instead, the tests should write to the location specified by the test.build.data system property."

          However, test.build.data is null in the contrib build in my environment. Suggestions?
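
          For what it's worth, a minimal sketch of reading test.build.data with a fallback; the default path below is just an assumption for illustration, not a project convention:

          import java.io.File;

          public class TestDataDirSketch {
            public static void main(String[] args) {
              // Falls back to a relative directory if the property is not set
              // (e.g. in a contrib build where it is currently null).
              String dir = System.getProperty("test.build.data", "build/contrib/hbase/test/data");
              File testDir = new File(dir, "hregion_test");
              System.out.println("test data directory: " + testDir.getPath());
            }
          }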

          Doug Cutting added a comment -

          > er, don't you mean src/contrib ?

          Doh! Yes, that's what I meant.

          Jim Kellerman added a comment -

          path is src/contrib/hbase

          Jim Kellerman added a comment -

          > Doug Cutting [01/Mar/07 10:51 AM] wrote:
          >> You may want to make that contrib/hbase/... to follow Lucene's directory structure.
          >
          > Hadoop, unlike Lucene, puts contrib sources in contrib/src, so contrib/src/hbase is appropriate.

          er, don't you mean src/contrib ?

          Doug Cutting added a comment -

          > You may want to make that contrib/hbase/... to follow Lucene's directory structure.

          Hadoop, unlike Lucene, puts contrib sources in contrib/src, so contrib/src/hbase is appropriate.

          > Once I know how the patch is applied, I'll fix the paths in the patch file accordingly.

          Patches are applied with 'patch -p0 < PATCH' from the root of the project (trunk, typically). Also, there is no need to remove obsolete patches; they're a fine part of the history.

          Jim Kellerman added a comment -

          > Otis Gospodnetic [01/Mar/07 01:37 AM] wrote:
          > Jim,
          > You may want to make that contrib/hbase/... to follow Lucene's directory structure. Here is what Lucene's contrib dir is like:
          >
          > [otis@localhost trunk]$ ls -al contrib/
          > total 184
          > drwxrwxr-x 22 otis otis 4096 Nov 28 15:44 .
          > drwxrwxr-x 10 otis otis 4096 Feb 16 15:03 ..
          > drwxrwxr-x 6 otis otis 4096 Feb 21 09:18 analyzers
          > drwxrwxr-x 5 otis otis 4096 Mar 16 2006 ant
          > drwxrwxr-x 6 otis otis 4096 Feb 16 12:34 benchmark
          > -rw-rw-r-- 1 otis otis 1347 May 22 2005 contrib-build.xml
          > drwxrwxr-x 6 otis otis 4096 Jan 9 2006 db
          > drwxrwxr-x 7 otis otis 4096 Dec 17 00:45 gdata-server
          > drwxrwxr-x 6 otis otis 4096 May 1 2005 highlighter
          > drwxrwxr-x 6 otis otis 4096 Feb 26 2005 javascript
          > drwxrwxr-x 6 otis otis 4096 Nov 24 13:34 lucli
          > drwxrwxr-x 4 otis otis 4096 May 22 2005 memory
          > drwxrwxr-x 4 otis otis 4096 Feb 9 2006 miscellaneous
          > drwxrwxr-x 4 otis otis 4096 Mar 16 2006 queries
          > drwxrwxr-x 5 otis otis 4096 Jan 9 2006 regex
          > drwxrwxr-x 4 otis otis 4096 Nov 28 15:37 similarity
          > drwxrwxr-x 7 otis otis 4096 Nov 20 01:32 snowball
          > drwxrwxr-x 4 otis otis 4096 Jun 30 2006 spellchecker
          > drwxrwxr-x 4 otis otis 4096 Jul 16 2005 surround
          > drwxrwxr-x 7 otis otis 4096 Feb 21 09:04 .svn
          > drwxrwxr-x 5 otis otis 4096 May 1 2005 swing
          > drwxrwxr-x 4 otis otis 4096 May 22 2005 wordnet
          > drwxrwxr-x 4 otis otis 4096 Mar 16 2006 xml-query-parser

          Otis,

          HBase is strictly a Hadoop project. Don't you think it should go in Hadoop's contrib directory?

          hadoop/trunk/src/contrib$ ls -al
          total 24
          drwxr-xr-x 10 jim jim 340 Feb 28 11:36 .
          drwxr-xr-x 12 jim jim 408 Feb 28 11:35 ..
          drwxr-xr-x 9 jim jim 306 Feb 28 11:38 .svn
          drwxr-xr-x 6 jim jim 204 Feb 28 11:35 abacus
          -rw-r--r-- 1 jim jim 6616 Dec 12 11:36 build-contrib.xml
          -rw-r--r-- 1 jim jim 1264 Jul 18 2006 build.xml
          drwxr-xr-x 5 jim jim 170 Feb 28 11:35 ec2
          drwxr-xr-x 5 jim jim 170 Feb 28 11:38 hbase
          drwxr-xr-x 5 jim jim 170 Feb 28 11:35 streaming
          drwxr-xr-x 5 jim jim 170 Feb 28 11:35 test

          That is what I intended. Is the path for the patch relative to the repository root, or to Hadoop's trunk?

          Once I know how the patch is applied, I'll fix the paths in the patch file accordingly.

          Otis Gospodnetic added a comment -

          Jim,
          You may want to make that contrib/hbase/... to follow Lucene's directory structure. Here is what Lucene's contrib dir is like:

          [otis@localhost trunk]$ ls -al contrib/
          total 184
          drwxrwxr-x 22 otis otis 4096 Nov 28 15:44 .
          drwxrwxr-x 10 otis otis 4096 Feb 16 15:03 ..
          drwxrwxr-x 6 otis otis 4096 Feb 21 09:18 analyzers
          drwxrwxr-x 5 otis otis 4096 Mar 16 2006 ant
          drwxrwxr-x 6 otis otis 4096 Feb 16 12:34 benchmark
          -rw-rw-r-- 1 otis otis 1347 May 22 2005 contrib-build.xml
          drwxrwxr-x 6 otis otis 4096 Jan 9 2006 db
          drwxrwxr-x 7 otis otis 4096 Dec 17 00:45 gdata-server
          drwxrwxr-x 6 otis otis 4096 May 1 2005 highlighter
          drwxrwxr-x 6 otis otis 4096 Feb 26 2005 javascript
          drwxrwxr-x 6 otis otis 4096 Nov 24 13:34 lucli
          drwxrwxr-x 4 otis otis 4096 May 22 2005 memory
          drwxrwxr-x 4 otis otis 4096 Feb 9 2006 miscellaneous
          drwxrwxr-x 4 otis otis 4096 Mar 16 2006 queries
          drwxrwxr-x 5 otis otis 4096 Jan 9 2006 regex
          drwxrwxr-x 4 otis otis 4096 Nov 28 15:37 similarity
          drwxrwxr-x 7 otis otis 4096 Nov 20 01:32 snowball
          drwxrwxr-x 4 otis otis 4096 Jun 30 2006 spellchecker
          drwxrwxr-x 4 otis otis 4096 Jul 16 2005 surround
          drwxrwxr-x 7 otis otis 4096 Feb 21 09:04 .svn
          drwxrwxr-x 5 otis otis 4096 May 1 2005 swing
          drwxrwxr-x 4 otis otis 4096 May 22 2005 wordnet
          drwxrwxr-x 4 otis otis 4096 Mar 16 2006 xml-query-parser

          Hadoop QA added a comment -

          +1, because http://issues.apache.org/jira/secure/attachment/12352270/hbase.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512944. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Jim Kellerman added a comment -

          Fix path names in patch file so that they are relative instead of absolute.

          Hadoop QA added a comment -

          -1, because the patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12352261/hbase-patch.txt as a patch to trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512944. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Jim Kellerman added a comment -

          This is a unified patch that adds all the HBase source to contrib/src/hbase

          Please note that this is a work in progress, so while it does compile, it is not complete.
          README.txt and build.xml are included.

          Doug Cutting added a comment -

          > I assume that before I submit the patch, you want me to complete the client.

          Not necessarily. Revising the patch is a fine development mode for something with only one or two developers, but if you have more collaborators, it will be easier to use subversion. Committing it to contrib before it's complete with a clear notice that it's a work in progress would be acceptable to me.

          Personally I'd prioritize unit tests for the existing code slightly ahead of a complete client implementation.

          However you proceed, it doesn't hurt to post updated patches frequently.

          Jim Kellerman added a comment -

          I assume that before I submit the patch, you want me to complete the client.

          I have reformatted the code per Hadoop standards and have done a lot of work to change Java 2 syntax to use Java 5 generics.
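
          An illustrative before/after of that Java 2 to Java 5 generics cleanup (example code only, not taken from the patch):

          import java.util.ArrayList;
          import java.util.List;
          import java.util.Vector;
          import org.apache.hadoop.io.Text;

          public class GenericsCleanupExample {
            public static void main(String[] args) {
              // Java 2 style: raw collection, explicit cast on retrieval.
              Vector rawResults = new Vector();
              rawResults.add(new Text("row1"));
              Text first = (Text) rawResults.get(0);

              // Java 5 style: typed collection, no cast needed.
              List<Text> results = new ArrayList<Text>();
              results.add(new Text("row1"));
              Text second = results.get(0);

              System.out.println(first + " " + second);
            }
          }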

          Jim Kellerman added a comment -

          Yes, I will work on a single patch to put this in contrib.

          Assigning bug to me for now.

          Doug Cutting added a comment -

          I think until this is more solid we should keep it in the contrib directory.

          Jim, are you willing to convert this to a single patch file that puts this in the contrib directory, complete with a build.xml? Information on what's expected from patches is at:

          http://wiki.apache.org/lucene-hadoop/HowToContribute

          Mike, do you have any unit tests for this? It'd be nice to start building up a good set. If Mike doesn't have any, then perhaps writing a few would be a good way to get to know what's there.

          Jim Kellerman added a comment -

          Patch for problems 2 and 4, file 3.

          Jim Kellerman added a comment -

          Patch for problems 2 and 4, file 2.

          Jim Kellerman added a comment -

          Patch for problems 2 and 4, file 1.

          Jim Kellerman added a comment -

          Problems:

          1. HRegionServer does not implement HRegionServerInterface.openScanner(Text)

          2. HStore needs to pass Configuration object to MapFile.Writer constructor (2 occurrences)

          3. HStore references undefined method MapFile.getClosest(WritableComparable, Writable) (2 occurrences)

          4. HStoreFile needs Configuration object to pass to MapFile.Writer constructor (3 occurrences) - suggest passing Configuration
          to HStoreFile constructors. (Requires changes to HRegion, HStore and HStoreFile)
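
          A sketch of the fix suggested in problems 2 and 4: thread a Configuration down to the MapFile.Writer constructor. The class name and the key/value types below are illustrative assumptions, not the actual HStoreFile code:

          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.io.BytesWritable;
          import org.apache.hadoop.io.MapFile;
          import org.apache.hadoop.io.Text;

          public class StoreFileWriterSketch {
            private final Configuration conf;

            // The Configuration is handed to the constructor, as suggested above,
            // so it can be passed through to MapFile.Writer.
            public StoreFileWriterSketch(Configuration conf) {
              this.conf = conf;
            }

            public MapFile.Writer openWriter(FileSystem fs, String dirName) throws IOException {
              return new MapFile.Writer(conf, fs, dirName, Text.class, BytesWritable.class);
            }
          }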


            People

            • Assignee: Unassigned
            • Reporter: Mike Cafarella
            • Votes: 0
            • Watchers: 2

              Dates

              • Due:
              • Created:
              • Updated:
              • Resolved:

                Development