HBASE-6226

move DataBlockEncoding and related classes to hbase-common module

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.95.0
    • Component/s: io, regionserver
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In order to isolate the implementation details of HBASE-4676 (PrefixTrie encoding) and other DataBlockEncoders by putting them in modules, this pulls up the DataBlockEncoding related interfaces into hbase-common.

      No tests are moved in this patch. The only notable change was trimming a few dependencies on HFileBlock, which pulls in dependencies on much of the regionserver.

      The test suite passes locally for me.

      I tried to keep it as simple as possible... let me know if there are any concerns.

      1. HBASE-6226-v1.patch
        103 kB
        Matt Corgan
      2. HBASE-6226-v2.patch
        92 kB
        Matt Corgan
      3. HBASE-6226-v3.patch
        92 kB
        Matt Corgan
      4. HBASE-6226-v4.patch
        92 kB
        Matt Corgan
      5. 6226-suggestion.txt
        92 kB
        Ted Yu
      6. HBASE-6226-v4.patch
        92 kB
        stack


          Activity

          stack added a comment -

          Marking closed.

          stack added a comment -

          I ran the failed tests a few times locally and they passed.

          Committed to trunk. Thanks for the nice cleanup, Matt. Want to open an issue to get rid of the hfile refs? Good stuff.

          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #78 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/78/)
          HBASE-6226 move DataBlockEncoding and related classes to hbase-common module (Revision 1356590)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockDecodingContext.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockEncodingContext.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockDecodingContext.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockDefaultDecodingContext.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/encoding/HFileBlockEncodingContext.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12534126/HBASE-6226-v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
          org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2314//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2314//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2314//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2314//console

          This message is automatically generated.

          stack added a comment -

          @Matt It would if I submitted the patch... Ugh. Let's see.

          Matt Corgan added a comment -

          Should Hadoop QA have posted a result for this one Stack?

          stack added a comment -

          Retry Matt's last patch

          stack added a comment -

          @Matt Ok. Let's get the issues I note in another JIRA. Let me retry your v4 against hadoopqa. See if the same tests fail.

          Matt Corgan added a comment -

          Thanks guys. I left a reply on the encoder instantiation via reflection discussion. Repeating it here:

          These encoder implementations are singletons that don't even have any member fields, so they are as lightweight as can be. I think it's better they fail fast, though all implementations are currently bundled with hbase so it'd be weird if they ever failed.

          The only reason I made any change is that they are no longer on the compile-time classpath, making the existing instantiation impossible. So one concern might be that the implementation could change name or package, but I'm thinking unit tests cover us well there.

          The weird thing now is that hbase-common knows about different implementations, but it mirrors how the Compression implementations are handled. Maybe in the future we can consider having HColumnDescriptor take these fully-qualified class names so that different implementations can be plugged in without being added to this enum, but that's veering off track for this jira.

          Stack - it looks like Phabricator correctly matches up moved files, but if nothing in the file has changed it makes it look like a new file. Some of your comments on that top file are things that already exist on trunk. All good points, but maybe we can address them separately.
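The instantiate-by-class-name pattern discussed above can be sketched as follows. This is a hedged illustration, not HBase's actual API: the names `Encoding`, `PrefixEncoder`, `EncodingDemo`, and `createEncoder` are invented for the sketch. The point is that the enum holds only a class-name string, so the implementation need not be on the compile-time classpath, and a renamed or missing class fails fast with an exception.

```java
// Hypothetical sketch of fail-fast encoder instantiation via reflection.
// All names here are illustrative; they do not reproduce HBase classes.
class EncodingDemo {
    public static void main(String[] args) {
        DataBlockEncoder encoder = Encoding.PREFIX.createEncoder();
        System.out.println(encoder.name()); // prints "PREFIX"
    }
}

interface DataBlockEncoder {
    String name();
}

// Stand-in implementation; in the real layout the encoders live in a
// separate module and are referenced only by fully-qualified class name.
class PrefixEncoder implements DataBlockEncoder {
    @Override
    public String name() { return "PREFIX"; }
}

enum Encoding {
    // In a real setup this would be a fully-qualified name such as
    // "org.apache.hadoop.hbase.io.encoding.SomeEncoder".
    PREFIX("PrefixEncoder");

    private final String encoderClassName;

    Encoding(String encoderClassName) {
        this.encoderClassName = encoderClassName;
    }

    // Fail fast: an absent or renamed implementation surfaces as a
    // RuntimeException at lookup time rather than a latent read error.
    DataBlockEncoder createEncoder() {
        try {
            return (DataBlockEncoder) Class.forName(encoderClassName)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(
                    "Could not instantiate encoder " + encoderClassName, e);
        }
    }
}
```

Unit tests that exercise every enum constant would catch a broken class name at build time, which is the coverage Matt alludes to.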

          stack added a comment -

          Matt I put some review up on phabricator. Good stuff.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12534086/6226-suggestion.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
          org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
          org.apache.hadoop.hbase.master.TestSplitLogManager

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2299//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2299//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2299//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2299//console

          This message is automatically generated.

          Ted Yu added a comment -

          Patch reflecting my suggestion on phabricator.

          Matt Corgan added a comment -

          Supporting renamed files would be nice. I am running the "mvn -Darc initialize". Here is the full output:

          mcorgan@wyclef:~/hadoop/hbase$ mvn -Darc initialize
          [INFO] Scanning for projects...
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Build Order:
          [INFO] 
          [INFO] HBase
          [INFO] HBase - Common
          [INFO] HBase - Server
          [INFO]                                                                         
          [INFO] ------------------------------------------------------------------------
          [INFO] Building HBase 0.95-SNAPSHOT
          [INFO] ------------------------------------------------------------------------
          [INFO]                                                                         
          [INFO] ------------------------------------------------------------------------
          [INFO] Building HBase - Common 0.95-SNAPSHOT
          [INFO] ------------------------------------------------------------------------
          [INFO]                                                                         
          [INFO] ------------------------------------------------------------------------
          [INFO] Building HBase - Server 0.95-SNAPSHOT
          [INFO] ------------------------------------------------------------------------
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO] 
          [INFO] HBase ............................................. SUCCESS [0.063s]
          [INFO] HBase - Common .................................... SUCCESS [0.001s]
          [INFO] HBase - Server .................................... SUCCESS [0.000s]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 0.313s
          [INFO] Finished at: Thu Jun 28 16:47:22 PDT 2012
          [INFO] Final Memory: 5M/480M
          [INFO] ------------------------------------------------------------------------
          mcorgan@wyclef:~/hadoop/hbase$ arc diff --only
          Usage Exception: Failed to load phutil library at location '.arc_jira_lib'. This library is specified by the "phutil_libraries" setting in ".arcconfig". Check that the setting is correct and the library is located in the right place.
          mcorgan@wyclef:~/hadoop/hbase$ ..
          ..: command not found
          mcorgan@wyclef:~/hadoop/hbase$ ll
          total 396
          drwxr--r-x 12 mcorgan mcorgan   4096 Jun 25 14:53 ./
          drwxrwxr-x  8 mcorgan mcorgan   4096 Jun 28 15:25 ../
          -rw-rw-r--  1 mcorgan mcorgan    395 Jun 13 20:06 .arcconfig
          drwxrw-r-x  4 mcorgan mcorgan   4096 Jun 25 13:17 bin/
          -rw-rw-r--  1 mcorgan mcorgan 261312 Jun 13 20:06 CHANGES.txt
          drwxrw-r-x  2 mcorgan mcorgan   4096 Jun 13 20:06 conf/
          drwxrw-r-x  2 mcorgan mcorgan   4096 Jun 13 20:06 dev-support/
          drwxrw-r-x  5 mcorgan mcorgan   4096 Jun 13 20:06 examples/
          drwxrw-r-x  8 mcorgan mcorgan   4096 Jun 28 15:25 .git/
          -rw-rw-r--  1 mcorgan mcorgan    129 Jun 25 13:31 .gitignore
          drwxrw-r-x  5 mcorgan mcorgan   4096 Jun 25 14:53 hbase-common/
          drwxrw-r-x  5 mcorgan mcorgan   4096 Jun 28 11:46 hbase-server/
          -rw-rw-r--  1 mcorgan mcorgan  11358 Jun 13 20:06 LICENSE.txt
          -rw-rw-r--  1 mcorgan mcorgan    701 Jun 13 20:06 NOTICE.txt
          -rw-rw-r--  1 mcorgan mcorgan  58326 Jun 25 13:17 pom.xml
          -rw-rw-r--  1 mcorgan mcorgan    368 Jun 13 20:25 .project
          -rw-rw-r--  1 mcorgan mcorgan   1358 Jun 13 20:06 README.txt
          drwxrw-r-x  2 mcorgan mcorgan   4096 Jun 13 20:25 .settings/
          drwxrw-r-x  6 mcorgan mcorgan   4096 Jun 25 14:18 src/
          drwxr-xr-x  3 mcorgan mcorgan   4096 Jun 25 14:53 target/
          mcorgan@wyclef:~/hadoop/hbase$ 
          
          Mikhail Bautin added a comment -

          Matt: regarding the Arcanist issue: have you run "mvn -Darc initialize"? It would download and install .arc_jira_lib in your HBase directory.

          Phabricator is a more convenient code review system compared to ReviewBoard or reviewing raw patches, because e.g. it correctly identifies renamed files (relevant to this particular diff).

          Mikhail Bautin added a comment -

          Matt: I have added one comment at https://reviews.facebook.net/D3891

          Mikhail Bautin added a comment -

          Matt: I have submitted the diff to Phabricator as https://reviews.facebook.net/D3891. I will add my comments there.

          Ted Yu added a comment -

          Thanks for the quick turn around.

          Will wait for 2 days to see if Mikhail has further comments.

          Matt Corgan added a comment -

          Thanks for the review, Ted. Attaching a v4 patch with the license year removed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533882/HBASE-6226-v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2284//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2284//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2284//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2284//console

          This message is automatically generated.

          Matt Corgan added a comment -

          Attaching v3 patch which is also up for review at https://reviews.apache.org/r/5648/

          Thanks for the ReviewBoard help Ted.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533779/HBASE-6226-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestSplitLogManager

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2282//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2282//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2282//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2282//console

          This message is automatically generated.

          Ted Yu added a comment -

          @Matt:
          You should have selected hbase-git as the Repository.

          The change to prepareDecoding() looks good.

          +   * @param block HFile block object
          +   * @param onDiskBlock on disk bytes to be decoded
          +   * @param offset data start offset in onDiskBlock
          +   * @throws IOException
          +   */
          +  public void prepareDecoding(int onDiskSizeWithoutHeader, int uncompressedSizeWithoutHeader,
          

The javadoc doesn't match the parameters.
In HFileBlock.java, there are some white spaces.

          Here is one review request: https://reviews.apache.org/r/5643/
          Feel free to create your own.

          Matt Corgan added a comment -

          sorry, try #3 was "/hbase" (something broke)

          Matt Corgan added a comment -

I've tried 1) leaving it empty, 2) "/" (no quotes), and 3) "" (empty)

          1 gives me a normal error message, and 2 and 3 trigger the "something broke"

          Ted, maybe you can try to upload the v2 patch from this jira to see if you get the same error?

          Ted Yu added a comment -

          Try / for Base Directory on review board.

          Matt Corgan added a comment -

Trying to post the v2 patch to reviews.apache.org, I get:

          Something broke! (Error 500)

          It appears something broke when you tried to go to here. This is either a bug in Review Board or a server configuration error. Please report this to your administrator.

          The patch was generated from the project root directory, so is "/" what i should be entering for the "Base Directory"?

          Any suggestions for getting around that?

          I believe the only code modification (besides moving files) is in HFileBlock:1731, which is line 2504 of the v2 patch.

          Ted Yu added a comment -

          @Matt:
          Uploading patch onto https://reviews.apache.org is another option.

          Matt Corgan added a comment -

          Hi Mikhail,

          I installed arcanist from scratch and haven't used it before. Do you know how to fix this error?

          Usage Exception: Failed to load phutil library at location '.arc_jira_lib'. This library is specified by the "phutil_libraries" setting in ".arcconfig". Check that the setting is correct and the library is located in the right place.

          here are the contents of my hbase/.arcconfig file:

          {
            "project_id" : "hbase",
            "conduit_uri" : "https://reviews.facebook.net/",
            "copyright_holder" : "Apache Software Foundation",
            "phutil_libraries" : {
              "arclib" : ".arc_jira_lib"
            },
            "arcanist_configuration" : "ArcJIRAConfiguration",
            "jira_project" : "HBASE",
            "jira_api_url" : "https://issues.apache.org/jira/si/",
            "lint_engine" : "JavaLintEngine",
            "max_line_length" : 100
          }
          
          Mikhail Bautin added a comment -

          Hi Matt,

The module breakdown you have described makes sense. Unfortunately, I can't see much from the patch because a lot of files are being moved around. Would you mind posting it at https://reviews.facebook.net? You need to run "mvn -Darc initialize", then (assuming you have a local git-svn checkout) run "arc diff --only" when your changes correspond to the latest local commit, copy-paste the URL arc gives you into the browser, and fill out some fields such as title and summary.

          Thanks,
          Mikhail

          Matt Corgan added a comment -

          Thanks for looking over it Mikhail. I've been looking at it from a few different angles:

          Testing:
DataBlockEncoding is essentially a "codec" between KeyValues and byte[]'s. There is no threading, locking, IO, timeouts, retries, etc. There shouldn't even be exceptions. It would be nice to isolate the codec logic to minimize its dependencies and to "harden" it as much as possible through pure unit tests, rather than heavyweight integration tests. We should be able to prove the correctness of the codecs and their edge cases without the use of a minicluster, etc. We will still need higher-level tests using the minicluster, but the low-level tests should be extremely focused on the codec details. For example, I think some of the DeltaEncoders have some pretty complex and fragile logic that we could test a little more thoroughly if the testing environment were simpler.
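
To illustrate that point with a hedged sketch (these are stand-in types, not the real HBase DataBlockEncoder/KeyValue interfaces): a block codec is a pure transformation between cells and bytes, so its round-trip behavior can be exercised by a plain unit test with no minicluster at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for the DataBlockEncoder idea: no IO, no threading, no retries --
// just cells in, bytes out, and back again.
interface BlockCodec {
    byte[] encode(List<String> cells);
    List<String> decode(byte[] block);
}

// Toy delta/prefix encoder: each cell stores how many leading characters it
// shares with the previous cell, followed by only its unshared suffix.
class PrefixCodec implements BlockCodec {
    public byte[] encode(List<String> cells) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            String prev = "";
            for (String cell : cells) {
                int shared = commonPrefix(prev, cell);
                out.writeInt(shared);                  // shared-prefix length
                out.writeUTF(cell.substring(shared));  // unshared suffix only
                prev = cell;
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public List<String> decode(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            List<String> cells = new ArrayList<>();
            String prev = "";
            while (in.available() > 0) {
                int shared = in.readInt();
                String cell = prev.substring(0, shared) + in.readUTF();
                cells.add(cell);
                prev = cell;
            }
            return cells;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int commonPrefix(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}

public class CodecRoundTrip {
    public static void main(String[] args) {
        BlockCodec codec = new PrefixCodec();
        List<String> cells = Arrays.asList("row1/cf:a", "row1/cf:b", "row2/cf:a");
        // A pure unit test of the codec's round-trip behavior -- no cluster required.
        System.out.println(codec.decode(codec.encode(cells)).equals(cells));  // prints "true"
    }
}
```

The same style of focused test could hammer the edge cases (empty blocks, identical adjacent keys, maximal shared prefixes) far more cheaply than an integration test.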

          Maintenance:
It would be good to eventually move the codecs into a module separate from hbase-server. It will isolate the code and tests so developers know that danger lurks inside those modules, and committers can keep a sharp eye out for any changes that affect them. I'm aiming to make a module called hbase-prefix-tree that can hold the pluggable implementation classes, and it may grow to support trie-encoded block indexes and memstores. Some of the other current HFileBlock codecs could go into an hbase-codec module, even though that module doesn't formally exist yet.

          Client:
I haven't brought this up because I don't want to be too starry-eyed, but I think it is actually appropriate for the client to make use of the DataBlockEncoders to encode KeyValues on-the-fly over the wire. It brings the same cost/benefit trade-off as encoding for disk space or block cache space. An even more advanced feature would be to pass entire data blocks over the wire for certain use cases (primarily unfiltered scans), and let the client decode them, saving a ton of server CPU. Others have mentioned re-writing the client from scratch for various reasons, and I would love to see these encodings built in from the start.

          Dependency graph:
          Stack, Jesse, and I did some brainstorming on the path to modularization, and I suggested the separation of the codecs from the hbase-server module. There's a diagram on HBASE-5977. We would try to extract and encapsulate the logic for decoding what's inside each type of HFileBlock, creating a class hierarchy of HFileBlock implementations that implement interfaces like BloomChunkBlock. From the perspective of hbase-server, everything behind those interfaces is a perfectly tested, high-performance black box.

          I'm actually proposing that hbase-server cannot see into hbase-codec and that hbase-codec cannot see into hbase-server. They both would see into hbase-common where we store the DataBlockEncoding enum and the DataBlockEncoder and EncodedSeeker interfaces. In this case the DataBlockEncoding enum goes into the hbase-common module and contains strings pointing to the implementation classes, and the implementations are instantiated via reflection. This also sets up a simple framework for additional codecs (possibly highly customized to a particular use case) to be developed with minimal effect on hbase-server.

          Anyway, hopefully that makes sense. All in all, I see hbase-common as the "kernel" of the project, not simply a code-gateway between client and server. It should contain the core interfaces and classes that are fundamental to the concept of hbase with the assumption that they are simple enough to be thoroughly tested via unit tests.
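
A minimal sketch of the reflection idea described above (the names here are hypothetical, not HBase's actual API): the enum in hbase-common holds only a class-name string, so neither hbase-server nor the codec module needs a compile-time dependency on the other.

```java
// Hypothetical names throughout -- this sketches the pattern, not HBase's real classes.
interface DataBlockEncoder {
    String id();
}

// Would live in a separate codec module (e.g. hbase-prefix-tree); the enum
// below never references this type at compile time.
class FakePrefixTreeEncoder implements DataBlockEncoder {
    public String id() { return "PREFIX_TREE"; }
}

enum DataBlockEncoding {
    // The enum stores a fully-qualified class name string, not a class literal.
    PREFIX_TREE("FakePrefixTreeEncoder");

    private final String encoderClassName;

    DataBlockEncoding(String encoderClassName) {
        this.encoderClassName = encoderClassName;
    }

    // Instantiate the implementation via reflection; fails cleanly if the
    // codec module isn't on the classpath.
    public DataBlockEncoder newEncoder() {
        try {
            return (DataBlockEncoder) Class.forName(encoderClassName)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Encoder implementation not available: " + encoderClassName, e);
        }
    }
}

public class EncodingLookup {
    public static void main(String[] args) {
        DataBlockEncoder encoder = DataBlockEncoding.PREFIX_TREE.newEncoder();
        System.out.println(encoder.id());  // prints "PREFIX_TREE"
    }
}
```

This is what lets additional codecs be dropped in as separate modules: hbase-server only ever sees the enum and the interface in hbase-common.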

Mikhail Bautin added a comment - edited

          Data block encoding is a server-side feature, but putting it in hbase-common will make it available to clients, right? What is the desired dependency graph between hbase-server, hbase-common, and hbase-<data_block_encodingX> components?

          Matt Corgan added a comment -

          attaching initial patch


  People

  • Assignee: Matt Corgan
  • Reporter: Matt Corgan
  • Votes: 0
  • Watchers: 9
