Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.95.2
    • Fix Version/s: None
    • Component/s: build
    • Labels:
      None
    • Tags:
      build, maven

      Description

      With HBASE-4336, HBase will have the ability to add multiple modules for different aspects of the codebase (minus tests; see HBASE-4336 for details). We need to set a policy for when a new module should be used versus putting the code into a single existing module or dispersing it across modules.

          Activity

          Jesse Yates added a comment -

          I'd like to avoid creating a ton of packages (or the tendency to have lots of packages). I see modules more as a rough separation of concerns (like how hadoop has dfs, mr, and common) than as a finer-grained functionality separation (where hadoop-common has 20+ modules), since each module means a new jar.

          In the short to medium term, I would like to see the following packages materialize out of the existing single package:

          • hbase-assemble - necessary for building
          • hbase-common - common functionality used between the client and server
          • hbase-client - functionality just for the client. A general hbase client would just need hbase-common and hbase-client to run
          • hbase-server - all server side functionality, including regionserver and master (this could even be separated, but not necessarily)

          Other potential things that came up earlier in the process that seemed useful:

          • hbase-security - shouldn't be needed if we roll in security, but still an option
          • hbase-it - for a single place for higher level integration tests (all those using the mini-cluster) to avoid the maven test-jar dependency issue discussed in HBASE-4336

          Any more granularity than these packages tends to be a bit of a mess and rarely all that useful. Instead, a lot of times it's really better to just have a config option to specify the right class and load that from the path. The jar approach is much more heavyweight and only useful for wholesale replacements for which there are multiple (possibly competing) implementations. For instance, async-hbase could roll up into an hbase-client.jar and be a drop-in replacement in your install, but you wouldn't have a whole log-cleaner jar just to switch which log cleaner class to use.
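
The "config option to specify the right class" approach can be sketched roughly as below. This is a minimal illustration, not HBase code: the `LogCleaner` interface, both implementations, and the `example.log.cleaner.class` key are hypothetical names invented for the example, and plain `java.util.Properties` stands in for whatever configuration object the project uses.

```java
import java.util.Properties;

public class PluggableCleanerExample {

    /** Hypothetical plugin interface; not an actual HBase type. */
    public interface LogCleaner {
        boolean isDeletable(String logName);
    }

    /** Hypothetical default: every log may be deleted. */
    public static class DeleteAllCleaner implements LogCleaner {
        @Override
        public boolean isDeletable(String logName) {
            return true;
        }
    }

    /** Hypothetical alternative: no log may be deleted. */
    public static class KeepAllCleaner implements LogCleaner {
        @Override
        public boolean isDeletable(String logName) {
            return false;
        }
    }

    /** Hypothetical configuration key naming the implementation class. */
    public static final String CLEANER_KEY = "example.log.cleaner.class";

    /**
     * Reads the class name from the configuration, loads it from the
     * classpath, and instantiates it through its no-arg constructor.
     */
    public static LogCleaner loadCleaner(Properties conf)
            throws ReflectiveOperationException {
        String className =
            conf.getProperty(CLEANER_KEY, DeleteAllCleaner.class.getName());
        Class<? extends LogCleaner> clazz =
            Class.forName(className).asSubclass(LogCleaner.class);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty(CLEANER_KEY, KeepAllCleaner.class.getName());
        LogCleaner cleaner = loadCleaner(conf);
        System.out.println(cleaner.getClass().getSimpleName());
    }
}
```

Swapping the implementation here only requires changing one configuration value, with no extra jar, which is the trade-off the comment is pointing at.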

          Jesse Yates added a comment -

          Results of hackathon today: Matt is going to start working on pulling things into an hbase-common module for the common/utility classes. When we do a rewrite of the client (probably based on asynchbase), then we are getting an hbase-client module. Until then, we are going to slowly start pulling out modules as they seem necessary.

          Also, I'm going to add the hbase-common module so there is an example of how to add a new module, but let Matt deal with the actual moving of classes (thanks Matt!).

          Jesse Yates added a comment -

          Adding link to the hbase-common ticket - HBASE-6087

          Matt Corgan added a comment -

          Jesse, Stack and I have discussed this from a few different angles to try to identify some of the reasons for creating modules. The main benefit of modules is to isolate complex implementations behind simple interfaces. The main drawback is that modules add overhead in the form of more things to open in eclipse and more jar files in the build.

          Pasting from HBASE-5720 some arguments for creating a "codec" module that contains wrapper classes for individual HFile block types:

          • make it more testable, like a normal in-memory data structure without having to set up heavyweight testing environments
          • separate the encoding concerns from IO concerns. after the checksum happens, encoders/decoders should not even know what an IOException is
          • strongly discourage people from modifying anything in the codec packages without knowing what they're getting into
          • ensure the main project code only references the interfaces and not any codec internals (see if main project compiles without codecs in classpath)
          • make it easier for contributors to develop and profile the codecs without having to become experts in all aspects of hbase
          • help to simplify the main project. imagine if the gzip or snappy internals were sprinkled throughout the regionserver code. yikes.
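
The interface-versus-internals separation argued for in the list above might look roughly like this. It is only a sketch under stated assumptions: `BlockCodec` and `RunLengthCodec` are hypothetical names, and run-length encoding is used purely as a stand-in for a real HFile block encoding. The point is that the codec works on in-memory byte arrays, never declares `IOException`, and can be tested like an ordinary data structure.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CodecBoundaryExample {

    /** Hypothetical interface the main project would compile against. */
    public interface BlockCodec {
        byte[] encode(byte[] raw);
        byte[] decode(byte[] encoded);
    }

    /**
     * Hypothetical codec-module implementation. Stores each run of equal
     * bytes as a (count, value) pair, with count limited to 255.
     */
    public static class RunLengthCodec implements BlockCodec {
        @Override
        public byte[] encode(byte[] raw) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int i = 0;
            while (i < raw.length) {
                byte value = raw[i];
                int count = 1;
                while (i + count < raw.length
                        && raw[i + count] == value
                        && count < 255) {
                    count++;
                }
                out.write(count); // writes the low 8 bits of count
                out.write(value);
                i += count;
            }
            return out.toByteArray();
        }

        @Override
        public byte[] decode(byte[] encoded) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i + 1 < encoded.length; i += 2) {
                int count = encoded[i] & 0xFF; // undo sign extension
                for (int j = 0; j < count; j++) {
                    out.write(encoded[i + 1]);
                }
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) {
        // Callers hold only the interface type, never the implementation.
        BlockCodec codec = new RunLengthCodec();
        byte[] raw = "aaabbbbc".getBytes(StandardCharsets.US_ASCII);
        byte[] roundTrip = codec.decode(codec.encode(raw));
        System.out.println(Arrays.equals(raw, roundTrip));
    }
}
```

In a multi-module build, the interface would live where the main project can see it and the implementation in the codec module, so the "compiles without codecs in classpath" check from the list becomes enforceable by the build.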

          Attaching Potential-HBase-Modules-v1.pdf and Potential-HBaseModule-Descriptions-v1.pdf to illustrate a possible roadmap for extracting modules. We currently have hbase-server, and are first going to "pull up" some files into hbase-common. Eventually we may "push down" an integration-test module.

          Extracting these modules can't really be done all at once, so this is just a roadmap meant to start discussion. For example, there's probably an opportunity to isolate some of regionserver and master code, but they also share a lot. This v1 doc shows a push down of master code out of the server module, but we probably need to think through that in more detail.

          Link to dependency chart: https://docs.google.com/presentation/d/16Kf9FAFjtneWwCnpy9Bql4QhXmORf7U9uJLoRobePHQ/edit
          Link to description doc: https://docs.google.com/document/d/1RHrUa9qWGvIR6ZmqVYP17rS7JTPSzCFCPKNjTo-XY38/edit
          Jesse Yates added a comment -

          Another thing that would be awesome is an hbase-mapreduce package. Pull out all the classes that are map-reduce specific, but don't really touch the rest of the codebase.

          Matt Corgan added a comment -

          oh yeah - great call Jesse. replacing Potential-HBase-Modules-v1.pdf with v2


            People

            • Assignee:
              Unassigned
              Reporter:
              Jesse Yates
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - 336h
                Remaining:
                Remaining Estimate - 336h
                Logged:
                Time Spent - Not Specified
