Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels:

      Description

      Just realized we have no MapReduce input format that supports reading from multiple tables. I could see making the table name the mapper's key and making the Key/Value pair a tuple for the value, or alternatively making the (Table, Key) pair the key tuple and keeping the Value as the value.
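      The second layout above, a (table, Key) tuple as the mapper key with the Value left as the value, might look roughly like the following. This is a minimal sketch, not the implementation that was eventually committed: `TableKey` is a hypothetical class, and a plain String stands in for Accumulo's `Key` so the example runs without Accumulo on the classpath.

```java
import java.util.Objects;

/** Hypothetical composite mapper key: source table name plus a key (a String stands in for Accumulo's Key). */
final class TableKey implements Comparable<TableKey> {
    private final String table;
    private final String key;

    TableKey(String table, String key) {
        this.table = Objects.requireNonNull(table);
        this.key = Objects.requireNonNull(key);
    }

    String getTable() { return table; }

    String getKey() { return key; }

    /** Orders by table first, then by key, so entries from the same table sort together. */
    @Override
    public int compareTo(TableKey other) {
        int byTable = table.compareTo(other.table);
        return byTable != 0 ? byTable : key.compareTo(other.key);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TableKey)) return false;
        TableKey other = (TableKey) o;
        return table.equals(other.table) && key.equals(other.key);
    }

    @Override
    public int hashCode() { return Objects.hash(table, key); }

    @Override
    public String toString() { return table + ":" + key; }
}
```

      To be used as a real mapper output key, a class like this would also need to implement Hadoop's `WritableComparable` so it can be serialized and sorted during the shuffle.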

      1. new-multitable-if.patch
        68 kB
        William Slacum
      2. multi-table-if.patch
        18 kB
        William Slacum
      3. ACCUMULO-391.patch
        103 kB
        Corey J. Nolet

        There are no Sub-Tasks for this issue.

          Activity

          jira-bot ASF subversion and git services added a comment -

          Commit 7eff42ffba6e8164c2266a2ddbd88566aa354552 in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=7eff42f ]

          ACCUMULO-391 Renaming TableQueryConfigTest to appropriate InputTableConfigTest

          jira-bot ASF subversion and git services added a comment -

          Commit 61353d1e3f838f566aa7006e19e0af1ccd02d18a in branch refs/heads/master from Christopher Tubbs
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=61353d1 ]

          ACCUMULO-391 Use more accurate "InputTableConfig" term

          jira-bot ASF subversion and git services added a comment -

          Commit fe239249b11bc3de48e423c5e9e50f1f9fe00f5e in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=fe23924 ]

          ACCUMULO-391 Javadoc and comments

          jira-bot ASF subversion and git services added a comment -

          Commit cb97e82a4283ef028cd0500b1c23941386ecb6f8 in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=cb97e82 ]

          ACCUMULO-391 Performing refactor to the legacy mapred InputFormatBase

          jira-bot ASF subversion and git services added a comment -

          Commit 9a63ff4ecf4b479403d16f1ee44b4f552f71719d in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=9a63ff4 ]

          ACCUMULO-391 setters and getters for BatchScanConfigs on jobs now use Map<String,BatchScanConfig> instead of a vararg.

          jira-bot ASF subversion and git services added a comment -

          Commit ebd112056017d3bbe32c62329ca31fecf6c22fea in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=ebd1120 ]

          ACCUMULO-391 AbstractRecordReader created to help common functionality.

          jira-bot ASF subversion and git services added a comment -

          Commit 87b104d2322cb17e177322e76c51f8ea5eaaa206 in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=87b104d ]

          ACCUMULO-391 Renaming TableQueryConfig to BatchScanConfig and moving it into proper client location

          jira-bot ASF subversion and git services added a comment -

          Commit 5c496552e024817bbd197514c3a37c84b6e28f50 in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=5c49655 ]

          ACCUMULO-391 Adding AccumuloMultiTableInputFormat and tests. Reverting AccumuloInputFormatTest back to pre-multi-table version.

          jira-bot ASF subversion and git services added a comment -

          Commit d399fc251dcda64d184e7ec0ee0f29317d496f16 in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=d399fc2 ]

          ACCUMULO-391 Removing deprecation from single-table methods

          ecn Eric Newton added a comment -

          Corey J. Nolet, can you clean up the warnings in trunk? I know you're not finished, but it's driving me a little crazy.

          ecn Eric Newton added a comment - - edited

          +1 for separate classes, simply for ease of use.

          +1 for reusing the multi-table implementation, if it doesn't decrease performance.

          I think it's tough on users to deprecate the interface they are used to in favor of a (slightly) more complicated interface they probably don't need.

          ctubbsii Christopher Tubbs added a comment -

          "The complexity introduced by keeping separate iterator, range, columns, and table names on the job just made it very prone to falling into a bad state."

          Right, I'd hate to have two implementations to maintain as well, especially since one is a specialized case of the other. We can simplify that, while preserving the existing single-table API, by internally delegating to the general-purpose implementation. All of this can happen in the common Configurator, so we don't have to maintain it separately for the mapred and mapreduce APIs. I'm thinking that if the TableQueryConfig is easily serialized/deserialized and fully mutable, it's a simple matter for the existing single-table methods to deserialize the one table's config, mutate it, and serialize it back to the configuration. The rest of the code (the getSplits implementation, etc.) would be common.
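          The delegation described above (single-table setters that deserialize the one table's config, mutate it, and serialize it back, with common split code underneath) could be sketched as follows. Everything here is hypothetical: `TableConfig` and `ConfiguratorSketch` are illustrative names, a `Map<String, String>` stands in for the Hadoop Configuration, Strings stand in for ranges, and the binary encoding loosely follows the write/readFields style of Hadoop Writables rather than any actual Accumulo format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, fully mutable per-table settings (Strings stand in for Accumulo ranges). */
class TableConfig {
    final List<String> ranges = new ArrayList<>();

    /** Encodes the config as Base64 so it fits in a String-valued configuration. */
    String serialize() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ranges.size());
            for (String range : ranges) {
                out.writeUTF(range);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static TableConfig deserialize(String encoded) {
        TableConfig cfg = new TableConfig();
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                cfg.ranges.add(in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cfg;
    }
}

/** Hypothetical shared configurator used by both the mapred and mapreduce front ends. */
final class ConfiguratorSketch {
    private static final String TABLE_PREFIX = "input.table.";
    private static final String SINGLE_TABLE = "input.singleTableName";

    private ConfiguratorSketch() {}

    /** Multi-table API: one serialized TableConfig per table name. */
    static void setTableConfigs(Map<String, String> conf, Map<String, TableConfig> configs) {
        conf.keySet().removeIf(k -> k.startsWith(TABLE_PREFIX));
        configs.forEach((table, cfg) -> conf.put(TABLE_PREFIX + table, cfg.serialize()));
    }

    static Map<String, TableConfig> getTableConfigs(Map<String, String> conf) {
        Map<String, TableConfig> result = new TreeMap<>();
        conf.forEach((k, v) -> {
            if (k.startsWith(TABLE_PREFIX)) {
                result.put(k.substring(TABLE_PREFIX.length()), TableConfig.deserialize(v));
            }
        });
        return result;
    }

    /** Legacy single-table setter: moves any existing settings under the new table name. */
    static void setInputTableName(Map<String, String> conf, String table) {
        String previous = conf.put(SINGLE_TABLE, table);
        String existing = previous == null ? null : conf.remove(TABLE_PREFIX + previous);
        conf.put(TABLE_PREFIX + table, existing != null ? existing : new TableConfig().serialize());
    }

    /** Legacy single-table setter: deserialize the one table, mutate it, serialize it back. */
    static void setRanges(Map<String, String> conf, Collection<String> ranges) {
        String table = conf.get(SINGLE_TABLE);
        if (table == null) {
            throw new IllegalStateException("input table name must be set first");
        }
        TableConfig cfg = TableConfig.deserialize(conf.get(TABLE_PREFIX + table));
        cfg.ranges.clear();
        cfg.ranges.addAll(ranges);
        conf.put(TABLE_PREFIX + table, cfg.serialize());
    }

    /** Common split logic, identical no matter which API configured the job. */
    static List<String> getSplits(Map<String, String> conf) {
        List<String> splits = new ArrayList<>();
        getTableConfigs(conf).forEach((table, cfg) -> {
            for (String range : cfg.ranges) {
                splits.add(table + ":" + range);
            }
        });
        return splits;
    }
}
```

          The point of the design is that only the per-table map is the source of truth; the single-table methods never keep their own copies of ranges, so the two APIs cannot drift into inconsistent state.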

          sonixbp Corey J. Nolet added a comment - - edited

          Chris,

          Thank you for your feedback! All of your bullet points look like simple fixes, and I can work to incorporate them. As for a new class vs. the old deprecated class: it looked like a few people in this thread were in favor of making the API change in this release, because the complexity introduced by keeping separate iterators, ranges, columns, and table names on the job made it very prone to falling into a bad state.

          It seems like the only reason the single-table case still needs its own set of methods is so that we can continue to support legacy code without introducing breaking changes for users. I tried to limit the maintenance burden of the deprecated methods by converting the single-table settings into TableQueryConfig objects whenever the internal code uses them.

          All that being said, you make a good point about implementing a new class, and I think that was William's position when he wrote the initial patch. It looked like the majority of people on this thread were in favor of incorporating the patch into InputFormatBase, but this can be changed if that's the overall consensus.

          ctubbsii Christopher Tubbs added a comment -

          A few issues:

          • shouldUseLocalIterators was added as a deprecated, public method
            • it didn't exist in a prior version, so it shouldn't be deprecated. If it's not needed, it should be removed.
            • our internal code still uses it, though it's deprecated. We shouldn't use our own deprecated code.
          • getters were changed without deprecation
            • setupIterators
            • getTabletLocator
          • TableQueryConfig was placed in o.a.a.core.conf
            • o.a.a.core.conf isn't really part of the public API; it's essentially for server-side configuration representation, though we use it internally in some client code
            • the precedent is to put narrowly scoped config in the package where its corresponding code lives (see o.a.a.core.client.BatchWriterConfig)
          • Javadoc
            • Empty @return statements
            • Javadocs advise using deprecated methods
            • Unnecessary renaming of the variable "context" to "conf" in getSplits, with an incorrect description based on the new name rather than the object type
            • Incorrect Javadoc description on setTableQueryConfigs. It is not setting the objects on a Hadoop configuration. It is setting the configuration on the job. This is a minor thing, but the additional precision in language goes a long way towards clear documentation, especially for non-native English speakers and people less familiar with the way MapReduce works in Hadoop.
          • TableQueryConfig
            • For consistency and clarity, this should be named to match our other query code. Perhaps instead of "TableQueryConfig", it might be better to call it BatchScanConfig, similar to BatchWriterConfig.
            • Instead of the ambiguous setTableQueryConfigs, perhaps a method that takes a Map<String, BatchScanConfig>, to make this object more reusable.
            • the getter should be protected

          Overall, given the significance of the changes to the API of the MapReduce code for a limited application (most existing users of this class will probably continue to scan only one table at a time), I think it'd be better to put this code in a separate InputFormat class. This is especially concerning because we only recently stabilized the M/R code in 1.5.0, ironing out existing issues, and this change is a bit too disruptive (it deprecates methods that are brand new in 1.5, like setTableName and setRanges).

          Another reason this might be good as a separate class is that we could have the existing single-table version extend the multi-table version as a specialized case. That would save the maintenance cost of two implementations, but leave the existing stable and familiar API intact until the new one is proven and stable. And then (if we really wanted to) we could deprecate the single-table version as a whole, rather than deprecating half of it and trying to maintain a half-deprecated, half-current class.
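          The suggestion of having the single-table format extend the multi-table one as a specialized case could be sketched like this. Both class names are hypothetical; instance fields stand in for job configuration (Accumulo's input formats keep these settings on the Job through static setters instead), and Strings stand in for ranges. There is one split implementation, and the subclass only translates the familiar single-table calls into a one-entry map.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical general-purpose format: a table-name-to-ranges map is its only source of truth. */
class MultiTableFormatSketch {
    protected final Map<String, List<String>> tableRanges = new TreeMap<>();

    void setTableRanges(Map<String, List<String>> configs) {
        tableRanges.clear();
        configs.forEach((table, ranges) -> tableRanges.put(table, new ArrayList<>(ranges)));
    }

    /** The one split implementation both formats share. */
    List<String> getSplits() {
        List<String> splits = new ArrayList<>();
        tableRanges.forEach((table, ranges) -> ranges.forEach(r -> splits.add(table + ":" + r)));
        return splits;
    }
}

/** Hypothetical single-table format: the familiar API, implemented as a one-entry special case. */
class SingleTableFormatSketch extends MultiTableFormatSketch {
    private String tableName;
    private List<String> ranges = new ArrayList<>();

    void setInputTableName(String name) {
        tableName = name;
        sync();
    }

    void setRanges(Collection<String> newRanges) {
        ranges = new ArrayList<>(newRanges);
        sync();
    }

    /** Keeps the inherited map holding exactly one entry for the configured table. */
    private void sync() {
        if (tableName != null) {
            setTableRanges(Collections.singletonMap(tableName, ranges));
        }
    }
}
```

          One trade-off of inheritance here: the subclass also inherits setTableRanges, so nothing stops a caller from configuring several tables through the single-table class. A fuller design would need to decide whether that is acceptable or whether the single-table class should hide it.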

          jira-bot ASF subversion and git services added a comment -

          Commit b96701f220ecb3e891a71741179b867429fa1d39 in branch refs/heads/master from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=b96701f ]

          Squashed commit of the following:

          commit 3227a822379718d6c1297f11d7af37a716f78a60
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Tue Oct 1 23:20:34 2013 -0400

          Adding the following:

          • Deprecation to InputConfigurator, mapred.InputFormatBase, mapreduce.InputFormatBase
          • Comments to TableQueryConfig
          • Multi-table support to mapred.InputFormatBase

          ACCUMULO-391

          commit 6648e8a1c97939f740b24f9368ecda9f7072cbd2
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Tue Oct 1 21:45:37 2013 -0400

          Fixing some more formatting. Adding license headers. ACCUMULO-391

          commit 53bcc85689510fc988c9e9f6aff0da0cb7091c6c
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Mon Sep 30 21:01:55 2013 -0400

          Cleaning up tests. Adding test for legacy input for base + new multi-table methods. ACCUMULO-391

          commit e4e05c804ea7f486290181f0246cf6b2880f5d1a
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sun Sep 29 21:05:55 2013 -0400

          Fixing some formatting. Adding some comments. ACCUMULO-391

          commit 10b4eb8206ab4395ef2d4df375b52a7ffe77d655
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sun Sep 29 20:37:07 2013 -0400

          ACCUMULO-1732 Using table id in RangeInputSplit so that it can be resolved back to "working" table name in mappers. Scanner uses the "working" table name while everything else can still safely use the original configured table name.

          commit 7b8585f0333c09674f7612b4dc24887f684413fe
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sat Sep 28 23:23:48 2013 -0400

          Removing deprecation for now until we have some discussions. Updating/adding comments. ACCUMULO-391

          commit 273ee49530de28c2c5dfe39c80ab0c90c3c3a95f
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sat Sep 28 23:01:04 2013 -0400

          The legacy mapred InputFormatBase now verifies (and fixes the scanner for) a possible change in table name that could happen between the configuration of the map/reduce job and the actual processing of the scanner for a specific split. In that case, the most recent table name associated with the id is always used for the scanner (though the table name that was expected during job setup is still used in the RangeInputSplit). ACCUMULO-391

          commit e6a7c962f707487d832ba4b16c1f9066d13ff8f1
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sat Sep 28 22:53:42 2013 -0400

          The original single-table setters/getters now populate a "default" TableQueryConfig object under the hood. This should make the switch over much easier. Deprecated single table methods in light of the API changes for the new configuration object. ACCUMULO-391

          commit fdf4cadb16c29fc03a610cf83399ee26d7f83bc9
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sat Sep 28 21:58:40 2013 -0400

          Adding new TableQueryConfig object for setting multiple table info in the InputFormatBase
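          The table-id handling described in the ACCUMULO-1732 and 273ee49 commit messages above (the split keeps the table id, the scanner uses the table's current "working" name, and everything else keeps the name configured at job setup) can be sketched as follows. `SplitSketch` is hypothetical, the `Map` stands in for a live id-to-name lookup against the cluster, and failing fast when the id no longer resolves is an assumption of this sketch.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical split: records the table id alongside the table name configured at job setup. */
final class SplitSketch {
    private final String tableId;
    private final String configuredTableName;

    SplitSketch(String tableId, String configuredTableName) {
        this.tableId = Objects.requireNonNull(tableId);
        this.configuredTableName = Objects.requireNonNull(configuredTableName);
    }

    /** The name the job was set up with; this is still what the split reports. */
    String getTableName() {
        return configuredTableName;
    }

    /**
     * Resolves the id to the table's current ("working") name for the scanner.
     * idToCurrentName stands in for a live id-to-name lookup.
     */
    String workingTableName(Map<String, String> idToCurrentName) {
        String current = idToCurrentName.get(tableId);
        if (current == null) {
            throw new IllegalStateException(
                "table id " + tableId + " no longer exists (configured as " + configuredTableName + ")");
        }
        return current;
    }
}
```

          Resolving by id rather than by name is what makes a rename between job setup and split processing harmless: ids are stable, while names are not.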


          commit fdf4cadb16c29fc03a610cf83399ee26d7f83bc9
          Author: Corey J. Nolet <cjnolet@gmail.com>
          Date: Sat Sep 28 21:58:40 2013 -0400

          Adding new TableQueryConfig object for setting multiple table info in the InputFormatBase
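The commits above describe a per-table configuration object (TableQueryConfig, later renamed InputTableConfig) that lets a single job read from several tables. The following is a minimal, self-contained sketch of that shape and makes no claims about Accumulo's actual API. `TableScanConfig` and `MultiTableInput` are hypothetical names, ranges and columns are simplified to strings, and the job-level object is just an ordered map from table name to that table's settings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-table input settings (illustrative only, NOT Accumulo's class).
final class TableScanConfig {
  private final List<String> ranges = new ArrayList<>();  // row ranges, simplified to strings
  private final List<String> columns = new ArrayList<>(); // column families to fetch

  TableScanConfig addRange(String range) { ranges.add(range); return this; }
  TableScanConfig fetchColumn(String family) { columns.add(family); return this; }
  List<String> getRanges() { return Collections.unmodifiableList(ranges); }
  List<String> getColumns() { return Collections.unmodifiableList(columns); }
}

// Hypothetical job-level holder: one settings object per input table name,
// kept in insertion order so splits could be generated table by table.
final class MultiTableInput {
  private final Map<String, TableScanConfig> configs = new LinkedHashMap<>();

  MultiTableInput setTableConfig(String table, TableScanConfig cfg) {
    configs.put(table, cfg);
    return this;
  }

  Map<String, TableScanConfig> getTableConfigs() {
    return Collections.unmodifiableMap(configs);
  }
}
```

Keying the settings by table name means each table can carry its own ranges and columns, which a single set of job-wide setters cannot express.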

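The table-id handling in commits 273ee49 and 10b4eb8 can be sketched as follows. The sketch assumes a simplified split that stores both the table id and the table name configured at job setup. `SplitInfo` is a hypothetical class, and the current id-to-name mapping is passed in as a plain map rather than looked up from a live instance.

```java
import java.util.Map;

// Hypothetical, simplified input split: it records the stable table id plus
// the table name that was configured when the job was set up.
final class SplitInfo {
  final String tableId;
  final String configuredName;

  SplitInfo(String tableId, String configuredName) {
    this.tableId = tableId;
    this.configuredName = configuredName;
  }

  // Resolve the "working" name the scanner should use. The id survives a
  // rename, so look up whatever name the id maps to at processing time.
  String workingName(Map<String, String> currentIdToName) {
    String current = currentIdToName.get(tableId);
    if (current == null) {
      throw new IllegalStateException("table id " + tableId + " no longer exists");
    }
    return current;
  }
}
```

Only the scanner needs the working name. Everything else, such as reporting which table a split came from, can keep using `configuredName`, as the commit messages describe.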
          bills William Slacum added a comment -

          From a high-level perspective (aka "I haven't read the changeset yet"), the single-table case should be implemented as a special case of the multi-table paradigm. If we can support the old interface, that's great, but it should be deprecated so that we're not bound to it.
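The following rough illustration shows the single-table case implemented as a special case of the multi-table one. It uses hypothetical names (`InputSettings`, `setDefaultTable`, `addRange`) that are not Accumulo's actual methods. The deprecated single-table calls write into the entry for a remembered default table, similar in spirit to the "default" TableQueryConfig described in commit e6a7c96.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input settings where the single-table API is a thin wrapper
// over the multi-table one (illustrative names, not Accumulo's).
final class InputSettings {
  private final Map<String, List<String>> rangesByTable = new LinkedHashMap<>();
  private String defaultTable; // target of the legacy single-table calls

  // Multi-table API: the primary way to configure input.
  void addRange(String table, String range) {
    rangesByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(range);
  }

  // Legacy single-table API: kept for compatibility, deprecated so the
  // project is not bound to it. It only remembers which entry to use.
  @Deprecated
  void setDefaultTable(String table) {
    defaultTable = table;
    rangesByTable.computeIfAbsent(table, t -> new ArrayList<>());
  }

  // Legacy overload: delegates to the multi-table method for the default table.
  @Deprecated
  void addRange(String range) {
    if (defaultTable == null) {
      throw new IllegalStateException("setDefaultTable must be called first");
    }
    addRange(defaultTable, range);
  }

  Map<String, List<String>> tables() { return rangesByTable; }
}
```

Because the legacy methods only delegate, removing them later does not affect the multi-table code path.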

          sonixbp Corey J. Nolet added a comment - - edited

          Keith Turner, William Slacum, and any other interested parties: there didn't seem to be a final consensus on whether the current single-table API methods should be deprecated in favor of the new multi-table input configuration. Can I merge what I have without deprecating those methods for now? Or, if everyone in this thread is comfortable with the API changes, should we just deprecate them?

          jira-bot ASF subversion and git services added a comment -

          Commit 3227a822379718d6c1297f11d7af37a716f78a60 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=3227a82 ]

          Adding the following:

          • Deprecation to InputConfigurator, mapred.InputFormatBase, mapreduce.InputFormatBase
          • Comments to TableQueryConfig
          • Multi-table support to mapred.InputFormatBase

          ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit 6648e8a1c97939f740b24f9368ecda9f7072cbd2 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=6648e8a ]

          Fixing some more formatting. Adding license headers. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit 53bcc85689510fc988c9e9f6aff0da0cb7091c6c in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=53bcc85 ]

          Cleaning up tests. Adding test for legacy input for base + new multi-table methods. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit e4e05c804ea7f486290181f0246cf6b2880f5d1a in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e4e05c8 ]

          Fixing some formatting. Adding some comments. ACCUMULO-391

          sonixbp Corey J. Nolet added a comment -

          That's a great question, Josh. Originally, I changed the @since tags to 1.6.0, but since I only changed a few lines in the method, I couldn't justify the change to the javadoc. It would be nice if we could define a general practice for this.

          On another note: are we planning on deprecating the existing legacy methods for adding input to a single table? I'll add those in before I merge as well, but I wanted to make sure that's what everyone wanted.

          elserj Josh Elser added a comment -

          Looks good to me. Nitpicking: can you run the formatter over TableQueryConfig and InputConfigurator before pulling into master? It looked like those were still a little goofy.

          A broader general question: the mapred and mapreduce InputFormatBase classes both have a protected static getTabletLocator method, tagged @since 1.5.0, whose signature was changed by these changes. Do we just update the javadoc from @since 1.5.0 to @since 1.6.0 because it wasn't part of the "public API"? Do we have any general practice for javadoc tags on internal API changes?

          sonixbp Corey J. Nolet added a comment -

          I have the requested API changes in origin/ACCUMULO-391 if anyone would like to review them. I have held off on deprecating the old methods until there is consensus to do so.

          On Mon, Sep 30, 2013 at 8:09 PM, ASF subversion and git services (JIRA) <

          jira-bot ASF subversion and git services added a comment -

          Commit 6fe1d53d64f303b14e2b040f41645efdab6c4e75 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=6fe1d53 ]

          Fixing some formatting. Adding some comments. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit fca2731f28827d54514188ac8334ad6f714b12df in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=fca2731 ]

          The original single-table setters/getters now populate a "default" TableQueryConfig object under the hood. This should make the switch over much easier. Deprecated single table methods in light of the API changes for the new configuration object. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit c8a858323b69d5ba73dc684e9b0033772dbf2119 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=c8a8583 ]

          The legacy mapred InputFormatBase now verifies (and fixes the scanner for) a possible change in table name that could happen between the configuration of the map/reduce job and the actual processing of the scanner for a specific split. In that case, the most recent table name associated with the id is always used for the scanner (though the table name that was expected during job setup is still used in the RangeInputSplit). ACCUMULO-391
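          The rename handling described in this commit can be sketched as follows. This is a minimal illustration, not the code in the commit: `SplitInfo` and `ScannerTableResolver` are hypothetical names, and a plain map stands in for however the current table-id-to-name mapping is actually looked up when a split is processed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical split record: the stable table id plus the name seen at job setup. */
final class SplitInfo {
  final String tableId;
  final String expectedTableName;

  SplitInfo(String tableId, String expectedTableName) {
    this.tableId = tableId;
    this.expectedTableName = expectedTableName;
  }
}

/** Hypothetical helper that picks the table name a scanner should use for a split. */
final class ScannerTableResolver {
  // Stand-in for the live table id -> table name mapping, consulted when the
  // split is processed rather than when the job was configured.
  private final Map<String, String> currentNamesById;

  ScannerTableResolver(Map<String, String> currentNamesById) {
    this.currentNamesById = new HashMap<>(currentNamesById);
  }

  // Returns the most recent name for the split's table id. The split object is
  // not modified, so it still carries the name that was expected at job setup.
  String tableNameForScanner(SplitInfo split) {
    String current = currentNamesById.get(split.tableId);
    if (current == null) {
      throw new IllegalStateException("Table id " + split.tableId + " no longer exists");
    }
    return current;
  }
}
```

          Resolving through the id is what makes a rename harmless: the id is stable across renames, while the name recorded in the split is only informational.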

          jira-bot ASF subversion and git services added a comment -

          Commit 793b5183b4caaa6b7da0cbbb53c26b4e1463eeaf in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=793b518 ]

          Removing deprecation for now until we have some discussions. Updating/adding comments. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit 01b8f2a49dea1fc4c73260576091c5cf288386c7 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=01b8f2a ]

          The original single-table setters/getters now populate a "default" TableQueryConfig object under the hood. This should make the switch over much easier. Deprecated single table methods in light of the API changes for the new configuration object. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit d340d82c08d4b2181d6900cea1455913f268ba6e in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=d340d82 ]

          The legacy mapred InputFormatBase now verifies (and fixes the scanner for) a possible change in table name that could happen between the configuration of the map/reduce job and the actual processing of the scanner for a specific split. In that case, the most recent table name associated with the id is always used for the scanner (though the table name that was expected during job setup is still used in the RangeInputSplit). ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit 75cccccb3e73d86aab77728a2813d8ade79efbe7 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=75ccccc ]

          Removing deprecation for now until we have some discussions. Updating/adding comments. ACCUMULO-391

          kturner Keith Turner added a comment -

          I don't mind making some or all of the API changes that we've discussed here, but if you think it'd be better for you to implement them, then I don't mind that either.

          I only wanted to pick it up if you were not interested. I do not have a specific solution in mind. William Slacum raises a good point: if the Java API is improved, it may not be possible to misconfigure through the API, but a user could still create a wacky config by setting properties directly on the job config. So we will still need to sanity-check the config, and it sounds like you may already have that done.

          sonixbp Corey J. Nolet added a comment -

          Keith Turner, I finished the requested changes locally last night, and I implemented ACCUMULO-1732 as well. I do have an exception being thrown when a table referenced in setRanges(), addIterator(), or fetchColumns() has not been set, though it looks like the requested API changes will fix that.

          Anyway, I was going to push those up to the remote branch tonight. I also made the requested formatting changes (and fixed IntelliJ). I don't mind making some or all of the API changes that we've discussed here, but if you think it'd be better for you to implement them, then I don't mind that either. Anything I can do to help.

          kturner Keith Turner added a comment -

          Corey J. Nolet, I do not want to impede progress; this will be a very nice feature to have for 1.6.0. I would like to see the API improved, and I can spend time on that. How can we move forward? One possible way to proceed is that you iron out all of the issues identified except for configuring tables not read, and I can pick up from there. I can work in the branch you created (you could rebase it first if you want a more concise commit history).

          sonixbp Corey J. Nolet added a comment - - edited

          I don't like forcing the user to follow a specific configuration order. I know they'll only need to configure it once, but that's a tedious trial-and-error process until they either pull down the codebase or figure out the right order in which to call the methods. Perhaps a clear warning (or an exception) during getInputSplits() or initialize() in the mappers would be enough for someone to see in the logs why their job failed. I agree with William: they'll only need to do this once in most cases.

          On the other topic: the iterators, ranges, and columns are inherently tied to a table. In the case of a single-table input format, I can see why separate methods could be used. I like the idea of having a TableConfiguration object that has the iterators, ranges, and columns serialized within it. It would simplify the API immensely and would also address the concern that each configuration be in a valid state by the time getInputSplits() is called. Perhaps this could also be used in the MultiTableBatchScanner implementation.
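          To make the idea concrete, here is a minimal sketch of a per-table object that carries its own ranges, iterators, and columns and round-trips them through a flat string map standing in for the job's Configuration. The class name, the property-key layout, and the use of plain strings for ranges and iterators are all assumptions for illustration; this is not Accumulo's serialization format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical per-table input settings; values are plain strings for illustration. */
final class TableInputSettings {
  final String table;
  final List<String> ranges = new ArrayList<>();
  final List<String> iterators = new ArrayList<>();
  final List<String> columns = new ArrayList<>();

  TableInputSettings(String table) {
    this.table = table;
  }

  // Writes this table's settings under an index-based prefix, so the table
  // name is stored as a value and never has to be parsed out of a key.
  void writeTo(Map<String, String> conf, int index) {
    String p = "input.table." + index + ".";
    conf.put(p + "name", table);
    writeList(conf, p + "range.", ranges);
    writeList(conf, p + "iterator.", iterators);
    writeList(conf, p + "column.", columns);
  }

  // Reads back the settings written by writeTo for the same index.
  static TableInputSettings readFrom(Map<String, String> conf, int index) {
    String p = "input.table." + index + ".";
    String name = conf.get(p + "name");
    if (name == null) {
      throw new IllegalArgumentException("No table configured at index " + index);
    }
    TableInputSettings s = new TableInputSettings(name);
    s.ranges.addAll(readList(conf, p + "range."));
    s.iterators.addAll(readList(conf, p + "iterator."));
    s.columns.addAll(readList(conf, p + "column."));
    return s;
  }

  private static void writeList(Map<String, String> conf, String prefix, List<String> values) {
    conf.put(prefix + "count", Integer.toString(values.size()));
    for (int i = 0; i < values.size(); i++) {
      conf.put(prefix + i, values.get(i));
    }
  }

  private static List<String> readList(Map<String, String> conf, String prefix) {
    int count = Integer.parseInt(conf.getOrDefault(prefix + "count", "0"));
    List<String> values = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      values.add(conf.get(prefix + i));
    }
    return values;
  }
}
```

          Keying entries by position rather than by table name means a name containing the key separator (for example a dotted name) cannot be misparsed when the configuration is read back.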

          That's a significant API change to introduce in 1.6.0. We could preserve backwards compatibility by having the current single-table setters hydrate a TableConfiguration object under the hood that would be treated as a "default table".
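          The backwards-compatible "default table" could look roughly like the sketch below. `MultiTableInputConfig` and its methods are hypothetical, and only ranges are modeled. The legacy single-table setters write to a default slot in any order, and that slot is folded into the multi-table map when the configuration is resolved, so downstream code only ever handles the multi-table case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical multi-table config with legacy single-table setters layered on top. */
final class MultiTableInputConfig {
  // Explicit per-table entries (only ranges are modeled to keep this short).
  private final Map<String, List<String>> rangesByTable = new LinkedHashMap<>();
  // Settings written through the legacy single-table methods.
  private String defaultTableName;
  private final List<String> defaultRanges = new ArrayList<>();

  // Multi-table style: every setting names its table explicitly.
  void addRange(String table, String range) {
    rangesByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(range);
  }

  // Legacy setters: they only touch the default slot and may be called in any order.
  void setInputTableName(String table) {
    defaultTableName = table;
  }

  void setRanges(List<String> ranges) {
    defaultRanges.clear();
    defaultRanges.addAll(ranges);
  }

  // Resolved view used when computing splits: the default slot becomes one more
  // entry in the multi-table map.
  Map<String, List<String>> resolvedTables() {
    Map<String, List<String>> resolved = new LinkedHashMap<>();
    rangesByTable.forEach((table, ranges) -> resolved.put(table, new ArrayList<>(ranges)));
    if (defaultTableName != null) {
      resolved.computeIfAbsent(defaultTableName, t -> new ArrayList<>()).addAll(defaultRanges);
    } else if (!defaultRanges.isEmpty()) {
      throw new IllegalStateException("Ranges were set but no input table name was configured");
    }
    return resolved;
  }
}
```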

          bills William Slacum added a comment - - edited

          "jrrbs" == "jobs." As in, "They took our jrrbs!"

          Even so, the debugging case you're talking about will only happen when someone uses raw strings as the arguments rather than a variable (if the variable changes, the problem becomes applying the wrong iterator stack to a table, which I don't think we can really enforce). Until the mechanism for configuring jobs is more than applying partial mutations to a Configuration object, we're going to deal with issues like this, even in the single-table case. Any hardening of the API should happen after we've at least added this feature, so that we have it now and can consider it for the API change.

          kturner Keith Turner added a comment -

          There's nothing tying us to the current method of configuring jrrbs

          What are "jrrbs"?

          I wouldn't worry about having things configured for tables that aren't going to be scanned. I can't see a bug happening from it that wouldn't be caught with minimal exercise/testing.

          I am pretty sure people will end up spending lots of time trying to figure out why their scan-time iterators are not being executed because they mistyped a table name. I agree that they will most likely hit this while testing, but it's still wasting a developer's time.

          kturner Keith Turner added a comment -

          Thinking more about this issue of configuring tables you are not reading from, it's become really clear to me that it's a bug in the API. The API should not make this possible. API bugs are the most difficult to fix later, so let's fix this issue before adding it to 1.6.

          bills William Slacum added a comment -

          I've been a proponent of what Keith is talking about for a while. There's nothing tying us to the current method of configuring jrrbs besides convention (which I'm all for throwing out if the only reason for doing it is "everyone else is doing it"). The current Configurator mechanism is a mess, and there are constraints dumped into the Configuration (such as the only-setting-connection-info-once constraint) that the user has to jump through at least one hoop to even check. I think this discussion is best suited to another ticket.

          To get back on topic, I wouldn't worry about having things configured for tables that aren't going to be scanned. It'd be nice to warn about it, but it's not a show stopper. It's extra state that goes unevaluated, so really it's just trimming the fat. I can't see a bug happening from it that wouldn't be caught with minimal exercise/testing.

          vines John Vines added a comment -

          Personally, I am NOT a fan of referencing external static methods this way.

          kturner Keith Turner added a comment -

          What do you think about taking the union of all the tables configured as input tables, or with ranges, iterators, or fetched columns?

          I think this may be easier to use but more error-prone. There are more places where you can mistype a table name.

          A fluent class for configuring tables to read would be another alternative. The class could serialize itself to the job, or a static method could take the job and varargs of these table config objects. This completely eliminates the problem from the API: the API does not allow setting configuration for a table you are not reading, and the table name is only specified once (in the constructor of the config object).
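          A minimal sketch of the fluent alternative, using hypothetical `TableReadConfig` and `FakeJob` classes (the latter stands in for a Hadoop Job). The table name appears only in the constructor, and per-table settings can only be attached through an object that is itself registered as an input, so configuring an unread table cannot be expressed. These are illustrative names, not the API that was eventually committed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a Hadoop Job; it only records the registered input tables. */
final class FakeJob {
  final Map<String, TableReadConfig> inputTables = new LinkedHashMap<>();
}

/** Hypothetical fluent per-table configuration; the table name is fixed at construction. */
final class TableReadConfig {
  private final String table;
  private final List<String> ranges = new ArrayList<>();
  private final List<String> iterators = new ArrayList<>();
  private final List<String> columns = new ArrayList<>();

  TableReadConfig(String table) {
    this.table = table;
  }

  TableReadConfig range(String range) {
    ranges.add(range);
    return this;
  }

  TableReadConfig iterator(String iterator) {
    iterators.add(iterator);
    return this;
  }

  TableReadConfig fetchColumn(String column) {
    columns.add(column);
    return this;
  }

  String table() { return table; }
  List<String> ranges() { return Collections.unmodifiableList(ranges); }
  List<String> iterators() { return Collections.unmodifiableList(iterators); }
  List<String> columns() { return Collections.unmodifiableList(columns); }

  // Every config passed here is, by construction, a table that will be read,
  // so there is no way to attach settings to an unread table. The job is only
  // updated if all configs are valid.
  static void setInputTables(FakeJob job, TableReadConfig... configs) {
    Map<String, TableReadConfig> tables = new LinkedHashMap<>();
    for (TableReadConfig config : configs) {
      if (tables.putIfAbsent(config.table, config) != null) {
        throw new IllegalArgumentException("Table configured twice: " + config.table);
      }
    }
    job.inputTables.clear();
    job.inputTables.putAll(tables);
  }
}
```

          With this shape, the only remaining validation is rejecting a table that is configured twice; the "settings for an unread table" case has no way to be written.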

          kturner Keith Turner added a comment -

          I've been habitually importing static methods to make the code easier to read but I can change that back easily if it's confusing.

          Making code improvements that are not strictly required for the task at hand is something we all do. There are many considerations; do what you think is best.

          kturner Keith Turner added a comment -

          I was trying to stay away from enforcing the order in which the user calls the static configuration methods but maybe this is unavoidable

          It does not have to fail fast; it could fail when it's trying to create the input splits. It would just be nice if it failed at some point when a user specifies configuration for a table that's not being read. Failing later will make it slightly harder to debug, but much easier to debug than silently ignoring it.
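          One way to "fail at some point" without constraining call order is to validate once, when the input splits are computed. In the sketch below, `InputConfigValidator` and the per-table maps are hypothetical. It collects every configured table that is not an input and reports them together, so a mistyped table name produces an error instead of being silently ignored.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical lazy check, meant to run at the start of split computation. */
final class InputConfigValidator {
  // Because this runs once at the end, users may call the setters in any order.
  static void validate(Set<String> inputTables,
      Map<String, ?> iteratorsByTable,
      Map<String, ?> rangesByTable,
      Map<String, ?> columnsByTable) {
    Set<String> unread = new TreeSet<>();
    collectUnread(inputTables, iteratorsByTable, unread);
    collectUnread(inputTables, rangesByTable, unread);
    collectUnread(inputTables, columnsByTable, unread);
    if (!unread.isEmpty()) {
      throw new IllegalArgumentException(
          "Configuration references tables that are not set as inputs: " + unread);
    }
  }

  // Adds every table that has settings but is not in the input set.
  private static void collectUnread(Set<String> inputTables, Map<String, ?> perTable,
      Set<String> unread) {
    for (String table : perTable.keySet()) {
      if (!inputTables.contains(table)) {
        unread.add(table);
      }
    }
  }
}
```

          Swapping the exception for a logged warning would give the softer behavior suggested elsewhere in the thread.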

          billie.rinaldi Billie Rinaldi added a comment -

          What do you think about taking the union of all the tables configured as input tables, or with ranges, iterators, or fetched columns?
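          The union approach can be sketched in a few lines, assuming hypothetical maps from table name to configured settings. The `InputTableUnion` name is illustrative only. The test case also shows the downside Keith points out: a typo in one setting is not reported, it just becomes another input table.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: input tables are every table named anywhere in the config. */
final class InputTableUnion {
  static Set<String> inputTables(Collection<String> explicitInputs,
      Map<String, ?> rangesByTable,
      Map<String, ?> iteratorsByTable,
      Map<String, ?> columnsByTable) {
    // Start from the explicitly listed inputs, then add any table that has
    // ranges, iterators, or fetched columns configured.
    Set<String> tables = new TreeSet<>(explicitInputs);
    tables.addAll(rangesByTable.keySet());
    tables.addAll(iteratorsByTable.keySet());
    tables.addAll(columnsByTable.keySet());
    return tables;
  }
}
```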

          sonixbp Corey J. Nolet added a comment -

          What happens if I configure iterators for a table that I did not set as an input?

          I was trying to stay away from enforcing the order in which the user calls the static configuration methods but maybe this is unavoidable. I could always throw an exception in the initialize() method but that would be too late.
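          For contrast, the order-enforcing option would reject a per-table setting at the moment it is made, which only works if the input tables are configured first. A minimal sketch, with `OrderedInputConfig` as a hypothetical class and iterators represented as plain strings:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical config that fails fast when a setting targets an unknown table. */
final class OrderedInputConfig {
  private final Set<String> inputTables = new LinkedHashSet<>();
  private final Map<String, List<String>> iteratorsByTable = new LinkedHashMap<>();

  void addInputTable(String table) {
    inputTables.add(table);
  }

  // The table must already be registered as an input, which forces callers to
  // configure input tables before any per-table settings.
  void addIterator(String table, String iterator) {
    if (!inputTables.contains(table)) {
      throw new IllegalStateException("Table " + table + " is not configured as an input table");
    }
    iteratorsByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(iterator);
  }

  List<String> iterators(String table) {
    return iteratorsByTable.getOrDefault(table, Collections.emptyList());
  }
}
```

          The cost is the one described above: calling the setters in the wrong order becomes an error even when the final configuration would have been valid.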

          sonixbp Corey J. Nolet added a comment -

          Something appears to be adding a space between a method name and the parenthesis,

          I'm using a new version of IntelliJ and I got a new MacBook Pro recently. Once upon a time I had the Eclipse code style plugin installed and had tweaked it correctly, though it looks like I need to do that again on this new system. I just noticed the space before the parens. I'll have to look in the IntelliJ settings to figure out why that happened.

          instead of just InputFormatBase, changing calls to this method from AccumuloInputFormat.addIterator to just addIterator. I found this to be confusing.

          I've been habitually importing static methods to make the code easier to read but I can change that back easily if it's confusing.

          Keith Turner, I'll make the changes you requested and work on ACCUMULO-1732 before merging the multi-table input format.

          billie.rinaldi Billie Rinaldi added a comment -

          The changes to AccumuloRowInputFormatTest only add printlns, so that diff should be removed as well. There's a println added to TokenFileTest which should be removed.

          I agree with Keith that we should keep the setInputTableName method undeprecated.

          billie.rinaldi Billie Rinaldi added a comment -

          Also it looks like TypedValueCombiner, TTimeoutTransport, and core/src/test/resources/log4j.properties shouldn't be included in the patch.

          Which IDE are you using? Something appears to be adding a space between a method name and the parenthesis, e.g. "getInstance (" which doesn't match our code style.

          Also, it may be your IDE that has changed the AccumuloInputFormatTest to import InputFormatBase.addIterator and other static methods instead of just InputFormatBase, changing calls to this method from AccumuloInputFormat.addIterator to just addIterator. I found this to be confusing.

          sonixbp Corey J. Nolet added a comment -

          Yeah, I used git to make the patch, but I believe I did a 'git reset --hard HEAD' when I was done with my commit, just to get rid of any excess files.

          elserj Josh Elser added a comment -

          "I may have meant to do a 'git rm' and did not. I presume it may have shown up again when I made the patch. I'll remove that file."

          If you make a commit and use 'git format-patch', it should help simplify the patches (http://accumulo.apache.org/git.html#contributors).

          sonixbp Corey J. Nolet added a comment - - edited

          Sorry about that, Billie. I thought I removed that file, but I may have meant to do a 'git rm' and did not. I presume it may have shown up again when I made the patch. I'll remove that file.

          billie.rinaldi Billie Rinaldi added a comment - - edited

          Anyone have comments on using TableKey vs. a plain Key?

          Never mind, it looks like TableKey is still there but isn't being used. I'm still looking over the patch, but the only comment I have so far is that it seems to include diffs for a couple of files it shouldn't.

          kturner Keith Turner added a comment -

          While looking at the patch I thought of ACCUMULO-1732. For some reason this change made that issue really apparent to me.

          kturner Keith Turner added a comment - - edited

          A few thoughts:

          • Why deprecate setting a single input table? This is a common use case.
          • I think table should come after job in method parameters. Like addIterator(job, table, iterCfg) instead of addIterator(job, iterCfg, table).
          • What happens if I configure iterators for a table that I did not set as an input?
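          Keith's second and third points can be illustrated together. The sketch below is hypothetical (MultiTableJobConfig is a stand-in, not Accumulo's API): it mirrors the static-configurator style with the table placed right after the job, and defers the "iterator for a non-input table" check to a validate() step. Running such a check from getSplits(), which Hadoop normally calls on the client during job submission, would report the mistake earlier than initialize() in a task, without forcing callers to invoke the setters in a particular order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a job configuration; not Accumulo's API.
final class MultiTableJobConfig {
  private final Set<String> inputTables = new LinkedHashSet<>();
  // Iterator names per table; a real version would store IteratorSettings.
  private final Map<String, List<String>> iteratorsByTable = new LinkedHashMap<>();

  static void addInputTable(MultiTableJobConfig job, String table) {
    job.inputTables.add(table);
  }

  // Table comes directly after the job: addIterator(job, table, iterCfg).
  static void addIterator(MultiTableJobConfig job, String table, String iterCfg) {
    job.iteratorsByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(iterCfg);
  }

  // Intended to run once all setters have been called (e.g. when splits
  // are computed), so the setters themselves can be called in any order.
  static void validate(MultiTableJobConfig job) {
    for (String table : job.iteratorsByTable.keySet()) {
      if (!job.inputTables.contains(table)) {
        throw new IllegalStateException(
            "Iterators configured for table '" + table + "', which is not an input table");
      }
    }
  }
}
```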

          Also, some formatting changes like the following seem odd.

          -    InputConfigurator.setAutoAdjustRanges(CLASS, job, enableFeature);
          +    InputConfigurator.setAutoAdjustRanges (CLASS, job, enableFeature);
          
          sonixbp Corey J. Nolet added a comment -

          I've addressed all the issues on this thread. Is everyone okay with me merging this in?

          sonixbp Corey J. Nolet added a comment - - edited

          Patch for 1.6.0 branch. Let me know if you guys accept this and I'll merge my origin/ACCUMULO-391 branch over to master (I've already merged master into it so it's up to date as of right now).

          jira-bot ASF subversion and git services added a comment -

          Commit e1a39c0ec936c7cf4638c379d3c3db9b3335d99f in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e1a39c0 ]

          Removing unused files. Finalizing unit tests and documentation of InputFormatBase. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit 335acb8def354990a80cc081428a28d671f7e25c in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=335acb8 ]

          Merging map/reduce test for AccumuloInputFormat with the MultiTable version. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit d344ad6d112c9268480b6baca2f8700fd7d5a2a3 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=d344ad6 ]

          Cleaning things up. Merging MultiTableInputFormatTest with AccumuloInputFormatTest. Columns now able to be set globally or per-table. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit f8d28e7be09ed8947569e6206768f66c6db31267 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=f8d28e7 ]

          ACCUMULO-391. Iterators can be set per table. The addIterator() method that does not take a table now applies that iterator to all tables.
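          The behavior in this commit message (a table-less addIterator() applying to every table) can be sketched as a registry holding global and per-table iterators. IterSetting and IteratorRegistry are hypothetical, and merging the two lists then sorting by priority is an assumption: the commit message does not say how the real code combines them. Sorting by priority follows the usual Accumulo convention that scan iterators are applied in priority order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal iterator setting: a priority and a name.
final class IterSetting {
  final int priority;
  final String name;

  IterSetting(int priority, String name) {
    this.priority = priority;
    this.name = name;
  }
}

// Hypothetical registry of iterators for every table plus per-table ones.
final class IteratorRegistry {
  private final List<IterSetting> global = new ArrayList<>();
  private final Map<String, List<IterSetting>> perTable = new HashMap<>();

  // No table given: the iterator applies to all input tables.
  void addIterator(IterSetting cfg) {
    global.add(cfg);
  }

  // Table given: the iterator applies to that table only.
  void addIterator(String table, IterSetting cfg) {
    perTable.computeIfAbsent(table, t -> new ArrayList<>()).add(cfg);
  }

  // Effective stack for one table: global plus table-specific, by priority.
  List<IterSetting> iteratorsFor(String table) {
    List<IterSetting> result = new ArrayList<>(global);
    result.addAll(perTable.getOrDefault(table, Collections.emptyList()));
    result.sort(Comparator.comparingInt((IterSetting s) -> s.priority));
    return result;
  }
}
```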

          jira-bot ASF subversion and git services added a comment -

          Commit 48c2d30b862baf11d2735e0d001f96e9207753e1 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=48c2d30 ]

          ACCUMULO-391 columns can be set per table

          jira-bot ASF subversion and git services added a comment -

          Commit 8c76062a2896979c7ee65baf8156b864cd84129a in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=8c76062 ]

          Getting tests running for multi table input format. Still need to provide iterator serialization. ACCUMULO-391

          jira-bot ASF subversion and git services added a comment -

          Commit e781879d217476793c18725476453a58e9258d02 in branch refs/heads/ACCUMULO-391 from Corey J. Nolet
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e781879 ]

          Pulled William Slacum's multi-table input format into the current design with the InputConfigurator. ACCUMULO-391

          sonixbp Corey J. Nolet added a comment - - edited

          Agreed. Last night I pushed a remote branch with the initial patch applied as-is. The refactor of the constants into the MetadataSchema, the removal of the ContextFactory, and a couple of other small changes broke the patch. I'm getting the tests running in their current state, and then I'll modify the current InputFormatBase to provide multi-table support (and address your comments above).

          billie.rinaldi@gmail.com Billie Rinaldi added a comment -

          -1 for this patch as is. My comments above have not been addressed.

          bills William Slacum added a comment -

          YES I MIND. j/k have at it

          sonixbp Corey J. Nolet added a comment - - edited

          It's been a couple of months, and I now have a requirement for this. The patches listed above were submitted in 2012. I'll work on this and try to salvage what I can from them, unless this is already close to the finish line.

          Bill, you are currently the assignee. Mind if I transfer it?

          ctubbsii Christopher Tubbs added a comment -

          "I don't currently know what's diverged in the input format since I made the last patch"

          A lot of the mapreduce API changed in 1.5 (for the better, I hope). There's also the switch to authentication tokens in place of passwords, and in 1.6, there's support for storing tokens in a file. So, those considerations should be made, and we've also added support for the "old" mapred packages, so you may want to look at making a version for that also.

          pradeepg26 Pradeep Gollakota added a comment -

          I'm also available to help with this task if needed.

          bills William Slacum added a comment -

          I can try applying the patch and getting a 1.4 version back up and running. I don't currently know what's diverged in the input format since I made the last patch, but it should apply since I made mostly new classes.

          ctubbsii Christopher Tubbs added a comment -

          "Any thoughts on when this would be included?"

          I'll tag it for 1.6.0, so we are sure to at least review the possibility of including it in 1.6.0, if somebody has time to work on it. I would encourage you to vote for tickets you are in favor of (a JIRA feature), though. That could help us prioritize tickets.

          pradeepg26 Pradeep Gollakota added a comment -

          This would be a great addition.

          We have just started working with Pig (with Accumulo) at my company. The first thing that we noticed is that in a lot of situations, where we are joining data from one Accumulo table to data from another, we have to first dump the data from both tables to HDFS (perhaps using PigStorage), load the data back and then join the data. This was because the scan information is encoded in the job configuration. So, when Pig uses the MultiInputFormat to scan both tables in the same job, only one table ends up getting exported from Accumulo.

          If this is completed, we could use the MultiTableInputFormat instead of Accumulo(Row)InputFormat to optimize our pig scripts.

          Any thoughts on when this would be included?
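          The Pig problem described above comes down to a key collision: each single-table loader writes its scan settings under the same fixed configuration key, so whichever loader configures the job last wins. A minimal sketch, with a Map standing in for Hadoop's Configuration and a hypothetical key name:

```java
import java.util.Map;

// Sketch of why two single-table loaders in one job collide.
// The Map stands in for Hadoop's Configuration; the key name is hypothetical.
final class SharedJobConf {
  static final String TABLE_KEY = "accumulo.input.table";  // hypothetical

  // Each single-table loader writes its table under the same fixed key.
  static void configureLoader(Map<String, String> conf, String table) {
    conf.put(TABLE_KEY, table);
  }

  // At run time every task reads that one key, so only the last table survives.
  static String tableSeenByTasks(Map<String, String> conf) {
    return conf.get(TABLE_KEY);
  }
}
```

          A multi-table input format avoids this by keeping one configuration entry (or one structured value) per table instead of a single shared slot.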

          jdonofrio Jim Donofrio added a comment -

          created ACCUMULO-712

          jvines jv added a comment -

          You are not the first person to ask that, and I think it is a good idea. However, that is the scope of a new ticket, and since you're the one who brought it up, I'll give you the honors.

          jdonofrio Jim Donofrio added a comment -

          I am still learning about Accumulo, but would there be any advantage to the input format offering the option to combine multiple tablets across tables into a given split if they are on the same tablet server, in case some mappers complete their queries very quickly? This would be similar to the CombineFileInputFormat over HDFS files.
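          The idea in this question can be sketched as grouping tablets by hosting server and then packing each server's tablets, possibly from different tables, into splits of bounded size. This is a hypothetical illustration only: TabletRange and CombinedSplits are invented names, ranges are plain strings, and nothing in the patches implements this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical description of one tablet's range in some table.
final class TabletRange {
  final String table;
  final String range;
  final String server;  // tablet server currently hosting the tablet

  TabletRange(String table, String range, String server) {
    this.table = table;
    this.range = range;
    this.server = server;
  }
}

final class CombinedSplits {
  // Groups tablets by hosting server, then cuts each server's list into
  // splits of at most maxPerSplit tablets, which may mix tables.
  static List<List<TabletRange>> combine(List<TabletRange> tablets, int maxPerSplit) {
    if (maxPerSplit < 1) {
      throw new IllegalArgumentException("maxPerSplit must be >= 1");
    }
    Map<String, List<TabletRange>> byServer = new LinkedHashMap<>();
    for (TabletRange t : tablets) {
      byServer.computeIfAbsent(t.server, s -> new ArrayList<>()).add(t);
    }
    List<List<TabletRange>> splits = new ArrayList<>();
    for (List<TabletRange> local : byServer.values()) {
      for (int i = 0; i < local.size(); i += maxPerSplit) {
        splits.add(new ArrayList<>(local.subList(i, Math.min(i + maxPerSplit, local.size()))));
      }
    }
    return splits;
  }
}
```

          Keeping each split on a single server preserves locality, which is the same trade-off CombineFileInputFormat makes for HDFS blocks.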

          billie.rinaldi Billie Rinaldi added a comment -

          The patch is looking pretty good. An additional thing you'll need to do is update the fetchColumns method so that you're fetching columns for particular tables. Also, how would you feel about getting rid of TableKey and leaving this as an InputFormat<Key,Value>? I suggest this because the RangeInputSplit already contains the table name, and a Mapper can access it through ((RangeInputSplit) context.getInputSplit()).getTableName(). It's somewhat awkward, but the advantage of keeping InputFormat<Key,Value> is that you can have Mappers that don't care which table they're running over and can be used with either the single table or multi-table input format. If we want to make it easier to grab the table name, we could add a public static method that pulls it from a Context, or from an InputSplit.
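          The design choice above (plain Key/Value records, with the table name read from the split) can be sketched without Hadoop. TableSplit, TaskContext, SplitTables, and TaggingMapper are hypothetical stand-ins; in the real API the mapper would cast context.getInputSplit() to RangeInputSplit and call getTableName(), and a helper like tableNameOf() is only the kind of convenience method suggested above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a RangeInputSplit that remembers its table.
final class TableSplit {
  private final String tableName;
  TableSplit(String tableName) { this.tableName = tableName; }
  String getTableName() { return tableName; }
}

// Hypothetical task context exposing the split being processed
// (simplified: no cast is needed here, unlike Hadoop's InputSplit).
final class TaskContext {
  private final TableSplit split;
  TaskContext(TableSplit split) { this.split = split; }
  TableSplit getInputSplit() { return split; }
}

final class SplitTables {
  // Static convenience method that pulls the table name from a context.
  static String tableNameOf(TaskContext context) {
    return context.getInputSplit().getTableName();
  }
}

// A mapper written against plain key/value pairs: it would work unchanged
// under a single-table or multi-table input format, and asks for the table
// only when it needs it.
final class TaggingMapper {
  final List<String> output = new ArrayList<>();

  void map(String key, String value, TaskContext context) {
    output.add(SplitTables.tableNameOf(context) + ":" + key + "=" + value);
  }
}
```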

          A separate thing I want to do is get rid of the AccumuloIterator and AccumuloIteratorOption configuration objects and just make IteratorSetting Writable so it can be used directly. I'll open another ticket about that, though.
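          Making an iterator setting "Writable" essentially means giving it symmetric write/readFields methods over DataOutput/DataInput, as Hadoop's Writable interface does. The sketch below uses a hypothetical SimpleIteratorSetting (priority, name, class, options) and does not implement Hadoop's interface, so it stays self-contained; the field layout is an assumption, not IteratorSetting's actual encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical IteratorSetting-like class with Writable-style methods.
final class SimpleIteratorSetting {
  int priority;
  String name;
  String iteratorClass;
  final Map<String, String> options = new TreeMap<>();

  // Writes fields in a fixed order; readFields must read them back in the same order.
  void write(DataOutput out) throws IOException {
    out.writeInt(priority);
    out.writeUTF(name);
    out.writeUTF(iteratorClass);
    out.writeInt(options.size());
    for (Map.Entry<String, String> e : options.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeUTF(e.getValue());
    }
  }

  void readFields(DataInput in) throws IOException {
    priority = in.readInt();
    name = in.readUTF();
    iteratorClass = in.readUTF();
    options.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      options.put(in.readUTF(), in.readUTF());  // key is read before value
    }
  }

  // Round-trips through bytes, as storing it in a configuration or split would.
  static SimpleIteratorSetting copy(SimpleIteratorSetting s) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    s.write(new DataOutputStream(bytes));
    SimpleIteratorSetting c = new SimpleIteratorSetting();
    c.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return c;
  }
}
```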

          bills William Slacum added a comment -

          This should be fully fleshed out. I made an attempt to stick closely to the old API in terms of method names, but I made some modifications. Specifically, "per table" settings are encapsulated as map parameters and are only set once. This avoids having to constantly do String appends in the configuration and makes the code a bit simpler.

          I tested this by running continuous ingest into two tables and running a simple job (see https://github.com/wjsl/multi-table-if) against the data. I put in an iterator on each table to override the value to ensure that the proper iterator was being applied to the correct table.

          I figure there will be more changes requested, so I've marked this patch as not intended for inclusion. If it's acceptable, I'll change the license.
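          The verification approach described above (a value-overriding iterator per table, then checking that each record carries its own table's marker) can be sketched as follows. PerTableIteratorCheck and TableValue are hypothetical, the "override-<table>" marker format is an assumption, and UnaryOperator stands in for a real Accumulo iterator; scan() plays the role of the job under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class PerTableIteratorCheck {
  // Hypothetical record: the table it came from and its value.
  static final class TableValue {
    final String table;
    final String value;

    TableValue(String table, String value) {
      this.table = table;
      this.value = value;
    }
  }

  // Applies the "iterator" configured for each record's table (identity if none).
  static List<TableValue> scan(List<TableValue> raw, Map<String, UnaryOperator<String>> iterators) {
    List<TableValue> out = new ArrayList<>();
    for (TableValue e : raw) {
      UnaryOperator<String> it = iterators.getOrDefault(e.table, UnaryOperator.identity());
      out.add(new TableValue(e.table, it.apply(e.value)));
    }
    return out;
  }

  // Records whose value lacks their own table's marker were processed
  // by the wrong iterator (or by none at all).
  static List<TableValue> mismatches(List<TableValue> scanned) {
    List<TableValue> bad = new ArrayList<>();
    for (TableValue e : scanned) {
      if (!e.value.equals("override-" + e.table)) {
        bad.add(e);
      }
    }
    return bad;
  }
}
```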

          bills William Slacum added a comment -

          It might be a good idea to also consider ACCUMULO-507 when refactoring the input format.

          billie.rinaldi Billie Rinaldi added a comment -

          I think we could probably expand the existing InputFormatBase to cover the multi-table case. This would require making columns, ranges, and iterators per-table. Columns and iterators are only accessed on a per-table basis, so the table could be encoded in the property key and the value could be left the same, e.g. conf.set(ITERATORS + "." + Base64.encodeBase64(tableName.getBytes()), iterators). (Although I think in the case of iterators we should get rid of the separate iterators and iterator options properties and just have one combined property. I'd also like to see more standardization in the encodings we're using for property values.) The ranges are pulled from the configuration all at once, so we should leave them under the RANGES property key and have either a hierarchical structure in the value, or a flat structure where the table name is included with each range. I would suggest new methods to replace the existing ones of the same names:

          void setInputInfo(Configuration conf, String user, byte[] passwd, Authorizations auths)
          void setRanges(Configuration conf, Text tableName, Collection<Range> ranges)
          void fetchColumns(Configuration conf, Text tableName, Collection<Pair<Text,Text>> columnFamilyColumnQualifierPairs)
          void addIterator(Configuration conf, Text tableName, IteratorSetting cfg)
          TabletLocator getTabletLocator(Configuration conf, String tableName)
          Map<Text,List<Range>> getRanges(Configuration conf)
          Set<Pair<Text,Text>> getFetchedColumns(Configuration conf, String tableName)
          List<IteratorSetting> getIterators(Configuration conf, String tableName)
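          The two storage schemes proposed above (table encoded into the property key for iterators, and a flat RANGES value where each entry carries its table) can be sketched with a Map standing in for Hadoop's Configuration. The property prefixes are hypothetical and ranges are plain strings. java.util.Base64 is used here in place of the commons-codec call in the comment; either way, the Base64 alphabet contains no '.', ':' or ',', so encoded table names cannot clash with the separators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of per-table configuration keys; prefixes are hypothetical.
final class PerTableConf {
  static final String ITERATORS = "AccumuloInputFormat.iterators";  // hypothetical
  static final String RANGES = "AccumuloInputFormat.ranges";        // hypothetical

  static String encode(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  static String decode(String s) {
    return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
  }

  // Iterators: the table lives in the property key; the value format is unchanged.
  static String iteratorKey(String table) {
    return ITERATORS + "." + encode(table);
  }

  // Ranges: one flat property where every entry carries its encoded table.
  static void addRange(Map<String, String> conf, String table, String range) {
    String entry = encode(table) + ":" + encode(range);
    String existing = conf.get(RANGES);
    conf.put(RANGES, existing == null ? entry : existing + "," + entry);
  }

  // Reads all ranges at once and groups them by table, like getRanges(conf).
  static Map<String, List<String>> getRanges(Map<String, String> conf) {
    Map<String, List<String>> byTable = new LinkedHashMap<>();
    String all = conf.get(RANGES);
    if (all == null || all.isEmpty()) {
      return byTable;
    }
    for (String entry : all.split(",")) {
      String[] parts = entry.split(":", 2);
      byTable.computeIfAbsent(decode(parts[0]), t -> new ArrayList<>()).add(decode(parts[1]));
    }
    return byTable;
  }
}
```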
          

          To provide backwards compatibility, we could also keep the old setInputInfo/setRanges/fetchColumns/addIterator methods and have a concept of a default table specified in setInputInfo that will be the table used whenever a table isn't specified for setRanges/fetchColumns/addIterator.
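          The backwards-compatibility idea of a default table can be sketched so that the old table-less setters are resolved lazily, at read time. That also avoids forcing callers to call setInputInfo first. DefaultTableConf is hypothetical, credentials are omitted, and letting an explicit per-table setting win over a table-less one is an arbitrary choice for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "default table" idea; names are illustrative only.
final class DefaultTableConf {
  private String defaultTable;              // from the old-style setInputInfo
  private List<String> tablelessRanges;     // from the old table-less setRanges
  private final Map<String, List<String>> rangesByTable = new HashMap<>();

  // Credentials and authorizations are omitted from this sketch.
  void setInputInfo(String table) {
    defaultTable = table;
  }

  // New table-aware setter.
  void setRanges(String table, List<String> ranges) {
    rangesByTable.put(table, new ArrayList<>(ranges));
  }

  // Old setter kept for compatibility; its table is resolved later.
  void setRanges(List<String> ranges) {
    tablelessRanges = new ArrayList<>(ranges);
  }

  // Resolves table-less settings against the default table at read time.
  Map<String, List<String>> getRanges() {
    Map<String, List<String>> result = new HashMap<>(rangesByTable);
    if (tablelessRanges != null) {
      if (defaultTable == null) {
        throw new IllegalStateException("ranges set without a table, but no default table configured");
      }
      result.putIfAbsent(defaultTable, tablelessRanges);
    }
    return result;
  }
}
```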

          bills William Slacum added a comment -

          Attached a basic implementation of this feature for trunk.

          bills William Slacum added a comment -

          An initial crack at this.

          I tried to have changes to existing code kept at a minimum, but I did have to:

          • Make the configuration key strings for InputFormatBase public (should be fine because they're final Strings)
          • Swapped some RecordReader method calls
          • Made the RangeInputSplit actually use the table parameter passed to its constructor

          Instead of giving clients a Key/Value pair, this uses a TableKey, which is just a key paired with a table name represented by a Text object. I didn't implement per-table iterators.
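          A TableKey of the kind described here is a composite key. The sketch below mirrors the idea but is not the patch's class: Strings stand in for Hadoop's Text and Accumulo's Key, and ordering by table and then by key is an assumption. A Hadoop version would likely implement Writable or WritableComparable instead of plain Comparable.

```java
import java.util.Objects;

// Sketch of a TableKey-style composite key: a table name paired with a key.
final class TableKey implements Comparable<TableKey> {
  final String table;
  final String key;

  TableKey(String table, String key) {
    this.table = Objects.requireNonNull(table);
    this.key = Objects.requireNonNull(key);
  }

  // Orders by table first, then by key within a table.
  @Override
  public int compareTo(TableKey o) {
    int c = table.compareTo(o.table);
    return c != 0 ? c : key.compareTo(o.key);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TableKey)) {
      return false;
    }
    TableKey other = (TableKey) o;
    return table.equals(other.table) && key.equals(other.key);
  }

  @Override
  public int hashCode() {
    return Objects.hash(table, key);
  }

  @Override
  public String toString() {
    return table + "/" + key;
  }
}
```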


            People

            • Assignee:
              sonixbp Corey J. Nolet
              Reporter:
              vines John Vines
            • Votes:
              3
              Watchers:
              14
