Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:
      None

      Description

      Iterators that transform parts of the key can be tricky if any transformation affects sort ordering. Implement an iterator that takes care of the tricky details that come with modifying sort order (e.g., handling scan-time iterator reconstruction and the associated seek).

      1. key_transforming_iterator.patch
        27 kB
        Brian Loss
      2. key_transforming_iterator-1.patch
        33 kB
        Brian Loss
      3. key_transforming_iterator-2.patch
        33 kB
        Brian Loss
      4. key_transforming_iterator-3.patch
        46 kB
        Brian Loss
      5. transforming_iterator-1.patch
        4 kB
        Brian Loss
      There are no Sub-Tasks for this issue.

        Activity

        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk #649 (See https://builds.apache.org/job/Accumulo-Trunk/649/)
        ACCUMULO-956 applied patch from Brian Loss (Revision 1435383)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk #649 (See https://builds.apache.org/job/Accumulo-Trunk/649/ ) ACCUMULO-956 applied patch from Brian Loss (Revision 1435383) Result = SUCCESS kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java
        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk-Hadoop-2.0 #7 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/7/)
        ACCUMULO-956 applied patch from Brian Loss (Revision 1435383)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk-Hadoop-2.0 #7 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/7/ ) ACCUMULO-956 applied patch from Brian Loss (Revision 1435383) Result = SUCCESS kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java
        Hide
        Brian Loss added a comment -

        Changes look good to me Keith. I updated the option descriptions, and made the RangeIterator a static inner class. Changes are in transforming_iterator-1.patch.

        Show
        Brian Loss added a comment - Changes look good to me Keith. I updated the option descriptions, and made the RangeIterator a static inner class. Changes are in transforming_iterator-1.patch.
        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk #646 (See https://builds.apache.org/job/Accumulo-Trunk/646/)
        ACCUMULO-956 generlaized transforming iterator, added some sanity checks to it, added some more unit test, added some static config methods (Revision 1434955)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java
        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk #646 (See https://builds.apache.org/job/Accumulo-Trunk/646/ ) ACCUMULO-956 generlaized transforming iterator, added some sanity checks to it, added some more unit test, added some static config methods (Revision 1434955) Result = SUCCESS kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk-Hadoop-2.0 #4 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/4/)
        ACCUMULO-956 generlaized transforming iterator, added some sanity checks to it, added some more unit test, added some static config methods (Revision 1434955)

        Result = FAILURE
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java
        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk-Hadoop-2.0 #4 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/4/ ) ACCUMULO-956 generlaized transforming iterator, added some sanity checks to it, added some more unit test, added some static config methods (Revision 1434955) Result = FAILURE kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/TransformingIterator.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk #644 (See https://builds.apache.org/job/Accumulo-Trunk/644/)
        ACCUMULO-956 fixed some issues and added unit test for transforming iterator (Revision 1434881)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk #644 (See https://builds.apache.org/job/Accumulo-Trunk/644/ ) ACCUMULO-956 fixed some issues and added unit test for transforming iterator (Revision 1434881) Result = SUCCESS kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk-Hadoop-2.0 #2 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/2/)
        ACCUMULO-956 fixed some issues and added unit test for transforming iterator (Revision 1434881)

        Result = FAILURE
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk-Hadoop-2.0 #2 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/2/ ) ACCUMULO-956 fixed some issues and added unit test for transforming iterator (Revision 1434881) Result = FAILURE kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        Hide
        Hudson added a comment -

        Integrated in Accumulo-Trunk #643 (See https://builds.apache.org/job/Accumulo-Trunk/643/)
        ACCUMULO-956 formatted source code (Revision 1434768)
        ACCUMULO-956 checkin of patch from Brain Loss (Revision 1434762)

        Result = SUCCESS
        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java

        kturner :
        Files :

        • /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java
        • /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        Show
        Hudson added a comment - Integrated in Accumulo-Trunk #643 (See https://builds.apache.org/job/Accumulo-Trunk/643/ ) ACCUMULO-956 formatted source code (Revision 1434768) ACCUMULO-956 checkin of patch from Brain Loss (Revision 1434762) Result = SUCCESS kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java kturner : Files : /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/iterators/user/KeyTransformingIterator.java /accumulo/trunk/core/src/test/java/org/apache/accumulo/core/iterators/user/KeyTransformingIteratorTest.java
        Hide
        Brian Loss added a comment -

        No, I can't think of anything else I need/want to do, so have at it. Thanks!

        Show
        Brian Loss added a comment - No, I can't think of anything else I need/want to do, so have at it. Thanks!
        Hide
        Keith Turner added a comment -

        The latest patch looks good. I was thinking of commiting and then making some further modifications. Was there anything else you wanted to do?

        Show
        Keith Turner added a comment - The latest patch looks good. I was thinking of commiting and then making some further modifications. Was there anything else you wanted to do?
        Hide
        Brian Loss added a comment -

        New version with most of the fixes based on Keith's comments. Still missing a test for deepCopy (there's an empty test for it right now).

        Show
        Brian Loss added a comment - New version with most of the fixes based on Keith's comments. Still missing a test for deepCopy (there's an empty test for it right now).
        Hide
        Keith Turner added a comment -

        It may be necessary to filter based on column family.

        For example, if colfam C can be transformed to col fam A or B (depending on other parts of the key). If a seek came in for col fam A, then the soruce would be seeked for colfam C. However this could result in transformations to col fam B, which we would want to suppress.

        Show
        Keith Turner added a comment - It may be necessary to filter based on column family. For example, if colfam C can be transformed to col fam A or B (depending on other parts of the key). If a seek came in for col fam A, then the soruce would be seeked for colfam C. However this could result in transformations to col fam B, which we would want to suppress.
        Hide
        Keith Turner added a comment -

        Code review notes for patch key-transforming-iterator_2.patch.

        Notes for KeyTransformingIterator :

        • Javadoc should outline why TransformingIter does vis filtering. Someone new to this may wonder why this iterator is doing that, and why it can't be done by a follow on iterator. Should also document that you need to configure the untransformed auths on the scanner and the transformed auths on this iterator.
        • javadoc for canSee() is wrong
        • transformKeys() is only called when super.hasTop() is true, but it handles the case when super.hasTop() is false. I was looking to see if this case could cause a NPE, but then realized it would never occur. Not that anything needs to be done about this, just something I observed
        • For scans if the transformation creates an invalid colvis, this would be caught. For compactions seems like it would not be caught. May consider checking that colvis is valid for compactions, do not need to evaluate it, just ensure it parses. Could use a seperate cache for this. If no auths are passed for a scan, maybe the transformation is not checked?
        • May need to transform column families for seek. Could provide a hook like the following for the user. Seek would call this hook. There is a bug related to column qualifiers fetched by the user that are not passed through seek, but filtered at a lower level. Trying to find the ticket for this. Column families can still be handled though.
        //this function is called by seek to transform column families for seek to parent iterator
          protected Collection<ByteSequence> columnFamilies transformColumnFamilies(Collection<ByteSequence> colFams){
            return colFams;
          }
        

        Notes for unit test :

        • Need a unit test that transforms to vis a user can see. I think it only transforms to vis a user can not see.
        • Need to test deep copy
        • need to test compaction/not scanning
        • Is there a test for a range thats before all data and a range thats after all data?
        • Need to test all combinations of range for the following conditions. Start/end key exist inclusive/exclusive, start/end key not exist (between existing keys) exclusive/inclusive. I think some of the case are tested, but not all.
        • Need to test seek that passes column families that are transformed.

        All of the code looks really nice. For some reason isSetAfterPart() made me smile.

        Show
        Keith Turner added a comment - Code review notes for patch key-transforming-iterator_2.patch. Notes for KeyTransformingIterator : Javadoc should outline why TransformingIter does vis filtering. Someone new to this may wonder why this iterator is doing that, and why it can't be done by a follow on iterator. Should also document that you need to configure the untransformed auths on the scanner and the transformed auths on this iterator. javadoc for canSee() is wrong transformKeys() is only called when super.hasTop() is true, but it handles the case when super.hasTop() is false. I was looking to see if this case could cause a NPE, but then realized it would never occur. Not that anything needs to be done about this, just something I observed For scans if the transformation creates an invalid colvis, this would be caught. For compactions seems like it would not be caught. May consider checking that colvis is valid for compactions, do not need to evaluate it, just ensure it parses. Could use a seperate cache for this. If no auths are passed for a scan, maybe the transformation is not checked? May need to transform column families for seek. Could provide a hook like the following for the user. Seek would call this hook. There is a bug related to column qualifiers fetched by the user that are not passed through seek, but filtered at a lower level. Trying to find the ticket for this. Column families can still be handled though. // this function is called by seek to transform column families for seek to parent iterator protected Collection<ByteSequence> columnFamilies transformColumnFamilies(Collection<ByteSequence> colFams){ return colFams; } Notes for unit test : Need a unit test that transforms to vis a user can see. I think it only transforms to vis a user can not see. Need to test deep copy need to test compaction/not scanning Is there a test for a range thats before all data and a range thats after all data? Need to test all combinations of range for the following conditions. Start/end key exist inclusive/exclusive, start/end key not exist (between existing keys) exclusive/inclusive. I think some of the case are tested, but not all. Need to test seek that passes column families that are transformed. All of the code looks really nice. For some reason isSetAfterPart() made me smile.
        Hide
        Brian Loss added a comment -

        I went ahead and changed it. The code didn't look as bloated as I thought it might , so the new patch is up.

        Show
        Brian Loss added a comment - I went ahead and changed it. The code didn't look as bloated as I thought it might , so the new patch is up.
        Hide
        Keith Turner added a comment -

        However, I can certainly change it to do that, if you think it's better.

        Its not a big deal. The unit test serves it primary purpose of testing the code just fine. Unit test can have a secondary purpose of serving as an example for users. IMO it would be a nice change. But its certainly not necessary, everyone probably has differing opinions about something like this.

        Show
        Keith Turner added a comment - However, I can certainly change it to do that, if you think it's better. Its not a big deal. The unit test serves it primary purpose of testing the code just fine. Unit test can have a secondary purpose of serving as an example for users. IMO it would be a nice change. But its certainly not necessary, everyone probably has differing opinions about something like this.
        Hide
        Brian Loss added a comment -

        Updated test based on Keith's comments.

        Show
        Brian Loss added a comment - Updated test based on Keith's comments.
        Hide
        Brian Loss added a comment -

        Yeah, I was trying to keep the test code less redundant. Since MockAccumulo creates the iterator for me, I have no way to pass in the different transformation functions in each test and would instead have to extend the iterator and have one class per test, which seemed overly verbose. However, I can certainly change it to do that, if you think it's better.

        Show
        Brian Loss added a comment - Yeah, I was trying to keep the test code less redundant. Since MockAccumulo creates the iterator for me, I have no way to pass in the different transformation functions in each test and would instead have to extend the iterator and have one class per test, which seemed overly verbose. However, I can certainly change it to do that, if you think it's better.
        Hide
        Keith Turner added a comment -

        I am looking at the patch.

        In the unit test the way it sets the transformation the scan should do is a bit odd. It sets a static var. This is non-standard and only works because the transformation is running in the same JVM. I think it would be better to make the unit test representative of how you expect it to be used. Make different iterators that do specific transformations, setup different scanners to use those, and drop the KeyTransformer interface in the unit test.

        Show
        Keith Turner added a comment - I am looking at the patch. In the unit test the way it sets the transformation the scan should do is a bit odd. It sets a static var. This is non-standard and only works because the transformation is running in the same JVM. I think it would be better to make the unit test representative of how you expect it to be used. Make different iterators that do specific transformations, setup different scanners to use those, and drop the KeyTransformer interface in the unit test.
        Hide
        Brian Loss added a comment -

        After an offline discussion with Keith, I added some utility methods to copy key parts. These methods handle setting the delete flag properly, which is something that is easy to miss/forget. The idea is that people who extend this iterator and implement the transformKey method could use these methods to compute the final key. The javadoc indicates that the delete flag needs to be set and also suggests using these new methods.

        Show
        Brian Loss added a comment - After an offline discussion with Keith, I added some utility methods to copy key parts. These methods handle setting the delete flag properly, which is something that is easy to miss/forget. The idea is that people who extend this iterator and implement the transformKey method could use these methods to compute the final key. The javadoc indicates that the delete flag needs to be set and also suggests using these new methods.
        Hide
        Brian Loss added a comment -

        I not sure there is a problem. The KeyTransformingIterator transforms only single keys, so you can't do any aggregation. If you had some kind of aggregating iterator on top of it on the stack, it would need to read enough keys from its source to do the aggregation. To produce your example, I think you'd have to scan all the cf: keys from the source to know it has what it needs for r cf: even [v2, v4]. Since that iterator would be scanning its source, there is no problem with the stack getting torn down and re-seeked. Or am I missing something?

        Show
        Brian Loss added a comment - I not sure there is a problem. The KeyTransformingIterator transforms only single keys, so you can't do any aggregation. If you had some kind of aggregating iterator on top of it on the stack, it would need to read enough keys from its source to do the aggregation. To produce your example, I think you'd have to scan all the cf: keys from the source to know it has what it needs for r cf: even [v2, v4] . Since that iterator would be scanning its source, there is no problem with the stack getting torn down and re-seeked. Or am I missing something?
        Hide
        William Slacum added a comment -

        You may want to think about being able to make an inversion/reversal function part of your interface, otherwise I don't believe you can have a function that transforms arbitrary subsets of keys associated with a given prefix.

        Say I have a sequence of keys for a row r:

        r cf: 1 [v1]
        r cf: 2 [v2]
        r cf: 3 [v3]
        r cf: 4 [v4]
        r cf: 5 [v5]

        Now let's say I want a transforming iterator that does aggregates the values with odd and even qual
        ifiers into a list, such that the sequence above generates two keys:

        r cf: even [v2, v4]
        r cf: odd [v1, v3, v5]

        After I generate each entry and return them to the client, it's possible the iterator stack gets torn down and rebuilt. If it happens after r cf: even [v2, v4], the new stack will be seeked with a range that has a non-inclusive start key of r cf: even. That is after all of the data used to generate the transformed set of keys, so a client would never see the key with the 'odd' column qualifier.

        It's also possible that this was never an intended use of the transforming iterator, but it's something to keep in mind.

        Show
        William Slacum added a comment - You may want to think about being able to make an inversion/reversal function part of your interface, otherwise I don't believe you can have a function that transforms arbitrary subsets of keys associated with a given prefix. Say I have a sequence of keys for a row r : r cf: 1 [v1] r cf: 2 [v2] r cf: 3 [v3] r cf: 4 [v4] r cf: 5 [v5] Now let's say I want a transforming iterator that does aggregates the values with odd and even qual ifiers into a list, such that the sequence above generates two keys: r cf: even [v2, v4] r cf: odd [v1, v3, v5] After I generate each entry and return them to the client, it's possible the iterator stack gets torn down and rebuilt. If it happens after r cf: even [v2, v4] , the new stack will be seeked with a range that has a non-inclusive start key of r cf: even . That is after all of the data used to generate the transformed set of keys, so a client would never see the key with the 'odd' column qualifier. It's also possible that this was never an intended use of the transforming iterator, but it's something to keep in mind.
        Hide
        Brian Loss added a comment -

        patch file with iterator source

        Show
        Brian Loss added a comment - patch file with iterator source

          People

          • Assignee:
            Brian Loss
            Reporter:
            Brian Loss
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development