Accumulo
  1. Accumulo
  2. ACCUMULO-803

Add Reverse Logical Time as a Time Type

    Details

    • Type: Improvement Improvement
    • Status: Patch Available
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tserver
    • Labels:
      None

      Description

      In a context where we are doing aggregation/combination of multiple values for a given key it may be useful to iterate over the values associated with that key in the order in which the mutations were applied (FIFO), instead of the FILO order that seems to occur when using TimeType.LOGICAL.

      I encountered when implemeting a checkAndPut operation that would ensure that the previous value was expected before putting a new value. In this case, if the previous value was not as expected, the mutation would be ignored.

      Perhaps it is useful in a general case?

      1. ACCUMULO-803.patch
        12 kB
        Drew Farris
      2. ACCUMULO-803.patch
        14 kB
        Drew Farris
      3. RLTest.java
        2 kB
        Keith Turner
      4. ACCUMULO-803-3.patch
        19 kB
        Keith Turner

        Issue Links

          Activity

          Drew Farris created issue -
          Hide
          Drew Farris added a comment -

          Rough cut at reverse logical time

          Show
          Drew Farris added a comment - Rough cut at reverse logical time
          Drew Farris made changes -
          Field Original Value New Value
          Attachment ACCUMULO-803.patch [ 12548739 ]
          Keith Turner made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Keith Turner [ kturner ] Drew Farris [ drew.farris ]
          Hide
          Keith Turner added a comment -

          looks good a few comments.

          • can you make this patch against trunk/1.5 instead of 1.4? I do not think this is something we would want to add to 1.4.
          • the user manual would need some mention of this new feature, in docs/src/user_manual/chapters/table_configuration.tex I think
          • in MockTable I think you should subtract from Long.MAX_VALUE instead of Integer.MAX_VALUE for consistency, even though its an int is being subtracted
          Show
          Keith Turner added a comment - looks good a few comments. can you make this patch against trunk/1.5 instead of 1.4? I do not think this is something we would want to add to 1.4. the user manual would need some mention of this new feature, in docs/src/user_manual/chapters/table_configuration.tex I think in MockTable I think you should subtract from Long.MAX_VALUE instead of Integer.MAX_VALUE for consistency, even though its an int is being subtracted
          Hide
          Adam Fuchs added a comment -

          As a thought exercise, let's think about how far this might go. Since Combiners work on a column family, does that imply that we might want a different timestamp scheme on different column families? Should this turn into another plugin framework for assigning timestamps on tablet servers, or perhaps a timestamp data definition language for column families?

          Show
          Adam Fuchs added a comment - As a thought exercise, let's think about how far this might go. Since Combiners work on a column family, does that imply that we might want a different timestamp scheme on different column families? Should this turn into another plugin framework for assigning timestamps on tablet servers, or perhaps a timestamp data definition language for column families?
          Hide
          Keith Turner added a comment -

          does that imply that we might want a different timestamp scheme on different column families?

          Currently each tablets timestamp is stored in the metadata table. It tablets had to persist and indeterminate number of timestamps, then I do not think we could safely store that in the metadata table. Would need to store it somewhere else. Its nice storing the info in the metadata table because you can atomically update the timestamp on minor compaction and bulkimport by putting it in the same mutation. So where would this info be stored and how would it be atomically updated after minor compaction and bulk import?

          Would also have to be careful not to blow out memory on tservers. e.g. If each tablet on a tserver is keeping track of timestamps for many columns in memory. Would need to cache timestamp info from persisted store to avoid this problem.

          Show
          Keith Turner added a comment - does that imply that we might want a different timestamp scheme on different column families? Currently each tablets timestamp is stored in the metadata table. It tablets had to persist and indeterminate number of timestamps, then I do not think we could safely store that in the metadata table. Would need to store it somewhere else. Its nice storing the info in the metadata table because you can atomically update the timestamp on minor compaction and bulkimport by putting it in the same mutation. So where would this info be stored and how would it be atomically updated after minor compaction and bulk import? Would also have to be careful not to blow out memory on tservers. e.g. If each tablet on a tserver is keeping track of timestamps for many columns in memory. Would need to cache timestamp info from persisted store to avoid this problem.
          Hide
          Adam Fuchs added a comment -

          You could do both count up and count down logical time on many different groups of keys in the same tablet with only persisting one timestamp. If the plugin framework supports a one-up counter, a plugin can then subtract that from max long or transform it otherwise as necessary for each timestamp group. This would maintain the monotonicity property of logical time on any of the groups of entries at the cost of introducing sparcity (which doesn't really matter).

          Show
          Adam Fuchs added a comment - You could do both count up and count down logical time on many different groups of keys in the same tablet with only persisting one timestamp. If the plugin framework supports a one-up counter, a plugin can then subtract that from max long or transform it otherwise as necessary for each timestamp group. This would maintain the monotonicity property of logical time on any of the groups of entries at the cost of introducing sparcity (which doesn't really matter).
          Hide
          Keith Turner added a comment -

          You could do both count up and count down logical time on many different groups of keys in the same tablet with only persisting one timestamp

          That would work. You can go up or down in a sparse manner. Does not seem like plugins are needed. Are there other operations we want to support besides FIFO and FILO? Seems operations like multiplication, addition, modulo, and division would not be useful. Randomizing the timestamp could be accomplished on the client side. Would be nice to have use cases for plugin framework.

          Show
          Keith Turner added a comment - You could do both count up and count down logical time on many different groups of keys in the same tablet with only persisting one timestamp That would work. You can go up or down in a sparse manner. Does not seem like plugins are needed. Are there other operations we want to support besides FIFO and FILO? Seems operations like multiplication, addition, modulo, and division would not be useful. Randomizing the timestamp could be accomplished on the client side. Would be nice to have use cases for plugin framework.
          Hide
          Drew Farris added a comment -

          looks good a few comments.

          Sounds good, I'll update the patch, docs and MockTable per suggestions.

          Show
          Drew Farris added a comment - looks good a few comments. Sounds good, I'll update the patch, docs and MockTable per suggestions.
          Drew Farris made changes -
          Affects Version/s 1.5.0 [ 12318645 ]
          Affects Version/s 1.4.2 [ 12321843 ]
          Hide
          Drew Farris added a comment -

          Updated patch against trunk (1.5.x) incorporating recommended changes.

          Show
          Drew Farris added a comment - Updated patch against trunk (1.5.x) incorporating recommended changes.
          Drew Farris made changes -
          Attachment ACCUMULO-803.patch [ 12549435 ]
          Hide
          Keith Turner added a comment -

          Since Combiners work on a column family, does that imply that we might want a different timestamp scheme on different column families?

          We might also consider setting an option on mutations to reverse system set time. This gives users the ability to target this in very tailored ways. Users could apply it to certain column family prefixes, to column qualifiers, etc. It gives a lot of flexibility w/o any server side code. Could add something like the following to mutation.

            
          enum TimestampTransformations {
             NONE,
             REVERSE
          }
          
          class Mutation {
            //transformation that will be applied to system set timestamps
            public void setTimestampTransformation(TimestampTransformations tt);
          }
          
          
          

          I think the only drawback with this is that it does not support setting system time for bulk insert. The current patch will support that.

          Show
          Keith Turner added a comment - Since Combiners work on a column family, does that imply that we might want a different timestamp scheme on different column families? We might also consider setting an option on mutations to reverse system set time. This gives users the ability to target this in very tailored ways. Users could apply it to certain column family prefixes, to column qualifiers, etc. It gives a lot of flexibility w/o any server side code. Could add something like the following to mutation. enum TimestampTransformations { NONE, REVERSE } class Mutation { //transformation that will be applied to system set timestamps public void setTimestampTransformation(TimestampTransformations tt); } I think the only drawback with this is that it does not support setting system time for bulk insert. The current patch will support that.
          Hide
          Adam Fuchs added a comment -

          I like the setting of timestamp transformation in the Mutation, but do you think people might want to set multiple different transformations for different column updates in the same mutation?

          Show
          Adam Fuchs added a comment - I like the setting of timestamp transformation in the Mutation, but do you think people might want to set multiple different transformations for different column updates in the same mutation?
          Hide
          Keith Turner added a comment -

          I like the setting of timestamp transformation in the Mutation, but do you think people might want to set multiple different transformations for different column updates in the same mutation?

          I like it too, but the bulk import aspect is still tormenting me. I was being miserly with computation and storage when thinking about setting it at the mutation level. Eric just modified mutation so that the system timstamp is set once per mutation. It used to set the same timestamp for each column. I wanted to maintain this slight boost in ingest performance. I suppose we could still set the timestamp once permutation and just interpret it differently if a flag is set for the column when getTimestamp() is called.

          For bulk import, if the user request logical time, it will set the same timestamp for all keys in the file. We could make it reverse this timestamp for all keys in the file. But I have not thought of a simple method for do something more granular. Seems like the two are options are modify the Key written by bulk import or modify the special iterator that sets timestamps on bulk imported files when they are read. Don't really like this options. But we don't have to support anything more granular for bulk import.

          Show
          Keith Turner added a comment - I like the setting of timestamp transformation in the Mutation, but do you think people might want to set multiple different transformations for different column updates in the same mutation? I like it too, but the bulk import aspect is still tormenting me. I was being miserly with computation and storage when thinking about setting it at the mutation level. Eric just modified mutation so that the system timstamp is set once per mutation. It used to set the same timestamp for each column. I wanted to maintain this slight boost in ingest performance. I suppose we could still set the timestamp once permutation and just interpret it differently if a flag is set for the column when getTimestamp() is called. For bulk import, if the user request logical time, it will set the same timestamp for all keys in the file. We could make it reverse this timestamp for all keys in the file. But I have not thought of a simple method for do something more granular. Seems like the two are options are modify the Key written by bulk import or modify the special iterator that sets timestamps on bulk imported files when they are read. Don't really like this options. But we don't have to support anything more granular for bulk import.
          Hide
          Keith Turner added a comment -

          do you think people might want to set multiple different transformations for different column updates in the same mutation?

          One more consideration. This would add another set of put methods to mutation, which already has alot of put methods.

          Show
          Keith Turner added a comment - do you think people might want to set multiple different transformations for different column updates in the same mutation? One more consideration. This would add another set of put methods to mutation, which already has alot of put methods.
          Hide
          Keith Turner added a comment -

          Thinking about setting the order in the mutation, I realized it gives users rope to hang themselves with. A user could write two sets of code. One program sets the order of column foo to FIFO while another program sets it to LIFO. The user did not intend to do this, but after running both the data is in the system and they have to deal with it. Below shows a table of what been proposed so far and what I see as the pros and cons.

          Timestamp ordering method Pros Cons
          Set order at table level simple and works w/ bulk import may force user to store data they want commingled in separate tables
          Set order per column in table config Ordering for a column is always consistent. Can provide granularity w/ bulk import. Complex to set order in complex ways (i.e. plugin). Introduces some server side computation overhead
          Set order per column in mutation Easy to set order in complex ways Can easily write code that inconsistently sets per column order. Do not have same granularity w/ bulk import, user can specify if an entire file is FIFO or LIFO

          The table above assumes the order decision method in the table config is immutable. If the config were mutable it would suffer from the same problem as setting the order in the mutation. The issue of complexity with per table order config can be avoided if a really simple config is used. For example no plugins, at table creation time the user just specifies which column families are FIFO. This does not give the user the same flexibility as a plugin or setting it on the mutation, but its better than setting FIFO for a whole table.

          So I am now thinking a simple, immutable config at table creation time may be best. Could add the following to table operations.

            public void create(String tableName, boolean versioningIter, TimeType timeType, Set<Text> fifoColumnFamilies);
          
          Show
          Keith Turner added a comment - Thinking about setting the order in the mutation, I realized it gives users rope to hang themselves with. A user could write two sets of code. One program sets the order of column foo to FIFO while another program sets it to LIFO. The user did not intend to do this, but after running both the data is in the system and they have to deal with it. Below shows a table of what been proposed so far and what I see as the pros and cons. Timestamp ordering method Pros Cons Set order at table level simple and works w/ bulk import may force user to store data they want commingled in separate tables Set order per column in table config Ordering for a column is always consistent. Can provide granularity w/ bulk import. Complex to set order in complex ways (i.e. plugin). Introduces some server side computation overhead Set order per column in mutation Easy to set order in complex ways Can easily write code that inconsistently sets per column order. Do not have same granularity w/ bulk import, user can specify if an entire file is FIFO or LIFO The table above assumes the order decision method in the table config is immutable. If the config were mutable it would suffer from the same problem as setting the order in the mutation. The issue of complexity with per table order config can be avoided if a really simple config is used. For example no plugins, at table creation time the user just specifies which column families are FIFO. This does not give the user the same flexibility as a plugin or setting it on the mutation, but its better than setting FIFO for a whole table. So I am now thinking a simple, immutable config at table creation time may be best. Could add the following to table operations. public void create( String tableName, boolean versioningIter, TimeType timeType, Set<Text> fifoColumnFamilies);
          Hide
          Drew Farris added a comment -

          I agree, I don't feel comfortable allowing users to set column family ordering on a per mutation basis.

          With the API you propose, do you feel that fifo ordering would be best achieved by manipulating timestamps as is done the original patch? If so, does this introduce any strangeness in the relationship between typeType and fifoColumnFamilies? Could it make sense to use TimeType.MILLIS and a fifo column family? Perhaps yes.

          I feel a little uncomfortable with the fact that column families are dynamic but we would require the user to specify a set of column families at creation time if they want fifo behavior. This seems to run counter to the spirit of column families (and how they're used) in Accumulo.

          One way to solve this problem would be to add setColumnFamilyOrdering(String cf, boolean fifo) (or something like it) to TableOperations. This might not be a good idea because we run into the same problem we have with Mutations: the user could shoot themselves in the foot if they set the ordering on a column family to be different than the default for a table that already contains data.

          So, I have to admit that I lean closest to the original mechanism, but I could be biased because I'm the patch author

          Thoughts?

          Show
          Drew Farris added a comment - I agree, I don't feel comfortable allowing users to set column family ordering on a per mutation basis. With the API you propose, do you feel that fifo ordering would be best achieved by manipulating timestamps as is done the original patch? If so, does this introduce any strangeness in the relationship between typeType and fifoColumnFamilies? Could it make sense to use TimeType.MILLIS and a fifo column family? Perhaps yes. I feel a little uncomfortable with the fact that column families are dynamic but we would require the user to specify a set of column families at creation time if they want fifo behavior. This seems to run counter to the spirit of column families (and how they're used) in Accumulo. One way to solve this problem would be to add setColumnFamilyOrdering(String cf, boolean fifo) (or something like it) to TableOperations. This might not be a good idea because we run into the same problem we have with Mutations: the user could shoot themselves in the foot if they set the ordering on a column family to be different than the default for a table that already contains data. So, I have to admit that I lean closest to the original mechanism, but I could be biased because I'm the patch author Thoughts?
          Hide
          David Medinets added a comment -

          I feel mucking with timestamps is asking for trouble. I'd rather see
          an effort to make the Key semantics changeable in a controlled fashion
          using a plug-in architecture. Then changes to support FIFO or other
          sorting mechanisms would use a well-defined API. Changes (and bugs)
          for each mechanism would be isolated.

          Show
          David Medinets added a comment - I feel mucking with timestamps is asking for trouble. I'd rather see an effort to make the Key semantics changeable in a controlled fashion using a plug-in architecture. Then changes to support FIFO or other sorting mechanisms would use a well-defined API. Changes (and bugs) for each mechanism would be isolated.
          Hide
          David Medinets added a comment -

          I feel mucking with timestamps is asking for trouble. I'd rather see an effort to make the Key semantics changeable in a controlled fashion using a plug-in architecture. Then changes to support FIFO or other sorting mechanisms would use a well-defined API. Changes (and bugs) for each mechanism would be isolated.

          Show
          David Medinets added a comment - I feel mucking with timestamps is asking for trouble. I'd rather see an effort to make the Key semantics changeable in a controlled fashion using a plug-in architecture. Then changes to support FIFO or other sorting mechanisms would use a well-defined API. Changes (and bugs) for each mechanism would be isolated.
          Hide
          Drew Farris added a comment -

          I feel that reversing the assignment of timestamps is the safest approach. The alternative would involve modifying the code that sorts entries in rfiles which I am a little reluctant to touch at this point. I don't know that there's an alternative that would be less invasive than reverse logical time.

          Show
          Drew Farris added a comment - I feel that reversing the assignment of timestamps is the safest approach. The alternative would involve modifying the code that sorts entries in rfiles which I am a little reluctant to touch at this point. I don't know that there's an alternative that would be less invasive than reverse logical time.
          Hide
          Drew Farris added a comment -

          This has been hanging around for quite awhile – any further comments? Probably too late for 1.5 at this point, eh?

          Show
          Drew Farris added a comment - This has been hanging around for quite awhile – any further comments? Probably too late for 1.5 at this point, eh?
          Hide
          Keith Turner added a comment -

          This is a nice patch, I had not looked at it in a while. Code, test, and documentation! I did a thorough review and found some issues with the patch :

          • When a tablet is flushed it does not save reverse time properly. This is because of a check tablet does to make sure time increases. Search for persistedTime in tablet. The greater than check against persisted time should probably be abstracted out to tablet time.
          • The initial time on a tables initial tablet is not set correctly, its set to "D0". Should be "D"+Long.MAX. MetadataTable.addTablet() should call TabletTime to ask for initial metadata value for time type.
          • ReverseLogicalTime constructor should not subtract from max long. This constructor is called every time tablet loads. Setting initial tables tablet will make things work properly.
          • Merges of tablets are taking maximum time instead of minimum time. TabletTime.maxMetadataTime() should take reverse action for reverse time.
          • Need to test bulk import w/ reverse logical time

          Sorry I dropped the ball on this. We discussed all of theses other options and I got sidetracked and never did an in depth review.

          Drew, I think your original idea is good. It is a simple addition to the existing model. The other possible changes discussed require completely rethinking the existing model. Adding it would not prevent some of the more complex changes discussed being implemented in the future.

          As far as 1.5, this patch was in way before 1.5 feature freeze date. I would kinda like to see it go in 1.5, but this is partially because I feel guilty for dropping the ball on this. On the other hand, I do want to see work on 1.5 really start to settle and focus solely on bug fixes.

          I will attach a little program I used to poke at this patch.

          Show
          Keith Turner added a comment - This is a nice patch, I had not looked at it in a while. Code, test, and documentation! I did a thorough review and found some issues with the patch : When a tablet is flushed it does not save reverse time properly. This is because of a check tablet does to make sure time increases. Search for persistedTime in tablet. The greater than check against persisted time should probably be abstracted out to tablet time. The initial time on a tables initial tablet is not set correctly, its set to "D0". Should be "D"+Long.MAX. MetadataTable.addTablet() should call TabletTime to ask for initial metadata value for time type. ReverseLogicalTime constructor should not subtract from max long. This constructor is called every time tablet loads. Setting initial tables tablet will make things work properly. Merges of tablets are taking maximum time instead of minimum time. TabletTime.maxMetadataTime() should take reverse action for reverse time. Need to test bulk import w/ reverse logical time Sorry I dropped the ball on this. We discussed all of theses other options and I got sidetracked and never did an in depth review. Drew, I think your original idea is good. It is a simple addition to the existing model. The other possible changes discussed require completely rethinking the existing model. Adding it would not prevent some of the more complex changes discussed being implemented in the future. As far as 1.5, this patch was in way before 1.5 feature freeze date. I would kinda like to see it go in 1.5, but this is partially because I feel guilty for dropping the ball on this. On the other hand, I do want to see work on 1.5 really start to settle and focus solely on bug fixes. I will attach a little program I used to poke at this patch.
          Keith Turner made changes -
          Attachment RLTest.java [ 12569217 ]
          Hide
          Keith Turner added a comment -

          Thought of two other things. Needs to support the proxy. Need to make sure recovery works properly. This may be tricky to automatically test, but should at least try it manually.

          Show
          Keith Turner added a comment - Thought of two other things. Needs to support the proxy. Need to make sure recovery works properly. This may be tricky to automatically test, but should at least try it manually.
          Hide
          Josh Elser added a comment -

          As far as 1.5, this patch was in way before 1.5 feature freeze date. I would kinda like to see it go in 1.5, but this is partially because I feel guilty for dropping the ball on this. On the other hand, I do want to see work on 1.5 really start to settle and focus solely on bug fixes.

          I agree that it would be a shame to not include a feature that we had code sitting around for such a long time.

          Show
          Josh Elser added a comment - As far as 1.5, this patch was in way before 1.5 feature freeze date. I would kinda like to see it go in 1.5, but this is partially because I feel guilty for dropping the ball on this. On the other hand, I do want to see work on 1.5 really start to settle and focus solely on bug fixes. I agree that it would be a shame to not include a feature that we had code sitting around for such a long time.
          Hide
          Keith Turner added a comment -

          ACCUMULO-803-3.patch has some of the changes I suggested making, but not all. Its basically the result of me applying Drew's patch and experimenting with it. I threw some code in MiniAccumuloClusterTest just because it was an easy place to experiment from Eclipse, I do not think this should be the final resting place for this code.

          Show
          Keith Turner added a comment - ACCUMULO-803 -3.patch has some of the changes I suggested making, but not all. Its basically the result of me applying Drew's patch and experimenting with it. I threw some code in MiniAccumuloClusterTest just because it was an easy place to experiment from Eclipse, I do not think this should be the final resting place for this code.
          Keith Turner made changes -
          Attachment ACCUMULO-803-3.patch [ 12569261 ]
          Hide
          Josh Elser added a comment -

          ugh, re: bump? I'll see if I can work on making a #4 patch that incorporates the rest of the changes.

          Show
          Josh Elser added a comment - ugh, re: bump? I'll see if I can work on making a #4 patch that incorporates the rest of the changes.
          Josh Elser made changes -
          Fix Version/s 1.6.0 [ 12322468 ]
          Josh Elser made changes -
          Affects Version/s 1.5.0 [ 12318645 ]
          Hide
          Josh Elser added a comment -

          I haven't even started looking into fixing up the last patch Keith put up. Because of that, because no one else is currently looking into it, and we're so far past the initial planned release of 1.5.0, I'm going to bump fix to the next version.

          Show
          Josh Elser added a comment - I haven't even started looking into fixing up the last patch Keith put up. Because of that, because no one else is currently looking into it, and we're so far past the initial planned release of 1.5.0, I'm going to bump fix to the next version.
          Hide
          John Vines added a comment -

          I feel bad about bumping this to 1.7 because it got bumped to 1.6 with a patch. Drew Farris or Keith Turner, is it possible for you to do a revised patch for 1.6?

          Show
          John Vines added a comment - I feel bad about bumping this to 1.7 because it got bumped to 1.6 with a patch. Drew Farris or Keith Turner , is it possible for you to do a revised patch for 1.6?
          Eric Newton made changes -
          Fix Version/s 1.6.0 [ 12322468 ]
          Hide
          Eric Newton added a comment -

          This needs a champion... I've removed the fix version.

          Show
          Eric Newton added a comment - This needs a champion... I've removed the fix version.
          Christopher Tubbs made changes -
          Remote Link This issue links to "Review (Web Link)" [ 20627 ]

            People

            • Assignee:
              Drew Farris
              Reporter:
              Drew Farris
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development