Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: kv
    • Labels:
      None

      Description

Looks like RocksDB now supports TTL in Java as well (it was previously only available in C++).
      https://reviews.facebook.net/D31449

It would be great if we could add support for this. We would have to do two things:
1. Make a new RocksDB Java release
2. Expose the configuration in Samza
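As a rough sketch of step 2, a store-level TTL could be read from Samza config and passed through to RocksDB's TTL open path (RocksDB's `TtlDB` API takes a TTL in whole seconds). The config key name and helper below are illustrative assumptions, not the interface committed in the patch:

```java
import java.util.Map;

// Illustrative sketch only: the key "stores.<store>.rocksdb.ttl.ms" and this
// helper are assumptions for discussion, not Samza's committed config surface.
public class TtlConfigSketch {
    // Returns the TTL in whole seconds (what RocksDB's TtlDB expects),
    // or -1 when unset, meaning the store is opened without TTL.
    static int ttlSeconds(Map<String, String> config, String storeName) {
        String raw = config.get("stores." + storeName + ".rocksdb.ttl.ms");
        if (raw == null) {
            return -1;
        }
        return (int) (Long.parseLong(raw) / 1000);
    }
}
```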

      1. 0001-RocksDB-validation.patch
        2 kB
        Naveen Somasundaram
      2. 0001-RocksDB-TTL.patch
        13 kB
        Naveen Somasundaram

        Issue Links

          Activity

Yan Fang added a comment -

Added a few comments in the RB.

          BTW, RB: https://reviews.apache.org/r/33735/

Just one question about this feature: what happens if we openWithReadOnly the same db that is already open and being written to by another application? Will we see all the values, including some that are not visible in the writing application (because of TTL)?

Yan Fang added a comment -

A few more things that may need to be well documented:

• What happens when restoring RocksDB? Will we get all the "dead" values back?
• What happens if users enable TTL at first and then disable it in another run? Will this cause corruption? The RocksDB TTL docs warn:

  Calling DB::Open directly to re-open a db created by this API will get corrupt values(timestamp suffixed) and no ttl effect will be there during the second Open, so use this API consistently to open the db

• What about first disabling TTL and then enabling it, or changing the TTL time?
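The corruption warning quoted above stems from how the TTL layer physically stores data: each value is written with a creation timestamp appended, and only the TTL-aware open path knows to strip it and filter expired entries. A simplified illustration of that idea (the exact byte layout in RocksDB differs; this is not the real on-disk format):

```java
import java.nio.ByteBuffer;

// Simplified illustration of timestamp-suffixed values. A plain DB::Open
// would return the raw bytes including the suffix, which is why the values
// look "corrupt" without the TTL-aware API. Not RocksDB's actual layout.
public class TtlValueSketch {
    // Append a 4-byte Unix timestamp (seconds) to the raw value on write.
    static byte[] wrap(byte[] value, int unixSeconds) {
        return ByteBuffer.allocate(value.length + 4)
                .put(value)
                .putInt(unixSeconds)
                .array();
    }

    // A TTL-aware reader strips the suffix and drops expired entries.
    static byte[] unwrap(byte[] stored, int nowSeconds, int ttlSeconds) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        byte[] value = new byte[stored.length - 4];
        buf.get(value);
        int createdAt = buf.getInt();
        if (nowSeconds - createdAt > ttlSeconds) {
            return null; // expired: treat as absent
        }
        return value;
    }
}
```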
Yan Fang added a comment -

          +1 for the RB. Naveen Somasundaram, could you upload the patch here?

Yan Fang added a comment -

Also, there are four concerns I have with this patch before committing, as mentioned in my previous comments:

1. What happens if we openWithReadOnly the same db that is already open and being written to by another application? Will we see all the values, including some that are not visible in the writing application (because of TTL)?

2. What happens when restoring RocksDB? Will we get all the "dead" values back?

3. What happens if users enable TTL at first and then disable it in another run? Will this cause corruption? The RocksDB TTL docs warn:

Calling DB::Open directly to re-open a db created by this API will get corrupt values(timestamp suffixed) and no ttl effect will be there during the second Open, so use this API consistently to open the db

4. Is it possible to first disable TTL and then enable it? And is it possible to change the TTL time when restarting?

Naveen Somasundaram added a comment -

Hey Yan,
1. I think we should document this: with a changelog enabled, we will get expired values back. There is no way for us to know this at the framework level (it happens inside RocksDB). It really doesn't make sense to have a changelog enabled if you are using TTL. Should we throw an exception if a changelog is enabled for a TTL-based store, or should we let the user decide the behavior? What do you think? My vote is for not allowing a changelog if TTL is enabled.
2, 3. Our configuration page links to the RocksDB TTL page, where both of these are explained.
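The fail-fast option proposed here could look roughly like the sketch below. The config key names are illustrative assumptions, not Samza's actual keys:

```java
import java.util.Map;

// Sketch of rejecting a store that has both a changelog and TTL configured.
// Key names ("stores.<store>.changelog", "stores.<store>.rocksdb.ttl.ms")
// are assumptions for illustration only.
public class TtlChangelogCheck {
    static void validate(String storeName, Map<String, String> config) {
        boolean hasChangelog = config.containsKey("stores." + storeName + ".changelog");
        boolean hasTtl = config.containsKey("stores." + storeName + ".rocksdb.ttl.ms");
        if (hasChangelog && hasTtl) {
            throw new IllegalArgumentException(
                "Store " + storeName + " has both a changelog and TTL enabled; "
                + "expired entries would be restored from the changelog.");
        }
    }
}
```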

Yan Fang added a comment -

          My vote is for throwing not allow Changelog if TTL is enabled.

          +1 for this. Also Yi Pan has comments in the RB.

Yi Pan (Data Infrastructure) added a comment -

Hi, guys, I talked to Naveen Somasundaram offline and here is my opinion:

1. The current RocksDB TTL feature only keeps data local; it cannot be restored on another host.
2. Building TTL support with a changelog and recovery on another host is a bigger task that we need to be careful about.

Given that use cases needing TTL often target transient data sets, I think RocksDB with TTL but without a changelog can satisfy some use cases, as Naveen Somasundaram suggests.

I would suggest that we document the limitations of this feature well, noting that in the current release, TTL-enabled RocksDB stores are local only and will not be durable across recovery. We should continue the effort to add recovery support to TTL-enabled RocksDB stores in the future.

Joshua Hartman added a comment -

As a dissenting perspective on TTL without backups:

Our application will rely on TTL to help us age out some state. We have rules that require us to deduplicate or eliminate content processed for a given user within a certain time window. TTL support is nice for this, but if the store isn't durable we can't implement the feature. For our use case, it is acceptable for the TTL to be best-effort: if we ask that something live only 1 day but it lives for 1 day and 6 hours, that's fine. We can always handle this by excluding it from reads at the application layer as well. I think this also applies to any app that wants to use TTL to do a stream join.

I think a lot of applications would have similar characteristics.
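The application-layer exclusion described above can be sketched simply: store the write time alongside the value and ignore anything older than the application's own window, regardless of whether the store's TTL has fired yet. The names here are illustrative, not from the patch:

```java
// Sketch of best-effort TTL with an application-layer cutoff: even if the
// store keeps an entry past its TTL, reads treat stale entries as absent.
// Class and method names are hypothetical.
public class WindowedEntry {
    final String value;
    final long writtenAtMs;

    WindowedEntry(String value, long writtenAtMs) {
        this.value = value;
        this.writtenAtMs = writtenAtMs;
    }

    // Treat anything older than windowMs as absent, independent of store TTL.
    String readWithin(long nowMs, long windowMs) {
        return (nowMs - writtenAtMs) <= windowMs ? value : null;
    }
}
```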

Yan Fang added a comment -

We have rules that we must deduplicate or eliminate content processed for a given user within a certain time window.

Is it possible to do this in the window() method?

TTL support is nice for this but if the store isn't durable we can't implement the feature.

I agree with this. Without durability, it only satisfies a few use cases; it would be better to have a backup feature. The reason we do not implement this is that the TTL currently happens in the RocksDB layer while the backup happens in the Samza layer, and I am not sure how we will tackle that problem. There are also other concerns: if we set the TTL to a low value, say 100 seconds, and recovery takes 100 seconds (in a bad situation), do we delete all the messages?

After further discussion, I think we should support the backup feature in the future, maybe in a separate ticket. Welcome to discuss and contribute.

Naveen Somasundaram added a comment -

Yan Fang, I thought about this a little bit more.

As a short-term solution: what if users set the TTL to the same value as Kafka's retention time? Joshua Hartman, would that work for your use case? Kafka wipes out data after a fixed time, and within that interval it is OK to restore the store. This means we would revert the exception patch and allow users to create changelog streams even for TTL-based stores. I agree it's kind of a gamble: users could accidentally enable TTL without understanding the implications, or worse, keep writing more and more data until they run out of disk before they realize that Kafka doesn't expunge expired records. But it's too hard to control. We could do further optimizations in another ticket, such as creating the changelog topic with TTL-based retention, or verifying that the retention matches if the topic already exists. But that's the best we can do. What are your thoughts?

As a long-term solution: we could store some meta information with each key-value pair (preferably in the key, but that would break log compaction, so we need to think about this in more detail), essentially making the metadata part of the message itself. When we do a restore, we skip the entries that have expired and write expired_key -> null, which triggers log compaction and eventually wipes the data out. This probably needs a lot more work and thought.
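The long-term restore idea above can be sketched as: walk the changelog, drop entries whose embedded timestamp has expired, and map their keys to null as tombstones so compaction removes them later. The record layout and names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a TTL-aware restore: expired entries become
// tombstones (key -> null) instead of being restored. The Entry layout
// (value + write timestamp) is an assumption, not the patch's format.
public class TtlRestoreSketch {
    static class Entry {
        final String value;
        final long writtenAtMs;

        Entry(String value, long writtenAtMs) {
            this.value = value;
            this.writtenAtMs = writtenAtMs;
        }
    }

    // Returns restored state; expired keys map to null (a tombstone).
    static Map<String, String> restore(Map<String, Entry> changelog, long nowMs, long ttlMs) {
        Map<String, String> restored = new LinkedHashMap<>();
        for (Map.Entry<String, Entry> e : changelog.entrySet()) {
            boolean expired = (nowMs - e.getValue().writtenAtMs) > ttlMs;
            restored.put(e.getKey(), expired ? null : e.getValue().value);
        }
        return restored;
    }
}
```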

Yan Fang added a comment -

Naveen Somasundaram, could you take care of Mohamed Mahmoud (El-Geish)'s comment in the RB? I guess it's better to resolve it before committing.
Thanks.

Naveen Somasundaram added a comment -

          Will do Yan!

Naveen Somasundaram added a comment -

Created a follow-up ticket for changelog support:
          https://issues.apache.org/jira/browse/SAMZA-677

Yan Fang added a comment -

+1 from me. Also got a +1 from Yi (in the RB). Committed. Thank you.

Naveen Somasundaram added a comment -

          Thanks Yan


            People

            • Assignee:
Naveen Somasundaram
              Reporter:
Naveen Somasundaram
• Votes:
  0
  Watchers:
  4

              Dates

              • Created:
                Updated:
                Resolved:

                Development