Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The Lucene FST data structure offers a compact and fast system for indexing Cassandra keys. More keys may be loaded which in turn should seeks faster.

      • Update the IndexSummary class to make use of the Lucene FST, overriding the serialization mechanism.
      • Alter SSTableReader to make use of the FST seek mechanism
      1. CASSANDRA-4324.patch
        22 kB
        Jason Rutherglen
      2. CASSANDRA-4324.patch
        3 kB
        Jason Rutherglen
      3. lucene-core-4.0-SNAPSHOT.jar
        2.09 MB
        Jason Rutherglen
      4. CASSANDRA-4324.patch
        4 kB
        Jason Rutherglen

        Issue Links

          Activity

          Jonathan Ellis made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Hide
          Jonathan Ellis added a comment -

          Wontfixing for lack of progress, and because we've already moved the partition index sample off heap in 2.0.

          Show
          Jonathan Ellis added a comment - Wontfixing for lack of progress, and because we've already moved the partition index sample off heap in 2.0.
          Hide
          Jonathan Ellis added a comment -

          My point is that keys routed to a given node will be effectively a random subset of the keys in use (under RP/M3P), so unlike the tokens I don't think it's clear at all that they will be well-compressible by FST.

          Show
          Jonathan Ellis added a comment - My point is that keys routed to a given node will be effectively a random subset of the keys in use (under RP/M3P), so unlike the tokens I don't think it's clear at all that they will be well-compressible by FST.
          Hide
          Jason Rutherglen added a comment -

          What is the difference between the token and the key?

          The FST could theoretically cache (due to compression efficiency) every key/token in RAM, removing the need for the key cache, and providing an extremely fine grained pointer to the underlying row value data.

          Show
          Jason Rutherglen added a comment - What is the difference between the token and the key? The FST could theoretically cache (due to compression efficiency) every key/token in RAM, removing the need for the key cache, and providing an extremely fine grained pointer to the underlying row value data.
          Jonathan Ellis made changes -
          Reviewer yukim
          Hide
          Jonathan Ellis added a comment -

          Does CASSANDRA-5506 make this obsolete? ISTM that the savings here can only come from the Token, since the keys themselves will not be ordered appropriately. (That is: CASSANDRA-5506 orders the keys by token, but only stores the underlying key byte[], and regenerates the token when necessary.)

          Show
          Jonathan Ellis added a comment - Does CASSANDRA-5506 make this obsolete? ISTM that the savings here can only come from the Token, since the keys themselves will not be ordered appropriately. (That is: CASSANDRA-5506 orders the keys by token, but only stores the underlying key byte[], and regenerates the token when necessary.)
          Jonathan Ellis made changes -
          Link This issue relates to CASSANDRA-5506 [ CASSANDRA-5506 ]
          Gavin made changes -
          Workflow patch-available, re-open possible [ 12753045 ] reopen-resolved, no closed status, patch-avail, testing [ 12758591 ]
          Gavin made changes -
          Workflow no-reopen-closed, patch-avail [ 12672614 ] patch-available, re-open possible [ 12753045 ]
          Hide
          Jason Rutherglen added a comment -

          Lucene 4.0 has been released, so maybe it's time to revisit this issue.

          Show
          Jason Rutherglen added a comment - Lucene 4.0 has been released, so maybe it's time to revisit this issue.
          Jonathan Ellis made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Jonathan Ellis made changes -
          Fix Version/s 1.2.0 [ 12319262 ]
          Hide
          Jonathan Ellis added a comment -

          Coming up on 1.2 freeze, un-tagging that version.

          Show
          Jonathan Ellis added a comment - Coming up on 1.2 freeze, un-tagging that version.
          Hide
          Jonathan Ellis added a comment -

          Not from what I saw skimming the uses.

          Show
          Jonathan Ellis added a comment - Not from what I saw skimming the uses.
          Hide
          Jason Rutherglen added a comment -

          Jonathan, do you mean there is no need for 'array index' lookups into the 'index' keys?

          Show
          Jason Rutherglen added a comment - Jonathan, do you mean there is no need for 'array index' lookups into the 'index' keys?
          Hide
          Jonathan Ellis added a comment -

          Yes, I don't think we need random access anywhere but the actual index lookups.

          Show
          Jonathan Ellis added a comment - Yes, I don't think we need random access anywhere but the actual index lookups.
          Hide
          Jason Rutherglen added a comment -

          some uses, like the one in getKeySamples(Range), are going to be tricky without an actual list of keys to work with

          The FST has a range iteration method, and the FST will return the keys, which should suffice?

          Show
          Jason Rutherglen added a comment - some uses, like the one in getKeySamples(Range), are going to be tricky without an actual list of keys to work with The FST has a range iteration method, and the FST will return the keys, which should suffice?
          Hide
          Jonathan Ellis added a comment -

          Looks like only SSTableReader calls IndexSummary#getKeys

          Yes, although some uses, like the one in getKeySamples(Range), are going to be tricky without an actual list of keys to work with.

          Show
          Jonathan Ellis added a comment - Looks like only SSTableReader calls IndexSummary#getKeys Yes, although some uses, like the one in getKeySamples(Range) , are going to be tricky without an actual list of keys to work with.
          Hide
          Jason Rutherglen added a comment -

          Looks like only SSTableReader calls IndexSummary#getKeys()? As you mentioned, we need to abstract out the key and key range access into the IndexSummary class, that should obviate the need to call getKeys, unless I'm missing something?

          Show
          Jason Rutherglen added a comment - Looks like only SSTableReader calls IndexSummary#getKeys()? As you mentioned, we need to abstract out the key and key range access into the IndexSummary class, that should obviate the need to call getKeys, unless I'm missing something?
          Hide
          Yuki Morishita added a comment -

          Interesting, what is the information?

          Specifically I mean IndexSummary#getKeys() returns list of DecoratedKeys and that is called from many places. FST does not carry DecoratedKeys around so you may need to find the way to work around.

          Show
          Yuki Morishita added a comment - Interesting, what is the information? Specifically I mean IndexSummary#getKeys() returns list of DecoratedKeys and that is called from many places. FST does not carry DecoratedKeys around so you may need to find the way to work around.
          Hide
          Jason Rutherglen added a comment -

          Also it should be mentioned, using the FST will be a big win for garbage collection, it's basically a single byte[]. The IndexSummary currently uses a lot of object pointers, which are more costly than a single byte[].

          Show
          Jason Rutherglen added a comment - Also it should be mentioned, using the FST will be a big win for garbage collection, it's basically a single byte[]. The IndexSummary currently uses a lot of object pointers, which are more costly than a single byte[].
          Hide
          Jason Rutherglen added a comment -

          Nice that's a huge win! That's for the MD5 encoded keys?

          DecoratedKeys stored inside current IndexSummary are actually accessed from various places, and FST version will lack those information, you may need to figure out the alternative way to preserve current functionality

          Interesting, what is the information? Why are there two keys stored in DecoratedKey?

          FST supports range like scans, however I am not exactly sure how that part works. We probably want to restructure the API to make it abstract for both implementations?

          Show
          Jason Rutherglen added a comment - Nice that's a huge win! That's for the MD5 encoded keys? DecoratedKeys stored inside current IndexSummary are actually accessed from various places, and FST version will lack those information, you may need to figure out the alternative way to preserve current functionality Interesting, what is the information? Why are there two keys stored in DecoratedKey? FST supports range like scans, however I am not exactly sure how that part works. We probably want to restructure the API to make it abstract for both implementations?
          Hide
          Yuki Morishita added a comment -

          Jason,

          I used YourKit and profiled memory usage for your test (little bit modified to call IndexSummary#complete) and it shows

          IndexSummary: 21,597,040 (~20MB)
          FST: 3,576,248 (~3.4MB)

          for storing 10,000 keys to each, so it's pretty impressive. If we can deliver this, it will be huge win.
          (Note that on disk, IndexSummary only writes key portion of DecoratedKey so it may be smaller than FST.)

          My concerns left are as follows:

          • Planned 1.2 release saves IndexSummary to disk(CASSANDRA-2392), so I think it is better to leave current implementation and add FST version of IndexSummary so you can rw from both.
          • DecoratedKeys stored inside current IndexSummary are actually accessed from various places, and FST version will lack those information, you may need to figure out the alternative way to preserve current functionality.
          • If you want to use Lucene 4.0, we should release this feature after 4.0 release.

          Also the last results are for 100,000 keys rather than 1 mil.

          IndexSummary holds keys for every index_interval(default 128), so I think you don't need to test with 1 mil.

          Show
          Yuki Morishita added a comment - Jason, I used YourKit and profiled memory usage for your test (little bit modified to call IndexSummary#complete) and it shows IndexSummary: 21,597,040 (~20MB) FST: 3,576,248 (~3.4MB) for storing 10,000 keys to each, so it's pretty impressive. If we can deliver this, it will be huge win. (Note that on disk, IndexSummary only writes key portion of DecoratedKey so it may be smaller than FST.) My concerns left are as follows: Planned 1.2 release saves IndexSummary to disk( CASSANDRA-2392 ), so I think it is better to leave current implementation and add FST version of IndexSummary so you can rw from both. DecoratedKeys stored inside current IndexSummary are actually accessed from various places, and FST version will lack those information, you may need to figure out the alternative way to preserve current functionality. If you want to use Lucene 4.0, we should release this feature after 4.0 release. Also the last results are for 100,000 keys rather than 1 mil. IndexSummary holds keys for every index_interval(default 128), so I think you don't need to test with 1 mil.
          Hide
          Jason Rutherglen added a comment -

          Also the last results are for 100,000 keys rather than 1 mil.

          Show
          Jason Rutherglen added a comment - Also the last results are for 100,000 keys rather than 1 mil.
          Jason Rutherglen made changes -
          Attachment CASSANDRA-4324.patch [ 12536531 ]
          Hide
          Jason Rutherglen added a comment -

          Reran with sorting the keys:

          Reran

          FST: 3,564,246
          IndexSummary: 4,399,624

          The FST is 19% smaller, if the IndexSummary mem calculation is correct.

          Show
          Jason Rutherglen added a comment - Reran with sorting the keys: Reran FST: 3,564,246 IndexSummary: 4,399,624 The FST is 19% smaller, if the IndexSummary mem calculation is correct.
          Hide
          Jason Rutherglen added a comment -

          Can you verify the mem size calculation for the IndexSummary is correct?

          I realized the keys should be sorted? Where is the code that sorts the keys?

          Show
          Jason Rutherglen added a comment - Can you verify the mem size calculation for the IndexSummary is correct? I realized the keys should be sorted? Where is the code that sorts the keys?
          Jason Rutherglen made changes -
          Attachment CASSANDRA-4324.patch [ 12536529 ]
          Attachment lucene-core-4.0-SNAPSHOT.jar [ 12536530 ]
          Hide
          Jason Rutherglen added a comment -

          FSTMemUsage compares the memory usage of the FST vs. IndexSummary.

          On 1 million keys these are the results:

          FST: 39,032,383 bytes
          IndexSummary: 43,996,068 bytes

          A difference of about 4 megabytes. FST w would be far smaller if the MD5 hash was not being applied to the key, eg, it does best to with keys that are sequential so that prefix compression may be applied.

          To run FSTMemUsage, the lucene-core-4.0-SNAPSHOT.jar needs to be added to the lib/ directory.

          The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 'git add' did not work.

          Show
          Jason Rutherglen added a comment - FSTMemUsage compares the memory usage of the FST vs. IndexSummary. On 1 million keys these are the results: FST: 39,032,383 bytes IndexSummary: 43,996,068 bytes A difference of about 4 megabytes. FST w would be far smaller if the MD5 hash was not being applied to the key, eg, it does best to with keys that are sequential so that prefix compression may be applied. To run FSTMemUsage, the lucene-core-4.0-SNAPSHOT.jar needs to be added to the lib/ directory. The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 'git add' did not work.
          Hide
          Jason Rutherglen added a comment -

          Lucene 4.0 will be available shortly and contains FST code that is very stable. We should use it with the assumption that Lucene 4.0 will be available soon.

          Show
          Jason Rutherglen added a comment - Lucene 4.0 will be available shortly and contains FST code that is very stable. We should use it with the assumption that Lucene 4.0 will be available soon.
          Hide
          Yuki Morishita added a comment -

          Manually put jar file inside lib directory (as well as adding dependency for maven in build.xml as you do).
          I've noticed you added 4.0-SNAPSHOT to build.xml, but I think 4.0 is still alpha, so does lucene-core 3.6 work for this?

          Show
          Yuki Morishita added a comment - Manually put jar file inside lib directory (as well as adding dependency for maven in build.xml as you do). I've noticed you added 4.0-SNAPSHOT to build.xml, but I think 4.0 is still alpha, so does lucene-core 3.6 work for this?
          Hide
          Jason Rutherglen added a comment -

          Also what is the best way to include the Lucene libraries?

          Show
          Jason Rutherglen added a comment - Also what is the best way to include the Lucene libraries?
          Hide
          Jason Rutherglen added a comment -

          An easy approach is to generate an ascending set of keys and positions, apply the MD5 hash and add them to both data structures and compare the RAM usage.

          Show
          Jason Rutherglen added a comment - An easy approach is to generate an ascending set of keys and positions, apply the MD5 hash and add them to both data structures and compare the RAM usage.
          Hide
          Jonathan Ellis added a comment -

          In general the big win with the FST is the amount of RAM consumed should be far less. That is fairly easy to measure by generating N keys and comparing the RAM usage

          I think that's what yuki meant by "micro benchmark [of] memory."

          Show
          Jonathan Ellis added a comment - In general the big win with the FST is the amount of RAM consumed should be far less. That is fairly easy to measure by generating N keys and comparing the RAM usage I think that's what yuki meant by "micro benchmark [of] memory."
          Hide
          Jason Rutherglen added a comment -

          The range portion of the patch is not completed as per the initial patch posted. So I'm not sure tools/stress will work? However if tools/stress has a way to generate keys that would be useful.

          Show
          Jason Rutherglen added a comment - The range portion of the patch is not completed as per the initial patch posted. So I'm not sure tools/stress will work? However if tools/stress has a way to generate keys that would be useful.
          Hide
          Sylvain Lebresne added a comment -

          I guess we mostly want to make sure this patch is not clearly slower that with the current index (and if it's faster then that's even better). Using tools/stress (with enough keys inserted) should be a good start.

          Show
          Sylvain Lebresne added a comment - I guess we mostly want to make sure this patch is not clearly slower that with the current index (and if it's faster then that's even better). Using tools/stress (with enough keys inserted) should be a good start.
          Hide
          Jason Rutherglen added a comment -

          The benchmark idea is interesting, however it will not take into account the fact that the FST will be able to store more keys and use less RAM. With greater key granularity, a seek to a given value will be faster? Is there an existing benchmark framework that will for example generate the keys?

          In general the big win with the FST is the amount of RAM consumed should be far less. That is fairly easy to measure by generating N keys and comparing the RAM usage, which with the existing IndexSummary will include object pointers.

          This article describes the improvements seen using Wikipedia using the FST, up to 52% less RAM used, and 22% faster. Though we need to perform our own benchmarks because an MD5 key is different than a dictionary of words.

          http://blog.mikemccandless.com/2011/01/finite-state-transducers-part-2.html

          Show
          Jason Rutherglen added a comment - The benchmark idea is interesting, however it will not take into account the fact that the FST will be able to store more keys and use less RAM. With greater key granularity, a seek to a given value will be faster? Is there an existing benchmark framework that will for example generate the keys? In general the big win with the FST is the amount of RAM consumed should be far less. That is fairly easy to measure by generating N keys and comparing the RAM usage, which with the existing IndexSummary will include object pointers. This article describes the improvements seen using Wikipedia using the FST, up to 52% less RAM used, and 22% faster. Though we need to perform our own benchmarks because an MD5 key is different than a dictionary of words. http://blog.mikemccandless.com/2011/01/finite-state-transducers-part-2.html
          Hide
          Yuki Morishita added a comment -

          Jason,

          Thanks for the patch.
          Current IndexSummary has list of DecoratedKeys and list of positions, but search is done against KeyBound as well. Both DecoratedKey and KeyBound are subclass of RowPosition and are compared using their Tokens. So, I think you have to construct FST against Token.
          For implementation, it would be better to keep all lucene FST related classes inside IndexSummary and not expose them directory to SSTableReader etc.

          Also, can you provide micro benchmark (memory, cpu time...) of IndexSummary between current implementation and FST?

          Show
          Yuki Morishita added a comment - Jason, Thanks for the patch. Current IndexSummary has list of DecoratedKeys and list of positions, but search is done against KeyBound as well. Both DecoratedKey and KeyBound are subclass of RowPosition and are compared using their Tokens. So, I think you have to construct FST against Token. For implementation, it would be better to keep all lucene FST related classes inside IndexSummary and not expose them directory to SSTableReader etc. Also, can you provide micro benchmark (memory, cpu time...) of IndexSummary between current implementation and FST?
          Jonathan Ellis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 1.2 [ 12319262 ]
          Jonathan Ellis made changes -
          Reviewer yukim
          Jason Rutherglen made changes -
          Attachment CASSANDRA-4324.patch [ 12533715 ]
          Hide
          Jason Rutherglen added a comment -

          Attached is a rough first cut of this functionality. It is in a rough state however I figured now is a good point to get some feedback before proceeding further.

          In the dev environment I had to copy the Lucene library into the project and am not sure why that is necessary, as Lucene is included as a dependency in the pom.xml file.

          The sample key range code is confusing as I am unsure of the purpose.

          The patch was created using 'git diff'.

          Show
          Jason Rutherglen added a comment - Attached is a rough first cut of this functionality. It is in a rough state however I figured now is a good point to get some feedback before proceeding further. In the dev environment I had to copy the Lucene library into the project and am not sure why that is necessary, as Lucene is included as a dependency in the pom.xml file. The sample key range code is confusing as I am unsure of the purpose. The patch was created using 'git diff'.
          Jonathan Ellis made changes -
          Field Original Value New Value
          Fix Version/s 1.1.1 [ 12319857 ]
          Affects Version/s 1.1.1 [ 12319857 ]
          Jason Rutherglen created issue -

            People

            • Assignee:
              Jason Rutherglen
              Reporter:
              Jason Rutherglen
            • Votes:
              3 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development