Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels: None

      Description

      The core reason for this ticket is to gain control over the consumption of the lazy nested iterators in the read path.

      We survive now because we write the size of the row at the front of the row (via some serious acrobatics at write time), which gives us hasNext() for rows for free. But it became apparent while working on the block-based format that hasNext() will not be cheap unless the current item has been consumed. "Consumption" of the row is easy, and blocks will be framed so that they can be very easily skipped, but you don't want to have to seek to the end of the row to answer hasNext, and then seek back to the beginning to consume the row, which is what CollatingIterator would have forced us to do.

      While we're at it, we can also improve efficiency: for M iterators containing N total items, commons.collections.CollatingIterator performs an O(M*N) merge, and calls hasNext multiple times per returned value. We can do better.
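
      As a rough sketch of that direction (illustrative only; LazyMergeSketch and Candidate are hypothetical names, not classes from any patch here), a priority-queue merge can defer touching a source iterator until the caller actually advances the merged view, so each child's hasNext() is called at most once per returned element:

        import java.util.Comparator;
        import java.util.Iterator;
        import java.util.List;
        import java.util.NoSuchElementException;
        import java.util.PriorityQueue;

        // Illustrative sketch only: a lazy k-way merge over M sorted iterators.
        public class LazyMergeSketch<T> implements Iterator<T>
        {
            private final PriorityQueue<Candidate> queue; // one candidate per non-exhausted source
            private Candidate consumed;                   // source to re-poll lazily on the next advance

            public LazyMergeSketch(List<? extends Iterator<T>> sources, final Comparator<? super T> comp)
            {
                queue = new PriorityQueue<Candidate>(Math.max(1, sources.size()), new Comparator<Candidate>()
                {
                    public int compare(Candidate a, Candidate b) { return comp.compare(a.item, b.item); }
                });
                for (Iterator<T> source : sources)
                    if (source.hasNext())
                        queue.add(new Candidate(source));
            }

            public boolean hasNext()
            {
                refill();
                return !queue.isEmpty();
            }

            public T next()
            {
                refill();
                if (queue.isEmpty())
                    throw new NoSuchElementException();
                consumed = queue.poll();                  // O(log M) per returned element
                return consumed.item;
            }

            // The only place a child is touched after construction: one hasNext()/next() pair
            // per element handed out, deferred until the caller asks for more.
            private void refill()
            {
                if (consumed == null)
                    return;
                if (consumed.source.hasNext())
                {
                    consumed.item = consumed.source.next();
                    queue.add(consumed);
                }
                consumed = null;
            }

            public void remove() { throw new UnsupportedOperationException(); }

            private final class Candidate
            {
                final Iterator<T> source;
                T item;
                Candidate(Iterator<T> source) { this.source = source; this.item = source.next(); }
            }
        }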

        Issue Links

          Activity

          jbellis Jonathan Ellis added a comment -

          committed, thanks!

          stuhood Stu Hood added a comment -

          > [CollatingIterator] calls hasNext() on its child iterator immediately after pulling off the least value from one.

          ReducingIterator also does not know whether there are more items until it has consumed past the end of the current item, which is why it was necessary to squash it in.

          > we still have the RI use in collectCollatedColumns [...] as well as LazyColumnIterator

          Fixed in 0003 and removed in 0004.

          > It looks to me like the main obstacle to using MI there is making MI.Reducer support customizable isEqual?

          The comparator passed to the MI is used for isEqual, so it should be pluggable already.
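
          (For illustration, a reducer that keys its grouping off the merge comparator rather than equals() could look like the sketch below; ReducerSketch and its method names are assumptions, not the actual MergeIterator.Reducer API.)

            import java.util.Comparator;

            // Sketch only: "same group" is decided by whatever Comparator the caller handed to
            // the merge, so the grouping used for reduction is pluggable along with the sort order.
            abstract class ReducerSketch<In, Out>
            {
                private final Comparator<? super In> comparator;
                private In last;

                protected ReducerSketch(Comparator<? super In> comparator)
                {
                    this.comparator = comparator;
                }

                // true if 'next' belongs to the same group as the previously added item
                final boolean isEqual(In next)
                {
                    return last != null && comparator.compare(last, next) == 0;
                }

                final void add(In next)
                {
                    last = next;
                    reduce(next);
                }

                protected abstract void reduce(In item);  // fold one input into the pending output
                protected abstract Out getReduced();      // emit the pending output and reset
            }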

          jbellis Jonathan Ellis added a comment -

          It looks like the core problem is that (because CollatingIterator does not require that its inputs be unique?), it calls hasNext() on its child iterator immediately after pulling off the least value from one.

          So I think this is fixable by creating a PQ-using UniquesCollatingIterator, for instance (better name left as an exercise for the reader).

          But, the MergingIterator approach with the reduce incorporated has a kind of elegance to it as well. The reduce logic is certainly simpler in MI. So I'm fine w/ replacing CI+RI with MI if that's the approach you want, but I'd like to make it a clean replace – we still have the RI use in collectCollatedColumns, as you noted above, as well as LazyColumnIterator.

          It looks to me like the main obstacle to using MI there is making MI.Reducer support customizable isEqual?

          stuhood Stu Hood added a comment -

          > just use AbstractIterator. computeNext the block and then it handles making arbitrary combinations of next/hasNext efficient.

          This assumes that the content of the row fits in memory, which is one of the things we were trying to avoid, I believe.

          jbellis Jonathan Ellis added a comment - edited

          > To avoid seeking, we need to guarantee:
          > A = next()
          > A is completely consumed (see [1])
          > hasNext() ?
          > B = next()

          Right, I'm saying that instead of trying to guarantee this particular pattern, which limits generality (and introduces subtle performance problems if this is violated), just use AbstractIterator. computeNext the block and then it handles making arbitrary combinations of next/hasNext efficient.

          stuhood Stu Hood added a comment -

          This design is intended to preserve the status quo of nested iterators by adjusting their contract slightly. If we're willing to make more extensive changes to compaction, I truly believe that the design put forth on CASSANDRA-847 is an order of magnitude more elegant.

          stuhood Stu Hood added a comment -

          > I don't think I understand the problem we're trying to solve here. If hasNext is expensive on a given iterator, the straightforward fix is to cache the answer (and invalidate it on next()).

          hasNext is only expensive if the current item hasn't been consumed: in order to answer "hasNext" in a block-based format (where the length of the entire row is not available) you have to read the remainder of the row, which means scanning the data and potentially seeking.

          The fix this ticket implements is to guarantee that hasNext is called on the child iterators lazily, only once the consumer of the MergeIterator calls hasNext, rather than eagerly after an item (row) is generated, as CollatingIterator does. To avoid seeking, we need to guarantee:

          1. A = next()
          2. A is completely consumed (see [1])
          3. hasNext() ?
          4. B = next()

          The Collating+ReducingIterator used on SSTableScanners by CompactionIterator is the particular area of concern. The items that are being iterated over are rows, which are themselves IColumnIterators. An SSTableScanner in trunk reads the row length from the front of the row, so hasNext does not require a seek. But CASSANDRA-674 will not store a row length anywhere (it's a feature!), and there might be multiple blocks involved in a single IColumnIterator.

          [1] in CASSANDRA-2629, "consumption" of an IColumnIterator is added as a contract on close()
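
          (A minimal sketch of that pattern from the consumer's side, with hypothetical names; the point is just that the inner row iterator is drained before the outer hasNext() is asked again.)

            import java.util.Iterator;
            import java.util.function.Consumer;

            // Illustrative only: each nested row iterator is consumed completely (steps 1-2)
            // before the outer merged iterator is asked whether another row exists (steps 3-4).
            final class ConsumptionSketch
            {
                static <T> void drainInOrder(Iterator<? extends Iterator<T>> mergedRows, Consumer<T> sink)
                {
                    while (mergedRows.hasNext())             // 3. hasNext() ?
                    {
                        Iterator<T> row = mergedRows.next(); // 1. A = next()
                        while (row.hasNext())
                            sink.accept(row.next());         // 2. A is completely consumed
                    }                                        // 4. loop around for B = next()
                }
            }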

          stuhood Stu Hood added a comment -

          Rebased.

          jbellis Jonathan Ellis added a comment -

          > hasNext() will not be cheap unless the current item has been consumed

          I don't think I understand the problem we're trying to solve here. If hasNext is expensive on a given iterator, the straightforward fix is to cache the answer (and invalidate it on next()). The most obvious way to do this is to use the guava AbstractIterator which implements both hasNext and next in terms of computeNext.
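
          (A minimal sketch of that pattern, assuming guava on the classpath; the Supplier-based wrapper below is a hypothetical stand-in for whatever does the expensive row read, not code from the patch.)

            import com.google.common.collect.AbstractIterator;
            import java.util.Iterator;
            import java.util.function.Supplier;

            // Sketch: computeNext() does the expensive work at most once per element, and
            // AbstractIterator caches the result, so any interleaving of hasNext()/next() stays cheap.
            final class CachedHasNextSketch
            {
                // 'source' stands in for "read the next row, or return null at end of data"
                static <T> Iterator<T> wrap(final Supplier<T> source)
                {
                    return new AbstractIterator<T>()
                    {
                        @Override
                        protected T computeNext()
                        {
                            T next = source.get();
                            return next == null ? endOfData() : next;
                        }
                    };
                }
            }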

          stuhood Stu Hood added a comment -

          Rebased into two patches. There is one remaining usage of ReducingIterator in QueryFilter.collectCollatedColumns, but removing it will require a bit of refactoring.

          stuhood Stu Hood added a comment -

          Squashes the previous 0001-0006 into a new 0001-0002 and adds 0003 and 0004 containing a ReducingIterator replacement. I haven't completed porting consumers of ReducingIterator, but I don't anticipate much trouble there.

          stuhood Stu Hood added a comment -

          Edited the description/title to indicate that there is a core issue at stake here, rather than just an algorithm change.

          stuhood Stu Hood added a comment -

          Missed a usage in RowIterator.

          stuhood Stu Hood added a comment -

          johano pointed out that http://code.google.com/p/caliper/ would be a good way to confirm that this doesn't break our common case usages of Collating/Merge. I'm going to perform the Reducing+MergeIterator smash tonight, and then I'll probably spend microtime microbenchmarking.

          stuhood Stu Hood added a comment -

          Also, as an important side note: the core reason for implementing this was to gain control over the consumption of the lazy nested iterators in the read path.

          We survive now because we write the size of the row at the front of the row (via some serious acrobatics at write time), which gives us hasNext() for rows for free. But it became apparent while working on the block-based format that hasNext() will not be cheap unless the current item has been consumed. "Consumption" of the row is easy, and blocks will be framed so that they can be very easily skipped, but you don't want to have to seek to the end of the row to answer hasNext, and then seek back to the beginning to consume the row, which is what CollatingIterator would have forced us to do.

          There is actually one more roadblock to making this crazy iterator dance work: ReducingIterator pulls one more item than it needs from its source before 'reducing' (consuming) the items it is holding. This breaks the invariant, but it should be fixable by smashing ReducingIterator into MergeIterator.

          stuhood Stu Hood added a comment -

          The times in that paste aren't supposed to be significant... I was just trying to get the count of comparisons, so I was doing other things with my laptop. That test uses very small keys (<6 bytes), so the comparisons are relatively cheap.

          stuhood Stu Hood added a comment -
          • 0001 - Adds MergeIterator, which uses a priority queue to perform an O(2*log(M)*N) merge in the same space. Also guarantees to only call hasNext once per item.
          • 0002 - Adds some debug output for checking the number of comparisons performed during compaction
          • 0003 - Replaces CollatingIterator for Compaction
          • 0004 - Replaces CollatingIterator elsewhere
          • 0005 - Removes debug output and CollatingIterator from FBUtilities

          The number of comparisons needed to compact M=sstables=100, N=rows=80000 drops from ~8 million (M*N) to ~1 million (2*log(M)*N), as expected.

          # MergeIterator
          1 for org.apache.cassandra.io.CompactionIterator@7dc21ece
          org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=1 colsper=200000: 2758 ms
          399999 for org.apache.cassandra.io.CompactionIterator@20ca5bff
          org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=200000 colsper=1: 4504 ms
          917546 for org.apache.cassandra.io.CompactionIterator@6360f5bf
          org.apache.cassandra.db.LongCompactionSpeedTest: sstables=100 rowsper=800 colsper=5: 1021 ms
          
          
          # CollatingIterator
          1 for org.apache.cassandra.io.CompactionIterator@5f873eb2
          org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=1 colsper=200000: 1732 ms
          399999 for org.apache.cassandra.io.CompactionIterator@357c7988
          org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=200000 colsper=1: 4499 ms
          7915050 for org.apache.cassandra.io.CompactionIterator@2ae0420b
          org.apache.cassandra.db.LongCompactionSpeedTest: sstables=100 rowsper=800 colsper=5: 1427 ms
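
          (Taking the log above as base 2, those counts line up: M*N = 100 * 80000 = 8,000,000 against the 7,915,050 comparisons measured with CollatingIterator, and 2*log2(100)*80000 ≈ 2 * 6.64 * 80000 ≈ 1,060,000 against the 917,546 measured with MergeIterator.)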

            People

            • Assignee:
              stuhood Stu Hood
              Reporter:
              stuhood Stu Hood
              Reviewer:
              Jonathan Ellis
            • Votes: 0
              Watchers: 3
