Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.8
    • Component/s: Hadoop
    • Labels:
      None

      Description

       public float getProgress()
          {
              // TODO this is totally broken for wide rows
              // the progress is likely to be reported slightly off the actual but close enough
              float progress = ((float) iter.rowsRead() / totalRowCount);
              return progress > 1.0F ? 1.0F : progress;
          }
      

      The problem is iter.rowsRead() does not return the number of rows read from the wide row iterator, but returns number of columns (every row is counted multiple times).

        Activity

        Hide
        Piotr Kołaczkowski added a comment - - edited

        I attach a list of patches affecting operation of CFRR:

        1. Fix for obvious counting bug in wide row iterator that was counting columns instead of rows.
        2. Several fixes in describe_splits:
          fixed non-uniform splitting - caused by integer math roundoff errors
          fixed insane behaviour when number of splits was higher than number of key samples
          added estimated size of the split to the result, and make use of it in CFIF
        3. This is a patch for broken get_paged_slice; addressed in a separate ticket, but I had to include it in order to test my code
        4. Fix for creating excessively small splits (and wrong progress reporting) due to range wrap around.
        5. get_range_slices allows for (start_key, end_token) exactly the same as get_paged_slice
        6. I tried to de-spaghettize CFRR code a little. This also fixes some bug that accidentally slipped in with previous patches.
        Show
        Piotr Kołaczkowski added a comment - - edited I attach a list of patches affecting operation of CFRR: Fix for obvious counting bug in wide row iterator that was counting columns instead of rows. Several fixes in describe_splits: fixed non-uniform splitting - caused by integer math roundoff errors fixed insane behaviour when number of splits was higher than number of key samples added estimated size of the split to the result, and make use of it in CFIF This is a patch for broken get_paged_slice; addressed in a separate ticket, but I had to include it in order to test my code Fix for creating excessively small splits (and wrong progress reporting) due to range wrap around. get_range_slices allows for (start_key, end_token) exactly the same as get_paged_slice I tried to de-spaghettize CFRR code a little. This also fixes some bug that accidentally slipped in with previous patches.
        Hide
        Jonathan Ellis added a comment -

        reviewed + committed patches 01 and 02, rest still pending.

        Show
        Jonathan Ellis added a comment - reviewed + committed patches 01 and 02, rest still pending.
        Hide
        Jonathan Ellis added a comment - - edited

        03 committed in CASSANDRA-4816.

        Not sure about 04 – I'm a fan of the simplifications we get from letting CFRR only need to deal with non-wrapping splits.

        Show
        Jonathan Ellis added a comment - - edited 03 committed in CASSANDRA-4816 . Not sure about 04 – I'm a fan of the simplifications we get from letting CFRR only need to deal with non-wrapping splits.
        Hide
        Piotr Kołaczkowski added a comment -

        #04 - what about virtual nodes in 1.2? Do we insist that split may not span more than one contiguous token range? It will be harder to avoid too small splits. And too small split = bigger task book-keeping overhead.

        Show
        Piotr Kołaczkowski added a comment - #04 - what about virtual nodes in 1.2? Do we insist that split may not span more than one contiguous token range? It will be harder to avoid too small splits. And too small split = bigger task book-keeping overhead.
        Hide
        Piotr Kołaczkowski added a comment -

        Hold on with applying patch 2 for a while. We just discovered it breaks running hive queries while doing rolling upgrade. There is a need for falling back to old describe_splits method if describe_splits_ex is not found.

        Show
        Piotr Kołaczkowski added a comment - Hold on with applying patch 2 for a while. We just discovered it breaks running hive queries while doing rolling upgrade. There is a need for falling back to old describe_splits method if describe_splits_ex is not found.
        Hide
        Piotr Kołaczkowski added a comment -

        Attaching a patch allowing to generate splits also when talking to an older version of thrift server.

        Show
        Piotr Kołaczkowski added a comment - Attaching a patch allowing to generate splits also when talking to an older version of thrift server.
        Hide
        Piotr Kołaczkowski added a comment -

        Rebased patches for recent 1.1. Some patches have been already applied, so removed them from the list.

        Show
        Piotr Kołaczkowski added a comment - Rebased patches for recent 1.1. Some patches have been already applied, so removed them from the list.
        Hide
        Jonathan Ellis added a comment -

        what about virtual nodes in 1.2? Do we insist that split may not span more than one contiguous token range?

        That's kind of orthogonal to wrapping ranges per se – you'll still only have a single [virtual] node whose range wraps. So vnodes won't make that worse. Moreover, you're still going to need two scans at the disk level since a wrapping range won't be contiguous there. (Currently wrapping ranges are split by StorageProxy.getRestrictedRanges but this may change for CASSANDRA-4858.) Doing an extra Thrift or CQL query is negligible overhead compared to the actual scan.

        Finally, getRestrictedRanges will split it up into scan-per-vnode which I agree is something we should fix but I don't think this patch does it. As an optimization I don't think it's something we should block 1.2.0 for. Should we split this into a separate ticket?

        Show
        Jonathan Ellis added a comment - what about virtual nodes in 1.2? Do we insist that split may not span more than one contiguous token range? That's kind of orthogonal to wrapping ranges per se – you'll still only have a single [virtual] node whose range wraps. So vnodes won't make that worse. Moreover, you're still going to need two scans at the disk level since a wrapping range won't be contiguous there. (Currently wrapping ranges are split by StorageProxy.getRestrictedRanges but this may change for CASSANDRA-4858 .) Doing an extra Thrift or CQL query is negligible overhead compared to the actual scan. Finally, getRestrictedRanges will split it up into scan-per-vnode which I agree is something we should fix but I don't think this patch does it. As an optimization I don't think it's something we should block 1.2.0 for. Should we split this into a separate ticket?
        Hide
        Jonathan Ellis added a comment -

        0006: Can you split out the bug fix and rebase? Refactoring is fine but let's keep it separate from bug fixes.

        0007: I'm unclear what this is useful for, anyone running a 1.1 recent enough to have this patch, would also have describe_splits_ex, no?

        Show
        Jonathan Ellis added a comment - 0006: Can you split out the bug fix and rebase? Refactoring is fine but let's keep it separate from bug fixes. 0007: I'm unclear what this is useful for, anyone running a 1.1 recent enough to have this patch, would also have describe_splits_ex, no?
        Hide
        Piotr Kołaczkowski added a comment -

        0007 - this is for rolling upgrade. When you upgrade one node and the other nodes don't have describe_splits_ex yet, starting a hadoop job on a newly upgraded node fails.

        As for 0004 / 0006 fixes - I agree. Let's move them to a separate ticket.

        Show
        Piotr Kołaczkowski added a comment - 0007 - this is for rolling upgrade. When you upgrade one node and the other nodes don't have describe_splits_ex yet, starting a hadoop job on a newly upgraded node fails. As for 0004 / 0006 fixes - I agree. Let's move them to a separate ticket.
        Hide
        Piotr Kołaczkowski added a comment -

        BTW: I don't get email notifications from ASF Jira. It is because I registered long, long time ago and my email is obsolete. How to change my email to a newer one? I can't see an option in the user profile for doing that.

        Show
        Piotr Kołaczkowski added a comment - BTW: I don't get email notifications from ASF Jira. It is because I registered long, long time ago and my email is obsolete. How to change my email to a newer one? I can't see an option in the user profile for doing that.
        Hide
        Jonathan Ellis added a comment -

        Is there a more specific exception we can catch besides TException?

        Show
        Jonathan Ellis added a comment - Is there a more specific exception we can catch besides TException?
        Hide
        Jonathan Ellis added a comment -

        Without trying it, I think it would be TApplicationException.WRONG_METHOD_NAME.

        Show
        Jonathan Ellis added a comment - Without trying it, I think it would be TApplicationException.WRONG_METHOD_NAME.
        Hide
        Piotr Kołaczkowski added a comment -

        Jonathan Ellis right, I change it.

        Show
        Piotr Kołaczkowski added a comment - Jonathan Ellis right, I change it.
        Hide
        Piotr Kołaczkowski added a comment -

        0007-Fallback-to-describe-splits-v2.patch: Catching TApplicationException and checking its type.

        Show
        Piotr Kołaczkowski added a comment - 0007-Fallback-to-describe-splits-v2.patch: Catching TApplicationException and checking its type.
        Hide
        Piotr Kołaczkowski added a comment -

        Attached third version of the patch - this time it works. The right error code is UNKNOWN_METHOD (1), not WRONG_METHOD_NAME.

        Show
        Piotr Kołaczkowski added a comment - Attached third version of the patch - this time it works. The right error code is UNKNOWN_METHOD (1), not WRONG_METHOD_NAME.
        Hide
        Jonathan Ellis added a comment -

        (committed)

        Show
        Jonathan Ellis added a comment - (committed)

          People

          • Assignee:
            Piotr Kołaczkowski
            Reporter:
            Piotr Kołaczkowski
            Reviewer:
            Jonathan Ellis
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development