Cassandra / CASSANDRA-4983

Improve range wrap-around in CFIF: CFIF shouldn't produce tiny input splits

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Fix Version/s: 1.2.7
    • Component/s: None
    • Labels: None

    Description

    Currently, CFIF (ColumnFamilyInputFormat) splits the wrap-around token range into two non-wrapping splits. While this simplifies the CFRR (ColumnFamilyRecordReader) implementation, the approach has several minor downsides:

    • One of the two splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second while all the others took minutes. A very small task also wastes resources: more goes into launching the task than into doing any real work.
    • The number of map tasks is always one more than (expected rows / cassandra.input.split.size), and there are never fewer than 2 map tasks. This confuses customers.
    • Progress reporting for the two halves of the divided split is inaccurate. Even if the halves are similar in size, the progress bar climbs to about 50% and then jumps straight to 100%, because their sizes cannot be estimated properly (size estimation happens before the wrap-around range is split).
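    The first two downsides can be reproduced with a toy model. The sketch below is not CFIF code: it assumes a hypothetical integer token ring with tokens 0 to RING_SIZE - 1 (real partitioners use far larger token spaces, and Cassandra's own range handling treats the minimum token specially), and `unwrap`, `width`, and `splits_for_ring` are illustrative names only. A wrapping range (start, end], with start >= end, is cut at the top of the ring into two non-wrapping pieces.

```python
from typing import List, Tuple

# Hypothetical token ring with tokens 0 .. RING_SIZE - 1. Real partitioners use
# much larger token spaces; a small ring keeps the numbers readable.
RING_SIZE = 1000

Interval = Tuple[int, int]  # inclusive [lo, hi] token interval


def unwrap(start: int, end: int) -> List[Interval]:
    """Split the ring range (start, end] into non-wrapping inclusive intervals.

    The range wraps when start >= end; start == end is treated as the whole ring.
    """
    if start < end:
        return [(start + 1, end)]
    pieces = []
    if start < RING_SIZE - 1:
        pieces.append((start + 1, RING_SIZE - 1))  # from start up to the top of the ring
    pieces.append((0, end))  # from the bottom of the ring up to end
    return pieces


def width(piece: Interval) -> int:
    """Number of tokens in an inclusive interval."""
    lo, hi = piece
    return hi - lo + 1


def splits_for_ring(tokens: List[int]) -> List[Interval]:
    """Build one (predecessor, token] range per token and unwrap each into splits."""
    ts = sorted(tokens)
    pieces: List[Interval] = []
    for i, end in enumerate(ts):
        start = ts[i - 1]  # for i == 0 this is the last token: the wrapping range
        pieces.extend(unwrap(start, end))
    return pieces


# Usage: a hypothetical four-node ring whose last token sits near the top of the ring.
splits = splits_for_ring([100, 400, 700, 995])
print(len(splits), [width(p) for p in splits])  # 5 [4, 101, 300, 300, 295]
```

    Four ranges turn into five splits, and the 4-token tail piece of the wrapping range is exactly the kind of tiny split described in the first bullet.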

    Activity

    Jonathan Ellis added a comment:

    I'm not really a fan of making CFRR (and CqlPRR?) more complex to make a corner case slightly better. Remember, we'll have exactly one wrapping range per Task out of the 100s of splits.

    On the bright side, the "real" CqlInputFormat (using server-side paging) will make this a non-issue in 2.0.
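    The point that only one range wraps can be checked with a small model: pairing each sorted token with its predecessor gives one (start, end] range per token, and only the pair that crosses the top of the ring has start >= end. This is a sketch on hypothetical, evenly spaced integer tokens, not Cassandra's `Range` or token classes; `ring_ranges` and `is_wrapping` are illustrative names.

```python
from typing import List, Tuple


def ring_ranges(tokens: List[int]) -> List[Tuple[int, int]]:
    """Pair each token with its predecessor on the ring, giving (start, end] ranges."""
    ts = sorted(tokens)
    # ts[i - 1] for i == 0 is the last token, so the first range is (last, first].
    return [(ts[i - 1], ts[i]) for i in range(len(ts))]


def is_wrapping(start: int, end: int) -> bool:
    """A range (start, end] wraps past the top of the ring when start >= end."""
    return start >= end


# Hypothetical cluster with 256 evenly spaced tokens.
tokens = [i * 1000 for i in range(256)]
ranges = ring_ranges(tokens)
wrapping = [r for r in ranges if is_wrapping(*r)]
print(len(ranges), len(wrapping))  # 256 1
```

    Only one of the 256 ranges wraps, so unwrapping adds a single extra split to the whole set, which is why the comment treats it as a corner case.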

    People

    • Assignee: Piotr Kołaczkowski
    • Reporter: Piotr Kołaczkowski
    • Reviewer: Jonathan Ellis
    • Votes: 0
    • Watchers: 1