Cassandra / CASSANDRA-4983

Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Fix Version/s: 1.2.7
    • Component/s: None
    • Labels:
      None

      Description

      Currently CFIF (ColumnFamilyInputFormat) splits the wrap-around split into two non-wrap-around splits. While this simplifies the CFRR (ColumnFamilyRecordReader) implementation, the approach has several minor downsides:

      • One of the splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second while all the others ran for minutes. A very small task also wastes resources: more goes into launching the task than into doing any real work.
      • The number of map tasks is always one more than expected (expected rows / cassandra.input.split.size), so there are always at least 2 map tasks. This confuses customers.
      • Progress reporting for the divided split parts is inaccurate: even if the splits are similar in size, the progress bar climbs to about 50% and then jumps straight to 100%. Their sizes cannot be estimated properly because size estimation is done before the wrap-around is removed.
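      The effect described above can be illustrated with a minimal Java sketch (Java 16+ for the record). It is not CFIF's actual code: the toy ring of 100 integer tokens, the half-open [start, end) ranges, using token width as a stand-in for row count, and the names TokenRange, unwrap and splitsFor are all hypothetical assumptions, and each range is assumed to fit into a single split. With node tokens 10, 40, 70 and 98, the wrap-around range [98, 10) becomes [98, 100) (2 tokens) and [0, 10) (10 tokens), so four ranges produce five splits, one of them tiny; a single-node ring still produces two splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra's CFIF code: shows how unwrapping a
// wrap-around token range yields one extra, possibly tiny, input split.
final class WrapAroundSplitDemo {
    // Assumed toy ring of tokens 0..RING_SIZE-1; real partitioners use far larger token spaces.
    static final int RING_SIZE = 100;

    // Half-open token range [start, end); it wraps around the ring when start >= end.
    record TokenRange(int start, int end) {
        boolean wrapsAround() {
            return start >= end;
        }

        // Tokens covered; used here as a stand-in for the expected row count.
        int width() {
            return wrapsAround() ? RING_SIZE - start + end : end - start;
        }
    }

    // Splits a wrap-around range into at most two non-wrapping pieces.
    static List<TokenRange> unwrap(TokenRange r) {
        List<TokenRange> pieces = new ArrayList<>();
        if (!r.wrapsAround()) {
            pieces.add(r);
            return pieces;
        }
        pieces.add(new TokenRange(r.start(), RING_SIZE)); // tail: from start to the end of the ring
        if (r.end() > 0) {
            pieces.add(new TokenRange(0, r.end()));       // head: from the ring's beginning to end
        }
        return pieces;
    }

    // One range per node, from its token to the next node's token; the last range wraps.
    static List<TokenRange> splitsFor(int[] sortedTokens) {
        List<TokenRange> splits = new ArrayList<>();
        for (int i = 0; i < sortedTokens.length; i++) {
            int start = sortedTokens[i];
            int end = sortedTokens[(i + 1) % sortedTokens.length];
            splits.addAll(unwrap(new TokenRange(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four nodes: four ranges, but the wrapping range [98, 10) becomes
        // [98, 100) (width 2) and [0, 10) (width 10), giving five splits.
        for (TokenRange s : splitsFor(new int[] {10, 40, 70, 98})) {
            System.out.println(s + " width=" + s.width());
        }
        // One node: its single range covers the whole ring yet still yields two splits.
        System.out.println(splitsFor(new int[] {25}).size()); // prints 2
    }
}
```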

        Activity

        Jonathan Ellis made changes:
        Status: Patch Available → Resolved
        Resolution: Won't Fix
        Jonathan Ellis made changes:
        Fix Version/s: 1.2.6 → 1.2.7
        Sylvain Lebresne made changes:
        Fix Version/s: 1.2.5 → 1.2.6
        Jonathan Ellis made changes:
        Fix Version/s: 1.2.4 → 1.2.5
        Sylvain Lebresne made changes:
        Fix Version/s: 1.2.3 → 1.2.4
        Sylvain Lebresne made changes:
        Fix Version/s: 1.2.2 → 1.2.3
        Jonathan Ellis made changes:
        Status: Open → Patch Available
        Reviewer: jbellis
        Fix Version/s: 1.2.2
        Gavin made changes:
        Workflow: patch-available, re-open possible → reopen-resolved, no closed status, patch-avail, testing
        Gavin made changes:
        Workflow: no-reopen-closed, patch-avail → patch-available, re-open possible
        Piotr Kołaczkowski made changes:
        Piotr Kołaczkowski made changes:
        Description: third bullet added (inaccurate progress reporting for the divided split parts)
        Piotr Kołaczkowski created issue

          People

          • Assignee:
            Piotr Kołaczkowski
            Reporter:
            Piotr Kołaczkowski
            Reviewer:
            Jonathan Ellis
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
