Cassandra / CASSANDRA-4983

Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Won't Fix
    • 1.2.7
    • None
    • None

    Description

      Currently, CFIF (ColumnFamilyInputFormat) divides the wrap-around split into two non-wrapping splits. This simplifies the CFRR (ColumnFamilyRecordReader) implementation, but the approach has several minor downsides:

      • One of the two splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second while all the others ran for minutes. A very small task also wastes resources: more goes into launching the task than into doing any real work.
      • The number of map tasks is always one more than (expected rows / cassandra.input.split.size), and it is never less than 2. This confuses customers.
      • Progress reporting for the two parts of a divided split is inaccurate: even if they are similar in size, the progress bar climbs to about 50% and then jumps straight to 100%. Their sizes cannot be estimated properly because the size estimate is computed before the wrap-around is removed.
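      The wrap-around handling described above can be sketched roughly as follows. This is a hypothetical, simplified model, not Cassandra's actual Range/Token API: it assumes Murmur3-style long tokens and half-open ranges (left, right], and the names TokenRange, unwrap, width and progress are illustrative only. It cuts the wrapping range at the ring's minimum token, which is the kind of split the first bullet complains about. Records require Java 16 or later.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the wrap-around split (not Cassandra's API).
// Tokens are plain longs (assumed Murmur3-style token space); a range is (left, right].
public class WrapAroundDemo {
    // Assumed smallest token on the ring.
    static final long MIN_TOKEN = Long.MIN_VALUE;

    record TokenRange(long left, long right) {
        // (left, right] passes the ring's minimum when left >= right.
        boolean isWrapAround() {
            return left >= right;
        }

        // Number of tokens in (left, right], computed modulo 2^64 so wrapping ranges work too.
        BigInteger width() {
            BigInteger w = BigInteger.valueOf(right).subtract(BigInteger.valueOf(left));
            return w.signum() > 0 ? w : w.add(BigInteger.ONE.shiftLeft(64));
        }
    }

    // Cut a wrapping range at the minimum token into two non-wrapping parts.
    static List<TokenRange> unwrap(TokenRange r) {
        List<TokenRange> parts = new ArrayList<>();
        if (!r.isWrapAround() || r.right() == MIN_TOKEN) {
            // Nothing to cut: either no wrap, or the range already ends at the minimum.
            parts.add(r);
        } else {
            parts.add(new TokenRange(r.left(), MIN_TOKEN));
            parts.add(new TokenRange(MIN_TOKEN, r.right()));
        }
        return parts;
    }

    // Progress of one part when the row estimate was made for the whole wrapped range.
    static double progress(long rowsRead, long estimatedRowsForWholeRange) {
        return Math.min(1.0, (double) rowsRead / estimatedRowsForWholeRange);
    }

    public static void main(String[] args) {
        // A wrapping range whose left end sits just 10 tokens below the top of the ring.
        TokenRange wrapped = new TokenRange(Long.MAX_VALUE - 10, MIN_TOKEN + 1_000_000);
        for (TokenRange part : unwrap(wrapped)) {
            System.out.println(part + " covers " + part.width() + " tokens");
        }
        // Each half of an evenly divided 100,000-row split finishes at only 50% of the estimate.
        System.out.println("progress when half finishes: " + progress(50_000, 100_000));
    }
}
```

      With the range used in main, the first part covers only 11 tokens while the second covers 1,000,000, so one map task gets almost nothing to do. The progress helper illustrates the third bullet: if each part is measured against an estimate taken for the whole wrapped range, an evenly divided part tops out near 50% before its task completes.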

          People

            Assignee: Piotr Kolaczkowski
            Reporter: Piotr Kolaczkowski
            Piotr Kolaczkowski
            Jonathan Ellis
            Votes: 0
            Watchers: 1