Apache Arrow / ARROW-10583

[Rust] [DataFusion] Implement "coalesce partitions" operator

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust - DataFusion
    • Labels: None

    Description

      The coalesce partitions operator reduces the number of partitions to a specified target count.

      The target partition count must be >= 1.

      If the target partition count is >= the number of input partitions then this is a no-op and can be optimized out of the plan.

      The simplest implementation would assign one or more whole input partitions to each output partition. This works well when the number of input partitions is divisible by the number of output partitions, e.g. going from 64 input partitions to 8 output partitions. In other cases the resulting partitions may be skewed, e.g. going from 3 partitions to 2 leaves one output partition with two input partitions and the other with one. Partitioning at the row level would avoid the skew but would add a lot of overhead, and the "repartition" operator should be used for that case.
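
      The partition-level assignment described above could be sketched roughly as follows. This is only an illustration of the grouping logic, not DataFusion code: the function name `coalesce_assignment` and its signature are hypothetical, and it works on partition indices rather than on real execution plans. It assumes contiguous grouping, with any remainder spread over the first output partitions.

```rust
/// Hypothetical helper (not part of DataFusion): returns, for each output
/// partition, the indices of the input partitions assigned to it.
fn coalesce_assignment(input_count: usize, target_count: usize) -> Vec<Vec<usize>> {
    // The issue requires a target partition count of at least 1.
    assert!(target_count >= 1, "target partition count must be >= 1");

    // No-op case: the target is not smaller than the input, so every input
    // partition maps to its own output partition and the operator could be
    // optimized out of the plan.
    if target_count >= input_count {
        return (0..input_count).map(|i| vec![i]).collect();
    }

    // Each output partition receives `base` whole input partitions; the
    // first `remainder` outputs receive one extra, which is where skew
    // appears when the counts do not divide evenly.
    let base = input_count / target_count;
    let remainder = input_count % target_count;

    let mut groups: Vec<Vec<usize>> = Vec::with_capacity(target_count);
    let mut next = 0;
    for out in 0..target_count {
        let size = base + if out < remainder { 1 } else { 0 };
        groups.push((next..next + size).collect());
        next += size;
    }
    groups
}
```

      Under these assumptions, `coalesce_assignment(64, 8)` gives eight groups of eight input partitions each, while `coalesce_assignment(3, 2)` gives `[[0, 1], [2]]`, which shows the skew mentioned above.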

          People

            Assignee: Unassigned
            Reporter: Andy Grove (andygrove)
            Votes: 0
            Watchers: 3
