  1. Spark
  2. SPARK-12616

Union logical plan should support arbitrary number of children (rather than binary)

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The Union logical plan is currently a binary node. However, a typical use case for union is to combine a very large number of input sources (DataFrames, RDDs, or files); it is not uncommon to union hundreds of thousands of files. In that case the optimizer can become very slow, because the plan contains a correspondingly large number of nested logical Union nodes. We should change the Union logical plan to support an arbitrary number of children, and add a single rule in the optimizer (or analyzer?) that collapses all adjacent Unions into one.

      Note that this problem does not exist in the physical plan, because the physical Union already supports an arbitrary number of children.
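
The collapsing rule proposed above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's Catalyst code: `Relation`, `Union`, and `flatten_unions` are hypothetical names introduced for illustration, and the model only handles Union nodes whose descendants are other Unions or leaf relations. It shows how a left-deep chain of binary unions can be rewritten into one n-ary node while preserving child order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Relation:
    """Leaf node: hypothetical stand-in for one input source (file, RDD, ...)."""
    name: str


@dataclass(frozen=True)
class Union:
    """Union node that may hold any number of children (toy model)."""
    children: tuple


def flatten_unions(plan):
    """Collapse directly nested Union nodes into a single n-ary Union.

    Uses an explicit stack instead of recursion, so a chain that is
    thousands of levels deep does not hit Python's recursion limit.
    """
    if not isinstance(plan, Union):
        return plan
    flat = []
    # Reverse so that pop() visits children left to right.
    stack = list(reversed(plan.children))
    while stack:
        node = stack.pop()
        if isinstance(node, Union):
            # Adjacent Union: splice its children in place of the node.
            stack.extend(reversed(node.children))
        else:
            flat.append(node)
    return Union(tuple(flat))


# Build a left-deep chain of binary unions over 1000 toy relations,
# i.e. Union(Union(Union(t0, t1), t2), t3) and so on.
relations = [Relation(f"t{i}") for i in range(1000)]
binary_plan = reduce(lambda acc, r: Union((acc, r)), relations[1:], relations[0])

# After flattening, one Union node holds all inputs directly.
flat_plan = flatten_unions(binary_plan)
```

In this toy model the binary plan contains 999 Union nodes nested 999 levels deep, while the flattened plan is a single node with 1000 children, so any rule that walks the tree has far fewer nodes to visit.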

            People

              Assignee: Xiao Li (smilegator)
              Reporter: Reynold Xin (rxin)
              Votes: 0
              Watchers: 4