Spark / SPARK-30983

Support more than 5 typed column in typed Dataset.select API


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Dataset only provides typed select overloads for up to 5 typed columns, so once more than 5 typed columns are given, the call resolves to the untyped select instead.
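The overload limit can be illustrated with a short sketch (assumes a local SparkSession; the case class and column names are made up for the example):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("typed-select").getOrCreate()
import spark.implicits._

case class Rec(a: Int, b: Int, c: Int, d: Int, e: Int, f: Int)
val ds = Seq(Rec(1, 2, 3, 4, 5, 6)).toDS()

// 5 typed columns: a typed overload exists, so the result is
// Dataset[(Int, Int, Int, Int, Int)].
val typed =
  ds.select($"a".as[Int], $"b".as[Int], $"c".as[Int], $"d".as[Int], $"e".as[Int])

// 6 typed columns: no typed overload exists, so this silently resolves to the
// untyped select(Column*) and returns a DataFrame (i.e. Dataset[Row]).
val untyped =
  ds.select($"a".as[Int], $"b".as[Int], $"c".as[Int],
            $"d".as[Int], $"e".as[Int], $"f".as[Int])
```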

      Currently users cannot call typed select with more than 5 typed columns. There are a few options:

      1. Expose Dataset.selectUntyped (could rename it) to accept any number of typed columns (due to the limit of ExpressionEncoder.tuple, at most 22 in practice). Pros: no need to add much code to Dataset. Cons: the returned type is the general Dataset[_], not a precise one like Dataset[(U1, U2)] as with the existing overloads.

      2. Add more typed select overloads, up to 22 typed column inputs. Pros: precise return type. Cons: a lot of code added to Dataset for a corner case, and it can be a breaking change for existing user code that currently resolves to the untyped select.
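Option 1 could look roughly like the following sketch. The public method name and the cast at the call site are assumptions for illustration (selectUntyped is currently internal to Dataset), not the actual proposed API:

```scala
import org.apache.spark.sql.{Dataset, TypedColumn}

// Hypothetical public variadic signature on Dataset[T]; ExpressionEncoder.tuple
// supports at most 22 encoders, so the column count is bounded by 22.
//   def selectTyped(columns: TypedColumn[T, _]*): Dataset[_]

// Call-site consequence of the erased return type: the caller must cast to
// recover a concrete tuple type, which is exactly the downside noted above.
// val six = ds.selectTyped($"a".as[Int], $"b".as[Int], $"c".as[Int],
//                          $"d".as[Int], $"e".as[Int], $"f".as[Int])
//             .asInstanceOf[Dataset[(Int, Int, Int, Int, Int, Int)]]
```

Option 2 avoids the cast by spelling out each arity explicitly (select[U1, ..., U6] returning Dataset[(U1, ..., U6)], and so on up to 22), at the cost of many near-duplicate overloads.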

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: L. C. Hsieh (viirya)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: