Spark / SPARK-22386 Data Source V2 improvements / SPARK-23889

DataSourceV2: Add interfaces to pass required sorting and clustering for writes

    Details

    • Type: Sub-task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

      Description

      From a discussion on the dev list, there is consensus around adding interfaces to pass required sorting and clustering to Spark. The proposal is to add:

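      // Implemented by a writer to declare the clustering it requires.
      interface RequiresClustering {
        Set<Expression> requiredClustering();
      }

      // Implemented by a writer to declare the ordering it requires.
      interface RequiresSort {
        List<SortOrder> requiredOrdering();
      }
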

      When a writer implements only RequiresSort, Spark would perform a global sort. When the writer also implements RequiresClustering, the required clustering would override the partitioning introduced by that sort, making the sort local to each partition.
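
      The rule above can be sketched as a small decision function. This is only an illustration of the proposal, not Spark's planner: the Expression and SortOrder classes are hypothetical stand-ins for Spark's types, and WritePlan, plan, SortOnlyWriter, and ClusteredSortedWriter are names invented for this example. The behavior for a writer that implements only RequiresClustering (cluster without sorting) is an assumption; the description does not spell it out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class WriteRequirementsSketch {

    // Hypothetical stand-ins for Spark's Expression and SortOrder types.
    static final class Expression {
        final String column;
        Expression(String column) { this.column = column; }
    }

    static final class SortOrder {
        final Expression child;
        final boolean ascending;
        SortOrder(Expression child, boolean ascending) {
            this.child = child;
            this.ascending = ascending;
        }
    }

    // The two interfaces as proposed in this issue.
    interface RequiresClustering {
        Set<Expression> requiredClustering();
    }

    interface RequiresSort {
        List<SortOrder> requiredOrdering();
    }

    // Illustrative plan shapes; these names are not part of the proposal.
    enum WritePlan { AS_IS, GLOBAL_SORT, CLUSTER_ONLY, CLUSTER_THEN_LOCAL_SORT }

    // Picks a plan shape from the interfaces the writer implements.
    static WritePlan plan(Object writer) {
        boolean clustered = writer instanceof RequiresClustering;
        boolean sorted = writer instanceof RequiresSort;
        if (clustered && sorted) return WritePlan.CLUSTER_THEN_LOCAL_SORT;
        if (sorted) return WritePlan.GLOBAL_SORT;
        if (clustered) return WritePlan.CLUSTER_ONLY; // assumed, see lead-in
        return WritePlan.AS_IS;
    }

    // Example writer that only asks for an ordering.
    static final class SortOnlyWriter implements RequiresSort {
        @Override
        public List<SortOrder> requiredOrdering() {
            return Arrays.asList(new SortOrder(new Expression("ts"), true));
        }
    }

    // Example writer that asks for clustering by "day" and ordering by "ts".
    static final class ClusteredSortedWriter
            implements RequiresClustering, RequiresSort {
        @Override
        public Set<Expression> requiredClustering() {
            return Collections.singleton(new Expression("day"));
        }

        @Override
        public List<SortOrder> requiredOrdering() {
            return Arrays.asList(new SortOrder(new Expression("ts"), true));
        }
    }

    public static void main(String[] args) {
        System.out.println(plan(new SortOnlyWriter()));        // GLOBAL_SORT
        System.out.println(plan(new ClusteredSortedWriter())); // CLUSTER_THEN_LOCAL_SORT
    }
}
```

      Checking the interfaces independently, rather than merging them into one, mirrors the proposal: a writer opts into each requirement separately, and the combination determines whether the sort is global or local.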

              People

              • Assignee: Unassigned
              • Reporter: Ryan Blue (rdblue)