  1. Spark
  2. SPARK-22386 Data Source V2 improvements
  3. SPARK-23889

DataSourceV2: Add interfaces to pass required sorting and clustering for writes

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      From a discussion on the dev list, there is consensus around adding interfaces to pass required sorting and clustering to Spark. The proposal is to add:

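      // Implemented by a writer that needs rows with equal values for these
      // expressions to be grouped into the same partition.
      interface RequiresClustering {
        Set<Expression> requiredClustering();
      }
      
      // Implemented by a writer that needs its incoming rows sorted in this order.
      interface RequiresSort {
        List<SortOrder> requiredOrdering();
      }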
      

      When a writer implements only RequiresSort, Spark would perform a global sort. When it also implements RequiresClustering, the required clustering would override the partitioning that the sort introduces, so the sort becomes local to each partition.
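
The interaction between the two interfaces can be illustrated with a small, self-contained Java sketch. It is not Spark's implementation. `Expression` and `SortOrder` here are simplified stand-ins for the Catalyst classes, and the `Preparation` enum, the `prepare` helper, and both example writers are hypothetical names introduced only for illustration. The `CLUSTER_ONLY` case is an assumption, since the description does not cover a writer that requires clustering without a sort.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WriteRequirementsSketch {

  // Stand-ins for Catalyst's Expression and SortOrder (assumption: the real
  // classes are far richer; here each one only wraps a column name).
  static final class Expression {
    final String column;
    Expression(String column) { this.column = column; }
  }

  static final class SortOrder {
    final Expression child;
    final boolean ascending;
    SortOrder(Expression child, boolean ascending) {
      this.child = child;
      this.ascending = ascending;
    }
  }

  // The two interfaces as proposed in this issue.
  interface RequiresClustering {
    Set<Expression> requiredClustering();
  }

  interface RequiresSort {
    List<SortOrder> requiredOrdering();
  }

  // Hypothetical labels for what would happen to rows before they reach the writer.
  enum Preparation { NONE, CLUSTER_ONLY, GLOBAL_SORT, CLUSTER_THEN_LOCAL_SORT }

  // Hypothetical decision helper that mirrors the rules in the description.
  static Preparation prepare(Object writer) {
    boolean clustered = writer instanceof RequiresClustering;
    boolean sorted = writer instanceof RequiresSort;
    if (clustered && sorted) {
      // Clustering overrides the sort's partitioning, so the sort is per partition.
      return Preparation.CLUSTER_THEN_LOCAL_SORT;
    }
    if (sorted) {
      // Only an ordering is required, so the sort is global.
      return Preparation.GLOBAL_SORT;
    }
    // Assumption: clustering alone implies no sort at all.
    return clustered ? Preparation.CLUSTER_ONLY : Preparation.NONE;
  }

  // Example writer (hypothetical): cluster by "day", sort by "day" then "ts".
  static final class DayPartitionedWriter implements RequiresClustering, RequiresSort {
    private final Expression day = new Expression("day");
    private final Expression ts = new Expression("ts");

    @Override
    public Set<Expression> requiredClustering() {
      return new LinkedHashSet<>(Collections.singletonList(day));
    }

    @Override
    public List<SortOrder> requiredOrdering() {
      return Arrays.asList(new SortOrder(day, true), new SortOrder(ts, true));
    }
  }

  // Example writer (hypothetical) that only asks for an ordering.
  static final class SortedOnlyWriter implements RequiresSort {
    @Override
    public List<SortOrder> requiredOrdering() {
      return Collections.singletonList(new SortOrder(new Expression("id"), true));
    }
  }

  public static void main(String[] args) {
    System.out.println(prepare(new SortedOnlyWriter()));     // GLOBAL_SORT
    System.out.println(prepare(new DayPartitionedWriter())); // CLUSTER_THEN_LOCAL_SORT
    System.out.println(prepare(new Object()));               // NONE
  }
}
```

Keeping the two requirements in separate interfaces lets a source opt into either one independently. A writer that only needs ordered input implements `RequiresSort` alone, while one that writes partitioned files can add `RequiresClustering` to avoid the cost of a global sort.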

            People

              Assignee: Unassigned
              Reporter: Ryan Blue (rdblue)
              Votes: 3
              Watchers: 18
