Spark / SPARK-22386 Data Source V2 improvements / SPARK-23889

DataSourceV2: Add interfaces to pass required sorting and clustering for writes

    Details

    • Type: Sub-task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

      Description

      From a discussion on the dev list, there is consensus around adding interfaces to pass required sorting and clustering to Spark. The proposal is to add:

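      // Implemented by a data source to tell Spark how rows must be clustered for a write.
      interface RequiresClustering {
        Set<Expression> requiredClustering();
      }

      // Implemented by a data source to tell Spark how rows must be ordered for a write.
      interface RequiresSort {
        List<SortOrder> requiredOrdering();
      }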

      When only RequiresSort is implemented, Spark would perform a global sort. When RequiresClustering is also implemented, its clustering would override the partitioning that the global sort introduces, making the sort local to each partition.
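
      The interaction between the two interfaces can be sketched in plain Java. This is a minimal illustration, not Spark's planner: String stands in for Expression and SortOrder so the code compiles without Spark, and the WriteDistribution enum, the plan method, the example writers, and the column names are all hypothetical. The behavior for a source that implements only RequiresClustering, or neither interface, is not stated in this issue and is an assumption here.

```java
import java.util.List;
import java.util.Set;

public class WriteRequirementsSketch {
    // Stand-ins for the proposed interfaces. String replaces Expression and
    // SortOrder so the sketch compiles without Spark on the classpath.
    interface RequiresClustering {
        Set<String> requiredClustering();
    }

    interface RequiresSort {
        List<String> requiredOrdering();
    }

    // Hypothetical names for the outcomes the description implies.
    enum WriteDistribution { UNCHANGED, CLUSTER_ONLY, GLOBAL_SORT, CLUSTER_THEN_LOCAL_SORT }

    // Assumed decision logic: clustering, when present, replaces the
    // partitioning a global sort would introduce, so the sort becomes local.
    static WriteDistribution plan(Object writer) {
        boolean clustered = writer instanceof RequiresClustering;
        boolean sorted = writer instanceof RequiresSort;
        if (clustered && sorted) {
            return WriteDistribution.CLUSTER_THEN_LOCAL_SORT;
        }
        if (sorted) {
            return WriteDistribution.GLOBAL_SORT;
        }
        if (clustered) {
            return WriteDistribution.CLUSTER_ONLY; // assumption: not covered by the issue text
        }
        return WriteDistribution.UNCHANGED;        // assumption: not covered by the issue text
    }

    // Example writer that only asks for an ordering.
    static class SortedWriter implements RequiresSort {
        public List<String> requiredOrdering() { return List.of("event_time"); }
    }

    // Example writer that asks for both clustering and an ordering.
    static class ClusteredSortedWriter implements RequiresClustering, RequiresSort {
        public Set<String> requiredClustering() { return Set.of("day"); }
        public List<String> requiredOrdering() { return List.of("event_time"); }
    }

    public static void main(String[] args) {
        System.out.println(plan(new SortedWriter()));          // GLOBAL_SORT
        System.out.println(plan(new ClusteredSortedWriter())); // CLUSTER_THEN_LOCAL_SORT
    }
}
```

      Keeping the two requirements in separate interfaces lets a source opt into either one on its own, which is why the combined case needs an explicit rule for which partitioning wins.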

            People

            • Assignee: Unassigned
            • Reporter: Ryan Blue (rdblue)
            • Votes: 2
            • Watchers: 15
