Spark / SPARK-24816

SQL interface support repartitionByRange

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Add support for repartitionByRange to the SQL interface to improve filter pushdown. I have tested this feature with a big table (data size: 1.1 TB, row count: 282,001,954,428).

      The test SQL is:

      select * from table where id=401564838907
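
A point lookup like this benefits from layouts where each file covers a narrow range of id values, because files whose min/max statistics exclude the target can be skipped. The sketch below is a toy model of that idea in plain Python, not Spark code: `hash_partition`, `range_partition`, `min_max_stats`, and `files_to_scan` are hypothetical helpers, the ids are synthetic, and the range split uses an exact sort, whereas Spark estimates range boundaries by sampling.

```python
def hash_partition(ids, n):
    """Spread ids over n partitions by hash, similar in spirit to DISTRIBUTE BY."""
    parts = [[] for _ in range(n)]
    for i in ids:
        parts[hash(i) % n].append(i)
    return parts


def range_partition(ids, n):
    """Split ids into at most n contiguous, non-overlapping ranges (assumed exact split)."""
    ordered = sorted(ids)
    size = -(-len(ordered) // n)  # ceiling division
    return [ordered[k:k + size] for k in range(0, len(ordered), size)]


def min_max_stats(partitions):
    """Per-partition (min, max), standing in for per-file column statistics."""
    return [(min(p), max(p)) for p in partitions if p]


def files_to_scan(stats, target):
    """Count partitions whose [min, max] interval could contain the target."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)


# Synthetic ids: every 7th integer below 100,000 (illustrative only).
ids = list(range(0, 100_000, 7))
target = 35_000

hashed = files_to_scan(min_max_stats(hash_partition(ids, 20)), target)
ranged = files_to_scan(min_max_stats(range_partition(ids, 20)), target)
print("hash partitioning, files to scan: ", hashed)   # 20
print("range partitioning, files to scan:", ranged)   # 1
```

In this toy model every hash partition spans almost the full id range, so none can be skipped, while the range layout leaves a single candidate partition.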
      

      The test result:

      Mode                          Input Size  Records       Total Time  Duration  Prepare data Resource Allocation (MB-seconds)
      default                       959.2 GB    237624395522  11.2 h      1.3 min   6496280086
      DISTRIBUTE BY                 970.8 GB    244642791213  11.4 h      1.3 min   10536069846
      SORT BY                       456.3 GB    101587838784  5.4 h       31 s      8965158620
      DISTRIBUTE BY + SORT BY       219.0 GB    51723521593   3.3 h       54 s      12552656774
      RANGE PARTITION BY            38.5 GB     75355144      45 min      13 s      14525275297
      RANGE PARTITION BY + SORT BY  17.4 GB     14334724      45 min      12 s      16255296698
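
To make the trade-off in these numbers easier to compare, the following sketch computes, for each mode, how much less input and how many fewer records the query read relative to the default layout, and how much more resource the data preparation took. The figures are copied from the results above; `reduction_factors` is a hypothetical helper introduced only for this illustration.

```python
# (input size in GB, records read, prepare-data MB-seconds), copied from the results above.
results = {
    "default": (959.2, 237624395522, 6496280086),
    "DISTRIBUTE BY": (970.8, 244642791213, 10536069846),
    "SORT BY": (456.3, 101587838784, 8965158620),
    "DISTRIBUTE BY + SORT BY": (219.0, 51723521593, 12552656774),
    "RANGE PARTITION BY": (38.5, 75355144, 14525275297),
    "RANGE PARTITION BY + SORT BY": (17.4, 14334724, 16255296698),
}


def reduction_factors(rows, baseline="default"):
    """Return {mode: (input ratio, record ratio, prepare-cost ratio)} against the baseline."""
    base_gb, base_records, base_cost = rows[baseline]
    return {
        mode: (base_gb / gb, base_records / records, cost / base_cost)
        for mode, (gb, records, cost) in rows.items()
    }


factors = reduction_factors(results)
for mode, (size_x, records_x, cost_x) in factors.items():
    print(f"{mode:30s} input /{size_x:7.1f}  records /{records_x:9.1f}  prepare cost x{cost_x:4.2f}")
```

On these numbers, RANGE PARTITION BY + SORT BY reads roughly 55 times less input than the default layout, while preparing the data costs about 2.5 times as many MB-seconds.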

      Attachments

        1. DISTRIBUTE_BY_SORT_BY.png
          198 kB
          Yuming Wang
        2. RANGE_DISTRIBUTE_BY_SORT_BY.png
          200 kB
          Yuming Wang

          People

            Assignee: Unassigned
            Reporter: Yuming Wang (yumwang)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: