PHOENIX-1779: Parallelize fetching of next batch of records for scans corresponding to queries with no order by

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4.0
    • Labels: None

      Description

      Today, Phoenix parallelizes only the first execution of scans; that is, only the first batch of records, up to the scan's cache size, is loaded in parallel. Loading of subsequent batches by the scanners is essentially serial. This could be improved, especially for queries that do not need any kind of merge sort on the client, such as queries with no ORDER BY clause. It could also potentially improve the performance of UPSERT SELECT statements that read data from one table and insert it into another; one such use case is creating immutable indexes for tables that already have data. Finally, it could potentially improve the performance of our MapReduce solution for bulk loading data by speeding up the loading/mapping phase.
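      The idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `FakeScanner` whose `next_batch()` returns up to `cache_size` rows per call (neither name is a Phoenix or HBase API), and it is not meant to reflect the code in the attached patches. Each scanner gets a background worker that keeps fetching subsequent batches into a bounded queue, not just the first one. Because a query with no ORDER BY needs no client-side merge sort, the client simply concatenates each scanner's rows in scanner order.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence

_DONE = object()  # sentinel: one scanner's stream is finished


class FakeScanner:
    """Hypothetical stand-in for a region scanner (not a Phoenix or HBase API).

    next_batch() returns up to `cache_size` rows per call and sleeps briefly to
    mimic a round trip to the region server; an empty list means exhausted.
    """

    def __init__(self, rows: Sequence[int], cache_size: int, latency: float = 0.01):
        self._rows = list(rows)
        self._pos = 0
        self._cache_size = cache_size
        self._latency = latency

    def next_batch(self) -> List[int]:
        time.sleep(self._latency)  # simulated RPC latency
        batch = self._rows[self._pos:self._pos + self._cache_size]
        self._pos += len(batch)
        return batch


def _put(out: "queue.Queue", item: object, stop: threading.Event) -> bool:
    """Blocking put that gives up once the consumer has signalled stop."""
    while not stop.is_set():
        try:
            out.put(item, timeout=0.05)
            return True
        except queue.Full:
            continue
    return False


def _drain(scanner: FakeScanner, out: "queue.Queue", stop: threading.Event) -> None:
    """Worker: fetch every subsequent batch in the background, not only the first."""
    try:
        while not stop.is_set():
            batch = scanner.next_batch()
            if not batch or not _put(out, batch, stop):
                break
    except Exception as exc:  # hand the failure to the consuming thread
        _put(out, exc, stop)
    finally:
        _put(out, _DONE, stop)


def parallel_scan_no_order_by(scanners: Sequence[FakeScanner], prefetch: int = 2) -> Iterator[int]:
    """Yield all rows, with every scanner fetching its batches concurrently.

    Without an ORDER BY no client-side merge sort is needed, so each
    scanner's rows are concatenated in scanner order. `prefetch` bounds
    how many batches each scanner may buffer ahead of the client.
    """
    stop = threading.Event()
    queues = [queue.Queue(maxsize=prefetch) for _ in scanners]
    pool = ThreadPoolExecutor(max_workers=max(1, len(scanners)))  # one worker per scanner
    try:
        for scanner, q in zip(scanners, queues):
            pool.submit(_drain, scanner, q, stop)
        for q in queues:  # plain concatenation, no merge sort
            while True:
                item = q.get()
                if item is _DONE:
                    break
                if isinstance(item, Exception):
                    raise item
                yield from item
    finally:
        stop.set()  # lets workers exit if the client stops early (e.g. a LIMIT)
        pool.shutdown(wait=True)
```

      While the client processes rows from one scanner, the remaining round trips of all the other scanners proceed in the background, bounded by `prefetch` batches per scanner. A query with an ORDER BY would still need a client-side merge sort, which this simple concatenation skips.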

        Attachments

        1. wip.patch
          36 kB
          Samarth Jain
        2. wipwithsplits.patch
          70 kB
          Samarth Jain
        3. wip3.patch
          72 kB
          Samarth Jain
        4. PHOENIX-1779.patch
          58 kB
          Samarth Jain
        5. PHOENIX-1779_v2.patch
          62 kB
          Samarth Jain
        6. PHOENIX-1779_v3.patch
          60 kB
          Samarth Jain

              People

              • Assignee: Samarth Jain
                Reporter: Samarth Jain
              • Votes: 0
                Watchers: 7