[PHOENIX-1779] Parallelize fetching of next batch of records for scans corresponding to queries with no order by - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.4.0
Component/s: None
Labels:
None

Description

Today in Phoenix we parallelize the first execution of scans i.e. we load only the first batch of records up to the scan's cache size in parallel. Loading of subsequent batches of records in scanners is essentially serial. This could be improved especially for queries, including the ones with no order by clauses, that do not need any kind of merge sort on the client. This could also potentially improve the performance of UPSERT SELECT statements that load data from one table and insert into another. One such use case being creating immutable indexes for tables that already have data. It could also potentially improve the performance of our MapReduce solution for bulk loading data by improving the speed of the loading/mapping phase.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

PHOENIX-1779_v2.patch
15/Apr/15 17:38
62 kB
Samarth Jain
PHOENIX-1779_v3.patch
17/Apr/15 01:16
60 kB
Samarth Jain
PHOENIX-1779.patch
14/Apr/15 05:49
58 kB
Samarth Jain
wip.patch
26/Mar/15 19:22
36 kB
Samarth Jain
wip3.patch
07/Apr/15 10:47
72 kB
Samarth Jain
wipwithsplits.patch
03/Apr/15 20:46
70 kB
Samarth Jain

Issue Links

relates to

PHOENIX-1871 Optimize Union All by fetching in parallel batches of records for *all* the underlying scanners

Open

Activity

People

Assignee:: Samarth Jain

Reporter:: Samarth Jain

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 26/Mar/15 19:18

Updated:: 21/Nov/15 02:18

Resolved:: 17/Apr/15 19:48