[PHOENIX-412] Pipeline and buffer UPSERT SELECT to prevent writing results of SELECT to client - ASF JIRA

Details

Type: Task
Status: Resolved
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- enhancement

Old issue number: 281

Description

A SELECT without a LIMIT currently runs in parallel, buffering the results on the client side. This works well in the typical use case of a selective WHERE clause (since the scan runs in parallel), but not so well otherwise. For UPSERT SELECT, a typical use case is creating a new table from an existing one. Often no WHERE clause is present, so the entire table being selected gets written to the client machine, which is obviously bad.

With secondary indexing coming soon, and given that we use UPSERT SELECT to initially populate the index table, we should optimize this by doing the following:

- Modify ParallelIterators so that it can be given a factory that creates the SpoolingResultIterator.
- In the case of UPSERT SELECT, create a spooling iterator that buffers the results into a MutationState (see the existing code at UpsertCompiler:359 for an UPSERT SELECT run on the client side).
- When the MutationState reaches the batch size limit, commit the batch (again, as is done in UpsertCompiler) and clear the MutationState.

This will perform much better, since the client only ever holds one batch of rows rather than the entire result set. We can probably just move the UpsertCompiler code for this case into the new spooling iterator implementation.
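
The buffer-commit-clear cycle described above can be sketched roughly as follows. This is only an illustration under assumptions, not the Phoenix implementation: `BatchingBuffer`, its `committer` callback, and `drain` are hypothetical names standing in for the MutationState, the commit call, and the new spooling iterator; none of them are Phoenix APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a MutationState-like buffer; not a Phoenix class.
final class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> committer; // assumed commit hook, e.g. a write to the target table
    private final List<T> pending = new ArrayList<>();

    BatchingBuffer(int batchSize, Consumer<List<T>> committer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.committer = committer;
    }

    // Buffer one row; commit and clear as soon as the batch size limit is reached.
    void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Commit whatever is buffered (if anything) and clear the buffer.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        committer.accept(new ArrayList<>(pending)); // hand the committer its own copy
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    // Drain a scan's rows through the buffer, committing the final partial batch at the end.
    static <T> void drain(Iterator<T> rows, BatchingBuffer<T> buffer) {
        while (rows.hasNext()) {
            buffer.add(rows.next());
        }
        buffer.flush();
    }
}
```

With a batch size of 3 and seven rows, the committer would be called with batches of 3, 3, and 1 rows, so at most three rows are held in memory at any time. In the proposal, a factory passed to ParallelIterators would supply one such consumer per scan in place of the default client-side spooling.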

People

Assignee: James R. Taylor

Reporter: James R. Taylor

Votes: 0

Watchers: 1

Dates

Created: 10/Mar/14 23:47

Updated: 16/Mar/14 07:18

Resolved: 16/Mar/14 07:18