Phoenix / PHOENIX-412

Pipeline and buffer UPSERT SELECT to prevent writing results of SELECT to client



    • Task
    • Status: Resolved
    • Resolution: Fixed
    • 281


      A non-limited SELECT currently runs in parallel, buffering the results on the client side. This works well in the typical use case of a selective WHERE clause (since the scan runs in parallel and returns few rows), but not so well otherwise. For UPSERT SELECT, a typical use case would be to create a new table based on an existing table. Often no WHERE clause will be present, so the entire table being selected is written to the client machine, which does not scale for large tables.

      With secondary indexing coming soon, and given that we use UPSERT SELECT to initially populate the index table, we should optimize this by doing the following:

      • Modify ParallelIterators to be able to provide a factory to create the SpoolingResultIterator
      • In the case of UPSERT SELECT, create a spooling iterator that buffers the results into a MutationState (see existing code in UpsertCompiler:359 for upsert select run on client-side)
      • When the MutationState reaches the batch size limit, commit the batch (again as is done in UpsertCompiler) and clear the MutationState

      This should perform much better, since the client never holds more than one batch of results at a time. We can probably just move the UpsertCompiler code for this case into the new spooling iterator implementation.
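      To make the batching idea concrete, here is a minimal sketch of the buffer, commit, and clear loop described in the bullets. It is not Phoenix code: BatchingUpsertSpooler, its committer callback, and the use of a plain map in place of MutationState are all hypothetical stand-ins chosen for illustration, and rows are simplified to string key/value pairs.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for the proposed spooling iterator; not Phoenix's actual API. */
final class BatchingUpsertSpooler {
    // Plays the role of MutationState (assumption): pending row key -> row value.
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final int batchSize;
    // Plays the role of the commit step (assumption): receives one batch at a time.
    private final Consumer<Map<String, String>> committer;
    private long rowsCommitted = 0;

    BatchingUpsertSpooler(int batchSize, Consumer<Map<String, String>> committer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.committer = committer;
    }

    /** Drains the scan, committing whenever the buffer reaches batchSize. */
    long spool(Iterator<Map.Entry<String, String>> scan) {
        while (scan.hasNext()) {
            Map.Entry<String, String> row = scan.next();
            pending.put(row.getKey(), row.getValue());
            if (pending.size() >= batchSize) {
                flush();
            }
        }
        flush(); // commit the final, possibly partial, batch
        return rowsCommitted;
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand the committer a copy so clearing the buffer cannot affect it.
        committer.accept(new LinkedHashMap<>(pending));
        rowsCommitted += pending.size();
        pending.clear();
    }
}
```

      With a batch size of 3 and a 7-row scan, the committer would be called three times (3, 3, and 1 rows), and at no point would more than 3 rows sit in the buffer. In the real change the buffer would be a MutationState and the commit would go through the same path UpsertCompiler already uses.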




            jamestaylor James R. Taylor