Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • sql

    Description

      Currently, the correlated join implementation has a number of performance problems:

      1. Opening a cursor over the store is quite expensive. Since a table is split into partitions, the actual number of lookups must be multiplied by the number of partitions. The cost function should take this into account.
      2. Integration with storage (not strictly a problem of this particular correlated join implementation, but it affects it indirectly): every storage lookup schedules a task in a different thread pool, and when the scan result is ready, another task is scheduled in the SQL query task executor. Since only one correlate is processed at a time, we currently schedule `partCount * 2` tasks for every row from the left side of the join. This is very inefficient for single-row lookups into a small table on the right side: significantly more time is spent on task coordination than on the actual work.
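      To make the per-row overhead from items 1 and 2 concrete, here is a minimal Java sketch of how a cost estimate could scale with the partition count. The class name, method names, and cost parameters are hypothetical and do not correspond to Ignite's actual planner or cost API; the sketch only encodes the counts stated above: one cursor open per partition for every left-side row, and two scheduled tasks (storage pool, then SQL query executor) per lookup.

```java
/**
 * Hypothetical back-of-the-envelope model of correlated join overhead.
 * Names and cost parameters are illustrative assumptions, not Ignite's API.
 */
public final class CorrelatedJoinOverhead {
    private CorrelatedJoinOverhead() {
    }

    /** Each left-side row opens one cursor per partition of the right-side table. */
    static long cursorOpens(long leftRows, int partCount) {
        return leftRows * partCount;
    }

    /** Each per-partition lookup costs two task hand-offs: storage pool, then SQL executor. */
    static long scheduledTasks(long leftRows, int partCount) {
        return leftRows * partCount * 2L;
    }

    /**
     * Partition-aware overhead estimate: scales both the cursor-open cost and the
     * task-scheduling cost by the partition count instead of assuming one lookup per row.
     */
    static double estimateCost(long leftRows, int partCount, double cursorOpenCost, double taskCost) {
        return cursorOpens(leftRows, partCount) * cursorOpenCost
            + scheduledTasks(leftRows, partCount) * taskCost;
    }

    public static void main(String[] args) {
        long leftRows = 1_000;
        int partCount = 25;

        System.out.println("cursor opens: " + cursorOpens(leftRows, partCount));
        System.out.println("scheduled tasks: " + scheduledTasks(leftRows, partCount));
    }
}
```

      For example, with 1,000 left-side rows and 25 partitions, this model gives 25,000 cursor opens and 50,000 scheduled tasks, whereas a partition-unaware estimate would assume only 1,000 lookups.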

      We need to improve the performance of correlated join in general, or at least identify the cases where it performs better than other join types and enable correlated join only for those cases.
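      One way to phrase the "enable only where it wins" condition, under the simplifying assumption that the per-row overhead is dominated by cursor opens and task hand-offs, is the inequality below. The symbols are illustrative and not taken from Ignite's cost model: $L$ is the estimated number of left-side rows, $P$ the number of partitions of the right-side table, $c_{\text{open}}$ the cost of opening one cursor, $c_{\text{task}}$ the cost of scheduling one task, $C_{\text{rows}}$ the cost of processing the rows actually returned by the lookups, and $C_{\text{alt}}$ the estimated cost of the best alternative join.

```latex
C_{\text{corr}} = L \cdot P \cdot \left( c_{\text{open}} + 2\, c_{\text{task}} \right) + C_{\text{rows}} \;<\; C_{\text{alt}}
```

      The factor $L \cdot P$ reflects item 1 (one lookup per partition per left-side row), and the factor $2$ reflects item 2 (two scheduled tasks per lookup).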

            People

              Assignee: Unassigned
              Reporter: Konstantin Orlov (korlov)
              Votes: 0
              Watchers: 1
