Apache Drill / DRILL-6949

Query fails with "UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition the inner data any further" when semi join is enabled


Details

    Description

      The following query fails on TPC-H SF100 data with: UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition the inner data any further (probably due to too many join-key duplicates).

      
      set `exec.hashjoin.enable.runtime_filter` = true;
      set `exec.hashjoin.runtime_filter.max.waiting.time` = 10000;
      set `planner.enable_broadcast_join` = false;
      
      
      select
       count(*)
      from
       lineitem l1
      where
       l1.l_discount IN (
         select
           distinct(cast(l2.l_discount as double))
         from
           lineitem l2);
      
      reset `exec.hashjoin.enable.runtime_filter`;
      reset `exec.hashjoin.runtime_filter.max.waiting.time`;
      reset `planner.enable_broadcast_join`;
      
      

      The subquery uses DISTINCT, so its result should contain no duplicate values and the join keys on the inner side should be unique.
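For comparison, the query below is a hedged diagnostic (not part of the original report) that shows how many duplicates the inner side would hold if the rows of `lineitem l2` reached the hash join without the DISTINCT being applied first. The aliases `total_rows` and `distinct_keys` are chosen here for illustration. Per the TPC-H specification, `l_discount` ranges from 0.00 to 0.10, so only a small number of distinct keys is expected against a very large row count at SF100.

```sql
-- Compare the number of inner-side rows with the number of distinct join-key values.
select
  count(*) as total_rows,
  count(distinct cast(l2.l_discount as double)) as distinct_keys
from
  lineitem l2;
```

A large ratio of `total_rows` to `distinct_keys` would match the "too many join-key duplicates" hint in the error, but only if the build side actually receives undeduplicated rows, which this report does not establish.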

      The failure appears to be caused by the semi join: the query succeeds when semi join is explicitly disabled.
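One way to test this suspicion is to compare the physical plans with semi join enabled and disabled, using Drill's `EXPLAIN PLAN FOR`. The sketch below reuses the session options from the reproducer; the option that disables semi join is not named in this report, so it is left out here. In each plan, check whether an aggregate for the DISTINCT still sits below the inner (build) side of the hash join. Its absence in the semi-join plan would be consistent with the duplicate-keys error, though that is an assumption, not a confirmed cause.

```sql
set `exec.hashjoin.enable.runtime_filter` = true;
set `exec.hashjoin.runtime_filter.max.waiting.time` = 10000;
set `planner.enable_broadcast_join` = false;

-- Print the physical plan instead of executing the query.
explain plan for
select
  count(*)
from
  lineitem l1
where
  l1.l_discount in (
    select
      distinct(cast(l2.l_discount as double))
    from
      lineitem l2);

reset `exec.hashjoin.enable.runtime_filter`;
reset `exec.hashjoin.runtime_filter.max.waiting.time`;
reset `planner.enable_broadcast_join`;
```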
       

       


          People

            ben-zvi Boaz Ben-Zvi
            aravi5 Abhishek Ravi
