IMPALA-8026

Actual row counts for nested loop join are way too high while the query is executing


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Backend
    • Labels: None

    Description

      Consider this extract from a query plan:

      Operator                      #Rows  Est. #Rows
      --------------------------------------------------------------
      …
      |  10:HASH JOIN               9.53M      18.14K 
      |  |--19:EXCHANGE                 1           1
      |  |  00:SCAN HDFS                1           1
      |  06:NESTED LOOP JOIN        4.88B     863.84K 
      |  |--18:EXCHANGE                 1           1
      |  |  04:SCAN HDFS                1           1
      |  05:HASH JOIN               9.53M     863.84K
      

      If the above is to be believed, the 06 nested loop join produced 4.88 billion rows. That cannot be right: joining 1 row with about 9.53 million rows cannot produce roughly 512 times that number of rows.

      It appears that the nested loop join actually processed and returned 9.53 million rows. That is the count produced by the 05 hash join that feeds it, and also the count produced by the 10 hash join, which joins a single row with the output of the nested loop join.
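      The arithmetic behind this check can be sketched in a few lines of Python. This is only an illustration built from the counts quoted in the plan extract above: the helper names (parse_count, max_join_rows) are hypothetical, and the bound assumes node 06 is an inner or cross join, which the extract does not state.

```python
# Sanity check for the row counts quoted in the plan extract.
# Assumption: node 06 is an inner or cross join, so its output cannot
# exceed (probe-side rows) * (build-side rows).

SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_count(text: str) -> float:
    """Convert an abbreviated count such as '9.53M' to a number."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def max_join_rows(left_rows: float, right_rows: float) -> float:
    """Upper bound on inner/cross join output: the full cross product."""
    return left_rows * right_rows


probe_rows = parse_count("9.53M")  # #Rows of 05:HASH JOIN, the input to node 06
build_rows = parse_count("1")      # #Rows of 18:EXCHANGE
reported = parse_count("4.88B")    # #Rows shown for 06:NESTED LOOP JOIN

bound = max_join_rows(probe_rows, build_rows)
ratio = reported / bound
exceeds_bound = reported > bound

print(f"upper bound: {bound:,.0f} rows")
print(f"reported:    {reported:,.0f} rows ({ratio:.0f}x the bound)")
print(f"reported count exceeds the bound: {exceeds_bound}")
```

      With the values from the extract, the reported count exceeds the bound by a factor of roughly 512, so it cannot be a real output cardinality under the stated assumption.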

      Because this same bogus result appears across multiple plans, it is likely that the reported actual row count for the nested loop join is completely wrong and bears no relation to the number of rows actually returned.

          People

            Assignee: Tim Armstrong (tarmstrong)
            Reporter: Paul Rogers (Paul.Rogers)