Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7687

Inaccurate memory estimates in hash join

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.15.0
    • None
    • None
    • None

    Description

      See DRILL-7675. In this ticket, we tried to reproduce an OOM case in the partition sender. In so doing, we mucked with various parallelization options. The query has 2 MB of data, but at one point the query would fail to run because the hash join could not obtain enough memory (on a system with 8 GB of memory available.)

      The problem is that the memory calculator sees a worst-case scenario: a row with 250+ columns. The hash join estimated it needed something like 650MB of memory to perform the join. (That is 650 MB per fragment, and there were multiple fragments.) Since there was insufficient memory, and the drill.exec.hashjoin.fallback.enabled option was disabled, the hash join failed before it even started.

      Better would be to at least try the query. In this case, with 2MB of data, the query succeeds. (Had to enable the magic option to do so.)

      Better also would be to use the estimated row counts when estimating memory use. Maybe better estimates for the amount of memory needed per row. (The data in question has multiple nested map arrays, causing cardinality estimates to grow by 5x at each level.)

      Perhaps use the "batch sizing" mechanism to detect actual memory use by analyzing the incoming batch.

      There is no obvious answer. However, the goal is clear: the query should succeed if the actual memory needed fits within that available; we should not fail proactively based on estimates of needed memory. (This what the drill.exec.hashjoin.fallback.enabled option does; perhaps it should be on by default.)

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            Paul.Rogers Paul Rogers

            Dates

              Created:
              Updated:

              Slack

                Issue deployment