  1. Apache Drill
  2. DRILL-1504

Enabling fragment memory limit causes out of memory error

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: Execution - Flow
    • Labels: None

    Description

      When the fragment memory limit is enabled, a query with a large number of fragments hits the limit after it has been run a few times.
      There appear to be two problems: 1) at the end of the query, the drillbit does not reset the fragment limit to its value before the query ran, and 2) the fragment limit is smaller than expected.

      The cause seems to be the following:

      When a drillbit receives a request for a fragmentRecordBatch, each BitServer thread creates its own NonRootFragmentManager object for the FragmentHandle. Only one of these NonRootFragmentManager objects is actually used; the others are discarded and garbage collected.
      However, when the fragment memory limit is enabled, the Allocator belonging to each of these NonRootFragmentManager objects registers its FragmentContext with the top level allocator, which then uses this information to recalculate the fragment limit.
      This has two effects: 1) the top level allocator counts more fragments than actually exist, because it counts each fragment multiple times, and 2) the top level allocator keeps a reference to each fragment context, which prevents the object from being garbage collected. Worse, since no code actually 'closes' these fragment contexts, they remain registered with the top level allocator across queries, eventually causing an out of memory condition.
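
      The accounting problem described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Drill's code and not the attached patch: the class names NaiveAllocator and DedupAllocator, the string handles, and the rule that the limit is the total budget divided evenly by the number of registered fragments are all hypothetical. The first class counts every registration and never releases one; the second counts each handle once and drops it on close.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the accounting problem; none of these types are Drill classes.
class FragmentLimitSketch {

    // Buggy variant: every registration is counted and strongly referenced forever.
    static class NaiveAllocator {
        private final long totalBytes;
        private final List<Object> contexts = new ArrayList<>();

        NaiveAllocator(long totalBytes) { this.totalBytes = totalBytes; }

        // The handle is ignored, so duplicates for one fragment are all counted.
        void register(String handle, Object context) { contexts.add(context); }

        int registered() { return contexts.size(); }

        // Assumed formula: split the budget evenly across registered fragments.
        long fragmentLimit() { return totalBytes / Math.max(1, contexts.size()); }
    }

    // Variant that counts each handle once and releases it on close.
    static class DedupAllocator {
        private final long totalBytes;
        private final Map<String, Object> contexts = new HashMap<>();

        DedupAllocator(long totalBytes) { this.totalBytes = totalBytes; }

        // Returns false when the handle is already registered.
        boolean register(String handle, Object context) {
            return contexts.putIfAbsent(handle, context) == null;
        }

        // Drops the reference so the context can be garbage collected.
        void close(String handle) { contexts.remove(handle); }

        int registered() { return contexts.size(); }

        // Same assumed formula as above.
        long fragmentLimit() { return totalBytes / Math.max(1, contexts.size()); }
    }

    public static void main(String[] args) {
        long budget = 1_000L;
        NaiveAllocator naive = new NaiveAllocator(budget);
        DedupAllocator dedup = new DedupAllocator(budget);

        // Two queries, two fragments each, three server threads registering per fragment.
        for (int query = 0; query < 2; query++) {
            for (int fragment = 0; fragment < 2; fragment++) {
                String handle = "q" + query + "-f" + fragment;
                for (int thread = 0; thread < 3; thread++) {
                    naive.register(handle, new Object());
                    dedup.register(handle, new Object());
                }
            }
            // End of query: only the dedup variant has a way to release its fragments.
            for (int fragment = 0; fragment < 2; fragment++) {
                dedup.close("q" + query + "-f" + fragment);
            }
        }
        System.out.println("naive: " + naive.registered() + " contexts, limit " + naive.fragmentLimit());
        System.out.println("dedup: " + dedup.registered() + " contexts, limit " + dedup.fragmentLimit());
    }
}
```

      In this model the naive variant ends with 12 registered contexts for 4 real fragments and a per-fragment limit of 83 out of 1000, and the count keeps growing with each further query. The deduplicating variant returns to 0 registered contexts and the full limit once each query's fragments are closed.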

      Attachments

        1. DRILL-1504-1-patch.diff
          10 kB
          Parth Chandra

            People

              Assignee: Unassigned
              Reporter: Parth Chandra (parthc)
              Votes: 0
              Watchers: 5
