Derby
DERBY-1259

Optimizer plan consideration doesn't account for infinite cost estimates and can therefore choose plans requiring excessive memory.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.1.3.1, 10.2.1.6
    • Fix Version/s: None
    • Component/s: SQL
    • Environment:
      Windows 2000, ibm142
    • Urgency:
      Normal
    • Bug behavior facts:
      Performance

      Description

      When deciding whether or not to choose a particular access path as the "best one so far", the optimizer tries to guess what the memory usage for the path will be and, if it's prohibitive, it (the optimizer) will reject the plan. Currently this only applies to hash joins in Derby.

      The call that starts the check for excessive memory exists in two places in OptimizerImpl.java: costBasedCostOptimizable() and considerCost(). There we have the following:

      /*
      ** Skip this access path if it takes too much memory.
      **
      ** NOTE: The default assumption here is that the number of rows in
      ** a single scan is the total number of rows divided by the number
      ** of outer rows.  The optimizable may over-ride this assumption.
      */
      if( ! optimizable.memoryUsageOK(
                estimatedCost.rowCount() / outerCost.rowCount(),
                maxMemoryPerTable))
      {
          if (optimizerTrace)
          {
              trace(SKIPPING_DUE_TO_EXCESS_MEMORY, 0, 0, 0.0, null);
          }
          return;
      }

      However, if the outerCost has an estimated row count of Double.POSITIVE_INFINITY, which can happen if the query is very deeply nested and/or has a lot of FROM tables/expressions, the division of estimatedCost.rowCount() by outerCost.rowCount() will result in a "NaN" value. If that value is then passed to FromTable (which is the base implementation of an Optimizable), the memoryUsageOK method looks like this:

      public boolean memoryUsageOK( double rowCount, int maxMemoryPerTable)
          throws StandardException
      {
          /*
          ** Don't enforce maximum memory usage for a user-specified join
          ** strategy.
          */
          if( userSpecifiedJoinStrategy != null)
              return true;

          int intRowCount = (rowCount > Integer.MAX_VALUE) ?
                                Integer.MAX_VALUE : (int) rowCount;

          return intRowCount <= maxCapacity(
              getCurrentAccessPath().getJoinStrategy(), maxMemoryPerTable);
      }

      If rowCount is "NaN", the comparison to see whether it is greater than Integer.MAX_VALUE returns false, so intRowCount gets set to "(int) rowCount". But casting rowCount, which is a double holding "NaN", to an int yields 0. The final check then becomes "0 <= maxCapacity(...)", which always returns true. Thus, regardless of the estimated cost for the optimizable, the "memoryUsageOK" check always returns true when the outer cost is infinity, and the optimizer could very well choose a path that it should have rejected because of excessive memory requirements (where "should" means based on the estimates; the accuracy of the estimates in this case is another issue altogether).
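      For reference, the language behavior behind this is easy to reproduce outside the optimizer. The standalone sketch below uses made-up values (both row-count estimates saturated to infinity, which is how the division produces NaN); it only illustrates the Java double semantics, not Derby code:

      // Standalone illustration only; the values are hypothetical.
      public class NaNRowCountDemo
      {
          public static void main(String[] args)
          {
              // If both estimates have overflowed to infinity, the per-scan
              // estimate becomes NaN.
              double estimatedRows = Double.POSITIVE_INFINITY;
              double outerRows = Double.POSITIVE_INFINITY;
              double rowCount = estimatedRows / outerRows;

              System.out.println(rowCount);                     // NaN
              System.out.println(rowCount > Integer.MAX_VALUE); // false: comparisons with NaN are false
              System.out.println((int) rowCount);               // 0: casting NaN to int yields 0
              // So the final check degenerates to "0 <= maxCapacity(...)",
              // which is always true.
          }
      }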

      That said, I went in and made a small change to the above code to cause the optimizer to reject a plan if its cost was infinity, and the result was that some queries (especially those seen in DERBY-1205) actually ended up running more slowly. The reason is that "infinity" is obviously not an accurate cost estimate for the hash joins, and in the case of DERBY-1205 the hash joins, while expensive, still end up being cheaper than nested loop joins. So the result of "fixing" the logic with a small change was that the queries ran more slowly. More investigation is required into how best to approach this.
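      The experimental change itself is not attached to this issue. Purely as an illustration of the kind of guard described (reject a path whose per-scan estimate is unusable), a hypothetical helper might look like the sketch below; the name is invented here, and as noted above such a blunt rejection can make queries like those in DERBY-1205 run more slowly:

      /*
      ** Hypothetical sketch only (not the experimental patch referred to
      ** above). A caller could test the per-scan estimate before handing it
      ** to memoryUsageOK, and skip the access path when the estimate is NaN
      ** or infinite.
      */
      static boolean perScanEstimateUsable(double estimatedRowCount, double outerRowCount)
      {
          double perScanRows = estimatedRowCount / outerRowCount;
          // NaN or infinity means the estimate carries no usable information.
          return !Double.isNaN(perScanRows) && !Double.isInfinite(perScanRows);
      }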

        Activity

        Mike Matrigali added a comment -

        At least in the 10.2 development line, I believe we should fix this issue and then address the fallout as
        necessary. It seems reasonable not to change the query performance in a stable release like 10.1.3, so
        I am ok with leaving the logic there until we can figure out a larger fix. Even if it means some queries
        will go slower, hopefully other queries will go faster when the code does what it was intended to do.
        Counting on leaving a bug in the system to make up for another
        problem in the system which produced a bad estimate is just going to lead us down a path where no one
        can understand why the optimizer picks the plan it does.

        The history of hash joins is as follows. The code has only ever expected to use them when the number/size
        of rows could reasonably be expected to fit in memory. The costing the optimizer uses only ever assumes
        in-memory Java hash tables, and the costing the store returns for hash scans only accounts for in-memory
        hash tables. The cost assumption is that probes into the in-memory hash table are basically free once the
        cost to build it has been paid. As described above, the optimizer code estimated the hash table size and
        would reject plans it estimated would not fit into memory. For a long time all rows in a hash scan would
        go into memory even if the optimizer estimate was way low, which in some cases would result in running
        out of memory.

        Recently a change was made to overflow the hash tables to disk. At that time the intent of the change was
        to handle the problem where the optimizer picked wrong, but to just go slower rather than get an error.
        The assumption was that it was still an error case rather than a normal expected path, so no costing was
        added for the overflow of the hash table to disk. Note that in the worst case of an extremely large result
        set, the cost of each probe into an overflowed hash table may be one synchronous I/O per row (definitely
        far from "free").

        If we really think it is a good idea to allow overflowed hash tables in normal query processing, then we
        should add real costing of such a beast. Do note that providing the real costing is not going to help the
        query slowdown above, as the real problem is that the original row estimate is bad.

        My opinion is that there are better query processing options in cases where we expect the dataset to be
        much larger than what reasonably fits in memory, my favorite being some sort of sort/merge, especially
        when indexes already exist on the two join keys, giving you favorable I/O clustering on the data set. In
        reality, what you are doing with "overflow" hash tables is creating an on-the-fly, disk-based index for
        query processing. In that case I think it might be better to just go ahead and either use an existing
        sorted set (an existing index) or create a sorted result set for sort merge (throw it to the existing
        sorter).


          People

          • Assignee:
            Unassigned
          • Reporter:
            A B
          • Votes:
            4
          • Watchers:
            1
