Derby
  1. Derby
  2. DERBY-4116

SYSCS_UTIL.SYSCS_UPDATE_STATISTICS should update the store estimated row count for the table

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.6.1.0
    • Fix Version/s: 10.5.1.1, 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      High Value Fix
    1. derby-4116_diff3.txt
      12 kB
      Kathey Marsden
    2. derby-4116_diff2.txt
      11 kB
      Kathey Marsden
    3. derby-4116_diff.txt
      10 kB
      Kathey Marsden

      Issue Links

        Activity

        Hide
        Kathey Marsden added a comment -

        This issue blocks DERBY-3955. The test won't be able to be converted until this issue is resolved.

        Show
        Kathey Marsden added a comment - This issue blocks DERBY-3955 . The test won't be able to be converted until this issue is resolved.
        Hide
        Kathey Marsden added a comment -

        Well just to see if this was a really easy fix, so I could continue my testing, I tried the trivial patch below:

        — java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java (revision 757498)
        +++ java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java (working copy)
        @@ -744,6 +744,7 @@
        rowBufferArray[GROUP_FETCH_SIZE - 1] = lastUniqueKey;
        lastUniqueKey = tmp;
        } // while
        + gsc.setEstimatedRowCount(numRows);
        } // try
        finally
        {

        but it doesn't affect my test case. I verified that with this change the code goes through
        OpenBTree.setEstimatedRowCount and sets the count to 4000, but there is this comment in OpenBTree.setEstimatedRowCount():

        • This call is currently only supported on Heap conglomerates, it
        • will throw an exception if called on btree conglomerates.
          *
          It doesn't throw an exception, but it doesn't seem to do too much either. Any ideas what's wrong?
        Show
        Kathey Marsden added a comment - Well just to see if this was a really easy fix, so I could continue my testing, I tried the trivial patch below: — java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java (revision 757498) +++ java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java (working copy) @@ -744,6 +744,7 @@ rowBufferArray [GROUP_FETCH_SIZE - 1] = lastUniqueKey; lastUniqueKey = tmp; } // while + gsc.setEstimatedRowCount(numRows); } // try finally { but it doesn't affect my test case. I verified that with this change the code goes through OpenBTree.setEstimatedRowCount and sets the count to 4000, but there is this comment in OpenBTree.setEstimatedRowCount(): This call is currently only supported on Heap conglomerates, it will throw an exception if called on btree conglomerates. * It doesn't throw an exception, but it doesn't seem to do too much either. Any ideas what's wrong?
        Hide
        Knut Anders Hatlen added a comment -

        I guess it's only implemented for heap conglomerates because the row count should really be the same in the base table and all its indexes, and there's no need to store the estimated row count for a table more than one place. The optimizer will probably always look for the estimate in the heap conglomerate, so it doesn't notice the estimate in the index.

        Does it work as expected if you do something like this near the end of the method?

        ScanController heapSC = tc.openScan(td.getHeapConglomerateId(), ...);
        heapSC.setEstimatedRowCount(rowCount);
        heapSC.close();

        Show
        Knut Anders Hatlen added a comment - I guess it's only implemented for heap conglomerates because the row count should really be the same in the base table and all its indexes, and there's no need to store the estimated row count for a table more than one place. The optimizer will probably always look for the estimate in the heap conglomerate, so it doesn't notice the estimate in the index. Does it work as expected if you do something like this near the end of the method? ScanController heapSC = tc.openScan(td.getHeapConglomerateId(), ...); heapSC.setEstimatedRowCount(rowCount); heapSC.close();
        Hide
        Kathey Marsden added a comment -

        Thanks Knut, that seemed to do the trick. I will make a proper patch for the issue and do testing. I think I'll keep the index estimatedRowCount update since it does seem to be being set, just in case someone is using it for something.

        Show
        Kathey Marsden added a comment - Thanks Knut, that seemed to do the trick. I will make a proper patch for the issue and do testing. I think I'll keep the index estimatedRowCount update since it does seem to be being set, just in case someone is using it for something.
        Hide
        Kathey Marsden added a comment -

        Attached is a patch for this issue. derby-4116_diff.txt. It also included the initial partial conversion of selectivity.sql DERBY-3955 which tests this issue. suites.All and derbyall passed except for DERBY-3993, a known issue.

        Please review. Especially, please check my arguments to openScan to make sure I got that right.

        Show
        Kathey Marsden added a comment - Attached is a patch for this issue. derby-4116_diff.txt. It also included the initial partial conversion of selectivity.sql DERBY-3955 which tests this issue. suites.All and derbyall passed except for DERBY-3993 , a known issue. Please review. Especially, please check my arguments to openScan to make sure I got that right.
        Hide
        Knut Anders Hatlen added a comment -

        The arguments to openScan() look correct to me. Since we don't actually scan the conglomerate, many of them wouldn't make any difference anyway, I think.

        Now we update the heap conglomerate inside the for loop, which means that we'll update it once per index. Would it be enough to update it once, after we have gone through the last index?

        Nit: openScan() should be called outside the try block (if it fails, there's no open scan to close). If it's moved out of the try block, the check for heapSC!=null in the finally block can be removed too.

        Show
        Knut Anders Hatlen added a comment - The arguments to openScan() look correct to me. Since we don't actually scan the conglomerate, many of them wouldn't make any difference anyway, I think. Now we update the heap conglomerate inside the for loop, which means that we'll update it once per index. Would it be enough to update it once, after we have gone through the last index? Nit: openScan() should be called outside the try block (if it fails, there's no open scan to close). If it's moved out of the try block, the check for heapSC!=null in the finally block can be removed too.
        Hide
        Kathey Marsden added a comment -

        Thanks Knut for reviewing the changes, especially catching that I was doing the update inside the loop. I will make the changes and post a new patch soon.

        Show
        Kathey Marsden added a comment - Thanks Knut for reviewing the changes, especially catching that I was doing the update inside the loop. I will make the changes and post a new patch soon.
        Hide
        Kathey Marsden added a comment -

        Here is an updated patch for this issue. I am rerunning tests and will check in later today if all goes well.

        Show
        Kathey Marsden added a comment - Here is an updated patch for this issue. I am rerunning tests and will check in later today if all goes well.
        Hide
        Knut Anders Hatlen added a comment -

        With the latest patch, I think the row count estimate will always be set to 0 if you call SYSCS_UPDATE_STATISTICS on a table with no indexes, even if the table is not empty. Would it be possible to initialize rowCount to -1, so that we can detect that the row count was not calculated, and then add "if (rowCount >= 0)" around the call to setEstimatedRowCount()?

        Show
        Knut Anders Hatlen added a comment - With the latest patch, I think the row count estimate will always be set to 0 if you call SYSCS_UPDATE_STATISTICS on a table with no indexes, even if the table is not empty. Would it be possible to initialize rowCount to -1, so that we can detect that the row count was not calculated, and then add "if (rowCount >= 0)" around the call to setEstimatedRowCount()?
        Hide
        Kathey Marsden added a comment -

        Unchecking patch available. I will make a new patch with Knut's suggestion., but it is looking like that won't happen today. Thanks again Knut for your valuable input.

        Show
        Kathey Marsden added a comment - Unchecking patch available. I will make a new patch with Knut's suggestion., but it is looking like that won't happen today. Thanks again Knut for your valuable input.
        Hide
        Kathey Marsden added a comment -

        Attached is an updated patch for this issue. This patch will not update the store estimated row count if no index scan was performed.

        I tried to add a test to UpdateStatisticsTest to verify this, but ran into some trouble because the estimated row counts we get back from runtime statistics do not return this information directly. I did manually verify that we do not go thre setEstimatedRowCount for update statistics on a table with no index.

        Regarding the use of internal interfaces in checkEstimatedRowCount, Myrna suggested that I clobber and build and make sure there were no build order issues and also that I make sure derbyTesting.jar didn't pick up the product classes. I checked both and the change seems ok from a build perspective.

        I ran SelectivityTest and UpdateStatisticsTest and am running tests now but don't anticipate any problems since they don't exercise this code. Please review.

        Show
        Kathey Marsden added a comment - Attached is an updated patch for this issue. This patch will not update the store estimated row count if no index scan was performed. I tried to add a test to UpdateStatisticsTest to verify this, but ran into some trouble because the estimated row counts we get back from runtime statistics do not return this information directly. I did manually verify that we do not go thre setEstimatedRowCount for update statistics on a table with no index. Regarding the use of internal interfaces in checkEstimatedRowCount, Myrna suggested that I clobber and build and make sure there were no build order issues and also that I make sure derbyTesting.jar didn't pick up the product classes. I checked both and the change seems ok from a build perspective. I ran SelectivityTest and UpdateStatisticsTest and am running tests now but don't anticipate any problems since they don't exercise this code. Please review.
        Hide
        Kathey Marsden added a comment -

        The patch derby-4116_diff3.txt is the current patch as of this comment. Somehow it did not get attached with my last attempt.

        Show
        Kathey Marsden added a comment - The patch derby-4116_diff3.txt is the current patch as of this comment. Somehow it did not get attached with my last attempt.
        Hide
        Mike Matrigali added a comment -

        patch looks good to me. The only thing I noticed is that the current index scans use an open for read (argument 0), but your new heap open does an open for update. It seems like both
        opens should use the same - either read or write, and since the existing scans and update of the estimate seems to work it seems better to open the heap for read also.

        Show
        Mike Matrigali added a comment - patch looks good to me. The only thing I noticed is that the current index scans use an open for read (argument 0), but your new heap open does an open for update. It seems like both opens should use the same - either read or write, and since the existing scans and update of the estimate seems to work it seems better to open the heap for read also.
        Hide
        Knut Anders Hatlen added a comment -

        Patch 3 looks good to me too. Thanks for all the work.

        Show
        Knut Anders Hatlen added a comment - Patch 3 looks good to me too. Thanks for all the work.
        Hide
        Kathey Marsden added a comment -

        I made the change recommended by Mike and committed revision 760497 to trunk. I will wait for clean nightly results and backport to 10.5. Meanwhile I'll leave this issue open. Thanks for the reviews!

        Show
        Kathey Marsden added a comment - I made the change recommended by Mike and committed revision 760497 to trunk. I will wait for clean nightly results and backport to 10.5. Meanwhile I'll leave this issue open. Thanks for the reviews!

          People

          • Assignee:
            Kathey Marsden
            Reporter:
            Kathey Marsden
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development