Derby / DERBY-4771

Continue investigation of automatic creation/update of index statistics

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.1.2
    • Fix Version/s: 10.8.1.2
    • Component/s: SQL, Store
    • Labels: None

      Description

      Work was started to improve Derby's handling of index statistics. This issue tracks further discussion and work for this task.

      Attachments

      1. rjone.out
        17 kB
        Lily Wei
      2. rjall.rar
        764 kB
        Lily Wei
      3. rjall.out
        3.05 MB
        Lily Wei
      4. rjall.out
        796 kB
        Lily Wei
      5. rjall.out
        1.37 MB
        Lily Wei
      6. error-stacktrace.out
        3 kB
        Lily Wei
      7. derby-4771-2h-prototype_lcc_code_dump.diff
        135 kB
        Kristian Waagan
      8. derby-4771-2g-prototype_lcc_code_dump.diff
        129 kB
        Kristian Waagan
      9. derby-4771-2f-prototype_lcc_code_dump-WORK-IN-PROGRESS.diff
        126 kB
        Kristian Waagan
      10. derby-4771-2f-prototype_lcc_code_dump.diff
        119 kB
        Kristian Waagan
      11. Derby-4771-2f-AutomaticIndexStatisticsTest_wondows7.rar
        1.59 MB
        Lily Wei
      12. DERBY-4771-2e-prototype.rar
        6.70 MB
        Lily Wei
      13. derby-4771-2e-prototype_lcc_code_dump.diff
        108 kB
        Kristian Waagan
      14. derby-4771-2d-prototype_lcc_code_dump.diff
        105 kB
        Kristian Waagan
      15. derby-4771-2c-prototype_lcc_code_dump.diff
        99 kB
        Kristian Waagan
      16. derby-4771-2b-prototype_lcc_code_dump.diff
        59 kB
        Kristian Waagan
      17. derby-4771-2a-prototype_lcc_code_dump.diff
        79 kB
        Kristian Waagan
      18. derby-4771-1b-prototype_code_dump.diff
        49 kB
        Kristian Waagan
      19. derby-4771-1a-prototype_code_dump.stat
        1 kB
        Kristian Waagan
      20. derby-4771-1a-prototype_code_dump.diff
        48 kB
        Kristian Waagan
      21. derby.log
        37 kB
        Lily Wei
      22. autoindexstats.html
        2 kB
        Kristian Waagan

        Issue Links

          Activity

          Kristian Waagan added a comment -

          Closing this issue since the feature has been implemented.
          Further improvements and functionality will be tracked by separate JIRAs.

          Dag H. Wanvik added a comment -

          Hi Kristian, thanks for addressing my comments. Here are answers to your questions in connection with my latest batch of comments:

          > * Yes, but not at this level. I thought it would be nice to have a name for the transaction to identify it in the transaction table. I think some new methods must be added to be able to name a transaction at this level, so I'm not sure if it is worth the trouble. I'm keeping the TODO for now, might remove it in the next iteration.
          >
          > Opinions?

          Fine with me.

          > * It isn't needed now. Since the interface is internal, and there is only one implementation of it, I suppose the best action to take now is to remove it. We can introduce it again later, and then probably in a shape more like you have described. It feels a bit odd to say in the JavaDoc that scheduling requests may be denied, and not have a way to learn if it happened or not...
          >
          > Opinions?

          Fine, too.

          > * Added synchronization for runningThread in the finally-block. The current code will let the thread die and then create a new one on the next update request. I considered adding a sleep before letting the thread die, in case a new request would come in quickly.
          > Opinions?

          I guess this could be improved later if it turns out to be an issue. +1

          > * Do you mean we should call TransactionResourceImpl#handleException explicitly here?
          > I think the comment meant to say that the daemon will be disabled elsewhere. I'll address this issue in the next iteration.

          Great.

          > * I think I meant to retry the whole operation, that is to add the unit of work to the end of the queue or something. The way it is now, it will retry to get the locks, but if that fails it will discard the unit of work. Note that there is retry logic on several levels here. Since a new unit of work will be scheduled the next time a query using the index(es) is compiled, I'll delete the TODO and keep the code as it is. If this strategy turns out to be inadequate, we can fix it later.

          Ok, fine.

          Kristian Waagan added a comment -

          I have created the master issue DERBY-4934 to get the feature into the code base.
          If possible, please provide specific feedback to the various sub tasks.

          Dag, I have copied the text for the remaining issues to address to DERBY-4936.

          Kristian Waagan added a comment -

          Attaching untested patch 2h, which addresses Dag's comments from 02/Dec/10 11:31 AM.

          (Q = question, C = comment, A = action)

          A java/engine/org/apache/derby/iapi/services/daemon/IndexStatisticsDaemon.java

          • Various questions here:
            Q: How is this safe? (errors not reported?) explain!
            C: I guess it isn't totally safe. That said, except for writing statistics
            entries into a system table, the daemon is only reading data. The basic
            principle for the daemon when it comes to error handling is to ditch the
            current unit of work if there is an error and continue with the next one.
            It doesn't really care if the operation fails, because an update will be
            scheduled again. The daemon currently distinguishes between expected and
            unexpected errors. The former will simply be ignored, as they can arise
            for various reasons during "normal operation". One such error is failed
            invalidation because someone else is accessing the existing statistics.
            Unexpected errors are errors for which the daemon doesn't have specific
            code to handle. The best we can do here is to log it and continue if
            possible.
            A: I have rewritten the logging code, such that unexpected errors are always
            logged, even if logging is disabled.

          Q: What happens if schedule throws?
          C: It doesn't. If it did, it would have to be
          handled by the code calling it.
          A: I removed the throws clause.

          Q: Under which condition can the work queue be full? Then what happens?
          C: If work is scheduled at a higher rate than the daemon can generate new
          statistics. Would typically only happen if there are very large tables in
          the database.
          If the queue is full, the work simply won't be added. It will eventually
          be added when there is free space in the queue and a query compilation
          triggers the scheduling.
          A: I think the current default queue size is too small (5). I have
          increased it to 20 for now (worst case is 19 comparisons to look for
          duplicates per query compilation).
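
          To make the queue behaviour above concrete, here is a minimal, self-contained sketch of a bounded, duplicate-rejecting scheduling queue. The class and method names are illustrative only, not the actual Derby code:

            import java.util.ArrayList;
            import java.util.List;

            // Sketch only: a small bounded queue that silently rejects work when it
            // is full or when the same table is already queued.
            public class StatsWorkQueueSketch {

                private final List<String> queue = new ArrayList<String>();
                private final int maxSize;

                public StatsWorkQueueSketch(int maxSize) { // e.g. 20, the new default
                    this.maxSize = maxSize;
                }

                /** Returns true if queued, false if the request was rejected. */
                public synchronized boolean schedule(String tableId) {
                    if (queue.size() >= maxSize || queue.contains(tableId)) {
                        // Full queue or duplicate: the caller just moves on; a later
                        // query compilation will trigger scheduling again.
                        return false;
                    }
                    return queue.add(tableId);
                }

                /** Next unit of work for the daemon, or null if the queue is empty. */
                public synchronized String next() {
                    return queue.isEmpty() ? null : queue.remove(0);
                }
            }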

          A java/engine/org/apache/derby/impl/services/daemon/IndexStatisticsDaemonImpl.java

          • I removed LOG_HEADER (used in only one place now).
          • Various questions:
            Q: What are "in all cases" exactly?
            C: Currently the daemon has to take locks when invalidating statements and
            dropping/adding statistics entries.

          Q: Not in daemon mode?
          C: Yes, in explicit mode it will take locks during scanning as well, to stop
          another transaction from dropping the table/index.

          Q: Or is the first sentence not true: "if it needs to take locks it will give up immediately if the locks cannot be obtained"
          C: This is true as far as there is an easy way to do it. I think the
          A: I rewrote the comment somewhat, and added a description of the flow.

          • Removed comment. I think I added it when considering very large tables, but I
            don't understand it myself any more...
          • Rewrote loop slightly to synchronize on "queue" when checking "daemonDisabled".
          • Good!
          • "daemonLCC" will be null the first time run is started. Since the daemon
            might not run at all, I decided to create the database connection lazily.
            I have now coded under the assumption that the daemon won't lose the
            connection to the database unless the database is shut down (for whatever
            reason). This basically consisted of removing parts of the comment for
            "daemonLCC".
          • I added final to both.
          • Yes, but not at this level. I thought it would be nice to have a name for the
            transaction to identify it in the transaction table.
            I think some new methods must be added to be able to name a transaction at
            this level, so I'm not sure if it is worth the trouble.
            I'm keeping the TODO for now, might remove it in the next iteration.

          Opinions?

          • Whether or not to remove the tracing code depends on the feedback received
            during testing. If people find it useful, it can be improved a little and
            kept, but if not it should be removed.
            I envision this happening closer to when the target release is cut.
          • Added another sentence to the JavaDoc, stating that the table descriptor
            (unit of work) will be discarded if it turns out to be invalid when the
            statistics update is about to happen. I expect the most common error is that
            the table has been dropped since the work for it was scheduled.
          • It isn't needed now. Since the interface is internal, and there is only one
            implementation of it, I suppose the best action to take now is to remove it.
            We can introduce it again later, and then probably in a shape more like you
            have described. It feels a bit odd to say in the JavaDoc that scheduling
            requests may be denied, and not have a way to learn if it happened or not...

          Opinions?

          • Added synchronization for runningThread in the finally-block.
            The current code will let the thread die and then create a new one on the
            next update request. I considered adding a sleep before letting the thread
            die, in case a new request would come in quickly.

          Opinions?
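
          For illustration, here is a self-contained sketch of the thread-lifecycle pattern discussed in the bullet above: the worker is created lazily in schedule, and the reference is cleared under the "queue" lock in the worker's finally-block. The names are placeholders, not the actual implementation:

            // Sketch only: runningThread is guarded by the same monitor ("queue")
            // in both schedule() and the worker's finally-block, so a dying thread
            // cannot race with a new scheduling request.
            public class LazyWorkerSketch implements Runnable {

                private final Object queue = new Object(); // stands in for the work queue
                private Thread runningThread;              // guarded by "queue"

                public void schedule() {
                    synchronized (queue) {
                        // ... add the unit of work to the queue here ...
                        if (runningThread == null) {
                            runningThread = new Thread(this, "index-stat-thread");
                            runningThread.start();
                        }
                    }
                }

                public void run() {
                    try {
                        // ... process queued work until the queue is empty ...
                    } finally {
                        synchronized (queue) {
                            runningThread = null; // next schedule() creates a new thread
                        }
                    }
                }
            }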

          • Do you mean we should call TransactionResourceImpl#handleException explicitly
            here?
            I think the comment meant to say that the daemon will be disabled elsewhere.
            I'll address this issue in the next iteration.
          • Added simple shutdown mechanism if there are too many consecutive errors.
          • Currently the queue is cleared only when the daemon is shut down.
          • Here's another way it can happen. Parts of the trace may be familiar to you:
            org.apache.derby.iapi.error.ShutdownException:
            at org.apache.derby.iapi.services.context.ContextManager.checkInterrupt(ContextManager.java:437)
            at org.apache.derby.iapi.services.context.ContextManager.getContext(ContextManager.java:155)
            at org.apache.derby.iapi.services.context.ContextService.getContextOrNull(ContextService.java:249)
            at org.apache.derby.iapi.util.InterruptStatus.setInterrupted(InterruptStatus.java:71)
            at org.apache.derby.iapi.util.InterruptStatus.noteAndClearInterrupt(InterruptStatus.java:112)
            at org.apache.derby.impl.store.raw.data.RAFContainer4.recoverContainerAfterInterrupt(RAFContainer4.java:807)
            at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(RAFContainer4.java:366)
            at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(RAFContainer4.java:246)
            at org.apache.derby.impl.store.raw.data.CachedPage.readPage(CachedPage.java:671)
            at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(CachedPage.java:190)
            at org.apache.derby.impl.services.cache.ConcurrentCache.find(ConcurrentCache.java:295)
            at org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(FileContainer.java:2530)
            at org.apache.derby.impl.store.raw.data.FileContainer.getPage(FileContainer.java:2580)
            at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(BaseContainerHandle.java:319)
            at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:833)
            at org.apache.derby.impl.store.access.btree.ControlRow.get(ControlRow.java:820)
            at org.apache.derby.impl.store.access.btree.ControlRow.getRightSibling(ControlRow.java:531)
            at org.apache.derby.impl.store.access.btree.BTreeScan.positionAtNextPage(BTreeScan.java:493)
            at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:464)
            at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1596)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl$KeyComparator.fetchRows(IndexStatisticsDaemonImpl.java:1016)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(IndexStatisticsDaemonImpl.java:432)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateAllIndexStats(IndexStatisticsDaemonImpl.java:329)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.generateStatistics(IndexStatisticsDaemonImpl.java:291)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.run(IndexStatisticsDaemonImpl.java:650)
            at java.lang.Thread.run(Thread.java:662)
          • I think I meant to retry the whole operation, that is to add the unit of work
            to the end of the queue or something. The way it is now, it will retry to get
            the locks, but if that fails it will discard the unit of work.
            Note that there is retry logic on several levels here.
            Since a new unit of work will be scheduled the next time a query using the
            index(es) is compiled, I'll delete the TODO and keep the code as it is.
            If this strategy turns out to be inadequate, we can fix it later.
          • Fixed.
          • Updated comment (and some formatting).

          Running tests.
          I asked for opinions on three issues, and one issue will have to be addressed
          in the next iteration (error handling).

          I will now start splitting up the patch by creating a master JIRA with some subtasks.
          I think I also need to write a wiki-page describing the current properties controlling the feature, including the debug properties (for tuning during the development/testing phase).

          Kristian Waagan added a comment -

          Attached patch 2g.

          Addressing Dag's first set of comments from 01/Dec/10 05:23 PM (in order).

          M java/engine/org/apache/derby/impl/sql/compile/CursorNode.java

          • I removed the TODO comment and created method collectTablePrivsAndStats.
            Also reformatted the existing comment somewhat.
          • Added comment. The FromBaseTable can represent several types of sources,
            for instance system tables, sub queries, and VTIs. It can also represent a
            view, but FromBaseTable nodes representing views are rewritten to
            FromSubqueries during binding according to the FromBaseTable class JavaDoc.
          • Now using no-arg constructor.
          • Added comment and made the for-loop iterate backwards too. It is very likely
            that statistics are mostly up-to-date, so removing from the end of the list
            saves some copies. On the other hand, the list will mostly be very small,
            which is why I'm not creating a second list to copy relevant table
            descriptors to.
            Renamed method in TableDescriptor to getAndClearIndexStatsIsUpToDate.
          • Yes, comment removed.

          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java

          • I decided to rewrite the code, which relocates the use of the constant.
            It will now be handled inside TableDescriptor.markForIndexStatsUpdate.
            See also comment below at the bottom.

          M java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java

          • I'm keeping this temporary code for now, but I have changed it.
            If the user hasn't explicitly specified the logging property, it will be set
            to true. If explicitly specified by the user, it won't be overridden.
            This code should be removed before going into a release, though, and then I
            guess logging will default to false.

          A java/engine/org/apache/derby/impl/services/daemon/IndexStatisticsDaemonImpl.java
          M java/engine/org/apache/derby/impl/db/BasicDatabase.java
          M java/engine/org/apache/derby/iapi/sql/dictionary/TableDescriptor.java

          • TBD: I'll get back to this later, I have rewritten the code a bit (ditching
            the use of negative row count estimates).
          • I removed the TODO comment. I added it when I discovered that the row count
            estimate can get "out of sync" due to how Derby is updating it. It will only
            happen in certain circumstances, and based on the comments for DERBY-2949 it
            looks like Knut hit the same problem. He also says that it might be possible
            to improve the logic.
            Added JavaDoc.

          M java/engine/org/apache/derby/iapi/sql/dictionary/DataDictionary.java

          • Various comments from Dag here:
            o The daemon can be disabled at runtime if it experiences severe errors. Note
            that if the user doesn't want to have the daemon running, he/she would
            disable it by setting a (system-wide or database) property.
            o Renamed methods.
            o Not sure what I'm supposed to do about this comment, but it is correct
          • TBD: I'll address the comment about errors in the next iteration.
          • Fixed typo.

          I decided to add four properties to aid debugging and development. These
          properties are (with current defaults):
          a) derby.storage.indexStats.debug.createThreshold (100)
          b) derby.storage.indexStats.debug.absdiffThreshold (1000)
          c) derby.storage.indexStats.debug.lndiffThreshold (1.0)
          d) derby.storage.indexStats.debug.queueSize (5)

          (a) determines how big a table must be before statistics are automatically
          created. (b) determines how big the discrepancy between the row estimates for
          the table and the index must be before the statistics are updated. (c)
          determines how big the logarithmic difference (natural logarithm) must be before the
          statistics are updated. The values of these properties are printed if tracing
          is turned on. Now:

          Q: I don't understand these properties!
          A: Read the code
          These properties are made available for experimentation and debugging
          only. a-c affect when statistics are created or updated, and are used in
          TableDescriptor. (d) is only used in IndexStatisticsDaemonImpl.

          Q: Why have both (a) and (b)?
          A: Purely for debugging and experimentation. If these properties are included
          in production code, I expect they can be folded into one.

          Q: Why have both (b) and (c)?
          A: In general (c) will decide if the statistics are updated. However, for
          small tables (c) will cause frequent updates of the statistics. For small
          tables accurate statistics are not needed for good performance [1], so
          there is no reason to frequently update the stats. This is where (b) comes
          into play.

          [1] One exception might be if the rows are huge.
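
          To make the interplay of (a), (b) and (c) concrete, here is a simplified, illustrative sketch of the decision logic described above. It is not the actual TableDescriptor code; the structure and names are assumptions:

            // Sketch only: how the three debug thresholds could gate the decision
            // to create or update index statistics for a table.
            public class StatsThresholdSketch {

                // Current defaults of the debug properties:
                //   derby.storage.indexStats.debug.createThreshold  = 100
                //   derby.storage.indexStats.debug.absdiffThreshold = 1000
                //   derby.storage.indexStats.debug.lndiffThreshold  = 1.0
                static final long   CREATE_THRESHOLD  = 100;
                static final long   ABSDIFF_THRESHOLD = 1000;
                static final double LNDIFF_THRESHOLD  = 1.0;

                /** (a): create statistics for a table that has none yet? */
                static boolean shouldCreate(long tableRowEstimate) {
                    return tableRowEstimate >= CREATE_THRESHOLD;
                }

                /** (b) and (c): refresh existing statistics? */
                static boolean shouldUpdate(long tableRowEstimate, long indexRowEstimate) {
                    long absDiff = Math.abs(tableRowEstimate - indexRowEstimate);
                    if (absDiff < ABSDIFF_THRESHOLD) {
                        return false; // (b): small discrepancies never trigger an update
                    }
                    double lnDiff = Math.abs(Math.log(tableRowEstimate)
                                           - Math.log(indexRowEstimate));
                    return lnDiff >= LNDIFF_THRESHOLD; // (c): relative drift is large enough
                }
            }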

          Note that I have two outstanding comments from Dag (marked TBD), and ten TODOs
          left. Four of these won't go away until later. The remaining six I'll try to
          address in the next iteration.

          Kristian Waagan added a comment -

          Removed unused method QueryTreeNode.getRowEstimate with revision 1042461.
          Thanks, Dag!

          Kristian Waagan added a comment -

          Attaching patch 2f, which contains mostly test improvements. I also fixed the bug in the property handling.

          I'm not quite sure about the value of the multi-test, but the logging output shows that the code dealing with duplicates is stressed a bit:
          $ grep istat system/derby.log
          Thu Dec 02 21:02:08 CET 2010 Thread[Thread-3,5,main] {istat} "APP"."MTSEL": update scheduled - 6c44409f-012c-a8ad-5f23-ffffb00d8f4e reason=[no stats, row-estimate=-80075] (queueSize=1)
          Thu Dec 02 21:02:08 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": generating index statistics
          Thu Dec 02 21:02:08 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": wrote stats for index ace4c0a3-012c-a8ad-5f23-ffffb00d8f4e (rows=100000, card=[10,100000]), 0.0% utilization
          Thu Dec 02 21:02:08 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": scan durations (c1153=124ms)
          Thu Dec 02 21:02:08 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": generation complete (269 ms)
          Thu Dec 02 21:02:18 CET 2010 Thread[TestRunner-Thread,5,main] {istat} stopping daemon, active=false [q/p/s=0/1/1,err:k/u=0/0,rej=76]
          Thu Dec 02 21:02:22 CET 2010 Thread[Thread-12,5,main] {istat} "APP"."MTSEL": update scheduled - b4df26ce-012c-a8ad-5f23-ffffb00d8f4e reason=[no stats, row-estimate=-80095] (queueSize=1)
          Thu Dec 02 21:02:22 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": generating index statistics
          Thu Dec 02 21:02:22 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": wrote stats for index 7baea6d2-012c-a8ad-5f23-ffffb00d8f4e (rows=100000, card=[10,100000]), 0.0% utilization
          Thu Dec 02 21:02:22 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": scan durations (c1185=72ms)
          Thu Dec 02 21:02:22 CET 2010 Thread[index-stat-thread,5,main] {istat} "APP"."MTSEL": generation complete (429 ms)
          Thu Dec 02 21:02:32 CET 2010 Thread[TestRunner-Thread,5,main] {istat} stopping daemon, active=false [q/p/s=0/1/1,err:k/u=0/0,rej=337]

          BTW, I find the "daemon stats" to be a bit messy. Any ideas how to improve the presentation?
          Is it too much info?
          Anything you miss?
          It will only be printed when the database is properly shut down.

          Here's a quick legend:
          active = whether the daemon was doing work when stopped
          q = queued
          p = processed
          s = scheduled
          k = known errors (expected)
          u = unexpected errors
          rej = rejected requests (full queue or duplicate entries)
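
          For reference, the stats line above could be assembled from those counters roughly like this; a formatting sketch only, not the actual code:

            // Sketch only: composing the "daemon stats" line from the counters
            // explained in the legend above.
            public class DaemonStatsFormatSketch {
                static String format(boolean active, int queued, int processed,
                                     int scheduled, int knownErrors,
                                     int unexpectedErrors, int rejected) {
                    return "stopping daemon, active=" + active
                            + " [q/p/s=" + queued + "/" + processed + "/" + scheduled
                            + ",err:k/u=" + knownErrors + "/" + unexpectedErrors
                            + ",rej=" + rejected + "]";
                }
            }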

          I'll address Dag's comments in the next patch.

          Dag H. Wanvik added a comment -

          Last review installment. Just minor issues with the daemon implementation, and some questions:

          A java/engine/org/apache/derby/iapi/services/daemon/IndexStatisticsDaemon.java

          The background mode will try to
          affect other operations as little as possible, and errors won't be reported
          unless they are severe.

          • How is this safe? (errors not reported?) Explain! What happens if schedule throws? Under which condition can the work queue be full? Then what happens?

          A java/engine/org/apache/derby/impl/services/daemon/IndexStatisticsDaemonImpl.java

          • LOG_HEADER can be private
          • "For instance, it will not set locks on the conglomerates it scans, and if it needs to take locks it will give up immediately if the locks cannot be obtained. This isn't true in all cases, which means that the background work may still interfere with the user activity in the database (besides from using resources)."

          This seems contradictory to me. What are "in all cases" exactly? Not in daemon mode? Or is the first sentence not true: "if it needs to take locks it will give up immediately if the locks cannot be obtained"

          • Class javadoc IMPROVEMENTS: Not sure I understand the comments about "row estimate when writing statistics": doesn't this class compute the "real" numbers? If so, how could they be improved upon?
          • daemonDisabled is supposedly guarded by the queue monitor. What about access in updateIndexStatsMinion? Seems to be called from runExplicitly, which does not synchronize on "queue"?
          • The use of volatile for "daemonStopped" appears to be safe for use even with Java 1.4 (cf. http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile ), good. (I can't see any variable access reordering that could impact its correctness here.)
          • Under what conditions can "daemonLCC" be null when "run" is called? (cf. the test ca. line 4 of that method) I think, if a thread is started up again. Could use a comment.
          • Top of run method: question of philosophy of local use of "final":

          long runStart = System.currentTimeMillis();
          final ContextService ctxService = ContextService.getFactory();

          Both variables are effectively final, but only one is marked as such. Is it accidental?

          • "// TODO: Would be nice to name the transaction."

          Interesting, is there a facility to do that in Derby?

          • Optional (additional) tracing to std out is just temporary? Or do you want to keep it in production code as well?
          • javadoc for "schedule": "Assume the descriptor will be valid until we get around to generate the statistics." Is this assured somewhere outside this class? If not, what would happen if it does not hold?
          • the boolean result from "schedule" does not seem to be used. Needed?

          If you want this for some future use, the reject due to the table already being scheduled should probably have another value; it seems less serious than the queue being full or the daemon being disabled, I think: maybe one of {accepted, rejected, redundant}.

          • runningThread is supposed to be protected by "queue", but setting it to null in "run"'s "finally" block is not protected, so you could risk a race condition when determining if the thread is running or not in schedule, I think. "runningThread" probably needs to be set to null inside the synchronized block of run where you discover the queue is empty, since after that point the thread will invariably terminate, and schedule needs to create a new one.
          • } else if (se.getSeverity() >= ExceptionSeverity.DATABASE_SEVERITY) {
                // The database or system is going down. Probably handled elsewhere
                // but disable daemon anyway.

            Do we know this for sure? I thought that usually this would be handled when the severity bubbled up to handleException on the API level, which in turn calls TransactionResourceImpl#handleException to do the handling.

          • handleUnexpectedErrors#TODO: "Do we need a mechanism to disable the daemon if too many unexpected errors are raised within a short period of time?" Probably a good idea, yes.

          • run: "// Queue may have been cleared due to a severe failure // or shutdown." Where in the code would this happen?

          • } catch (ShutdownException se) {
                stop(); // Call stop to log activity
                ctxMgr.cleanupOnError(se);

            Can this happen? I tried to convince myself it could... Answer: yes, it can happen, e.g. in the dependency manager's coreInvalidateFor, which does popCompilerContext.

          • generateStatistics "TODO: Do we want to retry if we can't get the lock(s)?
            If so, maybe add sleep for a while if there are no
            other stats to generate?"

          Seems you always add a sleep now. Would you want to try another queued task in the meantime, if there is one?

          • In invalidateStatements, you always retry when seeing StandardException. Is that safe? Shouldn't it only be for LOCK_TIMEOUT as in generateStatistics? (A sketch of the narrower retry follows after this list.)
          • Javadoc for RowCountable#setEstimatedRowCount should probably be updated now: "For instance if we implement some sort of update statistics command, or just after a create index a complete scan will have been done of the table." (It is called from setHeapRowEstimate)
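
          Regarding the invalidateStatements point above, here is a self-contained sketch of the narrower retry: retry only on lock timeouts, and rethrow everything else. The names are illustrative, not Derby's actual classes:

            // Sketch only: retry a unit of work a bounded number of times, and only
            // when the failure is a lock timeout; other errors propagate immediately.
            public class RetryOnLockTimeoutSketch {

                static class LockTimeoutException extends Exception { }

                interface Invalidation {
                    void run() throws Exception;
                }

                static void invalidateWithRetry(Invalidation work, int maxAttempts)
                        throws Exception {
                    for (int attempt = 1; ; attempt++) {
                        try {
                            work.run();
                            return;
                        } catch (LockTimeoutException lte) {
                            if (attempt == maxAttempts) {
                                throw lte; // give up after the last attempt
                            }
                            // otherwise loop and retry the invalidation
                        }
                        // any other exception is not caught here and propagates at once
                    }
                }
            }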

          Sorry it took so long to review the patch, much new code for me, edifying though

          Kristian Waagan added a comment -

          Thanks for the comments, Dag!

          I'll get to work on these tomorrow. This is the first review this prototype code has seen, and it has changed significantly since I wrote the first version. Nice to get someone with fresh eyes and no prior knowledge about the patch to pick up things like naming and strange comments

          Dag H. Wanvik added a comment -

          Note that QueryTreeNode (also) has an unused method called "getRowEstimate" (you added that to Statistics), that could perhaps be removed to avoid confusion.

          Dag H. Wanvik added a comment -

          Hi Kristian, some comments on the patch beyond the tests. I still need to look at the meat in IndexStatisticsDaemonImpl, that will follow in a next review installment. To the extent my (preliminary) questions will be answered when I have read IndexStatisticsDaemonImpl fully, please ignore them

          Generally the structure of the patch is clear and seems to do the right things. There are a number of TODOs left that need work, as you are no doubt aware, and some of the naming could use some work too; more below.

          M java/engine/org/apache/derby/impl/sql/compile/CursorNode.java

          if (fromTable instanceof FromBaseTable) {
          TableDescriptor td = fromTable.getTableDescriptor();
          :
          // TODO: This was done because I didn't find another way to
          // access the base tables in the query at the time I
          // needed to. There may be a better way to do this!

          • The privilege collection phase seems like a good place to do it to me? Obviates need for an extra pass, and it's always performed I think. It would be nice to format the block in "if (fromTable.isPrivilegeCollectionRequired())" with braces and correct indentation while you are at it, too.

          if (checkIndexStats &&
          td.getTableType() == TableDescriptor.BASE_TABLE_TYPE) {

          • Isn't td's table type always BASE_TABLE_TYPE here? Cf. "if (fromTable instanceof FromBaseTable)" above... No, it seems it can also be a view, cf. the comment in FromBaseTable's class Javadoc. Confusing... maybe add a comment.

          if (statsToUpdate == null) {
              statsToUpdate = new ArrayList(2);
          }
          statsToUpdate.add(td);
              }
          }

          • Btw, why use "2" as initialCapacity to "new ArrayList"? Is it statistically more likely we have 2 than any other number (1, 3)? If so, comment; if not, I'd suggest using the no-arg constructor.
          • // Look for missing and stale statistics.

          The loop seems to be doing the opposite: removing those that are up-to-date? I am not a big fan of including the type (here "flag") in the names of methods & variables; I'd prefer a name that indicates the semantics of the flag instead, e.g. "getAndClearIsUpToDate". That reversed meaning removes the need for the "!" in the test and makes it easier to read.

          • // Assume a low number of base tables.
            baseTables.remove();

          I presume this comment pertains to performance only.

          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java

          Parameterize 100 properly with a constant or property? Fix two TODOs in estimateCost.

          M java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java

          Remove TODO in setIndexStatsRefresher (machinery is temporary, code says).

          A java/engine/org/apache/derby/impl/services/daemon/IndexStatisticsDaemonImpl.java
          M java/engine/org/apache/derby/impl/db/BasicDatabase.java
          M java/engine/org/apache/derby/iapi/sql/dictionary/TableDescriptor.java

          • markForIndexStatsUpdate: sdl may see a NullPointerException if tableRowCountEstimate >= 0, cf. the
            test: "if (sdl.isEmpty() && tableRowCountEstimate < 0)" doesn't hold. If that is a known invariant, use an ASSERT instead. The Javadoc should explain the use of a negative tableRowCountEstimate, which seems expected.

          As far as I can read the code, the diff must be one order of magnitude for an update to be triggered. It would be nice to comment this logic a bit. I'm not sure why you need the first comparison (against 1000 - it has a TODO attached, though) when you have the log comparison just beneath it anyway? It only saves some math operations; is it worth it? The generated reason string

          indexStatsUpdateReason = "t-est=" + tableRowCountEstimate +
          ", i-est=" + indexRowCountEstimate + " => cmp=" + cmp;

          seems to be comparing apples to oranges (a plain number vs a log number)? (A rough sketch of this kind of order-of-magnitude check is shown after these bullets.)

          • markForIndexStatsUpdate: change comment to Javadoc. Remaining TODO: The statement "the more accurate table row count estimate may be lost" isn't clear to me.
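
          For readers following this thread, here is a minimal sketch of the kind of order-of-magnitude staleness check being discussed above. The method name, the 1000-row cut-off and the exact comparison are illustrative assumptions, not the patch's actual code:

          // Illustrative sketch only (hypothetical names/thresholds): consider the
          // statistics stale when the current table row-count estimate and the row
          // count recorded with the index statistics differ by roughly an order of
          // magnitude, skipping very small tables.
          static boolean statsConsideredStale(long tableRowCountEstimate,
                                              long statsRowCountEstimate) {
              if (tableRowCountEstimate < 1000 || statsRowCountEstimate < 1000) {
                  return false; // assumed cut-off for small tables
              }
              double cmp = Math.abs(Math.log10((double) tableRowCountEstimate)
                                  - Math.log10((double) statsRowCountEstimate));
              return cmp >= 1.0; // estimates differ by an order of magnitude or more
          }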

          M java/engine/org/apache/derby/iapi/sql/dictionary/DataDictionary.java

          getIndexStatsRefresher: "if disabled" - how can it be disabled? A: See disableIndexStatsRefresher.
          When is this one used (scenario)? Can it be re-enabled, and if so, how? No, only on a fatal error. Maybe note in the Javadoc what it's for.

          doSetIndexStatsRefresher, setIndexStatsRefresher: I'd prefer doCreateIndexStatsRefresher, createIndexStatsRefresher I think. It just seemed vague to me what the verb "set" meant here. Since a refresher isn't provided in "setIndexStatsRefresher", I mean, as in a normal getter/setter pattern.

          doSetIndexStatsRefresher -> !indexRefreshedExists

          A java/engine/org/apache/derby/iapi/services/daemon/IndexStatisticsDaemon.java

          • The background mode will try to
          • affect other operations as little as possible, and errors won't be reported
          • unless they are severe.

          How is this safe (errors not reported)? Explain! What happens if scheduled work throws? Under which conditions can the work queue be full? Then what happens? I guess I'll know when I read the Impl class.

          Javadoc typo for schedule: schedulig

          Kristian Waagan added a comment -

          The only thing I can say based on the output you attached, is that it appears that the feature has been disabled.
          I think this is because of a bug I introduced when I changed the names of the properties. Try running without specifying derby.storage.indexStats.auto.

          I have already fixed this in my sandbox, and I'll include it in the proper version of 2f, along with one more test and some additional changes to the tests/utilities. I'll also include the value of the disable-flag in the trace line printed when the daemon is created.

          Lily Wei added a comment -

          Thanks Kristian for all the detailed information. I applied patch 2f, did a clean build and ran AutomaticIndexStatisticsTest. As you predicted, taking out the deleted deprecated properties makes the hang go away. Yeah! Thank you! However, I am still getting failures in testStatsCreatedOnGrowthThenDeleteDb, testStatisticsCorrectness and testStatsCreatedOnGrowthThenDeleteDb. I am including the .zip with my test results. I did not find a lot of tracing information for index storage. Should this be the case?
          It is great to learn that we can drop a table while the index statistics daemon thread is working. I think we can test this by turning the feature on and running suites.All. Is that a correct assumption? I did that and the tests ran just fine.

          Dag H. Wanvik added a comment -

          Thanks Kristian! Re the testDropWhileScanningThenDelete proposal, sorry, the reviewer was just confused.

          Kristian Waagan added a comment -

          — Dag's comment
          Thanks, Dag. See my comments below (same order as above). I'm attaching
          'derby-4771-2f-prototype_lcc_code_dump-WORK-IN-PROGRESS.diff', which isn't
          quite finished yet but contains most of the test fixes.

          • fixed
          • the daemon will start working as soon as the work is scheduled.
            The time it takes depends on overhead related to thread creation/start and
            scanning the base table. Since the tables are small, 500ms is enough if the
            system is in a good state. You are right it may be too low in some cases, so
            I added a method that will try to get the statistics several times until the
            minimum number of stats has been obtained, or the operation times out (I set
            the default timeout to 20 seconds). A rough sketch of such a polling helper is shown after this list.
          • No, I deleted one and renamed the other.
          • You mean that it is used in several places without being declared a constant?
            I created MASTERDB.
          • Thanks, I had forgotten to shut down that database.
          • testDropWhileScanningThenDelete: TBD, but one option is to trigger another
            automated update after the drop, to at least make sure the daemon is able to
            work after a table has disappeared under its feet.
            Didn't quite get your proposal, but I'll give it some more thought tomorrow.
            I think originally this test found a bug where I was unable to delete the
            database because of dangling file handles on Windows.
          • comment added. We're expecting three entries for the single index, since it
            contains three columns (leading columns: c1, c1-c2, c1-c2-c3).
          • deleted dumpLocks, I think Knut added a similar function somewhere that can
            be copied (and adapted if required) if needed.
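
          The polling approach mentioned above could look roughly like the test helper below. This is a sketch under assumptions: the class/method names and the query against SYS.SYSSTATISTICS/SYS.SYSTABLES are invented for illustration and are not the actual utility added by the patch.

          import java.sql.Connection;
          import java.sql.ResultSet;
          import java.sql.Statement;

          // Hypothetical test utility: poll until at least 'expected' statistics
          // rows exist for the given table, or fail after 'timeoutMillis'.
          class StatsPollUtil {
              static void awaitStatistics(Connection con, String tableName,
                                          int expected, long timeoutMillis)
                      throws Exception {
                  long deadline = System.currentTimeMillis() + timeoutMillis;
                  while (true) {
                      Statement stmt = con.createStatement();
                      ResultSet rs = stmt.executeQuery(
                              "SELECT COUNT(*) FROM SYS.SYSSTATISTICS S, SYS.SYSTABLES T "
                              + "WHERE S.TABLEID = T.TABLEID AND T.TABLENAME = '"
                              + tableName + "'");
                      rs.next();
                      int count = rs.getInt(1);
                      rs.close();
                      stmt.close();
                      if (count >= expected) {
                          return; // the daemon has written the expected statistics
                      }
                      if (System.currentTimeMillis() >= deadline) {
                          throw new AssertionError("Timed out waiting for " + expected
                                  + " statistics rows, last saw " + count);
                      }
                      Thread.sleep(500); // wait a bit and poll again
                  }
              }
          }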

          — Lily's comment
          Thanks for testing again, Lily.

          Can you please apply patch 2f (work in progress, but no reason to test with the
          older tests now), make sure you do a clean build and then run the tests again?
          They're not failing for me, but I'm currently testing on Solaris.
          From the output, it looks like the statistics aren't getting generated [in time]
          on your machine. This could be due to the problem that Dag commented on,
          which is hopefully addressed in the preliminary patch 2f.

          I'll run the tests on Windows myself too, to get some more data points.

          For brevity, it would be nice if you deleted the deprecated properties from the
          command line.

          Lily Wei added a comment -

          Thanks Kristian. I really like the change of the derby.storage.indexStats.trace value to "log", and the IndexStatisticsDaemon interface. suites.All runs fine for me with the statistics feature turned on. When testing AutomaticIndexStatisticsTest, I am seeing failures in testGenerationCompleteDeleteDb, testStatsCreatedOnGrowth, and testStatsUpdatedOnGrowth. testStatisticsCorrectness hangs. I use the following script:

          java -client -Dderby.tests.trace=true -Dderby.language.disableIndexStatsUpdate=true -Dderby.language.logIndexStatsUpdate=true -Dderby.language.traceIndexStatsUpdate=both -Dderby.storage.indexStats.auto=true -Dderby.storage.indexStats.log=ture -Dderby.storage.indexStats.trace=log junit.textui.TestRunner $1 2>&1 | tee rjall.out

          I am attaching the zip file that contains the entire output, in DERBY-4771-2e-prototype.rar

          Dag H. Wanvik added a comment -

          Some comments on the tests first (started with those). Looks good, as
          far as I can understand the tests do test the auto-update feature in
          many aspects, and the MT test stresses it with concurrent drop/create
          of the table in question. Without the rest of the patch, the
          AutomaticIndexStatisticsTest fails as expected;
          AutomaticIndexStatisticsTest also hung waiting for three stat entries,
          that's ok I guess.

          AutomaticIndexStatisticsMultiTest worked also without the rest of the
          patch, but it still adds value since we know auto-update happens now,
          so the code paths exercised are different.

          Some comments on AutomaticIndexStatisticsTest:

          • hard-coded use of "system" & "system/wombat" should be avoided
          • Is 500ms wait for daemon to update stats enough/stable
            enough for the tests? Have you measured how long it
            typically takes?
          • Does testStatsCreatedOnGrowth add any value compared to
            testGenerationCompleteDeleteDb?
          • hardwired constant string "masterDb"
          • Not safe to copy db without freezing source db?
            (post-commit could do writes?)
          • I guess testDropWhileScanningThenDelete doesn't really know
            for sure whether it exercised the logic to stop the scan (based
            on timing), so just passing isn't proof that the right thing
            happened. Probably hard to improve on that, though? Maybe you
            could start with one statistics entry and assert that it still has the
            same date after shutting down & rebooting?
          • // We expect three stats objects: base table, index and
            the third is...? It would be nice with an explanation in a comment here.
          • dumpLocks isn't used, so could be commented out or removed.

          More when I have looked into the implementation.

          Dag H. Wanvik added a comment -

          I'll have a look at patch 2e.

          Kristian Waagan added a comment -

          Attached patch 2e, which contains the following changes:
          o removed tracing code from BTreeScan (included by mistake)
          o renamed value of trace-property from "dblog" to "log" (derby.storage.indexStats.trace=off|log|stdout|both)
          o introduced interface iapi...IndexStatisticsDaemon, renamed existing class to impl...IndexStatisticsDaemonImpl

          Also attached a small HTML file, 'autoindexstats.html', which describes some preliminary ideas for what can be done in later revisions of the feature. Feel free to add stuff if you have more ideas. We should also document any decision not to implement some of these. For instance, how many knobs do we want to expose for power users? Another reason may be too high complexity.

          Patch 2e ready for review.
          I'd really like some review before committing this code, but if nothing happens I will just commit it and handle the problems as they come.

          Kristian Waagan added a comment -

          Thanks Lily,

          The invalidation request is still performed in IndexStatisticsDaemon, but not until after all the statistics for a given table (for all the indexes of that table, that is) have been generated.
          As mentioned, there are still some problems with the invalidation, since I have discovered that it can "deadlock". The situation involves both database locks and Java monitors, so it is not a regular database deadlock, and it will result in a timeout.

          Regarding the value of the property, I can change that to 'log' in the next rev of the patch. Note that the trace-property may be removed in the final version, I haven't decided yet. We also have to decide what should be logged, and maybe consider adding multiple log levels to control the amount of data written to the log.

          Let me summarize what I think are the next most important steps:
          o testing! This will be easier once the code is committed to trunk, but we're not quite there yet. I think the best way to test this is to run it with various applications, also some that run for a longer period of time.
          o organization of the code; where to place the daemon, whether to introduce an interface, whether to move daemon control out of the data dictionary.
          o tuning of table size boundaries for when the daemon kicks in

          Lily Wei added a comment -

          Thanks Kristian for the great work!!! When is the invalidation request performed now? Is it controlled in IndexStatisticsDaemon? I ran suites.All with the following command on Windows:

          java -client -Dderby.tests.trace=true -Dderby.language.disableIndexStatsUpdate=true -Dderby.language.logIndexStatsUpdate=true -Dderby.language.traceIndexStatsUpdate=both -Dderby.storage.indexStats.auto=true -Dderby.storage.indexStats.log=true -Dderby.storage.indexStats.trace=both -XX:MaxPermSize=192M -Xmx1024M -Xms512M -DderbyTesting.oldReleasePath=c:/derby/oldrelease/jars junit.textui.TestRunner org.apache.derbyTesting.functionTests.suites.AllPackages 2>&1 | tee rjall.out

          As expected, some tests fail because standard out is different from the canon. I will run it again and set -Dderby.storage.indexStats.trace to dblog instead. Is dblog the best name? I think it is very Derby-oriented and might not be that intuitive to a new Derby user. How about just "log"?
          I am attaching the output just for reference.

          Kristian Waagan added a comment -

          Attached patch 2d: 18 files changed, 2085 insertions, 259 deletions

          Note that this version has both logging and tracing (to derby.log) enabled.

          Patch ready for review and testing.

          Kristian Waagan added a comment -

          Added link to DERBY-3790, which discusses if we can skip statistics generation for [some] unique indexes.

          Kristian Waagan added a comment -

          After the commit for DERBY-4899 the patch no longer applies cleanly.

          I'm working on a new version, in which I will rename some methods, do some changes to the error handling, and move the invalidation to after all statistics for an index have been generated.
          I expect to post the revised version early next week. I find that the daemon code is a bit hard to read, so if you plan to have a look I would wait for the next rev.

          Kristian Waagan added a comment -

          The properties have been changed, and are now:
          derby.storage.indexStats.auto=true|false
          derby.storage.indexStats.log=true|false
          derby.storage.indexStats.trace=off|dblog|stdout|both

          It may be wise to change the log property default to true, at least as we test the feature.
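
          For anyone trying the patch, these are ordinary Derby system properties. A minimal, hypothetical example of enabling the feature and its logging for a test run is shown below; the class name and database URL are invented for illustration and are not part of the patch.

          import java.sql.Connection;
          import java.sql.DriverManager;

          // Hypothetical usage example (not part of the patch): set the properties
          // before the embedded engine boots, run the workload, then grep derby.log
          // for the "istat" entries afterwards.
          public class IstatPropertiesExample {
              public static void main(String[] args) throws Exception {
                  System.setProperty("derby.storage.indexStats.auto", "true");
                  System.setProperty("derby.storage.indexStats.log", "true");
                  System.setProperty("derby.storage.indexStats.trace", "both");

                  // Load the embedded driver and boot a database.
                  Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
                  Connection con = DriverManager.getConnection(
                          "jdbc:derby:istatTestDb;create=true");
                  // ... run the application workload here ...
                  con.close();
              }
          }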

          Kristian Waagan added a comment -

          Attaching patch 2c.

          The two main changes:
          a) Removed code in AlterTableConstantAction for updating statistics, and made it use the daemon.
          b) Removed status tracking code in TableDescriptor/statistics object.

          The removal of code for (b) may cause some extra calls to the daemon, and the scheduling requests will be rejected. I'm not sure how critical this is yet, which is why I decided to remove the code. Maybe a more light-weight mechanism can be re-added later.

          One major concern I have right now, is how to use the dependency system.
          When should the invalidation request be performed?
          Does it have to be before the statistics have been generated, or can it be done afterwards?
          If the table is large, it may take a long time from when the invalidation request is performed until the index(es) have been scanned and the new statistics have been added to the system tables.

          I also see a problem when running CheckToursDBTest, where we end up in a kind of deadlock situation. There are two threads, one daemon thread (D) updating statistics, and one user thread (U) updating a table with some triggers. Depending on timing, the following may happen:
          D: takes S-locks on some rows in SYSDEPENDS for invalidation
          U: compiles trigger/statement, enters DependencyManager.clearDependencies() and grabs the monitor 'this'.
          D: needs to enter DM.clearDependencies(), but U already has the monitor 'this' (BLOCKED)
          U: needs to get X-locks on the rows already locked by D in SYSDEPENDS (TIMES OUT)
          D: after U is killed due to a lock timeout, D can continue and successfully complete the invalidation
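
          The shape of this hang (a database lock and a Java monitor taken in opposite orders by two threads) can be reproduced in miniature outside Derby. The sketch below is purely an analogy, not Derby code: a ReentrantLock stands in for the SYSDEPENDS row locks, a plain monitor for the DependencyManager's 'this', and the sleeps just force the interleaving described above.

          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.locks.ReentrantLock;

          // Miniature analogy of the D/U scenario (illustrative only).
          public class LockMonitorHangSketch {
              static final ReentrantLock rowLock = new ReentrantLock(); // "SYSDEPENDS rows"
              static final Object monitor = new Object();               // "DM's 'this'"

              public static void main(String[] args) throws Exception {
                  Thread d = new Thread(new Runnable() {
                      public void run() {
                          rowLock.lock();                  // D: S-locks rows in SYSDEPENDS
                          try {
                              pause(200);                  // let U grab the monitor
                              synchronized (monitor) {     // D: blocked while U holds it
                                  System.out.println("D: invalidation completed");
                              }
                          } finally {
                              rowLock.unlock();
                          }
                      }
                  }, "istat-daemon-D");
                  Thread u = new Thread(new Runnable() {
                      public void run() {
                          pause(100);                      // let D take the row lock first
                          synchronized (monitor) {         // U: inside clearDependencies()
                              try {
                                  // U: needs locks held by D; times out like Derby would
                                  if (!rowLock.tryLock(1, TimeUnit.SECONDS)) {
                                      System.out.println("U: lock timeout, statement fails");
                                  } else {
                                      rowLock.unlock();
                                  }
                              } catch (InterruptedException ie) {
                                  Thread.currentThread().interrupt();
                              }
                          }
                      }
                  }, "user-thread-U");
                  d.start();
                  u.start();
                  d.join();
                  u.join();
              }

              static void pause(long ms) {
                  try { Thread.sleep(ms); } catch (InterruptedException ie) { }
              }
          }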

          I have not yet figured out how/if this issue can be resolved.
          Also, IndexStatisticsDaemon has to be moved to a different package, or at least a new interface has to be introduced if the DataDictionary is used to access the daemon. Any opinions on where to place and how to access the daemon?

          I'm asking for review, particularly of the code in IndexStatisticsDaemon. The changes in impl/sql would also benefit from review from someone who knows more about the dynamics here.
          When I run suites.All I see from zero to three failures on my machine. I've also seen some failures in derbyall earlier, but I haven't run it with my latest changes (will do).

          Patch ready for review.

          Kristian Waagan added a comment -

          I've taken a look at the test failures:
          o failures in OrderByAndSortAvoidance were taken care of by DERBY-4833 patch 1a.
          o the failure UpdateStatisticsTest.testUpdateStatistics will continue to happen for the time being (see DERBY-4837). The failure surfaced because the statistics are automatically generated faster now (written to the dd as soon as they are computed).
          o the failure UpdateStatisticsTest.testNoExclusiveLockOnTable failed due to a bug in the prototype. It left the dd in write-mode, and that causes at least one path in the dd to take an exclusive lock instead of a shared lock when looking up stuff in the system table.
          o I suspect the failure in AutoIncrementTest.testsyslocks may have been caused by the same bug in the prototype, but I'm not sure. It doesn't reproduce again on my machine, but it could be timing-dependent.

          I have also seen an intermittent test failure in XplainStatisticsTest, which I'm unable to explain. Seems like there are two rows in one of the XPLAIN tables where there is supposed to be only one.
          Today I also saw the old harness test store/updatelocks.sql fail, but I haven't looked into it yet.

          I'm attaching the latest revision (patch 2b).

          Lily Wei added a comment -

          With more tracking and specific tracing in JDBC.java and RuntimeStatisticsParser.java, I generated rjone.out for:

          java -Dderby.tests.trace=true -Dderby.language.disableIndexStatsUpdate=false -Dderby.language.logIndexStatsUpdate=true -Dderby.language.traceIndexStatsUpdate=both junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.lang.UpdateStatisticsTest

          From looking at the output, the unexpected result row comes from 'SELECT * FROM SYS.STATISTICS' for 'Index Scan ResultSet for T2 using index T2I1', and it gets the error 'A lock could not be obtained within the time requested' while evaluating an expression. The test did ask to lock table t in share mode. I am not really familiar with locking in Derby. Could we need more concurrency control for StatementNode and RAMTransactionContext? I am including rjone.out for reference.

          Lily Wei added a comment -

          Before adding application tests, I thought I would try it with suites.All with all the tracing and the feature flag set to true. UpdateStatisticsTest failed by itself on my machine. Maybe using the lcc to update the data dictionary directly from the daemon has some concurrency issue to tweak. I really enjoy seeing all the tracing output for this feature. That is so great. I will try to run it with an application next and understand the lcc more.

          Kristian Waagan added a comment -

          Patch 2a is another code dump, still prototyping.
          I am now using an lcc to update the data dictionary directly from the daemon.
          The daemon is enabled by default and will write some information to the log. More detailed tracing can be enabled (see the comments in IndexStatisticsDaemon).

          If anyone has an application or a db load they can test this with, I'd be happy to know if the daemon works.
          To do so, build Derby with the patch, run your app and then grep your derby.log file afterwards for "istat".
          It might also crash...
          You should see statistics being generated for indexes which don't have them, and potentially also updates of existing stats (depends on many factors, I'll explain more later, but some keywords: row count estimate, table growth, statement compilation).

          I'll be away for a week, and will answer any comments when I'm back.
          My next step will be to validate/rewrite the logic I added to the table descriptor and the other "catalog classes", potentially followed by some initial tuning of various thresholds, and writing more tests.

          Lily Wei added a comment -

          Thanks Kristian for such a detailed report. It is hard to handle all platforms at the prototype stage. Thank you so much for doing it and sharing all the details with us.

          Kristian Waagan added a comment -

          Investigation showed that the errors Lily is getting on Windows in store.OSReadOnlyTest are caused by a partly read-only database directory. The fact that it isn't fully read-only makes Derby believe the database is read-write.
          I'll fix the test issue, see DERBY-4804 for details.

          I expect to post a new version of the prototype soon. It will use an lcc to update the data dictionary directly.

          Lily Wei added a comment -

          Thanks Kristian for the prompt reply and explanation.

          Good eyes. I added the "CHECKING" println so it is easier for me to see what is going on for now.
          if (tableDescriptor.getTableType() == TableDescriptor.BASE_TABLE_TYPE &&
                  tableDescriptor.getTotalNumberOfIndexes() > 0) {
              System.out.println("CHECKING: " + tableDescriptor.getQualifiedName());
              long rows = baseRowCount();
              if (statisticsForTable) {
                  tableDescriptor.markForIndexStatsUpdate(rows);
              } else if (rows > 100) {
                  // Only create statistics if there are "enough" rows.
                  tableDescriptor.markForIndexStatsUpdate(-1);
              }
          }

          I deleted all existing directories before running the test suites. I ran testOSReadOnly individually and it failed. I am attaching error-stacktrace.out and derby.log and hope they are helpful to you.

          With database status, I mean something that allows us to know the state of the database, i.e. whether it is read-only.

          Kristian Waagan added a comment -

          Hi Lily,

          First, did you modify the code before you ran the second time? I still see "CHECKING: ..." in the output, but I disabled this println in the 1b patch.
          Also, did you run in a new directory, or delete all existing directories before running again?
          It would be interesting to know if testOSReadOnly fails also if you run it individually and in a clean test directory.

          With database status, do you mean whether it is read-only or not?

          Lily Wei added a comment -

          Hi Kristian:
          Thank you for doing such great work. I am very interested in this feature and would love to keep learning about it. The check for the container-opened-in-read-only-mode error behaves much better. I was wondering: would it be a good idea to set a flag on the PreparedStatement that allows checking the database status (on the store, lcc, tx or something else) somewhere between generating the PreparedStatement and the point where we check for exceptions in GenericStatement.java? Could that help the situation we are in?

          I did run the suites.All test suite; this is for reference only. I got 6 failures. Four of them are for lang.OrderByAndSortAvoidance; in addition, testOSReadOnly has a permission problem when it tries to remove the directory c:\derby2\trunk\testallpackages\system\singleUse\readOnly or copy the directory c:\derby2\trunk\testallpackages\system\singleUse\oneuse4e to c:\derby2\trunk\testallpackages\system\singleUse\readOnly. Hope this helps!

          Kristian Waagan added a comment -

          Attaching patch revision 1b.

          Thanks for having a look at the patch, Lily.
          Seems a last-minute change caused a lot of trouble. Early on I think I ignored all exceptions originating from IndexStatisticsDaemon.writeUpdatedStats, but before I uploaded the patch I added checks for specific errors.
          I have added another check for the container-opened-in-read-only-mode error. The issue I see is that Derby doesn't detect that the database is read-only before it's too late to disable the statistics update feature. I tried the isReadOnly method on the store, lcc, and tx - but none of those returned true at the time when the index statistics daemon is called for. In this case the read-only state was caused by missing file privileges; maybe Derby will handle other causes better.

          If you run the tests again with patch 1b, hopefully all you'll see is four failures in lang.OrderByAndSortAvoidance (two distinct failures).

          Lily Wei added a comment -

          Hi Kristian:
          Thank you so much for doing this. This will be a great plus for Derby in my opinion.

          Overall, I like the design. Having a daemon gather statistics and execute in a separate thread is a common statistics-gathering implementation strategy. I agree that the memory consumption of the implementation should be small. During compilation, new statistics for the query are written, along with other completed statistics, to the data dictionary in a nested read-write user transaction, and we do not wait for new statistics to be generated. Should we have a more detailed priority strategy for how and when completed statistics get written to the data dictionary, relative to query completion time and the new statistics? I.e., for the cases where we do write, and the cases where we don't want to wait too long.
          I am not sure whether this is already covered by the weaknesses listed, or by the NO_WAIT option in the call to add new
          descriptors to the data dictionary. I am personally not clear on the compile-time part of the implementation, so any elaboration on the compilation process for the current operation, and on how new and completed statistics get written to the data dictionary, would be very helpful to me.

          I am including my run (rjall.out) of suites.All on Windows 7 with 32-bit jdk1.6.0_13, with my comments. This is just for reference, in case we can see other issues not mentioned already.

          Kristian Waagan added a comment -

          Attached is a prototype of another attempt at implementing auto-update
          of Derby index statistics. First I'll describe the patch briefly, then
          I'll note some potential improvements and ideas.
          I've omitted lots of details, feel free to ask questions and to comment
          on the suggested improvements etc. They need a lot more work...

          The code is nowhere near complete, its primary purpose is to spur
          discussion and hopefully guide us in the right direction.

          [Prototype description]

          The prototype performs some checks for whether the index statistics are
          stale during statement compilation, as Mamta did under DERBY-3788. If
          the statistics are considered stale, an update job to update all indexes
          for the base table is scheduled with a "daemon". The daemon keeps track
          of scheduled update jobs, and will execute them in a separate thread.
          Only one job will be taken care of at a time, and if there are too many
          jobs, new jobs are discarded. When a slot frees up in the work queue,
          these jobs will eventually be scheduled. If there are no statistics,
          creating them will be scheduled (the daemon doesn't distinguish between
          creating and updating stats). When a job is scheduled for a base table,
          this is recorded in the associated index descriptors (transient state)
          to avoid having to query the daemon too often.

          As mentioned, the work is carried out in a separate thread, created as
          required (there is no permanent background thread, it dies if the queue
          is emptied). This seems appropriate as statistics update should be
          rather infrequent compared to other operations in a database system.
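
          For illustration only, below is a minimal sketch of a scheduler with the
          behavior described above (bounded queue, at most one worker thread created
          on demand, jobs silently discarded when the queue is full), built on
          java.util.concurrent. The class and method names are assumptions; this is
          not the prototype's actual IndexStatisticsDaemon API.

          import java.util.concurrent.LinkedBlockingQueue;
          import java.util.concurrent.ThreadPoolExecutor;
          import java.util.concurrent.TimeUnit;

          // Hypothetical sketch of the scheduling behavior described above.
          public class IndexStatsScheduler {

              private static final int MAX_QUEUED_JOBS = 20; // assumed queue limit

              private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
                      0, 1,                  // no permanent thread, at most one worker
                      60L, TimeUnit.SECONDS, // idle worker dies after a minute
                      new LinkedBlockingQueue<Runnable>(MAX_QUEUED_JOBS),
                      new ThreadPoolExecutor.DiscardPolicy()); // drop new jobs when full

              /** Schedules regeneration of statistics for all indexes of a base table. */
              public void schedule(final String tableId) {
                  executor.execute(new Runnable() {
                      public void run() {
                          regenerateStatsFor(tableId); // placeholder for the real work
                      }
                  });
              }

              private void regenerateStatsFor(String tableId) {
                  // Scan the table's indexes and compute fresh statistics here.
              }
          }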

          When new statistics are computed for the indexes of a table, they are
          stored in the daemon. They require little memory (table identifier, and
          per index, the index identifier, two longs and one int).
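
          As a rough illustration of that footprint, the per-index payload could be
          as small as the following; what the two longs and the int actually hold in
          the prototype is not spelled out here, so the field names below are
          assumptions.

          // Hypothetical in-memory holder for one index's completed statistics.
          public class CompletedIndexStats {
              private final String indexId;   // index identifier
              private final long rowCount;    // assumed: rows seen during the scan
              private final long createdAt;   // assumed: when the stats were computed
              private final int keyColumns;   // assumed: number of leading key columns

              public CompletedIndexStats(String indexId, long rowCount,
                                         long createdAt, int keyColumns) {
                  this.indexId = indexId;
                  this.rowCount = rowCount;
                  this.createdAt = createdAt;
                  this.keyColumns = keyColumns;
              }
          }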

          As a statement is compiled, the optimizer will consider the available
          indexes. At this point the index statistics are checked, and if we see
          that an update has been scheduled, we make sure to check a little later
          in the compilation process whether it has completed. If we find new
          statistics for the query being compiled, we also write any other
          completed statistics to the data dictionary. Writing to the data
          dictionary is currently done with a nested read-write user transaction
          in the user transaction (during statement compilation) - mainly to avoid
          keeping locks for an extended period of time.

          For clarity, statement compilation/execution will not wait for new
          statistics to be generated. In the case of large tables, it could take
          hours to generate new stats.
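
          To make the non-blocking contract concrete, here is a small sketch of that
          compile-time hook. The interfaces are hypothetical stand-ins, not Derby's
          real internal API; the point is only that the compiler polls for finished
          statistics and never waits for a running scan.

          // Hypothetical compile-time hook: persist completed statistics if any
          // are ready, but never block waiting for a scan to finish.
          public class CompileTimeStatsHook {

              /** Assumed view of the daemon's store of completed work. */
              public interface StatsDaemon {
                  /** Returns completed stats for the table, or null if none are ready. */
                  CompletedStats pollCompleted(String tableId);
              }

              /** Assumed sink that writes stats in a nested read-write transaction. */
              public interface DictionaryWriter {
                  void writeInNestedTransaction(CompletedStats stats);
              }

              public static class CompletedStats { /* table id plus per-index payload */ }

              private final StatsDaemon daemon;
              private final DictionaryWriter writer;

              public CompileTimeStatsHook(StatsDaemon daemon, DictionaryWriter writer) {
                  this.daemon = daemon;
                  this.writer = writer;
              }

              /** Called late in statement compilation for each base table referenced. */
              public void maybePersistStats(String tableId) {
                  CompletedStats stats = daemon.pollCompleted(tableId); // non-blocking
                  if (stats != null) {
                      // Short-lived nested transaction so locks aren't held for long.
                      writer.writeInNestedTransaction(stats);
                  }
                  // If nothing is ready, we simply compile with the old statistics.
              }
          }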

          Obvious weaknesses:
          o code organization (I don't know the code well) - choices made based on
          what worked and on reducing overhead (i.e., checking indexes when we
          have already obtained handles to them)
          o the async/decoupled data dictionary update - done to avoid having to
          create a LanguageConnectionContext (lcc).
          o logic/thresholds for determining when stats are stale
          o the row estimate logic also has weaknesses (for instance when mixing
          setting absolute values and updating the estimate based on deltas)

          Other notes/characteristics of the prototype:
          o stats not generated/updated for system tables (caused locking problems)
          o lower limit on the row estimate (don't generate for tables with few rows)
          o I considered exposing the NO_WAIT option in the call to add new
          descriptors to the data dictionary. I don't know if this is needed;
          if we update stats with a separate transaction from the daemon, we can
          either use TransactionControl.setNoLockWait() or maybe even just wait.
          o current staleness code is dependent on reasonable row estimates
          o the "unit of work" is currently a base table - when scheduled all
          associated index statistics will be regenerated.
          o I suspect that most tests in suites.All run with the DBO as the user,
          and I haven't done anything specific to handle missing privileges.

          [Prototype state]

          Runs suites.All and derby.all with only four failures, all in
          OrderByAndSortAvoidance. The tests fail on an assert for whether a table
          scan is performed. To me it looks like the new stats make the
          compiler/optimizer choose a different plan (not necessarily better in
          terms of pages visited, though, but that's a DBA/optimizer issue).

          Currently two flags control the prototype behavior:
          o derby.language.disableIndexStatsUpdate=false|true
          o derby.language.logIndexStatsUpdate=false|true

          If you grep for 'istat' in derby.log, you should get all the lines
          relevant to automatic index statistics update.
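
          As a usage note: assuming the two prototype flags above are read like other
          system-wide derby.* properties, they can be set as JVM system properties
          (as sketched below), with -D on the command line, or in derby.properties
          before the engine boots.

          // Minimal sketch for enabling 'istat' logging while testing the patch.
          public class EnableIstatLogging {
              public static void main(String[] args) {
                  // Keep the feature enabled and turn on logging to derby.log.
                  System.setProperty("derby.language.disableIndexStatsUpdate", "false");
                  System.setProperty("derby.language.logIndexStatsUpdate", "true");
                  // Boot the embedded engine after this point, then grep derby.log
                  // for 'istat' as described above.
              }
          }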

          [Potential improvements]
          o update data dictionary from the daemon thread
          (must then be able to create an appropriate lcc)

          o drift in the number of unique values isn't handled.
          Some potential remedies (raw ideas):

          Mechanism                        Distinct value drift   Row count change
          =========================================================================
          (a) compilation check            N                      Y
          (b) timed check                  N                      Y
          (c) timed unconditional update   Y                      Y
          (d) UPDATE table SET ...         Y                      N

          In short:
          (a) creates statistics when they don't exist and kicks off the update job
          as soon as we believe the existing stats are stale. (b) helps
          systems which are in a steady state (all statements compiled and
          reused) - would typically check all user tables with indexes and
          perform the staleness check from (a). (c) would help against
          "anything" - but potentially with a large delay. Only useful for
          applications where the database is up for very long periods of time
          (days, weeks, months). Intervals for (b) and (c) would have to be
          configurable. Mechanism (d) would help for updates changing a large
          percentage of the rows, but would not catch many small updates
          changing the selectivity of an index.
          It may be possible to reuse BasicDaemon for the timed checks
          (scheduling only, work would still be performed in a separate thread).

          o do we need to throttle (a) the index scans, or (b) the processing
          rate of the scheduled jobs?
          (I started playing with a crude utilization rate)

          o almost as above, but we should take care to avoid "infinite-loops"

          o at which point may a change in either the number of rows or the field
          values be big enough to warrant a recalculation of the stats?
          What's more costly: a sub-optimal plan or reading all the data?
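
          As a concrete, if crude, starting point for that threshold question, the
          check below flags statistics as stale purely on row-count drift, in the
          spirit of mechanisms (a) and (b) above. The 50% threshold and the minimum
          row count are made-up values, not what the prototype uses.

          // Rough, hypothetical staleness check based only on row-count drift.
          public final class StalenessCheck {

              private static final long MIN_ROWS = 100;            // assumed: skip tiny tables
              private static final double CHANGE_THRESHOLD = 0.5;  // assumed: 50% drift

              /**
               * Returns true if new statistics should be scheduled, based on how far
               * the current row estimate has drifted from the row count recorded when
               * the statistics were last computed.
               */
              public static boolean isStale(long rowsAtLastStats, long currentRowEstimate) {
                  if (currentRowEstimate < MIN_ROWS) {
                      return false; // not worth a full index scan for small tables
                  }
                  if (rowsAtLastStats <= 0) {
                      return true;  // no statistics yet - schedule creation
                  }
                  double drift = Math.abs(currentRowEstimate - rowsAtLastStats)
                          / (double) rowsAtLastStats;
                  return drift >= CHANGE_THRESHOLD;
              }
          }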

          I'll be away for some weeks, but plan to return to this issue when I'm back.
          My next steps depend on the feedback I get, but one way forward may be
          to try to do the data dictionary update from the daemon itself. Once we get
          the core framework in place, we can start working on all the various issues
          that have to be addressed.


            People

            • Assignee:
              Kristian Waagan
              Reporter:
              Kristian Waagan
            • Votes:
              0
              Watchers:
              0
