Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment:
      unix
    • Urgency:
      Urgent

      Description

      Using OFFSET and FETCH isn't any faster for paging than using existing JDBC methods:
      // p is the page number, n is rows per page
      stmt.setMaxRows(n * p);
      stmt.setFetchSize(n);
      // ... execute the query ...
      rs.absolute((p - 1) * n);

      When used on a table with blobs, both the JDBC way, and the OFFSET/FETCH way are unexpectedly slow ('unexpectedly' because I am not even retrieving the BLOB column).

      I need a way to do paging that doesn't slow down in proportion to the page I am requesting. It must also maintain this performance on tables with blobs. The following is a discussion from the users mailing list, from a year or so ago.
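As a sketch of the two approaches being compared (table and column names `PHOTOS`, `PK`, `TITLE` are hypothetical, not from this issue), the JDBC-side arithmetic and the equivalent SQL-standard OFFSET/FETCH request look like this:

```java
// Sketch of the two paging strategies compared above.
// Table and column names (PHOTOS, PK, TITLE) are hypothetical.
public class PagingSketch {

    // Rows to skip before a 1-based page p of size n begins
    // (the same arithmetic as rs.absolute((p - 1) * n)).
    static int offsetFor(int p, int n) {
        return (p - 1) * n;
    }

    // The SQL-standard OFFSET/FETCH form of the same page request,
    // supported by Derby as shown in the query plan later in this issue.
    static String offsetFetchSql(int p, int n) {
        return "SELECT PK, TITLE FROM PHOTOS ORDER BY PK OFFSET "
                + offsetFor(p, n) + " ROWS FETCH NEXT " + n + " ROWS ONLY";
    }

    public static void main(String[] args) {
        System.out.println(offsetFetchSql(5, 10));
    }
}
```

Either form asks the engine for the same rows; the complaint in this issue is that Derby materializes and discards the skipped rows in both cases.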

      > 2) what are the performance implications for users of the embedded
      > driver? In particular, with the embedded driver I am hoping that
      > this feature allows portions of a result set to be retrieved without
      > the overhead of retrieving the entire result set. For example, if I

      I am afraid that with the embedded driver, you will only save a little CPU
      (by avoiding some JDBC calls) since under the hood, the code siphons
      off the rows till it hits the offset, so if you have a large offset,
      you will still incur reading of those rows (modulo page caching). In
      client/server driver context the savings are larger, of course, in
      that fewer rows are sent over the wire. For simple queries that can
      use an index, the optimizer could make use of the offset information
      to avoid reading the entire row when skipping rows before offset, just
      counting rows in the index to get to the first qualifying row, but
      this optimization is not yet implemented.

      Often, this feature is used together with ORDER BY which would entail
      some sorting of the result set and then all the rows would have to be
      read anyway. Again, for some simple queries, sort avoidance is used by
      the optimizer, so optimization is still possible for such queries.

      If you think this optimization is an important capability feel free to
      file an improvement issue for it.


          Activity

          Knut Anders Hatlen added a comment -

          The inefficiency when querying a table with BLOBs may be caused by DERBY-1506. A possible workaround for that part of the problem is to create an index that contains all the columns referenced in the query, so that it doesn't need to scan the heap.
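The workaround described here can be sketched as follows. Table and column names are hypothetical; the point is that the index lists every column the query references except the BLOB, so Derby can answer the query from the index without touching the heap pages that hold the BLOB data:

```sql
-- Hypothetical schema: the query reads PK, TITLE and CONTENT_TYPE
-- but never the BLOB column. An index covering those columns lets
-- the scan stay in the index (Scan type=index, not heap).
CREATE INDEX PHOTOS_COVER_IDX
    ON PHOTOS (PK, TITLE, CONTENT_TYPE);
```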

          geoff hendrey added a comment -

          I have an index covering every column but the BLOB. That doesn't improve the performance.

          Knut Anders Hatlen added a comment -

          Have you inspected the query plan and verified that the index is used? Perhaps you'll need to add an optimizer override to force the optimizer to use the index.

          geoff hendrey added a comment - edited

          Here is the optimizer dump. Estimated cost is 207956.54. Perhaps it is due to the string concatenation (I am selecting an expression that references other columns).

          2010-05-26 18:17:34.966 GMT Thread[qtp0-0,5,main] (XID = 354560), (SESSIONID = 3), SELECT
          "r"."PK" AS "r.PK",
          '<a href="http://localhost:8080/nextdb/rest/geoff/PHOTOS/USER_PIC/row/' || TRIM(CHAR("r".PK)) || '/pic.' || SUBSTR( "pic_CONTENT_TYPE", LOCATE('/',"pic_CONTENT_TYPE")+1) || '" target="_self">show</a>' AS "r.pic",
          "r"."pic_CONTENT_TYPE" AS "r.pic_CONTENT_TYPE",
          "r"."pic_LENGTH" AS "r.pic_LENGTH",
          "r"."pic_FID" AS "r.pic_FID",
          "r"."title" AS "r.title",
          "r"."date" AS "r.date",
          "r"."public" AS "r.public"
          FROM
          "GEOFF_PHOTOS_USER_PIC" AS "r"
          OFFSET
          40 ROWS FETCH NEXT 10 ROWS ONLY

          ******* Scroll Insensitive ResultSet:
          Number of opens = 1
          Rows seen = 20
          Number of reads from hash table = 20
          Number of writes to hash table = 10
          constructor time (milliseconds) = 0
          open time (milliseconds) = 0
          next time (milliseconds) = 0
          close time (milliseconds) = 0
          optimizer estimated row count: 82.00
          optimizer estimated cost: 207956.54

          Source result set:
          Row Count (1):
          Number of opens = 1
          Rows seen = 10
          Rows filtered = 40
          constructor time (milliseconds) = 0
          open time (milliseconds) = 0
          next time (milliseconds) = 0
          close time (milliseconds) = 0
          optimizer estimated row count: 82.00
          optimizer estimated cost: 207956.54

          Source result set:
          Project-Restrict ResultSet (3):
          Number of opens = 1
          Rows seen = 50
          Rows filtered = 0
          restriction = false
          projection = true
          constructor time (milliseconds) = 0
          open time (milliseconds) = 0
          next time (milliseconds) = 0
          close time (milliseconds) = 0
          restriction time (milliseconds) = 0
          projection time (milliseconds) = 0
          optimizer estimated row count: 82.00
          optimizer estimated cost: 207956.54

          Source result set:
          Table Scan ResultSet for GEOFF_PHOTOS_USER_PIC at read uncommitted isolation level using share row locking chosen by the optimizer
          Number of opens = 1
          Rows seen = 50
          Rows filtered = 0
          Fetch Size = 16
          constructor time (milliseconds) = 0
          open time (milliseconds) = 0
          next time (milliseconds) = 0
          close time (milliseconds) = 0
          next time in milliseconds/row = 0

          scan information:
          Bit set of columns fetched=

          {0, 2, 3, 4, 5, 6, 7}

          Number of columns fetched=7
          Number of pages visited=58
          Number of rows qualified=64
          Number of rows visited=67
          Scan type=heap
          start position:
          null
          stop position:
          null
          qualifiers:
          None
          optimizer estimated row count: 82.00
          optimizer estimated cost: 207956.54

          Sorry, I was wrong. I did not have an index covering all the columns; I only noticed it when I looked at the dump above. pic_CONTENT_TYPE and pic_LENGTH are not covered. I removed the columns not covered by the index and the query is fast. I also tried forcing the optimizer to use the PK index with "--DERBY-PROPERTIES index=GEOFF_PHOTOSUSER_PK", but that didn't speed it up, whereas simply making sure I had an index covering all the columns did. I'm not sure how realistic a strategy it is to maintain indexes covering all the columns on any table that has binary data. Could you comment on that?

          Knut Anders Hatlen added a comment -

          > I'm not sure how realistic a strategy it is to maintain indexes covering all the columns on any table that has binary data. Could you comment on that?

          Yes, that's why I called it a workaround. A proper fix for this problem would probably have to include one of the suggestions in DERBY-1506.

          geoff hendrey added a comment -

          Hi,
          I wasn't able to tell what changes were made. Was it a fix?

          Sent from my iPhone

          On Jun 29, 2010, at 2:14 PM, "Mike Matrigali (JIRA)" <jira@apache.org> wrote:

          [ https://issues.apache.org/jira/browse/DERBY-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

          Mike Matrigali updated DERBY-4675:
          ----------------------------------

          Component/s: SQL

          OFFSET/FETCH SYNTAX EFFICIENCY
          ------------------------------

          Key: DERBY-4675
          URL: https://issues.apache.org/jira/browse/DERBY-4675
          Project: Derby
          Issue Type: Improvement
          Components: SQL
          Affects Versions: 10.6.1.0
          Environment: unix
          Reporter: geoff hendrey

          Using OFFSET and FETCH isn't any faster for paging than using existing JDBC methods:
          //p is page number, n is rows per page
          setMaxRows(n*p);
          setFetchSize;
          //...
          rs.absolute((p-1)*n);
          When used on a table with blobs, both the JDBC way, and the OFFSET/FETCH way are unexpectedly slow ('unexpectedly' because I am not even retrieving the BLOB column).
          I need a way to do paging that doesn't slow down proportionately to the page I am requesting. It must also maintain this performance on tables with blobs. Following is discussion from users mail list, year or so ago.
          2) what are the performance implications for users of the embedded
          driver? In particular, with the embedded driver I am hoping that
          this feature allows portions of a result set to be retrieved without
          the overhead of retrieving the entire result set. For example, if I
          I am afraid that with embedded driver, you will only save a little CPU
          (by avoiding some JDBC calls) since under the hood, the code siphons
          off the rows till it hits the offset, so if you have a large offset,
          you will still incur reading of those rows (modulo page caching). In
          client/server driver context the savings are larger, of course, in
          that fewer rows are sent over the wire. For simple queries that can
          use an index, the optimizer could make use of the offset information
          to avoid reading the entire row when skipping rows before offset, just
          counting rows in the index to get to the first qualifying row, but
          this optimization is not yet implemented.
          Often, this feature is used together with ORDER BY which would entail
          some sorting of the result set and then all the rows would have to be
          read anyway. Again, for some simple queries, sort avoidance is used by
          the optimizer, so optimization is still possible for such queries.
          If you think this optimization is an important capability feel free to
          file an improvement issue for it.


          This message is automatically generated by JIRA.
          -
          You can reply to this email to add a comment to the issue online.


            People

            • Assignee:
              Unassigned
            • Reporter:
              geoff hendrey
            • Votes:
              1
            • Watchers:
              1

              Dates

              • Created:
                Updated:
