Derby
DERBY-1506

Full table scans of tables that contain blobs but don't use indexes still read all pages of the table, even when the blob data is not referenced

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.1.3.1
    • Fix Version/s: None
    • Component/s: Store
    • Labels: None

      Description

      A full table scan that does not use an index always reads every allocated page of the table through the cache. In two cases
      this means it logically reads more data than necessary: long rows (where the head columns span multiple pages) and long columns
      (columns where the data of a single column spans multiple pages). In both of these cases the space for the "overflow" portion of the
      row or column is currently stored in the same space as the regular "main" pages. In the current implementation of a table scan of
      a heap container, the access layer calls raw store for a linear list of main pages with rows; the raw store conglomerate
      implementation steps through each allocated page in the container and returns only the "main" pages (reading the overflow pages
      into cache, identifying them, and skipping them) to the access layer, which then returns rows as requested to the query
      processing layer.
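
      To make the mechanism concrete, here is a minimal sketch of that scan loop in Java (hypothetical types, not Derby's
      actual classes). The point is that a page must be pulled through the cache before the scan can tell that it is an
      overflow page and skip it:

      // Sketch only: illustrates why overflow pages are still read during a heap scan.
      interface Page {
          boolean isOverflowPage(); // overflow portion of a long row or long column
      }

      interface PageCache {
          Page getPage(long pageNumber); // faults the page in from disk if not cached
      }

      final class HeapScan {
          private final PageCache cache;
          private final long lastAllocatedPage;
          private long current = 1;

          HeapScan(PageCache cache, long lastAllocatedPage) {
              this.cache = cache;
              this.lastAllocatedPage = lastAllocatedPage;
          }

          // Returns the next "main" page, or null at end of container.
          Page nextMainPage() {
              while (current <= lastAllocatedPage) {
                  // The disk I/O happens here, even for pages about to be skipped:
                  Page p = cache.getPage(current++);
                  if (!p.isOverflowPage()) {
                      return p;
                  }
                  // Overflow page: already read from disk, now simply discarded.
              }
              return null;
          }
      }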

      If a table contains rows with very long columns (i.e. 2 GB blobs) and the table scan does not request the blob data, then a lot of data
      is read from disk that is not required by the query.
      A more unusual case is a table scan requesting a few columns from a table made up of 2 GB rows, each consisting entirely of columns
      smaller than 32 KB; in this case too, Derby will read all pages as part of the table scan even if the first column is the only
      required column of the chain.

      Note that this is only a problem for table scans of heap tables. In both cases, if an index is used to get the row, then ONLY the required data is
      read from disk. In the long column case the main row holds only a pointer to the overflow chain for the blob, and that chain will not be read unless the
      blob data is required. In the long row case, columns appear in the container in the order they were declared in the original "create table"
      statement; data is read from disk into cache for all columns from the first up to the "last" one referenced in the query, but data
      objects are instantiated from the cached data only for the columns referenced in the query.
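
      One way to confirm which access path was chosen is Derby's runtime statistics facility (SYSCS_SET_RUNTIMESTATISTICS
      and SYSCS_GET_RUNTIMESTATISTICS are existing Derby system routines; the database name here is illustrative, and
      BLOB_TABLE refers to the schema quoted in the comments below):

      import java.sql.*;

      public class CheckPlan {
          public static void main(String[] args) throws SQLException {
              try (Connection c = DriverManager.getConnection("jdbc:derby:testdb");
                   Statement s = c.createStatement()) {
                  s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");
                  try (ResultSet rs = s.executeQuery(
                          "SELECT BLOB_ID, BLOB_NAME FROM BLOB_TABLE")) {
                      while (rs.next()) { /* drain the result set */ }
                  }
                  // The statistics text reports whether a "Table Scan" or an
                  // "Index Scan" was used for the previous statement.
                  try (ResultSet rs = s.executeQuery(
                          "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()")) {
                      if (rs.next()) {
                          System.out.println(rs.getString(1));
                      }
                  }
              }
          }
      }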

      I have marked this low in priority because I believe that full, unindexed table scans of tables with gigabyte blobs are not the normal case. It seems that most applications would do keyed lookups of the table, but there may be applications that need to
      do full-table reporting on the non-blob data in such a table.


          Activity

          Mike Matrigali added a comment -

          This improvement report was prompted by the following performance report on the derby list:
          Hi all,

           When experimenting with BLOBs I ran into a performance issue
           that I cannot completely explain, but it could be a bug.

          Given the following table:

           CREATE TABLE BLOB_TABLE (
               BLOB_ID BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1),
               BLOB_SIZE BIGINT NOT NULL,
               BLOB_CRC BIGINT NOT NULL,
               BLOB_NAME VARCHAR(255) NOT NULL,
               BLOB_DATA BLOB(2G) NOT NULL,
               UNIQUE (BLOB_CRC, BLOB_SIZE),
               PRIMARY KEY (BLOB_ID)
           );

          which is populated with 27 rows,
          where the sum of all BLOB sizes is 5,885,060,164 bytes
          (about 200 MB average per BLOB, but ranging from 10 MB to 750 MB).

          Some queries on this table are executed really
          fast (almost instantaneous response).

          However, the following query needs about 10 minutes to complete:

          SELECT BLOB_ID, BLOB_NAME, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE;

           I reasoned that maybe Derby is scanning the whole table
           (including the blob contents), so I tried to add a dummy WHERE
           clause (dummy because all BLOB_IDs are greater than 0)
           to offer a clue as to which rows (all, of course) are needed,
           as follows:

          SELECT BLOB_ID, BLOB_NAME, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE WHERE BLOB_ID > 0;

          and that helped: instantaneous response.

          But I really think that the original query,
          without the where clause, should not be any slower.

          I am using Derby 10.1.3.1 embedded, Windows XP and Sun Java 1.5.0_06.
          Both queries executed with a Statement, not a PreparedStatement.

          Kind regards,

          Piet Blok
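
           A minimal JDBC harness along the following lines reproduces the timing difference Piet describes
           (a sketch; the database name is an assumption):

           import java.sql.*;

           public class ScanTiming {
               static void time(Connection c, String sql) throws SQLException {
                   long start = System.nanoTime();
                   try (Statement s = c.createStatement();
                        ResultSet rs = s.executeQuery(sql)) {
                       while (rs.next()) { /* fetch every row */ }
                   }
                   System.out.printf("%,d ms  %s%n",
                           (System.nanoTime() - start) / 1_000_000, sql);
               }

               public static void main(String[] args) throws SQLException {
                   try (Connection c = DriverManager.getConnection("jdbc:derby:blobdb")) {
                       // Full heap scan: pulls every page, blob overflow pages included.
                       time(c, "SELECT BLOB_ID, BLOB_NAME, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE");
                       // The dummy predicate lets the optimizer use the primary key index.
                       time(c, "SELECT BLOB_ID, BLOB_NAME, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE WHERE BLOB_ID > 0");
                   }
               }
           }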

          Mike Matrigali added a comment -

           The following approach was suggested by derby@seguel.com:

           Maybe a simple fix would be to segment the pages in the container. Store the
           relational information in the pages at the "top" of the container, and a
           reference to the first page of "blob space". Each row would then store
           a reference to the blob pages, relative to the start of the
           blob space.

          I wouldn't call this a bug per se, but a design issue.
          I would suggest opening up a JIRA case to focus on the data storage rather
          than the optimizer.

          Clearly Derby's storage of Blobs is inefficient.

           So, the key would be to make improvements to Derby's storage that would
           have minimal impact on existing implementations.

           If we extend the container concept to be more intelligent about the data and
           its pages in the "table", we can greatly improve performance. By segmenting
           the blobs away from the rest of the table data, we could increase the
           efficiency of the table, with the only sacrifice being an extra jump to get
           to the front of the blob. (Relatively speaking, this cost would be minimal
           considering that we're talking about blobs that for the most part span
           multiple pages.)
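
           Purely as an illustration of that layout (hypothetical types, not Derby code): the main row would
           carry only a small locator into the separate blob space, so a heap scan would never touch blob pages:

           // Sketch: what a row would store instead of an in-container overflow chain.
           final class BlobLocator {
               final long blobContainerId; // which "blob space" holds the data
               final long firstPageNumber; // first page of the blob's page chain
               final long lengthInBytes;   // total length of the blob

               BlobLocator(long blobContainerId, long firstPageNumber, long lengthInBytes) {
                   this.blobContainerId = blobContainerId;
                   this.firstPageNumber = firstPageNumber;
                   this.lengthInBytes = lengthInBytes;
               }
           }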

          Mike Matrigali added a comment -

           I haven't thought about this much, but the following approaches would all solve the problem, some more easily than others.
           They address the long column issue (I don't think the long row issue is as important):

           1) Provide an alternate heap and/or container implementation where the overflow pointer of an overflow column points to a page
           in another container, thus effectively moving "blob" space out of the current container. We need to decide how many
           blob spaces to have per table. Some options are: 1 per table, N per table (each growing to X bytes, where X may be the max
           file size on the device), 1 per blob column, ...

          I lean toward moving the blob space out of the current space rather than segmenting the current space to blob
          and non blob space. This would allow a possible easier path in the future to allow stuff like non-logged blobs.

           2) Provide an alternate (upgraded) implementation of the container where the page map tracks
           page type in addition to allocation state, or keep separate page maps per page type. Then a scan of "main" pages
           could use the page maps to efficiently get to the "next" main page. We should be careful not to make
           these new page maps a concurrency problem where multiple scans block each other on access to the
           page maps.

           3) For already indexed tables, figure out a way for the optimizer to use the index for the scan (I am likely to report this as a
           separate JIRA issue).

           4) For unindexed tables, assuming the fix for 3 is implemented, we could create an internal index on the table that would
           use existing technology and basically provide the functionality of #2, at the cost of maintaining the index. My initial
           take is that it is reasonable to assume some sort of index (or primary key) on large tables in applications
           that care about performance. (A sketch of this style of workaround follows below.)
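
           For reference, a user-level approximation of option 4 is possible today (a sketch with illustrative
           names; whether the query is then satisfied from the index alone depends on the optimizer): create an
           index covering the scanned non-blob columns and steer the scan onto it with a predicate.

           import java.sql.*;

           public class IndexWorkaround {
               public static void main(String[] args) throws SQLException {
                   try (Connection c = DriverManager.getConnection("jdbc:derby:blobdb");
                        Statement s = c.createStatement()) {
                       // For a table without a primary key, an index on the scanned
                       // non-blob columns gives the optimizer an access path that
                       // avoids the heap (and with it, the blob overflow pages):
                       s.execute("CREATE INDEX BLOB_TABLE_META "
                               + "ON BLOB_TABLE (BLOB_ID, BLOB_SIZE, BLOB_CRC)");
                       // The predicate nudges the optimizer onto the index path:
                       try (ResultSet rs = s.executeQuery(
                               "SELECT BLOB_ID, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE "
                               + "WHERE BLOB_ID > 0")) {
                           while (rs.next()) { /* no blob pages needed */ }
                       }
                   }
               }
           }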

          Trejkaz added a comment -

          This issue is the most likely cause for a performance issue we are encountering when indexing a table.

           For performance reasons, we do all the inserts with no index and then create the index in one shot at the end; however, because all the blob data gets read back in unnecessarily, the indexing speed is impacted, taking away some or all of the benefit of delaying the indexing to the end. (I will have to do some timing to see whether doing it this way around is still more efficient.) The pattern is sketched below.
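
           The load pattern described above looks roughly like this (a sketch; the index name is hypothetical
           and the schema is the BLOB_TABLE from the earlier comment). Under this issue, the final CREATE INDEX
           still drags every blob page through the cache:

           import java.sql.*;

           public class DeferredIndexLoad {
               public static void main(String[] args) throws SQLException {
                   try (Connection c = DriverManager.getConnection("jdbc:derby:blobdb")) {
                       c.setAutoCommit(false);
                       try (PreparedStatement ps = c.prepareStatement(
                               "INSERT INTO BLOB_TABLE (BLOB_SIZE, BLOB_CRC, BLOB_NAME, BLOB_DATA) "
                               + "VALUES (?, ?, ?, ?)")) {
                           // ... perform all the bulk inserts here, unindexed ...
                       }
                       try (Statement s = c.createStatement()) {
                           // Built in one shot at the end; the underlying table scan
                           // reads the whole container, blob overflow pages included.
                           s.execute("CREATE INDEX BLOB_NAME_IDX ON BLOB_TABLE (BLOB_NAME)");
                       }
                       c.commit();
                   }
               }
           }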


            People

            • Assignee: Unassigned
            • Reporter: Mike Matrigali
            • Votes: 6
            • Watchers: 7
