Derby
DERBY-3753

select from table with integer primary key and blob column does not do sort avoidance

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.3.0, 10.4.1.3, 10.5.1.1
    • Fix Version/s: None
    • Component/s: SQL
    • Urgency: Normal
    • Issue & fix info: High Value Fix, Repro attached

      Description

      While working on DERBY-3732 I noticed that a select from a table with a primary key and a blob column does not do sort avoidance. If the optimizer directive is removed, the fixture below from BlobMemTest runs out of memory during the sort when memory.BlobMemTest is run with a 16 MB heap.

      private void testBlobLength(boolean lengthless)
              throws SQLException, IOException, IllegalArgumentException,
                     IllegalAccessException, InvocationTargetException {
          getConnection().setAutoCommit(false);
          Statement s = createStatement();
          s.executeUpdate("CREATE TABLE BLOBTAB (K INT CONSTRAINT PK PRIMARY KEY, B BLOB("
                  + LONG_BLOB_LENGTH + "))");

          PreparedStatement ps = prepareStatement("INSERT INTO BLOBTAB VALUES(?,?)");
          // We allocate 16MB for the test so use something bigger than that.
          ps.setInt(1, 1);
          LoopingAlphabetStream stream = new LoopingAlphabetStream(LONG_BLOB_LENGTH);
          if (lengthless) {
              Method m = null;
              try {
                  Class c = ps.getClass();
                  m = c.getMethod("setBinaryStream",
                          new Class[] {Integer.TYPE, InputStream.class});
              } catch (NoSuchMethodException e) {
                  // ignore method not found as the method may not be present for
                  // jdk's lower than 1.6.
                  println("Skipping lengthless insert because method is not available");
                  return;
              }
              m.invoke(ps, new Object[] {new Integer(2), stream});
          } else {
              ps.setBinaryStream(2, stream, LONG_BLOB_LENGTH);
          }
          ps.executeUpdate();
          // insert a zero length blob.
          ps.setInt(1, 2);
          ps.setBytes(2, new byte[] {});
          ps.executeUpdate();
          // insert a null blob.
          ps.setInt(1, 3);
          ps.setBytes(2, null);
          ps.executeUpdate();
          // insert a short blob
          ps.setInt(1, 4);
          ps.setBytes(2, SHORT_BLOB_BYTES);
          ps.executeUpdate();
          // Currently need to use optimizer override to force use of the index.
          // Derby should use sort avoidance and do it automatically, but there
          // appears to be a bug.
          ResultSet rs = s.executeQuery("SELECT K, LENGTH(B), B FROM BLOBTAB" +
                  "-- DERBY-PROPERTIES constraint=pk\n ORDER BY K");
          rs.next();
          assertEquals(LONG_BLOB_LENGTH_STRING, rs.getString(2));
          // make sure we can still access the blob after getting the length.
          // It should be ok because we reset the stream.
          InputStream rsstream = rs.getBinaryStream(3);
          int len = 0;
          byte[] buf = new byte[32672];
          for (;;) {
              int size = rsstream.read(buf);
              if (size == -1)
                  break;
              len += size;
              int expectedValue = ((len - 1) % 26) + 'a';
              if (size != 0)
                  assertEquals(expectedValue, buf[size - 1]);
          }
          assertEquals(LONG_BLOB_LENGTH, len);
          // empty blob
          rs.next();
          assertEquals("0", rs.getString(2));
          byte[] bytes = rs.getBytes(3);
          assertEquals(0, bytes.length);
          // null blob
          rs.next();
          assertEquals(null, rs.getString(2));
          bytes = rs.getBytes(3);
          assertEquals(null, bytes);
          // short blob
          rs.next();
          assertEquals("3", rs.getString(2));
          bytes = rs.getBytes(3);
          assertTrue(Arrays.equals(SHORT_BLOB_BYTES, bytes));
          rs.close();

          // Select just the length without selecting the blob.
          rs = s.executeQuery("SELECT K, LENGTH(B) FROM BLOBTAB " +
                  "ORDER BY K");
          JDBC.assertFullResultSet(rs, new String[][] {
              {"1", LONG_BLOB_LENGTH_STRING},
              {"2", "0"},
              {"3", null},
              {"4", "3"}});
      }

      Attachments

      1. Derby3753.java (1 kB, Kathey Marsden)
      2. derby.log (6 kB, Kathey Marsden)


          Activity

          Kathey Marsden added a comment - edited

          Attached is a stand-alone repro for this issue, but you have to look at derby.log to see the plans. The first query in the log is run without the optimizer directive; the second is run with the directive to use the index. Without the directive the query does a table scan, and its cost is estimated to be lower than that of using the index. I thought there might be a difference if I added more rows, but with 1000 rows I get the same result.

          On first run, run
          java Derby3753 load

          On subsequent runs, no need to specify the load option.
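
          A minimal sketch of one way to capture the plans described above, assuming the standard derby.language.logQueryPlan property; the class, database name, and query-draining helper below are placeholders, and the attached Derby3753.java may set things up differently:

          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.ResultSet;
          import java.sql.Statement;

          public class ShowPlans {
              public static void main(String[] args) throws Exception {
                  // Write every executed query plan to derby.log. This is an
                  // assumption about setup; the attached repro may enable plan
                  // logging differently.
                  System.setProperty("derby.language.logQueryPlan", "true");

                  // Placeholder database name; use the database the repro loaded.
                  Connection conn = DriverManager.getConnection("jdbc:derby:wombat");
                  Statement s = conn.createStatement();

                  // First query: no optimizer directive; the logged plan shows a
                  // table scan feeding the sort.
                  drain(s.executeQuery(
                          "SELECT K, LENGTH(B), B FROM BLOBTAB ORDER BY K"));

                  // Second query: optimizer override forcing the PK index, which
                  // avoids the sort.
                  drain(s.executeQuery(
                          "SELECT K, LENGTH(B), B FROM BLOBTAB" +
                          "-- DERBY-PROPERTIES constraint=pk\n ORDER BY K"));

                  conn.close();
              }

              private static void drain(ResultSet rs) throws Exception {
                  while (rs.next()) {
                      // fetch only; the point is the plan written to derby.log
                  }
                  rs.close();
              }
          }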

          Knut Anders Hatlen added a comment -

          Triaged for 10.5.2. Workaround exists (optimizer override).
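
          The override referred to is the one already used in the fixture in the description; restated as a minimal fragment, with s and BLOBTAB as in that fixture:

          // Workaround: force the PK index via an optimizer override so the
          // ordered scan avoids the sort (and the out-of-memory) entirely.
          ResultSet rs = s.executeQuery(
                  "SELECT K, LENGTH(B), B FROM BLOBTAB" +
                  "-- DERBY-PROPERTIES constraint=pk\n ORDER BY K");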

          Knut Anders Hatlen added a comment -

          By the way, the repro doesn't actually insert any data. The for loop only calls setInt() and setBytes(), not executeUpdate().
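
          A hypothetical sketch of the corrected load loop; the actual loop is in the attached Derby3753.java, so the method shape, row count, and payload below are assumptions (java.sql imports assumed):

          // Assumed shape only: the missing piece is executeUpdate() inside the
          // loop, so each prepared row actually gets inserted.
          static void load(Connection conn, byte[] payload) throws SQLException {
              conn.setAutoCommit(false);
              PreparedStatement ps =
                      conn.prepareStatement("INSERT INTO BLOBTAB VALUES(?,?)");
              for (int k = 1; k <= 1000; k++) {
                  ps.setInt(1, k);
                  ps.setBytes(2, payload);
                  ps.executeUpdate();   // the call the attached repro leaves out
              }
              ps.close();
              conn.commit();
          }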

          Mike Matrigali added a comment -

          Rather than sort avoidance, what might be more interesting is whether there is any way to avoid sending the blob to the sorter. I would not be surprised if the most efficient way to execute this query is to do a sequential scan of the base table, compute the length there, and only send (key, blob_length) through the sorter, since every single row of the table is looked at anyway. The sequential scan might be more I/O efficient than random probes from the index.

          Definitely should test whether 1 vs. 1000 rows makes any difference.

          I would not be surprised if the sort estimates leave out the blob while the sort itself still includes it. This should be run against trunk to see if it still breaks, given all the blob optimization that has been done in the last few releases.
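
          Not an engine fix, but the shape described above can be approximated from the application today: sort only (K, LENGTH(B)) and then probe each blob through the primary key. A sketch under that assumption, reusing conn and s from the fixture, with java.sql/java.io imports assumed and exception handling omitted:

          // Only (K, LENGTH(B)) goes through the sorter; each blob is then read
          // via a primary-key probe. Illustration only, not the proposed change.
          PreparedStatement probe =
                  conn.prepareStatement("SELECT B FROM BLOBTAB WHERE K = ?");
          ResultSet keys = s.executeQuery(
                  "SELECT K, LENGTH(B) FROM BLOBTAB ORDER BY K");
          while (keys.next()) {
              probe.setInt(1, keys.getInt(1));
              ResultSet blob = probe.executeQuery();
              if (blob.next()) {
                  InputStream in = blob.getBinaryStream(1);
                  // consume the stream for this key; it never went through the sort
                  in.close();
              }
              blob.close();
          }
          keys.close();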


            People

            • Assignee: Unassigned
            • Reporter: Kathey Marsden
            • Votes: 0
            • Watchers: 0
