Derby
  1. Derby
  2. DERBY-2487 Enhance Derby with EXPLAIN Functionality
  3. DERBY-4062

Remove single-argument getDataValue overrides from DataValueFactory interface

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.6.1.0
    • Component/s: None
    • Labels:
      None
    • Bug behavior facts:
      Data corruption, Wrong query result

      Description

      [ Issue title edited to reflect the discussion about how to clarify the use of this interface. ]

      I believe the problem involves o.a.d.iapi.types.DataValueFactory.
      This interface defines dozens and dozens of overloads of the method
      getDataValue(), for lots of different combinations of datatypes.

      For most of the Java "boxed" types (Short, Long, Float, Double, etc.),
      DataValueFactory defines a pair of getDataValue() methods. For example,
      here are the method pair that the interface defines for Short:

      /**

      • Get a SQL smallint with the given value. A null argument means get
      • a SQL null value. The second form uses the previous value (if non-null)
      • to hold the return value.
        *
        */
        NumberDataValue getDataValue(Short value);
        NumberDataValue getDataValue(Short value, NumberDataValue previous)
        throws StandardException;

      HOWEVER, for the Integer type, DataValueFactory doesn't define both overloads,
      but only defines the 'previous'-style overload:

      /**

      • Get a SQL int with the given value. A null argument means get
      • a SQL null value. Uses the previous value (if non-null)
      • to hold the return value.
        *
        */
        NumberDataValue getDataValue(Integer value, NumberDataValue previous)
        throws StandardException;

      The actual implementation, in o.a.d.iapi.types.DataValueFactoryImpl, though,
      does implement both the Integer overloads. But this method is NOT present
      in the DataValueFactory interface:

      NumberDataValue getDataValue(Integer value);

      Because this method is not present in the interface, code such as

      row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens));

      which the code anticipates will invoke the above method, instead calls the method

      public UserDataValue getDataValue(Object value);

      which has a very different behavior (instead of returning a SQLInteger, it returns a UserType).

      This accidental invocation of the wrong implementation method was causing data corruption
      errors in regression tests for the DERBY-2487 patch, which uses the above setColumn call.
      Instead of inserting SQLInteger values into the system table, the code was inserting
      java.lang.Integer UserType values; since those values don't match the defined type of
      the column(s) in the system catalog, the table appeared to be corrupt.

      I believe that this problem never affects external Derby applications, but only internal Derby code,
      as the DataValueFactory interface is an internal interface only. Still, since it appeared to
      cause data corruption and invalid query results, it is potentially a quite serious problem.

      See this thread in the derby-dev archives for a bit more discussion:
      http://mail-archives.apache.org/mod_mbox/db-derby-dev/200902.mbox/%3C4997818E.3080007@amberpoint.com%3E

      1. addToInterface.diff
        0.6 kB
        Bryan Pendleton
      2. getDataValueCalls.out
        9 kB
        Bryan Pendleton
      3. removeSingleArgOverrides.diff
        29 kB
        Bryan Pendleton
      4. changesFromReview.diff
        41 kB
        Bryan Pendleton

        Activity

        Hide
        Bryan Pendleton added a comment -

        Attached patch adds the indicated method to the DataValueFactory interface.

        The regular regression tests ran clean, except that I can't run the upgrade tests
        right now due to svn.apache.org being down.

        Can anyone think of a straightforward way to write a regression test for this?
        Since the DataValueFactory API is an internal class, it's hard to expose
        this problem from one of the typical JDBC JUnit tests. Can one of our
        JUnit tests get an instance of the DataValueFactoryImpl in a simple manner?

        Show
        Bryan Pendleton added a comment - Attached patch adds the indicated method to the DataValueFactory interface. The regular regression tests ran clean, except that I can't run the upgrade tests right now due to svn.apache.org being down. Can anyone think of a straightforward way to write a regression test for this? Since the DataValueFactory API is an internal class, it's hard to expose this problem from one of the typical JDBC JUnit tests. Can one of our JUnit tests get an instance of the DataValueFactoryImpl in a simple manner?
        Hide
        Knut Anders Hatlen added a comment -

        Myrna mentioned on derby-dev that the method used to be there. I checked and it's still in the 10.0-10.2 branches. It was removed in this commit:

        ------------------------------------------------------------------------
        r540791 | djd | 2007-05-23 01:34:39 +0200 (Wed, 23 May 2007) | 2 lines

        Remove some unused methods in DataValueFactory that fetched new DataValueDescriptors without passing
        in a holder object.
        ------------------------------------------------------------------------

        Show
        Knut Anders Hatlen added a comment - Myrna mentioned on derby-dev that the method used to be there. I checked and it's still in the 10.0-10.2 branches. It was removed in this commit: ------------------------------------------------------------------------ r540791 | djd | 2007-05-23 01:34:39 +0200 (Wed, 23 May 2007) | 2 lines Remove some unused methods in DataValueFactory that fetched new DataValueDescriptors without passing in a holder object. ------------------------------------------------------------------------
        Hide
        Bryan Pendleton added a comment -

        Thanks for tracking the commit down, Knut. That change occurred
        after the DERBY-2487 patch was originally developed, which
        helps explain why the original developer of that patch didn't notice this.

        I'm a little confused about why the change removed some of
        the DVF single-argument API overloads, but not all of them.

        For example, revision 540791 removed getDataValue(short)
        from the DataValueFactory interface, but left getDataValue(Short).

        But the reverse change occurred with int/Integer: revision 540791
        removed getDataValue(Integer) from the DataValueFactory interface,
        but left getDataValue(int). So that's confusing.

        Also, the change removed the implementation of getDataValue(short)
        as well as the interface, but for getDataValue(Integer) the method
        was removed only from the interface, not from the implementation.

        I suppose that one alternative is to leave this alone as is, and to alter
        the DERBY-2487 code so that instead of calling getDataValue(Integer),
        it calls getDataValue(Integer, NumberDataValue previous), and pass
        a NULL value for previous. Thus change:

        row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens));

        to

        row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens, (NumberDataValue)null));

        and similarly for all the other Integer columns in the new system tables.

        It's a little bit scary to leave this interface in this strange state where, due
        to argument promotion rules, the caller may not be completely clear on
        what actual overridden method will be called.

        Show
        Bryan Pendleton added a comment - Thanks for tracking the commit down, Knut. That change occurred after the DERBY-2487 patch was originally developed, which helps explain why the original developer of that patch didn't notice this. I'm a little confused about why the change removed some of the DVF single-argument API overloads, but not all of them. For example, revision 540791 removed getDataValue(short) from the DataValueFactory interface, but left getDataValue(Short). But the reverse change occurred with int/Integer: revision 540791 removed getDataValue(Integer) from the DataValueFactory interface, but left getDataValue(int). So that's confusing. Also, the change removed the implementation of getDataValue(short) as well as the interface, but for getDataValue(Integer) the method was removed only from the interface, not from the implementation. I suppose that one alternative is to leave this alone as is, and to alter the DERBY-2487 code so that instead of calling getDataValue(Integer), it calls getDataValue(Integer, NumberDataValue previous), and pass a NULL value for previous. Thus change: row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens)); to row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens, (NumberDataValue)null)); and similarly for all the other Integer columns in the new system tables. It's a little bit scary to leave this interface in this strange state where, due to argument promotion rules, the caller may not be completely clear on what actual overridden method will be called.
        Hide
        Knut Anders Hatlen added a comment -

        I find this change confusing too. I'd say we should either remove all the methods without a previous parameter, or we should keep all of them. The existence of a catch-all method (getDataValue(Object)) that silently returns another data type than expected makes it too risky to have the mix that we have now, in my opinion.

        If we follow the pattern of the 540791 commit, we should replace calls to dvf.getDataValue() without a previous argument with a direct call to the constructor of the data type we want. So we would change

        row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens));

        to

        row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, new SQLInteger(no_opens));

        I see that that's what many other row factories do as well, and in some ways I think it's easier to read as you immediately see what's the intended data type for that column. And the SQL* classes are in the same iapi package as the DataValueFactory interface, so it's not like we're exposing implementation details by making such a change.

        Show
        Knut Anders Hatlen added a comment - I find this change confusing too. I'd say we should either remove all the methods without a previous parameter, or we should keep all of them. The existence of a catch-all method (getDataValue(Object)) that silently returns another data type than expected makes it too risky to have the mix that we have now, in my opinion. If we follow the pattern of the 540791 commit, we should replace calls to dvf.getDataValue() without a previous argument with a direct call to the constructor of the data type we want. So we would change row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, dvf.getDataValue(no_opens)); to row.setColumn(SYSXPLAIN_RESULTSET_NO_OPENS, new SQLInteger(no_opens)); I see that that's what many other row factories do as well, and in some ways I think it's easier to read as you immediately see what's the intended data type for that column. And the SQL* classes are in the same iapi package as the DataValueFactory interface, so it's not like we're exposing implementation details by making such a change.
        Hide
        Kristian Waagan added a comment -

        I guess the DataValueFactory interface (and the two implementations) were introduced to allow for multiple setups.
        Is it safe to create the data types directly? I'm thinking of cases like DECIMAL, where we need a special implementation to support J2ME.

        Show
        Kristian Waagan added a comment - I guess the DataValueFactory interface (and the two implementations) were introduced to allow for multiple setups. Is it safe to create the data types directly? I'm thinking of cases like DECIMAL, where we need a special implementation to support J2ME.
        Hide
        Knut Anders Hatlen added a comment -

        The methods that have different implementations have their own separate names (getDecimalDataValue() instead of getDataValue()) so I don't think they will be affected if we remove the getDataValue() methods.

        Show
        Knut Anders Hatlen added a comment - The methods that have different implementations have their own separate names (getDecimalDataValue() instead of getDataValue()) so I don't think they will be affected if we remove the getDataValue() methods.
        Hide
        Bryan Pendleton added a comment -

        I think that removing the single-argument overrides would clarify the use of
        the interface, and so for the present I changed the issue to title to reflect that idea.

        Show
        Bryan Pendleton added a comment - I think that removing the single-argument overrides would clarify the use of the interface, and so for the present I changed the issue to title to reflect that idea.
        Hide
        Bryan Pendleton added a comment -

        There are a couple dozen calls to the single-argument version
        of getDataValue(). It seems that if I could replace those calls
        either with (a) equivalent calls to the multi-argument versions,
        or with (b) direct calls to the underlying data type constructors,
        then I could remove all the single-argument getDataValue
        overrides from both the interface and the implementation,
        and the result would be less code and simpler code.

        I'll investigate a possible patch that does this.

        Show
        Bryan Pendleton added a comment - There are a couple dozen calls to the single-argument version of getDataValue(). It seems that if I could replace those calls either with (a) equivalent calls to the multi-argument versions, or with (b) direct calls to the underlying data type constructors, then I could remove all the single-argument getDataValue overrides from both the interface and the implementation, and the result would be less code and simpler code. I'll investigate a possible patch that does this.
        Hide
        Bryan Pendleton added a comment -

        Consider this little snippet of code from SQLBooleanConstantNode.java:

        if ( val == null )

        { setValue(getTypeServices().getNull() ); }
        else
        { setValue(getDataValueFactory().getDataValue(val.booleanValue())); }

        It seems like there are (at least) 4 possible ways to re-write it, with (I believe) the same result.

        The first way I thought of was to replace the call to the single-argument version of
        getDataValue(boolean) with a call to getDataValue(boolean, BooleanDataValue). This makes
        this bit of code a little bit messier, but it allows us to remove getDataValue(boolean) from
        the DataValueFactory interface.

        if ( val == null )
        { setValue(getTypeServices().getNull() ); }

        else

        { setValue(getDataValueFactory().getDataValue(val.booleanValue(), (BooleanDataValue)null)); }

        The second way I thought of is to use Knut's suggestion of directly calling the SQLBoolean
        constructor that we want, which in this case is SQLBoolean(boolean):

        if ( val == null )

        { setValue(getTypeServices().getNull() ); }

        else

        { setValue(new SQLBoolean(val.booleanValue())); }

        The third way to handle this is to observe that DataValueFactoryImpl.getDataValue(Boolean)
        already knows what to do if the passed-in Boolean object is null, so we can simplify
        this entire code in SQLBooleanConstantNode.java, I think, with the following:

        setValue(getDataValueFactory().getDataValue(val));

        The fourth way to handle this is to again eliminate the call to the DataValueFactory,
        which doesn't seem to be adding much value here, and simply construct a SQLBoolean
        object using the SQLBoolean(Boolean) constructor, which, too, knows what to do if
        the passed-in Boolean object is null, so we can just write:

        setValue(new SQLBoolean(val));

        This last code seems quite clear and simple, but I'm worried that maybe I'm over-simplifying
        things too much? Still, I think I'll at least pursue this 4th approach, and see what sort of
        a patch proposal I end up with.

        Show
        Bryan Pendleton added a comment - Consider this little snippet of code from SQLBooleanConstantNode.java: if ( val == null ) { setValue(getTypeServices().getNull() ); } else { setValue(getDataValueFactory().getDataValue(val.booleanValue())); } It seems like there are (at least) 4 possible ways to re-write it, with (I believe) the same result. The first way I thought of was to replace the call to the single-argument version of getDataValue(boolean) with a call to getDataValue(boolean, BooleanDataValue). This makes this bit of code a little bit messier, but it allows us to remove getDataValue(boolean) from the DataValueFactory interface. if ( val == null ) { setValue(getTypeServices().getNull() ); } else { setValue(getDataValueFactory().getDataValue(val.booleanValue(), (BooleanDataValue)null)); } The second way I thought of is to use Knut's suggestion of directly calling the SQLBoolean constructor that we want, which in this case is SQLBoolean(boolean): if ( val == null ) { setValue(getTypeServices().getNull() ); } else { setValue(new SQLBoolean(val.booleanValue())); } The third way to handle this is to observe that DataValueFactoryImpl.getDataValue(Boolean) already knows what to do if the passed-in Boolean object is null, so we can simplify this entire code in SQLBooleanConstantNode.java, I think, with the following: setValue(getDataValueFactory().getDataValue(val)); The fourth way to handle this is to again eliminate the call to the DataValueFactory, which doesn't seem to be adding much value here, and simply construct a SQLBoolean object using the SQLBoolean(Boolean) constructor, which, too, knows what to do if the passed-in Boolean object is null, so we can just write: setValue(new SQLBoolean(val)); This last code seems quite clear and simple, but I'm worried that maybe I'm over-simplifying things too much? Still, I think I'll at least pursue this 4th approach, and see what sort of a patch proposal I end up with.
        Hide
        Knut Anders Hatlen added a comment -

        Alternative 4 sounds good to me. I think both (3) and (4) are better than (1) and (2), but (3) still uses the single-argument getDataValue method (I assume that's just a typo?).

        Show
        Knut Anders Hatlen added a comment - Alternative 4 sounds good to me. I think both (3) and (4) are better than (1) and (2), but (3) still uses the single-argument getDataValue method (I assume that's just a typo?).
        Hide
        Bryan Pendleton added a comment -

        Knut, thanks for the feedback. You're right, that was a typo in my previous comment,
        I had meant to suggest the alternative of switching to the 2-argument getDataValue() call.

        But I also feel that the 4th alternative, the one which directly invokes the SQLtype constructor,
        is the cleanest, so I went ahead with a prototype of this approach.

        Attached is removeSingleArgOverrides.diff, a patch proposal. I think it's appealing
        because not only does it remove the core problem that this issue is concerned with
        (ambiguity in method overriding in the DataValueFactory interface), but it also does
        so by deleting code (which is good), and by making the code more self-evident, I think.

        For example, consider this bit from SYSCONGLOMERATESRowFactory.java:

        The current code says:

        /* 3rd column is CONGLOMERATENUMBER (long) */
        row.setColumn(3, dvf.getDataValue(conglomNumber));

        The new code says:
        /* 3rd column is CONGLOMERATENUMBER (long) */
        row.setColumn(3, new SQLLongint(conglomNumber));

        I think that the new code is clearer.

        Please have a look at the patch and let me know what you think.

        Show
        Bryan Pendleton added a comment - Knut, thanks for the feedback. You're right, that was a typo in my previous comment, I had meant to suggest the alternative of switching to the 2-argument getDataValue() call. But I also feel that the 4th alternative, the one which directly invokes the SQLtype constructor, is the cleanest, so I went ahead with a prototype of this approach. Attached is removeSingleArgOverrides.diff, a patch proposal. I think it's appealing because not only does it remove the core problem that this issue is concerned with (ambiguity in method overriding in the DataValueFactory interface), but it also does so by deleting code (which is good), and by making the code more self-evident, I think. For example, consider this bit from SYSCONGLOMERATESRowFactory.java: The current code says: /* 3rd column is CONGLOMERATENUMBER (long) */ row.setColumn(3, dvf.getDataValue(conglomNumber)); The new code says: /* 3rd column is CONGLOMERATENUMBER (long) */ row.setColumn(3, new SQLLongint(conglomNumber)); I think that the new code is clearer. Please have a look at the patch and let me know what you think.
        Hide
        Knut Anders Hatlen added a comment -

        The patch looks like an improvement to me. +1 to commit it as it is.

        Is there any particular reason why you want to keep the single-arg method that takes an Object and returns a UserType?

        I think some of the methods in DataValueFactoryImpl can be simplified even further. For example:

        if (previous == null)

        • return getDataValue(value);
          + { + if (value != null) + return new SQLSmallint(value.shortValue()); + else + return new SQLSmallint(); + }

        Could be: if (previous == null) return new SQLSmallint(value);

        Show
        Knut Anders Hatlen added a comment - The patch looks like an improvement to me. +1 to commit it as it is. Is there any particular reason why you want to keep the single-arg method that takes an Object and returns a UserType? I think some of the methods in DataValueFactoryImpl can be simplified even further. For example: if (previous == null) return getDataValue(value); + { + if (value != null) + return new SQLSmallint(value.shortValue()); + else + return new SQLSmallint(); + } Could be: if (previous == null) return new SQLSmallint(value);
        Hide
        Bryan Pendleton added a comment -

        Thanks Knut for the careful review. I've updated the patch, and
        I think it's looking quite clean.

        My regression test runs were successful; I think this patch is
        ready for any last review prior to commit. Please let me know
        of any additional comments or suggestions.

        Show
        Bryan Pendleton added a comment - Thanks Knut for the careful review. I've updated the patch, and I think it's looking quite clean. My regression test runs were successful; I think this patch is ready for any last review prior to commit. Please let me know of any additional comments or suggestions.
        Hide
        Knut Anders Hatlen added a comment -

        Thanks for the updated patch, Bryan. I don't have any further comments. +1

        Show
        Knut Anders Hatlen added a comment - Thanks for the updated patch, Bryan. I don't have any further comments. +1
        Hide
        Bryan Pendleton added a comment -

        Committed to the trunk as revision 774617.

        Show
        Bryan Pendleton added a comment - Committed to the trunk as revision 774617.

          People

          • Assignee:
            Bryan Pendleton
            Reporter:
            Bryan Pendleton
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development