Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-906

Avatica JdbcMeta statement IDs are not unique

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: avatica
    • Labels:
      None

      Description

      There seems to be a concurrency-related problem with the statementId that is generated in the JdbcMeta#createStatement for statements in the statementCache.

      We use avatica to create a driver which uses the remote RPC protocol to wrap an existing jdbc driver on the server. We have a test which performs concurrent queries on multiple connections (using apache commons-pool) which fails often.

      If it fails, the following two things are always observed:

      • A java.lang.AssertionError on the assert in Meta#count(String connectionId, int statementId, long updateCount), resulting in the server to send a http 500.
      • When logging all used connectionId's and statementId's, I observe the same statementId to be re-used for different connectionId's.

      I don't know exactly how these two problems are related, but it looks like statementId's are supposed to be unique and currently they are not.

      The current approach is to use System.identityHashCode(statement) to calculate a statementId. Simply replacing this with a random int seems to solve the problem.

      Depending on what the actual uniqueness requirements for the statementId are, a UUID might be even better (but will have impact on the RPC) or an AtomicInteger.

      I'll attach a patch for the random int fix.

      1. CALCITE-906.patch
        2 kB
        Jan Van Besien
      2. CALCITE-906-attempt-to-reproduce.patch
        6 kB
        Jan Van Besien

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment -

          Random is bad for the same reason that identityHashCode is bad, just longer gaps between clashes. I reckon about 2^16, per the birthday paradox.

          I'm going to try AtomicInteger.

          Nick Dimiduk, Vladimir Sitnikov, Josh Elser What do you think?

          Show
          julianhyde Julian Hyde added a comment - Random is bad for the same reason that identityHashCode is bad, just longer gaps between clashes. I reckon about 2^16, per the birthday paradox. I'm going to try AtomicInteger. Nick Dimiduk , Vladimir Sitnikov , Josh Elser What do you think?
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          1) Do we really want globally-unique statement ids?
          2) I do not like random int either
          3) AtomicInteger per connection might be good enough

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - 1) Do we really want globally-unique statement ids? 2) I do not like random int either 3) AtomicInteger per connection might be good enough
          Hide
          elserj Josh Elser added a comment -

          I'm going to try AtomicInteger.

          Sounds like a plan for the present. Probably more math+thought to be put into a long-term solution? I think it'd also behoove us to think about callers sharing connectionIDs and statementIDs too (or at least include that in the math).

          Show
          elserj Josh Elser added a comment - I'm going to try AtomicInteger. Sounds like a plan for the present. Probably more math+thought to be put into a long-term solution? I think it'd also behoove us to think about callers sharing connectionIDs and statementIDs too (or at least include that in the math).
          Hide
          vladimirsitnikov Vladimir Sitnikov added a comment -

          Probably more math+thought to be put into a long-term solution?

          AtomicLong?

          Show
          vladimirsitnikov Vladimir Sitnikov added a comment - Probably more math+thought to be put into a long-term solution? AtomicLong?
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Do we really want globally-unique statement ids?

          We don't need globally unique. Just unique within a JdbcMeta (across all connections, for the life of the JdbcMeta).

          Also, we don't need to be able to look at a statement and discern its ID. We generate one when it is created, and store it in the StatementHandle.

          There are not going to be more than 2^31 statements in the lifetime of a JdbcMeta, so we're OK with AtomicInteger.

          Show
          julianhyde Julian Hyde added a comment - - edited Do we really want globally-unique statement ids? We don't need globally unique. Just unique within a JdbcMeta (across all connections, for the life of the JdbcMeta). Also, we don't need to be able to look at a statement and discern its ID. We generate one when it is created, and store it in the StatementHandle. There are not going to be more than 2^31 statements in the lifetime of a JdbcMeta, so we're OK with AtomicInteger.
          Hide
          julianhyde Julian Hyde added a comment -

          For what it's worth, this fix does not solve the intermittent problem in the test suite. Against 777f780 I just got this:

          Tests run: 40, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 2.152 sec <<< FAILURE! - in org.apache.calcite.avatica.RemoteDriverTest
          testConnectionIsolation[0](org.apache.calcite.avatica.RemoteDriverTest)  Time elapsed: 0.027 sec  <<< FAILURE!
          java.lang.AssertionError: statement creation implicitly creates a connection server-side expected:<2> but was:<1>
                  at org.junit.Assert.fail(Assert.java:88)
                  at org.junit.Assert.failNotEquals(Assert.java:743)
                  at org.junit.Assert.assertEquals(Assert.java:118)
                  at org.junit.Assert.assertEquals(Assert.java:555)
                  at org.apache.calcite.avatica.RemoteDriverTest.testConnectionIsolation(RemoteDriverTest.java:749)
          

          Jan Van Besien Do you actually have a test case for this issue?

          Show
          julianhyde Julian Hyde added a comment - For what it's worth, this fix does not solve the intermittent problem in the test suite. Against 777f780 I just got this: Tests run: 40, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 2.152 sec <<< FAILURE! - in org.apache.calcite.avatica.RemoteDriverTest testConnectionIsolation[0](org.apache.calcite.avatica.RemoteDriverTest) Time elapsed: 0.027 sec <<< FAILURE! java.lang.AssertionError: statement creation implicitly creates a connection server-side expected:<2> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.calcite.avatica.RemoteDriverTest.testConnectionIsolation(RemoteDriverTest.java:749) Jan Van Besien Do you actually have a test case for this issue?
          Hide
          janvanbesien Jan Van Besien added a comment -

          We have a test in our test suite (not calcite) which fails consistently on this problem (and works with the random-fix). I've attached a patch where I try to do the same within calcite's test suite, but so far I am unable to reproduce it there. Our jdbc driver is considerably slower than hsqldb, so that might have something to do with it.

          Show
          janvanbesien Jan Van Besien added a comment - We have a test in our test suite (not calcite) which fails consistently on this problem (and works with the random-fix). I've attached a patch where I try to do the same within calcite's test suite, but so far I am unable to reproduce it there. Our jdbc driver is considerably slower than hsqldb, so that might have something to do with it.
          Hide
          ndimiduk Nick Dimiduk added a comment -

          We'll need unique connection ID's as well. HBase uses a combination of a unique hash per client + random number. Maybe we can do similarly for connections, and with a connection hash plus random number for statements. https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/PerClientRandomNonceGenerator.java

          Show
          ndimiduk Nick Dimiduk added a comment - We'll need unique connection ID's as well. HBase uses a combination of a unique hash per client + random number. Maybe we can do similarly for connections, and with a connection hash plus random number for statements. https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/PerClientRandomNonceGenerator.java
          Hide
          julianhyde Julian Hyde added a comment -

          Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/afcf3a6c. (Not including unique connection IDs.)

          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/afcf3a6c . (Not including unique connection IDs.)
          Hide
          bruno Bruno Dumon added a comment -

          I think there is a security concern in the current implementation, since JdbcMeta.fetch() relies soleley on the statement id to return the next frame, and since they are sequential they are easy to guess, so I could attempt to fetch frames from other's statements. The problem is not that the statement id is sequential, but rather that we don't verify the connection id (which is a uuid) in fetch() requests.

          Show
          bruno Bruno Dumon added a comment - I think there is a security concern in the current implementation, since JdbcMeta.fetch() relies soleley on the statement id to return the next frame, and since they are sequential they are easy to guess, so I could attempt to fetch frames from other's statements. The problem is not that the statement id is sequential, but rather that we don't verify the connection id (which is a uuid) in fetch() requests.
          Hide
          julianhyde Julian Hyde added a comment -

          OK, let me state for the record that Avatica is not secure. If we want to introduce certain kinds of security, please log a JIRA case for each. Using an integer random-number generator for statement IDs is a half-solution to security, which is worse than no solution at all.

          Show
          julianhyde Julian Hyde added a comment - OK, let me state for the record that Avatica is not secure. If we want to introduce certain kinds of security, please log a JIRA case for each. Using an integer random-number generator for statement IDs is a half-solution to security, which is worse than no solution at all.
          Hide
          elserj Josh Elser added a comment -

          Not including unique connection IDs

          Julian Hyde, any concerns with closing this out given your commit on unique statement IDs and then opening up a new issue to track the unique connection IDs?

          Show
          elserj Josh Elser added a comment - Not including unique connection IDs Julian Hyde , any concerns with closing this out given your commit on unique statement IDs and then opening up a new issue to track the unique connection IDs?
          Hide
          julianhyde Julian Hyde added a comment -

          Josh Elser, Oops, yes, should have marked fixed. Done now.

          Show
          julianhyde Julian Hyde added a comment - Josh Elser , Oops, yes, should have marked fixed. Done now.
          Hide
          elserj Josh Elser added a comment -

          Thanks. Filed CALCITE-924 to look at unique ConnectionIDs

          Show
          elserj Josh Elser added a comment - Thanks. Filed CALCITE-924 to look at unique ConnectionIDs
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.5.0 (2015-11-10)

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.5.0 (2015-11-10)

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              janvanbesien Jan Van Besien
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development