Hive
  1. Hive
  2. HIVE-4232

JDBC2 HiveConnection has odd defaults

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0, 0.12.0
    • Fix Version/s: None
    • Component/s: HiveServer2, JDBC
    • Labels:
      None

      Description

      HiveConnection defaults to using a plain SASL transport if auth is not set. To get a raw transport auth must be set to noSasl; furthermore noSasl is case sensitive. Code tries to infer Kerberos or plain authentication based on the presence of principal. There is no provision for specifying QOP level.

      1. HIVE-4232.patch
        4 kB
        Chris Drome
      2. HIVE-4232-1.patch
        6 kB
        Chris Drome
      3. HIVE-4232-2.patch
        11 kB
        Chris Drome
      4. HIVE-4232-3-0.11.patch
        11 kB
        Chris Drome
      5. HIVE-4232-3-trunk.patch
        11 kB
        Chris Drome
      6. HIVE-4232-4-0.11.patch
        13 kB
        Chris Drome
      7. HIVE-4232-4-trunk.patch
        13 kB
        Chris Drome

        Issue Links

          Activity

          Hide
          Chris Drome added a comment -

          The patch proposes that if auth is not specified or auth=none the transport will default to TSocket. If auth=kerberos|plain then the Kerberos SASL transport or the Plain SASL transport will be used. If auth=kerberos then principal must also be specified. We propose controlling the QOP with another parameter qop=auth|auth-int|auth-conf. The patch also takes care of case-sensitive value comparisons.

          I feel that these changes result in a more reasonable set of defaults and don't rely upon inferring Kerberos or plain based on the existence of other parameters.

          I will rebase HIVE-4225 if these proposed changes are accepted.

          Show
          Chris Drome added a comment - The patch proposes that if auth is not specified or auth=none the transport will default to TSocket. If auth=kerberos|plain then the Kerberos SASL transport or the Plain SASL transport will be used. If auth=kerberos then principal must also be specified. We propose controlling the QOP with another parameter qop=auth|auth-int|auth-conf. The patch also takes care of case-sensitive value comparisons. I feel that these changes result in a more reasonable set of defaults and don't rely upon inferring Kerberos or plain based on the existence of other parameters. I will rebase HIVE-4225 if these proposed changes are accepted.
          Hide
          Chris Drome added a comment -

          Uploaded patch which implements the proposed changes.

          Show
          Chris Drome added a comment - Uploaded patch which implements the proposed changes.
          Hide
          Chris Drome added a comment -

          Would affect changes support QOP.

          Show
          Chris Drome added a comment - Would affect changes support QOP.
          Hide
          Ashutosh Chauhan added a comment -

          Chris Drome can you create a phabricator or RB link for this change?

          Show
          Ashutosh Chauhan added a comment - Chris Drome can you create a phabricator or RB link for this change?
          Hide
          Chris Drome added a comment -
          Show
          Chris Drome added a comment - Phabricator link: https://reviews.facebook.net/D9879
          Hide
          Chris Drome added a comment -

          Hi Ashutosh,

          I will be out of the office from Thursday, so I was wondering if I could
          get your feedback on the HIVE-4232 patch as soon as possible. That way I
          would be able to make changes or address concerns before I leave.

          I've created the fabricator ticket.

          https://reviews.facebook.net/D9879

          Thanks,

          chris

          Show
          Chris Drome added a comment - Hi Ashutosh, I will be out of the office from Thursday, so I was wondering if I could get your feedback on the HIVE-4232 patch as soon as possible. That way I would be able to make changes or address concerns before I leave. I've created the fabricator ticket. https://reviews.facebook.net/D9879 Thanks, chris
          Hide
          Thejas M Nair added a comment -

          +1 (non binding)

          Show
          Thejas M Nair added a comment - +1 (non binding)
          Hide
          Ashutosh Chauhan added a comment -

          +1 looks good to me too. It will be good to add a testcase, especially since there is no existing testcase for url strings. You can spin off a HiveServer2 in a thread and then have a jdbc client connect to it with urls having plain and default connection string, passing on username and password where supported.

          Show
          Ashutosh Chauhan added a comment - +1 looks good to me too. It will be good to add a testcase, especially since there is no existing testcase for url strings. You can spin off a HiveServer2 in a thread and then have a jdbc client connect to it with urls having plain and default connection string, passing on username and password where supported.
          Hide
          Chris Drome added a comment -

          In the process of writing some unit tests for the connection string changes I noticed that the connection type for HiveServer2 defaults to NONE, which uses the Plain SASL Transport Factory. So in this sense the original implementation of HiveConnection matches for the default case.

          hive.server2.authentication can take one of the following values:
          NOSASL
          NONE (default)
          LDAP
          KERBEROS
          CUSTOM

          I think NONE is misleading and should be renamed PLAIN (or something similar), and NOSASL should be labelled NONE. And that the default usage of Plain SASL Transport Factory seems odd. The default behavior of HiveConnection should match that of HiveServer2.

          I'd like to ask your opinion of what the default behavior should be. Should it default to the Plain SASL transport or no authentication at all?

          Show
          Chris Drome added a comment - In the process of writing some unit tests for the connection string changes I noticed that the connection type for HiveServer2 defaults to NONE, which uses the Plain SASL Transport Factory. So in this sense the original implementation of HiveConnection matches for the default case. hive.server2.authentication can take one of the following values: NOSASL NONE (default) LDAP KERBEROS CUSTOM I think NONE is misleading and should be renamed PLAIN (or something similar), and NOSASL should be labelled NONE. And that the default usage of Plain SASL Transport Factory seems odd. The default behavior of HiveConnection should match that of HiveServer2. I'd like to ask your opinion of what the default behavior should be. Should it default to the Plain SASL transport or no authentication at all?
          Hide
          Chris Drome added a comment -

          Changes associated with previous comment.

          Show
          Chris Drome added a comment - Changes associated with previous comment.
          Hide
          Chris Drome added a comment -

          Working on some test cases, but need to resolve a classpath issue before I can include them.

          Show
          Chris Drome added a comment - Working on some test cases, but need to resolve a classpath issue before I can include them.
          Hide
          Ashutosh Chauhan added a comment -

          I agree with your nomenclature. NONE should really mean no special setting is required and raw thrift sockets are used. And lets call PLAIN (of new world) as PLAINSASL to make it clear that its sasl without auth.
          And lets make NONE (which uses raw thrift sockets) the default (ie if user has not specified anything we will assume NONE)

          Show
          Ashutosh Chauhan added a comment - I agree with your nomenclature. NONE should really mean no special setting is required and raw thrift sockets are used. And lets call PLAIN (of new world) as PLAINSASL to make it clear that its sasl without auth. And lets make NONE (which uses raw thrift sockets) the default (ie if user has not specified anything we will assume NONE)
          Hide
          Prasad Mujumdar added a comment -

          Ashutosh Chauhan thanks for the feedback.

          Here are a couple of things to consider. Thrift SASL transport doesn't detect if the other peer is not using the SASL transport (THRIFT-1774). Hence if the hiveserver and client are not using the same transport then the connection simply hangs. Hence its using SASL transport as default and NOSASL is more of a backdoor if you are using some non-java bindings that doesn't have thrift sasl transport (and hence not listed in the hive-defaults).

          IMO making non-sasl as default will lead to more problems than what it intends to solve. For an end user , it doesn't matter what underlying authentication libraries are used for the implementation. The normal authentication options None, LDAP, Kerberos and Custom are self explanatory from point of view.

          Show
          Prasad Mujumdar added a comment - Ashutosh Chauhan thanks for the feedback. Here are a couple of things to consider. Thrift SASL transport doesn't detect if the other peer is not using the SASL transport ( THRIFT-1774 ). Hence if the hiveserver and client are not using the same transport then the connection simply hangs. Hence its using SASL transport as default and NOSASL is more of a backdoor if you are using some non-java bindings that doesn't have thrift sasl transport (and hence not listed in the hive-defaults). IMO making non-sasl as default will lead to more problems than what it intends to solve. For an end user , it doesn't matter what underlying authentication libraries are used for the implementation. The normal authentication options None, LDAP, Kerberos and Custom are self explanatory from point of view.
          Hide
          Chris Drome added a comment -

          Updated patch with your suggestions. Added some test cases.

          The test cases passed on Hadoop-0.23 and Hadoop-2.0.
          They failed on Hadoop-1.0 because of classpath issues that use conflicting versions of slf4j at runtime. I didn't have time to fix this.

          I think Thiruvel will be able to handle any blockers while I'm away.

          Show
          Chris Drome added a comment - Updated patch with your suggestions. Added some test cases. The test cases passed on Hadoop-0.23 and Hadoop-2.0. They failed on Hadoop-1.0 because of classpath issues that use conflicting versions of slf4j at runtime. I didn't have time to fix this. I think Thiruvel will be able to handle any blockers while I'm away.
          Hide
          Carl Steinbach added a comment -

          @Chris: I left a comment on phabricator. Also, can you please respond to the point Prasad raised about changing the default from PLAIN to NONE? Thanks.

          Show
          Carl Steinbach added a comment - @Chris: I left a comment on phabricator. Also, can you please respond to the point Prasad raised about changing the default from PLAIN to NONE? Thanks.
          Hide
          Chris Drome added a comment -

          Prasad MujumdarCarl Steinbach: I think Prasad's point is fair. It explains why the defaults were set the way they were and should probably be documented somewhere in the code.

          We have a follow-up patch which adds functionality to set the QOP of the transport, so that data encryption can be enabled on the client-server transport. This will require some changes to the JDBC connection string and I wanted to get buy-in to modify the existing format of the connection string before posting that patch.

          I still feel the naming conventions are a little misleading and the JDBC connection string tries to infer the state. However, if everyone agrees that NOSASL should generally not be used, then I won't push that issue. Although it doesn't help in the case a client library adds auth=nosasl, resulting in the connection hanging.

          Show
          Chris Drome added a comment - Prasad Mujumdar Carl Steinbach : I think Prasad's point is fair. It explains why the defaults were set the way they were and should probably be documented somewhere in the code. We have a follow-up patch which adds functionality to set the QOP of the transport, so that data encryption can be enabled on the client-server transport. This will require some changes to the JDBC connection string and I wanted to get buy-in to modify the existing format of the connection string before posting that patch. I still feel the naming conventions are a little misleading and the JDBC connection string tries to infer the state. However, if everyone agrees that NOSASL should generally not be used, then I won't push that issue. Although it doesn't help in the case a client library adds auth=nosasl, resulting in the connection hanging.
          Hide
          Carl Steinbach added a comment -

          Just to clarify I'm fine with changing the names of the property values (e.g. NOSASL --> NONE, NONE --> PLAIN), but I think we should probably stick with the current default value (PLAIN) since it makes it less likely that a user will encounter the hang issue described in THRIFT-1774.

          Show
          Carl Steinbach added a comment - Just to clarify I'm fine with changing the names of the property values (e.g. NOSASL --> NONE, NONE --> PLAIN), but I think we should probably stick with the current default value (PLAIN) since it makes it less likely that a user will encounter the hang issue described in THRIFT-1774 .
          Hide
          Prasad Mujumdar added a comment -

          Chris Drome Thanks for looking into it further. I agree with your comment regarding JDBC application explicitly adding "nosasl". In such case the application is doing a customization by overriding the default, so they should know what they are doing ..
          Also you are correct about this details not being well documented. I will go ahead and add new advance/custom configuratoin sections to Setting Up Hive Server as well as HiveServer2 Clients. Thanks for bringing that out!

          Carl Steinbach Thanks for providing feedback.
          I would still like to argue that the "PLAIN" is more tied to the internal implementation and not indicative of the behavior, especially when its a default. We can perhaps change NOSASL to RAW or THRIFT, or leave it with the 'sasl' reference as its more of advance option.
          Another thing to keep in mind that was available as a patch for HiveServer2 for a very long time. Even though its committed to trunk in 0.11, the community has started using the patch on top of Hive 0.10 as well as through multiple popular distros. Changing the configuration will be a backward incompatible change for such users. This one is not just flipping a config switch, but would also require changing JDBC code and recompiling/redeploying the application.

          Show
          Prasad Mujumdar added a comment - Chris Drome Thanks for looking into it further. I agree with your comment regarding JDBC application explicitly adding "nosasl". In such case the application is doing a customization by overriding the default, so they should know what they are doing .. Also you are correct about this details not being well documented. I will go ahead and add new advance/custom configuratoin sections to Setting Up Hive Server as well as HiveServer2 Clients . Thanks for bringing that out! Carl Steinbach Thanks for providing feedback. I would still like to argue that the "PLAIN" is more tied to the internal implementation and not indicative of the behavior, especially when its a default. We can perhaps change NOSASL to RAW or THRIFT, or leave it with the 'sasl' reference as its more of advance option. Another thing to keep in mind that was available as a patch for HiveServer2 for a very long time. Even though its committed to trunk in 0.11, the community has started using the patch on top of Hive 0.10 as well as through multiple popular distros. Changing the configuration will be a backward incompatible change for such users. This one is not just flipping a config switch, but would also require changing JDBC code and recompiling/redeploying the application.
          Hide
          Chris Drome added a comment -

          I was thinking about this a little more and would like to summarize the current state:

          hive-site.xml -> transport -> JDBC connection string

          1. hive.server2.authentication=NOSASL -> raw transport -> jdbc:hive2://host:port/dbname;auth=noSasl
          2. hive.server2.authentication=NONE -> plain SASL transport -> jdbc:hive2://host:port/dbname (DEFAULT)
          3. hive.server2.authentication=KERBEROS -> Kerberos SASL transport -> jdbc:hive2://host:port/dbname;principal=<principal>

          So we need to set auth=noSasl to disable SASL instead of auth=plainsasl|kerberos to enable SASL.

          Now if the server is set to Kerberos, then the client still has to know this because it if they exclude the principal parameter it will happily initialize a plain SASL transport because the code infers SASL/Kerberos based on the existence of that value. Granted, in this case the connection throws an exception.

          If it was a requirement to include say auth=plainsasl|kerberos in the connection string, then the code wouldn't have to guess what the intent was and could communicate an error if auth=kerberos and principal is not present.

          Furthermore, from what I can see, the plain SASL is really just a no-op as it is not doing anything when authentication=NONE.

          In the end, it comes down to the fact that the client still needs to know the authentication method of the server.
          That is why I asked whether we had data about the average use case. Do we know how often people are using raw vs plain vs kerberos?

          It seems that the real issue is that a SASL transport doesn't play nicely with a raw transport. If it would be possible to remove the need to support the raw transport, that would resolve this discussion. Even the raw transport much speak thrift, so that isn't an issue. Rather it is a question of whether client should be forced to implement a SASL transport.

          Show
          Chris Drome added a comment - I was thinking about this a little more and would like to summarize the current state: hive-site.xml -> transport -> JDBC connection string 1. hive.server2.authentication=NOSASL -> raw transport -> jdbc:hive2://host:port/dbname;auth=noSasl 2. hive.server2.authentication=NONE -> plain SASL transport -> jdbc:hive2://host:port/dbname ( DEFAULT ) 3. hive.server2.authentication=KERBEROS -> Kerberos SASL transport -> jdbc:hive2://host:port/dbname;principal=<principal> So we need to set auth=noSasl to disable SASL instead of auth=plainsasl|kerberos to enable SASL. Now if the server is set to Kerberos, then the client still has to know this because it if they exclude the principal parameter it will happily initialize a plain SASL transport because the code infers SASL/Kerberos based on the existence of that value. Granted, in this case the connection throws an exception. If it was a requirement to include say auth=plainsasl|kerberos in the connection string, then the code wouldn't have to guess what the intent was and could communicate an error if auth=kerberos and principal is not present. Furthermore, from what I can see, the plain SASL is really just a no-op as it is not doing anything when authentication=NONE. In the end, it comes down to the fact that the client still needs to know the authentication method of the server. That is why I asked whether we had data about the average use case. Do we know how often people are using raw vs plain vs kerberos? It seems that the real issue is that a SASL transport doesn't play nicely with a raw transport. If it would be possible to remove the need to support the raw transport, that would resolve this discussion. Even the raw transport much speak thrift, so that isn't an issue. Rather it is a question of whether client should be forced to implement a SASL transport.
          Hide
          Mithun Radhakrishnan added a comment -

          Hello, Carl Steinbach. Hi, Prasad Mujumdar.

          Could you please help us understand the rationale behind the choice of default (plainsasl, i.e. erstwhile NONE)? I mean, before the Thrift-hang. The manner in which the auth-protocol is deduced is far from intuitive.

          The Thrift-1774 bug is an unfortunate aside that one wishes could be resolved in Thrift. While I concede HiveServer2 has been in trunk for a while (and might well have customers using it), it's not been in an official Hive release prior to 0.11. Surely, now (i.e. before 0.11 release) is the opportune moment to change the defaults and fix the format of the jdbc-connect-string?

          Chris Drome does make a valid point about the client needing to be in sync with the server's auth method anyway.

          Show
          Mithun Radhakrishnan added a comment - Hello, Carl Steinbach . Hi, Prasad Mujumdar . Could you please help us understand the rationale behind the choice of default (plainsasl, i.e. erstwhile NONE)? I mean, before the Thrift-hang. The manner in which the auth-protocol is deduced is far from intuitive. The Thrift-1774 bug is an unfortunate aside that one wishes could be resolved in Thrift. While I concede HiveServer2 has been in trunk for a while (and might well have customers using it), it's not been in an official Hive release prior to 0.11. Surely, now (i.e. before 0.11 release) is the opportune moment to change the defaults and fix the format of the jdbc-connect-string? Chris Drome does make a valid point about the client needing to be in sync with the server's auth method anyway.
          Hide
          Thejas M Nair added a comment -

          Mithun Radhakrishnan THRIFT-1774 is the unfortunate reality, and I don't see anybody having signed up to fix it.
          From a user experience perspective, it would be better to get an error message (well, a stack trace in this case!) instead of the command hanging if the user tries to connect to a kerberos-secure server without specifying the correct auth.
          I don't see any advantages of using raw transport as default (one client and server). I agree that the auth names are misleading and we should change that.

          Here is my proposal -
          hive-site.xml -> transport -> JDBC connection string
          1. (fix to make compare case insensitive for all auth types ) hive.server2.authentication=NOSASL -> raw transport -> jdbc:hive2://host:port/dbname;auth=nosasl
          2. (change the param name, this remains default) hive.server2.authentication=PLAINSASL -> plain SASL transport -> jdbc:hive2://host:port/dbname (DEFAULT)
          3. (check/support auth=kerberos in jdbc url) hive.server2.authentication=KERBEROS -> Kerberos SASL transport -> jdbc:hive2://host:port/dbname;auth=kerberos,principal=<principal>

          Show
          Thejas M Nair added a comment - Mithun Radhakrishnan THRIFT-1774 is the unfortunate reality, and I don't see anybody having signed up to fix it. From a user experience perspective, it would be better to get an error message (well, a stack trace in this case!) instead of the command hanging if the user tries to connect to a kerberos-secure server without specifying the correct auth. I don't see any advantages of using raw transport as default (one client and server). I agree that the auth names are misleading and we should change that. Here is my proposal - hive-site.xml -> transport -> JDBC connection string 1. (fix to make compare case insensitive for all auth types ) hive.server2.authentication=NOSASL -> raw transport -> jdbc:hive2://host:port/dbname;auth=nosasl 2. (change the param name, this remains default) hive.server2.authentication=PLAINSASL -> plain SASL transport -> jdbc:hive2://host:port/dbname (DEFAULT) 3. (check/support auth=kerberos in jdbc url) hive.server2.authentication=KERBEROS -> Kerberos SASL transport -> jdbc:hive2://host:port/dbname;auth=kerberos,principal=<principal>
          Hide
          Chris Drome added a comment -

          WIP patch incorporating the comments. NOSASL/NONE transport layer test failing with latest trunk code.

          Show
          Chris Drome added a comment - WIP patch incorporating the comments. NOSASL/NONE transport layer test failing with latest trunk code.
          Hide
          Chris Drome added a comment -

          New patch fixes test failure.

          Show
          Chris Drome added a comment - New patch fixes test failure.
          Hide
          Chris Drome added a comment -

          Missed a file.

          Show
          Chris Drome added a comment - Missed a file.
          Hide
          Chris Drome added a comment -

          Uploaded renamed files.

          Show
          Chris Drome added a comment - Uploaded renamed files.
          Hide
          Chris Drome added a comment -

          Fixed classpath problems with 20S build.

          Show
          Chris Drome added a comment - Fixed classpath problems with 20S build.
          Hide
          Vaibhav Gumashta added a comment -

          Hi Prasad Mujumdar, possible to take a look at this again? I feel the way auth parameters are passed in JDBC uri is quite confusing and not very clean. NONE/NOSASL is also misleading and exposes a risk of misinterpretation while developing, which will compound as more stuff is built on this.

          My proposal (similar to Thejas M Nair):
          1. Have all comparison case insensitive.
          2. While parsing the uri in Utils#parseURL, set the auth config parameter for each mode. When it is not specified, set it to the default of PLAINSASL. This cleans HiveConnection#openTransport.
          3. hive.server2.authentication=NOSASL -> Raw transport -> jdbc:hive2://host:port/dbname;auth=nosasl
          4. hive.server2.authentication=PLAINSASL -> Plain SASL transport -> jdbc:hive2://host:port/dbname;auth=plainsasl(or no-auth specified)
          5. hive.server2.authentication=KERBEROS -> Kerberos SASL transport -> jdbc:hive2://host:port/dbname;auth=kerberos;principal=<principal>;qop=<auth|auth-int|auth-conf>

          Show
          Vaibhav Gumashta added a comment - Hi Prasad Mujumdar , possible to take a look at this again? I feel the way auth parameters are passed in JDBC uri is quite confusing and not very clean. NONE/NOSASL is also misleading and exposes a risk of misinterpretation while developing, which will compound as more stuff is built on this. My proposal (similar to Thejas M Nair ): 1. Have all comparison case insensitive. 2. While parsing the uri in Utils#parseURL, set the auth config parameter for each mode. When it is not specified, set it to the default of PLAINSASL. This cleans HiveConnection#openTransport. 3. hive.server2.authentication=NOSASL -> Raw transport -> jdbc:hive2://host:port/dbname;auth=nosasl 4. hive.server2.authentication=PLAINSASL -> Plain SASL transport -> jdbc:hive2://host:port/dbname;auth=plainsasl(or no-auth specified) 5. hive.server2.authentication=KERBEROS -> Kerberos SASL transport -> jdbc:hive2://host:port/dbname;auth=kerberos;principal=<principal>;qop=<auth|auth-int|auth-conf>

            People

            • Assignee:
              Chris Drome
              Reporter:
              Chris Drome
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:

                Development