HIVE-1176

'create if not exists' fails for a table name with 'select' in it

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: Metastore, Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      hive> create table if not exists tmp_select(s string, c string, n int);
      org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
      at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
      at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
      at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
      at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
      at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
      ... 15 more

      Attachments

      1. HIVE-1176-6.patch
        69 kB
        Arvind Prabhakar
      2. HIVE-1176-5.patch
        69 kB
        Arvind Prabhakar
      3. HIVE-1176-4.patch
        69 kB
        Arvind Prabhakar
      4. HIVE-1176-3.patch
        66 kB
        Arvind Prabhakar
      5. HIVE-1176-2.patch
        61 kB
        Arvind Prabhakar
      6. HIVE-1176-1.patch
        60 kB
        Arvind Prabhakar
      7. HIVE-1176.lib-files.tar.gz
        3.20 MB
        Arvind Prabhakar
      8. HIVE-1176.patch
        99 kB
        Arvind Prabhakar

          Activity

          Ashish Thusoo added a comment -

          Can you provide a reproducible test case for this bug?

          Ashish Thusoo added a comment -

          My bad. There is already a test case.

          Arvind Prabhakar added a comment -

          This problem is due to a bug in the DataNucleus JDOQL implementation that has been fixed in version 2.0.x.

          The fix is therefore to upgrade the DataNucleus plugins to the latest stable release.

          Details:

          • Replaced the old DataNucleus plugins, version 1.1.2, with the latest stable release.
          • Updated the jdo2-api library to the version required by DataNucleus, 2.3-ec (Apache licensed), available from the DataNucleus Maven repository at http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/
          • Modified the build files to suppress auto-enhancement of all compiled classes, a new feature introduced in the latest version.
          • Modified the HiveMetaStoreClient implementation to create deep copies of non-primitive objects returned from the thrift server. Without this change, some collections were fetched as semi-populated proxies with missing session context, leading to NPEs.

          Testing Done:
          Built and ran all tests. Only two failures were reported - the clientpositive tests for input20.q and input33.q. These tests appear to be failing on the trunk as well.

          Arvind Prabhakar added a comment -

          This patch replaces the patch submitted before (HIVE-1176.lib-files.tar.gz and HIVE-1176.patch).

          HIVE-1176-1.patch:
          
          #	modified:   build.properties
          #	modified:   build.xml
          #	modified:   eclipse-templates/.classpath
          #	modified:   ivy/ivysettings.xml
          #	deleted:    lib/datanucleus-core-1.1.2.LICENSE
          #	deleted:    lib/datanucleus-core-1.1.2.jar
          #	deleted:    lib/datanucleus-enhancer-1.1.2.LICENSE
          #	deleted:    lib/datanucleus-enhancer-1.1.2.jar
          #	deleted:    lib/datanucleus-rdbms-1.1.2.LICENSE
          #	deleted:    lib/datanucleus-rdbms-1.1.2.jar
          #	deleted:    lib/jdo2-api-2.3-SNAPSHOT.LICENSE
          #	deleted:    lib/jdo2-api-2.3-SNAPSHOT.jar
          #	modified:   metastore/ivy.xml
          #	modified:   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
          
          Arvind Prabhakar added a comment -

          Updated the patch so that it cleanly applies to the trunk.

          Changes from Previous Patch:

          • This patch uses ivy to download the updated datanucleus plugins and other dependent libraries. There is no need to use the previously supplied tar.gz anymore.
          • At the time the previous patch was written, the enhancer plugin version was 2.0.1, which by default would enable annotation processing. Since then version 2.0.3 has been released, which disables this behavior. Hence the previously submitted changes to javac.args in the build.properties file are no longer necessary. This patch uses the updated version of the DataNucleus enhancer plugin.
          • The JDO2 API library used by the DataNucleus plugin is distributed through DataNucleus's public Maven repository. This repository has been added to the ivy configuration to automate the download.
          • The connection pool libraries have been updated to work with the newer DataNucleus plugins: commons-dbcp has been updated from 1.2.2 to 1.4, commons-pool from 1.2 to 1.5.4, and datanucleus-connectionpool from 1.0.2 to 2.0.1.
          • As with the previously submitted patch, HiveMetaStoreClient implementation has been modified to create deep-copies of non-primitive objects being returned from the thrift server in order to avoid semi-populated proxies causing NPEs downstream.

          Testing Done:

          Built and ran all tests. Only two failures were reported as before - the clientpositive tests for input20.q and input33.q. These tests appear to be failing on the trunk as well.

          John Sichi added a comment -

          What failures are you seeing on trunk for input20.q and input33.q? I do see recent Hudson sporadic failures for the flaky negative tests script_broken_pipe1 and loadpart_err, but not input20.q and input33.q.

          http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/

          Arvind Prabhakar added a comment -

          John - I debugged the failures that I was seeing for input20 and input33, and they turn out to be caused by a subtle difference in how the stream editor sed works on Mac OS X versus regular Linux distributions. I installed the GNU port of sed and the failures no longer occur.

          I don't think this is related to the sporadic failures that are reported on hudson.

          Carl Steinbach added a comment -

          @Arvind: Can you explain what the difference is between OSX sed and GNU sed? Can we rewrite the sed script so that it works on both platforms? Seems like we are probably relying on a non-POSIX feature that is present in the GNU version.

          Arvind Prabhakar added a comment -

          I think the difference is more likely a bug in the Mac OS X version of sed. Specifically, it fails to process the escaped tab sequence \t in the replacement text and instead treats it as a literal t. For example, the command to replace the first occurrence of b in the string abc with a tab character fails as shown below:

          $ echo "abc" | /usr/bin/sed "s@b@\t@"
          atc
          

          Whereas this works fine with the GNU distribution of sed

          $ echo "abc" | /opt/local/bin/sed "s@b@\t@"
          a	c
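          Carl's portability question above can likely be sidestepped rather than answered: POSIX does not guarantee \t in a sed replacement, so a script that needs a tab can splice in a literal tab character instead, e.g. via command substitution on printf. A minimal sketch, reusing the toy command from this thread (not a change to any actual Hive build script):

```shell
# Portable variant: expand the tab with printf and pass the literal
# character to sed, so neither BSD/OSX sed nor GNU sed ever has to
# interpret a \t escape in the replacement text.
tab=$(printf '\t')
echo "abc" | sed "s@b@${tab}@"   # prints a, a literal tab, then c
```

          This form behaves identically under /usr/bin/sed on Mac OS X and GNU sed on Linux.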
          
          Paul Yang added a comment -

          Can you elaborate on what you mean by 'some collections were being fetched as semi-populated proxies with missing session context leading to NPEs'? Is there something I can do to reproduce this?

          Arvind Prabhakar added a comment -

          Updating the patch against the latest trunk. This is necessary because HIVE-1373 updated the eclipse classpath with connection pool libraries that become outdated once this patch is applied. The updated version of the patch takes care of this by pointing the eclipse classpath at the updated libraries instead. I tested the launch configuration via eclipse to make sure it works.

          Arvind Prabhakar added a comment -

          The updated patch HIVE-1176-2.patch contains the following changes:

          1. modified: build.properties
          2. modified: build.xml
          3. modified: eclipse-templates/.classpath
          4. modified: ivy/ivysettings.xml
          5. deleted: lib/datanucleus-core-1.1.2.LICENSE
          6. deleted: lib/datanucleus-core-1.1.2.jar
          7. deleted: lib/datanucleus-enhancer-1.1.2.LICENSE
          8. deleted: lib/datanucleus-enhancer-1.1.2.jar
          9. deleted: lib/datanucleus-rdbms-1.1.2.LICENSE
          10. deleted: lib/datanucleus-rdbms-1.1.2.jar
          11. deleted: lib/jdo2-api-2.3-SNAPSHOT.LICENSE
          12. deleted: lib/jdo2-api-2.3-SNAPSHOT.jar
          13. modified: metastore/ivy.xml
          14. modified: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
          Arvind Prabhakar added a comment -

          Can you elaborate on what you mean by 'some collections were being fetched as semi-populated proxies with missing session context leading to NPEs'? Is there something I can do to reproduce this?

          @Paul: Here are the steps to reproduce this problem:

          1. Start out with a clean workspace checkout and apply the updated patch HIVE-1176-2.patch.
          2. Manually revert the file metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java to its previous state.
          3. Run ant package from the root of the workspace.
          4. Run ant test from within metastore.

          You should see failures like the following:

              [junit] testPartition() failed.
              [junit] java.lang.NullPointerException
              [junit] 	at org.datanucleus.store.mapped.scostore.AbstractMapStore.validateKeyForWriting(AbstractMapStore.java:333)
              [junit] 	at org.datanucleus.store.mapped.scostore.JoinMapStore.put(JoinMapStore.java:252)
              [junit] 	at org.datanucleus.sco.backed.Map.put(Map.java:640)
              [junit] 	at org.apache.hadoop.hive.metastore.api.Table.putToParameters(Table.java:359)
              [junit] 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1281)
              [junit] 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:140)
              [junit] 	at org.apache.hadoop.hive.metastore.TestHiveMetaStore.testAlterTable(TestHiveMetaStore.java:728)
              [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              ...
          

          If you look at src/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java, you will notice that the map at the line causing this exception should be a HashMap, not the org.datanucleus.store.mapped.scostore.AbstractMapStore indicated by the stack trace. This happens because the DataNucleus JDO framework replaces collections with its own implementations in order to allow lazy dereferencing and to optimize database connections, queries, memory consumption, and so on.

          Lazy loading of collections (and second class objects in general) can be disabled at a global level or at entity level. Disabling this globally is generally not recommended unless there is evidence backed by extensive testing that supports the change. Disabling at an entity level is still OK provided the entity object graph is fully dereferenced at all times, though that could lead to extensive memory consumption if the entity graph is huge.

          My approach towards fixing the problem was to not change the default behavior in the general case. Instead I felt that it was better to circumvent this problem in the case of a remote metastore by creating a copy explicitly. If you have other suggestions on how to address this, please let me know.
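          The copy-on-return approach can be illustrated in isolation. This is only a sketch of the pattern, not the actual HiveMetaStoreClient change - the Bean class below is a hypothetical stand-in for a thrift-generated type like Table: before the object crosses the metastore boundary, its collections are rebuilt as plain HashMaps so that no session-bound JDO proxy collection escapes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a thrift-generated bean such as Table;
// illustrates the deep-copy pattern only, not the real Hive code.
class Bean {
    private final Map<String, String> parameters;

    Bean(Map<String, String> parameters) {
        this.parameters = parameters;
    }

    Map<String, String> getParameters() {
        return parameters;
    }

    // Deep copy: the map is rebuilt as a concrete HashMap, so the result
    // holds no reference to a JDO-managed proxy collection (and therefore
    // cannot NPE later, once the persistence session is gone).
    Bean deepCopy() {
        return new Bean(new HashMap<>(parameters));
    }
}

class DeepCopyDemo {
    public static void main(String[] args) {
        Map<String, String> managedMap = new HashMap<>();
        managedMap.put("k", "v");
        Bean managed = new Bean(managedMap);

        Bean detached = managed.deepCopy();
        detached.getParameters().put("k2", "v2"); // mutating the copy...

        System.out.println(managed.getParameters().size());  // ...leaves the original map untouched
        System.out.println(detached.getParameters().size());
    }
}
```

          In the actual patch this corresponds to the client returning deep copies of objects received from the thrift server; the names above are illustrative only.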

          Also - more information on the lazy dereferencing mechanism used by the DataNucleus framework can be found in the DataNucleus documentation.

          Arvind Prabhakar added a comment -

          @Paul: Any updates on this from your end? Thanks.

          Paul Yang added a comment -

          I was going to look at it again today, but it looks like I'll get to it around mid-day tomorrow. Will keep this thread posted.

          Paul Yang added a comment -

          For some reason, I don't see the JDO files being deleted after applying the patch:

          ?      build.xml.orig
          ?      HIVE-1176-2.patch
          ?      test.log
          M      eclipse-templates/.classpath
          M      build.properties
          M      build.xml
          ?      metastore/test.log
          M      metastore/ivy.xml
          M      metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
          !      lib/jdo2-api-2.3-SNAPSHOT.LICENSE
          !      lib/datanucleus-rdbms-1.1.2.LICENSE
          !      lib/datanucleus-enhancer-1.1.2.LICENSE
          !      lib/datanucleus-core-1.1.2.LICENSE
          M      ivy/ivysettings.xml
          

          Also, the patch works for branch 0.6 but not for trunk. Can you regenerate it?

          Paul Yang added a comment -

          To clarify, I am referring to the jars under /lib that should be deleted along with the *.LICENSE files.

          Paul Yang added a comment -

          Never mind - JVS pointed out that the jars are skipped because they are binary files.

          Arvind Prabhakar added a comment -

          @Paul: I just tested the patch (HIVE-1176-2.patch) on latest trunk and it seems to apply cleanly. Can you please try again and see if it works? Also, can you post the errors that you are seeing? If necessary, I can break down the patch into single-file units to help with applying it. Just let me know either way.

          Paul Yang added a comment -

          Tried it again, and yep, it works. Not sure what happened before. Anyway, I do see the NPEs that you mentioned. Why did this not occur before? Just changes in behavior between major versions?

          Arvind Prabhakar added a comment -

          Yes - it appears that the change in behavior can be attributed to the difference in major versions.

          Paul Yang added a comment -

          One last thing, can you include a unit test to verify the fix?

          Arvind Prabhakar added a comment -

          Sorry - it is not clear to me which unit test I should be writing. Can you perhaps give an example?

          From my perspective, any test that uses the metastore exercises this change. And together, all the tests form an exhaustive layer that ensures that there is no regression seeping into the system. Note that this is not a functionality change, only a change of underlying libraries.

          Arvind Prabhakar added a comment -

          Also, for the specific change to HiveMetaStoreClient.java - the tests under metastore validate that the new libraries are working correctly.

          Paul Yang added a comment -

          Oh, but I thought the original problem (as per title) was an exception with 'create table if not exists tmp_select(s string, c string, n int)'?

          So maybe something like:

          CREATE TABLE IF NOT EXISTS tmp_select(s STRING, c STRING, n INT);
          DROP TABLE tmp_select;

          Arvind Prabhakar added a comment -

          Makes sense. Will add a test case and update the patch soon. Sorry for the misunderstanding.

          Arvind Prabhakar added a comment -

          Updated patch with a test case attached. Please use HIVE-1176-3.patch. The changed files in this patch are as follows:

          1. modified: build.properties
          2. modified: build.xml
          3. new file: data/files/simple.txt
          4. modified: eclipse-templates/.classpath
          5. modified: ivy/ivysettings.xml
          6. deleted: lib/datanucleus-core-1.1.2.LICENSE
          7. deleted: lib/datanucleus-core-1.1.2.jar
          8. deleted: lib/datanucleus-enhancer-1.1.2.LICENSE
          9. deleted: lib/datanucleus-enhancer-1.1.2.jar
          10. deleted: lib/datanucleus-rdbms-1.1.2.LICENSE
          11. deleted: lib/datanucleus-rdbms-1.1.2.jar
          12. deleted: lib/jdo2-api-2.3-SNAPSHOT.LICENSE
          13. deleted: lib/jdo2-api-2.3-SNAPSHOT.jar
          14. modified: metastore/ivy.xml
          15. modified: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
          16. new file: ql/src/test/queries/clientpositive/hive_1176.q
          17. new file: ql/src/test/results/clientpositive/hive_1176.q.out
          Paul Yang added a comment -

          @Arvind - While the test that you added is thorough, I think for this case, we might be able to get by with something simpler / faster?

          e.g.

          • Use one of the existing source files or insert from an existing table like 'src'.
          • Limit to one select statement / table as we test select more extensively in other unit tests.
          • Remove the drop table statements in the beginning, as other tests aren't supposed to leave tables around at the end of their run.
          • Rename the query file to something more descriptive. This would help developers browsing the list of tests.

          We don't want to sacrifice coverage, but at the same time, the whole suite of tests takes quite a while to execute.
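          Paul's suggestions could translate into a minimal .q file along these lines. This is only an illustrative sketch — the table name, columns, and queries are assumptions for the example, not necessarily what the committed hive_1176.q contains:

          ```sql
          -- Reproduce the bug: a table name containing 'select' used to break
          -- CREATE TABLE IF NOT EXISTS in the metastore's JDOQL lookup.
          CREATE TABLE IF NOT EXISTS tmp_select (s STRING, c STRING, n INT);

          -- Populate from the existing 'src' test table rather than loading a new data file.
          INSERT OVERWRITE TABLE tmp_select SELECT key, value, 1 FROM src;

          -- ORDER BY keeps the logged result order deterministic across runs.
          SELECT s, c, n FROM tmp_select ORDER BY s, c, n;
          ```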

          John Sichi added a comment -

          Paul, we currently do pre-drops in the test to avoid collisions in case other tests fail, or if a negative test happens to use the same table name, since there's currently no way for a negative test to clean up after itself. It would be better if our test framework automatically cleaned out the catalog at the beginning of each test, but until we have that, pre-drop is required.

          Arvind Prabhakar added a comment -

          @Paul: Your suggestions are fair enough. I have incorporated all of the changes you suggested except for the pre-drop, per @John's response. Let me know if you need any further tweaking of this patch.

          Paul Yang added a comment -

          @John - good point! In which case, we'll probably need to fix up some of the other unit tests with a create table.

          Paul Yang added a comment -

          Looks good to me

          +1

          John Sichi added a comment -

          Just one more change needed...please add an ORDER BY to the select in the testcase. This is required to avoid spurious diffs later since without ORDER BY, the query result order is non-deterministic.

          After that I'll run through tests and commit.

          Arvind Prabhakar added a comment -

          @John: done. Please see the new patch attachment - HIVE-1176-5.patch

          Since a lot of good points came out of the discussion on this jira, I took the liberty of adding them to the Hive wiki for posterity. Please add to that page any other points that you feel contributors should take into consideration while adding new tests.

          John Sichi added a comment -

          Thanks for the doc Arvind.

          But for the patch: we need the ORDER BY on the SELECT that produces results in the output log (not the INSERT).
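          The distinction could be sketched as follows (table and column names are illustrative assumptions, not the actual test):

          ```sql
          -- ORDER BY on the INSERT's subquery does nothing useful here:
          -- the rows land in the table, and no order is recorded in the .q.out log.
          INSERT OVERWRITE TABLE tmp_select SELECT key, value FROM src;

          -- ORDER BY belongs on the SELECT whose rows are written to the
          -- .q.out log, so the golden-file comparison stays deterministic.
          SELECT s, c FROM tmp_select ORDER BY s, c;
          ```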

          Arvind Prabhakar added a comment -

          Yes - that's what my intention was. Thanks for catching it.

          John Sichi added a comment -

          +1. Will commit when tests pass.

          John Sichi added a comment -

          Committed. Thanks Arvind!


            People

            • Assignee: Arvind Prabhakar
            • Reporter: Prasad Chakka
            • Votes: 1
            • Watchers: 3

            Dates

            • Created:
            • Updated:
            • Resolved: