Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9452 Use HBase to store Hive metadata
  3. HIVE-9578

Add support for getDatabases and alterDatabase calls [hbase-metastore branch]

Details

    Description

      The initial patch only supporting getting a single database, add database, and drop database. Support needs to be added for alter database, getting all the databases, and getting database names by pattern.

      Attachments

        1. HIVE-9578.2.patch
          13 kB
          Alan Gates
        2. HIVE-9578.patch
          12 kB
          Alan Gates

        Activity

          hiveqa Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12696549/HIVE-9578.patch

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2668/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2668/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2668/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests exited with: NonZeroExitCodeException
          Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
          + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2668/source-prep.txt
          + [[ false == \t\r\u\e ]]
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          ++ awk '{print $2}'
          ++ egrep -v '^X|^Performing status on external'
          ++ svn status --no-ignore
          + rm -rf
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1657548.
          
          At revision 1657548.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12696549 - PreCommit-HIVE-TRUNK-Build

          hiveqa Hive QA added a comment - Overall : -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12696549/HIVE-9578.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2668/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2668/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2668/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2668/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1657548. At revision 1657548. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' This message is automatically generated. ATTACHMENT ID: 12696549 - PreCommit-HIVE-TRUNK-Build
          thejas Thejas Nair added a comment -
          • The hbase doc doesn't clearly say what regex syntax is supposed by RegexStringComparator, so I assume it is the java regex. The regex syntax supported by 'show databases' is different. See show database doc . Looks like we need to translate the way ObjectStore.getDatbases does.
          • What tests go into TestHBaseStoreIntegration vs TestHBaseStore. It seems like the tests added to TestHBaseStoreIntegration could be added in TestHBaseStore as well.
          thejas Thejas Nair added a comment - The hbase doc doesn't clearly say what regex syntax is supposed by RegexStringComparator, so I assume it is the java regex. The regex syntax supported by 'show databases' is different. See show database doc . Looks like we need to translate the way ObjectStore.getDatbases does. What tests go into TestHBaseStoreIntegration vs TestHBaseStore. It seems like the tests added to TestHBaseStoreIntegration could be added in TestHBaseStore as well.
          thejas Thejas Nair added a comment -

          Without a check to see if db exists in HBaseStore.alterDatabase, it would end up acting the same as createDatabase. In this case, I think that additional check is worth it. AlterDatabase is not an action that is part of most queries, so I think the cost of the extra check is OK.

          thejas Thejas Nair added a comment - Without a check to see if db exists in HBaseStore.alterDatabase, it would end up acting the same as createDatabase. In this case, I think that additional check is worth it. AlterDatabase is not an action that is part of most queries, so I think the cost of the extra check is OK.
          gates Alan Gates added a comment -

          The only place that calls alterDatabase is DDLTask.alterDatabase, which first calls getDatabase. So I say we make it explicit in the contract in RawStore that this assumes the caller has already checked for the existence of the db and then not do a get. Otherwise we're almost guaranteed to double fetch the db.

          gates Alan Gates added a comment - The only place that calls alterDatabase is DDLTask.alterDatabase, which first calls getDatabase. So I say we make it explicit in the contract in RawStore that this assumes the caller has already checked for the existence of the db and then not do a get. Otherwise we're almost guaranteed to double fetch the db.
          thejas Thejas Nair added a comment -

          It looks like you can use Table.checkAndPut to verify that exists, and update in a single call.

          thejas Thejas Nair added a comment - It looks like you can use Table.checkAndPut to verify that exists, and update in a single call.
          gates Alan Gates added a comment -

          I think we should have a more fundamental discussion first. Is it or isn't it this layer's problem to check that an object already exists when alter is called? I am asserting that it is not. alterTable has already been implemented that way. There's no denying this can lead to errors as developers could produce code paths that forget to check for the existence of an object first. This is especially true given that many 3rd party systems use the Thrift interface directly. On the flip side, there is a cost of doing this check (even for checkAndPut, which I'm sure is more expensive than a straight put and also costs time to serialize the existing object to send for the comparison). Since my main thrust here is performance I'm proposing to push this cost up to the upper layers (which in general are bearing it already).

          gates Alan Gates added a comment - I think we should have a more fundamental discussion first. Is it or isn't it this layer's problem to check that an object already exists when alter is called? I am asserting that it is not. alterTable has already been implemented that way. There's no denying this can lead to errors as developers could produce code paths that forget to check for the existence of an object first. This is especially true given that many 3rd party systems use the Thrift interface directly. On the flip side, there is a cost of doing this check (even for checkAndPut, which I'm sure is more expensive than a straight put and also costs time to serialize the existing object to send for the comparison). Since my main thrust here is performance I'm proposing to push this cost up to the upper layers (which in general are bearing it already).
          thejas Thejas Nair added a comment -

          I think we both agree that there is a tradeoff between safety (specially when api is used by 3rd part) and performance. I am not sure if the performance difference would be noticeable, when compared to other costs such as rpc, parsing the query etc. To be sure, we would need some measurements. I think, for now, we can just clearly document the difference in behavior of this implementation.

          thejas Thejas Nair added a comment - I think we both agree that there is a tradeoff between safety (specially when api is used by 3rd part) and performance. I am not sure if the performance difference would be noticeable, when compared to other costs such as rpc, parsing the query etc. To be sure, we would need some measurements. I think, for now, we can just clearly document the difference in behavior of this implementation.
          thejas Thejas Nair added a comment -

          For documenting this, I think we can just move what you have stated within the function, to javadoc of this function.

          thejas Thejas Nair added a comment - For documenting this, I think we can just move what you have stated within the function, to javadoc of this function.
          gates Alan Gates added a comment -

          In this case I agree performance isn't an issue. One almost never alters a database, and I've never once had someone complain to me that his alter database command is too slow. But I would like the interface to be consistent. Either it checks existence for alters, or it doesn't. Doing it for some objects but not others is confusing. Alter table and alter partition are called on insert (to update the last accessed time), and those we do want to be fast.

          gates Alan Gates added a comment - In this case I agree performance isn't an issue. One almost never alters a database, and I've never once had someone complain to me that his alter database command is too slow. But I would like the interface to be consistent. Either it checks existence for alters, or it doesn't. Doing it for some objects but not others is confusing. Alter table and alter partition are called on insert (to update the last accessed time), and those we do want to be fast.
          thejas Thejas Nair added a comment - - edited

          On second thoughts, when people care about the additional safety check, they are likely to have additional checks in place, such as StorageBasedAuthorization (SBA). SBA for example would check if the user has permissions for the alter* call, and if the object does not exist, the permission would be denied. As a result, this change does not affect safety when users actually care about it.

          I agree we can skip the checks for existence in alters.

          thejas Thejas Nair added a comment - - edited On second thoughts, when people care about the additional safety check, they are likely to have additional checks in place, such as StorageBasedAuthorization (SBA). SBA for example would check if the user has permissions for the alter* call, and if the object does not exist, the permission would be denied. As a result, this change does not affect safety when users actually care about it. I agree we can skip the checks for existence in alters.
          gates Alan Gates added a comment -

          What tests go into TestHBaseStoreIntegration vs TestHBaseStore. It seems like the tests added to TestHBaseStoreIntegration could be added in TestHBaseStore as well.

          Sorry, I missed this comment earlier. When possible I prefer adding tests to TestHBaseStore as it contains true unit tests that are easier to run and debug. However, TestHBaseStore isn't really working against HBase but against a Mockito imitation. This makes certain operations hard, like multiple gets and more complex scans. So those I tend to do in TestHBaseStoreIntegration, since it is going against a real HBase instance. I could of course enhance my Mockito hbase to handle all this, but at some point that seems to yield diminishing returns and just gets easier to put it in TestHBaseStoreIntegration.

          gates Alan Gates added a comment - What tests go into TestHBaseStoreIntegration vs TestHBaseStore. It seems like the tests added to TestHBaseStoreIntegration could be added in TestHBaseStore as well. Sorry, I missed this comment earlier. When possible I prefer adding tests to TestHBaseStore as it contains true unit tests that are easier to run and debug. However, TestHBaseStore isn't really working against HBase but against a Mockito imitation. This makes certain operations hard, like multiple gets and more complex scans. So those I tend to do in TestHBaseStoreIntegration, since it is going against a real HBase instance. I could of course enhance my Mockito hbase to handle all this, but at some point that seems to yield diminishing returns and just gets easier to put it in TestHBaseStoreIntegration.
          gates Alan Gates added a comment -

          New patch that addresses Thejas' comments on the regular expression patterns not being correct for "show databases like". One thing to note is that I did not disable other regular expression syntax, so it will work (ie show databases like db[12];" would show db1 and db2, even though it wouldn't work in the RDBMS case.

          gates Alan Gates added a comment - New patch that addresses Thejas' comments on the regular expression patterns not being correct for "show databases like". One thing to note is that I did not disable other regular expression syntax, so it will work (ie show databases like db [12] ;" would show db1 and db2, even though it wouldn't work in the RDBMS case.
          thejas Thejas Nair added a comment -

          +1

          thejas Thejas Nair added a comment - +1
          gates Alan Gates added a comment -

          Patch 2 checked in. Thanks Thejas for all the good feedback.

          gates Alan Gates added a comment - Patch 2 checked in. Thanks Thejas for all the good feedback.
          leftyl Lefty Leverenz added a comment -

          No user documentation, right?

          leftyl Lefty Leverenz added a comment - No user documentation, right?
          leftylev leftylev added a comment -

          And none needed for HIVE-9579 either? (HIVE-9579: Support all get tables)

          leftylev leftylev added a comment - And none needed for HIVE-9579 either? ( HIVE-9579 : Support all get tables)
          gates Alan Gates added a comment -

          I don't think so, as this is just implementing existing RDBMS functionality in HBase metastore.

          gates Alan Gates added a comment - I don't think so, as this is just implementing existing RDBMS functionality in HBase metastore.

          People

            gates Alan Gates
            gates Alan Gates
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: