Hive
  1. Hive
  2. HIVE-142

Create a metastore check command

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: Metastore
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HIVE-142. Add a metastore check command. (Johan Oskarsson via zshao)

      Description

      We need a command to verify that the information in the metastore reflects the data that is on hdfs. For example partitions can be deleted on hdfs but still be in the metastore.

      From Joydeep Sen Sarma, see ticket HIVE-126 for the full comment:
      for a command line interface - one might want to check the entire database or just a table or even just one partition. other metadata checks will also be added over time (for example - do the file types on disk agree with metadata records, bucketing information etc). So, here's a strawman proposal for a new command:

      alter table <DB>[.TABLE [PARTITION-SPEC]] check [TYPE-LIST]

      where TYPE by default is 'all' (check for all kinds of errors), but can be specified to a specific type. For example - in this case - we can have a type called 'partitions' (and then over time we can add other types like 'fileformat' etc.). for v1 - we can just drop the type-list altogether.

      the check command can produce a list of things that need to be done to fix the format (like adding any directories not in the metastore - but in hdfs - to the metastore). actually performing of such steps would require a user confirmation (y/n).

      1. HIVE-142.patch
        50 kB
        Johan Oskarsson
      2. HIVE-142.patch
        41 kB
        Johan Oskarsson
      3. HIVE-142.patch
        41 kB
        Johan Oskarsson
      4. HIVE-142.patch
        42 kB
        Johan Oskarsson
      5. HIVE-142.patch
        42 kB
        Johan Oskarsson
      6. HIVE-142.patch
        42 kB
        Johan Oskarsson

        Issue Links

          Activity

          Hide
          Johan Oskarsson added a comment -

          In a very basic initial version I thought about implementing the following:

          • Check that the following have directories on HDFS
            • Tables
            • The partitions in the tables (also check that they contain files)
          • Check for partitions on HDFS that are unknown to the Metastore

          The first version will be read only.
          If I understood the comments in HIVE-126 the code that does the actual checking should be called server side from HiveMetaStore.java?

          Show
          Johan Oskarsson added a comment - In a very basic initial version I thought about implementing the following: Check that the following have directories on HDFS Tables The partitions in the tables (also check that they contain files) Check for partitions on HDFS that are unknown to the Metastore The first version will be read only. If I understood the comments in HIVE-126 the code that does the actual checking should be called server side from HiveMetaStore.java?
          Hide
          Joydeep Sen Sarma added a comment -

          i was thinking that inferring of the URI from the partition spec is best done in the server (rather than in the Java client side code as today). Currently the thrift interface does not return the URI. i am assuming that you would have to do something like that for hive-91 anyway.

          If the URI is returned by the metastore - we could do the check outside the server. it's probably better that the check be done outside (since it could potentially be a heavyweight operation and might slow down the metastore). also - typically - checks like these are admin commands and not really that useful exposed through a thrift like api. so a simple java program to do the check using metastore + hdfs calls would be nice. the checks suggested sound like a good start.

          are u planning to do this as a separate utility (might be easier) or do you want to integrate this into alter table family of commands?

          Show
          Joydeep Sen Sarma added a comment - i was thinking that inferring of the URI from the partition spec is best done in the server (rather than in the Java client side code as today). Currently the thrift interface does not return the URI. i am assuming that you would have to do something like that for hive-91 anyway. If the URI is returned by the metastore - we could do the check outside the server. it's probably better that the check be done outside (since it could potentially be a heavyweight operation and might slow down the metastore). also - typically - checks like these are admin commands and not really that useful exposed through a thrift like api. so a simple java program to do the check using metastore + hdfs calls would be nice. the checks suggested sound like a good start. are u planning to do this as a separate utility (might be easier) or do you want to integrate this into alter table family of commands?
          Hide
          Johan Oskarsson added a comment -

          Ok, I see what you mean about the URI.

          Regarding separate utility vs alter table I'm ok with either, does anyone have a strong preference either way?
          The approach I was thinking of was to at least have it as a command in the hive shell. Your original suggestion in the description sounds good to me, alternatively just having a command called something like "check" or "msck" for meta store check. Any preference?

          Show
          Johan Oskarsson added a comment - Ok, I see what you mean about the URI. Regarding separate utility vs alter table I'm ok with either, does anyone have a strong preference either way? The approach I was thinking of was to at least have it as a command in the hive shell. Your original suggestion in the description sounds good to me, alternatively just having a command called something like "check" or "msck" for meta store check. Any preference?
          Hide
          Joydeep Sen Sarma added a comment -

          +1 on a new hive subcommand. msck sounds good (no hard opinions)

          Show
          Joydeep Sen Sarma added a comment - +1 on a new hive subcommand. msck sounds good (no hard opinions)
          Hide
          Johan Oskarsson added a comment -

          This is a rough first patch for this issue, I'd appreciate it if someone could have a quick look at it. I'll do some more testing, touch up some javadoc and other minor changes but beyond that it's fairly complete for this first basic version including unit tests. I'd rather not add any other features to it for this ticket.

          The patch have bits of HIVE-126 and HIVE-182 in it that are required. Once those are both committed I'll roll a new version of this patch without the overlap. If someone would have time to commit 126 and 182 soon it would be much appreciated.

          I have also changed HashMap and AbstractMap to Map in a few places, I can separate that out in a new issue if required.

          Show
          Johan Oskarsson added a comment - This is a rough first patch for this issue, I'd appreciate it if someone could have a quick look at it. I'll do some more testing, touch up some javadoc and other minor changes but beyond that it's fairly complete for this first basic version including unit tests. I'd rather not add any other features to it for this ticket. The patch have bits of HIVE-126 and HIVE-182 in it that are required. Once those are both committed I'll roll a new version of this patch without the overlap. If someone would have time to commit 126 and 182 soon it would be much appreciated. I have also changed HashMap and AbstractMap to Map in a few places, I can separate that out in a new issue if required.
          Hide
          Prasad Chakka added a comment -

          The code looks good. Couple of comments though

          1) msck should just return HiveException. MetaException, TException is (should) not exposed to code outside of metastore package since the outside code doesn't know how to deal with it. And also the error returned by the exceptions may not be that much informative for the user?

          2) Can you put a config variable instead of completely removing the HACK that checks HDFS for partitions? I think this is needed only temporarily until we check our current metadata and data completely.

          3) Can you make the msck command to take more than one partition spec? I don't see any reason why it should be restricted to just one.

          Show
          Prasad Chakka added a comment - The code looks good. Couple of comments though 1) msck should just return HiveException. MetaException, TException is (should) not exposed to code outside of metastore package since the outside code doesn't know how to deal with it. And also the error returned by the exceptions may not be that much informative for the user? 2) Can you put a config variable instead of completely removing the HACK that checks HDFS for partitions? I think this is needed only temporarily until we check our current metadata and data completely. 3) Can you make the msck command to take more than one partition spec? I don't see any reason why it should be restricted to just one.
          Hide
          Johan Oskarsson added a comment -

          As per email discussion with Zheng, will include the HIVE-126 code in this issue so that they can go in together.

          Show
          Johan Oskarsson added a comment - As per email discussion with Zheng, will include the HIVE-126 code in this issue so that they can go in together.
          Hide
          Johan Oskarsson added a comment -

          Updated the patch to include suggestions above.
          1) Removed exceptions as mentioned
          2) It looks like the code for HIVE-126 have been committed. Worth noting that this patch assumes so.
          3) Done, that was an oversight on my part.

          As mentioned before there are a few HashMap to Map changes in the code that strictly don't belong in this patch, let me know if you want them in another issue.

          I'd be grateful if someone had time to review this patch.

          Show
          Johan Oskarsson added a comment - Updated the patch to include suggestions above. 1) Removed exceptions as mentioned 2) It looks like the code for HIVE-126 have been committed. Worth noting that this patch assumes so. 3) Done, that was an oversight on my part. As mentioned before there are a few HashMap to Map changes in the code that strictly don't belong in this patch, let me know if you want them in another issue. I'd be grateful if someone had time to review this patch.
          Hide
          Prasad Chakka added a comment -

          I think HIVE-126 commit was rolled back.

          Anyways, I am looking into this patch.

          Show
          Prasad Chakka added a comment - I think HIVE-126 commit was rolled back. Anyways, I am looking into this patch.
          Hide
          Prasad Chakka added a comment -

          Code looks good. Found few usability issues

          1) msck on all tables prints information only at the end of checking on all tables. If there is an en exception in the middle, then it doesn't print information collected so far. It would be better to print the information before exiting.

          2) there should be an option to continue if a check fails on a table.

          3) the table name could have a wild card

          4) there should be an option to check only tables and not partitions

          Apart from 1) everything else can be fixed in additional JIRAs.

          Show
          Prasad Chakka added a comment - Code looks good. Found few usability issues 1) msck on all tables prints information only at the end of checking on all tables. If there is an en exception in the middle, then it doesn't print information collected so far. It would be better to print the information before exiting. 2) there should be an option to continue if a check fails on a table. 3) the table name could have a wild card 4) there should be an option to check only tables and not partitions Apart from 1) everything else can be fixed in additional JIRAs.
          Hide
          Johan Oskarsson added a comment -

          Updated patch to resolve the concerns in 1), moved the saving of the result to the finally block.
          The rest we can create new jiras for once this one is resolved.

          Show
          Johan Oskarsson added a comment - Updated patch to resolve the concerns in 1), moved the saving of the result to the finally block. The rest we can create new jiras for once this one is resolved.
          Hide
          Prasad Chakka added a comment -

          I don't think the new patch solves the problem. Let's say there are 200 tables and while checking for 101st table there is an exception due to some incorrect metadata. The way current code is written, nothing will be printed out for the first 100 tables even though they they have been checked. I think this can be done by moving the CheckResult object to be the check functions argument rather than the result so even partial results can be printed to screen.

          Show
          Prasad Chakka added a comment - I don't think the new patch solves the problem. Let's say there are 200 tables and while checking for 101st table there is an exception due to some incorrect metadata. The way current code is written, nothing will be printed out for the first 100 tables even though they they have been checked. I think this can be done by moving the CheckResult object to be the check functions argument rather than the result so even partial results can be printed to screen.
          Hide
          Johan Oskarsson added a comment -

          You're absolutely right, fixed in this patch.

          Show
          Johan Oskarsson added a comment - You're absolutely right, fixed in this patch.
          Hide
          Prasad Chakka added a comment -

          +1

          please open JIRA(s) for the others though. thanks a bunch Joahn.

          Show
          Prasad Chakka added a comment - +1 please open JIRA(s) for the others though. thanks a bunch Joahn.
          Hide
          Johan Oskarsson added a comment -

          Added jiras for the three other issues, see issue links on top of this page.

          Show
          Johan Oskarsson added a comment - Added jiras for the three other issues, see issue links on top of this page.
          Hide
          Zheng Shao added a comment -

          @Johan, the patch didn't apply cleanly. Can you regenerate the patch?

          Show
          Zheng Shao added a comment - @Johan, the patch didn't apply cleanly. Can you regenerate the patch?
          Hide
          Johan Oskarsson added a comment -

          Rolled a new patch, this one should apply cleanly

          Show
          Johan Oskarsson added a comment - Rolled a new patch, this one should apply cleanly
          Hide
          Zheng Shao added a comment -

          Test is failing. Can you try the following test?
          It's not that critical (It's a negative query and only the error message is different), but I feel the current error message is more intuitive.

          ant test -Dtestcase=TestNegativeCliDriver -Dqfile=invalid_create_tbl2.q

          [junit] diff I (file:&#41;\|(/tmp/.*) /data/users/zshao/sync/apache-trunkHIVE-142/build/ql/test/logs/clientnegative/invalid_create_tbl2.q.out /data/users/zshao/sync/apache-trunk-HIVE-142/ql/src/test/results/clientnegative/invalid_create_tbl2.q.out
          [junit] 1c1
          [junit] < FAILED: Parse Error: line 1:0 cannot recognize input 'create'
          [junit]
          [junit] > FAILED: Parse Error: line 1:7 mismatched input 'tabl' expecting KW_TEMPORARY
          [junit] Exception: Client execution results dailed with error code = 1
          [junit] junit.framework.AssertionFailedError: Client execution results dailed with error code = 1
          [junit] at junit.framework.Assert.fail(Assert.java:47)
          [junit] at org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_create_tbl2(TestNegativeCliDriver.java:56)
          [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          [junit] at java.lang.reflect.Method.invoke(Method.java:597)
          [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
          [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
          [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
          [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
          [junit] at junit.framework.TestResult.run(TestResult.java:109)
          [junit] at junit.framework.TestCase.run(TestCase.java:118)
          [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
          [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
          [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
          [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 9.533 sec

          Show
          Zheng Shao added a comment - Test is failing. Can you try the following test? It's not that critical (It's a negative query and only the error message is different), but I feel the current error message is more intuitive. ant test -Dtestcase=TestNegativeCliDriver -Dqfile=invalid_create_tbl2.q [junit] diff I ( file:&#41;\ |(/tmp/.*) /data/users/zshao/sync/apache-trunk HIVE-142 /build/ql/test/logs/clientnegative/invalid_create_tbl2.q.out /data/users/zshao/sync/apache-trunk- HIVE-142 /ql/src/test/results/clientnegative/invalid_create_tbl2.q.out [junit] 1c1 [junit] < FAILED: Parse Error: line 1:0 cannot recognize input 'create' [junit] — [junit] > FAILED: Parse Error: line 1:7 mismatched input 'tabl' expecting KW_TEMPORARY [junit] Exception: Client execution results dailed with error code = 1 [junit] junit.framework.AssertionFailedError: Client execution results dailed with error code = 1 [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_create_tbl2(TestNegativeCliDriver.java:56) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at junit.framework.TestCase.runTest(TestCase.java:154) [junit] at junit.framework.TestCase.runBare(TestCase.java:127) [junit] at junit.framework.TestResult$1.protect(TestResult.java:106) [junit] at junit.framework.TestResult.runProtected(TestResult.java:124) [junit] at junit.framework.TestResult.run(TestResult.java:109) [junit] at junit.framework.TestCase.run(TestCase.java:118) [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208) [junit] at junit.framework.TestSuite.run(TestSuite.java:203) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768) [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 9.533 sec
          Hide
          Johan Oskarsson added a comment -

          I'll need some help with this one, my antlr experience is very limited. Looking at the patch the changes to Hive.g are separate from the createStatement. The only place the meta store check and createStatement meet are in ddlStatement, but I don't see how the error message would be changed by that.

          Show
          Johan Oskarsson added a comment - I'll need some help with this one, my antlr experience is very limited. Looking at the patch the changes to Hive.g are separate from the createStatement. The only place the meta store check and createStatement meet are in ddlStatement, but I don't see how the error message would be changed by that.
          Hide
          Prasad Chakka added a comment -

          IMO, both are equally misleading. The difference could be that the ANTLR isn't back tracking properly and that path might have changed with the inclusion of metastoreCheck command. I think we should commit after changing the test output.

          Show
          Prasad Chakka added a comment - IMO, both are equally misleading. The difference could be that the ANTLR isn't back tracking properly and that path might have changed with the inclusion of metastoreCheck command. I think we should commit after changing the test output.
          Hide
          Zheng Shao added a comment -

          That makes sense. I will fix the test output and commit.

          Show
          Zheng Shao added a comment - That makes sense. I will fix the test output and commit.
          Hide
          Ashish Thusoo added a comment -

          Guys,

          It does not make sense to checkin this without figuring out why the error message has changed. Let me spend some time on this being the resident antlr expert here. Please hold off till then.

          Show
          Ashish Thusoo added a comment - Guys, It does not make sense to checkin this without figuring out why the error message has changed. Let me spend some time on this being the resident antlr expert here. Please hold off till then.
          Hide
          Ashish Thusoo added a comment -

          Ok. found what is happening..

          This is not actually backtracking related but actually the related to the normal code that antlr generates. If you make the createFunction branch the last branch in ddlStatement i.e.

          ddlStatement
          : createStatement

          dropStatement
          alterStatement
          descStatement
          showStatement
          metastoreCheck
          createFunctionStatement
          ;

          instead of

          ddlStatement
          : createStatement

          dropStatement
          alterStatement
          descStatement
          showStatement
          createFunctionStatement
          metastoreCheck
          ;

          Basically the first and the createStatement and createFunctionStatement branches need a lookahead for disambiguation as both of them start with the create key word. For some reason antlr generates
          if (pred_for_create)

          { take alternative 1; }
          if (true) { take alternative 2; }

          incase these one of these branches is the last statement (I think this has got to do with the fact that there are no more branches to check if createFunctionStatement is that last branch).

          otherwise,
          if (pred_for_create) { take alternative 1; }

          if (pred_for_create_function)

          { take alternative 2; }

          That changes the error.

          I agree with Prasad, that both the errors are equally useless, so +1 to capturing the new log.

          Show
          Ashish Thusoo added a comment - Ok. found what is happening.. This is not actually backtracking related but actually the related to the normal code that antlr generates. If you make the createFunction branch the last branch in ddlStatement i.e. ddlStatement : createStatement dropStatement alterStatement descStatement showStatement metastoreCheck createFunctionStatement ; instead of ddlStatement : createStatement dropStatement alterStatement descStatement showStatement createFunctionStatement metastoreCheck ; Basically the first and the createStatement and createFunctionStatement branches need a lookahead for disambiguation as both of them start with the create key word. For some reason antlr generates if (pred_for_create) { take alternative 1; } if (true) { take alternative 2; } incase these one of these branches is the last statement (I think this has got to do with the fact that there are no more branches to check if createFunctionStatement is that last branch). otherwise, if (pred_for_create) { take alternative 1; } if (pred_for_create_function) { take alternative 2; } That changes the error. I agree with Prasad, that both the errors are equally useless, so +1 to capturing the new log.
          Hide
          Johan Oskarsson added a comment -

          Updated the patch with the change to Hive.g mentioned above, it seemed like an easier fix then to change the test output and there were interest in the retaining the original error message.

          Show
          Johan Oskarsson added a comment - Updated the patch with the change to Hive.g mentioned above, it seemed like an easier fix then to change the test output and there were interest in the retaining the original error message.
          Hide
          Zheng Shao added a comment -

          Thanks Johan! Committed revision 733875.

          Show
          Zheng Shao added a comment - Thanks Johan! Committed revision 733875.

            People

            • Assignee:
              Johan Oskarsson
              Reporter:
              Johan Oskarsson
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development