Hive / HIVE-132

Show table and describe results should be read via FetchTask

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.3.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HIVE-132. Show table and describe results to be read via FetchTask. (Raghotham Murthy via zshao)

      Description

      Right now there is a different code path in the Driver for show tables/describe etc. By adding the FetchTask the code paths will be merged. In addition, the results of these statements will be readable via JDBC.

      Going forward, we should provide SQL access to our metastore, similar to the access that Oracle or MySQL provide to their catalog tables. In this JIRA, I will just add schema information to show tables and describe queries.

      1. hive-132.1.patch
        13 kB
        Raghotham Murthy
      2. hive-132.2.patch
        70 kB
        Raghotham Murthy
      3. hive-132.3.patch
        84 kB
        Raghotham Murthy
      4. hive-132.4.patch
        75 kB
        Raghotham Murthy

        Activity

        Raghotham Murthy added a comment -

        Changed show tables and describe table to return results via FetchTask. 'extended' option results in bypassing the FetchTask for the extra information.

        For JDBC we need to potentially add to the syntax of show tables and describe tables to return more columns (albeit with dummy values). We need the following columns:

        For show tables:
        NAME_SPACE
        TAB_NAME

        For describe (on a set of table names):
        NAME_SPACE
        TAB_NAME
        COL_NAME
        DATA_TYPE
        DATA_LEN
        DATA_PREC
        DATA_SCALE
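
The column lists above can be written down as fixed schemas. The following is a minimal sketch only: `CatalogColumns` and `describeRow` are hypothetical names, not Hive classes, and it assumes that only the column name and type are known, so the remaining columns are filled with null as the dummy values mentioned above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the column lists proposed in the comment; not part of Hive.
final class CatalogColumns {
    static final List<String> SHOW_TABLES =
        Collections.unmodifiableList(Arrays.asList("NAME_SPACE", "TAB_NAME"));

    static final List<String> DESCRIBE =
        Collections.unmodifiableList(Arrays.asList(
            "NAME_SPACE", "TAB_NAME", "COL_NAME",
            "DATA_TYPE", "DATA_LEN", "DATA_PREC", "DATA_SCALE"));

    private CatalogColumns() {}

    // Builds one describe row. Only the first four columns are supplied;
    // DATA_LEN, DATA_PREC and DATA_SCALE stay null (assumed dummy values).
    static String[] describeRow(String nameSpace, String table, String column, String type) {
        String[] row = new String[DESCRIBE.size()];
        row[DESCRIBE.indexOf("NAME_SPACE")] = nameSpace;
        row[DESCRIBE.indexOf("TAB_NAME")] = table;
        row[DESCRIBE.indexOf("COL_NAME")] = column;
        row[DESCRIBE.indexOf("DATA_TYPE")] = type;
        return row;
    }
}
```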

        Namit Jain added a comment -

        1. The same change also needs to be done for showPartitions in DDLTask.
        2. Remove all the context-specific code from Driver.fetch() - fetch only needs to invoke FetchTask now.
        3. Can you add more comments?

        Raghotham Murthy added a comment -

        @Namit

        1. I was just concentrating on getting show tables and describe working first. Will add show partitions next.
        2. I can't remove the context-specific code from Driver.fetch(), since the 'extended' option returns free-form results. It might be a good idea to return structured results for extended as well as explain statements.
        3. Will add more comments.
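
The goal discussed in point 2, a fetch method that only delegates to a single row-producing task, might look roughly like the sketch below. All names here (`RowSource`, `ListRowSource`, `SketchDriver`) are hypothetical and stand in for Hive's actual Driver and FetchTask, whose real signatures are not shown in this thread.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a task that can hand back result rows.
interface RowSource {
    // Appends up to maxRows rows to out; returns false if nothing was left to append.
    boolean fetch(List<String> out, int maxRows);
}

// Simple in-memory source used only to make the sketch runnable.
final class ListRowSource implements RowSource {
    private final Iterator<String> rows;

    ListRowSource(List<String> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean fetch(List<String> out, int maxRows) {
        int added = 0;
        while (added < maxRows && rows.hasNext()) {
            out.add(rows.next());
            added++;
        }
        return added > 0;
    }
}

// The driver has no statement-specific branches: it just delegates.
final class SketchDriver {
    private final RowSource source;

    SketchDriver(RowSource source) {
        this.source = source;
    }

    boolean fetch(List<String> out, int maxRows) {
        return source.fetch(out, maxRows);
    }
}
```

With this shape, free-form output such as the 'extended' blob would have to be expressed as ordinary rows before the driver could drop its special cases.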

        Raghotham Murthy added a comment -

        In this patch:

        1. added support for show partitions
        2. describe extended is also supported via FetchTask - the detailed information becomes an additional row in the output.
        3. added comments
        4. captured new outputs dor

        Ashish Thusoo added a comment -

        A few nitpicks:
        1. Why did we get rid of the ':' in the explain plan output? "Table Information:" has been replaced by "Table Information" without a new line. Also, there seems to be an extra space/tab after the column metadata is printed.

        Thanks for fixing "dailed" to "failed"; that has been there for so long... :)

        For the general code:
        1. In DDLSemanticAnalyzer.java, in the function createFetchTask, setSerializationNullFormat is "" - should this be a null string instead of ""?

        Otherwise it looks good to me.

        Raghotham Murthy added a comment -

        The table information is actually another row with the same schema: column-name = "Table Information", column-type = <the extended information>, column-description = null. Without this, it's not possible to use the FetchTask to get to the extended data. I also added an additional empty row (all columns are null). I could hack it such that a ':' is printed after "Table Information" and write out the actual information in another row. But I can't get rid of the extra tab, since I have to print all rows using the same schema. I think it's better to figure out a schema for the extended information rather than hacking this further to make it look better in unstructured form.

        DDLSemanticAnalyzer: when I set the null string to the empty string, '\N' was being displayed. So, I set it to a space instead. I was not sure what to change to make the empty string work.
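
To make the "extra tab" concrete, here is a minimal sketch of printing every row with the same three-column schema and a configurable null format. `DescribeRowFormat` is a hypothetical helper, not Hive's SerDe code, and `<info>` is a placeholder for the extended information.

```java
// Hypothetical helper: prints a row as tab-separated text, substituting
// nullFormat for null columns. Not Hive's actual serialization code.
final class DescribeRowFormat {
    private DescribeRowFormat() {}

    static String serialize(String[] row, String nullFormat) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append('\t'); // every column boundary gets a tab, even before a null
            }
            sb.append(row[i] == null ? nullFormat : row[i]);
        }
        return sb.toString();
    }
}
```

Under these assumptions, the row {"Table Information", "<info>", null} with a single space as the null format is printed as "Table Information", a tab, "<info>", a tab, and a space. The trailing tab is a consequence of the shared schema, and an all-null row is printed as two tabs between three spaces.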

        Namit Jain added a comment -

        Raghu, can you refresh and resubmit the patch? I will take a look immediately.

        Raghotham Murthy added a comment -

        Regenerating patch. The following changes happen to the output of show tables, describe, describe extended and describe extended partition.

        show tables:
        The list of tables will be printed with one table per row.

        describe:
        The schema of the output of describe will be tab-separated <string, string, string>, irrespective of whether columns have comments or not.

        describe extended:
        Schema is the same as describe. Right now, the detailed table information is printed as a blob on a new line. It has been changed to be printed on the same line as the string 'Detailed Table Information'.

        describe extended partition:
        Same as describe extended for Detailed Partition Information
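
A client reading the new describe output could split each line into the three string columns as sketched below. This is an illustration under assumptions: `DescribeOutput` is a hypothetical name, the columns are taken to be name, type and comment, and a missing trailing column is padded with an empty string.

```java
// Hypothetical client-side parser for one line of tab-separated describe output.
final class DescribeOutput {
    private static final int COLUMNS = 3;

    private DescribeOutput() {}

    static String[] parseLine(String line) {
        // A positive limit keeps trailing empty fields and leaves any further
        // tabs inside the last column untouched.
        String[] parts = line.split("\t", COLUMNS);
        String[] row = new String[COLUMNS];
        for (int i = 0; i < COLUMNS; i++) {
            row[i] = i < parts.length ? parts[i] : "";
        }
        return row;
    }
}
```

The same parser handles a describe extended line, where the first column is the label 'Detailed Table Information' and the second carries the blob.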

        Namit Jain added a comment -

        Haven't looked at the changes in detail yet, but some observations:

        1. Use LazySerDe instead of Dynamic SerDe for the schema
        2. input20.q.out seems to be garbled - can you resolve that ?

        Prasad Chakka added a comment -

        TestNegativeCliDriver.vm:74
        dailed reappears?

        I don't understand why the function getSerDe() is needed. The return value is being used to serialize partition names etc. Why?

        Raghotham Murthy added a comment -

        Removed SerDe from DDLTask and writing out tab-separated values directly. Using LazySimpleSerDe in FetchTask. Regenerated test outputs.
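
The split described here, a producer that writes plain tab-separated text and a consumer that deserializes it, can be sketched as follows. This assumes the handoff goes through a result file; `ResultFileSketch` and its methods are hypothetical, and plain string splitting stands in for LazySimpleSerDe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the writer/reader handoff; not Hive's DDLTask or FetchTask.
final class ResultFileSketch {
    private ResultFileSketch() {}

    // Producer side: one tab-separated line per row, no SerDe involved.
    // Rows must not contain null elements (String.join would print "null").
    static void writeRows(Path file, List<String[]> rows) {
        List<String> lines = new ArrayList<String>();
        for (String[] row : rows) {
            lines.add(String.join("\t", row));
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Consumer side: read the file back and split each line into columns.
    static List<String[]> readRows(Path file, int columns) {
        try {
            List<String[]> rows = new ArrayList<String[]>();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                rows.add(line.split("\t", columns));
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```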

        Zheng Shao added a comment -

        Committed revision 758933. Thanks Raghu!


          People

          • Assignee:
            Raghotham Murthy
            Reporter:
            Raghotham Murthy
          • Votes:
            0
            Watchers:
            3
