Uploaded image for project: 'Apache Trafodion'
  1. Apache Trafodion
  2. TRAFODION-19

External tables for HIVE

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sql-general
    • Labels:
      None

      Description

      Trafodion supports selecting, loading from, listing, and describing native HIVE tables. HIVE tables are identified by specifying a special catalog and schema named HIVE.HIVE.

      To select from a HIVE table named t; specify an implicit or explicit name, such as HIVE.HIVE.t, in a Trafodion SQL statement.

      set schema HIVE.HIVE;
      select * from t; – implicit table name
      set schema trafodion.seabase;
      select * from HIVE.HIVE.t; – explicit table name

      Trafodion interprets the special HIVE.HIVE catalog and schema name to be a native HIVE table. During preparation of an SQL statement referencing a native HIVE table, Trafodion contacts HIVE and obtains a description of the table. It then creates an internal description (NATable) of the HIVE table. The NATable definition is used by the compiler and code generation process to prepare the plan. Trafodion does not store any details in Trafodion metadata.

      Several Trafodion commands today, would work more effectively if we allow native HIVE table to be partially described in Trafodion. That is, store their definitions in Trafodion metadata. This JIRA describes a proposal to allow HIVE tables to be registered in Trafodion metadata by specifying the “CREATE EXTERNAL <table> TABLE …” syntax.

      Proposal

      Allow HIVE tables to be registered in Trafodion metadata through the EXTERNAL TABLE create option.

      CREATE EXTERNAL TABLE [IF NOT EXISTS] table LIKE hive-source-table;
      DROP EXTERNAL TABLE [IF EXISTS] table;

      hive-source-table - native HIVE table to be registered in the Trafodion metadata. The hive-source-table has to exist.

      table - table stored in Trafodion metadata. Initially, the table name should be the same as the hive-source-table name.
      <Is there any added value to make the table name different than the hive-source-table name?>

      The default catalog and schema names for HIVE tables are HIVE.HIVE (defined in ComSmallDefs as HIVE_SYSTEM_CATALOG and HIVE_SYSTEM_SCHEMA).

      To change the description, the external table needs to be dropped then recreated. ALTER EXTERNAL TABLE is not supported.

      The following command’s behavior changes for external tables:
      • UPDATE STATISTICS – an external table can be used to gather statistics for HIVE tables
      • SHOWDDL – will now be allowed on external tables
      • GRANT and REVOKE – privileges can be specified for external tables
      • SELECT and LOAD – will be allowed on external tables

      Design notes:

      When an external table is created, Trafodion:
      Reads its metadata to see if the table already exists.
      If exists and IF NOT EXISTS is specified, returns success
      If exists and IF NOT EXISTS is not specified, returns an error
      Gets the table description from HIVE
      If the source table exists, it reads HIVE metadata through existing interfaces
      If the source table does not exist, returns an error
      Stores the description in its metadata

      When an external table is dropped, Trafodion:
      Reads its metadata to see if the table exists
      If it does not exists and IF EXISTS is specified, returns success
      If it does not exists and IF EXISTS is not specied, returns success
      Removes the table description from its metadata found in the system schema (MD) and privilege manager schema (“PRIVMGR_MD”), and removing data from the histograms table (HIVE).

      When a query is executed, Trafodion:
      Gets the description of the table by reading metadata
      If not defined then behavior is the same as today, information is retrieved directly from HIVE.
      If it is defined then metadata information is retrieved from Trafodion.
      Creates a NATable structure from the table description, a new flag will be added to indicate whether or not it is an external table.
      Gets histogram information
      If histograms exist, then they are used during plan generation
      If no histograms exist, then default histograms are generated.
      Only external tables can have histograms
      Generates and executes the query – similar to current behavior

      External tables can be the target of an UPDATE STATISTICS command. When an external table is specified in an UPDATE STATISTICS command, Trafodion:
      gets the table description via the NATable structure as defined above
      returns an error If it is not an external table
      gathers statistics by accessing the HIVE table
      stores the results in histogram metadata in the HIVE schema

      The NATable structure contains privilege information that is checked during compilation. If authorization is enabled, the generation of the NATable structure gathers privileges:
      if it is an external table
      If privileges exist in Trafodion metadata, they are returned
      If not, then HIVE metadata is accessed to retrieve privileges
      If it is not an external table, HIVE metadata is accessed to retrieve privileges

      Today, SHOWDDL returns an error if a HIVE table is specified. For external tables, SHOWDDL displays the description of HIVE tables from data gathered from Trafodion metadata.

        Attachments

        1. external_table.docx
          37 kB
          Roberta Marton

          Issue Links

            Activity

              People

              • Assignee:
                rmarton Roberta Marton
                Reporter:
                rmarton Roberta Marton
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - 336h
                  336h
                  Remaining:
                  Remaining Estimate - 336h
                  336h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified