  1. Apache Drill
  2. DRILL-3588

Write back to Hive Metastore



    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Storage - Hive
    • Labels: None


      This feature is particularly important to us at AtScale, because we want to offer Drill as a query engine option for our BI-on-Hadoop solution. Currently, you can connect to and query databases and tables from the Hive Metastore without problems. However, if you create a table, it is created in HDFS but no metadata is written to the Hive Metastore, so the table is not easily visible to any other tool.

      When you read schemas from a Hive data source via Drill, they are prefixed with "hive.". This namespacing makes sense to us given how Drill works. Ideally it would work symmetrically when you create tables with the same prefix: Drill would map the prefix to the target data source, in this case Hive, and write the schema information back to the Hive Metastore. Our specific use case is CREATE TABLE AS SELECT (CTAS); ideally, however, any DDL statement against a Hive data source schema or table would write back to the Hive Metastore.
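To make the requested behavior concrete, here is a minimal Python sketch of prefix-based routing for a CTAS target. It is not Drill code: `InMemoryMetastore`, `StoragePlugin`, and `create_table_as_select` are hypothetical names invented for illustration, and the in-memory dictionary only stands in for the Hive Metastore. The sketch assumes a three-part `plugin.database.table` name and skips writing the data files.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryMetastore:
    """Hypothetical stand-in for the Hive Metastore: (database, table) -> columns."""
    tables: dict = field(default_factory=dict)

    def register(self, database, table, columns):
        self.tables[(database, table)] = list(columns)


@dataclass
class StoragePlugin:
    """Hypothetical storage plugin; metastore is None when there is no catalog to update."""
    name: str
    metastore: Optional[InMemoryMetastore] = None


def split_qualified_name(qualified_name):
    """Split 'plugin.database.table' into its three parts."""
    parts = qualified_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected plugin.database.table, got {qualified_name!r}")
    return tuple(parts)


def create_table_as_select(plugins, qualified_name, columns):
    """Route a CTAS target by its prefix and record the schema when possible.

    Returns True when metadata was written back to a metastore, and False
    when only data files would exist (the situation this issue describes).
    """
    plugin_name, database, table = split_qualified_name(qualified_name)
    plugin = plugins[plugin_name]  # raises KeyError for an unknown prefix
    # Writing the data files themselves is out of scope for this sketch.
    if plugin.metastore is None:
        return False
    plugin.metastore.register(database, table, columns)
    return True
```

With a `hive` plugin that holds a metastore and a `dfs` plugin that does not, calling `create_table_as_select(plugins, "hive.default.sales_summary", ["region", "total"])` records the table in the metastore, while a `dfs.`-prefixed target leaves the metastore untouched. The table and column names here are made up for the example.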

      Having the metadata in the Hive Metastore is important because many of our customers use multiple SQL tools to access data tracked in the Metastore. For example, even if Impala is their primary SQL-on-Hadoop engine for clients and tools, they may run Spark jobs that manipulate data via RDDs and pull that data by referencing the Metastore. Organizations that make heavy use of SQL on Hadoop have come to expect this sort of interoperability between Hive, Spark, and Impala. Supporting it within Drill will help drive adoption within the Hadoop community, and will make it much easier for us to use Drill effectively from within our BI engine.



              Assignee: Unassigned
              Reporter: Joseph Barefoot (jbarefoot)