HIVE-18122: HCatInputFormat cannot read any data when non-native table has partition columns



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HCatalog
    • Labels: None


      First, some background: a non-native table can be created with partition columns defined, but such columns are problematic when the table is read through HCatInputFormat. Nothing disallows the table creation, and the documentation [1] does not mention that non-native tables cannot have partition columns. In fact, it suggests that "PARTITIONED BY" can be specified.

      With such a table definition, no job using HCatInputFormat can ever read any data, and the cause is not immediately obvious; it is only revealed by debugging. The bug stems from the getInputJobInfo method of the org.apache.hive.hcatalog.mapreduce.InitializeInput class, where it attempts to identify the partitions to read. With partition columns defined, table.getPartitionKeys().size() is > 0, so it proceeds to the listPartitionsByFilter(...) code, which will never find any partitions because partitions cannot be added to a non-native table (HIVE-1223). The returned InputJobInfo then has an empty List<PartInfo>, whereas the "Non partitioned table" path would have used the table's StorageDescriptor and parameters to build a singleton PartInfo.
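
      The branching described above can be modelled in a few lines of plain Java. This is only a minimal sketch and not the actual Hive code: TableModel, reportedBehavior and guardedBehavior are hypothetical names, partitions and PartInfo entries are represented as strings, and the guard on the non-native flag is an assumption about what a fix might look like rather than anything proposed in this ticket.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of the partition-selection branch; none of these types are Hive classes.
class PartitionPathSketch {

    // Hypothetical stand-in for the table metadata consulted when planning the read.
    static final class TableModel {
        final boolean nonNative;                 // true when a storage handler backs the table
        final List<String> partitionKeys;        // columns declared with PARTITIONED BY
        final List<String> registeredPartitions; // always empty for a non-native table

        TableModel(boolean nonNative, List<String> partitionKeys,
                   List<String> registeredPartitions) {
            this.nonNative = nonNative;
            this.partitionKeys = partitionKeys;
            this.registeredPartitions = registeredPartitions;
        }
    }

    // Mirrors the reported logic: any partition key sends the read down the
    // partition-listing path, which yields nothing for a non-native table.
    static List<String> reportedBehavior(TableModel t) {
        if (!t.partitionKeys.isEmpty()) {
            return new ArrayList<>(t.registeredPartitions);
        }
        // "Non partitioned table" path: one entry built from table-level metadata.
        return Collections.singletonList("table-level PartInfo");
    }

    // Assumed alternative: a non-native table always takes the non-partitioned path.
    static List<String> guardedBehavior(TableModel t) {
        if (!t.partitionKeys.isEmpty() && !t.nonNative) {
            return new ArrayList<>(t.registeredPartitions);
        }
        return Collections.singletonList("table-level PartInfo");
    }

    public static void main(String[] args) {
        TableModel hbaseBacked = new TableModel(
                true, Collections.singletonList("dt"), Collections.<String>emptyList());
        System.out.println("reported: " + reportedBehavior(hbaseBacked).size() + " PartInfo entries");
        System.out.println("guarded:  " + guardedBehavior(hbaseBacked).size() + " PartInfo entries");
    }
}
```

      In this model the reported logic returns an empty list for the storage-handler-backed table, so there is nothing to read, while the guarded variant returns the single table-level entry.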

      This bug is quite similar to HIVE-18087 although it resides in a different layer of Hive.

      We encountered this using the HBaseStorageHandler, although I don't believe that's a particularly relevant detail.

      [1] https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-DDL


              Assignee: Unassigned
              Reporter: Andrew Olson (noslowerdna)