IMPALA-2223

Data loading fails on remote cluster when loading HBase tables


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version: Impala 2.2
    • Fix Version: None
    • Component: Infrastructure

    Description

      When trying to load our test data on a fresh CDH cluster, the load fails while creating an external Hive table on top of HBase with 500 columns. The reason is that the column mapping Hive generates is a string roughly 7700 characters long, but the corresponding column definition in the Hive metastore only allows 4000 characters.
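      A rough sketch of why a 500-column table blows past the 4000-character limit: the mapping string contains one `family:qualifier` entry per column plus the row key. The column family and name pattern below are hypothetical, not the actual test schema.

```python
# Sketch: estimate the length of an hbase.columns.mapping string for a wide
# table. The family name "d" and the "test_col_NNN" naming pattern are
# assumptions for illustration only.
def mapping_length(n_cols, family="d", name_fmt="test_col_{:03d}"):
    # The first entry maps the HBase row key; the rest are family:qualifier pairs.
    entries = [":key"] + [f"{family}:{name_fmt.format(i)}" for i in range(n_cols)]
    return len(",".join(entries))

print(mapping_length(500))  # → 7504 with these assumed names, well over 4000
```

      Even with short, single-character family names, 500 comma-separated entries comfortably exceed a 4000-character column.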

      There is a Hive bug tracking this issue: https://issues.apache.org/jira/browse/HIVE-9815. The real question, though, is why the schema definition of our local Hive metastore database differs from the one we ship, and how we can correct that.

      Interestingly, the PostgreSQL script we use to create the metastore also defines the column with a 4000-character limit, which leaves the question: why isn't the test failing?
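      One way to verify what limit a given metastore DDL script actually declares is to scan it for the PARAM_VALUE definition. The DDL text below is a stand-in for the real PostgreSQL schema script, whose contents and path vary per install.

```python
import re

# Sketch: extract the declared length of SERDE_PARAMS.PARAM_VALUE from a
# metastore DDL script. This inline DDL is an illustrative stand-in, not the
# shipped schema file.
ddl = '''
CREATE TABLE "SERDE_PARAMS" (
    "SERDE_ID" bigint NOT NULL,
    "PARAM_KEY" character varying(256) NOT NULL,
    "PARAM_VALUE" character varying(4000)
);
'''

match = re.search(r'"PARAM_VALUE"\s+character varying\((\d+)\)', ddl)
limit = int(match.group(1)) if match else None
print(limit)  # → 4000
```

      Running the same scan over the schema that the local metastore was actually created from would show whether the limit was silently widened there.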

      Another nit: I just checked hive_impala_dump_cdh5-585.txt, and there the column definition for SERDE_PARAMS.PARAM_VALUE has no length limitation.


      People

        Assignee: David Knupp (dknupp)
        Reporter: Martin Grund (mgrund_impala_bb91)
        Votes: 0
        Watchers: 2
