IMPALA-2223: Data loading fails on remote cluster when loading HBase tables


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.2
    • Fix Version/s: None
    • Component/s: Infrastructure
    • Labels: None

      Description

      When trying to load our test data on a fresh CDH cluster, the load fails while creating an external Hive table on top of HBase with 500 columns. The reason is that the hbase.columns.mapping string Hive generates for this table is ~7700 characters long, but the metastore column that stores it only allows 4000 characters.
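
      To make the failure mode concrete, here is a minimal sketch of the kind of DDL involved (table and column names are hypothetical, and only a few columns are shown): every mapped column contributes another entry to the hbase.columns.mapping value, so at 500 columns the property easily outgrows a 4000-character limit.

        -- Hypothetical HBase-backed external table; the real test table has 500 columns.
        CREATE EXTERNAL TABLE hbase_wide (
          id STRING,
          col1 STRING,
          col2 STRING,
          col3 STRING)
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        WITH SERDEPROPERTIES (
          -- One ":key" or "cf:colN" entry per Hive column. With 500 columns this
          -- value grows to ~7700 characters and is stored verbatim in the
          -- metastore column SERDE_PARAMS.PARAM_VALUE.
          "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3")
        TBLPROPERTIES ("hbase.table.name" = "hbase_wide");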

      There is a Hive bug tracking this issue, https://issues.apache.org/jira/browse/HIVE-9815, but the real question is why the schema definition of our local Hive metastore database differs from the one we ship, and how we can correct that.

      Interestingly, the PostgreSQL script we use to create the metastore also defines the column with a 4000-character limit, which leaves the question: why isn't the test failing locally? Checking what was actually applied to the local metastore should answer that (see the sketch below).
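
      A minimal sketch for that check, assuming a PostgreSQL-backed metastore and the quoted upper-case table names the upstream schema scripts create:

        -- Inspect the effective definition of the column that stores SERDE
        -- properties such as hbase.columns.mapping.
        SELECT data_type, character_maximum_length
        FROM information_schema.columns
        WHERE table_name = 'SERDE_PARAMS'
          AND column_name = 'PARAM_VALUE';
        -- 4000 means the shipped VARCHAR(4000) definition is in effect; a NULL
        -- length with data_type 'text' would mean the limit is absent, which
        -- would explain why local data loading succeeds.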

      Another nit: I just checked hive_impala_dump_cdh5-585.txt, and there the column definition for SERDE_PARAMS.PARAM_VALUE has no length limitation at all.
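
      If that unbounded definition is the intended one, a local workaround (my assumption; the proper upstream fix is what HIVE-9815 tracks) would be to widen the column the same way:

        -- Drop the 4000-character limit on the SERDE property value column.
        -- Identifiers are quoted because the metastore scripts create them as
        -- upper-case names in PostgreSQL.
        ALTER TABLE "SERDE_PARAMS" ALTER COLUMN "PARAM_VALUE" TYPE TEXT;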


               People

               • Assignee: David Knupp (dknupp)
               • Reporter: Martin Grund (mgrund_impala_bb91)
               • Votes: 0
               • Watchers: 2
