IMPALA-4641

Loading tpch nested test data to a remote cluster silently fails


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: None
    • Component/s: Infrastructure

    Description

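      Running the Impala data load scripts doesn't always produce the same results on a remote cluster as on the local minicluster. In this case, the tpch_nested_parquet data is never loaded, and no error is reported. On the remote cluster, the supplier table shows 0 rows in a single 356B file: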

      [impala-debian78-test-cluster-4.vpc.cloudera.com:21000] > show table stats tpch_nested_parquet.supplier;
      Query: show table stats tpch_nested_parquet.supplier
      +-------+--------+------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------------------------------------+
      | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                                                        |
      +-------+--------+------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------------------------------------+
      | 0     | 1      | 356B | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://impala-debian78-test-cluster-1.vpc.cloudera.com:8020/user/hive/warehouse/tpch_nested_parquet.db/supplier |
      +-------+--------+------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------------------------------------+
      Fetched 1 row(s) in 0.01s
      

      Compare this with the local minicluster after running the data load, where the same table shows 10000 rows and 43.00MB:

      [localhost.localdomain:21000] > show table stats tpch_nested_parquet.supplier;
      Query: show table stats tpch_nested_parquet.supplier
      +-------+--------+---------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------+
      | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                              |
      +-------+--------+---------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------+
      | 10000 | 1      | 43.00MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/supplier |
      +-------+--------+---------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------+
      Fetched 1 row(s) in 4.90s
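
Because the load fails silently, a post-load check on the reported row count would surface the problem. The following is a minimal sketch, not part of the Impala data load scripts: it assumes the text output of `show table stats` has been captured as a string in the ASCII-table layout shown above, and the helper names `parse_table_stats` and `looks_unloaded` are hypothetical.

```python
def parse_table_stats(output: str) -> list[dict[str, str]]:
    """Parse an impala-shell ASCII table into a list of row dicts.

    Only lines that start with '|' are treated as table rows; the first
    such line is taken as the header. Border lines ('+---+') and the
    'Query:' / 'Fetched' lines are ignored.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def looks_unloaded(stats_row: dict[str, str]) -> bool:
    """Return True when the stats row reports exactly zero rows.

    Assumption: a table that should contain data but reports 0 rows was
    not loaded. Other values (including negative ones) are not flagged.
    """
    return int(stats_row["#Rows"]) == 0
```

Applied to the two outputs above, the remote cluster's supplier row would be flagged (`#Rows` is 0) and the minicluster's would not (`#Rows` is 10000). One way to capture the output is to run the statement non-interactively with `impala-shell -i <host>:21000 -q "show table stats tpch_nested_parquet.supplier"`.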
      

      People

        Assignee: Unassigned
        Reporter: David Knupp (dknupp)