IMPALA-4482

Loading Impala's test data produces different TPCDS data on local vs. remote clusters for partitioned tables

    Description

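      For example, show table stats on tpcds.store_sales reports different row and file counts for the same partitions on the two clusters: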

      Local mini-cluster

      [localhost:21000] > show table stats store_sales;
      Query: show table stats store_sales
      +-----------------+--------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------+
      | ss_sold_date_sk | #Rows  | #Files | Size     | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                                        |
      +-----------------+--------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------+
      | 2450829         | 1071   | 1      | 127.20KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450829 |
      | 2450846         | 839    | 1      | 99.89KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450846 |
      | 2450860         | 747    | 1      | 88.97KB  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450860 |
      | 2450874         | 922    | 1      | 109.33KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450874 |
      | 2450888         | 856    | 1      | 102.36KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450888 |
      | 2450905         | 969    | 1      | 115.28KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450905 |
      [...]
      | Total           | 183592 | 120    | 21.31MB  | 0B           |                   |        |                   |                                                                                 |
      

      Same table, remote cluster

      [impala-new-test-cluster-4.gce.cloudera.com:21000] > show table stats store_sales;
      Query: show table stats store_sales
      +-----------------+--------+--------+----------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------------------------------------------------------------+
      | ss_sold_date_sk | #Rows  | #Files | Size     | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                                                                        |
      +-----------------+--------+--------+----------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------------------------------------------------------------+
      | 2450829         | 2142   | 2      | 254.39KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450829 |
      | 2450846         | 1678   | 2      | 199.79KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450846 |
      | 2450860         | 1494   | 2      | 177.94KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450860 |
      | 2450874         | 922    | 1      | 109.33KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450874 |
      | 2450888         | 1712   | 2      | 204.72KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450888 |
      | 2450905         | 1938   | 2      | 230.56KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450905 |
      [...]
      | Total           | 334969 | 219    | 38.88MB  | 0B           |                   |        |                   |                                                                                                                 |
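
In the rows shown above, five of the six remote partitions hold exactly twice the rows and files of their local counterparts, while partition 2450874 matches. The report does not state the cause. As a rough aid for checking this across all partitions, here is a minimal Python sketch that parses pasted show table stats output from both clusters and lists the partitions whose row counts differ. The function names and the LOCAL and REMOTE sample strings are hypothetical, the samples are trimmed to the first four columns of three rows from the output above, and the parser assumes numeric partition keys.

```python
def parse_stats(text):
    """Map partition key -> (row count, file count) from `show table stats` output."""
    stats = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Keep only data rows: the first cell must be a numeric partition key.
        # This skips borders, the header, "[...]" markers, and the Total row.
        if len(cells) >= 3 and cells[0].isdigit():
            stats[cells[0]] = (int(cells[1]), int(cells[2]))
    return stats


def diff_partitions(local, remote):
    """Return {partition: (local_rows, remote_rows)} where the row counts differ."""
    return {
        part: (local[part][0], remote[part][0])
        for part in sorted(local.keys() & remote.keys())
        if local[part][0] != remote[part][0]
    }


# Sample rows copied from the outputs in this report (first four columns only).
LOCAL = """
| 2450829 | 1071 | 1 | 127.20KB |
| 2450874 | 922  | 1 | 109.33KB |
| 2450905 | 969  | 1 | 115.28KB |
"""
REMOTE = """
| 2450829 | 2142 | 2 | 254.39KB |
| 2450874 | 922  | 1 | 109.33KB |
| 2450905 | 1938 | 2 | 230.56KB |
"""

if __name__ == "__main__":
    mismatches = diff_partitions(parse_stats(LOCAL), parse_stats(REMOTE))
    for part, (local_rows, remote_rows) in mismatches.items():
        ratio = remote_rows / local_rows
        print(f"{part}: local={local_rows} remote={remote_rows} ratio={ratio:.1f}")
```

On the sample rows this reports partitions 2450829 and 2450905, each with a ratio of 2.0, and omits 2450874, whose counts agree.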
      

            People

              dknupp David Knupp