Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.3.0
Fix Version/s: None
Environment: impalad version 2.3.0-cdh5-INTERNAL RELEASE (build ce29c76580138ad29676c566e9281ca1999a94d3)
Built on Fri, 02 Oct 2015 22:40:35 PST
Description
A little less than 1% of the queries in stress test runs are now failing with:
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2452100/914276e8bb990a5f-9a107034e38b06af_1915893637_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
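For reference, the workaround the error message suggests amounts to refreshing the named table from impala-shell; a minimal sketch (using the table from the message above, against whichever coordinator the shell normally connects to):

$ impala-shell -i impala-stress-random-1.vpc.cloudera.com -q "refresh tpcds_3_decimal_parquet.store_sales"

That is only the workaround suggested by the error text, not an explanation of why the metadata went stale in the first place.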
The table/file names seem somewhat random:

$ grep "appears stale" stress_debug_log.txt | sort | uniq | head
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/promotion/e46c18ceceacf45-5e58852e96b5f28e_1457385948_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.promotion" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2450837/914276e8bb990a5f-9a107034e38b06b1_22251576_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2450856/914276e8bb990a5f-9a107034e38b06b1_2074113603_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2450914/914276e8bb990a5f-9a107034e38b06b0_1461425551_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2450968/914276e8bb990a5f-9a107034e38b06af_1019498211_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2450986/914276e8bb990a5f-9a107034e38b06b1_1424225228_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2451002/914276e8bb990a5f-9a107034e38b06b1_207016247_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2451084/914276e8bb990a5f-9a107034e38b06b0_395366859_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2451131/914276e8bb990a5f-9a107034e38b06b0_1893568829_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.
Metadata for file 'hdfs://impala-stress-random-1.vpc.cloudera.com:8020/user/hive/warehouse/tpcds_3_decimal_parquet.db/store_sales/ss_sold_date_sk=2451144/914276e8bb990a5f-9a107034e38b06b2_1905581522_data.0.parq' appears stale. Try running "refresh tpcds_3_decimal_parquet.store_sales" to reload the file metadata.

$ grep "appears stale" stress_debug_log.txt | sort | uniq | wc -l
58
I suspect this would only be a warning, but because abort_on_error is enabled, the queries fail.
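To confirm that, one could rerun one of the failing queries with the option turned off and check that it completes with only a warning; a sketch (the count query is just a placeholder):

$ impala-shell -i impala-stress-random-1.vpc.cloudera.com
> set abort_on_error=0;
> select count(*) from tpcds_3_decimal_parquet.store_sales;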
If this were a real problem with the data, I'd expect far more than 1% of the queries to fail. It also happens on all of the stress test clusters, and I haven't seen it in previous builds.
I'll collect some logs when a run finishes.