IMPALA-3943

Queries started failing with "This file has no row groups" against small/invalid parquet files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.7.0
    • Component/s: Backend

    Description

      This is blocking nightly performance runs.

      On August 2nd, queries against tpch_nested_300_parquet started failing with:

      Invalid file. This file: hdfs://vb0202.halxg.cloudera.com:8020/user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000371_0 has no row groups
      Invalid file. This file: hdfs://vb0202.halxg.cloudera.com:8020/user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000466_0 has no row groups
      

      These files appear to contain invalid data given their size (828 bytes each):

      -rw-r--r--   3 mmokhtar hive        828 2016-02-23 12:48 /user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000371_0
      -rw-r--r--   3 mmokhtar hive        828 2016-02-23 12:49 /user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000466_0
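
A quick way to triage files like these is to look at the Parquet framing alone. The sketch below is a heuristic, not Impala's own check: it assumes the file has been copied to a local path, and relies only on the documented Parquet layout (leading "PAR1" magic, footer metadata, a 4-byte little-endian footer length, trailing "PAR1" magic). The function name parquet_data_bytes is hypothetical. It does not decode the Thrift footer, so it cannot report the actual row group count.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_data_bytes(path):
    """Return the number of bytes between the header magic and the footer.

    Raises ValueError if the file is not framed like a Parquet file.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to the end to learn the file size
        size = f.tell()
        # Smallest possible framing: magic + footer length + magic.
        if size < 12:
            raise ValueError("file too small to be Parquet")
        f.seek(0)
        head = f.read(4)
        # The last 8 bytes hold the footer length and the trailing magic.
        f.seek(size - 8)
        footer_len, tail = struct.unpack("<I4s", f.read(8))
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic")
    data_bytes = size - 12 - footer_len
    if data_bytes < 0:
        raise ValueError("footer length exceeds file size")
    return data_bytes
```

A return value of 0 means the file holds nothing but footer metadata, which is consistent with a file that has no row groups. A positive value only shows that some bytes precede the footer, not that they form valid row groups.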
      

      Queries against the same dataset used to succeed.

      This is very likely a behavioral change introduced by http://github.mtv.cloudera.com/CDH/Impala/commit/40c01a7f92d2248229e8e45291a1ef43b8c40f48

      Attachments

        1. no-row-groups.gz.parquet
          2 kB
          Matthew Jacobs

          People

            Assignee: alex.behm Alexander Behm
            Reporter: mmokhtar Mostafa Mokhtar