  1. IMPALA
  2. IMPALA-3943

Queries started failing with "This file has no row groups" against small/invalid parquet files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.7.0
    • Component/s: Backend
    • Labels:

      Description

      This is blocking nightly performance runs.

      On August 2nd, queries against tpch_nested_300_parquet started failing with:

      Invalid file. This file: hdfs://vb0202.halxg.cloudera.com:8020/user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000371_0 has no row groups
      Invalid file. This file: hdfs://vb0202.halxg.cloudera.com:8020/user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000466_0 has no row groups
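
To tell a truncated or corrupt file apart from a well-formed file that simply has no row groups, a first step could be a structural check of the Parquet trailer. A Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the file metadata (footer). The sketch below only validates that framing; it does not parse the Thrift metadata, so it cannot count row groups, and it assumes an unencrypted footer. The name `read_footer_length` is hypothetical and is not part of Impala.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"
MIN_FILE_SIZE = 12  # leading magic + 4-byte footer length + trailing magic


def read_footer_length(stream):
    """Return the footer length of a seekable binary stream, or raise ValueError."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if size < MIN_FILE_SIZE:
        raise ValueError("file too small to be Parquet: %d bytes" % size)

    stream.seek(0)
    if stream.read(4) != PARQUET_MAGIC:
        raise ValueError("missing leading PAR1 magic")

    # The last 8 bytes are: footer length (uint32, little-endian) + magic.
    stream.seek(-8, io.SEEK_END)
    tail = stream.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")

    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - MIN_FILE_SIZE:
        raise ValueError("footer length %d exceeds file size %d" % (footer_len, size))
    return footer_len


# Synthetic stand-in for a real file: magic, 10 placeholder bytes, trailer.
fake = io.BytesIO(PARQUET_MAGIC + b"\x00" * 10 + struct.pack("<I", 10) + PARQUET_MAGIC)
print(read_footer_length(fake))
```

For a real file, the same function can be called on `open(path, "rb")` after copying the file out of HDFS. A file that passes this check but is only a few hundred bytes long most likely contains little more than its footer.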
      

      Given their size (828 bytes each), these files appear to contain invalid data:

      -rw-r--r--   3 mmokhtar hive        828 2016-02-23 12:48 /user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000371_0
      -rw-r--r--   3 mmokhtar hive        828 2016-02-23 12:49 /user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000466_0
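
Other files in the table may be affected in the same way. One way to look for them is to scan an `hdfs dfs -ls` listing for unusually small files, as sketched below. The helper name `find_small_files` and the 1024-byte threshold are assumptions chosen for illustration, not Impala settings; the column layout is the one shown in the listing above.

```python
SMALL_FILE_BYTES = 1024  # assumed cutoff for "suspiciously small"


def find_small_files(ls_output, threshold=SMALL_FILE_BYTES):
    """Return (path, size) pairs for regular files smaller than threshold."""
    small = []
    for line in ls_output.splitlines():
        # Columns: perms, replication, owner, group, size, date, time, path
        fields = line.strip().split(None, 7)
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue  # skip directories, "Found N items" headers, blank lines
        size = int(fields[4])
        if size < threshold:
            small.append((fields[7], size))
    return small


listing = """\
-rw-r--r--   3 mmokhtar hive        828 2016-02-23 12:48 /user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000371_0
-rw-r--r--   3 mmokhtar hive        828 2016-02-23 12:49 /user/hive/warehouse/tpch_nested_300_parquet.db/customer_snappy/000466_0
"""

for path, size in find_small_files(listing):
    print(size, path)
```

In practice the `listing` string would come from the output of `hdfs dfs -ls` on the table directory.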
      

      Queries against the same dataset used to succeed.

      This is very likely a behavioral change introduced by http://github.mtv.cloudera.com/CDH/Impala/commit/40c01a7f92d2248229e8e45291a1ef43b8c40f48

        Attachments

        1. no-row-groups.gz.parquet
          2 kB
          Matthew Jacobs

            People

            • Assignee: Alexander Behm (alex.behm)
            • Reporter: Mostafa Mokhtar (mmokhtar)
            • Votes: 0
            • Watchers: 4
