IMPALA / IMPALA-2400

Unpredictable locality behavior for reading Parquet files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: None
    • Component/s: Perf Investigation

    Description

      When running the query below repeatedly, I noticed exceptionally high variance in run time, along with intermittent remote-read warnings, even after running "invalidate metadata".

      select * from tpch_bin_flat_parquet_30.lineitem limit 10;

      • Fetched 10 row(s) in 1.08s
        WARNINGS: Read 139.48 MB of data across network that was expected to be local. Block locality metadata for table 'tpch_bin_flat_parquet_30.lineitem' may be stale. Consider running "INVALIDATE METADATA `tpch_bin_flat_parquet_30`.`lineitem`".
      • Fetched 10 row(s) in 1.32s
      • Fetched 10 row(s) in 0.09s
      • Fetched 10 row(s) in 1.08s
      • "invalidate metadata"
      • Fetched 10 row(s) in 0.89s
      • Fetched 10 row(s) in 0.07s
        WARNINGS: Read 76.15 MB of data across network that was expected to be local. Block locality metadata for table 'tpch_bin_flat_parquet_30.lineitem' may be stale. Consider running "INVALIDATE METADATA `tpch_bin_flat_parquet_30`.`lineitem`".
      • Fetched 10 row(s) in 1.11s
      • Fetched 10 row(s) in 0.73s
      • Fetched 10 row(s) in 0.09s

      The behavior above is tied to Parquet tables and doesn't repro against text data.
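
To put a number on the spread in the run times reported above, the "Fetched ... in Xs" lines can be summarized with a short script. This is only a sketch: the helper name `run_times` and the regular expression are illustrative assumptions, and the input is just the nine timing lines copied from this report, not additional measurements.

```python
import re
import statistics

# Timing lines copied verbatim from the runs listed in this report.
log = """\
Fetched 10 row(s) in 1.08s
Fetched 10 row(s) in 1.32s
Fetched 10 row(s) in 0.09s
Fetched 10 row(s) in 1.08s
Fetched 10 row(s) in 0.89s
Fetched 10 row(s) in 0.07s
Fetched 10 row(s) in 1.11s
Fetched 10 row(s) in 0.73s
Fetched 10 row(s) in 0.09s
"""


def run_times(text):
    """Return the elapsed seconds from each 'Fetched N row(s) in Xs' line.

    Hypothetical helper for this report; the pattern only matches the
    shell output format shown above.
    """
    pattern = r"Fetched \d+ row\(s\) in ([\d.]+)s"
    return [float(seconds) for seconds in re.findall(pattern, text)]


times = run_times(log)

# Summarize the spread: fastest run, slowest run, mean, and sample stdev.
print(f"runs={len(times)} min={min(times):.2f}s max={max(times):.2f}s")
print(f"mean={statistics.mean(times):.2f}s stdev={statistics.stdev(times):.2f}s")
```

The same script could be pointed at a longer capture of shell output to check whether the fast and slow runs cluster into two groups, which is what the timings above suggest.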

      Profile files attached.

      Attachments

        1. LocalRead.txt
          5 kB
          Mostafa Mokhtar
        2. RemoteRead.txt
          5 kB
          Mostafa Mokhtar

          People

            Assignee: Unassigned
            Reporter: Mostafa Mokhtar (mmokhtar)
            Votes: 1
            Watchers: 6
