IMPALA-2400

Unpredictable locality behavior for reading Parquet files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: None
    • Component/s: Perf Investigation
    • Labels:

      Description

      When running the query below, I saw very high run-to-run variance in elapsed time (0.07s to 1.32s for the same query), even after running "invalidate metadata".

      select * from tpch_bin_flat_parquet_30.lineitem limit 10;

      • Fetched 10 row(s) in 1.08s
        WARNINGS: Read 139.48 MB of data across network that was expected to be local. Block locality metadata for table 'tpch_bin_flat_parquet_30.lineitem' may be stale. Consider running "INVALIDATE METADATA `tpch_bin_flat_parquet_30`.`lineitem`".
      • Fetched 10 row(s) in 1.32s
      • Fetched 10 row(s) in 0.09s
      • Fetched 10 row(s) in 1.08s
      • (ran "invalidate metadata" at this point)
      • Fetched 10 row(s) in 0.89s
      • Fetched 10 row(s) in 0.07s
        WARNINGS: Read 76.15 MB of data across network that was expected to be local. Block locality metadata for table 'tpch_bin_flat_parquet_30.lineitem' may be stale. Consider running "INVALIDATE METADATA `tpch_bin_flat_parquet_30`.`lineitem`".
      • Fetched 10 row(s) in 1.11s
      • Fetched 10 row(s) in 0.73s
      • Fetched 10 row(s) in 0.09s

      The behavior above is specific to Parquet tables and does not reproduce against text data.

      Profile files attached.
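
To put a number on the variance, the timings listed above can be summarized with a short script. This is only a sketch: the log text is copied from this report (warning lines shortened to their first sentence), and the names `LOG`, `parse_runs`, `FETCHED`, and `REMOTE` are hypothetical helpers, not part of Impala or impala-shell.

```python
import re
import statistics

# Output lines copied from the report above; each warning is shortened
# to its first sentence. Nothing here is produced by a live cluster.
LOG = """\
Fetched 10 row(s) in 1.08s
WARNINGS: Read 139.48 MB of data across network that was expected to be local.
Fetched 10 row(s) in 1.32s
Fetched 10 row(s) in 0.09s
Fetched 10 row(s) in 1.08s
invalidate metadata
Fetched 10 row(s) in 0.89s
Fetched 10 row(s) in 0.07s
WARNINGS: Read 76.15 MB of data across network that was expected to be local.
Fetched 10 row(s) in 1.11s
Fetched 10 row(s) in 0.73s
Fetched 10 row(s) in 0.09s
"""

FETCHED = re.compile(r"Fetched \d+ row\(s\) in ([\d.]+)s")
REMOTE = re.compile(r"Read ([\d.]+) MB of data across network")


def parse_runs(log):
    """Return one dict per query run: elapsed seconds and remote MB (or None)."""
    runs = []
    for line in log.splitlines():
        m = FETCHED.search(line)
        if m:
            runs.append({"seconds": float(m.group(1)), "remote_mb": None})
            continue
        m = REMOTE.search(line)
        if m and runs:
            # A warning line belongs to the run printed just before it.
            runs[-1]["remote_mb"] = float(m.group(1))
    return runs


runs = parse_runs(LOG)
seconds = [r["seconds"] for r in runs]

print(f"runs={len(seconds)} min={min(seconds):.2f}s max={max(seconds):.2f}s "
      f"mean={statistics.mean(seconds):.2f}s stdev={statistics.stdev(seconds):.2f}s")
for r in runs:
    if r["remote_mb"] is not None:
        print(f"remote read: {r['remote_mb']} MB in a {r['seconds']:.2f}s run")
```

Pairing each remote-read warning with the elapsed time of its run makes it easy to check whether the slow runs are the ones that report remote reads.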

        Attachments

        1. RemoteRead.txt
          5 kB
          Mostafa Mokhtar
        2. LocalRead.txt
          5 kB
          Mostafa Mokhtar

            People

            • Assignee: Unassigned
            • Reporter: Mostafa Mokhtar (mmokhtar)
            • Votes: 1
            • Watchers: 6
